Tutoring Case: POLI 175

May 15, 2020

POLI 175 Problem Set 1

Due 7:59AM Thursday April 16, 2020

Please turn your homework in by emailing your html and code files to Bertrand ([email protected]) before the due time. Your homework will be graded based on completeness, accuracy, and readability of code. The point allocation in this problem set is given by:

Q1.1  Q1.2  Q1.3  Q1.4  Q1.5  Q1.6  Q1.7  Q1.8  Q1.9
5     5     10    5     5     5     15    5     5

Q2.1  Q2.2  Q2.3  Q2.4  Q2.5  Q2.6  Total  Bonus
5     5     5     10    5     10    100    10

This assignment will analyze vote returns for California House elections and vote choice in a presidential election.

Q1: 2006 California Congressional Election Results

Our goal in this exercise is to predict the proportion of votes that a Democratic candidate for a House seat wins in a “swing district”: one where the support for Democratic and Republican candidates is about equal and the incumbent is a Democrat.

1) Load the data set ca2006.csv, a slightly modified version of the 2006 House election return data from the PSCL library. The data set contains the following variables:

   district: California Congressional district
   prop_d: proportion of votes for the Democratic candidate
   dem_pres_2004: proportion of two-party presidential vote for Democratic candidate in 2004 in Congressional district
   dem_pres_2000: proportion of two-party presidential vote for Democratic candidate in 2000 in Congressional district
   dem_inc: an indicator equal to 1 if the Democrat is the incumbent
   contested: an indicator equal to 1 if the election is contested

2) Create a plot of the proportion of votes for the Democratic candidate (prop_d) against the proportion of the two-party vote for the Democratic presidential candidate in 2004 (dem_pres_2004) in the district. Be sure to clearly label the axes and provide an informative title for the plot.

3) Regress the proportion of votes for the Democratic candidate against the proportion of the two-party vote for the Democratic presidential candidate in 2004 in the district. Print the results and add the bivariate regression line to the plot.

4) Using the bivariate regression and a function you have written yourself (not the predict() function!), report the predicted vote share for the Democratic candidate if dem_pres_2004 = 0.5.

5) Now, regress prop_d against dem_pres_2004, dem_pres_2000, and dem_inc.

6) Using the multivariate regression from 5) and a function you have written yourself, report the predicted vote share for the Democratic candidate if:

   dem_pres_2004 = 0.5
   dem_pres_2000 = 0.5
   dem_inc = 1

7) We are often interested in characterizing the uncertainty in our estimates. Throughout this class we will often use the bootstrap to provide uncertainty for the estimates. Here, we will walk through the steps to implement the bootstrap to characterize the uncertainty for our response variable predictions. Do the following 10,000 times (in a for loop):

   a) Using sample, randomly select 53 rows (the number of districts in California in 2006) with replacement.
   b) Using the randomly selected (“bootstrapped”) data set, fit the bivariate and multivariate regressions specified earlier.
   c) Using the fitted regressions, predict the expected vote share for the Democratic candidate for each regression, using the values and functions from 4) and 6).
   d) Store the predictions from both regressions.

8) Report 95% confidence intervals for both predictions. In addition, create histograms for both predictions.

9) We will say the model predicts that the Democrat wins if the predicted vote share is greater than 50%. Based on the results of the bootstrap, what proportion of the time does each model predict the Democrat will win?

Q2: Predicting Support for Bill Clinton in 1992

This problem will use a data set (again, modified from the PSCL package) to predict whether a voter will vote for Bill Clinton. The data come from self-reported voting behavior in the 1992 Presidential election.

1) Load the data set vote92.csv. It contains:

   clintonvote: an indicator equal to 1 if the voter supports Clinton and 0 otherwise
   dem: an indicator equal to 1 if the voter is a Democrat
   female: an indicator equal to 1 if the voter is a woman
   clintondist: a measure of the voter’s self-assessed ideological distance from Clinton

2) What proportion of respondents report voting for Bill Clinton?

3) Using a logistic regression, regress clintonvote on dem, female, and clintondist.

4) Write a function to predict the probability that a voter supports Clinton based on a logistic regression.

5) Using your function from 4), report the probability that a female Democrat with clintondist = 1 votes for Clinton.

6) Now use a linear regression to predict clintonvote as a function of dem, female, and clintondist. For all voters (rows) in the data, use the fitted linear regression to compute their predicted probabilities of voting for Clinton. Do the same for the logistic regression. Plot the predicted probabilities from the logistic regression (on the x-axis) against those from the linear regression (on the y-axis).

Bonus) For this question, we’re going to use the predicted probabilities for all voters from the logistic regression, and we’re going to visualize how well they perform using a calibration plot. We will construct the calibration plot “from scratch” (i.e. without using any specialized libraries). To do this, we will construct 10 bins of data, where each bin corresponds to an interval of width 0.1, starting with the bin [0.0, 0.1). This first bin corresponds to all data points with a predicted probability greater than or equal to 0 AND less than 0.1. The next bin is [0.1, 0.2), and so on. For each bin, compute (a) the mean predicted probability in that bin and (b) the actual proportion of positives (the proportion of data points whose true response variable value is 1) in that bin. For each bin, plot (a) on the x-axis and (b) on the y-axis, creating a plot with 10 points on it. Connect the points with a line. In addition, add a dashed “identity line” (the y = x line) to the plot. How closely the plotted points track the identity line is a rough visualization of how well the predicted probabilities are “calibrated.”
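The hand-written prediction function in Q1.4 and the bootstrap loop in Q1.7 to Q1.9 can be sketched roughly as follows. The assignment appears to expect R (it mentions predict() and sample); this sketch uses standard-library Python only to illustrate the logic. It covers the bivariate regression only, every function name is hypothetical, and the demo data are synthetic stand-ins for ca2006.csv, not the real returns.

```python
import random


def fit_bivariate(x, y):
    """Closed-form least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # fails if every x in the sample is identical
    return mean_y - slope * mean_x, slope


def predict_linear(coefs, values):
    """Intercept plus the sum of coefficient * covariate value."""
    intercept, *slopes = coefs
    return intercept + sum(b * v for b, v in zip(slopes, values))


def bootstrap_predictions(x, y, x_new, n_boot, rng):
    """Resample rows with replacement, refit, and predict at x_new each time."""
    n = len(x)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # n row indices, with replacement
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        preds.append(predict_linear(fit_bivariate(xb, yb), [x_new]))
    return preds


def percentile_interval(values, level=0.95):
    """Percentile interval from order statistics (no interpolation)."""
    s = sorted(values)
    lo_idx = int((1 - level) / 2 * (len(s) - 1))
    hi_idx = int((1 + level) / 2 * (len(s) - 1))
    return s[lo_idx], s[hi_idx]


def share_above(values, threshold=0.5):
    """Fraction of bootstrap predictions strictly above the threshold."""
    return sum(v > threshold for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(175)
    # Synthetic stand-in for the 53 districts (NOT the real data).
    pres_2004 = [rng.uniform(0.3, 0.8) for _ in range(53)]
    prop_d = [0.05 + 0.95 * p + rng.gauss(0, 0.05) for p in pres_2004]

    preds = bootstrap_predictions(pres_2004, prop_d, 0.5, 10_000, rng)
    print("95% interval:", percentile_interval(preds))
    print("share predicting a Democratic win:", share_above(preds))
```

The same predict_linear function accepts any number of slopes, so a multivariate fit (Q1.6) would only need a different fitting routine that returns the intercept first.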
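For Q2.4 and Q2.5, a prediction function for a fitted logistic regression applies the inverse-logit transform to the linear predictor. The sketch below is a minimal illustration in standard-library Python (the assignment itself appears to expect R). The function name is hypothetical and the coefficients in the demo are placeholders, not estimates from vote92.csv.

```python
import math


def predict_logit(coefs, values):
    """Probability from a logistic model: inverse-logit of b0 + sum(b_k * x_k)."""
    intercept, *slopes = coefs
    eta = intercept + sum(b * v for b, v in zip(slopes, values))
    # Branch on the sign of eta so math.exp never receives a large positive argument.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


if __name__ == "__main__":
    # Placeholder coefficients in the order (intercept, dem, female, clintondist).
    # These are made up for illustration; real values come from the fitted model.
    placeholder_coefs = (-1.0, 2.0, 0.1, -0.5)
    # A female Democrat with clintondist = 1, as in Q2.5.
    print(predict_logit(placeholder_coefs, [1, 1, 1]))
```

Unlike a linear regression prediction, the result always lies between 0 and 1, which is one reason the two sets of predicted probabilities in Q2.6 can diverge near the extremes.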
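The binning step of the Bonus calibration plot can be sketched as below, again in standard-library Python rather than the R the assignment appears to expect. The function name is hypothetical. Two choices here are assumptions rather than part of the assignment: a predicted probability of exactly 1.0 is placed in the last bin, and empty bins are skipped instead of being reported. Plotting is left out, since it would need a graphics library.

```python
def calibration_bins(probs, outcomes, n_bins=10):
    """Group predictions into equal-width bins [k/n_bins, (k+1)/n_bins).

    Returns one tuple per non-empty bin:
    (bin index, mean predicted probability, observed share of 1s, count).
    """
    sums = [0.0] * n_bins
    positives = [0] * n_bins
    counts = [0] * n_bins
    for p, y in zip(probs, outcomes):
        # Assumption: p == 1.0 falls outside [0.9, 1.0), so clamp it into the last bin.
        k = min(int(p * n_bins), n_bins - 1)
        sums[k] += p
        positives[k] += y
        counts[k] += 1
    return [
        (k, sums[k] / counts[k], positives[k] / counts[k], counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


if __name__ == "__main__":
    # Tiny made-up example, not the vote92 data.
    example_probs = [0.05, 0.05, 0.15, 0.95, 1.0]
    example_outcomes = [0, 1, 0, 1, 1]
    for row in calibration_bins(example_probs, example_outcomes):
        print(row)
```

Each returned tuple supplies one point of the calibration plot: the mean predicted probability is the x-coordinate and the observed share of 1s is the y-coordinate.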

Author admin

More posts by admin