Tutoring Case: ECMT6002

ECMT6002: Econometric Applications
School of Economics, University of Sydney, Semester 2, 2019
Practice Questions (for Final Exam) – Answer Key

INSTRUCTIONS
– The final exam is comprehensive and accounts for 50% of the final mark of ECMT6002.
– This is a closed-book exam. Non-programmable calculators are permitted, but smartphones should be turned off.
– Answer five (and only five) of the six questions clearly and concisely in the provided answer booklet.
– Write down your SID on the exam paper and on the answer booklet.
– Return the exam paper with the answer booklet when you complete the exam.

Note: These sample questions are not meant to be exhaustive or to cover every aspect of the unit; rather, they illustrate the format and type of questions to expect in the exams.

1. Let save denote the fraction of annual family income devoted to savings. We are interested in determining whether the rich save a larger fraction of their income, other things equal. A simple model is

$$\ln(save) = \beta_0 + \beta_1\,age + \beta_2\,children + \beta_3\,earnings + u \qquad (1.1)$$

where age denotes the age of the family head, children is the number of dependent children in the household, and earnings is the annual earnings of the family head (in $100,000). Based on a sample, the OLS estimates are as follows (standard errors in parentheses):

$$\widehat{\ln(save)} = \underset{(0.004)}{0.039} + \underset{(0.017)}{0.038}\,age - \underset{(0.051)}{0.125}\,children + \underset{(0.001)}{0.025}\,earnings \qquad (1.2)$$

n = 366, R² = 0.212

(a) Based on model (1.1), what is the interpretation of β1?
The coefficient β1 measures the change in the expected value of ln(save), the log of the fraction of annual family income devoted to savings, due to a one-year increase in the age of the family head, holding the other factors (children, earnings) constant. Equivalently, 100β1 is the approximate percentage change in save due to a one-year increase in the age of the family head, holding children and earnings constant.

(b) Based on the estimates (1.2), what is the exact percentage effect of an extra child on the expected level of save?
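A minimal Python sketch of the arithmetic behind (a) and (b), using only the point estimate β̂2 = −0.125 from (1.2): in a log-level model, 100·β̂·Δx is the approximate percentage effect on save, while 100·(exp(β̂·Δx) − 1) is the exact one. The function names are illustrative and not taken from any library.

```python
import math


def approx_pct_effect(beta: float, dx: float = 1.0) -> float:
    """Approximate % change in y for a change dx in x (log-level model): 100*beta*dx."""
    return 100.0 * beta * dx


def exact_pct_effect(beta: float, dx: float = 1.0) -> float:
    """Exact % change in y for a change dx in x (log-level model): 100*(exp(beta*dx) - 1)."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


beta_children = -0.125  # coefficient on children in (1.2)
print(f"approximate: {approx_pct_effect(beta_children):.2f}%")  # approximate: -12.50%
print(f"exact:       {exact_pct_effect(beta_children):.2f}%")   # exact:       -11.75%
```

The gap between the two numbers grows with |β̂·Δx|, which is why the exact formula is asked for when the coefficient is not small.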
The exact percentage change due to an increase in children by one is

$$100\left(e^{\hat\beta_2 \Delta children} - 1\right)\% = 100\left(e^{-0.125} - 1\right)\% \approx 100(0.8825 - 1)\% \approx -11.75\%,$$

that is, the exact percentage effect of an extra child on the expected level of save is about −11.75% (roughly −12%).

(c) Based on the estimates (1.2), test whether β2 is statistically significant at the 5% level against a two-sided alternative.
To test the hypotheses
H0: β2 = 0 versus H1: β2 ≠ 0
we use the t statistic

$$t = \frac{\hat\beta_2}{se(\hat\beta_2)} \sim t_{n-k-1}.$$

Decision rule: Reject H0 in favor of H1 if |t| > c, where c = t_{n−k−1, α/2} is the two-sided critical value of the t distribution with df = n − k − 1 at the 5% significance level. In this case, df = n − k − 1 = 366 − 3 − 1 = 362, so c ≈ 1.96 (the standard normal approximation, as df is large).
Decision: Since |t| = |−0.125/0.051| ≈ 2.45 > 1.96, we reject the null H0: β2 = 0 in favor of the alternative H1: β2 ≠ 0 at the 5% significance level.
Conclusion: Holding age and earnings constant, the number of dependent children in the household has a statistically significant effect on expected log saving.

(d) Test whether model (1.2) has any explanatory power. Perform a test of the overall significance of the regression using a 1% level of significance. Note: the F statistic is given by

$$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)} \sim F_{q,\,n-k-1}.$$

To test the hypotheses
H0: β1 = β2 = β3 = 0 versus H1: H0 is false,
we use the F statistic (the restricted model contains only an intercept, so R²_r = 0 and q = k):

$$F = \frac{R^2_{ur}/k}{(1 - R^2_{ur})/(n - k - 1)} \sim F_{k,\,n-k-1}.$$

Decision rule: Reject H0 in favor of H1 if F > c, where c = F_{k, n−k−1, α} is the critical value of the F distribution with (k, n − k − 1) degrees of freedom at the α = 1% significance level. In this case, k = 3, n − k − 1 = 362 and c ≈ 3.78 (from the ∞ row of the F table).
Decision: Since

$$F = \frac{0.212/3}{(1 - 0.212)/362} \approx 32.46 > c,$$

we reject the null H0 in favor of H1 at the 1% significance level.
Conclusion: Therefore, model (1.2) has significant explanatory power.

(e) We are concerned that the model contains heteroskedasticity. Outline the steps required to estimate the model by generalized least squares (GLS).
The feasible GLS (FGLS) procedure to correct for heteroskedasticity is:
i. Estimate model (1.1) by OLS and obtain the residuals û;
ii. Create log(û²) by squaring the residuals and then taking their natural log;
iii. Regress log(û²) on age, children and earnings, and obtain the fitted values ĝ;
iv. Exponentiate the fitted values to form ĥ = exp(ĝ);
v. Estimate (1.1) by weighted least squares with weights 1/ĥ, i.e., run OLS on the transformed model in which every variable (including the intercept) is multiplied by 1/√ĥ.

2. The following time series regression model relates the growth in real per capita consumption (gcon) to the growth in real per capita income (ginc), the real interest rate (rint) and the expected inflation rate (infl^e):

$$gcon_t = \beta_0 + \delta_0\,ginc_t + \delta_1\,ginc_{t-1} + \beta_1\,rint_t + \beta_2\,infl^e_t + u_t \qquad (2.1)$$

(a) What is the interpretation of δ0 and δ1?
δ0 measures the expected change in the growth in real per capita consumption in period t due to a one-unit increase in the growth in real per capita income in the same period, holding other factors constant. δ1 measures the expected change in the growth in real per capita consumption in period t due to a one-unit increase in the growth in real per capita income in the previous period (t − 1), holding other factors constant.

(b) Briefly explain the reason for including the lagged term ginc_{t−1}.
The growth in real per capita income in the previous period (t − 1) may affect the growth in real per capita consumption in this period (t). This is likely to happen as consumers try to smooth consumption over time.

(c) What conditions must the time series process gcon_t satisfy for it to be weakly dependent? What are the consequences of estimating a time series model using OLS with data that are not weakly dependent?
Weak dependence restricts how strongly gcon_t and gcon_{t+h} can be related as the distance between them, h, gets large: it requires that gcon_t and gcon_{t+h} are "almost independent" as h → ∞. For a covariance-stationary process, this is usually stated as Corr(gcon_t, gcon_{t+h}) → 0 sufficiently quickly as h → ∞ (asymptotic uncorrelatedness).
The large-sample justification of OLS in time series (consistency and asymptotic normality) relies on weak dependence. Applying OLS to gcon_t when it is not weakly dependent (for example, when it is highly persistent, like a random walk) can lead to "spurious regression" results, with misleadingly large t statistics and R².

(d) Why might a time trend not be necessary in (2.1)?
Explain briefly.
In (2.1), the dependent variable gcon_t and the key regressors ginc_t and ginc_{t−1} are growth rates from one period to the next, that is, (approximately) first differences of log consumption and log income. If log consumption and log income trend upward linearly, differencing removes the trend: their growth rates fluctuate around constant means, which are absorbed by the intercept β0. Hence a time trend is unlikely to be needed.

(e) If the variables in (2.1) are measured quarterly, how can model (2.1) be improved?
If the variables in (2.1) are measured quarterly, we need to control for seasonality. A simple way is to add dummies for the first three quarters (the fourth quarter is the base) to (2.1):

$$gcon_t = \beta_0 + \delta_0\,ginc_t + \delta_1\,ginc_{t-1} + \beta_1\,rint_t + \beta_2\,infl^e_t + \alpha_1 Q1_t + \alpha_2 Q2_t + \alpha_3 Q3_t + u_t$$

where Qj_t = 1 if period t is in the j-th quarter and Qj_t = 0 otherwise.

3. We are interested in analyzing the effect of the government building a new hospital on housing prices in the suburb of Sydenham. Rumors that a new hospital would be built in Sydenham began after 2006, and the hospital was built and began operating in 2008. We have data on the prices of houses sold in Sydenham in 2006 and another sample of houses sold in Sydenham in 2010. The hypothesis we wish to test is that the price of houses located near the site of the new hospital would rise above the price of more distant houses. The data for each year include the dummy variable near, which is equal to one if the house is located within 2 kilometers of the new hospital. House prices for both years of data are measured in 2010 prices. The variable rprice denotes the real house price (scaled by $100,000). The following simple regression model was estimated using only the 2010 sample of data:

$$\widehat{rprice} = \underset{(0.309)}{10.131} + \underset{(0.788)}{2.688}\,near \qquad (3.1)$$

n = 96, R² = 0.199

while the following was estimated using only the 2006 sample of data:

$$\widehat{rprice} = \underset{(0.265)}{9.252} + \underset{(0.671)}{1.412}\,near \qquad (3.2)$$

n = 105, R² = 0.106

(a) Explain, one by one, the interpretation of the estimates in model (3.2).
The intercept 9.252 ($925,200) is the average selling price in 2006 of houses that are not near the site of the new hospital (near = 0). The coefficient 1.412 is the location effect: in 2006, houses within 2 kilometers of the future hospital site sold, on average, for $141,200 more than houses further away. Since (3.2) is a simple regression, no other factors are held constant. Also see lecture notes 8.

(b) Based on the estimates in (3.1) and (3.2), what is the average price change for all houses in Sydenham from 2006 to 2010?
The intercepts give the average selling price of houses that are not near the hospital site: 9.252 in 2006 and 10.131 in 2010. The average price change is then 10.131 − 9.252 = 0.879, that is, $87,900. Strictly, this is the change for houses away from the site, so it measures the general price change in Sydenham that is unrelated to the new hospital.

(c) Explain why we cannot infer from the estimates in (3.1) that the location of the hospital caused the price of houses located nearby to increase. What evidence from model (3.2) supports this conclusion?
Even before the rumors began, houses near the future hospital site were, on average, already more expensive than other houses, which is evident from (3.2): the coefficient on near is 1.412 (t ≈ 1.412/0.671 ≈ 2.10). Hence the 2010 price gap of 2.688 in (3.1) partly reflects pre-existing differences between locations, and we cannot infer from (3.1) alone that the location of the hospital caused the price of nearby houses to increase.

(d) Using the information from models (3.1) and (3.2), calculate the difference-in-differences estimate of the impact of the new hospital on the price of nearby houses.
The difference-in-differences estimate of the impact of the new hospital on the price of nearby houses is 2.688 − 1.412 = 1.276, that is, $127,600.

(e) Propose a linear regression model that can directly estimate the effect of the new hospital on housing prices.
We define a new dummy for the timing of treatment: d2010_it = 1 if t = 2010 and d2010_it = 0 if t = 2006. Using the pooled 2006 and 2010 samples, the linear regression model can be written as

$$rprice_{it} = \beta_0 + \beta_1\,near_{it} + \beta_2\,d2010_{it} + \beta_3\,near_{it} \cdot d2010_{it} + u_{it}$$

where β3 is the difference-in-differences effect of the new hospital; its OLS estimate equals 2.688 − 1.412 = 1.276.

4. An influential study tested for evidence of racial discrimination in the market for mortgages (home loans).
The dependent variable in the study was approve (an indicator variable equal to 1 if the loan is approved), and the explanatory variables considered were white (= 1 if the applicant is white), obrat (other financial obligations as a % of income), loanprc (the amount of the loan / the price of the property), unem (the unemployment rate in the applicant's industry of employment), male (= 1 if the applicant is male), and cosign (= 1 if there is a cosigner on the loan). The study estimated the equation

$$\Pr(approve = 1) = G(\beta_0 + \beta_1\,white + \beta_2\,obrat + \cdots + \beta_6\,cosign) \qquad (4.1)$$

using the Probit and Logit models. The table below presents coefficient estimates for several specifications.

Table 4.1. Estimates of Binary Choice Models
Dependent variable: approve (standard errors in parentheses)

                      Probit (1)     Probit (2)      Logit
white                 .714 (.120)    .660 (.123)     1.178 (.217)
obrat                                −.082 (.005)    −.038 (.011)
loanprc                              −1.345 (.315)   −2.414 (.583)
unem                                 −.046 (.023)    −.083 (.040)
male                                 −.019 (.009)    −.024 (.010)
cosign                               .081 (.355)     .156 (.659)
constant              .566 (.105)    2.536 (.362)    4.570 (.679)
Observations (n)      1000           1000            1000
Log-likelihood (LLF)  −369.29        −348.46         −348.29

(a) The Probit and Logit models were estimated using the maximum likelihood estimator (MLE). What is the basic idea of maximum likelihood estimation for obtaining coefficient estimates?
The MLE of β is the value of β that maximizes the log-likelihood function of the sample, where the log-likelihood is derived from (4.1). With y_i = approve_i and i = 1, …, n,

$$\ell(\beta) = \sum_{i=1}^{n} \left\{ y_i \ln G(x_i'\beta) + (1 - y_i) \ln\left[1 - G(x_i'\beta)\right] \right\},$$

so we choose the coefficients under which the observed outcomes are most likely.

(b) Based on the estimates in Table 4.1, test the null hypothesis that the explanatory variables obrat, loanprc, unem, male and cosign are jointly insignificant in the Probit model after controlling for white. Use the likelihood ratio test and a 1% significance level.
Probit (1) contains the estimation results for the restricted model, and Probit (2) contains the unrestricted estimates. The null and alternative hypotheses are, respectively,
H0: β2 = ··· = β6 = 0 and H1: H0 is false.
We use the LR test statistic, which under H0 is asymptotically χ²_q:

$$LR = -2(\ln L_R - \ln L_{UR}) = -2\left((-369.29) - (-348.46)\right) = 41.66.$$

Rejection rule: Reject H0 in favour of H1 if LR > c, where c is the critical value of the χ² distribution with q = 5 degrees of freedom at the 1% significance level. Here LR = 41.66 and c = 15.09.
Decision: Since LR > c, we reject the null at the 1% significance level.
Conclusion: The explanatory variables obrat, loanprc, unem, male and cosign are jointly significant in the Probit model, after controlling for white.

(c) What is the interpretation of the coefficient β1 in Probit (2)? Is the partial effect of white on the probability of loan approval comparable across Probit (2) and Logit? Explain.
β1 itself cannot be interpreted as the marginal effect, since the marginal effect is given by

$$\frac{\partial p(x)}{\partial x_j} = f(x'\beta) \times \beta_j,$$

where f(x'β) is the standard normal density (for Probit) or the logistic density (for Logit). β1 only gives the direction of the effect of white on the approval probability; since white is binary, its partial effect is G(x'β with white = 1) − G(x'β with white = 0) at given values of the other regressors, which has the same sign as β1.
The coefficients of Probit and Logit are on different scales, but the partial effect of white can be compared across Probit (2) and Logit using the rules of thumb
β̂_LPM ≈ 0.4 β̂_Probit,  β̂_LPM ≈ 0.25 β̂_Logit,  β̂_Logit ≈ 1.6 β̂_Probit.
For white, 1.6 × 0.660 ≈ 1.06 is close to the Logit estimate 1.178, and the implied LPM-scale effects are 0.4 × 0.660 ≈ 0.26 and 0.25 × 1.178 ≈ 0.29, so the two models imply similar partial effects.

(d) Based on the estimates in Table 4.1, would you conclude that there is strong evidence of racial discrimination in mortgage approvals? Explain briefly.
Although β1 cannot be interpreted as the marginal effect, it has the same sign as the partial effect, and it is positive in Probit (1), Probit (2) and Logit. Furthermore, β̂1 is significant in all three specifications (t ≈ .714/.120 ≈ 5.95, .660/.123 ≈ 5.37 and 1.178/.217 ≈ 5.43). Therefore, race has a statistically significant impact on the likelihood of mortgage approval, even after controlling for the other applicant and loan characteristics. This is strong evidence of racial discrimination in mortgage approvals.

5. A recent survey of retirees asked individuals whether their overall happiness had changed following retirement from the labour force. We are interested in understanding the important factors contributing to changes in well-being with retirement.
The survey included measures of each individual's Income (in $100), Wealth (in $10,000), PoorHealth (an indicator of whether health has declined since retirement) and Married (an indicator of marital status). Let the change in overall happiness following retirement, H*_i, be a function of individual characteristics and an idiosyncratic error term u_i:

$$H^*_i = \beta_1\,Married_i + \beta_2\,Income_i + \beta_3\,Wealth_i + \beta_4\,PoorHealth_i + u_i.$$

Although H*_i is not observed, individuals report H_i, which indicates which of 3 ordered categories {worse off, the same, better off} H*_i falls into:

$$H_i = \begin{cases} 1 & \text{if } H^*_i \le c_1 \\ 2 & \text{if } c_1 < H^*_i \le c_2 \\ 3 & \text{if } c_2 < H^*_i \end{cases}$$

Using the Ordered Logit model for H_i, the estimates are presented below:

Table 5.1. Ordered Logit Estimates
Dependent variable: Pr(H_i = j), j = 1, 2, 3

                      β̂ (se)
Married               .306 (.080)
Income                .009 (.010)
Wealth                .118 (.103)
PoorHealth            −.314 (.078)
c1                    −1.213 (.198)
c2                    .853 (.115)
Observations (n)      1344
Pseudo R² (R̃²)        0.078
Log-likelihood (LLF)  −1031.09

(a) What is the interpretation of β̂1, the coefficient on the Married indicator variable? What do we learn from the estimated value of 0.306 about the effects of Married on happiness in retirement?
The coefficient in an Ordered Logit model cannot be directly interpreted as an effect on the category probabilities. For the Ordered Logit model with 3 categories, the marginal effects are given by

$$\frac{\partial \Pr(H_i = 1)}{\partial x_1} = -f(c_1 - x_i'\beta)\,\beta_1$$
$$\frac{\partial \Pr(H_i = 2)}{\partial x_1} = \left[f(c_1 - x_i'\beta) - f(c_2 - x_i'\beta)\right]\beta_1$$
$$\frac{\partial \Pr(H_i = 3)}{\partial x_1} = f(c_2 - x_i'\beta)\,\beta_1$$

where f is the density of the logistic distribution. From the estimated value of 0.306, we learn that the marginal effect of Married on the top category, better off (H_i = 3), is positive (same sign as β̂1), and its effect on the bottom category, worse off (H_i = 1), is negative; the sign of the effect on the middle category is ambiguous. This suggests that being married has a positive impact on the likelihood of being better off after retirement. For a given individual, the magnitude of each marginal effect is proportional to β̂1 = 0.306.
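The three marginal-effect formulas above can be evaluated with a short Python sketch. It uses the cutoffs ĉ1 = −1.213 and ĉ2 = 0.853 and the coefficients from Table 5.1, but the index value x_i'β = 0 is a hypothetical choice for illustration only; in practice the effects are evaluated at each observation (and averaged) or at sample means, neither of which the table reports. The sketch also shows two properties used in the answers: the effects on the three categories sum to zero, and for a given individual the ratio of two variables' marginal effects equals the ratio of their coefficients.

```python
import math


def logistic_cdf(z: float) -> float:
    """Logistic CDF F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z: float) -> float:
    """Logistic density f(z) = F(z) * (1 - F(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def ordered_logit_probs(xb: float, c1: float, c2: float):
    """Pr(H=1), Pr(H=2), Pr(H=3) for index xb = x_i'beta and cutoffs c1 < c2."""
    p1 = logistic_cdf(c1 - xb)
    p3 = 1.0 - logistic_cdf(c2 - xb)
    return p1, 1.0 - p1 - p3, p3


def ordered_logit_mfx(xb: float, c1: float, c2: float, beta_j: float):
    """Marginal effects of regressor j on Pr(H=1), Pr(H=2), Pr(H=3)."""
    f1, f2 = logistic_pdf(c1 - xb), logistic_pdf(c2 - xb)
    return -f1 * beta_j, (f1 - f2) * beta_j, f2 * beta_j


C1, C2 = -1.213, 0.853  # estimated cutoffs (Table 5.1)
XB = 0.0                # hypothetical index value x_i'beta
coefs = {"Married": 0.306, "Income": 0.009, "Wealth": 0.118, "PoorHealth": -0.314}

print("probabilities:", [round(p, 3) for p in ordered_logit_probs(XB, C1, C2)])
for name, b in coefs.items():
    me = ordered_logit_mfx(XB, C1, C2, b)
    print(f"{name:>10}: " + ", ".join(f"{m:+.4f}" for m in me))
```

Changing XB changes the size of the effects, but not the signs of the effects on the two extreme categories, nor the ratios across variables.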
(b) Based on the set of estimates in Table 5.1, what can we conclude about the relative importance of the different explanatory variables (Married, Income, Wealth and PoorHealth) in determining post-retirement well-being? Explain.
As shown above, for a given individual the marginal effect of each variable on each category is its coefficient multiplied by the same density term, so the magnitudes of the marginal effects are proportional to the coefficients, and the relative importance of the explanatory variables can be read from the coefficient estimates. Based on Table 5.1, in terms of magnitude, PoorHealth has the largest effect (−.314), followed by Married (.306), then Wealth (.118), while Income has the smallest effect (.009). Note that only PoorHealth and Married are statistically significant (t ≈ −4.0 and 3.8), whereas Wealth and Income are not (t ≈ 1.1 and 0.9). The comparison also refers to one-unit changes, so it depends on the units of measurement (Income in $100, Wealth in $10,000).

(c) Using Stata to compute marginal effects, I found that the marginal effect of PoorHealth on the probability of being better off in retirement was −0.105 (with a standard error of 0.026). Construct the 95% confidence interval for the marginal effect of PoorHealth on the probability of being better off in retirement. Is zero in the confidence interval? Hint: ME^_4/se(ME^_4) follows a t_{n−k−2} distribution (where the −2 is for estimating the cutoff points).
The 95% confidence interval for ME4 is constructed as

$$\underline{ME}_4 = \widehat{ME}_4 - se(\widehat{ME}_4)\, t_{n-k-2,\,\alpha/2} = -0.105 - 0.026 \times 1.96 \approx -0.156$$
$$\overline{ME}_4 = \widehat{ME}_4 + se(\widehat{ME}_4)\, t_{n-k-2,\,\alpha/2} = -0.105 + 0.026 \times 1.96 \approx -0.054$$

where ME^_4 = −0.105, se(ME^_4) = 0.026 and t_{n−k−2, α/2} ≈ 1.96 (df = 1344 − 4 − 2 = 1338). So the 95% confidence interval for ME4 is [−0.156, −0.054], and zero is not in the confidence interval.

(d) An alternative to the Ordered Logit model is the Multinomial Logit (MNL) model specification. What are the advantages of the MNL specification? What, if any, are the limitations of the MNL estimator for analyzing the determinants of changes in happiness with retirement, H*?
The main advantage of the MNL is that it allows the explanatory variables to have different effects on different categories (each category has its own coefficient vector). The limitation of the MNL is that it ignores the natural ordering of the categories and estimates more parameters, so it will be less efficient than the Ordered Logit if the categories are indeed ordered, as worse off, the same and better off are.

6. Suppose that annual earnings (earnings) and alcohol consumption (alcohol) are determined by the simultaneous equations model (SEM) below:

$$\log(earnings) = \beta_0 + \beta_1\,alcohol + \beta_2\,educ + u_1 \qquad (6.1)$$
$$alcohol = \gamma_0 + \gamma_1 \log(earnings) + \gamma_2\,educ + \gamma_3 \log(price) + u_2 \qquad (6.2)$$

where educ is years of schooling and price is a local price index for alcohol (including state and local taxes). Assume that educ and price are determined outside the SEM, and that β1, β2, γ1, γ2 and γ3 are all different from zero. Both equations (6.1) and (6.2) are treated as structural equations.

(a) Explain briefly why the regressors alcohol in (6.1) and log(earnings) in (6.2) are both endogenous.
To see that alcohol in equation (6.1) is endogenous (that is, correlated with u1), plug (6.1) into equation (6.2):

$$alcohol = \gamma_0 + \gamma_1(\beta_0 + \beta_1\,alcohol + \beta_2\,educ + u_1) + \gamma_2\,educ + \gamma_3 \log(price) + u_2$$
$$\Rightarrow (1 - \gamma_1\beta_1)\,alcohol = (\gamma_0 + \gamma_1\beta_0) + (\gamma_1\beta_2 + \gamma_2)\,educ + \gamma_3 \log(price) + (\gamma_1 u_1 + u_2) \qquad (6.3)$$

so, provided 1 − γ1β1 ≠ 0, alcohol depends on u1 with coefficient γ1/(1 − γ1β1), which is nonzero because γ1 ≠ 0; hence alcohol is correlated with u1. Similarly, plugging (6.2) into equation (6.1) shows that log(earnings) depends on u2, so log(earnings) in equation (6.2) is correlated with u2 and therefore endogenous.

(b) Is equation (6.1) identified? How would you estimate the equation if it is identified?
Since alcohol in equation (6.1) is endogenous, at least one instrument is needed to identify (6.1). One candidate is log(price) from equation (6.2). It is exogenous (by assumption), relevant to alcohol (its coefficient in the reduced form for alcohol, obtained by dividing (6.3) by 1 − γ1β1, is γ3/(1 − γ1β1) ≠ 0), and excluded from (6.1). Therefore, equation (6.1) is identified.
Usually we use 2SLS to estimate (6.1): (i) regress alcohol on all the exogenous variables, educ and log(price), by OLS and obtain the fitted values; (ii) regress log(earnings) on the fitted values of alcohol and educ by OLS. The second-stage OLS standard errors must be corrected, which IV routines do automatically (see lecture notes).

(c) Find the reduced form equation for log(earnings) and explain briefly why OLS applied to this equation is consistent.
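The reduced form asked for in (c) can be cross-checked numerically: for any parameter values with 1 − β1γ1 ≠ 0, solving (6.1) and (6.2) jointly as a 2×2 linear system in (log(earnings), alcohol) must give the same log(earnings) as the reduced-form formula. A minimal NumPy sketch; all parameter and data values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical structural parameters (illustrative only, not estimates).
B0, B1, B2 = 1.0, -0.2, 0.1            # (6.1): log(earnings) equation
G0, G1, G2, G3 = 0.0, 0.5, -0.1, -1.0  # (6.2): alcohol equation


def solve_structural(educ, lprice, u1, u2):
    """Solve (6.1)-(6.2) jointly for (log(earnings), alcohol)."""
    # Endogenous terms moved to the left-hand side: A @ [lear, alc] = rhs
    A = np.array([[1.0, -B1],
                  [-G1, 1.0]])
    rhs = np.array([B0 + B2 * educ + u1,
                    G0 + G2 * educ + G3 * lprice + u2])
    return np.linalg.solve(A, rhs)


def reduced_form_lear(educ, lprice, u1, u2):
    """Reduced form for log(earnings), valid when 1 - B1*G1 != 0."""
    d = 1.0 - B1 * G1
    return ((B0 + B1 * G0) / d + (B1 * G2 + B2) / d * educ
            + B1 * G3 / d * lprice + (u1 + B1 * u2) / d)


lear, alc = solve_structural(educ=12.0, lprice=0.3, u1=0.1, u2=-0.2)
print(np.isclose(lear, reduced_form_lear(12.0, 0.3, 0.1, -0.2)))  # True
```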
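To find the reduced form equation for log(earnings), plug (6.2) into equation (6.1):

$$\log(earnings) = \beta_0 + \beta_1\,alcohol + \beta_2\,educ + u_1 = \beta_0 + \beta_1\left(\gamma_0 + \gamma_1 \log(earnings) + \gamma_2\,educ + \gamma_3 \log(price) + u_2\right) + \beta_2\,educ + u_1$$
$$\Rightarrow (1 - \beta_1\gamma_1)\log(earnings) = (\beta_0 + \beta_1\gamma_0) + (\beta_1\gamma_2 + \beta_2)\,educ + \beta_1\gamma_3 \log(price) + (u_1 + \beta_1 u_2)$$
$$\Rightarrow \log(earnings) = \frac{\beta_0 + \beta_1\gamma_0}{1 - \beta_1\gamma_1} + \frac{\beta_1\gamma_2 + \beta_2}{1 - \beta_1\gamma_1}\,educ + \frac{\beta_1\gamma_3}{1 - \beta_1\gamma_1}\log(price) + \frac{u_1 + \beta_1 u_2}{1 - \beta_1\gamma_1}$$

given that 1 − β1γ1 ≠ 0. OLS applied to this equation is consistent because all the regressors (educ and log(price)) are exogenous by assumption, i.e., uncorrelated with the reduced-form error (u1 + β1u2)/(1 − β1γ1).

(d) How is the interpretation of β2, the coefficient on educ in (6.1), different from that of the coefficient on educ in the reduced form equation for log(earnings) in (c)? Explain briefly.
Since equation (6.1) is treated as a structural equation, we may interpret β2 in a causal, ceteris paribus fashion: it is the effect of one more year of schooling on log(earnings), holding alcohol consumption fixed. In contrast, the coefficient on educ in the reduced form for log(earnings), (β1γ2 + β2)/(1 − β1γ1), mixes the parameters of both structural equations: it combines the direct effect β2 with the indirect effect working through alcohol consumption (educ changes alcohol via (6.2), which feeds back into earnings via (6.1)), so it does not have the ceteris paribus interpretation of β2.

(e) Is equation (6.2) identified? How would you estimate the equation if it is identified?
Similarly, log(earnings) in equation (6.2) is endogenous. Unlike the case of (6.1), however, there is no exogenous variable in (6.1) that is excluded from (6.2) (educ appears in both equations), so there is no instrument for log(earnings). Therefore, (6.2) is not identified (it is under-identified). To estimate (6.2), we would need to find an additional exogenous variable that affects earnings, is excluded from (6.2), and can be used as the instrument for log(earnings).

End of Exam Paper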
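The 2SLS procedure for (6.1) described in (b) can be illustrated on simulated data. A minimal NumPy sketch, assuming hypothetical parameter values and normally distributed educ, log(price), u1 and u2 (nothing here comes from a real data set): observations are generated from the SEM itself, and OLS of log(earnings) on alcohol and educ is compared with 2SLS that uses log(price) as the instrument for alcohol. The manual second stage gives the 2SLS coefficients but not valid standard errors; dedicated IV routines (such as Stata's ivregress) correct these.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_sem(n, rng, b=(1.0, -0.2, 0.1), g=(0.0, 0.5, -0.1, -1.0)):
    """Draw n observations from (6.1)-(6.2) with hypothetical parameters."""
    b0, b1, b2 = b
    g0, g1, g2, g3 = g
    educ = rng.normal(12.0, 2.0, n)
    lprice = rng.normal(0.0, 1.0, n)
    u1 = rng.normal(0.0, 0.5, n)
    u2 = rng.normal(0.0, 1.0, n)
    # Reduced form for alcohol: equation (6.3) divided by (1 - g1*b1)
    alc = ((g0 + g1 * b0) + (g1 * b2 + g2) * educ + g3 * lprice
           + g1 * u1 + u2) / (1.0 - g1 * b1)
    lear = b0 + b1 * alc + b2 * educ + u1  # structural equation (6.1)
    return lear, alc, educ, lprice


def two_sls(y, endog, exog, instruments):
    """2SLS coefficients of y on [endog, exog]; exog must include the constant."""
    Z = np.column_stack([exog, instruments])           # all exogenous variables
    endog_hat = Z @ ols(endog, Z)                      # stage 1: fitted alcohol
    return ols(y, np.column_stack([endog_hat, exog]))  # stage 2


rng = np.random.default_rng(0)
lear, alc, educ, lprice = simulate_sem(100_000, rng)
exog = np.column_stack([np.ones_like(educ), educ])  # constant and educ

b_ols = ols(lear, np.column_stack([alc, exog]))
b_iv = two_sls(lear, alc, exog, lprice)
print(f"true beta1 = -0.200, OLS = {b_ols[0]:.3f}, 2SLS = {b_iv[0]:.3f}")
```

Under these assumed values, alcohol is positively correlated with u1 (because γ1 > 0), so the OLS estimate of β1 is pulled toward zero (its probability limit is about −0.13), while 2SLS should land close to the true −0.2.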
