- May 15, 2020

Homework 2
STAT 425 – Yu
Due: Sept 26, 2019 11:59:00pm

Question 1 – Teen gambling (11 points)

The dataset teengamb (from the faraway library) concerns a study of teenage gambling in Britain. Fit a regression model with the expenditure on gambling as the response and sex, status, income and verbal scores as predictors.

a) What percentage of variation in the response is explained by these predictors? (1 pt)
b) Give the case (observation) number that corresponds to the highest positive residual, and the one that corresponds to the lowest negative residual. What are the mean and median of the residuals? (1 pt)
c) When all other predictors are held constant, what would be the difference in the predicted expenditure on gambling for a male compared to a female? (1 pt)
d) Predict the amount that a male with average status, income and verbal score would gamble, along with a 95% prediction interval. (1 pt)
e) Generate 95% prediction bands. (1 pt)
f) Fit a model with just the variables that are significant at the 0.05 significance level. What percentage of variation in the response is explained by this new model? Use an F-test to formally compare it to the full model. (2 points)
g) Fit simple linear regression models with the expenditure on gambling as the response, each using one of sex, status, income and verbal scores as the only predictor. Which predictor gives you the highest R^2? Compare the selected model with the full model (i.e., the model with all four predictors) via an F-test. What is your (statistical) conclusion? (2 points)
h) From the model you chose in part (g), record R^2. Then fit 3 more models by adding predictors back in, one at a time (i.e., you should have a model with 2 predictors, a model with 3 predictors, and the full model). Record R^2 for all of these models. Make a graph of R^2 vs. the number of (non-intercept) predictors in the model. Comment on the trend in this plot. (2 points)

Question 2 – A fun test question (9 points)

The following are outputs from R, and some have been removed on purpose. Answer the following questions based on the provided information:

a) What is R^2 for myfit? Show all your work/steps for full credit. (4 points)
b) What is the value of the test statistic for the following command? What is its distribution under H0? (3 points)
   > anova(newfit1, myfit)
c) What are the value and null distribution of the F statistic for the following test: the full model vs. Y ~ X1 + X2? In other words, what are the value and null distribution of F if I ran the following command? (2 points)
   > anova(myfit, newfit2)
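
For reference, the two quantities that both questions rely on can be written out as below. This is a sketch of the standard definitions, not part of the original handout. It assumes the usual linear model with normal errors and a reduced model nested in the full one. The symbols RSS, TSS and df are notation introduced here for illustration.

```latex
% Assumed notation (not from the assignment):
%   RSS = residual sum of squares, TSS = total sum of squares,
%   df  = residual degrees of freedom of the fitted model.
\[
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]

% F statistic for comparing a reduced model against the full model.
\[
F = \frac{\left(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}\right)
          / \left(df_{\text{reduced}} - df_{\text{full}}\right)}
         {\mathrm{RSS}_{\text{full}} / df_{\text{full}}}
\;\sim\;
F_{\,df_{\text{reduced}} - df_{\text{full}},\; df_{\text{full}}}
\quad \text{under } H_0
\]
```

Here H0 states that the coefficients dropped from the full model are all zero, so the numerator degrees of freedom equal the number of dropped predictors.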