Tutoring Case: 2PM

May 15, 2020

Stat 481 Project

Due Time: 2PM on Friday, April 3.

Instructions:
• Project must be typed for credit. Write your final answers as COMPLETE SENTENCES. Projects submitted using R-markdown will not receive full credit.
• Show all work. Attach your code at the end of the project. You may use R or SAS, or another statistical software.
• Do NOT just turn in a set of code plus a sheet with your answers. Write this as a report that you would give to a person who knows nothing about statistics.
• No late projects are accepted.

Dataset Location: Use the dataset provided to you on Blackboard. It will be a .csv or .xlsx file with a header in the dataset.

Problem: This data file contains nutritional information and grocery shelf location for 77 breakfast cereals. Current research states that adults should consume no more than 30% of their calories in the form of fat, that they need about 50 grams (women) or 63 grams (men) of protein daily, and that they should provide for the remainder of their caloric intake with complex carbohydrates. One gram of fat contains 9 calories, while carbohydrates and proteins each contain 4 calories per gram. A "good" diet should also contain 20-35 grams of dietary fiber. A variable named "rating" was calculated by Consumer Reports. Our data consists of the following information:

Variable Name    Description
Name             Name of cereal
type             cold or hot
calories         calories per serving
protein          grams of protein
fat              grams of fat
sodium           milligrams of sodium
fiber            grams of dietary fiber
carbo            grams of complex carbohydrates
sugars           grams of sugars
potass           milligrams of potassium
vitamins         vitamins and minerals: 0, 25, or 100, indicating the typical percentage of FDA recommended
shelf            display shelf (1, 2, or 3, counting from the floor)
weight           weight in ounces of one serving
cups             number of cups in one serving
rating           a rating of the cereals

Note: A value of −1 for nutrients indicates a missing observation. Total number of cases: 77.
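
The calorie arithmetic in the problem statement (9 calories per gram of fat, checked against the 30% guideline) and the −1 missing-value code can be illustrated with a minimal Python sketch. The function names and the example serving below are hypothetical and are not taken from the dataset; the handout itself suggests R or SAS, and Python is used here only as "another statistical software".

```python
# Calorie factor stated in the problem: 9 calories per gram of fat.
FAT_KCAL_PER_G = 9
MISSING = -1  # the dataset codes a missing nutrient value as -1


def clean(value):
    """Return None for the -1 missing-value code, else the value unchanged."""
    return None if value == MISSING else value


def fat_calorie_share(calories, fat_grams):
    """Fraction of a serving's calories that come from fat, or None if unknown."""
    calories, fat_grams = clean(calories), clean(fat_grams)
    if calories is None or fat_grams is None or calories <= 0:
        return None
    return FAT_KCAL_PER_G * fat_grams / calories


# Hypothetical serving: 120 calories and 2 g of fat, so 18 of 120 calories are fat.
share = fat_calorie_share(120, 2)
print(f"{share:.0%} of calories from fat")
```

A share above 0.30 would exceed the guideline quoted in the problem. A serving with a −1 entry returns None, so it can be skipped rather than treated as a real measurement.
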
Question: The experiment is to develop one's own rating system and find which cereal is the most healthy for you. Try to predict the cereal rating based on the nutrition facts and other observable characteristics. What variables best predict the rating? Can you quantify how good your predictive model is?

Important Items:
1. Goal: Construct a regression model.
2. Provide descriptive statistics such as sample size, minimum value, median, mean, variance / standard deviation, maximum value for quantitative values. For indicator variables, provide information as to the number of 0's, the number of 1's, and the sample size.
3. Ignore the missing values. Do not use the Name column in your analysis.
4. Check for multicollinearity. Exclude any variables with VIF > 10. Be sure to report if any variable(s) needed to be removed or if there were no issues present. This test only needs to be done once at the beginning of the analysis.
5. Check the model assumptions (linearity, independence, normality of residuals, equal variance of residuals). If you do not need to check a model assumption, explain why.
   • Provide any applicable plots or tests and interpret them.
   • For normality testing, use a 0.05 significance level.
   • If any of the model assumptions are not met, suggest ways to "fix" your data and then proceed to adjust it. You may round λ to one of the following values: −2, −1, −0.5, 0, 0.5, 1, 2, as suggested by the BoxCox transformation, even if software does not specifically suggest a convenient lambda value.
   • Be sure to re-check all the model assumptions after any transformations and address each of the assumptions in your report.
   • Note: To simplify things, if you need to do a transformation, do one transformation, and then even if the model assumptions are not met, proceed with analysis. Make some comments as to the fit of the model, but then continue with the process.
6. Build the "best" model possible by using either backward selection or forward selection (pick one) with the criteria for inclusion as having a significance of 0.10 or lower.
7. Draw conclusions/interpret your regression model. Include a statement about R² before and after creating the "best" model possible. Include statements about each variable kept in the final model. Make sure these conclusions can be understood by a customer.

Grading: You will be graded on the following items:
• Code Provided
• Written using Sentences
• Data Summary
• Initial Regression Analysis and Assumptions Check
• Transformation of Variables (if needed, justification required)
• Second Regression Analysis and Assumptions Check (if needed)
• Building the "best" model possible. Includes summarizing which variables are kept and which are removed. Good to mention if the assumptions are met or not (specific plots / assumptions checks not required here).
• Drawing Conclusions based on the "best" model possible. Includes final model statement, interpretation of parameter values, how R² changes before and after creating the "best" model.

Some useful SAS procedures:
• DATA or PROC IMPORT
• PROC REG
  SAS PROC REG – model options: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect013.htm
• PROC CORR
• PROC UNIVARIATE
• PROC TRANSREG
• SAS Procedure Help: https://documentation.sas.com/?docsetId=proc&docsetTarget=titlepage.htm&docsetVersion=9.4&locale=en

Some useful R functions:
• read.table() or read.csv() to import data
• lm() and plot(lm()) to fit linear regression model
• cor() for correlation coefficient
• boxcox() in MASS package
• vif() in car package
• Chapter 11 (R Introduction): https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf
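
As an illustration of the descriptive statistics requested in item 2, combined with the instruction in item 3 to ignore missing values, here is a minimal Python sketch using only the standard library. The `describe` helper and the sugar values are hypothetical and are not taken from the dataset. In R or SAS the same summary would come from the tools listed above.

```python
import statistics

MISSING = -1  # missing-value code used in the dataset


def describe(values):
    """Summarize one quantitative column, ignoring the -1 missing code."""
    kept = [v for v in values if v != MISSING]
    return {
        "n": len(kept),
        "min": min(kept),
        "median": statistics.median(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),  # sample standard deviation (divides by n - 1)
        "max": max(kept),
    }


# Hypothetical sugar values in grams; the -1 entry stands for a missing observation.
sugars = [6, 8, 5, 0, -1, 14, 10]
summary = describe(sugars)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Dropping the −1 entries before summarizing matters: left in, they would lower the minimum and the mean and distort the standard deviation.
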


Author admin
