Tutoring Case: STAT 511

STAT 511 Exam2 – Spring 2020

Instructions (Please take a moment to read):

1. Students are expected to work independently on the exam. Do NOT discuss the exam with anyone else. Do NOT post questions or comments about the exam to Canvas. Do NOT share R code, notes, or email regarding the Final Exam. Please consider fairness to your classmates as a guide of conduct when working on the exam.
2. Consider using the R Markdown Template in Canvas; it is NOT required. However, please be organized and concise, which will count for 6 points of the Exam (no need to spend too much time on the “perfect document”; most submissions will receive 6/6). The template may also be a useful guide for organization even if you copy/paste into a Word document or knit to Word.
3. When including figures and tables, please make them clear and concise. No need to go overboard on detail, but correct formatting and essential labeling should be included.
4. Please make an effort to provide clear, concise, and coherent grammar for written responses.
5. For any “hand” calculation questions, show your work in order to be eligible for partial credit. As a general rule, round answers to 4 decimal places.
6. Use α = 0.05 for all questions (unless specified otherwise).
7. You may use any software, reference, or on-line resource that you find helpful.
8. If you have a specific question regarding the content of the exam, such as the interpretation of a question, the requirements for a response to a question, software issues with RStudio, or an R function that continues to give errors, please send an email directly to me ([email protected]). I will try to respond in a reasonable time frame. Also, please make sure your Canvas settings allow notifications when there are announcements on Canvas, in case I need to clarify something on the Final. I will likely not be responding to any email after the due date (for a while anyway).
9. The Exam must be submitted to Canvas in pdf format by 11:59 pm Wednesday 4/14/2020 using the

Please include your name on your submitted document as “signature” for the Honor Pledge below.

Honor Pledge: I have not given, received, or used any unauthorized assistance on this exam.

Exam Parts:
Multiple Choice (32 pts): True/False are 2 points, remaining 3 pts each as before.
Matching Question (14 pts)
Chapter 6 Problem: Sleep Data (12 pts)
Chapter 10 Problem: Lefties (12 pts)
Chapter 8/9 Problem: Cuckoo Bird Eggs (24 pts)
Organization/Clarity (6 pts)

1. Multiple Choice (32 pts)

For each numbered problem in this section, note the best answer choice in your R Markdown or submitted document.

Questions 1 through 7 (True or False): For each question, just note in your submitted document the question number and True or False. No need to justify. (Each True/False question is 2 pts; the remaining multiple choice questions are 3 pts.)

1. Managing experiment-wise error rate is especially important when comparing means associated with a very large number of treatment levels.
2. The LSD (unadjusted) pairwise comparison method helps control the experiment-wise error rate.
3. The HSD (Tukey) pairwise comparison method has lower power than the LSD (unadjusted) method.
4. For many cases, multiple comparison tests using Bonferroni’s adjustment can be considered too conservative.
5. Dunnett’s method can be used to test all pairwise comparisons from a one-way ANOVA.
6. For a one-way ANOVA, subsequent multiple comparison adjustment methods are only viable when the response variable is normally distributed across all treatment groups.
7. A survey yielded an estimated proportion of 0.11 based on a sample of size n = 55. The large sample normal approximation is adequate for this scenario. Use the criteria based on 3×SE.

8. In R, the function lm() performs which of the following?
(A) An ANOVA of specified data
(B) A linear model of specified response and predictor variables
(C) A list of means for a specified response variable
(D) A likelihood maximized estimate for a specified response variable

9. The multiple testing problem is best described by which of the following:
(A) Testing a hypothesized mean before a treatment, and then testing the mean again after a treatment.
(B) Having a large number of potential Type II errors when comparing many pairs of means between treatment groups.
(C) Having a large number of potential Type I errors when comparing many pairs of means between treatment groups.
(D) When performing an ANOVA, the degrees of freedom for the residuals (within) is considered to be too large.

10. As a variable, the number of CSU graduate students who voted in the 2020 primary election is best described by which of the following?
(A) Qualitative and Discrete
(B) Quantitative and Discrete
(C) Qualitative and Continuous
(D) Quantitative and Continuous

Suppose you collect data from four different populations and have the following summary statistics. Use the table below to answer questions 11 – 12.

           N    Mean     SD      SE
Group A    45   76.54   19.45    2.90
Group B    44   78.45   32.01    4.83
Group C    43   79.65   57.21    8.72
Group D    42   81.32   84.43   13.03

11. If you performed an ANOVA using the data that generated the summary statistics above, which of the following outcomes would you expect?
(A) A small F statistic and a small p-value
(B) A small F statistic and a large p-value
(C) A large F statistic and a small p-value
(D) A large F statistic and a large p-value

12. If you performed diagnostics for a fitted model in order to do an ANOVA using the data that generated the summary statistics above, which of the following would you expect?
(A) There would be no need to perform diagnostics since the ANOVA assumptions are violated with unequal sample sizes.
(B) The p-value from a Levene’s test is likely to be relatively small.
(C) There are certainly going to be problems with the data when plotted on a qqplot.
(D) The means are likely to be significantly different.

13. A pharmaceutical company’s allergy medication is known to provide relief to 75% of the people who use it. The company wants to see if a new, improved version of the medication works even better. In a test of the hypotheses H0: π = .75 versus HA: π > .75, the p-value is .32. Which of the following gives the best interpretation of this p-value?
(A) There is a 32% chance that the new medication is more effective than the old medication.
(B) There is a 32% chance that the new medication and old medication are equally effective.
(C) If the new medication is more effective than the old medication (if HA is true), there is a 32% chance of obtaining the observed sample proportion or something greater due to natural sampling variation.
(D) If the new medication and old medication are equally effective (if H0 is true), there is a 32% chance of obtaining the observed sample proportion or something greater due to natural sampling variation.

Matching Problem (14 pts)

Below are statistical tests covered in Chapter 6 up to Chapter 10. Match each named test to the most appropriate scenario below. Assume all data are collected through random sampling methods. Please list the scenarios in your submitted work and match the corresponding letter. Each letter is used once. No explanation is necessary.

A. Levene’s Test
B. Welch-Satterthwaite Test
C. Chi-square (χ²) Test
D. ANOVA
E. Paired T-test
F. Kruskal-Wallis Test
G. Tukey’s Method

Scenario 1: A physical assessment called VO2-max measures fitness levels by determining the volume of oxygen a person can use in respiration during physical activity. A researcher wants to see if VO2-max on average is different for those who live at high altitude versus those who live at lower elevations.
A random sample of 45 active people between the ages of 30 and 40 is selected from residents in the mountains of Colorado (above 9,000 ft). A random sample of 53 similarly active people is selected from those who live in coastal areas of California. On inspection of the boxplot, it appears that the variances of VO2-max from each group are quite different.

Scenario 2: Using the same data as in Scenario 1, the researcher would like to use a test and p-value to strengthen the evidence that the variances of the two samples really are different.

Scenario 3: A researcher wishes to compare means for 6 groups for which the standard deviations within each group appear very similar. She would like all pairwise comparisons to be based on honestly significant differences. She finds that the fitted model of response versus group levels has residuals which are distributed approximately Normal(0, σ).

Scenario 4: A researcher would like to determine if 3 treatments of sample size 9 have the same central value. The fitted model of response versus group levels has residuals that do not appear to be normally distributed. But it does appear that the variances for each group are very similar.

Scenario 5: A researcher would like to determine if caffeine helps sprinters run faster times. Twelve runners are selected to run one lap as fast as they can, and their time is recorded. Each runner then drinks 3 double espressos. Thirty minutes after drinking the coffee, each runner runs one lap once again as fast as they can, and their time is recorded. Assume the differences between lap times for each runner before and after being caffeinated are normally distributed.

Scenario 6: A researcher would like to determine if 3 treatments of sample size 13 have the same central value. The fitted model of response versus group levels has residuals that appear to be normally distributed, and the variances for each group appear to be very similar.

Scenario 7: For quality control, a machine must manufacture a drug within a range for a certain amount of active ingredient. A random sample of 50 tablets is measured to see if the standard deviation of the amount of the active ingredient is below a certain value.

There is an R Markdown Template available for these questions.
Sleep Data in R (12 pts, 3 pts/part)

1. There are many “built-in” datasets available in Base R. You may have found some that are used when searching for help on the internet (the iris dataset, for example, is popular with graphing). Other packages that you install into R likely have their own example data as well. For this question we use the sleep data. It is important to first read the help for any dataset you use! This help is rather limited, but does offer a bit of background as well as references. To analyze, there is no data to load. For example, view the structure of these data with just: str(sleep). After checking out this very old data set about a sleep medication, submit answers to the following questions while including appropriate R code and output. But please make sure the answer is clear in some narrative form as well (not just a list of output).
(a) Define the parameter(s) of interest. Use appropriate symbol(s) (or at least names of Greek letters, no mark-up required) to write the hypotheses that were used in this study. (Read the help carefully.)
(b) Provide a boxplot by group. Also, briefly explain (1 or 2 sentences) why this boxplot on its own might be misleading to a reader. (This question is meant to help you think about the hypotheses above. Studies of this type sometimes still use boxplots, but clarify what they show.)
(c) Provide output for the appropriate test. Also, please note the value of the test statistic and the p-value for the test you defined in part (a).
(d) State the conclusion of the test in terms of the context of the study.

Left-Handed, no data to load here either (12 pts, 3 pts/part)

2. Before the 1980s, school children were encouraged (and sometimes even forced) to write with their right hand as opposed to their left. As a result, only about 8% of Americans in the 1980s claimed to be left-handed. Over time, the stigma associated with being left-handed and social pressure against it have relaxed.
To investigate whether the proportion of the population that is left-handed has increased since the 1980s, a psychologist surveys a random sample of 150 Americans, of whom 18 claim to be left-handed. At the 5% significance level, is this evidence that the proportion of Americans that are left-handed is higher today than in the 1980s?
(a) Define the parameter of interest and state the null and alternative hypotheses.
(b) Find an appropriate test statistic.
(c) Determine an appropriate p-value.
(d) Write a conclusion for this hypothesis test in the context of the study.

Egg sizes of Cuckoo birds (total pts = 24)

3. The European common cuckoo bird is known for laying eggs in nests of other bird species (in terms of data, these are HostSpecies, though not all are actual bird species descriptions). Nest categories include meadowlarks, trees, hedges, robins, wagtails, and wrens. Researchers measured the size of the cuckoo eggs (in mm) relative to the different types of nests in which cuckoos lay eggs. They hope to see if there are possible differences in size according to HostSpecies. Suppose that measurements taken in this study represent random and independent samples.

This dataset is provided in the Exam2 page. Note that these data need to be put in long form. The sample sizes are not equal, so you’ll need to deal with missing data when loading. Help for this is provided in the template.
(a) Provide an appropriate summary plot.
(b) Provide appropriate summary statistics.
(c) No need to include them, but check diagnostics for the assumptions of this type of analysis. In a sentence or three, note any concerns that might be troubling and support your answers.
(d) Regardless of model assumption concerns, provide an ANOVA table.
(e) State a conclusion to the F-test in the context of the study.
(f) Provide some type of compact display comparing the different host species and, in a sentence or two, summarize the findings.
(g) Suppose that cuckoo clutch size (number of eggs laid) in nests of wrens and meadowlarks are similar, and cuckoo clutch size in nests of robins and wagtails are similar. Researchers are interested in seeing if the size of the host bird is also related to the average size of cuckoo eggs. They wish to compare egg size differences between (wren vs meadowlarks) vs (robins vs wagtails), which can also be thought of as an interaction contrast. In other words, wrens are smaller than meadowlarks, but robins and wagtails are very similar in size. A significant difference in these comparisons would suggest bird size may be related to cuckoo egg size while controlling for clutch size. (6 pts, 3 each for the following)
(i) Consider the appropriate contrast coefficients to write the appropriate null hypothesis using the following parameters (μ_wren, μ_meadow, μ_robin, μ_wagtail).
(ii) Provide an estimate for this contrast, and provide a p-value for this contrast.
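
The cuckoo problem notes that the data need to be put in long form and that unequal sample sizes leave missing values. The following is only a minimal sketch of one way to do that in base R, not the approach used in the exam template. The data frame `wide`, its column names (NestA, NestB, NestC), and all values are hypothetical and made up for illustration; they are not the exam data.

```r
# Hypothetical wide-format data: one column per nest type, padded with NA
# because the groups have different sample sizes. All values are made up.
wide <- data.frame(
  NestA = c(22.1, 23.4, 21.8, 22.9),
  NestB = c(23.0, 24.1, 22.7, NA),
  NestC = c(21.0, 20.6, NA, NA)
)

# stack() converts the columns into a two-column long data frame:
# 'values' (the measurements) and 'ind' (a factor of the column names).
long <- stack(wide)
names(long) <- c("Length", "HostSpecies")

# Drop the NA padding rows so each group keeps only its real observations.
long <- na.omit(long)

# Inspect the result: one row per egg, with a factor identifying the group.
str(long)
table(long$HostSpecies)
```

Once the data are in this shape, the grouping column can be used directly in a model formula such as Length ~ HostSpecies.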
