- May 15, 2020

STAT 341 Assignment 1
Student Name and ID
Due Friday, September 27 at 9:00 am

Note
• Replace “Student Name and ID” with your name and Waterloo ID.
• Using RMarkdown or LaTeX is required; hand-written solutions and imported screenshots will not be accepted in the assignments. A mark of 0% will be assigned to questions that were not compiled in RMarkdown or LaTeX, and to those that include hand-written solutions or screenshots.
• Organization is part of a full solution. Full marks will be awarded to organized, complete solutions, and marks will be deducted for unorganized solutions.

Wayne Gretzky Goals
• Wayne Gretzky, “The Great One”, is a Canadian former professional ice hockey player. He played 20 seasons in the National Hockey League (NHL) and is considered to be the greatest hockey player ever. The dataset “GretzkyGoals.csv” contains all of Gretzky’s goals during his time in the NHL. Here, we will examine the times at which the goals occurred during a sixty-minute game.
• Note:
  – For each part below, any plots should be side-by-side in the same figure and they should all be properly labelled.

a) [5 Marks] Read in the data and convert the times into seconds. Remove the overtime goals, which are any goals that occur beyond the sixty-minute mark of regular play. Then calculate the average, median and range of the times at which Wayne scored goals during a game.
b) [5 Marks] Plot three histograms using the Sturges, Scott and Freedman-Diaconis rules for the number of bins, along with a boxplot. All four plots should be side-by-side in the same figure and they should all be properly labelled. From these three histograms and the boxplot, does Wayne tend to score at any particular time during the game?
c) [5 Marks] Construct two histograms, one using unequal bins (with the same number of bins used in part b)) and another that breaks the 60-minute game into two-minute intervals.
From these two histograms, does Wayne tend to score at any particular time during the game?
d) [3 Marks] Construct a quantile plot of the goal times. What feature does the quantile plot exhibit?
e) [5 Marks] Partition the goal times into empty-net goals and against-goalie goals. Construct a histogram using the same number of bins used in part b) and using varying bin widths for each group. Comment on the differences among the groups.

World Health Organization (WHO) on life expectancy
In this question you will be analyzing WHO data on life expectancy. The data is in the file “WHO_life.csv” posted on LEARN. Below is the powerfun from the course notes for your convenience.

    powerfun <- function(x, alpha) {
      # Power transformations are only defined here for positive x
      if (sum(x <= 0, na.rm = TRUE) > 0) stop("x must be positive")
      if (alpha == 0)
        log(x)
      else if (alpha > 0) {
        x^alpha
      } else -x^alpha
    }

• The variables are Country, Year,
  – LB.XXXX, the life expectancy at birth (years) for Males, Females & Both, and
  – L60.XXXX, the life expectancy at age 60 (years) for Males, Females & Both.

a) [3 Marks] What range of powers (the values of $\alpha$) makes the distribution of the life expectancy at birth (years) for males symmetric?
b) [3 Marks] What range of powers (the values of $\alpha$) makes the distribution of the life expectancy at age 60 (years) for males symmetric?
c) [3 Marks] Using $\alpha = 4$ as the power for x = LB.Male and $\alpha = 0$ as the power for y = L60.Male, plot the transformed variables. Between the transformed and original data, which one is better suited for linear modeling?

Investigating influence and sensitivity of the geometric mean
a) The geometric mean for the population $\mathcal{P} = \{y_1, \ldots, y_N\}$ is

   $$a(\mathcal{P}) = a(y_1, \ldots, y_N) = \left(\prod_{i=1}^{N} y_i\right)^{1/N}$$

   i) [3 Marks] Derive the sensitivity curve for the geometric mean and write it as a function of $y$ and $a(\mathcal{P})$.
   ii) [3 Marks] Write the influence of the geometric mean as a function of $y_u$ and $a(\mathcal{P})$.
b) The measures of sensitivity and influence do not have to depend on the difference between the attribute values.
Instead, we might define the sensitivity-ratio (SR) for non-negative attributes as

   $$SR(y; a(\mathcal{P})) = \left[\frac{a(y_1, \ldots, y_{N-1}, y)}{a(y_1, \ldots, y_{N-1})}\right]^{N}$$

   i) [3 Marks] Derive the SR for the geometric mean as a function of $y$ and $a(\mathcal{P})$.
   ii) [3 Marks] A measure of influence can be constructed with the ratio as well. Here we define the influence-ratio (IR) for non-negative attributes as

   $$IR(a, u) = \left[\frac{a(\mathcal{P})}{a(y_1, \ldots, y_{u-1}, y_{u+1}, \ldots, y_N)}\right]^{N}$$

      • Derive the influence-ratio for the geometric mean as a function of $y_u$ and $a(\mathcal{P})$.
c) The population provided in returns2.txt is the monthly returns of an investment over a period of 20 years.
   i) [2 Marks] Plot the sensitivity curve (SC) of the geometric mean for this population over the ranges [0.01, 2] & [0.0001, 100]. No comments required.
   ii) [2 Marks] Write a function similar to sc called sr such that the
      • inputs are a population y.pop, a sequence or vector of y values y, and an attribute function attr, and the
      • output is the sensitivity-ratio for each y value.
   iii) [3 Marks] Plot the sensitivity-ratio (SR) of the geometric mean for this population over the ranges [0.01, 2] & [0.0001, 100]. Comment on the plots.
d) [4 Marks] Using the same population, plot the influence values from a) ii) and b) ii) for the geometric mean and a histogram of the data, and comment on the plots. Use the Freedman–Diaconis rule for the number of bins. Comment on the influential observations based on each measure.
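
Part c) ii) refers to a function sc that is not reproduced in this handout. The following is a minimal sketch of what such a helper might look like, assuming the common definition $SC(y) = N\,[a(y_1, \ldots, y_{N-1}, y) - a(y_1, \ldots, y_{N-1})]$ and treating y.pop as the $N-1$ base values. The version in the course notes may differ in details, and the helper geomean and the example values are illustrative assumptions, not part of the assignment data.

```r
# Hypothetical sketch of a sensitivity-curve helper; the course-notes `sc` may differ.
# y.pop : base population, treated here as y_1, ..., y_{N-1} (an assumption)
# y     : vector of candidate values to add to the population
# attr  : attribute function applied to a numeric vector, e.g. mean
sc <- function(y.pop, y, attr, ...) {
  N <- length(y.pop) + 1
  base <- attr(y.pop, ...)
  # Scaled change in the attribute when each candidate value is appended
  sapply(y, function(y.new) N * (attr(c(y.pop, y.new), ...) - base))
}

# Geometric mean computed on the log scale (requires positive values)
geomean <- function(y) exp(mean(log(y)))

# Usage on made-up values (not the returns2.txt data)
pop <- c(1.02, 0.98, 1.05, 1.01)
y.seq <- seq(0.01, 2, length.out = 5)
sc(pop, y.seq, geomean)
```

Passing the attribute as a function argument keeps the helper generic, so the same call pattern works for the arithmetic mean, the median, or the geometric mean.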