
Tutoring Case: ID 00395

May 15, 2020

Paper ID 00395

FAMILY NAME:
OTHER NAME(S):
STUDENT ID:
SIGNATURE:

SCHOOL OF RISK AND ACTUARIAL STUDIES
TERM 1 2019 FINAL EXAM
ACTL 2131: Probability and Mathematical Statistics

INSTRUCTIONS:
1. TIME ALLOWED: 2 HOURS
2. READING TIME: 10 MINUTES
3. THIS EXAMINATION PAPER HAS 33 PAGES.
4. TOTAL NUMBER OF QUESTIONS: 7
5. TOTAL MARKS AVAILABLE: 100
6. MARKS AVAILABLE FOR EACH QUESTION ARE SHOWN IN THE EXAMINATION PAPER (AND OVERLEAF). NOT ALL QUESTIONS ARE OF EQUAL VALUE.
7. ANSWER ALL QUESTIONS IN THE SPACE ALLOCATED TO THEM. IF MORE SPACE IS REQUIRED, USE THE ADDITIONAL PAGES AT THE END.
8. CANDIDATES MAY BRING
   a. THE TEXT FORMULÆ AND TABLES FOR ACTUARIAL EXAMINATIONS (ANY EDITION) INTO THE EXAMINATION. IT MUST BE WHOLLY UNANNOTATED.
   b. THEIR OWN UNSW APPROVED CALCULATOR.
9. ALL ANSWERS MUST BE WRITTEN IN INK. EXCEPT WHERE THEY ARE EXPRESSLY REQUIRED, PENCILS MAY BE USED ONLY FOR DRAWING, SKETCHING OR GRAPHICAL WORK.
10. THIS PAPER MAY NOT BE RETAINED BY THE CANDIDATE.

Question   Total available marks for the question   Total marks attained for the question
1          [8 marks]
2          [10 marks]
3          [15 marks]
4          [25 marks]
5          [10 marks]
6          [17 marks]
7          [15 marks]
Total      [100 marks]

Question 1 [8 marks]

Suppose that random variables X and Y have joint probability density function given by

$$f_{XY}(x,y) = \begin{cases} Cxy, & 0 \le x \le a \text{ and } 0 \le y \le b,\\ 0, & \text{otherwise.} \end{cases}$$

(a) [2 marks] Show that C has to be equal to $\frac{4}{a^2b^2}$ for $f_{XY}(x,y)$ to be a valid pdf.

Solution: We use the property that $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = 1$. (1 point)

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{XY}(x,y)\,dx\,dy = \int_0^b\int_0^a Cxy\,dx\,dy = C\cdot\frac{a^2b^2}{4} = 1. \qquad \text{(1 point)}$$

Therefore, $C = \frac{4}{a^2b^2}$.

(b) [2 marks] Determine the marginal density function of the random variable Y, i.e., $f_Y(y)$.

Solution:

$$f_Y(y) = \begin{cases} \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx = \int_0^a \frac{4}{a^2b^2}\,xy\,dx = \frac{2}{b^2}\,y, & 0 \le y \le b, \qquad \text{(1.5 points)}\\ 0, & \text{otherwise.} \qquad \text{(0.5 points)} \end{cases}$$

(c) [4 marks] Determine E[XY] and express it in terms of a and b.
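As a numerical sanity check on parts (a) and (b), and on the expectation asked for in (c), the sketch below integrates the density with a plain midpoint rule. The values a = 2 and b = 3 and the grid size are arbitrary assumptions chosen for illustration; they are not part of the exam. The result for E[XY] can be compared with the closed-form answer derived in the solution to (c).

```python
# Numerical sanity check for Question 1, using illustrative (assumed) values of a and b.

def midpoint_double_integral(g, a, b, n=400):
    """Approximate the integral of g(x, y) over [0, a] x [0, b] with the midpoint rule."""
    hx, hy = a / n, b / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

a, b = 2.0, 3.0                # assumed example values, not taken from the exam
C = 4.0 / (a**2 * b**2)        # normalising constant derived in part (a)

def f_xy(x, y):
    """Joint density C*x*y on the rectangle [0, a] x [0, b]."""
    return C * x * y

total_mass = midpoint_double_integral(f_xy, a, b)                       # should be ~1
e_xy = midpoint_double_integral(lambda x, y: x * y * f_xy(x, y), a, b)  # E[XY]

def f_y(y, n=400):
    """Marginal density of Y obtained by integrating x out with the midpoint rule."""
    hx = a / n
    return sum(f_xy((i + 0.5) * hx, y) for i in range(n)) * hx

print(total_mass, e_xy, 4 * a * b / 9)
print(f_y(1.5), 2 * 1.5 / b**2)
```

The midpoint rule is exact for integrands that are linear in each variable, so the total mass and the marginal match to rounding error, while E[XY] (quadratic in each variable) agrees to several decimal places.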
Solution:

$$E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,f_{XY}(x,y)\,dx\,dy \qquad \text{(1 point)}$$
$$= \int_0^b\int_0^a \frac{4}{a^2b^2}\,x^2y^2\,dx\,dy \qquad \text{(1 point)}$$
$$= \frac{4}{a^2b^2}\left[\frac{x^3}{3}\right]_0^a\left[\frac{y^3}{3}\right]_0^b \qquad \text{(1 point)}$$
$$= \frac{4}{9}\,ab. \qquad \text{(1 point)}$$

Question 2 [10 marks]

You are provided with a sample of 20 observations. You have fitted a log-normal distribution to the sample, and the Maximum Likelihood estimates are $\hat\mu = 2$ and $\hat\sigma^2 = 1$. Your manager asks you to perform a $\chi^2$ goodness-of-fit test on the estimated log-normal distribution and has kindly provided you with the following table:

cell    observed  observed²  expected  expected²  (observed-expected)  (observed-expected)²
0-1     0         0          0.46      0.21       -0.46                0.21
1-2     2         4          1.46      2.12        0.54                0.29
2-3     1         1          1.76      3.10       -0.76                0.58
3-4     2         4          1.72      2.96        0.28                0.08
4-5     1         1          1.57      2.46       -0.57                0.32
5-6     3         9          1.39      1.93        1.61                2.59
6-7     2         4          1.22      1.48        0.78                0.61
7-8     0         0          1.06      1.13       -1.06                1.13
8-9     2         4          0.93      0.87        1.07                1.14
9-10    1         1          0.81      0.66        0.18                0.03
10-11   1         1          0.71      0.51        0.29                0.08
11-12   1         1          0.63      0.40        0.37                0.14
12-13   0         0          0.56      0.31       -0.56                0.31
13-14   1         1          0.49      0.24        0.51                0.26
14-15   1         1          0.44      0.19        0.56                0.32
>15     2         4          4.79      22.93      -2.79                7.78
sum     20        36         20        186.0103    0                   22.1503

(a) [4 marks] Explain to your manager why the above table is not appropriate for performing a $\chi^2$ test. Propose changes to the table to make it suitable for a $\chi^2$ test.

Solution: Cells should be chosen such that the expected number of observations in each cell i satisfies $E_i \ge 5$. (2 points)

The degrees of freedom, c − k − 1 (with c cells and k = 2 estimated parameters), must be positive. Thus, the minimum number of cells we could have is 1 + 2 + 1 = 4. (2 points)

Your manager has provided a new but incomplete table (blank entries are to be completed):

cell    observed  expected  (observed-expected)  (observed-expected)²
0-4                         -0.39
4-8                                              0.58
____                         2.42                5.87
>15                         -2.79                7.78

(b) [6 marks] Complete the table and test whether the log-normal distribution is a valid distribution for this dataset. Use a level of significance of α = 5% for your test.
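The computation asked for in (b) can be reproduced with a short script: the expected counts come from the fitted log-normal CDF, $\Phi((\log x - \hat\mu)/\hat\sigma)$, evaluated at the cell boundaries 0, 4, 8 and 15 used in the solution. This sketch uses only Python's standard library (`math.erf` for the Normal CDF); the critical value 3.84 is simply the $\chi^2_{0.95}(1)$ quantile quoted in the solution rather than something computed here.

```python
import math

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal(mu, sigma^2) variable; it is 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return normal_cdf((math.log(x) - mu) / sigma)

n, mu_hat, sigma_hat = 20, 2.0, 1.0
# Merged cells [0,4], (4,8], (8,15], (15, inf) and their observed counts from the table.
edges = [0.0, 4.0, 8.0, 15.0, math.inf]
observed = [5, 6, 7, 2]

expected = []
for lo, hi in zip(edges[:-1], edges[1:]):
    upper = 1.0 if math.isinf(hi) else lognormal_cdf(hi, mu_hat, sigma_hat)
    expected.append(n * (upper - lognormal_cdf(lo, mu_hat, sigma_hat)))

T = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 2 - 1          # c - k - 1 with k = 2 estimated parameters
critical_value = 3.84               # chi^2_{0.95}(1), as quoted in the solution
print([round(e, 2) for e in expected], round(T, 2), df, T <= critical_value)
```

Computing the expected counts directly from the CDF, rather than summing the rounded per-cell values of the original table, avoids small rounding discrepancies.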
Solution: We propose to use the following cells: (3 points)

cell    observed  expected  (observed-expected)  (observed-expected)²  (observed-expected)²/expected
0-4     5         5.39      -0.39                0.16                  0.03
4-8     6         5.24       0.76                0.58                  0.11
8-15    7         4.58       2.42                5.87                  1.28
>15     2         4.79      -2.79                7.78                  1.62
sum     20        20         0                   14.38                 3.05

Under H0 we have that

$$T = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \sim \chi^2_{c-k-1} = \chi^2_{4-2-1} = \chi^2_1. \qquad \text{(2 points)}$$

Hence $T = 3.05 \le \chi^2_{0.95}(1) = 3.84$, so we cannot reject the null hypothesis that the log-normal distribution is the correct distribution. (1 point)

Question 3 [15 marks]

Consider an independent random sample $\{X_1, \ldots, X_{200}\}$ with common density function given by

$$f(x;\alpha) = \alpha(1+x)^{-\alpha-1}, \quad \text{for } \alpha > 0 \text{ and } x \ge 0.$$

From this sample, the following information has been collected:

$$\sum_{i=1}^{200} x_i = 159.711; \quad \sum_{i=1}^{200} x_i^2 = 474.998; \quad \sum_{i=1}^{200}\log(x_i) = -201.257; \quad \sum_{i=1}^{200}\log(1+x_i) = 99.266.$$

(a) [6 marks] Show that the log-likelihood is given by

$$\log L(\alpha) = n\log(\alpha) - (\alpha+1)\sum_{i=1}^n\log(1+X_i),$$

and prove that the Maximum Likelihood estimator of the parameter α is given by

$$\hat\alpha_{ML} = \frac{n}{\sum_{i=1}^n\log(1+X_i)}.$$

Solution: We find the log-likelihood as follows:

$$\log L(\alpha) = \sum_{i=1}^n\log\left(\alpha(1+X_i)^{-\alpha-1}\right) = n\log(\alpha) - (\alpha+1)\sum_{i=1}^n\log(1+X_i). \qquad \text{(1 point)}$$

We have to solve the following optimization problem:

$$\max_{\alpha}\ \log L(\alpha). \qquad \text{(1 point)}$$

We find the solution by differentiating the log-likelihood with respect to α and setting the first-order condition:

$$\frac{\partial\log L(\alpha)}{\partial\alpha} = \frac{n}{\alpha} - \sum_{i=1}^n\log(1+X_i) = 0. \qquad \text{(2 points)}$$

Thus,

$$\hat\alpha_{ML} = \frac{n}{\sum_{i=1}^n\log(1+X_i)}. \qquad \text{(1 point)}$$

We should also check the second-order condition:

$$\frac{\partial^2\log L(\alpha)}{\partial\alpha^2} = -\frac{n}{\alpha^2} < 0 \quad \text{for } \alpha > 0. \qquad \text{(1 point)}$$

Since the log-likelihood is strictly concave on α > 0, its unique stationary point $\hat\alpha_{ML}$ is the global maximum of the log-likelihood function.

(b) [5 marks] To test the hypothesis $H_0: \alpha = \alpha_0$ vs. $H_1: \alpha \ne \alpha_0$, we define the log-likelihood ratio statistic to be

$$\log(\Lambda) = \log L(\alpha_0) - \log L(\hat\alpha_{ML}),$$

where $\hat\alpha_{ML}$ is the value of the Maximum Likelihood estimator of α. Show that

$$\log(\Lambda) = n\log\left(\frac{\alpha_0\sum_{i=1}^n\log(1+X_i)}{n}\right) + n - \alpha_0\sum_{i=1}^n\log(1+X_i).$$
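To see the estimator from (a) and the statistic defined in (b) in numbers, here is a short sketch that works only with the summary statistic $\sum\log(1+x_i) = 99.266$ given in the question (n = 200). It also evaluates $-2\log(\Lambda)$ for $\alpha_0 = 2$, the case tested in (c), and uses the identity $P(\chi^2_1 > c) = P(|Z| > \sqrt{c}) = \operatorname{erfc}(\sqrt{c/2})$ so that no statistics library is needed.

```python
import math

n = 200
sum_log1p = 99.266                  # sum of log(1 + x_i), from the question

def log_likelihood(alpha):
    """Log-likelihood from part (a), written in terms of the summary statistic."""
    return n * math.log(alpha) - (alpha + 1) * sum_log1p

alpha_ml = n / sum_log1p            # closed-form MLE from part (a)

def lr_statistic(alpha0):
    """-2 log(Lambda) for H0: alpha = alpha0, using the expression shown in part (b)."""
    log_lambda = n * math.log(alpha0 * sum_log1p / n) + n - alpha0 * sum_log1p
    return -2.0 * log_lambda

def chi2_1_survival(c):
    """P(chi^2_1 > c), since chi^2_1 = Z^2: P(|Z| > sqrt(c)) = erfc(sqrt(c / 2))."""
    return math.erfc(math.sqrt(c / 2.0))

stat = lr_statistic(2.0)
p_value = chi2_1_survival(stat)
print(round(alpha_ml, 4), round(stat, 4), round(p_value, 3))
```

Evaluating the closed-form expression from (b) and the difference of log-likelihoods directly gives the same number, which is a convenient check on the algebra.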
Solution: We have

$$\log L(\alpha) = n\log(\alpha) - (\alpha+1)\sum_{i=1}^n\log(1+X_i).$$

Using the result in part (a), log(Λ) is

$$\log(\Lambda) = n\log(\alpha_0) - (\alpha_0+1)\sum_{i=1}^n\log(1+X_i) - n\log(\hat\alpha) + (\hat\alpha+1)\sum_{i=1}^n\log(1+X_i)$$
$$= n\log\left(\frac{\alpha_0}{n/\sum_{i=1}^n\log(1+X_i)}\right) + \left(\frac{n}{\sum_{i=1}^n\log(1+X_i)} - \alpha_0\right)\sum_{i=1}^n\log(1+X_i)$$
$$= n\log\left(\frac{\alpha_0\sum_{i=1}^n\log(1+X_i)}{n}\right) + n - \alpha_0\sum_{i=1}^n\log(1+X_i).$$

(c) [4 marks] Using the result in (b), perform the likelihood ratio test of $H_0: \alpha = 2$ vs. $H_1: \alpha \ne 2$ at a 5% level of significance. Clearly state your conclusion. [Hint: you are reminded that $-2\log(\Lambda)$ is asymptotically $\chi^2_1$, where Λ is the likelihood ratio.]

Solution: Using the result in part (b), the value of $-2\log(\Lambda)$ is

$$-2\log(\Lambda) = -2\left(n\log\left(\frac{\alpha_0\sum_{i=1}^n\log(1+X_i)}{n}\right) + n - \alpha_0\sum_{i=1}^n\log(1+X_i)\right)$$
$$= -2\left(200\cdot\log\left(\frac{2\cdot 99.266}{200}\right) + 200 - 2\cdot 99.266\right) = 0.0108.$$

Then

$$\text{p-value} = P_{\alpha_0}(\Lambda < \lambda) = P(-2\log(\Lambda) > 0.0108) = P(\chi^2_1 > 0.0108) = 0.917.$$

Since 0.917 > 0.05, we fail to reject $H_0$ at the 5% level of significance.

Question 4 [25 marks]

Let X be a Normal(0, σ²) random variable. In this question we consider the random variable Y = |X|, and an i.i.d. sample $\{Y_1, Y_2, \ldots, Y_n\}$ coming from this distribution.

[Note that all questions are independent and do not require solving previous question(s). However, you may need information contained in the statements of other questions.]

(a) [2 marks] Explain why the probability density function of Y is

$$f_Y(y) = \begin{cases} \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\exp\left(-\frac{y^2}{2\sigma^2}\right), & \text{for } y \ge 0,\\ 0, & \text{otherwise.} \end{cases}$$

Solution: The pdf of X is $\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)$ for all x. Because Y is the absolute value of X, the pdf of Y is 0 for all y < 0. For y ≥ 0, since the density of X is symmetric about 0, $f_Y(y) = f_X(y) + f_X(-y)$, which is exactly twice the density of X.

(b) [3 marks] Show that the expectation of Y is $E[Y] = \sigma\sqrt{\frac{2}{\pi}}$.

Solution: By definition,

$$E[Y] = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\int_0^\infty y\exp\left(-\frac{y^2}{2\sigma^2}\right)dy = \frac{\sqrt{2}}{\sigma\sqrt{\pi}}\left[(-\sigma^2)\exp\left(-\frac{y^2}{2\sigma^2}\right)\right]_0^\infty = -\sigma\sqrt{\frac{2}{\pi}}\,(0-1) = \sigma\sqrt{\frac{2}{\pi}}.$$
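A small Monte Carlo sketch of (a) and (b), using only Python's standard library: it draws Y = |X| with X ∼ Normal(0, σ²) for an assumed σ = 3 and compares the sample mean of Y with $\sigma\sqrt{2/\pi}$. It also evaluates the two estimators of σ discussed in the later parts, the Method of Moments estimator $\sqrt{\pi/2}\,\bar Y$ and the Maximum Likelihood estimator $\sqrt{\sum Y_i^2/n}$. The value of σ, the seed and the sample size are arbitrary illustrative choices.

```python
import math
import random

def simulate_half_normal(sigma, n, seed=2019):
    """Draw n values of Y = |X| with X ~ Normal(0, sigma^2)."""
    rng = random.Random(seed)
    return [abs(rng.gauss(0.0, sigma)) for _ in range(n)]

sigma = 3.0                                   # assumed true value for the illustration
ys = simulate_half_normal(sigma, n=200_000)

mean_y = sum(ys) / len(ys)
theoretical_mean = sigma * math.sqrt(2.0 / math.pi)          # E[Y] from part (b)

sigma_mm = math.sqrt(math.pi / 2.0) * mean_y                  # Method of Moments estimator
sigma_ml = math.sqrt(sum(y * y for y in ys) / len(ys))        # Maximum Likelihood estimator

print(mean_y, theoretical_mean, sigma_mm, sigma_ml)
```

With a large sample both estimators land close to the assumed σ, which is the consistency property discussed in the later parts; for small n they generally differ.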
(c) [2 marks] Show that the Method of Moments estimator of σ is

$$\hat\sigma_{MM} = \sqrt{\frac{\pi}{2}}\,\frac{\sum_{i=1}^n Y_i}{n}.$$

Solution: From (b), $\sigma = \sqrt{\frac{\pi}{2}}\,E[Y]$. Replacing E[Y] by the sample mean gives $\hat\sigma = \sqrt{\frac{\pi}{2}}\cdot\bar Y$.

(d) [2 marks] Explain how to use the central limit theorem to identify the asymptotic distribution of the estimator $\hat\sigma_{MM}$ given in (c), if the sample size is very large.

Solution: Because this Method of Moments estimator is simply a constant multiple of the sample mean $\bar Y$, we can invoke the CLT and say that

$$\hat\sigma \approx \text{Normal}\left(\sigma,\ \frac{\pi}{2}\cdot\frac{\text{Var}[Y]}{n}\right).$$

(Students don't need to specify the parameters to get full marks.)

(e) [4 marks] The Maximum Likelihood estimator of σ is

$$\hat\sigma_{ML} = \sqrt{\frac{\sum_{i=1}^n Y_i^2}{n}}.$$

This is different from the Method of Moments estimator $\hat\sigma_{MM}$ given in (c). Between these two estimators (the Maximum Likelihood estimator and the Method of Moments estimator), which one would you choose and why?

Solution: This question has no single right answer; it is meant for students to explain some strengths and weaknesses of both estimators.

Possible elements in favour of the MM estimator are:
• It is straightforward to see that this estimator is unbiased, which is desirable.
• It is straightforward to see that its variance converges to zero, which means the estimator is consistent.

Possible elements in favour of the ML estimator are:
• The method on which it relies (finding the parameter value that makes the observations the most likely) is sounder.
• Under standard regularity conditions, the ML estimator is automatically consistent.
• Asymptotically, the ML estimator reaches the Cramér-Rao lower bound; hence, asymptotically, it is the best unbiased estimator.

(f) [4 marks] The two estimators from parts (c) and (e), $\hat\sigma_{MM}$ and $\hat\sigma_{ML}$ respectively, are different for a fixed n. Show that as n → ∞, they converge in probability to the same quantity.

Solution: The MM estimator is both unbiased and has a variance going to 0, hence it is consistent. But consistency means exactly convergence in probability to the true parameter σ.
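Alternatively, seeing that the MM estimator is just $\sqrt{\frac{\pi}{2}}\cdot\bar Y$, one can invoke the Law of Large Numbers, by which it converges in probability to $\sqrt{\frac{\pi}{2}}\,E[Y] = \sigma$.

For the ML estimator, the MLE is consistent (under standard regularity conditions), hence it also converges in probability to σ.

(g) [4 marks] Show that if X ∼ Normal(0, σ²), then $X^2 \sim \text{Gamma}\left(\frac{1}{2}, \frac{1}{2\sigma^2}\right)$. [Hint: you are reminded that $\chi^2_1$ is the same as a Gamma(1/2, 1/2).]

Solution: It seems wise to start with the hint. A $\chi^2_1$ is the square of a standard Normal, hence

$$\left(\frac{X}{\sigma}\right)^2 = \frac{X^2}{\sigma^2} \sim \chi^2_1 \equiv \text{Gamma}\left(\frac{1}{2}, \frac{1}{2}\right).$$

Then we can find the MGF of X² as

$$M_{X^2}(t) = E\left[e^{\frac{X^2}{\sigma^2}\,\sigma^2 t}\right] = M_{X^2/\sigma^2}(\sigma^2 t) = \left(1 - \frac{\sigma^2 t}{1/2}\right)^{-\frac{1}{2}} = \left(1 - 2\sigma^2 t\right)^{-\frac{1}{2}},$$

which is exactly the MGF of a Gamma(1/2, 1/(2σ²)) in the shape and rate parametrisation.

(h) [4 marks] Use the result given in part (g) to construct a confidence interval for the parameter σ, at a level of significance α.

Solution: We need a pivot for σ. Realising that X² = Y² and using part (g), we have

$$\frac{X^2}{\sigma^2} = \frac{Y^2}{\sigma^2} \sim \chi^2_1,$$

and so, the $Y_i$ being independent,

$$\frac{1}{\sigma^2}\sum_{i=1}^n Y_i^2 \sim \chi^2_n,$$

so that

$$\Pr\left[\chi^2_{n,\alpha/2} \le \frac{1}{\sigma^2}\sum Y_i^2 \le \chi^2_{n,1-\alpha/2}\right] = 1-\alpha$$
$$\Pr\left[\frac{\chi^2_{n,\alpha/2}}{\sum Y_i^2} \le \frac{1}{\sigma^2} \le \frac{\chi^2_{n,1-\alpha/2}}{\sum Y_i^2}\right] = 1-\alpha$$
$$\Pr\left[\frac{\sum Y_i^2}{\chi^2_{n,1-\alpha/2}} \le \sigma^2 \le \frac{\sum Y_i^2}{\chi^2_{n,\alpha/2}}\right] = 1-\alpha,$$

where $\chi^2_{n,q}$ denotes the q-quantile of the $\chi^2_n$ distribution. Taking square roots, a $100(1-\alpha)\%$ confidence interval for σ is

$$\left[\sqrt{\frac{\sum Y_i^2}{\chi^2_{n,1-\alpha/2}}},\ \sqrt{\frac{\sum Y_i^2}{\chi^2_{n,\alpha/2}}}\right].$$

Question 5 [10 marks]

Professional actuarial exams are hard, and often require several attempts to complete. Out of 50 randomly selected actuaries who completed one specific exam, the table below summarises the number of attempts they needed to pass.

# Attempts needed   # Actuaries
1                   19
2                   13
3                   7
4                   4
5                   1
6                   4
7                   2

For parts (a), (b) and (c), assume the following to hold true:
• Candidates are independent from each other.
• Candidates are of equal strength.
• All trials have constant probability p of success, and are independent of previous attempts.

(a) [2 marks] If X denotes the number of attempts a given candidate requires, what random variable would you use to model X? Justify your answer.

Solution: Given the assumptions of independence and identical probability of success p, X (the number of trials up to and including the first success) would follow a Geometric(p) distribution.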
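Under the Geometric(p) model of (a), the estimate asked for in (b) and the optional χ² test worked out in (c) reduce to a few sums over the table. The sketch below reproduces them; note that 50/125 is also 1 divided by the mean number of attempts, which coincides with the Geometric maximum likelihood estimator. As in the worked test, the null distribution is the fully specified Geometric(1/2), with cells 1, 2, 3 and 4+.

```python
attempts = [1, 2, 3, 4, 5, 6, 7]
counts = [19, 13, 7, 4, 1, 4, 2]

n_candidates = sum(counts)                                  # 50 successes
n_trials = sum(a * c for a, c in zip(attempts, counts))     # total number of attempts
p_hat = n_candidates / n_trials                             # = 1 / (mean attempts)

# Chi-squared test of the fully specified H0: X ~ Geometric(1/2), cells 1, 2, 3, 4+.
p0 = 0.5
probs = [p0 * (1 - p0) ** (x - 1) for x in (1, 2, 3)]
probs.append(1 - sum(probs))                                # P(X >= 4)
expected = [n_candidates * q for q in probs]
observed = counts[:3] + [sum(counts[3:])]

T = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                                      # no parameter was estimated
critical_value = 7.815                                      # chi^2_{3, 95%}, as quoted in (c)
print(n_trials, p_hat, expected, observed, round(T, 2), df, T < critical_value)
```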
(b) [3 marks] Using a technique of your choice, propose an estimator for p and compute its numerical value.

Solution: We can see the whole data as one big sample of Bernoulli trials. There have been 50 successes out of a total number of trials of 19 + 13 · 2 + 7 · 3 + ... + 2 · 7 = 125. Hence, a simple Method of Moments estimator would be

$$\hat p = \bar Y = \frac{50}{125} = 0.4,$$

where $\bar Y$ is the proportion of successes among the 125 Bernoulli trials.

(c) [2 marks] Let $X_1, X_2, \ldots, X_{50}$ be the number of attempts needed by each of the 50 candidates. Say you wanted to conduct a test of the hypothesis that $X_i \sim$ Some Specified Distribution, i ∈ {1, 2, ..., 50}. Name two possible tests you could use.

Solution: Anderson-Darling, Cramér-von Mises, Kolmogorov-Smirnov, Kuiper or the chi-squared test are all valid options.

(This is not required, but if we do the χ² test for real:) we take the fully specified null hypothesis $X_i \sim$ Geometric(1/2) and find the expected number of observations for each x. Using $p(x) = p(1-p)^{x-1} = (1/2)^x$, we have

x     # Expected
1     50 · 0.5 = 25
2     50 · 0.5² = 12.5
3     50 · 0.5³ = 6.25
4+    50 − 25 − 12.5 − 6.25 = 6.25

The observed counts are 19, 13, 7 and 4 + 1 + 4 + 2 = 11, and the test statistic is

$$T = \sum_i\frac{(E_i - O_i)^2}{E_i} = \frac{(25-19)^2}{25} + \frac{(12.5-13)^2}{12.5} + \frac{(6.25-7)^2}{6.25} + \frac{(6.25-11)^2}{6.25} = 1.44 + 0.02 + 0.09 + 3.61 = 5.16.$$

Because no parameters were estimated, the degrees of freedom for the χ² are 4 − 1 = 3. Next, $\chi^2_{3,95\%} = 7.815$, which is bigger than the test statistic; hence we do not reject H0.

(d) [3 marks] Are the three assumptions reasonable? Explain your answer for each assumption.

Solution: There is no 'perfect answer' here. Students should get the marks for coherent reasoning. Some elements of an answer:

• Candidates are independent: This is reasonable to a certain extent, because the exams are taken individually and each candidate is responsible for their own study. However, one can make the case that some candidates may be good friends who study together and encourage each other to work hard. In that case the success of one candidate may be linked to that of another, so that some candidates are not perfectly independent.
• Candidates are of equal strength: This one is harder to justify: among all the people attempting these exams, it would make sense that some are more apt and more determined than others.
• All trials have constant probability p of success, independently of previous attempts: This one is also hard to justify: it would seem that a candidate retaking an exam has a better chance of success, because they should know a bit more at each attempt.

Question 6 [17 marks]

You want to model students' results in the ACTL1101 final exam (y) using their results in the ACTL1101 midterm exam (x) through a simple linear regression model. For those 300 students, you first consider a model without intercept, i.e. for each student i = 1, 2, ..., 300, your model is

$$y_i = \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$$

You also consider a model with an intercept, i.e.

$$y_i = \alpha + \beta x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$$

You fit both models, and some of the numerical results are presented in the table below:

Model            α (std error)     β (std error)    Residual std error   SST        SSE
No intercept     NA                1.021 (0.011)    10.91                70612.82   35557.27
With intercept   21.823 (1.434)    0.671 (0.024)    8.19                 70612.82   20005.12

where SST stands for the total sum of squares and SSE stands for the sum of squared errors.

(a) [4 marks] For the 'no intercept' model, derive the least squares estimator of β.

Solution: Call the sum of squared errors S(β), with $S(\beta) = \sum_{i=1}^n (y_i - \beta x_i)^2$. We want to minimise this quantity, hence we take its derivative and set it to 0:

$$\frac{dS}{d\beta} = 2\sum_{i=1}^n (y_i - \beta x_i)(-x_i) = 0 \implies \sum x_i y_i = \beta\sum x_i^2 \implies \hat\beta = \frac{\sum x_i y_i}{\sum x_i^2}.$$

The second derivative is $\frac{d^2S}{d\beta^2} = 2\sum x_i^2 > 0$, hence we have a minimum (this part is not necessary for students to get full marks).

(b) [2 marks] Explain why SST is the same for both models.

Solution: By definition, SST is $\sum_{i=1}^n (y_i - \bar y)^2$, which is unaffected by the choice of model (it involves only the observations).

(c) [4 marks] Based on the information you have, which model do you consider better? Provide two reasons.
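The least squares estimator from (a) and the R² comparison used in one possible answer to (c) take only a few lines. In this sketch the data passed to the hypothetical helper `beta_no_intercept` are a made-up toy example lying exactly on y = 2x (the 300 students' marks are not given in the question), while the SST and SSE values are taken from the table.

```python
def beta_no_intercept(xs, ys):
    """Least squares slope for y = beta * x + error: sum(x*y) / sum(x^2)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx

def r_squared(sse, sst):
    """Coefficient of determination 1 - SSE/SST."""
    return 1.0 - sse / sst

# Hypothetical toy data on the line y = 2x, so the estimator should return 2.
toy_beta = beta_no_intercept([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

SST = 70612.82
r2_no_intercept = r_squared(35557.27, SST)
r2_with_intercept = r_squared(20005.12, SST)
print(toy_beta, round(r2_no_intercept, 3), round(r2_with_intercept, 3))
```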
Solution: Quite clearly, the model with intercept is better. Possible answers:
• The respective R² for both models are 1 − SSE/SST = 0.496 for the 'no intercept' model and 0.717 for the full model. Based on this criterion, which measures how well the model captures the variability of the data (the dependent variable y), the full model (with intercept) is better.
• The residual standard error is smaller for the full model, which is another sign that it better captures the trend of the data (because the variability of the unexplained error is smaller).
• In the full model, both parameters α and β are highly significant (you can tell by how small the standard errors are relative to the estimates), which suggests this model is an improvement over a model with no intercept.

We will accept as valid the argument that the 'no intercept' model still explains at least part of the trend in the data and is simpler, which is often preferred. Likewise, we will accept as valid a coherent argument that it 'makes more sense' to have a null intercept (arguing, for instance, that someone who gets 0 in the midterm can be expected to get 0 in the final).

(d) [4 marks] What additional information would you like to obtain in order to refine your judgement on the validity of these models? Provide two examples and justify why the information can be useful.

Solution: The answer should revolve around the fact that we don't know what our residuals look like, so we don't know whether the assumptions of Normality and homoscedasticity are verified. Elements to help us would be:
• Plot of residuals $\hat\varepsilon$ vs. fitted values $\hat y$
• Plot of residuals $\hat\varepsilon$ vs. the explanatory variable x
• Formal tests of homoscedasticity (e.g. the Breusch-Pagan test)
• QQ plot of the residuals against a Normal
• Formal tests of Normality of the residuals

Also accepted (as one additional piece of information): students mentioning that they would like to have the t-values and p-values associated with the estimates of α and β, to better judge the significance of the parameters.

Not accepted as a valid answer: investigation of multicollinearity (since this concept is relevant only in multiple linear regression).

(e) [3 marks] You forgot to include Josephine's mark in your fitting.
She got 85 in her midterm, and you would like to use the 'with intercept' model to build a 90% confidence interval for her specific mark in the final. You are given that $\bar x = \frac{1}{300}\sum_{i=1}^{300} x_i = 55.42$ and $S_{xx} = \sum_{i=1}^{300} x_i^2 = 1{,}033{,}989$.

Solution: We need to compute

$$\sum x_i^2 - n\bar x^2 = 1{,}033{,}989 - 300\cdot 55.42^2 = 112{,}576.08.$$

The confidence interval for a single point $\hat y$ (also called a prediction interval) at an observed value $x_0$ of the explanatory variable is given by

$$\hat y \pm t_{0.95,n-2}\times s\times\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum x_i^2 - n\bar x^2}},$$

which here gives

$$21.823 + 0.671\cdot 85 \pm 1.65\cdot 8.19\sqrt{1 + \frac{1}{300} + \frac{(85 - 55.42)^2}{112{,}576}}$$
$$78.858 \pm 13.588 = [65.270,\ 92.446].$$

Question 7 [15 marks]

You are investigating the key drivers of lawyers' weekly income (y). You have selected the following three variables as part of your analysis:
– x1: weekly hours worked (continuous variable)
– x2: age (continuous variable)
– x3: gender (categorical variable: 1 = male, 0 = female)

The model you fit is a multiple linear regression model with the following specification:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$

Based on a sample of 250 lawyers, you have obtained the following results on the fit:

              Estimate   Std. Error   t-value   p-value
(Intercept)   64.700     78.335       0.826     0.409635
x1            29.335     1.447        20.266    <0.000001
x2            4.870      1.247        3.905     0.000122
x3            587.286    28.442       20.649    <0.000001

In addition, a Normal Q-Q plot of the residuals from your model is given below.

[Figure: Normal Q-Q Plot of the residuals; x-axis: Theoretical Quantiles (−3 to 3); y-axis: Sample Quantiles (−400 to 400).]

(a) [2 marks] Does the Normal Q-Q plot give you confidence in the validity of the fitted model? Justify your answer.

Solution: This plot assesses the hypothesis of Normal residuals. If the residuals are Normally distributed, the points should follow the straight line.
Although they do not perfectly follow the straight line at low and high quantiles, there is no cause for real concern here (especially because the sample quantiles are 'less extreme' than the Normal ones). If students argue that points at the extremes are not on the reference line, hence the residuals might not be perfectly Normal, this is considered a valid argument.

(b) [2 marks] From the fitted model, what is the estimated average weekly income of a 65-year-old male lawyer working 35 hours per week?

Solution:

$$\hat y = 64.700 + 29.335\cdot 35 + 4.870\cdot 65 + 587.286 = 1{,}995.26 \text{ dollars}.$$

(c) [4 marks] Conduct a test, at the 1% significance level, of the null hypothesis that gender has no effect on lawyers' weekly income. State H0 and H1 clearly in terms of the parameters of the model, and state your conclusion.

Solution: If gender has no effect, then β3 = 0. That is, we are testing $H_0: \beta_3 = 0$ vs. $H_1: \beta_3 \ne 0$. Importantly, this is a two-sided test (i.e. we want to detect significant differences potentially in either direction). The t-value and p-value of this test are given directly in the table as 20.649 and < 0.000001; since the p-value is far below 0.01, we clearly reject the null hypothesis.

(d) [2 marks] The explanatory variable 'age' is significant here. Do you think age could be a confounding variable? Justify your answer.

Solution: This question is meant for students to discuss what a confounding variable is, in the context of this question. There is no specific 'good' answer. Any coherent reasoning that is on topic should be awarded marks.

A confounding variable is one that affects both the predictor(s) and the response. Here, someone could argue that age could affect income itself (older lawyers could be more competent, or perceived as more competent, and hence be paid more), but could also influence how many hours they work (maybe younger lawyers need to impress and hence work longer hours).

We will also accept a different interpretation of this question, i.e. 'could there be a confounder of age?'. If students interpret this question that way, that is fine. An answer could then be:

Yes, there could be a confounder (not in the model), of which age is only an observable approximation. We see here that the higher the age, the higher the income. But can we conclude that being old in itself makes you earn more money? Possible, but questionable. It could be reasonable to argue that older lawyers simply had more years to obtain expertise and develop a network of clients, and because of that, they earn more. Likewise, older lawyers have had more time to build a reputation and can therefore arguably charge their clients more per hour. Something like 'experience' or 'reputation' would then be the real driver (i.e. a confounder). Any answer along these lines is accepted.

(e) [2 marks] Provide an interpretation of the value $\hat\beta_2 = 4.870$. What does this value represent?

Solution: In this model, it means that, on average, a lawyer earns 4.87 dollars more per week for each additional year of age, holding gender and the number of hours worked fixed.

(f) [3 marks] Consider an alternative model with an interaction between x2 and x3, namely the model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 x_3 + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2).$$

What are the differences between this model and the old one?

Solution: The impact of 'hours worked' is the same, i.e. pay increases linearly as a function of hours worked. However, the effects of the other two variables (gender and age) are quite different. In this new model:
• The key point is: only if you are a man does your salary increase linearly with your age. If you are a woman, age has no effect on your salary.
• Age alone has no impact on weekly income.
• Gender alone has no impact on weekly income.

This is different from the old model, where both gender and age had a 'stand-alone' effect on weekly income. Said otherwise: in the old model, for both men and women, age affects income; and for any age, gender affects income.
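For readers checking the regression arithmetic, here is a short sketch reproducing the prediction interval of Question 6(e) and the point prediction of Question 7(b). All numbers come from the two questions; the t quantile is taken as 1.65, as in the Question 6(e) solution, rather than computed. The function names and the dictionary keys in `coef` are hypothetical labels introduced only for this illustration.

```python
import math

def prediction_interval(alpha_hat, beta_hat, x0, s, t_quantile, n, x_bar, sum_x2):
    """Prediction interval for one new response at x0 in simple linear regression."""
    sxx_centered = sum_x2 - n * x_bar ** 2           # sum(x_i^2) - n * x_bar^2
    y_hat = alpha_hat + beta_hat * x0
    half_width = t_quantile * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx_centered)
    return y_hat - half_width, y_hat + half_width

# Question 6(e): 'with intercept' fit, x0 = 85, t_{0.95, 298} taken as 1.65.
lo, hi = prediction_interval(21.823, 0.671, 85, 8.19, 1.65, 300, 55.42, 1_033_989)

# Question 7(b): point prediction from the fitted multiple regression (hypothetical key names).
coef = {"intercept": 64.700, "hours": 29.335, "age": 4.870, "male": 587.286}

def predict_income(hours, age, male):
    """Estimated mean weekly income; male is 1 for male, 0 for female (as x3)."""
    return (coef["intercept"] + coef["hours"] * hours
            + coef["age"] * age + coef["male"] * male)

income = predict_income(hours=35, age=65, male=1)
print(round(lo, 3), round(hi, 3), round(income, 2))
```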
End of Paper

ADDITIONAL PAGE
Answer any unfinished questions here, or use for rough working.


Author admin
