
Tutoring Case: T3 Assignment 2

May 15, 2020

Multivariate Analysis: Assignment 2
The University of New South Wales, School of Mathematics, Department of Statistics
2019 T3: Due Friday, Week 10, 22 November at 23:59

• Submission instructions will be posted shortly.
• No late assignments will be accepted without a successful application for Special Consideration.
• For computational and applied exercises, you may use either R or SAS. Include the commands used and a reasonable amount of relevant output.
• Use of computer algebra systems is permitted and encouraged, though note that one may not be available during the exams.

1. Consider identifying the neurotic state of an individual referred for psychiatric examination. Three measurements A, B, and C are made on each individual. The mean scores for each of the 3 groups are:

   Group       A      B     C
   Anxiety     2.970  1.13  0.795
   Normal      0.655  0.06  0.090
   Obsession   4.420  1.72  1.155

   The pooled within-group covariance matrix is

   $$S_{\text{pooled}} = \begin{pmatrix} 2.27 & 0.371 & 0.517 \\ 0.371 & 0.565 & -0.013 \\ 0.517 & -0.013 & 0.505 \end{pmatrix}.$$

   (a) Discriminant analysis. For the following, calculate from the information provided here.

       i. Assuming equal misclassification costs and equal priors for the three groups, calculate the linear discriminant scores for classifying each of the three groups.

       ii. Based on the above scores, classify the following newly observed individuals:

           Name      A    B    C
           Mary      2.5  1.1  1.0
           Fred      4.2  1.4  1.3
           Giselda   1.1  0.6  0.3

       iii. Suppose that in the population of people administered this examination, 20% are, in fact, “normal”, 40% have anxiety, and 40% have obsession. Show how this changes the linear discriminant scores and the classifications of the three individuals.

       iv. Consider classifying individuals from the “Anxiety” and “Obsession” groups only. Determine the linear discriminant function and estimate the probabilities of misclassification P(1|2) and P(2|1).

   (b) Discriminant analysis, continued. Load the original dataset from the provided file neurotic.csv. Using R or SAS:

       i.–iii. Repeat the corresponding parts of Part (a).

       iv. Calculate the in-sample confusion matrix for LDA (assuming equal prior probabilities).

       v. Use an appropriate hypothesis test to check that the equal within-group variance assumption required by LDA is satisfied. Report the test statistic and the p-value, and state the conclusion in the context of the problem.

   (c) Support vector machine. Fit and tune a support vector machine of your choice for predicting the patient group from the measurements. Report the following for the SVM fit:

       i. Selected tuning parameters.
       ii. In-sample confusion matrix.
       iii. Out-of-sample accuracy estimated by cross-validation.
       iv. Predictions for the individuals in 1(a)ii.

   (d) Principal component analysis. Perform a principal component analysis on the three measurements A, B, and C, ignoring grouping.

       i. Report the coefficients for the components, the eigenvalues, and the cumulative variance explained.
       ii. How many components are needed to explain at least 90% of the variation in the data?
       iii. How many components are needed according to Kaiser's rule?

2. Data for n = 20 consecutive years have been collected, reflecting the annual average prices of beef steers $X_1$ and of hogs $X_2$ and the annual per capita consumption of beef $X_3$ and of pork $X_4$. We are interested in the relationship of livestock prices to meat production. The file price-cons.csv contains the variables Y (year index) and $X_1, X_2, X_3, X_4$. We could proceed by calculating $U = (X_1 + X_2)/2$ and $V = X_3 + X_4$ and then regressing U on V.

   (a) Canonical correlation. A perhaps better procedure would be to construct a (weighted) price index $U = a_1 X_1 + a_2 X_2$ and a consumption index $V = b_3 X_3 + b_4 X_4$ and to look at the maximal correlation between U and V. This is the canonical correlation analysis approach.

       i. Find and list both the canonical correlations and the related canonical variates (i.e., U and V). Express the canonical variates using the raw coefficients and also using the standardised coefficients (i.e., coefficients obtained by first standardising the variables involved). Since the prices are in dollar units but the consumption is in pounds, does it make sense to standardise here?

          Hint: Recall from the lecture that SAS provides standardised coefficients as a part of its output. In R, they may be obtained by first using the scale() function to standardise the inputs and then performing canonical correlation analysis on those.

       ii. Using canonical correlation analysis, formulate and test the hypothesis of independence of the price index and the consumption index (intuition suggests that it must be rejected). Report the test statistic and the p-value, and state the conclusion in the context of the problem.

       iii. Is only one canonical variable pair enough (i.e., is the second canonical correlation also significant)?

   (b) Multivariate linear model. Now suppose that our goal is not correlation but explanation: we wish to model consumption as a function of the prices.

       i. Fit a multivariate linear model with the consumption variables as responses and the prices as predictors. Report the coefficients, the standard errors, and the estimated variance–covariance matrix of the residuals.

       ii. Briefly (in 2–3 sentences), interpret the regression coefficients and their significance.
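For Question 1(a)i–iii, the linear discriminant score for group i can be computed directly from the summary statistics as d_i(x) = x̄_i' S⁻¹ x − ½ x̄_i' S⁻¹ x̄_i + ln p_i. The sketch below is illustrative only: the assignment asks for R or SAS, so Python is used here purely to show the arithmetic, and the names `solve`, `discriminant_coefficients` and `classify` are hypothetical helpers, not part of any library.

```python
import math

# Group means and pooled covariance matrix as given in Question 1.
GROUPS = {
    "Anxiety":   [2.970, 1.13, 0.795],
    "Normal":    [0.655, 0.06, 0.090],
    "Obsession": [4.420, 1.72, 1.155],
}
S_POOLED = [
    [2.27, 0.371, 0.517],
    [0.371, 0.565, -0.013],
    [0.517, -0.013, 0.505],
]
# New individuals from Question 1(a)ii.
NEW = {
    "Mary":    [2.5, 1.1, 1.0],
    "Fred":    [4.2, 1.4, 1.3],
    "Giselda": [1.1, 0.6, 0.3],
}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def discriminant_coefficients(means, s_pooled, priors=None):
    """Return {group: (slope, intercept)} so that d_i(x) = slope . x + intercept."""
    if priors is None:  # equal priors by default
        priors = {g: 1.0 / len(means) for g in means}
    coefs = {}
    for g, mu in means.items():
        slope = solve(s_pooled, mu)  # S^{-1} xbar_i
        intercept = -0.5 * dot(mu, slope) + math.log(priors[g])
        coefs[g] = (slope, intercept)
    return coefs


def classify(x, coefs):
    """Return (group with the largest score, all scores) for observation x."""
    scores = {g: dot(w, x) + c for g, (w, c) in coefs.items()}
    return max(scores, key=scores.get), scores


equal = discriminant_coefficients(GROUPS, S_POOLED)
unequal = discriminant_coefficients(
    GROUPS, S_POOLED, {"Normal": 0.2, "Anxiety": 0.4, "Obsession": 0.4}
)
for name, x in NEW.items():
    print(name, classify(x, equal)[0], classify(x, unequal)[0])
```

Changing the priors only shifts each intercept by the difference in ln p_i; the slope vectors stay the same, which is why part iii can be answered by adjusting the scores from part i rather than recomputing them.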
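For Question 1(a)iv, the two-group rule can be written out as below. This is a sketch of the standard normal-theory result under equal priors and equal costs; labelling Anxiety as group 1 and Obsession as group 2 is an assumption, since the assignment does not fix the numbering.

```latex
\[
\hat{a} = S_{\text{pooled}}^{-1}(\bar{x}_1 - \bar{x}_2), \qquad
\text{allocate } x_0 \text{ to group 1 if }
\hat{a}^\top x_0 \ge \tfrac{1}{2}\,\hat{a}^\top(\bar{x}_1 + \bar{x}_2),
\]
\[
\hat{\Delta}^2 = (\bar{x}_1 - \bar{x}_2)^\top S_{\text{pooled}}^{-1}(\bar{x}_1 - \bar{x}_2), \qquad
\hat{P}(1 \mid 2) = \hat{P}(2 \mid 1) = \Phi\!\left(-\tfrac{1}{2}\hat{\Delta}\right).
\]
```

Here Φ is the standard normal distribution function and Δ̂ is the estimated Mahalanobis distance between the two group means, so the two misclassification probabilities coincide in this equal-prior, equal-cost setting.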
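For Question 2(a), the quantities requested can be expressed in terms of the partitioned sample covariance matrix of the prices (X₁, X₂) and the consumptions (X₃, X₄). The block below is a sketch in assumed notation (S₁₁, S₁₂, S₂₂ for the covariance blocks, p = q = 2); the test shown is Bartlett's chi-square approximation, which may differ from the statistic reported by a particular R or SAS routine.

```latex
\[
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad
S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}\, a_k = \hat{\rho}_k^{2}\, a_k, \qquad
b_k \propto S_{22}^{-1} S_{21}\, a_k,
\]
\[
a_k^\top S_{11}\, a_k = b_k^\top S_{22}\, b_k = 1, \qquad
U_k = a_k^\top (X_1, X_2)^\top, \quad V_k = b_k^\top (X_3, X_4)^\top,
\]
\[
H_0 : \Sigma_{12} = 0: \qquad
-\Bigl(n - 1 - \tfrac{1}{2}(p + q + 1)\Bigr) \ln \prod_{k=1}^{2} \bigl(1 - \hat{\rho}_k^{2}\bigr)
\;\overset{\text{approx.}}{\sim}\; \chi^2_{pq}.
\]
```

Replacing S by the sample correlation matrix gives the standardised coefficients; the canonical correlations themselves are unchanged by standardisation.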


Author admin
