Tutoring Case: FINC 430

FINC 430 Midterm Exam, Spring 2020

1. Decide whether each model is parametric or non-parametric (your answer should be either "parametric" or "non-parametric" for each of the following models).
a. [model missing]
b. [model missing]
c. KNN regression
d. [model missing]
Answer: a. parametric, b. parametric, c. non-parametric, d. parametric

2. Decide whether each model is linear or non-linear (your answer should be either "linear" or "non-linear" for each of the following models).
a. [model missing]
b. [model missing]
c. KNN regression
d. [model missing]
Answer: a. linear, b. linear (or non-linear), c. non-linear, d. linear (or non-linear)

3. Calculate approximate values of the parameters for the following LDA classification (you should fill in the seven blanks in the table below).
Classes: Class 1 (yellow), Class 2 (green), Class 3 (blue)
Mean vector: <1, 3> <2, 1>
Covariance matrix: Any 2×2 matrix with all positive values is correct, but a zero or negative value is wrong.
Class membership probabilities: 1/3, 1/3, 1/3

4. Which classification method is better, LDA or QDA, for the following 3-class classification problem (your answer should be either LDA or QDA), and why?
Answer: Either of the following answers is correct:
a. LDA: the covariance matrices are similar among classes (or a linear boundary is appropriate for the classification).
b. QDA: the covariance matrices are different among classes.

5. Fill in the missing part of the following Python program to calculate α, where X is BTC and Y is ETH.

    import pandas as pd
    df = pd.read_csv("Log returns.csv")
    print(df)

         Unnamed: 0       BTC       ETH
    0    12/31/2018 -0.040653 -0.057338
    1      1/1/2019  0.039229  0.066670
    2      1/2/2019  0.016204  0.090924
    3      1/3/2019 -0.025809 -0.040519
    4      1/4/2019  0.010956  0.040081
    ..          ...       ...       ...
    394   1/29/2020 -0.011010 -0.011679
    395   1/30/2020  0.022367  0.058923
    396   1/31/2020 -0.017096 -0.026763
    397    2/1/2020  0.005135  0.022159
    398    2/2/2020 -0.006410  0.025335
    [399 rows x 3 columns]

Answer:

    import numpy as np
    cov = np.array(df.cov())
    # index 0 is BTC (X), index 1 is ETH (Y)
    alpha = (cov[1][1] - cov[0][1]) / (cov[0][0] + cov[1][1] - 2*cov[0][1])

or

    alpha = (cov[1][1] - cov[1][0]) / (cov[0][0] + cov[1][1] - 2*cov[1][0])

Hint: your answer should be similar to, but different from, the following code:

    alpha = (cov['ETH']['ETH'] - cov['BTC']['ETH']) \
            / (cov['BTC']['BTC'] + cov['ETH']['ETH'] - 2*cov['BTC']['ETH'])

6. The table below provides a training data set containing six observations, three predictors, and one qualitative response variable. Suppose we wish to use this data set to make a prediction for Y when $X_1 = 0$, $X_2 = 1$, $X_3 = 2$ using K-nearest neighbors.
a. Compute the Euclidean distance between each observation and the test point, $X_1 = 0$, $X_2 = 1$, $X_3 = 2$.
b. What is our prediction with K = 1? (your answer should be either Red or Green) Why?
Answer: Our prediction is Y = Green, because that is the response value of the nearest neighbor to the point $X_1 = 0$, $X_2 = 1$, $X_3 = 2$.
c. What is our prediction with K = 3? (your answer should be either Red or Green) Why?
Answer: Red, because the majority of the 3 nearest neighbors are Red.

7. Suppose that we wish to predict whether a given stock will issue a dividend this year ("Yes" or "No") based on X, last year's percent profit. We examine a large number of companies and discover that the mean value of X for companies that issued a dividend was $\bar{X} = 7$, while the mean for those that didn't was $\bar{X} = -1$. In addition, the variance of X for these two sets of companies was $\hat{\sigma}^2 = 25$. Finally, 60% of companies issued dividends. Assuming that X follows a normal distribution, predict the probability that a company will issue a dividend this year given that its percentage return was X = 2 last year.
Answer: 0.52

8. The table below shows the results of a linear regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, where $Y$ is ETH, $X_1$ is BTC, and $X_2$ is USDT.
a. What are the (estimated) values of $\beta_0$, $\beta_1$, and $\beta_2$?
i. $\beta_0 = -0.0014$
ii. $\beta_1 = 0.9786$
iii. $\beta_2 = 0.5199$
b. What are the p-values of $\beta_0$, $\beta_1$, and $\beta_2$?
i. The p-value of $\beta_0$ is 0.251
ii. The p-value of $\beta_1$ is 0.000
iii. The p-value of $\beta_2$ is 0.185
c. Are $\beta_0$, $\beta_1$, and $\beta_2$ statistically significant at a significance level of 0.05?
i. $\beta_0$ is not statistically significant
ii. $\beta_1$ is statistically significant
iii. $\beta_2$ is not statistically significant
d. What is the 95% confidence interval of $\beta_1$?
Answer: [0.910, 1.047]

9. I created an object with the following Python code:

    data = ('abc', 'def', 'ghi')

and tried to change data[2] from 'ghi' to 'jkl', but an error occurred as follows:
a. Explain the reason for the error (your answer should not be "'tuple' object does not support item assignment").
Answer: Tuples are "immutable", i.e., they cannot be modified after creation.
b. Modify the original code "data = ('abc', 'def', 'ghi')" to avoid the error.
Answer:

    data = ['abc', 'def', 'ghi']

10. The diagram below shows KNN classification.
a. What is the KNN prediction at "?" with K=3? (your answer should be either Red or Green)
Answer: Green (or B)
b. What is the KNN prediction at "?" with K=7? (your answer should be either Red or Green)
Answer: Red (or A)

11. The Python program below generates a bootstrap data set:

    import pandas as pd
    df = pd.read_csv("Log returns.csv")
    print(df)

         Unnamed: 0       BTC       ETH
    0    12/31/2018 -0.040653 -0.057338
    1      1/1/2019  0.039229  0.066670
    2      1/2/2019  0.016204  0.090924
    3      1/3/2019 -0.025809 -0.040519
    4      1/4/2019  0.010956  0.040081
    ..          ...       ...       ...
    394   1/29/2020 -0.011010 -0.011679
    395   1/30/2020  0.022367  0.058923
    396   1/31/2020 -0.017096 -0.026763
    397    2/1/2020  0.005135  0.022159
    398    2/2/2020 -0.006410  0.025335
    [399 rows x 3 columns]

    from sklearn.utils import resample
    df_resmp = resample(df)

Calculate the probability that df is the same as df_resmp.
Answer: The resample() function samples with replacement; hence, the number of possible outcomes for df_resmp is $399^{399}$. Only one of these outcomes (every row drawn exactly once, in the original order) makes df_resmp the same as df. Therefore, the probability is $399^{-399}$.

12. The table below shows the results of a logistic regression. Calculate the six missing values.
Answer:
z = coef / std_err
[0.025 0.975] = [coef - 1.96*std_err, coef + 1.96*std_err]

13. The plot below shows the results of KNN regression with K=8.
a. Is the parameter value K=8 appropriate?
Answer: No
b. Why do you think so? (explain in terms of bias and variance)
Answer: Low bias, high variance (or overfitting)
c. Answer the following question if you think K=8 is inappropriate: What is an appropriate value of K?
Answer: Any value greater than 8

14. List two regression methods to model the relationship between X and Y in the scatter plot below, and explain why you chose them.
Answer:
Polynomial regression: this method is appropriate for representing a nonlinear relationship (e.g., a quadratic function).
KNN regression: this method is appropriate for representing a nonlinear relationship.

15. List two classification methods to build a classifier, given the training data below, and explain why you chose them.
Answer:
LDA: the covariance matrices are similar among classes (or a linear boundary is appropriate for the classification).
QDA: QDA works even if the covariance matrices are similar.
KNN classification: KNN works for a linear boundary as well as for a nonlinear boundary.
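For question 5, the positional answer can be tried out without the original CSV. The sketch below is only an illustration: the returns are synthetic stand-ins generated with a fixed seed (not the exam's "Log returns.csv"), and the numeric columns are selected explicitly before calling cov(), since recent pandas versions may refuse to compute a covariance when a non-numeric date column is present. The ratio computed here is the weight on X that minimizes Var(αX + (1 − α)Y).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for "Log returns.csv": synthetic daily log returns.
rng = np.random.default_rng(0)
btc = rng.normal(0.0, 0.03, size=399)
eth = 0.9 * btc + rng.normal(0.0, 0.02, size=399)
df = pd.DataFrame({"BTC": btc, "ETH": eth})

# Select the numeric columns explicitly so a date column cannot interfere.
cov = np.array(df[["BTC", "ETH"]].cov())

# Index 0 is X (BTC), index 1 is Y (ETH).
var_x, var_y, cov_xy = cov[0][0], cov[1][1], cov[0][1]
alpha = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)


def portfolio_variance(a):
    """Variance of a*X + (1 - a)*Y, computed from the covariance matrix."""
    return a**2 * var_x + (1 - a) ** 2 * var_y + 2 * a * (1 - a) * cov_xy


print(alpha, portfolio_variance(alpha))
```

Because the covariance matrix is symmetric, cov[0][1] and cov[1][0] are interchangeable, which is why the answer key accepts both forms.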
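The answer to question 7 can be checked with Bayes' theorem: weight each class's normal density at X = 2 by its prior and normalize. The sketch below uses only the numbers given in the question; the helper name normal_pdf is introduced here for illustration.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


prior_yes = 0.6            # 60% of companies issued dividends
prior_no = 1 - prior_yes
x = 2                      # last year's percent profit
variance = 25              # shared variance for both groups

weighted_yes = prior_yes * normal_pdf(x, 7, variance)
weighted_no = prior_no * normal_pdf(x, -1, variance)
posterior_yes = weighted_yes / (weighted_yes + weighted_no)

print(round(posterior_yes, 2))  # 0.52
```

Since both classes share the same variance, the normalizing constant of the density cancels in the ratio, and only the exponential terms and the priors matter.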
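Question 9 can be reproduced in a few lines. The sketch below triggers the error, shows the list-based fix from the answer key, and adds one alternative that is not part of the answer key: building a new tuple instead of modifying the old one.

```python
data = ("abc", "def", "ghi")

# Item assignment on a tuple raises TypeError because tuples are immutable.
try:
    data[2] = "jkl"
except TypeError as exc:
    error_message = str(exc)

# Fix from the answer key: use a list, which is mutable.
data_list = ["abc", "def", "ghi"]
data_list[2] = "jkl"

# Alternative (an addition here): keep the tuple type and build a new tuple.
new_data = data[:2] + ("jkl",)

print(error_message)
print(data_list, new_data)
```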
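The counting argument in question 11 can be verified by brute force on a tiny data set and then scaled up with logarithms, since 399 to the power of -399 is far too small for a float. The sketch assumes, as the answer does, that "the same" means the same rows in the same order; the function name prob_identical is hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def prob_identical(n):
    """Probability that n draws with replacement reproduce rows 0..n-1 in order."""
    return Fraction(1, n**n)


# Brute-force check for n = 3: enumerate all 3**3 = 27 ordered index draws.
n_small = 3
outcomes = list(product(range(n_small), repeat=n_small))
matches = sum(1 for draw in outcomes if draw == tuple(range(n_small)))
brute_force = Fraction(matches, len(outcomes))

# For n = 399 the probability underflows a float, so report its base-10 log.
log10_p = -399 * math.log10(399)

print(brute_force, prob_identical(n_small))
print(log10_p)
```

The logarithm comes out near -1038, so the probability is on the order of 10 to the power of -1038.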
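The two formulas in question 12 translate directly into code. The regression table itself is not available here, so the coefficient and standard error below are hypothetical placeholders, chosen only to show the arithmetic.

```python
def z_and_interval(coef, std_err, critical=1.96):
    """Return the z statistic and the 95% interval [coef - 1.96*se, coef + 1.96*se]."""
    z = coef / std_err
    lower = coef - critical * std_err
    upper = coef + critical * std_err
    return z, lower, upper


# Hypothetical values, not taken from the exam's table.
z, lower, upper = z_and_interval(coef=0.5, std_err=0.2)
print(z, lower, upper)
```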
