CS5487 Programming Assignment 1
Regression
Antoni Chan
Department of Computer Science
City University of Hong Kong

In this programming assignment you will implement and test some of the regression methods that were introduced in the lecture and in the problem sets. Let $f(x, \theta)$ be a function with input $x \in \mathbb{R}^d$ and parameters $\theta \in \mathbb{R}^D$, where

    $f(x, \theta) = \phi(x)^T \theta,$    (1)

and $\phi(x) : \mathbb{R}^d \to \mathbb{R}^D$ is a feature transformation of $x$. For example, the $K$-th order polynomial with input $x \in \mathbb{R}$ can be expressed as

    $f(x, \theta) = \sum_{k=0}^{K} x^k \theta_k = \phi(x)^T \theta,$    (2)

where the feature transformation and parameters are

    $\phi(x) = [1, x, x^2, \cdots, x^K]^T \in \mathbb{R}^{K+1}, \quad \theta = [\theta_0, \cdots, \theta_K]^T \in \mathbb{R}^{K+1}.$    (3)

Our goal is to obtain the best estimate of the function given iid samples $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, where the $y_i$ are noisy observations of $f(x_i, \theta)$. We have seen several ways of doing the regression, using different assumed noise models and formulations. For convenience, define the following quantities:

    $y = [y_1, \cdots, y_n]^T, \quad \Phi = [\phi(x_1), \cdots, \phi(x_n)], \quad X = [x_1, \cdots, x_n].$    (4)

Here is a summary of the regression algorithms we have seen so far. For the first four methods, the prediction for a test input $x_*$ is $f_* = \phi(x_*)^T \hat\theta$:

• least-squares (LS): objective $\|y - \Phi^T\theta\|^2$; estimate $\hat\theta_{LS} = (\Phi\Phi^T)^{-1}\Phi y$.
• regularized LS (RLS): objective $\|y - \Phi^T\theta\|^2 + \lambda\|\theta\|^2$; estimate $\hat\theta_{RLS} = (\Phi\Phi^T + \lambda I)^{-1}\Phi y$.
• L1-regularized LS (LASSO): objective $\|y - \Phi^T\theta\|^2 + \lambda\|\theta\|_1$; estimate via a QP solver (see Prob. 3.12).
• robust regression (RR): objective $\|y - \Phi^T\theta\|_1$; estimate via an LP solver (see Prob. 2.10).
• Bayesian regression (BR): prior $\theta \sim \mathcal{N}(0, \alpha I)$, likelihood $y|x, \theta \sim \mathcal{N}(f(x, \theta), \sigma^2)$. The posterior is $\theta|X, y \sim \mathcal{N}(\hat\mu_\theta, \hat\Sigma_\theta)$, where

    $\hat\mu_\theta = \frac{1}{\sigma^2} \hat\Sigma_\theta \Phi y, \quad \hat\Sigma_\theta = \left(\frac{1}{\alpha} I + \frac{1}{\sigma^2} \Phi\Phi^T\right)^{-1},$

  and the posterior predictive distribution is $f_*|X, y, x_* \sim \mathcal{N}(\hat\mu_*, \hat\sigma^2_*)$, with $\hat\mu_* = \phi(x_*)^T \hat\mu_\theta$ and $\hat\sigma^2_* = \phi(x_*)^T \hat\Sigma_\theta \phi(x_*)$.

Part 1: Polynomial function

In the first problem, you will test these regression methods on a simple dataset, and look at the effects of using different objective functions, regularization terms, and formulations. The MATLAB data file poly_data.mat (or poly_data_*.txt if you are not using MATLAB) contains a set of samples generated from a 5th-order polynomial function, with observation noise $\sigma^2 = 5$. The file contains the following variables:

• sampx – sample input values ($x_i$); each entry is an input value.
• sampy – sample output values ($y_i$); each entry is an output.
• polyx – input values for the true function (these are also the test inputs $x_*$).
• polyy – output values for the true function.
• thtrue – the true value of $\theta$ used to generate the data.

[Figure: the samples (sampx, sampy) plotted as points and the true polynomial function (polyx, polyy) as a curve, for $x \in [-2, 2]$ and $y \in [-20, 40]$; legend: "samples", "function".]

The goal is to estimate the polynomial function from the samples.

(a) Implement the above 5 regression algorithms for the $K$-th order polynomial given in (2). In the next problem, you will use these regression methods with a different feature transformation $\phi(x)$, so it is better to separate the regression algorithm from the feature transformation in your implementation. (A minimal illustrative sketch is given after part (b) below.)

(b) For each regression method, use the sample data (sampx, sampy) to estimate the parameters of a 5th-order polynomial function. Make a plot of the estimated function using polyx as inputs, along with the sample data. For BR, also plot the standard deviation around the mean. What is the mean-squared error between the learned function outputs and the true function outputs (polyy), averaged over all input values in polyx? For algorithms with hyperparameters, select some values that tend to work well.
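To make part (a) concrete, here is a minimal MATLAB sketch of the five estimators, written directly in terms of $\Phi$ and $y$ from (4). This is only an illustration, not the required implementation: it assumes the Optimization Toolbox (quadprog, linprog) is available, and the variable names and hyperparameter values (lambda, alpha, sig2) are placeholders to be tuned.

```matlab
% Minimal sketch of the five estimators (illustrative names/values).
% Phi is D x n with columns phi(x_i); y is n x 1, as in (4).
K = 5;  D = K + 1;
Phi = sampx(:)'.^((0:K)');                 % polynomial features, (K+1) x n
y   = sampy(:);

% Least-squares (LS)
th_ls  = (Phi*Phi') \ (Phi*y);

% Regularized LS (RLS); lambda is a hyperparameter to be chosen
lambda = 1;
th_rls = (Phi*Phi' + lambda*eye(D)) \ (Phi*y);

% LASSO as a QP (Prob. 3.12): write theta = tp - tm with tp, tm >= 0,
% and solve for z = [tp; tm]
PP = Phi*Phi';
H  = 2*[PP, -PP; -PP, PP];
f  = lambda*ones(2*D,1) - 2*[Phi*y; -Phi*y];
z  = quadprog(H, f, [], [], [], [], zeros(2*D,1), []);
th_lasso = z(1:D) - z(D+1:end);

% Robust regression as an LP (Prob. 2.10): minimize sum(t)
% subject to -t <= y - Phi'*theta <= t, variables [theta; t]
n  = numel(y);
c  = [zeros(D,1); ones(n,1)];
A  = [ Phi', -eye(n); -Phi', -eye(n)];
b  = [ y; -y];
zt = linprog(c, A, b);
th_rr = zt(1:D);

% Bayesian regression: posterior mean/covariance, then predictive
alpha = 1;  sig2 = 5;                      % illustrative values
Sig_th = inv(eye(D)/alpha + PP/sig2);
mu_th  = Sig_th*(Phi*y)/sig2;

Phis  = polyx(:)'.^((0:K)');               % features of the test inputs
f_ls  = Phis'*th_ls;                       % same pattern for RLS/LASSO/RR
f_br  = Phis'*mu_th;
sd_br = sqrt(sum(Phis .* (Sig_th*Phis), 1))';  % predictive std. dev.
```

Note the design choice: only the feature matrices Phi and Phis depend on $\phi(x)$, so the estimator code can be reused unchanged with other feature transformations in Part 2.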
(c) Repeat (b), but reduce the amount of training data available by selecting a subset of the samples (e.g., 10%, 25%, 50%, 75%). Plot the estimated functions. Which models are more robust with less data, and which tend to overfit? Make a plot of error versus training-set size, and comment on any important trends and findings. (You should run multiple trials with different random subsets, and take the average error.)

(d) Add some outlier output values (e.g., add large numbers to a few values in sampy), and repeat (b). Which methods are robust to the presence of outliers, and which are most sensitive? Why?

(e) Repeat (b), but estimate a higher-order polynomial (e.g., 10th order). Which models tend to overfit the data when learning a more complex model, and which models do not? Verify your observations by examining the estimated parameter values.

Part 2: A real-world regression problem – counting people

Now we will consider a real-world regression problem: estimating the number of people in an image.

[Figure: an example image of a sidewalk with people.]

From each image, we extract a feature vector $x_i \in \mathbb{R}^9$, which is based on some properties of the crowd (e.g., the area covered by the crowd, the perimeter of the crowd, etc.). We also have the number of people in the image, $y_i \in \mathbb{R}$. The goal is to learn a regression function between this feature vector and the number of people in the image. (Let's ignore the fact that the count is actually a non-negative integer, and just apply our standard regression methods.)

The data can be found in the MATLAB file count_data.mat (or count_data_*.txt if you are not using MATLAB). The file contains the following variables:

• trainx – training inputs; each column is a 9-dimensional feature vector.
• trainy – training outputs (mean-subtracted counts for each input vector).
• testx – test inputs.
• testy – true outputs for the test inputs.

Note that the output values are actually the mean-subtracted counts, so you will see some negative outputs.

(a) Let's first look at using the features directly, i.e., set $\phi(x) = x$. Using the training set (trainx, trainy), estimate a function with some of the regression algorithms above. Use the test inputs testx to predict the outputs, and compare the predictions to the true outputs testy. (You can round the function outputs so that they are counting numbers.) Calculate the mean-absolute error and the mean-squared error. Which method works best? Plot the test predictions and the true counts, and discuss any interesting findings.

(b) Now try some other feature transformations. For example, you can create a simple 2nd-order polynomial as

    $\phi(x) = [x_1, \cdots, x_9, x_1^2, \cdots, x_9^2]^T.$

This could be expanded to include more cross-terms $x_i x_j$. Try some other non-linear transformations of the features. Can you improve your results from (a)?
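For Part 2, the same estimators apply with the new feature transformation. Here is a minimal sketch of the evaluation loop, using RLS as an example (any of the five methods can be substituted; lambda is again an illustrative placeholder):

```matlab
% Sketch: phi(x) = x for part (a), then a 2nd-order transform for part (b).
% trainx/testx are 9 x n matrices whose columns are feature vectors.
lambda = 1;                                % illustrative value
Phi  = trainx;                             % phi(x) = x
th   = (Phi*Phi' + lambda*eye(size(Phi,1))) \ (Phi*trainy(:));
pred = round(testx'*th);                   % round to counting numbers
mae  = mean(abs(pred - testy(:)));         % mean-absolute error
mse  = mean((pred - testy(:)).^2);         % mean-squared error

% 2nd-order transform from (b): append squared features
quad  = @(X) [X; X.^2];                    % 18 x n
Phi2  = quad(trainx);
th2   = (Phi2*Phi2' + lambda*eye(size(Phi2,1))) \ (Phi2*trainy(:));
pred2 = round(quad(testx)'*th2);
```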
Part 3: Estimating hyperparameters (optional)

So far, you have probably been setting the hyperparameters of each method (e.g., $\lambda$) by hand to get the best performance on the test set. This is not a fair way to run an experiment, since it amounts to optimizing for the test set, and it does not reflect whether the model can generalize well to unseen data. In a real scenario, you wouldn't have the true outputs for the test set! Here are two common ways to set the hyperparameters using only the training set:

1. N-fold cross-validation: with this method, the original training set is split into a new training set and a validation set. The regression function is estimated from the new (smaller) training set and tested on the validation set, for different hyperparameter values. This is repeated for different partitions of the original training set. Formally, separate the training set $\mathcal{D}$ into $N$ equal partitions, $\{\mathcal{D}_1, \cdots, \mathcal{D}_N\}$. For each partition $i$ and candidate hyperparameter value $\lambda_j$: 1) train a regression function using $\lambda_j$ on $\mathcal{D} \setminus \mathcal{D}_i$ (all the data except $\mathcal{D}_i$); 2) test the function on $\mathcal{D}_i$; 3) record the error $e_{i,j}$. Finally, sum the error over the $N$ partitions, $E_j = \sum_{i=1}^{N} e_{i,j}$. Select the $\lambda_j$ with the lowest overall error, and train the final regression function on the full training set $\mathcal{D}$. (Usually people use between $N = 3$ and $N = 10$ folds, depending on the size of the training set.) An illustrative sketch of this procedure is given after this list.

2. Maximum marginal likelihood: for Bayesian regression, another way to select the hyperparameters is to maximize the marginal likelihood of the training data (see Problem 3.11 for details),

    $\hat\alpha, \hat\sigma^2 = \arg\max_{\alpha, \sigma^2} p(y|X, \sigma^2, \alpha) = \arg\max_{\alpha, \sigma^2} \log p(y|X, \sigma^2, \alpha),$    (5)

where the marginal log-likelihood is

    $\log p(y|X, \sigma^2, \alpha) = \log \int p(y|\theta, X, \sigma^2)\, p(\theta|\alpha)\, d\theta$    (6)

    $= -\frac{D}{2}\log\alpha - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left\|y - \Phi^T \hat\mu_\theta\right\|^2 - \frac{1}{2\alpha}\|\hat\mu_\theta\|^2 - \frac{1}{2}\log\left|\hat\Sigma_\theta^{-1}\right| - \frac{n}{2}\log(2\pi).$    (7)

The procedure is as follows: for each candidate hyperparameter pair $\{\alpha_j, \sigma^2_j\}$, calculate the marginal log-likelihood in (7). Select the hyperparameter pair that results in the largest marginal likelihood.
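Here is a minimal sketch of the cross-validation loop from item 1, using RLS and squared error on the counting data as an example. The number of folds, the candidate grid, and the variable names are illustrative choices, not prescribed by the assignment:

```matlab
% Sketch: N-fold cross-validation for lambda in RLS (illustrative grid).
N = 5;
lambdas = logspace(-3, 3, 20);             % candidate values lambda_j
n = size(trainx, 2);
perm = randperm(n);                        % random partition of D
fold = mod(0:n-1, N) + 1;                  % fold label per permuted sample
E = zeros(size(lambdas));
for j = 1:numel(lambdas)
  for i = 1:N
    trn = perm(fold ~= i);  val = perm(fold == i);   % D \ D_i and D_i
    Phi = trainx(:, trn);   yt  = trainy(trn);
    th  = (Phi*Phi' + lambdas(j)*eye(size(Phi,1))) \ (Phi*yt(:));
    yv  = trainy(val);
    E(j) = E(j) + sum((trainx(:, val)'*th - yv(:)).^2);  % accumulate e_{i,j}
  end
end
[~, jbest] = min(E);                       % lowest overall error E_j
lambda = lambdas(jbest);                   % retrain on the full training set
```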
(a) For Problem 2, use cross-validation or maximum marginal likelihood to select the hyperparameters using only the training data. Does the accuracy increase or decrease? How close were the automatically selected hyperparameters to your hand-selected ones?
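If you choose maximum marginal likelihood for (a), a simple grid search that evaluates (7) might look like the following sketch (the grid ranges are illustrative, and $\phi(x) = x$ is assumed):

```matlab
% Sketch: maximize the marginal log-likelihood (7) over an (alpha, sig2) grid.
Phi = trainx;  y = trainy(:);              % training data only
[D, n] = size(Phi);
best = -inf;
for alpha = logspace(-3, 3, 25)
  for sig2 = logspace(-3, 3, 25)
    Sinv = eye(D)/alpha + (Phi*Phi')/sig2; % inverse posterior covariance
    mu   = (Sinv \ (Phi*y))/sig2;          % posterior mean mu_theta
    ll   = -D/2*log(alpha) - n/2*log(sig2) ...
           - norm(y - Phi'*mu)^2/(2*sig2) - (mu'*mu)/(2*alpha) ...
           - log(det(Sinv))/2 - n/2*log(2*pi);   % eq. (7)
    if ll > best
      best = ll;  alpha_hat = alpha;  sig2_hat = sig2;
    end
  end
end
```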

• What to hand in – You need to turn in the following things:

  1. Answers, plots, analysis, discussion, etc. for the above problems.
  2. Source code files.

  You must submit your programming assignment using the Canvas website. Go to "Assignments" ⇒ "Programming Assignment 1".

• Plagiarism – You should implement each regression algorithm using your own code. Do not use someone else's implementation of the regression algorithms (including MATLAB's implementations). It is okay to use common library components and functions, e.g., standard matrix operations (inv, eig) and general-purpose optimization toolboxes (quadprog, linprog). If you use any special toolboxes or libraries, make sure to note them in your report.

• Grading – The marks for this assignment will be distributed as follows:

  – 10% – Implementation of the regression algorithms (1a).
  – 40% – Results testing the regression functions (1b, 1c, 1d, 1e).
  – 20% – Results on counting people (2a).
  – 10% – Trying out other features for counting people (2b).
  – 20% – Quality of the written report. More points are given to insightful observations and analysis.

  Note: if you really cannot implement the algorithms correctly, it is okay to use 3rd-party software. You will not get marks for implementation, but you can still get marks for presenting the results.

• Optional problems – The optional Problem 3 will not be graded.