Tutoring Case - EEEM007 ASP

EEEM007 ASP-Lab 2019-20 / v1.8 / AM,MDP / 11 Mar 20

Module: EEEM007 ADVANCED SIGNAL PROCESSING
Year: 2019/2020
Examiner: A Mustafa, M D Plumbley
Date Due: 4pm Tuesday 12 May 2020

LAB EXPERIMENT: PATTERN RECOGNITION

1. INTRODUCTION

The aim of the experiment is to provide practical support for the main aspects of the material covered in EEEM007 Advanced Signal Processing, namely Pattern Classification. The experiment is designed to reinforce the main theoretical results learnt in the course and to enable the student to gain an intuitive feeling for the effects of classifier design factors (such as training set size, class separability, feature space dimensionality, classification rule) on the classification system performance. It should also validate experimentally the results derived in the Assignment.

Two popular classifiers are investigated in the experiment:
i) the Bayes decision rule for normally distributed classes, and
ii) the k-Nearest Neighbour decision rule.

2. MATLAB COMMANDS

You may find the following Matlab commands useful:

mvnrnd    generate random numbers
fitcnb    construct Gaussian classifier
fitcknn   construct k-NN classifier
predict   function for finding class labels
mahal     Mahalanobis distance

Include the Matlab code you used for your experiments in an Appendix to your report.

3. REPORT FORMAT

The Report should be a single document, consisting of descriptions and results for the experiments (approx. 8-10 pages), plus an Appendix containing text listings of your Matlab code.

[Remember: The TurnItIn submission system will identify similar reports, so please write your own Matlab code and experiment descriptions.]

4. MARKING SCHEME

For each experiment:
1. Description of experiment, objectives, design choices [20%]
2. Predicted experimental outcome from the theory [20%]
3. Presentation of the experimental results obtained [20%]
4. Analysis of results, discussion and conclusions [20%]
5. Presentation of the Matlab code used [20%]

Each of Experiments 1 to 4 is worth 1/4 of the final mark.

5. EXPERIMENT DESCRIPTION

5.1 Experiment 1

The aim of this experiment is to investigate the effect of training sample size on the classifier performance and the effect of the size of the test set on the reliability of the empirical error count estimator. A Gaussian classifier for discriminating between two 2-dimensional classes will be used for the study. Use the class parameters corresponding to your Assignment.

(a) Generate a design set $D$ and a test set $T$, each containing 100 patterns per class, distributed according to the selected class parameters. Design a Gaussian classifier for different training set sizes: for each value $N_d = 3, 5, 10, 50, 100$, select $N_d$ samples to form a training set $D(N_d)$ from $D$. Test the designs with the same training samples to obtain the error estimates $e_{\text{design}}(N_d)$. Test each classifier also using the full test set $T$ (100 patterns per class) to obtain the error estimate $e_{\text{test}}$. Repeat the experiment ten times for independent design sets $D_i$, $i = 1, \ldots, 10$ (the same test set may be used) and average the estimated errors to obtain:

$$\bar{e}_{\text{design}}(N_d) = \frac{1}{10} \sum_{i=1}^{10} e^{(i)}_{\text{design}}(N_d), \qquad \bar{e}_{\text{test}}(N_d) = \frac{1}{10} \sum_{i=1}^{10} e^{(i)}_{\text{test}}(N_d).$$

Plot the average errors $\bar{e}_{\text{design}}$, $\bar{e}_{\text{test}}$ as a function of $N_d$ and compare them with the theoretical error. Comment on your results and try to explain them.

(b) Repeat (a) using the k-Nearest Neighbour classifier for all odd values of $k$ in the range 1 to 51, noting that $k$ cannot exceed $2N_d$. For a representative range of $k$, starting from $k = 1$, compare your results with those obtained for the Gaussian classifier in Experiment 1 (a). For each value of $N_d$ select the best result $e^{*}(N_d)$ (smallest error) over all values of $k$ and record the corresponding $k^{*}(N_d)$. Plot these best results as a function of $N_d$ in the same graph as that used for presenting the results of (a) above. Plot also $k^{*}(N_d)$ in the same graph. Comment on your results.
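The procedure of Experiment 1 (a) can be outlined as follows. This is only an illustrative sketch in Python using the standard library, not the Matlab code the report asks for. The class means, the unit standard deviations, the diagonal-covariance Gaussian model, equal priors, and the reading of the training size as a per-class count are all assumptions made for the example; the function names are hypothetical.

```python
import math
import random

def sample_class(mean, std, n, rng):
    """Draw n points from a Gaussian with independent components."""
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]

def fit_gaussian(points):
    """Estimate per-feature mean and variance (diagonal covariance assumed)."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dims)]
    var = [sum((p[j] - mean[j]) ** 2 for p in points) / (n - 1)
           for j in range(dims)]
    return mean, var

def log_density(x, model):
    """Log of the Gaussian density of x under a fitted (mean, var) model."""
    mean, var = model
    return sum(-0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
               for xj, m, v in zip(x, mean, var))

def error_rate(models, data_by_class):
    """Fraction of misclassified points, assuming equal priors."""
    errors = total = 0
    for label, points in enumerate(data_by_class):
        for x in points:
            scores = [log_density(x, m) for m in models]
            predicted = scores.index(max(scores))
            errors += predicted != label
            total += 1
    return errors / total

def run_experiment(sizes=(3, 5, 10, 50, 100), repeats=10, seed=0):
    """Return {size: (average design error, average test error)}."""
    rng = random.Random(seed)
    means = [(0.0, 0.0), (2.0, 0.0)]  # hypothetical class means
    std = (1.0, 1.0)                  # hypothetical unit standard deviations
    test = [sample_class(m, std, 100, rng) for m in means]  # one shared test set
    sums = {n: [0.0, 0.0] for n in sizes}
    for _ in range(repeats):
        # A fresh, independent design set for every repetition.
        design = [sample_class(m, std, 100, rng) for m in means]
        for n in sizes:
            train = [rng.sample(cls, n) for cls in design]  # n samples per class
            models = [fit_gaussian(cls) for cls in train]
            sums[n][0] += error_rate(models, train)  # resubstitution (design) error
            sums[n][1] += error_rate(models, test)   # error on the held-out test set
    return {n: (d / repeats, t / repeats) for n, (d, t) in sums.items()}

if __name__ == "__main__":
    for n, (design_err, test_err) in run_experiment().items():
        print(n, round(design_err, 3), round(test_err, 3))
```

For these assumed parameters the class means are two standard deviations apart, so the averaged test error at the largest training size can be compared against the corresponding theoretical error of the two-class Gaussian problem.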
5.2 Experiment 2

In this experiment we shall investigate the effect of the size of the test set on the reliability of the empirical error count estimator, and we will also learn how to classify test and training samples.

(a) Design a Gaussian classifier for a two-class problem similar to Experiment 1 (a). Generate ten independent test sets $T_i$, $i = 1, \ldots, 10$, of size 100 using the same class-conditional distribution parameters. Choose several different numbers of test patterns $N_t = 3, 5, 10, 50, 100$. For each $N_t$, select $N_t$ samples from each $T_i$ to form a test set $T_i(N_t)$ and obtain an estimate $e^{(i)}(N_t)$ of the classifier error probability. Repeat the experiment ten times for independent test sets and find the mean value $\bar{e}(N_t)$ and the variance $\sigma^2(N_t)$ of the estimated error, i.e.

$$\bar{e}(N_t) = \frac{1}{10} \sum_{i=1}^{10} e^{(i)}(N_t), \qquad \sigma^2(N_t) = \frac{1}{9} \sum_{i=1}^{10} \left[ e^{(i)}(N_t) - \bar{e}(N_t) \right]^2.$$

Plot your results and comment on how they compare with your theoretical predictions.

(b) Now assume that you have 100 samples for both testing and training, such that $N_d + N_t = 100$. Assign $N_d = 3, 5, 10, 50, 75, 90$ and, respectively, $N_t = 97, 95, 90, 50, 25, 10$ to form training $D(N_d)$ and test $T(N_t)$ sets from the $N_d$ and $N_t$ samples, and calculate the design error estimate ($e_{\text{design}}(N_d)$ from Experiment 1 (a)) and the test error estimate ($e(N_t)$ from Experiment 2 (a)) for each combination of training and test sets. Create a table as shown here depicting both design and test errors for each set of $N_d$ and $N_t$ samples.

N_d    N_t    e_design(N_d)    e(N_t)
3      97     (results)        ...
(etc)

Comment on your results. Which combination of $N_d$ and $N_t$ is the best trade-off in terms of error estimates?

5.3 Experiment 3

In this experiment we shall investigate the dependence of the $e_{\text{test}}(N_d)$ curves on the dimensionality of the pattern recognition problem. As the determination of the true error probability in high-dimensional spaces is difficult, we shall take the estimated error $e_{\text{test}}(N_d)$, obtained with $N_t = 500$, as the true error. You may choose the covariance matrix to be an identity matrix in this exercise. Choose the mean vectors so that the error probability is maintained in the range of 5-10%. Try to estimate $e_{\text{test}}(N_d)$ for values $N_d = 3, 5, 10, 20, 50, 100, 200$ for two-class problems in $n = 5, 10, 15$ dimensional spaces. The changes in the test error with the increase in training set size are different in different dimensions. If you are unable to design the classifier, consider what the minimum number of training samples required is as a function of dimensionality, and why. How many training samples, as a function of dimensionality, do you need to achieve a reasonable performance (close to the true error rate)?

5.4 Experiment 4

The aim of this experiment is to explore the relationship between class separability and error probability. Choosing a suitable pattern space dimensionality, $n$, generate a sequence of sets of normally distributed training data containing patterns from two classes. The Mahalanobis distance between the means of the two classes in each set should be monotonically increasing with the position of the set in the sequence. Estimate the error probability of the classifier in each case. Plot the error as a function of Mahalanobis distance. Comment on your findings.
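The relationship explored in Experiment 4 can be illustrated with a short sketch. It is written in Python with the standard library only, purely as an outline rather than the Matlab code the report asks for. It assumes equal priors and an identity covariance matrix shared by both classes, in which case the Mahalanobis distance between the means reduces to the Euclidean distance and the Bayes error equals $\Phi(-\Delta/2)$, where $\Delta$ is that distance and $\Phi$ is the standard normal distribution function. The nearest-estimated-mean classifier, the sample sizes, and all function names are choices made for the example.

```python
import math
import random

def bayes_error(delta):
    """Bayes error for two equal-prior Gaussians with a common covariance,
    separated by Mahalanobis distance delta: Phi(-delta / 2)."""
    return 0.5 * math.erfc(delta / (2.0 * math.sqrt(2.0)))

def sample(mean, n, rng):
    """Draw n points from a Gaussian with identity covariance."""
    return [[rng.gauss(m, 1.0) for m in mean] for _ in range(n)]

def centroid(points):
    """Sample mean vector of a list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def sq_dist(x, c):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))

def estimated_error(delta, dims=2, n_train=100, n_test=500, seed=0):
    """Empirical test error of a nearest-estimated-mean classifier."""
    rng = random.Random(seed)
    # Class means differ only along the first axis, so their distance is delta.
    means = [[0.0] * dims, [delta] + [0.0] * (dims - 1)]
    centroids = [centroid(sample(m, n_train, rng)) for m in means]
    errors = 0
    for label, m in enumerate(means):
        for x in sample(m, n_test, rng):
            nearer_first = sq_dist(x, centroids[0]) <= sq_dist(x, centroids[1])
            predicted = 0 if nearer_first else 1
            errors += predicted != label
    return errors / (2 * n_test)

if __name__ == "__main__":
    # A monotonically increasing sequence of class separations.
    for delta in (0.5, 1.0, 2.0, 3.0, 4.0):
        print(delta, round(bayes_error(delta), 4), estimated_error(delta))
```

Under the same assumptions, an error probability of 5-10% (the range requested in Experiment 3) corresponds to a distance between the means of roughly 2.56 to 3.29.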

