MCEN90048 AI for Mechatronics
Project 1: Let us be a Fashion Critic

Submission: Please submit via the LMS either the link to your GitHub/Bitbucket repository, or a compressed file that contains all the code files and results structured in proper folders.
Due date: 6:00 pm, 1 May 2020

Contents
1. Dataset description
2. Project background
3. Special protocol for training and test
4. Description of tasks, tutorials and expected results
5. Structure of code and results
6. Marking criterion
Appendix 1 – Expected test accuracy on the Fashion MNIST dataset

1. Dataset description

Figure 1. A sprite image made by stacking 400 image samples from the Fashion MNIST dataset.

The Fashion MNIST dataset (link: https://github.com/zalandoresearch/fashion-mnist) contains a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28 × 28 grey-scale image that belongs to one of the following 10 classes: 0 – 'T-shirt/top', 1 – 'Trouser', 2 – 'Pullover', 3 – 'Dress', 4 – 'Coat', 5 – 'Sandal', 6 – 'Shirt', 7 – 'Sneaker', 8 – 'Bag', 9 – 'Ankle boot'.

2. Project background

The objective of this project is to help the students become familiar with basic TensorFlow (Keras) operations and the process of training a classification model. The basic process of data analysis is:

1) Pre-process the training data. For example, you may collect prior information about the dataset, remove outliers, analyse the statistics of each feature, and scale/normalize the features. For natural images, pixel values range from 0 to 255 and should be scaled to either [0, 1] or [−1, 1] to avoid numeric issues during forward- and back-propagation (see the sketch after the checklist below).

2) Design a model. For the Fashion MNIST dataset, the input has 28 × 28 = 784 pixels, which is considered small and may be processed by multi-layer perceptrons (MLP), convolutional neural networks (CNN) or recurrent neural networks (RNN).

Checklist for design choices and hyper-parameters:

• Model architecture. You may choose:
  o MLP, CNN or RNN (or any other neural network model you think suitable) as the base model.
  o The number of layers and the number of neurons or channels in each layer.
  o The activation function in each layer. For the output layer, since this is a multi-class classification problem, softmax activation is recommended.
• Training procedure. You may choose:
  o Which gradient descent method to use to update the model parameters; popular choices include SGD, Momentum, RMSprop, and Adam. In TensorFlow, back-propagation and the gradient updates are done automatically by the selected optimizer.
  o The learning rate and its schedule. The learning rate is very important in gradient methods; if your model does not perform as expected, perhaps try another learning rate first. Popular choices for the learning rate schedule include constant, step decay, exponential decay, and warm restart.
  o The batch size; popular choices are powers of 2: 32, 64, 128, 256 (please note that there is no strong reason behind these choices; batch sizes like 60 or 100 are totally fine, but 60,000 is not suitable due to memory issues).
  o The number of iterations or epochs, i.e., when to stop training. An epoch is defined as the iterations during which the model has seen every training example once. For example, if your batch size is 60, it takes 60000/60 = 1000 iterations to see each training example, so one epoch equals 1000 iterations. Early stopping may be used.
• Regularization methods. Good normalization and regularization methods may improve training stability and model capacity, and/or prevent overfitting. See the lecture slides for a variety of regularization methods.
• Objective/loss function. For multi-class classification with balanced classes, it is suggested to use the cross-entropy loss.
• Others: ________________. You are welcome to fill in the blank here.
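As a concrete illustration of steps 1) and 2) and the checklist above, here is a minimal sketch using tf.keras; the MLP architecture, the Adam optimizer and the hyper-parameter values are example choices only, not requirements:

    import tensorflow as tf

    # Step 1) Pre-process: load Fashion MNIST and scale pixels from [0, 255] to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train.astype('float32') / 255.0
    x_test = x_test.astype('float32') / 255.0

    # Step 2) Design a model: a small MLP with a softmax output over the 10 classes.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28 x 28 = 784 input pixels
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),  # class probabilities
    ])

    # Training-procedure choices: optimizer, learning rate and loss function.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss='sparse_categorical_crossentropy',  # integer labels 0-9
                  metrics=['accuracy'])

    # Batch size and number of epochs; hold out 20% of the training data
    # for validation, as in the 80/20 protocol of step 3) below.
    model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.2)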
3) Train the model. The usual training protocol is to divide the training data into a training set TR (around 80% of the whole training data) and a validation set VA (the remaining 20%).

a. After you have made your decisions on the hyper-parameters (call this set HP1), you start the training, and the model parameters are updated based on the training set TR. Every few iterations (e.g., 200), you validate the model on the validation set VA to monitor the training process. After you stop the training, you validate the trained model M1 on the validation set to obtain a performance figure for HP1 and M1. You may repeat the model training using the same HP1 and save the model with the best validation accuracy.
b. Now it is time to try different sets of hyper-parameters HP2, HP3, HP4, … For each hyper-parameter set HPi, you repeat the training process of step a and obtain a performance figure for HPi and Mi.
c. After enough trials, you obtain the best set of hyper-parameters HPbest and the corresponding trained model Mbest.

4) Test the model. At this stage, you may test your best model on the test set and report the performance (accuracy).

3. Special protocol for training and test

Please note that, in this project, we do NOT encourage you to repeatedly train the model to get the best possible set of hyper-parameters, because this takes a significant amount of time and is not the aim of this project. Thus, you may just train the model on the whole training dataset and validate it on the test dataset. A reasonable test accuracy for the Fashion MNIST dataset is in the range 0.88 – 0.95, depending on your network architecture.

The accuracy is defined as follows. Consider a single example in the test set; let y be its label and assume y = 6, i.e., it is an image of 'Shirt'. See below for a few examples of network outputs ŷ:

• ŷ = [0.01, 0.01, 0.02, 0.0, 0.01, 0.01, 0.91, 0.01, 0.01, 0.01]. Class 6 has the highest probability in the predictions; thus, the model correctly predicts the label with a high confidence of 0.91.
• ŷ = [0.11, 0.11, 0.12, 0.01, 0.11, 0.11, 0.20, 0.11, 0.11, 0.11]. Class 6 has the highest probability in the predictions; thus, the model correctly predicts the label, though with a low confidence of 0.20.
• ŷ = [0.21, 0.11, 0.12, 0.01, 0.11, 0.11, 0.10, 0.11, 0.11, 0.11]. Class 1 has the highest probability in the predictions; thus, the model fails to predict the label.

Let N be the number of test examples for which the model correctly predicts the label; the test accuracy is then defined as N/10000 (a sketch of this computation is given at the end of this section). More information on the expected accuracy can be found in the Benchmark section of the download link, or in Appendix 1 of this document. If you have obtained a reasonable accuracy (e.g., within ±0.025 of the expected accuracy for your type of model), perhaps stop there and focus on the other tasks.
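The accuracy computation itself is only a few lines; below is a sketch, assuming the model, x_test and y_test from the earlier sketch:

    import numpy as np

    # Predict class probabilities for the 10,000 test images and take the
    # arg-max as the predicted label (the class with the highest probability).
    probs = model.predict(x_test)           # shape (10000, 10)
    predicted = np.argmax(probs, axis=1)    # shape (10000,)

    # N = number of correctly predicted examples; test accuracy = N / 10000.
    n_correct = np.sum(predicted == y_test)
    print('Test accuracy: {:.4f}'.format(n_correct / len(y_test)))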
4. Description of tasks, tutorials and expected results

The students should finish the following basic tasks:

1. Visualizing the dataset. In this task, you are required to write code to randomly sample 400 images from the training set and prepare them for visualization in TensorBoard using t-SNE and PCA (one possible way to produce the required files is sketched after the list below). The expected files for submission include:
• Checkpoint files: see details in the section "Train and test a classification model" below.
• Sprite image file: xxx.png
• Label file: xxx.tsv
• Projector configuration file: projector_config.pbtxt
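The following is a sketch of one way to produce these files, assuming TensorFlow 2 with the tensorboard package and the scaled x_train, y_train from the earlier sketch; the concrete file names (fashionmnist.png, fashionmnist_label.tsv) are examples standing in for the xxx placeholders above:

    import os
    import numpy as np
    import tensorflow as tf
    from tensorboard.plugins import projector

    log_dir = 'Results/FashionMNIST/DataVisual'   # example output folder
    os.makedirs(log_dir, exist_ok=True)

    # Randomly sample 400 images from the training set.
    idx = np.random.choice(len(x_train), 400, replace=False)
    images, labels = x_train[idx], y_train[idx]

    # Sprite image: stack the 400 samples into a 20 x 20 grid (cf. Figure 1).
    sprite = images.reshape(20, 20, 28, 28).transpose(0, 2, 1, 3).reshape(560, 560)
    tf.keras.preprocessing.image.save_img(
        os.path.join(log_dir, 'fashionmnist.png'), sprite[..., np.newaxis], scale=True)

    # Label file: one label per line, in the same order as the sprite tiles.
    with open(os.path.join(log_dir, 'fashionmnist_label.tsv'), 'w') as f:
        f.write('\n'.join(str(label) for label in labels))

    # Checkpoint files: save the flattened images as an embedding variable.
    embedding_var = tf.Variable(images.reshape(400, -1), name='embedding')
    ckpt = tf.train.Checkpoint(embedding=embedding_var)
    ckpt.save(os.path.join(log_dir, 'fashionmnist_embedding.ckpt'))

    # projector_config.pbtxt: link the embedding to the sprite and label files.
    config = projector.ProjectorConfig()
    emb = config.embeddings.add()
    emb.tensor_name = 'embedding/.ATTRIBUTES/VARIABLE_VALUE'
    emb.metadata_path = 'fashionmnist_label.tsv'
    emb.sprite.image_path = 'fashionmnist.png'
    emb.sprite.single_image_dim.extend([28, 28])
    projector.visualize_embeddings(log_dir, config)

Pointing TensorBoard at log_dir should then show the 400 samples in the Projector tab, where the t-SNE and PCA projections can be selected interactively.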
2. Train and test a classification model. In this task, you are required to train a neural network classifier on the training set and validate your model on the test dataset. See the previous section "Special protocol for training and test" for more details. You are required to submit the checkpoint files or an HDF5 file for a successfully trained model. If you choose checkpoint files, you need to submit the following:
• A file named checkpoint
• One or several files named xxx.index
• Zero, one or several files named xxx.meta
• One or several files named xxx.data-00000-of-00001

3. Monitor the training process. In this task, you are required to add the model parameters, their gradients and the training loss to the summary; save the summaries to event files (one or several files named events.out.tfevents.xxx); and visualize the summaries and the graph in TensorBoard.

4. Profile the training process. In this task, you are required to collect the runtime statistics during the training process for one epoch and visualize them in TensorBoard. You need to submit a folder called profile with the following files (a sketch covering tasks 2–4 follows the list below):
• xxx.input_pipeline.pb
• xxx.kernel_stats.pb
• xxx.overview_page.pb
• xxx.tensorflow_stats.pb
• xxx.trace.json.pb
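One possible way to produce the checkpoint, event and profile files for tasks 2–4 is sketched below. This is a sketch only, assuming TensorFlow 2.2+ and the model and scaled x_train, y_train from the earlier sketch; the explicit GradientTape loop replaces model.fit so that the gradients can be logged, and the folder names follow the example structure in Section 5:

    import tensorflow as tf

    event_dir = 'Results/FashionMNIST/MLPBaseline/SGD_event_01'  # example folder
    writer = tf.summary.create_file_writer(event_dir)            # -> events.out.tfevents.xxx
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64)

    tf.profiler.experimental.start(event_dir)  # task 4: profile files written under event_dir
    for step, (x, y) in enumerate(dataset):    # one pass over the dataset = one epoch
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        with writer.as_default():              # task 3: loss, parameters and gradients
            tf.summary.scalar('train_loss', loss, step=step)
            if step % 200 == 0:                # histograms every few iterations
                for var, grad in zip(model.trainable_variables, grads):
                    name = var.name.replace(':', '_')
                    tf.summary.histogram(name, var, step=step)
                    tf.summary.histogram(name + '/gradient', grad, step=step)
    tf.profiler.experimental.stop()

    # Task 2: save the trained model as checkpoint files
    # (or use model.save('xxx.h5') for a single HDF5 file).
    tf.train.Checkpoint(model=model).save(
        'Results/FashionMNIST/MLPBaseline/SGD_ckpt_01/sgd_0.01.ckpt')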
The students should finish one of the following additional tasks:
1. Compare two or three gradient methods.
2. Compare two or three learning rate schedules.
3. Compare two or three regularization methods.
4. Compare two or three network architectures (e.g., you may compare MLP vs CNN vs RNN, or plain CNN vs ResNet).

Please provide saved model files for successful trials. Please BRIEFLY explain what additional task(s) you have done and the reasons why the results are different (better or worse). Please provide Matplotlib code for plotting figures where applicable. Please note that you may use any standard API or module (e.g., tf.keras, PyTorch) to finish the above tasks.

5. Structure of code and results

It is highly encouraged that the students organize their code and results in a systematic way, so that it is easy for collaborators/examiners to reproduce and modify them. This section suggests one possible way to organize the code and results. Please note that the examples given in this section are for your reference only; it is not compulsory to organize your code and results in the same format. You are welcome to develop your own style and preferences.

Code and results

The code and results should be placed in different folders based on their characteristics. For example, you may consider the following file structure (here indentation indicates the folder hierarchy):

ProjectFolder
    NoteBooks
        FashionMNIST.ipynb
    Results
        FashionMNIST
            MLPBaseline
                Momentum_ckpt_01
                    checkpoint
                    momentum_0.001.ckpt.data-00000-of-00001
                    momentum_0.001.ckpt.index
                    momentum_0.001.ckpt.meta
                    momentum.json
                Momentum_event_01
                    events.out.tfevents.xxx
                SGD_ckpt_01
                SGD_event_01
            CNNModels
                ConvPool_01
            DataVisual
                checkpoint
                fashionmnist.png
                fashionmnist_embedding.ckpt.data-00000-of-00001
                fashionmnist_embedding.ckpt.index
                fashionmnist_embedding.ckpt.meta
                fashionmnist_label.tsv
                projector_config.pbtxt

If you put different trials/models in different folders, you can simply compare their summaries using TensorBoard. In the example above, running

    tensorboard --logdir='ProjectFolder/Results/FashionMNIST/MLPBaseline'

will let you compare the SGD and Momentum summaries.

Jupyter Notebook

The content of the cells in a Jupyter Notebook should be concise; it is recommended to put additional definitions of functions and classes in Python files (.py). For example, you may consider the following cell structure:

In [0]: from additional_func import FLAGS
        # configure input and output file folders
        FLAGS.DEFAULT_IN = '…'
        FLAGS.DEFAULT_OUT = '…'

In [1]: import tensorflow as tf
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

In [2]: from additional_func import MLPModel
        model = MLPModel(architecture=…, )
        model.fit(x_train, y_train, validation_data=(x_vali, y_vali), …)

In [3]: # load the trained model and test it on new data
        model.evaluate(x_test, y_test)

6. Marking criterion

The first project contributes 5% of the final mark for the subject. The marks are divided among the tasks as follows:

1. Visualizing the dataset – 0.5%
2. Train and test a classification model – 1.5%
   • The model is properly defined – 0.5%
   • The expected accuracy is achieved – 0.5%
   • The model is saved and can be loaded – 0.5%
3. Monitor the training process – 0.5%
4. Profile the training process – 0.5%
5. Additional task – 2%
   • The additional models are properly defined and trained – 1%
   • The different models are properly compared, and the explanations are reasonable – 1%

Please always remember to provide appropriate and smart comments in your code.

Appendix 1 – Expected test accuracy on the Fashion MNIST dataset

A reasonable test accuracy for Fashion MNIST depends on the network architecture you use and on many other factors such as the number of epochs, the batch size, the learning rate, etc. Please try to achieve a test accuracy within ±0.025 of the expected accuracy. For example, if you are using a convolutional neural network with 2 Conv + pooling, you are expected to get an accuracy of 0.851 – 0.941. If you are unable to reach such an accuracy after many trials, please provide possible reasons.

Classifier | Pre-processing | Test accuracy
2 Conv + pooling | None | 0.876
2 Conv + pooling | None | 0.916
2 Conv + pooling + ELU activation | None | 0.903
2 Conv | Normalization, random horizontal flip, random vertical flip, random translation, random rotation | 0.919
2 Conv, <100K parameters | None | 0.925
2 Conv, ~113K parameters | Normalization | 0.922
2 Conv + 3 FC, ~1.8M parameters | Normalization | 0.932
2 Conv + 3 FC, ~500K parameters | Augmentation, batch normalization | 0.934
2 Conv + pooling + BN | None | 0.934
2 Conv + 2 FC | Random horizontal flips | 0.939
3 Conv + 2 FC | None | 0.907
3 Conv + pooling + BN | None | 0.903
3 Conv + pooling + 2 FC + dropout | None | 0.926
3 Conv + BN + pooling | None | 0.921
5 Conv + BN + pooling | None | 0.931
CNN with optional shortcuts, dense-like connectivity | Standardization + augmentation + random erasing | 0.947
GRU + SVM | None | 0.888
GRU + SVM with dropout | None | 0.897
WRN40-4, 8.9M params | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.967
Densenet-BC, 768K params | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.954
Mobilenet | Augmentation (horizontal flips) | 0.950
Resnet18 | Normalization, random horizontal flip, random vertical flip, random translation, random rotation | 0.949
Googlenet with cross-entropy loss | None | 0.937
Alexnet with triplet loss | None | 0.899
Squeezenet with cyclical learning rate, 200 epochs | None | 0.900
Dual path network with wide resnet 28-10 | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.957
MLP 256-128-100 | None | 0.8833
VGG16, 26M parameters | None | 0.935
WRN-28-10 | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.959
WRN-28-10 + random erasing | Standard pre-processing (mean/std subtraction/division) and augmentation (random crops/horizontal flips) | 0.963
Human performance | Crowd-sourced evaluation of human (with no fashion expertise) performance: 1000 randomly sampled test images, 3 labels per image, majority labelling | 0.835
Capsule network, 8M parameters | Normalization, shift of at most 2 pixels and horizontal flip | 0.936
HOG + SVM | HOG | 0.926
Xgboost | Scaling the pixel values to mean = 0.0 and var = 1.0 | 0.898
DENSER | - | 0.953
Dyra-Net | Rescale to unit interval | 0.906
Google AutoML | 24 compute hours (higher quality) | 0.939
