Skip to main content
留学咨询

辅导案例-CMPUT 328-Assignment 10

By May 15, 2020No Comments

CMPUT 328 Assignment 10 Classification and Detection on MNIST Double Digits Dataset Overview In this assignment, you are going to do classification on the MNIST Double Digits (MNISTDD) dataset. This dataset contains gray scale images of size 64 × 64. Each image has two MNIST digits (from 0 to 9) randomly placed inside it as in this visualization: Figure 1: Visualization of the first 64 images in the MNISTDD training set. Each image is 64 × 64. Please note that the digits in these images are not taken directly from MNIST, but rather generated by Generative Adversarial Networks (GANs) trained on it. Two digits in an image may partially or completely overlap each other as shown in Figure 1 (for example, see the last image in 3rd row) though complete overlap only happens in a small fraction of images. Most images do not have digits overlapping or only partially overlapping. Two digits in an image may also be of the same class. Your task in this assignment is to tell which digits are contained in each image and where are they located. Dataset The MNISTDD dataset is divided in to 3 subsets: train, validation and test containing 55K, 5K and 10K samples respectively. A sample consists of: • Image: A 64 × 64 image that has been vectorized to a 4096-dimensional vector. • Labels: A 2-dimensional vector that has two numbers in the range [0, 9] which are the two digits in the image. Note that these two numbers are always in ascending order. For example, if digits 7 and 5 are in a image, then this two-vector will be [5, 7] and not [7, 5]. • Bounding boxes: A 2 × 4 matrix which contains two bounding boxes that mark locations of the two digits in the image. The first row contain location for the first digit in labels and the second row for the second one. Each row of the matrix has 4 numbers which represent [row of the top left corner, column of the top left corner, row of the bottom right corner, column of the bottom right corner] in the exact order. Note: it is always the case that row of the bottom right corner – row of the top left corner = column of the bottom right corner – column of the top left corner = 28. This means that each bounding boxes has a size of 28 × 28 no matter how large or small the digit inside that box is. As an example, consider the following 64 × 64 image that contains the digits 5 and 2: • Image will be the above image but flattened to a 4096-dimensional vector. • Labels will be [2, 5] • Bounding boxes will be a matrix with the first row: [11, 27, 39, 55] and the second row: [6, 2, 34, 30]. This means that the digit 2 is located between the 11th and 39th rows and between 27th and 55th columns of the image. Same goes for digit 5 – it is located between the 6th and 34th rows and between 2nd and 30th columns of the image. Each set comprises 3 .npy files which can each be read using numpy.load()to obtain the corresponding matrix stored as a numpy.ndarray. Following are detailed descriptions of the 3 files where {SET_NAME} denotes the name of the subset (train, valid or test) and is the number of samples: • {SET_NAME}_X.npy: 2D matrix with dimension [, 4096] and containing the vectorized images Each row is a single vectorized image. • {SET_NAME}_Y.npy: 2D matrix with dimension [, 2] and containing the labels. Note that the labels are always in ascending order in each row. • {SET_NAME}_bboxes.npy: 3D matrix with dimension [, 2, 4] and containing the bounding boxes. For more information, see the description of bounding boxes above. For example, following are the dimensions of the numpy.ndarray in 3 files of the train set: • train_X.npy: [55000, 4096] • train_Y.npy: [55000, 2] • train_bboxes.npy: [55000, 2, 4] Task You are provided with the train and valid sets that can be downloaded from e-class in a zip archive named MNISTDD_train_valid.zip. This contains 6 files: train_X.npy, train_Y.npy, train_bboxes.npy, valid_X.npy, valid_Y.npy, valid_bboxes.npy. The test set is NOT released. You are also provided two python files: A10_eval.py and A10_submission.py. The latter has a single function classify_and_detect that you need to complete. It takes a single matrix containing all test (or validation) images as input and returns two NumPy arrays pred_class and pred_bboxes respectively containing the classification labels and detection bounding boxes in the same format as described above. To reiterate, the two digits in each row of pred_class must be in ascending order and the two rows in pred_bboxes corresponding to that image must match this ordering too. A10_eval.py can be run to evaluate the performance of your method as in terms of the classification accuracy (% of digits classified correctly) and Intersection over Union (IOU) of the detection boxes with the ground truth boxes. You are free to add any other functions or classes you need including those in other files imported from A10_submission.py. Just make sure to submit all files needed to run your code. You can use any machine learning method to solve this problem. There is no restriction or guideline as to what algorithm you can or should use. Marking This assignment uses relative marking where all qualifying submissions will be ranked by classification accuracy and detection IOU separately. A linear scaling from 50 – 100 will then be used for lowest to highest ranked submissions to generate a score for each metric. The overall score will be the mean of the two. There is no lab quiz for this assignment. What to Submit You need to submit a single zip file containing the modified A10_submission.py along with any other python files or checkpoints needed to run it on the test set. Testing will be done by running the main function in A10_eval.py and it is your responsibility to ensure that your code can be imported and run without errors. Apart from correct functioning, there is one more qualifying criterion for inclusion in the ranking process – testing time should not exceed 10 minutes on Colab CPU. Any submission not satisfying this will get no marks for this assignment. Submission deadline is November 29, 11:55 PM. This assignment is worth 8.7% of the overall grade.

admin

Author admin

More posts by admin