Tutoring Case: COMP9321 Assignment 3

2020/4/5 Assignment 3 | COMP9321 20T1 | WebCMS3 https://webcms3.cse.unsw.edu.au/COMP9321/20T1/resources/44201

Assignment 3

Introduction

In this assignment you will use the provided Movie dataset and the machine learning algorithms you have learned in this course to predict, knowing only things you could know before a film was released, what the rating and revenue of the film would be. The rationale is that your client is a movie theater that would like to decide how long to reserve a theater for a movie when it is released.

Datasets

In this assignment you will be given two datasets: training.csv (https://github.com/mysilver/COMP9321-Data-Services/raw/master/20t1/assign3/training.csv) and validation.csv (https://github.com/mysilver/COMP9321-Data-Services/raw/master/20t1/assign3/validation.csv). You can use the training dataset (but not the validation dataset) for training machine learning models, and you can use the validation dataset to evaluate your solutions and avoid over-fitting.

Please note:
- This assignment is on the scale of an individual small project, and hence the specifications are deliberately left open to encourage students to submit innovative solutions.
- You can only use Scikit-learn to train your machine learning algorithm.
- Your model will be evaluated against a third dataset (available to tutors, but not to students).
- You must submit your code and a report.

Part-I: Regression (10 Marks)

In the first part of the assignment, you are asked to predict the "revenue" of movies based on the information in the provided dataset.
More specifically, you need to predict the revenue of a movie based on a subset (or all) of the following attributes (**make sure you DO NOT use rating**):

cast,crew,budget,genres,homepage,keywords,original_language,original_title,overview,production_companies,production_countries,release_date,runtime,spoken_languages,status,tagline

Part-II: Classification (10 Marks)

Using the same datasets, you must predict the rating of a movie based on a subset (or all) of the following attributes (**make sure you DO NOT use revenue**):

cast,crew,budget,genres,homepage,keywords,original_language,original_title,overview,production_companies,production_countries,release_date,runtime,spoken_languages,status,tagline

Submission

You must submit two files:
- A Python script named z{id}.py
- A report named z{id}.pdf

Python Script and Expected Output Files

Your code must run on CSE machines using the following command with two arguments:

$ python3 z{id}.py path1 path2

- path1: the path of the dataset to be used for training the model (e.g., ~/training.csv)
- path2: the path of the dataset to be used for reporting the performance of the trained model (e.g., ~/validation.csv); we may use different datasets for evaluation

For example, the following command will train your models for the first part of the assignment and use the validation dataset to report the performance:

$ python3 YOUR_ZID.py training.csv validation.csv

Your program should create 4 files in the same directory as the script:
- z{id}.PART1.summary.csv
- z{id}.PART1.output.csv
- z{id}.PART2.summary.csv
- z{id}.PART2.output.csv

For the first part of the assignment:

"z{id}.PART1.summary.csv" contains the evaluation metrics (MSR, correlation) for the model trained for the first part of the assignment.
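The two Part-I metrics can be sketched as follows. This is only an illustration, assuming that MSR denotes the mean squared error and that correlation is the Pearson coefficient between actual and predicted revenues. It uses plain Python so the definitions are explicit; the function names and the sample revenues are hypothetical, and in a real submission `sklearn.metrics.mean_squared_error` and `scipy.stats.pearsonr` should give the same values.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations (unnormalised covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical revenues, for illustration only (not taken from the dataset)
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

msr = mean_squared_error(actual, predicted)
corr = pearson_correlation(actual, predicted)
print(f"MSR={msr}, correlation={corr:.4f}")
```

Every prediction here is off by exactly 10, so the mean squared error is 100, while the correlation stays close to 1 because the predictions still track the actual values almost linearly.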
Use the given validation dataset to compute the metrics. The file should be formatted exactly as follows:

zid,MSR,correlation
YOUR_ZID,6.1,0.7

- MSR: the mean_squared_error in the regression problem
- correlation: the Pearson correlation coefficient in the regression problem

"z{id}.PART1.output.csv" stores the predicted revenues for all of the movies in the evaluation dataset (not the training dataset), and the file should be formatted exactly as follows:

movie_id,predicted_revenue
1,7655555
2,75875765
…

For the second part of the assignment:

"z{id}.PART2.summary.csv" contains the evaluation metrics (average_precision, average_recall, accuracy) for the model trained for the second part of the assignment. Use the given validation dataset to compute the metrics. The file should be formatted exactly as follows:

zid,average_precision,average_recall,accuracy
YOUR_ZID,6.1,0.7,89

- average_precision: the average precision for all classes in the classification problem
- average_recall: the average recall for all classes in the classification problem

"z{id}.PART2.output.csv" stores the predicted ratings for all of the movies in the evaluation dataset (not the training dataset), and it should be formatted exactly as follows:

movie_id,predicted_rating
1,1
2,4
…

Marking Criteria

For EACH of the parts, you will be marked based on:
- (3 marks) Your code must run and perform the designated tasks on CSE machines without problems and create the expected files.
- (3 marks) How well your model (trained on the training dataset) performs on the test dataset.
- (2 marks) You must correctly calculate the evaluation metrics (e.g., average_precision) in the output files (e.g., z{id}.PART2.summary.csv).
- (2 marks) A one-page report containing:
  - The performance of your model on the validation dataset, and how you evaluated and improved it (e.g., relying on feature selection, switching from one machine learning model to a more suitable one, etc.)
  - Problems you faced in prediction (e.g., JSON-formatted columns, keywords, missing data) and how you tried to solve them.

Plagiarism

This is an individual assignment. The work you submit must be your own work. Submission of work partially or completely derived from any other person, or jointly written with any other person, is not permitted. The penalties for such an offence may include negative marks, automatic failure of the course and possibly other academic discipline. Assignment submissions will be checked using plagiarism detection tools for both the code and the report, and then the submission will be examined manually.

Do not provide or show your assignment work to any other person, apart from the teaching staff of this course. If you knowingly provide or show your assignment work to another person for any reason, and work derived from it is submitted, you may be penalized, even if the work was submitted without your knowledge or consent. Note that it is also your duty to protect your code artifacts. If you are using any online solution to store your code artifacts (e.g., GitHub), make sure to keep the repository private and do not share access with anyone.

Reminder: Plagiarism is defined (https://student.unsw.edu.au/plagiarism) as using the words or ideas of others and presenting them as your own. UNSW and CSE treat plagiarism as academic misconduct, which means that it carries penalties as severe as being excluded from further study at UNSW.
There are several online sources to help you understand what plagiarism is and how it is dealt with at UNSW:
- Plagiarism and Academic Integrity (https://student.unsw.edu.au/plagiarism)
- UNSW Plagiarism Procedure (https://www.gs.unsw.edu.au/policy/documents/plagiarismprocedure.pdf)

Make sure that you read and understand these. Ignorance is not accepted as an excuse for plagiarism. In particular, you are also responsible for ensuring that your assignment files are not accessible by anyone but you, by setting the correct permissions in your CSE directory and code repository, if using one (e.g., GitHub and similar). Note also that plagiarism includes paying or asking another person to do a piece of work for you and then submitting it as your own work.

UNSW has an ongoing commitment to fostering a culture of learning informed by academic integrity. All UNSW staff and students have a responsibility to adhere to this principle of academic integrity. Plagiarism undermines academic integrity and is not tolerated at UNSW.

Resource created 2 days ago (Friday 03 April 2020, 12:06:06 PM), last modified 5 minutes ago (Sunday 05 April 2020, 08:59:23 PM).
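Returning to the Part-II summary file described under the expected output files: the specification asks for the average precision and average recall "for all classes". One possible reading, which is an assumption here and not confirmed by the specification, is the unweighted (macro) average over classes, which is what `sklearn.metrics.precision_score` and `recall_score` compute with `average='macro'`. A minimal plain-Python sketch of that reading, with a hypothetical function name and made-up ratings:

```python
def macro_metrics(y_true, y_pred):
    """Return (macro precision, macro recall, accuracy) for two label lists.

    A class that is never predicted gets precision 0, and a class that never
    occurs in y_true gets recall 0 (an assumption about zero-division cases).
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        actual_c = sum(1 for t in y_true if t == c)
        precisions.append(true_pos / predicted_c if predicted_c else 0.0)
        recalls.append(true_pos / actual_c if actual_c else 0.0)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return (sum(precisions) / len(classes),
            sum(recalls) / len(classes),
            accuracy)


# Hypothetical ratings, for illustration only (not taken from the dataset)
actual = [1, 1, 2, 2, 3, 3]
predicted = [1, 2, 2, 2, 3, 1]

avg_precision, avg_recall, accuracy = macro_metrics(actual, predicted)
print(avg_precision, avg_recall, accuracy)
```

Macro averaging weights every rating class equally regardless of how many movies fall into it; if the markers intend a weighted average instead, the per-class values would need to be weighted by class frequency.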
