Tutoring Case - CE807 - Assignment 2

CE807 – Assignment 2 – Final Practical Text Analytics and Report
School of Computer Science and Electronic Engineering – University of Essex

Assignment due at 11:59:59am on 21/04/2020
Electronic submission URL: https://www1.essex.ac.uk/e-learning/tools/faser2/
Please also see your student handbook for rules regarding the late submission of assignments.

On Plagiarism

The work you submit must be your own. Any material you use, whether it is from textbooks, classmates, the web or any other source, must be acknowledged in your work. All submissions are fairly and transparently checked for plagiarism. Please make sure that you provide frequent citations, but also make sure that each sentence is originally yours, i.e. the material is read and understood, and the report is written in your own words only. Do not copy and paste text and then rephrase it.

There are many different forms of plagiarism. For example, based on feedback from the SAO officer, many students were not aware that copying entire paragraphs without clearly identifying them as quotes is a form of plagiarism. Thus, please check back with your scientific writing module before you submit! Note further that plainly reusing existing software code, or merely slightly adapting it, and submitting it as one's own also constitutes plagiarism. Cite any code that you reuse, too.

In 2019, 20% of the submitted reports were plagiarised. There were also multiple cases of software code plagiarism. This number is too high and shall be 0% in 2020!

MOTIVATION: The task of eXtreme Multi-Label Classification (XMLC) deals with the problem of assigning multiple labels to a data object. The specific challenge is that each data object receives only a few labels, k, out of a pool of n labels numbering in the hundreds or thousands, i.e., k << n.
XMLC is of tremendous practical importance: industries and organisations in fields such as retailing, web-content recommendation, scientific libraries, news providers and others are dealing with it.

OBJECTIVE: Whereas Assignment 1 focused on the theoretical aspects of text classification, the objective of this assignment is to gain practical experience in designing, implementing, running and scientifically evaluating your own XMLC classifier. The dataset will be a collection of two very large-scale resources of scientific papers in economics and medicine. Each 'document' in this dataset represents a record from a large document corpus. You can also use further datasets of your choice.

SUBMISSION, ASSESSMENT AND RULES

• This assignment counts towards 75% of the overall mark for CE807.
• The assignment is to be done individually.
• Be sure to put your registration number as a comment at the top of all code and other files. Furthermore, the assessment is blind, i.e. do not put your name on any document or provide personally identifiable information.
• The assignment must be submitted in a single zipped archive containing exactly the following three subfolders:

CE807/Assignment2/
All files.

CE807/Assignment2/Task1
The report produced for Task 1. It is mandatory to include the source files (Word/LaTeX) and a PDF.

CE807/Assignment2/Task2
The commented code written to perform the classification in Task 2 (with comments describing the features and any further information external to the original datasets that you used, e.g. an entity graph or linguistic resource).

CE807/Assignment2/Task3
The scientific report produced for Task 3. It is mandatory to include the source files (Word/LaTeX) and a PDF.

Important Note: You are free to use any software you like for this assignment. Your software should first run on your laptop or in one of the CSEE labs. Start by using a small portion of the dataset.
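As one way to start with a small portion of the dataset, the sketch below draws a reproducible random subset of records. It is only an illustration: the function name `sample_documents`, the `fraction` and `seed` parameters, and the assumption that the records fit in a Python list are all hypothetical and not part of the assignment brief.

```python
import random


def sample_documents(documents, fraction, seed=0):
    """Return a reproducible random subset of the given records.

    `documents` is assumed to be an in-memory sequence of records
    (hypothetical; the real dataset format is not specified here).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    rng = random.Random(seed)  # fixed seed so experiments can be repeated
    size = max(1, round(len(documents) * fraction))
    return rng.sample(list(documents), size)


# Toy usage: keep 10% of 100 placeholder records.
records = [f"doc-{i}" for i in range(100)]
subset = sample_documents(records, fraction=0.1, seed=42)
```

Fixing the seed keeps the small-scale runs comparable before the experiments are scaled up.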
Subsequently, scale the experiments as much as possible by using the university's HPC. For details see: https://hpc.essex.ac.uk/ Furthermore, you are encouraged to explore and exploit additional datasets for XMLC. You can also use any additional data sources such as thesauri and lexical resources.

Task Description: eXtreme Multi-Label Classification of Scientific Papers

Multi-label classification is one of the standard tasks in text analytics. The objective of the assignment is to perform an eXtreme multi-label classification, XMLC for short. In an XMLC setting, k labels from a large pool of n labels are to be assigned to each data object. The classification task is extreme in two senses: first, the number of labels n is very large, with hundreds or thousands of labels. Second, only very few labels k are assigned, i.e. it holds that k << n.
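The k << n setting can be made concrete with a small sketch. It assumes that some classifier has already produced a score per candidate label (the scores, label names and pool size below are invented toy values), keeps the k best-scoring labels, and measures how small a share of the label pool each document actually uses. The helper names `top_k_labels` and `label_density` are hypothetical.

```python
import heapq


def top_k_labels(scores, k):
    """Return the k labels with the highest scores, best first."""
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [label for label, _ in best]


def label_density(label_sets, n_labels):
    """Average share of the label pool assigned per document (mean k / n)."""
    total_assigned = sum(len(labels) for labels in label_sets)
    return total_assigned / (len(label_sets) * n_labels)


# Toy data (invented for illustration): a pool of n = 1000 labels,
# of which each document carries only a handful.
n_labels = 1000
gold_labels = [
    {"inflation", "monetary policy"},
    {"labour market"},
    {"trade", "tariffs", "exports"},
]

# Hypothetical per-label scores for one document.
scores = {"inflation": 0.91, "trade": 0.12, "monetary policy": 0.77, "tariffs": 0.05}

predicted = top_k_labels(scores, k=2)
density = label_density(gold_labels, n_labels)
```

Storing each document's labels as a set rather than as a length-n indicator vector reflects the same sparsity: only the k assigned labels need to be kept.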
