- May 15, 2020
FIT5141 Semester 2, 2019

FIT5141 Assignment 2 2019
Due Date: Sunday 15th Sept
Deployment of Dataset on MongoDB and R based simple analysis of Data

Introduction

This assignment is due by 11:55pm Sunday 15th Sept. It is worth 20% of the marks for your final assessment in this unit. A penalty of 10% (of the 20 marks) per day, including each day of a weekend, will apply for late submission. This is an individual assignment and must be entirely your own work. Please note the section on plagiarism, cheating and collusion in this document.

Specification

Overview

Implement a storage solution for the dataset selected in Assignment 1, in preparation for Assignment 3. Store the dataset in a database and perform simple data analytics and visualisations using Tableau and R. The work must include analyses and visualizations that convey useful summaries of the data. The data storage solution should be implemented in MongoDB. A report must be written about the implementation and the results of the analyses.

Details

• MongoDB server setup options:
  o You can either set up a MongoDB server on your own computer and perform all operations there, or use the cloud server provided by us;
  o We will provide a MongoDB server in the cloud that each student can use to set up his or her own database. We will provide each student that needs it with their own MongoDB login and a separate database.
• Database setup requirements:
  o The dataset must be stored in more than one collection (a collection is equivalent to a table in a relational database).
    § This means you will need to store descriptive information about the data in a separate collection from the primary data (e.g. the meaning of the information in the Vic Roads dataset, such as the units of measurement, the meaning of “density” and “flow”, and so forth).
  o Use the mongoimport command to import CSV or other supported data formats.
    E.g.: mongoimport -h 18.104.22.168 -d fit5141 -c vicroads --type csv --headerline < 3003.csv
• Connecting and analyzing data using R
  o You will need to query MongoDB using R, and therefore need to install the R package rmongodb from here: http://cran.r-project.org/web/packages/rmongodb/index.html. Refer to the R documentation for how to install a package.
  o Produce simple statistical reports, in the form of tables and charts, on the data as appropriate for the business question chosen in Assignment 1, part B.
    § If you choose to change your dataset and/or business question between Assignment 1 and 2, this is ok, but change your report here accordingly.
    § Please discuss the business question and how you might report descriptive summary statistics with your tutor.
• Connect to MongoDB using Tableau (optional, for up to 5 bonus marks)
  o There are several packages available that provide an ODBC or other interface through which Tableau can access MongoDB.
  o Using Tableau, generate visualizations of the data appropriate to the business question, as described in the R section above. These can be similar summary statistics, or summary information related to that provided in the R section.

[20 marks]

Assignment will be marked on:
◦ Quality of consideration of factors affecting implementation of storage solution
◦ Quality of projected performance of storage solution
◦ Completeness and seamlessness of database setup and R integration.
◦ Innovative approach to setting up the database, interfaces, and data analysis

Resources

• MongoDB download and installation instructions (if you are using your own PC):
  http://www.mongodb.org
  http://docs.mongodb.org/manual/core/introduction/
• MongoDB management GUI umongo - http://www.edgytech.com/umongo
• The R interface for MongoDB, rmongodb, is an R package that can be installed directly through the RStudio package installation function, or manually from: http://cran.r-project.org/web/packages/rmongodb/index.html.
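
The two-collection requirement and the mongoimport step in the Details section can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the host `localhost`, the metadata collection name `vicroads_meta` and the file `vicroads_meta.csv` are hypothetical names not given in this specification, and the script prints the commands rather than running them, so no MongoDB server is needed to try it.

```shell
#!/bin/sh
# Dry-run sketch: prints the mongoimport commands instead of running them.
# ASSUMPTIONS (not from the assignment text): HOST value, the collection
# name vicroads_meta, and the file name vicroads_meta.csv.
HOST="localhost"
DB="fit5141"

# Build one mongoimport command line for a CSV file with a header row.
build_import_cmd() {
  collection="$1"
  csv_file="$2"
  printf 'mongoimport --host %s --db %s --collection %s --type csv --headerline --file %s\n' \
    "$HOST" "$DB" "$collection" "$csv_file"
}

# Primary data: the measurements themselves.
build_import_cmd vicroads 3003.csv

# Descriptive data (units, meaning of each field) in a separate collection.
build_import_cmd vicroads_meta vicroads_meta.csv
```

Once the host and file names match your own setup, each printed line can be run as a command. A server that requires a login would also need credentials, which are deliberately left out of this sketch.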

Interviews

Students will be interviewed for 15-20 minutes to confirm that the work is their own. During the interview they will be required to demonstrate:
• Setting up the MongoDB connection
• Connection to MongoDB using R
• Generation of simple statistics using R

Additional Guidelines

• The report part of the assignment should be up to 1000 words and should include:
  o Description of how the implementation was conducted
  o Examples of reported statistics and charts.
• Recommended font for the report body is Times New Roman, 12 point, single line spacing.

Please consult the lecturer if you have any enquiries about any of the aforesaid points or requirements.

Submission Requirements

This assignment is due to be submitted by 11:55pm nd September 2018

PLEASE NOTE: submission of the report component of the assignment requires only a single submission to Moodle through a Turnitin link.

Turnitin

Turnitin is a plagiarism detection system that is very effective in discovering and proving plagiarism and collusion. Submission can be made through the following. Your submission to Turnitin is your report in an MS Word file or PDF file that must be named with your student id number followed by _A1. For example, if your student id number is 12345678, then the file you submit should be named 12345678_A1.doc. Marks will be deducted for any non-compliance with any of these submission requirements.

Plagiarism, cheating and collusion

Students should consult University materials on this matter at: http://www.monash.edu.au/students/policies/academic-integrity.html

The following excerpt is from the aforementioned URL:

Plagiarism and cheating are regarded as very serious offences. In cases where cheating has been confirmed, students have been severely penalised, from losing all marks for an assignment, to facing disciplinary action at the Faculty level.
While we would wish that all our students adhere to sound ethical conduct and honesty, I will ask you to acquaint yourself with the University Plagiarism policy and procedure (http://www.policy.monash.edu/policy-bank/academic/education/conduct/plagiarism-procedures.html), which applies to students detected plagiarising. It is your responsibility to make yourself familiar with the contents of these documents.