Tutoring Case: MSDS 16

May 15, 2020

MSDS 16:954:597 Data Wrangling and Husbandry
Final Project Instructions
Spring 2020
April 2, 2020

1 Project Instructions

For the final project for the course, your assignment is essentially to wrangle some data and to show off your skills. I am intentionally not specific about how you do so, but you have the weekly assignments as models. Think of the project as the equivalent of chaining together multiple weeks of assignments: you should bring data into R, clean it, tidy it, perhaps create new variables, perhaps summarize your data, and report on it with tables and figures.

However, there are some required elements:

• You must get your data from at least two distinct sources, at least one of which must be at least somewhat difficult to work with (requires scraping or cleaning).
• You must use Git and GitHub to manage your project. If you have not done so already, please create a GitHub account.
• All of your code and the R Markdown file should run in its own directory, without any additional files or code.
• Every code chunk must be labeled.
• You must include a step where you save a tidy version of (perhaps just some of) your data as a csv file. The idea is that the csv file would be an easy place for someone else to start from.
• Your report, generated from an R Markdown file, should be as good looking and well formatted as you can make it; that includes tables and figures. Do not use echo = TRUE except as truly needed.
• We have not done statistical analyses more sophisticated than correlation and linear regression in this course and there is no need for more advanced analyses in your report. You can do so if you wish, however.
• If some parts of your project are relatively easy, you should balance that out by going into more depth in other aspects.
• Your report should explain the steps you’ve taken and why; I do not want to see just a collection of tables and figures. Feel free to describe approaches that didn’t work or were more troublesome than expected.
• I expect that you will discuss this project with others, but please avoid using datasets in common (I realize that might still happen by coincidence).

All of the work submitted must be your own. Be sure to credit the sources of your data and any other material; it is better to over-credit than to under-credit. If you have any questions about properly crediting others’ work, just see me about it.

2 Presentation or Written Project

Students will be given the following two options for their final project.

(a) Give a 5-10 minute presentation of your project during our last class on Monday, May 4 (think 5-10 slides). Besides the presentation, you will also turn in your slides and other components required for the project, including a 5-page report of your project. Students who give a presentation will have until the end of that week (May 8) to turn in their project. Because everything is now virtual, be sure you have a working webcam with microphone.

(b) Produce a 10-page formal report. Students who hand in a formal report will turn in their project at the time of the last class (May 4).

In any case, focus on why you were interested in the datasets, some of the issues in wrangling them, and a few interesting figures or tables. While keeping in mind that what was time-consuming for you may not be interesting for others, remember that the course has emphasized mechanics and that your classmates may very well be interested in, say, what regular expression you used to reformat a particular column.

3 Procedures and Dates

Submit (via Canvas) a short description of your data and plans for it by Tuesday, April 7. I will also ask for your preference for a presentation versus report as part of that “assignment”. The description should include links to your data sources. There is no grade associated with this part.

You will submit your final project (again, via Canvas) by giving the URL to clone your GitHub repository. Also submit any API keys required using the format api.key..

Your final project will be graded holistically, but I will be looking at these elements:

• That you have demonstrated your ability to use R to accomplish your tasks.
• That your code is easy to understand with supporting comments.
• That your report is well-written (i.e., clear and concise) with well-presented tables and figures.
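
As an illustration of the required step of saving a tidy version of the data as a csv file, a minimal sketch in base R might look like the following. The data frame, its column names, and the file name tidy_data.csv are all hypothetical, chosen only for this example; in a real project the reshaping would more likely use the tools from the weekly assignments.

```r
# Hypothetical wide-format data: one row per state, one column per year.
# (All names and values here are made up for illustration.)
wide <- data.frame(
  state = c("NJ", "NY"),
  `2018` = c(10, 20),
  `2019` = c(12, 24),
  check.names = FALSE
)

# Tidy it by hand: one row per state-year observation.
tidy <- data.frame(
  state = rep(wide$state, times = 2),
  year  = rep(c(2018L, 2019L), each = nrow(wide)),
  value = c(wide[["2018"]], wide[["2019"]])
)

# Write with a relative path so the project still runs in its own directory.
write.csv(tidy, "tidy_data.csv", row.names = FALSE)

# Read the file back to confirm that someone else could start from it.
check <- read.csv("tidy_data.csv")
```

Using a relative file name rather than an absolute path is what lets the csv be regenerated by anyone who clones the repository and runs the code in place.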

Author admin