
Assignment ghostwriting: statistics ghostwriting, R ghostwriting, GU4206-GR5206 North American assignment ghostwriting

By admin, May 15, 2020

GU4206-GR5206
Name and UNI
3/02/2018

The STAT GU4206/GR5206 Spring 2018 Midterm is open notes and open book(s); open computer and online resources are allowed. Students are not allowed to communicate with anyone else regarding the exam, with the exception of the instructor (Gabriel Young) and the TA (Fan Gao). This includes emailing fellow students, using WeChat, and other similar forms of communication. If there is any suspicion of one or more students cheating, further investigation will take place. If students do not follow the guidelines, they will receive a zero on the exam and
potentially face more severe consequences. The exam will be posted on Canvas at 10:05 AM. Students are required to submit both the .pdf and .Rmd files on Canvas (or .html if you must) by 12:40 PM. Late exams will not be accepted. If for some reason you are unable to upload the completed exam on Canvas by 12:40 PM, immediately email the markdown file to the course TA (fg2425).

Part 1 (CDC Cancer Data – Subsetting and Plotting)

Consider the dataset BYSITE.TXT, taken directly from the website of the Centers for Disease Control and Prevention (CDC). This dataset describes incidence and mortality crude rates of several types of cancer over time. It also includes demographic variables such as RACE and SEX. The variables of interest in this exercise are YEAR, RACE, SITE, EVENT_TYPE, and CRUDE_RATE.

Load in the BYSITE.TXT dataset. Also look at the levels of the variable RACE.

    # stringsAsFactors = TRUE is needed in R >= 4.0.0, where character
    # columns are no longer converted to factors by default
    cancer <- read.table("BYSITE.TXT", sep = "|", header = TRUE,
                         na.strings = c("~", "."), stringsAsFactors = TRUE)
    dim(cancer)
    ## [1] 44982 13
    levels(cancer$RACE)
    ## [1] "All Races"                     "American Indian/Alaska Native"
    ## [3] "Asian/Pacific Islander"        "Black"
    ## [5] "Hispanic"                      "White"

Problem 1.1

Create a new dataframe named Prostate that includes only the rows for prostate cancer. Check that the Prostate dataframe has 408 rows.

    #levels(cancer$SITE)
    # code goes here

Problem 1.2

Using the Prostate dataframe from Problem 1.1, compute the average incidence crude rate for each level of RACE. To accomplish this task, use the appropriate function from the apply family. Note: first extract the rows where EVENT_TYPE equals Incidence. Then apply the appropriate apply-family function to the continuous variable CRUDE_RATE.

    #levels(cancer$EVENT_TYPE)
    # code goes here

Problem 1.3

    #levels(cancer$YEAR)
    # code goes here

Refine the Prostate dataframe by removing rows corresponding to YEAR level 2010-2014 and rows corresponding to RACE level All Races. After removing the rows, convert YEAR into a numeric variable.
Check that the new Prostate dataframe has 320 rows.

Problem 1.4

Create a new variable in the refined Prostate dataframe named RaceNew that defines three race levels: (1) white, (2) black, and (3) other. Construct a base-R plot that shows the incidence crude rate (not mortality) as a function of time (YEAR). Split the scatterplot by RaceNew. Make sure to include a legend and label the graphic appropriately.

    # code goes here

Part 2 (Basic Web Scraping)

Problem 2.1

Open the SP500.html file to get an idea of what the data table looks like. This website shows the S&P 500 monthly average closing price for every year from 1871 to 2018. Use regular expressions and the appropriate character-data functions to scrape a "nice" dataset out of the HTML code. Your final dataframe should have two variables: (1) the variable Time, which ranges from 1871 to 2018; and (2) the variable Price, which holds the corresponding SP500 price for each year. Name the final dataframe SP500.df and display both the head and the tail of the scraped dataset.

    setwd("~/Desktop/Data")
    SP500 <- readLines("SP500.html")
    head(SP500)
    ## [1] ""
    ## [2] ""
    ## [3] "
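
The subsetting steps in Problems 1.1 to 1.3 can be sketched in base R as shown below. This is a minimal sketch, not an official solution. It runs on a small made-up data frame that only mimics the BYSITE.TXT columns (the values are invented, not CDC figures). It also uses tapply() as one reasonable apply-family choice for Problem 1.2.

```r
# Toy stand-in for the CDC BYSITE.TXT data (values are made up, not CDC figures).
cancer <- data.frame(
  YEAR       = factor(c("2012", "2012", "2013", "2013", "2010-2014", "2013", "2012")),
  RACE       = factor(c("White", "Black", "White", "All Races", "White", "Hispanic", "White")),
  SITE       = factor(c(rep("Prostate", 6), "Lung and Bronchus")),
  EVENT_TYPE = factor(c("Incidence", "Incidence", "Mortality", "Incidence",
                        "Incidence", "Incidence", "Incidence")),
  CRUDE_RATE = c(120.5, 180.2, 19.1, 130.0, 125.3, 95.4, 60.0)
)

# Problem 1.1 style: keep only prostate cancer rows via logical indexing.
Prostate <- cancer[cancer$SITE == "Prostate", ]

# Problem 1.2 style: incidence rows only, then the mean CRUDE_RATE per RACE level.
incidence <- Prostate[Prostate$EVENT_TYPE == "Incidence", ]
avg_rate  <- tapply(incidence$CRUDE_RATE, incidence$RACE, mean, na.rm = TRUE)
print(avg_rate)

# Problem 1.3 style: drop the pooled YEAR level and the All Races rows.
Prostate <- Prostate[Prostate$YEAR != "2010-2014" & Prostate$RACE != "All Races", ]
Prostate <- droplevels(Prostate)

# as.numeric() on a factor returns level codes, so convert via as.character().
Prostate$YEAR <- as.numeric(as.character(Prostate$YEAR))
print(Prostate)
```

On the real data, nrow(Prostate) after the first step and after the refinement step should match the 408 and 320 rows stated in the problems.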

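For Problem 1.4, one way to build RaceNew and a grouped base-R scatterplot is sketched below. The data frame is again invented for illustration, and the color choices are arbitrary. Splitting by color within a single panel is an assumption; separate panels, for example via par(mfrow = ...), would be another reading of "split".

```r
# Toy refined data (made-up values) with a numeric YEAR, as after Problem 1.3.
Prostate <- data.frame(
  YEAR       = rep(c(2000, 2001, 2002), times = 3),
  RACE       = rep(c("White", "Black", "Hispanic"), each = 3),
  EVENT_TYPE = "Incidence",
  CRUDE_RATE = c(160, 165, 170, 230, 240, 245, 110, 115, 118)
)

# Collapse RACE into three groups: white, black, and everything else as other.
Prostate$RaceNew <- ifelse(Prostate$RACE == "White", "white",
                    ifelse(Prostate$RACE == "Black", "black", "other"))
Prostate$RaceNew <- factor(Prostate$RaceNew, levels = c("white", "black", "other"))

# Incidence rows only (not mortality), with one color per RaceNew group.
inc  <- Prostate[Prostate$EVENT_TYPE == "Incidence", ]
cols <- c(white = "steelblue", black = "firebrick", other = "darkgreen")

plot(inc$YEAR, inc$CRUDE_RATE,
     col  = cols[as.character(inc$RaceNew)], pch = 16,
     ylim = c(0, 1.1 * max(inc$CRUDE_RATE)),
     xlab = "Year", ylab = "Incidence crude rate",
     main = "Prostate cancer incidence by race group")
legend("bottomright", legend = names(cols), col = cols, pch = 16, title = "RaceNew")
```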

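Problem 2.1 can be approached with grep(), gregexpr(), and regmatches(). The real markup of SP500.html is not reproduced on this page. The sketch below therefore assumes a simple hypothetical table in which each row looks like <tr><td>YEAR</td><td>PRICE</td></tr>, and the prices are placeholders. The patterns will likely need adjusting to the actual file.

```r
# Hypothetical lines standing in for readLines("SP500.html"); the real file's
# markup is not shown here, and the prices are placeholders, not market data.
SP500 <- c(
  "<html><body><table>",
  "<tr><td>1871</td><td>4.50</td></tr>",
  "<tr><td>1872</td><td>5.10</td></tr>",
  "<tr><td>2018</td><td>2,700.00</td></tr>",
  "</table></body></html>"
)

# Keep only table rows whose cell holds a four-digit year.
row_lines <- grep("<td>[0-9]{4}</td>", SP500, value = TRUE)

# Extract every <td>...</td> cell per row, then strip the tags.
cells <- regmatches(row_lines, gregexpr("<td>[^<]*</td>", row_lines))
cells <- lapply(cells, function(x) gsub("</?td>", "", x))

# First cell is the year; second is the price (thousands separators removed).
SP500.df <- data.frame(
  Time  = as.numeric(vapply(cells, function(x) x[1], character(1))),
  Price = as.numeric(gsub(",", "", vapply(cells, function(x) x[2], character(1))))
)

print(head(SP500.df))
print(tail(SP500.df))
```

The gsub(",", "", ...) step matters because as.numeric("2,700.00") returns NA with a coercion warning.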
Author: admin
