Tutoring Case: STATS 2107

May 15, 2020

STATS 2107 Statistical Modelling and Inference II
Project
Sharon Lee
Semester 2 2019

Details

• Due date: Friday 1st Nov 2019, 5pm.
• Group submission (groups of up to four people): self-select yourselves into groups.
• A single report is to be submitted for each group.
• The report is to be typed in Rmarkdown.
• You must submit two files: the Rmarkdown (Rmd) file containing the full analysis report, and the compiled pdf or html document. (Note that marks will be awarded based on the Rmd file.)
• Each person also completes the Self and Peer Learning and Assessment Tool (SPLAT) to indicate the contribution made by each team member.

Data

The dataset provided is a subset of the 1986 Obstetric Audit, which includes births at the Flinders Medical Centre, Bedford Park, South Australia, during 1986. Each year, patients of the Department of Obstetrics and Gynaecology at the Flinders Medical Centre are interviewed regarding a wide range of medical history and basic demographics. The results of this interview are recorded in the Obstetric Audit for the year.
The provided dataset includes the following variables:

| Variable | Label | Details |
|---|---|---|
| id | ID number | A four-digit unique identifier for each birth |
| twin_id | Twin ID | Unique identifier for each twin pair (twins will have the same Twin ID) |
| Twins | Twin | Yes=twins, No=singleton |
| AgeMother | Age of mother in years | |
| AgeFather | Age of father in years | |
| Sex | Gender of infant | 1=male, 2=female |
| WghtInf | Weight of infant in grams | |
| Head | Head circumference of infant in millimetres | |
| Length | Length of infant (from crown to heel) in millimetres | |
| WghtMother | Weight of mother before delivery in kilograms | |
| Gravidity | Gravidity | 1=first baby, no previous incomplete pregnancy; 2=first baby, previous incomplete pregnancy; 3=second or subsequent baby |
| Gestation | Length of gestation in weeks | |
| Marital | Marital status | 1=first marriage, 2=second or subsequent marriage, 3=unmarried |
| Smoking_1stTrim | Smoking during first trimester | 0=no, 1=yes |
| Smoking_2ndTrim | Smoking during second trimester | 0=no, 1=yes |
| Labour | Labour onset | 1=born before arrival at hospital, 2=spontaneous, 3=induced, 4=elective caesarian section |
| Insure | Health insurance | 1=private patient, 2=health service |

Goal

Find a predictive model for the weight of infant (in singleton pregnancies only) by considering the following predictors:

• age of mother,
• age of father,
• gender of infant,
• weight of mother before delivery,
• gravidity,
• length of gestation,
• any smoking during pregnancy [0=no, 1=yes] (Note: you will need to create this variable from the data given), and
• marital status.

Marks

The project is worth 10% of the final mark of SMI. The breakdown is as follows:

| Section | Marks |
|---|---|
| Introduction | 5 |
| Data description | 5 |
| Data filtering | 5 |
| Variable description | 5 |
| Model fitting and selection | 25 |
| Final model | 10 |
| Assumption checking | 10 |
| Prediction | 20 |
| Conclusion | 10 |
| Formatting | 5 |
| Total | 100 |

Each member of the group will receive the final mark for the project unless the SPLAT indicates that individuals have not contributed to the project.
In that case penalties will apply.

Description of sections

In the following subsections, an indication is given of what each section in the final report should contain as a minimum.

Introduction

The problem is introduced with an outline of the steps involved in the analysis.

Data description

The data is described, including identifying the subjects, the variables, which of the variables are predictors, and which are response variables. The number of subjects and variables must also be given. Each variable and its levels are explained in context.

Data filtering

Each step of the data filtering performed is described with illustrative code and its output. The reasons for and effect of the cleaning are described. For example, any excluded subjects should be noted and the numbers excluded given, along with summary statistics both before and after cleaning.

Variable description

For each variable considered, there should be a section giving the type of variable, summary statistics and a plot to illustrate its distribution. Summary statistics are best given in a table and referred to in the text. A discussion of the distribution of each variable must be given.

Model fitting and selection

The model fitting process is described with the type of algorithm used, and the choice of heuristic discussed. The various models explored are compared.

Final model

The final model is given and the coefficients interpreted in context.

Assumption checking

The assumptions of the final model are checked with accompanying tables / figures to support the checking.

Prediction

Generate a plot of predicted weight of infant (with shaded 95% confidence interval bands) for the following variables:

• weight of mother before delivery, and
• length of gestation.

Conclusion

Summarise your analysis and findings in context.

Formatting

• All figures and tables should be appropriately captioned and cross-referenced in the text.
• Correct use of grammar and spelling.
• You need a title page with
  – title,
  – authors, and
  – date.

Code

The R markdown file of the report (including all analysis code) must be submitted along with the compiled PDF or HTML document.

Bonus challenge (Twins), 50 marks

Repeat the analysis investigating the effect of the following predictors on the weight of infant, taking into account both singletons and twins:

• age of mother,
• gender of infant,
• weight of mother before delivery,
• gravidity,
• length of gestation, and
• any smoking during pregnancy.

Note that the samples are no longer considered independent due to the inclusion of twins.

Hint: Look into linear mixed effects models (LME) or generalized estimating equations (GEE) for how to handle clustered (i.e. dependent) data.
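As a rough illustration of the hint on clustered data, the R sketch below derives the "any smoking during pregnancy" indicator from the two trimester variables and fits a random-intercept linear mixed effects model with `lme` from the `nlme` package. It is only a sketch under assumptions that the brief does not state: the data frame name `births` is hypothetical, every value is simulated so that the code runs on its own, `twin_id` is used as the cluster identifier on the assumption that each singleton has its own `twin_id`, and only two of the listed predictors are included for brevity.

```r
# Simulated stand-in for the audit data: all values are made up so the
# sketch is runnable; only the column names follow the variable table.
library(nlme)
set.seed(1)

n_pairs  <- 60   # hypothetical number of twin pairs
n_single <- 60   # hypothetical number of singleton births

# One row per infant. ASSUMPTION: every singleton carries its own twin_id,
# so twin_id can act as the cluster (pregnancy) identifier for all births.
cluster    <- c(rep(seq_len(n_pairs), each = 2), n_pairs + seq_len(n_single))
n          <- length(cluster)
n_clusters <- n_pairs + n_single

# Pregnancy-level quantities are shared by both twins of a pair.
gest_c   <- round(rnorm(n_clusters, mean = 38, sd = 2))
smoke1_c <- rbinom(n_clusters, 1, 0.3)
smoke2_c <- rbinom(n_clusters, 1, 0.3)
u_c      <- rnorm(n_clusters, mean = 0, sd = 300)  # shared pregnancy effect

births <- data.frame(
  twin_id         = cluster,
  Twins           = ifelse(cluster <= n_pairs, "Yes", "No"),
  Gestation       = gest_c[cluster],
  Smoking_1stTrim = smoke1_c[cluster],
  Smoking_2ndTrim = smoke2_c[cluster]
)
births$WghtInf <- 3300 + 150 * (births$Gestation - 38) +
  u_c[cluster] + rnorm(n, mean = 0, sd = 250)

# "Any smoking during pregnancy": 1 if the mother smoked in either trimester.
births$Smoking <- as.integer(births$Smoking_1stTrim == 1 |
                               births$Smoking_2ndTrim == 1)

# Random intercept per pregnancy: twins from the same pair share a deviation,
# which models the dependence between their birth weights.
fit <- lme(WghtInf ~ Gestation + Smoking,
           random = ~ 1 | twin_id,
           data = births)
summary(fit)
```

With the real data, the remaining predictors would be added to the fixed-effects formula. A GEE fit, for example with `geepack::geeglm` and an exchangeable working correlation, is the population-averaged alternative; which of the two is preferable depends on the question being asked.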


Author admin
