Tutoring Case: STAT6083

May 15, 2020

STAT6083 Generalised Linear Models Coursework 1. General Information

Your solution to the following two questions is worth 50% of the overall marks for this module. The submission deadline is 16:00 on Thursday 12th December 2019.

Your coursework must be submitted electronically via the STAT6083 Blackboard website using the Turnitin link (in the Assignments folder, select View/Complete to submit your report). To overwrite a submitted report, upload a new version of your document before the deadline. Only the last document uploaded by the deadline will be assessed. You are strongly advised to convert your document to PDF before submitting it; otherwise some properties of your document may be lost, and tables and plots may look different in the submitted version. Note that your file must be smaller than 40MB. No paper copy of your coursework is required.

There is no word count limit for this assignment. However, the main body of your report should not exceed 8 pages (minimum margins 2.5cm for left, right, top and bottom; minimum line spacing 1.15; minimum font size Arial 12 or equivalent). Material beyond 8 pages will not be marked. An appendix of no more than 4 pages can be used to present relevant output for Question 1. The syntax used to answer Question 2 (2) should be included in an additional appendix (no page limit).

Carefully follow the guidance notes at the end of this document. Failing to follow these notes will result in marks being deducted.

It is the policy of the Department of Social Statistics and Demography that coursework is anonymous, so please do not put your name on any part of your report.

Students are encouraged to discuss and exchange ideas, since this is an important part of the educational process. However, it is not acceptable to read and gain ideas for your coursework from another student’s finished work.
If copying between pieces of coursework occurs, it will be penalised after discussion with the students. Copying includes using another student’s computer program (output or graphics) or copying materials found on the web. The software Turnitin will check your assignment for plagiarism. You will be severely penalised if plagiarism or cheating is found. More information on the School and University policy on plagiarism and cheating can be found in your Programme Booklet and in the module outline.

Extensions to the deadline for this assignment will be given only in exceptional circumstances such as illness or other serious personal difficulties. Please note that computer disk failure or loss of work due to failure to back up computer files is not sufficient grounds for an extension. The procedure for obtaining an extension and the penalty for late submission are outlined in section 3d) of the module outline.

Individual feedback will be made available through Turnitin on Blackboard on January 9 2020. Generic feedback will be provided in the revision class that will take place on January 10 2020.

Question 1 [60%]

Household income is a key variable in socioeconomic studies. Among other uses, it allows us to build measures of poverty and inequality and to formulate strategies aimed at reducing them. Unfortunately, collecting high quality income data is particularly difficult. Non-response rates for this variable are typically higher than for other socioeconomic variables, such as expenditure. Furthermore, even when a response is obtained, individuals tend to under-report their income, particularly those in the right tail of the distribution. Given this scenario, a special survey has been designed to collect high quality data on income, expenditure and other socioeconomic variables. You have been given the task of using these data to develop a model to predict the income of a household.
A description of the available variables is presented below. The data (1,200 observations) are included in the file income.txt.

Id: Household identifier.
Income: Gross weekly average household income (GBP).
Expenditure: Total household expenditure (GBP). [Includes food, clothing, transport, housing, education,…]
Type.inc: Type of income: 1= Earned Income, 2= Other Income.
House.ten: Household tenure: 1= Public rented, 2= Private rented, 3= Owned.
Sex.hh: Sex of the household head: 1= Male, 2= Female.
Lab.force: Labour force status: 1= Full time working, 2= Part time working, 3= Unemployed, 4= Economically inactive.
Hh.size: Household size: 1= 1 person, 2= 2 persons, 3= 3 persons, 4= 4 persons, 5= 5 persons or more.
Hh.adults: Number of adults in the household: 1= 1 adult, 2= 2 adults, 3= 3 adults, 4= 4 adults or more.

TASKS

(1) Present an informative univariate and multivariate descriptive analysis of your dataset. Justify why the descriptive analyses you present are relevant. [8 Marks]

(2) Propose a suitable linear regression model to describe the relationship between Income and Expenditure. Document your model building process and use diagnostic tools to assess the fit of your model. Describe the relationship between those two variables (you may use plots, statements, predicted values…). Model building and diagnostics [7 Marks] Interpretation [5 Marks]

(3) Considering all other variables in the dataset, propose a suitable regression model for Income. Document your model building process and use diagnostic tools to assess the fit of your model. Describe the relationship between the variables in your model (you may use plots, statements, predicted values…). Model building and diagnostics [20 Marks] Interpretation [12 Marks]

(4) Summarize the findings of your data analysis for a non-specialised audience (1 paragraph maximum). [8 Marks]

Question 2 [35%]

(1) The Yates correction for continuity is a well-known, albeit somewhat controversial, modification to the Pearson chi-square test that aims to prevent the overestimation of statistical significance when sample sizes are small. Provide a brief literature review of the Yates continuity correction and the controversy around its use (300 words maximum). [10 Marks]

(2) Comparison of procedures for independence testing via simulation. Consider a contingency table that contains data on two variables. Y, the variable of interest, is in the columns and takes values 1 (success) and 0 (failure). X is in the rows and is a categorical variable that takes values A and B. 50% of the observations are allocated to each category of X. Randomly generate a sample table of size n=100 assuming P(Y=1|X=A) = 0.3 and independence between X and Y. Use the sample to assess whether X and Y are statistically independent at the 5% significance level using:

a. The Pearson chi-square test (without Yates correction for continuity)
b. The Pearson chi-square test (with Yates correction for continuity)
c. A 95% confidence interval for the odds ratio, using the approximation presented in class.

Repeat this procedure B=10 000 times and use your results to obtain an empirical estimate of the Type I error of each procedure. If needed, remove from your analysis any samples with 0 cell counts. Repeat the procedure above for n = 30, 500, 1000, 10 000 and 100 000. Present your estimates of the Type I error for each procedure and discuss your findings in relation to your literature review. To get marks for this section, you must submit your R syntax in an appendix. Type I error estimates [10 Marks] Discussion [15 Marks]

General presentation of the report [5 Marks]

Guidance Notes

Structure your report according to the different questions, starting with Question 1. Use separate, appropriately numbered subsections for each task.
Please be concise, answer the question you are asked, justify the analysis tools you use for answering each question, and avoid analysis that is not relevant.

You MUST use R for building, estimating and selecting your models. A copy of your R syntax is required only for Question 2 (2). Although no marks are allocated to the syntax as such, please note that your syntax for this section should be executable.

You are allowed to discuss your general analytic strategy with your fellow students, but you must do your own analyses and interpret and write up the results yourself.

Note that there are rarely unique ‘right’ answers for any of the questions. However, an appropriate model should take into account all relevant relationships between the variables involved and (reasonably) satisfy the model assumptions.
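
The simulation described in Question 2 (2) can be sketched roughly as follows. This is only an illustration of the logic in Python using the standard library; the brief requires R for the submitted syntax. Several details are assumptions and not part of the brief: the function names (draw_table, rejections, type1_rates) are hypothetical, the odds ratio interval is taken to be the usual large-sample interval on the log odds ratio (the "approximation presented in class" is not given here), and samples with a zero cell are dropped for all three procedures so that they are compared on the same tables.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 5% standard normal quantile
CHI2_CRIT = Z ** 2               # 95% quantile of chi-square with 1 df


def draw_table(n, p, rng):
    """Draw a 2x2 table (a, b, c, d): n/2 units per X category, P(Y=1)=p in both."""
    half = n // 2
    a = sum(rng.random() < p for _ in range(half))  # X=A, Y=1
    c = sum(rng.random() < p for _ in range(half))  # X=B, Y=1
    return a, half - a, c, half - c


def rejections(a, b, c, d):
    """Return 5%-level rejection flags: (Pearson, Pearson with Yates, odds ratio CI).

    Assumes all four cells are positive.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    diff = abs(a * d - b * c)
    pearson = n * diff ** 2 / denom
    # Yates: shrink |ad - bc| by n/2, never below zero
    yates = n * max(0.0, diff - n / 2) ** 2 / denom
    # Assumed interval: log(OR) +/- Z * sqrt(1/a + 1/b + 1/c + 1/d);
    # independence is rejected when the interval for OR excludes 1
    log_or = math.log(a * d / (b * c))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (pearson > CHI2_CRIT, yates > CHI2_CRIT, abs(log_or) > Z * se_log_or)


def type1_rates(n, p=0.3, reps=2000, seed=1):
    """Empirical Type I error of each procedure under independence."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    used = 0
    for _ in range(reps):
        a, b, c, d = draw_table(n, p, rng)
        if min(a, b, c, d) == 0:
            continue  # assumption: drop zero-cell samples for all procedures
        used += 1
        for i, flag in enumerate(rejections(a, b, c, d)):
            counts[i] += flag
    if used == 0:
        raise ValueError("every simulated table had a zero cell")
    return [k / used for k in counts], used


if __name__ == "__main__":
    for n in (30, 100, 500):
        rates, used = type1_rates(n)
        print(n, used, [round(r, 3) for r in rates])
```

The defaults here (reps=2000 and small n) are chosen only so the sketch runs quickly; drawing every observation individually would be slow for B=10 000 with n=100 000. In R, the two row counts can be drawn directly with rbinom, and chisq.test with correct = FALSE or correct = TRUE gives procedures a and b.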

Author admin