National College of Ireland

BSHBISNCI4_BSHCDA4_BSHTM4, BSHBISE15NCI4_BSHCEDA4_HDSDASEP2, HDSDA_SEP1, HDSDA_SEPOL_YR1

Terminal Assessment Based Assignment [50% of Module Marks]

Release Date: 7th December 2020
Submission Deadline: 12:00 Noon 8th January 2021
___________________________________________________________________

Business Data Analysis

Dr. Ian McLoughlin, Dr. Shazia Afzal, Dr. Stephen O’Shaughnessy, John Kelly, Jorge Basilio, Brian Buckley, David O’Dwyer

This assignment counts as 50% of the overall module grading; the Continuous Assessment completed in November covers the other 50%.

Excel software and the Excel Data Analysis ToolPak should be used where indicated. Excel may also be used for any manual calculations needed.

Where relevant, sufficient evidence of calculation should be given to clearly show the calculation method used. Simple summary answers with/without formulae alone are not sufficient; the calculation process must be shown fully. “Manual” calculation steps performed in Excel (and copied into Word) are acceptable. It is the student’s own responsibility to ensure the final submission document is sufficiently legible.

The submission should be a single ‘Word’ document for uploading to the Moodle submission link, with images of Excel spreadsheets cut and pasted into the document as required. The submission document’s filename should contain the Student ID number.

This is an individual personal assignment. Co-operation or collaboration among students is strictly not allowed and may result in disqualification. Students may be asked to outline/explain in person the reason for any approach taken or solution provided.

Attachments: Excel data generator spreadsheet (separate file, required). Data provided is not actual data. Disregard and omit entirely any zero values generated.

Statistical tables, formulae or reference material should be individually sourced as required.

1. In the context of the current COVID-19 Coronavirus Pandemic, choose some aspect, of your own choice, of the development and progression of the virus and how it is being managed. Describe and discuss how you might use an analysis based on inferential statistics to examine your chosen topic. You should outline any research question or tests of hypotheses that would be relevant to your consideration.

Note: This is intended as a hypothetical discussion exercise only, based on your own personal insights of the concepts involved. It is not a requirement to source or use actual data, provide actual factual examples or reach any conclusions or outcomes to your analysis.

References should be provided to any sources of information used, using the Harvard referencing style.

Word count guideline: 300 – 500 (not including quotations, tables, graphics)

(Total 20 marks)

2. A concrete block manufacturer produces blocks at four different plants. The Quality Control Engineer wants to assess if all four plants are producing blocks of a similar strength and takes a random sample of blocks from each plant. She records the results of her measurements of the block strengths in a table.

Generate a dataset of four columns of data from the data generator spreadsheet provided using your own student number. Carry out an appropriate statistical test to compare the four sets of data. Outline and explain the choice of test you use, the research questions/hypotheses and level of significance. Formally report the result of your test and your overall conclusion.

(Total 20 marks)

3. Researchers want to find out if cyclists and buses are significantly different in relation to how long it takes them to travel through the city. A group of cyclists and a group of buses were randomly selected at the same point on their route into the city centre and timed using the traffic management CCTV system to establish the duration of their travel to another set point in the city centre. The results of the travel times of both groups were measured in seconds.

Generate two datasets from the data generator spreadsheet provided using your own student number. Carry out an appropriate statistical test to compare the two sets of data. Outline and explain the choice of test you use, the research questions/hypotheses and level of significance. Formally report the result of your test and your overall conclusion.

(Total 20 marks)

4. The actual results of the 2019 UK General Election were as follows:

Using your own student number, generate a table from the data generator spreadsheet provided to indicate the survey data from a (hypothetical) survey poll carried out in advance of the election by a survey company. The survey data indicates the proportions of the representation of different political parties expected in the election results.

(i) Specify the null and alternative hypotheses to test if a significant difference exists between the observed and expected breakdown of UK political party representation in the 2019 general election (the expected figures being based on the (hypothetical) survey poll data).
(ii) Calculate the test statistic to measure the discrepancy (if any) between the observed and the expected results.
(iii) Interpret and explain your results.

(Total 20 marks)

5. a) Using your own student number, generate a table from the data generator spreadsheet provided to indicate the number of electricity outages to residential homes over the winter period in a large rural area, as recorded by the electricity network operator, and the appropriate forecast parameters you should use for your own analysis.

Assume that you have been tasked with making a forecast of the number of outages for the winter 2020/2021 based on the figures and required parameters indicated. Calculate each of the following:

i. A simple moving average to estimate the number of outages for the winter of 2020/2021.
ii. A weighted moving average to estimate the number of outages for the winter of 2020/2021.
iii. Using Exponential Smoothing, an estimate of the number of outages for the winter of 2020/2021.
iv. Interpret and briefly explain your results.

(10 marks)

5 b) Identify online a dataset of your own choice associated with any variable (independent variable) and another dataset for a possibly associated variable (dependent variable). Each dataset should have a minimum of 36 datapoints.

Using the Simple Linear Regression model and showing in full how your calculations are arrived at, calculate the following for your chosen datasets:

i. The correlation coefficient (r).
ii. The slope of a line of best-fit for this data.
iii. The y-intercept.
iv. One example of the value of y (dependent variable) for a value of x (independent variable) of your own choice.

Generate in Excel a scatter plot of the two variables with the trendline equation and R-squared value shown on the chart. What can you tell about the relationship between these two variables?

(10 marks)

(Total 20 marks)

Marking

All calculations and supporting text, such as hypothesis statements and reports, must be clearly shown.

General Grading Scheme (Questions 2 to 5)

Research questions/Hypotheses: 10%
Justification of level of significance (α value): 5%
Justification of appropriate statistical test: 5%
Calculation of test statistic: 30%
Formal report and presentation of result: 30%
Overall conclusions: 20%

Question 1: Answers will vary, but there should be good evidence of a well-considered, realistic approach, some use of graphics and diagrams as necessary, and indication of personal consideration, research, approach and application.
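
Illustration of the forecasting methods named in Question 5 a): the three calculations can be sketched in Python as below. This is a minimal sketch only. The outage figures, the three-period window, the weights and the smoothing constant are all hypothetical placeholders, because the actual figures and parameters come from the data generator spreadsheet. The sketch also assumes one common convention for exponential smoothing, namely that the first forecast equals the first actual value; other starting conventions exist and give different results.

```python
from typing import Sequence


def simple_moving_average(history: Sequence[float], periods: int) -> float:
    """Forecast the next period as the mean of the last `periods` actual values."""
    if not 0 < periods <= len(history):
        raise ValueError("periods must be between 1 and len(history)")
    window = history[-periods:]
    return sum(window) / periods


def weighted_moving_average(history: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Forecast the next period as a weighted mean of the most recent values.

    `weights` are listed oldest to newest and are divided by their sum,
    so they do not have to add up to exactly 1.
    """
    if not 0 < len(weights) <= len(history):
        raise ValueError("need between 1 and len(history) weights")
    window = history[-len(weights):]
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


def exponential_smoothing(history: Sequence[float], alpha: float) -> float:
    """Forecast the next period using F(t+1) = F(t) + alpha * (A(t) - F(t)).

    Assumption: the forecast for the first period equals the first actual value.
    """
    if not history:
        raise ValueError("history must not be empty")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    forecast = history[0]  # assumed starting forecast
    for actual in history:
        forecast = forecast + alpha * (actual - forecast)
    return forecast


# Hypothetical outage counts for five past winters, oldest first.
outages = [52, 48, 61, 57, 63]

print("Simple moving average (3 periods):", simple_moving_average(outages, 3))
print("Weighted moving average (0.2, 0.3, 0.5):",
      weighted_moving_average(outages, [0.2, 0.3, 0.5]))
print("Exponential smoothing (alpha = 0.3):", exponential_smoothing(outages, 0.3))
```

Listing the weights oldest to newest keeps them aligned with the data as it would normally be laid out in a spreadsheet column; if the supplied parameters list the most recent weight first, the list would need to be reversed before use.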