5CCM242A / 6CCM242B

                      Statistical Modelling


                         Steven Gilmour  

                Department of Mathematics

                    King’s College London

                            4 March 2019

Very important: This coursework counts for 20% of the marks for this module. As such, it is an individual piece of work. You should use the techniques and skills you have learned in the course. Please do not ask the teaching assistants for help with this coursework. They have been told not to answer any questions relating to it and to report to me anyone who asks for help.

Deadline: 9.00 29 March 2019. (Note: there will be no help with technical problems in Keats or Turnitin for at least 16 hours before this deadline. Submit the previous day, during normal working hours, to avoid any risk.)

The file FlightDelaysSM.csv, available on Keats, contains data on flights from airports in the USA for one month in 2015. The file contains the following variables:

schedtime the scheduled time of departure (using the 24 hour clock);

deptime the actual departure time;

distance the length of the flight (in miles);

flightnumber the flight number;

weather 0 = normal; 1 = severe;

dayweek the day of the week (1=Monday, ..., 7=Sunday);

daymonth the day of the month.

Note that this is a comma-separated file. You need to tell R this when using the read.table command. For example, if you have saved the data file to a directory named MyR, the following command should work:

     Delays <- read.table("MyR/FlightDelaysSM.csv", header=TRUE, sep=",")
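After reading the file it is worth checking that R has interpreted it as intended. The following is only a minimal sketch: so that it runs on its own, it writes a small stand-in file with the seven variable names listed above and two made-up rows (these values are purely illustrative and are not taken from FlightDelaysSM.csv), then reads it with the same read.table arguments.

```r
# Hypothetical stand-in for the real data file: the column names follow the
# variable list above, but the two rows are invented for illustration only.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "schedtime,deptime,distance,flightnumber,weather,dayweek,daymonth",
  "1455,1455,184,5935,0,4,1",
  "1640,1640,213,6155,0,4,1"
), tmp)

# Same arguments as the command above, pointed at the stand-in file.
Delays <- read.table(tmp, header = TRUE, sep = ",")

# Basic checks on the import.
str(Delays)             # variable names and the types R has assigned
nrow(Delays)            # number of rows read
colSums(is.na(Delays))  # count of missing values in each column
```

With the real file, replace tmp by the path to wherever you saved FlightDelaysSM.csv. If str shows a single column, the separator was probably not specified correctly.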

The aim of this project is to be able to predict the length of flight delays, using a statistical model. Note that the data set does not contain the length of flight delays, but it can be calculated from the actual and scheduled departure times. Think carefully about how to do this.

Carry out a thorough analysis of these data, using whatever methods you believe to be suitable. Write a brief summary of the results.

Your report should be no more than 300 words in total and include no more than one plot. There is no fixed format for structuring the report, but it should include:

a brief description of the methods you have used and why;

a brief description of the results of your analysis and what they mean in practical terms;

an explanation of how to predict the length of a flight delay, given information on all variables other than the actual departure time, illustrated by at least one example.

Please upload a single PDF file to the Turnitin link on the course page on Keats.