STAT 412/612 Week 12: Homework

forcats and lubridate

YOUR NAME 2020-04-10

Instructions

• Submit your R Markdown file and the PDF knitted directly from it on Blackboard. Include only the code needed to answer the questions.

• Learning outcomes:

    • Manipulate factors with forcats.

    • Manipulate dates with lubridate.

Question 1: Capital Bikeshare Data

1. Load in the data containing trip information from the Capital Bikeshare program. Also load in the station information. Rename any variables that have spaces in their names.

• trip data

• station data

Note: These data were originally from http://data.codefordc.org/group/transportation.
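As a hint for the renaming step, here is a minimal sketch on a made-up CSV written to a temporary file. The column names and values are hypothetical and may not match the Bikeshare files; the sketch only shows one way to replace spaces in variable names with underscores using dplyr's rename_with().

```r
library(readr)
library(dplyr)

# Write a tiny made-up CSV to a temporary file so the example is self-contained.
# These column names are hypothetical, not the real Bikeshare headers.
path <- tempfile(fileext = ".csv")
writeLines(
  c("Start date,End date,Start station",
    "3/31/2016 23:59,4/1/2016 0:04,Made-up Station A"),
  path
)

toy <- read_csv(path, col_types = cols())

# Replace every space in a column name with an underscore.
renamed <- rename_with(toy, function(nm) gsub(" ", "_", nm, fixed = TRUE))
names(renamed)
```

Names without spaces can then be used without backticks in later dplyr calls.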

2. Parse the date-time information from the trip data. Recall the times are recorded in the America/New_York time zone, not the UTC time zone. Specify that in your parser.
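A minimal sketch of passing a time zone to a lubridate parser. The strings below are made up, and the month/day/year hour:minute layout is an assumption; choose the parser (mdy_hm(), ymd_hms(), and so on) that matches the format actually present in the trip data.

```r
library(lubridate)

# Made-up date-time strings; the real column may use a different layout.
raw <- c("3/31/2016 23:59", "7/4/2016 8:05")

# The tz argument tells the parser which zone the clock times were recorded in.
parsed <- mdy_hm(raw, tz = "America/New_York")

tz(parsed)               # zone attached to the parsed values
with_tz(parsed, "UTC")   # the same instants shown on the UTC clock
```

Without the tz argument the parser would treat the clock times as UTC, shifting every trip by several hours.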

3. Calculate the average number of trips for each weekday (Sunday, Monday, Tuesday, ...), averaging only over days that have trips. There are several days with no trips.

• Save the resulting days of week and corresponding average number of trips as a data frame called sumdf and print it out.

• It should look like this:
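As a hint, the calculation can be done in two stages: count trips per calendar day, then average those counts within each weekday. The sketch below uses five made-up timestamps (not the Bikeshare data, so its numbers will not match the expected output), and the column names start, day, trips, and avg_trips are illustrative.

```r
library(dplyr)
library(lubridate)

# Five made-up trip start times: three on Sunday 2016-01-03,
# one on Monday 2016-01-04, one on Sunday 2016-01-10.
toy <- tibble(
  start = ymd_hm(
    c("2016-01-03 08:00", "2016-01-03 09:30", "2016-01-03 17:15",
      "2016-01-04 07:45", "2016-01-10 12:00"),
    tz = "America/New_York"
  )
)

# Stage 1: one row per calendar day that has at least one trip.
daily <- toy %>%
  count(day = as_date(start), name = "trips")

# Stage 2: average the daily counts within each weekday.
by_wday <- daily %>%
  group_by(wday = wday(day, label = TRUE)) %>%
  summarize(avg_trips = mean(trips), .groups = "drop")

by_wday   # Sunday: (3 + 1) / 2 = 2; Monday: 1
```

Because days with no trips never appear in the stage 1 table, they are automatically left out of the average.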

5. In a stunning show of contempt, the IEEE Computer Society decided to add a new weekday called “Fooday” with abbreviation “Foo”. Fooday was declared the first day of the week (ahead of Sunday).

On the first Fooday ever, people used Capital Bikeshare in record numbers, yielding 15567 trips. Add Fooday as the first level to the wday variable in sumdf and add its average number of trips (now 15567 since there has only been one Fooday so far).

Hint: Create a new data frame that contains the Fooday trips and use bind_rows(). Your final data frame should look like this:
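A minimal sketch of the bind_rows() idea on a made-up table (the averages other than 15567 are invented and are not the expected output). It assumes the weekday column is converted to character before binding and rebuilt as a factor afterwards, which sidesteps any mismatch between factor levels; other approaches, such as expanding the levels first with fct_expand(), would also work.

```r
library(dplyr)
library(forcats)

days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# Made-up weekday averages standing in for the real summary table.
toy <- tibble(
  wday = factor(days, levels = days),
  avg_trips = c(10, 20, 30, 40, 50, 60, 70)
)

# One-row data frame holding the new day.
extra <- tibble(wday = "Foo", avg_trips = 15567)

combined <- toy %>%
  mutate(wday = as.character(wday)) %>%   # bind as plain text
  bind_rows(extra) %>%
  mutate(
    wday = factor(wday, levels = c(days, "Foo")),
    wday = fct_relevel(wday, "Foo")       # move "Foo" to the front
  ) %>%
  arrange(wday)                           # rows follow level order

levels(combined$wday)
```

fct_relevel() moves the named level to the first position by default, so "Foo" ends up ahead of "Sun".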

6. In another stunning show of contempt, the IEEE Computer Society decided to change the abbreviations from three letters to two letters. Change the levels of wday so that each day uses only two-letter abbreviations. Your final data frame should look like this:
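A minimal sketch of relabelling factor levels with forcats, on a stand-alone factor rather than the homework data frame. It assumes the two-letter abbreviation is simply the first two letters of each three-letter label; if a different mapping is intended, fct_recode() lets each new label be spelled out by hand.

```r
library(forcats)

lv3 <- c("Foo", "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
days3 <- factor(lv3, levels = lv3)

# Apply a function to every level; the order of the levels is preserved.
days2 <- fct_relabel(days3, function(x) substr(x, 1, 2))

levels(days2)
```

The truncated labels stay unique here, so no two levels are merged.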

7. In the stations data frame, it seems that installDate is populated by the number of milliseconds since January 1, 1970, 00:00:00 (in the America/New_York time zone). Parse this into a date-time and make a histogram of the install dates. It should look something like this:
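A minimal sketch of the conversion on made-up millisecond counts (hypothetical values, not taken from the stations file, so the plot will not match the expected one). It takes the question's description literally: the epoch is midnight on January 1, 1970 in America/New_York, and the counts are added to it as exact durations.

```r
library(lubridate)
library(ggplot2)

# Made-up millisecond counts, not from the stations data.
ms <- c(0, 86400000, 1.2e12, 1.3e12, 1.35e12)

# Epoch as described in the question: midnight in New York.
epoch_ny <- ymd_hms("1970-01-01 00:00:00", tz = "America/New_York")

# Milliseconds to seconds, added to the epoch as an exact duration.
installed <- epoch_ny + dseconds(ms / 1000)

# Histogram of the resulting date-times; the bin count is arbitrary.
p <- ggplot(data.frame(installed = installed), aes(x = installed)) +
  geom_histogram(bins = 10)
p
```

Note that as_datetime(ms / 1000, tz = "America/New_York") would instead count from midnight UTC and only display the result in New York time, which differs from the approach above by five hours.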