
Bentley University, Spring 2019

CS 230

Introduction to Programming with Python

Homework 6 – Twitter Analytics

A hashtag is any word in a Tweet that begins with a # symbol; a mention is any word in a Tweet that begins with an @ symbol. In this assignment, you will use the pandas, numpy, and matplotlib modules to analyze hashtags or mentions on Twitter.
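As a rough illustration of these definitions, the sketch below pulls the hashtags or mentions out of a single Tweet. The function name extract_tags and the sample Tweet are made up for this example, and treating a "word" as anything separated by whitespace is an assumption; the example given in class may do this differently.

```python
def extract_tags(text, symbol):
    """Return the lower-cased words in text that begin with symbol ('#' or '@')."""
    tags = []
    for word in text.split():
        # Skip a bare '#' or '@' that has nothing after it.
        if word.startswith(symbol) and len(word) > 1:
            tags.append(word.lower())
    return tags


# Hypothetical sample Tweet, used only for illustration.
sample = "Join us at #Bentley with @cisSandbox for #Python help!"
print(extract_tags(sample, "#"))  # ['#bentley', '#python']
print(extract_tags(sample, "@"))  # ['@cissandbox']
```

Note that this sketch does not strip trailing punctuation, so "#python!" and "#python" would count as different words.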

Export Your Tweets

You can use the supplied sample Twitter data file, which contains Tweets from the cisSandbox account, or export your own Tweets to analyze. To export your own Tweets, first sign in to your Twitter account, then follow these steps:

1. Click your profile icon at the top right corner of the page.

2. Click Analytics from the dropdown menu below your profile icon.

3. On the Analytics page, click Tweets at the top of the page.


4. Select a date range of activity (choose a date range large enough that your Tweets contain several hashtags or mentions).


5. Click the Export Data button to export your Twitter data.

6. Save the file as tweets.csv in the same folder as the Python program you are about to write.

Analyze Your Tweets

1. Follow the example given in class to load Twitter data from the tweets.csv file into a pandas DataFrame, removing irrelevant columns.

2. Allow the user to specify whether to analyze hashtags or mentions by typing h or m.

3. Analyze the Tweet text column of the DataFrame to create a dictionary of hashtags/mentions and their corresponding frequencies (the number of times each appears). For simplicity, the DataFrame you build from this dictionary will have two columns, Hashtag and Frequency, regardless of whether you are analyzing hashtags or mentions. As you process the data, convert all words to lower case (so that #BENTLEY and #bentley will be considered the same word).

4. Create a pandas DataFrame containing each hashtag and its corresponding frequency.

5. Print the DataFrame.

6. Sort the DataFrame alphabetically by hashtag. Print the sorted DataFrame.

7. Sort the DataFrame in decreasing order by frequency. Print the sorted DataFrame.

8. Create a horizontal bar chart that plots each hashtag and its frequency. Be sure to set x and y labels for the axes, ticks for values and hashtags, and a title for your plot.

See https://matplotlib.org/gallery/lines_bars_and_markers/barh.html or https://medium.com/python-pandemonium/data-visualization-in-python-bar-graph-in-matplotlib-f1738602e9c4 for examples of how to create a horizontal bar chart.
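The counting, sorting, and plotting steps above can be sketched roughly as follows. This is only one possible shape for the program, not the example given in class. The helper names (count_tags, to_frame, plot_tags) and the stand-in Tweets are made up for illustration, and the sketch assumes the exported file's text column is headed "Tweet text".

```python
import pandas as pd
import matplotlib.pyplot as plt


def count_tags(tweets, symbol):
    """Build a {tag: frequency} dictionary from a Series of Tweet texts."""
    counts = {}
    for text in tweets.dropna():  # skip rows with no text
        for word in str(text).split():
            if word.startswith(symbol) and len(word) > 1:
                word = word.lower()  # '#BENTLEY' and '#bentley' count together
                counts[word] = counts.get(word, 0) + 1
    return counts


def to_frame(counts):
    """Convert the dictionary to a DataFrame with Hashtag and Frequency columns."""
    return pd.DataFrame({"Hashtag": list(counts.keys()),
                         "Frequency": list(counts.values())})


def plot_tags(df):
    """Draw a horizontal bar chart of the frequencies and return the Axes."""
    fig, ax = plt.subplots()
    ax.barh(df["Hashtag"], df["Frequency"])
    ax.set_xticks(range(0, int(df["Frequency"].max()) + 1))
    ax.set_xlabel("Frequency")
    ax.set_ylabel("Hashtag")
    ax.set_title("Hashtag Frequency")
    return ax


# Stand-in data for illustration. In the assignment the Series would come from
# the file instead, e.g. pd.read_csv("tweets.csv")["Tweet text"] (assumed header).
tweets = pd.Series(["Study #Python at #Bentley", "#python rocks @cisSandbox", None])

df = to_frame(count_tags(tweets, "#"))               # "#" for hashtags, "@" for mentions
print(df)
print(df.sort_values("Hashtag"))                     # alphabetical by hashtag
print(df.sort_values("Frequency", ascending=False))  # decreasing frequency
```

In a full program, the symbol passed to count_tags would be chosen from the user's h or m input, and calling plot_tags(df) followed by plt.show() would display the chart.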


Grading:

1. This assignment will be worth 8 percent of your final grade.

2. The input and output of your program need to appear in exactly the order that is shown in the sample interaction above.

3. Your program should run without syntax errors to receive any credit. If part of your program is working, you will receive partial credit, but only if the program runs without syntax errors. As you program, I highly recommend that you save an intermediate version of the .py file each time you get a piece of the program running. This way you will always have something to submit that meets at least some of the requirements.

Points:

This problem has several tasks. Complete as many as you can, or all of them for full credit.


Submission:

· Name your script tweets.py and upload it to Blackboard by the deadline.