MLSeiyuuAnalysis

Data Collection

Python is our language of choice for this project, used for both data collection and analysis.

We first sourced our list of usernames (also known as handles) from a community-maintained list of seiyuu, then compiled them into a CSV file.
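As a minimal sketch of reading that CSV file back, the snippet below uses Python's built-in "csv" module. The column names "name" and "handle" are assumptions made for illustration; the actual layout of our file may differ.

```python
import csv
import io

def read_handles(csv_file):
    """Return the handles found in the (assumed) 'handle' column of a CSV file object."""
    handles = []
    for row in csv.DictReader(csv_file):
        # Tolerate stray whitespace and a leading "@" in the list.
        handle = (row.get("handle") or "").strip().lstrip("@")
        if handle:
            handles.append(handle)
    return handles

# In-memory stand-in for the real file; a real run would pass
# open("handles.csv", newline="", encoding="utf-8") instead (hypothetical filename).
sample = io.StringIO("name,handle\nAimi Terakawa,@aimi_sound\nYui Kondou,yun_yk28\n")
print(read_handles(sample))
```

Stripping the "@" up front means every later step can treat handles uniformly, whichever way they were written in the source list.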

Twitter offers an API that we used to collect user information and tweets. We learned that a single GET request can return up to 200 tweets. To simplify the process, we used a third-party library named Tweepy. Tweepy is a wrapper around Twitter's API that also handles authentication and "paging" for us.

We then wrote a script that reads the CSV file and calls the "GET statuses/user_timeline" API endpoint for each handle. Each request returns up to 200 Tweet objects in JSON format, containing the tweet contents and user information. Unfortunately, due to API limitations imposed by Twitter, we cannot retrieve more than a user's 3200 most recent tweets without using the paid historical search API. Not every user we researched has 3200 tweets, however.
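To show what "paging" amounts to, here is a rough sketch of the loop a wrapper runs on our behalf: request 200 tweets, remember the oldest ID seen, then ask for anything older until the timeline runs out or the 3200 cap is hit. The "fetch_page" callable and "fake_fetch" are hypothetical stand-ins for the real API call, not part of Tweepy or of our script.

```python
PAGE_SIZE = 200    # tweets per GET request
MAX_TWEETS = 3200  # timeline cap imposed by the API

def fetch_timeline(fetch_page, handle):
    """Collect up to MAX_TWEETS tweets by walking backwards through a timeline.

    fetch_page(handle, count, max_id) is a hypothetical callable returning a
    list of tweet dicts, newest first, each carrying an integer "id".
    """
    tweets = []
    max_id = None
    while len(tweets) < MAX_TWEETS:
        page = fetch_page(handle, PAGE_SIZE, max_id)
        if not page:
            break  # no older tweets left for this user
        tweets.extend(page)
        max_id = page[-1]["id"] - 1  # next request: only tweets older than this one
    return tweets[:MAX_TWEETS]

def fake_fetch(handle, count, max_id):
    # Stand-in for the network call: pretends the user has 450 tweets, ids 450..1.
    newest = 450 if max_id is None else min(max_id, 450)
    return [{"id": i} for i in range(newest, max(newest - count, 0), -1)]

print(len(fetch_timeline(fake_fetch, "aimi_sound")))
```

Passing the fetch function in as an argument keeps the loop testable without network access or API credentials.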

We then dumped the data into a JSON file for later analysis. This avoids calling the API excessively (this endpoint is limited to 1500 requests per 15 minutes when authenticated using app auth). It also ensures that all users' information is pulled within a relatively small timeframe.
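A small sketch of that caching step, using only the standard library. The file name and the "handle -> list of tweets" layout are assumptions for illustration. The helper at the end works through the rate-limit arithmetic: a full 3200-tweet timeline needs at least 16 requests of 200 tweets each.

```python
import json
import math
import os
import tempfile

def save_tweets(path, tweets_by_handle):
    """Write the collected tweets to disk so later analysis never touches the API."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Japanese text readable in the file.
        json.dump(tweets_by_handle, f, ensure_ascii=False, indent=2)

def load_tweets(path):
    """Read the cached tweets back as a dict of handle -> list of tweet dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def requests_needed(tweet_count, page_size=200):
    """Minimum number of GET requests required to page through tweet_count tweets."""
    return math.ceil(tweet_count / page_size)

# Round trip through a temporary directory (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "tweets.json")
save_tweets(path, {"aimi_sound": [{"id": 1, "text": "#ミリシタ"}]})
print(load_tweets(path)["aimi_sound"][0]["text"])
print(requests_needed(3200))
```

At 16 requests per full timeline, a 1500-request window covers roughly 93 users with maxed-out timelines, which is why a single collection run can finish in a short span of time.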

You can view an example of our data on our GitHub page.

Data Analysis

For data analysis, we wrote some scripts to help us understand the data, leveraging libraries such as "collections" and "pandas". At first we used "matplotlib", but the results weren't satisfactory: we didn't like how the graphs turned out, and we found the library difficult to use.

We looked for alternatives and found "Pygal" to be the perfect fit for our project. Pygal produces interactive charts rather than the static images we got from matplotlib.

For each user, we analysed how their favourites accumulated, along with trends such as hashtags used and users mentioned.
We also analysed the same trends across all users combined, plus metrics such as follower counts, account creation dates and following counts.
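The hashtag and mention tallies can be sketched with "collections.Counter". This assumes each tweet dict carries the "entities" structure of Twitter's tweet JSON ("hashtags" entries with a "text" field, "user_mentions" entries with a "screen_name" field). The sample tweets are made up for illustration and are not taken from our dataset.

```python
from collections import Counter

def count_trends(tweets):
    """Tally hashtags and mentioned users across a list of tweet dicts."""
    hashtags = Counter()
    mentions = Counter()
    for tweet in tweets:
        entities = tweet.get("entities", {})
        hashtags.update(tag["text"] for tag in entities.get("hashtags", []))
        mentions.update(user["screen_name"] for user in entities.get("user_mentions", []))
    return hashtags, mentions

# Made-up sample tweets, reduced to the fields the function reads.
sample_tweets = [
    {"entities": {"hashtags": [{"text": "ミリシタ"}], "user_mentions": [{"screen_name": "yun_yk28"}]}},
    {"entities": {"hashtags": [{"text": "ミリシタ"}, {"text": "imas_ml"}], "user_mentions": []}},
    {"entities": {"hashtags": [], "user_mentions": [{"screen_name": "yun_yk28"}]}},
]

hashtags, mentions = count_trends(sample_tweets)
print(hashtags.most_common(2))
print(mentions.most_common(1))
```

Running the same function over one user's tweets gives the per-user view, and running it over every user's tweets concatenated gives the combined view.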

You can see our analysis results for individual users here, or an overview of all users combined here.

Analysis Information

Analysis Results

From analysing the data, we found that:

The most used tweet origin (the platform a tweet was posted from) is "Twitter for iOS"
This counts every tweet individually, since each tweet carries its own origin tag.

User with the most followers: @aimi_sound (Aimi Terakawa)

User mentioned the most: @yun_yk28 (Yui Kondou)

Most used hashtag: #ミリシタ
Followed by: #imas_ml

Most retweets are of tweets from @imasml_theater

Most accounts were created in 2010 and 2011

and more.

You can view the full results here.
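Two of the findings above, the most used tweet origin and the account creation years, can be sketched as follows. This assumes the tweet's "source" field is an HTML anchor wrapping the client name and that "created_at" uses Twitter's timestamp format (for example "Wed Oct 10 20:19:24 +0000 2018"). The sample values are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

TAG_RE = re.compile(r"<[^>]+>")

def source_label(source_html):
    """Strip the HTML anchor around a tweet's origin, leaving only the client name."""
    return TAG_RE.sub("", source_html).strip()

def creation_year(created_at):
    """Extract the year from a Twitter-style timestamp string."""
    return datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y").year

# Made-up sample values in the assumed formats.
sources = [
    '<a href="https://example.com/ios" rel="nofollow">Twitter for iOS</a>',
    '<a href="https://example.com/ios" rel="nofollow">Twitter for iOS</a>',
    '<a href="https://example.com/web" rel="nofollow">Twitter Web Client</a>',
]
created = [
    "Mon Mar 01 09:00:00 +0000 2010",
    "Wed Oct 10 20:19:24 +0000 2018",
    "Mon Mar 01 12:30:00 +0000 2010",
]

print(Counter(source_label(s) for s in sources).most_common(1))
print(Counter(creation_year(c) for c in created).most_common(1))
```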
