Wednesday, September 27, 2017

Word Cloud with R and Twitter


I'm teaching R in the morning to my Advanced Programming class, so I thought: let's do some exploration with Twitter and show the power of R and its packages! In this example, which I've adapted from another source (links at the bottom), I plot word clouds by searching for trends on Twitter using hashtags. The tutorial I followed led me into a number of problems, which I had to solve before I finally got my word clouds!

Step 1: Installation using github instead of CRAN 

https://cran.r-project.org/web/packages/twitteR/README.html

In this step we install from GitHub, because installing with install.packages("twitteR") was causing authorization errors with Twitter. The following text is copied from the README.html page referenced above.

"twitteR is an R package which provides access to the Twitter API. Most functionality of the API is supported, with a bias towards API calls that are more useful in data analysis as opposed to daily interaction.

Getting Started

  • Please read the user vignette, which admittedly can get a bit out of date
  • Create a Twitter application at http://dev.twitter.com. Make sure to give the app read, write and direct message authority.
  • Take note of the following values from the Twitter app page: "API key", "API secret", "Access token", and "Access token secret".
  • You can use the CRAN version (stable) via the standard install.packages("twitteR") or use the github version. To do the latter:
  • install.packages(c("devtools", "rjson", "bit64", "httr"))
  • Make sure to restart your R session at this point
  • library(devtools)
  • install_github("geoffjentry/twitteR")
  • At this point you should have twitteR installed and can proceed:
  • library(twitteR)
  • setup_twitter_oauth("API key", "API secret")
    • The API key and API secret are from the Twitter app page above. This will lead you through httr's OAuth authentication process. I recommend you look at the man page for Token in httr for an explanation of how it handles caching.
  • You should be ready to go!
  • If you have any questions or issues, check out the mailing list "

Step 2: Get your Twitter API key and API secret

Create a new App

Add an application name and some dummy URL, agree to the terms, and create your application.


Once it is created you can find the API key and API secret in the menu in blue. You can use these in your R script to authorize this app to be used from your account.


Step 3: Install required packages
#install the necessary packages
#install.packages("twitteR")  #already installed in Step 1, so skip this line
install.packages("wordcloud")
install.packages("tm")
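
Before running the install lines above, it can help to check which packages are actually missing. The following is a minimal sketch and not part of the tutorial I followed; missing_packages is a hypothetical helper name of my own, and it relies only on base R's requireNamespace().

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this post.
needed <- c("twitteR", "wordcloud", "tm")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install only what is missing (needs a network connection):
# if (length(to_install) > 0) install.packages(to_install)
```

Note that requireNamespace() loads a package's namespace as a side effect when the package is present, which is harmless here since Step 4 loads the packages anyway.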

Step 4: Load the required libraries
library("twitteR")
library("wordcloud")
library("tm")

Step 5: Run the following lines

#necessary file for Windows
download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")
#fill in the four values from your Twitter app page (see Step 2 and the twitteR documentation)
consumer_key <- 'your key'
consumer_secret <- 'your secret'
access_token <- 'your access token'
access_secret <- 'your access secret'
setup_twitter_oauth(consumer_key,
                    consumer_secret,
                    access_token,
                    access_secret)
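
Hard-coding the keys in a script makes it easy to leak them when the file is shared with a class. As a sketch of one alternative, the keys can be read from environment variables with Sys.getenv(). The variable names (TWITTER_CONSUMER_KEY and so on) and the helper read_twitter_credentials are my own assumptions, not something twitteR defines.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables and fail early if any of them is missing.
read_twitter_credentials <- function() {
  vars <- c(consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  values <- Sys.getenv(vars, unset = "")
  missing <- vars[values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(setNames(unname(values), names(vars)))
}

# Dummy values so the sketch runs; real ones would be set outside the script,
# for example in ~/.Renviron.
Sys.setenv(TWITTER_CONSUMER_KEY = "your key",
           TWITTER_CONSUMER_SECRET = "your secret",
           TWITTER_ACCESS_TOKEN = "your access token",
           TWITTER_ACCESS_SECRET = "your access secret")

creds <- read_twitter_credentials()
print(names(creds))

# With twitteR loaded, the values would then be passed on like this:
# setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                     creds$access_token, creds$access_secret)
```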


Step 6: Start your analysis
You can experiment by replacing the hashtag in the following function call. n=500 asks Twitter for up to 500 tweets containing the hashtag #MUFC.


r_stats <- searchTwitter("#MUFC", n=500)
#should return 500 tweets, or fewer if Twitter has fewer matches
length(r_stats)
#save text
r_stats_text <- sapply(r_stats, function(x) x$getText())
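
To see what the sapply() line is doing without calling Twitter, here is a small offline sketch. The mock_tweet objects are hypothetical stand-ins that I made up; real results from searchTwitter are twitteR status objects, which this sketch only imitates through a getText() function.

```r
# Hypothetical stand-in for a twitteR status object: a list with a getText()
# function, just enough to mimic the x$getText() call.
mock_tweet <- function(text) {
  list(getText = function() text)
}

mock_results <- list(
  mock_tweet("Great win today #MUFC"),
  mock_tweet("RT: Great win today #MUFC"),
  mock_tweet("Team news is out #MUFC")
)

# Same pattern as in the post: call getText() on every element and collect
# the results into a character vector.
mock_text <- sapply(mock_results, function(x) x$getText())

print(length(mock_text))
print(mock_text)
```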


Step 7: Fixing the emoticons problem
The twitteR package seems a little old, so I was having trouble with emoticons in the tweets returned by the searchTwitter query. Some R functions were not able to handle the encoding of the emoticons, so I went looking for a solution online.

The solution is not neat, as it simply converts these emoticons into hex values which the twitteR functions can handle, although they then get printed in the word cloud. So that needs more work. It would be interesting to plot an 'emoticloud' instead of a word cloud using Twitter hashtags. Maybe an assignment for the students 😈! Notice the use of an emoticon while discussing a problem with emoticons!

#Fixing the emoticons problem
r_stats_text <- data.frame(text = iconv(r_stats_text, "latin1", "ASCII", "byte"), 
                      stringsAsFactors = FALSE)
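
To see what this iconv() call does, it can be tried on a single made-up tweet. This is only a sketch: the sample text is my own, and I pass "UTF-8" as the source encoding on the assumption that the tweet text is UTF-8. With sub = "byte", each byte that has no ASCII equivalent is normally written out as a <xx> hex escape, which is where the hex values in the word cloud come from.

```r
# A made-up tweet with an emoji (U+1F608), written as an escape so that this
# script itself contains only ASCII characters.
tweet <- "what a goal \U0001F608 #MUFC"

# sub = "byte": bytes that cannot be converted to ASCII are replaced by
# <xx> hex escapes instead of turning the whole string into NA.
escaped <- iconv(tweet, from = "UTF-8", to = "ASCII", sub = "byte")

print(escaped)
```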

Step 8: Clean up and Create the Word Cloud
#create corpus from the text column of the data frame built in Step 7
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text$text))

#clean up: lower-case, strip punctuation, drop English stop words
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removeWords, stopwords("english"))
#draw the word cloud
wordcloud(r_stats_text_corpus)
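
A word cloud is driven by word counts: the more often a word occurs, the larger it is drawn. As a rough base-R sketch of that counting step (the three sample texts are invented, and this is not how wordcloud or tm implement it internally):

```r
# Invented sample texts standing in for cleaned tweets.
texts <- c("Glory glory Man United", "United win again", "What a win")

# Lower-case, split on anything that is not a letter, drop empty strings.
words <- unlist(strsplit(tolower(texts), "[^a-z]+"))
words <- words[nzchar(words)]

# Count each word and sort from most to least frequent.
freq <- sort(table(words), decreasing = TRUE)
print(freq)

# With the wordcloud package loaded, the counts could be plotted directly:
# wordcloud(words = names(freq), freq = as.vector(freq), min.freq = 1)
```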


Here is the result of my query. You can see some garbage, as the emoticons are being shown as hex numbers. That's just a small workaround; I need to explore more to remove them completely.
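
One possible way to get rid of the leftover hex escapes is to delete them with a regular expression before the punctuation is removed (once removePunctuation has stripped the < and >, the escapes can no longer be told apart from ordinary text). This is only a sketch: the sample strings are made up, and the pattern assumes escapes of the form <xx> with two hex digits.

```r
# Made-up examples of text after the byte-escape workaround from Step 7.
escaped <- c("what a goal <f0><9f><98><88> #MUFC",
             "no emoticons here #MUFC")

# Remove every <xx> escape, then squeeze the repeated spaces left behind.
cleaned <- gsub("<[0-9a-fA-F]{2}>", "", escaped)
cleaned <- gsub(" +", " ", cleaned)

print(cleaned)
```

The pattern would also delete a literal <ab> that someone typed in a tweet, which seems an acceptable trade-off for a word cloud.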


Another result. Searching with the hashtag #muhammad reveals "love" and "prophet" as the two most used words!


This is a lot of fun! Going to experiment some more later! Now gtg as I have a lecture at 9 am and it's 4:30 am right now!

References:




Getting started with R programming


I'm working on a Mac, so this tutorial follows the steps I took on my MacBook.
Step 1: Download R

Step 2: Download RStudio
We need to download the RStudio Desktop version as we will be using it on our desktop like Python. 

Step 3: Try running R
Click on the R icon in the applications list

You will find the following window. If you're familiar with Python, this looks just like the Python console, and RStudio would be like IDLE.
The first interesting thing to do is to run some demos to see how everything works. R ships with built-in demos that we run with the demo() function, giving it the name of the demo: demo(graphics) and demo(colors). R is also a functional language, so we can pass functions as arguments to other functions.
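
Passing a function as an argument looks like this in practice. A small sketch: apply_twice is a hypothetical helper I made up for illustration, while sapply() is part of base R.

```r
# sapply() takes a function as its second argument and calls it on each element.
squares <- sapply(1:3, function(x) x^2)
print(squares)

# Hypothetical helper: takes a function f and applies it to x twice.
apply_twice <- function(f, x) {
  f(f(x))
}

print(apply_twice(sqrt, 16))
```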





Once done playing with this, we can try out RStudio, which is like IDLE in Python.



We have successfully installed R and RStudio and now we can start playing around with R. For that I'll be doing some more blog posts. Here is a nice link to get started with R