In this R tutorial, we will determine the most common Halloween costume names from a list of about 5,000 Halloween costumes submitted by people. We will clean the costume names with text analysis tools and visualize the most common words using a Wordcloud.
Install and Load Packages
Below are the packages that we will need to install and load to complete this tutorial.
Input:
    install.packages("tm")
    install.packages("wordcloud")
    install.packages("NLP")
    install.packages("RColorBrewer")
    install.packages("e1071")
    install.packages("SnowballC")
    install.packages("gmodels")
    library(tm)
    library(wordcloud)
    library(NLP)
    library(RColorBrewer)
    library(e1071)
    library(SnowballC)
    library(gmodels)
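Calling install.packages() on every run downloads packages again even when they are already present. As a minimal sketch, assuming you only want to know which packages are still missing, the hypothetical helper missing_packages() below (an illustration, not part of the tutorial) uses requireNamespace() to check each one:

```r
# Packages used in this tutorial
required_packages <- c("tm", "wordcloud", "NLP", "RColorBrewer",
                       "e1071", "SnowballC", "gmodels")

# Hypothetical helper: return the packages that are not yet installed
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Character vector of packages that still need installing (may be empty)
to_install <- missing_packages(required_packages)
print(to_install)
```

Passing a non-empty to_install to install.packages() then installs only what is missing.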
Download and Load the Halloween Costume Names Dataset
Since we will be using the Halloween costume names dataset, you will need to download it first. The dataset is packaged as a single CSV file, halloween_costume_names.csv, and is available from the dataset page. Save the file in your R working directory so that read.csv() can find it.
Input:
    halloween_costume_names <- read.csv("halloween_costume_names.csv", stringsAsFactors = FALSE)
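read.csv() stops with an error if the file is not where R expects it. The sketch below wraps the call in a hypothetical helper, load_costume_names() (an illustration, not part of the tutorial), that checks for the file first and confirms that the Name column is present. A temporary file holding three names from the dataset stands in for the real download:

```r
# Hypothetical helper: read the costume CSV and validate its "Name" column
load_costume_names <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (check getwd())")
  }
  costumes <- read.csv(path, stringsAsFactors = FALSE)
  if (!"Name" %in% names(costumes)) {
    stop("Expected a column called 'Name'")
  }
  costumes
}

# Stand-in for the downloaded file: a temporary CSV with three sample rows
sample_path <- tempfile(fileext = ".csv")
writeLines(c("Name", "Anglerfish", "Mad scientist", "Darth Vader"), sample_path)

sample_costumes <- load_costume_names(sample_path)
print(nrow(sample_costumes))
```

With the real data, the call would be load_costume_names("halloween_costume_names.csv").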
View the Halloween Costume Names Dataset
head() function
To get an idea of the data being processed, we can use the head() function to view a sample of the costume names. Here we print the first 20 rows.
Input:
    head(halloween_costume_names, 20)
Output:
                              Name
    1                   Anglerfish
    2                Mad scientist
    3                  Darth Vader
    4               Page of Swords
    5              Lego Minifigure
    6                       Spider
    7  Person being eaten by shark
    8               Scary princess
    9                Cat from Cats
    10            Quantum Mechanic
    11         Cardboard box robot
    12               Freudian Slip
    13                      Wraith
    14                     Vampire
    15                       Ghost
    16           Teenage Porn Star
    17               Funky Chicken
    18                     pumpkin
    19                  sexy nurse
    20                    sexy cop
str() function
An alternative way to print sample data is the str() function, which compactly displays the internal structure of an R object. It can be used as an alternative to summary(): each component of the object is displayed on a single line.
Input:
    str(halloween_costume_names)
Output:
    'data.frame': 4991 obs. of 1 variable:
     $ Name: chr "Anglerfish" "Mad scientist" "Darth Vader" "Page of Swords" ...
summary() function
Another function for inspecting the halloween_costume_names data frame is summary(). The summary() function is a generic function that produces a summary of an object. For a character column such as Name, it reports the length, class, and mode.
Input:
    summary(halloween_costume_names)
Output:
         Name
     Length:4991
     Class :character
     Mode  :character
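head(), str(), and summary() show the shape of the data but not which names repeat. As a rough first look, and assuming that names differing only in letter case or surrounding spaces should count as the same costume, base R's table() can tally the exact names. The sample vector below is made up for illustration; with the real data you would pass halloween_costume_names$Name instead.

```r
# Made-up sample for illustration; these are not counts from the real dataset
sample_names <- c("Vampire", "vampire", "Ghost", " ghost ", "Witch", "Vampire")

# Normalize case and surrounding white space so near-duplicates are merged
normalized <- tolower(trimws(sample_names))

# Count each distinct name and sort from most to least common
name_counts <- sort(table(normalized), decreasing = TRUE)
print(name_counts)
```

This counts whole names only; the Wordcloud later in the tutorial counts individual words instead.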
Cleaning the Halloween Costume Names
Before the costume names can be visualized, the raw text needs to be cleaned. The main cleaning tasks are listed below:
What will be changed in the costume names?
- remove stop words
- remove extra white space
- remove punctuation
- return words to their root form (stemming)
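To make these cleaning tasks concrete, here is a minimal base R sketch that applies lowercasing, punctuation removal, stop-word removal, and white-space cleanup to plain strings. It is an illustration only: the stop-word list is a small hypothetical one, and the hypothetical clean_name() helper is not how the tm package implements these steps. Stemming is left out of this sketch.

```r
# Small hypothetical stop-word list (tm ships a much longer one)
stop_words <- c("a", "an", "the", "of", "by", "from", "being")

# Hypothetical helper: clean one or more costume names with base R only
clean_name <- function(x, stop_words) {
  x <- tolower(x)                              # lowercase
  x <- gsub("[[:punct:]]+", " ", x)            # punctuation -> space
  words <- strsplit(x, "[[:space:]]+")         # split into words
  vapply(words, function(w) {
    w <- w[nzchar(w) & !(w %in% stop_words)]   # drop blanks and stop words
    paste(w, collapse = " ")                   # rejoin with single spaces
  }, character(1))
}

cleaned <- clean_name(c("Person being eaten by shark", "  Cat from Cats! "),
                      stop_words)
print(cleaned)
# [1] "person eaten shark" "cat cats"
```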
Build the Halloween Costume Names Wordcloud
The costume names must be cleaned before the Wordcloud can be drawn. The cleaning is carried out as the following series of tasks:
- transform letters to lowercase
- remove numbers
- remove stop words
- remove all punctuation
- reduce words to their root form (stemming) using Porter’s stemming algorithm; this step is optional and is not applied in the code below
- remove extra white space
Input:
    # Build a corpus with one document per costume name
    halloween_costume_names_corpus <- VCorpus(VectorSource(halloween_costume_names$Name))
    # Lowercase, then remove numbers, stop words, punctuation, and extra white space
    halloween_costume_names_corpus_clean <- tm_map(halloween_costume_names_corpus, content_transformer(tolower))
    halloween_costume_names_corpus_clean <- tm_map(halloween_costume_names_corpus_clean, removeNumbers)
    halloween_costume_names_corpus_clean <- tm_map(halloween_costume_names_corpus_clean, removeWords, stopwords())
    halloween_costume_names_corpus_clean <- tm_map(halloween_costume_names_corpus_clean, removePunctuation)
    halloween_costume_names_corpus_clean <- tm_map(halloween_costume_names_corpus_clean, stripWhitespace)
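The listing above does not stem the words. If stemming is wanted, tm provides stemDocument(), which relies on the SnowballC package. The sketch below applies it to a tiny made-up corpus rather than the real data, so the sample names are assumptions chosen for illustration:

```r
library(tm)        # VCorpus(), VectorSource(), tm_map(), stemDocument()
library(SnowballC) # stemming backend used by stemDocument()

# Made-up, already-lowercased sample names
sample_names <- c("witches", "scary pirates", "zombie")
sample_corpus <- VCorpus(VectorSource(sample_names))

# Reduce every word in every document to its stem
sample_stemmed <- tm_map(sample_corpus, stemDocument)

# Pull the text back out of the corpus to inspect the result
stemmed_text <- vapply(sample_stemmed,
                       function(doc) paste(as.character(doc), collapse = " "),
                       character(1))
print(unname(stemmed_text))
```

In the tutorial's pipeline this would be one more tm_map() call on halloween_costume_names_corpus_clean. Note that stems are not always dictionary words, so the Wordcloud would show the shortened forms.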
Now that we have cleaned the Halloween costume names, we can create the Wordcloud.
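The wordcloud() calls below use min.freq, which leaves out words that occur fewer times than the given threshold. As a base R sketch of what that filter does, assuming a small made-up vector of already-cleaned names and a threshold of 2 rather than the tutorial's values:

```r
# Made-up cleaned names for illustration
cleaned_names <- c("sexy nurse", "sexy cop", "zombie nurse", "ghost", "zombie")

# Split every name into words and count how often each word appears
words <- unlist(strsplit(cleaned_names, " ", fixed = TRUE))
word_freq <- sort(table(words), decreasing = TRUE)

# Keep only the words that meet the threshold, as min.freq does
min_freq <- 2
frequent_words <- word_freq[word_freq >= min_freq]
print(frequent_words)
```

Here "cop" and "ghost" appear only once, so they would be left out of the cloud. Lowering the threshold shows more words; raising it keeps the cloud less crowded.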
Halloween Costume Names Wordcloud Black and White
Input:
    wordcloud(halloween_costume_names_corpus_clean, min.freq = 10, random.order = FALSE)
Output: [Figure: black-and-white Wordcloud of the costume names]
Halloween Costume Names Wordcloud Color
The RColorBrewer package provides many color palettes; running display.brewer.all() plots all of them with their names.
We can create a color variable from any of these palette names. For this tutorial, we will use the PuOr palette.
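Before settling on a palette, it can help to inspect it. The sketch below, offered as one possible way to do this, looks up PuOr in RColorBrewer's brewer.pal.info table and prints the hex codes that brewer.pal() returns for eight colors:

```r
library(RColorBrewer)

# Palette metadata: maximum number of colors, category, colorblind-friendliness
print(brewer.pal.info["PuOr", ])

# Eight colors from the PuOr palette, returned as hex strings
color <- brewer.pal(8, "PuOr")
print(color)
```

PuOr is a diverging palette, so its middle colors are pale and may be hard to read on a white background. Calling display.brewer.pal(8, "PuOr") draws the eight swatches if you prefer to judge them visually.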
Input:
    color <- brewer.pal(8, "PuOr")
    wordcloud(halloween_costume_names_corpus_clean, min.freq = 8, colors = color, random.order = FALSE)
Output: [Figure: Wordcloud of the costume names in the PuOr colors]
We hope you enjoyed this tutorial. Have some fun changing up the Wordcloud with various sizes and colors.