In this R tutorial, we analyze confirmed unprovoked shark attacks in the United States from 1837 to July 26, 2018. We look at the total number of attacks in each state and display the results graphically.
Install and Load Packages
Below are the libraries that we will need to load to complete this tutorial.
install.packages("dplyr")
install.packages("ggmap")
install.packages("ggplot2")
library(dplyr)
library(ggmap)
library(ggplot2)
Download and Load the United States Shark Attacks Dataset
This data is already packaged and available for download from the Florida Museum at the University of Florida. You can also download the United States shark attack dataset from the R-ALGO dataset page, or directly as the file united_states_shark_attacks.csv.
us_attacks <- read.csv("united_states_shark_attacks.csv", stringsAsFactors = FALSE)
View the United States Shark Attacks Dataset
           State Total
1        Florida   812
2         Hawaii   159
3     California   122
4 South Carolina   102
5 North Carolina    64
6          Texas    43

    State               Total
 Length:21          Min.   :  1.00
 Class :character   1st Qu.:  2.00
 Mode  :character   Median :  8.00
                    Mean   : 66.67
                    3rd Qu.: 43.00
                    Max.   :812.00
The outputs above, from the head() and summary() functions, give a quick look at the shark attack data: 21 states, with totals ranging from 1 to 812.
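The calls that produce this kind of output are not shown above, so here is a minimal sketch. To stay self-contained it builds a six-row stand-in named us_attacks_sample (a hypothetical name) from the rows printed above, rather than reading the CSV; with the real data you would pass us_attacks instead.

```r
# Stand-in for the first six rows of the dataset (values copied from the
# head() output above); the name us_attacks_sample is an assumption.
us_attacks_sample <- data.frame(
  State = c("Florida", "Hawaii", "California",
            "South Carolina", "North Carolina", "Texas"),
  Total = c(812L, 159L, 122L, 102L, 64L, 43L),
  stringsAsFactors = FALSE
)

# First rows of the data frame (head() shows up to six by default).
head(us_attacks_sample)

# Per-column summary: length/class for character columns,
# quartiles and mean for numeric columns.
summary(us_attacks_sample)
```

Note that summary() on this six-row stand-in will not reproduce the statistics shown above, which were computed over all 21 states.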
The next step is to use the lapply() function to see the class of each column in the data.
$State
[1] "character"

$Total
[1] "integer"
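A minimal sketch of the lapply() call that gives this kind of result, applied to a small stand-in data frame (the name us_attacks_sample and its three rows are assumptions for illustration; with the real data you would pass us_attacks):

```r
# Small stand-in with the same two columns as the dataset.
us_attacks_sample <- data.frame(
  State = c("Florida", "Hawaii", "California"),
  Total = c(812L, 159L, 122L),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so lapply() applies class()
# to each column and returns a named list of class names.
column_classes <- lapply(us_attacks_sample, class)
column_classes
```

Because stringsAsFactors = FALSE was used when the data was read, State stays a character column instead of being converted to a factor.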
United States Shark Attacks Bar Chart Plot
Below is a ggplot() call that plots every state with a recorded shark attack (all of them coastal states).
ggplot(us_attacks, aes(x = State, y = Total, fill = State)) +
  geom_bar(stat = "identity") +
  xlab("States") +
  ylab("Total Shark Attacks") +
  ggtitle("United States Shark Attacks") +
  theme(axis.text.x = element_text(angle = 35, hjust = 1)) +
  theme(plot.title = element_text(hjust = 0.5))
The plot above is correct, but because State is a character column, ggplot2 orders the bars alphabetically by state name rather than by number of attacks.
United States Shark Attacks Bar Chart Plot (Ordered)
I would prefer to order the states from most attacks to fewest. We can do this by converting State to a factor with the factor() function, using the current row order as the factor levels. This relies on the rows already being sorted by Total in descending order, as the head() output suggests.
# Use the existing row order as the factor levels so ggplot2 keeps that order.
us_attacks$State <- factor(us_attacks$State, levels = us_attacks$State)
ggplot(us_attacks, aes(x = State, y = Total, fill = State)) +
  geom_bar(stat = "identity") +
  xlab("States") +
  ylab("Total Shark Attacks") +
  ggtitle("United States Shark Attacks") +
  theme(axis.text.x = element_text(angle = 35, hjust = 1)) +
  theme(plot.title = element_text(hjust = 0.5))
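If the rows are not already sorted, the factor levels can be derived from Total explicitly. The following is a minimal sketch of that variation, not the approach used above; the stand-in data frame us_attacks_sample and the helper vector ordered_states are hypothetical names, and the rows are deliberately shuffled.

```r
# Stand-in data, intentionally NOT sorted by Total.
us_attacks_sample <- data.frame(
  State = c("Texas", "Florida", "Hawaii"),
  Total = c(43L, 812L, 159L),
  stringsAsFactors = FALSE
)

# State names ordered from most attacks to fewest.
ordered_states <- us_attacks_sample$State[
  order(us_attacks_sample$Total, decreasing = TRUE)
]

# Use that ordering as the factor levels; ggplot2 draws discrete
# axes in level order, so the bars would follow it.
us_attacks_sample$State <- factor(us_attacks_sample$State,
                                  levels = ordered_states)
levels(us_attacks_sample$State)
```

Another common option is to leave the column as it is and write aes(x = reorder(State, -Total), ...) inside the ggplot() call, which orders the axis without modifying the data frame.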
Mapping Shark Attacks with get_map() and ggmap()
Now that our data is clean and ready to map, we will be using the get_map() and ggmap() functions.
The get_map() function is a smart wrapper that queries the Google Maps, OpenStreetMap, Stamen Maps or Naver Map servers for a map. Its color argument can be set to either "color" or "bw" (black and white). Let’s take a look at both, since in the next step we add a color gradient to the map to show shark attack locations.
color_map_us <- get_map(location = "United States", zoom = 4, color = "color")
ggmap(color_map_us)
bw_map_us <- get_map(location = "United States", zoom = 4, color = "bw")
ggmap(bw_map_us)
The black-and-white map may be less eye-catching on its own, but it can work better as a backdrop for the color gradient used in the next plot.
The ggmap package is a collection of functions to visualize spatial data and models on top of static maps from various online sources (such as Google Maps). It also includes tools common to those tasks, including functions for geolocation and routing. The ggmap() function itself plots the map object returned by get_map().
Before plotting the shark attacks by state on the map, we need to add the latitude and longitude of each state to the us_attacks data frame. The site ink plant code provided the latitude and longitude for all states within the United States. The values below are listed in the same order as the rows of us_attacks.
us_attacks$Latitude <- c(
  27.766279, 21.094318, 36.116203, 33.856892, 35.630066, 31.054487, 44.572021,
  40.298904, 33.040619, 42.165726, 32.806671, 37.769337, 42.230171, 39.318523,
  31.169546, 32.741646, 47.400902, 44.693947, 41.597782, 41.680893, 39.063946
)
us_attacks$Longitude <- c(
  -81.686783, -157.498337, -119.681564, -80.945007, -79.806419, -97.563461, -122.070938,
  -74.521011, -83.643074, -74.948051, -86.791130, -78.169968, -71.530106, -75.507141,
  -91.867805, -89.678696, -121.490494, -69.381927, -72.755371, -71.511780, -76.802101
)
head(us_attacks)
           State Total Latitude  Longitude
1        Florida   812 27.76628  -81.68678
2         Hawaii   159 21.09432 -157.49834
3     California   122 36.11620 -119.68156
4 South Carolina   102 33.85689  -80.94501
5 North Carolina    64 35.63007  -79.80642
6          Texas    43 31.05449  -97.56346
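Assigning coordinates by position only works if every value lines up with the right row. As an alternative, here is a minimal sketch that matches coordinates to states by name with base R's merge(). The lookup table state_coords and the result us_attacks_geo are hypothetical names, and only three states are included to keep the example short; the coordinates are the ones used above.

```r
# Stand-in for the first three rows of the attack data.
us_attacks_sample <- data.frame(
  State = c("Florida", "Hawaii", "California"),
  Total = c(812L, 159L, 122L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table, deliberately in a different row order.
state_coords <- data.frame(
  State     = c("California", "Florida", "Hawaii"),
  Latitude  = c(36.116203, 27.766279, 21.094318),
  Longitude = c(-119.681564, -81.686783, -157.498337),
  stringsAsFactors = FALSE
)

# Join on State; all.x = TRUE keeps every attack row even if a
# state has no entry in the lookup table (its coordinates become NA).
us_attacks_geo <- merge(us_attacks_sample, state_coords,
                        by = "State", all.x = TRUE)

# merge() sorts by the key column, so restore descending-Total order.
us_attacks_geo <- us_attacks_geo[order(-us_attacks_geo$Total), ]

# Fail early if any state did not find matching coordinates.
stopifnot(!anyNA(us_attacks_geo$Latitude),
          !anyNA(us_attacks_geo$Longitude))
```

The trade-off is a few extra lines in exchange for not depending on row order: a misspelled or missing state name shows up as an NA instead of silently shifting coordinates onto the wrong state.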
Now that the Longitude and Latitude are added, let’s plot the states with shark attacks in the United States.
ggmap(bw_map_us, extent = "device") +
  geom_point(data = us_attacks,
             aes(x = Longitude, y = Latitude, color = Total, size = Total)) +
  scale_colour_gradient(low = "blue", high = "red")
The map above plots the states in the United States with a recorded shark attack since 1837, with both point size and color scaled by the total number of attacks.
Note: If you see errors like the ones shown below when running the ggmap() function, please install the development version of ggmap. There’s a bug with ggmap and the new version of RStudio.
Theme element panel.border missing
Theme element axis.line.x.bottom missing
Theme element axis.ticks.x.bottom missing
Theme element axis.line.x.top missing
Theme element axis.ticks.x.top missing
Theme element axis.line.y.left missing
Theme element axis.ticks.y.left missing
Theme element axis.line.y.right missing
Theme element axis.ticks.y.right missing
Error in width_cm(guide$barwidth %||% theme$legend.key.width) :
  Unknown input
In addition: Warning messages:
1: `panel.margin` is deprecated. Please use `panel.spacing` property instead
2: Removed 1 rows containing missing values (geom_point).
Run the below: