In this R tutorial, we will review a few basic functions for inspecting data and then use the Plotly package to build three histograms from the used cars dataset: a basic histogram, a normalized histogram, and a linear histogram.
Install and Load Packages
To build the Plotly histograms, we first need to install and load the plotly package.
Input:
install.packages("plotly")
library(plotly)
Download and Load the Used Cars Dataset
To create the Plotly histograms, we need to download and load the used cars dataset. The dataset is already packaged and available for download from the dataset page or directly from here: Used Cars Dataset – usedcars.csv
Input:
usedcars <- read.csv("usedcars.csv", stringsAsFactors = FALSE)
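To see what stringsAsFactors = FALSE changes, the sketch below writes a tiny CSV to a temporary file and reads it back both ways. The three rows are made up for illustration and are not taken from usedcars.csv. Note that since R 4.0.0, FALSE is already the default for read.csv().

```r
# Hypothetical three-row sample; the values are for illustration only.
sample_cars <- data.frame(
  year  = c(2011L, 2010L, 2012L),
  model = c("SEL", "SE", "SES"),
  price = c(21992L, 17495L, 17500L),
  stringsAsFactors = FALSE
)

# Write the sample to a temporary CSV so the example is self-contained.
tmp_path <- tempfile(fileext = ".csv")
write.csv(sample_cars, tmp_path, row.names = FALSE)

# Read it back with text columns kept as character vectors...
cars_chr <- read.csv(tmp_path, stringsAsFactors = FALSE)
# ...and again with text columns converted to factors.
cars_fct <- read.csv(tmp_path, stringsAsFactors = TRUE)

class(cars_chr$model)
class(cars_fct$model)
```

Keeping text as character vectors avoids surprises later, such as factor levels being treated as integer codes.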
View the Used Cars Data
To build the histograms effectively, we should know what the dataset contains. A few functions help with viewing the used cars data.
str()
The str() function compactly displays the internal structure of an R object. For a data frame, it prints one line per variable, showing the variable's name, type, and first few values.
Input:
str(usedcars)
Output:
'data.frame':	150 obs. of  6 variables:
 $ year        : int  2011 2011 2011 2011 2012 2010 2011 2010 2011 2010 ...
 $ model       : chr  "SEL" "SEL" "SEL" "SEL" ...
 $ price       : int  21992 20995 19995 17809 17500 17495 17000 16995 16995 16995 ...
 $ mileage     : int  7413 10926 7351 11613 8367 25125 27393 21026 32655 36116 ...
 $ color       : chr  "Yellow" "Gray" "Silver" "Gray" ...
 $ transmission: chr  "AUTO" "AUTO" "AUTO" "AUTO" ...
summary()
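The summary() function is a generic function used to produce summaries of many kinds of objects, including the results of model-fitting functions. For a data frame, it summarizes each column: numeric columns get the minimum, quartiles, mean, and maximum, while character columns report only their length, class, and mode.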
Input:
summary(usedcars)
Output:
      year         model               price          mileage          color
 Min.   :2000   Length:150         Min.   : 3800   Min.   :  4867   Length:150
 1st Qu.:2008   Class :character   1st Qu.:10995   1st Qu.: 27200   Class :character
 Median :2009   Mode  :character   Median :13592   Median : 36385   Mode  :character
 Mean   :2009                      Mean   :12962   Mean   : 44261
 3rd Qu.:2010                      3rd Qu.:14904   3rd Qu.: 55125
 Max.   :2012                      Max.   :21992   Max.   :151479
 transmission
 Length:150
 Class :character
 Mode  :character
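Because summary() says little about character columns, a frequency table is often more informative for them. The following is a minimal sketch on made-up vectors (the values are hypothetical, not the real dataset): it pulls one statistic out of a numeric summary by name and counts the categories of a text column with table().

```r
# Hypothetical values standing in for usedcars$price and usedcars$transmission.
prices <- c(3800, 10995, 13592, 14904, 21992)
transmissions <- c("AUTO", "AUTO", "MANUAL")

# summary() on a numeric vector returns a named vector of statistics.
price_summary <- summary(prices)
median_price <- price_summary[["Median"]]

# table() counts how often each distinct value occurs.
transmission_counts <- table(transmissions)

median_price
transmission_counts
```

With the real data, the same calls would be summary(usedcars$price) and table(usedcars$transmission).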
range()
The range() function returns a vector containing the minimum and maximum of all the given arguments, in that order.
Input:
range(usedcars$price)
Output:
[1] 3800 21992
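The range is useful when planning a histogram because it tells us how wide the bins need to be for a given bin count. The sketch below works through that arithmetic on a short, hypothetical price vector whose minimum and maximum match the output above. The bin count of 10 is an arbitrary assumption, not a value from the tutorial.

```r
# Hypothetical prices; only the minimum and maximum match the real dataset.
prices <- c(3800, 10995, 13592, 14904, 21992)

price_range <- range(prices)      # c(minimum, maximum)
price_span  <- diff(price_range)  # maximum minus minimum

n_bins    <- 10                   # assumed bin count, chosen for illustration
bin_width <- price_span / n_bins  # width each bin must cover

price_range
price_span
bin_width
```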
Plotly Basic Histogram of Used Cars Price
Input:
usedcars_basic_histogram <- plot_ly(x = usedcars$price, type = "histogram")

usedcars_basic_histogram
Output: an interactive Plotly histogram of used car prices (figure not shown).
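A basic histogram can be made easier to read by limiting the number of bins and labeling the axes. The following is a sketch rather than part of the original tutorial: it uses randomly generated prices (hypothetical, so it runs without usedcars.csv), the nbinsx trace attribute to cap the bin count, and layout() to set titles. The title text and the choice of 20 bins are assumptions.

```r
library(plotly)

# Hypothetical prices, generated so the example runs without usedcars.csv.
set.seed(42)
prices <- round(rnorm(150, mean = 13000, sd = 3000))

# nbinsx sets the maximum number of bins along the x-axis;
# layout() adds a chart title and axis labels.
labeled_histogram <- plot_ly(x = prices, type = "histogram", nbinsx = 20) %>%
  layout(
    title = "Distribution of used car prices",
    xaxis = list(title = "Price"),
    yaxis = list(title = "Number of cars")
  )

labeled_histogram
```

With the real data, replace prices with usedcars$price.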
Plotly Normalized Histogram of Used Cars Price
Input:
usedcars_normalized_histogram <- plot_ly(x = usedcars$price,
                                         type = "histogram",
                                         histnorm = "probability")

usedcars_normalized_histogram
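Output: an interactive Plotly histogram of used car prices with bar heights shown as probabilities (figure not shown).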
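With histnorm = "probability", each bar shows the fraction of observations that fall in its bin, so the bar heights sum to 1. The sketch below reproduces that calculation in base R on a small, hypothetical price vector. The bin edges are chosen by hand here as an assumption; Plotly picks its own bins, so the real chart will not necessarily use these edges.

```r
# Hypothetical prices standing in for usedcars$price.
prices <- c(3800, 10995, 13592, 14904, 21992)

# Count observations per bin without drawing a plot.
# Bin edges (0, 5000, ..., 25000) are an assumption made for this example.
bins <- hist(prices, breaks = seq(0, 25000, by = 5000), plot = FALSE)

# Probability normalization: divide each count by the total count.
bin_probabilities <- bins$counts / sum(bins$counts)

bins$counts
bin_probabilities
sum(bin_probabilities)
```

The shape of the histogram is unchanged by this normalization; only the y-axis scale differs from the basic count histogram.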
Plotly Linear Histogram of Used Cars Price and Mileage
Input:
# type = 'linear' sets a linear (non-logarithmic) scale on the y-axis
usedcars_linear_histogram <- plot_ly(x = usedcars$mileage, y = usedcars$price, type = "histogram") %>%
  layout(yaxis = list(type = 'linear'))

usedcars_linear_histogram
Output: an interactive Plotly histogram with a linear y-axis (figure not shown).