intro to graphics with ggplot2 - github pageskmaurer.github.io/documents/ggplot2intro.pdf · intro...
Intro to Graphics with ggplot2
ggplot2 in a nutshell
- Package for statistical graphics
- Developed by Hadley Wickham (an ISU alumnus)
- Designed to adhere to good graphical practices
- Supports a wide variety of plot types
- Constructs plots using the concept of layers
- http://docs.ggplot2.org/current/ for reference material
- Hadley's book ggplot2: Elegant Graphics for Data Analysis
ggplot()
The ggplot() function is the starting point for plots built with the package
- This is the "blank canvas" function
- Can set the default data and scales for the plot here
- Creates an object that can be saved
- Plot layers can be added to build up plot complexity
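A minimal sketch of the "blank canvas" idea, assuming ggplot2 is installed. The object names `p` and `p_points` are illustrative and do not come from the slides. The base plot is stored first and a layer is added afterwards.

```r
library(ggplot2)

# Blank canvas: default data and scales, but no layers yet
p <- ggplot(aes(x = carat, y = price), data = diamonds)

# Adding a layer returns a new plot object; p itself is unchanged
p_points <- p + geom_point()

length(p$layers)         # layers on the blank canvas
length(p_points$layers)  # layers after adding the points
```

Printing `p_points` (or calling `print(p_points)`) draws the plot; until then it is just a stored object.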
ggplot() structure
The ggplot() function has a basic syntax
ggplot(aes(scale=variable), data=dataset)
- The aes(..) statement: defines the connection of variables to scales
- variables: any data column we want to plot
- scales: x, y, color, size, shape, groupings, orderings, etc.
- dataset: specified with a data= statement
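As a rough illustration of how aes() connects columns to scales, the sketch below builds a mapping on its own and then hands it to ggplot(). It assumes the diamonds data that ships with ggplot2; the name `mapping` is a placeholder chosen for this example.

```r
library(ggplot2)

# Each argument name is a scale; each value is a column of the data set
mapping <- aes(x = carat, y = price, color = cut)

# The mapping stores unevaluated column names, one entry per scale
names(mapping)

# The same mapping becomes the plot default when passed to ggplot()
p <- ggplot(mapping, data = diamonds)
```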
Adding Layers to ggplot()
Now that aesthetic scales have been defined, we need to add geometric or statistical layers
ggplot(aes(scale=variable), data=dataset) +
  geom_point(aes(...), data=dataset) +
  stat_smooth(aes(...), data=dataset)
- aes(..): define in a layer if different from the default in ggplot()
- dataset: define in a layer if different from the default in ggplot()
- This allows layers to be built from multiple data sources
- http://docs.ggplot2.org/current/ for reference material
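One possible way to see layers drawing from more than one data source is sketched below. The summary data frame `cut_means` is a hypothetical helper built for this example with base R's aggregate(); it is not part of the original slides.

```r
library(ggplot2)

# Second data source: mean price within each cut (hypothetical helper)
cut_means <- aggregate(price ~ cut, data = diamonds, FUN = mean)

p <- ggplot(aes(x = cut, y = price), data = diamonds) +
  # Uses the default data and aes() from ggplot()
  geom_jitter(alpha = 0.02) +
  # Overrides the data for this layer only; the aes() defaults still apply
  geom_point(data = cut_means, color = "red", size = 4)
```

Because `cut_means` has columns named cut and price, the default mapping can be reused; a layer whose data uses different column names would need its own aes().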
Diamonds Data
We will explore the diamonds data set (preloaded along with ggplot2) using qplot for basic plotting.
The data set was scraped from a diamond exchange company database by Hadley. It contains the prices and attributes of over 50,000 diamonds.
Examining the Diamonds Data
What does the data look like?
Let's look at the top few rows of the diamonds data frame to find out!
head(diamonds)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
## 6  0.24 Very Good     J    VVS2  62.8    57   336 3.94 3.96 2.48
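Beyond head(), a couple of base R calls give a quick feel for the size and column types of the data. This is only a sketch of one way to inspect it; the helper name `column_types` is made up for this example.

```r
library(ggplot2)

# Number of rows (diamonds) and columns (attributes)
dim(diamonds)

# First class of each column, to tell numeric columns from factors
column_types <- sapply(diamonds, function(col) class(col)[1])
column_types
```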
qplot() demo
Demo of basic plot types and options using ggplot()!
Follow along with the demo by opening GraphicsIntro.R in your own R environment
Scatterplot
Basic scatter plot of diamond price vs carat weight
ggplot(aes(x=carat, y=price), data=diamonds) +
geom_point()
Scatterplot
Scatter plot of diamond price vs carat weight showing the versatility of options in ggplot2
ggplot(aes(x=carat, y=log(price), color=color), data=diamonds) +
  geom_point(alpha=.2) +
  ggtitle("Log price by carat weight, grouped by color")
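Titles and axis labels can also be set in a single call. The sketch below uses labs() from ggplot2 as one option; the label wording is an assumption made for this example rather than text from the slides.

```r
library(ggplot2)

p <- ggplot(aes(x = carat, y = log(price), color = color), data = diamonds) +
  # Transparency is a fixed setting, so it sits outside aes()
  geom_point(alpha = 0.2) +
  # labs() sets the title, axis labels and legend title together
  labs(
    title = "Log price by carat weight, grouped by color",
    x = "Carat weight",
    y = "Log price",
    color = "Color grade"
  )
```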
Your Turn
All of the Your Turn exercises for this section will use the tips data set (loaded from the reshape2 package)
data(tips, package="reshape2")
- Use qplot to build a scatterplot of variables tip and total_bill
- Use options within qplot to color points by smokers
- Clean up axis labels and add main plot title
Histograms
Basic histogram of carat weight
ggplot() +
geom_histogram(aes(x=carat), data=diamonds)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: histogram of carat (x axis) against count (y axis)]
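The message above goes away once a bin width is chosen explicitly. As a sketch, the example below picks 0.1 carats (an arbitrary value chosen for illustration) and uses layer_data() to pull out the computed bins.

```r
library(ggplot2)

p <- ggplot(aes(x = carat), data = diamonds) +
  # An explicit binwidth replaces the default of 30 bins
  geom_histogram(binwidth = 0.1)

# layer_data() returns the values computed for the layer, one row per bin
bins <- layer_data(p)
sum(bins$count)  # every diamond falls into exactly one bin
```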
Histograms
Carat weight histograms faceted by cut
ggplot(aes(x=carat), data=diamonds) +
geom_histogram(binwidth=.2) +
facet_grid(.~cut )
[Figure: histograms of carat (x axis) against count (y axis), one panel per cut: Fair, Good, Very Good, Premium, Ideal]
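facet_grid(. ~ cut) places all five panels in a single row. When that gets cramped, facet_wrap() is an alternative that wraps panels onto several rows; the sketch below assumes a three-column layout purely for illustration.

```r
library(ggplot2)

p <- ggplot(aes(x = carat), data = diamonds) +
  geom_histogram(binwidth = 0.2) +
  # Wrap the five cut panels into rows of at most three
  facet_wrap(~ cut, ncol = 3)
```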
Your Turn
- Create a new variable in the tips data frame: rate = tip/total_bill
- Use qplot to create a histogram of rate
- Change the bin width on that histogram to 0.05
- Facet this histogram by size of the group
Boxplots
Side by side boxplot of diamond prices within cut groupings
ggplot(aes(x=cut, y=price), data=diamonds) +
geom_boxplot()
Boxplots
Side by side boxplot of log prices within cut groupings with jittered values overlay
ggplot(aes(x=cut, y=log(price)), data=diamonds) +
  geom_boxplot(color="blue") +
  geom_jitter(alpha=.025) +
  ggtitle("Boxplots of log Diamond Prices Grouped by Cut Quality")
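When jittered points are drawn over a boxplot, the boxplot's own outlier markers show the same observations a second time. One possible refinement, sketched here with an assumed jitter width of 0.2, hides the outlier markers and narrows the jitter band.

```r
library(ggplot2)

p <- ggplot(aes(x = cut, y = log(price)), data = diamonds) +
  # outlier.shape = NA suppresses the boxplot's outlier points
  geom_boxplot(outlier.shape = NA, color = "blue") +
  # width limits the horizontal spread of the jittered observations
  geom_jitter(alpha = 0.025, width = 0.2)
```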
Your Turn
- Make side by side boxplots of tipping rate for males and females
- Overlay jittered points for observed values onto this boxplot
Bar plots
To investigate bar plots we will switch over to the Titanic data set
titanic <- as.data.frame(Titanic)
Data includes passenger characteristics and survival outcomes for those aboard the RMS Titanic's ill-fated maiden voyage
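Unlike diamonds, this data frame is already aggregated: each row is a combination of characteristics, and Freq holds the number of people in it. A short sketch using only base R shows the structure.

```r
# Titanic is a contingency table in base R's datasets package
titanic <- as.data.frame(Titanic)

names(titanic)     # the four characteristics plus the Freq count column
nrow(titanic)      # one row per combination of characteristics
sum(titanic$Freq)  # total number of people represented
```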
Bar Plots
Basic bar plot of survival outcomes
ggplot(aes(x=Survived, weight=Freq), data=titanic) +
geom_bar()
[Figure: bar plot of count (y axis) by Survived (x axis: No, Yes)]
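The weight=Freq mapping tells geom_bar() to sum Freq within each bar instead of counting rows. As a sketch of an equivalent approach, the totals can be computed first and drawn with geom_col(); the helper name `survival_totals` is hypothetical.

```r
library(ggplot2)

titanic <- as.data.frame(Titanic)

# Sum the counts within each survival outcome (hypothetical helper)
survival_totals <- aggregate(Freq ~ Survived, data = titanic, FUN = sum)
survival_totals

# geom_col() draws bar heights directly from the y values
p <- ggplot(aes(x = Survived, y = Freq), data = survival_totals) +
  geom_col()
```

Without the weight, geom_bar() would count the rows of the aggregated table, giving two bars of equal height.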
Bar Plots
Bar plot faceted by gender and class
ggplot(aes(x=Survived, weight=Freq), data=titanic) +
geom_bar()+
facet_grid(Sex~Class)
[Figure: bar plots of count (y axis) by Survived (x axis: No, Yes), faceted into columns by Class (1st, 2nd, 3rd, Crew) and rows by Sex (Male, Female)]
Your Turn
- Use the tips data to make a barplot for counts of smoking and non-smoking customers
- Facet using day of week and time of day to view how smoking status changes for different meal times