data visualization using r & ggplot2 - github pages · data visualization using r & ggplot2...
TRANSCRIPT
![Page 1: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/1.jpg)
Data Visualization Using R & ggplot2
Naupaka Zimmerman (@naupakaz)Andrew Tredennick (@ATredennick)
Hat tip to Karthik Ram (@ inundata) for original slides
February 22, 2015
![Page 2: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/2.jpg)
Some housekeeping
Install some packages
install.packages("ggplot2", dependencies = TRUE)install.packages("plyr")install.packages("ggthemes")install.packages("reshape2")
![Page 3: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/3.jpg)
Section 1
Why ggplot2?
![Page 4: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/4.jpg)
Why ggplot2?
I More elegant & compact code than with base graphics
I More aesthetically pleasing defaults than lattice
I Very powerful for exploratory data analysis
![Page 5: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/5.jpg)
Why ggplot2?
I ‘gg’ is for ‘grammar of graphics’ (term by Lee Wilkinson)
I A set of terms that defines the basic components of a plot
I Used to produce figures using coherant, consistant syntax
![Page 6: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/6.jpg)
Why ggplot2?
I Supports a continuum of expertise:
I Easy to get started, plenty of power for complex figures
![Page 7: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/7.jpg)
Section 2
The Grammar
![Page 8: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/8.jpg)
Some terminology
I data I Must be a data.frame
I Gets pulled into the ggplot() object
![Page 9: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/9.jpg)
The iris dataset
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species## 1 5.1 3.5 1.4 �.2 setosa## 2 4.9 3.� 1.4 �.2 setosa## 3 4.7 3.2 1.3 �.2 setosa## 4 4.6 3.1 1.5 �.2 setosa## 5 5.� 3.6 1.4 �.2 setosa## 6 5.4 3.9 1.7 �.4 setosa
![Page 10: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/10.jpg)
plyr and reshape are key for using R
These two packages are the swiss army knives of R.I plyr
1. ddply (data frame to data frame ply)1.1 split
1.2 apply
1.3 combine
2. llply (list to list ply)3. join
![Page 11: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/11.jpg)
plyr
iris[1:2, ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species## 1 5.1 3.5 1.4 �.2 setosa## 2 4.9 3.� 1.4 �.2 setosa
# Note the use of the . function to allow Species to be used# without quotingddply(iris, .(Species), summarize,
mean.Sep.Wid = mean(Sepal.Width, na.rm = TRUE))
## Species mean.Sep.Wid## 1 setosa 3.428## 2 versicolor 2.77�## 3 virginica 2.974
![Page 12: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/12.jpg)
plyr and reshape are key for using R
These two packages are the swiss army knives of R.I reshape
1. melt2. dcast (data frame output)3. acast (vector/matrix/array output)
![Page 13: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/13.jpg)
reshape2
iris[1:2, ]
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species## 1 5.1 3.5 1.4 �.2 setosa## 2 4.9 3.� 1.4 �.2 setosa
df <- melt(iris, id.vars = "Species")df[1:2, ]
## Species variable value## 1 setosa Sepal.Length 5.1## 2 setosa Sepal.Length 4.9
![Page 14: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/14.jpg)
reshape2
df[1:2, ]
## Species variable value## 1 setosa Sepal.Length 5.1## 2 setosa Sepal.Length 4.9
dcast(df, Species ˜ variable, mean)
## Species Sepal.Length Sepal.Width Petal.Length## 1 setosa 5.��6 3.428 1.462## 2 versicolor 5.936 2.77� 4.26�## 3 virginica 6.588 2.974 5.552## Petal.Width## 1 �.246## 2 1.326## 3 2.�26
![Page 15: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/15.jpg)
Section 3
Aesthetics
![Page 16: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/16.jpg)
Some terminology
I data
I aesthetics
I How your data are represented visuallyI a.k.a. mapping
I which data on the x
I which data on the y
I but also: color, size, shape, transparency
![Page 17: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/17.jpg)
Let’s try an example
myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))summary(myplot)
## data: Sepal.Length, Sepal.Width, Petal.Length,## Petal.Width, Species [15�x5]## mapping: x = Sepal.Length, y = Sepal.Width## faceting: facet_null()
![Page 18: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/18.jpg)
Section 4
Geoms
![Page 19: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/19.jpg)
Some terminology
I data
I aesthetics
I geometry
I The geometric objects in the plot
I points, lines, polygons, etc
I shortcut functions: geom point(),geom bar(), geom line()
![Page 20: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/20.jpg)
Basic structure
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))+ geom_point()
myplot <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))myplot + geom_point()
I Specify the data and variables inside the ggplot function.
I Anything else that goes in here becomes a global setting.
I Then add layers: geometric objects, statistical models, andfacets.
![Page 21: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/21.jpg)
Quick note
I Never use qplot - short for quick plot.
I You‘ll end up unlearning and relearning a good bit.
![Page 22: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/22.jpg)
Let’s try an example
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +geom_point()
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8Sepal.Length
Sepal.W
idth
![Page 23: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/23.jpg)
Changing the aesthetics of a geom:Increase the size of points
ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width)) +geom_point(size = 3)
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8Sepal.Length
Sepal.W
idth
![Page 24: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/24.jpg)
Changing the aesthetics of a geom:Add some color
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(size = 3)
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 25: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/25.jpg)
Changing the aesthetics of a geom:Di↵erentiate points by shape
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(aes(shape = Species), size = 3)
# Why aes(shape = Species)?
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 26: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/26.jpg)
Exercise 1
# Make a small sample of the diamonds datasetd2 <- diamonds[sample(1:dim(diamonds)[1], 1���), ]
Then generate this plot below.
0
5000
10000
15000
1 2 3carat
price
colorDEFGHIJ
![Page 27: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/27.jpg)
Section 5
Stats
![Page 28: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/28.jpg)
Some terminology
I data
I aesthetics
I geometry
I stats
I Statistical transformations and datasummary
I All geoms have associated default stats, andvice versa
I e.g. binning for a histogram or fitting alinear model
![Page 29: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/29.jpg)
Built-in stat example: Boxplots
See ?geom boxplot for list of options
library(MASS)ggplot(birthwt, aes(factor(race), bwt)) + geom_boxplot()
1000
2000
3000
4000
5000
1 2 3factor(race)
bwt
![Page 30: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/30.jpg)
Built-in stat example: Boxplots
myplot <- ggplot(birthwt, aes(factor(race), bwt)) + geom_boxplot()summary(myplot)
## data: low, age, lwt, race, smoke, ptl, ht, ui, ftv,## bwt [189x1�]## mapping: x = factor(race), y = bwt## faceting: facet_null()## -----------------------------------## geom_boxplot: outlier.colour = black, outlier.shape = 16, outlier.size = 2, notch = FALSE, notchwidth = �.5, varwidth = FALSE## stat_boxplot:## position_dodge: (width = NULL, height = NULL)
![Page 31: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/31.jpg)
Section 6
Facets
![Page 32: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/32.jpg)
Some terminology
I data
I aesthetics
I geometry
I stats
I facets
I Subsetting data to make lattice plots
I Really powerful
![Page 33: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/33.jpg)
Faceting: single column, multiple rows
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_grid(Species ˜ .)
2.02.53.03.54.04.5
2.02.53.03.54.04.5
2.02.53.03.54.04.5
setosaversicolor
virginica
5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 34: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/34.jpg)
Faceting: single row, multiple columns
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_grid(. ˜ Species)
setosa versicolor virginica
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 35: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/35.jpg)
or just wrap your facets
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_wrap( ˜ Species) # notice lack of .
setosa versicolor virginica
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 36: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/36.jpg)
Section 7
Scales
![Page 37: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/37.jpg)
Some terminology
I data
I aesthetics
I geometry
I stats
I facets
I scales
I Control the mapping from data toaesthetics
I Often used for adjusting color mapping
![Page 38: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/38.jpg)
Colors
aes(color = variable) # mappingcolor = "black" # setting
# Or add it as a scalescale_fill_manual(values = c("color1", "color2"))
![Page 39: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/39.jpg)
The RColorBrewer package
library(RColorBrewer)display.brewer.all()
![Page 40: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/40.jpg)
Using a color brewer palette
df <- melt(iris, id.vars = "Species")ggplot(df, aes(Species, value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +scale_fill_brewer(palette = "Set1")
0
2
4
6
8
setosa versicolor virginicaSpecies
value
variableSepal.LengthSepal.WidthPetal.LengthPetal.Width
![Page 41: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/41.jpg)
Manual color scale
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point() +facet_grid(Species ˜ .) +scale_color_manual(values = c("red", "green", "blue"))
2.02.53.03.54.04.5
2.02.53.03.54.04.5
2.02.53.03.54.04.5
setosaversicolor
virginica
5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 42: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/42.jpg)
Refer to a color chart for beautful visualizations
http://tools.medialab.sciences-po.fr/iwanthue/
![Page 43: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/43.jpg)
Adding a continuous scale to an axis
library(MASS)ggplot(birthwt, aes(factor(race), bwt)) +
geom_boxplot(width = .2) +scale_y_continuous(labels = (paste�(1:4, " Kg")),
breaks = seq(1���, 4���, by = 1���))
1 Kg
2 Kg
3 Kg
4 Kg
1 2 3factor(race)
bwt
![Page 44: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/44.jpg)
Commonly used scales
scale_fill_discrete(); scale_colour_discrete()scale_fill_hue(); scale_color_hue()scale_fill_manual(); scale_color_manual()scale_fill_brewer(); scale_color_brewer()scale_linetype(); scale_shape_manual()
![Page 45: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/45.jpg)
Section 8
Coordinates
![Page 46: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/46.jpg)
Some terminology
I data
I aesthetics
I geometry
I stats
I facets
I scales
I coordinates
I Not going to cover this in detail
I e.g. polar coordinate plots
![Page 47: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/47.jpg)
Section 9
Putting it all together with more examples
![Page 48: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/48.jpg)
Section 10
Histograms
![Page 49: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/49.jpg)
See ?geom histogram for list of options
h <- ggplot(faithful, aes(x = waiting))h + geom_histogram(binwidth = 3�, colour = "black")
0
50
100
150
0 50 100 150waiting
count
![Page 50: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/50.jpg)
h <- ggplot(faithful, aes(x = waiting))h + geom_histogram(binwidth = 8, fill = "steelblue",colour = "black")
0
20
40
60
30 50 70 90waiting
count
![Page 51: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/51.jpg)
Section 11
Line plots
![Page 52: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/52.jpg)
climate <- read.csv("data/climate.csv", header = T)ggplot(climate, aes(Year, Anomaly1�y)) +
geom_line()
0.0
0.5
1920 1950 1980Year
Anom
aly10y
climate <- read.csv(text =
RCurl::getURL(https://raw.github.com/karthikram/ggplot-lecture/master/climate.csv))
![Page 53: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/53.jpg)
We can also plot confidence regions
climate <- read.csv("data/climate.csv", header = T)ggplot(climate, aes(Year, Anomaly1�y)) +
geom_ribbon(aes(ymin = Anomaly1�y - Unc1�y,ymax = Anomaly1�y + Unc1�y),fill = "blue", alpha = .1) +
geom_line(color = "steelblue")
0.0
0.5
1920 1950 1980Year
Anom
aly10y
![Page 54: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/54.jpg)
Section 12
Bar plots
![Page 55: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/55.jpg)
ggplot(iris, aes(Species, Sepal.Length)) +geom_bar(stat = "identity")
0
100
200
300
setosa versicolor virginicaSpecies
Sepal.Length
![Page 56: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/56.jpg)
df <- melt(iris, id.vars = "Species")ggplot(df, aes(Species, value, fill = variable)) +
geom_bar(stat = "identity")
0
250
500
750
setosa versicolor virginicaSpecies
value
variableSepal.LengthSepal.WidthPetal.LengthPetal.Width
![Page 57: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/57.jpg)
ggplot(df, aes(Species, value, fill = variable)) +geom_bar(stat = "identity", position = "dodge")
0
2
4
6
8
setosa versicolor virginicaSpecies
value
variableSepal.LengthSepal.WidthPetal.LengthPetal.Width
What’s going on with the y axis?
![Page 58: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/58.jpg)
ggplot(df, aes(Species, value, fill = variable)) +geom_bar(stat = "identity", position="dodge", color="black")
0
2
4
6
8
setosa versicolor virginicaSpecies
value
variableSepal.LengthSepal.WidthPetal.LengthPetal.Width
![Page 59: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/59.jpg)
Exercise 3
Using the d2 dataset you created earlier, generate this plot below.Take a quick look at the data first to see if it needs to be binned.
0
25
50
75
I1 SI2 SI1 VS2 VS1 VVS2 VVS1 IFclarity
coun
t
cutFairGoodVery GoodPremiumIdeal
![Page 60: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/60.jpg)
Section 13
Density Plots
![Page 61: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/61.jpg)
Density plots
ggplot(faithful, aes(waiting)) + geom_density()
0.00
0.01
0.02
0.03
50 60 70 80 90waiting
density
![Page 62: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/62.jpg)
Density plots
ggplot(faithful, aes(waiting)) +geom_density(fill = "blue", alpha = �.1)
0.00
0.01
0.02
0.03
50 60 70 80 90waiting
density
![Page 63: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/63.jpg)
ggplot(faithful, aes(waiting)) +geom_line(stat = "density")
0.01
0.02
0.03
50 60 70 80 90waiting
density
![Page 64: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/64.jpg)
Section 14
Adding smoothers
![Page 65: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/65.jpg)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(aes(shape = Species), size = 3) +geom_smooth(method = "lm")
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 66: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/66.jpg)
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(aes(shape = Species), size = 3) +geom_smooth(method = "lm") +facet_grid(. ˜ Species)
setosa versicolor virginica
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 67: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/67.jpg)
Section 15
Themes
![Page 68: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/68.jpg)
Adding themes
Themes are a great way to define custom plots.
+ theme()# see ?theme() for more options
![Page 69: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/69.jpg)
A more basic theme
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(size = 1.2, shape = 16) +facet_wrap( ˜ Species) +theme_bw()
setosa versicolor virginica
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length
Sepal.W
idth Species
setosaversicolorvirginica
![Page 70: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/70.jpg)
A themed plot
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +geom_point(size = 1.2, shape = 16) +facet_wrap( ˜ Species) +theme(legend.key = element_rect(fill = NA),
legend.position = "bottom",strip.background = element_rect(fill = NA),axis.title.y = element_text(angle = �))
![Page 71: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/71.jpg)
A themed plot
setosa versicolor virginica
2.0
2.5
3.0
3.5
4.0
4.5
5 6 7 8 5 6 7 8 5 6 7 8Sepal.Length
Sepal.Width
Species setosa versicolor virginica
![Page 72: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/72.jpg)
ggthemes library
install.packages(ggthemes)library(ggthemes)# Then add one of these themes to your plot+ theme_stata()+ theme_excel()+ theme_wsj()+ theme_solarized()
![Page 73: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/73.jpg)
Fan of Wes Anderson movies?
![Page 74: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/74.jpg)
Yup, that’s a thing
# install.packages(wesanderson)library("wesanderson")# display a palettewes_palette("Royal2", 5)
Royal2
![Page 75: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/75.jpg)
Section 16
Create functions to automate your plotting
![Page 76: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/76.jpg)
Write functions for day to day plots
my_custom_plot <- function(df, title = "", ...) {ggplot(df, ...) +ggtitle(title) +whatever_geoms() +theme(...)
}
Then just call your function to generate a plot. It’s a lot easier tofix one function that do it over and over for many plots
plot1 <- my_custom_plot(dataset1, title = "Figure 1")
![Page 77: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/77.jpg)
Section 17
Publication quality figures
![Page 78: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/78.jpg)
I If the plot is on your screen
ggsave(˜/path/to/figure/filename.png)
I If your plot is assigned to an object
ggsave(plot1, file = "˜/path/to/figure/filename.png")
I Specify a size
ggsave(file = "/path/to/figure/filename.png", width = 6,height =4)
I or any format (pdf, png, eps, svg, jpg)
ggsave(file = "/path/to/figure/filename.eps")ggsave(file = "/path/to/figure/filename.jpg")ggsave(file = "/path/to/figure/filename.pdf")
![Page 79: Data Visualization Using R & ggplot2 - GitHub Pages · Data Visualization Using R & ggplot2 Naupaka Zimmerman (@naupakaz) Andrew Tredennick (@ATredennick) Hat tip to Karthik Ram (@](https://reader031.vdocuments.net/reader031/viewer/2022022605/5b7574337f8b9a0c188d2cc9/html5/thumbnails/79.jpg)
Further help
I You’ve just scratched the surface with ggplot2.
I Practice
I Read the docs (either locally in R or athttp://docs.ggplot2.org/current/)
I Work together