R programming in Life Sciences
Introduction to R graphics.

Marcin Kierczak∗

∗Department of Medical Biochemistry and Microbiology, Faculty of Medicine and Pharmacy

Uppsala University, Uppsala, SWEDEN

04 Nov 2016, Uppsala

Marcin Kierczak Introduction to R 1 / 20


Graphics: Example – stability of the climate, courtesy of Dr. Mats Pettersson

[Figure not preserved in the transcript; only axis tick labels survive.]


Graphics: Example – airport-based Voronoi tessellation of Poland

[Figure not preserved in the transcript; only point markers survive.]


Graphics: Example – The face of Asia


Graphics: Example – Gapminder-type plot


Graphics: Graphical devices

The concept of a graphical device is crucial for understanding R graphics. A device can be a screen (default) or a file. Some R packages introduce their own devices, e.g. the Cairo device. Creating a plot entails:

- opening a graphical device (not necessary for plotting on screen),
- plotting to the graphical device,
- closing the graphical device (very important!).
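The three steps can be sketched as follows. The output path comes from tempfile() and is only an assumption for illustration; dev.cur() is used to show which device is active while plotting.

```r
# Hypothetical output file in the session's temporary directory
out_file <- tempfile(fileext = ".pdf")

pdf(out_file)                          # 1. open a file device
during <- names(dev.cur())             # name of the device that is now active
plot(x = c(1, 2, 7), y = c(2, 3, 5))   # 2. plot to the device
invisible(dev.off())                   # 3. close the device so the file is completed

during                                 # "pdf"
file.exists(out_file)
```

Forgetting the last step leaves the device open, and later plots keep going to the file instead of the screen.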


Graphics: The most common graphical devices

The most commonly used graphical devices are:

- screen,
- bitmap/raster devices: png(), bmp(), tiff(), jpeg(),
- vector devices: svg(), pdf(),
- Cairo versions of the above devices – for Windows users they offer higher quality graphics,
- for more information visit http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/Devices.html


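Graphics: Working with graphical devices

# open a PNG file device; antialias expects a string such as "default", not TRUE
png(filename = 'myplot.png', width = 320, height = 320, antialias = "default")
plot(x=c(1,2,7), y=c(2,3,5))
# close the device so that the file is written
dev.off()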


Graphics: Standard graphical device viewport

source: rgraphics.limnology.wisc.edu


Graphics: Plot – basics

plot() is the basic command that lets you visualize your data and results. It is very powerful, yet it takes some effort to master fully. Let’s begin with plotting three points: A(1,2); B(2,3); C(7,5).

plot(x = c(1, 2, 7), y = c(2, 3, 5))


Graphics: Plot – data frame

For convenience, we will create a data frame with our points:

df <- data.frame(x=c(1,2,7), y=c(2,3,5), row.names=c("A","B","C"))
df

##   x y
## A 1 2
## B 2 3
## C 7 5


Graphics: Plot – basics

Let’s make our plot a bit fancier...

plot(df, pch=c(15,17,19),
     col=c("tomato", "slateblue"), # recycling rule in action!
     main="Three points", sub="Stage 1")
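The recycling rule mentioned in the comment can be made explicit. This is a small sketch; the names cols, n_points and recycled are chosen here for illustration only.

```r
# Two colours for three points: R recycles the shorter vector
cols <- c("tomato", "slateblue")
n_points <- 3

# rep_len() spells out what plot() does implicitly with col
recycled <- rep_len(cols, n_points)
recycled   # "tomato" "slateblue" "tomato"
```

So the third point (C) is drawn in the same colour as the first one (A).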


Graphics: Plot – parameters

There are many parameters one can set in plot(). Let’s have a closer look at some of them:

- pch – type of the plotting symbol
- col – color of the points
- cex – scale for points
- main – main title of the plot
- sub – subtitle of the plot
- xlab – X-axis label
- ylab – Y-axis label
- las – axis labels orientation
- cex.axis – axis labels scale
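A single call that sets every parameter from the list might look like the sketch below. The data are the three points used earlier; the titles and labels are made up for illustration, and pdf(NULL) is only an assumption to avoid writing a file.

```r
# The three points A, B, C
x <- c(1, 2, 7)
y <- c(2, 3, 5)

pdf(NULL)  # assumption: null PDF device; use a screen or file device in practice
plot(x, y,
     pch = 17,                               # filled triangle
     col = "slateblue",                      # colour of the points
     cex = 2,                                # points twice the default size
     main = "Three points",                  # main title
     sub = "All parameters set explicitly",  # subtitle
     xlab = "x value",                       # X-axis label
     ylab = "y value",                       # Y-axis label
     las = 1,                                # axis labels always horizontal
     cex.axis = 0.8)                         # axis labels at 80% of default size
invisible(dev.off())
```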


Graphics: Graphical parameters

Graphical parameters can be set in two different ways:

- as plotting function arguments, e.g. plot(dat, cex=0.5)
- using par()

# read current graphical parameters
par()
# first, save the current parameters so that you
# can set them back if needed; no.readonly = TRUE skips
# the read-only parameters that par(opar) cannot restore
opar <- par(no.readonly = TRUE)
# now, modify what you want
par(cex.axis = 0.8, bg='grey')
# do your plotting
plot(................)
# restore the old parameters if you want
par(opar)
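An alternative way to restore settings relies on the fact that par() invisibly returns the previous values of whatever it changes. The sketch below assumes a null PDF device (pdf(NULL)) so that no file is written; the variable names are illustrative.

```r
pdf(NULL)  # assumption: null PDF device, so par() has a device to act on

before  <- par("cex.axis")     # value before the change
old     <- par(cex.axis = 0.8) # par() invisibly returns the previous setting
changed <- par("cex.axis")     # 0.8 while the change is active
par(old)                       # restore only what was modified
after   <- par("cex.axis")     # same as `before` again

invisible(dev.off())
```

Saving only the modified parameters keeps the restore step small and avoids touching unrelated settings.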


Graphics: Graphical parameters – pch

# create a grid of coordinates
coords <- expand.grid(1:6,1:6)
# make a vector of numerical pch symbols
pch.num <- c(0:25)
# and a vector of character pch symbols
pch.symb <- c('*', '.', 'o', 'O', '0', '-', '+', '|', '%', '#')
# plot numerical pch; ylim leaves room for the sixth row added below
plot(coords[1:26,1], coords[1:26,2], pch=pch.num,
     bty='n', xaxt='n', yaxt='n', bg='red',
     xlab='', ylab='', ylim=c(0.5, 6))
# and character pch's
points(coords[27:36,1], coords[27:36,2], pch=pch.symb)
# label them; the numerical symbols run from 0 to 25
text(coords[,1], coords[,2], c(pch.num, pch.symb), pos = 1,
     col='slateblue', cex=.8)


Graphics: Graphical parameters – pch result

Now, make your own cheatsheet for the lty parameter!
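One possible starting point for such a cheatsheet is sketched below. It covers the six standard line types and their names; the layout, the variable names lty.num and lty.name, and the use of pdf(NULL) are assumptions made for illustration.

```r
# The six standard line types and their names
lty.num  <- 1:6
lty.name <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

pdf(NULL)  # assumption: null PDF device; use a screen or file device in practice
# empty plot with one horizontal slot per line type
plot(c(0, 1), c(1, 6), type = 'n', bty = 'n', xaxt = 'n', yaxt = 'n',
     xlab = '', ylab = '')
# draw each line type at its own height
segments(x0 = 0.3, y0 = lty.num, x1 = 1, y1 = lty.num, lty = lty.num, lwd = 2)
# label each line with its number and name
text(0, lty.num, paste(lty.num, lty.name), pos = 4, col = 'slateblue', cex = .8)
invisible(dev.off())
```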


Graphics: Layers

Elements are added to a plot in the same order you plot them. It is like layers in a graphical program. Think about it! For instance, the auxiliary grid lines should be plotted before the actual data points.

# make an empty plot
plot(1:5, type='n', las=1,
     bty='n')
# plot grid
grid(col='grey', lty=3)
# plot the actual data
points(1:5, pch=19, cex=3)
# plot a line
abline(h = 3.1, col='red')
# you see, it overlaps the
# data point. It is better
# to plot it before points()
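Following the advice in the last comment, the same plot with the reference line moved below the data points could be written as in this sketch (pdf(NULL) is only an assumption to avoid writing a file):

```r
pdf(NULL)  # assumption: null PDF device; use a screen or file device in practice
plot(1:5, type = 'n', las = 1, bty = 'n')   # empty plot
grid(col = 'grey', lty = 3)                 # bottom layer: grid
abline(h = 3.1, col = 'red')                # middle layer: reference line
points(1:5, pch = 19, cex = 3)              # top layer: data points
invisible(dev.off())
```

Only the order of the calls changed; the data point now covers the line instead of being crossed by it.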


Graphics: General considerations

There are a few points you should keep in mind when working with plots:

- raster or vector graphics,
- colors, e.g. color-blind people, warm vs. cool colors and optical illusions,
- avoid complications, e.g. 3D plots, pie charts etc.,
- use black and greys for things you do not need to emphasize, i.e. basically everything except your main result,
- avoid 3 axes.


Graphics: Many plots on one figure

par(mfrow=c(2,3))
plot(1:5)
plot(1:5, pch=19, col='red')
plot(1:5, pch=15, col='blue')
hist(rnorm(100, mean = 0, sd=1))
hist(rnorm(100, mean = 7, sd=3))
hist(rnorm(100, mean = 1, sd=0.5))
par(mfrow=c(1,1))


Graphics: Many plots on one figure – result


Graphics: Many plots on one figure – layout

layout(matrix(c(1, 2, 2, 2, 3, 2, 2, 2), nrow = 2, ncol = 4, byrow = TRUE))
plot(1:5, pch = 15, col = "blue")
hist(rnorm(100, mean = 0, sd = 1))
plot(1:5, pch = 15, col = "red")
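It can help to look at the layout matrix on its own before plotting. In the sketch below, each cell holds the number of the figure that occupies it, and layout.show() outlines the resulting regions. The names m and n_fig are illustrative, and pdf(NULL) is an assumption to avoid writing a file.

```r
# The same layout matrix as above, written row by row
m <- matrix(c(1, 2, 2, 2,
              3, 2, 2, 2), nrow = 2, ncol = 4, byrow = TRUE)
m   # figure 2 spans six of the eight cells

pdf(NULL)           # assumption: null PDF device
n_fig <- layout(m)  # layout() returns the number of figures in the layout
layout.show(n_fig)  # outline each region and label it with its figure number
invisible(dev.off())
```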


Graphics: Colors

mycol <- c(rgb(0, 0, 1), "olivedrab", "#FF0000")
plot(1:3, c(1, 1, 1), col = mycol, pch = 19, cex = 3)
points(2, 1, col = rgb(0, 0, 1, 0.5), pch = 15, cex = 5)
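The example mixes three ways of specifying a color. A short sketch of how they relate, using blue throughout (the variable names are illustrative):

```r
# Three ways of writing pure blue, and one semi-transparent version
blue.rgb  <- rgb(0, 0, 1)        # components on a 0-1 scale
blue.hex  <- "#0000FF"           # hexadecimal string
blue.name <- "blue"              # built-in colour name
blue.half <- rgb(0, 0, 1, 0.5)   # the fourth argument is alpha (opacity)

blue.rgb                         # "#0000FF"
blue.half                        # "#0000FF80"
col2rgb(c(blue.hex, blue.name))  # both columns are 0, 0, 255
```

The alpha value is what makes the large square in the slide's example semi-transparent.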


Graphics: Color palettes

There are some built-in palettes: default, hsv, gray, rainbow, terrain.colors, topo.colors, cm.colors, heat.colors

mypal <- heat.colors(10)
mypal

## [1] "#FF0000FF" "#FF2400FF" "#FF4900FF" "#FF6D00FF" "#FF9200FF"
## [6] "#FFB600FF" "#FFDB00FF" "#FFFF00FF" "#FFFF40FF" "#FFFFBFFF"

pie(x = rep(1, times = 10), col = mypal)


Graphics: Custom palettes

You can easily create custom palettes:

mypal <- colorRampPalette(c("red", "green", "blue"))
pie(x = rep(1, times = 12), col = mypal(12))

Note that colorRampPalette returns a function!
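What "returns a function" means in practice can be checked directly. In this sketch the names three and twelve are illustrative:

```r
mypal <- colorRampPalette(c("red", "green", "blue"))

is.function(mypal)    # TRUE: mypal is a function, not a vector of colours
three  <- mypal(3)    # asking for 3 colours returns the three anchor colours
three                 # "#FF0000" "#00FF00" "#0000FF"
twelve <- mypal(12)   # 12 colours interpolated between the anchors
length(twelve)        # 12
```

The same palette function can therefore be reused for any number of colours.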


Graphics: Palettes, continued

- There is an excellent package, RColorBrewer, that offers a number of pre-made palettes, e.g. a color-blind safe palette.
- Package wesanderson offers palettes based on Wes Anderson movies :-)


Graphics: Plot – boxplot

There are also more specialized R functions used for creating specific types of plots. One commonly used is boxplot().

Figure: source: commons.wikimedia.org

Rule of thumb: if the median of one boxplot is outside the box of another, the median difference is likely to be significant (α = 5%).

boxplot(decrease ~ treatment,
        data = OrchardSprays,
        log = "y", col = "bisque",
        varwidth = TRUE)
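The rule of thumb can also be checked numerically, since boxplot() returns the statistics it draws. The sketch below compares treatments A and H; the choice of these two treatments and the variable names are assumptions made for illustration, and the result is a heuristic, not a formal test.

```r
# Compute the boxplot statistics without drawing anything
b <- boxplot(decrease ~ treatment, data = OrchardSprays, plot = FALSE)

# One column per treatment, one row per drawn statistic
stats <- b$stats
colnames(stats) <- b$names
rownames(stats) <- c("lower whisker", "lower hinge", "median",
                     "upper hinge", "upper whisker")

# Is the median of A outside the box (hinge to hinge) of H?
median.A <- stats["median", "A"]
outside <- median.A < stats["lower hinge", "H"] ||
           median.A > stats["upper hinge", "H"]
outside
```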


Graphics: Plot – boxplot, continued

The add=TRUE parameter in plots allows you to compose plots using previously plotted elements!

attach(InsectSprays)
boxplot(count~spray,
        data=InsectSprays[spray %in% c("C","F"),],
        col="lightgray")
boxplot(count~spray,
        data=InsectSprays[spray %in% c("A","B"),],
        col="tomato", add=TRUE)
boxplot(count~spray,
        data=InsectSprays[spray %in% c("D","E"),],
        col="slateblue", add=TRUE)
detach(InsectSprays)
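The same composition can be written without attach(), which avoids leaving the data frame on the search path if something fails midway. This is a sketch under assumptions: the grouping list, the variable names and pdf(NULL) are chosen here for illustration.

```r
pdf(NULL)  # assumption: null PDF device; use a screen or file device in practice

# Hypothetical grouping of sprays, with one colour per group
groups <- list(c("C", "F"), c("A", "B"), c("D", "E"))
cols   <- c("lightgray", "tomato", "slateblue")

for (i in seq_along(groups)) {
  # rows of the current group; unused factor levels keep their slots on the axis
  part <- InsectSprays[InsectSprays$spray %in% groups[[i]], ]
  # the first call creates the plot, the later ones draw on top of it
  boxplot(count ~ spray, data = part, col = cols[i], add = i > 1)
}
invisible(dev.off())
```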


Graphics: Plot – violin plots

Package vioplot lets you plot violin plots. A violin plot is very similar to a boxplot, but it also visualizes the distribution of your data points.

library(vioplot)

## Loading required package: sm
## Package 'sm', version 2.2-5.4: type help(sm) for summary information

attach(InsectSprays)
vioplot(count[spray=="A"],
        count[spray=="F"],
        col="bisque",
        names=c("A","F"))
detach(InsectSprays)


Graph gallery: Categorical data with vcd

library(vcd) # load the vcd package

## Loading required package: grid

data(Titanic) # Load the data
mosaic(Titanic, labeling=labeling_border(rot_labels = c(0,0,0,0))) # Plot the data


Graph gallery: Heatmaps

heatmap(matrix(rnorm(100, mean = 0, sd = 1), nrow = 10),
        col=terrain.colors(10))


Graphics: Graph gallery

For more awesome examples visit: http://www.r-graph-gallery.com
