LondonR · Mango Solutions

Angela Castillo-Gill

Data Scientist

[email protected]

@acastillogill

• Produce standard graphics and understand the range of visualisations available in ggplot2

• Use aesthetics (colour, shape, size) to add information to a visualisation

• Create analytical visualisations by groups (small multiples)

• Use R functionality to export high resolution graphics

install.packages("tidyverse")

library(tidyverse)

github.com/rfordatascience/tidytuesday

https://ig.ft.com/sites/visual-history-of-womens-tennis/

bit.ly/GrandSlams

grand_slams <- read_csv("grand_slams.csv")

ggplot(data = grand_slams,
       mapping = aes(
         x = tournament_date,
         y = rolling_win_count)) +
  geom_point()

• ggplot(): the function used to create the skeleton structure of a graphic

• data = grand_slams: defines the data that will be the basis of our plot

• mapping: how do the variables in our data relate to aesthetics in the plot?

• aes(): the helper function used to define how variables relate to plot elements

• geom_point(): defines the type of plot; geom functions are added as layers

• ggplot() creates a plot skeleton when we define:

1. what data we want to use

2. how to map variables in the data to aesthetics in the plot
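A minimal sketch of the skeleton idea follows. It assumes the mpg data that ships with ggplot2 in place of the workshop's grand_slams file, and the object names base_plot and scatter are only illustrative. The ggplot() call on its own can be stored in a variable, and a geom layer is added to it afterwards.

```r
library(ggplot2)

# Skeleton only: the data and the aesthetic mapping, but no geoms yet.
# Printing base_plot on its own would draw empty axes.
base_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Adding a geom layer turns the skeleton into a scatter plot
scatter <- base_plot + geom_point()
print(scatter)
```

Keeping the skeleton in a variable makes it easy to try different geoms on the same data and mapping.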

• Geom functions define the type of plot we want to create

• Added as layers with "+"

• We can include multiple elements by continuing to add them as layers
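The sketch below stacks layers on the mpg data from ggplot2. It draws points first and then adds tick marks along each axis with geom_rug(). We picked geom_rug() only for illustration, and any geom can be stacked the same way. Layers are drawn in the order they are added.

```r
library(ggplot2)

layered <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +         # layer 1: one point per car
  geom_rug(alpha = 0.3)  # layer 2: tick marks along both axes, drawn on top
print(layered)
```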

www.rstudio.com/resources/cheatsheets/

ggplot(data = grand_slams,
       mapping = aes(
         x = tournament_date,
         y = rolling_win_count)) +
  geom_point() +
  # a constant intercept is set outside aes(), so only one line is drawn
  geom_hline(yintercept = 10,
             colour = "red")

• Using the mpg data (in the ggplot2 package), create a scatter plot of city miles per gallon against highway miles per gallon

• Add a smooth line to this plot

• Can you figure out how to change this to use linear regression as the smoothing method?

• Aesthetics are anything that defines the look and feel of the plot:

– x, y, z, etc.

– colour, fill

– shape, linetype

– size (including line size)

– alpha (opacity)
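This sketch contrasts two ways of using aesthetics on the mpg data from ggplot2. Mapping a variable inside aes() makes the aesthetic vary with the data and draws a legend. Setting a constant outside aes() applies the same value to every point. The variables chosen here are only examples.

```r
library(ggplot2)

# Mapped inside aes(): colour and shape vary with variables, legends are drawn
mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class, shape = drv))

# Set outside aes(): constant values applied to every point, no legend
fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "steelblue", size = 2, alpha = 0.5)

print(mapped)
print(fixed)
```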

more_than_10_wins <- grand_slams %>%
  group_by(name) %>%
  filter(any(rolling_win_count > 10))
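Because the data is grouped, any() is evaluated once per player, so every row of a qualifying player is kept. That includes the early rows where the count was still below 10. The sketch below shows this on hypothetical toy data with made-up players "A" and "B", not on the real grand_slams file.

```r
library(dplyr)

# Hypothetical toy data (made up for illustration, not from grand_slams.csv)
toy <- tibble(
  name = c("A", "A", "B", "B"),
  rolling_win_count = c(5, 12, 3, 8)
)

kept <- toy %>%
  group_by(name) %>%
  # any() is evaluated once per group: all rows of player A are kept,
  # including the row where the count was still 5
  filter(any(rolling_win_count > 10)) %>%
  ungroup()

kept
```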

ggplot(data = more_than_10_wins,
       mapping = aes(
         x = tournament_date,
         y = rolling_win_count)) +
  geom_point(aes(colour = name))

• Using the scatter plot of cty against hwy, colour the points by drv (whether front-wheel, rear-wheel or 4-wheel drive).

• How does the plot differ if you define the colour in the ggplot function as opposed to the geom_point layer?

www.rstudio.com/resources/cheatsheets/

• geom_bar() counts the observations for us automatically from the raw data

ggplot(data = more_than_10_wins,
       aes(x = name)) +
  geom_bar()

• Change the bar colour with the fill aesthetic
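Here is a sketch of automatic counting plus fill, using the mpg data from ggplot2 rather than the tennis data. geom_bar() counts the cars for each manufacturer itself. Mapping fill to drv splits each bar by drive type. The choice of variables is only illustrative.

```r
library(ggplot2)

# geom_bar() uses stat = "count": it counts the cars per manufacturer itself
bars <- ggplot(mpg, aes(x = manufacturer)) +
  geom_bar(aes(fill = drv))  # fill splits each bar by drive type
print(bars)
```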

ggplot(data = more_than_10_wins,
       aes(x = name)) +
  geom_bar(aes(fill = grand_slam))

• Create bars from pre-counted data with stat = "identity"

• Place categories side by side with position = "dodge"
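This sketch combines both options on the mpg data from ggplot2. It counts the cars by drive type and year first with dplyr's count(), draws the counts as they are with stat = "identity", and dodges the two years side by side. The pre-counting step and the variables are our own choices for illustration.

```r
library(ggplot2)
library(dplyr)

# Pre-count the data ourselves: one row per drv/year combination, count in n
counts <- mpg %>% count(drv, year)

dodged <- ggplot(counts, aes(x = drv, y = n, fill = factor(year))) +
  # stat = "identity" uses n as the bar height instead of counting rows;
  # position = "dodge" places the two years side by side
  geom_bar(stat = "identity", position = "dodge")
print(dodged)
```

ggplot2 also provides geom_col(), which is shorthand for geom_bar(stat = "identity").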

• Use geom_path() to join observations in the order they appear in the data, and geom_line() to join them in order of their x-axis value
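A small sketch of the difference uses hypothetical toy data whose rows are deliberately out of x order. geom_path() keeps the row order. geom_line() sorts by x before joining the points.

```r
library(ggplot2)

# Hypothetical toy data whose rows are deliberately not in x order
toy <- data.frame(x = c(3, 1, 2), y = c(1, 3, 2))

p_path <- ggplot(toy, aes(x = x, y = y)) + geom_path()  # joins rows in data order
p_line <- ggplot(toy, aes(x = x, y = y)) + geom_line()  # joins points in x order
print(p_path)
print(p_line)
```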

more_than_10_wins <- more_than_10_wins %>%
  group_by(name) %>%
  mutate(
    first_win = min(tournament_date),
    days_since_first = as.numeric(
      tournament_date - first_win))
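The sketch below shows the grouped mutate() step on hypothetical toy data with two made-up players and invented dates. It assumes tournament_date is stored as a Date, as in the code above. Subtracting two Dates gives a difference in days, and as.numeric() turns it into a plain number.

```r
library(dplyr)

# Hypothetical toy data: two made-up players with two title dates each
toy <- tibble(
  name = c("A", "A", "B", "B"),
  tournament_date = as.Date(c("2000-01-01", "2000-01-10",
                              "2001-03-01", "2001-04-01"))
)

toy <- toy %>%
  group_by(name) %>%
  mutate(
    first_win = min(tournament_date),  # earliest date within each player
    # Date minus Date is a difftime in days; as.numeric() drops the unit
    days_since_first = as.numeric(tournament_date - first_win)
  ) %>%
  ungroup()

toy
```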

ggplot(data = more_than_10_wins,
       mapping =
         aes(x = days_since_first,
             y = rolling_win_count)) +
  geom_line()

• To draw an individual line for each group, we need to define the group aesthetic
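A sketch of the group aesthetic uses economics_long, which ships with ggplot2 and has one row per date per economic series. Without group = variable, all series would be joined into a single zig-zag line. The dataset is our choice, standing in for the tennis data.

```r
library(ggplot2)

# economics_long: one row per date per series; value01 rescales each series to 0-1
grouped <- ggplot(economics_long, aes(x = date, y = value01)) +
  geom_line(aes(group = variable))  # one line per series
print(grouped)
```

Mapping colour = variable instead would also split the lines, because discrete colour mappings define groups too.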

ggplot(data = more_than_10_wins,
       mapping = aes(
         x = days_since_first,
         y = rolling_win_count)) +
  geom_line(aes(group = name))

• Create a bar chart of the number of cars in each class

• Update the plot so that you can compare the years (remember you will need to use fill, and the variable should be a factor)

• Can you update your plot so that bars for each year appear side by side?

• Create multiple plots that can be easily compared

• In ggplot2 this is called faceting

• We can either facet into a grid structure (facet_grid) or wrap panels into a table structure (facet_wrap)

• Which is most appropriate depends on the data
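This sketch puts the two faceting styles side by side on the mpg data from ggplot2. facet_grid() lays panels out by rows and columns of two variables. facet_wrap() makes one panel per value of a single variable and wraps them into rows. The variables are illustrative only.

```r
library(ggplot2)

scatter <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# facet_grid(): a grid of panels, rows by drive type and columns by year
grid_plot <- scatter + facet_grid(rows = vars(drv), cols = vars(year))

# facet_wrap(): one panel per manufacturer, wrapped into several rows
wrap_plot <- scatter + facet_wrap(vars(manufacturer))

print(grid_plot)
print(wrap_plot)
```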

ggplot(data = more_than_10_wins,
       mapping =
         aes(x = days_since_first,
             y = rolling_win_count)) +
  geom_line(aes(colour = name)) +
  facet_grid(rows = vars(gender))

• rows = vars(gender) says we want one row of panels for each category of gender

• We could also use cols, or both rows and cols

• Use the vars() helper function to define the variables in the data

ggplot(data = more_than_10_wins,
       mapping =
         aes(x = days_since_first,
             y = rolling_win_count)) +
  geom_line() +
  facet_wrap(vars(name))

• Create a scatter plot of cty against hwy as previously.

• Create a facetted version of this plot, splitting by class. Try both the facet_grid and facet_wrap functions. Which is more suitable for this graphic?

• Set the labels

• Consider the scales

• Think about the theme

bbc.github.io/rcookbook/

• Use the labs function to set:

– x, y axis labels

– legend titles (colour, shape, size etc.)

– title, subtitle

– caption
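A sketch that sets every kind of label listed above in one labs() call, on the mpg data from ggplot2. The label text is our own wording for illustration.

```r
library(ggplot2)

labelled <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  labs(
    x = "Engine displacement (litres)",
    y = "Highway miles per gallon",
    colour = "Vehicle class",   # legend title for the colour aesthetic
    title = "Highway fuel economy by engine size",
    subtitle = "Cars in the mpg data from ggplot2",
    caption = "Source: ggplot2::mpg"
  )
print(labelled)
```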

ggplot(data = more_than_10_wins,
       mapping = aes(
         x = days_since_first,
         y = rolling_win_count)) +
  geom_line(aes(colour = name)) +
  facet_grid(rows = vars(gender)) +
  labs(x = "Number of Days Since First Title",
       y = "Total Number of Grand Slam Titles",
       colour = "Player") +
  scale_colour_viridis_d() +
  theme_bw()

• The scale_* family of functions can set:

– Exact choice of colours

– Axis break points

– Labels on legends and axes

– Much more!

• Some ready-made scale functions exist to help, such as scale_colour_viridis_d()
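This sketch shows three of the scale jobs listed above on the mpg data from ggplot2: exact axis break points, exact colours and custom legend labels. The colours and label text are arbitrary choices. The legend labels follow the sorted drv values "4", "f" and "r".

```r
library(ggplot2)

scaled <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  # exact break points on the x axis
  scale_x_continuous(breaks = seq(2, 7, by = 1)) +
  # exact colours per drv value, and legend labels in the order "4", "f", "r"
  scale_colour_manual(
    values = c("4" = "darkgreen", "f" = "steelblue", "r" = "firebrick"),
    labels = c("4-wheel", "Front-wheel", "Rear-wheel")
  )
print(scaled)
```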

• The theme function sets:

– backgrounds & borders

– grid lines

– axis text rotation

– legend position

– title positions

– Over 80 graphic elements!

• A selection of complete themes is available, such as theme_bw()
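A sketch that starts from the complete theme theme_bw() and then adjusts a few individual elements with theme(), using the mpg data from ggplot2. It covers axis text rotation, grid lines, legend position and title position. The specific settings are only examples.

```r
library(ggplot2)

themed <- ggplot(mpg, aes(x = manufacturer, y = hwy, colour = drv)) +
  geom_point() +
  labs(title = "Highway mpg by manufacturer") +
  theme_bw() +                                          # complete theme as a base
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),  # rotate axis text
    panel.grid.minor = element_blank(),                 # remove minor grid lines
    legend.position = "bottom",                         # move the legend
    plot.title.position = "plot"                        # align title to the whole plot
  )

grob <- ggplotGrob(themed)  # builds the full plot, including all theme elements
```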

• Use the function that matches the file type (png, jpeg, pdf, etc.)

• Control:

– Width & Height

– Quality (resolution)

• We need to open and close graphics devices ourselves
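For ggplot2 graphics there is also ggsave(), which opens and closes the device for you. The sketch below uses the mpg data from ggplot2 and writes to a temporary file. The file name displ_vs_hwy.png and the size and dpi values are only examples.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Write to a temporary directory so the example leaves no clutter behind
out_file <- file.path(tempdir(), "displ_vs_hwy.png")

# ggsave() picks the device from the file extension;
# 6 x 3.5 inches at 300 dpi gives an 1800 x 1050 pixel image
ggsave(out_file, plot = p, width = 6, height = 3.5, units = "in", dpi = 300)
```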

png(filename = "TotalWinsByPlayer.png",
    width = 600,
    height = 350,
    res = 100)

# Code to create plot goes here
# (inside a function or loop, wrap the ggplot object in print())

dev.off()

• png() opens a connection to a new graphics device (a place to send your plot)

• The code between png() and dev.off() creates the plot

• dev.off() closes the connection to the plot

• Using one of the graphics you have created today, set the labels to be appropriate for the graphic

• Export a png of your graphic

Cheat Sheet

www.rstudio.com/resources/cheatsheets

Practice Data

github.com/rfordatascience/tidytuesday

In Production Example

bbc.github.io/rcookbook/

Aimee Gott

[email protected]