data analysis and visualization: r workflow

53
Data Analysis and Visualization: R Workflow Dr. Olga Scrivner, Research Scientist January , Indiana University

Upload: olga-scrivner

Post on 23-Jan-2018

88 views

Category:

Data & Analytics


3 download

TRANSCRIPT

Page 1: Data Analysis and Visualization: R Workflow

Data Analysis and Visualization:R Workflow

Dr. Olga Scrivner, Research ScientistJanuary 16, 2018

Indiana University

Page 2: Data Analysis and Visualization: R Workflow

Visual Insights Talk Series

1

Upcoming Events:

http://cns.iu.edu/netscitalks.html

Page 3: Data Analysis and Visualization: R Workflow

Goals

1. Understand RStudio2. Understand the difference between R scripts and R projects3. Learn how to plan and manage R project4. Learn how to deploy and share R project

TBA:

} R crash course} Shiny basics and Shiny advanced

2

Page 4: Data Analysis and Visualization: R Workflow

Overview

3

1. Project set-up and planning2. Reporting and documenting3. Visualizing4. Sharing

(Wickham and Grolemund, 2017)

Page 5: Data Analysis and Visualization: R Workflow

Project-oriented Workflow

Page 6: Data Analysis and Visualization: R Workflow

Project Set-up

5

Organize each data into a project:

File→ New Project

Page 7: Data Analysis and Visualization: R Workflow

Create Project

6

Page 8: Data Analysis and Visualization: R Workflow

Project Type

7

Page 9: Data Analysis and Visualization: R Workflow

Project Directory

8

Page 10: Data Analysis and Visualization: R Workflow

New Project

9

Page 11: Data Analysis and Visualization: R Workflow

Building a Directory Structure

10

1. Install library ProjectTemplate

2. Open a new R script: File→ New File→ R Script

3. TIP: Change your working directory - one level upfrom your projectSession→ Set Working Directory→ Choose Directory

Page 12: Data Analysis and Visualization: R Workflow

Directory Structure

Extensions:

} R script - .R} Readme file - .md (Markdown)} Project - .Rproj

11

Page 13: Data Analysis and Visualization: R Workflow

Packrat

Packrat - stores your package dependencies inside the project.

Advantages:

1. Isolation: Installing a new or updated package for oneproject will not break your other projects.

2. Portability: Easily transport your projects from onecomputer to another, even across different platforms.

3. Reproducibility: Packrat records the exact packageversions you depend on.

http://rstudio.github.io/packrat/

12

Page 14: Data Analysis and Visualization: R Workflow

Project Planning

Page 15: Data Analysis and Visualization: R Workflow

Project Planning

14

“smart preparation minimizes work” (Berkun, 2005)

https://csgillespie.github.io/efficientR/workflow.html

Page 16: Data Analysis and Visualization: R Workflow

SMART Criteria

1. Specific: is the objective clearly defined and self-contained?2. Measurable: is there a clear indication of its completion?3. Attainable: can the target be achieved?4. Realistic: have sufficient resources been allocated to the

task?5. Time-bound: is there an associated completion date or

milestone?

https://csgillespie.github.io/efficientR/workflow.html

15

Page 17: Data Analysis and Visualization: R Workflow

Gantt chart

16

} Section refers to the project’s section (useful for largeprojects, with milestones)

} Line refers to a task} Example: Planning begins on Jan 16 2018 and lasts for

10 days

https://mermaidjs.github.io/gantt.html

Page 18: Data Analysis and Visualization: R Workflow

Project Management

Large projects: regular meetings, division of labour, trackingprogress, issues and priorities (Gillespie and Lovelace, 2017,Chapter 4)

1. The interactive code sharing site GitHub2. ZenHub, a browser plugin that is “the first and only project

management suite that works natively within GitHub”3. Web-based and easy-to-use tools such as Trello4. Dedicated desktop project management software such as

ProjectLibre and GanttProject

5. Fully featured, enterprise scale open source projectmanagement systems such as OpenProject and redmine

17

Page 19: Data Analysis and Visualization: R Workflow

Documenting and Reporting

Page 20: Data Analysis and Visualization: R Workflow

R Options

19

Page 21: Data Analysis and Visualization: R Workflow

R Script

https://google.github.io/styleguide/Rguide.xml

20

Page 22: Data Analysis and Visualization: R Workflow

General Layout

1. Copyright statement comment2. Author comment (Use #)3. File description comment, including purpose of program,

inputs, and outputs4. source() and library() statements

◦ source(“file name”) - read R code from a file◦ library(name) - package name

5. Function definitions6. Executed statements, if applicable (e.g., plot)

21

Page 23: Data Analysis and Visualization: R Workflow

R Style Guide: Naming

File NamesGood Bad

predict_ad_revenue.R foo.R

Variable NamesGood Bad

variable.name (preferred)variableName (accepted) variable_Name

Function Names (use action verbs)

Good Bad

CalculateAvgClickscalculate_avg_clickscalculateAvgClicks

22

Page 24: Data Analysis and Visualization: R Workflow

R Style Guide: Spacing and Assignment

} Spaces around all binary operators (=, +, -, <-)} Use <-, not =, for assignment} Space after a comma

Incorrect: total == sum(x[1,])

Correct: total <- sum(x[1, ])

Learn more: http://adv-r.had.co.nz/Style.html

23

Page 25: Data Analysis and Visualization: R Workflow

R Style Guide: Spacing and Assignment

} Spaces around all binary operators (=, +, -, <-)} Use <-, not =, for assignment} Space after a comma

Incorrect: total == sum(x[1,])

Correct: total <- sum(x[1, ])

Learn more: http://adv-r.had.co.nz/Style.html

23

Page 26: Data Analysis and Visualization: R Workflow

R Markdown

24

“R Markdown files are the ultimate R reporting tool”(Grolemund, 2014)

R Markdown is a file formatfor making dynamicdocuments with R.

Markdown - an easy-to-writeplain text format.

R Markdown files can beconverted into HTML, PDF,and Word documents.

Page 27: Data Analysis and Visualization: R Workflow

R Markdown

File→ new File→ R Markdown

25

Page 28: Data Analysis and Visualization: R Workflow

Did You Know?

Markdown is used:

} Github} StackOverflow} Reddit

26

Page 29: Data Analysis and Visualization: R Workflow

R Markdown

27https://en.wikipedia.org/wiki/Markdown

Page 30: Data Analysis and Visualization: R Workflow

Rendering

28

https://en.wikipedia.org/wiki/Markdown

Page 31: Data Analysis and Visualization: R Workflow

R Chunks

29http://shiny.rstudio.com/articles/rmarkdown.html

Page 32: Data Analysis and Visualization: R Workflow

R Chunks

30http://shiny.rstudio.com/articles/rmarkdown.html

Page 33: Data Analysis and Visualization: R Workflow

R Chunks

31http://shiny.rstudio.com/articles/rmarkdown.html

Page 34: Data Analysis and Visualization: R Workflow

R Chunks - Inline

32http://shiny.rstudio.com/articles/rmarkdown.html

Page 35: Data Analysis and Visualization: R Workflow

Gallery: Get Inspiration

33

http://rmarkdown.rstudio.com/gallery.html

Page 36: Data Analysis and Visualization: R Workflow

Gallery: Get Inspiration

34

http://rmarkdown.rstudio.com/gallery.html

Page 37: Data Analysis and Visualization: R Workflow

Gallery: Get Inspiration

35

http://rmarkdown.rstudio.com/gallery.html

Page 38: Data Analysis and Visualization: R Workflow

Gallery: Get Inspiration

36

http://rmarkdown.rstudio.com/gallery.html

Page 39: Data Analysis and Visualization: R Workflow

Data Analysis and Visualization Work-flow

Page 40: Data Analysis and Visualization: R Workflow

Tidyverse Workflow

38

1. Import data into R: read_csv(), read_line(), read_delim()2. Tidy data - variables per column, observation per row3. Transformwith dplyr4. Visualizewith ggplot and plotly

(Wickham and Grolemund, 2017)

Page 41: Data Analysis and Visualization: R Workflow

Data Transformation - dplyr

} Pick observations by their values - filter()} Reorder the rows - arrange()} Pick variables by their names -select()} Create new variables with functions of existing variables

mutate()

} Collapse many values down to a single summarysummarise()

(Wickham and Grolemund, 2017, Chapter 5)Practice: http://r4ds.had.co.nz/transform.html

39

Page 42: Data Analysis and Visualization: R Workflow

Visualization - Grammar of Graphics

Recommended Reading -http://vita.had.co.nz/papers/layered-grammar.pdf

Visualization Template

40

Page 43: Data Analysis and Visualization: R Workflow

Visualization Template

41

Page 44: Data Analysis and Visualization: R Workflow

Exploratory Data Visualization

http://r4ds.had.co.nz/exploratory-data-analysis.html

42

Page 45: Data Analysis and Visualization: R Workflow

Interactive Visualization - Plotly

https://plot.ly/ggplot2/43

Page 46: Data Analysis and Visualization: R Workflow

Publishing

Page 47: Data Analysis and Visualization: R Workflow

Publishing

45

Page 48: Data Analysis and Visualization: R Workflow

Publishing

46

Page 49: Data Analysis and Visualization: R Workflow

RPubs

47

Page 50: Data Analysis and Visualization: R Workflow

General Information

Page 51: Data Analysis and Visualization: R Workflow

Course

49

Page 52: Data Analysis and Visualization: R Workflow

References

50

Page 53: Data Analysis and Visualization: R Workflow

THEEND

@katycns@obscrivn#IVMOOC