Learn to use dplyr (Feb 2015 Philly R User Meetup)


Upload: fan-li

Post on 15-Jul-2015


Page 1: Learn to use dplyr (Feb 2015 Philly R User Meetup)

Learn to use dplyr package

Fan Li @ Philly R User Meetup (R<-Gang)

Demo: http://rpubs.com/lifan/phillyweather

Source code: https://github.com/lifan0127/meetup_dplyr_talk

Page 2

What is dplyr

A package developed by Hadley Wickham to help transform tabular data.

● Unified, intuitive syntax

● Fast implementation in C++

● Supports various data backends (data frames, relational databases, etc.)

install.packages("dplyr") # Version 0.4

Page 3

Basic Operators (Verbs)

select(data, col.1, col.2, …): select existing variables (columns)

filter(data, condition.1, condition.2, …): keep the rows that meet all conditions

arrange(data, col.1, col.2, …): sort rows by variables or expressions

mutate(data, newcol = …): create new variables

group_by() + summarize(): summarize data per group

For a graphic explanation, see Garrett Grolemund’s talk.
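
As a rough sketch of the verbs above, here is each one applied to the mtcars data set that ships with R. The dataset, the thresholds and the new column name hp_per_cyl are illustrative choices, not taken from the talk.

```r
library(dplyr)

# mtcars ships with base R; the column names below come from that dataset.
cars <- select(mtcars, mpg, cyl, hp)         # keep three columns
cars <- filter(cars, cyl >= 6, hp > 100)     # conditions are combined with AND
cars <- arrange(cars, cyl, desc(mpg))        # sort by cyl, then mpg descending
cars <- mutate(cars, hp_per_cyl = hp / cyl)  # add a derived column

# One summary row per cylinder group
by_cyl <- summarize(group_by(cars, cyl), mean_mpg = mean(mpg), n_cars = n())
print(by_cyl)
```

Each verb takes a data frame as its first argument and returns a new one, which is what makes the calls easy to chain.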

Page 4

Other Helper Functions

transmute

tally

top_n

summarize_each

sample_n/sample_frac

distinct

rename

slice

n_distinct

first/last/nth

Type ?function_name in R to see how to use each function.
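
A few of these helpers can be tried on a tiny made-up table. The pets data frame and its column names are hypothetical, and only a subset of the helpers listed above is shown.

```r
library(dplyr)

# A tiny made-up table, used only for illustration.
pets <- data.frame(
  name   = c("Ann", "Bob", "Ann", "Cid", "Dee"),
  animal = c("cat", "dog", "cat", "cat", "dog"),
  weight = c(4.0, 12.5, 4.0, 5.5, 9.0)
)

unique_pets <- distinct(pets)                     # drop the duplicated "Ann" row
renamed     <- rename(unique_pets, owner = name)  # syntax is new_name = old_name
first_two   <- slice(renamed, 1:2)                # pick rows by position
per_animal  <- tally(group_by(renamed, animal))   # count rows per group (column n)

# transmute() works like mutate() but keeps only the listed columns
in_pounds <- transmute(renamed, owner, weight_lb = weight * 2.2)

stats <- summarize(
  renamed,
  n_animals    = n_distinct(animal),  # number of distinct values
  first_owner  = first(owner),
  second_owner = nth(owner, 2),
  last_owner   = last(owner)
)
print(stats)
```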

Page 5

%>% Pipe Operator

data %>% function()  =  function(data, …)

foo() %>% bar()  =  bar(foo())

Very useful for converting nested calls into a more readable chain of expressions.
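
To make the equivalence concrete, the sketch below writes the same query twice, once nested and once piped, and checks that both give the same result. The use of mtcars and the particular filter are illustrative assumptions.

```r
library(dplyr)  # dplyr re-exports %>% from magrittr

# Nested form: has to be read from the inside out
nested <- head(arrange(filter(mtcars, cyl == 4), desc(mpg)), 3)

# Piped form: reads top to bottom, one step per line
piped <- mtcars %>%
  filter(cyl == 4) %>%
  arrange(desc(mpg)) %>%
  head(3)

# Both forms should produce the same data frame
print(identical(nested, piped))
```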

Page 6

Example

feb.snow2 <- weather %>%
  select(Year, Month, Day, Snow) %>%          # Step 1. Select relevant variables
  filter(Year >= 1885, Month == 2) %>%        # Step 2. Filter by year and month
  group_by(Year) %>%                          # Step 3. Group by year
  summarize(
    Snow.Sum = sum(Snow, na.rm = TRUE)) %>%   # Step 4. Summarize monthly snowfall
  arrange(-Snow.Sum, -Year)                   # Step 5. Sort table by monthly snowfall/year

Demo with Philly weather data (1872-2001)
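
The real weather table is not included in the slides, so the chain can be tried on a hypothetical stand-in. The sketch below assumes only the column names used in the example; every value is made up, and desc() is used in place of the minus sign, which gives the same ordering for numeric columns.

```r
library(dplyr)

# Hypothetical stand-in for the talk's `weather` table:
# same column names, made-up values.
weather <- data.frame(
  Year  = c(1884, 1885, 1885, 1899, 1899, 1899, 2000),
  Month = c(2, 2, 2, 2, 2, 3, 2),
  Day   = c(10, 1, 2, 12, 13, 1, 5),
  Snow  = c(5.0, 1.5, NA, 10.0, 8.5, 2.0, 3.0)
)

feb.snow <- weather %>%
  select(Year, Month, Day, Snow) %>%               # keep relevant columns
  filter(Year >= 1885, Month == 2) %>%             # February rows from 1885 on
  group_by(Year) %>%                               # one group per year
  summarize(Snow.Sum = sum(Snow, na.rm = TRUE)) %>%  # total snowfall, ignoring NA
  arrange(desc(Snow.Sum), desc(Year))              # snowiest February first

print(feb.snow)
```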

Page 7

Performance Benefit

C++ implementation

Lazy evaluation

Avoid accidental, expensive operations

Usually much faster than base R. Otherwise, it will tell you with a progress bar.

Page 8

Data Backends

Supports the three most popular open source databases (SQLite, MySQL and PostgreSQL), and Google’s BigQuery.

http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html
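
As a hedged sketch of a database backend, the code below runs the same verbs against an in-memory SQLite database. It assumes the DBI, RSQLite and dbplyr packages are installed, since current dplyr releases delegate database support to dbplyr; the version shown in the talk (0.4) used a different entry point. The query is only sent to the database when collect() is called, which is where lazy evaluation comes in.

```r
library(dplyr)
# Assumption: DBI, RSQLite and dbplyr are installed alongside dplyr.

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")   # upload a local data frame as a table
cars_db <- tbl(con, "mtcars")    # lazy reference; no rows are fetched yet

query <- cars_db %>%
  filter(cyl == 4) %>%
  group_by(gear) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE))

show_query(query)                # print the SQL generated from the verbs
result <- collect(query)         # run the query and pull the rows into R
DBI::dbDisconnect(con)

print(result)
```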

Page 10

Date | Meeting | Title | Speaker | Link
20150122 | | Advanced Data Manipulation | Mike McCann | [Slide]
20150121 | Berkeley Institute for Data Science | Pipelines for Data Analysis | Hadley Wickham | [Video]
20150114 | RStudio Webinar | Data Wrangling with R | Garrett Grolemund | [Slide][Video][Data]
20150113 | Upstate Data Analytics | | Wallace Campbell | [Video][Data]
20141202 | Sheffield R Users Group | how to find help online, data manipulation with plyr and dplyr | Mathew Hall | [Slide]
20141126 | Budapest BI | Introduction to the dplyr R package | Romain Francois | [Slide]
20141111 | LA R users group | Benchmarking dplyr and data.table (with biggish data) | Szilard Pafka | [Slide][Data]
20141025 | ACM DataScience Camp | Data Manipulation Using R | Ram Narasimhan | [Slide][Video]
20141022 | | Becoming a data ninja with dplyr | Devin Pastoor | [Slide]
20141007 | Davis R Users' Group | dplyr: Data manipulation in R made easy | Michael A. Levy | [Slide][Video]
20140825 | RStudio Webinar | Hands-on dplyr tutorial for faster data manipulation in R | | [Slide][Video]
20140701 | USER2014 | dplyr: a grammar of data manipulation | Hadley Wickham | [Video]
20140630 | USER2014 | Data manipulation with dplyr | Hadley Wickham | [Slide][Video][Data]
20140214 | Stanford HCI Group | Expressing yourself in R | Hadley Wickham | [Slide][Data]

See updated list at: https://github.com/lifan0127/meetup_dplyr_talk

Page 11

“Hadley Ecosystem”

Visualization

ggplot2, ggmap, ggvis

Data Wrangling

reshape, plyr, dplyr, tidyr

Web

rvest, httr, xml2

Other tools

stringr, lubridate, haven

https://github.com/hadley (Github Repo)

http://adv-r.had.co.nz/ (Advanced R Book)

http://r-pkgs.had.co.nz/ (R Packages Book)