microsoft nerd talk - r and tableau - 2-4-2013
DESCRIPTION
This presentation is from a talk I gave at Microsoft NERD for the Boston Predictive Analytics Meetup group.
TRANSCRIPT
![Page 1: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/1.jpg)
TABLEAU AND R
Beauty and the Beast
Tanya Cashorali
@tanyacash21
![Page 2: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/2.jpg)
R – THE WORKHORSE
![Page 3: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/3.jpg)
TABLEAU – MAKES BEAUTIFUL THINGS HAPPEN
![Page 4: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/4.jpg)
BUT SO CAN R
![Page 5: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/5.jpg)
TOGETHER THEY ARE UNSTOPPABLE
![Page 6: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/6.jpg)
SERIOUSLY THOUGH, WHAT IS R?
Open source statistical programming environment
4,211 community-contributed packages on CRAN as of 1/31/2013 - http://cran.r-project.org/
Interpreted - terminal or GUI (RStudio)
![Page 7: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/7.jpg)
WHAT IS TABLEAU?
Data visualization software for interactive business intelligence
Spun out of Stanford University in 2003; co-founder and chief scientist Pat Hanrahan was a founding employee of Pixar Animation Studios
Drag and drop interface
![Page 8: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/8.jpg)
R AND TABLEAU
[Diagram: data flow from R to Tableau]
R: data munge, data model
From R: write to .csv, or insert into a database using the RODBC package
Tableau: live connection through various database drivers
Output: dashboards
![Page 9: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/9.jpg)
START WITH THE R WORKHORSE
Read data into R
pbp2012 <- read.csv(file="2012_nfl_pbp_data_reg_season.csv", header=TRUE)
View the data
str(pbp2012)
![Page 10: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/10.jpg)
START WITH THE R WORKHORSE (CONT’D)
Conduct pre-processing or “data munging”
is.na(pbp2012$down); as.numeric(pbp2012$ydline)
Slice and dice
subset(pbp2012, qtr == 1)
Write to CSV for consumption by Tableau Public
write.csv(pbp2012, file="pbp2012.csv", row.names=FALSE)
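The munging steps on this slide can be tried end to end on a toy data frame. This is only a minimal sketch: the rows below are invented, `pbp_toy`, `missing_down`, `q1` and `out_file` are hypothetical names, and only the column names (`qtr`, `down`, `ydline`) follow the slides.

```r
# Toy stand-in for the play-by-play data; the values are invented for illustration.
pbp_toy <- data.frame(
  qtr    = c(1, 1, 2, 4),
  down   = c(1, NA, 3, 2),
  ydline = c("80", "75", "12", "3"),
  stringsAsFactors = FALSE
)

# Pre-process: flag missing downs and convert the yard line from text to numeric.
missing_down <- is.na(pbp_toy$down)
pbp_toy$ydline <- as.numeric(pbp_toy$ydline)

# Slice and dice: keep first-quarter plays that have a known down.
q1 <- subset(pbp_toy, qtr == 1 & !missing_down)

# Write to CSV for Tableau; a temporary file keeps the sketch self-contained.
out_file <- tempfile(fileext = ".csv")
write.csv(q1, file = out_file, row.names = FALSE)
```

Unlike the one-liners on the slide, the sketch assigns the results of `is.na()` and `as.numeric()`, since neither call changes the data frame on its own.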
![Page 11: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/11.jpg)
R NO HUDDLE EXAMPLE
## read in the data
seasons <- 2002:2011
pbp <- read.csv("2012_nfl_pbp_data_reg_season.csv", header=TRUE, stringsAsFactors=FALSE)
n1 <- read.csv("2002_nfl_pbp_data.csv", header=TRUE, stringsAsFactors=FALSE)
## keep only the 2012 columns that also exist in the earlier seasons
## (%in% is safe even when no columns need to be dropped)
pbp <- pbp[, colnames(pbp) %in% colnames(n1)]
for(season in seasons){
  n1 <- read.csv(paste(season, "_nfl_pbp_data.csv", sep=""), header=TRUE, stringsAsFactors=FALSE)
  pbp <- rbind(pbp, n1)
}
## grab the no huddle plays
nh <- pbp[grep("Huddle", pbp$description),]
## count the no-huddle plays each team ran
nh.by.team <- table(nh$off)
![Page 12: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/12.jpg)
R NO HUDDLE EXAMPLE (CONT’D)
library(ggplot2)
## ggplot needs a data frame; as.data.frame() on a table gives columns Var1 and Freq
ggplot(as.data.frame(nh.by.team), aes(x=reorder(Var1, -Freq), y=Freq)) +
  geom_bar(stat="identity") +
  labs(x="Team", y="Number of Plays", title="Number of No Huddle Plays Ran by Team 2002-2012") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))
![Page 13: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/13.jpg)
R NO HUDDLE EXAMPLE (CONT’D)
## table by offensive team and quarter
nh.df <- data.frame(table(nh$off, nh$qtr))[-1,]
colnames(nh.df) <- c("Team", "Quarter", "Number")
## plot number of no huddle plays by team by quarter
## (stat="identity" plots the precomputed counts in Number)
ggplot(nh.df, aes(x=reorder(Team, Number), y=Number, fill=Quarter)) +
  geom_bar(stat="identity") +
  labs(x="Team", y="Number", title="Number of No Huddle Plays in the NFL by Team by Quarter") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))
![Page 14: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/14.jpg)
TABLEAU-IFIED
http://sportsdataviz.com/percentage-no-huddle-plays-by-nfl-team-by-season-2002-2012/
## write file for Tableau
write.table(nh.by.team, file="noHuddles.txt", sep="\t", row.names=FALSE)
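The linked view shows percentages by team and season, while the R code in the slides stops at counts. One way the percentages could be computed before writing the file is sketched below. It is an assumption, not the author's code: the rows are invented, and the `season` column and the names `pbp_toy`, `nh_pct` and `out_file` are hypothetical.

```r
# Toy play-by-play rows; values are invented. `off` and `description` follow
# the slides, `season` is an assumed column.
pbp_toy <- data.frame(
  season = c(2011, 2011, 2011, 2012, 2012, 2012),
  off    = c("NE", "NE", "BAL", "NE", "BAL", "BAL"),
  description = c("(No Huddle) pass short left", "run up the middle",
                  "pass deep right", "(No Huddle, Shotgun) pass",
                  "(No Huddle) run left", "punt"),
  stringsAsFactors = FALSE
)

# Flag no-huddle plays the same way the slides do, by matching "Huddle".
pbp_toy$no_huddle <- grepl("Huddle", pbp_toy$description)

# The mean of a logical vector is the share of TRUE values, so this yields
# the percentage of no-huddle plays for each team and season.
nh_pct <- aggregate(no_huddle ~ off + season, data = pbp_toy,
                    FUN = function(x) 100 * mean(x))
colnames(nh_pct) <- c("Team", "Season", "PctNoHuddle")

# Tab-delimited text file, written to a temporary path for this sketch.
out_file <- tempfile(fileext = ".txt")
write.table(nh_pct, file = out_file, sep = "\t", row.names = FALSE)
```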
![Page 15: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/15.jpg)
IS THE RAVENS OFFENSE PREDICTABLE?
http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/
## Read in the data generated by play_parser.py
plays <- read.csv("plays.csv", header=TRUE, stringsAsFactors=FALSE)
## extract Baltimore offensive plays
plays <- plays[grep("BAL", plays$gameid),]
plays <- subset(plays, def != "BAL")
## 1,625 offensive BAL plays in the 2012 regular season
nrow(plays)
## classify the other play types that are not passes or runs
## (logical indexing with grepl() does not fail when a pattern has no matches)
plays$type <- as.character(plays$type)
plays$type[grepl("PENALTY", plays$desc)] <- "Penalty"
plays$type[grepl("kick", plays$desc)] <- "Kick"
plays$type[grepl("punt", plays$desc)] <- "Punt"
plays$type[grepl("field goal", plays$desc)] <- "FG"
## create a binned variable yardsToGo
plays$yardsToGo <- "0"
plays$yardsToGo[plays$ydline >= 80] <- ">= 80"
plays$yardsToGo[plays$ydline >= 50 & plays$ydline < 80] <- "50 <= yardsToGo < 80"
plays$yardsToGo[plays$ydline >= 30 & plays$ydline < 50] <- "30 <= yardsToGo < 50"
plays$yardsToGo[plays$ydline >= 10 & plays$ydline < 30] <- "10 <= yardsToGo < 30"
plays$yardsToGo[plays$ydline < 10] <- "< 10"
## write out file for Tableau (write.csv so the .csv file is comma-separated)
write.csv(plays, file="BALplays2012regSeason.csv", row.names=FALSE)
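The yardsToGo binning could also be written with base R's `cut()`, which assigns every value to one interval in a single call. This is an alternative sketch rather than the code from the talk; the `ydline` values below are invented, and the labels are copied from the slide.

```r
# Invented yard-line values for illustration.
ydline <- c(3, 10, 29, 30, 49, 50, 79, 80, 99)

# right = FALSE makes each interval closed on the left, e.g. [10, 30),
# which matches the ">= lower & < upper" tests used in the slides.
yardsToGo <- cut(
  ydline,
  breaks = c(-Inf, 10, 30, 50, 80, Inf),
  labels = c("< 10", "10 <= yardsToGo < 30", "30 <= yardsToGo < 50",
             "50 <= yardsToGo < 80", ">= 80"),
  right = FALSE
)

# Count how many plays fall in each bin.
table(yardsToGo)
```

`cut()` returns a factor whose levels keep the bin order, which is convenient if the bins should sort from short to long; wrap the result in `as.character()` if a plain string column is preferred.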
![Page 16: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/16.jpg)
IS THE RAVENS OFFENSE PREDICTABLE? (CONT’D)
http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/
Set the scenario for each play during the Super Bowl and predicted either run or pass, whichever had the higher percentage in that scenario.
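The talk made these predictions from the Tableau dashboard. The same idea can be sketched in R under simplifying assumptions: the scenario is reduced to the down alone, all rows are invented, and `plays_toy`, `pct`, `predict_play` and `superbowl_toy` are hypothetical names.

```r
# Invented regular-season plays; a real scenario would use more variables.
plays_toy <- data.frame(
  down = c(1, 1, 1, 3, 3, 3, 3),
  type = c("Run", "Run", "Pass", "Pass", "Pass", "Pass", "Run"),
  stringsAsFactors = FALSE
)

# Row percentages: for each down, the share of passes and runs.
pct <- 100 * prop.table(table(plays_toy$down, plays_toy$type), margin = 1)

# Predict the play type with the highest percentage for a given down.
predict_play <- function(down) {
  shares <- pct[as.character(down), ]
  names(shares)[which.max(shares)]
}

# Invented "game day" plays used to score the predictions.
superbowl_toy <- data.frame(
  down   = c(1, 3, 3),
  actual = c("Run", "Pass", "Run"),
  stringsAsFactors = FALSE
)
predicted <- vapply(superbowl_toy$down, predict_play, character(1))

# Percentage of plays predicted correctly.
accuracy <- 100 * mean(predicted == superbowl_toy$actual)
```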
![Page 17: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/17.jpg)
RESULTS AND CONSIDERATIONS
Predicted plays correctly 60.3% of the time
Missing variables (defensive and offensive formations, crowd noise, weather, injured players, power outage, etc.)
Change in Ravens’ offensive coordinator in week 15
Lack of data
![Page 18: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/18.jpg)
SUMMARY
Initial analysis in R
Explore the data
Pre-process
Write to file for consumption by Tableau Public or to database for Tableau Desktop
Create interactive dashboards in Tableau in minutes that can be shared via a web interface (free = publicly available, paid = private internally hosted Tableau Server)
![Page 19: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/19.jpg)
REFERENCES
NFL Play by Play Data (2002 – 2012) http://www.advancednflstats.com/2010/04/play-by-play-data.html
Python parser for NFL PBP Data http://www.10flow.com/
Tableau Public http://www.tableausoftware.com/public/
R http://cran.r-project.org/
SportsDataViz http://www.sportsdataviz.com/
![Page 20: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/20.jpg)
APPENDIX
![Page 21: Microsoft NERD Talk - R and Tableau - 2-4-2013](https://reader033.vdocuments.net/reader033/viewer/2022051412/54c6d4914a795973528b4581/html5/thumbnails/21.jpg)
TABLEAU DESKTOP FEATURE COMPARISON
| Feature | Public Edition | Personal Edition | Professional Edition |
| --- | --- | --- | --- |
| Operating System | Windows application | Windows application | Windows application |
| Saves to the Tableau Public Website? | Only | Option | Option |
| Opens Data in Files? | Yes | Yes | Yes |
| Opens Data in Databases? | No | No | Yes |
| Save Work Locally? | No | Yes | Yes |
| Export Results Locally? | No | Yes | Yes |
| Data Limitation? | 100,000 rows | Unlimited | Unlimited |
| Publish to Tableau Server? | No | No | Yes |
| Cost | Free | $999 | $1,999 |