Microsoft NERD Talk - R and Tableau - 2-4-2013

Post on 27-Jan-2015

DESCRIPTION

This presentation is from a talk I gave at Microsoft NERD for the Boston Predictive Analytics Meetup group.

TRANSCRIPT

TABLEAU AND R

Beauty and the Beast

Tanya Cashorali

@tanyacash21

R – THE WORKHORSE

TABLEAU – MAKES BEAUTIFUL THINGS HAPPEN

BUT SO CAN R

TOGETHER THEY ARE UNSTOPPABLE

SERIOUSLY THOUGH, WHAT IS R?

Open source statistical programming environment

4,211 community-contributed packages on CRAN as of 1/31/2013 (http://cran.r-project.org/)

Interpreted; runs in a terminal or a GUI (RStudio)

WHAT IS TABLEAU?

Data visualization software for interactive business intelligence

Spun out of Stanford University in 2003; co-founder and chief scientist Pat Hanrahan was a founding employee of Pixar Animation Studios

Drag and drop interface

R AND TABLEAU

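[Diagram: R handles the data munging and data modeling. Its output reaches Tableau either as a file (write to .csv) or through a database (insert using the RODBC package), which Tableau reads over a live connection using various database drivers. Tableau produces the dashboards.]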
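The database route in the diagram can be sketched as a small helper. This is only an illustration: `write_to_db`, the DSN, and the table name are hypothetical, and it assumes the RODBC package is installed and an ODBC data source is already configured on the machine.

```r
# Sketch only: the helper name, DSN and table name are hypothetical placeholders.
write_to_db <- function(df, dsn, table_name) {
  # Open an ODBC connection to a data source configured on this machine
  channel <- RODBC::odbcConnect(dsn)
  if (!inherits(channel, "RODBC")) stop("could not connect to ", dsn)
  # Close the connection even if the insert fails
  on.exit(RODBC::odbcClose(channel))
  # Create the table from the data frame, without a row-name column
  RODBC::sqlSave(channel, df, tablename = table_name, rownames = FALSE)
}

# Example call (not run here, since it needs a configured data source):
# write_to_db(pbp2012, dsn = "nfl_dsn", table_name = "pbp2012")
```

Tableau Desktop would then point at the same database for a live connection.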

START WITH THE R WORKHORSE

Read data into R

pbp2012 <- read.csv(file="2012_nfl_pbp_data_reg_season.csv", header=TRUE)

View the data

str(pbp2012)

START WITH THE R WORKHORSE (CONT’D)

Conduct pre-processing or “data munging”

is.na(pbp2012$down); as.numeric(pbp2012$ydline)

Slice and dice

subset(pbp2012, qtr == 1)

Write to CSV for consumption by Tableau Public

write.csv(pbp2012, file="pbp2012.csv", row.names=FALSE)
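The read, munge, subset, and write steps above can be tried end to end on a toy data set. The rows below are made up and only borrow the column names used on the slides; the point is that `is.na()` and `as.numeric()` return new vectors, so their results have to be counted or assigned back to have any effect.

```r
# Toy stand-in for the play-by-play file; the values are made up.
csv_text <- "qtr,down,ydline,description
1,1,80,run left
1,NA,65,kickoff
2,3,42,pass deep
1,2,x,pass short"

pbp_toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Count missing downs rather than printing one TRUE/FALSE per row
n_missing_down <- sum(is.na(pbp_toy$down))

# as.numeric() returns a new vector, so assign it back to keep the change;
# non-numeric entries such as "x" become NA (normally with a warning)
pbp_toy$ydline <- suppressWarnings(as.numeric(pbp_toy$ydline))

# Keep first-quarter plays only
q1 <- subset(pbp_toy, qtr == 1)

# Round-trip through a temporary CSV, as one would for Tableau Public
out_file <- tempfile(fileext = ".csv")
write.csv(q1, file = out_file, row.names = FALSE)
q1_back <- read.csv(out_file, stringsAsFactors = FALSE)
```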

R NO HUDDLE EXAMPLE

## read in the data
seasons <- c(2002:2011)
pbp <- read.csv("2012_nfl_pbp_data_reg_season.csv", header=TRUE, stringsAsFactors=FALSE)
n1 <- read.csv("2002_nfl_pbp_data.csv", header=TRUE, stringsAsFactors=FALSE)

## keep only the 2012 columns that also appear in the earlier seasons' files
## (%in% is safe even when every column matches; -which() would then drop them all)
pbp <- pbp[, colnames(pbp) %in% colnames(n1)]

for(season in seasons){
  n1 <- read.csv(paste(season, "_nfl_pbp_data.csv", sep=""), header=TRUE, stringsAsFactors=FALSE)
  pbp <- rbind(pbp, n1)
}

## grab the no huddle plays
nh <- pbp[grep("Huddle", pbp$description),]

## count the no-huddle plays each team ran
nh.by.team <- table(nh$off)
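A variation on the stacking step, shown on made-up data: rather than trimming one file to match another, take the columns shared by every season before binding. The data frames and their `extra` column are invented for illustration; only `off` and `description` come from the slides.

```r
# Made-up stand-ins for two seasons; the real files have many more columns.
s2011 <- data.frame(off = c("NE", "BAL", "NE"),
                    description = c("(No Huddle) pass short", "run left",
                                    "(No Huddle, Shotgun) pass deep"),
                    season = 2011, stringsAsFactors = FALSE)
s2012 <- data.frame(off = c("BAL", "DEN"),
                    description = c("(No Huddle) run right", "punt"),
                    season = 2012, extra = c(1, 2), stringsAsFactors = FALSE)

season_list <- list(s2011, s2012)

# Keep only the columns every season shares, so rbind() sees matching names
common <- Reduce(intersect, lapply(season_list, colnames))
pbp_toy <- do.call(rbind, lapply(season_list, function(d) d[, common]))

# grepl() returns a logical vector, which is safe even when nothing matches
nh_toy <- pbp_toy[grepl("Huddle", pbp_toy$description), ]
nh_by_team <- table(nh_toy$off)
```

Binding once with `do.call(rbind, ...)` also avoids growing the data frame inside a loop.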

R NO HUDDLE EXAMPLE (CONT’D)

library(ggplot2)

## convert the table to a data frame (columns Var1 and Freq) before plotting
ggplot(as.data.frame(nh.by.team), aes(x=reorder(Var1, -Freq), y=Freq)) +
  geom_bar(stat="identity") +
  labs(x="Team", y="Number of Plays", title="Number of No Huddle Plays Run by Team 2002-2012") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))

R NO HUDDLE EXAMPLE (CONT’D)

## table by offensive team and quarter
nh.df <- data.frame(table(nh$off, nh$qtr))[-1,]
colnames(nh.df) <- c("Team", "Quarter", "Number")

## plot number of no huddle plays by team by quarter
## (stat="identity" plots the precomputed counts in Number)
ggplot(nh.df, aes(x=reorder(Team, Number), y=Number, fill=Quarter)) +
  geom_bar(stat="identity") +
  labs(x="Team", y="Number", title="Number of No Huddle Plays in the NFL by Team by Quarter") +
  theme(axis.text.x = element_text(angle = 50, hjust = 1))
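To see what the team-by-quarter table looks like once flattened, here is a small sketch on made-up data. It also filters blank team labels by value; the slide drops the first row by position, and treating that row as a blank team label is an assumption.

```r
# Made-up no-huddle plays: offensive team and quarter
nh_toy <- data.frame(off = c("NE", "NE", "BAL", "NE", "DEN"),
                     qtr = c(1, 4, 4, 4, 2))

# A two-way table flattens to one row per team/quarter pair,
# with default column names Var1, Var2 and Freq
nh_df <- as.data.frame(table(nh_toy$off, nh_toy$qtr))
colnames(nh_df) <- c("Team", "Quarter", "Number")

# Drop blank team labels by value instead of by row position
nh_df <- nh_df[nh_df$Team != "", ]

# Quarter comes out as a factor, which suits a discrete fill scale
ne_q4 <- nh_df$Number[nh_df$Team == "NE" & nh_df$Quarter == "4"]
```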

TABLEAU-IFIED

http://sportsdataviz.com/percentage-no-huddle-plays-by-nfl-team-by-season-2002-2012/

## write file for Tableau
write.table(nh.by.team, file="noHuddles.txt", sep="\t", row.names=FALSE)
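The linked dashboard shows percentages by team and season, while the slide code writes raw counts. One way the percentage could be computed before export is sketched below on made-up data; the `no_huddle` flag and the `season` column are assumptions, not columns confirmed by the slides.

```r
# Made-up plays; no_huddle flags whether the description mentioned "Huddle"
pbp_toy <- data.frame(
  season = c(2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012),
  off    = c("NE", "NE", "BAL", "BAL", "NE", "NE", "BAL", "BAL"),
  no_huddle = c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)
)

# mean() of a logical is the share of TRUE values, so this gives the
# no-huddle share of plays per team per season
nh_pct <- aggregate(no_huddle ~ off + season, data = pbp_toy, FUN = mean)
nh_pct$pct_no_huddle <- 100 * nh_pct$no_huddle
nh_pct$no_huddle <- NULL

# Tab-delimited text file; quote = FALSE keeps team names unquoted
out_file <- tempfile(fileext = ".txt")
write.table(nh_pct, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```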

IS THE RAVENS OFFENSE PREDICTABLE?

http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/

## Read in the data generated by play_parser.py
plays <- read.csv("plays.csv", header=TRUE, stringsAsFactors=FALSE)

## extract Baltimore offensive plays
plays <- plays[grep("BAL", plays$gameid),]
plays <- subset(plays, def != "BAL")

## 1,625 offensive BAL plays in the 2012 regular season
nrow(plays)

## classify the other play types that are not passes or runs
## (logical indexing with grepl() is safe even when a pattern matches nothing)
plays$type <- as.character(plays$type)
plays$type[grepl("PENALTY", plays$desc)] <- "Penalty"
plays$type[grepl("kick", plays$desc)] <- "Kick"
plays$type[grepl("punt", plays$desc)] <- "Punt"
plays$type[grepl("field goal", plays$desc)] <- "FG"

## create a binned variable yardsToGo
plays$yardsToGo <- "0"
plays$yardsToGo[plays$ydline >= 80] <- ">= 80"
plays$yardsToGo[plays$ydline >= 50 & plays$ydline < 80] <- "50 <= yardsToGo < 80"
plays$yardsToGo[plays$ydline >= 30 & plays$ydline < 50] <- "30 <= yardsToGo < 50"
plays$yardsToGo[plays$ydline >= 10 & plays$ydline < 30] <- "10 <= yardsToGo < 30"
plays$yardsToGo[plays$ydline < 10] <- "< 10"
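The same binning can be written in one call with base R's `cut()`. This is an alternative sketch rather than the method used in the talk, and the yard-line values are made up.

```r
# Made-up yard-line values
ydline <- c(3, 10, 29, 30, 49, 50, 79, 80, 99)

# right = FALSE makes each interval closed on the left: [10, 30), [30, 50), ...
yards_to_go <- cut(ydline,
                   breaks = c(-Inf, 10, 30, 50, 80, Inf),
                   labels = c("< 10", "10 <= yardsToGo < 30",
                              "30 <= yardsToGo < 50", "50 <= yardsToGo < 80",
                              ">= 80"),
                   right = FALSE)
bin_counts <- table(yards_to_go)
```

`cut()` returns an ordered set of factor levels, so the bins keep their natural order when tabulated or plotted.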

## write out file for Tableau
## (write.csv, so the .csv file is actually comma-separated)
write.csv(plays, file="BALplays2012regSeason.csv", row.names=FALSE)

IS THE RAVENS OFFENSE PREDICTABLE? (CONT’D)

http://sportsdataviz.com/superbowl-xlvii-2013-baltimore-ravens-offense-predictability/

Set the scenario for each play during the Super Bowl and predicted either run or pass based on the regular-season percentages for that scenario.
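One way to read "based on percentage" is to predict whichever play type was more common in the matching regular-season scenario. The sketch below takes that reading as an assumption; the data, the choice of down and `yardsToGo` as the scenario, and all values are made up.

```r
# Made-up regular-season plays: scenario columns plus the observed play type
train <- data.frame(
  down      = c(1, 1, 1, 3, 3, 3, 2, 2),
  yardsToGo = c("< 10", "< 10", "< 10", ">= 80", ">= 80", ">= 80", "< 10", "< 10"),
  type      = c("RUN", "RUN", "PASS", "PASS", "PASS", "RUN", "PASS", "PASS"),
  stringsAsFactors = FALSE
)

# Share of each play type within every scenario (each row sums to 1)
scenario <- paste(train$down, train$yardsToGo, sep = " | ")
shares <- prop.table(table(scenario, train$type), margin = 1)

# Predict whichever play type has the larger share in that scenario
predicted <- colnames(shares)[apply(shares, 1, which.max)]
names(predicted) <- rownames(shares)

# Made-up game plays to score the predictions against
game <- data.frame(down = c(1, 3, 2), yardsToGo = c("< 10", ">= 80", "< 10"),
                   type = c("RUN", "RUN", "PASS"), stringsAsFactors = FALSE)
game$predicted <- predicted[paste(game$down, game$yardsToGo, sep = " | ")]

# Fraction of game plays where the prediction matched the actual call
accuracy <- mean(game$predicted == game$type)
```

A scenario that never occurred in the training data yields `NA` here, so a real analysis would need a fallback rule for unseen situations.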

RESULTS AND CONSIDERATIONS

Predicted plays correctly 60.3% of the time

Missing variables (defensive and offensive formations, crowd noise, weather, injured players, power outage, etc.)

Change in Ravens’ offensive coordinator in week 15

Lack of data

SUMMARY

Initial analysis in R

Explore the data

Pre-process

Write to file for consumption by Tableau Public or to database for Tableau Desktop

Create interactive dashboards in Tableau in minutes that can be shared via a web interface (free = publicly available, paid = private internally hosted Tableau Server)

APPENDIX

TABLEAU DESKTOP FEATURE COMPARISON

Feature                               Public Edition        Personal Edition      Professional Edition
Operating system                      Windows application   Windows application   Windows application
Saves to the Tableau Public website?  Only option           Option                Option
Opens data in files?                  Yes                   Yes                   Yes
Opens data in databases?              No                    No                    Yes
Save work locally?                    No                    Yes                   Yes
Export results locally?               No                    Yes                   Yes
Data limitation?                      100,000 rows          Unlimited             Unlimited
Publish to Tableau Server?            No                    No                    Yes
Cost                                  Free                  $999                  $1,999
