POPULAR INDUSTRY APPLICATIONS OF R
CLIENT TECHNOLOGY LANDSCAPE
Data Manipulation, Transformation, Automation, Cleansing
Dashboards with data aggregations and filtering
Dashboards that leverage R’s statistical and visualization capabilities
CROSS-INDUSTRY APPLICATIONS OF R
Data Science / Machine Learning
R-Based Workflows (migrating away from SAS, SPSS, Stata)
R Wrappers Around APIs
Data Munging and Visualization
Rapid Prototyping
Training
DATA SCIENCE / MACHINE LEARNING
Machine learning is industry-agnostic
Domain expertise can help identify key drivers and variables and interpret results
Underlying algorithms are similar across industries
Visualize and Explore the Genealogy Process for Drug Manufacturing
– Enables scientists to track the usage of any material throughout the product lifecycle
– Predicts whether specific raw materials could negatively influence drug substance product quality attributes
– Built using d3.js and Shiny
Scrape data; perform data cleansing, QC, and model building; export results to Google Sheets via Jenny Bryan’s googlesheets R package
https://github.com/jennybc/googlesheets
R-BASED WORKFLOWS
Build robust data pipelines that minimize human error
Enable data integrity checks and standards at every step of the process
Promote collaboration via reproducible scripts
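As a rough illustration of "integrity checks at every step," the sketch below runs each transform and validates its output before the next stage starts, so bad data stops the pipeline instead of flowing downstream. It is written in Python only as a language-neutral sketch; an R script would follow the same pattern. The stage names, column names, and checks are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Stage = Tuple[str, Callable[[List[Row]], List[Row]], Callable[[List[Row]], None]]


def require_columns(rows: List[Row], columns: List[str]) -> None:
    """Integrity check: every row has a non-empty value for each listed column."""
    for i, row in enumerate(rows):
        for col in columns:
            if row.get(col) in (None, ""):
                raise ValueError(f"row {i}: missing {col!r}")


def require_non_negative(rows: List[Row], column: str) -> None:
    """Integrity check: a numeric column contains no negative values."""
    for i, row in enumerate(rows):
        if row[column] < 0:
            raise ValueError(f"row {i}: negative {column!r}")


def run_pipeline(rows: List[Row], stages: List[Stage]) -> List[Row]:
    """Apply each transform in order and validate its output before continuing."""
    for name, transform, check in stages:
        rows = transform(rows)
        try:
            check(rows)
        except ValueError as err:
            # Report which stage failed so the problem is easy to trace.
            raise ValueError(f"stage {name!r}: {err}") from err
    return rows


# Hypothetical stages for illustration.
STAGES: List[Stage] = [
    # Drop rows with no id, then confirm every remaining row has an id and an amount.
    ("drop_blank_ids",
     lambda rs: [r for r in rs if r.get("id") not in (None, "")],
     lambda rs: require_columns(rs, ["id", "amount"])),
    # Convert amounts from text to numbers, then confirm none are negative.
    ("parse_amount",
     lambda rs: [{**r, "amount": float(r["amount"])} for r in rs],
     lambda rs: require_non_negative(rs, "amount")),
]
```

Because each check sits next to the step it guards, a failure names the exact stage. This is what makes a scripted pipeline easier to audit than copy-and-paste steps.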
Hospitals
– Migrated away from SAS-based workflows, which were too expensive because the hospitals did not qualify for academic licensing
– Moved away from Excel-based reporting toward an R-based workflow
City Government
– Streamlined analysis of various data sources, such as Waze traffic data and approved building permits
– Identified problematic days, times, and traffic routes
51,474 Waze files were processed and their data extracted via Python
881 files were marked as malformed because they contained no Waze-related data or were completely empty
3,262,078 total rows visualized in Shiny
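The malformed-file rule above can be sketched as a small classifier. The sketch makes several assumptions that are not details from the project: each Waze file is a JSON object, its data sit under keys such as alerts, jams, or irregularities, and files that cannot be parsed also count as malformed. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: the keys under which a Waze feed file carries its data.
WAZE_KEYS = ("alerts", "jams", "irregularities")


def is_malformed(text: str) -> bool:
    """Return True if the file content is empty or holds no Waze-related data."""
    if not text.strip():
        return True  # completely empty (or whitespace only)
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return True  # unparseable, so no usable Waze data (assumed rule)
    if not isinstance(payload, dict):
        return True
    # Usable only if at least one expected key holds a non-empty value.
    return not any(payload.get(key) for key in WAZE_KEYS)


def split_files(paths: Iterable[Path]) -> Tuple[List[Path], List[Path]]:
    """Separate usable files from malformed ones."""
    good: List[Path] = []
    bad: List[Path] = []
    for path in paths:
        text = path.read_text(encoding="utf-8")
        (bad if is_malformed(text) else good).append(path)
    return good, bad
```

Counting the second list returned by `split_files` gives a figure like the malformed-file total on the slide. Only the good files would then be passed on for row extraction and visualization.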
Telecommunications
– Analysts were manually producing monthly client reports by copying, pasting, and formatting SQL output in Excel sheets
– Streamlined the process via R and Shiny so that results were produced faster and were stored, timestamped, and referenceable
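One simple way to make report output "stored, timestamped, and referenceable" is to write every run to a file whose name carries the client and a UTC timestamp. This is a Python sketch of that idea, not the client's actual R/Shiny implementation. The function names and the file-naming scheme are hypothetical.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List, Optional


def report_filename(client: str, now: datetime) -> str:
    """Build a sortable, timestamped file name; `now` is expected to be in UTC."""
    return f"{client}_report_{now:%Y%m%dT%H%M%SZ}.csv"


def save_report(rows: List[Dict[str, object]], client: str, out_dir: str,
                now: Optional[datetime] = None) -> Path:
    """Write a non-empty list of rows to a timestamped CSV and return its path."""
    now = now or datetime.now(timezone.utc)
    out_path = Path(out_dir) / report_filename(client, now)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as fh:
        # Column headers come from the first row's keys.
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

Because earlier runs are never overwritten, any past month's report can be located by client and date.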
R WRAPPERS AROUND APIS
Get the data into a suitable environment for data analysis and visualization faster
ZendeskR [1] – allows users to pull customer service ticket data into R for analyses such as time-to-resolution and text analysis
StattleshipR [2] – brings live, up-to-date sports data directly into R
[1] https://github.com/tcash21/zendeskR
[2] https://github.com/stattleship/stattleship-r
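Once a wrapper has pulled tickets into an analysis environment, time-to-resolution becomes a short derived calculation. The sketch below assumes each ticket record has ISO 8601 `created_at` and `solved_at` fields, with `solved_at` empty for open tickets. Those field names are illustrative assumptions, not the documented output of ZendeskR. It is written in Python only for illustration.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List, Optional

Ticket = Dict[str, Optional[str]]


def hours_to_resolution(ticket: Ticket) -> Optional[float]:
    """Hours between creation and solution, or None if the ticket is still open."""
    if not ticket.get("solved_at"):
        return None
    created = datetime.fromisoformat(ticket["created_at"])
    solved = datetime.fromisoformat(ticket["solved_at"])
    return (solved - created).total_seconds() / 3600


def median_resolution_hours(tickets: List[Ticket]) -> Optional[float]:
    """Median time-to-resolution over solved tickets only (None if none are solved)."""
    durations = [h for h in map(hours_to_resolution, tickets) if h is not None]
    return median(durations) if durations else None
```

The median is used rather than the mean because a few long-running tickets can otherwise dominate the figure.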
DATA MUNGING AND VISUALIZATION
Transform data for analysis (tidyr, reshape2)
Merge and clean disparate Excel sheets in R, then feed the results into Tableau or Shiny so clients can easily visualize and interact with the data
Impute missing data, normalize values, calculate percentiles by group, etc.
– Tasks like these are simple and repeatable in R
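The imputation, normalization, and group-percentile tasks listed above can be sketched in a few functions. This Python version assumes rows are dictionaries and makes three choices: it imputes with the group median, normalizes with z-scores, and defines percentile rank as the share of group values at or below each value. These are common choices among several. All function and column names are hypothetical, and in R the same steps would typically be done with grouped data frame operations.

```python
from collections import defaultdict
from statistics import mean, median, pstdev
from typing import Dict, List

Row = Dict[str, object]


def impute_group_median(rows: List[Row], group: str, col: str) -> List[Row]:
    """Fill missing values in `col` with the median of the same group."""
    seen = defaultdict(list)
    for r in rows:
        if r[col] is not None:
            seen[r[group]].append(r[col])
    medians = {g: median(v) for g, v in seen.items()}
    # Groups with no observed values keep None rather than guessing.
    return [{**r, col: medians.get(r[group]) if r[col] is None else r[col]} for r in rows]


def add_zscore(rows: List[Row], col: str) -> List[Row]:
    """Normalize `col` to mean 0 and population standard deviation 1 (no missing values allowed)."""
    values = [r[col] for r in rows]
    mu, sd = mean(values), pstdev(values)
    return [{**r, f"{col}_z": 0.0 if sd == 0 else (r[col] - mu) / sd} for r in rows]


def add_group_percentile(rows: List[Row], group: str, col: str) -> List[Row]:
    """Percentile rank within group: share of group values <= this value, times 100."""
    by_group = defaultdict(list)
    for r in rows:
        by_group[r[group]].append(r[col])
    out = []
    for r in rows:
        vals = by_group[r[group]]
        pct = 100 * sum(v <= r[col] for v in vals) / len(vals)
        out.append({**r, f"{col}_pct": pct})
    return out
```

Imputation runs first so that the normalization and percentile steps see complete data. Keeping each step as a small function is what makes the monthly rerun repeatable.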
Structured survey data in Excel
– Able to visualize and interact with the data in a new way
– Data is updated monthly and sent to clients
  • Run the data through R scripts, then regenerate reports and Tableau dashboards
RAPID PROTOTYPING
Define the problem or question
Load and explore the data in R
QC: identify problematic data and compute summary statistics
Determine whether there is enough of the right data to solve the problem
Quickly build Shiny dashboards that let end users interact with the data and ask “what-if” questions
Refine requirements based on real data
Save time and money on development cycles
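The QC step can start as a per-column summary plus a range check. This is a Python sketch of the idea, not a prescribed R routine. The column name and the 0 to 120 range used in the example below are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def qc_summary(rows: List[Row], columns: List[str]) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-column row count, missing count, and min/max/mean of non-missing values."""
    report = {}
    for col in columns:
        vals = [r.get(col) for r in rows]
        present = [v for v in vals if v is not None]
        report[col] = {
            "n": len(vals),
            "missing": len(vals) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "mean": mean(present) if present else None,
        }
    return report


def flag_out_of_range(rows: List[Row], col: str, low: float, high: float) -> List[int]:
    """Indices of rows whose value lies outside [low, high]; missing values are skipped."""
    return [i for i, r in enumerate(rows)
            if r.get(col) is not None and not low <= r[col] <= high]
```

For example, with ages of 34, missing, 212, and 58, the summary reports one missing value and a maximum of 212. `flag_out_of_range(rows, "age", 0, 120)` then points at the row with 212. These signals help decide whether the data can answer the question before any dashboard is built.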
TRAINING
Trained actuaries and business analysts to use R and Shiny for reproducibility, analyzing tranches, and simulating economies
Developed curriculum for Level, a full-time, 8-week, bootcamp-style data analytics training program
Developed curriculum for an 8-week, part-time, sports-themed data analytics course geared toward non-programmers
Trained business analysts to move away from Excel toward R and Shiny for analyzing patient data and predicting health outcomes
HOW R HAS INCREASED PRODUCTIVITY FOR CLIENTS
Reproducible analyses and reports
Collaborative workflow
Advanced statistical analysis
Shiny provides more customization and statistical capabilities than Tableau
Rapid prototyping for data-driven products
FUTURE TRENDS
R continues to increase in popularity due to its low cost, effectiveness, and accessibility (e.g., free online training at www.datacamp.com and https://www.rstudio.com/online-learning/)
Tableau + R will continue to be a powerful combination
There is not as much machine learning in industry as the hype implies
“Big” data is less of a problem than disparate, messy data in silos
http://r4stats.com/2016/04/19/rs-growth-continues-to-accelerate/
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html
SUGGESTIONS
Keep it simple: start small and prove value first
Rapidly prototype solutions via R and Shiny before a full in-house or outsourced build
Create data science sandboxes to encourage collaboration and innovation
Hold regular data hackathons or brainstorming meetings with data scientists and end users (domain expertise can help uncover the right questions to ask)