Popular Industry Applications of R

EARL BOSTON 2016 TANYA CASHORALI @TANYACASH21

Upload: tanya-cashorali

Post on 07-Feb-2017



SPEAKER PROFILE

[Figure: timeline, 2004 to 2016]

CLIENT TECHNOLOGY LANDSCAPE

Data Manipulation, Transformation, Automation, Cleansing

Dashboards with data aggregations, filtering

Dashboards that leverage R’s statistical and visualization capabilities


CROSS-INDUSTRY APPLICATIONS OF R

Data Science / Machine Learning
R-Based Workflows (migrating away from SAS, SPSS, Stata)
R Wrappers Around APIs
Data Munging and Visualization
Rapid Prototyping
Training

DATA SCIENCE / MACHINE LEARNING

Machine learning is industry-agnostic
Domain expertise can help identify key drivers and variables, interpret results
Underlying algorithms are similar

DATA SCIENCE / MACHINE LEARNING

Visualize and Explore Genealogy Process for Drug Manufacturing
  – Enable scientists to track the usage of any material throughout the product lifecycle
  – Predict whether specific raw materials could have a negative influence on drug substance product quality attributes
  – Built using d3.js and Shiny

DATA SCIENCE / MACHINE LEARNING

Scrape data
Data cleansing, QC, model building, export to Google Sheets via Jenny Bryan’s googlesheets R package

https://github.com/jennybc/googlesheets

R-BASED WORKFLOWS

Build robust data pipelines that minimize human error

Enable data integrity checks and standards at every step of the process

Promote collaboration via reproducible scripts

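As an illustration of integrity checks at every step, here is a minimal sketch of a pipeline that validates its input and records row counts before and after each stage. It is written in Python purely for illustration (the client work described in the talk was done in R), and the column names, the `CheckResult` record and the step functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    step: str
    rows_in: int
    rows_out: int


def check_required_columns(rows, required):
    """Raise if any row is missing a required column."""
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    return rows


def drop_negative_counts(rows):
    """Remove rows whose (hypothetical) 'count' field is negative."""
    return [r for r in rows if r["count"] >= 0]


def run_pipeline(rows, steps):
    """Apply each step in order and log row counts before and after."""
    log = []
    for name, step in steps:
        before = len(rows)
        rows = step(rows)
        log.append(CheckResult(name, before, len(rows)))
    return rows, log


# Made-up sample data for the sketch.
raw = [
    {"site": "A", "count": 12},
    {"site": "B", "count": -3},
    {"site": "C", "count": 7},
]
steps = [
    ("required columns", lambda r: check_required_columns(r, ["site", "count"])),
    ("drop negative counts", drop_negative_counts),
]
clean, log = run_pipeline(raw, steps)
for entry in log:
    print(f"{entry.step}: {entry.rows_in} -> {entry.rows_out}")
```

Because every step is a plain function and the log is produced by the script itself, a colleague rerunning the same script on the same input should see the same counts, which is the kind of reproducibility the slide argues for.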

R-BASED WORKFLOWS

Hospitals
  – Migrate away from SAS-based workflows due to not qualifying for academic licensing (too expensive)
  – Migrate away from Excel-based reporting and toward an R-based workflow

R-BASED WORKFLOWS

City Government
  – Streamline analysis of various data sources such as Waze Traffic Data, approved building permits
  – Identified problematic days, times and traffic routes

51,474 Waze files were processed and data were extracted via Python

881 files were marked as malformed as they contained no Waze related data or were completely empty

3,262,078 total rows visualized in Shiny
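The extraction step described above might look roughly like the following Python sketch. The talk does not describe the Waze file layout, so the JSON format and the `alerts` key are assumptions made for illustration only; a file is treated as malformed when it is empty, unparseable, or lacks that key.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical: a file is usable if it is valid JSON with a non-empty
# "alerts" list. The real Waze feed layout is not given in the talk.
RECORD_KEY = "alerts"


def extract_rows(path):
    """Return the list of records in a file, or None if it is malformed."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return None  # completely empty file
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return None  # not parseable at all
    if not isinstance(payload, dict):
        return None
    rows = payload.get(RECORD_KEY)
    if not isinstance(rows, list) or not rows:
        return None  # no Waze-related data
    return rows


def process_directory(directory):
    """Collect rows from every file and count the malformed ones."""
    all_rows, malformed = [], 0
    files = sorted(Path(directory).glob("*.json"))
    for path in files:
        rows = extract_rows(path)
        if rows is None:
            malformed += 1
        else:
            all_rows.extend(rows)
    return {"files": len(files), "malformed": malformed, "rows": all_rows}


# Tiny made-up sample: one good file, one empty, one without Waze data.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text('{"alerts": [{"type": "JAM"}, {"type": "ACCIDENT"}]}')
    Path(tmp, "b.json").write_text("")
    Path(tmp, "c.json").write_text('{"other": 1}')
    summary = process_directory(tmp)

print(summary["files"], summary["malformed"], len(summary["rows"]))
```

Counting malformed files rather than silently skipping them is what makes figures like the three on this slide reportable.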

R-BASED WORKFLOWS

Telecommunications
  – Analysts were manually producing monthly client reports by copying, pasting and formatting Excel sheets from SQL output
  – Streamlined process via R and Shiny so that results were stored, timestamped, produced faster, and referenceable

R WRAPPERS AROUND APIS

Get the data into a suitable environment for data analysis and visualization faster

ZendeskR [1] – allows users to pull customer service ticket data into R for analysis such as time-to-resolution, text analysis, etc.

StattleshipR [2] – bring live and up-to-date sports data directly into R

1. https://github.com/tcash21/zendeskR
2. https://github.com/stattleship/stattleship-r
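Once ticket data is in an analysis environment, a time-to-resolution calculation is short. The sketch below is Python and does not call zendeskR or the Zendesk API; the ticket records and the field names `created_at` and `solved_at` are hypothetical stand-ins for whatever a wrapper returns.

```python
from datetime import datetime
from statistics import median


def hours_to_resolution(ticket):
    """Hours between creation and resolution, or None if still open."""
    if not ticket.get("solved_at"):
        return None
    created = datetime.fromisoformat(ticket["created_at"])
    solved = datetime.fromisoformat(ticket["solved_at"])
    return (solved - created).total_seconds() / 3600


# Made-up tickets; the third one is still open.
tickets = [
    {"id": 1, "created_at": "2016-10-01T09:00:00", "solved_at": "2016-10-01T15:00:00"},
    {"id": 2, "created_at": "2016-10-02T10:00:00", "solved_at": "2016-10-03T10:00:00"},
    {"id": 3, "created_at": "2016-10-03T08:30:00", "solved_at": None},
]

durations = [h for h in map(hours_to_resolution, tickets) if h is not None]
print("median hours to resolution:", median(durations))
```

Open tickets are excluded rather than counted as zero, so the median reflects only tickets that were actually resolved.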

DATA MUNGING AND VISUALIZATION

Transform data for analysis (tidyr, reshape2)
Merge and clean disparate Excel sheets in R, then feed into Tableau or Shiny to allow clients to easily visualize and interact with data
Imputation of missing data, normalization, calculating percentiles by groups, etc.
  – Tasks like these are simple and repeatable in R
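To make two of these tasks concrete, here is a small sketch of group-wise imputation and percentiles by group. It uses plain Python rather than R, tidyr or reshape2, and the survey rows, the `region` and `score` fields and the median-based imputation rule are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median


def impute_by_group(rows, group_key, value_key):
    """Fill missing values with the median of the row's group.

    Rows in a group with no observed values are left as None.
    """
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])
    medians = {g: median(v) for g, v in observed.items()}
    filled = []
    for row in rows:
        value = row[value_key]
        if value is None:
            value = medians.get(row[group_key])
        filled.append({**row, value_key: value})
    return filled


def percentile_rank_by_group(rows, group_key, value_key):
    """Add 'pct_rank': percent of values in the group that are <= this one.

    Assumes missing values have already been imputed.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    ranked = []
    for row in rows:
        values = groups[row[group_key]]
        at_or_below = sum(v <= row[value_key] for v in values)
        ranked.append({**row, "pct_rank": 100 * at_or_below / len(values)})
    return ranked


# Made-up survey data with one missing score.
survey = [
    {"region": "East", "score": 4.0},
    {"region": "East", "score": None},
    {"region": "East", "score": 2.0},
    {"region": "West", "score": 5.0},
    {"region": "West", "score": 3.0},
]
ranked = percentile_rank_by_group(
    impute_by_group(survey, "region", "score"), "region", "score"
)
for row in ranked:
    print(row["region"], row["score"], round(row["pct_rank"], 1))
```

Keeping each transformation as a separate function means the same script can be rerun unchanged when the next batch of data arrives.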

DATA MUNGING AND VISUALIZATION

Structured survey data in Excel
  – Able to visualize and interact with data in a new way
  – Data is updated monthly and sent to clients
    • run data through R scripts, regenerate reports and Tableau dashboards


RAPID PROTOTYPING

Define the problem or question
Load and explore the data in R
QC: Identify problematic data, summary statistics
Determine if there is enough and the right data to solve the problem
Quickly build Shiny dashboards to allow end user to interact with the data and ask “what-if” questions
Refine requirements based on real data
Save time and money on dev cycles


TRAINING

Trained Actuaries and Business Analysts to use R and Shiny for reproducibility, analyzing tranches, simulating economies

Developed curriculum for Level, a full-time 8-week bootcamp style data analytics training

Developed curriculum for an 8-week part-time sports-themed data analytics course geared towards non-programmers

Trained Business Analysts to move away from Excel towards R and Shiny for analyzing patient data and predicting health outcomes

HOW R HAS INCREASED PRODUCTIVITY FOR CLIENTS

Reproducible analyses and reports
Collaborative workflow
Advanced statistical analysis
Shiny provides customization and statistical capabilities that Tableau cannot
Rapid prototyping for data-driven products

FUTURE TRENDS

R continues to increase in popularity due to its cost, effectiveness and accessibility (free online trainings e.g. www.datacamp.com, https://www.rstudio.com/online-learning/, etc.)
Tableau + R will continue to be a powerful combination
Not as much machine learning in the industry as the hype implies
“Big” data not as much of a problem as disparate, messy data in silos

http://r4stats.com/2016/04/19/rs-growth-continues-to-accelerate/
http://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html

SUGGESTIONS

Keep it simple: start small and prove value first
Rapidly prototype solutions via R and Shiny before a full in-house or outsourced build
Create data science sandboxes to encourage collaboration and innovation
Hold regular data hackathons or brainstorming meetings with data scientists and end users (domain expertise can help uncover the right questions to ask)

QUESTIONS?
