Managing large (and small) R based solutions with R Suite

Wit Jakuczun, WLOG Solutions
2017-09-29

Copyright (c) WLOG Solutions

Upload: wit-jakuczun

Post on 22-Jan-2018



Open source is a leader in the advanced-analytics race, thanks to GPU engines and the power of its community.

Example?

4000x4 elastic-net models (CV-5) for a 45K x 10K dataset in 1.5 minutes!

Join 21st centuRy today!


What is R?

A dynamic, interpreted, general-purpose programming language.

A stable open-source product, developed by the R Foundation since ~1995.

Created for data analysis.

Data preparation:

    flights %>%
      group_by(year, month, day) %>%
      select(arr_delay, dep_delay) %>%
      summarise(
        arr = mean(arr_delay, na.rm = TRUE),
        dep = mean(dep_delay, na.rm = TRUE)
      ) %>%
      filter(arr > 30 | dep > 30)

Predictive model building:

    z <- scaled_input %>%
      layer_convolution2D(c(5,5), 32, pad = TRUE) %>%
      layer_max_pooling(c(3,3), c(2,2)) %>%
      layer_convolution2D(c(3,3), 48) %>%
      layer_max_pooling(c(3,3), c(2,2)) %>%
      layer_convolution2D(c(3,3), 64) %>%
      layer_dense(96) %>%
      layer_dropout(0.5) %>%
      layer_dense(num_output_classes,
                  activation = activation_softmax())

R is a community.

● CRAN: 10K+ packages
● GitHub: more and more popular

However, out-of-the-box R does not provide many essential features required in a large-scale production deployment!

R Software Development

What is large scale?

R software development vs R scripting

● Large scale ~ 10K+ LOC
● Small scale ~ 1K LOC

Examples of large scale projects

Workforce optimisation (diagram): historical data, traffic forecasting, workforce optimiser, business rules, human intervention, results, efficiency curves.

Cash optimization (diagram): historical data, cash flow “forecasting”, optimisation phase I, optimisation phase II, business rules, human intervention, results, FONG.

Environment (diagram): production, dev, continuous integration, version control, RStudio Server, data-science team.

R Software Development

What does R give us “out-of-the-box”?

Software development process (diagram): Dev, version control, continuous integration & deployment, Prod.

What is nice about R?

● Package help system
● Package dependency system
● External data in packages
● Vignettes
● Tests

Where does R have its rough edges?

Package sources (diagram): CRAN (MRAN), GitHub, other; dev environment; installed packages; local CRAN; source code repo.

    install.packages("ggplot2")

Teaser: alcohol concentration in blood

Instruction (Windows only!)

1. Install vanilla R 3.4.2
2. Download package (https://goo.gl/RnZAQg)
3. Unzip
4. Open CMD
5. Rscript R/master.R --port=7137
6. In browser open http://localhost:7137

http://www.sumsar.net/blog/2014/07/estimate-your-bac-using-drinkr/

R Software Development

R Suite walk-through

Preliminaries


Install R

https://cran.r-project.org/bin/windows/base/R-3.4.2-win.exe

Install R Suite

http://rsuite.io/RSuite_Download.php

Development cycle with R Suite

1. Start a project
2. Add a package to the project
   a. Develop the package (using devtools)
3. Add dependencies to the project
4. Build the package
5. Build a deployment package

Open tutorial

http://rsuite.io/RSuite_Tutorial.php?article=basic_workflow.md


Project structure

● deployment - where all the packages are installed.
● logs - where all the logs are stored.
● packages - where all the project packages are stored.
● R - where master scripts (~ main) are stored.
● tests - where project tests are stored.
● import - (optional) raw data to be imported.
● export - (optional) results generated by our R program.
● work - (optional) temporary results.

Configuration files.

PARAMETERS

Example:

    RSuiteVersion: 0.10.212
    RVersion: 3.3
    Project: myproject
    Repositories: MRAN[2017-09-28], Dir[]
    Artifacts: config_templ.txt

● RVersion - R version for this project
● Project - project name
● Repositories
  ○ MRAN[YYYY-MM-DD]
  ○ S3[URL]
  ○ URL[URL]
  ○ Dir[]
● Artifacts - what else should be added to the deployment package.

config_templ.txt (config.txt)

Example:

    LogLevel: INFO
    N_days: 365
    solver_max_iterations: 10
    solver_opt_horizon: 8

● LogLevel - level for loggers
● config_templ.txt - template for config
● config.txt - deployment version
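A minimal sketch of reading such a file in R, assuming it follows the `key: value` layout of the example. Base R's `read.dcf()` parses that layout; whether R Suite itself loads config.txt this way is an assumption, and the temporary file below is only a stand-in for a real config.txt.

```r
# Stand-in for config.txt, using the keys from the slide's example.
cfg_path <- tempfile(fileext = ".txt")
writeLines(c(
  "LogLevel: INFO",
  "N_days: 365",
  "solver_max_iterations: 10",
  "solver_opt_horizon: 8"
), cfg_path)

# read.dcf() parses "key: value" lines into a one-row character matrix.
raw <- read.dcf(cfg_path)
config <- as.list(raw[1, ])

# Values arrive as strings, so numeric settings need explicit conversion.
n_days <- as.integer(config$N_days)
max_iter <- as.integer(config$solver_max_iterations)
```

Keeping the conversion explicit makes a malformed value fail early (as `NA` with a warning) instead of deep inside the solver.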


master.R

Here goes your code...

Autodetected path of master.R

There can be many master scripts!
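One way a master script can locate itself when started with Rscript, sketched under assumptions: `detect_script_dir` is a hypothetical helper, and the slides do not show how R Suite actually detects the path. The sketch relies on the `--file=` argument that Rscript passes to R.

```r
# Hypothetical helper: returns the directory of the running script when it was
# started as `Rscript path/to/master.R`, or the working directory when no
# --file= argument is present (for example in an interactive session).
detect_script_dir <- function(args = commandArgs(trailingOnly = FALSE)) {
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0) {
    return(getwd())
  }
  script_path <- sub("^--file=", "", file_arg[1])
  dirname(normalizePath(script_path, mustWork = FALSE))
}

script_dir <- detect_script_dir()
```

Resolving paths relative to the script rather than the working directory lets several master scripts share one project layout regardless of where they are launched from.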


Put all logic into packages

Select external packages carefully. And control their versions!

data.table


print is not for logging.

Forbidden


loginfo("Phase 1 passed")

logdebug("Iter %d done", i)

logwarning("Are you sure?")

logerror("I failed :(")
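These calls match the `logging` package on CRAN. A minimal setup sketch, assuming that package is installed; the temporary log file and the loop are illustrative and not taken from the slides.

```r
library(logging)

# Configure the root logger; DEBUG lets logdebug() messages through.
basicConfig(level = "DEBUG")

# Additionally write every record to a file (a temp file here, for illustration).
log_file <- tempfile(fileext = ".log")
addHandler(writeToFile, file = log_file, level = "DEBUG")

loginfo("Phase 1 passed")
for (i in 1:3) {
  # Extra arguments are substituted sprintf-style into the message.
  logdebug("Iter %d done", i)
}
logwarning("Are you sure?")
logerror("I failed :(")
```

Unlike print, each record carries a timestamp and a level, so verbosity can be changed through configuration instead of code edits.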


pkg_loginfo("Phase 1 passed")

pkg_logdebug("Iter %d done", i)

pkg_logwarning("Are you sure?")

pkg_logerror("I failed :(")


Automate building, deploying, testing, etc.


Wit Jakuczun, PhD


Field tested R ecosystem for Enterprise

http://rsuite.io