R software development - How to write and maintain 30K+ LOC in R and survive?

Wit Jakuczun, WLOG Solutions
2017-06-20
Copyright (c) WLOG Solutions

Upload: wit-jakuczun

Post on 22-Jan-2018

The world of analytics has changed.

4000x4 elastic-net models (5-fold CV) for a 45K x 10K dataset in 1.5 minutes!

Join the 21st centuRy today!

What is R?

A dynamic, interpreted, general-purpose programming language.

Stable open-source product, developed since about 1995 and supported by the R Foundation.

Created for data analysis.

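library(dplyr)
library(nycflights13)  # provides the flights data set

# Mean arrival and departure delay per day; keep days averaging over 30 minutes
flights %>%
  group_by(year, month, day) %>%
  select(arr_delay, dep_delay) %>%
  summarise(
    arr = mean(arr_delay, na.rm = TRUE),
    dep = mean(dep_delay, na.rm = TRUE)
  ) %>%
  filter(arr > 30 | dep > 30)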

z <- scaled_input %>%
  layer_convolution2D(c(5, 5), 32, pad = TRUE) %>%
  layer_max_pooling(c(3, 3), c(2, 2)) %>%
  layer_convolution2D(c(3, 3), 48) %>%
  layer_max_pooling(c(3, 3), c(2, 2)) %>%
  layer_convolution2D(c(3, 3), 64) %>%
  layer_dense(96) %>%
  layer_dropout(0.5) %>%
  layer_dense(num_output_classes, activation = activation_softmax())

R is a community.

CRAN: 10K+ packages

GitHub: more and more popular

http://githut.info

R is really popular

TIOBE Index, 2017

Estimated 2M+ users all over the world.

Sounds like Python?

R: package reticulate

Python: package rpy2

R Software Development

What is large scale?

R software development vs R scripting

Large scale ~ 10K+ LOC

Small scale ~ 1K LOC
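
To see roughly where a project falls between these two thresholds, one option is to count the non-blank, non-comment lines in its R sources. The sketch below assumes all code lives in `.R` files under a single directory; `count_loc` is a hypothetical helper name, not something from the talk.

```r
# Count non-blank, non-comment lines in all .R files under a directory.
count_loc <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  counts <- vapply(files, function(f) {
    lines <- trimws(readLines(f, warn = FALSE))
    # Keep lines that are non-empty and do not start with a comment marker.
    sum(nzchar(lines) & !startsWith(lines, "#"))
  }, integer(1))
  sum(counts)
}
```

The result is only an approximation: code that follows a comment marker inside a string, or files with other extensions, are not handled.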

[Diagram: package sources (CRAN (MRAN), GitHub, other), a local CRAN, a source code repo, and the R environment with its installed packages]

R Software Development

Best practices by WLOG

Always run the final test from the command line.

Rscript my_script.R
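
A script that behaves well under `Rscript` typically reads its inputs from the command line rather than from the interactive workspace. The following is a minimal sketch of such a script; the `n_days` argument, its default, and the function names are assumptions made for illustration.

```r
# my_script.R: a hypothetical master script meant to be run as
#   Rscript my_script.R <n_days>

# Turn raw command-line arguments into validated parameters.
parse_args <- function(args) {
  n_days <- if (length(args) >= 1) {
    suppressWarnings(as.integer(args[[1]]))
  } else {
    365L  # assumed default when no argument is given
  }
  if (is.na(n_days) || n_days <= 0L) {
    stop("n_days must be a positive integer")
  }
  list(n_days = n_days)
}

main <- function(args = commandArgs(trailingOnly = TRUE)) {
  params <- parse_args(args)
  cat(sprintf("Running for %d days\n", params$n_days))
  invisible(params)
}

# Run main() only in a non-interactive session such as Rscript.
if (!interactive()) {
  main()
}
```

Running it from a fresh process catches problems that an interactive session hides, such as objects left over in the workspace or packages that were attached by hand.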

Put all logic into packages.

Package help system

Package dependency system

External data in packages

Vignettes

Tests
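
To make the package layout concrete, here is a minimal sketch that writes a tiny source package to disk using base R only. The package name `mylogic`, the function `scale_delay`, the license, and all metadata values are hypothetical; help pages, data, and vignettes are left out for brevity.

```r
create_min_package <- function(path, name = "mylogic") {
  pkg <- file.path(path, name)
  dir.create(file.path(pkg, "R"), recursive = TRUE)
  dir.create(file.path(pkg, "tests"))

  # DESCRIPTION holds the metadata and declares dependencies.
  write.dcf(
    data.frame(
      Package = name,
      Title = "Project Logic",
      Version = "0.1.0",
      Description = "Holds the business logic of the project.",
      License = "GPL-3",
      Imports = "stats",
      stringsAsFactors = FALSE
    ),
    file = file.path(pkg, "DESCRIPTION")
  )

  # NAMESPACE lists what the package exports.
  writeLines("export(scale_delay)", file.path(pkg, "NAMESPACE"))

  # The logic itself lives in package functions, not in scripts.
  writeLines(
    "scale_delay <- function(x) (x - mean(x)) / stats::sd(x)",
    file.path(pkg, "R", "scale_delay.R")
  )

  # Files under tests/ are run by R CMD check.
  writeLines(c(
    sprintf("library(%s)", name),
    "stopifnot(abs(mean(scale_delay(c(1, 2, 3)))) < 1e-12)"
  ), file.path(pkg, "tests", "test_scale_delay.R"))

  invisible(pkg)
}

pkg_dir <- create_min_package(tempdir())
```

The resulting directory can then be built and installed with `R CMD build` and `R CMD INSTALL`, and a master script only needs to load the package and call its functions.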

Use any source code version control system.

Yes, even if you are working alone. :)

print is not for logging.

Forbidden

logging::loginfo("Phase 1 passed")
logging::logdebug("Iter %d done", i)
logging::logwarn("Are you sure?")
logging::logerror("I failed :(")
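
Calls like these only produce output once a handler is configured. The sketch below shows one possible setup with the `logging` package, assuming it is installed: a console handler at DEBUG level plus a file handler at INFO level. The log file location is a placeholder.

```r
library(logging)

# Console handler on the root logger; show everything from DEBUG up.
basicConfig(level = "DEBUG")

# Also write INFO and above to a file (the path is a placeholder).
log_file <- file.path(tempdir(), "run.log")
addHandler(writeToFile, file = log_file, level = "INFO")

loginfo("Phase 1 passed")
logdebug("Iter %d done", 3L)  # console only: below the file handler's level
logwarn("Are you sure?")
```

Unlike `print`, each record carries a timestamp and a level, and the verbosity can be changed in one place without touching the code that emits the messages.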

Select external packages carefully. And control their versions!

data.table
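
One lightweight way to control versions is to record the versions a project was developed against and compare them with what is installed before running anything. The sketch below is an assumption about how such a check could look, not the mechanism used in the talk; `check_versions` is a hypothetical helper.

```r
# Compare installed package versions with the versions the project expects.
# `pinned` is a named character vector: names are packages, values versions.
check_versions <- function(pinned) {
  installed <- vapply(names(pinned), function(pkg) {
    # packageVersion() raises an error when the package is not installed.
    tryCatch(as.character(utils::packageVersion(pkg)),
             error = function(e) NA_character_)
  }, character(1))
  data.frame(
    package = names(pinned),
    expected = unname(pinned),
    installed = unname(installed),
    ok = unname(!is.na(installed) & installed == pinned),
    stringsAsFactors = FALSE
  )
}
```

A call such as `check_versions(c(data.table = "1.10.4"))` (the version number is only an example) returns one row per package, with `ok = FALSE` when the package is missing or its version differs.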

Use configuration files.

PARAMETERS
SnapshotDate: 2015-11-01
PackagesPath: packages
LocalRepoPath: repository
ScriptPath: executionScripts
Project: XXX
ZipVersion:
Artifacts:

CONFIG
LogLevel: INFO
work_path: ../work
data_path: ../data
export_path: ../export
N_days: 365
solver_max_iterations: 10
solver_opt_horizon: 8
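
Files in this "key: value" form can be read with base R's `read.dcf`. The sketch below assumes the configuration is a single DCF record, writes the CONFIG values from the slide to a temporary file, and reads them back; `read_config` is a hypothetical helper name.

```r
# Read a "key: value" configuration file into a named list of strings.
read_config <- function(path) {
  raw <- read.dcf(path)
  stopifnot(nrow(raw) == 1L)  # expect a single record
  as.list(raw[1, ])
}

# Demo: the CONFIG values from the slide, written to a temporary file.
cfg_path <- tempfile(fileext = ".txt")
writeLines(c(
  "LogLevel: INFO",
  "work_path: ../work",
  "data_path: ../data",
  "export_path: ../export",
  "N_days: 365",
  "solver_max_iterations: 10",
  "solver_opt_horizon: 8"
), cfg_path)

cfg <- read_config(cfg_path)

# Values arrive as strings, so numeric settings need explicit conversion.
n_days <- as.integer(cfg$N_days)
```

Keeping paths and tuning parameters in such a file means the same code can move between development, test, and production by changing the file only.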

Use standard project structure.

Master scripts

Project local packages

Tests

External packages

Logs

Work

Import

Export

Configuration
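
A standard structure is easiest to keep when it is created by code rather than by hand. The sketch below maps each element of the list above to a directory; the directory names are assumptions chosen for illustration, since the slide does not give them.

```r
# Hypothetical directory names for each element of the project structure.
project_dirs <- c(
  master_scripts    = "R",
  local_packages    = "packages",
  tests             = "tests",
  external_packages = "libs",
  logs              = "logs",
  work              = "work",
  import            = "import",
  export            = "export",
  configuration     = "config"
)

# Create the skeleton under `root`; existing directories are left alone.
create_project <- function(root, dirs = project_dirs) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

project_root <- file.path(tempdir(), "demo_project")
create_project(project_root)
```

When every project follows the same layout, scripts and automation can locate logs, configuration, and data by convention.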

Automate building, deploying, testing, etc.

Example Jenkins pipeline
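
On the R side, automation mostly means giving the CI server one command that runs every step and reports success or failure through its exit status. The sketch below is a minimal step runner under that assumption; the step names and their bodies are placeholders, and `run_pipeline` is a hypothetical helper.

```r
# Run named steps in order; stop at the first failure.
# Returns 0L on success and 1L on failure, suitable as an exit status.
run_pipeline <- function(steps) {
  for (name in names(steps)) {
    message("== ", name)
    ok <- tryCatch({
      steps[[name]]()
      TRUE
    }, error = function(e) {
      message("FAILED: ", conditionMessage(e))
      FALSE
    })
    if (!ok) {
      return(1L)
    }
  }
  0L
}

steps <- list(
  build = function() invisible(TRUE),       # placeholder for a build step
  test  = function() stopifnot(1 + 1 == 2)  # placeholder for running tests
)
status <- run_pipeline(steps)
```

In a script run by the server, finishing with `quit(status = status)` passes the result on, because a non-zero exit status is what marks the build as failed.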

Go to hell :)

Summary

A well-deployed R-based analytical platform must have the following features:

Seamless integration with existing systems and IT infrastructure

Dev/Test/Prod processes according to current software development standards

Fast development-to-production cycle

Continuous integration & deployment

Repositories: models, builds, code, dependencies, configuration

Controllable distributed job scheduling

Resource usage monitoring

Secure access control, protected password repositories

Wit Jakuczun, PhD

WLOG R Suite™

Field-tested R ecosystem for the enterprise