Large scale machine learning projects with RSuite


Uploaded by: wit-jakuczun

Posted on 22-Jan-2018


TRANSCRIPT


> rsuite install
Detecting repositories ...
Will use repositories:
    CRAN.CRAN = https://mran.microsoft.com/snapshot/2017-10-15
    CRAN.CRANextra = http://www.stats.ox.ac.uk/pub/RWin
    Other = http://wlog-rsuite.s3.amazonaws.com
Installing RSuite(v0.17x) package ...
installing the source package 'RSuite'
All done.


> rsuite proj start -n spmf


Commands:
  update   Checks if newest version of RSuite CLI is installed. If not installer for newest version is downloaded and installation is initiated.
  install  Install RSuite with all the dependencies.
  proj     Use it to manage project, its dependencies, and build project packages.
  repo     Use to manage repositories. e.g. upload packages.
  pkgzip   Use to create PKGZIP packages to fillup remove repository.
  version  Show RSuite CLI version.
  help     Show this message and exit.

Call 'rsuite [command] help' to get information on acceptable [args].


logs/.gitignore


PARAMETERS


LogLevel: INFO
N_days: 365
solver_max_iterations: 10
solver_opt_horizon: 8
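The parameters above are plain `key: value` lines. As a minimal base-R sketch, assuming exactly one pair per line (the helper `read_params` is hypothetical and not part of RSuite's API), they could be loaded into a named list with numeric values converted:

```r
# Hypothetical helper: parse "key: value" lines into a named list.
# Assumes one pair per line; blank lines and lines starting with '#' are skipped.
read_params <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Split on the first colon only, so values may themselves contain colons
  keys <- trimws(sub(":.*$", "", lines))
  values <- trimws(sub("^[^:]*:", "", lines))
  params <- as.list(values)
  names(params) <- keys
  # Turn purely numeric values (e.g. "365") into numbers; keep the rest as text
  lapply(params, function(v) {
    if (grepl("^-?[0-9]+(\\.[0-9]+)?$", v)) as.numeric(v) else v
  })
}

# Usage with the parameters shown above
cfg_file <- tempfile(fileext = ".txt")
writeLines(c("LogLevel: INFO", "N_days: 365",
             "solver_max_iterations: 10", "solver_opt_horizon: 8"), cfg_file)
params <- read_params(cfg_file)
params$N_days * 2  # 730
```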


○ main
○ if __name__ == "__main__":
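R has no built-in counterpart to Python's `if __name__ == "__main__":`. One common idiom, sketched here under the assumption that scripts are launched with `Rscript` (the `main` function is hypothetical), checks `sys.nframe()`: it is 0 at the top level of a script run directly, and positive when the same file is loaded with `source()`.

```r
# Hypothetical entry point; arguments are passed explicitly so it can be tested
main <- function(args = commandArgs(trailingOnly = TRUE)) {
  session_id <- if (length(args) > 0) args[[1]] else "201711122000"
  message("Running with session id: ", session_id)
  invisible(session_id)
}

# Call main() only when the file is executed directly (e.g. `Rscript script.R`),
# not when another script loads it with source()
if (sys.nframe() == 0L) {
  main()
}
```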


predmodel


● ==
● >=
● <=
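The three operators above express version constraints on a dependency. As a base-R sketch (the helper `satisfies` is hypothetical; RSuite resolves such constraints itself), `package_version()` objects compare component by component, so `1.10.0` counts as newer than `1.9.2` even though it sorts lower as a plain string:

```r
# Hypothetical helper: does version `installed` satisfy "op required"?
satisfies <- function(installed, op, required) {
  v <- package_version(installed)
  r <- package_version(required)
  switch(op,
         "==" = v == r,
         ">=" = v >= r,
         "<=" = v <= r,
         stop("unsupported operator: ", op))
}

satisfies("1.10.0", ">=", "1.9.2")  # TRUE: components compared as numbers
"1.10.0" >= "1.9.2"                 # FALSE: character-by-character comparison
```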


master.R


spmf/libs
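The point of `master.R` and `spmf/libs` is that scripts load packages from the project's own library folder rather than from the user's global library. A minimal base-R sketch of that mechanism (the helper `use_project_libs` is hypothetical, and RSuite's generated `master.R` may do more than this) puts the project folder first on the library search path:

```r
# Hypothetical helper: make <project>/libs take precedence over other libraries
use_project_libs <- function(project_dir) {
  lib_dir <- file.path(project_dir, "libs")
  if (!dir.exists(lib_dir)) {
    stop("project library not found: ", lib_dir)
  }
  # .libPaths(new) sets `new` first; R always appends the site/system libraries
  .libPaths(c(normalizePath(lib_dir, "/"), .libPaths()))
  invisible(.libPaths())
}

# use_project_libs("spmf")  # afterwards library(predmodel) resolves from spmf/libs
```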


packages_import.R


master.R


import_training.R (I)

● import/<session_id>/
● work/<session_id>/

library(predmodel)

import_path <- file.path(script_path, "../import")
work_path <- file.path(script_path, "../work")

# required
session_id <- args$get(name = "session_id", default = "201711122000", required = FALSE)

loginfo("--> Session id:%s", session_id)

session_work <- file.path(work_path, session_id)

if (!dir.exists(session_work)) {
  dir.create(session_work)
}

import_training_data(file.path(import_path, session_id), session_work)
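Every step keys its folders on the session id: inputs are read from `import/<session_id>/` and outputs go to `work/<session_id>/`. A small base-R sketch of that convention (the helper `session_paths` is hypothetical; the script above inlines the same logic) could look like this:

```r
# Hypothetical helper: resolve the per-session folders under a project root
session_paths <- function(root, session_id) {
  paths <- list(
    import = file.path(root, "import", session_id),
    work   = file.path(root, "work", session_id)
  )
  # recursive = TRUE also creates work/ itself if it does not exist yet
  dir.create(paths$work, recursive = TRUE, showWarnings = FALSE)
  paths
}

root <- file.path(tempdir(), "spmf_sessions")
p <- session_paths(root, "201711122000")
dir.exists(p$work)  # TRUE
```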


import_training.R (II)


devtools


import_training_data

#' @export
import_training_data <- function(import_path, work_path) {
  pkg_loginfo("Importing from %s into %s", import_path, work_path)

  n <- 10000
  dt <- data.table(feature1 = rnorm(n), feature2 = rnorm(n))
  m <- round(n*0.3)
  dt[, resp := c(rep(1, m), rep(0, n - m))]
  fwrite(x = dt, file = file.path(work_path, "training.csv"), sep = ";")
}
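`import_training_data` does not read anything from `import_path` yet: it generates 10,000 rows with two standard-normal features, labels the first 30% as `resp = 1`, and writes a `;`-separated `training.csv`. Because labels are assigned by row position, they carry no signal from the features, so this is placeholder data for exercising the pipeline. A base-R equivalent without data.table (the function `make_training_csv` is hypothetical) makes the file contract that the next step relies on easy to check:

```r
# Hypothetical base-R stand-in for import_training_data()
make_training_csv <- function(work_path, n = 10000, pos_share = 0.3) {
  m <- round(n * pos_share)
  dt <- data.frame(
    feature1 = rnorm(n),
    feature2 = rnorm(n),
    resp = c(rep(1, m), rep(0, n - m))  # first m rows positive, the rest negative
  )
  out <- file.path(work_path, "training.csv")
  write.table(dt, out, sep = ";", row.names = FALSE, quote = FALSE)
  invisible(out)
}

csv <- make_training_csv(tempdir())
training <- read.csv(csv, sep = ";")
mean(training$resp)  # 0.3
```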


estimate_model.R (I)

# h2o.init() and %>% are called directly in this script
library(h2o)
library(magrittr)
library(predmodel)

work_path <- file.path(script_path, "../work")

# required
session_id <- args$get(name = "session_id", required = FALSE, default = "201710111655")
loginfo("--> Session id:%s", session_id)
session_work <- file.path(work_path, session_id)

h2o.init(max_mem_size = "4g", nthreads = 2)

logdebug("---> H2O started")

train_file <- file.path(session_work, "training.csv")

stopifnot(file.exists(train_file))

train_file %>%
  transform_training() %>%
  estimate_model(session_id) %>%
  save_model(session_work)


transform_training

#' @export
transform_training <- function(train_file) {
  dt <- h2o.importFile(path = train_file, destination_frame = "train_dt",
                       parse = TRUE, header = TRUE, sep = ";")
  # A factor response makes H2O fit a classifier rather than a regression
  dt$resp <- as.factor(dt$resp)
  dt <- h2o.assign(data = dt, key = "train_dt")

  return(dt)
}


estimate_model

#' @export
estimate_model <- function(dt, session_id) {
  # Use every column except the response as a predictor
  model <- h2o.gbm(x = setdiff(colnames(dt), "resp"), y = "resp",
                   training_frame = dt,
                   model_id = sprintf("gbm_%s", session_id),
                   ntrees = 10, learn_rate = 0.1)
  model
}


save_model

#' @export
save_model <- function(model, session_work) {
  h2o.saveModel(model, path = session_work, force = TRUE)
}
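`estimate_model` names the model `gbm_<session_id>`, and `score_model.R` later rebuilds exactly that file name inside `work/<train_session_id>/`, so the training session id alone is enough to find the saved model. The same naming contract can be sketched without an H2O cluster by swapping in `glm()` and `saveRDS()` purely for illustration (the helpers `save_session_model` and `load_session_model` are hypothetical):

```r
# Hypothetical stand-ins illustrating the gbm_<session_id> naming contract
save_session_model <- function(model, session_work, session_id) {
  model_file <- file.path(session_work, sprintf("gbm_%s", session_id))
  saveRDS(model, model_file)
  invisible(model_file)
}

load_session_model <- function(work_path, train_session_id) {
  model_file <- file.path(work_path, train_session_id,
                          sprintf("gbm_%s", train_session_id))
  stopifnot(file.exists(model_file))
  readRDS(model_file)
}

# Usage: train in one session, reload later knowing only the session id
work_path <- file.path(tempdir(), "work")
session_work <- file.path(work_path, "201711122000")
dir.create(session_work, recursive = TRUE, showWarnings = FALSE)

train <- data.frame(x = c(-2, -1, -0.5, 0.5, 1, 2), resp = c(0, 0, 1, 0, 1, 1))
fit <- glm(resp ~ x, family = binomial, data = train)
save_session_model(fit, session_work, "201711122000")

reloaded <- load_session_model(work_path, "201711122000")
```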


import_test.R (I)

● import/<session_id>/
● work/<session_id>/

library(predmodel)

import_path <- file.path(script_path, "../import")
work_path <- file.path(script_path, "../work")

# required
session_id <- args$get(name = "session_id", default = "201711122000", required = FALSE)

loginfo("--> Session id:%s", session_id)

session_work <- file.path(work_path, session_id)

if (!dir.exists(session_work)) {
  dir.create(session_work)
}

import_test_data(file.path(import_path, session_id), session_work)


import_test_data

#' @export
import_test_data <- function(import_path, work_path) {
  pkg_loginfo("Importing from %s into %s", import_path, work_path)

  n <- 1000
  dt <- data.table(feature1 = rnorm(n), feature2 = rnorm(n))
  fwrite(x = dt, file = file.path(work_path, "test.csv"), sep = ";")
}


score_model.R (I)

● work/<score_session_id>
● work/<train_session_id>
● export/<score_session_id>


score_model.R (II)

library(h2o)
library(magrittr)
library(predmodel)

work_path <- file.path(script_path, "../work")
export_path <- file.path(script_path, "../export")

# required
train_session_id <- args$get(name = "train_session_id", required = FALSE, default = "201710111655")
score_session_id <- args$get(name = "score_session_id", required = FALSE, default = "201710111655")

loginfo("--> train session id:%s", train_session_id)
loginfo("--> score session id:%s", score_session_id)

score_session_export <- export_path
train_session_work <- file.path(work_path, train_session_id)
score_session_work <- file.path(work_path, score_session_id)

h2o.init(max_mem_size = "4g", nthreads = 2)

logdebug("---> H2O started")

test_file <- file.path(score_session_work, "test.csv")
model_file <- file.path(train_session_work, sprintf("gbm_%s", train_session_id))

stopifnot(file.exists(test_file))
stopifnot(file.exists(model_file))

test_dt <- test_file %>% transform_test()

score_model(test_dt = test_dt, model_path = model_file) %>%
  export_score(export_path = export_path, score_session_id = score_session_id)


transform_test

#' @export
transform_test <- function(test_file) {
  h2o.importFile(path = test_file, destination_frame = "test_dt",
                 parse = TRUE, header = TRUE, sep = ";")
}


score_model

#' @export
score_model <- function(test_dt, model_path) {
  model <- h2o.loadModel(model_path)
  pred_dt <- h2o.predict(model, test_dt)
  pred_dt
}


export_score

#' @export
export_score <- function(score_dt, score_session_id, export_path) {
  # Pull the H2OFrame of predictions into R before adding the session id column
  score_dt <- as.data.table(as.data.frame(score_dt))
  score_dt[, score_session_id := score_session_id]
  fwrite(x = score_dt, file = file.path(export_path, "score.csv"), sep = ";", append = TRUE)
}
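`export_score` appends every scoring session to the same `score.csv` and tags each row with `score_session_id`, so the file accumulates results across runs. When appending like this, the header row should be written only when the file is first created. A base-R sketch of that rule (the helper `append_scores` is hypothetical):

```r
# Hypothetical helper: append rows to a shared ";"-separated file,
# writing the header only when the file does not exist yet
append_scores <- function(score_df, score_session_id, export_file) {
  score_df$score_session_id <- score_session_id
  is_new <- !file.exists(export_file)
  write.table(score_df, export_file, sep = ";", row.names = FALSE,
              quote = FALSE, append = !is_new, col.names = is_new)
}

export_file <- file.path(tempdir(), "score.csv")
unlink(export_file)
append_scores(data.frame(predict = c(0, 1)), "201711131000", export_file)
append_scores(data.frame(predict = 1), "201711141000", export_file)
scores <- read.csv(export_file, sep = ";")
nrow(scores)  # 3
```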


Production
spmf_0.1_001.zip
Production/spmf
import
export
work


Production/spmf/R

a. Rscript import_training.R
b. Rscript estimate_model.R
c. Rscript import_test.R
d. Rscript score_model.R
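Steps a to d must run in this order, and a later step is pointless if an earlier one failed. A minimal R driver sketch (the function `run_steps` is hypothetical; it assumes `Rscript` is on the `PATH` and that a failing script exits with a non-zero status, which is what `Rscript` does on an uncaught error):

```r
# Hypothetical driver: run the pipeline scripts in order, stop at the first failure.
# `runner` is injectable so the control flow can be tested without real scripts.
run_steps <- function(scripts,
                      runner = function(s) system2("Rscript", s)) {
  for (s in scripts) {
    status <- runner(s)
    if (!identical(as.integer(status), 0L)) {
      stop(sprintf("step %s failed with status %s", s, status))
    }
  }
  invisible(TRUE)
}

steps <- c("import_training.R", "estimate_model.R",
           "import_test.R", "score_model.R")
# run_steps(steps)  # run from Production/spmf/R
```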

Production/spmf/export


print


loginfo("Phase 1 passed")

logdebug("Iter %d done", i)

logtrace("Iter %d done", i)

logwarning("Are you sure?")

logerror("I failed :(")

Packages

pkg_loginfo("Phase 1 passed")

pkg_logdebug("Iter %d done", i)

pkg_logtrace("Iter %d done", i)

pkg_logwarning("Are you sure?")

pkg_logerror("I failed :(")
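Scripts log through `loginfo`, `logdebug` and friends, while package code uses the `pkg_` variants. Leveled loggers like these usually act as a threshold: with `LogLevel: INFO`, info messages and anything more severe are written, while debug and trace calls are skipped. The mechanism can be sketched in base R; the level names follow the calls above, but the numeric ranks and the `make_logger` helper are assumptions, not the logging package's internals. The output layout mimics the log lines that follow (`timestamp LEVEL:logger:message`):

```r
# Hypothetical level ranks; only their relative order matters here
log_levels <- c(TRACE = 5, DEBUG = 10, INFO = 20, WARNING = 30, ERROR = 40)

# Hypothetical logger factory: drops messages ranked below `threshold`
make_logger <- function(threshold = "INFO", name = "") {
  function(level, fmt, ...) {
    if (log_levels[[level]] < log_levels[[threshold]]) {
      return(invisible(NULL))
    }
    # "<timestamp> LEVEL:logger:message", e.g. "... INFO:predmodel:Phase 1 passed"
    line <- sprintf("%s %s:%s:%s",
                    format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                    level, name, sprintf(fmt, ...))
    cat(line, "\n", sep = "")
    invisible(line)
  }
}

pkg_logger <- make_logger("INFO", name = "predmodel")
pkg_logger("INFO", "Phase %d passed", 1L)  # printed
pkg_logger("DEBUG", "Iter %d done", 1L)    # dropped at INFO level
```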


2017-11-13 13:47:03 INFO::--> Session id:201711122000

2017-11-13 13:47:03 INFO:predmodel:Importing from C:/Workplace/Sandbox/Production/spmf/R/../import/201711122000 into C:/Workplace/Sandbox/Production/spmf/R/../work/201711122000

2017-11-13 13:47:14 INFO::--> Session id:201711122000

2017-11-13 13:47:51 INFO::--> Session id:201711131000

2017-11-13 13:47:51 INFO:predmodel:Importing from C:/Workplace/Sandbox/Production/spmf/R/../import/201711131000 into C:/Workplace/Sandbox/Production/spmf/R/../work/201711131000

2017-11-13 13:47:57 INFO::--> train session id:201711122000

2017-11-13 13:47:57 INFO::--> score session id:201711131000


LogLevel: INFO

LogLevel: DEBUG

LogLevel: TRACE


import_training.R


tests/test_spmf.R

library(predmodel)
library(testthat)

context("Testing context")

test_that(desc = "Test", code = {
  expect_true(5 > 3)
  expect_true(pi < 3)
})
