train, predict, serve: how to go into production your machine learning model

41
1 © Cloudera, Inc. All rights reserved. Aki Ariga | Field Data Scientist Train, Predict, Serve: How to go into production your Machine Learning model

Upload: cloudera-japan

Post on 21-Jan-2018

134 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Train, predict, serve: How to go into production your machine learning model

1© Cloudera, Inc. All rights reserved.

Aki Ariga | Field Data Scientist

Train, Predict, Serve: How to go into production your Machine Learning model

Page 2: Train, predict, serve: How to go into production your machine learning model

2© Cloudera, Inc. All rights reserved.

Aki Ariga● Field Data Scientist at Cloudera● Previously research engineer at Toshiba, Rails developer at Cookpad● Contributor of sparklyr and creator of tabula-py● co-author of Machine Learning at work (in Japanese)● Twitter: @chezou● Linkedin: https://www.linkedin.com/in/aki-ariga/● GitHub: https://github.com/chezou/● Slideshare: https://www.slideshare.net/chezou

Page 3: Train, predict, serve: How to go into production your machine learning model

3© Cloudera, Inc. All rights reserved.

What does machine learning?

Page 4: Train, predict, serve: How to go into production your machine learning model

4© Cloudera, Inc. All rights reserved.

● Classification○ Classify data into categories. e.g.) Spam detection, Image classification

● Regression○ Predict continuous value. e.g.) Power consumption prediction

● Clustering○ Grouping similar data into same group e.g.) Exploratory analysis

● Anomaly detection○ Detect anomalous data. e.g.) Fraud detection

● Recommendation○ Suggest contents which are interested in. e.g.) Amazon, Netflix

● Reinforcement learning○ Learn strategy to maximize reward. e.g.) Alpha-Go, Self-driving car

● etc..

What machine learning does?

Page 5: Train, predict, serve: How to go into production your machine learning model

5© Cloudera, Inc. All rights reserved.

Classify email SPAM or not

Non-Spam

Spam

Example: Spam detection

Page 6: Train, predict, serve: How to go into production your machine learning model

6© Cloudera, Inc. All rights reserved.

[0, 1, 0, 2.5, 0, -1, ...][1, 0.5, 0.1, -2, 3, 2, ...][1, 0, 1.0, 1.1, 0, 0, ...]

Documents

Feature Extraction

Feature vector

Logistic Regression,SVM, Random Forest, NN...

Model Training

Algorithm & parameters

w1=1, w2=-1, w3=0 ...

Predictive model

e.g.) Extract: from Image: Array of RGB from Text: word frequency

Training phase of Machine Learning pipeline (supervised learning)

Page 7: Train, predict, serve: How to go into production your machine learning model

7© Cloudera, Inc. All rights reserved.

Sample data science/machine learning workflowFrom data to exploration to action

Data Engineering Data Science (Exploratory) Production (Operational)

Data Wrangling

Visualization and Analysis

Model Training & Testing

ProductionData Pipelines Batch Scoring

Online ScoringServing

Data GovernanceGovernance

Processing

AcquisitionReports,

Dashboards

Data Models Predictions Business value

Page 8: Train, predict, serve: How to go into production your machine learning model

8© Cloudera, Inc. All rights reserved.

What is “production” of a ML system?

Page 9: Train, predict, serve: How to go into production your machine learning model

9© Cloudera, Inc. All rights reserved.

Production of ML systemsReports Dashboards Scoring

Page 10: Train, predict, serve: How to go into production your machine learning model

10© Cloudera, Inc. All rights reserved.

Patterns of ML scoring systems

Page 11: Train, predict, serve: How to go into production your machine learning model

11© Cloudera, Inc. All rights reserved.

1. Train by batch, predict on the fly, serve via REST API2. Train by batch, predict by batch, serve through the shared DB3. Train, predict, serve by streaming4. Train by batch, predict on mobile app

Patterns of ML scoring systems

Page 12: Train, predict, serve: How to go into production your machine learning model

12© Cloudera, Inc. All rights reserved.

1. Train by batch, predict on the fly, serve via REST API2. Train by batch, predict by batch, serve through the shared DB3. Train, predict, serve by streaming4. Train by batch, predict on mobile app

Patterns of ML scoring systems

Page 13: Train, predict, serve: How to go into production your machine learning model

13© Cloudera, Inc. All rights reserved.

Web Application

DB

Trained Model

Execute training

Extract feature

Prediction result

Activity log/Contents data

Feature

Training result

Feature

Batch SystemAPI Server

RESTAPI

User ID/Item ID

ML System

Pattern 1: Train by batch, predict on the fly, serve via REST API

Page 14: Train, predict, serve: How to go into production your machine learning model

14© Cloudera, Inc. All rights reserved.

ML System

Web Application

DB

Execute training

Prediction result

Activity log/Contents data

Feature

Training result

Feature

User ID/Item ID

What is the deployment target?

Production

RESTAPI

Development

Validated Model

Extract feature

Validated Model

Extract feature

Deploy

Page 15: Train, predict, serve: How to go into production your machine learning model

15© Cloudera, Inc. All rights reserved.

Web Application

Execute training

Prediction result

Feature

Training result

Feature

API Server

REST API

User ID/Item ID

Batch System

Trained Model

Extract feature

Train using activity logs in the DB and store model into the storage.

DB

Activity log/Contents data

Pattern 1-a: Training phase

ML System

Page 16: Train, predict, serve: How to go into production your machine learning model

16© Cloudera, Inc. All rights reserved.

Batch SystemWeb Application

Execute training

Prediction result

Feature

Training result

API Server

RESTAPI

User ID/Item ID

Trained Model

Extract feature

Predict from logs/contents triggered by REST API

DB

Activity log/Contents data

Feature

Pattern 1-b,c: Prediction & serving phase

ML System

Page 17: Train, predict, serve: How to go into production your machine learning model

17© Cloudera, Inc. All rights reserved.

Extract feature & Train/update model

Extract feature & Predict

Trained Model

Activity log

Export model as PMML

Model building layerPredicting & serving layer

Updated model

CDSW

Prediction results

HDFSRequest to predictLoad model

Example architecture 1: PMML + OpenScoring

Page 18: Train, predict, serve: How to go into production your machine learning model

18© Cloudera, Inc. All rights reserved.

Extract feature & Train/update model

Extract feature & Predict

Trained Model

Activity log

Save model on object storage

Model building layerPredicting & serving layer

Updated modelPrediction results

HDFSRequest to predictLoad model

Object storage

Pack the runtime env with Docker

CDSW

Example architecture 2: Docker based API Server

Page 19: Train, predict, serve: How to go into production your machine learning model

19© Cloudera, Inc. All rights reserved.

Demo:Docker based prediction APIhttps://github.com/chezou/cdsw-serve-docker

Page 20: Train, predict, serve: How to go into production your machine learning model

20© Cloudera, Inc. All rights reserved.

Demo architecture

CDSW Docker HUBDocker image

Source code

Trained model

Amazon ECS

Application Load Balancer

Prediction request

Amazon S3

Page 21: Train, predict, serve: How to go into production your machine learning model

21© Cloudera, Inc. All rights reserved.

● Pros○ Able to choose a different system configuration for the front-end web

application and the ML system.■ Choose favorite languages for both systems■ Use different machine spec for both systems■ Deploy independently

○ Able to serve with low latency○ Rapid prototyping with flexible ML libraries○ Easy to A/B testing○ Prevent reimplementation of ML algorithm

● Cons○ Complex to implement scalable API server○ Unable to use slow algorithm

Pattern 1: Train by batch, predict on the fly, serve via REST API

Page 22: Train, predict, serve: How to go into production your machine learning model

22© Cloudera, Inc. All rights reserved.

1. Train by batch, predict on the fly, serve via REST API2. Train by batch, predict by batch, serve through the shared DB3. Train, predict, serve by streaming4. Train by batch, predict on mobile app

Patterns of ML scoring systems

Page 23: Train, predict, serve: How to go into production your machine learning model

23© Cloudera, Inc. All rights reserved.

Web Application

DB

Trained Model

Batch System

Execute training

Extract feature

Prediction result

Activity log/Contents data

Feature

Training result

Feature

Serve prediction

Training BatchPrediction Batch

Pattern 2: Train by batch, predict by batch, serve through the shared DB

Page 24: Train, predict, serve: How to go into production your machine learning model

24© Cloudera, Inc. All rights reserved.

Web Application

Execute training

Prediction result

Feature

Training result

Feature

Serve prediction

Training BatchPrediction Batch

DB

Trained Model

Batch System

Extract featureActivity log/Contents data

Training using activity logs in the DB and store model into the storage.Pattern 2-a: Training phase

Page 25: Train, predict, serve: How to go into production your machine learning model

25© Cloudera, Inc. All rights reserved.

Web Application

Execute training

Prediction result

Feature

Training result

Feature

Serve prediction

Training BatchPrediction Batch

DB

Activity log/Contents data

Trained Model

Batch System

Extract feature

Predict from contents data in the DB and store prediction results.Pattern 2-b: Prediction phase

Page 26: Train, predict, serve: How to go into production your machine learning model

26© Cloudera, Inc. All rights reserved.

Web Application

Execute training

Prediction result

Feature

Training result

Feature

Serve prediction

Training BatchPrediction Batch

Activity log/Contents data

Trained Model

Batch System

Extract feature

Serve prediction results stored in the shared DB.

DB

Pattern 2-c: Serving phase

Page 27: Train, predict, serve: How to go into production your machine learning model

27© Cloudera, Inc. All rights reserved.

Kudu/HBase

Extract feature & Train/update model

Extract feature & Predict

Activity log

Prediction results

Model building & predicting layerServing layer

Updated model

Activity log Load trained model

Prediction results

HDFS

CDSW

Historical data

Historical data

Example architecture: Serving by HBase/Kudu

Trained Model

Page 28: Train, predict, serve: How to go into production your machine learning model

28© Cloudera, Inc. All rights reserved.

● Pros○ Able to choose a different system configuration for the front-end web

application and the batch system.■ Chose favorite languages for both systems■ Use different machine spec for both systems■ Deploy independently

○ Able to use slow and complex algorithm○ Easy to manage: versioning model/prediction results○ Rapid prototyping with flexible ML libraries○ Prevent reimplementation of ML algorithm

● Cons○ Unable to predict triggered with certain event (e.g. PV, purchase)○ Unable to prevent time lag from prediction to serving

Pattern 2: Train by batch, predict by batch, serve through the shared DB

Page 29: Train, predict, serve: How to go into production your machine learning model

29© Cloudera, Inc. All rights reserved.

1. Train by batch, predict on the fly, serve via REST API2. Train by batch, predict by batch, serve through the shared DB3. Train, predict, serve by streaming4. Train by batch, predict on mobile app

Patterns of ML scoring systems

Page 30: Train, predict, serve: How to go into production your machine learning model

30© Cloudera, Inc. All rights reserved.

Web Application

Trained Model

Stream-based ML System(e.g. Spark Streaming)

Train & Predict

Extract feature

Prediction results

Recent log data

Feature Model updates

Model

- Querying for prediction- Showing or sending alerts- This component may work

with message queue like Kafka

Message queue (e.g. Kafka)

Log data

Prediction results

Pattern 3: Train, predict, serve by streaming

Page 31: Train, predict, serve: How to go into production your machine learning model

31© Cloudera, Inc. All rights reserved.

Example architecture: Lambda architecture with Oryx

Page 32: Train, predict, serve: How to go into production your machine learning model

32© Cloudera, Inc. All rights reserved.

● Pros○ Predict and serve with low latency○ Update model interactively○ Able to predict triggered with certain event (e.g. purchase, PV)○ Strong capability for streamed data, especially for anomaly detection and

recommendation● Cons

○ Complex to manage model and system■ Hard to versioning models

Pattern 3: Train, predict, serve by streaming

Page 33: Train, predict, serve: How to go into production your machine learning model

33© Cloudera, Inc. All rights reserved.

1. Train by batch, predict on the fly, serve via REST API2. Train by batch, predict by batch, serve through the shared DB3. Train, predict, serve by streaming4. Train by batch, predict on mobile app

Patterns of ML scoring systems

Page 34: Train, predict, serve: How to go into production your machine learning model

34© Cloudera, Inc. All rights reserved.

Mobile Application

DB

Trained Model

Batch System

Execute training

Extract feature

Extract featureRequest for prediction Activity logs/

Contents data

Prediction result

Activity log/Contents data

Feature

Training resultFeature

DB

Trained Model

Convert model

Pattern 4: Train by batch, predict on a mobile app

Page 35: Train, predict, serve: How to go into production your machine learning model

35© Cloudera, Inc. All rights reserved.

Extract feature & Train/update model

Extract feature & Predict

Trained Model

Activity log

Convert model to TFLite/CoreML

Model building layerPredicting & serving layer

Updated modelPrediction results

HDFSRequest to predict

Load model

Storage in a smart phone

CDSW

Example architecture: Serving on a mobile app

Page 36: Train, predict, serve: How to go into production your machine learning model

36© Cloudera, Inc. All rights reserved.

Pattern 1 (REST API) Pattern 2 (Shared DB) Pattern 3 (Streaming) Pattern 4 (Mobile app)

Training by batch by batch by streaming by batch

Prediction on the fly by batch by streaming on the fly

Prediction result delivery via REST API through the shared DB by streaming via MQ via in-process API on mobile

Latency for prediction from getting new data

So so So so ~ Long Very low Low

Required time to predict Short Long Short Short

Tight/loose coupling with app

Loose Loose Loose Tight

Dependency of languages

Independent Independent Independent Depends on frameworks

System management difficulty

So so Easy Very Hard So so

4 patterns Comparison

Page 37: Train, predict, serve: How to go into production your machine learning model

37© Cloudera, Inc. All rights reserved.

● Sculley et al., “Hidden Technical Debt in Machine Learning Systems”, NIPS, 2015.

● Lucy Park, “My model has higher BLEU, can I ship it? The Joel Test for machine

learning systems”, ACML-MLAIP, 2017

● David, “When models go rogue: Hard-earned lessons about using machine learning in production”, Strata Data NYC, 2017

Good to read/watch for related topics

Page 38: Train, predict, serve: How to go into production your machine learning model

38© Cloudera, Inc. All rights reserved.

● Monitor model performance with recurring model update with CI

Model versioning & monitoring

DB

Trained Model

Batch System

Execute/update training

Extract feature

Activity log/Contents data

Feature

Training result

Gold standard data with labels

Evaluate model

Validated modelsObject storage

Evaluated performance

BIShow dashboard of evaluation results

Versioned modelVersioned model

Versioned model

Page 39: Train, predict, serve: How to go into production your machine learning model

39© Cloudera, Inc. All rights reserved.

Thank you

Aki [email protected]

Page 40: Train, predict, serve: How to go into production your machine learning model

40© Cloudera, Inc. All rights reserved.

Appendix

Page 41: Train, predict, serve: How to go into production your machine learning model

41© Cloudera, Inc. All rights reserved.

● Reports○ Send Email via CDSW Job

● Dashboards○ NA (Web app can be run while container is running)

● Scoring○ Pattern 1

■ Export model with PMML serve OpenScoring/Use MLeap etc...■ Create API server with Web framework and pack the environment with Docker■ Need to manage API server by users

○ Pattern 2■ Store prediction results into HBase/Kudu/RDB from CDSW

○ Pattern 3■ You can develop and deploy jobs working on the same cluster

○ Pattern 4■ Convert to TFLite/CoreML model and bring the model on your app

How does CDSW help for each deployment pattern?