geek night - continuous delivery for machine learning

31
5/15/2017 Continuous Delivery Principles for Machine Learning Rajesh Muppalla [email protected] @codingnirvana

Upload: rajesh-muppalla

Post on 21-Jan-2018


TRANSCRIPT

Page 1: Geek Night - Continuous delivery for machine learning

5/15/2017

Continuous Delivery Principles for Machine Learning

Rajesh Muppalla [email protected] @codingnirvana

Page 2: Geek Night - Continuous delivery for machine learning

About Me

● Co-Founder @ Indix
● Earlier - Distributed Systems, Big Data problems
● Currently - Machine Learning problems
● Ex-Thoughtworks
  ○ Tech lead on Go-CD - CI/CD tool
● Previous Talks @ Geek Night
  ○ Building Distributed Crawler using Akka
  ○ Big Data Testing Challenges

Page 3: Geek Night - Continuous delivery for machine learning

Six Business Critical Indexes

● People
● Documents
● Businesses
● Places
● Products
● Connected Devices

Content plus platform capability makes them very valuable

Page 4: Geek Night - Continuous delivery for machine learning

Enabling businesses to build location-aware software.

~3.6 million websites use Google maps

Enabling businesses to build product-aware software.

Indix catalogs over 2.1 billion product offers

Indix – the “Google Maps” of Products

Building a platform for product information

Page 5: Geek Night - Continuous delivery for machine learning

Organizing the World’s Product Information

Brand & Retailer Websites | Brand & Retailer Feeds
AI & Machine Learning: Structure | Refine | Organize
1.1B Products | 2.1B Offers | 60K Brands
API (Real-time) | API (Bulk) | Customizable Feeds

Page 6: Geek Night - Continuous delivery for machine learning

Data Scale @ Indix

● 2.1 Billion Product URLs
● 8 TB HTML Data Crawled Daily
● 1B Unique Products
● 5000 Categories
● 100B Price Points
● 3000 Sites

Page 7: Geek Night - Continuous delivery for machine learning

3/31/16

Auto Parsers to detect and extract Product content from Web pages, using Machine Vision algorithms

Predictive Scheduler for deciding re-crawl frequency using various signals like Seasonality, Product Type, Store

Multi-label classifier that categorizes products into a hierarchical taxonomy using text information

Inferring Product vs Listing vs Other Pages using either just URL patterns or using Page Content

Adaptive Crawlers that modify the crawl rate based on dynamic characteristics like Site traffic, Number of products, Robots.txt settings

Deep learning - Categorizing products using Product images

Predicting which products are an exact match or similar products

NER based Attribute extraction algorithm that mines text like Title, Descriptions, Specifications to build structured Key:Value Attributes

Fusion/Enrichment - An algorithm that learns from the data to build a golden product record from disparate sources

Product Rank - algorithm that uses multiple signals like product popularity, price, data quality, store popularity, brand popularity to build dynamic relevance/rank score

Recommendation Engines that suggest Tags where Product information can be found on a web page

Deep learning - Extracting visual product attributes using Product images

NLG algorithms to generate product descriptions

Product GPS - Universal Product Identifier using machine learning algorithms and allowing Search & Discovery

Machine Learning at Indix

Page 8: Geek Night - Continuous delivery for machine learning


Machine Learning Workflow

Page 9: Geek Night - Continuous delivery for machine learning

Define Business Objective

Explore & Transform

Pull and Acquire Data

Develop Model

Model Evaluation & Validation

Meets Business Needs?

Build Production System

Deploy

Measure Metrics

Yes!

Not Yet!

Human in the Loop

Machine Learning Workflow

Test Data

Training Data

Page 10: Geek Night - Continuous delivery for machine learning

Machine Learning Sandwich?*

* - https://techcrunch.com/2017/08/08/the-evolution-of-machine-learning/

Explore & Transform

Pull and Acquire Data

Deploy

Build Production System

Develop Model

Model Evaluation & Validation

The MEAT is not in the middle

Page 11: Geek Night - Continuous delivery for machine learning

Machine Learning Sandwich?*

* - https://techcrunch.com/2017/08/08/the-evolution-of-machine-learning/

Explore & Transform

Pull and Acquire Data

Deploy

Build Production System

Develop Model

Model Evaluation & Validation

Data Pipelines

App

Model

Page 12: Geek Night - Continuous delivery for machine learning


Pain Points

Page 13: Geek Night - Continuous delivery for machine learning

Pain Points

● A key employee in the team had to abruptly go on leave
  ○ Unable to reproduce the performance of an existing production model
    ■ Training data missing or not known
    ■ Pre-processing scripts missing
    ■ Hyperparameters not known
● It takes 3 months to productionize a model
    ■ Lots of glue code
    ■ Custom code developed every time
    ■ Frequent updates to the model take a long time
● Confidence in Test Set != Confidence in Production
    ■ Confidence in model performance on a sample set is not good enough
● Heterogeneous systems for performance reasons
    ■ E.g. sharing stuff between Python and JVM

Page 14: Geek Night - Continuous delivery for machine learning

These are solved problems in Software Development, and they have been solved using the principles of Continuous Delivery.

Page 15: Geek Night - Continuous delivery for machine learning

Continuous Delivery is a software engineering approach that aims at building, testing and releasing software faster and more frequently.

A straightforward and repeatable process is important for continuous delivery.

What is Continuous Delivery?

Page 16: Geek Night - Continuous delivery for machine learning

● Use source and artifact repository labels for reproducibility
  ○ Data and model management (incl. versioning)
● Use containers to package and run services
  ○ Model containers for model prediction services
● A/B Testing using Canary Releases & Blue Green Deployments
  ○ Variation of BG Deployment for A/B testing of models
● Automation via CI + CD pipelines
  ○ Pipelines for Training, Evaluation and for Offline Predictions

Principles from CD in ML

Page 17: Geek Night - Continuous delivery for machine learning

Model Repository

● Organization, Versioning, Publishing and Resolving of latest version
  ○ Similar to an artifact repository like Maven, Ivy
● For a model, stores
  ○ Metadata
    ■ Training/Validation/Test Datasets (From MDA or Custom)
    ■ Hyper-parameters used
    ■ Evaluation Metrics
  ○ Data
    ■ Different formats - parquet (Spark MLlib), pickle (scikit-learn), h5 (Keras)
● Has clients for most commonly used frameworks - scikit-learn, Spark MLlib, Keras
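The repository described on this slide can be pictured with a small file-backed sketch. Everything here is an assumption made for illustration: the class name `ModelRepository`, the directory layout, the method names, and the use of pickle for the model data. It is not the Indix implementation, only one way to show versioned publishing, metadata storage, and resolving the latest version.

```python
import json
import pickle
import tempfile
from pathlib import Path


class ModelRepository:
    """Toy file-backed model repository (hypothetical layout, for illustration only)."""

    def __init__(self, root):
        self.root = Path(root)

    def latest_version(self, name):
        """Return the highest published version of `name`, or 0 if none exist."""
        model_dir = self.root / name
        if not model_dir.exists():
            return 0
        return max((int(p.name) for p in model_dir.iterdir() if p.is_dir()), default=0)

    def publish(self, name, model, datasets, hyperparameters, metrics):
        """Store a new version of `name` and return its version number."""
        version = self.latest_version(name) + 1
        target = self.root / name / str(version)
        target.mkdir(parents=True)
        # Metadata: datasets used, hyper-parameters and evaluation metrics.
        metadata = {
            "datasets": datasets,
            "hyperparameters": hyperparameters,
            "metrics": metrics,
        }
        (target / "metadata.json").write_text(json.dumps(metadata, indent=2))
        # Data: the serialized model (pickle here; other frameworks use parquet or h5).
        (target / "model.pkl").write_bytes(pickle.dumps(model))
        return version

    def resolve(self, name, version=None):
        """Load (model, metadata) for a version; defaults to the latest one."""
        version = version or self.latest_version(name)
        target = self.root / name / str(version)
        metadata = json.loads((target / "metadata.json").read_text())
        model = pickle.loads((target / "model.pkl").read_bytes())
        return model, metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        repo = ModelRepository(root)
        repo.publish("categorizer", {"weights": [0.1]}, {"train": "set-a"}, {"lr": 0.1}, {"f1": 0.8})
        print(repo.resolve("categorizer")[1]["metrics"])
```

Keeping the metadata next to the model data is what makes a production model reproducible later: the datasets, hyper-parameters and metrics travel with every version.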

Page 18: Geek Night - Continuous delivery for machine learning

5/15/2017 Confidential and Proprietary. Do Not Distribute.

Model Productionization

Page 19: Geek Night - Continuous delivery for machine learning

Model Promotion

● Tagging the “latest good” version that needs to be deployed
● Not all models need/can be promoted
  ○ Experimental models
  ○ Models that fail the test set metrics
● Easy rollback - tag the “last good” version as the latest
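Promotion and rollback can be sketched as a small tag history. The class `PromotionTags`, the `f1` metric key, and the `min_f1` threshold are hypothetical names chosen for this example; the slides do not say how the gate or the tags are actually implemented.

```python
class PromotionTags:
    """Tracks which published versions were promoted (hypothetical helper)."""

    def __init__(self):
        self._promoted = []  # history of promoted versions, newest last

    def promote(self, version, metrics, min_f1):
        """Tag `version` as the latest good one only if it passes the test-set gate."""
        if metrics.get("f1", 0.0) < min_f1:
            return False  # models that fail the test-set metrics are not promoted
        self._promoted.append(version)
        return True

    def latest_good(self):
        """Version that should currently be deployed, or None if nothing was promoted."""
        return self._promoted[-1] if self._promoted else None

    def rollback(self):
        """Drop the current tag so the previous good version becomes the latest again."""
        if len(self._promoted) < 2:
            raise RuntimeError("no earlier promoted version to roll back to")
        self._promoted.pop()
        return self._promoted[-1]
```

Because rollback only moves a tag, no model has to be rebuilt or republished to return to the previous good version.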

Page 20: Geek Night - Continuous delivery for machine learning

Model Container

● Hosts a single model to be used for predictions
● Exposes an API for prediction and is “dockerized”
● Containers can be replicated to handle scale
● Two µServices
  ○ Scala
    ■ Handles pre-processing
  ○ Python
    ■ Loads the model and exposes predict on the model
    ■ Can also predict in batches for better throughput
  ○ Scala µservice delegates the predict and predict_batch functions to the Python µservice
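The delegation between the two µservices can be sketched in a single process. This is a simplification under stated assumptions: in the deck the pre-processing service is written in Scala and the two services run as separate processes inside the container, whereas here both are plain Python classes. The class names, the whitespace-and-case pre-processing, and `toy_model` are all made up for illustration.

```python
class PythonModelService:
    """Stands in for the Python µservice: owns the model and exposes predict calls."""

    def __init__(self, model):
        # `model` is any callable mapping a list of inputs to a list of predictions.
        self.model = model

    def predict_batch(self, inputs):
        return self.model(inputs)

    def predict(self, single_input):
        # A single prediction is a batch of one.
        return self.predict_batch([single_input])[0]


class PreprocessingService:
    """Stands in for the Scala µservice: pre-processes, then delegates to the model service."""

    def __init__(self, model_service):
        self.model_service = model_service

    def _preprocess(self, raw):
        # Hypothetical pre-processing: normalise whitespace and case of a product title.
        return " ".join(raw.lower().split())

    def predict(self, raw):
        return self.model_service.predict(self._preprocess(raw))

    def predict_batch(self, raws):
        return self.model_service.predict_batch([self._preprocess(r) for r in raws])


def toy_model(titles):
    """Placeholder model: labels a title by keyword (illustration only)."""
    return ["shoes" if "shoe" in t else "other" for t in titles]
```

A caller only ever talks to the pre-processing service, for example `PreprocessingService(PythonModelService(toy_model)).predict("Red Shoe")`, so the model behind it can be swapped without changing the caller.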

Page 21: Geek Night - Continuous delivery for machine learning

Model Container

Docker Host
  Scala µService
    predict(input)
    predict_batch(inputs)
    _preprocess(input)
  Python µService
    Model
    predict(input)
    predict_batch(inputs)

Page 22: Geek Night - Continuous delivery for machine learning

Model Deployment

● Two Modes - Offline (Batch) and/or Online
● Offline Mode
  ○ Package model containers into an AMI (Amazon Machine Image)
  ○ Start the container as part of your Spark/Hadoop clusters on the Executors/Task Trackers
  ○ Within a job, call the local Scala Service for prediction for each record
● Online Mode
  ○ Deploy the model containers into a Mesos + Marathon or a Kubernetes cluster
  ○ (Auto) Scaling is managed by the cluster
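For the offline mode, the per-record calls to the local service can be grouped into batches, since the container also exposes a batch endpoint. The sketch below assumes a hypothetical `predict_batch` callable that wraps the client for the service running on the same host; the function names and the default batch size are illustrative and not taken from the talk.

```python
from itertools import islice


def batched(records, size):
    """Yield lists of at most `size` records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def predict_partition(records, predict_batch, batch_size=100):
    """Score one partition of records through a local prediction service.

    `predict_batch` is a hypothetical client call that takes a list of records
    and returns one prediction per record, in the same order.
    """
    for chunk in batched(records, batch_size):
        for record, prediction in zip(chunk, predict_batch(chunk)):
            yield record, prediction
```

In a PySpark job, a function of this shape could plausibly be applied with `rdd.mapPartitions`, so that each executor only talks to the container on its own host.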

Page 23: Geek Night - Continuous delivery for machine learning

Model A/B Testing

● Most common approach - Multi-Armed Bandit (MAB) Testing

Source - https://www.slideshare.net/turi-inc/model-managementalice
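As a rough illustration of multi-armed bandit testing, here is an epsilon-greedy router, one common bandit strategy. The slides do not say which bandit algorithm is meant, so the choice of epsilon-greedy, the class name, and the reward bookkeeping are assumptions.

```python
import random


class EpsilonGreedyRouter:
    """Routes requests between model variants with an epsilon-greedy bandit."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.pulls = {arm: 0 for arm in arms}
        self.rewards = {arm: 0.0 for arm in arms}

    def mean_reward(self, arm):
        """Average payout seen so far for `arm` (0.0 if it was never used)."""
        return self.rewards[arm] / self.pulls[arm] if self.pulls[arm] else 0.0

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        return max(self.pulls, key=self.mean_reward)

    def record(self, arm, reward):
        """Feed back the payout observed for a request served by `arm`."""
        self.pulls[arm] += 1
        self.rewards[arm] += reward
```

The router depends entirely on `record` being called with a meaningful payout for each served request, so the approach only works when such a payout can be measured.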

Page 24: Geek Night - Continuous delivery for machine learning

Model A/B Testing

Source - https://www.slideshare.net/turi-inc/model-managementalice

Page 25: Geek Night - Continuous delivery for machine learning

Model A/B Testing

● We don’t use MAB
  ○ Reason - Payout is not easily measurable
● Instead we use a variation of the Blue Green Deployment pattern
  ○ Send input to both the old and the new model, but serve output only from the old
  ○ Find deltas and do spot checking
● Advantages
  ○ Zero downtime while pushing new models
  ○ Easy rollback
● For Offline, BG not needed, only deltas + spot checking
● We have built an in-house data turking tool for spot checking
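The Blue Green variation above can be sketched as a shadow comparison: both models see every input, only the old model's output is served, and disagreements are collected for review. The function names, the shape of a delta record, and the random sampling used to pick items for spot checking are assumptions made for this example.

```python
import random


def shadow_compare(inputs, old_predict, new_predict):
    """Serve the old model's output while recording where the new model disagrees."""
    served, deltas = [], []
    for item in inputs:
        old_out = old_predict(item)
        new_out = new_predict(item)  # computed but never returned to the caller
        served.append(old_out)
        if new_out != old_out:
            deltas.append({"input": item, "old": old_out, "new": new_out})
    return served, deltas


def sample_for_spot_check(deltas, k, seed=0):
    """Pick up to k deltas for human review (the sampling strategy is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(deltas, min(k, len(deltas)))
```

Since callers only ever receive the old model's output, switching to the new model (or abandoning it) is a routing decision and involves no downtime.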

Page 26: Geek Night - Continuous delivery for machine learning

Spot Checking Example 1

Page 27: Geek Night - Continuous delivery for machine learning

Spot Checking Example 2

Page 28: Geek Night - Continuous delivery for machine learning

ML Pipelines

● ML Pipelines could be modelled after build pipelines
● Customized Go-CD, a CI & CD tool to automate our ML workflows
● Created plugins to help us with our ML workflows

Training Pipeline
Read Training Data (MDA Job) -> Pre-process Data (Spark Job) -> Build Model (Python) -> Evaluate Model (Python) -> Publish Model -> Promote Model

Manual Stage

Publish Container
Create Docker Image (Docker) -> Push to Docker Registry (Docker) -> Create AMI (Shell)
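The idea of modelling an ML pipeline after a build pipeline can be sketched as an ordered list of stages with a manual gate. This is not Go-CD configuration: the runner, the stage functions, and the decision to treat "promote_model" as the manual stage are assumptions for illustration, and the stage bodies are placeholders.

```python
def run_pipeline(stages, context, approve):
    """Run stages in order; a manual stage only runs if `approve(name)` returns True.

    `stages` is a list of (name, function, is_manual) tuples. Each function takes
    the shared context dict and returns an updated one. Returns a tuple of
    (names of completed stages, final context).
    """
    completed = []
    for name, stage, is_manual in stages:
        if is_manual and not approve(name):
            break  # the pipeline stops here until someone triggers the manual stage
        context = stage(context)
        completed.append(name)
    return completed, context


# Placeholder stages mirroring the training pipeline; the bodies are illustrative only.
TRAINING_PIPELINE = [
    ("read_training_data", lambda c: {**c, "rows": [1, 2, 3, 4]}, False),
    ("preprocess_data", lambda c: {**c, "rows": [r * 2 for r in c["rows"]]}, False),
    ("build_model", lambda c: {**c, "model": sum(c["rows"]) / len(c["rows"])}, False),
    ("evaluate_model", lambda c: {**c, "metrics": {"score": 0.9}}, False),
    ("publish_model", lambda c: {**c, "published": True}, False),
    ("promote_model", lambda c: {**c, "promoted": True}, True),  # assumed manual stage
]
```

Running `run_pipeline(TRAINING_PIPELINE, {}, approve=lambda name: False)` stops after publishing, which matches the idea that a published model is not deployed until someone promotes it.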

Page 29: Geek Night - Continuous delivery for machine learning


Page 30: Geek Night - Continuous delivery for machine learning

Future Work

● Open source the template model container
● Add more plugins in Go-CD to better support stuff natively
● Model Repository visualization

Page 31: Geek Night - Continuous delivery for machine learning


Thank You

Questions