your ml code productionizing seamlessly · some fun facts yelpers have written 155 million reviews...

36
Productionizing your ML code seamlessly lauris #EuroPython

Upload: others

Post on 29-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Productionizing your ML codeseamlessly

lauris #EuroPython

Page 2: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

IMAGEConnecting people with great local businesses

Page 3: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and
Page 4: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Yelpers have written 155 million reviews since 2004.Some fun factsabout Yelp

We have over 500 developers.

We have over 300 services and our monolith yelp-main has over 3 million lines of code!

We have 74 million desktop and 30 million mobile app monthly unique visitors.

Page 5: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Agendafor today

How to improve your development workflow to make the path to production easier

What does running an ML model in production involve?

Page 6: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Starting Point: your notebook

Page 7: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

predicts a desirable behavior

detects when an event is about to happen

recommends the best items to your usersIt solves your problem

STARTING POINT: YOUR NOTEBOOK

ranks items in search

forecasts trends in stocks

Your notebook ...

Page 8: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Your notebook can be simple

STARTING POINT: YOUR NOTEBOOK

Page 9: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

… or not so much

STARTING POINT: YOUR NOTEBOOK

Page 10: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Verify that your initial objective is accomplished, regularly

Making a model work in a notebook is only the first step to make it useful

STARTING POINT: YOUR NOTEBOOK

Getting data to predict or train on regularly

Evaluate the model

Use the predictions in your product

Page 11: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

IMAGE

“Only a small fraction of real-world ML systems is composed of the ML code [...] The required surrounding infrastructure is vast and complex.”

Hidden Technical Debt in Machine Learning Systems - Google NIPS 2015

STARTING POINT: YOUR NOTEBOOK

You are here

Page 12: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

What does running an ML model in production involve?

Page 13: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

IMAGE

WHAT DOES ML IN PROD INVOLVE?

This presentation is not about tooling

Page 14: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

IMAGE

WHAT DOES ML IN PROD INVOLVE?

This is a framework on how to tackle the problem

Page 15: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

ML pipelineA simplified view

WHAT DOES ML IN PROD INVOLVE?

Page 16: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Your data source is updating

WHAT DOES ML IN PROD INVOLVE?

Page 17: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Updatingthe model

WHAT DOES ML IN PROD INVOLVE?

Regular training

Re-run strategies

Scale

Page 18: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Evaluating Model

WHAT DOES ML IN PROD INVOLVE?

Page 19: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Evaluating Model

WHAT DOES ML IN PROD INVOLVE?

Does my evaluation metric reflect how this model will be used in production?

Consider both a classic metric and something that makes sense business wise.

Think about feedback loops

Page 20: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Generatingpredictions

WHAT DOES ML IN PROD INVOLVE?

Regular training

Re-run strategies

Scale

Page 21: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Usingpredictions

WHAT DOES ML IN PROD INVOLVE?

Page 22: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Measuring Success

WHAT DOES ML IN PROD INVOLVE?

Beware about predicting what will happen anyway. Think about having a hold-out set.

A/B test different models

Track the business metrics you are trying to move

Confront your hypothesis to reality

Page 23: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

How to improve your development workflow to make the path to production easier

Page 24: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

General Advice

MAKE PRODUCTIONIZING EASIER

Use containers, virtualenvs

Use prod technologies from the get go

Persist your models, logs, code ...

Rely on already existing tools

Page 25: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Feature Extraction

MAKE PRODUCTIONIZING EASIER

=

Page 26: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Feature Extraction

MAKE PRODUCTIONIZING EASIER

Code should be the same between training and prediction

Think about edge cases

Write features as codeGiant SQL queries are hard to review, test and maintain

Unit Test the feature extraction code

Page 27: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Evaluating Model

MAKE PRODUCTIONIZING EASIER

Page 28: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Evaluating Model

MAKE PRODUCTIONIZING EASIER

Perform feature importance analysis, to be able to detect issues or big change between one training and the other.

Set aside a sanity check test set for evaluation

Page 29: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

VersionAll the things

Keep track of change

MAKE PRODUCTIONIZING EASIER

Versioned

Versioned

Versioned

Versioned

Versioned

Page 30: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

LogAll the things

And persist it

MAKE PRODUCTIONIZING EASIER

log

log

log

log

log

log

log

log

Page 31: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Training Prediction

Your product

Data sources Data sources

SamplingSampling

Feature Extraction

Model Training

Model

Evaluation

Feature Extraction

Predictions

Monitoring the pipeline

MAKE PRODUCTIONIZING EASIER

Page 32: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Monitoring the pipeline

MAKE PRODUCTIONIZING EASIER

Keep track of the number of prediction generated (and alert when it’s 0)

Alert on errors in your pipeline’s code

Alert on the business metrics you are trying to move.

Keep track of timings, to be able to see problems earlier

Write runbooks

Page 33: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Closing word

ML code is code, and all good practices still apply.

Verify your assumption against reality. Really.

Design for change, and evolution.

Page 34: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

www.yelp.com/careers/

Offices @HamburgLondon

San Francisco

We're Hiring!

Page 35: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Thank you!

Page 36: your ML code Productionizing seamlessly · Some fun facts Yelpers have written 155 million reviews since 2004. about Yelp We have over 500 developers. We have over 300 services and

Questions?