![Page 1: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/1.jpg)
Pragmatic Machine Learning
@louisdorard #MLSpain - 18 Jan 2016
![Page 2: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/2.jpg)
I’M LAZY
![Page 3: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/3.jpg)
![Page 4: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/4.jpg)
![Page 5: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/5.jpg)
“Programming is for lazy people who want to automate things
— AI is for lazier people who want to automate programming”
![Page 6: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/6.jpg)
The Lazy MLer
• Consider manual classification task
• Automate with ML model?
• Build PoC
• Deploy in production
• Maintain
• Monitor performance
• Update with new data
Phrase problem as ML task
Engineer features
Prepare data (csv)
Learn model
Make predictions
Deploy model & integrate
Evaluate model
Measure impact
![Page 7: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/7.jpg)
Cost of ML projects
• (Florian Douetteau at PAPIs Connect in May 2015)
• Top companies invested more than $5M in their ML production platform (Facebook, Amazon, LinkedIn, Spotify…)
![Page 8: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/8.jpg)
The Lazy MLer
• Real-world ML is/was complicated and costly (especially at web scale)
• Do I really need ML?
• How about a Human API? (e.g. Amazon Mechanical Turk)
• → Back to square 1 (but someone else’s problem!)
• → Baseline! (performance, time, cost)
![Page 9: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/9.jpg)
Performance evaluation
![Page 10: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/10.jpg)
How do you evaluate the performance of an ML system?
![Page 11: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/11.jpg)
Performance measures
Accuracy
Latency
Throughput
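The three measures can be collected for any prediction function. The sketch below assumes a hypothetical `predict(x)` callable and a labelled test set; it times each call sequentially, so latency is per call and throughput is a rough single-threaded figure, not what a load-balanced API would sustain.

```python
import math
import statistics
import time
from typing import Callable, Dict, Sequence


def measure(predict: Callable[[object], object],
            inputs: Sequence[object], labels: Sequence[object]) -> Dict[str, float]:
    """Accuracy, median and p95 latency (seconds per call), throughput (calls per second)."""
    if not labels:
        raise ValueError("need at least one labelled example")
    latencies = []
    correct = 0
    start = time.perf_counter()
    for x, y in zip(inputs, labels):
        t0 = time.perf_counter()
        y_hat = predict(x)
        latencies.append(time.perf_counter() - t0)
        correct += int(y_hat == y)
    total = time.perf_counter() - start
    latencies.sort()
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1]  # nearest-rank percentile
    return {
        "accuracy": correct / len(labels),
        "latency_median_s": statistics.median(latencies),
        "latency_p95_s": p95,
        "throughput_per_s": len(latencies) / total if total > 0 else math.inf,
    }


# Usage with a hypothetical threshold "model"
report = measure(lambda x: int(x > 0.5), [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])
print(report["accuracy"])  # 1.0
```

Tail latency (p95) is reported next to the median because a few slow calls can matter more to an API client than the typical one.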
![Page 12: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/12.jpg)
![Page 13: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/13.jpg)
Domain-specific evaluation
• Go beyond accuracy… example: recommendations
• Get clicks!
• → Simulate how many you’d get with your model
• → Need to learn accurately what people like, not what they dislike
• Better decisions with ML
• Revenue increase (A/B test)
• Decisions can have a cost (e.g. give special offer/pricing to customer)… ROI?
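One way to "simulate how many clicks you'd get" offline is to replay held-out clicks and count how many land in each user's top-k recommendations. This is a minimal sketch under assumptions: `recommend(user)` is a hypothetical function returning a ranked list of item ids, and the held-out clicks are made up. Replay only credits items users actually clicked in the logs, so it can underestimate a model that surfaces items they were never shown; an A/B test remains the real check.

```python
from typing import Callable, Dict, List, Set


def simulated_clicks(recommend: Callable[[str], List[str]],
                     held_out_clicks: Dict[str, Set[str]], k: int = 5) -> int:
    """Count held-out clicks that fall inside each user's top-k recommendations."""
    hits = 0
    for user, clicked in held_out_clicks.items():
        top_k = recommend(user)[:k]
        hits += len(clicked.intersection(top_k))
    return hits


# Usage: a hypothetical "most popular items" recommender, same list for everyone
popular = ["a", "b", "c", "d", "e", "f"]
held_out = {"u1": {"a", "z"}, "u2": {"f"}, "u3": {"c", "d"}}
print(simulated_clicks(lambda user: popular, held_out, k=3))  # 2
```

Only hits on liked items count here, which matches the point above: the model has to rank well what people like, not predict what they dislike.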
![Page 14: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/14.jpg)
Decisions from predictions
![Page 15: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/15.jpg)
Types of analytics
1. Descriptive
2. Predictive
3. Prescriptive
![Page 16: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/16.jpg)
![Page 17: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/17.jpg)
Churn analysis
1. Show churn rate against time
2. Predict which customers will churn next
3. Suggest what to do about each customer (e.g. propose to switch plan, send promotional offer, etc.)
![Page 18: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/18.jpg)
Churn prediction
• Who: SaaS company selling monthly subscription
• Question asked: “Is this customer going to leave within 1 month?”
• Input: customer
• Output: no-churn or churn
• Data collection: history up until 1 month ago
![Page 19: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/19.jpg)
Churn prediction accuracy
• #TP (we predict the customer churns and they do)
• #FP (we predict the customer churns but they don’t)
• #FN (we predict the customer doesn’t churn but they do)
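These counts are usually summarised as precision (how many flagged customers really churn) and recall (how many churners we manage to flag). A minimal sketch, assuming predictions and outcomes are parallel lists of booleans where `True` means churn:

```python
from typing import Dict, Sequence


def churn_confusion(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Count TP/FP/FN/TN (True = churn) and derive precision and recall."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # share of flagged customers who really churn
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # share of churners we managed to flag
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


stats = churn_confusion([True, True, False, False, True],
                        [True, False, True, False, True])
print(stats["tp"], stats["fp"], stats["fn"], stats["tn"])  # 2 1 1 1
```

FPs and FNs have different business costs (an unnecessary offer vs a lost customer), which is why the following slides move from these counts to ROI.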
![Page 20: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/20.jpg)
Churn prevention
Assume we know who’s going to churn. What do we do?
• Contact them (in which order?)
• Switch to different plan
• Give special offer
• No action?
![Page 21: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/21.jpg)
Churn prevention
“3. Suggest what to do about each customer” → prioritised list of actions, based on…
• Customer representation + context (e.g. competition)
• Churn prediction (& action prediction?)
• Uncertainty in predictions
• Revenue brought by customer & cost of action
• Constraints on frequency of solicitations
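A prioritised list can be built by ranking customers on expected net gain. This is a sketch under simplifying assumptions, not a full decision system: the churn probability stands in for prediction uncertainty, the action's success rate is the same for every customer, context such as competition is ignored, and the frequency constraint is a hypothetical minimum number of days between solicitations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Customer:
    name: str
    churn_prob: float        # output of the churn model, between 0 and 1
    monthly_revenue: float   # € per month
    days_since_contact: int  # days since we last solicited this customer


def prioritise(customers: List[Customer], success_rate: float, action_cost: float,
               min_days_between_contacts: int = 30,
               max_actions: int = 10) -> List[Tuple[str, float]]:
    """(name, expected net gain) for customers worth acting on, best first."""
    candidates = []
    for c in customers:
        if c.days_since_contact < min_days_between_contacts:
            continue  # constraint on frequency of solicitations
        expected_gain = c.churn_prob * success_rate * c.monthly_revenue - action_cost
        if expected_gain > 0:  # skip actions expected to cost more than they save
            candidates.append((c.name, expected_gain))
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:max_actions]


customers = [
    Customer("A", 0.9, 50, 60),
    Customer("B", 0.5, 10, 90),
    Customer("C", 0.8, 100, 5),
    Customer("D", 0.3, 200, 40),
]
ranking = prioritise(customers, success_rate=0.2, action_cost=2)
print([name for name, _ in ranking])  # ['D', 'A']
```

D ranks first despite a lower churn probability because it brings more revenue. B is dropped because its expected gain is negative, and C is skipped because it was contacted too recently.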
![Page 22: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/22.jpg)
Churn prevention ROI
• Taking action for each TP (and FP) has a cost
• For each TP we “gain”: (success rate of action) × (revenue /cust. /month)
• Imagine…
• perfect predictions
• revenue /cust. /month = 10€
• success rate of action = 20%
• cost of action = 2€
• Which ROI?
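Working the slide's numbers through: each TP gains 20% × 10€ = 2€ and costs 2€, so with perfect predictions the action only breaks even over one month. The sketch below generalises this to false positives. The `months_retained` parameter is a hypothetical extension (it assumes a retained customer keeps paying for several months); the slide itself only counts one month.

```python
def churn_prevention_roi(n_tp: int, n_fp: int, success_rate: float,
                         revenue_per_month: float, action_cost: float,
                         months_retained: int = 1) -> dict:
    """ROI of acting on every predicted churner (true and false positives alike)."""
    # Only true positives can be retained; a successful action keeps their revenue.
    gain = n_tp * success_rate * revenue_per_month * months_retained
    # Every customer we act on (TP or FP) costs the same.
    cost = (n_tp + n_fp) * action_cost
    return {"gain": gain, "cost": cost, "roi": (gain - cost) / cost}


# Slide figures, 100 correctly predicted churners, no false positives:
print(churn_prevention_roi(100, 0, 0.2, 10, 2)["roi"])   # 0.0 (break-even)
# Same, but with 25 false positives on top:
print(churn_prevention_roi(100, 25, 0.2, 10, 2)["roi"])  # -0.2
# Hypothetical: a retained customer stays 3 months instead of 1
print(churn_prevention_roi(100, 0, 0.2, 10, 2, months_retained=3)["roi"])  # 2.0
```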
![Page 23: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/23.jpg)
Churn prevention evaluation
• We predicted the customer would churn but they didn’t…
• That’s actually good! Prevention worked!
• Need to store which actions were taken
• Is ML really helping?
• Compare to a baseline, e.g. if no usage for more than 15 days, then predict churn
• Is the fancy model really improving the bottom line?
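The baseline rule from the slide fits in one line, which makes it cheap to compare against. The sketch below uses plain accuracy on made-up outcomes and model predictions purely for illustration; in practice the comparison should use the business metric, and outcomes of customers who received a prevention action need to be interpreted using the stored action log.

```python
from datetime import date
from typing import Sequence


def baseline_predicts_churn(last_usage: date, today: date, max_idle_days: int = 15) -> bool:
    """Rule-of-thumb baseline: no usage for more than max_idle_days days means churn."""
    return (today - last_usage).days > max_idle_days


def accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


today = date(2016, 1, 18)
last_usage = [date(2016, 1, 10), date(2015, 12, 20), date(2016, 1, 3)]
actual_churn = [False, True, True]   # hypothetical outcomes
model_preds = [False, True, True]    # hypothetical output of the ML model
baseline_preds = [baseline_predicts_churn(d, today) for d in last_usage]
print(baseline_preds)  # [False, True, False]
print(accuracy(baseline_preds, actual_churn), accuracy(model_preds, actual_churn))
```

The third customer has been idle exactly 15 days, so the strict "more than" rule does not flag them.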
![Page 24: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/24.jpg)
Replenishment
1. Show past demand against calendar
2. Predict demand for [product] at [store] in next 2 days
3. Suggest how much to ship
• Trade-off: cost of storage vs risk of lost sales
• Constraints on order size, truck volume, capacity of people putting stuff onto shelves
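Step 3 can be sketched as a small optimisation. This assumes the demand prediction comes as a discrete probability distribution (the uncertainty), a per-unit cost for storing unsold units, a per-unit cost for lost sales, and a cap on units standing in for the truck-volume constraint; all numbers are hypothetical. For this simple cost structure the brute-force answer agrees with the classic newsvendor critical-fractile rule.

```python
from typing import Dict, Tuple


def best_shipment(demand_probs: Dict[int, float], storage_cost: float,
                  lost_sale_cost: float, max_units: int) -> Tuple[int, float]:
    """Quantity in [0, max_units] minimising expected storage + lost-sales cost."""
    best_q, best_cost = 0, float("inf")
    for q in range(max_units + 1):  # max_units models e.g. the truck volume
        expected_cost = sum(
            p * (storage_cost * max(q - d, 0) + lost_sale_cost * max(d - q, 0))
            for d, p in demand_probs.items()
        )
        if expected_cost < best_cost:
            best_q, best_cost = q, expected_cost
    return best_q, best_cost


forecast = {8: 0.2, 10: 0.5, 12: 0.3}  # hypothetical 2-day demand forecast
print(best_shipment(forecast, storage_cost=1.0, lost_sale_cost=4.0, max_units=20))  # (12, ~1.8)
print(best_shipment(forecast, storage_cost=1.0, lost_sale_cost=4.0, max_units=10))  # (10, ~2.8)
```

Because a lost sale costs four times more than storing a unit, the optimum ships for the high-demand scenario; when the capacity constraint binds, the best feasible quantity is lower and the expected cost rises.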
![Page 25: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/25.jpg)
Decisions are based on…
• Context
• Predictions
• Uncertainty in predictions
• Constraints
• Costs / benefits
• Competing objectives (⇒ trade-offs to make)
• Business rules
![Page 26: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/26.jpg)
APIs are key
![Page 27: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/27.jpg)
Separation of concerns
Software components for automated decisions:
• Create training dataset from historical data (merge sources, aggregate…)
• Provide predictive model from given training set (i.e. learn)
• Provide prediction against model for given context
• Provide optimal decision from given contextual data, predictions, uncertainties, constraints, objectives, costs
• Apply given decision
![Page 28: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/28.jpg)
Operations Research component
Software components for automated decisions:
• Create training dataset from historical data (merge sources, aggregate…)
• Provide predictive model from given training set (i.e. learn)
• Provide prediction against model for given context
• Provide optimal decision from given contextual data, predictions, uncertainties, constraints, objectives, costs
• Apply given decision
![Page 29: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/29.jpg)
Machine Learning components
Software components for automated decisions:
• Create training dataset from historical data (merge sources, aggregate…)
• Provide predictive model from given training set (i.e. learn)
• Provide prediction against model for given context
• Provide optimal decision from given contextual data, predictions, uncertainties, constraints, objectives, costs
• Apply given decision
![Page 30: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/30.jpg)
Predictive APIs
Software components for automated decisions:
• Create training dataset from historical data (merge sources, aggregate…)
• Provide predictive model from given training set (i.e. learn)
• Provide prediction against model for given context
• Provide optimal decision from given contextual data, predictions, uncertainties, constraints, objectives, costs
• Apply given decision
![Page 31: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/31.jpg)
Predictive APIs
The two methods of predictive APIs:
• model = create_model('training.csv')
• predicted_output = create_prediction(model, new_input)
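To make the two methods concrete, here is a local toy version. Only the names `create_model` and `create_prediction` come from the slide; the nearest-centroid learner, the CSV layout (header row, numeric features, label in the last column) and the feature names are assumptions for illustration. A real predictive API would train something stronger behind the same two calls.

```python
import csv
import math
import os
import tempfile
from collections import defaultdict
from typing import Dict, List

Model = Dict[str, List[float]]  # class label -> centroid of its feature vectors


def create_model(csv_path: str) -> Model:
    """Learn one centroid per class; the last CSV column holds the label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for *features, label in reader:
            x = [float(v) for v in features]
            sums[label] = [s + v for s, v in zip(sums.get(label, [0.0] * len(x)), x)]
            counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def create_prediction(model: Model, new_input: List[float]) -> str:
    """Return the label whose centroid is nearest (Euclidean distance) to new_input."""
    return min(model, key=lambda label: math.dist(model[label], new_input))


# Usage with a tiny, made-up training file
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write("days_active,support_tickets,churn\n1,5,yes\n2,4,yes\n20,0,no\n25,1,no\n")
    path = f.name
model = create_model(path)
os.remove(path)
print(create_prediction(model, [3, 3]))  # yes
```

Keeping learning and prediction behind two calls is what lets the "provide model" and "provide prediction" components be swapped for a hosted service without touching the decision logic.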
![Page 32: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/32.jpg)
Providers of REST HTTP predictive APIs
Amazon ML
BigML
Google Prediction
PredicSis
… or your own company!
![Page 33: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/33.jpg)
![Page 34: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/34.jpg)
![Page 35: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/35.jpg)
![Page 36: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/36.jpg)
?
![Page 37: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/37.jpg)
![Page 38: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/38.jpg)
Experiment on “ScienceCluster”
• Distributed jobs
• Collaborative workspace
• Serialize chosen model
Deploy model as API on “ScienceOps”
• Load balancing
• Auto scaling
• Monitoring (API calls, accuracy)
![Page 39: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/39.jpg)
![Page 40: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/40.jpg)
• “Open source prediction server” in Scala
• Based on Spark, MLlib, Spray
• DASE framework: Data preparation, Algorithm, Serving, Evaluation
• Amazon CloudFormation template → cluster
• Manual up/down scaling
![Page 41: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/41.jpg)
→ PAPI+
![Page 42: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/42.jpg)
→ PAPI+
![Page 43: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/43.jpg)
![Page 44: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/44.jpg)
Interesting research problems
![Page 45: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/45.jpg)
Concurrency for high-throughput ML APIs
Brian Gawalt (Senior Data Scientist at Upwork), talk at PAPIs ’15
![Page 46: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/46.jpg)
Concurrency for high-throughput ML APIs
upwork.com use case:
• predict freelancer availability
• huge web platform (millions of users) → need very high throughput and low latency
• things change quickly → need the freshest data & predictions
![Page 47: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/47.jpg)
Concurrency for high-throughput ML APIs
• event: invitation sent to freelancer
• steps to prediction:
• gather raw data from all sources
• featurize event
• make prediction
![Page 48: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/48.jpg)
Concurrency with Actor framework
• An actor…
• gets & sends messages
• makes computations
• Actors we need:
• “Historians”: one per data source
• “Featurizer”
• “Scorer”
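The Historian → Featurizer → Scorer pipeline can be sketched with Python's `asyncio` queues as actor inboxes. This is an illustration of the pattern, not the implementation from the talk: the two data sources, their contents, the simulated I/O delay and the scoring rule are all hypothetical. Each actor handles one message at a time, while the historians query their sources concurrently.

```python
import asyncio
from typing import Callable, Dict


class Actor:
    """Minimal actor: an inbox queue processed one message at a time."""

    def __init__(self) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()

    async def run(self) -> None:
        while True:
            message = await self.inbox.get()
            if message is None:  # poison pill: stop the actor
                return
            await self.handle(message)

    async def handle(self, message) -> None:
        raise NotImplementedError


class Historian(Actor):
    """One per data source: fetches raw data for an event and forwards it."""

    def __init__(self, source: str, fetch: Callable[[str], object], featurizer: "Featurizer") -> None:
        super().__init__()
        self.source, self.fetch, self.featurizer = source, fetch, featurizer

    async def handle(self, event_id: str) -> None:
        await asyncio.sleep(0.01)  # stand-in for a database or service call
        await self.featurizer.inbox.put((event_id, self.source, self.fetch(event_id)))


class Featurizer(Actor):
    """Waits until every historian has answered for an event, then sends features on."""

    def __init__(self, n_sources: int, scorer: "Scorer") -> None:
        super().__init__()
        self.n_sources, self.scorer = n_sources, scorer
        self.partial: Dict[str, dict] = {}

    async def handle(self, message) -> None:
        event_id, source, value = message
        parts = self.partial.setdefault(event_id, {})
        parts[source] = value
        if len(parts) == self.n_sources:
            del self.partial[event_id]
            await self.scorer.inbox.put((event_id, parts))


class Scorer(Actor):
    def __init__(self, model: Callable[[dict], float], results: asyncio.Queue) -> None:
        super().__init__()
        self.model, self.results = model, results

    async def handle(self, message) -> None:
        event_id, features = message
        await self.results.put((event_id, self.model(features)))


async def main(event_ids):
    results: asyncio.Queue = asyncio.Queue()
    # Hypothetical model: available if active and not busy with too many jobs
    scorer = Scorer(lambda f: 1.0 if f["profile"]["active"] and f["open_jobs"] < 3 else 0.0, results)
    featurizer = Featurizer(n_sources=2, scorer=scorer)
    historians = [
        Historian("profile", lambda e: {"active": e != "f2"}, featurizer),
        Historian("open_jobs", lambda e: {"f1": 1, "f2": 0, "f3": 7}[e], featurizer),
    ]
    actors = [scorer, featurizer, *historians]
    tasks = [asyncio.create_task(a.run()) for a in actors]
    for event_id in event_ids:        # event: invitation sent to freelancer
        for historian in historians:  # fan out to all data sources
            await historian.inbox.put(event_id)
    scores = dict([await results.get() for _ in event_ids])
    for actor in actors:
        await actor.inbox.put(None)
    await asyncio.gather(*tasks)
    return scores


scores = asyncio.run(main(["f1", "f2", "f3"]))
print(scores)  # {'f1': 1.0, 'f2': 0.0, 'f3': 0.0}
```

Since actors share no state and only exchange messages, slow data sources overlap in time instead of adding up, which is the point of the before/after comparison that follows.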
![Page 49: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/49.jpg)
Concurrency for high-throughput ML APIs
before
![Page 50: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/50.jpg)
Concurrency for high-throughput ML APIs
after
![Page 51: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/51.jpg)
API standards?
• Python de facto standard: scikit-learn
• “Sparkit-learn aims to provide scikit-learn functionality and API on PySpark. The main goal of the library is to create an API that stays close to sklearn’s.”
• REST standard: PSI (Protocols & Structures for Inference)
• Pretty similar to BigML API!
• Implementation for scikit available
• Easier benchmarking! Ensembles!
![Page 52: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/52.jpg)
PAPIs ’15 Proceedings
• “AzureML: Anatomy of a machine learning service”
• “Deploying high throughput predictive models with the actor framework”
• “Protocols and Structures for Inference: A RESTful API for Machine Learning”
• Coming soon… JMLR W&CP Volume 50
• Get updates: @papisdotio or papis.io/updates
![Page 53: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/53.jpg)
Simple MLaaS comparison

| | Amazon | Google | PredicSis | BigML |
|---|---|---|---|---|
| Accuracy | 0.862 | 0.743 | 0.858 | 0.790 |
| Training time | 135 s | 76 s | 17 s | 5 s |
| Test time | 188 s | 369 s | 5 s | 1 s |

louisdorard.com/blog/machine-learning-apis-comparison
![Page 54: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/54.jpg)
Automated benchmark?
• With SKLL (SciKit-Learn Laboratory)
• Wrap each service in a scikit estimator
• Specify evaluations to perform in a config file (datasets, metrics, eval procedure)
• Need to also measure time…
• See papiseval on GitHub
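The idea of wrapping each service behind scikit-style `fit`/`predict` methods, and timing both, can be shown without SKLL. This is a plain-Python harness, not SKLL's or papiseval's actual API; the two "services" are hypothetical local stand-ins for remote MLaaS calls and the data is made up.

```python
import time
from collections import Counter


class MajorityClassService:
    """Hypothetical stand-in for a remote ML API, wrapped with fit/predict."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdService:
    """Second hypothetical service: threshold halfway between class means of feature 0."""

    def fit(self, X, y):
        pos = [x[0] for x, t in zip(X, y) if t == 1]
        neg = [x[0] for x, t in zip(X, y) if t == 0]
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, X):
        return [int(x[0] > self.threshold_) for x in X]


def benchmark(services, X_train, y_train, X_test, y_test):
    """Accuracy plus wall-clock training and test time for each service."""
    report = {}
    for name, service in services.items():
        t0 = time.perf_counter()
        service.fit(X_train, y_train)
        t1 = time.perf_counter()
        predictions = service.predict(X_test)
        t2 = time.perf_counter()
        correct = sum(p == t for p, t in zip(predictions, y_test))
        report[name] = {"accuracy": correct / len(y_test),
                        "train_s": t1 - t0, "test_s": t2 - t1}
    return report


X_train = [[1], [2], [3], [4], [9], [10]]
y_train = [0, 0, 0, 0, 1, 1]
X_test, y_test = [[2], [5], [7], [9]], [0, 0, 1, 1]
report = benchmark({"majority": MajorityClassService(), "threshold": ThresholdService()},
                   X_train, y_train, X_test, y_test)
for name, row in report.items():
    print(name, row["accuracy"])  # majority 0.5, then threshold 1.0
```

For real services the wrapper's `fit` would upload data and train remotely, so the measured times include network and queueing, as in the comparison table above.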
![Page 55: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/55.jpg)
AutoML
• Return of the Lazy MLer!
• Model selection
• Find optimal values for n (hyper-)parameters → optimisation problem (function in n dimensions)
• Search space of parameters efficiently → explore vs exploit
• Bayesian optimization?
![Page 56: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/56.jpg)
Bayesian Optimization in 1 dimension
From CODE517E
![Page 57: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/57.jpg)
Bayesian Optimization in 1 dimension
From CODE517E
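A one-dimensional Bayesian optimization loop fits in plain Python. This is a sketch under stated assumptions, not the method behind the CODE517E figures: a zero-mean Gaussian process with an RBF kernel (length scale 0.1, unit variance), an upper-confidence-bound acquisition with `kappa=2`, a grid of 101 candidate points, and a toy objective whose maximum is at 0.3 standing in for "validation accuracy as a function of one hyper-parameter".

```python
import math
from typing import List


def rbf(a: float, b: float, length_scale: float = 0.1) -> float:
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(max(K[i][i] - s, 1e-12))  # guard against rounding
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z: List[float] = []
    for i, b_i in enumerate(b):
        z.append((b_i - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def bayes_opt_1d(f, grid, n_evals=20, kappa=2.0, noise=1e-6):
    """Maximise f over grid with a zero-mean GP surrogate and a UCB acquisition."""
    xs = [grid[0], grid[-1]]  # start from both ends of the interval
    ys = [f(x) for x in xs]
    while len(xs) < n_evals:
        K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        L = cholesky(K)
        alpha = solve_upper_t(L, solve_lower(L, ys))  # K^-1 y
        best_ucb, best_x = -math.inf, xs[0]
        for x in grid:
            if x in xs:
                continue  # never re-evaluate a point
            k_star = [rbf(x, xi) for xi in xs]
            mean = sum(k * a for k, a in zip(k_star, alpha))
            v = solve_lower(L, k_star)
            std = math.sqrt(max(1.0 - sum(vi * vi for vi in v), 0.0))
            ucb = mean + kappa * std  # exploit (mean) vs explore (std)
            if ucb > best_ucb:
                best_ucb, best_x = ucb, x
        xs.append(best_x)
        ys.append(f(best_x))
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]


grid = [i / 100 for i in range(101)]
best_x, best_y = bayes_opt_1d(lambda x: -(x - 0.3) ** 2, grid)
print(best_x, best_y)
```

Larger `kappa` spends more of the 20 evaluations exploring uncertain regions; smaller values concentrate them around the current best guess.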
![Page 58: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/58.jpg)
AutoML
• Building ensembles
• Decide whether to continue training an existing model or to train a new one
• Explore vs exploit again!
• Reward is accuracy. Let’s estimate the reward for all options.
• Choose the option with highest expected reward + uncertainty? (i.e. upper confidence bound)
• Limited computational budget…
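The "highest expected reward + uncertainty" rule is the UCB1 bandit algorithm. Below is a sketch of spending a limited budget across options with it; the three options and their success probabilities are hypothetical, with a reward of 1 standing for "this training round improved validation accuracy".

```python
import math
import random
from typing import Callable, List


def ucb1(options: List[Callable[[], float]], budget: int,
         c: float = math.sqrt(2)) -> List[int]:
    """Spend `budget` steps across options; return how often each was chosen."""
    counts = [0] * len(options)
    means = [0.0] * len(options)
    for step in range(budget):
        if step < len(options):
            i = step  # try every option once
        else:
            # step equals the number of rewards observed so far
            i = max(range(len(options)),
                    key=lambda a: means[a] + c * math.sqrt(math.log(step) / counts[a]))
        reward = options[i]()
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # running average of rewards
    return counts


rng = random.Random(0)
# Hypothetical options, e.g. "train model A/B/C one more round"
options = [lambda p=p: float(rng.random() < p) for p in (0.2, 0.5, 0.8)]
counts = ucb1(options, budget=500)
print(counts)
```

Most of the budget ends up on the best option, while the uncertainty bonus keeps occasionally re-checking the others in case early estimates were unlucky.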
![Page 59: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/59.jpg)
Automatic Statistician
• Zoubin Ghahramani & James Lloyd @ University of Cambridge
• Gaussian Processes: find the (mixture of) kernel(s) that maximises the data likelihood
• Also Bayesian!
![Page 60: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/60.jpg)
Open Source AutoML libraries
• Spearmint: “Bayesian optimization” for tuning parameters → Whetlab (startup) → acquired by Twitter
• Auto-sklearn: “automated machine learning toolkit and drop-in replacement for a scikit-learn estimator”
• See automl.org and the AutoML challenge
![Page 61: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/61.jpg)
Scikit
from sklearn import svm
model = svm.SVC(gamma=0.001, C=100.)
from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])  # train on all images but the last
model.predict(digits.data[-1:])  # predict expects a 2D array, hence [-1:] rather than [-1]
![Page 62: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/62.jpg)
![Page 63: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/63.jpg)
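AutoML Scikit
import autosklearn.classification  # current releases expose the classifier in this submodule
model = autosklearn.classification.AutoSklearnClassifier()
from sklearn import datasets
digits = datasets.load_digits()
model.fit(digits.data[:-1], digits.target[:-1])  # same data and calls as the scikit example
model.predict(digits.data[-1:])  # 2D input, as for any scikit-learn estimator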
![Page 64: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/64.jpg)
More automation ideas…
• Before learning:
• Automatic feature extraction from text?
• After learning:
• Monitor new predictions and automatically retrain models when necessary?
• See panel discussion at PAPIs ’15
![Page 65: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/65.jpg)
Open Source Auto Scaling?
• Same as Azure ML?
• Scaling up? down?
![Page 67: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/67.jpg)
PAPIs Connect (14-15 March, Valencia)
Tech talks:
• Intro to Spark
• Using ML to build an autonomous drone
• Demystifying Deep Learning (speaker needed!)
• Distributed Deep Learning with Spark on AWS
![Page 68: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/68.jpg)
PAPIs Connect (14-15 March, Valencia)
Topics:
• Managing technology
• FinTech
• Enterprise, Retail, Operations
• AI for Society (Nuria Oliver, Scientific Director at Telefonica R&D)
• Future of AI (Ramon Lopez de Mantaras, Director of AI research at the Spanish Research Council)
![Page 69: Pragmatic Machine Learning @ ML Spain](https://reader031.vdocuments.net/reader031/viewer/2022021919/5878c83d1a28ab26728b6679/html5/thumbnails/69.jpg)
PAPIs Connect (14-15 March, Valencia)
• Dev? Bring your manager!
• Manager? Bring your devs!
• Discount code: MLSVLC20
• papis.io/connect