unifying twitter around a single ml platform · 2020. 7. 9. · unifying twitter around a single ml...

44
Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Upload: others

Post on 09-Aug-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Unifying Twitter Around a Single ML Platform

Yi Zhuang (@yz), Nicholas Leonard (@strife076)April 17, 2019

Page 2: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges• Unifying Twitter Around a Single ML Platform• Technology Migrations• Health ML Use Case• Summary of Lessons Learned• Future of Our ML Platform

Page 3: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges• Unifying Twitter Around a Single ML Platform• Technology Migrations• Health ML Use Case• Lessons Learned• Future of Our ML Platform

Page 4: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Use Cases: Tweet Ranking

Page 5: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Use Cases at Twitter: Ads

pCTR =p ( “click” | if we show this Candidate Ad to this User in this Context)

User

Candidate Ad

Context

“Click”

Page 6: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Use Cases at Twitter• Other use cases

• Recommending Tweets, Users, Hashtags, News, etc.• Detecting Abusive Tweets and Spam• Detecting NSFW Images and Videos• And so on …

Page 7: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Use Cases at Twitter

ML is Everywhere

Page 8: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges• Unifying Twitter Around a Single ML Platform• Technology Migrations• Health ML Use Case• Summary of Lessons Learned• Future of Our ML Platform

Page 9: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Requirements of ML Platform

Data ScalePBs of data per day

Some models train on Tens of TBs of data per day

Page 10: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Requirements of ML Platform

Prediction Throughput Tens of millions of predictions per second

Page 11: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Requirements of ML Platform

Prediction Latency Budgettens of milliseconds

Page 12: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Example Use CaseAds Prediction

Training examples everyday1+B

1+M Features

40ms Serving latency

Predictions every second10+M

Page 13: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges• Unifying Twitter Around a Single ML Platform• Technology Migrations• Health ML Use Case• Summary of Lessons Learned• Future of Our ML Platform

Page 14: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Challenges of Old ML Platform

Fragmentation of ML Practice

PyTorch

Scikit Learn

In-houseFrameworks

VW

Lua Torch

TensorFlow

Page 15: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Challenges of Old ML Platform

Difficulty Sharing

Models

Tooling & ResourcesKnowledge

Page 16: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Challenges of Old ML Platform

InefficienciesWork Duplication

Page 17: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Example Duplicate Work

Various Ways to doModel Training & Serving

Model RefreshesData Cleaning and Preprocessing

Experiment TrackingEtc.

Page 18: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges• Unifying Twitter Around a Single ML Platform• Technology Migrations• Health ML Use Case• Lessons Learned• Future of Our ML Platform

Page 19: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

New Unified ML Platform OverviewA Single Consistent ML Platform Across Twitter

5Producio

n Model S

erving

Lorem ipsu

m dolor sit a

met, conse

ctetur

adipiscing elit,

sed do eiusm

od tempor.

Donec facil

isis l

acus e

get mauris

.

4Exp

erimentatio

n Tracking

3Model T

raining and Evaluatio

n

Preprocess

ing and Featurizatio

n

21Pipelin

e Orch

estratio

n

Page 20: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges• Unifying Twitter Around a Single ML Platform• Technology migrations• Health ML Use Case• Summary of Lessons Learned• Future of Our ML Platform

Page 21: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Technology Migrations

● Data Analysis: Scalding + PySpark/Notebooks● Featurization: Feature Store● ML Frameworks: Java ML -> Lua Torch -> TensorFlow● Training and deployment cycles: Apache Airflow

Page 22: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Data Analysis: Scalding

● Scala● Abstraction over hadoop● Distributed data processing● Great for large scale data● Slow-iteration

Page 23: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Data analysis: Notebook + Spark● iPython Notebook + PySpark● Easier for Python engineers● Data visualization● Faster iteration

Page 24: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Lessons learned

ML Practitioner DiversityProduction ML Engineers

Deep Learning ResearcherData Scientists

Page 25: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Featurization: Ad Hoc● Teams use common data sources

○ E.g. user data, tweet data, engagement data● Every team does their own featurization

○ Duplication of effort● Difficult to validate features at serving time

○ Inconsistent featurization schemes for training vs serving

Page 26: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Featurization: Feature Store● Teams can share, discover and access features● Consistent training-time vs serving-time featurization

Page 27: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Lessons learned

ConsistencyConsistency across teams => sharing & efficiency

Important: feature consistency between training and serving

Page 28: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Frameworks: Java ML

● Logistic regression○ Relies on feature discretization

● Typically used in an online learning environment:○ Model learns new data as it becomes available (~15 min delay)

Page 29: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Frameworks: Lua Torch● Deep learning● Feature discretization parity● ML Engineers didn’t want to learn Lua:

○ Lua hidden via YAML○ Hard to debug and unit test

● Complex production setup○ JVM -> JNI -> Lua VMs -> C/C++

Page 30: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Frameworks: TensorFlow● Google support● Production ready

○ Export graphs as protobuf○ Serve graphs from Java/Scala:

■ JVM -> TensorFlow● TensorBoard ● Large ecosystem (E.g. TFX)

Page 31: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Lessons learned

Reproducibility is hard... across different ML framework: small differences, large impacts

Online experiments take timeNeed simple setup, fast iterations

Page 32: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Train and Deploy CyclesDifferent approaches to productionizing training algorithms:● Manually re-train and re-deploy the model periodically

○ Retraining frequency varies● Automate training and deployment cycles:

○ Cron, Aurora, Airflow Jobs○ Helps reduce model staleness

Page 33: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Train and Deploy CycleApache Airflow: DAGs

Page 34: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Hyperparameter Tuning

Page 35: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Lessons learned

Automation is crucialML models become stale over time

ML Hyperparameter tunings are often tedious

Page 36: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges at Twitter• Unifying Twitter Around a Single ML Platform• Technology Migrations• Health ML Use Case• Summary of Lessons Learned• Future of Our ML Platform

Page 37: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Health ML Case Study● Situation:

○ Models still running using Lua Torch○ Retrained manually every ~6 months.

● Mission: ○ Migrate Health ML models to new ML Platform○ Reach metric parity with existing models (minimum)

Page 38: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

ML Pipeline Overview

Training Data

Preprocessing

Feature Store

Data Exploration Training Offline

Evaluation

Model Tuning

Experiment Loop

Online A/B

Testing

Production

Experiment

Prediction Servers

Page 39: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Lessons Learned

Teamwork: Platform, Modeling, ProductIntegration of All Components

Page 40: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges at Twitter• Unifying Twitter Around a Single ML Platform• Technology Migrations• Summary of Lessons Learned• Future of Our ML Platform

Page 41: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Summary of Lessons Learned● Consistency brings efficiency● DL Reproducibility is hard● Automation is crucial● ML practitioner Diversity

○ ML engineers vs DL researchers○ Production vs exploration

● Collaboration of platform, modeling, product teams

Page 42: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Overview• ML Use Cases at Twitter• ML Platform Requirements & Challenges at Twitter• Unifying Twitter Around a Single ML Platform• Technology Migrations• Summary of Lessons Learned• Future of Our ML Platform

Page 43: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Future

2018 Strategy: Consistency & Adoption2019 Strategy: Ease of Use & Velocity

10x, 50x training speed

Auto model evaluation & validation

Auto model deploy & auto scaling

Auto hyperparameter tuning & architecture search

Continuous Deep Learning Model Training

and so on ...

Page 44: Unifying Twitter Around a Single ML Platform · 2020. 7. 9. · Unifying Twitter Around a Single ML Platform Yi Zhuang (@yz), Nicholas Leonard (@strife076) April 17, 2019

Thank YouIf you are interested in learning more about Twitter Cortex, please contact: @yz @strife076