THE MISSING PIECE IN COMPLEX ANALYTICS: SCALABLE, LOW LATENCY MODEL SERVING
AND MANAGEMENT WITH VELOX
Daniel Crankshaw, Peter Bailis, Joseph Gonzalez, Haoyuan Li, Zhao Zhang, Ali Ghodsi, Michael Franklin, and Michael I. Jordan
UC Berkeley AMPLab
CIDR 2015
Talk Outline
• ML model management today
• Velox system architecture• Key idea: Split model family• Prediction serving• Model management• Next directions
Catify: Music for Cats
MODELING TASK
Rating
Songs
MODELING TASK
Ratings
Songs
Prediction
Data
Data Model
Data ModelTraining
BERKELEY DATA ANALYTICS STACK (BDAS)
Spark
SparkStreaming Spark SQL
BlinkDBGraphX
MLlib
MLBase
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Catify: Music for Cats
Catify: Music for CatsCatID Song Score
1 16 2.1
1 14 3.7
3 273 4.2
4 14 1.9
Catify: Music for Cats
Pipeline
CatID Song Score
1 16 2.1
1 14 3.7
3 273 4.2
4 14 1.9
Catify: Music for Cats
Tachyon + HDFS
Pipeline
CatID Song Score
1 16 2.1
1 14 3.7
3 273 4.2
4 14 1.9
Catify: Music for Cats
Tachyon + HDFS
Pipeline
CatID Song Score
1 16 2.1
1 14 3.7
3 273 4.2
4 14 1.9
PredictionLatency
Prediction Error
PredictionLatency
Prediction ErrorLower isBetter
PredictionLatency
Prediction ErrorLower isBetter
Lower isBetter
PredictionLatency
Prediction Error
Online Retraining e.g.,
Lower isBetter
Lower isBetter
Pipeline
Tachyon + HDFS
Catify: Music for Cats
Pipeline
Tachyon + HDFS
Catify: Music for Cats
Node.js App Server
Apache Web Server
MySQL
Pipeline
Tachyon + HDFS
Catify: Music for Cats
Node.js App Server
Apache Web Server
MySQL
Catify: Music for Cats
Songs
Users
Songs
Users
O(users * songs)
Catify: Music for Cats
Pipeline
Tachyon + HDFS
Node.js App Server
NGINX
MySQL
Catify: Music for Cats
Pipeline
Tachyon + HDFS
Node.js App Server
NGINX
MySQL
New Model
Catify: Music for Cats
Pipeline
Tachyon + HDFS
Node.js App Server
NGINX
MySQL
Training Data
New Model
Catify: Music for Cats
PredictionLatency
Prediction Error
Online Retraining e.g.,
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g.,
What’s wrong?
1. Predictions have either :
What’s wrong?
1. Predictions have either :a. High latency, low staleness
What’s wrong?
1. Predictions have either :a. High latency, low stalenessb. Low latency, high staleness
What’s wrong?
1. Predictions have either :a. High latency, low stalenessb. Low latency, high staleness
2. Limited optimization of model semantics
What’s wrong?
1. Predictions have either :a. High latency, low stalenessb. Low latency, high staleness
2. Limited optimization of model semantics
3. Ad-hoc lifecycle management
What’s wrong?
Talk Outline
• ML model management today
• Velox system architecture• Split model family• Prediction serving• Model management• Next directions
Talk Outline
• ML model management today• Velox system architecture
• Split model family• Prediction serving• Model management• Next directions
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g.,
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g.,
VELOX
VELOX GOALS
VELOX GOALS1. Low latency and low error
predictions
VELOX GOALS1. Low latency and low error
predictions2. Cross-cutting model-specific
optimizations
VELOX GOALS1. Low latency and low error
predictions2. Cross-cutting model-specific
optimizations3. Unified system eases operation
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g., VELOX
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g., VELOX
key idea:split model into staleness insensitiveand staleness sensitive components
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g., VELOX
key idea:split model into staleness insensitiveand staleness sensitive components
BATCH
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g., VELOX
key idea:split model into staleness insensitiveand staleness sensitive components
INCREMENTAL
BATCH
Spark
SparkStreaming Spark SQL
BlinkDBGraphX
MLlib
MLbase
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
BERKELEY DATA ANALYTICS STACK (BDAS)
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
Training
THE MISSING PIECE
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
Training Management + Serving
THE MISSING PIECE
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
THE MISSING PIECE
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
THE MISSING PIECE
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
THE MISSING PIECE
ModelManager
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
THE MISSING PIECE
ModelManager
PredictionService
PREDICTION SERVICE
PREDICTION SERVICE
1. Implements model serving API
PREDICTION SERVICE
1. Implements model serving API2. Low latency; < 10ms response time
PREDICTION SERVICE
1. Implements model serving API2. Low latency; < 10ms response time3. “Fuzzy” materialized view of model state
PREDICTION SERVICE
1. Implements model serving API2. Low latency; < 10ms response time3. “Fuzzy” materialized view of model state
MODEL MANAGER
PREDICTION SERVICE
1. Implements model serving API2. Low latency; < 10ms response time3. “Fuzzy” materialized view of model state
MODEL MANAGER
1. Maintains models via online and batch retraining
PREDICTION SERVICE
1. Implements model serving API2. Low latency; < 10ms response time3. “Fuzzy” materialized view of model state
MODEL MANAGER
1. Maintains models via online and batch retraining2. Stores model catalog, metadata, versioning
PREDICTION SERVICE
1. Implements model serving API2. Low latency; < 10ms response time3. “Fuzzy” materialized view of model state
MODEL MANAGER
1. Maintains models via online and batch retraining2. Stores model catalog, metadata, versioning3. Contains library of standard models + custom API
Talk Outline
• ML model management today• Velox system architecture
• Key idea: Split model family• Prediction serving• Model management• Next directions
Talk Outline
• ML model management today• Velox system architecture• Key idea: Split model family
• Prediction serving• Model management• Next directions
PERSONALIZED MODELING
PERSONALIZED MODELING
PERSONALIZED MODELING
A Separate Model for Each User?
PERSONALIZED MODELING
Computationally Inefficient many complex models
A Separate Model for Each User?
PERSONALIZED MODELING
Statistically Inefficient not enough data per user
Computationally Inefficient many complex models
A Separate Model for Each User?
Input(Song) Rating
Input(Song) Rating
Input(Song) Rating
Split
Rating
Split
Input(Song)
PERSONALIZED MODELING
Input(Song)
PersonalizedUser Model
PERSONALIZED MODELING
Input(Song)
Shared Basis Feature Model PersonalizedUser Model
PERSONALIZED MODELING
Input(Song)
Shared Basis Feature ModelTrained across users
Changes Slowly
PersonalizedUser Model
PERSONALIZED MODELING
Input(Song)
Shared Basis Feature ModelTrained across users
Changes Slowly
Trained for each userChanges Quickly
PersonalizedUser Model
Input(Song)
Shared Basis Feature Model PersonalizedUser Model
SPLIT MODEL FORMULATION
Input(Song)
Shared Basis Feature Model PersonalizedUser Model
Input(Song)
SPLIT MODEL FORMULATION
Input(Song)
Shared Basis Feature Model PersonalizedUser Model
Meow
Input(Song)
SPLIT MODEL FORMULATION
Input(Song)
PersonalizedUser Model
Meow
Input(Song)
Terrible
Shared Basis Feature Model
SPLIT MODEL FORMULATION
MATHEMATICAL FORMULATION
Input(Song)
MATHEMATICAL FORMULATION
Input(Song)
x
Shared BasisFeature Models
MATHEMATICAL FORMULATION
Input(Song)
x
Shared BasisFeature Models
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓)x
Shared BasisFeature Models
Changes slowly
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓)x
Shared BasisFeature Models
PersonalizedUser Model
Changes slowly
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓)x
Shared BasisFeature Models
PersonalizedUser Model
Changes slowly
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓) ·wu
x
Shared BasisFeature Models
PersonalizedUser Model
Changes slowly
Highly dynamic
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓) ·wu
x
Shared BasisFeature Models
PersonalizedUser Model
Changes slowly
Highly dynamic
= Rating
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓) ·wu
x
Shared BasisFeature Models
PersonalizedUser Model
Changes slowly
Highly dynamic
= Rating
MATHEMATICAL FORMULATION
Input(Song)
f(x; ✓) ·wu
x
Terrible
Talk Outline
• ML model management today• Velox system architecture• Key idea: Split model family
• Prediction serving• Model management• Next directions
Talk Outline
• ML model management today• Velox system architecture• Key idea: Split model family• Prediction serving
• Model management• Next directions
Mesos Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Straming Shark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
System Architecture
ModelManager
PredictionService
PREDICTION API
GET /velox/catify/predict?userid=22&song=27632Simple point queries:
PREDICTION API
GET /velox/catify/predict_top_k?userid=22&k=100
GET /velox/catify/predict?userid=22&song=27632Simple point queries:
More complex ordering queries:
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)
Look up user weight
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)
Look up user weight
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)
Primary key lookup
Look up user weight
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)
Primary key lookupPartition query by user : always local
Compute Features
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)user independent
}
Compute Features
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)
Feature computation could be costly
user independent
}
Compute Features
PREDICTIONS
def predict( u: UUID, x: Context )
wu · f(x; ✓)
Feature computation could be costly
user independent
}Cache features forreuse across users
FEATURE CACHING GAINS
Feature caching leads to order-of-magnitude reduction in latency.
Without Caching
Caching Enabled
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs Playlist Keywords
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs Playlist Keywords CandidateSongs
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs Playlist Keywords CandidateSongs
Score andrank allcandidates
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs Playlist Keywords CandidateSongs
By exploiting split model design we can leverage:
Score andrank allcandidates
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs Playlist Keywords CandidateSongs
By exploiting split model design we can leverage:
Score andrank allcandidates
A. Shrivastava, P. Li. “Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).” NIPS’14 Best Paper
TOP-K QUERIESQuery predicate to pre-filter candidate set
All Songs Playlist Keywords CandidateSongs
By exploiting split model design we can leverage:
Score andrank allcandidates
A. Shrivastava, P. Li. “Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS).” NIPS’14 Best Paper
Y. Low and A. X. Zheng. “Fast Top-K Similarity Queries Via Matrix Compression.” CIKM 2012
Talk Outline
• ML model management today• Velox system architecture• Key idea: Split model family• Prediction serving
• Model management• Next directions
Talk Outline
• ML model management today• Velox system architecture• Key idea: Split model family• Prediction serving• Model management
• Next directions
Mesos Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Straming Shark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
System Architecture
ModelManager
PredictionService
Mesos Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Straming Shark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
System Architecture
ModelManager
PredictionService
1. Online and offline model training2. Sample bias problem
FEEDBACK API
POST /velox/catify/observe?userid=22&song=27&score=3.7
Simple direct value feedback:
FEEDBACK API
POST /velox/catify/observe?userid=22&song=27&score=3.7
Simple direct value feedback:
Continuously update user models in Velox
Online Learning
FEEDBACK API
POST /velox/catify/observe?userid=22&song=27&score=3.7
Simple direct value feedback:
Continuously update user models in Velox
Online Learning Offline LearningLogged to DFS for
feature learning in Spark
FEEDBACK API
POST /velox/catify/observe?userid=22&song=27&score=3.7
Simple direct value feedback:
Continuously update user models in Velox
Online Learning Offline LearningLogged to DFS for
feature learning in Spark
EvaluationContinuously assessmodel performance
ONLINE LEARNING
def observe(u: UUID, x: Context, y: Score)
wu · f(x; ✓)
Update wu with new training point
ONLINE LEARNING
def observe(u: UUID, x: Context, y: Score)
wu · f(x; ✓)
Update wu with new training point
ONLINE LEARNING
def observe(u: UUID, x: Context, y: Score)
wu · f(x; ✓)
Stochastic gradient descent
Update wu with new training point
ONLINE LEARNING
def observe(u: UUID, x: Context, y: Score)
wu · f(x; ✓)
Stochastic gradient descentIncremental linear algebra
OFFLINE LEARNINGdef retrain(trainingData: RDD)
Spark BasedTraining Algs.
wu · f(x; ✓)Efficient batch training using Spark
OFFLINE LEARNINGdef retrain(trainingData: RDD)
Spark BasedTraining Algs.
wu · f(x; ✓)
When do we retrain?
Efficient batch training using Spark
OFFLINE LEARNINGdef retrain(trainingData: RDD)
Spark BasedTraining Algs.
wu · f(x; ✓)
Periodically
When do we retrain?
Efficient batch training using Spark
OFFLINE LEARNINGdef retrain(trainingData: RDD)
Spark BasedTraining Algs.
wu · f(x; ✓)
Periodically Trigger by theevaluation system
When do we retrain?
Efficient batch training using Spark
PREDICTION LATENCYWITH ONLINE TRAINING
Velox keeps models updated at low latency
Velox
Spark
Data Model
Data Model
Sample Bias: model affects the training data.
ALWAYS SERVE THE BEST SONG?
Songs
PredictedRating
ALWAYS SERVE THE BEST SONG?
Songs
PredictedRating
VELOX SOLUTION
PredictedRating
Songs
With prob. 1- ϵ serve the best predicted song
VELOX SOLUTION
PredictedRating
Songs
With prob. 1- ϵ serve the best predicted song
VELOX SOLUTION
PredictedRating
Songs
With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song
VELOX SOLUTION
PredictedRating
Songs
With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song
VELOX SOLUTION
PredictedRating
Songs
With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song
Epsilon Greedy
VELOX SOLUTION
PredictedRating
Songs
With prob. 1- ϵ serve the best predicted songWith prob. ϵ pick a random song
Epsilon Greedy
Active Learning Opportunity to explore new systems for
this emerging analytics workload
Talk Outline
• ML model management today• Velox system architecture• Key idea: Split model family• Prediction serving• Model management
• Next directions
Talk Outline
• ML model management today• Velox system architecture• Split model family• Prediction serving• Model management• Next directions
OPEN CHALLENGES FOR DATABASE SYSTEMS
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
• More generic training pipelines
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
• More generic training pipelines• standard set of physical operators
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
• More generic training pipelines• standard set of physical operators
• Automatically choose split for online & offline training
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
• More generic training pipelines• standard set of physical operators
• Automatically choose split for online & offline training• view maintenance and query optimization
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
• More generic training pipelines• standard set of physical operators
• Automatically choose split for online & offline training• view maintenance and query optimization
• Ensure user privacy
OPEN CHALLENGES FOR DATABASE SYSTEMS
• Going beyond the split model family• logical model pipeline language
• More generic training pipelines• standard set of physical operators
• Automatically choose split for online & offline training• view maintenance and query optimization
• Ensure user privacy• Privacy-Preserving DBMS
Data
Data Model
Data
Model
Training
Data
ModelPredictionsServing
Training
Data
ModelPredictionsServing
TrainingFeedb
ack
The future of research in scalable learning systems will be in the integration of the learning lifecycle:
Data
ModelPredictionsServing
TrainingFeedb
ack
Mesos
HDFS, S3, … Tachyon
Hadoop Yarn
Spark Streaming Spark
SQL
Graph X ML
library
BlinkDB MLbase
Spark
VeloxTraining Management + Serving
THE MISSING PIECE
ModelManager
PredictionService
PredictionLatency
Prediction Error
Online Retraining e.g.,
Full pre-materialization e.g., VELOX
key idea:split model into staleness insensitiveand staleness sensitive components
INCREMENTAL
BATCH
SUMMARY
Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems
SUMMARY
Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems
The Velox system automatically maintains multiple models while providing low latency, scalable, and personalized predictions
SUMMARY
Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems
The Velox system automatically maintains multiple models while providing low latency, scalable, and personalized predictions
Velox is coming soon as part of BDAS
SUMMARY
Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems
The Velox system automatically maintains multiple models while providing low latency, scalable, and personalized predictions
Velox is coming soon as part of BDAS
https://amplab.cs.berkeley.edu/projects/velox/
SUMMARY
QUESTIONS?