TRANSCRIPT
-
Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
FACEBOOK, INC.
PRESENTED BY RAVI RAHMAN
-
Overview of Presentation
1. Applications of Machine Learning at Facebook
2. Hardware Architectures for Machine Learning
3. FBLearner Flow and Predictor
-
Scale of Facebook
2,000,000,000+ Users
12 Datacenters
11+ Machine Learning Services
Trillions of ML inference queries per day
-
Machine Learning Applications at Facebook
[Chart: Facebook's ML applications plotted by model complexity]
-
Machine Learning Process
-
Machine Learning Process at Facebook
-
Machine Learning Process at Facebook
Service Layer
Hardware Layer
Application Layer
-
CPU vs GPU Architecture
CPU
◦ General purpose
◦ Ideal for sequential or independent operations
◦ Low throughput
◦ Low latency
◦ High power consumption per flop
GPU
◦ Specialized for highly parallel workloads such as machine learning
◦ Ideal for parallelized operations on batch inputs
◦ High throughput
◦ High latency
◦ Low power consumption per flop
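To make the throughput contrast concrete, here is a minimal PyTorch sketch (PyTorch is the framework discussed later in this deck) that times the same matrix multiply on CPU and, when available, GPU. Absolute numbers will vary with hardware; this is illustrative, not a benchmark:

```python
import time
import torch

def time_matmul(device: str, n: int = 2048, iters: int = 10) -> float:
    """Average seconds per n x n matrix multiply on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)  # warm-up
    if device == "cuda":
        torch.cuda.synchronize()  # GPU work is async; wait before timing
    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"CPU: {time_matmul('cpu'):.4f} s/iter")
if torch.cuda.is_available():
    # High throughput on large batched operations is the GPU's strength.
    print(f"GPU: {time_matmul('cuda'):.4f} s/iter")
```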
-
Facebook Hardware Options
|           | 2U Single Socket         | 2U Dual Socket                  | 3U: Big Sur                   | 3U: Big Basin                                        |
| CPUs      | 1x Broadwell/Skylake     | 2x Broadwell/Skylake            | 1x                            | 1x                                                   |
| GPUs      | --                       | --                              | 8x M40                        | 8x P100 / V100                                       |
| Memory    | 32 GB                    | >> 32 GB                        | 96 GB                         | 128 GB                                               |
| Teraflops | 0.8                      | 1.6                             | 56                            | 125.6                                                |
| Use Cases | Web Tier                 | Compute and Memory Intensive    | Deep Neural Network ML Training | Deep Neural Network ML Training                    |
| Products  | Facebook.com             | Facer, Search, News Feed, Sigma | Deprecated                    | Facer, Lumos, Language Translation, Speech Recognition |
-
Training Pipelines
-
Deep Neural Network Parallelism
Data Parallelism
◦ Train the full model on multiple machines, and combine the results
◦ Best suited for small models that fit on a single GPU and support large batch sizes
◦ Achieved 4x throughput using 5x the machine count
Model Parallelism
◦ Train a portion of the model on each machine and synchronize intermediate results across machines
◦ Best suited for large models that require small batch sizes
-
Data Parallelism
https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
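The linked Dean et al. paper describes data parallelism with parameter servers. As a minimal single-process sketch of the same mechanics, with in-process replicas standing in for separate machines and a gradient average standing in for the all-reduce/parameter-server step (a real deployment would use torch.nn.parallel.DistributedDataParallel):

```python
import copy
import torch
import torch.nn as nn

# Toy model and a batch to be sharded across "workers".
model = nn.Linear(10, 1)
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)
num_workers = 2
shards = zip(inputs.chunk(num_workers), targets.chunk(num_workers))

# Each worker runs the FULL model on its shard of the batch...
replicas = [copy.deepcopy(model) for _ in range(num_workers)]
for replica, (x, y) in zip(replicas, shards):
    loss = nn.functional.mse_loss(replica(x), y)
    loss.backward()

# ...then gradients are averaged (the combine/all-reduce step) and
# applied to the shared parameters.
for p, *reps in zip(model.parameters(), *(r.parameters() for r in replicas)):
    p.grad = torch.stack([rp.grad for rp in reps]).mean(dim=0)
torch.optim.SGD(model.parameters(), lr=0.1).step()
```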
-
Model Parallelism
https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
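A minimal PyTorch sketch of model parallelism's shape: each stage of the network lives on its own device, and the intermediate activation crosses the boundary. In a real multi-node setup that activation transfer is the synchronization traffic between machines. Device names fall back to CPU so the sketch runs anywhere:

```python
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    """Each stage lives on its own device; activations cross the boundary."""
    def __init__(self, dev0: str, dev1: str):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Linear(512, 256).to(dev0)
        self.stage1 = nn.Linear(256, 10).to(dev1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.stage0(x.to(self.dev0)))
        # This device-to-device copy is the "synchronize intermediate
        # results" step from the slide, in miniature.
        return self.stage1(x.to(self.dev1))

devices = ("cuda:0", "cuda:1") if torch.cuda.device_count() >= 2 else ("cpu", "cpu")
model = TwoStageModel(*devices)
out = model(torch.randn(4, 512))
```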
-
Deep Neural Network Parallelism
Questions:
◦ What is the overhead with parallelism?
◦ Which type of parallelism is better?
◦ What are some alternative solutions to parallelism?
-
Service Layer
PyTorch
◦ Research ML workloads
◦ Python DSL
◦ Generic support for CPU and GPU backends
◦ Open source
ONNX (Open Neural Network Exchange)
◦ Framework-independent representation of machine learning models and networks
◦ Vendor implementations optimize for performance
Caffe2
◦ Productionized ML workloads
◦ Specialized, hardware-specific backends
◦ Optimized for performance
◦ Deprecated – now part of PyTorch
FBLearner
◦ Feature Store: collection of features
◦ Flow: pipeline management system for training workflows, including job scheduling
◦ Predictor: low-latency inference engine
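FBLearner Flow is internal to Facebook, but the engineering blog post linked two slides below describes workflows as decorated Python functions whose steps become schedulable, tracked operators. A purely illustrative sketch of that shape; every name here is hypothetical, not Facebook's actual API:

```python
# Hypothetical names illustrating the workflow shape described in the
# FBLearner Flow blog post -- NOT the real internal API.
from dataclasses import dataclass, field

@dataclass
class ModelArtifact:
    path: str
    metrics: dict = field(default_factory=dict)

def workflow(fn):
    """Stand-in for a pipeline-registration decorator; in the real system
    each step would run as a schedulable operator with tracked I/O."""
    fn.is_workflow = True
    return fn

# Trivial stand-ins for operators backed by the Feature Store and trainers.
def fetch_features(table: str) -> list:
    return [(0.1, 0), (0.9, 1)]

def train(rows: list) -> dict:
    return {"threshold": sum(x for x, _ in rows) / len(rows)}

def evaluate(model: dict, rows: list) -> dict:
    correct = sum((x > model["threshold"]) == bool(y) for x, y in rows)
    return {"accuracy": correct / len(rows)}

@workflow
def train_ranking_model(features_table: str) -> ModelArtifact:
    rows = fetch_features(features_table)   # data-worker step
    model = train(rows)                     # training-worker step
    return ModelArtifact(path="/models/ranking/v1",
                         metrics=evaluate(model, rows))

artifact = train_ranking_model("user_engagement_features")
```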
-
PyTorch Example
https://colab.research.google.com/drive/1yhOm3g5EiNSElrZcZFLPawP6rbkazlkY?usp=sharing
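The notebook's contents are not reproduced in this transcript; for reference, here is a minimal PyTorch training loop of the kind such an introductory example typically walks through:

```python
import torch
import torch.nn as nn

# Toy regression task: learn y = 3x + 1 from noisy samples.
x = torch.linspace(-1, 1, 128).unsqueeze(1)
y = 3 * x + 1 + 0.1 * torch.randn_like(x)

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(200):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()      # autograd computes gradients
    optimizer.step()     # gradient descent update

print(model.weight.item(), model.bias.item())  # approaches 3 and 1
```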
-
FBLearner Flow
https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/
-
Data Workload vs Training Workload
Data Workers
◦ Retrieve, pre-process, and condense data
◦ Must adapt to new data and features
Training Workers
◦ Train the underlying model with data from the data workers
◦ Fully utilize machine resources
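PyTorch's DataLoader shows this division of labor in miniature: worker processes handle retrieval and pre-processing while the main process, playing the training worker, consumes ready batches. A minimal sketch with a synthetic dataset:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class SyntheticDataset(Dataset):
    """Stands in for the data workers' retrieve/pre-process/condense stage."""
    def __len__(self) -> int:
        return 1024

    def __getitem__(self, idx: int):
        x = torch.randn(16)                  # "retrieved" raw features
        return x, (x.sum() > 0).float()      # "pre-processed" label

if __name__ == "__main__":  # guard required when workers are subprocesses
    # num_workers > 0 moves loading into separate processes, so the
    # training loop can keep the machine's compute fully utilized.
    loader = DataLoader(SyntheticDataset(), batch_size=64, num_workers=2)
    for features, labels in loader:
        pass  # training step would go here
```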
-
“Free” Compute
-
Complexity of Datacenter-Scale ML Training
Hyperparameter optimization (see the sketch after this slide)
Optimal utilization of heterogeneous compute resources
Proper placement of machine learning jobs relative to data
Network overhead
Code and infrastructure re-use for a wide variety of applications and machine learning architectures
Regulations (e.g. GDPR)
Question: How does the Facebook datacenter differ from that of Google, Amazon, or Microsoft?
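As referenced above, hyperparameter optimization is one reason training parallelizes well across spare datacenter capacity: each sampled configuration is an independent job. A minimal random-search sketch; the search space and scoring stub are illustrative:

```python
import random

# Illustrative search space; in the datacenter each sampled configuration
# would be submitted as an independent training job on spare capacity.
SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [64, 256, 1024],
    "hidden_units": [128, 512],
}

def train_and_score(config: dict) -> float:
    """Stand-in for launching a training job and reading back its metric."""
    return random.Random(str(sorted(config.items()))).random()

trials = [{k: random.choice(v) for k, v in SPACE.items()} for _ in range(8)]
best = max(trials, key=train_and_score)
print("best config:", best)
```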
-
FBLearner Predictor
Serves online predictions for models trained using FBLearner Flow
Online inference runs on CPUs
Stringent SLA requirements for inference applications
◦ "Nice to have" deadline: approximate results, which can be overridden later, can be returned to the user early
◦ Firm deadline: proper content must be returned to the user
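One way to read the two SLA classes in code: run inference against a deadline, return a cheap approximate result when a "nice to have" deadline expires, and block for the real result when the deadline is firm. A sketch using a thread pool; the model call and fallback are placeholders:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

pool = ThreadPoolExecutor(max_workers=4)

def run_model(request: str) -> str:
    time.sleep(0.2)  # placeholder for real inference latency
    return f"exact-prediction({request})"

def approximate(request: str) -> str:
    return f"cached-default({request})"  # cheap result, overridden later

def predict(request: str, deadline_s: float, firm: bool) -> str:
    future = pool.submit(run_model, request)
    try:
        return future.result(timeout=deadline_s)
    except TimeoutError:
        if firm:
            return future.result()   # firm: wait for proper content
        return approximate(request)  # "nice to have": return early

print(predict("req-1", deadline_s=0.05, firm=False))  # approximate, early
print(predict("req-2", deadline_s=0.05, firm=True))   # blocks for exact result
```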
-
Fitting Models for the Inference Hardware
Latency vs throughput
◦ SLA requirements
Quantization of weights without sacrificing model accuracy
◦ Fewer bits per weight
Pruning the model so it fits within the LLC (last-level cache)
◦ Reduced tail latency
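PyTorch's dynamic quantization is a one-call illustration of the "fewer bits" point: linear-layer weights are stored as int8 and dequantized on the fly, shrinking the model so more of it fits in the last-level cache. This uses PyTorch's public API, not Facebook's internal tooling:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
)

# Quantize Linear weights to int8; activations stay float and are
# quantized dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Sanity check: the quantized model's outputs should stay close to the
# float model's, i.e. accuracy is largely preserved.
x = torch.randn(1, 1024)
print(torch.max(torch.abs(model(x) - quantized(x))).item())
```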
-
New Hardware for Machine Learning
TPU, Cerebras
-
Conclusion
Facebook’s machine learning requirements span a diverse set of applications, which must be compatible with a heterogeneous set of hardware
Spare capacity from diurnal compute cycles is available for machine learning training workloads
Distributed training must account for network performance
Optimal performance requires tight coupling of training and inference hardware with model design
-
Questions
What data would you like to see?
Is their approach generalizable to machine learning use cases outside of Facebook?
How does Facebook’s approach compare to that of Microsoft, Amazon, and Google – all of whom offer a public cloud?