Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective
FACEBOOK, INC
PRESENTED BY RAVI RAHMAN



  • Overview of Presentation

    1. Applications of Machine Learning at Facebook

    2. Hardware Architectures for Machine Learning

    3. FBLearner Flow and Predictor

  • Scale of Facebook

    2,000,000,000+ Users

    12 Datacenters

    11+ Machine Learning Services

    Trillions of ML inference queries per day

  • Machine Learning Applications at Facebook

    (Figure: Facebook's ML applications arranged by model complexity)

  • Machine Learning Process

  • Machine Learning Process at Facebook

  • Machine Learning Process at Facebook

    Service Layer

    Hardware Layer

    Application Layer

  • CPU vs GPU Architecture

    CPU

    General purpose

    Ideal for sequential or independent operations

    Low throughput

    Low latency

    High power consumption per flop

    GPU

    Specialized for machine learning

    Ideal for parallelized operations on batch inputs

    High throughput

    High latency

    Low power consumption per flop
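To make the throughput contrast concrete, here is a small timing sketch (my addition, not from the deck; PyTorch is assumed since the deck uses it later). The matrix size and repetition count are arbitrary.

```python
import time

import torch

def time_matmul(device: str, n: int = 4096, reps: int = 10) -> float:
    """Average seconds per n-by-n matrix multiply on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)  # warm-up so one-time setup cost is excluded
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernels launch asynchronously
    return (time.perf_counter() - start) / reps

print(f"CPU: {time_matmul('cpu'):.4f} s per matmul")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.4f} s per matmul")
```

For a large, batched, highly parallel operation like this, the GPU figure is typically one to two orders of magnitude lower, which is exactly the trade the slide describes.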

  • Facebook Hardware Options

    |           | 2U Single Socket            | 2U Dual Socket               | 3U: Big Sur                     | 3U: Big Basin                                          |
    | CPUs      | 1x Broadwell/Skylake        | 2x Broadwell/Skylake         | 1                               | 1                                                      |
    | GPUs      | --                          | --                           | 8x M40                          | 8x P100 / V100                                         |
    | Memory    | 32 GB                       | >> 32 GB                     | 96 GB                           | 128 GB                                                 |
    | Teraflops | 0.8                         | 1.6                          | 56                              | 125.6                                                  |
    | Use Cases | Web Tier                    | Compute and Memory Intensive | Deep Neural Network ML Training | Deep Neural Network ML Training                        |
    | Products  | Facebook.com, Facer, Search | News Feed, Sigma             | Deprecated                      | Facer, Lumos, Language Translation, Speech Recognition |

  • Training Pipelines

  • Deep Neural Network Parallelism

    Data Parallelism
    ◦ Train the full model on multiple machines, and combine the results
    ◦ Best suited for small models that fit on a single GPU and that support large batches
    ◦ Achieved 4x throughput using 5x the machine count

    Model Parallelism
    ◦ Train a portion of the model on each machine and synchronize intermediate results across machines
    ◦ Best suited for large models that require small batch sizes

  • Data Parallelism

    https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf

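Below is a minimal sketch of the data-parallel pattern using PyTorch's DistributedDataParallel (an assumed modern stand-in; the linked Dean et al. paper predates PyTorch). Every worker holds a full model replica, and gradients are averaged across workers each step. The toy model, batch shapes, and hyperparameters are invented for illustration.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launch with: torchrun --nproc_per_node=<num_gpus> train.py
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 10).to(rank)  # full replica per worker
    ddp_model = DDP(model, device_ids=[rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(100):
        # Each worker draws a different shard of the global batch.
        x = torch.randn(64, 1024, device=rank)
        y = torch.randint(0, 10, (64,), device=rank)
        optimizer.zero_grad()
        loss_fn(ddp_model(x), y).backward()  # gradients all-reduced here
        optimizer.step()                     # replicas stay in sync

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```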

  • Model Parallelism

    https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf

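And a matching sketch of model parallelism, assuming a machine with two visible GPUs: each half of a toy model lives on a different device, and the intermediate activations are what must cross device (or, at datacenter scale, machine) boundaries every step.

```python
import torch
import torch.nn as nn

class TwoDeviceModel(nn.Module):
    """Toy model parallelism: half the layers per GPU; the intermediate
    activations are synchronized across devices on every forward pass."""

    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        h = self.part1(x.to("cuda:0"))
        return self.part2(h.to("cuda:1"))  # activation transfer between GPUs

model = TwoDeviceModel()
out = model(torch.randn(64, 1024))
print(out.shape)  # torch.Size([64, 10]), resident on cuda:1
```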

  • Deep Neural Network Parallelism

    Questions:
    ◦ What is the overhead of parallelism?

    ◦ Which type of parallelism is better?

    ◦ What are some alternative solutions to parallelism?

  • Service Layer

    PyTorch
    ◦ Research ML workloads
    ◦ Python DSL
    ◦ Generic support for CPU and GPU backends
    ◦ Open source

    ONNX (Open Neural Network Exchange)
    ◦ Framework-independent representation of machine learning models and networks
    ◦ Vendor implementations optimize for performance (export sketch after this slide)

    Caffe2
    ◦ Productionized ML workloads
    ◦ Specialized, hardware-specific backends
    ◦ Optimized for performance
    ◦ Deprecated – now part of PyTorch

    FBLearner
    ◦ Feature Store: collection of features
    ◦ Flow: pipeline management system for training workflows, including job scheduling
    ◦ Predictor: low-latency inference engine
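As a concrete version of the ONNX item above, here is a hedged sketch of exporting a PyTorch model to the framework-independent format; the placeholder model and file name are my inventions.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a trained production model.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
model.eval()

example_input = torch.randn(1, 32)  # export traces the model on a sample input
torch.onnx.export(
    model,
    example_input,
    "model.onnx",  # framework-independent artifact
    input_names=["features"],
    output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}},  # allow variable batch size
)
# A vendor runtime (e.g. ONNX Runtime) can now load model.onnx and
# apply its own hardware-specific optimizations.
```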

  • PyTorch Examplehttps://colab.research.google.com/drive/1yhOm3g5EiNSElrZcZFLPawP6rbkazlkY?usp=sharing


  • FBLearner Flow

    https://engineering.fb.com/2016/05/09/core-data/introducing-fblearner-flow-facebook-s-ai-backbone/

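FBLearner Flow itself is internal to Facebook, so the sketch below is hypothetical: the decorator and operator names are invented, and only illustrate the idea from the linked post of workflows as Python functions whose operator outputs are lazily evaluated futures that a scheduler can materialize on any available machine.

```python
# Hypothetical Flow-style workflow (all names invented for illustration;
# the real FBLearner Flow API is internal to Facebook).
from dataclasses import dataclass

@dataclass
class Future:
    """Stand-in for a lazy operator output."""
    value: object

def operator(fn):
    """Pretend operator decorator: in a real system this would run remotely."""
    def wrapper(*args):
        resolved = [a.value if isinstance(a, Future) else a for a in args]
        return Future(fn(*resolved))
    return wrapper

@operator
def load_training_data(table: str):
    return [[0.1, 0.2], [0.3, 0.4]]  # placeholder feature rows

@operator
def train_model(rows):
    return {"weights": [sum(r) for r in rows]}  # placeholder "model"

def training_workflow(table: str) -> Future:
    # Operators chain into a DAG; each edge is a future the scheduler
    # can resolve on whichever machine runs the downstream operator.
    data = load_training_data(table)
    return train_model(data)

print(training_workflow("hive.features_v1").value)
```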

  • Data Workload vs Training Workload

    Data Workers
    ◦ Retrieve, pre-process, and condense data
    ◦ Must adapt to new data and features

    Training Workers
    ◦ Train the underlying model with data from the data workers
    ◦ Fully utilize machine resources
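A single-machine analogue of this split is PyTorch's DataLoader, sketched below: worker processes retrieve and pre-process while the training process only consumes ready batches. The dataset, normalization, and model are placeholders.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class FeatureDataset(Dataset):
    """Placeholder dataset; production data workers would read from storage."""

    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        x = torch.randn(1024)          # "retrieve"
        x = (x - x.mean()) / x.std()   # "pre-process", done in the worker
        return x, idx % 10

if __name__ == "__main__":
    # num_workers > 0 moves loading into separate processes, mirroring
    # the data-worker / training-worker split at datacenter scale.
    loader = DataLoader(FeatureDataset(), batch_size=64, num_workers=4)

    model = torch.nn.Linear(1024, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for x, y in loader:  # the training worker just consumes batches
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
```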

  • “Free” Compute

  • Complexity of Datacenter-Scale ML Training

    Hyperparameter optimization (see the random-search sketch after this slide)

    Optimal utilization of heterogeneous compute resources

    Proper placement of machine learning jobs relative to data

    Network overhead

    Code and infrastructure re-use for a wide variety of applications and machine learning architectures

    Regulations (e.g. GDPR)

    Question: How does the Facebook datacenter differ from that of Google, Amazon, or Microsoft?
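To ground the hyperparameter-optimization item, here is a minimal random-search sketch. Random search is one common approach, chosen by me for illustration (the slides do not specify Facebook's method), and train_and_evaluate is a placeholder for a full training run.

```python
import random

def train_and_evaluate(lr: float, batch_size: int) -> float:
    """Placeholder validation score; a real run would train a model."""
    return 1.0 - abs(lr - 0.01) - abs(batch_size - 256) / 1024

best_score, best_trial = float("-inf"), None
for _ in range(20):
    trial = {
        "lr": 10 ** random.uniform(-4, -1),  # log-uniform learning rate
        "batch_size": random.choice([64, 128, 256, 512]),
    }
    score = train_and_evaluate(**trial)
    if score > best_score:
        best_score, best_trial = score, trial
print(f"best {best_score:.3f} with {best_trial}")
```

At datacenter scale the trials themselves are full training jobs, which is why they compete for the heterogeneous resources listed above.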

  • FBLearner Predictor

    Serves online predictions for models trained using FBLearner Flow

    Online inference runs on CPU workloads

    Stringent SLA requirements for inference applications (sketch after this slide)
    ◦ “Nice to have” deadline: approximate results, which can be overridden later, may be returned to the user early
    ◦ Firm deadline: proper content must be returned to the user
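A hedged sketch of the "nice to have" deadline pattern described above: attempt the full model within a latency budget, and fall back to a cheap approximation if it misses. The functions and budget are placeholders of my own, not Predictor's API.

```python
import concurrent.futures
import time

# Shared pool so a timed-out request doesn't block on pool shutdown.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def full_model_prediction(request):
    time.sleep(0.05)  # placeholder for a slow, accurate ranking model
    return {"score": 0.92, "approximate": False}

def cheap_approximation(request):
    return {"score": 0.75, "approximate": True}  # e.g. a popularity prior

def predict_with_deadline(request, budget_s=0.03):
    """Return the full result if it beats the deadline, otherwise an
    approximate result that can be overridden later."""
    future = _pool.submit(full_model_prediction, request)
    try:
        return future.result(timeout=budget_s)
    except concurrent.futures.TimeoutError:
        return cheap_approximation(request)

print(predict_with_deadline({"user": 123}))  # misses budget -> approximate
```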

  • Fitting Models for the Inference Hardware

    Latency vs throughput
    ◦ SLA requirements

    Quantization of weights without sacrificing model accuracy (sketch after this slide)
    ◦ Fewer bits

    Pruning the model so it fits within the LLC
    ◦ Reduced tail latency
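A hedged sketch of the quantization point, using PyTorch's dynamic quantization to store Linear weights as 8-bit integers. The model is a placeholder, and the accuracy impact would need to be validated per model, as the slide's "without sacrificing accuracy" caveat implies.

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Placeholder float32 model standing in for a trained ranking model.
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()

# Dynamic quantization: weights stored as int8, activations quantized
# on the fly. Fewer bits per weight means a smaller footprint for the
# CPU inference tier.
quantized = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 1024)
print(model(x).shape, quantized(x).shape)  # same interface, smaller weights
```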

  • New Hardware for Machine Learning

    TPU

    Cerebras

  • Conclusion

    Facebook’s machine learning requirements span a diverse set of applications, which must be compatible with a heterogeneous set of hardware

    Spare capacity from diurnal compute cycles is available for machine learning training workloads

    Distributed training must account for network performance

    Optimal performance requires tight coupling of training and inference hardware with model design

  • Questions

    What data would you like to see?

    Is their approach generalizable to machine learning use cases outside of Facebook?

    How does Facebook’s approach compare to that of Microsoft, Amazon, and Google – all of whom offer a public cloud?
