
Post on 25-Jun-2020


Page 1: Misha Smelyanskiy… · Misha Smelyanskiy Director, AI System Software/Hardware, Facebook. Challenges and Opportunities of Architecting AI Systems at Datacenter Scale Misha Smelyanskiy

Misha Smelyanskiy, Director, AI System Software/Hardware, Facebook

Challenges and Opportunities of Architecting AI Systems at Datacenter Scale

Misha Smelyanskiy, Director, AI Systems Co-Design, Facebook

Due to the success of deep learning, some equate it with ML and even AI

• Artificial Intelligence: programs with the ability to learn and reason like humans

• Machine Learning: a set of statistical techniques that enable machines to improve with experience

• Deep Learning: multi-layer neural networks which adapt and learn from vast amounts of data

The World According to Deep Learning

[Figure: Increase of the term "deep learning" in research. Axes: Increase (1 to 3001) versus Year ('86 to '17). Source: MIT Technology Review]

What Happened Here?

[Figure: Increase of the term "deep learning" in research. Axes: Increase (1 to 3001) versus Year ('86 to '17). Source: MIT Technology Review]

AlexNet Happened!

Deep Learning is Unique

[Figure: Accuracy versus Data & Model Complexity / Hardware Resources, with one curve for Deep Learning and one for Other Methods]

Facebook Example

Source: Advancing state-of-the-art image recognition with deep learning on hashtags

ML Growth and Scale at Facebook

ML data growth

• Usage in 2018: 30%

• Usage today: 50%

• Growth in one year: 3X

1-year Training growth

• Ranking engineers: 2X

• Workflows trained: 3X

• Compute consumed: 3X

Inference Scale per Day

• # of predictions: 200T

• # of translations: 6.5B

• Fake accounts removed: 99%

DATA, FEATURES, TRAINING, EVALUATION, DEPLOYMENT

Infrastructure Challenges

• Strains compute, memory, storage, and network

• Speed of innovation requires high performance and flexibility

DATA, FEATURES, TRAINING, EVALUATION, DEPLOYMENT

STORAGE CHALLENGE

COMPUTE AND MEMORY CHALLENGES

NETWORK CHALLENGE

“No Exponential is Forever”

‘The important thing is that Moore's Law is exponential, and no exponential is forever… But we can delay forever’ – Gordon Moore

• Data & Model Complexity ➔ Hardware Resources

• Moore’s Law has slowed down!

• Solution: Specialization via HW/SW co-design

Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e. 2018

What Are The Workloads?

• Ranking and recommendation: news feed and search

• Computer vision: image classification, object detection, and video understanding

• Language: translation, speech recognition, and content understanding

• Recommendation models are among the most important models

Deep Learning Recommendation Models

• Embedding look-ups result in sparse, irregular accesses

• DL recommendation models help users choose a small set of items out of many

[Diagram: Dense Features feed a Bottom NN; Sparse Features feed EMB Lookups (EMB Lookup, …, EMB Lookup); both meet in Feature Interaction, which feeds the Top NN]
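The data flow named on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not Facebook's production model: the layer sizes, the sum pooling of looked-up rows, and the use of pairwise dot products for feature interaction are all assumptions made for the example, and every function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Apply dense layers with ReLU between them (biases omitted for brevity)."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

def embedding_bag(table, indices_per_sample):
    """Sum-pool the table rows selected by each sample's sparse indices."""
    return np.stack([table[idx].sum(axis=0) for idx in indices_per_sample])

def dlrm_forward(dense, sparse, bottom_w, tables, top_w):
    # Bottom NN maps dense features to the embedding dimension.
    vectors = [mlp(dense, bottom_w)]
    # One lookup per sparse feature: scattered, irregular row reads.
    vectors += [embedding_bag(t, s) for t, s in zip(tables, sparse)]
    # Feature interaction (assumed here): pairwise dot products of all vectors.
    stacked = np.stack(vectors, axis=1)              # (batch, n_vectors, dim)
    dots = stacked @ stacked.transpose(0, 2, 1)      # (batch, n_vectors, n_vectors)
    rows_i, cols_i = np.triu_indices(stacked.shape[1], k=1)
    interactions = dots[:, rows_i, cols_i]           # (batch, n_pairs)
    # Top NN consumes the dense vector plus the interaction terms.
    top_in = np.concatenate([vectors[0], interactions], axis=1)
    return 1.0 / (1.0 + np.exp(-mlp(top_in, top_w)))

# Hypothetical sizes, chosen only to keep the example small.
batch, n_dense, dim, rows, n_tables = 4, 13, 8, 1000, 3
n_pairs = (n_tables + 1) * n_tables // 2
bottom_w = [rng.normal(size=(n_dense, 16)) * 0.1, rng.normal(size=(16, dim)) * 0.1]
top_w = [rng.normal(size=(dim + n_pairs, 16)) * 0.1, rng.normal(size=(16, 1)) * 0.1]
tables = [rng.normal(size=(rows, dim)) * 0.1 for _ in range(n_tables)]

dense = rng.normal(size=(batch, n_dense))
sparse = [[rng.integers(0, rows, size=rng.integers(1, 5)) for _ in range(batch)]
          for _ in range(n_tables)]

scores = dlrm_forward(dense, sparse, bottom_w, tables, top_w)
print(scores.shape)
```

The `table[idx]` gather is the part that produces the sparse, irregular accesses: each sample touches a handful of rows scattered across a large table.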

Compute

It’s not all about Matrix Multiplications (MM)

Only ~40% of time is spent in MM in FB production

➔ Should not over-design hardware for MM and convolutions

Skinny MMs due to depth- and group-wise convolutions, small batch, beam search

➔ Many smaller tensor units are better than a few big ones
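One way to see why skinny MMs favor smaller tensor units is to count how much of a unit's output tile does useful work. The sketch below assumes a simplified model in which the output matrix is padded up to whole tiles; the tile sizes (128 and 16) and the matrix shape are hypothetical and not taken from any particular chip.

```python
import math

def tile_utilization(m, n, tile_m, tile_n):
    """Fraction of useful work when an m x n output is padded to whole
    tile_m x tile_n tiles (simplified model, an assumption of this sketch)."""
    padded_m = math.ceil(m / tile_m) * tile_m
    padded_n = math.ceil(n / tile_n) * tile_n
    return (m * n) / (padded_m * padded_n)

# A skinny MM: 4 rows (small batch) against a 1024-wide layer.
big_unit = tile_utilization(4, 1024, 128, 128)    # one large 128x128 unit
small_units = tile_utilization(4, 1024, 16, 16)   # many small 16x16 units

print(f"128x128 tile: {big_unit:.1%} useful")
print(f"16x16 tiles:  {small_units:.1%} useful")
```

Under this model the large unit wastes most of its rows on padding, while the small units waste far less of the same total area.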

Memory and Storage

Workload Characteristics

[Figure: Arithmetic Intensity versus Size of Model & Neurons, showing Recommendation Systems, Convolutional Neural Nets, and Neural Machine Translation]

Rec systems are huge; low arithmetic intensity

➔ Need high capacity, high bandwidth memory

➔ Unstructured accesses benefit from caches

CV and language models are smaller

➔ Larger on-chip memory helps and gives compiler more flexibility
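Arithmetic intensity is the ratio of floating-point operations to bytes moved. A rough back-of-the-envelope comparison, assuming 4-byte elements, no cache reuse, and hypothetical layer sizes, shows why embedding lookups sit so much lower than dense layers:

```python
def fc_intensity(batch, in_features, out_features, bytes_per_elem=4):
    """FLOPs per byte for a fully connected layer (multiply + add per weight use)."""
    flops = 2 * batch * in_features * out_features
    elems = batch * in_features + in_features * out_features + batch * out_features
    return flops / (bytes_per_elem * elems)

def embedding_pool_intensity(lookups, dim, bytes_per_elem=4):
    """FLOPs per byte for sum-pooling `lookups` embedding rows of width `dim`."""
    flops = lookups * dim                  # about one add per element read
    elems = lookups * dim + dim            # rows read plus one pooled row written
    return flops / (bytes_per_elem * elems)

# Hypothetical sizes for illustration only.
fc = fc_intensity(batch=64, in_features=1024, out_features=1024)
emb = embedding_pool_intensity(lookups=80, dim=64)

print(f"fully connected layer: {fc:.1f} FLOPs/byte")
print(f"embedding pooling:     {emb:.2f} FLOPs/byte")
```

Under these assumptions the pooled lookup stays below 0.25 FLOPs per byte regardless of table size, so its speed is set by memory capacity and bandwidth rather than by compute.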

Network

Interconnect Matters!

Model parallelism

• Different communication patterns

• Needs high bisection bandwidth

Graph learning

• New emerging application

• Needs low latency, low diameter

[Diagram: Bottom MLP, EMB Lookups (EMB Lookup, …, EMB Lookup), Feature Interaction, and Top MLP. Caption: Model-parallelism partitions embedding tables]
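Partitioning embedding tables across devices can be simulated on a single machine to see where the network traffic comes from. The sketch below is an assumption-laden illustration: round-robin table placement, one lookup per table per sample, and an all-to-all exchange that hands each device the embeddings for its own slice of the batch. None of these choices is stated on the slide, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_devices, n_tables, rows, dim, batch = 4, 8, 100, 16, 32

tables = [rng.normal(size=(rows, dim)) for _ in range(n_tables)]
indices = rng.integers(0, rows, size=(n_tables, batch))  # one row id per table per sample

# Model parallelism: each table lives on exactly one device (round-robin, assumed).
owner = [t % n_devices for t in range(n_tables)]

# Step 1: every device looks up its own tables for the whole batch.
local = {d: {t: tables[t][indices[t]] for t in range(n_tables) if owner[t] == d}
         for d in range(n_devices)}

# Step 2: all-to-all exchange. Device d sends device e the slice of each
# result that belongs to e's batch shard.
shard = batch // n_devices
received = {e: {} for e in range(n_devices)}
bytes_sent = 0
for d in range(n_devices):
    for t, emb in local[d].items():
        for e in range(n_devices):
            piece = emb[e * shard:(e + 1) * shard]
            received[e][t] = piece
            if e != d:
                bytes_sent += piece.nbytes  # only off-device traffic counts

# Each device now holds every table's embeddings for its own batch shard.
for e in range(n_devices):
    got = np.stack([received[e][t] for t in range(n_tables)], axis=1)
    ref = np.stack([tables[t][indices[t, e * shard:(e + 1) * shard]]
                    for t in range(n_tables)], axis=1)
    assert np.array_equal(got, ref)

print(f"bytes crossing the interconnect: {bytes_sent}")
```

In this setup three quarters of every lookup result leaves the device that produced it, and every device talks to every other device, which is the kind of pattern that stresses bisection bandwidth.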

Programmability

It is Not All About Peak Flops

Those of us who build ML HW need to think about SW at scale

[Figure: Performance versus Programmer’s Time, with curves System1 Perf. and System2 Perf. and peak levels System1 Peak and System2 Peak]

What Makes Programmability Easy (Hard)

Programmability Features    Easier to Program    Harder to Program

Concurrency & control       Few cores            Many cores

Computation                 Scalar, SIMD         Tensor units

Data Reuse                  Caches               SW-controlled SRAM

Communication               Cache coherence      Explicit

Latency Hiding              HW prefetcher        SW prefetch

• Specialization improves energy efficiency but limits programmability

• How do we get the best of both worlds?

Facebook Approach

Yosemite V2 Inference Platform

[Diagram labels: CPU Server; NIC to ToR Switch; PCIe Switch; Storage; M.2 modules (M.2, M.2, M.2, …) with Inf ASIC and LPDDR; vendors Intel, Habana, QUALCOMM, Esperanto, …]

• Scale-up compute, mem/SRAM capacity & BW: tightly couple via PCIe switch

• Common M.2 module, common compute and memory requirements for vendors

• Community-driven approach to programmability via GLOW compiler

Zion Training Platform

[Diagram labels: CPU 0, CPU 1, …, CPU 7, each shown with an ASIC; CPU Fabric; Accelerator Fabric; NIC, NIC, NIC]

System

• Unified 100s TFLOPs of BFLOAT16

• High capacity DDR, high bandwidth HBM

• High bandwidth disaggregated fabric
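BFLOAT16 keeps the sign bit and all 8 exponent bits of float32 but only the top 7 mantissa bits, so a float32 value can be converted by rounding away its lower 16 bits. The NumPy sketch below illustrates the number format only; it says nothing about how Zion hardware implements it, the function names are hypothetical, and NaN inputs are not handled.

```python
import numpy as np

def float32_to_bfloat16_bits(x):
    """Round float32 values to bfloat16 (round-to-nearest-even) and return
    the raw 16-bit patterns. NaN is not handled specially in this sketch."""
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    lsb = (bits >> np.uint32(16)) & np.uint32(1)    # last bit that survives
    rounded = bits + np.uint32(0x7FFF) + lsb        # rounding bias, ties to even
    return (rounded >> np.uint32(16)).astype(np.uint16)

def bfloat16_bits_to_float32(b):
    """Widen bfloat16 bit patterns back to float32 by zero-filling the low half."""
    return (b.astype(np.uint32) << np.uint32(16)).view(np.float32)

values = np.array([1.0, 3.14159, -0.1, 65504.0], dtype=np.float32)
roundtrip = bfloat16_bits_to_float32(float32_to_bfloat16_bits(values))
for before, after in zip(values, roundtrip):
    print(f"{before!r} -> {after!r}")
```

Because the exponent field is unchanged, the dynamic range matches float32; only precision drops, to roughly two to three significant decimal digits.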

Where does flexibility come from?

• OCP Accelerator Module (OAM)

• Incremental SW enablement

See Zion talk on Friday @ 9:30am by Whitney Zhao and Dheevatsa Mudigere

Call to Action

• Moore’s Law slow-down requires specialization and co-design

• Need to tackle problems holistically: memory, compute, network, storage

• The Only Constant Is Change: exciting new developments in sparsity, graph learning, unsupervised learning, architecture search, backprop-free training, …

• Performance is the start of the conversation; programmability will keep it alive!

• Our journey is only 1% finished

• Let us work together!
