© Copyright 2018 Xilinx
Hot Chips, August 21, 2018
Victor Peng, CEO, Xilinx
Adaptable Intelligence The Next Computing Era
Pervasive Intelligence from Cloud to Edge to Endpoints
Exponential Growth and Opportunities
Source: Patrick Cheesman
Data Explosion
Source: Cisco Global Cloud Index, 2016-2021
Hyperscale Data Centers
Source: Bloomberg; company reports; The Economist estimates
Hyperscale Market Value
Challenges: The End of Moore’s Law and Scaling
Source: John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, 6/e 2018
2µm VAX
0.75µm VAX
0.35µm Alpha 21264
180nm MIPS IP Core
55nm Radeon HD 4870
16nm Zynq RFSoC
Challenge: Exponential Power Density Growth
Source: John Hennessy and David Patterson, “A New Golden Age for Computer Architecture: Domain-Specific Hardware/Software Co-Design, Enhanced Security, Open Instruction Sets, and Agile Chip Development”
Power consumption based on models in “Dark Silicon and the End of Multicore Scaling,” Hadi Esmaeilzadeh, ISCA, 2011
The Third Wave: Domain Specific Architectures on Adaptable HW
Cache
CPU: CISC, RISC, Multi-Core
Fixed HW Accelerators: GPU, ASSP, ASIC
DSA’s on Adaptable Platforms: FPGA, Zynq, ACAP
Massive Scale Out Requires DSA’s and Adaptable Platforms
Mainframe Era: Units (M)
PC Era: Units (100M’s)
Mobile Era: Units (B’s)
Pervasive Intelligence Era: Units (50B’s)
Integration of AI
The Innovation to Deployment Acceleration Imperative
Source: Scopus
AI Papers Published
Over-the-air Updates
Application Acceleration with DSA’s on FPGA Platforms
FPGA’s Accelerate Entire Application
[Figure: Execution Time of the Front End, ML, and Back End stages for CPU only, CPU + ML ASIC, and CPU + FPGA]
Hyperscale Data Centers with FPGA Accelerators
Virtualized & Scaled Out Adaptable Acceleration
Dynamic Optimization for Changing Workloads & Mix
[Diagram: CPU Plane and FPGA Plane of Server CPUs and FPGA Accelerators running App 1, App 2, ... App N, each with Front End, ML, and Back End stages; workloads shown: Security Workloads, Speech (ML) Workloads, Search (Database + ML), Database Workloads]
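The execution-time comparison on this slide is essentially an Amdahl's-law argument: an accelerator that speeds up only the ML stage leaves the front end and back end on the CPU, while an adaptable device can speed up all three stages. A minimal sketch of that reasoning follows. The stage times and speedup factors are purely hypothetical, since the slide shows bars rather than values, and `total_time` is an illustrative helper, not part of any Xilinx tool.

```python
# Hypothetical per-stage times in milliseconds (not taken from the slide).
STAGES_MS = {"front_end": 40.0, "ml": 40.0, "back_end": 20.0}


def total_time(stage_ms, speedup):
    """Sum stage times after dividing each by its speedup factor (1x if absent)."""
    return sum(t / speedup.get(name, 1.0) for name, t in stage_ms.items())


cpu_only = total_time(STAGES_MS, {})
# Assumption: a fixed-function ML ASIC accelerates only the ML stage.
cpu_ml_asic = total_time(STAGES_MS, {"ml": 10.0})
# Assumption: an FPGA accelerates every stage, each by a smaller factor.
cpu_fpga = total_time(STAGES_MS, {"front_end": 5.0, "ml": 5.0, "back_end": 5.0})

for label, t in [
    ("CPU only", cpu_only),
    ("CPU + ML ASIC", cpu_ml_asic),
    ("CPU + FPGA", cpu_fpga),
]:
    print(f"{label:14s} {t:6.1f} ms  ({cpu_only / t:.2f}x)")
```

With these made-up inputs the ML-only accelerator tops out at roughly 1.6x, because 60 ms of unaccelerated work remains, while the whole-application case reaches 5x even though its per-stage factor is lower.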
Development for DC Compute, Storage, Network Apps
Stack for Application Acceleration including ML
User Applications: C++ with Framework API
Application-Specific Framework SW: Frameworks Provide API to User Applications
Xilinx & Partner Provided Accelerators (Application-Specific Acceleration): Genomics, Video, Search, Financial, Database, SQL, OpenCV, SmartNIC
Xilinx Provided ML Engines (ML Acceleration): xDNN, xDNN
DSA Shell: Standard Platform Interface (DDR, PCIe, Network)
Silicon: FPGA or SoC or ACAP
SW / HW
Cloud: Latency-sensitive High Resolution Imaging
2.7X Faster and Lower Power
CPU/GPU
OpenCV: Pre-processing a Bottleneck
CNN: Object Detection/Feature Extraction
Data Sharing Between CPU and GPU
Xilinx FPGA
OpenCV: 5x Faster
CNN: Up to 3x Faster Based on Image Size
Model Parallelism for High Res Processing
Lower Power
CPU/GPU Results: OpenCV 100ms, CNN 2.3s
FPGA Results: OpenCV 20ms, CNN 0.85s
[Diagram: CPU and GPU connected over PCIe vs. CPU and FPGA connected over PCIe; blocks labeled Motion Analysis, Image Scaling, Tiling, Custom Operation, CNN]
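The headline speedup can be checked against the stage latencies shown on this slide. The sketch below takes 100 ms and 2.3 s as the CPU/GPU OpenCV and CNN latencies, and 20 ms and 0.85 s as the FPGA ones. That pairing is inferred from the "5x" and "up to 3x" claims rather than stated outright, and the dictionary keys and the `speedups` helper are illustrative names only.

```python
# Stage latencies in seconds. Which numbers belong to which platform is an
# assumption inferred from the slide's "OpenCV: 5x Faster" claim.
cpu_gpu = {"opencv": 0.100, "cnn": 2.3}
fpga = {"opencv": 0.020, "cnn": 0.85}


def speedups(baseline, accelerated):
    """Return per-stage and end-to-end speedup of accelerated over baseline."""
    per_stage = {k: baseline[k] / accelerated[k] for k in baseline}
    per_stage["total"] = sum(baseline.values()) / sum(accelerated.values())
    return per_stage


result = speedups(cpu_gpu, fpga)
for stage, factor in result.items():
    print(f"{stage:7s} {factor:.2f}x")
```

Under that assumption the OpenCV stage comes out at 5x, the CNN stage at about 2.7x, and the end-to-end pipeline at about 2.76x, which is in line with the figures quoted on the slide.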
Cloud: Security / Anomaly Detection
5X Lower Latency with High Security
CPU/GPU
High Performance NIC: 100+ G
MLP: MLP/RF Acceleration
Security Vulnerability: Data Travels thru CPU to GPU
Xilinx FPGA
Integrated Single Card/Chip Solution
High Speed Smart NIC
Real-time MLP/RF at 5x Lower Latency
TCO and Power Advantage: 1 Card vs 2 Cards
In-line Security Detection is Much Higher Security
CPU/GPU Results: MLP 50 µs
FPGA Results: MLP 10 µs
[Diagram: NIC, CPU, and GPU connected over PCIe vs. CPU and FPGA connected over PCIe; blocks labeled Smart NIC, MLP/Random Forest, Custom Operation]
Cloud: Smart City / Security
10x Lower Latency and Lower Power
CPU/GPU
H.265 Decode: 4x 1080p Streams (P4)
OpenCV: Pre-Processing/Motion Analysis on CPU
CNN: Object Detection on GPU
Data Sharing Between CPU and GPU
CPU/Xilinx FPGA
H.265 Decode: 4x 1080p Streams (P4)
OpenCV: Up to 20x Higher Performance
CNN: 5ms Object Detection (xDNN Acceleration)
Integrated Single Chip Solution: Cloud (VU9/13P)
Lower Power
[Figure: CPU/GPU Results vs. FPGA Results across the Decode, OpenCV, and CNN stages; timing labels: H.264 Decode 10 ms, H.264 16 ms, 0.9ms, 1.7ms]
[Diagram: CPU and GPU connected over PCIe vs. CPU and FPGA connected over PCIe; blocks labeled H.264 Decode, Motion Analysis, Custom Operation, CNN]
Xilinx Programmable Architecture Milestones
First FPGA Introduced
First Virtex FPGA
Virtex-2 Pro
First 3D FPGA & HW/SW Programmable SoC
First MPSoC & RFSoC
ACAP
[Timeline axis: 1980, 1990, 2000, 2010, 2016, 2019]
From FPGA to Adaptive Compute Acceleration Platform
New Device Category for Adaptive Workload-specific Acceleration
˃ HW/SW Programmable Engines
˃ IP Subsystems and a Network-on-Chip
˃ Platform Offerings for Compute / Storage / Networking
Adaptive Compute Acceleration Platform
˃ Dynamically Adaptable to Workloads
˃ Exponential Increase in Acceleration
˃ Software Programmable
Adaptive Compute Acceleration Platform
˃ 20x ML Inference Performance
˃ 4x 5G Communications Bandwidth
˃ 112G Transceivers
New HW/SW Programmable Architecture
Application-level Performance Enabled by SW Programmable Engine
[Chart: Everest 7nm vs. 16nm: ML Inference 20x*, 5G Wireless 4x, Power 40% Less]
Domain Specific Engine
˃ Greater Compute Density
˃ Xilinx 7nm Everest
Heterogeneous Architecture
˃ PE Throughput and Efficiency
˃ PL Flexibility
˃ Customized Memory Hierarchy
SW Programmable
˃ SW Programmable (e.g., C/C++)
˃ Compile, Execute, Debug
˃ Increased Productivity
Multiple Applications
˃ ML Inference for Cloud DC
˃ Wireless 5G: Radio, Baseband
˃ ADAS/AD Embedded Vision
˃ Wired: DOCSIS Cable Access
[Diagram: Array Architecture, a grid of Core and Memory tiles, labeled Compute Efficiency]
xDNN: Adaptable Overlay DNN Processor for Xilinx FPGA
No FPGA Expertise Needed
Trained Model Compiler + Runtime
“One Shot” Network Deployment
Overlay DNN Processor
60-80% Computation Efficiency
Low Latency & High Throughput (Batch = 1)
[Diagram: INPUT to OUTPUT pipeline of HW In-Line ReLU+Bias+Conv blocks with Pool stages, linked Previous/Next]
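The in-line ReLU + bias + convolution blocks on this slide suggest that the bias addition and activation are applied as each convolution output is produced, rather than as separate passes over intermediate buffers. The following is only a software illustration of that fusion idea in one dimension, under that assumption. It is not xDNN's implementation, and both function names are hypothetical.

```python
def conv_bias_relu_fused(x, w, bias):
    """1-D valid cross-correlation with bias and ReLU applied per output element."""
    out = []
    for i in range(len(x) - len(w) + 1):
        acc = sum(x[i + j] * w[j] for j in range(len(w)))
        # Bias and ReLU are applied immediately, so no intermediate list is stored.
        out.append(max(0.0, acc + bias))
    return out


def conv_bias_relu_unfused(x, w, bias):
    """Same result computed as three separate passes with intermediate lists."""
    n_out = len(x) - len(w) + 1
    conv = [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(n_out)]
    biased = [c + bias for c in conv]
    return [max(0.0, b) for b in biased]


x = [1.0, -2.0, 3.0, 0.5, -1.0]
w = [0.5, -1.0]
fused = conv_bias_relu_fused(x, w, bias=0.25)
unfused = conv_bias_relu_unfused(x, w, bias=0.25)
print(fused)  # [2.75, 0.0, 1.25, 1.5]
```

Both versions return the same values. The fused form simply avoids materializing the intermediate `conv` and `biased` lists, which is the software analogue of keeping the data in the pipeline instead of writing it back to memory between steps.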
Soft Overlay Architectures for ML
DeePhi Aristotle Architecture
[Diagram: Host CPU, RAM, High Speed Data Tube, and a DPU containing a High Performance Scheduler, Instruction Fetch Unit, Hybrid Computing Array of PEs, and Global Memory Pool]
DeePhi Descartes Architecture
[Diagram: Host, DMA, AXI4 and AXI4Lite Interface, Interconnect, and an Acceleration Kernel (Sparse_LSTM IP) with Input Buffer, Output Buffer, and groups of PEs each with DMA and Memory]
Building the Adaptable, Intelligent World