![Page 1: Eyeriss: A Spatial Architecture for Energy -Efficient Dataflow for Convolutional ...isca2016.eecs.umich.edu/wp-content/uploads/2016/07/6-1.pdf · 2016-07-25 · 2 Contributions of](https://reader034.vdocuments.net/reader034/viewer/2022042214/5eba03a4877687328872334d/html5/thumbnails/1.jpg)
1
Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
Yu-Hsin Chen (1), Joel Emer (1, 2), Vivienne Sze (1)
(1) MIT, (2) NVIDIA
2
Contributions of This Work
• A novel energy-efficient CNN dataflow that has been verified in a fabricated chip, Eyeriss.
• A taxonomy of CNN dataflows that classifies previous work into three categories.
• A framework that compares the energy efficiency of different dataflows under same area and CNN setup.
[Figure: Eyeriss die photo, 4000 µm × 4000 µm, showing the on-chip buffer and the spatial PE array]
Eyeriss [ISSCC, 2016]: a reconfigurable CNN processor, 35 fps @ 278 mW*
* AlexNet CONV layers
3
Deep Convolutional Neural Networks
[Figure: CNN pipeline: CONV Layer (low-level features) → … → CONV Layer (high-level features) → FC Layers → Classes]
Modern deep CNN: up to 1000 CONV layers
4
Deep Convolutional Neural Networks
[Figure: CNN pipeline: CONV Layer (low-level features) → … → CONV Layer (high-level features) → FC Layers → Classes]
FC Layers: 1 – 3 layers
5
Deep Convolutional Neural Networks
[Figure: CNN pipeline: CONV Layer → … → CONV Layer → FC Layers → Classes]
Convolutions account for more than 90% of overall computation, dominating runtime and energy consumption
6
High-Dimensional CNN Convolution
[Figure: an R × R filter is multiplied element-wise with a window of the H × H input image (feature map); the products are accumulated into a partial sum (psum) that forms a pixel of the E × E output image]
7
High-Dimensional CNN Convolution
Sliding Window Processing
[Figure: the R × R filter slides across the H × H input image (feature map); each window position produces a pixel of the E × E output image]
8
High-Dimensional CNN Convolution
Many Input Channels (C)
[Figure: an R × R × C filter is applied to an H × H × C input image, producing an E × E output image]
9
High-Dimensional CNN Convolution
Many Filters (M), Many Output Channels (M)
[Figure: M filters (1 … M), each R × R × C, are applied to the H × H × C input image, producing an E × E output image with M channels]
10
High-Dimensional CNN Convolution
Many Input Images (N), Many Output Images (N)
[Figure: the M filters, each R × R × C, are applied to N input images (1 … N), each H × H × C, producing N output images, each E × E with M channels]
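The dimensions introduced on these slides (N images, M filters, C channels, H × H inputs, R × R filters, E × E outputs) can be tied together with a naive loop nest. The following is a minimal sketch rather than anything from the talk: the function name `conv_layer`, the nested-list data layout, and the stride-1, no-padding assumption are all illustrative choices.

```python
def conv_layer(ifmap, filters, stride=1):
    """Naive CONV layer (illustrative only).

    ifmap:   N x C x H x H nested lists (input images)
    filters: M x C x R x R nested lists
    returns: N x M x E x E nested lists (output images)
    Assumption: no padding, so E = (H - R) // stride + 1.
    """
    N, C, H = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    M, R = len(filters), len(filters[0][0])
    E = (H - R) // stride + 1
    ofmap = [[[[0.0] * E for _ in range(E)] for _ in range(M)] for _ in range(N)]
    for n in range(N):                      # many input images
        for m in range(M):                  # many filters / output channels
            for y in range(E):              # sliding window, vertical
                for x in range(E):          # sliding window, horizontal
                    psum = 0.0
                    for c in range(C):      # many input channels
                        for i in range(R):
                            for j in range(R):
                                pixel = ifmap[n][c][y * stride + i][x * stride + j]
                                psum += pixel * filters[m][c][i][j]  # one MAC
                    ofmap[n][m][y][x] = psum
    return ofmap
```

Every iteration of the innermost statement is one MAC, so the loop bounds give the MAC count of a layer directly: N · M · E² · C · R².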
11
Memory Access is the Bottleneck
[Figure: each MAC* in the ALU needs three memory reads (filter weight, image pixel, partial sum) and one memory write (updated partial sum)]
* multiply-and-accumulate
12
Memory Access is the Bottleneck
[Figure: DRAM → memory read → MAC* (ALU) → memory write → DRAM]
* multiply-and-accumulate
Worst case: all memory R/W are DRAM accesses
• Example: AlexNet [NIPS 2012] has 724M MACs → 2896M DRAM accesses required
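The worst-case figure follows from the per-MAC access pattern shown earlier in the deck: three reads (filter weight, image pixel, partial sum) and one write (updated partial sum). A small sketch of that arithmetic, with illustrative constant and function names:

```python
ALEXNET_MACS = 724_000_000   # MAC count quoted on the slide
READS_PER_MAC = 3            # filter weight, image pixel, partial sum
WRITES_PER_MAC = 1           # updated partial sum


def worst_case_dram_accesses(macs):
    """DRAM accesses if every read and write of every MAC goes to DRAM."""
    return macs * (READS_PER_MAC + WRITES_PER_MAC)


print(worst_case_dram_accesses(ALEXNET_MACS))  # 2896000000, i.e. 2896M
```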
13
Memory Access is the Bottleneck
[Figure: DRAM → Mem → memory read → MAC* (ALU) → memory write → Mem → DRAM]
Extra levels of local memory hierarchy
14
Memory Access is the Bottleneck
[Figure: DRAM → Mem → memory read → MAC* (ALU) → memory write → Mem → DRAM]
Extra levels of local memory hierarchy
Opportunities: data reuse (1), local accumulation
15
Types of Data Reuse in CNN
• Convolutional Reuse: CONV layers only (sliding window). Reuse: image pixels, filter weights
• Image Reuse: CONV and FC layers. One image, many filters. Reuse: image pixels
• Filter Reuse: CONV and FC layers (batch size > 1). Many images, one filter. Reuse: filter weights
16
Memory Access is the Bottleneck
[Figure: DRAM → Mem → memory read → MAC* (ALU) → memory write → Mem → DRAM]
Extra levels of local memory hierarchy
Opportunities: data reuse (1), local accumulation
1) Can reduce DRAM reads of filter/image by up to 500×**
** AlexNet CONV layers
17
Memory Access is the Bottleneck
[Figure: DRAM → Mem → memory read → MAC* (ALU) → memory write → Mem → DRAM]
Extra levels of local memory hierarchy
Opportunities: data reuse (1), local accumulation (2)
1) Can reduce DRAM reads of filter/image by up to 500×
2) Partial sum accumulation does NOT have to access DRAM
18
Memory Access is the Bottleneck
[Figure: DRAM → Mem → memory read → MAC* (ALU) → memory write → Mem → DRAM]
Extra levels of local memory hierarchy
Opportunities: data reuse (1), local accumulation (2)
1) Can reduce DRAM reads of filter/image by up to 500×
2) Partial sum accumulation does NOT have to access DRAM
• Example: DRAM access in AlexNet can be reduced from 2896M to 61M (best case)
19
Spatial Architecture for CNN
[Figure: DRAM feeds a Global Buffer (100 – 500 kB), which feeds an array of Processing Elements (PEs); each PE contains an ALU, control logic, and a register file (0.5 – 1.0 kB)]
Local Memory Hierarchy
• Global Buffer
• Direct inter-PE network
• PE-local memory (RF)
20
Low-Cost Local Data Access
[Figure: DRAM → Global Buffer → PE array; the ALU inside a PE fetches data to run a MAC]
Normalized Energy Cost*
• ALU: 1× (Reference)
• RF (0.5 – 1.0 kB) → ALU: 1×
• PE → ALU (NoC: 200 – 1000 PEs): 2×
• Buffer (100 – 500 kB) → ALU: 6×
• DRAM → ALU: 200×
* measured from a commercial 65nm process
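The normalized costs on this slide can be turned into a simple weighted-sum energy estimate. The sketch below is only an illustration: the dictionary keys, the function name `normalized_energy`, and both per-MAC access mixes are hypothetical, chosen to contrast the two extremes of the hierarchy. It also assumes the ALU reference cost of 1× stands for the MAC computation itself.

```python
# Normalized energy per access, relative to one ALU operation (values from the slide).
ENERGY_COST = {"ALU": 1, "RF": 1, "PE": 2, "Buffer": 6, "DRAM": 200}


def normalized_energy(access_counts):
    """Weighted sum of access counts, keyed by the level the data comes from."""
    return sum(ENERGY_COST[level] * count for level, count in access_counts.items())


# Hypothetical access mixes for a single MAC (3 reads + 1 write of operands).
all_from_dram = {"ALU": 1, "DRAM": 4}   # worst case: every operand touches DRAM
all_from_rf = {"ALU": 1, "RF": 4}       # ideal case: every operand stays in the RF

print(normalized_energy(all_from_dram))  # 801
print(normalized_energy(all_from_rf))    # 5
```

Under these assumed mixes the gap between the two cases is dominated entirely by where the operands live, which is the motivation for keeping data in the lowest-cost levels.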
21
Low-Cost Local Data Access
Normalized Energy Cost*
• ALU: 1× (Reference)
• RF (0.5 – 1.0 kB) → ALU: 1×
• PE → ALU (NoC: 200 – 1000 PEs): 2×
• Buffer (100 – 500 kB) → ALU: 6×
• DRAM → ALU: 200×
* measured from a commercial 65nm process
How to exploit data reuse (1) and local accumulation (2) with limited low-cost local storage?
Specialized processing dataflow required!
22
Taxonomy of Existing Dataflows
• Weight Stationary (WS)
• Output Stationary (OS)
• No Local Reuse (NLR)
23
Weight Stationary (WS)
[Figure: Global Buffer connected to a row of eight PEs holding weights W0 – W7; legend: Psum, Pixel, Weight (held in the PE)]
• Minimize weight read energy consumption
  − maximize convolutional and filter reuse of weights
• Examples: [Chakradhar, ISCA 2010], [nn-X (NeuFlow), CVPRW 2014], [Park, ISSCC 2015], [Origami, GLSVLSI 2015]
24
Output Stationary (OS)
[Figure: Global Buffer connected to a row of eight PEs holding partial sums P0 – P7; legend: Pixel, Weight, Psum (held in the PE)]
• Minimize partial sum R/W energy consumption
  − maximize local accumulation
• Examples: [Gupta, ICML 2015], [ShiDianNao, ISCA 2015], [Peemen, ICCD 2013]
25
No Local Reuse (NLR)
[Figure: Global Buffer connected to a row of PEs; legend: Pixel, Psum, Weight (all held in the Global Buffer)]
• Use a large global buffer as shared storage
  − Reduce DRAM access energy consumption
• Examples: [DianNao, ASPLOS 2014], [DaDianNao, MICRO 2014], [Zhang, FPGA 2015]
26
Energy Efficiency Comparison
• Same total area
• 256 PEs
• AlexNet Configuration*
• Batch size = 16
[Figure: bar chart of normalized energy/MAC (y-axis, 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC (variants of OS), NLR, and Row Stationary (RS)]
* AlexNet CONV layers
27
Energy-Efficient Dataflow: Row Stationary (RS)
• Maximize reuse and accumulation at RF
• Optimize for overall energy efficiency instead of only a certain data type
28
1D Row Convolution in PE
[Figure: Filter row * Input Image row = Output Image row]
29
1D Row Convolution in PE
[Figure: filter row (a b c) * input image row (a b c d e) = partial sums (a b c); the PE register file holds the filter row and the image pixels]
30
1D Row Convolution in PE
[Figure: step 1: the register file holds filter row (a b c) and image window (a b c), producing partial sum a; image pixels d and e are waiting]
31
1D Row Convolution in PE
[Figure: step 2: the image window slides to (b c d), producing partial sum b; image pixel a leaves the register file and pixel e is waiting]
32
1D Row Convolution in PE
[Figure: step 3: the image window slides to (c d e), producing partial sum c; partial sums a, b, c are complete]
33
1D Row Convolution in PE
[Figure: PE register file holding filter row (a b c) and image sliding window (c d e), with partial sums (a b c)]
• Maximize row convolutional reuse in RF
  − Keep a filter row and image sliding window in RF
• Maximize row psum accumulation in RF
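The 1D row primitive described on these slides can be sketched in a few lines. This is an illustrative model of the computation only, not the hardware: the name `row_conv_1d` is hypothetical, stride 1 is assumed, and the "register file" is simply a Python slice.

```python
def row_conv_1d(filter_row, image_row):
    """Slide a filter row over an image row and return the row of partial sums.

    The filter row stays fixed (it would sit in the RF), the image window
    advances one pixel at a time, and each psum is accumulated locally.
    """
    R = len(filter_row)
    psums = []
    for start in range(len(image_row) - R + 1):
        window = image_row[start:start + R]   # sliding window held in the RF
        psum = 0
        for weight, pixel in zip(filter_row, window):
            psum += weight * pixel            # local accumulation, one MAC each
        psums.append(psum)
    return psums


print(row_conv_1d([1, 2, 3], [1, 2, 3, 4, 5]))  # [14, 20, 26]
```

A filter row of length 3 and an image row of length 5 give three partial sums, matching the a b c / a b c d e / a b c example on the slides.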
34
2D Convolution in PE Array
PE 1: filter row 1 * image row 1
35
2D Convolution in PE Array
PE 1: filter row 1 * image row 1
PE 2: filter row 2 * image row 2
PE 3: filter row 3 * image row 3
= output row 1
36
2D Convolution in PE Array
PE 1: filter row 1 * image row 1
PE 2: filter row 2 * image row 2
PE 3: filter row 3 * image row 3
= output row 1
PE 4: filter row 1 * image row 2
PE 5: filter row 2 * image row 3
PE 6: filter row 3 * image row 4
= output row 2
37
2D Convolution in PE Array
PE 1: filter row 1 * image row 1
PE 2: filter row 2 * image row 2
PE 3: filter row 3 * image row 3
= output row 1
PE 4: filter row 1 * image row 2
PE 5: filter row 2 * image row 3
PE 6: filter row 3 * image row 4
= output row 2
PE 7: filter row 1 * image row 3
PE 8: filter row 2 * image row 4
PE 9: filter row 3 * image row 5
= output row 3
38
Convolutional Reuse Maximized
[Figure: 3 × 3 PE array (PE 1 – PE 9); each column of PEs uses filter rows 1 – 3 with image rows 1 – 3, 2 – 4, and 3 – 5 to produce output rows 1, 2, and 3; the filter rows are highlighted]
Filter rows are reused across PEs horizontally
39
Convolutional Reuse Maximized
[Figure: 3 × 3 PE array (PE 1 – PE 9); each column of PEs uses filter rows 1 – 3 with image rows 1 – 3, 2 – 4, and 3 – 5 to produce output rows 1, 2, and 3; the image rows are highlighted]
Image rows are reused across PEs diagonally
40
Maximize 2D Accumulation in PE Array
[Figure: 3 × 3 PE array (PE 1 – PE 9); each column of PEs uses filter rows 1 – 3 with image rows 1 – 3, 2 – 4, and 3 – 5; the psums of each column are summed into output rows 1, 2, and 3]
Partial sums accumulate across PEs vertically
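The PE-array mapping on these slides can be modeled functionally: the PE in array row `r` and column `c` pairs filter row `r` with image row `r + c`, and the psum rows of each column are summed into one output row. The sketch below is an assumption-laden illustration (hypothetical names `row_conv_1d` and `rs_conv_2d`, stride 1, no padding, sequential loops standing in for parallel PEs), not the chip's implementation.

```python
def row_conv_1d(filter_row, image_row):
    """1D row convolution, the work assigned to a single PE."""
    R = len(filter_row)
    return [
        sum(w * p for w, p in zip(filter_row, image_row[s:s + R]))
        for s in range(len(image_row) - R + 1)
    ]


def rs_conv_2d(filt, image):
    """2D convolution built from 1D row convolutions, one per (row, col) PE."""
    R = len(filt)                                  # filter height = PE rows
    E = len(image) - R + 1                         # output height = PE columns
    out_width = len(image[0]) - len(filt[0]) + 1
    output = []
    for col in range(E):                           # each PE column -> one output row
        out_row = [0] * out_width
        for row in range(R):
            # Filter row `row` is the same across columns (horizontal reuse);
            # image row `row + col` is shared along diagonals (diagonal reuse).
            psums = row_conv_1d(filt[row], image[row + col])
            # Psums from the PEs of one column are added up (vertical accumulation).
            out_row = [acc + p for acc, p in zip(out_row, psums)]
        output.append(out_row)
    return output
```

For a 3 × 3 filter and a 5 × 5 image this uses exactly the nine (filter row, image row) pairs listed for PE 1 through PE 9.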
41
Dimensions Beyond 2D Convolution
1) Multiple Images   2) Multiple Filters   3) Multiple Channels
42
Filter Reuse in PE
1) Multiple Images   2) Multiple Filters   3) Multiple Channels
[Figure: one R × R × C filter applied to two H × H × C input images]
Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1
Filter 1, Channel 1, Row 1 * Image 2, Channel 1, Row 1 = Psum 2, Row 1
(share the same filter row)
Processing in PE: concatenate image rows
Filter 1, Channel 1, Row 1 * Image 1 & 2, Channel 1, Row 1 | Row 1 = Psum 1 & 2, Row 1 | Row 1
43
Image Reuse in PE
1) Multiple Images   2) Multiple Filters   3) Multiple Channels
[Figure: two R × R × C filters applied to one H × H × C input image]
Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1
Filter 2, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 2, Row 1
(share the same image row)
Processing in PE: interleave filter rows
Filter 1 & 2, Channel 1 (interleaved rows) * Image 1, Channel 1, Row 1 = Psum 1 & 2 (interleaved rows)
44
Channel Accumulation in PE
1) Multiple Images   2) Multiple Filters   3) Multiple Channels
[Figure: one R × R × C filter applied to one H × H × C input image]
Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1
Filter 1, Channel 2, Row 1 * Image 1, Channel 2, Row 1 = Psum 1, Row 1
accumulate psums: Row 1 + Row 1 = Row 1
45
Channel Accumulation in PE
1) Multiple Images   2) Multiple Filters   3) Multiple Channels
[Figure: one R × R × C filter applied to one H × H × C input image]
Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1
Filter 1, Channel 2, Row 1 * Image 1, Channel 2, Row 1 = Psum 1, Row 1
accumulate psums
Processing in PE: interleave channels
Filter 1, Channel 1 & 2 (interleaved rows) * Image 1, Channel 1 & 2 (interleaved rows) = Psum, Row 1
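The three in-PE techniques (one filter row shared by several images, one image row shared by several filters, and psums accumulated across channels) can be expressed on top of a 1D row convolution. The sketch below models only the functional results; it does not reproduce how rows are concatenated or interleaved inside the register file, and all function names are hypothetical.

```python
def row_conv_1d(filter_row, image_row):
    """1D row convolution, the basic operation inside one PE."""
    R = len(filter_row)
    return [
        sum(w * p for w, p in zip(filter_row, image_row[s:s + R]))
        for s in range(len(image_row) - R + 1)
    ]


def filter_reuse(filter_row, image_rows):
    """Multiple images: the same filter row is applied to each image row."""
    return [row_conv_1d(filter_row, image_row) for image_row in image_rows]


def image_reuse(filter_rows, image_row):
    """Multiple filters: each filter row is applied to the same image row."""
    return [row_conv_1d(filter_row, image_row) for filter_row in filter_rows]


def channel_accumulation(filter_rows, image_rows):
    """Multiple channels: per-channel psum rows are added into one psum row."""
    total = None
    for filter_row, image_row in zip(filter_rows, image_rows):
        psums = row_conv_1d(filter_row, image_row)
        total = psums if total is None else [a + b for a, b in zip(total, psums)]
    return total


print(filter_reuse([1, 0, -1], [[1, 2, 3, 4], [4, 3, 2, 1]]))          # [[-2, -2], [2, 2]]
print(image_reuse([[1, 0, -1], [1, 1, 1]], [1, 2, 3, 4]))              # [[-2, -2], [6, 9]]
print(channel_accumulation([[1, 0, -1], [1, 1, 1]],
                           [[1, 2, 3, 4], [4, 3, 2, 1]]))              # [7, 4]
```

The first two cases produce one psum row per image or per filter, while the third produces a single psum row, which is why only channel accumulation reduces the amount of psum data leaving the PE.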
46
CNN Convolution – The Full Picture
[Figure: 3 × 3 PE array; the PEs pair filter rows 1 – 3 with image rows 1 – 3 (first column), 2 – 4 (second column), and 3 – 5 (third column)]
Multiple images: Filter 1 * Image 1 & 2 = Psum 1 & 2
Multiple filters: Filter 1 & 2 * Image 1 = Psum 1 & 2
Multiple channels: Filter 1 * Image 1 = Psum
47
Simulation Results
• Same total hardware area
• 256 PEs
• AlexNet Configuration
• Batch size = 16
48
Dataflow Comparison: CONV Layers
[Figure: stacked bar chart of normalized energy/MAC (y-axis, 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR, RS, broken down by ALU, RF, NoC, buffer, and DRAM]
RS uses 1.4× – 2.5× lower energy than other dataflows
49
Dataflow Comparison: CONV Layers
[Figure: stacked bar chart of normalized energy/MAC (y-axis, 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR, RS, broken down by data type: psums, weights, pixels]
RS optimizes for the best overall energy efficiency
50
Dataflow Comparison: FC Layers
[Figure: stacked bar chart of normalized energy/MAC (y-axis, 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR, RS, broken down by data type: psums, weights, pixels]
RS uses at least 1.3× lower energy than other dataflows
51
Row Stationary: Layer Breakdown
[Figure: stacked bar chart of normalized energy (1 MAC = 1; y-axis, 0 to 2.0e10) for layers L1 – L8, broken down by ALU, RF, NoC, buffer, and DRAM]
• CONV Layers: RF dominates; 80% of total energy
• FC Layers: DRAM dominates; 20% of total energy
52
Summary
• We propose a Row Stationary (RS) dataflow to exploit the low-cost local memories in a spatial architecture.
• RS optimizes for best overall energy efficiency while existing CNN dataflows only focus on certain data types.
• RS has higher energy efficiency than existing dataflows
  − 1.4× – 2.5× higher in CONV layers
  − at least 1.3× higher in FC layers (batch size ≥ 16)
• We have verified RS in a fabricated CNN processor chip, Eyeriss
[Figure: Eyeriss die photo, 4000 µm × 4000 µm]
53
Thank You
Learn more about Eyeriss at
http://eyeriss.mit.edu