Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks


Page 1: Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks

Source: isca2016.eecs.umich.edu/wp-content/uploads/2016/07/6-1.pdf · 2016-07-25

Yu-Hsin Chen (1), Joel Emer (1, 2), Vivienne Sze (1)

(1) MIT  (2) NVIDIA

Page 2: Contributions of This Work

• A novel energy-efficient CNN dataflow that has been verified in a fabricated chip, Eyeriss.

• A taxonomy of CNN dataflows that classifies previous work into three categories.

• A framework that compares the energy efficiency of different dataflows under the same area and CNN setup.

[Figure: Eyeriss die photo, 4000 µm × 4000 µm, showing the on-chip buffer and the spatial PE array]

Eyeriss [ISSCC, 2016]: a reconfigurable CNN processor, 35 fps @ 278 mW*

* AlexNet CONV layers

Page 3: Deep Convolutional Neural Networks

[Figure: a chain of CONV layers extracting low-level and then high-level features, followed by FC layers that output classes]

Modern deep CNN: up to 1000 CONV layers

Page 4: Deep Convolutional Neural Networks

[Figure: the same network, with the FC layers annotated: 1 – 3 layers]

Page 5: Deep Convolutional Neural Networks

[Figure: the same network of CONV layers, FC layers and classes]

Convolutions account for more than 90% of overall computation, dominating runtime and energy consumption

Page 6: High-Dimensional CNN Convolution

[Figure: an R × R filter overlaid on an H × H input image (feature map); element-wise multiplication followed by partial sum (psum) accumulation yields one pixel of the E × E output image]

Page 7: High-Dimensional CNN Convolution

[Figure: sliding window processing: the R × R filter moves across the H × H input image to produce the pixels of the E × E output image]

Page 8: High-Dimensional CNN Convolution

[Figure: many input channels (C): the filter and the input image each have C channels and together produce one E × E output image]

Page 9: High-Dimensional CNN Convolution

[Figure: many filters (M): filters 1 to M, each R × R × C, applied to the same H × H × C input image produce many output channels (M)]

Page 10: High-Dimensional CNN Convolution

[Figure: many input images (N): the same M filters applied to input images 1 to N, each H × H × C, produce many output images (N), each E × E × M]
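The shape parameters introduced on these slides (N images, M filters, C channels, H × H input, R × R filter, E × E output) can be read as a seven-deep loop nest. The following is a minimal sketch, assuming unit stride and no padding so that E = H − R + 1; the slides do not state the stride, and the function name `conv_layer` is illustrative only.

```python
import numpy as np

def conv_layer(images, filters):
    """Naive CONV layer: images [N, C, H, H], filters [M, C, R, R] -> [N, M, E, E].

    Assumes stride 1 and no padding (an assumption, not stated on the slides).
    """
    N, C, H, _ = images.shape
    M, _, R, _ = filters.shape
    E = H - R + 1
    out = np.zeros((N, M, E, E))
    for n in range(N):                  # many input images
        for m in range(M):              # many filters -> output channels
            for y in range(E):          # sliding window, vertical position
                for x in range(E):      # sliding window, horizontal position
                    for c in range(C):  # psums accumulate across input channels
                        for i in range(R):
                            for j in range(R):
                                # one MAC: psum += weight * pixel
                                out[n, m, y, x] += (
                                    filters[m, c, i, j] * images[n, c, y + i, x + j]
                                )
    return out
```

Every iteration of the innermost body is one multiply-and-accumulate, which is why the number of MACs grows with the product of all seven loop bounds.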

Page 11: Memory Access is the Bottleneck

[Figure: each MAC* at the ALU needs three memory reads (filter weight, image pixel, partial sum) and one memory write (updated partial sum)]

* multiply-and-accumulate

Page 12: Memory Access is the Bottleneck

[Figure: the memory reads and the memory write of each MAC* at the ALU all go to DRAM]

* multiply-and-accumulate

Worst case: all memory R/W are DRAM accesses

• Example: AlexNet [NIPS 2012] has 724M MACs → 2896M DRAM accesses required
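The worst-case figure follows directly from the per-MAC traffic in the diagram: three reads (filter weight, image pixel, partial sum) plus one write (updated partial sum). A short sketch of that arithmetic, with constant and function names chosen here for illustration:

```python
MACS_ALEXNET = 724_000_000  # MAC count quoted on the slide

# Per MAC: read weight, read pixel, read psum, write updated psum.
READS_PER_MAC = 3
WRITES_PER_MAC = 1

def worst_case_dram_accesses(macs):
    """DRAM accesses if every operand read and psum write goes to DRAM."""
    return macs * (READS_PER_MAC + WRITES_PER_MAC)

print(worst_case_dram_accesses(MACS_ALEXNET) // 1_000_000, "M")  # 2896 M
```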

Page 13: Memory Access is the Bottleneck

[Figure: extra levels of local memory hierarchy (Mem) placed between DRAM and the ALU on both the memory read and the memory write path of each MAC]

Page 14: Memory Access is the Bottleneck

[Figure: the same memory hierarchy, with opportunity (1) marked]

Opportunities: (1) data reuse, local accumulation

Page 15: Types of Data Reuse in CNN

• Convolutional Reuse: CONV layers only (sliding window). Reuse: image pixels and filter weights.

• Image Reuse: CONV and FC layers. One image is processed by multiple filters. Reuse: image pixels.

• Filter Reuse: CONV and FC layers (batch size > 1). One filter is applied to multiple images. Reuse: filter weights.
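The three kinds of reuse can be turned into rough upper bounds on how often each datum is used within one layer. This is a sketch under stated assumptions: stride 1, no padding, and a helper name `reuse_factors` that is not from the slides. The parameter values in the usage note are illustrative rather than taken from a specific network.

```python
def reuse_factors(R, E, M, N):
    """Upper-bound number of uses per datum in one layer (stride 1 assumed).

    R: filter width/height, E: output width/height,
    M: number of filters, N: batch size.
    """
    return {
        # convolutional reuse (E*E window positions) x filter reuse (N images)
        "weight": E * E * N,
        # convolutional reuse (at most min(R, E)^2 windows cover one pixel)
        # x image reuse (M filters)
        "pixel": min(R, E) ** 2 * M,
    }
```

For example, `reuse_factors(R=3, E=13, M=384, N=16)` gives 2704 uses per weight and 3456 per pixel. With E = 1, as in an FC layer, the convolutional term disappears and a weight is reused only across the N images of a batch, which is why filter reuse needs a batch size greater than 1.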

Page 16: Memory Access is the Bottleneck

[Figure: memory reads and writes of each MAC at the ALU pass through extra levels of local memory hierarchy (Mem) before DRAM; opportunity (1) marked]

Opportunities: (1) data reuse, local accumulation

1) Can reduce DRAM reads of filter/image by up to 500×**

** AlexNet CONV layers

Page 17: Memory Access is the Bottleneck

[Figure: the same memory hierarchy, with opportunities (1) and (2) marked]

Opportunities: (1) data reuse, (2) local accumulation

1) Can reduce DRAM reads of filter/image by up to 500×
2) Partial sum accumulation does NOT have to access DRAM

Page 18: Memory Access is the Bottleneck

[Figure: the same memory hierarchy, with opportunities (1) and (2) marked]

Opportunities: (1) data reuse, (2) local accumulation

1) Can reduce DRAM reads of filter/image by up to 500×
2) Partial sum accumulation does NOT have to access DRAM

• Example: DRAM access in AlexNet can be reduced from 2896M to 61M (best case)
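To put the two AlexNet figures side by side, the reduction factor can be computed directly. The variable names below are illustrative; the two counts are the ones quoted on the slide.

```python
worst_case = 2896e6  # DRAM accesses if every read and write goes to DRAM
best_case = 61e6     # DRAM accesses with data reuse and local accumulation

reduction = worst_case / best_case
print(f"about {reduction:.1f}x fewer DRAM accesses")
```

The result is roughly a 47-fold reduction in DRAM traffic for the best case.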

Page 19: Spatial Architecture for CNN

[Figure: DRAM connected to a Global Buffer (100 – 500 kB), which feeds a 4 × 4 grid of ALUs forming the array of Processing Elements (PEs); each PE contains an ALU, a Reg File (0.5 – 1.0 kB) and Control]

Local Memory Hierarchy
• Global Buffer
• Direct inter-PE network
• PE-local memory (RF)

Page 20: Low-Cost Local Data Access

[Figure: DRAM, Global Buffer (100 – 500 kB), PE array (NoC: 200 – 1000 PEs) and RF (0.5 – 1.0 kB) as sources from which the ALU fetches data to run a MAC]

Normalized Energy Cost*
• RF → ALU: 1×
• PE → ALU: 2×
• Buffer → ALU: 6×
• DRAM → ALU: 200×
• ALU: 1× (Reference)

* measured from a commercial 65nm process

Page 21: Low-Cost Local Data Access

Normalized Energy Cost*
• RF (0.5 – 1.0 kB) → ALU: 1×
• PE → ALU (NoC: 200 – 1000 PEs): 2×
• Buffer (100 – 500 kB) → ALU: 6×
• DRAM → ALU: 200×
• ALU: 1× (Reference)

* measured from a commercial 65nm process

How to exploit (1) data reuse and (2) local accumulation with limited low-cost local storage?

Specialized processing dataflow required!
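The normalized costs make it possible to estimate the data-movement energy of a dataflow from its access counts alone. Below is a minimal sketch of such a cost model. The dictionary keys and the access counts are hypothetical and chosen for illustration; only the per-level cost factors come from the slide.

```python
# Normalized energy per access, relative to one MAC at the ALU (slide values).
ENERGY_COST = {"DRAM": 200, "buffer": 6, "inter_PE": 2, "RF": 1, "ALU": 1}

def total_energy(access_counts):
    """Sum normalized energy for a dict of {storage level: number of accesses}."""
    return sum(ENERGY_COST[level] * n for level, n in access_counts.items())

# Hypothetical access counts for 1000 MACs, for illustration only.
all_dram = {"ALU": 1000, "DRAM": 4000}
mostly_local = {"ALU": 1000, "RF": 3800, "buffer": 150, "DRAM": 50}

print(total_energy(all_dram))      # 801000
print(total_energy(mostly_local))  # 15700
```

Even with the same number of MACs, moving most accesses from DRAM to the RF changes the total by more than an order of magnitude in this made-up example, which is the motivation for choosing the dataflow carefully.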

Page 22: Taxonomy of Existing Dataflows

• Weight Stationary (WS)

• Output Stationary (OS)

• No Local Reuse (NLR)

Page 23: Weight Stationary (WS)

• Minimize weight read energy consumption
  − maximize convolutional and filter reuse of weights

• Examples: [Chakradhar, ISCA 2010] [nn-X (NeuFlow), CVPRW 2014] [Park, ISSCC 2015] [Origami, GLSVLSI 2015]

[Figure: Global Buffer exchanging pixels and psums with a row of PEs, each PE holding one weight (W0 – W7)]

Page 24: Output Stationary (OS)

• Minimize partial sum R/W energy consumption
  − maximize local accumulation

• Examples: [Gupta, ICML 2015] [ShiDianNao, ISCA 2015] [Peemen, ICCD 2013]

[Figure: Global Buffer supplying pixels and weights to a row of PEs, each PE holding one psum (P0 – P7)]
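One way to see the difference between weight stationary and output stationary is as a choice of loop order for the same 1D convolution: whichever operand is indexed by the outer loop stays put while the others stream past. The sketch below is a single-PE, purely temporal analogy with illustrative function names; real WS and OS designs also spread the work spatially across many PEs.

```python
def conv1d_weight_stationary(weights, pixels):
    """Outer loop over weights: each weight is held while all its uses happen."""
    E = len(pixels) - len(weights) + 1
    psums = [0] * E
    for i, w in enumerate(weights):      # w stays in the PE register
        for x in range(E):               # pixels and psums stream past
            psums[x] += w * pixels[x + i]
    return psums

def conv1d_output_stationary(weights, pixels):
    """Outer loop over outputs: each psum is held until it is complete."""
    E = len(pixels) - len(weights) + 1
    out = []
    for x in range(E):
        acc = 0                          # psum stays in the PE register
        for i, w in enumerate(weights):  # weights and pixels stream past
            acc += w * pixels[x + i]
        out.append(acc)
    return out
```

Both functions return `[14, 20, 26]` for weights `[1, 2, 3]` and pixels `[1, 2, 3, 4, 5]`. The arithmetic is identical, and only the data that must be re-fetched in the inner loop differs.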

Page 25: No Local Reuse (NLR)

• Use a large global buffer as shared storage
  − Reduce DRAM access energy consumption

• Examples: [DianNao, ASPLOS 2014] [DaDianNao, MICRO 2014] [Zhang, FPGA 2015]

[Figure: Global Buffer exchanging weights, pixels and psums with a row of PEs]

Page 26: Energy Efficiency Comparison

[Figure: bar chart of normalized energy/MAC (axis 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR and Row Stationary (RS); OSA, OSB and OSC are variants of OS]

• Same total area
• 256 PEs
• AlexNet Configuration*
• Batch size = 16

* AlexNet CONV layers

Page 27: Energy-Efficient Dataflow: Row Stationary (RS)

• Maximize reuse and accumulation at RF

• Optimize for overall energy efficiency instead of for only a certain data type

Page 28: 1D Row Convolution in PE

[Figure: Filter * Input Image = Output Image, one row each]

Page 29: 1D Row Convolution in PE

[Figure: Filter (a b c) * Input Image (a b c d e) = Partial Sums (a b c), to be computed in the Reg File of one PE]

Page 30: 1D Row Convolution in PE

[Figure: step 1: the Reg File holds filter row a b c and image window a b c; the PE produces partial sum a; image pixels d and e are still to come]

Page 31: 1D Row Convolution in PE

[Figure: step 2: the image window slides to b c d; the PE produces partial sum b]

Page 32: 1D Row Convolution in PE

[Figure: step 3: the image window slides to c d e; the PE produces partial sum c, completing the partial sum row a b c]

Page 33: 1D Row Convolution in PE

[Figure: Reg File holding filter row a b c and image window c d e, with partial sums a b c produced]

• Maximize row convolutional reuse in RF
  − Keep a filter row and image sliding window in RF

• Maximize row psum accumulation in RF
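The steps on these slides can be sketched as a small simulation of one PE. The filter row is loaded once, the image row enters one pixel at a time through a sliding window of width R, and each partial sum is accumulated locally before it leaves. The function name, the `fetches` counter and the use of `deque` are illustrative assumptions, not the chip's actual register file organization.

```python
from collections import deque

def pe_row_conv(filter_row, image_row):
    """1D row convolution in one PE, keeping filter row and image window local.

    Returns (psum_row, fetches), where fetches counts values loaded into the RF.
    """
    R = len(filter_row)
    rf_filter = list(filter_row)   # filter row is loaded once and stays in the RF
    rf_window = deque(maxlen=R)    # sliding window of image pixels
    fetches = R
    psums = []
    for pixel in image_row:
        rf_window.append(pixel)    # one new pixel per step; the oldest drops out
        fetches += 1
        if len(rf_window) == R:
            acc = 0                # psum accumulates locally in the RF
            for w, p in zip(rf_filter, rf_window):
                acc += w * p
            psums.append(acc)
    return psums, fetches
```

With `pe_row_conv([1, 2, 3], [1, 2, 3, 4, 5])` the PE produces `[14, 20, 26]` after 8 fetches into the RF, whereas the 9 MACs would need 18 operand fetches if nothing were kept locally.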

Page 34: 2D Convolution in PE Array

[Figure: PE 1 convolves filter Row 1 with image Row 1 (filter * image = output)]

Page 35: 2D Convolution in PE Array

[Figure: PE 1, PE 2 and PE 3 convolve filter Rows 1 – 3 with image Rows 1 – 3 and together produce output Row 1]

Page 36: 2D Convolution in PE Array

[Figure: a second column is added: PE 4, PE 5 and PE 6 convolve filter Rows 1 – 3 with image Rows 2 – 4 and together produce output Row 2]

Page 37: 2D Convolution in PE Array

• Output Row 1: PE 1 (filter Row 1 * image Row 1), PE 2 (filter Row 2 * image Row 2), PE 3 (filter Row 3 * image Row 3)

• Output Row 2: PE 4 (filter Row 1 * image Row 2), PE 5 (filter Row 2 * image Row 3), PE 6 (filter Row 3 * image Row 4)

• Output Row 3: PE 7 (filter Row 1 * image Row 3), PE 8 (filter Row 2 * image Row 4), PE 9 (filter Row 3 * image Row 5)

Page 38: Convolutional Reuse Maximized

[Figure: 3 × 3 PE array (PE 1 – PE 9) with the filter rows highlighted: PE 1, PE 4 and PE 7 share filter Row 1; PE 2, PE 5 and PE 8 share filter Row 2; PE 3, PE 6 and PE 9 share filter Row 3]

Filter rows are reused across PEs horizontally

Page 39: Convolutional Reuse Maximized

[Figure: 3 × 3 PE array (PE 1 – PE 9) with the image rows highlighted: image Row 2 is shared by PE 2 and PE 4, image Row 3 by PE 3, PE 5 and PE 7, image Row 4 by PE 6 and PE 8]

Image rows are reused across PEs diagonally

Page 40: Maximize 2D Accumulation in PE Array

[Figure: 3 × 3 PE array (PE 1 – PE 9) with the psum rows highlighted: PE 1 – PE 3 accumulate into output Row 1, PE 4 – PE 6 into output Row 2, PE 7 – PE 9 into output Row 3]

Partial sums accumulate across PEs vertically
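The mapping of a 2D convolution onto the PE array can be sketched in a few lines. In this sketch the PE in array row i and column j convolves filter row i with image row i + j, and the psum rows of each column are summed to give output row j. Stride 1 and zero-based indices are assumed, the function name is illustrative, and `np.correlate` stands in for the 1D row convolution done inside each PE.

```python
import numpy as np

def row_stationary_2d(filt, image):
    """2D convolution mapped onto a logical R x E PE array (stride 1 assumed).

    PE (i, j) convolves filter row i with image row i + j; the psum rows of
    column j are accumulated vertically into output row j.
    """
    R = filt.shape[0]
    E = image.shape[0] - R + 1
    out = np.zeros((E, image.shape[1] - filt.shape[1] + 1))
    for j in range(E):          # PE column -> output row j
        for i in range(R):      # PE row -> filter row i (reused horizontally)
            # image row i + j is shared along a diagonal of the array
            out[j] += np.correlate(image[i + j], filt[i], mode="valid")
    return out
```

For a 3 × 3 filter and a 5-row image this loop visits nine (i, j) pairs, matching the nine PEs on the slides. Image row 2 (zero-based) is used by the pairs (2, 0), (1, 1) and (0, 2), which is the diagonal reuse pattern.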

Page 41: Dimensions Beyond 2D Convolution

(1) Multiple Images  (2) Multiple Filters  (3) Multiple Channels

Page 42: Filter Reuse in PE

(1) Multiple Images  (2) Multiple Filters  (3) Multiple Channels

[Figure: one R × R × C filter applied to two H × H × C input images]

• Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1

• Filter 1, Channel 1, Row 1 * Image 2, Channel 1, Row 1 = Psum 2, Row 1

Both operations share the same filter row.

Processing in PE: concatenate image rows

• Filter 1, Channel 1, Row 1 * Image 1 & 2, Channel 1, Row 1 and Row 1 = Psum 1 & 2, Row 1 and Row 1

Page 43: Image Reuse in PE

(1) Multiple Images  (2) Multiple Filters  (3) Multiple Channels

[Figure: two R × R × C filters applied to one H × H × C input image]

• Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1

• Filter 2, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 2, Row 1

Both operations share the same image row.

Processing in PE: interleave filter rows

• Filter 1 & 2, Channel 1 * Image 1, Channel 1, Row 1 = Psum 1 & 2

Page 44: Channel Accumulation in PE

(1) Multiple Images  (2) Multiple Filters  (3) Multiple Channels

[Figure: one R × R × C filter applied to one H × H × C input image]

• Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1

• Filter 1, Channel 2, Row 1 * Image 1, Channel 2, Row 1 = Psum 1, Row 1

Accumulate psums: Row 1 + Row 1 = Row 1

Page 45: Channel Accumulation in PE

(1) Multiple Images  (2) Multiple Filters  (3) Multiple Channels

[Figure: one R × R × C filter applied to one H × H × C input image]

• Filter 1, Channel 1, Row 1 * Image 1, Channel 1, Row 1 = Psum 1, Row 1

• Filter 1, Channel 2, Row 1 * Image 1, Channel 2, Row 1 = Psum 1, Row 1

Accumulate psums

Processing in PE: interleave channels

• Filter 1, Channel 1 & 2 * Image 1, Channel 1 & 2 = Psum, Row 1

Page 46: CNN Convolution – The Full Picture

[Figure: 3 × 3 PE array in which each PE convolves one filter row (Rows 1 – 3) with one image row (Rows 1 – 5)]

• Multiple images: Filter 1 * Image 1 & 2 = Psum 1 & 2

• Multiple filters: Filter 1 & 2 * Image 1 = Psum 1 & 2

• Multiple channels: Filter 1 * Image 1 = Psum
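The three extensions can be combined in a sketch of what a single PE computes when it is given rows from several images, filters and channels. The array layout, the function name and the loop order below are assumptions made for readability and do not reproduce the order in which the hardware interleaves the data.

```python
import numpy as np

def pe_multi(filter_rows, image_rows):
    """One PE handling multiple images, filters and channels.

    filter_rows: [M, C, R], image_rows: [N, C, W]
    returns psum rows: [N, M, W - R + 1]
    """
    M, C, R = filter_rows.shape
    N, _, W = image_rows.shape
    psums = np.zeros((N, M, W - R + 1))
    for n in range(N):          # multiple images: the same filter row is reused
        for m in range(M):      # multiple filters: the same image row is reused
            for c in range(C):  # multiple channels: psums are accumulated
                psums[n, m] += np.correlate(
                    image_rows[n, c], filter_rows[m, c], mode="valid"
                )
    return psums
```

The channel loop writes into the same `psums[n, m]` row on every pass, which mirrors the local accumulation across channels, while the image and filter loops each produce separate psum rows.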

Page 47: Simulation Results

• Same total hardware area

• 256 PEs

• AlexNet Configuration

• Batch size = 16

Page 48: Dataflow Comparison: CONV Layers

[Figure: stacked bar chart of normalized energy/MAC (axis 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR and RS, broken down by storage level: ALU, RF, NoC, buffer, DRAM]

RS uses 1.4× – 2.5× lower energy than other dataflows

Page 49: Dataflow Comparison: CONV Layers

[Figure: stacked bar chart of normalized energy/MAC (axis 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR and RS, broken down by data type: psums, weights, pixels]

RS optimizes for the best overall energy efficiency

Page 50: Dataflow Comparison: FC Layers

[Figure: stacked bar chart of normalized energy/MAC (axis 0 to 2) for the CNN dataflows WS, OSA, OSB, OSC, NLR and RS, broken down by data type: psums, weights, pixels]

RS uses at least 1.3× lower energy than other dataflows

Page 51: Row Stationary: Layer Breakdown

[Figure: stacked bar chart of normalized energy (1 MAC = 1; axis 0 to 2.0e10) for layers L1 – L8, broken down by storage level: ALU, RF, NoC, buffer, DRAM]

• CONV layers: RF dominates; 80% of total energy

• FC layers: DRAM dominates; 20% of total energy

Page 52: Summary

• We propose a Row Stationary (RS) dataflow to exploit the low-cost local memories in a spatial architecture.

• RS optimizes for best overall energy efficiency, while existing CNN dataflows only focus on certain data types.

• RS has higher energy efficiency than existing dataflows
  − 1.4× – 2.5× higher in CONV layers
  − at least 1.3× higher in FC layers (batch size ≥ 16)

• We have verified RS in a fabricated CNN processor chip, Eyeriss

[Figure: Eyeriss die photo, 4000 µm × 4000 µm]

Page 53: Thank You

Learn more about Eyeriss at http://eyeriss.mit.edu