NAX: Near-Data Approximate Computing

Source slides: ayazdanb/publication/slides/nax-ac16-slides.pdf (35 pages)


Page 1: NAX: Near-Data Approximate Computing

Amir Yazdanbakhsh, Jacob Sacks, Choungki Song (1), Hadi Esmaeilzadeh, Pejman Lotfi-Kamran (2), Nam Sung Kim (3)

Georgia Institute of Technology

(1) University of Wisconsin-Madison

(2) The Institute for Research in Fundamental Sciences

(3) University of Illinois at Urbana-Champaign

Page 2: Approximate Computing: Embracing Imprecision

Relax the abstraction of "near perfect" accuracy in data processing, storage, and communication.

Accept imprecision to improve performance, energy dissipation, and resource utilization efficiency.

Page 3:

[Figure: four application domains (Virtual Reality, Data Analytics, Machine Learning, Multimedia Processing) shown with a GPU drawn as a 4x4 grid of streaming multiprocessors (SMs). Labels: GPU, NGPU.]

Page 4:

[Figure: same as Page 3.]

Diverse classes of GPU applications are amenable to "approximation".

Page 5: Neural Transformation for GPUs

[Figure: two blocks, each labeled Neural Network.]

Page 6: Neural Network Operations

[Figure: neuron j with inputs $x_{j,0}, \dots, x_{j,i}, \dots, x_{j,n}$, weights $w_{j,0}, \dots, w_{j,i}, \dots, w_{j,n}$, and output $y_j$.]

$$y_j = \mathrm{sigmoid}\left(w_{j,0} \times x_{j,0} + \dots + w_{j,i} \times x_{j,i} + \dots + w_{j,n} \times x_{j,n}\right)$$
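The neuron operation on this slide can be sketched in a few lines of Python. This is only an illustration of the formula: the function names and the sample weights and inputs are made up for the example and do not come from the slides.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, inputs):
    """Compute y_j = sigmoid(sum_i w_{j,i} * x_{j,i}) for one neuron."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # Multiply-accumulate over all inputs, then apply the activation.
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(weighted_sum)


# Illustrative values only (not taken from the slides).
y = neuron_output([0.5, -1.0, 2.0], [1.0, 2.0, 0.25])
print(y)
```

Each neuron therefore needs one multiply-accumulate per input plus one sigmoid evaluation, which are the two operations the later slides map onto hardware.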

Page 7: Runtime Breakdown of Baseline GPU

[Figure: stacked bar chart. Y-axis: Normalized Runtime (0% to 100%). Series: Data Processing, Data Communication.]

Amir Yazdanbakhsh, et al., "Neural Acceleration for GPU Throughput Processors", MICRO 2015.

Page 8: Runtime Breakdown of NGPU

[Figure: stacked bar chart. Y-axis: Normalized Runtime (0% to 100%). Series: Data Processing, Data Communication. Annotation: 45%.]

Amir Yazdanbakhsh, et al., "Neural Acceleration for GPU Throughput Processors", MICRO 2015.

Page 9: In-DRAM Computing Challenges

DRAM is cost-sensitive!

Page 10: In-DRAM Computing Challenges

DRAM is under power constraint!

Page 11: In-DRAM Computing Challenges

[Figure: 8x8 grid of cores.]

GPU is SIMD!

Page 12: Near-Data Approximate Computing

[Figure: system diagram. Labels: Streaming Multiprocessor (SM), Interconnection Network, L2 Cache, Memory Controller, Memory Partition, DRAM banks A to P, DRAM Logic, Accelerator Logic, In-DRAM Ctrl.]

Page 13: Near-Data Approximate Computing

[Figure: DRAM bank datapath. Labels: banks A, B, C, D; four half-banks; row decoders (RD); column decoders (COLDEC); I/O S/A; IOCNT; Bitline; Read Data; Write Data; Weight Register; Arithmetic Unit; Sigmoid LUT.]
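The datapath on this slide includes a sigmoid lookup table (LUT) next to the arithmetic unit. The sketch below shows how a small table can stand in for the exact sigmoid. The table size (256 entries), the input range [-8, 8), and the use of floating-point entries are assumptions chosen for illustration; the slides do not state the LUT size, range, or number format.

```python
import math

# Hypothetical LUT parameters (not specified in the slides).
LUT_SIZE = 256
X_MIN, X_MAX = -8.0, 8.0
STEP = (X_MAX - X_MIN) / LUT_SIZE

# Precompute the sigmoid at the midpoint of each input bin.
SIGMOID_LUT = [
    1.0 / (1.0 + math.exp(-(X_MIN + (k + 0.5) * STEP)))
    for k in range(LUT_SIZE)
]


def sigmoid_lut(z):
    """Approximate sigmoid(z) with a table lookup; saturate outside the range."""
    if z < X_MIN:
        return 0.0
    if z >= X_MAX:
        return 1.0
    # Map z to its bin; clamp to guard against floating-point rounding at the edge.
    index = min(int((z - X_MIN) / STEP), LUT_SIZE - 1)
    return SIGMOID_LUT[index]


exact = 1.0 / (1.0 + math.exp(-0.3))
print(sigmoid_lut(0.3), exact)
```

A lookup replaces the exponential and division with a single indexed read, at the cost of a bounded approximation error that shrinks as the table grows.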

Page 14: NAX Execution Flow (step 1)

[Figure: the system diagram of Page 12 (SMs, Interconnection Network, L2 Cache, Memory Controller, Memory Partition, DRAM banks A to P, DRAM Logic, Accelerator Logic, In-DRAM Ctrl), with step marker 1.]

Page 15: NAX Execution Flow (step 2)

[Figure: the system diagram of Page 12, with step marker 2.]

Page 16: NAX Execution Flow (step 3)

[Figure: the system diagram of Page 12, with step marker 3.]

Page 17: NAX Execution Flow (step 4)

[Figure: the system diagram of Page 12, with step marker 4.]

Page 18: NAX Execution Flow (step 5)

[Figure: the system diagram of Page 12, with step marker 5.]

Page 19: NAX Execution Flow (step 6)

[Figure: the system diagram of Page 12, with step marker 6.]

Page 20: NAX Microarchitectures

[Figure: arithmetic-unit variants labeled Floating Point and Fixed Point, together with a shift-add unit made of an input register (holding $X_i$), shifter, shift register, adder (+), output register, controller, and LUT.]

LUT contents (shift amounts):

$S_{00} = (00110)_2$

$S_{01} = (00100)_2$

$S_{02} = (00011)_2$

$S_{03} = (00001)_2$

Page 21: Simplification of Integrated Arithmetic

[Figure: shift-add unit with input register (holding $X_i$), shifter, shift register, adder (+), output register, controller, and LUT.]

Worked example:

$W_i = (01011010)_2 = (90)_{10}$

$X_i = (01111101)_2 = (125)_{10}$

$Y_i = X_i \times W_i = (11{,}250)_{10}$

LUT contents (shift amounts, one per set bit of $W_i$, since $90 = 2^6 + 2^4 + 2^3 + 2^1$):

$S_{00} = (00110)_2 = 6$

$S_{01} = (00100)_2 = 4$

$S_{02} = (00011)_2 = 3$

$S_{03} = (00001)_2 = 1$

Page 22: Simplification of Integrated Arithmetic

[Figure: same unit and example as Page 21, annotated with the values 6 and 0.]

Page 23: Simplification of Integrated Arithmetic

[Figure: same unit and example as Page 21, annotated with the value 4.]

Page 24: Simplification of Integrated Arithmetic

[Figure: same unit and example as Page 21, annotated with the value 3.]

Page 25: Simplification of Integrated Arithmetic

[Figure: same unit and example as Page 21, annotated with the value 1.]
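In the example on these slides, the shift amounts 6, 4, 3, and 1 are the positions of the set bits of the weight, because 90 = 2^6 + 2^4 + 2^3 + 2^1. The sketch below derives such a list from an integer weight, most significant bit first. The function name is hypothetical, and the slides do not describe how the LUT is actually populated; this only illustrates the encoding.

```python
def shift_amounts(weight):
    """Return the positions of the set bits of a non-negative integer weight,
    ordered from the most significant bit to the least significant bit."""
    if weight < 0:
        raise ValueError("weight must be non-negative")
    return [
        bit
        for bit in range(weight.bit_length() - 1, -1, -1)
        if (weight >> bit) & 1
    ]


W = 0b01011010  # 90, the weight used in the slides' example
print(shift_amounts(W))  # [6, 4, 3, 1]
```

Ordering the amounts from the most significant bit means the largest contributions to the product are accumulated first.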

Page 26: Simplification of Integrated Arithmetic (Iteration 1)

$W_i = (01011010)_2 = (90)_{10}$, $X_i = (01111101)_2 = (125)_{10}$, $Y_i = X_i \times W_i = (11{,}250)_{10}$

Shift amount: $S_{00} = (00110)_2$. Shifted term: $(8000)_{10}$.

$T_1 = (X_i \ll 6) + 0 = (8000)_{10}$

Error = 28.9%

Page 27: Simplification of Integrated Arithmetic (Iteration 2)

Shift amount: $S_{01} = (00100)_2$. Shifted term: $(2000)_{10}$.

$T_2 = (X_i \ll 4) + T_1 = (10000)_{10}$

Error = 11.1%

Page 28: Simplification of Integrated Arithmetic (Iteration 3)

Shift amount: $S_{02} = (00011)_2$. Shifted term: $(1000)_{10}$.

$T_3 = (X_i \ll 3) + T_2 = (11000)_{10}$

Error = 2.2%

Page 29: Simplification of Integrated Arithmetic (Iteration 4)

Shift amount: $S_{03} = (00001)_2$. Shifted term: $(250)_{10}$.

$T_4 = (X_i \ll 1) + T_3 = (11250)_{10}$

Error = 0.0%
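The four iterations above amount to a shift-and-add multiplication that can be stopped early, trading accuracy for fewer steps. A minimal sketch follows, assuming the unit may stop after a chosen number of iterations; the function names and the `iterations` parameter are hypothetical, and the slides do not say how the iteration count is selected.

```python
def approx_multiply(x, shifts, iterations):
    """Accumulate x << s over the first `iterations` shift amounts.

    With all shift amounts applied, the result equals x times the weight
    they encode; with fewer, it is an underestimate of the product.
    """
    total = 0
    for s in shifts[:iterations]:
        total += x << s  # one shift and one add per iteration
    return total


def relative_error_percent(exact, approx):
    """Relative error of approx with respect to exact, in percent."""
    return 100.0 * (exact - approx) / exact


X, W = 125, 90          # operands from the slides' example
SHIFTS = [6, 4, 3, 1]   # set-bit positions of W, most significant first
exact = X * W

for n in range(1, len(SHIFTS) + 1):
    t = approx_multiply(X, SHIFTS, n)
    print(f"T{n} = {t}, error = {relative_error_percent(exact, t):.1f}%")
```

Running the loop reproduces the partial sums 8000, 10000, 11000, and 11250 shown on the slides, with the error falling to zero once every shift amount has been used.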

Page 30: Experimental Setup

Power Model

• Technology node: 40 nm (3 metal layers)

• Synopsys, Cadence

• GPUWattch, McPAT and CACTI, Verilog

GPU Simulator

• GPGPU-Sim cycle-level simulator

• Fermi-based GTX 480, shader core frequency 1.4 GHz

• NVCC compiler, -O3

Application domains: Machine Learning, Finance, Vision, 3D Gaming, Medical Imaging, Numerical Analysis, Image Processing

Page 31: NAX Speedup Compared to NGPU

[Figure: bar chart. Y-axis: Speedup (0.0 to 3.5). Series: NAX-AFxP, NAX-FxP, NAX-FP. Annotated values: 2.0x, 2.0x, 1.2x.]

NAX-AFxP provides 1.2x speedup compared to NGPU.

Page 32: NAX Energy Saving Compared to NGPU

[Figure: bar chart. Y-axis: Energy Saving (0.0 to 8.0). Series: NAX-AFxP, NAX-FxP, NAX-FP. Annotated value: 4.8x.]

NAX-AFxP provides 4.8x energy saving compared to NGPU.

Page 33: DRAM System Power

[Figure: bar chart. Y-axis: DRAM System Power Increase (0.0 to 3.0); lower is better. Series: NAX-AFxP, NAX-FxP, NAX-FP. Annotated value: 0.7x.]

NAX-AFxP yields a 0.7x lower DRAM system power.

Page 34: Application Quality Loss

[Figure: bar chart. Y-axis: Quality Loss (0% to 100%). Series: NAX-AFxP, NAX-FxP, NAX-FP.]

Quality loss is below 10% in all applications except one.

Page 35: NAX: Near-Data Approximate Computing

Benefits over NGPU

• 4.8X Energy Saving

• 1.2X Speedup

Overhead

• 2% Area Overhead per DRAM Chip

• ≤ 10% Quality Loss

• 0.7X DRAM System Power