hybrid dot-product design for fp-enabled fpgas · soft fp part - fused multipliers 1dsp =2 18x18 =4...

34
Hybrid Dot-Product Design for FP-Enabled FPGAs Bogdan Pasca Intel ARITH 2019, June 10-12, 2019

Upload: others

Post on 06-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Hybrid Dot-Product Design for FP-Enabled FPGAs

Bogdan Pasca

Intel

ARITH 2019, June 10-12, 2019

Page 2: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Hybrid Dot-Product Design for FP-Enabled FPGAs

Bogdan Pasca

Intel

ARITH 2019, June 10-12, 2019

Page 3: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Context

FPGAs intersting neural network training accelerators

training: mostly dot-products in forward+backward propagation

current industry-standard: bfloat16 multiplications, SP reduction

bfloat16 vs SP: 2X bandwidth

Goal→ Find a dot-product implementation that:

maintains an accuracy comparable to bfloat16+SP

maximizes the dot-product density for a given FPGA

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 4: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Context

FPGAs intersting neural network training accelerators

training: mostly dot-products in forward+backward propagation

current industry-standard: bfloat16 multiplications, SP reduction

bfloat16 vs SP: 2X bandwidth

Goal→ Find a dot-product implementation that:

maintains an accuracy comparable to bfloat16+SP

maximizes the dot-product density for a given FPGA

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 5: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density

FPGA devices have a various mix of resources

increasing compute density→ make efficient use of existing mix

Focus on ”Core Logic Fabric” and VP DSP Blocks

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 6: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density

FPGA devices have a various mix of resources

increasing compute density→ make efficient use of existing mix

Focus on ”Core Logic Fabric” and VP DSP Blocks

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 7: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density - some current devices

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 8: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density - some current devices

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 9: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density - some current devices

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 10: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density - some current devices

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 11: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density - some current devices

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 12: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Background

Intel FPGAs: DSP blocks implement SP mult-add

N-element SP dot-product = N DSPsA B C D E F G HAB+CD EF+GH

AB+CD

32 32 32

32

32 32 32

32

32 32 32

32

EF+GHAB+CD+EF+GH

32 32 32

32

soft-logic-only solutionbfloat16 multiplier→ 2/DSPSP FP adder 1→ 1/DSPN-element dot product: CDSP = N/2+N−1adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)solution is too large

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 13: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Background

Intel FPGAs: DSP blocks implement SP mult-add

N-element SP dot-product = N DSPsA B C D E F G HAB+CD EF+GH

AB+CD

32 32 32

32

32 32 32

32

32 32 32

32

EF+GHAB+CD+EF+GH

32 32 32

32

soft-logic-only solutionbfloat16 multiplier→ 2/DSPSP FP adder 1→ 1/DSPN-element dot product: CDSP = N/2+N−1

adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)solution is too large

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 14: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Background

Intel FPGAs: DSP blocks implement SP mult-add

N-element SP dot-product = N DSPsA B C D E F G HAB+CD EF+GH

AB+CD

32 32 32

32

32 32 32

32

32 32 32

32

EF+GHAB+CD+EF+GH

32 32 32

32

soft-logic-only solutionbfloat16 multiplier→ 2/DSPSP FP adder 1→ 1/DSPN-element dot product: CDSP = N/2+N−1adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)

solution is too large

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 15: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Background

Intel FPGAs: DSP blocks implement SP mult-add

N-element SP dot-product = N DSPsA B C D E F G HAB+CD EF+GH

AB+CD

32 32 32

32

32 32 32

32

32 32 32

32

EF+GHAB+CD+EF+GH

32 32 32

32

soft-logic-only solutionbfloat16 multiplier→ 2/DSPSP FP adder 1→ 1/DSPN-element dot product: CDSP = N/2+N−1adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)solution is too large

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 16: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

How do we solve this?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 17: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Our implementation

dot−prodct

soft−logic

of the dot−product

hard FP part

α

A1..α

β

P

ACC

βα

Pb

Bα+1..α+βB1..α Aα+1..α+β

Pl

Pg

N = α+β

CALM = f (α,w)

CDSP = α/4+β

Objective: CALM/CDSP ≈ device ALM/DSP ratio

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 18: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Our implementation

dot−prodct

soft−logic

of the dot−product

hard FP part

α

A1..α

β

P

ACC

βα

Pb

Bα+1..α+βB1..α Aα+1..α+β

Pl

Pg

N = α+β

CALM = f (α,w)

CDSP = α/4+β

Objective: CALM/CDSP ≈ device ALM/DSP ratio

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 19: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Hard FP part

ACCA3 B3Pl3232 3232 32323232

32 32

A2 B2

32

A3B

3+A

CC

A2B

2+(A

3B3+

AC

C)

B1A1A0 B0

32 32

32 32

32

A0B0+A1B1Pg

Pb

P

SP accumulation integrated

Pg will merge with the logic-based dot product

Pl recirculated, added with Pb using spare adder

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 20: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Soft FP part

dot2 dot2 dot2 dot2dot2 dot2

23

22

21

20

19

18

17

16

15

14

13

12

11

10

9

8

7

6

5

4

3

2

1

+ +

+

CONV

18x18two

+

18x18two

+

18x18two

+

NORM

A/B 0-1 6-7 8-9 10-114-52-3

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 21: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Soft FP part

dot2 dot2 dot2 dot2dot2 dot2

23

22

21

20

19

18

17

16

15

14

13

12

11

10

9

8

7

6

5

4

3

2

1

+ +

+

CONV

18x18two

+

18x18two

+

18x18two

+

NORM

A/B 0-1 6-7 8-9 10-114-52-3

FPDSP

P

A/B 13A/B 12

ACCA/B 15A/B 14

FPDSP

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 22: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Soft FP part

dot2 dot2 dot2 dot2dot2 dot2

...

8

w

...

8

w

...

8

w

...

8

w

...

8

w

...

8

w

23

22

21

20

19

18

17

16

15

14

13

12

11

10

9

8

7

6

5

4

3

2

1

+ +

+

CONV

18x18two

+

18x18two

+

18x18two

+

NORM

A/B 0-1 6-7 8-9 10-114-52-3

FPDSP

P

A/B 13A/B 12

ACCA/B 15A/B 14

FPDSP

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 23: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Soft FP part - fused

Multipliers

1DSP = 2×18x18 = 4×8x8 mantissa multipliers (+ALMs)

skip multiplier normalization

RN→ RZw

extend exponent to avoid overflow/underlow

Adders(except first layer inputs) operate on 2’s complement mantissas

mantissa grows by 1 (+1 optional) bit(s) every stage

mantissa format changes from (SM, 1, wF)→ (2C, 1+1+L, w+L)

after final adder, normalization converts to SP

intermediary normalization may be introduced for large α

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 24: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Soft FP part - fused

Multipliers

1DSP = 2×18x18 = 4×8x8 mantissa multipliers (+ALMs)

skip multiplier normalization

RN→ RZw

extend exponent to avoid overflow/underlow

Adders(except first layer inputs) operate on 2’s complement mantissas

mantissa grows by 1 (+1 optional) bit(s) every stage

mantissa format changes from (SM, 1, wF)→ (2C, 1+1+L, w+L)

after final adder, normalization converts to SP

intermediary normalization may be introduced for large α

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 25: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Soft FP part - fused

Multipliers

1DSP = 2×18x18 = 4×8x8 mantissa multipliers (+ALMs)

skip multiplier normalization

RN→ RZw

extend exponent to avoid overflow/underlow

Adders(except first layer inputs) operate on 2’s complement mantissas

mantissa grows by 1 (+1 optional) bit(s) every stage

mantissa format changes from (SM, 1, wF)→ (2C, 1+1+L, w+L)

after final adder, normalization converts to SP

intermediary normalization may be introduced for large α

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 26: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Accuracy (Average)

w - knob to control the accuracyec - exponents centered, es - the exponent spanec = 0,es = 10 - inputs generated in (2−10 ·2,210 ·2)

Table: Average relative error comparison between the proposed hybriddot-product and a typical AI bfloat16+SP implementation for n = 16, α = 12,β = 4, βg = 2, βb = 2

Config Param Proposed AI

ec = 0, es = 5w = 7 1.287601e-02

4.570449e-03w = 8 6.172194e-03w = 9 2.935275e-03

ec = 0, es = 10w = 7 7.934867e-03

3.402314e-03w = 8 4.120781e-03w = 9 1.864206e-03

ec = 0, es = 20w = 7 6.672454e-03

2.996574e-03w = 8 3.161355e-03w = 9 1.588372e-03

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 27: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density

rdot = CDSP/ALMs

Config Param ALMs DSPs rdot

n = 16 w = 7 10307

147α = 12,β = 4 w = 8 1075 153βg = 2,βb = 2 w = 9 1141 163

n = 16 w = 7 8638.5

102α = 10,β = 6 w = 8 894 106βg = 4,βb = 2 w = 9 948 112

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 28: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Density

rdot = CDSP/ALMs

Config Param ALMs DSPs rdot

n = 16 w = 7 10307

147α = 12,β = 4 w = 8 1075 153βg = 2,βb = 2 w = 9 1141 163

n = 16 w = 7 8638.5

102α = 10,β = 6 w = 8 894 106βg = 4,βb = 2 w = 9 948 112

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 29: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Open questions

reduction tree topology?

accuracy?

resource ratio - account for plumbing?

integration/syncronization?

design/portability?

routability?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 30: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Open questions

reduction tree topology?

accuracy?

resource ratio - account for plumbing?

integration/syncronization?

design/portability?

routability?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 31: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Open questions

reduction tree topology?

accuracy?

resource ratio - account for plumbing?

integration/syncronization?

design/portability?

routability?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 32: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Open questions

reduction tree topology?

accuracy?

resource ratio - account for plumbing?

integration/syncronization?

design/portability?

routability?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 33: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Open questions

reduction tree topology?

accuracy?

resource ratio - account for plumbing?

integration/syncronization?

design/portability?

routability?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs

Page 34: Hybrid Dot-Product Design for FP-Enabled FPGAs · Soft FP part - fused Multipliers 1DSP =2 18x18 =4 8x8 mantissa multipliers (+ALMs) skip multiplier normalization RN!RZw extend exponent

Open questions

reduction tree topology?

accuracy?

resource ratio - account for plumbing?

integration/syncronization?

design/portability?

routability?

Bogdan Pasca Hybrid Dot-Product Design for FP-Enabled FPGAs