
Page 1: Ernest Jamro Kat. Elektroniki AGH, Kraków Dep. Of Electronics, AGH

Hardware Implementation of Algorithms

Multipliers, convolvers

Ernest Jamro

Department of Electronics, AGH, Kraków

Page 2: Multiplication

        1 0 0 1
×       1 0 1 1
---------------
        1 0 0 1
      1 0 0 1
    0 0 0 0
+ 1 0 0 1
---------------
  1 1 0 0 0 1 1

9 × 11 = 99
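The same shift-and-add procedure can be sketched in Python as a minimal model for unsigned n-bit operands (the function names are illustrative, not from the slides). Row j of the layout is A shifted left by j when bit j of B is 1, and zero otherwise; the product is the sum of the rows.

```python
def partial_products(a: int, b: int, n: int = 4) -> list[int]:
    """Row j of the long multiplication: a shifted left by j if bit j of b is 1, else 0."""
    return [(a << j) if (b >> j) & 1 else 0 for j in range(n)]


def shift_add_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n-bit multiplication as the sum of the shifted partial products."""
    return sum(partial_products(a, b, n))


for j, row in enumerate(partial_products(0b1001, 0b1011)):
    print(f"row {j}: {row:07b}")
print(f"sum:   {shift_add_multiply(0b1001, 0b1011):07b}")  # 1100011 = 99
```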

Page 3: Parallel Array Multipliers

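[Figure: 4 × 4 parallel array multiplier built from 16 cells: the first row uses AND gates only (&), the other three rows combine an AND gate with a full adder (& +); operand bits a0 to a3 and b0 to b3, product bits p0 to p7.]

Each cell forms the partial-product bit $a_i b_j$ and adds it to the incoming sum and carry:

$$2c_k + s_l = s_{l-1} + a_i b_j + c_{k-1}$$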
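A bit-level Python sketch of such an array, assuming that each row passes its carry horizontally from cell to cell (c_{k-1} to c_k), as the cell equation suggests. It is a behavioural model for checking the arithmetic, not a netlist; the names are illustrative.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """Adder part of one '& +' cell: returns (s, c) with 2*c + s = x + y + cin."""
    total = x + y + cin
    return total & 1, total >> 1


def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Bit-level model of an n x n unsigned array multiplier with ripple-carry rows."""
    abit = [(a >> i) & 1 for i in range(n)]
    bbit = [(b >> j) & 1 for j in range(n)]
    row = [abit[i] & bbit[0] for i in range(n)]   # first row: AND gates only
    product = [row[0]]                            # p0 leaves the array directly
    acc = row[1:] + [0]                           # remaining sum bits move one column up
    for j in range(1, n):
        carry, row = 0, []
        for i in range(n):
            s, carry = full_adder(acc[i], abit[i] & bbit[j], carry)
            row.append(s)
        product.append(row[0])                    # lowest sum bit of row j is p_j
        acc = row[1:] + [carry]                   # carry out becomes the row's top bit
    product.extend(acc)                           # last row supplies p_n .. p_(2n-1)
    return sum(bit << k for k, bit in enumerate(product))


print(array_multiply(9, 11))   # 99
```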

Page 4: FPGA, built-in multiplier DSP48

Page 5: Sequential Multiplier

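[Figure: sequential multiplier. A register holds operand A (A3 to A0); operand B (B3 to B0) is held in a PISO (parallel-in, serial-out) shift register; an adder made of FA and HA cells adds the partial products into a register of flip-flops (FF); the product bits Pn,0 to Pn,3 are collected in a shift register.]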
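A clock-by-clock Python sketch of the shift-and-add principle behind this circuit, assuming B leaves the PISO register least significant bit first and one low product bit is finalised per clock; register widths and bit order in the actual figure may differ.

```python
def sequential_multiply(a: int, b: int, n: int = 4) -> tuple[int, list[int]]:
    """Clock-by-clock model of an n-bit shift-and-add sequential multiplier.

    Returns (product, low_bits) where low_bits are the product bits that
    were shifted out, one per clock, least significant first.
    """
    acc = 0                          # accumulator register: n bits plus a carry bit
    low_bits = []
    for clk in range(n):
        if (b >> clk) & 1:           # next bit of B from the PISO register
            acc += a                 # adder adds the partial product A
        low_bits.append(acc & 1)     # this bit can no longer change: shift it out
        acc >>= 1                    # align the accumulator for the next clock
    low = sum(bit << k for k, bit in enumerate(low_bits))
    return (acc << n) | low, low_bits


print(sequential_multiply(0b1001, 0b1011))   # (99, [1, 1, 0, 0])
```

Only one n-bit adder is needed, at the cost of n clock cycles per product.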

Page 6: Wallace Tree Multiplier (with Carry-Save Adders)

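In FPGAs, carry-save adders (CSA) are not recommended: the dedicated fast carry chain already makes ordinary ripple-carry adders fast, while a CSA tree costs extra logic and routing.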
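The carry-save idea behind the Wallace tree can be sketched at word level in Python: a 3:2 carry-save adder turns three operands into a sum word and a carry word without propagating carries, and layers of them reduce the partial products to two rows for one final carry-propagate adder. This row-level model groups whole partial-product rows rather than individual bit columns, so it illustrates the reduction rather than an exact Wallace layout; the names are illustrative.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on whole words: x + y + z == s + c."""
    s = x ^ y ^ z                                # per-bit full-adder sum, no carry ripple
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit majority, moved one column up
    return s, c


def wallace_multiply(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Reduce n partial products with CSA layers, then add the last two rows.

    Returns (product, number_of_csa_layers).
    """
    rows = [(a << j) if (b >> j) & 1 else 0 for j in range(n)]
    layers = 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3               # rows that form complete groups of three
        next_rows = []
        for k in range(0, full, 3):
            next_rows.extend(csa(*rows[k:k + 3]))
        next_rows.extend(rows[full:])            # leftover rows pass to the next layer
        rows = next_rows
        layers += 1
    return sum(rows), layers                     # final carry-propagate addition


print(wallace_multiply(200, 201))   # (40200, 4)
```

For 8 partial products the row count shrinks 8, 6, 4, 3, 2, so four CSA layers precede the single carry-propagate adder.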

Page 7: Multiplication of Signed Numbers

Sign-magnitude:

Multiply the magnitudes as standard unsigned numbers.

Sign = Sign1 XOR Sign2
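A short Python sketch of the sign-magnitude rule, representing a number as a (sign, magnitude) pair with sign 1 meaning negative (this encoding is an assumption for illustration):

```python
def sign_magnitude_multiply(x: tuple[int, int], y: tuple[int, int]) -> tuple[int, int]:
    """Multiply numbers given as (sign, magnitude) pairs, sign 1 meaning negative."""
    sign1, mag1 = x
    sign2, mag2 = y
    return sign1 ^ sign2, mag1 * mag2   # Sign = Sign1 XOR Sign2; magnitudes as unsigned


print(sign_magnitude_multiply((1, 3), (0, 5)))   # (1, 15), i.e. -3 * 5 = -15
```

A zero magnitude can come out with sign 1 (negative zero), a known property of this representation.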

Two's complement:

$$A = -a_{N-1}2^{N-1} + \sum_{i=0}^{N-2} a_i 2^i$$

C. R. Baugh and B. A. Wooley, “A two’s complement parallel array multiplication algorithm,” IEEE Trans. Comput., vol. C-22, pp. 1045–1047, Dec. 1973.

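$$A \cdot B = a_{N-1}b_{N-1}2^{2N-2} + \left(\sum_{i=0}^{N-2} a_i 2^i\right)\left(\sum_{j=0}^{N-2} b_j 2^j\right) - 2^{N-1}\sum_{i=0}^{N-2} a_{N-1} b_i 2^i - 2^{N-1}\sum_{i=0}^{N-2} a_i b_{N-1} 2^i$$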

$$-2^{N-1}\sum_{i=0}^{N-2} a_{N-1} b_i 2^i = 2^{N-1}\sum_{i=0}^{N-2} \overline{a_{N-1} b_i}\,2^i + 2^{N-1} - 2^{2N-2}$$

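$$A \cdot B = a_{N-1}b_{N-1}2^{2N-2} + \sum_{i=0}^{N-2}\sum_{j=0}^{N-2} a_i b_j 2^{i+j} + 2^{N-1}\sum_{i=0}^{N-2}\left(\overline{a_{N-1} b_i} + \overline{a_i b_{N-1}}\right)2^i + 2^{N} - 2^{2N-1}$$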

$$(a_1+a_2)(b_1+b_2) = a_1b_1 + a_1b_2 + a_2b_1 + a_2b_2$$
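The final Baugh-Wooley form can be checked with a short Python sketch (names are illustrative). It adds the ordinary AND terms, the complemented (NAND) cross terms that contain exactly one sign bit, and the constants $2^N$ and $2^{2N-1}$, using $-2^{2N-1} \equiv 2^{2N-1} \pmod{2^{2N}}$, then reads the 2N-bit result back as a two's complement number.

```python
def to_bits(x: int, n: int) -> list[int]:
    """n-bit two's complement digits of x, least significant first."""
    return [(x >> i) & 1 for i in range(n)]


def baugh_wooley(x: int, y: int, n: int = 4) -> int:
    """Signed n x n multiplication using only additions of non-negative terms."""
    a, b = to_bits(x, n), to_bits(y, n)
    total = (a[n - 1] & b[n - 1]) << (2 * n - 2)          # a_{N-1} b_{N-1} 2^(2N-2)
    for i in range(n - 1):
        for j in range(n - 1):
            total += (a[i] & b[j]) << (i + j)             # ordinary AND cells
    for i in range(n - 1):
        total += (1 - (a[n - 1] & b[i])) << (n - 1 + i)   # NAND cell: not(a_{N-1} b_i)
        total += (1 - (a[i] & b[n - 1])) << (n - 1 + i)   # NAND cell: not(a_i b_{N-1})
    total += (1 << n) + (1 << (2 * n - 1))                # 2^N - 2^(2N-1) == 2^N + 2^(2N-1) mod 2^(2N)
    total &= (1 << (2 * n)) - 1                           # keep the 2N product bits
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(baugh_wooley(-3, 5))   # -15
```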

Page 8: Two's Complement Multiplication

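[Figure: 4 × 4 two's complement array multiplier with the same structure as the unsigned array; the six cells that multiply a sign bit (a3 or b3) by a non-sign bit use NAND (!&) instead of AND, and a constant 1 is fed into the array; operand bits a0 to a3 and b0 to b3, product bits p0 to p7.]

Cell equation:

$$s_{l-1} + a_i b_j + c_{k-1} = 2c_k + s_l$$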

Page 9: Reduced-Width Multiplier

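[Figure: the 4 × 4 unsigned array multiplier (& and & + cells, operands a0 to a3 and b0 to b3, product bits p0 to p7) with a truncation line marking the part of the array that is removed in the reduced-width version.]

Cell equation:

$$s_{l-1} + a_i b_j + c_{k-1} = 2c_k + s_l$$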
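The cost of truncation can be measured with a small Python sketch, assuming the truncation line removes every partial-product bit whose column i + j is below k (here k = 4, the lower half of a 4 × 4 multiplier); the figure may place the line differently, and the names are illustrative.

```python
def truncated_product(a: int, b: int, n: int, k: int) -> int:
    """Keep only the partial-product bits a_i*b_j in columns i + j >= k."""
    return sum(
        (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
        for i in range(n)
        for j in range(n)
        if i + j >= k
    )


def truncation_error_stats(n: int = 4, k: int = 4) -> tuple[int, float]:
    """Maximum and mean of (exact product - truncated product) over all n-bit inputs."""
    errors = [a * b - truncated_product(a, b, n, k)
              for a in range(1 << n) for b in range(1 << n)]
    return max(errors), sum(errors) / len(errors)


print(truncation_error_stats())   # (49, 12.25)
```

The error is never negative (dropped bits can only lower the result), so the truncated product is biased low; that bias is what a compensation scheme targets.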

Page 10: Truncation Error Compensation

[Figure: reduced-width array multiplier with truncation error compensation; fewer & and & + cells remain, and only product bits p3 to p7 are produced.]

Cell equation:

$$s_{l-1} + a_i b_j + c_{k-1} = 2c_k + s_l$$
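One common compensation scheme is a constant correction: add the expected value of the dropped bits, rounded to a weight that the remaining columns can absorb. The sketch below uses that scheme (the figure's own method may differ, for example by keeping an extra column), assuming uniformly distributed independent n-bit inputs, so each dropped bit a_i·b_j is 1 with probability 1/4; the names are illustrative.

```python
def truncated_product(a: int, b: int, n: int, k: int) -> int:
    """Keep only the partial-product bits a_i*b_j in columns i + j >= k."""
    return sum((((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
               for i in range(n) for j in range(n) if i + j >= k)


def correction_constant(n: int, k: int) -> int:
    """Expected value of the dropped bits, rounded to a multiple of 2**k."""
    expected = sum(1 << (i + j) for i in range(n) for j in range(n) if i + j < k) / 4
    return round(expected / (1 << k)) << k


def mean_error(n: int, k: int, correction: int) -> float:
    """Mean of (exact - (truncated + correction)) over all pairs of n-bit inputs."""
    errors = [a * b - truncated_product(a, b, n, k) - correction
              for a in range(1 << n) for b in range(1 << n)]
    return sum(errors) / len(errors)


c = correction_constant(4, 4)
print(c, mean_error(4, 4, 0), mean_error(4, 4, c))   # 16 12.25 -3.75
```

For n = k = 4 the bias moves from 12.25 to -3.75, which is below half a unit of the lowest kept column (2^4 = 16).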

Page 11: Constant Coefficient Multiplier

Look-up table (LUT): the input is used as the LUT address, and the stored data is the product.

Example: Y = 5·X

Address   Data
0         0
1         5
2         10
3         15
...       ...

Page 12: LUT-Based Multiplier, Constant Coefficient C

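$$Y = C A = C A(0{:}3) + 2^4\, C A(4{:}7)$$

[Figure: the 8-bit input is split into two 4-bit halves that address LUT A and LUT B; each LUT returns a 12-bit partial product. The 4 least significant bits of the low-half product go straight to the output, its upper 8 bits are added to the 12-bit high-half product, and the adder output together with those 4 bits forms the 16-bit output.]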
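A minimal Python sketch of the LUT-based constant multiplier for an 8-bit input, assuming both 4-bit halves use tables with the same contents C·x (in hardware there are two copies so that both look-ups happen in parallel); build_lut and lut_multiply are illustrative names.

```python
def build_lut(c: int, addr_bits: int = 4) -> list[int]:
    """Constant-coefficient table: the entry at address x holds c * x."""
    return [c * x for x in range(1 << addr_bits)]


def lut_multiply(a: int, lut: list[int]) -> int:
    """8-bit input times the constant: C*A(0:3) + 2**4 * C*A(4:7)."""
    low, high = a & 0xF, (a >> 4) & 0xF
    return lut[low] + (lut[high] << 4)


lut5 = build_lut(5)
print(lut5[:4])                  # [0, 5, 10, 15], the Y = 5*X table
print(lut_multiply(200, lut5))   # 1000
```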

Page 13: Different ROM Sizes (input data width = 6 bits)

[Figure: three variants, a), b) and c), of a constant multiplier for a 6-bit input, each built from ROMs (Mem16×1 in a) and b), Mem32×1 in c)) followed by an adder.]

Page 14: Heterogeneous Memory Usage

Memory configurations available in Virtex: 16×1 and 32×1 (distributed RAM in CLB LUTs), 4k×1, 2k×2, 1k×4, 512×8, 256×16 (BlockRAM).

Input data and coefficient width = 14

[Figure: a 14-bit input split into slices that address memories of different shapes (256×16, 32×1, 16×1); the memory outputs are summed by an adder into a 28-bit result.]
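The arithmetic behind splitting one input over memories of different sizes can be sketched by generalising the two-LUT scheme to slices of unequal width. The 8 + 4 + 2 partition of a 14-bit input used below is a hypothetical example, not the partition from the figure, and the tables are rebuilt on each call only to keep the sketch short.

```python
def split_lut_multiply(a: int, c: int, widths: list[int]) -> int:
    """Multiply input a by constant c using one table per input slice.

    Slice k is widths[k] bits wide and addresses a table of 2**widths[k]
    entries holding c * slice; the table outputs are shifted to their
    bit position and summed, as by the adder in the figure.
    """
    result, shift = 0, 0
    for w in widths:
        table = [c * x for x in range(1 << w)]          # contents of one memory
        result += table[(a >> shift) & ((1 << w) - 1)] << shift
        shift += w
    return result


print(split_lut_multiply(12345, 9999, [8, 4, 2]))   # 123437655
```

For a K-bit constant, a w-bit slice needs 2^w entries of K + w bits, so the choice of slice widths trades memory depth against the number of memories and adder inputs.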

Page 15: Replacing Distributed RAM with BRAM

[Figure: the same 14-bit-input constant multiplier with part of the memory moved from CLB distributed RAM to BRAM (256×16); the CLB part is built from LUT16×1 and LUT2×1 elements, and an adder combines the outputs.]

Page 16: Area [CLB] for Different Input and Coefficient Widths K

[Chart: area versus input and coefficient width K (4 to 24); series: CLBs only (scale 1:10), number of BRAMs, and equivalent cost of one BRAM.]

Page 17: Multiplierless Multiplication (MM)

• Binary representation, example B = 14 = 1110₂:

M = AB = (A<<1) + (A<<2) + (A<<3)

• Sub-structure sharing (SS), example B = 27 = 11011₂:

tmp = A + (A<<1)

M = AB = tmp + (tmp<<3)

• Canonic signed digit (CSD):

digit set {0, 1, -1} (0: no operation, 1: addition, -1, written 1̄: subtraction)

example: B = 7 = 111₂, in CSD B = 1001̄

M = B·A = (A<<2) + (A<<1) + A (binary), M = (A<<3) - A (CSD)
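The three examples above can be written directly as shift-and-add Python functions (the names are illustrative); counting the + and - operators gives the adder cost of each form.

```python
def mul14_binary(a: int) -> int:
    """B = 14 = 1110b: one shifted copy of A per 1 bit, two adders."""
    return (a << 1) + (a << 2) + (a << 3)


def mul27_ss(a: int) -> int:
    """B = 27 = 11011b with sub-structure sharing: two adders instead of three."""
    tmp = a + (a << 1)           # tmp = 3A, the shared pattern 11b
    return tmp + (tmp << 3)      # 3A + 24A = 27A


def mul7_binary(a: int) -> int:
    """B = 7 = 111b in binary: two adders."""
    return (a << 2) + (a << 1) + a


def mul7_csd(a: int) -> int:
    """B = 7 = 100(-1) in CSD: a single subtractor."""
    return (a << 3) - a


print(mul14_binary(3), mul27_ss(3), mul7_binary(3), mul7_csd(3))   # 42 81 21 21
```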

Page 18: Binary vs. CSD

Modified CSD (MCSD): insert the digit 1̄ (-1) only if it reduces the total number of operations.

Coefficient  Binary (TC)   CSD       MCSD
3            11            101̄       11
7            111           1001̄      1001̄
11           1011          101̄01̄     1011
23           10111         101̄001̄    11001̄

(TC: two's complement; 1̄ denotes the digit -1.)

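Standard CSD algorithm (flowchart):

1. Start: i = 0, $c_0 = 0$, $b_n = b_{n-1}$.
2. $c_{i+1} = b_{i+1}b_i \lor b_i c_i \lor b_{i+1}c_i$
3. $d_i = b_i + c_i - 2c_{i+1}$
4. i = i + 1; while i < n, return to step 2; otherwise stop.

Modified CSD algorithm (flowchart):

1. Start: i = 0, carry = false.
2. If ($b_i$ = 1 and carry) or ($b_i$ = 0 and not carry), then $d_i = 0$.
3. Otherwise a test on Q(i, j) over the following bits j = i + 1, ..., w (Q(i, j) < 2, with a special case for the sign bit when Y < 0 and j = w) decides between $d_i = 1$ (carry = false) and $d_i = -1$ (carry = true).
4. i = i + 1, and repeat while i ≤ w.
5. At the end, if carry and B > 0, a final digit $d_i = 1$ is set.
6. Stop.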
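The standard flowchart translates almost line by line into Python. This sketch handles non-negative coefficients only and adds one leading zero bit so that the final carry is absorbed; in the printed strings '-' stands for the digit 1̄ (-1). The function names are illustrative.

```python
def csd_digits(b: int) -> list[int]:
    """Standard CSD recoding of b >= 0, digits least significant first."""
    n = b.bit_length() + 1                       # one leading 0 bit absorbs the last carry
    bits = [(b >> i) & 1 for i in range(n)]
    bits.append(bits[n - 1])                     # b_n = b_(n-1)
    c, digits = 0, []                            # c_0 = 0
    for i in range(n):
        c_next = (bits[i + 1] & bits[i]) | (bits[i] & c) | (bits[i + 1] & c)
        digits.append(bits[i] + c - 2 * c_next)  # d_i = b_i + c_i - 2 c_(i+1)
        c = c_next
    return digits


def csd_string(b: int) -> str:
    """Most significant digit first, '-' for -1, leading zeros removed."""
    symbols = {1: "1", 0: "0", -1: "-"}
    text = "".join(symbols[d] for d in reversed(csd_digits(b)))
    return text.lstrip("0") or "0"


for coeff in (3, 7, 11, 23):
    print(coeff, csd_string(coeff))   # 3 10-, 7 100-, 11 10-0-, 23 10-00-
```

These strings match the CSD column of the table; the modified (MCSD) variant additionally keeps the plain binary digits where the 1̄ does not save an operation, as for 3 and 11.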

Page 19: Application of Different MM Techniques

[Chart: usage share (0% to 100%) of the MM techniques BR (binary representation), CSD, SS and CSD-SS versus coefficient width K = 3 to 12.]

Page 20: The MM cost for different coefficients

[Chart: MM cost in CLBs (0 to 18) versus coefficient value (0 to 250).]

Page 21: FIR Filters

$$y(i) = \sum_{k=0}^{N-1} h(k)\,x(i-k)$$

[Figure: direct-form FIR filter. A delay module (a chain of z^-1 registers) turns the input x(i) into x(i), x(i-1), ..., x(i-N+1); an arithmetic module multiplies them by the coefficients and adds the products to give the output y(i).]
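A minimal Python sketch of the direct form for integer samples, assuming x(i) = 0 before the first sample; the list taps plays the role of the delay module and the sum that of the arithmetic module (the names are illustrative).

```python
def fir_direct(x: list[int], h: list[int]) -> list[int]:
    """Direct-form FIR: y(i) = sum over k of h(k) * x(i - k), with x(i) = 0 for i < 0."""
    taps = [0] * len(h)                  # delay module: taps[k] holds x(i - k)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]      # shift the z^-1 chain by one sample
        y.append(sum(hk * xk for hk, xk in zip(h, taps)))   # arithmetic module
    return y


print(fir_direct([1, 0, 0, 0], [5, 13, 5]))   # [5, 13, 5, 0]: the impulse response is h
```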

Page 22: FIR Filter (Transposed Form)

$$y(i) = \sum_{k=0}^{N-1} h(k)\,x(i-k)$$

[Figure: transposed FIR filter. The input x(i) is multiplied by all coefficients h(0), h(1), ..., h(N-1) at the same time; the products are accumulated through a chain of adders separated by z^-1 registers, and the last adder gives y(i).]
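In the transposed form every coefficient sees the current sample and the partial sums travel through the registers. This Python sketch models one register update per integer sample and yields the same y(i) as the defining equation; the names are illustrative.

```python
def fir_transposed(x: list[int], h: list[int]) -> list[int]:
    """Transposed-form FIR: each sample is multiplied by every h(k) at once and the
    partial sums move one z^-1 register towards the output per sample."""
    n = len(h)
    regs = [0] * (n - 1)            # z^-1 registers between the adders
    y = []
    for sample in x:
        products = [hk * sample for hk in h]           # all multipliers share x(i)
        y.append(products[0] + (regs[0] if n > 1 else 0))
        # regs[k] receives h(k+1)*x(i) plus the old content of regs[k+1]
        # (zero beyond the last register).
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < n - 1 else 0)
                for k in range(n - 1)]
    return y


print(fir_transposed([1, 0, 0, 0], [5, 13, 5]))   # [5, 13, 5, 0]
```

Because every multiplier is driven by the same sample, the critical path is one multiplier plus one adder regardless of N.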

Page 23: 2D FIR Filter

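[Figure: 3 × 3 two-dimensional FIR filter. Two line buffers (Line Buf.) and z^-1 registers build a 3 × 3 window of pixels $a_{y+r,x+c}$ (r, c = 0, 1, 2) from the raster-scan input; each pixel is multiplied by its weight $w_{r,c}$, and the nine products are summed to give the output pixel $b_{y+1,x+1}$.]

$$b_{y+1,x+1} = \sum_{r=0}^{2}\sum_{c=0}^{2} w_{r,c}\, a_{y+r,x+c}$$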
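A behavioural Python sketch of the line-buffer structure, assuming raster-scan input (one pixel per clock, row by row), 3 × 3 weights, each weight w[r][c] paired with pixel a[y+r][x+c] (an assumption about the figure's indexing), and output only where the whole window lies inside the image. The two deques stand in for the line buffers and the inner lists for the z^-1 registers; the names are illustrative.

```python
from collections import deque


def fir2d_stream(image: list[list[int]], w: list[list[int]]) -> list[list[int]]:
    """3x3 FIR on a raster-scan pixel stream with two line buffers.

    Returns out[y][x] = b_(y+1, x+1) = sum over r, c of w[r][c] * a[y + r][x + c].
    """
    height, width = len(image), len(image[0])
    line1 = deque([0] * width)                  # line buffer: previous row
    line0 = deque([0] * width)                  # line buffer: row before that
    window = [[0, 0, 0] for _ in range(3)]      # window[r][c] holds a[y + r][x + c]
    out = [[0] * (width - 2) for _ in range(height - 2)]
    for r in range(height):
        for c in range(width):
            pixel = image[r][c]                 # a[r][c] enters the filter
            mid = line1.popleft()               # a[r - 1][c] leaves line buffer 1
            line1.append(pixel)
            top = line0.popleft()               # a[r - 2][c] leaves line buffer 0
            line0.append(mid)
            for reg, value in zip(window, (top, mid, pixel)):
                reg.pop(0)                      # shift the z^-1 registers of each row
                reg.append(value)
            if r >= 2 and c >= 2:               # window fully inside the image
                out[r - 2][c - 2] = sum(w[i][j] * window[i][j]
                                        for i in range(3) for j in range(3))
    return out
```

Only two image rows are stored, which is the point of the line-buffer architecture: the memory grows with the image width, not with the image size.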

Page 24: Examples of 2D FIR Filters

Low-pass       Sobel          Laplace
 1  2  1       -1 -2 -1        1  1  1
 2  4  2        0  0  0        1 -8  1
 1  2  1        1  2  1        1  1  1
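A direct (non-streaming) Python sketch that applies the three kernels to two small test images; filter3x3 and the images are illustrative. Kernels whose coefficients sum to zero (Sobel, Laplace) give 0 on a constant image, while the low-pass kernel scales it by 16, the sum of its coefficients, so a 4-bit right shift normalises it.

```python
LOW_PASS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
SOBEL = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACE = [[1, 1, 1], [1, -8, 1], [1, 1, 1]]


def filter3x3(image: list[list[int]], w: list[list[int]]) -> list[list[int]]:
    """Valid-region 3x3 filter: out[y][x] = sum over r, c of w[r][c] * image[y + r][x + c]."""
    return [[sum(w[r][c] * image[y + r][x + c] for r in range(3) for c in range(3))
             for x in range(len(image[0]) - 2)]
            for y in range(len(image) - 2)]


flat = [[7] * 5 for _ in range(5)]                           # constant image
edge = [[0] * 5 if y < 2 else [10] * 5 for y in range(5)]     # dark rows above bright rows
print(filter3x3(flat, LOW_PASS)[0])   # [112, 112, 112]: 16 * 7
print(filter3x3(flat, LAPLACE)[0])    # [0, 0, 0]
print(filter3x3(edge, SOBEL))         # [[40, 40, 40], [40, 40, 40], [0, 0, 0]]
```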

Page 25: FIR Filter, N = 2, with LUT-Based Multipliers

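[Figure: two-tap (N = 2) FIR filter built from LUT-based constant multipliers. The 8-bit input and its z^-1-delayed copy are each split into two 4-bit halves that address LUTL0, LUTM0 (multiplier 1) and LUTL1, LUTM1 (multiplier 2); the 12-bit LUT outputs are combined by Adder0, Adder1 and Adder2 into an 18-bit result. Two arrangements are shown: one adder per multiplier followed by a final adder, and a shared adders block.]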

Page 26: FIR with Arithmetic in a Different Order: (Parallel) Distributed Arithmetic

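$$\sum_{i=0}^{N-1} h_i a_i = \sum_{i=0}^{N-1} h_i \sum_{j=0}^{L-1} 2^j a_{i,j} = \sum_{j=0}^{L-1} 2^j \sum_{i=0}^{N-1} h_i a_{i,j}$$

Here $h_i$ is a coefficient, $a_i$ an input and $a_{i,j}$ bit j of input $a_i$; the inner sum of the last form combines the same bit of different inputs.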

Page 27: Distributed Arithmetic

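[Figure: distributed arithmetic. For each bit position j, the bits $a_{0,j}, a_{1,j}, \ldots, a_{N-1,j}$ of all N inputs address a LUT that returns $S_j$; the values $S_0$, $S_1 \ll 1$, ..., $S_{L-1} \ll (L-1)$ are added.]

$$S_j = \sum_{i=0}^{N-1} h_i a_{i,j}, \qquad \sum_{j=0}^{L-1} 2^j S_j = \sum_{j=0}^{L-1} 2^j \sum_{i=0}^{N-1} h_i a_{i,j}$$

LUT width: $W_{DAC} = K + \log_2(N+1)$ for distributed arithmetic, compared with $W_{LC} = K + W_{IN}$ for the LUT-based constant multiplier.

Because each LUT is addressed by input bits of the same weight, the LUT widths are smaller.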
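A minimal distributed-arithmetic sketch in Python, assuming unsigned L-bit inputs (signed inputs would need the sign-bit slice to be subtracted). The loop over bit positions stands for the L identical LUTs that the parallel structure evaluates at once; da_lut and da_dot are illustrative names.

```python
def da_lut(h: list[int]) -> list[int]:
    """LUT contents: entry m is the sum of h_i over the inputs i whose bit is set in m."""
    return [sum(hi for i, hi in enumerate(h) if (m >> i) & 1) for m in range(1 << len(h))]


def da_dot(a: list[int], h: list[int], bits: int) -> int:
    """Sum of h_i * a_i for unsigned `bits`-wide inputs a_i, computed by bit slices."""
    lut = da_lut(h)
    total = 0
    for j in range(bits):
        # Bit j of every input forms one LUT address (same input bit weight).
        address = sum(((ai >> j) & 1) << i for i, ai in enumerate(a))
        total += lut[address] << j              # S_j shifted by j
    return total


print(da_dot([3, 7, 12, 5], [5, 13, 5, -2], bits=4))   # 156
```

The LUT has 2^N entries, so longer filters are usually split into groups of a few taps, each with its own small LUT.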

Page 28: Linear-Phase FIR Filters (symmetric: h(0) = h(N-1), h(1) = h(N-2), ...)
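Symmetry lets a linear-phase filter add the two samples that share a coefficient before multiplying, so only about half the multipliers are needed. This pre-adder arrangement is one common way to exploit the property, sketched in Python for integer samples; it illustrates the idea rather than the slide's own circuit.

```python
def fir_symmetric(x: list[int], h: list[int]) -> list[int]:
    """Linear-phase FIR: samples sharing a coefficient are pre-added, so only
    ceil(N/2) multiplications are needed per output sample."""
    n = len(h)
    if any(h[k] != h[n - 1 - k] for k in range(n)):
        raise ValueError("coefficients must satisfy h(k) == h(N-1-k)")
    taps = [0] * n                                   # taps[k] holds x(i - k)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        acc = sum(h[k] * (taps[k] + taps[n - 1 - k]) for k in range(n // 2))
        if n % 2:                                    # odd N: the middle tap has no partner
            acc += h[n // 2] * taps[n // 2]
        y.append(acc)
    return y


print(fir_symmetric([1, 0, 0, 0], [5, 13, 5]))   # [5, 13, 5, 0]
```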

Page 29: FPGA, built-in multiplier DSP48

Page 30: Example of sub-structure sharing for FIR filters

$$H(z) = 5 + 13z^{-1} + 5z^{-2} = 101_2 + 1101_2\,z^{-1} + 101_2\,z^{-2}$$

Example 1:

$A = 5 = 101_2$ (temporary expression)

$$H(z) = A + (1000_2 + A)\,z^{-1} + A\,z^{-2}$$

Example 2:

$A = 1 + z^{-1}$

$$H(z) = 5A + 8z^{-1} + 5z^{-2}$$
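Both sharing examples can be sketched in Python with shifts and adds only, for integer samples (the function names are illustrative). In Example 1 the term A = 5x is computed once per sample and reused for both outer taps and for 13x = 8x + 5x; in Example 2 the shared term is the tap sum x(i) + x(i-1).

```python
def fir_example1(x: list[int]) -> list[int]:
    """H(z) = A + (1000b + A) z^-1 + A z^-2 with the shared term A = 101b * x = 5x."""
    y, x1, a1, a2 = [], 0, 0, 0              # x1 = x(i-1); a1, a2 = A(i-1), A(i-2)
    for xi in x:
        a = (xi << 2) + xi                    # A(i) = 5x(i): computed once, reused later
        y.append(a + ((x1 << 3) + a1) + a2)   # 5x(i) + 13x(i-1) + 5x(i-2)
        x1, a1, a2 = xi, a, a1
    return y


def fir_example2(x: list[int]) -> list[int]:
    """H(z) = 5A + 8 z^-1 + 5 z^-2 with the shared term A = 1 + z^-1."""
    y, x1, x2 = [], 0, 0                      # x1 = x(i-1), x2 = x(i-2)
    for xi in x:
        s = xi + x1                           # A applied to the input: x(i) + x(i-1)
        y.append((s << 2) + s + (x1 << 3) + (x2 << 2) + x2)
        x1, x2 = xi, x1
    return y


print(fir_example1([1, 0, 0, 0]), fir_example2([1, 0, 0, 0]))   # [5, 13, 5, 0] [5, 13, 5, 0]
```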

Page 31: Additional material

The END

Page 32: Fast Multiplication in FPGAs

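[Figure: adder tree for fast multiplication. AND gates form the partial products $2^i a_i b$ for i = 0 to 7; adjacent pairs are added first, for example $2^7 a_7 b + 2^6 a_6 b = 2^6(2 a_7 b + a_6 b)$, and the pair sums are added in further levels of the tree. Optional pipeline registers may be inserted between the levels.]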
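A Python sketch of the balanced adder tree, assuming an 8-bit multiplier A and an arbitrary non-negative B; the first level pairs the terms as $2^{2k}(2 a_{2k+1} b + a_{2k} b)$, matching the $2^6(2 a_7 b + a_6 b)$ example. The function name is illustrative, and pipeline registers, where used, would sit between levels.

```python
def adder_tree_multiply(a: int, b: int, n: int = 8) -> tuple[int, int]:
    """Multiply an n-bit a by b with a balanced tree of two-input adders.

    Level 0 holds the AND-gated terms 2**i * a_i * b; every level adds
    adjacent pairs. Returns (product, number_of_adder_levels).
    """
    terms = [(b if (a >> i) & 1 else 0) << i for i in range(n)]
    levels = 0
    while len(terms) > 1:
        terms = [sum(terms[k:k + 2]) for k in range(0, len(terms), 2)]
        levels += 1
    return terms[0], levels


print(adder_tree_multiply(200, 201))   # (40200, 3)
```

The tree depth grows as log2(n) (three levels for n = 8), whereas a chain of adders would need n - 1 adders in series.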

Page 33: Multipliers in FPGAs

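Fragment of a Virtex Configurable Logic Block (CLB)

Example (LUT input assignment):

G4: $a_7$, G3: $b_i$, G2: $a_6$, G1: $b_{i+1}$

F4: $a_7$, F3: $b_{i-1}$, F2: $a_6$, F1: $b_i$

The G LUT computes ($a_7$ and $b_i$) xor ($a_6$ and $b_{i+1}$), and the F LUT the same function one bit position lower, ($a_7$ and $b_{i-1}$) xor ($a_6$ and $b_i$). Each output is one bit of the sum of $2 a_7 b + a_6 b$ before carries; the carry logic of the CLB completes the addition.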
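The role of the carry logic can be checked with a bit-level Python model: for each bit position the LUT produces the XOR of the two partial-product bits, and a MUXCY/XORCY-style carry chain completes the addition of 2·a7·b + a6·b. This is a behavioural sketch of the idea, assuming the carry multiplexer's data input receives the bit a7·b(k-1); it does not describe the exact Virtex netlist.

```python
def lut_carry_chain(a7: int, a6: int, b: int, width: int) -> int:
    """Bit-level model of 2*a7*b + a6*b built from per-bit LUTs and a carry chain.

    a7 and a6 are single bits, b is a `width`-bit unsigned number.
    """
    def bit(v: int, k: int) -> int:
        return (v >> k) & 1 if k >= 0 else 0

    carry, result = 0, 0
    for k in range(width + 2):          # 3*b < 2**(width + 2), so width + 2 bits suffice
        x = a7 & bit(b, k - 1)          # bit k of 2*a7*b
        y = a6 & bit(b, k)              # bit k of a6*b
        p = x ^ y                       # LUT output: (a7 and b_(k-1)) xor (a6 and b_k)
        result |= (p ^ carry) << k      # XORCY-like sum bit
        carry = carry if p else x       # MUXCY-like select: propagate, or x (== y) when p = 0
    return result


print(lut_carry_chain(1, 1, 0b10110101, 8))   # 543 = 3 * 181
```

When p = 0 the two operand bits are equal, so forwarding either of them as the carry is correct; when p = 1 the incoming carry simply propagates.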