integer multipliers - concordia university

Post on 18-Apr-2022

1

Integer Multipliers

2

Multipliers

• A must-have circuit in most DSP applications

• A variety of multipliers exists, and one can be chosen based on its performance

• Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, …


3

[Figure: block diagram of a 16x16 multiplier with three converter blocks and registers RA, RB, and RC, each register with reset and en inputs.]

4

Multiplication Algorithm

$X = X_{n-1} X_{n-2} \dots X_0$ (Multiplicand)

$Y = Y_{n-1} Y_{n-2} \dots Y_0$ (Multiplier)

Partial products (row $i$ is shifted $i$ positions to the left, i.e. weighted by $2^i$):

$$
\begin{array}{ccccc}
Y_{n-1}X_0 & Y_{n-2}X_0 & \cdots & Y_1X_0 & Y_0X_0 \\
Y_{n-1}X_1 & Y_{n-2}X_1 & \cdots & Y_1X_1 & Y_0X_1 \\
Y_{n-1}X_2 & Y_{n-2}X_2 & \cdots & Y_1X_2 & Y_0X_2 \\
\vdots & \vdots & & \vdots & \vdots \\
Y_{n-1}X_{n-2} & Y_{n-2}X_{n-2} & \cdots & Y_1X_{n-2} & Y_0X_{n-2} \\
Y_{n-1}X_{n-1} & Y_{n-2}X_{n-1} & \cdots & Y_1X_{n-1} & Y_0X_{n-1}
\end{array}
$$

Product: $P = P_{2n-1} P_{2n-2} P_{2n-3} \dots P_2 P_1 P_0$

5

1. Multiplication Algorithms

Implementing the multiplication of binary numbers boils down to how the additions are done.

Consider two 8-bit numbers A and B that generate the 16-bit product P. First generate the 64 partial products, then add them up.
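The two steps above (generate the partial products, then add them) can be sketched in Python. This is only an illustration: the function names are hypothetical, and the operands 0x89 and 0xAB are borrowed from the simulation example later in the deck.

```python
def partial_products(a, b, n=8):
    """Return the n*n one-bit partial products as (weight, bit) pairs."""
    return [(i + j, ((a >> i) & 1) & ((b >> j) & 1))
            for j in range(n) for i in range(n)]


def multiply_by_partial_products(a, b, n=8):
    # Each partial product a_i AND b_j contributes with weight 2^(i+j).
    return sum(bit << weight for weight, bit in partial_products(a, b, n))


pp = partial_products(0x89, 0xAB)
print(len(pp))  # number of one-bit partial products for 8-bit operands
print(hex(multiply_by_partial_products(0x89, 0xAB)))
```

How the additions are organised (serially, in an array, or in a tree) is what distinguishes the multiplier architectures in the rest of the deck.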

6

Multiplier Design

[Figure: block diagram with an input register (REG IN), the multiplier unit (MU), an output register (REG OUT), a control unit, and storage.]

7

Serial Multiplier

[Figure sequence, slides 7 to 28 (animation frames "Slide 1" to "Slide 21"): a bit-serial multiplier built from gates G1 and G2, a 1-bit adder (+), a 1-bit register (REG), and a serial register, clocked by CLK, with a second clock CLK/(N+1). Each frame shows the signal values for one clock cycle. Frames 1 to 4 apply x0 to x3 together with y0, frame 5 asserts Reset, frames 6 to 9 apply x0 to x3 together with y1, and so on; Reset is asserted in frames 5, 10, 15, and 20. The last frame shows the product bits S0 to S7.]

X: x3 x2 x1 x0    Y: y3 y2 y1 y0

Input sequence for G1 (the rightmost symbols enter first):

0 0 x3x2x1x0 0 x3x2x1x0 0 x3x2x1x0 0 x3x2x1x0

0 0 y3y3y3y3 0 y2y2y2y2 0 y1y1y1y1 0 y0y0y0y0

Reset: 0 1 0000 1 0000 1 0000 1 0000

Si: the ith bit of the final result

Ci: the only carry from column i

Sij: the jth partial sum for column i

Cij: the jth partial carry from column i
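A behavioural sketch of the bit-serial idea may help here. It is a simplified model, not a cycle-accurate description of the slide's circuit: one 1-bit full adder adds the bit x_i·y_j to the stored partial-sum bit, the carry is held between cycles, and each pass over x takes N+1 cycles (the extra cycle flushes the carry, matching the zero that separates the groups in the input sequence). The function names and the list used for the serial register are assumptions.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def serial_multiply(x, y, n=4):
    acc = [0] * (2 * n)          # partial-sum bits, LSB first
    for j in range(n):           # one pass per multiplier bit y_j
        yj = (y >> j) & 1
        carry = 0                # carry register cleared at the start of a pass
        for i in range(n + 1):   # n product bits plus one flush cycle
            xi = (x >> i) & 1 if i < n else 0
            acc[i + j], carry = full_adder(acc[i + j], xi & yj, carry)
    return sum(bit << k for k, bit in enumerate(acc))


print(serial_multiply(0b1011, 0b1101))
```

With N = 4 the loop body runs N(N+1) = 20 times, which is why a serial multiplier is small but slow.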

29

Serial / Parallel Multiplier

[Figure sequence, slides 29 to 36 (animation frames "Slide 1" to "Slide 8"): the multiplier bits y3, y2, y1, y0 are applied in parallel while x0, x1, x2, x3 enter serially, one per clock cycle, followed by zeros. The partial products pass through a chain of adders (+) and D flip-flops (D), and the product bits S0 to S7 appear one per cycle.]

Si: the ith bit of the final result

Ci: the only carry from column i

Sij: the jth partial sum for column i

Cij: the jth partial carry from column i
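The serial/parallel scheme can be modelled at a behavioural level as follows. This is a sketch under assumptions: it collapses the chain of adders and D flip-flops into one integer accumulator, and the function name is hypothetical. In each cycle one bit of x is ANDed with all bits of y, the result is added to the accumulator, and the least significant bit is shifted out as the next product bit.

```python
def serial_parallel_multiply(x, y, n=4):
    acc = 0        # models the contents of the adder/flip-flop chain
    product = 0
    for t in range(2 * n):                   # 2n cycles give 2n product bits
        xt = (x >> t) & 1 if t < n else 0    # zeros follow the last x bit
        acc += xt * y                        # x_t AND every bit of y, added in parallel
        product |= (acc & 1) << t            # bit S_t leaves the array
        acc >>= 1                            # remaining bits move one position right
    return product


print(serial_parallel_multiply(0b1011, 0b1101))
```

For N = 4 the model needs 8 cycles, one per animation frame in the figure sequence, compared with N(N+1) cycles for the fully serial version.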

37

Shift and Add Multiplier

[Figure: datapath with inputs Ain (7 downto 0) and Bin (7 downto 0), registers REGA, REGB, and REGC, a MUX with a constant 0 input, an 8-bit adder, CLOCK, and outputs Result (15 downto 8) and Result (7 downto 0).]

38

Synchronous Shift and Add Multiplier Controller

Multiplication process:

5 states: Idle, Init, Test, Add, and Shift&Count.

Idle: The multiplication starts when the Start signal is received.

Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively.

Test: The LSB of the shift register that contains the multiplier is tested to decide the next state.

39

Synchronous Shift and Add Multiplier Controller Design

Add: If the LSB is '1', the next state adds the new partial product to the accumulated result, and the state machine then moves to the Shift&Count state.

Shift&Count: If the LSB is '0', this state is entered directly from Test. The two shift registers shift their contents one bit to the right, and the counter counts up by one. After that, the state machine returns to the Test state.

When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state.

Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
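The five-state controller can be sketched as a small Python state machine. This is an illustrative model, not the VHDL from the lecture: the register names `acc`, `q`, and `m`, the function name, and the returned state trace are assumptions, and the Start/Stop/Done signals are reduced to simple control flow.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    INIT = auto()
    TEST = auto()
    ADD = auto()
    SHIFT_COUNT = auto()


def shift_add_fsm(multiplicand, multiplier, n=8):
    state, start = State.IDLE, True
    acc = q = m = count = 0
    trace = []
    while True:
        trace.append(state)
        if state is State.IDLE:
            if not start:          # back in Idle after the run: Done
                break
            start = False          # Start received
            state = State.INIT
        elif state is State.INIT:
            m, q, acc, count = multiplicand, multiplier, 0, 0
            state = State.TEST
        elif state is State.TEST:
            state = State.ADD if q & 1 else State.SHIFT_COUNT
        elif state is State.ADD:
            acc += m               # may need n+1 bits (carry C)
            state = State.SHIFT_COUNT
        else:                      # SHIFT_COUNT
            q = (q >> 1) | ((acc & 1) << (n - 1))   # LSB of acc moves into q
            acc >>= 1
            count += 1
            state = State.IDLE if count == n else State.TEST
    return (acc << n) | q, trace


product, trace = shift_add_fsm(0b1011, 0b1101, n=4)
print(product, [s.name for s in trace])
```

The trace makes the data-dependent latency visible: every '1' bit in the multiplier costs one extra Add state.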

40

Shift and Add

[Figure: n-bit adder with the multiplicand register, carry bit C, register A (An-1 ... A1 A0), multiplier register Q (Qn-1 ... Q1 Q0), and control logic that generates the Add and Shift Right signals.]

n-bit Multiplier:

Q0=1: The multiplicand is added to register A and the result is stored in register A; registers C, A, Q are then shifted one bit to the right.

Q0=0: Registers C, A, Q are shifted one bit to the right.

41

Example: 4-bit Multiplier (slides 41 to 48)

[Figure sequence: 4-bit adder, control logic, and registers C, A, Q; one frame per step.]

Multiplicand = 1011, Multiplier = 1101

Step                   Add   Shift Right   C   A      Q
Initial values         -     -             0   0000   1101
First cycle, add       1     0             0   1011   1101
First cycle, shift     0     1             0   0101   1110
Second cycle, shift    0     1             0   0010   1111
Third cycle, add       1     0             0   1101   1111
Third cycle, shift     0     1             0   0110   1111
Fourth cycle, add      1     0             1   0001   1111
Fourth cycle, shift    0     1             0   1000   1111

The product is held in A and Q: 1000 1111 (143 = 11 × 13).

49

4*4 Synchronous Shift and Add Multiplier Design: Layout Design

[Figure: floor plan of the 4*4 synchronous shift and add multiplier.]

50

Comparison between Synchronous and Asynchronous Approaches

[Figure: comparison of the synchronous and asynchronous approaches.]

51

Example (simulated by Ovais Ahmed)

Multiplicand = $10001001_2 = 89_{16}$

Multiplier = $10101011_2 = AB_{16}$

Expected Result = $101101110000011_2 = 5B83_{16}$

52

Array Multiplier

Regular structure based on the add and shift algorithm.

Addition is mainly done by the carry save algorithm.

Sign bit extension results in a higher capacitive load and slows down the circuit.

53

Addition with CLA

[Figure: the product A*B computed with three four-bit adders, each with Cin and Cout, where A = a3a2a1a0 and B = b3b2b1b0; the unused adder inputs are tied to 0.]

54

Array Multiplier with CSA

[Figure: 4x4 carry-save array of twelve full adders (F.A), each with a carry output Ci and a sum output Si, adding the partial-product bits P00 to P33 and producing the result bits R0 to R7. The partial-product bits are formed from A0 to A3 and B0 to B3 by AND gates: total of 16 gates.]

**Pij = Ai Bj, with i and j from 0 to 3

55

Critical Path with Array Multipliers

[Figure: three rows of adders, each row HA FA FA FA, with two of the possible critical paths marked.]

Two of the possible paths for the Ripple-Carry based 4*4 Multiplier

Area = (N*N) AND gates + (N-1)N full adders

Delay = $\tau_{HA} + (2N-1)\,\tau_{FA}$
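The area and delay expressions above are easy to tabulate. The helper below is a hypothetical convenience function; the unit delays `t_ha` and `t_fa` are placeholders to be replaced with technology-specific values.

```python
def array_multiplier_cost(n, t_ha=1.0, t_fa=1.0):
    """Gate counts and critical-path delay of an N x N array multiplier,
    using the formulas quoted on the slide."""
    and_gates = n * n                      # one AND gate per partial-product bit
    full_adders = (n - 1) * n
    delay = t_ha + (2 * n - 1) * t_fa      # tau_HA + (2N-1) tau_FA
    return and_gates, full_adders, delay


for n in (4, 8, 16, 32):
    print(n, array_multiplier_cost(n))
```

Area grows quadratically with N while the delay grows only linearly, which is the motivation for the tree-based reductions that follow.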

56


57

Wallace Tree

[Figure: Wallace tree for a 5x5 multiplication. The partial products xi yj (i, j = 0 to 4) are grouped by column and reduced by levels of adders (+) into the product bits P0 to P9.]
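A word-level sketch of the Wallace tree principle: groups of three operands are reduced to two with carry-save adders (3:2 compressors) until only two operands remain, and a single carry-propagate addition produces the result. The function names are hypothetical, and the sketch works on whole partial-product rows rather than on the individual bit columns drawn in the figure.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: three operands in, (sum word, carry word) out."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_sum(operands):
    ops = list(operands)
    while len(ops) > 2:
        nxt = []
        for k in range(0, len(ops) - 2, 3):          # full groups of three
            nxt.extend(carry_save_add(ops[k], ops[k + 1], ops[k + 2]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])    # leftovers pass through
        ops = nxt
    return sum(ops)                                  # final carry-propagate add


def wallace_multiply(x, y, n=5):
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(n)]
    return wallace_sum(rows)


print(wallace_multiply(29, 21))
```

Each level shrinks the operand count by about one third, so the number of levels grows logarithmically with N instead of linearly as in the array multiplier.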

58

Array Multiplier + Wallace Tree

XA

B

P

59

9/25/2017 Concordia VLSI Lab

Baugh-Wooley Algorithm

Convert negative partial products to positive representation

• No sign-extension required

$$
X \cdot Y = \Bigl(-x_{k-1}2^{k-1} + \sum_{i=0}^{k-2} x_i 2^i\Bigr)\Bigl(-y_{k-1}2^{k-1} + \sum_{i=0}^{k-2} y_i 2^i\Bigr)
$$

$$
= x_{k-1}y_{k-1}2^{2k-2} + \sum_{i=0}^{k-2}\sum_{j=0}^{k-2} x_i y_j 2^{i+j} - 2^{k-1}\Bigl(\sum_{i=0}^{k-2} x_{k-1} y_i 2^i + \sum_{i=0}^{k-2} y_{k-1} x_i 2^i\Bigr)
$$

60

Examples of 5-by-5 Baugh-Wooley

[Figure: array of full adders (FA) that sums the partial-product terms ai bj, the complemented terms a4b0' to a4b3' and a0'b4 to a3'b4, the term a4b4, and the correction terms a4, b4, a4', b4', and 1, producing P9 to P0.]

The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier
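The Baugh-Wooley rearrangement can be checked numerically. The sketch below sums only non-negative terms (plain products, products with one complemented bit, and the fixed correction terms that appear in the 5-by-5 schematic) and reduces the total modulo 2^(2n). It is a behavioural check under those assumptions, not the lecture's circuit, and the function name is hypothetical.

```python
def baugh_wooley(a, b, n=5):
    """Multiply two n-bit two's complement integers using only additions."""
    mask = (1 << n) - 1
    abit = [((a & mask) >> i) & 1 for i in range(n)]
    bbit = [((b & mask) >> i) & 1 for i in range(n)]
    total = (abit[n - 1] & bbit[n - 1]) << (2 * n - 2)
    for i in range(n - 1):
        for j in range(n - 1):
            total += (abit[i] & bbit[j]) << (i + j)
    for i in range(n - 1):
        total += (abit[n - 1] & (1 - bbit[i])) << (i + n - 1)   # a_{n-1} b_i'
        total += (bbit[n - 1] & (1 - abit[i])) << (i + n - 1)   # a_i' b_{n-1}
    total += ((1 - abit[n - 1]) + (1 - bbit[n - 1])) << (2 * n - 2)
    total += (abit[n - 1] + bbit[n - 1]) << (n - 1)
    total += 1 << (2 * n - 1)
    total &= (1 << (2 * n)) - 1            # keep 2n product bits
    if total >> (2 * n - 1):               # reinterpret as a signed value
        total -= 1 << (2 * n)
    return total


print(baugh_wooley(13, -5), baugh_wooley(-13, -5))
```

Because every summand is non-negative, no partial product needs sign extension, which is the point made on the algorithm slide.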

61

Squarer using Baugh-Wooley Algorithm

  a7 a6 a5 a4 a3 a2 a1 a0
* a7 a6 a5 a4 a3 a2 a1 a0
------------------------------------------------
a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
------------------------------------------------
a7*a6 a7*a5 a7*a4 a7*a3 a7*a2 a7*a1 a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 '0' a0

(Each partial-product row is shifted one position to the left of the row above it.)

62

Example of an 8-bit squarer

[Figure: folded partial-product array of the 8-bit squarer, with product terms from a1a0 up to a7a6, the single-bit terms a0 to a7, and constant '0' entries, summed into the outputs S15 to S0.]

63

Array Multiplier

32bits by 32bits multiplier

XA

B

P

64

Booth (Radix-4) Multiplier

Radix-4 (3-bit recoding) reduces the number of partial products to be added by half.

Great saving in area and increased speed.

$$A = -a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + a_{n-3}2^{n-3} + \dots + a_1 2 + a_0$$

$$B = -b_{n-1}2^{n-1} + b_{n-2}2^{n-2} + b_{n-3}2^{n-3} + \dots + b_1 2 + b_0$$

· The base-4 redundant signed-digit representation of B is

$$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$$

65

$K_i$ is calculated by the following equation:

$$K_i = -2b_{2i+1} + b_{2i} + b_{2i-1}, \qquad i = 0, 1, 2, \dots, (n-2)/2$$

Three bits of the multiplier B, $b_{2i+1}$, $b_{2i}$, $b_{2i-1}$, are examined and the corresponding $K_i$ is calculated.

B is always appended on the right with a zero ($b_{-1} = 0$), and n is always even (B is sign extended if needed).

The product AB is then obtained by adding n/2 partial products:

$$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$$
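The recoding equation and the product sum translate almost directly into Python. This is a behavioural sketch with hypothetical function names; it relies on Python's arithmetic right shift to read the two's complement bits of a negative multiplier.

```python
def booth_radix4_digits(b, n):
    """Radix-4 Booth digits K_0 .. K_(n/2 - 1) of an n-bit two's complement b (n even)."""
    def bit(k):
        return 0 if k < 0 else (b >> k) & 1     # b_(-1) = 0 is appended on the right
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(n // 2)]


def booth_multiply(a, b, n):
    # P = sum over i of 2^(2i) * K_i * A, i.e. n/2 partial products
    return sum((k * a) << (2 * i) for i, k in enumerate(booth_radix4_digits(b, n)))


print(booth_radix4_digits(0b011101, 6))   # digits for +29, least significant first
print(booth_multiply(3, 29, 6))
```

Each digit is one of 0, ±1, ±2, so every partial product is obtained from A by at most a shift and a negation.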

66

Booth Algorithm

Decoding of multiplier to generate signals for hardware use

Xi+1   Xi   Xi-1   OP   NEG   ZERO   TWO
0      0    0      0    0     1      0
1      0    0      2    1     0      1
0      1    0      1    0     0      0
1      1    0      1    1     0      0
0      0    1      1    0     0      0
1      0    1      1    1     0      0
0      1    1      2    0     0      1
1      1    1      0    1     1      0

67

Booth Algorithm

A Booth recoded multiplier examines three bits of the multiplier at a time.

It determines whether to add zero, 1, -1, 2, or -2 times the multiplicand at that rank.

The operation to be performed is based on the current two bits of the multiplier and the previous bit.

Xi+1   Xi   Xi-1   Zi/2
0      0    0       0
0      0    1       1
0      1    0       1
0      1    1       2
1      0    0      -2
1      0    1      -1
1      1    0      -1
1      1    1       0

68

Bit (weights 2^1, 2^0, 2^-1)
Xi   Xi+1   Xi+2   Operation                                               M is multiplied by
0    0      0      add zero (no string)                                    +0
0    0      1      add the multiplicand (end of string)                    +X
0    1      0      add the multiplicand (a string)                         +X
0    1      1      add twice the multiplicand (end of string)              +2X
1    0      0      subtract twice the multiplicand (beginning of string)   -2X
1    0      1      subtract the multiplicand (-2X and +X)                  -X
1    1      0      subtract the multiplicand (beginning of string)         -X
1    1      1      subtract zero (center of string)                        -0

69

Booth Algorithm - dot notation

Multiplicand A = ● ● ● ●

Multiplier B = (●●)(●●)

Partial product bits ● ● ● ●   (B1 B0)_2 · A · 4^0

Partial product bits ● ● ● ●   (B3 B2)_2 · A · 4^1

Product P = ● ● ● ● ● ● ● ●

70

Example

The following example shows how the calculation is carried out.

Multiplicand X = 000011

Multiplier Y = 011101, written as 0 1 1 1 0 1 0 once a zero is appended on the right

After Booth decoding, Y is recoded into the digits +2, -1, +1, so X is multiplied by +1, -1, and +2 in turn; each partial product is shifted two more bits to the left than the previous one, and the partial products are added together.

X * +1    000000000011
X * -1    1111111101
X * +2    00000110
          ------------
          000001010111

71

Sign Extension

XA

B

P

72

Sign extension

Traditional sign-extension scheme

• Segment the input operands based on the size of the embedded blocks

• Multiply the segmented inputs and extend the sign bit of each partial product

• Sum all partial products

[Figure: the segmented input operands are multiplied (×) into partial products, each with its sign extension, and summed (+) into the final result.]

73

Booth Algorithm - Example 1

Multiplier:   011101 (+29), with 0 appended; recoded as +2 -1 +1
Multiplicand: 000011 (+3)

000000000011
1111111101
00000110
------------
000001010111   (+87)

74

Booth Algorithm - Example 2

Multiplier:   011101 (+29), with 0 appended; recoded as +2 -1 +1
Multiplicand: 111101 (-3)

111111111101
0000000011       (2s complement of multiplicand)
11111010
------------
111110101001   (-87)

Notice the sign extensions.

75

Booth Algorithm - Example 3

Multiplier:   100011 (-29), with 0 appended; recoded as -2 +1 -1
Multiplicand: 111101 (-3)

000000000011
1111111101
00000110         (shifted 2s complement)
------------
000001010111   (+87)

Notice the sign extensions.

76

Comparison of Booth and parallel

multiplier shift and Add

XA

B

P

77

Template to reduce sign extensions for Booth Algorithm

Please note that each operand is 17 bits, i.e. the 17th bit is the sign bit. Negative numbers are entered in 1's complement, which is why the S must be added on the right-hand side of the diagram. If 2's complement is used, the S's on the right side of the diagram can be removed.

78

Comparison of Template and the sign extension

[Figure: two partial-product layouts for operands A and B and product P. In the "Sign extension" layout the sign bits S1, S2, and S3 are repeated to the left of each partial product, with S4 on the last row. In the "Sign template" layout only a few sign-related entries per row remain (S1 S1 S1, S2 1, S3).]

79

Using the Template: 25 * -35

Example of using the template for 25 * -35, with -35 as the multiplier, using 8-bit representation.

0 0 0 1 1 0 0 1         (25; the leftmost bit is the sign bit)
1 1 0 1 1 1 0 1 0       (-35 with a zero appended on the right)

Row annotations on the slide: Add SS; Add inverted S; Add inverted sign and add 1; Add inverted sign bit; No sign bit.

1 0 0 0 0 0 1 1 0 0 1   * 1
1 0 1 1 1 0 0 1 1 1     * -1
1 0 0 1 1 0 0 1 0       * 2
1 1 0 0 1 1 1           * -1

Sum: 1 1 1 1 0 0 1 0 0 1 0 1 0 1

This is a negative number. Converting it gives

0 0 0 0 1 1 0 1 1 0 1 0 1 1 = 512 + 256 + 64 + 32 + 8 + 2 + 1 = 875

80

Booth Multiplier Components

Multiplier

Multiplicand

Booth Encoder

PPU (Partial products unit)

PPA(Partial products adding unit)

Product

XA

B

P

81

Wallace Tree and Ripple Carry Adder Structure of 8*8 multiplier with Pipeline

[Figure: rows of adders (+) reduce Partial Product PP0, PP1, PP2 (15 downto 0) and Partial Product PP3 (15 downto 0) into the product bits P16 to P0. The labels mark the pipeline register, the ripple carry adder, and the critical path.]

82

Hardware implementation of Booth with shift and add

[Figure: block diagram with an FSM (signals Mulbegin, Start, Stop, Init, Shift, Doubleshift, Mux11, Mux12, Mux0, Mulend, Finish, CLK, CLR), shift registers reg_2left32 and reg2right17 (ports SH, LD, D, CLK, CLR, Q), a 2s complement block, two *2 shifters, a four-input multiplexer mux4-32 (inputs A, B, C, D; selects ctrl1 and ctrl0; output Y32), a sign expansion block, a 37-bit adder (A, B, Cin, Sum, Cout), a two-input multiplexer Mux37, Register37, Counter20, and an endcheck block.]

83

Simulation Plan

[Figure: 32-bit Signal Generator A and 32-bit Signal Generator B drive A[31:0] and B[31:0] into both a Behavioral Multiplier (A * B, output P[63:0]) and the multiplier under test ("My Multiplier", output My_P[63:0]). A 64-bit Comparator compares the two products and outputs Result and Failed Number.]

My Multiplier is one of: Array Multiplier, Modified Booth Multiplier, Wallace Tree Multiplier, Modified Booth-Wallace Tree Multiplier, Twin Pipe Serial-Parallel Multiplier

84

Testing the Design

85

Simulation For Parallel Multipliers

[Figure: simulation waveforms for signed numbers and for unsigned numbers.]

86

Simulation For Signed S/P Multipliers

There is a 340 ns delay between the operands and the result because of the D flip-flop delays.

87

FPGA after implementation, with the programmed areas clearly shown

88

Another implementation of the above after pipelining; the place and route step has placed the design in different locations.

89

Spartacus FPGA board

XA

B

P

90

Testing the multiplication system

XA

B

P

91

Comparison of Multipliers

Table 7. Performance comparison for two's complement multipliers. By Chen Yaoquan, M.Eng. 2005

Columns: (1) Array Multiplier, (2) Modified Booth Multiplier, (3) Wallace-Tree Multiplier, (4) Modified Booth-Wallace Tree Multiplier, (5) Twin Pipe Serial-Parallel Multiplier, (6) Behavioral Multiplier

Metric                                (1)        (2)        (3)        (4)        (5)                   (6)
Area - Total CLBs (#)                 3076.50    2649.50    3325.50    2672.50    490.00                2993.50
Maximum Delay D (ns)                  35.78      24.43      18.93      18.53      107.52 (3.36 x 32)    49.33
Total Dynamic Power P (W)             7.52       6.33       7.46       6.41       0.28                  6.24
Delay·Power Product DP (ns W)         268.98     154.64     141.14     118.76     30.62                 307.58
Area·Power Product AP (# W)           23128.20   16771.60   24793.93   17127.79   139.54                18665.07
Area·Delay Product AD (# ns)          1.10E+05   6.47E+04   6.30E+04   4.95E+04   5.27E+04              1.48E+05
Area·Delay^2 Product AD^2 (# ns^2)    3.94E+06   1.58E+06   1.19E+06   9.18E+05   5.66E+06              7.28E+06

92

Comparison of Multipliers

Table 7. Performance comparison for unsigned multipliers. By Chen Yaoquan, M.Eng. 2005

Columns: (1) Array Multiplier, (2) Modified Booth Multiplier, (3) Wallace-Tree Multiplier, (4) Modified Booth-Wallace Tree Multiplier, (5) Twin Pipe Serial-Parallel Multiplier, (6) Behavioral Multiplier

Metric                                (1)        (2)        (3)        (4)        (5)        (6)
Area - Total CLBs (#)                 3280.50    2800.00    3321.50    2845.50    487.00     3003.00
Maximum Delay D (ns)                  37.23      25.33      18.93      18.33      107.52     44.50
Total Dynamic Power P (W)             7.57       6.66       7.32       6.66       0.29       6.26
Delay·Power Product DP (ns W)         281.88     168.77     138.60     122.13     30.66      278.53
Area·Power Product AP (# W)           24837.98   18656.40   24319.36   18959.57   138.89     18795.78
Area·Delay Product AD (# ns)          1.22E+05   7.09E+04   6.29E+04   5.22E+04   5.24E+04   1.34E+05
Area·Delay^2 Product AD^2 (# ns^2)    4.55E+06   1.80E+06   1.19E+06   9.56E+05   5.63E+06   5.95E+06

93

Comparison of Multipliers

The relation of Area and Delay for the behavioral multiplier: the "banana curve"

[Figure: plot of Area (#), axis from 2950 to 3250, against Delay (ns), axis from 0 to 80.]

Change the value of "set_max_delay" in the script file (ns):

set_max_delay (ns)   0        10       20       30       40       50       60       >60
Area (#)             3014.5   3013.0   3110.0   3193.5   3019.5   2999.5   2978.5   2978.5
Power (W)            6.6499   6.6470   7.5683   8.1878   8.0645   8.0419   8.0156   8.0156
Delay (ns)           31.98    31.98    30.93    30.08    39.93    49.88    59.63    59.63

94

Comparison of Multipliers

By Chen Yaoquan, M.Eng. 2005

Columns: (1) Array Multiplier, (2) Modified Booth Multiplier, (3) Wallace-Tree Multiplier, (4) Modified Booth-Wallace Tree Multiplier, (5) Twin Pipe Serial-Parallel Multiplier, (6) Behavioral Multiplier

Metric              (1)      (2)       (3)            (4)            (5)          (6)
Area                Medium   Small     Large          Small          Smallest     Medium
Critical Delay      Medium   Fast      Very Fast      Fastest        Very Large   Large
Power Consumption   Large    Medium    Large          Medium         Smallest     Medium
Complexity          Simple   Complex   More Complex   More Complex   Simple       Simplest
Implement           Easy     Medium    Difficult      Difficult      Easy         Easiest

95

Pipelining Simulation

96

Synthesis for Signed Multipliers

[Figure panels: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]

97

Synthesis for Unsigned Multipliers

[Figure panels: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]

98

Conclusion

• Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.

• Wallace Tree has the best performance, but it is hard to implement.

• Booth-algorithm-based multipliers have a lower area than the other parallel multipliers.

• For behavioral multipliers, the area increases as the delay decreases.

99

Comparison

Columns: (1) Array Multiplier, (2) Modified Booth Multiplier, (3) Wallace Tree Multiplier, (4) Modified Booth & Wallace Tree Multiplier, (5) Twin Pipe Serial-Parallel Multiplier

Metric                                      (1)             (2)             (3)              (4)              (5)
Area - Total CLBs (#)                       1165            1292            1659             1239             133
Maximum Delay (ns)                          187.87          139.41          101.14           101.43           22.58 (722.56)
Power Consumption at highest speed (mW)     16.6506         23.136          30.95            30.862           2.089
                                            (at 188 ns)     (at 140 ns)     (at 101.14 ns)   (at 101.43 ns)   (at 722.56 ns)
Delay Power Product DP (ns mW)              3128.15         3225.39         3130.28          3130.33          1509.42
Area Power Product AP (# mW)                19.397 x 10^3   29.891 x 10^3   51.346 x 10^3    38.238 x 10^3    277.837
Area Delay Product AD (# ns)                218.868 x 10^3  180.118 x 10^3  167.791 x 10^3   125.671 x 10^3   96.101 x 10^3
Area Delay^2 Product AD^2 (# ns^2)          41.119 x 10^6   25.110 x 10^6   16.970 x 10^6    12.747 x 10^6    69.438 x 10^6

100

NOTICE

The rest of these slides are for extra information only

and are not part of the lecture

XA

B

P

101

Array Addition

102

Addition of 8 binary numbers using the Wallace tree principle
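As a rough illustration of the Wallace tree principle, the Python sketch below adds 8 numbers by repeatedly applying 3:2 counters (carry-save adders) to groups of three rows until only two rows remain, and then performs one ordinary addition. The function names are hypothetical, and rows that do not fill a group of three are simply passed to the next level, which is a simplification of what a hardware tree does.

```python
def carry_save_add(x, y, z):
    """Bitwise 3:2 counter: three operands in, a sum word and a carry word out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_sum(operands):
    """Add non-negative integers with levels of 3:2 counters.

    Returns the total and the number of reduction levels used.
    """
    rows = list(operands)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[full:])  # leftover rows pass through unchanged
        rows = next_rows
        levels += 1
    return sum(rows), levels  # final carry-propagate addition of two rows


total, levels = wallace_sum([3, 5, 7, 9, 11, 13, 15, 17])
print(total, levels)
```

For 8 operands the row count goes 8, 6, 4, 3, 2, so four levels of counters are needed before the final adder.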

103

104

105

[Block diagram. Signals: A, B, CLK, RESET, START, BEGIN, END, FINISH, Done, RESULT, LAST_RESULT (37 bits). Blocks: MULT32, Adder37, REGISTER37 (D, Q, CLK, CLR), COUNTER, INVERTER, AND_2.]

106

Baugh-Wooley two's complement multiplier

[Figure: The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two's complement array multiplier. An array of full adders (FA) sums the partial products a_i b_j together with the terms a4b0' to a4b3', a0'b4 to a3'b4, a4b4, a4', b4', a4, b4 and a constant 1 to produce P9 ... P0.]

107

Example of Baugh-Wooley Two’s Complement Multiplication

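                                     a4     a3     a2     a1     a0     A
x                                    b4     b3     b2     b1     b0     B
  ----------------------------------------------------------------------
                                     a4b0'  a3b0   a2b0   a1b0   a0b0
                              a4b1'  a3b1   a2b1   a1b1   a0b1
                       a4b2'  a3b2   a2b2   a1b2   a0b2
         a4b4   a4b3'  a3b3   a2b3   a1b3   a0b3
         a4'    a3'b4  a2'b4  a1'b4  a0'b4
         b4'                         a4
+ 1                                  b4
  ----------------------------------------------------------------------
  p9     p8     p7     p6     p5     p4     p3     p2     p1     p0     P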

            0 1 1 0 1   A = 13
x           1 1 0 1 1   B = -5
  --------------------
            0 1 1 0 1
          0 1 1 0 1
        0 0 0 0 0
    0 0 1 1 0 1
    1 0 0 1 0
    0       0
+ 1         1
  --------------------
  1 1 1 0 1 1 1 1 1 1   P = -65

            0 1 1 0 1   A = 13
x           0 0 1 0 1   B = 5
  --------------------
            0 1 1 0 1
          0 0 0 0 0
        0 1 1 0 1
    0 0 0 0 0 0
    1 0 0 0 0
    1       0
+ 1         0
  --------------------
  0 0 0 1 0 0 0 0 0 1   P = 65

            1 0 0 1 1   A = -13
x           1 1 0 1 1   B = -5
  --------------------
            0 0 0 1 1
          0 0 0 1 1
        1 0 0 0 0
    1 0 0 0 1 1
    0 1 1 0 0
    0       1
+ 1         1
  --------------------
  0 0 0 1 0 0 0 0 0 1   P = 65
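The Baugh-Wooley scheme can be checked numerically. The Python sketch below sums the same groups of terms that appear in the 5-by-5 array (the unsigned core a_i b_j, the term a4b4, the rows with complemented bits, and the correction bits) and interprets the 10-bit result as a two's complement number. The function name and loop structure are illustrative; this models only the arithmetic, not the adder array itself.

```python
def baugh_wooley_5x5(a, b):
    """Multiply two 5-bit two's complement integers (-16..15) by summing
    Baugh-Wooley partial-product terms modulo 2**10."""
    abit = [(a >> i) & 1 for i in range(5)]
    bbit = [(b >> i) & 1 for i in range(5)]
    total = 0
    # Unsigned core: a_i b_j for i, j in 0..3
    for i in range(4):
        for j in range(4):
            total += (abit[i] & bbit[j]) << (i + j)
    # a4 b4 at weight 2^8
    total += (abit[4] & bbit[4]) << 8
    # a4 b_j' terms and a_i' b4 terms, starting at weight 2^4
    for k in range(4):
        total += (abit[4] & (1 - bbit[k])) << (4 + k)
        total += ((1 - abit[k]) & bbit[4]) << (4 + k)
    # Correction bits: a4' and b4' at 2^8, a4 and b4 at 2^4, constant 1 at 2^9
    total += ((1 - abit[4]) + (1 - bbit[4])) << 8
    total += (abit[4] + bbit[4]) << 4
    total += 1 << 9
    p = total & 0x3FF                       # keep P9..P0
    return p - 1024 if p & 0x200 else p     # read as two's complement


print(baugh_wooley_5x5(13, -5), baugh_wooley_5x5(13, 5), baugh_wooley_5x5(-13, -5))
```

Because every term is added with a positive weight, the array needs only adders and no subtractors, which is the point of the scheme.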

108

Cluster Multipliers

Divide the multiplier into smaller multipliers

109

Cluster Multipliers

[Figure: 8-bit cluster low power multiplier. The multiplier A and the multiplicand B are split into 4-bit groups that feed four 4-bit multipliers through 8-bit latches (with /CLR and CLK). The enable signals EN3, EN2, EN1 and EN0 control the four blocks, and a final addition stage combines the 8-bit block outputs into the 16-bit product P. An inset shows the circuit used to generate the enable signal.]

110

Cluster Multipliers

• Dividing the multiplication circuit into clusters (blocks) of smaller multipliers

• Applying clock gating techniques to disable the blocks that are producing a zero result.

• Features

– Low Power (claims 13.4 % savings)
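The cluster idea can be sketched in a few lines of Python: an 8-by-8 multiplication is assembled from four 4-by-4 blocks, and a block is skipped when one of its inputs is zero. The enable condition used here (an all-zero 4-bit input) is an assumption made for illustration; the slides do not spell out the enable logic.

```python
def cluster_multiply_8x8(a, b):
    """Multiply two unsigned 8-bit values using four 4-bit blocks.

    Returns the 16-bit product and the number of blocks that were enabled.
    """
    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4
    # (block inputs, weight of the block output in the final addition)
    blocks = [(a_lo, b_lo, 0), (a_hi, b_lo, 4), (a_lo, b_hi, 4), (a_hi, b_hi, 8)]
    product, enabled = 0, 0
    for x, y, shift in blocks:
        if x == 0 or y == 0:
            continue  # assumed enable rule: a zero input means a zero result
        enabled += 1
        product += (x * y) << shift
    return product, enabled


print(cluster_multiply_8x8(0x0F, 0x0F))
print(cluster_multiply_8x8(0xAB, 0xCD))
```

With small operands such as 0x0F only one of the four blocks is active, which is where the power saving would come from.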

111

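Multiplexer-Based Array Multipliers

$$P = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^j$$

$$Z_j = x_j Y_{j-1} + y_j X_{j-1}, \qquad X_{j-1} = x_{j-1} x_{j-2} \ldots x_0$$

[Figure: triangular array of the bits $Z_j^i$ ($Z_1^0$; $Z_2^0$, $Z_2^1$; $Z_3^0$ to $Z_3^2$; $Z_4^0$ to $Z_4^3$), with cells labelled $Z_j$ and $x_j y_j$]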

112

Multiplexer-Based Array Multipliers

Two types of cells:

Cell 1: produces the bits $Z_j^i$ of the terms $Z_j 2^j$ and includes a full adder of the carry save adder array

Cell 2: produces the terms $x_j y_j 2^{2j}$ and includes a full adder of the carry save adder array
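The decomposition behind the multiplexer-based array can be verified directly. The Python sketch below assumes the form P = sum of x_j y_j 2^(2j) plus sum of Z_j 2^j, with Z_j = x_j Y_(j-1) + y_j X_(j-1), where X_(j-1) and Y_(j-1) denote the j least significant bits of X and Y. The function name is hypothetical.

```python
def mux_array_product(x, y, n):
    """Compute x * y for unsigned n-bit operands from the x_j y_j and Z_j terms."""
    total = 0
    for j in range(n):
        xj = (x >> j) & 1
        yj = (y >> j) & 1
        total += (xj & yj) << (2 * j)          # Cell 2 term: x_j y_j 2^(2j)
        if j >= 1:
            low_mask = (1 << j) - 1
            # Each bit of Z_j is 0, x_i, y_i or x_i + y_i, selected by
            # (x_j, y_j); this selection is what a multiplexer implements.
            z_j = xj * (y & low_mask) + yj * (x & low_mask)
            total += z_j << j                   # Cell 1 term: Z_j 2^j
    return total


print(mux_array_product(13, 11, 4))
```

No Booth-style recoding is involved: the selection between 0, X, Y and X + Y is driven directly by the operand bits.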

113

Multiplexer-Based Array Multipliers

• Characteristics

– Faster than Modified Booth

– Unlike Booth, does not require encoding logic

– Requires approximately N^2/2 cells

– Has a zigzag shape, thus not layout-friendly

114

Multiplexer-Based Array Multipliers

• Improvement

– More rectangular layout

– Saves up to 40 percent area without penalties

– Outperforms the modified Booth multiplier in both speed and power by 13% to 26%

115

Gray-Encoded Array Multiplier

Dec  Hyb     Dec  Hyb     Dec  Hyb     Dec  Hyb
 0   0000     4   0100    -8   1100    -4   1000
 1   0001     5   0101    -7   1101    -3   1001
 2   0011     6   0111    -6   1111    -2   1011
 3   0010     7   0110    -5   1110    -1   1010

• 2’s complement Hybrid Coding

– Consecutive values within each column of the table differ in a single bit

– Reduces the number of transitions, and thus power (for highly correlated streams).
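The hybrid code table is consistent with Gray-coding each 2-bit group of the two's complement word separately. The Python sketch below encodes values that way and reproduces the 4-bit table; treating this as the general rule for wider words is an assumption, and the function name is illustrative.

```python
def hybrid_encode(value, bits=4):
    """Encode a two's complement value by Gray-coding each 2-bit group."""
    word = value & ((1 << bits) - 1)        # two's complement bit pattern
    out = 0
    for shift in range(0, bits, 2):
        group = (word >> shift) & 0b11
        gray = group ^ (group >> 1)         # 00->00, 01->01, 10->11, 11->10
        out |= gray << shift
    return format(out, f"0{bits}b")


for v in (0, 1, 2, 3, 4, 7, -8, -5, -1):
    print(f"{v:3d} -> {hybrid_encode(v)}")
```

Stepping between neighbouring values inside a group of four changes only one bit, which is the property the slide relies on for reducing switching activity.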

116

Gray-Encoded Array Multiplier

[Figure: An 8-bit wide 2’s complement radix-4 array multiplier]

117

Gray-Encoded Array Multiplier

• Characteristics

– Uses Gray code to reduce the switching activity of the multiplier

– Consumes 45.6% less power than Modified Booth

– Uses 26.4% more area than Modified Booth

118

Ultra-high Speed Parallel Multiplier

• How is ultra-high speed achieved?

– Based on the Modified Booth Algorithm and a Tree Structure (column compression)

– Chooses efficient counters (3:2 and 5:3)

– Uses a new compressor (20% faster)

– Uses the First Partial product Addition (FPA) Algorithm (reducing the bits of the CLA by 50%)

119

Ultra-high Speed Parallel Multiplier

[Figure: Calculation process using parallel counters in the 16x16 case]

• Calculate the partial products as soon as possible.

• Divide into 3 rows or 5 rows only (most efficient).

• The final CLA is only 16-bit instead of 32-bit.

• Overall, the delay is reduced by about 30%.

120

ULLRLF Multiplier

• ULLRLF stands for Upper/Lower Left-to-Right Leapfrog.

• Combines the following techniques:

– Signal flow optimization in the [3:2] adder array for partial product reduction,

– Left-to-right leapfrog (LRLF) signal flow,

– Splitting of the reduction array into upper and lower parts.

121

ULLRLF Multiplier

1) Signal flow optimization in [3:2] adder array

-- For n = 32, the delay is reduced by 30 percent.

-- Power is also saved.

PPij is always connected to pin A. Sin and Cin are connected to pins B and C; most Sin signals are connected to C.

122

ULLRLF Multiplier

2) Left-to-Right Leapfrog (LRLF) Structure

-- The signal delays are better balanced.

-- Low power.

The sum signals skip over alternate rows.

123

ULLRLF Multiplier

3) Upper/Lower Split Structure

-- The long data path is broken into parallel short paths, which saves power.

-- The delay of the partial product reduction is reduced.

[Figure annotation: Only n+2 bits]

124

ULLRLF Multiplier

[Figure: Floorplan of ULLRLF (n = 32)]

• ULLRLF multipliers consume less power than optimized tree multipliers for n ≤ 32 while keeping similar delay and area.

• With more regularity and inherently shorter interconnects, the ULLRLF structure presents a competitive alternative to tree structures.

125

Signed Array Multiplier

[Figure: 32*32-Bit Array Multiplier for Signed Number. Rows of AND gates, full adders (FA) and half adders (HA) combine A31 ... A0 with B0 ... B31; each row is one stage of carry save adder, followed by a 32-bit carry look ahead adder. Outputs P63 ... P0.]

Stage 4 to 30: each stage includes 32 AND gates, 31 full adders, 1 half adder and 1 NOT gate.

126

Unsigned Array Multiplier

[Figure: 32*32-Bit Array Multiplier for Unsigned Number. Rows of AND gates, full adders (FA) and half adders (HA) combine A31 ... A0 with B0 ... B31; each row is one stage of carry save adder, followed by a 32-bit carry look ahead adder. Outputs P63 ... P0.]

Stage 4 to 30: each stage includes 32 AND gates, 31 full adders and 1 half adder.
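A behavioural Python sketch of the unsigned array multiplier is given below. Each loop iteration plays the role of one carry-save stage (AND gates forming a partial product row, then a row of adders that keeps sum and carry separate), and the closing addition stands in for the carry look ahead adder. This is a word-level model with a hypothetical function name, not a gate-level description of the figure.

```python
def array_multiply_unsigned(a, b, n=32):
    """Multiply two unsigned n-bit values with a chain of carry-save stages."""
    sum_word, carry_word = 0, 0
    for i in range(n):
        # Row of AND gates: partial product a * b_i at weight 2^i
        partial = (a << i) if (b >> i) & 1 else 0
        # One carry-save stage: carries are saved, not propagated
        new_sum = sum_word ^ carry_word ^ partial
        new_carry = ((sum_word & carry_word) | (sum_word & partial)
                     | (carry_word & partial)) << 1
        sum_word, carry_word = new_sum, new_carry
    # Final carry-propagate addition (the carry look ahead adder in the figure)
    return (sum_word + carry_word) & ((1 << (2 * n)) - 1)


print(hex(array_multiply_unsigned(0xFFFFFFFF, 0xFFFFFFFF)))
```

The stages are strictly sequential here, which mirrors why the array multiplier has the longest delay among the parallel designs compared in the lecture.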

127

Signed Modified Booth Multiplier

[Dot diagram: 32*32-bit Booth Multiplier for Signed Number. 16 rows of partial products over bit positions 63 to 0. The multiplier bits are grouped into overlapping triplets B i-1, B i, B i+1 from LSB to MSB, with a 0 appended below the LSB.]

E = the inversion of the sign bit in each row

S = the B i+1 bit of the three encoded bits
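The row counts in the Booth dot diagrams follow from radix-4 recoding. The Python sketch below recodes the multiplier into digits in the range -2 to 2 from overlapping triplets (B i-1, B i, B i+1) and sums the selected multiples of A. Function names are illustrative; handling an unsigned operand by recoding it as a 34-bit value with two leading zeros is an assumption that matches the 17 rows quoted for the unsigned case.

```python
def booth_radix4_digits(b, n=32):
    """Recode an n-bit two's complement multiplier (n even) into n/2 digits."""
    digits = []
    prev = 0                                  # the 0 appended below the LSB
    for i in range(0, n, 2):
        low = (b >> i) & 1
        high = (b >> (i + 1)) & 1
        digits.append(prev + low - 2 * high)  # value in {-2, -1, 0, 1, 2}
        prev = high
    return digits


def booth_multiply(a, b, n=32):
    """Sum one partial product per recoded digit."""
    return sum((d * a) << (2 * k) for k, d in enumerate(booth_radix4_digits(b, n)))


print(len(booth_radix4_digits(-5)), booth_multiply(13, -5))
print(len(booth_radix4_digits(0xFFFFFFFF, 34)))
```

Halving the number of partial product rows (16 or 17 instead of 32) is what reduces the adder array compared with the plain array multiplier.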

128

Signed Modified Booth Multiplier

[Figure: 32*32-Bit Modified Booth Multiplier for Signed Number. Booth encoders take B[1:0] with an appended 0, B[3:1], B[5:3], ... and produce X1[n], X2[n] and INVERT n for each row. Partial product selectors (SEL) pick from A31 ... A0, rows of full adders (FA) and half adders (HA) form the stages, and a 64-bit carry look ahead adder produces P63 ... P0.]

Stage 3 to 15: each stage includes 33 PP selectors, 31 full adders, 1 half adder and 1 NOT gate.

129

Unsigned Modified Booth Multiplier

[Dot diagram: 32*32-bit Booth Multiplier for Unsigned Number. 17 rows of partial products over bit positions 63 to 0. The multiplier bits are grouped into overlapping triplets B i-1, B i, B i+1 from LSB to MSB, with a 0 appended below the LSB and 00 appended above the MSB.]

S = the B i+1 bit of the three encoded bits

S' = the inversion of S

130

Unsigned Modified Booth Multiplier

[Figure: 32*32-Bit Modified Booth Multiplier for Unsigned Number. Booth encoders take B[1:0] with an appended 0, B[3:1], B[5:3], ..., B[i+1, i, i-1] and finally 00 with B[31], and produce X1[i], X2[i] and S[i] for each row (up to X1[16], X2[16], S[16]). Partial product selectors (SEL, SEL_END) pick from A31 ... A0, rows of full adders (FA) and half adders (HA) form the stages, and a 64-bit carry look ahead adder produces P63 ... P0.]

Stage 3 to 15: each stage includes 33 PP selectors, 32 full adders, 1 half adder and 1 NOT gate.

131

Wallace Tree multipliers

[Block diagram: A[31:0] and B[31:0] form 32 partial products, which are added in the Wallace Tree Adder; its outputs C[63:0] and S[63:0] feed a 64-bit Carry Look-ahead Adder that produces P[63:0].]

132

Wallace Tree multipliers

[Dot diagram: reduction of the 32 partial products in a Wallace tree, levels 1 to 8. Legend: a 3:2 counter takes three input dots and outputs a carry and a sum; a 2:2 counter takes two input dots and outputs a carry and a sum.]

• Uses 3:2 counters and 2:2 counters

• Number of levels: log (32/2) / log (3/2) ≈ 6.8 as an estimate; the reduction shown takes 8 levels

• Irregular structure

• Fast
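The number of reduction levels can be counted by simulating the row count. The Python sketch below assumes that every full group of three rows goes through 3:2 counters (three rows become two) and that leftover rows pass to the next level unchanged; it also evaluates the logarithmic estimate from the slide. Both function names are hypothetical.

```python
import math


def wallace_levels(rows):
    """Count the levels of 3:2 counters needed to reduce `rows` rows to two."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3
        levels += 1
    return levels


def level_estimate(rows):
    """Logarithmic estimate log(rows/2) / log(3/2)."""
    return math.log(rows / 2) / math.log(3 / 2)


for rows in (32, 16):
    print(rows, wallace_levels(rows), round(level_estimate(rows), 2))
```

The estimate is a lower bound because the row count is not always a multiple of three, so some rows wait a level; this explains why 32 rows take 8 levels and the 16 rows of a Booth-recoded multiplier take 6.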

133

Wallace Tree multipliers

[Block diagram: 64-Bit Carry Look Ahead Adder, 2-level hierarchical. A carry propagate/generate unit forms P63 ... P0 and G63 ... G0 from A63 ... A0 and B63 ... B0. Eight 8-bit BCLA blocks produce the carries C63 ... C0 and the group signals PM0/GM0 to PM7/GM7; a further 8-bit BCLA takes Cin and the group signals and produces C8, C16, C24, ..., C56. A 64-bit summation unit produces S63 ... S0 and C64.]
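The two-level carry look ahead structure can be modelled at the equation level in Python. The sketch below forms per-bit generate and propagate signals, derives a group generate/propagate pair for each 8-bit block, computes the carry into each block from those group signals, and then forms the sum bits. It evaluates the equations with loops, so it shows the logic, not the parallel timing of the hardware; the function name is hypothetical.

```python
def cla_add_64(a, b, cin=0):
    """Add two unsigned 64-bit values; returns (64-bit sum, carry out C64)."""
    g = a & b                      # per-bit generate
    p = a ^ b                      # per-bit propagate
    block_carry = [cin]            # carries into blocks: Cin, C8, C16, ..., C64
    for k in range(8):
        gm, pm = 0, 1              # group generate / propagate of block k
        for bit in range(8 * k, 8 * k + 8):
            gi, pi = (g >> bit) & 1, (p >> bit) & 1
            gm = gi | (pi & gm)
            pm &= pi
        block_carry.append(gm | (pm & block_carry[k]))
    total = 0
    for k in range(8):
        c = block_carry[k]         # carry into the first bit of block k
        for bit in range(8 * k, 8 * k + 8):
            gi, pi = (g >> bit) & 1, (p >> bit) & 1
            total |= (pi ^ c) << bit
            c = gi | (pi & c)
    return total, block_carry[8]


print(cla_add_64(2**64 - 1, 1))
```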

134

Modified Booth-Wallace Tree Multipliers

135

Modified Booth-Wallace Tree Multipliers

• Uses 3:2 counters and 2:2 counters

• Number of levels: log (16/2) / log (3/2) ≈ 5.1 as an estimate; the reduction shown takes 6 levels

• Irregular structure

• Fast

• Less area

[Dot diagram: PP Dot Matrix of Booth-Wallace Multiplier for Signed Number, rearranged and then reduced in levels 1 to 6]

136

Twin pipe serial-parallel multipliers

[Figure: Block diagram of 32*32-bit signed twin pipe serial-parallel multiplier with serial/parallel conversion logic. Two parallel-in serial-out shift registers (Load/Shift, Reset, Clock) hold the even bits B30 B28 ... B2 B0 and the odd bits B31 B29 ... B3 B1. The 32-bit twin pipe serial-parallel multiplier unit takes A31 A30 ... A1 A0 and Sign. Two serial-in parallel-out shift registers collect the even product bits P62 P60 ... P2 P0 and the odd product bits P63 P61 ... P3 P1. Result_ready indicates completion.]

137

Signed twin pipe serial-parallel multipliers

[Figure: 32*32-bit twin pipe serial-parallel multiplier for signed number. Two pipes of full adders (FA), half adders (HA) and D flip-flops take A31 ... A0 in parallel. The even data bits (... B2 B0) and the odd data bits (... B3 B1) are shifted in serially, followed by zeros and reset; Clock, Reset, rising_edge and falling_edge control the pipes. A MUX combines the even product and the odd product into Product. The adder unit is repeated 28 more times. The “Sign” control line (B31 B29 ...) drives the sign-change hardware.]

138

Unsigned twin pipe serial-parallel multipliers

[Figure: 32*32-bit twin pipe serial-parallel multiplier for unsigned number. Two pipes of full adders (FA), half adders (HA) and D flip-flops take A31 ... A0 in parallel. The even data bits (... B2 B0) and the odd data bits (... B3 B1) are shifted in serially, followed by zeros and reset; Clock, Reset, rising_edge and falling_edge control the pipes. A MUX combines the even product and the odd product into Product. The adder unit is repeated 28 more times.]

• Does not need the “Sign” control line or the sign-change hardware
