integer multipliers - concordia university
1
Integer Multipliers
2
Multipliers
• A must-have circuit in most DSP applications
• A variety of multiplier architectures exists; the choice depends on the required performance
• Serial, Serial/Parallel, Shift and Add, Array, Booth, Wallace Tree, ...
XA
B
P
3
[Figure: 16x16 multiplier with converters and registers RA, RB, and RC, each with reset and en inputs]
4
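Multiplication Algorithm
X = X_{n-1} X_{n-2} ... X_0   Multiplicand
Y = Y_{n-1} Y_{n-2} ... Y_0   Multiplier

Row i holds the partial product X_i * Y and is shifted left by i bit positions:

$$
\begin{array}{lllll}
Y_{n-1}X_0 & Y_{n-2}X_0 & Y_{n-3}X_0 & \cdots & Y_1X_0 \quad Y_0X_0 \\
Y_{n-1}X_1 & Y_{n-2}X_1 & Y_{n-3}X_1 & \cdots & Y_1X_1 \quad Y_0X_1 \\
Y_{n-1}X_2 & Y_{n-2}X_2 & Y_{n-3}X_2 & \cdots & Y_1X_2 \quad Y_0X_2 \\
\vdots & & & & \\
Y_{n-1}X_{n-2} & Y_{n-2}X_{n-2} & Y_{n-3}X_{n-2} & \cdots & Y_1X_{n-2} \quad Y_0X_{n-2} \\
Y_{n-1}X_{n-1} & Y_{n-2}X_{n-1} & Y_{n-3}X_{n-1} & \cdots & Y_1X_{n-1} \quad Y_0X_{n-1}
\end{array}
$$

$$P = P_{2n-1} P_{2n-2} P_{2n-3} \ldots P_2 P_1 P_0 = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} X_i Y_j 2^{i+j}$$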
5
1. Multiplication Algorithms
Implementing the multiplication of binary numbers boils down to how the additions are done.
Consider two 8-bit numbers A and B that generate the 16-bit product P: first generate the 64 partial-product bits, then add them up.
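The two steps above (generate the partial-product bits, then add them) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the operands 0x89 and 0xAB are borrowed from the simulated example that appears later in these slides.

```python
def partial_products(a, b, n=8):
    """Return the n x n matrix of partial-product bits; row i is a AND-ed with bit i of b."""
    return [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]


def sum_partial_products(pp):
    """Add every bit pp[i][j] with its weight 2**(i + j)."""
    return sum(bit << (i + j) for i, row in enumerate(pp) for j, bit in enumerate(row))


pp = partial_products(0x89, 0xAB)
print(sum(len(row) for row in pp))     # 64 partial-product bits for 8 x 8
print(hex(sum_partial_products(pp)))   # 0x5b83
```

How the 64 bits are added (serially, in an array, or in a tree) is what distinguishes the multiplier architectures in the rest of the lecture.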
6
Multiplier Design
[Block diagram: input register (REG IN), multiplier unit (MU), output register (REG OUT), control unit, storage]
7
Serial Multiplier
[Figure, animated over 21 frames (Slide 1 to Slide 21): gate G1 forms the product bit x_i*y_j, which a 1-bit adder (+) with a 1-bit carry register (1-bit REG) adds to the bit fed back from the serial register through gate G2; clock signals CLK and CLK/(N+1)]
X: x3x2x1x0   Y: y3y2y1y0
Input Sequence for G1 (the rightmost entry is applied first):
0 0 x3x2x1x0 0 x3x2x1x0 0 x3x2x1x0 0 x3x2x1x0
0 0 y3y3y3y3 0 y2y2y2y2 0 y1y1y1y1 0 y0y0y0y0
Reset: 0 1 0000 1 0000 1 0000 1 0000
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
Frames (inputs to G1, Reset value, and the signals labelled on each slide):
Slides 1 to 4: x0, x1, x2, x3 applied with y0; Reset=0; products x0y0, x1y0, x2y0, x3y0
Slide 5: inputs 0; Reset=1; S0
Slides 6 to 9: x0 ... x3 applied with y1; Reset=0; S1 and C1, S20 and C20, S30 and C30, S40 and C40
Slide 10: inputs 0; Reset=1; S50, C50=0; S1
Slides 11 to 14: x0 ... x3 applied with y2; Reset=0; S2 and C21, S31 and C31, S41 and C41, S51 and C51
Slide 15: inputs 0; Reset=1; S60, C60=0; S2
Slides 16 to 19: x0 ... x3 applied with y3; Reset=0; S3 and C32, S4 and C42, S5 and C52, S6 and C61
Slide 20: inputs 0; Reset=1; S7; S3
Slide 21: inputs 0; Reset=0; S0 ... S7 available
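A behavioral sketch of the serial scheme walked through above may help. It is not a gate-accurate model of the slide circuit; it assumes one AND gate, one 1-bit adder with a carry flip-flop, and an N-bit serial register holding the running partial sum, with one extra clock per multiplier bit to flush the carry (the Reset=1 frames). The function name and the cycle counter are hypothetical.

```python
def serial_multiply(x, y, n=4):
    """Bit-serial multiply of two n-bit unsigned numbers; returns (product, clock count)."""
    acc = [0] * n      # serial register: running partial sum, LSB first
    result = []        # final product bits S0, S1, ... (LSB first)
    cycles = 0
    for j in range(n):                 # one pass per multiplier bit y_j
        yj = (y >> j) & 1
        carry = 0                      # carry flip-flop cleared at the start of a pass
        new_acc = []
        for i in range(n):             # n clocks: add x_i * y_j to the stored sum bit
            pp = ((x >> i) & 1) & yj   # AND gate
            total = pp + acc[i] + carry
            new_acc.append(total & 1)
            carry = total >> 1
            cycles += 1
        new_acc.append(carry)          # extra clock: flush the carry into the register
        cycles += 1
        result.append(new_acc[0])      # lowest bit of this pass is a final result bit
        acc = new_acc[1:]              # remaining n bits feed the next pass
    result.extend(acc)                 # upper half of the product
    product = sum(bit << k for k, bit in enumerate(result))
    return product, cycles


print(serial_multiply(0b1011, 0b1101))
```

Under these assumptions the multiplication takes N(N+1) clocks, which is why the serial multiplier is the smallest but slowest option.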
29
Serial / Parallel Multiplier
[Figure, animated over 8 frames (Slide 1 to Slide 8): y0 ... y3 are applied in parallel, x is applied serially (x0 first, followed by zeros), and a chain of adders (+) and D flip-flops (D) produces one result bit per clock]
Si: the ith bit of the final result
Ci: the only carry from column i
Sij: the jth partial sum for column i
Cij: the jth partial carry from column i
Frames (serial input and the signals labelled on each slide):
Slide 1: input x0; output S0
Slide 2: input x1; output S1; carry C1
Slide 3: input x2; output S2; partial sum S20; carries C20, C21
Slide 4: input x3; output S3; partial sums S30, S31; carries C30, C31, C32
Slide 5: input 0; output S4; partial sums S40, S41; carries C40, C41, C42
Slide 6: input 0; output S5; partial sum S51; carries C50, C51
Slide 7: input 0; output S6; carry C6
Slide 8: input 0; output S7
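The serial/parallel scheme above can be modelled with one adder cell per multiplier bit, each with a sum flip-flop and a carry flip-flop. The sketch below is a behavioral model under that assumption (the exact number of adders and D flip-flops in the slide circuit may differ); the function name is hypothetical.

```python
def serial_parallel_multiply(x, y, n=4):
    """y is applied in parallel, x serially (LSB first, then zeros); one product bit per clock."""
    s = [0] * (n + 1)   # sum flip-flops; s[n] is a constant 0 input to the last cell
    c = [0] * n         # carry flip-flops
    out = []            # product bits S0 ... S(2n-1)
    for t in range(2 * n):
        xt = (x >> t) & 1 if t < n else 0      # zeros are fed after the last x bit
        new_s = [0] * (n + 1)
        new_c = [0] * n
        for k in range(n):
            pp = xt & ((y >> k) & 1)           # AND gate of cell k
            total = pp + s[k + 1] + c[k]       # full adder: partial product, neighbour sum, own carry
            new_s[k] = total & 1
            new_c[k] = total >> 1
        s, c = new_s, new_c                    # clock edge: flip-flops capture the new values
        out.append(s[0])                       # cell 0 delivers the next result bit
    return sum(bit << t for t, bit in enumerate(out))


print(serial_parallel_multiply(0b1011, 0b1101))
```

In this model the product needs 2N clocks, compared with N(N+1) for the fully serial scheme.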
37
Shift and Add Multiplier
[Block diagram: inputs Ain (7 downto 0) and Bin (7 downto 0), registers REGA, REGB, and REGC, a MUX with a 0 input, an 8-bit adder, CLOCK, and outputs Result (15 downto 8) and Result (7 downto 0)]
38
Synchronous Shift and Add Multiplier Controller Design
Multiplication process:
5 states: Idle, Init, Test, Add, and Shift&Count.
Idle: The process starts when the Start signal is received;
Init: The multiplicand and the multiplier are loaded into a load register and a shift register, respectively;
Test: The LSB of the shift register that holds the multiplier is tested to decide the next state;
Add: If the LSB is ‘1’, the new partial product is added to the accumulated result, and the state machine moves to the Shift&Count state;
Shift&Count: If the LSB is ‘0’ (or once Add has finished), the two shift registers shift their contents one bit to the right and the counter counts up by one step. After that, the state machine returns to the Test state;
When the counter reaches N, a Stop signal is asserted and the state machine goes to the Idle state;
Idle: In the Idle state, a Done signal is asserted to indicate the end of the multiplication.
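The five-state controller described above can be sketched as a small state machine. This is a behavioral model under stated assumptions: the register names (m, acc, q, carry, count) and the function name are hypothetical, and the Start and Done handshakes are reduced to entering Init and returning from Idle.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    INIT = auto()
    TEST = auto()
    ADD = auto()
    SHIFT_COUNT = auto()


def shift_add_fsm(multiplicand, multiplier, n=4):
    """Run the controller once; returns (product, list of visited state names)."""
    mask = (1 << n) - 1
    state = State.INIT                 # Start was received while in Idle
    m = acc = q = carry = count = 0
    visited = []
    while True:
        visited.append(state.name)
        if state is State.INIT:        # load the operands, clear accumulator and counter
            m, q, acc, carry, count = multiplicand, multiplier, 0, 0, 0
            state = State.TEST
        elif state is State.TEST:      # examine the LSB of the multiplier register
            state = State.ADD if q & 1 else State.SHIFT_COUNT
        elif state is State.ADD:       # accumulate the new partial product
            total = acc + m
            acc, carry = total & mask, total >> n
            state = State.SHIFT_COUNT
        elif state is State.SHIFT_COUNT:
            q = (q >> 1) | ((acc & 1) << (n - 1))    # accumulator LSB moves into the multiplier register
            acc = (acc >> 1) | (carry << (n - 1))    # carry moves into the accumulator MSB
            carry = 0
            count += 1
            state = State.IDLE if count == n else State.TEST   # Stop when the counter reaches N
        else:                          # IDLE: Done is asserted
            return (acc << n) | q, visited


product, visited = shift_add_fsm(0b1011, 0b1101)
print(product, visited)
```

A multiplier bit of 1 costs three states (Test, Add, Shift&Count) and a bit of 0 costs two, so the run time depends on the number of ones in the multiplier.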
40
n-bit Multiplier:
[Block diagram: multiplicand register, n-bit adder, shift-and-add control logic (Add and Shift Right signals), carry bit C, register A (An-1 ... A1 A0), and multiplier register Q (Qn-1 ... Q1 Q0)]
Q0=1: The multiplicand is added to register A and the result is stored in register A; registers C, A, Q are then shifted to the right by one bit
Q0=0: Registers C, A, Q are shifted to the right by one bit
41
Example: 4-bit Multiplier
[Figure, animated over 9 frames: 4-bit adder, shift-and-add control logic, multiplicand register, and registers C, A, and Q]
Multiplicand = 1011, Multiplier = 1101

Step                  Add  Shift Right  C  A     Q
Initial Values        -    -            0  0000  1101
First Cycle, Add      1    0            0  1011  1101
First Cycle, Shift    0    1            0  0101  1110
Second Cycle, Shift   0    1            0  0010  1111
Third Cycle, Add      1    0            0  1101  1111
Third Cycle, Shift    0    1            0  0110  1111
Fourth Cycle, Add     1    0            1  0001  1111
Fourth Cycle, Shift   0    1            0  1000  1111

The product is held in A and Q: 10001111 (11 * 13 = 143).
49
4*4 Synchronous Shift and Add Multiplier Design: Layout Design
[Figure: floor plan of the 4*4 synchronous shift and add multiplier]
XA
B
P
50
Comparison between Synchronous and Asynchronous Approaches
XA
B
P
51
Example (simulated by Ovais Ahmed):
Multiplicand = 10001001_2 = 89_16
Multiplier = 10101011_2 = AB_16
Expected Result = 101101110000011_2 = 5B83_16
XA
B
P
52
Array Multiplier
Regular structure based on the add-and-shift algorithm.
Addition is mainly done with carry-save adders.
Sign-bit extension results in a higher capacitive load and slows down the circuit.
53
Addition with CLA
[Figure: 4*4 multiplier built from three four-bit adders chained through Cin and Cout; inputs A = a3a2a1a0 and B = b3b2b1b0, output Product (A*B)]
54
Array Multiplier with CSA
[Figure: 4*4 carry-save array of 12 full adders (F.A, with outputs Ci and Si) fed by the partial-product bits P00 ... P33 and producing the result bits R0 ... R7]
** Pij = Ai Bj, 0 ≤ i ≤ 3, 0 ≤ j ≤ 3
Total of 16 gates (one gate per Pij)
55
Critical Path with Array Multipliers
[Figure: three rows of adder cells (HA FA FA FA) showing two of the possible critical paths for the ripple-carry based 4*4 multiplier]
Area = (N*N) AND gates + (N-1)N full adders
Delay = τ_HA + (2N-1) τ_FA
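The area and delay expressions above can be checked against a small bit-level model. The sketch below builds an N*N ripple-carry array multiplier from AND gates and adder cells, counts the cells, and evaluates the delay expression. It assumes every adder cell is counted as a full adder (as the area formula does), and the unit delays passed to array_delay are placeholders, not measured values.

```python
def full_adder(a, b, cin):
    """One adder cell: returns (sum, carry)."""
    total = a + b + cin
    return total & 1, total >> 1


def array_multiply(x, y, n=4):
    """Unsigned n x n array multiplier; returns (product, AND-gate count, adder-cell count)."""
    and_gates = adder_cells = 0
    row = []
    for i in range(n):                       # first partial product x * y_0 needs no adders
        row.append(((x >> i) & 1) & (y & 1))
        and_gates += 1
    bits = [row[0]]                          # P0
    acc = row[1:] + [0]                      # n bits carried into the next row
    for j in range(1, n):                    # rows 1 .. n-1 each use n adder cells
        carry = 0
        new = []
        for i in range(n):
            pp = ((x >> i) & 1) & ((y >> j) & 1)
            and_gates += 1
            s, carry = full_adder(acc[i], pp, carry)   # carry ripples along the row
            adder_cells += 1
            new.append(s)
        new.append(carry)
        bits.append(new[0])                  # one more final product bit per row
        acc = new[1:]
    bits.extend(acc)
    return sum(b << k for k, b in enumerate(bits)), and_gates, adder_cells


def array_delay(n, t_ha=1.0, t_fa=1.0):
    """Critical-path estimate from the slide: t_HA + (2N - 1) * t_FA."""
    return t_ha + (2 * n - 1) * t_fa


print(array_multiply(0b1011, 0b1101))   # (143, 16, 12)
print(array_delay(4))                   # 8.0 with unit delays
```

For N = 4 the model uses 16 AND gates and 12 adder cells, matching (N*N) and (N-1)N.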
XA
B
P
56
XA
B
P
57
Wallace Tree
[Figure: Wallace-tree reduction of the 25 partial-product bits xiyj (i, j = 0 ... 4) of a 5*5 multiplier through layers of adders (+) to the product bits P0 ... P9]
58
Array Multiplier + Wallace Tree
XA
B
P
59
9/25/2017 Concordia VLSI Lab
Baugh-Wooley Algorithm
Convert negative partial products to positive representation
• No sign-extension required

$$X \cdot Y = \Bigl(-x_{k-1}2^{k-1} + \sum_{i=0}^{k-2} x_i 2^i\Bigr)\Bigl(-y_{k-1}2^{k-1} + \sum_{i=0}^{k-2} y_i 2^i\Bigr)$$

$$= x_{k-1}y_{k-1}2^{2k-2} + \sum_{i=0}^{k-2}\sum_{j=0}^{k-2} x_i y_j 2^{i+j} - 2^{k-1}\sum_{i=0}^{k-2} x_{k-1} y_i 2^i - 2^{k-1}\sum_{i=0}^{k-2} y_{k-1} x_i 2^i$$
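A short Python check can make the idea of replacing the negative partial products concrete. The sketch below follows the 5-by-5 partial-product layout shown elsewhere in these slides (terms such as a4b0', a3'b4, a4', b4', the correction bits a4 and b4, and a constant 1), generalized to n bits; the function name is hypothetical, and the generalization beyond n = 5 is an assumption.

```python
def baugh_wooley(a, b, n=5):
    """a, b: n-bit two's complement bit patterns. Returns the 2n-bit product pattern,
    built only from non-negative partial-product bits."""
    def bit(v, i):
        return (v >> i) & 1

    a_s, b_s = bit(a, n - 1), bit(b, n - 1)          # sign bits
    total = (a_s & b_s) << (2 * n - 2)               # sign-by-sign term
    for i in range(n - 1):
        for j in range(n - 1):
            total += (bit(a, i) & bit(b, j)) << (i + j)      # ordinary terms ai*bj
    for i in range(n - 1):
        total += (a_s & (1 - bit(b, i))) << (n - 1 + i)      # a_sign * bi'
        total += (b_s & (1 - bit(a, i))) << (n - 1 + i)      # ai' * b_sign
    total += ((1 - a_s) + (1 - b_s)) << (2 * n - 2)          # a_sign' and b_sign'
    total += (a_s + b_s) << (n - 1)                          # correction bits
    total += 1 << (2 * n - 1)                                # constant 1 in the top column
    return total & ((1 << (2 * n)) - 1)


# 13 * (-5) = -65, which is 1110111111 as a 10-bit two's complement pattern
print(format(baugh_wooley(0b01101, 0b11011), "010b"))
```

Every term added is a plain AND of (possibly complemented) bits or a constant, so the array needs no subtractors and no sign extension.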
60
Example of a 5-by-5 Baugh-Wooley multiplier
[Figure: array of full adders (FA) fed by the partial-product bits aibj, the complemented terms a4b0', a4b1', a4b2', a4b3', a0'b4 ... a3'b4, a4', b4', the bits a4 and b4, and a constant 1, producing P9 ... P0]
The schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two’s complement array multiplier
61
Squarer using Baugh-Wooley Algorithm

  a7 a6 a5 a4 a3 a2 a1 a0
* a7 a6 a5 a4 a3 a2 a1 a0
------------------------------------------------------
a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 a0*a0
a7*a1 a6*a1 a5*a1 a4*a1 a3*a1 a2*a1 a1*a1 a0*a1
a7*a2 a6*a2 a5*a2 a4*a2 a3*a2 a2*a2 a1*a2 a0*a2
a7*a3 a6*a3 a5*a3 a4*a3 a3*a3 a2*a3 a1*a3 a0*a3
a7*a4 a6*a4 a5*a4 a4*a4 a3*a4 a2*a4 a1*a4 a0*a4
a7*a5 a6*a5 a5*a5 a4*a5 a3*a5 a2*a5 a1*a5 a0*a5
a7*a6 a6*a6 a5*a6 a4*a6 a3*a6 a2*a6 a1*a6 a0*a6
a7*a7 a6*a7 a5*a7 a4*a7 a3*a7 a2*a7 a1*a7 a0*a7
------------------------------------------------------
Row i is shifted left by i bit positions. Each product ai*aj with i ≠ j appears twice, so the pair folds into a single term one position to the left, and ai*ai = ai. The first row of the folded array is:
a7*a6 a7*a5 a7*a4 a7*a3 a7*a2 a7*a1 a7*a0 a6*a0 a5*a0 a4*a0 a3*a0 a2*a0 a1*a0 ‘0' a0
62
Example of an 8-bit squarer
[Figure: folded partial-product array of the 8-bit squarer, with the pair terms a1a0 ... a7a6, the diagonal bits a0 ... a7, and ‘0’ fill bits, reduced to the result bits S0 ... S15]
63
Array Multiplier
32bits by 32bits multiplier
XA
B
P
64
Booth (Radix-4) Multiplier
Radix-4 (3-bit recoding) halves the number of partial products to be added.
Great saving in area and increased speed.
$$A = -a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + a_{n-3}2^{n-3} + \cdots + a_1 2 + a_0$$
$$B = -b_{n-1}2^{n-1} + b_{n-2}2^{n-2} + b_{n-3}2^{n-3} + \cdots + b_1 2 + b_0$$
· The base-4 redundant signed-digit representation of B is
$$B = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i$$
65
K_i is calculated by the following equation:
$$K_i = -2b_{2i+1} + b_{2i} + b_{2i-1}, \qquad i = 0, 1, 2, \ldots, (n-2)/2$$
Three bits of the multiplier B, b_{2i+1}, b_{2i}, and b_{2i-1}, are examined and the corresponding K_i is calculated.
B is always appended on the right with a zero (b_{-1} = 0), and n is always even (B is sign-extended if needed).
The product AB is then obtained by adding n/2 partial products:
$$AB = P = \sum_{i=0}^{(n/2)-1} 2^{2i} K_i A$$
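The recoding rule for K_i and the sum of n/2 partial products translate directly into code. The following is a minimal sketch; the function names are hypothetical, and Python integers stand in for the sign-extended hardware words.

```python
def booth_digits(b, n):
    """b: n-bit two's complement bit pattern, n even. Returns [K_0, ..., K_(n/2 - 1)]."""
    assert n % 2 == 0, "sign-extend B by one bit if n is odd"
    padded = b << 1                            # append b_(-1) = 0 on the right
    digits = []
    for i in range(n // 2):
        b_prev = (padded >> (2 * i)) & 1       # b_(2i-1)
        b_cur = (padded >> (2 * i + 1)) & 1    # b_(2i)
        b_next = (padded >> (2 * i + 2)) & 1   # b_(2i+1)
        digits.append(-2 * b_next + b_cur + b_prev)
    return digits


def booth_multiply(a, b, n):
    """a, b: signed integers that fit in n bits. Adds the n/2 partial products 2**(2i) * K_i * A."""
    digits = booth_digits(b & ((1 << n) - 1), n)
    return sum((k * a) << (2 * i) for i, k in enumerate(digits))


print(booth_digits(0b011101, 6))    # [1, -1, 2], i.e. +2 -1 +1 read from the MSB side
print(booth_multiply(3, 29, 6))     # 87
```

Each digit K_i is in {-2, -1, 0, +1, +2}, so every partial product is obtained from A by at most a shift and a negation.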
66
Booth Algorithm
Decoding of the multiplier to generate signals for hardware use

Xi+1  Xi  Xi-1   OP  NEG  ZERO  TWO
0     0   0      0   0    1     0
1     0   0      2   1    0     1
0     1   0      1   0    0     0
1     1   0      1   1    0     0
0     0   1      1   0    0     0
1     0   1      1   1    0     0
0     1   1      2   0    0     1
1     1   1      0   1    1     0
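The control signals in the table above follow from the recoded digit. The sketch below is one way to derive them, inferred from the table rows rather than from a stated gate-level design: NEG equals Xi+1, ZERO marks a digit of 0, and TWO marks a digit of magnitude 2. The function name is hypothetical.

```python
def booth_decode(x_next, x_cur, x_prev):
    """Return the OP, NEG, ZERO, and TWO signals for the bit triple (Xi+1, Xi, Xi-1)."""
    digit = -2 * x_next + x_cur + x_prev       # recoded digit in {-2, ..., +2}
    return {
        "OP": abs(digit),                      # magnitude of the multiple of the multiplicand
        "NEG": x_next,                         # set whenever the MSB of the triple is 1
        "ZERO": int(digit == 0),
        "TWO": int(abs(digit) == 2),
    }


for triple in [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1)]:
    print(triple, booth_decode(*triple))
```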
XA
B
P
67
Booth Algorithm
A Booth-recoded multiplier examines three bits of the multiplier at a time.
It determines whether to add 0, +1, -1, +2, or -2 times the multiplicand at that rank.
The operation to be performed is based on the current two bits of the multiplier and the previous bit.

Xi+1  Xi  Xi-1   Zi/2
0     0   0       0
0     0   1       1
0     1   0       1
0     1   1       2
1     0   0      -2
1     0   1      -1
1     1   0      -1
1     1   1       0
XA
B
P
68
BIT (weights 2^1, 2^0, 2^-1)
Xi  Xi+1  Xi+2   OPERATION                                                 M is multiplied by
0   0     0      add zero (no string)                                      +0
0   0     1      add the multiplicand (end of string)                      +X
0   1     0      add the multiplicand (a string)                           +X
0   1     1      add twice the multiplicand (end of string)                +2X
1   0     0      subtract twice the multiplicand (beginning of string)     -2X
1   0     1      subtract the multiplicand (-2X and +X)                    -X
1   1     0      subtract the multiplicand (beginning of string)           -X
1   1     1      subtract zero (center of string)                          -0
69
Booth Algorithm - dot notation
Multiplicand A =              ● ● ● ●
Multiplier B =                (●●)(●●)
Partial product bits          ● ● ● ●      (B1B0)_2 * A * 4^0
Partial product bits      ● ● ● ●          (B3B2)_2 * A * 4^1
Product P =           ● ● ● ● ● ● ● ●
XA
B
P
70
Example
The following example shows how the calculation is carried out.
Multiplicand X = 000011
Multiplier Y = 011101, extended to 0 1 1 1 0 1 0 (a 0 is added to the right of the multiplier)
After Booth decoding, Y is recoded so that X is multiplied by +2, -1, and +1. Each partial product is shifted two bits to the left of the previous one, and the partial products are then added together.
X * +1   000000000011
X * -1   1111111101
X * +2   00000110
--------------------------------------------
         000001010111
71
Sign Extension
XA
B
P
72
Sign extension
Traditional sign-extension scheme
• Segment the input operands based on the size of the embedded blocks
• Multiply the segmented inputs and extend the sign bit of each partial product
• Sum all partial products
[Figure: the segmented input operands are multiplied (×), the sign of each partial product is extended, and the partial products are summed (+) into the final result]
73
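Booth Algorithm - Example 1
Multiplier:   011101 (+29), with a 0 appended on the right, recoded as +2 -1 +1
Multiplicand: 000011 (+3)
X * +1   000000000011
X * -1   1111111101
X * +2   00000110
--------------------------------------------
         000001010111 (+87)
74
Booth Algorithm - Example 2
Multiplier:   011101 (+29), with a 0 appended on the right, recoded as +2 -1 +1
Multiplicand: 111101 (-3)
X * +1   111111111101
X * -1   0000000011      (2's complement of the multiplicand)
X * +2   11111010
--------------------------------------------
         111110101001 (-87)
Notice the sign extensions.
75
Booth Algorithm - Example 3
Multiplier:   100011 (-29), with a 0 appended on the right, recoded as -2 +1 -1
Multiplicand: 111101 (-3)
X * -1   000000000011
X * +1   1111111101
X * -2   00000110        (shifted 2's complement)
--------------------------------------------
         000001010111 (+87)
Notice the sign extensions.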
76
Comparison of Booth and parallel
multiplier shift and Add
XA
B
P
77
Template to reduce sign extensions for Booth Algorithm
Please note that each operand is 17 bits wide, i.e. the 17th bit is the sign bit. Negative numbers are entered in 1’s complement, which is why the S bits must be added on the right-hand side of the diagram. If 2’s complement is used, the S bits on the right-hand side of the diagram can be removed.
78
Comparison of Template and the sign extension
[Figure: two layouts of the partial products of A * B = P, labelled "Sign template" and "Sign extension". In the sign-extension layout each sign bit (S1, S2, S3, S4) is repeated across the higher bit positions of its partial product; in the sign-template layout only a few sign-derived bits (S1, S2, S3 and a constant 1) are attached to each partial product]
79
Using the Template 25 * -35
Sign bit
0 0 0 1 1 0 0 1
Add SS 1 1 0 1 1 1 0 1 0
Add inverted S
Add Inverted sign and add 1
1 0 0 0 0 0 1 1 0 0 1 * 1
Add Inverted sign bit 1 0 1 1 1 0 0 1 1 1 * -1
1 0 0 1 1 0 0 1 0 * 2
No sign bit 1 1 0 0 1 1 1 * -1
1 1 1 1 0 0 1 0 0 1 0 1 0 1
The result is a negative number. Taking its 2's complement gives the magnitude:
0 0 0 0 1 1 0 1 1 0 1 0 1 1
512 + 256 + 64 + 32 + 8 + 2 + 1 = 875
Example of using the template: 25 * (-35), with -35 as the multiplier, using an 8-bit representation
XA
B
P
80
Booth Multiplier Components
Multiplier
Multiplicand
Booth Encoder
PPU (Partial products unit)
PPA(Partial products adding unit)
Product
XA
B
P
81
Wallace Tree and Ripple Carry Adder Structure of 8*8 Multiplier with Pipeline
[Figure: Wallace tree reducing the partial products PP0, PP1, PP2 (15 downto 0) and PP3 (15 downto 0), followed by a pipeline register and a ripple carry adder producing P0 ... P16; the critical path is marked]
82
Hardware implementation of Booth with shift and add
[Block diagram: an FSM (signals Start, Stop, Init, Shift, Doubleshift, Mulbegin, Mulend, Finish, CLK, CLR), shift registers reg_2left32 and reg2right17, 2s complement and *2 (shifter) blocks, a 4-to-1 32-bit multiplexer (mux4-32, controls ctrl0 and ctrl1), sign expansion, a 37-bit Adder, Mux37, Register37, Counter20, and an endcheck block; inputs A and B, output Result]
83
Simulation Plan
[Block diagram: 32-bit Signal Generator A and 32-bit Signal Generator B drive A[31:0] and B[31:0] into both the Behavioral Multiplier (A * B, output P[63:0]) and My Multiplier (output My_P[63:0]); a 64-bit Comparator reports Result and Failed Number]
My Multiplier is one of:
Array Multiplier
Modified Booth Multiplier
Wallace Tree Multiplier
Modified Booth-Wallace Tree Multiplier
Twin Pipe Serial-Parallel Multiplier
84
Testing the DesignX
A
B
P
85
Simulation For Parallel MultipliersXA
B
P
Signed
Number:
Unsigned
Number:
86
Simulation For Signed S/P Multipliers
There is a 340 ns delay between the operands and the result because of the delay of the D flip-flops.
87
FPGA after implementation, with the programmed areas shown clearly
88
Another implementation of the above after pipelining; place and route has placed the design in different locations.
XA
B
P
89
Spartacus FPGA board
XA
B
P
90
Testing the multiplication system
XA
B
P
91
Comparison of Multipliers
Table 7. Performance comparison for two’s complement multipliers By Chen Yaoquan, M.Eng. 2005

                                     Array       Modified    Wallace-    Modified Booth-  Twin Pipe          Behavioral
                                     Multiplier  Booth       Tree        Wallace Tree     Serial-Parallel    Multiplier
                                                 Multiplier  Multiplier  Multiplier       Multiplier
Area, Total CLB’s (#)                3076.50     2649.50     3325.50     2672.50          490.00             2993.50
Maximum Delay D (ns)                 35.78       24.43       18.93       18.53            107.52 (3.36x32)   49.33
Total Dynamic Power P (W)            7.52        6.33        7.46        6.41             0.28               6.24
Delay·Power Product DP (ns W)        268.98      154.64      141.14      118.76           30.62              307.58
Area·Power Product AP (# W)          23128.20    16771.60    24793.93    17127.79         139.54             18665.07
Area·Delay Product AD (# ns)         1.10E+05    6.47E+04    6.30E+04    4.95E+04         5.27E+04           1.48E+05
Area·Delay^2 Product AD2 (# ns^2)    3.94E+06    1.58E+06    1.19E+06    9.18E+05         5.66E+06           7.28E+06
92
Comparison of Multipliers
Table 7. Performance comparison for Unsigned multipliers By Chen Yaoquan, M.Eng. 2005

                                     Array       Modified    Wallace-    Modified Booth-  Twin Pipe          Behavioral
                                     Multiplier  Booth       Tree        Wallace Tree     Serial-Parallel    Multiplier
                                                 Multiplier  Multiplier  Multiplier       Multiplier
Area, Total CLB’s (#)                3280.50     2800.00     3321.50     2845.50          487.00             3003.00
Maximum Delay D (ns)                 37.23       25.33       18.93       18.33            107.52             44.50
Total Dynamic Power P (W)            7.57        6.66        7.32        6.66             0.29               6.26
Delay·Power Product DP (ns W)        281.88      168.77      138.60      122.13           30.66              278.53
Area·Power Product AP (# W)          24837.98    18656.40    24319.36    18959.57         138.89             18795.78
Area·Delay Product AD (# ns)         1.22E+05    7.09E+04    6.29E+04    5.22E+04         5.24E+04           1.34E+05
Area·Delay^2 Product AD2 (# ns^2)    4.55E+06    1.80E+06    1.19E+06    9.56E+05         5.63E+06           5.95E+06
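The derived rows of the table (DP, AP, AD, AD2) are simple products of the three measured quantities. The sketch below recomputes them from the area, delay, and power values printed in the unsigned-multiplier table. Because it starts from the rounded table entries, the DP and AP results can differ slightly from the published ones; the dictionary and function names are hypothetical.

```python
# (area in CLBs, maximum delay in ns, dynamic power in W), copied from the unsigned table
MEASURED = {
    "Array": (3280.50, 37.23, 7.57),
    "Modified Booth": (2800.00, 25.33, 6.66),
    "Wallace Tree": (3321.50, 18.93, 7.32),
    "Modified Booth-Wallace Tree": (2845.50, 18.33, 6.66),
    "Twin Pipe Serial-Parallel": (487.00, 107.52, 0.29),
    "Behavioral": (3003.00, 44.50, 6.26),
}


def figures_of_merit(area, delay, power):
    """Return the four derived metrics used in the comparison tables."""
    return {
        "DP": delay * power,
        "AP": area * power,
        "AD": area * delay,
        "AD2": area * delay ** 2,
    }


for name, (area, delay, power) in MEASURED.items():
    fom = figures_of_merit(area, delay, power)
    print(f"{name:28s} DP={fom['DP']:8.2f}  AD={fom['AD']:.2E}  AD2={fom['AD2']:.2E}")

best = min(MEASURED, key=lambda name: figures_of_merit(*MEASURED[name])["AD2"])
print("lowest AD2:", best)
```

Which design counts as best depends on the metric: the serial-parallel multiplier has by far the lowest power and DP, while the Modified Booth-Wallace Tree multiplier has the lowest AD2.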
93
Comparison of Multipliers
[Figure: the relation of Area and Delay for the behavioral multiplier, the "banana curve"; Area (#) from 2950 to 3250 plotted against Delay (ns) from 0 to 80]
Change the value of “set_max_delay” in the script file (ns):

set_max_delay (ns)  0        10       20       30       40       50       60       >60
Area (#)            3014.5   3013.0   3110.0   3193.5   3019.5   2999.5   2978.5   2978.5
Power (W)           6.6499   6.6470   7.5683   8.1878   8.0645   8.0419   8.0156   8.0156
Delay (ns)          31.98    31.98    30.93    30.08    39.93    49.88    59.63    59.63
94
Comparison of Multipliers
By Chen Yaoquan, M.Eng. 2005

                    Array       Modified    Wallace-      Modified Booth-  Twin Pipe         Behavioral
                    Multiplier  Booth       Tree          Wallace Tree     Serial-Parallel   Multiplier
                                Multiplier  Multiplier    Multiplier       Multiplier
Area                Medium      Small       Large         Small            Smallest          Medium
Critical Delay      Medium      Fast        Very Fast     Fastest          Very Large        Large
Power Consumption   Large       Medium      Large         Medium           Smallest          Medium
Complexity          Simple      Complex     More Complex  More Complex     Simple            Simplest
Implement           Easy        Medium      Difficult     Difficult        Easy              Easiest
95
Pipelining Simulation
96
Synthesis for Signed Multipliers
[Figure panels: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]
97
Synthesis for Unsigned Multipliers
[Figure panels: Array, Modified Booth, Wallace Tree, Modified Booth-Wallace Tree, Twin Pipe S/P, Behavioral]
98
Conclusion
• Modified Booth and Wallace Tree are the best techniques for high-speed multiplication.
• Wallace Tree has the best performance, but it is hard to implement.
• Booth-algorithm-based multipliers have a lower area than the other parallel multipliers.
• For behavioral multipliers, the area increases as the delay decreases.
99
Comparison

                                     Array          Modified       Wallace Tree    Modified Booth   Twin Pipe
                                     Multiplier     Booth          Multiplier      & Wallace Tree   Serial-Parallel
                                                    Multiplier                     Multiplier       Multiplier
Area, Total CLB’s (#)                1165           1292           1659            1239             133
Maximum Delay (ns)                   187.87         139.41         101.14          101.43           22.58 (722.56)
Power Consumption at                 16.6506        23.136         30.95           30.862           2.089
highest speed (mW)                   (at 188 ns)    (at 140 ns)    (at 101.14 ns)  (at 101.43 ns)   (at 722.56 ns)
Delay Power Product DP (ns mW)       3128.15        3225.39        3130.28         3130.33          1509.42
Area Power Product AP (# mW)         19.397 x 10^3  29.891 x 10^3  51.346 x 10^3   38.238 x 10^3    277.837
Area Delay Product AD (# ns)         218.868 x 10^3 180.118 x 10^3 167.791 x 10^3  125.671 x 10^3   96.101 x 10^3
Area Delay^2 Product AD2 (# ns^2)    41.119 x 10^6  25.110 x 10^6  16.970 x 10^6   12.747 x 10^6    69.438 x 10^6
100
NOTICE
The rest of these slides are for extra information only
and are not part of the lecture
XA
B
P
101
Array Addition
102
Addition of 8 binary numbers using the Wallace tree principle
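As an illustration of adding eight numbers with the Wallace tree principle, the sketch below reduces the bit columns with 3:2 counters (full adders) until at most two bits remain per column, then performs one final carry-propagate addition. It is a simplified column-wise variant that uses full adders only; the function names, the operand width, and the sample operands are assumptions made for the example.

```python
def full_adder(a, b, c):
    """3:2 counter: three bits of equal weight in, a sum bit and a carry bit out."""
    total = a + b + c
    return total & 1, total >> 1


def wallace_sum(numbers, width):
    """Add unsigned `width`-bit numbers; returns (sum, number of reduction levels)."""
    cols = [[(v >> k) & 1 for v in numbers] for k in range(width)]   # cols[k]: bits of weight 2**k
    levels = 0
    while max(len(col) for col in cols) > 2:
        new = [[] for _ in range(len(cols) + 1)]      # one extra column for carries
        for k, col in enumerate(cols):
            i = 0
            while len(col) - i >= 3:                  # compress each group of three bits
                s, c = full_adder(col[i], col[i + 1], col[i + 2])
                new[k].append(s)
                new[k + 1].append(c)                  # carry moves to the next column
                i += 3
            new[k].extend(col[i:])                    # leftover bits pass through unchanged
        cols = new
        levels += 1
    # final carry-propagate addition of the (at most) two remaining rows
    total = sum(bit << k for k, col in enumerate(cols) for bit in col)
    return total, levels


operands = [13, 7, 200, 91, 255, 0, 64, 129]
print(wallace_sum(operands, 8), sum(operands))
```

All counters within a level work in parallel, so the depth of the reduction grows roughly with the logarithm of the number of operands rather than linearly as in an array.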
103
104
105
[Block diagram: MULT32 block with inputs A, B, RESET, START, and CLK; Adder37 and REGSTER37 (D, CLK, CLR, Q) accumulate LAST_RESULT (37 bits) into RESULT (32 bits); COUNTER20, INVERTER, and AND_2 generate BEGIN0, END0, FINISH0, and Done]
106
Baugh-Wooley two's complement multiplier:
[Figure: the schematic logic circuit diagram of a 5-by-5 Baugh-Wooley two’s complement array multiplier]
107
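Example of Baugh-Wooley Two’s Complement Multiplication

                                   a4     a3     a2     a1     a0     A
X                                  b4     b3     b2     b1     b0     B
------------------------------------------------------------------------
                                   a4b0'  a3b0   a2b0   a1b0   a0b0
                            a4b1'  a3b1   a2b1   a1b1   a0b1
                     a4b2'  a3b2   a2b2   a1b2   a0b2
       a4b4   a4b3'  a3b3   a2b3   a1b3   a0b3
       a4'    a3'b4  a2'b4  a1'b4  a0'b4
       b4'                         a4
+
1                                  b4
------------------------------------------------------------------------
p9     p8     p7     p6     p5     p4     p3     p2     p1     p0     P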
1 1 1 0 1 1 1 1 1 1
0 0 1 0 0
= -65
X =13
= -5
0 1 1 0 1
1 1 0 1 1
1 0 0 0 0
0 1 0 1 1
0 0 1 0 1 1
0 1 0 1 1
+
1 1
1 0
0 0 0 1 0 0 0 0 0 1
1 0 0 0 0
= 65
X
=13
= 5
0 1 1 0 1
0 0 1 0 1
0 0 0 0 0
0 1 1 0 1
0 0 0 0 0 0
0 1 1 0 1
+
1 0
1 0
0 0 0 1 0 0 0 0 0 1
0 1 1 0 0
= 65
X
= -13
= -5
1 0 0 1 1
1 1 0 1 1
0 0 0 1 1
0 0 0 1 1
1 0 0 0 1 1
1 0 0 0 0
+
0 1
1 1
1 1 1 0 1 1 1 1 1 1
1 0 0 1 0
= -65
X
=13
= -5
0 1 1 0 1
1 1 0 1 1
0 1 1 0 1
0 1 1 0 1
0 0 1 1 0 1
0 0 0 0 0
+
0 0
1 1
108
Cluster MultipliersX
A
B
P
Divide the multiplier into smaller multipliers
109
Cluster Multipliers
[Block diagram: 8-bit cluster low power multiplier. The multiplier A and the multiplicand B are each split into 4-bit groups that feed four 4-bit multipliers; each 4-bit multiplier has clocked 8-bit latches (CLK, /CLR) and its own enable signal (EN0 ... EN3); a final addition stage produces the 16-bit product P. A separate circuit generates the enable signals]
110
Cluster Multipliers
• Dividing the multiplication circuit into clusters
(blocks) of smaller multipliers
• Applying clock gating techniques to disable the
blocks that are producing a zero result.
• Features
– Low Power (claims 13.4 % savings)
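The clustering idea can be sketched in software: split each 8-bit operand into two 4-bit halves, multiply the four pairs, and skip any block whose result is known to be zero. In the sketch below the skip stands in for clock gating; the enable condition (both block inputs non-zero) and the function names are assumptions, since the slides do not give the enable logic.

```python
def mul4(a, b):
    """Stand-in for one 4-bit x 4-bit multiplier block (8-bit result)."""
    return (a & 0xF) * (b & 0xF)


def cluster_multiply(a, b):
    """8-bit x 8-bit product from four 4-bit blocks; returns (product, number of active blocks)."""
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF
    blocks = [(a_lo, b_lo, 0), (a_hi, b_lo, 4), (a_lo, b_hi, 4), (a_hi, b_hi, 8)]
    product = 0
    active = 0
    for x, y, shift in blocks:
        enable = x != 0 and y != 0          # a block with a zero input produces a zero result
        if enable:                          # gated blocks are simply not evaluated
            product += mul4(x, y) << shift  # final addition stage
            active += 1
    return product, active


print(cluster_multiply(0x89, 0xAB))   # all four blocks active
print(cluster_multiply(0x09, 0x0B))   # only the low-by-low block is active
```

The saving comes from operands with small magnitude, where the high-order blocks stay disabled.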
XA
B
P
111
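Multiplexer-Based Array Multipliers
$$P = \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=1}^{n-1} Z_j 2^j$$
$$Z_j = x_j Y_j + y_j X_j, \qquad X_j = x_{j-1} x_{j-2} \ldots x_0, \quad Y_j = y_{j-1} y_{j-2} \ldots y_0$$
[Figure: triangular arrangement of the bits Z_j^i, from Z_1^0 up to Z_4^3, together with the terms x_j y_j]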
112
Multiplexer-Based Array Multipliers
Two types of cells:
Cell 1: produces the terms Zij 2^j and includes a full adder of the carry-save adder array
Cell 2: produces the terms xjyj 2^(2j) and includes a full adder of the carry-save adder array
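The decomposition behind the two cell types can be checked numerically. In the sketch below each Z_j = x_j*Y_j + y_j*X_j is chosen by a 4-to-1 selection on the bit pair (x_j, y_j), which is the multiplexer that gives the architecture its name; X_j and Y_j denote the j low-order bits of each operand. The function name is hypothetical, and the carry-save adder array is replaced by ordinary integer addition.

```python
def mux_multiply(x, y, n):
    """Unsigned n-bit product as sum(x_j*y_j*2**(2j)) + sum(Z_j*2**j)."""
    total = 0
    for j in range(n):
        xj, yj = (x >> j) & 1, (y >> j) & 1
        low_mask = (1 << j) - 1
        x_low, y_low = x & low_mask, y & low_mask        # X_j and Y_j: bits j-1 ... 0
        # 4-to-1 multiplexer indexed by (x_j, y_j): 00 -> 0, 01 -> X_j, 10 -> Y_j, 11 -> X_j + Y_j
        z_j = (0, x_low, y_low, x_low + y_low)[(xj << 1) | yj]
        total += ((xj & yj) << (2 * j)) + (z_j << j)
    return total


print(mux_multiply(0b1011, 0b1101, 4))
```

No recoding of the operands is needed, which is the property the characteristics list contrasts with Booth encoding.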
113
Multiplexer-Based Array Multipliers
• Characteristics
– Faster than Modified Booth
– Unlike Booth, does not require encoding logic
– Requires approximately N^2/2 cells
– Has a zigzag shape, thus not layout-friendly
XA
B
P
114
Multiplexer-Based Array Multipliers
• Improvement
– More rectangular layout
– Saves up to 40 percent area without penalties
– Outperforms the modified Booth multiplier in both
speed and power by 13% to 26%
115
Gray-Encoded Array Multiplier
Dec  Hyb     Dec  Hyb     Dec  Hyb     Dec  Hyb
 0   0000     4   0100    -8   1100    -4   1000
 1   0001     5   0101    -7   1101    -3   1001
 2   0011     6   0111    -6   1111    -2   1011
 3   0010     7   0110    -5   1110    -1   1010
• 2’s complement Hybrid Coding
– Consecutive values within each group of four differ in a single bit
– Reduces the number of transitions, and thus power (for
highly correlated streams).
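The table is consistent with Gray-coding each 2-bit group of the two's complement pattern separately. That rule is inferred from the sixteen table entries, not stated on the slide, so the Python sketch below should be read as an assumption that happens to reproduce the table; `hybrid_encode` is an illustrative name.

```python
def gray(value: int) -> int:
    """Standard binary-to-Gray conversion."""
    return value ^ (value >> 1)


def hybrid_encode(value: int, width: int = 4, group: int = 2) -> str:
    """Gray-code each `group`-bit slice of the two's-complement pattern."""
    pattern = value & ((1 << width) - 1)     # two's-complement bit pattern
    mask = (1 << group) - 1
    code = 0
    for shift in range(0, width, group):
        code |= gray((pattern >> shift) & mask) << shift
    return format(code, f"0{width}b")


for dec in (0, 1, 2, 3, 4, 7, -8, -5, -1):
    print(dec, hybrid_encode(dec))
```

Within a group of four consecutive values only the low 2-bit slice changes, one bit at a time; stepping from 3 to 4 changes both slices, so two bits flip there.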
116
Gray-Encoded Array Multiplier
An 8-bit wide 2’s complement radix-4 array multiplier
117
Gray-Encoded Array Multiplier
• Characteristics
– Uses Gray code to reduce the switching activity
of the multiplier
– Consumes 45.6% less power than Modified Booth
– Uses 26.4% more area than Modified Booth
118
Ultra-high Speed Parallel Multiplier
• How is ultra-high speed achieved?
– Based on the Modified Booth algorithm and a tree
structure (column compression)
– Chooses efficient counters (3:2 and 5:3)
– Uses a new compressor (20% faster)
– Uses the First Partial product Addition (FPA)
algorithm (reducing the width of the CLA by 50%)
119
Ultra-high Speed Parallel Multiplier
[Figure: calculation process using parallel counters for the 16x16 case, with the annotations below.]
– Calculate the partial products as soon as possible.
– The final CLA is only 16-bit instead of 32-bit.
– Divide into 3 rows or 5 rows only (most efficient).
– Overall, the delay is reduced by about 30%.
120
ULLRLF Multiplier
• ULLRLF stands for Upper/Lower Left-to-
Right Leapfrog.
• Combines the following techniques:
– Signal flow optimization in [3:2] adder array
for partial product reduction,
– Left-to-right leapfrog (LRLF) signal flow,
– Splitting of the reduction array into upper/lower
parts.
121
ULLRLF Multiplier
1) Signal flow optimization in [3:2] adder array
-- For n = 32, the delay is reduced by 30 percent.
-- Power is also saved.
PPij is always connected to pin A; Sin/Cin are connected to pins B/C,
and most Sin signals are connected to C.
122
ULLRLF Multiplier
2) Left-to-Right Leapfrog (LRLF) Structure
-- Signal delays are better balanced.
-- Low power.
The sum signals skip
over alternate rows.
123
ULLRLF Multiplier
3) Upper/Lower Split Structure
-- The long data path is broken into parallel short
paths, which saves power.
-- The delay of partial product reduction is reduced.
Only n+2 bits
124
ULLRLF Multiplier
Floorplan of ULLRLF (n = 32)
• ULLRLF multipliers consume
less power than optimized
tree multipliers for n ≤ 32 while
keeping similar delay and
area.
• With more regularity and
inherently shorter
interconnects, the ULLRLF
structure presents a
competitive alternative to tree
structures.
125
Signed Array Multiplier
[Figure: 32*32-Bit Array Multiplier for Signed Number. The multiplicand bits A31 ... A0 are combined with the multiplier bits B0, B1, B2, B3, ..., B31 row by row; the rows are summed by half adders (HA) and full adders (FA), and each row forms one stage of carry save adder. Stages 4 to 30 are repeated (each stage includes 32 AND gates, 31 full adders, 1 half adder and 1 NOT gate). The array is followed by a 32-bit carry look ahead adder. Outputs: P63 ... P0.]
126
Unsigned Array Multiplier
[Figure: 32*32-Bit Array Multiplier for Unsigned Number. The multiplicand bits A31 ... A0 are combined with the multiplier bits B0, B1, B2, B3, ..., B31 row by row; the rows are summed by half adders (HA) and full adders (FA), and each row forms one stage of carry save adder. Stages 4 to 30 are repeated (each stage includes 32 AND gates, 31 full adders and 1 half adder). The array is followed by a 32-bit carry look ahead adder. Outputs: P63 ... P0.]
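The carry-save structure of the unsigned array multiplier can be modelled at word level: each stage ANDs the multiplicand with one multiplier bit and feeds it into a 3-input carry-save stage, and a carry-propagate adder finishes the job. The Python sketch below is a simplification under stated assumptions: it works on whole words rather than individual HA/FA cells, handles unsigned operands only, and uses a full-width final addition instead of the 32-bit carry look ahead adder drawn on the slide. The function names are illustrative.

```python
def carry_save_add(a: int, b: int, c: int):
    """One carry-save stage: three words in, a sum word and a carry word out."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def array_multiply(a: int, b: int, width: int = 32) -> int:
    """Unsigned array multiplication with one carry-save stage per multiplier bit."""
    sum_word, carry_word = 0, 0
    for i in range(width):
        # AND gates: multiplicand gated by multiplier bit i, weighted by 2^i.
        row = (a << i) if (b >> i) & 1 else 0
        sum_word, carry_word = carry_save_add(sum_word, carry_word, row)
    # Final carry-propagate addition of the remaining sum and carry words.
    return (sum_word + carry_word) & ((1 << (2 * width)) - 1)


print(array_multiply(13, 5))
```

No carry ripples inside the loop; the only carry propagation happens in the final addition, which is why the array is followed by a fast adder.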
127
Signed Modified Booth Multiplier
[Figure: partial-product dot diagram of the 32*32-bit Booth Multiplier for Signed Number. Columns run from bit 63 down to bit 0. The multiplier bits, from LSB to MSB with a 0 appended below the LSB, are grouped into overlapping triplets (B i-1, B i, B i+1). There are 16 rows of partial products; the rows carry the extension bits "1 E" and the bit S.]
E = The inversion of sign bit in each row
S = the B i+1 bit in the three encoded bits
128
Signed Modified Booth Multiplier
[Figure: 32*32-Bit Modified Booth Multiplier for Signed Number. Booth encoders take the multiplier groups (B[1:0] with an appended 0, B[3:1], B[5:3], ...) and generate X1[n], X2[n] and INVERT n for row n. Partial-product selectors (SEL) choose among the multiplicand bits A31 ... A0, and the rows are summed by half adders (HA) and full adders (FA), one row per stage. Stages 3 to 15 are repeated (each stage includes 33 PP selectors, 31 full adders, 1 half adder and 1 NOT gate). A 64-bit carry look ahead adder produces P63 ... P0.]
129
Unsigned Modified Booth Multiplier
[Figure: partial-product dot diagram of the 32*32-bit Booth Multiplier for unsigned Number. Columns run from bit 63 down to bit 0. The multiplier bits, from LSB to MSB with a 0 appended below the LSB and 00 appended above the MSB, are grouped into overlapping triplets (B i-1, B i, B i+1). There are 17 rows of partial products; the rows carry the extension bits "1 S'" and the bit S.]
S = the B i+1 bit in the three encoded bits
S' = The inversion of S
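The row counts quoted for the two Booth dot diagrams (16 rows for signed, 17 rows for unsigned operands) follow from radix-4 recoding of the multiplier. The Python sketch below recodes the multiplier into digits in {-2, -1, 0, 1, 2} from overlapping bit triplets and sums the shifted multiples. It is a behavioural model only: the sign-extension bits (E, S, S') and the selector hardware of the slides are not modelled, and the function names are illustrative.

```python
def bit(value: int, index: int) -> int:
    """Bit `index` of a non-negative pattern; positions below 0 read as 0."""
    return (value >> index) & 1 if index >= 0 else 0


def booth_radix4_digits(pattern: int, width: int = 32, signed: bool = True):
    """Radix-4 Booth digits of a `width`-bit multiplier pattern, LSB group first.

    Signed operands need width/2 groups. Unsigned operands get two zero bits
    above the MSB, which adds one more group (17 for width = 32).
    """
    groups = width // 2 if signed else width // 2 + 1
    digits = []
    for i in range(groups):
        high = bit(pattern, 2 * i + 1)
        mid = bit(pattern, 2 * i)
        low = bit(pattern, 2 * i - 1)
        digits.append(-2 * high + mid + low)
    return digits


def booth_multiply(a: int, b: int, width: int = 32, signed: bool = True):
    """Return (a * b, number of partial-product rows)."""
    pattern = b & ((1 << width) - 1)
    digits = booth_radix4_digits(pattern, width, signed)
    # Each digit selects 0, +-a or +-2a; row i is weighted by 4^i.
    product = sum((d * a) << (2 * i) for i, d in enumerate(digits))
    return product, len(digits)


print(booth_multiply(13, -5))
print(booth_multiply(0xFFFFFFFF, 0xFFFFFFFF, signed=False))
```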
130
Unsigned Modified Booth Multiplier
[Figure: 32*32-Bit Modified Booth Multiplier for Unsigned Number. Booth encoders take the multiplier groups (B[1:0] with an appended 0, B[3:1], B[5:3], ..., B[i+1, i, i-1], and finally 00 with B[31]) and generate X1[i], X2[i] and S[i] for row i, up to X1[16], X2[16] and S[16]. Partial-product selectors (SEL, with SEL_END cells at the row ends) choose among the multiplicand bits A31 ... A0, and the rows are summed by half adders (HA) and full adders (FA), one row per stage. Stages 3 to 15 are repeated (each stage includes 33 PP selectors, 32 full adders, 1 half adder and 1 NOT gate). A 64-bit carry look ahead adder produces P63 ... P0.]
131
Wallace Tree multipliers
[Figure: block diagram. The inputs A[31:0] and B[31:0] form 32 partial products, which are added in the Wallace Tree Adder. Its outputs C[63:0] and S[63:0] go to a 64-bit Carry Look-ahead Adder, which produces P[63:0].]
132
Wallace Tree multipliers
[Figure: dot diagram of the 32 partial-product rows being reduced through levels 1 to 8. Legend: a 3:2 counter takes three input bits and outputs a Sum and a Carry bit; a 2:2 counter takes two input bits and outputs a Sum and a Carry bit.]
• Uses 3:2 counters
and 2:2 counters
• Number of levels
≈ log (32/2) / log (3/2) ≈ 6.8 as an estimate;
the reduction shown takes 8 levels
• Irregular structure
• Fast
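The level count can be reproduced with a word-level model of the reduction: at every level the rows are taken three at a time through a 3:2 compressor, and leftover rows pass through unchanged, until two rows remain for the final adder. The Python sketch below makes simplifying assumptions: it compresses whole rows rather than individual columns, does not model 2:2 counters, and replaces the 64-bit carry look-ahead adder with a plain addition. The names are illustrative.

```python
def compress_3_2(a: int, b: int, c: int):
    """3:2 compression of three rows into a sum row and a carry row."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_multiply(a: int, b: int, width: int = 32):
    """Return (product, levels) for unsigned `width`-bit operands."""
    # One partial-product row per multiplier bit (zero rows are kept so the
    # level count depends only on `width`).
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, full, 3):
            next_rows.extend(compress_3_2(rows[k], rows[k + 1], rows[k + 2]))
        next_rows.extend(rows[full:])      # leftover rows pass to the next level
        rows = next_rows
        levels += 1
    return sum(rows), levels               # final carry-propagate addition


print(wallace_multiply(123456789, 987654321))
```

For 32 rows the row count goes 32, 22, 15, 10, 7, 5, 4, 3, 2, which is 8 levels; starting from 16 rows the same procedure needs 6.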
133
Wallace Tree multipliers
[Figure: 64-Bit Carry Look Ahead Adder, 2-level hierarchical. A Carry Propagate/Generate unit turns the inputs A63 ... A0 and B63 ... B0 into P63 ... P0 and G63 ... G0. Eight 8-Bit BCLA blocks, covering P7-P0/G7-G0 up to P63-P56/G63-G56, produce the carries C7-C0 up to C63-C56 and the block signals PM0/GM0 ... PM7/GM7. A further 8-Bit BCLA takes the block signals and Cin and produces the block carries C8, C16, C24, ..., C56. The 64-Bit Summation Unit combines P63 ... P0 with C63 ... C0 to give S63 ... S0 and C64.]
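A behavioural Python sketch of the 2-level scheme: the first level derives a block propagate/generate pair for each 8-bit block, the second level turns those pairs into the carries entering each block, and each block then resolves its own internal carries. Assumptions: propagate is taken as A XOR B and generate as A AND B, and the look-ahead equations are evaluated with a simple recurrence in software rather than as flattened logic. The function names are illustrative.

```python
def block_pg(p_bits, g_bits):
    """Group propagate/generate of one block (bit lists are LSB first)."""
    group_p, group_g = 1, 0
    for p, g in zip(p_bits, g_bits):
        group_g = g | (p & group_g)
        group_p &= p
    return group_p, group_g


def carries_from(p_bits, g_bits, carry_in):
    """Carry into every position, followed by the carry out of the last one."""
    carries = [carry_in]
    for p, g in zip(p_bits, g_bits):
        carries.append(g | (p & carries[-1]))
    return carries


def cla_add(a: int, b: int, carry_in: int = 0, width: int = 64, block: int = 8):
    """Return (sum, carry_out) of a `width`-bit two-level carry look-ahead add."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]    # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]    # generate
    starts = range(0, width, block)

    # Level 1: block propagate/generate signals.
    groups = [block_pg(p[k:k + block], g[k:k + block]) for k in starts]
    # Level 2: carries into each block (C8, C16, ...) and the final carry out.
    block_carries = carries_from([gp for gp, _ in groups],
                                 [gg for _, gg in groups], carry_in)

    carries = []
    for index, k in enumerate(starts):
        inner = carries_from(p[k:k + block], g[k:k + block], block_carries[index])
        carries.extend(inner[:block])

    total = sum((p[i] ^ carries[i]) << i for i in range(width))  # summation unit
    return total, block_carries[-1]


print(cla_add(2**64 - 1, 1))
```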
134
Modified Booth-Wallace Tree Multipliers
135
Modified Booth-Wallace Tree Multipliers
• Uses 3:2 counters
and 2:2 counters
• Number of levels
≈ log (16/2) / log (3/2) ≈ 5.1 as an estimate;
the reduction shown takes 6 levels
• Irregular structure
• Fast
• Less area
[Figure: PP Dot Matrix of Booth-Wallace Multiplier for Signed Number. The partial-product rows are rearranged and then reduced through levels 1 to 6.]
136
Twin pipe serial-parallel multipliers
[Figure: block diagram of 32*32-bit signed twin pipe serial-parallel multiplier with serial/parallel conversion logic. Two parallel in – serial out shift registers hold the multiplier bits B31 B29 …… B3 B1 and B30 B28 …… B2 B0, with Load/Shift, Reset and Clock inputs. The 32-bit twin pipe serial-parallel multiplier unit also receives A31 A30 …… A1 A0 and the Sign input. Two serial in – parallel out shift registers collect the product bits P62 P60 …… P2 P0 and P63 P61 …… P3 P1, with a Result_ready output.]
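The conversion logic around the multiplier unit amounts to splitting a word into its even-indexed and odd-indexed bits on the way in and interleaving two bit streams on the way out. The Python sketch below models only that data organisation, assuming LSB-first streams; it does not model the multiplier unit or its clocking, and the function names are illustrative.

```python
def split_even_odd(value: int, width: int = 32):
    """Split a word into even-indexed and odd-indexed bit streams (LSB first)."""
    even = [(value >> i) & 1 for i in range(0, width, 2)]   # B0, B2, ..., B30
    odd = [(value >> i) & 1 for i in range(1, width, 2)]    # B1, B3, ..., B31
    return even, odd


def merge_even_odd(even, odd) -> int:
    """Interleave two bit streams back into one word (P0, P1, P2, ...)."""
    value = 0
    for k, bit_value in enumerate(even):
        value |= bit_value << (2 * k)
    for k, bit_value in enumerate(odd):
        value |= bit_value << (2 * k + 1)
    return value


even, odd = split_even_odd(0b1011, width=4)
print(even, odd, merge_even_odd(even, odd))
```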
137
Signed twin pipe serial-parallel multipliers
[Figure: 32*32-bit twin pipe serial-parallel multiplier for signed number. Bit slices built from full adders (FA) and D flip-flops (D) are shown for A31, A30 and A0, with 28 more units repeated in between, ending in half adders (HA). One input stream is labelled "Even data bits on rising clock" (…... B2 B0 0 0 reset) and the other "Odd data bits on rising clock" (B31 B29 …... B3 B1 0 0 reset). Flip-flops are clocked on rising_edge and falling_edge, and a MUX controlled by Clock selects between the even product and the odd product to form Product. Other inputs: Clock, Reset, Sign.]
“Sign” control line and the sign-change hardware
138
Unsigned twin pipe serial-parallel multipliers
[Figure: 32*32 bit twin pipe serial-parallel multiplier for unsigned number. Bit slices built from half adders (HA), full adders (FA) and D flip-flops (D) are shown for A31, A30 and A0, with 28 more units repeated in between. One input stream is labelled "Even data bits on rising clock" (…... B2 B0 0 0 reset) and the other "Odd data bits on rising clock" (…... B3 B1 0 0 reset). Flip-flops are clocked on rising_edge and falling_edge, and a MUX controlled by Clock selects between the even product and the odd product to form Product. Other inputs: Clock, Reset.]
• Does not need the “Sign” control line and the sign-change hardware