multipliers in vlsi
Post on 16-Apr-2017
667 Views
Preview:
TRANSCRIPT
Sp09 CMPEN 411 L20 S.1
CMPEN 411VLSI Digital Circuits
Spring 2009
Lecture 20: Multiplier Design
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp09 CMPEN 411 L20 S.2
Review: Basic Building Blocks Datapath
Execution units- Adder, multiplier, divider, shifter, etc.
Register file and pipeline registers Multiplexers, decoders
Control Finite state machines (PLA, ROM, random logic)
Interconnect Switches, arbiters, buses
Memory Caches (SRAMs), TLBs, DRAMs, buffers
Sp09 CMPEN 411 L20 S.3
The Binary Multiplication
x
Partial products
Multiplicand
Multiplier
Result
1 0 1 0 1 0
1 0 1 0 1 0
1 0 1 0 1 0
1 1 1 0 0 1 1 1 0
0 0 0 0 0 0
1 0 1 0 1 0
1 0 1 1
Sp09 CMPEN 411 L20 S.4
Multiply Operation Multiplication is just a a lot of additions
multiplicandmultiplier
partialproductarray
double precision product
N
2N
N can be formed in parallel
Sp09 CMPEN 411 L20 S.5
Multiplication Approaches Right shift and add
Partial product array rows are accumulated from top to bottom on an N-bit adder
- After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add
Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA
Making it faster Use a faster adder Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder)
- Use multiplier recoding to simplify multiple formation (booth) Form the partial product array in parallel and add it in parallel
Making it smaller (i.e., slower) Use serial-parallel mult Use an array multiplier
- Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily and efficiently pipelined
Sp09 CMPEN 411 L20 S.6
Serial-parallel multiplier structure
Sp09 CMPEN 411 L20 S.7
The Array Multiplier
Y0
Y1
X3 X2 X1 X0
X3
HA
X2
FA
X1
FA
X0
HA
Y2X3
FA
X2
FA
X1
FA
X0
HA
Z1
Z3Z6Z7 Z5 Z4
Y3X3
FA
X2
FA
X1
FA
X0
HA
Sp09 CMPEN 411 L20 S.11
Booth multiplier Encoding scheme to reduce number of stages in
multiplication.
Performs two bits of multiplication at once—requires half the stages.
Each stage is slightly more complex than simple multiplier, but adder/subtracter is almost as small/fast as adder.
Sp09 CMPEN 411 L20 S.12
Booth encoding Two’s-complement form of multiplier:
y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit)(example, y=18=010010 y= -18 = 101110 )
Rewrite using 2a = 2a+1 - 2a: y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
Sp09 CMPEN 411 L20 S.13
Booth actions
yi yi-1 yi-2 increment
0 0 0 0
0 0 1 x
0 1 0 x
0 1 1 2x
1 0 0 -2x
1 0 1 -x
1 1 0 -x
1 1 1 0
y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...
Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.
Sp09 CMPEN 411 L20 S.14
Booth example x = 1001 (910), y = 0111 (710).
P0 = 00000000
y3y2y1=011 y1y0y-1=11(0)
y1y0y-1 = 110, P1 = P0 - (1001) = 11110111 x shift left for 2 bits to be 100100
y3y2y1 = 011, P2 = P1 (10*100100) =
11110111+01001000 = 001111111 (6310) An array multiplier needs N addtions, booth multiplier
needs only N/2 additions
Sp09 CMPEN 411 L20 S.15
Review: A 64-bit Adder/Subtractor
1-bit FA S0
C0=Cin
C1
1-bit FA S1
C2
1-bit FA S2
C3
C64=Cout
1-bit FA S63
C63
. . .
Ripple Carry Adder (RCA) built out of 64 FAs
Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in
RCA advantage: simple logic,
so small (low cost)
disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
A0
B0
A1
B1
A2
B2
A63
B63
add/subt
Sp09 CMPEN 411 L20 S.16
Booth structure
Sp09 CMPEN 411 L20 S.17
Wallace-Tree Multiplier
6 5 4 3 2 1 0 6 5 4 3 2 1 0
Partial products First stage
Bit position
6 5 4 3 2 1 0 6 5 4 3 2 1 0Second stage Final adder
FA HA
(a) (b)
(c) (d)
Sp09 CMPEN 411 L20 S.18
Wallace-Tree Multiplier
Partial products
First stage
Second stage
Final adder
FA FA FA
HA HA
FA
x3y3
z7 z6 z5 z4 z3 z2 z1 z0
x3y2x2y3
x1y1x3y0 x2y0 x0y1x0y2
x2y2x1y3
x1y2x3y1x0y3 x1y0 x0y
Full adder = (3,2) compressor
Sp09 CMPEN 411 L20 S.19
Making it Faster: Tree Multiplier Structure
partial productarray reduction tree
fast carry propagate adder (CPA)
P (product)
mux + reductiontree (log N)+CPA (log N)
Q (‘ier)
D (‘icand)
DD
D
0
00
0
multiple forming circuits
inte
rcon
nect
Sp09 CMPEN 411 L20 S.20
(4,2) Counter Built out of two (3,2) counters (just FA’s!)
all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)
the internal carry output is fed to the next higher weight position (indicated by the )
(3,2)
(3,2) Note: Two carry outs - one “internal” and one “external”
Sp09 CMPEN 411 L20 S.22
Tiling (4,2) Counters
Reduces columns four high to columns only two high Tiles with neighboring (4,2) counters Internal carry in at same “level” (i.e., bit position weight) as the
internal carry out
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
(3,2)
Sp09 CMPEN 411 L20 S.24
4x4 Partial Product Array Reduction
multiplicandmultiplier
partialproductarray
reduced pp array (to CPA)
double precision product
Fast 4x4 multiplication using (4,2) counters How would you
lay it out?
five (4,2) counters
5-bit CPA
multiplicand
multiplier
8-bit product
Sp09 CMPEN 411 L20 S.25
8x8 Partial Product Array Reduction ‘icand
‘ier
partialproductarray
reduced partial product array
Wallace tree multiplier
two rows of nine (4,2) counters
one row of thirteen (4,2) counters
to a 13-bit fast CPA
Sp09 CMPEN 411 L20 S.26
An 8x8 Multiplier Layout How should it be laid out?
multiplicandm
ultiplier
thirteen (4,2) counters
nine (4,2) counters
nine (4,2) counters
13-bit CPA
Sp09 CMPEN 411 L20 S.32
Multipliers —Summary
• Optimization Goals Different Vs Binary Adder
• Once Again: Identify Critical Path
• Other possible techniques
- Data encoding (Booth)- Pipelining
FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION
- Logarithmic versus Linear (Wallace Tree Mult)
Sp09 CMPEN 411 L20 S.33
Next Lecture and Reminders Next lecture
Shifters, decoders, and multiplexers- Reading assignment – Rabaey, et al, 11.5-11.6
top related