Prof. V.G. Oklobdzija VLSI Arithmetic 1
VLSI Arithmetic: Adders & Multipliers
Prof. Vojin G. Oklobdzija
University of California
http://www.ece.ucdavis.edu/acsel
Introduction
• Digital computer arithmetic belongs to computer architecture; however, it is also an aspect of logic design.
• The objective of computer arithmetic is to develop appropriate algorithms that use the available hardware in the most efficient way.
• Ultimately, speed, power and chip area are the most often used measures, making a strong link between the algorithms and the technology of implementation.
Basic Operations
• Addition
• Multiplication
• Multiply-Add
• Division
• Evaluation of Functions
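Addition of Binary Numbers
Full Adder. The full adder is the fundamental building block of most arithmetic circuits.
The sum and carry outputs are described as:
$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$$
$$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$$
[Figure: full-adder block with inputs a_i, b_i and C_in, outputs s_i and C_out]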
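Addition of Binary Numbers
Full-adder truth table (Propagate: a_i ≠ b_i, the carry-in is passed on; Generate: a_i = b_i = 1, a carry is produced regardless of the carry-in)
Inputs       Outputs
ci  ai  bi   si  ci+1
0   0   0    0   0
0   0   1    1   0     Propagate
0   1   0    1   0     Propagate
0   1   1    0   1     Generate
1   0   0    1   0
1   0   1    0   1     Propagate
1   1   0    0   1     Propagate
1   1   1    1   1     Generate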
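Full-Adder Implementation
Full-adder operation is defined by the equations:
$$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$$
$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$$
Carry-propagate $p_i$ and carry-generate $g_i$:
$$p_i = a_i \oplus b_i, \qquad g_i = a_i b_i$$
A one-bit adder can be implemented as shown.
[Figure: one-bit adder with inputs a_i, b_i and c_in, outputs s_i and c_out]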
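The propagate/generate form of the full adder can be checked against ordinary integer addition. The following is a minimal sketch; the function name `full_adder` is chosen here for illustration and does not come from the slides.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder in propagate/generate form (illustrative helper)."""
    p = a ^ b              # carry-propagate p_i
    g = a & b              # carry-generate g_i
    s = p ^ c              # s_i = p_i XOR c_i
    c_out = g | (p & c)    # c_{i+1} = g_i + p_i c_i
    return s, c_out

# Exhaustive check over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert a + b + c == s + 2 * c_out
```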
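High-Speed Addition
$$s_i = p_i \oplus c_i$$
$$c_{i+1} = g_i + p_i c_i$$
$$p_i = a_i \oplus b_i, \qquad g_i = a_i b_i$$
A one-bit adder can be implemented more efficiently with a multiplexer, because the MUX is faster.
[Figure: one-bit adder in which a 2:1 multiplexer (inputs 0 and 1, select s) forms c_out from c_in; inputs a_i, b_i, output s_i]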
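Because $p_i = a_i \oplus b_i$ and $g_i = a_i b_i$ are never 1 at the same time, the carry can be selected rather than computed: when $p_i = 1$ the carry-in passes through, otherwise the carry equals $g_i$. A small sketch of that idea follows; `mux` and `mux_full_adder` are hypothetical names, and the exact wiring of the slide's circuit is an assumption.

```python
from itertools import product

def mux(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def mux_full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Full adder whose carry is selected by p (assumed wiring)."""
    p = a ^ b
    g = a & b
    s = p ^ c_in
    c_out = mux(p, g, c_in)   # p = 1: pass c_in; p = 0: carry equals g
    return s, c_out

# The selected carry matches g + p*c_in for every input combination.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = mux_full_adder(a, b, c)
    assert c_out == ((a & b) | ((a ^ b) & c))
```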
The Ripple-Carry Adder
[Figure: four full adders (FA) in series with inputs A0..A3, B0..B3, sums S0..S3, carry-in C_i,0 and carry-outs C_o,0..C_o,3, where C_o,0 = C_i,1]
Worst-case delay is linear in the number of bits:
$$t_{adder} \approx (N-1)\,t_{carry} + t_{sum}$$
$$t_d = O(N)$$
Goal: make the fastest possible carry-path circuit.
From Rabaey
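To make the linear carry chain concrete, here is a minimal bit-level sketch of a ripple-carry adder together with the delay expression above. The helper names (`to_bits`, `from_bits`, `ripple_carry_add`, `worst_case_delay`) and the unit values for t_carry and t_sum are assumptions made for illustration.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Little-endian list of the n low-order bits of x."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits: list[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain of full adders; each stage waits for the previous carry."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b
        sum_bits.append(p ^ carry)
        carry = g | (p & carry)
    return sum_bits, carry

def worst_case_delay(n: int, t_carry: float = 1.0, t_sum: float = 1.0) -> float:
    """(N - 1) * t_carry + t_sum, with assumed unit gate delays."""
    return (n - 1) * t_carry + t_sum

s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
assert from_bits(s) + (c_out << 4) == 17
```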
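Inversion Property
[Figure: a full adder (FA) with inputs A, B, C_i and outputs S, C_o, next to the same cell with all inputs and outputs inverted]
$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$$
$$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$$
From Rabaey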
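The inversion property can be confirmed by enumerating the truth table. This is a small sketch with illustrative function names.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Reference full adder: XOR sum and majority carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def inversion_property_holds() -> bool:
    """Inverting all inputs of a full adder inverts both outputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, co = full_adder(a, b, c)
        s_inv, co_inv = full_adder(a ^ 1, b ^ 1, c ^ 1)
        if (s ^ 1, co ^ 1) != (s_inv, co_inv):
            return False
    return True

assert inversion_property_holds()
```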
Minimize Critical Path by Reducing Inverting Stages
[Figure: 4-bit ripple-carry chain of FA' cells, alternating even and odd cells, with inputs A0..A3, B0..B3, sums S0..S3, carry-in C_i,0 and carry-outs C_o,0..C_o,3]
Exploit the inversion property.
Note: two different types of cells are needed.
From Rabaey
Manchester Carry-Chain Realization of the Carry Path
• Simple and very popular scheme for implementation of the carry signal path
[Figure: Manchester carry chain between carry in and carry out; each stage has a propagate device, a generate device and a predischarge & kill device, supplied from Vdd]
Manchester Carry Chain
[Figure: dynamic Manchester carry chain with carry-in C_i,0, propagate signals P0..P4, generate signals G0..G4 and supply VDD]
Kilburn, et al, IEE Proc, 1959.
• Implement P with pass-transistors
• Implement G with pull-up, kill (delete) with pull-down
• Use dynamic logic to reduce the complexity and speed up
Ripple Carry Adder
Carry chain of an RCA implemented using a multiplexer from the standard cell library:
[Figure: three adder bit slices (a_i, b_i through a_i+2, b_i+2) with sums s_i, s_i+1, s_i+2; the critical path runs along the carry chain from c_in through c_i and c_i+1 to c_out]
Oklobdzija, ISCAS’88
Pass-Transistor Realization in DPL
[Figure: DPL full adder. Sum path: XOR/XNOR, multiplexer and buffer producing S from A, B and C. Carry path: AND/NAND and OR/NOR gates, multiplexer and buffer producing CO.]
Carry-Skip Adder
MacSorley, Proc IRE 1/61
Lehman, Burla, IRE Trans on Comp, 12/61
Carry-Skip Adder
[Figure: two 4-bit ripple chains of full adders (FA) with propagate/generate signals P0..P3, G0..G3 and carries C_i,0, C_o,0..C_o,3; in the second chain a multiplexer controlled by BP = P0 P1 P2 P3 selects between the bypassed C_i,0 and the rippled carry to form C_o,3]
Idea: if (P0 and P1 and P2 and P3 = 1) then C_o,3 = C_i,0, else “kill” or “generate”.
From Rabaey
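The bypass idea can be sketched at the bit level as follows. The function name `bypass_block` and the little-endian bit-list convention are assumptions for illustration; functionally the bypassed carry always equals the rippled carry, and only the delay differs.

```python
def bypass_block(a_bits, b_bits, c_in):
    """Ripple block with a bypass multiplexer; bit lists are little-endian."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    carry = c_in
    sums = []
    for p_i, g_i in zip(p, g):
        sums.append(p_i ^ carry)
        carry = g_i | (p_i & carry)    # rippled carry
    bp = int(all(p))                   # BP = P0 P1 P2 P3
    c_out = c_in if bp else carry      # multiplexer: bypass or rippled carry
    return sums, c_out, bp

# All four bits propagate, so the carry-in is bypassed to the output.
assert bypass_block([1, 0, 1, 0], [0, 1, 0, 1], 1) == ([0, 0, 0, 0], 1, 1)
```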
Carry-Skip Adder: N bits, k bits/group, r = N/k groups
[Figure: groups of k-bit ripple adders covering operand bits a_0, b_0 through a_(N-1), b_(N-1) and sums S_0 through S_(N-1); group propagate signals P_0 through P_(r-1) drive an AND-OR skip chain from C_in to C_out]
Critical path, delay = 2(k-1) + (N/k-2)
Carry-Skip Adder
$$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$$
[Figure: propagation delay t_p versus N for the ripple adder and the bypass adder; the curves cross at about 4..8 bits]
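The trade-off between group size and delay can be explored numerically. The sketch below assumes the unit-delay model in which the critical path ripples through the first and last groups (k-1 each) and skips the r-2 groups in between, with r = N/k; the function names are hypothetical.

```python
def carry_skip_delay(n: int, k: int) -> int:
    """Unit-delay critical path for n bits in equal groups of k bits."""
    return 2 * (k - 1) + (n // k - 2)

def best_group_size(n: int) -> int:
    """Group size (dividing n, at least two groups) with the smallest delay."""
    candidates = [k for k in range(1, n + 1) if n % k == 0 and n // k >= 2]
    return min(candidates, key=lambda k: carry_skip_delay(n, k))

for k in (1, 2, 4, 8, 16):
    print(k, carry_skip_delay(32, k))
```

Under this model a 32-bit adder is fastest with 4-bit groups, at 12 unit delays.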
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: carry chain with groups G_0 through G_m and group propagate signals P_0 through P_m, operand bits a_0, b_0 through a_(N-1), b_(N-1) and sums S_0 through S_(N-1); the carry signal path from C_in to C_out consists of rippling and skipping sections]
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: carry chain with numbered variable-size blocks; longest path = 9]
Any-point-to-any-point delay = 9, as compared to 12 for CSKA
Carry-chain block size determination for a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: 4-bit carry chain with carry-in C_i,0, propagate signals P0..P3, generate signals G0..G3, bypass signal BP and carry-out C_o,3]
Delay model:
Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Group Length
Oklobdzija, Barnes, Arith’85
[Equation: delay t_d as a function of N with constants c_1, c_2, c_3]
Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
Variable Block Lengths
• No closed form solution for delay
• It is a dynamic programming problem
Delay Comparison: Variable Block Adder (Oklobdzija, Barnes: IBM 1985)
[Figure: delay (0 to 16) versus adder size N (4 to 60) for VBA, multi-level VBA and CLA]
Fan-Out Dependency
Fan-In Dependency
Carry-Lookahead Adder (Weinberger and Smith)
A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p.3-12, 1958.
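Carry-Lookahead Adder (Weinberger and Smith)
$$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$$
$$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$$
$$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i) = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$$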
Carry-Lookahead Adder
$$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$
$$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$$
$$c_{4(j+1)} = G_j + P_j c_{4j}$$
One gate delay to calculate p, g
One to calculate P and two for G
Three gate delays to calculate C4(j+1)
Compare that to 8 in RCA!
[Figure: 4-bit P, G group: bits a_i, b_i through a_i+3, b_i+3 produce g_i, p_i through g_i+3, p_i+3, which form G_j and P_j and, with the carry-in C_4j, the carries C_4j+1, C_4j+2, C_4j+3 and C_4(j+1)]
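A 4-bit lookahead group can be written directly from these equations. The sketch below is illustrative only: `cla_group` is a hypothetical name, bit lists are little-endian, and every carry is computed from p, g and the group carry-in without rippling.

```python
def cla_group(a_bits, b_bits, c_in):
    """4-bit carry-lookahead group; returns (sums, c4, G, P)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    # Group generate and propagate.
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    # Internal carries, each a two-level function of p, g and c_in.
    c1 = g[0] | (p[0] & c_in)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c_in)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c_in)
    c4 = G | (P & c_in)
    sums = [p_i ^ c_i for p_i, c_i in zip(p, [c_in, c1, c2, c3])]
    return sums, c4, G, P
```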
Carry-Lookahead Adder (Weinberger and Smith)
$$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$$
$$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$$
$$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$$
Additional two gate delays
C16 will take a total of 5 vs. 32 for RCA!
[Figure: second-level lookahead block combining G_j, P_j through G_j+3, P_j+3 into G*, P* and producing C_4j+1, C_4j+2, C_4j+3 and C_4(j+1) from C_4j]
32-bit Carry Lookahead Adder
[Figure: 32-bit CLA with carries C4, C8, C12, C16, C20, C24, C28 and C_out, built from:
individual adders generating g_i, p_i and sum S_i;
carry-lookahead blocks of 4 bits generating G_i, P_i and C_in for the adders;
carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i and C_in for the 4-bit blocks;
a group producing the final carry C_out and C16]
Critical path delay = 1Δ (for g_i, p_i) + 2x2Δ (for G, P) + 3x2Δ (for C_in) + 1 XOR-Δ (for Sum) = approx. 12Δ of delay
Carry-Lookahead Adder (Weinberger and Smith: original derivation)
Carry-Lookahead Adder (Weinberger and Smith)
Please notice the similarity with Parallel-Prefix Adders!
Delay Optimized CLA
B. Lee, V. G. Oklobdzija
Journal of VLSI Signal Processing, Vol.3, No.4, October 1991
Delay Optimized CLA: Lee-Oklobdzija ‘91
(a.) Fixed groups and levels
(b.) variable-sized groups, fixed levels
(c.) variable-sized groups and fixed levels
(d.) variable-sized groups and levels
Two-Levels of Logic Implementation of the Carry Block
Two-Levels of Logic Implementation of the Carry-Lookahead Block
Three-Levels of Logic Implementation of the Carry Block (restricted fan-in)
Three-Levels of Logic Implementation of the Carry Lookahead (restricted fan-in)
Delay Optimized CLA: Lee-Oklobdzija ‘91
Delay: Two-level BCLA; Delay: Three-level BCLA
(a.) 2-level BCLA = 8.5 ns (b.) 3-level BCLA = 8.9 ns
Motorola: CLA Implementation Example
A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”,
Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.
Critical path in Motorola's 64-bit CLA
Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63
[Figure: 64-bit CLA built from PG blocks and a carry block; bit-level signals G0, P0 through G63, P63 are combined into group signals (G3:0, P3:0, G7:4, P7:4, ..., G15:0, P15:0, G31:0, P31:0, G47:0, P47:0, G63:0, P63:0), which produce the carries C0, C4, C8, C12, C16, C32, C48, C52, C56, C60, C61, C62, C63 and C64]
Motorola's 64-bit CLA
Conventional PG Block
Motorola's 64-bit CLA
Modified PG Block
Intermediate propagate signals Pi:0 are generated to speed up C3
Ling’s Adder
Huey Ling, “High-Speed Binary Adder”
IBM Journal of Research and Development, Vol.25, No.3, 1981.
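Ling Adder
Variation of CLA:
Ling, IBM J. Res. Dev, 5/81
Conventional:
$$G_i = g_i + p_i G_{i-1}$$
$$S_i = p_i \oplus G_{i-1}$$
$$p_i = a_i \oplus b_i, \qquad g_i = a_i b_i$$
Ling’s equations:
$$H_i = g_i + t_{i-1} H_{i-1}$$
$$S_i = t_i \oplus H_i + g_i t_{i-1} H_{i-1}$$
$$t_i = a_i + b_i, \qquad g_i = a_i b_i$$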
Ling Adder
$$G_i = g_i + p_i G_{i-1}$$
$$G_i = g_i + g_i G_{i-1} + p_i G_{i-1} = g_i + (g_i + p_i) G_{i-1}$$
$$G_i = g_i + t_i G_{i-1}$$
Ling’s equation:
$$H_i = g_i + t_{i-1} G_{i-1}$$
Propagates information on two bits
Doran, Trans on Comp 9/88
Ling Adder
Conventional:
$$G_3 = g_3 + t_3 g_2 + t_3 t_2 g_1 + t_3 t_2 t_1 g_0$$
Ling:
$$H_3 = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$$
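Ling's recurrence can be checked at the bit level. The sketch below assumes no carry-in (so the term t_{-1} H_{-1} is 0), uses the relation G_i = t_i H_i to recover the conventional carry, and treats "+" in the sum equation as logical OR; `ling_add` is a hypothetical name.

```python
def ling_add(a_bits, b_bits):
    """Add two little-endian bit lists (carry-in = 0) using Ling's H recurrence."""
    t = [a | b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    sums = []
    carry_prev = 0                       # t_{i-1} H_{i-1}, the carry into bit i
    for t_i, g_i in zip(t, g):
        h = g_i | carry_prev             # H_i = g_i + t_{i-1} H_{i-1}
        sums.append((t_i ^ h) | (g_i & carry_prev))   # S_i
        carry_prev = t_i & h             # G_i = t_i H_i
    return sums, carry_prev

s, c_out = ling_add([1, 1, 0, 1], [1, 0, 1, 1])   # 11 + 13
assert sum(v << i for i, v in enumerate(s)) + (c_out << 4) == 24
```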
S. Naffziger, ISSCC’96
Results: S. Naffziger, “A Subnanosecond 64-b Adder”, ISSCC ‘96
• 0.5 µm technology
• Speed: 0.930 ns
• Nominal process, 80 °C, V = 3.3 V
Conditional-Sum Adder
J. Sklansky, “Conditional-Sum Addition Logic”, IRE Transactions on Electronic Computers, EC-9, p.226-231, 1960.
Carry-Select Adder
O. J. Bedrij, “Carry-Select Adder”, IRE Transactions on Electronic Computers, June
1962, p.340-34
Prof. V.G. Oklobdzija VLSI Arithmetic 74
Carry-Select AdderAddition under assumption of Cin=0 and Cin =1.
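The select principle can be illustrated with a short sketch: each block is added twice, once assuming Cin = 0 and once assuming Cin = 1, and the true incoming carry picks one of the two results. The names `ripple` and `carry_select_add`, the 4-bit block size and the use of ripple blocks are assumptions for illustration.

```python
def ripple(a_bits, b_bits, c_in):
    """Plain ripple-carry block on little-endian bit lists."""
    carry, sums = c_in, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry

def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Carry-select addition with fixed-size blocks (assumed block size)."""
    sums, carry = [], c_in
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        s0, c0 = ripple(a_blk, b_blk, 0)   # result assuming Cin = 0
        s1, c1 = ripple(a_blk, b_blk, 1)   # result assuming Cin = 1
        sums += s1 if carry else s0        # multiplexer on the true carry
        carry = c1 if carry else c0
    return sums, carry
```

In hardware both versions of every block are computed in parallel, so only the multiplexer chain depends on the incoming carry.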
Carry Select Adder: combining two 32-b VBAs in select mode
Delay = VBA32 + MUX
Addition Under Non-equal Signal Arrival Profile Assumption
P. Stelling, V. G. Oklobdzija, "Design Strategies for Optimal Hybrid Final Adders in a Parallel Multiplier", special issue on VLSI Arithmetic, Journal of VLSI Signal Processing, Kluwer Academic Publishers, Vol.14, No.3, December 1996
Signal Arrival Profile from the Parallel Multiplier Partial-Product Reduction Tree
Oklobdzija and Villeger, IEEE Transactions on VLSI Systems, June, 1995
Performing Multiply-Add Operation in the Multiply Time
P. Stelling, V. G. Oklobdzija, " Achieving Multiply-Accumulate Operation in the
Multiply Time", Thirteenth International Symposium on Computer Arithmetic, Pacific
Grove, California, July 5 - 9, 1997.
Final Adder: Implementation
Recurrence Solver Based Adders
Kogge and Stone, IEEE Trans on Computers, August 1973
Bilgory and Gajski, 18th DAC, 1981
Brent and Kung, IEEE Trans on Computers, March 1982
Recurrence Solver Based Adders
• 1973, Kogge and Stone published a general recurrence scheme for parallel computation
• 1979, Brent and Kung published a Tech. Report on regular layout for parallel adders
• 1980, Guibas and Vuillemin developed a layout scheme based on the recurrence equation for addition
• 1980, Ladner and Fischer published “parallel prefix computation”, J. of ACM
• 1981, Bilgory and Gajski published a paper on recurrence structures for automatic cell generation
Recurrence Solver Based Adders
They are based on the recurrence equations for P, G (what is new there since Weinberger?!):
$$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$$
$$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$$
Or:
$$G_i = g_i + p_i G_{i-1} \quad \text{and} \quad P_i = p_i P_{i-1}$$
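The recurrence can be solved in a logarithmic number of steps by combining (G, P) pairs over doubling distances. The sketch below follows a Kogge-Stone style schedule as one possible instance; it is not taken from the slides, and the function names are hypothetical.

```python
def prefix_carries(a_bits, b_bits, c_in=0):
    """Carries c_1..c_n from a Kogge-Stone style prefix computation."""
    n = len(a_bits)
    G = [a & b for a, b in zip(a_bits, b_bits)]
    P = [a ^ b for a, b in zip(a_bits, b_bits)]
    G[0] |= P[0] & c_in                 # fold the carry-in into bit 0
    dist = 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            # (G, P)[i] combined with (G, P)[i - dist]
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    return G                            # G[i] is the carry out of bit i

def prefix_add(a_bits, b_bits, c_in=0):
    carries = [c_in] + prefix_carries(a_bits, b_bits, c_in)
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[-1]
```

For n bits the loop runs about log2(n) times, in contrast to the n sequential steps of the ripple recurrence.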
Recurrence Solver Based Adders
[Figure: 16-bit recurrence-solver network: one row generates (g_i, p_i) for bits 1 through 16, and the following rows generate the carries C1 through C16]
Carry-Lookahead Adder (Weinberger and Smith)
Just to remind you! Please notice the similarity with Parallel-Prefix Adders!
Multiplexer Based Adder
Farooqui and Oklobdzija, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999
Multiplexer Based Adder
• Based on the realization that MUX circuit is faster than a logic gate due to its transmission gate implementation
• Based on Carry-Lookahead method (W-S), or recurrence solver.
Multiplexer Based Adder
A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
[Figure: 4-bit MUX-based group carry generator: bits a0, b0 through a3, b3 produce g0, p0 through g3, p3, which multiplexers combine into g01, g23, p23 and the group signals g03, p03]
[Figure: 16-bit adder built from four 4-bit MUX-based group carry generators (inputs A0-3, B0-3 through A12-15, B12-15) producing G0-3, P0-3 through G12-15, P12-15; MUX and NAND/NOR stages form G0-7, P0-7, G8-15, P8-15, G0-11, P0-11, G0-15, P0-15 and the carries C3, C7, C11, C15; pairs of 4-bit sum blocks computed for Cin0 and Cin1 are selected by multiplexers to give Sum 0-3, Sum 4-7, Sum 8-11 and Sum 12-15]
[Figure: 4-bit sum block built from multiplexers, with inputs a0, b0 through a2, b2, signals g0..g2, p0..p3 and Cin, producing Sum0 through Sum3]
Multiplexer Based Adder
A. A. Farooqui, V. G. Oklobdzija, F. Chechrazi, 1999 Int’l Sym. on VLSI Technology, Taipei, Taiwan, June 8-10, 1999.
• Results in a very fast structure
• 7 MUX delays for a 64-b adder
• Delay using standard cell 0.25 µm, 2.5 V, 25 °C:
Adder Size (bits)   Delay (ps)
8                   625
16                  665
32                  710
64                  903
DEC "Alpha" 21064 Adder
• Combination:
– 8-bit tapered pre-discharged Manchester Carry Chains, with Cin = 0 and Cin = 1
– 32-bit LSB Carry Lookahead Adder
– 32-bit MSB Conditional-Sum Adder
– Carry-Select on most significant 32-bits
– Latches in the middle: pipelined addition
DEC "Alpha" 21064 Adder
[Figure: 64-bit adder organized in eight byte slices (input operands byte 0 through byte 7). Each slice has a PGK cell, carry chains, switches or dual switches, latches and a latch & XOR stage producing the result; a look-ahead block and multiplexers select between the two carry chains, with Cin entering at byte 0]
DEC "Alpha" 21064 Adder: Results
• The first 200 MHz processor
• Built using 0.75 µm technology
• V = 3.3 V, 30 W
• Pipelined (two latches), allowing 5 ns throughput and 10 ns latency
Conclusion: VLSI Implementation of Addition
• Currently, implementation parameters are not reflected in the algorithms used for development
• Layout and wire-delay effects are largely neglected, and this is becoming intolerable in the next generation of technology
• Transistor sizing has a large effect, which can outweigh the algorithm
• There is a great disconnect between algorithm and implementation
• New rules and measures of goodness are needed
Multiplication
Parallel Multiplier Implementation
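Multiplication Algorithm:
$$P = XY = X \sum_{i=0}^{n-1} y_i r^i = \sum_{i=0}^{n-1} X y_i r^i$$
$$p^{(0)} = 0 \quad \text{initially}$$
$$p^{(j+1)} = \frac{1}{r}\left(p^{(j)} + r^n X y_j\right) \quad \text{for } j = 0, \ldots, n-1$$
$$p^{(n)} = XY \quad \text{after } n \text{ steps}$$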
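The recurrence can be run directly on integers. In the sketch below the function names are hypothetical, X and Y are assumed non-negative, and Y is split into n digits of radix r; the division by r is exact at every step because both terms of the sum are multiples of r.

```python
def digits(y: int, r: int, n: int) -> list[int]:
    """The n low-order radix-r digits of y, least significant first."""
    return [(y // r**i) % r for i in range(n)]

def shift_add_multiply(x: int, y: int, n: int, r: int = 2) -> int:
    """p(j+1) = (p(j) + r**n * x * y_j) / r, starting from p(0) = 0."""
    p = 0
    for y_j in digits(y, r, n):
        total = p + r**n * x * y_j
        assert total % r == 0       # the division below is exact
        p = total // r
    return p

# 12 x 12 bit example operands in radix 2 and radix 4.
assert shift_add_multiply(0xB54, 0xB1B, 12) == 0xB54 * 0xB1B
assert shift_add_multiply(0xB54, 0xB1B, 6, r=4) == 0xB54 * 0xB1B
```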
Parallel Multipliers
[Figure: partial-product reduction shown in stages, Step 0 through Step 4]
4:2 Compressor
[Figure: 4-2 compressor block with inputs I1, I2, I3, I4 and carry-in C_i, and outputs S (sum), C (carry) and carry-out C_o]
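One common way to realize a 4:2 compressor is with two cascaded full adders. The sketch below assumes that structure and illustrative names; it shows that the five inputs are reduced to a sum bit and two carry bits of double weight, and that the carry-out does not depend on the carry-in.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def compressor_4_2(i1, i2, i3, i4, c_in):
    """4:2 compressor from two full adders; returns (S, C, C_out)."""
    s1, c_out = full_adder(i1, i2, i3)   # c_out is independent of c_in
    s, c = full_adder(s1, i4, c_in)
    return s, c, c_out

# Weighted sum is preserved: inputs = S + 2 * (C + C_out).
for bits in product((0, 1), repeat=5):
    s, c, c_out = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (c + c_out)
```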
Re-designed 4:2 Compressor with 3 XOR Delay
[Figure: 4:2 compressor with inputs I1, I2, I3, I4 and C_in, a 2:1 multiplexer (inputs 0 and 1), and outputs S, C and C_out]
Three-Dimensional optimization Method: TDM
(Oklobdzija, Villeger, Liu, 1996)
[Figure: 4:2 compressor built from two full adders (inputs A, B, Cin; outputs Sum, Carry) with inputs I1, I2, I3, I4 and C_in and output C_out; 3 XOR delays]
Generation of the Partial Product Reduction Tree in TDM multiplier
Example of a 12 X 12 multiplication (partial product for X*Y = B54 * B1B)
[Figure: twelve partial-product rows, each either 1 0 1 1 0 1 0 1 0 1 0 0 or all zeros; a Vertical Compressor Slice (VCS) of full adders (FA) reduces one bit column over time and feeds the final adder]
Speed of Partial Product Reduction for Various Schemes
Booth Recoding Algorithm
xi+2  xi+1  xi    Add to partial product
0     0     0     +0Y
0     0     1     +1Y
0     1     0     +1Y
0     1     1     +2Y
1     0     0     -2Y
1     0     1     -1Y
1     1     0     -1Y
1     1     1     -0Y
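The table can be applied to overlapping bit triplets of the multiplier X, two bits at a time, with a 0 appended below the least significant bit. The sketch below assumes an n-bit two's-complement multiplier with n even; `BOOTH_TABLE`, `booth_digits` and `booth_multiply` are hypothetical names.

```python
# (x_{i+2}, x_{i+1}, x_i) -> multiple of Y added to the partial product
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_digits(x: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit two's-complement x (n even)."""
    bits = [0] + [(x >> i) & 1 for i in range(n)]   # bits[0] is the appended 0
    return [BOOTH_TABLE[(bits[i + 2], bits[i + 1], bits[i])]
            for i in range(0, n, 2)]

def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the recoded partial products; digit k has weight 4**k."""
    return sum(d * y * 4**k for k, d in enumerate(booth_digits(x, n)))

assert booth_digits(6, 4) == [-2, 2]      # 6 = -2 + 2 * 4
assert booth_multiply(-3, 5, 4) == -15
```

Recoding halves the number of partial products: an n-bit multiplier yields n/2 digits, each selecting 0, ±Y or ±2Y.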
Organization of Hitachi's DPL multiplier
[Figure: two 54-bit operands feed Booth's encoder and a Wallace tree of 4-2 compressors; the 108-bit result is produced by a 108-b CLA adder with Conditional Carry Selection (CCS)]
Hitachi's 4:2 compressor structure
[Figure: 4:2 compressor built from multiplexers (MUX) with inputs I1, I2, I3, I4 and C_i, and outputs S, C and C_o; the critical path is 3 gates]
DPL multiplexer circuit
[Figure: DPL 2:1 multiplexer (MUX) with data inputs D0 and D1, select S and output OUT]
Conclusion
References:
1. E. Swartzlander, "Computer Arithmetic". Vol. 1&2, IEEE Computer Society Press, 1990.
2. K. Hwang, "Computer Arithmetic : Principles, Architecture and Design", John Wiley and Sons, 1979.
3. M. Ercegovac, “Digital Systems and Hardware/Firmware Algorithms”, Chapter 12: Arithmetic Algorithms and Processors, John Wiley & Sons, 1985.
4. A. Chandrakasan, W. Bowhill, F Fox, Editors, "Design of High Performance Microprocessors Circuits", IEEE Press, July 2000.
5. V. G. Oklobdzija, “High-Performance System Design: Circuits and Logic”, IEEE Press, July 1999.
Also: http://www.ece.ucdavis.edu/acsel/Publications.html