
VLSI Arithmetic: Adders

Prof. Vojin G. Oklobdzija

University of California

http://www.ece.ucdavis.edu/acsel

Oklobdzija 2004 Computer Arithmetic 2

Introduction

• Digital Computer Arithmetic belongs to Computer Architecture; however, it is also an aspect of logic design.

• The objective of Computer Arithmetic is to develop appropriate algorithms that use the available hardware in the most efficient way.

• Ultimately, speed, power, and chip area are the most frequently used measures, which strongly links the algorithms to the implementation technology.


Basic Operations

• Addition

• Multiplication

• Multiply-Add

• Division

• Evaluation of Functions

• Multi-Media

Addition of Binary Numbers

Full Adder. The full adder is the fundamental building block of most arithmetic circuits.

The sum and carry outputs are described as:

$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i$

$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i \bar{c}_i + a_i b_i c_i = a_i b_i + a_i c_i + b_i c_i$

[Figure: full-adder block with inputs $a_i$, $b_i$, $C_{in}$ and outputs $s_i$, $C_{out}$]

Full-adder truth table (propagate rows have a_i ≠ b_i; generate rows have a_i = b_i = 1):

Inputs            Outputs
c_i  a_i  b_i     s_i  c_{i+1}
0    0    0       0    0
0    0    1       1    0         Propagate
0    1    0       1    0         Propagate
0    1    1       0    1         Generate
1    0    0       1    0
1    0    1       0    1         Propagate
1    1    0       0    1         Propagate
1    1    1       1    1         Generate

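Full-Adder Implementation

Full-adder operation is defined by the equations:

$s_i = \bar{a}_i \bar{b}_i c_i + \bar{a}_i b_i \bar{c}_i + a_i \bar{b}_i \bar{c}_i + a_i b_i c_i = a_i \oplus b_i \oplus c_i = p_i \oplus c_i$

$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$

Carry-propagate $p_i$ and carry-generate $g_i$:

$p_i = a_i \oplus b_i$

$g_i = a_i \cdot b_i$

A one-bit adder can be implemented as shown:

[Figure: one-bit adder with inputs $a_i$, $b_i$, $c_{in}$ and outputs $s_i$, $c_{out}$]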

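High-Speed Addition

$s_i = p_i \oplus c_i$

$c_{i+1} = g_i + p_i c_i$

$p_i = a_i \oplus b_i$, $g_i = a_i \cdot b_i$

A one-bit adder can be implemented more efficiently with a multiplexer in the carry path, because the MUX is faster.

[Figure: one-bit adder in which a 2:1 multiplexer (inputs 0 and 1) produces $c_{out}$; inputs $a_i$, $b_i$, $c_{in}$; output $s_i$]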
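The propagate/generate formulation above can be checked with a small behavioral model. The following is a minimal sketch, not the circuit from the slide: the function name `full_adder_pg` is hypothetical, and the multiplexer is modeled as a conditional expression selected by $p_i$ (when $p_i = 0$ the carry-out equals $g_i$, which is the same as $g_i + p_i c_i$).

```python
from itertools import product


def full_adder_pg(a, b, c_in):
    """One-bit adder expressed with propagate/generate signals."""
    p = a ^ b                    # carry-propagate: p_i = a_i XOR b_i
    g = a & b                    # carry-generate:  g_i = a_i AND b_i
    s = p ^ c_in                 # sum:             s_i = p_i XOR c_i
    c_out = c_in if p else g     # 2:1 multiplexer selected by p_i
    return s, c_out


# Exhaustive check against ordinary integer addition (all 8 input rows).
for c_in, a, b in product((0, 1), repeat=3):
    s, c_out = full_adder_pg(a, b, c_in)
    total = a + b + c_in
    assert s == total % 2
    assert c_out == total // 2
```

The loop covers every row of the full-adder truth table, so the mux form of the carry is equivalent to the sum-of-products form for all inputs.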

The Ripple-Carry Adder

[Figure: 4-bit ripple-carry adder: four full adders (FA) in a chain with inputs A0..A3 and B0..B3, sums S0..S3, carry-in Ci,0 and carry-outs Co,0..Co,3, where Co,0 = Ci,1]

Worst-case delay is linear in the number of bits:

$t_{adder} = (N-1)\,t_{carry} + t_{sum}$

$t_d = O(N)$

Goal: make the fastest possible carry-path circuit.

From Rabaey
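As an illustration of the ripple-carry structure and its linear delay, here is a minimal behavioral sketch. The helper names (`ripple_carry_add`, `to_bits`, `from_bits`, `rca_delay`) and the default unit delays are assumptions made for this example; only the formula $(N-1)\,t_{carry} + t_{sum}$ comes from the slide.

```python
def to_bits(x, n):
    """Little-endian list of the n low-order bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Chain of full adders: each stage waits for the previous carry."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b
        sum_bits.append(p ^ carry)
        carry = g | (p & carry)
    return sum_bits, carry


def rca_delay(n, t_carry=1.0, t_sum=1.0):
    """Worst-case delay (N - 1) * t_carry + t_sum from the slide."""
    return (n - 1) * t_carry + t_sum


# 4-bit example: 11 + 6 = 17, so the 4-bit sum is 1 with carry-out 1.
s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
assert from_bits(s) == 1 and c_out == 1
assert rca_delay(32) == 32.0   # 31 carry stages plus one sum, unit delays
```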

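Inversion Property

[Figure: a full adder (FA) with inputs A, B, Ci and outputs S, Co is equivalent to the same FA with all inputs and outputs inverted]

$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$

$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i)$

From Rabaey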
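The inversion property can be confirmed by exhaustive enumeration. This is a small sketch with hypothetical function names; it only restates the two identities above in executable form.

```python
from itertools import product


def full_adder(a, b, c):
    """Reference full adder: sum and majority carry."""
    s = a ^ b ^ c
    c_out = (a & b) | (a & c) | (b & c)
    return s, c_out


def inversion_property_holds():
    """True if inverting all inputs inverts both outputs, for every input."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        s_inv, c_out_inv = full_adder(1 - a, 1 - b, 1 - c)
        if (s_inv, c_out_inv) != (1 - s, 1 - c_out):
            return False
    return True


assert inversion_property_holds()
```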

Minimize Critical Path by Reducing Inverting Stages

[Figure: 4-bit ripple-carry adder built from FA' cells, alternating even and odd cells; inputs A0..A3, B0..B3, sums S0..S3, carries Ci,0 and Co,0..Co,3]

Exploit the inversion property.

Note: two different types of cells are needed.

From Rabaey

Ripple Carry Adder

Carry chain of an RCA implemented using a multiplexer from the standard cell library:

[Figure: three adder bits (i, i+1, i+2) with inputs a, b, sums s_i, s_{i+1}, s_{i+2} and carries c_in, c_i, c_{i+1}, c_out; the critical path runs along the multiplexer carry chain]

Oklobdzija, ISCAS’88

Manchester Carry-Chain Realization of the Carry Path

• Simple and very popular scheme for implementing the carry signal path

[Figure: Manchester carry-chain stage between Carry in and Carry out, with a propagate device, a generate device, a predischarge and kill device, and the Vdd supply]

Original Design

T. Kilburn, D. B. G. Edwards, D. Aspinall, "Parallel Addition in Digital Computers: A New Fast "Carry" Circuit", Proceedings of IEE, Vol. 106, pt. B, p. 464, September 1959.

Carry-Skip Adder

MacSorley, Proc IRE 1/61

Lehman, Burla, IRE Trans on Comp, 12/61

[Figure: two 4-bit adders built from four FA cells with propagate/generate signals P0 G0 .. P3 G3 and carries Ci,0, Co,0 .. Co,3; the second one adds a multiplexer that selects between Co,3 and Ci,0 under control of the bypass signal BP = P0 P1 P2 P3]

Idea: if (P0 and P1 and P2 and P3 = 1) then Co,3 = Ci,0, else “kill” or “generate”.

From Rabaey
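The bypass idea can be sketched behaviorally. The function name `carry_skip_block` is hypothetical and the model is purely logical: the multiplexer never changes the value of the carry-out, it only provides a shorter path for it when every bit of the block propagates.

```python
def carry_skip_block(a_bits, b_bits, c_in):
    """4-bit (or any width) block with a bypass multiplexer on the carry-out.

    Bit lists are little-endian. Returns (sum_bits, c_out, bypass_used).
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]

    sum_bits = []
    carry = c_in
    for p_i, g_i in zip(p, g):        # ordinary ripple path inside the block
        sum_bits.append(p_i ^ carry)
        carry = g_i | (p_i & carry)

    bp = all(p)                       # BP = P0 P1 P2 P3
    c_out = c_in if bp else carry     # multiplexer: bypass or rippled carry
    return sum_bits, c_out, bp


# 0101 + 1010 propagates in every position, so the carry-in is bypassed.
sums, c_out, bypass_used = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)
assert bypass_used and c_out == 1 and sums == [0, 0, 0, 0]
```

When BP = 0, at least one position kills or generates, so the rippled carry-out no longer depends on the block's carry-in and the long path through all four cells is not exercised by an incoming carry.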

Carry-Skip Adder: N bits, k bits/group, r = N/k groups

[Figure: N-bit carry-skip adder divided into groups of k bits, with inputs a_0, b_0 .. a_{N-1}, b_{N-1}, sums S_0 .. S_{N-1}, group propagate signals P_0 .. P_{r-1}, and an AND/OR skip path from C_in to C_out]

Critical path delay = 2(k-1) + (N/k - 2): the carry ripples through the first and the last group and skips the r - 2 groups in between.

Carry-Skip Adder

$t_d = 2(k-1)\,t_{RCA} + \left(\frac{N}{k} - 2\right) t_{SKIP}$

[Figure: propagation delay $t_p$ versus word length N for the ripple adder and the bypass adder; the curves cross at about N = 4..8]

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

[Figure: N-bit adder divided into groups G_0 .. G_m with group propagate signals P_0 .. P_m, inputs a_0, b_0 .. a_{N-1}, b_{N-1} and sums S_0 .. S_{N-1}; the carry signal path from C_in to C_out ripples inside a group and skips over the groups in between]

[Figure: carry chain of the 32-bit Variable Block Adder with block-size labels 1, 1, 3, 3, 4, 4, 5, 5, 6; worst-case delay = 9]

Any-point-to-any-point delay = 9, as compared to 12 for the fixed-block carry-skip adder (CSKA).
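The comparison of 9 versus 12 can be reproduced with a simple unit-delay model. The sketch below rests on two assumptions that are not stated on the slide: each bit of ripple and each skipped block costs one delay unit, and the nine block sizes are arranged symmetrically as 1, 3, 4, 5, 6, 5, 4, 3, 1 (the slide only shows the size labels). The function name `worst_case_carry_delay` is hypothetical.

```python
def worst_case_carry_delay(block_sizes):
    """Worst any-point-to-any-point carry delay under a unit-delay model.

    A carry generated in the first bit of block i ripples through the
    remaining size_i - 1 bits, skips each block strictly between i and j
    (one unit per block), and ripples through size_j - 1 bits of block j.
    """
    worst = 0
    n = len(block_sizes)
    for i in range(n):
        # Carry that starts and ends inside the same block.
        worst = max(worst, block_sizes[i] - 1)
        for j in range(i + 1, n):
            ripple_out = block_sizes[i] - 1
            skips = j - i - 1
            ripple_in = block_sizes[j] - 1
            worst = max(worst, ripple_out + skips + ripple_in)
    return worst


fixed_blocks = [4] * 8                          # 32-bit CSKA, k = 4
variable_blocks = [1, 3, 4, 5, 6, 5, 4, 3, 1]   # assumed VBA arrangement
assert sum(fixed_blocks) == sum(variable_blocks) == 32

assert worst_case_carry_delay(fixed_blocks) == 12
assert worst_case_carry_delay(variable_blocks) == 9
```

Under this model the fixed-block result matches 2(k-1) + (N/k - 2) for N = 32 and k = 4. Small blocks at both ends shorten the ripple segments of the long paths, while the large blocks in the middle are reached only by paths that skip few blocks.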

Delay Calculation for Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Delay model:

[Figure: 4-bit carry chain with inputs Ci,0, P0..P3, G0..G3, block-propagate signals BP and output Co,3]

Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Variable Group Length

Oklobdzija, Barnes, Arith’85

$t_d = c_1 + c_2\sqrt{N + c_3}$

Carry-chain of a 32-bit Variable Block Adder (Oklobdzija, Barnes: IBM 1985)

Variable Block Lengths

• No closed-form solution for delay

• It is a dynamic programming problem

Delay Comparison: Variable Block Adder

[Figure: delay (axis 0 to 16) versus adder size N (axis 4 to 60) for three adders: VBA, CLA, and multi-level VBA]

VLSI Arithmetic: Lecture 4


Carry-Lookahead Adder (Weinberger and Smith, 1958)

Ref: A. Weinberger and J. L. Smith, “A Logic for High-Speed Addition”, National Bureau of Standards, Circ. 591, p.3-12, 1958.

ARITH-13: Presenting Achievement Award to Arnold Weinberger of IBM (who invented CLA adder in 1958)

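CLA Definitions: One-bit adder

$s_i = p_i \oplus c_i$

$c_{i+1} = g_i + p_i c_i$

$p_i = a_i \oplus b_i$, $g_i = a_i \cdot b_i$

[Figure: one-bit adder with a 2:1 multiplexer (inputs 0 and 1) producing $c_{out}$; inputs $a_i$, $b_i$, $c_{in}$; output $s_i$]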

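CLA Definitions: 4-bit Adder

[Figure: four one-bit adder cells for bits i .. i+3, each with inputs a, b and outputs g, p; carries C_i, C_{i+1}, C_{i+2}, C_{i+3}, C_{i+4}]

$c_{i+1} = \bar{a}_i b_i c_i + a_i \bar{b}_i c_i + a_i b_i = g_i + p_i c_i$

$c_{i+2} = g_{i+1} + p_{i+1} c_{i+1} = g_{i+1} + p_{i+1}(g_i + p_i c_i) = g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i$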

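Carry-Lookahead Adder: 4 bits

[Figure: four one-bit adder cells for bits i .. i+3, each with inputs a, b and outputs g, p; carries C_i, C_{i+1}, C_{i+2}, C_{i+3}, C_{i+4}]

$c_{i+3} = g_{i+2} + p_{i+2} c_{i+2} = g_{i+2} + p_{i+2}(g_{i+1} + p_{i+1} g_i + p_{i+1} p_i c_i) = g_{i+2} + p_{i+2} g_{i+1} + p_{i+2} p_{i+1} g_i + p_{i+2} p_{i+1} p_i c_i$

$c_{i+4} = g_{i+3} + p_{i+3} c_{i+3} = \underbrace{g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i}_{G_j} + \underbrace{p_{i+3} p_{i+2} p_{i+1} p_i}_{P_j}\, c_i$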
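The expanded carry equations can be checked against a plain ripple computation. The following is a minimal sketch with hypothetical function names; each lookahead carry is written directly from the formulas above, with index 0 standing for bit i.

```python
from itertools import product


def cla4_carries(a_bits, b_bits, c0):
    """Carries of a 4-bit block computed in parallel from p, g and c0.

    Returns ([c1, c2, c3, c4], G, P) with little-endian input bit lists.
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))

    # Group generate and group propagate of the whole block.
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    P = p[3] & p[2] & p[1] & p[0]
    c4 = G | (P & c0)
    return [c1, c2, c3, c4], G, P


def ripple_carries(a_bits, b_bits, c0):
    """Reference: the same carries computed one after another."""
    carries, c = [], c0
    for a, b in zip(a_bits, b_bits):
        c = (a & b) | ((a ^ b) & c)
        carries.append(c)
    return carries


def cla4_matches_ripple():
    """Exhaustive comparison over all 2**9 input combinations."""
    for bits in product((0, 1), repeat=9):
        a, b, c0 = list(bits[0:4]), list(bits[4:8]), bits[8]
        if cla4_carries(a, b, c0)[0] != ripple_carries(a, b, c0):
            return False
    return True


assert cla4_matches_ripple()
```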

Carry-Lookahead Adder

$G_j = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$

$P_j = p_{i+3} p_{i+2} p_{i+1} p_i$

$c_{4(j+1)} = G_j + P_j c_{4j}$

One gate delay to calculate p, g.

One to calculate P and two for G.

Three gate delays to calculate $C_{4(j+1)}$.

Compare that to 8 in RCA!

[Figure: 4-bit P, G group: four one-bit cells with inputs a_i, b_i .. a_{i+3}, b_{i+3} producing g and p; the group block takes C_in = C_j and produces G_j, P_j and the carries C_{4j+1}, C_{4j+2}, C_{4j+3}, C_{4(j+1)}]

Carry-Lookahead Adder (Weinberger and Smith)

$G^*_j = G_{i+3} + P_{i+3} G_{i+2} + P_{i+3} P_{i+2} G_{i+1} + P_{i+3} P_{i+2} P_{i+1} G_i$

$P^*_j = P_{i+3} P_{i+2} P_{i+1} P_i$

$c_{4(j+1)} = G^*_k + P^*_k c_{4j}$

[Figure: second-level lookahead block taking G_j, P_j .. G_{j+3}, P_{j+3} and C_{4j}, producing G*, P* and the carries C_{4j+1}, C_{4j+2}, C_{4j+3}, C_{4(j+1)}]

Additional two gate delays.

C16 will take a total of 5 vs. 32 for RCA!
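The two-level scheme can be sketched for a 16-bit word: bit-level p, g feed four 4-bit groups, and the four group signals feed one second-level block that produces C16. The names `group_pg` and `c16_two_level` are hypothetical, and the code models only the logic, not the gate delays (1 for p, g, plus 2 for G, P, plus 2 for C16 gives the 5 quoted above).

```python
def group_pg(p, g):
    """Group propagate/generate of four (p, g) pairs, index 0 least significant.

    The same expression serves both levels: bit signals in, G_j / P_j out,
    or group signals in, G* / P* out.
    """
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    P = p[3] & p[2] & p[1] & p[0]
    return P, G


def c16_two_level(a, b, c0=0):
    """Carry out of bit 15 of a + b + c0 using two lookahead levels."""
    a_bits = [(a >> i) & 1 for i in range(16)]
    b_bits = [(b >> i) & 1 for i in range(16)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]

    # First level: one (P_j, G_j) pair per 4-bit group.
    group_p, group_g = [], []
    for j in range(4):
        P_j, G_j = group_pg(p[4 * j:4 * j + 4], g[4 * j:4 * j + 4])
        group_p.append(P_j)
        group_g.append(G_j)

    # Second level: G* and P* over the four groups.
    P_star, G_star = group_pg(group_p, group_g)
    return G_star | (P_star & c0)


assert c16_two_level(0xFFFF, 0x0001) == 1     # carry ripples out of all 16 bits
assert c16_two_level(0x7FFF, 0x0001) == 0     # sum is 0x8000, no carry-out
assert c16_two_level(0xFFFF, 0x0000, 1) == 1  # all groups propagate the carry-in
```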

32-bit Carry Lookahead Adder

[Figure: 32-bit carry-lookahead adder with inputs a_i, b_i, carries C_in, C_4, C_8, C_12, C_16, C_20, C_24, C_28 and C_out, built from the following levels]

• Individual adders generating g_i, p_i, and sum S_i

• Carry-lookahead blocks of 4 bits generating G_i, P_i, and C_in for the adders

• Carry-lookahead super-blocks of 4-bit blocks generating G*_i, P*_i, and C_in for the 4-bit blocks

• Group producing the final carry C_out and C_16

Critical path delay = 1 (for g_i, p_i) + 2×2 (for G, P) + 3×2 (for C_in) + 1 XOR (for Sum) = approx. 12 gate delays

Carry-Lookahead Adder (Weinberger and Smith: original derivation, 1958)


Carry-Lookahead Adder (Weinberger and Smith): please notice the similarity with Parallel-Prefix Adders!


Motorola: CLA Implementation Example

A. Naini, D. Bearden and W. Anderson, “A 4.5nS 96b CMOS Adder Design”,

Proceedings of the IEEE Custom Integrated Circuits Conference, May 3-6, 1992.


Critical path in Motorola's 64-bit CLA

Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63

[Figure: block diagram of the 64-bit CLA built from PG blocks and a carry block; bit-level signals G0, P0 .. G63, P63 are combined into group signals (G3:0, G7:4, G11:8, G15:12, G15:0, G31:16, G47:32, G47:0, G63:48, G63:0 and the matching P signals), which produce the carries C0, C4, C8, C12, C16, C32, C48, C52, C56, C60 .. C64; timing annotations: 1.05 ns, 1.7 ns, 2.0 ns, 2.35 ns, 2.7 ns, 3.75 ns, 4.8 ns]

Motorola's 64-bit CLA

Conventional PG block:

The carry ripples locally, with 5 transistors in the path.

No better situation here!

Basically, this is MCC (Manchester carry chain) performance with carry-skip. One should not expect any better results than with the VBA.

Motorola's 64-bit CLA

Modified PG block:

Intermediate propagate signals Pi:0 are generated to speed up C3.

The critical path still resembles the MCC.

Motorola's 64-bit CLA

[Figure: timing annotations 1.8 ns, 2.2 ns, 2.9 ns, 3.2 ns, 3.55 ns, 3.9 ns]

Critical path: A, B - G0 - G3:0 - G15:0 - G47:0 - C48 - C60 - C63 - S63

[Figure: the 64-bit CLA block diagram repeated with both sets of timing annotations: 1.05, 1.7, 2.0, 2.35, 2.7, 3.75, 4.8 ns and 1.8, 2.2, 2.9, 3.2, 3.55, 3.9 ns]