Faster functional modules: lessons taught from FP-ADDERs. Guy Even, Electrical Engineering Dept., Tel-Aviv Univ. Silicon Value Seminar (April 29, 2002)


Page 1

Faster functional modules: lessons taught from FP-ADDERs

Guy Even

Electrical Engineering Dept.

Tel-Aviv Univ.

Silicon Value Seminar (April 29, 2002)

Page 2

Outline

• FP-Adder: an example of a complicated module
  – brief overview
  – focus on two sub-blocks

• Counting leading zeros: priority encoders, with various design methods:
  – divide & conquer
  – parallel prefix computation
  – redundant addition

• Adders:
  – fast adders
  – compound adders

Page 3

Background

Faster clock rates require faster modules.

Example: Floating-Point Adders

• early designs: 50-60 logic levels.

• at 15-20 gate levels per cycle, that means 3-4 cycles!

• new designs: 25 gate levels, i.e., 2 cycles.

How? Better algorithms and faster sub-blocks…

Page 4

Floating-Point Add

• Algorithm: Why 50-60 logic levels?

• Sub-Modules: List of sub-blocks.

floating-point number: S-sign, E-exponent, F-mantissa, with value $(-1)^S \cdot 2^E \cdot F$

floating-point addition:

input: (Sa,Ea,Fa) & (Sb,Eb,Fb)

output: (S,E,F) such that (up to rounding):

$(-1)^{S} \cdot 2^{E} \cdot F = (-1)^{S_a} \cdot 2^{E_a} \cdot F_a + (-1)^{S_b} \cdot 2^{E_b} \cdot F_b$

Page 5

FP-Add: naïve algorithm

Pre-process:
– SWAP operands
– Align mantissa of smaller operand (shift right)
– Compute sticky-bit (OR of bits shifted outside)

Add/Sub:
– Add/Sub mantissas
– Convert sum to sign & mag: abs(negative sum)

Round:
– Normalize sum (shift left)
– rounding decision
– INC according to rounding decision

RESULT
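The naive flow above can be sketched in Python as a behavioral model (not a circuit). Several choices here are assumptions, not taken from the slide: significands are integers, so (S, E, F) denotes (-1)^S · F · 2^E with F a P-bit integer whose top bit is set; the rounding mode is round-to-nearest-even; and P = 53 only so that results can be checked against Python's own float addition. Zero operands, infinities, NaNs and subnormals are not handled. The names fp_add, to_sfe and from_sfe are hypothetical.

```python
import math

P = 53  # significand width in bits (assumption: IEEE double, to allow checking)


def fp_add(a, b):
    """Naive FP addition of nonzero operands (S, E, F) meaning (-1)**S * F * 2**E,
    where F is a P-bit integer with its top bit set. Rounds to nearest, ties to even."""
    (sa, ea, fa), (sb, eb, fb) = a, b
    # Pre-process: SWAP so that operand a has the larger exponent.
    if ea < eb:
        (sa, ea, fa), (sb, eb, fb) = (sb, eb, fb), (sa, ea, fa)
    # Append three zero bits (guard, round, sticky) below both mantissas.
    fa, fb = fa << 3, fb << 3
    # Align: shift the smaller operand right; OR the bits shifted out into a sticky bit.
    d = ea - eb
    sticky = 1 if fb & ((1 << d) - 1) else 0
    fb = (fb >> d) | sticky
    # Add/Sub the signed mantissas, then convert the sum to sign & magnitude.
    total = (-fa if sa else fa) + (-fb if sb else fb)
    s, mag, e = int(total < 0), abs(total), ea - 3
    if mag == 0:
        return (0, 0, 0)  # exact cancellation, returned as +0
    # Normalize so that mag has exactly P + 3 bits.
    if mag.bit_length() > P + 3:
        # Carry-out: shift right by one, keeping the lost bit in the sticky position.
        mag, e = (mag >> 1) | (mag & 1), e + 1
    else:
        # Cancellation: shift left by the number of leading zeros (LZ).
        lz = P + 3 - mag.bit_length()
        mag, e = mag << lz, e - lz
    # Rounding decision on the three extra bits, then INC if needed.
    low, mag, e = mag & 7, mag >> 3, e + 3
    if low > 4 or (low == 4 and mag & 1):
        mag += 1
        if mag == 1 << P:  # rounding overflowed into a new bit: renormalize
            mag, e = mag >> 1, e + 1
    return (s, e, mag)


def to_sfe(x):
    """Hypothetical helper: split a nonzero float into (S, E, F) with a P-bit F."""
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    return (int(m < 0), e - P, int(abs(m) * 2 ** P))


def from_sfe(t):
    """Hypothetical helper: turn (S, E, F) back into a float."""
    s, e, f = t
    return -math.ldexp(f, e) if s else math.ldexp(f, e)
```

For example, from_sfe(fp_add(to_sfe(0.1), to_sfe(0.2))) compares equal to 0.1 + 0.2 in Python. Note how every step waits for the previous one: the left shift needs the finished subtraction, and the INC needs the rounding decision.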

Page 6

focus: normalization shift

Problem:
– LZ = number of leading zeros
– shift left by LZ positions

Use a priority encoder! Two types of priority encoders:
– unary
– binary

unary example:
X[1:4] = 0010
A[1:4] = 0011

binary example:
X[1:4] = 0010
Y[2:0] = 010
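A small behavioral reference model (not a circuit) of the two encoder outputs and the normalization shift, reproducing the examples above. Bit lists are most-significant first, so Python index 0 stands for X[1]; all function names are hypothetical.

```python
def leading_zeros(x):
    """LZ: number of leading zeros of x, where x[0] plays the role of X[1]."""
    lz = 0
    for bit in x:
        if bit:
            break
        lz += 1
    return lz


def unary_penc(x):
    """Unary PENC output: A[i] = 1 iff some X[j] with j <= i equals 1."""
    return [int(any(x[: i + 1])) for i in range(len(x))]


def binary_penc(x):
    """Binary PENC output Y[k:0] (most significant bit first); assumes len(x) == 2**k."""
    k = len(x).bit_length() - 1
    lz = leading_zeros(x)
    return [(lz >> b) & 1 for b in range(k, -1, -1)]


def normalize(x):
    """Normalization shift: shift x left by LZ positions, filling with zeros."""
    lz = leading_zeros(x)
    return x[lz:] + [0] * lz
```

With X[1:4] = 0010 this gives A = [0, 0, 1, 1], Y = [0, 1, 0] and the normalized vector [1, 0, 0, 0]. Note that Y needs k + 1 bits because an all-zero input has LZ = n = 2^k.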

Page 7

Unary PENC

– input: X[1:n]

– output: A[1:n]

– functionality: $A[i] = 1$ if $\exists j \le i : X[j] = 1$, and $A[i] = 0$ otherwise.

Simpler: $A[i] = \mathrm{OR}(X[1], X[2], \ldots, X[i])$

Implementation: what is the best design?

Delay = Θ(log n) & Cost = Θ(n).

Page 8

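Unary PENC – divide & conquer

[Figure: U-PENC(n/2) on X[1:n/2] produces A[1:n/2]. A second U-PENC(n/2) on X[1+n/2:n] produces n/2 outputs, each ORed (OR(n/2)) with the single output of an OR-tree(n/2) over X[1:n/2], giving A[1+n/2:n].]

delay: O(log n), which is optimal; it remains O(log n) even if fan-out is considered (the OR-tree output has linear fan-out, yet the delay stays logarithmic).

cost: O(n log n), not optimal. Sharing the OR-tree (its output equals A[n/2]) gives only a slight reduction of cost.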
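A behavioral sketch of this recursion that also counts 2-input OR gates, so the O(n log n) cost becomes visible. It models the variant with a separate OR-tree; upenc_dc is a hypothetical name, and x[0] stands for X[1].

```python
def upenc_dc(x):
    """Unary PENC by divide & conquer (behavioral model).
    x: bits with x[0] standing for X[1]; len(x) a power of two.
    Returns (A, gates) with A[i] = OR(x[0..i]) and gates = number of 2-input ORs."""
    n = len(x)
    if n == 1:
        return list(x), 0
    half = n // 2
    a_left, g_left = upenc_dc(x[:half])    # A[1:n/2]
    a_right, g_right = upenc_dc(x[half:])  # prefixes of the right half alone
    # Separate OR-tree(n/2) over the left half: n/2 - 1 gates, one output.
    or_left = int(any(x[:half]))
    tree_gates = half - 1
    # OR(n/2): the single OR-tree output fans out to n/2 gates (linear fan-out).
    a_hi = [or_left | bit for bit in a_right]
    return a_left + a_hi, g_left + g_right + tree_gates + half
```

The gate count follows C(n) = 2C(n/2) + (n/2 - 1) + n/2, i.e. 5, 17 and 49 gates for n = 4, 8 and 16. Reusing A[n/2] instead of a separate OR-tree removes the n/2 - 1 term, but the n/2 OR gates per level remain, so the cost is still Θ(n log n).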

Page 9

Unary PENC - improve

Parallel Prefix Computation (PPC) [FL,BK]!

$A[1] = X[1]$

$A[2] = \mathrm{OR}(X[1], X[2])$

$A[3] = \mathrm{OR}(X[1], X[2], X[3])$

$\vdots$

$A[n] = \mathrm{OR}(X[1], X[2], \ldots, X[n])$

Page 10

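Unary PENC = PPC(OR)

[Figure: adjacent pairs (X[1],X[2]), (X[3],X[4]), …, (X[n-1],X[n]) are ORed; the n/2 results feed U-PENC(n/2), whose outputs are A[2], A[4], …, A[n]. Each remaining odd output A[2i+1] is the OR of A[2i] and X[2i+1], and A[1] = X[1].]

delay = O(log n)

cost = O(n)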
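The recursive scheme in the figure works for any associative operator, so it can be sketched generically. This is a sketch under our own conventions: ppc is a hypothetical name, the operator's left argument is the earlier (lower-index) part, and the function also counts operator instances.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def ppc(x: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], int]:
    """Parallel prefix computation following the recursive scheme in the figure.
    Returns (prefixes, number of operator instances); len(x) must be a power of two.
    op must be associative; its left argument is the earlier (lower-index) part."""
    n = len(x)
    if n == 1:
        return list(x), 0
    # First level: n/2 operators combine adjacent pairs (X[1],X[2]), (X[3],X[4]), ...
    pairs = [op(x[2 * i], x[2 * i + 1]) for i in range(n // 2)]
    # Recursive PPC(n/2) yields the even prefixes A[2], A[4], ..., A[n].
    even, ops = ppc(pairs, op)
    out = [x[0]]  # A[1] = X[1]
    for i in range(n // 2):
        out.append(even[i])  # A[2i+2]
        if 2 * i + 2 < n:
            # Last level: n/2 - 1 operators, A[2i+3] = op(A[2i+2], X[2i+3])
            out.append(op(even[i], x[2 * i + 2]))
    return out, ops + n // 2 + (n // 2 - 1)
```

With OR as the operator this is the unary PENC: ppc([0, 0, 1, 0], lambda p, q: p | q) returns ([0, 0, 1, 1], 4). The operator count satisfies C(n) = C(n/2) + n - 1, i.e. 2n - log2(n) - 2, which is the linear cost claimed above. Keeping the argument order matters for non-commutative operators such as string concatenation.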

Page 11

PPC - properties

[Figure: the PPC(OR) circuit of Page 10, repeated.]

Fan-out: the logarithmic fan-out can be decreased to constant (cost still O(n)).

Layout: O(n log n) area.

Same design as the "Brent-Kung" adder.

Applicable to every associative operator.

Page 12

Binary PENC

– input: X[1:n] (n = 2^k)

– output: Y[k:0]

– functionality: $\sum_{i=0}^{k} Y[i] \cdot 2^i$ = number of leading zeros in X[1:n]

Relation to Unary PENC:

$\sum_{i=0}^{k} Y[i] \cdot 2^i = \sum_{j=1}^{n} (1 - A[j])$.

Implementation: what is the best design?

Delay = Θ(log n) & Cost = Θ(n).

Page 13

Binary PENC – simple & optimal

[Figure: X[1:n] → PPC (OR) → A[1:n] → diff(n) → encoder(n) → Y[k:0]]

$A[i] = \mathrm{OR}(X[1], X[2], \ldots, X[i])$

delay(diff(n)) = constant, cost(diff(n)) = O(n)

delay(encoder(n)) = O(log n), cost(encoder(n)) = O(n)
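A behavioral sketch of this pipeline. Our reading of the block names is an assumption: diff(n) marks the position of the leading one by XORing adjacent A bits (A is monotone, so exactly one position differs unless X is all zeros), and encoder(n) turns that one-hot vector into a binary position. The encoder is written recursively so that its cost obeys E(n) = 2E(n/2) + O(log n) = O(n). The prefix OR is computed serially here for brevity; any PPC gives the same values. Function names are hypothetical.

```python
from itertools import accumulate
import operator


def encode(d):
    """encoder(n): one-hot (or all-zero) d -> (valid, position bits LSB first).
    Recursive, so its cost obeys E(n) = 2E(n/2) + O(log n) = O(n)."""
    if len(d) == 1:
        return d[0], []
    h = len(d) // 2
    v_left, bits_left = encode(d[:h])
    v_right, bits_right = encode(d[h:])
    # The low bits come from whichever half holds the one (the other half is all 0),
    # and the new top bit says whether the one lies in the right half.
    low = [p | q for p, q in zip(bits_left, bits_right)]
    return v_left | v_right, low + [v_right]


def bin_penc(x):
    """Binary PENC = PPC(OR) -> diff(n) -> encoder(n). x[0] is X[1]; len(x) == 2**k.
    Returns [Y[k], ..., Y[0]]."""
    a = list(accumulate(x, operator.or_))                     # A[i] = OR(X[1..i])
    d = [a[0]] + [a[p] ^ a[p - 1] for p in range(1, len(a))]  # diff(n): leading one
    _, low_bits = encode(d)         # 0-based position of the leading one equals LZ
    y = low_bits + [1 - a[-1]]      # Y[k] = 1 only if X is all zeros (LZ = 2**k)
    return y[::-1]
```

For X[1:4] = 0010 the diff vector is [0, 0, 1, 0] and the result is [0, 1, 0], i.e. LZ = 2.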

Page 14

Binary PENC – with adder tree

[Figure: X[1:n] → PPC (OR) → A[1:n] → ADD-tree(n) → Y[k:0]]

$A[i] = \mathrm{OR}(X[1], X[2], \ldots, X[i])$ and $\sum_{i=0}^{k} Y[i] \cdot 2^i = \sum_{j=1}^{n} (1 - A[j])$

problem: each adder(k) in the tree has O(log k) delay, so the total delay is O(log n · log k).
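A sketch that sums the bits 1 - A[j] with a balanced tree of binary adders and records the operand width at each level. Python's + stands in for a carry-propagate adder circuit. Level l adds l-bit operands, so with O(log l)-delay adders the k levels cost O(log 1 + … + log k) = O(k log k), consistent with the O(log n · log k) bound above (k = log2 n). Function names are hypothetical.

```python
from itertools import accumulate
import operator


def add_tree(values):
    """Balanced tree of binary adders over len(values) == 2**k inputs.
    Returns (total, widths), where widths[l] is the operand width at tree level l."""
    width, widths = 1, []
    while len(values) > 1:
        widths.append(width)
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        width += 1  # the sum of two w-bit numbers needs w + 1 bits
    return values[0], widths


def bin_penc_add_tree(x):
    """PPC(OR) followed by ADD-tree(n): LZ = sum over j of (1 - A[j])."""
    a = list(accumulate(x, operator.or_))   # A[i] = OR(X[1..i])
    return add_tree([1 - bit for bit in a])
```

For n = 8 the widths are [1, 2, 3]: the operands grow by one bit per level, which is why the adders deeper in the tree get slower.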

Page 15

Redundant addition

[Figure: three 4-bit rows a3 a2 a1 a0, b3 b2 b1 b0, c3 c2 c1 c0 are compressed column by column into two rows x3 x2 x1 x0 and y3 y2 y1 y0.]

Add the columns in parallel using Full-Adders.

Partial compression, or (3:2)-addition: the delay is constant!

A tree structure enables (n:2)-addition with O(log n) delay.

(n:2)-addition is used in fast multipliers.
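On nonnegative integers, (3:2)-addition is one full adder per bit column: the sum word is the bitwise XOR and the carry word is the bitwise majority shifted left by one. No carry moves across columns, so the delay does not depend on the width. The sketch below also builds an (n:2) tree by repeatedly replacing groups of three operands with two; csa and csa_tree are hypothetical names.

```python
def csa(x, y, z):
    """(3:2)-addition of nonnegative integers: one full adder per bit column.
    Returns (s, c) with s + c == x + y + z; no carry moves between columns."""
    s = x ^ y ^ z                              # column sums
    c = ((x & y) | (x & z) | (y & z)) << 1     # column carries, weight doubled
    return s, c


def csa_tree(operands):
    """(n:2)-addition: replace groups of three operands by two until two remain.
    Returns (remaining operands, number of (3:2) levels used)."""
    ops, levels = list(operands), 0
    while len(ops) > 2:
        nxt = []
        for i in range(0, len(ops) - 2, 3):
            nxt.extend(csa(ops[i], ops[i + 1], ops[i + 2]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])   # 0-2 leftovers pass through
        ops, levels = nxt, levels + 1
    return ops, levels
```

Each level shrinks the operand count by roughly a factor of 2/3, so the number of levels is logarithmic. For the nine operands 1..9 the tree needs 4 levels, and a single carry-propagate adder on the two remaining operands yields 45.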

Page 16

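Binary PENC – O(log n) delay

[Figure: X[1:n] → PPC (OR) → A[1:n] → FA-tree(n) → two [k:0] numbers (carry-save) → 2:1-Adder → Y[k:0]]

Tree of Full-Adders:
– delay of each full-adder is constant
– depth is O(log n)
– output is a carry-save number, converted to Y[k:0] by a single 2:1-Adder

$A[i] = \mathrm{OR}(X[1], X[2], \ldots, X[i])$ and $\sum_{i=0}^{k} Y[i] \cdot 2^i = \sum_{j=1}^{n} (1 - A[j])$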
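A behavioral sketch of the FA-tree: bits are kept in columns by weight, each pass applies one layer of full adders to every column in parallel, and the loop stops once every column holds at most two bits, i.e. a carry-save number. The column bookkeeping is our modeling choice, not necessarily the slide's exact tree; fa_tree and bin_penc_fa_tree are hypothetical names.

```python
from itertools import accumulate
import operator


def full_adder(a, b, c):
    """Returns (sum bit, carry bit) with a + b + c == s + 2 * carry."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def fa_tree(bits):
    """Reduce single bits (all of weight 2**0) to a carry-save pair with full adders.
    Each pass is one layer of full adders working in parallel (constant delay)."""
    cols = {0: list(bits)}  # weight exponent -> bits waiting in that column
    depth = 0
    while any(len(col) > 2 for col in cols.values()):
        nxt = {}
        for w, col in cols.items():
            i = 0
            while len(col) - i >= 3:
                s, c = full_adder(*col[i:i + 3])
                nxt.setdefault(w, []).append(s)      # sum stays in column w
                nxt.setdefault(w + 1, []).append(c)  # carry moves to column w + 1
                i += 3
            nxt.setdefault(w, []).extend(col[i:])    # 0-2 bits pass through
        cols, depth = nxt, depth + 1
    # At most two bits per column remain: read them out as two binary numbers.
    first = sum(col[0] << w for w, col in cols.items() if len(col) > 0)
    second = sum(col[1] << w for w, col in cols.items() if len(col) > 1)
    return first, second, depth


def bin_penc_fa_tree(x):
    """PPC(OR), FA-tree(n), then one 2:1 addition. Returns (LZ, FA-tree depth)."""
    a = list(accumulate(x, operator.or_))            # A[i] = OR(X[1..i])
    first, second, depth = fa_tree([1 - bit for bit in a])
    return first + second, depth                     # final 2:1-Adder
```

Only the last step is a carry-propagate addition, and it operates on k + 1 bits, so it adds just O(log k) delay.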

Page 17

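Binary PENC – divide & conquer

Split X[1:n] into XL = X[1:n/2] and XR = X[n/2+1:n].

$Y_L = \mathrm{BinPENC}(X_L)$, $Y_R = \mathrm{BinPENC}(X_R)$

$Y = Y_L$ if $X_L \ne 00\ldots0$, and $Y = n/2 + Y_R$ if $X_L = 00\ldots0$

Observation: $X_L = 00\ldots0 \iff Y_L[k-1] = 1$.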

Page 18

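Binary PENC – divide & conquer

[Figure: Bin-PENC(n/2) on X[1:n/2] outputs YL (k bits); Bin-PENC(n/2) on X[1+n/2:n] outputs YR (k bits), to which a half adder adds 2^(k-1); a MUX(k) whose select input is YL[k-1] chooses between YL and 2^(k-1) + YR to form Y[k:0].]

– +2^(k-1) (half adder): delay = constant, cost = O(log n)

– MUX(k): delay = constant, cost = O(log n), but its select signal YL[k-1] has fan-out k, which incurs O(log log n) delay

initial analysis: delay = O(log n), cost = O(n)

bottom line: delay = O(log n · log log n), cost = O(n)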
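A behavioral model of this recursion, returning the value of Y as a Python int. The expression n/2 + YR stands in for the half adder, and the conditional stands in for MUX(k) driven by YL[k-1]; bin_penc_dc is a hypothetical name.

```python
def bin_penc_dc(x):
    """Binary PENC by divide & conquer (behavioral model); returns LZ as an int.
    x[0] stands for X[1]; len(x) == 2**k."""
    n = len(x)
    if n == 1:
        return 1 - x[0]
    half = n // 2
    k = n.bit_length() - 1                 # n == 2**k, so each half yields k-bit results
    y_left = bin_penc_dc(x[:half])         # YL = BinPENC(XL)
    y_right = bin_penc_dc(x[half:])        # YR = BinPENC(XR)
    # MUX select: XL is all zeros exactly when YL[k-1] = 1 (i.e. YL == n/2).
    left_all_zero = (y_left >> (k - 1)) & 1
    # Half adder adds 2**(k-1) = n/2 to YR; the MUX picks one of the two results.
    return half + y_right if left_all_zero else y_left
```

The recursion depth is log2 n with constant logic per level, which gives the initial O(log n) estimate. The single select bit, however, must drive all k MUX bits at every level, and buffering that fan-out is what turns the bound into O(log n · log log n).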

Page 19

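PENC – quick summary

design   | method         | cost               | delay
U-PENC   | div & conquer  | n log n            | log n
U-PENC   | PPC            | n (area = n log n) | log n
Bin-PENC | PPC + encoder  | n                  | log n
Bin-PENC | PPC + Add_tree | n                  | log n
Bin-PENC | div & conquer  | n                  | log n · log log n

(all entries are O(·) bounds)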

Page 20

PENC - further issues

back to FP-Adder: can we estimate LZ before subtracting?

[Example bit patterns from the slide, layout not recoverable: 100, 111, 000, 00, 10, 01]

We must pre-process to avoid "catastrophic cancellation"!

method: partial compression (signed half-adders).

Page 21

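focus – adder

Avoid the INC after the rounding decision by pre-computing the increment.

[Figure: k-bit inputs a and b feed a Compound Adder that outputs both a+b and a+b+1; a MUX controlled by the rounding decision selects the RESULT.]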

Page 22

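PPC Adder [FL,BK]

• computes carry bits C[n:1]

• sum bits satisfy: S[i] = XOR(A[i], B[i], C[i]).

• computation of carry bits C[n:1]:

Define: $\sigma[i] \in \{k, p, g\}$, where $\sigma[i] = k$ if $A[i] + B[i] = 0$, $\sigma[i] = p$ if $A[i] + B[i] = 1$, and $\sigma[i] = g$ if $A[i] + B[i] = 2$.

claim: $C[i+1] = 1 \iff \exists j \le i : \sigma[i:j] = p^{\,i-j}\, g$ (positions i down to j+1 propagate and position j generates)

example:

A[3:0] = 0100

B[3:0] = 1110

σ[3:0] = pgpk

C[4:1] = 1100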

Page 23

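PPC adder (cont.)

how to compute the event $\sigma[i:j] = p^{\,i-j}\, g$?

define an operator $\circ : \{k,p,g\} \times \{k,p,g\} \to \{k,p,g\}$ as follows:

$g \circ x = g$

$p \circ x = x$

$k \circ x = k$

claim: $\circ$ is associative.

definition: $\pi[i] = \sigma[i] \circ \cdots \circ \sigma[0]$.

claim: $C[i+1] = 1 \iff \pi[i] = g$.

compute π[i] using PPC with ∘-gates!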
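The k/p/g symbols, the ∘ operator and the claim C[i+1] = 1 ⟺ π[i] = g, with π[i] = σ[i] ∘ … ∘ σ[0], can be checked directly. In this sketch π is computed by a serial scan for brevity; because ∘ is associative, any PPC computes the same prefixes with O(log n) delay. The function names are hypothetical.

```python
from itertools import accumulate


def sigma(a_bit, b_bit):
    """k (kill), p (propagate) or g (generate) for one bit position."""
    return "kpg"[a_bit + b_bit]


def comp(x, y):
    """x ∘ y, where x is the more significant symbol: g∘y = g, p∘y = y, k∘y = k."""
    return y if x == "p" else x


def carries(a, b, n):
    """Carries [C[1], ..., C[n]] of the n-bit addition a + b (C[i+1] = 1 iff pi[i] == 'g')."""
    s = [sigma((a >> i) & 1, (b >> i) & 1) for i in range(n)]  # s[0] = sigma[0] (LSB)
    # pi[i] = sigma[i] ∘ ... ∘ sigma[0]: each new symbol is applied on the left
    pi = accumulate(s, lambda acc, sym: comp(sym, acc))
    return [int(p == "g") for p in pi]


def ppc_add(a, b, n):
    """a + b from the carries: S[i] = XOR(A[i], B[i], C[i]) plus the carry-out C[n]."""
    c = [0] + carries(a, b, n)  # C[0] = 0
    total = sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(n))
    return total | (c[n] << n)
```

On the slide's example, carries(0b0100, 0b1110, 4) returns [0, 0, 1, 1], i.e. C[1] … C[4], which is C[4:1] = 1100.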

Page 24

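Compound Adder [T]

how to compute a+b & a+b+1?

– use 2 separate adders

– understand the PPC adder:

a+b+1 = (a+0.5)+(b+0.5)

[Figure: operands A[n-1] … A[1] A[0] and B[n-1] … B[1] B[0], each extended by one extra least-significant bit equal to 1.]

recall: $\pi[i] = \sigma[i] \circ \cdots \circ \sigma[0]$.

Now, for a+b+1 the extra position generates, so $\pi'[i] = \sigma[i] \circ \cdots \circ \sigma[0] \circ g = \pi[i] \circ g$.

Therefore, since $g \circ g = p \circ g = g$ and $k \circ g = k$: $C'[i+1] = 1 \iff \pi'[i] = g \iff \pi[i] \ne k$.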
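A sketch of the compound adder: a single prefix computation of π[i] = σ[i] ∘ … ∘ σ[0] yields the carries of both sums, with C[i+1] = [π[i] = g] for a+b and C'[i+1] = [π[i] ≠ k] for a+b+1. The carry into bit 0 is 1 for a+b+1 because the appended half-bit position generates. The prefix is computed serially for brevity, and compound_add is a hypothetical name.

```python
from itertools import accumulate


def compound_add(a, b, n):
    """Return (a + b, a + b + 1) for n-bit a, b from one shared prefix computation."""
    sig = ["kpg"[((a >> i) & 1) + ((b >> i) & 1)] for i in range(n)]
    # pi[i] = sigma[i] ∘ ... ∘ sigma[0], with g∘y = g, k∘y = k, p∘y = y
    pi = list(accumulate(sig, lambda acc, sym: acc if sym == "p" else sym))
    c = [0] + [int(p == "g") for p in pi]    # carries of a + b (C[0] = 0)
    c1 = [1] + [int(p != "k") for p in pi]   # carries of a + b + 1 (C'[0] = 1)

    def assemble(carry):
        # S[i] = XOR(A[i], B[i], carry[i]); bit n is the carry-out.
        s = sum((((a >> i) ^ (b >> i) ^ carry[i]) & 1) << i for i in range(n))
        return s | (carry[n] << n)

    return assemble(c), assemble(c1)
```

In the FP-adder, both results are ready before the rounding decision, so the decision only drives the final MUX between them instead of triggering an increment.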

Page 25

Conclusion

• faster modules require clever designs

• starting point: gate count (for delay & cost)

• must take fan-out & layout into account

• lots of methods:
  – divide & conquer
  – parallel prefix computation
  – redundant arithmetic