
Post on 20-Dec-2015


1

Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic

Sabyasachi Das, University of Colorado, Boulder

Sunil P. Khatri, Texas A&M University

2

What is an Adder?

IC block that performs addition of 2 data signals

Well-known logic architectures

Often part of other arithmetic components, like Sum-of-Products, Multiplier, etc.

Computationally intensive and occupies a large area

Wide usage in almost all digital designs

3

Overview of an adder

a7 a6 a5 a4 a3 a2 a1 a0

b7 b6 b5 b4 b3 b2 b1 b0

__________________________

S8 S7 S6 S5 S4 S3 S2 S1 S0

For each bit i (i = 0 to n-1):

$S_i = a_i \oplus b_i \oplus \mathrm{Carry}_i$

$\mathrm{Carry}_{i+1} = (a_i \cdot b_i) + (b_i \cdot \mathrm{Carry}_i) + (\mathrm{Carry}_i \cdot a_i)$
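The two per-bit equations can be sketched directly as a bit-serial (ripple-carry) loop. This is only an illustration of the equations, not the parallel-prefix architecture of the talk; the function name `ripple_carry_add` and the assumption that Carry_0 is 0 (no carry-in) are ours.

```python
def ripple_carry_add(a: int, b: int, n: int = 8) -> int:
    """Add two n-bit unsigned integers one bit at a time."""
    carry = 0  # assumed Carry_0 = 0 (no external carry-in)
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        s_i = a_i ^ b_i ^ carry                              # S_i
        carry = (a_i & b_i) | (b_i & carry) | (carry & a_i)  # Carry_{i+1}
        result |= s_i << i
    return result | (carry << n)  # the final carry becomes S_n


print(ripple_carry_add(200, 100))  # prints 300
```

Each carry depends on the previous one, so the delay of this loop grows linearly with n. That serial dependency is what the parallel-prefix adders on the following slides remove.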

4

Introduction to Parallel-Prefix Adder

Fast family of adders

Computes $\mathrm{Carry}_i$ for each bit i in a tree structure

Several different flavors are available

Brent-Kung and Kogge-Stone are very popular

5

Generate and Propagate for a Bit

For each bit i of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit: $G_i = a_i \cdot b_i$

For each bit i of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit: $P_i = a_i \oplus b_i$

Generate and Propagate concept is extendable to blocks comprising multiple bits

6

Generate and Propagate for Blocks

If two blocks (comprising one or more bits) have the GP value-pairs as $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, then the combined block has the GP values as follows:

$G_{left,right} = G_{left} + (P_{left} \cdot G_{right})$

$P_{left,right} = P_{left} \cdot P_{right}$

This operation is performed by a carry-operator or o-operator.

[Figure: o-operator with inputs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$ and output $(G_{left,right}, P_{left,right})$]
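As a minimal sketch, the per-bit GP pair and the o-operator can be written as two small functions and folded over the bits of a block. The names `bit_gp`, `carry_op` and `block_gp` are illustrative only; the left operand is taken to be the more significant block.

```python
from functools import reduce


def bit_gp(a_i: int, b_i: int) -> tuple:
    """(G_i, P_i) for one bit: G = a AND b, P = a XOR b."""
    return (a_i & b_i, a_i ^ b_i)


def carry_op(left: tuple, right: tuple) -> tuple:
    """o-operator: combine a more significant block with a less significant one."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


def block_gp(a: int, b: int, n: int) -> tuple:
    """GP pair of the block formed by bits n-1..0 of a and b."""
    pairs = [bit_gp((a >> i) & 1, (b >> i) & 1) for i in range(n)]
    # Fold from the LSB upward: the accumulated block is always the right operand.
    return reduce(lambda right, left: carry_op(left, right), pairs)


print(block_gp(0b1011, 0b0101, 4))  # prints (1, 0)
```

In the usage line, 11 + 5 = 16 overflows four bits, so the block generates a carry (G = 1), while bit 0 has both inputs set and therefore does not propagate (P = 0).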

7

Kogge-Stone (KS) Adder

Parallel prefix, fast architecture: $\log_2 n$ levels

Requires large area: $(n \log_2 n - n + 1)$ cells

[Figure: 8-bit Kogge-Stone prefix network with inputs GP7 to GP0 and outputs C8 to C1]

Kogge and Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations”, IEEE Transactions on Computers, 1973
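The level and cell counts can be checked with a small software model of the prefix network. The sketch below assumes n is a power of two and no carry-in; `kogge_stone_carries` is an illustrative name, and the model ignores wiring and fanout, which matter in a real layout.

```python
def kogge_stone_carries(a: int, b: int, n: int):
    """Return ([Carry_1 .. Carry_n], levels, cells) for an n-bit KS network."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(n)]
    levels = cells = 0
    d = 1  # distance between the two operands at the current level
    while d < n:
        nxt = list(gp)
        for i in range(d, n):
            g_l, p_l = gp[i]      # more significant block
            g_r, p_r = gp[i - d]  # less significant block
            nxt[i] = (g_l | (p_l & g_r), p_l & p_r)
            cells += 1
        gp = nxt
        levels += 1
        d *= 2
    return [g for g, _ in gp], levels, cells


carries, levels, cells = kogge_stone_carries(0xB7, 0x5D, 8)
print(levels, cells)  # prints 3 17
```

For n = 8 the model uses 7 + 6 + 4 = 17 cells in 3 levels, which agrees with $n \log_2 n - n + 1 = 17$ and $\log_2 n = 3$.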

8

Brent-Kung (BK) Adder

Parallel prefix architecture: $(2 \log_2 n - 2)$ levels

Optimized for area: $(2n - 2 - \log_2 n)$ cells

[Figure: 8-bit Brent-Kung prefix network with inputs GP7 to GP0 and outputs C8 to C1]

Brent and Kung, “A regular layout for parallel adders”, IEEE Transactions on Computers, 1982
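The cell count can be checked with a software model of one common textbook formulation of the Brent-Kung network (an up-sweep followed by a down-sweep). This is a sketch under assumptions: n is a power of two, there is no carry-in, and the exact level arrangement in the slide's figure may differ from this formulation. The name `brent_kung_carries` is illustrative.

```python
def brent_kung_carries(a: int, b: int, n: int):
    """Return ([Carry_1 .. Carry_n], cells) for n a power of two."""
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(n)]
    cells = 0

    def combine(i: int, j: int) -> None:
        """Apply the o-operator with block i on the left and block j on the right."""
        nonlocal cells
        g_l, p_l = gp[i]
        g_r, p_r = gp[j]
        gp[i] = (g_l | (p_l & g_r), p_l & p_r)
        cells += 1

    d = 1
    while d < n:  # up-sweep: build blocks of size 2, 4, 8, ...
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:  # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2
    return [g for g, _ in gp], cells


carries, cells = brent_kung_carries(0xB7, 0x5D, 8)
print(cells)  # prints 11
```

For n = 8 the model uses 7 up-sweep cells and 4 down-sweep cells, which agrees with $2n - 2 - \log_2 n = 11$.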

9

Our Proposed Approach

Use 2-input XOR and AND gates to compute $G_i$ and $P_i$ values

Use triple-carry operator in parallel-prefix tree to compute $\mathrm{Carry}_i$ values

Use $P_i$ and $\mathrm{Carry}_i$ to compute final $\mathrm{Sum}_i$ values.

[Figure: block diagram with three stages between the 2 inputs and the outputs: G and P Generator (for each bit); Parallel-Prefix Tree using Triple-Carry operator; Computation of Final Sum values]

10

Generate and Propagate for a Bit

In our approach, we use the traditional way of computing the Generate ($G_i$) and Propagate ($P_i$) for each bit.

$G_i = a_i \cdot b_i$

$P_i = a_i \oplus b_i$

If $G_i$ is equal to 1, that indicates a $\mathrm{Carry}_{i+1}$ signal equal to 1’b1 (logic-1) is generated from the i-th bit

If $P_i$ is equal to 1, that indicates the $\mathrm{Carry}_i$ gets fed to the $\mathrm{Carry}_{i+1}$ signal

11

Triple-Carry Operator

If three blocks (or bits) have the GP value-pairs as $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block generates a Carry only if

Left block generates a Carry OR

Middle block generates a Carry and Left block propagates that OR

Right block generates a Carry and both Middle and Left blocks propagate that Carry.

The combined block propagates only if each of the three blocks propagates the input Carry.

12

Triple-Carry Operator

If three blocks (consisting of one or more bits) have the GP value-pairs as $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block has the GP values as follows:

$G_{left,right} = G_{left} + (P_{left} \cdot G_{mid}) + (P_{left} \cdot P_{mid} \cdot G_{right})$

$P_{left,right} = P_{left} \cdot P_{mid} \cdot P_{right}$

This operation is performed by a triple-carry operator or o3-operator.
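A quick way to sanity-check the two equations is to compare the o3-operator against two cascaded o-operators over every possible input. The sketch below does that exhaustively; `carry_op`, `triple_carry_op` and `matches_cascade` are illustrative names, not identifiers from the paper.

```python
from itertools import product


def carry_op(left: tuple, right: tuple) -> tuple:
    """Traditional o-operator on two (G, P) pairs."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


def triple_carry_op(left: tuple, mid: tuple, right: tuple) -> tuple:
    """o3-operator on three (G, P) pairs, left being the most significant."""
    g_l, p_l = left
    g_m, p_m = mid
    g_r, p_r = right
    g = g_l | (p_l & g_m) | (p_l & p_m & g_r)
    p = p_l & p_m & p_r
    return (g, p)


def matches_cascade() -> bool:
    """Check o3(l, m, r) == o(o(l, m), r) for all 64 input combinations."""
    pairs = list(product((0, 1), repeat=2))
    return all(
        triple_carry_op(l, m, r) == carry_op(carry_op(l, m), r)
        for l, m, r in product(pairs, repeat=3)
    )


print(matches_cascade())  # prints True
```

The o3-operator therefore computes in one cell what would otherwise take two o-operators in series, which is where the depth reduction comes from.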

13

Triple-Carry Operator

Typically, delay of a triple-carry operator is about 110% to 130% of the delay of a traditional carry-operator.

Typically, area of a triple-carry operator is about 150% to 180% of the area of a traditional carry-operator.

[Figure: o3-operator with inputs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$ and output $(G_{left,right}, P_{left,right})$]

14

Proposed Parallel-Prefix Network

In the 1st level (or topmost level) of the parallel-prefix tree network, we use the maximum number of triple-carry operators to combine groups of three: $GP_{3k}$, $GP_{3k+1}$ and $GP_{3k+2}$ (k starts from zero)

In the quadrant closest to LSB, we use the traditional carry-operator exclusively.

In the quadrant closest to MSB, we use our proposed triple-carry operator extensively.

In the middle two quadrants, we use both carry-operator and triple-carry operator in a timing-driven fashion.

We restrict the fanout of each operator to 5
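The first-level grouping can be sketched as follows. This covers only the first bullet; the quadrant-based, timing-driven placement of operators at the deeper levels is described in the paper and is not modeled here. Passing leftover bits through unchanged when n is not a multiple of three is our assumption, and all names are illustrative.

```python
def bit_gp(a: int, b: int, i: int) -> tuple:
    """(G_i, P_i) for bit i of a and b."""
    return (((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)


def triple_carry_op(left: tuple, mid: tuple, right: tuple) -> tuple:
    """o3-operator, left being the most significant block."""
    g_l, p_l = left
    g_m, p_m = mid
    g_r, p_r = right
    return (g_l | (p_l & g_m) | (p_l & p_m & g_r), p_l & p_m & p_r)


def first_level(gp: list) -> list:
    """Combine GP_3k, GP_3k+1, GP_3k+2 with one o3-operator per group.

    Bits left over after the last full group are passed through (assumption).
    """
    full = len(gp) - len(gp) % 3
    blocks = [triple_carry_op(gp[k + 2], gp[k + 1], gp[k])
              for k in range(0, full, 3)]
    return blocks + gp[full:]


gp = [bit_gp(0b10110111, 0b01011101, i) for i in range(8)]
print(len(first_level(gp)))  # prints 4: two 3-bit blocks plus two leftover bits
```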

15

Proposed Parallel-Prefix Network

Critical path primarily goes through the bits near MSB:

We instantiate more triple-carry operators along the critical path and bits near MSB.

This reduces the depth along the critical path of the parallel-prefix computation tree.

The delay of the o3 operator is about 110%-130% of the delay of the o operator.

Bits near LSB are typically less critical and have less depth:

We instantiate more traditional carry operators in the bits near LSB.

This saves area occupied by the parallel-prefix computation tree.

The area of the o3 operator is about 150%-180% of the area of the o operator.
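The depth argument can be made concrete with a rough calculation. The sketch below compares an idealized path built only from o-operators (each level doubles the span) with one built only from o3-operators (each level triples it), using the 110%-130% delay ratio quoted above. It is not the authors' mixed network, and it ignores fanout, wiring and the sum logic; `levels` and `path_delay_range` are hypothetical helper names.

```python
O3_DELAY_RANGE = (1.10, 1.30)  # o3 delay relative to one o-operator (from the slides)


def levels(n: int, radix: int) -> int:
    """Levels needed for one block to span n bits when each level merges `radix` blocks."""
    count, span = 0, 1
    while span < n:
        span *= radix
        count += 1
    return count


def path_delay_range(n: int) -> tuple:
    """(o-only delay, o3-only best case, o3-only worst case), in o-operator delays."""
    lo, hi = O3_DELAY_RANGE
    l3 = levels(n, 3)
    return (float(levels(n, 2)), l3 * lo, l3 * hi)


for n in (16, 24, 32, 48, 64):
    print(n, levels(n, 2), levels(n, 3), path_delay_range(n))
```

For a 64-bit span, for example, the o-only path needs 6 levels while the o3-only path needs 4, so even at the pessimistic 130% ratio the o3 path is about 5.2 o-operator delays instead of 6.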

16

Proposed Parallel-Prefix Network

For an example of the 24-bit adder, please refer to the paper.

[Figure: proposed parallel-prefix network for a 16-bit adder, shown in two halves: inputs GP7 to GP0 with outputs C8 to C1, and inputs GP15 to GP8 with outputs C16 to C9]

17

Computation of Final Sum Values

At the output of the parallel-prefix computation tree, $G_{i,0}$ and $P_{i,0}$ (for each bit i) values are produced.

By definition, if $G_{i,0}$ is equal to 1’b1 (logic-1), then a carry gets fed to the (i+1)-th bit. Hence, $\mathrm{Carry}_{i+1} = G_{i,0}$

$\mathrm{Sum}_{i+1}$ is computed by using the following equation:

$\mathrm{Sum}_{i+1} = P_{i+1} \oplus \mathrm{Carry}_{i+1} = P_{i+1} \oplus G_{i,0}$
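The final-sum step can be sketched end to end in a few lines. To keep the example short, the prefix values $G_{i,0}$ are produced here by a simple serial scan that stands in for the parallel-prefix tree; it yields the same values but not the same delay. The sketch also assumes no carry-in, so $\mathrm{Sum}_0 = P_0$. The name `prefix_add` is illustrative.

```python
def prefix_add(a: int, b: int, n: int) -> int:
    """Add two n-bit unsigned integers via per-bit P values and prefix G values."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]

    # prefix_g[i] = G_{i,0}; a serial scan replaces the tree in this sketch.
    prefix_g = []
    acc = 0
    for i in range(n):
        acc = g[i] | (p[i] & acc)
        prefix_g.append(acc)

    result = p[0]  # Sum_0 = P_0 because there is no carry-in (assumption)
    for i in range(n - 1):
        result |= (p[i + 1] ^ prefix_g[i]) << (i + 1)  # Sum_{i+1} = P_{i+1} XOR G_{i,0}
    return result | (prefix_g[n - 1] << n)  # carry out of the top bit


print(prefix_add(183, 93, 8))  # prints 276
```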

18

Delay Results

On average, our approach produces an adder that is about 23% faster than the BK adder and about 0.5% faster than the KS adder.

[Figure: bar chart of the worst delay of the adder block (vertical axis, 0 to 3500) by type of adder block (Adder-16, Adder-24, Adder-32, Adder-48, Adder-64); series: BK adder, KS adder, our proposed adder]

19

Area Results

On average, our approach produces an adder that is about 9% larger than the BK adder and about 30% smaller than the KS adder.

[Figure: bar chart of the area of the adder block (vertical axis, 0 to 25000) by type of adder block (Adder-16, Adder-24, Adder-32, Adder-48, Adder-64); series: BK adder, KS adder, our proposed adder]

20

Summary

Triple-carry operator combines GP values of 3 blocks

Use triple-carry operator in the parallel-prefix computation tree to reduce delay of the critical path

Use traditional carry-operator in non-timing-critical paths to reduce the overall area

Our approach is 0.5% faster than KS and 23% faster than BK

Our approach is 29% smaller than KS and 9% larger than BK

21

Thank you