1 design of a parallel-prefix adder architecture with efficient timing-area tradeoff characteristic...
Post on 20-Dec-2015
1
Design of a Parallel-Prefix Adder Architecture with Efficient Timing-Area Tradeoff Characteristic
Sabyasachi Das, University of Colorado, Boulder
Sunil P. Khatri, Texas A&M University
2
What is an Adder?
IC block that performs addition of 2 data signals
Well-known logic architectures
Often part of other arithmetic components, like Sum-of-Products, Multiplier etc.
Computationally intensive and occupies large area
Wide usage in almost all digital designs
3
Overview of an adder
a7 a6 a5 a4 a3 a2 a1 a0
b7 b6 b5 b4 b3 b2 b1 b0
__________________________________________________________
S8 S7 S6 S5 S4 S3 S2 S1 S0
For each bit (i = 0 to n-1):
$S_i = a_i \oplus b_i \oplus \mathrm{Carry}_i$
$\mathrm{Carry}_{i+1} = (a_i \wedge b_i) \vee (b_i \wedge \mathrm{Carry}_i) \vee (\mathrm{Carry}_i \wedge a_i)$
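The per-bit sum and carry equations can be exercised directly. The following is a minimal Python sketch, assuming unsigned n-bit operands packed into integers and Carry_0 = 0; the name `ripple_carry_add` and the bit-packing are illustrative choices, not taken from the slides.

```python
def ripple_carry_add(a, b, n=8):
    """Add two unsigned n-bit integers one bit at a time.

    Uses S_i = a_i XOR b_i XOR Carry_i and
    Carry_{i+1} = (a_i AND b_i) OR (b_i AND Carry_i) OR (Carry_i AND a_i).
    Carry_0 = 0 is an assumption (no carry-in).
    """
    carry = 0
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (bi & carry) | (carry & ai)
        result |= s << i
    # The final carry-out becomes the extra sum bit S_n.
    result |= carry << n
    return result


print(ripple_carry_add(200, 100))  # prints 300
```

Each carry depends on the previous one, so the delay of this form grows linearly with n. That serial chain is what a parallel-prefix tree replaces.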
4
Introduction to Parallel-Prefix Adder
Fast family of adders
Computes Carry_i for each bit i in a tree structure
Several different flavors are available
Brent-Kung and Kogge-Stone are very popular
5
Generate and Propagate for a Bit
For each bit i of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit:
$G_i = a_i \wedge b_i$
For each bit i of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit:
$P_i = a_i \oplus b_i$
Generate and Propagate concept is extendable to blocks comprising multiple bits
6
Generate and Propagate for Blocks
If two blocks (comprising one or more bits) have the GP value-pairs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, then the combined block has the GP values as follows:
$G_{left,right} = G_{left} \vee (P_{left} \wedge G_{right})$
$P_{left,right} = P_{left} \wedge P_{right}$
This operation is performed by a carry-operator or o-operator.
[Figure: carry-operator with inputs (G_left, P_left) and (G_right, P_right) and output (G_left,right, P_left,right)]
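The block-combining rule can be written as a small function. This is a Python sketch under stated assumptions: GP pairs are `(G, P)` tuples of 0/1 values, the left block is the more significant one, and the helper names `carry_op`, `bit_gp` and `block_gp` are illustrative.

```python
def carry_op(left, right):
    """o-operator: merge the (G, P) pair of a more-significant block (left)
    with that of the adjacent less-significant block (right)."""
    g_left, p_left = left
    g_right, p_right = right
    return (g_left | (p_left & g_right), p_left & p_right)


def bit_gp(a, b, i):
    """Single-bit pair: G_i = a_i AND b_i, P_i = a_i XOR b_i."""
    ai = (a >> i) & 1
    bi = (b >> i) & 1
    return (ai & bi, ai ^ bi)


def block_gp(a, b, hi, lo):
    """(G, P) of the block spanning bits hi..lo, folded from the MSB side."""
    acc = bit_gp(a, b, hi)
    for i in range(hi - 1, lo - 1, -1):
        acc = carry_op(acc, bit_gp(a, b, i))
    return acc


# 11 + 5 = 16 overflows 4 bits, so the 4-bit block generates a carry.
print(block_gp(0b1011, 0b0101, 3, 0))  # prints (1, 0)
```

The operator is associative, so adjacent blocks may be merged in any grouping. This freedom is what lets different prefix networks trade depth against cell count.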
7
Kogge-Stone (KS) Adder
Parallel prefix, fast architecture: $\log_2 n$ levels
Requires large area: $(n \log_2 n - n + 1)$ cells
[Figure: 8-bit Kogge-Stone prefix network with inputs GP7 down to GP0 and outputs C8 down to C1]
Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
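The level and cell counts quoted for the Kogge-Stone network can be checked with a short model. The sketch below is a textbook-style Kogge-Stone scan in Python, not code from the presentation; it assumes n is a power of two, index 0 is the LSB, and the names `kogge_stone` and `gp_bits` are illustrative.

```python
def carry_op(left, right):
    """o-operator on (G, P) pairs; left is the more-significant block."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])


def gp_bits(a, b, n):
    """Per-bit (G_i, P_i) pairs for unsigned n-bit operands, LSB first."""
    return [(((a >> i) & 1) & ((b >> i) & 1),
             ((a >> i) & 1) ^ ((b >> i) & 1)) for i in range(n)]


def kogge_stone(gp):
    """Return (prefix, levels, cells), where prefix[i] = (G_{i,0}, P_{i,0})."""
    n = len(gp)
    cur = list(gp)
    levels = 0
    cells = 0
    dist = 1
    while dist < n:
        nxt = list(cur)
        # Every position at or above `dist` merges with the block `dist` below it.
        for i in range(dist, n):
            nxt[i] = carry_op(cur[i], cur[i - dist])
            cells += 1
        cur = nxt
        dist *= 2
        levels += 1
    return cur, levels, cells


_, levels, cells = kogge_stone(gp_bits(0, 0, 8))
print(levels, cells)  # prints 3 17
```

For n = 8 this gives 3 levels and 17 cells, matching $\log_2 n$ and $n \log_2 n - n + 1$.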
8
Brent-Kung (BK) Adder
Parallel prefix architecture: $(2 \log_2 n - 2)$ levels
Optimized for area: $(2n - 2 - \log_2 n)$ cells
[Figure: 8-bit Brent-Kung prefix network with inputs GP7 down to GP0 and outputs C8 down to C1]
Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
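The Brent-Kung counts can be checked the same way. This Python sketch models a textbook Brent-Kung network (an up-sweep followed by a down-sweep), not code from the presentation. It assumes n is a power of two with n >= 4, and it measures depth as the longest chain of operators feeding any output, which is the reading under which the $2 \log_2 n - 2$ figure holds. The names `brent_kung` and `gp_bits` are illustrative.

```python
def gp_bits(a, b, n):
    """Per-bit (G_i, P_i) pairs for unsigned n-bit operands, LSB first."""
    return [(((a >> i) & 1) & ((b >> i) & 1),
             ((a >> i) & 1) ^ ((b >> i) & 1)) for i in range(n)]


def brent_kung(gp):
    """Return (prefix, depth, cells), where prefix[i] = (G_{i,0}, P_{i,0})."""
    n = len(gp)
    # Each node carries (G, P, depth of the operator chain that produced it).
    node = [(g, p, 0) for g, p in gp]
    cells = 0

    def combine(hi, lo):
        g_hi, p_hi, d_hi = node[hi]
        g_lo, p_lo, d_lo = node[lo]
        node[hi] = (g_hi | (p_hi & g_lo), p_hi & p_lo, 1 + max(d_hi, d_lo))

    step = 1
    while step < n:  # up-sweep: build blocks of width 2, 4, 8, ...
        for i in range(2 * step - 1, n, 2 * step):
            combine(i, i - step)
            cells += 1
        step *= 2

    step = n // 4
    while step >= 1:  # down-sweep: fill in the remaining prefixes
        for i in range(3 * step - 1, n, 2 * step):
            combine(i, i - step)
            cells += 1
        step //= 2

    prefix = [(g, p) for g, p, _ in node]
    depth = max(d for _, _, d in node)
    return prefix, depth, cells


_, depth, cells = brent_kung(gp_bits(0, 0, 8))
print(depth, cells)  # prints 4 11
```

For n = 8 the model reports depth 4 and 11 cells, against depth 3 and 17 cells for Kogge-Stone: fewer cells at the cost of a longer critical path.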
9
Our Proposed Approach
Use 2-input XOR and AND gates to compute G_i and P_i values
Use triple-carry operator in parallel-prefix tree to compute Carry_i values
Use P_i and Carry_i to compute final Sum_i values.
[Figure: block diagram. 2 Inputs -> G and P Generator (for each bit) -> Parallel-Prefix Tree using Triple-Carry operator -> Computation of Final Sum values -> Outputs]
10
Generate and Propagate for a Bit
In our approach, we use the traditional way of computing the Generate ($G_i$) and Propagate ($P_i$) for each bit.
$G_i = a_i \wedge b_i$
$P_i = a_i \oplus b_i$
If $G_i$ is equal to 1, that indicates a Carry_{i+1} signal equal to 1'b1 (logic-1) is generated from the i-th bit
If $P_i$ is equal to 1, that indicates the Carry_i gets fed to the Carry_{i+1} signal
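Taken together, the two statements about G_i and P_i give the recurrence Carry_{i+1} = G_i OR (P_i AND Carry_i). A minimal Python sketch of that reading follows, assuming Carry_0 = 0 and using the illustrative names `gp_bits` and `carries`:

```python
def gp_bits(a, b, n):
    """Per-bit pairs: G_i from a 2-input AND, P_i from a 2-input XOR."""
    pairs = []
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        pairs.append((ai & bi, ai ^ bi))
    return pairs


def carries(a, b, n):
    """Return [Carry_0, ..., Carry_n], with Carry_0 = 0 assumed."""
    out = [0]
    for g, p in gp_bits(a, b, n):
        # A carry leaves bit i if the bit generates one,
        # or if it propagates the carry that entered it.
        out.append(g | (p & out[-1]))
    return out


print(carries(0b0111, 0b0001, 4))  # prints [0, 1, 1, 1, 0]
```

Evaluated serially like this, the recurrence is a ripple chain; the prefix tree computes the same Carry values in fewer levels.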
11
Triple-Carry Operator
If three blocks (or bits) have the GP value-pairs (G_left, P_left), (G_mid, P_mid) and (G_right, P_right), then the combined block generates a Carry only if
  Left block generates a Carry OR
  Middle block generates a Carry and Left block propagates that OR
  Right block generates a Carry and both Middle and Left blocks propagate that Carry.
The combined block propagates only if
  Each of the three blocks propagates the input Carry.
12
Triple-Carry Operator
If three blocks (consisting of one or more bits) have the GP value-pairs $(G_{left}, P_{left})$, $(G_{mid}, P_{mid})$ and $(G_{right}, P_{right})$, then the combined block has the GP values as follows:
$G_{left,right} = G_{left} \vee (P_{left} \wedge G_{mid}) \vee (P_{left} \wedge P_{mid} \wedge G_{right})$
$P_{left,right} = P_{left} \wedge P_{mid} \wedge P_{right}$
This operation is performed by a triple-carry operator or o3-operator.
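One way to sanity-check the triple-carry equations is to compare them with two chained applications of the traditional carry-operator. The Python sketch below does this; the function names are illustrative and the `(G, P)` tuple encoding is an assumption.

```python
from itertools import product


def carry_op(left, right):
    """Traditional o-operator on (G, P) pairs; left is more significant."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])


def triple_carry_op(left, mid, right):
    """o3-operator: merge three adjacent blocks in a single cell."""
    g_l, p_l = left
    g_m, p_m = mid
    g_r, p_r = right
    g = g_l | (p_l & g_m) | (p_l & p_m & g_r)
    p = p_l & p_m & p_r
    return (g, p)


# Exhaustive check over all 64 combinations of three (G, P) pairs.
pairs = list(product((0, 1), repeat=2))
mismatches = [
    (l, m, r)
    for l, m, r in product(pairs, repeat=3)
    if triple_carry_op(l, m, r) != carry_op(carry_op(l, m), r)
]
print(len(mismatches))  # prints 0
```

The o3-operator therefore computes the same function as two levels of o-operators, but in a single level of the tree.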
13
Triple-Carry Operator
Typically, delay of a triple-carry operator is about 110% to 130% of the delay of a traditional carry-operator.
Typically, area of a triple-carry operator is about 150% to 180% of the area of a traditional carry-operator.
[Figure: triple-carry operator with inputs (G_left, P_left), (G_mid, P_mid) and (G_right, P_right) and output (G_left,right, P_left,right)]
14
Proposed Parallel-Prefix Network
In the 1st level (or topmost level) of the parallel-prefix tree network, we use the maximum number of triple-carry operators to combine groups of three: GP_{3k}, GP_{3k+1} and GP_{3k+2} (k starts from zero)
In the quadrant closest to LSB, we use the traditional carry-operator exclusively.
In the quadrant closest to MSB, we use our proposed triple-carry operator extensively.
In the middle two quadrants, we use both carry-operator and triple-carry operator in a timing-driven fashion.
We restrict the fanout of each operator to 5
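The first-level grouping is the one part of the network that the bullets specify exactly, so it can be sketched in isolation. The Python below is an illustration only: the slides do not say how leftover bits are treated when n is not a multiple of three (this sketch leaves them uncombined), and the names `first_level` and `gp_bits` are hypothetical.

```python
def gp_bits(a, b, n):
    """Per-bit (G_i, P_i) pairs for unsigned n-bit operands, LSB first."""
    return [(((a >> i) & 1) & ((b >> i) & 1),
             ((a >> i) & 1) ^ ((b >> i) & 1)) for i in range(n)]


def triple_carry_op(left, mid, right):
    """o3-operator on (G, P) pairs; left is the most significant block."""
    g = left[0] | (left[1] & mid[0]) | (left[1] & mid[1] & right[0])
    p = left[1] & mid[1] & right[1]
    return (g, p)


def first_level(gp):
    """Combine GP_{3k}, GP_{3k+1}, GP_{3k+2} for k = 0, 1, ...

    Returns a list of ((hi, lo), (G, P)) entries, one per complete group.
    Bits that do not fill a complete group are left out (an assumption).
    """
    groups = []
    for k in range(len(gp) // 3):
        lo = 3 * k
        hi = lo + 2
        # GP_{3k+2} is the most significant bit of the group.
        groups.append(((hi, lo), triple_carry_op(gp[hi], gp[lo + 1], gp[lo])))
    return groups


for span, pair in first_level(gp_bits(0b10110111, 0b00001001, 8)):
    print(span, pair)
```

Each group's G value says whether those three bits produce a carry-out on their own, which is what the later levels of the tree consume.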
15
Proposed Parallel-Prefix Network
Critical path primarily goes through the bits near MSB
  We instantiate more triple-carry operators along the critical path and bits near MSB.
  This reduces the depth along the critical path of the parallel-prefix computation tree.
  The delay of o3 operator is about 110%-130% of delay of o operator.
Bits near LSB are typically less critical and have less depth
  We instantiate more traditional carry operators in the bits near LSB.
  This saves area occupied by the parallel-prefix computation tree.
  The area of o3 operator is about 150%-180% of area of o operator.
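A back-of-the-envelope model shows why fewer, slightly slower levels can still win on the critical path. The Python sketch below is a rough estimate, not the authors' network: it assumes a single chain that merges n GP pairs using only one operator type, and it ignores fanout and wiring. Only the 110%-130% delay ratio comes from the slides; the function names are illustrative.

```python
def chain_levels(n, radix):
    """Levels needed to merge n GP pairs with radix-input operators."""
    levels = 0
    span = 1
    while span < n:
        span *= radix
        levels += 1
    return levels


def relative_delay(n, radix, op_delay):
    """Path delay in units of one o-operator delay (uniform levels assumed)."""
    return chain_levels(n, radix) * op_delay


for n in (16, 24, 32, 48, 64):
    o_only = relative_delay(n, 2, 1.0)
    o3_fast = relative_delay(n, 3, 1.1)  # o3 at 110% of o delay
    o3_slow = relative_delay(n, 3, 1.3)  # o3 at 130% of o delay
    print(n, o_only, round(o3_fast, 2), round(o3_slow, 2))
```

Under these assumptions a 64-bit chain needs 6 levels of o-operators but only 4 levels of o3-operators, so the o3 path stays shorter even at the 130% ratio. At 32 bits the pessimistic ratio removes the advantage, which fits with choosing the operator in a timing-driven fashion rather than uniformly. On the area side, merging three blocks takes one o3 cell (1.5 to 1.8 area units) instead of two chained o cells (2 units).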
16
Proposed Parallel-Prefix Network
For an example of the 24-bit adder, please refer to the paper.
[Figure: proposed parallel-prefix network for a 16-bit adder, with inputs GP15 down to GP0 and outputs C16 down to C1]
17
Computation of Final Sum Values
At the output of the parallel-prefix computation tree, $G_{i,0}$ and $P_{i,0}$ (for each bit i) values are produced.
By definition, if $G_{i,0}$ is equal to 1'b1 (logic-1), then a carry gets fed to the (i+1)-th bit. Hence, $\mathrm{Carry}_{i+1} = G_{i,0}$
$\mathrm{Sum}_{i+1}$ is computed by using the following equation:
$\mathrm{Sum}_{i+1} = P_{i+1} \oplus \mathrm{Carry}_{i+1} = P_{i+1} \oplus G_{i,0}$
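The final-sum step can be tied back to ordinary addition with a short end-to-end check. In the Python sketch below, a serial scan stands in for the parallel-prefix tree, because only the prefix outputs $G_{i,0}$ matter at this stage. Sum_0 = P_0 assumes there is no carry-in, and the name `prefix_add` is illustrative.

```python
def carry_op(left, right):
    """o-operator on (G, P) pairs; left is the more-significant block."""
    return (left[0] | (left[1] & right[0]), left[1] & right[1])


def prefix_add(a, b, n=8):
    """Add unsigned n-bit integers using Sum_{i+1} = P_{i+1} XOR G_{i,0}."""
    gp = []
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        gp.append((ai & bi, ai ^ bi))

    # prefix[i] = (G_{i,0}, P_{i,0}); a serial scan replaces the tree here.
    prefix = [gp[0]]
    for i in range(1, n):
        prefix.append(carry_op(gp[i], prefix[i - 1]))

    total = gp[0][1]  # Sum_0 = P_0, assuming Carry_0 = 0
    for i in range(n - 1):
        total |= (gp[i + 1][1] ^ prefix[i][0]) << (i + 1)
    # The top sum bit is the final carry: Carry_n = G_{n-1,0}.
    total |= prefix[n - 1][0] << n
    return total


print(prefix_add(173, 94))  # prints 267
```

Any correct prefix network can replace the serial scan without changing the result, because the sum logic only reads the $G_{i,0}$ values.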
18
Delay Results
On average, our approach produces an adder that is about 23% faster than the BK adder and about 0.5% faster than the KS adder
[Figure: bar chart of the worst delay of the adder block (axis range 0 to 3500) versus type of the adder block (Adder-16, Adder-24, Adder-32, Adder-48, Adder-64). Series: worst delay of the BK adder, worst delay of the KS adder, worst delay of our proposed adder]
19
Area Results
On average, our approach produces an adder that is about 9% larger than the BK adder and about 30% smaller than the KS adder
[Figure: bar chart of the area of the adder block (axis range 0 to 25000) versus type of the adder block (Adder-16, Adder-24, Adder-32, Adder-48, Adder-64). Series: area of the BK adder, area of the KS adder, area of our proposed adder]
20
Summary
Triple-carry operator combines GP values of 3 blocks
Use triple-carry operator in the parallel-prefix computation tree to reduce delay of the critical path
Use traditional carry-operator in non timing-critical path to reduce the overall area
Our approach is 0.5% faster than KS and 23% faster than BK
Our approach is 29% smaller than KS and 9% larger than BK