advanced arithmetic circuits

Digital System Design

ELEC3342

Advanced Arithmetic Circuits

Dr. Hayden So

Department of Electrical and Electronic Engineering

http://www.eee.hku.hk/~elec3342

Speed of Ripple Carry Adder

■ The performance of a circuit is determined by its worst-case delay from any input to any output:

$\text{worst-case delay} = \max_{j,k}\ \mathrm{delay}(in_j, out_k)$

■ The worst-case delay of an n-bit ripple-carry adder (RCA) is from cin<0> to cout<n-1>

• Or from cin<0> to s<n-1>

• Technology dependent

■ In general, the delay of an RCA is O(n), where n is the number of bits

ELEC3342 - H. So 2

Why…?

Determining combinational delay

■ Estimate delay in an arbitrary unit (u) by tracing the flow of signals through the circuit

■ Begin with the assumption that all inputs arrive at exactly the same time, t = 0

■ Trace the delay of each component under certain assumptions about the underlying technology, e.g.

• 2-input gate = 1u

• 3-input gate = 2u

• Inverter = 1u

• Wire = no delay

Note: Exact values are highly technology dependent, but the analysis technique is the same.
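The tracing procedure can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `full_adder_arrivals` and the gate mapping (a 3-input XOR for s, and AND/OR gates for co = a·b + ci(a + b)) are assumptions made for the example, and the delays are the arbitrary units listed above.

```python
# Gate delays in arbitrary units (u), as assumed in the list above.
DELAY = {"inv": 1, "gate2": 1, "gate3": 2}


def full_adder_arrivals(t_a=0, t_b=0, t_ci=0):
    """Trace signal arrival times (in u) through one full adder.

    Assumed (hypothetical) gate mapping:
      s  = XOR3(a, b, ci)
      co = OR2(AND2(a, b), AND2(ci, OR2(a, b)))
    The output of a gate is ready one gate delay after its latest input.
    """
    t = {}
    t["a_or_b"] = max(t_a, t_b) + DELAY["gate2"]
    t["a_and_b"] = max(t_a, t_b) + DELAY["gate2"]
    t["ci_and"] = max(t_ci, t["a_or_b"]) + DELAY["gate2"]
    t["s"] = max(t_a, t_b, t_ci) + DELAY["gate3"]
    t["co"] = max(t["a_and_b"], t["ci_and"]) + DELAY["gate2"]
    return t


if __name__ == "__main__":
    print(full_adder_arrivals())        # all inputs arrive at time 0
    print(full_adder_arrivals(t_ci=5))  # carry-in arrives late, at 5u
```

With every input at time 0 the sketch reports s at 2u and co at 3u. When the carry-in arrives later than the other inputs, both outputs follow it by 2u.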

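Tracing Delay

[Figure: 4-bit ripple-carry adder built from four full adders (FA) chained from CarryIn to Carryout, with inputs a<i>, b<i> and outputs s<i>, shown next to the gate-level full adder. All inputs arrive at time 0; the annotated arrival times are 1 and 2 at the internal nodes, 2 at s and 3 at co.]

$s = a \oplus b \oplus ci$

$co = a \cdot b + ci\,(a + b)$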

Tracing Delay

[Figure: the same 4-bit ripple-carry adder and gate-level full adder, now with a and b arriving at time 0 and ci arriving at time T, where T > 3u. Annotated arrival times: 1 at the nodes driven only by a and b, T+1 at the node gated by ci, and T+2 at both s and co.]

$s = a \oplus b \oplus ci$

$co = a \cdot b + ci\,(a + b)$

ci → s: +2u

ci → co: +2u

Tracing Delay

[Figure: the 4-bit ripple-carry adder with all a<i>, b<i> inputs and Carry In arriving at time 0. Annotated arrival times: s<0> to s<3> at 2, 5, 7, 9 and the carry out of each stage at 3, 5, 7, 9.]

ci → s: +2u

ci → co: +2u

Overall delay: $t_{ripple} = n \cdot t_{FA}$
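The arrival times along the carry chain can be reproduced with a short sketch. It assumes the per-stage delays read off the preceding traces (a, b → s: 2u; a, b → co: 3u; ci → s: +2u; ci → co: +2u); the function name `ripple_arrivals` and its parameters are illustrative only.

```python
def ripple_arrivals(n, t_ab_to_s=2, t_ab_to_co=3, t_ci_to_s=2, t_ci_to_co=2):
    """Arrival times (in u) of s<i> and of each stage's carry out in an n-bit RCA.

    Assumes every a<i>, b<i> and the external carry in arrive at time 0.
    Each output is ready when its slowest path has finished.
    """
    sums, carries = [], []
    t_ci = 0  # arrival time of the carry into the current stage
    for _ in range(n):
        t_s = max(t_ab_to_s, t_ci + t_ci_to_s)
        t_co = max(t_ab_to_co, t_ci + t_ci_to_co)
        sums.append(t_s)
        carries.append(t_co)
        t_ci = t_co  # this stage's carry out feeds the next stage
    return sums, carries


if __name__ == "__main__":
    print(ripple_arrivals(4))          # ([2, 5, 7, 9], [3, 5, 7, 9])
    print(ripple_arrivals(32)[1][-1])  # 65
```

After the first stage, every additional bit adds a fixed 2u to the carry arrival time, which is the linear growth that the overall-delay expression captures.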

Improving Adder Performance

■ The delay of an RCA increases linearly with its width

■ Works well for small adders (small n), but doesn't scale well to wider adders

• E.g. modern processors have 64-bit adders

• E.g. many encryption/decryption schemes need 128-bit or even 1024-bit adders

Carry-Lookahead Adder

■ Observation: every bit to be added (a and b) is known up front, but the carry in of every column except the first is unknown

■ Divide a wide n-bit adder into multiple narrower k-bit adders (e.g. k = 4)

■ "Predict" the carry out (Cout) of each k-bit block using generate and propagate signals

■ Some definitions:

• Column i produces a carry out by either generating a carry out or propagating a carry in to the carry out

Propagate and Generate

■ Observe what happens to cout:

• Generate: cout = 1 regardless of cin

$G = a \cdot b$

• Propagate: cout = cin

$P = a \oplus b$

• Kill: cout = 0 regardless of cin

$K = \bar{a} \cdot \bar{b}$

■ The P, G, K signals are determined solely by the two addends

■ The carry out of column $i$ ($c_i$) is related to that of column $i-1$ ($c_{i-1}$) by:

$c_i = a_i b_i + (a_i + b_i)\, c_{i-1}$

     a (a_i)   b (b_i)   cin (c_{i-1})   cout (c_i)   s (s_i)
K    0         0         0               0            0
K    0         0         1               0            1
P    0         1         0               0            1
P    0         1         1               1            0
P    1         0         0               0            1
P    1         0         1               1            0
G    1         1         0               1            0
G    1         1         1               1            1

$c_i = G_i + P_i \cdot c_{i-1}$
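The truth table can be checked exhaustively. The sketch below is an illustration only (the helper names `classify` and `check_column_identities` are made up for this example). It confirms that exactly one of G, P, K holds for every column, and that the generate/propagate form of the carry matches the full-adder carry equation.

```python
from itertools import product


def classify(a, b):
    """Label one column as 'G' (generate), 'P' (propagate) or 'K' (kill)."""
    if a & b:
        return "G"
    if a ^ b:
        return "P"
    return "K"


def check_column_identities():
    """Check the per-column carry identities over all 8 input combinations."""
    for a, b, cin in product((0, 1), repeat=3):
        g = a & b              # G = a AND b
        p = a ^ b              # P = a XOR b
        k = (1 - a) & (1 - b)  # K = (NOT a) AND (NOT b)
        cout_fa = (a & b) | ((a | b) & cin)  # full-adder carry equation
        cout_pg = g | (p & cin)              # generate/propagate form
        s = a ^ b ^ cin
        assert g + p + k == 1                # exactly one of G, P, K is set
        assert cout_fa == cout_pg
        assert a + b + cin == 2 * cout_pg + s
    return True


if __name__ == "__main__":
    print(check_column_identities())
    print([classify(a, b) for a, b in product((0, 1), repeat=2)])
```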

Lookahead in a k-bit Block

■ When considering k consecutive bits, observe that:

$c_i = G_i + P_i \cdot c_{i-1}$

$c_i = G_i + P_i \cdot (G_{i-1} + P_{i-1} \cdot c_{i-2})$

$c_i = G_i + P_i (G_{i-1} + P_{i-1} (G_{i-2} + P_{i-2} (G_{i-3} + P_{i-3} \cdot c_{i-4})))$

■ Define group generate and group propagate for a k-bit block:

• $G_{i:j} = G_i + P_i \cdot (G_{i-1} + P_{i-1} \cdot (\cdots (G_{j+1} + P_{j+1} \cdot G_j) \cdots ))$

• $P_{i:j} = P_i \cdot P_{i-1} \cdot P_{i-2} \cdots P_j$

where $j = i - k + 1$

■ Then the overall group carry becomes:

$c_i = G_{i:j} + P_{i:j} \cdot c_{j-1}$
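As a sanity check on the group definitions, the sketch below evaluates G_{i:j} and P_{i:j} from the inside of the nested expression outwards, and compares the resulting block carry out with ordinary integer addition. The function names are illustrative and not taken from the slides.

```python
def bits(x, n):
    """Little-endian list of the n low bits of x (index 0 is the LSB)."""
    return [(x >> i) & 1 for i in range(n)]


def group_pg(g, p, i, j):
    """Group generate G_{i:j} and group propagate P_{i:j} for columns j..i.

    Starts from the innermost term G_j and applies G = G_m + P_m * G
    for m = j+1 .. i, which unrolls the nested formula.
    """
    G, P = g[j], p[j]
    for m in range(j + 1, i + 1):
        G = g[m] | (p[m] & G)
        P = p[m] & P
    return G, P


def block_carry_out(a, b, c_in, k=4):
    """Carry out of a k-bit block: c_i = G_{i:j} + P_{i:j} * c_{j-1}."""
    a_bits, b_bits = bits(a, k), bits(b, k)
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    G, P = group_pg(g, p, k - 1, 0)
    return G | (P & c_in)


def check_all(k=4):
    """Compare against integer addition for every k-bit a, b and carry in."""
    for a in range(1 << k):
        for b in range(1 << k):
            for c_in in (0, 1):
                assert block_carry_out(a, b, c_in, k) == (a + b + c_in) >> k
    return True


if __name__ == "__main__":
    print(check_all())
```

Because G_{i:j} and P_{i:j} depend only on the addends, they can be computed before the carry into the block is known. That is what lets the block carry skip the bit-by-bit ripple.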

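32-bit CLA with 4-bit Blocks

[Figure, two parts. (a) Internals of a 4-bit CLA block: four 1-bit adders with inputs A0..A3, B0..B3, outputs S0..S3 and internal carries C0..C2, plus lookahead logic that forms P3:0 from P3, P2, P1, P0 and G3:0 from G3, P3, G2, P2, G1, P1, G0, and derives Cout from G3:0, P3:0 and Cin. (b) The 32-bit adder: 4-bit CLA blocks chained from Cin through C3, C7, ..., C23, C27 to Cout, covering A3:0/B3:0 → S3:0 up to A31:28/B31:28 → S31:28.]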

CLA Delay

■ For an n-bit CLA with k-bit blocks:

$t_{CLA} = k\, t_{FA} + \left(\frac{n}{k} - 1\right) t_{AND\_OR} + t_{pg} + t_{pg\_block}$

• $t_{pg}$: delay to generate $P_i$ and $G_i$

• $t_{pg\_block}$: delay to generate $P_{i:j}$ and $G_{i:j}$

• $t_{AND\_OR}$: delay of an AND_OR gate

• $t_{FA}$: delay of a full adder

Adder Delay Comparisons

■ Compare the delay of 32-bit ripple-carry and carry-lookahead (CLA) adders

■ Example:

• CLA has 4-bit blocks

• 2-input gate delay = 100 ps; full adder delay = 300 ps

$t_{ripple} = n \cdot t_{FA} = 32\,(300\ \text{ps}) = 9.6\ \text{ns}$

$t_{CLA} = t_{pg} + t_{pg\_block} + \left(\frac{n}{k} - 1\right) t_{AND\_OR} + k\, t_{FA} = 3.3\ \text{ns}$
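The two totals can be reproduced numerically. The slide gives only the gate and full-adder delays, so the component values used below are assumptions chosen to be consistent with the 3.3 ns result: one gate level for t_pg, six gate levels for t_pg_block, and two gate levels for t_AND_OR. The function names are illustrative.

```python
def ripple_delay_ps(n, t_fa):
    """t_ripple = n * t_FA."""
    return n * t_fa


def cla_delay_ps(n, k, t_fa, t_pg, t_pg_block, t_and_or):
    """t_CLA = t_pg + t_pg_block + (n/k - 1) * t_AND_OR + k * t_FA."""
    return t_pg + t_pg_block + (n // k - 1) * t_and_or + k * t_fa


T_GATE = 100  # ps, 2-input gate delay (from the example)
T_FA = 300    # ps, full adder delay (from the example)

# Assumed breakdown, not stated on the slide:
T_PG = 1 * T_GATE        # one gate level to form P_i and G_i
T_PG_BLOCK = 6 * T_GATE  # six gate levels to form the block P and G
T_AND_OR = 2 * T_GATE    # AND followed by OR

if __name__ == "__main__":
    print(ripple_delay_ps(32, T_FA))                                 # 9600 ps
    print(cla_delay_ps(32, 4, T_FA, T_PG, T_PG_BLOCK, T_AND_OR))     # 3300 ps
```

Under these assumptions the ripple-carry adder takes 9600 ps and the CLA takes 3300 ps, matching the 9.6 ns and 3.3 ns figures.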

Wider CLA

■ In the previous version of the CLA, there is still a carry chain that can be quite long for a wide adder

• A 128-bit adder would have 32 4-bit blocks

■ It is possible to build hierarchical carry-lookahead adders

■ Example:

• Treat each 4-bit block as a single unit

• Build carry-lookahead logic for four 4-bit blocks to form a 16-bit adder
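A rough functional model of the two-level idea follows. It treats each 4-bit block as a single unit with its own (G, P) pair and derives the carry into every block directly from the external carry in. The helper names (`bit_pg`, `combine`, `block_pg`, `block_carries`) are hypothetical, and the sketch models logic values only, not gate delays.

```python
def bit_pg(a, b, n):
    """Per-bit generate and propagate lists (index 0 is the LSB)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def combine(hi, lo):
    """(G, P) of a group formed by a more significant part hi over lo."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def block_pg(g, p, lo, k=4):
    """First level: (G, P) of the k-bit block whose lowest bit is index lo."""
    acc = (g[lo], p[lo])
    for i in range(lo + 1, lo + k):
        acc = combine((g[i], p[i]), acc)
    return acc


def block_carries(a, b, c_in, n=16, k=4):
    """Second level: carry into each block, followed by the final carry out.

    Every carry is written in terms of block-level (G, P) and the external
    carry in, so no carry waits for a bit-by-bit ripple.
    """
    g, p = bit_pg(a, b, n)
    blocks = [block_pg(g, p, lo, k) for lo in range(0, n, k)]
    carries = [c_in]
    prefix = None
    for blk in blocks:
        prefix = blk if prefix is None else combine(blk, prefix)
        G, P = prefix
        carries.append(G | (P & c_in))
    return carries


if __name__ == "__main__":
    print(block_carries(0xFFFF, 0x0001, 0))  # [0, 1, 1, 1, 1]
```

The same `combine` step serves both levels: it merges bits into blocks, and blocks into the 16-bit group.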

16-bit CLA with 2 levels of 4-bit blocks

[Figure: 16-bit carry-lookahead adder. Sixteen 1-bit adder cells, each with inputs a, b, c and outputs g, p, s, are arranged in four groups (a15 b15 .. a12 b12, a11 b11 .. a8 b8, a7 b7 .. a4 b4, a3 b3 .. a0 b0). Each group is served by a 4-bit Carry Lookahead Generator with inputs g0..g3, p0..p3, c0 and outputs c1..c3, and a second-level 4-bit Carry Lookahead Generator sits above the four groups.]

Adder in VHDL

library ieee;
use ieee.numeric_std.all;

-- Ports take the place of the internal signal declarations
entity adder is
  port (
    a : in  unsigned(31 downto 0);
    b : in  unsigned(31 downto 0);
    y : out unsigned(31 downto 0)
  );
end adder;

architecture rtl of adder is
begin
  -- 32-bit unsigned addition; the carry out of bit 31 is discarded
  y <= a + b;
end rtl;

Synthesis tools synthesize the actual adder.

numeric_std is needed for signals that need arithmetic operations with the unsigned type.

Subtraction in VHDL

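library ieee;
use ieee.numeric_std.all;

-- Entity renamed to subtractor so it does not clash with the adder
entity subtractor is
  port (
    a : in  unsigned(31 downto 0);
    b : in  unsigned(31 downto 0);
    y : out unsigned(31 downto 0)
  );
end subtractor;

architecture rtl of subtractor is
begin
  -- 32-bit unsigned subtraction; the result wraps modulo 2**32
  y <= a - b;
end rtl;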
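A small Python reference model can help when checking simulation results for these two designs. It assumes the usual numeric_std behaviour for unsigned operands of equal width, where the result keeps the operand width and wraps modulo 2^32. The function names are illustrative.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def add_u32(a, b):
    """Model of y <= a + b for 32-bit unsigned: the carry out is dropped."""
    return (a + b) & MASK


def sub_u32(a, b):
    """Model of y <= a - b for 32-bit unsigned: the result wraps around."""
    return (a - b) & MASK


def sub_via_adder(a, b):
    """Subtraction built from an adder: a + (NOT b) + 1, two's complement."""
    return (a + (~b & MASK) + 1) & MASK


if __name__ == "__main__":
    print(hex(add_u32(0xFFFFFFFF, 1)))  # 0x0
    print(hex(sub_u32(0, 1)))           # 0xffffffff
    print(sub_via_adder(5, 7) == sub_u32(5, 7))
```

The last function shows one way in which a subtractor can reuse adder hardware. Which structure a synthesis tool actually produces depends on the tool.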

Math Operations in VHDL

■ Many basic math operations are defined in numeric_std and are synthesizable:

• Operations: +, −, ×, ÷

• Comparisons: >, <, =

• etc.

■ These rely on synthesis tools to generate the actual hardware implementation

• Quality can vary (area, power, performance, etc.)

■ Good for basic integer operations involving signed and unsigned numbers

■ More manual design is needed if a specific architecture is required

• floating point, fixed point, etc.

More research opportunities
