circuit design for unsigned addition eecs 150 - components …cs150/fa04/lecture/lec18.… · ·...

EECS 150 - Components and Design Techniques for Digital Systems

Lec 18 – Arithmetic II (Multiplication)

David Culler
Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~culler
http://www-inst.eecs.berkeley.edu/~cs150

Review

• Circuit design for unsigned addition
– Full adder per bit slice
– Delay limited by carry propagation
» Ripple is algorithmically slow, but wires are short

• Carry select
– Simple, resource-intensive
– Excellent layout

• Carry look-ahead
– Excellent asymptotic behavior
– Great at the board level, but wire-length effects are significant on chip

• Digital number systems
– How to represent negative numbers
– Simple operations
– Clean algorithmic properties

• 2s complement is most widely used
– Circuit for unsigned arithmetic
– Subtract by complement and carry in
– Overflow when cin xor cout of the sign bit is 1

Computer Number Systems

• Positional notation
– $D_{n-1} D_{n-2} \dots D_0$ represents $D_{n-1}B^{n-1} + D_{n-2}B^{n-2} + \dots + D_0 B^0$, where $D_i \in \{0, \dots, B-1\}$

• 2s complement
– $D_{n-1} D_{n-2} \dots D_0$ represents $-D_{n-1}2^{n-1} + D_{n-2}2^{n-2} + \dots + D_0 2^0$

– MSB has negative weight

[Figure: 4-bit 2s complement number wheel: 0000 = +0, 0001 = +1, ..., 0111 = +7, 1000 = -8, 1001 = -7, ..., 1111 = -1; e.g. 1101 = -8 + 5 = -3]

2s Complement Overflow


5 + 3 = -8! -7 - 2 = +7!

[Figure: 4-bit 2s complement number wheels illustrating the two overflow cases above]

How can you tell an overflow occurred?

2s comp. Overflow Detection

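The review rule (overflow when the carry into the sign bit and the carry out of it differ) can be checked exhaustively for 4-bit words. The Python sketch below is illustrative only; the helper names `to_signed`, `add_with_carries`, and `overflow` are hypothetical. It compares the carry rule with a direct range check on the true sum.

```python
N = 4                      # word width, matching the 4-bit number wheels
MASK = (1 << N) - 1
LOW = MASK >> 1            # the N-1 bits below the sign bit

def to_signed(x: int) -> int:
    """Interpret an N-bit pattern as a 2s complement value."""
    return x - (1 << N) if x & (1 << (N - 1)) else x

def add_with_carries(a: int, b: int):
    """Add two N-bit patterns; return (sum bits, carry into sign bit, carry out of sign bit)."""
    c_in_msb = ((a & LOW) + (b & LOW)) >> (N - 1)
    total = a + b
    c_out_msb = total >> N
    return total & MASK, c_in_msb, c_out_msb

def overflow(a: int, b: int) -> bool:
    """Carry rule: overflow iff cin xor cout of the sign bit is 1."""
    _, cin, cout = add_with_carries(a, b)
    return bool(cin ^ cout)

# Exhaustive check: the carry rule flags exactly the sums outside [-8, +7].
for a in range(1 << N):
    for b in range(1 << N):
        true_sum = to_signed(a) + to_signed(b)
        out_of_range = not (-(1 << (N - 1)) <= true_sum <= (1 << (N - 1)) - 1)
        assert overflow(a, b) == out_of_range

print(overflow(0b0101, 0b0011))   # 5 + 3: True (wraps to -8)
print(overflow(0b1001, 0b1110))   # -7 + (-2): True (wraps to +7)
```

Checking only the two carries at the sign bit is what makes the rule cheap in hardware: one XOR gate, no comparison against the range.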

2s Complement Adder/Subtractor

[Figure: 4-bit ripple adder/subtractor: four full adders (inputs A, B, CI; outputs S, CO); for each bit a 2:1 mux (Sel) chooses Bi (input 0) or its complement (input 1) under the Add/Subtract control, which also supplies the carry-in of bit 0; outputs S3 S2 S1 S0 and Overflow]
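The adder/subtractor figure can be mimicked bit by bit. In this Python sketch (the names `full_adder` and `add_sub` are illustrative, not from the slides), the per-bit mux is modeled as an XOR with the Add/Subtract signal, which is equivalent to selecting Bi or its complement, and the same signal supplies the carry-in, following "subtract by complement and carry in".

```python
N = 4  # word width used on the slides

def full_adder(a: int, b: int, cin: int):
    """One FA bit slice: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def add_sub(a: int, b: int, subtract: int):
    """N-bit ripple adder/subtractor.

    subtract = 0: S = A + B
    subtract = 1: S = A + B' + 1 = A - B  (each Bi complemented, carry-in 1)
    Returns (S, overflow) with overflow = carry into MSB xor carry out of MSB.
    """
    carry = subtract                      # Add/Subtract also drives the bit-0 carry-in
    s = 0
    carry_into_msb = 0
    for i in range(N):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract    # XOR models the mux choosing Bi or Bi'
        if i == N - 1:
            carry_into_msb = carry
        si, carry = full_adder(ai, bi, carry)
        s |= si << i
    return s, bool(carry_into_msb ^ carry)

print(add_sub(0b0101, 0b0010, 1))   # 5 - 2  -> (3, False)
print(add_sub(0b1001, 0b0010, 1))   # -7 - 2 -> (7, True): the "+7!" case
```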

Adders on the Xilinx Virtex

• Dedicated carry logic provides fast arithmetic carry capability for high-speed arithmetic functions. The Virtex-E CLB supports two separate carry chains, one per slice. The height of the carry chains is two bits per CLB.

• The arithmetic logic includes an XOR gate and an AND gate that allow a 2-bit full adder to be implemented within a slice.

• Cin-to-Cout delay = 0.1 ns, versus 0.4 ns for F-to-X delay.

• How do we map a 2-bit adder to one slice?

Carry Look-ahead Adders

• In general, for n-bit addition the best we can achieve is delay ∝ log(n)

• How do we arrange this? (think trees)

• First, reformulate the basic adder stage:
– carry “kill”: $k_i = a_i' b_i'$
– carry “propagate”: $p_i = a_i \oplus b_i$
– carry “generate”: $g_i = a_i b_i$

$c_{i+1} = g_i + p_i c_i$
$s_i = p_i \oplus c_i$

a  b  ci | ci+1  s
0  0  0  |  0    0
0  0  1  |  0    1
0  1  0  |  0    1
0  1  1  |  1    0
1  0  0  |  0    1
1  0  1  |  1    0
1  1  0  |  1    0
1  1  1  |  1    1
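To confirm the reformulated stage, this minimal Python sketch rebuilds the truth table from p, g, and k and cross-checks each row against plain arithmetic ($a + b + c_i = 2c_{i+1} + s_i$). The helper name `pgk` is hypothetical.

```python
from itertools import product

def pgk(a: int, b: int):
    """Carry propagate, generate, and kill for one bit position."""
    p = a ^ b               # p_i = a_i xor b_i
    g = a & b               # g_i = a_i b_i
    k = (1 - a) & (1 - b)   # k_i = a_i' b_i'
    return p, g, k

rows = []
for a, b, c in product((0, 1), repeat=3):
    p, g, k = pgk(a, b)
    c_next = g | (p & c)    # c_{i+1} = g_i + p_i c_i
    s = p ^ c               # s_i = p_i xor c_i
    assert a + b + c == 2 * c_next + s   # agrees with ordinary addition
    assert p + g + k == 1                # exactly one of p, g, k holds
    rows.append((a, b, c, c_next, s))

for row in rows:
    print(*row)             # same rows, same order as the table above
```

The assertion `p + g + k == 1` shows why the three names are useful: every bit position either kills, propagates, or generates a carry, never more than one.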

Carry Look-ahead Adders – in blocks

• “Group” propagate and generate signals:

• P true if the group as a whole propagates a carry to cout

• G true if the group as a whole generates a carry

• Group P and G can be generated hierarchically.

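[Figure: bit positions (p_i, g_i) through (p_{i+k}, g_{i+k}) combined into one group with carry in cin and carry out cout]

$P = p_i p_{i+1} \cdots p_{i+k}$
$G = g_{i+k} + p_{i+k} g_{i+k-1} + \dots + (p_{i+1} p_{i+2} \cdots p_{i+k}) g_i$

$C_{out} = G + P\,C_{in}$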

Carry Look-ahead Adders

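9-bit example of hierarchically generated P and G signals:

[Figure: bits 0-2 (a0b0 a1b1 a2b2) form block a, bits 3-5 form block b, bits 6-8 form block c; each block produces its own P and G, and c0 enters at the bottom]

$c_3 = G_a + P_a c_0$
$c_6 = G_b + P_b c_3$
$P = P_a P_b P_c$
$G = G_c + P_c G_b + P_b P_c G_a$
$c_9 = G + P c_0$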
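A minimal Python sketch of the 9-bit example, assuming three 3-bit blocks a (bits 0-2), b (bits 3-5), and c (bits 6-8). The helpers `bit_pg`, `group_pg`, and `cla9_carries` are hypothetical; the final loop compares the look-ahead carries c3, c6, and c9 with carries obtained from ordinary integer addition.

```python
import random

def bit_pg(a: int, b: int, i: int):
    """Bit-level (p_i, g_i) for bit i: p = a xor b, g = a and b."""
    ai, bi = (a >> i) & 1, (b >> i) & 1
    return ai ^ bi, ai & bi

def group_pg(a: int, b: int, lo: int, width: int):
    """Group (P, G) for bits lo .. lo+width-1, folding from the low bit up.

    P = p_lo ... p_hi
    G = g_hi + p_hi g_(hi-1) + ... + (p_(lo+1) ... p_hi) g_lo
    """
    P, G = 1, 0
    for i in range(lo, lo + width):
        p, g = bit_pg(a, b, i)
        G = g | (p & G)   # this bit generates, or passes on the lower bits' G
        P = P & p
    return P, G

def cla9_carries(a: int, b: int, c0: int):
    """Carries c3, c6, c9 of the 9-bit example, built from block P and G."""
    Pa, Ga = group_pg(a, b, 0, 3)   # block a: bits 0-2
    Pb, Gb = group_pg(a, b, 3, 3)   # block b: bits 3-5
    Pc, Gc = group_pg(a, b, 6, 3)   # block c: bits 6-8
    c3 = Ga | (Pa & c0)
    c6 = Gb | (Pb & c3)
    P = Pa & Pb & Pc
    G = Gc | (Pc & Gb) | (Pb & Pc & Ga)
    c9 = G | (P & c0)
    return c3, c6, c9

# Compare with carries from ordinary integer addition on random 9-bit inputs.
random.seed(1)
for _ in range(2000):
    a, b, c0 = random.randrange(512), random.randrange(512), random.randrange(2)
    expected = (
        ((a & 0x7) + (b & 0x7) + c0) >> 3,     # carry out of bits 0-2
        ((a & 0x3F) + (b & 0x3F) + c0) >> 6,   # carry out of bits 0-5
        (a + b + c0) >> 9,                     # carry out of bits 0-8
    )
    assert cla9_carries(a, b, c0) == expected
```

Note that c9 is computed from the top-level P and G and c0 only; it does not wait for c3 or c6.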

Parallel Prefix (generalizing CLA)

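• Compute all the prefixes $F_i = x_i \text{ op } x_{i-1} \text{ op } \dots \text{ op } x_0$

• Assume op is associative (commutativity is not required; the carry operator below is not commutative)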

[Figure: parallel prefix tree over inputs 0-7: adjacent pairs combine (76, 54, 32, 10), then 74 and 30, then 70; partial results are then distributed downward so every prefix (70, 60, 50, 40, 30, 20, 10, 0) is produced]
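One concrete way to get every prefix in about log2(n) steps is the doubling scheme often called Hillis-Steele or Kogge-Stone; it is named here as a stand-in and may differ from the tree drawn on the slide. The Python sketch below (`parallel_prefix` is a hypothetical name) uses string concatenation as op because it is associative but not commutative, so the output shows that operand order is preserved.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def parallel_prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefixes: out[i] = xs[i] op xs[i-1] op ... op xs[0].

    Pass k combines each element with the one 2**k positions below it, so
    ceil(log2 n) passes suffice; all combinations within one pass are
    independent and could be evaluated in parallel. op(hi, lo) receives the
    higher-index operand first, so op need not be commutative.
    """
    out = list(xs)
    dist = 1
    while dist < len(out):
        # Elements below `dist` already hold their full prefix.
        out = [out[i] if i < dist else op(out[i], out[i - dist])
               for i in range(len(out))]
        dist *= 2
    return out

labels = [str(i) for i in range(8)]
print(parallel_prefix(labels, lambda hi, lo: hi + lo))
# ['0', '10', '210', '3210', '43210', '543210', '6543210', '76543210']
```

Each output string lists the inputs combined, so '76543210' corresponds to the figure's label 70 (inputs 7 down to 0). This doubling variant uses more operator nodes than a sparse tree but keeps every path at about log2(n) levels.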

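8-bit Carry Look-ahead Adder

[Figure: eight bit cells (ai, bi, si) produce p, g; a tree of P,G combiner nodes builds group signals and returns carries c1 ... c8 to the bit cells from c0]

Combiner node (block a below block b; inputs Pa,Ga and Pb,Gb; carry in cin, carry out cout):
$P = P_a P_b$, $G = G_b + G_a P_b$
$C_{out} = G + c_{in} P$

Bit cell (inputs ai, bi, ci; outputs si, p, g):
$p = a \oplus b$, $g = ab$
$s = p \oplus c_i$
$c_{i+1} = g + c_i p$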
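Combining the bit cell and the combiner node gives a complete 8-bit look-ahead adder. This Python sketch reuses the doubling prefix pattern as an assumption (the slide's tree may be arranged differently) and checks itself against integer addition; `cla8` is a hypothetical name.

```python
def cla8(a: int, b: int, c0: int = 0):
    """8-bit carry look-ahead adder; returns (8-bit sum, carry out c8)."""
    n = 8
    bits = [((a >> i) & 1, (b >> i) & 1) for i in range(n)]
    # Bit cells: p = a xor b, g = ab
    pg = [(ai ^ bi, ai & bi) for ai, bi in bits]

    def combine(hi, lo):
        """Combiner node with lo = (Pa, Ga) below hi = (Pb, Gb):
        P = Pa Pb, G = Gb + Ga Pb (order matters: not commutative)."""
        Pb, Gb = hi
        Pa, Ga = lo
        return Pa & Pb, Gb | (Ga & Pb)

    # Prefix pass: afterwards pre[i] is the group (P, G) of bits i..0.
    pre = list(pg)
    dist = 1
    while dist < n:
        pre = [pre[i] if i < dist else combine(pre[i], pre[i - dist])
               for i in range(n)]
        dist *= 2

    # Carries: c_(i+1) = G + c0 P for group i..0 (Cout = G + cin P)
    carries = [c0] + [G | (c0 & P) for P, G in pre]
    # Sums: s_i = p_i xor c_i
    s = 0
    for i in range(n):
        s |= (pg[i][0] ^ carries[i]) << i
    return s, carries[n]

print(cla8(200, 100))     # (44, 1): 300 = 256 + 44
```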

Time / Space (resource) Trade-offs

• Carry select and CLA utilize more silicon to reduce time.

• Can we use more time to reduce silicon?

• How few FAs does it take to do addition?

Bit-serial Adder

• Addition of 2 n-bit numbers:
– takes n clock cycles,
– uses 1 FF, 1 FA cell, plus registers;
– the bit streams may come from or go to other circuits, therefore the registers may be optional.

• Requires controller
– What does the FSM look like? How is it implemented?

• Final carry out?

• A, B, and R held in shift registers. Shift right once per clock cycle.

• Reset is asserted by controller.

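[Figure: A and B in n-bit shift registers feed their LSBs into a single FA; the carry is held in a flip-flop (FF) cleared by reset; the sum bit shifts into result shift register R]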
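A cycle-by-cycle Python sketch of the bit-serial adder, under the assumptions that A and B are already loaded and that reset clears the carry flip-flop before the first cycle. `bit_serial_add` and `carry_ff` are illustrative names; the returned carry is what remains in the flip-flop after n cycles, one answer to "Final carry out?".

```python
def bit_serial_add(a: int, b: int, n: int):
    """Simulate n clock cycles of the bit-serial adder; returns (R, final carry)."""
    carry_ff = 0                      # carry flip-flop, cleared by reset
    r = 0                             # result shift register R
    for _ in range(n):                # one iteration per clock cycle
        ai, bi = a & 1, b & 1         # LSBs of A and B enter the single FA
        s = ai ^ bi ^ carry_ff
        carry_ff = (ai & bi) | (ai & carry_ff) | (bi & carry_ff)  # latched at the clock edge
        a >>= 1                       # A and B shift right once per cycle
        b >>= 1
        r = (r >> 1) | (s << (n - 1)) # R shifts right; new sum bit enters at the MSB
    return r, carry_ff

print(bit_serial_add(0b1011, 0b0110, 4))   # 11 + 6 = 17 -> (1, 1)
```

After n cycles the first sum bit has reached the LSB of R, so R holds the n-bit sum and the flip-flop holds bit n.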

Announcements

• Reading: 5.8
• Regrades in with homework on Friday
• Digital Design in the news – from UCB
– Organic e-textiles (Prof. Vivek Subramanian)

Basic concept of multiplication


• Product of 2 n-bit numbers is a 2n-bit number
– sum of n n-bit partial products

• Unsigned
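The partial-product view fits in a few lines of Python. This sketch (hypothetical helper `partial_products`) uses 1101 × 1011, read as unsigned 13 × 11, purely as an illustration, and checks that the largest n-bit product still fits in 2n bits.

```python
def partial_products(a: int, b: int, n: int):
    """Unsigned partial products: a AND multiplier bit b_j, shifted left by j."""
    return [(a if (b >> j) & 1 else 0) << j for j in range(n)]

n = 4
a, b = 0b1101, 0b1011                         # 13 x 11, illustrative operands
pps = partial_products(a, b, n)
product = sum(pps)
print([format(pp, "08b") for pp in pps])      # ['00001101', '00011010', '00000000', '01101000']
print(format(product, "08b"), product)        # 10001111 143

# The largest possible product, (2^n - 1)^2, still fits in 2n bits.
assert ((1 << n) - 1) ** 2 < 1 << (2 * n)
```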

Combinational Multiplier: accumulation of partial products


Array Multiplier

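[Figure: 4×4 array multiplier with inputs a0-a3 and b0-b3 and product bits P0-P7; each cell ANDs ai with bj and feeds the result, the incoming sum (sum in), and a carry-in into a full adder (FA), producing sum out and carry out]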

Each row: n-bit adder with AND gates

What is the critical path?

Generates all n partial products simultaneously.
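A behavioral Python sketch of the array multiplier, assuming each row is a ripple-carry n-bit adder whose inputs are the AND terms $a_i b_j$ and the sum from the row above. It models structure only, not delay; `array_multiply` is a hypothetical name.

```python
def full_adder(a: int, b: int, c: int):
    """Returns (sum, carry out) of one FA cell."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def array_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n x n array multiplier, one ripple-carry row per multiplier bit b_j.

    Every cell adds the AND term a_i b_j to its "sum in" bit from the row above.
    Row j's lowest sum bit leaves the array as product bit P_j; its other sum
    bits and its carry out become the next row's "sum in".
    """
    a_bits = [(a >> i) & 1 for i in range(n)]
    sum_in = [0] * n                 # top row adds to 0
    p = 0
    for j in range(n):
        bj = (b >> j) & 1
        carry = 0
        row = []
        for i in range(n):
            s, carry = full_adder(a_bits[i] & bj, sum_in[i], carry)
            row.append(s)
        p |= row[0] << j             # product bit P_j
        sum_in = row[1:] + [carry]   # shifted one place for the next row
    for i in range(n):               # last row's remaining bits: P_n .. P_(2n-1)
        p |= sum_in[i] << (n + i)
    return p

print(array_multiply(0b1101, 0b1011, 4))   # 143
```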

“Shift and Add” Multiplier

• Sums each partial product, one at a time.

• In binary, each partial product is a shifted version of A or 0.

Control Algorithm:
1. P ← 0, A ← multiplicand, B ← multiplier
2. If LSB of B == 1 then add A to P, else add 0
3. Shift [P][B] right 1
4. Repeat steps 2 and 3 n-1 times.
5. [P][B] has product.

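[Figure: shift-and-add datapath: B and P in n-bit shift registers, A in an n-bit register, with a 0/1 mux choosing 0 or A as the n-bit adder input that updates P]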

• Cost ∝ n, T = n clock cycles.
• What is the critical path for determining the min clock period?
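The control algorithm translates almost line for line into Python. This sketch (hypothetical `shift_add_multiply`) makes one detail explicit that the slide leaves implicit: the adder's carry-out is shifted into the MSB of P, so the full 2n-bit product ends up in [P][B].

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Unsigned shift-and-add multiply; returns the 2n-bit contents of [P][B]."""
    mask = (1 << n) - 1
    p, a, b = 0, multiplicand & mask, multiplier & mask   # step 1
    for _ in range(n):                                    # steps 2-3, n times in all
        total = p + (a if b & 1 else 0)                   # step 2: adder result, carry in bit n
        # step 3: shift {carry, P} and B right together; P's LSB enters B's MSB
        b = (b >> 1) | ((total & 1) << (n - 1))
        p = total >> 1
    return (p << n) | b                                   # step 5: [P][B] holds the product

print(shift_add_multiply(0b1101, 0b1011, 4))   # 143
```

Only one n-bit adder is reused every cycle, which is where the "Cost ∝ n, T = n clock cycles" trade-off comes from.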

Carry-save Addition

• Speeding up multiplication is a matter of speeding up the summing of the partial products.

• “Carry-save” addition can help.

• Carry-save addition passes (saves) the carries to the output, rather than propagating them.

• Example: sum three numbers, 3₁₀ = 0011, 2₁₀ = 0010, 3₁₀ = 0011

   3₁₀  0011
 + 2₁₀  0010
 ----------------  carry-save add
   c    0100 = 4₁₀
   s    0001 = 1₁₀
 + 3₁₀  0011
 ----------------  carry-save add
   c    0010 = 2₁₀
   s    0110 = 6₁₀
 ----------------  carry-propagate add
        1000 = 8₁₀

• In general, carry-save addition takes in 3 numbers and produces 2.
• Carry-propagate addition, by contrast, takes in 2 and produces 1.
• With this technique, we can avoid carry propagation until the final addition.
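A carry-save adder is just a row of full adders with no carry chain, so it can be sketched with word-wide bitwise operations. This Python version (hypothetical `csa`) reproduces the 3 + 2 + 3 example above, treating the first step as a CSA whose third input is 0.

```python
def csa(x: int, y: int, z: int):
    """Carry-save add: three numbers in, two out (sum vector, carry vector).

    Each bit position is an independent full adder, so nothing ripples:
    s is the bitwise sum without carries, c is the majority bits shifted left.
    """
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

s1, c1 = csa(0b0011, 0b0010, 0)   # first CSA: s = 0001, c = 0100
s2, c2 = csa(0b0011, c1, s1)      # second CSA: s = 0110, c = 0010
total = s2 + c2                   # single carry-propagate add at the end
print(format(s2, "04b"), format(c2, "04b"), format(total, "04b"))  # 0110 0010 1000
```

The invariant `x + y + z == s + c` holds for every CSA, which is why the carries can be deferred safely to the final adder.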

Carry-save Circuits

• When adding sets of numbers, carry-save can be used on all but the final sum.

• Standard adder (carry propagate) is used for final sum.

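[Figure: a CSA is a row of independent FAs (no carry chain between them) producing a sum vector s and a carry vector c; a chain of CSAs reduces inputs x0, x1, x2, ... to two vectors, which a CPA adds]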

Array Mult. using Carry-save Addition

[Figure: 4×4 array multiplier (inputs a0-a3, b0-b3; product bits P0-P7) in which each row passes its carries down to the next row instead of rippling them along the row; a fast carry-propagate adder forms the high-order product bits]

Another Representation (from book)

[Figure: 4×4 array of cells, one per product term Ai·Bj (A3 ... A0 across, B0 ... B3 down), each an AND gate feeding a full adder (Sum In, Cin in; Sum Out, Cout out); the cells produce P7 ... P0, with a CPA ("Add") completing the high-order bits]

Carry-save Addition

CSA is associative and commutative. For example:
$(((X_0 + X_1) + X_2) + X_3) = ((X_0 + X_1) + (X_2 + X_3))$

• A balanced tree can be used to reduce the logic delay.

• This structure is the basis of the Wallace Tree Multiplier.

• Partial products are summed with the CSA tree. Fast CPA (ex: CLA) is used for final sum.

• Multiplier delay ∝ $\log_{3/2} N + \log_2 N$

[Figure: balanced tree of CSAs reducing inputs x0-x7 in $\log_{3/2} N$ levels, followed by a CPA taking $\log_2 N$]
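The tree idea can be approximated by reducing whole operands three at a time per level. This Python sketch (hypothetical `csa_tree_sum`) is a simplification: a real Wallace tree works column by column on individual bits, and Python's `sum` stands in for the final fast CPA. It sums the 8 partial products of an 8-bit multiply and reports the number of CSA levels.

```python
def csa(x: int, y: int, z: int):
    """Carry-save add: (sum vector, carry vector) with x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def csa_tree_sum(operands):
    """Reduce operands with levels of CSAs until two remain, then do one CPA.

    Returns (sum, number of CSA levels). Each level turns every group of three
    operands into two, so the count shrinks by about a factor of 3/2 per level.
    """
    ops = list(operands)
    levels = 0
    while len(ops) > 2:
        nxt = []
        full = len(ops) - len(ops) % 3
        for k in range(0, full, 3):
            nxt.extend(csa(ops[k], ops[k + 1], ops[k + 2]))
        nxt.extend(ops[full:])       # 0, 1, or 2 leftovers pass through unchanged
        ops = nxt
        levels += 1
    return sum(ops), levels          # sum() stands in for the final fast CPA

a, b = 200, 157                      # illustrative 8-bit operands
pps = [(a if (b >> j) & 1 else 0) << j for j in range(8)]
print(csa_tree_sum(pps))             # (31400, 4): 8 -> 6 -> 4 -> 3 -> 2 operands
```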

Signed Multiplier

Signed Multiplication: Remember, for 2’s complement numbers the MSB has negative weight:

ex: $-6 = 11010_2 = 0 \cdot 2^0 + 1 \cdot 2^1 + 0 \cdot 2^2 + 1 \cdot 2^3 - 1 \cdot 2^4$
$= 0 + 2 + 0 + 8 - 16 = -6$

• Therefore for multiplication:
a) subtract final partial product
b) sign-extend partial products

• Modifications to shift & add circuit:
a) adder/subtractor
b) sign-extender on P shift register

$X = -x_{N-1} 2^{N-1} + \sum_{i=0}^{N-2} x_i 2^i$
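A Python sketch of the signed shift-and-add multiplier with the two modifications listed above, assuming n-bit 2s complement operands. To keep it short, P is held as a Python int with its true signed value, standing in for a sign-extended register; Python's `>>` on negative ints is an arithmetic shift, which performs the sign extension. `signed_shift_add` and `to_signed` are hypothetical names.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret an n-bit pattern as a 2s complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x

def signed_shift_add(multiplicand: int, multiplier: int, n: int) -> int:
    """2s complement shift-and-add multiply; returns the 2n-bit pattern [P][B].

    a) adder/subtractor: the last step (multiplier sign bit, weight -2^(n-1))
       subtracts A instead of adding it;
    b) sign extension: [P][B] is shifted right arithmetically.
    """
    mask = (1 << n) - 1
    a = to_signed(multiplicand & mask, n)   # sign-extended multiplicand
    b = multiplier & mask
    p = 0                                   # signed value of the P register
    for step in range(n):
        if b & 1:
            p = p - a if step == n - 1 else p + a
        b = (b >> 1) | ((p & 1) << (n - 1)) # P's LSB shifts into B's MSB
        p >>= 1                             # arithmetic shift keeps P's sign
    return ((p << n) | b) & ((1 << (2 * n)) - 1)

print(format(signed_shift_add(0b1101, 0b1011, 4), "08b"))  # (-3) x (-5): 00001111
```

This reproduces the worked example that follows: adding -3 and -6, then subtracting -24 at the multiplier's sign bit, gives +15.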

Signed multiplication

      1101  (-3)
  ×   1011  (-5)
  --------
  11111101  +(-3)
  11111010  +(-6)
  00000000  +0
- 11101000  -(-24)
  --------
  00001111  (+15)

Note: the leading bits of each partial product are sign extension; the final partial product (multiplier sign bit) is subtracted, i.e. its 2s complement is added.

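Signed Array Multiplier

[Figure: 4×4 array multiplier (inputs a0-a3, b0-b3; product bits P0-P7) with implicit sign extension of the partial products; the four cells marked "-" on the multiplier's sign-bit row subtract rather than add]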

“Shift and Add” Signed Multiplier

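[Figure: same datapath as the unsigned shift-and-add multiplier: B and P in n-bit shift registers, A in an n-bit register, with a 0/1 mux choosing 0 or A as the n-bit adder input]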

• Sign-extend partial product at each stage

• Final step is a subtract

Summary

• 2s complement number systems
– Algebraic and corresponding bit manipulations
– Overflow detection
– Significance of “sign bit”: $-2^{n-1}$

• Carry look-ahead is a form of parallel prefix

• Time / Space tradeoffs
– Bit-serial adder

• Binary multiplication algorithm
– Array multiplier
– Serial multiply (with bit-parallel adder)

• Signed multiplication
– Sign-extend multiplicand
– Sign bit of multiplier treated as subtract
