multipliers in vlsi

Sp09 CMPEN 411 L20 S.1

CMPEN 411VLSI Digital Circuits

Spring 2009

Lecture 20: Multiplier Design

[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]


Review: Basic Building Blocks Datapath

Execution units- Adder, multiplier, divider, shifter, etc.

Register file and pipeline registers Multiplexers, decoders

Control Finite state machines (PLA, ROM, random logic)

Interconnect Switches, arbiters, buses

Memory Caches (SRAMs), TLBs, DRAMs, buffers


The Binary Multiplication

x

Partial products

Multiplicand

Multiplier

Result

1 0 1 0 1 0

1 0 1 0 1 0

1 0 1 0 1 0

1 1 1 0 0 1 1 1 0

0 0 0 0 0 0

1 0 1 0 1 0

1 0 1 1


Multiply Operation Multiplication is just a a lot of additions

multiplicandmultiplier

partialproductarray

double precision product

N

2N

N can be formed in parallel


Multiplication Approaches Right shift and add

Partial product array rows are accumulated from top to bottom on an N-bit adder

- After each addition, right shift (by one bit) the accumulated partial product to align it with the next row to add

Time for N bits Tserial_mult = O(N Tadder) = O(N2) for a RCA

Making it faster Use a faster adder Use higher radix (e.g., base 4) multiplication – O(N/2 Tadder)

- Use multiplier recoding to simplify multiple formation (booth) Form the partial product array in parallel and add it in parallel

Making it smaller (i.e., slower) Use serial-parallel mult Use an array multiplier

- Very regular structure with only short wires to nearest neighbor cells. Thus, very simple and efficient layout in VLSI Can be easily and efficiently pipelined


Serial-parallel multiplier structure


The Array Multiplier

Y0

Y1

X3 X2 X1 X0

X3

HA

X2

FA

X1

FA

X0

HA

Y2X3

FA

X2

FA

X1

FA

X0

HA

Z1

Z3Z6Z7 Z5 Z4

Y3X3

FA

X2

FA

X1

FA

X0

HA

Sp09 CMPEN 411 L20 S.11

Booth multiplier Encoding scheme to reduce number of stages in

multiplication.

Performs two bits of multiplication at once—requires half the stages.

Each stage is slightly more complex than simple multiplier, but adder/subtracter is almost as small/fast as adder.

Sp09 CMPEN 411 L20 S.12

Booth encoding Two’s-complement form of multiplier:

y = -2nyn + 2n-1yn-1 + 2n-2yn-2 + ... (first bit is the sign bit)(example, y=18=010010 y= -18 = 101110 )

Rewrite using 2a = 2a+1 - 2a: y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...

Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.

Sp09 CMPEN 411 L20 S.13

Booth actions

yi yi-1 yi-2 increment

0 0 0 0

0 0 1 x

0 1 0 x

0 1 1 2x

1 0 0 -2x

1 0 1 -x

1 1 0 -x

1 1 1 0

y = 2n(yn-1-yn) + 2n-1(yn-2 -yn-1) + 2n-2(yn-3 -yn-2) + ...

Consider first two terms: by looking at three bits of y, we can determine whether to add x, 2x to partial product.

Sp09 CMPEN 411 L20 S.14

Booth example x = 1001 (910), y = 0111 (710).

P0 = 00000000

y3y2y1=011 y1y0y-1=11(0)

y1y0y-1 = 110, P1 = P0 - (1001) = 11110111 x shift left for 2 bits to be 100100

y3y2y1 = 011, P2 = P1 (10*100100) =

11110111+01001000 = 001111111 (6310) An array multiplier needs N addtions, booth multiplier

needs only N/2 additions

Sp09 CMPEN 411 L20 S.15

Review: A 64-bit Adder/Subtractor

1-bit FA S0

C0=Cin

C1

1-bit FA S1

C2

1-bit FA S2

C3

C64=Cout

1-bit FA S63

C63

. . .

Ripple Carry Adder (RCA) built out of 64 FAs

Subtraction – complement all subtrahend bits (xor gates) and set the low order carry-in

RCA advantage: simple logic,

so small (low cost)

disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)

A0

B0

A1

B1

A2

B2

A63

B63

add/subt

Sp09 CMPEN 411 L20 S.16

Booth structure

Sp09 CMPEN 411 L20 S.17

Wallace-Tree Multiplier

6 5 4 3 2 1 0 6 5 4 3 2 1 0

Partial products First stage

Bit position

6 5 4 3 2 1 0 6 5 4 3 2 1 0Second stage Final adder

FA HA

(a) (b)

(c) (d)

Sp09 CMPEN 411 L20 S.18

Wallace-Tree Multiplier

Partial products

First stage

Second stage

Final adder

FA FA FA

HA HA

FA

x3y3

z7 z6 z5 z4 z3 z2 z1 z0

x3y2x2y3

x1y1x3y0 x2y0 x0y1x0y2

x2y2x1y3

x1y2x3y1x0y3 x1y0 x0y

Full adder = (3,2) compressor

Sp09 CMPEN 411 L20 S.19

Making it Faster: Tree Multiplier Structure

partial productarray reduction tree

fast carry propagate adder (CPA)

P (product)

mux + reductiontree (log N)+CPA (log N)

Q (‘ier)

D (‘icand)

DD

D

0

00

0

multiple forming circuits

inte

rcon

nect

Sp09 CMPEN 411 L20 S.20

(4,2) Counter Built out of two (3,2) counters (just FA’s!)

all of the inputs (4 external plus one internal) have the same weight (i.e., are in the same bit position)

the internal carry output is fed to the next higher weight position (indicated by the )

(3,2)

(3,2) Note: Two carry outs - one “internal” and one “external”

Sp09 CMPEN 411 L20 S.22

Tiling (4,2) Counters

Reduces columns four high to columns only two high Tiles with neighboring (4,2) counters Internal carry in at same “level” (i.e., bit position weight) as the

internal carry out

(3,2)

(3,2)

(3,2)

(3,2)

(3,2)

(3,2)

Sp09 CMPEN 411 L20 S.24

4x4 Partial Product Array Reduction

multiplicandmultiplier

partialproductarray

reduced pp array (to CPA)

double precision product

Fast 4x4 multiplication using (4,2) counters How would you

lay it out?

five (4,2) counters

5-bit CPA

multiplicand

multiplier

8-bit product

Sp09 CMPEN 411 L20 S.25

8x8 Partial Product Array Reduction ‘icand

‘ier

partialproductarray

reduced partial product array

Wallace tree multiplier

two rows of nine (4,2) counters

one row of thirteen (4,2) counters

to a 13-bit fast CPA

Sp09 CMPEN 411 L20 S.26

An 8x8 Multiplier Layout How should it be laid out?

multiplicandm

ultiplier

thirteen (4,2) counters

nine (4,2) counters

nine (4,2) counters

13-bit CPA

Sp09 CMPEN 411 L20 S.32

Multipliers —Summary

• Optimization Goals Different Vs Binary Adder

• Once Again: Identify Critical Path

• Other possible techniques

- Data encoding (Booth)- Pipelining

FIRST GLIMPSE AT SYSTEM LEVEL OPTIMIZATION

- Logarithmic versus Linear (Wallace Tree Mult)

Sp09 CMPEN 411 L20 S.33

Next Lecture and Reminders Next lecture

Shifters, decoders, and multiplexers- Reading assignment – Rabaey, et al, 11.5-11.6

multipliers in vlsi

Technology