decoder design

7/27/2019 Decoder design


Practice 6: Logical Effort of a Decoder

Design Example

For our design we will consider a 64 Kbit (256x256) memory block.

To access a certain row, we need to turn on 1 out of 256 Word Lines. This is achieved using an 8-to-256 Row Decoder.

Our goal is to design the fastest, lowest power decoder possible, using Static CMOS Logic.

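A Decoder in General

A decoder can be constructed with AND gates or with NOR gates.

An AND Decoder:

$$WL_0 = \bar{A}_7 \bar{A}_6 \bar{A}_5 \bar{A}_4 \bar{A}_3 \bar{A}_2 \bar{A}_1 \bar{A}_0; \quad WL_{255} = A_7 A_6 A_5 A_4 A_3 A_2 A_1 A_0$$

A NOR Decoder:

$$WL_0 = \overline{A_7 + A_6 + A_5 + A_4 + A_3 + A_2 + A_1 + A_0}; \quad WL_{255} = \overline{\bar{A}_7 + \bar{A}_6 + \bar{A}_5 + \bar{A}_4 + \bar{A}_3 + \bar{A}_2 + \bar{A}_1 + \bar{A}_0}$$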
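The two forms can be checked against each other with a small behavioral model. The following is only a sketch: the function names are made up for illustration, and it models logic values, not transistors or timing. It assumes the NOR form is the De Morgan equivalent of the AND form, with every literal complemented.

```python
def address_bits(address, n_bits=8):
    """Return [A0, A1, ..., A(n-1)] as 0/1 values."""
    return [(address >> i) & 1 for i in range(n_bits)]


def and_decoder(address, n_bits=8):
    """WL_k = AND of the literals that are all 1 exactly when the address is k."""
    bits = address_bits(address, n_bits)
    word_lines = []
    for k in range(1 << n_bits):
        # Use A_i where bit i of k is 1, and the complement of A_i where it is 0.
        literals = [b if (k >> i) & 1 else 1 - b for i, b in enumerate(bits)]
        word_lines.append(int(all(literals)))
    return word_lines


def nor_decoder(address, n_bits=8):
    """WL_k = NOR of the complemented literals (De Morgan form of the AND decoder)."""
    bits = address_bits(address, n_bits)
    word_lines = []
    for k in range(1 << n_bits):
        complemented = [1 - b if (k >> i) & 1 else b for i, b in enumerate(bits)]
        word_lines.append(int(not any(complemented)))
    return word_lines


if __name__ == "__main__":
    for address in range(256):
        wl = and_decoder(address)
        assert wl == nor_decoder(address)
        assert sum(wl) == 1 and wl[address] == 1
    print("AND and NOR decoders agree for all 256 addresses")
```

For every address exactly one Word Line is asserted, and both forms select the same one.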

We basically need 256 AND gates with 8 inputs each. They each drive one Word Line with 256 bitcells connected to it.

We know that we shouldn't use a gate with a Fan In of 8, but should we use only 2-input NANDs?



Problem Setup

What is the Load Capacitance? Each Word Line has 256 Cells connected to it:

$$C_{WL} = 256 \cdot C_{Cell} + C_{Wire}$$

Let's ignore the wire for now.

What is the input capacitance? Let's assume that our input capacitance is slightly larger than a bitcell:

$$C_{address} = 4 \cdot C_{Cell}$$

What about the branching effort? We can look at the simple, 8-input NAND case without losing generality (because this will be the product of internal branching efforts). Each address input goes into half of the NANDs (the other half gets the complement).

So the branching effort is:

$$b_{input} = \prod b_i = \frac{C_{on\,path} + C_{off\,path}}{C_{on\,path}} = \frac{256 \cdot C_{Cell} + 127 \cdot 256 \cdot C_{Cell}}{256 \cdot C_{Cell}} = 128$$

So the total Fanout on each Address Wire is:

$$F \cdot \prod b_i = \frac{C_{WL}}{C_{address}} \cdot \prod b_i = \frac{256 \cdot C_{Cell}}{4 \cdot C_{Cell}} \cdot 128 = 8k = 2^{13}$$
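The arithmetic behind the branching effort and the total fanout can be reproduced in a few lines. This is a sketch that normalizes every capacitance to one bitcell and ignores the wire, as the text does; the variable names are illustrative only.

```python
from fractions import Fraction

C_CELL = Fraction(1)          # normalize every capacitance to one bitcell
CELLS_PER_WORD_LINE = 256
N_WORD_LINES = 256

c_word_line = CELLS_PER_WORD_LINE * C_CELL   # wire capacitance ignored
c_address = 4 * C_CELL                       # assumed input capacitance per address line

# Each address literal feeds half of the 256 gates: 1 on the path, 127 off it.
gates_per_literal = N_WORD_LINES // 2
c_on_path = c_word_line
c_off_path = (gates_per_literal - 1) * c_word_line
branching = (c_on_path + c_off_path) / c_on_path

electrical = c_word_line / c_address         # C_WL / C_address
total_fanout = electrical * branching

assert branching == 128
assert total_fanout == 2 ** 13
print(branching, electrical, total_fanout)
```

Using `Fraction` keeps the ratios exact, so the result can be compared directly against the power of two.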

How many stages are needed? For a best case assumption, we can assume all inverters ($LE = 1$):

$$PE = 2^{13}; \quad N_{opt} = \log_{3.6} 2^{13} \approx 7$$

In other words, we need a lot of stages. We should choose the least complex gates (2-input NAND). As we saw before, this results in 6 stages. For optimum, we'd still buffer (not necessary).



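Various Implementations

8 input NAND:

$$\prod LE = \frac{10}{3} \cdot 1 = \frac{10}{3}; \quad \sum p = 8 + 1 = 9$$

4 input NAND:

$$\prod LE = 2 \cdot \frac{5}{3} = \frac{10}{3}; \quad \sum p = 4 + 2 = 6$$

2 input NAND/NOR/NAND:

$$\prod LE = \frac{4}{3} \cdot \frac{5}{3} \cdot \frac{4}{3} \cdot 1 = \frac{80}{27}; \quad \sum p = 2 + 2 + 2 + 1 = 7$$

2 input NAND:

$$\prod LE = \left(\frac{4}{3}\right)^3 = 2.37; \quad \sum p = 2 \cdot 3 + 1 \cdot 3 = 9$$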

We see that with the 2-input NAND, we have a lower Logical Effort and a similar intrinsic delay, but we are closer to the (minimum) optimal number of stages we found before.

Let's calculate the new optimal number of stages and see how close we are:

$$PE = F \cdot \prod b_i \cdot \prod LE_i = 2^{13} \cdot 2.37 = 19.418k$$

$$N_{opt} = \log_{3.6} PE = 7.7$$

So we are pretty close to the optimum number of stages, though we could add another inverter or two for even better performance.
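The comparison of the four implementations can be tabulated with a short script. This is a sketch under stated assumptions: the gate sequences are inferred from the logical effort and parasitic values quoted for each option (for example, reading the 4-input case as a NAND4 followed by a NOR2), the per-gate values are the standard textbook ones, and the target stage effort of 3.6 is the one used in the text.

```python
import math
from fractions import Fraction

# (logical effort, parasitic delay) per gate; standard textbook values, assumed here
GATES = {
    "INV": (Fraction(1), 1),
    "NAND2": (Fraction(4, 3), 2),
    "NOR2": (Fraction(5, 3), 2),
    "NAND4": (Fraction(6, 3), 4),
    "NAND8": (Fraction(10, 3), 8),
}

# Gate sequences inferred from the LE and p figures in the text (an assumption)
IMPLEMENTATIONS = {
    "8-input NAND": ["NAND8", "INV"],
    "4-input NAND": ["NAND4", "NOR2"],
    "2-input NAND/NOR/NAND": ["NAND2", "NOR2", "NAND2", "INV"],
    "2-input NAND": ["NAND2", "INV"] * 3,
}


def summarize(stages, fanout_times_branching=2 ** 13, stage_effort=3.6):
    """Return (product of LE, sum of p, path effort, optimal stage count)."""
    le = math.prod(GATES[g][0] for g in stages)
    p = sum(GATES[g][1] for g in stages)
    pe = fanout_times_branching * le
    n_opt = math.log(float(pe)) / math.log(stage_effort)
    return le, p, pe, n_opt


if __name__ == "__main__":
    for name, stages in IMPLEMENTATIONS.items():
        le, p, pe, n_opt = summarize(stages)
        print(f"{name:24s} stages={len(stages)} LE={le} p={p} "
              f"PE={float(pe):.0f} N_opt={n_opt:.1f}")
```

For the all-NAND2 chain this gives a path effort of about 19.4k and an optimum of about 7.7 stages against the 6 stages actually present, which is the gap the text proposes to close with extra inverters.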



Predecoding

The Problem

So far our solution takes 256 blocks with 6-8 stages that each are driven by 8 address lines (or their complements).

This solution is non-physical. Why? Because we limited the input capacitance on each address line to $4 C_{Cell}$, but each address line is connected to 128 of these. So the gate capacitance of each transistor (neglecting the wire capacitance) in our NAND gate would have to be:

$$128 \cdot C_{gate} = 4 \cdot C_{Cell} \;\Rightarrow\; C_{gate} = \frac{C_{Cell}}{32}$$

Considering that $C_{Cell}$ is already probably close to a minimum transistor size, this is not possible.

The Solution

Looking at two different decoder outputs, we can see that several components are shared between them.

So we can break up the expressions into shared gates that we don't have to replicate. It will not change the branching effort of the input. It will just push it further down the line.

Creating a Predecoder

We can make a single gate for each of the shared terms. These gates are small decoders, i.e. only one of a group is asserted at a given time.

For example: Make 4 groups of 2 inputs (and their complements):

$$A_0, A_1; \quad A_2, A_3; \quad A_4, A_5; \quad A_6, A_7$$

We input each pair into 2-input AND gates, which are essentially small decoders. For example, for $A_0, A_1$, we get:

$$\bar{A}_0 \bar{A}_1, \quad \bar{A}_0 A_1, \quad A_0 \bar{A}_1, \quad A_0 A_1$$

Only one of these will be on. So after the predecoders, we have a total of 16 signals, only 4 of which will be on. Now we can start over making our decoder. Each of our 256 Word Lines is a combination of 4 of these predecoded signals: a 4-input AND gate.

Now each of the addresses only drives 2 gates, much more feasible than the 128 gates from before. Plus, we saved a large amount of replicated area.
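The predecoding scheme above can be modeled behaviorally to confirm the signal counts. The sketch below uses hypothetical function names and treats each predecoder as a one-hot decode of its group of address bits; it says nothing about sizing or delay.

```python
def predecode(address, n_bits=8, group_bits=2):
    """Return one one-hot list per group of address bits (group 0 = lowest bits)."""
    mask = (1 << group_bits) - 1
    groups = []
    for g in range(n_bits // group_bits):
        value = (address >> (g * group_bits)) & mask
        groups.append([int(value == k) for k in range(1 << group_bits)])
    return groups


def word_lines(address, n_bits=8, group_bits=2):
    """Final decode: each Word Line ANDs one predecoded signal from every group."""
    mask = (1 << group_bits) - 1
    groups = predecode(address, n_bits, group_bits)
    lines = []
    for row in range(1 << n_bits):
        selected = [
            groups[g][(row >> (g * group_bits)) & mask]
            for g in range(len(groups))
        ]
        lines.append(int(all(selected)))
    return lines


if __name__ == "__main__":
    groups = predecode(0b10110010)
    total_signals = sum(len(g) for g in groups)
    active_signals = sum(sum(g) for g in groups)
    print(total_signals, active_signals)
    assert word_lines(0b10110010).index(1) == 0b10110010
```

With 2-bit groups there are 16 predecoded signals of which 4 are on, and the 4-input AND stage still asserts exactly the addressed Word Line.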



Choosing the right Predecoder

There are several options for the size of the predecoder.

Which one should we choose?

Let's take different considerations into account:

Power: How many long, high fanout wires are switching in the worst case? In option 1, 4 wires need to switch, whereas in option 2, only 2 wires switch. Option 2 is better for power.

Pitch: To lay out the memory, we need to fit the pitch of the decoder and the bitcells. This doesn't leave us much room/height. In option 2 we only need to fit in a 2-input cell, whereas in option 1, we need to get all 4 inputs in there. Option 2 is better for pitch.

Input Capacitance: In option 1, each input drives 2 gates, while in option 2, each input drives 8 gates. If we have a limitation on input capacitance, option 1 is better.

Area/Performance/Power: In option 2, we have minimized the replication of signals. The postdecoder compares two independent lines, so we wasted no area on duplicate signals. This improves the Area/Performance/Power tradeoff, as we get the same Path Effort and can size for the same performance using less area and capacitance.
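The numbers quoted for the two options follow from the predecoder group size. The sketch below assumes, based only on those numbers (the original figure is not reproduced here), that option 1 uses four 2-bit predecoders and option 2 uses two 4-bit predecoders. The per-gate capacitance budget additionally assumes the $4 C_{Cell}$ address-line budget is split equally among the gates an address literal drives, with wire ignored.

```python
from fractions import Fraction


def predecoder_option(n_bits=8, group_bits=2, address_budget_cells=4):
    """Summarize one predecoder choice; capacitances are in units of C_Cell."""
    groups = n_bits // group_bits
    outputs_per_group = 2 ** group_bits
    gates_per_literal = outputs_per_group // 2   # half the gates see A_i, half its complement
    return {
        "predecoded_wires": groups * outputs_per_group,
        "wires_asserted": groups,                # one-hot: one per group
        "postdecoder_fan_in": groups,
        "gates_per_address_literal": gates_per_literal,
        "cap_budget_per_gate": Fraction(address_budget_cells, gates_per_literal),
    }


if __name__ == "__main__":
    for label, bits in (("option 1", 2), ("option 2", 4), ("no predecoder", 8)):
        print(label, predecoder_option(group_bits=bits))
```

Setting `group_bits=8` reproduces the flat decoder: 128 gates per address literal and a budget of 1/32 of a cell per gate. Option 1 leaves 2 cells per gate, while option 2 leaves half a cell, which is why the input-capacitance criterion favors option 1.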

What about other Logic Families

Static CMOS is far from the best technology to use to create an optimal decoder. The question is what are your needs. Do you want a fast decoder? Do you want a low-power decoder? Or is a robust decoder all you need?
