
Post on 13-Dec-2015


Authors: Steven M. Nowick, Kenneth Y. Yun, Peter A. Beerel and Ayoob E. Dooply

Reader: Pushpinder Kaur Chouhan

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concepts
• Architecture of Speculative Completion
• Speculative Adder Design
• Basic Dynamic Brent-Kung Adders
• Conclusion
• References


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
– Goal of the article
– Motivation
• Basic Concept
• Counters Classification
• Architecture of Speculative Completion
• Speculative Adder Design
• Conclusion
• References


Introduction

Goal of the article –

To design high-performance asynchronous datapath components that are faster than synchronous designs yet have low area overhead.


Motivation

Potential advantages of asynchronous design:

– Low power consumption: components use power only “on demand”
– High performance: systems are not limited to a “worst-case” clock rate
– Robustness and scalability: no global timing
– Ease of design: global clock distribution and synchronization can be avoided

Speculative completion is used to design asynchronous datapath components that deliver early results.


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concept
– Bundled datapath
– Completion detection
– Adders Basic
– Binary lookahead carry adder design
• Architecture of Speculative Completion
• Speculative Adder Design
• Basic Dynamic Brent-Kung Adders
• Conclusion


Basic Concepts

Bundled datapath –

[Figure: function block (C/L) with a worst-case matched delay; signals req and ack]

Advantages –
– Easy implementation
– Low power
– Limited area

Completion detection – Implementation in dual-rail, where each bit is mapped to a pair of wires, which encode both the value and the validity of the data.


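Basic Concepts

Adders basic

1-bit full adder:

$S_i = (A_i \oplus B_i) \oplus C_i$

$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$

In terms of the generate ($g$), propagate ($p$) and absorb ($a$) signals:

$g_i = A_i B_i$

$p_i = A_i \oplus B_i$

$a_i = \overline{\overline{A_i}\,\overline{B_i}} = A_i + B_i$

$S_i = p_i \oplus C_i$

$C_{i+1} = g_i + p_i C_i$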
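The full-adder equations above can be checked exhaustively with a few lines of Python. This is only a sketch: the function names `full_adder`, `carry_via_absorb` and `check_all_inputs` are illustrative and do not come from the slides.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) for one bit position, written with g and p."""
    g = a & b            # generate: both operand bits are 1
    p = a ^ b            # propagate: exactly one operand bit is 1
    s = p ^ c            # S_i = p_i xor C_i
    c_out = g | (p & c)  # C_{i+1} = g_i + p_i C_i
    return s, c_out


def carry_via_absorb(a: int, b: int, c: int) -> int:
    """Same carry, written with the absorb signal a_i = A_i + B_i."""
    return (a & b) | ((a | b) & c)


def check_all_inputs() -> bool:
    """Compare both formulations with ordinary integer addition."""
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        if 2 * c_out + s != a + b + c:
            return False
        if carry_via_absorb(a, b, c) != c_out:
            return False
    return True
```

Calling `check_all_inputs()` walks all eight input combinations, so it also confirms that replacing the propagate signal by the absorb signal in the carry equation gives the same carry.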


Binary Lookahead Carry Adder


Binary Lookahead Carry Adder

Level-1 computes all 2-bit P and G values, where

$P_i^1 = p_i p_{i-1}$ and $G_i^1 = g_i + p_i g_{i-1}$

Level-2 computes all 4-bit P and G values, where

$P_i^2 = P_i^1 P_{i-2}^1$ and $G_i^2 = G_i^1 + P_i^1 G_{i-2}^1$

and so on.

Level-6 computes the ith sum bit $S_i$, where

$S_i = p_i^0 \oplus G_{i-1}^5$

(here $p_i^0$ is the Level-0 propagate $p_i$).

The adder computes cumulative P and G values.
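A minimal Python sketch of the level-by-level P/G computation described on this slide, assuming a 32-bit adder with carry-in 0. The function name `blc_add` and the list-based bit representation are illustrative choices, not taken from the slides.

```python
def blc_add(a: int, b: int, width: int = 32) -> int:
    """Add two unsigned integers modulo 2**width with a lookahead P/G tree."""
    # Level 0: per-bit propagate and generate signals.
    p0 = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = p0[:]

    # Levels 1, 2, ...: each level doubles the span covered by P_i and G_i.
    span = 1
    while span < width:
        new_g, new_p = g[:], p[:]
        for i in range(span, width):
            new_g[i] = g[i] | (p[i] & g[i - span])  # G_i = G_i + P_i G_{i-span}
            new_p[i] = p[i] & p[i - span]           # P_i = P_i P_{i-span}
        g, p = new_g, new_p
        span *= 2

    # Final level: S_i = p_i xor G_{i-1}; the carry into bit 0 is taken as 0.
    total = 0
    for i in range(width):
        carry_in = g[i - 1] if i > 0 else 0
        total |= (p0[i] ^ carry_in) << i
    return total
```

For `width = 32` the loop runs with spans 1, 2, 4, 8 and 16, that is five P/G levels, and the sum is formed in a sixth step, which matches the level count on the slide.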


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concept
• Architecture of Speculative Completion
– Multiple model delays
– Abort detection networks
– Modified result logic
• Speculative Adder Design
• Basic Dynamic Brent-Kung Adders
• Conclusion


Architecture of Speculative Completion

[Block diagram: function block (C/L); req feeds a worst-case matched delay, a medium matched delay and a short matched delay; two abort logic blocks produce Abort 1 and Abort 2, which select the delay that produces done]


Architecture of Speculative Completion


Multiple model delays: one for the worst case and the remaining ones for speculative completion. These speculative delays allow different speeds of early completion.

For example, in a ripple-carry adder, an “average-case” delay might be used if the adder inputs result in short carry chains; a “best-case” delay might be used if there is no carry chain.
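The ripple-carry example can be sketched in Python as follows. This is an illustration under stated assumptions: the longest run of propagate bits is used as a conservative bound on the carry-chain length, and the thresholds `short_limit` and `medium_limit` are hypothetical values chosen for the example, not figures from the slides.

```python
def longest_propagate_run(a: int, b: int, width: int = 32) -> int:
    """Length of the longest run of consecutive propagate bits (A_i xor B_i)."""
    p = (a ^ b) & ((1 << width) - 1)
    best = run = 0
    for i in range(width):
        if (p >> i) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def pick_delay(a: int, b: int, width: int = 32,
               short_limit: int = 0, medium_limit: int = 4) -> str:
    """Choose a model delay class; the limits are hypothetical thresholds."""
    run = longest_propagate_run(a, b, width)
    if run <= short_limit:
        return "short"       # no bit can pass a carry along
    if run <= medium_limit:
        return "medium"      # only short carry chains are possible
    return "worst-case"      # a long carry chain cannot be ruled out
```

A carry can only travel through consecutive propagate positions, so a short longest run guarantees short carry chains, while a long run merely means that a long chain is possible.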


Architecture of Speculative Completion


Abort detection network: one is associated with each speculative delay. The network determines whether the corresponding speculative completion must be aborted because of worst-case data. Abort detection is computed in parallel with the datapath computation.


Speculative Completion

Modified result logic

With speculative completion, early completion is allowed when results can be produced early. Modified result logic is required to take advantage of the early production of required inputs to the result logic.

For example, in adder designs the carry may be produced earlier, and hence the sum logic needs to be modified.


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concept
• Architecture of Speculative Completion
• Speculative Adder Design
– Multiple model delays
– Abort detection networks
– Modified result logic
• Basic Dynamic Brent-Kung Adders
• Conclusion


Speculative Adder Design

[Block diagram: 32-bit ADDER with inputs A and B and output SUM; an abort detection network produces Abort; a completion network (matched delays) takes req and produces done]


Speculative Adder Design

Completion Network – Each inverter roughly corresponds to the delay of one level in the BLC adder. The worst-case delay path has 7 gate delays. The speculative delay path has only 5 gate delays. The final generate values are available at Level-3. The speculative path is disabled by an abort signal.

[Figure: completion network (matched delays): req drives a worst-case path and a speculative path; the abort signal selects which path produces done]


Speculative Adder Design

Abort Detection Network –

Conditions for late completion – late completion can only occur if there exists a run of 8 consecutive Level-0 propagate signals.

At the nth level, a generate function of the ith stage is computed as:
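A sketch of the expansion, reconstructed here from the level recurrences rather than taken from the slide: each level combines stage $i$ with the stage half a span below it, $G_i^n = G_i^{n-1} + P_i^{n-1} G_{i-2^{n-1}}^{n-1}$, and unrolling this down to the Level-0 signals gives

$$G_i^n = g_i + p_i g_{i-1} + p_i p_{i-1} g_{i-2} + \cdots + p_i p_{i-1} \cdots p_{i-2^n+2}\, g_{i-2^n+1}$$

so $G_i^3$ covers the 8 bits $i, \ldots, i-7$. The true carry out of bit $i$ is $C_{i+1} = G_i^3 + P_i^3 C_{i-7}$ with $P_i^3 = p_i p_{i-1} \cdots p_{i-7}$, which means it can differ from $G_i^3$ only when all 8 of those propagate signals are 1. This agrees with the run-of-8 condition for late completion stated above.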

Detecting late completion

Simple detection network


Speculative Adder Design

Abort Detection Network –

Conditions for late completion

Detecting late completion

Simple detection network


Speculative Adder Design

Abort Detection Network –
– Conditions for late completion
– Detecting late completion
– Simple detection network

A simple sum-of-products detection network can be used, where each product contains a short run of Level-0 propagate signals.

For example, with 4-literal products, each product contains a run of 4 Level-0 propagate signals, and the network contains 5 products. If any of these runs occurs, the corresponding product is 1. The sum-of-products equation is:

$p_4 p_5 p_6 p_7 + p_9 p_{10} p_{11} p_{12} + p_{14} p_{15} p_{16} p_{17} + p_{19} p_{20} p_{21} p_{22} + p_{24} p_{25} p_{26} p_{27}$
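The 4-literal network can be modelled in Python as below. This is a sketch assuming a 32-bit adder with bits numbered 0 to 31; the names `PRODUCT_STARTS`, `abort_from_propagate`, `has_run_of_8` and `done_gate_delays` are illustrative, and the 5 and 7 gate-delay figures are the ones quoted for the completion network.

```python
WIDTH = 32
PRODUCT_STARTS = (4, 9, 14, 19, 24)  # lowest index of each 4-literal product


def abort_from_propagate(p: int) -> bool:
    """Sum of products: true if any listed run of 4 propagate bits is all 1s."""
    return any(((p >> s) & 0xF) == 0xF for s in PRODUCT_STARTS)


def abort(a: int, b: int) -> bool:
    """Abort signal for operands a and b (Level-0 propagate is A xor B)."""
    return abort_from_propagate((a ^ b) & ((1 << WIDTH) - 1))


def has_run_of_8(p: int) -> bool:
    """Exact late-completion condition: 8 consecutive propagate bits."""
    return any(((p >> s) & 0xFF) == 0xFF for s in range(WIDTH - 7))


def done_gate_delays(a: int, b: int) -> int:
    """Gate delays until done: 7 on the worst-case path, 5 on the speculative one."""
    return 7 if abort(a, b) else 5
```

Every window of 8 consecutive bit positions contains one of the five products, so the network never misses a run of 8. It is conservative in the other direction: a run of only 4 propagate bits in one of the listed positions also forces the worst-case delay.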


Speculative Adder Design

Abort Detection Network –
– Conditions for late completion
– Detecting late completion
– Simple detection network


Speculative Adder Design

Modified Sum Generation –


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concept
• Architecture of Speculative Completion
• Speculative Adder Design
• Basic Dynamic Brent-Kung Adders
– Completion network
– Abort detection networks
– Modified sum generation
• Conclusion
• References


Basic Dynamic Brent-Kung Adders

Basic Dynamic P/G Cell –

$P_i^n = P_i^{n-1} P_j^{n-1}$ and $G_i^n = G_i^{n-1} + P_i^{n-1} G_j^{n-1}$

$S_i = p_i \oplus G_{i-1}^N$


Basic Dynamic Brent-Kung Adders

Completion Network


Basic Dynamic Brent-Kung Adders

Abort Detection Network


Basic Dynamic Brent-Kung Adders

Modified Sum Generation

(a) 2-speed adder, (b) 3-speed adder


Basic Dynamic Brent-Kung Adders

Modified Sum Generation


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concept
• Architecture of Speculative Completion
• Speculative Adder Design
• Basic Dynamic Brent-Kung Adders
• Conclusion
• References


Conclusion

With speculative completion, early completion is allowed when results can be produced early.

An asynchronous adder is selected because of the potential advantages of asynchronous design.

The dynamic Brent-Kung adder is preferable because:

– With dynamic logic, all nodes are reset during the precharge phase, so the values of internal nodes are known, whereas in a static CMOS implementation internal nodes are never reset, so their state is generally unknown.

– No late-enable signal needs to be distributed in dynamic logic, whereas in the static CMOS implementation late-enable signals had to be distributed to the different sum modules.


Conclusion

Advantages
– Little area overhead (less than 5%)
– Performance increase for average-case data (up to 29% in 64-bit and 19% in 32-bit BK adders for random input data)

Disadvantages
– Probabilistic approach, hence the performance gain depends on the distribution of the input data.
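To illustrate the dependence on the input distribution, the sketch below estimates how often the 4-literal abort network from the 32-bit BLC example fires. It assumes independent, uniformly random operand bits, so each propagate bit is 1 with probability 1/2 and the five disjoint products are independent. It counts only the 5 and 7 gate delays quoted for the completion network, so it is an idealized model and not a reproduction of the Brent-Kung figures above. The function names are illustrative.

```python
from fractions import Fraction


def abort_probability(products: int = 5, literals: int = 4) -> Fraction:
    """P(at least one product is 1) for independent, unbiased propagate bits."""
    p_product = Fraction(1, 2) ** literals
    return 1 - (1 - p_product) ** products


def expected_gate_delays(early: int = 5, late: int = 7) -> Fraction:
    """Average completion delay when an abort switches from early to late."""
    p_abort = abort_probability()
    return early * (1 - p_abort) + late * p_abort
```

Under these assumptions the abort probability is 1 - (15/16)^5, roughly 0.28, so most additions complete on the speculative path. Inputs with long propagate runs, for example adding a small number to its own bitwise complement, would abort far more often.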


Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders

• Introduction
• Basic Concept
• Architecture of Speculative Completion
• Speculative Adder Design
• Basic Dynamic Brent-Kung Adders
• Conclusion
• References


References

Design of low-latency asynchronous adder using speculative completion by S.M.Nowick

High-performance adders with speculative completion by Ayoob E. Dooply


Questions ?


Dual Rail Monotonic Encoding

• Def. Glitch: Nonfinal transition

• Def. Hazard: Potential for glitch

• Encode every signal, X, with two wires, XH and XL:
– XH=0, XL=0: data not ready
– XH=0, XL=1: logic “0”
– XH=1, XL=0: logic “1”
– XH=1, XL=1: not used
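The encoding table above can be written out as a small Python sketch. The function names `encode`, `decode` and `word_complete` are illustrative; `word_complete` models completion detection by checking that every wire pair carries valid data.

```python
from typing import Optional


def encode(bit: int) -> tuple[int, int]:
    """Map a logic value to the wire pair (XH, XL)."""
    return (1, 0) if bit else (0, 1)


def decode(xh: int, xl: int) -> Optional[int]:
    """Return the logic value, or None while the data is not ready."""
    if (xh, xl) == (0, 0):
        return None                      # data not ready
    if (xh, xl) == (1, 1):
        raise ValueError("XH=1, XL=1 is not used")
    return xh                            # (1, 0) -> 1, (0, 1) -> 0


def word_complete(wires: list[tuple[int, int]]) -> bool:
    """A word is complete once exactly one wire of every pair is high."""
    return all(xh ^ xl for xh, xl in wires)
```

For example, `word_complete([encode(1), (0, 0)])` is false because the second bit is still in the "data not ready" state.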


• Static:
– At every point in time (except during the switching transient), each gate output is connected to either VDD or VSS via a low-resistance path.
– Slower and more complex than dynamic but "safer".

• Dynamic:
– Relies on the temporary storage of signal values on the capacitance of high-impedance circuit nodes.
– Simpler in design and faster than static, but more complicated in operation and sensitive to noise.


• Fan-in
– The number of standard loads drawn by an input to ensure reliable operation. Most inputs have a fan-in of 1.

• Fan-out
– The number of standard loads that can be reliably driven by an output without causing the output voltage to shift out of its legal range of values.


Benefit: Low Power

• No clock or PLL to start/stop
– Faster (instantaneous!) recovery from idling
– Easier to idle for short periods
– Clock itself is a high-power node

• Only draw power when doing work
– No need to explicitly enable/disable units
– Automatic fine granularity of power saving


Asynchronous Design

Several Potential Advantages:

– Lower Power
• no clock ==> components use power only “on demand”

– Robustness, Scalability
• no global timing ==> “mix-and-match” varied components

– Higher Performance
• systems not limited to “worst-case” clock rate


Should we use Asynch?

• Benefits
– Early completion, better EM, low power, environmental adaptability
– No global clock to distribute!

• Drawbacks
– Design challenges
– Testing and tools


Asynchronous circuits are advantageous in:

• Low-power applications, by: automatic turn-off of idle parts, if synchronization is done by handshaking only where needed; and adaptive scaling of the supply voltage, as the performance of speed-independent circuits does not depend on component speeds and scales continuously over a wide range of power supply voltages.

• Improved EMI characteristics, including: reduced noise through the absence of clock harmonics; reduced switching activity; and accommodation of delays due to electromagnetic noise if communication is done delay-insensitively. If the average signal transition time is $T$ for a voltage swing of $V$, then an induced electromotive force of $\Delta V$ will cause a signal delay of $T\,\Delta V / V$.

• High-speed applications: for circuits with completion detection, the speed of the system is determined by the average-case rather than the worst-case speeds of the components.

• Applications in heterogeneous system timing. According to semiconductor industry forecasts such as ITRS (previously known as the SIA roadmap), the systems on chip of the near future will require multiple clock domains. As die sizes increase and the distance that can be traveled by a signal over a clock period becomes smaller, the number of time zones on a chip will grow rapidly, approaching 1000 by 2006 and 10000 by 2012.


Introduction

• Synchronous vs. Asynchronous Systems?

– Synchronous Systems: use a global clock

• entire system operates at a fixed rate

• uses “centralized control”

[Figure: components driven by a global clock]


Introduction (cont.)

• Synchronous vs. Asynchronous Systems? (cont.)

– Asynchronous Systems: no global clock

• components can operate at varying rates

• communicate locally via “handshaking”

• uses “distributed control”

“handshaking interfaces”


Introduction (cont.)

Asynchronous Circuits:

• long history (since early 1950’s), but...

• early approaches often impractical: slow, complex

Synchronous Circuits:

• used almost everywhere: highly successful

• benefits: simplicity, support by existing design tools

But recently: renewed interest in asynchronous circuits