ELEC516/10 Lecture 11
ELEC 516 VLSI System Design and Design Automation Spring 2010
Lecture 11 – Self-timed and asynchronous design
Reading Assignment: Rabaey: Chapter 9
Note: some of the figures in this slide set are adapted from the slide set of “Digital Integrated Circuits” by Rabaey, Copyright UCB 2002
Characteristics of synchronous design
• Function of the clock
– Ensures that physical timing constraints are met
– Clock events serve as a logical ordering mechanism
• Advantages
– Easy to design: only simple timing requirements need to be satisfied, such as the setup and hold times of the latch/FF
– Uses the clock as a global signal
• Disadvantages
– Clock skew problem
– Noise problems due to current flowing over a very short period of time, close to the clock edge
– Worse performance: worst-case timing instead of average-case timing
Asynchronous design
• Eliminates the use of all clocks
• Advantages of asynchronous design
– Free of clock skew
– Average-case timing
– Low power consumption due to elimination of the global clock
– Lower voltage noise and electromagnetic emission
– Supports a ‘plug & play’ property: each sub-system of an asynchronous circuit only needs to synchronize with its neighboring sub-systems
• Need to ensure correct circuit operation that avoids all potential race conditions under any operating condition and
Self-timed and asynchronous design
Functions of the clock in synchronous design
1) Acts as a completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in the logic
Self-timed design
1) Completion is ensured by a completion signal
2) Ordering is imposed by a handshaking protocol
Self-timed pipelined datapath
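[Figure: three-stage self-timed pipeline, In → R1 → F1 → R2 → F2 → R3 → F3 → Out. Each function block Fi has propagation delay tpFi and a Start/Done signal pair; handshake (HS) blocks between the stages exchange Req and Ack signals.]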
Handshaking Protocol for self-timed system
• Uses Ack(nowledge) and Req(uest) signals
• An input word arrives, and a Req signal to the following block (F1) is raised. If F1 is inactive at the time, it transfers the data and acknowledges this fact to the input buffer, which can go ahead and fetch the next word.
• F1 is enabled by raising the Start signal. After the computation is completed, the Done signal goes high.
• A Req signal is then issued to F2. If this function is free, an Ack is raised, the output value is transferred, and F1 is freed and can go ahead with its next computation.
Completion Signal Generation
[Figure: a logic network (In → Out) alongside a delay module that turns Start into Done.]
Using Delay Element (e.g. in memories)
Completion Signal Generation
Using Redundant Signal Encoding
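The idea of redundant encoding can be sketched in a few lines of Python. This is only an illustration under assumptions: each bit is carried on two rails, both rails low means "in transition (or reset)", exactly one rail high means a valid value, and both high is illegal. The rail order `(true_rail, false_rail)` and the function names are hypothetical, chosen for the sketch rather than taken from the slide.

```python
from typing import List, Optional, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail): a convention assumed for this sketch

RESET: Rail = (0, 0)    # both rails low: value not yet available


def encode(bit: int) -> Rail:
    """Dual-rail encode one bit: exactly one rail goes high."""
    return (1, 0) if bit else (0, 1)


def decode(rails: Rail) -> Optional[int]:
    """Return the bit value, or None while the signal is still in reset."""
    if rails == (0, 0):
        return None
    if rails == (1, 1):
        raise ValueError("illegal dual-rail code: both rails high")
    return rails[0]


def done(word: List[Rail]) -> bool:
    """Completion signal: every bit of the word has exactly one rail high."""
    return all(t ^ f for t, f in word)
```

With this encoding, `done([encode(1), RESET])` is false because the second bit has not resolved yet, while `done([encode(1), encode(0)])` is true.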
Completion Signal in DCVSL
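[Figure: DCVSL gate with two complementary pull-down networks (PDN) driven by In1, In2 and their complements, precharged under control of Start. The outputs B0 and B1 are combined to generate Done.]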
Self-timed Adder
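[Figure: self-timed adder. (a) Differential carry generation: two precharged carry chains controlled by Start, one driven by the propagate/generate signals (P0–P3, G0–G3) and one by the propagate/kill signals (P0–P3, K0–K3), producing the carries C0–C4 and their complements. (b) Completion signal: Done is derived from Start and the carries C1–C4 and their complements.]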
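A behavioural sketch can show why such an adder finishes in data-dependent time. The model below is an assumption-laden illustration, not the circuit itself: each carry has a true rail and a false rail, a generate (G) or kill (K) resolves a carry immediately, a propagate (P) forwards whichever rail the previous carry raised, and one loop iteration stands for one stage delay. The function name `self_timed_add` and the step-counting scheme are hypothetical.

```python
def self_timed_add(a: int, b: int, n: int = 4, c0: int = 0):
    """Add two n-bit numbers with dual-rail carries.

    Returns (sum, steps), where steps is the number of stage delays
    before every carry has resolved (i.e. before Done would rise).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate
    k = [1 - (p[i] | g[i]) for i in range(n)]           # kill

    # Carry rails: index 0 is the carry-in (already valid), the rest start reset.
    c_t = [c0] + [0] * n
    c_f = [1 - c0] + [0] * n

    steps = 0
    while not all(t ^ f for t, f in zip(c_t, c_f)):
        # All stages evaluate in parallel from the previous rail values.
        new_t = [c_t[0]] + [g[i] | (p[i] & c_t[i]) for i in range(n)]
        new_f = [c_f[0]] + [k[i] | (p[i] & c_f[i]) for i in range(n)]
        c_t, c_f = new_t, new_f
        steps += 1

    total = sum((p[i] ^ c_t[i]) << i for i in range(n)) + (c_t[n] << n)
    return total, steps
```

In this model `self_timed_add(0b0101, 0b0101)` completes in one step because no bit position propagates, while `self_timed_add(0b1111, 0b0001)` needs four steps because the carry ripples through every stage.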
Completion Signal Using Current Sensing
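[Figure: completion detection by current sensing. An input register feeds a static CMOS logic block (Inputs → Output) whose ground-side current (GNDsense) is monitored by a current sensor; a minimum-delay generator is triggered by Start; signals A and B are combined to produce Done. The timing diagram shows Start, A, B, Done, and Output valid, with the intervals tdelay, toverlap, tpd-NOR, and tMDG.]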
Hand-Shaking Protocol
[Figure: (a) Sender-receiver configuration: SENDER and RECEIVER connected by Data, Req, and Ack. (b) Timing diagram of Data, Req, and Ack over cycle 1 and cycle 2, distinguishing the sender's actions from the receiver's actions.]
Two Phase Handshake
Event Logic – The Muller-C Element
(a) Schematic: inputs A and B, output F
(b) Truth table
A  B  Fn+1
0  0  0
0  1  Fn
1  0  Fn
1  1  1
[Figure: Muller C-element implementations. (a) Logic, built around an SR flip-flop (S, R, Q). (b) Majority function. (c) Dynamic.]
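The truth table can be captured as a small behavioural model: the output copies the inputs when they agree and holds its previous value when they differ. The sketch below is illustrative only; the class name `CElement` and the helper `majority` are hypothetical, and the majority form F = AB + F(A + B) is the usual textbook expression, assumed here to correspond to the "majority function" implementation.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.f = initial  # stored output Fn

    def update(self, a: int, b: int) -> int:
        """Apply inputs A and B and return the new output Fn+1."""
        if a == b:        # inputs agree: output follows them
            self.f = a
        return self.f     # inputs differ: output holds


def majority(a: int, b: int, f_prev: int) -> int:
    """Same behaviour written as a majority of A, B and the old output."""
    return (a & b) | (f_prev & (a | b))
```

Both forms agree for every combination of A, B, and Fn, which is a quick way to check the table.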
2-Phase Handshake Protocol
Advantage: FAST, with a minimal number of signaling events (important for global interconnect)
Disadvantage: edge-sensitive, has state
[Figure: sender logic and receiver logic connected by Data; handshake logic built around a Muller C-element (C) with the signals Data ready, Data accepted, Req, and Ack.]
Example: Self-timed FIFO
All 1s or 0s -> pipeline empty
Alternating 1s and 0s -> pipeline full
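[Figure: self-timed FIFO. Registers R1, R2, and R3 sit between In and Out, with their enables (En) driven by a chain of Muller C-elements (C); control signals Reqi, Acki, Reqo, Acko, and Done.]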
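The empty and full patterns can be reproduced with a small model of the control chain. The sketch assumes the common Muller-pipeline arrangement, in which stage i is a C-element fed by the previous stage's output and the inverted output of the next stage, with two-phase (transition) signaling at both ends; the slide's exact wiring is not recoverable from the text, so treat this as an assumption. The names `c_element` and `settle` are hypothetical, and gate delays are ignored: the chain is simply relaxed until nothing changes.

```python
from typing import List


def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def settle(c: List[int], req_in: int, ack_out: int) -> List[int]:
    """Relax the control chain until stable.

    c[i] = C(c[i-1], NOT c[i+1]), with req_in to the left of stage 0
    and ack_out to the right of the last stage.
    """
    c = list(c)
    changed = True
    while changed:
        changed = False
        for i in range(len(c)):
            left = req_in if i == 0 else c[i - 1]
            right = ack_out if i == len(c) - 1 else c[i + 1]
            new = c_element(left, 1 - right, c[i])
            if new != c[i]:
                c[i] = new
                changed = True
    return c


# Push tokens in by toggling req_in while the consumer never acknowledges.
state = settle([0, 0, 0], req_in=1, ack_out=0)   # first token runs to the output
state = settle(state, req_in=0, ack_out=0)       # second token stops behind it
state = settle(state, req_in=1, ack_out=0)       # third token fills the pipeline
```

After the third token the control state is alternating 1s and 0s, and a further toggle of `req_in` leaves the state unchanged: the pipeline is full and the sender is stalled until the consumer toggles `ack_out`.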
2-Phase Protocol
Example
From [Horowitz]
4-Phase Handshake Protocol
Slower, but unambiguous
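Also known as RTZ (return-to-zero)
[Figure: timing diagram of Data, Req, and Ack over cycle 1 and cycle 2, with the numbered handshake events marked as the sender's or the receiver's actions.]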
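The difference between the two protocols shows up when counting signaling events per transferred word. The sketch below is a simplified sequential model, not a circuit: the sender and receiver take turns inside one loop, and the function name `transfer` and the trace format are hypothetical. In the two-phase case every transition on Req or Ack is an event; in the four-phase case both signals return to zero before the next word.

```python
def transfer(words, phases):
    """Move words from sender to receiver; return (received, trace)."""
    if phases not in (2, 4):
        raise ValueError("phases must be 2 or 4")
    req = ack = 0
    received, trace = [], []
    for word in words:
        data = word                  # sender places the data
        req ^= 1                     # sender signals "data ready"
        trace.append(("Req", req))
        received.append(data)        # receiver latches the data
        ack ^= 1                     # receiver signals "data accepted"
        trace.append(("Ack", ack))
        if phases == 4:
            req = 0                  # sender returns Req to zero
            trace.append(("Req", req))
            ack = 0                  # receiver returns Ack to zero
            trace.append(("Ack", ack))
    return received, trace
```

For two words, the two-phase trace has four events and the four-phase trace has eight. The four-phase version always starts a transfer from Req = Ack = 0, which is why its levels are unambiguous.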
4-Phase Handshake Protocol
Implementation using Muller-C elements
[Figure: sender logic and receiver logic connected by Data; handshake logic built from two Muller C-elements (C) with the signals Data ready, Data accepted, Req, Ack, and S.]
Self-Resetting Logic
[Figure: three cascaded precharged logic blocks (L1, L2, L3), each with its own completion-detection circuit. Inset: gate-level detail with inputs A, B, C, internal node int, output out, VDD, and post-charge logic.]
Clock-Delayed Domino
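[Figure: clock-delayed domino stage. A pull-down network between VDD and GND is clocked by CLK1, with input D1 and output Q1 (also D2); a delayed clock CLK2 is passed to the next stage.]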
Asynchronous-Synchronous Interface
[Figure: an asynchronous system (signal rate fin) feeds a synchronous system (clock fCLK) through a synchronization block.]
Synchronizers and Arbiters
• Arbiter: circuit that decides which of 2 events occurred first
• Synchronizer: arbiter with the clock as one of the inputs
• Problem: the circuit HAS to make a decision in limited time; which decision it makes is not important
• Caveat: it is impossible to ensure correct operation
• But we can decrease the error probability at the expense of delay
A Simple Synchronizer
• Data sampled on rising edge of the clock
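• The latch will eventually resolve the signal value, but ... this might take infinite time!
[Figure: latch-based synchronizer. Input D is sampled under control of CLK into a cross-coupled inverter pair (I1, I2) with internal node int and output Q.]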
Synchronizer: Output Trajectories
Single-pole model for a flip-flop
[Figure: output trajectories, Vout (0 to 2.0) versus time (0 to 300 ps).]
Mean Time to Failure
Example
Tφ = 10 nsec = T
Tsignal = 50 nsec
tr = 1 nsec
τ = 310 psec
VIH − VIL = 1 V (VDD = 5 V)
N(T) = 3.9 × 10^-9 errors/sec
MTF(T) = 2.6 × 10^8 sec = 8.3 years
MTF(0) = 2.5 μsec
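The numbers in this example can be checked with a short calculation. The formula itself did not survive on the "Mean Time to Failure" slide, so the sketch assumes the usual single-pole result: N(T) = (VIH − VIL)/Vswing × tr/Tsignal × exp(−T/τ) / Tφ, with Vswing taken to be VDD = 5 V. The function and parameter names are hypothetical.

```python
import math


def sync_error_rate(T, T_phi, T_signal, t_r, tau, v_gap, v_swing):
    """Average synchronization errors per second after waiting T seconds.

    Assumed single-pole model: the probability of starting in the
    undefined region decays as exp(-T / tau), and one sample is taken
    every clock period T_phi.
    """
    p_init = (v_gap / v_swing) * (t_r / T_signal)  # chance of landing in the undefined region
    p_err = p_init * math.exp(-T / tau)            # chance of still being undefined after T
    return p_err / T_phi


params = dict(T_phi=10e-9, T_signal=50e-9, t_r=1e-9,
              tau=310e-12, v_gap=1.0, v_swing=5.0)

n_T = sync_error_rate(T=10e-9, **params)           # errors/sec after one clock period
mtf_T = 1.0 / n_T                                  # seconds
mtf_0 = 1.0 / sync_error_rate(T=0.0, **params)     # seconds, no waiting at all
```

Under these assumptions `n_T` comes out close to 3.9e-9 errors/sec and `mtf_0` to 2.5e-6 sec, in line with the figures quoted above; waiting one clock period moves the mean time to failure from microseconds to years.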
Cascaded Synchronizers Increase MTF
[Figure: three synchronizers in series: In → Sync → O1 → Sync → O2 → Sync → Out.]
Arbiters
[Figure: arbiter. (a) Schematic symbol with inputs Req1, Req2 and outputs Ack1, Ack2. (b) Implementation with internal nodes A and B. (c) Timing diagram of Req1, Req2, A, B, and Ack1, showing the metastable interval and the VT gap.]
PLL-Based Synchronization
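[Figure: Chip 1 contains a PLL, clock buffer, divider, and digital system, driven by a crystal oscillator that supplies the reference clock; Chip 2 contains its own PLL and digital system; Data is exchanged between the two chips.]
fsystem = N × fcrystal
fcrystal < 200 MHz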
PLL Block Diagram
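[Figure: PLL block diagram. The reference clock and the local clock feed the phase detector, whose Up/Down outputs drive the charge pump, followed by the loop filter and the VCO, which produces the system clock; a divide-by-N block in the feedback path generates the local clock.]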
Phase Detector
[Figure: phase detector with inputs ref and local clock. Panels (a) to (c) show the detector, its output before filtering, and its transfer characteristic: low-pass-filtered output (up to VDD) versus phase error (−180 to 180 deg).]
Phase-Frequency Detector
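[Figure: phase-frequency detector. (a) Schematic: two D flip-flops clocked by A and B produce UP and DN, with a reset (Rst) on each. (b) State transition diagram with three states: (UP = 0, DN = 1), (UP = 0, DN = 0), and (UP = 1, DN = 0); transitions are triggered by edges on A and B. (c) Timing waveforms of A, B, UP, and DN.]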
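The three-state diagram can be written as a small state machine. The sketch assumes the conventional behaviour for this kind of detector: a rising edge on A moves one state toward UP = 1, a rising edge on B moves one state toward DN = 1, and the state saturates at either end. The names `PFD` and `average_output` are hypothetical, and edges are idealized as instantaneous events.

```python
class PFD:
    """Three-state phase-frequency detector driven by rising edges."""

    def __init__(self) -> None:
        self.state = 0  # -1: DN = 1, 0: UP = DN = 0, +1: UP = 1

    def edge(self, which: str):
        """Apply a rising edge on input 'A' or 'B'; return (UP, DN)."""
        if which == "A":
            self.state = min(self.state + 1, 1)
        elif which == "B":
            self.state = max(self.state - 1, -1)
        else:
            raise ValueError("input must be 'A' or 'B'")
        return int(self.state == 1), int(self.state == -1)


def average_output(f_a: float, f_b: float, t_end: float) -> float:
    """Time-average of (UP - DN) for two edge trains of given frequency."""
    events = sorted(
        [(k / f_a, "A") for k in range(1, int(t_end * f_a) + 1)]
        + [(k / f_b, "B") for k in range(1, int(t_end * f_b) + 1)]
    )
    pfd, acc, t_prev = PFD(), 0.0, 0.0
    for t, which in events:
        acc += pfd.state * (t - t_prev)  # state held since the last edge
        pfd.edge(which)
        t_prev = t
    acc += pfd.state * (t_end - t_prev)
    return acc / t_end
```

When A runs faster than B the average of UP − DN is positive, and when A runs slower it is negative, so the output carries frequency information as well as phase information.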
PFD Response to Frequency
[Figure: waveforms of A, B, UP, and DN when A and B differ in frequency.]
PFD Phase Transfer Characteristic
[Figure: average (UP − DN), up to VDD, versus phase error (deg).]
Charge Pump
[Figure: charge pump connected to VDD, switched by UP and DN, with its output going to the VCO control input.]
PLL Simulation
[Figure: simulated PLL control voltage (0 to 1.0 V) versus time, with two insets comparing the ref, div, and vco waveforms.]
Clock Generation using DLLs
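[Figure: two clock-generation loops side by side. Delay-Locked Loop (delay line based): fREF → phase detector → U/D → charge pump → filter → delay line (DL) → fO. Phase-Locked Loop (VCO based): fREF → PD → U/D → CP → filter → VCO → fO, with a ÷N block in the feedback path.]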
Delay Locked Loop
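[Figure: (a) delay-locked loop. FREF and FO feed a phase detector (PH) whose U/D outputs drive a charge pump; the resulting VCTRL sets the delay of the voltage-controlled delay line (VCDL). Waveforms shown: REF, OUT, UP, DN, Delay, PH, and VCTRL.]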
DLL-Based Clock Distribution
[Figure: GLOBAL CLK distributed to multiple local DLLs, each consisting of a VCDL, a phase detector, and a CP/LF block, and each driving its own digital circuit.]