Post on 20-Dec-2015
© 2002-2009 Ran Ginosar Asynchronous Design and Synchronization 1
VLSI Architectures 048878
Lecture 2:
Theoretical Aspects (S&F 2.5)
Data Flow Structures (S&F 3)
Performance (S&F Ch. 4)
Classification of Async Circuits
• Self-timed (ST)
– Requires some timing assumptions
• Speed-independent (SI)
– Zero (ideal) wire delay, arbitrary gate delay
• Delay-insensitive (DI)
– Arbitrary delays (gates and wires)
• Quasi-delay-insensitive (QDI)
– DI with the Isochronic Fork assumption
– Theoretically equivalent to SI
• SI and DI are mathematically provable
Speed Independence
• A gate (Boolean function) is either:
– Stable, or
– Excited (inputs have changed and the output should also change to satisfy the Boolean function)
• A gate “fires” when its output changes
• An excited gate eventually fires and becomes stable.
• SI means: Firing of one gate must never cause another excited gate to become stable without firing.
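The stable/excited definitions and the SI condition above can be checked mechanically for a single circuit state. The following is a minimal sketch, not a full verifier: the helper names `excited`, `fire`, and `si_violations`, and the three-signal example circuit, are illustrative assumptions and do not come from the lecture.

```python
def excited(gates, state):
    """Return the gates whose current output disagrees with their Boolean function."""
    return {g for g, fn in gates.items() if fn(state) != state[g]}


def fire(gates, state, g):
    """Fire gate g: its output takes the value demanded by its function."""
    new_state = dict(state)
    new_state[g] = gates[g](state)
    return new_state


def si_violations(gates, state):
    """List (fired, disabled) pairs where firing one excited gate makes
    another excited gate stable without that gate having fired."""
    violations = []
    before = excited(gates, state)
    for g in before:
        after = excited(gates, fire(gates, state, g))
        for h in before - {g}:
            if h not in after:
                violations.append((g, h))
    return sorted(violations)


# Hypothetical example circuit: x is a primary input, a = NOT x, b = x AND a.
gates = {
    "a": lambda s: 1 - s["x"],
    "b": lambda s: s["x"] & s["a"],
}
# x has just risen: a should fall and b should rise, so both gates are excited.
state = {"x": 1, "a": 1, "b": 0}
print(si_violations(gates, state))  # [('a', 'b')]
```

In this example, firing `a` first removes the excitation of `b`, so the circuit is not speed-independent in that state; firing `b` first leaves `a` excited, which is allowed.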
Data Flow Structures
• Abstraction similar to sync RTL
– Likewise, described by either schematics or HDL
• Applies to all (3) handshake protocols, but we assume 4-phase
– Alternating VALID / EMPTY tokens
• Assume handshake latches and handshake-ignorant function blocks
– Recall token flow rules
Abstract Pipeline
• Bubbles
• Tokens
• Valid (0 or 1, who cares) and Empty tokens
• Transparent function blocks (don’t change token flow, only introduce some delays)
E V V E E
Abstract Rings
• 3 stages, 1 bubble:
– 3 steps for token round
– 6 steps to cycle
[Figure: successive ring states, one latch per column; the legend marks token and bubble latches]
V E V
V E E
V V E
E V E
Abstract Rings
• 4 stages, 2 bubbles:
– How many steps to cycle ?
• An added latch did not change the function (unlike sync pipe)
V E E V
V V E E
E V V E
V V E E
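The step counts for both rings can be reproduced with a small simulation. This is a sketch under stated assumptions: a latch is treated as a bubble when its successor already holds a copy of its token, a bubble takes the predecessor's token when that token differs from its own, and all enabled latches fire together in one step. The function names `ring_step` and `steps_to_cycle` are illustrative.

```python
def ring_step(state):
    """Advance a ring of handshake latches by one step.

    state[i] is 'V' or 'E'. Latch i is a bubble when its successor holds
    the same value; a bubble copies its predecessor if the values differ.
    """
    n = len(state)
    nxt = list(state)
    for i in range(n):
        pred, succ = state[i - 1], state[(i + 1) % n]
        if state[i] == succ and pred != state[i]:
            nxt[i] = pred
    return nxt


def steps_to_cycle(state, max_steps=1000):
    """Count the steps until the ring returns to its initial state."""
    start = list(state)
    cur = start
    for steps in range(1, max_steps + 1):
        nxt = ring_step(cur)
        if nxt == cur:
            raise ValueError("deadlock: no latch can fire")
        if nxt == start:
            return steps
        cur = nxt
    raise ValueError("no cycle found within max_steps")


print(steps_to_cycle(list("VEE")))   # 3 stages, 1 bubble  -> 6
print(steps_to_cycle(list("VEEV")))  # 4 stages, 2 bubbles -> 4
```

Under this model the extra latch adds a second bubble, so two tokens move in every step and the longer ring cycles faster.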
Building Blocks
• Latch, Source, Sink
• Fork
• Join (wait for all)
• Merge (wait for one)
• MUX (ports 0 and 1)
• DEMUX (ports 0 and 1)
• Function Block (Join; CL; Fork)
Another Ring: Simple FSM
[Figure: ring with function block F and latches holding E, V, E. Labels: Input, Output, Present State, Next State.]
Another Ring: Iterative Computation
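[Figure: ring with function block F and latches holding E, E, E; Input and Output connect through blocks with ports labeled 0 and 1.]
Arbitrary pipelining also works:
[Figure: the same ring with F split into F1, F2, F3.]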
Latches don’t foul the pipe!
• Don’t try this with sync circuits!
IF statement
if <COND> then <TRUE PART> else <FALSE PART>
[Figure: COND controls two steering blocks (ports 0 and 1) placed before and after the TRUE PART and FALSE PART blocks.]
Combinational logic, or latches may be added
FOR statement
for <COUNT> do <BODY>
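[Figure: COUNT block and BODY block joined by steering blocks (ports 0 and 1); initial tokens E, 0.]
One handshake at the input results in COUNT handshakes at BODY [control tokens: 1 x (COUNT-1), then 0]
Warning: Not all latches are shown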
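The control-token sequence noted on this slide, 1 repeated COUNT-1 times followed by a single 0, can be sketched in a few lines. This is only a behavioral illustration: the names `for_control_tokens` and `run_for` are assumptions, BODY is modeled as a plain callable, and no latch-level handshaking is simulated.

```python
def for_control_tokens(count):
    """Control tokens for one FOR request: 1 repeated COUNT-1 times, then 0."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [1] * (count - 1) + [0]


def run_for(count, body):
    """Perform one BODY handshake per control token; return how many occurred."""
    handshakes = 0
    for ctl in for_control_tokens(count):
        body()           # one handshake with BODY
        handshakes += 1
        if ctl == 0:     # the final 0 token ends the loop
            break
    return handshakes


print(for_control_tokens(4))       # [1, 1, 1, 0]
print(run_for(4, lambda: None))    # 4
```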
WHILE statement
while <COND> do <BODY>
[Figure: COND block and BODY block joined by steering blocks (ports 0 and 1); initial tokens E, 0.]
Async GCD
input (a, b);
while a ≠ b do
if a > b then a ← a-b ;
else b ← b-a ;
output (a);
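For reference, the same subtraction-based algorithm as ordinary sequential code. This is a sketch of the algorithm only, not of the data flow circuit; the function name `gcd_by_subtraction` is an assumption, and the inputs are assumed to be positive integers (otherwise the loop would not terminate).

```python
def gcd_by_subtraction(a, b):
    """Greatest common divisor by repeated subtraction (positive integers only)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive integers")
    while a != b:          # WHILE condition: a != b
        if a > b:          # IF condition: a > b
            a = a - b      # TRUE part
        else:
            b = b - a      # FALSE part
    return a


print(gcd_by_subtraction(12, 18))  # 6
```

The number of loop iterations depends on the operands, which is one source of the data-dependent delays discussed under Performance.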
Async GCD
[Figure: data flow implementation of the algorithm above. A WHILE loop controlled by A≠B encloses an IF controlled by A>B that selects between the A-B and B-A blocks; the input is A,B and the output is GCD(A,B).]
Performance
• Sync performance analysis is simple:
– Check all register-to-register paths
– Static Timing Analysis
– Dynamic simulations only check correctness, not performance
• Async performance analysis is COMPLEX:
– Many cycles
– Data dependent delays
– Dependency on environment and initialization
– Not guaranteed to have a solution
• We will only consider simple examples…
– Qualitative, then quantitative
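FIFO Performance: 2N on 2N
[Figure: successive snapshots of the six latches, one row per snapshot; the source and sink tokens (1, 4, E) are drawn outside the latches.]
E 3 E 2 E 1
E 3 E 2 E 1
E 3 E 2 E E
E 3 E 2 2 E
E 3 E E 2 E
E 3 3 E 2 E
E E 3 E 2 E
4 E 3 E 2 E
4 E 3 E 2 E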
FIFO Performance
• 2N=6 tokens (N Valid, N Empty)
• 2N=6 latches
• 2N=6 steps to move all tokens one step to the right
• But is it the best we can do with 2N latches? Let’s try a fast sink.
FIFO Performance: Fast sink (N on 2N)
[Figure: successive snapshots of the FIFO with a fast sink, showing tokens 1 to 5 and Empty tokens moving right.]
FIFO Performance
• Fast sink:
– Tokens spread out
– Bubble every other stage
– Only N tokens in 2N stages
– One step to move every token to the right
• Let’s try to add stages (same # of tokens)
FIFO Performance: 2N on 3N
[Figure: successive snapshots of a nine-stage FIFO carrying tokens 1 to 4 separated by Empty tokens.]
• 3N=9 stages
• 2N tokens (N Valid, N Empty) + N bubbles
• Only 2 steps to move every token one stage to the right
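The three FIFO cases (2N steps, 1 step, 2 steps per advance) can be reproduced with a small token-flow simulation. This is a sketch under several assumptions that are not in the slides: the FIFO is closed into a ring so the token count stays constant, a latch counts as a bubble when its successor holds the same value, all enabled latches fire together, and the slow-sink case is modeled as 2N tokens plus a single bubble (a completely full ring would deadlock). Ring sizes are chosen so the V/E alternation closes consistently. The names `ring_step` and `steps_per_shift` are illustrative.

```python
def ring_step(state):
    """One step: every bubble whose predecessor holds a new token copies it."""
    n = len(state)
    return [
        state[i - 1]
        if state[i] == state[(i + 1) % n] and state[i - 1] != state[i]
        else state[i]
        for i in range(n)
    ]


def steps_per_shift(state, max_steps=1000):
    """Steps until every token has advanced exactly one stage to the right."""
    state = list(state)
    target = state[-1:] + state[:-1]  # initial pattern rotated right by one
    cur = state
    for steps in range(1, max_steps + 1):
        nxt = ring_step(cur)
        if nxt == cur:
            raise ValueError("deadlock: no latch can fire")
        if nxt == target:
            return steps
        cur = nxt
    raise ValueError("pattern did not shift within max_steps")


print(steps_per_shift("VEVEVEE"))    # 2N=6 tokens, 1 bubble   -> 6
print(steps_per_shift("VVEEVVEE"))   # N tokens on 2N stages    -> 1
print(steps_per_shift("VVEVVEVVE"))  # 2N tokens on 3N stages   -> 2
```

In all three runs the step count equals the number of tokens divided by the number of bubbles, which is consistent with the slides' observation that adding bubbles speeds up the FIFO.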
Shift Register + Parallel Load
• CTL=0 token:
– Parallel load
– Old values to sink latches
– Valid din[0] to output
• CTL=1 token:
– Shift right
– Valid token output
• CTL=Empty token
– Shift right
– Empty token output
• Two performance issues:
– Too few bubbles
– High fanout on CTL
• Time
• Large C-element for ACK
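[Figure: shift register with stages d3, d2, d1, each shown with an Empty token (E) and steering blocks with ports 0 and 1; parallel inputs din[3], din[2], din[1], din[0]; output do; control ctl.]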
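The effect of the two valid CTL tokens can be sketched as a behavioral model. This is an illustration under assumptions, not the circuit itself: the function name `shift_register_step` is hypothetical, stages are listed from the output side (so `stages[0]` is d1), a 0 is assumed to shift in on the left, and the CTL=Empty (return-to-empty) phase is not modeled.

```python
def shift_register_step(stages, ctl, din=None, fill=0):
    """Apply one valid CTL token; return (new_stages, output_token).

    stages is ordered from the output side: stages[0] is d1.
    fill is the value assumed to shift in on the left.
    """
    if ctl == 0:
        # Parallel load: din[0] goes straight to the output and din[1:] into
        # the stages; the old stage values are discarded (sink latches).
        if din is None or len(din) != len(stages) + 1:
            raise ValueError("parallel load needs len(stages) + 1 input values")
        return list(din[1:]), din[0]
    if ctl == 1:
        # Shift right: d1 is output and every stage moves one place right.
        return list(stages[1:]) + [fill], stages[0]
    raise ValueError("ctl must be 0 or 1")


stages, out = shift_register_step([None] * 3, 0, din=["do", "d1", "d2", "d3"])
print(stages, out)  # ['d1', 'd2', 'd3'] do
stages, out = shift_register_step(stages, 1)
print(stages, out)  # ['d2', 'd3', 0] d1
```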
Shift Register + Parallel Load
• Buffers added in the CTL path
– Solves both issues together
[Figure: the shift register before and after the change; in the second version the ctl path passes through latches initially holding E, E, E.]
Parallel Load (CTL=0)
[Figure: three snapshots of the shift register as the CTL=0 token propagates along the buffered ctl path (0 E E 0, then 0 0 E 0, then 0 0 0 0) and d1, d2, d3 are loaded from din.]
Enabled, move not shown
din, CTL Empty (slow consumer)
[Figure: two snapshots with din and CTL Empty. The stages hold d3, d2, d1, the ctl path goes from E 0 0 E to E E E E, and the output side shows E, do.]
Slow Shift (CTL=1; CTL=E)
[Figure: three snapshots. The CTL=1 token propagates along the ctl path (1 E E 1, then 1 1 1 1) and the stages shift right, with 0 entering on the left; the ctl path then returns to E E E E. The output sequence grows from d1,E,do to E,d1,E,do.]
Fast Shift (CTL=1;E;1;E…)
[Figure: four snapshots with alternating 1 and E tokens in flight on the ctl path (1 E E 1, then E 1 E E, then 1 E 1 1, then E 1 E E). The stages shift as the tokens pass, and the output sequence grows from d1,E,do to E,d2,E,d1,E,do.]