Advanced Digital Design: GALS Design
Andreas Steininger, Vienna University of Technology
© A. Steininger & M. Delvai / TU Vienna, Lecture "Advanced Digital Design"
Outline
Global synchrony & clock distribution
types of synchrony
The GALS approach
communication
synchronization
Muller C-Element, Mutex & Arbiter
data-driven clock & pausible clock
TMR example with pausible clock
Even/Odd Synchronizer
works only for two periodic clocks whose frequency ratio lies within a certain range
avoids the performance penalty of synchronizers
largely eliminates the potential for metastability
for details see
[Dally & Tell, The Even/Odd Synchronizer, ASYNC 2010]
Types of Synchrony (1)
synchronous: identical frequency, constant phase relation
example: classical synchronous system driven by one clock source
mesochronous: identical frequency (no accumulating drift) but unknown, constant phase shift (bounded)
example: unbalanced clock tree
multisynchronous: identical frequency (no accumulating drift) but unknown, varying phase relationship (bounded)
example: jittering PLLs driven by the same source
Types of Synchrony (2)
ratiochronous: fixed (known) frequency ratio, identical source
example: source clock divided by different values
plesiochronous: same nominal clock frequency, low mutual drift
example: independent clock sources with the same nominal frequency
heterochronous: clocks totally unrelated but periodic
example: independent clock sources with different nominal frequencies
aperiodic: events arrive totally unrelated to the clock
example: sporadic event (pushbutton) that needs to be synchronized
[Figure: arrow alongside the list, labeled "uncorrelated"]
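To make the differences between the types of synchrony concrete, the sketch below computes where each receiver clock edge falls within the transmitter clock period, assuming ideal, jitter-free clocks. The function name `edge_phases` and all periods and offsets are illustrative assumptions. A mesochronous pair yields a constant phase, a ratiochronous pair a short repeating pattern, and a plesiochronous pair a slowly drifting phase; multisynchronous jitter is not modeled.

```python
from fractions import Fraction


def edge_phases(t_period, r_period, r_offset, n_edges):
    """Phase of each receiver rising edge within the transmitter period, in [0, 1).

    Assumed model: transmitter edges at k * t_period, receiver edges at
    r_offset + k * r_period (ideal clocks, no jitter).
    """
    return [((r_offset + k * r_period) % t_period) / t_period for k in range(n_edges)]


T = Fraction(1)  # transmitter period, normalized
# mesochronous: same frequency, unknown but constant phase shift
meso = edge_phases(T, T, Fraction(3, 10), 6)
# ratiochronous: known 3:2 frequency ratio derived from the same source
ratio = edge_phases(T, Fraction(2, 3), Fraction(1, 10), 6)
# plesiochronous: same nominal frequency, 0.1 % mismatch -> slow drift
plesio = edge_phases(T, Fraction(1001, 1000), Fraction(1, 10), 6)

print([str(p) for p in meso])
print([str(p) for p in ratio])
print([str(p) for p in plesio])
```

Exact fractions are used so that the repeating ratiochronous pattern and the constant plesiochronous drift per edge show up without rounding noise.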
Synchronization Concepts
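$$\mathrm{MTBU} = \frac{1}{FR} = \frac{e^{t_r/\tau_c}}{T_0 \, f_{clk} \, f_{dat}}$$
(t_r: available resolution time, τ_c: resolution time constant of the flip-flop, T_0: window parameter of the flip-flop, f_clk / f_dat: clock / data rate)
factor e^{-t_r/τ_c} of FR: probability of (not) resolving early enough
This is what the "waiting synchronizers" exploit.
FR can never become zero!
factor T_0 · f_clk · f_dat of FR: probability of (not) getting into metastability
Specialized synchronizers exploit a priori knowledge.
FR can become zero, if edges are properly aligned!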
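As a numerical sketch of the formula, reading it as MTBU = 1/FR with FR = T_0 · f_clk · f_dat · e^{-t_r/τ_c}, the code below evaluates both factors. The parameter values (τ_c = T_0 = 20 ps, 1 GHz clock, 100 MHz data) are assumed for illustration and do not describe any specific flip-flop. Waiting one extra clock period multiplies the MTBU by e^{t_r/τ_c}, yet FR stays above zero for any finite t_r.

```python
import math


def failure_rate(t_r, tau_c, t_0, f_clk, f_dat):
    """FR = T_0 * f_clk * f_dat * exp(-t_r / tau_c), in failures per second."""
    # T_0 * f_clk * f_dat: rate of entering metastability (data edges hitting the window)
    # exp(-t_r / tau_c):   probability of not having resolved after waiting t_r
    return t_0 * f_clk * f_dat * math.exp(-t_r / tau_c)


def mtbu(t_r, tau_c, t_0, f_clk, f_dat):
    """Mean time between upsets in seconds: MTBU = 1 / FR."""
    return 1.0 / failure_rate(t_r, tau_c, t_0, f_clk, f_dat)


# Illustrative (assumed) parameters: tau_c = T_0 = 20 ps, 1 GHz clock, 100 MHz data
params = dict(tau_c=20e-12, t_0=20e-12, f_clk=1e9, f_dat=100e6)
one_cycle = mtbu(t_r=1e-9, **params)   # wait one clock period
two_cycles = mtbu(t_r=2e-9, **params)  # wait two clock periods (extra FF stage)
print(f"{one_cycle:.3g} s vs {two_cycles:.3g} s")
```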
Data Delay Synchronizer
delay data to match FF timing requirements
need to determine appropriate delay:
static, a priori (mesochronous, ratiochronous)
dynamic, by phase measurement & prediction (plesiochronous)
[Figure: T data passes through delay D into a flip-flop clocked by R clk, yielding R data; D is set from a phase (φ) estimate]
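A minimal sketch of the static delay selection for a data delay synchronizer, assuming the phase φ of the data transitions relative to the R clk edge is known and the delay line offers discrete taps. The helper names, the 100 ps tap spacing and all timing values are hypothetical. The search picks the tap that places the data transitions farthest from the setup/hold window; a dynamic (plesiochronous) variant would rerun it whenever a new φ estimate becomes available.

```python
def setup_hold_margin(t, period, t_setup, t_hold):
    """Distance of a data transition at time t from the forbidden window around the
    receiver clock edges (at multiples of `period`); negative means a violation."""
    x = t % period
    return min(x - t_hold, (period - t_setup) - x)


def pick_data_delay(phi, period, t_setup, t_hold, taps):
    """Choose the delay tap that moves the data transitions (phase phi relative to
    R clk) as far as possible from the setup/hold window."""
    return max(taps, key=lambda d: setup_hold_margin(phi + d, period, t_setup, t_hold))


# Assumed example in ps: R clk period 1000, data change 20 ps before each R clk edge
period, t_setup, t_hold, phi = 1000, 50, 30, 980
taps = range(0, 1000, 100)  # hypothetical delay line with 100 ps steps
best = pick_data_delay(phi, period, t_setup, t_hold, taps)
print(best, setup_hold_margin(phi + best, period, t_setup, t_hold))  # prints: 500 450
```

For the clock delay synchronizer the same margin function applies: delaying R clk by d moves the data transitions to φ - d relative to the delayed edge.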
Clock Delay Synchronizer
delay R clock to match FF timing requirements
need to determine appropriate delay:
static, a priori (mesochronous, ratiochronous)
dynamic, by phase measurement & prediction (plesiochronous)
[Figure: T data is sampled by a flip-flop clocked by R clk delayed by D, yielding R data; D is set from a phase (φ) estimate]
Buffering Synchronizer
transmitter writes data to buffer with T clk
receiver reads data from buffer with R clk
pointers are maintained to avoid collision
pointer management needs clock domain crossing => metastability!
implementations: register with REQ/ACK, FIFO, ring buffer
buffer size determines elasticity
[Figure: T data is written into a buffer with T clk; R data is read from it with R clk]
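The following idealized model illustrates the buffering synchronizer: pointers that count modulo twice the buffer size keep "full" and "empty" distinguishable, and the buffer size decides how large a burst the transmitter can deliver before it has to wait. It deliberately ignores the clock domain crossing of the pointers, which is exactly where metastability enters in hardware. Class and function names as well as the step-based timing are assumptions for illustration.

```python
class RingBuffer:
    """Idealized ring buffer; pointers count modulo 2*size so 'full' and 'empty' differ.

    In hardware, wr belongs to the T clk domain and rd to the R clk domain; comparing
    them requires a clock domain crossing, which this sketch ignores.
    """

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.wr = 0  # write pointer (transmitter side)
        self.rd = 0  # read pointer (receiver side)

    def fill(self):
        return (self.wr - self.rd) % (2 * self.size)

    def write(self, item):
        if self.fill() == self.size:  # full: transmitter must wait
            return False
        self.slots[self.wr % self.size] = item
        self.wr = (self.wr + 1) % (2 * self.size)
        return True

    def read(self):
        if self.fill() == 0:  # empty: receiver must wait
            return None
        item = self.slots[self.rd % self.size]
        self.rd = (self.rd + 1) % (2 * self.size)
        return item


def stalls_for_burst(size, burst, read_every):
    """Transmitter offers one item per step; receiver reads every `read_every` steps.
    Returns (steps in which the transmitter had to wait, items in received order)."""
    fifo = RingBuffer(size)
    sent, stalls, step, received = 0, 0, 0, []
    while len(received) < burst:
        if sent < burst:
            if fifo.write(sent):
                sent += 1
            else:
                stalls += 1
        if step % read_every == read_every - 1:
            item = fifo.read()
            if item is not None:
                received.append(item)
        step += 1
    return stalls, received


print(stalls_for_burst(8, 6, 2))  # (0, [0, 1, 2, 3, 4, 5]): buffer absorbs the burst
print(stalls_for_burst(2, 6, 2))  # (3, [0, 1, 2, 3, 4, 5]): transmitter has to wait
```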
Global Synchrony?
Problem 1: Clock distribution
low-skew clock distribution becomes difficult for large chips and high frequencies
clock networks consume a considerable share of the power
Problem 2: Clock selection
SoC contains many IPs, each specified for its own frequency
specific frequencies required for some functions (e.g., interface standards)
dynamic local changes due to voltage & frequency scaling, clock & power gating
Globally Synchronous Design
whole design is "isochronic"
time conveyed by clock transitions
system-wide co-ordination of activities
assumes perfect clock distribution
Clock Distribution
[Figure: timing diagram of a transfer from TRG_src to TRG_snk, showing t_CO, t_pd and the windows in which the data are valid]
synchronous approach: clock skew (case 1) => setup violation
Clock Distribution
[Figure: timing diagram of a transfer from TRG_src to TRG_snk, showing t_CO, t_pd and the windows in which the data are valid]
synchronous approach: clock skew (case 2) => hold violation
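The two synchronous cases can be checked with the usual slack equations, sketched below under the assumption of a single t_CO value and a skew defined as the sink clock arrival minus the source clock arrival. All numbers are illustrative. A sink clock that arrives too early eats into the setup slack; one that arrives too late eats into the hold slack, and the hold slack does not depend on the clock period, so lowering the frequency cannot fix it.

```python
def transfer_slacks(period, t_co, t_pd_max, t_pd_min, t_setup, t_hold, skew):
    """Setup and hold slack of a register-to-register transfer TRG_src -> TRG_snk.

    skew: arrival of the clock edge at TRG_snk minus its arrival at TRG_src.
    A negative slack means a timing violation.
    """
    # Setup: data launched at one edge must be stable t_setup before the NEXT sink edge.
    setup_slack = (period + skew) - (t_co + t_pd_max + t_setup)
    # Hold: the fastest new data must not arrive before t_hold after the SAME sink edge.
    hold_slack = (t_co + t_pd_min) - (skew + t_hold)
    return setup_slack, hold_slack


# Assumed values in ps
timing = dict(period=1000, t_co=100, t_pd_max=700, t_pd_min=50, t_setup=80, t_hold=60)
for skew in (0, -200, 120):
    print(skew, transfer_slacks(skew=skew, **timing))
# 0 -> (120, 90); -200 -> (-80, 290): setup violation; 120 -> (240, -30): hold violation
```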
Clock Distribution
[Figure: transfer from TRG_src to TRG_snk controlled by a REQ/ACK handshake with completion detection, showing t_CO, t_pd and the data valid windows]
asynchronous approach: REQ delay
Clock Distribution
[Figure: transfer from TRG_src to TRG_snk controlled by a REQ/ACK handshake with completion detection, showing t_CO, t_pd and the data valid windows]
asynchronous approach: ACK delay
Clock Distribution
[Figure: transfer from TRG_src to TRG_snk controlled by a REQ/ACK handshake with completion detection, showing t_CO, t_pd and the data valid windows]
asynchronous approach: data delay
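In the asynchronous approach a delay only stretches the handshake cycle instead of causing a violation. The sketch below assumes that REQ and data leave the source at the start of a cycle, the sink latches once REQ has arrived and completion detection has fired, and the next cycle starts when ACK reaches the source; the return-to-zero phase of a 4-phase protocol is ignored. The function name and the delay values (in ps) are hypothetical.

```python
def handshake_cycle(t_co, req_delay, data_delay, completion, ack_delay):
    """Duration of one REQ/ACK transfer cycle (simplified, return-to-zero ignored).

    The sink latches only when REQ has arrived AND completion detection reports
    valid data; the source starts the next transfer only after ACK has arrived.
    """
    data_valid = t_co + data_delay + completion  # completion detection fires
    latch = max(req_delay, data_valid)           # wait for the slower of the two
    return latch + ack_delay


base = handshake_cycle(t_co=100, req_delay=300, data_delay=400, completion=50, ack_delay=300)
slow = [
    handshake_cycle(100, 900, 400, 50, 300),   # REQ delayed
    handshake_cycle(100, 300, 400, 50, 900),   # ACK delayed
    handshake_cycle(100, 300, 1000, 50, 300),  # data delayed
]
print(base, slow)  # 850 [1200, 1450, 1450]: longer cycles, but no violation
```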
The GALS Approach
SoC is clearly structured into IPs anyway
run each at its desired individual frequency
=> synchronous islands: efficient, well understood
communication between IPs has to bridge clock boundaries and may run over larger distances
=> asynchronous paradigm (handshake-based) better suited for composition
Globally Asynchronous Locally Synchronous (GALS)
First mentioned in the PhD thesis of Chapiro (Stanford, 1984)
A GALS Example
[Figure: SoC with synchronous islands CPU (2 GHz), PCI-IF (533 MHz), DSP (2.7 GHz) and USB-IF (24 MHz)]
Communication in GALS
Boundary synchronizers: direct data exchange controlled by handshake => synchronizer
Shared memory: data exchange decoupled through memory; shared memory needs arbitration
Dual-clock FIFOs: data exchange buffered through a FIFO queue; status flags need synchronization
Local clock stretching: direct data exchange; sender can halt receiver clock while data are in transition
Boundary Synchronizers
data moving over clock domain boundary => metastability problems
=> need to insert handshake … with synchronizers and (optional) buffers
control flow: sender / receiver strongly coupled
handshake loop limits speed
[Figure: CPU (2 GHz) sends data word 0xff14 to DSP (2.7 GHz); the handshake signals pass through synchronizers (S)]
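The remark that the handshake loop limits speed can be quantified roughly. The sketch assumes that each REQ or ACK transition needs up to (sync_stages + 1) cycles of the destination clock to be seen (one cycle of phase alignment plus a two-flip-flop synchronizer); the function name and this cost model are assumptions, not a statement about a particular implementation. The clock values are those of the CPU and DSP islands in the figure.

```python
def handshake_period(f_tx, f_rx, sync_stages=2, four_phase=True):
    """Rough worst-case time (s) per data word over a REQ/ACK boundary synchronizer.

    Assumption: each REQ transition needs up to (sync_stages + 1) receiver cycles to be
    seen in the receiver domain; each ACK transition needs the same number of
    transmitter cycles.
    """
    crossings = 2 if four_phase else 1  # 4-phase: REQ and ACK each toggle twice per word
    t_req = (sync_stages + 1) / f_rx
    t_ack = (sync_stages + 1) / f_tx
    return crossings * (t_req + t_ack)


cpu, dsp = 2e9, 2.7e9  # island clocks from the figure (transmitter: CPU, receiver: DSP)
p4 = handshake_period(cpu, dsp)
p2 = handshake_period(cpu, dsp, four_phase=False)
print(f"4-phase: {p4 * 1e9:.2f} ns/word, 2-phase: {p2 * 1e9:.2f} ns/word")
```

Under these assumptions one word takes about 5.2 ns with 4-phase signalling, i.e., more than ten CPU clock cycles.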
Shared Memory
perfect decoupling of data path
potential metastability problems at arbitration logic
potential blocking through arbitration
low speed, high effort => rarely used
[Figure: CPU (2 GHz) and DSP (2.7 GHz) exchange data word 0xff14 via a shared memory with arbitration]
Clocked FIFO
good decoupling of data path
potential metastability problems with pointer management
potential blocking through full / empty
high speed, high effort (register array)
[Figure: CPU (2 GHz) writes 0xff14 into a register array read by DSP (2.7 GHz); pointer management with synchronizers (S) generates the empty and full flags]
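Pointer management in a clocked (dual-clock) FIFO is commonly made robust by Gray-coding the pointers before synchronizing them; this technique is not stated on the slide and is shown here only as one widely used option. Because consecutive Gray codes differ in a single bit, a synchronizer that samples a pointer during an update sees either the old or the new value, so the derived flag is at worst asserted a little too long, never wrongly released. The sketch assumes a 4-entry FIFO with one extra pointer bit; all names are illustrative.

```python
ADDR_BITS = 2                # 4-entry FIFO (illustrative size)
PTR_BITS = ADDR_BITS + 1     # one extra wrap bit distinguishes full from empty
PTR_MASK = (1 << PTR_BITS) - 1


def bin_to_gray(b):
    return b ^ (b >> 1)


def gray_to_bin(g):
    b = 0
    while g:  # binary value = XOR of all right-shifts of the Gray code
        b ^= g
        g >>= 1
    return b


def is_empty(rd_gray, wr_gray_synced):
    # read side: own read pointer vs. write pointer synchronized into the R clk domain
    return rd_gray == wr_gray_synced


def is_full(wr_gray, rd_gray_synced):
    # write side: pointers differ by exactly 2**ADDR_BITS, which in Gray code means
    # the two most significant bits are inverted and all other bits are equal
    top_two = 0b11 << (PTR_BITS - 2)
    return wr_gray == (rd_gray_synced ^ top_two)


# every increment, including the wrap-around, flips exactly one bit
print(all(bin(bin_to_gray(i) ^ bin_to_gray((i + 1) & PTR_MASK)).count("1") == 1
          for i in range(1 << PTR_BITS)))
```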
Pausible Clocking
SRC: request SNK to stop its clock
SNK: acknowledge stopping of clock
open data latch (safe now!)
SRC: release SNK clock blocking
SNK: release ACK, close data latch
start clocking (data stable now!)
[Figure: CPU (2 GHz) passes 0xff14 through a data latch to DSP (2.7 GHz), which runs on a pausible clock]
Pausible Clocking
coupling of data path
potential metastability problems with pausible clock
potential blocking through handshake for pausing
high speed, moderate effort (pausible clock)
receiver clock distribution delay may cause problems
[Figure: CPU (2 GHz) passes 0xff14 through a data latch to DSP (2.7 GHz), which runs on a pausible clock]
Fundamental Asynchronous Building Blocks
will be needed for pausible clocking (and others) …
Muller C-Element
[Figure: C-element realized with an RS flip-flop (set, reset) from inputs a, b to output y; C-element symbols]
IF a = b THEN y = a ELSE hold y
David Eugene Muller (1924–2008), Professor at Univ. of Illinois. Muller, D. E.; Bartky, W. S. (1959), "A Theory of Asynchronous Circuits", Proc. Int'l Symp. Theory of Switching, Part 1 (Harvard Univ. Press): 204–243
Function of a MCE
consider an MCE with n inputs
AND for transitions: need a rising transition on all inputs for a rising output transition, and a falling transition on all inputs for a falling output transition
n-of-n threshold gate: change output only if all inputs agree on changing
voter: keep old state until all inputs agree on a change
memory element: storage loop like a D-latch, but with a different input stack
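A behavioral model of the n-input C-element makes the "n-of-n threshold gate with memory" view explicit: the output changes only when all inputs agree, otherwise it holds. The class name and the 0/1 encoding are illustrative; the model has no notion of delay or of the transistor-level circuits.

```python
class CElement:
    """Behavioral n-input Muller C-element (an n-of-n threshold gate with memory)."""

    def __init__(self, initial=0):
        self.y = initial

    def update(self, *inputs):
        if all(v == 1 for v in inputs):
            self.y = 1  # all inputs agree on 1
        elif all(v == 0 for v in inputs):
            self.y = 0  # all inputs agree on 0
        # otherwise the inputs disagree: hold the previous output
        return self.y


c = CElement()
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0)]]
print(trace)  # [0, 1, 1, 0, 0]
```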
Muller C-Element: Circuit
[Figure: transistor-level C-element implementations after Sutherland, Martin, and van Berkel]
Mutual Exclusion
purpose: decide order of asynchronous events
function: handle pairs of request_in / grant_out
requests may arrive in any order
MUTEX must activate only one grant_out at a time (respond to the first requester)
problem: resolve concurrent requests
=> metastability problem
[Figure: MUTEX symbol with inputs r1, r2 and outputs g1, g2]
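The MUTEX behaviour can be sketched as follows. A lone request is granted at once, while concurrent requests are modeled as a latch that may stay undecided (neither grant active) for a random number of steps before resolving to an arbitrary winner. The class name, the step-based timing and the probability `p_undecided` are assumptions for illustration; what the model does guarantee is the property that matters, namely that both grants are never active together.

```python
import random


class Mutex:
    """Behavioral two-way MUTEX with request/grant pairs (r1, g1) and (r2, g2)."""

    def __init__(self, p_undecided=0.5, seed=None):
        self.rng = random.Random(seed)
        self.p_undecided = p_undecided  # chance per step that a tie stays unresolved
        self.g = [0, 0]

    def step(self, r1, r2):
        r = (r1, r2)
        for i in (0, 1):  # 4-phase: a grant drops after its request has dropped
            if self.g[i] and not r[i]:
                self.g[i] = 0
        if not any(self.g):
            pending = [i for i in (0, 1) if r[i]]
            if len(pending) == 1:
                self.g[pending[0]] = 1  # single requester: grant immediately
            elif len(pending) == 2 and self.rng.random() >= self.p_undecided:
                self.g[self.rng.choice(pending)] = 1  # metastability resolved
        return tuple(self.g)


m = Mutex(seed=0)
print([m.step(1, 0), m.step(1, 1), m.step(0, 1), m.step(0, 0)])
# [(1, 0), (1, 0), (0, 1), (0, 0)]: r2 waits until r1 releases
```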
MUTEX: Circuit
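[Figure: cross-coupled SR-latch driven by r1, r2 with internal outputs g1', g2', followed by a metastability filter producing g1, g2]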
"Metastability filter": e.g., a low-threshold inverter
[from D. J. Kinniment, "Synchronization and Arbitration in Digital Systems", Wiley]
[Figure: waveforms of V_out,latch and V_out,inv over time t, with the levels V_meta and V_th,inv marked]
BUT: doesn't a low-threshold inverter produce glitches?
Popular MUTEX Implementation
C. L. Seitz, "Ideas about arbiters", Lambda, 1 (first quarter):10–14, 1980.
[Figure: SR-latch followed by a metastability filter]
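The exponential regeneration of the cross-coupled latch explains both why the filter is needed and why the MUTEX is value safe rather than time safe. Under a small-signal assumption, an initial imbalance Δv0 grows as Δv0 · e^{t/τ}, so the filter passes a grant after τ · ln(V_th/Δv0); this is the same exponential that appears in the MTBU formula. The function name, τ = 20 ps and the 0.4 V threshold are illustrative assumptions.

```python
import math


def resolution_time(dv0, tau, v_th):
    """Time until the latch output difference |dv0| * exp(t / tau) exceeds v_th.

    dv0:  initial imbalance left by (nearly) simultaneous requests
    tau:  regeneration time constant of the cross-coupled latch
    v_th: difference the metastability filter needs before it passes a grant
    """
    dv0 = abs(dv0)
    if dv0 == 0:
        return math.inf  # perfectly balanced: no bound on the resolution time
    if dv0 >= v_th:
        return 0.0       # requests far apart: decision is immediate
    return tau * math.log(v_th / dv0)


tau, v_th = 20e-12, 0.4  # assumed: 20 ps time constant, 0.4 V filter threshold
for dv0 in (1e-1, 1e-3, 1e-6, 1e-9):
    print(f"dv0 = {dv0:g} V -> grant after {resolution_time(dv0, tau, v_th) * 1e12:.0f} ps")
```

Each thousandfold reduction of the imbalance adds only τ · ln(1000) ≈ 7τ to the waiting time, but no finite bound holds for all inputs.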
MUTEX: Operation
[Figure: MUTEX circuit (r1, r2, g1', g2', g1, g2) and waveforms of r1, g1, r2, g2 and of the latch output V_out,FF relative to V_meta and V_th,inv]
4-phase protocol
MUTEX vs. Synchronizer
              Synchronizer                          MUTEX
purpose       synchronize asynchronous input        serialize concurrent requests
"safeness"    time safe                             value safe
important     fast resolution in both directions    never activate both grants
freedom       final decision not important          infinite resolution time
circuit       flip flop (special design)            SR-latch plus metastability filter
Arbiter: Principle
purpose: manage access of clients to shared resource(s)
method: handle pairs of request_in / grant_out, both on the client side and on the resource side
client requests may arrive in any order
arbiter must assign a resource to only one client at a time (respond to the first requester)
=> needs Mutual Exclusion (MUTEX)
Arbiter: Function
[Figure: arbiter connecting client 1 (C1r, C1g) and client 2 (C2r, C2g) to common resource 1 (R1r, R1g)]
can have more than two clients: “multiway arbiter”
can have more than one resource
PhD Naqvi: Fault-Tolerant NoC (incl. Arbiter)
Arbiter: Operation
[Figure: handshake waveforms of C1r, R1r, C2r, R1g, C1g, C2g]
Arbiter: Circuit
[Figure: arbiter circuit: a MUTEX (r1, r2, g1, g2) driven by C1r, C2r, two C-elements generating C1g, C2g, and the resource handshake R1r, R1g to the common resource]
allow one request at a time only
delay request until previous cycle finished
merge requests
relay grant to requester
keep grant alive until resource disables it
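A behavioral sketch of the two-client arbiter is given below. The `owner` variable takes the role of the MUTEX grant, and each client grant follows R1g like a C-element combining the MUTEX grant with the resource grant, so it stays active until the resource withdraws R1g; a new client is only admitted after the previous handshake has returned to zero. This is a step-based model with hypothetical names, not the gate-level circuit of the slide; the resource is assumed to answer R1r with R1g one step later.

```python
import random


class Arbiter:
    """Behavioral 2-client / 1-resource arbiter with 4-phase handshakes (not gate-level)."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.owner = None  # client currently holding the MUTEX (0, 1 or None)
        self.c_g = [0, 0]  # client grants C1g, C2g

    def step(self, c_r, r1_g):
        """c_r = (C1r, C2r); r1_g = R1g. Returns (R1r, (C1g, C2g))."""
        if self.owner is None:  # previous cycle fully finished
            waiting = [i for i in (0, 1) if c_r[i]]
            if waiting:
                self.owner = self.rng.choice(waiting)  # MUTEX decides ties arbitrarily
        r1_r = int(self.owner is not None and bool(c_r[self.owner]))  # merged request
        if self.owner is not None:
            self.c_g[self.owner] = r1_g  # relay grant; kept until R1g falls
            if not r1_g and not c_r[self.owner]:  # return-to-zero complete
                self.owner = None
        return r1_r, tuple(self.c_g)


def run(seed, steps=20):
    arb = Arbiter(seed)
    c_r, r1_g, history = [1, 1], 0, []  # both clients request at the same time
    for _ in range(steps):
        r1_r, c_g = arb.step(tuple(c_r), r1_g)
        history.append(c_g)
        for i in (0, 1):
            if c_g[i]:
                c_r[i] = 0  # client uses the grant once, then releases its request
        r1_g = r1_r         # resource answers one step later
    return history


print(run(seed=3))
```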
Tree Arbiter
[Figure: tree arbiter: one arbiter for clients 1 and 2 and one for clients 3 and 4; their resource-side signals (R1r, R1g) act as client signals (C1r/C1g, C2r/C2g) of a root arbiter connected to the common resource]
can add further tree levels to handle more clients
Data-Driven Clocking
Principle:
as soon as new data arrive => start clocking
determine number k of clock cycles required to process new data
stop clocking after k cycles, wait for next data
Properties:
need to switch clock on and off => beware spurious clock pulses!
no metastability problem: data stable as soon as consumer clock starts
potential for power saving
useful for very specific applications only (no pipe!)
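A minimal sketch of data-driven clocking, assuming a fixed k, a first rising edge half a period after the start, and queuing of data that arrive while the clock is still running; function name and values are hypothetical. Between bursts no edges are produced, which is where the power saving comes from.

```python
def data_driven_edges(arrivals, k, half_period):
    """Rising-edge times of a clock that produces exactly k cycles per data arrival."""
    edges, idle_from = [], 0
    for t in sorted(arrivals):
        start = max(t, idle_from)  # wait until the previous burst is done
        edges += [start + half_period + i * 2 * half_period for i in range(k)]
        idle_from = start + k * 2 * half_period  # clock stops after k cycles
    return edges


print(data_driven_edges([0, 100], k=3, half_period=5))  # [5, 15, 25, 105, 115, 125]
```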
Data-Driven Clock: Circuit
[Figure: oscillator loop with delay D producing CLK out]
CLK half period determined by D
Data-Driven Clock: Circuit
[Figure: C-element circuit with REQ, ACK, delay D and output CLK out, plus REQ/ACK waveforms]
transition on REQ answered by transition on CLK out
min. CLK half period determined by D
metastability?
Pausible Clocking
Principle:
producer requests consumer's clock to pause
data are provided to input register during idle time
consumer's clock may resume, either free running ("pausible clock") or with one cycle only ("stoppable clock")
Properties:
need to switch clock on and off
=> beware spurious clock pulses!
=> beware of clock tree delays!
producer controls consumer's clock (blocking!)
applications must be able to cope with paused clock
Pausible Clock: Circuit
[Figure: C-element with delay D whose ACK is fed back through an inverter as the next REQ; output CLK out]
inverter generates next REQ from ACK
=> self-oscillation
Pausible Clock: Circuit
[Figure: the self-oscillating loop with a MUTEX inserted; external signals REQ', ACK'; output CLK out]
external unit can safely stop CLK by activating REQ'
… and gets ACK' as a response
metastability?
Pausible Clock: Circuit
[Figure: the same circuit (C-element, MUTEX, delay D, REQ'/ACK', CLK out) redrawn, with the timing loop and the control loop marked]
redraw: replace inverters by inverted C-element output
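The interplay of timing loop and MUTEX can be approximated by an event model: the clock holds the MUTEX during its high phase, REQ' is granted (ACK') only while the clock is low, and a granted REQ' blocks the next rising edge until it is released. This is a strongly simplified sketch with hypothetical names, not the Mullins/Moore circuit; ties between the clock and REQ' are resolved in favour of REQ' here, whereas a real MUTEX may decide either way, possibly after a metastable delay. Requests are assumed to be at least one clock period apart.

```python
def pausible_clock(t_end, half_period, pause_requests):
    """Rising edges of a pausible ring-oscillator clock and the ACK' grant times.

    pause_requests: sorted (t_req, hold) pairs; REQ' rises at t_req and the external
    unit keeps the clock stopped for `hold` time units after receiving ACK'.
    """
    edges, acks = [], []
    due, last_fall = 0, 0  # time the next rising edge is due; end of last high phase
    pending = list(pause_requests)
    while due < t_end:
        if pending and pending[0][0] <= due:  # REQ' wins the MUTEX before the clock
            t_req, hold = pending.pop(0)
            ack = max(t_req, last_fall)       # ACK' only once the clock is low
            acks.append(ack)
            due = max(due, ack + hold)        # clock resumes after REQ' is released
            if due >= t_end:
                break
        edges.append(due)
        last_fall = due + half_period
        due += 2 * half_period
    return edges, acks


print(pausible_clock(40, 5, []))          # ([0, 10, 20, 30], [])
print(pausible_clock(40, 5, [(12, 15)]))  # ([0, 10, 30], [15]): edge at 20 postponed
```

Since the oscillator simply starts again after the release, its phase adapts to REQ', which is why no metastability arises at restart.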
Pausible Clock: n Clients
[Figure: pausible clock with an arbiter in front of the MUTEX, serving external requests REQ1/ACK1 and REQ2/ACK2; output CLK out]
for more external sources, an arbiter can be added before the MUTEX
R. Mullins and S. Moore, "Demystifying Data-Driven and Pausible Clocking Schemes", Proc. 13th Intl. Symp. on Advanced Research in Asynchronous Circuits and Systems (ASYNC), 2007, pp. 175–185
Pausible Clock vs. Crystal
pros: cheap to implement internally; no extra pins; no mechanical issues (acceleration); stoppable
cons: arbiter is not a standard cell; frequency is not as stable (PVT); frequency is not as high; lacking tool support
Stoppable Crystal Clock?
simply gating the crystal clock to stop it … leads to glitches
need a synchronizer for the stop input
gain of the stoppable clock solution versus the pure synchronizer (REQ/ACK) approach?
[Figure: clock gate from clk_in to clk_out, with the stop input passing through a synchronizer (sync)]
[R. Najvirt, A. Steininger. Equivalence of Clock Gating and Synchronization with Applicability to GALS Communication. PATMOS 2014]
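Why plain gating glitches, and what a glitch-free gate does instead, can be seen in a sampled-waveform sketch. The naive AND gate truncates the pulse that is in progress when stop changes; a latch that is transparent only while clk_in is low defers the change to the next low phase. The period, stop time and function names are assumptions. The sketch does not model the latch going metastable when stop changes right at the falling edge of clk_in, which is why the stop input still needs a synchronizer.

```python
PERIOD = 10  # clk_in period in simulation steps (assumed)


def clk_in(t):
    return 1 if t % PERIOD < PERIOD // 2 else 0


def naive_gate(stop_time, t_end):
    """clk_out = clk_in AND NOT stop, with stop changing at an arbitrary time."""
    return [clk_in(t) & (0 if t >= stop_time else 1) for t in range(t_end)]


def latch_gate(stop_time, t_end):
    """Enable captured by a latch that is transparent only while clk_in is low."""
    out, enable = [], 1
    for t in range(t_end):
        if clk_in(t) == 0:
            enable = 0 if t >= stop_time else 1
        out.append(clk_in(t) & enable)
    return out


def high_pulse_widths(wave):
    widths, run = [], 0
    for v in wave + [0]:
        if v:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    return widths


print(high_pulse_widths(naive_gate(23, 40)))  # [5, 5, 3]: truncated pulse (glitch)
print(high_pulse_widths(latch_gate(23, 40)))  # [5, 5, 5]: only full pulses
```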
Metastability Comparison
pausible clock (ring oscillator) can be safely stopped through mutex
phase adapts to gate signal upon start => no metastability issue
stoppable clock (crystal oscillator) could be safely stopped through mutex
phase does NOT adapt to gate signal upon start => metastability issue
cannot use mutex, since BOTH edges of gate need sync
Time safe or value safe?
Time safe: need result at a given point in time
accept the risk that the FF has not yet decided
example: synchronizer
Value safe: take result only after decision
accept that there is no time bound for this
example: mutex
=> FR = 0
How far you can get…
use the original circuit for pausible clocking
this will allow metastability-free switching on and off
add a C-element to the timing loop
this will eventually adjust CLK out to ref CLK, if ref CLK is slightly slower than the free-running clock
delay D guarantees min. pulse width after switching on
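[Figure: pausible clock (C-element, MUTEX with REQ'/ACK', delay D, output CLK out) with an additional C-element in the timing loop driven by ref CLK]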
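The locking behaviour can be checked with a simple timing model of the added C-element: each output transition waits for both the ref CLK transition and the local loop being ready again one free-running half period after the previous transition. Names and numbers are illustrative, and the MUTEX and gate delays are ignored. With a slightly slower ref CLK the initial lag shrinks by the period difference at every transition until the output follows ref CLK; with a faster ref CLK the lag keeps growing.

```python
def locked_transitions(ref, loop_delay, first_ready):
    """Output transitions of a C-element joining ref CLK and the local timing loop.

    ref:         transition times of ref CLK
    loop_delay:  free-running half period of the local loop
    first_ready: time the local loop is first ready (e.g. after switching on)
    """
    out, ready = [], first_ready
    for r in ref:
        t = max(r, ready)       # C-element fires once both inputs have made the transition
        out.append(t)
        ready = t + loop_delay  # local loop ready again one free-running half period later
    return out


slower_ref = [11 * i for i in range(30)]  # ref half period 11 > loop half period 10
out = locked_transitions(slower_ref, loop_delay=10, first_ready=20)
lag = [o - r for o, r in zip(out, slower_ref)]
print(lag)  # 20, 19, ..., 1, then 0 from transition 20 on: locked to ref CLK
```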
Practical Implementation
use 3 parallel loops to allow for more resolution time [R. Najvirt, A. Steininger. How to synchronize a Pausible Clock to a Reference, ASYNC 2015]
Conventional TMR
Advantages: masks all single faults
Drawbacks: single clock source; no recovery
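The TMR voter itself is a bitwise 2-out-of-3 majority function, sketched below with an illustrative data word and fault. It masks any fault confined to one replica, but it only corrects the voted output: the faulty replica keeps its wrong state, which is the "no recovery" drawback.

```python
def vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


good = 0xff14
faulty = good ^ 0x0100  # single-bit fault in one replica (illustrative)
print(hex(vote(good, faulty, good)))  # 0xff14
```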
GALS-TMR
use independent clocks => avoid single point of failure
cannot do concurrent voting, since operation is not in sync
instead, vote over FF state at predefined intervals
PhD Lechner: Fault Tolerant GALS Architecture
GALS-TMR Details
every n-th clock cycle:
stop own clock
synchronize with the others
perform recovery step
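The periodic voting of GALS-TMR can be sketched at the level of replica states. Each replica advances n steps on its own clock, then all of them stop, and the majority state overwrites every replica. The application logic `next_state`, the fault injection and all parameters are hypothetical; the sequential loop over replicas stands for independent clocks, which is harmless here because only the checkpoint states are compared.

```python
from collections import Counter


def next_state(s):
    # Hypothetical application logic of each replica: an 8-bit counter with step 3
    return (s + 3) % 256


def gals_tmr(init, n, rounds, fault=None):
    """Three replicas advance n steps on their own clocks, then stop, synchronize and vote.

    fault: optional (round, replica, value) that corrupts one replica's state at the
    start of that round (transient fault, illustrative).
    """
    states = [init] * 3
    for r in range(rounds):
        if fault is not None and fault[0] == r:
            states[fault[1]] = fault[2]
        for i in range(3):  # no lock-step: each replica runs on its own
            for _ in range(n):
                states[i] = next_state(states[i])
        # checkpoint: every replica has stopped its clock -> vote over FF state
        value, votes = Counter(states).most_common(1)[0]
        if votes >= 2:
            states = [value] * 3  # recovery step: the faulty replica is overwritten
    return states


print(gals_tmr(0, n=4, rounds=3), gals_tmr(0, n=4, rounds=3, fault=(1, 2, 99)))
# [36, 36, 36] [36, 36, 36]: the fault in replica 2 is repaired at the next checkpoint
```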
Summary (1)
The generally used MTBU formula does not assume any knowledge about the input signal and its relation to the clock. In practice, such knowledge can often be exploited to optimize the synchronizer.
Synchrony is not a binary property; there is a range of globally synchronous, mesochronous, plesiochronous and heterochronous systems.
Asynchronous systems are tolerant against delays, while synchronous systems are not. The GALS approach therefore makes long-distance communication asynchronous, while retaining the efficient and well-proven synchronous paradigm for locally restricted islands.
Summary (2)
GALS allows choosing the most appropriate clock for each island.
Communication in GALS can be based on synchronizers, shared memory, FIFOs or pausible clocking.
A data driven clock is activated on demand only when data arrives to be processed.
A pausible clock can be stopped on demand. This is useful in GALS when moving data from one domain to the other, as it confines the potential for metastability to the arbiter.
Even a fault-tolerant TMR solution based on pausible clocks can be implemented that avoids the clock source as a single point of failure.
Summary (3)
For the Muller C-Element, if both inputs match, the output assumes their value; otherwise it holds its previous value.
The purpose of a MUTEX element is to select one among two (or more) possibly concurrent client requests. It may remain undecided for an arbitrary time, but never selects more than one client.
The purpose of an arbiter is to grant access to one (or more) resource(s) shared between two (or more) clients. Again, access must be granted to only one client at a time.