Advanced Digital Design GALS Design Andreas Steininger Vienna University of Technology

Post on 17-Jan-2016

Advanced Digital Design: GALS Design

Andreas Steininger, Vienna University of Technology

Lecture "Advanced Digital Design" © A. Steininger & M. Delvai / TU Vienna

Outline

Global synchrony & clock distribution
  types of synchrony

The GALS approach
  communication
  synchronization
  Muller C-Element, Mutex & Arbiter
  data driven clock & pausible clock
  TMR example with pausible clock

Even/Odd Synchronizer

works for two periodic clocks only, with frequency ratio within a certain range
avoids performance penalty of synchronizers
largely eliminates potential for metastability
for details see [Dally & Tell, The Even/Odd Synchronizer, ASYNC 2010]

Types of Synchrony (1)

synchronous
  identical frequency, constant phase relation
  classical synchronous system driven by one clock source

mesochronous
  identical frequency (no accumulating drift) but unknown, constant phase shift (bounded)
  example: unbalanced clock tree

multisynchronous
  identical frequency (no accumulating drift) but unknown, varying phase relationship (bounded)
  example: jittering PLLs driven by the same source

Types of Synchrony (2)

ratiochronous
  fixed (known) frequency ratio, identical source
  example: source clock divided by different values

plesiochronous
  same nominal clock frequency, mutual (low) drift
  independent clock sources with same nominal frequency

heterochronous
  clocks totally unrelated but periodic
  independent clock sources with different nominal frequency

aperiodic
  events arrive totally unrelated to clock
  sporadic event (pushbutton) needs to be synchronized

[Side label on slide: uncorrelated]
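
The taxonomy on the two "Types of Synchrony" slides can be read as a small decision procedure. The following is only a sketch of that reading; the function name `classify` and its parameters are hypothetical and not part of the lecture.

```python
def classify(periodic, same_source, same_nominal_frequency, phase="known"):
    """Map clock-relation properties to the synchrony type named on the slides.

    All parameter names are illustrative assumptions.
    phase is one of "known", "unknown_constant", "unknown_varying" and is
    only consulted for clocks of identical frequency from the same source.
    """
    if not periodic:
        return "aperiodic"  # sporadic events, e.g. a pushbutton
    if same_source:
        if not same_nominal_frequency:
            return "ratiochronous"  # fixed, known frequency ratio
        return {
            "known": "synchronous",
            "unknown_constant": "mesochronous",
            "unknown_varying": "multisynchronous",
        }[phase]
    # independent clock sources
    return "plesiochronous" if same_nominal_frequency else "heterochronous"


print(classify(True, True, True, "unknown_constant"))  # unbalanced clock tree
print(classify(True, False, True))                     # two crystals, same nominal frequency
```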

Synchronization Concepts

$$MTBU = \frac{1}{FR} = \frac{\exp(t_r / \tau_c)}{\lambda_{dat} \cdot f_{clk} \cdot T_0}$$

Exponential term $\exp(t_r / \tau_c)$: probability of (not) resolving early enough
  This is what the "waiting synchronizers" exploit
  FR can never become zero!

Rate term $\lambda_{dat} \cdot f_{clk} \cdot T_0$: probability of (not) getting into metastability
  Specialized synchronizers exploit a priori knowledge
  FR can become zero, if edges properly aligned!
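
To get a feeling for the exponential term, the MTBU expression can be evaluated numerically. This is a minimal sketch; the function name and all numeric values below are made-up illustrations, not data from the lecture.

```python
import math


def mtbu(t_r, tau_c, t0, f_clk, lambda_dat):
    """Mean time between upsets in seconds (all arguments in SI units).

    t_r        resolution time granted to the synchronizer
    tau_c      metastability time constant of the flip-flop
    t0         metastability window parameter T_0
    f_clk      clock frequency
    lambda_dat rate of input data transitions
    """
    return math.exp(t_r / tau_c) / (t0 * f_clk * lambda_dat)


# Hypothetical example values, chosen only to show the trend.
base = mtbu(t_r=1e-9, tau_c=50e-12, t0=100e-12, f_clk=100e6, lambda_dat=1e6)
more = mtbu(t_r=2e-9, tau_c=50e-12, t0=100e-12, f_clk=100e6, lambda_dat=1e6)
print(f"MTBU with 1 ns resolution time: {base:.3e} s")
print(f"MTBU with 2 ns resolution time: {more:.3e} s")
```

Each additional $\tau_c$ of resolution time multiplies the MTBU by e, which is why waiting longer helps so much, but the failure rate never reaches zero this way.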

Data Delay Synchronizer

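delay data to match FF timing requirements
need to determine appropriate delay
  static, a priori (mesochronous, ratiochronous)
  dynamic, by phase measurement & prediction (plesiochronous)

[Figure: T data passes a delay element before the D input of the receiving flip-flop (output R data); the delay is set by a phase (φ) estimate derived from R clk, which drives CLK]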
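
How the "appropriate delay" might be determined from a phase estimate can be sketched as follows. This is an illustrative model only: the forbidden window is assumed to be the setup time before and the hold time after each receiver clock edge, and the names `required_delay`, `t_su` and `t_h` are assumptions, not taken from the slides.

```python
def required_delay(phi, period, t_su, t_h):
    """Extra data delay that moves a transition out of the forbidden window.

    phi    estimated arrival time of the data transition, measured from the
           most recent receiver clock edge (same unit as period)
    period receiver clock period
    t_su   assumed setup time before the clock edge
    t_h    assumed hold time after the clock edge
    """
    phi %= period
    if phi < t_h:
        # transition falls into the hold window of the previous edge
        return t_h - phi
    if phi > period - t_su:
        # transition falls into the setup window of the next edge:
        # push it past that edge and its hold window
        return (period - phi) + t_h
    return 0.0  # already safe, no delay needed


print(required_delay(5.0, 10.0, 2.0, 1.0))  # safe position
print(required_delay(9.0, 10.0, 2.0, 1.0))  # inside setup window
```

In the static (mesochronous, ratiochronous) case this value would be computed once; in the plesiochronous case it would have to be updated as the phase drifts.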

Clock Delay Synchronizer

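delay R clock to match FF timing requirements
need to determine appropriate delay
  static, a priori (mesochronous, ratiochronous)
  dynamic, by phase measurement & prediction (plesiochronous)

[Figure: T data drives the D input of the receiving flip-flop (output R data); R clk passes a delay element before CLK, with the delay set by a phase (φ) estimate]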

Buffering Synchronizer

transmitter writes data to buffer with T clk
receiver reads data from buffer with R clk
pointers are maintained to avoid collision
  needs clock domain crossing => metastability!
register with REQ/ACK, FIFO, ring buffer
buffer size determines elasticity

[Figure: buffer between T data (written with T clk) and R data (read with R clk)]
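
The pointer bookkeeping of such a buffer can be illustrated with a purely functional ring-buffer model. This sketch ignores the clock domain crossing entirely: in hardware, the comparison of the write and read pointers is exactly the place where synchronization (and hence metastability) comes in. The class and method names are hypothetical.

```python
class RingBuffer:
    """Single-threaded model of a buffering synchronizer's data path."""

    def __init__(self, size):
        self.slots = [None] * size
        self.wr = 0  # write pointer, advanced by the transmitter (T clk)
        self.rd = 0  # read pointer, advanced by the receiver (R clk)

    def full(self):
        return self.wr - self.rd == len(self.slots)

    def empty(self):
        return self.wr == self.rd

    def write(self, value):
        """Transmitter side: returns False if the buffer is full."""
        if self.full():
            return False
        self.slots[self.wr % len(self.slots)] = value
        self.wr += 1
        return True

    def read(self):
        """Receiver side: returns None if the buffer is empty."""
        if self.empty():
            return None
        value = self.slots[self.rd % len(self.slots)]
        self.rd += 1
        return value


buf = RingBuffer(2)
buf.write(0xFF14)
print(hex(buf.read()))
```

The buffer size bounds how far the transmitter may run ahead of the receiver, which is the "elasticity" mentioned on the slide.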

Global Synchrony?

Problem 1: Clock distribution
  Low-skew clock distribution becomes difficult for large chips and high frequencies
  Clock networks consume a considerable share of the power

Problem 2: Clock selection
  SoC contains many IPs, each specified for its own frequency
  specific frequencies required for some functions (interface standards, e.g.)
  dynamic local changes due to voltage & frequency scaling, clock & power gating

Globally Synchronous Design

whole design is "isochronic"
time conveyed by clock transitions
system-wide co-ordination of activities
assumes perfect clock distribution

Clock Distribution

[Timing diagram, synchronous approach, clock skew 1: TRGsrc and TRGsnk with tCO, tpd and data-valid windows; the skew leads to a setup violation]

Clock Distribution

[Timing diagram, synchronous approach, clock skew 2: TRGsrc and TRGsnk with tCO, tpd and data-valid windows; the skew leads to a hold violation]
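
The two skew cases can be reproduced with the usual setup and hold inequalities. This is a simplified sketch under stated assumptions: skew is defined here as sink clock arrival minus source clock arrival, a single `t_pd` is used for both the longest and the shortest path, and the setup time `t_su` and hold time `t_h` are names introduced for illustration.

```python
def check_timing(period, t_co, t_pd, t_su, t_h, skew):
    """Return (setup_ok, hold_ok) for one source-to-sink register path.

    Data launched at the source edge is valid at the sink after t_co + t_pd.
    Setup: it must arrive t_su before the NEXT sink edge (at period + skew).
    Hold:  it must not change until t_h after the SAME sink edge (at skew).
    """
    arrival = t_co + t_pd
    setup_ok = arrival + t_su <= period + skew
    hold_ok = arrival >= skew + t_h
    return setup_ok, hold_ok


# Hypothetical numbers (arbitrary time units).
print(check_timing(10, 1, 6, 1, 1, skew=0))   # balanced clock tree
print(check_timing(10, 1, 6, 1, 1, skew=-3))  # sink clock early
print(check_timing(10, 1, 6, 1, 1, skew=7))   # sink clock late
```

With these numbers an early sink clock eats into the setup margin, while a late sink clock lets new data overwrite the old value before it has been captured.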

Clock Distribution

[Timing diagram, asynchronous approach, REQ delay: TRGsrc and TRGsnk with tCO, tpd and data-valid windows; REQ, completion detection and ACK]

Clock Distribution

[Timing diagram, asynchronous approach, ACK delay: TRGsrc and TRGsnk with tCO, tpd and data-valid windows; REQ, completion detection and ACK]

Clock Distribution

[Timing diagram, asynchronous approach, data delay: TRGsrc and TRGsnk with tCO, tpd and data-valid windows; REQ, completion detection and ACK]

The GALS Approach

SoC is clearly structured into IPs anyway

run each at its desired individual frequency
  => synchronous islands: efficient, well understood

communication between IPs
  has to bridge clock boundaries
  may run over larger distances
  => asynchronous paradigm (handshake-based) better suited for composition

Globally Asynchronous Locally Synchronous (GALS)

First mention in PhD thesis by Chapiro / Stanford 84

A GALS Example

[Figure: four synchronous islands with individual clocks]
CPU: 2 GHz
PCI-IF: 533 MHz
DSP: 2.7 GHz
USB-IF: 24 MHz

Communication in GALS

Boundary Synchronizers
  direct data exchange
  controlled by handshake => synchronizer

Shared Memory
  data exchange decoupled through memory
  shared memory needs arbitration

Dual-Clock FIFOs
  data exchange buffered through FIFO queue
  status flags need synchronization

Local Clock Stretching
  direct data exchange
  sender can halt receiver clock while data in transition

Boundary Synchronizers

data moving over clock domain boundary
metastability problems => need to insert handshake
  …with synchronizers and (optional) buffers
control flow sender / receiver strongly coupled
handshake loop limits speed

[Figure: CPU (2 GHz) passes data word 0xff14 to DSP (2.7 GHz); handshake signals cross the boundary through synchronizers (S)]

Shared Memory

perfect decoupling of data path
potential metastability problems at arbitration logic
potential blocking through arbitration
low speed, high efforts => rarely used

[Figure: CPU (2 GHz) and DSP (2.7 GHz) exchange data word 0xff14 through a shared memory guarded by arbitration]

Clocked FIFO

good decoupling of data path
potential metastability problems with pointer management
potential blocking through full / empty
high speed, high efforts (reg array)

[Figure: CPU (2 GHz) writes data word 0xff14 into a reg array read by DSP (2.7 GHz); pointer management with synchronizer (S) generates the empty / full flags]

Pausible Clocking

SRC: request SNK to stop clock
SNK: acknowledge stopping of clock, open data latch (safe now!)
SRC: release SNK clock blocking
SNK: release ACK, close data latch, start clocking (data stable now!)

[Figure: CPU (2 GHz) passes data word 0xff14 through a latch to DSP (2.7 GHz), which runs on a pausible clock]

Pausible Clocking

coupling of data path
potential metastability problems with pausible clock
potential blocking through handshake for pausing
high speed, moderate efforts (pausible clock)
receiver clock distribution delay may cause problems

[Figure: CPU (2 GHz) passes data word 0xff14 through a latch to DSP (2.7 GHz), which runs on a pausible clock]

Fundamental Asynchronous Building Blocks

will be needed for pausible clocking (and others) …

Muller C-Element

[Figure: C-element symbol (C) with inputs a, b and output y; implementation with an RS latch (set, reset)]

IF a = b THEN y = a ELSE hold y

David Eugene Muller (1924 – 2008), Professor at Univ. of Illinois:
Muller, D. E.; Bartky, W. S. (1959), "A Theory of Asynchronous Circuits", Proc. Int'l Symp. Theory of Switching, Part 1 (Harvard Univ. Press): 204–243

Function of an MCE

consider an MCE with n inputs

AND for transitions
  need a rising transition (↑) on all inputs for a rising output
  need a falling transition (↓) on all inputs for a falling output

n-of-n threshold gate
  change output only if all inputs agree on changing

voter
  keep old state until agreement on change

memory element
  storage loop like D-latch, different input stack
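
The behavioural rule "IF a = b THEN y = a ELSE hold y" generalizes directly to n inputs. The following is a small behavioural model, not a circuit description; the class name `CElement` is chosen for illustration.

```python
class CElement:
    """Behavioural model of a Muller C-element with n inputs."""

    def __init__(self, initial=0):
        self.y = initial  # stored output (the "memory element" view)

    def update(self, *inputs):
        """Apply new input values and return the output.

        The output follows the inputs only when all of them agree;
        otherwise the previous output is held.
        """
        if all(v == inputs[0] for v in inputs):
            self.y = inputs[0]
        return self.y


c = CElement()
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, "->", c.update(a, b))
```

The trace shows the hysteresis: the output rises only after both inputs have risen and falls only after both have fallen.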

Muller C-Element: Circuit

[Figure: three transistor-level C-element implementations, attributed to Sutherland, Martin and van Berkel]

Mutual Exclusion

purpose: decide order of asynchronous events

function: handle pairs of request_in / grant_out
  requests may arrive in any order
  MUTEX must activate only one grant_out at a time (respond to the first requester)

problem: resolve concurrent requests => metastability problem

[Figure: MUTEX symbol with requests r1, r2 and grants g1, g2]

MUTEX: Circuit

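[Figure: MUTEX built from an SR-latch (inputs r1, r2; internal outputs g1’, g2’) followed by a "metastability filter" that produces g1, g2; plot of Vout,latch and Vout,inv over t with the levels Vmeta and Vth,inv]

"Metastability filter": e.g., low-threshold inverter

BUT: Doesn’t a low-threshold inverter produce glitches?

[from D. J. Kinniment, "Synchronization and Arbitration in Digital Systems", Wiley]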

Popular MUTEX Implementation

[Figure: SR-latch followed by metastability filter]

C.L. Seitz, Ideas about arbiters, Lambda, 1 (first quarter):10–14, 1980.

MUTEX: Operation

[Figure: MUTEX circuit with r1, r2, g1’, g2’, g1, g2; plot of Vout,FF over t with the levels Vmeta and Vth,inv; waveforms of r1, g1, r2, g2 following a 4-phase protocol]
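
At the level of events, the MUTEX can be modelled as a tiny state machine that remembers the current owner. This sketch abstracts away the analog behaviour: the arbitrary outcome of metastability resolution on simultaneous requests is replaced by a fixed, hypothetical `tie_break` parameter, and the resolution time is not modelled at all.

```python
class Mutex:
    """Event-level model of a two-input MUTEX (4-phase requests)."""

    def __init__(self, tie_break=0):
        self.owner = None           # index of the client holding the grant
        self.tie_break = tie_break  # stand-in for the arbitrary decision

    def step(self, r1, r2):
        """Apply request levels and return the grant levels (g1, g2)."""
        req = (bool(r1), bool(r2))
        # A grant is withdrawn only when its own request is released.
        if self.owner is not None and not req[self.owner]:
            self.owner = None
        if self.owner is None:
            if req[0] and req[1]:
                self.owner = self.tie_break  # concurrent requests
            elif req[0]:
                self.owner = 0
            elif req[1]:
                self.owner = 1
        return (self.owner == 0, self.owner == 1)


m = Mutex()
for r1, r2 in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    print((r1, r2), "->", m.step(r1, r2))
```

Whatever the request pattern, the model never returns two active grants, which is the "value safe" property of the real element.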

MUTEX vs. Synchronizer

Synchronizer
  purpose: synchronize asynchronous input
  "safeness": time safe
  important: fast resolution in both directions
  freedom: final decision not important
  circuit: flip flop (special design)

MUTEX
  purpose: serialize concurrent requests
  "safeness": value safe
  important: never activate both grants
  freedom: infinite resolution time
  circuit: SR-latch plus metastability filter

Arbiter: Principle

purpose: manage access of clients to shared resource(s)

method: handle pairs of request_in / grant_out
  on the client side
  on the resource side

client requests may arrive in any order
arbiter must assign one resource to only one client at a time (respond to the first requester)

=> needs Mutual Exclusion (MUTEX)

Arbiter: Function

[Figure: arbiter between Client 1 (C1r, C1g), Client 2 (C2r, C2g) and Common Resource 1 (R1r, R1g)]

can have more than two clients: "multiway arbiter"

can have more than one resource

PhD Naqvi: Fault Tolerant NoC (incl. Arbiter)

Arbiter: Operation

[Timing diagram: waveforms of C1r, R1r, C2r, R1g, C1g, C2g]

Arbiter: Circuit

[Figure: arbiter circuit; client 1 (C1r, C1g) and client 2 (C2r, C2g) connect through a MUTEX (r1, r2, g1, g2) and two C-elements (C) to the common resource (R1r, R1g)]

Annotations in the figure:
  allow one request at a time only
  delay request until previous cycle finished
  merge requests
  relay grant to requester
  keep grant alive until resource disables it

Tree Arbiter

[Figure: tree arbiter; one arbiter serves Client 1 and Client 2, a second one serves Client 3 and Client 4, and a third arbiter merges their resource-side ports (R1r, R1g) towards the Common Resource; each arbiter has the ports C1r, C1g, C2r, C2g, R1r, R1g]

can add further tree levels to handle more clients

Data-Driven Clocking

Principle:
  as soon as new data arrive => start clocking
  determine number k of clock cycles required to process new data
  stop clocking after k cycles, wait for next data

Properties:
  need to switch clock on and off => beware spurious clock pulses!
  no metastability problem: data stable as soon as consumer clock starts
  potential for power saving
  useful for very specific applications only (no pipe!)
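
The principle can be mimicked in software by a generator that produces exactly k "clock cycles" per data item and then stops until the next item arrives. This is only an analogy under stated assumptions; `data_driven_clock` and the `cycles_needed` callback are hypothetical names.

```python
def data_driven_clock(items, cycles_needed):
    """Yield (item, cycle_index) for every clock cycle that is generated.

    items          iterable of arriving data items
    cycles_needed  function returning the number k of cycles for an item
    """
    for item in items:
        k = cycles_needed(item)   # determine k when the data arrives
        for cycle in range(k):    # clock runs for exactly k cycles
            yield item, cycle
        # clock is stopped here until the next item arrives


# Example: pretend the processing effort equals the length of the item.
for item, cycle in data_driven_clock(["a", "bb"], len):
    print(item, cycle)
```

No cycles are spent while waiting for data, which is where the potential for power saving comes from.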

Data-Driven Clock: Circuit

[Figure: ring oscillator with delay element D and output CLK out; waveform of CLK out]

CLK half period determined by D

Data-Driven Clock: Circuit

[Figure: delay element D and C-element (C) with REQ, ACK and CLK out; waveforms of REQ, ACK and CLK out]

transition on REQ answered by transition on CLK out

min CLK half period determined by D

metastability?

Pausible Clocking

Principle:
  producer requests consumer's clock to pause
  data are provided to input register during idle time
  consumer's clock may resume
    free running ("pausible clock")
    with one cycle only ("stoppable clock")

Properties:
  need to switch clock on and off
    => beware spurious clock pulses!
    => beware of clock tree delays!
  producer controls consumer's clock (blocking!)
  applications must be able to cope with paused clock

Pausible Clock: Circuit

[Figure: delay element D and C-element (C) with REQ, ACK and CLK out; waveforms of REQ, ACK and CLK out]

inverter generates next REQ from ACK

self-oscillation

Pausible Clock: Circuit

[Figure: pausible clock with delay element D, C-element (C) and Mutex; external signals REQ’ and ACK’; waveforms of CLK out, REQ’ and ACK’]

external unit can safely stop CLK by activating REQ’

… and gets ACK’ as a response

metastability?
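
The effect of pausing on the clock waveform can be illustrated with a very coarse timing model. It is a sketch under strong assumptions: each half period equals the delay D, only the rising edge is arbitrated against the external request, the mutex decides instantly, and the pause intervals are given as a sorted, non-overlapping list. The function name `pausible_clock` is hypothetical.

```python
def pausible_clock(delay, pauses, t_end):
    """Return the clock edges as a list of (time, new_level) tuples.

    delay   delay D of the ring oscillator (one half period)
    pauses  sorted, non-overlapping (start, stop) intervals during which
            the external unit holds the clock (REQ' active)
    t_end   simulation end time
    """
    edges = []
    t, level = 0, 0
    while True:
        t += delay  # the delay line D has elapsed
        if level == 0:
            # The next edge would be a rising one: it is deferred for as
            # long as the external unit holds the mutex.
            for start, stop in pauses:
                if start <= t < stop:
                    t = stop
        if t > t_end:
            break
        level ^= 1
        edges.append((t, level))
    return edges


print(pausible_clock(1, [], 4))
print(pausible_clock(1, [(2.5, 5)], 8))
```

In the second run the low phase is stretched from t = 2 to t = 5, but no pulse is ever shorter than D, so no spurious clock pulse is produced in this model.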

Pausible Clock: Circuit

[Figure: pausible clock redrawn with C-element (C), Mutex and delay element D; timing loop and control loop are marked; signals REQ’, ACK’ and CLK out]

external unit can safely stop CLK by activating REQ’

… and gets ACK’ as a response

redraw: replace inverters by inverted C-element output

Pausible Clock: n Clients

[Figure: pausible clock with C-element (C), Mutex and delay element D; an Arbiter in front of the Mutex serves REQ1/ACK1 and REQ2/ACK2; output CLK out]

for more external sources an arbiter can be added before the Mutex

R. Mullins and S. Moore, "Demystifying Data-Driven and Pausible Clocking Schemes", Proc. 13th Intl. Symp. on Advanced Research in Asynchronous Circuits and Systems (ASYNC), 2007, pp. 175–185

Pausible Clock vs. Crystal

pros:
  cheap to implement internally
  no extra pins
  no mechanical issues (acceleration)
  stoppable

cons:
  arbiter is no standard cell
  frequency is not as stable (PVT)
  frequency is not as high
  lacking tool support

Stoppable Crystal Clock?

simply gating crystal clock to stop it …leads to glitches
need a synchronizer for the stop input
Gain of stoppable clock solution versus pure synchronizer (REQ/ACK) approach?

[Figure: clk_in is gated to clk_out by the stop signal, which passes through a synchronizer (sync)]

[R. Najvirt, A. Steininger. Equivalence of Clock Gating and Synchronization with Applicability to GALS Communication. PATMOS 2014]

Metastability Comparison

pausible clock (ring oscillator)
  can be safely stopped through mutex
  phase adapts to gate signal upon start => no metastability issue

stoppable clock (crystal oscillator)
  could be safely stopped through mutex
  phase does NOT adapt to gate signal upon start => metastability issue
  cannot use mutex, since BOTH edges of gate need sync

Time safe or value safe?

Time safe
  need result at given point in time
  accept the risk that FF has not yet decided
  example: synchronizer

Value safe
  take result only after decision
  accept that there is no time bound for this
  example: mutex

FR = 0

How far you can get…

use the original circuit for pausible clocking
  this will allow metastability-free (MS-free) switching on and off

add a C-element to the timing loop
  this will eventually adjust CLK out to ref CLK, if ref CLK is slightly slower than free running clock

delay D guarantees min pulse width after switching on

[Figure: pausible clock circuit with Mutex, delay element D and two C-elements (C); signals REQ’, ACK’, CLK out and ref CLK]

Practical Implementation

use 3 parallel loops to allow for more resolution time

[R. Najvirt, A. Steininger. How to synchronize a Pausible Clock to a Reference, ASYNC 2015]

Conventional TMR

Advantages:
  mask all single faults

Drawbacks:
  single clock source
  no recovery

GALS-TMR

use independent clock => avoid single point of failure
cannot do concurrent voting, since operation not in sync
use voting over FF state at predefined intervals instead

PhD Lechner: Fault Tolerant GALS Architecture

GALS-TMR Details

every nth clock cycle
  stop own clock
  synchronize with others
  perform recovery step
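
The voting and recovery step can be sketched as a bitwise majority vote over the three replicas' state words. This is only an illustration of the idea; the function names and the representation of the flip-flop state as an integer are assumptions, and the actual architecture in the cited thesis may differ.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority of three state words."""
    return (a & b) | (a & c) | (b & c)


def recovery_step(states):
    """Vote over the three replica states once all clocks are paused.

    Returns the voted state and, per replica, whether it had to be
    corrected (i.e. deviated from the voted state).
    """
    voted = majority(*states)
    corrected = [s != voted for s in states]
    return voted, corrected


# Replica 3 has suffered a bit flip in its state.
voted, corrected = recovery_step([0b1010, 0b1010, 0b0110])
print(bin(voted), corrected)
```

Writing the voted state back into all three replicas is what provides the recovery that conventional TMR lacks.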

Summary (1)

The generally used MTBU formula does not assume any knowledge about the input signal and its relation to the clock. In practice, such knowledge can often be exploited to optimize the synchronizer.

Synchrony is not a binary property; there is a range of globally synchronous, mesochronous, plesiochronous and heterochronous systems.

Asynchronous systems are tolerant against delays, while synchronous systems are not. The GALS approach therefore makes long-distance communication asynchronous, while retaining the efficient and well-proven synchronous paradigm for locally restricted islands.

Summary (2)

GALS allows choosing the most appropriate clock for each island.

Communication in GALS can be based on synchronizers, shared memory, FIFO or pausible clocking.

A data driven clock is activated on demand only when data arrives to be processed.

A pausible clock can be stopped on demand. This is useful in GALS when moving data from one domain to the other, as it confines the potential for metastability to the arbiter.

Even a fault-tolerant TMR solution based on pausible clocks can be implemented that avoids the clock source as a single point of failure.


Summary (3)

For the Muller C-Element, if both inputs match, the output will assume the same value.

The purpose of a MUTEX element is to select one among two (or more) possibly concurrent client requests. It may remain undecided for an arbitrary time, but never selects more than one client.

The purpose of an arbiter is to grant access to one (or more) resource(s) shared between two (or more) clients. Again, access must be granted to one client at a time only.