Clocking and Timing in Fault-Tolerant Systems-on-Chip Andreas Steininger


Post on 23-Feb-2016


Page 1: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Clocking and Timing in Fault-Tolerant Systems-on-Chip

Andreas Steininger

Page 2: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Outline

• The Clock as a Blessing
• The Clock as a Curse
• Alternative Synchronization Schemes
  - GALS
  - fully asynchronous
  - the DARTS approach
• Conclusion

Page 3: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Contributors to this Work

The DARTS project team

TU Vienna: Gottfried Fuchs, Matthias Fuegger, Ulrich Schmid, Thomas Handl

RUAG Space: Gerald Kempf, Manfred Sust, Wolfgang Zangerl

Page 4: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Need for Fault Tolerance

miniaturization is key to progress in VLSI
=> smaller structures
=> lower voltage swing
=> smaller critical charge
=> higher operating frequencies

…which result in higher susceptibility to faults (SET, EMI, …)

=> cannot avoid faults, need to tolerate them

Page 5: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Role of Time

“The only reason for time is so that everything doesn’t happen at once”, Albert Einstein


Page 6: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Need for Clocking

activities need to be co-ordinated
• on system level (braking of wheels, …)
• on algorithmic level (consensus, …)
• on communication level
• on logic level (state machine switching, …)

co-ordination in the time domain (synchronization) is an efficient way to attain this
=> need a global notion of time (discrete „ticks“)

Page 7: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Quality of Synchronization

[Figure: local time (number of ticks) plotted over real time, with the precision π marked]

Page 8: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Typical Precision Values

on system level: ms … ms
on algorithm level: ms … ms
on communication level: ns … ms
on logic level: ps … ns

Page 9: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Synchronization Requirements

phase synchronisation (for „hardware clock“ on logic level)

clock synchronisation (for distributed time base on algorithmic level)

1 µs is excellent precision for a distributed clock

at 1 GHz this means 360,000° phase shift
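
The gap between the two requirements can be checked with a few lines of arithmetic. The sketch below is only an illustration; the function name and the choice of picoseconds as the input unit are assumptions made here. At 1 GHz one clock period lasts 1 ns and corresponds to 360°, so an offset of 1 µs spans 1,000 periods (360,000°) and an offset of 1 ms spans 1,000,000 periods.

```python
from fractions import Fraction

def phase_shift_degrees(offset_ps: int, clock_hz: int) -> Fraction:
    """Phase shift (in degrees) that a time offset represents at a given clock frequency."""
    offset_s = Fraction(offset_ps, 10**12)  # picoseconds -> seconds, exact arithmetic
    periods = offset_s * clock_hz           # number of clock periods covered by the offset
    return periods * 360                    # one full period corresponds to 360 degrees

GHZ = 10**9
ONE_US_IN_PS = 10**6
ONE_MS_IN_PS = 10**9

print(phase_shift_degrees(ONE_US_IN_PS, GHZ))  # 360000
print(phase_shift_degrees(ONE_MS_IN_PS, GHZ))  # 360000000
```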

Page 10: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Globally Synchronous Design

• whole design is „isochronic“ („perfect“ precision)
• time conveyed by clock transitions
• perfect co-ordination of all activities

• very efficient design
• can assume consistent states
• high level of abstraction

• very efficient implementation:
• single crystal oscillator
• single control line (clock net)

Page 11: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

„Isochronic“ Regions ?

speed of light (in medium) = 2 × 10^8 m/s = 20 cm/ns

[Figure: clock signals at 1 GHz, 4 GHz and 8 GHz compared with a reference (Ref) over a distance of 2 cm]
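
A short calculation shows why a 2 cm distance matters at these frequencies. This is a sketch under the slide's figure of 20 cm/ns; the function name is an assumption made for illustration. The wire delay over 2 cm is 0.1 ns, which is 10 % of a clock period at 1 GHz, 40 % at 4 GHz and 80 % at 8 GHz.

```python
from fractions import Fraction

SIGNAL_SPEED_CM_PER_NS = 20  # 2 x 10^8 m/s, the value given on the slide

def delay_share_of_period(distance_cm: int, clock_ghz: int) -> Fraction:
    """Propagation delay over distance_cm, expressed as a fraction of one clock period."""
    delay_ns = Fraction(distance_cm, SIGNAL_SPEED_CM_PER_NS)
    period_ns = Fraction(1, clock_ghz)
    return delay_ns / period_ns

for ghz in (1, 4, 8):
    share = delay_share_of_period(2, ghz)
    print(f"{ghz} GHz: 2 cm take {float(share):.0%} of a clock period")
```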

Page 12: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Variation Problem

[Figure: the designer works with a system model and projected conditions, covering the unknown (?) with worst-case assumptions and safety margins; the user gets the actual system, with imperfections (?), under actual conditions]

Timing is completely fixed after design.
No way to react to actual conditions & system („PVT variations“)

Page 13: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Fault-Tolerant Architectures

Duplication & Comparison
[Figure: two FUs feed a comparator (=?) that raises ERR]

Triple-Modular Redundancy
[Figure: three FUs feed a voter that produces output Y]
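
The two architectures can be sketched in software as follows. This is a behavioural illustration only, not a hardware description; the function names and the example units `good` and `bad` are hypothetical.

```python
from collections import Counter

def duplication_and_comparison(replicas, x):
    """Two FUs compute the same function; ERR is raised when their outputs differ."""
    a, b = (fu(x) for fu in replicas)
    return a, a != b  # (output, ERR): the fault is detected, but not masked

def triple_modular_redundancy(replicas, x):
    """Three FUs compute the same function; the voter outputs the majority value Y."""
    results = [fu(x) for fu in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among the replicas")
    return value

good = lambda x: x + 1   # hypothetical fault-free function unit
bad = lambda x: x - 1    # hypothetical faulty function unit

print(duplication_and_comparison([good, bad], 3))       # (4, True)
print(triple_modular_redundancy([good, bad, good], 3))  # 4
```

Duplication only detects a single fault, whereas the voter masks it; both depend on the replicas delivering comparable results at comparable times.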

Page 14: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Lock-Step Operation: single clock

[Figure: three FUs on a single clock feed a voter producing Y; each replica is annotated with the states „3“ „4“]

• single point of failure
• good replica determinism

Page 15: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Lock-Step Operation: independent clocks

[Figure: three FUs on independent clocks feed a voter producing Y; each replica is annotated with the states „3“ „4“]

• single fault tolerant
• bad replica determinism

Page 16: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Fault-Tolerant HW-Clocking

[Figure: three FUs feed a voter producing Y; three elements labeled v on the clock inputs]

Page 17: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Fault-Tolerant HW-Clocking

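[Figure: three FUs feed a voter producing Y; three elements labeled v on the clock inputs, with added elements labeled D and two question marks (?)]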

Page 18: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Charm of SoCs

billions of transistors fit on one die
=> structuring into (IP) modules: „System-on-Chip“

BUT:
• large clock distribution networks => „isochronic“??
• FT clocking does not work with large skew
• may need individual clocks for function modules

=> clock-synchrony neither attainable nor desirable

Page 19: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Co-ordination of Data Exchange

[Figure: SRC, f(x), SNK]

When can SNK use its input?
=> When it is valid and consistent

When can SRC apply the next input?
=> When SNK has consumed the previous one

Page 20: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Synchronous Approach


SRC SNK f(x)

co-ordination based on (global) time

Page 21: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Alternative: Asynchronous Design


SRC SNK f(x)

co-ordination based on handshaking

REQ: „Data word valid, you can use it“

ACK: „Data word consumed, send the next“
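
The REQ/ACK exchange can be sketched as two small state machines that react only to the shared wires. The slide does not say which handshake protocol is meant; the sketch below assumes a four-phase (return-to-zero) protocol, and all names are chosen for illustration.

```python
def src_step(wires, pending):
    """SRC: apply the next word when the link is idle; withdraw REQ once ACK is seen."""
    if not wires["req"] and not wires["ack"] and pending:
        wires["data"] = pending.pop(0)
        wires["req"] = 1   # REQ: data word valid, you can use it
        return True
    if wires["req"] and wires["ack"]:
        wires["req"] = 0   # return to zero
        return True
    return False

def snk_step(wires, received):
    """SNK: consume the word when REQ is seen; withdraw ACK once REQ is gone."""
    if wires["req"] and not wires["ack"]:
        received.append(wires["data"])
        wires["ack"] = 1   # ACK: data word consumed, send the next
        return True
    if not wires["req"] and wires["ack"]:
        wires["ack"] = 0   # return to zero
        return True
    return False

def transfer(words):
    """Run both parties until neither can make progress; return what SNK received."""
    wires = {"req": 0, "ack": 0, "data": None}
    pending, received = list(words), []
    progress = True
    while progress:
        moved_src = src_step(wires, pending)
        moved_snk = snk_step(wires, received)
        progress = moved_src or moved_snk
    return received

print(transfer([0xFF14, 0x0001, 0x0000]))
```

No notion of time appears anywhere in the sketch: each party simply waits for the other, which is the closed-loop behaviour that replaces the global clock.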

Page 22: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Async. Design – Advantages

• closed-loop control makes timing much more robust and adaptive to PVT variations
• no need for worst-case timing
• local handshakes replace global clock
• activity only when needed
• beneficial for EMI
• tends to stop operation in case of fault

Page 23: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Async. Design – Disadvantages

• Need to handle race between REQ and data


Page 24: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Async. Design – Disadvantages

• Need to handle race between REQ and data


SRC SNK f(x)

REQ: „Data word valid, you can use it“

Page 25: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Async. Design – Disadvantages

• Need to handle race between REQ and data
Solution 1: „Bundled Data“

SRC SNK f(x)

REQ: „Data word valid, you can use it“

Page 26: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Async. Design – Disadvantages

• Need to handle race between REQ and data
Solution 2: „Delay Insensitive“ (Coding)

SRC SNK f(x)

REQ: „Data word valid, you can use it“

Completion detection
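
Completion detection can be illustrated with dual-rail coding, one common delay-insensitive code. The slide does not name a specific code, so dual-rail is an assumption here, as are all function names. Every bit travels on two wires, and the receiver knows a word is complete as soon as each bit position has exactly one active rail, so no separate, timing-matched REQ is needed.

```python
SPACER = (0, 0)  # "no data" state that separates consecutive code words

def dual_rail_encode(bits):
    """Encode each bit as (true_rail, false_rail): 1 -> (1, 0), 0 -> (0, 1)."""
    return [(1, 0) if bit else (0, 1) for bit in bits]

def is_complete(word):
    """Completion detection: every bit position carries exactly one active rail."""
    return all(t + f == 1 for t, f in word)

def dual_rail_decode(word):
    """Decode a complete code word back to plain bits."""
    if not is_complete(word):
        raise ValueError("word not yet complete")
    return [t for t, _ in word]

word = dual_rail_encode([1, 0, 1])
print(is_complete(word))                       # True
print(is_complete([(1, 0), SPACER, (1, 0)]))   # False: the middle bit has not arrived yet
print(dual_rail_decode(word))                  # [1, 0, 1]
```

The price is visible in the encoding itself: twice as many wires per data bit, plus the detection logic.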

Page 27: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Async. Design – Disadvantages

• Need to handle race between REQ and data
• significant HW overhead (coding, delay elements)
• „adaptive“ timing not as predictable
• more difficult to design
• classical fault-tolerance schemes not applicable
• tends to stop operation in case of fault

Page 28: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Best of Both Worlds

GALS: Globally Asynchronous Locally Synchronous

retain efficiency of synchronous design wherever possible: „intra-module“

use asynchronous principle where clock distribution is too cumbersome: „inter-module“

First mention in PhD thesis by Chapiro (Stanford, 1984)

Page 29: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

A GALS Example

[Figure: GALS system with four clock domains]
CPU: 2 GHz
PCI-IF: 533 MHz
DSP: 2.7 GHz
USB-IF: 24 MHz

Page 30: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Communication in GALS

Shared Memory
producer writes to memory, consumer reads from there
pro: control flow stays independent
• shared single-port memory
• true dual-port memory

Direct Messages (Data words)
move data word from producer‘s output register to consumer‘s input register
• non-buffered / buffered (FIFO-queues)
• clock fixed, data-driven or pausible

Page 31: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Shared Memory

decoupling of clock domains by memory acting as a third party
=> high area overhead
=> unusual

for single-port memory, arbitration is required
• arbitration problem (unbounded delay…)
• one side may block the other at the arbiter

for multiport memory, problems are confined to accesses to the same cell
• busy flag may become metastable
• blocking still possible for one specific address

Page 32: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Shared Memory

[Figure: CPU (2 GHz) and DSP (2.7 GHz) access a shared memory through arbitration; example address 0xff14]

• perfect decoupling of data path

• potential metastability problems at arbitration logic

• potential blocking through arbitration

Page 33: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Direct Messages

clock domain boundary is between producer‘s output register and consumer‘s input register

in general a synchronizer is needed at consumer‘s input
• definitely for conventional (fixed) clock
• can be avoided by data-driven / pausible clocking

control flows of producer and consumer are strongly coupled: a party that does not service its input/output register blocks the other party

buffers/queues/FIFOs can
• mitigate, but not avoid this problem (full/empty)
• compensate variations in the data rate on both sides, but not different average data rates
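
The last point can be made concrete with a small discrete-time model. This is a simplified sketch under assumptions made here: one write attempt and one read attempt per step, patterns repeated cyclically, and a blocked write counted rather than retried. All names are hypothetical.

```python
def simulate_fifo(depth, producer_pattern, consumer_pattern, steps):
    """Return (blocked, starved): write attempts on a full FIFO, read attempts on an empty one."""
    fill = blocked = starved = 0
    for t in range(steps):
        if producer_pattern[t % len(producer_pattern)]:  # producer wants to write
            if fill < depth:
                fill += 1
            else:
                blocked += 1                             # FIFO full: producer is blocked
        if consumer_pattern[t % len(consumer_pattern)]:  # consumer wants to read
            if fill > 0:
                fill -= 1
            else:
                starved += 1                             # FIFO empty: consumer is blocked
    return blocked, starved

bursty = [1, 1, 1, 1, 0, 0, 0, 0]   # 4 words per 8 steps, in one burst
steady = [1, 0, 1, 0, 1, 0, 1, 0]   # 4 words per 8 steps, evenly spread

print(simulate_fifo(2, bursty, steady, 80))     # equal average rates, depth 2
print(simulate_fifo(1, bursty, steady, 80))     # equal average rates, depth 1
print(simulate_fifo(8, [1], [1, 0], 100))       # producer twice as fast as consumer
```

With equal average rates a depth of 2 absorbs the burst, while a depth of 1 does not. When the producer is twice as fast on average, the FIFO fills up and blocking sets in regardless of how the words are spaced; a deeper FIFO only postpones it.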

Page 34: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Direct Messages

data moving over clock domain boundary
metastability problems
=> need to insert handshake
… with synchronizers
and (optional) buffers

[Figure: data word 0xff14 passes from CPU (2 GHz) to DSP (2.7 GHz); synchronizers (S) at the clock domain boundary]

Page 35: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Arbiter: Principle

purpose:
○ manage concurrent requests to a shared resource

method:
○ handle pairs of request_in / grant_out
○ requests may arrive in any order
○ arbiter must activate only one grant_out at a time (respond to the first requester): Mutual Exclusion (MUTEX)

problem:
○ resolve concurrent requests
=> metastability problem
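
The intended behaviour of the arbiter can be written down as a small behavioural model. This sketch deliberately leaves out the hard part: in software, two calls are always ordered, whereas a circuit has to decide which of two nearly simultaneous requests came first, and that decision is where metastability arises. The class and the `release` method are hypothetical; only the names request_in and grant_out come from the slide.

```python
class Arbiter:
    """Behavioural model: at most one grant is active, and the first requester wins."""

    def __init__(self):
        self.granted = None  # requester currently holding the grant
        self.waiting = []    # pending requesters in order of arrival

    def request_in(self, i):
        if i != self.granted and i not in self.waiting:
            self.waiting.append(i)
        self._update()

    def release(self, i):
        if i == self.granted:
            self.granted = None
        self._update()

    def grant_out(self, i):
        return self.granted == i

    def _update(self):
        # Hand the grant to the longest-waiting requester once the resource is free.
        if self.granted is None and self.waiting:
            self.granted = self.waiting.pop(0)

arb = Arbiter()
arb.request_in(2)
arb.request_in(1)
print(arb.grant_out(2), arb.grant_out(1))  # True False
arb.release(2)
print(arb.grant_out(2), arb.grant_out(1))  # False True
```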

Page 36: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Arbiter: Circuit

MUTEX-element: SR-latch
„Metastability filter“: e.g., hi-threshold inverter

[Figure: arbiter circuit with requests R1, R2, latch outputs G1’, G2’ and grants G1, G2; plot of Vout,FF over t with the levels Vth,inv and Vmeta]

[from D. J. Kinniment, „Synchronization and Arbitration in Digital Systems“, Wiley]

Page 37: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Arbiter: Operation

[Figure: arbiter operation, showing the signals R1, R2, G1’, G2’, G1 and G2]

Page 38: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Muller C-Element

IF a = b
THEN y = a
ELSE hold y

[Figure: C-element symbol (C) with inputs a, b and output y; realization with an RS latch (set, reset)]
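
The rule above translates directly into a few lines of code. The sketch is a behavioural model only; the class name and the initial output value of 0 are assumptions.

```python
class CElement:
    """Muller C-element: the output follows the inputs when they agree, otherwise it holds."""

    def __init__(self, y=0):
        self.y = y  # stored output value (initial state is an assumption)

    def update(self, a, b):
        if a == b:
            self.y = a  # both inputs agree: output takes their value
        return self.y   # inputs differ: output keeps its previous value

c = CElement()
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(a, b, "->", c.update(a, b))
```

The output rises only after both inputs have risen and falls only after both have fallen, which is why the element is used to join handshake signals.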

Page 39: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Muller C-Element: Circuit

[Figure: C-element circuit]

[Alain Martin, Caltech]

Page 40: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Data-Driven Clocking

Principle:
○ as soon as new data arrive => start clocking
○ determine number k of clock cycles required to process new data
○ stop clocking after k cycles, wait for next data

Properties:
○ need to switch clock on and off => beware spurious clock pulses!
○ no metastability problem: data stable as soon as consumer clock starts
○ potential for power saving
○ useful for specific applications only (no pipe!)

Page 41: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Data-Driven Clock: Circuit / 1

[Figure: clock generator circuit with delay element D and output CLK out, with timing diagram]

CLK half period determined by D

Page 42: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Data-Driven Clock: Circuit / 2

[Figure: clock generator with Muller C-element (C) and delay element D; signals REQ, ACK and CLK out, with timing diagram]

transition on REQ answered by transition on CLK out

min CLK half period determined by D

Page 43: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Pausible Clocking

Principle:
○ producer requests consumer‘s clock to pause
○ data provided to input register during idle time
○ consumer‘s clock may resume
  - free running („pausible clock“)
  - with one cycle only („stoppable clock“)

Properties:
○ need to switch clock on and off
  => beware spurious clock pulses!
  => beware of clock tree delays!
○ producer controls consumer‘s clock (blocking!)
○ applications must cope with paused clock

Page 44: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Pausible Clock: Circuit / 1

[Figure: Muller C-element (C) with delay element D; signals REQ, ACK and CLK out, with timing diagram]

inverter generates next REQ from ACK
=> self-oscillation

Page 45: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Pausible Clock: Circuit / 2

[Figure: pausible clock circuit with C-element (C), delay element D and arbiter (Arb); signals REQ’, ACK’ and CLK out, with timing diagram]

external unit can safely stop CLK by activating REQ’
… and gets ACK’ as a response

Page 46: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Pausible Clock: Circuit / 3

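[Figure: pausible clock circuit with C-element (C), delay element D and one arbiter (Arb) per external source, REQ1/ACK1 … REQn/ACKn; output CLK out]

for more external sources, arbiters can be added and “anded” before the Muller C-Element

the two inverters can be eliminated by using a Muller C-Element with inverting output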

Page 47: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Advantages of GALS

• synchronous islands can be designed efficiently
• modules operate independently
• can use module-specific clock & timing
• clocking is no single point of failure

Page 48: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Problems with GALS

• operation of modules not (inherently) co-ordinated
  synchrony for communication but not on system / algorithm level
• communication has to cross clock boundaries
• potential for metastability
  => performance penalty through synchronizers
  OR
  => module must handle irregular clocking

Page 49: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The DARTS Idea

DARTS: Distributed Algorithms for Robust Tick Synchronization

[Figure: tick synchronisation placed between phase synchronisation and clock synchronisation]

Page 50: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The DARTS Approach

[Figure: function units Fu1, Fu2, Fu3 on a data bus, each with a TG-Alg; the TG-Algs are connected by the TG-Net]

Concept: Multiple synchronized tick generators
Method: Distributed algorithm for fault-tolerant tick generation, implemented in (asynchronous) digital logic

Advantages
- No crystal oscillator(s)
- No critical clock tree
- Clock is no single point of failure!
- Reasonable synchrony

Page 51: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The DARTS Principle

Every function unit Fui is augmented with a simple local clock unit (TG-Alg)

TG-Algs communicate over a dedicated TG-Net to generate tick-synchronized local clock signals

Up to f TG-Algs can be Byzantine faulty => need n ≥ 3f + 2 TG-Algs

Formally proven synchronization properties

[Figure: standard synchronous clocking (Fu1, Fu2, Fu3 on a data bus, fed by a clock tree) next to DARTS clocks (TG-Algs connected by the TG-Net)]
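
The bound n ≥ 3f + 2 is easy to evaluate in both directions. The helper names below are assumptions made for illustration. For example, tolerating one Byzantine fault takes at least 5 TG-Algs, and a system of 12 TG-Algs can tolerate up to 3.

```python
def min_nodes(f: int) -> int:
    """Smallest number of TG-Algs that tolerates f Byzantine faults (n >= 3f + 2)."""
    return 3 * f + 2

def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 2 still holds (0 if the system is too small)."""
    return max((n - 2) // 3, 0)

for f in (1, 2, 3):
    print(f"f = {f}: need n >= {min_nodes(f)}")
for n in (5, 8, 12):
    print(f"n = {n}: tolerates up to f = {max_faults(n)}")
```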

Page 52: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

A Comparison

synchronous SoC: one oscillator and a clock tree feed Fu1, Fu2, Fu3 on the data bus
• single point of failure
• global synchrony (< 1 tick)

GALS: one oscillator per function unit (Fu1, Fu2, Fu3 on the data bus)
• no single point of failure
• NO (inherent) global synchrony

DARTS: TG-Algs connected by the TG-Net clock Fu1, Fu2, Fu3 (Fu1 clk and Fu2 clk shown for tick(3), tick(4))
• no single point of failure
• global synchrony (potentially 1 tick)

Page 53: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The Distributed Algorithm

(1) Initially:
(2)    send tick(0) to all; clock := 0;
(3) “Relay Rule”
(4) If received tick(m) from at least f+1 remote nodes and m > clock:
(5)    send tick(clock+1), …, tick(m) to all [once]; clock := m;
(6) “Increment Rule”
(7) If received tick(m) from at least 2f+1 remote nodes and m >= clock:
(8)    send tick(m+1) to all [once]; clock := m+1;

[Srikanth & Toueg, 87]

[Figure: TG-Alg 1 to TG-Alg 6 connected by the TG-Net]
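
The two rules can be tried out in a small message-passing simulation. This is a software sketch under several assumptions, not the DARTS hardware: messages are delivered one at a time in random order, faulty nodes are modelled as silent only (not as arbitrary Byzantine behaviour), and ticks at or beyond a chosen `horizon` are not transmitted so that the run terminates. The class and function names are hypothetical.

```python
import random

class TickGenerator:
    """One node that follows the relay rule and the increment rule."""

    def __init__(self, f):
        self.f = f
        self.clock = 0
        self.sent = {0}     # ticks already broadcast ("[once]"); tick(0) is sent initially
        self.senders = {}   # tick value m -> set of remote nodes that sent tick(m)

    def receive(self, sender, m):
        """Process tick(m) from a remote node; return the new ticks to broadcast."""
        self.senders.setdefault(m, set()).add(sender)
        count = len(self.senders[m])
        out = []
        if count >= self.f + 1 and m > self.clock:        # relay rule
            out += range(self.clock + 1, m + 1)
            self.clock = m
        if count >= 2 * self.f + 1 and m >= self.clock:   # increment rule
            out.append(m + 1)
            self.clock = m + 1
        new = [t for t in out if t not in self.sent]
        self.sent.update(new)
        return new

def simulate(n, f, horizon, silent=(), seed=0):
    """Run n nodes (those in `silent` never send); return the final clock of each correct node."""
    rng = random.Random(seed)
    correct = [i for i in range(n) if i not in silent]
    nodes = {i: TickGenerator(f) for i in correct}
    # Messages in flight as (sender, receiver, tick); everybody starts by sending tick(0).
    in_flight = [(s, r, 0) for s in correct for r in correct if r != s]
    while in_flight:
        s, r, m = in_flight.pop(rng.randrange(len(in_flight)))  # arbitrary delivery order
        for t in nodes[r].receive(s, m):
            if t < horizon:                                     # cut-off so the run terminates
                in_flight += [(r, q, t) for q in correct if q != r]
    return {i: node.clock for i, node in nodes.items()}

print(simulate(n=5, f=1, horizon=10))                  # all nodes correct
print(simulate(n=5, f=1, horizon=10, silent={4}))      # one silent node: still makes progress
print(simulate(n=5, f=1, horizon=10, silent={3, 4}))   # two silent nodes: more than f, stalls
```

With n = 5 and f = 1, every correct node still hears from 2f + 1 = 3 correct remote nodes when one node is silent, so the clocks keep advancing. With two silent nodes the increment threshold can no longer be reached and all clocks stay at 0.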

Page 54: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Implementation Challenges

Challenges in implementing the software-based algorithm of the previous slide in hardware:

• k-bit messages, k unbounded (TICK(0), TICK(1), …, TICK(k))
  => replacement by zero-bit messages (k-bit msg vs. zero-bit tick)
• Atomicity of actions
  => to be ensured by the architecture and delay constraints
• Threshold functions for fault tolerance
  => glitch-free asynchronous implementation

Page 55: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

The DARTS Prototype

55

ASIC design:
• radhard 180nm technology
• 2 designs:
  - flexible
  - fast

Prototype board: 8 chips plus fixed & programmable interconnect

Page 56: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Proof of Concept


Page 57: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Frequency Stability (Warm-up)

[Figure: frequency in [MHz] (axis from 53.15 to 53.45) over time in [hours] (axis from 0 to 18)]

Page 58: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Frequency Stability (detail)

[Figure: frequency in [MHz] (axis from 51.94 to 52.0) over time in [min] (axis from 0 to 15)]

[Figure: core voltage in [V] (axis from 1.7968 to 1.7974) over time in [min] (axis from 0 to 15)]

Page 59: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

DARTS – General Properties

Fully asynchronous implementation, NO oscillators

Tolerates up to three Byzantine faulty nodes (configurable number of TG-Algs: 5 to 12)

Adapts to operating conditions (asynchronous logic)

Page 60: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Still Room for Improvements

o Transient faults are permanently stored in the elastic pipelines
o No on-the-fly integration of TG-Alg
o Relatively low clock speed
o Interfacing to traditional synchronous designs
o Scaling with number of faults is costly

Page 61: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Summary: Trends & Needs

• Ongoing miniaturization necessitates fault tolerance

• Co-ordination of activities is fundamental, thus tight synchrony is a desirable feature on all levels

• SoCs are large modular designs on a single die

Page 62: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

Summary: SoC Clocking

• globally synchronous clock:
  + ideal synchrony, efficient in design & implementation
  - isochrony unrealistic, single point of failure

• DARTS clock
  + best attainable global synchrony, adaptive timing, FT
  - high implementation efforts, frequency not stable

• GALS
  + uses best of syn & asyn, indep. & module-specific clock
  - no global synchrony, metastability issues

• asynchronous design
  + power-efficient, robust against faults & PVT
  - high overheads, difficult to design, timing hard to predict

Page 63: Clocking  and Timing in Fault-Tolerant Systems-on-Chip

More information on DARTS

http://ti.tuwien.ac.at/ecs/research/projects/darts
