
Asynchronous Links, for NanoNets?

Alex Yakovlev

University of Newcastle, UK

NanoNet’07, Catania, 17/09/2007


Motivation-1

At very deep submicron (VDSM), gate delay is much less than interconnect delay: total interconnect length can reach several meters; interconnect delay can be as much as 90% of total path delay in VDSM circuits

Timing is a problem, particularly for global wires

[Figure: relative delay (log scale, 0.1 to 100) versus feature size (250, 180, 130, 90, 65, 45, 32 nm) for four curves: gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, global interconnect without repeaters. Source: ITRS, 2003]

Multiple clock domains are a reality, and so is the problem of interfacing between them

ITRS’05 predicted: 4x (8x) increase in global asynchronous signalling by 2012 (2020)


Motivation-2

Variability and uncertainty
– Geometry and process: for long channels, intra-die variations are less correlated for different parts of the interconnect, both for interconnects and repeaters
• e.g., M4 and M5 resistance/um differ massively, leading to mistracking (C. Visweswariah, SLIP’06)
• e.g., 250nm clock skew has 25% variability due to interconnect variations (Y. Liu et al., DAC’00)
– Behavioural: crosstalk (sidewall capacitance can cause up to 7x variation in delay (R. Ho, M. Horowitz))


A Network on Chip

[Figure: network on chip with multiple clocks and async links; callouts: synchronization required, arbitration required]


Example from the Past: Fault-Tolerant Self-Timed Ring (Varshavsky et al. 1986)

Built for an onboard airborne computer-control system that tolerated up to two faults. The self-timed ring was a GALS system with self-checking and self-repair at the hardware level

Individually clocked subsystems

Self-timed adapters forming a ring


Communication Channel Adapter

Data (DR, DS) is encoded using a 3-of-6 Sperner code (16 data values for a half-byte, plus 4 tokens for the ring acquisition protocol)
AR, AS – acknowledgements
RR, RS – spare (for self-repair) lines

Much higher reliability than a bus and other forms of redundancy

MCC was developed in TTL-Schottky gate arrays, approx. 2K gates.


Outline

Token-based view of communication
Basics of asynchronous signalling
Self-timed data encoding
Pipelining
How to hide acknowledgements
Serial vs Parallel links
Arbiters and routers
Async2sync interface
CAD issues


Data exchange: token-based view

Question 1: when can Rx look at the incoming data?
Data validity issue – Forming a well-defined token

Question 2: when can Tx send new data?
Acknowledgement issue – Separation between tokens

These are fundamental issues of flow control at the physical and link levels

The answers are determined by many design aspects: technology level, system architecture (application, pipelining), latency, throughput, power, design process etc.

[Figure: chain source, Tx, Rx, dest, with Data passing from Tx to Rx]


Tokens and spaces with global clocking

In globally clocked systems both Q1 and Q2 are resolved with the aid of clock pulses

[Figure: chain source, Tx, Rx, dest, with Data passing from Tx to Rx under a common clk]


Tokens and spaces

Without global clocking: Q1 can be resolved differently from Q2

E.g.: Q1 – source-synchronous (mesochronous), bundled data or self-synchronising codes; Q2 – ack or stop signal, or by local timing

[Figure: chain source, Tx, Rx, dest; signals shown: Data, Clk_tx, Clk_rx, D_valid, bundle]


[Figure: chain source, Tx, Rx, dest with acknowledgements added; signals shown: Data, D_valid, bundle, ack]


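Petri net model

[Figure: two Petri net models of the chain source, Tx, Rx, dest with a Data Valid signal. In one, Tx and Rx are paced by delays only (Tx delay, Rx delay). In the other, an ack returns from Rx to Tx (Tx delay or ack, Rx delay or ack).]

With ack: always safe but with a round trip delay!

With delays only: one way delay, but may be unsafe!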


Asynchronous handshake signalling

Valid data tokens and safe spaces between them can be created by different means of signalling and encoding

Level-based -> Return-To-Zero (RTZ) or 4-phase protocol

Transition-based -> Non-Return-to-Zero (NRZ) or 2-phase protocol

Pulse-based, e.g. GasP

Phase-difference-based

Data encoding: bundled data (BD), Delay-insensitive (DI)


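Handshake Signalling Protocols

Level Signalling (RTZ or 4-phase)
[Figure: req and ack waveforms; one cycle is req+, ack+, req-, ack-]

Transition Signalling (NRZ or 2-phase)
[Figure: req and ack waveforms; one cycle is a single transition on req followed by a single transition on ack]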
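The difference between the two protocols can be sketched as event lists. This is a minimal illustration, not taken from the slides: the function names `four_phase` and `two_phase` and the event labels are assumptions made for the example.

```python
def four_phase(n_tokens):
    """Event list for n tokens under level (RTZ, 4-phase) signalling."""
    events = []
    for _ in range(n_tokens):
        # req and ack both rise, then both return to zero before the next token
        events += ["req+", "ack+", "req-", "ack-"]
    return events


def two_phase(n_tokens):
    """Event list for n tokens under transition (NRZ, 2-phase) signalling."""
    events = []
    level = 0  # logic level that req and ack rest at between tokens
    for _ in range(n_tokens):
        level ^= 1  # any edge, rising or falling, signals a new token
        edge = "+" if level else "-"
        events += ["req" + edge, "ack" + edge]
    return events


if __name__ == "__main__":
    print(four_phase(1))
    print(two_phase(2))
```

In this sketch two NRZ tokens use the same four edges that a single RTZ token needs, which is the usual argument for transition signalling on long wires.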


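Handshake Signalling Protocols

Pulse Signalling
[Figure: req and ack waveforms; one cycle is a pulse on req followed by a pulse on ack]

Single-track Signalling (GasP)
[Figure: req and ack share a single wire (req + ack); one cycle is shown]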


GasP signalling

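[Figure: GasP stage with pulse length control loops. The wire from the predecessor is pulled up by the predecessor (req) and pulled down here (ack); the wire to the successor is pulled up from here (req) and pulled down by the successor (ack)]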

Source: R. Ho et al, Async’04


Data encoding

Bundled data
– Code is positional binary, token is determined by the Req+ signal; Req+ arrives with a safe set-up delay from data

Delay-insensitive codes (tokens determined by the codeword values; require a spacer, or NULL, state if RTZ)
– 1-of-2 (dual-rail per bit) – systematic code, encoding and decoding straightforward
– m-of-n (n>2) – not systematic, i.e. incurs encoding and decoding costs, optimal when m=n/2
– One-hot, 1-of-n (n>2) – completion detection is easy, not practical for n>4
– Systematic, such as Berger – incurs complex completion detection


Bundled Data

[Figure: req, ack and Data waveforms over one cycle, shown for RTZ and for NRZ signalling]


DI encoded data (Dual-Rail)

[Figure: Data.0, Data.1 and ack waveforms. RTZ: each cycle carries Logical 0 or Logical 1 and is separated from the next by a NULL (spacer). NRZ: each cycle is a single transition on one of the rails, shown for a sequence of logical values.]

NRZ: this coding leads to complex logic implementation; it is hard to track odd and even phases and logic values – hence see LEDR below
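The RTZ variant can be sketched in a few lines. This is an illustrative model only: the tuple order (Data.0, Data.1) and the helper names are assumptions, not part of the slides.

```python
NULL = (0, 0)  # spacer: both rails low


def encode_bit(bit):
    """Dual-rail codeword as (Data.0, Data.1): exactly one rail is high."""
    return (0, 1) if bit else (1, 0)


def rtz_stream(bits):
    """Codewords separated by NULL spacers, as on an RTZ dual-rail link."""
    stream = []
    for b in bits:
        stream += [encode_bit(b), NULL]
    return stream


def decode_stream(stream):
    """Recover the bits; a codeword is complete when exactly one rail is high."""
    bits = []
    for d0, d1 in stream:
        if d0 ^ d1:  # completion detection for one dual-rail bit
            bits.append(d1)
    return bits


if __name__ == "__main__":
    print(rtz_stream([1, 0]))
    print(decode_stream(rtz_stream([1, 0, 1, 1])))
```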


DI codes (1-of-n and m-of-n)

1-of-4:
– 0001=>00, 0010=>01, 0100=>10, 1000=>11

2-of-4:
– 1100, 1010, 1001, 0110, 0101, 0011 – total 6 combinations (cf. 2-bit dual-rail – 4 comb.)

3-of-6:
– 111000, 110100, …, 000111 – total 20 combinations (can encode 4 bits + 4 control tokens)

2-of-7:
– 1100000, 1010000, …, 0000011 – total 21 combinations (4 bits + 5 control tokens)
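The codeword counts above are binomial coefficients, which a short enumeration confirms. This is a sketch; the helper name `m_of_n_codewords` is an assumption made for the example.

```python
from itertools import combinations
from math import comb


def m_of_n_codewords(m, n):
    """All n-bit words with exactly m ones, as strings."""
    words = []
    for ones in combinations(range(n), m):
        words.append("".join("1" if i in ones else "0" for i in range(n)))
    return words


if __name__ == "__main__":
    for m, n in [(1, 4), (2, 4), (3, 6), (2, 7)]:
        words = m_of_n_codewords(m, n)
        # the number of codewords is "n choose m"
        assert len(words) == comb(n, m)
        print(f"{m}-of-{n}: {len(words)} codewords, {words[0]} ... {words[-1]}")
```

With 20 codewords, 3-of-6 covers the 16 values of a half-byte and leaves 4 for control; 2-of-7 leaves 5.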


DI codes completion detection and decoding

1-of-4 completion detection is a 4-input OR gate (CD=d0+d1+d2+d3)

Decode 1-of-4 to dual rail is a set of four 2-input OR gates (q0.0=d0+d2; q0.1=d1+d3; q1.0=d0+d1; q1.1=d2+d3)

For m-of-n codes, CD and decoding are non-trivial

From J.Bainbridge et al, ASYNC’03
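The OR-gate equations for the 1-of-4 code can be checked with a small behavioural model. This is a sketch under one assumption: wire d_k being high encodes the value k, which matches the mapping 0001=>00 … 1000=>11 if d0 is the rightmost wire.

```python
def one_of_four(value):
    """Wires (d0, d1, d2, d3) for a 2-bit value; wire d<value> is high (assumed wire order)."""
    return tuple(1 if i == value else 0 for i in range(4))


def completion_detect(d):
    """CD = d0 + d1 + d2 + d3 (a 4-input OR gate)."""
    d0, d1, d2, d3 = d
    return d0 | d1 | d2 | d3


def to_dual_rail(d):
    """Four 2-input OR gates producing two dual-rail bits."""
    d0, d1, d2, d3 = d
    return {
        "q0.0": d0 | d2,
        "q0.1": d1 | d3,
        "q1.0": d0 | d1,
        "q1.1": d2 | d3,
    }


if __name__ == "__main__":
    for v in range(4):
        print(v, one_of_four(v), to_dual_rail(one_of_four(v)))
```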


Incomplete DI codes

Incomplete 2-of-7: composed of 1-of-3 and 1-of-4

From J.Bainbridge et al ASYNC’03


Phase difference based encoding (C. D’Alessandro et al. ASYNC’06,’07)

[Figure: waveforms of ref, t_1, t_0 and data for the sequence 0, 1, 0, 0, with spacers sp0 and sp1 between symbols; the bit value is given by the order of arrival (t_1 before t_0, or t_0 before t_1)]

The proposed system encodes a bit of data in the phase relationship between two signals generated from a reference

This ensures that a transient fault appearing on one of the signals is ignored unless it is mirrored by a corresponding transition on the other line

Similarity with multi-wire communication


Phase encoding: multiple rail

No group of wires has the same delay
All wires toggle when an item of data is sent
Increased number of states available (n wires = n! states), hence more bits/symbol
Table illustrates examples of phase encoding compared to the respective m-of-n counterpart

Type of link     States  Bits/symbol  Extra states  Transitions/symbol  Symbols/packet  Transitions/packet
Phase enc. (4)   24      4            8             4                   32              128
1-of-4           4       2            0             2                   64              128
Phase enc. (6)   720     9            208           6                   15              90
3-of-6           20      4            4             6                   32              192
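The table entries follow from the state counts. The sketch below recomputes them under two assumptions that are not stated on the slide: a packet of 128 bits (which reproduces the symbol counts) and transitions per symbol taken as given inputs.

```python
from math import ceil, comb, factorial

PACKET_BITS = 128  # assumed packet size; it reproduces the table's symbol counts


def row(name, states, transitions_per_symbol):
    bits = states.bit_length() - 1       # whole bits per symbol: floor(log2(states))
    extra = states - 2 ** bits           # states left over for control tokens
    symbols = ceil(PACKET_BITS / bits)   # symbols needed for one packet
    return (name, states, bits, extra, transitions_per_symbol,
            symbols, symbols * transitions_per_symbol)


rows = [
    row("Phase enc. (4)", factorial(4), 4),  # n wires give n! orderings
    row("1-of-4", comb(4, 1), 2),
    row("Phase enc. (6)", factorial(6), 6),
    row("3-of-6", comb(6, 3), 6),
]

if __name__ == "__main__":
    for r in rows:
        print(r)
```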


Phase encoding Repeater

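[Figure: repeater between sender and receiver with inputs i1, i2, i3, outputs o1, o2, o3 and a go signal; phase detectors (mutexes) resolve the pairwise orders 1<2, 2<1, 1<3, 3<1, 2<3, 3<2]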


Pipelines

Dual-rail pipeline

From J.Bainbridge & S. Furber IEEE Micro, 2002


The problem of Acking

Question 2 “when can Tx send new data?” has two aspects:
– Safety (not to overflow the channel, or when Tx and Rx have much variation in delay)
– Performance (to maximize throughput and reduce latency)

Can we hide ack (round trip) delay?


From R.Ho et al. ASYNC’04

To maintain throughput more pipeline stages are required but that costs too much latency and power

First minimize latency along a long wire (not specific to asynchronous) and then maximize throughput (using “wagging tail buffer” approach)


From R.Ho et al. ASYNC’04

Use of wagging buffer approach

Alternate between top and bottom control


“Wagging tail buffer” approach

[Figure: link with a shared data channel and two control channels, top (reqtop, acktop) and bottom (reqbot, ackbot)]

Top and bot control channels work at ½ frequency of data channel
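The alternation can be sketched as a scheduling rule. This is a behavioural illustration only, not the circuit from R. Ho et al.; the function names are assumptions.

```python
def wagging_schedule(data):
    """Hand successive data items to the top and bottom control channels in turn."""
    top, bot = [], []
    for i, item in enumerate(data):
        (top if i % 2 == 0 else bot).append(item)
    return top, bot


def merge(top, bot):
    """Receiver side: read the two channels alternately to restore the order."""
    out = []
    for i in range(len(top) + len(bot)):
        src = top if i % 2 == 0 else bot
        out.append(src[i // 2])
    return out


if __name__ == "__main__":
    top, bot = wagging_schedule(list(range(7)))
    print(top, bot, merge(top, bot))
```

Each control channel handles every second item, so its handshake has two data periods in which to complete.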


Serial Link vs Parallel Link (from R. Dobkin)

Why Serial Link?
– Less interconnect area
– Less routing congestion
– Less coupling
– Less power (depends on range)

The relative improvement grows with technology scaling. The example on the right refers to:
– Single gate delay serial link
– Fully-shielded parallel link with 8 gate delay clock cycle
– Equal bit-rate
– Word width N=8

[Figure: two plots of link length [mm] against technology node [nm] (180, 130, 90, 65, 30, 15). One separates the region where the parallel link dissipates less power from the region where the serial link dissipates less power; the other separates the region where the parallel link requires less area from the region where the serial link requires less area]


Serialization model

[Figure: Tx and Rx connected by a serial link, shown in three variants]
– Acking at the bit level
– Acking at the word level
– Acking at the word level (with more concurrency)
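A toy timing model illustrates why the granularity of acknowledgement matters. It is not taken from the slides: the two formulas and the parameters `bit_time` and `round_trip` are simplifying assumptions for the sake of the example.

```python
def bit_level_ack_time(n_bits, bit_time, round_trip):
    """Each bit is sent, then the transmitter waits for its acknowledgement."""
    return n_bits * (bit_time + round_trip)


def word_level_ack_time(n_bits, bit_time, round_trip):
    """Bits are streamed back to back, with one acknowledgement per word."""
    return n_bits * bit_time + round_trip


if __name__ == "__main__":
    # 8-bit word, unit bit time, round trip ten times longer (illustrative numbers)
    print(bit_level_ack_time(8, 1, 10))
    print(word_level_ack_time(8, 1, 10))
```

In this model the round-trip delay is paid once per word rather than once per bit, so the gap widens as the wire gets longer.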


Serial Link – Top Structure (R. Dobkin, Async’07)

Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)
Acknowledge per word instead of per bit
Synchronizers used at the level of the ack signals
Wave-pipelining over channel
Differential encoding (DS-DE, IEEE1355-95)
Reported throughput: 67 Gbps for 65nm process (viz. one bit per 15ps – expected FO4 inverter delay), based on simulations


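Encoding – Two Phase NRZ LEDR

Two Phase Non-Return-to-Zero Level Encoded Dual Rail – “delta” encoding (one transition per bit)

[Figure: waveforms of the uncoded data (B), the state bit (S) and the phase bit (P); bit values shown: 0 0 1 1 0 0 0 0 1 0]

$$P(i) = \begin{cases} \overline{B(i)}, & i \text{ odd} \\ B(i), & i \text{ even} \end{cases} \qquad S(i) = B(i), \; \forall i$$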
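The one-transition-per-bit property can be checked with a short model. This is a sketch under stated assumptions: the state rail carries the data bit, the phase rail carries the data bit inverted on odd positions, positions are counted from 1, and both rails start at 0. Which parity is inverted is a convention assumed here.

```python
def ledr_encode(bits):
    """Return (state, phase) rail values per bit, with positions counted from 1."""
    out = []
    for i, b in enumerate(bits, start=1):
        s = b              # state rail carries the data bit
        p = b ^ (i % 2)    # phase rail: inverted on odd positions (assumed convention)
        out.append((s, p))
    return out


def ledr_decode(pairs):
    """The data bit is simply the state rail."""
    return [s for s, _ in pairs]


def transitions(pairs, start=(0, 0)):
    """Number of rails that change between consecutive symbols."""
    counts = []
    prev = start
    for cur in pairs:
        counts.append((prev[0] != cur[0]) + (prev[1] != cur[1]))
        prev = cur
    return counts


if __name__ == "__main__":
    data = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
    coded = ledr_encode(data)
    print(coded)
    print(transitions(coded))
```

Because the XOR of the two rails alternates with the position, exactly one rail changes per bit, whatever the data.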


Transmitter – Fast SR Approach (from R. Dobkin)


Receiver Splitter (from R. Dobkin)


Self Timed Networks

Router requires priority arbitration
– Arbitration necessary at every router merge
– Potential delay at every node on the path
BUT
– Asynchronous merge/arbitration time is average not worst case

Adapters to locally clocked cells require synchronization

Synchronization necessary when clocks are unknown
– Occurs when receiving data (data valid), and when sending (acknowledge)
BUT
– Time can be long (2 cycles?)
– Must assume worst case time (maybe)


Router priority

Virtual channels implement scheduling algorithm
Contention for link resolved by priority circuits

[Figure: router datapath with Merge, Link and Split blocks and Flow Control]


Asynchronous Arbiters

Multiway arbiters (e.g. for Xbar switches):
– Cascaded mesh (latency ~ N)
– Cascaded tree (latency ~ logN)
– Token-ring (busy ring and lazy ring) (latency ~ from 1 to N)

Priority arbiters (e.g. for routers with different QoS):
– Static priority (topological order)
– Dynamic priority (request arrives with priority code)
– Ordered (time-priority) - multiway arbiter, followed by a FIFO buffer
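The three priority disciplines can be contrasted with a purely behavioural sketch. It models only which request is granted, not the circuits; the function names, the rule that a lower index means higher static priority, and the tie-break in the dynamic case are assumptions made for the example.

```python
from collections import deque


def static_priority(requests):
    """requests: set of channel indices; lower index wins (assumed topological order)."""
    return min(requests) if requests else None


def dynamic_priority(requests):
    """requests: dict channel -> priority code sent with the request; highest code wins."""
    if not requests:
        return None
    # ties are broken in favour of the lower channel index (an assumption)
    return max(requests, key=lambda ch: (requests[ch], -ch))


def ordered(arrivals):
    """arrivals: channels in order of arrival; grants are issued first come, first served."""
    fifo = deque(arrivals)
    grants = []
    while fifo:
        grants.append(fifo.popleft())
    return grants


if __name__ == "__main__":
    print(static_priority({2, 0, 1}))
    print(dynamic_priority({0: 1, 1: 3, 2: 3}))
    print(ordered([2, 0, 1]))
```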


Static Priority Arbiter

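[Figure: static priority arbiter. Requests R1, R2, R3 pass through MUTEX elements and C-elements, gated by a Lock signal, into a lock register (signals r1, r2, r3 and s1, s2, s3); a priority module produces the grants G1, G2, G3]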


Why Synchronizer?

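Here one clock cycle is used for the metastability to resolve.

[Figure: a single DFF sampling DATA on CLK, with metastability on its output Q; and a two-DFF synchronizer, in which the metastable output of the first DFF resolves to 0 or 1 before the second DFF samples it]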


CAD support: Async design flow


Synthesis of Asynchronous link interfaces

[Figure: VME bus controller placed between the bus (DSr, DSw, DTACK), the device (LDS, LDTACK) and the data transceiver (D); timing diagram of the read cycle over DSr, LDS, LDTACK, D and DTACK]


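[Figure: signal transition graph of the controller. Read branch transitions: DTACK-, DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, LDS-, LDTACK-. Write branch transitions: DSw-, DSw+, D+, LDS+, LDTACK+, D-, DTACK+]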


[Figure: state graph of the read cycle (transitions DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-) with the inserted state signal transitions csc+ and csc-]

Complete State Coding (CSC)

Boolean equations:
LDS = D + csc
DTACK = D
D = LDTACK · csc
csc = DSr · (csc + ¬LDTACK)

[Figure: synthesis of the signal transition graph into an asynchronous logic circuit over DSr, LDTACK, csc, LDS, D and DTACK]


Conclusions on Async Links

At the nm level, links will become more asynchronous, perhaps mesochronous at first, to avoid global clock skew

Delay-insensitive codes can be used to tolerate interwire-delay variability

Phase-encoding can be used for higher power-bit efficiency and SEU tolerance

Acking will be mainly used for flow control (word level) and its overhead can be ‘hidden’ by using the “wagging buffer” technique

Serial Links save area and power for long interconnects, with buffering (pipelining) if one wants to maintain high throughput; they also simplify building switches

Synthesis tools can be used to build clock-free interfaces between different links

Asynchronous logic can be used for building higher level circuits, e.g. arbiters for switches and routers


And finally …


ASYNC’08 and NOCs’08 …plus SLIP’08

Held in Newcastle upon Tyne, UK, 7-11 April 2008 (SLIP on 5-6 April – weekend)

async.org.uk/async2008
async.org.uk/nocs2008

Submission deadlines:
– Async’08: Abstract – Oct. 8, Full paper – Oct. 15
– NOCs’08: Abstract – Nov. 12, Full paper – Nov. 19


Extras

More slides if I have time!


Chain Network Components

From J.Bainbridge & S. Furber IEEE Micro, 2002



Reliability and latency

Asynchronous arbiters fail only if time is bounded
– Latency depends on fixed gates plus MUTEX lock time
– τ for 2 channels, + τ ln(N-1) for more
– This is likely to be small compared with flow control latency

Synchronizers fail at (fairly) predictable rates, but these rates may get worse
– Latency can be 35τ now for good reliability


The synchronizer

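Clock and valid can happen very close together
Flip Flop #1 gets caught in metastability
We wait until it is resolved (1–2 clock periods)

[Figure: two-flip-flop synchronizer. VALID accompanies DATA from the CLK1 domain and passes through flip-flops #1 and #2, both clocked by CLK2]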


MTBF

$$\mathrm{MTBF} = \frac{e^{t/\tau}}{T_w \cdot f_c \cdot f_d}$$

For a 0.18 µm process τ is 20 – 50 ps
Tw is similar
Suppose the clock and data frequencies are 2 GHz
t needs to be > 25τ (more than one clock period) to get MTBF > 28 days
– 100 synchronizers: + 5τ
– MTBF > 1 year: + 2τ
– PVT variations: + 5 – 10τ . . .
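Because MTBF grows as e^(t/τ), raising it by a given factor costs only the logarithm of that factor in extra settling time. The sketch below shows this for the two adders listed above; the function names are assumptions, and the treatment of N synchronizers as an N-fold MTBF requirement is a simplification.

```python
from math import exp, log


def mtbf(t, tau, t_w, f_c, f_d):
    """MTBF = exp(t / tau) / (T_w * f_c * f_d), all quantities in SI units."""
    return exp(t / tau) / (t_w * f_c * f_d)


def extra_settling(factor):
    """Extra settling time, in units of tau, needed to raise MTBF by `factor`."""
    return log(factor)


if __name__ == "__main__":
    # 100 synchronizers on a die: each needs a 100 times larger MTBF
    print(round(extra_settling(100), 1))
    # going from 28 days to one year
    print(round(extra_settling(365 / 28), 1))
```

The two results, roughly 4.6τ and 2.6τ, are in the same range as the + 5τ and + 2τ quoted on the slide.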


Event Histogram

[Figure: “Metastability Time” plot of effective input overlap (log scale, 1E-19 to 1E-10) against Q to Clock time (-1.0E-08 to 0.0E+00); annotated regions: 100ps input variation, 10ps noise and jitter, deep meta]

Measurement: convert to log scale, the slope gives τ


Not always simple

[Figure: “Metastability Time” plot of effective input overlap (log scale, 1E-20 to 1E-10) against Q to Clock time (-1.0E-08 to -3.0E-09); annotated regions: 10ps noise and jitter, deep meta]

More than one slope: 350ps, 120ps, 140ps


Synchronization Strategies

Avoid synchronization time (and arbitration time) by
– predicting clocks, stoppable clocks
– dedicating link paths for long periods of time

Minimize time by circuit methods
– Higher power, better τ
– Reducing apparent device variability - wide transistors
– Many parallel synchronizers increase throughput

Reduce average latency by speculation
– Reduce synchronization time, detect errors and roll back


Timing regions can have predictable relationships

Locked
– Two clocks from same source
– Linked by PLL
– One produced by dividing the other
– Some asynchronous systems
– Some GALS

Not locked together but predictable
– Two clocks same frequency, but different oscillators
– As above, same frequency ratio


Don’t synchronise when you don’t need to

If the two clocks are locked together, you don’t need a synchroniser, just an asynchronous FIFO big enough to accommodate any jitter/skew

FIFO must never overflow
Next read clock can be predicted and metastability avoided

[Figure: asynchronous FIFO with DATA in and DATA out; labels: REQ IN, ACK IN, REQ OUT, ACK OUT, Write, Data Available, Read done]


Conflict Prediction

[Figure: waveforms of the receiver clock, the transmitter clock and the predicted transmitter clock]

Synchronization problem known a cycle in advance of the Receiver clock.

We can do this thanks to the periodic nature of the clocks


Problems predicting next cycle

Difficult to predict
– Multiple source clocks
– Input output interfaces

Dynamic jitter and noise
– GALS start up clocks take several cycles to stabilise
– Crosstalk
– Power supply variations introducing noise into both data and clock
– Temperature changes alter relative delays

As a proportion of cycle time, this is likely to increase with smaller geometries


Synchronizer reliability trends

Clock rates increase. 10 GHz gives 100ps for a cycle.
– Both data and clock rates up by n, τ down by n

Assume τ scales with cycle time: reliability (MTBF) of one synchronizer down by n

Number of synchronizers goes up by N
– Die reliability down by N

Die-to-die and on-die variability increases to as much as 40%
– 40% more time needed for all synchronizers


An example

Example
– 10 GHz clock and data rate, τ = 10 ps
– 100 synchronizers
– MTBF required 3.8 months (10^7 seconds)
– Time required 41τ, or 4.1 cycles; + 40% = 5.8 cycles

Does this matter?
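The figures in this example can be reproduced by inverting the MTBF formula. The sketch below rests on two assumptions not stated on this slide: T_w is taken equal to τ (10 ps), and N synchronizers are treated as multiplying the failure rate by N.

```python
from math import log


def required_settling_taus(mtbf_s, n_sync, t_w, f_c, f_d):
    """Settling time, in units of tau, for a die-level MTBF of mtbf_s seconds."""
    return log(mtbf_s * n_sync * t_w * f_c * f_d)


TAU = 10e-12        # metastability time constant, 10 ps
T_W = 10e-12        # assumed equal to tau
F = 10e9            # clock and data rate, 10 GHz
CYCLE = 1 / F       # 100 ps

taus = required_settling_taus(1e7, 100, T_W, F, F)
cycles = taus * TAU / CYCLE
with_margin = cycles * 1.4   # 40% extra for variability

if __name__ == "__main__":
    print(f"{taus:.1f} tau, {cycles:.1f} cycles, {with_margin:.1f} cycles with margin")
```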


Power futures

Total synchronizer area/power small, BUT τ is very sensitive to voltage/power
– both n and p transistors can turn off at low voltages – no gain

This affects MUTEX circuits as well

[Figure: tau (ps, 0 to 250) against Vdd (0.5 to 2)]


Power/speed tradeoffs

Increase Vdd when synchronisation required

Make synchronizer transistors wide to reduce variation and, to some extent, τ

Make many synchronizer circuits, and select the consistently fastest one

Avoid reducing synchronizer Vdd when running slow


Speculation

Mostly, the synchronizer does not need 35τ to settle

Only e^-10 (0.005%) need more than 10τ

Why not go ahead anyway, and try again if more time was needed?
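The case for speculation can be put in numbers. This is a sketch assuming an exponential tail for the resolution time and a simple retry model in which a failed speculation costs a full safe wait on top of the fast one; both function names are assumptions.

```python
from math import exp


def fraction_unsettled(k):
    """Fraction of events still unresolved after k time constants."""
    return exp(-k)


def expected_latency(k_fast, k_safe):
    """Average wait in units of tau: always k_fast, plus k_safe when speculation fails."""
    return k_fast + fraction_unsettled(k_fast) * k_safe


if __name__ == "__main__":
    print(f"{fraction_unsettled(10):.2e}")
    print(f"{expected_latency(10, 35):.4f}")
```

Under these assumptions the average wait stays very close to 10τ rather than 35τ, because retries are so rare.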


Low latency synchronization

Data Available, or Free to write are produced early
– After one cycle?

If they prove to be in error, synchronization failed
– Only know this after two or more cycles

Read Fail or Write Fail flag is then raised and the action can be repeated.

[Figure: FIFO between a WRITE side (Write clock, Write Data, Free to write, Write Fail, Full) and a READ side (Read Clock, Data Available, Read done, Read Fail, Not Empty), with a speculative synchronizer on each side and DATA in and out]


Comments

Synchronization time will be an issue for future GALS

Latency and throughput can be affected
– Should the flit be large to reduce the effective overhead of time and power?

Some power/speed trade-off is possible
– Higher power synchronization can buy some performance?

Speculation is complex
– Is it worth it?