

© 2002-2009 Ran Ginosar Asynchronous Design and Synchronization 1

VLSI Architectures 048878

Lecture 2:

Theoretical Aspects (S&F 2.5)

Data Flow Structures (S&F 3)

Performance (S&F Ch. 4)


Classification of Async Circuits

• Self-timed (ST)

– Requires some timing assumptions

• Speed-independent (SI)

– Zero (ideal) wire delay, arbitrary gate delay

• Delay-insensitive (DI)

– Arbitrary delays (gates and wires)

• Quasi-delay-insensitive (QDI)

– DI with the Isochronic Fork assumption

– Theoretically equivalent to SI

• SI and DI are mathematically provable


Speed Independence

• A gate (Boolean function) is either:

– Stable, or

– Excited (inputs have changed and the output should also change to satisfy the Boolean function)

• A gate “fires” when its output changes

• An excited gate eventually fires and becomes stable.

• SI means: Firing of one gate must never cause another excited gate to become stable without firing.
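The speed-independence condition above can be checked mechanically for a single circuit state. The following is a minimal Python sketch, not a full verifier: the names `excited` and `si_violations`, the dictionary-based circuit model, and the two-gate example are all assumptions made for illustration, and only one state is examined rather than every reachable state.

```python
def excited(gates, state):
    """Return the set of gates whose output differs from their Boolean function."""
    return {g for g, f in gates.items() if f(state) != state[g]}


def si_violations(gates, state):
    """List (fired, disabled) pairs for one state.

    A pair means: firing gate `fired` makes the excited gate `disabled`
    stable without `disabled` ever firing, which speed independence forbids.
    """
    before = excited(gates, state)
    bad = []
    for g in sorted(before):
        after_state = dict(state)
        after_state[g] = gates[g](state)      # gate g fires
        after = excited(gates, after_state)
        for h in sorted(before - {g}):
            if h not in after:
                bad.append((g, h))
    return bad


# Hypothetical example: x = NOT a, y = a AND x; `a` is a primary input.
gates = {
    "x": lambda s: 1 - s["a"],
    "y": lambda s: s["a"] & s["x"],
}
state = {"a": 1, "x": 1, "y": 0}   # a has just risen: both x and y are excited

print(si_violations(gates, state))   # [('x', 'y')]
```

In this example the inverter and the AND gate race: if x fires first, y becomes stable without ever firing, which is exactly the behavior SI rules out.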


Data Flow Structures

• Abstraction similar to sync RTL

– Likewise, described by either schematics or HDL

• Applies to all (3) handshake protocols, but we assume 4-phase

– Alternating VALID / EMPTY tokens

• Assume handshake latches and handshake-ignorant function blocks

– Recall token flow rules


Abstract Pipeline

• Bubbles

• Tokens

• Valid (0 or 1, who cares) and Empty tokens

• Transparent function blocks (don’t change token flow, only introduce some delays)

E V V E E


Abstract Rings

• 3 stages, 1 bubble:

– 3 steps for token round

– 6 steps to cycle

V E V

V E E

V V E

E V E

token

bubble


Abstract Rings

• 4 stages, 2 bubbles:

– How many steps to cycle ?

• An added latch did not change the function (unlike sync pipe)

V E E V

V V E E

E V V E

V V E E
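The two ring examples can be replayed with a small simulation. This is a sketch under stated assumptions: a latch is treated as holding a bubble when its successor already stores the same token, all enabled latches fire together in one step, and the function names `step` and `steps_to_cycle` are made up for this illustration.

```python
def step(ring):
    """Advance the ring by one step; all enabled latches fire together."""
    n = len(ring)
    nxt = list(ring)
    for i in range(n):
        prev, own, succ = ring[i - 1], ring[i], ring[(i + 1) % n]
        # Latch i holds a bubble when its successor already stores a copy
        # of its token; it may then accept a different token from behind.
        if own == succ and prev != own:
            nxt[i] = prev
    return nxt


def steps_to_cycle(ring, limit=1000):
    """Count steps until the ring returns to its initial marking.

    Returns 0 if nothing can move and None if `limit` steps are not enough.
    """
    start = list(ring)
    state = start
    for count in range(1, limit + 1):
        nxt = step(state)
        if nxt == state:
            return 0          # no bubble: the ring is stuck
        state = nxt
        if state == start:
            return count
    return None


print(steps_to_cycle("VEE"))    # 3 stages, 1 bubble -> 6
print(steps_to_cycle("VVEE"))   # 4 stages, 2 bubbles -> 4
```

Two neighboring latches are never enabled at the same time in this model (one needs equal values, the other different ones), so updating all enabled latches in a single step is safe.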


Building Blocks

• Latch

• Source

• Sink

• Fork

• Join (wait for all)

• Merge (wait for one)

• MUX (inputs 0, 1)

• DEMUX (outputs 0, 1)

• Function Block (Join; CL; Fork)


Example

[Figure: example data flow circuit, not reproduced. The latch markings shown on the slides for steps t0 to t11 are listed below in slide order.]

t0: EE, E, EV, EE, EEV
t1: VE, E, EV, EE, EEV
t2: VV, E, EV, EE, EVE
t3: EV, V, EV, EE, VVE
t4: EE, V, EE, EE, VE
t5: EE, V, VE, VE, VE
t6: EE, E, VE, VV, VE
t7: EE, E, VV, VV, EEV
t8: EE, E, EV, EV, EE
t9: EE, E, EV, EE, EE
t10: EE, E, EV, EE, EEE
t11: EE, E, EV, EE, EE


Another Ring: Simple FSM

[Figure: ring around function block F. Labels: Input, Output, Present State, Next State. Latch markings: E V, E.]


Another Ring: Iterative Computation

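[Figure: ring around function block F with two 0/1 steering blocks. Labels: Input, Output. Latch markings: E E, E.]

Arbitrary piping also works:

[Figure: the same ring with F split into F1, F2, F3. Latch markings: E E.]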


Latches don’t foul the pipe!

• Don’t try this with sync circuits!


IF statement

if <COND> then <TRUE PART> else <FALSE PART>

[Figure: COND steers a 0/1 block in front of TRUE PART and FALSE PART and a second 0/1 block that collects their results.]

Combinational logic, or latches may be added


FOR statement

for <COUNT> do <BODY>

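[Figure: loop around BODY with two 0/1 steering blocks controlled by COUNT. Latch marking: E 0.]

• One handshake at the COUNT input results in COUNT handshakes at its output: the value 1 (COUNT-1) times, followed by a single 0

• Warning: Not all latches are shown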
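The control stream of the FOR construct can be sketched in a few lines of Python. This is an illustration only: `count_tokens` and `run_for` are hypothetical names, handshakes and Empty tokens are abstracted away, and the assumption is that a 1 token sends the BODY result round the loop again while the final 0 token releases it to the output.

```python
def count_tokens(count):
    """Control tokens emitted for one request: (count - 1) ones, then a zero."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [1] * (count - 1) + [0]


def run_for(count, body, value):
    """Apply `body` once per control token; 1 loops back, 0 exits."""
    for ctl in count_tokens(count):
        value = body(value)
        if ctl == 0:
            break   # the last token steers the result to the output
    return value


print(count_tokens(4))                  # [1, 1, 1, 0]
print(run_for(4, lambda x: x + 3, 0))   # 12
```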


WHILE statement

while <COND> do <BODY>

[Figure: loop around BODY with two 0/1 steering blocks controlled by COND. Latch marking: E 0.]


Async GCD

input (a, b);

while a ≠ b do

  if a > b then a := a-b;

  else b := b-a;

output (a);


Async GCD

[Figure: data flow implementation of the GCD loop. Blocks: A≠B (loop condition), A>B (if condition), A-B, B-A, and 0/1 steering blocks. Input: A,B. Output: GCD(A,B). Latch markings: E, E, E 0, E. The "if" part is marked.]
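The GCD loop combines the WHILE and IF constructs, and the control tokens they generate can be traced in software. The sketch below is a behavioral illustration, not a model of the circuit: `gcd_dataflow` is a hypothetical name, the inputs are assumed to be positive integers, and handshakes and Empty tokens are ignored.

```python
def gcd_dataflow(a, b):
    """Return (gcd, while_ctl, if_ctl) for positive integers a and b.

    while_ctl records the loop-condition tokens (1 = iterate, 0 = exit);
    if_ctl records the branch tokens (1 = take a-b, 0 = take b-a).
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    while_ctl, if_ctl = [], []
    while True:
        loop = 1 if a != b else 0      # WHILE condition: a != b
        while_ctl.append(loop)
        if loop == 0:
            return a, while_ctl, if_ctl
        sel = 1 if a > b else 0        # IF condition: a > b
        if_ctl.append(sel)
        if sel == 1:
            a = a - b
        else:
            b = b - a


print(gcd_dataflow(12, 18))   # (6, [1, 1, 0], [0, 1])
```

As in the FOR construct, the loop condition produces a run of 1 tokens closed by a single 0, but here the length of the run depends on the data.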


Performance

• Sync performance analysis is simple:

– Check all register-to-register paths

– Static Timing Analysis

– Dynamic simulations only check correctness, not performance

• Async performance analysis is COMPLEX:

– Many cycles

– Data dependent delays

– Dependency on environment and initialization

– Not guaranteed to have a solution

• We will only consider simple examples…

– Qualitative, then quantitative


FIFO Performance: 2N on 2N

Successive FIFO states (tokens move to the right):

E 3 E 2 E 1
E 3 E 2 E 1
E 3 E 2 E E
E 3 E 2 2 E
E 3 E E 2 E
E 3 3 E 2 E
E E 3 E 2 E
4 E 3 E 2 E
4 E 3 E 2 E


FIFO Performance

• 2N=6 tokens (N Valid, N Empty)

• 2N=6 latches

• 2N=6 steps to move all tokens one step to the right

• But is it the best we can do with 2N latches? Let’s try a fast sink.


FIFO Performance: Fast sink (N on 2N)

[Figure: successive states of the 2N-latch FIFO draining into a fast sink, with tokens 2 to 5 spreading out along the pipeline.]


FIFO Performance

• Fast sink:

– Tokens spread out

– Bubble every other stage

– Only N tokens in 2N stages

– One step to move every token to the right

• Let’s try to add stages (same # of tokens)


FIFO Performance: 2N on 3N

[Figure: successive states of a 3N-stage FIFO carrying 2N tokens.]

• 3N=9 stages

• 2N tokens (N Valid, N Empty) + N bubbles

• Only 2 steps to move every token one stage to the right
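The three FIFO cases (2N steps, 1 step, 2 steps) can be reproduced with a rough simulation. This sketch makes several assumptions that are not in the slides: the steady-state FIFO is approximated by a ring of latches, a latch counts as a bubble when its successor holds the same token, all enabled latches fire together, and the "2N on 2N" case is modeled as a ring with one extra latch standing in for the slot freed by the sink. The names `step` and `steps_per_shift` are made up for the illustration.

```python
def step(ring):
    """One step on a ring of latches; returns (new_ring, number_of_moves)."""
    n = len(ring)
    nxt = list(ring)
    moves = 0
    for i in range(n):
        prev, own, succ = ring[i - 1], ring[i], ring[(i + 1) % n]
        if own == succ and prev != own:   # bubble with a new token behind it
            nxt[i] = prev
            moves += 1
    return nxt, moves


def steps_per_shift(ring, horizon=60):
    """Average number of steps needed to move every token one stage."""
    n = len(ring)
    # A latch holds a token when its successor has not yet copied it.
    tokens = sum(1 for i in range(n) if ring[i] != ring[(i + 1) % n])
    state, total = list(ring), 0
    for _ in range(horizon):
        state, moves = step(state)
        total += moves
    if total == 0:
        raise ValueError("no bubbles: tokens cannot move")
    return tokens * horizon / total


print(steps_per_shift("VEVEVEE"))     # 6 tokens, 1 bubble  -> 6.0
print(steps_per_shift("VVEEVVEE"))    # 4 tokens, 4 bubbles -> 1.0
print(steps_per_shift("VVEVVEVVE"))   # 6 tokens, 3 bubbles -> 2.0
```

In each of these evenly spread cases the result equals the number of tokens divided by the number of bubbles, which matches the counts quoted on the slides.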


Shift Register + Parallel Load

• CTL=0 token:

– Parallel load

– Old values to sink latches

– Valid din[0] to output

• CTL=1 token:

– Shift right

– Valid token output

• CTL=Empty token

– Shift right

– Empty token output

• Two performance issues:

– Too few bubbles

– High fanout on CTL

• Time

• Large C-element for ACK

[Figure: three-stage shift register holding d3, d2, d1, with a 0/1 MUX per stage. Inputs: din[3], din[2], din[1], din[0]. Output: do. Control: ctl.]
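The effect of the valid CTL tokens listed above can be sketched behaviorally. This is an illustration under assumptions: `apply_ctl` is a hypothetical name, Empty tokens and handshakes are left out, and the value shifted in on the left (`fill`) is assumed to be 0.

```python
def apply_ctl(stages, ctl, din=None, fill=0):
    """Apply one valid CTL token to the shift register.

    stages: stage contents, leftmost first, e.g. [d3, d2, d1].
    Returns (new_stages, value sent to the output).
    """
    if ctl == 0:                      # parallel load; old values are discarded
        if din is None or len(din) != len(stages) + 1:
            raise ValueError("din must supply one value per stage plus din[0]")
        new_stages = [din[i] for i in range(len(stages), 0, -1)]
        return new_stages, din[0]     # din[0] goes straight to the output
    if ctl == 1:                      # shift right; rightmost stage is output
        return [fill] + stages[:-1], stages[-1]
    raise ValueError("ctl must be 0 or 1")


stages, out = apply_ctl([None, None, None], 0, din=["d0", "d1", "d2", "d3"])
print(stages, out)    # ['d3', 'd2', 'd1'] d0
stages, out = apply_ctl(stages, 1)
print(stages, out)    # [0, 'd3', 'd2'] d1
```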


Shift Register + Parallel Load

• Buffers added in the CTL path

– Solves both issues together

[Figure: the shift register shown twice, first with ctl driving all stages directly and then with buffer latches (marked E E E) inserted in the ctl path.]


Parallel Load (CTL=0)

[Figure: three frames of the buffered shift register as a CTL=0 token advances along the ctl path. The stages go from holding 0 to holding d1, and finally d3, d2, d1. Annotation: Enabled, move not shown.]


din, CTL Empty (slow consumer)

[Figure: two frames after the load. din and ctl return to Empty, the stages keep d3, d2, d1, and the output stream is E, do.]


Slow Shift (CTL=1; CTL=E)

[Figure: three frames of a slow shift. A CTL=1 token followed by an Empty token passes along the ctl path; the stage contents go from d3, d2, d1 to 0, d3, d2, and the output stream grows from d1, E, do to E, d1, E, do.]


Fast Shift (CTL=1;E;1;E…)

[Figure: four frames of a fast shift. Alternating 1 and E tokens fill the ctl path; the stage contents shift right one stage at a time, and the output stream grows from d1, E, do to E, d2, E, d1, E, do.]


Shift Register Behavior

• Dynamic, depends on relative timing of Consumer and the shifter

• Every stage has 2 tokens

• Slow consumer: CTL has bubbles

• Fast consumer: CTL has tokens

• Nothing (V,E) moves without a CTL token