Equalization in LID and the Central Repetitive Problem

MEMOCODE’06 - 29th July – Napa. Julien BOUCARON, Robert de SIMONE, Jean Vivien MILLO. AOSTE Team.


Page 1: Equalization in LID and the Central Repetitive Problem

MEMOCODE’06 - 29th July – Napa

Julien BOUCARON, Robert de SIMONE, Jean Vivien MILLO

AOSTE Team

Page 2: General Motivations

Single clock design of SoC soon no longer feasible.
• Due to increasing size and density
– Long wire latencies.
– Clock Tree propagation issues.
– Reaching Timing Closure when assembling IPs.

GALS Models & Latency Insensitive Design (LID)

• Techniques need to be developed for analysis/optimization/synthesis on such models.

• We focus on correct static scheduling and efficient hardware implementation for LID.

Page 3: Outline

• Preliminaries
– Basic Synchronous/Asynchronous Models
– Introducing latencies
• Dynamic Scheduling mechanisms for LID
• Static Scheduling & K-Periodic behavior: Central Repetitive Problem
• Our contributions: Static Periodic Scheduling Mechanisms
– Latency-based Equalization process.
– Fractional Registers for residual latencies.
– KPassa tool & Experiments.
• Conclusion & Further Topics

Page 4: Common Basis: Computation Network Scheme

Computation Nodes (CN) and Data Link (DL) communication arcs.

[Figure: a network of CN nodes connected by DL arcs]

Intuitive (incomplete) semantics:
• CN nodes consume/produce data on all input/output DL arcs.
• Data values abstracted as tokens (data present/absent).
• No conflict choice: each link has one source and one target.

Various Models obtained by specializing
– Firing Rule (Sync, Async)
– Nature of DL Buffering (Capacity, Latency)

Page 6: Main Models

1. Synchronous
– All computations simultaneous
– Data links: wires or unit delay/capacity registers/latches
– Correct if at least one latch in each loop

2. Asynchronous (Marked/Event Graphs)
– Independent computation triggering
– Data links: unbounded buffers
– Correct if at least one token in each loop

[Commoner, Holt, Even & Pnueli 1971]
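The "at least one token in each loop" condition can be checked without enumerating loops: it holds exactly when the arcs that carry no token form an acyclic graph. The sketch below illustrates this under an assumed encoding (arcs as `(source, target, tokens)` tuples); the function name and the encoding are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def every_loop_has_token(arcs):
    """arcs: iterable of (source, target, tokens) tuples (assumed encoding).

    Returns True when every directed loop carries at least one token,
    i.e. when the subgraph made of token-free arcs has no cycle.
    """
    succ = defaultdict(list)
    nodes = set()
    for src, dst, tokens in arcs:
        nodes.update((src, dst))
        if tokens == 0:
            succ[src].append(dst)  # keep only the token-free arcs

    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def has_cycle_from(start):
        # Iterative depth-first search; meeting a GREY node closes a cycle.
        color[start] = GREY
        stack = [(start, iter(succ[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GREY:
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
        return False

    return not any(color[n] == WHITE and has_cycle_from(n) for n in list(nodes))
```

The same check covers the synchronous case if `tokens` is read as the number of latches on the link.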

Page 8: Variation on Async Model

Marked Graphs with capacities
– Data links: finite capacity buffers
– New issue: buffer full

Can be reduced to previous asynchronous unbounded buffers case by adding reverse back-pressure arcs, avoiding congestion.
– Correct if at least one token in each loop of the completed graph.

[Figure: a link of capacity N holding K tokens is replaced by an unbounded forward arc holding K tokens and an unbounded reverse arc holding N-K tokens]
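The completion step can be sketched as follows: each bounded link keeps its tokens on the forward arc and gains a reverse arc whose tokens count the free slots. The tuple encoding and the function name are assumptions made for illustration only.

```python
def complete_with_backpressure(arcs):
    """arcs: list of (source, target, tokens, capacity) tuples (assumed encoding).

    Returns a list of unbounded arcs (source, target, tokens): the forward
    arc keeps its K tokens, the added reverse arc holds N - K tokens.
    """
    completed = []
    for src, dst, tokens, capacity in arcs:
        if not 0 <= tokens <= capacity:
            raise ValueError(f"link {src}->{dst}: tokens must be within capacity")
        completed.append((src, dst, tokens))             # forward data arc
        completed.append((dst, src, capacity - tokens))  # reverse back-pressure arc
    return completed
```

For a link of capacity 2 holding 1 token, `complete_with_backpressure([("A", "B", 1, 2)])` yields a forward arc with 1 token and a reverse arc with 1 token.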

Page 9: Introducing arc latencies

Latency = Duration <> Capacity

Defined from
• Asynchronous models: Timed Marked Graph theory (C. RAMCHANDANI 1973)
• Synchronous models: Latency Insensitive Design theory (CARLONI, SANGIOVANNI-VINCENTELLI, MCMILLAN 1999)

• ASAP semantics (Synchronous in Nature)
• All CN that may fire, do so.

Some CN may idle because some tokens are unavailable due to different latencies.

[Figure: example network with arc latencies labeled 3 and 1]
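The ASAP rule can be illustrated with a small simulator. This is only a sketch under stated assumptions: arcs are `(source, target, latency, initial_tokens)` tuples, latencies are at least 1, initial tokens are available at instant 0, and a node without input arcs fires at every instant. None of these names or conventions come from the slides.

```python
def asap_schedule(arcs, steps):
    """Simulate ASAP firing and return {node: firing pattern as a 0/1 string}.

    arcs: list of (source, target, latency, initial_tokens), latency >= 1
    (assumed encoding). A token produced at instant t on an arc of
    latency L can be consumed from instant t + L on.
    """
    nodes = sorted({n for src, dst, _, _ in arcs for n in (src, dst)})
    # For each arc, remaining travel time of every token it carries.
    in_flight = [[0] * tokens for _, _, _, tokens in arcs]
    pattern = {n: "" for n in nodes}
    for _ in range(steps):
        # A node may fire when each of its input arcs holds an arrived token.
        ready = {
            n for n in nodes
            if all(0 in in_flight[i] for i, arc in enumerate(arcs) if arc[1] == n)
        }
        for i, (src, dst, latency, _) in enumerate(arcs):
            if dst in ready:
                in_flight[i].remove(0)        # consume one arrived token
        for i, (src, dst, latency, _) in enumerate(arcs):
            if src in ready:
                in_flight[i].append(latency)  # produce one token
        # One instant elapses for every token still travelling.
        in_flight = [[max(0, t - 1) for t in tokens] for tokens in in_flight]
        for n in nodes:
            pattern[n] += "1" if n in ready else "0"
    return pattern
```

On a two-node ring with one token, raising the latency of one arc from 1 to 2 changes the pattern of node A from `1010...` to `100100...`: the node idles while the token is still travelling.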

Page 10: Variation on latencies

CN computation latency can be represented by transportation latency.

Can also deal with pipelined computations:
– Latency: time needed from input to output
– Delay: time needed between successive inputs

[Figure: a node Comp is split into nodes b.Comp and e.Comp linked by an arc of latency L; the pipelined variant also carries a delay D]

Page 11: Expanding latencies

Extra Transportation Nodes (TN) can be explicitly introduced to expand latencies into unit-time travel sections.

Tokens only travel from a buffer to the next in one unit of time.

Transportation Nodes are similar to Computation Nodes.

[Figure: an arc of latency 3 expanded into unit-latency arcs separated by TN nodes]
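The expansion can be sketched as a graph rewrite: an arc of latency L becomes L unit-latency arcs separated by L - 1 transportation nodes. The tuple encoding and the `TN_...` naming scheme are hypothetical, chosen only for this illustration.

```python
def expand_latencies(arcs):
    """arcs: list of (source, target, latency) with latency >= 1 (assumed encoding).

    Returns unit-latency arcs (source, target); each arc of latency L is
    replaced by a chain going through L - 1 freshly named TN nodes.
    """
    unit_arcs = []
    for src, dst, latency in arcs:
        if latency < 1:
            raise ValueError(f"arc {src}->{dst}: latency must be at least 1")
        prev = src
        for i in range(1, latency):
            tn = f"TN_{src}_{dst}_{i}"  # hypothetical naming scheme
            unit_arcs.append((prev, tn))
            prev = tn
        unit_arcs.append((prev, dst))
    return unit_arcs
```

An arc of latency 3 from A to B becomes `A -> TN_A_B_1 -> TN_A_B_2 -> B`, while an arc of latency 1 is left unchanged.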

Page 12: Latency Insensitive Design

Hardware Implementation:
• Relay Stations (RS) for buffering.
• Shell wrappers (SW) around CN: local clock gating.

RS Behavior:
• IDEA: hold its token (if any) when congestion ahead.
• BUT: cannot warn its predecessor.
• THUS may simultaneously receive a second token.
• THEN: will signal congestion at next instant (no new token).

NEEDS buffers of capacity 2 in between computation/transport nodes.

Page 13: Central repetitive problem

• The ASAP firing rule is deterministic (disregarding free primary inputs, otherwise their flow should be modeled).

• Global behavior ultimately k-periodic [Carlier, Chretienne 1987]
– each node fires according to a pattern init.(periodic)*, ex. 1011.(01011)*
– all nodes have the same throughput (inherited from the slowest cycle).

• periodicity k (occ. of 1) on a period of length p
• throughput = k/p (here 3/5).
• $p = \operatorname{lcm}_{\text{critical SCCs}} \bigl( \gcd_{\text{cycles in critical SCC}} (\text{latency}) \bigr)$ [Baccelli, Cohen, Quadrat et al. 1992]
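As a small worked example, the values k, p and the throughput k/p can be read off a pattern written in the `init.(periodic)*` notation used above. The helper name is hypothetical; only the periodic part matters for the throughput.

```python
import re
from fractions import Fraction

def periodic_throughput(pattern):
    """pattern: string such as '1011.(01011)*' (the dot is optional).

    Returns (k, p, k/p), where k is the number of 1s in the periodic
    part and p is its length.
    """
    match = re.fullmatch(r"([01]*)\.?\(([01]+)\)\*", pattern)
    if match is None:
        raise ValueError(f"not an init.(periodic)* pattern: {pattern!r}")
    periodic = match.group(2)
    k = periodic.count("1")
    p = len(periodic)
    return k, p, Fraction(k, p)
```

For the pattern `1011.(01011)*` this gives k = 3, p = 5 and a throughput of 3/5.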

Page 15: Our goal

• Provide a hardware implementation for static schedules (preserving original throughput).

• Do this by adding
– extra virtual integer latencies, to equalize as much as possible the throughputs amongst concurrent data paths.
– for residual differences: Fractional Buffers (see next example).

• Notice:
– This provides specific schedules (tokens are rather evenly spread).
– Extra virtual latencies can be used as computation latencies for re-synthesis.

Page 16: Equalization example (1/3)

• 1) List all elementary cycles
• 2) Compute all throughputs, find critical cycles
• Here: 2 cycles
– C1: 2 tokens / 2 latencies
– C2: 3 tokens / 5 latencies
– 3/5 < 2/2 so C2 is critical.

[Figure: example graph with nodes A, B, D, E, F and cycles C1 and C2]
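Step 2 amounts to comparing tokens/latency ratios and keeping the smallest. A minimal sketch, assuming the cycles have already been enumerated and are given as a name-to-`(tokens, latency)` mapping (a hypothetical encoding):

```python
from fractions import Fraction

def critical_cycles(cycles):
    """cycles: {name: (tokens, latency)} (assumed encoding).

    Returns (slowest throughput, sorted names of the cycles reaching it).
    """
    throughput = {
        name: Fraction(tokens, latency)
        for name, (tokens, latency) in cycles.items()
    }
    slowest = min(throughput.values())
    critical = sorted(name for name, t in throughput.items() if t == slowest)
    return slowest, critical
```

With the two cycles of the example, `critical_cycles({"C1": (2, 2), "C2": (3, 5)})` returns the throughput 3/5 and the single critical cycle C2. Exact fractions avoid rounding issues when two cycles have equal throughput.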

Page 19: Equalization example (2/3)

• 3) Add virtual latencies to fast cycles, but not too much
• The red arc is the only possible location
• Adding a unitary latency: 2/(2+1) = 2/3 > 3/5 (still OK)
• Adding a second latency is too much: 2/(3+1) = 2/4 < 3/5 (KO). It would not preserve the global throughput.

[Figure: example graph with nodes A, B, C, D, E, F; cycle C2 highlighted]
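The arithmetic above generalizes to a bound for one cycle taken in isolation: the largest integer x such that tokens / (latency + x) stays at or above the critical throughput. The helper below is a sketch with a hypothetical name; it ignores arcs shared between several cycles, which is what the solver-based step handles.

```python
from fractions import Fraction

def max_extra_latency(tokens, latency, critical):
    """Largest integer x with tokens / (latency + x) >= critical.

    critical: the critical throughput, as a Fraction.
    """
    # tokens / (latency + x) >= critical  <=>  latency + x <= tokens / critical
    bound = Fraction(tokens) / critical
    return max(0, bound.numerator // bound.denominator - latency)
```

For C1 (2 tokens, 2 latencies) against the critical throughput 3/5, the bound is 10/3, so exactly one latency can be added.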

Page 20: Equalization example (2/3)

• 3) Add virtual latencies to fast cycles, but not too much
• The red arc is the only possible location
• Adding a unitary latency: 2/(2+1) = 2/3 > 3/5 (still OK)
• Still 2/3 <> 3/5, so “fractional” buffering is needed at one place

[Figure: example graph with nodes A, B, C, D, E, F and one Fractional Register (FR)]

Page 21: Fractional Register behavior

• Acts as combinatorial wire when not hold.
• Acts as register latch when hold, keeping
– a single token several steps, or
– a token sequence in a row (each one once)
• Correctness property: no Val_out is sent when recipient not ready (by static schedule)

Latches + FRs provide expressiveness of Relay Stations.
But the goal is to add them only where needed.

[Figure: FR with input Val_in, output Val_out and internal reg, shown in the "not hold" state]

Page 23: Fractional Register behavior

• Acts as combinatorial wire when not hold.
• Acts as register latch when hold, keeping
– a single token several steps, or
– a token sequence in a row (each one once)
• Global Correctness property (of static schedule): no Val_out is sent when recipient not ready

FR in each section provides expressiveness of Relay Stations.
But the goal is to add them only where needed after saturation by integer latencies.

[Figure: FR with input Val_in, output Val_out and internal reg, shown in the "hold" state]

Page 25: Equalization example (3/3)

• 4) Symbolic Simulation provides explicit schedules and Fractional Registers

001101(01101)*

000010(00000)*

011100(11010)*

011000(10000)*

110011(01011)*

[Figure: example graph with nodes A, B, C, D, E, F and Fractional Registers FR1 and FR2, annotated with the schedules above]

N-Synchronous Processes [POPL’06]

Page 26: Equalization Algorithmic steps

1. Enumerate all Elementary Cycles.
2. Compute each cycle throughput (k/p), find Critical cycles.
3. Add integer latencies to non-critical arcs (as many as possible). Use a LP Solver (needs all cycles of step 1).
4. Simulate to build the k-periodic schedule and place the extra Fractional Registers and their schedules (hold).
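One possible way to write step 3 as an optimization problem is sketched below. The notation is an assumption, not taken from the slides: $\ell_a$ is the latency of arc $a$, $x_a$ the number of virtual latencies added to it, $m_c$ the number of tokens on elementary cycle $c$, and $\theta$ the critical throughput. The objective (total added latency) is likewise only one plausible reading of "as many as possible".

```latex
\begin{align*}
\text{maximize}   \quad & \sum_{a} x_a \\
\text{subject to} \quad & \sum_{a \in c} (\ell_a + x_a) \le \frac{m_c}{\theta}
                          && \text{for every elementary cycle } c \\
                        & x_a \in \mathbb{N}
                          && \text{for every arc } a
\end{align*}
```

Each constraint states that cycle $c$ keeps a throughput of at least $\theta$. For a cycle with 2 tokens, 2 latencies and $\theta = 3/5$, it reads $2 + x \le 10/3$, hence $x \le 1$. Since the $x_a$ are integers this is strictly an integer program; how the tool actually encodes it for the solver is not described in the slides.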

Page 27: KPassa Tool

• Pronounced “Que passa”, for K-Periodic As-Soon-as-possible Scheduling and Analysis.
– Written in JAVA®.
– Uses ILOG® CPLEX® LP Solver to add virtual latencies.
– Uses INRIA Mascopt Lib (Graph algorithms): www-sop.inria.fr/mascotte/mascopt/

Page 29: KPassa Tool

Relatively Time Efficient, but Space Efficiency can be greatly improved.

Page 30: Wrong conjecture (paper)

Conjecture: FR elements are needed only where faster cycles merge into slower cycles (because there is less than one full latency difference).

Recently found a non-trivial counter-example, using KPassa: two loops of respective throughput 8/22 and 5/13.

Conjecture true under evenly spread token distribution (see proof in paper)

Page 31: Future Work

• Study efficient asynchronous initialization
– so that smooth periodic regimes are met fast
– so number of required FR is minimized
• Optimize allocation of virtual latencies
• Avoid Power Peaks
– Distribute computations evenly (recycling)
• Improve accuracy of our conjecture
– Smoothness property

Page 32: Thank you!

Any questions?

Page 33: Historical biblio recap

• [Reiter, 1968]: the system is limited to the throughput of its slowest cycle component.

• [Carlier, Chretienne 1987]: the system under ASAP rule is actually ultimately k-periodic with the throughput of its slowest cycle.

• [Baccelli, Cohen, Quadrat et al. 1992]: value of p

$p = \operatorname{lcm}_{\text{critical SCCs}} \bigl( \gcd_{\text{cycles in critical SCC}} (\text{latency}) \bigr)$
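The formula for p is a plain lcm-of-gcds computation. A minimal sketch, assuming the critical strongly connected components are already known and each is given as a non-empty list of the latencies of its cycles (the function name and input encoding are hypothetical):

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def period_length(critical_sccs):
    """critical_sccs: non-empty list of non-empty lists of cycle latencies,
    one inner list per critical SCC (assumed encoding).

    Returns lcm over the SCCs of (gcd over the cycle latencies of the SCC).
    """
    return reduce(lcm, (reduce(gcd, latencies) for latencies in critical_sccs))
```

With a single critical cycle of latency 5, `period_length([[5]])` is 5, consistent with the period of length 5 in the pattern `(01011)*` quoted earlier.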

Page 34: Latency Insensitive Design

[Figure: chain of Relay Stations (RS) feeding a Shell Wrapper (SW); labels: RS FULL, SEND STOP!, HOLD 2 DATA, CANNOT SEND STOP]

Hardware Implementation:
• Relay Stations for buffering.
• Shell wrappers around CN: local clock gating.

RS Behavior:
• IDEA: hold its token (if any) when congestion.
• BUT: cannot warn its predecessor.
• THUS may simultaneously receive a second token.
• THEN: will signal congestion at next instant (no new token).

NEEDS buffers of capacity 2 in between computation/transport nodes.
