discrete event

90
1 Discrete Event Discrete Event Explicit notion of time (global order…) Explicit notion of time (global order…) DE simulator maintains a global event DE simulator maintains a global event queue (Verilog and VHDL) queue (Verilog and VHDL) Drawbacks Drawbacks global event queue => tight coordination global event queue => tight coordination between parts between parts Simultaneous events => non-deterministic Simultaneous events => non-deterministic behavior behavior Some simulators use delta delay to Some simulators use delta delay to prevent non-determinacy prevent non-determinacy

Upload: risa

Post on 22-Jan-2016

44 views

Category:

Documents


0 download

DESCRIPTION

Discrete Event. Explicit notion of time (global order…) DE simulator maintains a global event queue (Verilog and VHDL) Drawbacks global event queue => tight coordination between parts Simultaneous events => non-deterministic behavior Some simulators use delta delay to prevent non-determinacy. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Discrete Event

1

Discrete EventDiscrete Event

Explicit notion of time (global order…)Explicit notion of time (global order…)

DE simulator maintains a global event queue (Verilog and DE simulator maintains a global event queue (Verilog and

VHDL)VHDL)

DrawbacksDrawbacksglobal event queue => tight coordination between partsglobal event queue => tight coordination between parts

Simultaneous events => non-deterministic behaviorSimultaneous events => non-deterministic behavior

Some simulators use delta delay to prevent non-Some simulators use delta delay to prevent non-

determinacydeterminacy

Page 2: Discrete Event

2

Simultaneous Events in DE Simultaneous Events in DE

AA BB CCtt

tt

Fire B or C?Fire B or C?

AA BB CC

tt

AA BB CC

tt

tt

B has 0 delayB has 0 delay B has delta delayB has delta delay

Fire C once? or twice?Fire C once? or twice?

t+t+

Fire C twice.Fire C twice.

Still have problem with 0-delay Still have problem with 0-delay (causality) loop(causality) loop

Can be refinedCan be refined

E.g. introduce timing constraintsE.g. introduce timing constraints

(minimum reaction time 0.1 s)(minimum reaction time 0.1 s)

Page 3: Discrete Event

3

OutlineOutline

Synchrony and asynchronySynchrony and asynchrony

CFSM definitionsCFSM definitionsSignals & networksSignals & networks

Timing behaviorTiming behavior

Functional behaviorFunctional behavior

CFSM & process networksCFSM & process networks

Example of CFSM behaviorsExample of CFSM behaviorsEquivalent classesEquivalent classes

Page 4: Discrete Event

4

Codesign Finite State MachineCodesign Finite State Machine

Underlying MOC of PolisUnderlying MOC of Polis

Combine aspects from several other MOCsCombine aspects from several other MOCs

Preserve formality and efficiency in implementationPreserve formality and efficiency in implementation

Mix Mix synchronicitysynchronicity

zero and infinite timezero and infinite time

asynchronicityasynchronicity non-zero, finite, and bounded timenon-zero, finite, and bounded time

Embedded systems often contain both aspectsEmbedded systems often contain both aspects

Page 5: Discrete Event

5

Synchrony: Basic OperationSynchrony: Basic Operation

Synchrony is often implemented with clocksSynchrony is often implemented with clocks

At clock ticksAt clock ticksModule reads inputs, computes, and produce outputModule reads inputs, computes, and produce output

All synchronous events happen simultaneouslyAll synchronous events happen simultaneously

Zero-delay computationsZero-delay computations

Between clock ticksBetween clock ticks Infinite amount of time passedInfinite amount of time passed

Page 6: Discrete Event

6

Synchrony: Basic Operation (2)Synchrony: Basic Operation (2)

Practical implementation of synchronyPractical implementation of synchrony Impossible to get zero or infinite delayImpossible to get zero or infinite delay

Require: computation time <<< clock periodRequire: computation time <<< clock period

Computation time = 0, w.r.t. reaction time of environmentComputation time = 0, w.r.t. reaction time of environment

Feature of synchronyFeature of synchronyFunctional behavior independent of timingFunctional behavior independent of timing

Simplify verificationSimplify verification

Cyclic dependencies may cause problemCyclic dependencies may cause problem Among (simultaneous) synchronous eventsAmong (simultaneous) synchronous events

Page 7: Discrete Event

7

Synchrony: Synchrony: Triggering and OrderingTriggering and Ordering

All modules are triggered at each clock tickAll modules are triggered at each clock tick

Simultaneous signalsSimultaneous signalsNo a priori orderingNo a priori ordering

Ordering may be imposed by dependenciesOrdering may be imposed by dependencies Implemented with delta stepsImplemented with delta steps

computation

continuous time

ticks

delta steps

Page 8: Discrete Event

8

Synchrony: System SolutionSynchrony: System Solution

System solutionSystem solutionOutput reaction to a set of inputsOutput reaction to a set of inputs

Well-designed system:Well-designed system: Is completely specified and functionalIs completely specified and functionalHas an unique solution at each clock tickHas an unique solution at each clock tick Is equivalent to a single FSMIs equivalent to a single FSMAllows efficient analysis and verificationAllows efficient analysis and verification

Well-design-ness Well-design-ness May need to be checked for each design (Esterel)May need to be checked for each design (Esterel)

Cyclic dependency among simultaneous events Cyclic dependency among simultaneous events

Page 9: Discrete Event

9

Synchrony: Synchrony: Implementation CostImplementation Cost

Must verify synchronous assumption on final designMust verify synchronous assumption on final designMay be expensiveMay be expensive

Examples:Examples:HardwareHardware

Clock cycle > maximum computation timeClock cycle > maximum computation time Inefficient for average case

Software Software Process must finish computation beforeProcess must finish computation before

New input arrival Another process needs to start computation

Page 10: Discrete Event

10

Asynchrony: Basic OperationAsynchrony: Basic Operation

Events are never simultaneousEvents are never simultaneousNo two events have the same tagNo two events have the same tag

Computation starts at a change of the inputComputation starts at a change of the input

Delays are arbitrary, but boundedDelays are arbitrary, but bounded

Page 11: Discrete Event

11

Asynchrony: Triggering and OrderingAsynchrony: Triggering and Ordering

Each module is triggered to run at a change of inputEach module is triggered to run at a change of input

No a priori ordering among triggered modulesNo a priori ordering among triggered modulesMay be imposed by scheduling at implementationMay be imposed by scheduling at implementation

Page 12: Discrete Event

12

Asynchrony: System SolutionAsynchrony: System Solution

Solution strongly dependent on input timingSolution strongly dependent on input timing

At implementationAt implementationEvents may “appear” simultaneousEvents may “appear” simultaneous

Difficult/expensive to maintain total orderingDifficult/expensive to maintain total ordering Ordering at implementation decides behaviorOrdering at implementation decides behavior Becomes DE, with the same pitfallsBecomes DE, with the same pitfalls

Page 13: Discrete Event

13

Asynchrony: Asynchrony: Implementation CostImplementation Cost

Achieve low computation time (average)Achieve low computation time (average)Different parts of the system compute at different ratesDifferent parts of the system compute at different rates

Analysis is difficultAnalysis is difficultBehavior depends on timingBehavior depends on timing

Maybe be easier for designs that are insensitive to Maybe be easier for designs that are insensitive to Internal delayInternal delay External timingExternal timing

Page 14: Discrete Event

14

Asynchrony vs. Synchrony in System DesignAsynchrony vs. Synchrony in System Design

They are different at least atThey are different at least atEvent bufferingEvent buffering

Timing of event read/writeTiming of event read/write

AsynchronyAsynchronyExplicit buffering of events for each moduleExplicit buffering of events for each module

Vary and unknown at start-timeVary and unknown at start-time

SynchronySynchronyOne global copy of eventOne global copy of event

Same start time for all modulesSame start time for all modules

Page 15: Discrete Event

15

Combining Combining Synchrony and AsynchronySynchrony and Asynchrony

Wants to combineWants to combineFlexibility of asynchrony Flexibility of asynchrony

Verifiability of synchrony Verifiability of synchrony

AsynchronyAsynchronyGlobally, a timing independent style of thinking Globally, a timing independent style of thinking

SynchronySynchronyLocal portion of design are often tightly synchronizedLocal portion of design are often tightly synchronized

Globally asynchronous, locally synchronousGlobally asynchronous, locally synchronousCFSM networksCFSM networks

Page 16: Discrete Event

16

CFSM OverviewCFSM Overview

CFSM is FSM extended withCFSM is FSM extended withSupport for data handlingSupport for data handling

Asynchronous communicationAsynchronous communication

CFSM hasCFSM hasFSM partFSM part

Inputs, outputs, states, transition and output relationInputs, outputs, states, transition and output relation

Data computation partData computation part External, instantaneous functionsExternal, instantaneous functions

Page 17: Discrete Event

17

CFSM Overview (2)CFSM Overview (2)

CFSM has:CFSM has:Locally synchronous behaviorLocally synchronous behavior

CFSM executes based on snap-shot input assignmentCFSM executes based on snap-shot input assignment Synchronous from its own perspectiveSynchronous from its own perspective

Globally asynchronous behaviorGlobally asynchronous behavior CFSM executes in non-zero, finite amount of timeCFSM executes in non-zero, finite amount of time Asynchronous from system perspectiveAsynchronous from system perspective

GALS modelGALS modelGlobally: Scheduling mechanismGlobally: Scheduling mechanism

Locally: CFSMsLocally: CFSMs

Page 18: Discrete Event

12/09/1999 18

Network of CFSMs: Depth-1 BuffersNetwork of CFSMs: Depth-1 Buffers

CFSM2

CFSM3

C=>G

CFSM1

C=>FB=>C

F^(G==1)

(A==0)=>B

C=>ACFSM1 CFSM2

C=>B

F

G

C

C

BA

C=>G

C=>B

Globally Asynchronous, Locally Synchronous (GALS)

model

Page 19: Discrete Event

19

Introducing a CFSMIntroducing a CFSM

A Finite State MachineA Finite State Machine

Input events, output events and Input events, output events and statestate events events

Initial values (for state events)Initial values (for state events)

A transition functionA transition functionTransitions may involve Transitions may involve complex, memory-less, instantaneouscomplex, memory-less, instantaneous

arithmetic and/or Boolean functionsarithmetic and/or Boolean functions

All the state of the system is under form of eventsAll the state of the system is under form of events

Need rules that define the CFSM behaviorNeed rules that define the CFSM behavior

Page 20: Discrete Event

20

CFSM Rules: phasesCFSM Rules: phases

Four-phase cycle:Four-phase cycle: IdleIdle Detect input eventsDetect input events Execute one transitionExecute one transition Emit output eventsEmit output events

Discrete timeDiscrete timeSufficiently accurate for synchronous systemsSufficiently accurate for synchronous systemsFeasible formal verificationFeasible formal verification

Model semantics: Model semantics: Timed Traces Timed Traces i.e. sequences of events i.e. sequences of events labeled by time of occurrencelabeled by time of occurrence

Page 21: Discrete Event

21

CFSM Rules: phasesCFSM Rules: phases

Implicit Implicit unbounded delayunbounded delay between phases between phases

Non-zeroNon-zero reaction time reaction time (avoid (avoid inconsistenciesinconsistencies when interconnected) when interconnected)

Causal Causal model based on model based on partial orderpartial order (global asynchronicity)(global asynchronicity)potential verification speed-uppotential verification speed-up

Phases Phases may not overlapmay not overlap

Transitions always Transitions always clear input buffersclear input buffers(local synchronicity)(local synchronicity)

Page 22: Discrete Event

22

Communication PrimitivesCommunication Primitives

SignalsSignalsCarry information in the form of events and/or valuesCarry information in the form of events and/or values

Event signals: present/absenceEvent signals: present/absence Data signals: arbitrary valuesData signals: arbitrary values

Event, data may be paired

Communicate between two CFSMsCommunicate between two CFSMs 1 input buffer / signal / receiver1 input buffer / signal / receiver

Emitted by a sender CFSMEmitted by a sender CFSM

Consumed by a receiver CFSM by setting buffer to 0Consumed by a receiver CFSM by setting buffer to 0

““Present” if emitted but not consumedPresent” if emitted but not consumed

Page 23: Discrete Event

23

Communication Primitives (2)Communication Primitives (2)

Input assignmentInput assignmentA set of values for the input signals of a CFSMA set of values for the input signals of a CFSM

Captured input assignmentCaptured input assignmentA set of input values read by a CFSM at a particular timeA set of input values read by a CFSM at a particular time

Input stimulusInput stimulus Input assignment with at least one event presentInput assignment with at least one event present

Page 24: Discrete Event

24

Signals and CFSMSignals and CFSM

CFSMCFSM Initiates communication through eventsInitiates communication through events

Reacts only to input stimulusReacts only to input stimulus except initial reactionexcept initial reaction

Writes data first, then emits associated eventWrites data first, then emits associated event

Reads event first, then reads associated dataReads event first, then reads associated data

Page 25: Discrete Event

25

CFSM networksCFSM networks

NetNetA set of connections on the same signalA set of connections on the same signalAssociated with single sender and multiple receiversAssociated with single sender and multiple receiversAn input buffer for each receiver on a netAn input buffer for each receiver on a net

Multi-cast communicationMulti-cast communication

Network of CFSMsNetwork of CFSMsA set of CFSMs, nets, and a scheduling mechanismA set of CFSMs, nets, and a scheduling mechanismCan be implemented asCan be implemented as

A set of CFSMs in SW (program/compiler/OS/uC)A set of CFSMs in SW (program/compiler/OS/uC) A set of CFSMs in HW (HDL/gate/clocking)A set of CFSMs in HW (HDL/gate/clocking) Interface (polling/interrupt/memory-mapped)Interface (polling/interrupt/memory-mapped)

Page 26: Discrete Event

26

Scheduling MechanismScheduling Mechanism

At the specification levelAt the specification levelShould be as abstract as possible to allow optimizationShould be as abstract as possible to allow optimization

Not fixed in any way by CFSM MOCNot fixed in any way by CFSM MOC

May be implemented asMay be implemented asRTOS for single processorRTOS for single processor

Concurrent execution for HWConcurrent execution for HW

Set of RTOSs for multi-processorSet of RTOSs for multi-processor

Set of scheduling FSMs for HWSet of scheduling FSMs for HW

Page 27: Discrete Event

27

Timing Behavior Timing Behavior

Scheduling MechanismScheduling MechanismGlobally controls the interaction of CFSMsGlobally controls the interaction of CFSMs

Continually deciding which CFSMs can be executedContinually deciding which CFSMs can be executed

CFSM can beCFSM can be IdleIdle

Waiting for input eventsWaiting for input events Waiting to be executed by schedulerWaiting to be executed by scheduler

ExecutingExecuting Generate a single reactionGenerate a single reaction Reads its inputs, computes, writes outputsReads its inputs, computes, writes outputs

Page 28: Discrete Event

28

Timing Behavior: Mathematical ModelTiming Behavior: Mathematical Model

Transition PointTransition PointPoint in time a CFSM starts executingPoint in time a CFSM starts executing

For each executionFor each execution Input signals are read and clearedInput signals are read and cleared

Partial order between input and outputPartial order between input and output

Event is read before dataEvent is read before data

Data is written before event emissionData is written before event emission

Page 29: Discrete Event

29

Timing Behavior: Transition PointTiming Behavior: Transition Point

A transition point tA transition point tii

Input may be read between tInput may be read between tii and t and ti+1i+1

Event that is read may have occurred between tEvent that is read may have occurred between ti-1i-1 and t and ti+1i+1

Data that is read may have occurred between tData that is read may have occurred between t00 and t and ti+1i+1

Outputs are written between tOutputs are written between tii and t and ti+1i+1

CFSM allow loose synchronization of event & dataCFSM allow loose synchronization of event & dataLess restrictive implementationLess restrictive implementation

May lead to non intuitive behaviorMay lead to non intuitive behavior

Page 30: Discrete Event

30

Event/Data SeparationEvent/Data Separation

Sender S

Receiver R

t1ti-1 t2 ti t3 t4 ti+1

Read Event Read Value

Write v1 Emit Write v2 Emit

Value v1 is lost even thoughValue v1 is lost even though It is sent with an eventIt is sent with an event

Event may not be lostEvent may not be lost

Need atomicityNeed atomicity

Page 31: Discrete Event

31

AtomicityAtomicity

Group of actions considered as a single entityGroup of actions considered as a single entity

May be costly to implementMay be costly to implement

Only atomicity requirement of CFSMOnly atomicity requirement of CFSM Input events are read atomicallyInput events are read atomically

Can be enforced in SW (bit vector) HW (buffer)Can be enforced in SW (bit vector) HW (buffer) CFSM is guaranteed to see a snapshot of input eventsCFSM is guaranteed to see a snapshot of input events

Non-atomicity of event and dataNon-atomicity of event and dataMay lead to undesirable behaviorMay lead to undesirable behavior

Atomicized as an implementation trade-off decisionAtomicized as an implementation trade-off decision

Page 32: Discrete Event

32

Non Atomic Data Value ReadingNon Atomic Data Value Reading

Receiver R1 gets (X=4, Y=5), R2 gets (X=5 Y=4)Receiver R1 gets (X=4, Y=5), R2 gets (X=5 Y=4)

X=4 Y=5 never occursX=4 Y=5 never occurs

Can be remedied if values are sent with eventsCan be remedied if values are sent with events still suffers from separation of data and eventstill suffers from separation of data and event

Sender S

Receiver R1

t1 t2 t3 t4 t5 t6

Receiver R2

X:=4Y:=4 X:=5 Y:=5

Read X

Read X Read Y

Read Y

Page 33: Discrete Event

33

Atomicity of Event ReadingAtomicity of Event Reading

R1 sees no events, R2 sees X, R3 sees X, YR1 sees no events, R2 sees X, R3 sees X, Y

Each sees a snapshot of events in timeEach sees a snapshot of events in time

Different captured input assignmentDifferent captured input assignmentBecause of scheduling and delayBecause of scheduling and delay

Sender S

Receiver R1

t1 t2 t3 t4 t5

Receiver R2

Receiver R3

Emit X Emit Y

Read

Read

Read

Page 34: Discrete Event

34

Functional BehaviorFunctional Behavior

Transition and output relationsTransition and output relations input, present_state, next_state, outputinput, present_state, next_state, output

At each execution, a CFSMAt each execution, a CFSMReads a captured input assignmentReads a captured input assignment

If there is a match in transition relationIf there is a match in transition relation consume inputs, transition to next_state, write outputsconsume inputs, transition to next_state, write outputs

OtherwiseOtherwise consume no inputs, no transition, no outputsconsume no inputs, no transition, no outputs

Page 35: Discrete Event

35

Functional Behavior (2)Functional Behavior (2)

Empty TransitionEmpty TransitionNo matching transition is foundNo matching transition is found

Trivial TransitionTrivial TransitionA transition that has no output and no state changesA transition that has no output and no state changesEffectively throw away inputsEffectively throw away inputs

Initial transitionInitial transitionTransition to the init (reset) stateTransition to the init (reset) stateNo input event needed for this transitionNo input event needed for this transition

Page 36: Discrete Event

36

CFSM and Process NetworksCFSM and Process Networks

CFSMCFSMAn asynchronous extended FSM modelAn asynchronous extended FSM model

Communication via bounded non-blocking buffersCommunication via bounded non-blocking buffers Versus CSP and CCS (rendezvous)Versus CSP and CCS (rendezvous) Versus SDL (unbounded queue & variable topology)Versus SDL (unbounded queue & variable topology)

Not continuous in Kahn’s senseNot continuous in Kahn’s sense Different event ordering may change behaviorDifferent event ordering may change behavior

Versus dataflow (ordering insensitive)

Page 37: Discrete Event

37

CFSM NetworksCFSM Networks

Defined based on a global notion of timeDefined based on a global notion of timeTotal order of eventsTotal order of events

Synchronous with relaxed timingSynchronous with relaxed timing Global consistent state of signals is requiredGlobal consistent state of signals is required Input and output are in partial orderInput and output are in partial order

Page 38: Discrete Event

38

Buffer OverwriteBuffer Overwrite

CFSM Network hasCFSM Network hasFinite BufferingFinite Buffering

Non-blocking writeNon-blocking write Events can be overwrittenEvents can be overwritten

if the sender is “faster” than receiver

To ensure no overwriteTo ensure no overwriteExplicit handshaking mechanismExplicit handshaking mechanism

SchedulingScheduling

Page 39: Discrete Event

39

Example of CFSM BehaviorsExample of CFSM Behaviors

A and B produce i1 and i2 at every iA and B produce i1 and i2 at every i

C produce C produce errerr or or oo at every i1,i2 at every i1,i2

Delay (Delay (ii to to oo) for normal operation is n) for normal operation is nrr, , errerr operation 2n operation 2nrr

Minimum input interval is nMinimum input interval is nii

Intuitive “correct” behaviorIntuitive “correct” behavior No events are lostNo events are lost

A

B

Ci

i1

i2

err o

Page 40: Discrete Event

40

Equivalent Classes of CFSM BehaviorEquivalent Classes of CFSM Behavior

Assume parallel execution (HW, 1 CFSM/processor)Assume parallel execution (HW, 1 CFSM/processor)

Equivalent classes of behaviors are:Equivalent classes of behaviors are:Zero DelayZero Delay

nnrr= 0= 0

Input buffer overwriteInput buffer overwrite nniinnrr

Time critical operationTime critical operation nnii/2/2nnrrnnii

Normal operation Normal operation nnrrnnii/2/2

Page 41: Discrete Event

41

Equivalent Classes of CFSM Behavior (2)Equivalent Classes of CFSM Behavior (2)

Zero delay: nZero delay: nrr= 0 = 0 If C emits an error on some inputIf C emits an error on some input

A, B can react instantaneously & output differentlyA, B can react instantaneously & output differently

May be logically inconsistentMay be logically inconsistent

Input buffers overwrite: nInput buffers overwrite: niinnrr Execution delay of A, B is larger than arrival intervalExecution delay of A, B is larger than arrival interval

always loss of event always loss of event requirements not satisfiedrequirements not satisfied

Page 42: Discrete Event

42

Equivalent Classes of CFSM Behavior (3)Equivalent Classes of CFSM Behavior (3)

Time critical operation: nTime critical operation: nii/2/2nnrrnnii

Normal operation results in no loss of eventNormal operation results in no loss of event

Error operation may cause lost inputError operation may cause lost input

Normal operation: nNormal operation: nrrnnii/2/2No events are lostNo events are lost

May be expensive to implementMay be expensive to implement

If error is infrequentIf error is infrequentDesigner may accept also time critical operationDesigner may accept also time critical operation

Can result in lower-cost implementationCan result in lower-cost implementation

Page 43: Discrete Event

43

Equivalent Classes of CFSM Behavior (4)Equivalent Classes of CFSM Behavior (4)

Implementation on a single processorImplementation on a single processorLoss of Event may be caused byLoss of Event may be caused by

Timing constraintsTiming constraints ni<3nr

Incorrect schedulingIncorrect scheduling If empty transition also takes nr

• ACBC round robin will miss event• ABC round robin will not

Page 44: Discrete Event

44

Some Possibility of Equivalent ClassesSome Possibility of Equivalent Classes

Given 2 arbitrary implementations, 1 input stream:Given 2 arbitrary implementations, 1 input stream:Dataflow equivalenceDataflow equivalence

Output streams are the same orderingOutput streams are the same ordering

Petri net equivalencePetri net equivalence Output streams satisfied some partial orderOutput streams satisfied some partial order

Golden model equivalenceGolden model equivalence Output streams are the same orderingOutput streams are the same ordering

Except reordering of concurrent events One of the implementations is a reference specificationOne of the implementations is a reference specification

Filtered equivalenceFiltered equivalence Output streams are the same after filtered by observerOutput streams are the same after filtered by observer

Page 45: Discrete Event

45

ConclusionConclusion

CFSMCFSMExtension: ACFSM: Initially unbounded FIFO buffersExtension: ACFSM: Initially unbounded FIFO buffers

Bounds on buffers are imposed by refinement to yield ECFSMBounds on buffers are imposed by refinement to yield ECFSM

Delay is also refined by implementationDelay is also refined by implementation

Local synchronyLocal synchrony Relatively large atomic synchronous entitiesRelatively large atomic synchronous entities

Global asynchronyGlobal asynchrony Break synchrony, no compositional problemBreak synchrony, no compositional problem Allow efficient mapping to heterogeneous architecturesAllow efficient mapping to heterogeneous architectures

Page 46: Discrete Event

46

Data-flow networksData-flow networks

A bit of historyA bit of history

Syntax and semanticsSyntax and semantics actors, tokens and firingsactors, tokens and firings

Scheduling of Static Data-flowScheduling of Static Data-flow static schedulingstatic scheduling

code generationcode generation

buffer sizingbuffer sizing

Other Data-flow modelsOther Data-flow models Boolean Data-flowBoolean Data-flow

Dynamic Data-flowDynamic Data-flow

Page 47: Discrete Event

47

Data-flow networksData-flow networks

Powerful formalism for data-dominated system specificationPowerful formalism for data-dominated system specification

Partially-ordered model (no over-specification)Partially-ordered model (no over-specification)

Deterministic execution independent of schedulingDeterministic execution independent of scheduling

Used forUsed for simulationsimulation

schedulingscheduling

memory allocationmemory allocation

code generationcode generation

for Digital Signal Processors (HW and SW)for Digital Signal Processors (HW and SW)

Page 48: Discrete Event

48

A bit of historyA bit of history

Karp computation graphs (‘66): seminal work Karp computation graphs (‘66): seminal work

Kahn process networks (‘58): formal modelKahn process networks (‘58): formal model

Dennis Data-flow networks (‘75): programming language for MIT DF machineDennis Data-flow networks (‘75): programming language for MIT DF machine

Several recent implementationsSeveral recent implementations graphical:graphical:

Ptolemy (UCB), Khoros (U. New Mexico), Grape (U. Leuven)Ptolemy (UCB), Khoros (U. New Mexico), Grape (U. Leuven) SPW (Cadence), COSSAP (Synopsys)SPW (Cadence), COSSAP (Synopsys)

textual:textual: Silage (UCB, Mentor)Silage (UCB, Mentor) Lucid, HaskellLucid, Haskell

Page 49: Discrete Event

49

Data-flow network Data-flow network

A Data-flow network is a collection of A Data-flow network is a collection of functional nodesfunctional nodes

which are connected and communicate over which are connected and communicate over unbounded unbounded

FIFO queuesFIFO queues

Nodes are commonly called Nodes are commonly called actorsactors

The bits of information that are communicated over the The bits of information that are communicated over the

queues are commonly called queues are commonly called tokenstokens

Page 50: Discrete Event

50

Intuitive semanticsIntuitive semantics

(Often stateless) actors perform computation(Often stateless) actors perform computation

Unbounded FIFOs perform communication via Unbounded FIFOs perform communication via sequences of tokenssequences of tokens carrying values carrying values integer, float, fixed pointinteger, float, fixed point matrix of integer, float, fixed pointmatrix of integer, float, fixed point image of pixelsimage of pixels

State implemented as self-loop State implemented as self-loop

Determinacy: Determinacy: unique output sequences given unique input sequences unique output sequences given unique input sequences

Sufficient condition: Sufficient condition: blocking readblocking read(process cannot test input queues for emptiness)(process cannot test input queues for emptiness)

Page 51: Discrete Event

51

Intuitive semanticsIntuitive semantics

At each time, one actor is At each time, one actor is firedfired

When firing, actors When firing, actors consumeconsume input tokens and input tokens and produceproduce

output tokensoutput tokens

Actors can be fired only if there are enough tokens in the Actors can be fired only if there are enough tokens in the

input queuesinput queues

Page 52: Discrete Event

52

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

i(-1)

Page 53: Discrete Event

53

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

i(-1)

Page 54: Discrete Event

54

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

i(-1)

Page 55: Discrete Event

55

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

i(-1)

Page 56: Discrete Event

56

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

i(-1)

Page 57: Discrete Event

57

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

i(-1)

Page 58: Discrete Event

58

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

Page 59: Discrete Event

59

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

Page 60: Discrete Event

60

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

Page 61: Discrete Event

61

Intuitive semanticsIntuitive semantics

Example: FIR filterExample: FIR filtersingle input sequence i(n)single input sequence i(n)

single output sequence o(n)single output sequence o(n)

o(n) = c1 i(n) + c2 i(n-1) o(n) = c1 i(n) + c2 i(n-1)

* c1

+ o

i * c2

Page 62: Discrete Event

62

QuestionsQuestions

Does the order in which actors are fired affect the final Does the order in which actors are fired affect the final

result?result?

Does it affect the “operation” of the network in any way?Does it affect the “operation” of the network in any way?

Go to Radio Shack and ask for an unbounded queue!!Go to Radio Shack and ask for an unbounded queue!!

Page 63: Discrete Event

63

Formal semantics: sequencesFormal semantics: sequences

Actors operate from a Actors operate from a sequencesequence of input tokens to a of input tokens to a sequencesequence of of

output tokensoutput tokens

Let tokens be noted by xLet tokens be noted by x11, x, x22, x, x33, etc…, etc…

A A sequencesequence of tokens is defined as of tokens is defined as

X = [ xX = [ x11, x, x22, x, x33, …], …]

Over the execution of the network, each queue will grow a particular Over the execution of the network, each queue will grow a particular

sequence of tokenssequence of tokens

In general, we consider the actors mathematically as functions from In general, we consider the actors mathematically as functions from

sequences to sequences (not from tokens to tokens)sequences to sequences (not from tokens to tokens)

Page 64: Discrete Event

64

Ordering of sequencesOrdering of sequences

Let XLet X11 and X and X22 be two sequences of tokens. be two sequences of tokens.

We say that We say that XX11 is less than X is less than X22 if and only if (by definition) if and only if (by definition) XX11

is an initial segment of Xis an initial segment of X22

Homework: prove that the relation so defined is a partial Homework: prove that the relation so defined is a partial

order (reflexive, antisymmetric and transitive)order (reflexive, antisymmetric and transitive)

This is also called the This is also called the prefix orderprefix order

Example: [ xExample: [ x11, x, x2 2 ] <= [ x] <= [ x11, x, x22, x, x33 ] ]

Example: [ xExample: [ x11, x, x2 2 ] and [ x] and [ x11, x, x33, x, x44 ] are incomparable ] are incomparable

Page 65: Discrete Event

65

Chains of sequencesChains of sequences

Consider the set S of all finite and infinite sequences of Consider the set S of all finite and infinite sequences of tokenstokens

This set is partially ordered by the prefix orderThis set is partially ordered by the prefix order

A subset C of S is called a A subset C of S is called a chainchain iff all pairs of elements of iff all pairs of elements of C are C are comparablecomparable

If C is a chain, then it must be a If C is a chain, then it must be a linear orderlinear order inside S inside S (otherwise, why call it chain?)(otherwise, why call it chain?)

Example: { [ xExample: { [ x11 ], [ x ], [ x11, x, x2 2 ], [ x], [ x11, x, x22, x, x33 ], … } is a chain ], … } is a chain

Example: { [ xExample: { [ x11 ], [ x ], [ x11, x, x2 2 ], [ x], [ x11, x, x3 3 ], … } is not a chain ], … } is not a chain

Page 66: Discrete Event

66

(Least) Upper Bound(Least) Upper Bound

Given a subset Y of S, an Given a subset Y of S, an upper boundupper bound of Y is an element z of Y is an element z of S such that z is of S such that z is largerlarger than all elements of Y than all elements of Y

Consider now the set Z (subset of S) of all the upper Consider now the set Z (subset of S) of all the upper bounds of Ybounds of Y

If Z has a least element u, then u is called the If Z has a least element u, then u is called the least upper least upper boundbound (lub) of Y (lub) of Y

The least upper bound, if it exists, is unique The least upper bound, if it exists, is unique

Note: u might not be in Y (if it is, then it is the largest value Note: u might not be in Y (if it is, then it is the largest value of Y)of Y)

Page 67: Discrete Event

67

Complete Partial OrderComplete Partial Order

Every chain in S has a least upper boundEvery chain in S has a least upper bound

Because of this property, S is called a Because of this property, S is called a Complete Partial Complete Partial

OrderOrder

Notation: if C is a chain, we indicate the least upper bound Notation: if C is a chain, we indicate the least upper bound

of C by lub( C )of C by lub( C )

Note: the least upper bound may be thought of as the limit Note: the least upper bound may be thought of as the limit

of the the chainof the the chain

Page 68: Discrete Event

68

ProcessesProcesses

Process: function from a p-tuple of sequences to a q-tuple of Process: function from a p-tuple of sequences to a q-tuple of

sequencessequences

F : SF : Spp -> S -> Sqq

Tuples have the induced point-wise order:Tuples have the induced point-wise order:

Y = ( yY = ( y11, … , y, … , ypp ), Y’ = ( y’ ), Y’ = ( y’11, … , y’, … , y’pp ) in S ) in Spp :Y <= Y’ iff y :Y <= Y’ iff yii <= y’ <= y’ii for for

all 1 <= i <= pall 1 <= i <= p

Given a chain C in SGiven a chain C in Spp, F( C ) may or may not be a chain in S, F( C ) may or may not be a chain in Sqq

We are interested in conditions that make that trueWe are interested in conditions that make that true

Page 69: Discrete Event

69

Continuity and MonotonicityContinuity and Monotonicity

Continuity: F is continuous iff (by definition) for all chains C, lub( F( C ) ) Continuity: F is continuous iff (by definition) for all chains C, lub( F( C ) ) exists andexists and

F( lub( C ) = lub( F( C ) )F( lub( C ) = lub( F( C ) )

Similar to continuity in analysis using limitsSimilar to continuity in analysis using limits

Monotonicity: F is monotonic iff (by definition) for all pairs X, X’ Monotonicity: F is monotonic iff (by definition) for all pairs X, X’ X <= X’ => F( X ) <= F( X’ )X <= X’ => F( X ) <= F( X’ )

Continuity implies monotonicityContinuity implies monotonicity intuitively, outputs cannot be “withdrawn” once they have been producedintuitively, outputs cannot be “withdrawn” once they have been produced timeless causality. F transforms chains into chainstimeless causality. F transforms chains into chains

Page 70: Discrete Event

70

Least Fixed Point semanticsLeast Fixed Point semantics

Let X be the set of all sequencesLet X be the set of all sequences

A network is a mapping F from the sequences to the A network is a mapping F from the sequences to the

sequencessequences

X = F( X, I )X = F( X, I )

The behavior of the network is defined as the The behavior of the network is defined as the unique least unique least

fixed pointfixed point of the equation of the equation

If F is continuous then the least fixed point existsIf F is continuous then the least fixed point exists LFP = LFP =

LUB( { FLUB( { Fnn( ( , I ) : n >= 0 } ), I ) : n >= 0 } )

Page 71: Discrete Event

71

From Kahn networks to Data Flow networksFrom Kahn networks to Data Flow networks

Each process becomes an Each process becomes an actoractor: set of pairs of: set of pairs of firing rule firing rule

(number of required tokens on inputs)(number of required tokens on inputs)

function function

(including number of consumed and produced tokens) (including number of consumed and produced tokens)

Formally shown to be equivalent, but actors with firing are Formally shown to be equivalent, but actors with firing are

more intuitivemore intuitive

Mutually exclusiveMutually exclusive firing rules imply monotonicity firing rules imply monotonicity

Generally simplified to Generally simplified to blocking readblocking read

Page 72: Discrete Event

72

Examples of Data Flow actorsExamples of Data Flow actors

SDF: Synchronous (or, better, Static) Data FlowSDF: Synchronous (or, better, Static) Data Flow fixed input and output tokensfixed input and output tokens

BDF: Boolean Data FlowBDF: Boolean Data Flow control token determines consumed and produced tokenscontrol token determines consumed and produced tokens

+

1

11

FFT1024 1024 10 1

merge selectT F

FT

Page 73: Discrete Event

73

Static scheduling of DFStatic scheduling of DF

Key property of DF networks: output sequences do not depend on Key property of DF networks: output sequences do not depend on time time

ofof firingfiring of actors of actors

SDF networks can be SDF networks can be statically scheduledstatically scheduled at compile-time at compile-time execute an actor when it is execute an actor when it is knownknown to be fireable to be fireable no overhead due to sequencing of concurrencyno overhead due to sequencing of concurrency static buffer sizingstatic buffer sizing

Different schedules yield different Different schedules yield different code sizecode size buffer sizebuffer size pipeline utilizationpipeline utilization

Page 74: Discrete Event

74

Static scheduling of SDFStatic scheduling of SDF

Based only on Based only on process graphprocess graph (ignores functionality) (ignores functionality)

Network state: number of tokens in FIFOsNetwork state: number of tokens in FIFOs

Objective: find schedule that is Objective: find schedule that is validvalid, i.e.:, i.e.: admissibleadmissible

(only fires actors when fireable)(only fires actors when fireable)

periodic periodic

(brings network back to initial state firing each actor at least once)(brings network back to initial state firing each actor at least once)

Optimize cost function over admissible schedulesOptimize cost function over admissible schedules

Page 75: Discrete Event

75

Balance equationsBalance equations

Number of produced tokens must equal number of consumed tokens on every Number of produced tokens must equal number of consumed tokens on every edgeedge

Repetitions (or firing) vector vRepetitions (or firing) vector vS S of schedule S: number of firings of each actor of schedule S: number of firings of each actor

in Sin S

vvSS(A) n(A) npp = vvSS(B) n(B) ncc

must be satisfied for each edgemust be satisfied for each edge

np nc

A B

Page 76: Discrete Event

76

Balance equationsBalance equations

B C

A3

1

1

1

22

11

Balance for each edge:Balance for each edge: 3 v3 vSS(A) - v(A) - vSS(B) = 0(B) = 0

vvSS(B) - v(B) - vSS(C) = 0(C) = 0

2 v2 vSS(A) - v(A) - vSS(C) = 0(C) = 0

2 v2 vSS(A) - v(A) - vSS(C) = 0(C) = 0

Page 77: Discrete Event

77

Balance equationsBalance equations

M vM vSS = 0 = 0

iff S is periodiciff S is periodic

Full rank (as in this case) Full rank (as in this case) no non-zero solution no non-zero solution no periodic scheduleno periodic schedule

(too many tokens accumulate on A->B or B->C)(too many tokens accumulate on A->B or B->C)

3 -1 00 1 -12 0 -12 0 -1

M =

B C

A3

1

1

1

22

11

Page 78: Discrete Event

78

Balance equationsBalance equations

Non-full rankNon-full rank infinite solutions exist (linear space of dimension 1)infinite solutions exist (linear space of dimension 1)

Any multiple of q = |1 2 2|Any multiple of q = |1 2 2|TT satisfies the balance equations satisfies the balance equations

ABCBC and ABBCC are minimal valid schedulesABCBC and ABBCC are minimal valid schedules

ABABBCBCCC is non-minimal valid scheduleABABBCBCCC is non-minimal valid schedule

2 -1 00 1 -12 0 -12 0 -1

M =

B C

A2

1

1

1

22

11

Page 79: Discrete Event

79

Static SDF schedulingStatic SDF scheduling

Main SDF scheduling theorem (Lee ‘86):Main SDF scheduling theorem (Lee ‘86): A connected SDF graph with A connected SDF graph with nn actors has a periodic schedule iff its topology actors has a periodic schedule iff its topology

matrix M has rank matrix M has rank n-1n-1

If M has rank If M has rank n-1n-1 then there exists a unique smallest integer solution q to then there exists a unique smallest integer solution q to

M q = 0M q = 0

Rank must be at least Rank must be at least n-1n-1 because we need at least because we need at least n-1 n-1 edges edges

(connected-ness), providing each a linearly independent row(connected-ness), providing each a linearly independent row

Admissibility is not guaranteed, and depends on initial tokens on Admissibility is not guaranteed, and depends on initial tokens on

cyclescycles

Page 80: Discrete Event

80

Admissibility of schedulesAdmissibility of schedules

No admissible schedule:No admissible schedule:

BACBA, then deadlock…BACBA, then deadlock…

Adding one token (delay) on A->C makesAdding one token (delay) on A->C makes

BACBACBA validBACBACBA valid

Making a periodic schedule admissible is always possible, but changes specification...Making a periodic schedule admissible is always possible, but changes specification...

B C

A1

2

1

3

2

3

Page 81: Discrete Event

81

Admissibility of schedulesAdmissibility of schedules

Adding initial token changes FIR orderAdding initial token changes FIR order

* c1

+ o

i

* c2

i(-1)i(-2)

Page 82: Discrete Event

82

From repetition vector to scheduleFrom repetition vector to schedule

Repeatedly schedule fireable actors up to number of times in Repeatedly schedule fireable actors up to number of times in

repetition vectorrepetition vector

q = |1 2 2|q = |1 2 2|TT

Can find either ABCBC or ABBCC Can find either ABCBC or ABBCC

If deadlock before original state, no valid schedule exists (Lee ‘86)If deadlock before original state, no valid schedule exists (Lee ‘86)

B C

A2

1

1

1

22

11

Page 83: Discrete Event

83

From schedule to implementationFrom schedule to implementation Static scheduling used for:Static scheduling used for:

behavioral simulation of DF (extremely efficient)behavioral simulation of DF (extremely efficient)

code generation for DSP code generation for DSP

HW synthesis (Cathedral by IMEC, Lager by UCB, …)HW synthesis (Cathedral by IMEC, Lager by UCB, …)

Issues in code generationIssues in code generationexecution speed (pipelining, vectorization)execution speed (pipelining, vectorization)

code size minimizationcode size minimization

data memory size minimization (allocation to FIFOs)data memory size minimization (allocation to FIFOs)

processor or functional unit allocationprocessor or functional unit allocation

Page 84: Discrete Event

84

Compilation optimizationCompilation optimization

Assumption: Assumption: code stitchingcode stitching

(chaining custom code for each actor)(chaining custom code for each actor)

More efficient than C compiler for DSPMore efficient than C compiler for DSP

Comparable to hand-coding in some casesComparable to hand-coding in some cases

Explicit parallelism, no artificial control dependenciesExplicit parallelism, no artificial control dependencies

Main problem: memory and processor/FU allocation Main problem: memory and processor/FU allocation

depends on scheduling, and vice-versadepends on scheduling, and vice-versa

Page 85: Discrete Event

85

Code size minimizationCode size minimization

Assumptions (based on DSP architecture):Assumptions (based on DSP architecture):subroutine calls expensivesubroutine calls expensive

fixed iteration loops are cheap fixed iteration loops are cheap

(“zero-overhead loops”)(“zero-overhead loops”)

Absolute optimum: Absolute optimum: single appearance schedulesingle appearance schedule

e.g. ABCBC -> A (2BC), ABBCC -> A (2B) (2C)e.g. ABCBC -> A (2BC), ABBCC -> A (2B) (2C) may or may not exist for an SDF graph…may or may not exist for an SDF graph… buffer minimization relative to single appearance schedules buffer minimization relative to single appearance schedules

(Bhattacharyya ‘94, Lauwereins ‘96, Murthy ‘97)(Bhattacharyya ‘94, Lauwereins ‘96, Murthy ‘97)

Page 86: Discrete Event

86

Assumption: no buffer sharingAssumption: no buffer sharing

Example:Example:

q = | 100 100 10 1|q = | 100 100 10 1|TT

Valid SAS: (100 A) (100 B) (10 C) DValid SAS: (100 A) (100 B) (10 C) D requires 210 units of buffer arearequires 210 units of buffer area

Better (factored) SAS: (10 (10 A) (10 B) C) DBetter (factored) SAS: (10 (10 A) (10 B) C) D requires 30 units of buffer areas, but…requires 30 units of buffer areas, but… requires 21 loop initiations per period (instead of 3)requires 21 loop initiations per period (instead of 3)

Buffer size minimizationBuffer size minimization

C D1 10

A

B10

10

1

1

Page 87: Discrete Event

87

Dynamic scheduling of DFDynamic scheduling of DF

SDF is limited in modeling power SDF is limited in modeling power no run-time choiceno run-time choice

cannot implement Gaussian elimination with pivotingcannot implement Gaussian elimination with pivoting

More general DF is too powerfulMore general DF is too powerful non-Static DF is Turing-complete (Buck ‘93) non-Static DF is Turing-complete (Buck ‘93)

bounded-memory scheduling is not always possiblebounded-memory scheduling is not always possible

BDF: semi-static scheduling of special “patterns”BDF: semi-static scheduling of special “patterns” if-then-elseif-then-else repeat-until, do-whilerepeat-until, do-while

General case: thread-based dynamic scheduling General case: thread-based dynamic scheduling (Parks ‘96: may not terminate, but never fails if feasible)(Parks ‘96: may not terminate, but never fails if feasible)

Page 88: Discrete Event

88

Example of Boolean DFExample of Boolean DF

Compute absolute value of average of Compute absolute value of average of nn samples samples

+1+1 ++

--

>n>n

TT FF TT FF

TT FF

TT FF

TT

TT FF

<0<0

TT FF

0000

InIn

OutOut

Page 89: Discrete Event

89

Example of general DFExample of general DF

Merge streams of multiples of 2 and 3 in order (removing duplicates)Merge streams of multiples of 2 and 3 in order (removing duplicates)

Deterministic mergeDeterministic merge(no “peeking”)(no “peeking”)

ordered

merge

* 2 dup1

* 3 dup1

A B

O

out

a = get (A)

b = get (B)

forever {

if (a > b) {

put (O, a)

a = get (A)

} else if (a < b) {

put (O, b)

b = get (B)

} else {

put (O, a)

a = get (A)

b = get (B)

}

}

Page 90: Discrete Event

90

Summary of DF networksSummary of DF networks

Advantages:Advantages:Easy to use (graphical languages)Easy to use (graphical languages)

Powerful algorithms forPowerful algorithms for verification (fast behavioral simulation)verification (fast behavioral simulation) synthesis (scheduling and allocation)synthesis (scheduling and allocation)

Explicit concurrencyExplicit concurrency

Disadvantages:Disadvantages:Efficient synthesis only for restricted modelsEfficient synthesis only for restricted models

(no input or output choice)(no input or output choice)

Cannot describe reactive control (blocking read)Cannot describe reactive control (blocking read)