cs294-6 reconfigurable computing day 23 november 10, 1998 stream processing
CS294-6 Reconfigurable Computing
Day 23
November 10, 1998
Stream Processing
Previously
• Computing Requirements
• SCORE
  – stream-based computing model
  – use streams for linking computations
    • instead of shared memory locations
    • expose parallelism
    • freedom of sequential/spatial implementation
Today
• Streams moderately well developed for
  – sequential atoms in multithreaded/multiprocessor environment
• General DF case
• SDF
• Expression
• ...thoughts on adapting ideas for SCORE-like execution
General Dataflow case
• Dataflow graph exposes parallelism
• Operators enabled as soon as data is available
• Captures partial ordering for computation
• Adaptive/tolerant to latencies in system
• => great for exposing parallelism
General Dataflow
• Fine-grained
  – expose maximum parallelism
  – …but rendezvous/presence overhead for every operator
• Who runs when is unpredictable
  – variable latencies
  – variable consumption/production
  – => force runtime synchronization/scheduling
General Dataflow
• What structure to exploit to reduce requirements?
  – Spatial operator locality
    • most communication local (sequential)
  – Operation blocks
    • only do dataflow presence on input to region of code
    • sequential/direct computation of subgraph
      – all local/deterministic computations in subgraph
  – Cyclic/predictable dataflow?
Dataflow <=> Multithreading
• Original DF:
  – synchronize per instruction
• Hybrid DF -> TAM
  – synchronize on remote memory access (msgs)
  – run scheduling quanta (several instructions)
• Multithreading
  – coarse-grain tasks
  – synchronize on input data
  – (also locking)
What to watch for
• With arbitrary I/O rates
  – unbounded buffering requirements
Synchronous Data Flow
• Restriction
  – number of tokens produced/consumed is constant per operator firing
  – these numbers known at compile time
  – each edge has predetermined number of initial tokens
• Consistent
  – admissible and periodic
SDF: Periodic
• Periodic
  – invoke each operator at least once
  – return to initial state (# tokens on each edge)
  – can determine by balance equations
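The balance equations say that, over one period, every edge receives as many tokens as it gives up: q[src] * produce = q[dst] * consume. A minimal sketch of solving them follows. The `(src, dst, produce, consume)` edge encoding and the function name are assumptions made for illustration, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(edges):
    """Smallest positive integer firing counts q satisfying the balance equations.

    edges: list of (src, dst, produce, consume) tuples (hypothetical encoding).
    Returns None when the rates are inconsistent, i.e. no periodic schedule exists.
    Assumes a connected graph.
    """
    q = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:                       # propagate ratios across edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * prod / cons
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * cons / prod
                changed = True
    for src, dst, prod, cons in edges:   # every edge must balance
        if q[src] * prod != q[dst] * cons:
            return None
    scale = lcm(*(f.denominator for f in q.values()))
    return {op: int(f * scale) for op, f in q.items()}

# Assumed chain A -> B -> C: A produces 2, B consumes 1; B produces 2, C consumes 1.
print(repetition_vector([("A", "B", 2, 1), ("B", "C", 2, 1)]))
```

For the assumed chain this gives one firing of A, two of B, and four of C per period.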
SDF: Admissible
• Admissible
  – firing sequence does not yield deadlock
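One way to test admissibility is to simulate one period: keep firing any operator that has enough input tokens until every operator has reached its repetition count, or until nothing can fire. The sketch below assumes the same hypothetical `(src, dst, produce, consume)` edge encoding, at most one edge per operator pair, and initial tokens given as a dict keyed by `(src, dst)`. The two-operator cycle in the example is illustrative and is not taken from the slides.

```python
def is_admissible(edges, initial, reps):
    """True if one full period (reps firings per operator) completes without deadlock."""
    tokens = {(s, d): initial.get((s, d), 0) for s, d, _, _ in edges}
    remaining = dict(reps)
    progress = True
    while progress and any(remaining.values()):
        progress = False
        for op in remaining:
            if remaining[op] == 0:
                continue
            inputs = [(s, d, c) for s, d, _, c in edges if d == op]
            if all(tokens[(s, d)] >= c for s, d, c in inputs):
                for s, d, c in inputs:          # consume inputs
                    tokens[(s, d)] -= c
                for s, d, p, _ in edges:        # produce outputs
                    if s == op:
                        tokens[(s, d)] += p
                remaining[op] -= 1
                progress = True
    return not any(remaining.values())

cycle = [("A", "B", 1, 1), ("B", "A", 1, 1)]
reps = {"A": 1, "B": 1}
print(is_admissible(cycle, {}, reps))                # no initial token: deadlock
print(is_admissible(cycle, {("B", "A"): 1}, reps))   # one initial token breaks the cycle
```

Because firing an enabled SDF operator never disables another one, the order in which the simulation picks operators does not change the answer.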
SDF: Inadmissible
SDF: Admissible
Benefits
• Periodic schedules
• Bounded buffer requirements
  – Acyclic graphs
    • optimal algorithm
  – Cycle
    • NP-complete
    • heuristic algorithm … close to optimal buffering
SDF Example
• By Balance Equations
  – 1 A, 2 B, 4 C
• Firing Sequences:
  – ABCBCCC
  – ABCCBCC
  – ABBCCCC
• Buffer Costs
  – 5 (AB=2 BC=3)
  – 4 (AB=2 BC=2)
  – 6 (AB=2 BC=4)
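The buffer cost of a firing sequence is the sum, over edges, of the peak token count each edge reaches. The sketch below replays a sequence and records those peaks. The graph figure is missing from this transcript, so the rates used here (A produces 2 per firing, B consumes 1 and produces 2, C consumes 1) are an assumption, chosen because they are consistent with the repetition counts and buffer costs listed above.

```python
def buffer_cost(edges, sequence):
    """Replay a firing sequence; return (total buffer cost, per-edge peak token counts).

    edges: list of (src, dst, produce, consume) tuples (hypothetical encoding).
    """
    tokens = {(s, d): 0 for s, d, _, _ in edges}
    peak = dict(tokens)
    for op in sequence:
        for s, d, _, c in edges:          # consume from input edges
            if d == op:
                if tokens[(s, d)] < c:
                    raise ValueError(f"{op} is not fireable")
                tokens[(s, d)] -= c
        for s, d, p, _ in edges:          # produce onto output edges
            if s == op:
                tokens[(s, d)] += p
                peak[(s, d)] = max(peak[(s, d)], tokens[(s, d)])
    return sum(peak.values()), peak

chain = [("A", "B", 2, 1), ("B", "C", 2, 1)]   # assumed rates
for seq in ("ABCBCCC", "ABCCBCC", "ABBCCCC"):
    print(seq, buffer_cost(chain, seq)[0])
```

Under these assumed rates the three sequences cost 5, 4, and 6, matching the slide.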
Scheduling (min buffer)
• F = fireable operators
• D = deferrable(F) = operators in F whose output edge already has enough tokens to fire its sink
• While (F ≠ ∅)
  – if ((F-D) ≠ ∅)
    • fire from F-D
  – else
    • fire operator which increases number of tokens least
Buffer Minimization
• Repeat
  – 1 A
  – 2 B
  – 4 C
• F={A}, D=∅
  – A
• F={B}, D=∅
  – B
• F={B,C}, D={B}
  – C
• F={B,C}, D={B}
  – C
• F={B}, D=∅
  – B
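The greedy rule from the Scheduling (min buffer) slide can be sketched as follows. Several details are assumptions rather than something the slides state: the `(src, dst, produce, consume)` edge encoding, at most one edge per operator pair, treating an operator with no outputs as never deferrable, breaking ties by dictionary order, and reading "increases number of tokens least" as tokens produced minus tokens consumed per firing. The chain rates are the same assumed ones (A produces 2, B consumes 1 and produces 2, C consumes 1).

```python
def min_buffer_schedule(edges, reps):
    """Greedy single-period schedule that defers operators whose sinks can already fire."""
    tokens = {(s, d): 0 for s, d, _, _ in edges}
    remaining = dict(reps)               # firings still owed in this period
    schedule = []

    def fireable(op):
        return remaining[op] > 0 and all(
            tokens[(s, d)] >= c for s, d, _, c in edges if d == op)

    def deferrable(op):
        outs = [(s, d, c) for s, d, _, c in edges if s == op]
        return bool(outs) and all(tokens[(s, d)] >= c for s, d, c in outs)

    def net_growth(op):
        produced = sum(p for s, _, p, _ in edges if s == op)
        consumed = sum(c for _, d, _, c in edges if d == op)
        return produced - consumed

    while True:
        F = [op for op in remaining if fireable(op)]
        if not F:
            break
        not_deferrable = [op for op in F if not deferrable(op)]   # F - D
        op = not_deferrable[0] if not_deferrable else min(F, key=net_growth)
        for s, d, p, c in edges:
            if d == op:
                tokens[(s, d)] -= c
            if s == op:
                tokens[(s, d)] += p
        remaining[op] -= 1
        schedule.append(op)
    return "".join(schedule)

chain = [("A", "B", 2, 1), ("B", "C", 2, 1)]   # assumed rates
print(min_buffer_schedule(chain, {"A": 1, "B": 2, "C": 4}))
```

On the assumed chain this follows the same F and D sets as the trace above and ends with the two remaining C firings, producing the sequence ABCCBCC, the 4-token schedule from the SDF Example slide.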
SDF -> BDF
• What is SDF missing?
  – Restricts range of expression
  – Allows static scheduling
SDF -> BDF
• Sufficient Addition:
SDF -> BDF
• BDF
  – SDF + switch and select operators
• BDF is Turing Complete
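To see why switch and select step outside SDF, note that the number of tokens each one moves on a given port depends on a Boolean control token, not on a compile-time constant. A minimal sketch using queues follows. The function signatures and the odd/even example are assumptions made for illustration and are not Ptolemy's API.

```python
from collections import deque

def switch(control, data, out_true, out_false):
    """Consume one control and one data token; route the data token to one output."""
    c, x = control.popleft(), data.popleft()
    (out_true if c else out_false).append(x)

def select(control, in_true, in_false, out):
    """Consume one control token; forward one token from the chosen input."""
    c = control.popleft()
    out.append((in_true if c else in_false).popleft())

def run_conditional(values):
    """if odd(x) then 2*x else -x, built from switch + two branches + select."""
    ctrl = [x % 2 == 1 for x in values]
    sw_ctrl, sel_ctrl, data = deque(ctrl), deque(ctrl), deque(values)
    t, f, out = deque(), deque(), deque()
    while data:
        switch(sw_ctrl, data, t, f)
    t = deque(2 * v for v in t)          # "then" branch operator
    f = deque(-v for v in f)             # "else" branch operator
    while sel_ctrl:
        select(sel_ctrl, t, f, out)
    return list(out)

print(run_conditional([1, 2, 3, 4]))
```

How many tokens reach each branch depends on the data, so a static schedule cannot fix the firing counts of the branch operators in advance.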
Expression: Block Diagram
Ptolemy example from Buck’94
Expression: Stream Language
• Function AveragePairs(D: Signal returns Signal)
  – stream integer [(D[0]+D[1])/2] || AveragePairs(stream_rest(D))
Ex: Dennis94
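The recursive stream definition of AveragePairs can be approximated with a lazy generator. This is a sketch rather than the original language's semantics: it reads `stream_rest(D)` as dropping a single element, so consecutive pairs overlap, and it assumes integer division because the result is declared `stream integer`.

```python
from itertools import count, islice

def average_pairs(d):
    """Yield the integer average of each adjacent pair of a possibly infinite stream."""
    it = iter(d)
    try:
        prev = next(it)
    except StopIteration:
        return                           # empty stream: nothing to emit
    for cur in it:
        yield (prev + cur) // 2          # [(D[0]+D[1])/2] ...
        prev = cur                       # ... || AveragePairs(stream_rest(D))

print(list(average_pairs([2, 4, 6, 9])))
print(list(islice(average_pairs(count()), 3)))   # also works on an unbounded stream
```

Each output needs exactly one new input token after the first, which is the kind of fixed-rate behavior that lets such an operator map onto static dataflow.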
Convert to Static Data Flow
Composition of Stream Operators
• Function Process(D: ImageStream, w: integer returns MarkStream)
  – let
    • R := for I in 1,w return array of
      – FourForThree(AveragePairs(D[I]))
    • end for
  – in
    • PeakDetect(TwoDimFilter(R,w))
  – end let
• end function
Adapting
• How different?
  – Expensive to change operators
  – Possibility of spatial pipelining of operators
    • Operator AT
    • Operator copies
  – Allow dynamic rates…
    • violate fixed firing
SDF: Timeslice
• Multiples of repetition/firing schedule
  – valid for acyclic graph
  – require greater buffering
SDF: Spatial
• Can realize spatially
• Repetition/firing schedule
  – gives relative throughput rates
  – simple cases => suggest Area-Throughput points
Dynamic
• Note that adding switch/select gives general, dynamic dataflow
• Suggests can identify:
  – static regions (obey SDF restrictions)
  – dynamic boundaries (where dynamic operators exist)
• Statically schedule static regions
• Dynamic control at boundary/invocation of static blocks
Dynamic Flow Rates
• Cannot schedule completely at compile time
• Use feedback to get expected flow rate
  – schedule like SDF
  – track data presence at dynamic boundaries
  – allow additional buffer space (overflow)
  – stall slower operator as necessary
    • carefully check for possible deadlock conditions
Summary
• Stream datatype captures computational structure
  – good for spatial implementations
  – expose parallelism
• Rich experience in DF/DSP to exploit
• Static powerful where applicable
• Can still help schedule “mostly static” cases