CS294-6 Reconfigurable Computing Day 23 November 10, 1998 Stream Processing

Page 1

CS294-6 Reconfigurable Computing

Day 23

November 10, 1998

Stream Processing

Page 2

Previously

• Computing Requirements

• SCORE

– stream-based computing model

– use streams for linking computations

• instead of shared memory locations

• expose parallelism

• freedom of sequential/spatial implementation

Page 3

Today

• Streams moderately well developed for

– sequential atoms in multithreaded/multiprocessor environment

• General DF case

• SDF

• Expression

• ...thoughts on adapting ideas for SCORE-like execution

Page 4

General Dataflow case

• Dataflow graph exposes parallelism

• Operators enabled as soon as data is available

• Captures partial ordering for computation

• Adaptive/tolerant to latencies in system

• => great for exposing parallelism

Page 5

General Dataflow

• Fine-grained

– expose maximum parallelism

– …but rendezvous/presence overhead for every operator

• Who runs when is unpredictable

– variable latencies

– variable consumption/production

– => force runtime synchronization/scheduling

Page 6

General Dataflow

• What structure to exploit to reduce requirements?

Page 7

General Dataflow

• What structure to exploit to reduce requirements?

– Spatial operator locality

• most communication local (sequential)

– Operation blocks

• only do dataflow presence on input to region of code

• sequential/direct computation of subgraph

– all local/deterministic computations in subgraph

– Cyclic/predictable dataflow?

Page 8

Dataflow <=> Multithreading

• Original DF

– synchronize per instruction

• Hybrid DF -> TAM

– synchronize on remote memory access (msgs)

– run scheduling quanta (several instructions)

• Multithreading

– coarse-grain tasks

– synchronize on input data

– (also locking)

Page 9

What to watch for

• With arbitrary I/O rates

– unbounded buffering requirements

Page 10

Synchronous Data Flow

• Restriction

– number of tokens produced/consumed is constant per operator firing

– these numbers known at compile time

– each edge has predetermined number of initial tokens

• Consistent

– admissible and periodic

Page 11

SDF: Periodic

• Periodic

– invoke each operator at least once

– return to initial state (# tokens on each edge)

– can determine by balance equations
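
As a minimal sketch of the balance equations: over one period, each edge must satisfy (firings of source) × (tokens produced) = (firings of sink) × (tokens consumed). The function name `repetition_vector`, the edge tuple layout, and the example rates are assumptions for illustration, not from the lecture; the rates were picked so the result matches the 1 A, 2 B, 4 C example that appears later in these slides.

```python
from fractions import Fraction
from math import lcm  # requires Python 3.9+


def repetition_vector(actors, edges):
    """Solve the SDF balance equations for a connected graph.

    edges: list of (src, produced, dst, consumed) tuples (hypothetical layout).
    Returns the smallest positive integer firing counts, or None when the
    rates are inconsistent and no periodic schedule exists.
    """
    rate = {actors[0]: Fraction(1)}  # fix one operator, solve the rest relative to it
    changed = True
    while changed:
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
            elif src in rate and dst in rate:
                # Both ends solved: the balance equation must already hold.
                if rate[src] * prod != rate[dst] * cons:
                    return None
    if len(rate) != len(actors):
        raise ValueError("graph must be connected")
    # Scale the fractional solution up to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}


# Assumed chain: A produces 2 per firing, B consumes 1 and produces 2, C consumes 1.
print(repetition_vector(["A", "B", "C"], [("A", 2, "B", 1), ("B", 2, "C", 1)]))
# prints {'A': 1, 'B': 2, 'C': 4}
```

A `None` result means the graph cannot return to its initial token state, so it is not periodic.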

Page 12

SDF: Admissible

• Admissible

– firing sequence does not yield deadlock
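
One way to check admissibility is to simulate a period: fire any enabled operator until every operator has reached its repetition count, and report deadlock if nothing can fire first. The sketch below assumes the repetition counts are already known; `find_schedule`, the edge layout, and the two-operator loop are hypothetical examples and are not the figures from the slides. Firing enabled operators greedily is treated here as an assumption rather than something the slides state.

```python
def find_schedule(reps, edges, initial):
    """Return one firing sequence for a period, or None on deadlock.

    reps:    {operator: firings per period}
    edges:   list of (src, produced, dst, consumed) tuples (hypothetical layout)
    initial: initial token count for each edge, in the same order as edges
    """
    tokens = list(initial)
    remaining = dict(reps)
    schedule = []
    while any(remaining.values()):
        for actor in remaining:
            if remaining[actor] == 0:
                continue
            # An operator is enabled when every input edge holds enough tokens.
            enabled = all(tokens[i] >= cons
                          for i, (_, _, dst, cons) in enumerate(edges)
                          if dst == actor)
            if enabled:
                for i, (src, prod, dst, cons) in enumerate(edges):
                    if dst == actor:
                        tokens[i] -= cons
                    if src == actor:
                        tokens[i] += prod
                remaining[actor] -= 1
                schedule.append(actor)
                break
        else:
            return None  # firings remain but nothing is enabled: deadlock
    return schedule


# Hypothetical loop A -> B -> A, one token produced/consumed per firing.
loop = [("A", 1, "B", 1), ("B", 1, "A", 1)]
print(find_schedule({"A": 1, "B": 1}, loop, [0, 0]))  # prints None
print(find_schedule({"A": 1, "B": 1}, loop, [0, 1]))  # prints ['A', 'B']
```

With no initial tokens the loop can never start, so it is inadmissible; one initial token on the feedback edge makes it admissible.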

Page 13

SDF: Inadmissible

Page 14

SDF: Admissible

Page 15

Benefits

• Periodic schedules

• Bounded buffer requirements

– Acyclic graphs

• optimal algorithm

– Cyclic graphs

• NP-complete

• heuristic algorithm … close to optimal buffering

Page 16

SDF Example

• By Balance Equations

– 1 A, 2 B, 4 C

• Firing Sequences:

– ABCBCCC

– ABCCBCC

– ABBCCCC

• Buffer Costs (same order as the firing sequences)

– 5 (AB=2 BC=3)

– 4 (AB=2 BC=2)

– 6 (AB=2 BC=4)
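
The buffer costs can be reproduced by replaying each firing sequence and recording the peak token count on every edge. The graph figure for this slide is not reproduced in the transcript, so the rates below (A produces 2 per firing onto AB, B consumes 1 from AB and produces 2 onto BC, C consumes 1 from BC) are an assumption, chosen because they reproduce the listed costs. `buffer_cost` and `EDGES` are hypothetical names.

```python
# Assumed rates: edge name -> (src, produced, dst, consumed)
EDGES = {"AB": ("A", 2, "B", 1), "BC": ("B", 2, "C", 1)}


def buffer_cost(sequence, edges=EDGES):
    """Replay a firing sequence and return the peak token count per edge."""
    tokens = {name: 0 for name in edges}
    peak = dict(tokens)
    for actor in sequence:
        # Consume inputs first; a shortfall means the sequence is not valid.
        for name, (_, _, dst, cons) in edges.items():
            if dst == actor:
                if tokens[name] < cons:
                    raise ValueError(f"{actor} fired without enough tokens on {name}")
                tokens[name] -= cons
        # Then produce outputs and track the high-water mark.
        for name, (src, prod, _, _) in edges.items():
            if src == actor:
                tokens[name] += prod
                peak[name] = max(peak[name], tokens[name])
    return peak


for seq in ("ABCBCCC", "ABCCBCC", "ABBCCCC"):
    peak = buffer_cost(seq)
    print(seq, sum(peak.values()), peak)
# prints:
# ABCBCCC 5 {'AB': 2, 'BC': 3}
# ABCCBCC 4 {'AB': 2, 'BC': 2}
# ABBCCCC 6 {'AB': 2, 'BC': 4}
```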

Page 17

Scheduling (min buffer)

• F = set of fireable operators

• D = deferrable(F) = fireable operators whose output edge already has enough tokens to fire its sink

• While (F ≠ ∅)

– if ((F-D) ≠ ∅)

• fire from F-D

– else

• fire operator which increases number of tokens least
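
The loop above can be sketched as follows. Several details are assumptions that the slide does not spell out: an operator counts as fireable only while it still has firings left in the current period, "deferrable" is read as "some output edge already holds enough tokens to fire its sink", and ties are broken by listing order. The rates in `EDGES` are the same assumed rates that reproduce the 1 A, 2 B, 4 C example; `min_buffer_schedule` is a hypothetical name.

```python
# Assumed rates: edge name -> (src, produced, dst, consumed)
EDGES = {"AB": ("A", 2, "B", 1), "BC": ("B", 2, "C", 1)}


def min_buffer_schedule(reps, edges=EDGES):
    """Greedy schedule for one period that defers operators when possible."""
    tokens = {name: 0 for name in edges}
    remaining = dict(reps)
    schedule = []

    def fireable(a):
        return remaining[a] > 0 and all(
            tokens[n] >= cons for n, (_, _, dst, cons) in edges.items() if dst == a)

    def deferrable(a):
        # The sink of some output edge can already fire without help from a.
        return any(
            tokens[n] >= cons for n, (src, _, _, cons) in edges.items() if src == a)

    def growth(a):
        # Net change in total buffered tokens caused by one firing of a.
        made = sum(p for (src, p, _, _) in edges.values() if src == a)
        used = sum(c for (_, _, dst, c) in edges.values() if dst == a)
        return made - used

    while True:
        F = [a for a in reps if fireable(a)]
        if not F:
            break
        not_deferrable = [a for a in F if not deferrable(a)]
        a = not_deferrable[0] if not_deferrable else min(F, key=growth)
        for n, (src, prod, dst, cons) in edges.items():
            if dst == a:
                tokens[n] -= cons
            if src == a:
                tokens[n] += prod
        remaining[a] -= 1
        schedule.append(a)
    return schedule


print("".join(min_buffer_schedule({"A": 1, "B": 2, "C": 4})))  # prints ABCCBCC
```

Under these assumptions the result is the firing sequence with buffer cost 4 from the SDF example.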

Page 18

Buffer Minimization

• Repeat

– 1 A

– 2 B

– 4 C

• F={A}, D=∅

– fire A

• F={B}, D=∅

– fire B

• F={B,C}, D={B}

– fire C

• F={B,C}, D={B}

– fire C

• F={B}, D=∅

– fire B

Page 19

SDF -> BDF

• What is SDF missing?

– Restricts range of expression

– Allows static scheduling

Page 20

SDF -> BDF

• Sufficient Addition:

Page 21

SDF -> BDF

• BDF

– SDF + switch and select operators

• BDF is Turing Complete
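
A small sketch of the two added operators, using queues as edges. Switch routes one data token to its true or false output according to a Boolean control token; select reads a control token and forwards one token from the matching input. The number of tokens on each branch now depends on data, which is what breaks the fixed SDF rates. The function names, the queue-based encoding, and the even/odd example are assumptions for illustration.

```python
from collections import deque


def switch(data, control, out_true, out_false):
    """Fire once if possible: route one data token by one control token."""
    if not data or not control:
        return False
    (out_true if control.popleft() else out_false).append(data.popleft())
    return True


def select(in_true, in_false, control, out):
    """Fire once if possible: forward one token from the input the control names."""
    if not control:
        return False
    src = in_true if control[0] else in_false
    if not src:
        return False  # the chosen input has no token yet
    control.popleft()
    out.append(src.popleft())
    return True


def conditional(values):
    """If-then-else as dataflow: even values are scaled by 10, odd values get +1."""
    data = deque(values)
    ctl_switch = deque(v % 2 == 0 for v in values)
    ctl_select = deque(ctl_switch)  # select sees the same control stream
    t_in, f_in, t_out, f_out, out = deque(), deque(), deque(), deque(), deque()
    progress = True
    while progress:
        progress = False
        if switch(data, ctl_switch, t_in, f_in):
            progress = True
        if t_in:
            t_out.append(t_in.popleft() * 10)
            progress = True
        if f_in:
            f_out.append(f_in.popleft() + 1)
            progress = True
        if select(t_out, f_out, ctl_select, out):
            progress = True
    return list(out)


print(conditional([1, 2, 3, 4]))  # prints [2, 20, 4, 40]
```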

Page 22

Expression: Block Diagram

Ptolemy example from Buck’94

Page 23

Expression: Stream Language

• Function AveragePairs(D: Signal returns Signal)

– stream integer [(D[0]+D[1])/2] || AveragePairs(stream_rest(D))

Ex: Dennis94
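
For comparison, a rough Python rendering of AveragePairs as a generator: each output is the average of two adjacent stream elements, and the recursion on `stream_rest(D)` becomes a one-element sliding window. Two assumptions are made here: integer division stands in for `/` on a stream of integers, and the generator simply stops when fewer than two elements remain, a case the recursive definition leaves open.

```python
from itertools import count, islice


def average_pairs(stream):
    """Yield the integer average of each adjacent pair in the stream."""
    it = iter(stream)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty stream: nothing to average
    for cur in it:
        yield (prev + cur) // 2  # [(D[0]+D[1])/2]
        prev = cur               # continue on stream_rest(D)


print(list(average_pairs([2, 4, 6, 10])))           # prints [3, 5, 8]
print(list(islice(average_pairs(count()), 3)))      # prints [0, 1, 2]
```

The generator consumes one token and produces one token per step after the first, so it fits the fixed-rate SDF model.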

Page 24

Convert to Static Data Flow

Page 25

Composition of Stream Operators

• Function Process(D: ImageStream, w: integer returns MarkStream)

– let

• R := for I in 1,w return array of

– FourForThree(AveragePairs(D[I]))

• end for

– in

• PeakDetect(TwoDimFilter(R,w))

– end let

• end function

Page 26

Adapting

• How different?

Page 27

Adapting

• How different?

– Expensive to change operators

– Possibility of spatial pipelining of operators

• Operator AT

• Operator copies

– Allow dynamic rates…

• violate fixed firing

Page 28

SDF: Timeslice

• Multiples of repetition/firing schedule

– valid for acyclic graph

– require greater buffering

Page 29

SDF: Spatial

• Can realize spatially

• Repetition/firing schedule

– gives relative throughput rates

– simple cases => suggest Area-Throughput points

Page 30

Dynamic

• Note that adding switch/select gives general, dynamic dataflow

• Suggests can identify:

– static regions (obey SDF restrictions)

– dynamic boundaries (where dynamic operators exist)

• Statically schedule static regions

• Dynamic control at boundary/invocation of static blocks

Page 31

Dynamic Flow Rates

• Cannot schedule completely at compile time

• Use feedback to get expected flow rate

– schedule like SDF

– track data presence at dynamic boundaries

– allow additional buffer space (overflow)

– stall slower operator as necessary

• carefully check for possible deadlock conditions

Page 32

Summary

• Stream datatype captures computational structure

– good for spatial implementations

– expose parallelism

• Rich experience in DF/DSP to exploit

• Static powerful where applicable

• Can still help schedule “mostly static” cases