Dryad and dataflow systems
Michael Isard, Microsoft Research
4th June, 2014
![Page 2: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/2.jpg)
Talk outline
• Why is dataflow so useful?
• What is Dryad?
• An engineering sweet spot
• Beyond Dryad
• Conclusions
![Page 3: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/3.jpg)
Computation on large datasets
• Performance mostly efficient resource use
  • Locality
    • Data placed correctly in memory hierarchy
  • Scheduling
    • Get enough work done before being interrupted
• Decompose into independent batches
  • Parallel computation
    • Control communication and synchronization
  • Distributed computation
    • Writes must be explicitly shared
![Page 4: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/4.jpg)
Computational model
• Vertices are independent
  • State and scheduling
• Dataflow very powerful
  • Explicit batching and communication
[Figure: dataflow graph with inputs, processing vertices, channels, and outputs]
![Page 5: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/5.jpg)
Why dataflow now?
• Collection-oriented programming model
  • Operations on collections of objects
    • Turn spurious (unordered) for into foreach
      • Not every for is foreach
    • Aggregation (sum, count, max, etc.)
    • Grouping
    • Join, Zip
  • Iteration
• LINQ since ca. 2008, now Spark via Scala, Java
![Page 6: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/6.jpg)
Given some lines of text, find the most commonly occurring words.
1. Read the lines from a file
2. Split each line into its constituent words
3. Count how many times each word appears
4. Find the words with the highest counts

1. var lines = FS.ReadAsLines(inputFileName);
2. var words = lines.SelectMany(x => x.Split(' '));
3. var counts = words.CountInGroups();
4. var highest = counts.OrderByDescending(x => x.count).Take(10);

Example: the words red, red, blue, blue, blue, blue, yellow, yellow, yellow give the counts red,2; blue,4; yellow,3.

Well-chosen syntactic sugar
• Type inference: var stands in for Collection<KeyValuePair<string,int>>
• Lambda expressions: x => x.count replaces
    int SortKey(KeyValuePair<string,int> x) { return x.count; }
  and avoids the untyped alternative
    int SortKey(void* x) { return ((KeyValuePair<string,int>*)x)->count; }
• Generics and extension methods:
    Collection<T> Take<T>(this Collection<T> c, int count) { … }
  rather than one function per collection type
    FooCollection FooTake(FooCollection c, int count) { … }
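The four steps above can be sketched outside LINQ as well. The following is a minimal Python illustration, not the DryadLINQ code from the slide: `most_common_words` is a hypothetical helper, and `str.split()` splits on any whitespace, which differs slightly from `x.Split(' ')`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def most_common_words(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Hypothetical helper mirroring steps 2 to 4 of the slide."""
    # Step 2: split each line into words (the SelectMany step).
    words = (word for line in lines for word in line.split())
    # Step 3: count how many times each word appears (the CountInGroups step).
    counts = Counter(words)
    # Step 4: order by descending count and keep the first n entries.
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Step 1 would read the lines from a file; a literal list stands in here.
lines = ["red blue blue", "yellow red blue", "yellow blue yellow"]
top = most_common_words(lines, n=3)
```

Each step consumes a whole collection and produces a whole collection, which is the property that lets a system turn every line into a data-parallel stage.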
![Page 7: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/7.jpg)
Collections compile to dataflow
• Each operator specifies a single data-parallel step
• Communication between steps explicit
  • Collections reference collections, not individual objects!
• Communication under control of the system
  • Partition, pipeline, exchange automatically
• LINQ innovation: embedded user-defined functions
  var words = lines.SelectMany(x => x.Split(' '));
  • Very expressive
  • Programmer 'naturally' writes pure functions
![Page 8: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/8.jpg)
Distributed sorting
var sorted = set.OrderBy(x => x.key)
[Figure: dataflow plan: set → sample → compute histogram → range partition by key → sort locally → sorted]
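The stages named in the figure (sample, compute histogram, range partition by key, sort locally) can be sketched in one process. This is an illustrative Python sketch under assumptions: `sample_sort`, its parameters, and the quantile-based splitter choice are hypothetical and are not taken from Dryad.

```python
import bisect
import random
from typing import List


def sample_sort(partitions: List[List[int]], sample_size: int = 4,
                seed: int = 0) -> List[List[int]]:
    """Hypothetical single-process sketch of a sample-based distributed sort."""
    rng = random.Random(seed)
    n_out = len(partitions)
    # Sample: draw a few keys from each input partition.
    sample = sorted(
        key
        for part in partitions
        for key in rng.sample(part, min(sample_size, len(part)))
    )
    # Compute histogram: choose n_out - 1 splitters at even quantiles of the sample.
    splitters = (
        [sample[(i * len(sample)) // n_out] for i in range(1, n_out)]
        if sample else []
    )
    # Range partition by key: route each key to the bucket covering its range.
    buckets: List[List[int]] = [[] for _ in range(n_out)]
    for part in partitions:
        for key in part:
            buckets[bisect.bisect_right(splitters, key)].append(key)
    # Sort locally: every bucket is sorted independently of the others.
    return [sorted(bucket) for bucket in buckets]


parts = [[5, 3, 9], [1, 8, 2], [7, 4, 6]]
result = sample_sort(parts)
flattened = [key for bucket in result for key in bucket]
```

Because the buckets cover disjoint, increasing key ranges, concatenating the locally sorted buckets yields a globally sorted collection without a final merge.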
![Page 9: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/9.jpg)
Quiet revolution in parallelism
• Programming model is more attractive
  • Simpler, more concise, readable, maintainable
• Program is easier to optimize
  • Programmer separates computation and communication
  • System can re-order, distribute, batch, etc.
![Page 10: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/10.jpg)
Talk outline
• Why is dataflow so useful?
• What is Dryad?
• An engineering sweet spot
• Beyond Dryad
• Conclusions
![Page 11: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/11.jpg)
What is Dryad?
• General-purpose DAG execution engine ca. 2005
  • Cited as inspiration for e.g. Hyracks, Tez
• Engine behind Microsoft Cosmos/SCOPE
  • Initially MSN Search/Bing, now used throughout Microsoft
• Core of research batch cluster environment ca. 2009
  • DryadLINQ
  • Quincy scheduler
  • TidyFS
![Page 12: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/12.jpg)
What Dryad does
• Abstracts cluster resources
  • Set of computers, network topology, etc.
• Recovers from transient failures
  • Rerun computations on machine or network fault
  • Speculate duplicates for slow computations
• Schedules a local DAG of work at each vertex
![Page 13: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/13.jpg)
Scheduling and fault tolerance
• DAG makes things easy
  • Schedule from source to sink in any order
  • Re-execute subgraph on failure
  • Execute “duplicates” for slow vertices
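A minimal Python sketch of why a DAG makes scheduling and recovery easy, assuming stateless vertices whose upstream outputs are retained. `run_dag`, `deps`, and `compute` are hypothetical names, and only the failed vertex is re-run here, which simplifies re-executing a whole subgraph.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_dag(deps: Dict[str, List[str]],
            compute: Dict[str, Callable[[List[Any]], Any]],
            max_attempts: int = 3) -> Dict[str, Any]:
    """Run vertices from sources to sinks, re-running a vertex that fails."""
    outputs: Dict[str, Any] = {}
    # Any topological order is valid: a vertex runs once its inputs exist.
    for vertex in TopologicalSorter(deps).static_order():
        inputs = [outputs[upstream] for upstream in deps.get(vertex, [])]
        for attempt in range(1, max_attempts + 1):
            try:
                # Stateless vertex: the result depends only on its inputs,
                # so running it again after a fault is safe.
                outputs[vertex] = compute[vertex](inputs)
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise
    return outputs


calls = {"sum": 0}


def flaky_sum(inputs: List[int]) -> int:
    # Simulated transient machine fault on the first attempt only.
    calls["sum"] += 1
    if calls["sum"] == 1:
        raise RuntimeError("simulated transient fault")
    return sum(inputs)


deps = {"a": [], "b": [], "sum": ["a", "b"]}
compute = {"a": lambda _: 1, "b": lambda _: 2, "sum": flaky_sum}
outputs = run_dag(deps, compute)
```

The same property supports speculative duplicates: two runs of one stateless vertex on the same inputs produce interchangeable outputs, so whichever finishes first can be used.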
![Page 17: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/17.jpg)
Resources are virtualized
• Each graph vertex is a process
  • Writes outputs to disk (usually)
  • Reads inputs from upstream nodes’ output files
• Graph generally larger than cluster RAM
  • 1 TB partitioned input, 250 MB part size, 4000 parts
• Cluster is shared
  • Don’t size program for exact cluster
  • Use whatever share of resources is available
![Page 18: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/18.jpg)
Integrated system
• Collection-oriented programming model (LINQ)
• Partitioned file system (TidyFS)
  • Manages replication and distribution of large data
• Cluster scheduler (Quincy)
  • Jointly schedule multiple jobs at a time
  • Fine-grain multiplexing between jobs
  • Balance locality and fairness
• Monitoring and debugging (Artemis)
  • Within job and across jobs
![Page 19: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/19.jpg)
Dryad cluster scheduling
[Figure labels: R, Scheduler]
![Page 20: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/20.jpg)
Dryad cluster scheduling
[Figure labels: R, R, Scheduler]
![Page 21: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/21.jpg)
Quincy without preemption
![Page 22: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/22.jpg)
Quincy with preemption
![Page 23: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/23.jpg)
Dryad features
• Well-tested at scales up to 15k cluster computers
  • In heavy production use for 8 years
• Dataflow graph is mutable at runtime
  • Repartition to avoid skew
  • Specialize matrices dense/sparse
  • Harden fault-tolerance
![Page 24: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/24.jpg)
Talk outline
• Why is dataflow so useful?
• What is Dryad?
• An engineering sweet spot
• Beyond Dryad
• Conclusions
![Page 25: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/25.jpg)
Stateless DAG dataflow
• MapReduce, Dryad, Spark, …
• Stateless vertex constraint hampers performance
  • Iteration and streaming overheads
• Why does this design keep repeating?
![Page 26: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/26.jpg)
Software engineering
• Fault tolerance well understood
  • E.g., Chandy-Lamport, rollback recovery, etc.
• Basic mechanism: checkpoint plus log
• Stateless DAG: no checkpoint!
• Programming model “tricked” user
  • All communication on typed channels
  • Only channel data needs to be persisted
  • Fault tolerance comes without programmer effort
  • Even with UDFs
![Page 27: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/27.jpg)
Talk outline
• Why is dataflow so useful?
• What is Dryad?
• An engineering sweet spot
• Beyond Dryad
• Conclusions
![Page 28: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/28.jpg)
What about stateful dataflow?
• Naiad
  • Add state to vertices
  • Support streaming and iteration
• Opportunities
  • Much lower latency
  • Can model mutable state with dataflow
• Challenges
  • Scheduling
  • Coordination
  • Fault tolerance
![Page 29: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/29.jpg)
[Figure: timely dataflow in relation to batch processing, stream processing, and graph processing]
![Page 30: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/30.jpg)
Batching vs. streaming
• Batching (synchronous): requires coordination; supports aggregation
• Streaming (asynchronous): no coordination needed; aggregation is difficult
![Page 31: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/31.jpg)
Batch DAG execution
Central coordinator
![Page 32: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/32.jpg)
Streaming DAG execution
![Page 33: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/33.jpg)
Streaming DAG execution
Inline coordination
![Page 34: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/34.jpg)
Batch iteration
Central coordinator
![Page 35: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/35.jpg)
Streaming iteration
![Page 36: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/36.jpg)
Messages
[Figure: vertices B, C, D]
B.SENDBY(edge, message, time)
C.ONRECV(edge, message, time)
Messages are delivered asynchronously
![Page 37: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/37.jpg)
Notifications
[Figure: vertices B, C, D]
D.NOTIFYAT(time)
C.SENDBY(_, _, time)
D.ONRECV(_, _, time)
No more messages at time or earlier
D.ONNOTIFY(time)
Notifications support batching
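A toy Python sketch of how the message and notification callbacks combine to support batching. `CountPerTime` and `run` are hypothetical, and the driver is a single-process stand-in for a real runtime: it delivers every message first and only then issues notifications in time order.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set, Tuple


class CountPerTime:
    """Hypothetical vertex that emits one message count per logical time."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = defaultdict(int)
        self.requested: Set[int] = set()
        self.emitted: List[Tuple[int, int]] = []

    def notify_at(self, time: int) -> None:
        # Ask to be told when no more messages at `time` or earlier can arrive.
        self.requested.add(time)

    def on_recv(self, edge: str, message: Any, time: int) -> None:
        # Messages arrive asynchronously, so only accumulate here.
        self.counts[time] += 1
        self.notify_at(time)

    def on_notify(self, time: int) -> None:
        # The batch for `time` is complete, so the aggregate is final.
        self.emitted.append((time, self.counts.pop(time)))


def run(vertex: CountPerTime, messages: List[Tuple[str, Any, int]]) -> None:
    """Toy driver: deliver all messages, then notify in increasing time order."""
    for edge, message, time in messages:
        vertex.on_recv(edge, message, time)
    for time in sorted(vertex.requested):
        vertex.on_notify(time)
    vertex.requested.clear()


d = CountPerTime()
run(d, [("c->d", "x", 1), ("c->d", "y", 0), ("c->d", "z", 1)])
```

The receive callback never has to decide whether a batch is finished. That decision is deferred to the notification, which is what lets aggregation coexist with asynchronous delivery.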
![Page 38: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/38.jpg)
Coordination in timely dataflow
• Local scheduling with global progress tracking
• Coordination with a shared counter, not a scheduler
• Efficient, scalable implementation
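One way to picture coordination through a shared counter is a count of outstanding messages per logical time. The sketch below is a heavily simplified Python illustration under assumptions: `ProgressTracker` is hypothetical, it ignores loops and distribution, and it is not presented as Naiad's actual protocol.

```python
from collections import Counter


class ProgressTracker:
    """Hypothetical shared counter of undelivered messages per logical time."""

    def __init__(self) -> None:
        self.outstanding: Counter = Counter()

    def sent(self, time: int) -> None:
        # A worker records every message it sends.
        self.outstanding[time] += 1

    def received(self, time: int) -> None:
        # A worker records every message once it has been fully processed.
        self.outstanding[time] -= 1
        if self.outstanding[time] == 0:
            del self.outstanding[time]

    def can_notify(self, time: int) -> bool:
        # Safe once no message at `time` or earlier is still in flight.
        return all(pending > time for pending in self.outstanding)


tracker = ProgressTracker()
tracker.sent(0)
tracker.sent(1)
before = tracker.can_notify(0)
tracker.received(0)
after = tracker.can_notify(0)
later = tracker.can_notify(1)
```

Workers schedule their own vertices locally and consult only the counter, so no central scheduler has to sit on the message path.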
![Page 39: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/39.jpg)
Interactive graph analysis
[Figure: dataflow graph combining a stream of 32K tweets/s (hashtags #x, mentions @y) with 10 queries/s (z?) through join (⋈) and max operators]
![Page 40: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/40.jpg)
Query latency
[Figure: query latency (ms, log scale from 1 to 1000) against experiment time (s, 30000 to 50000)]
Cluster: 32 × 8-core 2.1 GHz AMD Opteron, 16 GB RAM per server, Gigabit Ethernet
Max: 140 ms; 99th percentile: 70 ms; Median: 5.2 ms
![Page 41: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/41.jpg)
Mutable state
• In batch DAG systems collections are immutable
  • Functional definition in terms of preceding subgraph
• Adding streaming or iteration introduces mutability
  • Collection varies as function of epoch, loop iteration
![Page 42: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/42.jpg)
Key-value store as dataflow
var lookup = data.join(query, d => d.key, q => q.key)
• Modeled random access with dataflow…
  • Add/remove key is streaming update to data
  • Look up key is streaming update to query
• High throughput requires batching
  • But that was true anyway, in general
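The join above can be read as a key-value store fed by two streams. A minimal Python sketch under assumptions: `JoinStore` and `step` are hypothetical names, each call to `step` stands for one epoch, and a value of `None` is used here to mean "remove key".

```python
from typing import Dict, List, Optional, Tuple


class JoinStore:
    """Hypothetical key-value store expressed as a join of two streams."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}

    def step(self, data_updates: List[Tuple[str, Optional[int]]],
             queries: List[str]) -> List[Tuple[str, int]]:
        # Streaming update to data: add or overwrite a key, or remove it.
        for key, value in data_updates:
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
        # Streaming update to query: an inner join, so a query for a
        # missing key produces no output record.
        return [(key, self.data[key]) for key in queries if key in self.data]


store = JoinStore()
first = store.step([("a", 1), ("b", 2)], ["a", "c"])
second = store.step([("a", None)], ["a", "b"])
```

Processing a whole epoch of updates and lookups per call is the batching the slide refers to: the per-record cost is amortized across the batch.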
![Page 43: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/43.jpg)
What can’t dataflow do?
• Programming model for mutable state?
  • Not as intuitive as functional collection manipulation
• Policies for placement still primitive
  • Hash everything and hope
• Great research opportunities
  • Intersection of OS, network, runtime, language
![Page 44: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/44.jpg)
Talk outline
• Why is dataflow so useful?
• What is Dryad?
• An engineering sweet spot
• Beyond Dryad
• Conclusions
![Page 45: Dryad and dataflow systems](https://reader033.vdocuments.net/reader033/viewer/2022050802/56816683550346895dda2998/html5/thumbnails/45.jpg)
Conclusions
• Dataflow is a great structuring principle
  • We know good programming models
  • We know how to write high-performance systems
• Dataflow is the status quo for batch processing
• Mutable state is the current research frontier

Apache 2.0 licensed source on GitHub
http://research.microsoft.com/en-us/um/siliconvalley/projects/BigDataDev/