![Page 1: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/1.jpg)
The Stratosphere Platform for Big Data Analytics
Hongyao Ma, Franco Solleza
April 20, 2015
![Page 2: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/2.jpg)
Stratosphere
![Page 5: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/5.jpg)
Big Data Analytics
● “BIG Data”
● Heterogeneous datasets: structured / unstructured / semi-structured
● Users have different needs for declarativity and expressivity
![Page 6: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/6.jpg)
What we have covered so far
● Polybase
● Shark
● MLBase
● SharedDB
● BlinkDB
![Page 7: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/7.jpg)
![Page 8: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/8.jpg)
The Promises
● Declarative, high-level language
● “In situ” data analysis
● Richer set of primitives than MapReduce
● Treat UDFs as first-class citizens
● Automated parallelization and optimization
● Support for iterative programs
● Includes external memory query processing algorithms to support arbitrarily long programs
![Page 9: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/9.jpg)
Outline
● Meteor & Sopremo
● PACT
● Nephele
● Experiment Results
● Future work & Discussions
![Page 10: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/10.jpg)
Sopremo
![Page 11: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/11.jpg)
Meteor Script
● Declarative interface
● High-level script
![Page 12: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/12.jpg)
Meteor Translates To Sopremo
[Figure: Sopremo operator plan with nodes Lineitem, Filter, ComputeRevenue, Supplier, Join, Group, Output]
![Page 13: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/13.jpg)
Sopremo
● Modular and extensible
● Composable
![Page 14: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/14.jpg)
Sopremo compiled to PACT
[Figure: PACT plan with nodes Lineitem, Filter, ComputeRevenue, Supplier, Join, Group, Output]
![Page 15: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/15.jpg)
PACT
![Page 16: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/16.jpg)
PACT
● Programmer makes a “pact” with the system
● Uses one of 5 functions: Map, Reduce, Cross, Match, Co-group
![Page 21: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/21.jpg)
What’s a PACT?
● Data and a function
● Specifies how data are partitioned across the system
● An atomic(?) operation on all specified data
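The five PACT functions (Map, Reduce, Cross, Match, Co-group) can be illustrated with a minimal single-machine Python sketch. This is not Stratosphere's API: the names `pact_map`, `pact_reduce`, `pact_cross`, `pact_match`, `pact_cogroup`, the `(key, value)` record layout, and the UDF signatures are assumptions made only for illustration. The point is that each second-order function fixes which records are handed together to one UDF call, and that is the information a system would need to choose a partitioning.

```python
from collections import defaultdict
from itertools import product

# Hypothetical single-machine model of the five PACT second-order functions.
# Records are (key, value) pairs; all names and signatures are assumptions.

def group_by_key(records):
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    return groups

def pact_map(udf, records):
    # Map: the UDF sees one record at a time.
    return [out for rec in records for out in udf(rec)]

def pact_reduce(udf, records):
    # Reduce: the UDF sees all values that share a key.
    return [out for key, values in group_by_key(records).items()
            for out in udf(key, values)]

def pact_cross(udf, left, right):
    # Cross: the UDF sees every pair from the Cartesian product.
    return [out for l, r in product(left, right) for out in udf(l, r)]

def pact_match(udf, left, right):
    # Match: the UDF sees every pair of records with equal keys (equi-join).
    right_groups = group_by_key(right)
    return [out for key, lval in left
            for rval in right_groups.get(key, [])
            for out in udf(key, lval, rval)]

def pact_cogroup(udf, left, right):
    # Co-group: the UDF sees both groups for a key, even if one is empty.
    lg, rg = group_by_key(left), group_by_key(right)
    return [out for key in sorted(set(lg) | set(rg))
            for out in udf(key, lg.get(key, []), rg.get(key, []))]

# Usage: join two small inputs on their keys.
orders = [(1, "a"), (2, "b"), (2, "c")]
names = [(2, "x"), (3, "y")]
joined = pact_match(lambda k, l, r: [(k, l + r)], orders, names)
```

In this sketch the UDF never sees how records were grouped or shipped, which mirrors the idea of the programmer making a “pact” with the system about what each call receives.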
![Page 22: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/22.jpg)
Iterative PACT Programs
● Implicitly, iteration mutates state
● How to do iteration without explicit mutation of state?
![Page 25: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/25.jpg)
Iterative PACT Programs
● Bulk iteration
○ Starts with a solution set
○ Sends each group's label to its neighbors
○ Finds the minimum among those neighbors
○ Outputs an incremental solution set
○ Incremental solution set becomes input to the next iteration
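The bulk iteration steps on these slides (labels sent to neighbors, minimum taken, result fed into the next round) match a connected-components computation. A minimal Python sketch, assuming an undirected graph given as a vertex list and an edge list; the function name and the stopping rule (stop when no label changes) are assumptions, since the slides do not state a termination criterion:

```python
def bulk_connected_components(vertices, edges):
    # Undirected adjacency sets built from the edge list.
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Initial solution set: each vertex is labeled with its own id.
    labels = {v: v for v in vertices}
    while True:
        # "Send" every vertex's current label to its neighbors.
        candidates = {v: [labels[v]] for v in vertices}
        for v in vertices:
            for n in neighbors[v]:
                candidates[n].append(labels[v])
        # Take the minimum candidate per vertex; the whole solution is recomputed.
        new_labels = {v: min(c) for v, c in candidates.items()}
        if new_labels == labels:  # assumed termination: fixpoint reached
            return labels
        labels = new_labels


components = bulk_connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
```

Note that every round builds a complete new label map instead of updating the old one in place, which is how the sketch avoids explicit mutation of state; the cost is that unchanged vertices are reprocessed every round.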
![Page 32: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/32.jpg)
Iterative PACT Programs
● Incremental iteration
○ Starts with a work set and a solution set
○ Calculates the min for a group
○ Merges work set with solution set and checks if label changed
○ If the label is new, it becomes part of the delta set, which gets sent back to the next iteration
○ If changed, it also gets matched to the neighbors, and those matches become the new work set
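The incremental iteration steps can be sketched the same way for connected components. This is a minimal single-machine Python illustration, not Stratosphere code; the function name, the `(vertex, candidate label)` work-set layout, and the choice to seed the work set with every neighbor's id are assumptions:

```python
def incremental_connected_components(vertices, edges):
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Solution set: current label per vertex (initially its own id).
    solution = {v: v for v in vertices}
    # Work set: (vertex, candidate label) pairs still to be examined.
    workset = [(n, v) for v in vertices for n in neighbors[v]]

    while workset:
        # Minimum candidate label per vertex in the work set.
        best = {}
        for v, label in workset:
            best[v] = min(label, best.get(v, label))
        # Merge with the solution set; only real improvements form the delta set.
        delta = {v: l for v, l in best.items() if l < solution[v]}
        solution.update(delta)
        # Only changed vertices notify their neighbors: the next work set.
        workset = [(n, l) for v, l in delta.items() for n in neighbors[v]]
    return solution


components = incremental_connected_components(
    [1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]
)
```

Compared with the bulk variant, work shrinks as labels converge: once a vertex's label stops changing, it produces no delta entry and no work-set entries for its neighbors.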
![Page 41: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/41.jpg)
PACT Optimization
![Page 48: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/48.jpg)
Nephele
![Page 49: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/49.jpg)
Nephele Execution
● Tasks, channels, scheduling
○ Tasks, with all local pipelines associated with them, are pushed to slaves
○ Tasks can request to send data over the network (only when necessary or ready)
![Page 53: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/53.jpg)
Nephele Execution
● Fault tolerance
○ Conceptually follows the same idea as lineage (RDDs), but...
[Figure: blocking operator model; label: Intermediate]
[Figure: non-blocking operator model; label: Intermediate]
![Page 57: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/57.jpg)
Nephele Execution
● Runtime operators
![Page 58: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/58.jpg)
Does it deliver?
● Maybe - what do the experiments say?
● What’s old?
○ A lot of things
● What’s new?
○ second-order functions that abstract parallelization
○ optimization in a UDF-heavy environment
○ integration of iterative processing
○ an extensible query language and underlying operator model
![Page 60: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/60.jpg)
Experimental Evaluation
![Page 61: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/61.jpg)
Experimental Setup
Setup:
● 1 master + 25 slave machines
● 16 cores @ 2.0 GHz with 32 GB of RAM (29 GB of operating memory)
● 80 TB HDFS in plain ASCII, 4 SATA drives at 500 MB/s read/write per node
● 8 parallel tasks per slave, total DOP 40-200
Comparison with Hadoop:
● Vanilla MapReduce engine
● Apache Hive
● Apache Giraph
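The DOP range in the setup is consistent with simple arithmetic, sketched below in Python. The formula (DOP = slaves in use x parallel tasks per slave) and the reading that DOP was varied by changing the number of slaves are assumptions; the slides only give the totals.

```python
# Assumption: total DOP = number of slaves in use x parallel tasks per slave.
TASKS_PER_SLAVE = 8   # from the setup slide
MAX_SLAVES = 25       # from the setup slide


def total_dop(slaves, tasks_per_slave=TASKS_PER_SLAVE):
    return slaves * tasks_per_slave


def slaves_for_dop(dop, tasks_per_slave=TASKS_PER_SLAVE):
    # Number of slaves needed to reach a given DOP under the assumption above.
    return dop // tasks_per_slave


max_dop = total_dop(MAX_SLAVES)      # upper end of the reported range
min_slaves = slaves_for_dop(40)      # slaves implied by the lower end
```

Under this reading, the upper end of the range uses all 25 slaves and the lower end corresponds to 5 slaves.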
![Page 62: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/62.jpg)
Summary of Results
● Stratosphere achieves linear speedup and similar performance to Hadoop for simple tasks (TeraSort, Word Count)
● Stratosphere is 5x faster than Hive and Hadoop on more complicated tasks such as TPC-H and triangle enumeration, though it gains nothing from a higher DOP
● Stratosphere performed worse than Giraph on Connected Components, due to the latter's better-tuned implementation
● Checkpointing adds little overhead and saves much time when failure occurs
![Page 63: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/63.jpg)
TeraSort --- Stratosphere vs. Hadoop
Stratosphere achieves performance similar to Hadoop's and linear speedup
![Page 64: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/64.jpg)
Word Count --- Stratosphere vs. Hadoop
Stratosphere is 20% faster than Hadoop and achieves linear speedup
![Page 65: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/65.jpg)
Triangle Enumeration: Reducer 1
![Page 66: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/66.jpg)
Triangle Enumeration: Reducer 2
![Page 67: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/67.jpg)
Triangle Enumeration: PACT
![Page 68: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/68.jpg)
Triangle Enumeration
Stratosphere is 5x faster than Hadoop, though parallelism does not help
![Page 69: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/69.jpg)
TPC-H Query
![Page 70: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/70.jpg)
TPC-H --- Stratosphere vs. Hive
Parallelism does not seem to help; however, Stratosphere is 5x faster
![Page 71: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/71.jpg)
Connected Components
Giraph is faster, due to its better-tuned implementation
![Page 72: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/72.jpg)
CC --- Execution time per superstep
![Page 73: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/73.jpg)
Fault Tolerance
Checkpointing adds little overhead and saves much time when a failure occurs
![Page 74: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/74.jpg)
What Else Do We Want to See?
For presented experiments:
● Breakdown of execution time to distinguish bottlenecks
● What happens with even smaller DOP?
● What happens with more/fewer tasks on each core?
Further:
● What happens with even larger data? Current size does fit into RAM
● Comparison with MPP, split query processing systems like Polybase, or Shark, given the size of the tested data
![Page 75: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/75.jpg)
The Promises?
● Declarative, high-level language
● “In situ” data analysis
● Richer set of primitives than MapReduce
● Treat UDFs as first-class citizens
● Automated parallelization and optimization
● Support for iterative programs
● Includes external memory query processing algorithms to support arbitrarily long programs
![Page 76: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/76.jpg)
Ongoing and Future Work
● One-pass optimizer unifying PACT and Sopremo layers
● Strengthening fault-tolerant capabilities
● Improving scalability and efficiency of Nephele
● Design, compilation and optimization of higher-level languages
● Scalable, efficient, and adaptive algorithms and architecture
● “Stateful” systems for fast ingestion and low-latency data analysis
![Page 77: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/77.jpg)
Discussions and Questions
● Declarativity - expressiveness tradeoff
○ More declarative -> less expressive, but easier to optimize
● Is run-time optimization the way to go?
○ Skewed data distribution may become a bottleneck for such systems
○ Detecting performance bottlenecks on the fly
![Page 78: April 20, 2015 for Big Data Analytics - Harvard Universitydaslab.seas.harvard.edu/classes/cs265/files/...April 20, 2015. Stratosphere. Stratosphere. Stratosphere. Big Data Analytics](https://reader034.vdocuments.net/reader034/viewer/2022042709/5f514f9ae5f918157102b9d9/html5/thumbnails/78.jpg)
QED
THANKS!