![Page 1: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/1.jpg)
Big Data Platforms
Mihai Budiu, Oct 6 2014
![Page 2: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/2.jpg)
My work
• Ph.D. from Carnegie Mellon, 2003
  • Hardware synthesis
  • Reconfigurable hardware
  • Compilers and computer architecture
• Researcher at Microsoft Research Silicon Valley, 2004-2014
  • Computer security
  • Cloud computing infrastructure:
    • distributed computation platforms
    • monitoring and debugging
    • performance analysis
  • Big data analysis and visualization
  • Large scale machine learning
![Page 3: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/3.jpg)
500 Years Ago
Tycho Brahe (1546-1601)
Johannes Kepler (1571-1630)
![Page 4: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/4.jpg)
The Laws of Planetary Motion
Tycho’s measurements
Kepler’s laws
![Page 5: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/5.jpg)
The Large Hadron Collider
25 PB/year
WLHC Grid: 200K computing cores
![Page 6: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/6.jpg)
Genetic Code
![Page 7: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/7.jpg)
Astronomy
![Page 8: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/8.jpg)
Weather
![Page 9: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/9.jpg)
The Webs
Internet
Facebook friends graph
![Page 10: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/10.jpg)
Big Data
![Page 11: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/11.jpg)
Big Computers
![Page 12: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/12.jpg)
Talk Outline
• Motivation
• Dryad: A distributed runtime
• DryadLINQ: A compiler for Dryad
• Tools and applications
• Sketch: A billion-row spreadsheet
![Page 13: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/13.jpg)
Design Space
[Figure: design-space diagram. Labels: Throughput (batch), Latency (interactive), Internet, Datacenter, Data-parallel, Shared memory, Dryad, Search, HPC, Grid, Transaction, Sketch]
![Page 14: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/14.jpg)
Dryad
• EuroSys 2007
• Continuously deployed in Microsoft since 2006
• Execution engine of Bing analytics
• > 10^5 machines
• Many PB of data analyzed daily
Dryad painting by Evelyn de Morgan
![Page 15: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/15.jpg)
Dryad = Execution Layer
[Figure: two stacks shown as an analogy. Job (application) / Dryad / Cluster ≈ Pipeline / Shell / Machine]
![Page 16: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/16.jpg)
2-D Piping
• Unix Pipes: 1-D
  grep | sed | sort | awk | perl
• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
![Page 17: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/17.jpg)
Virtualized 2-D Pipelines
![Page 18: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/18.jpg)
Virtualized 2-D Pipelines
![Page 19: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/19.jpg)
Virtualized 2-D Pipelines
![Page 20: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/20.jpg)
Virtualized 2-D Pipelines
![Page 21: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/21.jpg)
Virtualized 2-D Pipelines
• 2D DAG
• multi-machine
• virtualized
![Page 22: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/22.jpg)
Dryad Job Structure
[Figure: job graph. Labels: Input files, Vertices (processes), Channels, Stage, Output files. The vertices run grep, sed, sort, awk, and perl.]
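The job structure on this slide (stages of replicated vertices joined by channels) can be pictured as a small data structure. The C# below is an illustrative sketch only and is not Dryad's API: the `JobGraph`, `Vertex`, and `Pipe` names are hypothetical, and the round-robin rule used to wire consecutive stages is an assumption made to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types for illustration; these are not Dryad's real classes.
public sealed record Vertex(string Program, int Index);

public sealed class JobGraph
{
    // Each stage is a list of vertices that all run the same program.
    public List<List<Vertex>> Stages { get; } = new();

    // A channel is a directed edge from a producer vertex to a consumer vertex.
    public List<(Vertex From, Vertex To)> Channels { get; } = new();

    // Appends a stage with `width` copies of `program` and wires it to the
    // previous stage. The round-robin wiring is an assumed rule.
    public JobGraph Pipe(string program, int width)
    {
        var stage = Enumerable.Range(0, width)
                              .Select(i => new Vertex(program, i))
                              .ToList();
        if (Stages.Count > 0)
        {
            var previous = Stages[^1];
            // Use enough edges that no vertex on either side is left unconnected.
            int edges = Math.Max(previous.Count, stage.Count);
            for (int i = 0; i < edges; i++)
                Channels.Add((previous[i % previous.Count], stage[i % stage.Count]));
        }
        Stages.Add(stage);
        return this;
    }

    public int VertexCount => Stages.Sum(s => s.Count);
}

public static class JobGraphExample
{
    // A scaled-down analogue of grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50.
    public static JobGraph Build() =>
        new JobGraph().Pipe("grep", 4).Pipe("sed", 2).Pipe("sort", 4)
                      .Pipe("awk", 2).Pipe("perl", 1);
}
```

Calling `JobGraphExample.Build()` yields five stages. A real scheduler would also track input and output files and the state of each vertex, which this sketch leaves out.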
![Page 23: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/23.jpg)
Dryad System Architecture
[Figure: architecture diagram. Labels: job manager, job schedule, control plane, data plane (Files, TCP, FIFO, Network), cluster, NS, Sched, RE, V]
![Page 24: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/24.jpg)
Staging
1. Build
2. Send .exe
3. Start manager
4. Query cluster resources
5. Generate graph
6. Initialize vertices
7. Serialize vertices
8. Monitor vertex execution
[Figure labels: GM code, vertex code, Name server, Remote execution service]
![Page 25: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/25.jpg)
Talk Outline
• Motivation
• Dryad: A distributed runtime
• DryadLINQ: A compiler for Dryad
• Tools and applications
• Sketch: A billion-row spreadsheet
![Page 26: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/26.jpg)
Distributed Collections
Partition
Collection
.NET objects
![Page 27: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/27.jpg)
LINQ + Dryad => DryadLINQ
![Page 28: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/28.jpg)
LINQ = .NET + Queries
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { Hash = Hash(c.key), c.value };
![Page 29: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/29.jpg)
DryadLINQ = LINQ + Dryad
Collection<T> collection;
bool IsLegal(Key k);
string Hash(Key k);

var results = from c in collection
              where IsLegal(c.key)
              select new { Hash = Hash(c.key), c.value };
[Figure: the C# query over collection is compiled into a query plan (Dryad job) and C# vertex code; the job runs over the data to produce results]
![Page 30: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/30.jpg)
Language Summary
• Where
• Select
• GroupBy
• OrderBy
• Aggregate
• Join
![Page 31: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/31.jpg)
Very expressive
var result = input.SelectMany(r => Mapper(r))
                  .GroupBy(r => Key(r))
                  .Select(g => Reducer(g));
Map-Reduce
Distributed sorting
Iterative machine-learning (EM)
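The Map-Reduce pattern on this slide can be tried locally with plain LINQ to Objects. The sketch below is a word count. The bodies of `Mapper`, `Key`, and `Reducer` are hypothetical definitions chosen for the example, and the query runs over an in-memory sequence rather than a DryadLINQ distributed collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCount
{
    // Map: split one input record (a line of text) into words.
    static IEnumerable<string> Mapper(string line) =>
        line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Key: group words case-insensitively.
    static string Key(string word) => word.ToLowerInvariant();

    // Reduce: collapse one group into a (word, count) pair.
    static KeyValuePair<string, int> Reducer(IGrouping<string, string> g) =>
        new KeyValuePair<string, int>(g.Key, g.Count());

    public static Dictionary<string, int> Run(IEnumerable<string> input)
    {
        // Same three-operator shape as the query on the slide.
        var result = input.SelectMany(r => Mapper(r))
                          .GroupBy(r => Key(r))
                          .Select(g => Reducer(g));
        return result.ToDictionary(p => p.Key, p => p.Value);
    }
}
```

For example, `WordCount.Run(new[] { "a b a", "B c" })` groups the five words under three keys.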
![Page 32: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/32.jpg)
Talk Outline
• Motivation
• Dryad: A distributed runtime
• DryadLINQ: A compiler for Dryad
• Tools and applications
• Sketch: A billion-row spreadsheet
![Page 33: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/33.jpg)
Debugging DryadLINQ jobs
![Page 34: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/34.jpg)
Distributed performance counters
![Page 35: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/35.jpg)
Training Kinect
Depth map Body parts
Classifier
Xbox GPU
![Page 36: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/36.jpg)
Learn from Many Examples
DecisionTree
Classifier
Machine learning
![Page 37: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/37.jpg)
Talk Outline
• Motivation
• Dryad: A distributed runtime
• DryadLINQ: A compiler for Dryad
• Tools and applications
• Sketch: A billion-row spreadsheet
![Page 38: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/38.jpg)
Bandwidth hierarchy
![Page 39: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/39.jpg)
Principles
• Visualizations are bounded data displays
• All computations are sketches
• Sketch is a runtime for
  (1) running streaming (sketching) algorithms
  (2) implementing visualizations with bounded data renderings
![Page 40: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/40.jpg)
Streaming algorithms
• Sketches = randomized streaming algorithms
• Input = set of size n
• Result same independent of the order
• Memory = O(log(n))
• Multi-pass
• Linear input transformations
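To make the properties above concrete, here is a minimal sketch of one order-independent, mergeable summary: a bounded sample that keeps the k items with the smallest seeded hash. It is offered as an illustration under assumptions, not as the algorithm Sketch actually uses. The `BottomKSample` class and its members are hypothetical, and its memory is O(k) rather than the O(log(n)) stated on the slide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example: keeps the k items whose seeded hash is smallest.
// The surviving set depends only on the set of inputs, not on arrival order,
// and two samples built with the same k and seed can be merged.
public sealed class BottomKSample
{
    private readonly int k;
    private readonly ulong seed;
    private readonly SortedSet<(ulong Hash, long Value)> kept = new();

    public BottomKSample(int k, ulong seed) { this.k = k; this.seed = seed; }

    // SplitMix64-style finalizer used as a cheap 64-bit mixing function.
    private ulong Hash(long value)
    {
        unchecked
        {
            ulong z = (ulong)value + seed + 0x9E3779B97F4A7C15UL;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9UL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBUL;
            return z ^ (z >> 31);
        }
    }

    public void Add(long value)
    {
        kept.Add((Hash(value), value));
        if (kept.Count > k) kept.Remove(kept.Max); // evict the largest hash
    }

    // Merging re-inserts the other sample's survivors; the bottom k of a
    // union is always contained in the union of the two bottom-k sets.
    public void Merge(BottomKSample other)
    {
        foreach (var (_, value) in other.kept) Add(value);
    }

    // Sampled values, ordered by hash.
    public List<long> Items => kept.Select(p => p.Value).ToList();
}
```

Feeding the same values in a different order, or splitting them across two samples and calling `Merge`, produces the same `Items`, which is the property that lets partial results from many machines be combined.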
![Page 41: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/41.jpg)
4 billion rows on 155 machines
![Page 42: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/42.jpg)
Spreadsheet operations
• Browsing/scrolling
• Filtering
• Using predicates
• Heavy hitters
• Sampling
• Searching
• Sorting
• Computing new columns
• Set operations (intersection, union, etc.)
• Charting
![Page 43: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/43.jpg)
Histograms
![Page 44: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/44.jpg)
Heat Maps
![Page 45: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/45.jpg)
Sketch distributed service
[Figure: four nodes, each pairing local data with a Sketch service]
![Page 46: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/46.jpg)
DataSets = distributed objects
[Figure: a client application holds a DataSet<T>; the elements of type T reside on servers reached over the network]
![Page 47: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/47.jpg)
Sketch Spreadsheet architecture
[Figure: architecture diagram. Labels: Storage layer (SQL Server, CSV files, Column store, Cosmos), Table operations, DataSet<Table>, Distributed objects, Spreadsheet logic, Spreadsheet display, GUI]
![Page 48: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/48.jpg)
DataSet API
interface IDataSet<T> {
    IDataSet<S> Map<S>(Func<T,S> f);
    IDataSet<Pair<T,S>> Zip<S>(IDataSet<S> other);
    R Sketch<R>(ISketch<T,R> sketch);
}
interface ISketch<T,R> {
    R Create(T data);
    R Combine(List<R> parts);
}
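A minimal in-process sketch of these two interfaces may help show how `Map`, `Zip`, and `Sketch` compose. Everything beyond the interface members is an assumption made for illustration: the `LocalDataSet`, `ParallelDataSet`, `Pair`, and `SumSketch` types are hypothetical, the `<S>` and `<R>` type parameters on `Zip` and `Sketch` are added so that the interface compiles, and children are evaluated sequentially where a real runtime would presumably run them concurrently or remotely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Pair<T, S>(T First, S Second);

public interface ISketch<T, R>
{
    R Create(T data);
    R Combine(List<R> parts);
}

public interface IDataSet<T>
{
    IDataSet<S> Map<S>(Func<T, S> f);
    IDataSet<Pair<T, S>> Zip<S>(IDataSet<S> other);
    R Sketch<R>(ISketch<T, R> sketch);
}

// Leaf: holds one value of type T in the current address space.
public sealed class LocalDataSet<T> : IDataSet<T>
{
    private readonly T data;
    public LocalDataSet(T data) { this.data = data; }

    public IDataSet<S> Map<S>(Func<T, S> f) => new LocalDataSet<S>(f(data));

    // Assumes `other` has the same shape (also a LocalDataSet).
    public IDataSet<Pair<T, S>> Zip<S>(IDataSet<S> other) =>
        new LocalDataSet<Pair<T, S>>(new Pair<T, S>(data, ((LocalDataSet<S>)other).data));

    public R Sketch<R>(ISketch<T, R> sketch) => sketch.Create(data);
}

// Inner node: forwards each operation to its children and combines sketches.
public sealed class ParallelDataSet<T> : IDataSet<T>
{
    private readonly List<IDataSet<T>> children;
    public ParallelDataSet(IEnumerable<IDataSet<T>> children) { this.children = children.ToList(); }

    public IDataSet<S> Map<S>(Func<T, S> f) =>
        new ParallelDataSet<S>(children.Select(c => c.Map(f)));

    // Assumes `other` has the same shape (same number of children, pairwise).
    public IDataSet<Pair<T, S>> Zip<S>(IDataSet<S> other)
    {
        var o = (ParallelDataSet<S>)other;
        return new ParallelDataSet<Pair<T, S>>(children.Zip(o.children, (a, b) => a.Zip(b)));
    }

    // Sequential here for simplicity.
    public R Sketch<R>(ISketch<T, R> sketch) =>
        sketch.Combine(children.Select(c => c.Sketch(sketch)).ToList());
}

// Example sketch: sum of all integers across partitions.
public sealed class SumSketch : ISketch<int[], long>
{
    public long Create(int[] data) => data.Sum(x => (long)x);
    public long Combine(List<long> parts) => parts.Sum();
}

public static class DataSetExample
{
    public static long SumOfSquares()
    {
        IDataSet<int[]> ds = new ParallelDataSet<int[]>(new IDataSet<int[]>[]
        {
            new LocalDataSet<int[]>(new[] { 1, 2, 3 }),
            new LocalDataSet<int[]>(new[] { 4, 5 }),
        });
        // Square every element on each partition, then sum across partitions.
        return ds.Map(a => a.Select(x => x * x).ToArray()).Sketch(new SumSketch());
    }
}
```

The only data that crosses a partition boundary in `Sketch` is the small `R` value, which is what keeps the client-side work bounded regardless of how many rows each partition holds.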
![Page 49: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/49.jpg)
DataSet Implementations
[Figure: layered diagram of the Dataset interface implementations. Local datasets hold the data (T) within one address space; Parallel datasets group them for core parallelism within a server, rack aggregation (Rack 0 to Rack r), and cluster parallelism (Server 0 to Server n); Proxy datasets hold references (ref) across the network through the RMI layer; the client side runs the application and GUI.]
![Page 50: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/50.jpg)
Map(f)
[Figure: a Proxy / Parallel / Local, Local dataset holding T values is mapped with f at each Local leaf, producing a dataset of the same shape holding S values]
![Page 51: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/51.jpg)
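Sketch(s)
[Figure: s.Create runs on the data T at each Local dataset and yields an R; s.Combine merges the R values at the Parallel dataset, and the combined R is returned through the Proxy]
interface ISketch<T,R> {
    R Create(T data);
    R Combine(List<R> parts);
}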
![Page 52: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/52.jpg)
Zip
[Figure: two Proxy / Parallel / Local, Local datasets, one holding T values and one holding S values, are zipped into a dataset of the same shape holding (T,S) pairs]
![Page 53: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/53.jpg)
Histograms
CDF
2D histogram
![Page 54: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/54.jpg)
Computing a histogram
[Figure: timeline between the client and servers 1 to n. After a user click, the servers compute a data range sketch and then a histogram sketch (1D + 2D composite sketch); the client renders and displays the histogram.]
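The two sketches named on this slide, a data range sketch and a histogram sketch, can be illustrated with the `ISketch` interface from the DataSet API slide. The code below is a sketch under stated assumptions: it assumes the range is computed first so that bucket boundaries are known, it handles only a 1D histogram over non-empty `double[]` partitions, and the `RangeSketch`, `HistogramSketch`, and `HistogramExample` names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface ISketch<T, R>
{
    R Create(T data);
    R Combine(List<R> parts);
}

// Pass 1: the (min, max) range of a numeric column. Assumes non-empty partitions.
public sealed class RangeSketch : ISketch<double[], (double Min, double Max)>
{
    public (double Min, double Max) Create(double[] data) => (data.Min(), data.Max());

    public (double Min, double Max) Combine(List<(double Min, double Max)> parts) =>
        (parts.Min(p => p.Min), parts.Max(p => p.Max));
}

// Pass 2: bucket counts over a fixed range. The result always has `buckets`
// entries, no matter how many rows are scanned.
public sealed class HistogramSketch : ISketch<double[], long[]>
{
    private readonly double min, max;
    private readonly int buckets;

    public HistogramSketch(double min, double max, int buckets)
    {
        this.min = min; this.max = max; this.buckets = buckets;
    }

    public long[] Create(double[] data)
    {
        var counts = new long[buckets];
        double width = (max - min) / buckets;
        foreach (double x in data)
        {
            // Clamp so x == max lands in the last bucket; a zero-width range uses bucket 0.
            int b = width > 0 ? Math.Min(buckets - 1, (int)((x - min) / width)) : 0;
            counts[b]++;
        }
        return counts;
    }

    public long[] Combine(List<long[]> parts)
    {
        var total = new long[buckets];
        foreach (var part in parts)
            for (int i = 0; i < buckets; i++) total[i] += part[i];
        return total;
    }
}

public static class HistogramExample
{
    // Stand-in for the servers: Create runs per partition, Combine runs once.
    static R RunSketch<T, R>(List<T> partitions, ISketch<T, R> sketch) =>
        sketch.Combine(partitions.Select(p => sketch.Create(p)).ToList());

    public static long[] Compute(List<double[]> partitions, int buckets)
    {
        var (min, max) = RunSketch(partitions, new RangeSketch());
        return RunSketch(partitions, new HistogramSketch(min, max, buckets));
    }
}
```

The client only ever receives two doubles and one array of `buckets` counts, which fits the principle that a visualization is a bounded data display.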
![Page 55: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/55.jpg)
Some numbers
• Windows Server 2012 R2
• 8-core 2.1GHz AMD Opteron 2373 EE
• > 16GB RAM
• 3 x 1TB disks using RAID-0
• 155 machines
• 5 racks
• 1Gbps Ethernet
![Page 56: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/56.jpg)
Null Sketch
[Figure: time (ms), 0 to 600, versus number of machines (1, 2, 4, 8, 16, 24, 32, 64, 128, 155). Two series: no aggregation network, with aggregation network.]
![Page 57: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/57.jpg)
Histogram computation
• 26M rows/machine
• Scale-out
[Figure: time (ms), 0 to 1600, versus number of machines (1, 2, 4, 8, 16, 24, 32, 64, 128, 155)]
![Page 58: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/58.jpg)
Conclusions
• Big data is here to stay
• Better tools are needed
• Quest for high-level abstractions for building distributed systems
  • Execution graphs
  • Distributed collections
  • Higher-order transformations
  • Distributed stateful objects
  • Sketching algorithms
![Page 59: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/59.jpg)
![Page 60: Big Data Platforms Mihai Budiu, Oct 6 2014. My work Ph.D. from Carnegie Mellon, 2003 Hardware synthesis Reconfigurable hardware Compilers and computer](https://reader033.vdocuments.net/reader033/viewer/2022050714/56649cf35503460f949c12d2/html5/thumbnails/60.jpg)
Data-Parallel Computation
Application
Language: Sawzall, FlumeJava (Sawzall, Java) | Pig, Hive (≈SQL) | DryadLINQ, Scope (LINQ, SQL)
Execution: Map-Reduce | Hadoop | Dryad
Storage: GFS, BigTable | HDFS, S3 | Cosmos, Azure, SQL Server