Three Perspectives & Two Problems
Shivnath Babu, Duke University
Outline
• I want to highlight two problems / thoughts
• First some context
Three Perspectives
• The Cloud era is ringing in interesting changes
• Increasingly overlapping roles
• Joe Schmoe can now provision a 100-node Hadoop cluster in minutes
• Administrators in traditional roles are getting laid off
[Diagram: System Designers / Developers, Users of the System, System Administrators]
Three Perspectives
• The Cloud era is ringing in interesting changes
• Software abstractions / packaging / release cycle have changed
• More visibility into how users use the software
[Diagram: System Designers / Developers, Users of the System, System Administrators]
Problem 1: Automated Experiment-driven System Management
Taking the (Next) Bite Out of System Administration
• Cloud has automated some system administration tasks
• Can we automate others:
  o System tuning (configuration parameters, SQL queries, MapReduce jobs)
  o Detecting and repairing data corruption (disaster recovery)
  o Software / service testing
Database Performance Tuning
2-dim Projection of an 11-dim Surface
MapReduce Job Tuning in Hadoop
2-dim Projection of a 13-dim Surface
Data Corruption
• Stored data becomes different from what it is supposed to be
  o Bugs in software / firmware
  o Alpha particles, bit rot
  o Human mistakes
• Bad things have happened
  o Data loss
  o System unavailability
  o Incorrect results
[Diagram: Stored Data, with layers labeled Applications, File-System, Storage, Database]
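One common way to detect that stored data has drifted from what it is supposed to be is to record a checksum at write time and recompute it later. The sketch below is only an illustration of that idea, not a method from the talk; the block ids, the sample contents, and the helper names `checksum` and `find_corrupt_blocks` are all hypothetical.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a block of stored data."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_blocks(blocks: dict[str, bytes], expected: dict[str, str]) -> list[str]:
    """Return the ids of blocks whose current checksum differs from the recorded one."""
    return sorted(
        block_id
        for block_id, data in blocks.items()
        if checksum(data) != expected.get(block_id)
    )


# Hypothetical stored data; checksums are recorded while the data is known to be good.
blocks = {"b1": b"orders 2011-01", "b2": b"orders 2011-02"}
expected = {block_id: checksum(data) for block_id, data in blocks.items()}

# Simulate bit rot by flipping a single bit in block b2.
damaged = bytearray(blocks["b2"])
damaged[0] ^= 0x01
blocks["b2"] = bytes(damaged)

corrupt = find_corrupt_blocks(blocks, expected)
print(corrupt)
```

A checksum only catches corruption introduced after the checksum was recorded; a software bug that wrote wrong data in the first place would need a higher-level check.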
Key Insight: Need to Run “Experiments”
• System tuning:
  o Running workload under various system settings
• Detecting data corruption:
  o Running integrity checks to verify data correctness
• Software / service testing:
  o Running the tests
[Diagram: Stored Data, with layers labeled Applications, File-System, Storage, Database]
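For the system-tuning case, "running workload under various system settings" can be sketched as a plain grid of experiments. This is a minimal illustration under stated assumptions: the setting names `reduce_tasks` and `sort_buffer_mb` are hypothetical (not real Hadoop parameter names), and `fake_measure` is a made-up stand-in for actually running and timing the workload.

```python
import itertools


def grid(space):
    """Yield every combination of settings from a {name: [values]} search space."""
    names = sorted(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def run_experiments(space, measure):
    """Run `measure` once per setting; return (best_setting, all_results)."""
    results = [(measure(setting), setting) for setting in grid(space)]
    _, best = min(results, key=lambda result: result[0])
    return best, results


def fake_measure(setting):
    """Hypothetical stand-in for running the workload and returning its running time."""
    return (
        abs(setting["reduce_tasks"] - 16) * 10
        + abs(setting["sort_buffer_mb"] - 200) / 10
    )


# Hypothetical search space with two settings.
space = {"reduce_tasks": [4, 8, 16, 32], "sort_buffer_mb": [100, 200, 400]}
best, results = run_experiments(space, fake_measure)
print(best, len(results))
```

An exhaustive grid grows multiplicatively with the number of settings, so with 11 or 13 dimensions it quickly becomes impractical; the grid is used here only because it is the simplest way to show the experiment loop.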
Challenge: Where / How / When to run experiments?
Cloud is Part of the Answer
• Take snapshots of production data at low overhead
• Fire up production-like instances of the system
• Pay-as-you-go, elasticity
• Run the experiments
[Diagram: two copies of the Applications / File-System / Storage / Database stack, one holding "Production Data" and one holding "Data on system for doing experiments"]
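The snapshot, instance, experiment sequence above can be mimicked on one machine to show the isolation it buys. The sketch below is an analogy only: a full directory copy stands in for a low-overhead cloud snapshot, a temporary directory stands in for a production-like instance, and `run_on_snapshot` and `destructive_experiment` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_snapshot(production_dir, experiment):
    """Copy production data to a scratch area, run the experiment there, then discard it."""
    with tempfile.TemporaryDirectory() as scratch:
        snapshot = Path(scratch) / "snapshot"
        shutil.copytree(production_dir, snapshot)
        return experiment(snapshot)


def destructive_experiment(data_dir):
    """Hypothetical experiment that counts rows and then overwrites its own copy."""
    table = data_dir / "table.csv"
    rows = table.read_text().splitlines()[1:]
    table.write_text("")  # the experiment is free to damage the snapshot
    return len(rows)


# Hypothetical "production" data set living in a temporary directory.
with tempfile.TemporaryDirectory() as prod_name:
    prod = Path(prod_name)
    (prod / "table.csv").write_text("id,value\n1,10\n2,20\n")

    row_count = run_on_snapshot(prod, destructive_experiment)
    production_intact = (prod / "table.csv").read_text().startswith("id,value")

print(row_count, production_intact)
```

The point of the design is that the experiment never receives a path to production data, so even a destructive run cannot affect it.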
Power of Experiments to the People
• Resources
• Declarative Language
• Plan optimized sequence of expts
• Conduct expts automatically
• Declarative benchmarking & tuning
• Protecting against data corruption
Problem 2: Data-Parallel Computing for the Masses
Challenges
• Joe Schmoe can now provision a 100-node Hadoop cluster in minutes. Is that enough?
• Joe may need answers to:
  o How many reduce tasks to use in MapReduce job J for getting the best perf. on my 8-node production cluster?
  o My current cluster needs more than 6 hours to process 1 day’s worth of data. Want to reduce that to under 3 hours. How many and what type of Amazon EC2 nodes to use?
Performance Vs. Price Tradeoff
[Figure: two charts over Node Type on Amazon EC2 (m1.small, m1.large, m1.xlarge), each with series for 2 nodes, 4 nodes, and 6 nodes. First chart: Execution Time (sec), axis from 0 to 12000. Second chart: Cost ($), axis from $0.00 to $6.00.]
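Once running times have been measured or estimated for each cluster option, the "how many and what type of nodes" question reduces to picking the cheapest option that meets the deadline. The sketch below assumes cost is simply nodes × hours × hourly price, which ignores real billing details; every number in it is invented for illustration and is not read from the charts, and `Option` and `cheapest_within_deadline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    node_type: str
    nodes: int
    hours: float                # estimated running time of the job
    price_per_node_hour: float  # made-up price, in dollars

    @property
    def cost(self):
        """Total cost under the simplifying assumption cost = nodes * hours * price."""
        return self.nodes * self.hours * self.price_per_node_hour


def cheapest_within_deadline(options, deadline_hours):
    """Return the lowest-cost option that finishes in time, or None if none does."""
    feasible = [option for option in options if option.hours <= deadline_hours]
    return min(feasible, key=lambda option: option.cost) if feasible else None


# Invented numbers for illustration only.
options = [
    Option("small", 6, 5.0, 0.10),
    Option("large", 4, 2.5, 0.40),
    Option("large", 6, 2.0, 0.40),
    Option("xlarge", 4, 1.5, 0.80),
]

choice = cheapest_within_deadline(options, deadline_hours=3.0)
print(choice.node_type, choice.nodes, choice.cost)
```

With these invented numbers the cheapest option overall misses the 3-hour deadline, which is the kind of tradeoff the slide title refers to.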
Spectrum
• Database Systems
  o SQL
  o Known data-access patterns
  o Fixed set of operators
  o Cost-based optimizers, What-if engines
• Grid Computing
  o Python / R / Java
  o Unknown data-access patterns
  o Black-box functions
• Newer Data-Parallel Systems
Starfish: Self-Tuning Analytics on Big Data
• What-if Engine
• Workflow-level tuning: Workflow-aware Optimizer/Scheduler
• Workload-level tuning: Workload Optimizer, Elastisizer
• Data Manager: Metadata Mgr., Intermediate Data Mgr., Data Layout & Storage Mgr.
• Job-level tuning: Just-in-Time Optimizer, Profiler, Sampler
MapReduce Job Tuning in Hadoop
[Figure: two surface plots, True Surface and Estimated Surface]
Summary
• Three perspectives: Developer, User, & Administrator
• Two problems:
  o Automated Experiment-driven System Management
  o Data-Parallel Computing for the Masses