optimizing mapreduce provisioning in the cloud

Page 1: Optimizing MapReduce Provisioning in the Cloud

University of Minnesota

Optimizing MapReduce Provisioning in the Cloud

Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra

http://www.cs.umn.edu/~cardosa

Department of Computer Science, University of Minnesota

†IBM Almaden Research Center

Page 2: Optimizing MapReduce Provisioning in the Cloud

MapReduce Provisioning Problem

- Platform:
  - Virtualized cloud environment, which enables virtualized MapReduce clusters
  - Several MapReduce jobs from different users
- Goal: Optimize system-wide metrics such as throughput, energy, load distribution, and user costs
- Problem: At the cloud service provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

Page 3: Optimizing MapReduce Provisioning in the Cloud

MapReduce Platform: Hadoop

- Open-source implementation of the MapReduce distributed computing framework
- Used widely: Yahoo, Facebook, NYT, (Google)

[Figure; surviving label: Input Data]

Page 4: Optimizing MapReduce Provisioning in the Cloud

Hadoop Clusters

- Distributed data: replicated chunks
- Distributed computation: map/reduce tasks
- Traditional: dedicated physical nodes

Page 5: Optimizing MapReduce Provisioning in the Cloud

Virtual Hadoop Clusters

- Run Hadoop on top of VMs
- E.g., Amazon Elastic MapReduce = Hadoop + Amazon EC2

[Figure labels: Server Pool, VM Pool, Hadoop Processes]

Page 6: Optimizing MapReduce Provisioning in the Cloud

Roadmap

- Intro & Problem
- Platform Overview
- Spatio-Temporal Insights for Provisioning
- Building Blocks for MapReduce Provisioning
- Case Study: Performance optimization
- Case Study: Energy optimization

Page 7: Optimizing MapReduce Provisioning in the Cloud

Spatio-Temporal Insights for Provisioning

- Initial focus: energy savings
- Goal: Minimize energy usage
  - Energy + cooling ~ 42% of total cost [Hamilton08]
- Problem: How to place the VMs on available physical servers to minimize energy usage?
  - Minimize Cumulative Machine Uptime (CMU)
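As a rough illustration of the CMU metric: if a server must stay powered on until the last VM placed on it finishes, its uptime is the longest runtime among its VMs, and CMU is the sum of those uptimes over all servers. The sketch below assumes that reading of the metric. The function name and the data layout (one list of VM runtimes per server, in minutes) are illustrative and not taken from the talk.

```python
def cumulative_machine_uptime(placement):
    """Sum of per-server uptimes, in the same unit as the runtimes.

    Assumption: a server stays on until its longest-running VM finishes
    and is shut down right afterwards; servers with no VMs stay off.
    """
    return sum(max(runtimes) for runtimes in placement if runtimes)


# Hypothetical placement: two servers, VM runtimes in minutes.
placement = [[20, 25], [90]]
print(cumulative_machine_uptime(placement))  # 25 + 90 = 115
```

Under this definition only the longest VM on each server matters, which is why co-locating VMs with similar runtimes tends to lower CMU.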

Page 8: Optimizing MapReduce Provisioning in the Cloud

VM Placement: Spatial Fit

- Co-place complementary workloads

[Figure labels: Job 1, Job 2, Job 3, Job 4]
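One way to picture spatial fit is as multi-dimensional packing: VMs whose resource demands stress different dimensions can share a server. The sketch below is a plain first-fit heuristic under assumptions that are not from the talk: each VM is described by a hypothetical (cpu, io) demand pair normalized to a server capacity of 1.0, and "complementary" is taken to mean that the summed demands stay within capacity in both dimensions.

```python
def first_fit(vm_demands, capacity=1.0):
    """Assign each VM to the first server where it fits in every dimension.

    vm_demands: list of (cpu, io) tuples, normalized to server capacity.
    Returns a list of servers, each a list of VM indices.
    """
    servers = []  # each entry: {"vms": [...], "cpu": float, "io": float}
    for index, (cpu, io) in enumerate(vm_demands):
        for server in servers:
            if server["cpu"] + cpu <= capacity and server["io"] + io <= capacity:
                server["vms"].append(index)
                server["cpu"] += cpu
                server["io"] += io
                break
        else:
            # No existing server can host this VM; open a new one.
            servers.append({"vms": [index], "cpu": cpu, "io": io})
    return [server["vms"] for server in servers]


# Hypothetical demands: VMs 0 and 2 are CPU-heavy, VMs 1 and 3 are I/O-heavy.
demands = [(0.7, 0.2), (0.2, 0.7), (0.6, 0.3), (0.3, 0.6)]
print(first_fit(demands))  # [[0, 1], [2, 3]]
```

Each CPU-heavy VM ends up paired with an I/O-heavy one, so two servers suffice where four same-type VMs would not fit two per server.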

Page 9: Optimizing MapReduce Provisioning in the Cloud

Which placement is better?

[Figure: two placements, A and B. Job runtime labels: 20 min, 10 min, 100 min, 20 min, 20 min, 20 min. Two servers are marked SHUTDOWN.]
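The question can be made concrete by comparing cumulative machine uptime for two candidate placements. The example below is hypothetical and is not a reconstruction of the figure: it assumes three servers with two VM slots each, a 100-minute job with two VMs, and several shorter VMs, and it assumes a server can be shut down as soon as its longest VM finishes.

```python
def cmu(placement):
    """Cumulative machine uptime: sum of the longest runtime on each server."""
    return sum(max(vms) for vms in placement if vms)


# Hypothetical placements; numbers are VM runtimes in minutes.
spread = [[100, 20], [100, 20], [10, 20]]    # long job split across servers
grouped = [[100, 100], [20, 20], [10, 20]]   # long job kept on one server

print(cmu(spread))   # 100 + 100 + 20 = 220
print(cmu(grouped))  # 100 + 20 + 20 = 140
```

Both placements use the same servers and VMs; the grouped one lets two servers shut down after 20 minutes instead of one.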

Page 10: Optimizing MapReduce Provisioning in the Cloud

Time Balancing

[Figure: job runtime labels 20, 25, and 90 before the "Time Balance" step; 20, 25, and 30 repeated three times after it.]

Page 11: Optimizing MapReduce Provisioning in the Cloud

Building Blocks for Provisioning

[Figure labels: MapReduce Jobs; Objective-driven resource provisioning; Job profiling; Cluster scaling; Migration; Cloud Execution Environment; Initial Provisioning; Continuous Optimization]

Page 12: Optimizing MapReduce Provisioning in the Cloud

Building Blocks for Provisioning

- Job Profiling: MapReduce job runtime estimation
  - Based on number of VMs allocated to job
  - Based on input data size
  - Offline and online profiling
- Cluster Scaling: Changing the number of VMs allocated to a particular MapReduce job
  - Affects runtime of job; relies on the Job Profiling model
- Migration: Useful for continuous optimization
  - Load balancing, VM consolidation
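A minimal sketch of offline job profiling, assuming a model form that the talk does not specify: runtime = a + b * (input size / number of VMs), fitted by ordinary least squares to past runs. The function names, the sample data, and the units (GB, minutes) are all hypothetical.

```python
def fit_runtime_model(samples):
    """Least-squares fit of runtime = a + b * (input_gb / num_vms).

    samples: list of (input_gb, num_vms, runtime_min) from past runs.
    Needs at least two distinct input_gb / num_vms ratios.
    """
    xs = [gb / vms for gb, vms, _ in samples]
    ys = [runtime for _, _, runtime in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_runtime(model, input_gb, num_vms):
    """Predict runtime in minutes for a new job size and VM allocation."""
    a, b = model
    return a + b * input_gb / num_vms


# Hypothetical profiling runs: (input GB, VMs, observed runtime in minutes).
history = [(100, 10, 25), (100, 5, 45), (200, 10, 45), (50, 10, 15)]
model = fit_runtime_model(history)
print(estimate_runtime(model, input_gb=400, num_vms=20))
```

A measured profile may be far from this shape (for example, speedup usually flattens as VMs are added), so the model form should be treated as a placeholder for whatever the profiling data supports.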

Page 13: Optimizing MapReduce Provisioning in the Cloud

Job Profiling: Runtime Estimation Based on Number of VMs

Page 14: Optimizing MapReduce Provisioning in the Cloud

Job Profiling: Runtime Estimation Based on Input Data Size

Page 15: Optimizing MapReduce Provisioning in the Cloud

Job Profiling: Runtime Estimation

- Online profiling: additional refinement
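One possible form of online refinement, sketched under assumptions that are not from the talk: the running job reports a progress fraction, the observed rate is projected to a total runtime, and that projection is blended with the offline estimate, trusting the observation more as the job advances. The function name and the blending rule are hypothetical.

```python
def refine_estimate(offline_estimate, elapsed, progress):
    """Blend an offline runtime estimate with an online projection.

    offline_estimate: predicted total runtime from offline profiling.
    elapsed: time the job has been running, in the same unit.
    progress: fraction of work completed, between 0.0 and 1.0.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    if progress == 0.0:
        return offline_estimate  # nothing observed yet
    projected = elapsed / progress  # total runtime at the observed rate
    # Weight the projection by progress: more evidence, more trust.
    return progress * projected + (1.0 - progress) * offline_estimate


# Offline profile said 60 minutes; after 20 minutes the job is 25% done.
print(refine_estimate(60, elapsed=20, progress=0.25))  # 65.0
```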

Page 16: Optimizing MapReduce Provisioning in the Cloud

Cluster Scaling

- Increasing allocated resources (typical):
  - Add VMs to join the virtualized Hadoop cluster
  - Job performance increases, runtime decreases
- E.g., for Time Balancing: energy reasons
- E.g., for Load Balancing and Deadlines: performance

Page 17: Optimizing MapReduce Provisioning in the Cloud

Cluster Scaling: Time Balancing

[Figure: job runtime labels 20, 25, and 90 before the "Time Balance" step; 20, 25, and 30 repeated three times after it.]
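To bring a long job's runtime in line with its neighbours, cluster scaling needs a VM count for a target runtime. The sketch below assumes ideal scaling, i.e. runtime inversely proportional to the number of VMs. That assumption is not from the talk; in practice the job profiling model would supply the relationship. The function name is hypothetical.

```python
import math


def vms_for_target_runtime(current_vms, current_runtime, target_runtime):
    """VM count needed to bring a job's runtime down to target_runtime.

    Assumption: runtime is inversely proportional to the number of VMs.
    """
    if target_runtime <= 0:
        raise ValueError("target_runtime must be positive")
    if target_runtime >= current_runtime:
        return current_vms  # already at or under the target
    return math.ceil(current_vms * current_runtime / target_runtime)


# A job estimated at 90 minutes on 1 VM, to be balanced against ~30-minute jobs.
print(vms_for_target_runtime(1, 90, 30))  # 3
```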

Page 18: Optimizing MapReduce Provisioning in the Cloud

Roadmap

- Intro & Problem
- Platform Overview
- Spatio-Temporal Insights for Provisioning
- Building Blocks for MapReduce Provisioning
- Case Study: Performance optimization
- Case Study: Energy optimization

Page 19: Optimizing MapReduce Provisioning in the Cloud

Case Study: Performance & Deadlines

- Goal: Meet deadlines for MapReduce jobs
  - Determine initial allocation accurately
  - Dynamically adjust allocation to meet deadline if necessary
- Monitoring: Use offline profiling to estimate number of VMs needed based on past performance
- Actuation: Online profiling: trigger points to invoke cluster scaling
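A trigger point can be sketched as a periodic check: project the remaining time from observed progress, compare it with the time left before the deadline, and scale the cluster if the job is falling behind. The sketch below assumes that the remaining work speeds up in proportion to the number of VMs and that progress is reported as a fraction; neither is stated in the talk, and the function name is hypothetical.

```python
import math


def vms_needed_for_deadline(current_vms, elapsed, progress, deadline):
    """Return the VM count that should let the job finish by the deadline.

    elapsed and deadline are measured from job start, in the same unit.
    progress is the fraction of work completed so far (0 < progress <= 1).
    Assumption: remaining time shrinks in proportion to added VMs.
    """
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    time_left = deadline - elapsed
    if time_left <= 0:
        raise ValueError("deadline has already passed")
    remaining = elapsed * (1.0 - progress) / progress  # at the current rate
    if remaining <= time_left:
        return current_vms  # on track, no scaling needed
    return math.ceil(current_vms * remaining / time_left)


# 4 VMs, 30 minutes in, 25% done, 60-minute deadline: 90 minutes of work
# remain at the current rate but only 30 minutes are left.
print(vms_needed_for_deadline(4, elapsed=30, progress=0.25, deadline=60))  # 12
```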

Page 20: Optimizing MapReduce Provisioning in the Cloud

Case Study: Energy Savings

- Goal: Minimize energy consumption from the execution of a large batch of MapReduce jobs
  - Energy + cooling ~ 42% of total cost [Hamilton08]
  - Pass energy savings on to users
- Problem: How to place the VMs on available physical servers to minimize energy usage?
  - Minimize Cumulative Machine Uptime (CMU)

Page 21: Optimizing MapReduce Provisioning in the Cloud

Case Study: Energy Savings

- Use Job Profiling to place similar-runtime VMs together for initial provisioning
- Use Job Profiling to adjust the number of VMs in each cluster to adjust runtimes if needed
- Monitoring: Online profiling to determine when energy could be saved by using migration or cluster scaling
- Actuation: Use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
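Placing similar-runtime VMs together can be sketched as sorting VMs by estimated runtime and filling servers in that order. The example below is a simplification under stated assumptions: every server has the same fixed number of VM slots, runtime estimates come from job profiling, and resource (spatial) fit is ignored. The function names and numbers are hypothetical.

```python
def group_by_runtime(runtimes, slots_per_server):
    """Fill servers with VMs in descending order of estimated runtime."""
    ordered = sorted(runtimes, reverse=True)
    return [
        ordered[i:i + slots_per_server]
        for i in range(0, len(ordered), slots_per_server)
    ]


def cmu(placement):
    """Cumulative machine uptime: sum of the longest runtime on each server."""
    return sum(max(vms) for vms in placement if vms)


# Hypothetical estimated runtimes in minutes, two VM slots per server.
estimates = [100, 20, 100, 20, 10, 20]
placement = group_by_runtime(estimates, slots_per_server=2)
print(placement)       # [[100, 100], [20, 20], [20, 10]]
print(cmu(placement))  # 140
```

If the estimates turn out to be wrong at run time, the monitoring and actuation steps above are what would correct the placement.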

Page 22: Optimizing MapReduce Provisioning in the Cloud

Conclusion

- Framework: Building blocks (STEAMEngine) for the optimization of MapReduce provisioning from a cloud service provider perspective
- Preliminary evaluations to validate usefulness of each building block
- Approaches for applying building blocks to meet specific goals, e.g. performance, energy

Page 23: Optimizing MapReduce Provisioning in the Cloud

Thank you! Questions?
