Supporting Time-Critical Event Processing in Grids and Clouds

Qian Zhu. Advisor: Professor Gagan Agrawal


Post on 23-Jan-2016


DESCRIPTION

Supporting Time-Critical Event Processing in Grids and Clouds. Qian Zhu. Advisor: Professor Gagan Agrawal. Adaptive applications: earthquake modeling, coastline forecasting, medical systems. Time-critical event processing: compute-intensive, time constraints.

TRANSCRIPT

Page 1: Supporting Time-Critical Event Processing in Grids and Clouds

Supporting Time-Critical Event Processing in Grids and Clouds

Qian Zhu

Advisor: Professor Gagan Agrawal

Page 2: Supporting Time-Critical Event Processing in Grids and Clouds

Adaptive Applications

Earthquake modeling, coastline forecasting, medical systems

• Time-Critical Event Processing

- Compute-intensive

- Time constraints

- Application-specific flexibility

- Application Quality of Service (QoS)

Page 3: Supporting Time-Critical Event Processing in Grids and Clouds

Adaptive Applications (Cont’d)

Adaptive applications that perform time-critical event processing

• Application-specific flexibility: parameter adaptation

• Trade-off between application QoS and execution time

HPC applications (compute-intensive)

• Aim to maximize performance

• Do not consider adaptation

Deadline-driven scheduling

• Not very compute-intensive

Page 4: Supporting Time-Critical Event Processing in Grids and Clouds

Motivating Application - Real-time Volume Rendering

• Interactively create a 2D projection of a large time-varying 3D data set

• Application Flexibility

- Error tolerance (image quality)

- Image size

• Benefit definition (QoS metric)

- To view the rendered images from as many angles as possible

- For each view angle, display the image with the best resolution at the desired image size

Page 5: Supporting Time-Critical Event Processing in Grids and Clouds

Motivating Application - Real-time Volume Rendering

• Example

[Figure: panels (a) and (b)]

• How well can we do given 1 minute as the time constraint?

Note: RMI data set from Lawrence Livermore National Laboratory

Page 6: Supporting Time-Critical Event Processing in Grids and Clouds

Motivating Application - Great Lake Nowcasting and Forecasting

• Monitor meteorological conditions of Lake Erie for nowcasting and forecasting

[Figure: grid labeled 1 km by 1 km]

Page 7: Supporting Time-Critical Event Processing in Grids and Clouds

Motivating Application - Great Lake Nowcasting and Forecasting

• Application flexibility

- Resolution of grids

- Internal time step

- External time step

• Benefit definition (QoS metric)

- To predict the water level first

- To predict other meteorological information as much as possible

• How much meteorological information can we predict given 1 hour?

Page 8: Supporting Time-Critical Event Processing in Grids and Clouds


Time Critical Event Processing

•Grid Computing Environment

- Geographically distributed

- Heterogeneous

- Unreliable

•Cloud Computing Environment

- On-demand resource availability

- Pay-as-you-go pricing model

Goal: Maximize the application benefit (QoS) while satisfying the pre-specified time constraints
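The goal above can be read as a constrained search: among the settings of the adaptive parameters, pick the one with the highest benefit whose estimated execution time fits the time constraint. The sketch below is only an illustration under strong assumptions. The names `best_setting`, `toy_benefit` and `toy_time` are hypothetical, the benefit and time models are made up, and the exhaustive enumeration stands in for the learning-based adaptation the slides describe later.

```python
from itertools import product


def best_setting(param_grid, benefit, exec_time, deadline):
    """Return (setting, benefit) for the feasible setting with the highest
    benefit, or None if no setting meets the deadline."""
    best = None
    names = list(param_grid.keys())
    for values in product(*(param_grid[n] for n in names)):
        setting = dict(zip(names, values))
        if exec_time(setting) > deadline:
            continue  # violates the time constraint
        b = benefit(setting)
        if best is None or b > best[1]:
            best = (setting, b)
    return best


# Hypothetical models, loosely inspired by the volume rendering parameters
# (image size, error tolerance). They are assumptions, not measured data.
def toy_benefit(s):
    return s["image_size"] / 1024 + (1.0 - s["error_tolerance"])


def toy_time(s):
    return 0.05 * s["image_size"] * (1.0 - s["error_tolerance"])


grid = {"image_size": [256, 512, 1024], "error_tolerance": [0.1, 0.3, 0.5]}
choice = best_setting(grid, toy_benefit, toy_time, deadline=20.0)
print(choice)
```

Exhaustive search is only workable for a handful of discrete parameters; it is shown here to make the objective concrete, not as the adaptation method itself.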

Page 9: Supporting Time-Critical Event Processing in Grids and Clouds

Dissertation Overview

Adaptive applications that perform time-critical event processing

[Diagram: Grid and Cloud environments with the topics Resource Allocation, Fault Tolerance, Resource Provisioning, Power Management, and Parameter Adaptation]

Scientific computing, mobile applications, parallel computing

Page 10: Supporting Time-Critical Event Processing in Grids and Clouds

Challenges -- Parameter Adaptation

•A Large Number of Parameters to be Adapted

- Discrete and continuous

- Correlations between parameters

•No Knowledge about the Impact of Such Parameters on Execution Time or Benefit

•Pre-specified Time Constraints

- Low adaptation overhead

Page 11: Supporting Time-Critical Event Processing in Grids and Clouds

Challenges -- Resource Allocation

• Grid/Cloud: Heterogeneous and Dynamic Resources

• Resource Allocation Impacts Application Benefit

• A 20-min event from Volume Rendering application

[Figure: Benefit Value vs. Resource Configuration]

- Different CPU, Memory and/or Bandwidth Usage

• Different application components

• Different values of adaptive parameters

Page 12: Supporting Time-Critical Event Processing in Grids and Clouds

Challenges -- Fault Tolerance

•Grid Resources

- Heterogeneous and Unreliable

•Time Constraints

•Trade-off between Resource Efficiency and Reliability

•Effective, Low-overhead Failure Recovery

Page 13: Supporting Time-Critical Event Processing in Grids and Clouds

Challenges -- Resource Budget Constraints

•Elastic Cloud Computing

- Pay-as-you-go model

•Satisfy the Application QoS with the Minimum Resource Cost

•Dynamic Resource Provisioning

- Dynamically varying application workloads

- Resource budget

Page 14: Supporting Time-Critical Event Processing in Grids and Clouds


Contributions

• Parameter adaptation

- Q. Zhu and G. Agrawal (ICAC2008)

• Resource allocation

- Q. Zhu and G. Agrawal (IPDPS2009)

• Fault tolerance

- Q. Zhu and G. Agrawal (SC2009)

• Budget constrained resource provisioning

- Q. Zhu and G. Agrawal (HPDC 2010)

• Power-aware consolidation of workflows

- Q. Zhu, J. Zhu and G. Agrawal (submitted to SC2010)

Goal: Maximize the application benefit (QoS) while satisfying the pre-specified time constraints

Page 15: Supporting Time-Critical Event Processing in Grids and Clouds

Roadmap

• Motivation and Introduction

• Parameter Adaptation in the Grid Environment

- Application model

- Autonomic adaptation algorithm

- Resource allocation in time-critical event processing

• Budget Constrained Resource Provisioning

• Power-aware Consolidation of Workflows

• Future Work

• Conclusion

Page 16: Supporting Time-Critical Event Processing in Grids and Clouds


Contributions

• Develop an Autonomic Adaptation Algorithm

- Effectively adjust the parameters

- Low overhead

• Design of an Adaptive Middleware with Support of Easy Deployment of Applications in Grid Environments

• Consider Heterogeneous Resources

- Efficiency value definition

- Efficiency value estimation

- Greedy-based scheduling algorithm

Page 17: Supporting Time-Critical Event Processing in Grids and Clouds

Application and Environment Model

• Volume Rendering application

[Diagram: service components: WSTP Tree Construction Service, Temporal Tree Construction Service, Compression Service, Decompression Service, Unit Image Rendering Service, Image Composition Service]

Page 18: Supporting Time-Critical Event Processing in Grids and Clouds

Algorithm Overview

Goal: Maximize the application benefit (QoS) while satisfying the pre-specified time constraints

Normal Processing Phase (collect data)

[Diagram: input data processed through checkpoint 1, checkpoint 2, ...]

• Train system model

• Learn the relationship between the values of adaptive parameters and execution time, application benefit

Event Handling Phase (adjust parameters)

[Diagram: input data processed through checkpoint 1, checkpoint 2, ... within the time constraint]

• Apply the trained system model for parameter adaptation

Page 19: Supporting Time-Critical Event Processing in Grids and Clouds


Parameter Adaptation to Optimal Control Model

•Adaptation Process

•Control Policy

- Policy with learning -- Reinforcement learning

[Diagram: feedback loop with Application, Controller, and Performance Measure blocks; signals u(k), w(k), D(k)]

Page 20: Supporting Time-Critical Event Processing in Grids and Clouds


Resource Allocation

•Heterogeneous and Dynamic Resources

•Different CPU, Memory and/or Bandwidth Usage

- Different service components

- Different values of adaptive service parameters

•Schedule the Service Components to Maximize the Benefit Function Within the Time Constraint

Page 21: Supporting Time-Critical Event Processing in Grids and Clouds


Efficiency Value

• Assign to and to yields the maximum benefit

• Our definition of efficiency value captures the suitability of different nodes for different services

• Definition

- Represents how efficient it is to execute a service on a node

- Considers application benefit and execution time

• Estimation

- Based on fuzzy logic
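To make the idea of efficiency-driven placement concrete, here is a minimal greedy sketch. It assumes the efficiency values are already estimated (the slides use fuzzy logic for that step, which is not reproduced here) and that each node hosts at most one service. The function name `greedy_schedule`, the one-service-per-node rule, and the numbers are all hypothetical; the dissertation's actual greedy algorithm may differ.

```python
def greedy_schedule(efficiency):
    """Assign services to nodes greedily by descending efficiency value.

    efficiency maps (service, node) -> efficiency value (higher is better).
    Assumption: one service per node, one node per service.
    """
    assignment = {}
    used_nodes = set()
    ranked = sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True)
    for (service, node), _value in ranked:
        if service in assignment or node in used_nodes:
            continue  # service already placed or node already taken
        assignment[service] = node
        used_nodes.add(node)
    return assignment


# Hypothetical efficiency values for two services on two nodes.
eff = {
    ("compression", "n1"): 0.9,
    ("compression", "n2"): 0.4,
    ("rendering", "n1"): 0.8,
    ("rendering", "n2"): 0.7,
}
plan = greedy_schedule(eff)
print(plan)
```

A greedy pass like this is cheap, which matters under a deadline, but it does not guarantee the globally best assignment.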

Page 22: Supporting Time-Critical Event Processing in Grids and Clouds

Roadmap

• Motivation and Introduction

• Parameter Adaptation in the Grid Environment

• Budget Constrained Resource Provisioning

- Background: Cloud environment

- Dynamic resource provisioning algorithm

- Framework Design

- Experimental evaluation

• Power-aware Consolidation of Workflows

• Future Work

• Conclusion

Page 23: Supporting Time-Critical Event Processing in Grids and Clouds

Background: Cloud Environment

• Amazon EC2, Google AppEngine, Microsoft Azure, Magellan ...

• Utility-like Computing

- On-demand scalability of resources

• Resource Cost

- Pricing model: Pay-as-you-go

• Virtualization

- Resource sharing

- Customized deployment and easy migration

- Assumption: Fine-grained resource allocation (i.e., change CPU, memory on-the-fly) and pricing

Page 24: Supporting Time-Critical Event Processing in Grids and Clouds


Background: Pricing Model

• Charged Fees

- Base price

- Transfer fee

• Linear Pricing Model

• Exponential Pricing Model

Base price charged for the smallest amount of CPU cycles

Transfer fee for each CPU allocation change

CPU cycle at the ith allocation

Time duration at the ith allocation

Number of CPU cycle allocations
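The pricing formulas themselves did not survive in this transcript, so the sketch below is not the slide's model. It is one plausible shape built only from the quantities listed above: a base price for the smallest CPU allocation, a transfer fee per allocation change, and a sequence of (CPU, duration) allocations. The scaling rules (`cpu / min_cpu` for linear, `2 ** (cpu / min_cpu - 1)` for exponential), the choice to charge the fee once per allocation, and all function names are assumptions.

```python
def linear_cost(allocations, base_price, transfer_fee, min_cpu):
    """Assumed linear model: usage fee grows in proportion to cpu / min_cpu.

    allocations is a list of (cpu, duration) pairs, one per allocation.
    """
    usage = sum(base_price * (cpu / min_cpu) * duration
                for cpu, duration in allocations)
    # Assumption: the transfer fee is charged once per allocation.
    return usage + transfer_fee * len(allocations)


def exponential_cost(allocations, base_price, transfer_fee, min_cpu):
    """Assumed exponential model: usage fee doubles per extra min_cpu unit."""
    usage = sum(base_price * 2 ** (cpu / min_cpu - 1) * duration
                for cpu, duration in allocations)
    return usage + transfer_fee * len(allocations)


# Hypothetical run: 2 time units at 10 CPU units, then 1 time unit at 40.
allocs = [(10, 2.0), (40, 1.0)]
print(linear_cost(allocs, base_price=1.0, transfer_fee=0.5, min_cpu=10))
print(exponential_cost(allocs, base_price=1.0, transfer_fee=0.5, min_cpu=10))
```

Under either assumed model, frequent re-allocation adds transfer fees, which is why a provisioning scheme would want to avoid changing allocations too often.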

Page 25: Supporting Time-Critical Event Processing in Grids and Clouds


Problem Description

• Adaptive Applications

- Adaptive parameters

- Benefit

- Time constraint

• Cloud Computing Environment

- Resource budget

- Overprovisioning/Underprovisioning

• Goal

- Maximize the application benefit while satisfying the time constraints and resource budget

Page 26: Supporting Time-Critical Event Processing in Grids and Clouds


Contributions

•Dynamic Resource Provisioning Algorithm

- Based on multi-input-multi-output feedback control model

- Optimization to reduce provisioning overhead

•Adaptive and SOA Oriented Framework

- Support dynamic virtual CPU and memory allocation based on application requirements

Page 27: Supporting Time-Critical Event Processing in Grids and Clouds

Approach Overview

[Diagram: Dynamic Resource Provisioning (feedback control) and Resource Model (with optimization)]

• Resource Provisioning Controller

- Multi-input-multi-output (MIMO) feedback control model

- Modeling between adaptive parameters and performance metrics

- Control policy: reinforcement learning

• Resource Model

- Map change of parameters to change in CPU/memory allocations

- Optimization: avoid frequent resource changes

[Diagram labels: change to the adaptive parameters; change to CPU/memory allocations]

Page 28: Supporting Time-Critical Event Processing in Grids and Clouds

Resource Provisioning Controller

Performance Metrics

• Satisfy time constraints and resource budget

Multi-Input-Multi-Output Model

• Relationship between adaptive parameters and performance metrics

Control Policy

• Decide how to change values of the adaptive parameters

Page 29: Supporting Time-Critical Event Processing in Grids and Clouds


Control Model Formulation -- Performance Metrics

• Performance Metrics

- Processing progress: ratio between the currently obtained application benefit and the elapsed execution time

- Performance/cost ratio: ratio between the currently obtained application benefit and the cost of the resources that have been assigned

•Notation

- Application benefit obtained at time step k

- Elapsed execution time at time step k

- Resource cost at time step k
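The two metrics defined above are simple ratios, so they can be written down directly. The function names and the sample numbers below are hypothetical; the slide's own symbols were lost in extraction.

```python
def processing_progress(benefit, elapsed_time):
    """Benefit obtained so far divided by the elapsed execution time."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return benefit / elapsed_time


def performance_cost_ratio(benefit, resource_cost):
    """Benefit obtained so far divided by the cost of assigned resources."""
    if resource_cost <= 0:
        raise ValueError("resource_cost must be positive")
    return benefit / resource_cost


# Hypothetical observation at some time step k.
print(processing_progress(benefit=30.0, elapsed_time=60.0))
print(performance_cost_ratio(benefit=30.0, resource_cost=12.0))
```

A controller can track both values per time step: the first reflects whether the run is on pace for the time constraint, the second whether the budget is being spent effectively.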

Page 30: Supporting Time-Critical Event Processing in Grids and Clouds

Control Model Formulation -- Multi-Input-Multi-Output Model

• Auto-Regressive-Moving-Average with Exogenous Inputs (ARMAX)

- Second-order model

- is ith adaptive parameter at time step k

- are updated at the end of every interval

Previous observed performance metrics

Previous and current values of adaptive parameters
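As a rough illustration of a second-order multi-input-multi-output model, the sketch below predicts the performance metrics y(k) from the two previous metric vectors and from the current and previous parameter vectors: y(k) = A1 y(k-1) + A2 y(k-2) + B0 u(k) + B1 u(k-1). This exact form, the matrix names, and the omission of the moving-average noise term are assumptions, since the slide's equation was lost. The coefficient values are made up and would in practice be re-estimated at the end of every interval.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict(A1, A2, B0, B1, y_prev1, y_prev2, u_now, u_prev):
    """Predict y(k) from y(k-1), y(k-2), u(k), u(k-1) (noise term omitted)."""
    terms = [
        matvec(A1, y_prev1),
        matvec(A2, y_prev2),
        matvec(B0, u_now),
        matvec(B1, u_prev),
    ]
    return [sum(parts) for parts in zip(*terms)]


# Hypothetical coefficients for two metrics and two adaptive parameters.
A1 = [[0.5, 0.0], [0.0, 0.5]]
A2 = [[0.25, 0.0], [0.0, 0.25]]
B0 = [[1.0, 0.0], [0.0, 2.0]]
B1 = [[0.0, 0.0], [0.0, 0.0]]
y_next = predict(A1, A2, B0, B1,
                 y_prev1=[2.0, 4.0], y_prev2=[4.0, 8.0],
                 u_now=[1.0, 1.0], u_prev=[3.0, 3.0])
print(y_next)
```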

Page 31: Supporting Time-Critical Event Processing in Grids and Clouds

Control Model Formulation -- Control Policy

• : Maximize Application Benefit

- Reinforcement learning (Q-Learning)

- Reward function

• : Minimize Control Overhead( )

- Proportional-Integral (PI) controller

• Update Parameter Values

Action taken at time step k

Application benefit, subject to the time and resource budget constraints
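For readers unfamiliar with Q-learning, the following is the standard tabular update rule, shown as a sketch only. The state and action encodings, the learning rate `alpha`, the discount `gamma`, and the reward values are hypothetical; the slide's actual reward function (tied to application benefit under the time and budget constraints) is not recoverable from this transcript.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q value."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Hypothetical actions: how to change one adaptive parameter.
actions = ["decrease", "keep", "increase"]
q = defaultdict(float)  # unseen (state, action) pairs start at 0.0

first = q_update(q, "s0", "increase", reward=1.0,
                 next_state="s1", actions=actions)
q[("s1", "keep")] = 2.0  # pretend a later experience raised this value
second = q_update(q, "s0", "increase", reward=1.0,
                  next_state="s1", actions=actions)
print(first, second)
```

The action with the highest Q value in the current state would then be chosen as the parameter change at time step k, typically with some exploration.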

Page 32: Supporting Time-Critical Event Processing in Grids and Clouds


Resource Model

• Offline Training

• Collect Data Points:

• Learn the Relationship Between the Values of the Parameters and CPU/memory Usage

• Model Optimization

- Avoid frequent change to CPU/memory allocations due to resource cost

- Balance global CPU/memory among multiple services

Page 33: Supporting Time-Critical Event Processing in Grids and Clouds

Framework Design

[Diagram: framework components: Application; Application Controller; Service Deployment; Service Wrapper; Resource Provisioning Controller; Resource Model; Model Optimizer; Performance Manager; Priority Assignment; Status Query; Performance Analysis; Virtualization Management (Eucalyptus, OpenNebula, ...); Xen Hypervisors, each hosting multiple VMs]

Page 34: Supporting Time-Critical Event Processing in Grids and Clouds


Experiments Setup

• Schemes Compared

- Work-conserving

- Static scheduling

• Metrics

- Benefit Percentage

- Resource Cost

• Emulated Cloud Environment

- Xen 3.0

- ,

- ,

Page 35: Supporting Time-Critical Event Processing in Grids and Clouds

Resource Model Validation: Hardware Heterogeneity

• Our model predicts CPU cycle and memory usage within 3% of the actual resource usage

• Model trained on homogeneous hardware (M1) and on heterogeneous hardware (M2 and M3)

Page 36: Supporting Time-Critical Event Processing in Grids and Clouds


Performance of Dynamic Resource Provisioning Algorithm

• Considered both linear and exponential pricing models

• In the linear pricing model, our approach performs 24% worse than Work Conserving

Page 37: Supporting Time-Critical Event Processing in Grids and Clouds


Performance of Dynamic Resource Provisioning Algorithm

• Work Conserving costs 66% more than our approach does

Page 38: Supporting Time-Critical Event Processing in Grids and Clouds


Resource Provisioning Overhead

• Optimal Execution: ideal resource configurations

• Our approach performs 4%, 2%, 2%, 1% and 0.8% worse than the Optimal Execution

Page 39: Supporting Time-Critical Event Processing in Grids and Clouds

Roadmap

• Motivation and Introduction

• Parameter Adaptation in the Grid Environment

• Budget Constrained Resource Provisioning

• Power-aware Consolidation of Workflows

- Opportunities for consolidation

- Workload analysis

- Consolidation algorithm

- Experimental Evaluation

• Future Work

• Conclusion

Page 40: Supporting Time-Critical Event Processing in Grids and Clouds


Motivation

• Another Critical Issue in Cloud Environment: Power Management

- HPC servers consume a lot of energy

- Significant adverse impact on the environment

• To Reduce Resource and Energy Costs

- Server consolidation

- Minimize the total power consumption and resource costs without a substantial degradation in performance

Page 41: Supporting Time-Critical Event Processing in Grids and Clouds


Problem Description

• Our Target Applications

- Workflows with DAG structure

- Multiple processing stages

- Opportunities for consolidation

• Research Problems

- Combine parameter adaptation, budget constraints and resource allocation with consolidation and power optimization

- Challenge: consolidation without parameter adaptation

- Support power-aware parameter adaptation -- future work

Page 42: Supporting Time-Critical Event Processing in Grids and Clouds


Contributions

•A power-aware consolidation framework, pSciMapper, based on hierarchical clustering and an optimization search method

•pSciMapper is able to reduce the total power consumption by up to 56% with at most a 15% slowdown for the workflow

•pSciMapper incurs low overhead and is thus suitable for large-scale scientific workflows

Page 43: Supporting Time-Critical Event Processing in Grids and Clouds


Opportunities for Consolidation: GLFS

• GLFS nowcasts and forecasts meteorological information for Lake Erie

• GLFS is compute-intensive

• Individual tasks could incur low resource usage

Page 44: Supporting Time-Critical Event Processing in Grids and Clouds

Resource Usage of GLFS Task1

[Figure panels: <1000, 6, 600>; <500, 3, 600>; <2000, 12, 1200>; <1000, 6, 600>]

Page 45: Supporting Time-Critical Event Processing in Grids and Clouds


Resource Usage of GLFS Task2

Page 46: Supporting Time-Critical Event Processing in Grids and Clouds


Observations

•Periodic Behavior w.r.t. CPU, memory, disk, and network usage: Time Series

•Average Resource Usage is Significantly Smaller than its Peak Value

•Dependent on the Values of the Application Parameters and the Characteristics of the Host Server

Page 47: Supporting Time-Critical Event Processing in Grids and Clouds


Power Consumption Analysis

•Resource Usage Activity

- CPU, memory, disk and network

•Server Consolidation

- Virtualization

- Interference of consolidated workloads

Page 48: Supporting Time-Critical Event Processing in Grids and Clouds


Power Consumption Analysis: Resource Usage

• All resource activities impact power consumption

• Variation in the CPU utilization has the largest impact

• Memory footprint and cache activities also impact the consumed power

Workload        CPU    Memory   Disk      Network
CPU-bound       Vary   2%       None      None
Memory-bound    70%    Vary     None      None
Disk-bound      50%    2%       Vary      None
Network-bound   50%    2%       18 MB/s   Vary

Page 49: Supporting Time-Critical Event Processing in Grids and Clouds


Power Consumption Analysis: Virtualization

• Virtualization incurs very low power overhead

• Contention of CPU cycles

- Dynamic CPU provisioning saves power

Page 50: Supporting Time-Critical Event Processing in Grids and Clouds


Power Consumption Analysis: Interference

• Consolidating dissimilar workloads incurs a small slowdown in the execution time and large savings in power and resource costs

• Consolidating workloads with similar resource requirements significantly increases the execution time

Page 51: Supporting Time-Critical Event Processing in Grids and Clouds

The pSciMapper Framework Design

[Diagram: Offline Analysis: Scientific Workflows, Resource Usage Generation (time series), Temporal Feature Extraction (temporal signatures), Feature Reduction and Modeling (model), Knowledge base. Online Consolidation: Scientific Workflows, Hierarchical Clustering, Optimization Search Algorithm, Time Varying Resource Provisioning, Consolidated Workloads]

Page 52: Supporting Time-Critical Event Processing in Grids and Clouds

Temporal Feature Extraction

•Relate Resource Usage to Power Consumption

•Temporal Signature

- Peak value: max value of the time series

- Relative variance: normalized sample variance

- Pattern: a sequence of samples to represent the pattern

•Notation
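The three parts of the temporal signature described above can be computed from a usage time series roughly as follows. This is a sketch under assumptions: "normalized sample variance" is taken here to mean the sample variance divided by the squared mean, and the "pattern" is taken to be an evenly spaced subsample of fixed length. Both choices, and the name `temporal_signature`, are hypothetical and may not match the original definition.

```python
from statistics import mean, variance


def temporal_signature(series, pattern_len=4):
    """Summarize a resource usage time series (needs at least 2 samples)."""
    mu = mean(series)
    # Assumption: normalize the sample variance by the squared mean.
    rel_var = variance(series) / (mu * mu) if mu else 0.0
    # Assumption: the pattern is an evenly spaced subsample of the series.
    step = max(1, len(series) // pattern_len)
    pattern = series[::step][:pattern_len]
    return {
        "peak": max(series),
        "relative_variance": rel_var,
        "pattern": pattern,
    }


# Hypothetical CPU utilization samples (percent) with periodic behavior.
cpu = [10, 20, 30, 40, 10, 20, 30, 40]
sig = temporal_signature(cpu)
print(sig)
```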

Page 53: Supporting Time-Critical Event Processing in Grids and Clouds


Kernel Canonical Correlation Analysis (KCCA)

• 52 Features from Temporal Signature

- 12 features each for CPU, memory, disk and network

- 4 features representing the host capacity

• Resource-time and Resource-power Relationships

Page 54: Supporting Time-Critical Event Processing in Grids and Clouds


Power-aware Consolidation

• Distance Metric

• Algorithm

- Initial one-to-one assignment

- Generate resource usage time series (HMM)

- Merge clusters

- Optimal assignment (Nelder-Mead algorithm)

- Dynamic CPU provisioning

- distance between task i and j

- interference of consuming resource R1 and R2 together

- Pearson’s correlation between two workloads w.r.t. the resource usage of R1 (10 pairs in total)
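The distance formula itself is missing from this transcript, but one of its ingredients, Pearson's correlation between the usage time series of two workloads for the same resource, is standard and can be sketched directly. The function name `pearson` and the sample series are hypothetical, and how the correlation is combined with the interference term into the final distance is not shown because it cannot be recovered here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series; 0.0 if one is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


# Hypothetical CPU usage of three tasks over four time steps.
task_a = [1, 2, 3, 4]
task_b = [2, 4, 6, 8]   # peaks together with task_a
task_c = [8, 6, 4, 2]   # peaks when task_a is idle
print(pearson(task_a, task_b), pearson(task_a, task_c))
```

Intuitively, two tasks whose usage of the same resource is strongly positively correlated peak at the same time, so co-locating them is more likely to cause contention than co-locating tasks with opposite patterns.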

Page 55: Supporting Time-Critical Event Processing in Grids and Clouds

Example

Cluster   CPU        Mem        Disk   Net
C1        moderate   low        low    low
C2        moderate   low        low    moderate
C3        moderate   high       high   low
C4        high       moderate   low    low
C5        low        low        high   moderate

Level     Assignment                                      <power, time>
Level 1   {C1,S2}, {C2,S3}, {C3,S5}, {C4,S1}, {C5,S4}     <180.56, 92.87>
Level 2   {(C1, C2), S2}, {C3,S5}, {(C4,C5), S1}          <135.11, 88.03>
Level 3   {(C1, C2, C3),S2}, {(C4,C5), S1}                <93.62, 83.93>
Level 4   X

Page 56: Supporting Time-Critical Event Processing in Grids and Clouds

Experiments Setup

• Algorithms Compared

- Without Consolidation

- Optimal + Work Conserving

- pSciMapper + Static Allocation

- pSciMapper + Dynamic Provisioning

• Metrics

- Normalized total power consumption

- Execution time

• Emulated Cloud Environment

- Xen 3.0

- GridSim: a grid environment simulator

- CloudSim: a cloud environment simulator

• Power Modeling

Page 57: Supporting Time-Critical Event Processing in Grids and Clouds


Applications

• Two Real-world Workflows

- GLFS and VR

• Three Synthetic Workflows

Application   CPU        Memory     Disk       Network
GLFS          High       Moderate   Moderate   None
VR            Moderate   High       Moderate   Moderate
SynApp1       Low        Low        High       High
SynApp2       Moderate   High       Moderate   Low
SynApp3       High       Moderate   Low        Low

Page 58: Supporting Time-Critical Event Processing in Grids and Clouds


Normalized Total Power Consumption Comparison: GLFS

• Four different combinations of application parameters

• Total power is saved up to 27% by Optimal and pSciMapper + Dynamic Provisioning is able to save up to 35%

Page 59: Supporting Time-Critical Event Processing in Grids and Clouds


Normalized Total Power Consumption Comparison: VR and Synthetic Workflows

• In VR, total power is saved up to 58% by Optimal. pSciMapper + Dynamic Provisioning is 8% worse

Page 60: Supporting Time-Critical Event Processing in Grids and Clouds


Execution Time Comparison: GLFS

• Optimal stops when performance degradation is 15%

• pSciMapper + Dynamic Provisioning performs 12% worse compared to Without Consolidation

Page 61: Supporting Time-Critical Event Processing in Grids and Clouds


Execution Time Comparison: VR and Synthetic Workflows

Page 62: Supporting Time-Critical Event Processing in Grids and Clouds


Scheduling Overhead and Scalability

• The overhead caused by pSciMapper + Dynamic Provisioning is much smaller than Optimal

• pSciMapper is suitable for large-scale scientific workflows

Page 63: Supporting Time-Critical Event Processing in Grids and Clouds


Roadmap

•Motivation and Introduction

•Parameter Adaptation in the Grid Environment

•Budget Constrained Resource Provisioning

•Power-aware Consolidation of Workflows

•Future Work

•Conclusion

Page 64: Supporting Time-Critical Event Processing in Grids and Clouds


Schedule Service Components in Parallel

•Service Components can be Parallelized

- One-to-many mapping to processing nodes

•Degree of Parallelism

- Adaptive parameters

•How Does Degree of Parallelism Impact Parameter Adaptation

•How to Schedule Multiple Instances of Certain Service Components

Page 65: Supporting Time-Critical Event Processing in Grids and Clouds


Power-aware Adaptation

•Adaptive Parameters Impact Application QoS and Execution Time

•Different Resource Usage Leads to Different Levels of Power Consumption

•Co-hosting Service Components Incurs Performance Interference

•How can we achieve the required application quality with the minimum power consumption?

Page 66: Supporting Time-Critical Event Processing in Grids and Clouds


Performance Modeling

•Detailed Performance Analysis

- Feedback to the application user

- Identify the performance bottleneck

•Help Understand the Application Behavior that is Dependent on Adaptive Parameters

•How to Determine the Factors that Limit Application Performance Accurately

Page 67: Supporting Time-Critical Event Processing in Grids and Clouds


Large-Scale Optimization

• Peta-scale Applications

- Scientific computing

• A Large Number of Parameters

- Continuous and discrete

- Correlation

- Unstructured vs. structured search space

• How to Efficiently Explore the Large Parameter Space

Page 68: Supporting Time-Critical Event Processing in Grids and Clouds


Roadmap

•Motivation and Introduction

•Parameter Adaptation in the Grid Environment

•Budget Constrained Resource Provisioning

•Power-aware Consolidation of Workflows

•Future Work

•Conclusion

Page 69: Supporting Time-Critical Event Processing in Grids and Clouds


Conclusion

• An Autonomic Adaptation Algorithm and an Adaptive Middleware

• In Grid Computing Environment

- An efficient resource allocation approach

- An effective fault tolerance scheme

• In Cloud Computing Environment

- A dynamic resource provisioning framework

- pSciMapper: power-aware consolidation framework

Goal: Maximize the benefit (QoS) of adaptive applications while satisfying the pre-specified time constraints

Page 70: Supporting Time-Critical Event Processing in Grids and Clouds


Thank You!

Page 71: Supporting Time-Critical Event Processing in Grids and Clouds


Related Work: Parameter Adaptation

•Autonomic Adaptation

- Lim et al. (CCNC06), Valetto et al. (ICAC05), Ruth et al. (ICAC06)

•Autonomic Computing Middleware

- AutoMate(Vanderbilt), Q-Fabric (Georgia Tech.)

•Reinforcement Learning in Autonomic Computing

- Tesauro et al. (ICAC06)

Page 72: Supporting Time-Critical Event Processing in Grids and Clouds


Related Work: Resource Allocation

•Resource Allocation in Grid Computing

- Singh et al. (HPDC07), Huang et al. (SC07)

- Xu et al. (ICAC07)

•Real-Time Scheduling

- Survey: Sha et al. (Real-time Systems 04)

- Gopalan et al. (MMCN02), Ghosh et al. (Cluster06)

Page 73: Supporting Time-Critical Event Processing in Grids and Clouds

Related Work: Resource Provisioning

•Cloud Computing Systems

- Amazon EC2, Google AppEngine, Microsoft Azure, Eucalyptus (UCSD)

•Virtualized Resource Scheduling

- Diao et al. (ACC02), Padala et al. (EuroSys07,09)

•Scheduling with Budget Constraints

- Garg et al. (ACSC09), Sakellariou et al. (GRID07)

Page 74: Supporting Time-Critical Event Processing in Grids and Clouds

Related Work: Power-aware Consolidation

• Scientific Workflow Scheduling

- Pegasus (USC), Kepler (UCSB), ASKALON (Innsbruck)

• Power Management

- Dynamic Voltage and Frequency Scaling (DVFS)

• Wang et al. (HPCA08), Govindan et al. (EuroSys 09), Laszewski et al. (Cluster09)

- Consolidation

• Srikantaiah et al. (HotPower08), Verma et al. (USENIX09)