
Algorithms and Scheduling Techniques to Manage Resilience and

Power Consumption in Distributed Systems

Andrei Tchernykh

CICESE Research Center,

Ensenada, Baja California, México

[email protected]

http://usuario.cicese.mx/~chernykh/

Dagstuhl – July 7, 2015

From Static Scheduling Towards Understanding Uncertainty

Baja California, México

Ensenada, Baja California, México

Sonora

Yaqui deer dancer

Research Areas

HPC

Cloud Computing

Scheduling

Resource optimization

online offline

Real Time Systems

Grid Computing

Knowledge Free

Scheduling with

Uncertainty

Multiobjective

Optimization

Computational

Intelligence

List Scheduling Stealing

Scheduling with

Service Levels

Approximation

Algorithms

Workflow Scheduling

Collaboration

Mexico

USA

Uruguay

Spain

France

Russia

Luxembourg

Germany

Universidad Autónoma de Baja California

Universidad Autónoma de Nuevo León

Tecnológico de Monterrey

Instituto Tecnológico de Morelia

Centro de Estudios Superiores del Estado de Sonora

Dortmund University: Prof. Uwe Schwiegelshohn

University of Göttingen: Prof. Ramin Yahyapour

Institute for System Programming, RAS: Prof. Arutyun Avetisyan, Prof. Nikolay Kuzurin

Moscow Institute of Physics and Technology: Prof. Alexander Drozdov

Institute of Informatics and Applied Mathematics of Grenoble: Prof. Denis Trystram

INRIA Lille - Nord Europe: Prof. El-ghazali Talbi

University of Notre Dame: Dr. Jarek Nabrzyski

University of California, Irvine, CA, USA: Prof. Isaac Scherson, Prof. Jean Luc Gaudiot

University of Luxembourg: Prof. Pascal Bouvry, Dr. Dzmitry Kliazovich

Universidad de la República: Dr. Sergio Nesmachnow

BSC: Prof. Vassil Alexandrov

http://www.uci.edu/

http://www.cs.sunysb.edu/

CICESE Parallel Computing Laboratory

Team


Towards Understanding Uncertainty in

Cloud Computing Resource

Provisioning

Andrei Tchernykh CICESE Research Center, Mexico

Uwe Schwiegelshohn University of Dortmund, Germany

El-ghazali Talbi University of Lille, France

Vassil Alexandrov Barcelona Supercomputing Centre, Spain

ICCS-SPU 2015. Procedia Computer Science, Elsevier, 2015


Uncertainty

Uncertainty can be classified in several ways according to its nature:

1. Long-term uncertainty arises because the object is poorly understood and inadvertent factors can influence its behavior.

2. Retrospective uncertainty is due to the lack of information about the behavior of the object in the past.

3. Technical uncertainty is a consequence of the impossibility of predicting the exact results of decisions.

4. Stochastic uncertainty is a result of the probabilistic (stochastic) nature of the studied processes and phenomena. Possible cases:

• reliable statistical information is available;

• statistical information is not available;

• the hypothesis on the stochastic nature requires verification.

Tychinsky 2006


Uncertainty

5. Constraint uncertainty: partial or complete ignorance of the conditions.

6. Participant uncertainty: conflict between the main stakeholders (cloud providers, users and administrators).

• each has its own preferences and incomplete, inaccurate information about the motives and behavior of the opposing parties.

7. Goal uncertainty

• inability to select one goal

• conflicts in building a multi-objective optimization model

• competing interests

8. Condition uncertainty occurs when there is a failure or a complete lack of information about the conditions under which decisions are made.


Uncertainty

9. Action uncertainty occurs when the choice of a solution is ambiguous.

• In the single objective case,

o the task is to determine the best solution among all feasible ones.

• In the multiple objective case,

o there exists a (possibly infinite) number of Pareto optimal solutions;

o the problem is to find a good element of this set.
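To make the multiple objective case concrete, the following is a minimal sketch of how the set of Pareto optimal solutions can be filtered from a finite list of candidates. It assumes two objectives that are both minimized; the function names and the sample points are illustrative and do not come from the slides.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both to be minimized


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective and better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate solutions.
candidates = [(1.0, 9.0), (2.0, 7.0), (3.0, 8.0), (4.0, 4.0), (6.0, 5.0), (7.0, 2.0)]
front = pareto_front(candidates)
print(front)
```

Every point left in `front` is a valid trade-off, so picking "a good element of this set" still needs an extra preference rule, for example a weighted sum or a constraint on one objective.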


Uncertainty

Uncertainties can be grouped into:

Parameter (parametric) uncertainties

1. arise from incomplete knowledge and variation of the parameters

2. are estimated using statistical techniques

System uncertainties

1. arise from an incomplete understanding of the processes that control service provisioning

2. arise from incomplete information about a system


Services and resources

are subject to considerable uncertainty during provisioning.

Uncertainty brings additional challenges to

• End-users

• Resource providers

• Brokering

It requires

• waiving habitual computing paradigms

• adapting current computing models

• designing novel resource management strategies to handle

uncertainty in an effective way

Uncertainty in Clouds

The question is:

How can we deliver scalable and robust cloud behavior under uncertainty and specific constraints, such as budgets, QoS, SLA and energy costs?


• dynamic elasticity

• dynamic performance changes

• virtualization, loose coupling of applications to the infrastructure

• variation in resource provisioning time

• inaccuracy of application runtime estimates, variation of processing times

• variation in data transmission, variable data streams

• release time and workload uncertainty

• effective bandwidth variation

and other phenomena.

Sources of uncertainty

• workload is not predictable and can change dramatically

• performance can change due to sharing of common resources with other VMs


Providers might not know the

• Quantity of transmitted data

• Amount of computation

Example:

Every time a user requests the status of an e-mail or bank account, the request can

• generate a different amount of data and

• take a different time to deliver.



It is impossible to get exact knowledge about the system.

Parameters such as

• effective processor speed,

• number of available processors,

• actual bandwidth

change over time.

Topology is unknown

In general, an execution environment will differ for each

program/service invocation.




Source of uncertainty

[Table: sources of uncertainty (columns) against cloud computing parameters (rows); the column position of each mark (●) is not recoverable.]

Sources of uncertainty: Data (volume, variety, value); Virtualization; Jobs arrival; Migration; Energy minimization; Fault tolerance; Scalability; Cost (dynamic pricing); Resource availability; Elasticity; Consolidation; Communication; Replication; Cloud infrastructure; Resource provisioning time

Cloud computing parameters, with the number of sources marked for each:

Effective performance: 12

Effective bandwidth: 12

Processing time: 12

Available memory: 12

Number of processors: 10

Available storage: 10

Data transfer time: 6

Resource capacity: 8

Network capacity: 7


Approaches

To treat uncertainty and dynamism we need sophisticated solutions.

• Fuzzy,

• Robust,

• Non-clairvoyant

• Knowledge-free

• Stochastic

• Randomized algorithms

• Dynamic priority

• Adaptive strategies (reactive)

• Dynamic load balancing

Preliminary results

Scheduling for Cloud Computing with

Different Service Levels

IPDPS 2012, IEEE 26th International Parallel and Distributed Processing Symposium



Quality of Service


Service Level (slack factor): response time in relation to the requested processing time

Deadline: determined by the service level (slack factor) and the execution time

Profit: price per time unit

Competitive Factor = Obtained Income / Optimal Income
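The quantities on this slide can be illustrated with a small sketch. It assumes that the deadline of a job is its release time plus the slack factor times its requested processing time, and that the income of a job is its price per time unit times its processing time. Both assumptions, the class, and the sample jobs are illustrative and are not confirmed by the slides.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Job:
    release: float      # submission time
    processing: float   # requested processing time
    slack: float        # slack factor of the chosen service level (assumed >= 1)
    price: float        # price per time unit of the chosen service level


def deadline(job: Job) -> float:
    """Assumed deadline model: release time + slack factor * processing time."""
    return job.release + job.slack * job.processing


def income(jobs: Iterable[Job]) -> float:
    """Assumed income model: price per time unit * processing time, summed over jobs."""
    return sum(j.price * j.processing for j in jobs)


def competitive_factor(accepted: Iterable[Job], optimal: Iterable[Job]) -> float:
    """Obtained income divided by optimal income."""
    return income(accepted) / income(optimal)


# Hypothetical jobs: an online algorithm accepts only `a`, the optimum serves both.
a = Job(release=0.0, processing=4.0, slack=2.0, price=1.0)
b = Job(release=1.0, processing=2.0, slack=1.5, price=3.0)
factor = competitive_factor([a], [a, b])
print(deadline(a), deadline(b), factor)
```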


Competitive Factor

SSL-SM: $\rho \le 1 - \left(1 - \frac{p_{min}}{p_{max}}\right)\frac{1}{f}$

SSL-MM: $\rho \le \frac{f}{1 + f\left(1 - \frac{p_{min}}{p_{max}}\right)}$

Das Gupta and Palis, 2001

Schwiegelshohn,Tchernykh 2012
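As a quick numeric illustration, the sketch below evaluates the SSL-SM bound, read here as ρ ≤ 1 − (1 − p_min/p_max)/f. The function name, the input checks and the sample values are assumptions made for this example only.

```python
def ssl_sm_bound(p_min: float, p_max: float, f: float) -> float:
    """Evaluate 1 - (1 - p_min / p_max) / f for a single service level, single machine."""
    if not 0 < p_min <= p_max:
        raise ValueError("expected 0 < p_min <= p_max")
    if f < 1:
        raise ValueError("expected a slack factor f >= 1 (assumption)")
    return 1.0 - (1.0 - p_min / p_max) / f


# Equal processing times give a bound of 1; a wider spread lowers it.
equal_jobs = ssl_sm_bound(p_min=1.0, p_max=1.0, f=2.0)
wide_spread = ssl_sm_bound(p_min=1.0, p_max=10.0, f=2.0)
large_slack = ssl_sm_bound(p_min=1.0, p_max=10.0, f=9.0)
print(equal_jobs, wide_spread, large_slack)
```

Under this reading, a larger slack factor or a narrower spread of processing times moves the bound towards 1.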

Competitive Factor


𝝆 ≤ 𝒎𝒂𝒙{

𝒑𝒎𝒊𝒏𝒑𝒎𝒂𝒙

𝒇𝑰 − 𝟏,𝒇𝑰 − 𝟏 +

𝒑𝒎𝒊𝒏𝒑𝒎𝒂𝒙

𝒇𝑰 − 𝟏 +𝒖𝑰𝒖𝑰𝑰

MSL-SM

MSL-MM: $\rho \le \frac{u_{II}}{u_{I}}\left(1 - \frac{1}{f_{I}}\right)$

Schwiegelshohn,Tchernykh 2012

On-line Scheduling in Distributed Systems

Multiple strip packing

Job Stealing

non-clairvoyant



Ramin Yahyapour University of Göttingen, Germany

IEEE IPDPS 2008

Any machine applies a priority order when selecting jobs for execution:

Jobs of its group A

Jobs of its group B

Jobs that are enabled for execution on its previous machine.


Grid Scheduling Algorithm

Uwe Schwiegelshohn

• Theoretical evaluation

– Cmax(LIST)/Cmax* < 3 in the offline case

– Cmax(LIST)/Cmax* < 5 in the online case
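The ratio Cmax(LIST)/Cmax* can be illustrated on a much simpler setting than the Grid model of the talk. The sketch below applies classic list scheduling to sequential jobs on identical processors and compares the makespan with a standard lower bound on the optimum. It is a simplified stand-in, not the Grid algorithm from the slides, and the job list is hypothetical.

```python
import heapq
from typing import List


def list_schedule(processing_times: List[float], m: int) -> float:
    """Assign each job, in list order, to the processor that becomes free first."""
    loads = [0.0] * m
    heapq.heapify(loads)
    for p in processing_times:
        earliest = heapq.heappop(loads)      # processor that is free first
        heapq.heappush(loads, earliest + p)  # it now finishes p time units later
    return max(loads)


def makespan_lower_bound(processing_times: List[float], m: int) -> float:
    """The optimum is at least the average load and at least the longest job."""
    return max(sum(processing_times) / m, max(processing_times))


jobs = [2, 3, 4, 6, 2, 2]  # hypothetical processing times
cmax = list_schedule(jobs, m=3)
lower = makespan_lower_bound(jobs, m=3)
print(cmax, lower, cmax / lower)
```

Dividing by the lower bound gives a value that is at least Cmax(LIST)/Cmax*, so it is a safe, if pessimistic, estimate of the ratio on a given instance.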


Performance of the Algorithm

IEEE IPDPS, 2008

(Klaus Jansen, Denis Trystram et al. …) 5/2-, 7/3-, (2 + ε)-, 2-approximations

Improved by …

On-line Scheduling in Distributed Systems

Multiple strip packing

Adaptive Admissible Allocation

Future Generation Computer Systems 2012

Journal of Scheduling, 2010

Andrei Tchernykh CICESE Research Center

José Luis González-García Mexico

Vanessa Miranda-López

Uwe Schwiegelshohn University of Dortmund

Germany

Ramin Yahyapour University of Göttingen

Germany


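[Figure: machines m_1, m_2, m_3, m_4, m_5, …, m_m. For a job J_j with first(J_j) = 2, the M-available set has last(J_j) = m and the M-admissible set has last(J_j) = 5.]

last(J_j) is the minimum r such that

$\sum_{i=first(J_j)}^{r} m_i \ge a \sum_{i=first(J_j)}^{m} m_i$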

Allocation
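A minimal sketch of the admissible range rule, assuming machines are indexed 1..m with sizes m_i and that last(J_j) is the smallest r for which the processors of machines first(J_j)..r make up at least a fraction a of the processors of machines first(J_j)..m. The function name and the machine sizes are illustrative and are not taken from the slides.

```python
from typing import List


def last_admissible(machine_sizes: List[int], first: int, a: float) -> int:
    """Return the smallest 1-based index r with sum(m_first..m_r) >= a * sum(m_first..m_m)."""
    if not 1 <= first <= len(machine_sizes):
        raise ValueError("first must be a valid 1-based machine index")
    threshold = a * sum(machine_sizes[first - 1:])
    running = 0
    for r in range(first, len(machine_sizes) + 1):
        running += machine_sizes[r - 1]
        if running >= threshold:
            return r
    return len(machine_sizes)


# Hypothetical Grid with 7 machines, sizes in non-decreasing order.
sizes = [1, 2, 4, 4, 8, 16, 32]
admissible_last = last_admissible(sizes, first=2, a=0.25)  # restricted range
available_last = last_admissible(sizes, first=2, a=1.0)    # all available machines
print(admissible_last, available_last)
```

With a = 1 the admissible set coincides with the available set, and smaller values of a keep a job on the smaller machines that can run it.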


For a set of machines with identical processors, and for a set of rigid jobs with admissible range a between 0 and 1

Adaptive optimization

Approximation factor (off-line), Min_LB-a + Best_PS: piecewise in a, with branches $1 + \frac{2}{a^2}$ and $1 + \frac{2}{a(1-a)}$ [branch conditions not recoverable]

Competitive factor (on-line), Min_LB-a + Best_PS: piecewise in a, with branches $3 + \frac{2}{a^2}$ and $3 + \frac{2}{a(1-a)}$ [branch conditions not recoverable]

Tchernykh et al. 2012, Future Generation Computer Systems, Elsevier

Tchernykh et al. 2010, Journal of Scheduling, Springer

List Scheduling in the Grid

[Figure: schedule over time on machines with different numbers of processors, a = 1; Cmax(LIST) = 4]

Admissible Allocation in Grid

[Figure: schedule over time on machines with different numbers of processors, a = 0.5 compared with a = 1; Cmax(LIST) = 2]

Theoretical Evaluation


Grid scheduling

On-line

• Non-clairvoyant

• Different machine sizes

Off-line

• Non-clairvoyant

• Clairvoyant

• Equal machine sizes

• Different machine sizes

Approximation results:

• (Schwiegelshohn et al. 2008) 3-approximation

• (Pascual et al. 2008) 4-approximation

• (Klaus Jansen, Denis Trystram) 5/2-, 7/3-, (2 + ε)-, 2-approximation

• (Zhuk et al. 2004) 10-approximation

• (Tchernykh et al. 2005) 10-approximation

• (Tchernykh et al. 2012) 3-approximation

Competitive results:

• (Tchernykh et al. 2008) 5-competitive

• (Tchernykh et al. 2010) 17-competitive

• (Schwiegelshohn 2010) (2e+1)-competitive

• (Tchernykh et al. 2012) 5-competitive

• Future Generation Computer Systems, Elsevier

• Journal of Scheduling, Springer

• Discrete Applied Mathematics, Elsevier

• Tran Fund Elec, Comm. & Comp. Science, IEICE

• Parallel and Distributed Processing, IEEE

• Computers & Industrial Engineering, Elsevier

Job Allocation Strategies with User Run

Time Estimates

Journal of Grid Computing , Springer, 2011

Juan Manuel Ramírez


José Luis González Mexico

Adán Hirales-Carbajal

Uwe Schwiegelshohn University of Dortmund, Germany


Germany

Multiple Workflow Scheduling

Strategies

with User Run Time Estimates

on a Grid


Adán Hirales-Carbajal


José Luis González Mexico

Juan Manuel Ramírez

Thomas Röblitz University of Dortmund, Germany


Germany

Adaptive Resource Allocation

in Computational Grids with

Runtime Uncertainty


Raul Ramírez-Velarde Tecnológico de Monterrey

Carlos Barba-Jimenez México

Juan Nolazco

Adán Hirales-Carbajal CETYS University, Mexico

The model uses the notions of

• heavy tails

• self-similarity

for the predictability of run-time estimates

Energy-Aware Online Scheduling:

Ensuring Quality of Service for

IaaS Clouds


Luz Lozano


Johnatan Pecero University of Luxembourg, Luxembourg

Pascal Bouvry

Sergio Nesmachnow Universidad de la República, Uruguay

Alexander Yu. Drozdov Moscow Institute of Physics and Technology


Solution space, Pareto optimal solutions


[Figure: solution space with Pareto optimal solutions; axes: income degradation and power consumption degradation]

Adaptive energy efficient scheduling

in Peer-to-Peer desktop grids

Knowledge Free Scheduling

Future Generation Computer Systems. 2013


Aritz Barrondo

Johnatan E. Pecero University of Luxembourg, Luxembourg

Elisa Schaeffer Universidad Autónoma de Nuevo León, Mexico


Work Queue with Replication (WQR)

Time

Resources

Time+Resources

OurGrid, BOINC

SETI@home, folding@home, Rosetta@home

Einstein@home,

+50 projects

A VoIP Service for Cloud

Infrastructure


Jorge Mario Cortez

Johnatan E. Pecero

Pascal Bouvry University of Luxembourg, Luxembourg

Ana-Maria Simionovici

Dzmitry Kliazovich

Loic Didelot MIXvoip S.A. Luxembourg

Denis Trystram Grenoble Institute of Technology, France

Problem


Two objectives:

- Provider cost optimization

- Voice Quality

Bin-packing approach (well-known)

• one-dimensional, on-line

• classic NP-hard optimization problem

The principal novelty

• state of the bin is determined not only

by actions of the decision maker during

item allocations,

• but also by item completions after their

lifespan.

Unlike in the standard formulation,

• bins are always open

• dynamic

• items in bins can be terminated (call termination)

• utilization can change
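The difference from standard bin packing can be sketched as follows: bins never close, and an item leaves its bin when the call terminates, which frees capacity for later calls. The sketch uses a first-fit rule as a stand-in; the slides do not state which allocation rule is used, and the class name, capacity and call sizes are hypothetical.

```python
from typing import Dict, Hashable, List


class DynamicFirstFit:
    """One-dimensional on-line packing where bins stay open and items can depart."""

    def __init__(self, capacity: float) -> None:
        self.capacity = capacity
        self.bins: List[Dict[Hashable, float]] = []  # per bin: call id -> size
        self.where: Dict[Hashable, int] = {}         # call id -> bin index

    def start_call(self, call_id: Hashable, size: float) -> int:
        """Place the call in the first bin with enough free capacity, else open a new bin."""
        for index, contents in enumerate(self.bins):
            if sum(contents.values()) + size <= self.capacity:
                break
        else:
            self.bins.append({})
            index = len(self.bins) - 1
        self.bins[index][call_id] = size
        self.where[call_id] = index
        return index

    def end_call(self, call_id: Hashable) -> None:
        """Call termination: the item leaves and its bin regains the capacity."""
        index = self.where.pop(call_id)
        del self.bins[index][call_id]

    def utilization(self) -> List[float]:
        return [sum(contents.values()) / self.capacity for contents in self.bins]


packer = DynamicFirstFit(capacity=10)
placements = [packer.start_call("a", 6), packer.start_call("b", 5), packer.start_call("c", 4)]
packer.end_call("a")                     # call "a" terminates, bin 0 drops to load 4
reused = packer.start_call("d", 6)       # bin 0 is still open and can take the new call
print(placements, reused, packer.utilization())
```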

Cloud Infrastructure Cost

Optimization:

to buy or to lease

Uwe Schwiegelshohn University of Dortmund

Stephan Schlagkamp Germany


Fermin Armenta Mexico

Cloud Provider Cost

Our objective is to:

• avoid overprovisioning

• find the resource capacity of the private cloud

• minimize total investment and leasing costs with respect to the

demand forecast
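The buy-or-lease trade-off can be sketched with a deliberately simple cost model. It assumes a fixed cost per owned resource unit per period, a higher cost per leased unit per period, and a known demand forecast. These assumptions, the function names and all numbers are hypothetical and are not the model from the talk.

```python
from typing import List


def total_cost(capacity: float, demand: List[float], own_cost: float, lease_cost: float) -> float:
    """Investment for owned capacity in every period plus leasing for demand above it."""
    investment = own_cost * capacity * len(demand)
    leasing = lease_cost * sum(max(0, d - capacity) for d in demand)
    return investment + leasing


def best_capacity(demand: List[float], own_cost: float, lease_cost: float) -> float:
    """The cost is piecewise linear and convex in capacity, so checking the demand levels suffices."""
    candidates = sorted(set(demand) | {0})
    return min(candidates, key=lambda c: total_cost(c, demand, own_cost, lease_cost))


forecast = [4, 6, 10, 6, 4, 20]  # hypothetical demand per period
capacity = best_capacity(forecast, own_cost=1.0, lease_cost=2.5)
cost = total_cost(capacity, forecast, own_cost=1.0, lease_cost=2.5)
print(capacity, cost)
```

In this example the private cloud covers the typical load and the rare peaks are leased, which avoids paying for capacity that is idle most of the time.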



Modeling applications with

communications and uncertainty

Dzmitry Kliazovich

Johnatan E. Pecero University of Luxembourg, Luxembourg

Pascal Bouvry


Samee U. Khan North Dakota State University, U.S.A.

Albert Y. Zomaya University of Sydney, Australia

• IEEE CLOUD 2013 - IEEE 6th International Conference on Cloud Computing.

• Journal of Grid Computing , Springer, 2015


Modeling Applications

• Proposed CA-DAG: Communication-Aware DAG model

– Two types of vertices: one for computing and one for communications

– Edges define dependences between tasks and order of execution

• Main advantage

– Allows separate resource allocation decisions, assigning processors to handle computing jobs and network resources for information transmissions

[Figure: example CA-DAG with vertices 1, 2, 3, 4; legend: communication task, computing task, ordinary edge]
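A minimal data-structure sketch of the CA-DAG idea: vertices carry a type (computing or communication), edges give precedence, and the two vertex types can be handed to different allocators. The class, the method names and the three-task example are illustrative and do not reproduce the figure or the authors' implementation. It relies on `graphlib` from the Python standard library (3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple


@dataclass
class CADag:
    kind: Dict[str, str] = field(default_factory=dict)         # vertex -> "compute" or "comm"
    preds: Dict[str, Set[str]] = field(default_factory=dict)   # vertex -> predecessors

    def add_task(self, name: str, kind: str) -> None:
        if kind not in ("compute", "comm"):
            raise ValueError("kind must be 'compute' or 'comm'")
        self.kind[name] = kind
        self.preds.setdefault(name, set())

    def add_edge(self, src: str, dst: str) -> None:
        """src must finish before dst can start."""
        self.preds[dst].add(src)

    def execution_order(self) -> List[str]:
        return list(TopologicalSorter(self.preds).static_order())

    def split(self) -> Tuple[List[str], List[str]]:
        """Return (computing tasks, communication tasks), each in a valid execution order."""
        order = self.execution_order()
        compute = [v for v in order if self.kind[v] == "compute"]
        comm = [v for v in order if self.kind[v] == "comm"]
        return compute, comm


dag = CADag()
dag.add_task("c1", "compute")
dag.add_task("t12", "comm")      # transfer of the output of c1 to c2
dag.add_task("c2", "compute")
dag.add_edge("c1", "t12")
dag.add_edge("t12", "c2")
order = dag.execution_order()
for_processors, for_network = dag.split()
print(order, for_processors, for_network)
```

Keeping communications as vertices rather than edge weights is what lets a scheduler assign network resources to `for_network` independently of the processors chosen for `for_processors`.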

Thanks for your attention!

Redmer Hoekstra
