Grid Computing: Grid Systems and Scheduling
April 20th, 2009

Page 1:

Grid Computing

Grid Systems and Scheduling
April 20th, 2009

Page 2:

Grid systems
• Many!!!
• Classification (depends on the author):
  – Computational grid:
    • distributed supercomputing (parallel application execution on multiple machines)
    • high throughput (stream of jobs)
  – Data grid: provides the way to solve large-scale data management problems
  – Service grid: systems that provide services that are not provided by any single local machine
    • on demand: aggregate resources to enable new services
    • collaborative: connect users and applications via a virtual workspace
    • multimedia: infrastructure for real-time multimedia applications

Page 3:

Taxonomy of Applications
• Distributed supercomputing: consumes CPU cycles and memory
• High-throughput computing: uses otherwise unused processor cycles
• On-demand computing: meets short-term requirements for resources that cannot be cost-effectively or conveniently located locally
• Data-intensive computing
• Collaborative computing: enables and enhances human-to-human interaction (e.g., the CAVE5D system supports remote, collaborative exploration of large geophysical data sets and of the models that generated them)

Page 4:

Alternative classification

• independent tasks

• loosely-coupled tasks

• tightly-coupled tasks

Page 5:

Application Management

• Description

• Partitioning

• Mapping

• Allocation

[Figure: an application passes through partitioning, mapping, and allocation onto grid nodes A and B]

Page 6:

Description

• Use a grid application description language
• Grid-ADL and GEL
  – one can take advantage of the loop construct to use compilation mechanisms for vectorization

Page 7:

Grid-ADL

[Figure: example task graphs (nodes 1, 2, 5, 6) contrasting traditional systems with alternative systems]

Page 8:

Partitioning/Clustering

• Application represented as a graph
  – nodes: jobs
  – edges: precedence
• Graph partitioning techniques:
  – minimize communication
  – increase throughput or speedup
  – need good heuristics
• Clustering

Page 9:

Graph Partitioning

• Optimally allocating the components of a distributed program over several machines

• Communication between machines is assumed to be the major factor in application performance

• NP-hard for the case of 3 or more terminals

Page 10:

Collapse the graph

• Given G = {N, E, M}:
  – N is the set of nodes
  – E is the set of edges
  – M is the set of machine nodes

Page 11:

Dominant Edge

• Take a node n and its heaviest edge e
• Let e1, e2, …, er be n's other edges whose opposite end nodes are not in M
• Let e′1, e′2, …, e′k be n's edges whose opposite end nodes are in M
• If w(e) ≥ (w(e1) + … + w(er)) + max(w(e′1), …, w(e′k)), then the min-cut does not contain e, so e can be collapsed
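A minimal sketch of this test in Python, assuming the graph is stored as an adjacency map from each node to a {neighbor: weight} dict, with each undirected edge stored symmetrically in both directions (a representation chosen here for illustration, not taken from the source paper):

def can_collapse_heaviest_edge(n, adj, machines):
    """Dominant-edge test: True if node n's heaviest edge e cannot be in
    the min-cut, so e may be collapsed."""
    edges = adj[n]                                         # neighbor -> weight
    e_end, e_w = max(edges.items(), key=lambda kv: kv[1])  # heaviest edge e
    rest = {v: w for v, w in edges.items() if v != e_end}  # e1..er and e'1..e'k
    sum_non_machine = sum(w for v, w in rest.items() if v not in machines)
    max_machine = max((w for v, w in rest.items() if v in machines), default=0)
    return e_w >= sum_non_machine + max_machine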

Page 12:

Machine Cut

• Let the machine cut Mi be the set of all edges between a machine node mi and the non-machine nodes (nodes not in M)
• Let Wi be the sum of the weights of all edges in the machine cut Mi
• Sort the Wi so that W1 ≥ W2 ≥ …
• Any edge whose weight is greater than W2 cannot be part of the min-cut
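The same rule as a short Python sketch, under the same illustrative adjacency-map representation as above:

def edges_excluded_by_machine_cut(adj, machines):
    """Return edges whose weight exceeds W2, the second-largest machine-cut
    weight; such edges cannot belong to the min-cut."""
    W = {m: sum(w for v, w in adj[m].items() if v not in machines)
         for m in machines}
    weights = sorted(W.values(), reverse=True)
    w2 = weights[1] if len(weights) > 1 else 0
    # each undirected edge appears here in both directions
    return {(u, v) for u in adj for v, w in adj[u].items() if w > w2}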

Page 13:

Zeroing

• Assume that node n has an edge to each of the m machines in M, with weights w1 ≤ w2 ≤ … ≤ wm
• Reducing the weight of each of these m edges by w1 does not change the assignment of nodes in the min-cut
• It reduces the cost of the minimum cut by (m−1)w1
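As a sketch (same assumed representation; only valid when n really has an edge to every machine in M, as the slide assumes):

def zero_machine_edges(n, adj, machines):
    """Zeroing: subtract w1, node n's smallest machine-edge weight, from
    all of n's machine edges; returns the resulting drop in min-cut cost."""
    w1 = min(adj[n][m] for m in machines)
    for m in machines:
        adj[n][m] -= w1
        adj[m][n] -= w1
    return (len(machines) - 1) * w1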

Page 14:

Order of Application

• If the previous three techniques (dominant edge, machine cut, zeroing) are repeatedly applied to a graph until none of them is applicable, the resulting reduced graph is independent of the order in which the techniques were applied

Page 15:

Output

• List of nodes collapsed into each of the machine nodes

• Weight of edges connecting the machine nodes

• Source: Karin Hogstedt, Doug Kimelman, V. T. Rajan, Tova Roth, and Mark Wegman, "Graph Cutting Algorithms for Distributed Applications Partitioning", 2001
• homepages.cae.wisc.edu/~ece556/fall2002/PROJECT/distributed_applications.ppt

Page 16:

Graph partitioning
• Hendrickson and Kolda, 2000: edge cuts:

– are not proportional to the total communication volume

– try to (approximately) minimize the total volume but not the total number of messages

– do not minimize the maximum volume and/or number of messages handled by any single processor

– do not consider distance between processors (number of switches the message passes through, for example)

– the undirected graph model can only express symmetric data dependencies

Page 17:

Graph partitioning

• To avoid message contention and improve the overall throughput of the message traffic, it is preferable to restrict communication to processors that are near each other
• Edge-cut is, however, appropriate for applications whose graphs have locality and few neighbors

Page 18:

Kwok and Ahmad, 1999: multiprocessor scheduling taxonomy

Page 19:

List Scheduling
• make an ordered list of processes by assigning them priorities
• repeatedly execute the following two steps until a valid schedule is obtained (the generic loop is sketched after this list):
  – select from the list the process with the highest priority for scheduling
  – select a resource to accommodate this process
• priorities are determined statically, before the scheduling process begins; the first step chooses the process with the highest priority, the second step selects the best possible resource
• some known list scheduling strategies:
  – Highest Level First (HLF)
  – Longest Path (LP)
  – Longest Processing Time (LPT)
  – Critical Path Method (CPM)
• list scheduling algorithms only produce good results for coarse-grained applications
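The generic loop as a minimal Python sketch; the data model (static priorities, per-resource run times, and no precedence constraints) is assumed here for brevity:

def list_schedule(tasks, resources, priority, runtime):
    """Greedy list scheduling: highest static priority first, each task
    placed on the resource that completes it earliest."""
    ready = {r: 0.0 for r in resources}       # next free time per resource
    schedule = []
    for t in sorted(tasks, key=priority, reverse=True):
        r = min(resources, key=lambda r: ready[r] + runtime(t, r))
        start = ready[r]
        ready[r] = start + runtime(t, r)
        schedule.append((t, r, start))
    return schedule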

Page 20:

Static scheduling of a task precedence graph
DSC: Dominant Sequence Clustering
• Yang and Gerasoulis, 1994: a two-step method for scheduling with communication (focus on the critical path):
  1) schedule onto an unbounded number of completely connected processors (clusters of tasks);
  2) if the number of clusters is larger than the number of available processors, merge clusters until the number of real processors is reached, considering the network topology (merging step).

Page 21:

Graph partitioning

• Kumar and Biswas, 2002: MiniMax
  – multilevel graph partitioning scheme
  – grid-aware
  – considers two weighted undirected graphs:
    • a workload graph (to model the problem domain)
    • a system graph (to model the heterogeneous system)

Page 22:

Resource Management

[Figure: scheduling taxonomy (1988). Source: P. K. V. Mangan, Ph.D. Thesis, 2006]

Page 23:

Resource Management

• The scheduling algorithm has four components (sketched below):
  – transfer policy: when a node can take part in a task transfer;
  – selection policy: which task must be transferred;
  – location policy: which node to transfer to;
  – information policy: when to collect system state information.
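One way to picture this decomposition is the following illustrative Python interface; the class and method names are invented here, not taken from the source:

from abc import ABC, abstractmethod

class LoadSharingPolicy(ABC):
    """Hypothetical bundle of the four scheduling-policy decisions."""

    @abstractmethod
    def should_transfer(self, node): ...       # transfer policy

    @abstractmethod
    def select_task(self, node): ...           # selection policy

    @abstractmethod
    def choose_destination(self, task): ...    # location policy

    @abstractmethod
    def should_collect_state(self): ...        # information policy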

Page 24:

Resource Management

• Location policy:

– Sender-initiated

– Receiver-initiated

– Symmetrically-initiated

Page 25:

Scheduling mechanisms for grid

• Berman, 1998 (extended by Kayser, 2006):
  – job scheduler
  – resource scheduler
  – application scheduler
  – meta-scheduler

Page 26:

Scheduling mechanisms for grid

• Legion
  – University of Virginia (Grimshaw, 1993)
  – Supercomputing 1997
  – currently the Avaki commercial product

Page 27:

Legion

• an object-oriented infrastructure for grid environments, layered on top of existing software services
• uses the existing operating systems, resource management tools, and security mechanisms at host sites to implement higher-level, system-wide services
• its design is based on a set of core objects

Page 28:

Legion

• resource management is a negotiation between resources and active objects that represent the distributed application
• three steps allocate resources for a task:
  – Decision: considers the task's characteristics and requirements, the resource's properties and policies, and users' preferences
  – Enactment: the class object receives an activation request; if the placement is acceptable, the task is started
  – Monitoring: ensures that the task is operating correctly

Page 29:

Globus

• Toolkit with a set of components that implement basic services:

– security
– resource location
– resource management
– data management
– resource reservation
– communication

Page 30:

Globus

• From version 1.0 in 1998, through the 2.0 release in 2002, to the 3.0 release, the emphasis has been on providing a set of components that can be used either independently or together to develop applications

• The Globus Toolkit version 2 (GT2) design is highly related to the architecture proposed by Foster et al.

• The Globus Toolkit version 3 (GT3) design is based on grid services, which are quite similar to web services. GT3 implements the Open Grid Service Infrastructure (OGSI).

• The current version, GT4, is also based on grid services, but with some changes in the standard

Page 31:

Globus: scheduling
• GRAM: Globus Resource Allocation Manager
• Each GRAM is responsible for a set of resources operating under the same site-specific allocation policy, often implemented by a local resource management system
• GRAM provides an abstraction for remote process queuing and execution, with several powerful features such as strong security and file transfer
• It does not provide scheduling or resource brokering capabilities, but thanks to its standard API and protocol it can be used to start programs on remote resources despite local heterogeneity

Page 32:

Globus: scheduling

• Resource Specification Language (RSL) is used to communicate requirements.

• To take advantage of GRAM, a user still needs a system that can remember what jobs have been submitted, where they are, and what they are doing.

• To track large numbers of jobs, the user needs queuing, prioritization, logging, and accounting. These services cannot be found in GRAM alone, but are provided by systems such as Condor-G

Page 33:

MyGrid and OurGrid

• mainly for bag-of-tasks (BoT) applications
• use the dynamic Work Queue with Replication (WQR) algorithm
• hosts that have finished their tasks are assigned replicas of tasks that are still running
• tasks are replicated until a predefined maximum number of replicas is reached (in MyGrid, the default is one)
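A minimal sketch of the WQR assignment rule; the surrounding event loop, task and host objects are assumed, and queue is taken to be a collections.deque of pending tasks:

def wqr_assign(queue, running, idle_host, max_replicas):
    """Called whenever a host becomes idle: hand it a queued task, or a
    replica of a still-running task once the queue is empty."""
    if queue:                                  # plain work-queue phase
        task = queue.popleft()
        running.setdefault(task, set()).add(idle_host)
        return task
    # replication phase: pick a running task with the fewest replicas
    candidates = [t for t, hs in running.items() if len(hs) < max_replicas]
    if not candidates:
        return None
    task = min(candidates, key=lambda t: len(running[t]))
    running[task].add(idle_host)
    return task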

Page 34:

OurGrid

• An extension of MyGrid

• resource sharing system based on peer-to-peer technology

• resources are shared according to a "network of favors" model, in which each peer prioritizes those who have credit from their past history of interactions

Page 35:

GrADS

• is an application scheduler
• the user invokes the Grid Routine component to execute an application
• the Grid Routine invokes the Resource Selector component
• the Resource Selector accesses the Globus MetaDirectory Service (MDS) to get a list of machines that are alive, and then contacts the Network Weather Service (NWS) to get system information for those machines

Page 36:

GrADS

• The Grid Routine then invokes a component called the Performance Modeler, passing it the problem parameters, the machines, and the machine information.

• The Performance Modeler builds the final list of machines and sends it to the Contract Developer for approval.

• The Grid Routine then passes the problem, its parameters, and the final list of machines to the Application Launcher.

Page 37:

GrADS

• The Application Launcher spawns the job using the Globus management mechanism (GRAM) and also spawns the Contract Monitor.

• The Contract Monitor monitors the application, displays the actual and predicted times, and can report contract violations to a re-scheduler.

Page 38:

GrADS

• Although the execution model is efficient from the application perspective, it does not take into account the existence of other applications in the system

Page 39:

GrADS

• Vadhiyar and Dongarra, 2002: proposed a metascheduling architecture in the context of the GrADS Project.

• The metascheduler receives candidate schedules of different application level schedulers and implements scheduling policies for balancing the interests of different applications.

Page 40:

EasyGrid

• Mainly concerned with MPI applications

• Allows intercluster execution of MPI processes

Page 41:

Nimrod

• uses a simple declarative parametric modeling language to express parametric experiments
• provides machinery that automates the tasks of formulating, running, monitoring, and collating the results of the multiple individual experiments
• incorporates a distributed scheduling component that can manage the scheduling of individual experiments onto idle computers in a local area network
• has been applied to a range of application areas, e.g., bioinformatics, operations research, network simulation, electronic CAD, ecological modelling, and business process simulation

Page 42:

Nimrod/G

Page 43:

AppLeS

• UCSD (Berman and Casanova)
• AppLeS Parameter Sweep Template (APST)
• uses scheduling based on min-min, max-min, and sufferage, with heuristics to estimate the performance of resources and tasks
  – performance-information-dependent algorithms (PIDA)
• main goal: to minimize file transfers

Page 44:

Main scheduling algorithm

sched() {
  (1) compute the next scheduling event
  (2) create a Gantt chart, G
  (3) for each computation and file transfer currently underway:
        compute an estimate of its completion time
        fill in the corresponding blocks in G
  (4) until each host has been assigned enough work:
        heuristically assign tasks to hosts (filling blocks in G)
  (5) convert G into a plan
}

Min-min, max-min, and sufferage differ in step (4).

Page 45:

Min-min algorithm
1. A task list is generated that includes all tasks as unmapped tasks.
2. For each task in the task list, the machine that gives the task its minimum completion time (first "min") is determined, ignoring other unmapped tasks.
3. Among all task–machine pairs found in step 2, the pair that has the minimum completion time (second "min") is determined.
4. The task selected in step 3 is removed from the task list and is mapped to the paired machine.
5. The ready time of the machine on which the task is mapped is updated.
6. Steps 2–5 are repeated until all tasks have been mapped.

Source: Study of an Iterative Technique to Minimize Completion Times of Non-Makespan Machines, by Luis Diego Briceño, Mohana Oltikar, Howard Jay Siegel, and Anthony A. Maciejewski, 2007
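A compact Python sketch of min-min, assuming an expected-time-to-compute table etc[t][m] (this data model is an assumption for illustration, not taken from the source):

def min_min(tasks, machines, etc):
    """etc[t][m]: estimated run time of task t on machine m."""
    ready = {m: 0.0 for m in machines}         # machine ready times
    mapping, unmapped = {}, set(tasks)
    while unmapped:
        # first min: best machine per task; second min: best pair overall
        best = {t: min(machines, key=lambda m: ready[m] + etc[t][m])
                for t in unmapped}
        t = min(unmapped, key=lambda u: ready[best[u]] + etc[u][best[u]])
        m = best[t]
        mapping[t] = m
        ready[m] += etc[t][m]
        unmapped.remove(t)
    return mapping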

Page 46:

Sufferage algorithm
1. A task list (L) is generated that includes all unmapped tasks in a given arbitrary order.
2. While there are still unmapped tasks:
   i. Mark all machines as unassigned.
   ii. For each task tk ∈ L:
      a. The machine mj that gives the earliest completion time is found.
      b. The sufferage value is calculated (sufferage value = second-earliest completion time minus earliest completion time).
      c. If machine mj is unassigned, then assign tk to machine mj, delete tk from L, and mark mj as assigned. Otherwise, if the sufferage value of the task ti already assigned to mj is less than the sufferage value of task tk, then unassign ti, add ti back to L, assign tk to machine mj, and remove tk from L.
   iii. The ready times for all machines are updated.

Source: Study of an Iterative Technique to Minimize Completion Times of Non-Makespan Machines, by Luis Diego Briceño, Mohana Oltikar, Howard Jay Siegel, and Anthony A. Maciejewski, 2007
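A sketch of the same procedure under the assumed etc table used above; each pass scans L once, keeps the highest-sufferage task per machine, then commits the winners:

def sufferage(tasks, machines, etc):
    ready = {m: 0.0 for m in machines}
    mapping, L = {}, list(tasks)
    while L:
        held = {}                              # machine -> (task, sufferage)
        for t in L:
            times = sorted(ready[m] + etc[t][m] for m in machines)
            mj = min(machines, key=lambda m: ready[m] + etc[t][m])
            suff = times[1] - times[0] if len(times) > 1 else 0.0
            if mj not in held or held[mj][1] < suff:
                held[mj] = (t, suff)           # tk displaces a lower-sufferage ti
        for mj, (t, _) in held.items():        # commit and update ready times
            mapping[t] = mj
            ready[mj] += etc[t][mj]
            L.remove(t)
    return mapping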

Page 47:

Minimum Completion Time (MCT) algorithm

1. A task list is generated that includes all unmapped tasks in a given arbitrary order.
2. The first task in the list is mapped to its minimum completion time machine (machine ready time plus estimated computation time of the task on that machine).
3. The task selected in step 2 is removed from the task list.
4. The ready time of the machine on which the task is mapped is updated.
5. Steps 2–4 are repeated until all the tasks have been mapped.

Source: Study of an Iterative Technique to Minimize Completion Times of Non-Makespan Machines, by Luis Diego Briceño, Mohana Oltikar, Howard Jay Siegel, and Anthony A. Maciejewski, 2007
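The corresponding Python sketch, again over the assumed etc table:

def mct(tasks, machines, etc):
    """Map tasks in the given order, each to the machine that would
    complete it earliest (ready time plus estimated computation time)."""
    ready = {m: 0.0 for m in machines}
    mapping = {}
    for t in tasks:                            # given arbitrary order
        m = min(machines, key=lambda m: ready[m] + etc[t][m])
        mapping[t] = m
        ready[m] += etc[t][m]
    return mapping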

Page 48:

GRAnD [Kayser et al., CCP&E, 2007]

• distributed submission control
• data locality
• automatic staging of data
• optimization of file transfers

Page 49:

Distributed submission

Results of a simulation with MONARC (http://monarc.web.cern.ch/MONARC/) [Kayser, 2006]

Page 50:

GRAnD

• Experiments with Globus
  – discussion list: [email protected] (05/02/2004)
    • submission takes 2 s per task
    • placing 200 tasks in the queue: ~6 min
    • maximum number of tasks: a few hundred
  – experiments at CERN (D. Foster et al., 2003)
    • 16 s to submit a task
    • the server saturates at 3.8 tasks/minute

Page 51:

GRAnD
• Grid Robust Application Deployment

Page 52:

GRAnD

Page 53:

GRAnD data management

Page 54:

GRAnD data management

Page 55:

Comparison (Kayser, 2006)

Page 56:

Comparison (Kayser, 2006)

Page 57:

Condor performance

Page 58:

Condor performance

Page 59:

Condor x AppMan

Page 60:

Condor performance

Experiments on a cluster of 8 nodes (Sanches et al., 2005)

Page 61:

ReGS: Condor performance

Page 62:

ReGS: Condor performance

Page 63:

Toward Grid Operating Systems

• Vega GOS

• G SMA

Page 64:

Vega GOS (the CNGrid OS)

GOS overview
• a user-level middleware running on a client machine
• GOS has two components: GOS and gnetd
  – GOS is a daemon running on the client machine
  – gnetd is a daemon running on the grid server

Page 65:

GOS

• Grid process and grid thread
  – a grid process is a unit for managing the whole resource of the grid
  – a grid thread is a unit for executing computation on the grid
• GOS API (for application developers)
  – grid(): constructs a grid process on the client machine
  – gridcon(): connects the grid process to the grid system
  – gridclose(): closes a connected grid
• gnetd API (for service developers on grid servers)
  – grid_register(): registers a service with the grid
  – grid_unregister(): unregisters a service
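A hypothetical client flow built from the calls listed above; the Python binding (the gos module and its argument and return conventions) is invented purely for illustration:

import gos  # hypothetical binding exposing the GOS API listed above

proc = gos.grid()          # construct a grid process on the client machine
conn = gos.gridcon(proc)   # connect the grid process to the grid system
try:
    pass                   # run grid threads / submit computation here
finally:
    gos.gridclose(conn)    # close the connected grid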

Page 66:

Grid

• Not yet mentioned:
  – simulation: SimGrid and GridSim
  – monitoring: RTM, MonALISA, ...
  – portals: GridICE, GENIUS, ...