
Page 1: Scheduling for Grid Computing

Scheduling for Grid Computing

龚斌 (Gong Bin), School of Computer Science and Technology, Shandong University

Shandong Provincial High Performance Computing Center

Page 2: Scheduling for Grid Computing

Reference

• Fangpeng Dong and Selim G. Akl: Scheduling Algorithms for Grid Computing: State of the Art and Open Problems

• Yanmin Zhu: A Survey on Grid Scheduling Systems

• Peter Gradwell: Overview of Grid Scheduling Systems

• Alain Andrieux et al.: Open Issues in Grid Scheduling

• Jia Yu and Rajkumar Buyya: A Taxonomy of Workflow Systems for Grid Computing

Page 3: Scheduling for Grid Computing

What is the Grid? The Grid is a set of emerging technologies built on top of the Internet that integrates high-speed networks, high-performance computers, large databases, sensors, remote instruments, and other resources, making more resources available to both scientists and the general public. The Internet mainly provides communication functions such as e-mail and web browsing, whereas the Grid offers far richer and stronger capabilities, letting people transparently use computing, storage, information-processing, and other resources.

1998, The Grid: Blueprint for a New Computing Infrastructure.

Ian Foster: senior scientist at Argonne National Laboratory, leader of the U.S. computational Grid project

Page 4: Scheduling for Grid Computing

The Definition of Grid

• A type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous and heterogeneous resources dynamically at runtime depending on their availability, capability, performance, cost and users’ quality-of-service requirements

Page 5: Scheduling for Grid Computing

Characteristics of Grid Computing

• Exploiting underutilized resources

• Distributed supercomputing capability

• Virtual organization for collaboration

• Resource balancing

• Reliability

Page 6: Scheduling for Grid Computing

Classes of Grid Computing

• By function:
– Computing Grid
– Data Grid
– Service Grid

• By size:
– IntraGrid
– ExtraGrid
– InterGrid

Page 7: Scheduling for Grid Computing

Traditional Parallel Scheduling Systems

• Systems:
– SMP: symmetric multiprocessing, shared memory
– Cluster
– CC-NUMA: e.g., SGI

• Scheduling systems:
– OpenPBS, LSF, SGE, LoadLeveler, Condor, etc.

Page 8: Scheduling for Grid Computing
Page 9: Scheduling for Grid Computing

Cluster Scheduling

Page 10: Scheduling for Grid Computing

The Assumptions Underlying Traditional Systems

• All resources reside within a single administrative domain.

• To provide a single system image, the scheduler controls all of the resources.

• The resource pool is invariant.

• Contention caused by incoming applications can be managed by the scheduler according to some policies, so that its impact on the performance the site can provide to each application can be well predicted.

• Computations and their data reside in the same site, or data staging is a highly predictable process, usually from a predetermined source to a predetermined destination, which can be viewed as constant overhead.

Page 11: Scheduling for Grid Computing

Characteristics of Cluster Scheduling

• Homogeneity of resource and application

• Dedicated resource

• Centralized scheduling architecture

• High-speed interconnection network

• Monotonic performance goal

Page 12: Scheduling for Grid Computing

Timeline

Page 13: Scheduling for Grid Computing

The Terms of Grid Scheduling

• A task is an atomic unit to be scheduled by the scheduler and assigned to a resource.

• The properties of a task are parameters like CPU/memory requirement, deadline, priority, etc.

• A job (or metatask, or application) is a set of atomic tasks that will be carried out on a set of resources. Jobs can have a recursive structure, meaning that jobs are composed of sub-jobs and/or tasks, and sub-jobs can themselves be decomposed further into atomic tasks.

• A resource is something that is required to carry out an operation, for example: a processor for data processing, a data storage device, or a network link for data transporting.

• A site (or node) is an autonomous entity composed of one or multiple resources.

• A task scheduling is the mapping of tasks to a selected group of resources, which may be distributed across multiple administrative domains.

Page 14: Scheduling for Grid Computing

Three Stages of Scheduling Process

• Resource discovering and filtering

• Resource selecting and scheduling according to certain objectives

• Job submission

Page 15: Scheduling for Grid Computing

Stages of SuperScheduling

• Resource Discovery
– Authorization filtering
– Application requirement definition
– Minimal requirement filtering

• System Selection
– Gathering information (query)
– Select the system(s) to run on

• Run Job
– (optional) Make an advance reservation
– Submit job to resources
– Preparation tasks
– Monitor progress (maybe go back to System Selection)
– Find out the job is done
– Completion tasks

Page 16: Scheduling for Grid Computing

Grid Scheduling framework

• Application Model
– Extracts the characteristics of the applications to be scheduled.

• Resource Model
– Describes the characteristics of the underlying resources in Grid systems.

• Performance Model
– Models the behavior of a specific job on a specific computational resource.

• Scheduling Policy
– Responsible for deciding how applications should be executed and how resources should be utilized.

Page 17: Scheduling for Grid Computing

Applications Classification

• Batch vs. Interactive

• Real-time vs. Non real-time

• Priority

Page 18: Scheduling for Grid Computing

Resources Classification

• Time-shared vs. Non time-shared

• Dedicated vs. Non-dedicated

• Preemptive vs. Non-preemptive

Page 19: Scheduling for Grid Computing

Performance Estimation

• Simulation

• Analytical Modeling

• Historical Data

• On-line Learning

• Hybrid

Page 20: Scheduling for Grid Computing

Scheduling Policy

• Application-centric (a small code sketch of these metrics follows below)
– Execution Time: the time spent executing the job
– Wait Time: the time spent waiting in the ready queue
– Speedup: the ratio of the time spent executing the job on the original platform to the time spent executing it on the Grid
– Turnaround Time: also called response time; defined as the sum of the waiting time and the execution time
– Job Slowdown: defined as the ratio of the response time of a job to its actual run time

• System-centric
– Throughput: the number of jobs completed in one unit of time, such as per hour or per day
– Utilization: the percentage of time a resource is busy
– Flow Time: the flow time of a set of jobs is the sum of the completion times of all jobs
– Average application performance
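A minimal Python sketch of these metrics (illustrative only, not part of the original slides; the submit/start/finish fields of the hypothetical JobRecord are assumed inputs):

from dataclasses import dataclass

@dataclass
class JobRecord:
    submit: float   # time the job entered the ready queue (hypothetical field)
    start: float    # time execution began
    finish: float   # time execution ended

def turnaround(j: JobRecord) -> float:
    # turnaround (response) time = wait time + execution time
    return (j.start - j.submit) + (j.finish - j.start)

def slowdown(j: JobRecord) -> float:
    # job slowdown = response time / actual run time
    return turnaround(j) / (j.finish - j.start)

def throughput(jobs: list, period: float) -> float:
    # system-centric: number of jobs completed per unit of time
    return len(jobs) / period

def utilization(busy_time: float, period: float) -> float:
    # percentage of time a resource is busy
    return 100.0 * busy_time / period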

Page 21: Scheduling for Grid Computing

Scheduling Strategy

• Performance-driven
• Market-driven
• Trust-driven
– Security policy
– Accumulated reputation
– Self-defense capability
– Attack history
– Site vulnerability

Page 22: Scheduling for Grid Computing

A logical Grid scheduling architecture
Broken lines: resource or application information flows
Solid lines: task or task-scheduling command flows

Page 23: Scheduling for Grid Computing

Grid Scheduler

• The Grid Scheduler (GS) receives applications from Grid users, selects feasible resources for these applications according to information acquired from the Grid Information Service module, and finally generates application-to-resource mappings based on certain objective functions and predicted resource performance.
– A GS usually cannot control Grid resources directly, but works like a broker or agent
– Also called a metascheduler or SuperScheduler
– It is not an indispensable component of the Grid infrastructure and is not included in the Globus Toolkit
– In reality, multiple such schedulers might be deployed and organized into different structures (centralized, hierarchical, and decentralized) according to different concerns, such as performance or scalability.

Page 24: Scheduling for Grid Computing

Grid Information Service (GIS)

• The GIS is responsible for collecting and predicting resource state information, such as CPU capacities, memory size, network bandwidth, software availability, and the load of a site in a particular period, and for providing this information to Grid schedulers.

• The GIS can answer queries for resource information or push information to subscribers.

• Globus: Monitoring and Discovery System (MDS)

• Application profiling (AP) is used to extract the properties of applications.

• Analogical benchmarking (AB) provides a measure of how well a resource can perform a given type of job.

Page 25: Scheduling for Grid Computing

Launching and Monitoring (LM)

• Also called the binder

• Implements a finally determined schedule by submitting applications to the selected resources, staging input data and executables if necessary, and monitoring the execution of the applications.

• Globus: Grid Resource Allocation and Management (GRAM)

Page 26: Scheduling for Grid Computing

Local Resource Manager (LRM)

• Is mainly responsible for two jobs: local scheduling inside a resource domain, where not only jobs from exterior Grid users but also jobs from the domain's local users are executed, and reporting resource information to the GIS.

• OpenPBS, Condor, LSF, SGE, etc.

• NWS (Network Weather Service), Hawkeye, Ganglia

Page 27: Scheduling for Grid Computing

Evaluation Criteria for Grid Scheduling Systems

• Application Performance Promotion

• System Performance Promotion

• Scheduling Efficiency

• Reliability

• Scalability

• Applicability to Application and Grid Environment

Page 28: Scheduling for Grid Computing

Scheduler Organization

• Centralized

• Decentralized

• Hierarchical

Page 29: Scheduling for Grid Computing

Centralized Scheduling

Page 30: Scheduling for Grid Computing

Decentralized Scheduling

Page 31: Scheduling for Grid Computing

Hierarchical Scheduling

Page 32: Scheduling for Grid Computing

Existing Grid Scheduling Systems

• Information collection systems
– MDS (Monitoring and Discovery System)
– NWS (Network Weather Service)

• Condor
• Condor-G
• AppLeS
• Nimrod-G
• GrADS
• etc.

Page 33: Scheduling for Grid Computing

Characteristics of scheduling for Grid Computing

• Heterogeneity and Autonomy
– The Grid scheduler does not have full control of the resources
– It is hard to estimate the exact cost of executing a task on different sites
– The scheduler is required to adapt to different local policies

• Performance Dynamism
– Grid resources are not dedicated to a Grid application
– Performance fluctuates, compared with traditional systems
– Some countermeasures: QoS negotiation, resource reservation, rescheduling

• Resource Selection and Computation-Data Separation
– In traditional systems, the executable code of an application and its input/output data are usually at the same site, or the input sources and output destinations are determined before the application is submitted, so the cost of data staging can be neglected.

• Application Diversity

Page 34: Scheduling for Grid Computing

Grid Scheduling Algorithms

• The general scheduling problem is NP-complete.

• The scheduling problem becomes more challenging because of some unique characteristics belonging to Grid computing.

Page 35: Scheduling for Grid Computing

A hierarchical taxonomy of scheduling algorithms

Page 36: Scheduling for Grid Computing

A Taxonomy of Grid Scheduling Algorithms

• Local vs. Global
– Grid scheduling falls into the global category.

• Static vs. Dynamic
– Both static and dynamic scheduling are widely adopted in Grid computing.

Page 37: Scheduling for Grid Computing

Static Scheduling

• Every task comprising the job is assigned once to a resource, the placement of an application is static, and a firm estimate of the cost of the computation can be made in advance of the actual execution.

• Easier to program from a scheduler's point of view.

• Rescheduling mechanisms are introduced to allow task migration.

• Another side effect is that the gap between static scheduling and dynamic scheduling becomes less important.

Page 38: Scheduling for Grid Computing

Dynamic Scheduling

• Online scheduling

• Two major components: system state estimation and decision making.

• Advantage: the system need not be aware of the run-time behavior of the application before execution.

• The primary performance goal: maximizing resource utilization, rather than minimizing the runtime of individual jobs.

• Four basic approaches:
– Unconstrained FIFO
– Balance-constrained techniques
– Cost-constrained techniques
– Hybrids of static and dynamic techniques

Page 39: Scheduling for Grid Computing

Unconstrained FIFO

• The resource with the currently shortest waiting queue or the smallest waiting-queue time is selected for the incoming task.

• Also known as Opportunistic Load Balancing (OLB), or the myopic algorithm.

• Simple, but far from optimal.
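A minimal illustrative sketch of the OLB idea (assumed, not from the slides; queue_time is a hypothetical per-resource estimate of how long its waiting queue needs to drain):

def olb_assign(task, resources, queue_time, task_cost=1.0):
    # Unconstrained FIFO / OLB: send the incoming task to the resource whose
    # waiting queue is expected to drain soonest, ignoring how fast that
    # resource would actually execute the task.
    target = min(resources, key=lambda r: queue_time[r])
    queue_time[target] += task_cost   # rough bookkeeping of the growing queue
    return target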

Page 40: Scheduling for Grid Computing

Balance-constrained

• Attempts to rebalance the loads on all resources by periodically shifting waiting tasks from one waiting queue to another.

• The rebalance only happens inside a “neighborhood” where all resources are better interconnected.

• Advantages:
– The initial load can be quickly distributed to all resources and execution can start quickly
– The rebalancing process is distributed and scalable
– The communication delay of rebalancing can be reduced, since task shifting only happens among resources that are "close" to each other

Page 41: Scheduling for Grid Computing

Cost-constrained

• Not only considers the balance among resources but also the communication cost between tasks

• Instead of doing a task exchange periodically, tasks will be checked before their move.

• This approach is more efficient than the previous one when the communication costs among resources are heterogeneous and communication cost to execute the application is the main consideration

• It is also flexible, and can be used with other cost factors, such as seeking the lowest memory usage or the lowest disk activity, and so on.

Page 42: Scheduling for Grid Computing

Hybrid

• A further improvement is static-dynamic hybrid scheduling.

• It takes advantage of static scheduling while at the same time capturing the uncertain behaviors of applications and resources.

• For example, where some tasks have special QoS requirements, the static phase can be used to map the tasks with QoS requirements, and dynamic scheduling can be used for the remaining tasks.

Page 43: Scheduling for Grid Computing

A Taxonomy of Grid Scheduling Algorithms (cont.)

• Optimal vs. Suboptimal
– Some criteria: minimum makespan, maximum resource utilization

– Makespan: the time spent from the beginning of the first task in a job to the end of the last task of the job.

– The NP-Complete nature of scheduling algorithms

– Current research tries to find suboptimal solutions

Page 44: Scheduling for Grid Computing

A Taxonomy of Grid Scheduling Algorithms (cont.)

• Approximate vs. Heuristic

– Approximate
• Uses formal computational models, but instead of searching the entire solution space for an optimal solution, it is satisfied once a sufficiently good solution is found
• Factors that determine whether this approach is worthwhile:
– Availability of a function to evaluate a solution
– The time required to evaluate a solution
– The ability to judge the value of an optimal solution according to some metric
– Availability of a mechanism for intelligently pruning the solution space

– Heuristic
• Represents the class of algorithms that make the most realistic assumptions about a priori knowledge concerning process and system loading characteristics
• More adaptive to Grid scenarios, where both resources and applications are highly diverse and dynamic

Page 45: Scheduling for Grid Computing

A Taxonomy of Grid Scheduling Algorithms (cont.)

• Distributed vs. Centralized
– The centralized strategy has the advantage of ease of implementation, but suffers from a lack of scalability, poor fault tolerance, and the possibility of becoming a performance bottleneck.

Page 46: Scheduling for Grid Computing

A Taxonomy of Grid Scheduling Algorithms (cont.)

• Cooperative vs. Non-cooperative
– In the non-cooperative case, individual schedulers act alone as autonomous entities and arrive at decisions regarding their own optimal objectives independently of the effects of those decisions on the rest of the system.

– In the cooperative case, each Grid scheduler has the responsibility to carry out its own portion of the scheduling task, but all schedulers work toward a common system-wide goal.

Page 47: Scheduling for Grid Computing

Objective Functions

• The two major parties in Grid computing
– Resource consumers, who submit various applications
– Resource providers, who share their resources

• Application-centric
– Makespan
– Economic cost

• Resource-centric
– Resource utilization
– Economic profit

Page 48: Scheduling for Grid Computing

Application-Centric

• Aim to optimize the performance of each individual application, as application-level schedulers do.

• Time: makespan

• Grid economic model: economic cost

• QoS: quality of service

Page 49: Scheduling for Grid Computing

Resource-Centric

• Aim to optimize the performance of the resources

• Throughput: the ability of a resource to process a certain number of jobs in a given period

• Utilization: the percentage of time a resource is busy

• Grid economic model: economic profit

• TPCC (Total Processor Cycle Consumption): the total number of instructions the Grid could compute from the start of executing the schedule to its completion
– Represents the total computing power consumed by an application
– Advantage: it is little affected by the variance in resource performance, yet is still related to the makespan

Page 50: Scheduling for Grid Computing

Adaptive Scheduling

• The demand for scheduling adaptation comes from:
– The heterogeneity of candidate resources
– The dynamism of resource performance
– The diversity of applications

• Resource Adaptation
• Dynamic Performance Adaptation
• Application Adaptation

Page 51: Scheduling for Grid Computing

Resource Adaptation

• Su et al.: show how the selection of a data storage site affects the network transmission delay.

• Dail et al.: proposed a resource selection algorithm
– Available resources are first grouped into disjoint subsets according to the network delays between the subsets
– Inside each subset, resources are ranked according to their memory size and computational power
– An appropriately sized resource group is selected from the sorted lists

• Subhlok et al.: present algorithms that jointly analyze computational and communication resources for different application demands, and a framework for automatic node selection
– The algorithms are adaptive to demands like selecting a set of nodes that maximizes the minimum available bandwidth between any pair of nodes, or selecting a set of nodes that maximizes the minimum available fractional compute and communication capacities.

Page 52: Scheduling for Grid Computing

Dynamic Performance Adaptation

• Adaptation to the dynamic performance of resources is achieved by:
– Changing scheduling policies or rescheduling
– Distributing workload according to application-specific performance models
– Finding a proper number of resources to be used

• Usually adopts some kind of divide-and-conquer approach
– Parameter-sweep applications
– Data stripe processing

• Cluster-aware Random Stealing (CRS)
– Allows an idle resource to steal jobs not only from the local cluster but also from remote ones, with a very limited amount of wide-area communication

Page 53: Scheduling for Grid Computing

Application Adaptation

• Dail et al.: explicitly decouple the scheduler core from the application-specific and platform-specific components used by the core.

• Aggarwal et al.: resource reservation

• Wu et al.: give a very good example of how a self-adaptive scheduling algorithm cooperates with long-term resource performance prediction.
– The algorithm is adaptive to indivisible single sequential jobs, jobs that can be partitioned into independent parallel tasks, and jobs that have a set of indivisible tasks.

– When the prediction error of the system utilization reaches a threshold, the scheduler tries to reallocate tasks.

Page 54: Scheduling for Grid Computing

Task Dependency of an Application

• Independent
– Static
– Dynamic

• Dependent
– Static
• List algorithms
• Clustering algorithms
• Duplication-based algorithms
– Dynamic
– Static enhanced by dynamic rescheduling

Page 55: Scheduling for Grid Computing

Independent Task Scheduling

• Algorithms with performance estimates
– MET
– MCT
– Min-min
– Max-min
– XSufferage
– Task grouping

• Algorithms without performance estimates

Page 56: Scheduling for Grid Computing

MET Algorithm

• Minimum Execution Time

• Assigns each task to the resource with the best expected execution time for that task, regardless of whether that resource is available at the present time.

• The motivation behind MET is to give each task its best machine.

• This can cause severe load imbalance across machines.

Page 57: Scheduling for Grid Computing

MET Algorithm

for each arrived task S[k]
    for each host H[j] in heterogeneous machine set H
        find the minimum E[k,j] and the machine H[t] that obtains it
    endfor
    update the machine ready time: r[t] = r[t] + E[k,t]
endfor

S[k]: the task set
H[j]: the machine set
E[k,j]: the expected execution time of task S[k] on machine H[j]
r[t]: the expected ready time of machine H[t]
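A runnable Python version of the same MET procedure, for illustration only (exec_time stands in for the E[k,j] table above):

def met_schedule(tasks, hosts, exec_time):
    # Minimum Execution Time: each task goes to the host with the smallest
    # expected execution time, regardless of that host's current load.
    ready = {h: 0.0 for h in hosts}              # r[t], expected ready time per host
    mapping = {}
    for task in tasks:                           # tasks handled in arrival order
        best = min(hosts, key=lambda h: exec_time[task][h])
        mapping[task] = best
        ready[best] += exec_time[task][best]     # r[t] = r[t] + E[k,t]
    return mapping, ready

# Example with a tiny hypothetical E[k,j] table:
# exec_time = {"t1": {"h1": 4.0, "h2": 9.0}, "t2": {"h1": 3.0, "h2": 8.0}}
# met_schedule(["t1", "t2"], ["h1", "h2"], exec_time)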

Page 58: Scheduling for Grid Computing

MCT Algorithm

• Minimum Completion Time

• Assigns each task, in an arbitrary order, to the resource with the minimum expected completion time for that task.

• This causes some tasks to be assigned to machines that do not have the minimum execution time for them.

• The intuition behind MCT is to combine the benefits of Opportunistic Load Balancing (OLB) and MET, while avoiding the circumstances in which OLB and MET perform poorly.

Page 59: Scheduling for Grid Computing

MCT Algorithm

for each arrived task S[k]
    for each host H[j] in heterogeneous machine set H
        compute the predicted completion time: C[k,j] = E[k,j] + r[j]
        find the minimum C[k,j] and the machine H[t] that obtains it
    endfor
    update the machine ready time: r[t] = r[t] + E[k,t]
endfor

C[k,j]: the expected completion time of task S[k] on machine H[j]

Page 60: Scheduling for Grid Computing

Min-min Algorithm

• Algorithm:
– Begins with the set U of all unmapped tasks
– The set M of minimum completion times, one for each task in U, is found
– The task with the overall minimum completion time in M is selected and assigned to the corresponding machine
– The newly mapped task is removed from U, and the process repeats until all tasks are mapped (i.e., U is empty)

• Based on the minimum completion time, as is MCT.

Page 61: Scheduling for Grid Computing

Min-min Algorithm

for all tasks S[k] in scheduling set SS
    for all machines H[j] in heterogeneous host set H
        C[k,j] = E[k,j] + r[j]    // expected completion time of every task on every machine
do until all tasks in SS are mapped
    for each task in SS
        // find the minimum completion time of every unmapped task and the host that obtains it
        find the earliest (minimum) completion time and the host that obtains it
    endfor
    // among the minimum completion times of all unmapped tasks, find the overall minimum and the host H[j] that obtains it
    find the task S[k] with the minimum earliest completion time
    assign task S[k] to the host H[j] that gives the earliest completion time
    delete task S[k] from SS and update r[j]
    update C[k,j] for all hosts H[j]
enddo
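An illustrative Python sketch of Min-min in the same notation (exec_time stands in for E[k,j] and is an assumed input):

def min_min(tasks, hosts, exec_time):
    # Min-min: repeatedly pick the unmapped task whose earliest (best)
    # completion time is smallest overall, and map it to that host.
    ready = {h: 0.0 for h in hosts}
    mapping = {}
    unmapped = set(tasks)
    while unmapped:
        # earliest completion time and best host for every unmapped task
        best = {t: min((exec_time[t][h] + ready[h], h) for h in hosts)
                for t in unmapped}
        task = min(unmapped, key=lambda t: best[t][0])   # overall minimum
        finish, host = best[task]
        mapping[task] = host
        ready[host] = finish          # the chosen host is busy until then
        unmapped.remove(task)
    return mapping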

Page 62: Scheduling for Grid Computing

Max-min Algorithm

• The algorithm is very similar to Min-min:
– Begins with the set U of all unmapped tasks

– The set M of minimum completion times is found

– The task with the overall maximum completion time in M is selected and assigned to the corresponding machine

– The newly mapped task is removed from U, and the process repeats until all tasks are mapped
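In code, Max-min only changes the task-selection step of the Min-min sketch above; a short illustrative version under the same assumptions:

def max_min(tasks, hosts, exec_time):
    # Max-min: among the per-task earliest completion times, pick the task
    # with the LARGEST one first, so long tasks are placed early and short
    # tasks fill in around them.
    ready = {h: 0.0 for h in hosts}
    mapping = {}
    unmapped = set(tasks)
    while unmapped:
        best = {t: min((exec_time[t][h] + ready[h], h) for h in hosts)
                for t in unmapped}
        task = max(unmapped, key=lambda t: best[t][0])   # the only change vs. Min-min
        finish, host = best[task]
        mapping[task] = host
        ready[host] = finish
        unmapped.remove(task)
    return mapping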

Page 63: Scheduling for Grid Computing

• The Min-min and Max-min algorithms are simple and can easily be amended to adapt to different scenarios
– X. He et al.: present a QoS Guided Min-min heuristic that can guarantee the QoS requirements of particular tasks and minimize the makespan at the same time.

– Wu, Shu and Zhang: gave a Segmented Min-min algorithm.

Page 64: Scheduling for Grid Computing

Max-Int Algorithm (maximum time-interval algorithm)

for all tasks S[k] in scheduling set SS
    for all machines H[j] in heterogeneous host set H
        C[k,j] = E[k,j] + r[j]    // expected completion time of every task on every machine
do until all tasks in SS are mapped
    for each task S[k] in SS
        find the earliest (minimum) completion time C[k,m] and the host H[m] that obtains it
        find the second-earliest completion time C[k,n] and the host H[n] that obtains it
        compute the interval I[k] = C[k,n] - C[k,m] and store I[k] as an element of vector I
    endfor
    // among the intervals of all unmapped tasks, find the task S[t] with the maximum interval
    find the task S[t] with the maximum interval
    assign task S[t] to the host H[m] that gives its earliest completion time
    delete task S[t] from SS and update r[m]
    update C[k,m] for all remaining tasks in SS
enddo

Page 65: Scheduling for Grid Computing

Max-Int Algorithm

• Draws on the strengths of the Min-min and Max-min algorithms: besides historical scheduling information, it also uses prediction information to reduce the task scheduling time.

• Future scheduling decisions thus always tend toward the optimum.

Page 66: Scheduling for Grid Computing

Sufferage Algorithm

• A resource is assigned to the job that would suffer the greatest loss if it were not assigned to that node.

• Each job has a sufferage value, defined as the difference between its best completion time and its second-best completion time; jobs with higher sufferage values have priority.

Page 67: Scheduling for Grid Computing

Algorithm

for all jobs t[k] in job set T
    for all Grid nodes m[j]
        c[k,j] = e[k,j] + r[j]
do until all jobs in T are mapped
    for all jobs t[k] in job set T
        find the node m[j] with the earliest completion time
        sufferage value = second-best completion time - best completion time
        if m[j] has not been assigned in this round
            assign t[k] to m[j], delete t[k] from T, and mark m[j] as assigned
        else if the sufferage value of the job t[i] already assigned to m[j] is less than the sufferage value of t[k]
            unassign t[i], put t[i] back into T, assign t[k] to m[j], and delete t[k] from T
    endfor
    update vector r based on the jobs assigned to the machines, and update the c matrix
enddo
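An illustrative Python sketch of the Sufferage mechanic (assumed exec_time table as in the earlier sketches; one assignment round per outer iteration):

def sufferage(tasks, hosts, exec_time):
    # Sufferage: in each round a host is kept by the task that would suffer
    # most (largest gap between best and second-best completion time) if it
    # were denied that host.
    ready = {h: 0.0 for h in hosts}
    mapping = {}
    unmapped = set(tasks)
    while unmapped:
        claims = {}  # host -> (sufferage, task, completion time on that host)
        for t in unmapped:
            times = sorted((exec_time[t][h] + ready[h], h) for h in hosts)
            best_ct, best_h = times[0]
            suff = (times[1][0] - best_ct) if len(times) > 1 else 0.0
            # a task claims its best host unless a needier task already holds it
            if best_h not in claims or claims[best_h][0] < suff:
                claims[best_h] = (suff, t, best_ct)
        for host, (_, t, ct) in claims.items():
            mapping[t] = host
            ready[host] = ct
            unmapped.discard(t)
    return mapping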

Page 68: Scheduling for Grid Computing

Task Grouping

• In some cases an application consists of a large number of lightweight jobs. The overall processing of such applications involves a high overhead cost in terms of scheduling and transmission to or from Grid resources.

• Muthuvelu et al.: propose a dynamic task-grouping scheduling algorithm to deal with these cases.
– Once a set of fine-grained tasks is received, the scheduler groups them according to their computational requirements and the amount of processing a Grid resource can provide in a certain time period.

– All tasks in the same group are submitted to the same resource, which can finish them all in the given time.

– The overhead for scheduling and job launching is reduced, and resource utilization is increased.
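A minimal sketch of the grouping step (illustrative; task_length and capacity are hypothetical values expressed in the same work unit, e.g. million instructions per granularity period):

def group_tasks(task_length, capacity):
    # Pack fine-grained tasks into groups whose total length roughly matches
    # what one resource can process in the granularity period, so each group
    # is dispatched as a single coarse-grained job.
    groups, current, load = [], [], 0.0
    for task, length in task_length.items():
        if current and load + length > capacity:
            groups.append(current)        # close the group at the capacity limit
            current, load = [], 0.0
        current.append(task)
        load += length
    if current:
        groups.append(current)
    return groups

# e.g. group_tasks({"t1": 5, "t2": 7, "t3": 4, "t4": 6}, capacity=12)
# -> [["t1", "t2"], ["t3", "t4"]]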

Page 69: Scheduling for Grid Computing

Algorithms without Performance Estimate

• Do not use performance estimates, but adopt the idea of duplication, which is feasible in the Grid environment, where computational resources are usually abundant but mutable.

• Subramani et al.: a simple duplication scheme
– Distributes each job to the K least loaded sites
– Each of these K sites schedules the job locally
– When the job is able to start at any of the sites, that site informs the scheduler at the job-originating site, which in turn contacts the other K-1 sites to cancel the job from their respective queues.

• Silva et al.: Workqueue with Replication (WQR)

Page 70: Scheduling for Grid Computing

Dependent Task Scheduling

• Directed Acyclic Graph (DAG)
– A node represents a task.
– A directed edge denotes the precedence order between its two vertices.
– In some cases, weights can be added to nodes and edges to express computational costs and communication costs, respectively.

• Condor DAGMan, CoG, Pegasus, GridFlow, ASKALON
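For illustration, such a weighted DAG can be written down as plain Python dictionaries, with a topological order as the starting point for list scheduling (the node names and weights below are made up):

from graphlib import TopologicalSorter

# node -> computational cost, edge (u, v) -> communication cost (hypothetical values)
node_cost = {"A": 4, "B": 3, "C": 2, "D": 5}
edge_cost = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 3}

# predecessors of every node, derived from the edges
preds = {n: set() for n in node_cost}
for (u, v) in edge_cost:
    preds[v].add(u)

# a valid precedence-respecting order, the usual starting point of list heuristics
order = list(TopologicalSorter(preds).static_order())
print(order)   # e.g. ['A', 'B', 'C', 'D']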

Page 71: Scheduling for Grid Computing

DAG

Page 72: Scheduling for Grid Computing

Grid Systems Supporting Dependent Task Scheduling

• To run a workflow in a Grid:
– How the tasks in the workflow are scheduled: Grid workflow generators
– How to submit the scheduled tasks to Grid resources without violating the structure of the original workflow: Grid workflow engines

Page 73: Scheduling for Grid Computing

Taxonomy of Algorithms for Dependent Task Scheduling

• List heuristics
– Heterogeneous Earliest-Finish-Time (HEFT)
– Fast Critical Path (FCP)

• Duplication-based algorithms
– Task Duplication-based Scheduling (TDS)
– Task duplication-based scheduling Algorithm for Network of Heterogeneous systems (TANH)

• Clustering heuristics
– Dominant Sequence Clustering (DSC)
– CASS-II

Page 74: Scheduling for Grid Computing

Data Scheduling

• In high-energy physics, bioinformatics, and other disciplines, there are applications involving numerous parallel tasks that both access and generate large data sets, sometimes in the petabyte range.

• Remote data storage, access management, replication services, and data transfer protocols.

Page 75: Scheduling for Grid Computing

Park et al.'s model of cost measured in makespan

• Local Data and Local Execution

• Local Data and Remote Execution

• Remote Data and Local Execution

• Remote Data and Same Remote Execution

• Remote Data and Different Remote Execution

Page 76: Scheduling for Grid Computing

On Data Replication

• When the scheduling problem with data movement is considered, there are two situations: data replication is either allowed or not.

• In Pegasus, the CWG assumes that accessing an existing dataset is always preferable to generating a new one when it maps an abstract workflow onto a concrete one.

• Ranganathan et al.: view data sets in the Grid as a tiered system and use dynamic replication strategies to improve data access.

Page 77: Scheduling for Grid Computing

On Computation and Data Scheduling

• When the interaction of computation scheduling and data scheduling is considered, there are two different approaches:
– Decoupling computation scheduling from data scheduling
– Conducting combined scheduling

Page 78: Scheduling for Grid Computing

Non-traditional Approaches for Grid Task Scheduling

• Grid Economy
– Economic cost/profit considered
– No economic cost/profit considered

• Nature's Heuristics
– Genetic algorithms
– Simulated annealing
– Tabu search

Page 79: Scheduling for Grid Computing

Scheduling under QoS Constraints

• In a distributed, heterogeneous, non-dedicated environment, quality of service (QoS) is a big concern for many applications. The meaning of QoS varies according to the concerns of different users: it could be a requirement on CPU speed, memory size, bandwidth, software version, or deadline.

• In general, QoS is not the ultimate objective of an application, but a set of conditions needed to run the application successfully.

Page 80: Scheduling for Grid Computing

Strategies Treating Dynamic Resource Performance

• On-Time-Information from GIS

• Performance prediction based on GIS, historical records, and workload modeling
– On prediction accuracy
– Prediction based on historical records
– Prediction based on workload modeling

• Rescheduling

Page 81: Scheduling for Grid Computing

Open Issues in Grid Scheduling

• Application and enhancement of classic heterogeneous scheduling algorithms in the Grid environment

• New algorithms utilizing dynamic performance prediction

• New rescheduling algorithms adaptive to performance variation

• New algorithms under QoS constraints

• New algorithms considering combined computation and data scheduling

• New problems introduced by new models

• New algorithms utilizing the Grid resource overlay structure

Page 82: Scheduling for Grid Computing

How to Read Technical Papers

• Learn multiple ways of finding relevant literature
– Google + keywords
– Citations in papers
– Web pages of well-known experts
– Master's and PhD theses
– Papers from recent academic conferences

• Go from individual points to the whole field, from the rough to the refined, from quantity to quality

• A good memory is no match for good notes

• Be good at summarizing and proposing your own ideas
– How can I make use of this paper?
– Is it really as the authors claim?
– What would happen if ...?