Fault Tolerant Scheduling System for Computational Grid

Department of Computer Science

DCS

COMSATS Institute of Information Technology

A Fault-Tolerant Scheduling System for Computational Grids

Presented to:

Dr. Babar Nazir

Presented By:

Ghulam Asfia

1


Grid

2


Points to be discussed

Introduction to grid computing

Scheduling

General Discussion

Problem Statement

Proposed Solution

Conclusion

Questions?

3


Grid Computing

• What is Grid Computing?

• Grid computing means using the resources of many computers from different domains, connected by a network, to achieve a common goal.

• Explanation

4


Scheduling

Scheduling is the process of allocating jobs to available resources over time. This process has to respect the constraints imposed by the jobs and the grid.

5


Terms of Grid Scheduling

A task is an atomic unit to be scheduled by the scheduler and assigned to a resource.

A job (or metatask, or application) is a set of atomic tasks that will be carried out on a set of resources. A job can have a recursive structure: jobs are composed of sub-jobs and/or tasks, and sub-jobs can themselves be decomposed further into atomic tasks.

A resource is something that is required to carry out an operation, for example a processor for data processing, a data storage device, or a network link for data transport.

A site (or node) is an autonomous entity composed of one or multiple resources.

Task scheduling is the mapping of tasks to a selected group of resources, which may be distributed across multiple administrative domains.

6
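The terms above can be pictured as a small data model. The following is only a minimal sketch: the class names, fields, and the `atomic_tasks` helper are illustrative assumptions, not part of the presented system.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Atomic unit of work that the scheduler assigns to a resource."""
    name: str


@dataclass
class Job:
    """A set of tasks; it may contain sub-jobs, so the structure is recursive."""
    name: str
    parts: List[Union["Job", Task]] = field(default_factory=list)

    def atomic_tasks(self) -> List[Task]:
        """Flatten the job tree into the atomic tasks that must be scheduled."""
        tasks: List[Task] = []
        for part in self.parts:
            if isinstance(part, Job):
                tasks.extend(part.atomic_tasks())
            else:
                tasks.append(part)
        return tasks


@dataclass
class Resource:
    """Something required to carry out an operation (processor, storage, link)."""
    name: str
    kind: str


@dataclass
class Site:
    """Autonomous entity (node) composed of one or more resources."""
    name: str
    resources: List[Resource] = field(default_factory=list)


# Hypothetical job with one direct task and one sub-job of two tasks.
job = Job("render", [Task("load"), Job("frames", [Task("frame-1"), Task("frame-2")])])
print([t.name for t in job.atomic_tasks()])  # ['load', 'frame-1', 'frame-2']
```

Flattening the recursive job into atomic tasks reflects the definition that the task, not the job, is the unit the scheduler maps onto resources.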


Problem description

• Users submit jobs with their QoS requirements. The grid scheduler schedules these jobs on the most suitable resources according to the resource response time and the fault index. The resource executes the job and the result is returned to the user.

• Major drawback: some resources that satisfy the response-time criterion also have a tendency to fail. In addition, the fault index is not a suitable indicator of a resource's failure history. As a result, the scheduler may select resources with a higher tendency to fail.

7


Flow Chart of Problem Statement

8


Major Contribution

• To introduce a fault-tolerant system with a scheduling strategy that depends on a new factor called the scheduling indicator (SI). This indicator combines the response time and the fault rate of resources in the grid. The main idea behind the proposed system is to avoid resources that fail frequently.

• Comparison with a recent scheduling system.

• Improved grid reliability

9


Components of Proposed System

Five main components:

• Grid portal.

• Scheduler.

• Resource information server.

• Fault handler.

• Grid resources.

10


Flow Chart of Proposed Solution

11


Continued

Grid Portal: provides an interface for users to submit their jobs for execution.

Scheduler: selects the optimal resources to execute the task.

The Resource Information Server (RIS):

Contains information on all the resources of the grid. The information may include computing speed, the available load, and memory.

Fault handler: responsible for detecting failures in resources and estimating the fault rate of each resource.
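The interaction between the RIS and the fault handler might look roughly like the sketch below. All class names, field names, and units are hypothetical, and the fault-rate estimate (failed jobs divided by submitted jobs) is an assumption made for illustration, since the exact estimator is not stated in this text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ResourceRecord:
    """Per-resource information kept by the RIS (fields are assumptions)."""
    speed: float           # computing speed, arbitrary unit
    load: float            # available load, assumed scale 0.0 to 1.0
    memory_mb: int         # memory
    jobs_submitted: int = 0
    jobs_failed: int = 0


class ResourceInformationServer:
    """Holds information on all resources of the grid."""

    def __init__(self) -> None:
        self._records: Dict[str, ResourceRecord] = {}

    def register(self, name: str, record: ResourceRecord) -> None:
        self._records[name] = record

    def get(self, name: str) -> ResourceRecord:
        return self._records[name]


class FaultHandler:
    """Records job outcomes and estimates each resource's fault rate."""

    def __init__(self, ris: ResourceInformationServer) -> None:
        self.ris = ris

    def report(self, name: str, failed: bool) -> None:
        record = self.ris.get(name)
        record.jobs_submitted += 1
        if failed:
            record.jobs_failed += 1

    def fault_rate(self, name: str) -> float:
        # Assumed estimator: fraction of submitted jobs that failed.
        record = self.ris.get(name)
        if record.jobs_submitted == 0:
            return 0.0
        return record.jobs_failed / record.jobs_submitted


ris = ResourceInformationServer()
ris.register("r1", ResourceRecord(speed=20.0, load=0.3, memory_mb=4096))
handler = FaultHandler(ris)
for outcome in (False, False, True, False):
    handler.report("r1", failed=outcome)
print(handler.fault_rate("r1"))  # 0.25
```

Keeping the failure counters inside the RIS record mirrors the description that the failure history lives in the RIS while the fault handler is the component that updates and interprets it.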

12


Architecture of Proposed System

13


The scheduler’s operation

14

• The scheduler receives user jobs and their information from the grid portal. Job information includes the job number, job type, and job size.

• It assigns each job to the most reliable, suitable, and available resource. The most reliable resource is the one with the lowest fault rate, which is known from the resource failure history stored in the RIS. This server stores the fault rate of each resource in the grid.
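A minimal sketch of this selection rule follows. The suitability test (a set of supported job types) and all names are assumptions introduced for illustration; the source only states that the chosen resource must be suitable, available, and have the lowest fault rate.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Job:
    number: int
    job_type: str
    size: float


@dataclass
class Resource:
    name: str
    supported_types: FrozenSet[str]  # hypothetical suitability criterion
    available: bool
    fault_rate: float                # as stored in the RIS


def pick_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Return the suitable, available resource with the lowest fault rate."""
    candidates = [
        r for r in resources
        if r.available and job.job_type in r.supported_types
    ]
    if not candidates:
        return None  # no resource can take the job right now
    return min(candidates, key=lambda r: r.fault_rate)


resources = [
    Resource("r1", frozenset({"cpu"}), available=True, fault_rate=0.30),
    Resource("r2", frozenset({"cpu", "io"}), available=True, fault_rate=0.05),
    Resource("r3", frozenset({"cpu"}), available=False, fault_rate=0.01),
]
chosen = pick_resource(Job(number=1, job_type="cpu", size=100.0), resources)
print(chosen.name)  # r2
```

Here `r3` has the lowest fault rate but is skipped because it is unavailable, so the rule falls back to `r2`.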


Continued

• The fault rate P_fj of resource j is defined by:

• To achieve its purpose, the scheduler creates an SI matrix. Each entry in the matrix is the scheduling indicator of one job for one suitable resource in the grid. Assuming there are m resources and n jobs, the SI matrix is as follows:
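As a rough sketch of how such a matrix might be built and used: the code below assumes rows are jobs and columns are resources, estimates response time as job size divided by resource speed, and uses a stand-in indicator `response_time * (1 + fault_rate)`. None of these choices come from the source, which does not give the SI formula in this text; they only illustrate combining response time with fault rate so that lower values are preferred.

```python
from typing import List


def build_si_matrix(
    job_sizes: List[float],
    speeds: List[float],
    fault_rates: List[float],
) -> List[List[float]]:
    """Build an n x m matrix: one row per job, one column per resource."""
    if len(speeds) != len(fault_rates):
        raise ValueError("speeds and fault_rates must describe the same resources")
    matrix: List[List[float]] = []
    for size in job_sizes:
        row = []
        for speed, fault_rate in zip(speeds, fault_rates):
            response_time = size / speed  # assumed response-time model
            # Stand-in SI (assumption): penalize failure-prone resources.
            row.append(response_time * (1.0 + fault_rate))
        matrix.append(row)
    return matrix


def best_resource_per_job(matrix: List[List[float]]) -> List[int]:
    """For each job (row), return the column index with the smallest SI."""
    return [min(range(len(row)), key=row.__getitem__) for row in matrix]


# Two jobs, three resources; the third resource is fastest but fails often.
si = build_si_matrix(
    job_sizes=[100.0, 400.0],
    speeds=[10.0, 20.0, 25.0],
    fault_rates=[0.05, 0.10, 0.60],
)
print(best_resource_per_job(si))  # [1, 1]
```

In this example the fastest resource (index 2) is passed over for both jobs because its fault rate is high, which is the behavior the scheduling indicator is meant to encourage.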

15


Continued

16


Inside Scheduler

• The agent’s role

• Scheduler agent (SA)

• Job agent (JA)

• Result agent (RA)

• Fault Handler Agent (FHA)

17


Scheduling Algorithm

18


Result Agent Algorithm

19


Fault Handler Algorithm

20


Conclusion

• In this paper, a fault-tolerant scheduling system for computational grids is proposed. The system's performance is evaluated under different conditions against a recent fault-tolerant scheduling system that depends on the response time and the fault index. The parameters used for the evaluation are throughput, turnaround time, availability, and the tendency to fail.

21


Thank you

22
