Fault tolerant scheduling system for computational grids

Upload: ghulam-asfia

Post on 18-Jul-2015

Department of Computer Science

DCS

COMSATS Institute of Information Technology

A Fault tolerant Scheduling System for Computational Grids

Presented to:

Dr.Babar Nazir

Presented By:

Ghulam Asfia

Grid

Points to be discussed

Introduction to grid computing

Scheduling

General Discussion

Problem Statement

Proposed Solution

Conclusion

Questions?

Grid Computing

• What is Grid Computing?

• Grid computing is the use of the resources of many computers from different domains, connected by a network, to achieve a common goal.

• Explanation

Scheduling

Scheduling is the process of allocating jobs to available resources over time. This process has to respect the constraints imposed by the jobs and the grid.

Terms of Grid Scheduling

A task is an atomic unit to be scheduled by the scheduler and assigned to a resource.

A job (or metatask, or application) is a set of atomic tasks that will be carried out on a set of resources. A job can have a recursive structure, meaning that jobs are composed of sub-jobs and/or tasks, and sub-jobs can themselves be decomposed further into atomic tasks.

A resource is something that is required to carry out an operation, for example: a processor for data processing, a data storage device, or a network link for data transport.

A site (or node) is an autonomous entity composed of one or multiple resources.

Task scheduling is the mapping of tasks to a selected group of resources, which may be distributed across multiple administrative domains.
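The terms above can be sketched as a small data model. This is only an illustration, not part of the presented system: the class names, field names, and the `kind` labels are hypothetical, chosen to mirror the definitions of task, job, resource, and site.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Atomic unit that the scheduler assigns to a resource."""
    task_id: str


@dataclass
class Job:
    """A set of tasks and/or sub-jobs, so the structure can be recursive."""
    job_id: str
    parts: List[Union["Job", Task]] = field(default_factory=list)

    def atomic_tasks(self) -> List[Task]:
        """Flatten the job tree into its atomic tasks, depth first."""
        tasks: List[Task] = []
        for part in self.parts:
            if isinstance(part, Job):
                tasks.extend(part.atomic_tasks())
            else:
                tasks.append(part)
        return tasks


@dataclass
class Resource:
    """Something required to carry out an operation."""
    resource_id: str
    kind: str  # hypothetical labels, e.g. "processor", "storage", "network"


@dataclass
class Site:
    """Autonomous entity (node) composed of one or more resources."""
    site_id: str
    resources: List[Resource] = field(default_factory=list)
```

For example, `Job("j1", [Task("t1"), Job("j1.1", [Task("t2"), Task("t3")])]).atomic_tasks()` walks the nested sub-job and returns the three atomic tasks in order.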

Problem description

• Users submit jobs with their QoS requirements. The grid scheduler schedules these jobs on the most suitable resources according to the resource response time and the fault index. The resource executes the job and the result is returned to the user.

• Major drawback: some resources that satisfy the response-time criterion nevertheless have a tendency to fail. Also, the fault index is not a suitable indicator of a resource's failure history. As a result, the scheduler may select resources that have a higher tendency to fail.

Flow Chart of Problem Statement

Major Contribution

• To introduce a fault-tolerant system with a scheduling strategy that depends on a new factor called the scheduling indicator (SI). This indicator combines the response time and the fault rate of resources in the grid. The main idea behind the proposed system is to avoid resources that fail frequently.

• The proposed system is compared with the most recent fault-tolerant scheduling system.

• Improved grid reliability

Components of Proposed System

Five main components:

• Grid portal.

• Scheduler.

• Resource information server.

• Fault handler.

• Grid resources.

Flow chart of proposed Solution

Components of Proposed System (continued)

Grid portal: provides an interface for users to submit their jobs for execution.

Scheduler: selects the optimal resources to execute the task.

Resource Information Server (RIS): contains information on all the resources of the grid. The information may include computing speed, current load, and memory.

Fault handler: responsible for detecting resource failures and for estimating the fault rate of resources.

Architecture of Proposed system

The scheduler’s operation

• The scheduler receives user jobs and their information from the grid portal. Job information includes job number, job type, and job size.

• It assigns each job to the most reliable, suitable, and available resource. The most reliable resource is the one with the lowest fault rate. This is known from the resource failure history kept in the RIS, which stores the fault rate of each resource in the grid.
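The selection rule described above (among suitable and available resources, prefer the lowest stored fault rate) can be sketched as follows. This is a minimal illustration under assumptions, not the presented scheduling algorithm: the function name, the dictionary layout standing in for the RIS, and the resource identifiers are all hypothetical.

```python
from typing import Dict, Iterable, Optional


def pick_most_reliable(
    candidates: Iterable[str],
    fault_rate: Dict[str, float],
    available: Dict[str, bool],
) -> Optional[str]:
    """Return the available candidate with the lowest stored fault rate.

    `candidates` are the resources already judged suitable for the job.
    `fault_rate` and `available` stand in for data held by the RIS
    (a hypothetical layout). Returns None when no candidate is usable.
    """
    usable = [
        r for r in candidates
        if available.get(r, False) and r in fault_rate
    ]
    if not usable:
        return None
    return min(usable, key=lambda r: fault_rate[r])


# Hypothetical RIS snapshot: r3 is the most reliable but is not available.
ris_fault_rate = {"r1": 0.20, "r2": 0.05, "r3": 0.01}
ris_available = {"r1": True, "r2": True, "r3": False}
chosen = pick_most_reliable(["r1", "r2", "r3"], ris_fault_rate, ris_available)
```

Here `chosen` is `"r2"`, because `r3` is filtered out as unavailable and `r2` has a lower fault rate than `r1`.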

The scheduler’s operation (continued)

• The fault rate Pfj of resource j is defined by:

• To achieve its purpose, the scheduler creates an SI matrix. Each entry in the matrix is the scheduling indicator of one job on one suitable resource in the grid. Assuming there are m resources and n jobs, the SI matrix will be as follows:
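As a rough sketch of how such a matrix might be assembled, the code below builds an n-by-m table with one row per job and one column per resource, leaving unsuitable pairs empty. Several things here are assumptions rather than content from the slides: the row and column orientation, the input layout, and above all the `placeholder_si` formula. The slides say only that SI combines response time and fault rate, so the combining function is supplied by the caller.

```python
from typing import Callable, List, Optional

Matrix = List[List[Optional[float]]]


def build_si_matrix(
    response_time: List[List[float]],  # response_time[i][j]: job i on resource j
    fault_rate: List[float],           # fault_rate[j]: fault rate of resource j
    suitable: List[List[bool]],        # suitable[i][j]: can resource j run job i?
    combine: Callable[[float, float], float],
) -> Matrix:
    """Build an n x m SI matrix (n jobs, m resources).

    Entries for unsuitable (job, resource) pairs are None.
    """
    n = len(response_time)
    m = len(fault_rate)
    si: Matrix = []
    for i in range(n):
        row: List[Optional[float]] = []
        for j in range(m):
            if suitable[i][j]:
                row.append(combine(response_time[i][j], fault_rate[j]))
            else:
                row.append(None)
        si.append(row)
    return si


def placeholder_si(rt: float, pf: float) -> float:
    # Hypothetical formula, NOT taken from the slides:
    # response time inflated in proportion to the fault rate.
    return rt * (1.0 + pf)


# Two jobs, two resources; resource 0 cannot run job 1.
si_matrix = build_si_matrix(
    response_time=[[10.0, 20.0], [5.0, 8.0]],
    fault_rate=[0.5, 0.0],
    suitable=[[True, True], [False, True]],
    combine=placeholder_si,
)
```

Passing the formula in as `combine` keeps the matrix construction independent of how SI is actually defined, so the real definition can be substituted without touching the loop.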

Inside Scheduler

• The agent’s role

• Scheduler agent (SA)

• Job agent (JA)

• Result agent (RA)

• Fault handler agent (FHA)

Scheduling Algorithm

Result Agent Algorithm

Fault Handler Algorithm

Conclusion

• In this paper, a fault tolerant scheduling system for computational grids is proposed. The system's performance is evaluated under different conditions against a recent fault tolerant scheduling system that depends on the response time and the fault index. The parameters used for the evaluation are throughput, turnaround time, availability, and the tendency to fail.

Thank you
