Fault-tolerant scheduling system for computational grids
Department of Computer Science
DCS
COMSATS Institute of Information Technology
A Fault-Tolerant Scheduling System for Computational Grids
Presented to:
Dr. Babar Nazir
Presented By:
Ghulam Asfia
Points to be discussed
Introduction to grid computing
Scheduling
General Discussion
Problem Statement
Proposed Solution
Conclusion
Questions?
Grid Computing
• What is Grid Computing?
• Grid computing is the use of the resources of many
computers from different domains, connected by a
network, to achieve a common goal.
• Explanation
Scheduling
Scheduling is the process of allocating jobs to
available resources over time. This process has to
respect the constraints imposed by the jobs and the
grid.
Terms of Grid Scheduling
A task is an atomic unit to be scheduled by the scheduler and assigned to a resource.
A job (or metatask, or application) is a set of atomic tasks that will be carried out on a set of resources. A job can have a recursive structure: jobs are composed of sub-jobs and/or tasks, and sub-jobs can themselves be decomposed further into atomic tasks.
A resource is something that is required to carry out an operation, for example a processor for data processing, a data storage device, or a network link for data transport.
A site (or node) is an autonomous entity composed of one or multiple resources.
Task scheduling is the mapping of tasks to a selected group of resources, which may be distributed across multiple administrative domains.
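The terms above can be sketched as a small data model. This is only an illustration: the class and field names (`Task`, `Job`, `Resource`, `Site`, `parts`, `kind`) are assumptions chosen for this sketch and do not come from the presented system.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """Atomic unit that the scheduler assigns to a resource."""
    name: str


@dataclass
class Job:
    """A job is made of tasks and/or sub-jobs (recursive structure)."""
    name: str
    parts: List[Union["Job", Task]] = field(default_factory=list)

    def atomic_tasks(self) -> List[Task]:
        # Flatten the job tree into its atomic tasks, depth first.
        tasks: List[Task] = []
        for part in self.parts:
            if isinstance(part, Task):
                tasks.append(part)
            else:
                tasks.extend(part.atomic_tasks())
        return tasks


@dataclass
class Resource:
    """Something needed to carry out an operation (processor, storage, link)."""
    name: str
    kind: str


@dataclass
class Site:
    """Autonomous entity (node) composed of one or more resources."""
    name: str
    resources: List[Resource] = field(default_factory=list)


# Hypothetical example: a job with one task and one sub-job of two tasks.
job = Job("render", [Task("t1"), Job("sub", [Task("t2"), Task("t3")])])
task_names = [t.name for t in job.atomic_tasks()]
```

Flattening a job into its atomic tasks is what allows a scheduler to treat every task as an independent unit to be mapped to a resource.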
Problem description
• Users submit jobs with their QoS requirements. The grid scheduler schedules these jobs on the most suitable resources according to the resource response time and the fault index. The resource executes the job and the result is returned to the user.
• Major drawback: some resources fulfill the response-time criterion yet have a tendency to fail. Also, the fault index is not a suitable indicator of a resource's failure history. As a result, resources with a higher tendency to fail may be selected.
Major Contributions
• To introduce a fault-tolerant system with a scheduling strategy that depends on a new factor called the scheduling indicator (SI). This indicator combines the response time and the fault rate of resources in the grid. The main idea behind the proposed system is to avoid resources that fail frequently.
• The proposed system is compared with the most recent scheduling system.
• Improved grid reliability
Components of Proposed System
Five main components:
• Grid portal.
• Scheduler.
• Resource information server.
• Fault handler.
• Grid resources.
Components of Proposed System (continued)
Grid portal: provides an interface for users to submit their jobs for execution.
Scheduler: selects the optimal resources to execute the task.
Resource Information Server (RIS): contains information on all the resources of the grid. The information may include computing speed, load, and memory.
Fault handler: responsible for detecting resource failures and estimating the fault rate of resources.
The scheduler’s operation
• The scheduler receives user jobs and their information from the grid portal. Job information includes job number, job type, and job size.
• It assigns each job to the most reliable, suitable, and available resource. The most reliable resource is the one with the lowest fault rate, which is known from the history of resource failures stored in the RIS. This server stores the fault rate of each resource in the grid.
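The selection rule described above (lowest fault rate among suitable and available resources) can be sketched as follows. The function name `pick_most_reliable`, the dictionary standing in for the RIS, and the sample fault rates are all hypothetical; the slides do not specify how the RIS is queried.

```python
from typing import Dict, Iterable, Optional


def pick_most_reliable(
    fault_rates: Dict[str, float],  # assumed RIS view: resource name -> fault rate
    suitable: Iterable[str],        # resources able to run the job
    available: Iterable[str],       # resources currently free
) -> Optional[str]:
    """Return the suitable, available resource with the lowest fault rate."""
    candidates = set(suitable) & set(available) & set(fault_rates)
    if not candidates:
        return None  # no resource can take the job right now
    # Sort first so that ties are broken deterministically by name.
    return min(sorted(candidates), key=lambda name: fault_rates[name])


# Hypothetical fault rates; r2 is the most reliable but is not available.
ris = {"r1": 0.30, "r2": 0.05, "r3": 0.10}
chosen = pick_most_reliable(ris, ["r1", "r2", "r3"], ["r1", "r3"])
```

Here `chosen` is `"r3"`, because `r2` is excluded by availability and `r3` has a lower fault rate than `r1`.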
The scheduler’s operation (continued)
• The fault rate Pfj of resource j is defined by:
• To achieve its purpose, the scheduler creates an SI matrix. Each entry in the matrix represents the scheduling indicator of a job for a suitable resource in the grid. Assuming there are m resources and n jobs, the SI matrix will be as follows:
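A minimal sketch of building such a matrix is given here, with one row per job and one column per resource. The slides only state that SI combines response time and fault rate, so the combination rule used here, SI = response time × (1 + fault rate), is purely an illustrative assumption, as are the function names and the sample numbers. Under this assumed rule, a lower SI is better.

```python
from typing import List


def build_si_matrix(
    response_times: List[List[float]],  # n jobs x m resources
    fault_rates: List[float],           # one fault rate per resource (length m)
) -> List[List[float]]:
    """Build an n x m matrix of scheduling indicators.

    ASSUMPTION: SI = response_time * (1 + fault_rate). The source does not
    give the actual formula; this rule only penalizes failure-prone resources.
    """
    return [
        [rt * (1.0 + pf) for rt, pf in zip(row, fault_rates)]
        for row in response_times
    ]


def best_resource_per_job(si_matrix: List[List[float]]) -> List[int]:
    """For each job (row), return the column index with the lowest SI."""
    return [min(range(len(row)), key=row.__getitem__) for row in si_matrix]


# Hypothetical data: 2 jobs, 2 resources. Resource 0 is faster for job 0
# but fails often (fault rate 0.5); resource 1 has never failed.
response_times = [[10.0, 12.0], [20.0, 18.0]]
fault_rates = [0.5, 0.0]
si = build_si_matrix(response_times, fault_rates)
assignment = best_resource_per_job(si)
```

With these numbers, job 0 goes to resource 1 even though resource 0 responds faster, which illustrates the stated goal of avoiding resources that fail frequently.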
Inside Scheduler
• The agent’s role
• Scheduler agent (SA)
• Job agent (JA)
• Result agent (RA)
• Fault Handler Agent (FHA)
Conclusion
• In this paper, a fault-tolerant scheduling system for computational grids is proposed. The system's performance is compared, under different conditions, with a recent fault-tolerant scheduling system that depends on the response time and the fault index. The parameters used for the evaluation are throughput, turnaround time, availability, and the tendency of failure.