grid computing
Grid Service Reliability Modeling and Optimal Task
Scheduling Considering Fault Recovery
1
Abstract
• There has been considerable research on the development of tools and techniques for grid systems.
• However, key issues such as grid service reliability and task scheduling in the grid have not been sufficiently studied.
• Some grid services have large subtasks that require long computation times, so the reliability of the grid service could be low.
2
Abstract
• To resolve these problems, we use a Local Node Fault Recovery (LNFR) mechanism and ant colony optimization (ACO).
• LNFR helps recover from the faults that occur in grid systems.
• ACO helps optimize the scheduling of tasks that take a long time to compute.
3
CONTENTS
• Introduction
• Problem statement
• Existing System
• Proposed System
• Advantages of current system
• Modules
4
INTRODUCTION
• Grid computing has emerged as the next-generation parallel and distributed computing methodology.
• Its goal is to provide a service-oriented infrastructure that enables easy access to resources for solving various kinds of large-scale parallel applications over wide area networks.
• Grid computing is now widely accepted and actively studied by researchers.
5
System Architecture
[Architecture diagram: Resources Control Server; ACO scheduler and processor; Grid Node1, Grid Node2, Grid Node3]
6
Problem statement
• To achieve a high level of reliability and availability, the grid infrastructure should be fault tolerant, since resource failures can be fatal to job execution.
• A fault tolerance service is essential to satisfy QoS requirements in grid computing.
7
Existing system
• Grid service reliability can be defined as the probability that the subtasks involved in the considered service are executed successfully.
• The modeling and analysis of grid service reliability has mainly concentrated on resource management and sharing.
• However, it has not addressed fault recovery and task scheduling.
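The definition of reliability above can be illustrated numerically. This is a minimal sketch only: it assumes that subtasks fail independently and that each node has a constant failure rate, neither of which is stated in the slides. The function name and the sample numbers are hypothetical.

```python
import math

def service_reliability(subtasks):
    """Probability that every subtask finishes without a failure.

    `subtasks` is a list of (failure_rate_per_hour, duration_hours) pairs.
    Assumes independent subtasks and a constant failure rate per node
    (an illustrative assumption, not a model taken from the slides).
    """
    reliability = 1.0
    for failure_rate, duration in subtasks:
        # With a constant failure rate, a subtask of length `duration`
        # survives with probability exp(-rate * duration).
        reliability *= math.exp(-failure_rate * duration)
    return reliability

# Hypothetical workloads: the same two nodes, short versus multi-day subtasks.
short_job = [(0.001, 2.0), (0.002, 1.0)]
long_job = [(0.001, 72.0), (0.002, 48.0)]
print(service_reliability(short_job))
print(service_reliability(long_job))
```

Under these assumptions the longer job has the lower reliability, which matches the concern that long-running subtasks reduce grid service reliability.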
8
Existing System
• Grid services may perform long tasks that require several days of computation.
• Some grid services have large subtasks that require time-consuming computation, so the reliability of the grid service could be low.
9
Proposed System
• The basic approach we propose for fault recovery in grid systems is a Remote Node Fault Recovery (RNFR) mechanism.
• That is, when a failure occurs on a node, the state information can be migrated to another node.
• Failed subtask execution is then resumed from the interrupted point.
(or)
• The failed subtask can be dynamically rescheduled on another node, and that node restarts the subtask from the beginning.
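The two recovery options above (resume from the interrupted point, or restart from the beginning on another node) can be compared with a small timing sketch. The function, the single-failure scenario, and the `migration_cost` parameter are all hypothetical illustrations, not part of the project.

```python
def completion_time(work, failure_at, strategy, migration_cost=0.5):
    """Total time to finish `work` time units when one failure hits at `failure_at`.

    "resume":  state is migrated to another node (costing `migration_cost`),
               which continues from the interrupted point.
    "restart": the subtask is rescheduled on another node and starts over.
    All values are hypothetical and assume a single failure.
    """
    if not 0 <= failure_at < work:
        raise ValueError("failure must occur before the subtask completes")
    if strategy == "resume":
        remaining = work - failure_at
        return failure_at + migration_cost + remaining
    if strategy == "restart":
        return failure_at + work
    raise ValueError(f"unknown strategy: {strategy}")

print(completion_time(10.0, 6.0, "resume"))
print(completion_time(10.0, 6.0, "restart"))
```

In this simplified model, resuming is cheaper whenever the work already done exceeds the cost of migrating state, which is why it matters most for long subtasks.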
10
Proposed System
• RNFR is useful and effective for recovering grid tasks from failures.
• However, some complex tasks may require several days of computation.
• Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented.
• An ant colony optimization (ACO) algorithm is developed to solve it effectively.
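To make the ACO idea concrete, here is a minimal sketch of an ant colony search that assigns tasks to nodes. It is not the algorithm from the project: the slides do not specify the objectives, so the cost function (makespan plus a penalty on failure probability), the constant-failure-rate assumption, and every name and parameter below are hypothetical.

```python
import math
import random

def schedule_cost(assignment, task_sizes, node_speeds, failure_rates, penalty=10.0):
    """Makespan plus a penalty on failure probability (lower is better).

    Combining the two objectives in one weighted sum is an assumption.
    """
    busy = [0.0] * len(node_speeds)
    for task, node in enumerate(assignment):
        busy[node] += task_sizes[task] / node_speeds[node]
    makespan = max(busy)
    # Assumes independent nodes with constant failure rates.
    reliability = math.exp(-sum(r * t for r, t in zip(failure_rates, busy)))
    return makespan + penalty * (1.0 - reliability)

def aco_schedule(task_sizes, node_speeds, failure_rates,
                 ants=20, iterations=50, evaporation=0.1, seed=0):
    rng = random.Random(seed)
    n_tasks, n_nodes = len(task_sizes), len(node_speeds)
    # pheromone[t][n]: learned desirability of placing task t on node n.
    pheromone = [[1.0] * n_nodes for _ in range(n_tasks)]
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        for _ in range(ants):
            # Each ant picks a node for every task with probability
            # proportional to pheromone times a heuristic (node speed).
            assignment = [
                rng.choices(
                    range(n_nodes),
                    weights=[pheromone[t][n] * node_speeds[n] for n in range(n_nodes)],
                )[0]
                for t in range(n_tasks)
            ]
            cost = schedule_cost(assignment, task_sizes, node_speeds, failure_rates)
            if cost < best_cost:
                best, best_cost = assignment, cost
        # Evaporate all trails, then reinforce the best assignment so far.
        for t in range(n_tasks):
            for n in range(n_nodes):
                pheromone[t][n] *= 1.0 - evaporation
            pheromone[t][best[t]] += 1.0 / best_cost
    return best, best_cost

# Hypothetical instance: four tasks, one slow node and one fast node.
sizes, speeds, rates = [4, 3, 2, 1], [1.0, 2.0], [0.01, 0.01]
best, cost = aco_schedule(sizes, speeds, rates)
print(best, cost)
```

The fixed seed only makes the sketch reproducible; a real scheduler would also need to handle node arrivals, departures, and failures during the search.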
11
MODULES
• The application is split into the following modules:
– USERS
– RESOURCE
– GRID
12
MODULES
Users:
-- Connect to Grid
-- Send request to Grid
-- Get response from Grid
Resource:
-- Accept request from Grid
-- Process and send output to Grid
13
MODULES
Grid:
-- Maintain Users/Nodes
-- Take request from User
-- Schedule task to Resource
-- Accept the output from Resource
-- Validate the output
-- Reschedule the task (if output is invalid)
-- Notify the Grid of the fault generated
-- Send output to User (if valid)
-- Handle network failure and notify the Grid Manager
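The Grid module's duties (schedule, validate, reschedule on invalid output, record faults) can be sketched as a simple dispatch loop. This is an illustration under assumptions: resources are modeled as plain Python callables, and `run_on_grid`, `validate`, and the round-robin choice of resource are hypothetical names and choices, not interfaces from the project.

```python
def run_on_grid(request, resources, validate, max_attempts=3):
    """Dispatch a request, validate the output, and reschedule on failure.

    `resources` is a list of callables standing in for resource nodes and
    `validate` returns True when an output is acceptable (both hypothetical).
    Returns (output, fault_log).
    """
    fault_log = []
    for attempt in range(max_attempts):
        # Simple round-robin choice of the next resource to try.
        resource = resources[attempt % len(resources)]
        try:
            output = resource(request)
        except ConnectionError as exc:  # stands in for a network failure
            fault_log.append((attempt, f"network failure: {exc}"))
            continue
        if validate(output):
            return output, fault_log
        # Invalid output: register the fault and reschedule.
        fault_log.append((attempt, f"invalid output: {output!r}"))
    raise RuntimeError(f"request failed after {max_attempts} attempts: {fault_log}")

def faulty_node(x):
    return None  # simulates a resource that returns an invalid result

def healthy_node(x):
    return x * x

result, faults = run_on_grid(7, [faulty_node, healthy_node],
                             lambda out: out is not None)
print(result, faults)
```

Here the first attempt is logged as a fault and the task is rescheduled to the second resource, whose output is then returned to the user.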
14
Process Model Used
• The process model used for the project Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery is the waterfall model.
15
Designs
UML
• Use case Diagram
• Activity Diagram
• Interaction Diagram
– Sequence Diagram
– Collaboration Diagram
• Class Diagram
16
USE-CASE DIAGRAM
17
Actors: user, resource, grid, database
Use cases:
connect to grid
resource specification
send request to grid
forward request to resource
process and return output
apply validation on outputs
reschedule task
send output to user
ACTIVITY DIAGRAMS
18
Activity diagram for a user/node:
accept grid node information
connect to grid
view resources of grid
select function and pass values
send request to grid
display output
accept the output from grid
ACTIVITY DIAGRAMS
19
Activity diagram for the grid:
start grid
resource specification
accept user connection
obtain o/p from resource
verify and transact
accept request from user
forward request to resource
view faults registration
send o/p to user
register fault details
valid
exit
validate o/p from resource
invalid
ACTIVITY DIAGRAMS
20
Activity diagram for the resource:
accept request from grid
process the request
send response to grid
start resource
exit
Interaction Diagrams
Sequence diagram for grid:
Participants: grid, gsr : to, resource info
1 : set resource info request()
2 : prompt for resource details()
3 : enter func name, specify resource node()
4 : verify and save()
5 : return status()
6 : view resource list request()
7 : verify and fetch records()
8 : load resource info()
21
Interaction Diagrams
Sequence diagram for grid connectivity:
Participants: user, usersft : to, grid connection
1 : user connection request()
2 : prompts for grid details()
3 : enter grid system IP address()
4 : verify establish connection()
5 : return status of connectivity()
6 : request func call to grid()
7 : verify and fetch function()
8 : load function and pass values()
9 : verify and send request to grid()
10 : load o/p to called function()
22
Interaction Diagrams
Sequence diagram for grid to handle user request:
Participants: grid, gsr : to, requests, resource, fault info
1 : view user request()
2 : verify & fetch request()
3 : load request()
4 : verify & process req()
5 : load resource()
6 : trace & forward req to resource()
7 : verify & process request()
8 : return process output()
9 : validate output()
10 : store validation status()
11 : register fault details()
12 : send response to user()
13 : reschedule request()
14 : return output for request()
15 : send output to user()
23
Interaction Diagrams
Collaboration diagram for grid:
Participants: grid, gsr : to, resource info
1 : set resource info request()
2 : prompt for resource details()
3 : enter func name, specify resource node()
4 : verify and save()
5 : return status()
6 : view resource list request()
7 : verify and fetch records()
8 : load resource info()
24
Interaction Diagrams
Collaboration diagram for grid connectivity:
Participants: user, usersft : to, grid connection
1 : user connection request()
2 : prompts for grid details()
3 : enter grid system IP address()
4 : verify establish connection()
5 : return status of connectivity()
6 : request func call to grid()
7 : verify and fetch function()
8 : load function and pass values()
9 : verify and send request to grid()
10 : load o/p to called function()
25
Interaction Diagrams
Collaboration diagram for grid to handle user request:
Participants: grid, gsr : to, requests, resource, fault info
1 : view user request()
2 : verify & fetch request()
3 : load request()
4 : verify & process req()
5 : load resource()
6 : trace & forward req to resource()
7 : verify & process request()
8 : return process output()
9 : validate output()
10 : store validation status()
11 : register fault details()
12 : send response to user()
13 : reschedule request()
14 : return output for request()
15 : send output to user()
26
Class Diagram
26
Grid
+Port
+startserver()
Gthread
+port
+connectclient()
+acceptreq()
User
+port
+servername
+uname
+pwd
+connect()
+sendrequest()
+getresponce()
Rthread
+port
+acceptreqfromgrid()
+processreq()
+sendrestogrid()
Resource
+port
+stratresource()
Snapshots
28
Conclusion
• Present-day organisations require greater reliability, which the proposed system can provide; it can be used in organisations and research areas.
39