Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery

Upload: naresh-v-naresh

Post on 08-Nov-2014



Abstract

• Considerable research has gone into developing tools and techniques for grid systems.

• However, key issues such as grid service reliability and task scheduling in the grid have not been sufficiently studied.

• Some grid services have large subtasks that require long computation times, so the reliability of the grid service can be low.


Abstract

• To resolve these problems we use a Local Node Fault Recovery (LNFR) mechanism and ant colony optimization (ACO).

• LNFR helps recover from the faults that occur in grid systems.

• ACO helps optimize the scheduling of tasks that take a long time to compute.


CONTENTS

• Introduction
• Problem statement
• Existing System
• Proposed System
• Advantages of current system
• Modules


INTRODUCTION

• Grid computing has emerged as the next-generation parallel and distributed computing methodology.

• Its goal is to provide a service-oriented infrastructure that enables easy access to resources for solving various kinds of large-scale parallel applications over a wide area network.

• Grid computing is now widely accepted and actively studied by researchers.


System Architecture

[Architecture diagram components: Resources Control Server; ACO scheduler and processor; Grid Node 1, Grid Node 2, Grid Node 3]


Problem statement

• To achieve a high level of reliability and availability, the grid infrastructure should be fault tolerant, since the failure of a resource is fatal to job execution.

• A fault tolerance service is essential to satisfy QoS requirements in grid computing.


Existing system

• Grid service reliability can be defined as the probability that the subtasks involved in the considered service are executed successfully.

• The modeling and analysis of grid service reliability has mainly concentrated on resource management and sharing.

• It has not addressed fault recovery or task scheduling.
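As a minimal sketch of the reliability definition above, the snippet below computes service reliability as the product of per-subtask success probabilities. It rests on assumptions that the slides do not state: subtasks fail independently, each node has a constant failure rate (so a subtask that runs for time t on a node with rate λ succeeds with probability exp(-λt)), and the service succeeds only if every subtask does. The function names and all numbers are illustrative.

```python
import math
from typing import Iterable, Tuple


def subtask_reliability(failure_rate: float, exec_time: float) -> float:
    """Probability that one subtask finishes without a failure.

    Assumption: failures follow a constant-rate (exponential) model.
    """
    return math.exp(-failure_rate * exec_time)


def service_reliability(subtasks: Iterable[Tuple[float, float]]) -> float:
    """Probability that every subtask of the service succeeds.

    Each subtask is a (failure_rate, exec_time) pair.
    Assumption: subtasks fail independently of each other.
    """
    return math.prod(subtask_reliability(rate, t) for rate, t in subtasks)


# Hypothetical services: rates are per hour, times are in hours.
short_service = [(0.001, 2.0), (0.002, 1.0), (0.001, 3.0)]
long_service = [(0.001, 48.0), (0.002, 72.0), (0.001, 96.0)]

print(f"short subtasks: {service_reliability(short_service):.4f}")
print(f"long subtasks:  {service_reliability(long_service):.4f}")
```

Under these assumptions, longer subtasks drive the product down, which is one way to see why services with time-consuming subtasks can have low reliability.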


Existing System

• Grid services may perform long tasks that require several days of computation.

• For grid services with large subtasks that require time-consuming computation, the reliability of the service can be low.


Proposed System

• The basic approach to fault recovery in grid systems is a Remote Node Fault Recovery (RNFR) mechanism.

• When a failure occurs on a node, the state information can be migrated to another node, and the failed subtask execution is resumed from the interrupted point.

• Alternatively, the failed subtask can be dynamically rescheduled on another node, which restarts the subtask from the beginning.
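The two RNFR recovery options can be illustrated with a small simulation. This is a sketch only: the `Subtask` class, the step-based notion of progress, the node names, and the injected failure point are all hypothetical and stand in for whatever state information a real grid node would migrate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtask:
    name: str
    total_steps: int          # work needed to finish the subtask
    completed_steps: int = 0  # state information saved so far


def run(subtask: Subtask, node: str, fail_at: Optional[int] = None) -> bool:
    """Run the subtask on `node`, optionally injecting a failure.

    Returns True if the subtask finished, False if the node failed.
    """
    while subtask.completed_steps < subtask.total_steps:
        if fail_at is not None and subtask.completed_steps == fail_at:
            print(f"{node}: failure after step {subtask.completed_steps}")
            return False
        subtask.completed_steps += 1
    print(f"{node}: finished {subtask.name}")
    return True


def recover(subtask: Subtask, backup_node: str, resume: bool) -> int:
    """Recover a failed subtask on another node.

    resume=True  -> migrate the state and continue from the interrupted point.
    resume=False -> reschedule and restart from the beginning.
    Returns the number of steps executed on the backup node.
    """
    if not resume:
        subtask.completed_steps = 0  # discard the saved state
    start = subtask.completed_steps
    run(subtask, backup_node)
    return subtask.total_steps - start


for resume in (True, False):
    task = Subtask("subtask-1", total_steps=10)
    if not run(task, "node-A", fail_at=6):
        redone = recover(task, "node-B", resume=resume)
        mode = "resume" if resume else "restart"
        print(f"{mode}: {redone} steps executed on node-B")
```

Resuming repeats only the remaining work, while restarting repeats all of it; the gap between the two grows with the length of the subtask.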


Proposed System

• RNFR is useful and effective for recovering grid tasks from failures.

• However, some complex tasks may require several days of computation.

• Based on the proposed grid service reliability model, a multi-objective task scheduling optimization model is presented.

• An ant colony optimization (ACO) algorithm is developed to solve it effectively.
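To make the idea of ACO-based scheduling concrete, here is a small sketch that assigns subtasks to nodes. It is not the algorithm from the project: the workloads, node speeds, failure rates, the exponential reliability model, and the weighted-sum objective that folds reliability and makespan into one cost are all assumptions chosen for illustration. Pheromone is kept per (subtask, node) pair, and the best ant of each iteration reinforces its choices.

```python
import math
import random

WORK = [40.0, 25.0, 60.0, 10.0, 35.0]   # hypothetical subtask workloads
SPEED = [1.0, 2.0, 1.5]                 # hypothetical node speeds
FAIL_RATE = [0.001, 0.004, 0.002]       # hypothetical node failure rates


def evaluate(assign):
    """Return (reliability, makespan) for a subtask -> node assignment."""
    busy = [0.0] * len(SPEED)
    exposure = 0.0
    for task, node in enumerate(assign):
        t = WORK[task] / SPEED[node]
        busy[node] += t
        exposure += FAIL_RATE[node] * t
    return math.exp(-exposure), max(busy)


def cost(assign, weight=0.5):
    """Scalarised objective (an assumption); lower is better."""
    reliability, makespan = evaluate(assign)
    worst = sum(WORK) / min(SPEED)       # everything on the slowest node
    return weight * (1.0 - reliability) + (1.0 - weight) * makespan / worst


def aco_schedule(ants=20, iterations=50, alpha=1.0, beta=2.0, rho=0.1, seed=0):
    rng = random.Random(seed)
    n_tasks, n_nodes = len(WORK), len(SPEED)
    tau = [[1.0] * n_nodes for _ in range(n_tasks)]        # pheromone
    # Heuristic desirability: inverse of the execution time on each node.
    eta = [[SPEED[n] / WORK[t] for n in range(n_nodes)] for t in range(n_tasks)]
    best, best_cost = None, float("inf")
    for _ in range(iterations):
        solutions = []
        for _ in range(ants):
            assign = []
            for t in range(n_tasks):
                weights = [tau[t][n] ** alpha * eta[t][n] ** beta
                           for n in range(n_nodes)]
                assign.append(rng.choices(range(n_nodes), weights=weights)[0])
            solutions.append((cost(assign), assign))
        # Evaporate, then let the iteration's best ant deposit pheromone.
        for t in range(n_tasks):
            for n in range(n_nodes):
                tau[t][n] *= 1.0 - rho
        it_cost, it_best = min(solutions)
        for t, n in enumerate(it_best):
            tau[t][n] += 1.0 / it_cost
        if it_cost < best_cost:
            best, best_cost = it_best, it_cost
    return best, best_cost


best, best_cost = aco_schedule()
reliability, makespan = evaluate(best)
print(f"assignment={best} reliability={reliability:.4f} makespan={makespan:.1f}")
```

A weighted sum is only one way to handle several objectives; changing `weight` shifts the trade-off between reliability and completion time.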


MODULES

• The application is split into the following modules:

– USERS

– RESOURCE

– GRID


Users:

-- Connect to Grid
-- Send request to Grid
-- Get response from Grid

Resource:

-- Accept request from Grid
-- Process and send output to Grid


Grid:

-- Maintain Users/Nodes
-- Take request from User
-- Schedule task to Resource
-- Accept the output from Resource
-- Validate the output
-- Reschedule the task (if output is invalid)
-- Notify the Grid of the fault generated
-- Send output to User (if valid)
-- Handle network failure and notify the Grid Manager
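The Grid module's request handling (schedule, validate, reschedule on invalid output, register the fault) can be sketched as follows. Everything here is hypothetical: resources are modelled as plain functions, the validation rule simply treats a missing output as invalid, and rescheduling tries the next resource in order rather than using the ACO scheduler.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical resource interface: returns None when it fails to produce output.
Resource = Callable[[int], Optional[int]]


def healthy_square(x: int) -> Optional[int]:
    return x * x


def faulty_square(x: int) -> Optional[int]:
    return None  # simulates a resource that produces no usable output


class Grid:
    def __init__(self, resources: Dict[str, Resource]):
        self.resources = resources
        self.fault_log: List[Tuple[str, int]] = []  # registered fault details

    def validate(self, output: Optional[int]) -> bool:
        """Assumed validation rule: any non-None output is valid."""
        return output is not None

    def handle_request(self, value: int) -> Optional[int]:
        """Schedule the task, validate the output, reschedule if invalid."""
        for name, resource in self.resources.items():
            output = resource(value)
            if self.validate(output):
                return output                      # send output to the user
            self.fault_log.append((name, value))   # register the fault
        return None                                # every resource failed


grid = Grid({"node-1": faulty_square, "node-2": healthy_square})
print(grid.handle_request(7))
print(grid.fault_log)
```

Here the first resource fails, the fault is logged, and the task is rescheduled on the second resource, whose output passes validation and is returned to the user.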


Process Model Used

• The process model used for the project, Grid Service Reliability Modeling and Optimal Task Scheduling Considering Fault Recovery, is the waterfall model.


Designs

UML

• Use case Diagram
• Activity Diagram
• Interaction Diagram
– Sequence Diagram
– Collaboration Diagram
• Class Diagram


USE-CASE DIAGRAM

Actors: user, resource, grid, database

Use cases:

-- connect to grid
-- resource specification
-- send request to grid
-- forward request to resource
-- process and return output
-- apply validation on outputs
-- reschedule task
-- send output to user

ACTIVITY DIAGRAMS

Activity diagram for a user/node:

-- accept grid node information
-- connect to grid
-- view resource of grid
-- select function and pass values
-- send request to grid
-- accept the output from grid
-- display output

ACTIVITY DIAGRAMS

Activity diagram for the grid:

-- start grid
-- resource specification
-- accept user connection
-- accept request from user
-- verify and transact
-- forward request to resource
-- obtain o/p from resource
-- validate o/p from resource (branches: valid, invalid)
-- send o/p to user
-- register fault details
-- view faults registration
-- exit

ACTIVITY DIAGRAMS

Activity diagram for the resource:

-- start resource
-- accept request from grid
-- process the request
-- send response to grid
-- exit

Interaction Diagrams

Sequence diagram for grid:

Participants: grid, gsr : to, resource info

1 : set resource info request()
2 : prompt for resource details()
3 : enter func name, specify resource node()
4 : verify and save()
5 : return status()
6 : view resource list request()
7 : verify and fetch records()
8 : load resource info()

Interaction Diagrams

Sequence diagram for grid connectivity:

Participants: user, usersft : to, grid connection

1 : user connection request()
2 : prompts for grid details()
3 : enter grid system IP address()
4 : verify establish connection()
5 : return status of connectivity()
6 : request func call to grid()
7 : verify and fetch function()
8 : load function and pass values()
9 : verify and send request to grid()
10 : load o/p to called function()

Interaction Diagrams

Sequence diagram for grid to handle user request:

Participants: grid, gsr : to, requests, resource, fault info

1 : view user request()
2 : verify & fetch request()
3 : load request()
4 : verify & process req()
5 : load resource()
6 : trace & forward req to resource()
7 : verify & process request()
8 : return process output()
9 : validate output()
10 : store validation status()
11 : register fault details()
12 : send response to user()
13 : reschedule request()
14 : return output for request()
15 : send output to user()

Interaction Diagrams

Collaboration diagram for grid:

Participants: grid, gsr : to, resource info

1 : set resource info request()
2 : prompt for resource details()
3 : enter func name, specify resource node()
4 : verify and save()
5 : return status()
6 : view resource list request()
7 : verify and fetch records()
8 : load resource info()

Interaction Diagrams

Collaboration diagram for grid connectivity:

Participants: user, usersft : to, grid connection

1 : user connection request()
2 : prompts for grid details()
3 : enter grid system IP address()
4 : verify establish connection()
5 : return status of connectivity()
6 : request func call to grid()
7 : verify and fetch function()
8 : load function and pass values()
9 : verify and send request to grid()
10 : load o/p to called function()

Interaction Diagrams

Collaboration diagram for grid to handle user request:

Participants: grid, gsr : to, requests, resource, fault info

1 : view user request()
2 : verify & fetch request()
3 : load request()
4 : verify & process req()
5 : load resource()
6 : trace & forward req to resource()
7 : verify & process request()
8 : return process output()
9 : validate output()
10 : store validation status()
11 : register fault details()
12 : send response to user()
13 : reschedule request()
14 : return output for request()
15 : send output to user()

Class Diagram

Grid
+Port
+startserver()

Gthread
+port
+connectclient()
+acceptreq()

User
+port
+servername
+uname
+pwd
+connect()
+sendrequest()
+getresponce()

Rthread
+port
+acceptreqfromgrid()
+processreq()
+sendrestogrid()

Resource
+port
+stratresource()

Snapshots

[Application screenshots; the images are not available in this transcript.]

Conclusion

• Present-day organisations require greater reliability, which the proposed system can provide; it is suited to use in organisations and research areas.

