Making Services Fault Tolerant

Pat Chan
Department of Computer Science and Engineering
The Chinese University of Hong Kong

2nd June 2006




Outline

- Introduction
- Problem Statement
- Methodologies for Web Service Reliability
- New Reliable Web Service Paradigm
- Road Map for Experiment
- Experimental Results and Discussion
- Conclusion
- Future Directions


Introduction

Service-oriented computing is becoming a reality.

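Service-oriented Architectures (SOA) are based on a simple model of roles: service providers publish services in a registry, and service requesters (clients) find them there and bind to them.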

The problems of service dependability, security and timeliness are becoming critical.

We propose experimental settings and offer a roadmap to dependable Web services.


Problem Statement

Fault-tolerant techniques: replication and diversity.

Replication is an efficient way to build reliable systems through redundancy in time or in space. It:
- increases the availability of distributed systems;
- re-executes or replicates key components;
- protects against hardware malfunctions and transient system faults.

Another efficient technique is design diversity: software systems or services are designed independently by different programming teams, which helps defend against permanent software design faults.

We focus on analyzing replication techniques as applied to Web services.

A generic Web service system with spatial as well as temporal replication is proposed and investigated.


Methodologies for reliable Web services -- Redundancy

Spatial redundancy
- Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result.
- Dynamic redundancy: only one replica is engaged at a time, while the others are kept in an active or standby state.

Temporal redundancy: redundancy in time, i.e. the operation is re-executed.
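Static redundancy with voting (N-modular redundancy) can be sketched in a few lines. This is a minimal illustration, not the paradigm's implementation: it assumes each replica is a local Python callable returning a hashable result and that a strict majority of all N replicas is required; `majority_vote` and the replica stubs are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(replicas: Sequence[Callable[[], Hashable]]) -> Hashable:
    """Invoke every replica and return the result a strict majority agrees on.

    A replica that raises an exception contributes no vote. Raises
    RuntimeError if no result is returned by more than half of the replicas.
    """
    votes: Counter = Counter()
    for replica in replicas:
        try:
            votes[replica()] += 1
        except Exception:
            continue  # a crashed replica simply does not vote
    if votes:
        result, count = votes.most_common(1)[0]
        if count > len(replicas) // 2:
            return result
    raise RuntimeError("no majority among replica results")


def crashed_replica() -> int:
    raise ConnectionError("replica down")


# Five hypothetical replicas: three agree, one returns a wrong value, one crashes.
replicas = [lambda: 42, lambda: 42, lambda: 7, lambda: 42, crashed_replica]
print(majority_vote(replicas))  # prints 42
```

Because the majority is counted over all N replicas rather than over the replies received, five replicas tolerate up to two that crash or return a wrong value.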


Methodologies for reliable Web services -- Diversity

Diversity protects redundant systems against common-mode failures. With different designs and implementations, a common failure mode will probably cause different error effects in each version.

Examples: N-version programming, recovery blocks…
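A recovery block runs a primary version, checks its result with an acceptance test, and falls back to independently designed alternates when the test fails. The sketch below assumes the versions are Python callables without side effects (so no checkpoint needs restoring between attempts, unlike a full recovery block) and that the acceptance test is a predicate on the result; `recovery_block`, the square-root versions and the tolerance are hypothetical examples.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def recovery_block(versions: Sequence[Callable[[], T]],
                   acceptance_test: Callable[[T], bool]) -> T:
    """Run the primary version, then each alternate in turn, returning the
    first result that passes the acceptance test."""
    for version in versions:
        try:
            result = version()
        except Exception:
            continue  # a crashing version is treated as a failed version
        if acceptance_test(result):
            return result
    raise RuntimeError("all versions failed the acceptance test")


X = 2.0


def primary_sqrt() -> float:
    return X / 2  # hypothetical design fault: not a square root


def alternate_sqrt() -> float:
    return math.sqrt(X)  # independently designed alternate


def accept(r: float) -> bool:
    return abs(r * r - X) < 1e-9


print(recovery_block([primary_sqrt, alternate_sqrt], accept))  # prints 1.4142135623730951
```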


Failure Response Stages of Web Services

- Fault confinement
- Fault detection
- Diagnosis
- Fail-over
- Reconfiguration
- Recovery
- Restart
- Repair
- Reintegration


Figure: Flow of the failure response stages: fault confinement, fault detection, failover (online) and diagnosis (offline), reconfiguration, recovery, restart, repair and reintegration.


Proposed Paradigm

Figure: Architecture of the proposed paradigm. A Replication Manager, containing a Web service selection algorithm and a WatchDog, manages three replicated Web services, each a Web service on IIS with its own application and database. The services are registered in a UDDI registry and described by WSDL; the client (port, application and database) invokes the primary Web service.

1. Create Web services.
2. Select a primary Web service (PWS).
3. Register.
4. Look up.
5. Get the WSDL.
6. Invoke the Web service.
7. Keep checking the availability of the PWS.
8. If the PWS fails, reselect the PWS.
9. Update the WSDL.
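Steps 3 to 6 and 8 to 9 rely on one level of indirection: the client looks up the current primary's address whenever it invokes the service, so the Replication Manager can redirect clients just by updating the registered address. The sketch below stands in for the UDDI registry and WSDL with an in-memory dictionary; the class `Registry`, the service name and the URLs are hypothetical placeholders, not a real UDDI API.

```python
from typing import Dict


class Registry:
    """In-memory stand-in for the UDDI registry: maps a service name to the
    endpoint address currently published in its WSDL."""

    def __init__(self) -> None:
        self._endpoints: Dict[str, str] = {}

    def register(self, service: str, endpoint: str) -> None:
        # Steps 3 and 9: publish the primary, or update it after reselection.
        self._endpoints[service] = endpoint

    def look_up(self, service: str) -> str:
        # Steps 4 and 5: the client fetches the current primary's address.
        return self._endpoints[service]


registry = Registry()
replicas = ["http://host-a.example/svc", "http://host-b.example/svc"]

registry.register("QuoteService", replicas[0])  # steps 2-3: select and register the PWS
print(registry.look_up("QuoteService"))         # prints http://host-a.example/svc

registry.register("QuoteService", replicas[1])  # steps 8-9: PWS failed, reselect and update
print(registry.look_up("QuoteService"))         # prints http://host-b.example/svc
```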


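Work Flow of the Replication Manager

1. The RM sends a message to the primary Web service.
2. If it gets a reply, the primary is working and the RM keeps polling.
3. If it does not get a reply, the RM reselects a primary Web service and maps the new address to the WSDL.
4. If all services have failed, the system fails.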
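The work flow of the Replication Manager can be sketched as a polling loop. This is a minimal illustration under stated assumptions: a replica "replies" when a hypothetical `ping` callable returns True, the selection rule is simply "first replica that answers" (the slides do not specify the Web service selection algorithm), and `publish` stands in for mapping the new address to the WSDL. The default period reuses the 5 ms polling value from the experiment parameters.

```python
import time
from typing import Callable, List, Optional, Sequence


def select_primary(replicas: Sequence[str],
                   ping: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical selection rule: the first replica that answers a ping."""
    for address in replicas:
        if ping(address):
            return address
    return None  # no replica answered: all services failed


def replication_manager(replicas: Sequence[str],
                        ping: Callable[[str], bool],
                        publish: Callable[[str], None],
                        rounds: int,
                        period_s: float = 0.005) -> Optional[str]:
    """Poll the primary Web service (PWS) for a fixed number of rounds.

    Returns the final primary's address, or None if every replica failed.
    """
    primary = select_primary(replicas, ping)
    if primary is None:
        return None                     # system fail
    publish(primary)
    for _ in range(rounds):
        if not ping(primary):           # did not get a reply
            primary = select_primary(replicas, ping)
            if primary is None:         # all services failed
                return None
            publish(primary)            # map the new address to the WSDL
        time.sleep(period_s)            # polling period between checks
    return primary


# Hypothetical fault injection: replica "A" answers only its first two pings.
ping_counts = {"A": 0}


def ping(address: str) -> bool:
    if address == "A":
        ping_counts["A"] += 1
        return ping_counts["A"] <= 2
    return True


published: List[str] = []
print(replication_manager(["A", "B", "C"], ping, published.append, rounds=5))  # prints B
print(published)  # prints ['A', 'B']
```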


Road Map for Experiment

Research directions:
- Redundancy in time: sequential execution
- Redundancy in space: parallel execution, majority voting using N-modular redundancy, diversified versions of different services
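The two branches of the road map differ in how redundant requests are issued: under redundancy in time, attempts are made one after another; under redundancy in space, replicas can be invoked in parallel. A minimal sketch of the parallel variant, assuming each replica is a Python callable and that the first successful reply is accepted (voting, as in N-modular redundancy, would be a different acceptance policy); `invoke_in_parallel` and the simulated replicas are hypothetical.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def invoke_in_parallel(replicas: Sequence[Callable[[], T]]) -> T:
    """Send the request to every replica at once and return the first
    successful reply; replicas that raise are ignored."""
    # Note: leaving the with-block waits for slower replicas to finish.
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        pending = {pool.submit(replica) for replica in replicas}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                if future.exception() is None:
                    return future.result()
    raise RuntimeError("every replica failed")


def slow_replica() -> str:
    time.sleep(0.2)
    return "reply from slow replica"


def fast_replica() -> str:
    time.sleep(0.01)
    return "reply from fast replica"


def crashed_replica() -> str:
    raise ConnectionError("replica down")


print(invoke_in_parallel([slow_replica, crashed_replica, fast_replica]))
```

Accepting the first reply minimizes response time but, unlike voting, cannot mask a replica that returns a wrong value.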


Experiments

A series of experiments was designed and performed to evaluate the reliability of the Web service in three configurations: a single service without replication, a single service with retry or reboot, and a service with spatial replication.

We also perform retry or failover when the Web service is down.
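Retry and failover can be combined on the client side, which corresponds to the hybrid configuration (experiment 4 in the summary). The sketch below is an illustration under assumptions, not the experimental harness: each endpoint is a Python callable taking a timeout in seconds, `call_with_retry_and_failover`, `max_retries` and the endpoint stubs are hypothetical, and the 10 s default reuses the client timeout period for retry from the parameter table.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_retry_and_failover(endpoints: Sequence[Callable[[float], T]],
                                 max_retries: int = 2,
                                 timeout_s: float = 10.0) -> T:
    """Retry the current endpoint up to max_retries extra times (redundancy in
    time), then fail over to the next endpoint (redundancy in space)."""
    last_error: Exception = RuntimeError("no endpoints given")
    for endpoint in endpoints:                     # failover loop
        for _attempt in range(1 + max_retries):     # retry loop
            try:
                return endpoint(timeout_s)
            except (TimeoutError, ConnectionError) as error:
                last_error = error
    raise last_error


attempts: List[str] = []


def primary(timeout_s: float) -> str:
    attempts.append("primary")
    raise TimeoutError(f"no reply within {timeout_s} s")  # simulated outage


def backup(timeout_s: float) -> str:
    attempts.append("backup")
    return "ok from backup"


print(call_with_retry_and_failover([primary, backup]))  # prints ok from backup
print(attempts)  # prints ['primary', 'primary', 'primary', 'backup']
```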


Summary of the experiments (experiment numbers)

                              None   Retry/Reboot   Failover   Both (hybrid)
Single service, no retry      0      --             --         --
Single service with retry     --     1              --         --
Single service with reboot    --     2              --         --
Spatial replication           --     --             3          4


Parameters of the Experiments

Parameter                          Current setting/metric
Request frequency                  1 req/min
Polling interval                   5 ms
Number of replicas                 5
Client timeout period for retry    10 s
Failure rate λ                     # failures/hour
Load (profile of the program)      % or load function
Reboot time                        10 min
Failover time                      1 s


Experimental Results

Experiments over 360-hour periods (43,200 requests). Number of failures:

Experiment   Normal   Server busy   Server reboots periodically
Exp 0        4928     6130          6492
Exp 1        2210     2327          2658
Exp 2        2561     3160          3323
Exp 3        1324     1711          1658
Exp 4        1089     1148          1325

Failure percentage:
- Retry: 11.97% to 4.93%
- Reboot: 11.97% to 6.44%
- Failover: 11.97% to 3.56%
- Retry and failover: 11.97% to 2.59%
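The before/after failure percentages can also be read as relative improvements over the unprotected baseline. A short calculation using only the four pairs reported on the slide (the variable names are illustrative):

```python
# Failure percentages reported on the slide: baseline vs. each technique.
BASELINE_PCT = 11.97
after_pct = {
    "Retry": 4.93,
    "Reboot": 6.44,
    "Failover": 3.56,
    "Retry and Failover": 2.59,
}

# Relative reduction = (before - after) / before, expressed in percent.
reductions = {name: (BASELINE_PCT - pct) / BASELINE_PCT * 100
              for name, pct in after_pct.items()}

for name, reduction in reductions.items():
    print(f"{name}: {reduction:.1f}% fewer failures")
# Retry: 58.8% fewer failures
# Reboot: 46.2% fewer failures
# Failover: 70.3% fewer failures
# Retry and Failover: 78.4% fewer failures
```

Read this way, the hybrid retry-and-failover configuration removes roughly 78% of the baseline failures, against about 46% for reboot alone.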


Figure: Number of failures when the server is in its normal state


Figure: Number of failures when the server is busy


Figure: Number of failures when the server reboots periodically


Reliability of the system over time

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{F(t+\Delta t) - F(t)}{\Delta t} = 0.025$$

$$R(t) = e^{-\lambda(t)\, t}$$
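Taking the measured failure rate λ = 0.025 as constant over time, reliability follows the exponential model R(t) = e^{-λt}, whose mean time to failure is 1/λ. A small sketch evaluating it; the slide does not state the time unit, so t and the MTTF are in that same unstated unit, and the names `LAMBDA` and `reliability` are illustrative.

```python
import math

LAMBDA = 0.025  # measured failure rate from the slide; time unit not stated


def reliability(t: float, failure_rate: float = LAMBDA) -> float:
    """Exponential reliability R(t) = exp(-lambda * t) for a constant rate."""
    return math.exp(-failure_rate * t)


mttf = 1 / LAMBDA  # mean time to failure of an exponential lifetime
print(mttf)                          # prints 40.0
print(round(reliability(mttf), 4))   # prints 0.3679, i.e. e^-1
```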


Reliability Model


Reliability Model Parameters

ID    Description                                         Value
λn    Network failure rate                                0.02
λ*    Web service failure rate                            0.228
λ1    Resource problem rate                               0.142
λ2    Entry point failure rate                            0.150
μ*    Web service repair rate                             0.286
μ1    Resource problem repair rate                        0.979
μ2    Entry point failure repair rate                     0.979
C1    Probability that the RM responds on time            0.9
C2    Probability that the server reboots successfully    0.9
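As a rough illustration of how the failure and repair rates in the table relate, each pair can be viewed in isolation as a two-state (up/down) Markov chain, whose steady-state availability is μ/(λ + μ). This is a deliberate simplification, not the full reliability model of these slides: it ignores the network failure rate and the coverage factors C1 and C2, and the helper name `steady_state_availability` is hypothetical.

```python
def steady_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Steady-state availability mu / (lambda + mu) of a single component
    alternating between up and down with exponential failure/repair times."""
    return repair_rate / (failure_rate + repair_rate)


# (failure rate, repair rate) pairs taken from the parameter table
pairs = {
    "Web service (λ*, μ*)": (0.228, 0.286),
    "Resource problem (λ1, μ1)": (0.142, 0.979),
    "Entry point failure (λ2, μ2)": (0.150, 0.979),
}

for name, (lam, mu) in pairs.items():
    print(f"{name}: {steady_state_availability(lam, mu):.3f}")
# Web service (λ*, μ*): 0.556
# Resource problem (λ1, μ1): 0.873
# Entry point failure (λ2, μ2): 0.867
```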


Outcome (SHARPE)

Figure: Reliability of the proposed system, computed with SHARPE for failure rates 0.228, 0.114 and 0.057.


Conclusion

Surveyed replication and design diversity techniques for reliable services.

Proposed a hybrid approach to improving the availability of Web services.

Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system.

N-version programming may finally become commercially viable in service environments.


Future Directions

- Make the experiments more comprehensive: cover a range of service arrival rates or distributions, failure rates, polling frequencies, etc.
- Improve the current fault-tolerant techniques. The current approach can deal with hardware and software failures; what about software fault detectors?
- N-version programming: different providers offer different solutions, which makes failing over or switching between the Web services a problem.


- Modeling Web service behavior: the behavior of Web services can be studied.
- Modeling failures: the failure model of Web services is different from that of traditional software.