Post on 18-Dec-2015
1
Making Services Fault Tolerant
Pat Chan
Department of Computer Science and Engineering
The Chinese University of Hong Kong
2nd June 2006
2
Outline
Introduction
Problem Statement
Methodologies for Web Service Reliability
New Reliable Web Service Paradigm
Road Map for Experiment
Experimental Results and Discussion
Conclusion
Future Directions
3
Introduction
Service-oriented computing is becoming a reality.
Service-oriented Architectures (SOA) are based on a simple model of roles.
The problems of service dependability, security and timeliness are becoming critical.
We propose experimental settings and offer a roadmap to dependable Web services.
4
Problem Statement
Fault-tolerant techniques: replication and diversity.
Replication is an efficient way to build reliable systems through time or space redundancy.
  It increases the availability of distributed systems.
  Key components are re-executed or replicated.
  It protects against hardware malfunctions and transient system faults.
Design diversity is another efficient technique.
  Software systems or services are designed independently by different programming teams.
  This helps defend against permanent software design faults.
We focus on the analysis of the replication techniques when applied to Web services.
A generic Web service system with spatial as well as temporal replication is proposed and investigated.
5
Methodologies for reliable Web services -- Redundancy
Spatial redundancy
  Static redundancy: all replicas are active at the same time, and voting takes place to obtain a correct result.
  Dynamic redundancy: one replica is active at a time while the others are kept in an active or standby state.
Temporal redundancy
  Redundancy in time
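The two kinds of redundancy on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: replicas are modelled as plain callables, and the names majority_vote, invoke_all and invoke_with_retry are hypothetical.

```python
from collections import Counter


def majority_vote(results, total):
    """Return the value reported by a strict majority of `total` replicas, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > total / 2 else None


def invoke_all(replicas, request):
    """Static (spatial) redundancy: call every replica, then vote on the answers."""
    results = []
    for replica in replicas:
        try:
            results.append(replica(request))
        except Exception:
            # A crashed replica contributes no vote but still counts in the total.
            pass
    return majority_vote(results, len(replicas))


def invoke_with_retry(service, request, attempts=3):
    """Temporal redundancy: re-execute the same service up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return service(request)
        except Exception as exc:
            last_error = exc
    raise last_error
```

Voting against the total number of replicas, rather than against the number of replies, means a result is accepted only when more than half of all replicas agree on it.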
6
Methodologies for reliable Web services -- Diversity
Protect redundant systems against common-mode failures.
With different designs and implementations, common failure modes will probably cause different error effects.
N-version programming, recovery blocks…
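A recovery block, one of the diversity techniques named above, can be sketched as follows. This is an illustrative sketch under assumptions: the function name recovery_block, the acceptance test, and the two square-root variants are all hypothetical and do not come from the slides.

```python
import math


def recovery_block(variants, acceptance_test, request):
    """Try independently written variants in order; return the first accepted result."""
    for variant in variants:
        try:
            result = variant(request)
        except Exception:
            # A crashing variant is treated like one that fails the acceptance test.
            continue
        if acceptance_test(request, result):
            return result
    raise RuntimeError("all variants failed the acceptance test")


def buggy_sqrt(x):
    """Hypothetical faulty variant: halves the input instead of taking the root."""
    return x / 2


def sqrt_is_acceptable(x, result):
    """Acceptance test: the result must be non-negative and square back to x."""
    return result >= 0 and abs(result * result - x) < 1e-6


value = recovery_block([buggy_sqrt, math.sqrt], sqrt_is_acceptable, 9.0)
```

Here the first variant returns 4.5, which fails the acceptance test, so the block falls through to the second variant.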
7
Failure Response Stages of Web Services
Fault confinement
Fault detection
Diagnosis
Fail-over
Reconfiguration
Recovery
Restart
Repair
Reintegration
8
[Figure: flowchart of the failure response stages. Labels: Fault Confinement, Fault Detection, Failover, Diagnosis, Online, Offline, Reconfiguration, Recovery, Restart, Repair, Reintegration]
9
Proposed Paradigm
[Figure: architecture diagram. Labels: Replication Manager (Web service selection algorithm, WatchDog), UDDI Registry, WSDL, three Web service replicas (each with Web Service, IIS, Application, Database), Client, Port, Application, Database]
1. Create web services
2. Select primary web service (PWS)
3. Register
4. Look up
5. Get WSDL
6. Invoke web service
7. Keep checking the availability of the PWS
8. If the PWS fails, reselect the PWS
9. Update the WSDL
10
Work Flow of the Replication Manager
[Figure: flowchart. Labels: RM sends message to the Web Service; Get reply; Do not get reply; Reselect a primary Web Service; Map the new address to the WSDL; All services failed; System fail]
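The Replication Manager work flow can be sketched as a small polling routine. This is a minimal sketch under assumptions, not the authors' code: the class and method names are hypothetical, probe stands in for the message the RM sends to a Web service, and published_address stands in for the address recorded in the WSDL.

```python
class ReplicationManager:
    """Hypothetical watchdog that keeps one reachable replica published as primary."""

    def __init__(self, replicas, probe):
        self.replicas = list(replicas)   # replica addresses, in order of preference
        self.probe = probe               # callable(address) -> True if a reply arrives
        self.primary = None
        self.published_address = None    # stand-in for the address in the WSDL

    def select_primary(self):
        """Pick the first replica that replies and publish its address."""
        for address in self.replicas:
            if self.probe(address):
                self.primary = address
                self.published_address = address
                return address
        self.primary = None
        self.published_address = None
        raise RuntimeError("system failure: all replicas are down")

    def poll_once(self):
        """One watchdog cycle: keep the primary if it replies, otherwise reselect."""
        if self.primary is not None and self.probe(self.primary):
            return self.primary
        return self.select_primary()
```

In a deployment, poll_once would be called on a timer at the polling frequency; the timer is left out so the sketch stays deterministic.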
11
Road Map for Experiment Research
Redundancy in time
Redundancy in space
Sequentially
Parallel
Majority voting using N-modular redundancy
Diversified versions of different services
12
Experiments
A series of experiments is designed and performed to evaluate the reliability of the Web service under three configurations: a single service without replication, a single service with retry or reboot, and a service with spatial replication.
We will also perform retry or failover when the Web service is down.
13
Summary of the experiments
Entries are experiment numbers (Exp 0 to Exp 4).
                              None   Retry/Reboot   Failover   Both (hybrid)
Single service, no retry      0      --             --         --
Single service with retry     --     1              --         --
Single service with reboot    --     2              --         --
Spatial replication           --     --             3          4
14
Parameters of the Experiments
Parameter                          Current setting/metric
Request frequency                  1 req/min
Polling frequency                  5 ms
Number of replicas                 5
Client timeout period for retry    10 s
Failure rate λ                     # failures/hour
Load (profile of the program)      % or load function
Reboot time                        10 min
Failover time                      1 s
15
Experimental Results
Experiments over 360 hour periods (43200 reqs)
Number of failures
         Normal   Server busy   Server reboots periodically
Exp 0    4928     6130          6492
Exp 1    2210     2327          2658
Exp 2    2561     3160          3323
Exp 3    1324     1711          1658
Exp 4    1089     1148          1325
Retry: 11.97% to 4.93%
Reboot: 11.97% to 6.44%
Failover: 11.97% to 3.56%
Retry and Failover: 11.97% to 2.59%
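The failure counts can be converted into per-request failure percentages with a short calculation. The sketch below assumes that the 43200 requests stated for the experiment period are the denominator for every cell; the slides do not say how their own quoted percentages were derived, so the values computed here need not coincide with them.

```python
TOTAL_REQUESTS = 43200  # assumed denominator, taken from the experiment description

# Failure counts per experiment: (normal, server busy, server reboots periodically)
failures = {
    "Exp 0": (4928, 6130, 6492),
    "Exp 1": (2210, 2327, 2658),
    "Exp 2": (2561, 3160, 3323),
    "Exp 3": (1324, 1711, 1658),
    "Exp 4": (1089, 1148, 1325),
}


def failure_percentage(count, total=TOTAL_REQUESTS):
    """Share of requests that failed, as a percentage of all requests."""
    return 100.0 * count / total


for name, counts in failures.items():
    percentages = [round(failure_percentage(c), 2) for c in counts]
    print(name, percentages)
```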
16
Number of failures when the server is in a normal situation
17
Number of failures when the server is busy
18
Number of failures when the server reboots periodically
19
Reliability of the system over time
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = 0.025$$
$$R(t) = e^{-\lambda(t)\, t}$$
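The exponential reliability function can be evaluated numerically. This is a small sketch that treats the failure rate as the constant 0.025 given on the slide; the time unit is not stated there, so t is simply in whatever unit the rate uses, and the function name reliability is hypothetical.

```python
import math


def reliability(t, failure_rate=0.025):
    """R(t) = exp(-failure_rate * t) for a constant failure rate."""
    return math.exp(-failure_rate * t)


# Reliability at a few points in time (same time unit as the failure rate).
for t in (0, 10, 40, 100):
    print(t, reliability(t))
```

With a constant rate, reliability falls to 1/e after 1/0.025 = 40 time units.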
20
Reliability Model
Reliability Model Parameters
ID    Description                                        Value
λn    Network failure rate                               0.02
λ*    Web service failure rate                           0.228
λ1    Resource problem rate                              0.142
λ2    Entry point failure rate                           0.150
μ*    Web service repair rate                            0.286
μ1    Resource problem repair rate                       0.979
μ2    Entry point failure repair rate                    0.979
C1    Probability that the RM responds on time           0.9
C2    Probability that the server reboots successfully   0.9
22
Outcome (SHARPE)
[Figure: Reliability of the proposed system. Legend: failure rate 0.228, 0.114, 0.057]
23
Conclusion
Surveyed replication and design diversity techniques for reliable services.
Proposed a hybrid approach to improving the availability of Web services.
Carried out a series of experiments to evaluate the availability and reliability of the proposed Web service system.
N-version programming may finally become commercially viable in a service environment.
24
Future Directions
Make the experiments more comprehensive: a range of service arrival rates or distributions, failure rates, polling frequencies, etc.
Improve the current fault-tolerant techniques
  The current approach can deal with hardware and software failures.
  How about software fault detectors?
N-version programming
  Different providers provide different solutions.
  Failover or switching between the Web services is a problem.
25
Future Directions
Modeling the Web service behavior
  The behavior of the Web services can be studied.
Modeling the failure
  The failure model of a Web service is different from that of traditional software.