derivation of fault tolerance measures of self-stabilizing ...phoenix/docs/mue08.pdf · analysis of...
TRANSCRIPT
MotivationSimulation Framework
ResultsResults
Appendix
Derivation of Fault Tolerance Measures ofSelf-Stabilizing Algorithms by Simulation
Nils Mullner
February 11, 2009
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
MotivationAnalytical approach presented in ”Reliability and AvailabilityAnalysis of Self-Stabilizing Systems” (SSS06)
I Fault tolerance measures are of high interest for systemdesigners to choose the right system for a given faultenvironment.
I In [War06], Dhama, Warns and Theel presented an analyticmethod implemented in MatLab.
I Yet, assumptions about fault propagation and approximationhinder accuracy of measures obtained and
I the method exhibits conservative assumptions only (worstcase).
I In this work, we present a simulation framework that deliversaverage measures in an accurate, timely and extremelyscalable fashion.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
MotivationAnalytical approach presented in ”Reliability and AvailabilityAnalysis of Self-Stabilizing Systems” (SSS06)
I Fault tolerance measures are of high interest for systemdesigners to choose the right system for a given faultenvironment.
I In [War06], Dhama, Warns and Theel presented an analyticmethod implemented in MatLab.
I Yet, assumptions about fault propagation and approximationhinder accuracy of measures obtained and
I the method exhibits conservative assumptions only (worstcase).
I In this work, we present a simulation framework that deliversaverage measures in an accurate, timely and extremelyscalable fashion.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
MotivationAnalytical approach presented in ”Reliability and AvailabilityAnalysis of Self-Stabilizing Systems” (SSS06)
I Fault tolerance measures are of high interest for systemdesigners to choose the right system for a given faultenvironment.
I In [War06], Dhama, Warns and Theel presented an analyticmethod implemented in MatLab.
I Yet, assumptions about fault propagation and approximationhinder accuracy of measures obtained and
I the method exhibits conservative assumptions only (worstcase).
I In this work, we present a simulation framework that deliversaverage measures in an accurate, timely and extremelyscalable fashion.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
MotivationAnalytical approach presented in ”Reliability and AvailabilityAnalysis of Self-Stabilizing Systems” (SSS06)
I Fault tolerance measures are of high interest for systemdesigners to choose the right system for a given faultenvironment.
I In [War06], Dhama, Warns and Theel presented an analyticmethod implemented in MatLab.
I Yet, assumptions about fault propagation and approximationhinder accuracy of measures obtained and
I the method exhibits conservative assumptions only (worstcase).
I In this work, we present a simulation framework that deliversaverage measures in an accurate, timely and extremelyscalable fashion.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
MotivationAnalytical approach presented in ”Reliability and AvailabilityAnalysis of Self-Stabilizing Systems” (SSS06)
I Fault tolerance measures are of high interest for systemdesigners to choose the right system for a given faultenvironment.
I In [War06], Dhama, Warns and Theel presented an analyticmethod implemented in MatLab.
I Yet, assumptions about fault propagation and approximationhinder accuracy of measures obtained and
I the method exhibits conservative assumptions only (worstcase).
I In this work, we present a simulation framework that deliversaverage measures in an accurate, timely and extremelyscalable fashion.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 1/5
Simulation Framework called SiSSDA (Simulator forSelf-Stabilizing Distributed Algorithms)
I implemented in Erlang
I highly reliable (nine nines, χ2-test)I distributed (to cope with CPU demanding algorithms)I platform independent (even heterogeneous Windows/Linux
clusters)
I support for large scale models (about 1,000 nodes per GBRAM; analytic approach limited to seven nodes!)
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 1/5
Simulation Framework called SiSSDA (Simulator forSelf-Stabilizing Distributed Algorithms)
I implemented in ErlangI highly reliable (nine nines, χ2-test)
I distributed (to cope with CPU demanding algorithms)I platform independent (even heterogeneous Windows/Linux
clusters)
I support for large scale models (about 1,000 nodes per GBRAM; analytic approach limited to seven nodes!)
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 1/5
Simulation Framework called SiSSDA (Simulator forSelf-Stabilizing Distributed Algorithms)
I implemented in ErlangI highly reliable (nine nines, χ2-test)I distributed (to cope with CPU demanding algorithms)
I platform independent (even heterogeneous Windows/Linuxclusters)
I support for large scale models (about 1,000 nodes per GBRAM; analytic approach limited to seven nodes!)
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 1/5
Simulation Framework called SiSSDA (Simulator forSelf-Stabilizing Distributed Algorithms)
I implemented in ErlangI highly reliable (nine nines, χ2-test)I distributed (to cope with CPU demanding algorithms)I platform independent (even heterogeneous Windows/Linux
clusters)
I support for large scale models (about 1,000 nodes per GBRAM; analytic approach limited to seven nodes!)
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 1/5
Simulation Framework called SiSSDA (Simulator forSelf-Stabilizing Distributed Algorithms)
I implemented in ErlangI highly reliable (nine nines, χ2-test)I distributed (to cope with CPU demanding algorithms)I platform independent (even heterogeneous Windows/Linux
clusters)
I support for large scale models (about 1,000 nodes per GBRAM; analytic approach limited to seven nodes!)
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms provided
I Breadth First SearchI Depth First SearchI Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms provided
I Breadth First SearchI Depth First SearchI Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms provided
I Breadth First SearchI Depth First SearchI Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms providedI Breadth First Search
I Depth First SearchI Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms providedI Breadth First SearchI Depth First Search
I Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms providedI Breadth First SearchI Depth First SearchI Leader Election
I Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms providedI Breadth First SearchI Depth First SearchI Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 2/5
I monitoring facility (prints every nth step)
I runs until desired accuracy is reached (maximal acceptabledeviation within last n turns)
I four distributed self-stabilizing algorithms providedI Breadth First SearchI Depth First SearchI Leader ElectionI Mutual Exclusion
I easy to extend
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 3/5
I exact fault environments (specify distinct values for eachvertex and edge)
I dynamic fault environments
I dynamic execution semantics possible (number of nodesexecuting per step in parallel).
I external fault injection and monitoring facilities
I event logging (if needed)
I choice of schedulers (three provided)
I uses discrete time Markov chains as system model
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 3/5
I exact fault environments (specify distinct values for eachvertex and edge)
I dynamic fault environments
I dynamic execution semantics possible (number of nodesexecuting per step in parallel).
I external fault injection and monitoring facilities
I event logging (if needed)
I choice of schedulers (three provided)
I uses discrete time Markov chains as system model
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 3/5
I exact fault environments (specify distinct values for eachvertex and edge)
I dynamic fault environments
I dynamic execution semantics possible (number of nodesexecuting per step in parallel).
I external fault injection and monitoring facilities
I event logging (if needed)
I choice of schedulers (three provided)
I uses discrete time Markov chains as system model
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 3/5
I exact fault environments (specify distinct values for eachvertex and edge)
I dynamic fault environments
I dynamic execution semantics possible (number of nodesexecuting per step in parallel).
I external fault injection and monitoring facilities
I event logging (if needed)
I choice of schedulers (three provided)
I uses discrete time Markov chains as system model
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 3/5
I exact fault environments (specify distinct values for eachvertex and edge)
I dynamic fault environments
I dynamic execution semantics possible (number of nodesexecuting per step in parallel).
I external fault injection and monitoring facilities
I event logging (if needed)
I choice of schedulers (three provided)
I uses discrete time Markov chains as system model
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 3/5
I exact fault environments (specify distinct values for eachvertex and edge)
I dynamic fault environments
I dynamic execution semantics possible (number of nodesexecuting per step in parallel).
I external fault injection and monitoring facilities
I event logging (if needed)
I choice of schedulers (three provided)
I uses discrete time Markov chains as system model
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 4/5
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Simulation Framework 5/5
server
client
fault_injector
client_algorithm
client_algorithm_bfs
fault_injector_bfs
fault_injector_dfs
fault_injector_le
fault_injector_mutex
client_algorithm_dfs
client_algorithm_le
client_algorithm_mutex
matrix_init
matrix_init_bfs
matrix_init_dfs
matrix_init_le
matrix_init_mutex
server:start().
client:start().
fault_injector:start().
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Accuracy 1/2
41 1558 3075 4592 6109 7626 9143 106601217713694152111672818245197620.00
0.10
0.20
0.30
0.40
0.50
0.60
# of steps
Availab
ilit
y
20,00010,0000
This figure exemplifies availability for first 20, 000 steps of aneight-processor system. The desired accuracy is reached ifmaximum the deviation within last n steps is lower than a certainbarrier. The Results presented in the following feature about1, 000, 000 steps per system node.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Accuracy 2/2
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15 0.16 0.170
10
20
30
40
50
60
70
80
90
100
Insufficiently Strict Accuracy GuardsSufficiently Strict Ac-curacy Guards
Error-Probability for each receiving node and each edge
Availa
bili
ty
Strictness of accuracy guards is crucial for reliability of results!
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Test Case: All Possible 4-node Graphs
1
2
3
4
5
67
8
9
10
11
We chose depth first search (DFS) and breadth first search (BFS)algorithms for comparison with the analytic approach, executed onall possible 4-node graphs.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.400
10
20
30
40
50
60
70
80
90
100
Breadth First Search - Simulation
Topology 1Topology 2Topology 3Topology 4Topology 5Topology 6Topology 7Topology 8Topology 9Topology 10Topology 11
Global Node Error Probabilty
Lim
itin
g A
vaila
bili
ty
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.400
10
20
30
40
50
60
70
80
90
100
Breadth First Search - Analysis
Topology 1Topology 2Topology 3Topology 4Topology 5Topology 6Topology 7Topology 8Topology 9Topology 10Topology 11
Global Node Error Probability
Lim
itin
g A
vaila
bili
ty
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.400
10
20
30
40
50
60
70
80
90
100
Depth First Search - Simulation
Topology 1Topology 2Topology 3Topology 4Topology 5Topology 6Topology 7Topology 8Topology 9Topology 10Topology 11
Global Node Error Probability
Lim
itin
g A
vaila
bili
ty
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.400
10
20
30
40
50
60
70
80
90
100
Depth First Search - Analysis
Topology 1Topology 2Topology 3Topology 4Topology 5Topology 6Topology 7Topology 8Topology 9Topology 10Topology 11
Global Node Error Probability
Lim
itin
g A
vailab
ilit
y
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Conclusions
Derivation of fault tolerance measures by simulation
I reason: analytic method is insufficient
I method: simulation of self-stabilizing distributed algorithms
I features: modular design, scalability, performance, reliabilityof results
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Conclusions
Derivation of fault tolerance measures by simulation
I reason: analytic method is insufficient
I method: simulation of self-stabilizing distributed algorithms
I features: modular design, scalability, performance, reliabilityof results
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Conclusions
Derivation of fault tolerance measures by simulation
I reason: analytic method is insufficient
I method: simulation of self-stabilizing distributed algorithms
I features: modular design, scalability, performance, reliabilityof results
Nils Mullner Derivation of Fault Tolerance Measures by Simulation
MotivationSimulation Framework
ResultsResults
Appendix
Shlomi Dolev.Self-Stabilization.MIT Press, March 2000.
Marco Schneider.Self-stabilization.ACM Comput. Surv., 25(1):45–67, 1993.
Kishor Shridharbhai Trivedi.Probability and Statistics with Reliability, Queuing andComputer Science Applications.Prentice Hall PTR, Upper Saddle River, NJ, USA, 1982.
Abhishek Dhama; Oliver Theel; Timo Warns.Reliability and Availability Analysis of Self-Stabilizing Systems.In 8th International Symposium on Stabilization, Safety, andSecurity of Distributed Systems, page 17. Springer, 2006.
Nils Mullner Derivation of Fault Tolerance Measures by Simulation