Time and Prediction Based Software Rejuvenation
By: Rajeev N.B (1RV08IS036)
Shah Smit (1RV08IS044)
Abhishek G (1RV08IS002)
Table of Contents
• Abstract
• Introduction
• Problem statement and objectives
• Design
• Implementation
• Testing
• Results and discussion
• Conclusion
• References
Abstract
Loopholes in existing systems:
• Current distributed systems are reactive.
• They can detect a failure and act on it, but cannot act before the failure occurs.
Solution Proposed
• Use time based and prediction based techniques to detect nodes that are about to fail and rejuvenate them before they crash the system.
• This makes the approach proactive rather than reactive.
Work Carried Out
● We perform detection based on a set of pre-determined key parameters:
● Buffer, cache, CPU load, memory and number of processes.
● We detect failures using time based and prediction based techniques.
Outcome of the work:
● Nodes that are about to fail are detected and rejuvenated.
● The result is a distributed system that is less prone to crashing.
● Existing reactive measures can still be used as a fallback.
Introduction
Software Rejuvenation:
• Software rejuvenation is a proactive fault management technique aimed at cleaning up the system's internal state to prevent more severe crash failures in the future.
Purely Time Based:
• At fixed intervals, we check whether any key parameter has exceeded its safety threshold.
Purely Prediction Based:
• Using a mathematical model, we predict whether a node is about to fail based on its key parameters.
Existing solutions:
• Existing distributed systems are reactive: they act only after a failure has occurred.
• These systems also do not use rejuvenation techniques.
Advancement Proposed:
• Proactive techniques should be used in tandem with current reactive solutions.
• Time and prediction based rejuvenation can considerably reduce system downtime by rejuvenating nodes that are about to crash.
Problem Statement with Objectives
Problem Statement:
• The performance of distributed applications degrades when they run for a long time.
• They are susceptible to crashes caused by data corruption, accumulation of numerical errors and exhaustion of OS resources.
• This leads to downtime and non-optimal performance.
Objectives:
● To simulate two software rejuvenation approaches: time based and prediction based.
● To effectively detect and rejuvenate failing nodes in a system using TPSRP.
Design
● Level 0
● Level 1
● Level 2
Implementation
Petri Net
● A Petri net is one of several modeling languages for describing distributed systems, alongside UML, EPCs, etc.
● A Petri net consists of places, transitions and arcs. Arcs run from a place to a transition or vice versa, never between two places or between two transitions.
● We use a Petri net for our simulation.
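To make the place/transition/arc rule concrete, below is a minimal sketch of a Petri net and its firing rule. It assumes unit arc weights and that a place appears at most once on each side of a transition; the class and method names (`PetriNet`, `add_transition`, `enabled`, `fire`) are hypothetical and are not taken from the authors' simulator.

```python
from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """A place/transition net: places hold tokens, transitions move them."""
    marking: dict[str, int] = field(default_factory=dict)  # place -> token count
    # transition name -> (input places, output places); arcs only join places and transitions
    transitions: dict[str, tuple[list[str], list[str]]] = field(default_factory=dict)

    def add_transition(self, name: str, inputs: list[str], outputs: list[str]) -> None:
        for place in inputs + outputs:
            self.marking.setdefault(place, 0)
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        # Enabled when every input place holds at least one token (unit arc weights assumed).
        inputs, _ = self.transitions[name]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1  # consume one token per input arc
        for p in outputs:
            self.marking[p] += 1  # produce one token per output arc
```

For example, a transition `send` from place `node_A` to place `node_B` moves a token from `node_A` to `node_B` when fired, after which it is no longer enabled until `node_A` receives another token.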
Methodology
● Simulated a proactive approach to software rejuvenation.
● Petri net graphs were used to represent the nodes in the distributed system.
● Five key parameters (cache, memory, buffer, CPU load and number of processes) were considered for the simulation.
● Dijkstra's algorithm is used to traverse all the nodes and maintain the state of transitions among nodes in the graph.
● Based on the values of the key parameters in each node, the simulator identifies the failing nodes in the graph.
● The simulation results, which show the key parameters of the nodes before and after rejuvenation using the time and prediction based techniques, are written to a CSV (comma-separated values) file.
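Dijkstra's algorithm computes shortest-path distances from a start node and, in doing so, reaches every node connected to it. A standard heap-based version is sketched below, assuming the graph is stored as an adjacency dictionary with non-negative edge weights. The graph layout and the function name `dijkstra` are assumptions, since the slides do not say how the simulator stores its graph or what the edge weights represent.

```python
import heapq


def dijkstra(graph: dict[str, dict[str, float]], start: str) -> dict[str, float]:
    """Return the shortest distance from `start` to every reachable node."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter path to this node was already found
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist
```

For the graph `{"A": {"B": 1, "C": 4}, "B": {"C": 2}, "C": {}}`, `dijkstra(graph, "A")` returns `{"A": 0.0, "B": 1.0, "C": 3.0}`: the route through B is shorter than the direct edge to C.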
Lessons from Implementation
Double Buffering
● When building a simulator, it is important to ensure that it has no flicker, tearing or other visual artifacts.
● However, it is difficult to draw directly to the display without pixels changing more than once per frame. To avoid this we use double buffering: each frame is drawn off-screen first and then shown in a single step.
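Double buffering can be illustrated without any GUI toolkit. The renderer draws a complete frame into a back buffer and only then swaps it with the front buffer that the display reads, so the display never shows a half-drawn frame. The `FrameBuffers` class and `render` function below are a hypothetical, toolkit-independent sketch; the slides do not name the GUI library the simulator used, and such libraries usually provide their own double-buffering support.

```python
class FrameBuffers:
    """Two pixel grids: the display reads `front`, drawing writes to `back`."""

    def __init__(self, width: int, height: int, background: str = " ") -> None:
        self.width, self.height, self.background = width, height, background
        self.front = self._blank()
        self.back = self._blank()

    def _blank(self) -> list[list[str]]:
        return [[self.background] * self.width for _ in range(self.height)]

    def clear_back(self) -> None:
        self.back = self._blank()

    def plot(self, x: int, y: int, pixel: str) -> None:
        self.back[y][x] = pixel  # intermediate states never reach the display

    def swap(self) -> None:
        # Present the finished frame in a single step.
        self.front, self.back = self.back, self.front


def render(buffers: FrameBuffers, nodes: list[tuple[int, int]]) -> None:
    """Draw every node into the back buffer, then present the whole frame."""
    buffers.clear_back()
    for x, y in nodes:
        buffers.plot(x, y, "o")
    buffers.swap()
```

The design choice is that `plot` never touches `front`, so whatever reads `front` only ever sees complete frames.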
Handling Multiple Events
● Our simulator has multiple paint events and timer events; the key challenge was keeping them synchronized.
● To get accurate results, we slowed down the transmissions so that the timer events could work properly.
Implementation
Parameters on Nodes with Threshold Values:
● Cache : 450
● Memory : 7500
● Buffer : 950
● CPU Load : 95
● # of Processes : 45
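The thresholds above can be kept in a single table and checked per node. In this sketch the dictionary keys and the `over_threshold` helper are hypothetical names. The slides give no units for the values, so none are assumed, and a parameter is assumed to violate its limit only when it strictly exceeds the threshold.

```python
# Threshold values from the list above; key names are hypothetical.
THRESHOLDS = {
    "cache": 450,
    "memory": 7500,
    "buffer": 950,
    "cpu_load": 95,
    "processes": 45,
}


def over_threshold(node: dict[str, float]) -> list[str]:
    """Return the parameters of a node that strictly exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items() if node[name] > limit]
```

With the before-rejuvenation values of node 5 from the Results section (48 processes), `over_threshold` returns `["processes"]`; for node 14 (cache 468) it returns `["cache"]`.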
● Time Based Rejuvenation Policy:
Nodes are inspected at fixed time intervals.
When the timer expires, the nodes to be rejuvenated are selected based on the current values of their parameters.
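A minimal sketch of the time-based policy follows. It assumes the simulation advances in discrete ticks, that the timer expires every `interval` ticks, and that rejuvenating a node resets each violating parameter to a fixed safe value. The `SAFE_VALUES` table and the `time_based_step` function are hypothetical; the cache, CPU load and process values loosely mirror the after-rejuvenation figures in the Results section, while the memory and buffer values are placeholders, not the simulator's settings.

```python
THRESHOLDS = {"cache": 450, "memory": 7500, "buffer": 950, "cpu_load": 95, "processes": 45}
# Hypothetical post-rejuvenation values: cache, cpu_load and processes mirror the
# Results section; memory and buffer are placeholders.
SAFE_VALUES = {"cache": 50, "memory": 1000, "buffer": 100, "cpu_load": 10, "processes": 5}


def time_based_step(nodes: dict[int, dict[str, float]], tick: int, interval: int) -> list[int]:
    """Inspect all nodes when the timer expires and rejuvenate those over threshold.

    Returns the IDs of the rejuvenated nodes (always empty between timer expiries).
    """
    if tick % interval != 0:
        return []  # timer has not expired yet, so no node is inspected
    rejuvenated = []
    for node_id, params in nodes.items():
        violating = [p for p, limit in THRESHOLDS.items() if params[p] > limit]
        for p in violating:
            params[p] = SAFE_VALUES[p]  # reset only the offending parameters
        if violating:
            rejuvenated.append(node_id)
    return rejuvenated
```

A consequence of this policy is that a node crossing its threshold between two expiries is not rejuvenated until the next expiry.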
● Prediction Based Rejuvenation Policy:
Failing nodes are predicted from two properties of the parameters:
1. A parameter's value is very close to its threshold.
2. The rate of change of any parameter is very high or is growing exponentially.
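The two prediction criteria can be sketched as follows. The slides do not quantify "very close" or "very high", so the 90% proximity margin, the 20% single-step limit, the growth factor used to detect exponential growth, and the names `is_exponential` and `predict_failure` are all assumptions. A parameter is treated as growing exponentially when each of its last three samples is at least `GROWTH_FACTOR` times the one before it.

```python
THRESHOLDS = {"cache": 450, "memory": 7500, "buffer": 950, "cpu_load": 95, "processes": 45}

PROXIMITY = 0.9      # assumed: "very close" means within 10% of the threshold
MAX_STEP = 0.2       # assumed: "very high" means a one-sample rise above 20% of the threshold
GROWTH_FACTOR = 1.5  # assumed: sustained ratio treated as exponential growth


def is_exponential(samples: list[float], factor: float = GROWTH_FACTOR) -> bool:
    """True if each of the last three samples grew by at least `factor` over its predecessor."""
    recent = samples[-4:]
    if len(recent) < 4 or any(v <= 0 for v in recent[:-1]):
        return False  # too little history, or a non-positive base makes ratios meaningless
    return all(b >= factor * a for a, b in zip(recent, recent[1:]))


def predict_failure(history: dict[str, list[float]]) -> list[str]:
    """Return the parameters whose recent samples suggest the node is about to fail."""
    at_risk = []
    for name, limit in THRESHOLDS.items():
        samples = history[name]
        latest = samples[-1]
        near_limit = latest >= PROXIMITY * limit  # criterion 1: close to threshold
        steep = len(samples) >= 2 and latest - samples[-2] > MAX_STEP * limit  # criterion 2
        if near_limit or steep or is_exponential(samples):  # criterion 2: exponential growth
            at_risk.append(name)
    return at_risk
```

Unlike the time-based check, this flags a parameter such as processes rising 2, 4, 8, 16 while it is still well below its threshold of 45.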
Testing
● Testing coverage:
Unit testing.
Integration testing.
System testing.
Unit Testing
Five key modules of our project were tested:
● Dijkstra's algorithm.
● Time based rejuvenation.
● Prediction based system.
● Timer event for monitoring.
● UI Module.
Integration Testing
● Integration testing combines individual modules and tests them as a group.
● We integrated the following modules and tested them together:
● Initialization of the graph.
● Simulation and rejuvenation.
● Drawing user-defined Petri net graphs.
Snapshots
Results
● The system detects failing nodes and rejuvenates them to safer values using TPSRP. Parameter values of nodes 5 and 14 before and after rejuvenation:
Node (state)   Buffer  Cache  CPU Load  Memory  Processes
5 (before)     82      183    12        4346    48
5 (after)      82      183    12        4346    5
14 (before)    361     468    84        3692    27
14 (after)     361     50     10        3692    27
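The simulator writes its before/after results to a CSV file. A minimal sketch using Python's standard `csv` module is shown below, assuming one row per node and state with the same columns as the results above; the column names and the functions `write_results` and `results_to_string` are hypothetical.

```python
import csv
import io
from typing import TextIO

FIELDS = ["node", "state", "buffer", "cache", "cpu_load", "memory", "processes"]


def write_results(rows: list[dict[str, object]], out: TextIO) -> None:
    """Write before/after parameter snapshots as CSV to a text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def results_to_string(rows: list[dict[str, object]]) -> str:
    """Convenience wrapper that returns the CSV text instead of writing a file."""
    buf = io.StringIO()
    write_results(rows, buf)
    return buf.getvalue()
```

To write a file instead, open it with `open("results.csv", "w", newline="")` and pass the handle to `write_results`; `newline=""` is what the `csv` module documentation recommends so that line endings are not translated twice.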
Conclusion
1: We presented a new TPSRP to improve software availability and to detect failing nodes with higher probability.
2: Numerical analysis shows that the time and prediction based policy not only outperforms purely time-based and purely prediction-based strategies, but can also be easily applied to a practical system.