s-cube lp: self-healing in mixed service-oriented systems
www.s-cube-network.eu
S-Cube Learning Package
Self-* infrastructures:
Self-healing in Mixed Service-oriented Systems
TU Wien (TUW)
Harald Psaier, TUW
© Harald Psaier
Learning Package Categorization
S-Cube
Self-* Service Infrastructure and Discovery Support
Self-* Service Infrastructure
Self-healing in SOA
Learning Package Overview
Problem Description
Self-healing research
Example: Self-healing policies for Mixed Service-oriented
Systems
Conclusions
Mixed Service-oriented Systems
Open dynamic service environment to humans and services
– distributed coordination and communication
– no predefined top-down compositions, but flexible ones
Interactions are ad-hoc and dynamic and usually stay within the
boundaries of an activity
A Mixed System (MS) includes a collaboration between two main and distinct types of services:
Human-Provided Services (HPS)
– Humans provide knowledge/skills/expertise as services
– Close the gap between the required human expertise and the difficulty of implementing it as software
Software-Based Services (SBS)
Examples of mixed systems
Review services: include shared reviewing activities around
documents, code, and evaluations
Innovation services: foster various ideas for a new product
design
Support services: provide solutions for questions and
problems on multiple or selected subjects
Current platforms with massive use of MSs: crowdsourcing
platforms. These include, e.g., Amazon’s Mechanical Turk,
Yahoo answers, uTest.
Let’s Consider a Scenario (1)
Humans and services interact to perform work described by
the activities in the process model.
[Figure labels: Service Registry, Process Model, invoke, human service, activity scopes]
Let’s Consider a Scenario (2)
One of the services fails to complete an assigned activity.
In a loop, self-healing monitors, recognizes, and adapts to the
incident.
[Figure labels: Process Model, Deployment with Dependency Management, Run-Time Environment, Monitoring, invoke (failed, marked X), Adaptation, Self-healing Policies]
Let’s Consider a Scenario (3)
The reaction is controlled by policies connected to the
process activities
The challenge for the autonomous system is in particular the
complexity of MSs (cf. the dynamicity of MSs).
The goal of self-* properties is to support administrators in
system management.
In particular, the tasks of self-healing in MS include:
– Avoid errors in design
– Avoid errors in configuration
– Replace failing services at runtime
– Handle adaptation complexity transparently to keep system healthy
– Support the need for service maintenance
What is self-healing
A self-healing system should recover from the abnormal (or
“unhealthy”) state and return to the normative (“healthy”)
state, and function as it was prior to disruption.
A system with self-healing properties can be identified as a
system that comprises fault-tolerant, self-stabilizing, and
survivable system capabilities and, if needed, is supported
by humans.
The 3 common states are
Normal, Broken, and
Degraded. The challenge is
to identify Degraded in time
and to recover soundly.
Self-healing origins
A fault-tolerant system continues working to a reasonable
degree in the presence of faults.
A self-stabilizing system continuously stabilizes itself after
any perturbation.
A survivable system withstands the unexpected.
Self-healing research: autonomic computing (1/2)
IBM's autonomic computing research envisions a layered structure that can manage itself according to high-level objectives given by administrators.
Motivated by the cost of, and the overwhelming effort in, system maintenance
The research tries to cover all adaptable layers down to the network and the operating system
Defines 4 properties for a self-managing system (self-CHOP):
– self-configuring: The ability to readjust itself “on the fly”
– self-healing: Discover, diagnose, and react to disruptions
– self-optimizing: Maximize resource utilization to meet end-user needs
– self-protecting: Anticipate, detect, identify, and protect itself from attacks.
Self-healing research: self-adaptive systems (2/2)
Self-adaptive systems evaluate their behavior and adapt when
system irregularities occur or when better functionality or
performance is possible
The research primarily covers the application and the
middleware layers and focuses on the system as a whole.
It also includes self-healing as a combination of self-diagnosing
and self-repairing, with the capabilities to diagnose and
recover from malfunctions.
Self-healing characteristics
What:
Continuous availability by
compensating the dynamics of a
running system.
Why:
maintenance of health momentarily
and ...
Enduring continuity by resilience
against unintentional behavior
How:
Detect disruptions
Diagnose root cause
Derive recovery strategy
Self-healing requirements
A closed loop design which integrates sufficient sensor and
effector interfaces.
A status knowledge database and logic for an accurate state
recognition
State recognition must include failure classification for
adequate handling of the problem
A collection of recovery policies in the format of <trigger, rule,
action>. Usually this collection is preconfigured but must also
be configurable to obtain…
Fitness and evolutionary aspects. Self-* properties are generally
applied to sustain long-term use of the system
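A minimal sketch of how a recovery policy in the <trigger, rule, action> format could be represented. The class name, the event fields (`type`, `node`, `queue_length`), and the threshold of 10 tasks are illustrative assumptions; the slides do not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]  # hypothetical event record produced by a sensor


@dataclass
class RecoveryPolicy:
    trigger: str                    # event type that activates the policy
    rule: Callable[[Event], bool]   # condition checked on the event
    action: Callable[[Event], str]  # recovery step, here returning a description


def apply_policies(policies: List[RecoveryPolicy], event: Event) -> List[str]:
    """Return the results of all actions whose trigger and rule match."""
    results = []
    for policy in policies:
        if policy.trigger == event["type"] and policy.rule(event):
            results.append(policy.action(event))
    return results


# Assumed example: redirect work when a node's queue exceeds 10 tasks.
overload_policy = RecoveryPolicy(
    trigger="queue_report",
    rule=lambda e: e["queue_length"] > 10,
    action=lambda e: f"redirect tasks away from {e['node']}",
)

actions = apply_policies(
    [overload_policy],
    {"type": "queue_report", "node": "b", "queue_length": 14},
)
```

Keeping the policies in a plain list leaves the collection preconfigured but still configurable at runtime, as the requirement above asks.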
Self-healing loop
detecting: filters any
suspicious status information
diagnosing: does root cause
analysis and calculates an
appropriate recovery
recovery: carefully applies
the planned adaptations
A self-healing loop comprises 3 common states: detecting,
diagnosing, recovering
These are connected to the sensors and effectors of the
system
In the background, a knowledge-base supports the states
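The loop described above can be sketched as one pass through detecting, diagnosing, and recovering. This is only an illustration: the metric names (`queue`, `latency`), the limits in the knowledge base, and the mapping from metric to recovery plan are assumptions, not taken from the slides.

```python
from typing import Callable, Dict, List, Optional

Reading = Dict[str, float]  # hypothetical sensor reading, e.g. {"queue": 12.0}


def detect(reading: Reading, knowledge: Dict[str, float]) -> List[str]:
    """Filter suspicious status information: metrics above their known limit."""
    return [name for name, value in reading.items()
            if value > knowledge.get(name, float("inf"))]


def diagnose(suspicious: List[str]) -> Optional[str]:
    """Very simplified root-cause analysis: pick a recovery plan per metric."""
    plans = {"queue": "redirect", "latency": "replace"}  # assumed mapping
    for name in suspicious:
        if name in plans:
            return plans[name]
    return None


def healing_loop_step(sensor: Callable[[], Reading],
                      effector: Callable[[str], None],
                      knowledge: Dict[str, float]) -> Optional[str]:
    """One pass through detecting, diagnosing, and recovering."""
    suspicious = detect(sensor(), knowledge)
    plan = diagnose(suspicious)
    if plan is not None:
        effector(plan)  # recovery: apply the planned adaptation
    return plan


applied: List[str] = []
knowledge_base = {"queue": 10.0, "latency": 2.0}
plan = healing_loop_step(lambda: {"queue": 12.0, "latency": 0.5},
                         applied.append, knowledge_base)
```

The sensor and effector are passed in as callables so the loop stays independent of the monitored system; the knowledge base is reduced here to a dictionary of limits.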
Self-healing states
The most general states in self-healing research are:
Normal: The system is in a “healthy” state. In particular, it
signals intended functioning and all requirements are
met as expected.
Broken: This is an “unhealthy” system. It can generally be
identified by an unacceptable response, which is most probably
the consequence of a failure or error.
Degraded: The system is in a fuzzy transition zone between
the former two. Behavior is expected to be unpredictable and
parts of the system will drift from an acceptable state to some
failure state. In large-scale systems this is in many cases
recognizable by considerable performance loss. If the system is
redundant, its size in most cases provides
additional recovery time.
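A minimal sketch of state recognition for the three states. The inputs and the 80% tolerance are assumptions chosen for illustration: an unacceptable response is treated as Broken, and a considerable throughput loss as Degraded.

```python
from enum import Enum


class Health(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    BROKEN = "broken"


def classify(response_ok: bool, throughput: float,
             expected_throughput: float, tolerance: float = 0.8) -> Health:
    """Map simple observations to one of the three states.

    Assumptions for illustration: an unacceptable response means Broken;
    throughput below `tolerance` times the expected value means Degraded.
    """
    if not response_ok:
        return Health.BROKEN
    if throughput < tolerance * expected_throughput:
        return Health.DEGRADED
    return Health.NORMAL


state = classify(response_ok=True, throughput=60.0, expected_throughput=100.0)
```

A real system would likely need more than one metric to recognize the Degraded state in time; the single threshold only shows where that decision sits.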
Failure classification: Failure types (1/2)
The main goal of this classification is to assist root cause
analysis and find the adequate resolution for the failure.
Common failure types are:
– Crash failure: undetectable malign service interruption
– Fail-stop: a detected failure that caused a service interruption
– Transient: instantaneous transparent interruption with measurable
side-effects
– Omission: message loss, transmission errors in communication
infrastructure
– Performance: violation of agreements on execution time
– Arbitrary: any type of failure with no specific pattern
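The failure types above could feed root cause analysis as an enumeration plus a simple classifier. The sketch below assumes that a failure has already been observed and uses a hypothetical `Observation` record; it does not try to recognize transient failures, which would need a history of measurements.

```python
from dataclasses import dataclass
from enum import Enum, auto


class FailureType(Enum):
    CRASH = auto()
    FAIL_STOP = auto()
    TRANSIENT = auto()
    OMISSION = auto()
    PERFORMANCE = auto()
    ARBITRARY = auto()


@dataclass
class Observation:          # hypothetical monitoring record
    service_up: bool        # did the service answer at all?
    failure_reported: bool  # was the interruption detected and reported?
    messages_lost: int      # messages that never arrived
    response_time: float    # measured execution time in seconds
    agreed_time: float      # agreed maximum execution time in seconds


def classify_failure(obs: Observation) -> FailureType:
    """Classify an observed failure; falls back to ARBITRARY if no pattern fits."""
    if not obs.service_up:
        # A detected interruption is fail-stop, an undetected one a crash.
        return FailureType.FAIL_STOP if obs.failure_reported else FailureType.CRASH
    if obs.messages_lost > 0:
        return FailureType.OMISSION
    if obs.response_time > obs.agreed_time:
        return FailureType.PERFORMANCE
    return FailureType.ARBITRARY


slow = classify_failure(Observation(True, False, 0, 4.2, 2.0))
```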
Failure classification: Policies (2/2)
Policies provide configuration and settings for detection and
recovery.
There are three different types of policies:
– Action policies: These are reactive policies with a specialized trigger;
an immediate response is expected.
– Goal Policies: These define a set of desired states. They also
calculate the set of actions for the transition from the current (failure
affected) to a desired state
– Utility Function Policies: the set of states is connected to a utility
function. Problem solving includes extensive analysis including history
information, adaptation knowledge and a comprehensive system
awareness
Common recovery actions include:
– Replacement, balancing, isolation, persistence, redirection, etc.
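The difference between the three policy types can be sketched with one small function each. The state representation (a dictionary with a `replicas` count), the event names, and the utility function are hypothetical and serve only to contrast the three styles.

```python
from typing import Callable, Dict, List

State = Dict[str, int]  # hypothetical system state, e.g. {"replicas": 1}


def action_policy(event: str) -> str:
    """Reactive: a specific trigger maps directly to an immediate response."""
    responses = {"service_down": "replace", "queue_full": "redirect"}
    return responses.get(event, "ignore")


def goal_policy(current: State, desired: State) -> List[str]:
    """Compute the actions that move the current state to the desired one."""
    actions = []
    for key, target in desired.items():
        diff = target - current.get(key, 0)
        if diff > 0:
            actions.append(f"increase {key} by {diff}")
        elif diff < 0:
            actions.append(f"decrease {key} by {-diff}")
    return actions


def utility_policy(candidates: List[State],
                   utility: Callable[[State], float]) -> State:
    """Pick the candidate state with the highest utility."""
    return max(candidates, key=utility)


steps = goal_policy({"replicas": 1}, {"replicas": 3})
best = utility_policy(
    [{"replicas": 1}, {"replicas": 3}, {"replicas": 5}],
    # Assumed utility: benefit saturates at 3 replicas, each replica has a cost.
    lambda s: 10 * min(s["replicas"], 3) - 2 * s["replicas"],
)
```

The utility function here is deliberately tiny; the history information and system awareness mentioned above would enter through that function in a fuller design.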
Fitness and evolution
Current large-scale systems, especially self-* enhanced, must
be designed for long-term service.
This means they must be resilient to changes and allow any
required future variations.
The issues to keep in mind are:
– Most arising requirements are not known a priori but emerge over time
– Interventions in and changes to the current system must respect the
system’s essential functionality and avoid malicious failures at any
cost
– Adaptation might reach its limits in resources
The current solution is to create self-* systems with exposed
configuration management and thus human-assisted
adaptations
S-Cube contributions to Self-healing/-* research
Psaier H., Dustdar S. (2010). A survey on self-healing systems: approaches and systems. Computing. Springer Wien.
Di Nitto, E., Ghezzi, C., Metzger, A., Papazoglou, M., Pohl, K. (2008). A journey to highly dynamic, self-adaptive service-based applications. Automated Software Engineering, 15(3), p 313—341. Springer.
Hielscher, J., Kazhamiakin, R., Metzger, A., Pistore, M. (2008). A framework for proactive self-adaptation of service-based applications based on online testing. Towards a Service-Based Internet. P 122—133. Springer.
Pernici, B. (2009). Self-healing Systems and Web Services: The WS-Diamond Approach. Business Process Management Workshops. p 440—442. Springer.
Psaier H., Skopik F., Schall D., Dustdar S. (2010). Behavior Monitoring in Self-healing Service-oriented Systems. 34th Annual IEEE Computer Software and Applications Conference (COMPSAC), July 19-23, 2010, Seoul, South Korea. IEEE.
Papazoglou, M.; Pohl, K.; Parkin, M.; Metzger, A. (2010). S-Cube - Towards Engineering, Managing and Adapting Service-Based Systems. Springer. 1st Edition., 2010, XVIII, 374 p.
Mixed Service-oriented Systems: Challenges
Mixed Service-oriented Systems, a.k.a. Mixed Systems (MS),
are open to humans and services.
They inherit all properties of SOA, including distributed, ad-hoc
interactions along with a communication infrastructure and
coordination.
… and aforementioned properties
… and examples
What are the challenges in MS?
– The “openness” of the system allows many, possibly
unreliable, services to join
– In particular, humans are unreliable because of, e.g., different
working hours, particular preferences, current mood, and context.
Scenario: Expert Network
The key is to distribute the subtasks of the activity among the
appropriate experts. This is usually solved by
delegation and re-delegation. However, this can fail because of
individual misbehavior.
Main challenge: How to guarantee that the activity is
completed, and on time?
Includes two parties: the
service consumer, with a
request as an activity, and the
experts and resources in
the service network.
The network combines all
knowledge required to
jointly process the activity
Delegation and processing behavior
A model of the network helps to analyze a possible problem
– HPS and SBS are represented as nodes
– Interactions are allowed over established channels
– The current work load of nodes is indicated by the queues
At runtime the model additionally indicates
– The delegation directions and frequency by the arrow direction and the
thickness of the connection
– The current work load is indicated
by the queue fill state
With the model we can present
two main patterns of misbehavior
1st misbehavior pattern: Delegation Factory
The delegation factory misbehavior pattern:
– Node a accepts and delegates particular tasks frequently
– However, a processes few tasks itself and has a short task queue
The factory behavior impact:
– produces unusual amounts of task delegations
– tasks miss their deadline
– leads to performance degradations of the entire network
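A hedged sketch of how the factory pattern might be recognized from per-node metrics. The `NodeStats` fields and both thresholds are assumptions for illustration; the slides do not give the metrics actually used.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:       # hypothetical per-node metrics from interaction logs
    accepted: int      # tasks accepted in the observation window
    delegated: int     # tasks passed on to neighbours
    processed: int     # tasks completed by the node itself
    queue_length: int  # tasks currently waiting at the node


def is_delegation_factory(stats: NodeStats,
                          max_delegation_ratio: float = 0.7,
                          max_queue: int = 2) -> bool:
    """Flag a node that delegates most accepted tasks, processes few itself,
    and keeps a short queue."""
    if stats.accepted == 0:
        return False
    delegation_ratio = stats.delegated / stats.accepted
    return (delegation_ratio > max_delegation_ratio
            and stats.processed < stats.delegated
            and stats.queue_length <= max_queue)


node_a = NodeStats(accepted=20, delegated=18, processed=2, queue_length=0)
factory_suspected = is_delegation_factory(node_a)
```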
2nd misbehavior pattern: Delegation Sink
The delegation sink misbehavior pattern:
– Node d accepts too many offered tasks
– However, d processes tasks slowly (e.g., it overestimates its capability
relative to the load received)
Sink behavior impact:
– produces unusual amounts of task delegations
– tasks miss their deadline
– leads to performance degradations of the entire network
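The sink pattern could be checked in a similar way. In this sketch the `NodeLoad` fields, the queue fill threshold, and the completion ratio are assumed values, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class NodeLoad:          # hypothetical per-node metrics
    accepted: int        # tasks accepted in the observation window
    processed: int       # tasks completed in the same window
    queue_length: int    # tasks currently waiting at the node
    queue_capacity: int  # maximum number of tasks the queue can hold


def is_delegation_sink(load: NodeLoad,
                       fill_threshold: float = 0.9,
                       min_completion_ratio: float = 0.5) -> bool:
    """Flag a node whose queue is nearly full while it completes
    only a small share of the tasks it accepted."""
    if load.accepted == 0 or load.queue_capacity == 0:
        return False
    fill = load.queue_length / load.queue_capacity
    completion = load.processed / load.accepted
    return fill >= fill_threshold and completion < min_completion_ratio


node_d = NodeLoad(accepted=30, processed=5, queue_length=19, queue_capacity=20)
sink_suspected = is_delegation_sink(node_d)
```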
Observing and avoiding misbehavior
A successful self-healing architecture that can handle the
misbehavior situations must
– avoid unpredictable system behavior leading to faults
– identify and handle degraded states. Degraded states here relate to
poor progress in processing the activity because of increasing factory/sink
behavior
Feasible adaptation actions must not include direct
punishment of the misbehaving participating experts. Instead,
a transparent temporary decoupling from the system is
considered.
Also, the architecture must be aware of the side-effects of the
healing actions.
– a feedback loop informs about the success of the adaptation
The VieCure Framework
Below the MS, a
monitoring and adaptation
layer connects the MS to the
framework.
Events are derived from
the interaction logs and
diagnosed.
The Behavior Registry
provides the metrics to
identify the misbehavior
patterns
During recovery the
interaction channels are
adjusted
Self-healing steps on misbehavior
System is in perfect health
An overload in node b is detected
Assuming a causes the most
overload traffic, the recovery action
regulates channel (i) between a and b
However, b remains overloaded. An
additional unknown cause is
assumed
An alternative for b is found and
channels to d are opened
Channels (ii) and (iii) are now
available
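The two recovery actions in the steps above (regulating a channel, then opening channels to an alternative node) can be sketched on a simple traffic table. This is not the VieCure implementation: the traffic counts, the halving factor, and node c are assumptions; only the node names a, b, and d come from the slide.

```python
from typing import Dict, Tuple

Channel = Tuple[str, str]  # (sender, receiver)


def regulate_heaviest_channel(traffic: Dict[Channel, int], overloaded: str,
                              factor: float = 0.5) -> Channel:
    """Throttle the channel that sends the most tasks to the overloaded node."""
    incoming = {ch: load for ch, load in traffic.items() if ch[1] == overloaded}
    heaviest = max(incoming, key=incoming.get)
    traffic[heaviest] = int(traffic[heaviest] * factor)
    return heaviest


def open_alternative(traffic: Dict[Channel, int], overloaded: str,
                     alternative: str) -> None:
    """Open a channel to the alternative node for every sender of the overloaded one."""
    senders = [src for (src, dst) in traffic if dst == overloaded]
    for src in senders:
        traffic.setdefault((src, alternative), 0)


# Assumed traffic counts; node c is a hypothetical second sender.
traffic = {("a", "b"): 12, ("c", "b"): 4}
regulated = regulate_heaviest_channel(traffic, "b")
open_alternative(traffic, "b", "d")
```

Throttling instead of removing the channel matches the idea of a transparent, temporary decoupling rather than punishing the misbehaving node.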
Summary
Self-healing research principles
– A self-healing system should recover from the abnormal (or
“unhealthy”) state and return to the normative (“healthy”) state, and
function as it was prior to disruption.
– The 3 common states are Normal, Broken, and Degraded. The
challenge is to identify Degraded in time and to recover soundly.
– To recover, a self-healing loop is required that detects,
diagnoses, and recovers the system.
Self-healing in MS
– The “openness” of the system and the generally unpredictable human
behavior are sources of system degradation.
– The two presented misbehavior models are delegation factory and
sink. Either a node delegates without respecting the capacity of the
neighbors or a node overestimates its capacity.
– The VieCure Framework considers and resolves both cases.
Further S-Cube Reading
Psaier H., Juszczyk L., Skopik F., Schall D., Dustdar S. (2010). Runtime Behavior Monitoring and Self-Adaptation in Service-Oriented Systems. 4th IEEE International Conference on Self-Adaptive and Self-Organizing Systems (SASO), September 27 - October 01, 2010, Budapest, Hungary. IEEE.
Psaier H., Skopik F., Schall D., Juszczyk L., Treiber M., Dustdar S. (2010). A Programming Model for Self-Adaptive Open Enterprise Systems. 5th Workshop of the 11th International Middleware Conference (MW4SOC), November 29 - December 3, 2010, Bangalore, India. ACM.
Psaier H., Skopik F., Schall D., Dustdar S. (2011). Resource and Agreement Management in Dynamic
Crowdcomputing Environments. 15th IEEE International EDOC Conference (EDOC), 29th August - 2nd
September, 2011, Helsinki, Finland, IEEE.
Dustdar, S.; Schall, D.; Skopik, F.; Juszczyk, L.; Psaier, H. (Eds.) (2011). Socially Enhanced Services
Computing -- Modern Models and Algorithms for Distributed Systems. (1) p. 37. Springer
Acknowledgements
The research leading to these results has
received funding from the European
Community’s Seventh Framework
Programme [FP7/2007-2013] under grant
agreement 215483 (S-Cube).