Fabián E. Bustamante, Winter 2006
Autonomic Computing
The vision of autonomic computing, J. Kephart and D. Chess, IEEE Computer, Jan. 2003.
Also
- A.G. Ganek and T.A. Corbi, “The dawning of the autonomic computing era”, IBM Systems Journal, 42 (1), 2003.
- R. Want, T. Pering and D. Tennenhouse, “Comparing autonomic and proactive computing”, IBM Systems Journal, 42 (1), 2003.
CS 395/495 Autonomic Computing SystemsEECS, Northwestern University
The problem
The main obstacle to further progress in the IT industry
– Not a change in Moore’s law, but
– A looming software complexity crisis
• Beyond administering single environments, to integration into intra- and inter-corporate computing systems
“Complexity is the business we are in, and complexity is what limits us.”, Fred Brooks Jr.
Better programming won’t do it
Consider
– ~1/3 to 1/2 of a company’s total IT budget goes to preventing and recovering from crashes
– “For every dollar to purchase storage, you spend $9 to have someone manage it.”, N. Tabellion, CTO Fujitsu Softek
– ~40% of computer outages are caused by operator errors
– Average downtime impact for IT: ~$1.4 million in revenue per hour
The answer/hope – Autonomic computing
Autonomic systems – can manage themselves given high-level objectives from admins
~ autonomic nervous system
An autonomic system
– Knows itself
– Knows its environment & the context surrounding its activity
– (Re)configures itself under varying and unpredictable conditions
– Is always looking for ways to optimize its operation
– Is able to protect and heal itself
– Anticipates the optimized resources needed to meet a user’s information needs
To incorporate these characteristics, it must have the following properties/features …
Self-* properties
Self-configuration
– Current: Data centers made of components from/for multiple vendors/platforms; installation, configuration & integration are time consuming & error prone
– Autonomic: Automated, based on high-level policies; the host system adjusts itself automatically and seamlessly
Self-optimization
– Current: Hundreds of manually set, nonlinear tuning knobs
– Autonomic: Components and system continually seek optimization opportunities
Self-healing
– Current: e.g. problem determination can take weeks
– Autonomic: Self-detection, diagnosis, and repair for HW & SW
Self-protection
– Current: Detection of & recovery from attacks & cascading failures is manual
– Autonomic: Self-defense using early warning to anticipate & prevent system-wide failures
Autonomic element
Autonomic systems – interactive collections of autonomic elements
Autonomic element
– 1+ managed elements + an autonomic manager that controls them
– Functions at many levels – from disk drives to entire enterprises
– Fixed behavior, connections and relationships give way to increased dynamism and flexibility, expressed as high-level goals
[Figure: an autonomic element. Autonomic manager (Monitor, Analyze, Plan, Execute, around shared Knowledge) on top of the managed element.]
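The monitor, analyze, plan, execute cycle over shared knowledge can be made concrete with a small sketch. Everything below is an illustrative assumption, not something from the slides or the Kephart and Chess paper: the managed element is a hypothetical server pool, the high-level goal is a hypothetical maximum per-server load (`max_load`), and the only action available is adding servers.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ManagedElement:
    """Hypothetical managed resource: a server pool with a per-server load."""
    load: float        # average utilization per server (1.0 = fully busy)
    servers: int = 2

    def sense(self) -> Dict[str, float]:
        return {"load": self.load, "servers": float(self.servers)}

    def add_servers(self, n: int) -> None:
        # The same total work is spread across more machines.
        total_work = self.load * self.servers
        self.servers += n
        self.load = total_work / self.servers


@dataclass
class Knowledge:
    """Shared knowledge: the high-level goal plus past observations."""
    max_load: float = 0.75   # assumed goal, set by an administrator
    history: List[Dict[str, float]] = field(default_factory=list)


@dataclass
class AutonomicManager:
    element: ManagedElement
    knowledge: Knowledge = field(default_factory=Knowledge)

    def monitor(self) -> Dict[str, float]:
        reading = self.element.sense()
        self.knowledge.history.append(reading)
        return reading

    def analyze(self, reading: Dict[str, float]) -> bool:
        # Is the goal currently violated?
        return reading["load"] > self.knowledge.max_load

    def plan(self, reading: Dict[str, float]) -> int:
        # Smallest number of extra servers that brings load back under the goal.
        total_work = reading["load"] * reading["servers"]
        needed = math.ceil(total_work / self.knowledge.max_load)
        return max(0, needed - int(reading["servers"]))

    def execute(self, extra_servers: int) -> None:
        if extra_servers > 0:
            self.element.add_servers(extra_servers)

    def run_once(self) -> int:
        """One pass of the loop; returns how many servers were added."""
        reading = self.monitor()
        extra = self.plan(reading) if self.analyze(reading) else 0
        self.execute(extra)
        return extra
```

Calling `run_once()` repeatedly (for example on a timer) gives the closed control loop: the administrator only changes `max_load`, and the manager decides what to do about it.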
Evolution to autonomic systems
Basic (Level 1)
– Multiple sources of system generated data
– Requires extensive, highly skilled IT staff
Managed (Level 2)
– Consolidation of data through management tools
– IT staff analyzes and takes actions
– Greater system awareness; improved productivity
Predictive (Level 3)
– System monitors, correlates, and recommends actions
– IT staff approves and initiates actions
– Reduced dependency on deep skills; faster and better decision making
Adaptive (Level 4)
– System monitors, correlates and takes actions
– IT staff manages performance against Service Level Agreements (SLAs)
– IT agility and resiliency with minimal human interaction
Autonomic (Level 5)
– Integrated components dynamically managed by business rules/policies
– IT staff focuses on enabling business needs
– Business policy drives IT management; business agility and resilience
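One way to read the five levels is as a shift in who analyzes and who acts. The sketch below encodes just that reading; the `Level` enum, the `who_acts` function, and its return strings are hypothetical names made up for illustration, and the mapping is a simplification of the level descriptions.

```python
from enum import IntEnum


class Level(IntEnum):
    BASIC = 1
    MANAGED = 2
    PREDICTIVE = 3
    ADAPTIVE = 4
    AUTONOMIC = 5


def who_acts(level: Level, staff_approves: bool = False) -> str:
    """Return who handles a detected problem at the given level."""
    if level <= Level.MANAGED:
        # Levels 1 and 2: people analyze the data and take the action.
        return "staff analyzes and acts"
    if level == Level.PREDICTIVE:
        # Level 3: the system recommends, staff approves and initiates.
        if staff_approves:
            return "staff initiates the recommended action"
        return "system recommends and waits for approval"
    # Levels 4 and 5: the system takes the action itself.
    return "system acts"
```

For example, `who_acts(Level.PREDICTIVE)` reports that the system is waiting for approval, while `who_acts(Level.ADAPTIVE)` reports that the system acts on its own.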
Engineering challenges
Design, test and verification
Installation and configuration
Monitoring, problem determination, upgrading
Managing the life cycle
– Autonomic systems will have multiple elements at different stages, handling multiple tasks, … how to handle all?
Relationships among autonomic elements
– Specification of services needed/provided; ways to locate providers; ways to establish SLAs; …
Robustness against self-management-based attacks
Goal specification and robustness to wrongly specified goals
Scientific challenges
How to understand, control, and design emergent behavior
– Understanding the mapping from local to global behavior is not enough
Develop a theory of robustness
– Beginning with a definition
Learning and optimization theory
– Machine learning by a single element in a static environment is just the basics; the target is multiagent systems in dynamic environments
Negotiation theory
– How should the multiple elements negotiate?
Automated statistical modeling
– Statistical modeling for detection/prediction of performance models; ways to aggregate statistical variables to reduce dimensionality
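As a small illustration of aggregating statistical variables to reduce dimensionality, the sketch below keeps one representative from each group of highly correlated metrics. This is one simple technique chosen for illustration, not a method named in the slides; the function names, the metric names, and the 0.95 threshold are all assumptions.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_representatives(
    metrics: Dict[str, List[float]], threshold: float = 0.95
) -> List[str]:
    """Greedily keep a metric only if it is not strongly correlated
    (|r| >= threshold) with any metric already kept."""
    kept: List[str] = []
    for name, series in metrics.items():
        if all(abs(pearson(series, metrics[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical sensor data: load_avg tracks cpu almost exactly.
samples = {
    "cpu": [1.0, 2.0, 3.0, 4.0, 5.0],
    "load_avg": [2.0, 4.0, 6.0, 8.0, 10.5],
    "disk_io": [5.0, 1.0, 4.0, 2.0, 3.0],
}
representatives = select_representatives(samples)
```

Here `load_avg` is dropped because it carries nearly the same information as `cpu`, so a downstream model sees two variables instead of three. The result depends on the order in which metrics are visited, which is one reason this is only a sketch.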