IBM Research
© 2003 IBM Corporation
Research Challenges in Autonomic Computing
Jeff Kephart, IBM Research
[email protected]
www.research.ibm.com/autonomic
© 2003 IBM Corporation | Research Challenges in Autonomic Computing | Rutgers University, November 13, 2003
Outline
Background and Motivation
Autonomic Computing Research at IBM
– Architecture
– Overview of Research Program
Autonomic Computing Research Challenges
Conclusions
Background and Motivation (Kephart)
My role in autonomic computing
– My group does research on agents and multi-agent systems: architecture, communication, negotiation, machine learning
– AC research strategy; joint program manager
– University relations: faculty awards, equipment grants
– Chair, Autonomic Computing Advisory Board
What I hope to achieve here
– Stir up interest in autonomic computing research
– Explore collaborations with IBM Research
– Learn from you: new viewpoints, new approaches
Complex heterogeneous infrastructures are a reality!
[Figure: enterprise infrastructure with directory and security services, existing applications and data, business data, data server, web application server, storage area network, BPs and external services, web server, DNS server]
Dozens of systems and applications
Hundreds of components
Thousands of tuning parameters
Autonomic Computing: Motivation
Individual system elements are increasingly difficult to maintain and operate
– Hundreds of configuration and tuning parameters for commercial databases, servers, storage
Heterogeneous systems are becoming increasingly connected
– Integration is becoming ever more difficult
Architects can't intricately plan component interactions
– Interactions are increasingly dynamic and more frequently involve unanticipated components
This places a greater burden on system administrators, but
– they are already overtaxed
– they are already a major source of cost (6:1 for storage) and error
We need self-managing computing systems
– Behavior specified by system administrators via high-level policies
– System and its components figure out how to carry out policies
Facets of Self-Management
Self-… | The Human-Intensive Present | The Autonomic Future
Configure | Corporate data centers are multi-vendor, multi-platform. Installing, configuring, integrating systems is time-consuming, error-prone. | Automated configuration of components, systems according to high-level policies; rest of system adjusts seamlessly.
Heal | Problem determination in large, complex systems can take a team of programmers weeks. | Automated detection, diagnosis, and repair of localized software/hardware problems.
Optimize | Web servers, databases have hundreds of nonlinear tuning parameters; many new ones with each release. Adjusted manually. | Components and systems will continually strive to improve their own performance and efficiency.
Protect | Manual vulnerability analysis. Manual detection and recovery from attacks, cascading failures. | Automated defense against malicious attacks or cascading failures; use early warning to anticipate and prevent system-wide failures.
Business case:
– Increased resiliency, responsiveness, efficiency, ROI
– Reduced down-time, risk, time-to-value, cost
Evolving towards Autonomic Computing Systems (from manual to autonomic)
Level | Characteristics | Skills | Benefits
Level 1 | Multiple sources of system-generated data | Extensive, highly skilled IT staff | Basic requirements met
Level 2 | Data & actions consolidated through mgt tools | IT staff analyzes & takes actions | Greater system awareness; improved productivity
Level 3 | System monitors, correlates & recommends actions | IT staff approves & initiates actions | Less need for deep skills; faster/better decision making
Level 4 | System monitors, correlates & takes action | IT staff manages performance against SLAs | Human/system interaction; IT agility & resiliency
Level 5 | Components dynamically respond to business policies | IT staff focuses on enabling business needs | Business policy drives IT mgt; business agility and resiliency
Autonomic Computing Architecture: The Autonomic Element
AEs are the basic atoms of autonomic systems
An AE contains
– Exactly one autonomic manager
– Zero or more managed element(s)
An AE is responsible for
– Managing its own behavior in accordance with policies
– Interacting with other autonomic elements to provide or consume computational services
[Figure: An Autonomic Element. The autonomic manager (Monitor, Analyze, Plan, Execute, around shared Knowledge) sits above the managed element and reaches it through sensors (S) and effectors (E).]
E.g. database, storage, server, software app, workload mgr, sentinel, arbiter, OGSA infrastructure elements
Service-oriented architecture
Software agents
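The element structure described on this slide (one autonomic manager, a managed element, and a Monitor-Analyze-Plan-Execute loop over shared knowledge) can be sketched in a few lines of Python. This is only an illustration: the class names, the `load` sensor, the `workers` effector and the `max_load` policy key are assumptions made for the sketch, not part of the IBM architecture or toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedElement:
    """Hypothetical managed resource exposing a sensor and an effector."""
    load: float = 0.9      # sensor reading: fraction of capacity in use
    workers: int = 2       # effector setting

    def sense(self) -> dict:
        return {"load": self.load, "workers": self.workers}

    def effect(self, workers: int) -> None:
        # Toy model: the same total work spread over more workers lowers load.
        total_work = self.load * self.workers
        self.workers = workers
        self.load = total_work / workers


@dataclass
class AutonomicManager:
    """One manager per element; the knowledge dict holds the policy."""
    knowledge: dict = field(default_factory=lambda: {"max_load": 0.7})

    def monitor(self, element: ManagedElement) -> dict:
        return element.sense()

    def analyze(self, reading: dict) -> bool:
        return reading["load"] > self.knowledge["max_load"]

    def plan(self, reading: dict) -> int:
        # Smallest worker count that brings load under the policy limit.
        total_work = reading["load"] * reading["workers"]
        workers = reading["workers"]
        while total_work / workers > self.knowledge["max_load"]:
            workers += 1
        return workers

    def execute(self, element: ManagedElement, workers: int) -> None:
        element.effect(workers)

    def step(self, element: ManagedElement) -> None:
        reading = self.monitor(element)
        if self.analyze(reading):
            self.execute(element, self.plan(reading))


element = ManagedElement()
manager = AutonomicManager()
manager.step(element)
print(element.workers, round(element.load, 2))   # 3 0.6
```

The policy lives in `knowledge` rather than in the loop code, so changing `max_load` changes behavior without touching the manager.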
Autonomic Computing Architecture: Element interactions
System self-* properties and behavior arise from interactions among autonomic managers
Interactions are
– Dynamic, ephemeral
– Formed by (negotiated) agreement
– Flexible in pattern; determined by policies
Based on OGSA and specific AC extensions
– Required messages
– Optional but standard
– Application-specific
For advanced interactions: conversation support
– “Choreography” defines structure of multi-step interactions
A multi-agent system!
Overview of IBM’s Autonomic Computing Research Program
Over 150 researchers working on various aspects of Autonomic Computing
– Some projects predate AC initiative; now trying to realign them with AC architecture
Technologies for specific autonomic elements
– Database, storage, server, client…
Generic element technologies for autonomic elements
– Autonomic Manager Toolset integrates many element-level technologies: modeling, analysis, forecasting, optimization, planning, feedback control, etc.
– Uses Open Grid Services Architecture standards for inter-element communication
– Available (with ETTK v1.1) on www.alphaworks.ibm.com; open source later
Generic system-level technologies
– Dependency management, problem determination and remediation, workload management, provisioning, …
System scenarios and prototypes
– Small- to medium-scale autonomic systems
– Demonstrate self-* arising from AC architecture + technology
– Identify gaps, necessary modifications
LEarning Optimizer for DB2 (LEO)
G. Lohman, Almaden
[Figure: LEO feedback loop around SQL compilation. For each query the optimizer uses statistics and estimated cardinalities to choose the best plan, which then goes to plan execution. Loop steps: 1. Monitor (actual cardinalities), 2. Analyze (estimated vs. actual cardinalities), 3. Feedback (adjustments), 4. Exploit (adjustments used by the optimizer).]
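The monitor / analyze / feedback / exploit cycle can be illustrated with a toy adjustment store. This is a sketch under stated assumptions, not LEO's actual algorithm: the class name, the per-predicate multiplicative factor and the example predicate are all hypothetical.

```python
from collections import defaultdict


class CardinalityFeedback:
    """Toy feedback store: one multiplicative adjustment per predicate."""

    def __init__(self):
        self.adjustment = defaultdict(lambda: 1.0)

    def estimate(self, predicate: str, base_estimate: float) -> float:
        # 4. Exploit: apply the stored adjustment to the optimizer's estimate.
        return base_estimate * self.adjustment[predicate]

    def record(self, predicate: str, base_estimate: float, actual: float) -> None:
        # 1-3. Monitor the actual cardinality, analyze the error, feed it back.
        if base_estimate > 0:
            self.adjustment[predicate] = actual / base_estimate


feedback = CardinalityFeedback()
print(feedback.estimate("age > 65", 1000.0))   # 1000.0 before any feedback
feedback.record("age > 65", 1000.0, 250.0)
print(feedback.estimate("age > 65", 1000.0))   # 250.0 after feedback
```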
IBM IceCube Server
R. Freitas, Almaden
[Figure: Prototype brick (6”): (12) 2.5” disks, 8-port switch, Linux on fast CPU; (6) 10 Gbit/s capacitive “couplers” per brick; “thermal bus array”. Full IceCube system: storage bricks (blue) and compute bricks (yellow) in a 3D mesh @ 10 Gb/s per link; no connectors, wires, fibers, lasers or fans.]
Lego-like collection of “intelligent bricks”
– Fail-in-place policy: bad bricks are left in place
– 7x smaller than equivalent standard systems
– Fast, power-hungry components (CPU etc.) OK
– Includes resource allocation software
– First application: petabyte-class storage server intended to be managed by one person
SLEDS (SLA-based management of storage performance)
D. Chambliss, Almaden
Storage customers establish SLAs with the storage system
Storage system throttles optimally in accord with SLAs
[Figure: response time (ms, log scale from 1 to 1000) vs. demand (k IOPS, 0 to 100), with curves labeled “On Target” and “Missed Target”.]
[Figure: storage customers, each with a customer policy, reach the storage server through the SAN fabric; SLA server and manager.]
Personal software configuration
D. Bantz & D. Frank, Watson
[Figure: pipeline from inventory collection, to analysis (characterize inventory, using analysis rules), to plan (choose components, resolve dependencies, using planning rules, policies and a smart catalog), to install/uninstall. Other labels: PPE; Clean up? New software? Make space?]
Automate SW maintenance & migration on personal devices
– “Upgrade all my applications”
– “Make my new laptop work like the old one”
– “Migrate most valuable Palm apps to my PC”
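The planning step (“choose components, resolve dependencies”) can be sketched as a topological sort over a component catalog. This is a minimal illustration, not the project's planner: the catalog contents and the `install_order` helper are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical catalog: component -> components it depends on.
catalog = {
    "mail-client": {"crypto-lib", "ui-toolkit"},
    "ui-toolkit": {"graphics-lib"},
    "crypto-lib": set(),
    "graphics-lib": set(),
}


def install_order(wanted, catalog, installed):
    """Return missing components in an order that satisfies dependencies."""
    needed, stack = set(), list(wanted)
    while stack:                       # collect transitive dependencies
        component = stack.pop()
        if component not in needed:
            needed.add(component)
            stack.extend(catalog[component])
    graph = {c: catalog[c] & needed for c in needed}
    order = TopologicalSorter(graph).static_order()   # dependencies first
    return [c for c in order if c not in installed]


plan = install_order(["mail-client"], catalog, installed={"crypto-lib"})
print(plan)
```

Components that are already in the inventory are skipped, so the plan contains only what still has to be installed.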
Autonomic Manager Toolkit
W. Arnold et al., Watson
Facilitates autonomic manager construction
– In accordance with AC architecture
Catcher for generic AM technologies
– OGSA messaging
– Policy tools
– Monitoring technologies
– AI tools for knowledge representation, reasoning
– Math libraries for modeling, analysis, planning
– Feedback control
V1.0 available as part of Emerging Technologies Toolkit v1.1 on IBM alphaWorks (www.alphaworks.ibm.com)
Considering open source
[Figure: An Autonomic Element (autonomic manager with Monitor, Analyze, Plan, Execute and Knowledge; managed element with sensors S and effectors E)]
Policies and Autonomic Computing
D. Verma and D. Kandlur, Watson
Policy: Set of guidelines or directives provided to autonomic element to influence its behavior.
Key challenge
– Move away from low-level controls
– Move towards high-level directives (policies) over autonomic decisions
Developing scenarios, standards and technologies to support policies for autonomic computing
[Figure: two autonomic elements, each showing Monitor (M), Analyze (A), Plan (P), Execute (E), Knowledge (K), sensors (S) and effectors (E), annotated with the numbered notes below]
1. External policies are delivered through effectors.
2. Policies are stored as knowledge.
3. Analyze: analyzes system operation w.r.t. policies; creates reports as dictated by policy.
4. Plan: assigns tasks based on policies; assigns resources based on policies; enables sensors; adds/modifies/deletes policies.
5. Enabled/disabled based on policies.
6. Enabled/disabled based on external policies.
Mathematical Modeling and Optimization
M. Squillante, Watson
Develop and implement sophisticated mathematical methods and algorithms to support AC systems
Modeling
– Statistical analysis
– Stochastic models
– Forecasting
Optimization
– Discrete
– Stochastic
– Nonlinear
Control
– Control theory
– Dynamical systems
– Chaos
[Figure: queueing model with think times, router and servers]
Generic Adaptive Control
J. Hellerstein, Watson
[Figure: feedback loop around an Apache server handling web service requests. The administrator supplies targets CPU* and Mem*; the controller compares them with measured CPU and Mem and adjusts the effectors KeepAlive and MaxClients over time t.]
Feedback control to tune effectors
– Based on high-level behavioral specs
– Multiple goals
– Multiple effectors
– Time-varying demand
– Various database and server applications
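A much-simplified version of this loop, with a single goal and a single effector, shows the idea. The sketch below uses plain integral control to drive a MaxClients-like setting toward a CPU target; the linear plant model, the gain and the function names are assumptions for illustration, not the controller used in the project (which handles multiple goals and effectors).

```python
def measured_cpu(max_clients: float) -> float:
    """Toy plant model (assumption): CPU utilization grows linearly."""
    return min(1.0, 0.004 * max_clients)


def tune_max_clients(target_cpu: float, max_clients: float = 50.0,
                     gain: float = 100.0, steps: int = 30) -> float:
    """Integral control: accumulate a correction proportional to the error."""
    for _ in range(steps):
        error = target_cpu - measured_cpu(max_clients)
        max_clients = max(1.0, max_clients + gain * error)
    return max_clients


setting = tune_max_clients(target_cpu=0.6)
print(round(setting), round(measured_cpu(setting), 3))   # 150 0.6
```

The administrator states only the target utilization; the controller finds the effector setting that achieves it.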
Utility Functions and Autonomic Computing
W. Walsh, Watson
Utility functions can guide autonomic decision making within an element
– Self-optimization: natural and flexible way to express optimization criteria based on business objectives
– Avoids hard-coded preferences, special-purpose algorithms
Basis for translating business-level objectives into resource allocation objectives
– Algorithms based on modeling and optimization
[Figure: utility function V(RT) plotted against response time RT]
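How a utility function over response time turns into a resource allocation can be shown with a small search. Everything numeric here is an assumption: the linear shape of V(RT), the inverse relation between servers and response time, and the two example workloads are chosen only to make the sketch runnable.

```python
def utility(response_time: float, value: float, worst_rt: float) -> float:
    """Hypothetical V(RT): full value at RT = 0, falling linearly to 0."""
    return max(0.0, value * (1.0 - response_time / worst_rt))


def response_time(demand: float, servers: int) -> float:
    """Toy performance model (assumption): RT inversely proportional to servers."""
    return demand / servers


def best_split(total_servers: int, apps: list) -> tuple:
    """Try every split of servers between two applications."""
    best = None
    for n in range(1, total_servers):
        split = (n, total_servers - n)
        total = sum(
            utility(response_time(app["demand"], s), app["value"], app["worst_rt"])
            for app, s in zip(apps, split)
        )
        if best is None or total > best[1]:
            best = (split, total)
    return best


apps = [
    {"demand": 4.0, "value": 100.0, "worst_rt": 2.0},   # high-value workload
    {"demand": 4.0, "value": 20.0, "worst_rt": 2.0},    # low-value workload
]
split, total = best_split(10, apps)
print(split, round(total, 1))   # (7, 3) 78.1
```

No preference between the workloads is hard-coded; the split follows from the stated values alone.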
Dependency Mgt & Self-Healing
G. Kar, Watson and H. Lee & S. Ma, Watson
Determine functional dependencies among elements
– Mine design docs, system config metadata, log files
– Actively probe running system
Use dependency information for system management
– Localize problem (real-time active inference & learning)
Dependency Matrix
Probe | WS | AS | DBS | R | HWS | HAS | HDBS
pWS | 1 | 1 | 1 | 1 | 1 | 1 | 1
pAS | 0 | 1 | 1 | 1 | 0 | 1 | 1
pDBS | 0 | 0 | 1 | 1 | 0 | 0 | 1
pingR | 0 | 0 | 0 | 1 | 0 | 0 | 0
pingWS | 0 | 0 | 0 | 1 | 1 | 0 | 0
pingAS | 0 | 0 | 0 | 1 | 0 | 1 | 0
pingDBS | 0 | 0 | 0 | 1 | 0 | 0 | 1
[Figure: probe station with analysis & control, connected through the router to the web server, app server and DB server and their hosts HWS, HAS and HDBS]
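The dependency matrix supports a very simple form of problem localization: find the component whose column matches the set of failed probes. The sketch below assumes a single fault and perfectly reliable probes, which is far cruder than the real-time inference and learning the slide refers to; the `localize` function is hypothetical.

```python
COMPONENTS = ["WS", "AS", "DBS", "R", "HWS", "HAS", "HDBS"]
# Rows copied from the slide: 1 means the probe depends on the component.
DEPENDENCY = {
    "pWS":     [1, 1, 1, 1, 1, 1, 1],
    "pAS":     [0, 1, 1, 1, 0, 1, 1],
    "pDBS":    [0, 0, 1, 1, 0, 0, 1],
    "pingR":   [0, 0, 0, 1, 0, 0, 0],
    "pingWS":  [0, 0, 0, 1, 1, 0, 0],
    "pingAS":  [0, 0, 0, 1, 0, 1, 0],
    "pingDBS": [0, 0, 0, 1, 0, 0, 1],
}


def localize(failed_probes: set) -> list:
    """Return components whose failure alone explains exactly these failures."""
    suspects = []
    for column, component in enumerate(COMPONENTS):
        would_fail = {p for p, row in DEPENDENCY.items() if row[column]}
        if would_fail == failed_probes:
            suspects.append(component)
    return suspects


print(localize({"pWS", "pAS", "pingAS"}))   # ['HAS']
print(localize({"pWS"}))                     # ['WS']
```

Because every column of this matrix is distinct, each single fault produces a unique failure signature.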
Human Interaction with Autonomic Systems
P. Maglio, Almaden
Basic questions
– What do middleware administrators do?
– How can we better support the problems and practices they have?
Learn answers to these questions via ethnographic studies
Use insights to develop new ways to interact with complex computing systems
… but we thought that was the return port!
We had it wrong. Our assumption of how it worked was incorrect.
We start with looking at the proxy server log files, then the web server log files, then the application server admin log files, then the application log files.
Enterprise Workload Management
D. Dillenberger
[Figure: large, distributed, heterogeneous system: Internet, appliance servers, web application servers, data and transaction servers, Internet/extranet, business partners]
Achieves end-to-end performance via adaptive algorithms
Administrator defines policy
– Desired response times for various classes of users, apps
eWLM managers on each resource cooperate to adaptively tune parameters
– OS, network, storage, virtual server knobs
– JVM heap size, # garbage collection threads
– Workload balancing, routing parameters
Example scenario: Autonomic Data Center
[Figure: Autonomic Data Center. A resource arbiter, registry, policy repository and system manager serve two application environments, each with an application manager, router, server, database and storage. Clients 1-1, 1-2, 2-1 and 2-2 use the environments. Annotations: resource-level utility, service-level utility.]
Outline
Background and Motivation
Autonomic Computing Research at IBM
– Architecture
– Overview of Research Program
– Scenarios
Autonomic Computing Research Challenges
– Systems and Software: architecture, software engineering & tools, testing/validation; prototyping a large-scale self-* system
– Human-Computer Interaction: policies, interfaces
– Artificial Intelligence: learning, negotiation, self-healing, emergent behavior
Conclusions
Challenge: Architecture
Define set of fundamental architectural principles from which self-* emerges
AE: How to coordinate multiple threads of activity?
– AEs live in complex environments
– Multiple task instances and types: concurrent, asynchronous
– Multiple interacting expert modules
AE: How to detect/resolve conflicts arising from
– Internal decisions by independent expert modules
– External directives (possibly asynchronous)
– Internal policies vs. external directives
System-level: Enable more flexible, service-oriented patterns of interaction
– As opposed to traditional top-down, hierarchical systems management
– Multi-agent architecture: communication; representing and reasoning about needs, capabilities, dependencies
[Figure: An Autonomic Element (autonomic manager with Monitor, Analyze, Plan, Execute and Knowledge; managed element with sensors S and effectors E)]
Challenge: Software engineering and programming tools
Develop appropriate software engineering concepts and programming tools for composing autonomic elements and systems; support for
– Monitoring, analysis, planning and execution
– Expressing and understanding policies
– Interactions with other elements: negotiation; monitoring and enforcing agreements
Challenge: Testing and Verification
Develop methods for testing and verifying behavior of autonomic elements
– Testbeds and simulation environments
– In situ mechanisms that permit new versions of software to run alongside old versions until they have established their trustworthiness
Challenge: Policy
Policy: “Set of guidelines or directives provided to autonomic element to influence its behavior”
[Figure: An Autonomic Element (autonomic manager with Monitor, Analyze, Plan, Execute and Knowledge; managed element with sensors S and effectors E)]
Human interface
– Authoring and understanding policies
– Avoiding or ameliorating specification errors
Developing a universal representation and grammar
– Many different application domains, disciplines
– Many different flavors of policy
– Covers service agreements too?
Algorithms that operate upon policies (and agreements?)
– Automated derivation of actions (e.g. planning, optimization)
– Automated derivation of lower-level policies from high-level policies
– E.g. “Maximize profit from this set of service contracts”
Conflict resolution
– Both design time and run time
– Need to establish protocols, interfaces, algorithms
© 2003 IBM Corporation | Research Challenges in Autonomic Computing | CMU, September 4, 2003
Three flavors of policy (policy = “decision-making guide”)
[Figure: from current state S, actions a1, a2 and a3 lead to possible states 1, 2 and 3]
Action rule
– If (S) then do a2
– Results implicitly in desired state 2
Goal
– Achieve a most desired state 2
– Compute a2 most likely to result in 2
– Assumes that most desired state can be determined a priori
Utility function
– Achieve state with maximal net value: the value V of the resulting state minus the cost C of the action taken from S
– Benefit and burden of being explicit about value
– States have intrinsic value; value of policy is a derived quantity
[Figure: hierarchy of specifications. Levels shown: machine code; rules; actions; workflows; element goals; element utility functions; system utility functions; [more levels of code hierarchy]; higher-level specifications. Links between levels: programming; adapters, translators; generative planning; optimization; modeling, optimization; decision-theoretic planning.]
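The three flavors can be put side by side on the slide's small example of one current state and three actions. The transition probabilities, state values and action costs below are invented for illustration; only the structure (rule, goal, utility) comes from the slide.

```python
# Hypothetical model: P(resulting state | action) from current state S.
TRANSITIONS = {
    "a1": {"state1": 0.8, "state2": 0.1, "state3": 0.1},
    "a2": {"state1": 0.1, "state2": 0.7, "state3": 0.2},
    "a3": {"state1": 0.1, "state2": 0.2, "state3": 0.7},
}
VALUE = {"state1": 10.0, "state2": 50.0, "state3": 60.0}   # V(state), assumed
COST = {"a1": 1.0, "a2": 5.0, "a3": 30.0}                  # C(action), assumed


def action_rule(current_state: str) -> str:
    """Action rule: the action is written down directly."""
    return "a2" if current_state == "S" else "a1"


def goal_policy(goal_state: str) -> str:
    """Goal: pick the action most likely to reach the desired state."""
    return max(TRANSITIONS, key=lambda a: TRANSITIONS[a][goal_state])


def utility_policy() -> str:
    """Utility function: maximize expected value of the outcome minus cost."""
    def net_value(action: str) -> float:
        expected = sum(p * VALUE[s] for s, p in TRANSITIONS[action].items())
        return expected - COST[action]
    return max(TRANSITIONS, key=net_value)


print(action_rule("S"), goal_policy("state2"), utility_policy())   # a2 a2 a2
print(goal_policy("state3"))                                        # a3
```

With these numbers state 3 has the highest value, yet the utility policy still picks a2 because a3 is expensive; a goal of "reach state 3" cannot express that trade-off.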
Policies: Theory meets Reality
We can’t specify the full state of the world
– Policy conflicts can arise from incomplete descriptions of state
– E.g. different action-rule antecedents can apply to same state, but have conflicting consequents
– Goal-type policies can conflict too (sets of acceptable and feasible states don’t intersect)
It’s hard to elicit a full specification of desired behavior from people
– Preference elicitation is difficult when there are many attributes
– But people are good at noticing when the system isn’t behaving as they like
– “Complaint-based tuning” (Ganger, CMU)
Can a universal representation and calculus handle such a broad range?
– Storage, network, database, server, etc.
– Temporal conditions; correlations
– Access control
– Classification
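The action-rule conflict described on this slide (antecedents that can hold in the same state, consequents that disagree) can be detected mechanically when rules are written over numeric ranges. The rule format, rule names and attributes below are assumptions made for this sketch.

```python
from itertools import combinations

# Hypothetical action rules: antecedent = closed numeric ranges per attribute;
# an attribute that is not mentioned is unconstrained (incomplete description).
RULES = [
    {"name": "save-power", "when": {"load": (0.0, 0.5)}, "do": ("servers", "remove")},
    {"name": "meet-sla", "when": {"response_ms": (200.0, 1e9)}, "do": ("servers", "add")},
    {"name": "scale-up", "when": {"load": (0.8, 1.0)}, "do": ("servers", "add")},
]


def antecedents_overlap(a: dict, b: dict) -> bool:
    """True if some state satisfies both antecedents."""
    for attribute in a.keys() & b.keys():
        (lo_a, hi_a), (lo_b, hi_b) = a[attribute], b[attribute]
        if hi_a < lo_b or hi_b < lo_a:
            return False
    return True


def find_conflicts(rules: list) -> list:
    """Pairs of rules that can fire together but demand different actions."""
    conflicts = []
    for r1, r2 in combinations(rules, 2):
        same_target = r1["do"][0] == r2["do"][0]
        different_action = r1["do"][1] != r2["do"][1]
        if same_target and different_action and antecedents_overlap(r1["when"], r2["when"]):
            conflicts.append((r1["name"], r2["name"]))
    return conflicts


print(find_conflicts(RULES))   # [('save-power', 'meet-sla')]
```

The conflict arises precisely because neither rule describes the full state: low load and a slow response can occur together.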
Challenge: Human-System Interface
Develop new languages, metaphors and translation technologies that enable humans to
– Monitor, visualize, and control AC systems
– Specify goals and objectives to AC systems, and visualize their potential effect
Techniques must be
– Sufficiently expressive of preferences regarding cost vs. performance, security, risk and reliability
– Sufficiently structured and/or naturally suited to human psychology and cognition to keep specification errors to an absolute minimum
– Robust to specification errors
Challenge: Learning
Single element level
– AE needs to learn a model of itself and environment quickly; environment is noisy, and dynamic in both state and structure
– On-line, so exploration of the space can be costly and/or harmful
– May be several hundreds of tunable parameters!
  – Maybe only a few dozen are relevant, but which ones?
  – Some of them can only be changed upon reboot – is it worthwhile?
System level
– Multi-agent system: several interacting learners
– What are good learning algorithms for cooperative, competitive systems?
  – What are conditions for stability?
  – What is sensitivity to perturbations?
– Opportunities for layered learning
Establish theoretical foundation for understanding and performing learning and optimization in multi-agent systems.
Challenge: Negotiation
Develop and analyze
– Methods for expressing or computing preferences
– Negotiation protocols
– Negotiation algorithms
Establish theoretical foundation for negotiation
– Explore conditions under which to apply
  – Bilateral
  – Multi-lateral (mediated, or not)
  – Supply-chain
– Study how system behavior depends on mixture of negotiation algorithms in AE population
Challenge: Self-Healing Systems
[Figure: self-healing architecture. A problem diagnosis/localization manager with GUI, inference & learning engines, probe driver, real-time event manager, diagnostic state, dependency info and config, simulator & action manager, and problem determination DB; a probe station, remediator and dependency agent work against the web server, database and network router.]
Develop robust, scalable approaches to monitoring/controlling health, security and performance of autonomic systems
– Automated capture of human expert knowledge about problem diagnosis and recovery
– Predictive, adaptive diagnosis/recovery
– Data mining to learn correlated event patterns for diagnosis
– Automated learning and execution of appropriate recovery plan
– Construction and learning of adaptive statistical models of large networked systems
And do it all without being too invasive!
Challenge: Control and Harness Emergent Behavior
Understand, control, and exploit emergent behavior in autonomic systems
– How do self-*, stability, etc. depend on
  – Behaviors and goals of the autonomic elements
  – Pattern and type of interactions among AEs
  – External influences and demands on system
– Invert relationship to attain desired global behavior
  – How?
  – Are there fundamental limits?
Develop theory of interacting feedback loops
– Hierarchical
– Distributed
Conclusions
Autonomic Computing is a grand challenge, requiring advances in several fields of science and technology
– Policy, planning, learning, knowledge representation, multi-agent systems, negotiation, emergent behavior
– Human-system interfaces
Integrating these technologies to support self-management in complex, realistic environments is a research challenge in itself
– What are the best architectures and design patterns? Role of (multi-)agent systems?
– Building system prototypes is key to developing and validating AC technology and architecture
What to do if you’re interested in working on these problems
– Just go do it and publish your results
– Find an IBM Researcher who is interested in collaborating with you (I can help)
– Get them to help you pursue a faculty award or equipment grant
How can we establish a research community around autonomic computing?
– International Conference on Autonomic Computing, May 17-18, 2004, New York City
  – Co-located with WWW 2004
  – Co-chair: Manish Parashar
– What about defining challenge problems?
  – We have developed several realistic industry scenarios that could serve as a basis
Additional Information
A Vision of Autonomic Computing
– IEEE Computer, January 2003
IBM Systems Journal special issue on Autonomic Computing
– http://www.research.ibm.com/journal/sj42-1.html
Web site
– www.research.ibm.com/autonomic
International Conference on Autonomic Computing
– www.autonomic-conference.org
– May 17-18, 2004, New York City
– Submission deadline: January 12, 2004
Backup Slides
Other Autonomic Computing Workshops and Conferences
First Workshop on Algorithms and Architectures for Self-Managing Systems (at FCRC ’03)
– June 11, 2003 in San Diego, CA
5th Annual International Conference on Active Middleware Services: Autonomic Computing Workshop
– June 25, 2003 in Seattle, WA
IJCAI-03 AI and Autonomic Computing: Developing a Research Agenda for Self Managing Computer Systems
– August 10, 2003 in Acapulco, Mexico
First International Workshop Autonomic Computing Systems at 14th International Conference on Database and Expert Systems Applications (DEXA'2003)
– 1-5 September, 2003 in Prague, Czech Republic
14th IFIP/IEEE International Workshop on Distributed Systems: Operations & Management (DSOM-03)
– October 20-22, 2003 in Heidelberg, Germany
Challenge: Putting it all together into a self-managing system
Autonomic Thermostat scenario
[Figure: one controller over four thermostats, each attached to an AC mechanism]
Controller
– Locus of high-level policy optimization
– Authority over thermostats in domain
Thermostat
– Local knowledge of environment
– Direct control of cooling mechanism
– Varying degrees of sophistication
Scenario: Autonomic Thermostat
Value function: How much would you pay to get temperature T?
[Figure: value vs. temperature over 50 to 90; marked points $100 at 72 and $80 at 76]
Cost function: How costly is it to attain temperature T?
[Figure: cost vs. temperature over 50 to 90, cost axis up to $100; costs of $25 and $35 marked at temperatures 72 and 76]
Controller policy: choose temperature that maximizes
U(Temperature) = Value(Temperature) – Cost(Temperature)
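The controller policy is a one-line maximization once value and cost are available. The sketch below uses the value points from the slide; the pairing of the two cost figures with temperatures did not survive extraction, so the cost table is an assumption (cooling further is taken to cost more).

```python
# Points read off the slide's value plot: $100 at 72 degrees, $80 at 76.
VALUE = {72: 100.0, 76: 80.0}
# Assumption: the slide shows costs of $25 and $35 but not which temperature
# each belongs to; the lower temperature is given the higher cost here.
COST = {72: 35.0, 76: 25.0}


def utility(temperature: int) -> float:
    """U(T) = Value(T) - Cost(T), as on the slide."""
    return VALUE[temperature] - COST[temperature]


def choose_temperature(candidates) -> int:
    """Controller policy: pick the candidate with maximal utility."""
    return max(candidates, key=utility)


best = choose_temperature([72, 76])
print(best, utility(best))   # 72 65.0
```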
Scenario: Autonomic Thermostat
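[Figure: information flow among thermostats (each with an AC mechanism), the controller, the policy repository and the power company. The controller asks a thermostat for kWh1(T), which the thermostat derives from kWh(Tcurrent, Textern, T) and returns (plot: kWh vs. temperature over 50 to 90). The policy repository holds the value function (plot: $100 at 72, $80 at 76). The power company supplies the cost C(kWh) (plot: cost vs. kWh over 0 to 10). The controller determines T* that maximizes V1(T) – C1(T).]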
Scenario: Autonomic Thermostat: Conflict Resolution
Controller
– Temp. goal = T* +/- δ*
Manual control
– Temp. goal = T’ +/- δ’
Action Policies
1. If (in cooling mode && Tcurr < T* - δ*) then turn AC off
2. If (in cooling mode && Tcurr > T* + δ*) then turn AC on
Priority Policies
1. Abide by temp goal from entity with higher authority
2. If (cost exceeds X) reset temp goal to affordable value
[Figure: thermostat and AC mechanism receiving goals from the controller and from manual control; value function (value vs. temperature, $100 at 72, $80 at 76) and cost function (cost vs. temperature, $25 and $35 marked)]
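The action policies and priority policies combine naturally: first settle which temperature goal applies, then run the on/off rule with a tolerance band around it. In this sketch the authority levels (manual control outranking the controller), the cost model, the cost limit and the affordable fallback temperature are all assumptions chosen for illustration.

```python
def ac_command(t_curr: float, t_goal: float, band: float, ac_on: bool) -> bool:
    """Action policies 1 and 2 in cooling mode; returns the new AC state."""
    if t_curr < t_goal - band:
        return False          # policy 1: too cold, turn AC off
    if t_curr > t_goal + band:
        return True           # policy 2: too warm, turn AC on
    return ac_on              # inside the band: leave the AC as it is


def effective_goal(goals: list, cost_of, cost_limit: float, affordable: float) -> float:
    """Priority policies: highest authority wins, then apply the cost cap."""
    goal = max(goals, key=lambda g: g["authority"])["temp"]   # priority policy 1
    if cost_of(goal) > cost_limit:                            # priority policy 2
        goal = affordable
    return goal


goals = [
    {"source": "controller", "authority": 1, "temp": 74.0},   # T*
    {"source": "manual", "authority": 2, "temp": 70.0},       # T'
]
# Hypothetical cost model: each degree below 80 costs $5.
goal = effective_goal(goals, cost_of=lambda t: 5.0 * (80.0 - t),
                      cost_limit=40.0, affordable=72.0)
print(goal, ac_command(t_curr=75.0, t_goal=goal, band=1.0, ac_on=False))   # 72.0 True
```

The manual goal of 70 wins on authority but would cost $50, above the $40 limit, so the goal is reset to 72; at 75 degrees the AC then turns on.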