O'Reilly Autonomic Computing Talk - IBM - United States
TRANSCRIPT
Robert Morris
Autonomic Computing
Director of IBM Almaden Research Center; VP, Personal Systems & Storage
IBM Almaden Research Center
What $1000 Buys: Computations per Second
[Chart: computations per second per $1000, 1900-2020, log scale from 1E-5 to 1E+12, spanning five technology eras: Mechanical, Electro-mechanical, Vacuum Tube, Discrete Transistor, Integrated Circuit.]
After: "Mind Children: The Future of Robot and Human Intelligence," Hans Moravec, Harvard University Press, 1988; "The Age of Spiritual Machines: When Computers Exceed Human Intelligence," Ray Kurzweil, Viking, 1999.
The High Cost of IT Management
For example, the cost to manage storage is typically twice the cost of the storage system itself.
Storage: what $3 million bought in 1984 and in 2000.
[Chart: in 1984, $3 million bought a $2 million storage system plus $1 million of storage administration; in 2000, it bought a $1 million system plus $2 million of storage administration.]
(1) J. P. Gelb, "System-managed storage," IBM Systems Journal, Vol. 28, No. 1, 1989, pp. 77-103.
(2) "Storage on Tap: Understanding the Business Value of Storage Service Providers," ITCentrix report, March 2001.
(3) "Server Storage and RAID Worldwide" (SRRD-WW-MS-9901), Gartner Group/Dataquest report, May 1999.
Making the Front Page
America Online6 August 1996 outage: 24 hoursMaintenance/Human ErrorCost: $3 million in rebatesInvestment: ???
AT&T13 April 1998 outage: Six to 26
hoursSoftware UpgradeCost: $40 million in rebatesForced to file SLAs with the
FCC (frame relay)
eBayOutage: 22 hours 12 June 1999Operating System FailureCost: $3 million to $5 million
revenue hit and 26% declinein stock price
E*Trade3 February 1999 through 3 March 1999: Four outages of at least five
hours System UpgradesCost: ???? 22 percent stock price hit on 5
February 1999
Dev. Bank of Singapore1 July 1999 to August 1999: Processing ErrorsIncorrect debiting of POS
due to a system overloadCost: Embarrassment/loss of
integrity; interest charges
Charles Schwab & Co.24 February 1999 through 21 April 1999: Four outages of at least four hours Upgrades/Operator ErrorsCost: ???; Announced that it had made $70
million in new infrastructure investment.
Causes of UnplannedApplication Downtime
TechnologyFailures20%
40%
40%
OperatorErrors
ApplicationFailures
NYSEJune 8, 2001>1700 stocks stopped trading for 90 minutesSoftware UpgradeCost: ???
©
The Bad News: Complexity
Complex heterogeneous infrastructures are a reality, and they are hard!
[Diagram: a typical e-business infrastructure - Internet, firewall, cache, load balancer, DNS server, web servers, application servers, data servers, director and security services, existing applications and data, business data, a storage area network, and business processes and external services.]
Dozens of systems and applications
Hundreds of components
Thousands of tuning parameters
Autonomic Computing Characteristics
Self-configuring
Adapt automatically to dynamically changing environments
Self-optimizing
Monitor and tune resources automatically
Self-protecting
Anticipate, detect, identify, and protect against attacks from anywhere
Self-healing
Discover, diagnose, and react to disruptions
The Scope of Autonomic Computing
Applications
Middleware, Software
Operating Systems
Server, Storage, Network
Holistic approach:
• Automation & manageability enablement at each system layer
• Federated heterogeneous components interacting cohesively
Autonomic Computing Evolution
[Chart: levels of sophistication, from known examples through current directions to the future goal.]
Known examples (components): Electronic Switching Systems, RAID / IBM Shark, DB Optimizer, Virus Management, Software Rejuvenation
Current directions: eLiza, SMART/LEO
Future goal: more of the same, and better
Autonomic Computing is shipping now
eLiza features on IBM's e-servers:
Self-Optimizing (dynamic service-level attainment): Dynamic LPAR; Intelligent Resource Director
Self-Configuring (define "on the fly"): Remote Deployment Manager; auto-discovery and update of firmware
Self-Healing (business continuance): Software Rejuvenation; automatic de-allocation of processors, cache, LPAR
Self-Protecting (safeguard assets): self-protecting kernel; agent-building learning environment
Optimizing an RDBMS
The DB2 Optimizer considers the environment (CPU count & speed; disk count, latency, and throughput; available memory; parallelism; concurrency) and the data (amount, distribution, and patterns - e.g., by product, store, and month), and performs extensive rewriting of complex SQL requests. No intervention required! Efficient system usage! Excellent performance!
P. G. Selinger et al., "Access Path Selection in a Relational Database Management System," SIGMOD 1979, pp. 23-34.
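Selinger-style optimization can be caricatured in a few lines: cost each candidate access path from catalog statistics and keep the cheapest. The sketch below is purely illustrative - the cost formulas and numbers are invented for exposition, not DB2's actual cost model:

```python
# Toy cost-based access-path selection, in the spirit of Selinger et al. (1979).
# All cost formulas and catalog numbers here are illustrative, not DB2's.

def cost_table_scan(pages):
    """A sequential scan reads every page once."""
    return pages

def cost_index_scan(pages, rows, selectivity, index_levels=3):
    """B-tree descent plus one page fetch per qualifying row (worst case)."""
    return index_levels + rows * selectivity

def choose_access_path(pages, rows, selectivity):
    """Cost each plan and return the name of the cheapest one."""
    plans = {
        "table scan": cost_table_scan(pages),
        "index scan": cost_index_scan(pages, rows, selectivity),
    }
    return min(plans, key=plans.get)

# A selective predicate favors the index; a non-selective one favors the scan.
print(choose_access_path(pages=1000, rows=100_000, selectivity=0.001))  # index scan
print(choose_access_path(pages=1000, rows=100_000, selectivity=0.5))    # table scan
```

The point of the slide is that this decision depends on statistics the system gathers itself - no administrator picks the plan.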
LEO: Learning in Query Optimization
[Diagram, built up in four steps: during SQL compilation, statistics feed the optimizer, which chooses the best plan; the plan then executes. LEO closes the loop: 1. Monitor - record actual cardinalities during plan execution; 2. Analyze - compare them against the optimizer's estimated cardinalities (stats); 3. Feedback - derive adjustments; 4. Exploit - apply the adjustments in subsequent optimizations.]
Michael Stillger, Guy Lohman, Volker Markl, Mokhtar Kandil, "LEO - DB2's LEarning Optimizer," Proceedings of the Intl. Conf. on Very Large Data Bases (VLDB), Sept. 2001.
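The monitor-analyze-feedback-exploit cycle can be sketched in a few lines. The multiplicative-adjustment scheme below is a simplification of what the LEO paper describes, and all names are illustrative:

```python
# Sketch of LEO-style cardinality feedback (a simplification for exposition):
# compare the optimizer's estimated cardinality with the cardinality observed
# at run time, and remember a correction factor for future compilations.

adjustments = {}  # predicate text -> learned correction factor

def learn(predicate, estimated, actual):
    """Steps 1-3: monitor actual cardinality, analyze the error, feed back."""
    adjustments[predicate] = actual / estimated

def adjusted_estimate(predicate, estimated):
    """Step 4: exploit the learned factor the next time this predicate appears."""
    return estimated * adjustments.get(predicate, 1.0)

# The optimizer guessed 100 rows for a predicate; execution produced 5000.
learn("price > 100", estimated=100, actual=5000)
print(adjusted_estimate("price > 100", 100))  # 5000.0 - the estimate is corrected
```

The system tunes its own statistics from observed behavior, with no DBA in the loop - the essence of self-optimization.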
Autonomic Computing Evolution
[The evolution chart, now with a second level of sophistication - homogeneous components interacting.]
Known examples: RAID / ESS, DB Optimizer, Virus Management, Software Rejuvenation; adaptive network routing, network congestion control, high-availability clustering
Current directions: eLiza, SMART/LEO, Oceano, Collective Intelligent Storage Bricks
Future goal: more of the same, and better; new packaging concepts for storage; subscription computing
Oceano: Technology for Multi-Customer Server Farms
Today: fixed resource allocation; separate management; best-effort basis, using one's own resources.
Future: virtualized hardware and storage. A router throttles incoming requests, and servers are dynamically allocated to customer farms (e.g., IBM, Macy's SportsWeb) from a managed "free pool."
Challenges: SLA management, security, privacy, load sharing, overload control, accounting.
K. Appleby et al., "Oceano - SLA Based Management of a Computing Utility," 7th IFIP/IEEE International Symposium on Integrated Network Management (IM), 2001.
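The free-pool idea can be sketched in miniature: when a customer's per-server load threatens its SLA, grow that farm from the shared pool. Everything here - the server names, the load model, the single utilization threshold - is an invented illustration, not Oceano's actual policy engine:

```python
# Toy SLA-driven allocation from a shared free pool (names and the load
# model are illustrative; Oceano's real policies also cover security,
# accounting, and overload control).

free_pool = ["s1", "s2", "s3", "s4"]
farms = {"IBM": ["s5"], "SportsWeb": ["s6"]}
sla_max_load = 0.8  # per-server utilization target taken from the SLA

def rebalance(farm, load_per_server):
    """Grow a customer's farm from the free pool while its SLA is at risk."""
    while load_per_server(farm) > sla_max_load and free_pool:
        farms[farm].append(free_pool.pop())

# SportsWeb sees a total load of 2.4 spread evenly over its servers.
rebalance("SportsWeb", lambda f: 2.4 / len(farms[f]))
print(farms["SportsWeb"], free_pool)  # farm grows to 3 servers; 2 left in pool
```

A symmetric rule would return idle servers to the pool, which is what makes the same hardware serve multiple customers over time.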
Collective Intelligent Storage Bricks
[Diagram: a three-dimensional array of identical intelligent bricks.]
• Higher redundancy than RAID
• Cooling performance hot spots by proactive copies
• Improved sparing to eliminate repair actions for the life of the system
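The "no repair actions" claim rests on distributed sparing: data is replicated across bricks, and when a brick dies its chunks are simply re-replicated onto the survivors' spare capacity. A minimal sketch, assuming simple whole-chunk replication (the brick count, replica count, and placement policy are all illustrative):

```python
import random

# Sketch of repair-free sparing across intelligent bricks (illustrative):
# each chunk lives on REPLICAS distinct bricks; when a brick fails, its
# chunks are re-replicated onto surviving bricks - no service call needed.

bricks = {b: set() for b in range(8)}  # brick id -> chunk ids held
REPLICAS = 3

def place(chunk):
    """Initial placement: put the chunk on REPLICAS distinct bricks."""
    for b in random.sample(sorted(bricks), REPLICAS):
        bricks[b].add(chunk)

def fail(dead):
    """On brick failure, restore each lost chunk's replica count in place."""
    lost = bricks.pop(dead)
    for chunk in lost:
        survivors = [b for b in bricks if chunk not in bricks[b]]
        bricks[random.choice(survivors)].add(chunk)

for c in range(20):
    place(c)
fail(0)
# Every chunk still has REPLICAS copies among the surviving bricks.
print(all(sum(c in held for held in bricks.values()) == REPLICAS
          for c in range(20)))
```

With enough spare capacity provisioned up front, this loop can absorb failures for the lifetime of the system, which is the slide's point.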
IceCube: Collection of Intelligent Bricks
[Photos: storage bricks assembled into various "Ice Cube" shapes.]
Collective Intelligent Bricks
IceCube: up to 1 petabyte (10^15 bytes)
Air-cooled: total power = 250 kW; >75 dB air noise; 32 racks; 640 CIBs; 8 x 240 GB 3.5" disks per CIB; 275 W per CIB; 5.5 kW per rack
Water-cooled: total power = 220 kW; quiet (64 dB); water chillers drawing 31 kW and 25 kW; 640 CIBs; 8 x 240 GB 3.5" disks per CIB; 275 W per CIB
Subscription Computing e-Utility
Single point of contact; customization; protection; personalization; 24x7 support; price-per-seat offering; training & education; problem detection/resolution; hassle-free IT; staying up to date.
David Bantz, Ajay Mohindra, and Dennis Shea, "Subscription Computing," submitted to IEEE Internet Computing.
Virtualization on the Client
[Diagram: a hypervisor hosts two virtual machines, each with its own operating system and apps. Switched resources: context (registers). Emulated resources: graphics, NIC. Each VM also gets dedicated resources (RAM, USB peripherals), alongside shared resources.]
Virtualization on the Client (continued)
[Diagram: today, a single Windows OS runs corporate and personal apps behind a firewall with security features and a secure, special-purpose link to a service provider. This evolves to a trusted client (secure Linux) hosting separate environments - old apps (Win 95), grid computing apps (Linux or Windows), personal apps (Win2K, ME), highly managed corporate apps (Win 2K), and rented apps (Win 2K, XP) - with autonomic connectivity to all networks.]
Autonomic Computing Evolution
[The evolution chart, now with a third level of sophistication - heterogeneous components interacting.]
Known examples: SNMP; adaptive network routing, network congestion control; high-availability clustering; RAID / ESS; DB Optimizer; Virus Management; Software Rejuvenation
Current directions: eLiza, SMART/LEO, Oceano, Collective Intelligent Storage Bricks, Mounties, Workload Management, the Autonomic Computing Stack
Future goal: more of the same, and better; new packaging concepts for storage; subscription computing; social policy; DB/storage co-optimization
Goal-oriented Recovery in a Heterogeneous System
[Diagram: Mounties Central receives events from resource managers (RMgrs) through an event facility and issues commands via Mounties Agents, on top of a cluster infrastructure providing registry, heartbeat, and messaging services.]
S. Fakhouri et al., "Active Middleware Services in a Decision Support System for Managing Highly Available Distributed Resources," Middleware 2000.
Goal-oriented Recovery in a Heterogeneous System
[Diagram: Mounties Central internals - event handling (fed by events from the event facility), pre-processor, optimizer, evaluator & decision processing service, post-processor, gossamers, and a repository.]
S. Fakhouri et al., "Active Middleware Services in a Decision Support System for Managing Highly Available Distributed Resources," Middleware 2000.
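"Goal-oriented" here means the system is told what should hold (a service running with its dependencies satisfied), not which node to use; on a failure event it re-evaluates and picks a new placement. A minimal sketch - the node names, resource sets, and service are invented, not Mounties' actual model:

```python
# Toy goal-oriented recovery in the spirit of Mounties (all names are
# illustrative): the goal is "run the service on some live node that has
# all its required resources"; a failure event triggers re-evaluation.

nodes = {"n1": {"db", "net"}, "n2": {"net"}, "n3": {"db", "net"}}
goal = {"service": "payroll", "needs": {"db", "net"}}

def place(goal, up_nodes):
    """Evaluator & decision: choose any live node satisfying the goal."""
    for node, resources in up_nodes.items():
        if goal["needs"] <= resources:
            return node
    return None

placement = place(goal, nodes)   # initially lands on "n1"
nodes.pop(placement)             # failure event: that node dies
placement = place(goal, nodes)   # recovery: simply re-run the decision
print(placement)                 # "n3" - the surviving node with db and net
```

Because recovery is just "re-run the decision procedure against the goal," the same code handles node failures, resource failures, and planned maintenance.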
Autonomic Computing Stack
[Diagram: humans and business processes, connected via the Internet, sit atop a stack of application and integration middleware, database and/or file system, operating system, and storage system - with an Autonomic Computing Agent attached to each layer.]
The Autonomic Agent
[Diagram: a policy-based autonomic agent runs a (monitor, diagnose, act)* loop - possibly model-based - over a managed component. Inputs: measurements from the component, history, and workload and service agreements (goals, SLAs, etc.). Outputs: hints and directions to the component; alerts, measurements, and policy exchange and negotiation with other agents; plus an "umbilical" link.]
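The (monitor, diagnose, act)* loop fits in a handful of lines. Everything below - the toy latency model, the policy numbers, the class names - is an invented illustration of the loop's shape, not eLiza or agent code:

```python
# Minimal sketch of a policy-based autonomic agent's (monitor, diagnose, act)*
# loop. The managed component, its latency model, and the policy numbers are
# all illustrative.

policy = {"target_ms": 200}  # the "goals / SLAs" input to the agent

class ManagedComponent:
    def __init__(self):
        self.capacity = 1
    def measure_latency(self):
        return 500 / self.capacity  # toy model: latency falls as capacity grows

def control_loop(component, steps=5):
    history = []  # the agent's "history" input
    for _ in range(steps):
        latency = component.measure_latency()   # monitor
        history.append(latency)
        if latency > policy["target_ms"]:       # diagnose against policy
            component.capacity += 1             # act: add capacity
    return history

comp = ManagedComponent()
print(control_loop(comp))  # latency converges below the 200 ms target
```

A real agent would also consult a model, negotiate policy with peer agents, and raise alerts, but the closed loop over measurement and policy is the core idea.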
Autonomic Computing Evolution
[The evolution chart, now with a fourth level of sophistication - serving the world (people, business processes).]
Known examples: SMS; SNMP; adaptive network routing, network congestion control; high-availability clustering; RAID / ESS; DB Optimizer; Virus Management; Software Rejuvenation
Current directions: eLiza, SMART/LEO, Oceano, Collective Intelligent Storage Bricks, Mounties, Workload Management, the Autonomic Computing Stack, Storage Tank policy management
Future goal: more of the same, and better; new packaging concepts for storage; subscription computing; policy language and protocols; social policy; DB/storage co-optimization
Policy-Managed Storage: Storage Tank
[Diagram: AIX, Windows 2000, Solaris, and Linux clients share data over a SAN; a cluster of metadata controllers, reached over an IP network, handles metadata policy, data sharing, and data backup and restore.]
• Automated, policy-based storage and data management
• High-performance, multi-platform file sharing
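"Policy-based" management means file attributes are matched against administrator-written rules to decide placement and handling. A minimal sketch, assuming simple first-match-wins rules - the rule predicates and pool names are invented, not Storage Tank's actual policy language:

```python
# Toy policy-based file placement in the spirit of a Storage Tank-style
# system. The predicates and pool names are illustrative only.

policies = [
    # (predicate over file attributes, storage pool) - first match wins
    (lambda f: f["name"].endswith(".mpg"),   "near-line"),
    (lambda f: f["owner"] == "payroll",      "mirrored"),
    (lambda f: f["size"] > 1_000_000_000,    "archive"),
]

def assign_pool(file_attrs, default="standard"):
    """Apply the ordered policy rules; fall back to the default pool."""
    for predicate, pool in policies:
        if predicate(file_attrs):
            return pool
    return default

print(assign_pool({"name": "ledger.db", "owner": "payroll", "size": 10_000}))
# "mirrored" - the payroll rule matches before any later rule is tried
```

The metadata controllers evaluate such rules centrally, so every client platform sees the same automated placement decisions.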
Autonomic Computing Evolution
[The complete evolution chart: components; homogeneous components interacting; heterogeneous components interacting; serving the world (people, business processes) - reaching the future goal through collaboration among academia, government, and industry.]
Known examples: SMS; SNMP; adaptive network routing, network congestion control; high-availability clustering; RAID / ESS; DB Optimizer; Virus Management; Software Rejuvenation
Current directions: eLiza, SMART/LEO, Oceano, Collective Intelligent Storage Bricks, Mounties, Workload Management, the Autonomic Computing Stack, Storage Tank policy management
Future goal: more of the same, and better; new packaging concepts for storage; subscription computing; policy language and protocols; social policy; DB/storage co-optimization
What others are saying about Autonomic Computing:
"We need to focus on: availability... maintainability... scalability... cost... performance..."
John L. Hennessy, President, Stanford University
"Improving recovery/repair [Recovery-Oriented Computing] improves availability."
David A. Patterson, Pardee Chair of Computer Science, University of California, Berkeley
Presentations to the IBM Almaden Institute, "Autonomic Computing," April 2002.
What others are saying about Autonomic Computing:
"Trouble-free systems: build a system used by millions of people each day, administered and managed by a half-time person."
Jim Gray, Distinguished Engineer and Manager, Microsoft's Bay Area Research Center
"For computers to be taken for granted, they must always be available wherever and whenever people need them, they must reliably protect personal information from misuse and give people control over how their data is used, and they must be unfailingly secure. We call this concept Trustworthy Computing."
Bill Gates, Chairman and Chief Software Architect, Microsoft
What others are saying about Autonomic Computing:
"Planetary scale computing: a new computing model that allocates IT resources on demand anywhere."
Patrick Scaglia, Center Director, Internet and Computing Platforms Technologies Center, HP Laboratories
"The new economics requires that systems be autonomic: autoinstalling, automanaging, autohealing, and autoprogramming."
Vinod Khosla, General Partner of Kleiner Perkins Caufield & Byers
Presentations to the IBM Almaden Institute, "Autonomic Computing," April 2002.
The Autonomic Computing Challenge
The problem has been with us for a long time and will not be solved overnight.
A successful approach will be open, interdisciplinary, ambitious, cooperative, and real.
Participation of academia, government, and industry is needed.
True autonomic computing is inevitable, but we must act now to drive the vision.
We welcome cooperation to develop the necessary standards.
Autonomic Computing is our next Grand Challenge.
http://www.ibm.com/research/autonomic