oreilly ac talk - ibm - united states


Page 1: Oreilly AC Talk - IBM - United States

Autonomic Computing

Robert Morris
Director of IBM Almaden Research Center
VP, Personal Systems & Storage
[email protected]

IBM Almaden Research Center

©

Page 2: Oreilly AC Talk - IBM - United States

$1000 Buys

Figure: computations per second that $1000 buys, by year. Vertical axis: Computations per second (1E-5 to 1E+12). Horizontal axis: Year (1900 to 2020). Technology eras shown: Mechanical, Electro-mechanical, Vacuum Tube, Discrete Transistor, Integrated Circuit.

After: "Mind Children: The Future of Robot and Human Intelligence," Hans Moravec, Harvard University Press, 1988; "The Age of Spiritual Machines: When Computers Exceed Human Intelligence," Ray Kurzweil, Viking, 1999.

Page 3: Oreilly AC Talk - IBM - United States

The High Cost of IT Management

For example: the cost to manage storage is typically twice the cost of the actual storage system.

Storage: What $3 million bought in 1984 and 2000.

(1) J. P. Gelb, "System-managed storage," IBM Systems Journal, Vol 28, No. 1, 1989 pp. 77-103.

(2) "Storage on Tap: Understanding the Business Value of Storage Service Providers", ITCentrix report, March 2001.

(3) "Server Storage and RAID Worldwide" (SRRD-WW-MS-9901), Gartner Group/Dataquest report, May 1999.

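Figure: two bars, 1984 and 2000, each totaling $3 million and split between the storage system and storage administration. One bar is labeled $2 million Storage Administration and $1 million System; the other is labeled $1 million Storage Administration and $2 million System.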

Page 4: Oreilly AC Talk - IBM - United States

Making the Front Page

America Online
• 6 August 1996 outage: 24 hours
• Maintenance/Human Error
• Cost: $3 million in rebates
• Investment: ???

AT&T
• 13 April 1998 outage: Six to 26 hours
• Software Upgrade
• Cost: $40 million in rebates
• Forced to file SLAs with the FCC (frame relay)

eBay
• Outage: 22 hours, 12 June 1999
• Operating System Failure
• Cost: $3 million to $5 million revenue hit and 26% decline in stock price

E*Trade
• 3 February 1999 through 3 March 1999: Four outages of at least five hours
• System Upgrades
• Cost: ???? 22 percent stock price hit on 5 February 1999

Dev. Bank of Singapore
• 1 July 1999 to August 1999: Processing Errors
• Incorrect debiting of POS due to a system overload
• Cost: Embarrassment/loss of integrity; interest charges

Charles Schwab & Co.
• 24 February 1999 through 21 April 1999: Four outages of at least four hours
• Upgrades/Operator Errors
• Cost: ???; Announced that it had made $70 million in new infrastructure investment.

NYSE
• June 8, 2001
• >1700 stocks stopped trading for 90 minutes
• Software Upgrade
• Cost: ???

Causes of Unplanned Application Downtime
• Technology Failures: 20%
• Operator Errors: 40%
• Application Failures: 40%

Page 5: Oreilly AC Talk - IBM - United States
Page 6: Oreilly AC Talk - IBM - United States

The Bad News: Complexity

Complex heterogeneous infrastructures are a reality and are hard!

Diagram: a heterogeneous infrastructure. Labels: Internet, Firewall, Load Balancer, Cache, Web Server, DNS Server, App Server, Data Server, Data, Storage Area Network, Director and Security Services, Existing Applications and Data, Business Data, Business Processes and External Services.

• Dozens of systems and applications
• Hundreds of components
• Thousands of tuning parameters

Page 7: Oreilly AC Talk - IBM - United States

Autonomic Computing Characteristics

Self-configuring

Adapt automatically to dynamically changing environments

Self-optimizing

Monitor and tune resources automatically

Self-protecting

Anticipate, detect, identify, and protect against attacks from anywhere

Self-healing

Discover, diagnose, and react to disruptions


Page 8: Oreilly AC Talk - IBM - United States

The Scope of Autonomic Computing

Layers:
• Applications
• Middleware, Software
• Operating Systems
• Server, Storage, Network

Holistic approach:
• Automation & manageability enablement at each system layer
• Federated heterogeneous components interacting cohesively

Page 9: Oreilly AC Talk - IBM - United States

Autonomic Computing Evolution

Chart: Levels of Sophistication (rows) against Known examples, Current Directions, and Future Goal (columns).

Components
• Electronic Switching Systems
• RAID / IBM Shark
• DB Optimizer
• Virus Management
• Software Rejuvenation
• eLiza
• SMART/LEO
• More of the same and better

Page 10: Oreilly AC Talk - IBM - United States

Autonomic Computing is shipping now

eLiza features on IBM’s e-servers:

Self-Optimizing (Dynamic Service Level Attainment)
• Dynamic LPAR
• Intelligent Resource Director

Self-Configuring (Define “on the fly”)
• Remote Deployment Manager
• Auto discovery and update of firmware

Self-Healing (Business continuance)
• Software Rejuvenation
• Automatic de-allocation of processors, cache, LPAR

Self-Protecting (Safeguard assets)
• Self-protecting kernel
• Agent-building learning environment

Page 11: Oreilly AC Talk - IBM - United States

Optimizing an RDBMS

DB2 Optimizer

SQL Requests
• Extensive rewrite for complex SQL

Consider Environment
• CPU # & Speed
• Disk #, latency, throughput
• Memory Available
• Parallelism
• Concurrency

Consider Data
• Amount
• Distribution
• Patterns

No Intervention Required!
Efficient System Usage!
Excellent Performance!

Diagram labels: Product, Store, Month

P. G. Selinger et al., “Access Path Selection in a Relational Database Management System”, SIGMOD 1979, pp. 23-34.
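The kind of decision a cost-based optimizer makes can be illustrated with a minimal sketch. This is not DB2's cost model: the `TableStats` fields, the cost weights, and the function names are all assumptions chosen for illustration. It only shows the general idea of weighing data statistics (amount, distribution) against environment costs (disk access) to pick an access path without user intervention.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical catalog statistics for one table."""
    rows: int          # number of rows in the table
    pages: int         # number of data pages on disk
    index_height: int  # levels in the B-tree index


def scan_cost(stats: TableStats) -> float:
    """Cost of reading every page sequentially (1 unit per page, assumed)."""
    return float(stats.pages)


def index_cost(stats: TableStats, selectivity: float) -> float:
    """Cost of walking the index, then fetching each matching row.

    Assumes one random page fetch per matching row, weighted 4x a
    sequential read. Both numbers are illustrative only.
    """
    matching_rows = stats.rows * selectivity
    return stats.index_height + 4.0 * matching_rows


def choose_access_path(stats: TableStats, selectivity: float) -> str:
    """Pick the cheaper of the two candidate plans."""
    if index_cost(stats, selectivity) < scan_cost(stats):
        return "index scan"
    return "table scan"


stats = TableStats(rows=1_000_000, pages=20_000, index_height=3)
print(choose_access_path(stats, 0.0001))  # few rows match the predicate
print(choose_access_path(stats, 0.2))     # many rows match the predicate
```

With these assumed weights, a highly selective predicate favors the index, while a predicate matching a fifth of the table favors a sequential scan.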


Page 12: Oreilly AC Talk - IBM - United States

LEO: Learning in Query Optimization

Diagram labels: SQL Compilation, Statistics, Optimizer, Best Plan, Plan Execution.

Michael Stillger, Guy Lohman, Volker Markl, Mokhtar Kandil, "LEO -- DB2's LEarning Optimizer", Proceedings of Intl. Conf. on Very Large Databases (VLDB), Sept. 2001

Page 13: Oreilly AC Talk - IBM - United States

LEO: Learning in Query Optimization

Diagram labels: SQL Compilation, Statistics, Optimizer, Best Plan, Plan Execution, Estimated Cardinalities, Actual Cardinalities.

1. Monitor

Page 14: Oreilly AC Talk - IBM - United States

LEO: Learning in Query Optimization

Diagram labels: SQL Compilation, Statistics, Adjustments, Optimizer, Best Plan, Plan Execution, Estimated Cardinalities, Actual Cardinalities.

1. Monitor
2. Analyze (stats)

Page 15: Oreilly AC Talk - IBM - United States

LEO: Learning in Query Optimization

Diagram labels: SQL Compilation, Statistics, Adjustments, Optimizer, Best Plan, Plan Execution, Estimated Cardinalities, Actual Cardinalities.

1. Monitor
2. Analyze (stats)
3. Feedback

Page 16: Oreilly AC Talk - IBM - United States

LEO: Learning in Query Optimization

Diagram labels: SQL Compilation, Statistics, Adjustments, Optimizer, Best Plan, Plan Execution, Estimated Cardinalities, Actual Cardinalities.

1. Monitor
2. Analyze (stats)
3. Feedback
4. Exploit
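The monitor, analyze, feedback, exploit cycle can be sketched as a toy estimator that corrects its own cardinality estimates. This is an illustration under stated assumptions, not LEO's actual mechanism: the class `FeedbackOptimizer`, the per-predicate multiplicative adjustment, and the example predicate are all hypothetical.

```python
from collections import defaultdict


class FeedbackOptimizer:
    """Toy cardinality estimator that learns from executed plans."""

    def __init__(self, base_selectivity):
        # Selectivity guesses derived from catalog statistics (assumed input).
        self.base_selectivity = dict(base_selectivity)
        # Multiplicative adjustment per predicate; 1.0 means no correction.
        self.adjustment = defaultdict(lambda: 1.0)

    def estimate(self, predicate, table_rows):
        """Exploit: apply the learned adjustment to the statistics-based estimate."""
        return table_rows * self.base_selectivity[predicate] * self.adjustment[predicate]

    def record(self, predicate, estimated, actual):
        """Monitor, analyze, feedback: compare the estimate with the
        cardinality observed at execution time and store the correction."""
        if estimated > 0:
            self.adjustment[predicate] *= actual / estimated


opt = FeedbackOptimizer({"city = 'San Jose'": 0.01})
rows = 100_000
first = opt.estimate("city = 'San Jose'", rows)   # statistics only
opt.record("city = 'San Jose'", first, actual=5_000)
second = opt.estimate("city = 'San Jose'", rows)  # corrected by feedback
print(first, second)
```

After one execution, the second estimate matches the observed cardinality, so later compilations of similar queries can choose a plan based on what actually happened rather than on stale statistics.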


Page 17: Oreilly AC Talk - IBM - United States

Autonomic Computing Evolution

Chart: Levels of Sophistication (rows) against Known examples, Current Directions, and Future Goal (columns).

Homogeneous Components Interacting
• Adaptive network routing, Network congestion control
• High availability clustering
• Collective Intelligence Storage Bricks
• Oceano
• New packaging concepts for storage
• Subscription computing

Components
• ESS
• RAID
• DB Optimizer
• Virus Management
• Software Rejuvenation
• eLiza
• SMART/LEO
• More of the same and better

Page 18: Oreilly AC Talk - IBM - United States

Oceano: Technology for Multi-Customer Server Farms

Today
• Fixed resource allocation
• Separate management
• Best effort basis, using own resources

Future
• "Free pool" management
• Dynamically allocated
• Challenges: SLA Management, security, privacy, load sharing, overload control, accounting

Diagram labels: Requests; Router (throttles incoming requests); servers assigned to IBM, Macy's, and SportsWeb; Virtualized hardware; Virtualized storage.

K. Appleby et al., "Oceano - SLA Based Management of a Computing Utility", 7th IFIP/IEEE International Symposium on Integrated Network Management (IM), 2001.
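A minimal sketch of "free pool" management follows. It is not Oceano's algorithm: the `ServerFarm` class, the load units, and the fixed per-server capacity are assumptions made for illustration. The customer names are taken from the slide. The point is only that servers move between customers and a shared pool as load changes, instead of being fixed per customer.

```python
class ServerFarm:
    """Toy model of a shared farm with a free pool of servers."""

    def __init__(self, total_servers):
        self.free_pool = total_servers
        self.allocated = {}  # customer name -> number of servers

    def rebalance(self, customer, load, per_server_capacity):
        """Grow or shrink a customer's allocation to match its current load."""
        needed = -(-load // per_server_capacity)  # ceiling division
        current = self.allocated.get(customer, 0)
        if needed > current:
            # Growth is limited by what the free pool can supply.
            grant = min(needed - current, self.free_pool)
            self.free_pool -= grant
            self.allocated[customer] = current + grant
        else:
            # Idle servers go back to the free pool.
            self.free_pool += current - needed
            self.allocated[customer] = needed
        return self.allocated[customer]


farm = ServerFarm(total_servers=10)
farm.rebalance("Macy's", load=450, per_server_capacity=100)     # needs 5
farm.rebalance("SportsWeb", load=800, per_server_capacity=100)  # needs 8, pool has 5
farm.rebalance("Macy's", load=100, per_server_capacity=100)     # shrinks to 1
farm.rebalance("SportsWeb", load=800, per_server_capacity=100)  # can now reach 8
print(farm.allocated, farm.free_pool)
```

A real utility would also have to address the challenges the slide lists, such as SLA priorities, isolation between customers, and accounting, none of which this sketch attempts.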


Page 19: Oreilly AC Talk - IBM - United States

Collective Intelligent Storage Bricks

Diagram: an array of Intelligent Bricks.

• Higher redundancy than RAID
• Cooling performance hot spots by proactive copies
• Improved sparing to eliminate repair actions for life of system

Page 20: Oreilly AC Talk - IBM - United States

IceCube: Collection of Intelligent Bricks

Storage Brick

Various "Ice Cube" shapes

Page 21: Oreilly AC Talk - IBM - United States

Collective Intelligent Bricks

IceCube: Up to 1 Petabyte (10^15 bytes)

Diagram labels:
• Total power = 250 kW; AC; >75 dB air noise
• Total power = 220 kW; Cool; Quiet! (64 dB)
• Water Chiller: 31 kW
• Water Chiller: 25 kW

Configuration with racks:
• 32 Racks
• 640 CIBs
• 8x240 GB 3.5" Disks per CIB
• 275 W per CIB
• 5.5 kW per Rack

Configuration without racks:
• 640 CIBs
• 8x240 GB 3.5" Disks per CIB
• 275 W per CIB
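The per-brick figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above and assumes decimal units (1 PB = 10^6 GB); the variable names are illustrative.

```python
# Figures taken from the slide.
cibs = 640            # Collective Intelligent Bricks
disks_per_cib = 8
disk_gb = 240         # capacity of one 3.5" disk, in GB
watts_per_cib = 275
racks = 32

# Raw capacity, assuming decimal units (1 PB = 1,000,000 GB).
capacity_gb = cibs * disks_per_cib * disk_gb
capacity_pb = capacity_gb / 1_000_000

# Power drawn by the bricks alone, and the share per rack.
brick_power_kw = cibs * watts_per_cib / 1000
power_per_rack_kw = brick_power_kw / racks

print(f"raw capacity: {capacity_gb} GB = {capacity_pb:.2f} PB")
print(f"brick power: {brick_power_kw} kW, per rack: {power_per_rack_kw} kW")
```

The raw capacity comes to about 1.2 PB, in line with the "up to 1 Petabyte" headline, and the bricks draw 176 kW, which reproduces the 5.5 kW per rack figure. The quoted totals of 250 kW and 220 kW are higher; the difference presumably covers cooling and other overhead, which this sketch does not model.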


Page 22: Oreilly AC Talk - IBM - United States

Subscription Computing e-Utility

• Single point of contact
• Customization
• Protection
• Personalization
• 24x7 Support
• Price per seat offering
• Training & Education
• Problem Detection/Resolution
• Hassle free IT
• Stay up-to-date

David Bantz, Ajay Mohindra and Dennis Shea, "Subscription Computing", submitted to IEEE Internet Computing

Page 23: Oreilly AC Talk - IBM - United States

Virtualization on the Client

Diagram: two virtual machines running on a hypervisor.

Virtual Machine #1
• Apps
• Operating system #1
• Dedicated resources: RAM, USB, peripherals

Virtual Machine #2
• Apps
• Operating system #2
• Dedicated resources: RAM, USB, peripherals

Shared resources (hypervisor)
• Switched resources: context (registers)
• Emulated resources: Graphics, NIC

Page 24: Oreilly AC Talk - IBM - United States

Virtualization on the Client

Diagram: today's client and what it evolves to.

Today
• Windows OS
• Corporate apps
• Personal apps
• Firewall
• Security features
• Links: to service provider; to all networks

Evolves to
• Trusted Client (secure Linux)
• Old Apps (Win 95)
• Grid Computing Apps (Linux or Windows)
• Personal Apps (Win2K, ME)
• Highly managed Corporate apps (Win 2K)
• Rented Apps (Win 2K, XP)
• Secure, special purpose link to service provider
• Autonomic connectivity to all networks

Page 25: Oreilly AC Talk - IBM - United States

Autonomic Computing Evolution

Chart: Levels of Sophistication (rows) against Known examples, Current Directions, and Future Goal (columns).

Heterogeneous Components Interacting
• SNMP
• Mounties
• Workload Management
• Autonomic Computing Stack
• Social Policy
• DB/Storage Co-optimization

Homogeneous Components Interacting
• Adaptive network routing, Network congestion control
• High availability clustering
• Collective Intelligence Storage Bricks
• Oceano
• New packaging concepts for storage
• Subscription computing

Components
• ESS
• RAID
• DB Optimizer
• Virus Management
• Software Rejuvenation
• eLiza
• SMART/LEO
• More of the same and better

Page 26: Oreilly AC Talk - IBM - United States

Goal-oriented Recovery in a Heterogeneous System

Diagram labels:
• Mounties Central
• Mounties Agent
• RMgr
• Event Facility
• Events
• Commands
• Cluster Infrastructure: Registry, Heart Beat, Messaging

S. Fakhouri, et al., “Active Middleware Services in a Decision Support System for Managing Highly Available Distributed Resources”, Middleware 2000

Page 27: Oreilly AC Talk - IBM - United States

Goal-oriented Recovery in a Heterogeneous System

Mounties Central: Internals

Diagram labels:
• Event Handling (Events From Event Facility)
• Pre-Processor
• Evaluator & Decision Processing Service
• Optimizer
• Post-Processor
• Gossamers
• Repository

S. Fakhouri, et al., “Active Middleware Services in a Decision Support System for Managing Highly Available Distributed Resources”, Middleware 2000

Page 28: Oreilly AC Talk - IBM - United States

Autonomic Computing Stack

Diagram: each layer of the stack is paired with an Autonomic Computing Agent.

• Application and Integration Middleware
• DataBase and/or File System
• Operating System
• Storage System

Other labels: Humans, Business Processes, Internet

Page 29: Oreilly AC Talk - IBM - United States

The Autonomic Agent

Diagram: a Policy-based Autonomic Agent attached to a Managed Component.

• Agent loop: (monitor, diagnose, act)*; may be model-based
• History
• Measurement
• Workload and service agreements (goals, SLAs, etc.)
• Hints and Directions
• Alerts and measurement
• Policy exchange and negotiation, alerts (to other agents)
• umbilical
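The (monitor, diagnose, act)* loop can be sketched as a small agent wrapped around a managed component. Everything concrete here is an assumption for illustration: the worker-pool component, the single queue-length policy, and the class and method names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedComponent:
    """Stand-in for any managed resource; here, a simple worker pool."""
    workers: int = 2
    queue_length: int = 0


@dataclass
class AutonomicAgent:
    component: ManagedComponent
    max_queue_per_worker: int  # policy goal, e.g. derived from an SLA
    history: list = field(default_factory=list)

    def monitor(self):
        """Take a measurement and keep it in the history."""
        m = {"workers": self.component.workers,
             "queue": self.component.queue_length}
        self.history.append(m)
        return m

    def diagnose(self, m):
        """Compare the measurement against the policy goal."""
        if m["queue"] > m["workers"] * self.max_queue_per_worker:
            return "overloaded"
        return "ok"

    def act(self, diagnosis):
        """Adjust the component and return an alert, if any."""
        if diagnosis == "overloaded":
            self.component.workers += 1
            return "alert: added worker"
        return None

    def step(self):
        """One pass of the monitor, diagnose, act loop."""
        return self.act(self.diagnose(self.monitor()))


component = ManagedComponent(workers=2, queue_length=25)
agent = AutonomicAgent(component, max_queue_per_worker=10)
alerts = [agent.step() for _ in range(3)]
print(alerts, component.workers)
```

In this run the first pass detects the overload and adds a worker; the next two passes find the component within policy and take no action. Policy exchange with other agents, shown on the slide, is left out.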


Page 30: Oreilly AC Talk - IBM - United States

Autonomic Computing Evolution

Chart: Levels of Sophistication (rows) against Known examples, Current Directions, and Future Goal (columns).

Serving the World (people, business processes)
• SMS
• Storage Tank
• Policy Management
• Policy Language and Protocols

Heterogeneous Components Interacting
• SNMP
• Mounties
• Workload Management
• Autonomic Computing Stack
• Social Policy
• DB/Storage Co-optimization

Homogeneous Components Interacting
• Adaptive network routing, Network congestion control
• High availability clustering
• Collective Intelligence Storage Bricks
• Oceano
• New packaging concepts for storage
• Subscription computing

Components
• ESS
• RAID
• DB Optimizer
• Virus Management
• Software Rejuvenation
• eLiza
• SMART/LEO
• More of the same and better

Page 31: Oreilly AC Talk - IBM - United States

Policy Managed Storage: Storage Tank

Diagram labels:
• Clients: AIX Client, Win 2000 Client, Solaris Client, Linux Client
• SAN
• IP Network
• Storage Tank Infrastructure: Cluster Data Controller, Metadata Controller (three shown), Metadata
• Backup

Features:
• Automated, policy-based storage and data management
• High performance, multi-platform file sharing

Functions:
• Metadata policy
• Data sharing
• Data backup and restore
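Policy-based storage management can be sketched as a rule table that a metadata service consults when a file is created. This is a hypothetical illustration, not Storage Tank's policy language: the `POLICY` rules, the pool names, and the `place` function are all assumptions.

```python
import fnmatch

# Hypothetical placement rules, checked in order:
# (file name pattern, storage pool, include in backups?)
POLICY = [
    ("*.db",  "fast_pool",    True),
    ("*.log", "cheap_pool",   False),
    ("*",     "default_pool", True),   # catch-all rule
]


def place(filename):
    """Return the placement decision from the first matching rule."""
    for pattern, pool, backup in POLICY:
        if fnmatch.fnmatch(filename, pattern):
            return {"pool": pool, "backup": backup}
    raise ValueError(f"no policy rule matches {filename!r}")


for name in ("orders.db", "app.log", "notes.txt"):
    print(name, place(name))
```

The administrator states intent once in the rule table; clients on any platform then get consistent placement and backup behavior without per-file decisions.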


Page 32: Oreilly AC Talk - IBM - United States

Autonomic Computing Evolution

Chart: Levels of Sophistication (rows) against Known examples, Current Directions, and Future Goal (columns).

Serving the World (people, business processes)
• SMS
• Storage Tank
• Policy Management
• Policy Language and Protocols

Heterogeneous Components Interacting
• SNMP
• Mounties
• Workload Management
• Autonomic Computing Stack
• Social Policy
• DB/Storage Co-optimization

Homogeneous Components Interacting
• Adaptive network routing, Network congestion control
• High availability clustering
• Collective Intelligence Storage Bricks
• Oceano
• New packaging concepts for storage
• Subscription computing

Components
• ESS
• RAID
• DB Optimizer
• Virus Management
• Software Rejuvenation
• eLiza
• SMART/LEO
• More of the same and better

Collaboration
• Academia
• Government
• Industry

Page 33: Oreilly AC Talk - IBM - United States

What others are saying about Autonomic Computing:

“We need to focus on: Availability… Maintainability… Scalability… Cost… Performance…”
John L. Hennessy, President, Stanford University
• Presentation to IBM Almaden Institute, “Autonomic Computing”, April 2002.

“Improving recovery/repair [Recovery-Oriented Computing] improves availability”
David A. Patterson, Pardee Chair of Computer Science, University of California, Berkeley
• Presentation to IBM Almaden Institute, “Autonomic Computing”, April 2002.

Page 34: Oreilly AC Talk - IBM - United States

What others are saying about Autonomic Computing:

“Trouble Free systems: Build a system used by millions of people each day, administered and managed by a ½ time person.”
Jim Gray, Distinguished Engineer and Manager, Microsoft's Bay Area Research Center

“For computers to be taken for granted, they must always be available wherever and whenever people need them, they must reliably protect personal information from misuse and give people control over how their data is used, and they must be unfailingly secure. We call this concept Trustworthy Computing.”
Bill Gates, Chairman and Chief Software Architect, Microsoft

Page 35: Oreilly AC Talk - IBM - United States

What others are saying about Autonomic Computing:

“Planetary scale computing: A new computing model that allocates IT resources on demand anywhere”
Patrick Scaglia, Center Director, Internet and Computing Platforms Technologies Center, HP Laboratories
• Presentation to IBM Almaden Institute, “Autonomic Computing”, April 2002.

“The new economics requires that systems be autonomic: autoinstalling, automanaging, autohealing, and autoprogramming.”
Vinod Khosla, General Partner of Kleiner Perkins Caufield & Byers
• Presentation to IBM Almaden Institute, “Autonomic Computing”, April 2002.

Page 36: Oreilly AC Talk - IBM - United States

The Autonomic Computing Challenge

Problem has been with us for a long time and will not be solved overnight.

Successful approach will be open, interdisciplinary, ambitious, cooperative, real.

Participation of academia, government and industry needed.

True autonomic computing is inevitable, but we must act now to drive the vision.

Welcome cooperation to develop necessary standards.

Autonomic Computing is our next Grand Challenge

http://www.ibm.com/research/autonomic
