Page 1:

Autonomic Computing

© 2003 IBM Corporation

Thomas Studwell, Autonomic Computing - Problem Determination, studwell@us.ibm.com

Common Base Events

Page 2:

ibm.com/autonomic © July/2003 IBM Corporation

Objective

To open a dialog toward agreement on a Common Base Event specification.

Page 3:

Agenda

Problems Facing Today's Data Collection

The 3 Tuple

Canonical Situation

Canonical Situation Data Format: Common Base Event

Page 4:

Problems Facing Today's Data Collection

Complexity of eBusiness
  Collection of distributed and heterogeneous software and hardware components

Variety of Data and Collectors/Adapters
  Consume and publish proprietary data formats
  Require ad hoc and product-specific code

Data format and APIs
  Design and Standards considerations

Standardization of management solution is incomplete
  Different skill sets are needed to configure, maintain, and tune
  Difficult to correlate for end-to-end (e2e) problem diagnostics

Instrumentation
  Many-to-Many
  Standards compliance

Customer pain and cost of ownership

Page 5:

The 3 Tuple

The complexity of the data increases when a problem occurs in a multi-component solution

Without standards, the event data are of little value to autonomic management for problem determination and for taking action in response

To alleviate this, event data are structured in 3 categories:

The identification of the component that is affected by the situation
  This is also known as the source of a situation

The identification of the component that is reporting the situation
  This is also known as the reporter of a situation
  It may be the same as the source component of the situation

The situation data itself
  Properties or attributes that describe the situation
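The three categories above can be sketched as a small data structure. This is only an illustration: the class names `Component` and `Event`, the single `name` field, and the `self_reported` helper are assumptions made for the example and do not come from the Common Base Event specification.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Component:
    """Identifies a component; a single name is a deliberate simplification."""
    name: str


@dataclass
class Event:
    """The 3-tuple: source, reporter, and the situation data itself."""
    source: Component      # component affected by the situation
    reporter: Component    # component reporting the situation
    situation: Dict[str, str] = field(default_factory=dict)

    def self_reported(self) -> bool:
        # The reporter may be the same component as the source.
        return self.source == self.reporter


db = Component("database")
app = Component("app-server")

# The application server reports a situation that affects the database.
event = Event(source=db, reporter=app, situation={"msg": "connection failed"})
```

Keeping source and reporter as separate fields, even when they are equal, lets a consumer correlate events about one affected component regardless of which component observed them.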

Page 6:

Canonical Situation

A situation is defined as the data that a component reports for external consumption by general or product-specific management applications
  Situations are commonly communicated through messages that are logged and/or forwarded to a consumer of the data, such as administrative or management tools
  Examples of situations include: memory allocation failure, buffer overflow, I/O failure, etc.

Each product reports situations in its own format, using its own terminology
  Makes correlating events between products difficult
  Requires further standardization of notification contents, categorization, and taxonomy of situations

The goal of the common situations is not to drastically change what the components are currently doing, but rather to put some structure and rigor behind how components report situations

Canonical situation, using the common data formats
  The canonical representation of the situation is used for analysis
  An adapter can be used to convert the data to a canonical situation

Create a taxonomy for identifying and classifying situations
  Category, Type, Disposition, Scope, Task, etc.
  Apply the taxonomy to product logs

Page 7:

Canonical Situation Categories

START
These are messages that deal with the startup process for a component. Messages that indicate that a component has begun the startup process, that it has finished the startup process, or that it has aborted the startup process all fall into this category.

STOP
These are messages that deal with the shutdown process for a component. Messages that indicate that a component has begun to stop, that it has stopped, or that the stopping process has failed all fall into this category.

FEATURE
These are messages that announce a feature of a component. Messages that indicate things like services being available and services or features being unavailable fall into this category.

DEPENDENCY
These are messages that components produce to say that they cannot find some component or feature that they need. Messages that say a resource was not found, or that an application or subsystem was unavailable, fall into this category.

REQUEST
These are messages that a component uses to identify the completion status of a request. Typically these requests are complex management tasks or transactions that a component undertakes on behalf of a requester, not the mainline simple requests or transactions.

CONFIGURE
These are messages that components use to identify their configuration. Messages that describe current configuration state and configuration changes fall into this category.

CONNECT
These are messages that components use to identify aspects of a connection to another component. Messages that say a connection failed, that a connection was created, or that a connection was ended all fall into this category.

CREATE
These are messages documenting when a component creates an entity. Messages telling that a document was created, a file was created, or an EJB was created all fall into this category.

REPORT
These are messages that are reported from the component, such as heartbeat or performance information. Data such as current CPU utilization, current memory heap size, etc. would fall into this category.

AVAILABLE
These are messages that are reported from the component regarding its operational state and availability. This situation provides a context for operations that can be performed on the component by distinguishing whether a product is installed, operational and ready to process functional requests, or operational and ready/not ready to process management requests.
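The ten categories listed on this slide form a closed vocabulary, which could be represented as an enumeration. The sketch below is illustrative only: the names `SituationCategory` and `parse_category` are hypothetical, and the case-insensitive lookup is an assumption rather than something the slides prescribe.

```python
from enum import Enum


class SituationCategory(Enum):
    """The ten canonical situation categories named on the slide."""
    START = "START"
    STOP = "STOP"
    FEATURE = "FEATURE"
    DEPENDENCY = "DEPENDENCY"
    REQUEST = "REQUEST"
    CONFIGURE = "CONFIGURE"
    CONNECT = "CONNECT"
    CREATE = "CREATE"
    REPORT = "REPORT"
    AVAILABLE = "AVAILABLE"


def parse_category(text: str) -> SituationCategory:
    """Look up a category by name, ignoring case and surrounding whitespace.

    Raises ValueError for names outside the vocabulary, so that unknown
    categories are noticed instead of silently accepted.
    """
    return SituationCategory(text.strip().upper())
```

Rejecting unknown names is one possible design choice; an adapter that must tolerate product-specific categories might map them to a fallback value instead.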

Page 8:

Example of Situations Category and Taxonomy

START_SITUATION = (START_SITUATION_NAME, SUCCESS_DISPOSITION, START_SITUATION_QUALIFIER);

START_SITUATION_NAME = "START";

SUCCESS_DISPOSITION = ("SUCCESSFUL" | "UNSUCCESSFUL");

START_SITUATION_QUALIFIER = ("START INITIATED" | "START COMPLETED" | "RESTART INITIATED" | "STARTING");

WSVR0200I: Starting application: PlantsByWebSphere
  START, SUCCESSFUL, START INITIATED

WSVR0221I: Application started: trade3
  START, SUCCESSFUL, START COMPLETED

Other examples:

WSVR0024I: Server server1 stopped
  STOP, SUCCESSFUL, STOP COMPLETED

SRVE0026E: [Servlet Error]-[]: java.lang.IllegalStateException … (Primary Message)
SQL0913N Unsuccessful execution caused by deadlock or timeout (Secondary Message)
  CONNECT, UNSUCCESSFUL, CLOSED
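As a rough illustration of how an adapter might apply this taxonomy, the sketch below checks a START situation against the grammar on this slide and classifies log lines by message ID. The lookup table, the `classify` and `is_valid_start` functions, and the message-ID pattern (four letters, four digits, one letter) are assumptions made for the example, and the qualifier spelling follows the grammar ("START COMPLETED"). It is not how any IBM adapter is known to work.

```python
import re
from typing import NamedTuple, Optional


class Situation(NamedTuple):
    name: str
    disposition: str
    qualifier: str


# Alternatives taken from the grammar on the slide.
DISPOSITIONS = {"SUCCESSFUL", "UNSUCCESSFUL"}
START_QUALIFIERS = {
    "START INITIATED", "START COMPLETED", "RESTART INITIATED", "STARTING",
}


def is_valid_start(s: Situation) -> bool:
    """Check a situation against the START_SITUATION grammar."""
    return (
        s.name == "START"
        and s.disposition in DISPOSITIONS
        and s.qualifier in START_QUALIFIERS
    )


# Hypothetical adapter table keyed by message ID, filled from the slide's examples.
RULES = {
    "WSVR0200I": Situation("START", "SUCCESSFUL", "START INITIATED"),
    "WSVR0221I": Situation("START", "SUCCESSFUL", "START COMPLETED"),
    "WSVR0024I": Situation("STOP", "SUCCESSFUL", "STOP COMPLETED"),
}

# Assumed message-ID shape: 4 letters, 4 digits, 1 letter, then a colon.
MSG_ID = re.compile(r"^([A-Z]{4}\d{4}[A-Z]):")


def classify(line: str) -> Optional[Situation]:
    """Return the canonical situation for a log line, or None if unknown."""
    match = MSG_ID.match(line)
    return RULES.get(match.group(1)) if match else None
```

A table keyed by message ID keeps the product-specific knowledge in data rather than code, which is one way to apply a taxonomy to existing logs without changing what the component emits.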

Page 9:

Distillation of existing work to CBE

Several existing formats were analyzed

Common elements define essential elements

Reviewed formats and types include: PD Artifact, TEC Event, Tivoli Log XML, BEI Event, BEI Context, JMX Notification, SNMP, CIM_AlertIndication, Java 1.4, Apache commons logging, WAS, JRAS

Mappings are shown in the spec

Page 10:

The Data

What component is affected by the Situation
  * ComponentID: component affected by the problem

What component observed the Situation
  * ComponentID: component reporting the problem

Situation data
  extensionName, localInstanceId, globalInstanceId, creationTime, severity, priority, situationType, msg, repeatCount, elapsedTime, sequenceNumber
  msgDataElement
  associatedEvents, contextDataElements
  extendedDataElements

Annotations in the figure: (policy), (cor/relation), (extensibility)

* ComponentID
  location, locationType, application, executionEnv
  component, subComponent, componentType
  instanceId, processId, threadId

For details please refer to the Canonical Situation Data Format: the Common Base Events (ACAB.BO0301)

Page 11:

Canonical Situation Data Format: Common Base Event (CBE)

Canonical Situation Data Format: Common Base Event
  Facilitates the effective exchange and correlation of data among disparate enterprise applications that support logging, management, problem determination, autonomic computing and e-business functions in an enterprise
  Defines the structure of an event sent as the result of a situation, in a consistent and common format
  Provides flexibility to allow for adaptation to application-specific needs

CBE Extensibility

Extended Data Element
  Allows for product-specific or required attributes that are not common across product groups and not accounted for in the CBE
  Provides capabilities to add "named" properties: name, type, values (or hexValue), and optional children to create a hierarchy of these elements
  Provides capabilities to add monitoring and resource usage data
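The named-property hierarchy described for the Extended Data Element can be sketched as a small recursive structure. This is an illustration under stated assumptions: the `children` field name, the `find` helper, the type strings "noValue" and "long", and the rule that `values` and `hex_value` are mutually exclusive are all choices made for the example (the slide only says "values (or hexValue)").

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExtendedDataElement:
    name: str
    type: str
    values: List[str] = field(default_factory=list)
    hex_value: Optional[bytes] = None
    children: List["ExtendedDataElement"] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: an element carries either values or a hex value, not both.
        if self.values and self.hex_value is not None:
            raise ValueError("use either values or hex_value, not both")

    def find(self, path: str) -> Optional["ExtendedDataElement"]:
        """Resolve a dotted path such as 'heap.used' below this element."""
        node = self
        for part in path.split("."):
            node = next((c for c in node.children if c.name == part), None)
            if node is None:
                return None
        return node


# Hypothetical resource-usage data attached by a product.
jvm = ExtendedDataElement("jvm", "noValue", children=[
    ExtendedDataElement("heap", "noValue", children=[
        ExtendedDataElement("used", "long", values=["1048576"]),
    ]),
])
```

Here `jvm.find("heap.used")` returns the leaf element holding the value, while a path that does not exist returns `None`.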

Product Specific Schema
  Allows a product-specific schema to be included in the "any namespace" of the CBE Schema

<xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded" …

Page 12:

Common Base Event/Situation Data - Model

CommonBaseEvent
  extensionName : String
  localInstanceId : String
  globalInstanceId : String
  creationTime : String
  severity : short
  priority : short
  situationType : String
  msg : String
  repeatCount : short
  elapsedTime : String
  sequenceNumber : long
  version : String = commonbaseevent1_0

ComponentIdentification
  location : String
  locationType : String
  application : String
  executionEnvironment : String
  component : String
  subComponent : String
  componentIdType : String
  instanceId : String
  processId : String
  threadId : String

AssociatedEvent
  name : String
  type : String

MsgDataElement
  msgId : String
  msgIdType : String
  msgCatalogId : String
  msgCatalogTokens : String[]
  msgCatalog : String
  msgLocale : String

ContextDataElement
  contextId : String
  type : String
  name : String
  contextValue : String

ExtendedDataElement
  name : String
  type : String
  values : String[]
  hexValue : byte[]
  id : String

Associations shown in the diagram (role name, multiplicity)
  reporterComponentId, 1
  sourceComponentId, 1
  associatedEvents, 0..n
  resolvedEvents, 0..n
  msgDataElement, 0..1
  contextDataElements, 0..n
  extendedDataElements, 0..n
  dataRefs, 0..n
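The model on this slide could be mirrored in code roughly as follows. This is a simplified sketch, not the specification's schema: only a few attributes of each class are included, the two component references are treated as required on the reading that their multiplicity in the diagram is 1, list-valued associations are modeled as plain lists of dictionaries, and `severity` uses Python's `int` in place of `short`.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComponentIdentification:
    location: str
    component: str
    # Remaining attributes (locationType, application, ...) omitted for brevity.


@dataclass
class MsgDataElement:
    msgId: str
    msgLocale: str = ""


@dataclass
class CommonBaseEvent:
    sourceComponentId: ComponentIdentification        # multiplicity 1
    reporterComponentId: ComponentIdentification      # multiplicity 1
    creationTime: str = ""
    severity: int = 0
    situationType: str = ""
    msg: str = ""
    msgDataElement: Optional[MsgDataElement] = None    # multiplicity 0..1
    contextDataElements: List[Dict[str, str]] = field(default_factory=list)   # 0..n
    extendedDataElements: List[Dict[str, str]] = field(default_factory=list)  # 0..n
    version: str = "commonbaseevent1_0"


server = ComponentIdentification(location="host1", component="server1")

# The server reports a situation about itself: source and reporter coincide.
cbe = CommonBaseEvent(
    sourceComponentId=server,
    reporterComponentId=server,
    situationType="STOP",
    msg="WSVR0024I: Server server1 stopped",
)
record = asdict(cbe)  # nested plain dictionaries, ready for serialization
```

Making the two component references the only constructor arguments without defaults means an event cannot be built without saying which component is affected and which one reported it.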

Page 13:

Common Base Event/Situation Data - Model

Common Base Event
  extensionName, localInstanceId, globalInstanceId, creationTime, severity, priority, situationType, msg, repeatCount, elapsedTime, sequenceNumber
  msgDataElement
  reporterComponentId
  sourceComponentId
  associatedEvents
  contextDataElements
  extendedDataElements

Message Data Element
  msgId, msgIdType, msgCatalogId, msgCatalogTokens, msgCatalog, msgLocale

Component Identification (shown twice in the figure)
  location, locationType, application, executionEnvironment, component, subComponent, componentIdType, instanceId, processId, threadId

AssociatedEvent
  associationEngine, resolvedEvents

AssociatedEngine
  name, type, ...

Context DataElement
  contextId, type, name, contextValue, ...

Extended DataElement
  id, name, type, values, hexValue, dataRefs

Nested boxes in the figure: Extended DataElement …, CommonBaseEvent …