Page 1: Chapter Template for Service-Oriented Computing Series

1. Overlord: SOA Governance from JBoss

This document discusses what Overlord will provide within the context of SOA Governance. The first section will discuss only core requirements and components. The second section will discuss the roadmap and where partners may be able to play a role.

Any computer system, whether it is centralized or distributed, needs some form of governance, i.e., the act of monitoring and managing the system. Such governance may be as simple as ensuring only authorized users have access to services, or as complex as guaranteeing the system and its components maintain a level of availability or reliability in the presence of failures or increased system load. Managing distributed systems has always been a critical aspect for users, developers and administrators. As those distributed systems grew in scope and scalability, spanning multiple organizations with different infrastructures and trust boundaries, governance became more difficult but even more important.

Governance deals with the processes by which a system operates. In order to have successful governance, some form of management, monitoring and administration is required for these processes. SOA governance is the discipline of creating policies and communicating and enforcing them.

In order to govern a SOA infrastructure (SOI) such as an ESB, there needs to be a framework in place that allows policies and Service Level Agreements (SLAs) to be defined, enforced and audited across multiple security and identity domains. Such a framework must be able to define policies for individual services and then either enforce them or provide means by which they can be managed and enforced by some other component within the SOA. One aspect of SOA governance that is implied by most definitions but often overlooked is the necessity to communicate such policies to users of services.

Any implementation of governance provided by an SOI should be centered on the four principles of enterprise architecture: the people involved, the processes, the technology and services. A good governance implementation needs to be supported by a hierarchical organizational reporting structure. This impacts on an SOI in several ways, with the most obvious being that the different levels in the reporting structure (e.g., developers, business managers, service sponsors etc.) need different views onto the system as services are built and deployed. Unfortunately at this stage in the evolution of the Enterprise Service Bus as a SOA infrastructure, many implementations present only a single view (the developer view) and organizations must rely on ad hoc mechanisms to cover the other cases. This inevitably leads to an impedance mismatch (translation difficulties) as managers try to understand how to map low-level details onto their expectations. Within Overlord we believe strongly that all good SOIs must eventually cater for everyone involved in the SOA development and runtime within the same environment.

In the following sections we shall discuss SOA and governance from the perspective of JBoss products, both current and future. Everything outlined in this document is intended to be implemented eventually within the JBoss SOI.

1.1 Infrastructure and tooling support

Low-level infrastructure support for governance will come from JON and more native support within the ESB and other projects. Importantly our SOA Infrastructure has always made the registry (UDDI in our case) a central component, which helps drive some aspects of governance into the minds of users and developers. Furthermore, all good governance solutions need a repository for storing artifacts (e.g., service definitions) and we have development efforts in the Guvnor and DNA projects to fill that important gap.

Depending upon the role of the user or component, some governance tools within Overlord will be Eclipse based, whereas others will be Web based. There will also be some tools that have representations in both arenas, because some of the capabilities will need to be available across different roles in a format that is natural for each role. For instance, sometimes what a sys admin needs to do is also what a developer needs to do (e.g., inspect a service contract).

Managing a LAN-based distributed system can be hard enough: imagine expanding that so it covers different business domains where the developer or deployer does not control all of the underlying infrastructure and cannot work on the assumption that people are trustworthy (and possibly live on different sides of the planet!). With SOA governance there are run-time and design-time requirements: typically a run-time component executes to ensure things like Service Level Agreements (SLAs) are maintained, whereas a tooling-based governance component could be for run-time monitoring/management or design-time service composition.

With Overlord, you'll be able to graphically display:

- MTTF/MTTR information on behalf of specific nodes and services on those nodes. Also for all nodes and services that are deployed.

- throughput for services.

- time taken to process specific types of messages (e.g., how long to do transformations, how long to do transformations on behalf of user X).

- number of requests sent to services during the lifetime of the service/node (in general, it is always important to distinguish between services and the nodes on which they execute).

- number of faults (service/node) in a given duration.

- information about where messages are being received.

- information about where messages are being sent (responses as well as faults).

- potential dependency tracking data. This can be used to determine sources of common failure. Can also be used when deciding whether or not (and where) to migrate services, for improved availability or performance.

- what versions of services exist within a specific process (VM).

- Includes sending probe messages that can test availability and performance on request. However, this functionality is also duplicated into the design-time tooling.

All of this information may be obtained periodically from a central (though more realistically a federated) data store or direct from the services themselves. However, both sys admins and developers will need to be able to connect to services (and composites) and inspect their governance criteria at will, e.g., when was the last time they violated a contract, why and under what input messages/state: the dynamic factor is incredibly important. This information needs to be made available across individual services as well as the entire SOA-P deployment.
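
As a rough illustration of the MTTF/MTTR figures mentioned above, both values can be derived from a per-service log of failure and recovery times. The following is only a sketch and is not Overlord code: the `Outage` record, the `mttf_mttr` function and the sample data are all hypothetical, and times are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """One failure of a service: when it went down and when it recovered."""
    failed_at: float
    recovered_at: float


def mttf_mttr(start: float, end: float, outages: list) -> tuple:
    """Return (MTTF, MTTR) over the observation window [start, end].

    MTTF is total uptime divided by the number of failures;
    MTTR is total downtime divided by the number of failures.
    """
    if not outages:
        raise ValueError("no failures observed; MTTF/MTTR are undefined")
    downtime = sum(o.recovered_at - o.failed_at for o in outages)
    uptime = (end - start) - downtime
    count = len(outages)
    return uptime / count, downtime / count


# Hypothetical data: a service observed for 1000 s with two outages.
mttf, mttr = mttf_mttr(0.0, 1000.0, [Outage(100.0, 120.0), Outage(500.0, 580.0)])

# Steady-state availability follows directly from the two figures.
availability = mttf / (mttf + mttr)
```

Keeping one such log per service and one per node preserves the distinction between services and the nodes on which they execute.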

Within the Overlord project we are working on a separate and dedicated governance console that is used to receive alarms/warnings when contracts/SLAs are violated or close to being violated. Obviously the console is only one such destination for these alerts: sys admin inboxes are just as important. However, that's where the infrastructure comes into play.

Traditional management tooling (e.g., via JMX) includes:

- start/stop a service.

- suspend/resume a service.

- add/update restriction lists for services. This limits the list of receivers that a service considers valid and will process messages from. A similar list of destinations for responses will exist. This plays into the role/relationship concept because although a developer may not consider the issue of security (maybe can't, given that services could be deployed into environments that did not exist when the developer was building the service), the sys admin (or service container admin) will have to.

- migrate services (and potentially dependent services).

- inspect service contract.

- update service definition.

- attach and tune specific service parameters.

Fortunately we'll get a lot of this from close integration with the JBoss Operations Network.
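
The restriction lists described above can be pictured as a simple allow-list that an administrator edits while the service is running. This is a minimal sketch under stated assumptions: the `RestrictionList` class, the `deliver` function and the `"from"` message field are hypothetical names, not part of JBossESB or JON.

```python
class RestrictionList:
    """Allow-list of party identifiers a service will process messages from."""

    def __init__(self, allowed=()):
        self._allowed = set(allowed)

    def add(self, party: str) -> None:
        self._allowed.add(party)

    def remove(self, party: str) -> None:
        self._allowed.discard(party)

    def permits(self, party: str) -> bool:
        return party in self._allowed


def deliver(message: dict, restrictions: RestrictionList) -> bool:
    """Return True if the message would be processed, False if it is dropped."""
    return restrictions.permits(message.get("from", ""))


# The administrator updates the list at run-time; the service code is untouched.
acl = RestrictionList(["billing", "orders"])
accepted_before = deliver({"from": "reports", "body": "monthly totals"}, acl)
acl.add("reports")
accepted_after = deliver({"from": "reports", "body": "monthly totals"}, acl)
```

Because the list lives outside the service implementation, the developer does not need to anticipate every deployment environment.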

Design time tooling from Overlord includes:

- defining the service definition/contract, which includes specifying what message types it allows. This is tied into the service implementation in order that the SOI can verify incoming messages against this contract for validity. Part of the contract will also include security and role information which will define who can interact with the service (may be fine grained based on time of day, specific message type, etc.). Policies are attached at this level on a per service or per operation basis (if not defined on an operation basis, the service level policy is taken if defined).

- policy definition/construction, tracking and enforcement. Not just part of the development tool, but also an integral part of the underlying SOI. Policies need to be shared so that other developers can utilise them in their own service construction. Typically these will be stored in the repository.

- service construction from other services, i.e., composite services. This has an impact on SLAs and on governance enforcement. In some cases a physical instance of the service may not exist either and the infrastructure becomes responsible for imposing the abstraction of a service by directing interactions accordingly.

- inspecting the registry and repository during design time to locate and inspect desired services for composition within applications. Also ties into runtime management so that the user can inspect all running services. This would also tie into our graphical process flow tool, by allowing a drag-and-drop approach to application construction.

- service development then into service deployment. The tool will allow the user to view a list of available nodes and processes. The availability, performance etc. of those nodes will also be displayed (more tooling and infrastructure support). Then you can drag a service implementation on to the node and deploy it, either dynamically or statically. This ties into the runtime management tool that allows the user to view deployed services on nodes.

- Service implementation tooling:

  - from the client perspective you go to the registry and select the right service based on requirements (functional as well as non-functional). The associated contract defines the message format, as mentioned earlier. The tool will either auto-generate an appropriate stub for the client code or provide a way of tying state variables (from the application code, or incoming messages etc.) in to the outbound invocations on the service. At the tooling level, we're really looking at defining client tasks that the developer writes, and defining the output instance variables that are hooked into the service's input variables. As far as the client tool is concerned, we are simply tying together these variables. Capabilities such as security and transactions may be exposed to the client.

  - from the service developer perspective, we are defining services as compositions of tasks, actions, dispatchers etc. In the graphical designer we specify the input variables that are required for each operation type (defined as specific messages). This also plays into the contract definition effort mentioned earlier, since the message formats accepted by a service are implicitly defined by the requirements on input state variables.

- WS-CDL tooling will also be tied into the run-time as well as design-time governance aspects of Overlord.

- the need to be able to deploy services into a virtual environment to allow them to be tested without affecting a running system. A service has to be able to be deployed in a test mode. What this means is that at a minimum the service is not available to arbitrary users. Test services should also not be deployed into a running process/container that is being used by other (non-test) services and applications in case they cause it to fail and, worst case scenario, take the entire process with them.

Very important to our work here was the donation by Thomas Erl of his Service Modeler.
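
The contract described above (a set of accepted message types, with policies attached per service or per operation) lends itself to a small illustration. The sketch below is an assumption about how such a lookup could behave, not the Overlord design: `ServiceContract`, its method names and the policy labels are hypothetical.

```python
from typing import Optional


class ServiceContract:
    """Hypothetical contract: accepted message types plus attached policies."""

    def __init__(self, message_types, service_policy: Optional[str] = None):
        self.message_types = set(message_types)
        self.service_policy = service_policy
        self.operation_policies = {}  # operation name -> policy label

    def accepts(self, message_type: str) -> bool:
        """Check an incoming message type against the contract."""
        return message_type in self.message_types

    def policy_for(self, operation: str) -> Optional[str]:
        """Operation-level policy wins; otherwise use the service-level one."""
        return self.operation_policies.get(operation, self.service_policy)


contract = ServiceContract({"CreateOrder", "CancelOrder"},
                           service_policy="authenticated-users")
contract.operation_policies["CancelOrder"] = "order-owner-only"
```

Here `policy_for("CancelOrder")` yields the operation-level policy, while `policy_for("CreateOrder")` falls back to the service-level one, and a contract with neither returns `None`.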

1.2 The Registry

Registries have always been seen as one of the key services in the SOA Triad. However, until recently most SOA infrastructures and ESBs ignored them (JBossESB was the first SOI to put a registry at its heart). The registry allows users to manage services deployed within the SOI, based on pre-set policies. It is also possible to store metadata about the services.

Both design-time and run-time policies are associated with services in the registry. Developers can define a uniform set of policies that are enforced by the SOI. The registry plays a key role in helping to enforce policies during the process of provisioning services within the SOI. At runtime, other components within the SOI or application may interact with the registry to find services and their run-time policies.

With an SOI that has a complete governance implementation, those policies will then be enforced during the execution of the service. A service management solution can also update metadata (e.g., security capabilities) related to a policy in the registry.

Given that a registry is a key enabling technology for a mature SOI, a standardized approach to registries is important. UDDI (Universal Description, Discovery and Integration) provides a data model and standard interfaces for reading from and writing to the registry. The UDDI standard allows any other system to access information about services and update that information.

SOA registries may be deployed in a distributed fashion based on organizational needs, e.g., an enterprise registry, departmental registries, application-specific registries and testing registries (where services that are only in use for testing purposes are described).
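
To make the registry's role concrete, the following toy in-memory registry shows publishing a service with metadata and policies, locating services by metadata, and updating metadata afterwards. It is a sketch only and deliberately does not model the UDDI data model or its interfaces; the `Registry` class, the endpoint strings and the policy names are hypothetical.

```python
class Registry:
    """Toy in-memory registry; a real SOI would use a UDDI implementation."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, endpoint, metadata=None, policies=()):
        """Record a service together with its metadata and policy names."""
        self._entries[name] = {
            "endpoint": endpoint,
            "metadata": dict(metadata or {}),
            "policies": list(policies),
        }

    def find(self, **required):
        """Return names of services whose metadata matches every key/value."""
        return sorted(
            name for name, entry in self._entries.items()
            if all(entry["metadata"].get(k) == v for k, v in required.items())
        )

    def policies(self, name):
        return list(self._entries[name]["policies"])

    def update_metadata(self, name, **changes):
        """E.g. a management tool recording a changed security capability."""
        self._entries[name]["metadata"].update(changes)


registry = Registry()
registry.publish("CreditCheck", "jms://queue/credit",
                 {"category": "finance", "secure": True}, ["sla-gold"])
registry.publish("Quote", "http://example.invalid/quote",
                 {"category": "finance", "secure": False})
secure_services = registry.find(category="finance", secure=True)
```

A departmental or testing registry could be a separate instance of the same structure, populated only with the services relevant to that scope.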

1.3 Service Lifecycle

The Enterprise Service Bus, as a concrete implementation platform for SOA, places certain rules on the way in which services are developed, deployed and managed. These rules are often defined in terms of phases, referred to collectively as the Service Lifecycle, and they impact directly on the required capabilities of the governance framework we have outlined previously:

- Model

- Assemble

- Deploy

- Manage

Service lifecycle management concentrates on the development and deployment of services, whereas the governance aspect brings access control, policies etc. into the way in which services are used within a business process. Furthermore, as mentioned before, governance fills the critical management requirement for deployed services.

A service’s lifecycle management is affected by its relationship with other services in the environment. Irrespective of whether these services are provided by the SOI or as part of the application, services do not exist in isolation at any stage in their lifecycle.

The model and assemble phases of the service lifecycle are often referred to as the design-time aspect because they are concerned with the development of the service prior to its being deployed within the SOI. The first step in deploying a service is obviously identifying the requirement for the service and from that, what capabilities it should offer. This process can often take an arbitrary duration as it is typically iterative, particularly if multiple organizations or developers are involved. Sharing of services within an SOI at runtime is often a reflection of the sharing of services that goes on during the development stages of the service lifecycle.

Identifying the needs of a service within an SOI can be based on short-term or long-term requirements. A successful SOA-based approach would tend to look at the service from a wider perspective than a single application. Particularly in large scale SOIs, unless a service offers a very restrictive/specific capability, once it is deployed it is often difficult to predict how it will be used. Furthermore, with few exceptions, most SOA deployments are expected to run continuously for long durations and hence replacing a service may be difficult without having an adverse effect on the normal execution of the SOI: quiescent periods simply do not exist in many environments.

Service re-use is one of the core principles behind SOA. Unlike in previous incarnations, re-use is not about physically copying or sharing libraries and running them within your own execution environment (which can result in security breaches). Service re-use is simply that: re-using an existing service wherever it is deployed within the SOI.

As soon as a service is selected for design it is important to think in terms of contract and policy definition. These will be critical for a number of reasons including service re-use (“is this service really offering what I want?”) and governance (“is this service really doing what it said it would?”). Although it is possible to retrofit contracts and policies after the service has been developed, successful SOI deployments are frequently based on the approach of doing this as early in the design phase as possible.

During the assembly phase of the lifecycle, the service is to be developed, either from scratch or through leveraging existing services within the SOI. The latter offers the ability to more quickly develop and deploy new services, reducing time to market: one of the critical benefits offered by SOA. Leveraging existing infrastructural investment is important throughout SOA and an SOI that offers support for this is a good candidate for a development and deployment environment.

The SOI must facilitate design-time discovery of services and provide the capability to compose new services from existing services. This is often approached through the use of Registries and Repositories. Furthermore, it should give sufficient design-time tool support to structure the flow of business processes into the interactions between services.

1.4 Identity within the SOI

In any computer system, user identity is critical to enforcing security restrictions and access control over services. Over the years many organizations have centralized their administration of user identities and the privileges associated with them, e.g., whether or not a specific individual is allowed administrator privileges. Centralizing makes security enforcement easier because there is only a single “site” to protect from intruders. Furthermore, some government or vertical industry rules and regulations require centralized management of this kind of information for compliance purposes.

Identity and Access Management products give assurances that individuals are who they appear to be and that they can be restricted. An SOI must provide some means by which a user (human or process) can establish its identity (obtain a credential) and then pass this to a target service in a format it understands. From an interoperability perspective, standards-based formats are very important, or you will end up developing an isolated SOA application whose clients and services are tied to a specific SOI. WS-Security is one of the standards you should expect to see supported.

Where identity is concerned, the SOI must ensure that every intermediary can authenticate the requesting client (which could be a service) before passing credentials to the next service. As the credential information flows, it may be augmented or completely changed by each intermediate service: identity management must be federated hierarchically in order for it to scale and match the business domain. For example, although an intermediary service may call another service on behalf of a client, it may not be possible or legal for the identity of the original client to be exposed outside of the first service. Obviously the service must also be able to authenticate the client/service based on credentials or intermediary evidence before deciding whether it has authorization to call a service to do some work.

The SOI should provide support for:

- Digital certificates or tokens to prove identity.

- How credentials can be associated with a message based on a service’s capabilities.

- How intermediaries can use credentials, authenticate them and pass evidence of that authentication to other services.
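
The flow of credentials through an intermediary can be sketched as follows: the intermediary authenticates the caller, then forwards evidence of that authentication instead of the original identity. This is an illustration only. A real SOI would be expected to use digital certificates or WS-Security tokens; here shared-secret HMAC tags from the Python standard library stand in for them, and every key, claim string and function name is hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, claim: str) -> str:
    """Return a hex HMAC-SHA256 tag binding the claim to the key holder."""
    return hmac.new(key, claim.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(key: bytes, claim: str, tag: str) -> bool:
    return hmac.compare_digest(sign(key, claim), tag)


# Hypothetical shared keys: client/intermediary and intermediary/target.
CLIENT_KEY = b"client-and-intermediary-secret"
DOMAIN_KEY = b"intermediary-and-target-secret"


def intermediary(claim: str, tag: str):
    """Authenticate the caller, then pass on evidence, not the raw identity."""
    if not verify(CLIENT_KEY, claim, tag):
        return None
    evidence = "authenticated-by=intermediary;role=customer"
    return evidence, sign(DOMAIN_KEY, evidence)


def target_service(evidence: str, tag: str) -> bool:
    """The target trusts the intermediary's evidence; it never sees the client."""
    return verify(DOMAIN_KEY, evidence, tag)


claim = "user=alice"
forwarded = intermediary(claim, sign(CLIENT_KEY, claim))
rejected = intermediary(claim, "forged")
```

The evidence string carries a role rather than the user name, which matches the case where the original client's identity may not be exposed beyond the first service.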

1.5 The Service Level Agreement

A service level agreement is a document that defines the relationship between two parties: the provider of the service and the recipient of the service. The SLA essentially defines a contract that exists between the two parties: the quality of service that the provider will give to the recipient. SLAs address situations when compliance with benchmarks must be verified from the perspective of a contractual obligation.

An SLA is an extremely important contractual obligation (usually in the form of a document in the physical world). In essence it defines the parameters for the delivery of a service for the benefit of both parties. At a minimum it will define:

- The service(s) to be delivered.

- Performance and how to report deviations from agreed metrics.

- The recipient’s responsibilities.

- Problem management.

- Security.

Although their origins are in the physical world, SLAs are just as important in computing environments and especially so in an SOI. A comprehensive SLA should always be seen as an essential requirement for both the provision of a service and the use of that service. It can improve the quality of the development process.

Implementing Service Level Agreements in a SOA requires a process flow that can define SLAs, measure compliance and act accordingly. This drives some essential capabilities:

- The ability to capture any type of service level related metric on a per-message basis.

- A flexible authoring environment to create policy logic based on SLA metrics and other service data.

- A mechanism to verify policy compliance and handle SLA violations or related exceptions.
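
The three capabilities above (per-message metric capture, policy logic over those metrics, and violation handling) can be sketched in a few lines. This is a minimal example under stated assumptions: the `LatencySla` class, the choice of mean response time as the metric and the 90% warning threshold are all hypothetical.

```python
from statistics import mean


class LatencySla:
    """Hypothetical SLA: mean response time must not exceed `limit_ms`."""

    def __init__(self, limit_ms: float, warn_ratio: float = 0.9):
        self.limit_ms = limit_ms
        self.warn_ratio = warn_ratio  # warn once the mean reaches 90% of the limit
        self.samples = []

    def record(self, elapsed_ms: float) -> None:
        """Capture one metric sample; called once per processed message."""
        self.samples.append(elapsed_ms)

    def status(self) -> str:
        """Return 'ok', 'warning' (close to violation) or 'violated'."""
        if not self.samples:
            return "ok"
        average = mean(self.samples)
        if average > self.limit_ms:
            return "violated"
        if average >= self.warn_ratio * self.limit_ms:
            return "warning"
        return "ok"


sla = LatencySla(limit_ms=200.0)
statuses = []
for elapsed in (100.0, 120.0, 350.0, 400.0):
    sla.record(elapsed)
    statuses.append(sla.status())
```

A governance console or an administrator's inbox would be natural destinations for the `warning` and `violated` states.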

Any infrastructure that meets these requirements also has to work with any existing components that might be part of the overall business process being managed. This may include identity management systems, application servers, Web servers, portals, etc. From this perspective, it makes sense to use an intermediary that is not tightly bound to the underlying infrastructure but can implement SLAs at a standards-based level.

This is where SOA governance starts to become a necessity. For early SOA adoption this type of governance might not make sense, but as the extent of your SOA grows and services start to be reused more and more, governance is required and care must be taken to examine each service’s SLA and provide enough information so that other services can consume it with confidence. SLAs and governance become powerful allies in analyzing, developing and testing new services.

There are a number of initiatives that are trying to address governance and SLA, notably WS-Policy from the W3C. However, at this stage there is no standard for SLA.

1.6 Policies and Contracts

We have already seen how defining the contract between service and client is an important aspect of the service lifecycle. In order to do this successfully it is important for the SOI to offer support for a message-driven, contract-driven development approach, as JBossESB does.

An important aspect of any contract is the ability to define policies. A policy represents a constraint or condition on the use, deployment or description of a service. Policies are inherently driven by the service, which defines policies about the issues that are important to it. Policies need to address the overall impact on the business of the services that are being created and deployed. They need to create a strong connection between the business and the SOI.

A contract can refer to the service interface, the messages it can accept, their formats, or even a legal contract entered into when using the service. The difference between a policy and a contract is that the latter is an agreed policy between service and user.

Policies are only of real value if they can be authenticated or enforced, which is where other aspects of governance come in. Therefore SOIs that support SOA governance fall into five categories, which build upon one another:

- No policy support: the SOI has no support for policies or contracts within the infrastructure. The need for policies must be defined outside of the SOI and communicated using ad hoc techniques.

- Definition of policies: the SOI supports the capture and creation of policies at design-time (typically via a graphical interface) and run-time (usually through an intermediary such as a registry).

- Management of policies: the SOI allows the policies of services to be viewed (either directly by contacting the running service, or indirectly via an intermediary) and updated.

- Enforcement: policies are verified and enforced by the SOI.

- Storage: policies that are defined for one service may be useful for another, e.g., transactional capabilities. As such a library of policy types can be built up and shared between services and developers. These policies are typically stored within a repository.

The other metadata and policies (representing service constraints and capabilities) stored in the registry include:

- Policies that describe configuration/description information for non-functional capabilities of the service, such as those defined by the WS-Security or WS-TX policies, for configuring low-level security and transactional aspects of the service.

- Policies that are markers for compliance or compatibility with certain standards or specifications, such as support for WS-Addressing or compliance with the WS-I basic profiles.

- Policies that represent constraints that must be fulfilled, such as SLAs or contractual obligations.

An implicit part of a service contract that is often overlooked is that of the service semantics: essentially what the service is supposed to do. The purpose of a service is the highest semantic characterization of the service. The service semantics include such items as the format and structure of any data communicated between the users of a service interaction.

A successful policy framework within an SOI must fulfill the following requirements:

- A canonical representation (typically XML) for expressing policies at different levels within the SOI (per service, per operation, per deployment container etc.). Policies should be composable.

- The ability to secure policies so that they can only be changed by authorized individuals or components.

- The ability to persist policies (within a repository).

- The ability to create, enforce and manage policies.

- The ability to locate policy definitions within the SOI. For management reasons it is often the case that policies will be associated with services by reference and the actual policy definition will need to be fetched from some location (e.g., the repository) in order to monitor or enforce it.
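
Two of the requirements above, composable policies and policies attached by reference, can be illustrated together. In this sketch plain dictionaries stand in for the canonical (typically XML) representation, and the rule that more specific levels override more general ones is an assumption; `PolicyRepository`, `effective_policy` and the reference strings are hypothetical.

```python
class PolicyRepository:
    """Toy repository mapping a policy reference to its definition."""

    def __init__(self):
        self._store = {}

    def persist(self, ref: str, definition: dict) -> None:
        self._store[ref] = dict(definition)

    def fetch(self, ref: str) -> dict:
        """Resolve a reference; raises KeyError if the policy is unknown."""
        return dict(self._store[ref])


def effective_policy(repo: PolicyRepository, refs: list) -> dict:
    """Compose referenced policies in order; later levels override earlier ones."""
    merged = {}
    for ref in refs:
        merged.update(repo.fetch(ref))
    return merged


repo = PolicyRepository()
repo.persist("policy:container/default",
             {"encryption": "none", "transactions": "optional"})
repo.persist("policy:service/orders", {"encryption": "tls"})
repo.persist("policy:operation/cancel", {"transactions": "required"})

# A service carries only references; definitions are fetched when needed.
policy = effective_policy(repo, ["policy:container/default",
                                 "policy:service/orders",
                                 "policy:operation/cancel"])
```

Because services hold references rather than copies, updating a definition in the repository changes what is enforced without redeploying the service.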

Policies should not be considered as static entities. As with their counterparts in the real world, policies change to reflect the environment in which they exist. For example, a security policy that was initially considered sufficient for intra-corporation deployments may be deemed too weak if services are eventually exposed beyond the corporate firewall, and will need to be improved. Any good SOI will allow this to occur dynamically without requiring changes to deployed services.

In that case, how are changes to the policies communicated to the services and to the enforcement aspects of the SOI governance? Once again, this will be implementation specific. Components (including human users) that wish to know about changes to policies may subscribe to Policy Managers that push changes out to them when they occur, or may need to poll periodically if a push approach is not implemented. In an event-driven architecture, changes to policies constitute an event and therefore informing interested parties will often be catered for within the SOI architecture naturally, i.e., there will be no bespoke policy-management/monitoring component necessary.
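
The push and poll options described above can be sketched with a small subscription mechanism. The `PolicyManager` class and its method names are hypothetical and shown only to make the two styles concrete.

```python
class PolicyManager:
    """Hypothetical manager that pushes policy changes to subscribers."""

    def __init__(self):
        self._policies = {}
        self._subscribers = []

    def subscribe(self, callback) -> None:
        """Register a callable taking (reference, definition)."""
        self._subscribers.append(callback)

    def update(self, ref: str, definition: dict) -> None:
        """Store the new definition and push it to every subscriber."""
        self._policies[ref] = definition
        for notify in self._subscribers:
            notify(ref, definition)

    def current(self, ref: str):
        """Polling alternative: return the latest definition, or None."""
        return self._policies.get(ref)


received = []
manager = PolicyManager()
manager.subscribe(lambda ref, definition: received.append((ref, definition)))
manager.update("policy:service/orders", {"encryption": "tls"})
```

An enforcement component would subscribe once at start-up, while a component without push support would call `current` on a timer instead.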

1.6.1 CDL

Now many people in our industry ignore formal methods or pay lip service to them, believing they are only of use to theoreticians. Unfortunately until that changes Computer Science will always be a "soft" science: more an art than anything. That's not a good thing because it limits efficiency. In a local application (everything on one machine) you can get away with cutting some corners. But in a distributed system, particularly one that needs to be fault tolerant, the consequences are worse. For example, how do you prove the correctness of a system when you cannot reason about the ways in which the individual components (or services) will act given specific expected (or unexpected) stimuli? Put another way, how can you ensure that the system behaves as expected and continues to do so as it executes, especially if it has non-deterministic properties? As the complexity of your application increases, this problem rapidly becomes intractable.

Rather than just throwing together an "architecture" diagram and developing services in relative isolation, and trusting to luck, we decided that something better had to exist for our customers. Now there are formal ways of doing this using Petri nets, for example. WS-CDL uses Pi-Calculus to help define the structure of your services and composite application; you can then define the flow of messages between them, building up a powerful way in which to reason effectively about the resulting system. On paper the end result is something that can be shown to be provably correct. And this is not some static, developer-time process either. Because these "contracts" between services work in terms of messages and endpoints, you can (in theory) develop runtime monitoring that enhances your governance solution and is (again) provably correct: not only can you reason successfully about your distributed system when it is developed and deployed initially, but you can continue to do so as it executes.

A good governance solution could tie into this and be triggered when the contract is violated, either warning users or preventing the system from making forward progress (always a good thing if a mission-critical environment is involved).
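
The idea of run-time monitoring against a message-level contract can be illustrated with a plain finite state machine. This is a deliberately simplified stand-in: it is not WS-CDL, it does not use Pi-Calculus, and it makes no claim of provable correctness. The `ChoreographyMonitor` class, the state names and the message names are hypothetical.

```python
class ChoreographyMonitor:
    """Checks observed messages against an allowed sequence of interactions."""

    def __init__(self, transitions: dict, start: str):
        self._transitions = transitions  # (state, message) -> next state
        self.state = start
        self.violations = []

    def observe(self, message: str) -> bool:
        """Advance on an expected message; record a violation otherwise."""
        next_state = self._transitions.get((self.state, message))
        if next_state is None:
            self.violations.append((self.state, message))
            return False
        self.state = next_state
        return True


# Hypothetical buyer/seller interaction that includes a credit check.
transitions = {
    ("start", "PlaceOrder"): "ordered",
    ("ordered", "CreditCheckRequest"): "checking",
    ("checking", "CreditApproved"): "approved",
    ("checking", "CreditRejected"): "rejected",
    ("approved", "OrderConfirmed"): "done",
}
monitor = ChoreographyMonitor(transitions, "start")
results = [monitor.observe(m)
           for m in ("PlaceOrder", "OrderConfirmed", "CreditCheckRequest")]
```

In this run the out-of-order `OrderConfirmed` is recorded as a violation while the monitor stays in the `ordered` state, which is the point at which a governance solution could warn users or halt progress.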

With the WS-CDL tooling you can define your scenarios (the interactions between parties). You can then use the tool to define the roles and relationships for your application and then you can dive down into very specific interactions such as credit checking or winning the auction.

Because the tooling is all Eclipse based, it should be relatively straightforward to tie this into our overall tooling strategy as well, providing a uniform approach to system management and governance. But even without this, what this combination offers is very important: you can now develop your applications and services and prove they work before deployment. Furthermore, in the SOA world of service re-use, where you probably didn't develop everything, a suitable WS-CDL related contract for each service should allow developers to re-use services in a more formal manner and prove a priori that the composite application is still correct, rather than doing things in the ad hoc manner that currently pervades the industry.

1.6.2 Policy Monitoring and Enforcement

In order to govern a SOA infrastructure (SOI), there needs to be a framework in place that allows these non-functional policies (attributes) and Service Level Agreements (SLAs) to be defined, enforced and audited across multiple consumers and services. Such a framework must be able to define policies for individual services and then either enforce them or provide means by which they can be managed and enforced by some other component within the SOA; at a minimum, alerts should be supported to inform users (human or software) that contracts and policies have been, or are about to be, broken.

Policy enforcement should be provided by the SOI infrastructure through Policy Enforcement Points (PEP), such as interceptors within the consumer/service protocol stack or intelligent (dynamic) proxies through whom messages involved in interactions must pass. Only certain policies (such as security) are applicable to the proxy pattern if the proxy does not reside within the same address space as the ultimate destination.
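
As a rough illustration of the interceptor style of PEP, the sketch below passes every message through a chain of policy checks before the service sees it. The class and function names (`PolicyEnforcementPoint`, `require_signature`, `limit_payload`) and the message layout are assumptions made for this example, not part of any Overlord API.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Interceptor = Callable[[Message], None]


class PolicyViolation(Exception):
    """Raised by an interceptor when a message breaks a policy."""


def require_signature(message: Message) -> None:
    # Illustrative security policy: the message must carry a "signature".
    if not message.get("signature"):
        raise PolicyViolation("message is not signed")


def limit_payload(max_bytes: int) -> Interceptor:
    # Builds an interceptor that rejects bodies larger than max_bytes.
    def check(message: Message) -> None:
        body = str(message.get("body", ""))
        if len(body.encode("utf-8")) > max_bytes:
            raise PolicyViolation("payload exceeds %d bytes" % max_bytes)
    return check


class PolicyEnforcementPoint:
    def __init__(self, interceptors: List[Interceptor],
                 service: Callable[[Message], object]):
        self.interceptors = interceptors
        self.service = service

    def dispatch(self, message: Message) -> object:
        # Every interceptor must pass before the service is invoked.
        for interceptor in self.interceptors:
            interceptor(message)
        return self.service(message)
```

The same chain could sit in a consumer-side stack, a service-side stack or a proxy; only the place where `dispatch` is called would change.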

The technical aspects of how policies are enforced will typically be implementation specific. However, it is important to know that support for policies without enforcement provides limited utility. When monitoring, management and enforcement are all in place (a complete governance solution), the SOI should work with the various types of user (human or software) that need to be informed when policies or contracts are violated.

Obviously the PEP needs the capability to understand, monitor and perhaps enforce the policies under its control. As well as limiting the types of policies that can be defined, some SOI implementations will therefore limit the deployment scope for services such that they can only be deployed within environments (containers, processes etc.) that support the policies defined for them. Other SOI implementations define policy enforcement components that are pluggable and network-able, i.e., can be downloaded from a central Policy Management Repository by a PEP when it encounters policies for which it has no native monitoring or support capability.

SOIs can be categorized according to how they support policy monitoring and enforcement:

• No monitoring or enforcement: such SOIs are of limited use where service policies and contracts are concerned. Unless enforcement happens outside of the SOI, e.g., through direct interaction between producer and consumer, these implementations should be used with care, particularly as the size and complexity of deployments increase.

• Monitoring but no enforcement: at least these SOI implementations monitor the policies and send alerts when violations occur. The form of alerting may be as limited as simply outputting to a service-local log, or as complex as emailing and interacting with Business Activity Monitoring (BAM) consoles. But the outcome is the same: interested parties can be informed when policies are not adhered to. Obviously whether or not such information is communicated in (relatively) real time will depend upon the capabilities of the SOI.

• Monitoring and enforcement: where possible, the SOI implementation will enforce policies, such as ensuring that messages are appropriately secure and if not, refusing to allow such messages to be sent or received.
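
The difference between the three categories can be sketched in a few lines. The `Mode` enumeration, the `admit` function and the "encrypted" flag below are hypothetical names used only to show how the same policy check leads to different outcomes in each category.

```python
import enum
from typing import Dict, List


class Mode(enum.Enum):
    NONE = "no monitoring or enforcement"
    MONITOR = "monitoring but no enforcement"
    ENFORCE = "monitoring and enforcement"


def is_secure(message: Dict[str, object]) -> bool:
    # Illustrative policy check: the message must be flagged as encrypted.
    return bool(message.get("encrypted"))


def admit(message: Dict[str, object], mode: Mode, alerts: List[str]) -> bool:
    """Return True if the message may be sent or received."""
    if mode is Mode.NONE:
        return True  # the policy is never inspected
    if is_secure(message):
        return True
    # Both monitoring categories raise an alert on a violation...
    alerts.append("insecure message from %s" % message.get("sender", "unknown"))
    # ...but only the enforcing category refuses the message.
    return mode is Mode.MONITOR
```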

1.7 Service Monitoring and Business Activity Monitoring

The term Business Activity Monitoring (BAM), which was originally coined by Gartner, is used to describe the real-time access to critical business performance metrics in order to improve the efficiency and effectiveness of business processes. Real-time process/service monitoring is a common capability supported in many distributed infrastructures. However, BAM differs in that it draws information from multiple sources to enable a broader and richer view of business activities. BAM also encompasses business intelligence as well as network and systems management. BAM is often weighted toward the business side of the enterprise. As such, there has recently been a movement for BAM implementations to be closely related to the governance infrastructures.

BAM therefore leverages the technical aspects of monitoring and alerting, but for technical reasons as well as business-focused events. From the perspective of a business user, the ability to create alerts based on business processes can be crucial in order to comply with regulations, auditing etc. Although BAM has been around for many years, it has begun to be associated strongly with the business-driven aspects of SOA.

1.7.1 Why BAM?

To understand why BAM is important, let’s take a real-world example: an online catalogue shop. In this scenario there will be a continuous production of sales-related information, as well as logistics information (e.g., are the items currently available or do they need ordering from elsewhere?), financial information (e.g., does the user have the right line of credit?) etc. All successful companies have operational processes that allow them to analyze information in real time, creating alerts when problems occur (or are about to occur). This is what BAM formalizes and provides to all applications within the IT system.

Therefore most BAM systems will attempt to provide the following capabilities:

• The ability to monitor business processes and activities in real time, alerting users (by email, voice, SMS etc.) before problems arise and in some cases allowing the system to be updated to prevent further problems. The BAM infrastructure pushes the events to the dashboard in real time.

• The ability to create dashboards, which are similar in intent to a car’s dashboard and are used to present important information in a visual manner (often on a single screen) that is tailored to a specific role or viewpoint. It is worth stressing the importance of real-time information processing as far as BAM is concerned because other approaches to monitoring exist (such as Business Intelligence) that utilize dashboards. However, these systems are not real-time based, typically refreshing information periodically.

As we mentioned above, an important aspect of BAM (or the monitoring/management aspects of any good governance infrastructure) is alerting users (or other services) about contract violations etc. Within BAM these alerts are triggered on business events (e.g., is the service meeting its SLA, or are response times longer than required?), but in general alerts may be triggered for a range of reasons, e.g., the service has failed and the infrastructure needs to start another instance.
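
As a small example of an alert on response times, the sketch below times a single call and classifies it against an SLA limit, with a warning band for calls that are close to the limit. The function name `timed_call`, the `warn_ratio` parameter and its default of 0.8 are assumptions for illustration.

```python
import time
from typing import Callable, Tuple


def timed_call(service: Callable[[object], object], request: object,
               sla_seconds: float, warn_ratio: float = 0.8,
               clock: Callable[[], float] = time.monotonic
               ) -> Tuple[object, float, str]:
    """Invoke the service and classify the elapsed time against the SLA."""
    start = clock()
    response = service(request)
    elapsed = clock() - start
    if elapsed > sla_seconds:
        status = "violated"   # the SLA has been broken
    elif elapsed > warn_ratio * sla_seconds:
        status = "warning"    # the SLA is about to be broken
    else:
        status = "ok"
    return response, elapsed, status
```

The `clock` argument exists so that the classification can be exercised with a fake clock; in normal use the monotonic system clock is used.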

The complexity of alerts that are supported will depend upon the implementation, as will how alert messages are delivered. Most of the first BAM solutions were closely linked to Business Process Management (BPM) solutions and therefore processed events emitted as the process was being orchestrated. This had the disadvantage of requiring enterprises to invest in BPM before being able to acquire and use BAM. Fortunately the newer generation of BAM solutions is based on Complex Event Processing (CEP) technology and can process high volumes of underlying technical events to derive higher-level business events, severing the dependency on BPM and providing the benefits of BAM to a wider audience of customers.
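
The step from technical events to a business event can be illustrated with a single hand-written rule over a sliding time window. This is only a sketch of the idea, not a CEP engine; the event types, the `FailedPaymentRule` class and its thresholds are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Dict, Optional


class FailedPaymentRule:
    """Emit a business event when `threshold` technical payment failures
    are seen within `window` seconds."""

    def __init__(self, threshold: int = 3, window: float = 60.0):
        self.threshold = threshold
        self.window = window
        self.recent: Deque[float] = deque()

    def on_event(self, event: Dict[str, object]) -> Optional[Dict[str, object]]:
        if event["type"] != "payment.failed":
            return None
        now = float(event["timestamp"])
        self.recent.append(now)
        # Drop failures that have fallen out of the sliding window.
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()
        if len(self.recent) >= self.threshold:
            count = len(self.recent)
            self.recent.clear()  # avoid re-reporting the same burst
            return {"type": "business.payment_outage_suspected",
                    "timestamp": now, "count": count}
        return None
```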

Any good monitoring/management infrastructure should support at least the following:

• The ability to measure the total elapsed time from the sending of a request to a service and obtaining a response, i.e., the round-trip time. This is necessary for determining whether or not service execution times defined within SLAs are (about to be) violated and is particularly useful in high-volume services.

• Failure/fault detection alerts can be critical to business users as well as infrastructural clients and services for tracking service availability and status. Some SOIs are very static, requiring services be deployed by system administrators; if there are failures of services then the administrator needs to start a new instance. However, the next generation of SOIs are taking a leaf out of distributed systems of the 1990s and embracing a more dynamic approach to service deployment: failures of services can be detected by these SOIs and new instances automatically deployed. Similar techniques are used to dynamically redeploy services to ensure a level of availability as well as to provide load balancing.

• Detecting trends in services (similar techniques to Data Warehousing capabilities, but using real-time data rather than offline databases). Alerts that are triggered when thresholds are reached are good when it is important to know that something has gone awry, but it is often more important to predict when something is going wrong so that corrective measures can be taken to prevent the original alert being triggered. Furthermore, threshold triggers are often absolutes and do not take into account natural perturbations, often leading to false positive alerts. A popular approach to this is using non-threshold analysis, such as Bayesian Belief Networks (BINs), which are based on probabilistic inference, allowing a prediction of an event based on previous historical information. This can be an extremely powerful approach for BAM implementations, allowing much more accurate prediction of trends. We also believe that BINs will begin to be used more widely within other areas of SOA governance, particularly in the long-term prediction of faults or service bottlenecks.

• Monitoring message payload size can be useful in detecting reasons for bottlenecks, abnormal service request/response times etc.
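
To show how probabilistic inference differs from a fixed threshold, the sketch below applies Bayes' rule repeatedly to a single belief ("the service is developing a fault") as slow or normal responses are observed. This is far simpler than a full Bayesian network, and every probability used is an invented illustrative value rather than measured data.

```python
from typing import Iterable


def posterior(prior: float, p_obs_given_fault: float,
              p_obs_given_healthy: float) -> float:
    """Bayes' rule for a two-hypothesis (fault / healthy) model."""
    evidence = (p_obs_given_fault * prior
                + p_obs_given_healthy * (1.0 - prior))
    return p_obs_given_fault * prior / evidence


def fault_belief(observations: Iterable[bool], prior: float = 0.05,
                 p_slow_fault: float = 0.7,
                 p_slow_healthy: float = 0.1) -> float:
    """Update the belief in a fault after each observation.

    Each observation is True for a slow response and False for a normal
    one. Observations are treated as independent given the hypothesis,
    which is a simplifying assumption.
    """
    belief = prior
    for slow in observations:
        if slow:
            belief = posterior(belief, p_slow_fault, p_slow_healthy)
        else:
            belief = posterior(belief, 1.0 - p_slow_fault,
                               1.0 - p_slow_healthy)
    return belief
```

A single slow response only nudges the belief upwards, whereas a run of slow responses raises it sharply; an occasional natural perturbation is therefore less likely to trigger a false positive than it would with an absolute threshold.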

1.8 Service Activity Monitoring

Although BAM was popularized by BPM, the fundamental basis behind it (monitoring the activities in an environment and informing interested parties when certain events are triggered) has been around since the early days of (distributed) system management and monitoring. BAM specializes this general notion and targets the business analyst.

Within a distributed environment (and many local environments) services are monitored by the infrastructure for a number of reasons, including performance and fault tolerance, e.g., detecting when services fail so that new instances can be automatically started elsewhere. Over the years distributed system implementations have typically provided different solutions to specific monitoring requirements, e.g., failure detection (or suspicion) would be implemented differently from that used to detect performance bottlenecks. However, for some types of event monitoring this leads to overlap and possible inefficiencies. For instance, some approaches to detecting (or suspecting) failures may also be used to detect services that are simply slow, indicating problems with the network or overloaded machine on which the service resides.
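
The overlap between failure suspicion and slowness detection can be seen in a simple heartbeat-based detector, where one stream of heartbeat timestamps answers both questions. The `HeartbeatDetector` class and its two timeouts are hypothetical and chosen for illustration.

```python
from typing import Dict


class HeartbeatDetector:
    """Classify services by how long they have been silent."""

    def __init__(self, slow_after: float = 2.0, suspect_after: float = 5.0):
        self.slow_after = slow_after
        self.suspect_after = suspect_after
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, service: str, now: float) -> None:
        # Record the time of the most recent sign of life.
        self.last_seen[service] = now

    def status(self, service: str, now: float) -> str:
        if service not in self.last_seen:
            return "unknown"
        silence = now - self.last_seen[service]
        if silence >= self.suspect_after:
            return "suspected"  # failure is suspected, never proven
        if silence >= self.slow_after:
            return "slow"       # possibly an overloaded machine or network
        return "alive"
```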

As we saw earlier when discussing Policy Enforcement Points, the general concept of interceptors (or filters) has existed in many distributed systems architectures over the past few decades. In fact Policy Enforcement Points are typically implemented as a specific type of interceptor built on the more general infrastructure provisioning. It turns out that interceptors are also a good implementation technique for some types of failure detection.

We are now seeing a merging of many different approaches to entity monitoring within distributed systems (where an entity could be a service, a machine, a network link or something else entirely) and particularly SOIs. The emergence of event processing has also had an impact on this general entity monitoring, where some implementations treat failure, slowness to respond etc. as particular events. This uniform monitoring is termed Service Activity Monitoring (SAM) and typically includes the following:

• Message throughput (the number of messages a service can process within a unit of time). This might also include the time taken to process specific types of messages (e.g., how long to do transformations).

• Service availability (whether or not the service is active).

• Service Mean Time To Failure (MTTF) and Mean Time To Recovery (MTTR).

• Information about where messages are sent.
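
As an example of how two of these figures might be derived, the sketch below computes MTTF and MTTR from a time-ordered list of up/down transitions for one service. The function name and the input layout are assumptions; only completed intervals are counted, so a period that is still in progress at the end of the log is ignored.

```python
from typing import List, Optional, Tuple


def mttf_mttr(transitions: List[Tuple[float, str]]
              ) -> Tuple[Optional[float], Optional[float]]:
    """Return (MTTF, MTTR) from (timestamp, "up" | "down") transitions.

    MTTF is the mean length of completed "up" intervals and MTTR the mean
    length of completed "down" intervals. None is returned for a figure
    when no completed interval of that kind exists.
    """
    up: List[float] = []
    down: List[float] = []
    for (t0, state), (t1, _next_state) in zip(transitions, transitions[1:]):
        (up if state == "up" else down).append(t1 - t0)
    mttf = sum(up) / len(up) if up else None
    mttr = sum(down) / len(down) if down else None
    return mttf, mttr
```

Given both figures, a steady-state availability estimate follows as MTTF / (MTTF + MTTR).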

As the diagram below illustrates, the information is made available to the infrastructure so that it may be able to take advantage of it for improved QoS, fault tolerance etc. The streams may be pulled from existing infrastructure, such as availability probing messages that are typically used to detect machine or service failures, or may be created specifically for the SAM environment. Furthermore, streams may be dynamically generated in real-time (and perhaps persisted over time) or static, pre-defined information, where the SAM can be used to mine the data over time and based on explicit queries.

With the advent of SAM we are beginning to see some BAM implementations that are built on it, whereas other implementations are (continue to be) built from scratch and only target the business activities. The SAM approach may offer more flexibility and power to monitoring and management, whereas a specific implementation may be more easily transported to different environments (since it is not tied to a specific underlying infrastructure).

2. Roadmap

Initial implementation must concentrate on the repository. This will be based on Guvnor and DNA. Until DNA is available, we will use Jackrabbit as the JCR backend.

The use of CDL is a medium-term requirement for users, since it really comes into play when you have more than a handful of services. However, this is a significant positive differentiator for JBoss and is innovative. Hence the CDL integration with Overlord is an ongoing effort now, driven by our interactions with Hattrick Software.

Given everything governance needs to perform, e.g., monitoring of contracts/SLAs and enforcement, a good SAMM infrastructure is needed in the short-to-medium term: retrofitting the infrastructure later will prove problematic and inefficient. The SAMM infrastructure should be based on CEP and BIN. Furthermore it should be developed in a pluggable manner to allow deployers to use their own CEP implementations if necessary. This does not necessarily mean the wholesale replacement of whatever CEP implementation we use by default: federation of CEP should be possible. SAMM also needs to tie into RHQ as one of the event streams.

Tooling for both design-time and run-time governance is a continuous requirement. Initial tools will concentrate on run-time governance.

The Service Modeler (donated by Thomas Erl) is an ongoing community driven tool. This ties in with the immediate need for work on defining a service contract definition language which will be based on WS-Policy and Policy Intents.

The PEP architecture will be developed in the short term. Then all SOA Platform projects need to understand the need for PEP and consider where they will occur within their architecture. The implementations of PEPs for all projects will be an ongoing effort and should also tie into the SAMM implementation when it is available. In the short term PEPs will be developed in an ad hoc manner in order to provide a pragmatic solution to immediate problems.

Identity Management and Security are immediate requirements.
