Implementing Fault Resilient Protection for mySAP in a Linux Environment

Introducing LifeKeeper from SteelEye Technology

Introduction

In the past, high-availability solutions were costly to the point that they made economic sense only for elite business networks and for systems running high-end mission critical applications. The building of highly available systems required purchasing specialized hardware and implementing proprietary interfaces in applications to be protected. For those who could not afford such a solution, a certain amount of system downtime had to be tolerated.

Today’s businesses and customers, however, require high-availability solutions across the board. A global business needs 24-by-7 access to information. In an Internet service model, organizations must count on customers arriving at their site any hour of any day – “regular business hours” have no meaning on a Web site; scheduling downtime for routine maintenance has vast economic impact.

A recent survey found that:

• After 8 seconds of waiting, the typical customer abandons a non-responsive web site.

• One minute of data unavailability costs the average business between 8 and 10 thousand dollars.

Likewise, corporate intranets deployed to facilitate the sharing and consolidation of business operations data need to be available to employees so that they can perform their functions, be it Human Resources, Finance, or Logistics.

In the ERP space, where products such as mySAP provide the infrastructure for applications on which businesses run, the cost of downtime can approach $1 million per hour. Independent Gartner Group and Dataquest studies project that in 1999 alone, downtime cost United States firms in excess of $4.6 billion total. The numbers below represent average business costs per hour of application downtime.

These numbers represent losses in sales revenue only. They do not include intangible losses, such as employee productivity and customer dissatisfaction. In today’s web enabled economy, a customer who cannot access the site or services they desire is only a click away from moving to a competitor.

Through the deployment of high-availability clustering middleware with commodity servers, businesses can achieve 99.99% uptime (about 53 minutes of downtime in a year) for their business critical applications at a fraction of the cost historically associated with fault resilient systems. The deployment of such an environment for mySAP ensures the availability of business functions at a price point that supports a strong ROI typically expected from a deployment of mySAP. SteelEye and SAP AG have worked closely together to ensure that the availability expectations corporate users have for critical business infrastructure components are achievable through the deployment of a joint mySAP/LifeKeeper solution.
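The downtime figure quoted above follows directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative and does not come from any SAP or LifeKeeper tool):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year implied by an availability level."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for level in (99.0, 99.9, 99.99):
        minutes = annual_downtime_minutes(level)
        print(f"{level}% uptime -> {minutes:.1f} minutes of downtime per year")
```

At 99.99% this gives roughly 52.6 minutes, which rounds to the 53 minutes cited above.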

Business Operation                                              Average Cost / Hour of Downtime

Communications: Converged Services (Phone, Web, Call Center…)   > $10.0 million
Financial: Brokerage Operations                                 $6.45 million
Financial: Credit Card/Sales Authorization                      $2.6 million
Corporate Infrastructure: ERP                                   $780,000
Media: Pay Per View                                             $150,000
Transportation: Airline Ticketing                               $89,500
Media: Event Ticket Sales                                       $69,000

Source: Dataquest 1998 and Others
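To relate these hourly figures to an availability target, the cost of a given outage can be estimated by simple proportion. The sketch below is only a rough illustration: it assumes that cost accrues linearly with time, which the survey does not state, and the function and dictionary names are made up for this example.

```python
# Hourly downtime costs in US dollars, copied from the table above.
# The Communications row is omitted because it is only a lower bound (> $10.0 million).
COST_PER_HOUR = {
    "Financial: Brokerage Operations": 6_450_000,
    "Financial: Credit Card/Sales Authorization": 2_600_000,
    "Corporate Infrastructure: ERP": 780_000,
    "Media: Pay Per View": 150_000,
    "Transportation: Airline Ticketing": 89_500,
    "Media: Event Ticket Sales": 69_000,
}


def outage_cost(operation: str, downtime_minutes: float) -> float:
    """Estimate lost sales revenue, assuming cost scales linearly with outage length."""
    return COST_PER_HOUR[operation] * downtime_minutes / 60.0


if __name__ == "__main__":
    # 53 minutes is the yearly downtime associated with 99.99% uptime.
    erp_cost = outage_cost("Corporate Infrastructure: ERP", 53)
    print(f"Estimated ERP revenue loss for 53 minutes of downtime: ${erp_cost:,.0f}")
```

Under that linear assumption, even a system that meets a 99.99% uptime target would expose an ERP deployment to several hundred thousand dollars of lost revenue per year.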

WHITE PAPER

Software for Innovative Open Solutions

mySAP Overview and Components

mySAP is a comprehensive software package providing a flexible infrastructure around which companies can build their business processes and information flow. mySAP provides a wide range of core business functions including, for example, financial management and human resources. SAP R/3 systems contain an ABAP Engine. The NetWeaver environment includes a Java Engine in addition to the ABAP Engine. Additionally, the Java Engine can be added to an ABAP Engine installation.

mySAP is composed of several cooperating services that may be run on a single server or may be distributed across several servers in a cluster. Some of these services have redundancy built into them, while others do not and represent single points of failure in the SAP environment. The SAP Central Instance (Enqueue and Message server for ABAP), the SAP System Central Services (SCS) Instance (Enqueue and Message Server for Java), the Database server, and the NFS server represent single points of failure (SPOF) and are the services that LifeKeeper will protect.

The SAP Central Instance (CI) is a standalone SAP Basis unit which provides services used by clients connected to the SAP system. Among these services are the Message server and the Enqueue server, which run only on the single SAP Central Instance. The Message server maintains a list of all available resources in an SAP system, determines which instance a user logs on to during a client connect, and handles all communication between SAP instances. The Enqueue server is used by SAP to administer the lock table in a distributed SAP system. If the CI server hosting the Enqueue service fails, all SAP transaction locks that have not yet been committed are lost. R/3 guarantees that no user can perform a transaction while the Enqueue service is unavailable in order to guarantee database consistency. Placing the Enqueue and Message services together on the CI is recommended by SAP since the Message service must always access the Enqueue service for inter-process communication. These services provide critical SAP functions that, by existing only in the SAP Central Instance, suffer from being a single point of failure in the SAP environment. Obviously, the CI, which contains the Enqueue and Message services, needs to be restarted as quickly as possible following a failure so that normal operations can resume.

The SAP System Central Services (SCS) has essentially the same function as the SAP CI with the exception that it provides the Enqueue and Message servers for NetWeaver Java environments.

The Database server (DB) acts as the data repository for the SAP environment. Several databases are supported by SAP for Linux deployments. The exact versions that have been certified by SAP can be found at: http://www.sap.com/solutions/technology/linux/platforms/index.asp. The Database server may or may not run on the same physical server as the Central Instance, but since the Database server is not replicated in any manner, it also becomes a single point of failure for the SAP environment. Access to the database holding corporate information and SAP transaction update logs is critical to the functioning of an integrated business management solution.

The NFS server (NFS) provides for the mounting of several critical directories. In most configurations, either the system hosting the CI or the system hosting the DB will act as the NFS server, though SAP can be configured such that the NFS server resides on a separate dedicated system.

An additional component that, while not a SPOF for the configuration, should be understood is the Application Server (AS). The AS is any mySAP instance other than the Central Instance, which means that it does not host the Enqueue or Message services. The AS can be configured to host any of the other services (dialog, update, batch/background, spool, gateway) and is typically used to spread the SAP workload across multiple servers and provide redundancy for these services. The number of Application Servers in an SAP system is typically determined by system workload.

Obviously, with the existence of these SPOFs in the mySAP environment, the deployment of a complementary High Availability solution is critical to protect against costly system downtime.

SAP High Availability Configurations

In the document, SAP R/3 in Switchover Environments, SAP recommends that High Availability middleware products support six different failover scenarios, labeled SO-1 through SO-6, across three physical configurations as shown in the pictures and table below.

The document, SAP Web Application Server in Switchover Environments, includes the same switchover scenarios for the ABAP environment as the document SAP R/3 in Switchover Environments. It also describes three recommended switchover scenarios for Java environments.

Note that the scenarios deal exclusively with host failure and subsequent switchover of the SAP CI or SCS and DB services. Handling of NFS failover is considered to be an operating system (or HA middleware) issue, so it is not covered in the SAP configurations. In configurations where NFS is being used to provide data mount points, LifeKeeper will fulfill all HA requirements.

A second point is that switchover of AS services is not included. This is for two reasons. First, critical services should be configured as redundant services on multiple AS hosts to avoid being a SPOF. Since no AS would therefore be a SPOF, consideration for AS switchover is not required. Second, if one of the services typically placed on the AS (such as batch/background, update, spool, dialog, gateway) needs to be centralized for some reason, that service should be configured to be part of the CI or SCS. This way, a switchover of the CI or SCS will handle switching over these services as well.

The fact that the mySAP services can be distributed, and some even replicated, across multiple host machines, leads to a variety of possible system configurations.

In the following configuration illustrations, servers are named s1, s2, … sN, and they are represented as rectangular boxes. A gray background behind servers means that the servers are configured as a switchover cluster. The SAP services hosted on each server are listed within the server box, with DB denoting the DBMS, CI denoting Central Instance (ABAP environment), SCS denoting the System Central Services (Java environment), and AS denoting Application Server. Parentheses around a service name mean that the service is in standby mode, ready to take over from the active server when necessary.

The cylinder between the Active and Standby DB servers indicates the necessity of having the database located on a shared volume. The cylinder between the Active and Standby servers indicates the necessity of having the Central Instance files located on a shared volume.

ABAP Environment Switchover Scenarios

Please note the scenarios described in this section also apply to SAP NetWeaver ABAP+Java Add-in environments.

Configuration #1: CI and DB on separate servers

The first configuration represents an Active/Active setup where the DB is active on one server (s1) and the CI is active on a second server (s2). If either node fails, the service it was running (DB or CI) is switched to the other cluster node. Servers s1 and s2 back each other up, so switchover can occur in either direction. Of course, both servers must be sized to handle the DB and CI running on them at the same time. SAP labels these scenarios SO-1 and SO-2.

Also note that there can be any number of Application Servers in the cluster. By their definition, these ASs are running services which can be replicated among multiple AS systems. Redundant AS systems should be configured and SAP Logon groups can be used so that if an AS goes down it is transparent to the user. Because of this SAP feature, no HA middleware is required on the AS systems.
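The SO-1 and SO-2 switchovers in this configuration can be modeled as a simple change in service placement. The sketch below is illustrative only: the function name and data layout are assumptions for this example and do not represent LifeKeeper's internal logic.

```python
def switch_over(placement: dict[str, str], failed_server: str, cluster: list[str]) -> dict[str, str]:
    """Return the service placement after one server in a two-node cluster fails.

    Every service hosted on the failed server moves to the surviving server;
    services already on the survivor stay where they are.
    """
    survivors = [server for server in cluster if server != failed_server]
    if len(survivors) != 1:
        raise ValueError("this sketch models a two-node cluster with one failed member")
    target = survivors[0]
    return {
        service: (target if host == failed_server else host)
        for service, host in placement.items()
    }


cluster = ["s1", "s2"]
initial = {"DB": "s1", "CI": "s2"}  # Active/Active: DB on s1, CI on s2

so1 = switch_over(initial, "s1", cluster)  # SO-1: s1 fails, DB starts on s2
so2 = switch_over(initial, "s2", cluster)  # SO-2: s2 fails, CI starts on s1
```

In both outcomes the DB and CI end up on the same server, which is why each server must be sized to run both.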

Configuration #2: CI and DB on same server

In scenario SO-3, the CI and DB run on the first server (s1), and the second server (s2) acts as a backup for these services while running an AS instance. Because an AS instance need not be active on the same physical server as a CI, the AS instance should be shut down as part of the switchover procedure before the CI/DB is started on s2.

In this scenario, Server s2 could also be idle if desired, but there is no reason why it should not be used as an AS while simultaneously acting as a standby for s1.

In this configuration, the CI and DB will always move between servers together. SAP requires that the DB be started and available before the CI starts, and LifeKeeper will ensure this relationship is honored during the switchover.
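One common way to express a start-order requirement like this is as a dependency graph that is sorted before anything is started. The sketch below uses Python's standard `graphlib` module. It is not LifeKeeper's implementation, and the only dependency it encodes is the one stated above (the CI requires the DB).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each resource to the set of resources that must be running before it starts.
# Only "CI depends on DB" comes from the text; nothing else is assumed here.
depends_on = {
    "CI": {"DB"},
    "DB": set(),
}

# static_order() yields each resource only after all of its prerequisites.
start_order = list(TopologicalSorter(depends_on).static_order())

if __name__ == "__main__":
    print(" -> ".join(start_order))
```

Additional resources such as file systems or IP addresses could be added to the same mapping as further prerequisites.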

One additional operational issue is that following a recovery of server s1, the DB/CI services may need to be migrated back to the Primary Server and the AS on the Standby Server restarted. This automatic switchback behavior is configurable and is a task which LifeKeeper will perform if required.

SAP Scenario   Switchover Scenario
SO-1           s1 fails – start up DB on s2
SO-2           s2 fails – start up CI on s1

[Figure: Configuration 1: CI and DB on separate servers. s1 runs DB with (CI) in standby; s2 runs CI with (DB) in standby; AS runs on sN.]

Configuration #3: CI and DB separated into distinct failover clusters

Configuration #3 separates the CI and DB into two distinct failover clusters and requires a dedicated standby server for each.

SAP points out that there are two times when you may want to deploy dedicated standby servers. The first is when the CI and the DB are running under different operating systems. Since you cannot switchover the CI from one OS to another, you also cannot mix operating systems within a single cluster. Therefore two distinct clusters are required if you run in this environment.

The second scenario is when you are operating in a very demanding environment, such that you require a separate server for each of the CI and the DB. In the previously defined scenarios, the CI and DB will be running on the same physical server following a switchover. If this is not acceptable because of server capacity, deploying dedicated standby servers for each of the SPOF components is an alternative.

Due to the complexity involved, you should contact SteelEye to discuss your specific needs if you want to deploy into one of these scenarios.

Java Environment Switchover Scenarios

Configuration #1: DB and SCS in its own switchover unit, Java-CI outside

The first Java environment configuration represents an active/active scenario. The database is protected and running on node s1. In the event of a software or server failure, the database will be moved to node s2. Likewise, the SAP SCS is running and protected on node s2. In the event of a software or server failure, the SAP SCS will be moved to node s1.

The Java Central Instance (CI) is installed on a server that is not a part of the cluster. There are also redundant Application Servers (AS) running in the SAP environment.

Configuration #2: DB and SCS in its own switchover unit, Java-CI inside

The second Java environment configuration also represents an active/active scenario. The database is protected and running on node s1. In the event of a software or server failure, the database will be moved to node s2. Likewise, the SAP SCS is running and protected on node s2. In the event of a software or server failure, the SAP SCS will be moved to node s1.

The Java Central Instance (CI) is installed on a server that is a part of the cluster. The Java CI is not protected and does not fail over. In this configuration a redundant Java Application Server (AS) is installed on the SAP backup server. The Java AS is not shut down when the SAP SCS fails over. There are also redundant Application Servers (AS) running in the SAP environment.

SAP Scenario   Switchover Scenario
SO-3           s1 fails – shut down AS on s2, start up CI & DB on s2

[Figure: Configuration 2: CI and DB on same server (servers s1, s2, sN)]

[Figure: Configuration #1: DB and SCS in its own switchover unit, Java-CI outside (servers s1, s2, s3, sN)]

SAP Scenario   Switchover Scenario
SO-4           s1 fails – shut down AS on s2, start up CI on s2
SO-5           s3 fails – shut down AS on s4, start up DB on s4
SO-6           s3 fails – start up DB on s4 (no AS running on s4 initially)

[Figure: Configuration 3: CI and DB separated into distinct failover clusters (servers s1 to s4)]

Configuration #3: DB and SCS in its own switchover unit, two switchover environments

In Configuration #3, the database and the SCS are each protected in their own distinct cluster.

In each cluster there is a dedicated backup server. The SCS is running on server s1. If there is a software or server failure the SCS is moved to server s2. The database is running on server s3. If there is a software or server failure the database is moved to server s4.

Although not shown in the diagram below, the Java Central Instance (CI) is installed on a server that is not a part of either cluster. There are also redundant application servers installed in the environment.

Protecting mySAP with LifeKeeper

The preceding section describes the scenarios in which deploying a high availability environment is appropriate. LifeKeeper for Linux delivers the middleware component required to deploy these scenarios. This section examines how LifeKeeper does this.

Architecture

LifeKeeper consists of three distinct components that cooperate to ensure fault-resilient high availability. These are shown in the drawing below and described in the following section.

The LifeKeeper core delivers the basic software infrastructure required to build a cluster. This includes a cluster database, cluster communication management and interfaces required by other LifeKeeper components. The core product also comes bundled with recovery software for core system components, such as memory, CPUs, the operating system, the SCSI disk subsystem, and file systems and IP addresses.

Application Recovery Kits (ARKs) sit on top of the core and provide application-specific information required to perform health monitoring and recovery. There are ARKs for Oracle, IBM’s DB2 UDB, MySQL, SAPDB, Apache, Sendmail, and mySAP, to name a few. While independent in the sense that each ARK contains specialized knowledge of its own application, they can be combined to build complex hierarchies with interdependencies.

For example, in order to protect mySAP, the mySAP ARK, which provides monitoring and switchover for the CI, will be used in conjunction with the appropriate database ARK (Oracle, for example) for DB protection and the NFS Server ARK for protection of the NFS mounts. The IP ARK would be used as well to provide a virtual IP address that can be moved between NICs in the cluster as needed. The various ARKs are used to build what is called a “hierarchy”, which provides protection for all of the components of the application environment.

[Figure: LifeKeeper architecture on a LifeKeeper node. Labels: LifeKeeper GUI; Application Recovery Kit; Resource Monitoring; Recovery Direction/Action; LifeKeeper Core; LCD Interface (LCDI); LifeKeeper Configuration Database (LCD); LifeKeeper Communications Manager (LCM), linked to the LCM on another system; LifeKeeper Alarm Interface; LifeKeeper Recovery Action & Control Interface (LRACI).]

[Figure: Configuration #2: DB and SCS in its own switchover unit, Java-CI inside (servers s1, s2, sN)]

[Figure: Configuration #3: DB and SCS in its own switchover unit, two switchover environments (servers s1 to s4)]

Each of these ARKs contains code that monitors the health of the application under protection and that is able to stop and restart the application both locally and on another cluster server. The mySAP ARK has been developed in consultation with SAP to ensure that it is using the most effective means for monitoring and recovering the SAP CI and/or SCS and that all integration issues between the CI, SCS, the DB, NFS and the AS are accounted for in the recovery operations.

The third component of the LifeKeeper architecture is the Graphical User Interface (GUI). The GUI is used to build the cluster, to define which applications/services are to be protected, to assign stand-by responsibility to appropriate nodes, and to monitor the cluster. Written in Java, the GUI can be run on either the cluster systems themselves or from any browser that can access the cluster.

Together, these three pieces, the core, the GUI and the associated ARKs, deliver the fullest featured HA middleware product available for Linux today. LifeKeeper’s customizable architecture makes it ideal for providing the protection required in mySAP environments.

How it Works

LifeKeeper provides two critical functions for the mySAP environment: the ability to monitor the health of all critical SPOF components (we call these resources) and to take an appropriate recovery action when degradation in health is detected. LifeKeeper will, at user-defined intervals, check the health of the SAP CI and/or SCS, the DB, NFS, the IP addresses being used by client connections and underlying system services. If a problem is detected, LifeKeeper will attempt to recover the troubled resource locally on the same server. If this is not successful, a switchover to the standby server will be initiated. In this same vein, if the entire system on which these components are running should experience a failure, LifeKeeper will migrate all of the server processes to the correct standby server. Monitoring and protection are provided at both the individual component and system level.
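The escalation described above (check health, attempt local recovery, then switch over) can be sketched as a single monitoring cycle. This is a simplified illustration, not LifeKeeper code: the function and the three callables passed to it are hypothetical stand-ins for whatever checks and recovery scripts a real recovery kit would supply.

```python
from typing import Callable


def check_and_recover(
    is_healthy: Callable[[], bool],
    recover_locally: Callable[[], bool],
    switch_to_standby: Callable[[], None],
) -> str:
    """Run one monitoring cycle for a resource and report the action taken."""
    if is_healthy():
        return "healthy"
    # First try to bring the resource back on the same server.
    if recover_locally():
        return "recovered locally"
    # Local recovery failed, so escalate to the standby server.
    switch_to_standby()
    return "switched over"


if __name__ == "__main__":
    # Simulate a resource that is down and cannot be restarted in place.
    outcome = check_and_recover(
        is_healthy=lambda: False,
        recover_locally=lambda: False,
        switch_to_standby=lambda: print("starting resource on standby server"),
    )
    print(outcome)
```

A real deployment would run such a cycle at the user-defined interval for each protected resource.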

All dependent resources will be migrated together and brought up in the correct order, and monitoring of their health will resume on the new server. An SNMP alert will be sent as notification that the switchover has occurred so that appropriate personnel can be called in to diagnose and correct the problems on the initial server. Once the initial server has been restored to the point that the failed service can resume running on it, LifeKeeper can perform an automatic switchback if so configured.

Looking at the active/active configuration below, LifeKeeper will be installed on both nodes indicated by the gray shading. In addition to the LifeKeeper core package, ARKs for mySAP, for the database being used, for NFS if data is being shared via exports and for IP address protection would be installed. The resource hierarchy for the DB instance on s1 will be protected by server s2 while the CI instance on s2 will be protected by s1. No protection is provided for the AS since this is a redundant component by definition.

Data required by both the CI and the DB will reside on the shared storage so that it can be accessed regardless of which server is running the specific service. This shared storage may be direct-attached SCSI, Fibre Channel attached through a SAN, or a NAS device.

IP addresses used by clients or other SAP services to access either the CI or the DB will be configured as virtual addresses so that they can move between NICs in the servers as needed during switchover. LifeKeeper automatically updates network ARP tables in the event of a switchover so that client and service connections made via these IP addresses migrate seamlessly between NICs as appropriate.

Please Note: In the description above and the drawing below SCS can be substituted for CI for a similar configuration in NetWeaver Java environments.

[Figure: Active/Active mySAP Configuration. s1 runs DB with (CI) in standby; s2 runs CI with (DB) in standby; AS runs on sN.]

Through the LifeKeeper GUI, you can easily create a complete resource hierarchy so that the monitoring and recovery operations include the system resources used by the mySAP Central Instance. The GUI screen shot below shows the above SAP configuration with system “adam” representing server s2 and system “eve” representing server s1.

Note that there is a DB hierarchy (labeled DB_PRO) active on “eve” and a CI hierarchy (labeled SAP-PRO_00) active on “adam”. The DB hierarchy contains a number of file systems required by the database along with an IP address being used for database access. The CI hierarchy also consists of a number of components: file systems, IP address, and NFS shares that are required by the CI service. Each of these hierarchies is being protected by the other server in the cluster, as denoted by the “StandBy” indicator on the resource buttons.

A similar screenshot for a SAP NetWeaver Java only installation follows:

The notable difference is that the trans directory is not necessary in a NetWeaver Java only environment.

The LifeKeeper GUI simplifies the building of this complex environment through its wizard driven interface and provides at-a-glance monitoring of the environment’s health via its descriptive icons.

LifeKeeper monitors the health of each of these individual resources and, if it detects a resource failure, takes appropriate recovery actions for the entire hierarchy. With its mySAP ARK, LifeKeeper ensures continuous availability of mySAP for the performance of critical business processes.

A Total Solution: mySAP with LifeKeeper

LifeKeeper® from SteelEye Technology provides each of the HA middleware characteristics described above and brings to Intel-based platforms the levels of availability that have historically been reserved for high-end specialized systems.

LifeKeeper is a mature technology which had its genesis in AT&T’s Bell Labs and has since been deployed in hundreds of mission-critical environments across the telecommunications, retail, financial, and manufacturing spaces protecting varied applications and system configurations. LifeKeeper operates on top of both Linux and Windows and has off-the-shelf health monitoring and recovery agents (called Application Recovery Kits) for standard application services including:

• IP Connections, File and Print services

• Apache and IIS Web Servers

• Exchange and Sendmail Mail Servers

• DB2, Oracle, MySQL, SAPDB, and Informix Database servers

• mySAP

• Software Developer’s Kit for writing your own Recovery Kits for custom applications

When combined with components of mySAP, LifeKeeper brings mission critical availability to business infrastructure at great economic benefit. Many businesses today struggle with the balance between the need for systems that can provide the levels of availability required to support critical applications and the need to reduce total cost of ownership. The SAP/SteelEye integrated product combination enables businesses to achieve both goals.

References

SteelEye Technology™ SAP R/3 Recovery Kit Design Document. (Internal Document).

SteelEye Technology™ SAP R/3 Recovery Kit Administration Guide.

SteelEye Technology™ LifeKeeper® Product Briefs (http://www.steeleye.com/products/literature.html).

SAP R/3 in Switchover Environments. Copyright 1997 SAP AG. Product number 50020596.

SAP Web Application Server in Switchover Environments. Copyright May 2005 SAP AG.

Feedback

Direct all comments and questions on this white paper to: [email protected]

Oct 10

© 2010 SIOS Technology Corp. All rights reserved. SIOS, SIOS Technology, LifeKeeper and SteelEye DataKeeper and associated logos are registered trademarks or trademarks of SIOS Technology Corp. and/or its affiliates in the United States and/or other countries. All other trademarks are the property of their respective owners.

SIOS Technology Corp. • US/Canada 866.318.0108 • Europe + 44 1494 429382 • Int’l +1 (650) 843-0655

2929 Campus Drive, Suite 250, San Mateo, CA 94403
