
November 2006

Version 1.2 – ISSUE

Whitepaper:

Real World Oracle Real Application Cluster challenges


DOCUMENT INFORMATION

Copyright

© The copyright to this document is owned by Linxcel Europe Ltd. No part of this document may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without their prior permission.

Contact Us

If you have any queries or wish to discuss our Oracle or SQL Server consulting services, please contact us at:

Linxcel Europe Limited

Office 408 / 4th Floor

1 Liverpool Street

London

EC2M 7QD

Telephone: +44 207 193 7508 or Fax: +44 870 428 5803

Email: [email protected]

Revision History

Version   Date            Changes

1.0       August 2005     Initial release

1.1       November 2005   Amended a few minor typos

1.2       November 2006   Updated location and contact telephone number details


TABLE OF CONTENTS

1. EXECUTIVE SUMMARY

2. INTRODUCTION

3. UNDERSTANDING THE RAC ARCHITECTURE
   3.1 Single Instance Configuration
   3.2 Simplified RAC Configuration
       3.2.1 Choosing the Oracle Software Location
       3.2.2 RAC’s Impact on Your Oracle Backup and Recovery Technology
   3.3 2 Instance Real-World RAC Implementation
   3.4 High Availability for the Interconnect
   3.5 Interconnect Bandwidth Considerations
   3.6 Interconnect Latency Considerations
   3.7 High Availability for the Storage

4. CHALLENGES IN ADOPTING RAC TECHNOLOGY
   4.1 Business Drivers for RAC Deployment
   4.2 Cluster Configurations Deployed in the Real World
   4.3 Unbreakable Linux

5. CHOOSING A STORAGE ARCHITECTURE FOR RAC
   5.1 Raw Partitions
   5.2 Oracle Cluster File System (OCFS)
   5.3 Oracle 10g Automatic Storage Management

6. PERFORMANCE AND PERFORMANCE MANAGEMENT OF RAC SYSTEMS


1. EXECUTIVE SUMMARY

Oracle RAC is a high-availability solution which provides active-active clustering capabilities that can make better use of your server hardware resources, compared to traditional active-passive cluster architectures. However, Oracle RAC brings with it additional complexity, cost and a significant degree of change in order to deploy and successfully manage even the most basic RAC deployment. These variations from a typical Oracle deployment need to be costed, technically understood, managed and mitigated in each organisation if the benefits of RAC are ever to be achieved.

• In order to choose how many cluster nodes to deploy, you should consider how the workload would run upon loss of a single node. Two and three node clusters are the most popular configurations deployed today and the easiest to understand. Remember that 10g supports a shared ORACLE_HOME to reduce the administration cost of installation and patching.

• The benefits of hardware virtualization (AKA “The Grid”) with respect to cluster databases are more promise than reality today, although the potential of such architectures is widely recognized. Whilst nodes can be added to or removed from an existing RAC without outages, the processes to do that today are far from automated and on-demand.

• The requirement for clusterware, shared storage, and a high speed interconnect adds cost and complexity that is not present in a single instance Oracle configuration; these components require close management and testing that standalone solutions do not.

• The requirement for shared storage means that very careful attention must be paid to ensure that all components in the shared storage architecture are compatible and supported – much more so than in a single instance configuration. This continues to be an area of complexity and raises many practical questions with all customer deployments.

• Comprehensive training is required for both system administrators and DBAs to deploy and manage a RAC system, and to ensure that the availability benefits promised by RAC can actually be delivered. Any deployment should include interconnects without single points of failure; unfortunately Oracle doesn’t provide built-in features to enable this.

• Don’t expect to deploy an existing OLTP-type application straight to RAC and gain perfect scalability across the cluster on day one. Although cluster performance seems to have improved compared to 9i, be prepared and plan for a performance impact analysis phase during the migration. Oracle provides the “Tuning Pack” option in 10g to facilitate this, although it is an additional costed option.

• Oracle’s Unbreakable initiative on Linux has the potential to provide a better support experience for customers choosing RAC on Linux – however it’s relatively easy for the unaware to deploy a system that “breaks” Unbreakable particularly in the area of shared storage.


• RAC licenses are an incremental cost on top of the Enterprise Edition – per-processor licensing can add 50% to the cost based on list prices. However, for many applications and organizations, Named User licensing costs much less. View license costs in TPC-C benchmarks with caution: because of the number of users (hundreds of thousands) required to achieve the high transaction rates, the per-processor option is always used.

• Keep in mind that Oracle Automatic Storage Management (a new approach to database storage in 10g) and Oracle’s own clusterware are mandatory for RAC deployed with 10g Standard Edition, so you will absolutely be required to develop skills internally or engage a services organisation to deploy and support these configurations.

• ASM has the capability to deliver cost and control benefits, but extensive testing is required before the theoretical benefits of such leading-edge technology are demonstrated and widely accepted in business critical production applications.


2. INTRODUCTION

Since the release of Oracle 9i, Oracle has positioned its Real Application Clusters architecture as a one-stop-shop DBMS solution for providing performance, availability, and scalability.

For organisations where costs are tightly constrained (which means most if not all of us), the emergence of enterprise Linux has enabled the possibility of RAC deployment on lower cost Intel-based (and latterly non-Intel) servers, giving rise to performance gains compared to existing RISC-based Oracle solutions.

This paper, based on Linxcel’s experiences of real-world RAC deployments, sets out some of the challenges we’ve seen in organizations that have deployed, are considering deploying, or are planning to deploy RAC solutions.

It’s intended for DBAs, developers, and managers from within and outside the Oracle community. We’ve tried to limit the amount of technical detail – but of necessity there is some. You can’t understand the challenges of RAC without understanding exactly what RAC is. This paper covers both 9i and 10g – not least because if you deployed RAC on 9i, you’re probably interested in whether some of the challenges with 9i have been overcome in 10g.

We cover:

• Understanding the RAC architecture

• Challenges in adopting new technology

• Understanding the costs

• Choosing a storage architecture

• RAC performance and performance management


3. UNDERSTANDING THE RAC ARCHITECTURE

Before considering the benefits of RAC (business or otherwise) it’s essential to have a clear understanding of the RAC architecture and terminology. The benefits of RAC come at a significant incremental cost in most cases, and in many respects the increased costs arise from the architecture of RAC itself and the changes it forces on age-old DBA processes and technologies.

The challenge for organizations is that the Oracle architecture of a RAC deployment differs very significantly from that of a traditional single instance configuration. However, the typical slide-show presentation of a RAC configuration doesn’t necessarily show the full complexity: as is usually the case in the real world compared to the marketing world, the devil is in the detail.

This section describes a simple single instance Oracle configuration, a simplified RAC architecture diagram and a real-world RAC deployment with all the incremental components you need for an enterprise-strength configuration.


3.1 Single Instance Configuration

The terms Oracle database and Oracle instance have different meanings, although they are often used synonymously for an Oracle configuration in a non-clustered environment on a single server.

An Oracle server comprises an Oracle database and an Oracle server instance. The instance is actually a shared memory area (referred to as the System Global Area) and some background processes, including the database writer (dbw0), the redo log writer (lgwr), and the archived redo writer (arc0).

Oracle database data is stored on disk in the data files, with transaction changes written through the online redo logs and optionally saved permanently in the archived redo logs to enable recovery to a point in time. The control files hold the database structure and the parameter files hold the configuration. Production systems always run in ARCHIVELOG mode, which enables use of the archived redo logs.
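As a quick illustration of the database/instance distinction, the following queries (a minimal sketch, assuming a SQL*Plus session connected as SYSDBA) report the database-level properties alongside the instance that has it mounted:

    -- Database-level: name and whether ARCHIVELOG mode is enabled
    SELECT name, log_mode FROM v$database;

    -- Instance-level: the memory and background-process side of the configuration
    SELECT instance_name, host_name, status FROM v$instance;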

[Diagram: Single Instance Oracle – an Oracle server comprising the Oracle instance (SGA with data buffer cache, redo log buffer and SQL cache, plus the dbw0, lgwr and arc0 background processes) and disk storage holding the Oracle software, online redo logs, data files, control files, parameter file and archived redo logs.]


3.2 Simplified RAC Configuration

When you move to a RAC configuration, multiple instances on separate servers access the same database. In the diagram, database DB1 is deployed using a RAC configuration. The instances are named DB11 and DB12.

The online redo logs, data files, control files and parameter files must reside on shared storage. Shared storage that is written to cooperatively from multiple servers at the same time is at the heart of RAC. Storage by default is not designed to be shared in this way.

RISK: That’s why the choice of shared storage is one of the most challenging and critical decisions you have to make when choosing to deploy RAC, and why a whole section of this document is devoted to it.

In order to ensure that writes are performed in the right sequence, the multiple instances must co-operate. For example, what happens if one instance needs to process a table row and that row is present in a block in the other instance’s cache, and perhaps hasn’t been written to disk yet? That’s where the high speed interconnect comes into play.

COST: This interconnect, which is an incremental requirement for RAC configurations, is used to present a single logical view of all Oracle SGAs across the cluster.

Oracle manages this logical view through a process known as Global Cache Fusion (GCF).

GCF ensures that data blocks are available where and when required. It’s also at the heart of some performance benefits from RAC configurations, especially for applications that are more read than write intensive. For example, it’s typically faster to fetch a block from the cache in the other instance across the interconnect than perform a physical read from the local instance. The downside is also true: for situations where there is need to update a particular block simultaneously on both nodes, RAC introduces the potential for additional overhead due to the need to synchronize the transfer of the block.
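Once both instances are open, you can see this cluster-wide view by querying the GV$ performance views, which union the V$ data from every instance. A minimal sketch (the instance names follow the simplified diagram; the host names are illustrative):

    -- One row per running instance of the clustered database
    SELECT inst_id, instance_name, host_name, status FROM gv$instance;

    -- Illustrative output:
    --   INST_ID  INSTANCE_NAME  HOST_NAME  STATUS
    --         1  DB11           server1    OPEN
    --         2  DB12           server2    OPEN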

[Diagram: 2 Instance RAC Oracle (simplified) – Oracle Server1 running instance DB11 and Oracle Server2 running instance DB12, each with its own SGA (data buffer cache, redo log buffer, SQL cache) and dbw0/lgwr/arc0 background processes, linked by a high speed interconnect and both attached to shared disk storage holding the online redo logs, data files, control files and parameter file. The locations of the Oracle software and the archived redo logs are deliberately left open.]


RISK: You can understand from this diagram how RAC enables scalability through “scale-out”. For example, if your 2 instance cluster requires more hardware resource, you can add another node (and instance) to the cluster; this approach assumes that the application architecture will scale out to support this deployment model.

This diagram omits two essential components of the Oracle configuration – the location of the archived redo logs and the Oracle software itself. You need to decide where to locate them and that’s not necessarily an easy decision, because each solution has costs and benefits.

3.2.1 Choosing the Oracle Software Location

Logic suggests that if each Oracle instance could share a single copy of the Oracle software cluster-wide, similar to the way in which the database files are accessible via shared storage, then support and administration costs could be reduced.

Two considerations mean that this is not a straightforward decision:

• Oracle 9i is not designed to share the Oracle software

• Sharing the Oracle software makes it potentially a single point of failure

Before considering whether to share the Oracle software, you should be aware that Oracle 9i was not designed to support sharing of the software across multiple nodes. Some Oracle files contain node-specific information that leads to conflicts when only one instance of the file is shared cluster-wide.

For example, Oracle installations frequently deploy the Oracle listener with the default name LISTENER, which associates a network port with the local hostname of the server on which the listener process runs. When the listener.ora configuration file is shared across multiple cluster instances, there is a conflict because the same listener name can no longer be shared across multiple servers.

In this case the solution to the problem is to create a named listener for each node, for example LISTENER_SERVER1 and LISTENER_SERVER2 for a cluster containing two nodes named server1 and server2. A single listener.ora can now contain both entries and be shared cluster-wide. The downside is that DBA administration procedures need to change to ensure that LISTENER_SERVER1 is only ever started on server1 and LISTENER_SERVER2 is only ever started on server2. The management challenge is enforcing this approach at all times. This listener example is one of a number of similar scenarios that exist and require careful consideration and testing in advance of deploying 9i RAC.
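As a rough sketch of the named-listener approach (the hostnames and port are illustrative, not taken from any real deployment), the shared listener.ora would carry one entry per node:

    LISTENER_SERVER1 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = server1)(PORT = 1521)))

    LISTENER_SERVER2 =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = server2)(PORT = 1521)))

Each node then starts only its own entry – for example lsnrctl start LISTENER_SERVER1 on server1 – which is exactly the operational discipline described above that must be enforced.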

COST: As a result, our experience is that most 9i RAC installations chose a local Oracle software installation on each node, along with the duplication of storage, administration and software management that this entails.

Oracle 10g supports the sharing of the Oracle software by design – and in fact the named listener approach is the one taken for the situation in the example above. For 9i, the solution was to duplicate the Oracle software locally on each node – with the administrative overhead for tasks like patching increasing along with the number of nodes in the cluster. In order to lessen the burden of managing node-local Oracle software installations, the Oracle Installer transparently applies patches to all nodes in the cluster when a patch is applied to a single node.

RISK: We believe that additional health-checks to ensure the Oracle software really is identical on all nodes in the cluster after the Oracle Installer procedures have completed are prudent. After all, different versions of the Oracle software on cluster nodes may lead to instabilities.


Although 10g is designed to support cluster-wide sharing of Oracle software, should you do it? Keep in mind that if you do so, the availability of the Oracle software becomes a potential single point of failure for the whole cluster compared to the local per-server installation approach.

RISK: You might consider using NFS as a technology to share the software cluster-wide. NFS is based on a master-slave architecture where one node in the cluster hosts the shared software and the other nodes are clients that share it. If you lose the master node, then you lose the software cluster-wide. This would undermine the 100% availability that is one of the foundations of RAC.

Protecting your cluster from the loss of the shared Oracle software means that high-availability (HA) capability must be built into your software-sharing architecture. In the case of NFS you might choose to deploy an HA version of NFS such as that from VERITAS. This provides the NFS master on a virtual IP address which can quickly be relocated to a different physical server when the original server experiences an outage.

COST/RISK: Solutions like those from VERITAS and others are robust and industrial-strength but typically come with an incremental price tag. In this case, deploying the most robust and lowest-overhead solution comes at additional cost, both in acquisition terms and in changes to people’s skills and processes in your organisation.

3.2.2 RAC’s Impact on Your Oracle Backup and Recovery Technology

Our view at Linxcel has long been that the most reliable Oracle backups (and more importantly recoveries) are best performed using Oracle’s Recovery Manager (RMAN) server utility. The focus on recoverability rather than backup performance is critical. It’s imperative that you don’t discover flaws in your backup procedures when you need to recover a production system in an emergency.

COST: If you don’t use RMAN today, the need to hide the added complexity of the RAC architecture makes it mandatory in our opinion that you start using RMAN for your backups.

COST: Keep in mind that a decision to implement RMAN immediately has a cost implication: you’re almost certainly going to need to acquire a 3rd party media management package in order to integrate your backup with a tape library of some form.

Once you’ve taken that decision, there are still some challenges to overcome in exactly how you run your RAC database backups:

1. To which location do you write the archived redo logs for each instance?

2. Which node do you run the backup from?

3. How do you handle the situation where the backup node is not available?

Remember that each instance in your cluster has its own “thread” of redo for managing block changes that take place as a result of processing performed in that instance. Each thread is identified by a unique thread number. When you think about it, this makes sense, because each instance in the cluster needs the ability to run stand-alone if the other instances are down.

Typically in a cluster environment, Oracle is configured so that the thread number is used in the name of the archived redo log files generated by each instance, making it possible to write the archived redo logs for each instance into the same directory without any name clashes. To actually do that requires a cluster-wide shared directory.
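A minimal sketch of that convention, assuming a 10g instance using an spfile and a hypothetical shared directory /u02/arch mounted on every node (the %t placeholder embeds the redo thread number in each archived log file name):

    -- Same archive destination for every instance (SID='*')
    ALTER SYSTEM SET log_archive_dest_1 = 'LOCATION=/u02/arch' SCOPE=BOTH SID='*';

    -- Thread (%t), sequence (%s) and resetlogs id (%r) keep file names unique across instances
    ALTER SYSTEM SET log_archive_format = 'arch_%t_%s_%r.arc' SCOPE=SPFILE SID='*';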

COST & RISK: This requirement raises the same HA issues that arise when sharing the Oracle software. If you are prepared to take on the cost of implementing an HA shared directory for the archived redo logs, this brings a major benefit: you can run your RMAN backup from any node in the cluster, and the archived redo logs will be available for backup regardless of the node. On the other hand, if you choose to write the archived redo logs for each instance to a local directory to save on costs, then whilst you can still run the database backup from any node, you’ll need to run a separate backup command on each cluster node to back up the local archived redo logs. A third possibility is to use NFS to ensure that all local directories appear on all nodes: this leads to the same issues that arose when using NFS for the Oracle software.
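For illustration, with a shared archive directory in place the backup really can be initiated from whichever node happens to be available; a minimal RMAN sketch (assuming ARCHIVELOG mode and a configured media manager, with channel and retention settings omitted) is simply:

    RMAN> CONNECT TARGET /
    RMAN> BACKUP DATABASE PLUS ARCHIVELOG;

Because every thread’s archived logs are visible in the shared location, no per-node archivelog backup commands are needed – which is precisely the benefit described above.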

Finally, you need to handle the situation where your chosen cluster node from which to initiate the backup is not available. The issue of associating the backup with a specific physical node can be overcome by initiating the backup from a virtual hostname, thereby enabling the backup to run transparently from any cluster node that is up.

COST: This flexibility requires the purchase of additional software to provide the virtual hostname and failover.

COST/RISK: The alternative is to manually code the backup script to ensure the backup is guaranteed to be redirected to a node hosting an Oracle instance that is up and running – this is more complicated and requires effort to develop and maintain.

COST/RISK: The deployment of RAC will introduce the additional costs (such as training, testing, and infrastructure integration) that come with deploying any new technology. These risks and costs can be mitigated, but to do so the adoption of RAC should be treated as a major project in terms of integration (hardware and software), process impact, skills development and infrastructure change.


3.3 2 Instance Real-World RAC Implementation

As a high-availability/high-performance solution, your RAC deployment needs to have HA facilities and high-throughput capability built in throughout the stack. The following diagram shows an example of the hardware duplication required for a two instance cluster.

[Diagram: 2 Instance RAC Oracle (real-world) – Oracle Server1 and Oracle Server2 connected by redundant high speed interconnects, and attached through redundant SAN switches to shared disk storage holding the online redo logs, data files, control files, parameter file, Oracle software and archived redo logs.]

3.4 High Availability for the Interconnect

It’s essential that HA facilities are provided for the cluster interconnect – for example, if the interconnect was provided through a single switch, loss of the switch would cause outages for cluster instances.

Note: In 10g the use of a switch for the interconnect is mandatory even for a two-node cluster. This is a change from 9i, where a cross-over cable was supported.

COST: Therefore, at least two Network Interface Cards (NICs) per server and two switches across the cluster must be available to provide an HA interconnect.

As shown in the diagram, in the case of loss of a single switch or of a card on either server, an interconnect path between nodes in the cluster would still be available. By design, Oracle itself provides no facilities to team NICs together to provide HA for the cluster interconnect. In reality, Oracle relies on services external to the DBMS to provide NIC-teaming and load-balancing across multiple NICs. The availability of such features depends on the operating system and 3rd party software you use.


RISK: For example, on Linux, Open Source software is available to team two NICs to a single IP address in order to provide HA for any network, including the cluster interconnect.

Alternatively, network-driver-level load balancing and failover is available for customers using NICs from specific vendors such as Intel.

For customers who deploy VERITAS cluster file system for RAC which forms part of the Storage Foundation for Oracle RAC (SFOR) suite, global interconnect traffic can be routed over the interconnects used by VERITAS for inter-node heartbeats – in this case the VERITAS software is responsible for both load-balancing and failover for the interconnect across the available NICs, transparently to RAC.

COST: This is a very elegant and powerful solution which relies on the acquisition of SFOR. At the time of writing SFOR was only available for 10g RAC on Solaris and is an incremental cost.

3.5 Interconnect Bandwidth Considerations

The issue of bandwidth requirements for the interconnect traffic needs to be kept in mind when designing your cluster. Our experience is that the bandwidth of the interconnect itself is not usually a limiting factor in the performance of the cluster database. However, the actual traffic is dependent on the application and workload on the cluster database.

However, if you want to investigate the possibility for consolidation of multiple existing stand-alone instances onto a single RAC database, or even the possibility to run multiple RAC databases on a single cluster, there’s the potential for much more interconnect traffic.

RISK: It’s therefore important that the solution you choose has the capability to handle increases in interconnect traffic, and you need to incorporate that into your design. Today there are few guidelines to determine how to fit your existing workloads into a cluster. For example, would you use 4 x 2 CPU nodes, or 2 x 4 CPU nodes?

Note that Oracle provides a CLUSTER_INTERCONNECTS initialization parameter which is supported on some operating systems. This parameter is designed to allow multiple network interfaces to be specified explicitly for interconnect traffic in order to increase bandwidth.

RISK: However, if one of the interfaces drops, then the cluster interconnect is considered to be unavailable – not really appropriate for an HA design.
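For reference, the parameter takes a colon-separated list of private interface IP addresses and is set per instance; a rough sketch (the addresses and instance name are illustrative only):

    -- Dedicate two private interfaces to interconnect traffic for instance DB11
    ALTER SYSTEM SET cluster_interconnects = '192.168.10.1:192.168.11.1'
      SCOPE=SPFILE SID='DB11';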

3.6 Interconnect Latency Considerations

The latency of the interconnect, rather than the bandwidth, is more likely to limit performance for more extreme workloads. Several vendor-specific lower-latency protocols, software, and hardware solutions are available to provide significantly faster RAC interconnect performance than standard Gigabit Ethernet running over UDP or TCP. Some examples are provided in the table below for popular Oracle platforms.

Some solutions rely on purpose-built hardware cards such as HP Hyperfabric and Sun SCI – these offload network processing onto a local processor built into the card rather than the host CPU, and can free up resources for other processing as a result. In some cases, use of specialist hardware has a dependency on third-party clusterware.


RISK/COST: Typically, the higher performance options lead to additional software and/or hardware cost. Before you choose such a solution you should keep in mind Oracle Support’s current position:

“Oracle Corporation has done extensive testing on the Oracle provided UDP libraries (and TCP for Windows). Based on this testing and extensive experience with production customer deployments, at this time Oracle Support strongly recommends the use of UDP (or TCP on Windows) for 9i and 10g RAC environments”

Operating System     Clusterware          Network Hardware         RAC Protocol

HP-UX                Oracle Clusterware   Hyperfabric              UDP
HP-UX                Oracle Clusterware   Gigabit Ethernet         UDP
HP-UX                HP ServiceGuard      Hyperfabric              HMP
Linux                Oracle Clusterware   Gigabit Ethernet         UDP
Microsoft Windows    Oracle Clusterware   Gigabit Ethernet         TCP
Sun Solaris          Oracle Clusterware   Gigabit Ethernet         UDP
Sun Solaris          Sun Cluster          SCI Interconnect         RSM
Sun Solaris          Sun Cluster          Firelink interconnect    RSM
Sun Solaris          Veritas Cluster      Gigabit Ethernet         LLT

3.7 High Availability for the Storage

Alongside the HA requirements for the cluster interconnect, there is a requirement for any network storage (both SAN and NAS) to be provided in a way that protects the host from single points of failure in the storage architecture.

As a result, in the case of SAN for example, two host bus adapter (HBA) cards are used to connect a given host to the storage network, and at least two paths are provided to the storage array itself through multiple SAN switches and interface ports on the array.

In fact there’s nothing in this configuration that is inherent to RAC. Existing single instance Oracle configurations require the HA capabilities of the storage architecture in exactly the same way. But, because RAC storage is shared, in the real world much tighter constraints are imposed on things like driver versions for the HBA cards and storage/SAN/host compatibility matrices than in a single instance Oracle configuration. The implications of choosing various shared storage architectures are covered in the following section, “Choosing a Storage Architecture for RAC”.


4. CHALLENGES IN ADOPTING RAC TECHNOLOGY

One of the main challenges with any new technology, including Oracle RAC, is deciding at which point in its life-cycle to adopt it, if at all. Typically, any announcement of new technology by the mainstream DBMS vendors is accompanied by an enormous amount of fanfare. The downside is that new technologies tend to have gaps, anomalies, and deficiencies when deployed in the real world. Some of these gaps were known at the time of original release, and perhaps didn’t get the publicity they should have done, and others have emerged over time.

This section sets out to provide some background on RAC configurations, goals, and technologies in the real world – along with a review of how RAC has changed since the original release (and why), and to what extent it meets its original goals in the current release, 10g.

4.1 Business Drivers for RAC Deployment

Typical studies cite one or more of the following potential benefits of deploying RAC:

• Significant price/performance benefits

Our experience is that this requires an Intel-based solution in order to leverage the price and performance gains from lower cost commodity servers from vendors such as HP, Dell, and IBM, to offset the increased license costs of the RAC option.

• Lower Total Cost of Ownership

Many deployments and case studies exist demonstrating lower TCO when deploying RAC. However, given the incremental effort and technology required for new adoptions, we believe the correct approach is to develop a technology strategy based around availability, scalability, performance and cost, and then assess whether RAC actually delivers against those requirements for the key services relating to DBMS provisioning. There are many incremental costs that appear to be lost in translation in the TCO studies we have reviewed.

• More Efficient System Utilization

More efficient system utilization is promised by grid-based architectures. However, our experience is that this goal represents more of a future benefit than a present reality. To demonstrate the gap between theory and practice, consider the basic steps required to add a node to a cluster in RAC today – the process is far from automatic and instantaneous.

• Better Scalability to Handle Growth

RAC does have the potential to scale by adding nodes to a cluster. However, given that most clusters we see today contain two nodes, we view this as more of a future benefit than a present reality. Also keep in mind that some vendors with a history of product deployment in business-critical applications place relatively low limits on the number of RAC nodes their products support.

RISK: This is in stark contrast to the huge clusters you may see demonstrated at exhibitions.

Finally, treat TPC-C benchmarks with caution – we would recommend that potential RAC adopters review the full disclosure reports available at www.tpc.org before using the price/performance figures as a basis for savings in their own organization. Very few customers ever need to scale to anywhere near the architectures presented in these tests – are they applicable to you?


• Higher Availability

RAC can certainly provide higher availability than single instance solutions and traditional active-passive clustering when deployed correctly. The challenge of course is the correct deployment, adoption and integration of RAC with its new products, services and processes. In the early days of a RAC configuration it is entirely possible for human error or unfamiliarity to actually reduce reliability.

COST: For a true best of breed availability solution you may decide to purchase 3rd party clusterware and shared storage – which comes at higher cost. In this case, if lower total cost of ownership is one of your goals then you’ll need to do some careful calculations.

COST: We would always recommend considering external consultancy from experts in the field when deploying RAC for the first time. The project to adopt RAC as a business as usual process should be treated the same as any major adoption of a new data centre technology. Treating RAC as a simple extension to an existing Oracle capability will lead to inevitable problems, additional costs and risks to the end state deployment.

4.2 Cluster Configurations Deployed in the Real World

Our experience is that Oracle sites that have deployed or are considering migration to RAC fall into relatively few categories, and the majority use a two-node configuration based on Linux, Unix or Windows:

• Those that deploy an existing HA solution such as VERITAS Cluster Server in a two node active-passive configuration. Oracle RAC enables these configurations to make better use of the existing hardware by running active-active for better day-to-day performance, and provide high-availability without the need to “failover” any infrastructure – potentially increasing reliability.

• Those that are looking to improve price/performance of their existing hardware footprint – in this case the attraction of Linux migration is clear for those on proprietary RISC architectures. Many publicly available studies have shown the very significant price performance gains from moving to RAC/Linux on commodity Intel (and more recently non-Intel) based servers.

• Those that run an enterprise-wide ERP product such as Oracle Financials. Once again the attraction of active-active configurations for larger servers (often 16+ CPUs) to provide performance and availability is a key factor. These systems are more likely to use proprietary interconnects for performance, and 3rd party clusterware and shared storage.

RISK: The issue of how to partition a given CPU resource across a number of server nodes is not obvious. For example, given 8 CPUs, would you deploy RAC on 2 x 4 CPU nodes, or 4 x 2 CPU nodes? Possibly the first thing to consider, assuming high availability is a goal for your cluster, is how your cluster workload would be allocated in the case of single server loss.

For the case of a two node cluster a simplistic analysis is relatively straightforward. If the CPUs on each node run at, say, 35%, then you can lose one node and run the whole workload on a single node at a utilization of 70%. If the CPUs run at 45%, then loss of a node means 90% utilization on the remaining node, which could cause performance degradation. In this case, a 4 node cluster would be more appropriate to handle the loss of a single node. However, if you run more than two nodes, and your workloads are partitioned between nodes, or you run multiple cluster databases, it becomes more complicated to design workload migration to the remaining nodes upon loss of a single node.

RISK/COST: Widespread adoption of RAC has been hampered by the difficulty for DBAs and other technical disciplines in the data centre of learning how to manage RAC without requiring a fully fledged enterprise system with expensive network storage. Remember that the adoption of RAC in the “live” environments will have reciprocal costs and effort required throughout the development and testing environments.

4.3 Unbreakable Linux

One reason to deploy RAC on Linux rather than a proprietary UNIX OS is that you potentially enable your OS and Oracle support under Oracle’s Unbreakable Linux program: in this case Oracle will provide Linux as well as Oracle support as a one-stop shop if you encounter problems.

Compare this to the traditional support model. When you run Oracle on a proprietary RISC-based architecture, if operating system related issues arise (or issues perceived to be so by Oracle Support), you need to involve your OS vendor in the support process. This can lead to “blame-storming” and longer problem resolution. Given the additional infrastructure complexity introduced by RAC, the Unbreakable option for Linux appears increasingly attractive.

There are some caveats you should be aware of in order to take advantage of Unbreakable Linux.

In order to receive Linux support from Oracle you must be running one of the Oracle-supported enterprise Linux variants such as Red Hat Enterprise Linux AS or ES, and you must maintain a support contract of the required level with the vendor.

What’s more likely to break “Unbreakable” is the requirement to run with an untainted kernel. In the Linux world “tainting” is something that happens to your operating system when you load kernel modules that weren’t released under the Open Source license agreement – keep in mind that although enterprise Linux variants are not free, they must provide source code for any changes they make to their distributions.

The following are not supported by Oracle under Unbreakable Linux:

1. recompiling the kernel, with or without source modifications,

2. loading of binary modules within a kernel

3. loading of any unsupported kernel modules

4. loading of any closed source drivers

RISK: Unfortunately it is not uncommon to have to introduce one or more of these unsupported items in order to deploy a practical RAC implementation.


Historically, the most likely place where tainting is introduced has been through closed-source host software drivers required to access network storage such as SAN and NAS. Keep in mind that the shared storage mandated by RAC already imposes stricter constraints on your storage component compatibility, in terms of:

• HBA cards

• Drivers

• Firmware

• Storage switches

• Storage arrays.

RISK: As a result it’s a challenge to find a storage architecture on Linux that is both RAC-compatible and supported under Oracle Unbreakable Linux. If you run a non-Open-Source software stack, support is provided by the vendors rather than Oracle. This may result in the adoption of another storage architecture specifically for RAC, with its inherent additional cost implications, and obviously also breaks the “Unbreakable Linux” value proposition.


5. CHOOSING A STORAGE ARCHITECTURE FOR RAC

Using shared storage for a cluster database is a feature unique to Oracle RAC at the time of writing.

RISK/COST: Along with the potential benefits comes very significant additional complexity.

The number one scenario that must be avoided at all costs is referred to as split brain, or network partitioning – when this occurs, a node that is no longer considered to be a member of the cluster (for example because it has lost network contact with the other nodes) continues to write to the shared database outside the control of the cluster, and corrupts the database.

This section provides some options for shared storage – the most difficult technical hurdle to overcome when deploying RAC – including Automatic Storage Management, Oracle’s new-for-10g technology.

Before considering technical solutions for shared storage, it helps to have a set of requirements to evaluate solutions against. An ideal solution would provide:

1. High performance

2. Flexibility of management (especially to handle space growth/shrinkage)

3. Protection from media corruption

4. Protection from loss of network paths to storage.

Item 4 is typically provided in two different ways for SAN storage: the host is attached to the storage through redundant paths, via two host bus adapter cards on the host. Failover between cards on loss of a network path, or on card failure, is provided either through SAN-vendor-specific software (such as EMC PowerPath) or through software features of a clustered file system (such as the VERITAS cluster file system for RAC).

Oracle today recommends an approach to database physical layout referred to as SAME (stripe and mirror everything) using a smaller number of large volumes for flexibility of management and high performance.

5.1 Raw partitions

In the early days of Oracle RAC on the original release of 9i, it was mandatory to use raw partitions to present the shared storage for a RAC database.

RISK: The inflexibility of raw partitions for management of database space has been well documented for a long time: for example, if you need to increase database space you must have partitions ready to add, and there are limitations on the number of raw partitions per physical device; this requires careful ongoing management.

Performance on raw is typically good because it bypasses file-system buffering which saves CPU time.


5.2 Oracle Cluster File System (OCFS)

Given that the requirement to use raw partitions was one reason why RAC deployment was held back in the real world, OCFS was designed by Oracle to address the lack of a cluster-wide file system on Linux. OCFS has some other advantages. For example, it’s Open Source, so it is available for enhancement by the Open Source community, and it uses direct I/O, thus bypassing any file system caching. Today, it’s also available for Windows.

RISK: However, it also has some documented design limitations that you should be aware of: it’s not designed to be a generic cluster file system (until the long-awaited version 2 arrives) and therefore you can only use it for your Oracle database files and archived redo logs. You can use a limited set of Linux commands on it, for example tar, dd, and cp, but you must remember to use special command line options. For these reasons, it should not be treated as a general purpose cluster file system.

5.3 Oracle 10g Automatic Storage Management

RAC adoption has suffered from the challenge of providing shared storage across a cluster in a high performance and yet easily manageable form. To address this, Oracle has provided Automatic Storage Management (ASM) in 10g. ASM is Oracle’s own redesign of a storage architecture, from the ground up, to support the requirements of an Oracle database for performance and availability in all configurations, including RAC. In effect, it potentially makes 3rd party volume managers obsolete for managing Oracle databases.

ASM introduces the concept of failure groups that enable a DBA to add storage according to availability requirements rather than through placement of files on a file system. For example, if storage redundancy is required to protect data from a single point of failure, the DBA can add space into a group with mirrored storage attributes. There is no need for the DBA to know whether the mirroring is provided through a single device from an internally mirrored storage array (external redundancy), or through a pair of devices mirrored explicitly through the details supplied to the ASM configuration. Note that if a pair of devices in a failure group are actually provided from two distinct underlying storage arrays, ASM can actually provide mirroring across the two arrays. For high performance, ASM also sets out to provide automatic load balancing as a given, easily assimilating additional devices into existing disk groups.
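As a rough illustration of the failure group idea (the disk group name and device paths below are hypothetical), a disk group mirrored across two storage arrays could be created from the ASM instance like this:

    -- Each failure group maps to one underlying array; ASM mirrors extents across them
    CREATE DISKGROUP data NORMAL REDUNDANCY
      FAILGROUP array1 DISK '/dev/raw/raw1', '/dev/raw/raw2'
      FAILGROUP array2 DISK '/dev/raw/raw3', '/dev/raw/raw4';

With EXTERNAL REDUNDANCY instead, mirroring is left entirely to the storage array, as described above.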

Whilst early in its lifecycle ASM does have obvious benefits in the minefield of storage management challenges for RAC.

RISK: However, by abstracting database storage away from something you can see (e.g. files on a file system), Oracle has taken the database away from the people who have managed the storage in the past and turned it into a “black box”. This is an area where products from mature storage vendors have long and proven track records in the most extreme production environments – are you prepared to stop using those products and replace them with ASM?

COST: Before deploying RAC on ASM, we would strongly recommend extensive testing and training to reduce the risk of adopting and integrating ASM into your infrastructure.

RISK: You still need to keep in mind the advantages of dynamic multi-pathing provided by traditional storage management vendors. ASM doesn’t provide this essential feature at this time and you currently need to implement it separately.


6. PERFORMANCE AND PERFORMANCE MANAGEMENT OF RAC SYSTEMS

One of the compelling claims made at the time RAC was announced, was that you could take an existing single instance application and deploy it onto a RAC configuration and it would immediately scale well without any application changes or workload partitioning between nodes.

RISK: Our experience in the real world is that this isn’t always the case for OLTP type workloads although it’s not such an issue for CPU-intensive or direct-path load workloads.

Prior to any migration of an existing application into a RAC configuration, extensive analysis and volume testing are required to understand its probable performance characteristics post deployment.

In some cases your application can be modified to scale well across the cluster, or alternatively you can partition the workload to avoid performance issues associated with the global cache.

COST: Partitioning the workload to suit the behaviour of RAC negates one of the promised immediate benefits – that the application should just scale without any change. The effort required to change the processing profile of a major application can be very significant, and to do so the customer would be expected to have a detailed understanding of which workloads are non-cooperative in the cluster.

RISK/COST: One of the main challenges for organizations deploying RAC is that the RAC architecture, and particularly the cluster interconnect, introduces the potential for a whole set of performance management challenges that simply don’t exist in a single instance configuration. These require additional skills and, in some cases, tools to manage.

COST: In order to address this, deep Oracle performance management skills are usually required that may require external consultancy.

Furthermore, if you choose to deploy your interconnect using some vendor-specific (and therefore proprietary) technology for performance enhancements (as opposed to Oracle’s recommendation to use UDP/TCP over Gigabit Ethernet), the downside is that you can’t use the operating system’s built-in commands for monitoring: instead you need proprietary tools to monitor interconnect performance.