Disaster-Tolerant High Availability: Oracle Data Guard with Oracle Fail Safe

An Oracle White Paper, June 2002




Disaster-Tolerant High Availability Page 1

Disaster-Tolerant High Availability

1 EXECUTIVE OVERVIEW
2 INTRODUCTION
3 CLUSTER CONFIGURATION
3.1 Microsoft Cluster Service
3.1.1 Install MSCS and Configure the Example Clusters
3.2 Validate the Cluster Network Configuration

4 ORACLE SOFTWARE CONFIGURATION
4.1 Install and Configure Oracle9i Database Enterprise Edition
4.1.1 Create Initial Network Configuration Files
4.1.2 Create Primary Database
4.1.3 Start Instance and Listener Services and Connect to Primary Database
4.1.4 Specify Location for Oracle Data Guard Configuration Files
4.1.5 Specify Standby Archive Log File Destination
4.2 Create Initial Primary/Standby Database Configuration
4.2.1 Configure Oracle Enterprise Manager
4.2.2 Start the Oracle Management Server
4.2.3 Start Oracle Intelligent Agent on All Cluster Nodes
4.2.4 Open the Oracle Enterprise Manager Console
4.2.5 Discover Primary and Standby Cluster Nodes
4.2.6 Set Preferred Credentials on Primary and Standby Nodes
4.2.7 Open Data Guard Manager
4.2.8 Create the Initial Oracle Data Guard Configuration
4.2.9 Validate Initial Primary/Standby Configuration
4.2.10 Optionally, Specify a Time Delay for Applying Archived Redo Log Files
4.2.11 Delete Initial Primary/Standby Configuration
4.2.12 Stop and Disable Default Oracle Intelligent Agents
4.3 Install Oracle Fail Safe
4.3.1 Open the Oracle Universal Installer
4.3.2 Specify File Locations
4.3.3 Select Oracle Fail Safe
4.3.4 Installation Types
4.3.5 Reboot Needed After Installation
4.3.6 Review Summary Information
4.3.7 Enter Domain User Account for Oracle Services for MSCS
4.3.8 Confirm Installation and View Release Notes
4.3.9 Reboot Cluster Node
4.3.10 Install Oracle Fail Safe on Remaining Nodes
4.3.11 Verify the Primary and Standby Clusters
4.4 Configure Database Virtual Servers
4.4.1 Configure Database Parameter Files
4.4.2 Create Virtual Servers for Primary and Standby Databases
4.4.3 Execute Verify Standalone Database Command
4.4.4 Add Each Database to its Associated Virtual Server
4.5 Create Final Highly Available Primary/Standby Configuration
4.5.1 Discover Virtual Servers
4.5.2 Create Highly Available Primary/Standby Configuration
4.5.3 Verify Highly Available Primary/Standby Configuration

5 OTHER CONFIGURATIONS
5.1.1 Benefits
5.1.2 Trade-offs
5.2 Single Active/Active Cluster
5.2.1 Benefits
5.2.2 Trade-offs
5.3 Two Active/Active Clusters
5.3.1 Benefits
5.3.2 Trade-offs
5.4 Multiple Primary Locations and Single Standby Location
5.4.1 Benefits
5.4.2 Trade-offs

6 MAINTENANCE AND ADMINISTRATION EXAMPLES
6.1 Performing an Oracle Fail Safe (MSCS) Failover
6.2 Changing the SYS Database Account Password
6.2.1 Update Primary SYS Database User Account Password
6.2.2 Update the Standby SYS Database User Account Password
6.3 Performing Rolling Upgrades
6.3.1 Upgrading Hardware or Operating System Software
6.3.2 Upgrading Oracle Fail Safe or Oracle Application Software
6.3.3 Upgrading Oracle Database Software
6.4 Performing an Oracle Data Guard Failover or Switchover Operation
6.4.1 Disable Is Alive Polling
6.4.2 Perform the Role Transition Operation
6.4.3 Reenable Is Alive Polling
6.4.4 Verify the Primary and Standby Virtual Server Groups
6.4.5 Switchover Example
6.5 Performing Database Backups

7 SUMMARY AND MORE INFORMATION
7.1 Oracle Product Documentation
7.2 Oracle9i Database High Availability and Disaster Recovery Web Site
7.3 Oracle Fail Safe Web Sites
7.4 Oracle University Online Learning Web Site
7.5 Oracle Support MetaLink Web Site


Disaster-Tolerant High Availability

1 EXECUTIVE OVERVIEW

When deploying an “always on” 7 x 24 x 365 mission-critical business system, it is essential to ensure both high availability and disaster tolerance. Many lower cost disaster recovery solutions (for example, the creation, offsite storage, and retrieval of system backups) do not meet the availability requirements for business-critical operations. This paper describes how deploying Oracle9i Database (release 9.2 or later) on commodity Windows clusters with a combination of Oracle Data Guard (release 9.2 or later) disaster tolerance features and Oracle Fail Safe (release 3.3 or later) high availability features provides easy-to-configure and cost-effective disaster-tolerant high availability.

2 INTRODUCTION

While there can be some overlap, the features and technologies used to provide high availability are generally distinct from those used to ensure disaster tolerance. High-availability solutions typically focus on protecting against individual component or system failures, while disaster-tolerance solutions typically focus on protecting against data corruption and site failures. Each can help to keep business-critical systems operational, but neither alone is sufficient to ensure the near-continuous operation required for most business-critical systems. For example, while redundant or clustered hardware can eliminate individual systems as points of failure, it does not protect against a disaster that incapacitates the site where the systems reside. Similarly, while a standby database solution such as Oracle Data Guard provides excellent disaster tolerance, it may take time to switch operations from the primary site to a physically separate standby site (for example, you may first need to apply additional time-delayed redo data to make the standby database current before it can assume the primary role). This is true not only for unexpected disasters and component failures, but also for the more common outages associated with planned maintenance and upgrades. Fortunately, Oracle supports many technologies that can easily be combined to provide the required levels of high availability and disaster tolerance.

This paper describes how to combine Oracle Data Guard with Oracle Fail Safe to provide an enhanced level of disaster-tolerant high availability for single-instance Oracle9i Database Enterprise Edition databases deployed on Windows clusters configured with Microsoft Cluster Service. The result is a complementary and easy-to-configure set of high availability and disaster tolerance features (shown in Table 1 below) that eliminates many potential sources of downtime.


Microsoft Cluster Service

• Provides basic failover clustering services that eliminate individual host systems as single points of failure

• Is included with Windows NT Enterprise Edition, Windows 2000 Advanced Server, and Windows 2000 Datacenter Server

• Supports clusters with up to two nodes for Windows NT Enterprise Edition and Windows 2000 Advanced Server; supports clusters with up to four nodes for Windows 2000 Datacenter Server

• Continuously monitors cluster resources such as disks, IP addresses, and databases for high availability

• Supports configuring a group of cluster resources into a “virtual server” where users access the resources in the group through a fixed (node independent) network address, regardless of which physical cluster node hosts the group

• Relies on RAID or other redundant storage to protect against media failure; does not protect against data corruption

Oracle9i Database Enterprise Edition

• When used with transparent application failover, allows client applications to automatically reconnect to the database if the connection fails, and optionally resume a SELECT statement that was in progress

• When used with the Fast-Start Fault Recovery feature, allows database administrators to specify a maximum duration for database recovery time (and thus guarantee recovery times in service level agreements)

• When used with Oracle9i Flashback Query, allows users to view data at different points in time (and potentially ‘undo’ committed operations to easily recover from user error)

• When used with Oracle9i LogMiner, allows database administrators to analyze the contents of online and archived redo log files, for example to track erroneous user updates or to quickly and selectively undo them

• Allows all maintenance operations formerly associated with downtime (such as schema changes, reorganizations, or changes to memory and storage parameters) to be performed online with no disruption of service to users


Oracle Data Guard

• Excellent protection from data corruption, site disasters, and media failures; good protection from system and component failures; maintains multiple copies of data

• Maintains up to nine standby databases, each of which is a real-time or time-delayed copy of the production database, to protect against all threats—corruptions, human errors, and disasters

• Supports both physical and logical standby database configurations

• Includes Oracle Data Guard broker (with Data Guard Manager wizards) to automate complex creation and maintenance tasks and provide dramatically enhanced monitoring, alert, and control mechanisms

Oracle Fail Safe

• Excellent protection from system and component failures; maintains a single copy of data

• Provides a fast, easy, and accurate way to configure and verify Oracle resources on Windows clusters

• Works with Microsoft Cluster Service to monitor Oracle databases and applications for high availability and, when necessary, to automatically restart them on a surviving cluster node

• Automatically fails back Oracle applications and databases to a preferred node immediately, at a specific time, or not at all

• Supports planned failovers to permit rolling cluster upgrades or workload balancing

• Is a core high availability feature included with every Oracle9i Database and Oracle Applications 11i license for Microsoft Windows NT and Windows 2000

Table 1: High Availability and Disaster Tolerance Features

The components listed in Table 1 can be configured in a variety of ways to meet individual business requirements for availability and disaster tolerance. In general, hardware or software failures, upgrades, or other maintenance operations are efficiently handled through automatic MSCS virtual server failover from one cluster node to another (typically within tens of seconds to a minute or two), with no need for users to reconnect to a remote site or to convert a standby database into a primary database. Only when all cluster nodes at the primary site fail (or there is a loss or corruption of the database files) is it necessary to switch over operations to a standby database.
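The escalation rule in the preceding paragraph can be summarized as a small decision function. This is a simplified, hypothetical model for illustration only: `Action` and `choose_recovery_action` are not part of MSCS, Oracle Fail Safe, or Oracle Data Guard, and the model covers unplanned failures, not planned maintenance.

```python
from enum import Enum


class Action(Enum):
    NONE = "no recovery action needed"
    MSCS_FAILOVER = "fail the virtual server over to a surviving cluster node"
    DATA_GUARD_ROLE_TRANSITION = "switch operations to the standby database"


def choose_recovery_action(nodes_up: int, database_files_ok: bool, owner_node_up: bool) -> Action:
    """Choose a recovery path for an unplanned failure at the primary site (hypothetical model).

    nodes_up:          number of primary-cluster nodes still running
    database_files_ok: False if the database files are lost or corrupted
    owner_node_up:     whether the node hosting the database virtual server is running
    """
    if nodes_up == 0 or not database_files_ok:
        # Losing every primary node, or the database files themselves,
        # is the case that requires the standby database.
        return Action.DATA_GUARD_ROLE_TRANSITION
    if not owner_node_up:
        # Another node in the same cluster can take over the virtual server.
        return Action.MSCS_FAILOVER
    return Action.NONE
```

For example, `choose_recovery_action(1, True, False)` selects an MSCS failover, while losing every primary node selects a Data Guard role transition.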

Figure 1 shows an example deployment (also referred to in this paper as the “example configuration”) with a primary database and a physical standby database, each deployed on a separate Windows cluster. In this example configuration, both the primary cluster and the remote standby cluster are configured with Microsoft Cluster Service (MSCS). All database data, log, and control files are located on MSCS cluster disks. Oracle Data Guard is used to create the primary/standby configuration, while Oracle Fail Safe is used to configure each database in a highly available virtual server environment. In the example, redo data from the primary database is transported to the physical standby database asynchronously. Note that many production environments deploy multiple standby databases with a combination of synchronous and asynchronous redo shipping (section 5 provides additional examples of disaster-tolerant high availability configurations). Also, depending on individual requirements, some production environments may include other Oracle high availability and disaster tolerance technologies such as Oracle Data Guard logical standby databases, Oracle Advanced Replication, or Oracle Real Application Clusters.

Figure 1: Disaster-Tolerant High Availability Example Configuration

The remaining sections of the paper describe the steps required to configure the example disaster-tolerant high availability solution and provide examples of how to:

• Coordinate changing database passwords

• Perform a rolling upgrade using planned failovers from one cluster node to another

• Perform an Oracle Data Guard site switchover between primary and standby locations

• Write scripts to automatically back up the standby database

The configuration process described in this paper is designed to minimize the risk of user error through the use of various wizards and automated configuration tools. (Manual configuration options are also supported but are not described in this paper.) It is assumed that you have basic familiarity with the features and concepts associated with each component. Section 7 provides links to additional information.

3 CLUSTER CONFIGURATION

A cluster is a group of independent computing systems (nodes) that operates as a single virtual system. The component redundancy in clusters eliminates individual host systems as points of failure and provides a highly available hardware platform for deploying mission-critical databases and applications.


Oracle Data Guard can be used to configure and manage Oracle9i Database primary and standby databases deployed in a variety of standalone and clustered configurations, including:

• Single-instance standalone databases

• Single-instance databases configured with Oracle Fail Safe release 3.3 and later (basic shared-nothing cluster configuration with cold failover)

• Multi-instance Oracle Real Application Clusters databases (scalable shared-data cluster configuration with warm failover)

Specifically, this paper describes how the Oracle Data Guard Manager and Oracle Fail Safe Manager wizards automate configuration and management of disaster-tolerant high availability solutions on shared-nothing Windows clusters configured with Microsoft Cluster Service.

3.1 Microsoft Cluster Service

Microsoft Cluster Service (MSCS) provides a basic shared-nothing cluster environment for Windows systems. Individual cluster nodes never share the same cluster resources; each cluster resource (a disk, an IP address, a database instance, and so on) is owned by and accessed exclusively through one cluster node at any given time. If a failure occurs, ownership of the affected resources is transferred, or failed over, to a surviving cluster node. Each cluster is typically configured with at least one private (cluster heartbeat) network for internode cluster communications and at least one public network for client access. Because individual nodes cannot share information in memory or read and write the same disks, workloads cannot scale across multiple nodes (as they can, for example, with Oracle Real Application Clusters). However, because most MSCS clusters are built from standard commodity components, these limitations are somewhat offset by lower hardware costs. Operating system restrictions currently limit MSCS clusters to two nodes for Windows NT Enterprise Edition and Windows 2000 Advanced Server, and to four nodes for Windows 2000 Datacenter Server.
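The operating system limits quoted above can be captured in a lookup table when planning cluster hardware. This sketch records only the limits stated in this paper for Windows NT and Windows 2000; the dictionary and `validate_cluster_size` are hypothetical helpers, not part of any Microsoft or Oracle tool.

```python
# MSCS node limits per Windows edition, as stated in this paper.
MAX_MSCS_NODES = {
    "Windows NT Enterprise Edition": 2,
    "Windows 2000 Advanced Server": 2,
    "Windows 2000 Datacenter Server": 4,
}


def validate_cluster_size(os_edition: str, node_count: int) -> None:
    """Raise ValueError if a planned MSCS cluster exceeds the stated node limit."""
    if os_edition not in MAX_MSCS_NODES:
        raise ValueError(f"no MSCS node limit recorded for {os_edition!r}")
    limit = MAX_MSCS_NODES[os_edition]
    if not 1 <= node_count <= limit:
        raise ValueError(f"{os_edition} supports clusters of up to {limit} nodes, not {node_count}")


# Both example clusters (FS-150 and FS-240) are two-node Windows 2000 Advanced Server clusters.
validate_cluster_size("Windows 2000 Advanced Server", 2)
```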

3.1.1 Install MSCS and Configure the Example Clusters

Microsoft Cluster Service is easily installed on any cluster hardware configuration listed on the Microsoft hardware compatibility list (http://www.microsoft.com/hcl/default.asp; search the list for cluster configurations). Although the initial steps to begin installing and configuring MSCS differ based on the underlying operating system, the overall process is similar and takes only a few minutes per cluster node. The MSCS installation and cluster configuration process is described in detail in the documentation accompanying your Windows operating system software and also in the Installing MSCS lab module in the online course Introduction to Oracle Fail Safe, which is available through Oracle University Online Learning (http://www.oracle.com/education/oln/index.html).

Note that you must first install MSCS and create a working cluster before you can install Oracle Fail Safe. Other Oracle program executable software (database and application software) can be installed on a private disk on each cluster node before or after MSCS installation (refer to the Oracle Fail Safe Installation Guide for more information).
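The ordering rule above (a working MSCS cluster before Oracle Fail Safe), together with the recommendation in section 4 to install the Oracle products you plan to make highly available before Oracle Fail Safe, can be checked mechanically against a per-node installation plan. A minimal sketch; the step labels and the `check_install_order` helper are hypothetical.

```python
from typing import List

# Hypothetical step labels for a per-node installation plan.
REQUIRED_BEFORE_FAIL_SAFE = ["MSCS cluster"]          # stated as a hard requirement
RECOMMENDED_BEFORE_FAIL_SAFE = ["Oracle9i Database"]  # so Fail Safe installs the matching resource DLLs


def check_install_order(steps: List[str]) -> List[str]:
    """Return ordering problems found in a planned installation sequence."""
    if "Oracle Fail Safe" not in steps:
        return []
    before = steps[: steps.index("Oracle Fail Safe")]
    problems = []
    for item in REQUIRED_BEFORE_FAIL_SAFE:
        if item not in before:
            problems.append(f"required: install {item} before Oracle Fail Safe")
    for item in RECOMMENDED_BEFORE_FAIL_SAFE:
        if item not in before:
            problems.append(f"recommended: install {item} before Oracle Fail Safe")
    return problems
```

Database software may legitimately be installed before or after MSCS, so the sketch only constrains what must precede Oracle Fail Safe.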

The example configuration uses two clusters, one for the primary database and one for the standby database. Each cluster consists of two identically configured nodes, and each node must have:


• Sufficient private disk space and memory for the operating system and all required Oracle software

• Sufficient private disk storage (either on the system drive or on additional local drives) to create Oracle home directories for Oracle9i Database Enterprise Edition and Oracle Fail Safe

In addition, each cluster should be configured with at least two MSCS cluster disk resources (one for the MSCS cluster quorum resource and one for the database files). For optimal performance, databases are often deployed using multiple disk resources (for example, separate disk arrays for the data files associated with different tablespaces and for the log and control files); however, the example uses a single array on each cluster for all database files. In all cases, the physical configuration for the primary and standby databases must be identical (although the drive letters used for the cluster disks on each cluster can differ).

Note that although the primary and standby clusters do not have to be identical, using identical clusters makes administration and management easier. Figures 2 and 3 show Microsoft Cluster Administrator screens with the initial states of the primary cluster (FS-150) and standby cluster (FS-240) used in the example configuration after MSCS has been installed and configured. Both clusters are two-node Windows 2000 Advanced Server clusters. Aside from slight differences in the number of cluster drives, the two clusters are configured similarly. On each cluster in the example configuration, disk H: is reserved for all database files: the parameter, data, log, archive log, and control files.

Figure 2: Initial State of Primary Cluster


Figure 3: Initial State of Standby Cluster
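The disk guidance above (at least two MSCS cluster disks per cluster, one for the quorum resource and one for database files, with every database file on a cluster disk) can be expressed as a simple layout check. This is a sketch: `ClusterDisks` and `layout_problems` are hypothetical, the quorum drive letter Q: is a placeholder, and keeping database files off the quorum disk is this sketch's reading of the one-disk-each guidance.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ClusterDisks:
    quorum: str               # drive letter of the MSCS quorum disk (placeholder in examples)
    cluster_disks: Set[str]   # every MSCS cluster disk resource in the cluster
    database_disks: Set[str]  # cluster disks that hold database files (H: in the example)


def layout_problems(c: ClusterDisks) -> List[str]:
    """Return violations of the two-disk guidance; an empty list means none were found."""
    problems = []
    if len(c.cluster_disks) < 2:
        problems.append("need at least two cluster disks (quorum + database files)")
    if c.quorum not in c.cluster_disks:
        problems.append("quorum disk must be an MSCS cluster disk")
    if not c.database_disks:
        problems.append("no cluster disk selected for database files")
    if c.quorum in c.database_disks:
        problems.append("database files should not share the quorum disk")
    stray = c.database_disks - c.cluster_disks
    if stray:
        problems.append(f"database files on non-cluster disks: {sorted(stray)}")
    return problems


primary = ClusterDisks(quorum="Q:", cluster_disks={"Q:", "H:"}, database_disks={"H:"})
print(layout_problems(primary))  # prints [] for this layout
```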

3.2 Validate the Cluster Network Configuration

To ensure proper network name resolution on each cluster node, perform the following steps:

1. Update the hosts file in the \system32\drivers\etc operating system directory on each cluster node to add entries for each cluster node and each Cluster Name (MSCS cluster alias). For the example configuration, this includes FS-150, FS-151, FS-152, FS-240, FS-241, and FS-242. Also add entries for the database virtual servers that will later be created to host the primary and standby databases (FS-153 and FS-245, respectively, in the example configuration).

2. From an MS-DOS command window, use the ping command to verify that each of the preceding network names resolves to the correct public IP address on each of the primary and standby cluster nodes. If any network names do not resolve correctly, refer to the Network Configuration Requirements appendix in the Oracle Fail Safe Concepts and Administration Guide for information on how to troubleshoot and correct the problem.
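Steps 1 and 2 can also be scripted so that every name is written and tested the same way on every node. The sketch below uses Python's standard `socket.gethostbyname`, which asks the operating system resolver (hosts file and DNS), much as ping does when it resolves a name; unlike ping, it checks only resolution, not reachability. The addresses in `EXPECTED` are placeholders from a documentation range, since the paper does not list the real ones, and the helper names are hypothetical.

```python
import socket
from typing import Callable, Dict, Mapping

# Placeholder public addresses; replace them with the real ones for your clusters.
EXPECTED = {
    "FS-150": "192.0.2.150", "FS-151": "192.0.2.151", "FS-152": "192.0.2.152",
    "FS-153": "192.0.2.153",  # primary database virtual server
    "FS-240": "192.0.2.240", "FS-241": "192.0.2.241", "FS-242": "192.0.2.242",
    "FS-245": "192.0.2.245",  # standby database virtual server
}


def hosts_entries(expected: Mapping[str, str]) -> str:
    """Format hosts-file lines ("<address><tab><name>") for the given names."""
    return "\n".join(f"{ip}\t{name}" for name, ip in expected.items())


def check_name_resolution(
    expected: Mapping[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, str]:
    """Return {name: problem} for each name that does not resolve to its expected address."""
    problems = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems[name] = f"does not resolve ({exc})"
            continue
        if got != want:
            problems[name] = f"resolves to {got}, expected {want}"
    return problems
```

Running `check_name_resolution(EXPECTED)` on each of the six cluster nodes should return an empty dictionary once the hosts files are correct.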

4 ORACLE SOFTWARE CONFIGURATION

While the specific requirements for individual business solutions differ, the following general setup guidelines are recommended when installing and configuring Oracle software on MSCS clusters:

• Create an Oracle home on a private disk (for example, the system disk) on each node for each Oracle product that you plan to install. When possible, to minimize downtime during future upgrades, use a separate Oracle home for each major component (for example, a separate Oracle home each for the database, application software, and Oracle Fail Safe). To allow applications to fail over, ensure that the Oracle homes on each system are named in the same way (for example, name the Oracle Fail Safe home on each system ofs_home and the database home on each system dbs_home).


• Install all necessary Oracle product executables into the Oracle home (or homes) on each node. Generally, it is best to install all Oracle products that you plan to configure for high availability before you install Oracle Fail Safe (to ensure that Oracle Fail Safe installs the proper resource DLLs to manage the Oracle software resources on the system).

• Place the Oracle Data Guard configuration files and all database data files, control files, log files, and archive log files on cluster disks so that they can fail over from one cluster node to another, when necessary. Allocate storage resources carefully so that independent workloads can be configured into separate groups.

• If you are planning to use Oracle Fail Safe to configure primary and standby databases on MSCS clusters (as in the example configuration shown in Figure 1), create the initial primary/standby configuration using Data Guard Manager before you configure the databases with Oracle Fail Safe.
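The naming rule in the first guideline (the same Oracle home name for each product on every node) is easy to verify from an inventory of each node's homes. A sketch, assuming you have already collected the home names per node into a dictionary; `inconsistent_home_names` is hypothetical, and dbs_home and ofs_home are the example's names.

```python
from typing import Dict, Optional, Set


def inconsistent_home_names(
    homes_by_node: Dict[str, Dict[str, str]],
) -> Dict[str, Set[Optional[str]]]:
    """Return {product: names seen} for products whose Oracle home name differs across nodes.

    A None in a result set means the product is missing from at least one node.
    """
    products = {p for homes in homes_by_node.values() for p in homes}
    result = {}
    for product in sorted(products):
        seen = {homes.get(product) for homes in homes_by_node.values()}
        if len(seen) > 1:
            result[product] = seen
    return result


example = {
    "FS-151": {"database": "dbs_home", "fail safe": "ofs_home"},
    "FS-152": {"database": "dbs_home", "fail safe": "ofs_home"},
}
print(inconsistent_home_names(example))  # prints {} for consistent nodes
```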

The installed locations of the Oracle software components required to create the example configuration are shown in Figure 4. Refer to your Oracle product documentation to ensure that you have allocated sufficient disk and memory resources on each cluster node for the products you plan to install. Note that each node must have sufficient resources to handle not only its own normal workload but also any additional workloads that potentially could fail over from other nodes.

Figure 4: Location of Software Components for Example Configuration
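The sizing note above (each node must carry its own workload plus anything that could fail over to it) can be checked with simple arithmetic. The sketch takes the most pessimistic view, in which every workload in the cluster lands on one surviving node; that is exactly the situation in the two-node example clusters, but it may overstate requirements for larger clusters. It assumes workloads are additive in one unit (for example, memory in GB); the figures and the helper name are hypothetical.

```python
from typing import Dict


def worst_case_shortfalls(capacity: Dict[str, float], load: Dict[str, float]) -> Dict[str, float]:
    """Return {node: shortfall} for nodes that could not host every workload at once.

    Assumes loads are additive and expressed in the same unit as capacity.
    """
    total = sum(load.values())
    return {node: total - cap for node, cap in capacity.items() if total > cap}


# Hypothetical memory figures (GB) for a two-node cluster.
print(worst_case_shortfalls({"FS-151": 4.0, "FS-152": 4.0}, {"FS-151": 2.5, "FS-152": 2.0}))
# {'FS-151': 0.5, 'FS-152': 0.5}
```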

4.1 Install and Configure Oracle9i Database Enterprise Edition

Use the Oracle Universal Installer to install Oracle9i Database Enterprise Edition into an Oracle home directory on a private disk on each cluster node (four installations altogether for the example configuration). Note that Oracle Data Guard is installed automatically when you install Oracle9i Database Enterprise Edition. Use the same Oracle home name on each cluster node (dbs_home in the example). Because the database data, log, and control files must be located on cluster disks and not on the private disks that contain the Oracle home installation directories, select the Software Only installation option on the Database Configuration installer window to perform each installation without creating a database.

4.1.1 Create Initial Network Configuration Files

After the database installations are complete, use the Oracle Net Configuration Assistant (NetCA) to create a default Oracle listener service and the initial network configuration files on each primary and standby cluster node. Optionally, you can use the following command line to execute NetCA in silent mode and create the required listener.ora, sqlnet.ora, and tnsnames.ora configuration files:

"Program Files\Oracle\jre\1.1.8\bin\jre.exe" -Duser.dir=C:\Oracle\network\jlib -classpath ";C:\Program Files\Oracle\jre\1.1.8\lib\rt.jar;C:\Oracle\jlib\ewt3.jar;C:\Oracle\jlib\ewtcompat-3_3_15.jar;C:\Oracle\network\jlib\NetCA.jar;C:\Oracle\network\jlib\netcam.jar;C:\Oracle\jlib\netcfg.jar;C:\Oracle\jlib\help3.jar;C:\Oracle\jlib\oracle_ice5.jar;C:\Oracle\jlib\share.jar;C:\Oracle\jlib\swingall-1_1_1.jar;C:\Program Files\Oracle\jre\1.1.8\lib\i18n.jar;C:\Oracle\jlib\srvm.jar;C:\Oracle\network\tools" oracle.net.ca.NetCA /orahome C:\Oracle /orahnam OUIHome /instype typical /inscomp client,oraclenet,javavm,server,ano /insprtcl tcp,nmp,tcps /cfg local /authadp NO_VALUE /nodeinfo NO_VALUE /responseFile C:\Oracle\network\install\netca_typ.rsp

Replace each occurrence of C:\Oracle with your ORACLE_HOME directory path, verify that the path for jre.exe is correct, and execute the command from the MS-DOS command prompt on each cluster node. To ensure that the network configuration files are created consistently and correctly on each cluster node, delete or rename any existing listener.ora, sqlnet.ora, or tnsnames.ora files in the Oracle home network\admin directory before you execute this command.
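The two preparation steps above, substituting your Oracle home path into the command and setting aside existing Oracle Net files, can be scripted so they are performed identically on every node. A minimal sketch using only the Python standard library; the function names and the .bak suffix are hypothetical choices, and the jre.exe path still needs to be checked by hand.

```python
import re
from pathlib import Path
from typing import List

NET_FILES = ("listener.ora", "sqlnet.ora", "tnsnames.ora")


def with_oracle_home(command: str, oracle_home: str) -> str:
    """Replace the example home C:\\Oracle (any letter case) with the real ORACLE_HOME path."""
    # A function replacement stops re.sub from treating backslashes in the path as escapes.
    return re.sub(re.escape("C:\\Oracle"), lambda _m: oracle_home, command, flags=re.IGNORECASE)


def set_aside_net_files(oracle_home: str, suffix: str = ".bak") -> List[Path]:
    """Rename existing Oracle Net files in <ORACLE_HOME>\\network\\admin; return the new paths."""
    admin_dir = Path(oracle_home) / "network" / "admin"
    renamed = []
    for name in NET_FILES:
        src = admin_dir / name
        if src.exists():
            dst = src.with_name(name + suffix)
            src.replace(dst)  # overwrites any earlier backup with the same name
            renamed.append(dst)
    return renamed
```

Because the pattern is the literal string C:\Oracle, the C:\Program Files\Oracle\jre paths in the command are left untouched.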

4.1.2 Create Primary Database

After completing the initial network configuration, use Oracle Database Configuration Assistant (DBCA) to create the primary database on the primary cluster using the selected cluster disks (Disk H: in the example). If the cluster disks selected for the database files are not already owned by the node where you are running DBCA, use Microsoft Cluster Administrator to move these disks to that node. Table 2 lists the input values used for the initial DBCA windows in the example:

Template Name: General Purpose

Global Database Name: testdb1.us.oracle.com

SID: testdb1

Connection Option: Dedicated Server Mode

Table 2: Initial DBCA Input Values Used for Example Primary Database
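The global database name in Table 2 combines two initialization parameters: DB_NAME (the part before the first dot) and DB_DOMAIN (the rest). The sketch below splits it and checks the example's inputs. The 8-character DB_NAME limit is Oracle's documented limit; the SID-equals-DB_NAME check only reflects the convention used in this example (the two may differ). The helper names are hypothetical.

```python
from typing import Dict, List, Tuple

MAX_DB_NAME_LEN = 8  # Oracle limit on the length of DB_NAME

EXAMPLE_PRIMARY = {
    "template": "General Purpose",
    "global_name": "testdb1.us.oracle.com",
    "sid": "testdb1",
    "connection": "Dedicated Server Mode",
}


def split_global_name(global_name: str) -> Tuple[str, str]:
    """Split a global database name into (DB_NAME, DB_DOMAIN)."""
    db_name, _, db_domain = global_name.partition(".")
    return db_name, db_domain


def check_dbca_inputs(values: Dict[str, str]) -> List[str]:
    """Return problems or notes about the DBCA inputs; an empty list means none."""
    db_name, _ = split_global_name(values["global_name"])
    problems = []
    if not 1 <= len(db_name) <= MAX_DB_NAME_LEN:
        problems.append(f"DB_NAME {db_name!r} must be 1 to {MAX_DB_NAME_LEN} characters")
    if values["sid"] != db_name:
        problems.append(f"note: SID {values['sid']!r} differs from DB_NAME {db_name!r}")
    return problems


print(split_global_name(EXAMPLE_PRIMARY["global_name"]))  # ('testdb1', 'us.oracle.com')
print(check_dbca_inputs(EXAMPLE_PRIMARY))  # []
```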

At the DBCA Initialization Parameters screen shown in Figure 5, make any necessary changes to ensure that the primary database is configured with ARCHIVELOG mode enabled and that all database data, log, and control files are correctly located on cluster disks. The following list of changes is typical for most databases:

1. Click File Location Variables and create variables to specify the cluster disk locations to be used for the database files. The example uses a variable DB_FILES with value H:\oracle.

2. Under the Archive tab:

• Enable Archive Log Mode.

• Enable Automatic archival.


• Replace ORACLE_BASE with the appropriate cluster disk location variable (DB_FILES for the example configuration) for all file locations.

3. Under the File Locations tab:

• Ensure that the Create server parameter file (SPFILE) box is checked.

• Replace ORACLE_BASE and ORACLE_HOME with the appropriate cluster disk location variable (DB_FILES in the example) for all file locations except for the initialization parameter file location, as shown in Figure 5.

4. Click All Initialization Parameters, scroll through the parameter list, and check that the values assigned to all relevant database parameters use cluster disks. For the example, the value assigned to the control_files parameter was changed in three places to use DB_FILES instead of ORACLE_BASE.

5. Click Next to continue.

Figure 5: DBCA Initialization Parameters Screen
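The following Python sketch illustrates the variable substitution in steps 1 through 4. For illustration only, it assumes that DBCA writes file locations with {NAME} placeholders such as {ORACLE_BASE} and {DB_NAME}. The helpers retarget and expand, and the three control file names, are hypothetical and do not reflect DBCA internals. As noted in step 3, the initialization parameter file location is the one location that keeps its default variable.

```python
import re

# Matches the two default variables that the steps above replace.
_DEFAULT_VARS = re.compile(r"\{ORACLE_(?:BASE|HOME)\}")
# Matches any {NAME} placeholder.
_ANY_VAR = re.compile(r"\{(\w+)\}")


def retarget(location: str, new_var: str = "DB_FILES") -> str:
    """Replace {ORACLE_BASE} and {ORACLE_HOME} with {new_var}."""
    return _DEFAULT_VARS.sub("{" + new_var + "}", location)


def expand(location: str, variables: dict[str, str]) -> str:
    """Resolve {NAME} placeholders; raises KeyError for an undefined name."""
    return _ANY_VAR.sub(lambda m: variables[m.group(1)], location)


# Hypothetical control_files entries, one per place that needed the change.
control_files = [
    r"{ORACLE_BASE}\oradata\{DB_NAME}\control01.ctl",
    r"{ORACLE_BASE}\oradata\{DB_NAME}\control02.ctl",
    r"{ORACLE_BASE}\oradata\{DB_NAME}\control03.ctl",
]
example_vars = {"DB_FILES": r"H:\oracle", "DB_NAME": "testdb1"}
resolved = [expand(retarget(p), example_vars) for p in control_files]
```

With DB_FILES set to H:\oracle and DB_NAME set to testdb1, the first entry resolves to H:\oracle\oradata\testdb1\control01.ctl.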

In the Database Storage window, review all file locations and replace any ORACLE_HOME or ORACLE_BASE references with the appropriate cluster disk location as needed. For the example, all file locations were updated (control files, data files, and redo log files). After these changes are made, click Next to continue.

Optionally, in the Creation Options window (in addition to creating the database), you can save the modified database configuration as a template. This step is recommended if you plan to create other similar disaster-tolerant high availability configurations in the future.

When all changes are complete, click Finish and review the Summary report to verify that all database files will be created on cluster disks. Figure 6 shows a portion of the Summary report for the primary database used in the example configuration. Note that all displayed file locations use the previously created DB_FILES file location variable instead of the default ORACLE_HOME or ORACLE_BASE file location variables.

Figure 6: DBCA Summary Report for Example Primary Database
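You can also express the Summary report check as a test on the resolved file names. This Python sketch assumes that the cluster disks are identified by drive letter (H: in the example). It also assumes you have copied the resolved data, control, and redo log file names from the report. The function name off_cluster_disks is hypothetical. Leave the initialization parameter file out of the list, because it is the one file whose location is not moved to the cluster disk.

```python
import ntpath


def off_cluster_disks(paths, cluster_drives=("H:",)):
    """Return the paths whose drive is not one of cluster_drives.

    Drive letters are compared case-insensitively, as Windows does.
    An empty result means every listed file is on a cluster disk.
    """
    allowed = {d.upper() for d in cluster_drives}
    return [p for p in paths if ntpath.splitdrive(p)[0].upper() not in allowed]
```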

At the end of the database creation process, you will be prompted to enter new passwords for the default SYS and SYSTEM database user accounts.

4.1.3 Start Instance and Listener Services and Connect to Primary Database

From the Services Control Panel, start the Oracle primary database instance and listener services (if they are not already started). Depending on how your system is configured, you also may need to create entries for the database in the listener.ora and tnsnames.ora network configuration files before you can connect to the primary database. Update these files if necessary and then be sure to stop and restart the Oracle Listener and Oracle Intelligent Agent processes on any system where you changed these files. When the network environment is configured correctly, use SQL*Plus to connect to the database and verify that you are able to query the database successfully (for example, by executing SELECT * FROM ALL_USERS from the SYSTEM account).

4.1.4 Specify Location for Oracle Data Guard Configuration Files

Because the Oracle Data Guard configuration files must be accessible by whichever cluster node hosts the primary database virtual server, these files must be located on shared-nothing cluster disks (usually the same disks used for the database data, control, and log files). To specify the location to be used when these files are created, use SQL*Plus to connect to the primary database through the SYS database user account (as SYSDBA) and, at the SQL prompt, enter the following commands:

SQL> alter system set dg_broker_config_file1 = '<path>\dr1<instance_name>.dat' scope=both;

SQL> alter system set dg_broker_config_file2 = '<path>\dr2<instance_name>.dat' scope=both;

where <path> is the shared-nothing cluster disk location where you want these files to be created (H:\oracle\database in the example) and <instance_name> is the SID for the primary database (testdb1 in the example). If the cluster disk directories specified in <path> do not already exist, be sure to create them. The scope=both qualifier ensures that this change is written both to memory and to the database system parameter file (spfile) on disk.
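The following Python sketch makes the <path> and <instance_name> substitution concrete by generating the two statements for the example values (H:\oracle\database and testdb1). The function name broker_config_statements is hypothetical. The sketch only builds the text, which you would still enter in SQL*Plus while connected AS SYSDBA.

```python
import ntpath


def broker_config_statements(path: str, instance_name: str) -> list[str]:
    """Return the two ALTER SYSTEM statements for the broker files.

    path is the shared cluster disk directory; instance_name is the SID.
    """
    statements = []
    for n in (1, 2):
        target = ntpath.join(path, f"dr{n}{instance_name}.dat")
        statements.append(
            f"alter system set dg_broker_config_file{n} = '{target}' scope=both;"
        )
    return statements


example = broker_config_statements(r"H:\oracle\database", "testdb1")
```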

4.1.5 Specify Standby Archive Log File Destination

The standby_archive_dest parameter for the primary database is used only if the database is later reconfigured as a standby database. By default, it is set to %ORACLE_HOME%\RDBMS. However, because the standby archive log files must be accessible by whichever cluster node hosts the primary database virtual server, these files must be located on shared-nothing cluster disks (usually the same disks used for the database data, control, and log files). To specify the required cluster disk location for the standby archive log files, use SQL*Plus to connect to the primary database through the SYS database user account (as SYSDBA) and, at the SQL prompt, enter the following:

SQL> alter system set standby_archive_dest = '<path>' scope=both;

where <path> is the shared-nothing cluster disk directory where you want these files to be created (H:\oracle\oradata\testdb1\standby_archive in the example). If the cluster disk directories specified in <path> do not already exist, be sure to create them. The scope=both qualifier ensures that this change is written both to memory and to the database system parameter file (spfile) on disk.
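The directory must exist on the cluster disk before the database can use it. The following Python sketch creates it if needed and returns the matching statement. It is a hypothetical helper, not part of any Oracle tool. It assumes it runs on the primary cluster node that currently owns the cluster disk, so that the path resolves to that disk.

```python
from pathlib import Path


def prepare_standby_archive_dest(path: str) -> str:
    """Create path if needed and return the matching ALTER SYSTEM statement."""
    # parents=True creates intermediate directories; exist_ok=True avoids an
    # error when the directory is already there.
    Path(path).mkdir(parents=True, exist_ok=True)
    return f"alter system set standby_archive_dest = '{path}' scope=both;"
```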

4.2 Create Initial Primary/Standby Database Configuration

Data Guard Manager automates the process of configuring primary and standby databases into a single, easily managed disaster-tolerant solution. Because Data Guard Manager is integrated with Oracle Enterprise Manager and relies on the Oracle Intelligent Agent to perform database discovery, it is first necessary to configure Oracle Enterprise Manager and ensure that the Oracle Intelligent Agent is running on each cluster node. The steps to configure Oracle Enterprise Manager and to use Data Guard Manager to discover and configure the initial primary/standby configuration are described later in this section.

The example disaster-tolerant high availability solution makes use of databases hosted by node-independent virtual servers. To ensure that Oracle Enterprise Manager and Data Guard Manager correctly configure and discover the disaster-tolerant high availability configuration, the Oracle Data Guard Create Configuration Wizard is invoked twice, as summarized in the following sequence of events:

1) Use Oracle Enterprise Manager to discover the primary database using the default (node-specific) Oracle Intelligent Agent running on the primary cluster node that hosts the primary database.

2) Invoke the Oracle Data Guard Create Configuration Wizard to create the standby database and to create an initial Oracle Data Guard configuration with the primary database hosted by one of the primary cluster nodes and the standby database hosted by one of the standby cluster nodes.

3) Verify that this initial primary/standby configuration is working correctly and resolve any configuration issues.

4) Remove the initial Oracle Data Guard configuration information from the Data Guard Manager tree view.

5) Delete the primary and standby cluster nodes and associated resources from the Oracle Enterprise Manager tree view, then stop and disable the default Oracle Intelligent Agent on each node to ensure that there will not be any resource discovery conflicts later.

6) Use Oracle Fail Safe Manager to create a virtual server on the primary cluster for the primary database and a virtual server on the standby cluster for the standby database.

7) Add an Oracle Intelligent Agent to each virtual server so that Oracle Enterprise Manager and Data Guard Manager can discover the resources hosted by each virtual server. (Note that Oracle Fail Safe only allows an Intelligent Agent resource to be added to a virtual server group that already contains a database resource.)

8) Discover each virtual server and the highly available database it hosts using Oracle Enterprise Manager.

9) Use the Oracle Data Guard Create Configuration Wizard to create the final Oracle Data Guard configuration (with each database now hosted by a highly available virtual server).

Steps 1 through 5 in this list are covered in the remainder of this section, while steps 6 through 9 are covered in sections 4.4 and 4.5. Refer to the Oracle Data Guard product documentation listed later in this paper if your configuration differs from that used in the example.
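Steps 6 and 7 must happen in that order because of the Oracle Fail Safe rule noted in step 7. The following Python sketch models only that ordering rule. VirtualServerGroup and its methods are hypothetical names used for illustration; they are not an Oracle Fail Safe interface.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServerGroup:
    """Toy model of an Oracle Fail Safe group; not a real Fail Safe API."""

    name: str
    resources: list[str] = field(default_factory=list)

    def add_database(self, db_name: str) -> None:
        self.resources.append(f"database:{db_name}")

    def add_agent(self) -> None:
        # Documented rule: an Intelligent Agent resource may be added only to
        # a group that already contains a database resource.
        if not any(r.startswith("database:") for r in self.resources):
            raise ValueError(f"{self.name}: add a database resource first")
        self.resources.append("agent")
```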

4.2.1 Configure Oracle Enterprise Manager

If you have already configured Oracle Enterprise Manager and the Oracle Management Server on a separate management system, you optionally can start Oracle Enterprise Manager on that system and skip ahead to section 4.2.4. Note that, depending on your environment, your Oracle Enterprise Manager tree view may differ from that shown in this paper.

If you have not already configured Oracle Enterprise Manager on another system, you optionally can configure Oracle Enterprise Manager now using the Enterprise Manager Configuration Assistant (EMCA). For production deployments, you should always put Oracle Management Server and the Oracle Enterprise Manager repository database on a separate system so that you will not lose access to the repository when you take one of the primary or standby sites offline. However, for purposes of illustration, one of the primary cluster nodes (FS-152) will be used in the example configuration.

Oracle Enterprise Manager is installed automatically when you install Oracle9i. From the system where you plan to configure Oracle Enterprise Manager, select:

Start > Programs > Oracle - <oracle_home>

where <oracle_home> is the name of the previously created Oracle9i Database Enterprise Edition home (dbs_home for the example configuration). Then, to open the Oracle Enterprise Manager Configuration Assistant, choose:

Configuration and Migration Tools > Enterprise Manager Configuration Assistant

On the second EMCA window (Configuration Operation), select Configure local Oracle Management Server. For the third window, choose Create a new repository. Choose Typical for the Create New Repository Options on the fourth window. Record the username and password information from the Create Repository Summary window for future use and click Finish to complete the Oracle Enterprise Manager configuration process.

4.2.2 Start the Oracle Management Server

The steps in section 4.2.1 should automatically start Oracle Management Server. However, if necessary, start Oracle Management Server from the command-line prompt by entering the command oemctl start oms. Note that the Oracle Management Server must be able to connect to the Oracle Enterprise Manager repository database in order to start. If you have difficulty starting the Oracle Management Server, verify that the repository database is configured correctly and that the corresponding instance and listener services are started.

4.2.3 Start Oracle Intelligent Agent on All Cluster Nodes

During the Oracle9i Database Enterprise Edition installation process, an Oracle Intelligent Agent process was created for each primary and standby cluster node. Issue the command agentctl status from the MS-DOS command line on each cluster node to determine the status of the Agent on each node. For any node where the Agent is not already started, start the Agent from an MS-DOS command window by issuing the command agentctl start. Note that in general, any time the configuration on a cluster node changes, you will need to stop and restart the Oracle Intelligent Agent on that node to allow the new changes to be discovered by the Agent.

4.2.4 Open the Oracle Enterprise Manager Console

To open Oracle Enterprise Manager, choose:

Start > Programs > Oracle - <oracle_home>

where <oracle_home> is the name of the previously created Oracle9i Database Enterprise Edition home (dbs_home for the example configuration). Then, to open the Enterprise Manager Console, choose:

Enterprise Manager Console

(Optionally, you can also open the Enterprise Manager Console by executing the command oemapp console from the command-line prompt.) When the login dialog opens, choose Login to the Oracle Management Server and connect to an existing Oracle Management Server (for example, the one created in section 4.2.1). Do not choose Launch standalone, because Data Guard Manager will not be available from the Enterprise Manager Console if you select this option. The default Oracle Management Server administrator login account is sysman with an initial default password of oem_temp; if prompted, be sure to change the default password and record the new value.

4.2.5 Discover Primary and Standby Cluster Nodes

Run the Enterprise Manager Discovery Wizard, also referred to as the Discovery Wizard, to discover each node of the primary and standby clusters and to gain access to the databases that you want to configure and administer with Data Guard Manager. To invoke the Discovery Wizard from the Enterprise Manager Console menu bar, choose:

Navigator > Discover Nodes

Follow the directions in the Discovery Wizard to discover each of the nodes in the primary and standby clusters (FS-151, FS-152, FS-241, and FS-242 in the example configuration). When finished, all discovered nodes and databases are displayed in the Enterprise Manager navigator tree. For the example configuration, Oracle Enterprise Manager discovers and displays the following, as shown in Figure 7:

• On the node where the primary database was created (FS-151 in the example configuration), the wizard discovers the primary database (testdb1.us.oracle.com).

• In addition, if you optionally used EMCA to create a repository database on one of the cluster nodes, the Oracle Enterprise Manager repository database (OEMREP.us.oracle.com) is discovered on the node where it was created (FS-152 for the example configuration).

• On all cluster nodes, the wizard finds the Oracle home where you have installed Oracle9i Database Enterprise Edition.

Figure 7: Oracle Enterprise Manager Tree View After Node Discovery

For other configurations, the Oracle Enterprise Manager tree view may display additional resources.

4.2.6 Set Preferred Credentials on Primary and Standby Nodes

You must set preferred credentials on each of the primary and standby cluster nodes to ensure Data Guard Manager can run remote processes to create the configuration. To set preferred credentials from the Enterprise Manager Console menu bar, select:

Configuration > Preferences > Preferred Credentials

For each cluster node, specify an account with administrator privileges on that system. Note that the selected account also must be granted the Log on as a batch job user right on that system. After setting the preferred credentials, verify that you are able to use Oracle Enterprise Manager to successfully run a small test job on each cluster node (for example, execute a system dir command).

Although setting preferred credentials for databases is not required, you also might want to set preferred credentials (for example, the SYS account) for the primary database (and also later for the standby database when it is created).

4.2.7 Open Data Guard Manager

Once the preceding steps have been completed, you can open Data Guard Manager from the command-line prompt or from the Enterprise Manager Console:

• From the command-line prompt, enter oemapp dataguard.

• From the Oracle Enterprise Manager Console, use either of the following methods:

o Choose Tools > Database Applications > Data Guard Manager.

o From the Database Applications drawer, move the cursor over the icons and select the Data Guard Manager icon.

After opening Data Guard Manager, you should see the initial screen view shown in Figure 8:

Figure 8: Data Guard Manager

4.2.8 Create the Initial Oracle Data Guard Configuration

The steps to create the initial Oracle Data Guard configuration are described in this section. To open the Create Configuration Wizard, right-click Oracle Data Guard Configurations in the navigator tree and choose Create Configuration Wizard. Figure 9 shows the initial welcome screen for the Create Configuration Wizard.

Figure 9: Create Configuration Wizard - Welcome

The wizard takes you through the following five steps:

1. Verify the initial Oracle Data Guard configuration requirements.

2. Provide a configuration name.

3. Choose a primary database.

4. Choose how you want to add a standby database:

• Import an existing standby database.

• Create a new physical or logical standby database.

5. Verify the information you supplied to the wizard and make changes, if necessary.

Each of these steps is described in more detail for the example configuration in the remainder of this section. Refer to the Data Guard Manager online help system and to the Oracle9i Data Guard Broker Concepts manual for complete information.

4.2.8.1 Verify the Initial Oracle Data Guard Configuration Requirements

From the Create Configuration Wizard welcome page (see Figure 9), open the checklist of setup requirements and review the information that is displayed. If necessary, make any additional changes required to set up the Oracle Data Guard environment on the primary and standby clusters. Click Next to continue.

4.2.8.2 Provide a Configuration Name

Enter a unique Oracle identifier for the name of the new Oracle Data Guard configuration. Figure 10 shows the Configuration Name window in which the example configuration name (testdb_config) has been entered. Click Next to continue.

Figure 10: Create Configuration Wizard - Configuration Name
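The configuration name must be a valid, unique Oracle identifier. The following Python sketch assumes that the standard rules for nonquoted Oracle identifiers apply: a leading letter, followed by letters, digits, _, $, or #, up to 30 characters. It simplifies these rules to ASCII. valid_config_name is a hypothetical helper; Data Guard Manager performs its own validation.

```python
import re

# Nonquoted Oracle identifier: a letter, then letters, digits, _, $ or #,
# at most 30 characters (ASCII only in this simplified sketch).
_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")


def valid_config_name(name: str, existing: tuple[str, ...] = ()) -> bool:
    """Check the name's form and that it differs from existing names."""
    if _IDENTIFIER.fullmatch(name) is None:
        return False
    # Nonquoted identifiers are case-insensitive, so compare in upper case.
    return name.upper() not in {e.upper() for e in existing}
```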

4.2.8.3 Choose a Primary Database

Select the primary database from the list of discovered databases. As shown in Figure 11, the selected primary database for the example is TESTDB1.us.oracle.com. Accept the default primary site name; this site will be deleted and replaced by a new site after Oracle Fail Safe Manager configures the primary and standby databases for failover.

Verify that the cluster disks used by the primary database are owned by the node where the database was created. If necessary, use Microsoft Cluster Administrator to move the disks to this node. Ensure that the database instance is started and that you can connect to the database. Click Next to continue.

Figure 11: Create Configuration Wizard – Choose Primary Database

4.2.8.4 Create a New Physical Standby Database

The wizard allows you to create a new physical or logical standby database or to add an existing standby database. For the example, choose Create a New Physical Standby Database (as shown in Figure 12) and click Next to continue.

Figure 12: Create Configuration Wizard – Standby Creation Method

At this point, the wizard verifies the primary database configuration and then displays a sequence of screens to collect the information required to create the new physical standby database. In the first of these screens, shown in Figure 13, specify the standby cluster node that currently owns the cluster disks where the standby database files will be located (node FS-241 for the example configuration) and accept the default site name. If necessary, use Microsoft Cluster Administrator to move the cluster disks to the selected node of the standby cluster.

Figure 13: Create Configuration Wizard – Standby Oracle Home

In the next screen, specify the directory location where the standby database data files should be copied. To allow the database to be configured with Oracle Fail Safe, all files must be located on MSCS shared-nothing cluster disks. For the example, as shown in Figure 14, all database data files, log files, archive log files, and control files will be copied to the cluster disk directory H:\oracle\standby\. After entering the directory location, click Next.

Figure 14: Create Configuration Wizard - Data file Copy Location
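Because every standby file lands in one flat directory, two primary files with the same base name would collide. The following Python sketch shows the implied mapping and rejects such collisions. standby_copy_paths is a hypothetical helper that assumes the flat H:\oracle\standby layout of the example; it does not describe how Data Guard Manager actually copies files.

```python
import ntpath


def standby_copy_paths(primary_files, standby_dir=r"H:\oracle\standby"):
    """Map each primary file to a file of the same name in standby_dir.

    Raises ValueError if two primary files share a base name, since they
    would overwrite each other in a single flat directory.
    """
    mapping = {}
    seen = set()
    for path in primary_files:
        base = ntpath.basename(path)
        key = base.lower()  # Windows file names are case-insensitive
        if key in seen:
            raise ValueError(f"duplicate file name in standby directory: {base}")
        seen.add(key)
        mapping[path] = ntpath.join(standby_dir, base)
    return mapping
```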

Click Yes when prompted to create the directory, as shown in Figure 15, and review the standby database options displayed in the next window, as shown in Figure 16. Click View/Edit Initialization File and ensure that the two Data Guard Manager configuration file locations (specified using the dg_broker_config_file1 and dg_broker_config_file2 parameters) use the correct cluster disk directories (refer to section 4.1.4 for information on the corresponding location of these files for the primary database). For the example configuration, the paths for the Data Guard Manager configuration files were edited to use the directory H:\oracle\standby\. In most cases, you can accept the default values for all other parameters and information shown in the options window. Click Next to continue.

Figure 15: Create Directory Dialog

Figure 16: Create Configuration Wizard - Standby Database Options

4.2.8.5 Verify Wizard Information

Review the Create Configuration Wizard Summary window (shown in Figure 17) and verify that the information displayed for the primary and standby sites is correct. If you find an error, click Back to move backward through the wizard screens and make the needed changes. When the information is correct, click Finish. The wizard displays a report similar to that shown in Figure 18 that records progress while the configuration is created.

Figure 17: Create Configuration Wizard - Summary

Figure 18: Create Configuration Wizard Progress Report

4.2.9 Validate Initial Primary/Standby Configuration

After closing the Create Configuration Wizard progress report, use Data Guard Manager to connect to the newly created configuration. You will be prompted to enter the database username and password required to connect to the configuration, as shown in Figure 19. Once connected to the configuration, expand the Data Guard Manager tree view and verify that the configuration properties are similar to those shown in Figure 20.

Figure 19: Data Guard Manager Configuration Connection Dialog

Figure 20: Initial Configuration Tree View and Property Information

4.2.10 Optionally, Specify a Time Delay for Applying Archived Redo Log Files

By default, a physical standby database automatically applies archived redo logs when they arrive from the primary database, and a logical standby database automatically applies SQL statements once they have been transformed from the archived redo logs. In some cases, however, you may want to create a time lag between the archiving of a redo log at the primary site and the application of that redo log at the standby site. A time lag can protect against the application of corrupted or erroneous data from the primary site to the standby site. For example, if a problem is detected on the primary database before the affected logs have been applied to the standby database, administrators have the option to switch operations over to the unaffected standby database (where the problem has not yet propagated), effectively rolling back the clock to a point in time before the problem occurred.

To specify a time lag for applying redo logs at the standby site:

• Select the standby database in the Data Guard Manager tree view.

• Click the Properties tab.

• Locate the DelayMins property and enter the desired redo log application delay in minutes.

• Click Apply.

Changing the DelayMins property for a standby database updates the DELAY attribute of the corresponding LOG_ARCHIVE_DEST_n initialization parameter for the primary database. For the example configuration, a value of 30 minutes was entered, as shown in Figure 21.

Figure 21: Optionally, Specify a Time Delay for Applying Redo Logs
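You can work through the effect of DelayMins with timestamps. In the following Python sketch, a redo log archived at the primary site becomes eligible for apply no earlier than DelayMins minutes later. A problem caught before that moment has therefore not yet reached the standby database. The helper names are hypothetical; the DELAY=<minutes> form mirrors the LOG_ARCHIVE_DEST_n attribute named above.

```python
from datetime import datetime, timedelta


def apply_not_before(archived_at: datetime, delay_mins: int) -> datetime:
    """Earliest time the standby may apply a log archived at archived_at."""
    return archived_at + timedelta(minutes=delay_mins)


def standby_unaffected(archived_at: datetime, detected_at: datetime,
                       delay_mins: int) -> bool:
    """True if a problem in that log was detected before it could be applied.

    In this model apply may start later than apply_not_before but never
    earlier, so the test is conservative.
    """
    return detected_at < apply_not_before(archived_at, delay_mins)


def delay_attribute(delay_mins: int) -> str:
    """Text of the DELAY attribute that DelayMins maps to."""
    return f"DELAY={delay_mins}"
```

With the example value of 30 minutes and a log archived at 12:00, a problem detected at 12:20 is still absent from the standby database. A problem detected at 12:45 may already have been applied.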

4.2.11 Delete Initial Primary/Standby Configuration

At this point, Data Guard Manager has successfully created and configured the initial standby configuration. However, when the primary and standby databases are configured with Oracle Fail Safe, client access to these databases will change from node-specific network addresses to node-independent virtual addresses. As a result, the initial Oracle Data Guard configuration and the database information stored in the Oracle Enterprise Manager repository will no longer be valid. For this reason, you must remove the initial Oracle Data Guard configuration from the Data Guard Manager tree view. You must also delete the initially discovered primary and standby cluster nodes and database resources from the Enterprise Manager tree view. Once the Oracle Fail Safe configuration process is completed, these tree views will be updated with the final disaster-tolerant high availability configuration (as described in sections 4.4 and 4.5).

To remove the initial configuration information, right-click the name of the initial Oracle Data Guard configuration (testdb_config in the example) and then click Remove in the pop-up menu. In the resulting window, shown in Figure 22, ensure that the Remove Oracle Data Guard Configuration Permanently option and the other removal option shown in the figure are both chosen. This leaves each database in place, but stops transport and application of logs to the standby database.

Figure 22: Remove Oracle Data Guard Configuration Window

After the removal operation completes, exit Data Guard Manager. Then, from the Oracle Enterprise Manager console, delete the tree view entries for the primary and standby cluster nodes. This removes the nodes and all discovered targets hosted by the nodes from the Oracle Enterprise Manager repository database. (To delete a node from the tree view, right-click the node and then choose Delete.) After the nodes are deleted from the Oracle Enterprise Manager tree view, exit the Oracle Enterprise Manager console.

4.2.12 Stop and Disable Default Oracle Intelligent Agents

Finally, to ensure that there will be no resource discovery conflicts in later steps, you must stop and then disable the default Oracle Intelligent Agent service (Oracledbs_homeAgent in the example) on each cluster node. To do this, open the Windows Services Control Window and right-click the Oracle Intelligent Agent service to open the properties page for the service, as shown in Figure 23.

Figure 23: Default Oracle Intelligent Agent Property Page

Note that, currently, the default Intelligent Agent discovers all resources hosted by a given system, regardless of whether they are accessed through a node-specific IP address or through a virtual server IP address. By disabling the default Intelligent Agent, you will not be able to discover any node-specific resources. However, you will be able to discover any highly available virtual servers created using Oracle Fail Safe that contain an Oracle Intelligent Agent resource. Because the highly available Intelligent Agent configured for each virtual server only monitors resources that are accessed through the IP addresses associated with that specific virtual server, there are no resource discovery conflicts between the Intelligent Agents in different virtual server groups (even when multiple virtual servers are hosted by the same physical cluster node). As shown in Figure 58, each discovered virtual server appears in the Oracle Enterprise Manager tree view as if it were a separate physical node.
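The discovery behavior described above can be modeled in a few lines. In the following Python sketch, the default agent sees every resource on the node, and each virtual server agent sees only resources reached through its own addresses. Resource, discover, the second database otherdb, and the 192.0.2.x addresses are all hypothetical. The sketch illustrates the reasoning; it does not show how the Intelligent Agent is implemented.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Resource:
    name: str
    address: str  # IP address clients use to reach the resource


def discover(resources: Iterable[Resource],
             agent_addresses: Optional[set[str]] = None) -> list[Resource]:
    """Return the resources an agent would report.

    agent_addresses=None models the default agent, which reports everything
    on the node; a set of addresses models a virtual server agent.
    """
    if agent_addresses is None:
        return list(resources)
    return [r for r in resources if r.address in agent_addresses]
```

Run against the same list, the default agent reports both databases, while two virtual server agents report disjoint sets. This is why disabling the default agent removes the overlap.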

4.3 Install Oracle Fail Safe

Oracle Fail Safe ships on a separate CD-ROM in the Oracle database media pack for Windows. Locate the CD-ROM and use it to install Oracle Fail Safe into a new Oracle home directory on a private disk on each node of the primary and standby clusters. Note that Oracle Fail Safe release 3.3 or later is required to create the disaster-tolerant high availability configurations discussed in this paper. The screen views in the sections that follow summarize the Oracle Fail Safe installation process. Refer to the online Oracle Fail Safe Installation Guide and Release Notes included on the Oracle Fail Safe distribution media for complete installation instructions.

4.3.1 Open the Oracle Universal Installer

Insert the Oracle Fail Safe CD-ROM into the CD-ROM drive of one of the primary or standby cluster nodes. From the initial Autorun screen, click Install/Deinstall Products to open the Oracle Universal Installer Welcome screen shown in Figure 24. If the Autorun screen is not displayed on your system after the Oracle Fail Safe CD-ROM is inserted, you can start the Oracle Universal Installer using the setup.exe program located in the \install\Win32\ directory on the CD-ROM. Click Next to continue.

Figure 24: Oracle Universal Installer: Welcome

4.3.2 Specify File Locations

In the File Locations window, accept the default source location and specify the name and location (on a private disk) for the Oracle home directory where Oracle Fail Safe is to be installed. To ensure that software components can fail over correctly, the Oracle home where Oracle Fail Safe is installed must have the same name on each cluster node; for the example, the Oracle Fail Safe installation home is ofs_home, as shown in Figure 25. After entering the required information, click Next to continue.

Figure 25: Oracle Universal Installer: File Locations
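Once the home names have been collected, checking that the Oracle Fail Safe home name matches on every node is easy to automate. The following Python sketch assumes you have already gathered the home name from each node into a dictionary. inconsistent_home_names is a hypothetical helper, and it compares names exactly.

```python
from collections import Counter


def inconsistent_home_names(homes_by_node: dict[str, str]) -> dict[str, str]:
    """Return {node: home} for nodes whose home name differs from the majority.

    An empty result means all nodes agree. On a tie, the name seen first wins.
    """
    if not homes_by_node:
        return {}
    expected, _ = Counter(homes_by_node.values()).most_common(1)[0]
    return {node: home for node, home in homes_by_node.items() if home != expected}
```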

4.3.3 Select Oracle Fail Safe

From the Available Products window, choose Oracle Fail Safe (as shown in Figure 26) and then click Next to continue.

Figure 26: Oracle Universal Installer: Available Products

4.3.4 Installation Types

In the Installation Types window shown in Figure 27, choose Typical, and then click Next to continue.

Figure 27: Oracle Universal Installer: Installation Type

4.3.5 Reboot Needed After Installation

A Reboot Needed After Installation window, similar to that shown in Figure 28, warns you to reboot the system after the installation is complete. Note that this window is not displayed if you have previously installed Oracle Fail Safe components from this release and the changes to the system path and Oracle resource DLL have been made and detected previously. Click Next to continue.

Figure 28: Oracle Universal Installer: Reboot Needed After Installation

4.3.6 Review Summary Information

Review the installation summary screen, which should be similar to that shown in Figure 29, and then click Install to begin installing the selected software components. Note that if there is insufficient space to perform the installation, the text below Space Requirements is displayed in red.

Figure 29: Oracle Universal Installer: Summary

The Install window, shown in Figure 30, displays the progress of the installation, including the names of the files that are being installed.

Figure 30: Oracle Universal Installer: Install

4.3.7 Enter Domain User Account for Oracle Services for MSCS

If the installation is successful, the Configuration Tools window and the Oracle Services for MSCS Account/Password dialog box are displayed, as shown in Figure 31.

In the Oracle Services for MSCS Account/Password dialog box, enter the domain, user name, and password of an operating system user account that has Administrator privileges. This is the account that Oracle Services for MSCS will be using. Oracle Services for MSCS runs as a Windows service (called OracleMSCSServices) under a user account that must be a domain user account (not the system account) that has Administrator privileges on all cluster nodes. The account must be the same on all cluster nodes, or you will receive an error message when you attempt to connect to a cluster using Oracle Fail Safe Manager.

Enter the information in the form Domain\Username, as shown in Figure 31, or if you are using Windows 2000, you optionally can enter a user principal name in the form Username@DnsDomainName.
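The two accepted account formats can be told apart by their separator. The sketch below is a hypothetical Python helper, not something the installer exposes. It splits an entry into user and domain parts so that a script could, for instance, confirm that the same account is used on every node. The domain and user names in the example are placeholders.

```python
from typing import NamedTuple


class Account(NamedTuple):
    user: str
    domain: str
    form: str  # "down-level" for Domain\Username, "upn" for Username@DnsDomainName


def parse_account(text: str) -> Account:
    """Split a Windows account entry into its user and domain parts."""
    if "\\" in text:
        domain, _, user = text.partition("\\")
        form = "down-level"
    elif "@" in text:
        user, _, domain = text.rpartition("@")
        form = "upn"
    else:
        raise ValueError("expected Domain\\Username or Username@DnsDomainName")
    if not user or not domain:
        raise ValueError(f"incomplete account name: {text!r}")
    return Account(user, domain, form)


print(parse_account(r"EXAMPLE\oracleadmin"))
print(parse_account("oracleadmin@example.com"))
```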

Figure 31: Oracle Services for MSCS Security Information

4.3.8 Confirm Installation and View Release Notes

At the end of the installation, the Oracle Universal Installer displays the window shown in Figure 32. Click Installed Products to confirm that Oracle Fail Safe has been successfully installed. Click Release Information to view the Oracle Fail Safe Release Notes. Click Exit to exit the installer.

Figure 32: Oracle Universal Installer: End of Installation

4.3.9 Reboot Cluster Node

If an installer screen instructing you to reboot after the installation is complete was displayed during the installation, reboot the cluster node. A reboot is required only for the initial installation of an Oracle Fail Safe release or if you have installed Oracle Fail Safe into a new Oracle home (on a node with multiple Oracle homes).

4.3.10 Install Oracle Fail Safe on Remaining Nodes

Repeat the installation steps described in sections 4.3.1 through 4.3.9 on each additional primary and standby cluster node.

4.3.11 Verify the Primary and Standby Clusters

After Oracle Fail Safe has been successfully installed and each node of the primary and standby clusters has been rebooted (as described in section 4.3.9), open Oracle Fail Safe Manager on one of the cluster nodes by choosing the following from the Windows taskbar:

Start → Programs → Oracle - <Oracle_Home> → Oracle Fail Safe Manager

where <Oracle_Home> is the name of the Oracle home where you installed Oracle Fail Safe. Oracle Fail Safe Manager automatically opens the Add Cluster to Tree dialog box, as shown in Figure 33. (If the Add Cluster to Tree dialog box is not open, choose Add Cluster to Tree from the File menu.) In the Cluster Alias box, enter the alias for the primary cluster (FS-150 in the example) and click OK.

Figure 33: Oracle Fail Safe Manager: Add Cluster To Tree

Oracle Fail Safe Manager displays an icon for the cluster in the tree view. Right-click the cluster icon, choose Connect, and enter the Oracle Services for MSCS administrator account and password information in the resulting dialog box. Optionally, you can save this information by checking the Save as Local Preferred Credentials box, as shown in Figure 34. Click OK to connect to the cluster.

Figure 34: Oracle Fail Safe Manager: Connect to Cluster

The first time you connect to a cluster after you install Oracle Fail Safe, Oracle Fail Safe Manager prompts you to run the Verify Cluster operation to validate the MSCS cluster environment, Oracle Fail Safe installation, and network configuration. The Verify Cluster operation displays its progress in a Verifying Cluster window, as shown in Figure 35. (Later, you can run the Verify Cluster operation at any time by choosing Troubleshooting → Verify Cluster from the Oracle Fail Safe Manager menu bar. This is especially useful if you later change your cluster configuration.)

Figure 35: Oracle Fail Safe Manager: Verify Cluster

If any problems are identified during the Verify Cluster operation, correct them and repeat the Verify Cluster operation until it is successful. Refer to the Oracle Fail Safe Concepts and Administration Guide for help in troubleshooting any cluster-related problems identified in the status report.

Repeat the Add Cluster to Tree and Verify Cluster steps for the standby cluster (FS-240 in the example). After successfully connecting to and verifying each cluster, expand the Oracle Fail Safe Manager tree view to display the standalone resources on each cluster node. The tree view should now appear similar to the view shown in Figure 36 and should contain entries for the previously created primary and standby databases in the appropriate Standalone Resources folders. Note that Oracle Fail Safe performs its own independent resource discovery that does not make use of either the Oracle Intelligent Agent or Oracle Enterprise Manager.

Figure 36: Oracle Fail Safe Manager: Expanded Tree View

4.4 Configure Database Virtual Servers

At the current point in the configuration process, the primary and standby databases are each hosted by a specific cluster node and are each configured for access through a node-specific IP address. This section describes how to use Oracle Fail Safe Manager to configure each database so that it is hosted by a highly available virtual server and accessed through a node-independent virtual address.

4.4.1 Configure Database Parameter Files

Oracle Fail Safe supports either a single initialization parameter file located on the same cluster disks as the database data, log, and control files, or a separate initialization parameter file on each cluster node, provided that the path on each node is the same and that you manually propagate any relevant changes to all copies. Data Guard Manager expects to find an initialization parameter file or server parameter file in the Oracle home database directory. Because Oracle Data Guard may change the content of the Oracle9i Database server parameter file (for example, during a site switchover), the copies can fall out of synchronization if a separate server parameter file is maintained on each cluster node. The solution is to place a local initialization parameter file similar to that shown in Figure 37 in the <Oracle_Home>\database directory on each cluster node. This file must contain a single line indicating the location of the server parameter file, which, in turn, is located on the cluster disks that contain the database data, control, and log files.

Figure 37: Parameter File for Primary Database

For the primary database, the Database Configuration Assistant should already have placed the spfile<SID>.ora file on the correct cluster disk and created an init<SID>.ora file with the desired contents on the node where the database was created (where <SID> represents the SID for the primary database, testdb1 for the example). Verify that the contents of the init<SID>.ora file are similar to Figure 37, and place a copy of this file in the same <Oracle_home>\database directory location on all nodes of the primary cluster.

Oracle Data Guard created the initialization and server parameter files for the standby database in the <Oracle_home>\database directory on the node where the standby database was created. To configure these files for the standby cluster, perform the following steps:

1. Move the spfile<SID>.ora file (spfiletestdb12.ora for the example) to a directory on the cluster disk used for the database files (H:\oracle\standby\spfiletestdb12.ora for the example).

2. Rename the existing init<SID>.ora file to init<SID>.ora_old (which contains the information Oracle Data Guard used to create the spfile<SID>.ora file).

3. Create a new init<SID>.ora file (similar to that shown in Figure 37) that specifies the new location for the standby database spfile<SID>.ora file.

4. Place a copy of this new init<SID>.ora file in the same <Oracle_home>\database directory location on all nodes of the standby cluster.
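The four steps above can be sketched as a small Python script. The sketch makes three assumptions: it runs on the node where the standby database was created, the standby instance is shut down while the files are moved, and the other standby nodes' <Oracle_home>\database directories are reachable as ordinary paths (for example, through administrative shares). The function names are hypothetical. The one-line file relies on the SPFILE initialization parameter to redirect the instance to the server parameter file on the cluster disk.

```python
import shutil
from pathlib import Path
from typing import Iterable


def spfile_pointer(spfile_path: str) -> str:
    """Body of a one-line init<SID>.ora that points at a shared SPFILE."""
    return f"SPFILE='{spfile_path}'\n"


def relocate_spfile(db_dir: Path, sid: str, cluster_dir: Path) -> Path:
    """Steps 1-3: run on the node where the standby database was created."""
    spfile = db_dir / f"spfile{sid}.ora"
    init = db_dir / f"init{sid}.ora"
    target = cluster_dir / spfile.name

    shutil.move(str(spfile), str(target))         # 1. SPFILE onto the cluster disk
    init.rename(db_dir / f"init{sid}.ora_old")    # 2. keep the original PFILE
    init.write_text(spfile_pointer(str(target)))  # 3. new one-line PFILE
    return init


def distribute_pfile(init: Path, node_db_dirs: Iterable[Path]) -> None:
    """Step 4: copy the new PFILE to the same directory on every other node."""
    for db_dir in node_db_dirs:
        shutil.copy2(init, Path(db_dir) / init.name)


# Hypothetical usage for the example standby (SID testdb12); the Oracle home
# path and the second node name FS-242 are assumptions, not from the example:
# init = relocate_spfile(Path(r"C:\oracle\ora92\database"), "testdb12",
#                        Path(r"H:\oracle\standby"))
# distribute_pfile(init, [Path(r"\\FS-242\C$\oracle\ora92\database")])
```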

After the database parameter files have been configured on the primary and standby clusters, record the locations of the initialization parameter files for the primary and standby databases; this information will be used during the database verification operation described in section 4.4.3.

4.4.2 Create Virtual Servers for Primary and Standby Databases

In the Microsoft Cluster Service environment, a virtual server is a group of resources that contains at least one virtual address (a network name resource and its associated IP address resource). The Oracle Fail Safe Create Group wizard collects the information needed to create an empty group and then optionally allows you to add one virtual address to the group (the Add Resource to Group wizard allows you to add additional virtual addresses after the group is created). As previously noted (during the cluster network configuration and validation steps described in section 3.2), the network name that will be used for the primary database virtual server is FS-153 and the network name that will be used for the standby database virtual server is FS-245.
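The definition of a virtual server given above (a group that holds at least one network name resource together with its IP address resource) can be modelled as a simple check. In this hypothetical Python sketch, the resource type strings follow the MSCS names IP Address, Network Name, and Physical Disk. Counting virtual addresses as matched pairs is a simplification: it does not verify which IP address each network name actually depends on. The resource name "FS-153 IP" is illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    name: str
    kind: str  # MSCS resource type, e.g. "IP Address", "Network Name", "Physical Disk"


@dataclass
class Group:
    name: str
    resources: List[Resource] = field(default_factory=list)

    def virtual_address_count(self) -> int:
        """Approximate virtual addresses as (Network Name, IP Address) pairs."""
        kinds = [r.kind for r in self.resources]
        return min(kinds.count("Network Name"), kinds.count("IP Address"))

    def is_virtual_server(self) -> bool:
        return self.virtual_address_count() >= 1


# The group produced by the Create Group wizard starts empty...
primary = Group("FS-153")
print(primary.is_virtual_server())  # False
# ...and becomes a virtual server once one virtual address is added.
primary.resources += [Resource("FS-153 IP", "IP Address"), Resource("FS-153", "Network Name")]
print(primary.is_virtual_server())  # True
```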

To create the virtual server for the primary database, click the tree view to select the node that currently hosts the primary database (FS-151 in the example). Then, from the Oracle Fail Safe Manager menu, select Groups → Create to open the Create Group Wizard. In the initial window, enter the group name and, optionally, a description. For the example, as shown in Figure 38, the group is given the same name as the virtual server network name to make identification easier.

Figure 38: Create Group Wizard - Group Name

In the Failback Policies window, accept the default Prevent Failback option, as shown in Figure 39.

Figure 39: Create Group Wizard - Failback Policies

In general, for active/passive cluster configurations, preventing failback minimizes the number of failovers associated with unplanned outages. By contrast, for clusters with active/active configurations (each cluster node actively hosts database or application workloads), it usually is best to associate workloads (groups) with preferred nodes and to enable failback so that the overall cluster workload is automatically rebalanced when a previously failed node is restored to service.
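The rule of thumb in the preceding paragraph amounts to a two-way decision, which this Python sketch encodes. The `Topology` enum and `failback_policy` function are illustrative names. The returned labels are assumed to mirror the MSCS Prevent Failback and Allow Failback group settings.

```python
from enum import Enum
from typing import NamedTuple


class Topology(Enum):
    ACTIVE_PASSIVE = "active/passive"  # one node hosts the workload, the other waits
    ACTIVE_ACTIVE = "active/active"    # every node hosts database or application work


class FailbackPolicy(NamedTuple):
    use_preferred_nodes: bool
    failback: str


def failback_policy(topology: Topology) -> FailbackPolicy:
    if topology is Topology.ACTIVE_PASSIVE:
        # Leaving the group where it failed over avoids a second group move.
        return FailbackPolicy(use_preferred_nodes=False, failback="Prevent Failback")
    # Moving groups back to their preferred nodes rebalances the cluster workload.
    return FailbackPolicy(use_preferred_nodes=True, failback="Allow Failback")


print(failback_policy(Topology.ACTIVE_PASSIVE).failback)  # Prevent Failback
```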

After specifying the appropriate failback policy for your configuration, click Finish. Review the summary screen, and if the information is correct, click OK to create the group. After the group is created, you are prompted to add a virtual address to the group. Click Yes to open the Add Resource to Group Wizard Virtual Address window shown in Figure 40.

Figure 40: Add Resource to Group Wizard - Virtual Address

Select the previously created Public network and enter the network name associated with the virtual address in the Host Name field (FS-153 in the example). When you click the IP Address field, Oracle Fail Safe should automatically fill in the value associated with the previously entered network name. Click Finish to continue, and then click OK to accept the information shown in the summary screen that follows.

Repeat this process on the standby cluster to create a virtual server for the standby database (for the example, the standby virtual server is called FS-245 and uses the previously identified network name FS-245). After each group has been created and populated with its associated virtual address resources, expand the Oracle Fail Safe Manager tree view to verify that the contents of the newly created primary and standby virtual server groups are similar to those shown in Figure 41.

Figure 41: Tree View Showing Database Virtual Servers

4.4.3 Execute Verify Standalone Database Command

Before configuring the primary and standby databases, first run the Oracle Fail Safe Manager Verify Standalone Database command on each database. This command verifies that each database and its associated network configuration files are correctly configured for use with Oracle Fail Safe.

To execute this command, select the icon for the database in the tree view and choose Troubleshooting → Verify Standalone Database from the Oracle Fail Safe Manager menu. Enter the requested information in the dialog box, as shown in Figure 42. Note that although the Service Name and Instance Name values for the primary and standby databases will differ, the Database Name value is the same for both databases (testdb1, for the example). If, as in the example, Use Operating System Authentication is selected, Oracle Fail Safe will automatically make any changes necessary to enable operating system authentication (if it is not already enabled). Unless you have specific reasons to the contrary, allow Oracle Fail Safe to make any recommended configuration changes when prompted. During the verification operation, Oracle Fail Safe displays a status report similar to that shown in Figure 43. Resolve any reported problems and repeat the verification process until it is successful for each database. Because a copy of the initialization parameter file has been placed on each cluster node, the FS-10288 warning message that appears in the status report can be ignored.
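The identity rules noted above (the Database Name is shared, while the Service Name and Instance Name differ) can be checked mechanically before running the wizards. A minimal Python sketch follows. The class and function are hypothetical. The standby service name testdb12.us.oracle.com is an assumption modelled on the primary's. The checks encode the example's naming convention, not a product requirement. Names are compared case-insensitively, on the assumption that the database tools treat them that way.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatabaseIdentity:
    service_name: str
    instance_name: str
    database_name: str


def _same(a: str, b: str) -> bool:
    return a.casefold() == b.casefold()


def identity_problems(primary: DatabaseIdentity, standby: DatabaseIdentity) -> List[str]:
    """List departures from the example's primary/standby naming rules."""
    problems = []
    if not _same(primary.database_name, standby.database_name):
        problems.append("Database Name should be identical for primary and standby")
    if _same(primary.instance_name, standby.instance_name):
        problems.append("Instance Name is expected to differ")
    if _same(primary.service_name, standby.service_name):
        problems.append("Service Name is expected to differ")
    return problems


primary = DatabaseIdentity("testdb1.us.oracle.com", "testdb1", "testdb1")
standby = DatabaseIdentity("testdb12.us.oracle.com", "testdb12", "testdb1")  # assumed service name
print(identity_problems(primary, standby))  # []
```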

Figure 42: Verify Standalone Database Dialog

Figure 43: Verify Standalone Database Status Report

4.4.4 Add Each Database to its Associated Virtual Server

The Oracle Fail Safe Manager Add Resource to Group Wizard automates the process of adding the resources associated with each database to their respective virtual servers. For the example configuration, the wizard is used twice for each virtual server: once to configure the database resource and then once more to add an Oracle Intelligent Agent resource.

4.4.4.1 Primary Database Virtual Server

In the tree view, right-click the primary database and choose Add to Group (as shown in Figure 44) to open the Add Resource to Group Wizard.

Figure 44: Add Primary Database to Group

In the first window, as shown in Figure 45, choose Oracle Database as the resource type (this is selected by default) and select the name of the previously created group for the primary database virtual server (FS-153 for the example configuration). Click Next to continue.

Figure 45: Add Resource to Group Wizard – Resource Type

In the next window, shown in Figure 46, enter the requested database identity information (the same information previously entered during the Verify Standalone Database operation) and click Next to continue.

Figure 46: Add Resource to Group Wizard – Database Identity

Oracle Fail Safe Manager displays an informational message similar to that shown in Figure 47 to alert you that the specified initialization parameter file is not located on a cluster disk. Click OK to acknowledge the message. The initialization parameter file for the database was configured previously (in section 4.4.1) to ensure that the database can fail over between nodes.

Figure 47: Oracle Fail Safe Manager – Initialization Parameter File Location

The wizard automatically detects the existence of the database password file on the node where the database was created. Click Yes to have Oracle Fail Safe create a password file for the database on each cluster node, and enter the requested password information for the SYS account as shown in Figure 48. Then click Finish to continue.

Figure 48: Add Resource to Group Wizard – Database Password

Review the summary information displayed by the wizard (as shown in Figure 49), and if all information is correct, click OK to begin adding the database to the group.

Figure 49: Add Resource to Group Wizard – Summary

Before starting the operation, Oracle Fail Safe displays an informational message, shown in Figure 50, to alert the administrator that the database will be taken offline. Click Yes to acknowledge the message and continue.

Figure 50: Confirm Add Database to Group

During the clusterwide configuration process, Oracle Fail Safe automatically detects all cluster disks used by the database and adds them to the group. Oracle Fail Safe also creates and adds a resource for the listener service and updates the Oracle Net files on all cluster nodes with the new virtual server information. In addition, Oracle Fail Safe creates and registers with the cluster software all appropriate dependencies between resources in the group.
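Resource dependencies determine the order in which MSCS brings a group online (dependencies first) and takes it offline (dependents first). The Python sketch below computes both orders with the standard-library `graphlib` module (Python 3.9 or later). The dependency map is an illustrative assumption, not the exact set that Oracle Fail Safe registers. Of these edges, only the requirement that a network name depend on an IP address is imposed by MSCS itself. Check the group in Oracle Fail Safe Manager or Cluster Administrator for the real dependencies.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map for the primary group FS-153:
# each resource maps to the resources it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "IP Address": set(),
    "Network Name": {"IP Address"},
    "Cluster Disk": set(),
    "Listener": {"Network Name", "IP Address"},
    "Database": {"Listener", "Cluster Disk"},
}


def online_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Dependencies come first; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


def offline_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Dependents are stopped before the resources they rely on."""
    return list(reversed(online_order(deps)))


print(online_order(DEPENDENCIES))
print(offline_order(DEPENDENCIES))
```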

The ongoing status of the clusterwide operation is displayed in a status report that optionally can be saved to disk for future reference or printing, as shown in Figure 51. If any errors are encountered during the configuration process, they are reported in the status report and the configuration operation is rolled back. As before, you can safely ignore the FS-10288 parameter file location warning. After reviewing the status report, click Close to continue.

Figure 51: Add Database to Group Status Report

To allow the virtual server to be discovered by Oracle Enterprise Manager and to allow Data Guard Manager to create and manage the database, an Oracle Intelligent Agent must be added to the group. To perform this task, right-click the database virtual server group in the Oracle Fail Safe Manager tree view and choose Add Resource to Group, as shown in Figure 52.

Figure 52: Open the Add Resource to Group Wizard

From the initial window, choose Oracle Intelligent Agent as the resource type and the database virtual server group as the group (the group should be selected by default), as shown in Figure 53. Click Next to continue, and click OK to acknowledge the message informing you that the Oracle Intelligent Agent must be installed on each cluster node before an agent can be added to a group.

Figure 53: Add Resource to Group Wizard – Resource Type

In the next window, as shown in Figure 54, specify a cluster disk that the agent can use to store information. For the example, choose the disk used for the database files. The agent is installed as part of Oracle9i Database Enterprise Edition in the same Oracle home as the database, so choose the database Oracle home as the Oracle home for the agent. After entering the information, click Finish to display the summary screen shown in Figure 55. If the information is correct, click OK to continue.

Figure 54: Add Resource to Group Wizard – Intelligent Agent Information

Figure 55: Add Resource to Group Wizard - Summary

The ongoing status of the clusterwide operation is displayed in a status report that optionally can be saved to disk for future reference or printing, as shown in Figure 56. If any errors are encountered during the configuration process, they are reported in the status report and the configuration operation is rolled back. After reviewing the status report, click Close to continue.

Figure 56: Add Intelligent Agent to Group Status Report

4.4.4.2 Standby Database Virtual Server

Repeat the same process to configure the standby database. After the standby database configuration process is complete, expand the Oracle Fail Safe Manager tree view to verify that the contents of the primary and standby virtual server groups (in the example, FS-153 and FS-245, respectively) are similar to those shown in Figure 57. Note that the group for each database contains all the cluster resources associated with that database. Virtual server groups used for production deployments may contain additional resources (for example, additional disks associated with the database or Oracle Intelligent Agent, or additional IP address and network name resources if multiple virtual addresses are configured for use with the database).

Figure 57: Tree View Showing Primary and Standby Virtual Servers

4.5 Create Final Highly Available Primary/Standby Configuration

The final configuration task is to reestablish the Data Guard configuration for the now highly available primary and standby databases.

4.5.1 Discover Virtual Servers

To begin the process, start the Oracle Management Server and open the Oracle Enterprise Manager Console (as described in sections 4.2.2 and 4.2.4). Then, from the Oracle Enterprise Manager Console menu bar, choose Navigator → Discover Nodes to open the Oracle Enterprise Manager Discovery Wizard. Follow the directions in the wizard to discover the primary and standby virtual servers (FS-153 and FS-245, for the example configuration). After discovery completes, the Oracle Enterprise Manager tree view should be similar to that shown in Figure 58.

Note that the way the primary and standby databases and virtual servers are named in the tree view may differ because of slight differences in the way the various wizards updated the database and network configuration information on each cluster (the virtual servers shown in Figure 58, for example, are named fs-153 and fs-245.us.oracle.com). This does not affect the primary/standby configuration process. Also, because not all Oracle Intelligent Agent releases are fully cluster aware, you may encounter errors if you attempt to discover both virtual servers and individual cluster nodes. For the example, because the default (node-specific) Oracle Intelligent Agent was disabled (refer to section 4.2.12), it is only possible to discover the primary and standby virtual servers.

Figure 58: Tree View Showing Discovered Virtual Server

4.5.2 Create Highly Available Primary/Standby Configuration

In the same way as described in section 4.2.7, open the Data Guard Manager Create Configuration Wizard. Click Next to proceed past the initial Welcome screen, enter the name you want to use for the configuration (testdb1_config, for the example) as shown in Figure 59, and click Next.

Figure 59: Create Configuration Wizard – Configuration Name

In the Primary Database window, choose the primary database (testdb1.us.oracle.com in the example), specify a site name (fs-153_site in the example), and click Next, as shown in Figure 60.

Figure 60: Create Configuration Wizard – Primary Database

In the Connect to Primary Database window, enter the primary database account information (SYS account in the example) and click Next, as shown in Figure 61.

Figure 61: Create Configuration Wizard – Connect to Primary Database

In the Standby Creation Method window, choose Add an existing standby database, as shown in Figure 62, and click Next to continue.

Figure 62: Create Configuration Wizard – Standby Creation Method

In the Add Existing Standby Database window, choose the standby database and site name, as shown in Figure 63 (testdb12 and FS-245_site, respectively, for the example configuration). Click Next to continue.

Figure 63: Create Configuration Wizard – Add Existing Standby Database

In the Connect to Standby Database window, as shown in Figure 64, enter the connection information for the standby database (SYS account for the example) and click Next to continue.

Figure 64: Create Configuration Wizard – Connect to Standby Database

Data Guard Manager should detect that the standby database was previously configured as a standby site and display the informational message shown in Figure 65. Click Yes to acknowledge the message and proceed with creating the new Oracle Data Guard configuration.

Figure 65: Data Guard Manager – Informational Message

Review the information in the Summary window (shown in Figure 66) and click Finish to begin creating the configuration.

Figure 66: Create Configuration Wizard – Summary

Data Guard Manager displays the configuration progress in a window similar to that shown in Figure 67. Click Close to close the window after the configuration processing is complete.

Figure 67: Create Oracle Data Guard Configuration Progress Report

4.5.3 Verify Highly Available Primary/Standby Configuration

Click the new configuration in the Data Guard Manager tree view to open the configuration connection information dialog box shown in Figure 68. Enter the requested account information (SYS for the example) and click OK to connect to the configuration.

Figure 68: Connect to Oracle Data Guard Configuration

After connecting to the configuration, expand the tree view and select the configuration to view the status of each component, as shown in Figure 69. If all components are functioning correctly, you have successfully configured the example disaster-tolerant high availability solution.

Figure 69: Final Highly Available Oracle Data Guard Configuration

Optionally, based on your business requirements, you can perform additional steps to customize the example configuration. For example, if you plan to use maximum protection mode, use the Data Guard Manager Standby Redo Log Assistant to create the required standby redo log files for each database. To facilitate future management and administration operations, you also can update the Oracle Enterprise Manager preferred credentials for the primary and standby databases and virtual servers. For the primary and standby databases, choose Configuration → Preferences → Preferred Credentials from the Oracle Enterprise Manager menu, select each database from the list of targets, and enter the account information for the corresponding SYS database user account. Similarly, for the primary and standby virtual servers, specify as preferred credentials a domain account with administrator privileges on each physical cluster node that potentially can host the virtual server (for example, specify the account used during Oracle Fail Safe installation in section 4.3.7).
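The standby redo log requirement mentioned above is one of several prerequisites for the higher protection modes. Under the Oracle9i Release 2 rules as summarized here, both maximum protection and maximum availability require a redo transport destination that uses the log writer process (LGWR) with SYNC and AFFIRM, plus standby redo log files on the standby database. Verify these rules against the Oracle9i Data Guard documentation before relying on them. The Python sketch below is a hypothetical pre-flight check, not a Data Guard Manager feature. Its mode strings follow the ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {PROTECTION | AVAILABILITY | PERFORMANCE} syntax.

```python
from dataclasses import dataclass
from typing import List

# Modes that rely on synchronous redo transport (Oracle9i Release 2 names).
SYNCHRONOUS_MODES = {"PROTECTION", "AVAILABILITY"}
ALL_MODES = SYNCHRONOUS_MODES | {"PERFORMANCE"}


@dataclass(frozen=True)
class RedoDestination:
    process: str  # "LGWR" or "ARCH"
    network: str  # "SYNC" or "ASYNC"
    affirm: bool  # True for AFFIRM, False for NOAFFIRM


def missing_prerequisites(mode: str, dest: RedoDestination,
                          has_standby_redo_logs: bool) -> List[str]:
    """Hypothetical check to run before switching to MAXIMIZE <mode>."""
    mode = mode.upper()
    if mode not in ALL_MODES:
        raise ValueError(f"unknown protection mode: {mode}")
    missing: List[str] = []
    if mode in SYNCHRONOUS_MODES:
        if dest.process != "LGWR":
            missing.append("redo must be shipped by LGWR")
        if dest.network != "SYNC":
            missing.append("destination must use SYNC")
        if not dest.affirm:
            missing.append("destination must use AFFIRM")
        if not has_standby_redo_logs:
            missing.append("create standby redo log files on the standby")
    return missing


print(missing_prerequisites("protection", RedoDestination("ARCH", "ASYNC", False), False))
```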

5 OTHER CONFIGURATIONS

As noted in the introduction, Oracle Fail Safe and Oracle Data Guard can be combined in multiple ways to provide a range of disaster-tolerant high availability solutions. This section compares the example configuration described in this paper with three alternative disaster-tolerant high availability configurations. When designing any disaster-tolerant high availability solution, it is important to understand the trade-offs among the possible configuration options, particularly with respect to the risks of data loss or interruption of service. Refer to the Oracle Fail Safe and Oracle Data Guard documentation listed in section 7.1 for complete information.

5.1 Two Active/Passive Clusters

This is the configuration described earlier in the paper. It increases the availability of a typical Oracle Data Guard primary/standby configuration by replacing each standalone system with an active/passive cluster and by using Oracle Fail Safe to configure the primary and standby databases so that they can fail over between cluster nodes, as shown in Figure 70.

Figure 70: Two Active/Passive Clusters

5.1.1 Benefits

• The geographic separation between clusters enhances disaster tolerance

• Most planned and unplanned outages are handled efficiently through instance failover from one cluster node to another

• The supported rolling upgrade scenarios of hardware and some software (described in section A.2) require only a single MSCS failover between cluster nodes (two MSCS failovers are required for active/active clusters)

• An Oracle Data Guard site failover or switchover to a standby location is required only if all primary cluster nodes are incapacitated

5.1.2 Trade-offs

• The distance between the primary and standby clusters makes asynchronous redo shipping the best solution, but asynchronous shipping introduces a risk of data loss and data divergence between the primary and standby databases

• Passive cluster nodes (nodes B and D in Figure 70, for example) typically perform no useful work during normal operations

5.2 Single Active/Active Cluster

Similarly, as shown in Figure 71, for the price of an additional disk array (all required software is already licensed on all cluster nodes), you can add a basic level of disaster tolerance to a typical active/passive Oracle Fail Safe configuration by creating a second database and using Oracle Fail Safe and Oracle Data Guard to create an active/active cluster configuration with one node hosting the primary database virtual server and the other node hosting the secondary database virtual server. Because both databases are located on the same cluster (but on different disk arrays to ensure data protection), synchronous redo shipping can be used with minimal performance impact to keep both copies of the data identical with no risk of data loss if the primary database fails.

Figure 71: Single Active/Active Cluster

5.2.1 Benefits

• This configuration provides an inexpensive way to enhance an Oracle Fail Safe deployment to protect against media failure and data corruption (just add a second disk array)

• There is no risk of data loss or data divergence between the primary and standby databases (when configured using the Oracle Data Guard maximum availability mode)

• An Oracle Data Guard site failover or switchover to a standby location is required only if all primary cluster nodes are incapacitated

5.2.2 Trade-offs

• When the configuration uses maximum protection mode, the primary database shuts down if network access to the standby database is interrupted

• Instance failover times from one cluster node to another can be slower than for active/passive clusters because not all resources on the failover node are available to Microsoft Cluster Service to process the failover

• Because redo is applied to the standby database without a time delay, there is no protection from human error or other sources of data corruption

• The supported rolling upgrade scenarios for hardware and some software (described in section 6.3) require two MSCS failovers between cluster nodes (only one MSCS failover is required for active/passive clusters)

• Disaster tolerance is limited to protection from media failure (two copies of the data) and to a basic level of protection from local area disasters (like floods or fires) largely based on the degree of geographic separation between the cluster nodes and disk arrays. Because the maximum separation between components in an MSCS fibre channel cluster is currently on the order of 7-10 kilometers, this configuration does not protect against wide area disasters (like hurricanes or earthquakes).

5.3 Two Active/Active Clusters

In Figure 72, a combination of synchronous and asynchronous redo shipping is used with multiple standby databases on local and remote active/active clusters. The standby database located on the same cluster as the primary database can be kept current with the primary database through synchronous redo shipping, while each of the standby databases on the remote cluster can be updated at different time delays through asynchronous redo shipping. In addition, the standby databases can be used to offload reporting and backup operations from the primary database so that more resources are available to support end users. This is a more complex solution, but provides the best protection from data loss, data corruption, and site disasters from among the solutions presented in this section.
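As a rough illustration of the redo transport mix in Figure 72, the following Python sketch renders LOG_ARCHIVE_DEST_n initialization parameter values. It produces one synchronous destination (LGWR SYNC AFFIRM) for Standby Database 1 and two asynchronous destinations with different DELAY values, in minutes, for Standby Databases 2 and 3. The service names stby1, stby2, and stby3, the delay values, and the destination numbering are assumptions for illustration only. Check the attribute syntax against the Oracle9i Data Guard documentation for your release before you use any generated values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyDest:
    service: str            # Oracle Net service name (hypothetical)
    synchronous: bool       # True: LGWR SYNC AFFIRM; False: LGWR ASYNC NOAFFIRM
    delay_minutes: int = 0  # apply delay on the standby; 0 omits the DELAY attribute


def archive_dest_value(dest: StandbyDest) -> str:
    """Return the attribute string for one LOG_ARCHIVE_DEST_n parameter."""
    parts = [f"SERVICE={dest.service}", "LGWR"]
    if dest.synchronous:
        parts += ["SYNC", "AFFIRM"]
    else:
        parts += ["ASYNC", "NOAFFIRM"]
    if dest.delay_minutes > 0:
        parts.append(f"DELAY={dest.delay_minutes}")
    return " ".join(parts)


def init_ora_lines(dests: List[StandbyDest], first_n: int = 2) -> List[str]:
    """Render init.ora lines; numbering starts at first_n (destination 1 is
    assumed to be the local archive destination)."""
    return [
        f"LOG_ARCHIVE_DEST_{n}='{archive_dest_value(d)}'"
        for n, d in enumerate(dests, start=first_n)
    ]


if __name__ == "__main__":
    figure_72 = [
        StandbyDest("stby1", synchronous=True),                      # same cluster
        StandbyDest("stby2", synchronous=False, delay_minutes=60),   # remote, delayed
        StandbyDest("stby3", synchronous=False, delay_minutes=240),  # remote, delayed more
    ]
    for line in init_ora_lines(figure_72):
        print(line)
```

Giving each remote standby a different delay means that a logical corruption on the primary reaches the delayed copies at different times. This leaves a window in which at least one copy predates the error.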

Figure 72: Two Active/Active Clusters

5.3.1 Benefits

• Efficiently offloads reporting and backup operations from the primary database. For example, Standby Database 1 could be configured as a logical standby database available at all times for read-only reporting and Standby Databases 2 and 3 could be configured as physical standby databases also available for periodic database backups and occasional additional reporting

• Makes full use of all cluster nodes and maintains multiple copies of data at different locations for enhanced disaster protection

• Combination of synchronous and asynchronous redo transport ensures that one copy of the data (Standby Database 1) is always fully synchronized with the primary database, while the remaining remote standby databases (Standby Database 2 and Standby Database 3) are maintained at different time delays to protect against data corruptions

• Oracle Data Guard site failover or switchover to a standby location is required only if all primary cluster nodes are incapacitated

5.3.2 Trade-offs

• Is more complex to configure (four active databases)

• There is a risk of data loss or divergence if both nodes of the primary cluster fail

• Instance failover times from one cluster node to another can be slower than for active/passive clusters because not all resources on the failover node are available to Microsoft Cluster Service to process the failover

• The supported rolling upgrade scenarios for hardware and some software (described in section 6.3) require two MSCS failovers between cluster nodes (only one MSCS failover is required for active/passive clusters)

5.4 Multiple Primary Locations and Single Standby Location

Figure 73 shows a configuration in which Oracle Data Guard, Oracle Fail Safe, and Real Application Clusters provide complementary features that together help you to implement cost-effective high availability, disaster protection, and scalability. Each primary location uses a two-node active/active MSCS cluster configured with Oracle Fail Safe. Each node is configured as the preferred node for a primary database virtual server. For each primary cluster, if a cluster node fails, the surviving node will host both of the virtual servers for the primary databases configured on that cluster.

During normal operations, all MSCS cluster nodes actively service clients. To amortize the hardware and management costs associated with multiple standby databases, a single shared Oracle Real Application Clusters standby location hosts all the standby databases. Oracle Real Application Clusters allows the standby cluster to scale easily (by adding more nodes and disks) if additional standby databases are later added to the standby location. The single standby location also can be used to consolidate corporate reporting and database backup operations. The number and distribution across the cluster nodes of the Oracle Real Application Clusters database instances associated with each standby database can be adjusted based on availability requirements and on the reporting and backup workloads associated with the database (for example, in Figure 73, Instances 1a and 1b are configured for Standby Database 1 and Instances 4a, 4b, and 4c are configured for Standby Database 4, which has a larger reporting requirement).

Figure 73: Multiple Primary Locations and Single Standby Location

5.4.1 Benefits

• All nodes for each primary cluster perform useful work

• A single scalable standby location simplifies disaster recovery planning and also consolidates reporting and database backup operations

• The return on hardware and software investment is optimized

• Oracle Data Guard site failover or switchover to a standby location is required only if all primary cluster nodes are incapacitated

5.4.2 Trade-offs

• Adds an additional level of complexity (configuring an Oracle Real Application Clusters standby location currently requires manual Oracle Data Guard configuration steps and also requires use of cluster hardware certified for use with Oracle Real Application Clusters)

• Instance failover times from one primary cluster node to another can be slower than for active/passive clusters because not all resources on the failover node are available to Microsoft Cluster Service to process the failover

• The supported rolling upgrade scenarios for hardware and some software (described in section 6.3) require two MSCS failovers between cluster nodes (only one MSCS failover is required for active/passive clusters)

6 MAINTENANCE AND ADMINISTRATION EXAMPLES

Features from both Oracle Fail Safe and Oracle Data Guard can affect the way some maintenance and administrative operations are performed. For example, because Microsoft Cluster Service monitors both the primary and standby databases for high availability, these databases are automatically restarted if they are stopped using a normal database shutdown command.

In general, before you use Data Guard Manager or any other administrative tool for an operation that could affect access to a primary or standby database, first use Oracle Fail Safe Manager to protect the database. Do the same before any operation for which you want to prevent an automatic restart or failover. Either temporarily disable Is Alive polling for the database, or take the database offline (which also stops Is Alive polling).

Similarly, before you use Oracle Fail Safe Manager for an operation that could affect access to a primary or standby database, first use Data Guard Manager to temporarily disable Oracle Data Guard monitoring for that database. This applies to operations such as cold database backups. It also applies to administrative operations performed while users continue to access the database, and to any operation that could affect query response times during the periodic Is Alive polling of the database by MSCS.
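The coordination rule above can be sketched as a Python context manager: suspend one tool's monitoring before you operate with the other tool, then restore it. The disable and enable callables are hypothetical placeholders for the manual Oracle Fail Safe Manager or Data Guard Manager steps. Neither product is assumed to expose such an API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def monitoring_suspended(disable: Callable[[], None],
                         enable: Callable[[], None]) -> Iterator[None]:
    """Suspend the other tool's monitoring for the duration of an operation.

    disable/enable are hypothetical stand-ins for manual steps such as
    disabling Is Alive polling or choosing Disable in Data Guard Manager.
    """
    disable()
    try:
        yield
    finally:
        # Restore monitoring even if the maintenance operation fails part-way.
        enable()


if __name__ == "__main__":
    log: List[str] = []
    with monitoring_suspended(lambda: log.append("disable Is Alive polling"),
                              lambda: log.append("re-enable Is Alive polling")):
        log.append("perform Data Guard Manager operation")
    print(log)
```

The sketch restores monitoring even if the operation fails, so monitoring is never left disabled by oversight. An administrator may instead prefer to keep monitoring off after a failure until the cause is understood.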

The topics in this section provide step-by-step instructions for five typical maintenance and administrative operations:

• Performing an Oracle Fail Safe (MSCS) failover

• Changing the database SYS account password

• Performing rolling upgrades

• Performing an Oracle Data Guard site failover or switchover

• Performing database backups

To minimize the risk of unanticipated downtime, Oracle Corporation recommends that you first rehearse planned administrative or maintenance operations on identically configured test systems before performing the operations on business-critical production systems.

6.1 Performing an Oracle Fail Safe (MSCS) Failover

An Oracle Fail Safe (MSCS) failover between cluster nodes can be unplanned or planned. An unplanned failover might result from an unexpected component failure. A planned failover might occur while you perform a rolling upgrade of hardware or software, or when you rebalance workloads across the cluster nodes. MSCS handles unplanned failovers automatically, based on the user-specified failover policy for each group. The administrator initiates planned failovers manually in Oracle Fail Safe Manager: select a group in the tree view and choose Group -> Move to a Different Node.

Because clients access databases and other resources configured with Oracle Fail Safe through a fixed (node-independent) virtual server address, failovers usually appear to clients as a brief interruption in service, much like an instant node reboot. Any uncommitted work is lost (rolled back). Unless Oracle9i Database features such as transparent application failover (TAF) are in use, the client must also reconnect to the database after the instance restarts on the new node. With TAF, reconnection and resumption of interrupted SELECT statement execution are automatic. For more information about using TAF with databases configured with Oracle Fail Safe, refer to the white paper Cluster-Aware ODBC and OCI Client Applications for Oracle Fail Safe Solutions (available through the Oracle Technology Network at http://otn.oracle.com/tech/windows/failsafe/).
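For clients that do not use TAF, the reconnect described above is the application's responsibility. Below is a minimal Python sketch of a retry loop. It assumes a hypothetical connect function that opens a session through the virtual server address and raises ConnectionError while the instance is still restarting on the surviving node. Any transaction that was in progress must still be reissued by the application, because its uncommitted work was rolled back.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       attempts: int = 10,
                       delay_seconds: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds or the attempts are used up.

    connect is a hypothetical function (not a specific Oracle client API)
    that raises ConnectionError while the database is not yet reachable.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds)  # give MSCS time to bring the group online
    raise ConnectionError(f"no connection after {attempts} attempts") from last_error
```

The sleep parameter is injectable so that the loop can be exercised without real delays. A fixed delay is used for simplicity; a growing delay is an equally reasonable choice.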

From a management perspective, Data Guard Manager is currently not “cluster-aware”, and several issues related to MSCS failovers require special attention. Depending on the specific circumstances associated with the failover (planned, unplanned, and so forth), one or more of the following may apply:

• Any time you plan to perform operations with Oracle Fail Safe Manager that could affect access to a primary or standby database (such as a planned failover), you should first use Data Guard Manager to temporarily disable Oracle Data Guard monitoring for the database.

• Any time you complete operations with Oracle Fail Safe Manager and have previously used Data Guard Manager to disable Oracle Data Guard monitoring for a database, you should use Data Guard Manager to reenable Oracle Data Guard monitoring for the database.

• Data Guard Manager uniquely identifies each database in a configuration by the combination of the database name and the name of the physical node hosting the database at the time it was added to the configuration. Data Guard Manager stores this information in its configuration files and does not update the information after the configuration is created. This means that, for databases also configured with Oracle Fail Safe, Data Guard Manager can only access a database when the same physical node that hosted the virtual server when it was initially added to the configuration hosts the virtual server for that database. Note that this does not affect the day-to-day operation of the primary/standby configuration (redo shipping and redo application are all based on the virtual server), but that it does affect how Data Guard Manager can be used to view and manage the configuration. Following is a partial list of known limitations:

o To connect to a primary or standby site using Data Guard Manager after a virtual server failover has occurred, currently you must choose from between the following two options:

- Use the Oracle Fail Safe Manager Move to a Different Node command to move the virtual server group for the database to the physical node associated with that database. That node is the one shown in the Connect Through field of the Data Guard Configuration Connect Information screen. For example, the Connect Through field shown in Figure 68 associates the primary database testdb1 with physical node FS-151. Reconnect to the configuration using Data Guard Manager, and review the status of the primary and standby sites and databases. Reenable any sites or databases that Data Guard Manager disabled as a result of the failover. If the physical node associated with the database in the Data Guard Manager configuration files is not available to host the virtual server, then use the next option.

- Delete the existing configuration (following the steps described in section 4.2.11), and then re-create the configuration (following the steps described in section 2/%/3). This updates the Data Guard Manager configuration files with the current physical host information for the primary and standby databases.

o Oracle Enterprise Manager cannot discover any Data Guard configurations that are also configured using Oracle Fail Safe. Because of this, to manage the configuration with Data Guard Manager, always connect to the same Oracle Enterprise Manager repository that was used when the Data Guard configuration was initially created.

o Data Guard Manager operations that assume that Oracle Enterprise Manager has discovered the Data Guard configuration or the physical cluster nodes are not fully functional. For example,

- It is not possible to submit event tests from Data Guard Manager.

- It is not possible to view or monitor Data Guard log files from within Data Guard Manager.

- Some steps in the Data Guard Manager Verify configuration operation will not succeed.
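The identification rule behind these limitations can be modeled in a few lines of Python. This is a hypothetical model of the behavior described above, not Data Guard Manager code. Node FS-151 comes from the example configuration; FS-152 is an assumed name for the second cluster node.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class MemberKey:
    db_name: str
    physical_node: str  # node hosting the virtual server when the database was added


class DataGuardManagerModel:
    """Hypothetical model: members are recorded once and never updated."""

    def __init__(self) -> None:
        self._members: Set[MemberKey] = set()

    def add_database(self, db_name: str, hosting_node: str) -> None:
        # The physical node is captured at add time only.
        self._members.add(MemberKey(db_name, hosting_node))

    def can_manage(self, db_name: str, hosting_node: str) -> bool:
        # Management access works only while the virtual server is hosted
        # by the node recorded when the database was added.
        return MemberKey(db_name, hosting_node) in self._members


if __name__ == "__main__":
    dgm = DataGuardManagerModel()
    dgm.add_database("testdb1", "FS-151")
    print(dgm.can_manage("testdb1", "FS-151"))  # True: virtual server on original node
    print(dgm.can_manage("testdb1", "FS-152"))  # False: after an MSCS failover
```

In this model, the two workarounds listed above correspond to calling can_manage with the original node again (by moving the group back) or to rebuilding the member set (by deleting and re-creating the configuration).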

6.2 Changing the SYS Database Account Password

If you did not enable operating system authentication for the primary and standby databases (refer to section 4.4.3) when configuring these databases, Oracle Fail Safe uses stored SYS account password information. This is the SYS database account password you provided when each database was added to its respective virtual server (refer to section 4.4.4). Oracle Fail Safe uses it to connect to the database during management operations and for Is Alive polling. You may also have enabled both operating system authentication and the SYS database account (for remote client access), as in the example configuration. Depending on your configuration, SYS database account password information may be maintained in multiple places, including:

• The database password file on each cluster node

• Oracle Enterprise Manager preferred credentials

• Oracle Fail Safe database property information

If you use the SYS database user account and change the SYS account password for any database configured with Oracle Fail Safe, it is critical to ensure that this change is synchronized across all places noted in the preceding list to ensure uninterrupted operation. Note specifically that database password files are stored on private disks and that changes made to the password file on one cluster node are not automatically applied to the corresponding file on the other cluster nodes.

Note also that the database password synchronization process for primary/standby configurations described in this section differs from the standard process. The reason is that the SYS database user password cannot be modified while a physical standby database is in managed recovery or read-only mode. For more information on changing the SYS database account password and on synchronizing password files on multiple cluster nodes, refer to the Oracle Fail Safe Concepts and Administration Guide.

6.2.1 Update Primary SYS Database User Account Password

Locate and right-click the primary database in the Data Guard Manager configuration tree view, as shown in Figure 74. Then, from the pop-up menu, choose Disable. This temporarily disables Oracle Data Guard monitoring of the database. It also prevents unnecessary alerts when Oracle Fail Safe Manager is used to fail over the database virtual server in the steps that follow. Choose File -> Exit to close Data Guard Manager.

Figure 74: Data Guard Manager: Disable Primary Database

Using Oracle Fail Safe Manager, connect to the primary cluster and, as shown in Figure 75, choose the Update Database Password command to open the Update Database Password Wizard.

Figure 75: Open Oracle Fail Safe Manager Update Database Password Wizard

In the first window, select the primary database, as shown in Figure 76. Click Next to continue.

Figure 76: Update Database Password - Choose Databases

In the second window, enter the current password for the SYS database user account, and then enter the new password for the SYS account twice, as shown in Figure 77. Click Finish to continue.

Figure 77: Update Database Password - Enter Password Information

The wizard displays a summary screen similar to that shown in Figure 78. Review the summary screen and, if the information is correct, click OK to continue.

Figure 78: Update Database Password – Summary

The wizard displays a status window during the update process similar to that shown in Figure 79. Click OK to close the Finished Updating Passwords window and Close to close the status window.

Figure 79: Update Database Password - Status

At this point, the SYS user password for the primary database has been updated in the Oracle Fail Safe metadata and in the password file on the cluster node that currently hosts the primary database virtual server. To avoid unnecessary downtime, Oracle Fail Safe normally defers updating the password file on the other cluster nodes. However, because it is possible that an Oracle Data Guard site failover or switchover may occur (which would place the database in a standby recovery or read-only role) before the database virtual server is hosted by another node, it is necessary to ensure that all password files for the primary database are updated. To update the password files on the other cluster nodes, perform the following steps:

1. Right-click the database virtual server in the Oracle Fail Safe Manager tree view and choose Move to a Different Node, as shown in Figure 80.

2. Click Yes when prompted to confirm the Move Group operation, as shown in Figure 81.

3. After the Move Group operation completes, click OK to close the Clusterwide Operation Status message.

4. Review the status report to confirm that the primary virtual server group and the resources it contains are online on the other cluster node, as shown in Figure 82.

5. Click Close to close the status report.

6. If necessary, repeat these steps for any additional nodes in the cluster.

7. After the password file has been updated on all cluster nodes, use Oracle Fail Safe Manager to move the primary virtual server group back to the initial cluster node. The physical node hosting the group then matches the node expected by Data Guard Manager (refer to section 6.1 for additional information about this requirement).

8. Optionally, to verify that Oracle Fail Safe updated the database password files, confirm that the primary database password file (<Oracle_Home>\database\PWDtestdb1.ora in the example) was modified recently on each node.
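Step 8 can be scripted. The following Python sketch reports password files that were not modified within a chosen time window, or that could not be read. It assumes the files on each node are reachable from one machine, for example through administrative shares. The list of paths (one <Oracle_Home>\database\PWDtestdb1.ora per node) and the window length are assumptions you supply.

```python
import os
import time
from typing import Iterable, List, Optional


def stale_password_files(paths: Iterable[str], max_age_seconds: float,
                         now: Optional[float] = None) -> List[str]:
    """Return the paths that were NOT modified within max_age_seconds.

    A missing or unreadable file is also reported, because it cannot be
    confirmed as updated.
    """
    current = time.time() if now is None else now
    stale: List[str] = []
    for path in paths:
        try:
            modified = os.path.getmtime(path)
        except OSError:
            stale.append(path)  # missing or unreachable: cannot confirm the update
            continue
        if current - modified > max_age_seconds:
            stale.append(path)
    return stale
```

An empty result means that every listed password file changed within the window. Any returned path identifies a node on which the Move Group steps above should be repeated.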

Figure 80: Oracle Fail Safe Manager - Move to a Different Node

Figure 81: Confirm Move Group

Figure 82: Move Group Status Report

To complete the password configuration process, perform the following steps:

1. From the Oracle Enterprise Manager main menu, choose Configuration -> Preferences -> Preferred Credentials. Update the preferred credentials for the primary database with the new SYS database user account password information (as SYSDBA). Click OK to apply this change.

2. From the Oracle Enterprise Manager main menu, choose Tools -> Database Applications -> Data Guard Manager to open Data Guard Manager.

3. Select the configuration from the Data Guard Manager tree view and connect to the configuration.

4. Expand the primary site in the Data Guard Manager tree view, right-click the primary database, and choose Enable, as shown in Figure 83. Click Yes when prompted to confirm that you want to enable the primary database.

Figure 83: Data Guard Manager - Enable Primary Database.

6.2.2 Update the Standby SYS Database User Account Password

For each logical standby database, repeat the steps described in section 6.2.1 to update the SYS database user account password on all nodes in the standby cluster. Logical standby databases are fully functional databases that are usually administered and managed in the same way as primary databases. The exception is that the tables replicated on the logical standby database are read-only.

For each physical standby database (such as the physical standby database in the example configuration), however, the SYS database user account password information cannot be updated while the standby database is operating in managed recovery or read-only modes. To change the SYS account password for a standby database, you must choose one of the following options:

• Option 1 (updates SYS account password and retains any other entries in the password file)

1. Perform a site switchover (refer to section 6.4) to convert the standby database to the primary database.

2. Update the SYS database user password, as described in section 6.2.1.

3. Perform a site switchover to return the primary and standby databases to their original sites (refer to section 6.4). Step 3 is optional for configurations where the physical locations of the primary and standby databases are not important.

• Option 2 (creates a new password file on each node that contains only the SYS account)

1. From the Data Guard Manager tree view, right-click the standby database and choose Disable. Exit from Data Guard Manager.

2. Locate the standby database password file on the cluster node that currently hosts the standby database virtual server and record the path and file name information for this file (C:\oracle\database\PWDtestdb12.ora in the example configuration). Then rename the file (for example, to C:\oracle\database\PWDtestdb12_old.ora).

3. Open an MS-DOS command window on the node that currently hosts the standby database virtual server and enter the following command:

    orapwd file=<fname> password=<password>

The <fname> variable is the original location and name of the password file on that node (C:\oracle\database\PWDtestdb12.ora in the example configuration). The <password> variable is the new password for the SYS database user account.

4. From the Oracle Fail Safe Manager tree view, right-click the standby database virtual server group and choose Move to a Different Node. Click OK to acknowledge the Confirm Move Group informational message.

5. Repeat Steps 2 and 3 on the new cluster node.

6. Repeat Steps 4 and 5 for each additional node (if the cluster has more than two nodes).

7. After the password file has been updated on all standby nodes, use Oracle Fail Safe Manager to open the Update Database Password Wizard.

In the first window, select the standby database and click Next to continue.

In the second window, enter the new SYS password in the Old Password, New Password, and Confirm New Password fields. When all three password fields contain the same value, as in this case, Oracle Fail Safe Manager verifies that the password is valid and updates the SYS account password information stored by Oracle Fail Safe. It does not attempt to update the password file on any of the cluster nodes. Click Finish to continue.

Review the summary screen and, if the information is correct, click OK to continue. Click OK to close the Finished Updating Passwords window and Close to close the status window.

8. Use Oracle Fail Safe Manager to move the standby virtual server group back to the initial cluster node. The physical node hosting the group then matches the node expected by Data Guard Manager (refer to section 6.1 for additional information about this requirement).

9. From the Oracle Enterprise Manager main menu, choose Configuration -> Preferences -> Preferred Credentials. Update the preferred credentials for the standby database with the new SYS database user account password information (as SYSDBA). Click OK to apply this change.

10. From the Oracle Enterprise Manager main menu, choose Tools -> Database Applications -> Data Guard Manager to open Data Guard Manager.

11. Select the configuration from the Data Guard Manager tree view and connect to the configuration.

12. Expand the configuration in the Data Guard Manager tree view, right-click the standby site, and choose Enable. Click Yes when prompted to confirm that you want to enable the standby site. Then right-click the standby database and choose Enable. Click Yes when prompted to confirm that you want to enable the standby database.

13. On each standby cluster node, delete the old password file temporarily renamed in Step 2.
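Steps 2 and 3 of Option 2 can be sketched in Python as follows. The first helper renames the existing password file with an _old suffix, as in the example in Step 2. The second builds the orapwd command line shown in Step 3. Running the command, for example with subprocess.run(orapwd_command(fname, password), check=True), is left to the caller. The function names are hypothetical.

```python
import os
from typing import List


def rename_old_password_file(fname: str) -> str:
    """Step 2: rename e.g. PWDtestdb12.ora to PWDtestdb12_old.ora; return the new path."""
    root, ext = os.path.splitext(fname)
    old_path = f"{root}_old{ext}"
    os.rename(fname, old_path)  # on Windows this fails if old_path already exists
    return old_path


def orapwd_command(fname: str, password: str) -> List[str]:
    """Step 3: argument list for 'orapwd file=<fname> password=<password>'."""
    return ["orapwd", f"file={fname}", f"password={password}"]
```

Because os.rename refuses to overwrite an existing file on Windows, a leftover _old file from an earlier run must be deleted first, as in Step 13. Passing the password on the command line, as in Step 3, can expose it through process listings on the node.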

6.3 Performing Rolling Upgrades

Performing an Oracle Data Guard site switchover can reduce downtime during hardware upgrades and some software upgrades. However, there is still noticeable downtime, and a significant number of manual steps are involved when performing upgrades in this manner. Furthermore, the Oracle9i Data Guard Concepts and Administration manual specifically warns against using a switchover operation to perform a rolling upgrade of Oracle database server software. The topics in this section describe how combining Oracle Data Guard with Oracle Fail Safe overcomes many of these limitations. In some cases, rolling upgrades can be performed with only a minute or two of downtime and without the need for an Oracle Data Guard site switchover.

The following guidelines apply to all rolling upgrade scenarios described in this section:

• If the upgrade will affect databases monitored by Oracle Data Guard, right-click the database in Data Guard Manager and choose Disable. After the upgrade is complete, repeat these steps, but instead choose Enable to reenable Oracle Data Guard monitoring.

• To minimize impact on users, wait for a quiet period in cluster operations before proceeding with the upgrade process.

• When upgrading Oracle product software, do not begin the installation procedure while any Oracle Fail Safe Manager operations or MSCS Cluster Administrator operations are in progress.

• Upgrade only one node in a cluster at a time.

• If you are upgrading Oracle database software, consider performing a database backup prior to any major upgrade.

• For active/passive cluster configurations, you can eliminate a failover if you begin the rolling upgrade on the passive node.

• To ensure minimal downtime and to identify any potential issues with other software that might be running on the cluster, Oracle Corporation recommends that you test any upgrade operations on an identically configured test cluster before you upgrade the production cluster.

6.3.1 Upgrading Hardware or Operating System Software

In most cases, hardware or operating system software upgrades can be performed without an Oracle Data Guard site switchover. Table 3 summarizes the rolling upgrade process for hardware or operating system upgrades. Note also the following restrictions:

• If a hardware upgrade will interfere with access to the cluster disks in the database virtual server group, you may need to shut down the database during the upgrade process or consider performing an Oracle Data Guard site switchover.

• If any software updates are required to ensure that the database release remains compatible with the hardware or operating system after the upgrade, refer to section 6.3.3 for information about additional steps that may be required.

Step | Action | Tool | Comments
---- | ------ | ---- | --------
1 | Change the database virtual server group failback attributes to the Prevent Failback mode. | Oracle Fail Safe Manager | Follow the instructions in the Oracle Fail Safe Manager online help. Changing the failback attributes prevents the group from failing back to the node while it is being rebooted or when the cluster service is restarted.
2 | Perform a planned failover by moving all groups on the node being upgraded to another node. | Oracle Fail Safe Manager | Move each group to another node (see the instructions in the Oracle Fail Safe help for more information). By moving all groups to another node, you can work on the current node. When moving a group that contains a database with this method, Oracle Fail Safe performs a checkpoint operation before moving the group.
3 | Exit Oracle Fail Safe Manager. | Oracle Fail Safe Manager | Choose File -> Exit to exit Oracle Fail Safe Manager.
4 | Perform the hardware or operating system upgrade. | Various | Follow the instructions provided by your hardware or operating system vendor.
5 | Run the Verify Group operation on all groups. | Oracle Fail Safe Manager | Verify the groups to check all resources in all groups and confirm that they have been configured correctly. If you upgraded Oracle database software, the Verify Group operation will update the tnsnames.ora file. If prompted, click Yes. Otherwise, the Oracle database might not come online after you add it to a group.
6 | Repeat steps 2 through 5 on the other server node or nodes in the cluster. | Various | None.
7 | Run the Verify Cluster operation. | Oracle Fail Safe Manager | This step verifies that there are no discrepancies in the software installation, such as with the release information on each node in the cluster.
8 | Restore the failback policy attributes on the groups. | Oracle Fail Safe Manager | Follow the instructions in the Oracle Fail Safe Manager online help to set the failback policy for all groups in the cluster.
9 | Fail back groups, as necessary, by moving groups back to the other node or nodes. | Oracle Fail Safe Manager | Perform a planned failover to move the groups back to the preferred node. This rebalances the workloads across the cluster nodes. Refer to the instructions in the Oracle Fail Safe Manager online help regarding moving a group to a different node.

Table 3: Hardware or Operating System Rolling Upgrade Process
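The ordering in Table 3 can be expressed as a small Python sketch that expands steps 2 through 5 once per node, which is what step 6 asks for. The node names, the step wording, and the choice to move each node's groups to the next node in the list are illustrative assumptions. In an active/passive cluster, listing the passive node first means its step 2 moves no groups, which matches the guideline about eliminating a failover.

```python
from typing import List


def rolling_upgrade_plan(nodes: List[str]) -> List[str]:
    """Expand Table 3 into an ordered task list for the given cluster nodes."""
    if len(nodes) < 2:
        raise ValueError("a rolling upgrade needs at least two cluster nodes")
    plan = ["1: set the database group failback attributes to Prevent Failback"]
    for i, node in enumerate(nodes):
        # Assumption: move this node's groups to the next node in the list.
        target = nodes[(i + 1) % len(nodes)]
        plan += [
            f"2: move all groups from {node} to {target}",
            "3: exit Oracle Fail Safe Manager",
            f"4: upgrade hardware or operating system on {node}",
            "5: run Verify Group on all groups",
        ]
    # The loop above is step 6 (repeat steps 2 through 5 on each other node).
    plan += [
        "7: run Verify Cluster",
        "8: restore the failback policy attributes on all groups",
        "9: move groups back to their preferred nodes",
    ]
    return plan


if __name__ == "__main__":
    for task in rolling_upgrade_plan(["NODE-A", "NODE-B"]):
        print(task)
```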

6.3.2 Upgrading Oracle Fail Safe or Oracle Application Software

Generally, the rolling upgrade steps described in the Oracle Fail Safe Installation Guide for upgrading Oracle Fail Safe or other Oracle software apply, provided that the software being upgraded is installed in a different home from the Oracle database software and that no scripts or other changes must be applied to the database as part of the upgrade process.

Note that additional steps beyond those listed in Table 3 are required when performing rolling upgrades of Oracle software. Refer to the Oracle Fail Safe Installation Guide for complete details. If any software updates are required to the database to ensure that it remains compatible with the Oracle Fail Safe or other Oracle application software being upgraded, refer to section 6.3.3 for information about additional steps that may be required.

6.3.3 Upgrading Oracle Database Software

Both Oracle Fail Safe and Oracle Data Guard currently impose restrictions on upgrades of Oracle database software. The downtime required during a database upgrade varies with the nature of the upgrade. In many cases, Oracle Fail Safe can reduce the overall downtime experienced by end users during upgrades of Oracle database software by allowing program executable files to be upgraded on one cluster node while users continue to work on another cluster node. Refer to the database upgrade information in the Oracle Fail Safe Installation Guide and the Oracle9i Database documentation set for more information. Also, several upgrade-related Support Notes for standby databases are available through Oracle MetaLink and are listed in section 7.5 of this paper.

6.3.3.1 Applying a Database Software Patch

In general, if there are no changes to the in-memory or on-disk structure of the database and the first three fields of the database version do not change (for example, when a software patch is applied only to the program executable files), you can use the rolling upgrade process outlined in section 6.3.2 to minimize downtime. Perform each step in parallel on the primary and standby clusters so that the state of the software on the primary and standby cluster nodes remains consistent. You do not need to remove the database from the virtual server group during the rolling patch process.
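The version rule above can be expressed as a small check. This is an illustrative sketch only, assuming version strings are dot-separated integers such as "9.0.1.3"; the function names are hypothetical and are not part of Oracle Fail Safe or Oracle Data Guard. The version number alone cannot tell you whether a patch changes the in-memory or on-disk structure of the database, so the sketch takes that answer as an input that you would get from the patch documentation.

```python
def base_release(version: str) -> tuple:
    """Return the first three fields of a dotted version, e.g. '9.0.1.3' -> (9, 0, 1)."""
    fields = [int(part) for part in version.split(".")]
    # Pad short versions such as "9.2" so they compare as (9, 2, 0).
    fields += [0] * (3 - len(fields))
    return tuple(fields[:3])


def rolling_patch_possible(current: str, target: str, changes_db_structure: bool) -> bool:
    """Hypothetical helper: True when the rolling process may be used.

    The rolling process applies only when the first three version fields stay
    the same and the patch makes no in-memory or on-disk structural changes.
    """
    return base_release(current) == base_release(target) and not changes_db_structure


print(rolling_patch_possible("9.0.1.3", "9.0.1.4", changes_db_structure=False))  # True
print(rolling_patch_possible("9.0.1", "9.2.0", changes_db_structure=False))      # False
```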

6.3.3.2 Upgrading to a New Database Version

If any of the first three fields of the database version changes (for example, from release 9.0.1 to release 9.0.2), or if any scripts or changes must be applied to the database structure during the upgrade, then additional steps are required, and these may include periods of database downtime. The specific upgrade steps also depend on whether the upgrade makes any nologging changes to the database. In general, you will need to unconfigure the database virtual servers, upgrade the databases, and then reconfigure the database virtual servers. If nologging changes are made to the database, you may also need to refresh or re-instantiate the standby databases. Table 4 summarizes the major tasks in the database upgrade process.
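The conditions above determine which major tasks apply. The following is a minimal sketch of that decision, written as a hypothetical helper rather than any Oracle tool; the task strings simply paraphrase the text and are not commands.

```python
def upgrade_tasks(base_release_changes, structure_changes, nologging_changes):
    """Hypothetical summary of the major tasks for a database software change.

    base_release_changes: True if any of the first three version fields changes.
    structure_changes: True if scripts or structural changes must be applied.
    nologging_changes: True if the upgrade makes nologging changes.
    """
    if not (base_release_changes or structure_changes):
        # Patch-level change: use the rolling process, performing each step
        # in parallel on the primary and standby clusters.
        return ["apply patch with the rolling upgrade process"]
    tasks = ["unconfigure database virtual servers", "upgrade databases"]
    if nologging_changes:
        # Standby files affected by nologging changes must be refreshed.
        tasks.append("refresh or re-instantiate standby databases")
    tasks.append("reconfigure database virtual servers")
    return tasks


print(upgrade_tasks(True, True, nologging_changes=False))
# ['unconfigure database virtual servers', 'upgrade databases', 'reconfigure database virtual servers']
```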


Step 1: Prior to upgrading, check for the existence of nologging operations and update the standby database if necessary.
Tool: Various
Comments: Refer to the Oracle9i Data Guard Concepts and Administration Release 2 (9.2) manual for details on updating the standby database after nologging operations.

Step 2: Perform a hot or cold backup of the primary database. Also back up the initialization parameter files, server parameter files, and Oracle Data Guard configuration files for the primary and standby databases.
Tool: Various
Comments: This step ensures that you can recover the original configuration if necessary.

Step 3: Remove the Oracle Data Guard configuration from the Data Guard Manager and Oracle Enterprise Manager tree views.
Tool: Data Guard Manager and Oracle Enterprise Manager
Comments: Follow the same steps described earlier in this paper for removing the configuration. All entries for the primary and standby databases and virtual servers should be removed from the tree views.

Step 4: Remove the primary database from the primary database virtual server.
Tool: Oracle Fail Safe Manager
Comments: In the Oracle Fail Safe Manager tree view, locate the primary cluster, select the primary database, and remove it from its group.

Step 5: Remove the standby database from the standby database virtual server.
Tool: Oracle Fail Safe Manager
Comments: In the Oracle Fail Safe Manager tree view, locate the standby cluster, select the standby database, and remove it from its group. If there are multiple standby databases, perform this step for each standby database.

Step 6: Exit Oracle Fail Safe Manager.
Tool: Oracle Fail Safe Manager
Comments: Exit Oracle Fail Safe Manager from its menu.

Step 7: Upgrade the program executable files in the primary and standby database Oracle home directories, and upgrade the primary database.
Tool: Various
Comments: Follow the documented upgrade or migration instructions for your database releases. To identify the specific upgrade steps for your configuration, review the information in:

• The database upgrade section of the Oracle Fail Safe Concepts and Administration Guide

• Oracle Support Note 165296.1

• The Oracle9i Database Migration manual

In general, you should first upgrade the program executable files on the primary and standby cluster nodes and then apply any required upgrade scripts to the primary database. During the upgrade, redo shipping between the primary and standby sites may need to be temporarily deferred. Additional steps, such as those described in Oracle Support Note 165296.1, may be needed to resynchronize the primary and standby databases.


Step 8: Refresh or re-instantiate each standby database.
Tool: Various
Comments: In most cases, you will not need to re-instantiate the standby database. If the primary database upgrade script makes nologging changes, check the v$datafile view to identify any data files with nologging changes that must be "refreshed" in the standby database. If there are no nologging changes, all changes made to the primary database should have been applied automatically to the standby databases. If you lose or corrupt an archived redo log file, or if you perform point-in-time recovery or open the primary database with RESETLOGS, each standby database must be re-instantiated. If you encounter problems while refreshing standby database files, or if changes made to the primary database are not correctly replicated to the standby databases, remove and then re-instantiate each standby database by following the steps described earlier in this paper.

Step 9: Reconfigure the disaster-tolerant high availability configuration.
Tool: Data Guard Manager and Oracle Fail Safe Manager
Comments: Perform the configuration steps described earlier in this paper to complete the upgrade.

Table 4: Database Upgrade Process Overview
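The v$datafile check in step 8 can be scripted. The sketch below is hypothetical: it assumes you already fetched rows from the query shown (with any Oracle client library) and that you recorded an SCN on the primary before running the upgrade scripts (the baseline_scn parameter). It relies only on the UNRECOVERABLE_CHANGE# column of V$DATAFILE, which records the most recent unrecoverable (nologging) change to each data file.

```python
# Query to run on the primary database after the upgrade script completes.
NOLOGGING_QUERY = "SELECT file#, name, unrecoverable_change# FROM v$datafile"


def files_needing_refresh(rows, baseline_scn):
    """Return names of data files with nologging changes after baseline_scn.

    rows: (file_number, name, unrecoverable_change) tuples as returned by
    NOLOGGING_QUERY. baseline_scn is an assumption of this sketch: an SCN
    recorded on the primary before the upgrade script was run.
    """
    return [
        name
        for _file_number, name, unrecoverable_change in rows
        if unrecoverable_change is not None and unrecoverable_change > baseline_scn
    ]


# Hypothetical rows for illustration only.
sample_rows = [(1, "SYSTEM01.DBF", 0), (4, "USERS01.DBF", 5123)]
print(files_needing_refresh(sample_rows, baseline_scn=5000))  # ['USERS01.DBF']
```

Files returned by such a check are candidates for the "refresh" described in step 8; if the list is empty, no nologging refresh should be needed.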

6.4 Performing an Oracle Data Guard Failover or Switchover Operation

If you are using a primary/standby configuration and you intend to perform an Oracle Data Guard failover or switchover operation, you must first disable Oracle Fail Safe Is Alive polling for the affected databases. When you perform a failover or switchover for a physical standby configuration, the Oracle Data Guard software takes the affected primary and standby databases offline. However, Oracle Fail Safe strives to keep every database it monitors online and may interfere with the operation by attempting to bring the affected databases back online before the failover or switchover completes. Similarly, during a failover or switchover for a logical standby configuration, there can be no other active sessions connected to either database (such as the connection associated with MSCS Is Alive polling).

Therefore, before performing a failover or switchover, you must first disable Is Alive polling for the primary database (if it is still accessible) and the standby database that directly participates in the role transition. When the failover or switchover is complete, reenable Is Alive polling so that Oracle Fail Safe can resume monitoring these databases for failures. Sections 6.4.1 through 6.4.4 describe the steps that must be followed during a failover or switchover operation in more detail.
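The rule for which databases to pause can be stated compactly. This is a hypothetical helper, shown only to make the selection explicit; the database names in the usage lines are the example configuration's testdb1 (primary) and testdb12 (standby).

```python
def databases_to_pause(primary, standby, primary_accessible=True):
    """Return the databases whose Is Alive polling must be disabled.

    Only databases that directly take part in the role transition are listed:
    the primary (when it can still be reached) and the standby changing roles.
    """
    targets = [standby]
    if primary_accessible:
        targets.insert(0, primary)
    return targets


print(databases_to_pause("testdb1", "testdb12"))                            # ['testdb1', 'testdb12']
print(databases_to_pause("testdb1", "testdb12", primary_accessible=False))  # ['testdb12']
```

For a failover in which the original primary can no longer be reached, only the standby database is paused.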

6.4.1 Disable Is Alive Polling

For each database that will be directly involved in the role transition, disable Is Alive polling by performing the following steps:

1. Select the database in the Oracle Fail Safe Manager tree view.

2. Click the tab that contains the Is Alive polling setting.

3. In the box that controls Is Alive polling, select Disabled.


4. Apply the change.

5. Wait one Is Alive polling interval to ensure that this state change takes effect before you initiate any switchover or failover commands.

6.4.2 Perform the Role Transition Operation

Use Oracle Data Guard Manager to perform the switchover or failover operation.

6.4.3 Reenable Is Alive Polling

For each database that was directly involved in the role transition, reenable Is Alive polling by performing the following steps after the switchover or failover is complete:

1. Select the database in the Oracle Fail Safe Manager tree view.

2. Click the tab that contains the Is Alive polling setting.

3. In the box that controls Is Alive polling, select Enabled.

4. Apply the change.

Note also that Oracle Fail Safe automatically reenables Is Alive polling each time the group containing the database is moved to another cluster node.
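Sections 6.4.1 through 6.4.3 can also be scripted with the FSCMD commands that appear in the sample backup script in section 6.5. The following is a minimal sketch, assuming fscmd is on the PATH of the node running it; the function names, the cluster name, and the polling interval are hypothetical. The try/finally structure reenables polling for every database that was disabled, even if the role transition is interrupted.

```python
import subprocess
import time
from contextlib import contextmanager


def fscmd_args(action, resource, cluster):
    """Build an FSCMD command line in the form used by the sample backup script."""
    return ["fscmd", action, resource, f"/cluster={cluster}"]


@contextmanager
def is_alive_disabled(databases, cluster, polling_interval_s,
                      run=subprocess.run, sleep=time.sleep):
    """Disable Is Alive polling for the duration of a role transition.

    polling_interval_s is an assumption: set it to the Is Alive polling
    interval configured for the database resources in your cluster.
    """
    disabled = []
    try:
        for db in databases:
            run(fscmd_args("disableisalive", db, cluster), check=True)
            disabled.append(db)
        # Wait one polling interval so the state change takes effect.
        sleep(polling_interval_s)
        yield
    finally:
        # Reenable polling only for databases that were actually disabled.
        for db in disabled:
            run(fscmd_args("enableisalive", db, cluster), check=True)


# Usage (hypothetical cluster name and interval). Perform the switchover or
# failover in Oracle Data Guard Manager while inside the with-block:
# with is_alive_disabled(["testdb1", "testdb12"], "mycluster", 60):
#     input("Press Enter after the role transition has completed...")
```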

6.4.4 Verify the Primary and Standby Virtual Server Groups

After completing a switchover or failover, use the Oracle Fail Safe Manager Verify Group command to verify the primary and standby database virtual server groups. Correct any reported problems and rerun the Verify Group command until it completes successfully. As before, you can safely ignore the FS-10288 parameter file location warning.
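The "correct and rerun until successful" loop, including the FS-10288 exception, is sketched below. Both callables are hypothetical stand-ins: verify_group is assumed to return the messages reported by Verify Group, each beginning with its FS- message code, and fix stands for whatever corrective action you take.

```python
IGNORABLE_CODES = ("FS-10288",)  # parameter file location warning, safe to ignore here


def blocking_problems(messages):
    """Return reported messages other than the ignorable FS-10288 warning."""
    return [m for m in messages if not m.startswith(IGNORABLE_CODES)]


def verify_until_clean(verify_group, fix, max_attempts=5):
    """Rerun a Verify Group check until it reports no blocking problems.

    Returns the number of attempts needed; raises if problems persist.
    """
    for attempt in range(1, max_attempts + 1):
        problems = blocking_problems(verify_group())
        if not problems:
            return attempt
        fix(problems)
    raise RuntimeError(f"Verify Group still failing after {max_attempts} attempts")
```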

6.4.5 Switchover Example

Follow the procedure described in section 6.4.1 to disable Is Alive polling on the primary and standby databases that will participate in the switchover operation (testdb1 and testdb12 in the example). Then choose the Switchover command from the Oracle Data Guard Manager menu to start the Data Guard Manager Switchover Wizard, as shown in Figure 84.


Figure 84: Starting the Switchover Wizard

Click Next to acknowledge and close the initial Welcome window. Oracle Data Guard Manager then checks for active user sessions on the primary database and may display a Check Open Sessions Dialog window similar to that shown in Figure 85. Choose to disconnect the users identified in the window and proceed with the switchover operation.

Figure 85: Check Open Sessions Dialog (Primary Database)


In the next window, select the standby site that will become the new primary site. For the example configuration, this is fs-245_site, as shown in Figure 86. Click Next to continue.

Figure 86: Switchover Wizard – Select Standby Site

If there are any active user sessions connected to the selected standby database, Oracle Data Guard Manager displays a Check Open Sessions Dialog window similar to that shown in Figure 87. Choose to disconnect the users identified in the window and proceed with the switchover operation.


Figure 87: Check Open Sessions Dialog (Standby Database)

After the active sessions on the standby database are disconnected, the Switchover Wizard displays a summary window similar to that shown in Figure 88. If the information displayed in the window is correct, click Finish to begin the switchover operation.

Figure 88: Switchover Wizard – Summary

During the switchover operation, a window similar to that shown in Figure 89 records progress. When the switchover is complete, close the status report window. Oracle Data Guard Manager then displays the updated configuration, as shown in Figure 90. For the example configuration, testdb12 now has the primary role, while testdb1 has the standby role.


Figure 89: Switchover Wizard – Status Report

Figure 90: Data Guard Configuration After Switchover

After verifying that the switchover was successful, reenable Is Alive polling on each database following the steps described in section 6.4.3, and execute the Verify Group command on each database virtual server, as described in section 6.4.4.
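One way to confirm the new roles outside the GUI is to query V$DATABASE.DATABASE_ROLE on each database. The sketch below assumes you have already run the query against each database through its own net service name and collected the results in a dictionary; the helper name is hypothetical. The dictionary is keyed by connect name because a physical standby shares the DB_NAME of its primary.

```python
# Run on each database; DATABASE_ROLE reports PRIMARY, PHYSICAL STANDBY, or LOGICAL STANDBY.
ROLE_QUERY = "SELECT database_role FROM v$database"

STANDBY_ROLES = {"PHYSICAL STANDBY", "LOGICAL STANDBY"}


def roles_as_expected(observed, expected_primary):
    """observed maps each database's connect name to its DATABASE_ROLE value."""
    primaries = [db for db, role in observed.items() if role == "PRIMARY"]
    others_are_standby = all(
        role in STANDBY_ROLES for db, role in observed.items() if db != expected_primary
    )
    return primaries == [expected_primary] and others_are_standby


# Roles expected for the example configuration after the switchover.
print(roles_as_expected({"testdb1": "PHYSICAL STANDBY", "testdb12": "PRIMARY"}, "testdb12"))  # True
```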


6.5 Performing Database Backups

Because backup operations may involve taking the database offline and may consume significant system resources that could affect query response times, you must stop Oracle Fail Safe from performing Is Alive polling during backup operations. You can do this interactively, using the Oracle Fail Safe Manager menu commands described in the previous section, or through the Oracle Fail Safe FSCMD command-line interface. The following sample script can be modified for your environment to automate database backups. In the script, replace [dbname], [groupname], [nodename], and [clustername] with the name of your database, the group (virtual server) that contains it, the cluster node on which the backup will run, and your cluster, respectively.

REM This sample script shows how to perform an automated backup
REM operation on a database configured with Oracle Fail Safe.
REM
REM 1. Move the group [groupname] that contains the database to
REM    the node on which the backup operation will run.
REM
REM    Alternatively, you can map a network drive for each
REM    cluster disk to allow the backup software to access
REM    the drives through a virtual server address regardless
REM    of which cluster node currently owns them.
fscmd movegroup [groupname] /node=[nodename] /cluster=[clustername]

REM 2. Disable Is Alive polling for the database resource.
fscmd disableisalive [dbname] /cluster=[clustername]

REM 3. Begin the backup operation here.
REM    [insert appropriate backup commands here]

REM 4. Reenable Is Alive polling for the database resource.
fscmd enableisalive [dbname] /cluster=[clustername]

REM The backup operation is complete.
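If backups are driven from Python rather than a batch file, the same four steps might be sketched as follows. This is a minimal sketch, assuming fscmd is on the PATH of the node running it; the function name and the backup_cmd argument are hypothetical, and only the fscmd commands from the sample script are used. Unlike the batch version, the try/finally guarantees that Is Alive polling is reenabled even when the backup command fails.

```python
import subprocess


def backup_with_is_alive_paused(db, group, node, cluster, backup_cmd,
                                run=subprocess.run):
    """Move the group, pause Is Alive polling, back up, then reenable polling."""
    # 1. Move the group that contains the database to the backup node.
    run(["fscmd", "movegroup", group, f"/node={node}", f"/cluster={cluster}"],
        check=True)
    # 2. Disable Is Alive polling for the database resource.
    run(["fscmd", "disableisalive", db, f"/cluster={cluster}"], check=True)
    try:
        # 3. Run the site-specific backup command (a list of arguments).
        run(backup_cmd, check=True)
    finally:
        # 4. Reenable Is Alive polling even if the backup step raised an error.
        run(["fscmd", "enableisalive", db, f"/cluster={cluster}"], check=True)
```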

Refer to the Oracle9i Data Guard Concepts and Administration manual for information on the specific RMAN commands used to back up databases configured with Oracle Data Guard (for example, refer to the scenario "Using a Standby Database to Back Up the Primary Database"). Note that you will typically back up only the primary database control file and offload all other backup operations to a standby database. To ensure that your backup software can always access the database disks, it may help, on the system where the backup software runs, to map each cluster disk used by the database as a node-independent network drive that is addressed through the virtual server rather than through a physical cluster node.
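The drive mappings can be generated with the standard Windows net use command. This sketch only builds the command lines; it assumes that each cluster disk is reachable through a share on the virtual server, and the share and server names are hypothetical placeholders for your own configuration.

```python
def net_use_command(drive, virtual_server, share):
    r"""Return a 'net use' command that maps drive to \\virtual_server\share."""
    return ["net", "use", f"{drive}:", f"\\\\{virtual_server}\\{share}"]


def drive_mappings(virtual_server, shares):
    """Build one mapping command per cluster disk; shares maps drive letter -> share name."""
    return [net_use_command(drive, virtual_server, share)
            for drive, share in sorted(shares.items())]
```

Addressing the share through the virtual server name, rather than a node name, keeps the mapping valid after the group moves to another cluster node.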


7 SUMMARY AND MORE INFORMATION

Oracle Fail Safe and Oracle Data Guard can be effectively combined to create efficient, cost-effective, and easily configured disaster-tolerant high availability solutions. Anyone with business-critical databases deployed on Windows systems should consider the combination of Oracle Fail Safe and Oracle Data Guard to meet their high availability and disaster-tolerance requirements.

7.1 Oracle Product Documentation

• Oracle9i Database Administrator’s Guide for Windows

• Oracle9i Data Guard Broker

• Oracle9i Data Guard Concepts and Administration

• Data Guard Manager quick tour and online help system

• Oracle Fail Safe Concepts and Administration Guide

• Oracle Fail Safe Installation Guide

• Oracle Fail Safe quick tour, tutorial, and online help system

7.2 Oracle9i Database High Availability and Disaster Recovery Web Site

• http://otn.oracle.com/deploy/availability

7.3 Oracle Fail Safe Web Sites

• http://www.oracle.com/ip/deploy/database/features/failsafe/

• http://otn.oracle.com/tech/windows/failsafe/

7.4 Oracle University Online Learning Web Site

• http://www.oracle.com/education/oln/index.html

o Introduction to Oracle Fail Safe self-paced eStudy class, Database Administration Capacity, Availability, and Recovery track

o OCP - Oracle9i New Features for Administrators Exam Prep - Module 9: Data Guard eClass, Database Administration Capacity, Availability, and Recovery track

7.5 Oracle Support MetaLink Web Site

• http://metalink.oracle.com/

o Support Note 165304.1: Downgrading from 9i with Standby Database in Place

o Support Note 165296.1: Upgrading to 9i with Standby Database in Place


Disaster-Tolerant High Availability:

Oracle Data Guard with Oracle Fail Safe

June 2002

Author: Laurence Clarke

Contributing Authors: Vivian Schupman, Ingrid Stuart

Oracle Corporation

World Headquarters

500 Oracle Parkway

Redwood Shores, CA 94065

U.S.A.

Worldwide Inquiries:

Phone: +1.650.506.7000

Fax: +1.650.506.7200

www.oracle.com

Oracle is a registered trademark of Oracle Corporation. Various product and service names referenced herein may be trademarks of Oracle Corporation. All other product and service names mentioned may be trademarks of their respective owners.

Copyright © 2002 Oracle Corporation

All rights reserved.