fail safe


Upload: cristian-leiva-l

Post on 13-Apr-2015


Page 1: fail safe

8.1 MSCS Cluster Administrator Displays Problems With Fail-Safe Resource Types

Sometimes, after completing an Oracle Fail Safe installation, you see problems with the fail-safe resource types (such as databases) in MSCS Cluster Administrator. MSCS Cluster Administrator denotes the problem by displaying an Oslash symbol (Ø) over the resource type name.

If this occurs, follow these steps:

1. If you forgot to restart the cluster nodes after installing Oracle Fail Safe, do so now.

2. Make sure that the PATH environment variable includes the Oracle Services for MSCS path. (At the MS-DOS command prompt, enter PATH.) The Oracle Services for MSCS path (ORACLE_HOME\fs\fssvr\bin) must be included. If it is not, add it, and then restart the nodes on which the Oracle Services for MSCS path is missing.

3. Make sure that the Oracle Fail Safe resource DLL, FsResOdbs.dll, is installed in ORACLE_HOME\fs\fssvr\bin.

If the resource DLL is not there, reinstall Oracle Fail Safe.

4. Use Oracle Fail Safe Manager to verify the cluster (on the Troubleshooting menu, select Verify Cluster), then restart each cluster node, one at a time. The Verify Cluster command automatically verifies registration of Oracle resource DLLs. Do not restart all cluster nodes at once. After you restart one node, check MSCS Cluster Administrator to see if the Oslash symbol has been removed from the resource type names. If the Oslash symbol is gone, you do not need to restart the remaining cluster nodes.
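Steps 2 and 3 above can be scripted as a quick pre-check before restarting anything. The following is a minimal Python sketch, not an Oracle-supplied tool; it assumes ORACLE_HOME is available as an environment variable (on many Windows installations it is kept in the registry instead), and the fallback path is hypothetical. The helper names are illustrative.

```python
import ntpath
from pathlib import Path


def fssvr_bin_dir(oracle_home: str) -> str:
    """Return ORACLE_HOME\\fs\\fssvr\\bin, the Oracle Services for MSCS directory."""
    return ntpath.join(oracle_home, "fs", "fssvr", "bin")


def fssvr_bin_on_path(path_value: str, oracle_home: str) -> bool:
    """Step 2: True if ORACLE_HOME\\fs\\fssvr\\bin is one of the PATH entries."""
    # Windows paths are case-insensitive and may carry a trailing backslash,
    # so compare normalized forms rather than raw strings.
    wanted = ntpath.normcase(ntpath.normpath(fssvr_bin_dir(oracle_home)))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


def resource_dll_present(bin_dir: str) -> bool:
    """Step 3: True if FsResOdbs.dll exists in the given directory."""
    return (Path(bin_dir) / "FsResOdbs.dll").is_file()


if __name__ == "__main__":
    import os
    # Assumption: ORACLE_HOME is set; the fallback below is a hypothetical path.
    home = os.environ.get("ORACLE_HOME", r"C:\oracle\ora92")
    print("fssvr bin on PATH:", fssvr_bin_on_path(os.environ.get("PATH", ""), home))
    print("FsResOdbs.dll present:", resource_dll_present(fssvr_bin_dir(home)))
```

The sketch only reports; it does not edit PATH or reinstall anything, so the remedies in steps 2 and 3 remain manual.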

A command-line interface (FSCMD) for managing the cluster through batch programs or scripts

Hardware
o Microsoft cluster nodes, each with one or more local (private) disks where executable application files are installed.
o Private (heartbeat) interconnect between the nodes for intracluster communications.
o Public interconnect (Internet, intranet, or both) to the local area network (LAN) or wide area network (WAN).
o NTFS-formatted disks on the shared storage interconnect (SCSI or Fibre Channel).

All data files, log files, and other files that need to fail over from one node to another are located on these cluster disks.

Note:

See the documentation for your cluster hardware for information about using redundant hardware, such as RAID, to further ensure high availability.

o Additional redundant components (UPS, network cards, disk controllers, and so on).

Software (installed on each node)

o Microsoft Windows
o Oracle Services for MSCS
o Oracle Fail Safe Manager (installed on one or more cluster nodes, one or more client workstations, or both)
o One or more of the following resources that you want to make highly available, such as:
   - Oracle single-instance databases
   - Oracle HTTP Servers
   - Oracle applications or third-party applications that can be configured as Windows generic services

Figure 1-4 Hardware and Software Components Configured with Oracle Fail Safe

 

Oracle Fail Safe high-availability solutions use Microsoft cluster hardware and Microsoft Cluster Server (MSCS) software.

A Microsoft cluster is a configuration of two or more independent computing systems (called nodes) that are connected to the same disk subsystem.


Microsoft Cluster Server (MSCS) software, included with Microsoft Windows software, lets you configure, monitor, and control applications and hardware components (called resources) that are deployed on a Windows cluster.

2.1.4 The Quorum Resource

The quorum resource maintains the configuration data (metadata) necessary for recovery of the cluster in case of a power outage or damage to data in memory. The quorum resource is accessible to other cluster resources so that all cluster nodes have access to the cluster metadata. The quorum resource performs these services:

Determines which cluster node controls the cluster

Stores logging information necessary to recover the cluster from a failure

Maintains access to the most current cluster metadata

The quorum resource can be owned by only one cluster node at a time. If a cluster node becomes isolated (cannot communicate with the other cluster nodes because of a network failure, for example), then the node that gains control of the quorum resource takes over the workload of the isolated node as though a failover had occurred.
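The ownership rule can be illustrated with a toy model. This Python sketch is a conceptual illustration only: the class name, node names, and the tie-break rule (the current quorum owner keeps the quorum, otherwise the first node named wins) are assumptions, and the real arbitration performed by MSCS is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyCluster:
    groups: Dict[str, List[str]]        # node name -> groups that node currently runs
    quorum_owner: Optional[str] = None  # at most one node owns the quorum resource

    def partition(self, a: str, b: str) -> str:
        """Nodes a and b can no longer communicate; return the surviving node.

        Simplification: the current quorum owner keeps the quorum if it is one of
        the two nodes; otherwise node a acquires it.
        """
        winner = self.quorum_owner if self.quorum_owner in (a, b) else a
        loser = b if winner == a else a
        self.quorum_owner = winner
        # The node that controls the quorum takes over the isolated node's
        # workload, as though a failover had occurred.
        self.groups[winner].extend(self.groups[loser])
        self.groups[loser] = []
        return winner


cluster = ToyCluster({"NODE1": ["GroupA"], "NODE2": ["GroupB"]}, quorum_owner="NODE1")
print(cluster.partition("NODE2", "NODE1"), cluster.groups)
```

Because `quorum_owner` is a single value, the model cannot represent two owners at once, which is the property the paragraph above describes.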

To view the location of the quorum resource and the maximum size of the quorum log, select the cluster in the Oracle Fail Safe Manager tree view, then click the Quorum tab. To change the location of the quorum resource or the maximum size of the quorum log, open MSCS Cluster Administrator, then in the File menu select Properties, then click the Quorum tab.

2.2.1 Resources

A cluster resource is any physical or logical component that is available to a computing system and has the following characteristics:

It can be brought online and taken offline.

It can be managed in a cluster.

It can be hosted by only one node in a cluster at a given time, but can potentially be owned by another cluster node. (For example, a resource is owned by a given node. After a failover, that resource is owned by another cluster node. However, at any given time only one of the cluster nodes can access the resource.)
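The three characteristics above can be captured in a small Python model. It is a hypothetical sketch for illustration, not an MSCS interface: the single `owner` attribute enforces "one host at a time", and ownership changes only while the resource is offline.

```python
class ClusterResource:
    """Toy model of a cluster resource with exactly one owner node at any moment."""

    def __init__(self, name: str, owner: str) -> None:
        self.name = name
        self.owner = owner  # a single value: only one node hosts the resource
        self.online = False

    def bring_online(self) -> None:
        self.online = True

    def take_offline(self) -> None:
        self.online = False

    def fail_over(self, new_owner: str) -> None:
        # Take the resource offline before changing owner, so two nodes never
        # host it at the same time; restore the previous online state afterwards.
        was_online = self.online
        self.take_offline()
        self.owner = new_owner
        if was_online:
            self.bring_online()


disk = ClusterResource("Disk F:", "NODE1")
disk.bring_online()
disk.fail_over("NODE2")
print(disk.owner, disk.online)
```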

2.2.2 Groups


A group is a logical collection of cluster resources that forms a minimal unit of failover. During a failover, the group of resources is moved to another cluster node. A group is owned by only one cluster node at a time. All resources required for a given workload (database, disks, and other applications) should reside in the same group.

For example, a group created to configure an Oracle database for high availability using Oracle Fail Safe might include the following resources:

All disks used by the Oracle database

An Oracle database instance

One or more virtual addresses, each one consisting of:

o An IP address

o A network name

An Oracle Net network listener that listens for connection requests to databases in the group

An Oracle Intelligent Agent that manages communications between Oracle Enterprise Manager and the databases in the group

Note that when you add a resource to a group, the disks it uses are also included in the group. For this reason, if two resources use the same disk, they cannot be placed in different groups. If both resources are to be fail-safe, both must be placed in the same group.
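The shared-disk rule means that resources linked through common disks, even indirectly, must end up in one group. One way to compute those forced groupings is union-find over the disks each resource uses; this is a minimal Python sketch under that framing, and the resource and disk names in the example are hypothetical.

```python
from typing import Dict, List, Set


def forced_groups(disks_by_resource: Dict[str, Set[str]]) -> List[Set[str]]:
    """Partition resources so that any two resources sharing a disk end up together.

    Uses union-find: each shared disk merges the sets of the resources that use it.
    """
    parent = {r: r for r in disks_by_resource}

    def find(r: str) -> str:
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving keeps the trees shallow
            r = parent[r]
        return r

    first_user: Dict[str, str] = {}  # disk -> first resource seen using it
    for resource, disks in disks_by_resource.items():
        for disk in disks:
            if disk in first_user:
                parent[find(resource)] = find(first_user[disk])
            else:
                first_user[disk] = resource

    components: Dict[str, Set[str]] = {}
    for resource in disks_by_resource:
        components.setdefault(find(resource), set()).add(resource)
    return list(components.values())


# DB1 and DB3 share no disk, but both share one with DB2, so all three must be grouped.
print(forced_groups({"DB1": {"F:"}, "DB2": {"F:", "G:"}, "DB3": {"G:"}, "WebApp": {"H:"}}))
```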

Each node in the cluster can own one or more groups. Each group is composed of an independent set of related resources. The dependencies among resources in a group define the order in which the cluster software brings the resources online and offline. For example, a failure causes the Oracle application or database (and Oracle Net listener) to be brought offline first, followed by the physical disks, network name, and IP address. On the failover node, the order is reversed; MSCS brings the IP address online first, then the network name, then the physical disks, and finally the Oracle database and Oracle Net listener or application.
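The ordering rule is a topological sort: a resource comes online only after everything it depends on, and goes offline in the reverse order. The Python sketch below uses the standard-library `graphlib` module (Python 3.9+). The dependency map is an illustrative assumption rather than the exact set of dependencies Oracle Fail Safe creates, so the sort guarantees only that every dependency precedes its dependents, not the precise sequence quoted above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical dependency map for one group: resource -> resources it depends on.
DEPENDS_ON: Dict[str, Tuple[str, ...]] = {
    "IP Address": (),
    "Network Name": ("IP Address",),
    "Physical Disk": (),
    "Oracle Net Listener": ("Network Name",),
    "Oracle Database": ("Physical Disk", "Network Name"),
}


def online_order(depends_on: Dict[str, Tuple[str, ...]]) -> List[str]:
    # static_order() yields each node only after all of its predecessors.
    return list(TopologicalSorter(depends_on).static_order())


def offline_order(depends_on: Dict[str, Tuple[str, ...]]) -> List[str]:
    # Dependents are brought offline before the resources they rely on.
    return list(reversed(online_order(depends_on)))


print("online: ", online_order(DEPENDS_ON))
print("offline:", offline_order(DEPENDS_ON))
```

`TopologicalSorter` also raises `graphlib.CycleError` if the map contains a circular dependency, which a valid group cannot have.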

2.2.4 Resource Types

Each resource type (such as a generic service, physical disk, Oracle database, and so on) is associated with a resource dynamic-link library (DLL) and is managed in the cluster environment using this resource DLL. There are standard MSCS resource DLLs as well as custom Oracle resource DLLs. The same resource DLL may support several different resource types.


MSCS provides resource DLLs for the resource types that it supports, such as IP addresses, physical disks, generic services, and many others. (A generic service resource is a Windows service that is supported by a resource DLL provided in MSCS.)

Oracle Fail Safe uses many of the MSCS resource DLLs to monitor resource types for which Oracle Fail Safe provides custom support, such as Oracle HTTP Server and generic services.

Oracle provides a custom DLL for the Oracle database resource type. MSCS uses the Oracle resource DLL to manage the Oracle database resources (bring online and take offline) and to monitor the resources for availability.

Oracle Fail Safe provides the following resource DLL files to enable MSCS to communicate with and monitor Oracle database resources:

FsResOdbs.dll provides functions that enable MSCS to bring an Oracle database online or offline and check its status through Is Alive polling.

FsResOdbsEx.dll provides a resource administration extension DLL file that is used by the MSCS Cluster Administrator to display the properties of the Oracle database resource.

For example, when you use Oracle Fail Safe Manager to add an Oracle database to a group, Oracle Fail Safe creates the database resource and an Oracle listener resource.

Figure 2-4 shows how Oracle Fail Safe Manager displays resource types. Note that the Oracle HTTP Server resource type is displayed as an Oracle HTTP Server in Oracle Fail Safe Manager and as a generic service in MSCS Cluster Administrator.

Because Oracle Fail Safe has more information than MSCS about Oracle cluster resources, Oracle recommends that you use Oracle Fail Safe Manager (or the FSCMD command) to configure and administer Oracle databases and applications.

Figure 2-6 Accessing Cluster Resources Through a Virtual Server


2.4 Allocating IP Addresses for Virtual Addresses

When you set up a cluster, allocate at least the following number of IP addresses:

One IP address for each cluster node

One IP address for the cluster alias (described in Section 2.5)

One IP address for each group

For example, the configuration in Figure 2-6 requires five IP addresses: one for each of the two cluster nodes, one for the cluster alias, and one for each of the two groups. (Note that you can specify multiple virtual addresses for a group; see Section 4.7 for details.)
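The allocation rule is simple arithmetic: one address per node, one for the cluster alias, and one per virtual address configured for each group. A short Python sketch, with a hypothetical function name, reproduces the five-address count for Figure 2-6:

```python
from typing import Sequence


def min_ip_addresses(nodes: int, addresses_per_group: Sequence[int]) -> int:
    """Lower bound on IP addresses for a cluster, per the rule above.

    nodes               -- number of cluster nodes (one address each)
    addresses_per_group -- virtual addresses configured for each group; every
                           group needs at least one
    """
    if any(n < 1 for n in addresses_per_group):
        raise ValueError("every group needs at least one virtual address")
    cluster_alias = 1
    return nodes + cluster_alias + sum(addresses_per_group)


print(min_ip_addresses(2, [1, 1]))  # Figure 2-6: two nodes, two groups
print(min_ip_addresses(2, [2, 1]))  # one group given a second virtual address
```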

See the Oracle Fail Safe Installation Guide for more information about allocating IP addresses for your Oracle Fail Safe environment.

Client applications do not use the cluster alias when communicating with a cluster resource. Rather, clients use one of the virtual addresses of the group that contains that resource.

4. On each cluster node, run the fssvr command with the /GETSECURITY qualifier, which is provided by Oracle Fail Safe. The /GETSECURITY qualifier displays security information about the system where the command is run. The command and its associated output should be similar to the following:

fssvr /getsecurity

Step 7 Configuration Tools Window and Associated Dialog Box: Enter a domain user account for Oracle Services for MSCS.

If the installation is successful, then the Configuration Tools window and the Oracle Services for MSCS Account/Password dialog box open. In the Oracle Services for MSCS Account/Password dialog box, enter:

1. A value in the Domain\Username box for a user account that has Administrator privileges. For example, if you are using the NEDCDOMAIN domain and your user name is cluadmin, then enter NEDCDOMAIN\cluadmin.
2. The password for the account in the Password and Confirm Password boxes.

Oracle Services for MSCS uses the account you specify to access the cluster. Oracle Services for MSCS runs as a Microsoft Windows service (called OracleMSCSServices) under a user account that must be a domain user account (not the system account) that has Administrator privileges on all nodes of this cluster. The account must be the same on all nodes of this cluster, or you will receive an error message when you attempt to connect to a cluster using Oracle Fail Safe Manager.
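Before connecting with Oracle Fail Safe Manager, the account rules above can be sanity-checked in bulk. This Python sketch is a hypothetical helper, not part of Oracle Fail Safe: it checks only the form of each account (Domain\Username, not the system account) and that the same account is used on every node. It cannot verify Administrator privileges, and treating names as case-insensitive is an assumption based on Windows account naming.

```python
import re
from typing import Dict, List

# Domain\Username: exactly one backslash separating two non-empty parts.
_DOMAIN_ACCOUNT = re.compile(r"^[^\\]+\\[^\\]+$")
# Built-in system account spellings that must not be used for OracleMSCSServices.
_SYSTEM_ACCOUNTS = {"localsystem", r"nt authority\system", r".\localsystem"}


def check_service_accounts(account_by_node: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the accounts look consistent."""
    problems = []
    for node, account in account_by_node.items():
        if account.lower() in _SYSTEM_ACCOUNTS:
            problems.append(f"{node}: {account} is the system account, not a domain user")
        elif not _DOMAIN_ACCOUNT.match(account):
            problems.append(f"{node}: {account} is not in Domain\\Username form")
    # Compare lowercased names, assuming Windows account names are case-insensitive.
    if len({a.lower() for a in account_by_node.values()}) > 1:
        problems.append("the account is not the same on all cluster nodes")
    return problems


print(check_service_accounts({"NODE1": r"NEDCDOMAIN\cluadmin", "NODE2": r"NEDCDOMAIN\other"}))
```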


3.1.1 Start Oracle Fail Safe Manager

After the installation is completed, start Oracle Fail Safe Manager from the Microsoft Windows taskbar by selecting Programs (or All Programs) from the Windows Start menu, then Oracle - ORACLE_HOME, then Oracle Fail Safe Manager. (ORACLE_HOME is the name of the Oracle home where you installed Oracle Fail Safe.)

When Oracle Fail Safe Manager opens, usually the Add Cluster to Tree dialog box also opens, as shown in Figure 3–1. If the Add Cluster to Tree dialog box does not open, from the File menu, select Add Cluster to Tree. In the Cluster Alias box, enter the alias for the cluster and then click OK.

Save as local preferred credentials: Select to have Oracle Fail Safe Manager save the account information you have entered to a text file, ORACLE_HOME\fs\fsmgr\FsClusters.txt, on the system from which you are running Oracle Fail Safe Manager. The password is saved in an encrypted format. This lets you disconnect and reconnect to the cluster (from your current system) without having to specify the account information each time a reconnection is requested.

Cluster Alias, User Name, Password, and Domain should all be entered. The Save as Local Preferred Credentials option is not a required choice on any Microsoft Windows system. If you do not specify a user name, password, or domain, Oracle Fail Safe attempts to connect to the cluster using the account with which you logged on to the server node.

Once a connection to the cluster is made, the Oracle Fail Safe Manager main window expands the tree view.


3.1.4 Verify the OracleMSCSServices Service Entry

On successful installation and verification of Oracle Services for MSCS, the Services Control Panel on each cluster node must include a new service entry named OracleMSCSServices.

To verify the OracleMSCSServices entry in the Services Control Panel:

1. Open the Windows Services window.
2. Scroll down to the Oracle service listings and locate the OracleMSCSServices entry.
   The Startup status for OracleMSCSServices is displayed as Started on the node where the Cluster Group resides, and it is displayed as Manual on the other cluster nodes.
3. Perform steps 1 and 2 on each cluster node.
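If you record what the Services window shows on each node, the expected pattern (Started on the node that owns the Cluster Group, Manual elsewhere) can be checked mechanically. This is a hypothetical Python helper; it assumes you collect the per-node values yourself and does not query Windows.

```python
from typing import Dict, List, Sequence


def check_oracle_mscs_services(
    nodes: Sequence[str],
    status_by_node: Dict[str, str],
    cluster_group_owner: str,
) -> List[str]:
    """Compare recorded OracleMSCSServices status values with the expected pattern."""
    problems = []
    for node in nodes:
        status = status_by_node.get(node)
        if status is None:
            problems.append(f"{node}: no OracleMSCSServices entry")
            continue
        expected = "Started" if node == cluster_group_owner else "Manual"
        if status != expected:
            problems.append(f"{node}: expected {expected}, found {status}")
    return problems


print(check_oracle_mscs_services(["NODE1", "NODE2"], {"NODE1": "Started", "NODE2": "Manual"}, "NODE1"))
```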

3.1.5 Verify That Oracle Services for MSCS Is in the Cluster Group

The Oracle Services for MSCS service is maintained by MSCS. On successful installation of Oracle Services for MSCS on each cluster node, start MSCS Cluster Administrator and verify that it includes Oracle Services for MSCS as a resource in the Cluster Group (the group containing the MSCS resources critical to cluster operation).

To verify that Oracle Services for MSCS is listed as a resource, start MSCS Cluster Administrator, then click Cluster Group in the Cluster Administrator tree view to select it, and, in the right-hand pane, locate the Oracle Services for MSCS entry in the Name column, as shown in Figure 3–3.

Figure 3–3 Oracle Services for MSCS in the Cluster Administrator Window

3.1.6 Verify That Oracle Resource DLLs Are Registered with MSCS

After installing Oracle Services for MSCS on all cluster nodes and verifying the cluster, start MSCS Cluster Administrator and verify that it includes the cluster resource types for Oracle Fail Safe.

For example, if you have the database installed on the cluster nodes, start MSCS Cluster Administrator. Then select Resource Types in the Cluster Administrator tree view, and, in the right-hand pane, locate the Oracle Database and Oracle TNS Listener entries in the Display Name column.


Manually Registering Oracle Resource DLL Files

Oracle Fail Safe provides resource dynamic-link library (DLL) files for the Oracle Database and Oracle TNS Listener. The DLL files enable the Cluster Service to communicate with and manage the Oracle Database and listener resources. Other Oracle resources that do not require specialized DLL files are managed as Generic Services.

The following topics are discussed in this appendix:

■ Oracle Resource DLL Files
■ Registering and Unregistering the Oracle Database Resource DLL Files

C.1 Oracle Resource DLL Files

Oracle Services for MSCS includes the resource DLL files shown in Table C–1. These files enable MSCS to communicate with and manage the Oracle resource types.

As with other cluster resources, you can apply all advanced properties of controlling the failover parameters to these Oracle resources. You can control:

■ How often MSCS should poll the Oracle resource health (Looks Alive, Is Alive polling intervals)
■ Whether a database resource should be restarted when it fails, and, if so, how many times MSCS should attempt to restart it before failing over to the other node
■ How long MSCS should wait before declaring failure of the resource (pending timeout) during the startup and shutdown of the resource

Table C–1 Oracle Resource DLL Files

File: FsResOdbs.dll
Type: Oracle Database, Oracle TNS Listener, and Oracle resource type DLL file
Description: Provides functions to allow the cluster to bring an Oracle resource online or offline and check the health of the resource through Is Alive polling. When the resource is online, the Oracle resource DLL guarantees that the resource is accessible by the client. Otherwise, the Is Alive polling fails.

File: FsResOdbsEx.dll
Type: Oracle Database resource administration extension DLL file
Description: Used by MSCS Cluster Administrator to display the properties of the Oracle Database resource.

File: FsResTnsLsnrEx.dll
Type: Oracle TNS Listener resource extension DLL file
Description: Used by MSCS Cluster Administrator to display the properties of the Oracle TNS Listener resource.
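The restart and pending-timeout controls listed above amount to a small decision rule. The Python sketch below is a simplified model, not MSCS's implementation: the default numbers are hypothetical, and it ignores the time window over which MSCS counts failures. It only shows how "restart up to N times, then fail over" and "declare failure after the pending timeout" interact.

```python
from dataclasses import dataclass


@dataclass
class FailurePolicy:
    restart: bool = True             # restart the resource when it fails?
    restart_attempts: int = 3        # hypothetical: restarts allowed before failing over
    pending_timeout_s: float = 180.0 # hypothetical: max seconds in a pending state


def action_after_failure(policy: FailurePolicy, failure_number: int) -> str:
    """Return "restart" or "fail over" for the failure_number-th consecutive failure."""
    if policy.restart and failure_number <= policy.restart_attempts:
        return "restart"
    return "fail over"


def pending_timed_out(policy: FailurePolicy, seconds_pending: float) -> bool:
    """True once a startup or shutdown has been pending longer than the timeout."""
    return seconds_pending > policy.pending_timeout_s


policy = FailurePolicy()
print([action_after_failure(policy, n) for n in range(1, 6)])
```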


C.2 Registering and Unregistering the Oracle Database Resource DLL Files

Typically, the Oracle Fail Safe Verify Cluster operation automatically verifies the Oracle Database and listener resource DLL files and their registration with the MSCS software. If the Verify Cluster operation finds that the DLL files are not registered, it registers them with the MSCS software. Using the Verify Cluster operation is the preferred method for registering DLL files.

However, if you find that the Oracle resource DLL files are not registered properly, then you can use the commands in Section C.2.1 and Section C.2.2 to manually register or unregister them.

C.2.1 Oracle Resource DLL Files

To register the Oracle Database resource DLL files, use the following commands:

fssvr /register "Oracle Database" FsResOdbs.dll
fssvr /register "Oracle TNS Listener" FsResOdbs.dll

To unregister the Oracle Database resource DLL files, use the following commands:

fssvr /unregister "Oracle Database"
fssvr /unregister "Oracle TNS Listener"
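Because each command pairs a resource type name with its DLL, the command lines can be generated from a single mapping, which keeps the quoting of the type names consistent. This Python sketch only builds the strings shown above (it does not run fssvr); the function names are hypothetical.

```python
from typing import Dict, List

# Resource type -> resource DLL, as listed above.
# Both types are implemented by the same DLL (see Table C-1).
RESOURCE_DLLS: Dict[str, str] = {
    "Oracle Database": "FsResOdbs.dll",
    "Oracle TNS Listener": "FsResOdbs.dll",
}


def register_commands(dlls: Dict[str, str] = RESOURCE_DLLS) -> List[str]:
    # Type names contain spaces, so they are quoted on the command line.
    return [f'fssvr /register "{rtype}" {dll}' for rtype, dll in dlls.items()]


def unregister_commands(dlls: Dict[str, str] = RESOURCE_DLLS) -> List[str]:
    # /unregister takes only the resource type name, not the DLL.
    return [f'fssvr /unregister "{rtype}"' for rtype in dlls]


if __name__ == "__main__":
    print("\n".join(register_commands()))
    print("\n".join(unregister_commands()))
```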


C.2.2 Oracle Resource Administrator Extension DLL Files

To register the Oracle Database resource administrator extension DLL files on the cluster nodes, use the following commands:

fsregadm /r FsResOdbsEx.dll
fsregadm /r FsResTnsLsnrEx.dll

To unregister the Oracle Database resource administrator extension DLL files on the cluster nodes, use the following commands:

fsregadm /u FsResOdbsEx.dll
fsregadm /u FsResTnsLsnrEx.dll

If MSCS Cluster Administrator is installed on a node that is not a member of a cluster, you need to register the Oracle Database resource administrator extension DLL with the cluster so that you can view Oracle Database resource parameters from MSCS Cluster Administrator. To register, use the fsregadm command. You must issue the command from the bin directory where Oracle Fail Safe Manager is installed (because Oracle Fail Safe Manager is not in the PATH environment variable). For example:

fsregadm /r /c Cluster1 FsResOdbsEx.dll
fsregadm /r /c Cluster1 FsResTnsLsnrEx.dll

You must specify the cluster name with the /c option; otherwise, the command fails.
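The fsregadm variants differ only in the action flag and whether a cluster name is supplied. This hypothetical Python helper builds the command strings for both extension DLLs. Because the text above shows /c only together with /r, the sketch refuses to combine /c with /u rather than guess at that form.

```python
from typing import List, Optional

EXTENSION_DLLS = ("FsResOdbsEx.dll", "FsResTnsLsnrEx.dll")


def fsregadm_commands(action: str, cluster: Optional[str] = None) -> List[str]:
    """Build fsregadm command lines for the administrator extension DLLs.

    action  -- "register" (/r) or "unregister" (/u)
    cluster -- cluster name for a machine that is not a cluster member; the
               documentation shows /c only with /r, so only that form is built
    """
    flags = {"register": "/r", "unregister": "/u"}
    if action not in flags:
        raise ValueError(f"unknown action: {action}")
    if cluster is not None:
        if action != "register":
            raise ValueError("/c is only documented together with /r")
        return [f"fsregadm /r /c {cluster} {dll}" for dll in EXTENSION_DLLS]
    return [f"fsregadm {flags[action]} {dll}" for dll in EXTENSION_DLLS]


print("\n".join(fsregadm_commands("register", cluster="Cluster1")))
```

Remember that, off the cluster, these commands must be run from the Oracle Fail Safe Manager bin directory.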