DB2 9 HADR: Build a Standby Database Using SnapMirror® and Create a Read/Write Copy of the Standby Using FlexClone™ (SnapMirror and FlexClone Technologies available on NetApp FAS or IBM N Series storage system)

Dale McInnis, IBM Toronto Lab; Jawahar Lal and Roger Sanders, Network Appliance

January 22, 2007 | TR-3494

Executive Summary

DB2 customers using NetApp FAS or IBM N Series storage systems at the back end can take advantage of advanced Data ONTAP® features such as SnapMirror, FlexVol™, FlexClone, and RAID-DP™ to ensure data availability and reliability. Combined with HADR, these features deliver the highest ROI. SnapMirror makes it possible to build a standby database in a few quick and easy steps. Similarly, FlexClone technology makes it possible to create an exact copy of the standby database without consuming additional space.


Table of Contents

1. Introduction
1.1. Purpose and Scope
1.2. DB2 High-Availability Disaster Recovery (HADR)
1.3. Automatic Client Rerouting
1.4. Basic HADR Commands
1.5. SnapMirror Technology
1.6. Aggregates and RAID-DP
1.7. FlexVol Technology
1.8. FlexClone Technology
1.9. The db2relocatedb Command
2. Commonly Used HADR Terms
3. Requirements and Assumptions
3.1. General Assumptions
3.2. Environment Assumptions
3.3. Security and Access Issues
3.4. HADR Setup Requirements
4. Infrastructure
4.1. Architecture – Build a Standby
4.2. Network Architecture
4.3. Create a Read/Write Copy of the Standby Database
5. Configuration
5.1. Configure Storage System
5.2. SnapMirror
6. Installing DB2 9
7. Create a DB2 Database
8. Build the Standby Database Using SnapMirror
9. Configure HADR
9.1. Configuring the Standby Database Server
9.2. Configuring the Primary Database Server
10. Create a Writable Copy of the Standby Database
10.1. Select a Database Server from Which to Access the Cloned Database
10.2. Clone a Standby Database
11. Recommendations
12. Summary
13. References


1. Introduction

In today's world, businesses operate around the globe and need to run 24x7 to serve their customers. To stay ahead of the competition and meet customer expectations, the computing systems serving the business must be highly reliable. To help, IBM integrated the high-availability disaster recovery (HADR) feature into DB2 version 8.2 and higher. Together with the automatic client rerouting capability, HADR enables customers to protect their DB2 databases against prolonged downtime.

The DB2 HADR feature provides robust protection against both partial and full site failures. In the event of a failure, the standby database can be made available to clients across the network until the primary database is repaired and returned to service. Failures may result from data corruption, a natural disaster at the source site, accidental deletion, sabotage, and so on. In the event of a disaster, all application traffic is rerouted to the standby database at the disaster recovery site for as long as it takes to recover the primary site. Once the primary database is repaired, it can rejoin the pair as a standby and catch up with the new primary database. The role of the new standby database can then be switched back to primary by a takeover operation.

Complementary technologies such as SnapMirror, FlexVol, FlexClone, and RAID-DP can be used in conjunction with DB2 HADR to take availability and reliability to new heights. SnapMirror, FlexVol, FlexClone, and RAID-DP are advanced technologies available on NetApp FAS and IBM N Series storage systems.

1.1. Purpose and Scope

The scope of this document is limited to building a DB2 HADR environment and creating a read/write copy of the standby database. This document demonstrates how to use SnapMirror to build the standby database for an HADR configuration and create a read/write copy of the standby using FlexClone technology.

SnapMirror makes it possible to replicate a database to a remote physical location known as the disaster recovery (DR) site. For DB2 HADR configuration purposes, SnapMirror is used to build and initialize a standby database at the DR site. Once the standby database is constructed, the DB2 HADR functionality takes over the responsibility of keeping the standby database up to date with its primary database.

1.2. DB2 High-Availability Disaster Recovery (HADR)

DB2 high-availability disaster recovery, referred to hereafter as HADR, is an alternative to traditional high-availability and disaster recovery solutions and is offered by IBM as a feature of DB2. HADR replicates data from a source (primary) database to a target (standby) database, and provides protection against both partial and complete site failures. Combined with the automatic client reroute capability, HADR provides transparency to the application regardless of the failure type, such as hardware, network, or software. The failure may result from a natural disaster such as an earthquake or flood, or from human error. HADR provides multiple levels of protection, allowing flexibility in the environment. Additionally, DB2 provides an easy-to-use wizard that allows the entire configuration to be set up in a matter of minutes.

HADR supports the following three modes of synchronization:

• Synchronous
• Near synchronous
• Asynchronous

By specifying a synchronization mode, customers can choose the level of protection they want against potential loss of data. For details on the synchronization modes, refer to the Data Recovery and High Availability Guide and Reference on the IBM Web site.
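The synchronization mode is controlled by the HADR_SYNCMODE database configuration parameter. As a minimal sketch, assuming the database name mydb used throughout this report, near-synchronous mode could be selected as follows:

db2 update db cfg for mydb using HADR_SYNCMODE NEARSYNC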

Applications can only access the current primary database. The standby database is kept up-to-date by rolling forward log data that is generated on the primary database and shipped to the standby database server.

HADR is available as part of DB2 Enterprise Server Edition at no extra charge. Users of DB2 Express and DB2 Workgroup Server Editions can add HADR to their servers by purchasing the DB2 HADR Option.

HADR has some restrictions and limitations, which the following list summarizes:

• HADR is not supported in a partitioned database environment.
• The primary and standby databases must have the same operating system version and the same version of DB2, except for a short time during a rolling upgrade.
• The DB2 release on the primary and standby databases must be the same bit size (32 or 64 bit).
• Reads on the standby database are not supported; clients cannot connect to the standby database.
• Log archiving can only be performed by the current primary database.
• Normal backup operations are not supported on the standby database.
• Operations that are not logged, such as changes to database configuration parameters and to the recovery history file, are not replicated to the standby database.
• Load operations with the COPY NO option specified are not supported.
• Use of data links is not supported.
• For a database's transaction logs, raw I/O (raw disk devices) is not supported.

1.3. Automatic Client Rerouting

The automatic client rerouting capability is a feature of DB2 that protects applications against communication failures with the database server so that applications continue to work with minimal or no interruption. If the automatic client rerouting feature is enabled and a communication failure is detected, all of the application's traffic to the primary server is transparently rerouted to an alternate server. In the case of HADR, the alternate server is the standby database server. Figure 1 illustrates the automatic client rerouting feature in an HADR environment, where all application connections are rerouted to the standby database after a failure is detected on the primary database.

Figure 1) Automatic client reroute and HADR.

Automatic client rerouting is only possible when an alternate database location has been specified at the server. If the alternate database location has not been specified, client applications will receive error message SQL30081, and no further attempts will be made to establish a connection with the server. If you set up and configured HADR using the HADR setup wizard in the Control Center, the automatic client rerouting feature is enabled by default.
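The alternate server can also be registered manually on the primary database server; a minimal sketch using the host name assumed in this report (the port number here is hypothetical and must match the standby instance's service port):

db2 update alternate server for database mydb using hostname stndhost port 50010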


The automatic client reroute is only supported with TCP/IP. The authentication type must be the same on the primary and the standby databases; otherwise, applications will not be able to connect to the standby database. On communication failure, all the session resources such as global temporary tables, identity, sequence, cursors, and server options for federated systems are lost. Applications have to reestablish session resources in order to continue the work.

1.4. Basic HADR Commands

In order to manage an HADR environment, you need to know the following three basic HADR commands:

• START HADR
• STOP HADR
• TAKEOVER HADR

The START HADR command is used to start HADR operations for a given database. If the database has not already been activated, this command activates it. The database assumes the role of primary or standby based on the role specified in the command. The syntax for this command is:

start hadr on [DBName] as [primary <by force> | standby]

Where DBName identifies the name of the database on which the HADR operation is to be started.

Note: Parameters shown in angle brackets (< >) are optional; parameters or options shown in square brackets ([ ]) are required and must be provided; a comma followed by ellipses (…) indicates that the preceding parameter can be repeated multiple times.

The STOP HADR command is used to stop HADR operations for the primary or the standby database. The database configuration parameters related to HADR remain unchanged so that the database can easily be reactivated as an HADR database. The syntax for this command is:

stop hadr on [DBName]

Where DBName identifies the name of the database on which the HADR operation is to be stopped.

Note: If you want to stop the HADR operation on a given database but you still want the database to maintain its role as either a primary or a standby database, do not issue the STOP HADR command. Instead, issue the DEACTIVATE DATABASE command. If you issue the STOP HADR command, the database will become a standard database and might require reinitialization in order to resume operations as an HADR database.

The TAKEOVER HADR command can be issued on a standby database only. This command instructs the standby database to take over the role of the primary database and start acting as the new primary. The syntax for this command is:

takeover hadr on [DBName] <by force>

Where DBName identifies the name/alias assigned to the current standby database.

You can manage HADR from the CLI or from the HADR Manage menu available in the DB2 Control Center.
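For example, using the database name assumed in this report, a typical command sequence might look like the following (a sketch; the standby is normally started before the primary):

db2 start hadr on mydb as standby
db2 start hadr on mydb as primary
db2 takeover hadr on mydb

The first command is issued on the standby server, the second on the primary server, and the third on the standby server when you want to switch roles.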

1.5. SnapMirror Technology

SnapMirror technology is a feature of Data ONTAP. It provides an efficient way to replicate data, over a network, from one NetApp FAS or IBM N Series storage system to another for backup and disaster recovery purposes. The storage system from which data is transferred is referred to as the SnapMirror source, and the storage system to which data is transferred is referred to as the SnapMirror destination. A SnapMirror source and its corresponding destination can reside on the same storage system or on two separate storage systems that are miles apart (provided that the two storage systems can communicate with each other over a network).

After an initial baseline transfer of the entire data set, subsequent updates transfer only new and changed data blocks from the source to the destination, which makes SnapMirror highly efficient in terms of network bandwidth utilization. At the end of each replication event, the SnapMirror destination volume is an identical copy of the SnapMirror source volume. The destination file system is available for read-only access, or the mirror can be "broken" to enable writes to occur at the destination. After the mirror has been broken, it can be reestablished by replicating the changes made at the destination back onto the source file system.

A SnapMirror environment is established by adding a second NetApp FAS or IBM N Series storage system to the network and configuring it as a SnapMirror destination. Figure 2 illustrates the initial asynchronous synchronization of a simple SnapMirror environment. In this example, a Snapshot™ copy is created at the start of the synchronization process and copied to the mirror destination. Then, all corresponding data blocks are copied to the destination in the background while data continues to change at the source.

Figure 2) Initial asynchronous synchronization of a simple SnapMirror environment.

During the subsequent synchronization process, another Snapshot copy is created and copied to the destination. Then, only the data blocks that have been added or changed since the last synchronization are copied to the destination N Series storage system. This has no effect on the blocks and Snapshot copy transferred during the previous synchronization process. Figure 3 illustrates a subsequent asynchronous synchronization of the SnapMirror environment shown in Figure 2.

Figure 3) Subsequent asynchronous synchronization of a simple SnapMirror environment.

Architecturally, SnapMirror software is a logical extension of the WAFL® file system, and in particular the Snapshot feature. Using Snapshot, you can create a read-only copy of an entire storage appliance volume. Two sequential Snapshot copies can then be compared and the differences identified. Since this comparison takes place at the block level, only the changed blocks need to be sent to the mirror target. By implementing the update transfers asynchronously, the data latency issues inherent in remote synchronous mirroring techniques are eliminated. The elegance of these two design features becomes particularly apparent when running mirror pairs over WAN topologies.

SnapMirror is a very powerful feature that can be used to meet various database replication needs. For HADR purposes, we use SnapMirror to create a standby database at the DR site.

For further information on SnapMirror technology, refer to the technical report SnapMirror Deployment and Implementation Guide (www.netapp.com/tech_library/ftp/3390.pdf) on the NOW™ (http://now.netapp.com) Web site.

1.6. Aggregates and RAID-DP

An aggregate is simply a pool of disks composed of one or more RAID groups. A RAID group is a collection of one or more data disks, along with a parity disk, and a double-parity disk if RAID-DP is used. The minimum number of disks allowed in a single RAID group on a NetApp FAS or IBM N Series storage system is 2 if a RAID4 configuration is used and 3 if a RAID-DP configuration is used.

The maximum number of disks allowed for a RAID-DP configuration is 28 (26 data disks and 2 parity disks); the maximum number allowed for a RAID4 configuration is 14 (13 data disks and 1 parity disk). The default RAID group type used for an aggregate is RAID-DP, but can be changed to RAID4.

Figure 4 illustrates two aggregates and their underlying RAID groups. In this example, the first aggregate (Aggregate A) consists of 2 RAID groups that are using RAID-DP. The second aggregate (Aggregate B) consists of 1 RAID group that is using RAID4. Aggregate A has a total of 12 data disks available for storage; Aggregate B has a total of 7 data disks.

Figure 4) Aggregate and underlying RAID groups.

If necessary, additional disks can be added to an aggregate after it has been created. However, disks must be added such that the RAID group specification for the aggregate remains intact. For example, in order to add disks to the first aggregate (Aggregate A) shown in Figure 4, a minimum of three disks would be required: one data, one parity, and one double-parity disk. These three disks would be placed in a third RAID group.
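As an illustration, an aggregate like the ones described above can be created with a single Data ONTAP command; a minimal sketch, assuming the aggregate name dbaggr01 used later in this report and an illustrative disk count of eight:

aggr create dbaggr01 -t raid_dp 8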


1.7. FlexVol Technology

A FlexVol volume is simply a storage container that can be sized according to how much data is to be stored in it, rather than according to the physical size of the disk drives used. FlexVol volumes are logical storage containers; therefore, they can be sized, resized, managed, and moved independently of their underlying physical storage. FlexVol volumes are not bound by the limitations of the disks on which they reside. Each volume depends on its containing aggregate for all of its physical storage, that is, for all storage in the aggregate's disks and RAID groups. Because a FlexVol volume is managed separately from its aggregate, you can create small FlexVol volumes (20 MB or larger), and you can increase or decrease the size of FlexVol volumes in increments as small as 4 KB.

A system administrator can reconfigure FlexVol volumes at any time. The reallocation of storage resources requires no downtime and is transparent to users, regardless of whether the FlexVol volume contains a file system or a LUN mapped to a host in a block (SAN) environment. Furthermore, resizing a FlexVol volume is nondisruptive to all clients connected to it.

A FlexVol volume uses all of the disk spindles available in its containing aggregate and therefore delivers improved performance. FlexVol capacity can also be overcommitted, such that the configured capacity of all the flexible volumes in an aggregate exceeds the total physical space available. Increasing the capacity of one FlexVol volume does not require changing the capacity of another FlexVol volume in the aggregate or changing the capacity of the aggregate itself.
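For illustration, creating a flexible volume and later growing it each take a single Data ONTAP command; a minimal sketch using names assumed in this report (the initial 100 GB size is hypothetical):

vol create pdbdata dbaggr01 100g
vol size pdbdata +20g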

1.8. FlexClone Technology

FlexClone is a very powerful feature introduced in Data ONTAP 7G that adds a new level of agility and efficiency to storage operations by allowing an individual to create an instant clone of a flexible volume (FlexVol volume). A FlexClone volume is a writable point-in-time image of a FlexVol volume or of another FlexClone volume. With FlexClone, it takes only a few seconds to create a clone of a FlexVol volume, and such a volume can be created without interrupting access to the parent volume on which the clone is based. The clone uses space very efficiently, allowing both the original FlexVol volume and the FlexClone volume to share common data, storing only the data that changes between the original volume and the clone. This provides a huge potential saving in storage space, resources, and cost. In addition, a FlexClone volume has all the features and capabilities of a regular FlexVol volume, including the ability to be grown or shrunk and the ability to be the source of another FlexClone volume.

The technology that makes this all possible is integral to how Data ONTAP manages storage. Data ONTAP uses the WAFL (Write Anywhere File Layout) file system to manage disk storage. New data that gets written to a volume doesn’t have to go on a specific spot on the disk; it can be written anywhere. WAFL then updates the metadata to integrate the newly written data into the right place in the file system. If the new data is meant to replace older data, and the older data is not part of a Snapshot copy, WAFL will mark the blocks containing the old data as “reusable.” This can happen asynchronously and does not affect performance. Snapshot copies work by making a copy of the metadata associated with a volume. Data ONTAP preserves pointers to all the disk blocks currently in use at the time a Snapshot copy is created. When a file is changed, the Snapshot copy still points to the disk blocks where the file existed before it was modified, and changes are written to new disk blocks. As data is changed in the parent FlexVol volume, the original data blocks stay associated with the Snapshot copy rather than getting marked for reuse. All the metadata updates are just pointer changes, and the storage system takes advantage of locality of reference, NVRAM, and RAID technology to keep everything fast and reliable.

You can think of a FlexClone volume as a transparent writable layer in front of a Snapshot copy. A FlexClone volume is writable, so it needs some physical space to store the data that is written to the clone. It uses the same mechanism used by Snapshot copies to get available blocks from the containing aggregate. Whereas a Snapshot copy simply links to existing data that was overwritten in the parent, a FlexClone volume stores the data written to it on disk (using WAFL) and then links to the new data as well. The disk space associated with the Snapshot copy and FlexClone is accounted for separately from the data in the parent FlexVol volume.

When a FlexClone volume is first created, it needs to know the parent FlexVol volume and a Snapshot copy of the parent to use as its base. The Snapshot copy can already exist, or it can get created automatically as part of the cloning process. The FlexClone volume gets a copy of the Snapshot copy metadata and then updates its metadata as the clone volume is created. Figure 5 illustrates how a FlexClone volume looks just after it is created.


The syntax for the command to create a FlexClone volume is:

vol clone create [FlexCloneName] –s none –v [FlexVolName] <SnapshotName>

Where
• FlexCloneName identifies the name assigned to the FlexClone volume being created.
• FlexVolName identifies the name assigned to the parent FlexVol volume.
• SnapshotName identifies the name assigned to a Snapshot copy of the parent FlexVol volume.
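For example, to clone the standby data volume used later in this report, you might execute something similar to the following on the storage system (a sketch that follows the syntax shown above; the Snapshot copy name sdbdata_snap is hypothetical, and option flags can vary by Data ONTAP release):

vol clone create sdbdata_cl –s none –v sdbdata sdbdata_snap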

Once a FlexClone volume is created, the parent FlexVol volume can change independently of the FlexClone volume because the Snapshot copy is there to keep track of the changes and prevent the original parent's blocks from being reused while the Snapshot copy exists. Figure 6 illustrates how FlexVol and FlexClone volumes can grow independently of each other. The Snapshot copy is read-only and can be efficiently reused as the base for multiple FlexClone volumes. Space is used very efficiently: the only new disk space consumed is the small amount of metadata plus any updates or additions made to either the parent FlexVol volume or the FlexClone volume. (FlexClone volumes leverage the Data ONTAP WAFL file system to store only changed blocks.) Initially, a clone and its parent share the same storage; more storage space is consumed only when one volume or the other is changed.


Figure 6) FlexVol and FlexClone data addition/update are independent of each other.

To the storage administrator, a FlexClone volume is just like any other FlexVol volume and has all of the properties and capabilities of a FlexVol volume. Using the CLI, FilerView®, or DataFabric® Manager, one can manage FlexVol volumes, Snapshot copies, and FlexClone volumes, including getting their status and seeing the relationships between the parent, Snapshot copy, and clone. A FlexClone volume has the following restrictions:

• Data ONTAP forbids operations that would destroy the parent FlexVol volume or base Snapshot copy while dependent FlexClone volumes exist.
• Management information in external files (e.g., /etc) associated with the parent FlexVol volume is not copied.
• Quotas for the clone volume are reset rather than added to those of the parent FlexVol volume.
• LUNs in the cloned volume are automatically marked offline until they are uniquely mapped to a host system.

Lastly, a FlexClone volume can be split from its parent to create a fully independent volume. However, splitting a clone requires adequate free space in the aggregate, because the blocks shared between the parent and the clone must be copied. While the split is occurring, free blocks in the aggregate are used to copy the shared blocks; this incurs disk I/O and can potentially compete with other disk operations in the aggregate. The copy operation also uses some CPU and memory resources, which may impact the performance of a fully loaded storage appliance. Data ONTAP addresses these potential issues by completing the split operation in the background and by setting priorities in a way that does not significantly impact foreground operations. It is also possible to manually stop and restart the split operation if some critical job requires the full resources of the storage system.
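A minimal sketch of the relevant Data ONTAP commands, using the clone volume name assumed in this report (the split runs in the background once started):

vol clone split start sdbdata_cl
vol clone split status sdbdata_cl
vol clone split stop sdbdata_cl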

Combined with DB2 HADR, FlexClone can deliver a read/write copy of the standby database at the DR site that initially consumes no extra space. The cloned database has no impact on the performance of the primary database, and the HADR relationship remains intact.

1.9. The db2relocatedb Command

The db2relocatedb command allows a DBA to change the location of one or more tablespace containers, or of an entire database, without having to perform a backup and a redirected restore operation. It also provides a way to rename a database and/or change the instance to which a database belongs, per specifications in a configuration file provided by the user. When executed, this command makes the necessary changes to the DB2 instance and the appropriate database support files. From the database cloning perspective, it is used to rename the cloned database, change the DB2 instance with which the clone is associated, and change the tablespace container metadata for the tablespace containers associated with the clone.

To rename a database clone and update the metadata for its tablespace containers, you would execute the following command on the database server:

db2relocatedb -f [ConfigFile]

Where
• ConfigFile identifies the name of a configuration file that contains the information needed to alter the DB2-specific metadata stored in files associated with the database.

You need to specify new and old metadata for the clone in a configuration file; the configuration file must adhere to the following format:

DB_NAME=OldName,NewName
DB_PATH=OldPath,NewPath
INSTANCE=OldInst,NewInst
NODENUM=NodeNumber
LOG_DIR=OldDirPath,NewDirPath
CONT_PATH=OldContPath1,NewContPath1
CONT_PATH=OldContPath2,NewContPath2
...
STORAGE_PATH=OldStoragePath1,NewStoragePath1
STORAGE_PATH=OldStoragePath2,NewStoragePath2

For example, a simple configuration file might consist of the following lines:

DB_NAME=mydb,mydbcl
DB_PATH=/mnt/sdbdata,/mnt/sdbdata_cl
INSTANCE=db2inst1,db2instc
NODENUM=0
LOG_DIR=/mnt/sdblogs,/mnt/sdblogs_cl
STORAGE_PATH=/mnt/sdbdata/*,/mnt/sdbdata_cl/*

It is important to note that if the database has automatic storage enabled, you must specify changes to the location of the database storage paths (using the STORAGE_PATH parameter), not the tablespace containers (using the CONT_PATH parameter).

2. Commonly Used HADR Terms

Before we dive into the details of the solution presented here, it is important to become familiar with some common high-availability and disaster recovery terms.

1) Disaster Recovery (DR)

The term disaster recovery is normally used to describe the situation in which an entire site serving mission-critical data goes down. To protect against disasters such as flood, fire, and earthquake, and against human acts such as terrorism, a computing environment needs a backup arrangement at a secondary site located a significant distance away. The secondary site is known as the disaster recovery (DR) site. In the event of a disaster, the computing infrastructure at the DR site is used to serve the business needs.

2) High Availability (HA)

High availability is an architecture that maximizes data availability; HA is a subcategory of DR, and the ultimate disaster-tolerant system is classed as a high-availability (HA) system. HA systems are designed to eliminate application downtime by using redundant hardware and networking components along with specialized application and operating system software. An HA system can seamlessly route around failures in the computing infrastructure without affecting end-user access to data. An HA system must be able to:

• Process transactions efficiently, without substantial performance degradation (or loss of availability).
• Recover quickly from hardware or software failures or when disaster strikes.
• Transfer workload from one system to another automatically in the event of a hardware or software failure. This workload transfer is known as failover capability.


3) HADR - Primary and Standby Database

HADR replicates database changes from one database to a second database at a remote location. The database from which data changes are transferred is known as the primary database, and the database to which data changes are transferred is known as the standby database. In the event of a disaster, when the primary database goes down, the standby database takes over the role of the primary and all application traffic is rerouted to it. The old primary can rejoin the HADR pair as a standby after the damage is repaired and the database is reactivated.

4) Failover

Failover capability allows automatic transfer of a workload from one system to another in the case of a failure. The workload transfer is transparent to the clients connected to the system. Failover strategies are usually based on clusters of systems. A cluster is a group of connected systems that work together as a single system. Each physical machine within a cluster contains one or more logical nodes. Clustering allows servers to back each other up when failures occur, by picking up the workload of the failed server.

Failover software can use heartbeat monitoring or keep-alive packets between systems to confirm availability. Heartbeat monitoring involves system services that maintain constant communication between all the nodes in a cluster. If a heartbeat is not detected, failover to a backup system starts. End users are usually not aware that a system has failed.

The two most common failover strategies on the market are known as idle standby and mutual takeover, although the configurations associated with these terms might vary depending on the vendor.

• Idle Standby - In this configuration, one system is used to run a DB2 instance, and the second system is “idle”, or in standby mode, ready to take over the instance if there is an operating system or hardware failure involving the first system. Overall system performance is not impacted, because the standby system is idle until needed. DB2 HADR operates in this mode.

• Mutual Takeover - In this configuration, each system is the designated backup for another system. Overall system performance can be impacted, because the backup system must do extra work following a failover: it must do its own work plus the work that was being done by the failed system.

5) Archive Logging

Archive logging is a database feature that enables retention of transaction logs (the retained transaction logs are known as archive logs). Using archive and active logs, roll-forward recovery of a database is possible to any point in time before the failure occurred, rather than only to the point in time of a full backup. The archived logs can be moved offline and still be used for roll-forward recovery.

6) Application-Coordinated Snapshot Copy

An application-coordinated Snapshot copy is a Snapshot copy that is created manually after suspending all database writes. Write-suspend mode guarantees database consistency; therefore, database recovery from an application-coordinated Snapshot copy is guaranteed.

3. Requirements and Assumptions

3.1. General Assumptions

To get the maximum benefit from the procedures and steps described in this document, readers are assumed to be familiar with the following:

• Commands and operations of Data ONTAP
• Administration and operations of a DB2 instance and database
• DB2 utilities such as db2inidb and db2relocatedb
• UNIX® system administration commands
• File and block access protocols such as Fibre Channel and iSCSI

It is assumed that the storage systems used are loaded with Data ONTAP 7.0 or later and are licensed for NFS, FCP, iSCSI, FlexClone, and SnapMirror. Additionally, license keys for SnapMirror sync are required for both source and destination storage systems, if synchronous SnapMirror is to be used.


It is also assumed that the AIX hosts used to access the primary database and the standby database have the following software and utilities installed and configured:

• DB2 9 Enterprise Server Edition
• For a SAN environment, a supported HBA, SanSurfer utility, and host attach kit

Host attach kits can be downloaded from the NOW Web site (now.netapp.com) or from the IBM Web site (www-03.ibm.com/servers/storage/support/allproducts/downloading.html).

3.2. Environment Assumptions

This technical document covers building a standby database using SnapMirror technology and creating a read/write copy of the standby database using FlexClone technology. The database storage containers for the databases used for this test reside on NetApp FAS or IBM N Series storage systems. The sample scripts and steps in this technical report assume the following:

• The primary database's tablespace containers reside on a storage system named primstore.
• The standby database's tablespace containers reside on a storage system named stndstore.
• The database host system used to access the primary database is primhost.
• The database host system used to access the standby database is stndhost.
• The name of the database on the primary and standby database servers is mydb.
• The name of the cloned database on the standby database server is mydbcl.
• The name of the aggregate on both storage systems is dbaggr01.
• The name of the FlexVol volume used to store the primary database's table data is pdbdata.
• The name of the FlexVol volume used to store the primary database's overhead files is pdb.
• The name of the FlexVol volume used to store the primary database's transaction logs is pdblogs.
• The name of the FlexVol volume used to store the primary database's archive logs is pdbarch.
• The name of the FlexVol volume used to store the standby database's overhead files is sdb.
• The name of the FlexVol volume used to store the standby database's table data is sdbdata.
• The name of the FlexVol volume used to store the standby database's transaction logs is sdblogs.
• The name of the clone volume created from the FlexVol volume named sdb is sdb_cl.
• The name of the clone volume created from the FlexVol volume named sdbdata is sdbdata_cl.
• The name of the clone volume created from the FlexVol volume named sdblogs is sdblogs_cl.
• The mount point names used to mount FlexVol volumes for the primary and standby databases are /mnt/mydb, /mnt/dbdata, and /mnt/dblogs.
• The mount point names used to mount the cloned volumes are /mnt/mydb_cl, /mnt/dbdata_cl, and /mnt/dblogs_cl.

The scripts contained in this document may require significant modifications to run under your version of UNIX.

3.3. Security and Access Issues

You need to make sure that each FlexVol volume used for the DB2 database's data and transaction logs has its security style set to UNIX. The security style can be updated by executing the following command on the storage system:

qtree security [FlexVolPath] unix

Where
• FlexVolPath identifies the flexible volume path on the storage system that is used for the database.

For example, to update the security style of a FlexVol volume named pdbdata that resides on storage system named primstore, you would execute the following command:

rsh primstore qtree security /vol/pdbdata unix

Repeat this step and change the security style for all the volumes to be used for the database.

3.4. HADR Setup Requirements


The basic requirements for HADR setup are:

• The primary and standby databases must have the same database name; the instance names may or may not be the same.

• Since buffer pool operations are also replayed on the standby database, it is important that the primary and standby databases have the same amount of memory.

• Tablespaces must be identical on the primary and standby databases. Properties that must be identical include the tablespace type (DMS or SMS), tablespace size, container path, container size, and container file type.

• As much as possible, the configuration parameters for the database manager and database should be identical for the primary and standby databases.

4. Infrastructure

4.1. Architecture – Build a Standby

There are many ways to build a standby database for an HADR environment. This document describes the use of Network Appliance™ SnapMirror technology to put together a standby database. As stated earlier, the SnapMirror feature is available on NetApp FAS or IBM N Series storage systems running a supported version of Data ONTAP. Figure 7 illustrates the basic architecture for building a standby database in an HADR environment using SnapMirror technology.

Figure 7) HADR basic architecture to build a standby database using SnapMirror technology.

4.2. Network Architecture

To produce this technical document, we used two AIX database servers that had access to two storage systems over a LAN. Figure 8 illustrates the basic network architecture used for our HADR test environment.


Figure 8) Network architecture for a basic HADR environment.

4.3. Create a Read/Write Copy of the Standby Database

DB2 HADR is based on an active-passive architecture: the standby database can't be accessed until a takeover operation is performed and its role is switched to primary. Customers who use NetApp FAS or IBM N Series storage systems for database storage can take advantage of FlexClone technology to create a clone of the standby database without impacting the HADR relationship. The standby database clone can be created in a matter of seconds, without any additional storage space requirement. Figure 9 illustrates how a cloned database looks just after creation.

Figure 9) A cloned database just after its creation.

5. Configuration

5.1. Configure Storage System


1) Update the /etc/hosts file.

The storage systems must be able to communicate with the database servers and vice versa. A storage system can communicate with a database server if there is an entry for the database server in its /etc/hosts file or, alternatively, if it uses another host name resolution technique such as NIS or DNS. By default, the /etc/hosts file is checked first for host name resolution. The easiest way to update the /etc/hosts file on the storage system is by using FilerView. Entries made in the /etc/hosts file should look similar to the following:

[HostIP] [HostName]

Where
• HostIP identifies the IP address assigned to the database server.
• HostName identifies the name assigned to the database server.

For example, to add an entry for a database server named primhost that has the IP address 172.17.32.112, you would add the following line to the /etc/hosts file on the storage system:

172.17.32.112 primhost

Repeat this step on the database servers and storage systems to update all appropriate /etc/hosts files.

2) Enable 'rsh' access for the database servers.

In order to use the ‘rsh’ (remote shell) command from the database server, you need to perform two steps. First, enable the ‘rsh’ option on the storage system by executing the following command:

options rsh.enable on

Then, add the database host and user name entry to the /etc/hosts.equiv file on the storage system. An entry in this file looks similar to the following:

[HostName] [UserName]

Where
• HostName identifies the name assigned to the database server.
• UserName identifies the user who needs rsh access to the storage system.

For example, to allow ‘rsh’ command execution from a database server named primhost for a user named db2inst1, you would add the following line to the /etc/hosts.equiv file on the storage system:

primhost db2inst1

5.2. SnapMirror

Configuring SnapMirror for HADR purposes is very straightforward and can be done by completing the following simple steps:

1) Apply the license key for SnapMirror.

SnapMirror is a licensed product; therefore, you need to apply the appropriate license key for SnapMirror on the storage systems used in your HADR environment. You can add license keys by executing the following command on a storage system:

license add [KeyCode]

Where KeyCode identifies the license key required for SnapMirror.


For example, to apply a license key code ABCQW1234, you would execute the following command on the storage system:

license add ABCQW1234

2) Grant the destination storage system access to the SnapMirror source.

In order to start SnapMirror replication, the SnapMirror destination storage system must have access to the SnapMirror source storage system. You can grant access by updating an option named snapmirror.access. The default setting for the snapmirror.access option is legacy, but you can change it by executing the following command on the source storage system:

options snapmirror.access host=[StorageSystemName]

Where StorageSystemName identifies the name assigned to the SnapMirror destination storage system.

For example, you would execute the following command on the SnapMirror source storage system to allow SnapMirror access to a storage system named stndstore:

options snapmirror.access host=stndstore

3) Identify the source and target volumes.

Identify the volumes that are used by the primary and standby databases to store data and transaction logs. The volumes used for the primary database will be the SnapMirror sources, and the volumes used for the standby database will be the SnapMirror targets.

4) Enable SnapMirror.

Finally, the SnapMirror feature must be enabled on both the source and the destination storage systems by executing the following command:

options snapmirror.enable on
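If you want scheduled asynchronous updates, the replication schedule is defined in the /etc/snapmirror.conf file on the destination storage system. A minimal sketch using the volume names assumed in this report (this entry triggers an update at the top of every hour):

primstore:pdbdata stndstore:sdbdata - 0 * * *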

6. Installing DB2 9

Install DB2 9 on the primary and standby database servers by following the steps described in the Quick Beginnings for DB2 Servers guide, which can be downloaded from the IBM Web site. If you already have working DB2 instances, you can skip this step.

For our test environment, we installed DB2 9 for AIX on two AIX servers named primhost and stndhost and configured primhost for the primary database and stndhost for the standby database.

7. Create a DB2 Database

In order to create a DB2 database on a NetApp FAS or IBM N Series storage system, you need to perform a few configuration steps on the database server and the storage system. For detailed configuration steps, refer to the technical report "IBM DB2 UDB Enterprise Server Edition V8 for UNIX: Integrating with a NetApp Storage System" (www.netapp.com/library/tr/3272.pdf). After completing the appropriate configuration steps on the database server and the storage system, you will need to create the following storage objects:

• An aggregate named dbaggr01 (create one aggregate on each storage system used)
• Flexible volumes named pdbdata, pdb, pdblogs, and pdbarch within the aggregate dbaggr01 on the storage system that is used by the primary database
• Flexible volumes named sdbdata, sdb, and sdblogs within the aggregate dbaggr01 on the storage system that is used by the standby database


For SAN environments, you will need to perform the following additional steps:

• Create a LUN named /vol/pdb/pdb within the FlexVol volume pdb.
• Create a LUN named /vol/pdbdata/data within the FlexVol volume pdbdata.
• Create a LUN named /vol/pdblogs/logs within the FlexVol volume pdblogs.
• Create a LUN named /vol/sdb/sdb within the FlexVol volume sdb.
• Create a LUN named /vol/sdbdata/data within the FlexVol volume sdbdata.
• Create a LUN named /vol/sdblogs/logs within the FlexVol volume sdblogs.
• Create an igroup named primhost_fcp_igp for the database server primhost.
• Create an igroup named stndhost_fcp_igp for the database server stndhost.
• On the storage system named primstore, create mappings for the LUNs named /vol/pdb/pdb, /vol/pdbdata/data, and /vol/pdblogs/logs to the igroup named primhost_fcp_igp, using IDs 0, 1, and 2, respectively.
• On the storage system named stndstore, create mappings for the LUNs named /vol/sdb/sdb, /vol/sdbdata/data, and /vol/sdblogs/logs to the igroup named stndhost_fcp_igp, using IDs 0, 1, and 2, respectively.
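The following is a minimal sketch of the corresponding Data ONTAP commands for one LUN on the primary storage system (the LUN size and the initiator WWPN are hypothetical; repeat for the other volumes and on stndstore with the appropriate names):

lun create -s 50g -t aix /vol/pdbdata/data
igroup create -f -t aix primhost_fcp_igp 10:00:00:00:c9:2b:cc:01
lun map /vol/pdbdata/data primhost_fcp_igp 1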

After you have created storage space on the storage system, you need to mount the FlexVol volumes or LUNs on the database server. For detailed information on integrating a NetApp FAS or IBM N Series storage system, refer to the technical paper "IBM DB2 UDB Enterprise Server Edition V8 for UNIX: Integrating with a NetApp Storage System" (www.netapp.com/library/tr/3272.pdf) on the NetApp Web site. After completing the steps described above, the storage system should be ready to receive a DB2 database, which you can create by completing the following steps:

1) Create a DB2 instance.

If an instance does not already exist, log in as the user root on the database server and create a DB2 instance by executing the following command:

[DB2Dir]/instance/db2icrt –u [FencedID] [InstanceName]

Where
• DB2Dir identifies the directory where the DB2 software was installed:
  – On AIX, HP-UX, and Solaris™ operating systems, the default DB2 9 installation directory is /opt/IBM/db2/V9.
  – On Linux® operating systems, the default installation directory is /opt/ibm/db2/V9.
• FencedID identifies the ID of the user under which fenced user-defined functions and fenced stored procedures will run.
• InstanceName identifies the name that is to be assigned to the new instance.

For example, to create a database instance named db2inst1, you would execute the following command on the database server:

/opt/IBM/db2/V9/instance/db2icrt –u db2inst1 db2inst1

2) Create a database.

DB2 9 allows you to create the database and its default tablespaces in separate locations. With automatic storage enabled, you can specify different paths for the database and the default tablespaces. To do so, log in as the database instance owner and execute the following command:

db2 “create database [DBName] automatic storage YES on [Mount1] dbpath [Mount2]”

Where
• DBName identifies the name that is to be assigned to the new database once it has been created.
• Mount1 identifies the drive or path where the default tablespaces for the new database are to be created.
• Mount2 identifies the drive or path where the new database is to be created.


For example, to create a database named mydb on a path named /mnt/mydb and default tablespaces on another path named /mnt/dbdata, you would execute the following command on the database server:

db2 “create database mydb automatic storage yes on /mnt/dbdata dbpath /mnt/mydb”

In the example illustrated above, /mnt/dbdata and /mnt/mydb are the mount points for the volumes on the storage system. You can specify multiple paths or drives for tablespace storage locations in the CREATE DATABASE command.

3) Change the transaction log file path.

As a good practice, transaction logs and data files should be stored on separate disks. In fact, when a DB2 database is stored on a NetApp FAS or IBM N Series storage system volume, the transaction log files for the database should be stored on a separate volume. In order to change the location for transaction log files, you must execute the following command on the database server:

db2 update db cfg for [DBName] using NEWLOGPATH [NewLogLocation]

Where
• DBName identifies the name assigned to the database whose log file storage location is to be changed.
• NewLogLocation identifies the new location where the database's transaction log files are to be stored.

For example, to change the log directory from the default location to a directory named /mnt/dblogs, you would execute the following command on the database server:

db2 update db cfg for mydb using NEWLOGPATH /mnt/dblogs

Note: The new log path setting will not become effective until all the users are disconnected and the database is deactivated. When the first connection is made after reactivating the database, the database manager will move the transaction log files to the new location.

Note: To provide a higher level of protection for your database transaction log files, it is recommended that you mirror the log files to another drive. If the log files in the primary log path get corrupted or lost, you can use the mirrored log files to recover the database. Transaction log files can be mirrored by configuring the MIRRORLOGPATH database configuration parameter. If a database has MIRRORLOGPATH configured, DB2 creates active log files in both the log path and the mirror log path, and all log data is written to both paths. You can set the MIRRORLOGPATH configuration parameter by executing the following command:

db2 update db cfg for [DBName] using MIRRORLOGPATH [MirrorLocation]

Where
• DBName identifies the name assigned to the database whose log file mirror location is to be configured.
• MirrorLocation identifies the new location where the transaction log files are to be mirrored.

For example, to mirror database logs to a directory named /mnt/logmrr, you would execute the following command on the database server:

db2 update db cfg for mydb using MIRRORLOGPATH /mnt/logmrr

In a NetApp FAS or IBM N Series storage environment, you need to create an additional volume on the storage system for storing the mirrored transaction log files.

4) Change the logging method from circular to archive and make other configuration changes.

By default, DB2 uses a circular logging mechanism for the database, which doesn’t support roll-forward recovery. However, most production databases require roll-forward capability. Archive logging mode supports roll-forward recovery. In order to switch a DB2 9 database’s logging mode from circular to archive, you need to update the primary log archive method (LOGARCHMETH1) database configuration parameter. This configuration parameter can be updated by executing the following command on the database server:

db2 update db cfg for [DBName] using LOGARCHMETH1 DISK:[ArchiveDir]


Where
• DBName identifies the name assigned to the database whose logging method is to be changed.
• ArchiveDir identifies the directory (location) where archived transaction log files are to be stored.

For example, to switch logging mode from circular to archive for a DB2 9 database named mydb and place the archive log files in a directory named /mnt/dbarch/mydb, you would execute the following command:

db2 update db cfg for mydb using LOGARCHMETH1 DISK:/mnt/dbarch/mydb

If you want to retain a second copy of archive log files on another disk, you need to update the secondary log archive method (LOGARCHMETH2) database configuration parameter. This configuration parameter can be updated by executing the following command:

db2 update db cfg for [DBName] using LOGARCHMETH2 DISK:[ArchiveDir]

Where
• DBName identifies the name assigned to the database for which duplex logging is to be enabled.
• ArchiveDir identifies the directory (location) where the second copy of the archived transaction log files is to be stored.

For example, to store and maintain a second set of archive logs for a DB2 database named mydb in a directory named /mnt/dbarch2/mydb, you would execute the following command on the database server:

db2 update db cfg for mydb using LOGARCHMETH2 DISK:/mnt/dbarch2/mydb

As a best practice for HADR database, you should update two additional parameters named INDEXREC and LOGINDEXBUILD by executing the following command on the database server:

db2 update db cfg for mydb using INDEXREC access
db2 update db cfg for mydb using LOGINDEXBUILD on

For further information on the INDEXREC and LOGINDEXBUILD parameters refer to section 11 of this document.

5) Back up the primary database.

After the logging method is changed from circular to archive, the database is placed into backup-pending state and can no longer be used until a full offline backup copy of the database has been created. An offline backup copy of the database can be created by executing the following commands on the database server:

db2 force application all
db2 backup database [DBName] to [BackupDir]

Where:
• DBName identifies the name assigned to the database that is to be backed up.
• BackupDir identifies the directory (location) where backup images are to be stored.

For example, to create an offline backup copy of the database named mydb in a directory named /dbbackup/mydb, you would execute the following commands on the database server:

db2 force application all
db2 backup database mydb to /dbbackup/mydb
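If desired, you can confirm that the backup completed and check the integrity of the backup image with the db2ckbkp utility. The image file name shown below is illustrative only; actual names include the instance name and a timestamp:

db2 list history backup all for mydb
db2ckbkp /dbbackup/mydb/MYDB.0.db2inst1.NODE0000.CATN0000.20070122153000.001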

After completing these steps, the primary database is ready for use. You can now begin creating database objects and start using the database.


8. Build the Standby Database Using SnapMirror

In an HADR configuration, the standby database plays the role of a hot standby. All the log changes generated at the primary database are shipped to the standby database; the standby database keeps replaying the log records it receives from the primary to stay synchronized with the primary.

In order to build a standby database for an HADR configuration, you need to reconstruct an exact copy of the primary database at a DR site. Customers have the following options:

– Restore the primary database from a backup copy at the DR site.
– Use split mirror functionality combined with the DB2 initialization tool named db2inidb.
– Use SnapMirror to replicate the primary database's data and transaction log files to the DR site. In order to take advantage of this option, you need to use NetApp FAS or IBM N Series storage systems for database storage.

This document describes how to use SnapMirror to build the standby database. To learn more about other methods that can be used to build a standby database, refer to the DB2 documentation on the IBM Web site.

To build a standby database using SnapMirror technology, you need to complete the following steps:

1) Temporarily suspend writes to the primary database.

The SnapMirror data replication process is based on Snapshot technology. Therefore, you need to make sure that at the time the Snapshot copies are created the database is in a consistent state. To place the primary database in a consistent state, you need to temporarily suspend all disk write operations to the primary database. Database write operations can be suspended by executing the following command on the database server:

db2 set write suspend for database

The SET WRITE SUSPEND FOR DATABASE command causes the DB2 database manager to suspend all write operations to tablespace containers and log files that are associated with the current database. Read-only transactions continue uninterrupted, and users can establish new connections to the database; however, if a new connection requires dirty buffer pool pages to be flushed, the connection may appear to hang. The Snapshot creation process completes very quickly, so the database doesn't have to stay in write-suspend mode for more than a few seconds.

2) Replicate the database's data and transaction logs to the DR site.

After database I/O has been suspended, you need to initialize the SnapMirror process for each volume that is used by the primary database. The SnapMirror initialization process creates a Snapshot copy of the source volumes and transfers it to the destination volume. After the Snapshot copy transfer is completed, SnapMirror starts transferring data in the background. The source volume can continue normal operation without any kind of interruption. You can initialize the SnapMirror process by executing the following command on the SnapMirror destination storage system:

snapmirror initialize -S [SourceStorageSystem]:[SourceVolumeName] [StandbyStorageSystem]:[DestinationVolumeName]

Where:
• SourceStorageSystem identifies the name assigned to the storage system that is used by the primary database.
• StandbyStorageSystem identifies the name assigned to the storage system that is used by the standby database.
• SourceVolumeName identifies the name assigned to the volume that is used by the primary database.
• DestinationVolumeName identifies the name assigned to the volume that is used by the standby database.

For example, to initialize the one-time baseline data transfer from the volume named pdbdata that resides on a storage system named primstore to a volume named sdbdata that resides on a storage system named stndstore, you would execute the following command on the standby database server:


rsh stndstore snapmirror initialize -S primstore:pdbdata stndstore:sdbdata

In the above example, stndstore is the name of the storage system where the FlexVol volumes for the standby database reside.

Repeat this step for each FlexVol volume that is used for the primary database’s data and transaction logs.

3) Resume writes for the primary database.

After initializing the SnapMirror process for each FlexVol volume, connect to the primary database and resume I/O by executing the following command on the database server:

db2 set write resume for database
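If you have several volumes, steps 1 through 3 can be combined into a short script. The following sketch uses the example storage system and volume names from this section and assumes rsh access to the storage system is already configured; add an initialize line for each additional volume (for example, the archive log volume pdbarch):

db2 connect to mydb
db2 set write suspend for database
rsh stndstore snapmirror initialize -S primstore:pdbdata stndstore:sdbdata
rsh stndstore snapmirror initialize -S primstore:pdblogs stndstore:sdblogs
db2 set write resume for database
db2 connect reset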

4) Check the SnapMirror status

The SnapMirror process keeps replicating data in the background. You can check the status of the transfer by executing the following command on the standby database server:

rsh stndstore snapmirror status

The output from the above command should look similar to the following:

Source              Destination         State         Lag      Status
primstore:pdbarch   stndstore:sdbarch   Snapmirrored  0:01:51  Idle
primstore:pdbdata   stndstore:sdbdata   Snapmirrored  0:00:51  Idle
primstore:pdblogs   stndstore:sdblogs   Snapmirrored  0:01:11  Idle

In the above output, the state Snapmirrored indicates that the baseline data transfer for that particular relationship has been completed.
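If you prefer not to poll by hand, a small loop can wait for the baseline transfers to finish. This is a minimal sketch that assumes the example destination volume names used above:

for vol in sdbarch sdbdata sdblogs; do
    until rsh stndstore snapmirror status $vol | grep -q Snapmirrored; do
        echo "Waiting for $vol baseline transfer to complete..."
        sleep 30
    done
done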

5) Break SnapMirror relationship.

SnapMirror destination volumes are restricted and not available for writes. In order to make these volumes available for write operations, you need to break the SnapMirror relationship after SnapMirror has completed the transfer of data. You can break a SnapMirror relationship by executing the following command on the SnapMirror destination storage system:

snapmirror break [StandbyStorageSystem]:[DestinationVolumeName]

Where:
• StandbyStorageSystem identifies the name assigned to the storage system that is used for the standby database.
• DestinationVolumeName identifies the name assigned to a volume that is used by the standby database.

For example, to break the SnapMirror relationship for a volume named sdbdata that resides on the destination storage system, you would execute the following command on the standby database server:

rsh stndstore snapmirror break stndstore:sdbdata

Repeat this step for each volume that is used for the primary database’s data or transaction logs.

6) Mount the volumes on the standby database server.

Create mount points on the standby database server and mount the volumes that were populated by SnapMirror. The mount points used for mounting the volumes must have the same names as the mount points for the volumes on the primary database server. If the mount point names are different, you have to use the db2relocatedb command to relocate the tablespace containers.

7) Change the ownership of the file system.


In order to operate DB2 successfully, the DB2 instance owner should have ownership of the file systems on the FlexVol volume that is mounted on the database server. The ownership can be changed by executing the following command on the database server:

chown -R [InstanceOwner]:[InstanceOwnerGroup] [FileSystem]

Where:
• InstanceOwner identifies the name assigned to the user who owns the database instance.
• InstanceOwnerGroup identifies the name assigned to the group of the user who owns the database instance.
• FileSystem identifies the name of the file system whose ownership is to be changed.

For example, to change the ownership of the file system mounted on the mountpoint named /mnt/dbdata, you would execute the following command on the standby database server:

chown -R db2inst1:db2adm /mnt/dbdata

8) Catalog the standby database.

Next, you need to catalog the standby database by executing the following command on the database server:

db2 “catalog database [DBName] as [DatabaseAlias] on [FileSystem]”

Where:
• DBName identifies the name assigned to the database that is being cataloged.
• DatabaseAlias identifies the alias name assigned to the database that is being cataloged.
• FileSystem specifies the path on which the database being cataloged resides.

For example, to catalog a standby database named mydb that resides on the file system named /mnt/mydb, you would execute the following command on the database server:

db2 “catalog database mydb as mydb on /mnt/mydb”

9) Verify the database.

After performing the above steps, you need to check the mirrored database for architectural correctness. You can do so by executing the following command on the database server:

db2dart [DBName] /db

Where:
• DBName identifies the name assigned to the mirrored database that is to be verified.

For example, to test the mirrored database named mydb, you would execute the following command on the database server:

db2dart mydb /db

The db2dart utility inspects the entire database for architectural correctness and generates a detailed report in the <$HOME>/sqllib/db2dump/DART0000/ directory. The report includes a summary at the end, which you can read to determine whether any errors were found.

9. Configure HADR

In order to set up HADR, you need to configure both the primary and standby database servers. The following sections describe the configuration steps on each database server.

9.1. Configuring the Standby Database Server

1) Update the /etc/services file.

Log in as the root user and add the following lines to the /etc/services file on the standby database server:

DB2_HADR_1 55001/tcp
DB2_HADR_2 55002/tcp

2) Enable the automatic client reroute feature.

To enable the automatic client reroute feature, execute the following command on the standby database server:

db2 update alternate server for database [DBName] using hostname [PrimaryHostName] port [PortNumber];

Where:
• DBName identifies the name assigned to the HADR database.
• PrimaryHostName identifies the name assigned to the database server that is used to access the primary database.
• PortNumber identifies the port number that has been assigned to the database instance service on the primary database server.

For example, to enable automatic client reroute for a standby database named mydb on a database server stndhost to a database mydb on a database server named primhost, you would execute the following command:

db2 update alternate server for database mydb using hostname primhost port 60000;

3) Update HADR configuration parameters.

Next, you need to update the standby database’s HADR configuration parameters by executing the following commands on the standby database server:

db2 update db cfg for mydb using HADR_LOCAL_HOST stndhost
db2 update db cfg for mydb using HADR_REMOTE_HOST primhost
db2 update db cfg for mydb using HADR_LOCAL_SVC DB2_HADR_2
db2 update db cfg for mydb using HADR_REMOTE_SVC DB2_HADR_1
db2 update db cfg for mydb using HADR_REMOTE_INST db2inst1
db2 update db cfg for mydb using HADR_SYNCMODE NEARSYNC
db2 update db cfg for mydb using HADR_TIMEOUT 120
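To confirm that the parameters were recorded, you can inspect the database configuration; for example:

db2 get db cfg for mydb | grep -i hadr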

4) Stop the DB2 database.

The HADR configuration parameter changes don't take effect until the database is stopped and restarted. You need to execute the following commands on the database server to stop the database:

db2 force application all
db2 deactivate db mydb

5) Place the standby database in roll-forward pending state.

In order to start HADR, the standby database must be in roll-forward pending state. Execute the following command to place the standby database in roll-forward pending state:

db2rfpen on mydb

6) Start HADR on the standby database.

Next, you need to start HADR on the standby database. This can be done by executing the following command on the standby database server:


db2 start hadr on database mydb as standby

9.2. Configuring the Primary Database Server

1) Update the /etc/services file.

Log in as the root user and add the following lines to the /etc/services file on the primary database server:

DB2_HADR_1 55001/tcp
DB2_HADR_2 55002/tcp

2) Enable the automatic client reroute feature.

To enable the automatic client reroute feature, you need to execute the following command on the primary database server:

db2 update alternate server for database [DBName] using hostname [StandbyHostName] port [PortNumber];

Where:
• DBName identifies the name assigned to the HADR database.
• StandbyHostName identifies the name assigned to the database server that is used to access the standby database.
• PortNumber identifies the port number assigned to the database instance service on the standby database server.

For example, to enable automatic client reroute on the primary database server named primhost to a database server named stndhost, you would execute the following command:

db2 update alternate server for database mydb using hostname stndhost port 60000;

3) Update HADR configuration parameters.

You need to update the primary database’s HADR configuration parameters by executing the following commands on the primary database server:

db2 update db cfg for mydb using HADR_LOCAL_HOST primhost
db2 update db cfg for mydb using HADR_REMOTE_HOST stndhost
db2 update db cfg for mydb using HADR_LOCAL_SVC DB2_HADR_1
db2 update db cfg for mydb using HADR_REMOTE_SVC DB2_HADR_2
db2 update db cfg for mydb using HADR_REMOTE_INST db2inst1
db2 update db cfg for mydb using HADR_SYNCMODE NEARSYNC
db2 update db cfg for mydb using HADR_TIMEOUT 120

4) Stop the DB2 database.

HADR configuration parameter changes don’t take effect until the DB2 database is stopped and restarted. This can be done by executing the following commands on the primary database server:

db2 force application all
db2 deactivate db mydb

5) Start HADR on the primary database.

Next, you need to start HADR on the primary database by executing the following command:

db2 start hadr on database mydb as primary
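Once HADR has been started on both servers, you can verify that the pair reaches peer state. One way is the db2pd utility; alternatively, take a database snapshot:

db2pd -db mydb -hadr
db2 get snapshot for database on mydb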


10. Create a Writable Copy of the Standby Database

10.1. Select a Database Server from Which to Access the Cloned Database

You need to select the database server that will be used to access the cloned database. You have the following choices:

1) The database server that is used to access the standby database

With this choice, you can use an existing DB2 9 instance or create a new one. If you are going to use the existing DB2 instance, you can skip the following steps. However, if a new DB2 instance is desired, you need to complete the following steps:

(i). Switch to the root user and create a user named db2inst2 on the database server by executing the following command:

useradd -c "DB2 clone db instance owner" -u 710 -g db2adm -G db2adm -p db2inst2 db2inst2

The new user will own the DB2 instance that will be used to access the cloned database.

(ii). Next, create a new DB2 instance using the same DB2 command used to create the instance used for the standby database. To do that, you need to execute the following command on the database server:

[DB2InstallationPath]/instance/db2icrt -u [FencedUser] [InstanceName]

Where:
• DB2InstallationPath identifies the directory where the DB2 9 code was installed.
• FencedUser identifies the ID of the user under which fenced user-defined functions and fenced stored procedures will run.
• InstanceName identifies the name that is to be assigned to the new instance.

For example, if the DB2 9 software was installed in the /opt/IBM/db2/V9 directory, you would execute the following command on the database server to create a new DB2 instance named db2inst2:

/opt/IBM/db2/V9/instance/db2icrt -u db2inst2 db2inst2

(iii). Verify the instance was created successfully by executing the following command on the database server:

/opt/IBM/db2/V9/instance/db2ilist -a

The output from the above command should look similar to the following:

db2inst1 32 /opt/IBM/db2/V9
db2inst2 32 /opt/IBM/db2/V9

2) A database server other than the standby database server, with a version of DB2 installed on it that is different from the standby database's DB2 version

In this case, you need to install the same DB2 9 software as the standby database on the database server and create a database instance as described in section 10.1, choice 1. The new DB2 instance you create can have the same name as the standby database instance, provided there is no other instance with the same name on this server.

3) A database server, other than the standby database server, that doesn't have DB2 software installed on it


In this case, you need to install the same DB2 9 code on the database server as the standby database and create a database instance as described in section 10.1, choice 1. The new instance name can be the same as the production DB2 instance name.

10.2. Clone a Standby Database

To clone a standby database, you need to complete the following steps:

1) Ensure the standby database is in a consistent state.

The true state of the HADR pair is maintained on the primary system. As such, the only way to ensure that the primary and standby databases are in sync is to check the status of the pair from the primary system. In DB2 9 this status is exposed through the SNAP_GET_HADR table function; for example, issue the following query on the primary system:

db2 "SELECT DBNAME, HADR_ROLE, HADR_STATE, HADR_SYNCMODE, HADR_CONNECT_STATUS FROM TABLE(SNAP_GET_HADR('mydb', -1)) AS T"

Ensure that the HADR_STATE is "PEER" and the HADR_CONNECT_STATUS is "CONNECTED":

DBNAME   HADR_ROLE HADR_STATE     HADR_SYNCMODE HADR_CONNECT_STATUS
-------- --------- -------------- ------------- -------------------
mydb     PRIMARY   PEER           SYNC          CONNECTED

Once it has been confirmed that the HADR pair is in sync, the transmission of the log files to the standby must be suspended to ensure that there is no physical I/O in progress while the Snapshot copies are being created.

The most efficient way to suspend the physical I/O on the standby is to issue a SET WRITE SUSPEND command on the primary system; see section 8, Build the Standby Database Using SnapMirror, for details.

2) Create Snapshot copies of the database FlexVol volumes.

Next, create a Snapshot copy of each FlexVol volume that is used by the standby database. A Snapshot copy can be created by executing the following command on the storage system:

snap create -V [VolName] [SnapName]

Where:
• VolName identifies the name assigned to the FlexVol volume the Snapshot copy is created for.
• SnapName identifies the name that is to be assigned to the Snapshot copy.

For example, to create a Snapshot copy named sdbdata_cl_snp01 for a FlexVol volume named sdbdata, you would execute the following command from the standby database server:

rsh stndstore snap create -V sdbdata sdbdata_cl_snp01

It is recommended that you develop a naming convention and assign a meaningful name to the Snapshot copies that are created for cloning purposes.

3) Resume the I/O on the primary system

Once the Snapshot copies are created, you need to resume write operations on the primary database by executing the following command on the primary database server:

db2 set write resume for database

4) Clone the FlexVol volumes.

Next, create a clone of each FlexVol volume using the Snapshot copies created in section 10.2, step (2). A clone of a FlexVol volume can be created by executing the following command:

vol clone create [CloneVol] -s [volume|file|none] -b [ParentVol] <ParentSnap>


Where:
• CloneVol identifies the name to be assigned to the new clone volume.
• ParentVol identifies the name assigned to the parent FlexVol volume.
• ParentSnap identifies the name assigned to the Snapshot copy that is to be used as the base for the clone volume.

For example, to create a clone volume of a FlexVol volume named sdbdata using the Snapshot copy named sdbdata_cl_snp01, you would execute the following command from the standby database server:

rsh stndstore vol clone create sdbdata_cl -s none -b sdbdata sdbdata_cl_snp01

As long as the clone exists, Data ONTAP doesn't permit any operation that would destroy the parent FlexVol volume or the Snapshot copy that is used as the base for the clone volume.

Note: A Snapshot copy is not required to create a clone of a FlexVol volume. If you don't explicitly create a Snapshot copy and specify it when executing the vol clone command, a Snapshot copy will be created implicitly and used as the base for the clone volume. An implicitly created Snapshot copy will have a system-assigned name. We recommend explicitly creating a Snapshot copy and assigning it a meaningful name before using it to create a clone volume.

For example, to create a clone volume of a FlexVol volume named sdbdata without specifying a Snapshot copy name, you would execute the following rsh command from the standby database server:

rsh stndstore vol clone create sdbdata_cl -s none -b sdbdata

Important: The -s option of the vol clone create command is used to define space reservations for the cloned volume. Valid values for this option are volume, file, and none. If you specify the value volume, Data ONTAP will reserve the same amount of space for the cloned volume as the parent FlexVol volume uses. The examples covered in this paper use the value none, which means space is not guaranteed for the clone volume. If there is no space available in the containing aggregate, subsequent writes to the clone volume will fail.
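Because writes to a clone created with -s none fail once the containing aggregate fills up, it is worth monitoring aggregate utilization; for example (the aggregate name aggr1 is a placeholder):

rsh stndstore df -A aggr1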

5) Create an export entry for the clone volume.

In order to mount a clone volume on the database server, you need to create an export entry for it in the /etc/exports file that resides on the storage system. An export entry can be created by executing the following command:

exportfs -p rw=[HostName],root=[HostName] [PathName]

Where:
• HostName identifies the name assigned to the database server.
• PathName identifies the path name assigned to the clone volume.

For example, to create an export entry for a clone volume named sdbdata_cl and allow the root user’s access privileges from a database server named stndhost, you would execute the following command from the standby database server:

rsh stndstore exportfs -p rw=stndhost,root=stndhost /vol/sdbdata_cl

Repeat this step to create an export entry for each cloned volume that is used by the cloned database.

6) Mount the cloned volumes.

The clone database can be accessed from the same database server that is used to access the standby database, or from a completely different server. The scenarios described in this paper were produced using the same database server to access the clone database that was used to access the standby database.


In order to access the clone database, you need to mount the FlexClone volumes on the standby database server. First, you need to create a mountpoint for each cloned volume and append a mount entry to the /etc/filesystems file. The mount entry should specify the mount options, and it should look similar to the following:

[MountPoint]:
    dev = [StorageSystemVolume]
    mount = true
    vfs = nfs
    nodename = [StorageSystemName]
    options = bg,nointr,rw
    type = nfs_mount
    account = false

Where:
• StorageSystemVolume identifies the name assigned to the cloned volume.
• StorageSystemName identifies the name assigned to the storage system that is used for the standby database storage.
• MountPoint identifies the name assigned to the location that is used to mount the cloned volume on the database server.

For example, to add an entry for a cloned volume named sdbdata_cl that is to be mounted on a mountpoint named /mnt/dbdata_cl, you would append the following lines to the /etc/filesystems file on the database server:

/mnt/dbdata_cl:
    dev = /vol/sdbdata_cl
    mount = true
    vfs = nfs
    nodename = stndstore
    options = cio,rw,bg,hard,intr,proto=tcp,vers=3,rsize=32768,wsize=32768,timeo=600
    type = nfs_mount
    account = false

After appending the mountpoint entry, you can mount the FlexClone volume by executing the following command on the database server:

mount [MountPoint]

Where:
• MountPoint identifies the name assigned to the location that is to be used to mount the cloned volume on the database server.

For example, to mount a cloned volume that has a mount entry specified in the /etc/filesystems file, you would execute the following command on the standby database server:

mount /mnt/dbdata_cl

Both database servers we used for our HADR testing were running the AIX 5.3 operating system. For information about mounting a FlexVol volume or FlexClone volume on a database server with a different operating system, refer to the technical report IBM DB2 UDB Enterprise Server Edition V8 for UNIX: Integrating with a NetApp Storage System (www.netapp.com/library/tr/3272.pdf).

In order to operate DB2 successfully, the DB2 instance owner should have ownership of the file systems on the clone volume that is mounted on the database server. Ownership can be changed by executing the following command on the database server:

chown -R [InstanceOwner]:[InstanceOwnerGroup] [FileSystem]

Where:
• InstanceOwner identifies the name assigned to the user who owns the database instance.
• InstanceOwnerGroup identifies the name assigned to the group of the user who owns the database instance.
• FileSystem identifies the name of the file system whose ownership is to be changed.

For example, to change the ownership of the file system mounted on the mountpoint named /mnt/dbdata_cl, you would execute the following command on the database server:

chown -R db2inst1:db2adm /mnt/dbdata_cl

7) Configure the cloned database.

The clone volumes created in step (4) and mounted in step (6) of this section are going to be used as the storage containers for the cloned database. If you are going to use the same database instance as the standby database, you need to change the database name and the tablespace container header information using the db2relocatedb command described in section 1.9. To do this, you need to create a configuration file that identifies the source database information and specifies the corresponding new clone database information. A sample configuration file should look something like this:

DB_NAME=mydb,mydbcl
DB_PATH=/mnt/mydb,/mnt/mydb_cl
INSTANCE=db2inst1,db2inst1
NODENUM=0
LOG_DIR=/mnt/sdblogs,/mnt/sdblogs_cl
STORAGE_PATH=/mnt/sdbdata/*,/mnt/sdbdata_cl/*

In this configuration file, the source database name is mydb, and it has its logs on /mnt/sdblogs and data on /mnt/sdbdata. The cloned database is to be renamed to mydbcl; it has its data on /mnt/sdbdata_cl and transaction logs on /mnt/sdblogs_cl.

Save the configuration file as /home/db2inst1/dbrelocate.cfg and execute the following command on the database server:

db2relocatedb -f /home/db2inst1/dbrelocate.cfg

8) Recatalog the standby database.

On execution, the db2relocatedb command uncatalogs the standby database. Therefore, you need to recatalog the standby database by executing the following command on the database server:

db2 “catalog database [DBName] as [DatabaseAlias] on [FileSystem]”

Where:
• DBName identifies the name assigned to the database that is being cataloged.
• DatabaseAlias identifies the alias name assigned to the database that is being cataloged.
• FileSystem specifies the path on which the database being cataloged resides.

For example, to recatalog the source database named mydb that resides on the file system named /mnt/mydb, you would execute the following command on the database server:

db2 “catalog database mydb as mydb on /mnt/mydb”

9) Stop HADR and perform roll-forward recovery on the cloned database.

The standby database was in the HADR peer state when the clone database was created; therefore, you need to stop HADR and perform roll-forward recovery on the cloned database before you can connect to it and start using it.

In order to stop HADR on the clone database, you need to execute the following command:

db2 stop HADR on database [DBName]


Where:
• DBName identifies the name assigned to the cloned database.

For example, to stop HADR for the cloned database named mydbcl, you would execute the following command on the database server:

db2 stop HADR on database mydbcl

After stopping HADR, you need to ensure database consistency by performing roll-forward recovery. Roll-forward recovery can be performed by executing the following command:

db2 rollforward database [DBName] stop

Where:
• DBName identifies the name assigned to the cloned database.

For example, to perform roll-forward recovery for the cloned database named mydbcl, you would execute the following command on the database server:

db2 rollforward database mydbcl stop

10) Verify the database is operational.

After performing the above steps, you need to check the entire clone database for architectural correctness. You can do so by executing the following command on the database server:

db2dart [DBName] /db

Where:
• DBName identifies the name assigned to the clone database.

For example, to verify that the cloned database named mydbcl is functional, you would execute the following command on the database server:

db2dart mydbcl /db

The db2dart utility inspects the entire database for architectural correctness and generates a detailed report. This report is written to the <$HOME>/sqllib/db2dump/DART0000/ directory and has a summary at the end. You can read the summary to determine if any errors were found.
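For example, a quick way to scan the generated report for problems is shown below; the report file name is illustrative, so check the DART0000 directory for the actual name:

grep -i error $HOME/sqllib/db2dump/DART0000/MYDBCL.RPT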

11. Recommendations

1. To the extent possible, the database configuration parameters and database manager configuration parameters should be identical on the systems where the primary and standby databases reside. If the configuration parameters are not properly set on the standby database, the following problems might occur:

• Error messages might be returned on the standby database while replaying the log files that were shipped from the primary database.

• After a takeover operation, the new primary database will not be able to handle the workload, resulting in performance problems or in applications receiving error messages they were not receiving when they were connected to the original primary database.

2. Changes to the configuration parameters on the primary database are not automatically propagated to the standby database and must be made manually on the standby database. For dynamic configuration parameters, changes will take effect without shutting down and restarting the database management system (DBMS) or the database. For the configuration parameters that are not dynamic, changes will take effect after the standby database is restarted.


3. By default, the log receive buffer size on the standby database should be two times the value specified for the LOGBUFSZ configuration parameter on the primary database. However, there might be times when this size is not sufficient. For example, when the HADR synchronization mode is asynchronous and the primary and standby databases are in peer state, if the primary database is experiencing a high transaction load, the log receive buffer on the standby database might fill to capacity, and the log shipping operation from the primary database might stall. To manage these temporary peaks, you can increase the size of the log receive buffer on the standby database by modifying the DB2_HADR_BUF_SIZE registry variable.
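For example, the buffer could be enlarged on the standby instance as follows. This is a sketch: the value (which, to our understanding, is expressed in units of 4KB pages) is illustrative, and registry variable changes take effect only after the instance is restarted:

db2set DB2_HADR_BUF_SIZE=2048
db2stop
db2start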

4. Since executing a load operation with the COPY NO option is not supported with HADR, the command is automatically converted to a load operation with the NONRECOVERABLE option. To enable a load operation with the COPY NO option to be converted to a load operation with the COPY YES option, set the DB2_LOAD_COPY_NO_OVERRIDE registry variable on the primary database server. This registry variable is ignored by the standby database server. Ensure that the device or directory specified on the primary database can be accessed by the standby database using the same path, device, or load library.
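For example, to have such load operations produce a copy in a directory that both servers can reach (the path shown is a placeholder), you might set the registry variable on the primary as follows:

db2set DB2_LOAD_COPY_NO_OVERRIDE="COPY YES TO /mnt/dbarch/loadcopy"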

5. The local host name of the primary database must be the same as the remote host name of the standby database, and the local host name of the standby database must be the same as the remote host name of the primary database. Use the HADR_LOCAL_HOST and HADR_REMOTE_HOST configuration parameters to set the local and remote hosts for each database. Configuration consistency for the local and remote host names is checked when a connection is established to ensure that the remote host specified is the expected database.

6. After an HADR pair establishes a connection, the two databases exchange heartbeat messages. The heartbeat interval is one-quarter of the value of the HADR_TIMEOUT database configuration parameter, or 30 seconds, whichever is shorter. The HADR_HEARTBEAT monitor element shows the number of heartbeats a database expected to receive but did not receive from the other database. If one database does not receive any message from the other database within the number of seconds specified by HADR_TIMEOUT, it initiates a disconnect. This means that it takes at most the number of seconds specified by HADR_TIMEOUT for the primary to detect the failure of either the standby or the intervening network. In HADR peer state, problems with the standby or the network will block primary transaction processing for at most the number of seconds specified by the HADR_TIMEOUT configuration parameter. If you set this configuration parameter too low, you will receive false alarms and frequent disconnections.

7. When a tablespace statement such as CREATE TABLESPACE, ALTER TABLESPACE, or DROP TABLESPACE is issued on the primary database, it is replayed on the standby database. Therefore, you must ensure that the devices involved are set up on both databases before you issue a tablespace statement on the primary database. Also, make sure that the mountpoint names are the same on both systems.

If the devices used for the tablespace are not set up similarly on both systems and you created a tablespace on the primary database, the log replay will fail on the standby database because the containers are not available. In this case, the primary database will not receive an error message stating that the log replay failed. To check for log replay errors, you must monitor the db2diag.log file on the standby database when you are creating new tablespaces.

8. The standby database should be powerful enough to replay the logged operations of the database as fast as they are generated on the primary server. Therefore, identical hardware for the primary and standby databases is recommended.

9. For HADR databases, set the LOGINDEXBUILD database configuration parameter to ON to ensure that complete information is logged for index creation, recreation, and reorganization. Although this means that index builds might take longer on the primary system and that more log space is required, the indexes will be rebuilt on the standby system during HADR log replay and will be available when a failover takes place. If index builds on the primary system are not logged and a failover occurs, any invalid indexes that remain after the failover is complete will have to be rebuilt before they can be accessed. While the indexes are being recreated, they cannot be accessed by any applications.

10. The default setting for the INDEXREC database configuration parameter is RESTART. This setting causes invalid indexes to be rebuilt after a takeover operation is complete. If any index builds have not been logged, this setting allows DB2 to check for invalid indexes and to rebuild them. For an HADR database where indexes are used frequently, change the value of the INDEXREC database configuration parameter to ACCESS on both the primary and standby databases. This will cause invalid indexes to be rebuilt when the underlying table is first accessed.

12. Summary

DB2 HADR protects customers against prolonged data unavailability caused by software or hardware failure. Customers can use advanced technologies such as SnapMirror, Snapshot, FlexVol, FlexClone, and RAID-DP, available on NetApp FAS or IBM N Series storage systems, to ensure the highest data availability and reliability. SnapMirror allows customers to build standby databases in an HADR environment in a few quick and easy steps. The standby database can't be used until a takeover operation occurs. In order to ensure optimal use of infrastructure at a DR site, customers can create a clone of the standby database that can be used for various purposes such as reporting, test, and development environments.

13. References

1. IBM DB2 9 Manuals (www-306.ibm.com/software/data/db2/udb/support/manualsv9.html)

2. Technical report: SnapMirror Deployment and Implementation Guide (www.netapp.com/tech_library/ftp/3390.pdf)

3. DB2: Cloning a Database Using NetApp FlexClone Technology (www.netapp.com/tech_library/ftp/3460.pdf)

4. IBM DB2 UDB Enterprise Server Edition V8 for UNIX: Integrating with a NetApp Storage System (www.netapp.com/library/tr/3272.pdf)


© 2007 Network Appliance, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the Network Appliance logo, DataFabric, Data ONTAP, FilerView, SnapMirror, and WAFL are registered trademarks and Network Appliance, FlexClone, FlexVol, NOW, RAID-DP, and Snapshot are trademarks of Network Appliance, Inc. in the U.S. and other countries. Solaris is a trademark of Sun Microsystems, Inc. Linux is a registered trademark of Linus Torvalds. UNIX is a registered trademark of The Open Group. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.

www.netapp.com