Technical Report
MetroCluster in Clustered Data ONTAP 8.3 Verification Tests Using Oracle Workloads
Business Workloads Group, PSE, NetApp
April 2015 | TR-4396
Abstract
This document describes the results of functional testing of NetApp® MetroCluster™ software on the NetApp clustered Data ONTAP® 8.3 operating system in an Oracle Database 11g R2 environment. Proper operation is verified, as is expected behavior, during each of the test cases. Specific equipment, software, and functional failover tests are described along with their results.
2 MetroCluster in Clustered Data ONTAP 8.3 Verification Tests Using Oracle Workloads © 2015 NetApp, Inc. All Rights Reserved.
TABLE OF CONTENTS
1 Introduction ........................................................................................................................................... 4
1.1 Best Practices .................................................................................................................................................4
1.2 Assumptions ...................................................................................................................................................4
2 Executive Summary.............................................................................................................................. 4
3 Product Overview ................................................................................................................................. 4
3.1 NetApp Storage Technology ...........................................................................................................................5
3.2 Oracle Database and Oracle Real Application Clusters ..................................................................................7
4 Challenges for Disaster Recovery Planning ...................................................................................... 7
4.1 Logical Disasters .............................................................................................................................................8
4.2 Physical Disasters ...........................................................................................................................................8
5 Value Proposition ................................................................................................................................. 8
6 High-Availability Options ..................................................................................................................... 8
6.1 ASM Mirroring .................................................................................................................................................8
6.2 Two-Site Storage Mirroring .............................................................................................................................9
7 High-Level Topology ............................................................................................................................ 9
8 Test Case Overview and Methodology ............................................................................................. 10
9 Test Results ........................................................................................................................................ 12
9.1 Loss of Single Oracle Node (TC-01) ............................................................................................................. 12
9.2 Loss of Oracle Host HBA (TC-02) ................................................................................................................. 12
9.3 Loss of Individual Disk (TC-03) ..................................................................................................................... 13
9.4 Loss of Disk Shelf (TC-04) ............................................................................................................................ 14
9.5 Loss of NetApp Storage Controller (TC-05) .................................................................................................. 15
9.6 Loss of Back-End Fibre Channel Switch (TC-06) .......................................................................................... 16
9.7 Loss of Interswitch Link (TC-07) ................................................................................................................... 17
9.8 Maintenance Requiring Planned Switchover from Site A to Site B (TC-08) .................................................. 18
9.9 Disaster Forcing Unplanned Manual Switchover from Site A to Site B (TC-09) ............................................ 19
10 Conclusion .......................................................................................................................................... 21
Appendix .................................................................................................................................................... 21
Detailed Test Cases ............................................................................................................................................. 21
Deployment Details .............................................................................................................................................. 30
Network ................................................................................................................................................................ 34
Data Layout .......................................................................................................................................................... 35
Materials List ........................................................................................................................................................ 37
LIST OF TABLES
Table 1) Test case summary. ....................................................................................................................................... 11
Table 2) Oracle host specifications. ............................................................................................................................. 30
Table 3) Oracle specifications. ..................................................................................................................................... 30
Table 4) Kernel parameters. ......................................................................................................................................... 31
Table 5) Oracle initialization file parameters. ..................................................................................................... 32
Table 6) NetApp storage specifications. ....................................................................................................................... 33
Table 7) Server network specifications. ........................................................................................................................ 34
Table 8) Storage network specifications. ...................................................................................................................... 34
Table 9) FC back-end switches. ................................................................................................................................... 34
Table 10) Materials list for testing. ................................................................................................................................ 37
LIST OF FIGURES
Figure 1) MetroCluster overview. ...................................................................................................................................6
Figure 2) MetroCluster mirroring. ...................................................................................................................................7
Figure 3) Test environment. ...........................................................................................................................................9
Figure 4) Data layout. ................................................................................................................................................... 10
Figure 5) Test phases................................................................................................................................................... 11
Figure 6) Loss of Oracle node. ..................................................................................................................................... 12
Figure 7) Loss of an Oracle server host HBA. .............................................................................................................. 13
Figure 8) Loss of an individual disk. ............................................................................................................................. 14
Figure 9) Loss of disk shelf. .......................................................................................................................................... 15
Figure 10) Loss of NetApp storage controller. .............................................................................................................. 16
Figure 11) Loss of an FC switch. .................................................................................................................................. 17
Figure 12) Loss of ISL. ................................................................................................................................................. 18
Figure 13) Loss of primary site for planned maintenance. ............................................................................................ 19
Figure 14) Loss of primary site. .................................................................................................................................... 20
Figure 15) Aggregate and volume layouts and sizes. ................................................................................................... 35
Figure 16) Volume and LUN layouts for site A. ............................................................................................................ 36
1 Introduction
This document describes the results of a series of tests demonstrating that an Oracle Database 11g R2
Real Application Cluster (RAC) database configured in a NetApp MetroCluster solution in clustered Data
ONTAP 8.3 operates without problems while under load in a variety of possible failure scenarios.
The tests simulate several different failure scenarios. This technical report documents their effects on the
Oracle Database 11g database environment. The tests were conducted while both the database servers
and the NetApp storage controllers were subjected to a heavy transactional workload, both to stress the system and to better represent a real-world environment.
To pass a test, the MetroCluster cluster had to remain online and accessible while the database continued to serve I/O without errors.
1.1 Best Practices
This document should not be interpreted as a best practice guide for using solutions with Oracle
databases on NetApp MetroCluster software in clustered Data ONTAP 8.3. Customer requirements vary,
and therefore configurations vary as well. The configuration described in this document reflects the most
common two-site customer need encountered by NetApp, but many others exist as well. In addition,
NetApp MetroCluster technology is not required for Oracle RAC. Most Oracle RAC clusters on NetApp
storage do not require synchronous remote replication and therefore are used with standard Data ONTAP
clusters, not with MetroCluster clusters. Although Oracle Database 11g R2 was used for these tests, the
principles are equally applicable to Oracle Database 12c and later. The 11g R2 version was chosen as
the most mature, stable, and commonly used version of Oracle RAC.
Although the Fibre Channel (FC) protocol is used in this document, the same overall design and
procedures can be used for NFS and iSCSI.
For more information about all other configuration details, including Oracle database and kernel
parameters, see the appendix of this document. For general Oracle best practices, including those for
Oracle RAC, see TR-3633: Best Practices for Oracle Databases on NetApp Storage.
1.2 Assumptions
Throughout this document, the examples assume two physical sites, site A and site B. Site A represents
the main data center on campus. Site B is the campus disaster recovery (DR) location that provides
protection during a complete data center outage. All components are named to show clearly where they
are physically located.
It is also assumed that the reader has a basic familiarity with both NetApp and Oracle products.
2 Executive Summary
MetroCluster in clustered Data ONTAP 8.3 provides native continuous availability for business-critical
applications, including Oracle. The testing demonstrated that our Oracle Database 11g R2 RAC cluster
operated as expected in the MetroCluster environment under a moderate to heavy transactional workload
when subjected to a variety of failure scenarios that resulted in limited, moderate, and complete disruption
to the systems in our primary production site.
These tests show that NetApp MetroCluster technology and the Oracle RAC database together provide a
winning combination for continuous application availability.
3 Product Overview
This section describes the NetApp and Oracle products used in the solution.
3.1 NetApp Storage Technology
This section describes the NetApp hardware and software used in the solution.
FAS8000 Series Storage Systems
NetApp FAS8000 series storage systems combine a unified scale-out architecture with leading data-
management capabilities. They are designed to adapt quickly to changing business needs while
delivering core IT requirements for uptime, scalability, and cost efficiency. These systems offer the
following advantages:
Speed the completion of business operations. Leveraging a new high-performance, multicore architecture and self-managing flash acceleration, FAS8000 unified scale-out systems boost throughput and decrease latency to deliver consistent application performance across a broad range of SAN and NAS workloads.
Streamline IT operations. Simplified management and proven integration with cloud providers let you deploy the FAS8000 in your data center and in a hybrid cloud with confidence. Nondisruptive operations simplify long-term scaling and improve uptime by facilitating hardware repair, tech refreshes, and other updates without planned downtime.
Deliver superior total cost of ownership. Proven storage efficiency and a two-fold improvement in price/performance ratio over the previous generation reduce capacity utilization and improve long-term return on investment. NetApp FlexArray™ storage virtualization software lets you integrate existing arrays with the FAS8000, increasing consolidation and providing even greater value to your business.
Clustered Data ONTAP Operating System
NetApp clustered Data ONTAP 8.3 software delivers a unified storage platform that enables unrestricted,
secure data movement across multiple cloud environments and paves the way for software-defined data
centers, offering advanced performance, availability, and efficiency. Data ONTAP clustering capabilities
help you keep your business running nonstop.
Clustered Data ONTAP is an industry-leading storage operating system. Its single feature-rich platform
allows you to scale infrastructure without increasing IT staff. Clustered Data ONTAP provides the
following benefits:
Nondisruptive operations:
Perform storage maintenance, hardware lifecycle operations, and software upgrades without interrupting your business.
Eliminate planned and unplanned downtime.
Proven efficiency:
Reduce storage costs by using one of the most comprehensive storage efficiency offerings in the industry.
Consolidate and share the same infrastructure for workloads or tenants with different performance, capacity, and security requirements.
Seamless scalability:
Scale capacity, performance, and operations without compromise, regardless of application.
Scale SAN and NAS from terabytes to tens of petabytes without reconfiguring running applications.
MetroCluster Solution
A self-contained solution, NetApp MetroCluster high-availability (HA) and DR software lets you achieve
continuous data availability for mission-critical applications at half the cost and complexity.
MetroCluster software combines array-based clustering with synchronous mirroring to deliver continuous
availability and zero data loss. It provides transparent recovery from most failure scenarios so that critical
applications continue running uninterrupted. It also eliminates repetitive change-management activities to
reduce the risk of human error and administrative overhead.
New MetroCluster enhancements deliver the following improvements:
Local node failover in addition to site switchover
End-to-end continuous availability in a virtualized environment with VMware HA and fault tolerance
Whether you have a single data center, a campus, or a metropolis-wide environment, use the cost-
effective NetApp MetroCluster solution to achieve continuous data availability for your critical business
environment.
Figure 1 shows a high-level view of a MetroCluster environment that spans two data centers separated by
a distance of up to 200km. MetroCluster software in clustered Data ONTAP 8.3 provides the following
features:
MetroCluster consists of an independent two-node cluster at each site, up to 200km apart.
Each site serves data to local clients or hosts and acts as secondary to the other site.
The client/host network spans both sites, just as with fabric and stretch MetroCluster.
Interswitch links (ISLs) and redundant fabrics connect the two clusters and their storage.
All storage is fabric attached and visible to all nodes.
Local HA handles almost all planned and unplanned operations.
Switchover and switchback transfer the entire cluster's workload between sites.
Figure 1) MetroCluster overview.
SyncMirror Mirroring
NetApp SyncMirror®, an integral part of MetroCluster, combines the disk-mirroring protection of RAID 1
with industry-leading NetApp RAID technology. During an outage—whether from a disk problem, a cable
break, or a host bus adapter (HBA) failure—SyncMirror can instantly access the mirrored data without
operator intervention or disruption to client applications. SyncMirror maintains strict physical separation
between two copies of your mirrored data. Each copy is called a plex. As Figure 2 shows, each
controller’s data has its “mirror” at the other location.
Figure 2) MetroCluster mirroring.
With MetroCluster, all mirroring is performed at an aggregate level so that all volumes are automatically
protected with one simple replication relationship. Other protection solutions operate at the individual volume level, which means that to protect all of the volumes (potentially hundreds), a separate replication relationship must be created for each source and destination volume pair.
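The administrative difference can be reduced to a simple count of the replication relationships that must be created and maintained. The sketch below is purely illustrative (not NetApp tooling):

```python
def relationships_required(num_volumes: int, per_volume: bool) -> int:
    """Count the replication relationships an administrator must create.

    Per-volume replication needs one relationship for every source/
    destination volume pair; aggregate-level SyncMirror protection
    covers every volume in the aggregate with a single relationship.
    """
    return num_volumes if per_volume else 1

# A deployment with 200 volumes:
assert relationships_required(200, per_volume=True) == 200
assert relationships_required(200, per_volume=False) == 1
```

The administrative burden of per-volume replication therefore grows linearly with volume count, while aggregate-level mirroring stays constant.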
3.2 Oracle Database and Oracle Real Application Clusters
The Oracle Database 11g R2 Enterprise Edition provides industry-leading performance, scalability,
security, and reliability on clustered or single servers with a wide range of options to meet the business
needs of critical enterprise applications.
Oracle Database with Real Application Clusters (RAC) brings an innovative approach to the challenges of
rapidly increasing amounts of data and demand for high performance. In the scale-out model of Oracle
RAC, active-active clusters use multiple servers to deliver high performance, scalability, and availability,
making Oracle Database 11g the ideal platform for private and public cloud deployments.
Oracle RAC clusters running on extended host clusters provide the highest level of Oracle capability for availability, scalability, and low-cost computing. This configuration also supports popular packaged products such as SAP, PeopleSoft, Siebel, and Oracle E-Business Suite, as well as custom applications.
4 Challenges for Disaster Recovery Planning
Disaster recovery (DR) is defined as the processes, policies, and procedures related to preparing for
recovery or continuation of technical infrastructure critical to an organization after a natural disaster (such
as flood, tornado, volcano eruption, earthquake, or landslide) or a human-induced disaster (such as a
threat having an element of human intent, negligence, or error or involving a failure of a human-made
system).
DR planning is a subset of a larger process known as business continuity planning, and it should include
planning for the resumption of applications, data, hardware, communications (such as networking), and
other IT infrastructure. A business continuity plan (BCP) includes planning for non-IT related aspects such
as key personnel, facilities, crisis communication, and reputation protection, and it should refer to the
disaster recovery plan (DRP) for IT-related infrastructure recovery or continuity.
Generically, a disaster can be classified as either logical or physical. Both categories are addressed with
HA, recovery processing, and/or DR processes.
4.1 Logical Disasters
Logical disasters include, but are not limited to, data corruption by users or technical infrastructure.
Technical infrastructure disasters can result from file system corruption, kernel panics, or even system
viruses introduced by end users or system administrators.
4.2 Physical Disasters
Physical disasters include the failure of any storage component at site A or site B that exceeds the resiliency of an HA pair of NetApp controllers not based on MetroCluster and that would normally result in downtime or data loss.
In certain cases, mission-critical applications should not be stopped even in a disaster. By leveraging
Oracle RAC extended-distance clusters and NetApp storage technology, it is possible to address those
failure scenarios and provide a robust deployment for critical database environments and applications.
5 Value Proposition
Typically, mission-critical applications must be implemented with two requirements:
RPO = 0 (recovery point objective equal to zero), meaning that data loss from any type of failure is unacceptable
RTO ~= 0 (recovery time objective as close to zero as possible), meaning that the time to recovery from a disaster scenario should be as close to 0 minutes as possible
The combination of Oracle RAC on extended-distance clusters with NetApp MetroCluster technology
meets these RPO and RTO requirements by addressing the following common failures:
Any kind of Oracle Database instance crash
Switch failure
Multipathing failure
Storage controller failure
Storage or rack failure
Network failure
Local data center failure
Complete site failure
6 High-Availability Options
Multiple options exist for spanning sites with an Oracle RAC cluster. The best option depends on the
available network connectivity, the number of sites, and customer business needs. NetApp Professional
Services can offer assistance with configuration planning and, when necessary, can offer Oracle
consulting services as well.
6.1 ASM Mirroring
Automatic Storage Management (ASM) mirroring, also called ASM normal redundancy, is a frequent
choice when only a very small number of databases must be replicated. In this configuration, the Oracle
RAC nodes span sites and leverage ASM to replicate data. Storage mirroring is not required, but scalability is limited: as the number of databases increases, the administrative burden of maintaining many mirrored ASM disk groups becomes excessive. In these cases, customers generally prefer to mirror data at the storage layer.
This approach can be configured with and without a tiebreaker to control the Oracle RAC cluster quorum.
6.2 Two-Site Storage Mirroring
The configuration chosen for these tests was two-site storage mirroring because it reflects the most
common use of site-spanning Oracle RAC with MetroCluster.
As described in detail in the following section, this option establishes one of the sites as a designated
primary site and the other as the designated secondary site. This is done by first selecting one site to host the active storage and then placing two Oracle Cluster Registry (OCR) and voting resources on it. The other site is a synchronous but passive replica: it does not directly serve data, and it contains only one OCR and voting resource.
7 High-Level Topology
Figure 3 shows the architecture of the configuration used for our validation testing. These tests used a
two-node Oracle RAC database environment with a RAC node deployed at both site A and site B with the
following specifics:
The sites were separated by a 20km distance, and fiber spools were used for both the MetroCluster and the RAC nodes.
The RAC configuration used the FC protocol and ASM to provide access to the database.
A WAN emulator was used to simulate a 20km distance between the RAC nodes for the private interconnect and to introduce approximately 10ms of latency into the configuration.
Figure 3) Test environment.
The Oracle binaries were installed locally on each server. The configuration included the following
specific arrangements:
The data and logs were mirrored to site B with the storage controllers at site B acting strictly in a passive DR capacity.
Using a single front-end fabric spanning both sites, the FC LUNs were presented to both Oracle RAC nodes by the storage controllers at site A.
There were three OCR disks: two at site A and one at site B.
There were three voting disks: two at site A and one at site B.
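The two-plus-one voting-disk layout matters because Oracle Clusterware requires a strict majority of voting disks to keep the cluster up. The following sketch (illustrative Python, not Oracle tooling) shows the majority arithmetic for this layout:

```python
def has_quorum(visible_voting_disks: int, total_voting_disks: int) -> bool:
    """Oracle Clusterware requires a strict majority of voting disks
    to be visible for the cluster to remain up."""
    return visible_voting_disks > total_voting_disks // 2

site_a, site_b = 2, 1          # voting disks per site in this layout
total = site_a + site_b

# Losing site B leaves the two site A disks visible: quorum holds.
assert has_quorum(site_a, total)
# Losing site A leaves only one disk visible: quorum is lost, and the
# surviving node cannot form a cluster without manual intervention.
assert not has_quorum(site_b, total)
```

This is why the site hosting two of the three voting disks is the designated primary: it can survive the loss of the other site automatically, while the reverse direction requires a deliberate switchover.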
Figure 4 shows the distribution of the Oracle logs and data files across both controllers on site A. For
more information about the hardware and software used in this test configuration, see the appendix of this
document.
Figure 4) Data layout.
8 Test Case Overview and Methodology
All of the test cases listed in Table 1 were executed by injecting a specific fault into an otherwise
nominally performing system under a predefined load driven against the Oracle RAC cluster. The load was generated by the Silly Little Oracle Benchmark (SLOB) utility and used a combination of 90% reads and 10% writes with a 100% random access pattern.
This load delivered more than 70K IOPS evenly across the FAS8060 controllers at site A, resulting in
storage CPU and disk utilization of 30% to 40% during the tests. The goal was not to measure the performance of the overall environment but to subject it to a substantial load during testing.
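To make the workload mix concrete, the sketch below generates a randomized 90/10 read/write operation stream. It is a hypothetical stand-in for what SLOB drives, not SLOB's actual implementation:

```python
import random

def make_io_mix(n_ops: int, read_pct: int = 90, seed: int = 42) -> list[str]:
    """Build a randomized stream of read/write operations approximating
    the 90% read / 10% write, fully random pattern used in these tests."""
    rng = random.Random(seed)
    return ["read" if rng.random() < read_pct / 100 else "write"
            for _ in range(n_ops)]

mix = make_io_mix(10_000)
read_fraction = mix.count("read") / len(mix)
# Over 10,000 operations the observed read fraction lands close to 0.90.
```

The fixed seed makes the stream reproducible from run to run, which matters when comparing database behavior across the baseline, fault, and recovery phases of repeated tests.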
To increase the load on the test environment, we made sure that the Oracle RAC node that was installed
in site B participated in the load generation by driving IOPS to the FAS8060 storage controllers on site A
across the network.
Table 1) Test case summary.
Test Case Description
TC01 Loss of a single Oracle node
TC02 Loss of an Oracle host HBA
TC03 Loss of an individual disk in an active data aggregate
TC04 Loss of an entire disk shelf
TC05 Loss of a NetApp storage controller
TC06 Loss of a back-end FC switch on the MetroCluster cluster
TC07 Loss of an ISL
TC08 Sitewide maintenance requiring a planned switchover from site A to site B
TC09 Sitewide disaster requiring an unplanned manual switchover from site A to site B
For more information about how we conducted each of these tests, see the appendix of this document.
Each test was broken into the following three phases:
1. A baseline stage, indicative of normal operations. A typical duration for this stage was 15 minutes.
2. A fault stage, during which the specific fault under test was injected and allowed to persist for 15 minutes to provide sufficient time to verify correct database behavior.
3. A recovery stage, in which the fault was corrected and database behavior was verified. When applicable, this stage generally included 30 additional minutes of run time after the fault was corrected.
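The three-phase procedure can be expressed as a small harness. This is an illustrative sketch of the methodology, not the actual test automation; inject_fault, clear_fault, and check_db are hypothetical callables, and the durations default to zero so the example runs instantly (the real tests used 15/15/30-minute phases):

```python
import time

def run_test_case(inject_fault, clear_fault, check_db,
                  baseline_s=0, fault_s=0, recovery_s=0):
    """Drive one test case through its baseline, fault, and recovery
    phases, verifying database health at the end of each phase."""
    phases = []

    phases.append("baseline")            # normal operation under load
    time.sleep(baseline_s)
    assert check_db(), "database unhealthy during baseline"

    phases.append("fault")               # inject and hold the fault
    inject_fault()
    time.sleep(fault_s)
    assert check_db(), "database went offline under fault"

    phases.append("recovery")            # correct the fault, keep running
    clear_fault()
    time.sleep(recovery_s)
    assert check_db(), "database did not recover"
    return phases

# A trivial dry run with no-op fault handlers:
assert run_test_case(lambda: None, lambda: None, lambda: True) == \
    ["baseline", "fault", "recovery"]
```

A test passes only if the health check holds in all three phases, matching the pass criterion stated in the introduction.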
Figure 5 shows the process. Before each stage of a specific test, we used the Automatic Workload Repository (AWR) functionality of the Oracle database to create a snapshot of the current condition of the
database. After the test was complete, we captured the data between the snapshots to understand the
impact of the specific fault on database performance and behavior. Finally, we monitored the CPU, IOPS,
and disk utilization on the storage controllers throughout the tests.
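Because AWR statistics are cumulative counters, per-phase metrics come from the delta between the two snapshots that bound the phase. The sketch below illustrates the arithmetic; the snapshot dictionaries are hypothetical stand-ins for DBA_HIST_SYSSTAT rows, not an actual AWR query:

```python
def interval_rate(begin_snap: dict, end_snap: dict,
                  stat: str, elapsed_s: float) -> float:
    """Compute a per-second rate for one AWR statistic over the interval
    between two snapshots (cumulative end value minus begin value)."""
    return (end_snap[stat] - begin_snap[stat]) / elapsed_s

begin = {"physical read total IO requests": 1_000_000}
end   = {"physical read total IO requests": 64_000_000}

# 63,000,000 reads over a 900 s (15-minute) phase -> 70,000 read IOPS,
# in line with the >70K IOPS the workload drove in these tests.
rate = interval_rate(begin, end, "physical read total IO requests", 900)
assert rate == 70_000.0
```

Comparing these interval rates across the baseline, fault, and recovery snapshots is what reveals the performance impact of each injected fault.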
Figure 5) Test phases.
9 Test Results
The following sections summarize the tests that were performed and report the results of each.
9.1 Loss of Single Oracle Node (TC-01)
This test case resulted in the loss of the Oracle RAC node on site B. As Figure 6 shows, this loss was
accomplished by powering off the Oracle RAC database node while it was under load. For this test, we
ran the workload for a total of 60 minutes and allowed the RAC node to be disabled for 15 minutes before
restarting it to correct the fault.
Figure 6) Loss of Oracle node.
As expected, we observed no impact on Oracle RAC functionality during this test. Also as expected, overall database performance dropped because the loss of one of the database nodes reduced the amount of I/O driven to the FAS8060 controllers on site A. The
database remained operational during the 15 minutes we allowed the test to continue in the failed state.
To correct the failure, we powered on the RAC node located at site B and observed that it was correctly
added back into the RAC environment. We then started the workload again on both RAC nodes to verify
that they were both operating correctly.
9.2 Loss of Oracle Host HBA (TC-02)
This test resulted in the loss of an HBA on one of the Oracle RAC nodes. As Figure 7 shows, this loss
was accomplished by removing the cables from an HBA on the Oracle node at site A. For this test, we ran
the workload for a total of 60 minutes and allowed the HBA to be disconnected for 15 minutes before
reconnecting it to correct the fault.
Figure 7) Loss of an Oracle server host HBA.
As expected, during this test we observed no impact on Oracle RAC functionality while the database
servers continued to drive load to the FAS8060 controllers on site A. The database remained operational
during the 15 minutes we allowed the test to continue in the failed state.
To correct the failure, we reconnected the HBA on the RAC node located at site A and verified that it
ultimately started participating in the workload again after a brief time. We observed no database errors
during this test.
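On a Linux RAC node, recovery of the failed HBA paths can be verified with the native device-mapper multipath tools and, where the NetApp Host Utilities are installed, with sanlun. This is a sketch of the kind of verification we mean, not a transcript from the test bed.

```shell
# List all multipath devices; after reconnection every LUN should again show
# its paths as "active ready running" on both HBA ports
multipath -ll

# With NetApp Linux Host Utilities installed, show LUN paths and their state
sanlun lun show -p
```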
9.3 Loss of Individual Disk (TC-03)
This test resulted in the loss of a disk on one of the storage controllers. As Figure 8 shows, this loss was
accomplished by removing one of the disks on an active data aggregate at site A. For this test, we ran the
workload for a total of 60 minutes and allowed the disk to be removed for 15 minutes before reinserting it
to correct the fault.
Figure 8) Loss of an individual disk.
As expected, during this test we observed no impact to Oracle RAC functionality and minimal impact
to overall database performance while both RAC nodes continued to drive load to the FAS8060
controllers on site A. The database remained operational during the 15 minutes we allowed the test
to continue in the failed state.
Note: NetApp RAID DP® technology can survive the failure of up to two disks per RAID group, and it
automatically reconstructs the missing data onto a spare.
9.4 Loss of Disk Shelf (TC-04)
This test resulted in the loss of an entire shelf of disks on one of the FAS8060 storage controllers. As
Figure 9 shows, this loss was accomplished by powering off one of the disk shelves at site A. For this
test, we ran the workload for a total of 60 minutes and allowed the disk shelf to be powered off for 15
minutes before reapplying power to correct the fault.
Figure 9) Loss of disk shelf.
As expected, during this test we observed no impact to Oracle RAC functionality and minimal impact
to overall database performance while both RAC nodes continued to drive load to the FAS8060
controllers on site A. The database remained operational during the 15 minutes we allowed the
test to continue in the failed state.
With the use of SyncMirror in MetroCluster, shelf failure at either site is transparent. There are two plexes,
one at each site. In normal operation, all reads are fulfilled from the local plex, and all writes are
synchronously updated on both plexes. If one plex fails, reads continue seamlessly on the remaining plex,
and writes are directed to the remaining plex. If the hardware can be powered on for recovery, the
resynchronization of the recovered plex is automatic. If the failed shelf must be replaced, the new disks
are added to the mirrored plex. Afterward, resynchronization again becomes automatic.
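Plex health and resynchronization progress after a shelf failure of this kind can be observed from the clustered Data ONTAP CLI. The commands below are a sketch (the aggregate name `aggr_data_A` is a placeholder), and exact syntax may vary by release.

```shell
# Show aggregate status; a mirrored aggregate reports its mirrored RAID status
storage aggregate show -aggregate aggr_data_A

# Show both plexes of the aggregate, including resync progress after recovery
storage aggregate plex show -aggregate aggr_data_A
```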
9.5 Loss of NetApp Storage Controller (TC-05)
This test resulted in the unplanned loss of an entire storage controller. As Figure 10 shows, this was
accomplished by powering off one of the FAS8060 storage controllers at site A. The surviving storage
controller automatically took over the workload that was initially shared evenly across both storage
controllers.
Note: The storage controller takeover and giveback process used for this test differs from the MetroCluster switchover and switchback process used in test cases TC-08 and TC-09.
For this test we ran the workload for a total of 60 minutes and allowed the controller to be powered off for
15 minutes before reapplying power to correct the fault and performing a storage controller giveback to
bring both FAS8060 controllers back on line at site A.
Figure 10) Loss of NetApp storage controller.
As expected, during this test we observed no impact to Oracle RAC functionality, but a larger impact
to overall database performance: both RAC nodes continued to drive load to the surviving FAS8060
storage controller, albeit at a lower rate because a single controller was servicing the entire workload.
After performing a storage controller giveback to rectify the failure, we allowed the test to continue for an
additional 30 minutes and observed that overall performance returned to prefailure levels. We continued
to observe no problems with the operation of the database.
9.6 Loss of Back-End Fibre Channel Switch (TC-06)
This test resulted in the loss of one of the MetroCluster FC switches. As Figure 11 shows, this loss was
accomplished by powering off one of the switches at site A. For this test, we ran the workload for a total of
60 minutes and allowed the switch to be powered off for 15 minutes before reapplying power to correct
the fault.
Figure 11) Loss of an FC switch.
As expected, during this test we observed no impact to Oracle RAC functionality and minimal impact
to overall database performance. In this case, continuous operation was maintained by automatically
redirecting all I/O to the LUN paths through the surviving switch.
After rectifying the failure by reapplying power to the switch, we allowed the test to continue for an
additional 30 minutes and observed that overall performance was maintained at prefailure levels. We
continued to observe no problems with the operation of the database.
9.7 Loss of Interswitch Link (TC-07)
This test resulted in the loss of one of the ISLs on the MetroCluster FC switches. As Figure 12 shows, this
loss was accomplished by unplugging the ISL on one of the switches at site A. For this test, we ran the
workload for a total of 60 minutes and allowed the ISL to be disconnected for 15 minutes before
reconnecting it to correct the fault.
Figure 12) Loss of ISL.
As expected, during this test we observed no impact to Oracle RAC functionality and minimal impact
to overall database performance. In this case, continuous operation was maintained by automatically
moving all I/O across the surviving paths.
After rectifying the failure by reconnecting the ISL to the switch, we allowed the test to continue for an
additional 30 minutes and observed that the overall performance was maintained at prefailure levels. We
continued to observe no problems with the operation of the database.
9.8 Maintenance Requiring Planned Switchover from Site A to Site B (TC-08)
This test resulted in the planned switchover of the FAS8060 storage controllers on site A in order to
conduct a maintenance operation. As Figure 13 shows, this was accomplished by executing a
MetroCluster switchover that changed the LUNs serving the RAC database from site A to those that were
mirrored at site B.
Figure 13) Loss of primary site for planned maintenance.
For this test, we ran the workload for a total of 60 minutes. After 15 minutes, we initiated the MetroCluster
switchover command from the FAS8060 controllers on site B. After the switchover was successfully
completed, we observed that the workload was picked up by the FAS8060 controllers at site B and that
both Oracle RAC nodes continued to operate normally without interruption.
Note: The switchover was accomplished by using a single command to switch over the entire storage resource from site A to site B while preserving the configuration and identity of the LUNs. The result was that no action, rediscovery, remapping, or reconfiguration was required from the perspective of the Oracle RAC database.
We allowed the test to continue in the switched-over state for another 15 minutes and then initiated the
MetroCluster switchback process to restore site A as the primary site for the Oracle RAC database. After
successfully completing the MetroCluster switchback process, we observed the FAS8060 in site A
resuming the processing of the workload from both RAC nodes, and the database operation continued
without interruption.
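On clustered Data ONTAP 8.3, the planned (negotiated) switchover and switchback described above map to a short command sequence run from the cluster shell on the site B cluster. The following is a sketch; exact syntax and prompts may vary by release.

```shell
# From site B: negotiated switchover of all site A storage resources
metrocluster switchover

# Monitor progress of the switchover operation
metrocluster operation show

# After maintenance is complete: heal the data aggregates, then the root aggregates
metrocluster heal -phase aggregates
metrocluster heal -phase root-aggregates

# Return operations to site A
metrocluster switchback
```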
During this test, we observed no problems with the operation of the Oracle RAC database.
9.9 Disaster Forcing Unplanned Manual Switchover from Site A to Site B (TC-09)
This test resulted in the unexpected complete loss of site A because of an unspecified disaster. As Figure
14 shows, this loss was accomplished by powering off both of the FAS8060 storage controllers and the
Oracle RAC node located at site A.
Our expectation was that the second Oracle RAC node running on site B would lose access to the
database LUNs hosted on the FAS8060 controllers on site A and shut down. After officially declaring the
loss of site A, we manually initiated a MetroCluster switchover from site A to site B and restarted the
Oracle RAC database instance on site B.
Figure 14) Loss of primary site.
For this test, we ran the workload for a total of 60 minutes. After 15 minutes, we powered off both of the
FAS8060 controllers and the Oracle RAC node on site A. We continued in this state for a total of 15
minutes. As expected, the Oracle RAC database node that was running on site B lost access to the voting
LUNs on site A and stopped working after exceeding the defined timeout period. As discussed previously,
this interruption occurred because no third-site tiebreaker service was deployed, which is the most
common configuration chosen by customers. If completely seamless DR capability is desired, it can be
achieved by adding a tiebreaker at a third site.
We then initiated the MetroCluster switchover command from the FAS8060 controllers on site B. After the
switchover was completed, we restarted the Oracle RAC node on site B and observed that it started
normally without additional manual intervention after the database LUNs were redirected to the copies
that had been mirrored to the FAS8060 controllers on site B.
To verify that the database was working, we initiated the workload from the surviving RAC node and
observed that it successfully drove IOPS to the FAS8060 controllers on site B.
We allowed the test to continue for an additional 15 minutes and then initiated the MetroCluster
switchback process to restore site A as the primary site for the Oracle RAC database. After successfully
completing the MetroCluster switchback process, we restarted the Oracle RAC server on site A and
verified that it was added back into the cluster. We then restarted the workload and verified that the
Oracle RAC nodes on site A and site B were again driving load to the FAS8060 controllers on site A.
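Because the site A controllers cannot participate in a negotiated operation after the disaster, the switchover in this scenario must be forced. The sequence below is a sketch of the corresponding 8.3 commands, run from the site B cluster shell; exact syntax may vary by release.

```shell
# From site B, after declaring the disaster: forced switchover
metrocluster switchover -forced-on-disaster true

# Once site A power is restored: heal the data aggregates, then the root aggregates
metrocluster heal -phase aggregates
metrocluster heal -phase root-aggregates

# Return operations to site A
metrocluster switchback
```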
10 Conclusion
NetApp MetroCluster software in clustered Data ONTAP 8.3 provides native continuous availability for
business-critical applications, including Oracle. Our tests demonstrated that, even under heavy
transactional workloads, Oracle databases continue to function normally during a wide variety of failure
scenarios that could otherwise cause downtime and data loss.
In addition, clustered Data ONTAP provides the following benefits:
- Nondisruptive operations leading to zero data loss
- Set-it-once simplicity
- Zero change management
- Lower cost and complexity than competitive solutions
- Seamless integration with storage efficiency, SnapMirror®, nondisruptive operations, and virtualized storage
- Unified support for both SAN and NAS
Together, these products create a winning combination for continuous data availability.
Appendix
This appendix provides detailed information about the test cases described in this document as well as
about deployment, the network, the data layout, and the list of materials used.
Detailed Test Cases
TC-01: Loss of Single Oracle Node
Test Case Details
Test case number TC-01
Test case description No single point of failure should exist in the solution. Therefore, the loss of one of the Oracle servers in the cluster was tested. This test was accomplished by halting a host in the cluster while running a test workload.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results The loss of an Oracle RAC node causes no interruption of Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at a lower rate because of the loss of one of the RAC nodes. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Halt one of the Oracle RAC servers and allow the test to continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Bring the halted server back online and verify that it is placed back into the RAC environment.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault.
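The manual AWR snapshots in steps 3 and 5 can be taken with Oracle's DBMS_WORKLOAD_REPOSITORY package. The sqlplus invocation below is a sketch of that step, not the exact script used in the test bed.

```shell
# Create an AWR snapshot from any RAC node (run as a user with DBA privileges)
sqlplus -s / as sysdba <<'EOF'
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
EXIT
EOF
```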
TC-02: Loss of Oracle Host HBA
Test Case Details
Test number TC-02
Test case description No single point of failure should exist in the solution. Therefore, the loss of an HBA on one of the Oracle servers in the cluster was tested. This test was accomplished by disconnecting the cables from an HBA on one of the Oracle RAC nodes while running a test workload.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results Removal of the HBA connection from the Oracle RAC node causes no interruption of Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at prefailure levels. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Remove the cable from the FC HBA on the Oracle RAC server on site A and allow the test to continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Reinstall the cable.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-03: Loss of Individual Disk
Test Case Details
Test number TC-03
Test case description No single point of failure should exist in the solution. Therefore, the loss of a single disk was tested. This test was accomplished by removing a disk drive from the shelf hosting the database data files on the FAS8060 running on site A while running an active workload.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results The removal of the disk drive causes no interruption of Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at prefailure levels. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Remove one of the disks in an active aggregate and allow the test to continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Reinstall the disk drive.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-04: Loss of Disk Shelf
Test Case Details
Test number TC-04
Test case description No single point of failure should exist in the solution. Therefore, the loss of an entire shelf of disks was tested. This test was accomplished by turning off both power supplies on one of the disk shelves hosting the database data files on the FAS8060 running on site A while running an active workload.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results The loss of a disk shelf causes no interruption of the Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at prefailure levels. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Turn off the power supplies on the designated disk shelf and let the test continue for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Turn on the power supplies on the affected disk shelf.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-05: Loss of NetApp Storage Controller
Test Case Details
Test number TC-05
Test case description No single point of failure should exist in the solution. Therefore, the loss of one of the FAS8060 controllers serving the database on site A was tested while an active workload was running.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results The loss of a controller of an HA pair has no impact on Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at a lower rate in the time frame while the second storage controller is halted and the surviving storage controller is handling the entire workload.
After the storage giveback process is completed, performance returns to prefailure levels because both storage controllers are again servicing the workload. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Without warning, halt one of the controllers of the FAS8060 HA pair on site A.
5. Initiate a storage takeover by the surviving node and let the test continue for 15 minutes.
6. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
7. Reboot the halted storage controller.
8. Initiate a storage giveback operation to bring the failed node back into the storage cluster.
9. Allow the test to continue for the remainder of the 60-minute duration.
10. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-06: Loss of Back-End Fibre Channel Switch
Test Case Details
Test number TC-06
Test case description No single point of failure should exist in the solution. Therefore, the loss of an entire FC switch supporting the MetroCluster cluster was tested. This test was accomplished by simply removing the power cord from one of the Brocade 6510 switches in site A while running an active workload.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results The loss of a single MetroCluster FC switch causes no interruption of the Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at prefailure levels. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Power off one of the MetroCluster Brocade 6510 switches in site A and allow the test to run for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Power on the Brocade 6510 switch.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-07: Loss of Interswitch Link
Test Case Details
Test number TC-07
Test case description No single point of failure should exist in the solution. Therefore, the loss of one of the ISLs was tested. This test was accomplished by removing the FC cable between two Brocade 6510 switches on site A and site B while running an active workload.
Test assumptions A completely operational NetApp MetroCluster cluster has been properly installed and configured.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers
Expected results The loss of one of the ISL switch links between site A and site B causes no interruption of the Oracle RAC operation. During the failure period, IOPS continue to the FAS8060 at prefailure levels. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Disconnect one of the MetroCluster ISLs.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Reconnect the affected ISL.
7. Allow the test to continue for the remainder of the 60-minute duration.
8. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-08: Maintenance Requiring Planned Switchover from Site A to Site B
Test Case Details
Test number TC-08
Test case description If there is a required maintenance window for the FAS8060 storage controllers at site A, the MetroCluster switchover feature should be capable of moving the production workload to site B and presenting the Oracle RAC database LUNs from the FAS8060 storage controllers at site B, allowing the database to continue operations. To test this premise, we initiated a MetroCluster switchover and switchback from site A to site B and then back to site A after the maintenance was complete.
Test assumptions A completely operational NetApp MetroCluster cluster has been installed and configured properly.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers on site A and site B
Expected results Moving the production operations from site A to site B by using the MetroCluster switchover operations causes no interruption of the Oracle RAC operation. After the MetroCluster switchover, IOPS are directed to the FAS8060 storage controllers at site B from both RAC nodes. After the MetroCluster switchback, IOPS are again directed at the FAS8060 storage controllers on site A. No database errors are detected.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. On site B, initiate a MetroCluster switchover of production operations and let the test continue to run in switchover mode for 15 minutes.
5. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
6. Heal the aggregates on site A.
7. Perform a MetroCluster switchback to return to normal operation.
8. Verify successful switchback.
9. Allow the test to continue for the remainder of the 60-minute duration.
10. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
TC-09: Disaster Forcing Unplanned Manual Switchover from Site A to Site B
Test Case Details
Test number TC-09
Test case description If an unplanned disaster at site A takes out the FAS8060 storage controllers and the Oracle RAC node at site A, the MetroCluster switchover feature should be capable of moving the production workload to site B and presenting the Oracle RAC database LUNs from the FAS8060 storage controllers at site B, allowing the database to continue operations.
To test this premise, we powered off the FAS8060 storage controllers and the Oracle RAC node located at site A to simulate a site failure. We then manually initiated a MetroCluster switchover and switchback from site A to site B and then back to site A after mitigating the disaster at site A.
Test assumptions A completely operational NetApp MetroCluster cluster has been properly installed and configured.
A completely operational Oracle RAC environment has been installed and configured.
The SLOB utility has been installed and configured to generate a workload consisting of 90% reads and 10% writes with a 100% random access pattern.
Test data or metrics to capture
AWR data as described in section 8, “Test Case Overview and Methodology”
IOPS, CPU, and disk utilization data from both NetApp FAS8060 controllers on site A and site B
Expected results As a result of the disaster, the FAS8060 at site A is lost, which ultimately causes the RAC node at site B to lose access to the database LUNs and stop running. Manually moving the production operations from site A to site B through the MetroCluster switchover operations allows the Oracle RAC database to be restarted by using the surviving database node.
After the MetroCluster switchover and restart of the database, IOPS are directed to the FAS8060 storage controllers at site B from the surviving RAC node on site B. After the disaster is repaired and the MetroCluster switchback is completed, the repaired Oracle RAC node on site A is restarted and added back into the database. IOPS are again directed at the FAS8060 storage controllers on site A from both Oracle RAC nodes.
Test Methodology
1. Initiate the defined workload by using the SLOB tool for a total of 60 minutes. SLOB generates an initial AWR snapshot.
2. Allow the workload to run for 15 minutes to establish consistent performance.
3. Initiate an AWR snapshot to capture database-level IOPS and latency information before the fault is injected.
4. Power off both FAS8060 storage controllers and the Oracle RAC node on site A, and wait for the RAC node on site B to stop after exceeding its timeout.
5. On site B, declare the disaster, initiate a MetroCluster switchover of production operations, restart the Oracle RAC node on site B, and restart the workload; let the test continue in switchover mode for 15 minutes.
6. Initiate an AWR snapshot to capture database-level IOPS and latency during the fault.
7. Heal the aggregates on site A and perform a MetroCluster switchback to return to normal operation.
8. Verify successful switchback, restart the Oracle RAC server on site A, and verify that it is added back into the cluster.
9. Allow the test to continue for the remainder of the 60-minute duration.
10. SLOB creates a final AWR snapshot at the end of the test to capture database-level IOPS and latency for the period after the fault is corrected.
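The methodology above maps onto a small set of ONTAP and Oracle commands. The sketch below is illustrative only: the ONTAP commands are run from the surviving (site B) cluster shell, the AWR snapshot is taken from sqlplus on a database host, and exact syntax should be confirmed against the ONTAP 8.3 MetroCluster documentation.

```shell
# Steps 3 and 5: force an AWR snapshot from sqlplus on a database host.
#   SQL> EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

# Step 4: from the site B cluster shell, switch production over to site B.
metrocluster switchover

# Step 6: after site A is repaired, heal the data and root aggregates.
metrocluster heal -phase aggregates
metrocluster heal -phase root-aggregates

# Steps 7 and 8: return operations to site A and verify the configuration.
metrocluster switchback
metrocluster show
```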
Deployment Details
This section lists the deployment details of the test architecture in Table 2 through Table 6.
Table 2) Oracle host specifications.
Oracle Hosts
Server Two Fujitsu Primergy RX300 S7 servers
Operating system Red Hat Enterprise Linux 6.5
Memory 132GB
Network interfaces eth0: 10,000Mb/sec, MTU=9,000
eth1: 10,000Mb/sec, MTU=9,000
eth2: 1,000Mb/sec, MTU=1,500
eth3: 1,000Mb/sec, MTU=1,500
HBA QLogic QLE2562 - PCI-Express dual-channel 8Gb FC HBA
Host attach kit and version NetApp Linux Host Utilities version 6.2
Multipathing Yes
SAN switches, models, and firmware
Brocade 6510, v7.0.2c
Local storage used RHEL 6.5 only
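Table 2 notes only that multipathing is enabled. For RHEL 6 with ONTAP LUNs and ALUA, a DM-Multipath device stanza along the following lines is typical; treat it as an illustrative fragment and take the authoritative settings from the NetApp Linux Host Utilities documentation for your specific ONTAP and RHEL versions.

```
# /etc/multipath.conf -- illustrative NETAPP device stanza (ALUA); not
# copied from this report's test hosts.
devices {
    device {
        vendor               "NETAPP"
        product              "LUN.*"
        path_grouping_policy group_by_prio
        prio                 "alua"
        hardware_handler     "1 alua"
        path_checker         tur
        features             "3 queue_if_no_path pg_init_retries 50"
        failback             immediate
    }
}
```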
Table 3) Oracle specifications.
Oracle
Version 11.2.0.4.0
ASM (SAN only) 11.2.0.4.0
Oracle CRS (SAN only) 11.2.0.4.0
For these tests, we set the Oracle RAC parameters misscount and disktimeout to 120 and 300
seconds, respectively. These parameters control how long the RAC nodes wait after losing
access to storage and/or network heartbeats before evicting themselves from the cluster to prevent a
potential split-brain situation. Change these values from the defaults only with a careful
understanding of the storage, network, and cluster layout.
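On Oracle 11.2, these CSS values are read and set with crsctl from the Grid Infrastructure home. The commands below are a sketch; the grid home path is an assumed location, not one stated in this report.

```shell
# Run as root; /u01/app/11.2.0/grid is an assumed Grid Infrastructure home.
export GRID_HOME=/u01/app/11.2.0/grid

# Inspect the current values before changing anything.
$GRID_HOME/bin/crsctl get css misscount
$GRID_HOME/bin/crsctl get css disktimeout

# Apply the values used in these tests (seconds).
$GRID_HOME/bin/crsctl set css misscount 120
$GRID_HOME/bin/crsctl set css disktimeout 300
```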
Table 4) Kernel parameters.
Kernel Parameters: /etc/sysctl.conf File
kernel.sem 250 32000 100 128
kernel.shmmni 4096
net.ipv4.ip_local_port_range 6815744
net.core.rmem_default 4194304
net.core.rmem_max 16777216
net.core.wmem_default 262144
net.core.wmem_max 16777216
net.ipv4.ipfrag_high_thresh 524288
net.ipv4.ipfrag_low_thresh 393216
net.ipv4.tcp_rmem 4096 524288 16777216
net.ipv4.tcp_wmem 4096 524288 16777216
net.ipv4.tcp_timestamps 0
net.ipv4.tcp_sack 0
net.ipv4.tcp_window_scaling 1
net.core.optmem_max 524287
net.core.netdev_max_backlog 2500
sunrpc.tcp_slot_table_entries 128
sunrpc.udp_slot_table_entries 128
net.ipv4.tcp_mem 16384 16384 16384
fs.file-max 6815744
fs.aio-max-nr 1048576
net.ipv4.tcp_no_metrics_save 1
net.ipv4.tcp_moderate_rcvbuf 0
vm.swappiness 0
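Long sysctl listings like the one in Table 4 are easy to mistranscribe, and the kernel silently applies only the last occurrence of a duplicated key. A quick shell check can flag any parameter defined more than once; the file path and sample contents below are illustrative, not taken from the test hosts.

```shell
# Write an illustrative sysctl.conf fragment containing a duplicated key.
cat > /tmp/sysctl_sample.conf <<'EOF'
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
net.core.rmem_max = 16777216
EOF

# Print any parameter name that appears more than once.
awk -F'[= ]' 'NF { print $1 }' /tmp/sysctl_sample.conf | sort | uniq -d
# → kernel.sem
```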
Table 5) Oracle initiation file parameters.
Oracle init.ora Parameters
MCCDB2.__db_cache_size 3G
MCCDB1.__db_cache_size 3G
MCCDB1.__java_pool_size 67108864
MCCDB1.__large_pool_size 83886080
MCCDB2.__oracle_base '/u01/app/oracle' # ORACLE_BASE set from environment
MCCDB1.__oracle_base '/u01/app/oracle' # ORACLE_BASE set from environment
MCCDB2.__pga_aggregate_target 300M
MCCDB1.__pga_aggregate_target 419430400
MCCDB2.__sga_target 4G
MCCDB1.__sga_target 4294967296
MCCDB2.__shared_io_pool_size 0
MCCDB1.__shared_io_pool_size 0
MCCDB2.__shared_pool_size 300M
MCCDB1.__shared_pool_size 922746880
MCCDB2.__streams_pool_size 0
MCCDB1.__streams_pool_size 0
*.audit_file_dest '/u01/app/oracle/admin/MCCDB/adump'
*.audit_trail 'db'
*.cluster_database TRUE
*.compatible '11.2.0.4.0'
*.control_files '+FRA/MCCDB/control01.ctl','+FRA/MCCDB/control02.ctl'
*.db_block_size 8192
*.db_domain ''
*.db_name 'MCCDB'
*.db_writer_processes 20
*.diagnostic_dest '/u01/app/oracle'
*.dispatchers '(PROTOCOL=TCP) (SERVICE=MCCDBXDB)'
MCCDB1.instance_number 1
MCCDB2.instance_number 2
*.log_buffer 102400000
*.open_cursors 300
*.pga_aggregate_target 400M
*.processes 1500
*.remote_listener 'rac-mcc:1521'
*.remote_login_passwordfile 'exclusive'
*.sessions 1655
*.sga_target 4294967296
MCCDB2.thread 2
MCCDB1.thread 1
MCCDB2.undo_tablespace 'UNDOTBS2'
MCCDB1.undo_tablespace 'UNDOTBS1'
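The settings in Table 5 can be confirmed as active on both instances from either RAC node. The following is a sketch (instance names from Table 5; output not shown):

```shell
sqlplus / as sysdba <<'EOF'
-- Confirm a global parameter on the local instance.
SHOW PARAMETER db_writer_processes

-- Compare per-instance settings across both RAC nodes.
SELECT inst_id, name, value
  FROM gv$parameter
 WHERE name IN ('sga_target', 'undo_tablespace');
EOF
```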
Table 6) NetApp storage specifications.
NetApp Storage
Model Four FAS8060 storage systems (2 two-node clusters)
Number of disks 192
Size of disks 838.36GB
Drive type SAS
Shelf type DS2246
Number of shelves 8
Operating system Data ONTAP 8.3RC1
Flash Cache™ 1TB
Network interface card (NIC) Dual 10GbE controller IX1-SFP+
Target HBA QLogic 8324 (ports 2a, 2b)
Back-end switches Four Brocade 6510
Kernel: 2.6.14.2
Fabric OS: v7.0.2c
Made on: Fri Feb 22 21:29:23 2013
Flash: Mon Nov 4 18:39:15 2013
BootProm: 1.0.9
Software NFS, CIFS, FCP, FlexClone®, OnCommand® Balance
Network
Table 7, Table 8, and Table 9 list the network details.
Table 7) Server network specifications.
Hostname Interface IP Address Speed MTU Purpose
stlrx300s7-85 eth0 172.20.160.100 10Gb/s 9,000 RAC interconnect
eth0:1 169.254.76.209 10Gb/s 9,000
eth2 10.61.164.204 1Gb/s 1,500 Public
eth2:1 10.61.164.138 1Gb/s 1,500 Public VIP
eth2:2 10.61.164.140 1Gb/s 1,500 Mgmt
eth2:3 10.61.164.142 1Gb/s 1,500 Mgmt
stlrx300s7-87 eth0 172.20.160.102 10Gb/s 9,000 RAC interconnect
eth0:1 169.254.180.210 10Gb/s 9,000
eth2 10.61.164.206 1Gb/s 1,500 Public
eth2:1 10.61.164.141 1Gb/s 1,500 Public VIP
eth2:5 10.61.164.139 1Gb/s 1,500 Public VIP
Table 8) Storage network specifications.
SVM LIF Node Port IP Address Speed MTU Role
Cluster stl-mcc-01-01_clus1 stl-mcc-01-01 e0a 169.254.228.130 10Gb 9,000 Cluster
Cluster stl-mcc-01-01_clus2 stl-mcc-01-01 e0c 169.254.183.28 10Gb 9,000 Cluster
Cluster stl-mcc-01-02_clus1 stl-mcc-01-02 e0a 169.254.32.214 10Gb 9,000 Cluster
Cluster stl-mcc-01-02_clus2 stl-mcc-01-02 e0c 169.254.235.240 10Gb 9,000 Cluster
Stl-mcc-01 cluster_mgmt stl-mcc-01-01 e0i 10.61.164.172 1Gb 1,500 Cluster mgmt
Stl-mcc-01 stl-mcc-01-01_icl1 stl-mcc-01-01 e0b 10.61.164.176 10Gb 1,500 Intercluster
Stl-mcc-01 stl-mcc-01-01_mgmt1 stl-mcc-01-01 e0i 10.61.164.170 1Gb 1,500 Node mgmt
Stl-mcc-01 stl-mcc-01-02_icl1 stl-mcc-01-02 e0b 10.61.164.177 10Gb 1,500 Intercluster
Stl-mcc-01 stl-mcc-01-02_mgmt1 stl-mcc-01-02 e0i 10.61.164.171 1Gb 1,500 Node mgmt
Table 9) FC back-end switches.
Hostname IP Address
FC_switch_A1 10.61.164.166
FC_switch_A2 10.61.164.167
FC_switch_B1 10.61.164.168
FC_switch_B2 10.61.164.169
Data Layout
Figure 15 and Figure 16 show the layout of the data.
Figure 15) Aggregate and volume layouts and sizes.
Figure 16) Volume and LUN layouts for site A.
Materials List
Table 10 lists the materials used in the testing.
Table 10) Materials list for testing.
Quantity Description
2 HA pairs of FAS8060 (total 4 nodes)
4 Brocade 6510 switches for back-end MC SAN
4 FC/SAS bridges
8 DS2246 disk shelves with 900GB SAS drives
Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product and feature versions described in this document are supported for your specific environment. The NetApp IMT defines the product components and versions that can be used to construct configurations that are supported by NetApp. Specific results depend on each customer's installation in accordance with published specifications.
Trademark Information
NetApp, the NetApp logo, Go Further, Faster, ASUP, AutoSupport, Campaign Express, Cloud ONTAP, Customer Fitness, Data ONTAP, DataMotion, Fitness, Flash Accel, Flash Cache, Flash Pool, FlashRay, FlexArray, FlexCache, FlexClone, FlexPod, FlexScale, FlexShare, FlexVol, FPolicy, GetSuccessful, LockVault, Manage ONTAP, Mars, MetroCluster, MultiStore, NetApp Insight, OnCommand, ONTAP, ONTAPI, RAID DP, SANtricity, SecureShare, Simplicity, Simulate ONTAP, Snap Creator, SnapCopy, SnapDrive, SnapIntegrator, SnapLock, SnapManager, SnapMirror, SnapMover, SnapProtect, SnapRestore, Snapshot, SnapValidator, SnapVault, StorageGRID, Tech OnTap, Unbound Cloud, and WAFL are trademarks or registered trademarks of NetApp, Inc., in the United States and/or other countries. A current list of NetApp trademarks is available on the Web at http://www.netapp.com/us/legal/netapptmlist.aspx.
Cisco and the Cisco logo are trademarks of Cisco in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-4396-0415
Copyright Information
Copyright © 1994–2015 NetApp, Inc. All rights reserved. Printed in the U.S. No part of this document covered by copyright may be reproduced in any form or by any means—graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval system—without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice. NetApp assumes no responsibility or liability arising from the use of products described herein, except as expressly agreed to in writing by NetApp. The use or purchase of this product does not convey a license under any patent rights, trademark rights, or any other intellectual property rights of NetApp.
The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).