© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Cluster Extension for XP and EVA
2007
Dankwart Medger – Trident Consulting S.L.
CLX/Cluster overview
Protection level (Distance)
• Wide variety of interconnect options
• Regional or wide-area protection
• Supports local to global Disaster Tolerant solutions

Data currency (Recovery Point Objective)
• Synchronous or asynchronous options available
• Data consistency is always assured

Performance requirements
• Asynchronous Continuous Access provides minimum latency across extended distances
• Performance depends on bandwidth to the remote data center

Failover time (Recovery Time Objective)
• Manual failover to secondary site
• Fully automated failover with geographically dispersed clusters on HP-UX, Solaris, AIX, Linux, Windows
Disaster Tolerant Design Considerations
[Diagram: local server cluster running App A and App B]

Server Clustering
Purpose
• Protect against failures at the host level
− Server failure
− Some infrastructure failures
• Automated failover, incl. necessary arbitration
• Local distances

Limits
• Does not protect against
− Site disaster
− Storage failure
− Core infrastructure failure
• A major disaster can mean a full restore from tape
− Tapes should therefore be stored off-site
Storage Replication
Purpose
• Copy of your data in a remote site
• In case of a major disaster on the primary site
− No tape restore necessary
− Data still available on remote site
− Operation can be resumed on remote site
• Long distances through FC extension technologies and async replication technologies

Limits
• Human intervention required to resume operation on remote site
• Standby system difficult to maintain

[Diagram: cluster running App A and App B in the primary data center, with array-based replication over a WAN to a remote site]
The solution
Cluster Extension/Metrocluster combines the remote replication capabilities of the EVA and XP with the automated failover capabilities of a standard server cluster to build a failover cluster spanning two data centers.

• Benefits
− Fully automated application failover even in case of site or storage failure
• No manual intervention
• No server reboots, no presentation changes, no SAN changes
− Intelligent failover decision based on status checking and user settings
• Not a simple failover script
− Integrated into the standard OS cluster solution
• No change to how you manage your cluster today
− Host IO limited to the local array
• Reduces intersite traffic, enabling long-distance, low-bandwidth setups
Cluster Extension – the goal
[Diagram: two data centers connected by Continuous Access EVA/XP replication; App A and App B fail over between sites, automated by CLX, with an arbitrator node* at a third location]

*Type of arbitrator depends on the cluster.
Automated failover solutions – availability for all major platforms

OS (cluster)     HP – XP                                          HP – EVA
HP-UX (MC/SG)    Metrocluster: sync & async, journaling CA        Metrocluster: sync CA (future: async)
Windows (MSCS)   Cluster Extension: sync & async, journaling CA   Cluster Extension: sync CA (future: async)
Solaris (VCS)    Cluster Extension: sync & async, journaling CA   future
AIX (HACMP)      Cluster Extension: sync & async, journaling CA   future
Linux (MC/SG)    Cluster Extension: sync & async, journaling CA   Cluster Extension: sync CA (future: async)
VMware           future                                           future
Cluster Extension for Windows

Array support
• CLX XP: XP48/128/512/1024/10000/12000
• CLX EVA: EVA 3000/4000/5000/6000/8000

OS support
• CLX XP: Windows 2000 Advanced Server and Datacenter Edition; Windows 2003 Server Standard/Enterprise (32/64-bit) and Datacenter Edition (64-bit); Windows 2003 Server Standard/Enterprise x64 Edition
• CLX EVA: Windows 2003 Server Standard/Enterprise (32/64-bit) and Datacenter Edition (64-bit); Windows 2003 Server Standard/Enterprise x64 Edition

Cluster support
• Both: MS Cluster (MS certified as a geographically dispersed cluster solution)

Replication technology
• CLX XP: Continuous Access XP (synchronous, asynchronous, journaling)
• CLX EVA: Continuous Access EVA (synchronous; asynchronous planned)

Distance, inter-site technology
• CLX XP: no CLX-specific limits; must stay within cluster and replication limitations
• CLX EVA: 500 km with no more than 20 ms round-trip latency

Arbitration
• CLX XP: CLX Quorum Filter Service (Windows 2000/2003) or MS Majority Node Set (Windows 2003, including file share witness)
• CLX EVA: MS Majority Node Set (including file share witness)

Licensing
• Both: licensed per cluster node
Cluster integration example: CLX for Windows

[Diagram: MSCS resource group with File Share, Network Name, IP Address, and Physical Disk resources; the Physical Disk resources sit on top of a CLX resource]

− All Physical Disk resources of one Resource Group depend on a CLX resource
− Very smooth integration

Example taken from CLX EVA
CLX EVA Resource Parameters
• Cluster node – data center location
• Failover behavior setting
• SMI-S communication settings
• SMA – data center location
• DR Group for which the CLX resource is responsible
− All dependent disk resources (Vdisks) must belong to that DR Group
− This field must contain the full DR Group name including the "\Data Replication\" folder and is case sensitive (see the sketch below)
• Data concurrence settings
• EVA – data center location
• Pre/Post Exec Scripts
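As an illustration of the DR Group name rule above, here is a minimal, hypothetical Python check. The function name and the validation logic are assumptions made for this sketch; CLX itself does not ship such a helper.

```python
# Hypothetical sanity check for the DR Group name entered in a CLX EVA
# resource: per the slide, it must be the full name including the
# "\Data Replication\" folder, and the comparison is case sensitive.

DR_GROUP_PREFIX = "\\Data Replication\\"

def is_valid_dr_group_name(name: str) -> bool:
    # Case-sensitive on purpose: CLX treats the DR Group name as such.
    return name.startswith(DR_GROUP_PREFIX) and len(name) > len(DR_GROUP_PREFIX)

# The group name below is made up for the example.
print(is_valid_dr_group_name("\\Data Replication\\AppA_DRGroup"))   # True
print(is_valid_dr_group_name("\\data replication\\AppA_DRGroup"))   # False (wrong case)
print(is_valid_dr_group_name("AppA_DRGroup"))                       # False (folder missing)
```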
CLX XP Resource Parameters
• Cluster node – data center location
• Failover behavior setting
• CA resync setting
• Pre/Post Exec Scripts
• Fence Level settings
• XP arrays and Raidmanager Library instances
• Device Group managed by this CLX resource
− All dependent disk resources must belong to that Device Group
Cluster Arbitration and CLX for Windows
Local Microsoft Cluster – Shared Quorum disk

[Diagram: two-node local cluster running App A and App B, sharing a quorum disk and application disks]

Traditional MSCS uses
− a shared Quorum disk to
• keep the quorum log
• keep a copy of the cluster configuration
• propagate registry checkpoints
• arbitrate if LAN connectivity is lost
− shared application disks to
• store the application data
Challenges with dispersed MSCS
• Managing data disks
− Check data disk pairs on failover
− Allow data disk failover only if the data is current and consistent
• Managing the quorum disk (for a traditional shared quorum cluster)
− Mirror the quorum disk to the remote disk array
− Implement the quorum disk pair and keep the challenge/defense protocol working as if it were a single shared resource
− Filter SCSI Reserve/Release/Reset and any necessary IO commands without performance impact
− Prevent split-brain phenomena
Majority Node Set Quorum (1)
• New quorum mechanism introduced with Windows 2003
• Shared application disks
− store the application data
• Quorum data on local disk
− used to keep a copy of the cluster configuration
− synchronized by the Cluster Service
− no common quorum log and no common cluster configuration available => changes to the cluster configuration are only allowed when a majority of nodes is online and can communicate

[Diagram: two-node cluster running App A and App B; quorum data resides on each node's local disk]
Majority Node Set Quorum (2)
• MNS arbitration rule:
− In case of a failure, the cluster will survive if a majority of nodes is still available
− In case of a split-site situation, the site with the majority will survive
− Only nodes which belong to the majority are allowed to keep the cluster service up and can run applications. All others will shut down the cluster service.

[Diagram: split between the two sites; the partition holding the majority keeps running App A and App B]

The majority is defined as:
(<number of nodes configured in the cluster> / 2) + 1 (integer division)
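To make the arithmetic concrete, here is a small Python sketch (an illustration only, not part of CLX or MSCS) that computes the majority threshold and the number of node failures a plain MNS cluster tolerates; it reproduces the table shown two slides further on.

```python
# Majority Node Set arithmetic: a partition survives only if it still
# holds a majority of the nodes configured in the cluster.

def mns_majority(n_nodes: int) -> int:
    # Majority = floor(n/2) + 1, as defined on the slide.
    return n_nodes // 2 + 1

def max_tolerable_failures(n_nodes: int) -> int:
    # A plain MNS cluster keeps running as long as a majority remains.
    return n_nodes - mns_majority(n_nodes)

for n in range(2, 9):
    print(f"{n} nodes: majority = {mns_majority(n)}, "
          f"tolerates {max_tolerable_failures(n)} node failure(s)")
# Matches the table: 2->0, 3->1, 4->1, 5->2, 6->2, 7->3, 8->3
```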
Majority Node Set Quorum (3)

[Diagram: node failure in the MNS cluster; App A and App B move to the surviving majority]
Majority Node Set Quorum (3)

[Diagram: the minority partition shuts down its cluster service; App A and App B run on the majority side]

Node failures a plain MNS cluster survives:
# cluster nodes   # node failures
2                 0
3                 1
4                 1
5                 2
6                 2
7                 3
8                 3
Majority Node Set Quorum (4) – File Share Witness
• What is it?
− A patch for Windows 2003 SP1 clusters provided by Microsoft (KB921181)
• What does it do?
− Allows the use of a simple file share to provide a vote for an MNS quorum-based 2-node cluster
− In addition to introducing the file share witness concept, this patch also introduces a configurable cluster heartbeat
• What are the benefits?
− The "arbitrator" node is no longer a full cluster member.
• A simple file share can be used to provide this vote.
• No single-subnet requirement for the network connection to the arbitrator.
− One arbitrator can serve multiple clusters. However, you have to set up a separate share for each cluster.
− The arbitrator exposing the share can be
• a standalone server
• a different OS architecture (e.g. a 32-bit Windows server providing a vote for an IA64 cluster)
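A rough Python sketch of the resulting vote count (an illustrative model of the concept described above, not Microsoft's implementation): the share contributes one extra vote, so a 2-node cluster has three votes in total, and a lone survivor that can still reach the witness keeps a majority.

```python
# File share witness voting model for an MNS cluster: the witness share
# adds one vote on top of the node votes; a partition survives only if
# it holds a majority of all votes.

def partition_survives(nodes_alive: int, total_nodes: int,
                       witness_reachable: bool) -> bool:
    total_votes = total_nodes + 1            # nodes + file share witness
    majority = total_votes // 2 + 1
    votes = nodes_alive + (1 if witness_reachable else 0)
    return votes >= majority

# 2-node cluster, one node fails, survivor still reaches the witness:
print(partition_survives(1, 2, True))    # True  -> 1 node failure tolerated
# Same failure, but the witness is unreachable too (or not configured):
print(partition_survives(1, 2, False))   # False -> cluster service stops
```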
Majority Node Set Quorum (5) – File Share Witness

[Diagram: 2-node cluster running App A and App B; each node can get a vote from \\arbitrator\share]

# cluster nodes   # node failures
2                 0 (1 with MNS file share witness)
3                 1
4                 1
5                 2
6                 2
7                 3
8                 3
Majority Node Set Quorum (6) – File Share Witness

[Diagram: one arbitrator serving two clusters — Cluster 1 has the MNS private property MNSFileShare = \\arbitrator\share1, Cluster 2 has MNSFileShare = \\arbitrator\share2]
File Share Witness – Prerequisites
• Cluster
− Windows 2003 SP1 & R2 (x86, x64, IA64*, EE and DC)
− 2-node MNS quorum-based cluster
• The property will be ignored for >2-node clusters
• Arbitrator
− OS requirements
• Windows 2003 SP1 or later
− MS did not test earlier/other OS versions, even though they should work
• A server OS is recommended for availability and security
− File share requirements
• One file share for each cluster for which the arbitrator provides a vote
• 5 MB per share is sufficient
− The external share does not store the full state of the cluster configuration. Instead, it contains only data sufficient to help prevent split-brain syndrome and to help detect a partition in time.
• The Cluster Service account requires read/write permission
• For highest availability, you might want to create a clustered file share/file server

*There is no Windows Server 2003 R2 release for IA64 (Itanium)
File Share Witness/Arbitrator – what does it mean for CLX?
Remember: the file share witness only works with 2-node clusters.

Arbitrator node requirements:

Cluster membership
• Traditional MNS: the arbitrator is a full additional cluster member and has full cluster configuration information.
• MNS with file share witness: the arbitrator is external to the cluster and has only minimal cluster configuration information.

Operating system
• Traditional MNS: same Windows version as the other cluster nodes; e.g. for an IA64 cluster, the arbitrator has to be an IA64 server as well.
• MNS with file share witness: can be a different Windows version; e.g. a 32-bit file share witness (arbitrator) can serve a 64-bit cluster.

Hardware
• Traditional MNS: determined by the OS. The arbitrator can be a smaller, less powerful machine than the main nodes.
• MNS with file share witness: determined by the OS. The file share server can be a smaller, less powerful machine than the main nodes. Due to the less strict OS requirements, the hardware selection is also more flexible.

Multiple clusters
• Traditional MNS: one arbitrator node per cluster.
• MNS with file share witness: one arbitrator can serve multiple clusters.

Location
• Both: a third site.

Network requirements
• Traditional MNS: single subnet. Should NOT depend on a network route that runs (physically) through one data center in order to reach the other.
• MNS with file share witness: can be a routed network (different subnets). Should NOT depend on a network route that runs (physically) through one data center in order to reach the other.
CLX XP Quorum Filter Service (QFS)
• Component of CLX XP for Windows
− Required for Windows 2000
− Optional for Windows 2003 (can also use MNS)
• QFS provides some benefits over MNS
• Functionality
− Allows the use of a Microsoft shared quorum cluster across two data centers and XP arrays
− Implements filter drivers that intercept quorum arbitration commands and uses additional CA pairs to make the cross-site decision
− "External arbitrator" for automated failover even in case of a full site failure or split
CLX XP on Windows – LAN split
[Diagram: LAN split between the two data centers; the quorum arbitration commands (CTRL1–CTRL3) are filtered and the quorum disk pair stays reserved by the left node, so the left site keeps the cluster and App A and App B keep running there]
CLX XP on Windows – site failure
[Diagram: full failure of the left site; the external arbitrator (cdm) lets the surviving right node take over the quorum reservation, and App A and App B are restarted there]
Majority Node Set vs CLX XP Quorum Filter Service

Majority Node Set
• Pros
− Solution owned by Microsoft
− Works with both CLX EVA and XP
− MS preferred solution for geographically dispersed clusters
− Most likely the CLX solution going forward
• Cons
− Requires a symmetric node setup
− For >2 nodes, one additional node per cluster is required
− Will only survive with a majority of nodes
− A forced majority requires another downtime to reform the original cluster
− Windows 2003 only

CLX XP Quorum Filter Service
• Pros
− Shared quorum, hence can survive node failures down to a single node
− Allows asymmetric cluster setups
− Windows 2000 and Windows 2003
• Cons
− More intrusive
− More difficult to maintain across Service Packs and other updates
− A full site failure or split will first result in a cluster service shutdown before the external arbitrator kicks in (if the quorum disk was in the remote data center)

Recommended quorum mechanism for new installs: Majority Node Set.
Manual vs Automated failover – failover times
Automated failover
Question: "How long does a CLX failover take?"
Answer: "It depends!"

A CLX failover is first of all still a cluster failover.
− There are components influencing the total application failover time which are outside CLX's control:
• failure recognition, cluster arbitration, application startup
− The CLX component of the total failover time also depends on many factors.
Factors affecting the automated failover times in a CLX for Windows cluster

Phases: recognize failure → cluster arbitration → CLX failover → start application on other node

Time to recognize failure
• Resource failures
− resource timeouts (e.g. disk timeouts)
− potentially resource restarts until the cluster moves the group to another node
• Node(s), network or whole DC failure
− cluster heartbeat timeouts to decide that communication to node(s) is lost

Time for cluster arbitration
• Only happens in case of a node or network failure
• Time depends on network latency, heartbeat settings and cluster size

Time for CLX to fail over replication (5 sec – 5 min)
• Type of component that fails
− node failure
− storage failure
− Management Server failure
− intersite communication

Time to start application on surviving node
• Application type and size
• Required recovery steps (e.g. log replay)
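As a back-of-the-envelope illustration, the total automated failover time is simply the sum of these four phases. In the Python sketch below all numbers are assumed examples; only the 5 sec – 5 min range for the CLX phase comes from the slide.

```python
# Toy model: total automated failover time = sum of the four phases.
# All values are assumed examples, not measured CLX figures.
phases_seconds = {
    "recognize failure":        60,   # resource/heartbeat timeouts
    "cluster arbitration":      30,   # only on node or network failure
    "CLX replication failover": 120,  # slide range: 5 s - 5 min
    "application startup":      300,  # e.g. database log replay
}

total = sum(phases_seconds.values())
for phase, seconds in phases_seconds.items():
    print(f"{phase:26s} {seconds:4d} s")
print(f"{'total':26s} {total:4d} s (~{total / 60:.1f} min)")
```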
Manual vs. automated failover times
• A typical example of manual failover is a stretched cluster
− Cluster stretched across two sites, but all nodes access the same array, which replicates the data to a remote partner.
− Even in case of a node failure, the primary storage array will be used (across a remote link if the node is in the remote data center).
− A storage or site failure will bring down the cluster, requiring manual intervention to start the cluster from the remote array.
• Steps involved in case of a storage failure (summed in the sketch below)
− Notification of operator (15 min*)
− Evaluation of the situation and necessary recovery steps (30 min*)
− Shutdown of surviving cluster nodes (5 min*)
− Replication failover (2 min*)
− Startup of surviving cluster nodes (10 min*)
− The effort multiplies with the number of applications/clusters/arrays being affected

*Times are just examples and will vary depending on the situation and setup. A full site disaster, for instance, might involve much more troubleshooting and evaluation time.
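The 62-minute total on the next slide is simply the sum of the example step times above; a quick Python check using the slide's placeholder numbers:

```python
# Manual failover example: the per-step times (minutes) from the slide
# sum to the 62-minute total shown on the following chart.
manual_steps_min = {
    "notify operator": 15,
    "evaluate situation and recovery steps": 30,
    "shut down surviving cluster nodes": 5,
    "replication failover": 2,
    "start up surviving cluster nodes": 10,
}
print(sum(manual_steps_min.values()), "minutes")  # 62 minutes
```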
Manual vs. automated failover times – single cluster

[Chart: timeline comparison. Manual failover: notification of operator, evaluation of the situation and necessary recovery steps, shutdown, failover, start servers – total = 62 min*. Automated (CLX) failover: recognize failure, cluster arbitration, CLX failover, application startup – total = 10 min*.]

*Times are just examples and will vary depending on the situation and setup.
Manual vs. automated failover times – multiple clusters

[Chart: with three affected clusters, the manual steps are repeated for cluster 1, cluster 2 and cluster 3 – total = 96 min*; the automated (CLX) failovers of all three clusters run concurrently – total = 10 min*.]

*Times are just examples and will vary depending on the situation and setup.
Other advantages
• CLX helps to avoid human mistakes
− Manual failover operations introduce the risk of making mistakes, such as failing over the wrong DR Group.
• CLX simplifies planned failover for maintenance
− Similar to a disaster failover, just faster
− A manual failover still requires the same steps besides the notification and evaluation
• Failback is as simple as a failover
− Once the primary site is restored, it's just another cluster failover
− Manual failback is as complex and intrusive as a maintenance failover
External information
• Cluster Extension EVA
− http://h18006.www1.hp.com/products/storage/software/ceeva/index.html
• Cluster Extension XP
− http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/cluster/index.html
• Metrocluster EVA
− http://h71028.www7.hp.com/enterprise/cache/108988-0-0-0-121.html
• Metrocluster XP
− http://h71028.www7.hp.com/enterprise/cache/4181-0-0-0-121.html
• Disaster-Tolerant Solutions Library (Solutions for Serviceguard)
− http://h71028.www7.hp.com/enterprise/cache/4190-0-0-0-121.html
• Continental Cluster
− http://h71028.www7.hp.com/enterprise/cache/4182-0-0-0-121.html
• CA EVA
− http://h18006.www1.hp.com/products/storage/software/conaccesseva/index.html
• CA XP
− http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/continuousaccess/index.html
• CLX EVA migration whitepaper and Exchange replication whitepaper (incl. CLX)
− http://h18006.www1.hp.com/storage/arraywhitepapers.html