© 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Cluster Extension for XP and EVA
2007
Dankwart Medger – Trident Consulting S.L.
CLX/Cluster overview
Protection level (Distance)
• Wide variety of interconnect options
• Regional or wide-area protection
• Supports local to global Disaster Tolerant solutions

Data currency (Recovery Point Objective)
• Synchronous or asynchronous options available
• Data consistency is always assured

Performance requirements
• Asynchronous Continuous Access provides minimum latency across extended distances
• Performance depends on bandwidth to the remote data center

Failover time (Recovery Time Objective)
• Manual failover to secondary site
• Fully automated failover with geographically dispersed clusters on HP-UX, Solaris, AIX, Linux, Windows
Disaster Tolerant Design Considerations
[Diagram: local server cluster running App A and App B]

Server Clustering
Purpose
• Protect against failures at the host level
− Server failure
− Some infrastructure failures
• Automated failover, incl. necessary arbitration
• Local distances

Limits
• Does not protect against
− Site disaster
− Storage failure
− Core infrastructure failure
• A major disaster can mean a full restore from tape
− Tapes should therefore be stored off-site
Storage Replication
Purpose
• Copy of your data in a remote site
• In case of a major disaster on the primary site
− No tape restore necessary
− Data still available on remote site
− Operation can be resumed on remote site
• Long distances through FC extension technologies and async replication technologies

Limits
• Human intervention required to resume operation on remote site
• Standby system difficult to maintain

[Diagram: cluster running App A and App B in the primary data center, with array-based replication over a WAN to a remote site]
The solution
Cluster Extension/Metrocluster combines the remote replication capabilities of the EVA and XP with the automated failover capabilities of a standard server cluster to build a failover cluster spanning two data centers.

• Benefits
− Fully automated application failover even in case of site or storage failure
• No manual intervention
• No server reboots, no presentation changes, no SAN changes
− Intelligent failover decision based on status checking and user settings
• Not a simple failover script
− Integrated into the standard OS cluster solution
• No change to how you manage your cluster today
− Host IO limited to the local array
• Reduces intersite traffic, enabling long-distance, low-bandwidth setups
Cluster Extension – the goal
[Diagram: two data centers connected by Continuous Access EVA/XP replication; App A and App B fail over between sites, automated by CLX, with an arbitrator node* at a third location]

*Type of arbitrator depends on the cluster.
Automated failover solutions – availability for all major platforms

OS (cluster)     HP – XP                                          HP – EVA
HP-UX (MC/SG)    Metrocluster: sync & async, journaling CA        Metrocluster: sync CA (future: async)
Windows (MSCS)   Cluster Extension: sync & async, journaling CA   Cluster Extension: sync CA (future: async)
Solaris (VCS)    Cluster Extension: sync & async, journaling CA   future
AIX (HACMP)      Cluster Extension: sync & async, journaling CA   future
Linux (MC/SG)    Cluster Extension: sync & async, journaling CA   Cluster Extension: sync CA (future: async)
VMware           future                                           future
Cluster Extension for Windows

Array support
• CLX XP: XP48/128/512/1024/10000/12000
• CLX EVA: EVA 3000/4000/5000/6000/8000

OS support
• CLX XP: Windows 2000 Advanced Server and Datacenter Edition; Windows 2003 Server Standard/Enterprise (32/64-bit) and Datacenter Edition (64-bit); Windows 2003 Server Standard/Enterprise x64 Edition
• CLX EVA: Windows 2003 Server Standard/Enterprise (32/64-bit) and Datacenter Edition (64-bit); Windows 2003 Server Standard/Enterprise x64 Edition

Cluster support
• Both: MS Cluster (MS certified as a geographically dispersed cluster solution)

Replication technology
• CLX XP: Continuous Access XP (synchronous, asynchronous, journaling)
• CLX EVA: Continuous Access EVA (synchronous; asynchronous planned)

Distance, inter-site technology
• CLX XP: no CLX-specific limits; must stay within cluster and replication limitations
• CLX EVA: 500 km with no more than 20 ms round-trip latency

Arbitration
• CLX XP: CLX Quorum Filter Service (Windows 2000/2003) or MS Majority Node Set (Windows 2003, including file share witness)
• CLX EVA: MS Majority Node Set (including file share witness)

Licensing
• Both: licensed per cluster node
Cluster integration example: CLX for Windows

[Diagram: MSCS resource group with File Share, Network Name, IP Address, and Physical Disk resources; the Physical Disk resources sit on top of a CLX resource]

− All Physical Disk resources of one Resource Group depend on a CLX resource
− Very smooth integration

Example taken from CLX EVA
CLX EVA Resource Parameters
• Cluster node – data center location
• Failover behavior setting
• SMI-S communication settings
• SMA – data center location
• DR Group for which the CLX resource is responsible
− All dependent disk resources (Vdisks) must belong to that DR Group
− This field must contain the full DR Group name including the "\Data Replication\" folder and is case sensitive (see the sketch below)
• Data concurrence settings
• EVA – data center location
• Pre/Post Exec Scripts
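As an illustration of the DR Group name rule above, here is a minimal, hypothetical Python check. The function name and the validation logic are assumptions made for this sketch; CLX itself does not ship such a helper.

```python
# Hypothetical sanity check for the DR Group name entered in a CLX EVA
# resource: per the slide, it must be the full name including the
# "\Data Replication\" folder, and the comparison is case sensitive.

DR_GROUP_PREFIX = "\\Data Replication\\"

def is_valid_dr_group_name(name: str) -> bool:
    # Case-sensitive on purpose: CLX treats the DR Group name as such.
    return name.startswith(DR_GROUP_PREFIX) and len(name) > len(DR_GROUP_PREFIX)

# The group name below is made up for the example.
print(is_valid_dr_group_name("\\Data Replication\\AppA_DRGroup"))   # True
print(is_valid_dr_group_name("\\data replication\\AppA_DRGroup"))   # False (wrong case)
print(is_valid_dr_group_name("AppA_DRGroup"))                       # False (folder missing)
```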
CLX XP Resource Parameters
• Cluster node – data center location
• Failover behavior setting
• CA resync setting
• Pre/Post Exec Scripts
• Fence Level settings
• XP arrays and Raidmanager Library instances
• Device Group managed by this CLX resource
− All dependent disk resources must belong to that Device Group
Cluster Arbitration and CLX for Windows
Local Microsoft Cluster – Shared Quorum disk

[Diagram: two-node local cluster running App A and App B, sharing a quorum disk and application disks]

Traditional MSCS uses
− a shared Quorum disk to
• keep the quorum log
• keep a copy of the cluster configuration
• propagate registry checkpoints
• arbitrate if LAN connectivity is lost
− shared application disks to
• store the application data
Challenges with dispersed MSCS
• Managing data disks
− Check data disk pairs on failover
− Allow data disk failover only if the data is current and consistent
• Managing the quorum disk (for a traditional shared quorum cluster)
− Mirror the quorum disk to the remote disk array
− Implement the quorum disk pair and keep the challenge/defense protocol working as if it were a single shared resource
− Filter SCSI Reserve/Release/Reset and any necessary IO commands without performance impact
− Prevent split-brain phenomena
Majority Node Set Quorum (1)
• New quorum mechanism introduced with Windows 2003
• Shared application disks
− store the application data
• Quorum data on local disk
− used to keep a copy of the cluster configuration
− synchronized by the Cluster Service
− no common quorum log and no common cluster configuration available => changes to the cluster configuration are only allowed when a majority of nodes is online and can communicate

[Diagram: two-node cluster running App A and App B; quorum data resides on each node's local disk]
Majority Node Set Quorum (2)
• MNS arbitration rule:
− In case of a failure, the cluster will survive if a majority of nodes is still available
− In case of a split-site situation, the site with the majority will survive
− Only nodes which belong to the majority are allowed to keep the cluster service up and can run applications. All others will shut down the cluster service.

[Diagram: split between the two sites; the partition holding the majority keeps running App A and App B]

The majority is defined as:
(<number of nodes configured in the cluster> / 2) + 1 (integer division)
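To make the arithmetic concrete, here is a small Python sketch (an illustration only, not part of CLX or MSCS) that computes the majority threshold and the number of node failures a plain MNS cluster tolerates; it reproduces the table shown two slides further on.

```python
# Majority Node Set arithmetic: a partition survives only if it still
# holds a majority of the nodes configured in the cluster.

def mns_majority(n_nodes: int) -> int:
    # Majority = floor(n/2) + 1, as defined on the slide.
    return n_nodes // 2 + 1

def max_tolerable_failures(n_nodes: int) -> int:
    # A plain MNS cluster keeps running as long as a majority remains.
    return n_nodes - mns_majority(n_nodes)

for n in range(2, 9):
    print(f"{n} nodes: majority = {mns_majority(n)}, "
          f"tolerates {max_tolerable_failures(n)} node failure(s)")
# Matches the table: 2->0, 3->1, 4->1, 5->2, 6->2, 7->3, 8->3
```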
Majority Node Set Quorum (3)

[Diagram: node failure in the MNS cluster; App A and App B move to the surviving majority]
Majority Node Set Quorum (3)

[Diagram: the minority partition shuts down its cluster service; App A and App B run on the majority side]

Node failures a plain MNS cluster survives:
# cluster nodes   # node failures
2                 0
3                 1
4                 1
5                 2
6                 2
7                 3
8                 3
Majority Node Set Quorum (4) – File Share Witness
• What is it?
− A patch for Windows 2003 SP1 clusters provided by Microsoft (KB921181)
• What does it do?
− Allows the use of a simple file share to provide a vote for an MNS quorum-based 2-node cluster
− In addition to introducing the file share witness concept, this patch also introduces a configurable cluster heartbeat
• What are the benefits?
− The "arbitrator" node is no longer a full cluster member.
• A simple file share can be used to provide this vote.
• No single-subnet requirement for the network connection to the arbitrator.
− One arbitrator can serve multiple clusters. However, you have to set up a separate share for each cluster.
− The arbitrator exposing the share can be
• a standalone server
• a different OS architecture (e.g. a 32-bit Windows server providing a vote for an IA64 cluster)
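A rough Python sketch of the resulting vote count (an illustrative model of the concept described above, not Microsoft's implementation): the share contributes one extra vote, so a 2-node cluster has three votes in total, and a lone survivor that can still reach the witness keeps a majority.

```python
# File share witness voting model for an MNS cluster: the witness share
# adds one vote on top of the node votes; a partition survives only if
# it holds a majority of all votes.

def partition_survives(nodes_alive: int, total_nodes: int,
                       witness_reachable: bool) -> bool:
    total_votes = total_nodes + 1            # nodes + file share witness
    majority = total_votes // 2 + 1
    votes = nodes_alive + (1 if witness_reachable else 0)
    return votes >= majority

# 2-node cluster, one node fails, survivor still reaches the witness:
print(partition_survives(1, 2, True))    # True  -> 1 node failure tolerated
# Same failure, but the witness is unreachable too (or not configured):
print(partition_survives(1, 2, False))   # False -> cluster service stops
```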
Majority Node Set Quorum (5) – File Share Witness

[Diagram: 2-node cluster running App A and App B; each node can get a vote from \\arbitrator\share]

# cluster nodes   # node failures
2                 0 (1 with MNS file share witness)
3                 1
4                 1
5                 2
6                 2
7                 3
8                 3
Majority Node Set Quorum (6) – File Share Witness

[Diagram: one arbitrator serving two clusters — Cluster 1 has the MNS private property MNSFileShare = \\arbitrator\share1, Cluster 2 has MNSFileShare = \\arbitrator\share2]
File Share Witness – Prerequisites
• Cluster
− Windows 2003 SP1 & R2 (x86, x64, IA64*, EE and DC)
− 2-node MNS quorum-based cluster
• The property will be ignored for >2-node clusters
• Arbitrator
− OS requirements
• Windows 2003 SP1 or later
− MS did not test earlier/other OS versions, even though they should work
• A server OS is recommended for availability and security
− File share requirements
• One file share for each cluster for which the arbitrator provides a vote
• 5 MB per share is sufficient
− The external share does not store the full state of the cluster configuration. Instead, it contains only data sufficient to help prevent split-brain syndrome and to help detect a partition in time.
• The Cluster Service account requires read/write permission
• For highest availability, you might want to create a clustered file share/file server

*There is no Windows Server 2003 R2 release for IA64 (Itanium)
File Share Witness/Arbitrator – what does it mean for CLX?
Remember: the file share witness only works with 2-node clusters.

Arbitrator node requirements:

Cluster membership
• Traditional MNS: the arbitrator is a full additional cluster member and has full cluster configuration information.
• MNS with file share witness: the arbitrator is external to the cluster and has only minimal cluster configuration information.

Operating system
• Traditional MNS: same Windows version as the other cluster nodes; e.g. for an IA64 cluster, the arbitrator has to be an IA64 server as well.
• MNS with file share witness: can be a different Windows version; e.g. a 32-bit file share witness (arbitrator) can serve a 64-bit cluster.

Hardware
• Traditional MNS: determined by the OS. The arbitrator can be a smaller, less powerful machine than the main nodes.
• MNS with file share witness: determined by the OS. The file share server can be a smaller, less powerful machine than the main nodes. Due to the less strict OS requirements, the hardware selection is also more flexible.

Multiple clusters
• Traditional MNS: one arbitrator node per cluster.
• MNS with file share witness: one arbitrator can serve multiple clusters.

Location
• Both: a third site.

Network requirements
• Traditional MNS: single subnet. Should NOT depend on a network route that runs (physically) through one data center in order to reach the other.
• MNS with file share witness: can be a routed network (different subnets). Should NOT depend on a network route that runs (physically) through one data center in order to reach the other.
CLX XP Quorum Filter Service (QFS)
• Component of CLX XP for Windows
− Required for Windows 2000
− Optional for Windows 2003 (can also use MNS)
• QFS provides some benefits over MNS
• Functionality
− Allows the use of a Microsoft shared quorum cluster across two data centers and XP arrays
− Implements filter drivers that intercept quorum arbitration commands and uses additional CA pairs to make the cross-site decision
− "External arbitrator" for automated failover even in case of a full site failure or split
CLX XP on Windows – LAN split
[Diagram: LAN split between the two data centers; the quorum arbitration commands (CTRL1–CTRL3) are filtered and the quorum disk pair stays reserved by the left node, so the left site keeps the cluster and App A and App B keep running there]
CLX XP on Windows – site failure
[Diagram: full failure of the left site; the external arbitrator (cdm) lets the surviving right node take over the quorum reservation, and App A and App B are restarted there]
Majority Node Set vs CLX XP Quorum Filter Service

Majority Node Set
• Pros
− Solution owned by Microsoft
− Works with both CLX EVA and XP
− MS preferred solution for geographically dispersed clusters
− Most likely the CLX solution going forward
• Cons
− Requires a symmetric node setup
− For >2 nodes, one additional node per cluster is required
− Will only survive with a majority of nodes
− A forced majority requires another downtime to reform the original cluster
− Windows 2003 only

CLX XP Quorum Filter Service
• Pros
− Shared quorum, hence can survive node failures down to a single node
− Allows asymmetric cluster setups
− Windows 2000 and Windows 2003
• Cons
− More intrusive
− More difficult to maintain across Service Packs and other updates
− A full site failure or split will first result in a cluster service shutdown before the external arbitrator kicks in (if the quorum disk was in the remote data center)

Recommended quorum mechanism for new installs: Majority Node Set.
Manual vs Automated failover – failover times
Automated failover
Question: "How long does a CLX failover take?"
Answer: "It depends!"

A CLX failover is first of all still a cluster failover.
− There are components influencing the total application failover time which are outside CLX's control:
• failure recognition, cluster arbitration, application startup
− The CLX component of the total failover time also depends on many factors.
Factors affecting the automated failover times in a CLX for Windows cluster

Phases: recognize failure → cluster arbitration → CLX failover → start application on other node

Time to recognize failure
• Resource failures
− resource timeouts (e.g. disk timeouts)
− potentially resource restarts until the cluster moves the group to another node
• Node(s), network or whole DC failure
− cluster heartbeat timeouts to decide that communication to node(s) is lost

Time for cluster arbitration
• Only happens in case of a node or network failure
• Time depends on network latency, heartbeat settings and cluster size

Time for CLX to fail over replication (5 sec – 5 min)
• Type of component that fails
− node failure
− storage failure
− Management Server failure
− intersite communication

Time to start application on surviving node
• Application type and size
• Required recovery steps (e.g. log replay)
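As a back-of-the-envelope illustration, the total automated failover time is simply the sum of these four phases. In the Python sketch below all numbers are assumed examples; only the 5 sec – 5 min range for the CLX phase comes from the slide.

```python
# Toy model: total automated failover time = sum of the four phases.
# All values are assumed examples, not measured CLX figures.
phases_seconds = {
    "recognize failure":        60,   # resource/heartbeat timeouts
    "cluster arbitration":      30,   # only on node or network failure
    "CLX replication failover": 120,  # slide range: 5 s - 5 min
    "application startup":      300,  # e.g. database log replay
}

total = sum(phases_seconds.values())
for phase, seconds in phases_seconds.items():
    print(f"{phase:26s} {seconds:4d} s")
print(f"{'total':26s} {total:4d} s (~{total / 60:.1f} min)")
```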
Manual vs. automated failover times
• A typical example of manual failover is a stretched cluster
− Cluster stretched across two sites, but all nodes access the same array, which replicates the data to a remote partner.
− Even in case of a node failure, the primary storage array will be used (across a remote link if the node is in the remote data center).
− A storage or site failure will bring down the cluster, requiring manual intervention to start the cluster from the remote array.
• Steps involved in case of a storage failure (summed in the sketch below)
− Notification of operator (15 min*)
− Evaluation of the situation and necessary recovery steps (30 min*)
− Shutdown of surviving cluster nodes (5 min*)
− Replication failover (2 min*)
− Startup of surviving cluster nodes (10 min*)
− The effort multiplies with the number of applications/clusters/arrays being affected

*Times are just examples and will vary depending on the situation and setup. A full site disaster, for instance, might involve much more troubleshooting and evaluation time.
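The 62-minute total on the next slide is simply the sum of the example step times above; a quick Python check using the slide's placeholder numbers:

```python
# Manual failover example: the per-step times (minutes) from the slide
# sum to the 62-minute total shown on the following chart.
manual_steps_min = {
    "notify operator": 15,
    "evaluate situation and recovery steps": 30,
    "shut down surviving cluster nodes": 5,
    "replication failover": 2,
    "start up surviving cluster nodes": 10,
}
print(sum(manual_steps_min.values()), "minutes")  # 62 minutes
```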
Manual vs. automated failover times – single cluster

[Chart: timeline comparison. Manual failover: notification of operator, evaluation of the situation and necessary recovery steps, shutdown, failover, start servers – total = 62 min*. Automated (CLX) failover: recognize failure, cluster arbitration, CLX failover, application startup – total = 10 min*.]

*Times are just examples and will vary depending on the situation and setup.
Manual vs. automated failover times – multiple clusters

[Chart: with three affected clusters, the manual steps are repeated for cluster 1, cluster 2 and cluster 3 – total = 96 min*; the automated (CLX) failovers of all three clusters run concurrently – total = 10 min*.]

*Times are just examples and will vary depending on the situation and setup.
Other advantages
• CLX helps to avoid human mistakes
− Manual failover operations introduce the risk of making mistakes, such as failing over the wrong DR Group.
• CLX simplifies planned failover for maintenance
− Similar to a disaster failover, just faster
− A manual failover still requires the same steps besides the notification and evaluation
• Failback is as simple as a failover
− Once the primary site is restored, it's just another cluster failover
− Manual failback is as complex and intrusive as a maintenance failover
External information
• Cluster Extension EVA
− http://h18006.www1.hp.com/products/storage/software/ceeva/index.html
• Cluster Extension XP
− http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/cluster/index.html
• Metrocluster EVA
− http://h71028.www7.hp.com/enterprise/cache/108988-0-0-0-121.html
• Metrocluster XP
− http://h71028.www7.hp.com/enterprise/cache/4181-0-0-0-121.html
• Disaster-Tolerant Solutions Library (Solutions for Serviceguard)
− http://h71028.www7.hp.com/enterprise/cache/4190-0-0-0-121.html
• Continental Cluster
− http://h71028.www7.hp.com/enterprise/cache/4182-0-0-0-121.html
• CA EVA
− http://h18006.www1.hp.com/products/storage/software/conaccesseva/index.html
• CA XP
− http://www.hp.com/products1/storage/products/disk_arrays/xpstoragesw/continuousaccess/index.html
• CLX EVA migration whitepaper and Exchange replication whitepaper (incl. CLX)
− http://h18006.www1.hp.com/storage/arraywhitepapers.html