Exchange Server 2013 High Availability and Site Resilience

Scott Schnoll, Senior Content Developer, Microsoft Corporation

Page 1: Exchange Server 2013 High Availability

Exchange Server 2013

High Availability and Site Resilience

Scott Schnoll, Senior Content Developer, Microsoft Corporation


Page 2: Exchange Server 2013 High Availability

New in Exchange Server 2013

• Storage

– Multiple databases per volume

– AutoReseed

– Self-recovery behaviors

– Lagged copy innovations

• High Availability

– Managed Availability

– Database failover changes

– Best copy selection changes

– DAG network innovations

• Site Resilience

http://aka.ms/E15HATechEdAU

http://aka.ms/E15HATechEdNZ

http://aka.ms/E15HATechDaysNL

http://aka.ms/E15HATechEdNA

http://aka.ms/E15HATechEdEU

Page 3: Exchange Server 2013 High Availability

Agenda

• DAG architecture

– MSExchangeRepl

– MSExchangeDAGMgmt

– Cluster

– Crimson Channel

• Witness Server Placement

• Dynamic Quorum

• DAG Member Maintenance

Page 5: Exchange Server 2013 High Availability

DAG Replication Service

• Introduced in Exchange 2007 RTM

– Microsoft Exchange Replication service | MSExchangeRepl

– MSExchangeRepl.exe

– Runs on all Mailbox servers (not just DAG members)

– Communicates with Active Directory and other DAG members

• Includes 16 components

– Active Directory lookup

– Copy status lookup

– Replay core manager

– Seed manager

– AutoReseed manager

– Disk reclaimer manager

– Replay RPC server wrapper

– Remote data provider wrapper

– VssWriter

– Active Manager

– Active Manager RPC server wrapper

– Failure item manager

– TPR API manager

– Support API manager

– Server locator manager

– Health state tracker

Page 6: Exchange Server 2013 High Availability

DAG Management Service

• Introduced in RTM CU2

– Microsoft Exchange DAG Management service | MSExchangeDagMgmt

– MSExchangeDagMgmt.exe

– Runs on all Mailbox servers (not just DAG members)

– Communicates with Active Directory and other DAG members

• Includes 4 components

– Active Directory lookup

– Copy status lookup

– Monitoring

– Tracer instance

Page 7: Exchange Server 2013 High Availability

DAG Management Service

• Created for two primary reasons:

– So the Replication service can have more focused functionality

– So Managed Availability actions can kill lower-priority activities

• Writes events to the same place as the Replication service

– Application event log (source of MSExchangeRepl)

– HighAvailability crimson channel

• As we refactor more, other functions will move to this service

– AutoReseed

– Disk Reclaimer

– Dynamic replay lag playdown

– Future AutoDAG copy layout and mobility features

Page 8: Exchange Server 2013 High Availability

Cluster Service

• Introduced in NT Server Enterprise Edition (1997)

– Cluster Service | ClusSvc

– Clussvc.exe

• Exchange DAGs use several Cluster components

– Quorum

– Membership and Node Management

– Networks and Heartbeating

– Cluster Registry

Page 9: Exchange Server 2013 High Availability

Cluster Service

• Quorum is required to mount databases

• Quorum is based on votes, not members

• Votes can be taken away manually or dynamically

– NodeWeight or Dynamic Quorum

• Exchange manages quorum model, not quorum

– Exchange management of quorum model based on nodes, not votes

– Removing votes requires manual configuration of quorum model

– Exchange will make incorrect quorum model management decisions if votes are manually removed at the cluster level
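The mismatch between "nodes" and "votes" can be sketched with a little arithmetic. The snippet below is an illustrative model only, not Exchange or cluster code: the function names are hypothetical, it assumes each member normally carries one vote, and it assumes the quorum model is chosen from the member count (Node and File Share Majority for an even count, Node Majority for an odd count).

```python
from typing import Sequence


def quorum_model(member_count: int) -> str:
    """Quorum model chosen from the number of DAG members (nodes), not votes."""
    if member_count % 2 == 0:
        return "Node and File Share Majority"
    return "Node Majority"


def total_votes(node_weights: Sequence[int], model: str) -> int:
    """Sum of node votes, plus one witness vote when the model uses a witness."""
    witness = 1 if model == "Node and File Share Majority" else 0
    return sum(node_weights) + witness


def votes_needed(votes: int) -> int:
    """Strict majority of the configured votes."""
    return votes // 2 + 1


# Hypothetical four-member DAG with every node voting.
weights = [1, 1, 1, 1]
model = quorum_model(len(weights))
healthy = total_votes(weights, model)

# Same DAG after one NodeWeight is manually set to 0 at the cluster level.
# The model is still chosen from four members, so the witness stays in play.
weights_removed = [1, 1, 1, 0]
model_removed = quorum_model(len(weights_removed))
skewed = total_votes(weights_removed, model_removed)

print(model, healthy, votes_needed(healthy))
print(model_removed, skewed, votes_needed(skewed))
```

In this model the manually edited cluster ends up with an even number of votes (three nodes plus the witness), which is the kind of outcome the slide warns about.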

Page 10: Exchange Server 2013 High Availability

Cluster Registry

• Active Manager stores information in the cluster registry for DAG members

– Registry changes are replicated immediately to all DAG members

• Stored information is used as part of Best Copy and Server Selection (BCSS)

Page 11: Exchange Server 2013 High Availability

Cluster Registry

IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*LastMountedTime?2013-07-15T22:29:39*MountStatus?Mounted*IsAdminDismounted?False*IsAutomaticActionsAllowed?True*

• ActiveServer

– Name of the server where the database is currently mounted or is expected to be mounted when mount operations complete

• LastMountedServer

– Name of the server where the database was last successfully mounted

• LastMountedTime

– Date and time stamp of the last time the database was mounted

Page 12: Exchange Server 2013 High Availability

Cluster Registry

IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*LastMountedTime?2013-07-15T22:29:39*MountStatus?Mounted*IsAdminDismounted?False*IsAutomaticActionsAllowed?True*

• MountStatus

– The current mount status for the database

– Possible values are Mounted / Dismounted

• IsAdminDismounted

– Designates whether the current dismounted status of the database is the result of administrator action

– Possible values are True / False

• IsAutomaticActionsAllowed

– Designates whether the database can be automatically activated by Active Manager (AM)

– Possible values are True / False
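To make the field layout concrete, here is a minimal sketch that splits the sample value from the slide into its fields. The delimiters ("?" between key and value, "*" after each pair) are inferred from that one sample, and `parse_db_state` is a hypothetical helper, not an Exchange API.

```python
from datetime import datetime


def parse_db_state(raw: str) -> dict:
    """Split 'Key?Value*Key?Value*' into a dict of string values."""
    fields = {}
    for pair in raw.split("*"):
        if not pair:
            continue  # the trailing '*' leaves an empty piece
        key, _, value = pair.partition("?")
        fields[key] = value
    return fields


# Sample value exactly as shown on the slide.
sample = (
    "IsEntryExist?True*ActiveServer?ex2*LastMountedServer?ex2*"
    "LastMountedTime?2013-07-15T22:29:39*MountStatus?Mounted*"
    "IsAdminDismounted?False*IsAutomaticActionsAllowed?True*"
)

state = parse_db_state(sample)
mounted_at = datetime.fromisoformat(state["LastMountedTime"])

print(state["ActiveServer"], state["MountStatus"], mounted_at)
```

Every value stays a string here; converting "True" / "False" or the time stamp is left to the caller, as done for LastMountedTime above.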

Page 13: Exchange Server 2013 High Availability

Cluster Registry

• Last Log

– Entry for each database copy in the DAG (named by the database GUID)

– Stores the sequence number of the last generated log (in decimal)
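Because the value is stored in decimal, comparing it with a log file name on disk usually means converting it to hexadecimal first. The sketch below assumes the common ESE naming pattern of a three-character prefix followed by eight hexadecimal digits; the prefix "E00" and the generation number are hypothetical examples, not values from the slide.

```python
def log_file_name(generation: int, prefix: str = "E00") -> str:
    """Render a decimal log generation as a log file name (assumed pattern)."""
    return f"{prefix}{generation:08X}.log"


def generation_from_name(file_name: str, prefix: str = "E00") -> int:
    """Recover the decimal generation from a file name in the same pattern."""
    hex_part = file_name[len(prefix):].split(".")[0]
    return int(hex_part, 16)


name = log_file_name(4779)
print(name)
print(generation_from_name(name))
```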

Page 14: Exchange Server 2013 High Availability

Cluster Networking

• Cluster provides network heartbeating for all networks

• Heartbeat tolerances are configurable

    D cluster-1 SameSubnetDelay 1000 (0x3e8)
    D cluster-1 CrossSubnetDelay 1000 (0x3e8)
    D cluster-1 SameSubnetThreshold 5 (0x5)
    D cluster-1 CrossSubnetThreshold 5 (0x5)
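As a rough worked example of what these four values mean together: the delay is the interval between heartbeats in milliseconds and the threshold is the number of consecutive missed heartbeats tolerated, so their product approximates how long a node can be silent before it is considered down. The helper name below is hypothetical and the calculation ignores any other timing the cluster applies.

```python
# Values copied from the slide.
settings = {
    "SameSubnetDelay": 1000,       # milliseconds between heartbeats
    "SameSubnetThreshold": 5,      # missed heartbeats tolerated
    "CrossSubnetDelay": 1000,
    "CrossSubnetThreshold": 5,
}


def detection_window_ms(delay_ms: int, threshold: int) -> int:
    """Approximate silence, in milliseconds, before a node is treated as down."""
    return delay_ms * threshold


same_subnet = detection_window_ms(
    settings["SameSubnetDelay"], settings["SameSubnetThreshold"]
)
cross_subnet = detection_window_ms(
    settings["CrossSubnetDelay"], settings["CrossSubnetThreshold"]
)

print(same_subnet, cross_subnet)
```

With the values shown, both windows come to 5000 ms, that is, five seconds.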

Page 15: Exchange Server 2013 High Availability

Cluster Networking

• Cluster Network Communications

– UDP unicast on port 3343

– Heartbeats between nodes are TCP

– IPv6 is supported for cluster IP addresses

• Windows network binding order is still important

– MAPI network at top of binding order

– Followed by replication networks

– Followed by iSCSI networks

Page 16: Exchange Server 2013 High Availability

Crimson Channel

• Applications and Services logs

– Area of the event log used by applications for logging and internal communication

– Stores events from a single application or component rather than events that might have system-wide impact

– This is referred to as an application's crimson channel

• Exchange 2013 has multiple channels

– ActiveMonitoring

– HighAvailability

– MailboxDatabaseFailureItems

– ManagedAvailability

– PushNotifications

– Troubleshooters

Page 17: Exchange Server 2013 High Availability

Crimson Channel

Page 19: Exchange Server 2013 High Availability

Witness Server

• A server that participates in a failover cluster with an even number of members

– Is not a member of the cluster

– Does not contain a full copy of quorum data

– Represented by the File Share Witness resource, which is created when the Node and File Share Majority quorum model is used

• Uses IsAlive check for availability

– If the witness server or share is not available, cluster core resources are failed and moved to another node

– If another node does not bring the witness resource online, the resource remains in a Failed state, with restart attempts every 60 minutes

– If the witness is needed for quorum but cannot be brought online, quorum will be lost

Page 20: Exchange Server 2013 High Availability

Witness Server

• A lock is not actively maintained on the witness

• When it becomes necessary to obtain an additional vote to maintain quorum

– An SMB file lock is placed on the witness.log file by one node

– The locking node increments its Paxos information and writes the updated Paxos tag to the witness.log file

• When it is no longer needed to maintain quorum

– The lock on the witness.log file is released

Page 21: Exchange Server 2013 High Availability

Windows Failover Clustering

• Node that locks witness.log retains the vote

– Nodes in contact with the locking node are in the majority and maintain quorum

– Nodes not in contact with the locking node are in the minority and lose quorum

• Nodes not owning cluster core resources wait 6 seconds prior to attempting to lock the FSW (arbitrationDelay)

Page 22: Exchange Server 2013 High Availability

Windows Failover Clustering

[Diagram: witness lock arbitration timeline, marked 0 to 16]

• Both nodes start at sequence #20; one of them owns the cluster core resources

• Cluster state change: the node owning the cluster core resources locks the FSW (witness.log) and updates the sequence number to 21

• The challenging node attempts the witness lock. A lock already exists and its sequence number is higher, so the challenge is not successful

• All nodes available: the FSW lock is released, changes are replicated, and sequence numbers are in sync at 22

Page 23: Exchange Server 2013 High Availability

Windows Failover Clustering

[Diagram: witness lock arbitration timeline, marked 0 to 16]

• The node owning the cluster core resources is at sequence #20

• Cluster state change: the node owning the cluster core resources becomes unavailable

• The challenging node attempts the witness lock. No lock exists, so the lock is successful and the sequence number is updated to 21

• All nodes available: the FSW lock is released, changes are replicated, and sequence numbers are in sync at 22
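The two arbitration scenarios can be sketched as a toy model. This is a simplified reading of the slides, not the cluster's actual algorithm: `WitnessLog`, `try_lock`, and the node names are hypothetical, and the six-second arbitration delay is left out.

```python
from typing import Optional


class WitnessLog:
    """Toy stand-in for witness.log: one lock holder and one sequence number."""

    def __init__(self, sequence: int) -> None:
        self.sequence = sequence
        self.locked_by: Optional[str] = None

    def try_lock(self, node: str, node_sequence: int) -> bool:
        # A lock held by another node defeats the challenge.
        if self.locked_by is not None and self.locked_by != node:
            return False
        # A node whose sequence number is behind the witness cannot win.
        if node_sequence < self.sequence:
            return False
        self.locked_by = node
        self.sequence = node_sequence + 1  # locking node writes the new tag
        return True

    def release(self) -> None:
        self.locked_by = None


# Scenario 1: the owner of the cluster core resources locks first.
witness = WitnessLog(sequence=20)
owner_locked = witness.try_lock("owner", 20)
challenger_locked = witness.try_lock("challenger", 20)

# Scenario 2: the owner is unavailable, so the challenger finds no lock.
witness2 = WitnessLog(sequence=20)
challenger_alone = witness2.try_lock("challenger", 20)

print(owner_locked, challenger_locked, witness.sequence)
print(challenger_alone, witness2.sequence)
```

In both scenarios the node that holds the lock keeps the witness vote; calling `release()` corresponds to the final step where all nodes are available again.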

Page 24: Exchange Server 2013 High Availability

Witness Server Placement

• Basic guidance for placement of witness server in Exchange 2010

“We recommend that you use a Hub Transport server running on Microsoft Exchange Server 2010 in the Active Directory site containing the DAG. This allows the witness server and directory to remain under the control of an Exchange administrator.”

“If your DAG is extended to multiple datacenters, we recommend deploying the witness server in the datacenter that is considered to be the primary datacenter.”

Page 25: Exchange Server 2013 High Availability

Witness Server Placement

• Exchange 2013 guidance is more complicated due to new options introduced by architectural changes

– Options that were not recommended or not possible in previous versions of Exchange are now possible, such as a third location (a third physical datacenter or a branch office)

Page 26: Exchange Server 2013 High Availability

Witness Server Placement

• Ultimately, the placement of a DAG’s witness server depends on business requirements and the options available to the organization

Deployment scenario: Single DAG deployed in a single datacenter
Recommendation: Locate witness server in the same datacenter as DAG members

Deployment scenario: Single DAG deployed across two datacenters; no additional locations available
Recommendation: Locate witness server in primary datacenter

Deployment scenario: Multiple DAGs deployed in a single datacenter
Recommendation: Locate witness server in the same datacenter as DAG members. Additional options include:
– Using the same witness server for multiple DAGs
– Using a DAG member to act as a witness server for a different DAG

Deployment scenario: Multiple DAGs deployed across two datacenters
Recommendation: Locate witness server in the same datacenter as DAG members. Additional options include:
– Using the same witness server for multiple DAGs
– Using a DAG member to act as a witness server for a different DAG

Deployment scenario: Single or multiple DAGs deployed across more than two datacenters
Recommendation: Locate the witness server in the datacenter where you want the majority of quorum votes to exist
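The phrase "where you want the majority of quorum votes to exist" comes down to counting votes per location. The following sketch assumes one vote per DAG member plus one vote for the witness; the site names and the `majority_site` helper are hypothetical.

```python
from typing import Dict, Optional


def majority_site(nodes_per_site: Dict[str, int], witness_site: str) -> Optional[str]:
    """Return the site that holds a strict majority of votes on its own, if any."""
    votes = dict(nodes_per_site)
    votes[witness_site] = votes.get(witness_site, 0) + 1  # witness adds one vote
    total = sum(votes.values())
    for site, count in votes.items():
        if 2 * count > total:
            return site
    return None


# Four-member DAG split evenly across two datacenters.
members = {"primary": 2, "secondary": 2}

witness_in_primary = majority_site(members, "primary")
witness_in_third = majority_site(members, "third-location")

print(witness_in_primary)
print(witness_in_third)
```

With the witness in the primary datacenter, that datacenter holds three of five votes by itself. With the witness in a third location, no single site holds a majority, so either datacenter can keep quorum as long as it can still reach the witness.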

Page 27: Exchange Server 2013 High Availability

Witness Server Placement

• A DAG’s witness server can be deployed in a third location for automatic site resilience

– The third location must have network infrastructure and connectivity that is isolated from network failures that affect the two datacenters with Exchange

• For all DAGs, the availability of the witness server should be on the Exchange administrator’s radar

Page 28: Exchange Server 2013 High Availability

Witness Server Placement

• Windows Azure is not supported for use as a witness server for Exchange DAGs

– Azure does not support the required underlying network configuration to enable an Azure file server VM to act as a witness server

– More info at http://aka.ms/DAGAzure

• No IaaS or cloud providers are supported for witness servers

Page 30: Exchange Server 2013 High Availability

Dynamic Quorum

• Windows Server 2012+ Cluster feature

– Enabled for all clusters by default

– Cluster quorum majority is determined by the set of nodes that are active members of the cluster at a given time

– This is different from Windows Server 2008 R2, where quorum majority is fixed, based on the cluster configuration

Page 31: Exchange Server 2013 High Availability

Dynamic Quorum

• Cluster dynamically manages vote assignment based on state of node

– When a node shuts down or crashes, it loses its vote

– When a node rejoins the cluster, it regains its vote

• Cluster can dynamically increase or decrease the number of votes needed to maintain quorum and keep running

– Enables the cluster to maintain availability during sequential node failures or shutdowns

Page 32: Exchange Server 2013 High Availability

Dynamic Quorum

• It is now possible for a cluster to keep running on the last surviving cluster node

• If the cluster has quorum, the number of votes needed for quorum can be adjusted down to one node

• This is called the “Last Man Standing” scenario

Page 33: Exchange Server 2013 High Availability

Dynamic Quorum

• Dynamic quorum management does not allow the cluster to sustain a simultaneous failure of a majority of voting members

• To continue running, the cluster must always have a quorum majority at the time of a node shutdown or failure

• If you explicitly remove the vote of a node, the cluster cannot dynamically add or remove that vote
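The difference between sequential and simultaneous failures can be illustrated with a small vote-counting model. It is a sketch under stated assumptions, not cluster code: the function names are hypothetical, the example DAG has four nodes plus a witness (five votes), the witness vote is assumed to stay available throughout, and tie-breaking details are ignored.

```python
from typing import Sequence


def survives(current_votes: int, votes_lost: int) -> bool:
    """Survivors must hold a strict majority of the votes at the time of failure."""
    return 2 * (current_votes - votes_lost) > current_votes


def run_failures(initial_votes: int, failures: Sequence[int]) -> bool:
    """Apply failure events in order, re-baselining the vote count after each."""
    votes = initial_votes
    for lost in failures:
        if not survives(votes, lost):
            return False
        votes -= lost  # dynamic quorum: failed nodes give up their votes
    return True


# Dynamic quorum: three nodes fail one after another.
sequential = run_failures(5, [1, 1, 1])

# Dynamic quorum: the same three nodes fail at once.
simultaneous = run_failures(5, [3])

# Fixed majority (Windows Server 2008 R2 style): three of five votes gone.
fixed_majority = survives(5, 3)

print(sequential, simultaneous, fixed_majority)
```

In this model the sequential case ends with one node and the witness still holding quorum, while losing the same three nodes at once fails with or without dynamic quorum.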

Page 34: Exchange Server 2013 High Availability

Dynamic Quorum

[Pages 34 to 41: diagram-only slides titled "Dynamic Quorum"; no text other than the title survives in the transcript]

Page 42: Exchange Server 2013 High Availability

Dynamic Quorum

• Use Get-ClusterNode to verify the DynamicWeight property of a node

– 0 = no quorum vote

– 1 = quorum vote

    Get-ClusterNode <Name> | ft name, *weight, state

    Name DynamicWeight NodeWeight State
    ---- ------------- ---------- -----
    EX1              1          1 Up

• Verify vote assignment with the Validate Cluster Quorum test
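One way to read the two weight columns together is sketched below. It assumes NodeWeight is the configured vote and DynamicWeight is the vote the cluster currently counts; only the EX1 row comes from the slide, while the EX2 and EX3 rows and the helper names are hypothetical.

```python
def parse_rows(text: str) -> list:
    """Parse 'Name DynamicWeight NodeWeight State' rows (header already removed)."""
    rows = []
    for line in text.strip().splitlines():
        name, dynamic_weight, node_weight, state = line.split()
        rows.append((name, int(dynamic_weight), int(node_weight), state))
    return rows


def vote_status(dynamic_weight: int, node_weight: int) -> str:
    """Classify a node by its configured and current vote."""
    if node_weight == 0:
        return "vote removed manually"
    if dynamic_weight == 0:
        return "vote removed dynamically"
    return "voting"


# EX1 is the row shown on the slide; EX2 and EX3 are made-up rows.
sample = """
EX1 1 1 Up
EX2 0 1 Down
EX3 0 0 Up
"""

report = {name: vote_status(dw, nw) for name, dw, nw, _ in parse_rows(sample)}
print(report)
```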

Page 43: Exchange Server 2013 High Availability

Dynamic Quorum and DAGs

• Dynamic quorum does work with DAGs

• Exchange is not dynamic quorum-aware

• Dynamic quorum does not change quorum requirements for DAGs

• All internal DAG testing is performed with dynamic quorum enabled

• Dynamic quorum is enabled in Office 365

Page 44: Exchange Server 2013 High Availability

Dynamic Quorum and DAGs

• Cluster team guidance:

– “Selecting this option generally increases the availability of the cluster. By default the option is enabled, and it is strongly recommended to not disable this option. This option allows the cluster to continue running in failure scenarios that are not possible when this option is disabled.”

• Exchange team guidance:

– Leave it enabled for majority of DAG members

– Don’t factor it into availability plans

    • The advantage is that, in some cases where Windows Server 2008 R2 would have lost quorum, Windows Server 2012 can maintain quorum; this applies to only a few cases and should not be relied upon when planning a DAG

Page 46: Exchange Server 2013 High Availability

DAG Member Maintenance

• Basic guidance for DAG member maintenance in Exchange 2010

– Run StartDagServerMaintenance.ps1 to put the DAG member in maintenance mode

– Perform the maintenance (e.g., install the update rollup)

– Run StopDagServerMaintenance.ps1 to take the DAG member out of maintenance mode and put it back into production

– Optionally rebalance the DAG by using RedistributeActiveDatabases.ps1

Page 47: Exchange Server 2013 High Availability

DAG Member Maintenance

• Exchange 2013 guidance more complicated

– Go into Maintenance Mode

    Set-ServerComponentState <Server> -Component HubTransport -State Draining -Requester Maintenance
    Set-ServerComponentState <Server> -Component UMCallRouter -State Draining -Requester Maintenance
    Restart-Service MSExchangeTransport
    Redirect-Message -Server <Server> -Target <FQDNTarget>
    Suspend-ClusterNode <Server>
    Set-MailboxServer <Server> -DatabaseCopyActivationDisabledAndMoveNow $True
    Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy Blocked
    Set-ServerComponentState <Server> -Component ServerWideOffline -State Inactive -Requester Maintenance

– Verify Maintenance Mode

    Get-ServerComponentState <Server> | ft Component,State -Autosize
    Get-MailboxServer <Server> | ft DatabaseCopy* -Autosize
    Get-ClusterNode <Server> | fl
    Get-Queue

Page 48: Exchange Server 2013 High Availability

DAG Member Maintenance

• Exchange 2013 guidance more complicated

– Go into Production Mode

    Set-ServerComponentState <Server> -Component ServerWideOffline -State Active -Requester Maintenance
    Set-ServerComponentState <Server> -Component UMCallRouter -State Active -Requester Maintenance
    Resume-ClusterNode <Server>
    Set-MailboxServer <Server> -DatabaseCopyActivationDisabledAndMoveNow $False
    Set-MailboxServer <Server> -DatabaseCopyAutoActivationPolicy Unrestricted
    Set-ServerComponentState <Server> -Component HubTransport -State Active -Requester Maintenance
    Restart-Service MSExchangeTransport

– Verify Production Mode

    Get-ServerComponentState <Server> | ft Component,State -Autosize
    Get-MailboxServer <Server> | ft DatabaseCopy* -Autosize
    Get-ClusterNode <Server> | fl
    Get-Queue

Page 49: Exchange Server 2013 High Availability

Thank you!

Questions?

Scott Schnoll

@Schnoll

Blog: http://aka.ms/schnoll