citrix xendesktop: dealing with failure - syn408

SYN408: XenDesktop 7.6 Architecture: dealing with failure Tom Gamull – Ericsson Consulting Manager Citrix Synergy – May 2015 @magicalyak


SYN408: XenDesktop 7.6 Architecture: dealing with failure

Tom Gamull – Ericsson Consulting Manager
Citrix Synergy – May 2015
@magicalyak

WHAT WOULD YOU SAY YOU DO HERE?


Prevent Failures to begin with

• Failures are bad events
• Today’s technology should be bulletproof
• Is 99.999% uptime the new normal?

“The perfect is the enemy of the good” - Voltaire

Our thinking is broken

Customer: “I can’t get to my desktop”

Support/Admin: “The desktops aren’t working because storage failed”

CIO/Boss: “We need to ensure storage never fails”

Solution

• Upgrade/Redundant SAN
• Somehow believe replication can occur without penalty (sales guy promised)
• Storage stays up!

Netflix Chaos Monkey

2010: Netflix moves to AWS

2011: US-East Outage - Netflix posts lessons learned

The best way to avoid failure is to fail constantly

Since 2013: Chaos Monkey is run in Production except holidays and weekends

Before you buy more stuff – try this

• How do you respond to events today?
  • How long to identify them?
  • How long to solve them?
• Mean Time Between Failures (MTBF) is legacy
  • Focus on Mean Time to Resolution (MTTR) or cycle time

MTBF vs. MTTR
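To make the MTBF vs. MTTR comparison concrete, here is a minimal sketch using the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The hour values are made-up illustration numbers, not figures from the talk.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a service is up, given mean time between
    failures (MTBF) and mean time to resolution (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: a failure every 1000 hours, 4 hours to resolve.
baseline = availability(1000.0, 4.0)

# Option 1: buy hardware that fails half as often.
double_mtbf = availability(2000.0, 4.0)

# Option 2: keep the hardware, resolve failures twice as fast.
half_mttr = availability(1000.0, 2.0)

print(f"baseline:    {baseline:.6f}")
print(f"double MTBF: {double_mtbf:.6f}")
print(f"half MTTR:   {half_mttr:.6f}")
```

Under this formula, halving MTTR yields the same availability as doubling MTBF, which is one way to read the advice to work on resolution time before buying more equipment.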

Before you buy more stuff – try this

• How are you rolling out Citrix or changes?
  • AUTOMATE!!!
• RULE: If you do it twice, it should be automated
  • Focus on reducing cycle time
  • time(what is wrong) + time(how to fix it) + time(implement fix) = cycle time
• Immutable Servers
  • Servers are rebuilt from scratch for changes
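The cycle-time formula on this slide can be tracked per incident. The following is a small sketch; the `Incident` class, its field names, and the sample minutes are hypothetical and only illustrate the three terms of the sum.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    identify_minutes: float   # time(what is wrong)
    diagnose_minutes: float   # time(how to fix it)
    implement_minutes: float  # time(implement fix)

    @property
    def cycle_time(self) -> float:
        """Sum of the three phases, as defined on the slide."""
        return (self.identify_minutes
                + self.diagnose_minutes
                + self.implement_minutes)


def mean_cycle_time(incidents: List[Incident]) -> float:
    """Average cycle time across incidents; the number to drive down."""
    if not incidents:
        raise ValueError("need at least one incident")
    return sum(i.cycle_time for i in incidents) / len(incidents)


# Made-up sample data for illustration only.
history = [Incident(30, 45, 15), Incident(10, 20, 30)]
print(mean_cycle_time(history))
```

Recording the three phases separately shows which one dominates, so automation effort can go where it shortens the total most.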

Survive Failure - Architecture

• Does Citrix still work if:
  • Your storage fails (SAN, local, whatever)?
  • Your database fails?
  • NetScaler fails?
• What can your users handle?
  • Most can handle getting logged off if they can log in again
  • Most can NOT handle:
    • Application hangs
    • Print failures
    • Can’t log in or connect

[Image source: theoatmeal.com]

User Profiles and Folders

• Redirect folders as much as possible
  • This is where the data that people use lives (My Docs, Downloads, etc.)
• Profiles
  • Profiles should be as light as possible
  • Can you use mandatory profile settings?
• Replicate profiles across 2 data centers
  • Profiles are not going to work on DFS-R without corruption (except one-way)
  • Active/Passive only (not active/active)
  • Split users so some are active for one data center, passive for the other
• Use cloud storage
  • Hack OneDrive for My Docs - https://office365drivemap.codeplex.com/
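One way to split users so that each data center is active for some and passive for the rest is a deterministic hash of the user name. This is a sketch under stated assumptions: the data center names `DC-A` and `DC-B` and the function `profile_placement` are hypothetical, and the talk does not say how the split was actually made.

```python
import hashlib

# Hypothetical data center names, for illustration only.
DATA_CENTERS = ("DC-A", "DC-B")


def profile_placement(username: str) -> dict:
    """Pick the active and passive profile location for a user.

    The same user always maps to the same active site, so one-way
    replication can run from the active copy to the passive copy.
    """
    digest = hashlib.sha256(username.lower().encode("utf-8")).digest()
    index = digest[0] % 2  # first byte decides the active site
    return {
        "active": DATA_CENTERS[index],
        "passive": DATA_CENTERS[1 - index],
    }


for user in ("alice", "bob", "carol"):
    print(user, profile_placement(user))
```

Because the mapping depends only on the user name, every component that needs to know a user's active site can compute it without a shared lookup table.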

Storage / DB

• Use redundancy in the software, not hardware

• PVS fails over on the fly (not for CIFS/SMB though!)

• Local disk with PVS is better than an expensive SAN (and likely performs better, especially if you have local SSD)

[Diagram: SQL database mirroring on local server disk (Whiptail_61, Whiptail_62), covering mirror-aware databases and standalone databases.]

• Primary database: APS-DCXA1SQL01
• Mirror database: APS-DCXA2SQL02
• Witness (no database): APS-DCXDCSQL03
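With a mirrored database, clients usually learn about the mirror through the `Failover Partner` keyword of a SQL Server client connection string. The sketch below only builds such a string from the server names in the diagram; the helper name and the database name `CitrixSite` are hypothetical, and the talk does not show the connection strings that were used.

```python
def mirrored_connection_string(primary: str, mirror: str, database: str) -> str:
    """Build a SQL Server connection string that names a mirror.

    The witness server does not appear here: it only arbitrates
    failover between the primary and the mirror.
    """
    return (
        f"Server={primary};"
        f"Failover Partner={mirror};"
        f"Initial Catalog={database};"
        "Integrated Security=True"
    )


# Server names are from the diagram; the database name is an assumption.
conn = mirrored_connection_string("APS-DCXA1SQL01", "APS-DCXA2SQL02", "CitrixSite")
print(conn)
```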

PVS HA/DR Components

[Diagram: two PVS servers, each with its own vDisk store, in 2 different locations, using a highly available SQL database.]

• DHCP can be split on 2008 R2/2012
• TFTP can be load balanced with a hardware load balancer
• Mirror: storage resilient
• Cluster: server resilient

Network

• Multiple sites = NetScaler GSLB
  • Active/Passive is easiest to set up
• All components should be load balanced if possible
  • Even TFTP, double up on every component
• No NetScaler stags (single, unpaired appliances) in production
  • HA/Failover pair
  • They share the VIP but have separate IP info (so the VIP floats)
  • 1 NS + Hypervisor != Pair

[Diagram: NetScaler load balancers (NS LB) for Zone US-East1 and Zone US-West1, with a VIP.]

BLUE/GREEN

[Diagram: a load balancer (LB) in front of two App v1.0 nodes backed by Db v1.0 and two App v1.1 nodes backed by Db v1.1.]

Limiting Downtime
• Like active/passive

Don’t use DNS for this
• Can’t trust TTL

When to use
• ANY database/schema upgrade
• Restore from backup is too large/long
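The blue/green idea can be sketched as a load balancer that holds two complete pools and flips between them in one step, rather than waiting on DNS TTLs. The class and node names below are hypothetical; a real deployment would do this on the load balancer itself.

```python
from typing import List


class BlueGreenBalancer:
    """Toy load balancer with two pools; only one pool is live."""

    def __init__(self, blue: List[str], green: List[str]) -> None:
        self.pools = {"blue": list(blue), "green": list(green)}
        self.live = "blue"
        self._next = 0

    def route(self) -> str:
        """Round-robin over the nodes of the live pool."""
        pool = self.pools[self.live]
        node = pool[self._next % len(pool)]
        self._next += 1
        return node

    def cut_over(self) -> None:
        """Flip all traffic to the other pool; call again to roll back."""
        self.live = "green" if self.live == "blue" else "blue"


lb = BlueGreenBalancer(
    blue=["app-v1.0-a", "app-v1.0-b"],    # backed by Db v1.0
    green=["app-v1.1-a", "app-v1.1-b"],   # backed by Db v1.1
)
print(lb.route())   # served by the v1.0 pool
lb.cut_over()
print(lb.route())   # served by the v1.1 pool
```

Keeping the old pool and its database untouched is what makes rollback a second call to `cut_over` instead of a restore from backup.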

CANARY

[Diagram: a load balancer (LB) in front of two App v1.0 nodes and one App v1.1 node, all backed by Db v1.0.]

• Like active/active but with a purpose
  • Canary in the coal mine
  • See if someone screams!
  • Live to production
• Limiting Risk
  • Back up your data
  • All nodes use production database
  • Route new connections to new nodes
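A canary rollout can be sketched as a router that leaves existing sessions where they are and sends a share of new connections to the new node. The class name, node names, and the one-in-ten ratio are assumptions for illustration; the slide only says to route new connections to new nodes.

```python
from typing import Dict, List


class CanaryRouter:
    """Toy router: sticky sessions, every Nth new session goes to canary."""

    def __init__(self, stable: List[str], canary: List[str],
                 canary_every: int = 10) -> None:
        self.stable = list(stable)
        self.canary = list(canary)
        self.canary_every = canary_every   # assumed ratio, not from the talk
        self.sessions: Dict[str, str] = {}
        self._new_count = 0

    def route(self, session_id: str) -> str:
        # Existing sessions stay on their node so users are not disrupted.
        if session_id in self.sessions:
            return self.sessions[session_id]
        self._new_count += 1
        if self._new_count % self.canary_every == 0:
            pool = self.canary
        else:
            pool = self.stable
        node = pool[self._new_count % len(pool)]
        self.sessions[session_id] = node
        return node


router = CanaryRouter(["app-v1.0-a", "app-v1.0-b"], ["app-v1.1-a"])
placed = [router.route(f"user{i}") for i in range(20)]
print(placed.count("app-v1.1-a"), "of", len(placed), "sessions on the canary")
```

If the canary users scream, removing the new node from the `canary` pool limits the damage to the few sessions already placed there.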

Atlanta Public Schools Citrix Delivery Overview

Architect: Thomas Gamull. Company: Presidio. Date: 3/17/2014.

[Diagram. Labels recoverable from the figure:]
• Access: external users, internal users, external firewall, internal firewall, 2 MPX 11500
• Endpoints: 24,000 zero clients, school districts, printers
• Services: Citrix PVS, SQL mirror, profiles, user data, license servers, AppV cluster, StoreFront, file server, print servers
• Other labels: XA1 SCVMM, XA2 SCVMM, XDC SCVMM, APPVPublish, APPVReport
• Three groups of components, each with 2 Delivery Controllers, 2 Provisioning Servers, and an SCVMM server: two labeled 2008 R2 desktops and 2008 R2 applications, one labeled Windows 7 desktops
• CLL Data Center - 8,000 concurrent desktops for students

XENAPP1

[Diagram: APS-DCXA1 Management Cluster on hosts APS-DCXA1HOST01 and APS-DCXA1HOST02.]

• vSwitch labels: vSS-iSCSI-B; vSS-PVS-XAPP1-B (10.90.68.0/23, VLAN 68); vSS-XAPP1-A (10.90.72.0/23, VLAN 72); vSS-Servers-A
• Servers: APS-DCXA1PVS01, APS-DCXA1SF01, APS-DCXA1DDC01, APS-DCXA1VMM01, APS-DCXA1WDM01, APS-DCXA1SQL01, APS-DCXA1APPV01
• Second instances: PVS02, SF02, DDC02

Rack Layout

[Diagram: rack with two NetScalers, two top-of-rack switches, compute blades, rack-mount compute with local disk storage, and paired iSCSI/FC storage.]

Storage is always in pairs if needed
• Prefer multiple smaller arrays over monolithic SAN
• Let app/software do the work

Network redundancy is important
• Load balancers can remove switch dependencies
• Leverage common NIC cabling

Server choice can vary
• Blades are dense but lack local disk
• Rack Mounts are often very flexible
• Without automation you will have scaling problems

“Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.” – Blaise Pascal, Provincial Letters: Letter XVI, 1657

English Translation: “If I had more time, I would have written a shorter letter.”

Tom Gamull
@magicalyak
http://magicalyak.org