
5000 Persistent User Scale out Test with Citrix XenDesktop and Atlantis Computing

Last Update: 28 January 2015

Mike Perks
Kenny Bain
Pawan Sharma


Table of Contents

Executive summary

1 Lenovo Client Virtualization solution
1.1 Hardware components
1.2 Software components

2 Performance test methodology and tools
2.1 Test methodology
2.2 Login VSI
2.3 VMware esxtop
2.4 Superputty
2.5 IBM FlashSystem performance monitor

3 Performance test hardware configuration
3.1 System under test
3.2 Load framework

4 Software configuration and setup
4.1 Setting up Management VMs
4.2 Setting up master user VM
4.3 Setting up master launcher VM
4.4 Setting up Login VSI
4.5 Setting up Atlantis Computing software
4.6 Setting up 5000 persistent desktop VMs

5 Scale out performance results
5.1 Brokerless by using RDP
5.2 Citrix XenDesktop

Resources


Executive summary

This paper documents persistent desktop scale out performance testing that was done at the Lenovo solution

lab in Austin, Texas. The goal was to test 5000 persistent desktops by using a combination of Lenovo® servers

and networking, IBM® FlashSystem™ storage, and Atlantis Computing® software.

This paper includes the following sections:

• Section 1 describes the hardware and software components that were used in the Lenovo Client Virtualization (LCV) solution for Citrix XenDesktop.

• Section 2 describes the methodology and tools that were used for the scale out performance test, including Login VSI.

• Section 3 describes the hardware configuration that was used.

• Section 4 describes the software configuration that was used.

• Section 5 describes the performance results.

The Lenovo results show a best-in-class achievement of 5000 persistent desktops with Citrix XenDesktop.

This type of result has not been documented before, partly because 5000 persistent desktops require a

significant investment in storage. However, as shown in this paper, a combination of Atlantis Computing

software and IBM FlashSystem storage turns the usual I/O performance problem into a non-event. Without

data reduction technology, enterprise class flash storage can be expensive. The Atlantis Computing

de-duplication and compression facilities make it cost effective (the storage cost for 5000 users is less than

$130 per user, including the Atlantis license). Moreover, the performance of logging onto a desktop in 16

seconds and rebooting 5000 desktops in 20 minutes makes this system easy to use for users and IT staff.

A total of 35 Lenovo Flex System servers in 30U were used to support 5000 users with up to 160 persistent

users per server. The Lenovo servers provide a dense and cost effective solution in terms of CAPEX and

OPEX. The performance results show that the usual persistent desktop I/O problem was effectively eliminated

and the compute server performance is the driving factor. More compute servers can be added to support more

users or more compute intensive users; it all depends on the individual customer environment and user load.

For more information about the Lenovo Client Virtualization solution for Citrix XenDesktop, contact your

Lenovo sales representative or business partner. For more information about the Citrix XenDesktop reference

architecture that includes information on the LCV solution, performance benchmarks, and recommended

configurations, see this website: http://lenovopress.com/tips1278.


1 Lenovo Client Virtualization solution

The LCV solution includes compute servers, networking, storage, and clients. The compute servers use the

VMware ESXi hypervisor for user Virtual Machines (VMs) or management VMs. In this configuration, 10 Gb

Ethernet (10 GbE) networking is used for connectivity between compute servers and clients. A SAN network

that uses Fibre Channel is used for connectivity between compute servers and shared storage.

Citrix XenDesktop provides a connection broker service between clients and user VMs to support virtual

applications or virtual desktops. XenDesktop uses management VMs and supports multiple hypervisors,

although only ESXi is used for the scale out performance test. Atlantis Computing software provides an

important service, which substantially reduces the amount of I/O from the virtual desktops to the shared

storage.

Figure 1 shows an overview of the main components in the LCV. The rest of this section describes the subset

of hardware and software components that are used for the scale out performance test.

Figure 1: Overview of Lenovo Client Virtualization Solution. Clients (tablets, laptops, workstations, thin clients, and desktops and all-in-ones) connect through the XenDesktop broker to virtual desktops and applications that run on the ESXi hypervisor on Lenovo servers (Flex x240, x3550 M4/x3650 M4, and ThinkServer RD350/RD450) with shared storage (IBM Storwize, IBM FlashSystem, NetApp NAS + DAS, and EMC VNX).

1.1 Hardware components

This section describes the hardware components that are used for the 5000 persistent user scale out

performance test, which includes Lenovo compute servers, Lenovo networking components, and IBM storage

components.

1.1.1 Flex System elements

Flex® System is an enterprise-class platform that is specifically created to meet the demands of a virtualized

data center and help clients establish a highly secure private cloud environment. Flex System includes the

following features:

• Greatest choice for clients in processor type and OS platform, all in the same chassis and managed from a single point of control.

• Flex System networking that delivers a 50% latency improvement for node-to-node (east-west) traffic rather than routing everything through the top-of-rack (TOR) switch (north-south).

Figure 2: Flex System Enterprise Chassis and Flex System compute nodes

For more information, see the following Flex System website:

ibm.com/systems/pureflex/overview.html

1.1.2 Flex System x240 Compute Node

The Flex System x240 Compute Node (as shown in Figure 3) is a high-performance Intel® Xeon®

processor-based server that offers outstanding performance for virtualization with new levels of processor

performance and memory capacity and flexible configuration options for a broad range of workloads. The Flex

System x240 Compute Node is ideal for virtualization, with maximum memory support (24 DIMMs and up to

768 GB of memory capacity), 10 GbE Integrated Virtual Fabric, and 8 Gb or 16 Gb Fibre Channel for high

networking bandwidth. The Flex System x240 Compute Node also supports Flex System Flash for up to eight

1.8-inch solid-state drives (SSDs) for maximum local storage.

Figure 3: Lenovo Flex System x240 Compute Node

1.1.3 Flex System x222 Compute Node

The Flex System x222 Compute Node (as shown in Figure 4) is a high-density blade server that is designed for

virtualization, dense cloud deployments, and hosted clients. The Flex System x222 Compute Node has two

independent compute nodes in one mechanical package, which means that the Flex System x222 has a

double-density design that allows up to 28 servers to be housed in a single 10U Flex System Enterprise

4 5000 Persistent User Scale out Test

with Citrix XenDesktop and Atlantis Computing

Chassis. The Flex System x222 Compute Node supports up to 768 GB of memory capacity, 10 GbE Integrated

Virtual Fabric, and 16 Gb Fibre Channel for high networking bandwidth. The Flex System x222 Compute Node

also supports Flex System Flash for up to four 1.8-inch SSDs for maximum local storage.

Figure 4: Flex System x222 Compute Node

1.1.4 Flex System Fabric EN4093R 10Gb Scalable Switch

The Flex System Fabric EN4093R 10Gb Scalable Switch (as shown in Figure 5) provides unmatched

scalability, port flexibility, and performance. It also delivers innovations to address many networking concerns

today and provides capabilities that help you prepare for the future. This switch can support up to 64 10 Gb

Ethernet connections while offering Layer 2/3 switching, in addition to OpenFlow and "easy connect" modes. It

is designed to install within the I/O module bays of the Flex System Enterprise Chassis.

Figure 5: Flex System Fabric EN4093R 10 Gb Scalable Switch

For more information, see this website: ibm.com/redbooks/abstracts/tips0864.html

1.1.5 Flex System FC3171 8Gb SAN Switch

The Flex System FC3171 8Gb SAN Switch (as shown in Figure 6) is a full-fabric Fibre Channel component

with expanded functionality that is used in the Lenovo Flex System Enterprise Chassis. The SAN switch

supports high-speed traffic processing for Flex System configurations and offers scalability in external SAN size and complexity, in addition to enhanced systems management capabilities. The FC3171 switch provides 14 internal 8 Gb Fibre Channel ports and six external ports, and supports 2 Gb, 4 Gb, and 8 Gb port speeds.


Figure 6: Flex System FC3171 8 Gb SAN Switch

For more information, see this website: ibm.com/redbooks/abstracts/tips0866.html

1.1.6 Flex System FC5022 16Gb SAN Switch

The Flex System FC5022 16 Gb SAN Scalable Switch (as shown in Figure 7) is a high-density, 48-port,

16 Gbps Fibre Channel switch that is used in the Flex System Enterprise Chassis. The switch provides 28

internal ports to compute nodes and 20 external SFP+ ports. The FC5022 offers end-to-end 16 Gb and 8 Gb

Fibre Channel connectivity.

Figure 7: Flex System FC5022 16 Gb SAN Switch

For more information, see this website: ibm.com/redbooks/abstracts/tips0870.html

1.1.7 Lenovo RackSwitch G8264

Designed with top performance in mind, Lenovo RackSwitch G8264 (as shown in Figure 8) is ideal for today’s

big data, cloud, and optimized workloads. The G8264 switch offers up to 64 10 Gb SFP+ ports in a 1U form

factor and can accommodate future needs with four 40 Gb QSFP+ ports. It is an enterprise-class and

full-featured data center switch that delivers line-rate, high-bandwidth switching, filtering, and traffic queuing

without delaying data. Large data center grade buffers keep traffic moving. Redundant power and fans and

numerous high availability features equip the switches for business-sensitive traffic.

Figure 8: Lenovo RackSwitch G8264

The G8264 switch is ideal for latency-sensitive applications, such as client virtualization. It supports Virtual

Fabric to help clients reduce the number of I/O adapters to a single dual-port 10 Gb adapter, which helps

reduce cost and complexity. The G8264 switch supports the newest protocols, including Data Center

Bridging/Converged Enhanced Ethernet (DCB/CEE) for support of FCoE, in addition to iSCSI and NAS.

For more information, see this website: ibm.com/redbooks/abstracts/tips0815.html


1.1.8 IBM System Storage SAN24B-5

The IBM System Storage SAN24B-5 SAN switch (as shown in Figure 9) is designed to meet the demands of

hyper-scalable, private cloud storage environments by delivering 16 Gbps Fibre Channel technology and

capabilities that support highly virtualized environments. These switches support autosensing of 2 Gb, 4 Gb,

8 Gb or 16 Gb port speeds. The SAN24B-5 supports up to 24 ports in a 1U package. A 48-port version

(SAN48B-5) also is available.

Figure 9: IBM System Storage SAN24B-5

For more information, see this website: ibm.com/systems/networking/switches/san/b-type/san24b-5

1.1.9 IBM FlashSystem 840

The IBM FlashSystem™ 840 (as shown in Figure 10) is an all-flash storage system that is used to make

applications and data centers faster and more efficient by providing over 1 million input/output operations per

second (IOPS). The FlashSystem 840 storage system has an industry-leading latency of nearly 100

microseconds. This latency is especially useful for client virtualization that requires high IOPS and low-latency

access to large amounts of data. For enterprise-level availability, the IBM FlashSystem 840 system uses

two-dimensional flash RAID with patented IBM Variable Stripe RAID™ technology that maintains system

performance and capacity if there are partial or full-flash chip failures, which helps reduce downtime and

forestall system repairs. It is also extremely compact, with up to 40 TB of usable flash storage in a 2U

package with hot-swappable power supplies, backup batteries, and controllers. IBM FlashSystem 840 supports

all industry standard interfaces, including 4 Gb, 8 Gb, or 16 Gb Fibre Channel, 40 Gb InfiniBand®, 10 Gb iSCSI,

and 10 Gb FCoE.

Figure 10: IBM FlashSystem 840

For more information, see this website: ibm.com/systems/storage/flash/840/

1.2 Software components

This section describes the software components that are used for the 5000 persistent user scale out

performance test, which includes VMware ESXi hypervisor, Citrix XenDesktop, and Atlantis Computing

software.


1.2.1 VMware ESXi hypervisor

VMware ESXi™ is a bare-metal hypervisor. ESXi partitions a physical server into multiple virtual machines.

The compute, memory, and networking resources on the server are all virtualized. One advantage is that ESXi

can be booted from a small USB key.

For more information, see this website: vmware.com/products/esxi-and-esx/

1.2.2 Citrix XenDesktop

Citrix XenDesktop is an industry-leading connection broker for virtual applications and virtual desktops. It

provides a range of services for provisioning, managing, and connecting users to Microsoft Windows virtual

machines.

For more information, see this website: citrix.com/products/xendesktop/.

1.2.3 Atlantis Computing

Atlantis Computing provides a software-defined storage solution, which can deliver better performance than

a physical PC and reduce storage requirements by up to 95% in virtual desktop environments of all types. The

key is Atlantis HyperDup content-aware data services, which fundamentally changes the way VMs use storage.

This change reduces the storage footprints by up to 95% while minimizing (and in some cases, entirely

eliminating) I/O to external storage. The net effect is a reduced CAPEX and a marked increase in performance

to start, log in, start applications, search, and use virtual desktops or hosted desktops and applications. Atlantis

software uses random access memory (RAM) for write-back caching of data blocks, real-time inline

de-duplication of data, coalescing of blocks, and compression, which significantly reduces the data that is

cached and persistently stored in addition to greatly reducing network traffic.

Atlantis software works with any type of heterogeneous storage, including server RAM, direct-attached storage

(DAS), SAN, or network-attached storage (NAS). It is provided as a VMware ESXi compatible VM that presents

the virtualized storage to the hypervisor as a native data store, which makes deployment and integration

straightforward. Atlantis Computing also provides other utilities for managing VMs and backing up and

recovering data stores.

For the purposes of this scale out test, the Atlantis ILIO Persistent VDI version was used in disk-backed mode.

This mode provides the optimal solution for desktop virtualization customers that are using traditional or

existing storage technologies that are optimized by Atlantis software with server RAM. In this scenario, Atlantis

employs memory as a tier and uses a small amount of server RAM for all I/O processing while using the

existing SAN, NAS, or all-flash arrays storage as the primary storage. Atlantis storage optimizations increase

the number of desktops that the storage can support by up to 20 times while improving performance.

Disk-backed configurations can use various storage types, including host-based flash memory cards,

external all-flash arrays, and conventional spinning disk arrays.

For more information, see this website: atlantiscomputing.com/products/


2 Performance test methodology and tools

This section describes the test methodology that was used for the scale out performance test and the tools that

were used to run and monitor the test.

2.1 Test methodology

Login VSI, an industry-standard client virtualization test tool, is used to provide a simulated load of up to 5000 users. Normally, Login VSI is used to benchmark compute servers to find the maximum number of users that can be supported in a specific configuration when the CPU load is at 100%. For the scale out performance test, the idea is instead not to overload any part of the system and to ensure that all of the components are running at less than 100% utilization. This condition mirrors what is required for customer deployments.

Login VSI supports two launcher modes: serial and parallel. Serial mode is normally used to test the maximum

workload for a specific server. For the scale out performance testing, Login VSI was used in parallel mode so

that the login interval could be substantially reduced from the default of every 30 seconds and the simulated

load evenly distributed across the Login VSI launchers and compute servers. The user login interval was

varied to achieve the best result given the available servers; in many cases, one logon every two seconds was used. This means that 5000 users log on over a period of 10,000 seconds (approximately 2.75 hours) and the total test time (including the standard 30-minute Login VSI idle period and the logoff) would be about 3.5 hours.

All user VMs were pre-booted before the test so they were idle and ready to receive users. The Login VSI

medium workload was chosen to represent typical customer workloads. The more intensive heavy workload

simply required more servers to support the extra CPU load.

During the scale out performance test, different performance monitors were used to ensure that no single component was overloaded. The esxtop tool was used for the compute servers, and storage monitoring tools were used for the IBM FlashSystem shared storage. The results from these tools are described in section 5.

After each test run, the user VMs and Login VSI launcher VMs are rebooted and everything is reset and ready

for the next run a few hours later. Two or three runs were often done for each test variation.

2.2 Login VSI

Login VSI is a vendor-independent benchmarking tool that is used to objectively test and measure the

performance and scalability of server-based Windows desktop environments (client virtualization). Leading IT analysts recognize and recommend Login VSI as an industry-standard benchmarking tool for client virtualization, and it can be used by end-user organizations, system integrators, hosting providers, and testing companies.

Login VSI can be used for the following purposes:

• Benchmarking: Make the correct decisions about different infrastructure options based on tests.

• Load testing: Gain insight into the maximum capacity of your current (or future) hardware environment.

• Capacity planning: Decide exactly what infrastructure is needed to offer users an optimally performing desktop.

• Change impact analysis: Test and predict the performance effect of every intended modification before its implementation.


Login VSI measures the capacities of virtualized infrastructures by simulating typical (and atypical) user

workloads and application usage. For example, the Login VSI medium workload simulates a medium-level

knowledge worker that uses Microsoft Office, Internet Explorer, and PDFs. The medium workload is scripted in

a 12- to 14-minute loop that runs while a simulated Login VSI user is logged on. Each test loop performs the following operations:

• Microsoft Outlook: Browse 10 messages.

• Internet Explorer: One instance is left open (BBC.co.uk); another instance browses to Wired.com and Lonelyplanet.com.

• Flash application: gettheglass.com (not used with the MediumNoFlash workload).

• Microsoft Word: One instance to measure response time; one instance to review and edit a document.

• Bullzip PDF Printer and Acrobat Reader: The Word document is printed to PDF and reviewed.

• Microsoft Excel: A large randomized sheet is opened.

• Microsoft PowerPoint: A presentation is reviewed and edited.

• 7-zip: By using the command line version, the output of the session is zipped.

After the loop finishes, it restarts automatically. Each loop takes approximately 14 minutes to run. Within each loop, the response times of specific operations are measured at a regular interval: six times within each loop. The response times of these seven operations are used to establish the VSImax score. VSImax is the maximum capacity of the tested system expressed in the number of Login VSI sessions. For more information, see this website: loginvsi.com/

2.3 VMware esxtop

IOPS distribution and latency are the two most important metrics to be considered in the analysis of a storage system. The VMware tool esxtop was used to capture this information from the ESXi hypervisor. Figure 11

shows the command that was used to pipe the esxtop data to a file.

Figure 11: esxtop command line and usage
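The original figure is a screenshot of the command line. As an illustrative sketch only (the sampling interval, iteration count, and output path here are assumptions, not the exact values from the test), esxtop can be run in batch mode and piped to a file like this:

    esxtop -b -a -d 15 -n 720 > /vmfs/volumes/datastore1/esxtop-results.csv

The -b flag selects batch mode, -a exports all counters, -d 15 samples every 15 seconds, and -n 720 collects 720 samples (three hours of data) into a CSV file for later analysis.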

For more information, see this website:

http://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.monitoring.doc/GUID-D89E8267-C

74A-496F-B58E-19672CAB5A53.html

For more information about interpreting esxtop statistics, see this website:

http://communities.vmware.com/docs/DOC-9279


2.4 Superputty

Superputty is a Windows GUI application that allows multiple PuTTY SSH clients to be opened, one per tab. In

particular, this tool was used to control multiple SSH sessions simultaneously and to start tools (such as esxtop) in each session at the same time.

For more information, see this website: https://code.google.com/p/superputty/

2.5 IBM FlashSystem performance monitor

As with other IBM storage platforms, IBM FlashSystem features an integrated web-based GUI that can be

used for management and performance analysis, in addition to supporting data collection from external tools.

The procedure that is described at the following website was used to export performance metrics into CSV format so that they could be easily reviewed for this study: http://ibm.com/support/docview.wss?uid=tss1td106293&aid=1


3 Performance test hardware configuration

The hardware configuration for the 5000 persistent user scale out performance test consists of two major parts: the "System under test" that runs the 5000 persistent desktop VMs, and the "Load framework" that provides the simulated load of 5000 users by using Login VSI. Figure 12 shows an overview of the hardware configuration for the 5000 persistent user performance test.

Figure 12: Overview of hardware configuration for performance test. The system under test contains the compute servers (for users and management) and the IBM FlashSystem 840 storage for the 5000 persistent user VMs. The load framework contains the Active Directory, DHCP, and DNS server, the launcher servers, and NAS storage for results, logs, and management and launcher VM images. The G8264 switch carries the management, storage, and user networks; the SAN24B-5 switch carries the SAN network.

3.1 System under test

The system under test configuration consists of 35 compute servers that are running 5000 user VMs and two

management servers that are running management VMs. All servers have a USB key with ESXi 5.5. The 35

compute servers are various Lenovo Flex x240 and Lenovo Flex x222 compute nodes, as listed in Table 1.

Table 1: Compute nodes used in system under test

Server    Processor                                    Memory                           Count
x222      2 x E5-2470 (Sandy Bridge EN) in each half   192 GB each half (384 GB total)  5 x 2
x240      2 x E5-2670 (Sandy Bridge EP)                256 GB                           18
x240      2 x E5-2690 (Sandy Bridge EP)                256 GB                           5
x240 v2   2 x E5-2690v2 (Ivy Bridge EP)                384 GB                           2


Each x240 compute node has a two-port 10 GbE LAN on motherboard (LOM) adapter and a two-port 8 Gb

Fibre Channel (FC) adapter (FC3172). Each x222 compute node also has a two-port LOM for each half and a shared four-port 16 Gb Fibre Channel adapter (FC5024D).

The 35 compute nodes are placed in three Lenovo Flex chassis, which use a total of 30U in a rack. Each Flex chassis is configured with an EN4093R 10 GbE switch that is connected to a Lenovo G8264 64-port TOR 10 GbE Ethernet switch. Each chassis is connected by using a 40 GbE cable for best performance. An extra EN4093R switch in each chassis and a second G8264 TOR switch can be used for redundancy.

Each Flex chassis also contains an FC3171 or FC5022 FC switch that is configured in pass-thru mode. The

chassis switches are connected with four LC-LC fibre cables to an IBM SAN24B-5 TOR SAN switch. An extra

FC switch in each chassis and a second SAN24B-5 TOR switch can be used for redundancy. All zoning for the

compute nodes and IBM FlashSystem 840 storage is centralized in the SAN24B-5 switch.

The IBM FlashSystem 840 storage server was configured with a full complement of twelve 4 TB flash cards for a

total of 40 TB of redundant storage (usable after two-dimensional RAID protection). The 5000 persistent virtual

desktops used less than 5 TB of FlashSystem capacity after Atlantis Computing data reduction. The

FlashSystem 840 is connected to the SAN24B-5 switch by using four LC-LC fibre cables, two to each storage

controller for redundancy. Another four fibre cables can be used to connect to a second SAN switch for further

failover protection.

Even with redundancy, there are enough ports on the IBM FlashSystem 840 for a direct FC connection from

the Flex chassis FC switches. Pass-thru mode to a TOR SAN switch was used to show how a larger SAN

network is built.

All of the management VMs that are required by Citrix XenDesktop and Atlantis ILIO Center are split across two x240 compute nodes. The configuration and number of these VMs are listed in Table 2.

Table 2: Characteristics of management VMs

Management VM            Virtual processors   System memory   Storage   Operating system   Count
AD, DNS, and DHCP        2                    4 GB            50 GB     2008 R2 SP1        1 (+1 for redundancy)
Web Interface            2                    4 GB            70 GB     2008 R2 SP1        2 (1 per 2500 VMs)
Delivery Controller      4                    16 GB           70 GB     2008 R2 SP1        4 (1 per 1250 VMs)
Citrix licensing server  2                    4 GB            20 GB     2008 R2 SP1        1
XenDesktop SQL server    2                    4 GB            150 GB    2008 R2 SP1        1
vCenter server           10                   32 GB           100 GB    2008 R2 SP1        1
vCenter SQL server       4                    4 GB            150 GB    2008 R2 SP1        1
Atlantis ILIO Center     2                    4 GB            20 GB     Linux              1

The VM for the Active Directory, DNS, and DHCP services is shared by the servers in the system under test

and the load framework, and a second instance is used for redundancy. Windows Server 2012 R2 can be used

instead of Windows 2008 R2 SP1.


For production purposes, the AD server and SQL servers should be replicated to provide fault tolerance. Four

VMs for the XenDesktop Delivery Controller are used to provide adequate performance under load.

Figure 13 shows the compute servers, shared storage, and networking hardware for the system under test:

• IBM G8264 64-port 10 GbE TOR switch
• IBM SAN24B-5 FC SAN TOR switch
• IBM FlashSystem 840 (40 TB) for storage of persistent VMs
• One IBM Flex chassis with five x222 twin nodes (compute) and four x240 nodes (compute)
• Two IBM Flex chassis with 26 x240 nodes (compute) and two x240 nodes (management)

Figure 13: Hardware configuration for System under Test

3.2 Load framework

The load framework uses Login VSI 3.7 to simulate a user load of up to 5000 users with the medium workload.

The load framework consists of 29 compute servers and one management server, all Lenovo x3550 rack servers with the VMware ESXi 5.5 hypervisor, plus NAS shared storage for the Login VSI launcher VMs and performance data.

The compute servers for the load framework must have adequate performance to support the required load of

8 - 12 Login VSI launcher VMs. These compute servers often have two Westmere EP or better processors and

96 GB or more of memory. Each “launcher” compute server has a USB key with ESXi 5.5 and a two-port

10 GbE adapter that is connected to the same G8264 10 GbE TOR switch that is used by the system under

test. There is no need for an FC connection to the IBM FlashSystem storage, although there is nothing

preventing centralization of the storage on FlashSystem. Instead, all of the data for the load framework is

stored on NAS shared storage, which is connected to the same G8264 10 GbE switch.

The management server for the load framework supports several VMs. The main VM is used to run Login VSI

Launcher and Analyzer tools. In addition, a separate Citrix XenDesktop configuration is used to provision


multiple launcher VMs by using XenDesktop Machine Creation Services (MCS). There are different ways this could have been done, but it was easy to use MCS in dedicated mode to create the launcher VMs.

Figure 14 shows the compute servers and storage hardware for the load framework:

• 29 IBM System x3550 nodes (launchers)
• 1 IBM System x3550 node (management)
• NAS storage for launcher VMs (IBM System Storage N6240)

Figure 14: Hardware configuration for Load Framework


4 Software configuration and setup

The following software configuration and setup tasks must be done before any performance tests are run:

• Management VMs
• Master user VM
• Master launcher VM
• Login VSI
• Atlantis Computing software
• 5000 persistent desktop VMs

4.1 Setting up Management VMs

The configuration and setup of the management VMs that are required for Citrix XenDesktop should follow the normal procedures as documented by Microsoft and Citrix. The following special considerations apply:

• The Active Directory, DNS, and DHCP server is shared between all compute servers on the network (from the system under test and the load framework).

• There are four XenDesktop Delivery Controllers.

• The mapping between user IDs and the names of the persistent desktop VMs is statically specified to the connection broker rather than being randomly assigned the first time it is needed. This specification makes it easier to remedy any VM setup problems before the first performance test. If this is not done, the assignment of user IDs to VMs must be rerun until it completes successfully for all 5000 users.

4.2 Setting up master user VM

Windows 7 Professional with SP1 is used as the basis for the master user VM (master image) for the scale out

performance testing. The master image was created by completing the following steps:

1. Create a Windows 7 Professional 64-bit with SP1 VM. The following VM parameters should be specified: 1 vCPU, 1024 MB vRAM, and a 24 GB disk.

2. Configure Windows 7, networking, and other OS features.

3. Install VMware Tools for access by vCenter and reboot.

4. Join to the Active Directory domain and reboot.

5. Disable all Internet Explorer plug-ins.

6. Ensure that the firewalls are turned off.

7. Enable remote desktop for remote access to the desktop.

8. Install the Windows applications that are needed for Login VSI medium workload, including Microsoft

Office, Adobe Acrobat, and so on.

9. Apply the Citrix recommended optimizations. For more information, see this website:

support.citrix.com/article/CTX125874

10. Install the Citrix XenDesktop Virtual Desktop Agent (VDA). This step is not needed for the brokerless

RDP test scenario.


11. Add registry entries that point to the FQDNs of the four XenDesktop Delivery Controller VMs (a sketch follows this list). For more information, see this website: support.citrix.com/article/CTX137993.

This step is not needed for the brokerless RDP test. The Citrix desktop service randomly selects a controller from the list (grouped or ungrouped) until a successful connection to a controller is established.

12. Shut down the VM and take a snapshot.
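The paper does not reproduce the registry change itself. A minimal sketch for step 11, assuming the ListOfDDCs value that CTX137993 describes and hypothetical controller FQDNs, is:

    reg add "HKLM\SOFTWARE\Citrix\VirtualDesktopAgent" /v ListOfDDCs /t REG_SZ /d "ddc1.example.com ddc2.example.com ddc3.example.com ddc4.example.com"

On a 64-bit OS with a 32-bit VDA, the key can sit under the Wow6432Node branch instead; see the Citrix article for the variant that applies to the installed agent.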

4.3 Setting up master launcher VM

Setting up the master launcher VM for Login VSI is similar to setting up the master user VM, except that the Citrix Receiver should be installed. The Citrix Receiver is not needed for the brokerless RDP test scenario. To save time, an autologon script is added so that the launcher VMs are automatically logged on after being started.

4.4 Setting up Login VSI

Login VSI 3.7 was used to simulate the load of 5000 users for the scale out testing. The process starts by

installing Login VSI using the install instructions that are available at this website:

loginvsi.com/documentation/index.php?title=Installation

A separate management VM is used to run Login VSI performance tests and analyze the results.

As noted earlier, a Citrix MCS environment is used to create the launcher VMs. First, add all of the physical launcher machines to VMware vCenter and to Citrix XenDesktop. Then, by using the master launcher image as a template, the 288 launcher VMs are created in MCS dedicated mode. The number of launcher VMs per physical server depends on its performance; however, 8 - 12 launcher VMs per server works well.

The Login VSI tool is started to ensure that all of the launchers were created properly and are ready to use.

Finally, a script is used to add the 5000 unique user IDs to AD. The password for all of these users is the same

for simplicity.
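The user creation script is not included in the paper. A minimal command-line sketch, assuming a hypothetical LoginVSI<n> naming scheme and the shared password that appears in the Login VSI commandline later in this section, is:

    for /L %i in (1,1,5000) do net user LoginVSI%i P@ssword1 /add /domain

A production script would also place the users into the correct organizational unit and add them to the Login VSI user group.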

For the brokerless RDP test scenario, the following slightly different steps are used for running Login VSI:

• Ensure that the LoginVSI RDP group has access to the master image.

• Use vCenter to copy and paste the IP addresses of the user VMs that are performing the Login VSI test into a CSV file (named %csv_target% in the commandline example below).

• In the Login VSI configuration, replace the commandline with the following:

    C:\Program Files\Login Consultants\VSI\Launcher\RDPConnect.exe %csv_target% <AD domain>\<login vsi user> P@ssword1

4.5 Setting up Atlantis Computing software

Install the Atlantis ILIO Persistent VDI product by following the standard installation procedure. The Atlantis ILIO

Center VM can be run on one of the servers that is designated for management VMs.

It is a recommended Atlantis best practice that each ILIO VM has its own logical unit number (LUN) on shared

storage. Therefore, 35 volumes (each with 300 GB capacity) were created on the IBM FlashSystem storage.


This capacity totals about 10 TB. Each ILIO VM and its datastore of user VMs requires less than 150 GB, for a total of only about 5 TB on the shared storage. However, in production, the de-duplication savings for persistent desktops are more likely to be 80% - 90% instead of the 98% that is achieved by this performance test.

By using vCenter, each physical server has access to all 35 of the volumes, even though only one is actually used per physical server. The ILIO master VM and the master user VM are then placed in one of those volumes.

Scripts that are available from Atlantis Computing are used to clone the ILIO VM and the master user VM

across all 35 compute servers in preparation for the next step.

4.6 Setting up 5000 persistent desktop VMs

Each of the 35 compute servers supports persistent desktop VMs. The number of VMs depends upon the

processor capability of the server. Table 3 lists the number of VMs per compute server and VM total.

Table 3: Number of VMs per compute server

Server    Processor                                    Count   VMs per server   Total VMs
x222      2 x E5-2470 (Sandy Bridge EN) in each half   5 x 2   100              1000
x240      2 x E5-2670 (Sandy Bridge EP)                18      160              2880
x240      2 x E5-2690 (Sandy Bridge EP)                5       160              800
x240 v2   2 x E5-2690v2 (Ivy Bridge EP)                2       160              320
Total                                                                           5000

A command line script from Atlantis and a CSV file are used to fast clone the master VM on each compute

server to create the required number of VMs on each of the servers. A naming scheme of the server name and

VM number is used to create a set of 5000 uniquely named VMs. The cloning process can take half a day to

complete, but needs to be done only once for each different master VM image.

Each VM is started so that it registers with Active Directory and the machine name is automatically assigned to the VM. This process can be done as a separate step or as part of the fast cloning process that is described above. The VMs are then shut down via vCenter.

A dedicated machine catalog is created for Citrix XenDesktop and a 5000-line CSV file is used to automatically insert all of the named VMs into the machine catalog. A desktop group is created and the 5000 VMs are added to it. XenDesktop automatically starts each VM and ensures that it is accessible from XenDesktop. Sometimes it is necessary to do some manual steps to get all of the VMs into the correct state.

The last step is to perform a standard Login VSI profile run to automatically create the user profile in each persistent desktop. Because of the static assignment of names, any failures can be corrected manually or by rerunning Login VSI. After a final restart of the guest operating systems, the 5000 persistent desktops are ready for a performance test.


5 Scale out performance results

To show the performance of 5000 persistent users, the following test scenarios were run:

• Brokerless with RDP-connected clients (no Citrix XenDesktop)

• Citrix XenDesktop 7.5 with HDX-connected clients

This section describes the results of the scale out tests by examining the performance of the Login VSI test, the compute servers, and the shared storage.

5.1 Brokerless by using RDP

In this test scenario, no connection broker is used and the launcher VMs are connected directly to the user VMs by using the RDP protocol. This test is used as a baseline and comparison for the other tests.

Figure 15 shows the output from Login VSI with a new logon every second. Out of 5000 started sessions, 4998

successfully reported back to Login VSI. The average response time is extremely good with a VSI baseline of

860 milliseconds (ms). The graph is flat with only a slight increase in the average between the first and last

desktop. As measured by Login VSI, the longest time to log on for any session was 8 seconds.

Figure 15: Login VSI performance result for Brokerless by using RDP

Figure 16 shows the percentage CPU utilization by using representative curves for each of the four different

servers that were used in the test. The utilization slowly climbs as more users log on and then sharply drops off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest


utilization because only 100 VMs are started on those servers. The E5-2670 has the highest utilization (92%) because it is the slowest CPU of the three servers that run 160 VMs.

Figure 16: Esxtop CPU utilization for Brokerless by using RDP

Figure 17 shows the total number of server IOPS as reported by Esxtop by using representative curves for

each of the four different servers that were used in the test. The IOPS slowly climb as more users log on and then sharply drop off after the steady state period ends and the users are logged off. The E5-2470 based

server has the lowest IOPS because only 100 VMs are started on those servers. The other three servers have

similar curves because all have 160 VMs. The IOPS curves are spiky and show that the number of IOPS at any

instant of time can vary considerably. The peaks are most likely because of logons.


Figure 17: Esxtop IOPS for Brokerless by using RDP

Figure 18 shows the total number of storage IOPS as measured by the IBM FlashSystem 840. The write IOPS curve shows the classic Login VSI pattern of a gradual building of IOPS (up to 12:56 a.m.), then a steady state period of 30 minutes (12:56 a.m. to 1:26 a.m.), and finally a peak for all of the logoffs at the end.

The read IOPS are low as Atlantis Computing software is managing most of them out of its in-memory cache.

The write IOPS are fairly low, peaking at less than 30,000 IOPS, which is 6 per persistent desktop. Atlantis

Computing software is using its data services to compress, de-dupe, and coalesce the write IOPS.

Figure 18: FlashSystem storage IOPS for Brokerless using RDP


Figure 19 shows the server latency in milliseconds as reported by Esxtop by using representative curves for

each of the four different servers that were used in the test. The average latency is 300 microseconds (us) and is constant throughout the whole test. The latency for the servers with the E5-2470 CPU tends to peak higher, but often is not more than 1 ms.

Figure 19: Esxtop latency for Brokerless by using RDP

Figure 20 shows the storage request latency in milliseconds as measured by the IBM FlashSystem 840. The

curve shows that the average read latency is less than 200 us and even drops to zero during the steady state

phase because all of the read requests are satisfied by the Atlantis Computing cache. The write latency also

often is less than 200 us with occasional peaks, which are still less than 1000 us (1 millisecond), except during

the 5000 virtual desktop restart.


Figure 20: FlashSystem storage latency for Brokerless by using RDP

5.2 Citrix XenDesktop

In this test scenario, the Citrix XenDesktop broker is used and the launcher VMs are connected to the

XenDesktop Web Interface.

Figure 21 shows the output from Login VSI with a new logon every 2 seconds, which is twice the interval (and half the logon rate) of the brokerless RDP scale out test. Out of 5000 started sessions, 4997 successfully reported back to Login VSI, which is a successful run. The average response time is good, with a VSI baseline of 1356 ms. The graphs for

minimum and average response times are flat with only a slight increase in the average between the first and

last desktop. The graph for the maximum response time increases steadily and only shows the worst case. As

measured by Login VSI, the longest time to log on for any session was 16 seconds.


Figure 21: Login VSI performance result for Citrix XenDesktop

Figure 22 shows the percentage CPU utilization by using representative curves for each of the four different

servers that were used in the test. The utilization slowly climbs as more users log on and then sharply drops off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest utilization because only 100 VMs are started on those servers. The E5-2670 and E5-2690 CPUs have the highest utilization (95%) compared to the faster E5-2690v2; all three of these servers have 160 VMs.


Figure 22: Esxtop CPU utilization for Citrix XenDesktop

Figure 23 shows the total number of server IOPS as reported by Esxtop by using representative curves for

each of the four different servers that were used in the test. The IOPS slowly climb as more users log on and then sharply drop off after the steady state period ends and the users are logged off. The E5-2470 based server has the lowest IOPS because only 100 VMs are started on those servers. The other three servers have similar curves because all have 160 VMs. The IOPS curves are spiky, which shows that the number of IOPS at any instant of time can vary considerably; the peaks are most likely because of logons.


Figure 23: Esxtop IOPS for Citrix XenDesktop

Figure 24 shows the total number of storage IOPS as measured by the IBM FlashSystem 840. The write IOPS curve shows the classic Login VSI pattern of a gradual building of IOPS. The steady state period is less

discernible in this graph and occurs around 9:15 p.m. The read IOPS are low as Atlantis Computing software is

managing most of them out of its in-memory cache. The number of read IOPS increases substantially at logoff.

The write IOPS are quite low, peaking at less than 35,000 IOPS, which is 7 IOPS per persistent desktop. Again,

Atlantis Computing software is using its data services to compress, de-dupe, and coalesce the write IOPS.

Figure 24: FlashSystem storage IOPS for Citrix XenDesktop


Figure 25 shows two successive runs of the 5000 persistent desktop scale out test. Each run shows a similar

pattern of IOPS, which culminates with the logoffs and then an idle drop back to a low number. At 10:33 p.m., a reboot of all of the desktops was started; it completed 20 minutes later. In Figure 25, there are jumps in the IOPS log at 7:12 p.m. and 11:47 p.m., which are artifacts of the data collection process on the IBM FlashSystem 840.

Figure 25: Two Citrix XenDesktop runs with reboot in between

Figure 26 shows the server latency in milliseconds as reported by Esxtop by using representative curves for

each of the four different servers that were used in the test. The average latency is 350 microseconds (us) and is fairly constant throughout the whole test. The latency for the servers with the E5-2470 CPU tends to peak higher, but is usually not more than 1 ms.

Figure 26: Esxtop latency for Citrix XenDesktop


Figure 27 shows the storage request latency in milliseconds as measured by the IBM FlashSystem 840. The

curve shows that the average read latency is less than 200 us. The write latency is also less than 250 us and

most peaks are below 750 us with occasional peaks to 2.5 ms.

Figure 27: FlashSystem storage latency for Citrix XenDesktop


Resources

Reference architecture for Lenovo Client Virtualization with Citrix XenDesktop

lenovopress.com/tips1278

Atlantis Computing

atlantiscomputing.com/products

IBM FlashSystem 840

ibm.com/storage/flash

VMware vSphere

vmware.com/products/datacenter-virtualization/vsphere

Citrix XenDesktop

citrix.com/products/xendesktop

Acknowledgements

Thank you to the teams at Atlantis Computing (Mike Carman, Bharath Nagaraj), IBM (Rawley Burbridge), and

ITXen (Brad Wasson) for their tireless work on helping with the performance testing.


Trademarks and special notices

© Copyright Lenovo 2015.

References in this document to Lenovo products or services do not imply that Lenovo intends to make them

available in every country.

Lenovo, the Lenovo logo, ThinkCentre, ThinkVision, ThinkVantage, ThinkPlus and Rescue and Recovery are

trademarks of Lenovo.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines

Corporation in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United

States, other countries, or both.

Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other

countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

All customer examples described are presented as illustrations of how those customers have used Lenovo

products and the results they may have achieved. Actual environmental costs and performance characteristics

may vary by customer.

Information concerning non-Lenovo products was obtained from a supplier of these products, published

announcement material, or other publicly available sources and does not constitute an endorsement of such

products by Lenovo. Sources for non-Lenovo list prices and performance numbers are taken from publicly

available information, including vendor announcements and vendor worldwide homepages. Lenovo has not

tested these products and cannot confirm the accuracy of performance, capability, or any other claims related

to non-Lenovo products. Questions on the capability of non-Lenovo products should be addressed to the

supplier of those products.

All statements regarding Lenovo future direction and intent are subject to change or withdrawal without notice,

and represent goals and objectives only. Contact your local Lenovo office or Lenovo authorized reseller for the

full text of the specific Statement of Direction.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive

statement of a commitment to specific levels of performance, function or delivery schedules with respect to any

future products. Such commitments are only made in Lenovo product announcements. The information is

presented here to communicate Lenovo’s current investment and development activities as a good faith effort

to help with our customers' future planning.

Performance is based on measurements and projections using standard Lenovo benchmarks in a controlled

environment. The actual throughput or performance that any user will experience will vary depending upon

considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the

storage configuration, and the workload processed. Therefore, no assurance can be given that an individual

user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown are of engineering prototypes. Changes may be incorporated in production models.

Any references in this information to non-Lenovo websites are provided for convenience only and do not in any

manner serve as an endorsement of those websites. The materials at those websites are not part of the

materials for this Lenovo product and use of those websites is at your own risk.