SER2724BU Extreme Performance Series: Performance Best Practices
Mark Achtemichuk, VCDX, Staff Engineer, VMware Reza Taheri, Principal Engineer, VMware Valentin Bondzio, Senior Staff TSE, VMware SER2724BU #VMworld #xPerfSeries #SER2724BU * Extreme Performance Series: Performance Best Practices VMworld 2017 Content: Not for publication or distribution


Page 1: SER2724BU Extreme Performance Series: or distribution · Mark Achtemichuk, VCDX, Staff Engineer, VMware Reza Taheri, Principal Engineer, VMware Valentin Bondzio, Senior Staff TSE,

Mark Achtemichuk, VCDX, Staff Engineer, VMware

Reza Taheri, Principal Engineer, VMware

Valentin Bondzio, Senior Staff TSE, VMware

SER2724BU

#VMworld #xPerfSeries #SER2724BU *

Extreme Performance Series:

Performance Best Practices

VMworld 2017 Content: Not for publication or distribution

Page 2

• This presentation may contain product features that are currently under development.

• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.

• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

• Technical feasibility and market demand will affect final delivery.

• Pricing and packaging for any new technologies or features discussed or presented have not been determined.

Disclaimer

Page 3

Agenda

1 The Baseline

2 vNUMA

3 Keeping Things Up To Date

4 Power Management

5 Hyper-threading

Page 4

Baseline

Page 5

Baseline Best Practices

• * Use the most current release: vSphere, VCSA, VM Tools, vHW, OS, BIOS, Firmware

• HW selection makes a difference, e.g. bandwidth, offloads, processor architectures

• Refer to existing best practice documentation, e.g. SQL BPs, Latency Sensitive BPs

• * Rightsize your workloads, size into a pNUMA node, correct vCPU presentation

• * Evaluate your power management policy

• Use resource management properly, or not at all!

• * Keep Hyper-threading enabled

• Use DRS to manage contention

Page 6

Baseline Best Practices

• Monitor oversubscription, e.g. pCPU:vCPU ratio and memory reclamation, via vROps

• Use paravirtualized drivers: vmxnet3, pvscsi

• Evaluate disabling interrupt coalescing: lower latency, but at a higher CPU cost

• Storage design needs to be optimized for flash, map app -> disk

• Understand what the workload is – Java apps are different from databases

• Define and monitor application level KPIs

• You can’t compare Apples to Oranges

Page 7

Baseline Best Practices

• What’s Important to Monitor:

– Compute – Contention – Ready, Co-Stop

– Memory – Oversubscription – Balloon, Swap (in-guest difficult)

– Storage – Service Time – Device and Kernel Latency

– Network – Health - Throughput

Page 8

Baseline Best Practices

• Performance Best Practices for vSphere 6.5

https://www.vmware.com/techpapers/2017/Perf_Best_Practices_vSphere65.html

• Application Specific Best Practice Guides (SQL, Oracle, etc)

https://www.vmware.com/solutions/business-critical-apps.html

• VROOM! Blog

https://blogs.vmware.com/performance/

• Performance Community

https://communities.vmware.com/community/vmtn/performance

Page 9

Agenda

1 The Baseline

2 vNUMA

3 Keeping Things Up To Date

4 Power Management

5 Hyper-threading

Page 10

Troubleshooting Scenario

• Poor NUMA Locality (N%L)

• pNUMA doesn’t match vNUMA

• I see conflicting guidance


vNUMA

Page 11

Troubleshooting Scenario


vNUMA – Optimal Configuration

Page 12

Troubleshooting Scenario

1. While there are many advanced vNUMA settings, only in rare cases do they need to be changed from defaults.

2. Always configure the virtual machine vCPU count to be reflected as Cores per Socket, until you exceed the physical core count of a single physical NUMA node.

3. When you need to configure more vCPUs than there are physical cores in the NUMA node, evenly divide the vCPU count across the minimum number of NUMA nodes.

4. Don’t assign an odd number of vCPUs when the size of your virtual machine exceeds a physical NUMA node.

5. Don’t enable vCPU Hot Add unless you’re okay with vNUMA being disabled*

6. Don’t create a VM larger than the total number of physical cores of your host*


vNUMA – Rules of Thumb
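Rules 2, 3, 4 and 6 above can be sketched as a small sizing helper. This is a minimal illustration, not a VMware tool: the function name `suggest_vcpu_topology` and its parameters are hypothetical, and it assumes every physical NUMA node in the host has the same number of cores.

```python
import math


def suggest_vcpu_topology(vcpus, cores_per_numa_node, numa_nodes):
    """Return (virtual_sockets, cores_per_socket) per the rules of thumb.

    Hypothetical helper: assumes identical physical NUMA nodes.
    """
    if vcpus < 1:
        raise ValueError("vCPU count must be positive")

    # Rule 6: do not size the VM beyond the host's physical core count.
    if vcpus > cores_per_numa_node * numa_nodes:
        raise ValueError("VM is larger than the physical cores of the host")

    # Rule 2: while the VM fits in one pNUMA node, present all vCPUs
    # as cores of a single socket.
    if vcpus <= cores_per_numa_node:
        return 1, vcpus

    # Rule 4: no odd vCPU counts once the VM spans NUMA nodes.
    if vcpus % 2:
        raise ValueError("use an even vCPU count for a VM wider than one NUMA node")

    # Rule 3: spread evenly across the minimum number of NUMA nodes.
    nodes_needed = math.ceil(vcpus / cores_per_numa_node)
    if vcpus % nodes_needed:
        raise ValueError("vCPU count does not divide evenly across NUMA nodes")
    return nodes_needed, vcpus // nodes_needed


print(suggest_vcpu_topology(8, 12, 2))   # fits in one 12-core node
print(suggest_vcpu_topology(20, 12, 2))  # spans both nodes
```

On a host with two 12-core NUMA nodes, an 8 vCPU VM stays in one socket, while a 20 vCPU VM is split into two sockets of 10 cores each.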

Page 13

Agenda

1 The Baseline

2 vNUMA

3 Keeping Things Up To Date

4 Power Management

5 Hyper-threading

Page 14

Troubleshooting Scenario

• In-house developed application that is “mission critical” runs as expected on the developer’s laptop but reaches only about 70% of that performance on ESXi

• Analyze the problem

• Use tools to identify issue

• Lesson learned


Why Keep Things Up-to-date

Page 15

Troubleshooting Scenario

The developer’s “laptop” versus the ESXi “server”

Page 16

Is There Contention?


10:05:56am up 2 days 43 min, 675 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.05, 0.01, 0.01

PCPU USED(%): 0.3 0.3 0.1 0.3 0.4 0.0 0.1 0.1 0.1 0.2 138 0.3 0.2 0.3 0.0 0.0 0.1 15 0.2 0.1 0.0 0.3 0.1 0.1 AVG: 6.6

PCPU UTIL(%): 0.5 0.4 0.1 0.3 0.4 0.0 0.2 0.2 0.1 0.3 100 0.7 0.3 0.3 0.1 0.0 0.1 16 0.4 0.3 0.1 0.6 0.3 0.3 AVG: 5.1

CORE UTIL(%): 0.8 0.0 0.8 0.5 0.4 100 0.0 15 16 0.0 7.1 0.3 AVG: 11

ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP

96337 148153 vmx 1 0.01 0.01 0.00 98.23 - 0.00 0.00 0.00

96339 148153 NetWorld-VM-96338 1 0.00 0.00 0.00 98.24 - 0.00 0.00 0.00

96340 148153 NUMASchedRemapEpochInitial 1 0.00 0.00 0.00 98.24 - 0.00 0.00 0.00

96341 148153 vmast.96338 1 0.04 0.06 0.00 98.18 - 0.00 0.00 0.00

96343 148153 vmx-vthread-6 1 0.00 0.00 0.00 98.24 - 0.00 0.00 0.00

96344 148153 vmx-mks:prime95 1 0.00 0.01 0.00 98.23 - 0.00 0.00 0.00

96345 148153 vmx-svga:prime95 1 0.00 0.00 0.00 98.24 - 0.00 0.00 0.00

96346 148153 vmx-vcpu-0:prime95 1 137.13 98.24 0.00 0.00 0.00 0.00 0.00 0.06

96348 148153 vmx-vcpu-1:prime95 1 0.31 0.52 0.00 97.70 2.71 0.02 94.99 0.01

96347 148153 PVSCSI-96338:0 1 0.00 0.00 0.00 98.24 - 0.00 0.00 0.00

96350 148153 vmx-vthread-7:prime95 1 0.00 0.00 0.00 98.24 - 0.00 0.00 0.00
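One way to answer the contention question from a capture like the one above is to parse the per-world rows and look at %RDY. The sketch below is illustrative only: the three rows are copied from the screen above, while the helper names and the 5% threshold are assumptions, not esxtop features or VMware guidance.

```python
# Column order as shown in the esxtop header above.
COLUMNS = ["ID", "GID", "NAME", "NWLD", "%USED", "%RUN", "%SYS",
           "%WAIT", "%VMWAIT", "%RDY", "%IDLE", "%OVRLP"]

# Rows copied from the capture above.
ESXTOP_ROWS = """\
96337 148153 vmx 1 0.01 0.01 0.00 98.23 - 0.00 0.00 0.00
96346 148153 vmx-vcpu-0:prime95 1 137.13 98.24 0.00 0.00 0.00 0.00 0.00 0.06
96348 148153 vmx-vcpu-1:prime95 1 0.31 0.52 0.00 97.70 2.71 0.02 94.99 0.01
"""

READY_THRESHOLD_PCT = 5.0  # assumed cut-off, for illustration only


def parse_rows(text):
    """Turn whitespace-separated esxtop rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip anything that is not a full data row
        row = dict(zip(COLUMNS, fields))
        for key in COLUMNS[3:]:
            # "-" means the counter does not apply to this world.
            row[key] = None if row[key] == "-" else float(row[key])
        rows.append(row)
    return rows


def worlds_with_ready_above(rows, threshold_pct):
    """Names of worlds whose %RDY exceeds the threshold."""
    return [r["NAME"] for r in rows
            if r["%RDY"] is not None and r["%RDY"] > threshold_pct]


rows = parse_rows(ESXTOP_ROWS)
contended = worlds_with_ready_above(rows, READY_THRESHOLD_PCT)
print("worlds above threshold:", contended)
```

In this capture the highest %RDY is 0.02, so under the assumed threshold no world is flagged, which is consistent with a host running a single 2 vCPU VM.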

Page 17

Processor Differences?

• Both processors are Haswell, very similar clock frequencies

– Intel Core i7-4700MQ (4 cores, 8 threads / 6 MB L3 Cache / 2.4 -> 3.4 GHz)

– Intel Xeon E5-2620 v3 (6 cores, 12 threads / 15 MB L3 Cache / 2.4 -> 3.2 GHz)

• Is it EVC?

Page 18

What About Virtual Hardware?

ESXi | vHW | GA (ISO 8601) | ~ Intel CPU gen. level | Model name mod
5.5 | 10 (9) | 2013-09-22 | Ivy Bridge | E*-**** v2
6.0 | 11 | 2015-03-12 | Haswell | E*-**** v3
6.5 | 13 | 2016-11-15 | Skylake | E*-**** v5

Page 19

Virtual Hardware Version Changes

• Some 38 changes were implemented in vHW 11

• Examples:

– benefit SMT FT VMs (mostly SVGA optimizations)

– reduces timer interrupts for idle Windows 2012+ VMs

• (less CPU consumption / contention when VMs are idle)

– enable RSC (LRO) for Windows 2012+ VMs

– improves some aspects of nested VM performance

– etc.

Page 20

Agenda

1 The Baseline

2 vNUMA

3 Keeping Things Up To Date

4 Power Management

5 Hyper-threading

Page 21

Troubleshooting Scenario

• I run performance tests on my software but always get varying results between runs. Could it be Power Management?

• Analyze the problem

• Use tools to identify issue

• Lesson learned


Power Management Impact

Page 22

What is Power Management?

• More Turbo bins on cores in C0 when other cores are in deep C-States

Reallocating power consumption within a processor package

[Figure: frequency (P0 and Turbo bins TB1, TB2, TB3) versus C-state depth (C0, C1, C6) for the cores in a processor package]

Page 23

Where Can I See Power Management?


10:15:23am up 2 days 53 min, 674 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.10, 0.09, 0.03

Power Usage: 147W, Power Cap: N/A

PSTATE MHZ: 2401 2400 2300 2200 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200

CPU %USED %UTIL %C0 %C1 %C2 %P0 %P1 %P2 %P3 %P4 %P5 %P6 %P7 %P8 %P9 %P10 %P11 %P12 %P13 %A/MPERF

(…)

4 0.3 0.5 0 11 88 95 0 0 0 0 0 0 0 0 0 0 0 0 4 95.2

5 0.0 0.1 0 3 97 9 0 0 0 0 0 0 0 0 1 0 0 0 91 77.8

6 0.1 0.1 0 7 93 0 0 0 0 0 0 0 0 0 0 0 0 0 100 105.5

7 0.5 0.7 1 1 99 100 0 0 0 0 0 0 0 0 0 0 0 0 0 117.1

8 2.5 2.4 2 16 81 17 0 0 0 0 0 0 0 0 0 0 0 0 83 103.9

9 0.1 0.3 0 1 98 6 0 0 0 0 0 0 0 0 0 0 0 0 94 59.7

10 0.4 0.7 1 9 90 7 0 0 0 0 0 0 0 0 0 0 1 0 92 54.8

11 129.4 100.0 100 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 132.5

12 3.1 3.1 3 12 85 85 0 0 0 0 0 0 0 0 0 0 0 0 15 102.4

13 0.3 0.5 0 17 83 12 0 0 0 0 0 0 0 0 0 0 0 0 88 79.8

14 0.4 0.5 1 16 84 43 0 0 0 0 0 0 0 0 0 0 0 0 57 94.1

15 0.1 0.3 0 2 97 100 0 0 0 0 0 0 0 0 0 0 0 0 0 73.0

16 3.7 3.1 3 4 93 5 0 0 0 0 0 0 0 0 0 0 0 0 95 126.0

17 0.0 0.1 0 5 95 3 0 0 0 0 0 0 0 0 0 0 0 0 97 50.8

18 0.4 0.7 1 9 90 7 0 0 0 0 0 0 0 0 0 0 1 0 92 54.8

19 1.4 1.4 1 14 85 22 0 0 0 0 0 0 0 0 0 0 0 0 78 103.3

(…)
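The %A/MPERF column can be turned into an approximate clock speed. The sketch below assumes %A/MPERF is the ratio of actual to nominal frequency while the CPU is executing, and that the nominal frequency is the 2400 MHz P-state listed above; the function name is hypothetical.

```python
NOMINAL_MHZ = 2400  # assumed nominal frequency, from the PSTATE MHZ row


def effective_mhz(nominal_mhz, aperf_mperf_pct):
    """Approximate running frequency from the %A/MPERF column."""
    return nominal_mhz * aperf_mperf_pct / 100.0


# (CPU number, %A/MPERF) pairs copied from the screen above.
samples = [(11, 132.5), (7, 117.1), (17, 50.8)]
for cpu, ratio_pct in samples:
    print(f"CPU {cpu}: ~{effective_mhz(NOMINAL_MHZ, ratio_pct):.0f} MHz")
```

For the saturated CPU 11 this gives roughly 3180 MHz, which is close to the 3.2 GHz maximum Turbo frequency listed earlier for the Xeon E5-2620 v3, if this capture comes from the same host. Values for nearly idle CPUs cover very little execution time and should be read with caution.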

Page 24

Balanced or High Performance?

• Always set BIOS to ‘OS Controlled’

– Then the policy change is dynamic

• Balanced (default) allows for Turbo opportunities

– Great for populations of small virtual machines

– Some performance variability is okay

• High Performance caps Turbo opportunities

– Best for populations that have Large VMs (greater than 8 vCPU)

– Required for Latency Sensitive workloads


“It Depends”

Page 25

Agenda

1 The Baseline

2 vNUMA

3 Keeping Things Up To Date

4 Power Management

5 Hyper-threading

Page 26

Troubleshooting Scenario

• I ran my workload yesterday and it ran in 565 seconds; today it’s 1096 seconds. What happened?

• Analyze the problem

• Use tools to identify issue

• Lesson learned


Interfering VMs

Page 27

Problem definition

• The workload VM:

– 36 vCPUs

– Running the bzip2 test of SPECcpu2006

• Simple, no I/O

• But easy to see

• The Server:

– Broadwell-EP E5-2697 v4 @ 2.30GHz

– Turbo boost up to 2.8GHz for a 1.22X improvement

– 36 cores/72 HyperThreads

• Tools:

– Standard Linux tools inside the guest

– esxtop on the hypervisor

– Hardware event counters using PMC inside the guest

• Enable virtual Performance Monitoring Counters (vPMC), KB 2030221

• “perf stat” command to collect

Page 28

Troubleshooting steps

• Fast case: 565 seconds

• mpstat(1) on the guest

– 100% CPU utilization

• All other guest tools report the same stats


• Slow case: 1096 seconds

• mpstat(1) on the guest

– 100% CPU utilization

• All other guest tools report the same stats

• Questions:

– Why did our performance drop by half?

– Can we reconcile guest stats with ESX stats?

Page 29

Troubleshooting with esxtop


• In the fast case, we get the full utilization on all 36 cores

– The VM’s %RUN is nearly 3600%

– Turbo boost of 22% (PCPU USED% to PCPU UTIL% ratio) matches the hardware counters

• In the slow case, we have a second VM running!

– Our %RUN is about half of available cycles, matching the hardware counters

– Our %READY and %COSTOP add up to about half the available cycles

• That’s why the guest tools were fooled

• We need more CPUs!!!

– HyperThreading to the rescue!!

Two VMs active

One VM active
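The esxtop observations above reduce to a few simple ratios. A minimal sketch with hypothetical function names: the 121% and 99% inputs are the PCPU USED% and PCPU UTIL% values reported for this run later in the deck, and the run times are the 565 and 1096 seconds from the scenario. The %RUN, %READY and %COSTOP split for the slow case is an illustrative assumption that only mirrors "about half".

```python
def turbo_ratio(pcpu_used_pct, pcpu_util_pct):
    """PCPU USED% divided by PCPU UTIL% approximates the Turbo boost."""
    return pcpu_used_pct / pcpu_util_pct


def vcpu_time_shares(run_pct, ready_pct, costop_pct, n_vcpus):
    """Express a VM's esxtop percentages as shares of its vCPU capacity."""
    capacity = 100.0 * n_vcpus  # esxtop sums percentages across vCPUs
    return {
        "run": run_pct / capacity,
        "ready": ready_pct / capacity,
        "costop": costop_pct / capacity,
    }


boost = turbo_ratio(121.0, 99.0)
slowdown = 1096 / 565

# Assumed split for the slow case: half running, the rest waiting.
slow_case = vcpu_time_shares(run_pct=1800.0, ready_pct=1200.0,
                             costop_pct=600.0, n_vcpus=36)

print(f"Turbo boost: {boost:.2f}x")
print(f"Slowdown: {slowdown:.2f}x")
print(f"Slow case shares: {slow_case}")
```

The guest still reports 100% utilization in both cases because time spent in ready or co-stop is invisible to it; only the hypervisor view shows that the vCPUs were running about half the time.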

Page 30

Intel® Hyper-Threading Technology

That doubles my CPUs, right? Not!

• Increases instruction level parallelism

• Core logic is replicated, partitioned, or shared between the two threads

• ~ 5% additional die size / cost

• ~ 25% more “performance”

• Most of the benefit comes from one HyperThread using the core while the other one is waiting for memory load

• Recent processors replicate some core functionality on each HyperThread

Page 31

No problem: Enable HyperThreading!


• First, compare the one-VM case with and without HyperThreading

– Twice as many “physical” processors

– With HT, we have a new field: CORE UTIL%

• esxtop fields with HT enabled

– PCPU UTIL% versus CORE UTIL% with one HyperThread in use

• One HyperThread saturated keeps the core at 100%

• PCPU USED% reflects benefits of Turbo boost

HT Enabled

HT Disabled

Page 32

What is PCPU USED %?

• A measure of how much a given HyperThread uses the core

– A metric calculated by ESX

– When it’s the only HyperThread active, it gets 100% credit

• Its twin gets 0% credit since it’s idle

– During any periods that both HyperThreads are active, they each get 50% credit


• The average of PCPU USED credit of the two HyperThreads cannot go over 50%

• PCPU USED % is then adjusted up or down for Turbo and frequency scaling

– So with Turbo boost, the average PCPU USED % of a core can go to, say, 50% × 2.8GHz/2.3GHz ≈ 61%
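The crediting rule above can be written out as a short calculation. This is a sketch of the rule as described on this slide, not ESX source code; the function name and the time-fraction inputs are assumptions made for illustration.

```python
def pcpu_used_credit(alone_a, alone_b, both, freq_ratio=1.0):
    """PCPU USED % for the two HyperThreads (A, B) of one core.

    alone_a, alone_b: fraction of the interval only that thread is busy
    both: fraction of the interval both threads are busy
    freq_ratio: actual / nominal frequency (Turbo or frequency scaling)
    """
    if min(alone_a, alone_b, both) < 0 or alone_a + alone_b + both > 1.0:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    # Full credit when running alone, half credit when sharing the core.
    used_a = (alone_a + 0.5 * both) * freq_ratio * 100.0
    used_b = (alone_b + 0.5 * both) * freq_ratio * 100.0
    return used_a, used_b


print(pcpu_used_credit(1.0, 0.0, 0.0))              # one thread busy
print(pcpu_used_credit(0.0, 0.0, 1.0))              # both saturated
print(pcpu_used_credit(0.0, 0.0, 1.0, 2.8 / 2.3))   # both saturated, Turbo
```

With both threads saturated and no frequency adjustment, each thread is credited 50%; scaling by 2.8GHz/2.3GHz raises that to about 61%. A mix such as 50% of the time shared plus 30% of the time for one thread alone yields 25% and 55%, one combination that reproduces the "both busy" example.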

[Figure: three bar charts of execution time on the CPU (0 to 100) for Hyperthread 1 and Hyperthread 2. Panel 1: one Hyperthread busy, 100% PCPU USED. Panel 2: both Hyperthreads saturated, 50% PCPU USED each. Panel 3: both Hyperthreads busy, 25% PCPU USED and 55% PCPU USED.]

Page 33

HyperThreading with two VMs


HT Enabled

HT Disabled

• On this fully saturated Broadwell with HyperThreading enabled

– PCPU USED% is 60%

• Discounted for HT; boosted for Turbo

• Our performance boost is only 1.16X despite

– CPU %RUN per VM has gone from ~1800% to ~3600%

– %READY and %COSTOP have largely disappeared

– The expected boost is 10-40%

Page 34

Going a layer deeper

• Can we use other tools to diagnose why the HyperThreading boost is on the low side?

– Use Performance Monitoring Counters (PMC) built into the processor hardware

• PMCs can be virtualized on vSphere

– With vPMC, each VM only sees the event counts while it was running

• Analysis common for hard-core performance engineers

Page 35

Enabling virtual Performance Monitoring Counters (vPMC) in a VM

Page 36

Collecting hardware event counts in Linux


# perf stat -a -e cycles -e ref-cycles -e instructions -e cache-misses -e cache-references sleep 10

Performance counter stats for 'system wide':

1,009,203,198,874 cycles

828,986,892,878 ref-cycles

1,456,882,482,627 instructions # 1.44 insn per cycle

984,566,743 cache-misses # 6.782 % of all cache refs

14,516,610,550 cache-references

10.015823286 seconds time elapsed

Let’s check the math for this workload that fully saturated all vCPUs:

36 vCPUs × 2.3 GHz × 10 seconds = 828,000,000,000 reference cycles
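The same check, plus the ratios perf prints next to the counts, can be reproduced from the raw numbers above. A minimal sketch: the variable names are illustrative, and the 36 vCPUs and 2.3 GHz nominal frequency come from the problem definition.

```python
# Counts copied from the perf stat output above.
CYCLES = 1_009_203_198_874
REF_CYCLES = 828_986_892_878
INSTRUCTIONS = 1_456_882_482_627
CACHE_MISSES = 984_566_743
CACHE_REFERENCES = 14_516_610_550

VCPUS = 36
NOMINAL_HZ = 2_300_000_000
INTERVAL_S = 10

# Reference cycles tick at the nominal frequency on every busy vCPU.
expected_ref_cycles = VCPUS * NOMINAL_HZ * INTERVAL_S
ref_cycle_ratio = REF_CYCLES / expected_ref_cycles

# Actual cycles over reference cycles approximates the Turbo ratio.
turbo_ratio = CYCLES / REF_CYCLES

ipc = INSTRUCTIONS / CYCLES
miss_pct = 100.0 * CACHE_MISSES / CACHE_REFERENCES

print(f"expected ref-cycles: {expected_ref_cycles:,}")
print(f"measured / expected: {ref_cycle_ratio:.4f}")
print(f"cycles / ref-cycles: {turbo_ratio:.3f}")
print(f"instructions per cycle: {ipc:.2f}")
print(f"cache miss rate: {miss_pct:.2f}%")
```

The measured reference cycles land within about 0.2% of the expected value, and the cycles to ref-cycles ratio of roughly 1.22 lines up with the 2.8GHz/2.3GHz Turbo ratio quoted for this server.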

Page 37

Processor PMC stats with HyperThreading, one or two VMs

• With HyperThreading Disabled, 1 VM and 2 VMs have nearly identical profiles

• With HyperThreading Enabled and 2 VMs, we get twice the cycles in VMs, but not the efficiency

• The L3 cache and Resource Stall stats tell us the VMs are interfering with each other

HT | VMs | Overall throughput of all VMs | PCPU UTIL % | PCPU USED % | CORE UTIL % | Cycles/second per VM | Instr/second per VM | IPC | Total L3 cache accesses | Total L3 cache misses | Total Resource Stalls
off | 1 | 1.0 | 99% | 121% | - | 2.7G | 4.11G | 1.52 | 23.4G | 2.95G | 442G
off | 2 | 1.03 | 99% | 121% | - | 1.4G | 2.15G | 1.53 | 20.9G | 3.01G | 478G
ON | 2 | 1.20 | 99% | 60% | 99% | 2.7G | 2.44G | 0.90 | 25.7G | 7.25G | 998G
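A few derived ratios make the interference easier to see. The sketch below only re-derives numbers from the table above (counts in units of 1e9); the dictionary layout and names are assumptions for illustration.

```python
# Values copied from the table above; counts are in units of 1e9 (G).
ROWS = {
    "HT off, 1 VM": {"throughput": 1.0, "cycles": 2.7, "instr": 4.11,
                     "l3_access": 23.4, "l3_miss": 2.95, "stalls": 442},
    "HT off, 2 VMs": {"throughput": 1.03, "cycles": 1.4, "instr": 2.15,
                      "l3_access": 20.9, "l3_miss": 3.01, "stalls": 478},
    "HT on, 2 VMs": {"throughput": 1.20, "cycles": 2.7, "instr": 2.44,
                     "l3_access": 25.7, "l3_miss": 7.25, "stalls": 998},
}


def derived(row):
    """Instructions per cycle and L3 miss rate for one table row."""
    return {
        "ipc": row["instr"] / row["cycles"],
        "l3_miss_pct": 100.0 * row["l3_miss"] / row["l3_access"],
    }


summary = {name: derived(row) for name, row in ROWS.items()}
ht_gain = (ROWS["HT on, 2 VMs"]["throughput"]
           / ROWS["HT off, 2 VMs"]["throughput"])
stall_growth = ROWS["HT on, 2 VMs"]["stalls"] / ROWS["HT off, 2 VMs"]["stalls"]

for name, stats in summary.items():
    print(f"{name}: IPC {stats['ipc']:.2f}, "
          f"L3 miss rate {stats['l3_miss_pct']:.1f}%")
print(f"HT throughput gain with 2 VMs: {ht_gain:.3f}x")
print(f"Resource stall growth with HT: {stall_growth:.2f}x")
```

Because the table inputs are rounded, recomputed values can differ from the slide in the last digit. The pattern is still clear: with HyperThreading on, each VM gets about twice the cycles, but the L3 miss rate and resource stalls both roughly double and IPC falls to about 0.9.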

Page 38

Yes, that was complex!

• VMs interfere with each other

– I/O latency

– Cache hit rate

– Memory latency

– Core resources

• HyperThreading can give a boost

– Often 1.2-1.3X boost

• Have to use a variety of tools for a detailed analysis

– Looking just at mpstat or even CORE UTIL% and PCPU UTIL% would have been misleading

• Do not oversubscribe above available resources!

Page 39

What we did NOT say

• We did not say don’t overcommit

– Just that your VMs cannot use more resources than the hardware can offer

• We did not say don’t believe Linux tools

– Just that when there is a discrepancy between mpstat and what the app claims, use esxtop to investigate

• We did not say use hardware event counters as a first resort

– It’s a microscope

– PMC gives you insight into the processor, e.g. when VMs are interfering with each other

• We said: performance troubleshooting can be complex. Use the tools available at different layers of the vertical stack to get a full picture

Page 40

Conclusion

• vSphere OOTB Performance is Excellent

• Tuning Required for Specific Workloads or Corner Cases

• Performance Requires Understanding HW to App

• Leverage Existing Best Practice Documentation for Support

• Performance is an Onion, Peel Back the Layers

• Links:

• https://blogs.vmware.com/performance/

• https://communities.vmware.com/community/vmtn/performance

Page 41

Extreme Performance Series – Las Vegas


• SER2724BU Performance Best Practices

• SER2723BU Benchmarking 101

• SER2343BU vSphere Compute & Memory Schedulers

• SER1504BU vCenter Performance Deep Dive

• SER2734BU Byte Addressable Non-Volatile Memory in vSphere

• SER2849BU Predictive DRS – Performance & Best Practices

• SER1494BU Encrypted vMotion Architecture, Performance, & Futures

• STO1515BU vSAN Performance Troubleshooting

• VIRT1445BU Fast Virtualized Hadoop and Spark on All-Flash Disks

• VIRT1397BU Optimize & Increase Performance Using VMware NSX

• VIRT2550BU Reducing Latency in Enterprise Applications with VMware NSX

• VIRT1052BU Monster VM Database Performance

• VIRT1983BU Cycle Stealing from the VDI Estate for Financial Modeling

• VIRT1997BU Machine Learning and Deep Learning on VMware vSphere

• FUT2020BU Wringing Max Perf from vSphere for Extremely Demanding Workloads

• FUT2761BU Sharing High Performance Interconnects across Multiple VMs

Page 42

Extreme Performance Series – Barcelona


• SER2724BE Performance Best Practices

• SER2343BE vSphere Compute & Memory Schedulers

• SER1504BE vCenter Performance Deep Dive

• SER2849BE Predictive DRS – Performance & Best Practices

• VIRT1445BE Fast Virtualized Hadoop and Spark on All-Flash Disks

• VIRT1397BE Optimize & Increase Performance Using VMware NSX

• VIRT1052BE Monster VM Database Performance

• FUT2020BE Wringing Max Perf from vSphere for Extremely Demanding Workloads

Page 43

Extreme Performance Series – Hands-on Labs

• Don’t miss these popular Extreme Performance labs:

• HOL-1804-01-SDC: vSphere 6.5 Performance Diagnostics & Benchmarking

– Each module dives deep into vSphere performance best practices, diagnostics, and optimizations using various interfaces and benchmarking tools.

• HOL-1804-02-CHG: vSphere Challenge Lab

– Each module places you in a different fictional scenario to fix common vSphere operational and performance problems.

Page 44

Performance Survey

The VMware Performance Engineering team is always looking for feedback about your experience with the performance of our products, our various tools, interfaces and where we can improve.

Scan this QR code to access a short survey and provide us direct feedback.

Alternatively: www.vmware.com/go/perf

Thank you!
