Right-Sizing Your SQL Server Virtual Machine
TRANSCRIPT
“Right-Sizing” Your SQL Server VM
DBA-304
David Klee, Founder
Heraflux Technologies
I want more compute resources
on my SQL Server.
My VM admin wants to take some away.
What is the right answer?
Recurring Question
About David Klee
@kleegeek
davidklee.net
gplus.to/kleegeek
linked.com/a/davidaklee
Specialties / Focus Areas / Passions:
• Performance Tuning
• Virtualization
• Infrastructure
• Troubleshooting
• High Availability
• Disaster Recovery
• Capacity Management
• Health & Efficiency
PASS Summit 2014
• What is “right-sizing”, and why
• Profiling the system stack components
• CPU / memory / storage
• Analyzing environment
• Workload analysis
• Perfmon data review
Agenda
• Abstraction layer between hardware and OS
• Resources
• Queues
• Limits in the environment
• Resource limitations (hard)
• Queue contention (soft)
What Is Virtualization - for DBAs?
• VM resource allocations
• vCPU
• Memory
• Storage presentation
• One size does not fit all workloads
• Inappropriate resource allocations can hurt VM performance
“Right-Sizing”?
Hard Limits (Resources)
• Single compute node hardware
• Total cluster compute capacity
• Storage speed (IOPs, throughput)
• VM maximums
• Interconnect path speed
Soft Limits (Queues)
• Memory oversubscription
• CPU scheduler contention
• Shared resource utilization
• Variable resource utilization levels
• “Noisy Neighbors”
Hard vs. Soft Limits
Resources
[Diagram: a virtualization layer pools 150 GHz CPU, 4 TB memory, 4x10GbE network, 20 TB Tier 1 storage, and 40 TB Tier 2 storage. The pool is shared by one VM with 16 vCPU / 128 GB vRAM, one VM with 8 vCPU / 64 GB vRAM, and six VMs with 2 vCPU / 16 GB vRAM each.]
Queues
[Diagram: VM tasks pass through four hypervisor queues. CPU scheduling queue, then CPU scheduler, then CPU execution. Memory allocation queue, then memory allocator, then memory read / write. Disk scheduling queue, then disk scheduler, then disk read / write. Network scheduling queue, then network scheduler, then network transmit / receive.]
• Resource limits are easy to detect / work around
• Queue contention much harder
• Time in queue = time lost from VM
• Silent performance killer
• Everything in a VM must be scheduled
• … including idle resources
• Queue processing is not always FIFO
Goal: Minimize Queue Waits
Four vCPU Scheduling
[Diagram: a VM's SMP tasks on vCPU0 through vCPU3. Labels: vCPU scheduling queues by pCPU core; scheduling queue waits; high vCPU queue contention.]
Two vCPU Scheduling
[Diagram: a VM's SMP tasks on vCPU0 and vCPU1. Labels: vCPU scheduling queues by pCPU core; scheduling queue waits; high vCPU queue contention.]
• 24x7 performance metric collection
• CRITICAL
• Metrics from every piece of the system stack
Performance Metric Collection
[Diagram: the system stack. Layers: Physical Server, Virtualization, Operating System, SQL Server Instance, SQL Server DB, Application. Alongside: Networking, Interconnects, Storage.]
• SQL Server
• Raw CPU / mem / disk usage
• NUMA memory usage
• Signal waits
• Storage latency by DB file
• Wait statistics
• Glenn Berry @ bit.ly/1wdMB8n
Which High Level Metrics?
• Windows
• CPU & memory consumption
• Storage IOPs / latency / throughput
• Processes (SQL Server vs other)
• Perfmon how-to @ bit.ly/1sqSVns
• @ bit.ly/1xW4jzJ
Capture all metrics as granularly as possible!
• Virtualization
• Resource consumption by VM
• Resource utilization by host
• CPU scheduling queue wait
• Overcommitment metrics
• VMware vSphere: CPU Ready
• MS Hyper-V: CPU Wait Time per Dispatch
Which High Level Metrics?
• Storage
• IOPs / latency / throughput
• By LUN
• By disk group
• Controller
• Interconnect path utilization
• Controller cache hit metrics
Capture all metrics as granularly as possible!
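The CPU scheduling queue wait listed above for VMware vSphere (CPU Ready) is commonly charted as a summation in milliseconds per sampling interval, which is hard to compare across intervals. A minimal sketch of converting it to a percentage follows. It assumes the 20-second real-time chart interval and assumes the VM-level value is the sum across all vCPUs; the function name and sample numbers are illustrative, not from the session.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU Ready summation (ms) into percent of the interval, per vCPU.

    Assumptions: interval_s defaults to the 20 s real-time chart interval,
    and ready_ms is the VM-level total across all of its vCPUs.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0


# Hypothetical sample: 1,600 ms of ready time in a 20 s interval on a 4 vCPU VM.
print(round(cpu_ready_percent(1600, 20, 4), 2))  # 2.0
```

Expressed this way, the value reads directly as "time in queue = time lost from VM": 2% ready means each vCPU spent about 2% of the interval waiting to be scheduled.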
• Overlay all data streams
• Understand / classify:
• Workload periods
• Workload sources
• Business time period
• Goal: metrics by time period
How to Analyze
• Median & Percentile analysis
• Explain & filter statistical anomalies
• Statistics
• Min / Average / Max / Median
• Percentile
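The statistics listed above can be computed with the Python standard library. This is a minimal sketch, assuming the samples for one metric and one time period are already in a list; the function name, the choice of the 95th percentile, and the sample values are illustrative assumptions.

```python
import statistics


def summarize(samples):
    """Return min / average / max / median / 95th percentile for one metric."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank percentile: rank = ceil(0.95 * n), computed with integer math.
    rank = max(1, -(-95 * len(ordered) // 100))
    return {
        "min": ordered[0],
        "avg": statistics.mean(ordered),
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Hypothetical CPU utilization samples (%) containing one anomaly at 98.
cpu = [22, 25, 24, 30, 27, 98, 26, 23, 28, 25]
print(summarize(cpu))
```

With these samples the average is 32.8 while the median is 25.5, which shows why a single anomaly should be explained or filtered before sizing from an average.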
How to Size vCPUs
• vCPU counts matter!
• Size for what you need today
• Too many vCPUs = BAD (probably)
• Too few vCPUs = BAD (usually)
• Workload / server specific
vCPU Sizing
• Sizing is not just about vCPU count
• vNUMA configuration also matters
• Closely align with pNUMA
• Adds efficiency by aligning with underlying hardware
• The performance benefit grows with larger VMs
CPU Sizing - NUMA
• Get physical machine configuration
• Try to fit VM inside one NUMA node
• Otherwise, balance across number of NUMA nodes
• Test configurations for best results
CPU Sizing - vNUMA
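The fit-or-balance rule above can be sketched as a small helper that proposes a starting layout to test. This is only an illustration: the function name is hypothetical, the host shape (2 NUMA nodes with 12 cores each) is made up, and preferring the fewest nodes with an even split is an assumption rather than a rule from the session.

```python
def suggest_vnuma(vcpus: int, pnuma_nodes: int, cores_per_node: int):
    """Return (vsockets, cores_per_vsocket) as a starting layout to test."""
    if vcpus <= cores_per_node:
        return 1, vcpus  # the VM fits inside one physical NUMA node
    # Otherwise use the fewest nodes that hold the VM and split vCPUs evenly.
    for nodes in range(2, pnuma_nodes + 1):
        if vcpus % nodes == 0 and vcpus // nodes <= cores_per_node:
            return nodes, vcpus // nodes
    raise ValueError("no even split fits this host; resize the VM or test manually")


# Hypothetical host: 2 NUMA nodes with 12 cores each.
print(suggest_vnuma(8, 2, 12))   # (1, 8)
print(suggest_vnuma(16, 2, 12))  # (2, 8)
```

The result is a candidate to benchmark, not a final answer.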
• Example: 16 vCPU VM
• What’s better?
• 2 vSocket x 8 vCore?
• 4 vSocket x 4 vCore?
• 8 vSocket x 2 vCore?
• Varies by workload, hardware
• Test it for yourself!
CPU Sizing – vNUMA Results
[Chart: vNUMA SQL Server Scalability - 16 vCPUs - HammerDB. X axis: concurrent HammerDB users (8, 16, 64, 256). Y axis: transactions / min (0 to 900,000). Series: 4 socket x 4 CPU, 8 socket x 2 CPU, 2 socket x 8 CPU.]
• SQL Server
• CPU consumption by DB
• Top waits
• Signal waits
• Scrape parallelism from execution plan @ bit.ly/1rTs9UX
CPU - Metrics
• Windows
• CPU usage per core
• SQL Server vs. background
• VM Host
• CPU utilization over 80%
• VM CPU queue waits high
• Understand the workload parallelism, concurrent volume
• Determine averages, maximums, and percentiles
• Determine the appropriate profiling period
• < 40% utilization avg – too many CPUs
• > 60% utilization avg – too few CPUs
• Factor CPU waits inside SQL Server
• Vary according to your circumstances
How To Size vCPUs
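The 40% and 60% average utilization guides above can be written as a simple classifier. A minimal sketch, assuming the average has already been computed over an appropriate profiling period; the function name and return strings are illustrative, and the thresholds are parameters because they should vary with your circumstances.

```python
def vcpu_recommendation(avg_util_pct: float, low: float = 40.0, high: float = 60.0) -> str:
    """Classify average CPU utilization against the low / high guide values."""
    if not 0.0 <= avg_util_pct <= 100.0:
        raise ValueError("avg_util_pct must be between 0 and 100")
    if avg_util_pct < low:
        return "too many vCPUs: consider removing some"
    if avg_util_pct > high:
        return "too few vCPUs: consider adding some"
    return "within target band"


# Hypothetical averages for three VMs.
for avg in (25.0, 50.0, 72.0):
    print(avg, "->", vcpu_recommendation(avg))
```

This looks at utilization only. CPU waits inside SQL Server and CPU queue waits on the host still need to be weighed before changing the vCPU count.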
How To Size vRAM
• SQL Server data must be in buffer pool
• More memory ≈ less I/O
• Less I/O = less waiting on shared storage & queues
• NO HOST MEMORY OVERCOMMITMENT
• Too much memory = lower VM consolidation ratio
• Balancing act
Memory
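The "no host memory overcommitment" rule above reduces to a sum check per host. A minimal sketch using the VM sizes from the Resources slide; the host sizes and the `reserve_gb` allowance for the hypervisor are hypothetical values chosen for illustration.

```python
def memory_overcommitted(host_ram_gb, vm_vram_gb, reserve_gb=0):
    """True if assigned vRAM exceeds host RAM minus a reserve for the hypervisor."""
    return sum(vm_vram_gb) > host_ram_gb - reserve_gb


# VM sizes from the Resources slide: 128 + 64 + 6 x 16 = 288 GB of vRAM.
vms = [128, 64] + [16] * 6
print(memory_overcommitted(512, vms, reserve_gb=16))  # False
print(memory_overcommitted(256, vms, reserve_gb=16))  # True
```

The same sum shows the balancing act: every extra GB of vRAM given to one VM is a GB the host cannot offer to another.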
• SQL Server
• Page Life Expectancy
• Buffer Cache Hit Ratio
• High page fault count
• High recompile ratio
• RESOURCE_SEMAPHORE waits
• Memory grants pending
Memory - Metrics
• Windows
• MB free
• Paging
• VM Host
• Memory consumption > 90%
• Memory ballooning / dynamic memory expansion
• How much memory?
• Slow storage? More RAM!
• Fast storage? Less RAM?
• More RAM = less host-level consolidation
• More SQL Server licensing (possibly)
• Table / index compression
How To Size vRAM
Handling Storage
• Much less variable in nature
• Most shared resource
• Most critical
• Most complex
• Most problematic
• Slowest piece of the stack
• Random I/O disk patterns
• Many individual points of contention
Storage
Storage – Contention Points
[Diagram: VMs, controllers, LUNs, and the disk pool shown as points of contention.]
• Test raw performance
• SQLIO Batch @ bit.ly/1mEAS9W
• DiskSpd @ bit.ly/1CeQauw
• Collect metrics:
• I/Os per second (IOPs)
• Latency (ms)
• Throughput (MB/s)
Storage - Maximums
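The three metrics above are linked: throughput is IOPs multiplied by I/O size, and by Little's Law the average number of outstanding I/Os is IOPs multiplied by latency. A small sketch of both relations follows. The function names and sample numbers are illustrative assumptions; 8 KB and 64 KB are used because they are the SQL Server page and extent sizes.

```python
def throughput_mb_s(iops: float, block_kb: float) -> float:
    """Throughput in MB/s for a given IOPs rate and I/O size in KB."""
    return iops * block_kb / 1024.0


def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Average I/Os in flight (Little's Law: rate x time in system)."""
    return iops * latency_ms / 1000.0


# Hypothetical test result: 20,000 IOPs at 2 ms average latency.
print(throughput_mb_s(20000, 8))   # 156.25
print(throughput_mb_s(20000, 64))  # 1250.0
print(outstanding_ios(20000, 2))   # 40.0
```

The same IOPs figure therefore means very different throughput at different I/O sizes, which is why all three metrics are worth recording for every test.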
[Chart: IOps Per Operations per Thread. X axis: thread intensity (1, 2, 4, 8, 16, 32, 64, 128). Y axis: IOps (0 to 70,000). Series: Sequential Read, Random Read, Sequential Write, Random Write.]
Storage – Spread Out Workload
[Diagram: a database's filegroups (FG1, FG2) and data files (DF1 to DF4) spread across Windows drives E:, F:, and G:. I/O passes through the server OS, virtualization, hardware HBAs 1 to 4, the interconnect switch, and SAN controllers 1 and 2 to LUN 1 and LUN 2 in the SAN disk group.]
Steady-State Review
• Determine your runtime stats & percentiles
• Determine load thresholds
• Review estimated requirements
• Change VM configuration
• Incremental changes, not huge ones
• Test and retest
Collected Metrics - Now What?
Metric Review Sample
• Workloads & applications change
• DBs are added / removed
• Perform a right-sizing analysis as necessary
• Adjust the VM resources accordingly
• Recommended: Periodic review of VM sizing
• Quarterly for volatile environments
Not a One-Time Process
• One VM size does not fit all workloads
• Profile and record your workload performance characteristics
• Analyze the numbers
• Adjust VM configuration and validate
• Repeat as often as your workload changes
Summary
This is a pain!
Shouldn’t this be easier?
But Wait!
Introducing…
• Automate the estimation of a VM’s “right-sized” resource assignment
• FREE!
• Beta to be available soon!
SQL Server VM “Right Size” Estimator
Scripts Available: www.davidklee.net
Contact Me! davidklee.net @kleegeek
Questions?
Thanks for attending!