sql performance 2011/12 joe chang, solidq [email protected]
TRANSCRIPT
SQL Performance 2011/12
Joe Chang, SolidQwww.qdpma.com http://sqlblog.com/blogs/joe_chang/[email protected]
2011
• Hardware is Powerful & Cheap– CPU (cores), memory, and now IO too!• Quad-core since 2006, 1GHz since 2000
• Is Performance still a concern?– Yes, along with fundamentals
• Modern Performance Strategy– Can handle minor inefficiencies– Identify and circumvent the really bad things
Modern Hardware
• 4-12 cores per processor socket• 16GB DIMM at less than $1K• SSD– Enterprise grade SSD still moderately expensive• Both SLC and MLC
– Consumer SSD – really cheap, $3-4K per TB• Uneven performance characteristics over time
• Desired IO performance: 1-2GB/s, 20-50K IOPS
Hardware Baseline 2011
Entry• 1 Xeon E3 quad-core• 16GB memory
– 4 x 4GB unbuffered ECC
• SSD options:– 2-4 SATA SSDs– 1-2 PCI-E SSDs
Mid-range • 2 Xeon 5600 6-core• 48 – 192GB
– 6 x 8GB to 12 x 16GB
• SSD options– 16+ SATA SSDs– 4-5 PCI-E SSDs
Performance Fundamentals
• Network round-trips– Owner qualified, case correct
• Log write latencies– Sufficiently low to support transaction volume• Not necessarily separate data and log disks
• Normalization – correct data trumps all!• Indexes – a few good ones, and not too many!• SQL – that the optimizer
What can go wrong?
• With immense hardware resources• And a great database engine– What can go wrong?
• Following a fixed set of rules and procedures– Basic transactions processing should work well– If your process does something unanticipated• Some things can go horribly wrong
Performance Concepts
• Query Optimizer• Execution plan operators– Formula for component operation
• Data distribution statistics – to estimate rows & pages, automatically updated– Rules when estimate not possible
• Stored procedure compile rules– Parameters and variables
Stored Procedure
Basics
Parameters and Variables
• On compile– Parameter values used to for row estimate– Variables – assume unknown value
• Consider effect of skewed distribution
Parameter & Variable
6 rows for value 1
4 rows for value unknown
Consider impact for skewed data distributions
Stored Procedure Compile Options
• WITH RECOMPILE• OPTIMIZE FOR• Plan Guide
• Temp table– KEEP PLAN, KEEPFIXED PLAN
Compile & Execute Time
• Plan reuse desired when– Compile cost is high relative to execute cost
• Recompile desired when– Execute cost is high relative to compile cost
Statistics
Basics
Statistics
• No statistics – table variables• Temp table – statistics auto recompute– 6 row modified, 500 rows, every 20% thereafter
• Statistics sampling– Random page, how to handle skewed distribution?
• Upper and lower bounds– Problems caused by incrementing columns
• Propagation errors
Statistics Recompute
• Scenario: start with accurate statistics• Update column with new values– That did not previously exist
• If fewer than 20% of rows updated– Auto-recompute is not triggered
Sampling
• Default sampling percentage is usually good• Caution: not a random row sample!– Random sampling of page• From nonclustered index if available
• If there is correlation between pages & values– Then serious over estimation possible
Out of range
• Statistics sampling tries to identify lower and upper bound
Bad Execution Plan Examples
Not comprehensive
Scenarios
• Or condition• Multiple optional search arguments• Skewed distributions– 1 business logic for Small and large data sets– Reports for 1 day, 1 week, 1 month, 1 year
• Statistics related problems– Resulting in horrible execution plan
Not comprehensive
When the Query Optimizer
Does not understand you
Simple OR Conditions
Simple Index Seek
OR Condition in Join
Alternative – Union
UNION and UNION ALL
• UNION– Only distinct rows– Sort to eliminate duplicates• Can be expensive for high row counts
• UNION ALL– All rows– No sort to eliminate duplicates
Multiple Optional SARGs
This was suppose to work, but does not
Parameterized SQL
Skew and Range Variation
Statistics Out-of Range
Statistics Out of Range (cont.)
Table Variable
No Statisticsassumes 1 row, 1page Why? No recompiles
Loop Join – Scan Inner Source
Really Bad News
Estimate 1 row
The Correct Plan
Hash Join forcedwith hint
Estimate 1 row
Loop Join vs. Hash Join
Alternative Plan with Index
Temp Table versus CTE
• Consider options– SELECT xxx INTO #Temp– FROM SqlMain Expression
– WITH tmp AS (SELECT xxx FROM Sql)Main Expression