Modern Performance - SQL Server, Joe Chang, jchang6 @ yahoo
TRANSCRIPT
About Joe
• SQL Server consultant since 1999
• Query Optimizer execution plan cost formulas (2002)
• True cost structure of SQL plan operations (2003?)
• Database with distribution statistics only, no data, 2004
• Decoding statblob/stats_stream
– writing your own statistics
• Disk IO cost structure
• Tools for system monitoring, execution plan analysis
See http://www.qdpma.com/
Download: http://www.qdpma.com/ExecStatsZip.html
Blog: http://sqlblog.com/blogs/joe_chang/default.aspx
Overview
• General SQL Server Performance
• Why is performance still important today?
– Brute force?
• Yes, but …
• Special Topics – spectacular fails
• Automating data collections
• SQL Server Engine
– What do developers/DBAs need to know?
Not in this session
• List of rules to be followed blindly
• without consideration for the underlying reason
• and whether the rule actually applies in the current circumstance
DBA skill: cause and effect analysis & assessment
Common Themes?
• Execution plan
– Very large (multiple orders of magnitude) error in row estimate
• Single execute of a large operation
– Might still be tolerable
• Multiple executes of large operations
select a.Header, a.CUSIP, a.SecNo, a.Security, a.Symbol
 ,a.Split_rep, a.Sales_Person_Name
 ,cast(sum(a.January) as float) as January
 ,cast(sum(a.February) as float) as February
 ,cast(sum(a.March) as float) as March
 ,cast(sum(a.April) as float) as April
 ,cast(sum(a.May) as float) as May
 ,cast(sum(a.June) as float) as June
 ,cast(sum(a.July) as float) as July
 ,cast(sum(a.August) as float) as August
 ,cast(sum(a.September) as float) as September
 ,cast(sum(a.October) as float) as October
 ,cast(sum(a.November) as float) as November
 ,cast(sum(a.December) as float) as December
 ,cast(sum(a.Total) as float) as Total
from (
 select cast(hdr.Header as varchar(100)) as Header
 ,cast(AcctSec.CUSIP as varchar(100)) as CUSIP
 ,cast(AcctSec.Sec_No as varchar(100)) as SecNo
 ,cast(AcctSec.Sec_Desc1 as varchar(100)) as Security
 ,cast(AcctSec.Symbol as varchar(100)) as Symbol
 ,case when RefMonth.[MonthName] = 'January' then fct.Comm else 0 end as January
 ,case when RefMonth.[MonthName] = 'February' then fct.Comm else 0 end as February
 ,case when RefMonth.[MonthName] = 'March' then fct.Comm else 0 end as March
 ,case when RefMonth.[MonthName] = 'April' then fct.Comm else 0 end as April
 ,case when RefMonth.[MonthName] = 'May' then fct.Comm else 0 end as May
 ,case when RefMonth.[MonthName] = 'June' then fct.Comm else 0 end as June
 ,case when RefMonth.[MonthName] = 'July' then fct.Comm else 0 end as July
 ,case when RefMonth.[MonthName] = 'August' then fct.Comm else 0 end as August
 ,case when RefMonth.[MonthName] = 'September' then fct.Comm else 0 end as September
 ,case when RefMonth.[MonthName] = 'October' then fct.Comm else 0 end as October
 ,case when RefMonth.[MonthName] = 'November' then fct.Comm else 0 end as November
 ,case when RefMonth.[MonthName] = 'December' then fct.Comm else 0 end as December
 ,fct.Comm as Total
 ,AcctEmp.split_rep
 ,AcctEmp.Sales_Person_Name
 from PayoutSystemDW.[dbo].[PS_FactAccountSummary] fct
 join PayoutSystemDW.dbo.PS_DimensionRptBus RptBus on fct.DimRptBusID = RptBus.DimRptBusID
 join PayoutSystemDW.dbo.PS_DimensionHeader hdr on fct.DimHeaderID = hdr.DimHeaderID
 join PayoutSystemDW.dbo.PS_DimensionCurrency cur on fct.DimCurID = cur.DimCurID and cur.DimCurID = 1
 join PayoutSystemDW.dbo.PS_DimensionAcctEmp AcctEmp on fct.DimAcctEmpID = acctemp.DimAcctEmpID and AcctEmp.Empno = 8125 and AcctEmp.Split_rep in ('PB54')
 join PayoutSystemDW.dbo.PS_DimensionAcctSec AcctSec on fct.DimAcctSecID = AcctSec.DimAcctSecID
 join PayoutSystemDW.dbo.PS_DimensionRefBuySell bs on fct.DimRefBuySellID = bs.DimRefBuySellID
 join PayoutSystemDW.[dbo].[PS_DimensionAcctOrg] AcctOrg on fct.DimAcctOrgID = AcctOrg.DimAcctOrgID and AcctOrg.OrgCode in ('38C')
 join PayoutSystemDW.[dbo].[PS_DimensionAcctClt] as AcctClt on AcctClt.DimAcctCltID = AcctClt.DimAcctCltID and AcctClt.ClientName = 'BRACY DENNIS M'
 join PayoutSystemDW.dbo.PS_DimensionTradeInd ti on ti.DimTradeIndID = fct.DimTradeIndID and ti.[Trade_Ind_Year] = 2014
 join PayoutSystemDW.dbo.PS_DimensionRefMonth RefMonth on RefMonth.MonthID = ti.Trade_Ind_Month
 where RptBus.ReportID = 1
) a
group by a.Header, a.CUSIP, a.SecNo, a.Security, a.Symbol, a.Split_rep, a.Sales_Person_Name
select fct.Comm as Total, …
from FactAccountSummary fct
join DimensionRptBus RptBus on fct.DimRptBusID = RptBus.DimRptBusID
join DimensionCurrency cur on fct.DimCurID = cur.DimCurID
join DimensionRefBuySell bs on fct.DimRefBuySellID = bs.DimRefBuySellID
join DimensionAcctOrg AcctOrg on fct.DimAcctOrgID = AcctOrg.DimAcctOrgID
join DimensionAcctClt as AcctClt on AcctClt.DimAcctCltID = AcctClt.DimAcctCltID
CPU & Memory 2001 versus 2014
2014: Xeon E7 v2 (Ivy Bridge), 15 cores, 3 QPI
4 x 15 = 60 cores
3TB (96 x 32GB), 24 DIMMs per socket
40 PCI-E gen3 lanes + x4 gen2 per socket

2001: 4 sockets, 4 cores
Pentium III Xeon, 900MHz, 4-8GB memory?
Xeon MP 2002-4
[Diagram: 2001 system, four processors (P) with L2 cache on a shared FSB to the MCH]

Each core today is more than 10x over Pentium III (700MHz?)

Mem    2013    2014
16GB   $191    $180
32GB   $794    $650
64GB           $4510
[Diagram: 4-socket Xeon E7 v2 system. Each socket has 15 cores (C0-C14), LLC, memory interfaces (MI), QPI links and PCI-E lanes; DMI 2 links at the bottom. Inset: single-socket desktop part with MC, GFX, PCI-E x4 lanes and PCH over DMI]
CPU & Memory 2001 versus 2012
Xeon E5 (Sandy Bridge), 8 cores, 2 QPI
4 x 8 = 32 cores total
Westmere-EX: 1TB (64 x 16GB) (3 QPI)
Sandy Bridge E5: 768GB (48 x 16GB) (2 QPI)
[Diagram: 4-socket Xeon E5 (Sandy Bridge) system. Each socket has 8 cores (C0-C7), LLC, memory interface (MI), QPI links and PCI-E lanes, with DMI 2 links; shown next to the 2001 FSB/MCH system with the same 2001 notes and memory prices as the 2014 comparison]
Microprocessor Pipeline
Pipeline stages: Branch Predict (BP), Instruction Fetch (IF), Decode (ID), Register Allocate & Rename (RAT), Re-Ordering Buffer (ROB), Schedule (Sch), Execute (Exec), Flags, Retire
1st: BP IF ID RAT ROB Sch Exec Flags Retire
2nd: BP IF ID RAT ROB Sch Exec Flags Retire
3GHz: 0.33ns clock
5 ns from start to finish: 200MHz
Microprocessor (core) is a (multi-lane) assembly line
Each core is superscalar
Processor (socket) has multiple cores
System has multiple sockets
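The clock figures on this slide can be tied together with a little arithmetic. The sketch below is one reading of them, not something the slide spells out: it assumes the 5 ns is the time one instruction spends in the pipeline, and that 200MHz is the rate you would get if only one instruction were in flight at a time.

```python
def clock_period_ns(freq_ghz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1.0 / freq_ghz


def cycles_in_flight(latency_ns: float, freq_ghz: float) -> float:
    """Clock cycles a single instruction spends between start and retire."""
    return latency_ns * freq_ghz


period = clock_period_ns(3.0)        # roughly 0.33 ns at 3 GHz
depth = cycles_in_flight(5.0, 3.0)   # cycles spent in the assembly line
unpipelined_mhz = 1000.0 / 5.0       # one instruction per 5 ns (assumed reading)

print(f"{period:.2f} ns/cycle, {depth:.0f} cycles in flight, "
      f"{unpipelined_mhz:.0f} MHz without pipelining")
```

The gap between the two rates is the point of the assembly-line picture: throughput comes from keeping many instructions in flight, which only works when branches are predicted and data arrives on time.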
CPU Access Times
[Diagram: core with two logical processors (Logical 0, Logical 1), L1 I and L1 D caches, unified L2, L3 slice, DRAM]
Core – 3.33GHz, 1 CPU cycle = 0.3ns
L1 cache – 4 CPU cycles (1ns)
L2 cache – 12 CPU cycles (4ns?)
L3 cache – 29+ cycles
Local node memory – 28 cycles + 49 ns (open page), 28 cycles + 56 ns (random page)
Remote node (1-hop) memory – 28 cycles + 100ns
2-hop – 150-300ns+?
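To make the orders of magnitude concrete, the sketch below converts the slide's figures into nanoseconds and into "core cycles lost per access". It assumes the 3.33GHz clock from the slide and treats "28 + 100ns" as 28 cycles plus 100 ns; the 2-hop case is left out because the slide only gives a rough range.

```python
CYCLE_NS = 1.0 / 3.33  # 3.33 GHz core, about 0.3 ns per cycle


def total_ns(cycles: int, extra_ns: float = 0.0) -> float:
    """Latency in ns for an access costing `cycles` core cycles plus a fixed delay."""
    return cycles * CYCLE_NS + extra_ns


LATENCIES = {
    "L1 cache": total_ns(4),
    "L2 cache": total_ns(12),
    "L3 cache": total_ns(29),
    "local memory (open page)": total_ns(28, 49),
    "local memory (random page)": total_ns(28, 56),
    "remote memory (1 hop)": total_ns(28, 100),
}


def cycles_lost(name: str) -> int:
    """Core cycles that pass while waiting for one access of this kind."""
    return round(LATENCIES[name] / CYCLE_NS)


for name, ns in LATENCIES.items():
    print(f"{name:28s} {ns:6.1f} ns  ~{cycles_lost(name):4d} cycles")
```

A single access to remote memory costs a few hundred cycles, which is why code that chases pointers across NUMA nodes can leave a fast core mostly idle.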
Latency Orders of Magnitude
[Diagram: processor with core, L1 caches, LLC, memory controller (MC), graphics (GFX), PCI-E x4 lanes and PCH over DMI, annotated with the same latencies as listed under CPU Access Times]
Westmere-EX 8-Socket System
[Diagram: 8-socket Westmere-EX system. Each socket has 10 cores (C0-C9), LLC, two memory controllers (MC) and four QPI links; four IOHs (IOH 0 to IOH 3) on QPI with PCI-E x8 and x4 slots, and ESI to the PCH]
Large server systems are very complicated
Software developed without consideration for system architecture will likely have severe problems
This applies to the OS, SQL Server and the application
Storage 2001 versus 2012/13
[Diagram: 2012/13 system with QPI-linked processors, 192 GB memory, five PCIe x8 slots and one PCIe x4 slot; RAID controllers attached to HDD and SSD arrays, plus IB and 10GbE adapters]
2001: 100 x 10K HDD, 125 IOPS each = 12.5K IOPS
IO bandwidth limited: 1.3GB/s (1/3 memory bandwidth)
2013: 64 SSDs, >10K IOPS each, 1M IOPS total possible
10-20GB/s+ IO bandwidth easy
6.4GB/s on each PCIe gen3 x8
SAN vendors – questionable BW
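The storage numbers above are simple multiplication, sketched here so the assumptions are visible. The helper names are illustrative only; the device counts, per-device IOPS and the 6.4GB/s per PCIe gen3 x8 slot are taken from the slide.

```python
import math


def total_iops(devices: int, iops_each: int) -> int:
    """Aggregate random IOPS if every device is kept busy."""
    return devices * iops_each


def iops_each_needed(target_iops: int, devices: int) -> float:
    """Per-device IOPS required to reach an aggregate target."""
    return target_iops / devices


def slots_for_bandwidth(target_gb_s: float, per_slot_gb_s: float = 6.4) -> int:
    """PCIe gen3 x8 slots needed for a sequential bandwidth target."""
    return math.ceil(target_gb_s / per_slot_gb_s)


hdd_2001 = total_iops(100, 125)                   # 12,500 IOPS
ssd_2013_floor = total_iops(64, 10_000)           # 640,000 IOPS at exactly 10K each
per_ssd_for_1m = iops_each_needed(1_000_000, 64)  # 15,625 IOPS per SSD

print(hdd_2001, ssd_2013_floor, per_ssd_for_1m, slots_for_bandwidth(20))
```

Reaching 1M IOPS from 64 SSDs needs about 15.6K IOPS per device, comfortably within ">10K each", and 20GB/s needs four x8 slots. A SAN front end on a few 8 Gb FC ports cannot deliver that bandwidth, which is presumably the point of the last line of the slide.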
[Diagram: 2001 system, MCH with four PCI slots, four RAID controllers and HDD arrays]
http://www.qdpma.com/Storage/Storage2013.html
http://www.qdpma.com/ppt/Storage_2013.pptx
SAN
[Diagram: SAN with auto-tier pools (SSD, 10K, 7.2K, hot spares). Two storage processors (SP A, SP B, 24 GB each) connect to the disk enclosures over x4 SAS (2GB/s); hosts (Node 1 and Node 2, 1024 GB each) connect through PCIe HBAs and two switches over 8 Gb FC or 10Gb FCoE, 0.8 GB/s. Volumes: Data 1-16, SSD 1-4, Log 1-4]
[Diagram: alternative with two nodes (768 GB each) using SSDs on PCIe x8, alongside the SAN (SP A, SP B, 24 GB each) on 8 Gb FC switches]
http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx
http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx
Performance Past, Present, Future
• When will servers be so powerful that …
– Been saying this for a long time
• Today – 10 to 100X overkill
– 32 cores in 2012, 60 cores in 2014
– Enough memory that IO is only sporadic
– Unlimited IOPS with SSD
• What can go wrong?
Today’s topic
SQL Performance
Natural keys with unique indexes, not SQL
The Execution Plan links all the elements of performance
Index tuning alone has limited value
Over-indexing can cause problems as well
Index and Statistics maintenance policy
One piece of logic may need more than one execution plan?
Compile cost versus execution cost?
Tables and SQL combined implement business logic
Plan cache bloat?
[Diagram: SQL and Tables (natural keys), Indexes, and Statistics & Compile parameters feed the Query Optimizer, which produces the Execution Plan run by the Storage Engine on the Hardware. Annotations: compile; row estimate propagation errors; DOP, memory, parallel plans; recompile with temp table / table variable; Index & Stats Maintenance; API Server Cursors: open, prepare, execute, close?; SET NOCOUNT, information messages]
Factors to Consider
SQL Tables Indexes
Query Optimizer
Statistics
Compile Parameters
Storage Engine
Hardware
DOP, memory
Special Topics
• Data type mismatch
• Multiple Optional Search Arguments (SARG)
– Function on SARG
• Parameter Sniffing versus Variables
• Statistics related (big topic)
• OR, AND/OR combinations, IN/NOT IN, EXISTS
• Complex Query with sub-expressions
• Parallel Execution
Not in order of priority
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
1a. Data type mismatch
DECLARE @name nvarchar(25) = N'Customer#000002760'
SELECT * FROM CUSTOMER WHERE C_NAME = @name

SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar(25), @name)

.NET auto-parameter discovery?
Unable to use index seek
Table column is varchar
Parameter/variable is nvarchar
1b. Type Mismatch – Row Estimate
SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%'
SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer#00000276%'
Row estimate error could have severe consequences in a complex query
SELECT TOP + Row Estimate Error
SELECT TOP 1000 [Document].[ArtifactID]
FROM [Document] (NOLOCK)
WHERE [Document].[AccessControlListID_D] IN (1,1000064,1000269)
 AND EXISTS (
  SELECT [DocumentBatch].[BatchArtifactID]
  FROM [DocumentBatch] (NOLOCK)
  INNER JOIN [Batch] (NOLOCK) ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID]
  WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID]
   AND [Batch].[Name] LIKE N'%Value%' )
ORDER BY [Document].[ArtifactID]
Data type mismatch – results in a high row estimate
TOP clause – optimizer expects it to be easy to find the first 1000 rows
In fact, there are few rows that match the SARG
Wrong plan for evaluating a large number of rows
http://www.qdpma.com/CBO/Relativity.html
2. Multiple Optional SARG
DECLARE @Orderkey int, @Partkey int = 1
SELECT * FROM LINEITEM
WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey)
 AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
 AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

IF block
DECLARE @Orderkey int, @Partkey int = 1
IF (@Orderkey IS NOT NULL)
 SELECT * FROM LINEITEM WHERE (L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey)
ELSE IF (@Partkey IS NOT NULL)
 SELECT * FROM LINEITEM WHERE (L_PARTKEY = @Partkey)

Need to consider impact of Parameter Sniffing
Consider the OPTIMIZE FOR hint
These are actually the stored procedure parameters
Dynamically Built Parameterized SQL
DECLARE @Orderkey int, @Partkey int = 1, @SQL nvarchar(500), @Param nvarchar(100)
SELECT @SQL = N'/* Comment */ SELECT * FROM LINEITEM WHERE 1=1',
 @Param = N'@Orderkey int, @Partkey int'
IF (@Orderkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey'
IF (@Partkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey'
PRINT @SQL
exec sp_executesql @SQL, @Param, @Orderkey, @Partkey

IF block is easier for few options
Dynamically built parameterized SQL is better for many options
Consider /*comment*/ to help identify source of SQL
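The same idea applies when the SQL text is assembled on the client. A minimal sketch, assuming an ODBC-style driver that takes `?` placeholders (the function name and the comment tag are made up for illustration): only the predicates that are actually supplied go into the text, and the values always travel as parameters, never as concatenated literals.

```python
from typing import List, Optional, Tuple


def build_lineitem_query(orderkey: Optional[int],
                         partkey: Optional[int]) -> Tuple[str, List[int]]:
    """Return (sql_text, params) containing only the supplied search arguments."""
    # The leading comment tags the statement so it can be traced back to its source.
    sql = "/* build_lineitem_query */ SELECT * FROM LINEITEM WHERE 1=1"
    params: List[int] = []
    if orderkey is not None:
        sql += " AND L_ORDERKEY = ?"
        params.append(orderkey)
    if partkey is not None:
        sql += " AND L_PARTKEY = ?"
        params.append(partkey)
    if not params:
        # Mirrors the guard in the T-SQL version: never scan the whole table.
        raise ValueError("at least one search argument is required")
    return sql, params


print(build_lineitem_query(None, 1))
```

Each distinct combination of arguments yields its own statement text, so each gets its own cached plan, which is the reason this scales better than one catch-all predicate.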
2b. Function on column SARG
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1

SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE L_SHIPDATE BETWEEN '1995-01-01' AND '1995-01-31'

DECLARE @Startdate date = '1995-01-01', @Days int = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM
WHERE L_SHIPDATE BETWEEN @Startdate AND DATEADD(dd, @Days, @Startdate)
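When the caller thinks in terms of "year and month", the range can be computed before the query is issued so the column stays bare in the predicate. The sketch below is one way to do that; the half-open form (`>=` start, `<` next month) is a choice made here, not something from the slide, and it avoids having to know the last day of the month.

```python
from datetime import date
from typing import Tuple


def month_range(year: int, month: int) -> Tuple[date, date]:
    """Half-open range [start, end) covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


start, end = month_range(1995, 1)
# The column is compared directly, so an index on L_SHIPDATE can be used for a seek.
sql = ("SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM "
       "WHERE L_SHIPDATE >= ? AND L_SHIPDATE < ?")
params = [start, end]
print(sql, params)
```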
3 Parameter Sniffing
-- first call, procedure compiles with these parameters
exec p_Report @startdate = '2011-01-01', @enddate = '2011-12-31'
-- subsequent calls, procedure executes with original plan
exec p_Report @startdate = '2012-01-01', @enddate = '2012-01-07'

Need different execution plans for narrow and wide range
Options:
1) OPTIMIZE FOR – one plan for all ranges
2) WITH RECOMPILE – compile on each execute
3) Main procedure calls 1 of 2 identical sub-procedures
One sub-procedure is only called for narrow range
Other called for wide range

Skewed data distributions also important
Example: Large & small customers
Assuming date data type
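Option 3 is just a routing decision on the width of the range. A sketch of that decision follows; the two procedure names and the 31-day cut-off are hypothetical, and the real threshold would have to come from looking at where the plans actually diverge.

```python
from datetime import date

NARROW_RANGE_DAYS = 31  # hypothetical cut-off, to be tuned against real plans


def pick_report_procedure(start: date, end: date) -> str:
    """Choose which of two identical sub-procedures to call for this date range."""
    days = (end - start).days + 1
    # Each sub-procedure only ever sees one kind of range, so its cached plan fits.
    return "p_Report_Narrow" if days <= NARROW_RANGE_DAYS else "p_Report_Wide"


print(pick_report_procedure(date(2011, 1, 1), date(2011, 12, 31)))
print(pick_report_procedure(date(2012, 1, 1), date(2012, 1, 7)))
```

The same routing works for skewed data: send large customers to one sub-procedure and small customers to the other.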
4 Statistics
• Auto-recompute points
• Sampling strategy
– How much to sample - theory?
– Random pages versus random rows
– Histogram Equal and Range Rows
– Out of bounds, value does not exist
– etc.
Statistics Used by the Query Optimizer in SQL Server 2008
Eric N. Hanson and Yavor Angelov, Contributor: Lubor Kollar
Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator
Joseph Sack
http://msdn.microsoft.com/en-us/library/dd535534.aspx
Statistics Structure
• Stored (mostly) in binary field
Scalar values
Density Vector – limit 30, half in NC, half Cluster key
Histogram – up to 200 steps
Consider not blindly using IDENTITY on critical tables
Example: Large customers get low ID values
Small customers get high ID values
http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx
Statistics Auto/Re-Compute
• Automatically generated on query compile
• Recompute at 6 rows, 500, every 20%?
Has this changed?
2008 R2: Trace flag 2371 – lower auto-recompute threshold for large tables
http://support.microsoft.com/kb/2754171
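A rough comparison of the thresholds shows why the 20% rule hurts on large tables. The sketch assumes the classic rule of 500 + 20% of rows for tables over 500 rows, and, as an assumption not stated on the slide, the sqrt(1000 x rows) formula that is commonly quoted for the lower dynamic threshold; check the KB article for what a given build actually does.

```python
import math


def classic_threshold(rows: int) -> float:
    """Modified rows before auto-update under the 500 + 20% rule."""
    return 500 + 0.20 * rows


def dynamic_threshold(rows: int) -> float:
    """Assumed formula for the lower threshold on large tables: sqrt(1000 * rows)."""
    return math.sqrt(1000 * rows)


def effective_threshold(rows: int) -> float:
    """Whichever of the two rules fires first."""
    return min(classic_threshold(rows), dynamic_threshold(rows))


for rows in (10_000, 1_000_000, 100_000_000):
    print(f"{rows:>11,} rows: classic {classic_threshold(rows):>12,.0f}  "
          f"dynamic {dynamic_threshold(rows):>10,.0f}")
```

On a 100M row table the classic rule waits for about 20M modifications, by which time newly loaded values can sit far outside the histogram.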
Statistics Sampling
• Sampling theory
– True random sample
– Sample error - square root of N
• Relative error 1/√N
• SQL Server sampling
– Random pages
• But always first and last page???
– All rows in selected pages
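The sampling-theory numbers are easy to tabulate. The sketch below only covers the ideal case of a true random sample of N rows; because SQL Server takes all rows from randomly chosen pages, rows within a page may be correlated and the real error can be worse than this.

```python
import math


def sample_error(sample_rows: int) -> float:
    """Expected absolute error of a count from a true random sample: sqrt(N)."""
    return math.sqrt(sample_rows)


def relative_error(sample_rows: int) -> float:
    """Expected relative error: sqrt(N) / N = 1 / sqrt(N)."""
    return 1.0 / math.sqrt(sample_rows)


for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: error ~{sample_error(n):>7,.0f} rows, "
          f"relative ~{relative_error(n):.2%}")
```

Going from 10K to 1M sampled rows only improves the relative error from about 1% to about 0.1%, so beyond a point a larger sample buys little compared with fixing skew or out-of-bounds values.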
Row Estimate Problems (at source)
• Skewed data distribution
• Out of bounds
• Value does not exist
Row estimate errors at source – is classified under statistics topic
Loop Join - Table Scan on Inner Source
Estimated output from the first 2 tables (at right) is zero or 1 rows. The most efficient join to the third table (without an index on the join column) is then a loop join with a scan. If the actual row count is 2 or more, a full scan is performed for each row from the outer source.
Default statistics rules may lead to serious ETL issues
Consider custom strategy
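The cost of that plan grows with the product of the two row counts, which a two-line calculation makes plain. The row counts below are made-up illustrations, not figures from the slide.

```python
def rows_scanned(outer_rows: int, inner_table_rows: int) -> int:
    """Rows read from the inner table when a loop join scans it once per outer row."""
    return outer_rows * inner_table_rows


estimated = rows_scanned(1, 1_000_000)    # what the optimizer planned for
actual = rows_scanned(10_000, 1_000_000)  # what runs if the estimate was wrong

print(f"estimated {estimated:,} rows read, actual {actual:,} rows read")
```

With an estimate of one outer row the scan looks cheap; with 10,000 actual rows the same plan reads ten billion rows, which is how a freshly loaded ETL table with stale statistics turns a seconds-long step into hours.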
Compile Parameter Not Exists
Main procedure has cursor around view_Servers
First server in view_Servers is 'CAESIUM'
Cursor executes sub-procedure for each Server
sql:
SELECT MAX(ID) FROM TReplWS WHERE Hostname = @ServerName
But CAESIUM does not exist in TReplWS!
SqlPlan Compile Parameters
<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0">
 <BatchSequence>
  <Batch>
   <Statements>
    <StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname = @ServerName" StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
     <StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
     <QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168">
      <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007" AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
      </RelOp>
      <ParameterList>
       <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" />
      </ParameterList>
     </QueryPlan>
    </StmtSimple>
   </Statements>
  </Batch>
 </BatchSequence>
</ShowPlanXML>
Compile parameter values at bottom of sqlplan file
5a Join 2 Tables, OR in SARG
-- subsequent calls, procedure executes with original plan
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099

5a UNION (ALL) instead of OR
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 184826
UNION (ALL)
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = 137099
-- AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL) --

Caution: select list should have keys to ensure correct rows
UNION removes duplicates (with Sort operation)
UNION ALL does not
-- Hugo Kornelis trick --
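The effect of the extra predicate in the second branch can be checked on a toy data set. The sketch below uses SQLite from the Python standard library as a stand-in for SQL Server (the query shape is portable, the optimizer behaviour is not), with a cut-down LINEITEM/ORDERS schema and made-up rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ORDERS   (O_ORDERKEY INTEGER PRIMARY KEY, O_CUSTKEY INTEGER);
CREATE TABLE LINEITEM (L_ORDERKEY INTEGER, L_LINENUMBER INTEGER, L_PARTKEY INTEGER);
INSERT INTO ORDERS VALUES (1, 137099), (2, 5), (3, 137099);
INSERT INTO LINEITEM VALUES
  (1, 1, 184826),   -- matches both predicates
  (1, 2, 7),        -- matches customer only
  (2, 1, 184826),   -- matches part only
  (2, 2, 9),        -- matches neither
  (3, 1, NULL);     -- customer match, NULL part key
""")

COLS = "SELECT L_ORDERKEY, L_LINENUMBER"
JOIN = "FROM LINEITEM JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY"

# Original form: OR across columns of two different tables.
or_rows = con.execute(
    f"{COLS} {JOIN} WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099").fetchall()

# UNION ALL with the second branch excluding rows the first branch already returned.
union_all_rows = con.execute(
    f"{COLS} {JOIN} WHERE L_PARTKEY = 184826 "
    f"UNION ALL "
    f"{COLS} {JOIN} WHERE O_CUSTKEY = 137099 "
    f"AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL)").fetchall()

# UNION ALL without the exclusion returns the overlapping row twice.
naive_rows = con.execute(
    f"{COLS} {JOIN} WHERE L_PARTKEY = 184826 "
    f"UNION ALL "
    f"{COLS} {JOIN} WHERE O_CUSTKEY = 137099").fetchall()

print(sorted(or_rows), sorted(union_all_rows), len(naive_rows))
```

The `OR L_PARTKEY IS NULL` part matters: without it, the row with a NULL part key would be dropped from the second branch, because `NULL <> 184826` is not true.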
5b AND/OR Combinations
• Hash Join is a good method to process many rows
– Requirement is an equality join condition
• AND/OR, IN/NOT IN, EXISTS/NOT EXISTS combinations
– Query optimizer may not be able to determine that an equality join condition exists
– Execution plan will use loop join,
– and an attempt to force a hash join will be rejected
• Re-write using UNION in place of OR
• And LEFT JOIN in place of NOT IN
SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html
Complex Queries
• High compile effort
– Many joins, many indexes
– Estimated plan cost correlation
• Row estimation errors after multiple operations
Row estimate errors at source – is classified under statistics topic
Complex Query with Sub-expression
• Query complexity – really high compile cost
• Repeating sub-expressions (including CTE)
– Must be evaluated multiple times
• Main problem – row estimate error propagation
• Solution/Strategy – get a good execution plan
– Temp table when estimate is high, actual is low
More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
When the estimate is low and actual rows is high, need to balance temp table insert overhead versus plan benefit. Would a join hint work?
More Plan Details
Query joining 6 tables
Each table has too many indexes
Row estimate is high – plan cost is high
Query optimizer tries really, really hard to find a better plan
Actual rows is moderate, any plan works
Temp Table and Table Variable
• Forget what other people have said
– Most is cr@p
• Temp tables – subject to statistics auto/re-compile
• Table variable – no statistics, assumes 1 row
• Question: in each specific case, do the statistics and recompile help or not?
– Yes: temp table
– No: table variable
Is this still true?
Parallelism
• Designed for 1998 era
– Cost Threshold for Parallelism: default 5
– Max Degree of Parallelism – instance level
– OPTION (MAXDOP n) – query level
• Today – complex system – 32 cores
– Plan cost 5 query might run in 10ms?
– Some queries at DOP 4
– Others at DOP 16?
More on Parallelism:
http://www.qdpma.com/CBO/ParallelismComments.html
http://www.qdpma.com/CBO/ParallelismOnset.html
Really need to rethink parallelism / NUMA strategies
Number of concurrently running queries x DOP less than number of logical/physical processors?
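The question above can be turned into a quick sanity check. This is a deliberate simplification that counts one worker per unit of DOP per query (a parallel plan can use more threads than its DOP), and the function names are illustrative only.

```python
def workers_requested(concurrent_queries: int, dop: int) -> int:
    """Workers wanted if every concurrent query runs at the given DOP."""
    return concurrent_queries * dop


def fits(concurrent_queries: int, dop: int, logical_processors: int) -> bool:
    """True when the requested workers do not exceed the processors available."""
    return workers_requested(concurrent_queries, dop) <= logical_processors


def max_dop_for(concurrent_queries: int, logical_processors: int) -> int:
    """Largest DOP that keeps concurrent queries within the processor count."""
    return max(1, logical_processors // concurrent_queries)


print(fits(4, 8, 32), fits(4, 16, 32), max_dop_for(4, 32))
```

On a 32-core system, four concurrent queries at DOP 16 oversubscribe the processors twice over, while DOP 8 just fits; that is the kind of trade-off a single instance-wide MAXDOP setting cannot express.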
Tables with computed columns may inhibit parallelism?
varchar(max) stored in lob pages
• Disk IO to lob pages is synchronous?
– Must access row to get 16 byte link?
– Feature request: index pointer to lob
SQL PASS 2013
Understanding Data Files at the Byte Level
Mark Rasmussen
legacy
• API Server Cursors / Cursor Stored Procedures
– sp_prepare / sp_prepexec, sp_execute, sp_unprepare
– sp_cursoropen, sp_cursorfetch, sp_cursorclose
– sp_cursorprepare / sp_cursorprepexec, sp_cursorexecute, sp_cursorunprepare
• Guess which is not called?
– Symptom: sp_reset_connection
API Server Cursors: http://technet.microsoft.com/en-us/library/ms187088(v=sql.105).aspx
Cursor Stored Procedures: http://technet.microsoft.com/en-us/library/ms187801(v=sql.120).aspx
Summary
• Hardware today is really powerful
– Storage may not be – SAN vendor disconnect
• Standard performance practice
– Top resource consumers, index usage
• But also look for serious blunders
http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html
http://www.qdpma.com/CBO/Relativity.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
Kevin Boles – Common TSQL Mistakes
Special Topics
• Data type mismatch
• Multiple Optional Search Arguments (SARG)
– Function on SARG
• Parameter Sniffing versus Variables
• Statistics related (big topic)
• AND/OR
• Complex Query
• Parallel Execution
SQL Server Edition Strategies
• Enterprise Edition – per core licensing costs
– Old system strategy
• 4 (or 2)-socket server, top processor, max memory
– Today: how many cores are necessary?
• 2 socket system, max memory (16GB DIMMs)
• Is Standard Edition adequate?
– Low cost, but many important features disabled
• BI edition – 16 cores
– Limited to 64GB for SQL Server process
New Features in SQL Server
• 2005
– Index included columns
– Partitioning
– CLR
• 2008
– Filtered index
– Compression
• 2012
– Column store (non-clustered)
• 2014
– Column store clustered
– Hekaton
SQL Performance General
• Client-side architecture
– Connection pooling
– Stored procedures versus SQL, parameterized
• Database Architecture
– Cluster key, primary key, natural keys, foreign keys
• SQL
• Indexing
• Indexes & Statistics Maintenance
Client-side Architecture
• Connection pooling:
– Connection.Open, Execute, Connection.Close
– sp_reset_connection
• Stored procedures – parameterized SQL
– Stored procedure name is short
– Parameterized SQL may not be
• Larger than 1 Ethernet packet? 2? 8?
Database Architecture
• Normalization
• Cluster key
• Primary Key & other unique / natural keys
• Foreign keys
Work in progress
[Diagrams: processor die layouts in 8-core, 10-core and 15-core variants, showing cores, LLC, QPI, MI, PCI-E and DMI 2]