sql performance tuning for developers
SQL SERVER 2005/2008
Performance tuning for
the developer
Michelle Gutzait
Blog: http://michelle-gutzait.spaces.live.com/default.aspx
Whoami?
SQL Server Team Lead @ www.pythian.com
24/7 Remote DBA services
I live in Montreal
Blog: http://michelle-gutzait.spaces.live.com/default.aspx
Agenda – Part I
General concepts of performance and tuning
• Performance bottlenecks
• Optimization tools
• Table and index
• The data page
• The optimizer
• Execution plans
Agenda – Part II
Development performance tips
• T-SQL commands
• Views
• Cursors
• User-defined functions
• Working with temporary tables and table variables
• Stored Procedures and functions
• Data Manipulation
• Transactions
• Dynamic SQL
• Triggers
• Locks
• Table and database design issues
“The fact that I can
does not mean that I
should !”
Kimberly Tripp (?)
Always treat your code as if it's running:
Frequently
On a large amount of data
In a very busy environment
The goal
Min response time and Max
throughput
Reduce network traffic, disk I/O
and CPU time
Start optimizing as early as
possible as it will be harder
later.
Design and Tuning Tradeoffs
Client/Server Tuning Levels
[Diagram: client and server connected by network communication]
Client side: Database Applications (Presentation Layer, Application Logic), Client OS, Network
Server side: Network, OS/IO Subsystem, SQL Server, Operating System and Hardware
The Typical Performance Pyramid
Application / Query / Database Design
Operating Environment
Hardware
Beware: In certain environments this pyramid may be upside down!
Application & performance
The result
“Ugly” code may perform
much better
Performance bottlenecks - tools
Windows Performance Monitor
SQL Server Profiler
SQL Server Management Studio
Performance bottlenecks - tools (cont.)
Database Engine Tuning Advisor
DMVs and statistics
SQL Server 2008 Activity Monitor
Performance bottlenecks - tools
Third-party tools
Let’s review a few basic concepts…
Tables and Indexes
[Diagram: tables and indexes on disk, with three points marked "Possible bottleneck"]
Rows On A Page
Page Header: 96 bytes
Data rows (Row A, Row C, Row B): 8,096 bytes
Row Offset Table (entries A, B, C): 2 bytes each
The Data Row
Header (4 bytes) | Fixed data | NB (Null Block) | VB (Variable Block) | Variable data
Data access methods
Index
• Helps locate data more rapidly
Index Structure: Nonclustered Index
Structure of Clustered Index
Covering Index
Index
Heap table
• A table with no clustered index
RID is built from file:page:row
Table Scan
Will usually be faster using a clustered index
[Flowchart: how a T-SQL batch becomes an executed plan. Labels, in reading order:]
T-SQL → Parsing → Normalization → Sequence Tree → Is SQL?
SQL → Trivial Plan Optimization → Syntactic Transformation → SQL Optimization (SARG Selection, Index Selection, JOIN Selection) → Is Cheap Enough? (Yes / No)
Execution Plan → Caching → Memory Allocation → Execution
Execution plan – cost-based optimization
Optimizer hints
View optimizer info
A few concepts in the execution plan algorithm…
Search ARGuments (SARG)
Always isolate columns

SARG                                       NOT SARG
where MonthlySalary > 600000/12            where MonthlySalary * 12 > 600000
where ID in (select ID from vw_Person)     where dbo.fu_IsPerson(ID) = 1234
where firstname like 'm%'                  where SUBSTRING(firstname,1,1) = 'm'

SARG:
=, BETWEEN, >, <, LIKE 'x%', EXISTS
Not SARGable:
LIKE '%x', NOT LIKE, NOT EXISTS, FUNCTION(column)
AND creates a single SARG
OR creates multiple SARGs
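The "isolate the column" rule also applies to date functions. The sketch below assumes a hypothetical table dbo.Orders with an index on OrderDate (neither appears in the slides) and shows the same filter written both ways.

```sql
-- Hypothetical table: dbo.Orders(OrderID, OrderDate), with an index on OrderDate.

-- Not SARGable: the function wraps the column,
-- so the optimizer cannot seek on the OrderDate index.
SELECT OrderID
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2008;

-- SARGable: the column stands alone and is compared with constants.
SELECT OrderID
FROM dbo.Orders
WHERE OrderDate >= '20080101'
  AND OrderDate <  '20090101';
```

Both queries return the same rows; only the second leaves the optimizer free to use an index seek.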
Table, column and index statistics
[Figure: histogram built on the state column of a Sales table]
Sorted column values: AL AK CA CA CA CT IL IL IL IL IL MT OR OR PA TX TX WA WA WA WI WY
Steps 0 to 7 sample every third value: AL, CA, IL, IL, OR, TX, WA, WY
The sampled steps are stored in the statblob: sys.sysobjvalues (internal)
Update statistics - “Rules of thumb”
Use auto create and auto update statistics
5% of the table changes
Still bad query:
Create statistics
Update statistics with FULLSCAN
Use multi-column statistics when queries have multi-column conditions
Use the AUTO_UPDATE_STATISTICS_ASYNC database option
No stats for table variables and multi-statement table-valued functions (#temp tables do have statistics)
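The rules of thumb above map onto a handful of statements. This is a minimal sketch; the database MyDb, the table dbo.Sales, its columns state and city, and the statistics name are all hypothetical.

```sql
-- Hypothetical names: database MyDb, table dbo.Sales, columns state and city.

-- Let SQL Server create and refresh statistics automatically.
ALTER DATABASE MyDb SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS ON;

-- Refresh stale statistics in the background instead of making the query wait.
ALTER DATABASE MyDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Multi-column statistics for queries that filter on both columns.
CREATE STATISTICS st_Sales_state_city ON dbo.Sales (state, city);

-- If a query still gets a bad plan, rebuild the statistics from every row.
UPDATE STATISTICS dbo.Sales WITH FULLSCAN;

-- Inspect the histogram steps the optimizer will use.
DBCC SHOW_STATISTICS ('dbo.Sales', st_Sales_state_city);
```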
Join selection
JOIN Types
NESTED LOOP
MERGE
HASH
Factors:
• JOIN strategies
• JOIN order
• SARG
• Indexes
Joins - Optimization tip
HASH joins are used when no useful index exists on one or both of the JOIN inputs.
These can be converted to MERGE or LOOP joins through effective indexing.
Index intersection
SELECT *
FROM authors
WHERE au_fname = 'Fox' AND au_lname = 'Mulder'
Tuning with indexes…
Index
Index tips
MORE indexes for queries, FEWER indexes for updates
More indexes give the optimizer more possibilities
Having a CLUSTERED INDEX is almost always a good idea…
Index the columns used in sort operations (TOP, DISTINCT, GROUP BY, ORDER BY), in JOINs and in WHERE clauses
Keep index keys as narrow as possible to avoid excessive I/O
Use integer values rather than character values
Values with low selectivity
A covering index can be faster than a clustered index
Index tips 2
CLUSTERED index key in all non-clustered indexes (otherwise RID is used)
Frequently updated column and clustered index
Drop costly UNUSED indexes
High volume inserts – incremental Clustered index
Surrogate integer primary key (identity ?)
Clustered index for random modifications and index bottleneck
CLUSTERED index on non-unique columns – a 4-byte uniquifier is added to duplicate key values
Index tips 3
Create an index before rare heavy operations
When changing/dropping a CLUSTERED index, drop all NON-CLUSTERED indexes first. Don't forget to recreate them later
Indexes are almost always in cache, therefore are faster
Column referenced by OR and no index on the column → table scan
PRIMARY KEY and UNIQUE constraints create indexes
Foreign keys do NOT create indexes
Index tips 4
Wide and fewer indexes are sometimes better than many and narrower indexes
INCLUDE columns for covering index
Indexes are used to reduce the number of rows fetched, otherwise they are not necessary
If TEMPDB resides on a different physical disk, you may use SORT_IN_TEMPDB
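A covering index with INCLUDE columns and SORT_IN_TEMPDB could look like the sketch below. The table dbo.Orders, its columns and the index name are hypothetical.

```sql
-- Hypothetical table: dbo.Orders(OrderID, CustomerID, OrderDate, TotalDue).

-- The key stays narrow (CustomerID only); OrderDate and TotalDue are stored
-- at the leaf level so the query below never has to touch the base table.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalDue)
WITH (SORT_IN_TEMPDB = ON);  -- build-time sort runs in tempdb

-- Covered query: every referenced column is in the index.
SELECT CustomerID, OrderDate, TotalDue
FROM dbo.Orders
WHERE CustomerID = 42;
```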
Analyze execution plans and
statistics
Demo - Indexes
Fill Factor and PAD_INDEX
Default Fillfactor 0 – data pages 100% full
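To leave free space on pages for future inserts and updates, the fill factor can be set when an index is rebuilt. A minimal sketch, assuming a hypothetical index IX_Orders_CustomerID on dbo.Orders and an arbitrary value of 80:

```sql
-- Hypothetical index and table names; 80 is an example value, not a recommendation.
-- FILLFACTOR = 80 leaves about 20% free space on each leaf page.
-- PAD_INDEX = ON applies the same fill factor to the intermediate index pages.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```

The fill factor is applied only when the index is created or rebuilt; it is not maintained as pages fill up afterwards.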
Data modifications…
Data modifications: in-place, direct
[Figure: a data page (page header 96 bytes, data rows 8,096 bytes, row offset table 2 bytes each) holding Rows A to E; Row A is shown next to Row A Ver 2]
Data modifications: in-place, indirect
[Figure: the same data page layout, holding Rows A to E]
Data modifications: deferred update, forwarded
[Figure: the same data page layout, holding Rows A to E]
In a heap, rows are forwarded, leaving the old address in place
Index fragmentation
INDEXES - fragmentation
DBCC SHOWCONTIG ('Orders')
DBCC SHOWCONTIG scanning 'Orders' table...
Table: 'Orders' (21575115); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 20
- Extents Scanned..............................: 5
- Extent Switches..............................: 4
- Avg. Pages per Extent........................: 4.0
- Scan Density [Best Count:Actual Count].......: 60.00% [3:5]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 40.00%
- Avg. Bytes Free per Page.....................: 146.5
- Avg. Page Density (full).....................: 98.19
SELECT *
FROM sys.dm_db_index_physical_stats
(DatabaseID, TableId, IndexId, NULL, Mode)
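The argument list above is a placeholder. A concrete call could look like the following sketch, which assumes the current database and a table named dbo.Orders (the schema is an assumption).

```sql
-- Fragmentation of every index on dbo.Orders in the current database.
-- NULL for index_id and partition_number means "all".
SELECT index_id,
       index_type_desc,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent,  -- NULL in 'LIMITED' mode
       page_count
FROM sys.dm_db_index_physical_stats
     (DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'SAMPLED');

-- Two ways to remove fragmentation:
ALTER INDEX ALL ON dbo.Orders REORGANIZE;  -- lighter, always online
ALTER INDEX ALL ON dbo.Orders REBUILD;     -- rebuilds the indexes from scratch
```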
Indexed Views
SELECT t1.Col2, t2.Col3,
count(*) as Cnt
FROM Table_1 t1
INNER JOIN Table_2 t2
ON t1.Col1 = t2.Col1
GROUP BY t1.Col2, t2.Col3
Possible bottleneck
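If the aggregate above is a bottleneck, it can be materialized as an indexed view. The sketch below assumes Table_1 and Table_2 live in the dbo schema; the view and index names are hypothetical.

```sql
-- An indexed view requires SCHEMABINDING, two-part object names,
-- and COUNT_BIG(*) (not COUNT(*)) when GROUP BY is used.
CREATE VIEW dbo.vw_Table1_Table2_Counts
WITH SCHEMABINDING
AS
SELECT t1.Col2, t2.Col3, COUNT_BIG(*) AS Cnt
FROM dbo.Table_1 AS t1
INNER JOIN dbo.Table_2 AS t2
    ON t1.Col1 = t2.Col1
GROUP BY t1.Col2, t2.Col3;
GO

-- The unique clustered index is what stores the view's result set on disk.
CREATE UNIQUE CLUSTERED INDEX IX_vw_Table1_Table2_Counts
ON dbo.vw_Table1_Table2_Counts (Col2, Col3);
GO
```

The stored result is maintained on every change to the base tables, so this trades slower modifications for faster reads. On editions other than Enterprise, queries may need the WITH (NOEXPAND) hint to use the view's index.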
“Performance tuning SQL Statements
involves doing things to allow the optimizer
make better decisions”
Your options for performance
tuning are indexing or rewriting
Questions
End of Part I…
Agenda – Part II
Development performance tips
• T-SQL commands
• Views
• Cursors
• User-defined functions
• Working with temporary tables and table variables
• Stored Procedures and functions
• Data Manipulation
• Transactions
• Dynamic SQL
• Triggers
• Locks
• Table and database design issues
Returning/processing too much
data…
[Diagram: the path the data travels. Database Applications (Presentation Layer, Application Logic), Client OS, Network | Network, OS/IO Subsystem, SQL Server, Disk]
What could possibly be “wrong” with this query?
SELECT * FROM MyTable WHERE Col1 = 'x'
SELECT Col1 FROM MyTable1, MyTable2
SELECT TOP 2000000 Col1 FROM MyTable1
Looping on the client side:
WHILE @i < 10000
Update tb1 WHERE Col = @i
@i = @i + 1
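For the client-side loop, a single set-based statement usually does the same work in one round trip. The slide elides what the UPDATE sets, so the column Status, its value, and the assumption that @i starts at 0 are purely illustrative.

```sql
-- Hypothetical: dbo.tb1 has an int column Col and a column Status to update.
-- Assumes the loop counter @i ran from 0 to 9999.
-- One statement and one round trip replace 10,000 single-row UPDATEs.
UPDATE dbo.tb1
SET Status = 'processed'
WHERE Col >= 0
  AND Col < 10000;
```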
What could possibly be wrong with this query (cont.)?
SELECT *
FROM MyTable t1
INNER JOIN MyTable_2 t2 on t1.Col1 = t2.Col1
INNER JOIN MyTable_3 t3 on t1.Col1 = t3.Col1
LEFT JOIN MyTable_4 t4 on t1.Col1 = t4.Col1
LEFT JOIN MyTable_5 t5 on t1.Col1 = t5.Col1
LEFT JOIN MyTable_6 t6 on t1.Col1 = t6.Col1
LEFT JOIN MyTable_7 t7 on t1.Col1 = t7.Col1
LEFT JOIN MyTable_8 t8 on t1.Col1 = t8.Col1
LEFT JOIN MyTable_9 t9 on t1.Col1 = t8.Col1
LEFT JOIN MyTable_10 t10 on t1.Col1 = t8.Col1
……
What is the difference?

Short:
IF EXISTS
(SELECT 1 FROM MyTable)
Long(er)?
SELECT @rc=COUNT(*)
FROM MyTable
IF @rc > 0

Short:
IF EXISTS
(SELECT 1 FROM MyTable)
Long(er)?
IF EXISTS
(SELECT * FROM MyTable)

Short:
IF EXISTS
(SELECT 1 FROM MyTable)
Long(er)?
IF NOT EXISTS
(SELECT 1 FROM MyTable)

Short:
SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
INNER JOIN MyTable2
ON MyTable1.Col1 = MyTable2.Col1
Long(er)?
SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
WHERE MyTable1.Col1 IN
(SELECT MyTable2.Col1
FROM MyTable2)
What is the difference?

Short:
SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
WHERE MyTable1.Col1 IN
(SELECT MyTable2.Col1
FROM MyTable2)
Long(er)?
SELECT MyTable1.Col1,
MyTable1.Col2
FROM MyTable1
WHERE EXISTS
(SELECT 1
FROM MyTable2
WHERE MyTable2.Col1 =
MyTable1.Col1)
Sorting the data…
What is the difference?

Sort:
SELECT Col1
FROM Table1
UNION
SELECT Col2
FROM Table2
No sort:
SELECT Col1
FROM Table1
UNION ALL
SELECT Col2
FROM Table2

Sort:
SELECT DISTINCT Col1
FROM Table1
No sort:
SELECT Col1
FROM Table1

Sort:
SELECT Col1
FROM Table1
WHERE col2 IN (SELECT DISTINCT Col3
FROM Table2)
No sort:
SELECT Col1
FROM Table1
WHERE col2 IN (SELECT Col3
FROM Table2)

Sort:
CREATE VIEW VW1 AS
SELECT * FROM DB2..Table1
ORDER BY Col1
No sort:
CREATE VIEW VW1 AS
SELECT * FROM
DB2..Table1
Which one is BETTER?

Sort:
SELECT Col1
FROM Table1
WHERE ModifiedDate
IN (SELECT TOP 1 ModifiedDate
FROM Table1
ORDER BY ModifiedDate
DESC)
No sort:
SELECT Col1
FROM Table1
WHERE ModifiedDate =
(SELECT MAX(ModifiedDate)
FROM Table1)
The OR operator
What is the difference?

OR:
SELECT Col1
FROM Table1
WHERE Col1 = 'x'
OR Col2 = 'y'
No OR:
SELECT Col1
FROM Table1
WHERE Col1 = 'x'
UNION
SELECT Col1
FROM Table1
WHERE Col2 = 'y'

OR:
SELECT Col1
FROM Table1
WHERE Col1 IN
(SELECT C1 FROM Table2)
OR Col1 IN
(SELECT C2 FROM Table2)
No OR:
SELECT Col1
FROM Table1
WHERE EXISTS (SELECT 1 FROM Table2
WHERE Col1 = C1
UNION ALL
SELECT 1 FROM Table2
WHERE Col1 = C2)

OR:
SELECT *
FROM Table1
WHERE Col1 IN
(SELECT C1 FROM Table2)
OR Col2 IN
(SELECT C2 FROM Table2)
No OR:
SELECT *
FROM Table1
????
Locks
Lock granularity
• Row Locks
• Page Locks
• Table Locks
Lock granularity
Row Locks / Page Locks → Table Locks: escalation at > 5000 locks
Principal lock types
S (shared), U (update), X (exclusive)
Dirty Read
•WITH (NOLOCK)
• SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Nonrepeatable Read
• Default
Phantom Read
ANSI Isolation Level                  Dirty Reads   Nonrepeatable Reads   Phantom Reads
Level 0  Read uncommitted             Yes           Yes                   Yes
Level 1  Read committed (DEFAULT)     No            Yes                   Yes
Level 2  Repeatable reads             No            No                    Yes
Level 3  Serializable                 No            No                    No
         SNAPSHOT                     No            No                    No
Programming with isolation
level locks
Database
Transaction
Statement/table
Isolation levels - example
USE pubs
GO
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
SELECT au_lname FROM authors WITH (NOLOCK)
GO
The locks generated are:
EXEC sp_lock
GO
EXEC Sp_lock
SELECT object_name(85575343)
GO
-----------------------------
authors
spid dbid ObjId IndId Type Resource Mode Status
51 5 0 DB S GRANT
51 10 85575343 2 KEY (a802b526c101) RangeS-S GRANT
51 10 85575343 2 KEY (54013f7c6be5) RangeS-S GRANT
51 10 85575343 2 KEY (b200dbb63a8d) RangeS-S GRANT
51 10 85575343 2 KEY (49014dc93755) RangeS-S GRANT
51 10 85575343 2 KEY (170130366f3d) RangeS-S GRANT
51 10 85575343 2 PAG 1:1482 IS GRANT
51 10 85575343 2 KEY (c300d27116cf) RangeS-S GRANT
51 10 85575343 0 TAB IS GRANT
51 10 85575343 2 KEY (1101ed75c8f8) RangeS-S GRANT
51 10 85575343 2 KEY (2802f6d3696b) RangeS-S GRANT
51 10 85575343 2 KEY (0701fdd03550) RangeS-S GRANT
51 10 85575343 2 KEY (7f00d0d5506b) RangeS-S GRANT
Temporary Objects
Temporary objects
#tmp
##GlobalTmp
Tempdb..StaticTmp
@TableVariable
Table-valued functions
Common Table Expression (CTE)
View ?
FROM (SELECT …)
Stored Procedures…
What are the benefits of Stored Procedures?
Reduce network traffic
Reusable execution plans
Efficient Client execution requests
Code reuse
Encapsulation of logic
Client independence
Security implementation
As a general rule of thumb, all Transact-SQL code should be called from stored procedures.
Stored Procedures tips
SET NOCOUNT ON
No sp_
Owned by DBO
Exec databaseowner.objectname
Select from databaseowner.objectname
Break down large SPs
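A procedure that follows these tips might look like the sketch below. The procedure name, the table dbo.Orders and its columns are hypothetical.

```sql
-- No sp_ prefix, owned by dbo, every object referenced as owner.objectname.
CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
    @CustomerID int
AS
BEGIN
    -- Suppress the "n rows affected" messages sent to the client.
    SET NOCOUNT ON;

    SELECT OrderID, OrderDate
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END
GO

-- Call it with the owner-qualified name as well.
EXEC dbo.usp_GetOrdersByCustomer @CustomerID = 42;
```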
SP Recompilations
#temp instead of @Temp table variables
DDL statements
Some set commands
Use SQL Server Profiler to check recompilations
Which one is better and why?
IF @P = 0
SQL Statement Block1
ELSE
SQL Statement Block2
IF @P = 0
Exec sp_Block1
ELSE
Exec sp_Block2
What could be problematic here?
CREATE PROC MySP
@p_FROM INT, @p_TO INT
AS
SELECT count(*) FROM MyTable
WHERE PK between @p_FROM and @p_TO
[Figure: MyTable, 7 million rows, PK values 0, 5, 10, 34, 87, …, 198,739, …, 3,898,787]
CREATE … WITH RECOMPILE
EXECUTE … WITH RECOMPILE
sp_recompile objname
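The three recompile options could be applied to MySP as sketched below. The dbo schema, the second procedure name and the parameter values are assumptions for illustration.

```sql
-- Option 1: never cache a plan; compile on every execution.
CREATE PROC dbo.MySP_AlwaysRecompile
    @p_FROM int, @p_TO int
WITH RECOMPILE
AS
SELECT COUNT(*)
FROM dbo.MyTable
WHERE PK BETWEEN @p_FROM AND @p_TO;
GO

-- Option 2: recompile only this call, e.g. for an unusually wide range.
EXEC dbo.MySP @p_FROM = 0, @p_TO = 3898787 WITH RECOMPILE;

-- Option 3: mark the procedure so its plan is rebuilt on the next execution.
EXEC sp_recompile 'dbo.MySP';
```

Recompiling costs CPU on each run, so these options are typically reserved for procedures whose best plan really does vary with the parameter values.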
Dynamic SQL… sp_executesql vs. EXECUTE
Which one is better and why?
EXEC ('SELECT Col1 FROM Table1 ' +
'WHERE ' + @WhereClause)
Exec sp_executesql @SQLString
Exec sp_executesql @SQLString,
@ParmDefinition, @PK = @IntVariable
Reusable Execution (Query) Plan -
generated by sp_executesql
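A minimal sketch of the parameterized form follows, reusing the variable names from the previous slide. The table dbo.Table1 and its columns Col1 and PK are assumed.

```sql
DECLARE @SQLString nvarchar(500);
DECLARE @ParmDefinition nvarchar(500);
DECLARE @IntVariable int;

-- The statement text stays identical between calls; only the parameter changes.
SET @SQLString = N'SELECT Col1 FROM dbo.Table1 WHERE PK = @PK';
SET @ParmDefinition = N'@PK int';

SET @IntVariable = 10;
EXEC sp_executesql @SQLString, @ParmDefinition, @PK = @IntVariable;

-- The second call can reuse the plan cached by the first.
SET @IntVariable = 34;
EXEC sp_executesql @SQLString, @ParmDefinition, @PK = @IntVariable;
```

Because the value travels as a typed parameter rather than as concatenated text, this form also closes the SQL injection hole that EXEC with a concatenated WHERE clause leaves open.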
Cursors…
Cursors - implications
Resources Required at Each Stage
What could possibly replace cursors?
Loops ?
Temp tables
Local variables (!)
CTEs
CASE statements
Multiple queries
AND…
Replacing cursor
Tip #1
Select Seq=identity(int,1,1),
Fld1,
Fld2,
……
Into #TmpTable
From Table1
Order by …
Seq Fld1 Fld2 …..
1 Aaa 45.7
2 Absb 555.0
3 Adasd 12.8
4 oioiooi 0.0
….. ….. ….. …..
Replacing cursor
Tip #2
declare @var int
set @var = 0
Update Table1
set @Var = Fld2 = Fld2 + @Var
From Table1 with (index=pk_MyExampleTable)
option (maxdop 1)
go
Cursor Example…
TRY ME….
Optimizer Hints…
Optimizer Hints
Most common
WITH (ROWLOCK)
WITH (NOLOCK)
WITH (INDEX = IX_INDEX_NAME)
WITH (HOLDLOCK)
SET FORCEPLAN ON
OPTION (MAXDOP 1)
Join hints (MERGE/HASH/LOOP)
Isolation level hints WITH (SERIALIZABLE, READCOMMITTED)
Granularity level (UPDLOCK, TABLOCK, TABLOCKX)
What is possibly wrong here?
BEGIN TRAN
UPDATE MyTable SET Col1 = 'x'
WHERE Col1 IN
(SELECT Col1 from MyTable_2)
COMMIT TRAN
[Figure: MyTable, column Col1 with values x, x, y, y, y, …, m, …, z]
BEGIN TRAN
UPDATE MyTable SET Col1 = 'x'
WHERE Col1 IN
(SELECT Col1 from MyTable_2 WITH (NOLOCK) )
COMMIT TRAN
Tip…
If your database is Read Only in
nature, change it to be as such!
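Switching a database to read-only is a single statement. The database name below is hypothetical.

```sql
-- Hypothetical database name.
-- The statement needs exclusive access, so other connections must be closed first.
ALTER DATABASE MyArchiveDb SET READ_ONLY;

-- To load new data later, switch back temporarily:
ALTER DATABASE MyArchiveDb SET READ_WRITE;
```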
The Transaction Log…
T-LOG
What is wrong here?
BEGIN TRAN
UPDATE MyTable SET Col1 = 'x'
WHERE Col1 = 'y'
IF @@ROWCOUNT <> 10
ROLLBACK TRAN
COMMIT TRAN
[Figure: MyTable, column Col1 with values x, x, y, y, y, …, m, …, z; 1000 rows with Col1 = 'y']
What could be possibly wrong here?
BEGIN TRAN
DELETE MyTable
COMMIT TRAN
[Figure: MyTable, 7 million rows, column Col1 with values x, x, y, y, y, …, m, …, z]
T-Log size
Concurrency
How do we “solve” this ?
What if we have a WHERE clause in the DELETE ?
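One common way to approach both questions is to delete in small batches, so that each batch commits on its own. This is a sketch, not necessarily the answer given in the talk; the batch size and the dbo schema are assumptions.

```sql
DECLARE @BatchSize int;
DECLARE @Rows int;
SET @BatchSize = 10000;  -- arbitrary example value
SET @Rows = 1;

-- Each DELETE is its own short transaction: locks are released between
-- batches and the log space of finished batches can be reused.
WHILE @Rows > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.MyTable;    -- a WHERE clause, if any, goes here
    SET @Rows = @@ROWCOUNT;
END

-- When every row must go and no foreign key references the table,
-- TRUNCATE TABLE dbo.MyTable generates far less log than any DELETE.
```

The trade-off is atomicity: once a batch has committed, it can no longer be rolled back together with the rest.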
Transaction Habits
As short as possible
Long transactions:
Reduce concurrency
Blocking and deadlocks more likely
Space in the transaction log cannot be freed while the transaction is open
T-log IO
No “logical” ROLLBACKS!
Triggers…
What is wrong here?
CREATE TRIGGER TRG_MyTable_UP
ON MyTable
AFTER INSERT
AS
UPDATE MyTable
SET InsertDate = getdate()
FROM MyTable
INNER JOIN inserted ON MyTable.PK = inserted.PK
[Figure: MyTable, columns PK and Insert Date; PK values 1, 5, 13, 67, 89, …, 1234, …, 345667]
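One possible answer, assuming the trigger exists only to stamp the insert time: a DEFAULT constraint fills the column as part of the INSERT itself, so each new row is not written a second time. The constraint name and the dbo schema are hypothetical.

```sql
-- Replaces the AFTER INSERT trigger: the value is set during the INSERT,
-- with no follow-up UPDATE of the row that was just written.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT DF_MyTable_InsertDate DEFAULT (GETDATE()) FOR InsertDate;
```

Unlike the trigger, a default applies only when the INSERT does not supply a value for InsertDate.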
Typical Trigger Applications
• Cascading modifications through related tables
• Rolling back changes that violate data integrity
• Enforcing restrictions that are too complex for rules or constraints
• Maintaining duplicate data
• Maintaining columns with derived data
• Performing custom recording
• Try to use constraints instead of triggers, whenever possible.
Tables Design Issues…
Employees table
Column name       Type            Property                                 Key/index
Employee ID       Int             NOT NULL, Identity (values are unique)   Clustered
First Name        Char(100)       NOT NULL
Last Name         Char(100)       NOT NULL
Hire Date         Datetime        NULL
Description       Varchar(8000)   NULL
ContractEndDate   Char(8)         NOT NULL                                 Index
SelfDescription   Varchar(8000)   NOT NULL default ''
Picture           Image           NULL
Comments          Text            NULL
Application rules:
All queries fetch EmployeeID, FirstName, LastName and HireDate WHERE EmployeeID equals or is BETWEEN two values, where ContractEndDate >= getdate()
All other columns are fetched only when the user drills down from the application
FirstName, LastName, HireDate and ContractEndDate rarely change
Comments, Description and SelfDescription are rarely filled in and they never appear in the WHERE clause
Picture column is ALWAYS updated after the row already exists.
Once the contract ends, the data should be saved but will not be queried by the application
First…
Column name       Type                       Property                                 Key/index
Employee ID       Int                        NOT NULL, Identity (values are unique)   Clustered → Clustered UNIQUE
First Name        Char(100) → Varchar(100)   NOT NULL
Last Name         Char(100) → Varchar(100)   NOT NULL
Hire Date         Datetime                   NULL
Description       Varchar(8000)              NULL
ContractEndDate   Char(8) → Datetime         NOT NULL                                 Index
SelfDescription   Varchar(8000)              NOT NULL default '' → NULL
Picture           Image → Varbinary(MAX)     NULL
Comments          Text → Varchar(MAX)        NULL
4 different tables?

Employees (active employees)
Column name       Key/index
Employee ID       Clustered PK
First Name
Last Name
Hire Date
ContractEndDate   Index

Employees details (1:1 with Employees; this is vertical partitioning…)
Column name       Key/index
Employee ID       Clustered PK
Description
SelfDescription
Picture
Comments

OldEmployees (inactive employees)
Column name       Key/index
Employee ID       Clustered PK
First Name
Last Name
Hire Date
Description
ContractEndDate
SelfDescription
Picture
Comments
Horizontal partitioning
Three tables with the same structure, one per Contract Date range:
Contract Date < 2008-01-01
Contract Date >= 2008-01-01 and < 2009-01-01
Contract Date >= 2009-01-01
Column name       Type
Employee ID       INT
First Name        Varchar(100)
Last Name         Varchar(100)
Hire Date         Datetime
ContractEndDate   Datetime
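The same three date ranges can also be expressed with native table partitioning, which may not be what the slide depicted (it could equally be three separate tables). The sketch below uses hypothetical object names and places every partition on the PRIMARY filegroup for simplicity.

```sql
-- RANGE RIGHT puts each boundary value in the partition to its right:
--   partition 1: ContractEndDate <  2008-01-01
--   partition 2: ContractEndDate >= 2008-01-01 and < 2009-01-01
--   partition 3: ContractEndDate >= 2009-01-01
CREATE PARTITION FUNCTION pf_ContractEndDate (datetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20090101');
GO

CREATE PARTITION SCHEME ps_ContractEndDate
AS PARTITION pf_ContractEndDate ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.EmployeesPartitioned
(
    EmployeeID      int          NOT NULL,
    FirstName       varchar(100) NOT NULL,
    LastName        varchar(100) NOT NULL,
    HireDate        datetime     NULL,
    ContractEndDate datetime     NOT NULL,
    -- A unique index on a partitioned table must include the partitioning column.
    CONSTRAINT PK_EmployeesPartitioned
        PRIMARY KEY CLUSTERED (EmployeeID, ContractEndDate)
) ON ps_ContractEndDate (ContractEndDate);
GO
```

In SQL Server 2005/2008, partitioned tables are available only in the Enterprise and Developer editions.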
Tips for the application side…
Beware of…
Server-side cursors prior to .NET 2.0
Sorts and grouping on the client
End-user reporting
Default Transaction isolation levels
Intensive communication with database
Connection pooling
Long transactions
Ad-hoc T-SQL
SQL injection…
Performance Audit Checklist
Does the Transact-SQL code return more data than needed?
Is the interaction between the application and the database server too frequent?
Are cursors being used when they don't need to be? Does the application use server-side cursors?
Are UNION and UNION ALL properly used?
Is SELECT DISTINCT being used properly?
Is the WHERE clause SARGable?
Are temp tables being used when they don't need to be?
Are hints being properly used in queries?
Are views unnecessarily being used?
Are stored procedures and sp_executesql being used whenever possible?
Inside stored procedures, is SET NOCOUNT ON being used?
Do any of your stored procedures start with sp_?
Are all stored procedures owned by DBO, and referred to in the form of databaseowner.objectname?
Are you using constraints or triggers for referential integrity?
Are transactions being kept as short as possible? Does the application keep transactions open when the user is modifying data?
Is the application properly opening, reusing, and closing connections?
Questions/
Autographs
End of Part II…