Advanced SQL Programming for SQL Server
Indexing for Performance
About Soaring Eagle
Since 1997, Soaring Eagle Consulting has been helping enterprise clients improve their overall system performance at the database tier, arguably the most volatile and critical component of distributed application architecture. Our clients range in size from startups through Fortune 100 companies and leading financial institutions. Soaring Eagle has been a leader in managed services, database architecture, and database performance and tuning, while promoting mentoring and training all over the world for over a decade and a half. Many of our employees have written books and often speak at seminars about leading-edge technologies. We have expertise in all business tiers: financial, health, manufacturing, government agencies, and many ecommerce businesses. We understand that for all business types and sizes, PERFORMANCE MATTERS.
Consulting • Performance & Tuning • Data Performance Management • Emergency Triage • Performance & Security Audits • Staff Augmentation • Project management • Database architecture • Scalability assessment and planning
Training • Onsite/Web based • Microsoft • Sybase • Oracle • APM • Six Sigma
Software • Application Performance Management • Database performance management • Database Security
Managed Services • Remote Database Management • Performance management • Emergency db Service • Proactive mitigation • Problem notification • Problem resolution
2 - 60 © Soaring Eagle Consulting 8/19/2013
Acknowledgements
• Microsoft SQL Server and SSMS are trademarks of Microsoft Corporation.
• This presentation is copyrighted by Soaring Eagle Consulting, August 7, 2013
• This presentation is not for re-sale
• This presentation shall not be used, modified, or redistributed without express written consent of Soaring Eagle Consulting, Inc.
Topics
• Examine detailed topics in query optimization
• Indexes with SARGs
• Improvised SARGs
• Clustered vs. nonclustered indexes
• Queries with OR
• Index covering
• Forcing index selection
• Filtered Indexes
SQL Server Search Techniques
• SQL Server uses three basic search techniques for query resolution
– Table Scans
– Index Searches
– Covered Index Searches
Table Scans
• If SQL Server can’t resolve a query any other way, it does a table scan
– Scans are expensive
– Still, a table scan is sometimes the best way to resolve a query
– If there is a clustered index on the table, SQL Server will use it (scanning or seeking the clustered index) instead of performing a table scan
Table Scan Search
select * from pt_tx where id = 1
Table Scans (Cont’d)
• Verify table scans with: set statistics io on
Table 'pt_tx'. Scan count 1, logical reads 38, physical reads 0, read-ahead reads 0
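To reproduce this kind of output, the sketch below builds a hypothetical heap table, dbo.scan_demo (not part of the original material), with no indexes at all, so the lookup on id can only be resolved by a table scan. With set statistics io on, SQL Server reports the scan count and page reads in the Messages output; the exact numbers will differ from the pt_tx figures above.

```sql
-- Hypothetical scratch table with no clustered or nonclustered index (a heap).
if object_id('dbo.scan_demo', 'U') is not null
    drop table dbo.scan_demo;

create table dbo.scan_demo
(
    id      int       not null,
    payload char(200) not null   -- widens rows so the table spans many pages
);

-- Load 10,000 rows from a row-number generator over system views.
with n as
(
    select top (10000) row_number() over (order by (select null)) as id
    from sys.all_objects as a
    cross join sys.all_objects as b
)
insert into dbo.scan_demo (id, payload)
select id, 'x'
from n;

-- Report scan count and logical/physical reads for each statement.
set statistics io on;

select * from dbo.scan_demo where id = 1;   -- no index on id: table scan

set statistics io off;
```

Adding a clustered index on id and rerunning the query should replace the table scan with a clustered index seek and far fewer logical reads.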
Table Scan Output: Update (showplan)
update pt_tx set id = id + 1
Index Selection
Topics
– Optimizer selection criteria
– When indexes slow access
– Index statistics and usage
Optimizer Selection Criteria
• During the index selection phase of optimization, the optimizer decides which (if any) indexes best resolve the query
• Identify which indexes match the where and join clauses
• Estimate rows to be returned
• Estimate page reads
SARG Matching
• Indexes usually correspond with search arguments (SARGs)
• Useful indexes will specify a row or rows, or set boundaries for the result set
• An index may be used if any column of the index matches the SARG
where dob between '3/3/1941' and '4/4/65'
create unique index nci on authors
(au_lname, au_fname)
SARG Matching (Cont’d)
Which of the following queries (if any) could be helped by the index?
create index nci on authors
(au_lname, au_fname)
select * from authors
where au_fname = 'Jim' and au_lname = 'Smith'
select * from authors
where au_fname = 'Jim'
select * from authors
where au_lname = 'Smith' or au_fname = 'Jim'
If there are not enough rows in the table, indexes that look useful may never be used
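A minimal sketch for trying the three queries, assuming a hypothetical dbo.demo_authors table (not the pubs authors table) filled with generated names; 'L7' and 'F3' stand in for 'Smith' and 'Jim'. The comments describe the plans one would typically expect, but as the slide warns, the optimizer may still prefer a scan, especially on small tables.

```sql
if object_id('dbo.demo_authors', 'U') is not null
    drop table dbo.demo_authors;

create table dbo.demo_authors
(
    au_id    int identity(1,1) primary key,   -- clustered by default
    au_lname varchar(40) not null,
    au_fname varchar(20) not null,
    phone    char(12)    not null default 'UNKNOWN'
);

-- 20,000 rows: 200 last names x 100 first names, each pair appearing once.
with n as
(
    select top (20000) row_number() over (order by (select null)) - 1 as i
    from sys.all_objects as a
    cross join sys.all_objects as b
)
insert into dbo.demo_authors (au_lname, au_fname)
select 'L' + cast(i % 200 as varchar(10)),
       'F' + cast(i / 200 as varchar(10))
from n;

create index nci on dbo.demo_authors (au_lname, au_fname);

-- The leading column au_lname is a SARG: an index seek on nci is possible.
select * from dbo.demo_authors
where au_fname = 'F3' and au_lname = 'L7';

-- Only the second key column is restricted: nci cannot be seeked,
-- so expect a scan (of nci or of the clustered index).
select * from dbo.demo_authors
where au_fname = 'F3';

-- OR across two columns: nci alone cannot seek the au_fname branch,
-- so a scan is the likely plan.
select * from dbo.demo_authors
where au_lname = 'L7' or au_fname = 'F3';
```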
Index Selection
Topics
• Review of index types
• Optimizer selection criteria
• When indexes slow access
• Index statistics and usage
Index Types
• SQL Server provides three types of indexes
– Clustered
– Nonclustered
– Full text
• One clustered index per table
– Data is maintained in clustered index order
• Up to 999 nonclustered indexes per table (249 before SQL Server 2008)
– Nonclustered indexes maintain pointers to rows
• Full text is discussed in a later section
Clustered Index Mechanism
• With a clustered index, there will be one entry on the last intermediate index level page for each data page
• The data page is the leaf or bottom level of the index
• (Assume a clustered index on last name)
[Figure: Root Page → Intermediate Page → Data Page; data rows stored in last-name order]
Non-clustered Index Mechanism
• The nonclustered index has an extra leaf level of page/row pointers
• Data placement is not affected by nonclustered indexes
• (Assume an NCI on first name)
[Figure: Root Page → Intermediate Page → Leaf Page (first names with row pointers) → Data Page; data rows stay in their existing order]
Clustered vs. Nonclustered
• A clustered index tends to be 1 I/O faster than a nonclustered index for a single-row lookup
• Clustered indexes are excellent for retrieving ranges of data
• Clustered indexes are excellent for queries with order by
• Nonclustered indexes are a bit slower and take up additional disk space (a separate leaf level), but are frequently the next best alternative to a table scan
• Nonclustered indexes may cover the query for maximal retrieval speed
– For covered queries, nonclustered indexes can be faster than the clustered index
NOTE:
– When creating a clustered index, you need free space in your database approximately equal to 120% of the total table size
– ALWAYS have a clustered index on all of your tables
Using Indexes
Clustered Index Indications
• Columns searched by range of values
• Columns by which the data is frequently sorted (order by or group by)
• Sequentially accessed columns
• Join columns (if other than the primary key)
• Static columns (why?)
Nonclustered Index Indications
• NCI selection tends to be much more effective if less than about 20% (well, closer to 5% really) of the data is to be accessed
• NCIs help sorts, joins, group by clauses, etc., if other column(s) must be used for the CI
• Index covering
Other Index Limitations
• Maximum 16 key columns
• Maximum 900 bytes of index key width
Note: – “Include” columns do not count toward these limits
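A sketch of the INCLUDE clause on a hypothetical dbo.demo_products table: sku is the only key column, while the wide description column is stored only at the leaf level of the index. It therefore does not count toward the 16-key-column / 900-byte key limits, yet the index still covers a query that selects both columns.

```sql
if object_id('dbo.demo_products', 'U') is not null
    drop table dbo.demo_products;

create table dbo.demo_products
(
    product_id  int identity(1,1) primary key,
    sku         varchar(30)   not null,
    description varchar(2000) not null   -- could exceed 900 bytes as a key column
);

-- sku is the index key; description is carried only in the leaf level.
create nonclustered index ix_demo_products_sku
    on dbo.demo_products (sku)
    include (description);

-- Covered by ix_demo_products_sku: no lookup into the clustered index is needed.
select sku, description
from dbo.demo_products
where sku = 'ABC-123';
```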
Primary Key vs. Clustering vs. Nonclustering
• A primary key is a logical concept, not a physical concept
• Indexes are physical concepts, not logical concepts
• There is a strong correlation between the logical concept of a key and the physical concept of an index
• When you define relationships as part of table design, you should build indexes to support the joins / lookups (SQL Server does not create indexes for foreign keys automatically)
• By default, when you define a primary key, SQL Server creates a unique clustered index on the table (unless a clustered index already exists or you declare the key NONCLUSTERED)
– Unique is good, clustered isn’t always good
• When you define a clustered index, the server automatically appends the clustering key column(s) (plus a hidden uniquifier, if the key is not unique) to the nonclustered indexes
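When the primary key is a poor clustering choice, it can be declared NONCLUSTERED and the clustered index placed elsewhere. The sketch below uses a hypothetical dbo.demo_invoices table with a GUID primary key (random GUID values would scatter inserts across a clustered index) and clusters on invoice_date instead, which supports range searches by date. Because invoice_date is not unique, SQL Server adds a hidden uniquifier to duplicate clustering keys, and that clustering key is carried in every nonclustered index, including the primary key's.

```sql
if object_id('dbo.demo_invoices', 'U') is not null
    drop table dbo.demo_invoices;

create table dbo.demo_invoices
(
    invoice_id   uniqueidentifier not null
        constraint pk_demo_invoices primary key nonclustered,   -- unique, not clustered
    customer_id  int      not null,
    invoice_date datetime not null
);

-- Cluster on the column that is searched by range instead.
create clustered index cix_demo_invoices_date
    on dbo.demo_invoices (invoice_date);

-- A range query that benefits from the clustered order.
select invoice_id, customer_id
from dbo.demo_invoices
where invoice_date >= '20130101' and invoice_date < '20130201';
```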
Key / index features
• Columns that are not part of the index key can be included in nonclustered indexes. Including the nonkey columns in the index can speed queries (Index covering) and can exceed the current index size limitations of a maximum of 16 key columns and a maximum index key size of 900 bytes
• The new ALLOW_ROW_LOCKS and ALLOW_PAGE_LOCKS options in CREATE INDEX and ALTER INDEX can be used to control the level at which locking occurs for the index
• The query optimizer can match more queries to indexed views than in previous versions, including queries that contain scalar expressions, scalar aggregate and user-defined functions, interval expressions, and equivalency conditions
• Indexed view definitions can also now contain scalar aggregate and user-defined functions with certain restrictions.
– (More on “Views”)
Optimizer Selection Criteria
• During the index selection phase of optimization, the optimizer decides which (if any) indexes best resolve the query
– Identify which indexes match the clauses
– Estimate rows to be returned
– Estimate page reads
Index Selection Examples
1. What index will optimize this query?
select title
from titles
where title = 'Stranger in a Strange Land'
2. What indexes optimize these queries?
select title
from titles
where price between $5. and $15.
3. In the second query, what would the net effect be of changing the range to this?
between $500 and $600
CI vs. NCI
select title from titles where price between $5. and $15.
Table facts:
• 2,000,000 titles (= 14,492 pages)
• 138 rows / page
• 1 million rows in the range
Index used                          Page reads
Clustered index                     7,247 + index levels
Nonclustered index (worst case)     1,000,000 + index pages
No index (table scan)               14,492
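The page-read figures come from simple arithmetic, assuming rows are packed evenly at 138 per page, that the 1 million qualifying rows are contiguous in the clustered index, and, for the nonclustered worst case, that every qualifying row sits on a different data page and costs one read:

```latex
\begin{aligned}
\text{table scan} &: \frac{2{,}000{,}000}{138} \approx 14{,}492.8 \text{ pages} \\
\text{clustered index} &: \left\lceil \frac{1{,}000{,}000}{138} \right\rceil = 7{,}247 \text{ pages} + \text{index levels} \\
\text{nonclustered index (worst case)} &: 1{,}000{,}000 \times 1 \text{ page} + \text{index pages}
\end{aligned}
```

Under this worst-case model, the nonclustered index loses to the table scan as soon as more rows qualify than the table has pages, which here is roughly 14,500 rows, or under 1% of the table.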
CI vs. NCI
• It is feasible, occasionally likely, that a table scan is faster than using a nonclustered index for specific queries
• The server evaluates the options at optimization time and selects the plan with the lowest estimated cost
Or Indexing
Questions
– What indexes should (could) be used?
– Will a compound index help?
– Which column(s) should be indexed?
select title
from titles
where price between $5. and $10.
or type = 'computing'
Or Indexing (Cont’d)
– How is the following query different (from a processing standpoint)?
select title
from titles
where price between $5. and $15.
and type = 'computing'
– What is a useful index for this query?
select *
from authors
where au_fname in ('Fred', 'Sally')
Or Clauses
Format: SARG or SARG
select *
from authors
where au_lname = 'Smith'
or au_fname = 'Fred'
select *
from authors
where au_lname in ('Smith', 'Jones', 'N/A')
(How many indexes may be useful?)
Or Strategy
• An or clause may be resolved via a table scan, a multiple match index, or the or strategy
Table Scan
• Each row is read, and criteria applied
• Matching rows are returned in the result set
• Chosen when the cost of all the index accesses is greater than the cost of a table scan
• Also chosen when at least one of the clauses names a column that is not indexed, so the only way to resolve the clause is to perform a table scan
Or Strategy (Cont’d)
Multiple match index
• Using each part of the or clause, select an index and retrieve the row
• Only used if the result sets cannot return duplicate rows
• Rows are returned to the user as they are processed
Or Strategy (Cont’d)
Or Strategy
• Each component of the or is retrieved
• Result set is merge-joined (think “union”) to eliminate duplicates
• Distinct rows are returned to the client
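The or strategy can be imitated by hand with union, which is sometimes a useful way to check whether per-branch index seeks would help. The sketch assumes a hypothetical four-row dbo.demo_titles table. The rewrite is only equivalent when the select list includes a unique key (title_id here); otherwise union would also collapse genuinely different rows that happen to look alike.

```sql
-- Hypothetical table: one row per combination of the two predicates.
if object_id('dbo.demo_titles', 'U') is not null
    drop table dbo.demo_titles;

create table dbo.demo_titles
(
    title_id int identity(1,1) primary key,
    title    varchar(80) not null,
    type     char(12)    not null,
    price    money       null
);

insert into dbo.demo_titles (title, type, price)
values ('A', 'computing', $7.00),   -- matches both predicates
       ('B', 'computing', $50.00),  -- matches the type predicate only
       ('C', 'cooking',   $9.00),   -- matches the price predicate only
       ('D', 'cooking',   $40.00);  -- matches neither

create index ix_demo_titles_price on dbo.demo_titles (price);
create index ix_demo_titles_type  on dbo.demo_titles (type);

-- The or clause as written: the optimizer picks a scan or an index strategy.
select title_id, title
from dbo.demo_titles
where price between $5.00 and $10.00
   or type = 'computing';

-- The or strategy spelled out: each branch can seek its own index,
-- and union removes the duplicate produced by row 'A'.
select title_id, title from dbo.demo_titles where price between $5.00 and $10.00
union
select title_id, title from dbo.demo_titles where type = 'computing';
```

Both queries return rows A, B, and C.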
Index Selection and the Select List
Questions
– What is the best index?
– Do the columns being selected have a bearing on the index?
select * from publishers where pub_id = 'BB1111'
Index Selection and the Select List
Question
– Should there be a difference between the utilization of the following two indexes?
select royalty
from titles
where price between $10 and $20
create index idx1 on titles (price)
/* or */
create index idx2 on titles (price, royalty)
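One way to answer the question empirically: build both indexes on a hypothetical dbo.demo_royalties table, force each one in turn with a table hint (for comparison only), and compare the logical reads reported by set statistics io. With idx1, each qualifying row needs a lookup into the clustered index to fetch royalty; idx2 contains both columns, so it covers the query.

```sql
if object_id('dbo.demo_royalties', 'U') is not null
    drop table dbo.demo_royalties;

create table dbo.demo_royalties
(
    title_id int identity(1,1) primary key,
    title    char(200) not null default '',   -- padding to widen data rows
    price    money     not null,
    royalty  int       not null
);

-- 50,000 rows with whole-dollar prices from $0 to $99.
with n as
(
    select top (50000) row_number() over (order by (select null)) as i
    from sys.all_objects as a
    cross join sys.all_objects as b
)
insert into dbo.demo_royalties (price, royalty)
select i % 100, i % 25
from n;

create index idx1 on dbo.demo_royalties (price);
create index idx2 on dbo.demo_royalties (price, royalty);

set statistics io on;

-- idx1: seek on price, then a key lookup per row to fetch royalty.
select royalty from dbo.demo_royalties with (index(idx1))
where price between $10 and $20;

-- idx2: the index alone answers the query (covered).
select royalty from dbo.demo_royalties with (index(idx2))
where price between $10 and $20;

set statistics io off;
```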
Composite Indexes
• Composite (compound) indexes may be selected by the server if the first column of the index is specified in a where clause, or if it is a clustered index
create index idx1
on employee (minit, job_id, job_lvl)
Composite Indexes (Cont’d)
Which queries may use the index?
create index idx1
on employee (minit, job_id, job_lvl)
select * from employee
where minit = 'A'
and job_id != 4
and job_lvl = 135
select * from employee
where job_id != 4
and job_lvl = 135
select *
from employee
where minit = 'A'
and job_lvl = 135
Final Exam Question
select pub_id, title, notes
from titles
where type = 'Computer'
and price > $15.
• CI or NCI on type
• CI or NCI on price
• One index on each of type & price
• Composite on type, price
• Composite on price, type
• CI or NCI on type, price, pub_id, title, notes
Which are the best options in which circumstances?
Index Usefulness
• It is imperative to be able to estimate the rows returned for an index. Therefore, the server estimates the rows returned before choosing an index
• If statistics are available (When would they not be?), the server estimates the number of rows using the histogram or index density
• SQL Server automatically generates statistics about index key distributions using efficient sampling algorithms
• If you have an equality join on a unique index, the server knows only one row will match and doesn't need to use statistics
• The Database Engine Tuning Advisor can analyze a query and recommend indexes
• The more selective an index is, the more useful the index
Data Distribution
You have a 1,000,000 row table.
The unique key has a range (and random distribution) of 0 to 10,000,000
Questions
• How many rows will be returned by the following query?
select *
from [table]
where [key] between 1000000 and 2000000
• How does the optimizer know whether to use an index or table scan?
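If the optimizer had only the key domain to go on and the random distribution were uniform, the estimate would be proportional: the range covers one tenth of the 0 to 10,000,000 domain, so about one tenth of the rows qualify. In practice the optimizer reads the index histogram rather than assuming uniformity across the whole domain.

```latex
\hat{n} \approx N \cdot \frac{\text{range width}}{\text{domain width}}
        = 1{,}000{,}000 \cdot \frac{2{,}000{,}000 - 1{,}000{,}000}{10{,}000{,}000}
        = 100{,}000 \text{ rows}
```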
Index Statistics
• There is distribution data stored for every index
• The optimizer uses this information to estimate the number of rows returned for a query
• The distribution information is built at index creation time and, if automatic statistics updating is enabled, refreshed by the server once enough rows have changed and a query needs the statistics
– (this is the default, but most shops set up a job to do it periodically to make sure it happens)
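A sketch of checking and maintaining statistics, using a hypothetical dbo.demo_stats table. The first query shows whether automatic creation and updating of statistics are on for the current database; update statistics is the kind of statement a periodic maintenance job might run (sp_updatestats is the database-wide alternative), and stats_date shows when each statistics object was last refreshed.

```sql
-- Are automatic statistics creation and update enabled for this database?
select name, is_auto_create_stats_on, is_auto_update_stats_on
from sys.databases
where database_id = db_id();

-- Hypothetical table for the demonstration.
if object_id('dbo.demo_stats', 'U') is not null
    drop table dbo.demo_stats;

create table dbo.demo_stats
(
    id  int not null primary key,
    val int not null
);

with n as
(
    select top (10000) row_number() over (order by (select null)) as i
    from sys.all_objects as a
    cross join sys.all_objects as b
)
insert into dbo.demo_stats (id, val)
select i, i % 50
from n;

create index ix_demo_stats_val on dbo.demo_stats (val);

-- Rebuild the table's statistics by reading every row.
update statistics dbo.demo_stats with fullscan;

-- When was each statistics object on the table last updated?
select s.name, stats_date(s.object_id, s.stats_id) as last_updated
from sys.stats as s
where s.object_id = object_id('dbo.demo_stats');
```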
Viewing Index Statistics
Viewed with dbcc show_statistics:
dbcc show_statistics (table_name, index_name)
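A sketch of viewing index statistics on a hypothetical dbo.demo_hist table whose keys span 100 to 10,000,000, loosely echoing the data-distribution example (100,000 evenly spaced rows rather than 1,000,000 random ones). Without options, dbcc show_statistics returns the statistics header, the density vector, and the histogram; the with stat_header, density_vector, and histogram options return one section each.

```sql
if object_id('dbo.demo_hist', 'U') is not null
    drop table dbo.demo_hist;

create table dbo.demo_hist
(
    key_val int       not null,
    filler  char(100) not null default ''
);

-- 100,000 unique keys spaced 100 apart: 100, 200, ..., 10,000,000.
with n as
(
    select top (100000) row_number() over (order by (select null)) as i
    from sys.all_objects as a
    cross join sys.all_objects as b
)
insert into dbo.demo_hist (key_val)
select i * 100
from n;

create unique index ix_demo_hist_key on dbo.demo_hist (key_val);

-- Full output: header, density vector, and histogram.
dbcc show_statistics ('dbo.demo_hist', ix_demo_hist_key);

-- Individual sections.
dbcc show_statistics ('dbo.demo_hist', ix_demo_hist_key) with stat_header;
dbcc show_statistics ('dbo.demo_hist', ix_demo_hist_key) with density_vector;
dbcc show_statistics ('dbo.demo_hist', ix_demo_hist_key) with histogram;
```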
When to Force Index Selection
Don't Do it
– With every release of the server, the optimizer gets better at selecting optimal query paths
– Forcing the optimizer to behave in a specific manner does not allow it the freedom to change selection as data skews
– It also does not permit the optimizer to take advantage of new strategies as advances are made in the server software
When to Force Index Selection (Cont’d)
Exceptions
• When you (the developer) have information about a table that SQL Server will not have at the time the query is processed (e.g., using a temp table in a nested stored procedure)
• Occasions when you've proven the optimizer wrong
How to Force Index Selection
• Preferably, identify why the optimizer picked incorrectly instead
• If you must force an index, use table hints:
select *
from titles with (index(titleind))
join publishers with (index(UPKCL_pubind))
on titles.pub_id = publishers.pub_id
Filtered Indexes
• Essentially an index with a where clause
• A nonclustered index designed to cover queries for a specific subset of data
• The filter predicate is a where clause defined in the index definition; this predicate limits the index to a subset of the rows in the table
• By filtering the index to a specific set, preferably a significantly smaller set, SQL Server has a smaller index structure to parse, producing faster queries and less index overhead
• A well-designed filtered index can improve query performance, reduce index maintenance costs, and reduce index storage costs compared with full-table indexes
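A sketch of a filtered index for a column that is mostly NULL, using a hypothetical dbo.demo_accounts table: only the rows with a closed_reason are stored in the index. Filtered indexes require certain session settings (including ANSI_NULLS and QUOTED_IDENTIFIER ON), which SSMS enables by default but some command-line tools do not, so the script sets them explicitly.

```sql
set ansi_nulls on;
set quoted_identifier on;

if object_id('dbo.demo_accounts', 'U') is not null
    drop table dbo.demo_accounts;

create table dbo.demo_accounts
(
    account_id    int identity(1,1) primary key,
    account_name  varchar(50) not null,
    closed_reason varchar(50) null      -- NULL for the vast majority of rows
);

-- Filter predicate: only rows that actually have a closed_reason are indexed.
create nonclustered index fix_demo_accounts_closed
    on dbo.demo_accounts (closed_reason)
    where closed_reason is not null;

-- closed_reason = 'fraud' implies closed_reason is not null,
-- so the filtered index is a candidate for this query.
select account_id, closed_reason
from dbo.demo_accounts
where closed_reason = 'fraud';
```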
Design Considerations
• This is dependent on what queries are used by your application and how it is queried
• Identify key subsets of data
• Some common subsets
– Columns with mostly NULL values
– Columns with heterogeneous categories of values
– Columns with distinct ranges of values
Filtered Indexes for Subsets of Data
• When a column only has a very limited domain
• For example
– if you have a column with only 4 valid values
• A separate filtered index can be created for each of the values
• Each index will be smaller than a single index for the whole table
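The one-index-per-value pattern above can be sketched with a hypothetical dbo.demo_tickets table whose ticket_status column allows only four values; each filtered index holds just the rows for its own status.

```sql
set ansi_nulls on;
set quoted_identifier on;

if object_id('dbo.demo_tickets', 'U') is not null
    drop table dbo.demo_tickets;

create table dbo.demo_tickets
(
    ticket_id     int identity(1,1) primary key,
    ticket_status char(1)  not null
        check (ticket_status in ('N', 'O', 'H', 'C')),   -- new, open, on hold, closed
    opened_on     datetime not null
);

-- One small filtered index per status value.
create index fix_demo_tickets_new    on dbo.demo_tickets (opened_on) where ticket_status = 'N';
create index fix_demo_tickets_open   on dbo.demo_tickets (opened_on) where ticket_status = 'O';
create index fix_demo_tickets_hold   on dbo.demo_tickets (opened_on) where ticket_status = 'H';
create index fix_demo_tickets_closed on dbo.demo_tickets (opened_on) where ticket_status = 'C';

-- A query naming a single status can use that status's index.
select ticket_id, opened_on
from dbo.demo_tickets
where ticket_status = 'O'
  and opened_on >= '20130101';
```

If queries frequently span several statuses at once, a single ordinary index on (ticket_status, opened_on) may serve them better than four filtered ones.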
Examples
• Here are some examples of filter predicates for the Production.BillOfMaterials table (AdventureWorks):
• WHERE StartDate > '20000101' AND EndDate <= '20000630'
• WHERE ComponentID IN (533, 324, 753)
• WHERE StartDate IN ('20000404', '20000905') AND EndDate IS NOT NULL
Identifying Missing Indexes
• Microsoft has implemented dynamic management views (DMVs) to identify missing indexes; that is, a list of indexes that the SQL Server optimizer thinks will improve performance if added
– sys.dm_db_missing_index_details
– sys.dm_db_missing_index_group_stats
– sys.dm_db_missing_index_groups
• For reference, see article: http://msdn.microsoft.com/en-us/library/ms345405.aspx
• Do not simply apply all index recommendations without some analysis:
– Make sure each index is recommended for a query which is running often enough to merit an index
– Make sure there’s not a lot of overlap (there often is, especially with “include” recommendations)
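A sketch of the usual join across the three missing-index DMVs, limited to the current database. The rough_benefit column is a commonly used ranking heuristic (seeks × average cost × average impact), not a value SQL Server reports. The query needs VIEW SERVER STATE permission, and the DMV contents are cleared when the instance restarts, so they reflect only the workload since then.

```sql
-- Top missing-index suggestions for the current database.
select top (20)
       mid.statement                as table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.user_scans,
       migs.avg_total_user_cost,
       migs.avg_user_impact,
       migs.user_seeks * migs.avg_total_user_cost
           * (migs.avg_user_impact / 100.0) as rough_benefit   -- heuristic only
from sys.dm_db_missing_index_details as mid
join sys.dm_db_missing_index_groups as mig
  on mig.index_handle = mid.index_handle
join sys.dm_db_missing_index_group_stats as migs
  on migs.group_handle = mig.index_group_handle
where mid.database_id = db_id()
order by rough_benefit desc;
```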
Summary
• The optimizer uses indexes to improve query performance when possible
• All indexes are not the same
• Queries with OR may require a table scan
• Try to take advantage of covered queries
• Be careful when forcing an index
• Filtered indexes: Cool new toy for SQL 2008
Syntax
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
    ON <object> ( column [ ASC | DESC ] [ ,...n ] )
    [ INCLUDE ( column_name [ ,...n ] ) ]
    [ WHERE <filter_predicate> ]
    [ WITH ( <relational_index_option> [ ,...n ] ) ]
    [ ON { partition_scheme_name ( column_name ) | filegroup_name | default } ]
    [ FILESTREAM_ON { filestream_filegroup_name | partition_scheme_name | "NULL" } ]
<relational_index_option> ::=
{
    PAD_INDEX = { ON | OFF }
  | FILLFACTOR = fillfactor
  | SORT_IN_TEMPDB = { ON | OFF }
  | IGNORE_DUP_KEY = { ON | OFF }
  | STATISTICS_NORECOMPUTE = { ON | OFF }
  | DROP_EXISTING = { ON | OFF }
  | ONLINE = { ON | OFF }
  | ALLOW_ROW_LOCKS = { ON | OFF }
  | ALLOW_PAGE_LOCKS = { ON | OFF }
  | MAXDOP = max_degree_of_parallelism
  | DATA_COMPRESSION = { NONE | ROW | PAGE }
}
Lab: Indexes vs. Table Scans
select count(street1)
from pt_sample_NCkey2
where key2 between ? and ?
1. Use execution plans to observe what ranges of values tend to use an index versus a table scan to resolve this query.
2. Retrieve and analyze the statistics for the index NCkey2.
3. What indexes can be added to improve the query performance?
4. Add the indexes and check the query plan.
5. Change street1 in the query to key2. How does this affect your query plan?
Jeff Garbus – Email me for a copy! [email protected]
813.641.3434
mssqlperformance.blogspot.com
http://www.youtube.com/user/soaringeagledba/
Microsoft Transact - SQL, The Definitive Guide
– 35% off from jblearning.com. Use code "GARBUS"
Upcoming Webinars
Check our web site: www.soaringeagle.guru
Find us on Social Media @SoaringEagleDBA
Like us on Facebook: Facebook.com/SoaringEagleDBA
Thank You! – Questions?