comprehensive indexing via automated execution plan analysis (execstats) joe chang jchang6 @ yahoo ...

Comprehensive Indexing via Automated Execution Plan

Analysis (ExecStats) Joe Chang

jchang6 @ yahoo www.qdpma.com

Slide deck here

About Joe

• SQL Server consultant since 1999• Query Optimizer execution plan cost formulas (2002)• True cost structure of SQL plan operations (2003?)• Database with distribution statistics only,

– no data 2004• Decoding statblob-stats_stream

– writing your own statistics• Disk IO cost structure• Tools for system monitoring, execution plan analysisSee ExecStats http://www.qdpma.com/ExecStats/SQLExecStats.htmlDownload: http://www.qdpma.com/ExecStatsZip.htmlBlog: http://sqlblog.com/blogs/joe_chang/default.aspx

What is not in this session

• List of rules – to be followed blindly

• without consideration for the underlying reason

– and whether rule actually applies in the current circumstance

DBA skill: cause and effect analysis & assessment- Not enthusiastic, prolific, indiscriminate collector of rules

Why this topic on Indexing?

• Ideal – Good indexes, but no more than necessary

• Reality– Too many nonclustered indexes

• How did this happen?– Poor choice of cluster key (my opinion)– Indexes added one at a time for specific query

• Instead of modifying similar existing index

– DBA not inclined to remove indexes • Afraid to touch the cluster key

Indexes can be created or dropped for specific operations

Going from Current to Good Indexes

• Determine the better choice of cluster key– Can eliminate many nonclustered index by itself• Not hard to determine, but down time required

• Drop indexes not used (over long period)• Consolidate indexes – with leading keys in the same order– Determine if indexes with same keys in different

order can be consolidated• Need to find SQL that uses each index?

Simple

Does this involve work/risk?I don’t like work/risk!Execution plan links SQL to index usage

• Complexities & depth SQL performance– Cause and Effect

• Focus on the execution plan– Inefficient plans – missing indexes– very large estimate/actual row discrepancies

• Comprehensive Index Strategy– few good indexes, but no more than necessary

Preliminary: Correct Results

• Normalization– Data stored once, avoid anomalies

• Unique Keys– Avoid duplicate rows

• Foreign Keys– Avoid orphaned rows

Incorrect architecture requires use of SELECT DISTINCT etc. to correct architecture deficienciesWhich may cause performance problems as wellCorrect action is to address the architecture mistakes before the performance issue.

Performance Big Picture

Natural keys with unique indexes, not SQL

The Execution Plan links all the elements of performanceIndex tuning alone has limited valueOver indexing can cause problems as well

Index and Statistics maintenance policy

1 Logic may need more than one execution plan?

Compile cost versus execution cost?

Tables and SQL combined implement business logic

Plan cache bloat?

SQL Tableskeys & const

Indexes

Execution Plan

StatisticsSampling & Re-compute

Compile parameters &

variables

Storage Engine

Hardware

DOP MemoryParallel plans Recompile

temp table / table variable

Query Optimizer

Index & Stats Maintenance

API Server Cursors: open, prepare, execute, close?

SET NO COUNT Information

messages

Row estimate propagation errors

Indexing Principles

• Good cluster key choice– Grouping + unique, not too wide

• Good nonclustered indexes– For key queries, not necessarily every query – Covered indexes where practical– Create and drop custom indexes for maintenance ops/special circum.

• No more indexes than necessary– Update overhead– Compile overhead– May tolerate occasional scans to avoid update maintenance

Note emphasis on good, not perfect

Using DMVs – Execution Plan

dm_exec_query_stats

dm_exec_sql_text

dm_exec_query_plandm_exec_text_query_plan

dm_db_index_usage_statsdm_db_index_operational_statsdm_db_index_physical_stats

DBCC SHOW_STATISTICSSTATS_DATE(object_id, stats_id)dm_db_stats_properties

Execution PlanIndexes, joinsCompile parameters

System viewsIndexes, key columns,Include list, filter, XML,Columns store etc.

sys.dm_db_stats_properties, is available in SQL Server 2012 starting with Service Pack 1 and in SQL Server 2008 R2 starting with SP2. last_updated, rows, rows_sampled, steps, unfiltered_rows, modification_counter

dm_exec_query_profiles2014 Real time query progress?

Execution Plan maps SQL to Indexes

SQL IndexesExecution Plan

dm_db_index_usage_stats

dm_exec_query_statssql_handle & plan_handle

dm_exec_sql_text dm_exec_query_plandm_exec_text_query_plan

Performance Oriented Approach

• Getting Top SQL from dm_exec_query_stats– Manually examining top execution plans

• Index Reduction – dm_db_index_usage_stats– Drop unused indexes (based on long period)– Consolidating indexes with similar keys

– Infrequently used indexes?• Must hunt down SQL, possibly low item in query stats• Can it use another index?

Systematic Approach

• Get full list of:– stored procedures : schema + name– Scalar Functions (FN) schema + name– Inline & Table Valued Functions (IF,TF)• Need parameter list

– Triggers (should be obsolete, my opinion)• Generate execution plan – match to indexes– Alternative, maintain a list of SQL

Real World Example

July 20143.6B rows, 3.4TB, 1.5TB data 1.8TB indexesKey tables have 21, 8, 14, 4 and 14 nonclustered indexes

Index Reduction

Key tables have 6, 4, 3, 2 and 4 nonclustered indexes(some nonclustered indexes compressed)

Dec 2014 4.6B rows (30%), 2.3TB, 1.7TB data 0.5TB indexes

Compression – base tables

May 2015 5.7B rows, 0.99TB, 0.62TB data 0.36TB indexes

Dec 2014 4.6B rows, 2.3TB, 1.7TB data 0.5TB indexes

Table – 21 nonclustered Indexes

Note: All indexes are full rows, not filtered

Table with 21 NC IndexesJul 2014 5 frequently used NC indexes of 21

Dec – 6 Nonclustered Indexes

Infrequently used indexes could probably be removed by re-working the query

Jul 2014 5 frequently used NC indexes of 21

Dec 2014 4 frequently used, 2 Filtered IX, note lead column

May 2015 – 2 NC used

May 2015 3 frequently used NC, 1 disabled pending removal

Jul 2014 5 frequently used NC indexes of 21

Dec 2014 4 frequently used, 2 Filtered IX, note lead column

Note on Filtered Index Strategy

Query is SELECT xx FROM NARSplit WHERE IsActive = 1 AND CessionId IN (list)

Then the index strategy used is CREATE INDEX IX ON table (CessionId, xx, IsActive) WHERE IsActive = 1

Balancing Select vs Update

In the above table, the nonclustered index with several included columns nearly eliminates key lookups

One column, IsActive, in the include list was frequently updatedRemoving that column reduces need for update maintenance

Since a key lookup is needed anyways, may as well remove all include columns

Compression Notes

• Very high compression was achieved– Because all keys were 16-byte GUID– Even on dimensions, when natural key would have

been 1, 2 or 4 bytes!• Core data + indexes – 648GB data 230GB indexes w/o compression– 174GB data 185GB indexes w/compression

• Reduction in I/O even with SSD storage far outweigh compression overhead!

System memory: 256GB (220) Storage: Violin (NAND Flash)

Violin

Compression& statistics

Indexreduction

HDD StorageGDC

Index Theory - Locality

• Example Database– 10 B rows, 80 bytes per row, 800GB, 100M pages• (100M x 8KB/page = 800GB, 100 rows per page)

• Suppose 1% of rows are active, i.e. 100M rows– There could be 1 active row in each page (100M)• Possible if each table were clustered on a row guid

– Possible for active rows to be in only 1M pages• All rows in each of these page happen to be active

• Build the cluster key to tend toward 2nd option

Index Key SARG + Group/Order

SELECT TransactionType, SUM(Amount)FROM Table WHERE ReportDate = ‘value’AND other SARGSGROUP BY TransactionType

Index should lead with the key SARG, then Group or Order, less selective SARGs can be in Include List

Selective SARG

Grouping

Index Key SARG + Order By

SELECT TransactionType, SUM(Amount)FROM Table WHERE ReportDate = ‘value’AND other SARGSGROUP BY TransactionType

Index should lead with the key SARG, then Group or Order, less selective SARGs can be in Include List

Selective SARG

Grouping

Index Example w & w/o PartitioningWithout partitioningRequired indexes leading with:1) ReportDate - for grouping2) RowId - for single row access

With partitioningPartition asCREATE UNIQUE CLUSTERED INDEX UCX ON Table (RowId, ReportDate) ON psdate (ReportDate)

Partition key is not lead key of cluster indexSearch on RowId must check each partition

Oracle Skip-Scan would be nice

ExecStats

Database view

File IO view

Table view columns

Indexes - continued

Number of execution plans that reference the index in Seeks, Scans, Lookups, Insert/Updates and Deletes

Literal identifying the execution plans that reference the index in Seeks, Scans, Lookups, Insert/Upd & Deletes

Query Execution Stats - 1

Query Execution Stats - more

Dataspace – Partition Scheme view

Partition View

Procedure and Functions

• Columns• Dbid, schema, object, object_id, type,• Create date, modify date, • Number of references (NumRef) • (literal) plan reference (from QExec Stats)• Caller reference (Functions only)

Volumes

Slides not used

Performance Strategy

• Tables – support business logic– Normalization, uniqueness etc.

• SQL – clear SARG, Query optimizer interpretable– 1 Logic maps to X Execution plans

• Indexes – good cluster key choice– Good nonclustered indexes, no more than necessary

• Statistics – sample strategy & update frequency• Compile parameter strategy• Temp table / Table variable strategy: Recompile & Row est. prop. error• Parallel execution plans: DOP and CTOP strategy

Identity key / alternative: large & small customers

• Identify (weight) important SQL statements– stored procedure: parameter values & code path• Recompile impact for temp tables

• Execution plan cross references SQL & indexes– Actual plan is better than estimate plan– Compile parameters & skewed statistics• Temp tables - Recompile impact

Automate Execution Plan analysis to fully cross-reference SQL to index usage

SQL & Execution Plan Sources

• Estimated Execution Plan– dm_exec_query_stats• Contents of plan cache + execution statistics

– List of stored procedures• SELECT name FROM sys.procedures

• Any SQL list– Plans not in cache, to be generated– Can also execute SQL for actual plans

sys.dm_exec_query_stats

• sql_handle – token for batch or stored procedure

• statement_start_offset – sql_handle + offset = SQL statement

• plan_handle – SQL (batch) can have multiple plans on recompile

• query_hash – identify queries with similar logic, – differing only by literal values

sys.procedures

• Get list of stored procedures in database– functions are called from procedure?

• Generate estimated execution plan for each– Default parameters

• Full map of index usage to stored procedure• No trigger details in estimated plan

SQL List

• Configuration file has SQL to retrieve SQL list– Can be • explicit SQL • or stored procedures with parameters

– Same procedure, multiple parameter set• To expose different code path (actual plan)• EXEC proc WITH @P1 RECOMPILE (estimated plan)

About ExecStats• General information

• Execution plan sources1. dm_exec_query_stats2. list of all stored procedures (estimated)3. List of SQL in table (estimated or actual plan)4. Trace file

• Correlates execution plans to index usage• Procedures, functions and triggers• Rollup file IO stats by DB, filegroup, disk/vol, data/log• Distribution Statistics• Output to Excel, sqlplan file, (sql in txt file)

ExecStats Output Files

• Txt – runtime info• Log – abbreviated SQL error logs• Excel –• Missing Indexes DMV • SQL plan directory

This can be sent to someone who can identify and fix your problem

Important Items

• Query cost – plan efficiency? Recompiles?– Compile parameters – skewed statistics

• CPU versus Duration (worker – elapsed time)– Disk IO, network transmission, parallel plan?

• Execution count – network roundtrip?• Plan cost – Parallelism– High volume of quick queries is bad, so is excessive DOP

• Index – current rows, rows at time stats generated, sample rows & date

Execution Plans estimate - actual

• Actual: estimated cost, actual rows, DOP– Compile parameters– Actual rows/executions versus estimated

Execute stored procedure once for each possible code path – with appropriate parameters

Execution Plans Analysis

• Predicate– index key columns does not matching full SARG– SQL has function on SARG, data type mismatch– Compile parameters & statistics– Actual and Estimated rows/execution mismatch– Large table scans: how many rows output?– Rebinds and Rewinds – key lookup– Parallelism

Execution Plans

• Pay attention to:– Compile parameters– Large table scans: how many rows output?– Predicate • search condition without suitable index

– Rebinds and Rewinds – key lookup– Parallelism

Index Usage – missing IX, excess IX?

• Index usage – seek, scan, lookup & update– Unused indexes (infrequent code?) can be dropped

– Infrequent usage: check plan references– Similar indexes (leading keys)• Same keys, different order• Check plan reference – consolidate if possible

• Scans to large tables or even nonclustered IX– Is it real (SELECT TOP 1 may not be a real scan)

• Lookups – can these be reduced?

SQL Server Skills & Roles

DevelopersSQL code

ArchitectTable structure,

unique keys

Data Architect normalization

DBAIndex + Statistics

MaintenanceHardware & Storage

Performance

SQL Server Performance History

• Before DMVs (SQL Server 2000)– Profiler/Trace to get top SQL– Execution plans – not really exportable– Which indexes are actually used?

• Today– Trace/Extended Events sometimes not necessary• If the dm_exec_query_stats content is good

– Execution plans are exportable– Index Usage Stats

How much can be automated?

• Data collection all, of course– Top resource consumers, etc.

• Assessment sometimes– Is there a problem– Can it be fixed or improved

• Fix/Change sometimes– Indexes– SQL – sometimes– Table structure, architecture no

If problems could be solved by pushing a button, what would be the skill requirements to be a DBA?

Great accomplishments – 99% perspiration 1% inspiration

Performance Approaches

• Check against list of “Best Practices”• Manual DMV scripts approach– Find Top 5 or 10 SQL– Fix it if/when there is a problem

• All Indexes and procedures/SQL– Examine the complete set of stored procedures– Or the full list of SQL statements– Good indexes for all SQL, no more indexes than

Why bother when there are no problems?

• No problems for over 1 year– Never bothered to collect performance baseline

• Problem Today – Find it with DMV, fix it – the problem was xxx– but why did it occur today & not before?

• Probably statistics or compile parameters, but prove it?

• Why ExecStats– SQL scripts? – too much manual work– Third party tools? – only find problem

Rigorous Optimization

• Table structure, SQL, Client-side• Cluster Key• Good (nonclustered) Indexes– All indexes are actually used

• No more indexes than necessary– Consolidate similar indexes

• same keys, same order, or reverse order?

– What SQL is impacted?• Statistics update• Index maintenance

Must consider the full set of SQL/procedures in removing indexes?

SQL versus programming languages

• SQL – great for data access– Not good for everything else– When SQL becomes horribly complicated– What would the code looks like in VB/Java/Cxx

Client-side program C#

Performance Information

• Server, Storage• OS & SQL Server Settings• SQL Server– SQL, query execution statistics, execution plan– Compile parameters– Indexes and index usage statistics– Statistics sampling – when? percentage? skew?

comprehensive indexing via automated execution plan analysis (execstats) joe chang jchang6 @ yahoo ...

unique indexes

nonclustered indexes

execution plan links

execution cost

opinion indexes

indexes afraid

cluster key indexes

ideal good indexes

Documents

indexing & retrieval. approaches to indexing key word...

indexing techniques for multimedia databases multimedia...

quantitative performance analysis joe chang...

global indexing & etfs - watchtower wayback...meetup drupal...

yahoo. com ar. soufi yahoo. com nezamdehnavi

tpc-h studies joe chang jchang6@yahoo.com

storage performance for data warehousing joe chang...

sql server scaling on big iron (numa) systems joe chang...

exceptional performance evolution at yahoo! steve souders...

insert, update & delete performance joe chang...

multidimensional indexing: spatial data management & high...

1312 ieee transactions on knowledge and data...

multidimensional indexing: spatial data management & high...

yahoo vs yahoo

yahoo! my yahoo! mail make y! your

sql server 2005 performance enhancements for large queries...

【fy17 q3-q4】yahoo!ダイレクトオファー...

new special machines 512 october 2019 · 2020. 2. 6. ·...

dri2020 - universitetet i oslo...semantic classification of...

file processing - indexing mvnc1 indexing jim skon