Application Performance
(based on C. Mullins, Database Administration)

ISEL-DEETC-SSTI - Lara Santos

Upload: bernard-gilmore

Post on 30-Dec-2015


Designing Applications for Relational Access

Design issues to examine when application performance suffers:

• Type of SQL
• Programming language
• Transaction design and processing
• Locking strategy
• COMMIT strategy
• Batch processing
• Online processing

Relational Optimization

• The DBA must be familiar with the optimization techniques used by each DBMS in the organization

• Developers must code efficient SQL and understand how to optimize SQL

• The optimizer is the heart of a relational database management system.

The optimizer

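The optimizer is an inference engine responsible for determining the best possible database navigation strategy for any given SQL request.

[Figure: the developer specifies WHAT data is needed, the DBMS knows WHERE the data is located, and the optimizer decides HOW to access it.]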

Relational Optimization

• Relational optimization is very powerful because it allows queries to adapt to a changing database environment.

• Regardless of how the data is physically stored and manipulated, SQL can be used to access data, and the DBMS will take the current state of the database into account to optimize that data access.

• This separation of access criteria from physical storage characteristics is called physical data independence.

• Modern relational optimizers are cost based, meaning that the optimizer will attempt to formulate an access path for each query that reduces overall cost.

• To function in this manner, the optimizer must evaluate and analyze multiple factors, including estimated CPU and I/O costs, database statistics, and the actual SQL statement.

CPU and I/O costs

• Based on CPU information, the optimizer can arrive at a rough estimate of the CPU time required to run the query using each optimized access path it analyzes.

• The optimizer estimates the cost of I/O to the query by using a series of formulas based on the database statistics, the data cache efficiency, and the cost of I/O to intermediate work files

Database Statistics

• A relational optimizer is of little use without accurate statistics.

• A relational DBMS provides a utility program or command to gather statistics about database objects and to store them for use by the optimizer (or by the DBA for performance monitoring).

• To collect statistics in SQL Server, the UPDATE STATISTICS command is issued.

• The DBA should collect updated statistics whenever a significant volume of data has been added or modified.

• The DBMS collects statistical information such as:
– Number of rows in the tablespace, table, or index
– Number of unique values stored in the column
– Most frequently occurring values for columns
– Index key density
– Details on the ratio of clustering for clustered tables
– Correlation of columns to other columns
– Structural state of the index or tablespace
– Amount of storage used by the database object

• Create a script to populate production statistics into the test system.
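
As a rough illustration of statistics gathering, the sketch below uses SQLite through Python's standard sqlite3 module rather than SQL Server; SQLite's ANALYZE command plays a role comparable to UPDATE STATISTICS. The employee table, the idx_emp_dept index, and the generated rows are assumptions made up for this example.

```python
import sqlite3

# In-memory database; table, index, and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, dept TEXT)")
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept)")
conn.executemany(
    "INSERT INTO employee (empno, dept) VALUES (?, ?)",
    [(i, f"D{i % 10:02d}") for i in range(1000)],
)

# ANALYZE gathers statistics and stores them in the sqlite_stat1 catalog table,
# where the SQLite query planner can read them.
conn.execute("ANALYZE")

stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1 WHERE idx = 'idx_emp_dept'"
).fetchall()
for tbl, idx, stat in stats:
    # 'stat' is a text field holding a row-count estimate followed by
    # the average number of rows per distinct key value.
    print(tbl, idx, stat)
```

Other DBMSs expose different commands and catalog tables, so the names above should not be read as portable.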

Query Analysis

• The query analysis scans the SQL statement to determine its overall complexity

• The complexity of the query, the number and type of predicates, the presence of functions, and the presence of ordering clauses enter into the estimated cost that is calculated by the optimizer

During query analysis, the optimizer analyzes aspects of the SQL statement and the database system, such as:

• Which tables in which database are required
• Whether any views are required to be broken down into underlying tables
• Whether table joins or subselects are required
• Which indexes, if any, can be used
• How many predicates (WHERE clauses) must be satisfied
• Which functions must be executed
• Whether the SQL uses OR or AND
• How the DBMS processes each component of the SQL statement
• How much memory has been assigned to the data cache(s) used by the tables in the SQL statement
• How much memory is available for sorting if the query requires a sort

• Query analysis breaks down the SQL statement into discrete tasks that must be performed to return the query results

• A large part of query analysis is index selection. After the optimizer determines the indexes available to be used for each predicate, it will decide whether to use a single index, multiple indexes, or no index at all.

JOINS

• Combining information from multiple tables is known as joining

• When multiple tables are accessed, the optimizer figures out how to combine the tables in the most efficient manner

• The DBMS can utilize several different methods for joining tables

To join tables, the DBMS must make several decisions and perform certain operations:

1. The first decision is to choose the table to process first; this table is referred to as the outer table.

2. Next, a series of operations are performed on the outer table to prepare it for joining.

3. Rows from that table are then combined with rows from the second table, called the inner table.

4. A series of operations are also performed on the inner table before the join occurs, as the join occurs, or both.

Join methods

Although all joins are similar in functionality, each join method works differently behind the scenes. Let's investigate two common join methods:

• the nested-loop join
• the merge-scan join

The nested-loop join

The nested-loop join works by comparing qualifying rows of the outer table to the inner table:

1. A qualifying row is identified in the outer table, and then the inner table is scanned for a match

2. A qualifying row is one in which the predicates for columns in the table match

3. When the inner table scan is complete, another qualifying row in the outer table is identified

4. The inner table is scanned for a match again, and so on

5. The repeated scanning of the inner table is usually accomplished with an index to avoid undue I/O costs.

6. The smaller the size of the inner table, the better a nested-loop join performs, because fewer rows need to be scanned for each qualifying row of the outer table.
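
The nested-loop steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DBMS implements the join: rows are plain dictionaries, and a dictionary keyed on the join column stands in for the index on the inner table. The function name and sample data are hypothetical.

```python
def nested_loop_join(outer, inner, key):
    """Join two lists of dict rows on `key` with a nested-loop strategy."""
    # Stand-in for an index on the inner table: join key -> matching rows.
    # Without it, every outer row would force a full rescan of `inner`.
    inner_index = {}
    for row in inner:
        inner_index.setdefault(row[key], []).append(row)

    result = []
    for outer_row in outer:  # one pass over the outer table
        # Probe the inner table for rows matching the current outer row.
        for inner_row in inner_index.get(outer_row[key], []):
            result.append({**outer_row, **inner_row})
    return result


departments = [
    {"dept": "001000", "dept_name": "Sales"},
    {"dept": "002000", "dept_name": "IT"},
]
employees = [
    {"empno": 1, "dept": "001000"},
    {"empno": 2, "dept": "002000"},
    {"empno": 3, "dept": "001000"},
]
joined = nested_loop_join(departments, employees, "dept")
for row in joined:
    print(row)
```

The inner table is probed once per outer row, which is why a small or well-indexed inner table matters so much for this method.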

The merge-scan join

• In a merge-scan join, the tables to be joined are ordered by the keys

• This ordering can be accomplished by a sort or by access via an index

• After ensuring that both the outer and inner tables are properly sequenced, each table is read sequentially, and the join columns are matched

• During a merge-scan join, no row from either table is read more than once

• Merge-scan joins are useful when an appropriate index is not available on one (or both) of the tables.
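
A merge-scan join can be sketched as follows, assuming in-memory lists of dictionaries and using a sort to put both inputs in join-key order (an index could provide the same ordering). The function name and data are hypothetical; when a key value repeats on both sides, this sketch re-reads the already-fetched group of inner rows from memory rather than from the table.

```python
def merge_scan_join(outer, inner, key):
    """Join two lists of dict rows on `key` by sorting and merging."""
    left = sorted(outer, key=lambda r: r[key])
    right = sorted(inner, key=lambda r: r[key])

    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # outer key too small: advance outer
        elif lk > rk:
            j += 1          # inner key too small: advance inner
        else:
            # Find the run of inner rows sharing this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every outer row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for inner_row in right[j:j_end]:
                    result.append({**left[i], **inner_row})
                i += 1
            j = j_end
    return result


outer_rows = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}]
inner_rows = [{"k": 1, "b": 10}, {"k": 2, "b": 20}, {"k": 2, "b": 21}, {"k": 3, "b": 30}]
merged = merge_scan_join(outer_rows, inner_rows, "k")
for row in merged:
    print(row)
```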

Join order

• The optimizer determines the optimal order in which the tables should be accessed to accomplish the join

• To find the optimal join access path, the optimizer uses built-in algorithms containing knowledge about joins and data volume

• It matches this intelligence against the join predicates, database statistics, and available indexes to estimate which order is more efficient

• In general, the optimizer will deploy an algorithm that minimizes the number of times the inner table must be accessed for qualifying outer table rows

Access Path Choices

• Joins

• Table Scans

• Indexed Access
– Using Indexes to Avoid Sorts
– Why Wasn't the Index Chosen?

• Hashed Access

• Parallel Access

Table Scans

• Table scans are the simplest form of data access

• A table scan is performed simply by reading every row of the table

Tablespace scan

• Depending on the DBMS, an alternate type of scan may exist, called a tablespace scan

• The tablespace scan reads every page in the tablespace, which may contain more than one table

• Obviously, a tablespace scan will run slower than a table scan because additional I/O will be incurred reading data that does not apply.

Partition scan

• If the DBMS can determine that the data to be accessed exists in certain partitions of a multipartition table (or tablespace), it can limit the data that is scanned to the appropriate partitions

• A partition scan should outperform a table scan or tablespace scan because the amount of I/O required is reduced.
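
The idea of limiting a scan to the relevant partitions can be sketched as below. The range-partitioning scheme on empno, the partition bounds, and the function name are all assumptions for illustration; real DBMSs decide partition eligibility from their own catalog metadata.

```python
def partition_scan(partitions, lo, hi):
    """Scan only the partitions whose key range overlaps [lo, hi]."""
    rows, scanned = [], []
    for (p_lo, p_hi), data in partitions.items():
        if p_hi < lo or p_lo > hi:
            continue  # this partition cannot hold qualifying rows: skip it
        scanned.append((p_lo, p_hi))
        rows.extend(r for r in data if lo <= r["empno"] <= hi)
    return rows, scanned


# Hypothetical table split into three range partitions on empno.
partitions = {
    (1, 100): [{"empno": 10}, {"empno": 90}],
    (101, 200): [{"empno": 150}],
    (201, 300): [{"empno": 250}],
}
rows, scanned = partition_scan(partitions, 80, 160)
print(rows)     # qualifying rows
print(scanned)  # partitions that were actually read
```

Here the third partition is never read, which is the I/O saving the bullet above describes.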

The optimizer will choose to scan data for one of the following reasons:

• The query cannot be satisfied using an index, possibly because no index is available, no predicate matches the index, or the predicate precludes the use of an index.

• A high percentage of the rows in the table qualify. In this case, using an index is likely to be less efficient because most of the data rows need to be read anyway.

• The indexes that have matching predicates have low cluster ratios and are only efficient for small amounts of data.

• The table is so small that use of an index would actually be detrimental. For small tables, adding index access to the table access can result in additional I/O, instead of less I/O.

Data prefetch

To assist the performance of a scan, the optimizer can invoke data prefetch

• Data prefetch causes the DBMS to read data pages sequentially into the data cache even before they are requested

• Data prefetch is a read-ahead mechanism—when data scans get around to requesting the data, it will already exist in memory

Data prefetch is:
• particularly useful for table and tablespace scans
• practical for any type of sequential data access

Whether it is available, and when and how it is used, depends on the DBMS.

The optimizer may choose to deploy it when the access path is formulated, or the DBMS may choose to turn on data prefetch when the query is being run.

As a DBA, you should learn how and why your particular DBMS prefetches data.
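
A toy model of read-ahead may help make the mechanism concrete. The class below is purely illustrative: the list of strings stands in for pages on disk, read_ahead is a made-up tuning knob, and the batch is fetched synchronously on a cache miss, whereas a real DBMS would typically issue the read asynchronously.

```python
class PrefetchingReader:
    """Toy read-ahead: a cache miss loads a batch of consecutive pages."""

    def __init__(self, pages, read_ahead=4):
        self.pages = pages            # stands in for pages on disk
        self.read_ahead = read_ahead  # pages fetched per I/O request
        self.cache = {}
        self.io_requests = 0          # number of batched I/O requests issued

    def get(self, page_no):
        if page_no not in self.cache:
            self.io_requests += 1
            last = min(page_no + self.read_ahead, len(self.pages))
            for n in range(page_no, last):
                self.cache[n] = self.pages[n]  # later pages arrive before use
        return self.cache[page_no]


pages = [f"page-{n}" for n in range(10)]
reader = PrefetchingReader(pages, read_ahead=4)
scanned = [reader.get(n) for n in range(len(pages))]  # sequential scan
print(reader.io_requests)
```

A sequential scan of ten pages triggers three batched requests here instead of ten single-page reads.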

Indexed Access

• Before the relational optimizer will use an index to satisfy a query, an appropriate index must already exist

• The DBMS is not capable of using an index for every WHERE clause

• You must learn what types of predicates can use indexes to ensure that the appropriate indexes are created for the queries in your database applications

• Every DBMS has a different list of what is, and what is not, indexable

• What is indexable tends to change from version to version of each DBMS.

• Types of indexed access:
– direct index lookup
– index scan
   • matching index scan
   • nonmatching index scan

Direct index lookup

• The relational optimizer can choose to use an index in many different ways

• The simplest type of indexed access is the direct index lookup.

To perform a direct index lookup, the DBMS initiates the following steps:

1. The value in the SQL predicate is compared to the values stored in the root page of the index. Based on this comparison, the DBMS will traverse the index to the next lowest set of pages.

2. If intermediate nonleaf pages exist, the appropriate nonleaf page is read, and the value is compared to determine which leaf page to access.

3. The appropriate leaf page is read; the index leaf page contains pointer(s) to the actual data for the qualifying rows.

4. Based on the pointer(s) in the leaf page index entries, the DBMS reads the appropriate table data pages.
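
The traversal (root page, any intermediate nonleaf pages, leaf page, then data pages) can be sketched with a small hand-built tree. The node layout, separator-key convention, and sample table below are assumptions chosen for readability, not the page format of any real DBMS.

```python
from bisect import bisect_right

# Hypothetical three-level index. Nonleaf nodes hold separator keys and child
# pointers; leaf nodes map key values to row ids (positions in `table`).
leaf1 = {"leaf": True, "entries": {10: [0], 20: [1]}}
leaf2 = {"leaf": True, "entries": {30: [2], 40: [3]}}
leaf3 = {"leaf": True, "entries": {50: [4], 60: [5]}}
leaf4 = {"leaf": True, "entries": {70: [6], 80: [7]}}
nonleaf_a = {"leaf": False, "keys": [30], "children": [leaf1, leaf2]}
nonleaf_b = {"leaf": False, "keys": [70], "children": [leaf3, leaf4]}
root = {"leaf": False, "keys": [50], "children": [nonleaf_a, nonleaf_b]}

table = [{"empno": k} for k in (10, 20, 30, 40, 50, 60, 70, 80)]


def direct_index_lookup(root, table, value):
    """Return (matching rows, number of index pages read)."""
    node = root
    pages_read = 1                      # the root page
    while not node["leaf"]:             # descend through nonleaf pages
        # Keys below a separator go left; keys equal or above go right.
        child = bisect_right(node["keys"], value)
        node = node["children"][child]
        pages_read += 1
    row_ids = node["entries"].get(value, [])   # pointers held in the leaf
    return [table[rid] for rid in row_ids], pages_read


rows, pages_read = direct_index_lookup(root, table, 40)
print(rows, pages_read)
```

The number of index pages read equals the depth of the tree, regardless of how many rows the table holds.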

Example: Direct index lookup

Consider the following query:

SELECT last_name, first_name, middle_initial, empno
FROM employee
WHERE position = 'MANAGER'
AND work_code = 1
AND dept = '001000';

Assume that an index exists on the position, work_code, and dept columns.

Example (conclusion)

• The DBMS can perform a direct index lookup using the index and the values supplied in the predicate for each of the columns

• For a direct index lookup to occur, all three columns must appear in the SQL statement

• If only one or two of these columns are specified as predicates, a direct index lookup cannot be chosen because the DBMS cannot match the full index key. Instead, an index scan could be chosen
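
One way to see how predicates and index keys interact is to ask a DBMS for its access plan. The sketch below uses SQLite via Python's sqlite3 module as a convenient stand-in; its planner and plan wording differ from other products, and SQLite does not use the term "direct index lookup". The index name idx_emp_pos and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, "
    "first_name TEXT, middle_initial TEXT, position TEXT, "
    "work_code INTEGER, dept TEXT)"
)
# Hypothetical composite index on the three columns from the example.
conn.execute("CREATE INDEX idx_emp_pos ON employee (position, work_code, dept)")


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


cols = "SELECT last_name, first_name, middle_initial, empno FROM employee "
full_key = plan(
    cols + "WHERE position = 'MANAGER' AND work_code = 1 AND dept = '001000'"
)
leading_only = plan(cols + "WHERE position = 'MANAGER'")
no_leading = plan(cols + "WHERE work_code = 1 AND dept = '001000'")

for label, detail in [
    ("full key", full_key),
    ("leading column only", leading_only),
    ("no leading column", no_leading),
]:
    print(label, "->", detail)
```

With the full key, the plan lists all three columns as index search terms; with only the leading column, just that one; without the leading column, the index cannot be searched on position at all.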

Index scans

• Index scans are similar to table and tablespace scans.

• When an index scan is invoked, the leaf pages of the index are read sequentially, one after the other.

• There are two basic types of index scans:
– matching index scans
– nonmatching index scans

Index scan: matching index scan

• A matching index scan is sometimes called absolute positioning

• A matching index scan begins at the root page of an index and works down to a leaf page in much the same manner as a direct index lookup does

• However, because the complete key of the index is not available, the DBMS must scan the leaf pages of the index looking for the values that are available, until all matching values have been retrieved

[Figure: matching index scan]

• For a matching index scan to be requested, you must specify the high-order column in the index key, that is, the first column specified in the index DDL.

• For the preceding example, the high-order column is the position column. The high-order column provides the starting point for the DBMS to traverse the index structure from the root page to the appropriate leaf page.

Index scan: nonmatching index scan

Consider the consequences of not specifying the high-order column in the query. For example, suppose we take the original query and remove the predicate for position, but retain the other two, leaving the following SQL statement:

SELECT last_name, first_name, middle_initial, empno
FROM employee
WHERE work_code = 1
AND dept = '001000';

Assume that an index exists on the position, work_code, and dept columns.

[Figure: nonmatching index scan]

• In such situations, the DBMS can deploy a nonmatching index scan, sometimes referred to as relative positioning.

• When a starting point cannot be determined because the first column in the index key is not specified, the DBMS cannot use the index tree structure. However, it can use the index leaf pages

• A nonmatching index scan begins with the first leaf page in the index and scans subsequent leaf pages sequentially, applying the available predicates.

• A nonmatching index scan can be more efficient than a table or tablespace scan, especially if the data pages that must be accessed are in clustered order. Of course, a nonmatching index scan can be done on a nonclustered index also.

• Any of the above methods for indexed access can be used with both clustered and unclustered indexes.
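
The difference between the two index scans can be sketched over a sorted list that stands in for the index leaf pages. The entries, row ids, and function names below are invented for illustration; bisect_left plays the part of the root-to-leaf traversal that a matching scan uses to find its starting point.

```python
from bisect import bisect_left

# Leaf-level entries in key order: (position, work_code, dept, row_id).
leaf_entries = sorted([
    ("CLERK", 1, "001000", 0),
    ("CLERK", 2, "002000", 1),
    ("MANAGER", 1, "001000", 2),
    ("MANAGER", 1, "002000", 3),
    ("MANAGER", 2, "001000", 4),
    ("SALES", 1, "001000", 5),
])


def matching_index_scan(entries, position):
    """Start at the first entry for `position`; stop when the key changes."""
    start = bisect_left(entries, (position,))
    matches, examined = [], 0
    for entry in entries[start:]:
        examined += 1
        if entry[0] != position:
            break
        matches.append(entry[3])
    return matches, examined


def nonmatching_index_scan(entries, work_code, dept):
    """No leading-column value: read every leaf entry, apply the predicates."""
    matches = [e[3] for e in entries if e[1] == work_code and e[2] == dept]
    return matches, len(entries)


m_rows, m_examined = matching_index_scan(leaf_entries, "MANAGER")
n_rows, n_examined = nonmatching_index_scan(leaf_entries, 1, "001000")
print(m_rows, m_examined)
print(n_rows, n_examined)
```

The matching scan touches only the entries around the requested key, while the nonmatching scan must read all leaf entries, though still less data than the full table rows.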

Using Indexes to Avoid Sorts

• The DBMS may need to sort data to satisfy SQL requests.

• Sorting is quite cost prohibitive and should be avoided if possible.

• The DBA can use indexes to avoid sorts by creating them on the columns that need to be sorted.

Sorting might occur when the following clauses are specified:

• DISTINCT: When this clause is specified the DBMS requires every column of the resulting data to be in order so that duplicate rows can be removed from the results set.

• UNION: This operation requires the columns in each SELECT list to be ordered because the results set can have no duplicate rows.

• GROUP BY: When this clause is specified, the DBMS requires data to be sorted by the specified columns in order to aggregate data.

• ORDER BY: When this clause is specified, the DBMS will ensure that the results set is sorted by the specified columns.

• Consider the following SQL statement:

SELECT last_name, first_name, middle_initial, empno, position
FROM employee
WHERE position IN ('MANAGER', 'DIRECTOR', 'VICE PRESIDENT')
ORDER BY last_name;

If an index exists on the last_name column, the query can use this index and avoid sorting.
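
To check whether a sort step is present, the access plan can be inspected before and after the index is created. The sketch below does this in SQLite through Python's sqlite3 module; the index name idx_emp_last_name and the column types are assumptions. Without the index, SQLite reports a temporary B-tree for the ORDER BY; whether the second plan drops that step depends on the planner's cost estimates, so the script only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, "
    "first_name TEXT, middle_initial TEXT, position TEXT)"
)

query = (
    "SELECT last_name, first_name, middle_initial, empno, position "
    "FROM employee "
    "WHERE position IN ('MANAGER', 'DIRECTOR', 'VICE PRESIDENT') "
    "ORDER BY last_name"
)


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)   # no index on last_name yet: a sort is needed

# Hypothetical index on the ORDER BY column.
conn.execute("CREATE INDEX idx_emp_last_name ON employee (last_name)")
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```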

• Using an index to avoid a sort trades off the additional CPU cost required to sort for the additional I/O cost required for indexed access.

• Of course, if the index is going to be used anyway, the choice is a no-brainer.

• Whether or not using an index is actually faster than scanning the data and sorting will depend on:
– Number of qualifying rows
– Speed of the sort
– Index characteristics (e.g., clustered or nonclustered)

Why Wasn't the Index Chosen?

• Situations sometimes arise where you think the optimizer should have chosen an index, but it didn't.

• Any number of reasons can cause the optimizer to avoid using an index.

Checklist of ways to encourage index selection:

• Does the query specify a search argument? If no predicate uses a search argument, the optimizer cannot use an index to satisfy the query.

• Are you joining a large number of tables? The optimizer within some DBMSs may produce unpredictable query plan results when joining a large number of tables.

• Are statistics current? If large amounts of data have been inserted, updated, and/or deleted, database statistics should be recaptured to ensure that the optimizer has up-to-date information upon which to base its query plans.

• Are you using stored procedures? Sometimes the DBMS provides options whereby a stored procedure, once compiled, will not reformulate a query plan for subsequent executions. You may need to recompile or reoptimize the stored procedure to take advantage of up-to-date statistics, new indexes, or any other pertinent database changes.

• Are additional predicates needed? A different WHERE clause might possibly enable the optimizer to consider a different index.

Hashed Access

• The optimizer will also consider using any existing hashing structures when formulating access paths.

• A hash is similar in operation to a direct index lookup.

• Hashes are most appropriate for random I/O of small amounts of data.

• To retrieve data based on a hashing algorithm, the DBMS uses a randomizing routine to translate the value supplied for the hash key to a physical location.

• This algorithm will give the offset of the row in the actual database table.
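
A minimal sketch of hashed access follows, assuming a fixed number of slots and using zlib.crc32 as the randomizing routine; both choices, and the class itself, are illustrative only. Keys that land in the same slot are chained in a list.

```python
import zlib


class HashedTable:
    """Toy hashed storage: the hash of the key gives the slot of the row."""

    def __init__(self, num_slots=8):
        self.slots = [[] for _ in range(num_slots)]

    def _slot(self, key):
        # Randomizing routine: translate the key value to a slot number.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.slots)

    def insert(self, key, row):
        self.slots[self._slot(key)].append((key, row))

    def lookup(self, key):
        # Go straight to the computed slot; no tree traversal, no scan.
        return [row for k, row in self.slots[self._slot(key)] if k == key]


emp = HashedTable()
emp.insert(1001, {"empno": 1001, "last_name": "Silva"})
emp.insert(1002, {"empno": 1002, "last_name": "Costa"})
found = emp.lookup(1002)
missing = emp.lookup(9999)
print(found, missing)
```

Because the slot is computed from the full key value, this structure suits exact-match retrieval and offers no help for range predicates or ordered access.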

Parallel Access

• The relational optimizer may choose to run queries in parallel. When query parallelism is invoked by the DBMS, multiple simultaneous tasks are invoked to access the data. Three basic types of parallelism can be supported by the DBMS:
– I/O parallelism
– CPU parallelism
– system parallelism

I/O parallelism

• I/O parallelism enables concurrent I/O streams to be initiated for a single query.

• Running parallel I/O tasks can significantly enhance the performance of I/O bound queries.

• Breaking the data access for the query into concurrent I/O streams executed in parallel can reduce the overall elapsed time for the query.

CPU parallelism

• CPU parallelism enables multitasking of CPU processing within a query.

• Invoking CPU parallelism also invokes I/O parallelism because each CPU engine requires its own I/O stream.

• CPU parallelism decomposes a query into multiple smaller queries that can be executed concurrently on multiple processors. CPU parallelism can further reduce the elapsed time for a query.

System parallelism

• The DBMS can deploy system parallelism to further enhance parallel query operations.

• System parallelism enables a single query to be broken up and run across multiple DBMS instances.

• By allowing a single query to take advantage of the processing power of multiple DBMS instances, the overall elapsed time for a complex query can be decreased even further.
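
The general shape of query parallelism, splitting one request into concurrent tasks and combining their results, can be sketched with Python's standard thread pool. The partitions, predicate, and function names are hypothetical, and the sketch shows only the decomposition: Python threads in the standard interpreter do not provide true CPU parallelism, and nothing here models multiple DBMS instances.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate):
    """One task: scan a single partition and keep the qualifying rows."""
    return [r for r in rows if predicate(r)]


def parallel_scan(partitions, predicate, max_workers=4):
    """Run one scan task per partition concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in partition order, whatever the finish order.
        parts = pool.map(lambda p: scan_partition(p, predicate), partitions)
        return [row for part in parts for row in part]


# Hypothetical data split into three partitions.
partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]
result = parallel_scan(partitions, lambda x: x % 50 == 0)
print(result)
```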

• Ensuring that proper query plans are formulated with the correct index usage is a time consuming process, but one that can pay huge dividends in the form of enhanced performance.

• The DBA should train the application development staff to understand relational optimization and to create optimal SQL. Of course, the onus falls on the application developer to code efficient SQL and program logic. However, the DBA is the sentry of relational database performance.

• When performance problems occur, the DBA is the one who has to search for the cause of the problem and suggest remedies to resolve it. Furthermore, the DBA should conduct design reviews to seek out and tune inefficient SQL before suboptimal access paths and programs are migrated to production status.


Additional Optimization Considerations

• The optimizer makes additional decisions regarding the manner in which data is accessed for SQL queries that will impact performance. Some additional optimization considerations:

– View Access

– Query Rewrite

– Rule-Based Optimization


View Access

• One of the decisions that must be made during query optimization is how to access data from views.

• When the optimizer determines the access path for the query containing the view, it must also determine how to resolve the view SQL. Keep in mind that both the view and the SQL accessing the view may reference multiple tables and additional views.


Two methods can be used to optimize SQL that references views:

– view materialization

– view merging

• The more efficient of the two methods is view merging.

• Each DBMS has its own set of rules that determine when view materialization must be used instead of view merging. Generally, column functions, or operations requiring sorts to be invoked, tend to require view materialization.


view merging

• As the name implies, when view merging is deployed, the SQL in the view DDL is merged with the SQL that references the view.

• The merged SQL is then used to formulate an access path against the base tables in the views.
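As a minimal sketch of what merging amounts to, the following Python snippet uses SQLite (an assumption; the slides do not name a DBMS here) with a hypothetical `employee` table and a hypothetical view `dept808`. It runs a query against the view and then the hand-merged equivalent against the base table. Whether and how a given optimizer merges a view internally is DBMS-specific; the snippet only shows that the two forms ask for the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno TEXT);
    INSERT INTO employee VALUES
        (1, 'Silva', '808'), (2, 'Costa', '808'), (3, 'Pires', '101');
    -- Hypothetical view: a simple projection and filter over one base table.
    CREATE VIEW dept808 AS
        SELECT empno, last_name FROM employee WHERE deptno = '808';
""")

# SQL that references the view.
via_view = conn.execute(
    "SELECT last_name FROM dept808 WHERE empno > 1 ORDER BY empno"
).fetchall()

# The same request after merging the view definition into the referencing SQL:
# both predicates now apply directly to the base table.
merged = conn.execute(
    "SELECT last_name FROM employee "
    "WHERE deptno = '808' AND empno > 1 ORDER BY empno"
).fetchall()

print(via_view == merged)
```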


view materialization

• When the optimizer cannot combine the SQL in the view with the SQL accessing the view, it creates an intermediate work file to hold the results of the view.

• The SQL accessing the view is then run against the work file that contains the view data.

• View materialization is not as efficient as view merging because data must be retrieved and stored in a temporary work file.


Query Rewrite

• Some relational optimizers are intelligent enough to rewrite SQL more efficiently during the optimization process.

• For example, the optimizer might convert a subquery into an equivalent join.

• Alternatively, it might test out equivalent but different predicate formulations to determine which one creates the better access path.


Example: Query Rewrite

• For example, since the following two predicates are equivalent, the optimizer may rewrite the query both ways to see which one produces the best access path:

WHERE column1 >= 1 AND column1 <= 100

WHERE column1 BETWEEN 1 AND 100
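The equivalence of the two predicates can be checked with a small Python script. SQLite and the table name `t` with its sample values are assumptions chosen only to make the check runnable; the snippet confirms that both formulations return the same rows, which is what allows an optimizer to pick whichever one gives the better access path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 INTEGER)")
# Values on both sides of the boundaries, plus the boundaries themselves.
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in (0, 1, 50, 100, 101)])

range_form = conn.execute(
    "SELECT column1 FROM t WHERE column1 >= 1 AND column1 <= 100 ORDER BY column1"
).fetchall()

between_form = conn.execute(
    "SELECT column1 FROM t WHERE column1 BETWEEN 1 AND 100 ORDER BY column1"
).fetchall()

print(range_form == between_form)
```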


Query Rewrite (cont.)

• Additionally, the optimizer may rewrite queries by creating inferred predicates.

• One example of this is a feature known as predicate transitive closure, in which the optimizer adds a predicate to the query to improve performance.


Consider the following SQL statement:

SELECT d.dept_name, e.last_name, e.empno
FROM employee e, department d
WHERE e.deptno = d.deptno
AND d.deptno = '808';

That SQL statement is functionally equivalent to the following SQL statement:

SELECT d.dept_name, e.last_name, e.empno
FROM employee e, department d
WHERE e.deptno = d.deptno
AND e.deptno = '808';


• The only difference is the second predicate, but because deptno is the same in both tables (due to the first join predicate), it does not matter whether we check deptno from the employee table or the department table.

• However, it might make a difference in terms of performance. For example, an index might exist on one of the deptno columns, but not the other, or perhaps one of the tables is significantly larger than the other.

• A query is usually more efficient when the predicate is applied to the larger of the two tables because the number of qualifying rows will be reduced.
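The functional equivalence of the two statements can be verified with a short Python sketch. SQLite, the sample rows, and the `QUERY` template with its `{alias}` placeholder are assumptions for illustration; only the table and column names come from the statements above. The snippet checks results, not performance, which depends on indexes and table sizes as noted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (deptno TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno TEXT);
    INSERT INTO department VALUES ('808', 'Research'), ('101', 'Sales');
    INSERT INTO employee VALUES
        (1, 'Silva', '808'), (2, 'Costa', '101'), (3, 'Pires', '808');
""")

# Same join; only the table that carries the local predicate changes.
QUERY = """
    SELECT d.dept_name, e.last_name, e.empno
    FROM employee e, department d
    WHERE e.deptno = d.deptno
    AND {alias}.deptno = '808'
    ORDER BY e.empno
"""

on_department = conn.execute(QUERY.format(alias="d")).fetchall()
on_employee = conn.execute(QUERY.format(alias="e")).fetchall()

print(on_department == on_employee)
```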


• If the optimizer can perform predicate transitive closure, the SQL developer need not worry about this.

• The optimizer will consider the access path for both columns regardless of which is coded in the predicate. In essence, the optimizer will rewrite the query to add the redundant predicate.

• The DBA should find out whether the relational optimizer in use can perform any form of query rewrite.

• Additionally, the rules for what type of queries can be rewritten vary from DBMS to DBMS and optimizer to optimizer. For example, the optimizer may not be able to perform predicate transitive closure on some predicates, such as IN or LIKE clauses.


Rule-Based Optimization

cost-based optimization ≠ rule-based optimization

• Most relational optimizers are cost based, meaning they base their access path formulation decisions on an estimation of costs. Lower-cost access paths are favored over costlier access paths.

• However, some DBMSs support a different type of optimization that is based on heuristics, or rules.


• A rule-based optimizer bases its optimization decisions on SQL syntax and structure, placement of predicates, order of tables in the SELECT statement, and availability of indexes.

• With a rule-based optimizer, the SQL developer has to be aware of the rules when writing SQL. Query performance can suffer simply because columns in the SELECT list or tables in the FROM clause are reordered.


• Cost-based optimization is the trend for DBMSs because SQL statements need not be coded following a set of esoteric "rules."

• An optimizer that estimates the cost of different access paths produces efficient query execution plans more reliably.


Reviewing Access Paths

• The programmer or DBA can examine the access paths chosen by the relational optimizer. The commands and process used to accomplish this depend on the DBMS. Usually the command to externalize access paths is EXPLAIN or SHOWPLAN.

• Analysis tools can make it easier for the DBA to interpret the access paths being used.
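As one concrete illustration, SQLite exposes its chosen access path through `EXPLAIN QUERY PLAN`. The Python sketch below is an assumption-laden example (SQLite rather than any DBMS named in the slides, a hypothetical `employee` table, a hypothetical index `idx_emp_deptno`, and a helper `access_path` invented here). It prints the plan for the same query before and after an index is created; the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def access_path(conn, sql):
    """Return SQLite's plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno TEXT)"
)

sql = "SELECT last_name FROM employee WHERE deptno = '808'"

# No index on deptno yet, so the plan should describe a table scan.
plan_before = access_path(conn, sql)

# After the index exists, the plan should name it.
conn.execute("CREATE INDEX idx_emp_deptno ON employee (deptno)")
plan_after = access_path(conn, sql)

print(plan_before)
print(plan_after)
```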


• SHOWPLAN – SQL SERVER EXAMPLE


Forcing Access Paths

• Some DBMSs allow you to force the use of specific access paths or the order in which tables are joined.

• SQL Server example (not common practice!!!)

SELECT t1.a, t2.b
FROM Tab1 t1, Tab1 t2
WHERE t1.a = t2.a
OPTION (LOOP JOIN)


Forcing Access Paths

• Techniques that force access path selection criteria should be used with caution. It is usually better to let the relational optimizer choose the appropriate access paths on its own unless:

– You have in-depth knowledge of the amount and type of data stored in the tables to be joined

– You are reasonably sure that you can determine the optimal join order better than the optimizer

– Database statistics are not up-to-date, so the optimizer is not working with sufficient information about the database environment
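For comparison with the SQL Server hint, SQLite offers its own forcing syntax: `INDEXED BY index-name` and `NOT INDEXED` in the FROM clause. The Python sketch below is only an illustration on a different DBMS, with a hypothetical `employee` table, a hypothetical index `idx_emp_deptno`, and a helper `access_path` invented here. It shows how a forced choice overrides what the optimizer would pick by default, which is exactly why such techniques call for caution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno TEXT);
    CREATE INDEX idx_emp_deptno ON employee (deptno);
""")


def access_path(sql):
    """Return SQLite's plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Optimizer's own choice: the equality predicate can use the index.
default_plan = access_path(
    "SELECT last_name FROM employee WHERE deptno = '808'"
)

# Forced: forbid index access to this table, so it is scanned.
forced_scan = access_path(
    "SELECT last_name FROM employee NOT INDEXED WHERE deptno = '808'"
)

# Forced: require this specific index (the statement fails if it cannot be used).
forced_index = access_path(
    "SELECT last_name FROM employee INDEXED BY idx_emp_deptno WHERE deptno = '808'"
)

print(default_plan, forced_scan, forced_index, sep="\n")
```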


• Alternative methods are available to encourage the optimizer to select different access paths.

• The general method of encouraging access path selection is to modify the SQL based on in-depth knowledge of the relational optimizer. This is sometimes called tweaking SQL.

• Since the optimizer within each DBMS is very different, few SQL tweaks are useful across multiple DBMSs.

• The DBA must learn the fundamentals of SQL tuning and the types of tweaking that make sense for each DBMS under the DBA's management. Furthermore, whenever such tweaks are deployed, be sure to fully document the reason.


SQL Coding and Tuning for Efficiency

• SQL tuning is a complex, time-consuming, and error-prone process.

• Coding and tuning SQL is one of the most time-consuming DBA tasks.

• It requires cooperation and communication between the business users and application programmers for the first three steps, and between the application programmers and the DBA for the remaining steps.


The DBA is responsible for ensuring that the following steps occur for each SQL statement in the organization:

1. Identify the business data requirements.
2. Ensure that the required data is available within existing databases.
3. Translate the business requirements into SQL.
4. Test the SQL for accuracy and results.
5. Review the access paths for performance.
6. Tweak the SQL for better access paths.
7. Code optimization hints.
8. Repeat steps 4 through 7 until performance is acceptable.
9. Repeat step 8 whenever performance problems arise or a new DBMS version is installed.
10. Repeat entire process whenever business needs change.


SQL Rules of Thumb

• Some rules of thumb that apply generally to SQL development regardless of the underlying DBMS:

– Rule 1: "It Depends!"

– Rule 2: Be Careful What You Ask For

– Rule 3: KISS

– Rule 4: Retrieve Only What Is Needed

– Rule 5: Avoid Cartesian Products

– Rule 6: Judicious Use of OR

– Rule 7: Judicious Use of LIKE

– Rule 8: Know What Works Best

– Rule 9: Issue Frequent COMMITs

– Rule 10: Beware of Code Generators

– Rule 11: Consider Stored Procedures
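Rule 5 can be made concrete with a small Python sketch. SQLite and the sample rows are assumptions for illustration; the `employee` and `department` names follow the earlier join example. With three employees and two departments, omitting the join predicate pairs every employee with every department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (deptno TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee (empno INTEGER PRIMARY KEY, last_name TEXT, deptno TEXT);
    INSERT INTO department VALUES ('808', 'Research'), ('101', 'Sales');
    INSERT INTO employee VALUES
        (1, 'Silva', '808'), (2, 'Costa', '101'), (3, 'Pires', '808');
""")

# Join predicate forgotten: every employee row is combined with every department row.
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM employee e, department d"
).fetchone()[0]

# Join predicate present: each employee matches only its own department.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM employee e, department d WHERE e.deptno = d.deptno"
).fetchone()[0]

print(cartesian_rows, joined_rows)
```

The row count of an unintended Cartesian product grows with the product of the table sizes, so the cost on production-sized tables is far larger than this toy case suggests.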


Additional SQL Tuning Tips

SQL tuning is a complicated task that requires a full-length book of its own. The following SQL tuning suggestions are useful for DBAs to apply, regardless of the DBMS:

• Use indexes to avoid sorting.

• Create indexes to support troublesome queries.

• Whenever possible, do not perform arithmetic in SQL predicates. Use the host programming language…

• Use SQL functions to reduce programming effort.

• Build proper constraints into the database to minimize coding edit checks.

• Do not forget about the "hidden" impact of triggers. A delete from one table may trigger many more operations. Although you may think the problem is a poorly performing DELETE, the trigger may be the true culprit.
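Two of these tips (arithmetic in predicates and indexes that avoid sorting) can be observed through SQLite's `EXPLAIN QUERY PLAN`. The Python sketch below is a hedged illustration: SQLite, the `employee` table with a `monthly_salary` column, both index names, and the `access_path` helper are assumptions, and other optimizers may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        empno INTEGER PRIMARY KEY, last_name TEXT, monthly_salary INTEGER);
    CREATE INDEX idx_emp_salary ON employee (monthly_salary);
""")


def access_path(sql, params=()):
    """Return SQLite's plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


annual_target = 48000  # divisible by 12, so both predicates select the same rows

# Arithmetic on the column inside the predicate: the index cannot be matched.
arith_plan = access_path(
    "SELECT last_name FROM employee WHERE monthly_salary * 12 = ?",
    (annual_target,),
)

# Arithmetic done in the host language; the predicate compares the bare column.
plain_plan = access_path(
    "SELECT last_name FROM employee WHERE monthly_salary = ?",
    (annual_target // 12,),
)

# Sorting: without a suitable index the plan includes a separate sort step.
sort_sql = "SELECT last_name FROM employee ORDER BY last_name"
sort_before = access_path(sort_sql)
conn.execute("CREATE INDEX idx_emp_last_name ON employee (last_name)")
sort_after = access_path(sort_sql)  # rows can now be read in index order

print(arith_plan, plain_plan, sort_before, sort_after, sep="\n")
```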


Identifying Poorly Performing SQL

• A large part of the task of tuning SQL is identifying the offending code. A SQL performance monitor is the best approach to identify poorly performing statements. Such a tool constantly monitors the DBMS environment and reports on the resources consumed by SQL statements.

• Some DBMSs provide rudimentary bundled support for SQL monitoring, but many third-party tools are available. These tools provide in-depth features such as the ability to identify the worst performing SQL without the overhead of system traces, integration with SQL coding and tuning tools, and graphical performance charts and triggers.
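The core idea of such a monitor, attributing consumed resources to individual statements, can be sketched in a few lines of Python. The `TimedConnection` class below is hypothetical and far simpler than any real monitoring product: it wraps a SQLite connection (an assumption), measures only elapsed time, and groups results by statement text.

```python
import sqlite3
import time
from collections import defaultdict


class TimedConnection:
    """Hypothetical wrapper that accumulates elapsed time per SQL statement text."""

    def __init__(self, conn):
        self.conn = conn
        self.stats = defaultdict(lambda: {"calls": 0, "seconds": 0.0})

    def execute(self, sql, params=()):
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        entry = self.stats[sql]
        entry["calls"] += 1
        entry["seconds"] += time.perf_counter() - start
        return rows

    def worst(self, n=3):
        """Return the n statements with the highest total elapsed time."""
        ranked = sorted(
            self.stats.items(), key=lambda item: item[1]["seconds"], reverse=True
        )
        return ranked[:n]


db = TimedConnection(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE t (x INTEGER)")
for value in range(3):
    db.execute("INSERT INTO t VALUES (?)", (value,))
row_count = db.execute("SELECT COUNT(*) FROM t")

for sql, entry in db.worst(2):
    print(entry["calls"], round(entry["seconds"], 6), sql)
```

Grouping by statement text means that parameterized statements are reported once with a call count, which is how the three INSERTs above are summarized.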


Summary (1)

• Application performance management and SQL tuning is a complex area that requires the active participation of programmers and DBAs.

• Each DBMS operates differently, and DBAs as well as programmers will need to understand all of the minute details of SQL and application performance management for their DBMS.


Summary (2)

• The relational optimizer combines access path strategies to form an efficient access path for each SQL request.

• However, the optimizer is a very complex piece of software, and the DBMS vendors do not share with their customers all the intricacies of how the optimizer works.

• Therefore, quite often, SQL performance tuning becomes an iterative artistic process, instead of a science.