IOUG SELECT Magazine, 4th Qtr 2010

Oracle AWR (Automatic Workload Repository) Trending

By Kapil Goyal

Trending is very important for database performance analysis. It exposes the performance profile of a database in terms of IO, CPU usage, wait-event response time, etc. Starting in Oracle Database 10g, the AWR performance data that is collected and stored out of the box is very helpful in creating historical and comparative trend analyses to gain insight into critical database performance issues. AWR provides a rich history of Oracle performance statistics that shows how an Oracle database system has been trending for as long as the AWR data is retained.

So Where Do We Start?

Any statistic in AWR can be trended quite easily. The AWR report consists of SQL that runs against various "DBA_HIST_%" views, taking the difference between two snapshots at a time. This implies that we can develop and execute similar SQL against those views to report the trend for any required statistic or database wait event.

When analyzing AWR reports covering the time a performance problem manifested, and some wait-event or response-time figures seem high or a new set of SQL statements shows up, it is always a good idea to go back in history and check how the identified events, statistics, SQL or jobs behaved during similar periods previously: it could be yesterday, last week or even last month. Of course, this comparison needs to be done when the same type of user jobs was running (the same load), just to rule out whether that average response time is normal for this database. If a job had the same response time earlier and users were not complaining, it usually means that the problem is somewhere else and we need to dig further, probably using a targeted SQL trace for that portion of the application.

Side Note: AWR is not a replacement for real-time monitoring; it contains historical data and can be used to investigate what happened or what caused a performance issue. One important difference between STATSPACK in Oracle9i and AWR in Oracle Database 10g is that 9i exposes the source code for STATSPACK while 10g does not. However, you can use most of the 9i scripts to understand the AWR structures, which makes it easier to write queries against the 10g DBA_HIST* tables. For reference, the STATSPACK source code in 9i can be seen in the $ORACLE_HOME/rdbms/admin/sprepins.sql file. This file provides a fair idea of the "STATS$%" tables used to store the data and generate STATSPACK reports. In most cases, you can take the same SQL and replace the corresponding "STATS$%" tables with the matching "DBA_HIST_%" tables.
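As a small illustration of that substitution (a sketch of my own, not taken from sprepins.sql itself), the STATSPACK-style delta of a system statistic between two snapshots maps almost one-for-one onto the corresponding DBA_HIST view; :bid and :eid below are placeholder snapshot IDs you would supply.

-- STATSPACK (9i) style: delta of a statistic between two snapshots
-- select e.value - b.value
--   from stats$sysstat b, stats$sysstat e
--  where b.snap_id = :bid and e.snap_id = :eid
--    and b.name = e.name and b.name = 'physical reads';

-- Equivalent AWR (10g and later) query, assuming placeholder snap IDs :bid and :eid
select e.stat_name, e.value - b.value delta
  from dba_hist_sysstat b, dba_hist_sysstat e
 where b.snap_id = :bid
   and e.snap_id = :eid
   and b.dbid = e.dbid
   and b.instance_number = e.instance_number
   and b.stat_id = e.stat_id
   and e.stat_name = 'physical reads';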

What is AWR?

AWR takes periodic snapshots of performance statistics (by default every hour) and exposes them through the DBA_HIST* views. Please note that AWR is licensed as part of the Diagnostics Pack; hence, even accessing these views directly requires the license to be purchased. Oracle Database 11g further expands on this. For example, there are 79 DBA_HIST* views available in Oracle 10g (10.2.0.4).

SQL> select count(*) from dictionary where table_name like 'DBA_HIST%';

  COUNT(*)
----------
        79

On the other hand, there are 100 DBA_HIST* views available in Oracle 11g (11.1.0.7).

SQL> select count(*) from dictionary where table_name like 'DBA_HIST%';

  COUNT(*)
----------
       100
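Related to the licensing point above: on Oracle Database 11g you can check (and control) whether the Diagnostics Pack features are enabled for the instance through the control_management_pack_access parameter. A quick sanity check, shown here only as a sketch:

show parameter control_management_pack_access
-- the value should include DIAGNOSTIC (the 11g default is DIAGNOSTIC+TUNING)
-- if the DBA_HIST views are to be used in a licensed configuration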

How Much Space Does AWR Use?

The following query can be used to see the space currently occupied by AWR data.

Select SPACE_USAGE_KBYTES from v$sysaux_occupants
 where occupant_name like '%AWR%';

SPACE_USAGE_KBYTES
------------------
         4,384,640

1 row selected.

Note that this size will vary depending on the retention period, the snapshot frequency, the number of datafiles, etc. You can monitor this space in a development/test instance to project space requirements in case you want to increase the retention period in production.

Retention

You can use the following query to see the current retention policy.

select extract(day from snap_interval)*24*60 +
       extract(hour from snap_interval)*60 +
       extract(minute from snap_interval) "Snapshot Interval",
       extract(day from retention)*24*60 +
       extract(hour from retention)*60 +
       extract(minute from retention) "Retention Interval(in Minutes)",
       extract(day from retention) "Retention (in Days)"
  from dba_hist_wr_control;

Snapshot Interval Retention Interval(in Minutes) Retention (in Days)
----------------- ------------------------------ -------------------
               30                        129,600                  90


I personally prefer to have 35 days of retention so that it covers a whole month. If you can afford to store this data for longer periods, then I would strongly suggest you do that. In any case, to trend data for a specified period, you need to retain the AWR data for that period.
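If you decide to lengthen the retention, both the retention and the snapshot interval are changed with the DBMS_WORKLOAD_REPOSITORY package (both arguments are in minutes). A minimal sketch for 35 days of history while keeping a 30-minute snapshot interval:

begin
  dbms_workload_repository.modify_snapshot_settings(
    retention => 35*24*60,   -- 35 days, expressed in minutes
    interval  => 30);        -- snapshot interval in minutes
end;
/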

Benefits of AWR?

The following list includes some key benefits of the Automatic Workload Repository:

• Easy to find a recent spike in load
• Helpful in capacity planning
• Helps design load testing based on current capacity and load
• Easy to write queries against the AWR tables
• Provides SQL statistics history
• Easy to find whether the execution plan for a particular SQL statement changed recently

I have detailed below the AWR scripts that I use in my day-to-day tasks. These are based on the DBA_HIST* tables and trend the same data that a set of AWR reports would otherwise have provided. You can modify the scripts to trend the data for particular hours, days or weeks, or for all of the data available, based on the AWR retention you have set for that database.
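For example, all of the scripts below filter on a single day with trunc(sn.begin_interval_time) = '&Date'. To trend a whole week or month instead, that one predicate can be widened to a range; a sketch, where &Start_Date and &End_Date are placeholder substitution variables of your choosing:

where trunc(sn.begin_interval_time)
      between to_date('&Start_Date','dd-mon-yy')
          and to_date('&End_Date','dd-mon-yy')
  -- the remaining joins and predicates stay exactly as in the scripts below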

System Event Trending

The following script can be very helpful when you want to see how an event trended over a period of time. The script takes two arguments: a date and an event name.

event_response.sql

alter session set nls_date_format='dd-mon-yy';
set lines 150 pages 100 echo off feedback off
col date_time heading 'Date time|mm/dd/yy_hh_mi_hh_mi' for a25
col event_name for a26
col waits for 99,999,999,999 heading 'Waits'
col time for 99,999 heading 'Total Wait|Time(sec)'
col avg_wait_ms for 99,999 heading 'Avg Wait|(ms)'
prompt "Enter the date in DD-Mon-YY Format:"

WITH system_event AS
(select sn.begin_interval_time begin_interval_time,
        sn.end_interval_time end_interval_time,
        se.event_name event_name,
        se.total_waits e_total_waits,
        lag(se.total_waits,1) over (order by se.snap_id) b_total_waits,
        se.total_timeouts e_total_timeouts,
        lag(se.total_timeouts,1) over (order by se.snap_id) b_total_timeouts,
        se.time_waited_micro e_time_waited_micro,
        lag(se.time_waited_micro,1) over (order by se.snap_id) b_time_waited_micro
   from dba_hist_system_event se, dba_hist_snapshot sn
  where trunc(sn.begin_interval_time) = '&Date'
    and se.snap_id = sn.snap_id
    and se.dbid = sn.dbid
    and se.instance_number = sn.instance_number
    and se.dbid = (select dbid from v$database)
    and se.instance_number = (select instance_number from v$instance)
    and se.event_name = '&event_name')
select
  to_char(se1.BEGIN_INTERVAL_TIME,'mm/dd/yy_hh24_mi')||
  to_char(se1.END_INTERVAL_TIME,'_hh24_mi') date_time,
  se1.event_name,
  se1.e_total_waits - nvl(se1.b_total_waits,0) waits,
  (se1.e_time_waited_micro - nvl(se1.b_time_waited_micro,0)) / 1000000 time,
  ((se1.e_time_waited_micro - nvl(se1.b_time_waited_micro,0)) / 1000) /
   (se1.e_total_waits - nvl(se1.b_total_waits,0)) avg_wait_ms
from system_event se1
where (se1.e_total_waits - nvl(se1.b_total_waits,0)) > 0
  and nvl(se1.b_total_waits,0) > 0
/


Sample Output

SQL> @event_response
"Enter the date in DD-Mon-YY Format:"
Enter value for date: 29-jan-10
old  15: trunc(sn.begin_interval_time) ='&Date'
new  15: trunc(sn.begin_interval_time) ='29-jan-10'
Enter value for event_name: db file sequential read
old  21: and se.event_name='&event_name'
new  21: and se.event_name='db file sequential read'

Date time                                                   Total Wait  Avg Wait
mm/dd/yy_hh_mi_hh_mi    EVENT_NAME                   Waits   Time(sec)      (ms)
----------------------- ------------------------- ---------- ---------- -------
01/29/10_01_00_02_00    db file sequential read      551,356      4,500       8
01/29/10_02_00_03_00    db file sequential read    1,114,616      7,921       7
01/29/10_03_00_04_00    db file sequential read      764,481      5,926       8
01/29/10_04_00_05_00    db file sequential read      845,195      6,633       8
01/29/10_05_00_06_00    db file sequential read    1,385,501      8,501       6
01/29/10_06_00_07_00    db file sequential read    3,785,824     14,703       4
01/29/10_07_00_08_00    db file sequential read    2,393,513      6,996       3
01/29/10_08_00_09_00    db file sequential read    2,590,092      6,273       2
01/29/10_09_00_10_00    db file sequential read    2,322,715      5,390       2
01/29/10_10_00_11_00    db file sequential read    2,806,934      6,913       2
01/29/10_11_00_12_00    db file sequential read    2,691,573      3,501       1
01/29/10_12_00_13_00    db file sequential read    1,737,420      3,349       2
01/29/10_13_00_14_00    db file sequential read      489,453      2,297       5
01/29/10_14_00_15_00    db file sequential read      791,114      2,842       4

Load Profile Trending

The first page of the AWR report gives us a lot of information about database behavior, such as whether the database is more read intensive or write intensive (from the Physical Reads/sec and Physical Writes/sec statistics), whether it does lots of logical IO/sec, has a high OLTP component (as suggested by a higher number of Transactions/sec), or does a lot of parsing (hard or soft). By looking at a one-hour-interval report you can get some idea, but if you look at the trend for the whole day you will get a broader picture, as some databases shift from an OLTP workload during the day to a DSS workload during off hours.

The Load Profile section also helps to determine whether load has changed over time compared to the baseline (i.e., an AWR report from when the system was healthy). There is no single good or best value for these statistics. Although the numbers vary by database and application, when Logons/sec is more than 10 or the database has a high hard parse rate (>100/sec or so), this can indicate an underlying configuration or application design issue that manifests itself as a performance problem. In this regard, Logical Reads/sec is also a good statistic to look at.

The script below trends the physical reads/sec.

lp.sql

alter session set nls_date_format='dd-mon-yy';
set lines 130 pages 1000 echo off feedback off
col stat_name for a25
col date_time for a20
col BEGIN_INTERVAL_TIME for a20
col END_INTERVAL_TIME for a20
prompt "Enter the date in DD-Mon-YY Format and Stats you want to trend like 'redo size','physical reads','physical writes','session logical reads' etc."

WITH sysstat AS
(select sn.begin_interval_time begin_interval_time,
        sn.end_interval_time end_interval_time,
        ss.stat_name stat_name,
        ss.value e_value,
        lag(ss.value,1) over (order by ss.snap_id) b_value
   from dba_hist_sysstat ss, dba_hist_snapshot sn
  where trunc(sn.begin_interval_time) = '&Date'
    and ss.snap_id = sn.snap_id
    and ss.dbid = sn.dbid
    and ss.instance_number = sn.instance_number
    and ss.dbid = (select dbid from v$database)
    and ss.instance_number = (select instance_number from v$instance)
    and ss.stat_name = '&stat_name')
select
  to_char(BEGIN_INTERVAL_TIME,'mm/dd/yy_hh24_mi')||
  to_char(END_INTERVAL_TIME,'_hh24_mi') date_time,
  stat_name,
  round((e_value - nvl(b_value,0)) /
        (extract(day    from (end_interval_time - begin_interval_time))*24*60*60 +
         extract(hour   from (end_interval_time - begin_interval_time))*60*60 +
         extract(minute from (end_interval_time - begin_interval_time))*60 +
         extract(second from (end_interval_time - begin_interval_time))), 0) per_sec
from sysstat
where (e_value - nvl(b_value,0)) > 0
  and nvl(b_value,0) > 0
/


Sample Output

SQL> @lp

Session altered.

"Enter the date in DD-Mon-YY Format and Stats you want to trend like 'redo size','physical reads','physical writes','session logical reads' etc."
Enter value for date: 29-jan-10
old  11: trunc(sn.begin_interval_time) ='&Date'
new  11: trunc(sn.begin_interval_time) ='29-jan-10'
Enter value for stat_name: physical reads
old  17: and ss.stat_name='&stat_name'
new  17: and ss.stat_name='physical reads'

DATE_TIME            STAT_NAME                    PER_SEC
-------------------- ------------------------- ----------
01/29/10_01_00_02_00 physical reads                  4347
01/29/10_02_00_03_00 physical reads                  4554
01/29/10_03_00_04_00 physical reads                  4708
01/29/10_04_00_05_00 physical reads                  4972
01/29/10_05_00_06_00 physical reads                  6796
01/29/10_06_00_07_00 physical reads                  6685
01/29/10_07_00_08_00 physical reads                  4758
01/29/10_08_00_09_00 physical reads                  5832
01/29/10_09_00_10_00 physical reads                  5217
01/29/10_10_00_11_00 physical reads                  4867
01/29/10_11_00_12_00 physical reads                  5685
01/29/10_12_00_13_00 physical reads                  4443
01/29/10_13_00_14_00 physical reads                  3858
01/29/10_14_00_15_00 physical reads                  4364
SQL>

Time Model Statistics Trend: Time Matters

Time Model Statistics is a great feature of Oracle Database 10g. It tells us exactly where the time is being spent. Industry performance experts such as Anjo Kolk and Cary Millsap (method-r.com) have always talked about response time, and that is exactly what matters.

Response time = service time + wait time

Time Model Statistics is based on response time as well: it shows the time spent in database calls broken down by operation type, such as DB time, sql execute elapsed time, DB CPU, parsing and hard parsing. If you are comparing two AWR reports for a good and a bad period and you don't see much difference in the "DB time" statistic, then most likely the issue is not within the database but somewhere else (such as the client side or the middleware/application layer).

The most important of the time model statistics is DB time. This statistic represents the total time spent in database calls and is an indicator of the total instance workload. It is calculated by aggregating the CPU and wait times of all sessions not waiting on idle wait events (non-idle user sessions).

If the load on the system increases, then DB time increases: more users mean more database calls and hence higher DB time. If performance degrades, for example through higher IO time or wait time, DB time also increases (usually because many more sessions are waiting on non-idle events), and the focus should then be on where the time is being spent and how we can reduce it.

The following script can be used to see the trend for the Time Model Statistics. It can also be used to see when the database was busiest: just look for the highest DB time.

time_model.sql

alter session set nls_date_format='dd-mon-yy';
set lines 150 pages 1000
col date_time heading 'Date time' for a25
col stat_name heading 'Statistics Name' for a25
col time heading 'Time (s)' for 99,999,999,999
prompt "Enter the date in DD-Mon-YY Format and Stats you want to trend like 'DB time', 'DB CPU', 'sql execute elapsed time', 'PL/SQL execution elapsed time', 'parse time elapsed', 'background elapsed time'"

WITH systimemodel AS
(select sn.begin_interval_time begin_interval_time,
        sn.end_interval_time end_interval_time,
        st.stat_name stat_name,
        st.value e_value,
        lag(st.value,1) over (order by st.snap_id) b_value
   from DBA_HIST_SYS_TIME_MODEL st, dba_hist_snapshot sn
  where trunc(sn.begin_interval_time) = '&Date'
    and st.snap_id = sn.snap_id
    and st.dbid = sn.dbid
    and st.instance_number = sn.instance_number
    and st.dbid = (select dbid from v$database)
    and st.instance_number = (select instance_number from v$instance)
    and st.stat_name = '&stat_name')
select
  to_char(BEGIN_INTERVAL_TIME,'mm/dd/yy_hh24_mi')||
  to_char(END_INTERVAL_TIME,'_hh24_mi') date_time,
  stat_name,
  round((e_value - nvl(b_value,0))/1000000) time
from systimemodel
where (e_value - nvl(b_value,0)) > 0
  and nvl(b_value,0) > 0
/

Sample Output

SQL> @"time_model.sql"
"Enter the date in DD-Mon-YY Format and Stats you want to trend like 'DB time', 'DB CPU', 'sql execute elapsed time', 'PL/SQL execution elapsed time', 'parse time elapsed', 'background elapsed time'"
Enter value for date: 29-jan-10
old  11: trunc(sn.begin_interval_time) ='&Date'
new  11: trunc(sn.begin_interval_time) ='29-jan-10'
Enter value for stat_name: DB time
old  17: and st.stat_name='&stat_name'
new  17: and st.stat_name='DB time'

Date time                 Statistics Name                  Time (s)
------------------------- ------------------------- ---------------
01/29/10_01_00_02_00      DB time                            45,789
01/29/10_02_00_03_00      DB time                            54,308
01/29/10_03_00_04_00      DB time                            62,480
01/29/10_04_00_05_00      DB time                            57,677
01/29/10_05_00_06_00      DB time                            80,028
01/29/10_06_00_07_00      DB time                            76,765
01/29/10_07_00_08_00      DB time                            45,680
01/29/10_08_00_09_00      DB time                            53,926
01/29/10_09_00_10_00      DB time                            45,778
01/29/10_10_00_11_00      DB time                            40,402
01/29/10_11_00_12_00      DB time                            39,801
01/29/10_12_00_13_00      DB time                            31,821
01/29/10_13_00_14_00      DB time                            17,866
01/29/10_14_00_15_00      DB time                            20,888
01/29/10_15_00_16_00      DB time                            18,000

Looking at the above data, the instance was busiest between 05:00 and 06:00 a.m.
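Rather than scanning the list by eye, you can also let the database rank the intervals. The sketch below follows the same pattern as time_model.sql (and assumes the same &Date substitution variable and nls_date_format setting) to list the five busiest snapshots of the day by DB time; treat it as an illustrative variation, not one of the scripts above.

select *
  from (select to_char(begin_interval_time,'mm/dd/yy_hh24_mi')||
               to_char(end_interval_time,'_hh24_mi') date_time,
               round((e_value - b_value)/1000000) db_time_s
          from (select sn.begin_interval_time, sn.end_interval_time,
                       st.value e_value,
                       lag(st.value,1) over (order by st.snap_id) b_value
                  from dba_hist_sys_time_model st, dba_hist_snapshot sn
                 where trunc(sn.begin_interval_time) = '&Date'
                   and st.snap_id = sn.snap_id
                   and st.dbid = sn.dbid
                   and st.instance_number = sn.instance_number
                   and st.dbid = (select dbid from v$database)
                   and st.instance_number = (select instance_number from v$instance)
                   and st.stat_name = 'DB time')
         where b_value is not null
         order by db_time_s desc)
 where rownum <= 5;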

The following is a sample AWR report where the application was spending most of its time on CPU.

              Snap Id      Snap Time      Sessions Curs/Sess
            --------- ------------------- -------- ---------
Begin Snap:     14137 23-Nov-09 10:15:18       414     175.7
  End Snap:     14138 23-Nov-09 10:18:33       425     175.0
   Elapsed:              3.25 (mins)
   DB Time:            125.30 (mins)

Time Model Statistics
-> Total time in database user-calls (DB Time): 7517.9s
-> Statistics including the word "background" measure background process time, and so do not contribute to the DB time statistic
-> Ordered by % or DB time desc, Statistic name

Statistic Name                                  Time (s) % of DB Time
------------------------------------------ ------------ ------------
sql execute elapsed time                         7,462.7         99.3
DB CPU                                           5,469.8         72.8
PL/SQL execution elapsed time                    1,573.7         20.9
inbound PL/SQL rpc elapsed time                    121.5          1.6
parse time elapsed                                 117.0          1.6
hard parse elapsed time                             48.9           .7
connection management call elapsed time             12.4           .2
repeated bind elapsed time                           1.9           .0
hard parse (sharing criteria) elapsed time           0.6           .0
PL/SQL compilation elapsed time                      0.3           .0
sequence load elapsed time                           0.1           .0
failed parse elapsed time                            0.0           .0
DB time                                          7,517.9          N/A
background elapsed time                             16.8          N/A
background cpu time                                  7.0          N/A
          -------------------------------------------------------------

As you can see, in only 3.25 minutes (195 seconds) of wall clock time it consumed 5,469 CPU seconds. From this number, you can estimate the minimum number of CPUs on this box. It is quite instructive to play with these numbers and draw some conclusions: to consume 5,469 CPU seconds in 195 seconds of elapsed time, there must be at least 5,469/195 = 28.046 CPUs. I would therefore expect this to be a 32-CPU box, which is confirmed by the "Operating System Statistics" section of the same AWR report, shown below.

Operating System Statistics

Statistic                                       Total
-------------------------------- --------------------
AVG_BUSY_TIME                                  18,644
AVG_IDLE_TIME                                     816
AVG_IOWAIT_TIME                                   558
AVG_SYS_TIME                                    2,017
AVG_USER_TIME                                  16,622
BUSY_TIME                                     596,848
IDLE_TIME                                      26,307
IOWAIT_TIME                                    18,141
SYS_TIME                                       64,732
USER_TIME                                     532,116
LOAD                                                0
OS_CPU_WAIT_TIME                              663,600
RSRC_MGR_CPU_WAIT_TIME                              0
PHYSICAL_MEMORY_BYTES                 134,217,728,000
NUM_CPUS                                           32
NUM_CPU_CORES                                      32
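BUSY_TIME and IDLE_TIME above are cumulative counters in centiseconds, so host CPU utilization for the interval is simply the BUSY_TIME delta divided by the BUSY_TIME plus IDLE_TIME delta (here 596,848 / (596,848 + 26,307), roughly 96 percent busy, consistent with the CPU estimate above). The same ratio can be trended from DBA_HIST_OSSTAT using the lag() pattern of the earlier scripts; the following is only a sketch along those lines, not one of the scripts above:

WITH osstat AS
(select sn.begin_interval_time begin_interval_time,
        o.stat_name stat_name,
        o.value e_value,
        lag(o.value,1) over (partition by o.stat_name order by o.snap_id) b_value
   from dba_hist_osstat o, dba_hist_snapshot sn
  where trunc(sn.begin_interval_time) = '&Date'
    and o.snap_id = sn.snap_id
    and o.dbid = sn.dbid
    and o.instance_number = sn.instance_number
    and o.dbid = (select dbid from v$database)
    and o.instance_number = (select instance_number from v$instance)
    and o.stat_name in ('BUSY_TIME','IDLE_TIME'))
select to_char(begin_interval_time,'mm/dd/yy_hh24_mi') date_time,
       round(100 * sum(decode(stat_name,'BUSY_TIME', e_value - b_value, 0)) /
             nullif(sum(e_value - b_value),0), 1) host_cpu_pct
  from osstat
 where b_value is not null
 group by begin_interval_time
 order by begin_interval_time;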

Want to Know if the Execution Plan Changed Recently?

In my experience, in a significant percentage of cases sudden performance degradation occurs because the SQL execution plan for one or more key SQL queries changes. This usually shows up in cases where clients state that nothing has changed, there was no new code and no load change, and yet performance has degraded drastically and queries are performing poorly. Whenever I conduct a performance analysis and find that performance degraded because of one or a few SQL statements, I always try to find out why that particular SQL statement is consuming more logical I/O (usually along with increased physical I/O and CPU usage) compared to when it was running fine. The following SQL tells me when exactly the execution plan changed for a given SQL_ID.

sqlid_stat.sql

set lines 150 pages 150
col BEGIN_INTERVAL_TIME for a23
col PLAN_HASH_VALUE for 9999999999
col date_time for a18
col snap_id heading 'SnapId'
col executions_delta heading "No. of exec"
col sql_profile heading "SQL|Profile" for a7
col date_time heading 'Date time'
col avg_lio heading 'LIO/exec' for 99999999999.99
col avg_cputime heading 'CPUTIM/exec' for 9999999.99
col avg_etime heading 'ETIME/exec' for 9999999.99
col avg_pio heading 'PIO/exec' for 9999999.99
col avg_row heading 'ROWs/exec' for 9999999.99

SELECT distinct
  s.snap_id,
  PLAN_HASH_VALUE,
  to_char(s.BEGIN_INTERVAL_TIME,'mm/dd/yy_hh24mi')||
  to_char(s.END_INTERVAL_TIME,'_hh24mi') Date_Time,
  SQL.executions_delta,
  SQL.buffer_gets_delta/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_lio,
  --SQL.ccwait_delta,
  (SQL.cpu_time_delta/1000000)/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_cputime,
  (SQL.elapsed_time_delta/1000000)/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_etime,
  SQL.DISK_READS_DELTA/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_pio,
  SQL.rows_processed_total/decode(nvl(SQL.executions_delta,0),0,1,SQL.executions_delta) avg_row
  --,SQL.sql_profile
FROM
  dba_hist_sqlstat SQL,
  dba_hist_snapshot s
WHERE
  SQL.instance_number = (select instance_number from v$instance)
  and SQL.dbid = (select dbid from v$database)
  and s.snap_id = SQL.snap_id
  AND sql_id in ('&SQLID')
order by s.snap_id
/

Sample Output

SQL> @sqlid_stat.sql
Enter value for sqlid: 0pjnz23mbf3wm
old  23: ('&SQLID')
new  23: ('0pjnz23mbf3wm')

SnapId PLAN_HASH_VALUE Date time          No. of exec  LIO/exec CPUTIM/exec ETIME/exec PIO/exec ROWs/exec
------ --------------- ------------------ ----------- --------- ----------- ---------- -------- ---------
105152      1415312706 12/31/09_0450_0500        1277     14.40         .00        .00      .00      9.23
105459      1415312706 01/02/10_0800_0810         166     20.68         .00        .01      .34      1.00
105460      1415312706 01/02/10_0810_0820         444     11.24         .00        .00      .20       .97
105461      1415312706 01/02/10_0820_0830        1081     13.84         .00        .00      .21      1.18
105462      1415312706 01/02/10_0830_0840        1239     16.59         .00        .00      .13      2.03
105464      1415312706 01/02/10_0850_0900        1194     16.75         .00        .00      .09      3.10
105465      1415312706 01/02/10_0900_0910         610     16.87         .00        .00      .19      7.08
105466      1415312706 01/02/10_0910_0920         673      6.94         .00        .00      .15      6.70
105470      1415312706 01/02/10_0950_1000        1909     14.46         .00        .00      .12      3.20
105471      1415312706 01/02/10_1000_1010        2242     16.68         .00        .00      .16      3.72
105472      1415312706 01/02/10_1010_1020        3030     16.72         .00        .00      .08      3.75
105473      1415312706 01/02/10_1020_1030          51     16.98         .00        .00      .98    224.00
105689      1415312706 01/03/10_2220_2230        2000      3.30         .00        .00      .00       .00
105746      1415312706 01/04/10_0750_0800        1458      6.38         .00        .00      .00       .42
105850      1415312706 01/05/10_0110_0120          46     13.59         .00        .00      .00       .02
105851      1415312706 01/05/10_0120_0130         220      3.14         .00        .00      .00       .01
105853      1415312706 01/05/10_0140_0150           2    253.50         .05        .05      .00      1.00
105854      1415312706 01/05/10_0150_0200           9     17.00         .00        .00      .00      1.22
105855      1415312706 01/05/10_0200_0210        1002      3.13         .00        .00      .00       .02
105881      1415312706 01/05/10_0620_0630        1569     11.79         .00        .00      .00      1.19
105987       731370628 01/06/10_0000_0010         118  10933.76        1.32       1.32      .00       .03
105988       731370628 01/06/10_0010_0020         965  10978.74        1.33       1.33      .00       .03
105989       731370628 01/06/10_0020_0030        2658  10717.09        1.31       1.31      .00       .01
105990       731370628 01/06/10_0030_0040        4019  10831.99        1.31       1.34      .00       .01
105991       731370628 01/06/10_0040_0050        4577  10715.70        1.30       1.33      .00       .01
105992       731370628 01/06/10_0050_0100        3488  10891.26        1.33       1.34      .00       .01
105993       731370628 01/06/10_0100_0110        2709  10833.78        1.32       1.32      .00       .09

Note: The query's results are restricted by the AWR data retention policy.

As you can see, the execution plan changed (plan_hash_value 731370628 was captured instead of 1415312706 for that SQL_ID) starting just after midnight on 01/06, and the query's logical IO per execution jumped from the 10s into the 10,000 range. Higher logical IO per execution corresponds to higher CPU utilization and hence higher execution time. From this we know that the query started performing poorly because of the execution plan change, so with this SQL we have identified the cause of the performance degradation. The following query can be used to display the execution plan for both plan hash values.

xp_awr.sql

select plan_table_output from table(dbms_xplan.display_awr('&sql_id',null,null,'ADVANCED +PEEKED_BINDS'));
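The second argument of dbms_xplan.display_awr is the plan hash value, so if the SQL_ID has accumulated several plans you can also display each one on its own; a sketch using the two plan hash values from the output above:

select plan_table_output from table(dbms_xplan.display_awr('&sql_id',1415312706,null,'ADVANCED +PEEKED_BINDS'));
select plan_table_output from table(dbms_xplan.display_awr('&sql_id', 731370628,null,'ADVANCED +PEEKED_BINDS'));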

We should now ask the question, "Why did the plan change?" Usually the first suspect is the optimizer statistics. Statistics collection jobs generally run at night, so it is quite possible that statistics were gathered for some of the tables involved in the query but not for others. In this particular case that was indeed the reason, because multiple statistics collection jobs were running at the same time with different parameters. However, it could also have been a large data load into the tables involved in the query; this can be investigated further.
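One quick way to test the statistics hypothesis is to check when the optimizer statistics for the tables in the query last changed. DBA_TAB_STATS_HISTORY keeps this history (subject to its own retention); in the sketch below, &OWNER and the table names are placeholders for your own objects:

select table_name, partition_name, stats_update_time
  from dba_tab_stats_history
 where owner = upper('&OWNER')
   and table_name in (upper('&TABLE1'), upper('&TABLE2'))
 order by stats_update_time desc;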

In many cases, for a precise diagnosis, we will still need 10046 traces, or maybe ASH (Active Session History) to see what a session was doing in the past; it all depends on the problem.
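As a starting point for that kind of drill-down, the ASH samples retained in AWR (DBA_HIST_ACTIVE_SESS_HISTORY, also part of the Diagnostics Pack) can show what a SQL_ID or session was doing minute by minute; a sketch, reusing the &SQLID substitution variable from sqlid_stat.sql:

select trunc(sample_time,'MI') sample_minute,
       session_state,
       nvl(event,'ON CPU') event,
       count(*) samples          -- each retained AWR sample represents roughly 10 seconds
  from dba_hist_active_sess_history
 where sql_id = '&SQLID'
 group by trunc(sample_time,'MI'), session_state, nvl(event,'ON CPU')
 order by sample_minute, samples desc;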

Conclusion

AWR is a very powerful tool that can help with Oracle performance analysis. It provides history that can be analyzed for trends (although it is not the only tool for everything!). Overall, it is a wonderful tool to use and it really helps to solve critical performance issues. In this article, I listed a few SQL scripts that I have developed and use every day to trend performance statistics in order to spot and drill down into performance and load changes. Feel free to modify and extend these scripts for your own use, and do let me know if this helped.


About the Author

Kapil Goyal is an Oracle performance specialist at Fidelity Investments handling enterprise-wide database performance escalations. He is currently a principal database administrator on the database engineering team in his third year at Fidelity. His previous position, working directly for Oracle's consulting group, gave him exposure to many different companies' database performance challenges. Goyal is certified in all Oracle versions from 8i through 11g, is a frequent speaker at the Dallas Oracle User Group and has published several articles with DOUG and SELECT. For questions or clarification you may contact him via e-mail at [email protected].
