Mark Inman
U.S. Navy (Naval Sea Logistics Center)
Session #213
Analytic SQL for Beginners
Speaker Qualifications
• Mark Inman – IT Specialist – U.S. Navy
• Oracle Certified Professional Database Administrator 9i
• Gave a similar presentation to coworkers.
Background
• Analytic SQL was introduced in Oracle 8i.
• Where in Oracle Documentation?
  – Data Warehousing Guide
    • “SQL for Analysis” or “SQL for Analysis and Reporting”.
• Good for everyday use
Analytic Syntax
• Column List Only
• Form Of
  – Function
  – Partition Clause
  – Order by Clause
  – Windowing Clause
Analytic Universe
[Diagram. Legend: Non-Analytic, Available. Clauses: PARTITION BY, ORDER BY, WINDOWING. Function families: Ranking, Windowing Aggregate, Reporting Aggregate, RATIO_TO_REPORT, LAG/LEAD, FIRST/LAST, Linear Regression, Inverse Percentile.]
Objective 1
• To get aggregate and detail data in the same query – without selecting the same table twice.
Simple Example Analytic
select object_id
     , owner
     , max(object_id) over () as max_object_id /* ANALYTIC EXPRESSION */
from thing
where rownum <= 5;

 OBJECT_ID OWNER MAX_OBJECT_ID
---------- ----- -------------
        20 SYS              44
        44 SYS              44
        28 SYS              44
        15 SYS              44
        29 SYS              44
Simple Example Analytic Query Plan
Execution Plan
------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 WINDOW (BUFFER)
2 1 COUNT (STOPKEY)
3 2 TABLE ACCESS (FULL) OF 'THING' (TABLE)
Duplicate Non-Analytic Attempt 1
select object_id
     , owner
     , max(object_id)
from thing
where rownum <= 5
group by object_id;

     , owner
       *
ERROR at line 3:
ORA-00979: not a GROUP BY expression
Oops!
Duplicate Non-Analytic Attempt 2
select object_id
     , owner
     , max(object_id) max_object_id
from thing
where rownum <= 5
group by object_id
       , owner;

 OBJECT_ID OWNER MAX_OBJECT_ID
---------- ----- -------------
        15 SYS              15
        20 SYS              20
        28 SYS              28
        29 SYS              29
        44 SYS              44
Duplicate Non-Analytic Success in 3
select object_id
     , owner
     , z.max_object_id
from thing
   , ( select max(object_id) max_object_id
       from thing
       where rownum <= 5 ) z
where rownum <= 5;

 OBJECT_ID OWNER MAX_OBJECT_ID
---------- ----- -------------
        20 SYS              44
        44 SYS              44
        28 SYS              44
        15 SYS              44
        29 SYS              44
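The two approaches above can be compared side by side in a small, self-contained sketch. It is only an illustration: it uses Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) rather than Oracle, and the in-memory THING table is a hypothetical stand-in holding just the five rows shown on the slides, so no ROWNUM filter is needed.

```python
import sqlite3

# Hypothetical stand-in for the talk's THING table (five rows from the slide).
# Assumption: SQLite 3.25+ is used here in place of Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (object_id INTEGER, owner TEXT)")
conn.executemany(
    "INSERT INTO thing VALUES (?, ?)",
    [(20, "SYS"), (44, "SYS"), (28, "SYS"), (15, "SYS"), (29, "SYS")],
)

# Analytic form: THING appears once; the aggregate is repeated on every row.
analytic_rows = conn.execute(
    """
    SELECT object_id, owner, MAX(object_id) OVER () AS max_object_id
    FROM thing
    ORDER BY object_id
    """
).fetchall()

# Non-analytic form: THING appears twice (detail rows plus aggregate subquery).
joined_rows = conn.execute(
    """
    SELECT t.object_id, t.owner, z.max_object_id
    FROM thing t
    CROSS JOIN (SELECT MAX(object_id) AS max_object_id FROM thing) z
    ORDER BY t.object_id
    """
).fetchall()

for row in analytic_rows:
    print(row)
```

Both queries return the same five rows; the difference the talk measures is how many times the table has to be read to get them.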
Non-Analytic Query Plan
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 COUNT (STOPKEY)
2 1 NESTED LOOPS
3 2 VIEW
4 3 SORT (AGGREGATE)
5 4 COUNT (STOPKEY)
6 5 TABLE ACCESS (FULL) OF 'THING' (TABLE)
7 2 TABLE ACCESS (FULL) OF 'THING' (TABLE)
Two table scans – but so what – it is fast!
SET AUTOTRACE Statistics
• recursive calls
• db block gets
• consistent gets
• physical reads
• redo size
• bytes sent via SQL*Net to client
• bytes received via SQL*Net from client
• SQL*Net roundtrips to/from client
• sorts (memory)
• sorts (disk)
• rows processed
SET AUTOTRACE
• Emphasis on …
  – db block gets (current tkprof)
  – consistent gets (query tkprof)
  – sorts (memory)
  – table scans from “Execution Plan”
• No emphasis on …
  – physical reads
  – elapsed time (SET TIMING ON)
Statistics
Stat Analytic Non-Analytic
recursive calls 1 0
db block gets 0 0
consistent gets 4 9
physical reads 0 0
redo size 0 0
bytes sent … 605 605
bytes received … 508 508
SQL*Net roundtrips 2 2
sorts (memory) 1 0
sorts (disk) 0 0
SET AUTOTRACE
• Options
  – traceonly (no rows)
  – statistics
  – explain
• SQL*Plus command
• PLUSTRACE role required – not a default role
• @$ORACLE_HOME/sqlplus/admin/plustrce.sql
• Documentation
  – SQL*Plus® User's Guide and Reference
  – Effective Oracle by Design by Thomas Kyte
Statistics - Scaling
Rows   Non-Analytic   Non-Analytic   Analytic      Analytic
       Buffer Gets    Memory Sorts   Buffer Gets   Memory Sorts
5      9              0              4             1
50     12             0              4             1
500    51             0              9             1
all    4329           0              642           1
Statistics - Scaling
[Chart: Buffer Gets Versus Rows, linear buffer-gets axis. Rows: 5, 50, 500, 46254. Analytic: 4, 4, 9, 642. Non-Analytic: 9, 12, 51, 4329.]
Statistics - Scaling
[Chart: Buffer Gets Versus Rows, logarithmic buffer-gets axis (1 to 10000). Rows: 5, 50, 500, 46254. At 46254 rows, Analytic reaches 642 and Non-Analytic reaches 4329.]
Objective 1 - To get aggregate and detail data in the same query – without selecting the same table twice.
• Better Performance
• Scales Better
• Smaller Query (Fewer Lines of Code)
Objective 1 - To get aggregate and detail data in the same query – without selecting the same table twice.
MAX OVER ()
Reporting Aggregate Function
{SUM | AVG | MAX | MIN | COUNT | STDDEV | VARIANCE ... }
([ALL | DISTINCT] {value expression1 | *})
OVER ([PARTITION BY value expression2[,...]])
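The optional PARTITION BY in the syntax above can be illustrated with a short sketch. This is an assumption-laden example, not the talk's own: it runs on SQLite (3.25+) through Python's sqlite3 module, and the rows and the second owner (SH) are made up for illustration. An empty OVER () reports over all rows, while OVER (PARTITION BY owner) reports within each owner.

```python
import sqlite3

# Hypothetical data: a tiny THING table with two owners.
# Assumption: SQLite 3.25+ stands in for Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (object_id INTEGER, owner TEXT)")
conn.executemany(
    "INSERT INTO thing VALUES (?, ?)",
    [(20, "SYS"), (44, "SYS"), (28, "SYS"), (7, "SH"), (12, "SH")],
)

rows = conn.execute(
    """
    SELECT object_id,
           owner,
           MAX(object_id) OVER (PARTITION BY owner) AS max_in_owner,  -- per owner
           COUNT(*)       OVER ()                   AS total_rows     -- whole result
    FROM thing
    ORDER BY owner, object_id
    """
).fetchall()

for row in rows:
    print(row)
```

Every detail row is still returned; the reporting aggregates are simply attached to each one.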
Objective 2
• To compare traditional ranking and analytic ranking and show why analytic ranking is better.
Non-Analytic Top-1 Query
select owner
     , object_name
     , object_type
     , last_ddl_time
     , rownum
from ( select *
       from minman_dba.thing
       ORDER BY LAST_DDL_TIME ASC NULLS FIRST )
where rownum = 1;

Analytic Top-1 Query
select owner
     , object_name
     , subobject_name
     , object_type
     , MY_ROWNUM
from ( select x.*
            , ROW_NUMBER() OVER ( ORDER BY LAST_DDL_TIME ASC NULLS FIRST ) AS MY_ROWNUM
       from minman_dba.thing x )
where MY_ROWNUM = 1;
Top-1 Queries
Analytic (MY_ROWNUM: Alias in Column List)
select owner
     , object_name
     , subobject_name
     , object_type
     , MY_ROWNUM
from ( select x.*
            , ROW_NUMBER() OVER ( ORDER BY LAST_DDL_TIME ASC NULLS FIRST ) AS MY_ROWNUM
       from minman_dba.thing x )
where MY_ROWNUM = 1;

Non-Analytic (ROWNUM: SQL Keyword)
select owner
     , object_name
     , object_type
     , last_ddl_time
     , ROWNUM
from ( select *
       from minman_dba.thing
       ORDER BY LAST_DDL_TIME ASC NULLS FIRST )
where ROWNUM = 1;
Top-1 Queries - Plans
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 COUNT (STOPKEY)
2 1 VIEW
3 2 SORT (ORDER BY STOPKEY)
4 3 TABLE ACCESS (FULL) OF 'THING' (TABLE)

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 VIEW
2 1 WINDOW (SORT PUSHED RANK)
3 2 TABLE ACCESS (FULL) OF 'THING' (TABLE)
Top-1 Queries - Statistics
Statistics
----------------------------------------------------------
0 db block gets
642 consistent gets
0 physical reads
1 sorts (memory)
0 sorts (disk)

Statistics
----------------------------------------------------------
0 db block gets
642 consistent gets
0 physical reads
1 sorts (memory)
0 sorts (disk)
Top-1 Queries – Two Object Types – Non-Analytic
select owner
     , object_name
     , object_type
     , last_ddl_time
     , rownum
from ( select *
       from minman_dba.thing
       where object_type = 'TABLE'
       order by last_ddl_time )
where rownum = 1
union all …
select owner
     , object_name
     , object_type
     , last_ddl_time
     , rownum
from ( select *
       from minman_dba.thing
       where object_type = 'PROCEDURE'
       order by last_ddl_time )
where rownum = 1

Top-1 Queries – Two Object Types – Non-Analytic
OWNER      OBJECT_NAME  OBJECT_TYP LAST_DDL_     ROWNUM
---------- ------------ ---------- --------- ----------
SYS        UNDO$        TABLE      03-FEB-06          1
SYS        PSTUBT       PROCEDURE  03-FEB-06          1
Top-1 Queries – Two Object Types – Non-Analytic
select y.owner
     , y.object_name
     , y.object_type
     , y.last_ddl_time
from ( select object_type
            , min(last_ddl_time) min_last_ddl_time
       from minman_dba.thing
       where object_type in ('TABLE','PROCEDURE')
       group by object_type ) x
     inner join minman_dba.thing y
     on x.min_last_ddl_time = y.last_ddl_time;
30 rows selected
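One plausible reason the join-back query returns 30 rows instead of 2 is that it matches every row sharing a minimum timestamp, including ties and rows of other object types, since the join condition compares only LAST_DDL_TIME. The sketch below reproduces that effect on made-up data; it is an assumption about the cause, not something the slides state, and it runs on SQLite (3.25+) via Python's sqlite3 module rather than Oracle.

```python
import sqlite3

# Hypothetical data: two tables tie for the oldest timestamp, and an index
# (a type outside the filter) happens to share that same timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thing (object_name TEXT, object_type TEXT, last_ddl_time TEXT)"
)
conn.executemany(
    "INSERT INTO thing VALUES (?, ?, ?)",
    [
        ("T1", "TABLE", "2006-02-03"),
        ("T2", "TABLE", "2006-02-03"),
        ("T3", "TABLE", "2007-02-25"),
        ("P1", "PROCEDURE", "2006-02-03"),
        ("I1", "INDEX", "2006-02-03"),
    ],
)

# Join-back form: joins on the timestamp only, as in the slide's query.
join_back = conn.execute(
    """
    SELECT y.object_name, y.object_type
    FROM ( SELECT object_type, MIN(last_ddl_time) AS min_last_ddl_time
           FROM thing
           WHERE object_type IN ('TABLE', 'PROCEDURE')
           GROUP BY object_type ) x
    INNER JOIN thing y ON x.min_last_ddl_time = y.last_ddl_time
    """
).fetchall()

# Analytic form: exactly one row per object type, whatever the ties.
analytic = conn.execute(
    """
    SELECT object_name, object_type
    FROM ( SELECT t.*,
                  ROW_NUMBER() OVER (PARTITION BY object_type
                                     ORDER BY last_ddl_time) AS my_rownum
           FROM thing t
           WHERE object_type IN ('TABLE', 'PROCEDURE') )
    WHERE my_rownum = 1
    """
).fetchall()

print(len(join_back), len(analytic))
```

Here the two grouped rows each match all four objects stamped 2006-02-03, so the join-back form returns eight rows while the analytic form returns two.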
Top-1 Queries – Two Object Types - Analytic
select owner
     , object_name
     , object_type
     , last_ddl_time
     , my_rownum
from ( select t.*
            , row_number() over ( PARTITION BY OBJECT_TYPE order by last_ddl_time ) my_rownum
       from minman_dba.thing t
       WHERE OBJECT_TYPE IN ('TABLE','PROCEDURE') )
where my_rownum = 1

Top-1 Queries – Two Object Types - Analytic
OWNER      OBJECT_NAME  OBJECT_TYP LAST_DDL_     ROWNUM
---------- ------------ ---------- --------- ----------
SYS        UNDO$        TABLE      03-FEB-06          1
SYS        PSTUBT       PROCEDURE  03-FEB-06          1
Top-1 Queries – Two Object Types – Query Plans
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 UNION-ALL
2 1 COUNT (STOPKEY)
3 2 VIEW
4 3 SORT (ORDER BY STOPKEY)
5 4 TABLE ACCESS (FULL) OF 'THING' (TABLE)
6 1 COUNT (STOPKEY)
7 6 VIEW
8 7 SORT (ORDER BY STOPKEY)
9 8 TABLE ACCESS (FULL) OF 'THING' (TABLE)

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 VIEW
2 1 WINDOW (SORT PUSHED RANK)
3 2 TABLE ACCESS (FULL) OF 'THING' (TABLE)

Top-1 Queries – Two Object Types – Statistics
Statistics
----------------------------------------------------------
0 db block gets
1284 consistent gets
0 physical reads
2 sorts (memory)
0 sorts (disk)

Statistics
----------------------------------------------------------
0 db block gets
642 consistent gets
0 physical reads
1 sorts (memory)
0 sorts (disk)
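The equivalence of the UNION ALL form and the PARTITION BY form can be checked with a small sketch. It makes several assumptions: SQLite (3.25+) through Python's sqlite3 module stands in for Oracle, LIMIT 1 stands in for the ROWNUM = 1 filter, and the object names and timestamps are invented, with distinct timestamps so that the result is deterministic.

```python
import sqlite3

# Hypothetical data with distinct timestamps per object.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE thing (owner TEXT, object_name TEXT,"
    " object_type TEXT, last_ddl_time TEXT)"
)
conn.executemany(
    "INSERT INTO thing VALUES (?, ?, ?, ?)",
    [
        ("SYS", "T_OLD", "TABLE", "2006-02-03 21:25:36"),
        ("SYS", "T_NEW", "TABLE", "2007-02-25 10:00:00"),
        ("SYS", "P_OLD", "PROCEDURE", "2006-02-03 21:26:15"),
        ("SYS", "P_NEW", "PROCEDURE", "2007-02-25 11:00:00"),
        ("SYS", "I_ONLY", "INDEX", "2006-02-03 21:20:00"),
    ],
)

# One branch (and one read of THING) per object type.
union_rows = conn.execute(
    """
    SELECT * FROM ( SELECT owner, object_name, object_type, last_ddl_time
                    FROM thing WHERE object_type = 'TABLE'
                    ORDER BY last_ddl_time LIMIT 1 )
    UNION ALL
    SELECT * FROM ( SELECT owner, object_name, object_type, last_ddl_time
                    FROM thing WHERE object_type = 'PROCEDURE'
                    ORDER BY last_ddl_time LIMIT 1 )
    """
).fetchall()

# A single query; PARTITION BY restarts the numbering for each object type.
partition_rows = conn.execute(
    """
    SELECT owner, object_name, object_type, last_ddl_time
    FROM ( SELECT t.*,
                  ROW_NUMBER() OVER (PARTITION BY object_type
                                     ORDER BY last_ddl_time) AS my_rownum
           FROM thing t
           WHERE object_type IN ('TABLE', 'PROCEDURE') )
    WHERE my_rownum = 1
    """
).fetchall()

print(sorted(union_rows) == sorted(partition_rows))
```

Adding a third object type means another UNION ALL branch in the first form, but only another value in the IN list (or no filter at all) in the second.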
Top-1 Queries – All Object Types - Analytic
select owner
     , object_name
     , object_type
     , last_ddl_time
     , my_rownum
from ( select t.*
            , row_number() over ( PARTITION BY OBJECT_TYPE order by last_ddl_time ) my_rownum
       from minman_dba.thing t )
where my_rownum = 1

Top-1 Queries – All Object Types - Analytic
OWNER  OBJECT_NAME              OBJECT_TYPE         LAST_DDL_  MY_ROWNUM
------ ------------------------ ------------------- --------- ----------
SYS    C_OBJ#                   CLUSTER             03-FEB-06          1
SYS    LOW_GROUP                CONSUMER GROUP      03-FEB-06          1
SYS    REGISTRY$CTX             CONTEXT             03-FEB-06          1
SH     CUSTOMERS_DIM            DIMENSION           25-FEB-07          1
SYS    DATA_FILE_DIR            DIRECTORY           25-FEB-07          1
SYS    AQ$_SCHEDULER$_JOBQTAB_V EVALUATION CONTEXT  03-FEB-06          1
SYS    GETTVOID                 FUNCTION            03-FEB-06          1
SYS    I_OBJ#                   INDEX               03-FEB-06          1
SYSTEM LOGMNRC_GTCS_PK          INDEX PARTITION     03-FEB-06          1
EXFSYS EXPFILTER                INDEXTYPE           03-FEB-06          1
SYS    /cc11c9d8_SerialVerFrame JAVA CLASS          03-FEB-06          1
… we are not showing the full result
RANK and DENSE_RANK
select owner
     , object_name
     , object_type
     , last_ddl_time
     , rn, r, dr
from ( select t.*
            , row_number() over (partition by object_type order by last_ddl_time) RN
            , rank() over (partition by object_type order by last_ddl_time) R
            , dense_rank() over (partition by object_type order by last_ddl_time) DR
       from minman_dba.thing t
       where object_Type in ('PROCEDURE') )
where rn between 1 and 10;
RANK and DENSE_RANK
owner object_name             last ddl time    RN  R DR
SYS   PSTUBT                  20060203 212536   1  1  1
SYS   PSTUB                   20060203 212536   2  1  1
SYS   SUBPTXT2                20060203 212536   3  1  1
SYS   SUBPTXT                 20060203 212536   4  1  1
SYS   ODCIINDEXINFOFLAGSDUMP  20060203 212615   5  5  2
SYS   ODCIINDEXINFODUMP       20060203 212615   6  5  2
SYS   ODCIPREDINFODUMP        20060203 212615   7  5  2
SYS   ODCIQUERYINFODUMP       20060203 212615   8  5  2
SYS   ODCICOLINFODUMP         20060203 212615   9  5  2
SYS   ODCISTATSOPTIONSDUMP    20060203 212615  10  5  2
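The way the three functions treat ties, visible in the table above, can be reproduced on a handful of rows. The sketch below is illustrative only: it uses SQLite (3.25+) through Python's sqlite3 module instead of Oracle, and the object names and integer timestamps are hypothetical.

```python
import sqlite3

# Hypothetical data: two rows tie at time 10, three tie at 20, one at 30.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (object_name TEXT, last_ddl_time INTEGER)")
conn.executemany(
    "INSERT INTO thing VALUES (?, ?)",
    [("P1", 10), ("P2", 10), ("P3", 20), ("P4", 20), ("P5", 20), ("P6", 30)],
)

rows = conn.execute(
    """
    SELECT last_ddl_time, rn, r, dr
    FROM ( SELECT t.*,
                  ROW_NUMBER() OVER (ORDER BY last_ddl_time) AS rn,  -- always unique
                  RANK()       OVER (ORDER BY last_ddl_time) AS r,   -- ties share, then skip
                  DENSE_RANK() OVER (ORDER BY last_ddl_time) AS dr   -- ties share, no gaps
           FROM thing t )
    ORDER BY rn
    """
).fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER breaks ties arbitrarily, so which tied object receives which number is not guaranteed; RANK leaves gaps after a tie (1, 1, 3), and DENSE_RANK does not (1, 1, 2).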
Objective 2- To compare traditional ranking and analytic ranking and show why analytic ranking is better.
• Better Performance
• Scales Better
• Smaller Query (Fewer Lines of Code)
• PARTITION BY
Objective 2- To compare traditional ranking and analytic ranking and show why analytic ranking is better.
ROW_NUMBER ( ) OVER ( [query_partition_clause] order_by_clause )
RANK ( ) OVER ( [query_partition_clause] order_by_clause )
DENSE_RANK ( ) OVER ( [query_partition_clause] order_by_clause )
Objective 3 - To show additional flexibility of analytic expressions.
create table another_thing
( first_col char(1)
, second_col number )
/

insert into another_thing values ('A',34897324123);
insert into another_thing values ('A',57864511343);
insert into another_thing values ('A',324863274233243);
insert into another_thing values ('A',178234387613423);
insert into another_thing values ('B',433298473219854);
insert into another_thing values ('B',34231);
insert into another_thing values ('B',34093487);
Additional Flexibility
select first_col, second_col
     , row_number() over (partition by first_col order by second_col asc) my_1st_rownum
from another_thing;

F SECOND_COL MY_1ST_ROWNUM
- ---------- -------------
A 3.4897E+10             1
A 5.7865E+10             2
A 1.7823E+14             3
A 3.2486E+14             4
B      34231             1
B   34093487             2
B 4.3330E+14             3
Additional Flexibility
select first_col, second_col
     , row_number() over (partition by first_col order by second_col asc) my_1st_rownum
     , ROW_NUMBER() OVER (PARTITION BY FIRST_COL ORDER BY SECOND_COL DESC) MY_2ND_ROWNUM
from another_thing;

F SECOND_COL MY_1ST_ROWNUM MY_2ND_ROWNUM
- ---------- ------------- -------------
A 3.4897E+10             1             4
A 5.7865E+10             2             3
A 1.7823E+14             3             2
A 3.2486E+14             4             1
B      34231             1             3
B   34093487             2             2
B 4.3330E+14             3             1
Additional Flexibility
select first_col, second_col
     , row_number() over (partition by first_col order by second_col) my_1st_rownum
     , row_number() over (partition by first_col order by second_col desc) my_2nd_rownum
     , ROW_NUMBER() OVER ( PARTITION BY FIRST_COL ORDER BY MOD(SECOND_COL,10) ASC NULLS FIRST ) MY_3RD_ROWNUM
from another_thing;

Additional Flexibility
first col  second col  my 1st rownum  my 2nd rownum  my 3rd rownum
A          3.4897E+10              1              4              1
A          5.7865E+10              2              3              2
A          1.7823E+14              3              2              4
A          3.2486E+14              4              1              3
B               34231              1              3              1
B            34093487              2              2              3
B          4.3330E+14              3              1              2
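The three orderings above can be re-run on the same seven rows as a rough check. This sketch assumes SQLite (3.25+) through Python's sqlite3 module in place of Oracle, and uses the % operator where the slide uses MOD. One detail worth noticing: all four 'A' values end in the digit 3, so the third ordering is a four-way tie within 'A' and the numbering it produces there is arbitrary; only the 'B' rows have a well-defined third ranking.

```python
import sqlite3

# Same rows as the talk's ANOTHER_THING table.
# Assumption: SQLite 3.25+ stands in for Oracle; % replaces MOD().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE another_thing (first_col TEXT, second_col INTEGER)")
conn.executemany(
    "INSERT INTO another_thing VALUES (?, ?)",
    [
        ("A", 34897324123),
        ("A", 57864511343),
        ("A", 324863274233243),
        ("A", 178234387613423),
        ("B", 433298473219854),
        ("B", 34231),
        ("B", 34093487),
    ],
)

# Three independent orderings computed in a single SELECT.
rows = conn.execute(
    """
    SELECT first_col,
           second_col,
           ROW_NUMBER() OVER (PARTITION BY first_col ORDER BY second_col ASC)  AS my_1st_rownum,
           ROW_NUMBER() OVER (PARTITION BY first_col ORDER BY second_col DESC) AS my_2nd_rownum,
           ROW_NUMBER() OVER (PARTITION BY first_col ORDER BY second_col % 10) AS my_3rd_rownum
    FROM another_thing
    ORDER BY first_col, second_col
    """
).fetchall()

# Last digits of the 'A' values: a single value means the third ordering ties.
last_digits_a = {second_col % 10 for first_col, second_col, *_ in rows if first_col == "A"}

for row in rows:
    print(row)
print(last_digits_a)
```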
Additional Flexibility – Query Plan
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 WINDOW (SORT)
2 1 WINDOW (SORT)
3 2 WINDOW (SORT)
4 3 TABLE ACCESS (FULL) OF 'ANOTHER_THING'
Additional Flexibility
select first_col, second_col
     , row_number() over (partition by first_col order by second_col) my_1st_rownum
     , row_number() over (partition by first_col order by second_col desc) my_2nd_rownum
     , row_number() over ( partition by first_col order by mod(second_col,10) asc nulls first ) my_3rd_rownum
from another_thing
order by second_col;

Additional Flexibility
first col  second col  my 1st rownum  my 2nd rownum  my 3rd rownum
B               34231              1              3              1
B            34093487              2              2              3
A          3.4897E+10              1              4              1
A          5.7865E+10              2              3              2
A          1.7823E+14              3              2              4
A          3.2486E+14              4              1              3
B          4.3330E+14              3              1              2
Additional Flexibility – Query Plan
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS
1 0 SORT (ORDER BY)
2 1 WINDOW (SORT)
3 2 WINDOW (SORT)
4 3 WINDOW (SORT)
5 4 TABLE ACCESS (FULL) OF 'ANOTHER_THING' (TABLE)
Objective 3 - To show additional flexibility of analytic expressions.
• Better Performance
• Scales Better
• Smaller Query (Fewer Lines of Code)
• PARTITION BY
• Multiple ORDER BY – Single Select
• Multiple PARTITION – Single Select
Items Learned in this Session
• To get aggregate and detail data in the same query – without selecting the same table twice.
• To compare traditional ranking and analytic ranking and show why analytic ranking is better.
• To show additional flexibility of analytic expressions.
Questions?
Thank You
• Please fill out the evaluation.
• Speaker: Mark Inman
• Session Name: Analytic SQL for Beginners
• Session Number: 213
Mark Inman