advanced row pattern matching

Posted on 19-Jan-2017

Meet Your Match: Advanced row pattern matching (12c)

Stew Ashton (stewashton.wordpress.com), UKOUG Tech 2016

Can you read the following line? If not, please move closer.

It's much better when you can read the code ;)

Advanced usage, not all the syntax
• Reminder of the basics
• Warmup exercises
• Bin fitting
• Positive and negative sequencing
• Hierarchical summaries
• Alternatives to joining

Who am I?
• 36 years in IT
  – Developer, Technical Sales Engineer, Technical Architect
  – Aeronautics, IBM, Finance
  – Mainframe, client-server, Web apps
• 12 years using Oracle database
  – SQL performance analysis
  – Replace Java with SQL
• 4 years as in-house “Oracle Development Expert”
• Conference speaker since 2014
• Currently independent

Questions

Reminder: the Basics

• To illustrate: table with PAGE column
  – Group consecutive pages together

Input:

PAGE
   1
   2
   3
   5

Output:

FIRSTPAGE LASTPAGE CNT
        1        3   3
        5        5   1

Pattern and Matching Rows
• PATTERN
  – Uninterrupted series of input rows
  – Described as list of conditions (≅ “regular expressions”)

  PATTERN (A B*)
  "A" : 1 row, "B*" : 0 or more rows, as many as possible

• DEFINE (at least one) row condition
  [A undefined = TRUE]
  B AS page = PREV(page)+1

• Each series that matches the pattern is a “match”
  – "A" and "B" identify the rows that meet their conditions
  – There can be unmatched rows between series

Input, Processing, Output

1. Define input
2. Order input
3. Process pattern
4. using defined conditions
5. Output: rows per match
6. Output: columns per row
7. Go where after match?

SELECT *
FROM t                                  -- 1. define input
MATCH_RECOGNIZE (
  ORDER BY page                         -- 2. order input
  MEASURES A.page AS firstpage,         -- 6. output: columns per row
           LAST(page) AS lastpage,
           COUNT(*) AS cnt
  ONE ROW PER MATCH                     -- 5. output: rows per match
  AFTER MATCH SKIP PAST LAST ROW        -- 7. go where after match?
  PATTERN (A B*)                        -- 3. process pattern
  DEFINE B AS page = PREV(page)+1       -- 4. using defined conditions
);
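To try the query above, the four-row PAGE table from the "Reminder: the Basics" slide can be recreated first. This is only a minimal setup sketch: the table name t and the column name page come from the slides, while the NUMBER datatype and the NOT NULL constraint are assumptions.

```sql
-- Hypothetical setup for the PAGE example (datatype and constraint are assumed).
create table t (page number not null);

insert into t (page) values (1);
insert into t (page) values (2);
insert into t (page) values (3);
insert into t (page) values (5);
commit;
```

With these rows in place, the MATCH_RECOGNIZE query should return the two groups shown on the slide: pages 1 to 3 with CNT 3, and page 5 alone with CNT 1.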

            DEFINE                 ALL ROWS PER MATCH                 ONE ROW PER MATCH
pg  id   first current last   first current last final last   first current last final last
 1  A      1      1     1       1      1     1       3
 2  B      1      2     2       1      2     2       3
 3  B      1      3     3       1      3     3       3           1      3     3       3
 5  B?     1      5     5

Which row do we mean?
Column name by itself = « current » row
• DEFINE: row being evaluated; ALL ROWS: each row; ONE ROW: last row
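The difference between the "last" and "final last" columns can be seen directly with ALL ROWS PER MATCH. The following is a sketch, assuming t is the four-row PAGE table from the basics example; the measure names cls, first_pg, running_last and final_last are made up for illustration.

```sql
select page, cls, first_pg, running_last, final_last
from t
match_recognize (
  order by page
  measures
    classifier()     as cls,           -- which pattern variable the row matched
    first(page)      as first_pg,      -- first row of the match so far
    last(page)       as running_last,  -- RUNNING semantics: last row seen so far
    final last(page) as final_last     -- FINAL semantics: last row of the whole match
  all rows per match
  pattern (a b*)
  define b as page = prev(page) + 1
);
```

With ONE ROW PER MATCH only the last row of each match is returned, so the running and final values coincide there.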

Warming up: What output from this?

CUST_ID TX_DATE    DESCR
C001    2016-01-01 Inquiry
C001    2016-01-01 Inquiry
C001    2016-01-10 Sales
C001    2016-01-21 Repeat Inquiry
C001    2016-02-10 Repeat Inquiry
C001    2016-05-01 Sales
C001    2016-05-06 Sales
C001    2016-06-10 Inquiry 1
C001    2016-09-01 Inquiry 2
C002    2016-02-01 Inquiry 1
C002    2016-02-25 Inquiry 2
C003    2016-02-01 Inquiry 2
C003    2016-02-10 Sales
C003    2016-02-10 Sales
C003    2016-03-10 Inquiry 2
C004    2016-04-15 Sales

select * from t
match_recognize(
  all rows per match
  pattern (a*)
  define a as 1=1
);
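The warm-up and the sequencing queries that follow can be run against a copy of the slide data. This is a setup sketch only: the table and column names and the 16 rows are taken from the slide, but the datatypes and lengths are assumptions.

```sql
-- Hypothetical setup for the transaction example (datatypes are assumed).
-- If a table named t already exists (for example the PAGE demo), drop it first.
create table t (
  cust_id varchar2(4)  not null,
  tx_date date         not null,
  descr   varchar2(20) not null
);

insert into t values ('C001', date '2016-01-01', 'Inquiry');
insert into t values ('C001', date '2016-01-01', 'Inquiry');
insert into t values ('C001', date '2016-01-10', 'Sales');
insert into t values ('C001', date '2016-01-21', 'Repeat Inquiry');
insert into t values ('C001', date '2016-02-10', 'Repeat Inquiry');
insert into t values ('C001', date '2016-05-01', 'Sales');
insert into t values ('C001', date '2016-05-06', 'Sales');
insert into t values ('C001', date '2016-06-10', 'Inquiry 1');
insert into t values ('C001', date '2016-09-01', 'Inquiry 2');
insert into t values ('C002', date '2016-02-01', 'Inquiry 1');
insert into t values ('C002', date '2016-02-25', 'Inquiry 2');
insert into t values ('C003', date '2016-02-01', 'Inquiry 2');
insert into t values ('C003', date '2016-02-10', 'Sales');
insert into t values ('C003', date '2016-02-10', 'Sales');
insert into t values ('C003', date '2016-03-10', 'Inquiry 2');
insert into t values ('C004', date '2016-04-15', 'Sales');
commit;
```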


Add sequence number, starting over after 40 days

CUST_ID TX_DATE    DESCR          SEQ
C001    2016-01-01 Inquiry          1
C001    2016-01-01 Inquiry          2
C001    2016-01-10 Sales            3
C001    2016-01-21 Repeat Inquiry   4
C001    2016-02-10 Repeat Inquiry   5
C001    2016-05-01 Sales            1
C001    2016-05-06 Sales            2
C001    2016-06-10 Inquiry 1        3
C001    2016-09-01 Inquiry 2        1
C002    2016-02-01 Inquiry 1        1
C002    2016-02-25 Inquiry 2        2
C003    2016-02-01 Inquiry 2        1
C003    2016-02-10 Sales            2
C003    2016-02-10 Sales            3
C003    2016-03-10 Inquiry 2        4
C004    2016-04-15 Sales            1

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) as seq
  all rows per match
  pattern (a*)
  define a as tx_date <= first(tx_date) + 40
);


Sequence starts from Sale, Inquiry outside 40 days = 0

CUST_ID TX_DATE    DESCR          SEQ
C001    2016-01-01 Inquiry          0
C001    2016-01-01 Inquiry          0
C001    2016-01-10 Sales            1
C001    2016-01-21 Repeat Inquiry   2
C001    2016-02-10 Repeat Inquiry   3
C001    2016-05-01 Sales            1
C001    2016-05-06 Sales            2
C001    2016-06-10 Inquiry 1        3
C001    2016-09-01 Inquiry 2        0
C002    2016-02-01 Inquiry 1        0
C002    2016-02-25 Inquiry 2        0
C003    2016-02-01 Inquiry 2        0
C003    2016-02-10 Sales            1
C003    2016-02-10 Sales            2
C003    2016-03-10 Inquiry 2        3
C004    2016-04-15 Sales            1

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures count(*) - count(inq.*) as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);


Negative sequence for Inquiries within 10 days prior to Sale

CUST_ID TX_DATE    DESCR          SEQ
C001    2016-01-01 Inquiry         -2
C001    2016-01-01 Inquiry         -1
C001    2016-01-10 Sales            1
C001    2016-01-21 Repeat Inquiry   2
C001    2016-02-10 Repeat Inquiry   3
C001    2016-05-01 Sales            1
C001    2016-05-06 Sales            2
C001    2016-06-10 Inquiry 1        3
C001    2016-09-01 Inquiry 2        0
C002    2016-02-01 Inquiry 1        0
C002    2016-02-25 Inquiry 2        0
C003    2016-02-01 Inquiry 2       -1
C003    2016-02-10 Sales            1
C003    2016-02-10 Sales            2
C003    2016-03-10 Inquiry 2        3
C004    2016-04-15 Sales            1

select * from t
match_recognize(
  partition by cust_id
  order by tx_date, descr
  measures
    case
      when classifier() = 'INQ'
       and tx_date >= final first(sale1.tx_date) - 10
      then count(inq.*) - final count(inq.*) - 1
      else count(*) - count(inq.*)
    end as seq
  all rows per match
  pattern (inq* sale1{0,1} more_tx*)
  define inq as descr != 'Sales',
         sale1 as descr = 'Sales',
         more_tx as tx_date <= sale1.tx_date + 40
);

Hierarchical Summary: get salaries of mgr + subordinates

select level lvl, ename, sal
from scott.emp
start with mgr is null
connect by mgr = prior empno;

LVL ENAME   SAL
  1 KING   5000
  2 JONES  2975
  3 SCOTT  3000
  4 ADAMS  1100
  3 FORD   3000
  4 SMITH   800
  2 BLAKE  2850
  3 ALLEN  1600
  3 WARD   1250
  3 MARTIN 1250
  3 TURNER 1500
  3 JAMES   950
  2 CLARK  2450
  3 MILLER 1300

select * from (
  select level lvl, ename, sal
  from scott.emp
  start with mgr is null
  connect by mgr = prior empno
)
match_recognize(
  measures a.lvl lvl, a.ename ename, a.sal sal, sum(sal) as sum_sal
  after match skip past last row   -- the default: same result when omitted
  pattern(a b*)
  define b as lvl > a.lvl
);

LVL ENAME  SAL SUM_SAL
  1 KING  5000   29025

select * from (
  select level lvl, ename, sal
  from scott.emp
  start with mgr is null
  connect by mgr = prior empno
)
match_recognize(
  measures a.lvl lvl, a.ename ename, a.sal sal, sum(sal) as sum_sal
  after match skip to next row
  pattern(a b*)
  define b as lvl > a.lvl
);

LVL ENAME   SAL SUM_SAL
  1 KING   5000   29025
  2 JONES  2975   10875
  3 SCOTT  3000    4100
  4 ADAMS  1100    1100
  3 FORD   3000    3800
  4 SMITH   800     800
  2 BLAKE  2850    9400
  3 ALLEN  1600    1600
  3 WARD   1250    1250
  3 MARTIN 1250    1250
  3 TURNER 1500    1500
  3 JAMES   950     950
  2 CLARK  2450    3750
  3 MILLER 1300    1300

http://www.kibeha.dk/2015/07/row-pattern-matching-nested-within.html

Anchors and Alternation
• Anchors
  – ^ matches the position before the first row in the partition.
  – $ matches the position after the last row in the partition.

  PATTERN (^ A $) = partition must have 1 row

• Alternation: | means OR
  – "Alternatives are preferred in the order they are specified."

  PATTERN ( A | B ) = If A condition is true then A, else if B condition is true then B
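The preference order of alternatives can be illustrated with a small sketch. The inline data (numbers 1 to 5 generated from DUAL) and the names n and cls are invented for this example and are not from the talk.

```sql
select n, cls
from (
  -- hypothetical input: the integers 1..5
  select level as n from dual connect by level <= 5
)
match_recognize (
  order by n
  measures classifier() as cls
  all rows per match
  pattern (a | b)
  define a as mod(n, 2) = 0
  -- b is left undefined, so its condition is always TRUE
);
```

Each match is a single row. Because A is listed first, even values of n should be classified as A; B is only tried when the A condition fails, so odd values should come back as B.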

JOIN alternative: CDC compare

T1
PK VAL
 1 Same value
 2 Delete this
 3 Old value

T2
PK VAL
 1 Same value
 3 New value
 4 Insert this

select pk, op, val, oldrid from (
  select pk, val, rowid rid from t1
  union all
  select pk, val, null from t2
)
match_recognize(
  partition by pk
  order by rid
  measures classifier() op, first(rid) oldrid
  all rows per match
  pattern(^ D $ | ^ I $ | (^ O U $) )
  define D as rid is not null,
         U as decode(O.val, val, 0, 1) = 1
);

PK OP VAL         OLDRID
 2 D  Delete this AAAkdlAAH…MAAB
 3 O  Old value   AAAkdlAAH…MAAC
 3 U  New value   AAAkdlAAH…MAAC
 4 I  Insert this

(Almost) All Rows per Match

• PATTERN ( A {- B A -} B)
  – The parts of the pattern enclosed between {- and -} are excluded from the output.
  – Here only two rows per match will be returned
  – More granular than using a WHERE clause
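As a sketch of the exclusion syntax, the query below keeps only the first and last row of each run of consecutive pages. The inline page list and the names cls, a, b and c are assumptions chosen for illustration; the pattern is a variation on the one in the slide, not the author's example.

```sql
select page, cls
from (
  -- hypothetical input: two runs of consecutive pages, 1-3 and 5-8
  select column_value as page
  from table(sys.odcinumberlist(1, 2, 3, 5, 6, 7, 8))
)
match_recognize (
  order by page
  measures classifier() as cls
  all rows per match
  pattern (a {- b* -} c)              -- rows matched by b are not output
  define b as page = prev(page) + 1,
         c as page = prev(page) + 1
);
```

The greedy b* first consumes every consecutive page and then gives back one row so that c can match, which means each match should return just two rows: the start of the run as A and the end of the run as C. A WHERE clause could not express this, because the middle rows still have to take part in the match.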

Avoid Inequality joins

> create table t(dte not null) as
  select sysdate + level
  from dual connect by level <= 10000;

> create table u(start_dte, end_dte) as
  select dte, dte+1/4 from t;

> select count(*) from t, u
  where t.dte = u.start_dte;

Elapsed: 00:00:00.039

> select count(*) from t, u
  where t.dte between u.start_dte and u.end_dte;

Elapsed: 00:00:09.132

Exadata?
All data in buffer cache
Elapsed: 00:00:09.132

InMemory?
Elapsed: 00:00:07.021

Equality join:

------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |      1 |00:00:00.01 |      42 |
|   1 |  SORT AGGREGATE         |      |      1 |      1 |      1 |00:00:00.01 |      42 |
|*  2 |   HASH JOIN             |      |      1 |  10000 |  10000 |00:00:00.01 |      42 |
|   3 |    TABLE ACCESS FULL    | T    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
|   4 |    TABLE ACCESS FULL    | U    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
------------------------------------------------------------------------------------------

BETWEEN join:

------------------------------------------------------------------------------------------
| Id  | Operation               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |      1 |        |      1 |00:00:14.02 |      42 |
|   1 |  SORT AGGREGATE         |      |      1 |      1 |      1 |00:00:14.02 |      42 |
|   2 |   MERGE JOIN            |      |      1 |  2500K |  10000 |00:00:13.93 |      42 |
|   3 |    SORT JOIN            |      |      1 |  10000 |  10000 |00:00:00.01 |      21 |
|   4 |     TABLE ACCESS FULL   | T    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
|*  5 |    FILTER               |      |  10000 |        |  10000 |00:00:14.00 |      21 |
|*  6 |     SORT JOIN           |      |  10000 |  10000 |    50M |00:00:54.86 |      21 |
|   7 |      TABLE ACCESS FULL  | U    |      1 |  10000 |  10000 |00:00:00.01 |      21 |
------------------------------------------------------------------------------------------

select count(*) from (
  select start_dte, end_dte from u
  union all
  select dte, null from t
)
match_recognize (
  order by start_dte, end_dte
  all rows per match
  pattern({-u-} t+)
  define u as end_dte is not null,
         t as start_dte < u.end_dte
);

Elapsed: 00:00:00.037

-- Works because no overlaps in U range

Child's play

Solving Problems with pattern matching

• Clear knowledge of input & requirement
  – Beware of assumptions
• Identify typical problems and solutions
  – Consecutive sequences
  – "Start of Group"
  – Bin fitting
  – Ranges (see "Ranges, ranges everywhere!" tomorrow)
• Visualize the data processing flow
  – Intermediate results helpful / required?

Meet Your Match: Advanced row pattern matching (12c)

@StewAshton (stewashton.wordpress.com), UKOUG Tech 2016
