row pattern matching in oracle database 12c

35
“Row Pattern Matching” with Database 12c MATCH_RECOGNIZE Beating the Best Pre-12c Solutions Stew Ashton (stewashton.wordpress.com ) OUGN17 Can you read the following line? If not, please move closer. It's much better when you can read the code ;)

Upload: stew-ashton

Post on 16-Apr-2017

628 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Row Pattern Matching in Oracle Database 12c

“Row Pattern Matching” withDatabase 12c MATCH_RECOGNIZE

Beating the Best Pre-12c SolutionsStew Ashton (stewashton.wordpress.com)

OUGN17

Can you read the following line? If not, please move closer.

It's much better when you can read the code ;)

Page 2: Row Pattern Matching in Oracle Database 12c

2

Agenda• Who am I?• Pre-12c solutions compared to row pattern matching with

MATCH_RECOGNIZE– For all sizes of data– Thinking in patterns– Syntax only as needed

• Watch out for “catastrophic backtracking”• Other things to keep in mind (time permitting)

Page 3: Row Pattern Matching in Oracle Database 12c

3

Who am I?• 1967: first FORTRAN program• 1981-2015: software engineer / architect• 2005-present: focus on Oracle DB development– Contribute to asktom & OTN SQL forum– Presented at OOW, UKOUG tech, DOAG– 2015 DEVVY awards finalist.– Advocate of data-centric application architecture

Page 4: Row Pattern Matching in Oracle Database 12c

4

4 pre-12c Solutions

• Analytics1) Tabibitosan / Fixed Difference2) "Start of Group"

• Model3) Bin fitting (fixed size)4) Bin fitting (fixed number)

Page 5: Row Pattern Matching in Oracle Database 12c

5

1) “Fixed Difference” / TabibitosanPAG

E12356710111242

-1-2-3-4

= 0= 0= 0= 1

• Identify and group rows withconsecutive values

• My presentation: print best liked slides• Math: subtract known consecutives (1, 2,…)– Consecutive means A = B - 1 – If A = B - 1 (consecutive) then A - 1 = B – 2

Else (not consecutive) A - 1 <> B - 2– Consecutive becomes equality,

non-consecutive becomes inequality

Page 6: Row Pattern Matching in Oracle Database 12c

6

1) Pre-12cselect min(page) firstpage,max(page) lastpage,count(*) cntFROM ( SELECT page, page – Row_Number() over(order by page) as grp_id FROM t)GROUP BY grp_id;

PAGE [RN] GRP_ID1 - 1 02 - 2 03 - 3 05 - 4 16 - 5 17 - 6 110 - 7 311 - 8 312 - 9 342 -10 32

FIRSTPAGE

LASTPAGE CNT

1 3 35 7 3

10 12 342 42 1

PAGE GRP_ID1 02 03 05 16 17 110 311 312 342 32

Page 7: Row Pattern Matching in Oracle Database 12c

7

Pattern and Matching Rows• PATTERN

– Uninterrupted series of input rows– Described as list of conditions (≅ “regular expressions”)

PATTERN (A B*)"A" : 1 row, "B*" : 0 or more rows, as many as possible

• DEFINE (at least one) row condition[A undefined = TRUE]B AS page = PREV(page)+1

• Each series that matches the pattern is a “match”– "A" and "B" identify the rows that meet each condition– There can be unmatched rows between series

Page 8: Row Pattern Matching in Oracle Database 12c

8

Input, Processing, Output

1. Define input2. Order input3. Process pattern4. using defined conditions5. Output: rows per match6. Output: columns per row7. Go where after match?

SELECT *FROM tMATCH_RECOGNIZE ( ORDER BY page MEASURES A.page as firstpage, LAST(page) as lastpage, COUNT(*) cnt ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A B*) DEFINE B AS page = PREV(page)+1);

Page 9: Row Pattern Matching in Oracle Database 12c

9

1) Run_Stats comparison

For one million rows:

“Latches” are serialization devices: fewer means more scalable

Stat Pre 12c Match_R Pct

Latches 5047 4836 96%

Elapsed Time 1.1164 1.0528 94%

CPU used by this session 1.09 1.04 95%

Page 10: Row Pattern Matching in Oracle Database 12c

10

1) Execution Plans Operation Used-Mem SELECT STATEMENT   HASH GROUP BY 49M (0) VIEW   WINDOW SORT 20M (0) TABLE ACCESS FULL   Operation Used-Mem SELECT STATEMENT   VIEW  

MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTOMATON 20M (0)

TABLE ACCESS FULL  

Page 11: Row Pattern Matching in Oracle Database 12c

11

2) “Start of Group”• Identify group boundaries, often using LAG()• 3 steps instead of 2:

1. For each row: if start of group, assign 1Else assign 0

2. Running total of 1s and 0s produces a group identifier

3. Group by the group identifier

Page 12: Row Pattern Matching in Oracle Database 12c

12

2) Requirement

GROUP_NAME START_TS END_TS

X2014-01-01

00:002014-02-01

00:00

X2014-03-01

00:002014-04-01

00:00

X2014-04-01

00:002014-05-01

00:00

X2014-06-01

00:002014-06-01

01:00

X2014-06-01

01:002014-06-01

02:00

X2014-06-01

02:002014-06-01

03:00

Y2014-06-01

03:002014-06-01

04:00

Y2014-06-01

04:002014-06-01

05:00

Y2014-07-03

08:002014-09-29

17:00

Merge contiguous date ranges in same group

Page 13: Row Pattern Matching in Oracle Database 12c

13

        1        2        2        3        3        3        1        1        2

X01-01 00:00

02-01 00:00 1

X03-01 00:00

04-01 00:00 1

X04-01 00:00

05-01 00:00 0

X06-01 00:00

06-01 01:00 1

X06-01 01:00

06-01 02:00 0

X06-01 02:00

06-01 03:00 0

Y06-01 03:00

06-01 04:00 1

Y06-01 04:00

06-01 05:00 0

Y07-03 08:00

09-29 17:00 1

X01-01 00:00

02-01 00:00

X03-01 00:00

05-01 00:00

X06-01 00:00

06-01 03:00

Y06-01 03:00

06-01 05:00

Y07-03 08:00

09-29 17:00

with grp_starts as ( select a.*, case when start_ts =

lag(end_ts) over( partition by group_name order by start_ts ) then 0 else 1 end grp_start from t a), grps as ( select b.*, sum(grp_start) over( partition by group_name order by start_ts ) grp_id from grp_starts b)

select group_name,min(start_ts) start_ts,max(end_ts) end_ts from grps group by group_name, grp_id;

Page 14: Row Pattern Matching in Oracle Database 12c

14

2) Match_RecognizeSELECT * FROM t MATCH_RECOGNIZE( PARTITION BY group_name ORDER BY start_ts MEASURES A.start_ts start_ts, end_ts end_ts PATTERN(A B*) DEFINE B AS start_ts = prev(end_ts) );

New this time:• Added PARTITION BY• ONE ROW PER MATCH

andSKIP PAST LAST ROWare the defaults

One solution replaces two methods: simple!

Page 15: Row Pattern Matching in Oracle Database 12c

15

2) Run_Stats comparison

For 500,000 rows:

Stat Pre 12c Match_R PctLatches 8108 7133 88%Elapsed Time 1.7551 0.9765 56%CPU used by this session 1.73 0.96 55%

Page 16: Row Pattern Matching in Oracle Database 12c

16

Operation Used-Mem SELECT STATEMENT HASH GROUP BY 22M (0) VIEW WINDOW BUFFER 32M (0) VIEW WINDOW SORT 27M (0) TABLE ACCESS FULL Operation Used-Mem SELECT STATEMENT VIEW MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTOMATON 27M (0) TABLE ACCESS FULL

2) Execution Plans

Page 17: Row Pattern Matching in Oracle Database 12c

17

2) Matching within a groupSELECT * FROM (

SELECT * from t WHERE group_name = 'X' )MATCH_RECOGNIZE … );

Filter before MATCH_RECOGNIZE to avoid extra work

Page 18: Row Pattern Matching in Oracle Database 12c

18

2) Predicate pushingSelect * from <view> where group_name = 'X'

Operation Name A-Rows Buffers

SELECT STATEMENT   3 4 VIEW   3 4 MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO

  3 4

TABLE ACCESS BY INDEX ROWID BATCHED T 6 4

INDEX RANGE SCAN TI 6 3

Page 19: Row Pattern Matching in Oracle Database 12c

19

3) “Bin fitting”: fixed size• Requirement– Order by study_site– Put in “bins” with size =

65,000 max

STUDY_SITE CNT   STUDY_SITE CNT

1001 3407   1026 1371002 4323   1028 60051004 1623   1029 761008 1991   1031 45991011 885   1032 1989101211597   1034 34271014 1989   1036 8791015 5282   1038 64851017 2841   1039 31018 5183   1040 11051020 6176   1041 64601022 2784   1042 968102325865   1044 4711024 3734   1045 3360

FIRST_SITE

LAST_SITE SUM_CNT

1001 1022 480811023 1044 622031045 1045 3360

Page 20: Row Pattern Matching in Oracle Database 12c

20

SELECT s first_site, MAX(e) last_site, MAX(sm) sum_cnt FROM ( SELECT s, e, cnt, sm FROM t MODEL MEASURES (study_site s, study_site e, cnt, cnt sm) RULES ( sm[ > 1] = CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000 THEN cnt[cv()] ELSE sm[cv() - 1] + cnt[cv()] END, s[ > 1] = CASE WHEN sm[cv() - 1] + cnt[cv()] > 65000 OR cnt[cv()] > 65000 THEN s[cv()] ELSE s[cv() - 1] END ))GROUP BY s;

• DIMENSION with row_numberorders data and processing

• rn can be used like a subscript• cv() means current row• cv()-1 means previous row

DIMENSION BY (row_number() over(order by study_site) rn)

rn [cv() – 1] [cv()] [cv()] [cv()] [cv() – 1] [cv()] rn [cv() - 1] [cv()] [cv()] [cv()] [cv() – 1]

Page 21: Row Pattern Matching in Oracle Database 12c

21

SELECT * FROM tMATCH_RECOGNIZE ( ORDER BY study_site MEASURES FIRST(study_site) first_site, LAST(study_site) last_site, SUM(cnt) sum_cnt PATTERN (A+) DEFINE A AS SUM(cnt) <= 65000);

New this time:• PATTERN

(A+) replaces (A B*)means 1 or more rows

• Why? In previous examples I used PREV(), which returns NULL on the first row.

One solution replaces 3 methods: simpler!

Page 22: Row Pattern Matching in Oracle Database 12c

22

3) Run_Stats comparison

For 150000 rows:

Stat Pre 12c Match_R PctLatches 3501 1312 37%Elapsed Time 2.4036 0.1028 4%CPU used by this session 2.39 0 0%

Page 23: Row Pattern Matching in Oracle Database 12c

23

Id Operation Used-Mem 0 SELECT STATEMENT  1 HASH GROUP BY 1481K (0) 2 VIEW  3 SQL MODEL ORDERED 17M (0) 4 WINDOW SORT 4330K (0) 5 TABLE ACCESS FULL  

Id Operation Used-Mem 0 SELECT STATEMENT  1 VIEW  

2 MATCH RECOGNIZE SORT DETERMINISTIC FINITE AUTO 4330K(0)

3 TABLE ACCESS FULL  

3) Execution Plans

Page 24: Row Pattern Matching in Oracle Database 12c

24

Name Val Val BIN1 BIN2 BIN31 1 10 10    2 2 9 10 9  3 3 8 10 9 84 4 7 10 9 155 5 6 10 15 156 6 5 15 15 157 7 4 19 15 158 8 3 19 18 159 9 2 19 18 1710 10 1 19 18 18

4) “Bin fitting”: fixed number• Requirement– Distribute values in 3

parts as equally as possible

• “Best fit decreasing”– Sort values in

decreasing order– Put each value in least

full “bin”

Page 25: Row Pattern Matching in Oracle Database 12c

25

4) Brilliant pre 12c solutionSELECT bin, Max (bin_value) bin_valueFROM ( SELECT * FROM items MODEL DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn) MEASURES ( item_name, item_value, Row_Number() OVER (ORDER BY item_value DESC) bin, item_value bin_value, Row_Number() OVER (ORDER BY item_value DESC) rn_m, 0 min_bin,

Count(*) OVER () - 3 - 1 n_iters ) RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) ( min_bin[1] = Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn<= 3], bin[ITERATION_NUMBER + 3 + 1] = min_bin[1], bin_value[min_bin[1]] = bin_value[CV()] + Nvl(item_value[ITERATION_NUMBER+4], 0)))WHERE item_name IS NOT NULLgroup by bin;

Page 26: Row Pattern Matching in Oracle Database 12c

26

SELECT * from itemsMATCH_RECOGNIZE ( ORDER BY item_value desc MEASURES sum(bin1.item_value) bin1, sum(bin2.item_value) bin2, sum(bin3.item_value) bin3 PATTERN ((bin1|bin2|bin3)+) DEFINE bin1 AS count(bin1.*) = 1 OR sum(bin1.item_value)-bin1.item_value <= least( sum(bin2.item_value), sum(bin3.item_value) ), bin2 AS count(bin2.*) = 1 OR sum(bin2.item_value)-bin2.item_value <= sum(bin3.item_value));

• ()+ = 1 or more of whatever is inside

• '|' = alternatives, “preferred in the order specified”

• Bin1 condition:• No rows here yet,• Or this bin least full

• Bin2 condition• No rows here yet, or• This bin less full than 3

PATTERN ((bin1|bin2|bin3)+) bin1 AS count(bin1.*) = 1 OR sum(bin1.item_value)-bin1.item_value <= least( sum(bin2.item_value), sum(bin3.item_value) ), bin2 AS count(bin2.*) = 1 OR sum(bin2.item_value)-bin2.item_value <= sum(bin3.item_value)

Page 27: Row Pattern Matching in Oracle Database 12c

27

4) Run_Stats comparison

For 20,000 rows:

Stat Pre 12c Match_R Pct

Latches 99165 794 .80%

Elapsed Time 58.3251 0.028 .05%

CPU used by this session 56.75 0 .00

%

Page 28: Row Pattern Matching in Oracle Database 12c

28

Backtracking• What happens when there is no match???• “Greedy” quantifiers * + {2,}– are not that greedy– Take all the rows they can, BUT

give rows back if necessary – one at a time• Regular expression engines will test all possible

combinations to find a match

Page 29: Row Pattern Matching in Oracle Database 12c

29

123456789AABABBBABBBBABBBBBABBBBBB

ABBBBBBB

ABBBBBBBCABBBBBBCABBBBBCABBBBCABBBCABBCABCAC

Backtracking in action:

pattern(a b* c)1. Find A2. Find all the Bs you can3. At the end, look for a C4. No C? Backtrack through the Bs5. Still no C? No Match!

Page 30: Row Pattern Matching in Oracle Database 12c

30

Repeating conditionsselect 'match' from ( select level n from dual connect by level <= 100)match_recognize( pattern(a b* c) define b as n > prev(n) , c as n = 0);

Runs in 0.006 secs

select 'match' from ( select level n from dual connect by level <= 100)match_recognize( pattern(a b* b* b* c) define b as n > prev(n) , c as n = 0);

Runs in 2.934 secs

Page 31: Row Pattern Matching in Oracle Database 12c

31

SELECT * FROM TickerMATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) AS start_tstamp, LAST(tstamp) AS end_tstamp AFTER MATCH SKIP TO LAST UP PATTERN (STRT DOWN+ UP+ DOWN+ UP+) DEFINE DOWN AS price < PREV(price), UP AS price > PREV(price), STRT AS price >= nvl(PREV(PRICE),0));Runs in 0.014 seconds

Imprecise ConditionsCREATE TABLE Ticker ( SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

insert into tickerselect 'ACME',sysdate + level/24/60/60,10000-levelfrom dualconnect by level <= 5000;

SELECT * FROM TickerMATCH_RECOGNIZE ( PARTITION BY symbol ORDER BY tstamp MEASURES FIRST(tstamp) AS start_tstamp, LAST(tstamp) AS end_tstamp AFTER MATCH SKIP TO LAST UP PATTERN (STRT DOWN+ UP+ DOWN+ UP+) DEFINE DOWN AS price < PREV(price), UP AS price > PREV(price));Runs in 13 secondsINMEMORY: 7 seconds

Page 32: Row Pattern Matching in Oracle Database 12c

32

Keep in Mind• Backtracking

– Precise conditions– Test data with no matches

• To debug:

Measures classifier() cl, match_number() mnAll rows per match with unmatched rows

• No DISTINCT, no LISTAGG• MEASURES columns must

have aliases• “Reluctant quantifier” = ? =

JDBC bind variable(Fix in 12.2 JDBC driver !)

• “Pattern variables” are identifiers, not bind variables

Page 33: Row Pattern Matching in Oracle Database 12c

33

Questions?

The really good stuff is tomorrow!"Meet your Match", 10:30, Bundestag 2

1) consecutive values

2) based on starting point

3) Bin fitting:fixed size

4) Bin fitting:fixed number

Pre 12c Analytic function,Group by

2 analytics, Group by

MODEL clause using “subscript”

Procedural MODEL clause

Match_Recognize

C = PREV(C)+1 C1=PREV(C2) SUM(C1) <= X 3 conditions

Elapsed 94% 56% 4% 0.05%

Page 34: Row Pattern Matching in Oracle Database 12c

34

Which row do we mean? Expression DEFINE MEASURES

ALL ROWS… ONE ROW…FIRST(start_ts) First row of matchstart_ts current row last row of matchLAST(end_ts) current row last row of matchFINAL LAST(end_ts) ORA-62509 last row of match

B.start_ts most recent B row last B rowPREV(), NEXT() Physical offset from referenced row

COUNT(*) from first to current row all rows in match

COUNT(B.*) B rows including current row all B rows

Page 35: Row Pattern Matching in Oracle Database 12c

35

Output Row “shape”

Per Match PARTITION BY ORDER BY MEASURES Other input

ONE ROW X Omitted X omitted

ALL ROWS X X X X

ORA-00918, anyone?