
Introduction to solving SQL problems with MATCH_RECOGNIZE

Northern Technology SIG

About me… Keith Laker Senior Principal Product Management SQL and Data Warehousing

SQL enthusiast, marathon runner, mountain biker and coffee connoisseur

@ASQLBarista

Copyright © 2017 Oracle and/or its affiliates. All rights reserved. |

Agenda

1. What is MATCH_RECOGNIZE

2. Use Case 1: sessionization

3. Use Case 2: controlling string concatenation

4. Use Case 3: forming contiguous date ranges

5. Summary


Lots of tutorials on http://livesql.oracle.com

What is MATCH_RECOGNIZE


Pattern Recognition In Sequences of Rows
SQL: a new language for pattern matching

Provide native SQL language construct
• New SQL construct MATCH_RECOGNIZE
– Added as part of the ANSI 2016 SQL standard

With intuitive processing
• Four logical concepts:
– Logically partition and order the data
– Define a pattern using a regular expression and pattern variables
– The regular expression is matched against a sequence of rows
– Each pattern variable is defined using conditions on rows and aggregates


SQL MATCH_RECOGNIZE: "declarative" pattern matching in 4 simple steps

1. Define the partitions/buckets and ordering needed to identify the "stream of events" you are analyzing
– Matching happens within a stream of events (an ordered partition of the data)

2. Define the pattern of events and the pattern variables identifying the individual events within the pattern
– Use the framework of Perl regular expressions (conditions on rows)
– Define matching using Boolean conditions on rows, for example:
(current time - INTERVAL '10' SECOND) <= previous time


3. Define measures: source data points, pattern data points and aggregates related to a pattern
• MEASURES: session_id, number of events, start time, end time, duration

4. Determine how the output will be generated


Use Case 1: Sessionization


New SQL construct MATCH_RECOGNIZE: define patterns using regular expression syntax

Supports a wide range of use cases

Analyze online customer sessions by identifying each session within a series of clicks, and then track user activity that typically involves multiple events

Web Sessionization


Store log file data as a JSON document

CREATE TABLE json_sessionization (
  session_doc CLOB,
  CONSTRAINT "VALID_JSON" CHECK (session_doc IS JSON) ENABLE
);

SELECT TO_NUMBER(j.session_doc.time_id) AS time_id,
       j.session_doc.user_id AS user_id
FROM json_sessionization j;

TIME_ID USER_ID
1       Mary
2       Sam
11      Mary
12      Sam
22      Sam
23      Mary
32      Sam
34      Mary
43      Sam
44      Mary
47      Sam
48      Sam
53      Mary
59      Sam
60      Sam
63      Mary
68      Sam

Source Data Set: JSON key-value pairs log file
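To see what the SELECT above extracts, here is a rough Python sketch that parses a few log documents with the standard `json` module. The document layout (a numeric `time_id` and a string `user_id`) and the sample values are assumptions modelled on the table above, not the real contents of `json_sessionization`.

```python
import json

# Hypothetical log documents shaped like the session_doc column; only the
# two keys used by the slide's query are shown.
SESSION_DOCS = [
    '{"time_id": 1, "user_id": "Mary"}',
    '{"time_id": 2, "user_id": "Sam"}',
    '{"time_id": 11, "user_id": "Mary"}',
]


def to_rows(docs):
    """Mimic SELECT TO_NUMBER(j.session_doc.time_id), j.session_doc.user_id."""
    rows = []
    for doc in docs:
        fields = json.loads(doc)
        # Oracle's simple dot notation returns text, hence TO_NUMBER in the
        # slide; int() plays that role here.
        rows.append((int(fields["time_id"]), fields["user_id"]))
    return rows


print(to_rows(SESSION_DOCS))  # [(1, 'Mary'), (2, 'Sam'), (11, 'Mary')]
```

Keeping the conversion explicit matters because the ORDER BY time_id step later in the deck must sort numerically, not as text.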


Define a session as a sequence of one or more events within the same partition key where the gap between consecutive timestamps does not exceed a specified threshold


USER_ID SESSION_ID START_TIME END_TIME NUM_EVENTS DURATION
Mary    1          1          11       2          10
Mary    2          23         23       1          0
Mary    3          34         63       4          29

TIME_ID USER_ID SESSION
1       Mary    1
11      Mary    1
23      Mary    2
34      Mary    3
44      Mary    3
53      Mary    3
63      Mary    3

1. Number sessions per user

2. Aggregate analysis to provide deeper insight


SELECT * FROM . . . MATCH_RECOGNIZE ( . . . )


New syntax for discovering patterns using SQL:

MATCH_RECOGNIZE()


Defining PARTITION BY and ORDER BY Clauses

Find distinct user sessions in a web log:

Step 1: define the partitions/buckets and ordering needed to identify the "stream of events"…

Set the PARTITION BY and ORDER BY clauses

SELECT * FROM . . . MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  . . . )
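As a rough mental model, PARTITION BY and ORDER BY turn the log into one ordered stream per user, and pattern matching never crosses from one stream into another. The Python sketch below is only an analogy, not how Oracle evaluates the clauses; `partition_and_order` is a hypothetical helper and the rows are a deliberately unsorted subset of the source data set.

```python
from collections import defaultdict

# (time_id, user_id) rows taken from the source data set, deliberately unsorted.
ROWS = [(12, "Sam"), (1, "Mary"), (11, "Mary"), (2, "Sam"), (23, "Mary")]


def partition_and_order(rows):
    """Rough equivalent of PARTITION BY user_id ORDER BY time_id."""
    streams = defaultdict(list)
    for time_id, user_id in rows:
        streams[user_id].append(time_id)
    # Each partition is sorted independently of the others.
    return {user: sorted(times) for user, times in streams.items()}


print(partition_and_order(ROWS))
```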


Defining the PATTERN Statement

Step 2: define the pattern of events and the pattern variables identifying the individual events within the pattern

Define the pattern: identify each "session"

SELECT * FROM . . . MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  PATTERN (b s*)
  . . . )


Build Regular Expressions

• Concatenation: no operator
• Quantifiers:
– * 0 or more matches
– + 1 or more matches
– ? 0 or 1 match
– {n} exactly n matches
– {n,} n or more matches
– {n,m} between n and m (inclusive) matches
– {,m} between 0 and m (inclusive) matches
– Reluctant quantifier: an additional ? (for example, +?)
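To experiment with the quantifiers, one can encode each row as a single letter naming the pattern variable it is mapped to and run Python's `re` module over that string. This is only an analogy: `re` is not the MATCH_RECOGNIZE engine (it works on characters, and real pattern variables are decided by DEFINE conditions), but for these simple cases the quantifier syntax behaves the same way. The symbol strings and helper names are hypothetical.

```python
import re

# Each character stands for one row: b = session start, s = follow-on event.
EXAMPLES = [
    ("bs*", "b"),         # *    : zero or more s rows   -> True
    ("bs+", "b"),         # +    : at least one s row    -> False
    ("bs?", "bss"),       # ?    : zero or one s row     -> False
    ("bs{2}", "bss"),     # {n}  : exactly n             -> True
    ("bs{2,}", "bsss"),   # {n,} : n or more             -> True
    ("bs{1,2}", "bsss"),  # {n,m}: between n and m       -> False
    ("bs{,2}", "b"),      # {,m} : between 0 and m       -> True
]


def full_match(pattern, symbols):
    """True if the whole symbol string matches the pattern."""
    return re.fullmatch(pattern, symbols) is not None


def first_match(pattern, symbols):
    """Text of the leftmost match, or None."""
    m = re.search(pattern, symbols)
    return m.group(0) if m else None


for pattern, symbols in EXAMPLES:
    print(f"{pattern:8} {symbols:5} {full_match(pattern, symbols)}")

# Greedy vs reluctant: the extra ? makes + stop as early as possible.
print(first_match("bs+", "bsss"))   # bsss
print(first_match("bs+?", "bsss"))  # bs
```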


Define Pattern Variables…

Define the pattern variables: specify each variable listed in the pattern

A session is a sequence of one or more events within the same partition key where the gap between consecutive timestamps is no more than 10 seconds

SELECT * FROM . . . MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
  . . . )

Variable b has no DEFINE condition, so it is always true and maps to the first row of every session.
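The DEFINE condition can be traced by hand with a small greedy matcher. This is a simplified Python sketch, not Oracle's matching engine: the hypothetical `find_sessions` handles only the two-variable pattern `b s*` (or `b s+` when `min_follow=1`), always resumes past the last row of a match (the default AFTER MATCH SKIP PAST LAST ROW), and assumes it receives one user's time_id values already in order. It also shows why the quantifier matters: with `s+`, a lone event such as Mary's click at time 23 cannot form a session.

```python
MARY_TIMES = [1, 11, 23, 34, 44, 53, 63]


def find_sessions(times, max_gap=10, min_follow=0):
    """Greedy match of PATTERN (b s*) (min_follow=0) or (b s+) (min_follow=1).

    b has no DEFINE condition, so any row can start a match.
    s is defined as time_id - PREV(time_id) <= max_gap.
    """
    matches = []
    start = 0
    while start < len(times):
        end = start + 1
        # Extend while the next row satisfies the DEFINE condition for s.
        while end < len(times) and times[end] - times[end - 1] <= max_gap:
            end += 1
        if end - start - 1 >= min_follow:
            matches.append(times[start:end])
            start = end   # AFTER MATCH SKIP PAST LAST ROW
        else:
            start += 1    # no match starting here; try the next row
    return matches


print(find_sessions(MARY_TIMES))                # b s*
print(find_sessions(MARY_TIMES, min_follow=1))  # b s+
```

Running it on Sam's events (2, 12, 22, 32, 43, 47, 48, 59, 60, 68) yields three sessions: [2, 12, 22, 32], [43, 47, 48] and [59, 60, 68].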


Listing Pattern Measures to be Computed

Step 3: define the measures: source data points, pattern data points and aggregates related to a pattern:

MATCH_NUMBER(): session ID

COUNT(): number of events

FIRST: start time

LAST: end time

SELECT * FROM . . . MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  MEASURES match_number() AS session_id,
           count(*) AS no_of_events,
           first(time_id) AS start_time,
           last(time_id) AS end_time,
           last(time_id) - first(time_id) AS duration
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
  . . . )

The PARTITION BY column user_id is returned automatically, so it is not listed in MEASURES.
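Once the rows of each match are known, the measures are simple aggregates over them. The Python sketch below assumes Mary's three matches as listed in the summary table above (found with the `b s*` pattern) and computes the same five values. `session_measures` is a hypothetical name, and LAST(time_id) is read as the last row of the whole match, which is what makes the single-event session report an end time of 23.

```python
# Mary's matches: each inner list holds the ordered time_id values of one match.
MARY_MATCHES = [[1, 11], [23], [34, 44, 53, 63]]


def session_measures(matches):
    rows = []
    for match_number, times in enumerate(matches, start=1):  # MATCH_NUMBER() starts at 1
        rows.append({
            "session_id": match_number,
            "no_of_events": len(times),         # COUNT(*)
            "start_time": times[0],             # FIRST(time_id)
            "end_time": times[-1],              # LAST(time_id)
            "duration": times[-1] - times[0],   # LAST(time_id) - FIRST(time_id)
        })
    return rows


for row in session_measures(MARY_MATCHES):
    print(row)
```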


Defining Output Style: Summary vs. Detailed

Step 4: determine how the output will be generated

Output ONE ROW for each time we find a match to our pattern

SELECT * FROM . . . MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  MEASURES match_number() AS session_id,
           count(*) AS no_of_events,
           first(time_id) AS start_time,
           last(time_id) AS end_time,
           last(time_id) - first(time_id) AS duration
  ONE ROW PER MATCH
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
  . . . )


Pattern Output Options: Controlling the Output

• Which rows to return:
– ONE ROW PER MATCH
– ALL ROWS PER MATCH
– ALL ROWS PER MATCH WITH UNMATCHED ROWS

• AFTER MATCH SKIP options:
– SKIP PAST LAST ROW
– SKIP TO NEXT ROW
– SKIP TO <VARIABLE>
– SKIP TO FIRST <VARIABLE>
– SKIP TO LAST <VARIABLE>
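The difference between the row-output options is easiest to see side by side. This Python sketch uses hypothetical helper names and Mary's data from the earlier tables to build the summary shape, the detailed shape, and the detailed shape with unmatched rows added. It assumes time_id values are unique within a partition, and it uses None where Oracle would return NULL measures for an unmatched row. The unmatched case uses the matches a `b s+` pattern would produce, where the event at 23 belongs to no match.

```python
MARY_TIMES = [1, 11, 23, 34, 44, 53, 63]
MATCHES_STAR = [[1, 11], [23], [34, 44, 53, 63]]  # PATTERN (b s*)
MATCHES_PLUS = [[1, 11], [34, 44, 53, 63]]        # PATTERN (b s+): 23 is unmatched


def one_row_per_match(user_id, matches):
    """Summary shape: (user_id, session_id, num_events) for each match."""
    return [(user_id, n, len(m)) for n, m in enumerate(matches, start=1)]


def all_rows_per_match(user_id, matches):
    """Detailed shape: (time_id, user_id, session_id) for every matched row."""
    return [(t, user_id, n) for n, m in enumerate(matches, start=1) for t in m]


def all_rows_with_unmatched(user_id, times, matches):
    """Detailed shape plus unmatched rows; None stands in for NULL."""
    session_of = {t: n for n, m in enumerate(matches, start=1) for t in m}
    return [(t, user_id, session_of.get(t)) for t in times]


print(one_row_per_match("Mary", MATCHES_STAR))
print(all_rows_per_match("Mary", MATCHES_STAR))
print(all_rows_with_unmatched("Mary", MARY_TIMES, MATCHES_PLUS))
```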


Use Case 2: Controlling String Concatenation


Employee Data Set: table listing employees in each department

DEPTNO NAMELIST
10     CLARK;KING
10     MILLER
20     SMITH;JONES
20     SCOTT;ADAMS
20     FORD
30     ALLEN;WARD
30     MARTIN;BLAKE
30     TURNER;JAMES

Transform EMP table to…


Partitioning and Ordering the Source Data

SELECT * FROM scott.emp MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  . . . )


Create PATTERN Statement and DEFINE Pattern Variables

Add the PATTERN statement and DEFINE the pattern variables

SELECT * FROM scott.emp MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  PATTERN (s b*)
  DEFINE b AS LENGTHB(s.ename) + SUM(LENGTHB(CONCAT(b.ename, ';'))) + LENGTHB(';') <= 15
  . . . )
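The DEFINE condition keeps a running byte count: when a candidate row is tested, the running SUM over b already includes that row. Below is a minimal greedy Python sketch of the same grouping. `group_names` and `lengthb` are hypothetical helpers; `lengthb` stands in for LENGTHB under the assumption of a UTF-8 character set (the EMP names are ASCII, so bytes equal characters), and matching resumes past the last row of each group.

```python
def lengthb(text):
    # Stand-in for LENGTHB, assuming a UTF-8 database character set.
    return len(text.encode("utf-8"))


def group_names(names, limit=15):
    """Greedy sketch of PATTERN (s b*) with
    DEFINE b AS LENGTHB(s.ename) + SUM(LENGTHB(CONCAT(b.ename, ';')))
                 + LENGTHB(';') <= limit
    names must already be in ORDER BY empno order for one department.
    """
    groups = []
    start = 0
    while start < len(names):
        s_len = lengthb(names[start])  # s has no condition: it always starts a match
        b_sum = 0                      # running SUM over rows mapped to b
        end = start + 1
        while end < len(names):
            candidate_sum = b_sum + lengthb(names[end] + ";")  # includes the tested row
            if s_len + candidate_sum + lengthb(";") > limit:
                break
            b_sum = candidate_sum
            end += 1
        groups.append(names[start:end])
        start = end                    # AFTER MATCH SKIP PAST LAST ROW
    return groups


print(group_names(["SMITH", "JONES", "SCOTT", "ADAMS", "FORD"]))
```

For department 20 this produces the groups SMITH;JONES, SCOTT;ADAMS and FORD, as in the target table. Because of the extra LENGTHB(';') term, any list of two or more names is capped at 14 bytes.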


List Measures to be Calculated

Use the built-in measure MATCH_NUMBER() to return a grouping ID

SELECT * FROM scott.emp MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES match_number() AS mno
  PATTERN (S B*)
  DEFINE B AS LENGTHB(S.ename) + SUM(LENGTHB(CONCAT(B.ename, ';'))) + LENGTHB(';') <= 15
  . . . )


Define Type of Output: Detailed vs. Summary

Return a detailed report: ALL ROWS PER MATCH returns every row of each successful match of the pattern

SELECT * FROM scott.emp MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES match_number() AS mno
  ALL ROWS PER MATCH
  PATTERN (S B*)
  DEFINE B AS LENGTHB(S.ename) + SUM(LENGTHB(CONCAT(B.ename, ';'))) + LENGTHB(';') <= 15
  . . . )


Define Where to Resume Searching

Use the default AFTER MATCH SKIP PAST LAST ROW behaviour to control where to start searching for the next pattern

SELECT * FROM scott.emp MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES match_number() AS mno
  ALL ROWS PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (S B*)
  DEFINE B AS LENGTHB(S.ename) + SUM(LENGTHB(CONCAT(B.ename, ';'))) + LENGTHB(';') <= 15
);


SKIP TO: basic syntax

• AFTER MATCH SKIP TO NEXT ROW
– Resume pattern matching at the row after the first row of the current match.

• AFTER MATCH SKIP PAST LAST ROW [DEFAULT]
– Resume pattern matching at the next row after the last row of the current match.

• AFTER MATCH SKIP TO FIRST pattern_variable
– Resume pattern matching at the first row that is mapped to the pattern variable.

• AFTER MATCH SKIP TO LAST pattern_variable
– Resume pattern matching at the last row that is mapped to the pattern variable.

• AFTER MATCH SKIP TO pattern_variable
– The same as AFTER MATCH SKIP TO LAST pattern_variable.
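The two most common skip modes can be contrasted with a toy Python matcher. The pattern `(strt down+)` with `DEFINE down AS price < PREV(price)`, the price list and the helper `falling_runs` are all hypothetical illustrations, not part of the deck. SKIP PAST LAST ROW yields non-overlapping matches; SKIP TO NEXT ROW restarts one row after the start of each match, so matches can overlap.

```python
def falling_runs(prices, skip="past_last_row"):
    """Greedy matches of PATTERN (strt down+), DEFINE down AS price < PREV(price)."""
    matches = []
    start = 0
    while start < len(prices):
        end = start + 1
        while end < len(prices) and prices[end] < prices[end - 1]:
            end += 1
        if end - start >= 2:  # strt plus at least one down row
            matches.append(prices[start:end])
            if skip == "past_last_row":
                start = end        # AFTER MATCH SKIP PAST LAST ROW
            elif skip == "to_next_row":
                start += 1         # AFTER MATCH SKIP TO NEXT ROW
            else:
                raise ValueError(f"unsupported skip mode: {skip}")
        else:
            start += 1             # no match starting here; try the next row
    return matches


print(falling_runs([10, 8, 6, 7, 5]))                 # [[10, 8, 6], [7, 5]]
print(falling_runs([10, 8, 6, 7, 5], "to_next_row"))  # [[10, 8, 6], [8, 6], [7, 5]]
```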


Final Output from MATCH_RECOGNIZE

Output from MATCH_NUMBER is part of the final grouping within LISTAGG


Controlling String Concatenation

LISTAGG returns a list of concatenated strings arranged as groups within each DEPTNO

DEPTNO NAMELIST     HOW_LONG
10     CLARK;KING   10
10     MILLER       6
20     SMITH;JONES  11
20     SCOTT;ADAMS  11
20     FORD         4
30     ALLEN;WARD   10
30     MARTIN;BLAKE 12
30     TURNER;JAMES 12

SELECT deptno,
       LISTAGG(ename, ';') WITHIN GROUP (ORDER BY empno) AS namelist,
       LENGTH(LISTAGG(ename, ';') WITHIN GROUP (ORDER BY empno)) AS how_long
FROM emp_mr
GROUP BY deptno, mno;
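The final step is an ordinary GROUP BY on (deptno, mno). A Python sketch of that aggregation follows; the rows are a hand-built stand-in for `emp_mr` (deptno, mno, empno, ename for departments 10 and 20, with mno values read off the table above), and `listagg_by_match` is a hypothetical helper. SQL's GROUP BY guarantees no output order, so the sketch sorts explicitly.

```python
from itertools import groupby

# (deptno, mno, empno, ename): stand-in rows for the ALL ROWS PER MATCH output.
EMP_MR = [
    (10, 1, 7782, "CLARK"), (10, 1, 7839, "KING"), (10, 2, 7934, "MILLER"),
    (20, 1, 7369, "SMITH"), (20, 1, 7566, "JONES"),
    (20, 2, 7788, "SCOTT"), (20, 2, 7876, "ADAMS"), (20, 3, 7902, "FORD"),
]


def listagg_by_match(rows, sep=";"):
    """GROUP BY deptno, mno with LISTAGG(ename, sep) WITHIN GROUP (ORDER BY empno)."""
    result = []
    ordered = sorted(rows)  # deptno, mno, empno: groups become contiguous, names ordered by empno
    for (deptno, _mno), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        namelist = sep.join(r[3] for r in group)
        result.append((deptno, namelist, len(namelist)))  # how_long = LENGTH(namelist)
    return result


for row in listagg_by_match(EMP_MR):
    print(row)
```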

Use Case 3: Building Contiguous Date Ranges


Contiguous Date Ranges

Return a result set showing:

1) Start date of contiguous range

2) End date of contiguous range

3) Number of days in contiguous range


Contiguous Date Ranges: SH Schema

Analyze the SALES fact table and calculate the following for each year:

1) Start date of contiguous range of sales

2) End date of contiguous range of sales

3) Number of days in contiguous range


Define SELECT Statement for Source Data

SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4 AND t.time_id = s.time_id)


Partitioning and Ordering the Source Data

SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4 AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  . . . )


Create PATTERN Statement and DEFINE Pattern Variables

Add the PATTERN statement (it includes the always-true variable strt, which has no DEFINE condition) and DEFINE the pattern variables

SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4 AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  PATTERN (strt a+)
  DEFINE a AS day_id = PREV(day_id) + 1);


List Measures to be Calculated

Use the new functions FIRST and LAST to return values from the start and end of the pattern

SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4 AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  MEASURES FIRST(strt.day_id) AS start_day,
           LAST(a.day_id) AS end_day,
           COUNT(day_id) AS count_day
  PATTERN (strt a+)
  DEFINE a AS day_id = PREV(day_id) + 1);


Define Type of Output: Detailed vs. Summary

Return a summary report: ONE ROW PER MATCH returns one row for each successful match of the pattern

SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4 AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  MEASURES FIRST(strt.day_id) AS start_day,
           LAST(a.day_id) AS end_day,
           COUNT(day_id) AS count_day
  ONE ROW PER MATCH
  PATTERN (strt a+)
  DEFINE a AS day_id = PREV(day_id) + 1);
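The query can be traced with a small Python sketch of the same logic. Assumptions: `cal_yr` is taken to be the calendar year of each date, the dates are made-up test values (the SH data is not reproduced here), and `contiguous_ranges` is a hypothetical helper. Because `a+` needs at least one follow-on day, an isolated sales day produces no row, and because of PARTITION BY cal_yr a run crossing New Year is split in two.

```python
from datetime import date, timedelta
from itertools import groupby


def contiguous_ranges(days):
    """Sketch of PARTITION BY cal_yr ORDER BY day_id, PATTERN (strt a+),
    DEFINE a AS day_id = PREV(day_id) + 1, ONE ROW PER MATCH.
    Returns (start_day, end_day, count_day) tuples.
    """
    result = []
    ordered = sorted(set(days))  # SELECT DISTINCT, then ORDER BY day_id
    for _year, group in groupby(ordered, key=lambda d: d.year):  # PARTITION BY cal_yr
        stream = list(group)
        start = 0
        while start < len(stream):
            end = start + 1
            while end < len(stream) and stream[end] == stream[end - 1] + timedelta(days=1):
                end += 1
            if end - start >= 2:  # strt plus at least one a row
                result.append((stream[start], stream[end - 1], end - start))
                start = end       # default AFTER MATCH SKIP PAST LAST ROW
            else:
                start += 1        # isolated day: no match, try the next row
    return result


SAMPLE_DAYS = [date(1997, 12, 30), date(1997, 12, 31), date(1998, 1, 1),
               date(1998, 1, 2), date(1998, 1, 2), date(1998, 1, 3),
               date(1998, 1, 5), date(1998, 1, 7), date(1998, 1, 8)]
for row in contiguous_ranges(SAMPLE_DAYS):
    print(row)
```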


Finding Contiguous Date Ranges…

START_DAY END_DAY   COUNT_DAY
01-JAN-98 28-FEB-98 59
02-MAR-98 06-MAR-98 5
08-MAR-98 01-APR-98 25
04-APR-98 06-APR-98 3
08-APR-98 11-APR-98 4
13-APR-98 18-APR-98 6
22-APR-98 06-MAY-98 15
08-MAY-98 12-MAY-98 5
14-MAY-98 24-MAY-98 11
26-MAY-98 06-JUN-98 12
08-JUN-98 11-JUN-98 4
13-JUN-98 18-JUN-98 6

Summary


Typical Pattern Matching LOB Use Cases (input data, pattern, result)

• Sessionization
– Input data: web logs
– Pattern: continuous clicks by the same user
– Result: generate reports on the number of distinct sessions, average page views per session, etc.

• Fraud
– Input data: credit card transactions
– Pattern: two transactions in different locations within a short period of time
– Result: find cases in which a credit card may have been used fraudulently, since a physical person cannot be in two places at once

• In-game purchases
– Input data: game logs
– Pattern: events leading up to an in-game purchase
– Result: detect common sequences of events that result in an in-game purchase

• Fraud (mobiles)
– Input data: CDR logs
– Pattern: SIM card being used in multiple handsets
– Result: flag individual SIM cards being used by multiple handsets within a specified time period

• Stock market analysis
– Input data: ticker logs
– Pattern: track possible fraudulent linked patterns of behavior
– Result: track known patterns of behavior such as head and shoulders, triangles, channels and wedges


• Auditing/Compliance
– Input data: application logs
– Pattern: analyze changes to secure customer data
– Result: find instances where an operator has made suspect modifications to secure client data

• Money laundering
– Input data: transaction logs
– Pattern: search for small transfers within a time window followed by a large transfer within "x" days of the last small transfer
– Result: detect a suspicious money transfer pattern for an account and report the account, the date of the first small transfer and the date of the last large transfer

• Call service quality
– Input data: CDR logs
– Pattern: search for dropped/reconnected calls
– Result: identify how many times calls were restarted in a session, the total effective call duration and the total interrupted duration


Summary: You Can Now…

• Construct a MATCH_RECOGNIZE statement
• Build search criteria using pattern variables
• Organize your data correctly to discover the pattern
• Control the type of data returned: summary vs. detailed
• Understand the power and value of SQL pattern matching

• Go and use MATCH_RECOGNIZE to your advantage!


Where to Get More Information

• Analytical SQL home page on OTN with links to:
– Training + Oracle By Example
– Podcasts for pattern matching
– Whitepapers
– Sample scripts and simple tutorials for pattern matching on LiveSQL

• Data Warehouse and SQL Analytics blog
• http://oracle-big-data.blogspot.co.uk/
