Introduction to Solving SQL Problems with MATCH_RECOGNIZE (v2)
TRANSCRIPT
About me
Keith Laker, Senior Principal Product Management, SQL and Data Warehousing
SQL enthusiast, marathon runner, mountain biker and coffee connoisseur
@ASQLBarista
Copyright © 2017 Oracle and/or its affiliates. All rights reserved. |
Agenda
1. What is MATCH_RECOGNIZE
2. Use Case 1: sessionization
3. Use Case 2: controlling string concatenation
4. Use Case 3: forming contiguous date ranges
5. Summary

Lots of tutorials on http://livesql.oracle.com

Pattern Recognition In Sequences of Rows
SQL: a new language for pattern matching
Provide native SQL language construct
• New SQL construct MATCH_RECOGNIZE
  – Added as part of the ANSI SQL:2016 standard
With intuitive processing
• Four logical concepts:
  – Logically partition and order the data
  – Define pattern using regular expression and pattern variables
  – Regular expression is matched against a sequence of rows
  – Each pattern variable is defined using conditions on rows and aggregates

SQL MATCH_RECOGNIZE
"Declarative" pattern matching: 4 simple steps
1. Define the partitions/buckets and ordering needed to identify the 'stream of events' you are analyzing
  – Matching within a stream of events (ordered partition of data)
2. Define the pattern of events and pattern variables identifying the individual events within the pattern
  – Use framework of Perl regular expressions (conditions on rows)
  – Define matching using Boolean conditions on rows, e.g. (current time - INTERVAL '10' SECOND) >= previous time
3. Define measures: source data points, pattern data points and aggregates related to a pattern
  • MEASURES: session_id, number of events, start time, end time, duration
4. Determine how the output will be generated

Use Case 1: Sessionization
New SQL construct: MATCH_RECOGNIZE
• Define patterns using regular expression syntax
• Supports a wide range of use cases
Web sessionization: analyze online customer sessions by identifying each session within a series of clicks, then track user activity that typically involves multiple events.

Store log file data as a JSON document
CREATE TABLE json_sessionization (
  session_doc CLOB,
  CONSTRAINT "VALID_JSON" CHECK (session_doc IS JSON) ENABLE
);
SELECT TO_NUMBER(j.session_doc.time_id) AS time_id,
       j.session_doc.user_id AS user_id
FROM json_sessionization j;
Source data set: JSON key-value pairs log file
TIME_ID  USER_ID
1        Mary
2        Sam
11       Mary
12       Sam
22       Sam
23       Mary
32       Sam
34       Mary
43       Sam
44       Mary
47       Sam
48       Sam
53       Mary
59       Sam
60       Sam
63       Mary
68       Sam

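To try the queries in this use case, the table needs some rows. The statements below are a minimal sketch, assuming each log entry is stored as one JSON object whose keys are the `time_id` and `user_id` names used in the SELECT above; the exact document layout is an assumption, and only a subset of the source rows is loaded.

```sql
-- Assumption: one JSON object per row, keyed by time_id and user_id.
-- Mary's seven clicks from the source data set:
INSERT INTO json_sessionization VALUES ('{"time_id": 1,  "user_id": "Mary"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 11, "user_id": "Mary"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 23, "user_id": "Mary"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 34, "user_id": "Mary"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 44, "user_id": "Mary"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 53, "user_id": "Mary"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 63, "user_id": "Mary"}');
-- Sam's first three clicks:
INSERT INTO json_sessionization VALUES ('{"time_id": 2,  "user_id": "Sam"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 12, "user_id": "Sam"}');
INSERT INTO json_sessionization VALUES ('{"time_id": 22, "user_id": "Sam"}');
COMMIT;

-- Dot-notation access needs a table alias (j) and the IS JSON check constraint.
SELECT TO_NUMBER(j.session_doc.time_id) AS time_id,
       j.session_doc.user_id            AS user_id
FROM   json_sessionization j
ORDER  BY time_id;
```

Any row that is not well-formed JSON should be rejected by the VALID_JSON check constraint at insert time.
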
Use Case 1: Sessionization
Define a session as a sequence of one or more events within the same partition key where the inter-timestamp gap does not exceed a specified threshold
1. Number sessions per user
TIME_ID  USER_ID  SESSION
1        Mary     1
11       Mary     1
23       Mary     2
34       Mary     3
44       Mary     3
53       Mary     3
63       Mary     3
2. Aggregate analysis to provide deeper insight
USER_ID  SESSION_ID  START_TIME  END_TIME  NUM_EVENTS  DURATION
Mary     1           1           11        2           10
Mary     2           23          23        1           0
Mary     3           34          63        4           29

Use Case 1: Sessionization
New syntax for discovering patterns using SQL: MATCH_RECOGNIZE()
SELECT * FROM . . .
MATCH_RECOGNIZE ( . . . )

Defining PARTITION BY and ORDER BY Clauses
Find distinct user sessions in a web log:
Step 1: define partitions/buckets and ordering needed to identify the "stream of events"…
Set the PARTITION BY and ORDER BY clauses
SELECT * FROM . . .
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  . . .
)

Defining Pattern Statement
Step 2: define the pattern of events and pattern variables identifying the individual events within the pattern
Define the pattern: identify each "session" as a first event (b) followed by zero or more events (s) that continue it
SELECT * FROM . . .
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  PATTERN (b s*)
  . . .
)

Build Regular Expressions
• Concatenation: no operator
• Quantifiers:
  – *      0 or more matches
  – +      1 or more matches
  – ?      0 or 1 match
  – {n}    exactly n matches
  – {n,}   n or more matches
  – {n,m}  between n and m (inclusive) matches
  – {,m}   between 0 and m (inclusive) matches
  – Reluctant quantifier: an additional ?

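As a sketch of how a quantifier changes what counts as a match, the query below keeps only sessions of three or more clicks by bounding the repetition with `{2,}`. The source relation `clicks(user_id, time_id)` is hypothetical and stands in for whatever table or view holds the click stream.

```sql
-- Assumption: clicks(user_id, time_id) is a hypothetical table or view
-- with a numeric time_id.
SELECT *
FROM   clicks
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  MEASURES MATCH_NUMBER() AS session_id,
           COUNT(*)       AS no_of_events
  ONE ROW PER MATCH
  -- b: the first click of a session
  -- s{2,}: at least two further clicks, each within 10 of the previous click
  PATTERN (b s{2,})
  DEFINE s AS (time_id - PREV(time_id)) <= 10
);
```

Swapping `{2,}` for `*`, `+` or `{1,3}` changes only how many `s` rows a match may contain; the DEFINE condition stays the same.
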
Define Pattern Variables…
Define the pattern variables: specify each variable listed in the pattern
A session is a sequence of one or more events within the same partition key where the inter-timestamp gap is no more than 10 seconds
SELECT * FROM . . .
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
  . . .
)

Listing Pattern Measures to be Computed
Step 3: define the measures: source data points, pattern data points and aggregates related to a pattern:
• MATCH_NUMBER(): session ID
• COUNT(): number of events
• FIRST: start time
• LAST: end time
SELECT * FROM . . .
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  -- user_id is the PARTITION BY column, so it is returned without being listed as a measure
  MEASURES match_number() AS session_id,
           count(*) AS no_of_events,
           first(time_id) AS start_time,
           last(time_id) AS end_time,
           last(time_id) - first(time_id) AS duration
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
  . . .
)

Defining Output Style: Summary vs. Detailed
Step 4: determine how the output will be generated
Output ONE ROW for each time we find a match to our pattern
SELECT * FROM . . .
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  -- user_id is the PARTITION BY column, so it is returned without being listed as a measure
  MEASURES match_number() AS session_id,
           count(*) AS no_of_events,
           first(time_id) AS start_time,
           last(time_id) AS end_time,
           last(time_id) - first(time_id) AS duration
  ONE ROW PER MATCH
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
  . . .
)

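Putting the four steps together, the statement below is a sketch of a complete summary query. The slides leave the FROM clause elided; here it is assumed to be an inline view built from the `json_sessionization` SELECT shown earlier, which is an assumption rather than the presenter's actual source. The pattern is written as `(b s*)` so that a lone click still forms a session of one event, as in Mary's session 2 in the expected output.

```sql
-- Assumption: the source rows come from the json_sessionization table;
-- any relation exposing user_id and a numeric time_id would work the same way.
SELECT *
FROM  (SELECT TO_NUMBER(j.session_doc.time_id) AS time_id,
              j.session_doc.user_id            AS user_id
       FROM   json_sessionization j)
MATCH_RECOGNIZE (
  PARTITION BY user_id                       -- step 1: one stream of events per user
  ORDER BY time_id                           --         ordered by time
  MEASURES MATCH_NUMBER()                 AS session_id,    -- step 3: measures
           COUNT(*)                       AS no_of_events,
           FIRST(time_id)                 AS start_time,
           LAST(time_id)                  AS end_time,
           LAST(time_id) - FIRST(time_id) AS duration
  ONE ROW PER MATCH                          -- step 4: summary output
  PATTERN (b s*)                             -- step 2: first click, then follow-on clicks
  DEFINE s AS (time_id - PREV(time_id)) <= 10
)
ORDER BY user_id, session_id;
```

FIRST and LAST are applied to `time_id` without a pattern variable so that they also return a value when a match contains only the `b` row.
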
Pattern Output Options
Controlling the output
• Which rows to return
  – ONE ROW PER MATCH
  – ALL ROWS PER MATCH
  – ALL ROWS PER MATCH WITH UNMATCHED ROWS
• AFTER MATCH SKIP option:
  – SKIP PAST LAST ROW
  – SKIP TO NEXT ROW
  – SKIP TO <VARIABLE>
  – SKIP TO FIRST <VARIABLE>
  – SKIP TO LAST <VARIABLE>

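For the detailed view of the sessionization example (every click labelled with its session number), a sketch using ALL ROWS PER MATCH might look like the following. The inline view over `json_sessionization` and the measure alias `var_matched` are assumptions made for illustration.

```sql
-- Assumption: source rows come from json_sessionization, as in the earlier SELECT.
SELECT time_id, user_id, session_id, var_matched
FROM  (SELECT TO_NUMBER(j.session_doc.time_id) AS time_id,
              j.session_doc.user_id            AS user_id
       FROM   json_sessionization j)
MATCH_RECOGNIZE (
  PARTITION BY user_id
  ORDER BY time_id
  MEASURES MATCH_NUMBER() AS session_id,   -- session number within each user
           CLASSIFIER()   AS var_matched   -- pattern variable the row mapped to (B or S)
  ALL ROWS PER MATCH                       -- keep every row of every match
  AFTER MATCH SKIP PAST LAST ROW           -- the default, written out for clarity
  PATTERN (b s*)
  DEFINE s AS (time_id - PREV(time_id)) <= 10
)
ORDER BY user_id, time_id;
```

CLASSIFIER() is handy while developing a pattern because it shows which variable each row was matched to.
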
Live demonstration: Sessionization Tutorial
https://livesql.oracle.com/apex/livesql/file/tutorial_EWB8G5JBSHAGM9FB2GL4V5CAQ.html

Employee Data Set
Table listing employees in each dept.
Transform EMP table to…
DEPTNO  NAMELIST
10      CLARK;KING
10      MILLER
20      SMITH;JONES
20      SCOTT;ADAMS
20      FORD
30      ALLEN;WARD
30      MARTIN;BLAKE
30      TURNER;JAMES

Partitioning and Ordering the source data
SELECT * FROM scott.emp
MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  . . .
)

Create PATTERN Statement and DEFINE Pattern Variables
Add PATTERN statement and DEFINE pattern variables
SELECT * FROM scott.emp
MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  PATTERN (s b*)
  DEFINE b AS LENGTHB(s.ename) + SUM(LENGTHB(CONCAT(b.ename, ';'))) + LENGTHB(';') <= 15
);

List Measures to be Calculated
Use built-in measure MATCH_NUMBER() to return a grouping ID
SELECT * FROM scott.emp
MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES match_number() AS mno
  PATTERN (s b*)
  DEFINE b AS LENGTHB(s.ename) + SUM(LENGTHB(CONCAT(b.ename, ';'))) + LENGTHB(';') <= 15
);

Define Type of Output: Detailed vs. Summary
Return detailed report: returns every row of each successful match of pattern
SELECT * FROM scott.emp
MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES match_number() AS mno
  ALL ROWS PER MATCH
  PATTERN (s b*)
  DEFINE b AS LENGTHB(s.ename) + SUM(LENGTHB(CONCAT(b.ename, ';'))) + LENGTHB(';') <= 15
);

Define Where To Resume Searching
Using default SKIP TO… behaviour to control where to start searching for next pattern
SELECT * FROM scott.emp
MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES match_number() AS mno
  ALL ROWS PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (s b*)
  DEFINE b AS LENGTHB(s.ename) + SUM(LENGTHB(CONCAT(b.ename, ';'))) + LENGTHB(';') <= 15
);

SKIP TO: basic syntax
• AFTER MATCH SKIP TO NEXT ROW
  – Resume pattern matching at the row after the first row of the current match.
• AFTER MATCH SKIP PAST LAST ROW [DEFAULT]
  – Resume pattern matching at the next row after the last row of the current match.
• AFTER MATCH SKIP TO FIRST pattern_variable
  – Resume pattern matching at the first row that is mapped to the pattern variable.
• AFTER MATCH SKIP TO LAST pattern_variable
  – Resume pattern matching at the last row that is mapped to the pattern variable.
• AFTER MATCH SKIP TO pattern_variable
  – The same as AFTER MATCH SKIP TO LAST pattern_variable.

Final Output from MATCH_RECOGNIZE
Output from MATCH_NUMBER is part of the final grouping within LISTAGG

Controlling String Concatenation
LISTAGG: returns list of concatenated strings arranged as groups within each DEPTNO
SELECT deptno,
       LISTAGG(ename, ';') WITHIN GROUP (ORDER BY empno) AS namelist,
       LENGTH(LISTAGG(ename, ';') WITHIN GROUP (ORDER BY empno)) AS how_long
FROM emp_mr
GROUP BY deptno, mno;
DEPTNO  NAMELIST      HOW_LONG
10      CLARK;KING    10
10      MILLER        6
20      SMITH;JONES   11
20      SCOTT;ADAMS   11
20      FORD          4
30      ALLEN;WARD    10
30      MARTIN;BLAKE  12
30      TURNER;JAMES  12

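The LISTAGG query reads from `emp_mr`, which the slides do not define. One plausible reading, offered here only as an assumption, is that `emp_mr` is a view wrapping the MATCH_RECOGNIZE query built up in the preceding steps, so that `mno` is available as a grouping column:

```sql
-- Assumption: emp_mr is a view over the MATCH_RECOGNIZE query from the earlier steps.
-- Creating it requires SELECT access to scott.emp and the CREATE VIEW privilege.
CREATE OR REPLACE VIEW emp_mr AS
SELECT *
FROM   scott.emp
MATCH_RECOGNIZE (
  PARTITION BY deptno
  ORDER BY empno
  MEASURES MATCH_NUMBER() AS mno           -- group number within each department
  ALL ROWS PER MATCH                       -- keep every employee row, tagged with mno
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (s b*)
  -- keep adding names (b) while first name + added names + separators stay within 15 bytes
  DEFINE b AS LENGTHB(s.ename)
            + SUM(LENGTHB(CONCAT(b.ename, ';')))
            + LENGTHB(';') <= 15
);
```

Grouping by `deptno, mno` then gives LISTAGG one group per match, which is what keeps each concatenated list short.
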
Live demonstration: MATCH_RECOGNIZE and LISTAGG
https://livesql.oracle.com/apex/livesql/file/tutorial_EWCF5RFYTP2OAXEFXI4IXKHOC.html

Contiguous Date Ranges
Return result set showing:
1) Start date of contiguous range
2) End date of contiguous range
3) Number of days in contiguous range

Contiguous Date Ranges: SH Schema
Analyze SALES fact table and calculate the following for each year:
1) Start date of contiguous range of sales
2) End date of contiguous range of sales
3) Number of days in contiguous range

Define SELECT statement for source data
SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4
      AND t.time_id = s.time_id)

Partitioning and Ordering the source data
SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4
      AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  . . .
)

Create PATTERN Statement and DEFINE Pattern Variables
Add PATTERN statement and DEFINE pattern variables. The pattern includes an always-true variable: strt has no DEFINE condition, so it matches any row.
SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4
      AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  PATTERN (strt a+)
  DEFINE a AS day_id = PREV(day_id) + 1
);

List Measures to be Calculated
Use new functions FIRST and LAST to return values from start and end of pattern
SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4
      AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  MEASURES FIRST(strt.day_id) AS start_day,
           LAST(a.day_id) AS end_day,
           COUNT(day_id) AS count_day
  PATTERN (strt a+)
  DEFINE a AS day_id = PREV(day_id) + 1
);

Define Type of Output: Detailed vs. Summary
Return summary report: returns one row for each successful match of pattern
SELECT start_day, end_day, count_day
FROM (SELECT DISTINCT s.time_id AS day_id, t.calendar_year AS cal_yr
      FROM sh.sales s, sh.times t
      WHERE channel_id = 4
      AND t.time_id = s.time_id)
MATCH_RECOGNIZE (
  PARTITION BY cal_yr
  ORDER BY day_id
  MEASURES FIRST(strt.day_id) AS start_day,
           LAST(a.day_id) AS end_day,
           COUNT(day_id) AS count_day
  ONE ROW PER MATCH
  PATTERN (strt a+)
  DEFINE a AS day_id = PREV(day_id) + 1
);

Finding Contiguous Date Ranges…
START_DAY  END_DAY    COUNT_DAY
01-JAN-98  28-FEB-98  59
02-MAR-98  06-MAR-98  5
08-MAR-98  01-APR-98  25
04-APR-98  06-APR-98  3
08-APR-98  11-APR-98  4
13-APR-98  18-APR-98  6
22-APR-98  06-MAY-98  15
08-MAY-98  12-MAY-98  5
14-MAY-98  24-MAY-98  11
26-MAY-98  06-JUN-98  12
08-JUN-98  11-JUN-98  4
13-JUN-98  18-JUN-98  6

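For readers without the SH sample schema, the same pattern can be tried on a handful of inline dates. This is a sketch only: the `days` data set is made up for illustration and is not part of the presentation.

```sql
-- Assumption: days is a made-up data set standing in for the SH sales dates.
WITH days AS (
  SELECT DATE '2017-01-01' AS day_id FROM dual UNION ALL
  SELECT DATE '2017-01-02' FROM dual UNION ALL
  SELECT DATE '2017-01-03' FROM dual UNION ALL
  SELECT DATE '2017-01-05' FROM dual UNION ALL
  SELECT DATE '2017-01-08' FROM dual UNION ALL
  SELECT DATE '2017-01-09' FROM dual
)
SELECT start_day, end_day, count_day
FROM   days
MATCH_RECOGNIZE (
  ORDER BY day_id                          -- no PARTITION BY: a single stream of days
  MEASURES FIRST(strt.day_id) AS start_day,
           LAST(a.day_id)     AS end_day,
           COUNT(day_id)      AS count_day
  ONE ROW PER MATCH
  PATTERN (strt a+)                        -- a start day plus one or more consecutive days
  DEFINE a AS day_id = PREV(day_id) + 1    -- a: exactly one day after the previous row
);
```

With this data the query should report two ranges: 1 to 3 January (3 days) and 8 to 9 January (2 days). The isolated 5 January does not appear because `a+` requires at least one following consecutive day; writing the pattern as `(strt a*)` and using `LAST(day_id)` for the end date would report single-day ranges as well.
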
Live demonstration: Finding Contiguous Date Ranges
https://livesql.oracle.com/apex/livesql/file/content_EWNZJ82L6J0JSVINVL2GAV7DC.html

Typical Pattern Matching LOB Use Cases
Sessionization
  Input data: Web logs
  Pattern: Continuous clicks by same user
  Result: Generate reports on number of distinct sessions, average page views per session, etc.
Fraud
  Input data: Credit card transactions
  Pattern: Two transactions in different locations within a short period of time
  Result: Find cases in which a credit card may have been used fraudulently, since a physical person cannot be in two places at once
In-game purchases
  Input data: Games logs
  Pattern: Events leading up to an in-game purchase
  Result: Detect common sequences of events that result in an in-game purchase
Fraud (mobiles)
  Input data: CDR logs
  Pattern: SIM card being used in multiple handsets
  Result: Flag individual SIM cards being used by multiple handsets within a specified time period
Stock market analysis
  Input data: Ticker logs
  Pattern: Track possible fraudulent linked patterns of behavior
  Result: Track known patterns of behavior such as head and shoulders, triangles, channels and wedges

Typical Pattern Matching LOB Use Cases (continued)
Auditing/Compliance
  Input data: Application logs
  Pattern: Analyze changes to secure customer data
  Result: Find instances where operator has made suspect modifications to secure client data
Money laundering
  Input data: Transaction logs
  Pattern: Search for small transfers within a time window followed by large transfer within "x" days of last small transfer
  Result: Detect suspicious money transfer pattern for an account and report account, date of first small transfer, date of last large transfer
Call service quality
  Input data: CDR logs
  Pattern: Search for dropped/reconnected calls
  Result: Identify how many times calls were restarted in a session, total effective call duration and total interrupted duration

Summary: You Can Now…
✔ Construct a MATCH_RECOGNIZE statement
✔ Build search criteria using pattern variables
✔ Organize your data correctly to discover the pattern
✔ Control the type of data returned: summary vs. detailed
✔ Understand the power and value of SQL pattern matching
✔ Go and use MATCH_RECOGNIZE to your advantage!

Where To Get More Information
• Analytical SQL Home Page on OTN with links to:
  – Training + Oracle By Example
  – Podcasts for pattern matching
  – White papers
  – Sample scripts and simple tutorials for pattern matching on Live SQL
• Data Warehouse and SQL Analytics blog: http://oracle-big-data.blogspot.co.uk/