High-performance Pattern Detection and Discovery for Databases and Data Streams Barzan Mozafari Adviser: Prof. Carlo Zaniolo Committee Members: Prof. Junghoo Cho, Prof. D. Stott Parker, and Prof. Mark Hansen Winter 2011 UCLA Computer Science Department


Page 1

High-performance Pattern Detection and Discovery for Databases and Data Streams

Barzan Mozafari

Adviser: Prof. Carlo Zaniolo

Committee Members: Prof. Junghoo Cho, Prof. D. Stott Parker, and Prof. Mark Hansen

Winter 2011

UCLA Computer Science Department

Page 2

Big Picture

1. Query languages that allow for the expression of complex patterns

2. Scalable systems that support such languages and can handle massive, high-arrival-rate data

3. Efficient, one-pass algorithms that can mine large amounts of stored or streaming data and extract useful patterns

[Figure: diagram linking Data, Query, Patterns, Matches, and Data Mining]

Page 3

Overview

• Introduction
• Query Languages for Pattern Detection
  – Kleene-* Constructs in SQL
  – Nested Words [SIGMOD’10, VLDB’10]
  – Optimization [Work in progress]
  – XSeq [Work in progress]
• Conclusion

Page 4

Complex Event Patterns

• Sequences in DBs and CEP over data streams
• Academic and industrial interest:
  – SQL-TS [PODS ’01]
  – SASE [2006], SASE+ [2008]
  – SQL change proposal, 2007 (by Oracle, IBM, and StreamBase)
  – Other industrial and academic languages:
    • Cayuga & CEL
    • CEDR
    • Microsoft CEP & LINQ

Page 5

Our Contribution: K*SQL

1. A powerful language for:
   i. Expressing more complex patterns on relational streams and sequences
   ii. Querying data with more complex structures, e.g., XML and genomic data
2. A unifying engine for sequence patterns and XML
3. New optimization techniques
   • Pattern search over nested words
4. An efficient query execution backend for other languages
5. XSeq: an XPath-like language that brings Kleene-* to XML applications

Page 6

Regular Expressions in SQL

rfid_readings (Time, SensorType, SensorId, ItemId)

Page 7

Nested Kleene-*: K*SQL

Timestamp      | BadgeID | Room
1226633804799  | 26      | Room12
1226633805799  | 2       | Room7
1226633806799  | 26      | Room14
1226633807799  | 5       | Room37
1226633808799  | 5       | Room37
…              | …       | …

Page 8

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN

Employees who spend >1 hour in the lab but leave without going to the decontamination room

[Figure: example badge trace over the rooms Lab, Room2, Room12, Room7, and Exit]

Page 9

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( L )
WHERE L.room = 'Lab'

[Figure: the badge trace, with each Lab reading labeled L]

Page 10

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( L+ )
WHERE L.room = 'Lab'

[Figure: the badge trace, with consecutive Lab readings grouped as L+]

Page 11

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( L+ O+ )
WHERE L.room = 'Lab'
  AND O.room != 'Decontamination'

[Figure: the badge trace, with Lab readings labeled L and the following non-decontamination readings labeled O, grouped as L+ O+]

Page 12

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( (R: L+ O*) )
WHERE L.room = 'Lab'
  AND O.room != 'Decontamination'

[Figure: the badge trace, with each L+ O* segment labeled R]

Page 13

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( (R: L+ O*)+ )
WHERE L.room = 'Lab'
  AND O.room != 'Decontamination'

[Figure: the badge trace, with consecutive R segments grouped as R+]

Page 14

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( (R: L+ O*)+ X )
WHERE L.room = 'Lab'
  AND O.room != 'Decontamination'
  AND X.room = 'Exit'

[Figure: the badge trace, with the R+ segments followed by the Exit reading labeled X]
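To see what the nesting does and does not buy, here is a minimal Python sketch (the names `SYMBOLS`, `symbolize`, and `matches_shape` are hypothetical) that encodes each reading as one character: L for Lab, X for Exit, D for Decontamination, and O for any other room. Because O accepts every room except Decontamination, under this encoding `(L+ O*)+ X` denotes the same strings as the flat regex `L[^D]*X`: a run that starts at a Lab reading, contains no Decontamination reading, and ends at an Exit. A plain regex can therefore check the shape, but it does not expose the individual R segments that a predicate over R needs. The sketch ignores K*SQL's own match-selection semantics.

```python
import re

# Hypothetical one-character encoding of rooms; every other room becomes "O".
SYMBOLS = {"Lab": "L", "Exit": "X", "Decontamination": "D"}

# (R: L+ O*)+ X with O = "any room except Decontamination" flattens to L[^D]*X.
SHAPE = re.compile(r"L[^D]*X")


def symbolize(rooms):
    """Encode one badge's time-ordered rooms, e.g. ['Lab', 'Room2', 'Exit'] -> 'LOX'."""
    return "".join(SYMBOLS.get(room, "O") for room in rooms)


def matches_shape(rooms):
    """True if some contiguous run of readings has the shape (L+ O*)+ X."""
    return SHAPE.search(symbolize(rooms)) is not None


print(matches_shape(["Lab", "Room2", "Lab", "Room7", "Exit"]))  # True
print(matches_shape(["Lab", "Decontamination", "Exit"]))         # False
```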

Page 15

SELECT badgeID
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( (R: L+ O*)+ X )
WHERE L.room = 'Lab'
  AND O.room != 'Decontamination'
  AND X.room = 'Exit'
  AND sum(R.Last(L).timestamp - R.First(L).timestamp) > 3600

[Figure: the badge trace, labeled R+ followed by X]
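The sum predicate is where labels matter: it aggregates Last(L) - First(L) over each R iteration separately. Below is a minimal Python sketch of the result this query is meant to compute, not of K*SQL's evaluator. It rests on these assumptions: each L+ is a maximal run of consecutive Lab readings, a Decontamination reading abandons the partial match, the first Exit after a Lab run ends the match, and timestamps are in seconds, so 3600 means one hour. The sample table's timestamps look like milliseconds, in which case the threshold would need to be scaled to match. `flag_badges` is a hypothetical name.

```python
from itertools import groupby


def flag_badges(readings, threshold=3600):
    """Return badge IDs matching (R: L+ O*)+ X whose summed Lab time exceeds threshold.

    readings: iterable of (badge_id, timestamp, room); timestamps assumed in seconds.
    Assumes each L+ is a maximal run of Lab readings, a Decontamination reading
    abandons the partial match, and the first Exit after a Lab run ends it.
    """
    flagged = set()
    # PARTITION BY badgeID ORDER BY timestamp
    ordered = sorted(readings, key=lambda r: (r[0], r[1]))
    for badge, rows in groupby(ordered, key=lambda r: r[0]):
        lab_total = 0                # sum of Last(L) - First(L) over completed R's
        in_match = False             # True once at least one L+ has been seen
        run_first = run_last = None  # first/last timestamp of the current L+ run
        for _, ts, room in rows:
            if room == "Lab":
                if run_first is None:
                    run_first = ts
                run_last = ts
                in_match = True
                continue
            if run_first is not None:          # a non-Lab reading closes the L+ run
                lab_total += run_last - run_first
                run_first = run_last = None
            if room == "Decontamination":      # O may not be Decontamination
                lab_total, in_match = 0, False
            elif room == "Exit" and in_match:  # X reached: evaluate the sum predicate
                if lab_total > threshold:
                    flagged.add(badge)
                lab_total, in_match = 0, False
    return flagged
```

Badge 26 below spends 2000 s and then 1800 s in the lab before exiting, so it is flagged; badge 5 visits decontamination first, so it is not.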

Page 16

Strictly more expressive, through: (i) nested Kleene-*, and (ii) labels, i.e., aliases

Page 17

SELECT badgeID,
       Last(R).Last(L).timestamp - First(R).First(L).timestamp
FROM rfid
PARTITION BY badgeID
ORDER BY timestamp
AS PATTERN ( (R: L+ O*)+ X )
WHERE L.room = 'Lab'
  AND O.room != 'Decontamination'
  AND X.room = 'Exit'
  AND sum(R.Last(L).timestamp - R.First(L).timestamp) > 3600

[Figure: the badge trace, labeled R+ followed by X]

Page 18

K*SQL Checkpoint

1. A powerful language with a very efficient implementation based on FSA
2. Subsumes SQL-MR, SASE+, Cayuga, and SQL-TS
3. Many interesting applications, including queries on semistructured documents

A natural question: can we handle full XML?

Page 19

Automata and XML

• Word automata (FSA): only the linear structure is explicit; cannot model parenthesis languages
• Ordered tree automata (OTA): only the hierarchical structure is explicit; exponentially less succinct for word queries
• Pushdown automata (PDA): many problems are undecidable; high complexity

Page 20

Advances in the Automata World

Linear sequence + well-nested edges

Positions labeled with symbols in

[Figure: example nested word with positions a1, a2, …, a12]

Positions are classified as:
• Call positions: have both a linear and a hierarchical successor
• Return positions: have both a linear and a hierarchical predecessor
• Internal positions: all other positions

Nested Words [Alur’06]
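The call/return classification can be made concrete with a stack: each return is paired with the most recent unmatched call, and these pairs are the hierarchical edges. The minimal Python sketch below assumes every position is already tagged as call, internal, or return (for XML: open tag, text, close tag); `nesting_edges` is a hypothetical helper. It also reports unmatched calls and returns, which the general definition of nested words permits as pending edges but which a well-formed XML document never contains.

```python
def nesting_edges(word):
    """Pair call positions with their matching return positions.

    word: list of (kind, symbol) pairs, kind in {"call", "internal", "return"}.
    Returns (edges, pending_calls, pending_returns); positions are 0-based.
    """
    open_calls, edges, pending_returns = [], [], []
    for pos, (kind, _symbol) in enumerate(word):
        if kind == "call":
            open_calls.append(pos)                     # awaits its hierarchical successor
        elif kind == "return":
            if open_calls:
                edges.append((open_calls.pop(), pos))  # innermost open call matches
            else:
                pending_returns.append(pos)            # return with no matching call
        elif kind != "internal":
            raise ValueError(f"unknown position kind: {kind!r}")
    return edges, open_calls, pending_returns


# <a> b <c> </c> </a>
print(nesting_edges([("call", "a"), ("internal", "b"), ("call", "c"),
                     ("return", "c"), ("return", "a")]))
```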

Page 21

Nested Word Applications

XML Document

<conference> <name> CAV 2006 </name> <location> <city> Seattle </city> <hotel> Sheraton </hotel> </location> <sponsor> MSR </sponsor> <sponsor> Cadence </sponsor></conference>

Program

global int x;
bool P() { … x = 3; if Q x = 1; … }
bool Q() { local int y; … x = y; return (x==0); }

RNA Sequence

Primary structure: linear sequence of nucleotides (A, C, G, U)

Secondary structure: hydrogen bonds between nucleotides

[Figure: an RNA sequence over the nucleotides A, C, G, U, with its secondary-structure bonds]

Page 22

Odious Comparison

Property | FSA | NWA | PDA
Input is read from left to right | Yes | Yes | Yes
Deterministic automata as expressive as non-deterministic ones | Yes | Yes | No
Closed under complementation | Yes | Yes | Only for DPDA w/ final state
Closed under union, intersection, concatenation, and Kleene-* | Yes | Yes | No
Emptiness | Decidable | Decidable | Decidable
Membership | Decidable | Decidable | Decidable
Language inclusion, language equivalence | Decidable | Decidable | Undecidable
Can recognize parenthesis languages? | No | Yes | Yes

NWA is exponentially more succinct than tree automata.

No query language has been proposed for nested words.

Page 23

XML Sigmod Record: SAX-3

<!ELEMENT SigmodRecord (issue)* >
<!ELEMENT issue (volume,number,articles) >
<!ELEMENT volume (#PCDATA)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT articles (article)* >
<!ELEMENT article (title,initPage,endPage,authors) >
<!ELEMENT title (#PCDATA)>
<!ELEMENT initPage (#PCDATA)>
<!ELEMENT endPage (#PCDATA)>
<!ELEMENT authors (author)* >
<!ELEMENT author (#PCDATA)>
<!ATTLIST author position CDATA #IMPLIED>

tagIndex | Type | Token | Value
1 | open | SigmodRecord | _
2 | open | issue | _
3 | open | volume | _
4 | text | _ | 11
5 | close | volume | _
6 | open | number | _
… | … | … | …
25 | open | author | _
26 | attribute | position | 01
27 | text | _ | Karen Botnich
… | … | … | …
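A relation shaped like the table above can be produced with one row per SAX event. Below is a minimal sketch using Python's standard `xml.sax` module. The row layout (tagIndex, Type, Token, Value) mirrors the slide. Dropping whitespace-only text and emitting attributes right after their open tag are assumptions of this sketch; the latter matches rows 25 to 27. `TokenHandler` and `tokenize` are hypothetical names.

```python
import xml.sax


class TokenHandler(xml.sax.ContentHandler):
    """Collect (tagIndex, Type, Token, Value) rows from SAX events."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._text = []  # character data may arrive in several chunks

    def _emit(self, kind, token, value):
        self.rows.append((len(self.rows) + 1, kind, token, value))

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if text:  # whitespace-only text between tags is dropped
            self._emit("text", "_", text)

    def startElement(self, name, attrs):
        self._flush_text()
        self._emit("open", name, "_")
        for attr_name in attrs.getNames():  # attributes follow their open tag
            self._emit("attribute", attr_name, attrs.getValue(attr_name))

    def endElement(self, name):
        self._flush_text()
        self._emit("close", name, "_")

    def characters(self, content):
        self._text.append(content)


def tokenize(xml_text):
    handler = TokenHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.rows


for row in tokenize('<author position="01">Carlo Zaniolo</author>'):
    print(row)
```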

Page 24

XPath

Find articles of Carlo Zaniolo as the 2nd co-author

//article[authors/author[@position = "01" and text() = "Carlo Zaniolo"]]/title/text()

<SigmodRecord><issue>…<article> <title> Implementation of GEM </title> <initPage> 45 </initPage> … <authors> … <author position="01"> Carlo Zaniolo </author> … </authors> </article> ….
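For comparison with the K*SQL formulation, here is a minimal Python sketch of the same query using the standard library's `xml.etree.ElementTree`. ElementTree's limited XPath subset does not support this nested predicate or `text()`, so the filter is written as a loop. `titles_by_author_position` is a hypothetical name. Stripping surrounding whitespace before comparing is an assumption; XPath's `text() = "..."` compares the text node exactly.

```python
import xml.etree.ElementTree as ET


def titles_by_author_position(xml_text, name, position):
    """Mimic //article[authors/author[@position=P and text()=N]]/title/text()."""
    root = ET.fromstring(xml_text)
    titles = []
    for article in root.iter("article"):
        for author in article.findall("authors/author"):
            # Compare stripped text, since the sample pads values with spaces.
            if author.get("position") == position and (author.text or "").strip() == name:
                titles.append((article.findtext("title") or "").strip())
                break  # one qualifying author is enough for this article
    return titles
```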

Page 25

K*SQL

Question: can we query nested words in K*SQL?

In particular, can we express traditional XML queries, i.e., those often expressed via XPath/XQuery?

Page 26

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  ( )
WHERE

Page 27

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt)
WHERE OpArt.value = '<article>'

Page 28

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt)
WHERE OpArt = open('article')

Page 29

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl)
WHERE OpArt = open('article')
  AND OpTitl = open('title')

Page 30

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title)
WHERE OpArt = open('article')
  AND OpTitl = open('title')

Page 31

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')

Page 32

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)

Page 33

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')

Page 34

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E*)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')

Page 35

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')

Page 36

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu Pos)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')
  AND pos.type = 'attr' AND pos.value = '01'
  AND pos.token = 'position'

Page 37

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu Pos)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')
  AND pos = attribute('position', '01')

Page 38

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu Pos Author)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')
  AND pos = attribute('position', '01')
  AND author.value = 'Carlo Zaniolo'

Page 39

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu Pos Author ClAu)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')
  AND pos = attribute('position', '01')
  AND author.value = 'Carlo Zaniolo'
  AND ClAu = close('author')

Page 40

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu Pos Author ClAu E*)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')
  AND pos = attribute('position', '01')
  AND author.value = 'Carlo Zaniolo'
  AND ClAu = close('author')

Page 41

Find articles of Carlo Zaniolo as the 2nd co-author

SELECT Title.value AS articleName
FROM sigmod_record
AS PATTERN
  (OpArt OpTitl Title ClTitl E*
   OpAuths E* OpAu Pos Author ClAu E*
   ClAuths ClArt)
WHERE OpArt = open('article')
  AND OpTitl = open('title')
  AND ClTitl = close('title')
  AND isElement(E)
  AND OpAuths = open('authors')
  AND OpAu = open('author')
  AND pos = attribute('position', '01')
  AND author.value = 'Carlo Zaniolo'
  AND ClAu = close('author')
  AND ClAuths = close('authors')
  AND ClArt = close('article')
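To make the flat pattern's behavior concrete, here is a minimal Python sketch of a hand-written matcher over the tokenized rows. It is not generated from the query and is not K*SQL's engine. Rows are (Type, Token, Value) triples as in the tokenized table, without tagIndex; Title and Author are read from text rows. The sketch makes two assumptions. First, each E* is resolved lazily, skipping whole elements only until the next expected open tag. Second, an `<author>` that fails the Pos/Author checks is absorbed by the preceding E*. `skip_element`, `match_article`, and `titles_with_author` are hypothetical names.

```python
def skip_element(rows, i):
    """rows[i] is an 'open' row; return the index just past its matching 'close' row."""
    depth = 0
    for j in range(i, len(rows)):
        kind = rows[j][0]
        if kind == "open":
            depth += 1
        elif kind == "close":
            depth -= 1
            if depth == 0:
                return j + 1
    raise ValueError("unbalanced token sequence")


def match_article(rows, i, author, position):
    """Match the flat pattern starting at rows[i]; return the title text or None."""
    def at(j, kind, token=None):
        return j < len(rows) and rows[j][0] == kind and (token is None or rows[j][1] == token)

    def skip_until(j, token=None):
        # E*: skip whole elements until an open(token) row; token=None skips them all.
        while j < len(rows) and rows[j][0] == "open" and rows[j][1] != token:
            j = skip_element(rows, j)
        return j

    # OpArt OpTitl Title ClTitl
    if not (at(i, "open", "article") and at(i + 1, "open", "title")
            and at(i + 2, "text") and at(i + 3, "close", "title")):
        return None
    title = rows[i + 2][2]
    j = skip_until(i + 4, "authors")              # E* OpAuths
    if not at(j, "open", "authors"):
        return None
    j += 1
    while True:                                   # E* OpAu Pos Author ClAu
        j = skip_until(j, "author")
        if not at(j, "open", "author"):
            return None
        if (at(j + 1, "attribute", "position") and rows[j + 1][2] == position
                and at(j + 2, "text") and rows[j + 2][2] == author
                and at(j + 3, "close", "author")):
            break
        j = skip_element(rows, j)                 # not the wanted author: E* absorbs it
    j = skip_until(j + 4)                         # E* ClAuths ClArt
    return title if at(j, "close", "authors") and at(j + 1, "close", "article") else None


def titles_with_author(rows, author, position):
    titles = []
    for i in range(len(rows)):
        title = match_article(rows, i, author, position)
        if title is not None:
            titles.append(title)
    return titles
```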

Page 44

Sequence Queries over XML: ‘W’-Patterns in Stocks

<!ELEMENT Stocks (Stock)* >
<!ELEMENT Stock (symbol, date, price, volume)>
<!ELEMENT symbol (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT volume (#PCDATA)>

Page 45

W-patterns in NASDAQ transactions with volume > 1000

SELECT FIRST(Z).FIRST(X).Sym.token
FROM Nasdaq PARTITION BY Y.X.Sym.token
AS PATTERN
  (Z: (X: OpSt Sym Date OP Price1 CP
          OpV Volume ClV ClSt)*
      (Y: OpSt Sym Date OP Price2 CP
          OpV Volume ClV ClSt)*
  )^2
WHERE OpSt = open('Stock') AND ClSt = close('Stock')
  AND OP = open('price') AND CP = close('price')
  AND OpV = open('volume') AND ClV = close('volume')
  AND INT(volume.token) >= 100
  AND Z.X.price1.token < Z.PREV(X).price1.token
  AND Z.Y.price2.token > Z.PREV(Y).price2.token

<Stock symbol="YHOO" date="01-01-2010 23:10:00">
  <price> 18.50 </price><volume> 21 </volume></Stock>
<Stock symbol="YHOO" date="01-01-2010 23:16:00">
  <price> 18.70 </price><volume> 11 </volume></Stock>…
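The W shape is a falling run (X) followed by a rising run (Y), twice. Below is a minimal Python sketch of that detection over plain (symbol, price, volume) ticks in time order, so XML parsing is left out. It partitions by symbol, encodes each consecutive price move as D (down), U (up), or F (flat), and searches for `D+U+D+U+`. The sketch uses + rather than the slide's *, since with * every run may be empty and the pattern would match trivially. Any move touching a tick below the volume threshold becomes B, which breaks the match, because the query's volume predicate applies to every matched Stock element. `w_pattern_symbols` and its `min_volume` parameter are hypothetical.

```python
import re
from collections import defaultdict


def w_pattern_symbols(ticks, min_volume):
    """Return the symbols whose price series contains fall-rise-fall-rise (a 'W').

    ticks: iterable of (symbol, price, volume) tuples in time order.
    """
    series = defaultdict(list)  # PARTITION BY symbol, preserving time order
    for symbol, price, volume in ticks:
        series[symbol].append((price, volume))
    found = set()
    for symbol, points in series.items():
        moves = []
        for (p0, v0), (p1, v1) in zip(points, points[1:]):
            if v0 < min_volume or v1 < min_volume:
                moves.append("B")   # a low-volume tick cannot be part of a match
            elif p1 < p0:
                moves.append("D")   # price fell: an X step
            elif p1 > p0:
                moves.append("U")   # price rose: a Y step
            else:
                moves.append("F")   # flat: neither X nor Y in this sketch
        if re.search(r"D+U+D+U+", "".join(moves)):
            found.add(symbol)
    return found
```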

Page 47

[Figure: the matched price sequence, segmented as X* Y* X* Y*]

Page 48

Optimization in K*SQL

• Compile-time:
  – Inferring inter-predicate implications
  – Query rewriting, e.g., adding more constraints
  – Greedy predicate assignment
• Run-time: avoiding unnecessary backtracking
  – VPSearch: extending the KMP search algorithm to nested words and visibly pushdown words
  – Optimizing non-deterministic queries, i.e., all-match query modes
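For reference, here is a sketch in Python of the classic Knuth-Morris-Pratt search that VPSearch extends. The nested-word extension itself is not shown. It works on any sequence of comparable symbols, such as characters or room names as in the RFID example. The failure table records, for each prefix of the pattern, the longest proper prefix that is also a suffix. On a mismatch, the scan therefore resumes from that shorter prefix instead of re-reading input, which is the kind of backtrack the slide aims to avoid. `failure_function` and `kmp_search` are hypothetical names.

```python
def failure_function(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start indices of all occurrences of a non-empty pattern in text."""
    fail = failure_function(pattern)
    matches, k = [], 0  # k = length of the pattern prefix matched so far
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = fail[k - 1]          # fall back without moving backwards in text
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]          # allow overlapping occurrences
    return matches


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```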

Page 49

K*SQL vs. XML Engines

