
ICDCS 2008 @ Beijing China

Routing of XML and XPath Queries in Data Dissemination Networks

Guoli Li, Shuang Hou

Hans-Arno Jacobsen

Middleware Systems Research Group

University of Toronto


Agenda

Motivation
Advertisement-based routing
Covering
Evaluation
Conclusions


Motivation

Data sources: publish XML data
Data users: register XPath queries
The data dissemination network: deliver matching results to a large and dynamically changing group of users

[Figure: Content-based Data Dissemination. Labels: XML, Queries, Results]

Publish/Subscribe

[Figure: Publisher and Subscribers. Labels: Advertisement (DTD), Subscription (XPath), Publication (XML)]

Matching of XMLs and XPaths [ICDE’06]
Matching of Advertisements and XPaths
Exploring relations among XPaths

Covering-based Routing

[Figure: covering-based routing example with elements numbered 1 to 6]

Language Model

Advertisement: generated from DTDs

Non-recursive advertisement
  e.g., A = /t1/t2/t3…/tn-1/tn
Recursive advertisement
  Simple: A = A1(A2)+A3
  Series: A = A1(A2)+A3(A4)+A5
  Embedded: A = A1(A2(A3)+A4)+A5
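As a minimal sketch of the simple recursive form A = A1(A2)+A3, the repeated part can be unrolled a fixed number of times to obtain one concrete path. The function names and the sample parts below are illustrative assumptions, not taken from the talk; with two repetitions they happen to produce the nine-step path used as A on the Overlapping Algorithms slide.

```python
from typing import List


def expand_simple(a1: List[str], a2: List[str], a3: List[str], repeats: int) -> List[str]:
    """Unroll A = A1(A2)+A3 with the (A2) group repeated `repeats` times."""
    if repeats < 1:
        raise ValueError("(A2)+ needs at least one repetition")
    return a1 + a2 * repeats + a3


def to_path(steps: List[str]) -> str:
    """Render a list of steps as an absolute path."""
    return "/" + "/".join(steps)


# Hypothetical parts: A1 = /a, A2 = /b/c/*, A3 = /b/e
print(to_path(expand_simple(["a"], ["b", "c", "*"], ["b", "e"], 2)))
# prints /a/b/c/*/b/c/*/b/e
```

The series and embedded forms would need a small recursive grammar instead of three flat lists; that is left out here.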

DTD

<?xml encoding="UTF-8"?>
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA 'http://'>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>
… …

Advertisements

/personnel/person
/personnel/person/name
/personnel/person/name/family
/personnel/person/name/given
/personnel/person/email
/personnel/person/url
/personnel/person/link
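One way to picture how the advertisement paths follow from the DTD is a depth-first walk over the element declarations. The sketch below assumes the DTD has already been reduced by hand to a parent-to-children map (the `CHILDREN` dictionary is a hypothetical simplification that ignores occurrence operators and attributes) and that the DTD is non-recursive.

```python
from typing import Dict, List

# Hypothetical, hand-simplified view of the DTD: element -> child elements.
CHILDREN: Dict[str, List[str]] = {
    "personnel": ["person"],
    "person": ["name", "email", "url", "link"],
    "name": ["family", "given"],
    "family": [], "given": [], "email": [], "url": [], "link": [],
}


def advertisements(root: str, children: Dict[str, List[str]]) -> List[str]:
    """List every path from the root to an element below it (non-recursive DTDs only)."""
    paths: List[str] = []

    def walk(element: str, prefix: str) -> None:
        for child in children.get(element, []):
            path = f"{prefix}/{child}"
            paths.append(path)
            walk(child, path)

    walk(root, f"/{root}")
    return paths


for adv in advertisements("personnel", CHILDREN):
    print(adv)
```

On this input the walk prints the seven paths listed above, in the same order. A recursive DTD would make the walk loop forever, which is why recursive advertisements need their own notation.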


Language Model

Subscription: XPaths
  Absolute, e.g., /c/d/*/e
  Relative, e.g., c/d/*/e
  Descendant operators, e.g., c//e/*/c
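The three kinds of XPath expression (XPE) differ only in whether the path starts at the root and which operator precedes each step. A small parser makes that explicit; this is a sketch for the restricted path syntax shown here (name tests, `*`, `/` and `//` only), and `Step` and `parse_xpe` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    axis: str  # "/" for a child step, "//" for a descendant step
    test: str  # element name or "*"


def parse_xpe(xpe: str) -> Tuple[bool, List[Step]]:
    """Return (is_absolute, steps) for a path built from /, // and name tests."""
    is_absolute = xpe.startswith("/")
    steps: List[Step] = []
    axis = "/"  # placeholder axis for the first step of a relative path
    i = 0
    while i < len(xpe):
        if xpe.startswith("//", i):
            axis, i = "//", i + 2
        elif xpe[i] == "/":
            axis, i = "/", i + 1
        else:
            j = i
            while j < len(xpe) and xpe[j] != "/":
                j += 1
            steps.append(Step(axis, xpe[i:j]))
            axis, i = "/", j
    return is_absolute, steps


print(parse_xpe("/c/d/*/e"))
print(parse_xpe("c//e/*/c"))
```

Predicates and attributes are deliberately out of scope; the language model here only lists path expressions.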

[Figure: tree diagram with nodes labeled a, b, c, d, e and *]

Advertisement-based Routing

[Figure: diagrams labeled P(A) and P(S); a subscription (S) arrives at a broker]

A1: /a/b/*/e

A2: /b/e

A3: /a/b/d

A4: /a/b/e

… …
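The idea of advertisement-based routing can be sketched as a table lookup: a broker forwards a subscription only toward neighbours whose advertisements overlap it. The code below is an illustration under stated assumptions, not the authors' implementation: it handles only absolute simple XPEs against non-recursive advertisements, it assumes a subscription longer than an advertisement cannot overlap it, and the neighbour names `B1` to `B3` and the subscription `/a/*/d` are invented for the example.

```python
from typing import Dict, List, Set


def steps(path: str) -> List[str]:
    return [s for s in path.split("/") if s]


def overlaps(adv: str, sub: str) -> bool:
    """Absolute simple XPE against a non-recursive advertisement, step by step."""
    a, s = steps(adv), steps(sub)
    if len(s) > len(a):  # assumption: the advertisement bounds the path depth
        return False
    return all(x == "*" or y == "*" or x == y for x, y in zip(a, s))


# Advertisements A1..A4 from the slide; the neighbour names are hypothetical.
ADV_TABLE: Dict[str, str] = {
    "/a/b/*/e": "B1",
    "/b/e": "B2",
    "/a/b/d": "B3",
    "/a/b/e": "B3",
}


def next_hops(sub: str, table: Dict[str, str]) -> Set[str]:
    """Neighbours that advertised something the subscription can match."""
    return {hop for adv, hop in table.items() if overlaps(adv, sub)}


print(sorted(next_hops("/a/*/d", ADV_TABLE)))  # prints ['B1', 'B3']
```

Here `/a/*/d` overlaps A1 and A3 only, so the subscription is not sent toward the neighbour that advertised `/b/e`.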


Overlapping Algorithms

Element-wise overlap rule:

Adv   Sub   Overlap
*     *     Y
*     t     Y
t     *     Y
t     t     Y
t1    t2    N

Basic case:

S    = /a /b /c /* /b /e
Next = -1  0  0  0  1  2   (Next Table)
A    = /a /b /c /* /b /c /* /b /e

[Figure: S aligned against A at successive positions]

Other cases:

e.g., S = /a /b //c /* /b //e
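The basic case can be reproduced with a short sketch. `overlap` encodes the element-wise rule from the table, `next_table` computes, by brute force, a KMP-style table over S using that rule, and `first_overlap` slides S along A. This is an illustration under assumptions: the function names are invented, the scan is a plain sliding window rather than the table-driven skip the slide hints at, and the `//` cases are not covered.

```python
from typing import List


def overlap(x: str, y: str) -> bool:
    """Two steps overlap unless they are different element names."""
    return x == "*" or y == "*" or x == y


def next_table(s: List[str]) -> List[int]:
    """next[i]: length of the longest proper prefix of s[:i] that overlaps a suffix of s[:i]."""
    table = [-1]
    for i in range(1, len(s)):
        best = 0
        for k in range(i - 1, 0, -1):
            if all(overlap(s[j], s[i - k + j]) for j in range(k)):
                best = k
                break
        table.append(best)
    return table


def first_overlap(a: List[str], s: List[str]) -> int:
    """Smallest offset at which every step of s overlaps the aligned step of a, or -1."""
    for off in range(len(a) - len(s) + 1):
        if all(overlap(a[off + j], s[j]) for j in range(len(s))):
            return off
    return -1


S = ["a", "b", "c", "*", "b", "e"]
A = ["a", "b", "c", "*", "b", "c", "*", "b", "e"]
print(next_table(S))        # prints [-1, 0, 0, 0, 1, 2]
print(first_overlap(A, S))  # prints 3
```

The computed table agrees with the Next Table values shown for S, and the scan finds S overlapping A from the fourth step onward.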


Subscription Tree

Subscriptions are maintained in a hierarchical tree
A child has more than one parent
Siblings may intersect
If a publication does not match a node, it does not match any of the descendants
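The pruning property in the last point can be illustrated with a simplified tree in which every node is covered by its parent. This is a sketch under assumptions, not the data structure from the talk: it supports absolute simple XPEs only, keeps a node under the first covering sibling instead of linking it to several parents, and treats a publication as a single root-to-leaf path. `Node`, `covers`, and the sample XPEs are hypothetical.

```python
from typing import List, Optional


def steps(xpe: str) -> List[str]:
    return [s for s in xpe.split("/") if s]


def covers(s1: str, s2: str) -> bool:
    """Absolute simple XPEs: s1 covers s2 if it generalises a prefix of s2 step by step."""
    a, b = steps(s1), steps(s2)
    return len(a) <= len(b) and all(x == "*" or x == y for x, y in zip(a, b))


class Node:
    def __init__(self, xpe: Optional[str] = None) -> None:
        self.xpe = xpe  # None marks ROOT
        self.children: List["Node"] = []

    def insert(self, xpe: str) -> None:
        for child in self.children:
            if covers(child.xpe, xpe):
                child.insert(xpe)  # descend under the first covering child
                return
        node = Node(xpe)
        # Existing children that the new XPE covers move below it.
        node.children = [c for c in self.children if covers(xpe, c.xpe)]
        self.children = [c for c in self.children if not covers(xpe, c.xpe)] + [node]

    def matching(self, path: str) -> List[str]:
        """XPEs matched by one path; a node that fails prunes its whole subtree."""
        hits: List[str] = []
        for child in self.children:
            if covers(child.xpe, path):
                hits.append(child.xpe)
                hits.extend(child.matching(path))
        return hits


root = Node()
for xpe in ["/a", "/a/b", "/a/*/d", "/b", "/a/b/d"]:
    root.insert(xpe)
print(root.matching("/a/b/d"))
print(root.matching("/b/e"))
```

In this example `/a/b/d` is covered by both `/a/b` and `/a/*/d` but is stored under `/a/b` only; it is still found, because any path that matches it also matches the parent it was stored under.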

[Figure: example subscription tree with a ROOT node, XPE nodes such as /a, and pointer links]

Tree Maintenance

Insert
Delete

Covering Algorithms

Similar to Adv-Sub overlapping algorithms
  Absolute simple XPEs
  Relative simple XPEs
  XPEs with // operator

Element-wise covering rule (does a step of S1 cover a step of S2?):

S1    S2    Cover
*     *     Y
*     t     Y
t     *     N
t     t     Y
t1    t2    N

e.g.,
S1 = /* /a //e /c
S2 = /a /a /* //c /e /c /d

[Figure: S1 aligned step by step against S2]
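For the two simple cases, the covering test can be sketched directly from the element-wise rule. The code is a conservative illustration with hypothetical names: it rejects XPEs containing `//`, and it answers False for an absolute S1 against a relative S2 even though a few such pairs do cover.

```python
from typing import List, Tuple


def parse(xpe: str) -> Tuple[bool, List[str]]:
    """Return (is_absolute, steps); the descendant operator is outside this sketch."""
    if "//" in xpe:
        raise ValueError("descendant operator is outside this sketch")
    return xpe.startswith("/"), [s for s in xpe.split("/") if s]


def step_covers(t1: str, t2: str) -> bool:
    """Covering rule for one step: a wildcard covers anything, a name covers only itself."""
    return t1 == "*" or t1 == t2


def window_covers(s1: List[str], s2: List[str], off: int) -> bool:
    return all(step_covers(s1[j], s2[off + j]) for j in range(len(s1)))


def covers(x1: str, x2: str) -> bool:
    abs1, s1 = parse(x1)
    abs2, s2 = parse(x2)
    if len(s1) > len(s2):
        return False  # s1 demands more steps than s2 guarantees
    if abs1 and abs2:
        return window_covers(s1, s2, 0)  # both anchored at the root
    if abs1:
        return False  # conservative: absolute s1 against relative s2
    # Relative s1 may sit at any offset inside s2.
    return any(window_covers(s1, s2, off) for off in range(len(s2) - len(s1) + 1))


print(covers("/a/*", "/a/b/c"))    # prints True
print(covers("/a/b", "/a/*"))      # prints False
print(covers("b/c", "/a/b/c/d"))   # prints True
```

Unlike overlap, covering is not symmetric: `/a/*` covers `/a/b`, but `/a/b` does not cover `/a/*`.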


Merging Rules

Rules
  XPEs with one difference (e.g., element, operator)
    e.g., S1 = /a/*/c/d, S2 = /a/*/c/e, S = /a/*/c/*
  XPEs with different sub-XPEs
    e.g., S1 = … … XPE1 … …, S2 = … … XPE2 … …, S = … … // … …

Merge degree

[Figure: P(S1), P(S2) and P(S)]
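The first rule, restricted to a difference in a single element, can be sketched as follows. `merge_one_difference` is a hypothetical helper; it skips XPEs containing `//` and does not attempt the operator or sub-XPE rules.

```python
from typing import Optional


def merge_one_difference(x1: str, x2: str) -> Optional[str]:
    """Merge two XPEs that differ in exactly one element by replacing it with '*'."""
    if "//" in x1 or "//" in x2:
        return None  # operator differences are outside this sketch
    s1, s2 = x1.split("/"), x2.split("/")
    if len(s1) != len(s2):
        return None
    diff = [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]
    if len(diff) != 1:
        return None
    merged = list(s1)
    merged[diff[0]] = "*"
    return "/".join(merged)


print(merge_one_difference("/a/*/c/d", "/a/*/c/e"))  # prints /a/*/c/*
```

The merged XPE covers both inputs but can also match paths that neither input matches (for example `/a/b/c/f`), which is where false positives under merging come from.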


Evaluation

Setup
  Implemented in C++
  Overlay with 127 content-based routers
  Cluster (each node: 1.86 GHz, 4 GB) vs. PlanetLab
  Workloads are generated from two DTDs: NITF and PSD

Metrics
  Number of subscriptions per router
  Network traffic
  XPE processing time
  Notification delay

Routing Table Size

[Figure: routing table size (number of XPath queries, axis 0 to 100,000) versus number of XPath queries (0 to 100,000). Series: No Covering (Sets A and B), 50% Covering (Set A), 90% Covering (Set B)]

Routing Table Size

[Figure: routing table size (axis 0 to 50,000) versus number of subscriptions (0 to 100,000). Series: Covering (Set B), Perfect Merging (Set B), Imperfect Merging (Set B)]

Network Traffic

Method                  Network Traffic   Delay (ms)
No-Adv-No-Cov           654,871           97.82
No-Adv-With-Cov         572,890           20.74
With-Adv-No-Cov         398,810           98.09
With-Adv-With-Cov       326,796           20.89
With-Adv-With-CovPM     254,900           16.78
With-Adv-With-CovIPM    257,567           12.24
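To read the table as relative savings, each row can be compared against the No-Adv-No-Cov baseline. The snippet below only does that arithmetic on the figures from the table; the `reduction` helper is a hypothetical name.

```python
# (network traffic, delay in ms) per method, copied from the table.
rows = {
    "No-Adv-No-Cov": (654_871, 97.82),
    "No-Adv-With-Cov": (572_890, 20.74),
    "With-Adv-No-Cov": (398_810, 98.09),
    "With-Adv-With-Cov": (326_796, 20.89),
    "With-Adv-With-CovPM": (254_900, 16.78),
    "With-Adv-With-CovIPM": (257_567, 12.24),
}


def reduction(base: float, value: float) -> float:
    """Percentage saved relative to the baseline (negative means worse)."""
    return 100.0 * (base - value) / base


base_traffic, base_delay = rows["No-Adv-No-Cov"]
for name, (traffic, delay) in rows.items():
    print(f"{name:22s} traffic {reduction(base_traffic, traffic):6.1f}%  "
          f"delay {reduction(base_delay, delay):6.1f}%")
```

Against the baseline, perfect merging saves about 61% of the traffic, and imperfect merging cuts the delay by about 87%.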


Process Time


Notification Delay (PSD)


Notification Delay (NITF)


Related Work

Locating data sources in large distributed systems [Galanis et al. 2003]
  DHT based approach
  Data summary
Query aggregation for scalable data dissemination [Chan et al. 2002]
  Equivalence between the original query set and the aggregated set
ONYX [Diao et al. 2004]
  Deliver part of the XML documents
  Share common prefixes among queries using NFA
XTreeNet [Fenner et al. 2005]
  Unify the pub/sub model and the query/response model
  Avoid repeatedly matching at each hop

Conclusions

Investigate advertisement-based routing for XML data dissemination networks
Propose a novel data structure to maintain covering & merging relationships among XPEs
Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
  Reduce routing table by up to 90%
  Improve routing latency by roughly 85%

Future work
  Extend to tree patterns
  Share common prefixes among XPEs in overlapping and covering algorithms

Q & A


Middleware Systems Research Group, University of Toronto
www.msrg.eecg.toronto.edu

Process Time

[Figure: processing time in ms (axis 0 to 140) versus number of subscriptions (500 to 5000)]

Notification Delay (NITF)


Notification Delay (PSD)

[Figure: notification delay in ms (axis 0 to 16) versus number of hops (2 to 6)]

False Positives

[Figure: false positives in % (axis 0 to 8) versus imperfect degree (0 to 0.2)]

Conclusions

Investigate advertisement-based routing for XML data dissemination networks
Present algorithms to determine the covering relations among arbitrary XPEs
Propose a novel data structure to maintain covering & merging relationships among XPEs
Explore rules to merge similar XPEs in order to further reduce the routing table size
Perform experimental evaluation on a 127-broker overlay to demonstrate the approach
  Reduce routing table by up to 90%
  Improve routing latency by roughly 85%