ICDCS 2008 @ Beijing China
Routing of XML and XPath Queries in Data Dissemination Networks
Guoli Li, Shuang Hou
Hans-Arno Jacobsen
Middleware Systems Research Group
University of Toronto
Agenda
Motivation
Advertisement-based routing
Covering
Evaluation
Conclusions
Motivation
Data sources: publish XML data
Data users: register XPath queries
The data dissemination network: deliver matching results to a large and dynamically changing group of users
[Figure: Content-based Data Dissemination. XML enters the network from data sources; queries come from users, who receive results.]
Publish/Subscribe
[Figure: publisher and subscribers exchanging Advertisement (DTD), Publication (XML) and Subscription (XPath) messages]
Matching of XMLs and XPaths [ICDE’06]
Matching of Advertisements and XPaths
Exploring relations among XPaths
Language Model
Advertisement: generated from DTDs
  Non-recursive advertisement
    e.g., A = /t1/t2/t3…/tn-1/tn
  Recursive advertisement
    Simple: A = A1(A2)+A3
    Series: A = A1(A2)+A3(A4)+A5
    Embedded: A = A1(A2(A3)+A4)+A5
<?xml encoding="UTF-8"?>
<!ELEMENT personnel (person)+>
<!ELEMENT person (name,email*,url*,link?)>
<!ATTLIST person id ID #REQUIRED>
<!ELEMENT name ((family,given)|(given,family))>
<!ELEMENT family (#PCDATA)>
<!ELEMENT given (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT url EMPTY>
<!ATTLIST url href CDATA 'http://'>
<!ELEMENT link EMPTY>
<!ATTLIST link manager IDREF #IMPLIED>
… …
/personnel/person
/personnel/person/name
/personnel/person/name/family
/personnel/person/name/given
/personnel/person/email
/personnel/person/url
/personnel/person/link
DTD Advertisements
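A minimal sketch of how the advertisement paths above could be derived from the DTD. It assumes a non-recursive DTD, and the `CHILDREN` table is transcribed by hand from the element declarations rather than parsed; the names `CHILDREN`, `advertisements` and `ADS` are hypothetical and not taken from the talk.

```python
# Child elements of each element, transcribed by hand from the DTD above.
# Attributes and #PCDATA are ignored; a real system would parse the DTD.
CHILDREN = {
    "personnel": ["person"],
    "person": ["name", "email", "url", "link"],
    "name": ["family", "given"],
    "family": [],
    "given": [],
    "email": [],
    "url": [],
    "link": [],
}


def advertisements(root, children):
    """Yield one absolute path per element reachable from root.

    Only valid for non-recursive DTDs: a recursive content model
    would make this loop run forever.
    """
    stack = [(root, "/" + root)]
    while stack:
        element, path = stack.pop()
        yield path
        # Push children in reverse so they are visited in declaration order.
        for child in reversed(children[element]):
            stack.append((child, path + "/" + child))


ADS = list(advertisements("personnel", CHILDREN))
for ad in ADS:
    print(ad)
```

Recursive advertisements of the form A1(A2)+A3 need a different representation and are outside this sketch.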
Language Model
Subscription: XPaths
  Absolute, e.g., /c/d/*/e
  Relative, e.g., c/d/*/e
  Descendant operators, e.g., c//e/*/c
[Figure: node diagrams for the examples above]
Advertisement-based Routing
[Figure: diagrams relating P(A) and P(S); a subscription (S) arriving at a broker that holds the advertisements listed below]
A1: /a/b/*/e
A2: /b/e
A3: /a/b/d
A4: /a/b/e
… …
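A rough sketch of the routing decision at a broker: a subscription is forwarded only toward neighbours whose advertisements overlap it. The advertisements A1 to A4 are the ones listed above; the neighbour names, the function names, and the length rule (a subscription may be shorter than an advertisement but not longer) are assumptions made for illustration. Only non-recursive advertisements and absolute XPaths without `//` are handled.

```python
def steps(xpe):
    """Split an absolute path such as '/a/b/*/e' into ['a', 'b', '*', 'e']."""
    return xpe.strip("/").split("/")


def overlaps(adv, sub):
    """Position-wise overlap test.

    Two steps overlap when either is '*' or both name the same element.
    Assumption: the subscription may be shorter than the advertisement,
    never longer.
    """
    a, s = steps(adv), steps(sub)
    if len(s) > len(a):
        return False
    return all(x == "*" or y == "*" or x == y for x, y in zip(a, s))


# Advertisements from the slide; the neighbour each one arrived from
# is hypothetical.
ADV_TABLE = {
    "/a/b/*/e": "broker-1",  # A1
    "/b/e": "broker-2",      # A2
    "/a/b/d": "broker-3",    # A3
    "/a/b/e": "broker-4",    # A4
}


def next_hops(sub, adv_table):
    """Neighbours that sent an advertisement overlapping the subscription."""
    return {hop for adv, hop in adv_table.items() if overlaps(adv, sub)}


print(sorted(next_hops("/a/b/c/e", ADV_TABLE)))
```

Without advertisements, the same subscription would have to be sent to every neighbour.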
Overlapping Algorithms
Basic case:

Adv   Sub   Overlap
*     *     Y
*     t     Y
t     *     Y
t     t     Y
t1    t2    N

Other cases: e.g., S = /a /b //c /* /b //e

Next Table
S    =  /a  /b  /c  /*  /b  /e
Next =  -1   0   0   0   1   2

A = /a /b /c /* /b /c /* /b /e
[Figure: S aligned against A at successive positions]
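The Next Table values look like a KMP-style failure table in which `*` overlaps any step. The sketch below computes such a table directly from that definition, as a brute-force check rather than an efficient implementation; whether the authors build their table this way is an assumption, and the names `step_overlap`, `next_table`, `S` and `NEXT` are hypothetical.

```python
def step_overlap(x, y):
    """Overlap rule from the table: '*' overlaps anything, names must be equal."""
    return x == "*" or y == "*" or x == y


def next_table(pattern):
    """KMP-style next table with wildcard-aware comparison.

    next[0] is -1. For i > 0, next[i] is the largest k < i such that the
    first k steps of the pattern overlap, step by step, the k steps that
    end just before position i.
    """
    table = [-1]
    for i in range(1, len(pattern)):
        best = 0
        for k in range(i - 1, 0, -1):  # try the longest candidate first
            if all(step_overlap(pattern[j], pattern[i - k + j]) for j in range(k)):
                best = k
                break
        table.append(best)
    return table


S = "/a/b/c/*/b/e".strip("/").split("/")
NEXT = next_table(S)
print(NEXT)  # [-1, 0, 0, 0, 1, 2], the values shown for S on the slide
```

The table tells the matcher how far S can be shifted along a longer advertisement, such as A above, after a mismatch.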
Subscription Tree
Subscriptions are maintained in a hierarchical tree
A child may have more than one parent
Siblings may intersect
If a publication does not match a node, it does not match any of the descendants
[Figure: example subscription tree under ROOT, with XPEs as nodes and a pointer linking a node to an additional parent]
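The pruning property can be sketched as follows: matching starts at the children of ROOT and descends only below nodes that match. The tree shape, the `Node` class, and the simple matcher (absolute XPEs without `//`, compared step by step against a publication path) are hypothetical and chosen for illustration.

```python
class Node:
    def __init__(self, xpe):
        self.xpe = xpe
        self.children = []  # a child may be shared by several parents


def matches(xpe, path):
    """Hypothetical matcher: each XPE step equals the path step or is '*'."""
    x = xpe.strip("/").split("/")
    p = path.strip("/").split("/")
    return len(x) <= len(p) and all(a in ("*", b) for a, b in zip(x, p))


def matching_subscriptions(root, path):
    """Collect matching XPEs, skipping every subtree whose top node fails."""
    found, seen, stack = [], set(), list(root.children)
    while stack:
        node = stack.pop()
        if id(node) in seen:  # shared child already handled via another parent
            continue
        seen.add(id(node))
        if matches(node.xpe, path):
            found.append(node.xpe)
            stack.extend(node.children)
        # No match: descendants are covered by this node, so they are skipped.
    return sorted(found)


# Hypothetical tree: /a/b is covered by both /a and /*/b.
ROOT = Node("ROOT")
a, star_b, ab, abd = Node("/a"), Node("/*/b"), Node("/a/b"), Node("/a/b/d")
ROOT.children = [a, star_b]
a.children = [ab]
star_b.children = [ab]
ab.children = [abd]

print(matching_subscriptions(ROOT, "/c/b/d"))
```

For the path `/c/b/d`, the node `/a` fails at once, so `/a/b/d` is never tested through it.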
Covering Algorithms
Similar to Adv-Sub overlapping algorithms
  Absolute simple XPEs
  Relative simple XPEs
  XPEs with // operator

S1    S2    Cover
*     *     Y
*     t     Y
t     *     N
t     t     Y
t1    t2    N

e.g.,
S1 = /* /a //e /c
S2 = /a /a /* //c /e /c /d
[Figure: S1 aligned against S2]
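A minimal sketch of the covering test for the simplest case, absolute XPEs without `//`. The step rule follows the table (S1 covers S2 at a position when S1 has `*` or both name the same element); the length rule, under which a shorter S1 can cover a longer S2, is an assumption, and the function names are hypothetical. The `//` example above is not handled.

```python
def step_covers(s1_step, s2_step):
    """Rule from the table: '*' covers anything; a name covers only itself."""
    return s1_step == "*" or s1_step == s2_step


def covers(s1, s2):
    """True when absolute simple XPE s1 covers absolute simple XPE s2.

    Assumption: s1 may be shorter than s2 (it then constrains only the
    leading steps), but never longer.
    """
    a = s1.strip("/").split("/")
    b = s2.strip("/").split("/")
    return len(a) <= len(b) and all(step_covers(x, y) for x, y in zip(a, b))


print(covers("/a/*", "/a/b"), covers("/a/b", "/a/*"))
```

Unlike overlap, covering is not symmetric: the `t` versus `*` row is N here and Y in the overlap table.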
Merging Rules
Rules
  XPEs with one difference (e.g., element, op)
    e.g., S1 = /a/*/c/d, S2 = /a/*/c/e, S = /a/*/c/*
  XPEs with different sub-XPEs
    e.g., S1 = … XPE1 …, S2 = … XPE2 …, S = … // …
Merge degree
  [Figure: P(S1), P(S2) and P(S)]
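The first rule can be sketched for the element case: two XPEs of equal length that differ in exactly one step are merged by replacing that step with `*`. The function name is hypothetical, and differences in the operator (`/` versus `//`) are not handled here.

```python
def merge_one_difference(s1, s2):
    """Merge two absolute XPEs that differ in exactly one element.

    Returns the merged XPE, or None when the rule does not apply.
    """
    a = s1.strip("/").split("/")
    b = s2.strip("/").split("/")
    if len(a) != len(b):
        return None
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diff) != 1:
        return None
    merged = list(a)
    merged[diff[0]] = "*"  # generalize the single differing step
    return "/" + "/".join(merged)


print(merge_one_difference("/a/*/c/d", "/a/*/c/e"))  # /a/*/c/*
```

The merged S can match publications that neither S1 nor S2 matches (for instance a path ending in an element other than d or e), which is where false positives come from.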
Evaluation
Setup
  Implemented in C++
  Overlay with 127 content-based routers
  Cluster (each node: 1.86 GHz, 4G) vs. PlanetLab
  Workloads are generated from two DTDs: NITF and PSD
Metrics
  Number of subscriptions per router
  Network traffic
  XPE processing time
  Notification delay
Routing Table Size
[Figure: routing table size (# of XPath queries, 0 to 100,000) vs. number of XPath queries (0 to 100,000). Series: No Covering (Set A and B); 50% Covering (Set A); 90% Covering (Set B)]
Routing Table Size
[Figure: routing table size (0 to 50,000) vs. number of subscriptions (0 to 100,000). Series: Covering (Set B); Perfect Merging (Set B); Imperfect Merging (Set B)]
Network Traffic
Method                  Network Traffic   Delay (ms)
No-Adv-No-Cov           654,871           97.82
No-Adv-With-Cov         572,890           20.74
With-Adv-No-Cov         398,810           98.09
With-Adv-With-Cov       326,796           20.89
With-Adv-With-CovPM     254,900           16.78
With-Adv-With-CovIPM    257,567           12.24
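To read the table as relative savings, the short script below computes the percentage reduction of each method against the No-Adv-No-Cov baseline. The numbers are copied from the table; the variable and function names are hypothetical.

```python
# (network traffic, delay in ms) per method, copied from the table above.
RESULTS = {
    "No-Adv-No-Cov": (654871, 97.82),
    "No-Adv-With-Cov": (572890, 20.74),
    "With-Adv-No-Cov": (398810, 98.09),
    "With-Adv-With-Cov": (326796, 20.89),
    "With-Adv-With-CovPM": (254900, 16.78),
    "With-Adv-With-CovIPM": (257567, 12.24),
}
BASELINE = "No-Adv-No-Cov"


def reduction(before, after):
    """Percentage by which 'after' is lower than 'before'."""
    return 100.0 * (before - after) / before


base_traffic, base_delay = RESULTS[BASELINE]
TRAFFIC_CUT = {m: reduction(base_traffic, t) for m, (t, _) in RESULTS.items()}
DELAY_CUT = {m: reduction(base_delay, d) for m, (_, d) in RESULTS.items()}

for method in RESULTS:
    print(f"{method:22s} traffic -{TRAFFIC_CUT[method]:5.1f}%  delay -{DELAY_CUT[method]:5.1f}%")
```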
Related Work
Locating data sources in large distributed systems [Galanis et al. 2003]
  DHT based approach
  Data summary
Query aggregation for scalable data dissemination [Chan et al. 2002]
  Equivalence between the original query set and the aggregated set
ONYX [Diao et al. 2004]
  Deliver part of the XML documents
  Share common prefixes among queries using NFA
XTreeNet [Fenner et al. 2005]
  Unify the pub/sub model and the query/response model
  Avoid repeatedly matching at each hop
Conclusions
  Investigate advertisement-based routing for XML data dissemination networks
  Propose a novel data structure to maintain covering & merging relationships among XPEs
  Perform experimental evaluation on a 127 broker overlay to demonstrate the approach
    Reduce routing table by up to 90%
    Improve routing latency by roughly 85%
Future work
  Extend to tree patterns
  Share common prefixes among XPEs in overlapping and covering algorithms
Q & A
Contact
Middleware Systems Research Group, University of Toronto
www.msrg.eecg.toronto.edu
Process Time
[Figure: time (ms, 0 to 140) vs. number of subscriptions (500 to 5000)]
Notification Delay (PSD)
[Figure: notification delay (ms, 0 to 16) vs. number of hops (2 to 6)]
False Positives
[Figure: false positives (%, 0 to 8) vs. imperfect degree (0 to 0.2)]
Conclusions
  Investigate advertisement-based routing for XML data dissemination networks
  Present algorithms to determine the covering relations among arbitrary XPEs
  Propose a novel data structure to maintain covering & merging relationships among XPEs
  Explore rules to merge similar XPEs in order to further reduce the routing table size
  Perform experimental evaluation on a 127 broker overlay to demonstrate the approach
    Reduce routing table by up to 90%
    Improve routing latency by roughly 85%