
VLDB2005
CMS-ToPSS: Efficient Dissemination of RSS Documents
Milenko Petrovic, Haifeng Liu, Hans-Arno Jacobsen
University of Toronto

VLDB05 2
Information Dissemination
Easy-to-use web publishing tools (blogs, wikis) are fueling an increase in the number of web publishers
RSS is frequently used to disseminate updates to interested users, e.g., by CNN.com, Yahoo! News, Amazon.com, and MSN Search (beta)
[Figure: RSS publishers, RSS aggregator, and RSS readers]
Problem: polling-based architecture
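A back-of-the-envelope sketch of why polling is costly. The numbers (10,000 readers, 50 feeds each, a 30-minute poll interval, 4 updates per feed per day) are hypothetical and not from the slides, as is the function name `daily_load`. Under polling, every reader re-fetches every feed on a timer whether or not anything changed; a push-based design sends one message per actual update.

```python
from typing import Tuple

MINUTES_PER_DAY = 24 * 60

def daily_load(readers: int, feeds_per_reader: int,
               poll_interval_min: int, updates_per_feed_per_day: int) -> Tuple[int, int]:
    """Return (poll requests per day, push notifications per day).

    Hypothetical model: each reader polls each of its feeds once per interval;
    a push design sends one notification per feed update per subscribed reader.
    """
    polls = readers * feeds_per_reader * (MINUTES_PER_DAY // poll_interval_min)
    pushes = readers * feeds_per_reader * updates_per_feed_per_day
    return polls, pushes

polls, pushes = daily_load(10_000, 50, 30, 4)
print(f"polls/day = {polls:,}, pushes/day = {pushes:,}")
# prints: polls/day = 24,000,000, pushes/day = 2,000,000
```

Under these assumptions at most 4 of the 48 daily polls per feed can return new content, so over 90% of poll requests are wasted, and the gap widens as readers shorten their poll interval to see updates sooner.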

Solution!
Current RSS dissemination architecture
G-ToPSS RSS dissemination architecture

Interaction Model: Publish/Subscribe
[Figure: publishers send RSS feeds to a broker; subscribers register queries over all RSS feeds with the broker and receive the matching RSS feeds]
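A minimal sketch of this interaction model, assuming RSS items are flat dictionaries and subscriptions are arbitrary Python predicates (a stand-in for the graph queries described later). The `Broker` class, its methods, and the sample items are hypothetical, not the G-ToPSS API. Publishers never contact subscribers directly: the broker matches each incoming item against the stored queries and forwards it only to matching subscribers.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

RssItem = Dict[str, str]            # hypothetical flat RSS item
Query = Callable[[RssItem], bool]   # a subscription: a predicate over items

class Broker:
    """Toy publish/subscribe broker: publishers push items, the broker forwards matches."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[str, Query]] = []
        self.inboxes: Dict[str, List[RssItem]] = defaultdict(list)

    def subscribe(self, subscriber: str, query: Query) -> None:
        self.subscriptions.append((subscriber, query))

    def publish(self, item: RssItem) -> None:
        # Naive matching: every subscription is evaluated against every item.
        for subscriber, query in self.subscriptions:
            if query(item):
                self.inboxes[subscriber].append(item)

broker = Broker()
broker.subscribe("alice", lambda item: "RSS" in item["title"])
broker.subscribe("bob", lambda item: item["source"] == "CNN.com")
broker.publish({"source": "CNN.com", "title": "Example headline"})
broker.publish({"source": "Yahoo! News", "title": "Example story about RSS"})
```

The linear scan in `publish` is exactly the step that has to scale to a large number of subscriptions and a high publishing rate, which is one of the research challenges listed on the next slide.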

Research challenges
1. Need a subscription (query) language suitable for filtering RSS documents
2. Need an efficient matching algorithm based on a graph representation
   • Structural matching
   • Constraint matching
3. Need scalability to a large number of subscriptions and a high publishing rate

CMS-ToPSS System Architecture

Subscription Scalability
[Figure: "vary number of subs": matching time (ms, axis 0 to 160) vs. number of subscriptions (5,000 to 100,000)]

Memory Scalability
[Figure: "memory vs. #subs": memory size (MB, axis 0 to 800) vs. number of subscriptions (5,000 to 100,000)]

Matching Semantics
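[Figure: example publication and subscription graphs]
Publication: node PAPER17 with edges AUTHOR -> "Arno Jacobsen", CONFERENCE -> SIGMOD, YEAR -> "2001"; a LOCATION edge leads to "California"
Subscription: variable node ?y (?y <= Publication) with edges AUTHOR -> "Arno Jacobsen", CONFERENCE -> SIGMOD, YEAR -> ?z (?z > 2000)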
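One way to read the example as a worked match, assuming the AUTHOR, CONFERENCE, and YEAR edges all leave PAPER17 and that PAPER17 is an instance of Publication. The mapping $h$ from subscription variables to publication nodes and the edge set $E_P$ of the publication graph are our notation, not the slides'. Every subscription edge lands on a publication edge and both variable constraints hold; the extra LOCATION edge does not prevent the match.

```latex
\begin{aligned}
h(?y) &= \texttt{PAPER17}, \qquad h(?z) = \texttt{2001} \\
(?y,\ \texttt{AUTHOR},\ \texttt{Arno Jacobsen}) &\mapsto (\texttt{PAPER17},\ \texttt{AUTHOR},\ \texttt{Arno Jacobsen}) \in E_P \\
(?y,\ \texttt{CONFERENCE},\ \texttt{SIGMOD}) &\mapsto (\texttt{PAPER17},\ \texttt{CONFERENCE},\ \texttt{SIGMOD}) \in E_P \\
(?y,\ \texttt{YEAR},\ ?z) &\mapsto (\texttt{PAPER17},\ \texttt{YEAR},\ \texttt{2001}) \in E_P \\
\texttt{PAPER17} &\le \texttt{Publication}, \qquad 2001 > 2000
\end{aligned}
```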

Data Model (RSS Documents)
• Publications are represented as directed graphs with node and edge labels
• Node labels are typed:
   • Literal value
   • Class
• Edge labels are typed: Class
• Classes can be related using a multiple-inheritance ontology
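A sketch of this data model under stated assumptions: a node label is either a `Literal` or a `ClassRef` naming a class, edge labels are `ClassRef`s, and the ontology stores several parent classes per class so that subclass tests follow multiple-inheritance links. All identifiers, including the classes `ConferencePaper` and `PeerReviewedWork`, are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Literal:
    """Node label holding a literal value, e.g. "2001"."""
    value: str

@dataclass(frozen=True)
class ClassRef:
    """Node or edge label that names a class."""
    name: str

class Ontology:
    """Class hierarchy with multiple inheritance: a class may have several parents."""

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = {}

    def add_class(self, name: str, *parents: str) -> None:
        self.parents.setdefault(name, set()).update(parents)
        for p in parents:
            self.parents.setdefault(p, set())

    def is_subclass(self, sub: str, sup: str) -> bool:
        """True if sub == sup or sup is an ancestor of sub (DFS over parent links)."""
        stack, seen = [sub], set()
        while stack:
            cls = stack.pop()
            if cls == sup:
                return True
            if cls not in seen:
                seen.add(cls)
                stack.extend(self.parents.get(cls, ()))
        return False

@dataclass
class Document:
    """A publication: a set of (source node, edge label, target node) triples."""
    edges: set[tuple[ClassRef, ClassRef, ClassRef | Literal]] = field(default_factory=set)

    def add_edge(self, src: ClassRef, label: ClassRef, dst: ClassRef | Literal) -> None:
        self.edges.add((src, label, dst))

onto = Ontology()
onto.add_class("ConferencePaper", "Publication", "PeerReviewedWork")  # two parents

doc = Document()
doc.add_edge(ClassRef("PAPER17"), ClassRef("YEAR"), Literal("2001"))
doc.add_edge(ClassRef("PAPER17"), ClassRef("CONFERENCE"), ClassRef("SIGMOD"))
```

Storing a set of parents per class, rather than a single parent pointer, is what lets `ConferencePaper` be both a `Publication` and a `PeerReviewedWork`; the subclass test therefore has to search a DAG instead of walking one chain.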

Query Language (GQL)
• Queries are represented as directed graph patterns with node and edge labels
• Node labels are variables
• Variables can be constrained by:
   • Classes
   • Class instances and literal values
• Edge labels are class instances
Mapping (matching) semantics:
• A pattern graph maps to a data graph if the topology (structure) of the two graphs matches and all variable constraints are satisfied
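The mapping semantics can be sketched as a backtracking search over the Matching Semantics example. Assumptions, none taken from the slides: every pattern node is a variable (constant nodes such as SIGMOD become variables constrained to that value, as the slide allows), constraints are Python predicates, the class test for `?y` uses a hypothetical `INSTANCE_OF` table without subclass reasoning, the mapping need not be one-to-one, and the source of the LOCATION edge is a guess. The function `match` is ours and is not the G-ToPSS matching algorithm.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]          # (source node, edge label, target node)
Constraint = Callable[[str], bool]   # predicate on the data node bound to a variable

def match(pattern: List[Edge], constraints: Dict[str, Constraint],
          data: Set[Edge]) -> Optional[Dict[str, str]]:
    """Backtracking search for a mapping h from pattern variables to data nodes.

    Every pattern edge (u, l, v) must map to a data edge (h(u), l, h(v)),
    and every bound variable must satisfy its constraint, if it has one.
    """
    def extend(i: int, h: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(pattern):
            return h                               # all pattern edges mapped
        u, label, v = pattern[i]
        for s, l, t in sorted(data):
            if l != label:
                continue                           # edge labels must agree
            new_h = dict(h)
            ok = True
            for var, node in ((u, s), (v, t)):
                if new_h.get(var, node) != node:   # var already bound elsewhere
                    ok = False
                    break
                new_h[var] = node
                if var in constraints and not constraints[var](node):
                    ok = False                     # constraint violated
                    break
            if ok:
                result = extend(i + 1, new_h)
                if result is not None:
                    return result
        return None                                # no data edge fits: backtrack

    return extend(0, {})

INSTANCE_OF = {"PAPER17": "Publication"}           # hypothetical type table
publication = {
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("SIGMOD", "LOCATION", "California"),          # extra edge; source is assumed
    ("PAPER17", "YEAR", "2001"),
}
subscription = [("?y", "AUTHOR", "?a"), ("?y", "CONFERENCE", "?c"), ("?y", "YEAR", "?z")]
constraints = {
    "?y": lambda n: INSTANCE_OF.get(n) == "Publication",   # ?y <= Publication
    "?a": lambda n: n == "Arno Jacobsen",
    "?c": lambda n: n == "SIGMOD",
    "?z": lambda n: n.isdigit() and int(n) > 2000,          # ?z > 2000
}
binding = match(subscription, constraints, publication)
```

Because the search may try every data edge for every pattern edge, its cost grows quickly with graph size and subscription count; it serves only as a reference for what "maps to" means, not as an efficient matcher.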

Conclusion and Future Work
Proposed a prototype for graph-based metadata filtering
G-ToPSS supports a high matching rate for an expressive subscription language
Extend G-ToPSS with full RDF language features
Optimize constraint processing during matching