VLDB2005 CMS-ToPSS: Efficient Dissemination of RSS Documents Milenko Petrovic Haifeng Liu Hans-Arno Jacobsen University of Toronto

VLDB2005

CMS-ToPSS: Efficient Dissemination of RSS Documents

Milenko Petrovic, Haifeng Liu, Hans-Arno Jacobsen

University of Toronto

VLDB05 2

Information Dissemination

Easy-to-use web publishing tools (blogs, wikis) are fueling the increase in the number of web publishers

RSS is frequently used to disseminate updates to interested users, e.g., CNN.com, Yahoo! News, Amazon.com, MSN Search (beta)

RSS aggregator

RSS readers

RSS publishers

Problem: polling-based architecture

VLDB05 3

Solution!

Current RSS dissemination architecture

G-ToPSS RSS dissemination architecture

VLDB05 4

Interaction Model: Publish/Subscribe

Broker

Publishers

Subscribers

RSS feeds

Matching RSS feeds

Queries over all RSS
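The interaction model on this slide can be sketched in a few lines. This is a minimal illustration only, not the CMS-ToPSS implementation: the `Broker` class, its method names, the in-memory `inbox`, and the feed items are all assumptions made for the example, and queries are plain Python predicates rather than the graph queries the system uses.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Item = Dict[str, str]                 # one RSS item, reduced to a flat dict (assumption)
Predicate = Callable[[Item], bool]    # a subscriber's query over all RSS items


class Broker:
    """Hypothetical broker: stores queries and pushes matching items to subscribers."""

    def __init__(self) -> None:
        self._queries: Dict[str, List[Predicate]] = defaultdict(list)
        self.inbox: Dict[str, List[Item]] = defaultdict(list)

    def subscribe(self, subscriber: str, query: Predicate) -> None:
        # Subscribers register a query once instead of polling every feed.
        self._queries[subscriber].append(query)

    def publish(self, item: Item) -> None:
        # Each published item is matched against all stored queries.
        for subscriber, queries in self._queries.items():
            if any(query(item) for query in queries):
                self.inbox[subscriber].append(item)


broker = Broker()
broker.subscribe("alice", lambda it: "database" in it["title"].lower())
broker.subscribe("bob", lambda it: it["source"] == "example-news")

broker.publish({"source": "example-blog", "title": "Database filtering at scale"})
broker.publish({"source": "example-news", "title": "Weather update"})
```

The point of the sketch is the direction of flow: publishers push items to the broker, and only matching items reach each subscriber, so no subscriber has to poll the publishers.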

VLDB05 5

Research challenges

1. Need a subscription (query) language suitable for filtering RSS documents

2. Need an efficient matching algorithm based on a graph representation
• Structural matching
• Constraint matching

3. Need scalability to a large number of subscriptions and a high publishing rate

VLDB05 6

CMS-ToPSS System Architecture

VLDB05 7

Subscription Scalability

[Figure "vary number of subs": matching time (ms) vs. number of subscriptions, varied from 5,000 to 100,000; vertical axis from 0 to 160]

VLDB05 8

Memory Scalability

[Figure "memory vs. #subs": memory size (M) vs. number of subscriptions, varied from 5,000 to 100,000; vertical axis from 0 to 800]

VLDB05 9

Matching Semantics

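[Figure: example publication graph and subscription graph; node and edge labels listed below]

Publication
• Node: PAPER17
• AUTHOR: “Arno Jacobsen”
• CONFERENCE: SIGMOD
• LOCATION: “California”
• YEAR: “2001”

Subscription
• Node: ?y (?y <= Publication)
• AUTHOR: “Arno Jacobsen”
• CONFERENCE: SIGMOD
• YEAR: ?z (?z > 2000)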

VLDB05 10

Data Model (RSS Documents)

Publications are represented as directed graphs with node and edge labels

Node labels are typed
• Literal value
• Class

Edge labels are typed
• Class

Classes can be related using a multiple-inheritance ontology
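A small sketch may make the data model concrete. Everything named here is an assumption for illustration: the `Ontology`, `Node`, and `PublicationGraph` types, the `LITERAL` marker, and the classes `Paper`, `PeerReviewed`, and `ConferencePaper` do not come from the slides; only `PAPER17`, `AUTHOR`, `YEAR`, and `Publication` appear in the deck's example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

LITERAL = "literal"  # marker type for literal-valued nodes (assumption)


@dataclass
class Ontology:
    """Class hierarchy in which a class may have several direct superclasses."""
    supers: Dict[str, Set[str]] = field(default_factory=dict)

    def add_class(self, name: str, *parents: str) -> None:
        self.supers.setdefault(name, set()).update(parents)
        for parent in parents:
            self.supers.setdefault(parent, set())

    def is_subclass(self, cls: str, ancestor: str) -> bool:
        """True if cls is ancestor or inherits from it, directly or transitively."""
        seen: Set[str] = set()
        stack = [cls]
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.supers.get(current, ()))
        return False


@dataclass(frozen=True)
class Node:
    label: str       # e.g. "PAPER17" or a literal value
    node_type: str   # LITERAL, or the name of a class in the ontology


@dataclass
class PublicationGraph:
    """Directed graph stored as (source node, edge class, target node) triples."""
    edges: List[Tuple[Node, str, Node]] = field(default_factory=list)

    def add_edge(self, src: Node, edge_class: str, dst: Node) -> None:
        self.edges.append((src, edge_class, dst))


onto = Ontology()
onto.add_class("Publication")
onto.add_class("Paper", "Publication")
onto.add_class("PeerReviewed")
onto.add_class("ConferencePaper", "Paper", "PeerReviewed")  # multiple inheritance

paper = Node("PAPER17", "ConferencePaper")
graph = PublicationGraph()
graph.add_edge(paper, "AUTHOR", Node("Arno Jacobsen", LITERAL))
graph.add_edge(paper, "YEAR", Node("2001", LITERAL))
```

With this layout, `onto.is_subclass("ConferencePaper", "Publication")` holds through `Paper`, and `onto.is_subclass("ConferencePaper", "PeerReviewed")` holds through the second parent, which is the property a class constraint in a query would rely on.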

VLDB05 11

Query Language (GQL)

Queries are represented as directed graph patterns with node and edge labels

Node labels are variables

Variables can be constrained by
• Classes
• Class instances and literal values

Edge labels are class instances

Mapping (matching) semantics
• Pattern graph maps to data graph if the topology (structure) of the two graphs matches and all variable constraints are satisfied
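The mapping semantics can be illustrated with a naive backtracking matcher. This is a sketch under stated assumptions, not the G-ToPSS matching algorithm: graphs are plain lists of triples, variables are strings starting with `?`, constraints are Python predicates, the `SUPERS` and `INSTANCE_OF` tables are hypothetical, the year is stored as an integer so that `?z > 2000` can be evaluated, and the `LOCATION` edge is assumed to start at `SIGMOD`. The matcher also allows two variables to bind to the same node.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[object, str, object]   # (source node, edge label, target node)
Binding = Dict[str, object]           # variable name -> data node
Constraints = Dict[str, Callable[[object], bool]]

# Hypothetical ontology (class -> direct superclasses) and instance typing.
SUPERS = {"Paper": {"Publication"}, "Publication": set()}
INSTANCE_OF = {"PAPER17": "Paper"}


def is_a(node: object, cls: str) -> bool:
    """True if node is an instance of cls or of one of its descendants."""
    start = INSTANCE_OF.get(node) if isinstance(node, str) else None
    stack, seen = ([start] if start else []), set()
    while stack:
        current = stack.pop()
        if current == cls:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERS.get(current, ()))
    return False


def bind(term: object, value: object, binding: Binding,
         constraints: Constraints) -> Optional[Binding]:
    """Map one pattern term onto one data node, or return None on failure."""
    if isinstance(term, str) and term.startswith("?"):    # variable
        if term in binding:
            return binding if binding[term] == value else None
        check = constraints.get(term)
        if check is not None and not check(value):
            return None
        return {**binding, term: value}
    return binding if term == value else None             # constant label


def match(pattern: List[Triple], data: List[Triple], constraints: Constraints,
          binding: Optional[Binding] = None) -> Optional[Binding]:
    """Return variable bindings if every pattern edge maps onto a data edge."""
    binding = {} if binding is None else binding
    if not pattern:
        return binding
    (p_src, p_edge, p_dst), rest = pattern[0], pattern[1:]
    for d_src, d_edge, d_dst in data:
        if d_edge != p_edge:
            continue
        b = bind(p_src, d_src, binding, constraints)
        if b is not None:
            b = bind(p_dst, d_dst, b, constraints)
        if b is not None:
            result = match(rest, data, constraints, b)
            if result is not None:
                return result
    return None


publication = [
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("SIGMOD", "LOCATION", "California"),   # source node is an assumption
    ("PAPER17", "YEAR", 2001),
]
subscription = [
    ("?y", "AUTHOR", "Arno Jacobsen"),
    ("?y", "CONFERENCE", "SIGMOD"),
    ("?y", "YEAR", "?z"),
]
constraints = {
    "?y": lambda n: is_a(n, "Publication"),           # ?y <= Publication
    "?z": lambda n: isinstance(n, int) and n > 2000,  # ?z > 2000
}

result = match(subscription, publication, constraints)
```

Here `result` binds `?y` to `PAPER17` and `?z` to `2001`: the three pattern edges line up with three publication edges, and both constraints hold. The extra `LOCATION` edge does not prevent the match, because the pattern only has to map into the data graph, not cover it.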

VLDB05 12

Conclusion and Future Work

Proposed a prototype for graph-based metadata filtering

G-ToPSS supports a high matching rate for an expressive subscription language

Extend G-ToPSS with full RDF language features

Optimize constraint processing during matching