
Department of Informatics
On web stream processing
Daniele Dell’Aglio
http://dellaglio.org @dandellaglio
Linköping, 22.11.2017

RDF Stream Processing
Stream Processing: real-time processing of highly dynamic data
RDF & SPARQL: Semantic Web technologies for data exchange through the Web
Stream Processing + RDF & SPARQL = RDF Stream Processing (RSP)

Finding agreements
Many topics
– RDF streams
– Stream reasoning
– Complex event processing
– Stream query processing
– Internet/web of things
Many studies
– Data models
– Query models
– Prototypes
– Benchmarks
– Datasets
W3C RSP community group (2013 – 2016)
– Effort to discuss, formalise, standardise, combine and evangelise the existing studies on RSP
– Outcomes
  • Abstract model for RDF streams
  • Requirements document for query languages for RDF streams
  • More at: https://www.w3.org/community/rsp/

But...
The W3C RSP community group set some foundations and requirements, but:
– Standard protocols and exchange mechanisms for RDF streams are still missing
– We need generic and flexible solutions for making RDF streams available and exchangeable on the Web

The goal: a decentralized web of RSPs
[Figure: RSP engines on the web: MorphStreams, C-SPARQL (two instances), TrOWL, StreamRule, CQELS, Instans]
Q1: How can we let RSP engines interact and exchange streams on the web?

The goal: a decentralized web of RSPs
[Figure: RSP engines (MorphStreams, C-SPARQL, TrOWL, StreamRule, CQELS, Instans) together with two remote SPARQL endpoints]
Q2: How can we integrate stream processing with background knowledge exposed remotely on the web?

EXCHANGING STREAMS ON THE WEB

How far are we?
Documents from the W3C RSP community group
– Abstract model of RDF streams
– Requirements for query languages for RDF streams
Protocols to exchange data streams on the web and on the internet
– WebSocket, MQTT
Description of the stream
– SSN
Interfaces to control RSP engines

Requirements
A framework for RDF stream exchange should
1. prioritize active paradigms for data stream exchange
2. enable the combination of streaming and stored data
3. enable building reliable, distributed and scalable streaming applications
4. guarantee a wide range of operations over the streams
5. support the publication of information about the stream
6. support the exchange of a wide variety of streams
7. reuse existing protocols and standards as much as possible

WeSP
A framework to publish and exchange RDF streams on the web
• A model to serialise RDF streams
• A model to describe RDF streams
• A communication protocol

A model to serialise RDF streams
An RDF stream can be represented as an (infinite) ordered sequence of time-annotated data items (RDF graphs)…
... serialised in JSON-LD
[{ "@graph": [
    { "@id": "http://.../G1",
      "@graph": { "@id": "http://.../a", "http://.../isIn": {"@id": "http://.../rRoom"} } },
    { "@id": "http://.../G1", "prov:generatedAtTime": "2016-12-16T00:01:00" }
 ]},
 { "@graph": [
    { "@id": "http://.../G2",
      "@graph": { "@id": "http://.../b", "http://.../isIn": {"@id": "http://.../bRoom"} } },
    { "@id": "http://.../G2", "prov:generatedAtTime": "2016-12-16T00:03:00" }
 ]},
 …
The serialisation is compliant with RDF, as well as with the W3C RSP abstract data model.
[Figure: RDF stream S on a time line t: G1 = {:a :isIn :rRoom} at t = 1, G2 = {:b :isIn :bRoom} at t = 3, G3 = {:c :talksIn :rRoom, :d :talksIn :bRoom} at t = 5]
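A minimal Python sketch of how a producer could build this kind of serialisation, assuming each data item is a list of (subject, predicate, object) IRI triples and each timestamp is an xsd:dateTime string in one fixed format. The helper names `serialize_item` and `serialize_stream` are hypothetical, and the timestamp is attached with the PROV-O property prov:generatedAtTime; this illustrates the data model, it is not the TripleWave code.

```python
import json
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject IRI, predicate IRI, object IRI)

PROV_CONTEXT = {"prov": "http://www.w3.org/ns/prov#"}

def serialize_item(graph_iri: str, triples: List[Triple], generated_at: str) -> dict:
    """One stream item: a named graph holding the triples, plus its time annotation.

    The data lives inside the named graph; the timestamp is a statement about the
    graph IRI made outside it (hypothetical helper, not an official API).
    """
    nodes = [{"@id": s, p: {"@id": o}} for (s, p, o) in triples]
    return {
        "@context": PROV_CONTEXT,
        "@graph": [
            {"@id": graph_iri, "@graph": nodes},
            {"@id": graph_iri, "prov:generatedAtTime": generated_at},
        ],
    }

def serialize_stream(items: List[Tuple[str, List[Triple], str]]) -> str:
    """Serialise a finite prefix of the stream as a JSON array ordered by timestamp."""
    # ISO 8601 strings written in one format and time zone sort chronologically.
    ordered = sorted(items, key=lambda item: item[2])
    return json.dumps([serialize_item(*item) for item in ordered], indent=2)
```

Sorting by timestamp before serialising keeps the array consistent with the view of a stream as an ordered sequence of time-annotated graphs.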

A model to describe RDF streams
Every RDF stream should come with a description that includes:
• The identifier of the stream
• A description of the schema of the stream items
• Samples of data items
• The location of the stream endpoint (e.g. a WebSocket URL)
This description is provided through the RDF Stream Descriptor, which is
• Serialised in RDF
• An extension of DCAT and of the SPARQL Service Description
• Published according to the Linked Data principles
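The sketch below builds a stream descriptor as a JSON-LD object in Python. Only dcat:Dataset, dcat:Distribution, dcat:distribution, dcat:accessURL and dcterms:title are real DCAT / Dublin Core terms; the `ex:` properties for the item schema and the samples, and the function name `stream_descriptor`, are hypothetical placeholders, since the actual RDF Stream Descriptor vocabulary is not shown here.

```python
from typing import List

def stream_descriptor(stream_iri: str, title: str, endpoint_url: str,
                      schema_iri: str, samples: List[dict]) -> dict:
    """Describe an RDF stream as a DCAT dataset whose distribution points to the stream endpoint."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
            "ex": "http://example.org/vocab#",  # hypothetical namespace
        },
        "@id": stream_iri,                       # identifier of the stream
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        "ex:itemSchema": {"@id": schema_iri},    # hypothetical: schema of the stream items
        "ex:sample": samples,                    # hypothetical: sample data items
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": endpoint_url},  # location of the stream endpoint
        },
    }
```

Serving this object as JSON-LD over HTTP at the stream IRI would be one way to follow the Linked Data principles mentioned above.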

A communication protocol
Two interfaces
• Producer
• Consumer
We distinguish three types of actors, depending on the interfaces they implement:
• Stream source: Producer
• Stream transformer: Producer and Consumer
• Stream sink: Consumer
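One way to read the two interfaces and the three actor types is as Python abstract classes, sketched below. The method names (`descriptor`, `subscribe`, `on_item`, `emit`) are hypothetical; the only part taken from the slide is the role assignment: a source implements Producer, a sink implements Consumer, and a transformer implements both.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Item = Dict  # one stream item, e.g. a JSON-LD object

class Producer(ABC):
    """Producer interface: exposes a descriptor and delivers items to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Item], None]] = []

    def subscribe(self, callback: Callable[[Item], None]) -> None:
        self._subscribers.append(callback)

    def _publish(self, item: Item) -> None:
        for callback in self._subscribers:
            callback(item)

    @abstractmethod
    def descriptor(self) -> Dict:
        """Return the stream descriptor of the produced stream."""

class Consumer(ABC):
    """Consumer interface: receives stream items."""

    @abstractmethod
    def on_item(self, item: Item) -> None:
        """Handle one incoming stream item."""

class StreamSource(Producer):
    """Implements Producer only: injects new items."""

    def __init__(self, stream_iri: str) -> None:
        super().__init__()
        self.stream_iri = stream_iri

    def descriptor(self) -> Dict:
        return {"@id": self.stream_iri}

    def emit(self, item: Item) -> None:
        self._publish(item)

class StreamTransformer(Producer, Consumer):
    """Implements both interfaces: consumes a stream and produces a derived one."""

    def __init__(self, stream_iri: str, transform: Callable[[Item], Item]) -> None:
        super().__init__()
        self.stream_iri = stream_iri
        self.transform = transform

    def descriptor(self) -> Dict:
        return {"@id": self.stream_iri}

    def on_item(self, item: Item) -> None:
        self._publish(self.transform(item))

class StreamSink(Consumer):
    """Implements Consumer only: the last hop of a stream."""

    def __init__(self) -> None:
        self.received: List[Item] = []

    def on_item(self, item: Item) -> None:
        self.received.append(item)
```

Chaining a source, a transformer and a sink then amounts to subscribing each consumer's `on_item` to the upstream producer.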

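A communication protocol: push-based streams
Actors: a Producer, exposing a Stream Descriptor endpoint and an RDF stream endpoint, and a Consumer
1. The Consumer gets the stream descriptor (SD) from the Stream Descriptor endpoint.
2. The Producer returns the SD, and the Consumer processes it.
3. The Consumer subscribes to the stream at the RDF stream endpoint.
4. The Producer pushes stream items (stream item, stream item, …), and the Consumer processes the stream.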
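The push-based sequence can be simulated in-process with asyncio, as in this sketch. No network is involved: `get_descriptor` and `subscribe` are hypothetical stand-ins for the Stream Descriptor endpoint and the RDF stream endpoint (e.g. a WebSocket), and the `ex:endpoint` key is an assumed descriptor field.

```python
import asyncio
from typing import Dict, List

class PushProducer:
    """In-memory stand-in for a producer with a descriptor endpoint and a push stream endpoint."""

    def __init__(self, descriptor: Dict) -> None:
        self._descriptor = descriptor
        self._subscribers: List[asyncio.Queue] = []

    async def get_descriptor(self) -> Dict:
        # Stands in for an HTTP GET on the Stream Descriptor endpoint.
        return self._descriptor

    def subscribe(self) -> asyncio.Queue:
        # Stands in for opening, e.g., a WebSocket on the RDF stream endpoint.
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    async def run(self, items: List[Dict]) -> None:
        # Push-based: the producer decides when each item reaches every subscriber.
        for item in items:
            for queue in self._subscribers:
                await queue.put(item)
            await asyncio.sleep(0)  # a real source would wait here for new data

async def discover_and_subscribe(producer: PushProducer) -> asyncio.Queue:
    sd = await producer.get_descriptor()   # get and process the stream descriptor
    if "ex:endpoint" not in sd:            # hypothetical key naming the stream endpoint
        raise ValueError("the descriptor does not name a stream endpoint")
    return producer.subscribe()            # subscribe to the stream

async def main() -> List[Dict]:
    producer = PushProducer({"@id": "http://example.org/stream",
                             "ex:endpoint": "ws://example.org/stream"})
    queue = await discover_and_subscribe(producer)
    producer_task = asyncio.create_task(producer.run([{"@id": "G1"}, {"@id": "G2"}]))
    received = []
    for _ in range(2):                     # process items as they are pushed
        received.append(await queue.get())
    await producer_task
    return received
```

Running `asyncio.run(main())` delivers the two items in order; subscribing before the producer starts avoids losing early items.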

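A communication protocol: pull-based streams
Actors: a Producer, exposing a Stream Descriptor endpoint and an RDF stream endpoint, and a Consumer
1. The Consumer gets the stream descriptor (SD) from the Stream Descriptor endpoint.
2. The Producer returns the SD, and the Consumer processes it.
3. The Consumer repeatedly sends "GET items" requests to the RDF stream endpoint; each request returns a batch of stream items, and the Consumer processes the stream.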
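For the pull-based variant, the consumer can remember the time of the last item it has seen and ask only for newer items. The sketch assumes integer timestamps that strictly increase from item to item; `get_items(since)` is a hypothetical stand-in for the "GET items" request, and both classes are illustrative.

```python
from typing import Dict, List, Tuple

class PullProducer:
    """In-memory stand-in for a producer that serves stream items on request."""

    def __init__(self, descriptor: Dict) -> None:
        self._descriptor = descriptor
        self._items: List[Tuple[int, Dict]] = []  # (timestamp, item), in time order

    def get_descriptor(self) -> Dict:
        return self._descriptor

    def append(self, timestamp: int, item: Dict) -> None:
        self._items.append((timestamp, item))

    def get_items(self, since: int) -> List[Tuple[int, Dict]]:
        # Stands in for "GET items": every item newer than `since`.
        return [(t, item) for (t, item) in self._items if t > since]

class PullConsumer:
    def __init__(self, producer: PullProducer) -> None:
        self.producer = producer
        self.sd = producer.get_descriptor()  # get and process the SD once
        self.last_seen = -1                  # assumes non-negative timestamps
        self.received: List[Dict] = []

    def poll(self) -> int:
        """Issue one pull request; return how many new items were received."""
        batch = self.producer.get_items(self.last_seen)
        for t, item in batch:
            self.received.append(item)
            self.last_seen = max(self.last_seen, t)
        return len(batch)
```

The consumer, not the producer, sets the pace here: items appended between two polls arrive together in the next batch.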

Protocols
The RDF Stream Descriptor is accessible through HTTP
The transmission of the stream can happen through different protocols
• HTTP chunked encoding
• WebSocket
• Message Queuing Telemetry Transport (MQTT)
• Server-Sent Events (SSE)
• HTTP
• ...
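To make two of these transport options concrete, this sketch frames one serialised stream item as an HTTP/1.1 chunk (size in hexadecimal, CRLF, data, CRLF; a zero-size chunk ends the body) and as a Server-Sent Events message (one `data:` line per payload line, then a blank line). The helper names are hypothetical.

```python
import json

def http_chunk(payload: bytes) -> bytes:
    """Frame a payload as one HTTP/1.1 chunk: size in hex, CRLF, data, CRLF."""
    return b"%X\r\n" % len(payload) + payload + b"\r\n"

# A zero-size chunk (with no trailer fields) terminates a chunked body.
LAST_CHUNK = b"0\r\n\r\n"

def sse_message(payload: str) -> str:
    """Frame a payload as one Server-Sent Events message; a blank line ends the event."""
    return "".join(f"data: {line}\n" for line in payload.split("\n")) + "\n"

def frame_item(item: dict, protocol: str) -> bytes:
    """Serialise one stream item as compact JSON and frame it for the chosen transport."""
    text = json.dumps(item, separators=(",", ":"))
    if protocol == "chunked":
        return http_chunk(text.encode("utf-8"))
    if protocol == "sse":
        return sse_message(text).encode("utf-8")
    raise ValueError(f"unsupported protocol: {protocol}")
```

Both framings let a single long-lived HTTP response carry an unbounded sequence of items, which is why they suit streams.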

WeSP: proofs of concept
C-SPARQL
• Stream transformer
• WeSP implemented as a wrapper
• https://github.com/dellaglio/csparql-wesp
CQELS
• Stream transformer
• Native implementation of WeSP
• https://github.com/cqels/CQELS-1.x/
TripleWave
• Stream source
• Native implementation of WeSP
• http://streamreasoning.github.io/TripleWave

TripleWave
TripleWave is open source
• Learn more at: https://streamreasoning.github.io/TripleWave/
[Figure: input? → TripleWave → RDF streams (WebSocket | HTTP chunked | etc.) + Stream Descriptor]

Feeding TripleWave
TripleWave supports a variety of data sources:
• RDF dumps with temporal information
• RDF with temporal information exposed through SPARQL endpoints
• Streams available on the Web
[Figure: TripleWave architecture. Inputs: Web API, Web Service, SPARQL endpoint, File. Modes: Conversion (R2RML mapping), Replay, Replay loop. Internal streams: Connector stream, Transform stream, Datagen stream, Graph stream, Scheduler stream]
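The replay modes can be pictured as a scheduler that turns the original timestamps of a dump into emission offsets. The sketch below is a hypothetical illustration of that idea, not TripleWave's implementation: it assumes the items are sorted by timestamp (in seconds), and in loop mode it inserts an assumed `gap` between the end of one round and the start of the next.

```python
from itertools import count
from typing import Iterator, List, Tuple

def replay_schedule(items: List[Tuple[float, str]], loop: bool = False,
                    gap: float = 1.0) -> Iterator[Tuple[float, str]]:
    """Yield (offset in seconds from replay start, item) pairs.

    items: (original timestamp, item) pairs sorted by timestamp.
    loop=False replays the dump once; loop=True repeats it forever,
    shifting each round by the dump's duration plus `gap` (an assumption).
    """
    if not items:
        return
    start = items[0][0]
    period = items[-1][0] - start + gap
    rounds = count() if loop else range(1)
    for r in rounds:
        for ts, item in items:
            # Keep the original inter-arrival times within each round.
            yield ts - start + r * period, item
```

A driver could sleep until each offset before emitting the corresponding item; in loop mode the generator is infinite, so the caller decides when to stop.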

Summary
WeSP: framework to exchange RDF streams on the web
– RDF to serialise the stream items
– RDF to describe the stream
– Application and communication protocols: HTTP, WebSocket, MQTT, etc.
– Interfaces to produce and consume RDF streams
What’s next?
– Relation with other technologies: Linked Data Notifications (LDN), Activity Streams, etc.
– Adoption
– Federated stream processing over the Web

COMBINING STREAMS AND BACKGROUND DATA


Time-based sliding window W(ω, β)
[Figure: RDF stream S with items S1–S12 on a time line t; the window has width ω and slides by β; the content of each window is passed to the evaluation]
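A minimal sketch of how a time-based sliding window W(ω, β) partitions a stream, assuming integer timestamps, half-open windows [o, o + ω) and window openings at start, start + β, start + 2β, …; engines differ in these conventions, so treat them as assumptions of the sketch.

```python
from typing import List, Tuple

def sliding_windows(items: List[Tuple[int, str]], omega: int, beta: int,
                    start: int, end: int) -> List[Tuple[int, int, List[str]]]:
    """Return (open, close, contents) for every window [open, open + omega) opening in [start, end).

    items: (timestamp, item) pairs; omega is the window width, beta the slide.
    """
    windows = []
    for opening in range(start, end, beta):
        closing = opening + omega
        contents = [item for (t, item) in items if opening <= t < closing]
        windows.append((opening, closing, contents))
    return windows
```

With ω > β the windows overlap, so an item can be evaluated in more than one window; with ω = β the window becomes tumbling and each item falls into exactly one window.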

The setting
[Figure: an RDF stream generator feeds a window; the window content is joined with background data from a SPARQL endpoint]
The background data changes over time and is stored remotely on the web
Accessing the background data is costly
Is it possible to avoid continuously accessing the background data?

Local view
How to cope with changes in the background data?
[Figure: the window content is joined with a local view, a local copy of the background data from the SPARQL endpoint, instead of with the endpoint itself]

Maintenance process
Maintaining the local view introduces a trade-off between response quality and response time.
We propose to manage this trade-off by fixing the time dimension according to the query constraints and maximizing the freshness of the response.
[Figure: a maintenance process refreshes the local view from the SPARQL endpoint; the join combines the window content with the local view]

How to track background data changes?
Update streams
• stream with changes available to the query processor
• rarely available on the Web, e.g. Wikipedia, SPARQLPush
Data changes regularly
• data generated by automatic processes that refresh it periodically
• data warehouses, sensors
Data changes “randomly”
• Twitter user profiles, taxi status, financial updates

Requirements
The maintenance process:
1. should take into account the change rates of the data elements in the background data;
2. should take into account that the change rates themselves may vary over time;
3. should satisfy the Quality of Service constraints on responsiveness and freshness of the answer;
4. may consider the query and its definition.

A query-driven maintenance process
Query: WINDOW(S, ω, β) P_W JOIN SERVICE(BKG) P_S
[Figure: the WINDOW clause and the SERVICE clause feed the JOIN, which produces Ω_join; a Proposer, a Ranker and a Maintainer keep the Local View up to date (numbered steps 1–4, sets C and E). Policies shown: RND, LRU, WSJ, WBM, SBM, IBM]

Terminology
[Figure: windows W1–W4 over time t = 5–11, with the current evaluation time τ; legend: mappings from the WINDOW clause, mappings in the LOCAL VIEW, compatible mappings]
Best Before Time: the time at which an element will become stale.
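Compatibility here is the usual SPARQL notion: two solution mappings are compatible when they agree on every variable they share, and the join merges each compatible pair. A small Python sketch, with mappings modelled as dictionaries from variable names to RDF terms; the room data in the usage example is made up for illustration.

```python
from typing import Dict, List

Mapping = Dict[str, str]  # variable name -> RDF term

def compatible(m1: Mapping, m2: Mapping) -> bool:
    """True when m1 and m2 bind every shared variable to the same term."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def join(left: List[Mapping], right: List[Mapping]) -> List[Mapping]:
    """SPARQL join: the union of every compatible pair of mappings."""
    return [{**m1, **m2} for m1 in left for m2 in right if compatible(m1, m2)]
```

For instance, a window mapping {"?p": ":a", "?room": ":rRoom"} is compatible with a local view mapping {"?room": ":rRoom", "?floor": "1"} but not with {"?room": ":bRoom", "?floor": "2"}.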

WSJ
[Figure: windows W1–W4 over time t = 5–11, evaluated at time τ]
WSJ identifies the candidate set: the possibly stale local view mappings involved in the current evaluation.
WSJ analyzes the content of the current window evaluation and identifies the compatible mappings in the local view.
The possibly stale mappings are identified by analyzing their best before times.
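Assuming each local view mapping carries its best before time, the candidate set can be sketched as a filter. Two choices are assumptions of this sketch rather than details from the slide: a mapping counts as possibly stale when its best before time is less than or equal to the evaluation time τ (`tau`), and mappings are plain dictionaries.

```python
from typing import Dict, List, Tuple

Mapping = Dict[str, str]

def compatible(m1: Mapping, m2: Mapping) -> bool:
    """SPARQL compatibility: equal values on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def wsj_candidates(window: List[Mapping],
                   local_view: List[Tuple[Mapping, int]],
                   tau: int) -> List[Tuple[Mapping, int]]:
    """Local view mappings that join with the current window and may be stale at time tau.

    local_view: (mapping, best before time) pairs.
    """
    return [(m, bbt) for (m, bbt) in local_view
            if bbt <= tau and any(compatible(w, m) for w in window)]
```

Mappings that do not join with the current window are skipped even if stale, since refreshing them would not change the current answer.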

WBM
[Figure: windows W1–W4 over time t = 5–11, evaluated at time τ; a table lists V, L and Score for each candidate mapping]
WBM ranks the candidate set to determine which mappings to update.
The ranking is computed from two values: the renewed best before time and the remaining life time.
The top k elements are selected to be refreshed; k is chosen according to the responsiveness constraint.

WBM: renewed best before time
[Figure: windows W1–W4 over time t = 5–11, evaluated at time τ; V = 3, 4 and 1 for the three candidate mappings]
When would a mapping become stale if it were refreshed now?
The renewed best before time V answers this question.

WBM: remaining life time and score
[Figure: windows W1–W4 over time t = 5–11, evaluated at time τ]
For how many future evaluations is a mapping involved?
The remaining life time L answers this question.
WBM ranks the mappings by the score Score = min(L, V):
Candidate   V   L   Score
1st         3   3   3
2nd         4   1   1
3rd         1   3   1
The candidate with the highest score (the first one, with score 3) is selected for the maintenance.
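The WBM selection step can be sketched as follows, assuming V and L are already known for each candidate and both are counted in future window evaluations; one way to read min(L, V) is as the number of upcoming evaluations that a refresh made now would actually serve. The budget `k` stands in for the responsiveness constraint, and the mapping identifiers are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, int, int]  # (mapping id, V, L)

def wbm_score(candidate: Candidate) -> int:
    """Score = min(L, V)."""
    _, v, l = candidate
    return min(l, v)

def wbm_select(candidates: List[Candidate], k: int) -> List[str]:
    """Return the ids of the k highest-scoring candidates.

    Python's sort is stable, so ties keep the order in which candidates were proposed.
    """
    ranked = sorted(candidates, key=wbm_score, reverse=True)
    return [mapping_id for (mapping_id, _, _) in ranked[:k]]
```

With the values of the example, (V, L) = (3, 3), (4, 1) and (1, 3), a budget of k = 1 selects the first candidate, as on the slide.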

Experiments

Extensions: SBM
[Figure: windows W1–W4 over time t = 5–11, evaluated at time τ]
SBM exploits the fact that mappings may be in n-to-n relations
• Each compatible pair of mappings generates a join result (e.g. BGP)
If one candidate mapping is refreshed, there will be four fresh mappings
If another candidate mapping is refreshed, there will be five fresh mappings
The latter is selected for the maintenance
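For the join case, the SBM intuition can be sketched by counting, for each candidate, how many window mappings it is compatible with. Under the simplifying assumption that refreshing a candidate makes every join result it takes part in fresh, that count is the number of fresh results gained. The function name `sbm_select` and the room data are hypothetical; the test data mirrors the four-versus-five example.

```python
from typing import Dict, List

Mapping = Dict[str, str]

def compatible(m1: Mapping, m2: Mapping) -> bool:
    """SPARQL compatibility: equal values on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def sbm_select(window: List[Mapping], candidates: Dict[str, Mapping], k: int) -> List[str]:
    """Pick the k candidates that take part in the most join results."""
    gain = {cid: sum(1 for w in window if compatible(w, m))
            for cid, m in candidates.items()}
    return sorted(gain, key=gain.get, reverse=True)[:k]
```

Unlike WBM, this count looks at how many results a single refresh affects in the current evaluation rather than at how long the refresh stays useful.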

Extensions: SBM
[Figure: windows W1–W4 over time t = 5–11, evaluated at time τ]
SBM exploits the fact that mappings may be in n-to-n relations
• A result is fresh only if all the pairs contributing to it are fresh (e.g. aggregations)
If one candidate mapping is refreshed, there will be one fresh mapping
If another candidate mapping is refreshed, there will be two fresh mappings
The latter is selected for the maintenance

Other extensions
We developed other rankers:
IBM: combines WBM and SBM, taking into account the number of join mappings produced both in the current window and in future windows
FBA: dynamic allocation of the refresh operations among different evaluations
F rankers: extensions of the presented rankers to cope with queries with FILTER clauses in the subquery over the background data

Summary
We proposed using materialization (a local view of the background data) to optimize the processing of continuous queries.
We proposed a policy that maximizes freshness under the time constraints of the continuous query.
We tested our policy against baseline policies (LRU and Random).
Future work:
– Measuring the time overhead of maintenance
– Investigating more queries involving both remote SPARQL endpoints and streams
– Dynamically estimating the change rate of users

Acknowledgments

Conclusions
RDF (or semantic) streams are gaining momentum
• Several active research groups, working on querying and reasoning
• Prototypes, methods and applications
• Query languages, ontologies
• Use cases
However, the web dimension has received little attention so far

What’s next?
We still need
• Infrastructures and standards to exchange (RDF) streams on the Web
• Agreements on languages to specify tasks over such streams
• Query languages richer than SPARQL not only to manage streams, but also to express higher-level operations
• Methods to manage reasoning tasks over streams
The Web dimension needs to be studied and understood
• Combining remote streams and background data requires new solutions
• Not only queries, but also constraints over them (QoS)

Thank you! Questions?

Find more: Q1
• A. Mauri, J.-P. Calbimonte, D. Dell'Aglio, M. Balduini, E. Della Valle, K. Aberer: Where Are the RDF Streams?: On Deploying RDF Streams on the Web of Data with TripleWave. ISWC 2015 (Poster)
• A. Mauri, J.-P. Calbimonte, D. Dell'Aglio, M. Balduini, M. Brambilla, E. Della Valle, K. Aberer: TripleWave: Spreading RDF Streams on the Web. ISWC 2016 (Resource Paper)
• D. Dell'Aglio, D. Le Phuoc, A. Lê Tuán, M. Intizar Ali, J.-P. Calbimonte: On a Web of Data Streams. DeSemWeb@ISWC 2017

Find more: Q2
• S. Dehghanzadeh, A. Mileo, D. Dell'Aglio, E. Della Valle, S. Gao, A. Bernstein: Online View Maintenance for Continuous Query Evaluation. WWW (Companion Volume) 2015: 25-26
• S. Dehghanzadeh, D. Dell'Aglio, S. Gao, E. Della Valle, A. Mileo, A. Bernstein: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-325
• S. Zahmatkesh, E. Della Valle, D. Dell'Aglio: When a FILTER Makes the Difference in Continuously Answering SPARQL Queries on Streaming and Quasi-Static Linked Data. ICWE 2016: 299-316
• S. Gao, D. Dell'Aglio, S. Dehghanzadeh, A. Bernstein, E. Della Valle, A. Mileo: Planning Ahead: Stream-Driven Linked-Data Access Under Update-Budget Constraints. ISWC (1) 2016: 252-270
• S. Zahmatkesh, E. Della Valle, D. Dell'Aglio: Using Rank Aggregation in Continuously Answering SPARQL Queries on Streaming and Quasi-static Linked Data. DEBS 2017: 170-179