Continuously Self-Updating Query Results over Dynamic Linked Data Ruben Taelman - @rubensworks iMinds - Ghent University


Post on 10-Feb-2017


Page 1: Continuous Self-Updating Query Results over Dynamic Linked Data

Continuously Self-Updating Query Results over Dynamic Linked Data

Ruben Taelman - @rubensworks
iMinds - Ghent University

Page 2: Continuous Self-Updating Query Results over Dynamic Linked Data

Dynamic Linked Data

E.g. Thermometer measures every minute:

“19,05°C” - 30-05-2016 11:00
“19,06°C” - 30-05-2016 11:01
“19,11°C” - 30-05-2016 11:02
“19,08°C” - 30-05-2016 11:03
…

Typically exposed as an RDF stream = stream of <RDF triple, timestamp>
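As a minimal sketch of the definition above, an RDF stream can be modelled as a sequence of (triple, timestamp) pairs. The class name `TimestampedTriple`, the helper `thermometer_stream`, and the `ex:` identifiers are illustrative assumptions, not part of the slides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class TimestampedTriple:
    """One element of an RDF stream: a triple plus its timestamp."""
    triple: Triple
    timestamp: datetime


def thermometer_stream(start: datetime, readings: List[str]) -> Iterator[TimestampedTriple]:
    """Yield one timestamped triple per minute (hypothetical identifiers)."""
    for i, value in enumerate(readings):
        yield TimestampedTriple(
            triple=("ex:thermometer1", "ex:temperature", value),
            timestamp=start + timedelta(minutes=i),
        )


# The four readings from the slide, one per minute starting at 11:00.
stream = list(
    thermometer_stream(datetime(2016, 5, 30, 11, 0), ["19.05", "19.06", "19.11", "19.08"])
)
```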

Page 3: Continuous Self-Updating Query Results over Dynamic Linked Data

Querying continuous data

Clients send queries to the server, e.g. What is the current temperature?

Server continuously evaluates the queries

→ Server does all of the work

Cause of low public endpoint availability!
½ have availability of < 95% (Buil-Aranda 2013)

→ Clients just wait for results

Page 4: Continuous Self-Updating Query Results over Dynamic Linked Data

What if we moved continuous query evaluation to the client?
→ to lower server load

Page 5: Continuous Self-Updating Query Results over Dynamic Linked Data

Overview

Research questions

Research approach

Evaluation plan

Preliminary results

Page 7: Continuous Self-Updating Query Results over Dynamic Linked Data

Research questions

How can we publish dynamic data so that it is queryable together with static data at a low server cost?

How can we efficiently store dynamic data and allow efficient transfer to clients?

What kind of server interface do we need to enable client-side query evaluation over both static and dynamic data?

Page 8: Continuous Self-Updating Query Results over Dynamic Linked Data

Hypotheses

1. Our storage solution can store new data in linear time with respect to the amount of new data.

2. Our storage solution can retrieve data by time or triple values in linear time with respect to the amount of retrieved data.

3. The server cost for our solution is lower than the alternatives.

4. Data transfer is the main factor influencing query execution time.

Page 9: Continuous Self-Updating Query Results over Dynamic Linked Data

Overview

Research questions

Research approach

Evaluation plan

Preliminary results

Page 10: Continuous Self-Updating Query Results over Dynamic Linked Data

Moving continuous query evaluation to the client

Page 11: Continuous Self-Updating Query Results over Dynamic Linked Data

Triple Pattern Fragments does this for static data!

Triple pattern fragments (TPF) (Verborgh 2016):

Servers can only respond to triple pattern queries
Clients need to evaluate queries locally
→ Lowers server complexity
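The division of work described above can be sketched as follows: a stand-in server that only matches single triple patterns, and a client that joins the results of several patterns itself. This is a simplified illustration, not the TPF client algorithm from (Verborgh 2016); `TriplePatternServer`, `evaluate_bgp`, and the `ex:` data are hypothetical names, and paging, metadata, and HTTP are left out.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard


class TriplePatternServer:
    """Stand-in for a server that can only answer single triple patterns."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = triples

    def match(self, pattern: Pattern) -> Iterator[Triple]:
        for triple in self._triples:
            if all(p is None or p == v for p, v in zip(pattern, triple)):
                yield triple


def evaluate_bgp(server: TriplePatternServer,
                 patterns: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Client-side join over triple patterns; terms starting with '?' are variables."""
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        next_solutions: List[Dict[str, str]] = []
        for binding in solutions:
            # Substitute already-bound variables; unbound ones become wildcards.
            concrete = tuple(
                binding.get(term) if term.startswith("?") else term
                for term in pattern
            )
            for triple in server.match(concrete):
                extended = dict(binding)
                consistent = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?") and extended.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


server = TriplePatternServer([
    ("ex:t1", "ex:locatedIn", "ex:Ghent"),
    ("ex:t2", "ex:locatedIn", "ex:Brussels"),
    ("ex:t1", "ex:temperature", "19.05"),
    ("ex:t2", "ex:temperature", "21.00"),
])
results = evaluate_bgp(server, [
    ("?s", "ex:locatedIn", "ex:Ghent"),
    ("?s", "ex:temperature", "?temp"),
])
```

The server never sees the full query, only one pattern at a time; the join logic lives entirely in `evaluate_bgp` on the client.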

Page 12: Continuous Self-Updating Query Results over Dynamic Linked Data

How I will do this for dynamic data

Storage
Transmission
Query evaluation

Page 13: Continuous Self-Updating Query Results over Dynamic Linked Data

Storage

How do we efficiently store / retrieve dynamic data? (Indexing)

It depends on the use cases:

Querying on a certain time (Indexing by time)

What was the temperature in Ghent yesterday?

Querying for a certain time (Indexing by property)

When was it 20°C in Ghent?

Can we / Do we have to combine these indexing techniques?
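To make the two use cases concrete, here is a toy sketch that keeps both indexes side by side. It is not the storage solution the talk proposes; `DualIndex` and its methods are hypothetical, timestamps are plain integers (minutes), and readings are assumed to arrive in increasing time order.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple


class DualIndex:
    """Toy store that indexes readings both by time and by value."""

    def __init__(self) -> None:
        self._times: List[int] = []      # sorted timestamps
        self._values: List[float] = []   # value at the same position
        self._by_value: Dict[float, List[int]] = defaultdict(list)

    def append(self, time: int, value: float) -> None:
        if self._times and time <= self._times[-1]:
            raise ValueError("readings must arrive in increasing time order")
        self._times.append(time)
        self._values.append(value)
        self._by_value[value].append(time)

    def values_between(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Indexing by time: readings with start <= time < end."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def times_with_value(self, value: float) -> List[int]:
        """Indexing by property: all times at which this value was measured."""
        return list(self._by_value.get(value, []))


index = DualIndex()
for minute, temperature in [(0, 19.05), (1, 19.06), (2, 20.0), (3, 20.0)]:
    index.append(minute, temperature)
```

Keeping both indexes answers both kinds of question directly, at the cost of storing every reading twice; whether that trade-off is acceptable is exactly the open question on the slide.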

Page 14: Continuous Self-Updating Query Results over Dynamic Linked Data

Transmission

Disadvantage:
Moving query evaluation to the client requires more data to be transferred
→ Increases bandwidth usage
→ Slows down query evaluation
→ Limits query frequency

Possible solutions:
Compression within and between versions
Caching
Higher data selectivity
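One possible reading of "compression between versions" is to send only the difference between two consecutive versions of the data. The sketch below assumes that interpretation; `diff` and `apply_delta` are hypothetical helpers, and versions are modelled as plain sets of triples.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Delta = Tuple[FrozenSet[Triple], FrozenSet[Triple]]  # (additions, deletions)


def diff(old: Set[Triple], new: Set[Triple]) -> Delta:
    """Compute what must be added to and removed from `old` to obtain `new`."""
    return frozenset(new - old), frozenset(old - new)


def apply_delta(old: Set[Triple], delta: Delta) -> Set[Triple]:
    """Reconstruct the new version on the client from the old one and a delta."""
    additions, deletions = delta
    return (old - deletions) | additions


# Two versions that share the static triple and differ in one reading.
v1 = {
    ("ex:thermometer1", "ex:locatedIn", "ex:Ghent"),
    ("ex:thermometer1", "ex:temperature", "19.05"),
}
v2 = {
    ("ex:thermometer1", "ex:locatedIn", "ex:Ghent"),
    ("ex:thermometer1", "ex:temperature", "19.06"),
}
delta = diff(v1, v2)
```

Here the delta carries two triples (one addition, one deletion) regardless of how many static triples the versions share, so the saving grows with the static part of the data.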

Page 15: Continuous Self-Updating Query Results over Dynamic Linked Data

Query Evaluation

Scope: Data with a predictable valid time

Some thermometers measure /min → data will not change during that minute.
Otherwise we need to poll or have a persistent server connection

Annotate data with their valid time:

Thermometer_1 : 10°C (10:00 - 10:01)
Thermometer_1 : 20°C (10:01 - 10:02)
Thermometer_1 : 20°C (10:02 - 10:03)

→ Clients can fetch this data as if it was static data
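The idea above can be sketched as a client that re-evaluates only when its current result expires, instead of polling at a fixed rate. This is an illustration under assumptions, not the Query Streamer implementation: `Annotated`, `current_results`, `next_refresh`, and `run_client` are hypothetical names, times are minutes since midnight, and the clock is simulated rather than real.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Annotated:
    sensor: str
    value: str
    valid_from: int    # minutes since midnight, inclusive
    valid_until: int   # minutes since midnight, exclusive


def current_results(data: List[Annotated], now: int) -> List[Annotated]:
    """Keep only the annotated facts that are valid at time `now`."""
    return [d for d in data if d.valid_from <= now < d.valid_until]


def next_refresh(results: List[Annotated]) -> int:
    """Earliest moment at which any current result stops being valid."""
    return min(r.valid_until for r in results)


def run_client(fetch: Callable[[], List[Annotated]], start: int, end: int) -> List[Tuple[int, str]]:
    """Simulated client loop: fetch, report, then jump to the next expiry time."""
    log: List[Tuple[int, str]] = []
    now = start
    while now < end:
        results = current_results(fetch(), now)
        if not results:
            break  # nothing valid right now; a real client would fall back to polling
        log.append((now, results[0].value))
        now = next_refresh(results)
    return log


# The three annotated readings from the slide (10:00 = minute 600).
data = [
    Annotated("Thermometer_1", "10°C", 600, 601),
    Annotated("Thermometer_1", "20°C", 601, 602),
    Annotated("Thermometer_1", "20°C", 602, 603),
]
log = run_client(lambda: data, start=600, end=603)
```

Between two expiry times the fetched response does not change, which is what lets it be treated like static data.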

Page 16: Continuous Self-Updating Query Results over Dynamic Linked Data

Overview

Research questions

Research approach

Evaluation plan

Preliminary results

Page 17: Continuous Self-Updating Query Results over Dynamic Linked Data

Evaluation of the three parts

Storage: insertion, lookup, size
Transmission: latency, bandwidth, cacheability
Query evaluation: result latency

Page 18: Continuous Self-Updating Query Results over Dynamic Linked Data

Combined evaluation

Realistic datasets/datastreams and queries

Compare with:
Server-side:
C-SPARQL (Barbieri 2012)
CQELS (Le-Phuoc 2011)
Client-side:
Ztreamy (Fisteus 2014)

Compare by:
latency
completeness
server load
client load
scalability

→ LSBench (Le-Phuoc 2012), SRBench (Zhang 2012), CityBench (Ali 2015), ...

Page 19: Continuous Self-Updating Query Results over Dynamic Linked Data

Overview

Research questions

Research approach

Evaluation plan

Preliminary results

Page 20: Continuous Self-Updating Query Results over Dynamic Linked Data

Preliminary scalability test

Query Streamer prototype (Taelman 2016), based on TPF

Test server load for increasing #clients

Compared with C-SPARQL, CQELS

Page 21: Continuous Self-Updating Query Results over Dynamic Linked Data

Query Streamer moves load from server to client

[Figure panels: Server scalability; Client load]

Page 22: Continuous Self-Updating Query Results over Dynamic Linked Data

Overview

Research questions

Research approach

Evaluation plan

Preliminary results