the impact of data caching of on query execution for linked data

68
The Impact of Data Caching on Query Execution for Linked Data Olaf Hartig http://olafhartig.de/foaf.rdf#olaf @olafhartig Database and Information Systems Research Group Humboldt-Universität zu Berlin

Upload: olaf-hartig

Post on 29-Aug-2014

1.399 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: The Impact of Data Caching of on Query Execution for Linked Data

The Impact ofData Caching on

Query Execution for Linked Data

Olaf Hartighttp://olafhartig.de/foaf.rdf#olaf

@olafhartig

Database and Information Systems Research GroupHumboldt-Universität zu Berlin

Page 2: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2

SELECT DISTINCT ?i ?labelWHERE {

?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ; foaf:topic_interest ?i .

OPTIONAL { ?i rdfs:label ?label FILTER( LANG(?label)="en" || LANG(?label)="") }}ORDER BY ?label

?

Can we query the Web of Dataas of it were a single,giant database?

Our approach: Link Traversal Based Query Execution [ISWC'09]

Page 3: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 3

Main Idea

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Page 4: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 4

Main Idea

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 5: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 5

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 6: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 6

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 7: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 7

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 8: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 8

query-localdataset

Main Idea

http://bob.name

?

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

“Descriptor object”

Page 9: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 9

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq

Page 10: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 10

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

knows

http://bob.name

http://alice.name

n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq

Page 11: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 11

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

project ?prj

Query

?prjName

know

s

http://bob.name

?acq

knows

http://bob.name

http://alice.name

Page 12: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 12

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http

://al

ice.

nam

e

?

http://alice.name

?acq

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 13: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 13

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http

://al

ice.

nam

e

?

http://alice.name

?acq

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 14: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 14

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http

://al

ice.

nam

e

?

http://alice.name

?acq

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

Page 15: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 15

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

n

ame

project ?prj

Query

know

s

?acq

?prjName

http://bob.name

http://alice.name

?acq

Page 16: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 16

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

Page 17: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 17

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

project http://.../AlicesPrj

http://alice.name

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

Page 18: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 18

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name http://.../AlicesPrj

?prj?acq

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

project http://.../AlicesPrj

http://alice.name

Page 19: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 19

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

?prjName

know

s

http://bob.nameQuery

project ?prj?acq

http://alice.name http://.../AlicesPrj

?prj?acq

Page 20: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 20

query-localdataset

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Main Idea

http://alice.name

?acq

n

ame

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

http://alice.name http://.../AlicesPrj

?prj?acq

http://.../AlicesPrj “ … “

?prjName?prj

Page 21: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 21

query-localdataset

Main Idea

http://alice.name

?acq

n

ame

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

http://alice.name http://.../AlicesPrj

?prj?acq

http://.../AlicesPrj “ … “

?prjName?prj

http://alice.name

?acqhttp://.../AlicesPrj “ … “

?prjName?prj

● Intertwine query evaluation with traversal of data links

● We alternate between:● Evaluate parts of the query (triple patterns)

on a continuously augmented set of data● Look up URIs in intermediate

solutions and add retrieved datato the query-local dataset

Page 22: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 22

Characteristics

● Link traversal based query execution:● Evaluation on a continuously augmented dataset● Discovery of potentially relevant data during execution● Discovery driven by intermediate solutions

● Main advantage:● No need to know all data sources in advance

● Limitations:● Query has to contain a URI as a starting point● Ignores data that is not reachable* by the query execution

* formal definition in [LDOW'11a]

Page 23: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 23

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

Page 24: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 24

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

http://bob.name?

Page 25: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 25

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

knows

http://bob.name

http://alice.name

Page 26: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 26

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

knows

http://bob.name

http://alice.name

?acq ?iLabel?i

Page 27: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 27

query-localdataset

query-localdataset

The Issue

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

Page 28: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 28

query-localdataset

query-localdataset

Reusing the Query-Local Dataset

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

Page 29: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 29

query-localdataset

Reusing the Query-Local Dataset

lab

el

interest?i

Querykn

ows

?acq

?iLabel

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

knows

http://alice.name

http://bob.name

Page 30: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 30

query-localdataset

Reusing the Query-Local Dataset

lab

el

interest?i

Querykn

ows

?iLabel

?acq

http://bob.name

nam

e

know

s

http://bob.nameQuery

project?acq

?prjName

?prj

http://alice.name

?acq

knows

http://bob.name

http://alice.name

Page 31: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 31

Re-using the query-local dataset (a.k.a. data caching)may benefit

query performance + result completeness

Hypothesis

Page 32: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 32

Contributions

● Systematic analysis of the impact of data caching● Theoretical foundation* ● Conceptual analysis* ● Empirical evaluation of the potential impact

● Out of scope: Caching strategies (replacement, invalidation)

*see [LDOW'11a]

Page 33: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 33

Experiment – Scenario

● Information about thedistributed social network of FOAF profiles

● 5 types of queries

● Experiment Setup:● 20 persons● Sequential use➔ 100 queries

Page 34: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 34

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

query execution time (in seconds)

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Experiment – Single Query

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

number of query results

● no reuse experiment:● No data caching

● reuse per query experiment● Reuse of query-local dataset

for 3 executions of each query● Third execution measured

Page 35: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 35

query execution time (in seconds)

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Experiment – Single Query

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

number of query results

● no reuse experiment:● No data caching

● reuse per query experiment● Reuse of query-local dataset

for 3 executions of each query● Third execution measured

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

Page 36: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 36

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Experiment – Single Query

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

query execution time (in seconds)number of query results

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

Page 37: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 37

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Experiment – Single Query

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

query execution time (in seconds)number of query results

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

Page 38: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 38

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Experiment – Single Query

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

query execution time (in seconds)number of query results

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

Page 39: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 39

Experiment – Complete Sequence

number of query results

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

reuse all queries

query execution time(in seconds)

0,01 0,1 1 10 100

0,01 0,1 1 10 100

● reuse all queries experiment:● Reuse of the query-local

dataset for the complete sequence of all 100 queries

Page 40: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 40

Experiment – Complete Sequence

number of query results

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

reuse all queries

query execution time(in seconds)

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Page 41: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 41

Experiment – Complete Sequence

number of query results

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

reuse all queries

query execution time(in seconds)

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Page 42: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 42

Experiment – Complete Sequence

number of query results

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

reuse all queries

query execution time(in seconds)

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Page 43: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 43

Experiment – Complete Sequence

query execution time(in seconds)

number of query results

(Query No. 61)

(Query No. 62)

(Query No. 63)

(Query No. 64)

(Query No. 65)

ContactInfoDanBri

UnsetPropsDanBri

2ndDegree1DanBri

2ndDegree2DanBri

IncomingDanBri

0 10 20 30 40 50 60

0 10 20 30 40 50 60

no reuse reuse per query

reuse all queries

0,01 0,1 1 10 100

0,01 0,1 1 10 100

Page 44: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 44

Outlook

● Requirements of a data cache:● Replacement mechanism● Coherency mechanism

Page 45: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 45

Cache Replacement

● Cache full → remove descriptor objects

● Replacement strategy● Primary goal: maximize hit rate● Recency-based● Frequency-based● Function-based● Randomized

● Replacement process● Watermarks: high and low

Page 46: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 46

Studying Cache Replacement?

Page 47: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 47

Studying Cache Replacement?

“Web cache replacement in its general form seems to be a solved topic.”

S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003

Page 48: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 48

Studying Cache Replacement?

● 6 quad indexes* in main memory● Size grows linear in the number of quads

● Example (after reuse all queries experiment, 100 queries):● 905 descriptor objects, overall number of 745,756 triples● ca. 103 MB

➔ Available main memory is almost no limit

“Web cache replacement in its general form seems to be a solved topic.”

S. Podlipnig and L. Böszörmenyi: Survey of Web Cache Replacement Strategies, 2003

* as introduced in [LDOW'11b]

Page 49: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 49

Cache Coherency

● Data items in the cache may become inconsistent

● Strong cache consistency● Server validation● Client validation

● Weak cache consistency● Time to live (TTL)● Adaptive TTL

Page 50: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 50

Client Validation

● Polling every time

● Enables strong cache consistency

● Conditional GET● Request with If-Modified-Since header● Possible response: 304 Not Modified

Page 51: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 51

Client Validation

● Polling every time

● Enables strong cache consistency

● Conditional GET● Request with If-Modified-Since header● Possible response: 304 Not Modified

● Not supported by most Linked Data servers● Experiment based on the CKAN catalog of linked datasets

● 41 out of 154 example resources (26.6%) from 110 datasets

Page 52: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 52

Time to Live (TTL)

● TTL field: life time estimation for each object

● Supported by HTTP response headers:● Expires● Cache-Control: max-age

● When TTL elapses, object is invalid

● Accessing an invalid object → re-retrieve object again● Conditional GET

Page 53: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 53

Time to Live (TTL)

● TTL field: life time estimation for each object

● Supported by HTTP response headers:● Expires● Cache-Control: max-age

● When TTL elapses, object is invalid

● Accessing an invalid object → re-retrieve object again● Conditional GET

● Alternative (due to lack of support in Linked Data servers):● Assume a default TTL for each object● Ordinary GET

37.7%

37.0%

26.6%

Page 54: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 54

Adaptive TTL

● Assumption:● The older an object, the less likely it is to be modified

● TTL is a percentage of the age:● Threshold = 10% ; age = 30 days → TTL = 3 days● Last verification: yesterday → invalidation in 2 days

● HTTP-based implementation:● Calculation of age: use Last-Modified response header● Verification with conditional GET

Page 55: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 55

Adaptive TTL

● Assumption:● The older an object, the less likely it is to be modified

● TTL is a percentage of the age:● Threshold = 10% ; age = 30 days → TTL = 3 days● Last verification: yesterday → invalidation in 2 days

● HTTP-based implementation:● Calculation of age: use Last-Modified response header● Verification with conditional GET

● Alternative (due to lack of support in Linked Data servers):● Assume Last-Modified is time of first retrieval● Verification by comparing a response to the current version

35.1%

Page 56: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 56

Summary

● Systematic analysis of the impact of data cache ● Theoretical foundation● Conceptual analysis● Empirical evaluation

● Main findings:● Additional results possible (for semantically similar queries)● Impact on performance may be positive but also negative

● Future work:● Analysis of caching strategies in our context● Main issue: invalidation

Page 57: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 57

Backup Slides

Page 58: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 58

Contributions

● Theoretical foundation (extension of the original definition)

● Reachability by a Dseed

-initialized execution of a BGP query b

● Dseed

-dependent solution for a BGP query b

● Reachability R(B) for a serial execution of B = b1 , … , b

n

➔ Each solution for bcur

is also R(B)-dependent solution for bcur

● Conceptual analysis of the impact of data caching

● Performance factor: p( bcur

, B ) = c( bcur

, [ ] ) – c( bcur

, B )

● Serendipity factor: s( bcur

, B ) = b( bcur

, B ) – b( bcur

, [ ] )

● Empirical verification of the potential impact

● Out of scope: Caching strategies (replacement, invalidation)

Page 59: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 59

Query Template Contact

SELECT * WHERE { <PERSON> foaf:knows ?p .

OPTIONAL { ?p foaf:name ?name } OPTIONAL { ?p foaf:firstName ?firstName } OPTIONAL { ?p foaf:givenName ?givenName } OPTIONAL { ?p foaf:givenname ?givenname } OPTIONAL { ?p foaf:familyName ?familyName } OPTIONAL { ?p foaf:family_name ?family_name } OPTIONAL { ?p foaf:lastName ?lastName } OPTIONAL { ?p foaf:surname ?surname }

OPTIONAL { ?p foaf:birthday ?birthday }

OPTIONAL { ?p foaf:img ?img }

OPTIONAL { ?p foaf:phone ?phone } OPTIONAL { ?p foaf:aimChatID ?aimChatID } OPTIONAL { ?p foaf:icqChatID ?icqChatID } OPTIONAL { ?p foaf:jabberID ?jabberID } OPTIONAL { ?p foaf:msnChatID ?msnChatID } OPTIONAL { ?p foaf:skypeID ?skypeID } OPTIONAL { ?p foaf:yahooChatID ?yahooChatID }

}

Page 60: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 60

Query Template UnsetProps

SELECT DISTINCT ?result ?resultLabel WHERE{

?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .?result rdfs:domain foaf:Person .

OPTIONAL { <PERSON> ?result ?var0 } FILTER ( !bound(?var0) )

<PERSON> foaf:knows ?var2 .?var2 ?result ?var3 .?result rdfs:label ?resultLabel .?result vs:term_status ?var1 .

}ORDER BY ?var1

Page 61: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 61

Query Template Incoming

SELECT DISTINCT ?result WHERE{

?result foaf:knows <PERSON> .

OPTIONAL{

?result foaf:knows ?var1 .FILTER ( <PERSON> = ?var1 )<PERSON> foaf:knows ?result .

}FILTER ( !bound(?var1) )

}

Page 62: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 62

Query Template 2ndDegree1

SELECT DISTINCT ?result WHERE{

<PERSON> foaf:knows ?p1 .<PERSON> foaf:knows ?p2 .FILTER ( ?p1 != ?p2 )

?p1 foaf:knows ?result .FILTER ( <PERSON> != ?result )?p2 foaf:knows ?result .

OPTIONAL {<PERSON> ?knows ?result .FILTER ( ?knows = foaf:knows )

}FILTER ( !bound(?knows) )

}

Page 63: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 63

Query Template 2ndDegree2

SELECT DISTINCT ?result WHERE{

<PERSON> foaf:knows ?p1 .<PERSON> foaf:knows ?p2 .FILTER ( ?p1 != ?p2 )

?result foaf:knows ?p1 .FILTER ( <PERSON> != ?result )?result foaf:knows ?p2 .

OPTIONAL {<PERSON> ?knows ?result .FILTER ( ?knows = foaf:knows )

}FILTER ( !bound(?knows) )

}

Page 64: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 64

Experiment – Single Query

● In the ideal case for Bupper

= [ bcur

, bcur

] :

● pupper

( bcur

, Bupper

) = c( bcur

, [ ] ) – c( bcur

, Bupper

) = c( bcur

, [ ] )

● supper

( bcur

, Bupper

) = b( bcur

, Bupper

) – b( bcur

, [ ] ) = 0

1 Averaged over all 100 queries

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse27.37

(140.49)

0.849(0.205)

64.95 s(124.50)

reuse per query26.71

(148.77)

1(0)

0.02 s(0.07)

Page 65: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 65

Experiment – Single Query

● Summary (measurement errors aside):● Same number of query results● Significant improvements in query performance

1 Averaged over all 100 queries

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse27.37

(140.49)

0.849(0.205)

64.95 s(124.50)

reuse per query26.71

(148.77)

1(0)

0.02 s(0.07)

Page 66: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 66

Experiment – Complete Sequence

● Summary:● Data cache may provide for additional query results● Impact on performance may be positive but also negative

1 Averaged over all 100 queries

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse27.37

(140.49)

0.849(0.205)

64.95 s(124.50)

reuse per query26.71

(148.77)

1(0)

0.02 s(0.07)

reuse all queries44.87

(178.36)

0.991(0.053)

37.91 s(112.94)

Page 67: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 67

Experiment – Complete Sequence

● Executing the query sequence in a random order results in measurements similar to the given order.

Experiment Avg.1 number of Query Results

(std.dev.)

Average1

Hit Rate

(std.dev.)

Avg.1 query Execution Time

(std.dev.)

no reuse27.37

(140.49)

0.849(0.205)

64.95 s(124.50)

reuse per query26.71

(148.77)

1(0)

0.02 s(0.07)

reuse all queries44.87

(178.36)

0.991(0.053)

37.91 s(112.94)

reuse all queries(random orders)

118.18(867.07)

0.992(0.016)

20.61 s(216.61)

Page 68: The Impact of Data Caching of on Query Execution for Linked Data

Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 68

These slides have been created byOlaf Hartig

http://olafhartig.de

This work is licensed under aCreative Commons Attribution-Share Alike 3.0 License

(http://creativecommons.org/licenses/by-sa/3.0/)