Strategies for executing federated queries in SPARQL1.1

Carlos Buil-Aranda (1), Axel Polleres (2) and Jürgen Umbrich (2)

1. Center for Semantic Web Research (CIWS), DCC, PUC, Chile
2. WU Wien (Vienna University of Economics & Business)




SPARQL Federated Query, i.e. query all these databases as if they were a single one

How can I query all these data?


I want a list of mouse phenotypes and their symbols


Now I want to combine the symbols with standard scientific terminology


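SPARQL 1.1 Federated Query Extension

• Use of the SERVICE keyword
• VALUES operator for shipping data to the remote endpoint

    # Predicate names are abbreviated on the slide (prefixes omitted)
    SELECT ?mgi ?symbol ?status WHERE {
      # First endpoint: markers with their symbol and HGNC link
      SERVICE <http://mgi.bio2rdf.org/sparql> {
        ?mgi xSymbol ?symbol .
        ?mgi xHGNX ?xhgnc
      }
      # Second endpoint: status of each linked HGNC entry
      SERVICE <http://hgnc.bio2rdf.org/sparql> {
        ?xhgnc Status ?status
      }
    }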


Some problems related to query federation:

Time outs

Result set incompleteness


We will try to find algorithms for dealing with these problems in SPARQL query federation


How is the federation implemented? (general idea)

The systems basically want to reduce the amount of data transmitted…

…and the amount of processing time needed by the remote server

As users, we want sound and complete results


How is the federation implemented? (general idea)

Two ideas for the query federation algorithms:

• Query one dataset and use its results to constrain the query to the next dataset
• Query both datasets and join their results locally


SPARQL Query Federation Algorithms

Use combinations of SPARQL operators

VALUES (implicitly used)

FILTER

UNION (a variant used in FedX)

Use well-known database algorithms

Nested Loop Join (used in Virtuoso, Sesame and Jena-Fuseki)

Hash Join (used in SIHJoin)


How federated queries are executed (general case)


How federated queries are executed (general case)

[Diagram. Labels: Local Federation Implementation; SPARQL Endpoint 1 (Virtuoso, Jena, Sesame); SPARQL Endpoint 2 (Virtuoso, Jena, Sesame); SERVICE query 1; SERVICE query 2]


How federated queries are executed (only using SERVICE)

[Diagram. Labels: Local Federation Implementation; SPARQL Endpoint 1 (Virtuoso, Jena, Sesame); SPARQL Endpoint 2 (Virtuoso, Jena, Sesame); SERVICE queries 1 and 2; SERVICE query 2]


SERVICE implementation using VALUES

Query:

    SELECT * (
      service (mgi) (
        (?X xHGNX ?xhgnc) AND
        (?X xSymbol ?symbol)
      )
      AND
      service (hgnc) (
        (?xhgnc status ?status)
      )
    )

Results of service (mgi):

    ?X           ?symbol  ?xhgnc
    mgi:1913386  Znrd1    hgnc:13182
    mgi:3039616  Znrf3    hgnc:18126
    mgi:2443415  Zpld1    hgnc:27022

Second service call, rewritten with VALUES:

    service (hgnc) (
      (?xhgnc status ?status)
      VALUES { (?xhgnc) (hgnc:13182, hgnc:18126, hgnc:27022) }
    )
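The VALUES rewriting can be sketched in a few lines of Python. This is only an illustration, not the implementation used by any of the systems named in the slides: the helper names `values_clause` and `service_with_values` are hypothetical, and the output uses standard SPARQL 1.1 VALUES syntax rather than the abbreviated notation of the slide.

```python
def values_clause(var, terms):
    """Render a one-variable SPARQL 1.1 VALUES block from already-serialized RDF terms."""
    # dict.fromkeys drops duplicate bindings while keeping first-seen order
    rows = " ".join(f"({t})" for t in dict.fromkeys(terms))
    return f"VALUES (?{var}) {{ {rows} }}"


def service_with_values(endpoint, pattern, var, terms):
    """Ship the bindings of the first SERVICE call along with the second pattern."""
    return f"SERVICE <{endpoint}> {{ {pattern} {values_clause(var, terms)} }}"


# ?xhgnc bindings taken from the slide's intermediate result table
xhgnc = ["hgnc:13182", "hgnc:18126", "hgnc:27022"]
print(service_with_values(
    "http://hgnc.bio2rdf.org/sparql", "?xhgnc status ?status .", "xhgnc", xhgnc
))
```

A single remote request carries all bindings, which is what keeps the number of round trips low with this strategy.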


SERVICE implementation using FILTER

Query:

    SELECT * (
      service (mgi) (
        (?X xHGNX ?xhgnc) AND
        (?X xSymbol ?symbol)
      )
      AND
      service (hgnc) (
        (?xhgnc status ?status)
      )
    )

Results of service (mgi):

    ?X           ?symbol  ?xhgnc
    mgi:1913386  Znrd1    hgnc:13182
    mgi:3039616  Znrf3    hgnc:18126
    mgi:2443415  Zpld1    hgnc:27022

Second service call, rewritten with FILTER:

    service (hgnc) (
      (?xhgnc status ?status)
      FILTER(?xhgnc = hgnc:13182 ||
             ?xhgnc = hgnc:18126 ||
             ?xhgnc = hgnc:27022)
    )
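A minimal Python sketch of how such a FILTER could be generated from the bindings of the first SERVICE call. The helper name `filter_clause` is hypothetical, and the terms are assumed to be already serialized as prefixed names.

```python
def filter_clause(var, terms):
    """Render FILTER(?var = t1 || ?var = t2 || ...) for already-serialized RDF terms."""
    if not terms:
        # An empty disjunction has no SPARQL rendering, so refuse to build one
        raise ValueError("at least one binding is needed to build the FILTER")
    tests = " || ".join(f"?{var} = {t}" for t in terms)
    return f"FILTER({tests})"


print(filter_clause("xhgnc", ["hgnc:13182", "hgnc:18126", "hgnc:27022"]))
```

The disjunction grows with the number of bindings, so the length of the remote query is the practical limit of this approach.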


SERVICE implementation using FILTER + UNION

Query:

    SELECT * (
      service (mgi) (
        (?X xHGNX ?xhgnc) AND
        (?X xSymbol ?symbol)
      )
      AND
      service (hgnc) (
        (?xhgnc status ?status)
      )
    )

Results of service (mgi):

    ?X           ?symbol  ?xhgnc
    mgi:1913386  Znrd1    hgnc:13182
    mgi:3039616  Znrf3    hgnc:18126
    mgi:2443415  Zpld1    hgnc:27022

Second service call, rewritten as a UNION of filtered calls:

    (service (hgnc) (
       ?xhgnc status ?status
       FILTER(?xhgnc = hgnc:13182)))
    UNION
    (service (hgnc) (
       ?xhgnc status ?status
       FILTER(?xhgnc = hgnc:18126)))
    UNION
    (service (hgnc) (
       ?xhgnc status ?status
       FILTER(?xhgnc = hgnc:27022)))
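The UNION variant can be sketched the same way: one filtered SERVICE branch per binding, glued together with UNION. The function name `union_of_services` is hypothetical and the sketch emits standard SPARQL 1.1 syntax; it is not meant to reproduce how FedX builds its queries.

```python
def union_of_services(endpoint, pattern, var, terms):
    """Build one SERVICE branch per binding and combine the branches with UNION."""
    branches = [
        f"{{ SERVICE <{endpoint}> {{ {pattern} FILTER(?{var} = {t}) }} }}"
        for t in terms
    ]
    return " UNION ".join(branches)


print(union_of_services(
    "http://hgnc.bio2rdf.org/sparql",
    "?xhgnc status ?status .",
    "xhgnc",
    ["hgnc:13182", "hgnc:18126", "hgnc:27022"],
))
```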


SERVICE implementation using a Nested Loop Join

Query:

    SELECT * (
      service (mgi) (
        (?X xHGNX ?xhgnc) AND
        (?X xSymbol ?symbol)
      )
      AND
      service (hgnc) (
        (?xhgnc status ?status)
      )
    )

Results of service (mgi):

    ?X           ?symbol  ?xhgnc
    mgi:1913386  Znrd1    hgnc:13182
    mgi:3039616  Znrf3    hgnc:18126
    mgi:2443415  Zpld1    hgnc:27022

Second service call, rewritten as one call per binding:

    service (hgnc) ( (hgnc:13182 status ?status) )
    service (hgnc) ( (hgnc:18126 status ?status) )
    service (hgnc) ( (hgnc:27022 status ?status) )
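A small Python sketch of the nested loop idea, with the remote endpoint replaced by an in-memory list. The names `HGNC_TRIPLES`, `call_hgnc` and `nested_loop_join` are hypothetical, and the status triples are illustrative data rather than the contents of the real endpoint.

```python
# Illustrative stand-in for the hgnc endpoint (assumed data, not the real dataset)
HGNC_TRIPLES = [
    ("hgnc:13182", "status", "Approved"),
    ("hgnc:18126", "status", "Approved"),
]


def call_hgnc(subject):
    """Simulate one SERVICE call for the pattern (subject status ?status)."""
    return [{"status": o} for s, p, o in HGNC_TRIPLES if s == subject and p == "status"]


def nested_loop_join(left_rows, join_var):
    """Issue one remote call per left row, substituting the join variable by its value."""
    joined = []
    for row in left_rows:
        for match in call_hgnc(row[join_var]):
            joined.append({**row, **match})
    return joined


mgi_rows = [
    {"X": "mgi:1913386", "symbol": "Znrd1", "xhgnc": "hgnc:13182"},
    {"X": "mgi:3039616", "symbol": "Znrf3", "xhgnc": "hgnc:18126"},
    {"X": "mgi:2443415", "symbol": "Zpld1", "xhgnc": "hgnc:27022"},
]
result = nested_loop_join(mgi_rows, "xhgnc")
print(len(result), "joined rows")
```

The sketch makes one call per left row, which is why this strategy becomes expensive on large intermediate results. It also assumes the join variable is bound in every left row; a row without it would raise a KeyError here.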


SERVICE implementation using a Symmetric Hash Join

Query:

    SELECT * (
      service (mgi) (
        (?X xHGNX ?xhgnc) AND
        (?X xSymbol ?symbol)
      )
      AND
      service (hgnc) (
        (?xhgnc status ?status)
      )
    )

Results of service (mgi):

    ?X           ?symbol  ?xhgnc
    mgi:1913386  Znrd1    hgnc:13182
    mgi:3039616  Znrf3    hgnc:18126
    mgi:2443415  Zpld1    hgnc:27022

Results of service (hgnc):

    ?status     ?xhgnc
    "Approved"  hgnc:13182
    "Approved"  hgnc:18126
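A compact Python sketch of a symmetric hash join over the two result sets. It is a textbook version of the technique, not the SIHJoin implementation: rows from both inputs are consumed alternately, each row is inserted into its own hash table and immediately probes the other one. The function name and the interleaving via `zip_longest` are assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def symmetric_hash_join(left, right, key):
    """Join two row streams on `key`, emitting matches as soon as both sides have arrived."""
    left_table, right_table = defaultdict(list), defaultdict(list)
    for l_row, r_row in zip_longest(left, right):
        if l_row is not None:
            left_table[l_row[key]].append(l_row)
            # Probe the rows already received from the right input
            for match in right_table.get(l_row[key], ()):
                yield {**l_row, **match}
        if r_row is not None:
            right_table[r_row[key]].append(r_row)
            # Probe the rows already received from the left input
            for match in left_table.get(r_row[key], ()):
                yield {**match, **r_row}


mgi_rows = [
    {"X": "mgi:1913386", "symbol": "Znrd1", "xhgnc": "hgnc:13182"},
    {"X": "mgi:3039616", "symbol": "Znrf3", "xhgnc": "hgnc:18126"},
    {"X": "mgi:2443415", "symbol": "Zpld1", "xhgnc": "hgnc:27022"},
]
hgnc_rows = [
    {"status": "Approved", "xhgnc": "hgnc:13182"},
    {"status": "Approved", "xhgnc": "hgnc:18126"},
]
joined = list(symmetric_hash_join(mgi_rows, hgnc_rows, "xhgnc"))
print(len(joined), "joined rows")
```

Both remote queries are sent unchanged and the join happens locally, so nothing is injected into the remote query. The sketch assumes the join key is bound in every row.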


Strategies Evaluation - Example

Local data:

    :bob foaf:knows :peter .

Remote data:

    :bob :works :riva .
    :bob :works :italy .

Query:

    SELECT * WHERE {
      SERVICE <example_server1> {
        ?X foaf:knows :peter
      } .
      SERVICE <example_server2> {
        { ?Y :works :italy }
        UNION
        { ?X :works :riva }
      }
    }

Results are:

    {?X -> :bob}, {?Y -> :bob, ?X -> :bob}
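The two expected results can be checked by hand with a small Python sketch of the SPARQL join over solution mappings, here represented as plain dicts. The helper names `compatible` and `join` are hypothetical; two mappings are compatible when they agree on every variable they share.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1, omega2):
    """Merge every compatible pair of mappings from the two result sets."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


server1 = [{"X": ":bob"}]                 # ?X foaf:knows :peter
server2 = [{"Y": ":bob"}, {"X": ":bob"}]  # the two UNION branches
results = join(server1, server2)
print(results)
```

The mapping that binds only ?Y shares no variable with the mapping from the first server, so it joins with it. That is where the second result comes from.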


Strategies Evaluation - Example (using FILTER & UNION)

However, if we evaluate that query over these data with the configurations described before, we obtain the following number of results:

    Strategy  Jena-Fuseki  Sesame  Virtuoso
    SERVICE   2            2       2
    VALUES    2            2       2
    FILTER    2            1       1
    UNION     2            1       1
    NESTED    2            2       2
    SYMHASH   1            1       1


VALUES example (using OPTIONAL)

Remote Query:

    SELECT * WHERE {
      ?person1 foaf:knows ?person2
      OPTIONAL { ?person1 :works ?place }
    }
    VALUES (?place) { (:riva) (:rome) }

Remote Data:

    :bob foaf:knows :peter .
    :peter foaf:knows :alice .
    :bob :works :puc .

Values shipped with the query:

    ?place
    :riva
    :rome

Result tables shown on the slide:

    ?person1  ?person2  ?place
    :bob      :peter    :riva
    :peter    :alice    :rome
    :peter    :alice    :riva

    ?person1  ?person2  ?place
    :bob      :peter    :riva
    :peter    :mark
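The effect of shipping VALUES to a query that contains OPTIONAL can be reproduced with a short Python sketch over the remote data. The helpers `compatible`, `join` and `left_join` are hypothetical names for the standard SPARQL operations on solution mappings; the sketch evaluates only the remote query and does not model the local side.

```python
def compatible(m1, m2):
    """Two mappings are compatible if they agree on all shared variables."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1, omega2):
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def left_join(omega1, omega2):
    """OPTIONAL: extend a mapping when possible, otherwise keep it unchanged."""
    out = []
    for m1 in omega1:
        extended = [{**m1, **m2} for m2 in omega2 if compatible(m1, m2)]
        out.extend(extended or [m1])
    return out


# Matches of the two triple patterns over the remote data
knows = [{"person1": ":bob", "person2": ":peter"},
         {"person1": ":peter", "person2": ":alice"}]
works = [{"person1": ":bob", "place": ":puc"}]
values = [{"place": ":riva"}, {"place": ":rome"}]

optional_result = left_join(knows, works)   # :peter has no ?place
remote_answer = join(optional_result, values)
print(remote_answer)
```

:peter has no :works triple, so ?place stays unbound in his row and that row is compatible with both shipped values. The remote answer therefore pairs :peter with :riva and :rome although the data never says so.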


There are (at least) two problems we may find

With FILTER and UNION strategies we may miss results

With VALUES we get mixed data plus "unwanted" results

There are other problems with NESTED (related to variable substitution)
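One way a result can go missing with FILTER is sketched below in Python, using the earlier :bob example. In SPARQL, a comparison on an unbound variable raises an error and the solution is dropped. Whether this is exactly what happens inside each engine is not stated in the slides; the helper names are hypothetical.

```python
def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1, omega2):
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def filter_equals(mapping, var, value):
    """FILTER(?var = value): an unbound ?var is an error, so the mapping is rejected."""
    return var in mapping and mapping[var] == value


server1 = [{"X": ":bob"}]
server2 = [{"Y": ":bob"}, {"X": ":bob"}]  # results of the UNION without any FILTER

# Remote results once FILTER(?X = :bob) has been added to the second query
filtered = [m for m in server2 if filter_equals(m, "X", ":bob")]
print(len(join(server1, server2)), "results without FILTER")
print(len(join(server1, filtered)), "result with FILTER")
```

The mapping that binds only ?Y is rejected by the filter, so one of the two expected results is lost.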


Consider now this query with real data:

    SELECT * WHERE {
      SERVICE <http://localhost:3030/mgi/sparql> {
        # 250923 results
        ?s rdf:type ?type .
        OPTIONAL { ?s mgi:xHGNC ?hgnc_link }
      }
      SERVICE <http://localhost:3130/hgnc/sparql> {
        # 23 results
        ?hgnc_link hgnc:status "Approved" .
        ?hgnc_link hgnc:date_modified ?date .
        FILTER(?date < "1995-01-01")
      }
    }


Strategies Evaluation - Bio2RDF Example

Evaluating that query over these data with the configurations described before, we obtain the following number of results (to = timeout):

    Strategy  Jena-Fuseki  Sesame     Virtuoso
    SERVICE   5,361,421    5,361,421  to
    VALUES    6            6          23
    FILTER    6            6          6
    UNION     6            6          6
    NESTED    5,361,421    5,361,421  to
    SYMHASH   6            6          6


Why do all these inconsistent results happen?

The empty mapping is the join identity: it matches with everything

It is not null rejecting!

How can we fix it? By preventing the injection of null values into the remote query

Besides, substituting a variable with a value that does not belong to the database is not a good idea
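The join-identity behaviour can be made concrete with a short Python sketch. The rows below are made-up illustrations, and `split_by_bound` is a hypothetical helper that only separates the rows whose join variable is bound; it is not the fix proposed in the talk, and what to do with the unbound rows is left open here.

```python
def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(omega1, omega2):
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def split_by_bound(rows, var):
    """Separate rows where `var` is bound from rows where it is not."""
    bound = [r for r in rows if var in r]
    unbound = [r for r in rows if var not in r]
    return bound, unbound


# Illustrative rows: the second local row comes from an unmatched OPTIONAL
local = [{"s": "mgi:99926", "hgnc_link": "hgnc:7415"}, {"s": "mgi:102492"}]
remote = [{"hgnc_link": "hgnc:7415", "status": "Approved"},
          {"hgnc_link": "hgnc:13182", "status": "Approved"}]

identity_holds = join([{}], remote) == remote  # the empty mapping joins with everything
all_rows = join(local, remote)                 # the unbound row matches every remote row
bound, unbound = split_by_bound(local, "hgnc_link")
bound_only = join(bound, remote)
print(len(all_rows), "rows with the unbound row,", len(bound_only), "without it")
```

Each row with an unbound join variable multiplies with the whole remote result, which is the kind of blow-up visible in the Bio2RDF example.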


One possible fix: Strongly Bound syntactic restriction [1,2]

(?mgi, type, ?type)

    ?mgi         ?type
    mgi:4420798  mgi:Marker
    …

(?mgi, type, ?type) OPTIONAL (?mgi, works, ?link_hgnc)

    ?mgi        ?link_hgnc
    mgi:99926   hgnc:7415
    mgi:102492
    …           ..

(?mgi1, pos, "-1") UNION (?mgi2, pos, "74.83")

    ?mgi1  ?mgi2
    mgi:99926
    mgi:1916948
    …      …


(Fixed) Strategies evaluation with real data: which one is best? (data installed in a local network)


Evaluation Query Set

    Query  CARD Q1  CARD Q2  #triple patterns Q1  #triple patterns Q2  CARD Q
    B0     27       1        1                    1                    1
    B1     27       33562    1                    1                    1
    B2     17817    33562    2                    1                    17547
    B3     16753    2        2                    3                    2
    B4     250924   23       2                    2                    6
    B5     16753    8771     2                    2                    3274
    B6     268743   27132    3                    7                    23873
    B7     35636    33134    3                    4                    17545


(Time) Results on Jena-Fuseki (in milliseconds; to = timeout)

    Strategy  B0    B1     B2     B3     B4     B5     B6     B7
    SERVICE   1436  642    31620  39119  *      *      *      to
    FILTER    439   360    7235   6022   13633  3638   26577  62947
    UNION     730   678    10657  15033  22269  7732   60814  63335
    VALUES    211   227    8247   5223   error  error  error  11646
    NESTED    758   643    52160  55445  *      *      *      to
    SYMHASH   403   16462  17563  1937   9494   13082  53591  28759


Results on Sesame

    Strategy  B0   B1     B2     B3     B4     B5     B6     B7
    SERVICE   77   320    1555   73     *      *      *      1968
    FILTER    149  596    4340   4788   7815   3230   18746  7032
    UNION     260  645    6415   10807  12502  5443   43648  13673
    VALUES    167  153    4289   2893   error  error  error  7042
    NESTED    443  385    52051  62083  to     to     to     89487
    SYMHASH   101  14520  15037  1203   4662   5409   20915  21242


Results on Virtuoso

    Strategy  B0     B1     B2     B3      B4     B5     B6     B7
    SERVICE   error  error  error  error   error  error  error  error
    FILTER    159    159    1731   31720   26329  31324  68749  37444
    UNION     267    237    30904  72495   55321  3737   7134   11990
    VALUES    137    117    6611   7561    error  error  error  13291
    NESTED    559    500    88240  128399  *      to     to     196280
    SYMHASH   102    1905   2733   2205    5525   2306   8149   3407


Conclusions

Algorithms for SPARQL query federation are not as straightforward as they seem

It is important to check their correctness

Which strategy/algorithm should I use?

Depends on the remote server

SYMHASH and FILTER usually perform well (real data, controlled scenario)

If these two do not work well for the setup, use UNION/VALUES (when possible)


Thanks.


Any questions?


References

[1] Semantics and optimization of the SPARQL 1.1 federation extension. C Buil-Aranda, M Arenas, O Corcho. The Semantic Web: Research and Applications, 1-15.

[2] Federating queries in SPARQL 1.1: Syntax, semantics and evaluation. C Buil-Aranda, M Arenas, O Corcho, A Polleres. Web Semantics: Science, Services and Agents on the World Wide Web 18 (1), 1-17.

[3] SPARQL Web-Querying Infrastructure: Ready for Action? C Buil-Aranda, A Hogan, J Umbrich, PY Vandenbussche. The Semantic Web–ISWC 2013, 277-293.

[4] Strategies for executing federated queries in SPARQL1.1. C Buil-Aranda, A Polleres, J Umbrich. The Semantic Web–ISWC 2014, to appear.