SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions

Post on 09-Jul-2015

Institute for Web Science and Technologies

University of Koblenz ▪ Landau, Germany

SPLENDID: SPARQL Endpoint Federation

Exploiting VOID Descriptions

Olaf Görlitz, Steffen Staab

Slide 2 · WeST Institute · People and Knowledge Networks

Olaf Görlitz · COLD 2011, Bonn, Germany

Motivation

How to access a large number of linked data sources?

Data Integration Approaches

Data Warehouse

• Efficient query execution
• Complete results
• Data copies
• Inflexible

Link Traversal

• Live data access
• Flexible / on demand
• Incomplete results
• Biased by starting point

Our Approach

• Live data access
• Flexible source integration
• Effective query planning
• Complete results

Data Federation

Hypothesis: Efficient query federation is possible using core Semantic Web technology (i.e. SPARQL endpoints, VoID descriptions)

VoID: "Vocabulary of Interlinked Datasets"

Contents of a VoID description (example figures):
• General information
• Basic statistics: triples = 732744
• Type statistics: chebi:Compound = 50477
• Predicate statistics: bio:formula = 39555
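
A minimal Python sketch of how statistics like these could be turned into lookup structures for source selection. The dictionary layout and the function name `build_indexes` are assumptions made for illustration and are not taken from the SPLENDID code; only the three ChEBI figures come from the slide.

```python
from collections import defaultdict

# Hypothetical in-memory form of one VoID description.
# The numbers are the ones shown on the slide; the layout is an assumption.
chebi_void = {
    "dataset": "chebi",
    "triples": 732744,
    "types": {"chebi:Compound": 50477},
    "predicates": {"bio:formula": 39555},
}


def build_indexes(void_descriptions):
    """Build predicate -> {source: count} and type -> {source: count} maps."""
    predicate_index = defaultdict(dict)
    type_index = defaultdict(dict)
    for desc in void_descriptions:
        source = desc["dataset"]
        for predicate, count in desc["predicates"].items():
            predicate_index[predicate][source] = count
        for rdf_type, count in desc["types"].items():
            type_index[rdf_type][source] = count
    return dict(predicate_index), dict(type_index)


predicate_index, type_index = build_indexes([chebi_void])
print(predicate_index["bio:formula"])   # {'chebi': 39555}
print(type_index["chebi:Compound"])     # {'chebi': 50477}
```

Keeping the counts next to the source names means the same structures can later serve both source selection and cardinality estimation.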

Distributed Query Processing

Contribution: Apply Best Practices of RDBMS for RDF Federation

http://code.google.com/p/rdffederator/

Query Example

SELECT ?drug ?title WHERE {
  ?drug drugbank:drugCategory category:micronutrient .
  ?drug drugbank:casRegistryNumber ?id .
  ?keggDrug rdf:type kegg:Drug .
  ?keggDrug bio2rdf:xRef ?id .
  ?keggDrug purl:title ?title .
}

Which drugs are categorized as micronutrients?

Query Processing

Source Selection → Join Optimization → Query Execution

1. Step: Index-based source mapping

predicate-index: drugbank:drugCategory → drugbank
type-index: kegg:Drug → kegg

Triple pattern → mapped sources:
  ?drug drugbank:drugCategory category:micronutrient .  → drugbank
  ?drug drugbank:casRegistryNumber ?id .                → drugbank
  ?keggDrug rdf:type kegg:Drug .                        → kegg
  ?keggDrug bio2rdf:xRef ?id .                          → kegg
  ?keggDrug purl:title ?title .                         → kegg, dbpedia, Chebi
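
The index lookup can be sketched in Python roughly as follows. This is an illustration under assumptions, not the SPLENDID implementation: the function `map_sources` is hypothetical, and apart from the two index entries named on the slide (drugbank:drugCategory and kegg:Drug) the index contents are filled in so that they reproduce the source lists shown for the example query.

```python
# Hypothetical index contents. Only the drugbank:drugCategory and kegg:Drug
# entries are named on the slide; the rest are assumed for the example.
ALL_SOURCES = {"drugbank", "kegg", "dbpedia", "Chebi"}
PREDICATE_INDEX = {
    "drugbank:drugCategory": {"drugbank"},
    "drugbank:casRegistryNumber": {"drugbank"},
    "bio2rdf:xRef": {"kegg"},
    "purl:title": {"kegg", "dbpedia", "Chebi"},
    "rdf:type": set(ALL_SOURCES),
}
TYPE_INDEX = {"kegg:Drug": {"kegg"}}


def is_variable(term):
    return term.startswith("?")


def map_sources(pattern):
    """Return the candidate sources for one (subject, predicate, object) pattern."""
    _, predicate, obj = pattern
    if is_variable(predicate):
        # Nothing to look up: every source is a candidate.
        return set(ALL_SOURCES)
    if predicate == "rdf:type" and not is_variable(obj):
        # A bound class is resolved through the type index.
        return set(TYPE_INDEX.get(obj, ()))
    return set(PREDICATE_INDEX.get(predicate, ()))


QUERY = [
    ("?drug", "drugbank:drugCategory", "category:micronutrient"),
    ("?drug", "drugbank:casRegistryNumber", "?id"),
    ("?keggDrug", "rdf:type", "kegg:Drug"),
    ("?keggDrug", "bio2rdf:xRef", "?id"),
    ("?keggDrug", "purl:title", "?title"),
]

for pattern in QUERY:
    print(" ".join(pattern), "->", sorted(map_sources(pattern)))
```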

2. Step: Refinement with ASK Queries

No index for subject / object values: triple patterns with a bound subject or object are checked against the candidate sources with SPARQL ASK queries.
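
A small Python sketch of this refinement step, under stated assumptions: `refine_sources` and `fake_ask` are hypothetical names, the endpoint call is replaced by a stub so the example runs offline, and the candidate set in the usage lines is made up. PREFIX declarations are omitted as on the slides.

```python
def is_variable(term):
    return term.startswith("?")


def has_bound_subject_or_object(pattern):
    subject, _, obj = pattern
    return not is_variable(subject) or not is_variable(obj)


def ask_query(pattern):
    # PREFIX declarations are omitted here; a real request would need them.
    return "ASK { %s %s %s }" % pattern


def refine_sources(pattern, candidates, ask):
    """Keep only the candidates whose endpoint answers the ASK query with true.

    `ask(source, query)` is supplied by the caller and stands in for an
    HTTP request to the SPARQL endpoint of `source`.
    """
    if not has_bound_subject_or_object(pattern):
        return set(candidates)  # nothing to verify
    query = ask_query(pattern)
    return {source for source in candidates if ask(source, query)}


def fake_ask(source, query):
    # Stub endpoint: pretend only drugbank contains matching triples.
    return source == "drugbank"


pattern = ("?drug", "drugbank:drugCategory", "category:micronutrient")
print(ask_query(pattern))
# ASK { ?drug drugbank:drugCategory category:micronutrient }
print(refine_sources(pattern, {"drugbank", "dbpedia"}, fake_ask))
# {'drugbank'}
```

Passing the endpoint call in as a function keeps the selection logic testable without network access.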

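3. Step: Grouping Triple Patterns

Triple patterns mapped to the same single source are grouped:
• drugbank: ?drug drugbank:drugCategory category:micronutrient . ?drug drugbank:casRegistryNumber ?id .
• kegg: ?keggDrug rdf:type kegg:Drug . ?keggDrug bio2rdf:xRef ?id .
• kegg, dbpedia, Chebi: ?keggDrug purl:title ?title .

+ grouping sameAs patterns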
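
The grouping can be sketched in Python as below. The function `group_patterns` is a hypothetical helper, not SPLENDID's code, and it covers only the single-source grouping; the additional grouping of sameAs patterns mentioned on the slide is not modelled.

```python
from collections import defaultdict


def group_patterns(pattern_sources):
    """Group patterns that have exactly one source into one sub-query per source.

    `pattern_sources` is a list of (pattern, set_of_sources) pairs.
    Returns (exclusive_groups, remaining), where `remaining` holds the
    patterns that must still be sent to several sources.
    """
    exclusive_groups = defaultdict(list)
    remaining = []
    for pattern, sources in pattern_sources:
        if len(sources) == 1:
            (source,) = sources
            exclusive_groups[source].append(pattern)
        else:
            remaining.append((pattern, sources))
    return dict(exclusive_groups), remaining


# Source assignment of the example query after source selection.
selected = [
    (("?drug", "drugbank:drugCategory", "category:micronutrient"), {"drugbank"}),
    (("?drug", "drugbank:casRegistryNumber", "?id"), {"drugbank"}),
    (("?keggDrug", "rdf:type", "kegg:Drug"), {"kegg"}),
    (("?keggDrug", "bio2rdf:xRef", "?id"), {"kegg"}),
    (("?keggDrug", "purl:title", "?title"), {"kegg", "dbpedia", "Chebi"}),
]

groups, remaining = group_patterns(selected)
for source, patterns in groups.items():
    print(source, "<-", len(patterns), "patterns in one sub-query")
print(len(remaining), "pattern(s) sent to multiple sources")
```

Each group can be shipped as a single sub-query, so the join between its patterns is evaluated at the endpoint instead of at the federator.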

Join Order Optimization

bind join / hash join
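
To illustrate the difference between the two join operators, here is a small in-memory Python sketch. It is an assumption-laden illustration only: real bind joins send the bound values to a remote endpoint, which is replaced here by the hypothetical callback `fetch_right`, and all binding values are made up.

```python
def hash_join(left, right, var):
    """Fetch both inputs completely, then join them locally on `var`."""
    table = {}
    for row in right:
        table.setdefault(row[var], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[var], [])]


def bind_join(left, fetch_right, var):
    """Use each left binding of `var` to restrict the right-hand request.

    `fetch_right(value)` stands in for a remote sub-query in which `var`
    has been replaced by `value`.
    """
    results = []
    for l in left:
        for r in fetch_right(l[var]):
            results.append({**l, **r})
    return results


# Made-up bindings for illustration.
drugbank_rows = [
    {"?drug": "drug:A", "?id": "id-1"},
    {"?drug": "drug:B", "?id": "id-2"},
]
kegg_rows = [
    {"?keggDrug": "kegg:X", "?id": "id-1"},
    {"?keggDrug": "kegg:Y", "?id": "id-3"},
]


def fetch_kegg(value):
    # Stub for a remote request with ?id bound to `value`.
    return [row for row in kegg_rows if row["?id"] == value]


print(hash_join(drugbank_rows, kegg_rows, "?id"))
print(bind_join(drugbank_rows, fetch_kegg, "?id"))
```

Both calls return the same joined rows. The trade-off is in data transfer: the hash join retrieves the full right-hand input once, while the bind join issues one restricted request per left-hand binding.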

Dynamic Programming with statistics-based cost estimation
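
A compact Python sketch of dynamic programming over join orders. The cost model is a deliberate simplification and an assumption of this example, not SPLENDID's: every join uses the same fixed `selectivity`, and the cost of a plan is the sum of its estimated intermediate result sizes. The function name and the cardinalities are hypothetical.

```python
from itertools import combinations


def best_join_order(cardinalities, selectivity=0.01):
    """Return (cost, estimated_rows, plan) for joining all inputs.

    `cardinalities` maps an input name to its estimated result size.
    Plans are nested tuples, so bushy plans are allowed.
    """
    names = sorted(cardinalities)
    # best[subset] = (cost, estimated_rows, plan)
    best = {frozenset([n]): (0.0, float(cardinalities[n]), n) for n in names}
    for size in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, size)):
            # Try every split of the subset into two smaller, already solved parts.
            for k in range(1, size // 2 + 1):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    left_cost, left_rows, left_plan = best[left]
                    right_cost, right_rows, right_plan = best[right]
                    rows = left_rows * right_rows * selectivity
                    cost = left_cost + right_cost + rows
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, rows, (left_plan, right_plan))
    return best[frozenset(names)]


# Made-up cardinality estimates for the three sub-queries of the example.
estimates = {"drugbank": 50, "kegg": 8000, "title": 100000}
cost, rows, plan = best_join_order(estimates)
print(plan)
print(round(cost))
```

With these estimates the two small inputs are joined first, because that keeps the intermediate result small before the large `title` input is added.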

Evaluation

Orthogonal State-of-the-Art approaches:

                    DARQ                     AliBaba      FedX                          SPLENDID
Statistics          ServiceDesc              –            –                             VoID
Source Selection    Statistics (predicates)  All sources  ASK queries                   Statistics + ASK queries
Query Optimization  DynProg                  Heuristics   Heuristics                    DynProg
Query Execution     Bind join                Bind join    Bound Join + parallelization  Bind Join + Hash Join

FedBench Evaluation Suite
• Life Science + Cross Domain Data
• different query characteristics

Measuring
• #data sources selected
• query execution time

Evaluation: Source Selection

[Figure: source selection results; labels: rdf:type, owl:sameAs]

Evaluation: Query Optimization

Conclusion

VoID-based query federation is efficient

Publish more VoID descriptions!

What next?
• Combination with FedX
• Improving estimation and cost model
• Integrating SPARQL 1.1 features
