linked data+top-k query processing


Upload: le-duc-thang

Post on 14-Apr-2018


7/27/2019 Linked Data+Top-K Query Processing

http://slidepdf.com/reader/full/linked-datatop-k-query-processing 1/24

Top-k Linked Data Query Processing


• Linked Data is about using the Web to connect related data that wasn't previously linked.

Linked Data - Definition


• Use URIs to identify things

• Use HTTP URIs so that these things can be referred to and looked up

• Provide useful information in RDF – when someone looks up a URI

• Include RDF links to other URIs – to enable discovery of related information

Linked Data - Principle


• URIs

 –  Globally unique identifiers for entities

 –  Pointers to data

• HTTP to access data on the Web

• RDF as a shared data model

• FORMATS ( RDF/XML, RDFa,…) / HYPERLINKS 

Linked Data - Component


• A resource can be basically anything

 –  E.g. persons, places, Web documents, abstract concepts

• Descriptions of resources

 –  Attributes

 –  Relations

• The framework contains:

 –  A data model

 –  Languages and syntaxes

RDF


• Data comes as a set of triples (subject, predicate, object)

• Subject: resources

• Predicate: properties

• Object: literals or resources

• Examples:

 –  ( Mount Baker , last eruption , 1880 )

 –  ( Mount Baker , location , Washington )

RDF Data Model
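The triple model above can be sketched in a few lines of Python. This is only an illustration: plain strings stand in for resources and literals, and the helper name `objects` is hypothetical, not part of any RDF library.

```python
# A set of (subject, predicate, object) triples, taken from the slide's examples.
# Plain strings stand in for URIs and literals (a simplification).
triples = {
    ("Mount Baker", "last eruption", "1880"),
    ("Mount Baker", "location", "Washington"),
}


def objects(triples, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


location = objects(triples, "Mount Baker", "location")
print(location)
```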


• RDF is also a graph model

 –  Triples as directed edges

 –  Subjects and objects as vertices

 –  Edges labeled by predicate

• Example:

 –  ( Mount Baker , last eruption , 1880 )

 –  ( Mount Baker , location , Washington )

RDF Data Model
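The graph reading of the same triples can be sketched as follows: each triple becomes a directed edge from subject to object, labeled with the predicate. The function name `to_graph` and the adjacency-list layout are assumptions made for illustration.

```python
from collections import defaultdict

# The two example triples from the slide, as (subject, predicate, object).
TRIPLES = [
    ("Mount Baker", "last eruption", "1880"),
    ("Mount Baker", "location", "Washington"),
]


def to_graph(triples):
    """Build a labeled directed graph: vertices plus outgoing edges per subject."""
    vertices = set()
    edges = defaultdict(list)  # subject -> [(predicate label, object), ...]
    for s, p, o in triples:
        vertices.update((s, o))   # subjects and objects are the vertices
        edges[s].append((p, o))   # the predicate labels the edge s -> o
    return vertices, dict(edges)


vertices, edges = to_graph(TRIPLES)
print(vertices)
print(edges)
```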


• URIs extend the concept of URLs

 –  Globally unique identifier for resources

 –  URL of a Web document usually used as its URI

 –  Attention: URIs identify not only Web documents

• Example:

 –  Me: http://olafhartig.de/~hartig/foaf.rdf#olaf

 –  RDF document about me: http://olafhartig.de/~hartig/foaf.rdf

 –  HTML document about me: http://olafhartig.de/~hartig/index.h

URI
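In the example, the URI of the person and the URI of the RDF document differ only in the fragment (`#olaf`). A minimal sketch with Python's standard `urllib.parse.urldefrag` shows how the document URI can be derived from the entity URI; that a client would fetch the document this way is an assumption about typical hash-URI handling, not something the slide states.

```python
from urllib.parse import urldefrag

# URI identifying the person (not a Web document), from the slide.
entity_uri = "http://olafhartig.de/~hartig/foaf.rdf#olaf"

# Splitting off the fragment yields the URI of the RDF document describing the person.
document_uri, fragment = urldefrag(entity_uri)

print(document_uri)
print(fragment)
```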


• Literals may occur in the object position of triples

• Represented by strings

• Literal strings are interpreted according to their datatype

 –  Datatype identified by a URI

 –  Common to use the XML Schema datatypes

 –  No datatype: interpreted as xsd:string

 –  Untyped literals may have language tags (e.g. @de)

Literal
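The rules for literals can be sketched as a small data class. The class name `Literal`, its fields, and the handful of datatypes it converts are illustrative assumptions; only the defaulting to `xsd:string` and the optional language tag come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the string representation
    datatype: Optional[str] = None  # datatype URI, if any
    language: Optional[str] = None  # language tag such as "de", if any

    def effective_datatype(self) -> str:
        """No datatype given: the literal is interpreted as xsd:string."""
        return self.datatype or XSD + "string"

    def to_python(self):
        """Interpret the lexical form for a few common XML Schema datatypes."""
        datatype = self.effective_datatype()
        if datatype == XSD + "integer":
            return int(self.lexical)
        if datatype == XSD + "double":
            return float(self.lexical)
        return self.lexical


year = Literal("1880", datatype=XSD + "integer")
name = Literal("Washington")
print(year.to_python(), name.effective_datatype())
```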


• Blank nodes represent unnamed, anonymous resources

 –  Not identified by a URI

• Blank node identifiers

 –  Identification of blank nodes in triple serializations

• Form: _:xyz

• Scope: a single RDF graph

Blank Nodes


• Extends the definition of RDF nodes and RDF triples

 –  RDF nodes: I, B, and L are pair-wise disjoint infinite sets of Internationalized Resource Identifiers (IRIs), blank nodes and literals, respectively

 –  RDF triple: (s, p, o) ∈ IB × I × IBL, where IL = I ∪ L, IB = I ∪ B and IBL = I ∪ B ∪ L

Linked Stream Data model
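The position constraints in the triple definition (subject from I ∪ B, predicate from I, object from I ∪ B ∪ L) can be checked mechanically. The classes `IRI`, `BlankNode`, `Lit` and the function `is_rdf_triple` below are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Lit:
    lexical: str


def is_rdf_triple(s, p, o) -> bool:
    """True if (s, p, o) is in IB x I x IBL."""
    return (
        isinstance(s, (IRI, BlankNode))          # subject: IRI or blank node
        and isinstance(p, IRI)                   # predicate: IRI only
        and isinstance(o, (IRI, BlankNode, Lit)) # object: any RDF node
    )


ok = is_rdf_triple(IRI("ex:beatles"), IRI("foaf:name"), Lit("The Beatles"))
bad = is_rdf_triple(Lit("The Beatles"), IRI("foaf:name"), IRI("ex:beatles"))
print(ok, bad)
```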


• White box architecture

 –  Implements all required components

 –  physical operators (e.g. windows, join, triple pattern matching)

 –  data structures (e.g. B+-Trees, hashtables)

 –  query generator/optimizer/executor 

• Black box architecture

 –  Uses existing RDF and data stream processing systems as sub-components

 –  Query rewriter, data translator and orchestrator among sub-components are needed

System architecture


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://example.com/">
  <foaf:Band rdf:about="http://example.com/Beatles/">
    <foaf:name>Beatles</foaf:name>
    <ex:album rdf:about="http://example.com/Beatles/Sgt_Pepper/">
      <ex:name>Sgt_Pepper</ex:name>
      <ex:song rdf:about="http://example.com/Beatles/Sgt_Pepper/Lucy">
        <ex:name>Lucy</ex:name>
      </ex:song>
    </ex:album>
    <ex:album rdf:about="http://example.com/Beatles/Help!/">
      <ex:name>Help!</ex:name>
      <ex:song rdf:about="http://example.com/Beatles/Help!/Help!">
      </ex:song>
    </ex:album>
  </foaf:Band>
</rdf:RDF>

Data model


• Get all songs from the Beatles' albums

PREFIX ex: <http://example.com/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://example.com/resource/>

SELECT * WHERE {
  ex:beatles ex:album ?album .
  ?album ex:song ?song .
}

Query
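To make the semantics of the two triple patterns concrete, here is a minimal sketch of evaluating them with a nested-loop join over an in-memory triple list. The data is the Beatles example used on the later slides; the functions `match` and `evaluate` are hypothetical helpers, not a SPARQL engine.

```python
DATA = [
    ("ex:beatles", "foaf:name", '"The Beatles"'),
    ("ex:beatles", "ex:album", "ex:sgt_pepper"),
    ("ex:beatles", "ex:album", "ex:help"),
    ("ex:sgt_pepper", "ex:song", '"Lucy"'),
    ("ex:help", "ex:song", '"Help!"'),
]

TP1 = ("ex:beatles", "ex:album", "?album")
TP2 = ("?album", "ex:song", "?song")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):              # variable: bind or check consistency
            if result.setdefault(term, value) != value:
                return None
        elif term != value:                   # constant: must be identical
            return None
    return result


def evaluate(patterns, data):
    """Join the patterns left to right, keeping only compatible bindings."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in data:
                new_binding = match(pattern, triple, binding)
                if new_binding is not None:
                    extended.append(new_binding)
        solutions = extended
    return solutions


results = evaluate([TP1, TP2], DATA)
for row in results:
    print(row)
```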


• Purpose

 –  Efficiency and scalability are essential problems in Linked Data query processing

 –  Instead of computing all results, top-k query processing approaches produce only the "best" k results

Top-k Linked Data Query Processing


• Source index

 –  Maps each triple pattern to the sources that contain bindings for it

Top-k Linked Data Query Processing Requirement (just use for binary op

Linked Data Query Processing Engine

Src.1:
ex:beatles foaf:name "The Beatles";
    ex:album ex:sgt_pepper;
    ex:album ex:help.

Src.2:
ex:sgt_pepper foaf:name "Sgt. Pepper";
    ex:song "Lucy".

Src.3:
ex:help foaf:name "Help!";
    ex:song "Help!".

TP1: ex:beatles ex:album ?album .
TP2: ?album ex:song ?song .

Source index
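A source index for this example could look roughly like the sketch below. The key layout (constant subject and predicate, or predicate only) and the names `build_source_index` and `lookup` are assumptions for illustration; the sketch only handles patterns with a constant predicate and a variable object, which covers TP1 and TP2.

```python
from collections import defaultdict

SOURCES = {
    "Src.1": [
        ("ex:beatles", "foaf:name", '"The Beatles"'),
        ("ex:beatles", "ex:album", "ex:sgt_pepper"),
        ("ex:beatles", "ex:album", "ex:help"),
    ],
    "Src.2": [
        ("ex:sgt_pepper", "foaf:name", '"Sgt. Pepper"'),
        ("ex:sgt_pepper", "ex:song", '"Lucy"'),
    ],
    "Src.3": [
        ("ex:help", "foaf:name", '"Help!"'),
        ("ex:help", "ex:song", '"Help!"'),
    ],
}


def build_source_index(sources):
    """Map (subject-or-None, predicate, None) keys to the sources holding such triples."""
    index = defaultdict(set)
    for name, triples in sources.items():
        for s, p, _ in triples:
            index[(None, p, None)].add(name)  # pattern: ?s p ?o
            index[(s, p, None)].add(name)     # pattern:  s p ?o
    return index


def lookup(index, pattern):
    """Return the sources that may contain bindings for a triple pattern."""
    key = tuple(None if term.startswith("?") else term for term in pattern)
    return sorted(index.get(key, set()))


index = build_source_index(SOURCES)
tp1_sources = lookup(index, ("ex:beatles", "ex:album", "?album"))
tp2_sources = lookup(index, ("?album", "ex:song", "?song"))
print(tp1_sources, tp2_sources)
```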


• Ranking function

 –  Determines the relevance of triple pattern bindings

 –  For instance, scores for triples can be obtained through PageRank

 –  However, no triples are indexed (i.e., each source must be scanned)

Top-k Linked Data Query Processing


• Sorted access

Top-k Linked Data Query Processing

[Figure: sorted access for TP1 (ex:beatles ex:album ?album) and TP2 (?album ex:song ?song); bindings are delivered in descending score order; source labels: Src.3 (2), Src.1 (1), Src.2]
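Sorted access means that the join operator receives the bindings of a triple pattern in descending score order, even when they come from several sources. A minimal sketch, assuming each source is fully scanned and sorted first and the per-source runs are then merged; the scores 3 and 2 for the TP2 bindings are taken from the rank join slides, while the function names are hypothetical.

```python
import heapq


def score_of(scored_binding):
    """A scored binding is a (score, triple) pair."""
    return scored_binding[0]


def sorted_access(bindings_per_source):
    """Yield (score, triple) pairs from all sources in descending score order."""
    # Each source is scanned and sorted on its own (no triples are pre-indexed).
    runs = [
        sorted(bindings, key=score_of, reverse=True)
        for bindings in bindings_per_source.values()
    ]
    # Merge the descending runs into one descending stream.
    return heapq.merge(*runs, key=score_of, reverse=True)


TP2_BINDINGS = {
    "Src.2": [(2, ("ex:sgt_pepper", "ex:song", '"Lucy"'))],
    "Src.3": [(3, ("ex:help", "ex:song", '"Help!"'))],
}

stream = list(sorted_access(TP2_BINDINGS))
for score, triple in stream:
    print(score, triple)
```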


• Push Based Rank Join

Top-k Linked Data Query Processing

Src.1:
ex:beatles foaf:name "The Beatles";
    ex:album ex:sgt_pepper;
    ex:album ex:help.

Src.3:
ex:help foaf:name "Help!";
    ex:song "Help!".

Sorted Access for ex:beatles ex:album ?album . (TP1)

Score  Seen Triples (TP1)
1      ex:beatles ex:album ex:sgt_pepper
1      ex:beatles ex:album ex:help (1 = 1, so both are chosen)

Sorted Access for ?album ex:song ?song . (TP2)

Score  Seen Triples (TP2)
3      ex:help ex:song "Help!"
2      ex:sgt_pepper ex:song "Lucy" (skipped because score 2 < 3; only "Help!" is pushed)

Score  Query Bindings – Output Queue


• Push Based Rank Join

Top-k Linked Data Query Processing

Sorted Access for ex:beatles ex:album ?album . (TP1)

Score  Seen Triples (TP1)
1      ex:beatles ex:album ex:sgt_pepper
1      ex:beatles ex:album ex:help

Sorted Access for ?album ex:song ?song . (TP2)

Score  Seen Triples (TP2)
3      ex:help ex:song "Help!"

Score  Query Bindings – Output Queue
4      ex:beatles ex:album ex:help . ex:help ex:song "Help!" .

Threshold: 4 (max(1 + 3, 1 + 3))

Found query binding with score ≥ threshold: STOP
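The walk-through above can be reproduced with a small sketch of a binary rank join whose inputs push bindings in descending score order. It assumes the result score is the sum of the two input scores and uses the threshold max { max_1 + min_2, max_2 + min_1 } over the scores seen so far; the class `PushRankJoin` and its method names are hypothetical and simplified, not the authors' implementation.

```python
import heapq


class PushRankJoin:
    """Binary rank join; inputs push bindings in descending score order."""

    def __init__(self, k):
        self.k = k
        self.seen = ({}, {})             # per input: join key -> [(score, triple)]
        self.max_score = [None, None]    # first (highest) score seen per input
        self.min_score = [None, None]    # last (lowest) score seen per input
        self.queue = []                  # candidate results, max-heap via negation
        self.results = []                # emitted top-k results
        self._tie = 0                    # tie-breaker for heap entries

    def threshold(self):
        """Upper bound on the score of any result not yet produced."""
        if None in self.max_score:
            return float("inf")          # one input has not pushed anything yet
        return max(self.max_score[0] + self.min_score[1],
                   self.max_score[1] + self.min_score[0])

    def push(self, side, key, triple, score):
        """Receive one binding on input `side` (0 or 1); return True when done."""
        if self.max_score[side] is None:
            self.max_score[side] = score
        self.min_score[side] = score
        self.seen[side].setdefault(key, []).append((score, triple))
        # Join the new binding with everything seen on the other input.
        for other_score, other_triple in self.seen[1 - side].get(key, []):
            pair = (triple, other_triple) if side == 0 else (other_triple, triple)
            heapq.heappush(self.queue, (-(score + other_score), self._tie, pair))
            self._tie += 1
        # Emit candidates whose score reaches the threshold.
        bound = self.threshold()
        while self.queue and len(self.results) < self.k and -self.queue[0][0] >= bound:
            neg_score, _, pair = heapq.heappop(self.queue)
            self.results.append((-neg_score, pair))
        return len(self.results) >= self.k


# Replay of the slide example with k = 1; the join key is the ?album value.
join = PushRankJoin(k=1)
join.push(0, "ex:sgt_pepper", ("ex:beatles", "ex:album", "ex:sgt_pepper"), 1)
join.push(0, "ex:help", ("ex:beatles", "ex:album", "ex:help"), 1)
done = join.push(1, "ex:help", ("ex:help", "ex:song", '"Help!"'), 3)
print(done, join.threshold(), join.results)
```

With k = 1 the join stops after the third push: the only candidate has score 4, which equals the threshold, so the "Lucy" binding with score 2 never has to be read.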


• Improving threshold estimation

 –  Original threshold estimation: Threshold = max { max_1 + min_2, max_2 + min_1 }, where max_i and min_i are the highest and lowest scores seen so far on input i

 –  How to improve:

 –  Star-shaped entity query bounds

 –  Look-ahead bounds

Top-k Linked Data Query Processing


• Star-shaped entity query bounds

 –  Problem:

 –  In Linked Data query processing, every result for entity queries to su… is contained in one single source.

 –  Reason:

 –  A result here is an entity,

 –  Information related to that entity comes exclusively from the one source representing that particular entity.

 –  Idea:

 –  Upper bound scores for triple pattern bindings via the maximal possible score

Top-k Linked Data Query Processing


• Look-ahead Bounds:

 –  Provide a more accurate upper bound for the unseen bindings, using the next possible score

Top-k Linked Data Query Processing

Threshold: max { 1 + 2 , 1 + 3 } = 4

max_1 = 1
min_1 = 1

Sorted Access for ex:beatles ex:album ?album (TP1)

Score  Seen Triples (TP1)
1      ex:beatles ex:album ex:sgt_pepper
1      ex:beatles ex:album ex:help

Sorted Access for ?album ex:song ?song (TP2, Src. 2)

Score  Seen Triples (TP2)
3      ex:help ex:song "Help!"

Score  Query Bindings – Output Queue
4      ex:beatles ex:album ex:help . ex:help ex:song "Help!" .
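One way to read the threshold on this slide: where the score of the next unseen binding of an input is known, it replaces that input's last seen score in the bound. The sketch below encodes this reading; it is an interpretation of the slide's numbers, and the function `lookahead_threshold` with its parameters is hypothetical.

```python
def lookahead_threshold(max_1, min_1, max_2, min_2, next_1=None, next_2=None):
    """Threshold where a known next score replaces the last seen (min) score.

    max_i / min_i: highest / lowest score seen so far on input i.
    next_i: score of the next unseen binding on input i, if known (look-ahead).
    """
    bound_1 = min_1 if next_1 is None else next_1
    bound_2 = min_2 if next_2 is None else next_2
    return max(max_1 + bound_2, max_2 + bound_1)


# Slide example: TP1 has seen scores 1 and 1; TP2 has seen score 3,
# and the next TP2 binding ("Lucy") is known to have score 2.
with_lookahead = lookahead_threshold(max_1=1, min_1=1, max_2=3, min_2=3, next_2=2)
without_lookahead = lookahead_threshold(max_1=1, min_1=1, max_2=3, min_2=3)
print(with_lookahead, without_lookahead)
```

In this example the first term tightens from 1 + 3 to 1 + 2, while the second term (1 + 3) still yields a threshold of 4, matching the value on the slide.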