On Explicit Provenance Management in RDF/S Graphs

Page 1: On Explicit Provenance Management in RDF/S Graphs

23/02/2009 Giorgos Flouris 1

TAPP-09

On Explicit Provenance Management in RDF/S Graphs

Institute of Computer Science Foundation for Research and Technology – Hellas

Heraklion, Greece

Panagiotis Pediaditis, Giorgos Flouris, Irini Fundulaki, Vassilis Christophides

{pped, fgeo, fundul, christop}@ics.forth.gr

Page 2: On Explicit Provenance Management in RDF/S Graphs

Provenance Management in RDF/S

Provenance management problem
- Mostly addressed in the database context
- We are dealing with why provenance in RDF/S graphs
  - Why provenance: identifying the source data that had some influence on the existence of the target data

Three main characteristics (peculiarities of RDF/S)
- Triple-based representation
  - Use quadruples to talk about triples’ provenance
- Inference
  - Assign provenance information to implicit data
- Coherence semantics (in updates)
  - Implicit data is a first-class citizen and should be retained during change, along with its provenance information

Page 3: On Explicit Provenance Management in RDF/S Graphs

Characteristic #1: Triple-based Representation

Page 4: On Explicit Provenance Management in RDF/S Graphs

RDF Graphs

(Figure: example RDF graph. Nodes: Paper10, PaperTAPP, Paper, Giorgos, Author, Person. Edges: writes, instance (rdf:type), subclass (rdfs:subClassOf).)

RDF graph = set of RDF triples

Define classes
- [Paper rdf:type rdfs:Class]
- [PaperTAPP rdf:type rdfs:Class]
- [Person rdf:type rdfs:Class]
- [Author rdf:type rdfs:Class]

Define properties
- [writes rdf:type rdf:Property]
- [writes rdfs:domain Author]
- [writes rdfs:range Paper]

Instantiate (and define) individuals
- [Paper10 rdf:type PaperTAPP]
- [Giorgos rdf:type Author]
- [Giorgos writes Paper10]

Define hierarchies
- [PaperTAPP rdfs:subClassOf Paper]
- [Author rdfs:subClassOf Person]

And other stuff…
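To make "RDF graph = set of RDF triples" concrete, here is a minimal Python sketch of the example graph. It assumes plain strings in place of full URIs; the helper name `classes` is made up for illustration and is not part of RDF or of the talk.

```python
# Sketch only: the example RDF graph as a set of (subject, predicate, object)
# tuples. Plain strings stand in for URIs (an assumption for readability).
graph = {
    # classes
    ("Paper", "rdf:type", "rdfs:Class"),
    ("PaperTAPP", "rdf:type", "rdfs:Class"),
    ("Person", "rdf:type", "rdfs:Class"),
    ("Author", "rdf:type", "rdfs:Class"),
    # properties
    ("writes", "rdf:type", "rdf:Property"),
    ("writes", "rdfs:domain", "Author"),
    ("writes", "rdfs:range", "Paper"),
    # individuals
    ("Paper10", "rdf:type", "PaperTAPP"),
    ("Giorgos", "rdf:type", "Author"),
    ("Giorgos", "writes", "Paper10"),
    # hierarchies
    ("PaperTAPP", "rdfs:subClassOf", "Paper"),
    ("Author", "rdfs:subClassOf", "Person"),
}


def classes(triples):
    """Return every subject declared as an rdfs:Class (hypothetical helper)."""
    return {s for (s, p, o) in triples if p == "rdf:type" and o == "rdfs:Class"}


print(sorted(classes(graph)))
```

Because the graph is a set, adding the same triple twice has no effect, which matches the set-of-triples reading above.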

Page 5: On Explicit Provenance Management in RDF/S Graphs

Provenance in RDF Graphs

(Figure: the example RDF graph split between two sources, the Publications Graph (PUB) and the TAPP Graph (TAPP).)

- PUB: [Paper rdf:type rdfs:Class]
- TAPP: [PaperTAPP rdf:type rdfs:Class]
- PUB: [Person rdf:type rdfs:Class]
- PUB: [Author rdf:type rdfs:Class]
- PUB: [writes rdf:type rdf:Property]
- PUB: [writes rdfs:domain Author]
- PUB: [writes rdfs:range Paper]
- TAPP: [Paper10 rdf:type PaperTAPP]
- TAPP: [Giorgos rdf:type Author]
- TAPP: [Giorgos writes Paper10]
- TAPP: [PaperTAPP rdfs:subClassOf Paper]
- PUB: [Author rdfs:subClassOf Person]

Page 6: On Explicit Provenance Management in RDF/S Graphs

Named Graphs and Provenance

Create two named graphs and assign an ID (URI) to each
- Publications graph (URI: PUB)
- TAPP graph (URI: TAPP)

Each named graph corresponds to a different source

Need some method to associate named graphs with triples
- Triples become quadruples
- Fourth element is the URI of the named graph (origin)

(Figure: the example RDF graph.)

Page 7: On Explicit Provenance Management in RDF/S Graphs

Quadruples for Provenance

(Figure: the example RDF graph.)

- [Paper rdf:type rdfs:Class PUB]
- [PaperTAPP rdf:type rdfs:Class TAPP]
- [Person rdf:type rdfs:Class PUB]
- [Author rdf:type rdfs:Class PUB]
- [writes rdf:type rdf:Property PUB]
- [writes rdfs:domain Author PUB]
- [writes rdfs:range Paper PUB]
- [Paper10 rdf:type PaperTAPP TAPP]
- [Giorgos rdf:type Author TAPP]
- [Giorgos writes Paper10 TAPP]
- [PaperTAPP rdfs:subClassOf Paper TAPP]
- [Author rdfs:subClassOf Person PUB]

All quadruples of the form [s p o PUB] originate from named graph PUB (Publications graph)
All quadruples of the form [s p o TAPP] originate from named graph TAPP (TAPP graph)
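The quadruple list can be sketched in Python as 4-tuples whose last element is the named-graph URI. The helpers `triples_in` and `origins_of` are hypothetical names introduced here only to show how the fourth element is used; they are not part of any RDF library.

```python
# Sketch only: quadruples as (subject, predicate, object, graph_uri) tuples.
quads = {
    ("Paper", "rdf:type", "rdfs:Class", "PUB"),
    ("PaperTAPP", "rdf:type", "rdfs:Class", "TAPP"),
    ("Person", "rdf:type", "rdfs:Class", "PUB"),
    ("Author", "rdf:type", "rdfs:Class", "PUB"),
    ("writes", "rdf:type", "rdf:Property", "PUB"),
    ("writes", "rdfs:domain", "Author", "PUB"),
    ("writes", "rdfs:range", "Paper", "PUB"),
    ("Paper10", "rdf:type", "PaperTAPP", "TAPP"),
    ("Giorgos", "rdf:type", "Author", "TAPP"),
    ("Giorgos", "writes", "Paper10", "TAPP"),
    ("PaperTAPP", "rdfs:subClassOf", "Paper", "TAPP"),
    ("Author", "rdfs:subClassOf", "Person", "PUB"),
}


def triples_in(graph_uri):
    """All triples whose fourth element is the given named-graph URI."""
    return {(s, p, o) for (s, p, o, g) in quads if g == graph_uri}


def origins_of(triple):
    """All named-graph URIs recorded as the origin of the given triple."""
    return {g for (s, p, o, g) in quads if (s, p, o) == triple}


print(len(triples_in("PUB")), len(triples_in("TAPP")))
print(origins_of(("Giorgos", "writes", "Paper10")))
```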

Page 8: On Explicit Provenance Management in RDF/S Graphs

Properties of Named Graphs

The named graph URI can be used to refer to the named graph
- Can be used for assignment of metadata
  - [TAPP hasAuthor JamesCheney G]

Granularity of provenance
- A triple is the smallest bit of information
- The granularity of provenance achieved by named graphs is at the triple level
- Flexible
  - A named graph can contain 0, 1, or many triples
  - A triple can belong to 0, 1, or many named graphs

(Figure: the example RDF graph.)
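The flexibility claims (0, 1, or many in both directions) can be illustrated with a small Python mapping from named-graph URIs to triple sets. This is a sketch under assumptions: the graph `EMPTY` and the second copy of the subclass triple in `TAPP` are hypothetical, added only so that each case appears; the helper `graphs_containing` is likewise made up.

```python
# Sketch only: each named-graph URI maps to the set of triples it contains.
named_graphs = {
    "PUB": {("Author", "rdfs:subClassOf", "Person")},
    "TAPP": {
        ("Giorgos", "rdf:type", "Author"),
        ("Author", "rdfs:subClassOf", "Person"),  # hypothetical duplicate
    },
    "EMPTY": set(),  # hypothetical: a named graph with no triples
    "G": {("TAPP", "hasAuthor", "JamesCheney")},  # metadata about graph TAPP
}


def graphs_containing(triple):
    """URIs of all named graphs that contain the triple (possibly none)."""
    return {uri for uri, triples in named_graphs.items() if triple in triples}


shared = ("Author", "rdfs:subClassOf", "Person")
print(sorted(graphs_containing(shared)))
```

Note that the metadata triple about TAPP lives in its own named graph G, as in the [TAPP hasAuthor JamesCheney G] example.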

Page 9: On Explicit Provenance Management in RDF/S Graphs

Characteristic #2: Inference

Page 10: On Explicit Provenance Management in RDF/S Graphs

RDF/S Graphs

RDF Schema: add-on to RDF

RDFS adds inference semantics
- Transitivity of subclass/subproperty
- Implicit instantiations

Example
- [Giorgos rdf:type Author]
- [Author rdfs:subClassOf Person]
- Inference: [Giorgos rdf:type Person]

Inferred knowledge is implicit

(Figure: the example RDF graph.)
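The inference example can be reproduced with a small fixpoint computation. This Python sketch covers only two rules (subclass transitivity and propagation of rdf:type along rdfs:subClassOf); subproperty reasoning and the rest of RDFS are left out, and the function name `closure` is an assumption.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Explicit triples plus everything the two rules derive from them.

    Rules (a subset of RDFS, chosen for this sketch):
      (a subClassOf b), (b subClassOf d)  =>  (a subClassOf d)
      (a type b),       (b subClassOf d)  =>  (a type d)
    """
    result = set(triples)
    while True:
        derived = {
            (a, p1, d)
            for (a, p1, b) in result
            for (c, p2, d) in result
            if b == c and p2 == SUBCLASS and p1 in (SUBCLASS, TYPE)
        }
        if derived <= result:  # nothing new: fixpoint reached
            return result
        result |= derived


explicit = {
    ("Giorgos", TYPE, "Author"),
    ("Author", SUBCLASS, "Person"),
}
implicit = closure(explicit) - explicit
print(implicit)
```

The difference between the closure and the explicit set is exactly the implicit knowledge, here the single triple [Giorgos rdf:type Person].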

Page 11: On Explicit Provenance Management in RDF/S Graphs

Provenance and Inference

Quadruples:
- [Giorgos rdf:type Author PUB]
- [Author rdfs:subClassOf Person TAPP]
- [Giorgos rdf:type Person ???]

Needs:
- Shared ownership
- A more sophisticated, compound structure
- Keeping the connection with the components
- Composition operator (PT=PUB●TAPP)
  - [Giorgos rdf:type Person PT]
  - Ok, but see characteristic #3

(Figure: the example RDF graph.)
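One way to picture the composition operator is as a union of origin sets. The Python sketch below assumes that reading (the talk does not define ● formally here) and applies a single inference step; `infer_with_provenance` is a hypothetical name.

```python
PUB = frozenset({"PUB"})
TAPP = frozenset({"TAPP"})


def infer_with_provenance(quads):
    """One step of rdf:type propagation along rdfs:subClassOf.

    Assumption: the origin of an inferred quadruple is the union of the
    origins of its two premises, standing in for PUB ● TAPP.
    """
    return {
        (x, "rdf:type", d, g1 | g2)
        for (x, p1, c, g1) in quads
        for (c2, p2, d, g2) in quads
        if p1 == "rdf:type" and p2 == "rdfs:subClassOf" and c == c2
    }


quads = {
    ("Giorgos", "rdf:type", "Author", PUB),
    ("Author", "rdfs:subClassOf", "Person", TAPP),
}
inferred = infer_with_provenance(quads)
print(inferred)
```

The inferred quadruple carries both sources, which is the shared ownership the slide asks for while keeping the connection with the components.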

Page 12: On Explicit Provenance Management in RDF/S Graphs

Characteristic #3: Coherence Semantics (in Updates)

Page 13: On Explicit Provenance Management in RDF/S Graphs

Foundational Semantics

Foundational viewpoint (pyramid):
- Knowledge consists of the explicitly represented knowledge
- Only explicit knowledge can be changed
- Implicit knowledge is affected indirectly, through the changes in the explicit knowledge (so that the resulting “pyramid” is “stable”)
- Explicit knowledge is more important than implicit knowledge

(Figure: pyramid. Labels: Basic Knowledge, Supported Knowledge, Explicit Knowledge, Implicit Knowledge.)

Page 14: On Explicit Provenance Management in RDF/S Graphs

Coherence Semantics

Coherence viewpoint (raft):
- No discrimination between explicit and implicit knowledge
- Both explicit and implicit knowledge can be changed
- Changes should be made coherently in order for the resulting knowledge to make sense (so that the “raft” is “stable”)
- Explicit and implicit knowledge are of the same value

(Figure: raft. Label: Knowledge (includes both implicit and explicit knowledge).)

Page 15: On Explicit Provenance Management in RDF/S Graphs

Deletes

Under coherence semantics
- Inferred knowledge needs to be made explicit (when in danger of being lost)
- Explicit assignment of shared origin to triples

Explicit shared origin assignment
- Cannot use any composition operator
- Must be a first-class construct (autonomous)
- Retain the connection with its constituents

A need, but also a useful feature

(Figure: the example RDF graph.)
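Why a delete forces the shared origin to be stored explicitly can be seen in a small Python sketch. It assumes origins are modelled as frozensets of source URIs and uses one-step rdf:type propagation only; `implied` and `delete_coherently` are hypothetical names, not the talk's update language.

```python
PUB = frozenset({"PUB"})
TAPP = frozenset({"TAPP"})


def implied(quads):
    """Quadruples derivable in one step (rdf:type along rdfs:subClassOf)."""
    return {
        (x, "rdf:type", d, g1 | g2)
        for (x, p1, c, g1) in quads
        for (c2, p2, d, g2) in quads
        if p1 == "rdf:type" and p2 == "rdfs:subClassOf" and c == c2
    }


def delete_coherently(quads, victim):
    """Remove `victim`, but first make explicit what would otherwise be lost."""
    if victim not in quads:
        return set(quads)  # nothing to delete
    remaining = set(quads) - {victim}
    lost = implied(quads) - implied(remaining) - remaining - {victim}
    return remaining | lost  # lost quads keep their shared origin


quads = {
    ("Giorgos", "rdf:type", "Author", TAPP),
    ("Author", "rdfs:subClassOf", "Person", PUB),
}
after = delete_coherently(quads, ("Giorgos", "rdf:type", "Author", TAPP))
print(after)
```

After the delete, [Giorgos rdf:type Person] can no longer be recomputed from its premises, so its origin {PUB, TAPP} has to be stored as a value in its own right. That is the reason an on-the-fly composition operator is not enough.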

Page 16: On Explicit Provenance Management in RDF/S Graphs

RDF/S Graphsets

Graphsets are like named graphs
- Have IDs (URIs)
- Used in quadruples
  - Association of triples with graphsets: [Giorgos rdf:type Person PT]
  - Can be referred to (metadata): [PT rdf:type Confidential G]

Encode origin or shared origin
- [Giorgos rdf:type Person PT]
- URI association (via skolem function)
  - PT is the URI of {PUB, TAPP}
  - PUB is the URI of {PUB}
- A named graph is a graphset
  - PUB corresponds to {PUB}

(Figure: the example RDF graph with the graphset PT marked.)
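The URI association can be sketched as a deterministic function from a set of named-graph URIs to one URI. The naming scheme below (`gs:` followed by the sorted member URIs) is purely an assumption for illustration; the talk only states that a skolem function exists and names the result PT.

```python
# Sketch only: a registry that keeps each graphset URI connected to its
# constituent named graphs.
constituents = {}


def graphset_uri(members):
    """Return a stable URI for a non-empty set of named-graph URIs.

    A singleton graphset reuses the named graph's own URI, matching
    "PUB is the URI of {PUB}". The "gs:" scheme is hypothetical.
    """
    members = frozenset(members)
    if not members:
        raise ValueError("a graphset needs at least one named graph")
    if len(members) == 1:
        uri = next(iter(members))
    else:
        uri = "gs:" + "+".join(sorted(members))
    constituents[uri] = members
    return uri


pt = graphset_uri({"TAPP", "PUB"})
quad = ("Giorgos", "rdf:type", "Person", pt)
print(pt, sorted(constituents[pt]))
```

Because the URI is an ordinary identifier, it can appear as the subject of further quadruples, as in [PT rdf:type Confidential G].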

Page 17: On Explicit Provenance Management in RDF/S Graphs

Querying With RDF/S Graphsets

Standard queries (original RQL)
- Give me the Persons: [Giorgos]

Provenance queries (extended RQL)
- Give me the Persons per {PUB}: [ ]
- Give me the Persons per {TAPP, PUB}: [Giorgos]
- Give me the sources per which Author is a subclass of Person: [{PUB}]
- Give me all the individual sources: [{TAPP}, {PUB}]

(Figure: the example RDF graph.)
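The four example queries can be mimicked in Python over a tiny quadruple store. This is not RQL syntax; the function names are hypothetical stand-ins, and the store is assumed to already hold the graphset-assigned quadruple [Giorgos rdf:type Person PT].

```python
PUB = frozenset({"PUB"})
TAPP = frozenset({"TAPP"})
PT = PUB | TAPP  # the graphset {PUB, TAPP}

quads = {
    ("Giorgos", "rdf:type", "Author", TAPP),
    ("Author", "rdfs:subClassOf", "Person", PUB),
    ("Giorgos", "rdf:type", "Person", PT),
}


def instances_of(cls, per=None):
    """Instances of `cls`; if `per` is given, only those with that exact origin."""
    return sorted({
        s for (s, p, o, g) in quads
        if p == "rdf:type" and o == cls and (per is None or g == frozenset(per))
    })


def sources_of(triple):
    """All graphsets per which the triple holds."""
    return {g for (s, p, o, g) in quads if (s, p, o) == triple}


def individual_sources():
    """Every single named graph mentioned in any origin, as a singleton set."""
    return {frozenset({uri}) for q in quads for uri in q[3]}


print(instances_of("Person"))
print(instances_of("Person", per={"PUB"}))
print(instances_of("Person", per={"TAPP", "PUB"}))
```

In this sketch a provenance query matches the origin exactly, which is why asking for Persons per {PUB} returns nothing even though PUB contributes to the answer for {TAPP, PUB}.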

Page 18: On Explicit Provenance Management in RDF/S Graphs

Validity and Redundancy Elimination

Two invariants for RDF/S graphs
- Valid (per some validity rules)
- Redundancy-free (space considerations)

The invariants allow optimized execution of queries

These invariants are imposed during change
- Improve query speed, but make updates more difficult
- Trade-off between having query overhead or update overhead
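The redundancy-free invariant can be sketched on plain triples: an explicit triple is redundant if the remaining triples already imply it. The Python below assumes only two RDFS rules (subclass transitivity and type propagation) and ignores provenance and the validity rules, which the slide does not spell out; `closure` and `remove_redundant` are hypothetical names.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Explicit triples plus those derived by the two rules of this sketch."""
    result = set(triples)
    while True:
        derived = {
            (a, p1, d)
            for (a, p1, b) in result
            for (c, p2, d) in result
            if b == c and p2 == SUBCLASS and p1 in (SUBCLASS, TYPE)
        }
        if derived <= result:
            return result
        result |= derived


def remove_redundant(triples):
    """Drop each triple that the triples kept so far still imply."""
    kept = set(triples)
    for t in sorted(triples):  # fixed order keeps the result deterministic
        if t in closure(kept - {t}):
            kept.discard(t)
    return kept


graph = {
    ("Giorgos", TYPE, "Author"),
    ("Author", SUBCLASS, "Person"),
    ("Giorgos", TYPE, "Person"),  # implied by the two triples above
}
compact = remove_redundant(graph)
print(sorted(compact))
```

The compact graph has the same closure as the original, so queries over implied knowledge return the same answers while less is stored; the price is the closure check on every change.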

Page 19: On Explicit Provenance Management in RDF/S Graphs

Updating With RDF/S Graphsets

Updates supported through an extended version of RUL
- INSERT and DELETE
- Only for data (class and property instances)
- Implicit or explicit knowledge
- Take into account and update graphset (provenance) information

Main considerations
- Apply the change (INSERT or DELETE)
- Respect invariants
  - Non-redundancy (INSERT) and validity (DELETE)
- Make minimal changes (under coherence viewpoint)
  - No unnecessary loss of information
- Take into account and preserve graphset (provenance) information
  - Applicable upon quadruples

Page 20: On Explicit Provenance Management in RDF/S Graphs

Conclusion

Objective: assign provenance information to RDF/S graphs to capture why provenance
- Triple-based representation
  - Turned triples into quadruples and used named graphs to record the origin
- Inference (per RDFS)
  - Composed named graphs
- Coherence semantics in updates (deletes)
  - Used graphsets for composed named graphs (cannot use an operator)

Proposed query and update languages for graphsets
- Based on RQL, RUL
- Can be used to query/update provenance information
- Provided syntax and semantics, as well as an implementation
  - Demo at: http://139.91.183.30:3026/RULdemo/named_graph_demo/

Page 21: On Explicit Provenance Management in RDF/S Graphs


Page 22: On Explicit Provenance Management in RDF/S Graphs


EXTRA SLIDES

Page 23: On Explicit Provenance Management in RDF/S Graphs

RDF/S Graphset Properties

Three types of triples in a graphset:
- Explicitly assigned triples
- Implicitly assigned triples (from the constituent named graphs)
- Implications of the above (per RDFS)

(Figure: the example RDF graph with the graphset PT marked.)

Page 24: On Explicit Provenance Management in RDF/S Graphs

Inserts and Deletes: General Process

INSERT
- Validity respected
- Must verify non-redundancy
- Process
  - If INSERT is redundant, ignore it
  - Remove all redundant information (after insert)

DELETE
- Must verify validity
- Non-redundancy respected
- Issues with inference and the coherence viewpoint
- Process
  - If DELETE is void, ignore it
  - Make explicit all originally redundant information that will be lost otherwise
  - Restore validity by removing property instances if necessary
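The two processes can be sketched in Python on plain triples. This is a simplification under stated assumptions: only two RDFS rules are modelled, provenance (graphsets) is left out, the validity-restoring step is omitted because the validity rules are not given here, and DELETE handles explicit triples only. All function names are hypothetical and do not reflect RUL syntax.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(triples):
    """Explicit triples plus those derived by the two rules of this sketch."""
    result = set(triples)
    while True:
        derived = {
            (a, p1, d)
            for (a, p1, b) in result
            for (c, p2, d) in result
            if b == c and p2 == SUBCLASS and p1 in (SUBCLASS, TYPE)
        }
        if derived <= result:
            return result
        result |= derived


def remove_redundant(triples):
    kept = set(triples)
    for t in sorted(triples):
        if t in closure(kept - {t}):
            kept.discard(t)
    return kept


def insert(graph, t):
    if t in closure(graph):
        return set(graph)  # redundant INSERT: ignore it
    return remove_redundant(set(graph) | {t})  # clean up after the insert


def delete(graph, t):
    before = closure(graph)
    if t not in before:
        return set(graph)  # void DELETE: ignore it
    if t not in graph:
        raise NotImplementedError("sketch covers explicit triples only")
    remaining = set(graph) - {t}
    for lost in sorted(before - closure(remaining) - {t}):
        already_there = lost in closure(remaining)
        brings_back_t = t in closure(remaining | {lost})
        if not already_there and not brings_back_t:
            remaining.add(lost)  # make implicit knowledge explicit
    return remaining


g = {("Giorgos", TYPE, "Author"), ("Author", SUBCLASS, "Person")}
print(sorted(delete(g, ("Giorgos", TYPE, "Author"))))
```

Deleting [Giorgos rdf:type Author] keeps [Giorgos rdf:type Person] by making it explicit, which is the coherence viewpoint in action; re-inserting the deleted triple makes that explicit copy redundant again, so `insert` removes it.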