part ii. reification we can make statements about the rdf statements themselves. this can be used to...

34
Part II

Upload: susan-morgan

Post on 29-Dec-2015


Page 1: Part II. Reification We can make statements about the RDF statements themselves. This can be used to annotate information In science, it is common to

Part II


Reification

We can make statements about RDF statements themselves. This can be used to annotate information.

In science, it is common to quote someone, or to provide provenance or date-stamp information, such as who conducted a certain experiment or simulation and when it was done.

Explicit reification, which is used in database modeling, is also used in RDF to write more sophisticated statements about other statements using built-in vocabulary.

This is done by first making a reified model of the statement, with type, subject, predicate, and object properties. We make a new resource to represent the entire statement.


RDF Reification vocabulary

Reification is done in RDF by using the following qualified names to annotate the statement: rdf:Statement (the class of resources that are statements), and the rdf:subject, rdf:predicate, and rdf:object properties.

For example, if we want to say that “Bill Fritz says that Dinwoody Formation formed in the Triassic”, we first assign a qualified name to the statement, such as q:n1, and then use it in the reification quad statements:

q:n1 rdf:type rdf:Statement ;
     rdf:subject strat:Dinwoody ;
     rdf:predicate strat:formed-in ;
     rdf:object time:Triassic .

Person:BillFritz s:says q:n1 .

i.e., q:n1 is an RDF statement whose subject, predicate, and object are given by the three qualified names, and Dr. Fritz made this statement. The resource that stands for the statement (here q:n1) can also be a blank node (bnode).


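[Figure: directed graph of the reified statement. The statement node has rdf:type rdf:Statement, rdf:subject strat:Dinwoody, rdf:predicate strat:formed-in, and rdf:object time:Triassic. The Bill Fritz node is connected to the statement node by the edge labeled "says" (attributed-to).]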


Alternative way to reify it

Bill Fritz says that Dinwoody Formation formed in the Triassic

Fritz says S
S rdf:type rdf:Statement
S rdf:subject DinwoodyFormation
S rdf:predicate formedIn
S rdf:object Triassic
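The reification quad above can be sketched in a few lines of Python. This is only an illustration: triples are modeled as plain tuples of strings rather than with an RDF library, and the helper names reify and statements_said_by are hypothetical.

```python
def reify(statement_id, triple):
    """Return the four reification triples that describe `triple`."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]


def statements_said_by(graph, person, says="s:says"):
    """Rebuild the original triples that `person` is recorded as saying."""
    rebuilt = []
    for s, p, statement_id in graph:
        if s == person and p == says:
            # Gather the rdf:subject / rdf:predicate / rdf:object parts.
            parts = {pred: o for subj, pred, o in graph if subj == statement_id}
            rebuilt.append(
                (parts["rdf:subject"], parts["rdf:predicate"], parts["rdf:object"])
            )
    return rebuilt


graph = reify("q:n1", ("strat:Dinwoody", "strat:formed-in", "time:Triassic"))
graph.append(("Person:BillFritz", "s:says", "q:n1"))  # who made the statement

print(statements_said_by(graph, "Person:BillFritz"))
```

Note that the original triple itself is never asserted in graph; only the description of it and its attribution are stored.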


SPARQL

SPARQL (pronounced ‘sparkle’) is the standard RDF query language.

SPARQL uses variables for the subject, predicate, or object of an RDF triple.

Queries are made of parts called ‘triple patterns’, in which variables are written as a name preceded by a question mark (?), e.g., ?x.


SPARQL Queries, Example

Which epoch precedes the Miocene? (Oligocene)
?x time:precedes time:Miocene .

Which minerals are part of granite? (quartz, feldspars, micas)
?y petr:part-of petr:Granite .

Pollutants pollute which aquifer?
hydro:Pollutant hydro:pollute ?z .

The SPARQL engine needs the ontologies (in this case, Time, Petrology, and Hydrogeology) to return the answers to these queries.
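How a single triple pattern is matched against stored triples can be sketched as follows. This is a toy matcher in plain Python, not a SPARQL engine; match_pattern is a hypothetical helper, and the two triples are sample facts standing in for what the Time ontology would supply.

```python
def match_pattern(pattern, triples):
    """Return one {variable: value} binding per triple that fits `pattern`."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break  # a constant term that does not match this triple
        else:
            results.append(binding)
    return results


# Sample facts (assumed for illustration).
triples = [
    ("time:Oligocene", "time:precedes", "time:Miocene"),
    ("time:Miocene", "time:precedes", "time:Pliocene"),
]

print(match_pattern(("?x", "time:precedes", "time:Miocene"), triples))
# prints [{'?x': 'time:Oligocene'}]
```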


Graph Pattern Query

A graph pattern query (given within {} braces) is one with a set of triple patterns.

For example, consider the following two questions:

Which orogeny deformed (tect: namespace) the Tertiary system (strat: namespace)?
Zagros orogeny (tect: namespace) formed (struc: namespace) which mountain range?

The set of two triple patterns is given in N3 as:

{ ?orogeny tect:deformed strat:TertiarySystem .
  tect:ZagrosOrogeny struc:formed ?MtRange }

For these queries to work, all of the triple patterns must match nodes and edges of the ontologies in these namespaces!
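The requirement that every triple pattern in the braces must match can be sketched in plain Python. The helpers unify and match_graph_pattern are hypothetical, and the two stored triples (including struc:ZagrosMountains) are made-up sample data, not taken from any real ontology.

```python
def unify(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_graph_pattern(patterns, triples):
    """Return the bindings that satisfy every triple pattern at once."""
    solutions = [{}]  # start with one empty binding
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions  # empty as soon as one pattern fails
    return solutions


# Sample triples (assumed for illustration).
triples = [
    ("tect:ZagrosOrogeny", "tect:deformed", "strat:TertiarySystem"),
    ("tect:ZagrosOrogeny", "struc:formed", "struc:ZagrosMountains"),
]
patterns = [
    ("?orogeny", "tect:deformed", "strat:TertiarySystem"),
    ("tect:ZagrosOrogeny", "struc:formed", "?MtRange"),
]

print(match_graph_pattern(patterns, triples))
```

If either pattern has no matching triple, the result is an empty list, which mirrors the point that all patterns must match.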


Inferencing

The Semantic Web languages allow explicit expression of the relationships between classes of objects, for example:

strat:Triassic partOf strat:Mesozoic

Compared to databases, which require programming to derive data from complex hierarchical structures, these languages allow smarter integration and connection of data, making the data easier to query and use.


What is Inferencing?

The Semantic Web languages provide ‘inferencing’, meaning that we can derive other related [unstated] information from a set of stated information.

The mechanisms for inference are provided by language constructs, such as rdfs:subClassOf, which make ‘inference-based semantics’ possible.

Through inferencing, we should be able to query a broader (general) term (e.g., FaultRock) and get information about the narrower (specialized) subclass terms that extend it, e.g.,

Mylonite subClassOf FaultRock

If we know FaultRock isA Rock, and Rock is Solid, and Solid isNot Liquid, then we can infer that Mylonite is Solid, and Mylonite isNot Liquid.

Note: isNot is modeled by saying that Liquid disjointWith Solid.


…

The Web Ontology Language (OWL), building on RDFS, gives formal meaning to constructs such as rdfs:Class and rdfs:subClassOf.

It follows from the language that if C is a subClassOf C’, then every member x of class C is also a member of class C’.

For example, if IdahoBatholith rdf:type Batholith, and Batholith rdfs:subClassOf IgneousBody, then IdahoBatholith rdf:type IgneousBody.

So, if we search for igneous bodies in general, we may be offered information about the narrower Batholith term, and data about the Idaho batholith may be provided.

[Figure: classes C and C’ with individuals x and y.]


Type Propagation Rule

The ‘type propagation rule’ defines the meaning of the statement C subClassOf C’:

IF   ?C rdfs:subClassOf ?C’ .
AND  ?x rdf:type ?C .
THEN ?x rdf:type ?C’ .

That is, if C isA C’, and x is an instance of C, then x is an instance of C’.

[Figure: classes C and C’ with individuals x and y.]
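The IF/AND/THEN rule above can be sketched as a small fixed-point loop in Python. Triples are plain tuples of strings and type_propagation is a hypothetical helper; a real RDFS reasoner applies many more rules than this one.

```python
def type_propagation(triples):
    """Apply the type propagation rule until no new triples appear."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for c, p1, c_prime in list(known):
            if p1 != "rdfs:subClassOf":
                continue
            for x, p2, cls in list(known):
                new = (x, "rdf:type", c_prime)
                if p2 == "rdf:type" and cls == c and new not in known:
                    known.add(new)  # THEN ?x rdf:type ?C'
                    changed = True
    return known - set(triples)  # only the inferred triples


asserted = [
    ("C", "rdfs:subClassOf", "C'"),
    ("x", "rdf:type", "C"),
    ("y", "rdf:type", "C'"),
]

print(type_propagation(asserted))  # prints {('x', 'rdf:type', "C'")}
```

Nothing new is inferred about y: the rule only propagates types upward, from a class to the class it is a subclass of.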


Example for inference

Suppose all porphyritic textures are igneous textures, all igneous textures are textures, and the individual PorphyriticTexture1 is porphyritic.

Applying predicate logic:

If x is a porphyritic texture, then x is an igneous texture:
PorphyriticTexture(x) → IgneousTexture(x)

If x is an igneous texture, then x is a texture:
IgneousTexture(x) → Texture(x)

Given the following two instances:
IgneousTexture(IgneousTexture1) and PorphyriticTexture(PorphyriticTexture1)

we infer the following unasserted facts:
IgneousTexture(PorphyriticTexture1)
Texture(IgneousTexture1)
Texture(PorphyriticTexture1)

[Figure: class hierarchy with Texture above IgneousTexture above PorphyriticTexture, and the individuals IgneousTexture1 and PorphyriticTexture1.]
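The two implications and two instances above can be run through a simple forward-chaining loop. In this sketch, a rule is written as a (premise class, conclusion class) pair and a fact as a (class, individual) pair; this encoding and the name forward_chain are assumptions made for illustration.

```python
rules = [
    ("PorphyriticTexture", "IgneousTexture"),  # PorphyriticTexture(x) -> IgneousTexture(x)
    ("IgneousTexture", "Texture"),             # IgneousTexture(x) -> Texture(x)
]
facts = {
    ("IgneousTexture", "IgneousTexture1"),
    ("PorphyriticTexture", "PorphyriticTexture1"),
}


def forward_chain(rules, facts):
    """Return the facts that follow from `rules` but are not asserted."""
    known = set(facts)
    while True:
        new = {
            (conclusion, individual)
            for premise, conclusion in rules
            for cls, individual in known
            if cls == premise
        } - known
        if not new:
            return known - set(facts)
        known |= new


for cls, individual in sorted(forward_chain(rules, facts)):
    print(f"{cls}({individual})")
```

The three printed facts are the same three unasserted facts listed above.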


Multiple Subclassing

The Web Ontology Language (OWL), and the languages it builds on (RDF and RDFS), provide formal constraints on the meaning of their constructs, which makes inferencing from combinations of terms possible.

As in some object-oriented programming (OOP) languages, multiple subclassing (inheritance) exists in RDFS.

If A subClassOf B and A subClassOf C, and x is an instance (individual) of A, then x is an instance of both B and C (which follows from the type propagation rule).

[Figure: class A as a subclass of both B and C, with instance x; for example, Semibrittle as a subclass of both Brittle and Ductile, with instance x.]
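Multiple subclassing can be sketched by walking up every rdfs:subClassOf edge from the asserted type of an individual. The helper types_of is hypothetical, and the triples encode the Brittle/Ductile/Semibrittle example from the figure as plain tuples.

```python
def types_of(individual, triples):
    """Return every class of `individual`, following all rdfs:subClassOf edges."""
    pending = [o for s, p, o in triples if s == individual and p == "rdf:type"]
    classes = set()
    while pending:
        cls = pending.pop()
        if cls not in classes:
            classes.add(cls)
            # A class may have several superclasses; follow each of them.
            pending.extend(
                o for s, p, o in triples if s == cls and p == "rdfs:subClassOf"
            )
    return classes


triples = [
    ("Semibrittle", "rdfs:subClassOf", "Brittle"),
    ("Semibrittle", "rdfs:subClassOf", "Ductile"),
    ("x", "rdf:type", "Semibrittle"),
]

print(sorted(types_of("x", triples)))  # prints ['Brittle', 'Ductile', 'Semibrittle']
```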


Benefits of Inference Rules

This inference-based semantics is very powerful for integrating heterogeneous data provided by autonomous, distributed sources on the Web, and for making the distributed data useful.

Inference rules make data that are constrained by the OWL constructs more useful because RDFS and OWL inferencing query engines, which know the inference rules, will infer (during a query) unasserted information from the directly asserted triples in the RDF store.


Assume the triple store contains two asserted RDF triples:

struc:FaultRock rdfs:subClassOf petr:Rock .
struc:Mylonite rdf:type struc:FaultRock .

Suppose the following SPARQL pattern queries the triple store to find things that are of type Rock, which is defined in the ‘petr’ namespace:

?x rdf:type petr:Rock .

Although the asserted triples above contain no triple with subject struc:Mylonite, predicate rdf:type, and object petr:Rock, an RDFS inferencing query engine will return the following inferred result:

?x = struc:Mylonite

struc:FaultRock itself is not returned, because it is related to petr:Rock by rdfs:subClassOf rather than by rdf:type.

[Figure: hierarchy showing Rock, FaultRock, and Mylonite.]
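The difference between querying with and without inference can be sketched on the two asserted triples. The function instances_of is a hypothetical stand-in for a query engine answering the pattern ?x rdf:type petr:Rock; it only knows the subclass rule.

```python
asserted = {
    ("struc:FaultRock", "rdfs:subClassOf", "petr:Rock"),
    ("struc:Mylonite", "rdf:type", "struc:FaultRock"),
}


def instances_of(cls, triples, infer=False):
    """Answer `?x rdf:type cls`, optionally using subclass inference."""
    wanted = {cls}
    if infer:
        # Also accept instances of any direct or indirect subclass of `cls`.
        frontier = [cls]
        while frontier:
            parent = frontier.pop()
            for s, p, o in triples:
                if p == "rdfs:subClassOf" and o == parent and s not in wanted:
                    wanted.add(s)
                    frontier.append(s)
    return sorted(s for s, p, o in triples if p == "rdf:type" and o in wanted)


print(instances_of("petr:Rock", asserted))              # prints []
print(instances_of("petr:Rock", asserted, infer=True))  # prints ['struc:Mylonite']
```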


Inferred Triples

Inference engines, applying their set of inference rules, return unasserted, inferred triples derived from the asserted triples.

The inferred triples may or may not be saved in the triple store; they may be generated only at query time.


Example

The following diagram shows the hierarchy of the pyroxene minerals in the min: (Mineralogy) ontology.

This means that Diopside isA Pyroxene, Pyroxene isA Silicate, and Silicate isA Mineral.


Inferred Triples

Given the following asserted triples:

min:Diopside rdf:type min:Pyroxene .
min:Pyroxene rdfs:subClassOf min:Silicate .
min:Silicate rdfs:subClassOf min:Mineral .

we can derive the following inferred triples. The first follows from the transitivity of rdfs:subClassOf; the other two follow from applying the type propagation rule to the asserted triples:

min:Pyroxene rdfs:subClassOf min:Mineral .
min:Diopside rdf:type min:Silicate .
min:Diopside rdf:type min:Mineral .
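The derivation for the pyroxene hierarchy can be sketched in Python under one explicit assumption: the links between classes (Pyroxene to Silicate, Silicate to Mineral) are modeled with rdfs:subClassOf, and only Diopside is linked by rdf:type, because the type propagation rule needs a subClassOf premise. The helper rdfs_closure is hypothetical and covers just two rules, subClassOf transitivity and type propagation.

```python
def rdfs_closure(asserted):
    """Return triples inferred by subClassOf transitivity and type propagation."""
    known = set(asserted)
    while True:
        new = set()
        for a, p1, b in known:
            for c, p2, d in known:
                # (a subClassOf b) + (b subClassOf d) => (a subClassOf d)
                # (a rdf:type   b) + (b subClassOf d) => (a rdf:type   d)
                if (
                    b == c
                    and p2 == "rdfs:subClassOf"
                    and p1 in ("rdfs:subClassOf", "rdf:type")
                ):
                    new.add((a, p1, d))
        if new <= known:
            return known - set(asserted)
        known |= new


asserted = [
    ("min:Diopside", "rdf:type", "min:Pyroxene"),
    ("min:Pyroxene", "rdfs:subClassOf", "min:Silicate"),
    ("min:Silicate", "rdfs:subClassOf", "min:Mineral"),
]

for triple in sorted(rdfs_closure(asserted)):
    print(*triple)
```

Whether such inferred triples are written back to the store or recomputed for each query is a design choice of the engine.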


RDF and Relational Database

Every statement in RDF is like a value in a cell of a database table, which requires three values for its complete representation:

a row identifier (subject, s)
a column identifier (predicate, p)
the value in the table cell (object, o)

Note: for a 3x3 table, we have 9 triples!

Recall that we refer to the ‘subject-predicate-object’ statement as a ‘triple’.

[Figure: a table in which row s and column p locate the cell that holds the value o.]


Triples: Building blocks for RDF

Subject (S) is the thing about which we are making the statement. In this case it is the record, i.e., the row.

Predicate (P) is the property of the subject entity in the row. In this case it is the column or field.

Object (O) is the value of the property in the cell.

[Figure: a table in which row s and column p locate the cell that holds the value o.]


Data Federation

RDF is designed for federating data of any kind (database, spreadsheet, XML) originating from multiple sources.

These data can be converted into a set of triples and put in the RDF data store (federated graph), ready to be queried.

In the RDF triple ‘Course instructor Babaie’, Course is the subject, instructor is the predicate, and Babaie is the object (the value of instructor):

Subject   Predicate    Object
Course    instructor   Babaie


Directed Graph

An RDF store commonly has more than one triple referring to the same subject (s), i.e., one s and many o’s.

This corresponds to one row (i.e., record) of a relational database table with multiple fields (columns):

     p1   p2   p3
s    o1   o2   o3

This leads to the ‘directed graph’, which shows triples as ‘edges’ (labeled by predicates) radiating from one subject ‘node’ to different object nodes. The picture is shown for one row only!

[Figure: directed graph with edges p1, p2, and p3 leading from node s to nodes o1, o2, and o3.]


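Sample Table

sampleID   lithology   type     purpose
N235       basalt      powder   K-Ar dating
N300       granite     chip     thin section

[Figure: directed graph, shown only for N235. It contains an Investigator node, an edge labeled "takes", and the node SampleID N235, from which edges labeled lithology, type, and purpose lead to basalt, powder, and K-Ar dating. In the table, the rows are marked S1 and S2 and the lithology, type, and purpose columns are marked p1, p2, and p3.]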
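The step from table rows to a directed graph can be sketched by grouping triples by subject, so that each subject node carries its outgoing labeled edges. The triples are taken from the Sample Table; the helper outgoing_edges and the use of bare column names as predicates are simplifications for illustration.

```python
from collections import defaultdict

triples = [
    ("N235", "lithology", "basalt"),
    ("N235", "type", "powder"),
    ("N235", "purpose", "K-Ar dating"),
    ("N300", "lithology", "granite"),
    ("N300", "type", "chip"),
    ("N300", "purpose", "thin section"),
]


def outgoing_edges(triples):
    """Map each subject node to its list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = outgoing_edges(triples)
for predicate, obj in graph["N235"]:
    print(f"N235 --{predicate}--> {obj}")
```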


URI (Uniform Resource Identifier)

Merging a distributed group of directed graphs requires mapping the nodes in each graph.

Even if nodes in different graphs have the same name, it is not guaranteed that the nodes refer to the same resource!

To make matching of the nodes possible, we need to use the URI (Uniform Resource Identifier), which is a superset of the URL (every URL is a URI, but not the other way around).

A URI is a global identifier for a resource (it can carry information such as the protocol, server name, port number, and file name), which is required for global networking.

A URI refers to either a Web name or a Web location, whereas a URL refers only to a Web location.


URI Prefix

Nodes from two graphs can be merged if they have the same URI.

We use a prefix to stand for a long URI string, e.g., the prefixes ‘geochem’ and ‘struc’ can represent the Geochemistry and Structural Geology ontologies, which may have the URIs:

http://www.usgs.org/ontologies/Geochemistry.owl#
http://www.usgs.org/ontologies/StructuralGeology.owl#

If the Geochemistry or Structural Geology ontology has a class called Analysis or Foliation, respectively, we designate them as:

geochem:Analysis
struc:Foliation
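Prefix expansion, and the rule that two nodes merge only when their full URIs are equal, can be sketched as follows. The two namespace URIs are the ones given above; the helpers expand and same_node are hypothetical.

```python
PREFIXES = {
    "geochem": "http://www.usgs.org/ontologies/Geochemistry.owl#",
    "struc": "http://www.usgs.org/ontologies/StructuralGeology.owl#",
}


def expand(qname, prefixes=PREFIXES):
    """Turn a qualified name such as 'geochem:Analysis' into a full URI."""
    prefix, local_name = qname.split(":", 1)
    return prefixes[prefix] + local_name


def same_node(qname_a, qname_b, prefixes=PREFIXES):
    """Two nodes can be merged only if their full URIs are identical."""
    return expand(qname_a, prefixes) == expand(qname_b, prefixes)


print(expand("geochem:Analysis"))
print(same_node("geochem:Analysis", "geochem:Analysis"))  # prints True
print(same_node("geochem:Analysis", "struc:Analysis"))    # prints False
```

The last line shows why the local name alone is not enough: the same name under two namespaces yields two different URIs.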


Default Namespace

If there is only one (default) namespace, we write the class name as a colon followed by the class name (e.g., :Fracture).

OWL, RDF, RDFS, and XSD each have their own standard namespace.

Thus, rdf:type is a typing construct in the rdf namespace. Here are some more examples:

struc:Fold rdf:type struc:Structure
geochem:oxidize rdf:type rdf:Property


Relational database tables and RDF

Each row in a relational table represents a single record.

Each record maps to an individual entity.

This means that each row should have a unique URI, which in the database is represented by the unique identifier (ID column, the primary key).

ID   p1    p2    p3
s1   o11   o12   o13    (record)
s2   o21   o22   o23    (record)


Relational Database to RDF Graph

The best practice is to design a URI for the table, with a prefix:

xmlns:geochem = http://www.gsi.ir/ontologies/geochemistry.owl#Sample

We identify each row by concatenating the table name (Sample) with the ID of the row, for example, geochem:Sample1, geochem:Sample2, etc.

To make the fields unique as well, we concatenate the table name (Sample) with the column name, for example: geochem:Sample_lithology, geochem:Sample_type, geochem:Sample_purpose.

geochem:Sample

ID   lithology   type   purpose
1
2


Example for RDB to RDF

Notice that, during conversion of a relational table to RDF, each cell in the table converts into one RDF triple.

In the table on the next slide, we have 7 rows and 5 columns, which lead to 35 triples.

Note: Only triples for two samples are shown!


geochem:Sample

ID   number   location   analysis        lithology
1    N122     Neyriz     REE             Gabbro
2    N150     Neyriz     Trace element   Pyroxenite
3    Z338     Zabol      Pb Isotope      Basalt
4    R120     Rasht      Sr Isotope      Granite
5    S214     Sabzevar   XRD             Gabbro
6    R123     Rasht      XRD             Granite
7    S220     Sabzevar   Major oxides    Dunite

[Figure: the table annotated with row identifiers geochem:Sample1 through geochem:Sample7 and with column identifiers formed from geochem:Sample plus the column names number, location, analysis, and lithology.]


Relational database (RDB) to RDF

Fields (columns) of the table become properties (predicates):

geochem:Sample_number
geochem:Sample_location
etc.

Each row provides the subject, for example:

geochem:Sample1
geochem:Sample2
etc.

The following table shows part of the RDF graph of the previous Sample table in the Geochemistry database:


Subject           Predicate                 Object
geochem:Sample1   geochem:sampleId          1
geochem:Sample1   geochem:sampleNumber      N122
geochem:Sample1   geochem:sampleLocation    Neyriz
geochem:Sample1   geochem:sampleAnalysis    REE
geochem:Sample1   geochem:sampleLithology   Gabbro
geochem:Sample2   geochem:sampleId          2
geochem:Sample2   geochem:sampleNumber      N150
geochem:Sample2   geochem:sampleLocation    Neyriz
geochem:Sample2   geochem:sampleAnalysis    Trace element
geochem:Sample2   geochem:sampleLithology   Pyroxenite
…                 …                         …

RDF triples for the Sample table in the Geochemistry database (only 2 samples shown!)


In this case, the objects are not class (object) resources. Here they are literal values (i.e., strings).

The type of each individual (i.e., each row) is the table (in this case, Sample).

These types are also given in the RDF graph.

Subject           Predicate   Object
geochem:Sample1   rdf:type    geochem:Sample
geochem:Sample2   rdf:type    geochem:Sample
geochem:Sample3   rdf:type    geochem:Sample
geochem:Sample4   rdf:type    geochem:Sample
geochem:Sample5   rdf:type    geochem:Sample
geochem:Sample6   rdf:type    geochem:Sample
geochem:Sample7   rdf:type    geochem:Sample
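The whole conversion can be sketched in Python: one triple per cell, plus one rdf:type triple per row. The helper table_to_triples is hypothetical; it assumes the first column holds the row ID and builds predicate names with the "table name + underscore + column name" convention described earlier (the triples table above spells its predicates in camelCase instead).

```python
COLUMNS = ["ID", "number", "location", "analysis", "lithology"]
ROWS = [
    (1, "N122", "Neyriz", "REE", "Gabbro"),
    (2, "N150", "Neyriz", "Trace element", "Pyroxenite"),
    (3, "Z338", "Zabol", "Pb Isotope", "Basalt"),
    (4, "R120", "Rasht", "Sr Isotope", "Granite"),
    (5, "S214", "Sabzevar", "XRD", "Gabbro"),
    (6, "R123", "Rasht", "XRD", "Granite"),
    (7, "S220", "Sabzevar", "Major oxides", "Dunite"),
]


def table_to_triples(table, columns, rows, prefix="geochem"):
    """Return (cell_triples, type_triples) for one relational table."""
    cell_triples, type_triples = [], []
    for row in rows:
        subject = f"{prefix}:{table}{row[0]}"  # table name + row ID
        type_triples.append((subject, "rdf:type", f"{prefix}:{table}"))
        for column, value in zip(columns, row):
            predicate = f"{prefix}:{table}_{column}"  # table name + column name
            cell_triples.append((subject, predicate, value))  # literal object
    return cell_triples, type_triples


cells, types = table_to_triples("Sample", COLUMNS, ROWS)
print(len(cells), len(types))  # prints 35 7
print(cells[1])
```

The count of 35 cell triples matches 7 rows times 5 columns; the 7 type triples are additional.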