
Ontology-based Integration of XML Web resources

Irini Fundulaki

CNAM-Paris, INRIA-Futurs (France)

Bernd Amann, Michel Scholl

CNAM-Paris, INRIA-Futurs (France)

Catriel Beeri

The Hebrew University, Jerusalem


Irini Fundulaki, BDA 2002, Evry

The World according to XML

• XML is the standard for the representation and exchange of Web data

• Success of XML: “semantic” tags → structured querying

• But:

– “Semantic” tags are not always appropriate

– Semantics is hidden in the document structure

– XML DTDs can be very complex

• Solution: ontologies → semantic querying


Outline

• Problems for querying XML sources

• The STYX approach for querying and integrating XML Web sources

– The Ontology

– Publishing XML sources

– Answering Queries

– Semantic Keys

• Conclusions and Contributions


The XML World : A simple example

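<!ELEMENT Film (Crew)>
<!ATTLIST Film Title CDATA #REQUIRED>
<!ELEMENT Crew (Member*)>
<!ELEMENT Member EMPTY>
<!ATTLIST Member Name CDATA #REQUIRED>

[Figure: tree of a sample document. A Film element has a Title attribute (‘Intervention Divine’) and a Crew child; the Crew contains Member elements whose Name attributes are ‘Suleiman’, ‘Khader’ and ‘Yitzak’.]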


XML World : What about Semantics and Querying ?

• What about querying ?

– Be aware of the XML query language supported by the source

– Be aware of the structure and the semantics

• Where are the semantics ?

– Some in the DTD: element names and parent/child relationships

• a Film element “contains” a Title and a Crew

– Some in the XML document structure:

• the first Member element of a Crew represents the film’s director

• the second Member element represents the film’s assistant director

Ask God for the semantics! (i.e. the source administrator)


Querying the XML World

• Simple Query: «The director and assistant director of the film ‘Intervention Divine’»

• Simple (?) XQuery expression:

FOR $a IN document('URL')/Film,
    $b IN $a/Crew/Member[1],
    $c IN $b/following-sibling::*[1]
WHERE $a/@Title = 'Intervention Divine'
RETURN $b/@Name, $c/@Name
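
The query works only because the reader knows that the position of a Member encodes its role. As a minimal sketch of the same lookup, the Python below uses the standard library's xml.etree.ElementTree. The sample document is an assumption built from the DTD and the names on the earlier slide, and the relative order of the last two Member elements is guessed for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the title and the three names come
# from the slides, the order of the last two members is assumed.
SAMPLE = """
<Film Title="Intervention Divine">
  <Crew>
    <Member Name="Suleiman"/>
    <Member Name="Khader"/>
    <Member Name="Yitzak"/>
  </Crew>
</Film>
"""

def director_and_assistant(xml_text, title):
    """Return (director, assistant) names, relying only on Member order."""
    film = ET.fromstring(xml_text)
    if film.tag != "Film" or film.get("Title") != title:
        return None
    members = film.findall("./Crew/Member")
    if len(members) < 2:
        return None
    # Member[1] plays the director; its first following sibling plays
    # the assistant director. Nothing in the tag names says so.
    return members[0].get("Name"), members[1].get("Name")

print(director_and_assistant(SAMPLE, "Intervention Divine"))
```

If the source reordered its Member elements, the function would silently return the wrong people, which is the weakness the slide points at.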


From the XML World to the Semantic Web

• XML alone does not answer the needs of the Semantic Web

• Need for richer models that clarify the semantics of XML data: rich domain schemas (e.g. ontologies)

• Applications of the Semantic Web:

– Querying and

– Data Integration


The STYX approach for integrating and querying XML Web resources

• Integrating XML resources:

– Integration schema (Ontology): conceptual schema with semantic keys, symmetric relationships and inheritance

– XML resources are described by mapping rules between paths in the XML tree (XPath location paths) and ontology paths

• Query Mediation:

– User queries are defined in terms of the ontology

– Query rewriting using mapping rules

– Query evaluation over multiple sources

– Joining the results using semantic keys


A ‘Simple’ World Assumption

• Domain of interest contains:

– Entities, semantic relationships between entities and properties of entities

• The STYX Ontology models the domain of interest and comprises:

– Concepts

– Symmetric binary roles between concepts

– Attributes of concepts

– Inheritance relations to model commonality of structures and subset relationships between concepts


Example of a (simple) STYX Ontology

[Figure: ontology graph. Concepts: FILM, POLITICAL FILM, PERSON, EVENT, PLACE. Roles (inverse roles in parentheses): directed by (directed), actor (played in), assisted by (assisted), filmed (filming of), took place at (place of). Attributes: has title : String, has name : String, took place in : Integer. Legend: Concepts, Inheritance Relations, Roles, Inverse Roles, Attributes.]

Semantics? No need to ask God!


Querying in STYX

• Simple Query: «The director and assistant director of the film ‘Intervention Divine’»

SELECT e, f
FROM FILM a,
     a.has_title b,
     a.directed_by c,
     c.assisted_by d,
     c.has_name e,
     d.has_name f
WHERE b = 'Intervention Divine'

– FILM a: get the film

– a.has_title b: get its title

– a.directed_by c: get the director

– c.assisted_by d: get the assistant director

– c.has_name e, d.has_name f: get their names

– WHERE: check the title

– SELECT: return the requested values

Publishing XML sources in STYX

[Figure: the example ontology graph shown next to the XML tree of the source (Film with Title and Crew, Crew with Member, Member with Name). The mapping rules are added one at a time, each linking an XPath location path in the source to an ontology path.]

R1 : URL/Film as u1 ← POLITICAL FILM

R2 : u1/@Title as u2 ← has title

R3 : u1/Crew/Member[1] as u3 ← directed by

R4 : u3/following-sibling::*[1] as u4 ← assisted by

R5 : u3/@Name as u5 ← has name

R6 : u4/@Name as u6 ← has name
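
Each rule starts either at the source URL or at the variable of an earlier rule, so the six rules form a tree of relative XPath steps. The sketch below is one possible in-memory representation, not the STYX implementation; the `Rule` class, its field names and the `absolute_xpath` helper are all hypothetical. It expands a rule into a single path rooted at the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str        # rule label, e.g. "R3"
    base: str        # "URL" for a root rule, else the variable it starts from
    xpath: str       # relative XPath location path
    var: str         # variable bound to the selected nodes
    onto_path: str   # ontology concept, role or attribute

# The six rules of the example source, copied from the slides.
RULES = [
    Rule("R1", "URL", "Film", "u1", "POLITICAL FILM"),
    Rule("R2", "u1", "@Title", "u2", "has title"),
    Rule("R3", "u1", "Crew/Member[1]", "u3", "directed by"),
    Rule("R4", "u3", "following-sibling::*[1]", "u4", "assisted by"),
    Rule("R5", "u3", "@Name", "u5", "has name"),
    Rule("R6", "u4", "@Name", "u6", "has name"),
]

def absolute_xpath(rule, rules):
    """Expand a rule's chain of variables into one path rooted at URL."""
    by_var = {r.var: r for r in rules}
    if rule.base not in by_var:          # root rule: base is the source URL
        return f"{rule.base}/{rule.xpath}"
    return f"{absolute_xpath(by_var[rule.base], rules)}/{rule.xpath}"

for r in RULES:
    print(r.name, absolute_xpath(r, RULES), "<-", r.onto_path)
```

Keeping the paths relative, as the rules do, is what lets a rewriting step later reuse the variable of one rule as the starting point of the next.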


Querying in STYX

• Queries are simple tree queries expressed in terms of the STYX ontology

– No joins, restructuring, aggregation

• Query Evaluation over multiple sources

– A source returns only a subset of the possible answers for the query

– To get additional answers, we must evaluate the query over all published sources

– The partial results are finally processed by the mediator


Querying one source in STYX

• To evaluate a query over a source:

I. find the mapping rules that give answers to the query variables (binding variables to rules)

II. rewrite the query into an XML query expressed in the schema of the XML source

III. the XML query is evaluated by the source

IV. the answers are returned to the STYX mediator
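
Step I can be sketched as a top-down walk over the query tree. The Python below is a simplified illustration, not the STYX algorithm: it keeps only the first matching rule per variable, whereas a real mediator would have to consider every alternative. The tuple layout, the `ISA` table and the function names are assumptions, as is the reading that POLITICAL FILM is a sub-concept of FILM.

```python
# Mapping rules of the example source:
# name -> (base, relative XPath, variable, ontology path)
RULES = {
    "R1": ("URL", "Film", "u1", "POLITICAL FILM"),
    "R2": ("u1", "@Title", "u2", "has title"),
    "R3": ("u1", "Crew/Member[1]", "u3", "directed by"),
    "R4": ("u3", "following-sibling::*[1]", "u4", "assisted by"),
    "R5": ("u3", "@Name", "u5", "has name"),
    "R6": ("u4", "@Name", "u6", "has name"),
}
# Assumption: POLITICAL FILM inherits from FILM in the ontology.
ISA = {"POLITICAL FILM": "FILM"}

# Query tree: root variable with its concept, then
# (parent variable, ontology path, child variable) edges, listed top-down.
ROOT = ("a", "FILM")
EDGES = [
    ("a", "has title", "b"),
    ("a", "directed by", "c"),
    ("c", "assisted by", "d"),
    ("c", "has name", "e"),
    ("d", "has name", "f"),
]

def is_a(concept, target):
    """True if concept equals target or inherits from it."""
    while concept is not None:
        if concept == target:
            return True
        concept = ISA.get(concept)
    return False

def bind(root, edges, rules):
    """Bind each query variable to the first rule that can answer it."""
    root_var, concept = root
    bindings = {}
    for name, (base, _xpath, _var, onto) in rules.items():
        if base == "URL" and is_a(onto, concept):
            bindings[root_var] = name
            break
    for parent, label, child in edges:
        if parent not in bindings:
            continue                      # parent unbound: child stays unbound
        parent_var = rules[bindings[parent]][2]
        for name, (base, _xpath, _var, onto) in rules.items():
            if base == parent_var and onto == label:
                bindings[child] = name
                break
    return bindings

print(bind(ROOT, EDGES, RULES))
```

A variable that no rule can answer is simply left out of the result, which corresponds to a partial binding.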

Query Rewriting in STYX

«The director and assistant director of the film ‘Intervention Divine’»

[Figure: query tree. The root variable a ranges over FILM; from a, has title leads to b and directed by leads to c; from c, assisted by leads to d and has name leads to e; from d, has name leads to f.]

R1 : URL/Film as u1 ← POLITICAL FILM

R2 : u1/@Title as u2 ← has title

R3 : u1/Crew/Member[1] as u3 ← directed by

R4 : u3/following-sibling::*[1] as u4 ← assisted by

R5 : u3/@Name as u5 ← has name

R6 : u4/@Name as u6 ← has name

Variable to Rule Bindings, built up one variable at a time:

[a → R1]

[a → R1, b → R2]

[a → R1, b → R2, c → R3]

[a → R1, b → R2, c → R3, d → R4, e → R5, f → R6] (Full Binding)


Rewriting to XQuery expression

[Figure: the query tree (a : FILM; has title to b; directed by to c; assisted by to d; has name to e and to f) shown next to the same tree with each edge replaced by the XPath of the bound rule: URL/Film, @Title, Crew/Member[1], following-sibling::*[1], @Name, @Name.]

a → R1 ( URL/Film )

b → R2 ( a/@Title )

c → R3 ( a/Crew/Member[1] )

d → R4 ( c/following-sibling::*[1] )

e → R5 ( c/@Name )

f → R6 ( d/@Name )

FOR $a IN document('URL')/Film,
    $b IN $a/@Title,
    $c IN $a/Crew/Member[1],
    $d IN $c/following-sibling::*[1],
    $e IN $c/@Name,
    $f IN $d/@Name
WHERE $b = 'Intervention Divine'
RETURN $e, $f
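
Once every variable is bound, producing the XQuery text is mostly string assembly: one FOR clause per variable, rooted either at the document or at the parent variable. The sketch below illustrates this under stated assumptions; the function `to_xquery`, its parameters and the three lookup tables are hypothetical names, and the output uses the draft-era syntax of the slides (`document(...)`, upper-case keywords).

```python
# Relative XPath of each rule, and the full binding from the slides.
RULE_XPATH = {
    "R1": "Film",
    "R2": "@Title",
    "R3": "Crew/Member[1]",
    "R4": "following-sibling::*[1]",
    "R5": "@Name",
    "R6": "@Name",
}
BINDINGS = {"a": "R1", "b": "R2", "c": "R3", "d": "R4", "e": "R5", "f": "R6"}
# Parent of each non-root variable in the query tree.
PARENT = {"b": "a", "c": "a", "d": "c", "e": "c", "f": "d"}

def to_xquery(bindings, parent, rule_xpath, condition, returns, url="URL"):
    """Assemble a FOR/WHERE/RETURN expression from a full binding."""
    clauses = []
    for var, rule in bindings.items():     # parents are listed before children
        if var in parent:
            source = f"${parent[var]}/{rule_xpath[rule]}"
        else:
            source = f"document('{url}')/{rule_xpath[rule]}"
        clauses.append(f"${var} IN {source}")
    for_part = "FOR " + ",\n    ".join(clauses)
    where_var, value = condition
    where_part = f"WHERE ${where_var} = '{value}'"
    return_part = "RETURN " + ", ".join(f"${v}" for v in returns)
    return "\n".join([for_part, where_part, return_part])

print(to_xquery(BINDINGS, PARENT, RULE_XPATH,
                condition=("b", "Intervention Divine"),
                returns=["e", "f"]))
```

The sketch does no escaping of the constant in the WHERE clause, so it should not be fed untrusted strings as written.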


What about queries that cannot be answered by a source ?

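«The director and assistant director of the film ‘Intervention Divine’ and its year of creation?»

[Figure: the query tree (a : FILM; has title to b; directed by to c; assisted by to d; has name to e and to f) extended with an edge filmed.took place in from a to a new variable g.]

Variable to Rule Bindings:

[a → R1, b → R2, c → R3, d → R4, e → R5, f → R6]

Partial Binding: no rule of the source binds g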


Partial Bindings

• To get a full answer, we need to evaluate the sub-query that the source cannot answer over the other sources and then join the partial results

• To obtain this sub-query (or these sub-queries) we need to decompose the query into:

1. a prefix query that the source answers

2. one or more suffix queries (sub-queries) that are possibly answered by the other sources

• To join, we need keys!
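
The decomposition can be sketched as a split of the query-tree edges according to which variables the source managed to bind. The Python below is an illustration only: the edge-list representation and the choice to emit one suffix query per unbound branch are assumptions, not details given in the talk.

```python
# Query tree edges (parent, ontology path, child), listed top-down,
# for the query that also asks for the year of creation.
EDGES = [
    ("a", "has title", "b"),
    ("a", "directed by", "c"),
    ("c", "assisted by", "d"),
    ("c", "has name", "e"),
    ("d", "has name", "f"),
    ("a", "filmed.took place in", "g"),
]
# Variables bound by the partial binding of the example source.
BOUND = {"a", "b", "c", "d", "e", "f"}

def decompose(edges, bound):
    """Split edges into a prefix query and a list of suffix queries."""
    prefix, suffixes, branch_of = [], [], {}
    for edge in edges:
        parent, _label, child = edge
        if child in bound:
            prefix.append(edge)             # the source can answer this edge
        elif parent in bound:
            branch_of[child] = len(suffixes)
            suffixes.append([edge])         # a new unbound branch starts here
        else:
            branch_of[child] = branch_of[parent]
            suffixes[branch_of[parent]].append(edge)
    return prefix, suffixes

prefix, suffixes = decompose(EDGES, BOUND)
print("prefix:", prefix)
print("suffixes:", suffixes)
```

Each suffix query keeps its first edge's parent (here a) as its root, which is the variable on which the partial results later have to be joined.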


Semantic Keys in STYX : Ontology Revisited

• XML keys

– Local ID/IDREF attributes (internal pointers)

– XML Schema keys are defined in terms of local element/attribute values

• No formal agreement!

• Solution: define keys at the ontology level!


Semantic Keys in STYX : Ontology Revisited

• Semantic keys are defined on concepts of the ontology, independently of any keys defined at the XML sources

• A key for a concept is a set of attribute paths

– Example: a film is identified by its title

• Instances of concepts are identified by the values of the keys obtained by the mapping rules

Decomposing the query

[Figure: the full query tree (a : FILM; has title to b; directed by to c; assisted by to d; has name to e and to f; filmed.took place in to g) is split into two trees.]

PREFIX QUERY: a : FILM; has title to b; directed by to c; assisted by to d; has name to e and to f

Variables to Rules Binding : [a → R1, b → R2, c → R3, d → R4, e → R5, f → R6]

SUFFIX QUERY: a : FILM; filmed.took place in to g


After Decomposition : Add Keys

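[Figure: the prefix and suffix query trees, each extended with an edge has title from a to a new key variable t.]

PREFIX QUERY: a : FILM; has title to b and to t; directed by to c; assisted by to d; has name to e and to f

SUFFIX QUERY: a : FILM; filmed.took place in to g; has title to t

The join between the prefix and the suffix queries is the join between the values of variable t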
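
At the mediator, this amounts to an equi-join of the two partial results on the key variable. A minimal sketch follows, assuming each partial result arrives as a list of dictionaries from variable names to values; the function name and all rows are made up for illustration and are not results reported in the talk.

```python
def join_on_key(prefix_rows, suffix_rows, key="t"):
    """Hash-join two lists of variable bindings on a shared key variable."""
    index = {}
    for row in suffix_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in prefix_rows:
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined

# Illustrative rows only: one answer of the prefix query from the example
# source, and two answers of the suffix query from a hypothetical second source.
prefix_rows = [{"t": "Intervention Divine", "e": "Suleiman", "f": "Khader"}]
suffix_rows = [
    {"t": "Intervention Divine", "g": 2002},
    {"t": "Another Film", "g": 1999},
]

print(join_on_key(prefix_rows, suffix_rows))
```

Because the key is defined at the ontology level (a film is identified by its title), the join works even though the two sources share no XML-level identifiers.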


Conclusions and Contributions

• Adding semantics to XML

– Ontology = rich description of the domain of interest

– Simple but powerful mapping language that associates XPath location paths to ontology paths

– Semantic keys for XML data integration

• Integration System for XML : STYX prototype

– Implementation of the query rewriting and query decomposition algorithms

– Web application