Ontology-based Integration of XML Web resources
Irini Fundulaki
CNAM-Paris, INRIA-Futurs (France)
Bernd Amann, Michel Scholl
CNAM-Paris, INRIA-Futurs (France)
Catriel Beeri
The Hebrew University, Jerusalem
Irini Fundulaki, BDA 2002, Evry
The World according to XML
• XML is the standard for the representation and
exchange of Web data
• Success of XML: “semantic” tags → structured
querying
• But :
– “Semantic” tags are not always appropriate
– Semantics is hidden in the document structure
– XML DTDs can be very complex
• Solution: ontologies → semantic querying
Outline
• Problems for querying XML sources
• The STYX approach for querying and integrating
XML Web sources
– The Ontology
– Publishing XML sources
– Answering Queries
– Semantic Keys
• Conclusions and Contributions
The XML World : A simple example
<!ELEMENT Film (Crew)>
<!ATTLIST Film Title CDATA #REQUIRED>
<!ELEMENT Crew (Member*)>
<!ELEMENT Member EMPTY>
<!ATTLIST Member Name CDATA #IMPLIED>
[Figure: a document tree for this DTD: a Film element with Title ‘Intervention Divine’ and a Crew element whose Member children carry the Names ‘Suleiman’, ‘Khader’ and ‘Yitzak’]
XML World : What about Semantics and Querying ?
• What about querying ?
– Be aware of the XML query language supported by the source
– Be aware of the structure and the semantics
• Where are the semantics ?
– Some in the DTD: element names and parent/child
relationships
• a Film element “contains” a Title attribute and a Crew element
– Some in the XML document structure:
• the first Member element of Crew represents the film’s director
• the second Member element represents the film’s assistant
director
Ask God for the semantics! (i.e. the source administrator)
Querying the XML World
• Simple Query :«The director and assistant director
of the film ‘Intervention Divine’»
• Simple (?) XQuery expression :
FOR $a IN document('URL')/Film,
    $b IN $a/Crew/Member[1],
    $c IN $b/following-sibling::*[1]
WHERE $a/@Title = 'Intervention Divine'
RETURN $b/@Name, $c/@Name
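To see how much of the meaning lives in element positions, here is a minimal Python sketch that evaluates the same query with the standard `xml.etree.ElementTree` module. The sample document and the order of its Member elements are assumptions for illustration, and `director_and_assistant` is a hypothetical helper. ElementTree has no `following-sibling` axis, so the sketch takes the next Member in document order instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical document conforming to the DTD; the Member order is assumed.
DOC = """
<Film Title="Intervention Divine">
  <Crew>
    <Member Name="Suleiman"/>
    <Member Name="Khader"/>
    <Member Name="Yitzak"/>
  </Crew>
</Film>
"""

def director_and_assistant(xml_text, title):
    """Return (director, assistant) names, reading the roles from element positions."""
    film = ET.fromstring(xml_text.strip())
    if film.tag != "Film" or film.get("Title") != title:
        return None
    crew = film.find("Crew")
    members = crew.findall("Member") if crew is not None else []
    if len(members) < 2:
        return None
    # members[0] plays the role of Crew/Member[1];
    # members[1] plays the role of following-sibling::*[1]
    return members[0].get("Name"), members[1].get("Name")

print(director_and_assistant(DOC, "Intervention Divine"))  # ('Suleiman', 'Khader')
```

Nothing in the document itself says that position 1 means "director"; that knowledge is hard-coded in the query.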
From the XML World to the Semantic Web
• XML alone does not meet the needs of the
Semantic Web
• Need for richer models that make the
semantics of XML data precise: rich domain schemas
(e.g. ontologies)
• Applications of the Semantic Web:
– Querying and
– Data Integration
The STYX approach for integrating and querying XML Web resources
• Integrating XML resources:
– Integration schema (Ontology): conceptual schema with
semantic keys, symmetric relationships and inheritance
– XML resources are described by mapping rules between
paths in the XML tree (XPath location paths) and
ontology paths
• Query Mediation:
– User queries are defined in terms of the ontology
– Query rewriting using mapping rules
– Query evaluation over multiple sources
– Joining the results using semantic keys
A ‘Simple’ World Assumption
• Domain of interest contains:
– Entities, semantic relationships between entities and
properties of entities
• The STYX Ontology models the domain of interest
and consists of:
– Concepts
– symmetric binary roles between concepts
– attributes of concepts and
– inheritance relations to model commonality of structures
and subset relationships between concepts
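The four ingredients above can be sketched as a small data structure. This is a toy Python model, not the STYX implementation: the `Ontology` class, its field names, single-parent inheritance, and the domains and ranges used in the example are all assumptions. It shows two properties from the slide: every role is registered together with its inverse, and inheritance lets a subconcept stand in for its superconcept.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Toy ontology: concepts, binary roles with inverses, typed attributes,
    and single-parent inheritance (a simplifying assumption)."""
    concepts: set = field(default_factory=set)
    roles: dict = field(default_factory=dict)       # role -> (domain, range)
    inverses: dict = field(default_factory=dict)    # role -> inverse role
    attributes: dict = field(default_factory=dict)  # attribute -> (concept, datatype)
    parent: dict = field(default_factory=dict)      # concept -> direct superconcept

    def add_role(self, name, domain, range_, inverse):
        # Register the role and its inverse together, so every role is symmetric.
        self.roles[name] = (domain, range_)
        self.roles[inverse] = (range_, domain)
        self.inverses[name] = inverse
        self.inverses[inverse] = name

    def is_a(self, concept, ancestor):
        # Follow the inheritance chain upwards; a concept is-a itself.
        while concept is not None:
            if concept == ancestor:
                return True
            concept = self.parent.get(concept)
        return False

# Fragment of the example ontology; the domains and ranges are assumptions.
onto = Ontology(concepts={"FILM", "POLITICAL FILM", "PERSON"})
onto.parent["POLITICAL FILM"] = "FILM"
onto.add_role("directed by", "FILM", "PERSON", inverse="directed")
onto.add_role("assisted by", "PERSON", "PERSON", inverse="assisted")
onto.attributes["has title"] = ("FILM", "String")
onto.attributes["has name"] = ("PERSON", "String")
```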
Example of a (simple) STYX Ontology
[Figure: example STYX ontology]
Concepts: FILM, POLITICAL FILM (inherits from FILM), PERSON, EVENT, PLACE
Roles, with their inverse roles in parentheses: directed by (directed), actor (played in), assisted by (assisted), filmed (filming of), took place at (place of)
Attributes: has title (String), has name (String), took place in (Integer)
Semantics? No need to ask God!
Querying in STYX
• Simple Query :«The director and assistant director
of the film ‘Intervention Divine’»
SELECT e, f
FROM FILM a,
     a.has_title b,
     a.directed_by c,
     c.assisted_by d,
     c.has_name e,
     d.has_name f
WHERE b = 'Intervention Divine'
Get the film (a) and its title (b), get the director (c) and the assistant director (d), get their names (e, f), check the title, and return the requested values.
Publishing XML sources in STYX
[Figure: the example STYX ontology shown next to the XML source tree (Film with Title and Crew; Crew with Member elements carrying a Name)]
R1 : URL/Film as u1 POLITICAL FILM
R2 : u1/@Title as u2 has title
R3 : u1/Crew/Member[1] as u3 directed by
R4 : u3/following-sibling::*[1] as u4 assisted by
R5: u3/@Name as u5 has name
R6: u4/@Name as u6 has name
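Rules R1 to R6 chain together: each relative path starts from a variable bound by an earlier rule. A minimal Python sketch, under the assumption that the rules are listed so every variable is bound before it is used, can unfold that chain into a path from the source root. The tuple encoding and the `absolute_paths` helper are hypothetical.

```python
# R1..R6 from the slide, encoded as (rule, source path, bound variable, ontology target).
RULES = [
    ("R1", "URL/Film", "u1", "POLITICAL FILM"),
    ("R2", "u1/@Title", "u2", "has title"),
    ("R3", "u1/Crew/Member[1]", "u3", "directed by"),
    ("R4", "u3/following-sibling::*[1]", "u4", "assisted by"),
    ("R5", "u3/@Name", "u5", "has name"),
    ("R6", "u4/@Name", "u6", "has name"),
]

def absolute_paths(rules):
    """Map each bound variable to a path from the source root by unfolding
    the variable that starts each relative path (rules must be ordered)."""
    paths = {}
    for _rule, path, var, _target in rules:
        head, sep, rest = path.partition("/")
        if head in paths:
            paths[var] = paths[head] + sep + rest  # relative to an earlier variable
        else:
            paths[var] = path                      # already starts at the source URL
    return paths

for var, path in absolute_paths(RULES).items():
    print(var, path)
```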
Querying in STYX
• Queries are simple tree queries expressed in
terms of the STYX ontology
– No joins, restructuring or aggregation
• Query Evaluation over multiple sources
– A source returns only a subset of the possible answers
to the query
– To get additional answers, we must evaluate the query
over all published sources
– The partial results are finally processed by the mediator
Querying one source in STYX
• To evaluate a query over a source:
I. find the mapping rules that give answers to the
query variables (binding variables to rules)
II. rewrite the query into an XML query expressed in the
schema of the XML source
III. the XML query is evaluated by the source
IV. and the answers are returned to the STYX mediator
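Step I can be sketched for the example query as a greedy matcher in Python. This is a simplification, not the STYX algorithm: it binds each variable to the first compatible rule rather than enumerating every possible binding, and the dictionary encodings and the `bind` and `is_a` helpers are hypothetical. A root variable accepts a rule that targets its concept or a subconcept (R1 maps to POLITICAL FILM, which is a FILM); an inner variable accepts a rule with the same ontology label whose path continues from the rule bound to its parent.

```python
# Query tree of the example: variable -> (parent variable, ontology label).
QUERY = {
    "a": (None, "FILM"),
    "b": ("a", "has title"),
    "c": ("a", "directed by"),
    "d": ("c", "assisted by"),
    "e": ("c", "has name"),
    "f": ("d", "has name"),
}
# Rules R1..R6: rule -> (variable the path starts from, variable it binds, target).
RULES = {
    "R1": ("URL", "u1", "POLITICAL FILM"),
    "R2": ("u1", "u2", "has title"),
    "R3": ("u1", "u3", "directed by"),
    "R4": ("u3", "u4", "assisted by"),
    "R5": ("u3", "u5", "has name"),
    "R6": ("u4", "u6", "has name"),
}
SUPERCONCEPT = {"POLITICAL FILM": "FILM"}

def is_a(concept, ancestor):
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUPERCONCEPT.get(concept)
    return False

def bind(query, rules):
    """Greedily bind each query variable to the first compatible rule."""
    binding = {}
    for var, (parent, label) in query.items():  # parents are listed before children
        for rule, (start, _bound, target) in rules.items():
            if parent is None:
                # Root: the rule must start at the source and target the concept or a subconcept.
                ok = start == "URL" and is_a(target, label)
            else:
                # Inner edge: same label, and the path must continue from the parent's rule.
                parent_rule = binding.get(parent)
                ok = (parent_rule is not None and target == label
                      and start == rules[parent_rule][1])
            if ok:
                binding[var] = rule
                break
    return binding

print(bind(QUERY, RULES))
```

A variable with no compatible rule simply stays out of the result, which is how a partial binding shows up.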
Query Rewriting in STYX
«The director and assistant director of the film ‘Intervention Divine’»
[Figure: query tree: a (FILM) has title b; a directed by c; c assisted by d; c has name e; d has name f]
R1 : URL/Film as u1 POLITICAL FILM
R2 : u1/@Title as u2 has title
R3 : u1/Crew/Member[1] as u3 directed by
R4 : u3/following-sibling::*[1] as u4 assisted by
R5 : u3/@Name as u5 has name
R6 : u4/@Name as u6 has name
Variable to Rule Bindings
[a R1]
[a R1, b R2]
[a R1, b R2, c R3]
[a R1, b R2, c R3, d R4, e R5, f R6]
Full Binding
Rewriting to XQuery expression
a R1 (URL/Film)
b R2 (a/@Title)
c R3 (a/Crew/Member[1])
d R4 (c/following-sibling::*[1])
e R5 (c/@Name)
f R6 (d/@Name)
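[Figure: the query tree with each edge labelled by the XPath step of its rule: a = URL/Film; a to b: @Title; a to c: Crew/Member[1]; c to d: following-sibling::*[1]; c to e: @Name; d to f: @Name]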
FOR $a IN document('URL')/Film,
    $b IN $a/@Title,
    $c IN $a/Crew/Member[1],
    $d IN $c/following-sibling::*[1],
    $e IN $c/@Name,
    $f IN $d/@Name
WHERE $b = 'Intervention Divine'
RETURN $e, $f
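The rewriting step can be sketched in Python: given a full binding, emit one FOR clause per query variable, chaining each rule's XPath step onto the variable bound to its parent. The `STEPS` table (each rule's source path without its leading variable), the `rewrite` helper and the dictionary encodings are assumptions for illustration; the generated text follows the 2002-era XQuery syntax used on this slide.

```python
QUERY = {  # variable -> (parent variable, ontology label)
    "a": (None, "FILM"),
    "b": ("a", "has title"),
    "c": ("a", "directed by"),
    "d": ("c", "assisted by"),
    "e": ("c", "has name"),
    "f": ("d", "has name"),
}
BINDING = {"a": "R1", "b": "R2", "c": "R3", "d": "R4", "e": "R5", "f": "R6"}
# XPath step contributed by each rule; R1's source is written as a document() call.
STEPS = {
    "R1": "document('URL')/Film",
    "R2": "@Title",
    "R3": "Crew/Member[1]",
    "R4": "following-sibling::*[1]",
    "R5": "@Name",
    "R6": "@Name",
}

def rewrite(query, binding, where, returns):
    """Emit one FOR clause per bound variable, chained to its parent variable."""
    clauses = []
    for var, (parent, _label) in query.items():
        step = STEPS[binding[var]]
        source = step if parent is None else f"${parent}/{step}"
        clauses.append(f"${var} IN {source}")
    return ("FOR " + ",\n    ".join(clauses)
            + f"\nWHERE {where}"
            + "\nRETURN " + ", ".join(f"${v}" for v in returns))

print(rewrite(QUERY, BINDING, "$b = 'Intervention Divine'", ["e", "f"]))
```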
What about queries that cannot be answered by a source ?
«The director and assistant director of the film ‘Intervention
Divine’ and its year of creation ?»
[Figure: the previous query tree, extended with the path filmed.took place in from a to a new variable g]
Variable to Rule Bindings
[a R1, b R2, c R3, d R4, e R5, f R6]
No rule of this source maps filmed.took place in, so g stays unbound: Partial Binding
Partial Bindings
• To get a full answer, we need to send the
sub-query that the source cannot answer to the
other sources and then join the partial results
• To obtain these sub-queries, we
decompose the query into:
1. a prefix query that the source answers
2. one or more suffix queries (sub-queries)
that may be answered by the other
sources
• To join, we need keys!
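The decomposition can be sketched in Python under one assumption: the bound variables form a prefix of the query tree, so every bound variable has a bound parent (the greedy binding of the example has this property). Each unbound variable whose parent is bound starts a suffix query rooted at that parent, which is the variable shared with the prefix. The `decompose` helper and its output format are hypothetical.

```python
QUERY = {  # variable -> (parent variable, ontology label)
    "a": (None, "FILM"),
    "b": ("a", "has title"),
    "c": ("a", "directed by"),
    "d": ("c", "assisted by"),
    "e": ("c", "has name"),
    "f": ("d", "has name"),
    "g": ("a", "filmed.took place in"),
}
BINDING = {"a": "R1", "b": "R2", "c": "R3", "d": "R4", "e": "R5", "f": "R6"}

def decompose(query, binding):
    """Split a tree query into the prefix answered by the source and suffix queries."""
    prefix = {v: edge for v, edge in query.items() if v in binding}
    suffixes, owner = [], {}  # owner: unbound variable -> index of its suffix query
    for var, (parent, label) in query.items():  # parents are listed before children
        if var in binding:
            continue
        if parent is None or parent in binding:
            # First unbound variable on this branch: start a new suffix query
            # rooted at the bound parent, which the prefix also contains.
            owner[var] = len(suffixes)
            suffixes.append({"root": parent, "edges": {var: (parent, label)}})
        else:
            # Below an unbound variable: extend that variable's suffix query.
            owner[var] = owner[parent]
            suffixes[owner[var]]["edges"][var] = (parent, label)
    return prefix, suffixes

prefix, suffixes = decompose(QUERY, BINDING)
print(sorted(prefix))  # ['a', 'b', 'c', 'd', 'e', 'f']
print(suffixes)        # [{'root': 'a', 'edges': {'g': ('a', 'filmed.took place in')}}]
```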
Semantic Keys in STYX : Ontology Revisited
• XML keys
– Local ID/IDREF attributes (internal pointers)
– XML Schema keys are defined in terms of local
element/attribute values
• No agreement on keys across sources!
• Solution : define keys at the ontology
level !
Semantic Keys in STYX : Ontology Revisited
• Semantic keys are defined on concepts of the
ontology, independently of any keys
defined at the XML sources
• A key for a concept is a set of attribute paths
– Example : a film is identified by its title
• Instances of concepts are identified by the values
of the keys obtained by the mapping rules
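Identification by key values can be sketched in Python as follows. The `KEYS` declaration, the `key_of` and `merge` helpers, and the instance dictionaries (with an illustrative filming year) are all hypothetical. Two instances coming from different sources are treated as the same film because they agree on the key path "has title". In this sketch, later sources overwrite conflicting values, which is a simplification.

```python
# Hypothetical key declaration: concept -> attribute paths forming its key.
KEYS = {"FILM": ("has title",)}

def key_of(concept, instance):
    """The identity of an instance: its concept plus the values of its key paths."""
    return (concept,) + tuple(instance[path] for path in KEYS[concept])

def merge(concept, *sources):
    """Fuse instances from several sources that share the same key value."""
    merged = {}
    for source in sources:
        for instance in source:
            # Later sources win on conflicting values (a simplification).
            merged.setdefault(key_of(concept, instance), {}).update(instance)
    return merged

# Illustrative instances: one source knows the director, another the filming year.
source1 = [{"has title": "Intervention Divine", "directed by.has name": "Suleiman"}]
source2 = [{"has title": "Intervention Divine", "filmed.took place in": 2002}]
print(merge("FILM", source1, source2))
```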
Decomposing the query
[Figure: the full query tree (a FILM; has title b; directed by c; c assisted by d; c has name e; d has name f; filmed.took place in g) next to the prefix query, i.e. the same tree without g]
PREFIX QUERY
Variables to Rules Binding: [a R1, b R2, c R3, d R4, e R5, f R6]
SUFFIX QUERY: a (FILM) filmed.took place in g
After Decomposition : Add Keys
PREFIX QUERY: the prefix tree (a FILM; has title b; directed by c; c assisted by d; c has name e; d has name f), extended with a has title t
SUFFIX QUERY: a (FILM) filmed.took place in g, extended with a has title t
The join between the prefix and the suffix queries is the join between the values of variable t.
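Joining the two partial results on t can be sketched in Python as a hash join. The answer rows are hypothetical (the prefix rows assume the sample document used earlier; the second suffix row is an invented non-matching film), and a hash join is just one convenient choice. The slides only state that the join is on the values of t.

```python
def join_on(var, left, right):
    """Hash join of two lists of answer rows (dicts) on one shared variable."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[var], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[var], [])]

# Hypothetical partial answers: the prefix from the first source, the suffix from another.
prefix_rows = [{"t": "Intervention Divine", "e": "Suleiman", "f": "Khader"}]
suffix_rows = [{"t": "Intervention Divine", "g": 2002}, {"t": "Film X", "g": 1999}]
print(join_on("t", prefix_rows, suffix_rows))
```

Only the suffix row whose title matches contributes; the row for "Film X" finds no partner and is dropped.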
Conclusions and Contributions
• Adding semantics to XML
– Ontology = rich description of the domain of interest
– Simple but powerful mapping language that associates
XPath location paths with ontology paths
– Semantic keys for XML data integration
• Integration System for XML : STYX prototype
– Implementation of the query rewriting and query
decomposition algorithms
– Web application