introduction to query rewriting optimisation with dependencies
DESCRIPTION
Introduction to query rewriting optimisation with dependencies in APEX lab, Shanghai 2012.
TRANSCRIPT
Dependencies: Making Ontology Based Data Access Work in Practice
Mariano Rodriguez-Muro and Diego Calvanese
{rodriguez,calvanese}@inf.unibz.it
KRDB Research Centre
Free University of Bozen Bolzano
July, 2011
Rodriguez-Muro and Calvanese (UNIBZ) APEX-Shanghai, 2011 July, 2011 1 / 33
The context
DL Ontologies
Description Logics:
• Formalisms for knowledge representation.
• Decidable fragments of FOL
• Base of OWL
• World is described by means of Concepts and Roles
Ontologies
• Intensional knowledge: TBox T.
• Extensional knowledge: ABox A.
OBDA with DL-Lite
A family of light-weight ontology languages
• DL-Lite_F concepts: B := A | ∃R
• DL-Lite_F roles: R := P | P⁻
• DL-Lite_F TBoxes: B ⊑ B | B ⊑ ¬B | (funct R)
• DL-Lite_F ABoxes: A(a) | R(a, b)
Query Answering
TBox:
Man ⊑ Person, Woman ⊑ Person, Person ⊑ ∃hasFather, ∃hasFather⁻ ⊑ Person
ABox: Man(mariano)
Query: q(x) ← Person(x), hasFather(x, y), Person(y)
Problem: Compute the certain answers of Q, denoted cert(Q, O).
The promise
We can do this as efficiently as answering DB queries, also in the virtual setting.
Query Answering with PerfectRef (2005)
Query:q(x)← Person(x), hasFather(x , y),Person(y)
Reformulation:
q(x)← Person(x), hasFather(x , y),Person(y)
q(x)← Person(x), hasFather(x , y), hasFather(z , y)
q(x)← Person(x), hasFather(x , y)
q(x)← Person(x),Person(x)
q(x)← Person(x)
q(x)← Person(x), hasFather(x , y),Man(y)
q(x)← Person(x), hasFather(x , y),Woman(y)
q(x)← hasFather(x ,m), hasFather(x , y),Person(y)
q(x)← hasFather(x ,m), hasFather(x , y), hasFather(z , y)
q(x)← hasFather(x ,m), hasFather(x , y)
q(x)← hasFather(x ,m),Person(x)
q(x)← hasFather(x ,m), hasFather(x , t)
q(x)← hasFather(x ,m)
q(x)← hasFather(x ,m), hasFather(x , y),Man(y)
q(x)← hasFather(x ,m), hasFather(x , y),Woman(y)
q(x)← Man(x), hasFather(x , y),Person(y)
q(x)← Man(x), hasFather(x , y), hasFather(y , z)
q(x)← Man(x), hasFather(x , y),Man(y)
q(x)← Man(x), hasFather(x , y),Woman(y)
q(x)←Woman(x), hasFather(x , y),Person(y)
q(x)←Woman(x), hasFather(x , y), hasFather(y , z)
q(x)←Woman(x), hasFather(x , y),Man(y)
q(x)←Woman(x), hasFather(x , y),Woman(y)
Alternatives
• Improved version of PerfectRef (2007-2011)
• RQR (Urbina et al., 2007)
Too many unions, cannot execute!
• PRESTO (Rosati et al., 2010)
Better, but eventually it breaks.
• Combined Approach (Kontchakov et al., 2010)
Fast, but too much data and too much time.
What can we do?
?
Query Answering: It is not only about existential constants
Query:q(x , y)← Person(x), hasFather(x , y),Person(y)
Reformulation:
q(x , y)← Person(x), hasFather(x , y),Person(y)
q(x , y)← Person(x), hasFather(x , y), hasFather(z , y)
q(x , y)← Person(x), hasFather(x , y),Man(y)
q(x , y)← Person(x), hasFather(x , y),Woman(y)
q(x , y)← hasFather(x ,m), hasFather(x , y),Person(y)
q(x , y)← hasFather(x ,m), hasFather(x , y), hasFather(z , y)
q(x , y)← hasFather(x ,m), hasFather(x , y),Man(y)
q(x , y)← hasFather(x ,m), hasFather(x , y),Woman(y)
q(x , y)← Man(x), hasFather(x , y),Person(y)
q(x , y)← Man(x), hasFather(x , y), hasFather(z , y)
q(x , y)← Man(x), hasFather(x , y),Man(y)
q(x , y)← Man(x), hasFather(x , y),Woman(y)
q(x , y)←Woman(x), hasFather(x , y),Person(y)
q(x , y)←Woman(x), hasFather(x , y), hasFather(z , y)
q(x , y)←Woman(x), hasFather(x , y),Man(y)
q(x , y)←Woman(x), hasFather(x , y),Woman(y)
The full picture: Ontology Based Data Access
[Figure: OBDA architecture with the labels User, Queries, Ontology, Mappings, Source]
To deal with OBDA we need to consider:
• If in the backend we have RDBMSs, we cannot go beyond their capabilities.
• All systems are composed of T, D = ⟨R, I⟩, and M.
First Observation: Is my data complete?
Completeness of A
The TBox says: Manager ⊑ Employee
In the ABox: all Managers are already Employees.
In any realistic scenario:
• We don’t use arbitrary sources;
• Intersection of semantics is reflected in completeness (e.g., no need to chase, expand or rewrite)
• This happens a lot!
Keyword
Redundancy
Second Observation: There are no ABoxes
THERE ARE NO ABOXES!
Any ontology-based query answering system today:
• Uses relational DBs to store the ABox data;
• In such D, both R and I can be manipulated;
• Implementors may choose any M for their system.
Opportunity
To complete an ABox we can do more than expansion.
How to approach the problem: Two level approach
How to approach OBDA in practice?
• Efficient ways to deal with redundancy due to completeness.
• Efficient ways to complete (virtual) ABoxes.
Contributions: Dealing with redundancy
Characterizing completeness
ABox Dependencies
Definition
An assertion B ⊑_A B that restricts valid ABoxes.
Syntax: B1 ⊑_A B2
Semantics: A ⊨ Manager ⊑_A Employee if Manager(x) ∈ A implies Employee(x) ∈ A.
ABox dependencies are fundamentally different from TBox assertions. Think open world.
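The semantics above can be checked directly on a concrete ABox. The following is a minimal sketch, assuming an ABox is a plain set of (concept, individual) pairs and that both sides of the dependency are atomic concepts; the function name `satisfies` and the individuals `ann` and `bob` are illustrative and not taken from the talk.

```python
from typing import Set, Tuple

# Assumption: an ABox is a set of concept assertions (concept_name, individual).
ABox = Set[Tuple[str, str]]


def satisfies(abox: ABox, sub: str, sup: str) -> bool:
    """Return True if every individual asserted in `sub` is also asserted in `sup`."""
    def members(concept: str) -> Set[str]:
        return {ind for (name, ind) in abox if name == concept}
    return members(sub) <= members(sup)


# Hypothetical data: in `complete`, every Manager is explicitly an Employee.
complete = {("Manager", "ann"), ("Employee", "ann"), ("Employee", "bob")}
incomplete = {("Manager", "ann"), ("Employee", "bob")}

print(satisfies(complete, "Manager", "Employee"))
print(satisfies(incomplete, "Manager", "Employee"))
```

Note that this is a closed check on the stored assertions only, which is what separates a dependency from a TBox inclusion: the TBox axiom Manager ⊑ Employee still holds in every model of `incomplete`, while the dependency does not.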
Where to deal with redundancy?
Given a TBox T, an ABox A, a set of dependencies Σ and a query Q, what do we do?
Available Options:
• Optimize the query reformulation algorithm to deal with Σ.
• Optimize the TBox T with respect to Σ.
When is an assertion redundant?
Direct Redundancy: Case 1
Let T imply the following hierarchy:
[Figure: hierarchy over ∃hasFather, Person, Human]
Redundant if Σ is:
[Figure: dependencies over ∃hasFather, Person, Human]
Σ says hasFather(mariano, ramon) ∈ A → Human(mariano) ∈ A.
When is an assertion redundant?
Direct Redundancy: Case 2
Let T be the following TBox:
[Figure: TBox over Person, ∃hasFather⁻, ∃hasFather, Man]
Redundant if Σ is:
[Figure: dependencies over Person, ∃hasFather⁻, ∃hasFather, Man]
Σ says Man(ramon) ∈ A → there is an a′ such that hasFather(ramon, a′) ∈ A and Person(a′) ∈ A.
When is an assertion redundant?
Indirect Redundancy
Let T be the following TBox:
[Figure: TBox over Animal, Man, Human]
Redundant if Σ is:
[Figure: dependencies over Animal, Man, Human]
Σ says: if Man(mariano) ∈ A then Animal(mariano) ∈ A.
Formalization: Redundancy
Given a TBox T and a set of dependencies Σ over T, the optimized version of T w.r.t. Σ, denoted optim(T, Σ), is the set of inclusion assertions
{α ∈ sat(T) | α is not redundant in sat(T) w.r.t. sat(Σ)}
We can compute optim(T, Σ) in linear time.
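To give a feel for the shape of optim(T, Σ), here is a deliberately simplified sketch. It assumes that inclusions and dependencies are pairs of basic concept names, that sat(·) is plain transitive closure, and that an inclusion is redundant only when the same pair appears in sat(Σ), which corresponds to the first direct case only. It is not the procedure from the talk and makes no attempt at linear time; the names `saturate` and `optim_direct` are hypothetical.

```python
from itertools import product
from typing import Set, Tuple

Inclusion = Tuple[str, str]  # (sub, sup), read as sub ⊑ sup


def saturate(inclusions: Set[Inclusion]) -> Set[Inclusion]:
    """Naive transitive closure of a set of (sub, sup) pairs."""
    closure = set(inclusions)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def optim_direct(tbox: Set[Inclusion], sigma: Set[Inclusion]) -> Set[Inclusion]:
    """Keep the inclusions of sat(T) that are not already guaranteed by sat(Σ).

    Simplification: only the 'same pair' criterion is checked.
    """
    sat_sigma = saturate(sigma)
    return {alpha for alpha in saturate(tbox) if alpha not in sat_sigma}


tbox = {("∃hasFather", "Person"), ("Person", "Human")}
sigma = {("∃hasFather", "Human")}
print(sorted(optim_direct(tbox, sigma)))
```

With these inputs the implied inclusion ∃hasFather ⊑ Human is dropped because the data already guarantees it, while the two asserted inclusions are kept.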
Contributions: Completing ABoxes
General considerations
OBDA systems have no ABoxes; instead they have virtual ABoxes V = ⟨D, M⟩ with D = ⟨R, I⟩.
If we want V ⊨ A ⊑_A B, we make sure that the mappings for B include all the data coming from the mappings of A.
Trade-off:
• Degree of completeness (# of dependencies),
• Cost of the procedure,
• Performance of query answering.
We can complete virtual ABoxes up to B ⊑ ∃R without the need for new data.
Semantic Index for OBDA
General Idea
• Encode the semantics of T in numeric indexes and ranges for concept names and roles.
• Store the ABox in the database using those indexes and ranges.
• Make mappings for the system that take the ranges into account.
We can do this by using the implied hierarchy of T to generate the index and ranges!
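One possible way to derive indexes and ranges from a hierarchy is sketched below. It assumes an acyclic hierarchy over concept names only, numbers the concepts by a depth-first traversal from the roots (in alphabetical order), and merges the indexes of each concept's descendants into contiguous intervals. The function name `semantic_index` and the traversal order are assumptions made for illustration; the talk does not spell out the numbering scheme.

```python
from typing import Dict, List, Set, Tuple


def semantic_index(inclusions: Set[Tuple[str, str]]):
    """Assign a DFS index to each concept and compute its index ranges.

    `inclusions` holds (sub, sup) pairs; the hierarchy is assumed to be acyclic.
    """
    concepts = sorted({c for pair in inclusions for c in pair})
    children: Dict[str, List[str]] = {c: [] for c in concepts}
    for sub, sup in inclusions:
        children[sup].append(sub)
    has_parent = {sub for sub, _ in inclusions}
    roots = [c for c in concepts if c not in has_parent]

    # Depth-first numbering; a concept reachable from two parents keeps its first index.
    index: Dict[str, int] = {}

    def visit(concept: str) -> None:
        if concept in index:
            return
        index[concept] = len(index) + 1
        for child in sorted(children[concept]):
            visit(child)

    for root in roots:
        visit(root)

    def descendants(concept: str) -> Set[str]:
        seen, stack = {concept}, [concept]
        while stack:
            for child in children[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    # Merge consecutive indexes of each concept's descendants into intervals.
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    for concept in concepts:
        merged: List[List[int]] = []
        for i in sorted(index[d] for d in descendants(concept)):
            if merged and i == merged[-1][1] + 1:
                merged[-1][1] = i
            else:
                merged.append([i, i])
        ranges[concept] = [(lo, hi) for lo, hi in merged]
    return index, ranges


index, ranges = semantic_index({("B", "A"), ("C", "A"), ("C", "D")})
print(index)
print(ranges)
```

A concept with several parents may end up with more than one interval for some ancestor, which is why each concept carries a set of ranges rather than a single one.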
Semantic Index Example
T = {B ⊑ A, C ⊑ A, C ⊑ D}
[Figure: hierarchy annotated with index and ranges: A: 1, {(1, 3)}; B: 2, {(2, 2)}; C: 3, {(3, 3)}; D: 4, {(3, 4)}]
We create a table TC with constant and idx columns. To insert the data we use the indexes, e.g., if B(mariano) ∈ A then we put (mariano, 2) ∈ TC.
We create the mappings using the ranges, e.g.,
SELECT constant FROM TC WHERE idx >= 1 AND idx <= 3 ⇝ A(constant)
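The example can be tried out with an in-memory SQLite database. This is a small sketch that hard-codes the indexes and ranges from the slide; the helper names `insert_assertion` and `instances`, and the individuals other than `mariano`, are hypothetical.

```python
import sqlite3

# Indexes and ranges copied from the example hierarchy.
INDEX = {"A": 1, "B": 2, "C": 3, "D": 4}
RANGES = {"A": [(1, 3)], "B": [(2, 2)], "C": [(3, 3)], "D": [(3, 4)]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TC (constant TEXT, idx INTEGER)")


def insert_assertion(concept: str, individual: str) -> None:
    """Store a concept assertion as (individual, index of the asserted concept)."""
    conn.execute("INSERT INTO TC VALUES (?, ?)", (individual, INDEX[concept]))


def instances(concept: str) -> list:
    """Retrieve all instances of a concept with one range query per interval."""
    where = " OR ".join("(idx >= ? AND idx <= ?)" for _ in RANGES[concept])
    params = [bound for interval in RANGES[concept] for bound in interval]
    rows = conn.execute(f"SELECT DISTINCT constant FROM TC WHERE {where}", params)
    return sorted(row[0] for row in rows)


insert_assertion("B", "mariano")
insert_assertion("C", "diego")  # hypothetical individual
insert_assertion("D", "ramon")  # hypothetical individual

print(instances("A"))
print(instances("D"))
```

Each assertion is stored exactly once, and the subsumptions B ⊑ A, C ⊑ A and C ⊑ D are answered by the range conditions instead of by extra rows.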
Experimentation I
The Resource Index features:
• Search over 22 document collections
• Semantics given by the hierarchies of 200 ontologies (SNOMED, GO)
Implementation in a nutshell:
(i) Understand documents with natural language processing and annotate, e.g.,
Cervical Cancer('doc224')
(ii) Expand the ABox
(iii) Pose queries that retrieve documents as
q(x)← A1(x) ∧ · · · ∧ An(x)
Experimentation II
The challenge:
• ≈ 3 million concepts and ≈ 2.5 million is-a assertions
• Split second responses
• 150 GB of data
• Expansion data: 1.5 TB
The experimentation data:
• ClinicalTrials.gov (CT)
• 181 million assertions (≈ 14 GB of data, ≈ 140 GB when expanded)
Results
The query:
q(x)← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)
Results:
• Traditional reformulation: union of 467874 SQL SPJ queries;
• Semantic Index: 1 SQL query; execution 3.582s (0.082s if warm); time to compute semantic index: 1 min; size of data: +≈ 4 GB.
• ABox expansion: 1 SQL query; execution 3s (0.6s if warm); expansion time ≈ 7 days; size of data: +≈ 126 GB.
The Query
The query:
q(x)← DNA Repair Gene(x) ∧ Antigen Gene(x) ∧ Cancer Gene(x)
SELECT DISTINCT r0.element_id as element_id
FROM
RESOURCE_INDEX.CT_ANN r0 JOIN RESOURCE_INDEX.CT_ANN r1
ON r0.element_id = r1.element_id
JOIN RESOURCE_INDEX.CT_ANN r2
ON r1.element_id = r2.element_id
WHERE
((r0.idx >= 1783559 AND r0.idx <= 1783657)) AND
((r1.idx >= 1782996 AND r1.idx <= 1783029)) AND
((r2.idx >= 1783115 AND r2.idx <= 1783253));
This is a standard SQL query, efficient in ANY DBMS.
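As a rough sketch of how a query of this shape could be generated for a conjunctive query q(x) ← A_1(x) ∧ ⋯ ∧ A_n(x), the following builds one self-join with one index range per atom and runs it on an in-memory SQLite database. The helper name `build_range_query`, the unqualified table name `CT_ANN`, the toy rows, and the ranges are all assumptions made for illustration; the slides do not show how the authors' system produces its SQL.

```python
import sqlite3


def build_range_query(ranges, table="CT_ANN"):
    """Build a single SQL query with one self-join and one range per atom.

    ranges[i] is the (lo, hi) index interval assumed for concept A_(i+1).
    Returns the SQL text and the flat list of bound parameters.
    """
    if not ranges:
        raise ValueError("at least one range is required")
    parts = [f"SELECT DISTINCT r0.element_id AS element_id FROM {table} r0"]
    for i in range(1, len(ranges)):
        parts.append(
            f"JOIN {table} r{i} ON r{i - 1}.element_id = r{i}.element_id"
        )
    conditions = [
        f"(r{i}.idx >= ? AND r{i}.idx <= ?)" for i in range(len(ranges))
    ]
    parts.append("WHERE " + " AND ".join(conditions))
    params = [bound for interval in ranges for bound in interval]
    return " ".join(parts), params


# Hypothetical toy data: (document, index of the annotating concept).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CT_ANN (element_id TEXT, idx INTEGER)")
conn.executemany(
    "INSERT INTO CT_ANN VALUES (?, ?)",
    [
        ("doc1", 2), ("doc1", 12), ("doc1", 25),
        ("doc2", 3), ("doc2", 11),
        ("doc3", 21),
    ],
)
# An ordinary B-tree index lets the DBMS answer the range conditions.
conn.execute("CREATE INDEX ct_ann_idx ON CT_ANN (idx, element_id)")

query, params = build_range_query([(1, 5), (10, 15), (20, 30)])
rows = conn.execute(query, params).fetchall()
```

Only `doc1` carries an annotation inside each of the three ranges, so it is the only answer on this toy data.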
Conclusions
Contributions
• We indicated that efficient OBDA requires taking into account more than only T, A and Q.
• We provided means to deal with redundancy at the level of the TBox.
• We showed that expansion is not necessary: we can complete ABoxes instead.
• We presented two efficient ways to complete ABoxes, one for the general OBDA setting and one for the virtual setting.
Future work
• Exploring more expressive languages.
• Exploring the RDFS/SPARQL setting.
• Handling updates of T and A.
Extra examples
First Observation (cont.)
Mappings will introduce dependencies over ABoxes
Let R be a DB schema containing the relation schema employee with attributes id, dept, and salary. Let M be the following mappings:
SELECT id, dept FROM employee ⇝ q(id, dept) ← Employee(id) ∧ WORKS-FOR(id, dept)
SELECT id, dept FROM employee WHERE salary > 1000 ⇝ q(id, dept) ← Manager(id) ∧ MANAGES(id, dept)
Then for any instance I, if Manager(John) ∈ A, we also have Employee(John) ∈ A.
This is an indicator of completeness of all ABoxes A for M and R, e.g., A is complete w.r.t. $\mathit{Manager} \sqsubseteq_A \mathit{Employee}$.
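The observation can be checked on a concrete instance with a small sketch. The rows below are invented for illustration, and the ABox is materialised as a Python set only to make the inclusion visible. The check covers this one instance; the general argument is that the second source query is a restriction of the first, so every id it returns is also returned by the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id TEXT, dept TEXT, salary INTEGER)")
# Hypothetical instance I of the schema R.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("John", "Sales", 1500), ("Mary", "IT", 900), ("Ann", "IT", 2000)],
)

# Source queries of the two mappings in M.
employee_rows = conn.execute("SELECT id, dept FROM employee").fetchall()
manager_rows = conn.execute(
    "SELECT id, dept FROM employee WHERE salary > 1000"
).fetchall()

# ABox induced by applying the mappings to the instance.
abox = set()
for emp_id, dept in employee_rows:
    abox.add(("Employee", emp_id))
    abox.add(("WORKS-FOR", emp_id, dept))
for emp_id, dept in manager_rows:
    abox.add(("Manager", emp_id))
    abox.add(("MANAGES", emp_id, dept))

managers = {a[1] for a in abox if a[0] == "Manager"}
employees = {a[1] for a in abox if a[0] == "Employee"}

# The dependency holds on this ABox if every Manager is also an Employee.
dependency_holds = managers <= employees
```

Because the inclusion already holds in the data, a TBox axiom stating that every Manager is an Employee adds no new answers over such ABoxes.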
Formalization: Chains
Let T be a TBox, B, C basic concepts, and Σ a set of dependencies over T. A T-chain from B to C in T (resp., a Σ-chain from B to C in Σ) is a sequence of concept inclusion assertions $(B_i \sqsubseteq B'_i)_{i=0}^{n}$ in T (resp., a sequence of inclusion dependencies $(B_i \sqsubseteq_A B'_i)_{i=0}^{n}$ in Σ), for some $n \geq 0$, such that:
1. $B_0 = B$, $B'_n = C$, and
2. for $1 \leq i \leq n$, we have that $B'_{i-1}$ and $B_i$ are basic concepts s.t. either
(i) $B'_{i-1} = B_i$, or
(ii) $B'_{i-1} = \exists R$ and $B_i = \exists R^-$, for some basic role R.
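The chain conditions can be read as a simple check over a list of inclusions. The sketch below follows the definition literally; the encoding is an assumption made for illustration: atomic concepts are strings, ∃R is the tuple `("exists", R)`, and an inverse role is written with a trailing `-`. The same check serves for T-chains and Σ-chains, depending on which set of assertions is passed in.

```python
def inverse(role):
    """Return the inverse of a basic role, e.g. 'P' <-> 'P-'."""
    return role[:-1] if role.endswith("-") else role + "-"


def exists(role):
    """Encode the basic concept ∃R."""
    return ("exists", role)


def links(prev_rhs, next_lhs):
    """Condition 2: B'_{i-1} = B_i, or B'_{i-1} = ∃R and B_i = ∃R⁻."""
    if prev_rhs == next_lhs:
        return True
    return (
        isinstance(prev_rhs, tuple)
        and isinstance(next_lhs, tuple)
        and prev_rhs[0] == next_lhs[0] == "exists"
        and next_lhs[1] == inverse(prev_rhs[1])
    )


def is_chain(inclusions, start, end, assertions):
    """Check whether `inclusions` is a chain from `start` to `end`.

    inclusions: list of (B_i, B'_i) pairs, i = 0..n.
    assertions: the set of inclusions of T (or of Σ) the chain must use.
    """
    if not inclusions:
        return False
    if any(inc not in assertions for inc in inclusions):
        return False
    # Condition 1: B_0 = B and B'_n = C.
    if inclusions[0][0] != start or inclusions[-1][1] != end:
        return False
    # Condition 2 for every consecutive pair.
    return all(
        links(inclusions[i - 1][1], inclusions[i][0])
        for i in range(1, len(inclusions))
    )


# Hypothetical TBox: A ⊑ ∃P and ∃P⁻ ⊑ C.
tbox = {("A", exists("P")), (exists("P-"), "C")}
good = is_chain([("A", exists("P")), (exists("P-"), "C")], "A", "C", tbox)
bad = is_chain([("A", exists("P")), (exists("P"), "C")], "A", "C", tbox)
```

The first sequence is a chain from A to C through case (ii); the second is rejected because ∃P ⊑ C is not an assertion of the toy TBox.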
Formalization: Redundancy
Let T be a TBox, B, C basic concepts, and Σ a set of dependencies. The concept inclusion assertion $B \sqsubseteq C$ is directly redundant in T w.r.t. Σ if
(i) $\Sigma \models B \sqsubseteq_A C$, and
(ii) for every T-chain $(B_i \sqsubseteq B'_i)_{i=0}^{n}$ with $B'_n = B$ in T, there is a Σ-chain $(B_i \sqsubseteq_A B'_i)_{i=0}^{n}$.
Then, $B \sqsubseteq C$ is redundant in T w.r.t. Σ if
(a) it is directly redundant, or
(b) there exists $B' \neq B$ s.t.
(i) $T \models B' \sqsubseteq C$,
(ii) $B' \sqsubseteq C$ is not redundant in T w.r.t. Σ, and
(iii) $B \sqsubseteq B'$ is directly redundant in T w.r.t. Σ.