logical approach to physical data independence and query …david/cs848s14/intro.pdf · such plans...

64
Logical Approach to Physical Data Independence and Query Compilation Introduction, Background, and Goals David Toman D.R. Cheriton School of Computer Science 1 / 27

Upload: others

Post on 17-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Logical Approach to Physical Data Independenceand Query Compilation

Introduction, Background, and Goals

David Toman

D.R. Cheriton School of Computer Science

1 / 27

Page 2: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

ORGANIZATION

Introduction and Goals 2 / 27

Page 3: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Lectures/Exercises

web page:http://cs.uwaterloo.ca/∼david/cs848s14/

schedule:

Thursday 9:30-12:19

Organization Introduction and Goals 3 / 27

Page 4: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Textbook (aka Shameless plug)

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

D. Toman and G. Weddell.Fundamentals of Physical Designand Query Compilation.Morgan and Claypool Data Man-agement Series, 2011.

Organization Introduction and Goals 4 / 27

Page 5: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Projects

Preferred projects will be related to applying (some) of the ideas presented inthis class to your own area of research: this can further your own researchand may help you to consider alternative views/approaches to what you havebeen thinking about already.

1 project proposal: one page due Lecture 6;2 project presentation: 10-20 minutes (depending on the number of

projects) in Lecture 10;3 report (in pdf, up to 10 pages), source code (if applicable) within a week

of last Lecture.

Organization Introduction and Goals 5 / 27

Page 6: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Assessment

1 class participation, including assignments (15%)2 in class presentation either of your project or of a paper from the reading

list (25%)3 project (60%)

Organization Introduction and Goals 6 / 27

Page 7: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

USE SCENARIOS AND GOALS

Organization Introduction and Goals 7 / 27

Page 8: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users.physical storage details hidden fromusers,changes to physical storage withoutaffecting conceptual view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

Use Scenarios and Goals Introduction and Goals 8 / 27

Page 9: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users.

physical storage details hidden fromusers,changes to physical storage withoutaffecting conceptual view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

Use Scenarios and Goals Introduction and Goals 8 / 27

Page 10: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users.physical storage details hidden fromusers,changes to physical storage withoutaffecting conceptual view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

Use Scenarios and Goals Introduction and Goals 8 / 27

Page 11: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users.

physical storage details hidden fromusers,changes to physical storage withoutaffecting conceptual view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

Use Scenarios and Goals Introduction and Goals 8 / 27

Page 12: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Example: PAYROLL

A Conceptual (user) view of PAYROLL data:

Example of PAYROLL data:1 Mary is an employee.2 Mary’s employee number is 3412.3 Mary’s salary is 72000.

Example of PAYROLL:4 There is a kind of entity called an employee.5 There are attributes called enumber, name and salary.6 Each employee entity has attributes enumber, name and salary.7 Employees are identified by their enumber.

Use Scenarios and Goals Introduction and Goals 9 / 27

Page 13: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Example: PAYROLL

A physical design for PAYROLL:

8 There is a file of records called emp-file.9 There are record fields emp-num, emp-name and emp-salary.

10 Each emp-file record has the fieldsemp-num, emp-name and emp-salary.

11 File emp-file is organized as a B-tree data structure that supports anemp-lookup operation, given a value for attribute enumber.

12 Records in file emp-file correspond one-to-one to employee entities.13 Record fields in file emp-file encode the corresponding attribute values

for employee entities, for example, emp-num encodes an enumber.

Use Scenarios and Goals Introduction and Goals 9 / 27

Page 14: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Ontology-based Data Access

IDEA:Queries are answered not only w.r.t. explicit data

but also w.r.t. background knowledge⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (background)

List all MORTALs⇒ {Socrates} (query)

Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press

The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer

Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b

a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]

Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.

Keywords: Ontology-based data access, Description Logics, reasoning over ontologies

1. Introduction

In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).

The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several

*Corresponding author.

reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost

Fig. 1. Ontology-based data access.

1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved [Calvanese et al.: Mastro]

Question:Is Aristoteles a MORTAL?

. . . can we really say “NO”?

Use Scenarios and Goals Introduction and Goals 10 / 27

Page 15: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Ontology-based Data Access

IDEA:Queries are answered not only w.r.t. explicit data

but also w.r.t. background knowledge⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (background)

List all MORTALs⇒ {Socrates} (query)

Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press

The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer

Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b

a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]

Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.

Keywords: Ontology-based data access, Description Logics, reasoning over ontologies

1. Introduction

In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).

The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several

*Corresponding author.

reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost

Fig. 1. Ontology-based data access.

1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved [Calvanese et al.: Mastro]

Question:Is Aristoteles a MORTAL?

. . . can we really say “NO”?

Use Scenarios and Goals Introduction and Goals 10 / 27

Page 16: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Ontology-based Data Access

IDEA:Queries are answered not only w.r.t. explicit data

but also w.r.t. background knowledge⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (background)

List all MORTALs⇒ {Socrates} (query)

Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press

The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer

Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b

a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]

Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.

Keywords: Ontology-based data access, Description Logics, reasoning over ontologies

1. Introduction

In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).

The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several

*Corresponding author.

reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost

Fig. 1. Ontology-based data access.

1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved [Calvanese et al.: Mastro]

Question:Is Aristoteles a MORTAL?

. . . can we really say “NO”?

Use Scenarios and Goals Introduction and Goals 10 / 27

Page 17: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Ontology-based Data Access

IDEA:Queries are answered not only w.r.t. explicit data

but also w.r.t. background knowledge⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (background)

List all MORTALs⇒ {Socrates} (query)

Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press

The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer

Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b

a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]

Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.

Keywords: Ontology-based data access, Description Logics, reasoning over ontologies

1. Introduction

In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).

The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several

*Corresponding author.

reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost

Fig. 1. Ontology-based data access.

1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved [Calvanese et al.: Mastro]

Question:Is Aristoteles a MORTAL?

. . . can we really say “NO”?

Use Scenarios and Goals Introduction and Goals 10 / 27

Page 18: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Ontology-based Data Access

IDEA:Queries are answered not only w.r.t. explicit data

but also w.r.t. background knowledge⇒ Ontology-based Data Access (OBDA)

ExampleSocrates is a MAN (explicit data)Every MAN is MORTAL (background)

List all MORTALs⇒ {Socrates} (query)

Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press

The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer

Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b

a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]

Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.

Keywords: Ontology-based data access, Description Logics, reasoning over ontologies

1. Introduction

In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).

The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several

*Corresponding author.

reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost

Fig. 1. Ontology-based data access.

1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved [Calvanese et al.: Mastro]

Question:Is Aristoteles a MORTAL? . . . can we really say “NO”?

Use Scenarios and Goals Introduction and Goals 10 / 27

Page 19: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Data Exchange

PROBLEM:How to transfer (reformat) data conforming to a source schema to

data conforming to a target schema?

[Arenas et al: Foundations of Data Exchange]

Issues:what should happen when the target is more complex than the source?how do we answer queries over the target?

Use Scenarios and Goals Introduction and Goals 11 / 27

Page 20: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Data Exchange

PROBLEM:How to transfer (reformat) data conforming to a source schema to

data conforming to a target schema?

[Arenas et al: Foundations of Data Exchange]

Issues:what should happen when the target is more complex than the source?how do we answer queries over the target?

Use Scenarios and Goals Introduction and Goals 11 / 27

Page 21: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Data Exchange

PROBLEM:How to transfer (reformat) data conforming to a source schema to

data conforming to a target schema?

[Arenas et al: Foundations of Data Exchange]

Issues:what should happen when the target is more complex than the source?how do we answer queries over the target?

Use Scenarios and Goals Introduction and Goals 11 / 27

Page 22: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Information Integration

IDEA:Data integration provides a uniform access to a set of data sources, through aunified representation called global schema. A mapping specifies therelationship between the global schema and the sources.

[Genesereth: Data Integration]

Variants “which way do the arrows point” [Lenzerini]GAV (global as a view), LAV (local as a view), and GLAV (“both ways”).

Use Scenarios and Goals Introduction and Goals 12 / 27

Page 23: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Information Integration

IDEA:Data integration provides a uniform access to a set of data sources, through aunified representation called global schema. A mapping specifies therelationship between the global schema and the sources.

[Genesereth: Data Integration]

Variants “which way do the arrows point” [Lenzerini]GAV (global as a view), LAV (local as a view), and GLAV (“both ways”).

Use Scenarios and Goals Introduction and Goals 12 / 27

Page 24: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Common Threads and Issues

In general two schemas: Conceptual/Logical and Physical

⇒ both endowed with metadata (vocabulary, . . . )⇒ mappings connect the schemas⇒ (source) data only “in” the physical schema⇒ queries only over the conceptual/logical schema

Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata and mappings3 (user level) Data representation4 (user level) Query language (semantics–aka when is an answer an answer?)

5 Algorithms/Execution model for queries: e.g., does materialization matter?

Use Scenarios and Goals Introduction and Goals 13 / 27

Page 25: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Common Threads and Issues

In general two schemas: Conceptual/Logical and Physical

⇒ both endowed with metadata (vocabulary, . . . )⇒ mappings connect the schemas⇒ (source) data only “in” the physical schema⇒ queries only over the conceptual/logical schema

Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata and mappings3 (user level) Data representation4 (user level) Query language (semantics–aka when is an answer an answer?)

5 Algorithms/Execution model for queries: e.g., does materialization matter?

Use Scenarios and Goals Introduction and Goals 13 / 27

Page 26: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Common Threads and Issues

In general two schemas: Conceptual/Logical and Physical

⇒ both endowed with metadata (vocabulary, . . . )⇒ mappings connect the schemas⇒ (source) data only “in” the physical schema⇒ queries only over the conceptual/logical schema

Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata and mappings3 (user level) Data representation4 (user level) Query language (semantics–aka when is an answer an answer?)5 Algorithms/Execution model for queries: e.g., does materialization matter?

Use Scenarios and Goals Introduction and Goals 13 / 27

Page 27: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence: My Motivation

Goal: Application of the Ideas to Embedded Systems1 High-level conceptual view of the system

[relational]

2 High level query (and, eventually, update) language

[SQL]

3 Fine-grained physical schema description

[records, pointers, . . . ]

4 Flexible conceptual-physical mappings5 Queries (updates) compiled to operations on physical level

[pointer navigation, field extraction, conditionals, . . . ]

ChallengeThe code generated from queries must be competitive with hand-written code.

LINUX-INFO System Introduction and Goals 14 / 27

Page 28: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence: My Motivation

Goal: Application of the Ideas to Embedded Systems1 High-level conceptual view of the system [relational]2 High level query (and, eventually, update) language [SQL]3 Fine-grained physical schema description [records, pointers, . . . ]4 Flexible conceptual-physical mappings5 Queries (updates) compiled to operations on physical level

[pointer navigation, field extraction, conditionals, . . . ]

ChallengeThe code generated from queries must be competitive with hand-written code.

LINUX-INFO System Introduction and Goals 14 / 27

Page 29: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Phyical Data Independence: My Motivation

Goal: Application of the Ideas to Embedded Systems1 High-level conceptual view of the system [relational]2 High level query (and, eventually, update) language [SQL]3 Fine-grained physical schema description [records, pointers, . . . ]4 Flexible conceptual-physical mappings5 Queries (updates) compiled to operations on physical level

[pointer navigation, field extraction, conditionals, . . . ]

ChallengeThe code generated from queries must be competitive with hand-written code.

LINUX-INFO System Introduction and Goals 14 / 27

Page 30: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

LINUX-INFO System: Conceptual View

Example of LINUX-INFO data:1 process (called) gcc is running;2 gcc’s process number is 1234;3 the user (id) running gcc is 145;4 gcc uses files “foo.c” and “foo.o”.

Example of LINUX-INFO metadata:5 There entities called process and file.6 There are attributes called pno, pname, uname, and fname.7 Each process entity has attributes pno, pname and uname.8 Each file entity has attribute fname.9 Processes are identified by their pno.

10 Files are identified by their fname.11 There is a relationship uses between processes and files.

LINUX-INFO System Introduction and Goals 15 / 27

Page 31: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

LINUX-INFO System: Conceptual View

Example of LINUX-INFO data:1 process (called) gcc is running;2 gcc’s process number is 1234;3 the user (id) running gcc is 145;4 gcc uses files “foo.c” and “foo.o”.

Example of LINUX-INFO metadata:5 There entities called process and file.6 There are attributes called pno, pname, uname, and fname.7 Each process entity has attributes pno, pname and uname.8 Each file entity has attribute fname.9 Processes are identified by their pno.

10 Files are identified by their fname.11 There is a relationship uses between processes and files.

LINUX-INFO System Introduction and Goals 15 / 27

Page 32: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

The LINUX-INFO System: Physical Design

A physical design for LINUX (selected by Linus Torvalds).12 There are process records called task-struct.13 Each task-struct record has record fields pid, uid, comm, and fds.14 All task-structs is organized as a tree data structure.15 The task-struct records correspond one-to-one to process entities.16 Record fields in task-struct encode the corresponding attribute

values for process entities, for example, pid encodes an pno, etc.17 Similarly, fss correspond appropriately to (open) file entities.18 fds field of task-struct is an array of fds; a non-null entry in this

array indicates that the process corresponding to this task-struct isusing the file identified by the name field of the fd record in the array.

LINUX-INFO System Introduction and Goals 16 / 27

Page 33: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

LINUX-INFO System: Queries and Query Plans

Back to DesiderataUser Query:

find all files used by processes invoked by user 145.

Query Plan:

for each task-struct t in tree of task-structscheck if t ’s uid field is 145 and, if so

scan the fds array in t andif the file descriptor (fd) is non-NULL

print out the name of file field in fd.

Is the plan correct?. . . and how do/can we answer this question?

LINUX-INFO System Introduction and Goals 17 / 27

Page 34: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

LINUX-INFO System: Queries and Query Plans

Back to DesiderataUser Query:

find all files used by processes invoked by user 145.

Query Plan:

for each task-struct t in tree of task-structscheck if t ’s uid field is 145 and, if so

scan the fds array in t andif the file descriptor (fd) is non-NULL

print out the name of file field in fd.

Is the plan correct?. . . and how do/can we answer this question?

LINUX-INFO System Introduction and Goals 17 / 27

Page 35: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

LINUX-INFO System: Queries and Query Plans

Back to DesiderataUser Query:

find all files used by processes invoked by user 145.

Query Plan:

for each task-struct t in tree of task-structscheck if t ’s uid field is 145 and, if so

scan the fds array in t andif the file descriptor (fd) is non-NULL

print out the name of file field in fd.

Is the plan correct?. . . and how do/can we answer this question?

LINUX-INFO System Introduction and Goals 17 / 27

Page 36: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

UNIFYING LOGIC-BASED APPROACH

LINUX-INFO System Introduction and Goals 18 / 27

Page 37: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Metadata and Signatures

Vocabularies: Relational Model for both Conceptual and Physical Schemata.

Conceptual/Logical (SL):

predicate symbols R1/a1, . . . ,Rk/ak (ai is the arity of Ri )(possibly) constants c1, . . . , cn

Physical (SP):

predicate symbols S1/b1, . . . ,Sk/bk

a distinguished subset SA ⊆ SP of access paths⇒ denote capabilities to retrieve tuples (i.e., data structures)⇒ (optionally) binding patterns (restrictions on tuple retrieval)⇒ associated with set of tuples (closed-world semantics)

. . . a standard way of defining interpretations

Unifying Logic-based Approach Introduction and Goals 19 / 27

Page 38: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Metadata and Signatures

Vocabularies: Relational Model for both Conceptual and Physical Schemata.

Conceptual/Logical (SL):

predicate symbols R1/a1, . . . ,Rk/ak (ai is the arity of Ri )(possibly) constants c1, . . . , cn

Physical (SP):

predicate symbols S1/b1, . . . ,Sk/bk

a distinguished subset SA ⊆ SP of access paths⇒ denote capabilities to retrieve tuples (i.e., data structures)⇒ (optionally) binding patterns (restrictions on tuple retrieval)⇒ associated with set of tuples (closed-world semantics)

. . . a standard way of defining interpretations

Unifying Logic-based Approach Introduction and Goals 19 / 27

Page 39: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Metadata and Signatures

Vocabularies: Relational Model for both Conceptual and Physical Schemata.

Conceptual/Logical (SL):

predicate symbols R1/a1, . . . ,Rk/ak (ai is the arity of Ri )(possibly) constants c1, . . . , cn

Physical (SP):

predicate symbols S1/b1, . . . ,Sk/bk

a distinguished subset SA ⊆ SP of access paths⇒ denote capabilities to retrieve tuples (i.e., data structures)⇒ (optionally) binding patterns (restrictions on tuple retrieval)⇒ associated with set of tuples (closed-world semantics)

. . . a standard way of defining interpretations

Unifying Logic-based Approach Introduction and Goals 19 / 27

Page 40: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Metadata and Constraints

Metadata: First-order sentences Σ over SL ∪ SP.

Conceptual/Logical (ΣL):

⇒ keys, inclusion dependencies, hierarchies, . . .

Physical (ΣP):

⇒ keys, inclusion dependencies, hierarchies, . . .⇒ formulae that link to symbols in SL (mapping constraints).

. . . we resort to fragments of FOL to gain better computational properties

Unifying Logic-based Approach Introduction and Goals 20 / 27

Page 41: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Metadata and Constraints

Metadata: First-order sentences Σ over SL ∪ SP.

Conceptual/Logical (ΣL):

⇒ keys, inclusion dependencies, hierarchies, . . .

Physical (ΣP):

⇒ keys, inclusion dependencies, hierarchies, . . .⇒ formulae that link to symbols in SL (mapping constraints).

. . . we resort to fragments of FOL to gain better computational properties

Unifying Logic-based Approach Introduction and Goals 20 / 27

Page 42: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Metadata and Constraints

Metadata: First-order sentences Σ over SL ∪ SP.

Conceptual/Logical (ΣL):

⇒ keys, inclusion dependencies, hierarchies, . . .

Physical (ΣP):

⇒ keys, inclusion dependencies, hierarchies, . . .⇒ formulae that link to symbols in SL (mapping constraints).

. . . we resort to fragments of FOL to gain better computational properties

Unifying Logic-based Approach Introduction and Goals 20 / 27

Page 43: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Example: LINUX-INFO

Conceptual/Logical:

SL = { process/3,file/1,uses/2 }ΣL = { process(x , y1, z1) ∧ process(x , y2, z2)→ y1 = y2 ∧ z1 = z2,

uses(x , y)→ ∃z,w .process(x , z,w) ∧ file(y), . . . }

Physical:

SA = {task_struct/1/0,pid/2/1,uid/2/1,fds/2/1,fname/2/1}ΣP = { task_struct(x)→ ∃y , z,w .pid(x , y) ∧ uid(z) ∧ fds(x ,w)

pid(x1, y) ∧ pid(x2, y)→ x1 = x1process(x , y , z)→ ∃t .task_struct(t) ∧ pid(t , x), . . . }

Unifying Logic-based Approach Introduction and Goals 21 / 27

Page 44: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Queries and Answers

Queries: First-order formulae (ϕ) over SL.

⇒ ∃p,n,u.process(p,n,u) ∧ u = 145 ∧ uses(p, f ) ∧ file(f )

Data D:Sets of (ground) tuples that fix meaning of every access path.

Query Answers:answers in common when evaluating ϕ over every interpretation (database)that is a model of Σ and that extend D.

Definition (Certain Answers)certΣ,D(ϕ) = {~a | Σ ∪ D |= ϕ(~a)} logical implication

=⋂

I|=Σ∪D{~a | I |= ϕ(~a)} answer in every model

Unifying Logic-based Approach Introduction and Goals 22 / 27

Page 45: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Queries and Answers

Queries: First-order formulae (ϕ) over SL.

⇒ ∃p,n,u.process(p,n,u) ∧ u = 145 ∧ uses(p, f ) ∧ file(f )

Data D:Sets of (ground) tuples that fix meaning of every access path.

Query Answers:answers in common when evaluating ϕ over every interpretation (database)that is a model of Σ and that extend D.

Definition (Certain Answers)certΣ,D(ϕ) = {~a | Σ ∪ D |= ϕ(~a)} logical implication

=⋂

I|=Σ∪D{~a | I |= ϕ(~a)} answer in every model

Unifying Logic-based Approach Introduction and Goals 22 / 27

Page 46: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Queries and Answers

Queries: First-order formulae (ϕ) over SL.

⇒ ∃p,n,u.process(p,n,u) ∧ u = 145 ∧ uses(p, f ) ∧ file(f )

Data D:Sets of (ground) tuples that fix meaning of every access path.

Query Answers:answers in common when evaluating ϕ over every interpretation (database)that is a model of Σ and that extend D.

Definition (Certain Answers)certΣ,D(ϕ) = {~a | Σ ∪ D |= ϕ(~a)} logical implication

=⋂

I|=Σ∪D{~a | I |= ϕ(~a)} answer in every model

Unifying Logic-based Approach Introduction and Goals 22 / 27

Page 47: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Queries and Answers

Queries: First-order formulae (ϕ) over SL.

⇒ ∃p,n,u.process(p,n,u) ∧ u = 145 ∧ uses(p, f ) ∧ file(f )

Data D:Sets of (ground) tuples that fix meaning of every access path.

Query Answers:answers in common when evaluating ϕ over every interpretation (database)that is a model of Σ and that extend D.

Definition (Certain Answers)certΣ,D(ϕ) = {~a | Σ ∪ D |= ϕ(~a)} logical implication

=⋂

I|=Σ∪D{~a | I |= ϕ(~a)} answer in every model

Unifying Logic-based Approach Introduction and Goals 22 / 27

Page 48: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

The BAD News (and what can be done)

Theorem“~a ∈ certΣ,D(ϕ)?” is undecidable.

⇒ sources of undecidability: both Σ and ϕ!

Standard solution:1 restrict Σ to decidable fragments of FOL (e.g., DLs)2 restrict ϕ to a decidable fragment of FOL (e.g., UCQ)

SL,ΣL SP,ΣP queries

OBDA (lite) TBox ABox CQ/UCQ

Data Exchange target, target deps source, st-tgds CQ/UCQ

Information Integration global view local view, {G|L}AV CQ/UCQ

Unifying Logic-based Approach Introduction and Goals 23 / 27

Page 49: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

The BAD News (and what can be done)

Theorem“~a ∈ certΣ,D(ϕ)?” is undecidable.

⇒ sources of undecidability: both Σ and ϕ!

Standard solution:1 restrict Σ to decidable fragments of FOL (e.g., DLs)2 restrict ϕ to a decidable fragment of FOL (e.g., UCQ)

SL,ΣL SP,ΣP queries

OBDA (lite) TBox ABox CQ/UCQ

Data Exchange target, target deps source, st-tgds CQ/UCQ

Information Integration global view local view, {G|L}AV CQ/UCQ

Unifying Logic-based Approach Introduction and Goals 23 / 27

Page 50: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

The BAD News (and what can be done)

Theorem“~a ∈ certΣ,D(ϕ)?” is undecidable.

⇒ sources of undecidability: both Σ and ϕ!

Standard solution:1 restrict Σ to decidable fragments of FOL (e.g., DLs)2 restrict ϕ to a decidable fragment of FOL (e.g., UCQ)

SL,ΣL SP,ΣP queries

OBDA (lite) TBox ABox CQ/UCQ

Data Exchange target, target deps source, st-tgds CQ/UCQ

Information Integration global view local view, {G|L}AV CQ/UCQ

Unifying Logic-based Approach Introduction and Goals 23 / 27

Page 51: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

What do Relational Systems do??

IDEA: “make it look like a single model”(severely) restrict what logical schema may look like:

every logical predicate P(~x) must correspond 1-1 to some access path.

. . . conceptual/logical symbols in queries are (mere aliases of) access paths.. . . completely against the idea of physical data independence.

IDEA-2: “only queries that think there is a single model”A formula ϕ is domain independent if for all pairs of models I1, I2 of D andvaluation θ we have

I1, θ |= ϕ if and only if I2, θ |= ϕ.

. . . I1 and I2 can only differ in their domains (hence the name).

Unifying Logic-based Approach Introduction and Goals 24 / 27

Page 52: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

What do Relational Systems do??

IDEA: “make it look like a single model”(severely) restrict what logical schema may look like:

every logical predicate P(~x) must correspond 1-1 to some access path.

. . . conceptual/logical symbols in queries are (mere aliases of) access paths.. . . completely against the idea of physical data independence.

Is this enough?

¬P(x)? ∀x .P(x)?

. . . depend on the domain of the model

IDEA-2: “only queries that think there is a single model”A formula ϕ is domain independent if for all pairs of models I1, I2 of D andvaluation θ we have

I1, θ |= ϕ if and only if I2, θ |= ϕ.

. . . I1 and I2 can only differ in their domains (hence the name).

Unifying Logic-based Approach Introduction and Goals 24 / 27

Page 53: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

What do Relational Systems do??

IDEA: “make it look like a single model”(severely) restrict what logical schema may look like:

every logical predicate P(~x) must correspond 1-1 to some access path.

. . . conceptual/logical symbols in queries are (mere aliases of) access paths.. . . completely against the idea of physical data independence.

Is this enough? ¬P(x)? ∀x .P(x)?

. . . depend on the domain of the model

IDEA-2: “only queries that think there is a single model”A formula ϕ is domain independent if for all pairs of models I1, I2 of D andvaluation θ we have

I1, θ |= ϕ if and only if I2, θ |= ϕ.

. . . I1 and I2 can only differ in their domains (hence the name).

Unifying Logic-based Approach Introduction and Goals 24 / 27

Page 54: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

What do Relational Systems do??

IDEA: “make it look like a single model”(severely) restrict what logical schema may look like:

every logical predicate P(~x) must correspond 1-1 to some access path.

. . . conceptual/logical symbols in queries are (mere aliases of) access paths.. . . completely against the idea of physical data independence.

Is this enough? ¬P(x)? ∀x .P(x)? . . . depend on the domain of the model

IDEA-2: “only queries that think there is a single model”A formula ϕ is domain independent if for all pairs of models I1, I2 of D andvaluation θ we have

I1, θ |= ϕ if and only if I2, θ |= ϕ.

. . . I1 and I2 can only differ in their domains (hence the name).

Unifying Logic-based Approach Introduction and Goals 24 / 27

Page 55: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

What do Relational Systems do??

IDEA: “make it look like a single model”(severely) restrict what logical schema may look like:

every logical predicate P(~x) must correspond 1-1 to some access path.

. . . conceptual/logical symbols in queries are (mere aliases of) access paths.. . . completely against the idea of physical data independence.

IDEA-2: “only queries that think there is a single model”A formula ϕ is domain independent if for all pairs of models I1, I2 of D andvaluation θ we have

I1, θ |= ϕ if and only if I2, θ |= ϕ.

. . . I1 and I2 can only differ in their domains (hence the name).

Unifying Logic-based Approach Introduction and Goals 24 / 27

Page 56: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

A LOGSPACE Algorithm

IDEADomain independent formulae can be evaluated in a model based on theactive domain of D (set of individuals that appear in the access paths).

. . . active domain of D is a finite set.

A Turing machine Tϕ

read only input tape storing (an encoding of) ~a and D;read/write work tape storing a counter for each variable in ϕ (log |D| bits)and fixed number of auxiliary counters;a finite control that implements top-down satisfaction check w.r.t. avaluation defined by the current state of the counters

⇒ used as pointers to individuals on the work tape.

Theorem

certΣ,D(ϕ) = {~a | 〈~a,D〉 ∈ L(Tϕ)}.

Unifying Logic-based Approach Introduction and Goals 25 / 27

Page 57: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

A LOGSPACE Algorithm

IDEADomain independent formulae can be evaluated in a model based on theactive domain of D (set of individuals that appear in the access paths).

. . . active domain of D is a finite set.

A Turing machine Tϕ

read only input tape storing (an encoding of) ~a and D;read/write work tape storing a counter for each variable in ϕ (log |D| bits)and fixed number of auxiliary counters;a finite control that implements top-down satisfaction check w.r.t. avaluation defined by the current state of the counters

⇒ used as pointers to individuals on the work tape.

Theorem

certΣ,D(ϕ) = {~a | 〈~a,D〉 ∈ L(Tϕ)}.

Unifying Logic-based Approach Introduction and Goals 25 / 27

Page 58: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

A LOGSPACE Algorithm

IDEADomain independent formulae can be evaluated in a model based on theactive domain of D (set of individuals that appear in the access paths).

. . . active domain of D is a finite set.

A Turing machine Tϕ

read only input tape storing (an encoding of) ~a and D;read/write work tape storing a counter for each variable in ϕ (log |D| bits)and fixed number of auxiliary counters;a finite control that implements top-down satisfaction check w.r.t. avaluation defined by the current state of the counters

⇒ used as pointers to individuals on the work tape.

Theorem

certΣ,D(ϕ) = {~a | 〈~a,D〉 ∈ L(Tϕ)}.

Unifying Logic-based Approach Introduction and Goals 25 / 27

Page 59: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

A LOGSPACE Algorithm

IDEADomain independent formulae can be evaluated in a model based on theactive domain of D (set of individuals that appear in the access paths).

. . . active domain of D is a finite set.

A Turing machine Tϕ

read only input tape storing (an encoding of) ~a and D;read/write work tape storing a counter for each variable in ϕ (log |D| bits)and fixed number of auxiliary counters;a finite control that implements top-down satisfaction check w.r.t. avaluation defined by the current state of the counters

⇒ used as pointers to individuals on the work tape.

Theorem

certΣ,D(ϕ) = {~a | 〈~a,D〉 ∈ L(Tϕ)}.

Unifying Logic-based Approach Introduction and Goals 25 / 27

Page 60: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Range-restricted Formulas and Relational Algebra

Nobody uses that algorithm!

Instead:Range-restricted Formulae (queries):

ϕ ::= R(~x) | ϕ ∧ x = y | ϕ ∧ ϕ | ∃s.ϕ | ϕ ∨ ϕ | ϕ ∧ ¬ϕ

Bottom-up “Algebraic” Query Evaluation:every production above maps (at least naively) to a algebraicoperation on finite relations:

scan (with renaming),selection,join,projection,union, anddifference.

Datalog (limited iteration)additional predicates defined as a fixpoint positive queryallows PTIME-complete problems.

Unifying Logic-based Approach Introduction and Goals 26 / 27

Page 61: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Range-restricted Formulas and Relational Algebra

Nobody uses that algorithm! Instead:Range-restricted Formulae (queries):

ϕ ::= R(~x) | ϕ ∧ x = y | ϕ ∧ ϕ | ∃s.ϕ | ϕ ∨ ϕ | ϕ ∧ ¬ϕ

Bottom-up “Algebraic” Query Evaluation:every production above maps (at least naively) to a algebraicoperation on finite relations:

scan (with renaming),selection,join,projection,union, anddifference.

Datalog (limited iteration)additional predicates defined as a fixpoint positive queryallows PTIME-complete problems.

Unifying Logic-based Approach Introduction and Goals 26 / 27

Page 62: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Range-restricted Formulas and Relational Algebra

Nobody uses that algorithm! Instead:Range-restricted Formulae (queries):

ϕ ::= R(~x) | ϕ ∧ x = y | ϕ ∧ ϕ | ∃s.ϕ | ϕ ∨ ϕ | ϕ ∧ ¬ϕ

Bottom-up “Algebraic” Query Evaluation:every production above maps (at least naively) to a algebraicoperation on finite relations:

scan (with renaming),selection,join,projection,union, anddifference.

Datalog (limited iteration)additional predicates defined as a fixpoint positive queryallows PTIME-complete problems.

Unifying Logic-based Approach Introduction and Goals 26 / 27

Page 63: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Summary

comprehensive framework based on certain answers that unifies manydatabase/KR approaches to handling information in presence ofbackground information/theory/ontology;too expressive and in turn computationally in-feasible;practical (relational) systems: (almost) trivial instance of the framework.

Plan of Lectures:1 Classical OBDA: another way of gaining tractability (and its limits)2 Database Approach Extension and Interpolation3 Modeling Complex Physical Designs4 Updates of Data and Future Directions

Unifying Logic-based Approach Introduction and Goals 27 / 27

Page 64: Logical Approach to Physical Data Independence and Query …david/cs848s14/intro.pdf · Such plans access various concrete data sources through their low-level often iterator-based

Summary

comprehensive framework based on certain answers that unifies manydatabase/KR approaches to handling information in presence ofbackground information/theory/ontology;too expressive and in turn computationally in-feasible;practical (relational) systems: (almost) trivial instance of the framework.

Plan of Lectures:1 Classical OBDA: another way of gaining tractability (and its limits)2 Database Approach Extension and Interpolation3 Modeling Complex Physical Designs4 Updates of Data and Future Directions

Unifying Logic-based Approach Introduction and Goals 27 / 27