introduction to database theory · introduction relational model recursive queries query evaluation...

110
Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion Introduction to Database Theory Pierre Senellart ÉCOLE NORMALE SUPÉRIEURE RESEARCH UNIVERSITY PARIS École de Printemps en Informatique Théorique, 2019

Upload: others

Post on 19-Jul-2020

28 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Introduction to Database Theory

Pierre Senellart

ÉCOLE NORMALE

S U P É R I E U R E

RESEARCH UNIVERSITY PARIS

École de Printemps en Informatique Théorique, 2019

Page 2: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

2/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data management

Numerous applications (standalone software, Web sites, etc.)need to manage data:� Structure data useful to the application� Store them in a persistent manner (data retained even

when the application is not running)� Efficiently query information within large data volumes� Update data without violating some structural constraints� Enable data access and updates by multiple users, possibly

concurrently

Often, desirable to access the same data from several distinctapplications, from distinct computers.

Page 3: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

3/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Role of a DBMS

Database Management SystemSoftware that simplifies the design of applications that handledata, by providing a unified access to the functionalitiesrequired for data management, whatever the application.

DatabaseCollection of data (specific to a given application) managed bya DBMS

Page 4: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

4/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Features of DBMSs (1/2)Physical independence. The user of a DBMS does not need to

know how data are stored (in a file, on a rawpartition, in a distributed filesystem, etc.); storagecan be modified without impacting data access

Logical independence. It is possible to provide the user with apartial view of the data, corresponding to what heneeds and is allowed to access

Ease of data access. Use of a declarative language describingqueries and updates on the data, specifying theintent of a user rather than the way this will beimplemented

Query optimization. Queries are automatically optimized to beimplemented as efficiently as possible on thedatabase

Page 5: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

5/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Features of DBMSs (2/2)Logical integrity. The DBMS imposes constraints on data

structure; every modification violating theseconstraints is denied

Physical integrity. The database remains in a coherent state,and data are durably preserved, even in case ofsoftware or hardware failure

Data sharing. Data are accessible by multiple users,concurrently, and these multiple and concurrentaccesses cannot violate logical or physical dataintegrity

Standardization. The use of a DBMS is standardized, so that itmay be possible to replace a DBMS with anotherwithout changing (in a major way) the code of theapplication

Page 6: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

6/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Major types of DBMSsRelational (RDBMS). Tables, complex queries (SQL), rich

featuresXML. Trees, complex queries (XQuery),

features similar to RDBMSGraph/Triples. Graph data, complex queries expressing

graph navigationObjects. Complex data model, inspired by OOP

Documents. Complex data, organized in documents,relatively simple queries and features

Key–Value. Very basic data model, focus onperformance

Column Stores. Data model in between key–value andRDBMS; focus on iteration andaggregation on columns

NoS

QL

Page 7: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

6/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Major types of DBMSsRelational (RDBMS). Tables, complex queries (SQL), rich

featuresXML. Trees, complex queries (XQuery),

features similar to RDBMSGraph/Triples. Graph data, complex queries expressing

graph navigationObjects. Complex data model, inspired by OOP

Documents. Complex data, organized in documents,relatively simple queries and features

Key–Value. Very basic data model, focus onperformance

Column Stores. Data model in between key–value andRDBMS; focus on iteration andaggregation on columns

NoS

QL

Page 8: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

7/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Classical relational DBMSs

� Based on the relational model: decomposition of data intorelations (i.e., tables)� A standard query language: SQL� Data stored on disk� Relations (tables) stored line after line� Centralized system, with limited distribution possibilities

Page 9: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

8/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Database theory

� Well-established research area within computer science,concerned with foundational et theoretical aspects of datamanagement

� Motivations:� provide through theoretical tools advances in the practice of

data management� provide through the scope of data management advances in

theoretical computer science

� The success of current-day DBMSs relies in particular on atheoretical result: Codd’s theorem� This lecture: High-level introduction to database theory,focusing on the relational model

Page 10: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

9/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational ModelModelRelational AlgebraRelational calculus

Recursive Queries

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 11: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

10/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Relational schemaWe fix countably infinite sets:� L of labels� V of values� T of types, s.t., 8� 2 T ; � � V

DefinitionA relation schema (of arity n) is an n-tuple (A1; : : : ; An) whereeach Ai (called an attribute) is a pair (Li; �i) with Li 2 L,�i 2 T and such that all Li are distinct

DefinitionA database schema is defined by a finite set of labels L � L(relation names), each label of L being mapped to a relationschema.

Page 12: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

11/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Example database schema� Universe:

� L the set of alphanumeric character strings starting with aletter

� V the set of finite sequences of bits� T is formed of types such as INTEGER (representation as a

sequence of bits of integers between �231 and 231 � 1), REAL(representation of floating-point numbers following IEEE754), TEXT (UTF-8 representation of character strings),DATE (ISO8601 representation of dates), etc.

� Database schema formed of 2 relation names, Guest andReservation� Guest: ((id; INTEGER); (name; TEXT); (email; TEXT))� Reservation:((id; INTEGER); (guest; INTEGER); (room; INTEGER),(arrival; DATE); (nights; INTEGER))

Page 13: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

12/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Database

DefinitionAn instance of a relation schema ((L1; �1); : : : ; (Ln; �n)) (alsocalled a relation on this schema) is a finite set ft1; : : : ; tkg oftuples of the form tj = (vj1; : : : ; vjn) with 8j8i vji 2 �i.

DefinitionAn instance of a database schema (or, simply, a database onthis schema) is a function that maps each relation name to aninstance of the corresponding relation schema.

Note: Relation is used somewhat ambiguously to talk about arelation schema or an instance of a relation schema.

Page 14: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

13/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

ExampleGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Page 15: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

14/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Some notation

� If A = (L; � ) is the ith attribute of a relation R, and t ann-tuple of an instance of R, we note t[A] (or t[L]) the valueof the ith component of t.� Similarly, if A is a k-tuple of attributes among the nattributes of R, t[A] is the k-tuple formed from t byconcatenating the t[A] for A in A.� A tuple is an n-tuple for some n.

Page 16: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

15/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Variants: named and unnamed perspectivesThe version presented considers the attributes of a relation areordered and have a name. This is what best matches the wayRDBMSs work, but not necessarily the most pleasant to reasonon the relational model.

Named perspective. We forget the position of attributes, andconsider they are uniquely identified by theirnames.

Unnamed perspective. We forget the name of attributes, andconsider they are uniquely identified by theirposition. One uses notation such as t[2] to accessthe value of the second attribute of a tuple.

No major impact, one will use one or the other depending onwhat is convenient.

Page 17: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

16/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Variant: bag semantics

� A relation instance is defined as a (finite) set of tuples.One can also consider a bag semantics of the relationalmodel, where a relation instance is a multiset of tuples.� This is what best matches how RDBMSs work. . .� . . . but most of relational database theory has beenestablished for the set semantics, more convenient to workwith� We will mostly discuss the set semantics in this lecture

Page 18: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

17/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Variant: untyped version

� In implementations, attributes are always typed� In models and theoretical results, one often abstractsattribute types away and considers each attribute has auniversal type V� We will most often omit attribute types

Page 19: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

18/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational ModelModelRelational AlgebraRelational calculus

Recursive Queries

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 20: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

19/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

The relational algebra� Algebraic language to express queries� A relational algebra expression produces a new relationfrom the database relations� Each operator takes 0, 1, or 2 subexpressions� Main operators:

Op. Arity Description Condition

R 0 Relation name R 2 L

�A!B 1 Renaming A;B 2 L

�A1:::An 1 Projection A1 : : : An 2 L

�� 1 Selection � formula� 2 Cross product[ 2 Unionn 2 Difference./� 2 Join � formula

Page 21: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

20/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Relation nameGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: GuestResult:

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Page 22: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

21/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

RenamingGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �id!guest(Guest)Result:

guest name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Page 23: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

22/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

ProjectionGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �email;id(Guest)Result:

email id

[email protected] [email protected] [email protected] 3

Page 24: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

23/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

SelectionGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �arrival>2017-01-12^guest=2(Reservation)Result:

id guest room arrival nights

4 2 504 2017-01-15 25 2 107 2017-01-30 1

The formula used in the selection can be any Booleancombination of comparisons of attributes to attributes orconstants.

Page 25: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

24/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Cross productGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �id(Guest)� �name(Guest)Result:

id name

1 Alice Black2 Alice Black3 Alice Black1 John Smith2 John Smith3 John Smith

Page 26: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

25/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

UnionGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �room(�guest=2(Reservation)) [�room(�arrival=2017-01-15(Reservation))

Result:room

107302504

This simple union could have been written�room(�guest=2_arrival=2017-01-15(Reservation)). Not always possible.

Page 27: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

25/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

UnionGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �room(�guest=2(Reservation)) [�room(�arrival=2017-01-15(Reservation))

Result:room

107302504

This simple union could have been written�room(�guest=2_arrival=2017-01-15(Reservation)). Not always possible.

Page 28: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

26/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

DifferenceGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �room(�guest=2(Reservation)) n�room(�arrival=2017-01-15(Reservation))

Result:room

107

This simple difference could have been written�room(�guest=2^arrival6=2017-01-15(Reservation)). Not alwayspossible.

Page 29: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

26/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

DifferenceGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: �room(�guest=2(Reservation)) n�room(�arrival=2017-01-15(Reservation))

Result:room

107

This simple difference could have been written�room(�guest=2^arrival6=2017-01-15(Reservation)). Not alwayspossible.

Page 30: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

27/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

JoinGuest

id name email

1 John Smith [email protected] Alice Black [email protected] John Smith [email protected]

Reservation

id guest room arrival nights

1 1 504 2017-01-01 52 2 107 2017-01-10 33 3 302 2017-01-15 64 2 504 2017-01-15 25 2 107 2017-01-30 1

Expression: Reservation ./guest=id GuestResult:

id guest room arrival nights name email

1 1 504 2017-01-01 5 John Smith [email protected] 2 107 2017-01-10 3 Alice Black [email protected] 3 302 2017-01-15 6 John Smith [email protected] 2 504 2017-01-15 2 Alice Black [email protected] 2 107 2017-01-30 1 Alice Black [email protected]

The formula used in the join can be any Boolean combinationof comparisons of attributes of the table on the left toattributes of the table on the right.

Page 31: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

28/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Note on the join

� The join is not an elementary operator of the relationalalgebra (but it is very useful)� It can be seen as a combination of renaming, cross product,selection, projection� Thus:

Reservation ./guest=id Guest

��id;guest;room;arrival;nights;name;email(

�guest=temp(Reservation� �id!temp(Guest)))

� If R and S have for attributes A and B, we note R ./ S thenatural join of R and S, where the join formula isVA2A\B A = A.

Page 32: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

29/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Bag semantics

In bag semantics (what is actually used by RDBMS):

� All operations return multisets� In particular, projection and union can introduce multisets

even when initial relations are sets

Page 33: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

30/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Extension: Aggregation� Various extensions have been proposed to the relationalalgebra to add additional features� In particular, aggregation and grouping [Klug, 1982,Libkin, 2003] of results� With a syntax inspired from [Libkin, 2003]:

�avg>3( avgroom[�x:avg(x)](�room;nights(Reservation)))

computes the average number of nights per reservation foreach room having an average greater than 3

room avg

302 6504 3.5

Page 34: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

31/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational ModelModelRelational AlgebraRelational calculus

Recursive Queries

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 35: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

32/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Relational calculus

� Logical language to express queries� First-order logic formula, without function symbols, andwith relation symbols the labels of the database schema(plus comparison predicates)� Unnamed, untyped perspective� Fix:

� A set X of variables� A set V of values� A database schema S

Page 36: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

33/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Relational calculus: Syntax

� For every relation R 2 S of arity n, for every(�1; : : : ; �n) 2 (X [ V)n: R(�1; : : : ; �n) 2 FO

� Also allow equality predicate, possibly inequality� For every (�1; �2) 2 FO2, for every x 2 X :

� �1 ^ �2 2 FO� �1 _ �2 2 FO� :�1 2 FO� 8x �1 2 FO� 9x �1 2 FO

� Free variables of � 2 FO: variables x appearing in � andnot qualified by a 8x or a 9x� One writes a relational calculus query in the formQ(x1; : : : ; xm) = � where x1; : : : ; xm are free variables of �

Page 37: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

34/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Relational calculus: Semantics� A relational calculus query on schema S can be seen as a

function with input a database D over S and producing arelation as output� adom(D): active domain of D, set of values in D

� If Q(x1; : : : ; xn) = � is a calculus query over S and D adatabase over S, then:

Q(D) = f (v1; : : : ; vn) 2 (adom(D))n j D j= �[x1=v1; : : : ; xn=vn] g

where D j= � is defined inductively:� D j= R(u1; : : : ; um) () R(u1; : : : ; um) 2 D� D j= �1 ^ �2 () D j= �1 ^D j= �2� D j= �1 _ �2 () D j= �1 _D j= �2� D j= :�1 () D 6j= �1� D j= 8x �1 () 8v 2 adom(D)D j= �1[x=v]� D j= 9x �1 () 9v 2 adom(D)D j= �1[x=v]

Page 38: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

35/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Codd’s theorem

Theorem ([Codd, 1972])The relational algebra and the relational calculus areequivalent:� for every relational algebra query q over a schema S,there exists a relational calculus query Q over S suchthat for every database D over S, q(D) = Q(D)

� for every relational calculus query Q over a schema S,there exists a relational algebra query q over S suchthat for every database D over S, q(D) = Q(D)

Furthermore, translating from one formalism to the othercan be done in polynomial time.

Page 39: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

36/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Why is this important?� Allows using a declarative formalism to specify queries:logics. . . or SQL� These queries are then compiled via Codd’s transformationinto an algebraic formalism� Algebraic queries are then optimized, by using theproperties of the relational algebra (transformation rules,e.g., pushing selection within joins, exploiting associativityof joins, etc.)� Optimized queries can then be evaluated, by exploiting thefact that each operator of the relational algebra can easilybe implemented (in several different ways, to be chosenbased on a cost function)� This is RDBMS Implementation 101, a main reason of thesuccess of RDBMSs!

Page 40: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

37/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Subclasses of queries

� Conjunctive query (CQ): relational calculus query without_, :, 8� Positive query (PQ): relational calculus query without :, 8� Union of conjunctive queries (UCQ): special case ofpositive query where the _ and ^ form a DNF formula

Expressiveness

� CQs are equivalent to the relational algebra without [and n, and where � does not feature disjunction� UCQs are equivalent to PQs (but exponential blow-up),and equivalent to the relational algebra without n

Page 41: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

37/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Subclasses of queries

� Conjunctive query (CQ): relational calculus query without_, :, 8� Positive query (PQ): relational calculus query without :, 8� Union of conjunctive queries (UCQ): special case ofpositive query where the _ and ^ form a DNF formula

Expressiveness

� CQs are equivalent to the relational algebra without [and n, and where � does not feature disjunction� UCQs are equivalent to PQs (but exponential blow-up),and equivalent to the relational algebra without n

Page 42: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

38/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive QueriesDatalogAlgebraFixpoint logics

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 43: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

39/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Motivation: recursive queries

� Query languages considered so far (relational algebra,calculus) have a limited horizon

� Some data structures (trees, graphs) require arbitrarilydeep navigation, recursion

� How can we build a theory of recursive query languages?� RDBMSs are not always adapted to this type of

data/queries, cf. XML or graph DBMSs� Example application: transitive closure of a graphG(from; to)

Page 44: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

40/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive QueriesDatalogAlgebraFixpoint logics

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 45: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

41/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Datalog

� Simplest recursive query language: adding recursion toconjunctive queries� Inspired from logic programming� Datalog query (or program): set of rules that produceintensional facts� Schema of a Datalog program: classical database schema(extensional schema) + (disjoint) schema of intensionalfacts (intensional schema)� Fix a distinguished relation Goal of the intensionalschema, whose arity is the arity of the query

Page 46: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

42/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Syntax

Finite set of rules r of the form:

S(y)| {z }head

R1(x1); : : : ; Rn(xn)| {z }body

with:� S relation of the intensional schema� R1; : : : ; Rn relations of the intensional or extensional

schemas� x1; : : : ;xn;y: tuples of variables (or possibly constants), of

arity compatible with the relations� Each variable in the head is present in the body

Page 47: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

43/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Fix-point semantics� Each rule r of a program P can be seen as a conjunctive

query on the database D:

r(D) ��= fS(y) j 9z1 : : : zk R1(x1) 2 D ^ � � � ^Rn(xn) 2 Dg

where the zi’s are the variables of the rule body� Consequence operator �P defined by:

�P (D) := D [[

r2P

fr(D)g

� We consider the sequence (Dn) defined by:D0 = D, Dn+1 = �P (Dn)

� The semantics of P over D is the set of facts of the relationGoal in D1, the fixpoint of the sequence (Dn)

Page 48: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

44/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Example: transitive closure

Goal(x; y) G(x; y)

Goal(x; y) Goal(x; z); G(z; y)

Page 49: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

45/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive QueriesDatalogAlgebraFixpoint logics

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 50: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

46/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

While operator� Algebra: essentially imperative programming (in contrast

to calculus)� One can add to the algebra:

� the possibility to define intermediary variables and toassign them values (does not affect expressive power)

� a while operator of the form:

while change do

. . . assign value to one or several variables. . .

done

� Semantics: the content of the loop is evaluated while theassignments change the underlying variables

� Infinite loops possible!

Page 51: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

47/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Inflationary vs non-inflationary while

� Two variants:Non-inflationary Assignment operator ��=, arbitrary

assignmentInflationary Assignment operator +=, the assigned value

can only grow� Infinite loops impossible with an inflationary while

(because the active domain is finite)� Non-inflationary potentially more expressive

Page 52: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

48/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Example: transitive closureWe introduce a relation schema C with attributesfrom and to, similarly to G.Non-inflationary

C ��= G

while change do

C ��= C [ �from;to(�to!int (C) on �from!int (G))

done

CInflationary

C += G

while change do

C += �from;to(�to!int (C) on �from!int (G))

done

C

Page 53: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

49/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive QueriesDatalogAlgebraFixpoint logics

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 54: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

50/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Non-inflationary fixpoint

� We add to the relational calculus a fixpoint construction� Let �(T ) be a calculus formula, mentioning the schemarelations as well as a new relation T , having for freevariables x1; : : : ; xn� Then �T [�(T )](x1; : : : ; xn) is a formula of the fixpoint

calculus� Semantics: in terms of the least fixpoint (if it exists) of therelation T , obtained by replacing T at each step with theset of facts of the form T (x1; : : : ; xn) for �(T )(x1; : : : ; xn)satisfied (starting with T = ;)� Equivalent to algebra with non-inflationary while!

Page 55: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

51/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Inflationary fixpoint

� We add to the relational calculus a fixpoint construction� Let �(T ) be a calculus formula, mentioning the schema

relations as well as a new relation T , having for freevariables x1; : : : ; xn� Then �+T [�(T )](x1; : : : ; xn) is a formula of the fixpointcalculus� Semantics: in terms of the least fixpoint of the relation T ,

obtained by adding to T at each step with the set of factsof the form T (x1; : : : ; xn) for �(T )(x1; : : : ; xn) satisfied(starting with T = ;)� Equivalent to algebra with inflationary while!

Page 56: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

52/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Example: transitive closure

Non-inflationary

f (x; y) j

�C [G(x; y) _ C(x; y) _ (9z C(x; z) ^G(z; y))](x; y)g

Inflationary

f (x; y) j

�+C [G(x; y) _ (9z C(x; z) ^G(z; y))](x; y)g

Page 57: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

53/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query EvaluationConjunctive QueriesFirst-order logicFixpoint logics

Static Analysis of Queries

Conclusion

Page 58: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

54/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Query evaluation

� Query Q in some query language (CQ, FO, FO+�,FO+�+. . . ) – we will use a logical formalism here

� Database D (always finite!)� Query evaluation: Computing Q(D)

� Complexity of this problem?� To simplify the study of complexity, we often assume thatQ is a Boolean query, i.e., it returns > or ?

Page 59: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

55/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data complexity

For some fixed Q, what is the complexity of computing Q(D) interms of the size of the database D?

Page 60: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

56/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Combined complexity

For some query language Q, what is the complexity ofcomputing Q(D) in terms of the size of the query Q 2 Q and ofthe database D?

Page 61: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

57/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Complexity classes� We restrict to Boolean problems (returning > or ?)� Set of all problems solvable by a resource-constrainedcomputing method:� For example:

PTIME: deterministic Turing machine in polynomialtime

NP: non-deterministic Turing machine inpolynomial time

PSPACE: deterministic Turing machine in polynomialspace

AC0: Boolean circuit of polynomial size andconstant depth

� We know that: AC0 ( PTIME � NP � PSPACE� Open whether PSPACE � PTIME (!)

Page 62: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

58/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Membership and hardness for a class

� A problem P belongs to a complexity class C (or in C) if itis solvable by the corresponding resource-constrainedcomputing method� A problem P is hard for a complexity class C (or C-hard) ifthere exists a reduction that transforms whatever problemP 0 2 C into an instance of the problem P

� complete: in C + C-hard� Several ways to define reductions� Here, we assume that there exists a function computable inpolynomial time that transforms one instance I 0 of problemP 0 into an instance I of P such that P (I) = P 0(I 0)

Page 63: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

59/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Descriptive complexity

� A query language Q captures a complexity class C if:� For all Q 2 Q, query evaluation of query Q in in C (data

complexity)� For all problem P in C, there exists a query Q 2 Q such

that evaluating Q exactly solves P (without a reduction)!

� If Q captures C and if C has problems that are complete forC, then there exists Q 2 Q such that Q is C-complete, butthe converse is not true

Page 64: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

60/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query EvaluationConjunctive QueriesFirst-order logicFixpoint logics

Static Analysis of Queries

Conclusion

Page 65: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

61/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data complexity

TheoremCQ evaluation is PTIME in data complexity.

Proof.By enumerating all valuations of variables of the query in thedatabase. We will see later a much stronger result.

Page 66: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

62/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Combined complexity

TheoremCQ evaluation is NP-complete in combined complexity.

Proof.Membership in NP is easy. Hardness for NP can be proved byreduction from graph 3-colorability.

Page 67: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

63/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

�-acyclic query

� A CQ can be seen as a hypergraph (vertices are variables,hyperedges the atoms of the CQ, labeled by the relationname)� A hypergraph H has a join tree where one can find a treewhose nodes are labeled by the hyperedges of H and suchthat:� every hyperedge of H appears as the label of one node of

the tree;� for every vertex x of H, the set of tree nodes labeled by a

hyperedge referring to x is a connected subtree

� A query is �-acyclic if its hypergraph has a join tree� Can be obtained in linear time if it exists [Tarjan andYannakakis, 1984]

Page 68: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

64/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Yannakakis’s algorithm [Yannakakis, 1981]

� Algorithm to evaluate acyclic queries (non-necessarilyBoolean):1. Construct the join tree2. Eliminate all useless tuples of a relation with the semijoin

operator n: Rn S = �R:�(R on S) by navigating twice inthe join tree: from bottom up, then from top down

3. Evaluate the query bottom up, by computing joins followingthe tree and by projecting useless variables out as you go

� Polynomial complexity in the size of the query, the input,and the output (combined complexity)

Page 69: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

65/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query EvaluationConjunctive QueriesFirst-order logicFixpoint logics

Static Analysis of Queries

Conclusion

Page 70: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

66/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data complexity

TheoremFO evaluation is PTIME in data complexity.

Proof.By rewriting in prenex form and naive evaluation.

Page 71: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

67/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

FO does not capture the whole of PTIME

TheoremOne cannot compute in FO that a relation containing atotal order has an even number of elements, or that a graphis connected.

Fairly complex to prove, relies on Ehrenfeucht–Fraïssé games(see [Libkin, 2004]).

Page 72: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

68/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data complexity, more precise

TheoremFO evaluation is AC0 in data complexity.

Proof.By rewriting to the relational algebra.

Page 73: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

69/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Combined complexity

TheoremFO evaluation is PSPACE-complete in combinedcomplexity.

Proof.Membership in PSPACE easy. Hardness for PSPACE from theQSAT problem.

Page 74: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

70/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query EvaluationConjunctive QueriesFirst-order logicFixpoint logics

Static Analysis of Queries

Conclusion

Page 75: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

71/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data complexity

TheoremEvaluation of FO+�+ is PTIME in data complexity.Evaluation of FO+� is PSPACE in data complexity.

Proof.Direct. Only need polynomial space to detect infinite loops.

Page 76: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

72/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Connectedness, parity

TheoremConnectedness is expressible in FO+�+.

The problem of determining whether an arbitrary relationhas an even number of elements (or tuples) cannot beexpressed in FO+�.

Proof.First part easy, recall transitive closure query.Second part shown in [Abiteboul et al., 1995], chapter 17.

Page 77: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

73/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Parity and FO+�+, with order

TheoremFO+�+ can compute if a relation that contains a totalorder has an even number of elements.

Proof.Explicit construction.

Page 78: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

74/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

More general results

TheoremFO+�+ captures PTIME on databases including a totalorder relation of the domain.

FO+� captures PSPACE on databases including a totalorder relation of the domain.

Proof.Simulation of polynomial-time or polynomial-spacedeterministic Turing machines with a fixpoint computation.

Page 79: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

75/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Data complexity (bis)

TheoremEvaluating FO+�+ is PTIME-complete in data complexity.Evaluating FO+� is PSPACE-complete in data complexity.

Proof.Hardness comes from descriptive complexity on orderedstructures.

Page 80: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

76/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query Evaluation

Static Analysis of QueriesContainment and EquivalenceConjunctive QueriesRelational Calculus

Conclusion

Page 81: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

77/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query Evaluation

Static Analysis of QueriesContainment and EquivalenceConjunctive QueriesRelational Calculus

Conclusion

Page 82: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

78/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Query optimization

� Goal: Given a query q in some query language Q and adatabase D, find a query equivalent to q on D and faster toevaluate on D

� Here: Q in the relational calculus (or a fragment thereof),and one looks for a query faster on whatever database (wedo not look D, we perform static analysis)� In actual RDBMSs: Q is the set of query execution plans(a specialization of the relational algebra whereimplementations are chosen for each operator) andstatistics on D are used

Page 83: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

79/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Global optimization

� We consider global optimization techniques, considering aquery in its entirety (techniques on execution plans aremore local, e.g., local rewritings)� We formally define:Equivalence: q � q0 if for all database D, q(D) = q0(D)

Minimality: q0 is the “best” query equivalent to q in Q

Page 84: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

80/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Containment and equivalence

DefinitionA query q is contained in a query q0 (denoted q v q0) if for alldatabase D, q(D) � q0(D)

Propositionq � q0 iff q v q0 and q0 v q.

Proof.Immediate.

Page 85: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

80/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Containment and equivalence

DefinitionA query q is contained in a query q0 (denoted q v q0) if for alldatabase D, q(D) � q0(D)

Propositionq � q0 iff q v q0 and q0 v q.

Proof.Immediate.

Page 86: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

81/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query Evaluation

Static Analysis of QueriesContainment and EquivalenceConjunctive QueriesRelational Calculus

Conclusion

Page 87: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

82/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Case of the conjunctive queries

� We consider conjunctive queries (CQ) of the form:

q(x) 9y : R1(z1) ^ � � � ^Rn(zn)

where each zi is a tuple of variables among x and y, andwhere each xj appears at least in one zj

� Set semantics: for all database D, q(D) is a finite set oftuples

Page 88: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

83/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Homomorphism

DefinitionA homomorphism from a CQ q to a CQ q0 is a function � fromthe variables x;y of q fo the variables x0;y0 of q0 such that:� �(x) = x0

� for every atom R(zi) of q, there exists an atom R(z0i0) of q0

such that �(zi) = z0i0

DefinitionA homomorphism is an isomorphism if it is one-to-one and itsconverse is a homomorphism.

Page 89: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

84/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Instance associated to a query

DefinitionFor all conjunctive query

q(x) 9y : R1(z1) ^ � � � ^Rn(zn)

one can construct the instance associated to q, denoted Iq,where the active domain is f az j z 2 x [ y g and which isformed of the n tuples R(azi1;:::;zik) for R(zi1; : : : ; zik) atom of q

PropositionFor all CQs q(x), q0(x0), there exists a homomorphismfrom q to q0 iff (ax0

1; : : : ; ax0

j) 2 q(Iq0).

Page 90: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

84/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Instance associated to a query

DefinitionFor all conjunctive query

q(x) 9y : R1(z1) ^ � � � ^Rn(zn)

one can construct the instance associated to q, denoted Iq,where the active domain is f az j z 2 x [ y g and which isformed of the n tuples R(azi1;:::;zik) for R(zi1; : : : ; zik) atom of q

PropositionFor all CQs q(x), q0(x0), there exists a homomorphismfrom q to q0 iff (ax0

1; : : : ; ax0

j) 2 q(Iq0).

Page 91: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

85/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Homomorphism theorem

Theorem ([Chandra and Merlin, 1977])For all CQs q, q0, q v q0 iff there exists a homomorphismfrom q0 to q.

Page 92: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

86/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Minimal query

DefinitionA conjunctive query is minimal if it has a minimal number ofatoms among all equivalent conjunctive queries.

� Translation of a CQ to an algebra query: if there are natoms, we obtain n� 1 joins� Joins are the most costly operations of the relational

algebra (bar cross products)� Finding a minimal query amounts to global optimization

Page 93: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

86/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Minimal query

DefinitionA conjunctive query is minimal if it has a minimal number ofatoms among all equivalent conjunctive queries.

� Translation of a CQ to an algebra query: if there are natoms, we obtain n� 1 joins� Joins are the most costly operations of the relationalalgebra (bar cross products)� Finding a minimal query amounts to global optimization

Page 94: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

87/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Unicity of minimal queryProposition ([Chandra and Merlin, 1977])Let q be a CQ. Then there exists a CQ q0 obtained byremoving atoms from q which is minimal.

Proof.Consider a minimal query equivalent to q and apply thehomomorphism theorem.

Proposition ([Chandra and Merlin, 1977])Let q, q0 be two equivalent minimal CQs. Then there existsan isomorphism from q to q0.

Proof.Apply the homomorphism theorem. The image by thehomomorphism is an equivalent minimal query.

Page 95: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

88/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Minimization algorithm

Apply the following procedure to minimize a query:For every atom of the query, test if there exists anequivalent query not containing this atom, and thusif there exists a homomorphism sending this atom toanother atom of the query. If so, delete it, and con-tinue until obtaining an equivalent minimal query.

Page 96: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

89/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Complexity issues

PropositionThe following problems are NP-complete:� given two CQs q, q0, determine whether q v q0

� given two CQs q, q0, determine whether q � q0

� given a CQ q, determine if q is non-minimal

Proof.NP-hardness is by reduction from 3-colorability, as forcombined complexity of query evaluation. Membership in NP isdirect.

NP-hard. . . in the queries. Queries may be small enough sothat an exponential algorithm may not be an issue.

Page 97: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

89/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Complexity issues

PropositionThe following problems are NP-complete:� given two CQs q, q0, determine whether q v q0

� given two CQs q, q0, determine whether q � q0

� given a CQ q, determine if q is non-minimal

Proof.NP-hardness is by reduction from 3-colorability, as forcombined complexity of query evaluation. Membership in NP isdirect.

NP-hard. . . in the queries. Queries may be small enough sothat an exponential algorithm may not be an issue.

Page 98: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

90/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Bag semantics[Chaudhuri and Vardi, 1993]

� In practice, RDBMSs implement a bag semantics� Two queries in bag semantics are equivalent if and only if

they are isomorphic (intuitively, because two similar butnon isomorphic queries can introduce a different number ofresults)

� Query containment: �P2 -hard claimed (but not proved).

Decidability (and precise complexity if decidable): open!

Page 99: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

91/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query Evaluation

Static Analysis of QueriesContainment and EquivalenceConjunctive QueriesRelational Calculus

Conclusion

Page 100: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

92/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Satisfiability in the relational calculus

DefinitionA Boolean relational calcululs query q is satisfiable if thereexists a (finite) database D such that D j= q.

Theorem ([Trakhtenbrot, 1963])Satisfiability of the relational calculus (in the finite case) isundecidable.

Proof.Reduction possible from the POST correspondence problem,technical, see [Abiteboul et al., 1995].

Page 101: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

92/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Satisfiability in the relational calculus

DefinitionA Boolean relational calcululs query q is satisfiable if thereexists a (finite) database D such that D j= q.

Theorem ([Trakhtenbrot, 1963])Satisfiability of the relational calculus (in the finite case) isundecidable.

Proof.Reduction possible from the POST correspondence problem,technical, see [Abiteboul et al., 1995].

Page 102: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

93/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Containment and equivalence of the calculus

TheoremContainment and equivalence of relational calculus queriesare undecidable and co-recursively enumerable.

Proof.Undecidability is by direct reduction from the undecidability ofsatisfiability.Co-recursive enumerability is shown directly, by enumeratingpossible counter-examples.

Page 103: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

93/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Containment and equivalence of the calculus

TheoremContainment and equivalence of relational calculus queriesare undecidable and co-recursively enumerable.

Proof.Undecidability is by direct reduction from the undecidability ofsatisfiability.Co-recursive enumerability is shown directly, by enumeratingpossible counter-examples.

Page 104: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

94/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 105: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

95/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Plan

Introduction

Introduction to The Relational Model

Recursive Queries

Complexity of Query Evaluation

Static Analysis of Queries

Conclusion

Page 106: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

96/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Database theory: a rich field of research

� Rich connections with various areas of theoreticalcomputer science and mathematics: logics, algorithms,graph theory, automata theory, typing theory, algebra, etc.� This lecture: very high-level overview, no time for proofs,some quite interesting� Further lectures at EPIT: more on efficient queryevaluation, XML and graph databases, descriptivecomplexity, logical independence, uncertainty in databases,connections with description logics� Plenty of other interesting topics: data exchange andintegration, probabilistic databases, theory of indexingstructures, ranking and top-k queries. . .

Page 107: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

97/97

Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion

Reference

� Main database theory textbook: [Abiteboul et al., 1995]

� In this lecture: (part of) chapters 3, 4, 5, 6, 12, 14, 17

Page 108: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

Bibliographie I

Serge Abiteboul, Richard Hull, and Victor Vianu. Foundationsof Databases. Addison-Wesley, 1995.

Ashok K. Chandra and Philip M. Merlin. Optimalimplementation of conjunctive queries in relational databases. In Proceedings of the 9th Annual ACM Symposiumon Theory of Computing, May 4-6, 1977, Boulder,Colorado, USA, pages 77–90, 1977. doi:10.1145/800105.803397. URLhttp://doi.acm.org/10.1145/800105.803397.

Surajit Chaudhuri and Moshe Y. Vardi. Optimization of Realconjunctive queries. In Proceedings of the Twelfth ACMSIGACT-SIGMOD-SIGART Symposium on Principles ofDatabase Systems, May 25-28, 1993, Washington, DC,USA, pages 59–70, 1993. doi: 10.1145/153850.153856. URLhttp://doi.acm.org/10.1145/153850.153856.

Page 109: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

Bibliographie II

Edgar F. Codd. Relational completeness of data basesublanguages. In Data Base Systems. Courant ComputerScience Symposium, 1972.

Anthony C. Klug. Equivalence of relational algebra andrelational calculus query languages having aggregatefunctions. J. ACM, 29(3):699–717, 1982.

Leonid Libkin. Expressive power of SQL. Theor. Comput.Sci., 296(3):379–404, 2003.

Leonid Libkin. Elements of Finite Model Theory. Texts inTheoretical Computer Science. An EATCS Series. Springer,2004. doi: 10.1007/978-3-662-07003-1. URLhttp://dx.doi.org/10.1007/978-3-662-07003-1.

Page 110: Introduction to Database Theory · Introduction Relational Model Recursive Queries Query Evaluation Static Analysis Conclusion MajortypesofDBMSs Relational(RDBMS).Tables,complexqueries(SQL),rich

Bibliographie III

Robert Endre Tarjan and Mihalis Yannakakis. Simplelinear-time algorithms to test chordality of graphs, testacyclicity of hypergraphs, and selectively reduce acyclichypergraphs. SIAM J. Comput., 13(3):566–579, 1984. doi:10.1137/0213035. URLhttp://dx.doi.org/10.1137/0213035.

Boris A. Trakhtenbrot. Impossibility of an algorithm for thedecision problem in finite classes. American MathematicalSociety Translations Series 2, 23:1–5, 1963.

Mihalis Yannakakis. Algorithms for acyclic database schemes.In VLDB, pages 82–94, 1981.