chapter 2: manipulating query expressions
DESCRIPTION
CHAPTER 2: MANIPULATING QUERY EXPRESSIONS. PRINCIPLES OF DATA INTEGRATION. ANHAI DOAN ALON HALEVY ZACHARY IVES. Introduction. How does a data integration system decide which sources are relevant to a query? Which are redundant? How to combine multiple sources to answer a query? - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/1.jpg)
ANHAI DOAN ALON HALEVY ZACHARY IVES
CHAPTER 2: MANIPULATING QUERY EXPRESSIONS
PRINCIPLES OF
DATA INTEGRATION
![Page 2: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/2.jpg)
Introduction
How does a data integration system decide which sources are relevant to a query? Which are redundant? How to combine multiple sources to answer a query?
Answer: by reasoning about the contents of data sources. Data sources are often described by queries / views.
This chapter describes the fundamental tools for manipulating query expressions and reasoning about them.
![Page 3: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/3.jpg)
Outline
Review of basic database concepts Query unfolding Query containment Answering queries using views
![Page 4: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/4.jpg)
Basic Database Concepts
Relational data model Integrity constraints Queries and answers Conjunctive queries Datalog
![Page 5: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/5.jpg)
Relational Terminology
Relational schemas Tables, attributes
Relation instances Sets (or multi-sets) of tuples
Integrity constraints Keys, foreign keys, inclusion dependencies
![Page 6: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/6.jpg)
PName Price Category Manufacturer
Gizmo $19.99 Gadgets GizmoWorks
Powergizmo $29.99 Gadgets GizmoWorks
SingleTouch $149.99 Photography Canon
MultiTouch $203.99 Household Hitachi
Product
Attribute names
Tuples or rows
Table/relation name
![Page 7: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/7.jpg)
SQL (very basic)
Interview: candidate, date, recruiter, hireDecision, grade
EmployeePerf: empID, name, reviewQuarter, grade, reviewer
select recruiter, candidatefrom Interview, EmployeePerfwhere recruiter=name AND grade < 2.5
![Page 8: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/8.jpg)
Query Answers
Q(D): the set (or multi-set) of rows resulting from applying the query Q on the database D.
Unless otherwise stated, we will consider sets rather than multi-sets.
![Page 9: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/9.jpg)
SQL (w/aggregation)
EmployeePerf: empID, name, reviewQuarter, grade, reviewer
select reviewer, Avg(grade)from EmployeePerfwhere reviewQuarter=“1/2007”
![Page 10: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/10.jpg)
Integrity Constraints (Keys)
A key is a set of columns that uniquely determine a row in the database: There do not exist two tuples, t1 and t2 such that t1 ≠ t2
and t1 and t2 have the same values for the key columns. (EmpID, reviewQuarter) is a key for
EmployeePerf
![Page 11: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/11.jpg)
Integrity Constraints (Functional Dependencies)
A set of attribute A functionally determines a set of attributes B if: whenever , t1 and t2 agree on the values of A , they must also agree on the values of B.
For example, (EmpID, reviewQuarter) functionally determine (grade).
Note: a key dependency is a functional dependency where the key determines all the other columns.
![Page 12: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/12.jpg)
Integrity Constraints (Foreign Keys)
Given table T with key B and table S with key A: A is a foreign key of B in T if whenever a S has a row where the value of A is v, then T must have a row where the value of B is v.
Example: the empID attribute of EmployeePerf is a foreign key for attribute emp of Employee.
![Page 13: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/13.jpg)
General Integrity Constraints
Tuple generating dependencies (TGD’s)
Equality generating dependencies (EGD’s): right hand side contains only equalities.
Exercise: express the previous constraints using general integrity constraints.
![Page 14: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/14.jpg)
Conjunctive Queries
Q(X,T) :- Interview(X,D,Y,H,F), EmployeePerf(E,Y,T,W,Z), W < 2.5.
select recruiter, candidatefrom Interview, EmployeePerfwhere recruiter=name AND grade < 2.5
Joins are expressed with multiple occurrences of the same variable
![Page 15: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/15.jpg)
Conjunctive Queries (interpreted predicates)
Q(X,T) :- Interview(X,D,Y,H,F), EmployeePerf(E,Y,T,W,Z), W < 2.5.
select recruiter, candidatefrom Interview, EmployeePerfwhere recruiter=name AND grade < 2.5
Interpreted (or comparison) predicates. Variables must also appear in regular atoms.
![Page 16: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/16.jpg)
Conjunctive Queries (negated subgoals)
Q(X,T) :- Interview(X,D,Y,H,F), EmployeePerf(E,Y,T,W,Z), OfferMade(X, date).
Safety: every head variable must appear in a positive subgoal.
![Page 17: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/17.jpg)
Unions of Conjunctive Queries
Q(R,C) :- Interview(X,D,Y,H,F), EmployeePerf(E,Y,T,W,Z), W < 2.5.
Q(R,C) :- Interview(X,D,Y,H,F), EmployeePerf(E,Y,T,W,Z), Manager(y), W > 3.9.
Multiple rules with the same head predicate express a union
![Page 18: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/18.jpg)
Datalog (recursion)
Path(X,Y) :- edge(X,Y)
Path(X,Y) :- edge(X,Z), path(Z,Y)
Database: edge(X,Y) – describing edges in a graph.Recursive query finds all paths in the graph.
![Page 19: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/19.jpg)
Outline
Review of basic database concepts Query unfolding Query containment Answering queries using views
![Page 20: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/20.jpg)
Query Unfolding
Query composition is an important mechanism for writing complex queries. Build query from views in a bottom up fashion.
Query unfolding “unwinds” query composition. Important for:
Comparing between queries expressed with views Query optimization (to examine all possible join orders) Unfolding may even discover that the composition of two
satisfiable queries is unsatisfiable! (exercise: find such an example).
![Page 21: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/21.jpg)
Query Unfolding Example
The unfolding of Q3 is:
![Page 22: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/22.jpg)
Query Unfolding Algorithm
Find a subgoal p(X1 ,…,Xn) such that p is defined by a rule r.
Unify p(X1 ,…,Xn) with the head of r.
Replace p(X1 ,…,Xn) with the result of applying the unifier to the subgoals of r (use fresh variables for the existential variables of r).
Iterate until no unifications can be found. If p is defined by a union of r1, …, rn, create n rules,
for each of the r’s.
![Page 23: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/23.jpg)
Query Unfolding Summary
Unfolding does not necessarily create a more efficient query! Just lets the optimizer explore more evaluation strategies. Unfolding is the opposite of rewriting queries using views
(see later).
The size of the resulting query can grow exponentially (exercise: show how).
![Page 24: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/24.jpg)
Outline
Review of basic database conceptsQuery unfolding Query containment Answering queries using views
![Page 25: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/25.jpg)
Query Containment: Motivation (1)
Intuitively, the unfolding of Q3 is equivalent to Q4:
How can we justify this intuition formally?
![Page 26: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/26.jpg)
Query Containment: Motivation (2)
Furthermore, the query Q5 that requires going through two hubs is contained in Q’3
We need algorithms to detect these relationships.
![Page 27: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/27.jpg)
Query Containment and Equivalence: Definitions
Query Q1 contained in query Q2 if for every database D Q1(D) Q2(D)
Query Q1 is equivalent to query Q2 ifQ1(D) Q2(D) and Q2(D) Q1(D)
Note: containment and equivalence are properties of the queries, not of the database!
![Page 28: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/28.jpg)
Notation Apology
Powerpoint does not have square The book uses the square notation for containment,
but the slides use the rounded version.
![Page 29: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/29.jpg)
Reality Check #1
![Page 30: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/30.jpg)
Reality Check #2
![Page 31: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/31.jpg)
Why Do We Need It?
When sources are described as views, we use containment to compare among them.
If we can remove sub-goals from a query, we can evaluate it more efficiently.
Actually, containment arises everywhere…
![Page 32: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/32.jpg)
Reconsidering the Example
Relations: Flight(source, destination) Hub(city)
Views: Q1(X,Y) :- Flight(X,Z), Hub(Z), Flight(Z,Y) Q2(X,Y) :- Hub(Z), Flight(Z,X), Flight(X,Y)
Query: Q3(X,Z) :- Q1(X,Y), Q2(Y,Z)
Unfolding: Q’3(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Hub(W), Flight(W,Y), Flight(Y,Z)
![Page 33: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/33.jpg)
Remove Redundant Subgoals
Redundant subgoals? Q’3(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Hub(W), Flight(W,Y), Flight(Y,Z) Q’’3(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Flight(Y,Z)
Is Q’’3 equivalent to Q’3?
Q’3(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y) Hub(W), Flight(W,Y), Flight(Y,Z)
![Page 34: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/34.jpg)
Containment: Conjunctive Queries
No interpreted predicates (,)or negation for now.
Recall semantics:if maps the body subgoals to tuples in Dthen, is an answer.
![Page 35: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/35.jpg)
Containment Mappings
: Vars(Q1) Vars(Q2) is a containment mapping if:
and
![Page 36: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/36.jpg)
Example Containment Mapping
Q’3(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Hub(W), Flight(W,Y), Flight(Y,Z)
Q’’3(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Flight(Y,Z)
Identity mapping on all variables, except:
![Page 37: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/37.jpg)
Theorem[Chandra and Merlin, 1977]
Q1 contains Q2 if and only if there is acontainment mapping from Q1 to Q2.
Deciding whether Q1 contains Q2
is NP-complete.
![Page 38: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/38.jpg)
Proof (sketch)
(if): assume exists:
Let t be an answer to Q2 over database D(i.e., there is a mapping from Vars(Q2) to D)
Consider .
![Page 39: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/39.jpg)
Proof (only-if direction)
Assume containment holds.
Consider the frozen database D’ of Q2. (variables of Q2 are constants in D’).
The mapping from Q1 to D’: containment map.
![Page 40: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/40.jpg)
Frozen Database
Q(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Flight(Y,Z)
Frozen database for Q(X,Z) is:
Flight: {(X,U), (U,Y), (Y,Z)} Hub: {(U)}
![Page 41: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/41.jpg)
Two Views of this Result
Variable mapping: a condition on variable mappings that guarantees
containment
Representative (canonical) databases: We found a (single) database that would offer a
counter example if there was one
Containment results typically fall into one of these two classes.
![Page 42: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/42.jpg)
Union of Conjunctive Queries
Theorem: a CQ is contained in a union ofCQ’s if and only if it is contained in one of the conjunctive queries.
Corollary: containment is still NP-complete.
![Page 43: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/43.jpg)
CQ’s with Comparison Predicates
: Vars(Q1) Vars(Q2):
and
A tweak on containment mappings provides a sufficient condition:
![Page 44: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/44.jpg)
Example Containment Mapping
![Page 45: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/45.jpg)
Containment Mappings are not Sufficient
No containment mapping, but
![Page 46: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/46.jpg)
Query Refinements
We consider the refinements of Q2
The red containment mapping applies for the first refinement and the blue to the second.
![Page 47: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/47.jpg)
Constructing Query Refinements
Consider all complete orderings of the variables and constants in the query.
For each complete ordering, create a conjunctive query.
The result is a union of conjunctive queries.
![Page 48: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/48.jpg)
Complete Orderings
Given a conjunction C of interpreted atoms over a set of variables X1,…,Xn
a set of constants a1,…,am
CT is a complete ordering if: CT |= C, and
![Page 49: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/49.jpg)
Query Refinements
Let CT be a complete ordering of C1
Then:
is a refinement of Q1
![Page 50: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/50.jpg)
Theorem: [Queries with Interpreted Predicates][Klug, 88, van der Meyden, 92]
Q1 contains Q2 if and only if there is acontainment mapping from Q1 to everyrefinement of Q2.
Deciding whether Q1 contains Q2
is
In practice, you can do much better (see CQIPContainment Algorithm in Sec. 2.3.4
![Page 51: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/51.jpg)
Queries with Negation
Queries assumed safe: every head variable appears in a positive sub-goal in the body.Revised containment mappings: map negative subgoals in Q1
to negative subgoals in Q2. Sufficient condition, but not necessary.
![Page 52: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/52.jpg)
Containment with no Containment Mapping
x zy
a b c d
![Page 53: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/53.jpg)
Theorem [Queries with Negation]
B: total number of variables and constants in Q2.
Q1 contains Q2 if and only ifQ1(D)Q2(D) for all databases D withat most B constants.
Deciding whether Q1 contains Q2
is
![Page 54: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/54.jpg)
Bag Semantics
Origin Destination Departure Time
SF Seattle 8AM
SF Seattle 10AM
Seattle Anchorage 1PM
Set semantics: {(SF, Anchorage)}
Bags: {(SF, Anchorage), (SF, Anchorage)}
![Page 55: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/55.jpg)
Theorem [Conjunctive Queries, Bag Semantics]
Q1 is equivalent to Q2 if and only ifthere is a 1-1 containment mapping.
Trivial example on non-equivalence:
Query containment?
![Page 56: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/56.jpg)
Grouping and Aggregation
Count queries are sensitive to multiplicity Max queries are not sensitive to multiplicity and many more results (see Section 2.3.6).
![Page 57: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/57.jpg)
Outline
Review of basic database conceptsQuery unfoldingQuery containment Answering queries using views
![Page 58: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/58.jpg)
Motivating Example (Part 1)
Movie(ID,title,year,genre)Director(ID,director)Actor(ID, actor)
Containment is enough to show that V1 can be used to answer Q.
![Page 59: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/59.jpg)
Motivating Example (Part 2)
Containment does not hold, but intuitively, V2 and V3 are useful for answering Q.
How do we express that intuition?
Answering queries using views!
![Page 60: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/60.jpg)
Problem Definition
Input: Query Q View definitions: V1,…,Vn
A rewriting: a query Q’ that refers onlyto the views and interpreted predicates
An equivalent rewriting of Q using V1,…,Vn: a rewriting Q’, such that Q’ Q.
![Page 61: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/61.jpg)
Motivating Example (Part 3)
Movie(ID,title,year,genre)Director(ID,director)Actor(ID, actor)
maximally-contained rewriting
![Page 62: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/62.jpg)
Maximally-Contained Rewritings
Input: Query Q Rewriting query language L View definitions: V1,…,Vn
Q’ is a maximally-contained rewriting ofQ given V1,…,Vn and L if:
1. Q’ L, 2. Q’ Q, and3. there is no Q’’ in L such that Q’’ Q and Q’ Q’’
![Page 63: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/63.jpg)
Motivation (in words)
LAV-style data integration Need maximally-contained rewritings
Query optimization Need equivalent rewritings Implemented in most commercial DBMS
Physical database design Describe storage structures as views
![Page 64: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/64.jpg)
Exercise: which of these views can be used to answer Q?
![Page 65: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/65.jpg)
Algorithms for answering queries using views
Step 1: we’ll bound the space of possible query rewritings we need to consider (no interpreted predicates)
Step 2: we’ll find efficient methods for searching the space of rewritings Bucket Algorithm, MiniCon Algorithm
Step 2b: we consider “logical approaches” to the problem: The Inverse-Rules Algorithm
We’ll consider interpreted predicates, …
![Page 66: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/66.jpg)
Bounding the Rewriting Length
Query:
Rewriting:
Expansion:
Proof: Only n subgoals in Q can contribute to the image of the containment mapping
Theorem: if there is an equivalent erwriting, there is one with at most n subgoals.
![Page 67: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/67.jpg)
Complexity Result[LMSS, 1995]
Applies to queries with no interpreted predicates. Finding an equivalent rewriting of a query using
views is NP-complete Need only consider rewritings of query length or less.
Maximally-contained rewriting: Union of all conjunctive rewritings of length n or less.
![Page 68: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/68.jpg)
The Bucket Algorithm
Key idea: Create a bucket for each subgoal g in the query. The bucket contains view atoms that contribute to g. Create rewritings from the Cartesian product of the
buckets.
![Page 69: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/69.jpg)
Bucket Algorithm in Action
View atoms that can contribute to Movie: V1(ID,year), V2(ID,A’), V4(ID,D’,year)
![Page 70: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/70.jpg)
The Buckets and Cartesian productMovie(ID,title, year,genre)
Revenues(ID, amount)
Director(ID,dir)
V1(ID,year) V1(ID,Y’) V4(ID,Dir,Y’)
V2(ID,A’) V2(ID,amount)
V4(ID,D’,year)
Consider first candidate rewriting: first V1 subgoal is redundant, and V1 and V4 are mutually exclusive.
![Page 71: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/71.jpg)
Next Candidate Rewriting
Movie(ID,title,
year,genre)
Revenues(ID,amount)
Director(ID,dir)
V1(ID,year) V1(ID,Y’) V4(ID,Dir,Y’)
V2(ID,A’) V2(ID,amount)
V4(ID,D’,year)
![Page 72: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/72.jpg)
The Bucket Algorithm: Summary
Cuts down the number of rewriting that need to be considered, especially if views apply many interpreted predicates.
The search space can still be large because the algorithm does not consider the interactions between different subgoals. See next example.
![Page 73: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/73.jpg)
The MiniCon Algorithm
Intuition: The variable I is not in the head of V5, hence V5 cannot be used in a rewriting.MiniCon discards this option early on, while the Bucket algorithm does not notice the interaction.
![Page 74: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/74.jpg)
MinCon Algorithm Steps
Create MiniCon descriptions (MCDs): Homomorphism on view heads Each MCD covers a set of subgoals in the query with a set
of subgoals in a view Combination step:
Any set of MCDs that covers the query subgoals (without overlap) is a rewriting
No need for an additional containment check!
![Page 75: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/75.jpg)
MiniCon Descriptions (MCDs)An atomic fragment of the ultimate containment mapping
MCD: mapping:
covered subgoals of Q: {2,3}
![Page 76: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/76.jpg)
MCDs: Detail 1
MCD: mapping:
covered subgoals of Q: {2,3}
Need to specialize the view first:
![Page 77: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/77.jpg)
MCDs: Detail 2
MCD: mapping:
covered subgoals of Q still: {2,3}
Note: the third subgoal of the view is not included in the MCD.
![Page 78: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/78.jpg)
Inverse-Rules Algorithm
A “logical” approach to AQUV Produces maximally-contained rewriting in
polynomial time To check whether the rewriting is equivalent to the query,
you still need a containment check. Conceptually simple and elegant
Depending on your comfort with Skolem functions…
![Page 79: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/79.jpg)
Inverse Rules by Example
And the following tuple in V7: V7(79,Manhattan,1979,Comedy)
Then we can infer the tuple: Movie(79,Manhattan,1979,Comedy)Hence, the following ‘rule’ is sound:IN1: Movie(I,T,Y,G) :- V7(I,T,Y,G)
Given the following view:
![Page 80: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/80.jpg)
Skolem Functions
Now suppose we have the tuple V7(79,Manhattan,1979,Comedy)
Then we can infer that there exists some director. Hence, the following rules hold (note that they both use the same Skolem function):
IN2: Director(I,f1(I,T,Y,G)):- V7(I,T,Y,G)IN3: Actor(I,f1(I,T,Y,G)):- V7(I,T,Y,G)
![Page 81: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/81.jpg)
Inverse Rules in GeneralRewriting = Inverse Rules + Query
Given Q2, the rewriting would include:IN1, IN2, IN3, Q2.
Given input: V7(79,Manhattan,1979,Comedy)Inverse rules produce: Movie(79,Manhattan,1979,Comedy) Director(79,f1(79,Manhattan,1979,Comedy)) Actor(79,f1(79,Manhattan,1979,Comedy))
Movie(Manhattan,1979,Comedy)(the last tuple is produced by applying Q2).
![Page 82: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/82.jpg)
Comparing Algorithms
Bucket algorithm: Good if there are many interpreted predicates Requires containment check. Cartesian product can be big
MiniCon: Good at detecting interactions between subgoals
![Page 83: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/83.jpg)
Algorithm Comparison (Continued)
Inverse-rules algorithm: Conceptually clean Can be used in other contexts (see later) But may produce inefficient rewritings because it “undoe
s” the joins in the views (see next slide) Experiments show MiniCon is most efficient. Recently: MiniCon improved by using constraint
satisfaction techniques.
![Page 84: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/84.jpg)
Inverse Rules Inefficiency Example
Now we need to re-compute the join…
![Page 85: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/85.jpg)
View-Based Query Answering
Maximally-contained rewritings are parameterized by query language.
More general question: Given a set of view definitions, view instances and a
query, what are all the answers we can find? We introduce certain answers as a mechanism for
providing a formal answer.
![Page 86: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/86.jpg)
View Instances = Possible DB’s
V8: {Allen, Copolla}V9: {Keaton, Pacino}
Consider the two views:
And suppose the extensions of the views are:
![Page 87: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/87.jpg)
Possible Databases
There are multiple databases that satisfy the above view definitions: (we ignore the first argument of Movie below)
DB1. {(Allen, Keaton), (Coppola, Pacino)}DB2. {(Allen, Pacino), (Coppola, Keaton)}
If we ask whether Allen directed a movie in which Keaton acted, we can’t be sure.
Certain answers are those true in all databases that are consistent with the views and their extensions.
![Page 88: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/88.jpg)
Certain Answers: Formal Definition [Open-world Assumption]
Given: Views: V1,…,Vn
View extensions v1,…vn
A query Q A tuple t is a certain answer to Q under the open-
world assumption if t Q(D) for all databases D such that: Vi(D) vi for all i.
![Page 89: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/89.jpg)
Certain Answers[Closed-world Assumption]
Given: Views: V1,…,Vn
View extensions v1,…vn
A query Q A tuple t is a certain answer to Q under the open-
world assumption if t Q(D) for all databases D such that: Vi(D) = vi for all i.
![Page 90: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/90.jpg)
Certain Answers: Example
V8: {Allen}V9: {Keaton}
Under closed-world assumption:single DB possible (Allen, Keaton)
Under open-world assumption:no certain answers.
![Page 91: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/91.jpg)
The Good News
The MiniCon and Inverse-rules algorithms produce all certain answers Assuming no interpreted predicates in the query (ok to
have them in the views) Under open-world assumption Corollary: they produce a maximally-contained rewriting
![Page 92: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/92.jpg)
In Other News…
Under closed-world assumption finding all certain answers is co-NP hard!
Proof: encode a graph -- G = (V,E)
q has a certain tuple iff G is not 3-colorable
![Page 93: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/93.jpg)
Interpreted Predicates
In the views: no problem (all results hold) In the query Q:
If the query contains interpreted predicates, finding all certain answers is co-NP-hard even under open-world assumption
Proof: reduction to CNF.
![Page 94: CHAPTER 2: MANIPULATING QUERY EXPRESSIONS](https://reader036.vdocuments.net/reader036/viewer/2022062301/56812d8b550346895d92a447/html5/thumbnails/94.jpg)
Summary of Chapter 2
Query containment and answering queries using views are fundamental tools in our arsenal.
In general, they are NP-complete (or worse), but in practice they are not the bottleneck.
Certain answers are the formalism we use to model answers in data integration systems: They capture our knowledge about what tuples must be in
the instance of the mediated schema.