motivation for datalog


Upload: gareth

Post on 16-Jan-2016


TRANSCRIPT

Page 1: Motivation for Datalog

1

Motivation for Datalog

Page 2: Motivation for Datalog

2

Motivation (1)

We have a relation Bus(from, to). Consider the following 2 queries:

SELECT DISTINCT B1.from, B2.to
FROM Bus B1, Bus B2
WHERE B1.to = B2.from;

SELECT DISTINCT B1.from, B2.to
FROM Bus B1, Bus B2, Bus B3
WHERE B1.to = B2.from and B1.to = B3.from;

What do these queries compute?

Page 3: Motivation for Datalog

3

Query Equivalence

• Looking carefully, we can conclude that the two queries always return the same result.

• Wouldn’t it be nice if any time someone wrote the second query in a database, the first one would be computed instead? (With one fewer join!)

• Problem: Given a query Q, how can we find the most efficient query Q’ that is equivalent to Q?
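As a quick sanity check of the equivalence claim, the two SQL queries can be mimicked with set comprehensions. This is only a sketch: the contents of `bus` and the function names are made up for illustration, and agreement on one instance does not prove equivalence in general.

```python
# Hypothetical contents of Bus(from, to); the city pairs are illustrative only.
bus = {("Lod", "Haifa"), ("Haifa", "Tel Aviv"),
       ("Tel Aviv", "Eilat"), ("Haifa", "Eilat")}

def two_copies(bus):
    """First query: join two copies of Bus on B1.to = B2.from."""
    return {(b1[0], b2[1]) for b1 in bus for b2 in bus if b1[1] == b2[0]}

def three_copies(bus):
    """Second query: as above, plus a copy B3 with B1.to = B3.from."""
    return {(b1[0], b2[1])
            for b1 in bus for b2 in bus for b3 in bus
            if b1[1] == b2[0] and b1[1] == b3[0]}

# Both queries agree on this instance; B2 itself can always play the role of B3.
print(two_copies(bus) == three_copies(bus))  # True
```

The intuition is that whenever a matching B2 exists, the same tuple also satisfies the condition on B3, so the third copy never filters anything out.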

Page 4: Motivation for Datalog

4

Motivation (2)

Suppose that we have computed the first 2 queries below. Can we use their results in order to compute the third query?

SELECT S.sid, R.bid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid;

SELECT *
FROM Boats B
WHERE color = 'red';

SELECT DISTINCT S.sid
FROM Sailors S, Reserves R, Boats B
WHERE S.sid = R.sid and R.bid = B.bid and B.color = 'red';

Page 5: Motivation for Datalog

5

View Usability

• We can use the results of the first 2 queries to compute the third.

• Computing the third query from the results of the previous 2 is typically more efficient than computing it from scratch.

• Problem: Given computed queries V1, ..., Vk and a new query Q, can we compute Q using only the results of V1, ..., Vk?
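The Sailors example can be sketched in a few lines to show what "using only the results of V1 and V2" means. The rows and the simplified schemas below are assumptions made for illustration; they are not data from the slides.

```python
# Hypothetical sample rows; the schemas are simplified for illustration.
sailors = {(22, "Dustin"), (31, "Lubber"), (58, "Rusty")}   # (sid, sname)
reserves = {(22, 101), (31, 102), (58, 103)}                # (sid, bid)
boats = {(101, "red"), (102, "green"), (103, "red")}        # (bid, color)

# V1: SELECT S.sid, R.bid FROM Sailors S, Reserves R WHERE S.sid = R.sid
v1 = {(sid, bid) for (sid, _name) in sailors
      for (rsid, bid) in reserves if sid == rsid}

# V2: SELECT * FROM Boats B WHERE color = 'red'
v2 = {(bid, color) for (bid, color) in boats if color == "red"}

# Q from the views only: join V1 and V2 on bid, then project onto sid.
q_from_views = {sid for (sid, bid) in v1 for (vbid, _color) in v2 if bid == vbid}

# Q computed from scratch, for comparison.
q_direct = {sid for (sid, _name) in sailors
            for (rsid, bid) in reserves
            for (bbid, color) in boats
            if sid == rsid and bid == bbid and color == "red"}

print(q_from_views == q_direct)  # True
```

Note that `q_from_views` never touches the base tables, which is exactly the property the view usability problem asks about.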

Page 6: Motivation for Datalog

6

Query Language Formalism

• We need a formalism for a query language that allows us to make such analyses.

⇒ Datalog (similar to first-order logic)

Page 7: Motivation for Datalog

7

Datalog Language

Page 8: Motivation for Datalog

8

Datalog Program

• A Datalog program is a set of rules of the form:

p(X1,...,Xn) :- a1(Y1,...,Ym), ..., ak(Z1,...,Zj)

• Example:

ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)

ShortTrip(X, Y) :- Bus(X, Y)

In each rule, the atom to the left of :- is the head of the rule, and the atoms to the right of :- are the body of the rule.

Page 9: Motivation for Datalog

9

Some Definitions

• An atom has the form p(Y1,...,Ym)

• In the atom above, p is a predicate symbol

• A ground atom is an atom that has only constants as arguments. For example:
– Bus(‘Jerusalem’, ‘Tel Aviv’) is a ground atom
– Bus(‘Jerusalem’, X) is not a ground atom
– Bus(Y, X) is not a ground atom

• A Datalog rule has a set of atoms in its body and a single atom in its head
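One possible way to represent these notions in code is sketched below. The `Var` and `Atom` types and the `is_ground` helper are hypothetical names introduced here, not part of any Datalog system.

```python
from typing import NamedTuple

class Var(str):
    """A Datalog variable; any other argument value counts as a constant."""

class Atom(NamedTuple):
    pred: str      # the predicate symbol, e.g. "Bus"
    args: tuple    # the arguments: Var instances or constants

def is_ground(atom: Atom) -> bool:
    """An atom is ground if none of its arguments is a variable."""
    return not any(isinstance(a, Var) for a in atom.args)

X, Y = Var("X"), Var("Y")
print(is_ground(Atom("Bus", ("Jerusalem", "Tel Aviv"))))  # True
print(is_ground(Atom("Bus", ("Jerusalem", X))))           # False
print(is_ground(Atom("Bus", (Y, X))))                     # False
```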

Page 10: Motivation for Datalog

10

More Definitions

• A relation is a set of ground atoms for the same predicate symbol. For example:
– {Bus(‘Jerusalem’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Haifa’), Bus(‘Ashdod’, ‘Haifa’)} is a relation for the predicate symbol Bus

• A database is a set of ground atoms. For example:
– {Bus(‘Jerusalem’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Haifa’), Bus(‘Ashdod’, ‘Haifa’), Flight(‘Ben Gurion’, ‘Paris’)}

Page 11: Motivation for Datalog

11

EDB and IDB Predicates

• Given a Datalog program there are 2 types of predicates:
– EDB: These are predicates that only appear in the body of rules
– IDB: These are predicates that appear in the head of at least one rule

• Intuition
– EDB: Represent relations in the database
– IDB: Represent relations computed from the database

Page 12: Motivation for Datalog

12

EDB and IDB Example

ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)

ShortTrip(X, Y) :- Bus(X, Y)

LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y, Z)

LongTrip(X,Z) :- ShortTrip(X,Y), ShortTrip(Y,Z)

Question: Which predicates are EDB? Which are IDB?
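The EDB/IDB classification follows mechanically from the definition, so it can be checked with a short sketch. The representation of a rule as a (head, body) pair of (predicate, arguments) tuples is an assumption made for this example.

```python
def idb_predicates(program):
    """Predicates that appear in the head of at least one rule."""
    return {head[0] for head, _body in program}

def edb_predicates(program):
    """Predicates that appear only in rule bodies."""
    in_bodies = {atom[0] for _head, body in program for atom in body}
    return in_bodies - idb_predicates(program)

# The program from the slide; each rule is (head, body).
program = [
    (("ShortTrip", ("X", "Z")), [("Bus", ("X", "Y")), ("Bus", ("Y", "Z"))]),
    (("ShortTrip", ("X", "Y")), [("Bus", ("X", "Y"))]),
    (("LongTrip", ("X", "Z")), [("ShortTrip", ("X", "Y")), ("Bus", ("Y", "Z"))]),
    (("LongTrip", ("X", "Z")), [("ShortTrip", ("X", "Y")), ("ShortTrip", ("Y", "Z"))]),
]
print(sorted(edb_predicates(program)), sorted(idb_predicates(program)))
```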

Page 13: Motivation for Datalog

13

More Definitions

• An assignment is a mapping of variables to variables and constants. Assignments can be applied to atoms.

• Example: Bus(X,Y)
– if f(X) = ‘Jerusalem’, f(Y) = ‘Haifa’, then f(Bus(X,Y)) is Bus(‘Jerusalem’, ‘Haifa’)
– if g(X) = Z, g(Y) = Z, then g(Bus(X,Y)) is Bus(Z, Z)
– if h(X) = Z, h(Y) = ‘Haifa’, then h(Bus(X,Y)) is Bus(Z, ‘Haifa’)

Page 14: Motivation for Datalog

14

Applying Assignments

• An assignment can also be applied to a rule. An assignment is applied to a rule by applying it to each atom in the rule

• Example: r: ShortTrip(X, Y) :- Bus(X, Y)
– if f(X) = ‘Lod’, f(Y) = ‘Haifa’, then f(r) is ShortTrip(‘Lod’, ‘Haifa’) :- Bus(‘Lod’, ‘Haifa’)

• Notation: We sometimes write a rule as H:-B. The application of f to this rule is f(H):-f(B)
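Applying an assignment to an atom and to a rule H:-B can be sketched as follows. The dictionary-based assignment and the helper names `apply_to_atom` and `apply_to_rule` are illustrative assumptions.

```python
class Var(str):
    """A Datalog variable; any other argument value counts as a constant."""

def apply_to_atom(f, atom):
    """Replace each variable in the atom by its image under f, if it has one."""
    pred, args = atom
    return (pred, tuple(f.get(a, a) if isinstance(a, Var) else a for a in args))

def apply_to_rule(f, rule):
    """f(H :- B) is f(H) :- f(B): apply f to the head and to every body atom."""
    head, body = rule
    return (apply_to_atom(f, head), [apply_to_atom(f, b) for b in body])

X, Y, Z = Var("X"), Var("Y"), Var("Z")
r = (("ShortTrip", (X, Y)), [("Bus", (X, Y))])

f = {X: "Lod", Y: "Haifa"}   # maps variables to constants
g = {X: Z, Y: Z}             # maps variables to variables
print(apply_to_rule(f, r))
print(apply_to_atom(g, ("Bus", (X, Y))))
```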

Page 15: Motivation for Datalog

15

Computing a Datalog Program

• A set of Datalog rules is called a program.

• We can compute a program, given a database that contains ground atoms only for the EDB predicates in the program.

Page 16: Motivation for Datalog

16

Computing a Datalog Program

Compute(P,D)
• Result := D
• While there are changes to Result do
– If there is a rule H:-B in P, and an assignment f to the variables in H and B, such that all the atoms in f(B) are in Result, then
Result := Result ∪ {f(H)}
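A minimal sketch of Compute(P,D) in Python is given below. It is a naive implementation for illustration only: it assumes safe rules, tries every assignment of variables to constants that occur in the database or the program, and uses the hypothetical names `Var`, `ground` and `compute`.

```python
from itertools import product

class Var(str):
    """A Datalog variable; every other argument value is a constant."""

def ground(f, atom):
    """Apply assignment f (a dict from Var to constant) to an atom."""
    pred, args = atom
    return (pred, tuple(f[a] if isinstance(a, Var) else a for a in args))

def compute(program, database):
    result = set(database)
    rule_atoms = [a for head, body in program for a in [head, *body]]
    # Assumption: it suffices to try constants of the database and the program.
    constants = list({c for _p, args in [*database, *rule_atoms]
                      for c in args if not isinstance(c, Var)})
    changed = True
    while changed:                      # "while there are changes to Result"
        changed = False
        for head, body in program:
            variables = list({a for _p, args in [head, *body]
                              for a in args if isinstance(a, Var)})
            for values in product(constants, repeat=len(variables)):
                f = dict(zip(variables, values))
                if all(ground(f, b) in result for b in body):
                    fact = ground(f, head)
                    if fact not in result:
                        result.add(fact)
                        changed = True
    return result

X, Y, Z = Var("X"), Var("Y"), Var("Z")
program = [
    (("ShortTrip", (X, Z)), [("Bus", (X, Y)), ("Bus", (Y, Z))]),
    (("LongTrip", (X, Z)), [("ShortTrip", (X, Y)), ("Bus", (Y, Z))]),
]
database = {("Bus", ("Lod", "Haifa")), ("Bus", ("Haifa", "Tel Aviv")),
            ("Bus", ("Tel Aviv", "Eilat"))}
result = compute(program, database)
print(sorted(result))
```

Trying all assignments is exponential in the number of variables per rule; real Datalog engines use joins and incremental (semi-naive) evaluation instead.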

Page 17: Motivation for Datalog

17

Example

Program:

ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)

LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z)

Database:

{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’),

Bus(‘Tel Aviv’, ‘Eilat’)}

Page 18: Motivation for Datalog

18

Before While Loop

Program:
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z)

Database:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’)}

Result:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’)}

Page 19: Motivation for Datalog

19

Iteration 1 of While Loop

Program:
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z)

Database:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’)}

Result:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’), ShortTrip(‘Lod’, ‘Tel Aviv’)}

Rule 1: X=‘Lod’, Y=‘Haifa’, Z=‘Tel Aviv’

Page 20: Motivation for Datalog

20

Iteration 2 of While Loop

Program:
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z)

Database:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’)}

Result:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’), ShortTrip(‘Lod’, ‘Tel Aviv’), LongTrip(‘Lod’, ‘Eilat’)}

Rule 2: X=‘Lod’, Y=‘Tel Aviv’, Z=‘Eilat’

Page 21: Motivation for Datalog

21

Iteration 3 of While Loop

Program:
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z)

Database:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’)}

Result:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’), ShortTrip(‘Lod’, ‘Tel Aviv’), LongTrip(‘Lod’, ‘Eilat’), ShortTrip(‘Haifa’, ‘Eilat’)}

Rule 1: X=‘Haifa’, Y=‘Tel Aviv’, Z=‘Eilat’

Page 22: Motivation for Datalog

22

Finished!

Program:
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z)

Database:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’)}

Result:
{Bus(‘Lod’, ‘Haifa’), Bus(‘Haifa’, ‘Tel Aviv’), Bus(‘Tel Aviv’, ‘Eilat’), ShortTrip(‘Lod’, ‘Tel Aviv’), LongTrip(‘Lod’, ‘Eilat’), ShortTrip(‘Haifa’, ‘Eilat’)}

Page 23: Motivation for Datalog

23

Understanding the Intuition

• A rule of the form H:-B means: If B is true then H is true

• Given the relation Sailors(sname, sid, rating, age), the following query finds the names of all the sailors:

name(n):-Sailors(n, i, r, a)

Page 24: Motivation for Datalog

24

Understanding the Intuition

• How can we find the names of the Sailors who have the same rating as their age?

• What does the following rule compute?

name(sn):-Sailors(sn, si, r, a), Reserves(si, bi, d), Boats(bi, bn, ‘red’)

Page 25: Motivation for Datalog

25

Unsafe Rules

• How can we compute the following rule?

CanGo(X, Y):- Bus(X, ‘Jerusalem’)

• Suppose our database consists of the single fact

{Bus(‘Haifa’, ‘Jerusalem’)}

• By definition, our result can contain:

{CanGo(‘Haifa’, ‘Jerusalem’), CanGo(‘Haifa’, ‘Lod’), CanGo(‘Haifa’, ‘Taiwan’), ...}

Page 26: Motivation for Datalog

26

The Problem

• We can assign Y any value. It does not depend on the facts in the database. The values returned depend only on the domain to which we are mapping.

• The active domain of a program P, given a database D, is the set of constants appearing in P and D. We denote this set by:

Active(P,D)

Page 27: Motivation for Datalog

27

The Solution

• Definition: A Datalog program P is domain independent if, for all databases D, the result of computing P with respect to any domain containing Active(P,D) is the same as the result of computing P with respect to Active(P,D).

• Intuition: If a program is domain independent we only have to try assignments that map variables to constants in the Active domain. Nothing else will yield additional results.
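The following sketch illustrates the definition on the CanGo rule: the derived facts change when the domain grows beyond the active domain, so the rule is not domain independent. The helpers `active_domain` and `derive`, and the extra constants ‘Lod’ and ‘Taiwan’, are assumptions introduced for this example.

```python
from itertools import product

class Var(str):
    """A Datalog variable; every other argument value is a constant."""

def active_domain(program, database):
    """All constants that appear in the program or in the database."""
    atoms = list(database) + [a for head, body in program for a in [head, *body]]
    return {c for _pred, args in atoms for c in args if not isinstance(c, Var)}

def ground(f, atom):
    pred, args = atom
    return (pred, tuple(f[a] if isinstance(a, Var) else a for a in args))

def derive(rule, facts, domain):
    """Heads obtained from one rule by trying every assignment over `domain`."""
    head, body = rule
    variables = list({a for _pred, args in [head, *body]
                      for a in args if isinstance(a, Var)})
    derived = set()
    for values in product(domain, repeat=len(variables)):
        f = dict(zip(variables, values))
        if all(ground(f, b) in facts for b in body):
            derived.add(ground(f, head))
    return derived

X, Y = Var("X"), Var("Y")
unsafe = (("CanGo", (X, Y)), [("Bus", (X, "Jerusalem"))])
safe = (("CanGo", (X, Y)), [("Bus", (X, Y))])
database = {("Bus", ("Haifa", "Jerusalem"))}

active = active_domain([unsafe], database)
larger = active | {"Lod", "Taiwan"}      # a domain containing Active(P,D)
print(len(derive(unsafe, database, active)), len(derive(unsafe, database, larger)))
print(len(derive(safe, database, active)), len(derive(safe, database, larger)))
```

For the first rule the number of derived facts depends on the domain, while for the second rule it does not.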

Page 28: Motivation for Datalog

28

Safety vs. Domain Independence

• Safety is a syntactic rule that ensures domain independence.

• Definition: A Datalog rule is safe if every variable appearing in its head also appears in an atom in its body

[Figure: the set of Safe Programs drawn inside the set of Domain Independent Programs]

We will only consider safe programs

Page 29: Motivation for Datalog

29

Safe Rules: Examples

• Safe:
– CanGo(X, Y):- Bus(X, Y)
– CanGo(X, Z):- Bus(X, Y), CanGo(Y,Z)
– CanGo(‘Haifa’, ‘Haifa’).
– CanBuy(X):- ForSale(X), X < 200

• Unsafe:
– CanGo(X, Y):- Bus(X, ‘Jerusalem’)
– CanGo(X, X).
– CanBuy(X):- X < 200

Note that a rule written without :- (such as CanGo(‘Haifa’, ‘Haifa’).) is a fact, i.e., a rule without a body
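A safety check can be sketched directly from the definition. Judging by the CanBuy examples, comparisons such as X < 200 do not count as atoms that bind a variable; the code below makes that an explicit assumption, and the names `COMPARISONS` and `is_safe` are hypothetical.

```python
class Var(str):
    """A Datalog variable; every other argument value is a constant."""

# Assumption: comparison atoms do not make a variable safe.
COMPARISONS = {"<", "<=", ">", ">=", "=", "<>"}

def is_safe(rule):
    """Every head variable must occur in some non-comparison body atom."""
    head, body = rule
    head_vars = {a for a in head[1] if isinstance(a, Var)}
    bound = {a for pred, args in body if pred not in COMPARISONS
             for a in args if isinstance(a, Var)}
    return head_vars <= bound

X, Y, Z = Var("X"), Var("Y"), Var("Z")
rules = {
    "CanGo(X,Y) :- Bus(X,Y)": (("CanGo", (X, Y)), [("Bus", (X, Y))]),
    "CanGo('Haifa','Haifa').": (("CanGo", ("Haifa", "Haifa")), []),
    "CanBuy(X) :- ForSale(X), X < 200": (("CanBuy", (X,)), [("ForSale", (X,)), ("<", (X, 200))]),
    "CanGo(X,Y) :- Bus(X,'Jerusalem')": (("CanGo", (X, Y)), [("Bus", (X, "Jerusalem"))]),
    "CanGo(X,X).": (("CanGo", (X, X)), []),
    "CanBuy(X) :- X < 200": (("CanBuy", (X,)), [("<", (X, 200))]),
}
for text, rule in rules.items():
    print(text, "->", "safe" if is_safe(rule) else "unsafe")
```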

Page 30: Motivation for Datalog

30

Safe Rules - Algorithm

• For safe rules, the algorithm on Slide 16 terminates, since it is enough to try assignments that map variables to constants in the database, and there are only finitely many of those.

• Otherwise, the algorithm might never terminate.

We only consider safe rules

Page 31: Motivation for Datalog

31

Dependency Graph and Recursion

• A dependency graph is a graph that models the way that predicates depend on each other.

• Given a program P, the dependency graph of P has:
– a node for each predicate in P
– an edge from a predicate p to a predicate q if there is a rule with q in the head and p in the body

• A recursive predicate in a program P is a predicate that is in a cycle in P’s dependency graph
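The two definitions above translate into a short sketch: build the edge set from the rules, then call a predicate recursive if it can reach itself. The function names and the rule representation are assumptions made for this example.

```python
def dependency_graph(program):
    """Edge (p, q) whenever some rule has q in the head and p in the body."""
    return {(atom[0], head[0]) for head, body in program for atom in body}

def recursive_predicates(program):
    """Predicates that lie on a cycle of the dependency graph."""
    edges = dependency_graph(program)
    successors = {}
    for p, q in edges:
        successors.setdefault(p, set()).add(q)

    def reachable(start):
        seen, stack = set(), list(successors.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(successors.get(node, ()))
        return seen

    predicates = {p for edge in edges for p in edge}
    return {p for p in predicates if p in reachable(p)}

# CanGo(X, Y) :- Bus(X, Y)  and  CanGo(X, Z) :- Bus(X, Y), CanGo(Y, Z)
program = [
    (("CanGo", ("X", "Y")), [("Bus", ("X", "Y"))]),
    (("CanGo", ("X", "Z")), [("Bus", ("X", "Y")), ("CanGo", ("Y", "Z"))]),
]
print(sorted(dependency_graph(program)))
print(sorted(recursive_predicates(program)))
```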

Page 32: Motivation for Datalog

32

Example (1)

CanGo(X, Y):- Bus(X, Y)

CanGo(X, Z):- Bus(X, Y), CanGo(Y,Z)

[Figure: dependency graph with an edge from Bus to CanGo and an edge from CanGo to itself]

• CanGo is recursive

• Bus is not recursive

What does this program compute?

Page 33: Motivation for Datalog

33

Example (2)

p(X):- r(X), q(X)

q(X):- r(X), p(X)

[Figure: dependency graph over the predicates r, q and p]

• Which predicates are recursive?

• What does this program compute?

Page 34: Motivation for Datalog

34

Expressiveness: Datalog vs. Relational Algebra

• We can express queries in Datalog that are not expressible in Relational Algebra.

• Example: Transitive closure. (See CanGo predicate)

• This is possible because of recursion.

• Now we will consider only non-recursive programs.

• In this case can we translate queries between Datalog and relational algebra?

Page 35: Motivation for Datalog

35

Translating RA to Datalog

• We start by translating RA queries with SELECT, PROJECT, TIMES, UNION (without MINUS).

• Lemma: Every relational algebra expression produces the same relation as some relational algebra expression whose selections are only of the form σ_{X θ Y}, where θ is an arithmetic comparison operator.

Page 36: Motivation for Datalog

36

Example

• Consider: σ_{¬($1=$2 and ($1<$3 or $2<$3))}(R)

• Remember De Morgan’s laws:
– ¬(X and Y) = ¬X or ¬Y
– ¬(X or Y) = ¬X and ¬Y

• So, the expression above is equivalent to

σ_{¬($1=$2) or ¬($1<$3 or $2<$3)}(R) =

σ_{¬($1=$2) or (¬($1<$3) and ¬($2<$3))}(R) =

σ_{($1<>$2) or ($1>=$3 and $2>=$3)}(R)

Page 37: Motivation for Datalog

37

Example (continued)

• Now, “or” becomes union and “and” becomes composition of selections. So:

σ_{($1<>$2) or ($1>=$3 and $2>=$3)}(R) =

σ_{$1<>$2}(R) ∪ σ_{$1>=$3 and $2>=$3}(R) =

σ_{$1<>$2}(R) ∪ σ_{$1>=$3}(σ_{$2>=$3}(R))

We did it! From now on we assume all RA expressions are of this form
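The rewriting can be spot-checked by brute force on a small relation. In this sketch R is assumed to be the set of all triples over {0, 1, 2}, and `select` is a hypothetical helper standing in for the selection operator.

```python
from itertools import product

def select(cond, rel):
    """Selection: keep the tuples of rel that satisfy cond."""
    return {t for t in rel if cond(t)}

R = set(product(range(3), repeat=3))   # hypothetical: all triples over {0, 1, 2}

# Original selection, with $1, $2, $3 written as t[0], t[1], t[2].
original = select(lambda t: not (t[0] == t[1] and (t[0] < t[2] or t[1] < t[2])), R)

# Rewritten form: a union of selections with simple comparisons only.
rewritten = (select(lambda t: t[0] != t[1], R)
             | select(lambda t: t[0] >= t[2], select(lambda t: t[1] >= t[2], R)))

print(original == rewritten)  # True
```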

Page 38: Motivation for Datalog

38

Translating RA to Datalog (1)

• Theorem: Every query expressible in RA without minus is expressible in a non-recursive Datalog program.

• Proof: By induction on j, the number of operators in the query.
– Base j=0: The query is a relation R. Then R is an EDB expression and is “available” without any rules.

Page 39: Motivation for Datalog

39

Translating RA to Datalog (2)

• Assume for queries with j operators. We show for j+1:

• Case 1: The expression is E = E1 U E2 . Then, by the inductive hypothesis there are predicates e1 and e2 defined by non-recursive Datalog rules whose relations are the same as E1 and E2. Suppose that they have arity n. Then for E we have the rules:

e(X1,...,Xn) :- e1 (X1,...,Xn)

e(X1,...,Xn) :- e2 (X1,...,Xn)

Page 40: Motivation for Datalog

40

Translating RA to Datalog (3)

• Case 2: E=E1 x E2 . Then, there are e1 and e2 as before. Suppose that e1 has arity n and e2 has arity m. Then for E we have the rule:

e(X1,...,Xn+m) :- e1 (X1,...,Xn), e2 (Xn+1,...,Xn+m)

• Case 3: E = σ_{$i θ $j}(E1). Then, there is e1 as before. Suppose that the arity of e1 is n. Then, for E we have the rule:

e(X1,...,Xn) :- e1 (X1,...,Xn), Xi θ Xj

Page 41: Motivation for Datalog

41

Translating RA to Datalog (4)

• Case 4: E = π_{i1,...,ik}(E1). Then, there is an e1 as before. Suppose that e1 has arity n. Then for E we have the rule:

e(Xi1,...,Xik) :- e1 (X1,...,Xn)

• We can prove that with the class of Datalog queries seen so far we can’t express MINUS.

• We introduce negation in the queries which will allow us to deal with MINUS.
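The four inductive cases (union, product, selection, projection) can be sketched as a recursive translator that emits one or two rules per operator. The tuple encoding of RA expressions and the generated predicate names e1, e2, ... are assumptions made for this sketch.

```python
from itertools import count

def variables(lo, hi):
    return [f"X{i}" for i in range(lo, hi + 1)]

def atom(pred, vs):
    return f"{pred}({', '.join(vs)})"

def translate(expr, rules, names):
    """Return (predicate, arity) for expr and append its rules to `rules`."""
    kind = expr[0]
    if kind == "rel":                     # base case: EDB relation, no rule
        _, pred, arity = expr
        return pred, arity
    if kind == "union":                   # Case 1
        p1, n = translate(expr[1], rules, names)
        p2, _ = translate(expr[2], rules, names)
        e, xs = f"e{next(names)}", variables(1, n)
        rules.append(f"{atom(e, xs)} :- {atom(p1, xs)}")
        rules.append(f"{atom(e, xs)} :- {atom(p2, xs)}")
        return e, n
    if kind == "times":                   # Case 2
        p1, n = translate(expr[1], rules, names)
        p2, m = translate(expr[2], rules, names)
        e = f"e{next(names)}"
        rules.append(f"{atom(e, variables(1, n + m))} :- "
                     f"{atom(p1, variables(1, n))}, {atom(p2, variables(n + 1, n + m))}")
        return e, n + m
    if kind == "select":                  # Case 3: ("select", i, op, j, E1)
        _, i, op, j, sub = expr
        p1, n = translate(sub, rules, names)
        e, xs = f"e{next(names)}", variables(1, n)
        rules.append(f"{atom(e, xs)} :- {atom(p1, xs)}, X{i} {op} X{j}")
        return e, n
    if kind == "project":                 # Case 4: ("project", [i1..ik], E1)
        _, cols, sub = expr
        p1, n = translate(sub, rules, names)
        e = f"e{next(names)}"
        head_vars = [f"X{i}" for i in cols]
        rules.append(f"{atom(e, head_vars)} :- {atom(p1, variables(1, n))}")
        return e, len(cols)
    raise ValueError(f"unknown operator: {kind}")

# Hypothetical query: project column 1 of select $2 = $3 over r x s.
rules = []
query = ("project", [1], ("select", 2, "=", 3,
                          ("times", ("rel", "r", 2), ("rel", "s", 1))))
top = translate(query, rules, count(1))
print("\n".join(rules))
```

Each operator gets a fresh IDB predicate whose body mentions only predicates created earlier, so the generated program is non-recursive.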

Page 42: Motivation for Datalog

42

Translation Example

• Query: Boat ids of red and green boats:

• In RA:

• In Datalog:

Page 43: Motivation for Datalog

43

Negation

• We allow negated atoms in the body of a query.

• New safety rule: All variables in the query must also appear in non-negated atoms in the body.

• Example:

CanBuy(X,Y):- Likes(X,Y), ¬Broke(X)

Bachelor(X):- Male(X), ¬Married(X, Y)

Page 44: Motivation for Datalog

44

Topological Ordering

• Before we explain how Datalog rules with negation are computed, we recall how to find a topological ordering of the nodes in a graph.

• Definition: A topological ordering of the nodes of a graph G is an ordering of the nodes in G such that if there is an edge from n to m, then n is before m in the ordering.

• Fact: Every acyclic graph has a topological ordering

Page 45: Motivation for Datalog

45

Finding a Topological Ordering

• Algorithm: Find a node n with no incoming edges. Make n the first node in the ordering. Remove n and its outgoing edges. Continue recursively.

• Example:

[Figure: example graph with nodes r, p, q, t, s]

Ordering: r, t, q, p, s
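The algorithm for finding a topological ordering can be sketched iteratively as below. Since the edges of the example graph are not available in this transcript, the sketch uses the dependency graph of the ShortTrip/LongTrip program instead; the function name `topological_order` is hypothetical.

```python
def topological_order(nodes, edges):
    """Repeatedly pick a node with no incoming edges and remove it."""
    incoming = {n: 0 for n in nodes}
    for _source, target in edges:
        incoming[target] += 1
    ready = [n for n in nodes if incoming[n] == 0]
    order = []
    while ready:
        n = ready.pop(0)
        order.append(n)
        for source, target in edges:      # "remove" the outgoing edges of n
            if source == n:
                incoming[target] -= 1
                if incoming[target] == 0:
                    ready.append(target)
    if len(order) != len(nodes):
        raise ValueError("the graph has a cycle, so no topological ordering exists")
    return order

nodes = ["LongTrip", "ShortTrip", "Bus"]
edges = [("Bus", "ShortTrip"), ("ShortTrip", "LongTrip"), ("Bus", "LongTrip")]
print(topological_order(nodes, edges))  # ['Bus', 'ShortTrip', 'LongTrip']
```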

Page 46: Motivation for Datalog

46

Notation

• We introduce some notation before presenting the algorithm. Suppose that H:-B is a rule, possibly with negated atoms.
– Pos(B): the non-negated atoms in B
– Neg(B): the negated atoms in B

• Suppose that P is a program.
– IDB(P) are the IDB predicates in P
– Dep(P) is the dependency graph of P

Page 47: Motivation for Datalog

47

Computing Datalog Programs with Negation

Compute(P,D)
• Let Q be an ordering of IDB(P) determined by a topological sort of Dep(P).
• Result := D
• While Q is not empty
– r := Q.dequeue();
– While there is a rule H:-B in P with r in its head and an assignment f to the variables in H and B, such that f(Pos(B)) is contained in Result, no atom in f(Neg(B)) is in Result, and f(H) is not yet in Result, do
Result := Result ∪ {f(H)}
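A minimal sketch of this procedure follows. It assumes that the ordering of the IDB predicates is supplied by the caller (already topologically sorted), that rules are safe, and that only constants from the database need to be tried; rules are written as (head, positive atoms, negated atoms), which is a representation chosen for this example.

```python
from itertools import product

class Var(str):
    """A Datalog variable; every other argument value is a constant."""

def ground(f, atom):
    pred, args = atom
    return (pred, tuple(f[a] if isinstance(a, Var) else a for a in args))

def compute_with_negation(program, database, idb_order):
    result = set(database)
    domain = list({c for _pred, args in database for c in args})
    for pred in idb_order:                       # r := Q.dequeue()
        changed = True
        while changed:
            changed = False
            for head, pos, neg in program:
                if head[0] != pred:
                    continue
                variables = list({a for _p, args in [head, *pos, *neg]
                                  for a in args if isinstance(a, Var)})
                for values in product(domain, repeat=len(variables)):
                    f = dict(zip(variables, values))
                    if (all(ground(f, b) in result for b in pos)
                            and not any(ground(f, b) in result for b in neg)):
                        fact = ground(f, head)
                        if fact not in result:
                            result.add(fact)
                            changed = True
    return result

X, Y, Z = Var("X"), Var("Y"), Var("Z")
program = [
    (("ShortTrip", (X, Y)), [("Bus", (X, Y))], []),
    (("ShortTrip", (X, Z)), [("Bus", (X, Y)), ("Bus", (Y, Z))], []),
    (("LongTrip", (X, Z)), [("ShortTrip", (X, Y)), ("Bus", (Y, Z))],
     [("ShortTrip", (X, Z))]),
]
database = {("Bus", (1, 2)), ("Bus", (2, 3)), ("Bus", (3, 4))}
result = compute_with_negation(program, database, ["ShortTrip", "LongTrip"])
print(sorted(fact for fact in result if fact[0] == "LongTrip"))
```

Processing ShortTrip completely before LongTrip matters: the negated atom ¬ShortTrip(X, Z) is only tested once the ShortTrip relation can no longer grow.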

Page 48: Motivation for Datalog

48

Example

Program:

ShortTrip (X, Y) :- Bus(X,Y)

ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)

LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z),¬ShortTrip(X, Z)

Database: {Bus(1, 2), Bus(2, 3), Bus(3, 4)}

Topological Sort of IDB: ShortTrip, LongTrip

Page 49: Motivation for Datalog

49

Before Outer While Loop

Program:

ShortTrip (X, Y) :- Bus(X,Y)

ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)

LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z),¬ShortTrip(X, Z)

Database: {Bus(1, 2), Bus(2, 3), Bus(3, 4)}

Result: {Bus(1, 2), Bus(2, 3), Bus(3, 4)}

Page 50: Motivation for Datalog

50

Iteration for Predicate ShortTrip

Program:
ShortTrip (X, Y) :- Bus(X,Y)
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z), ¬ShortTrip(X, Z)

Database: {Bus(1, 2), Bus(2, 3), Bus(3, 4)}

Result: {Bus(1, 2), Bus(2, 3), Bus(3, 4), ShortTrip(1, 2), ShortTrip(2, 3), ShortTrip(3, 4), ShortTrip(1, 3), ShortTrip(2,4)}

Page 51: Motivation for Datalog

51

Iteration for Predicate LongTrip

Program:
ShortTrip (X, Y) :- Bus(X,Y)
ShortTrip(X, Z) :- Bus(X, Y), Bus(Y, Z)
LongTrip(X,Z) :- ShortTrip(X,Y), Bus(Y,Z), ¬ShortTrip(X, Z)

Database: {Bus(1, 2), Bus(2, 3), Bus(3, 4)}

Result: {Bus(1, 2), Bus(2, 3), Bus(3, 4), ShortTrip(1, 2), ShortTrip(2, 3), ShortTrip(3, 4), ShortTrip(1, 3), ShortTrip(2,4), LongTrip(1, 4)}

Page 52: Motivation for Datalog

52

Translating RA to Datalog (5)

• We can now translate RA queries with MINUS.

• Case 5: The expression is E = E1 − E2. Then, by the inductive hypothesis there are predicates e1 and e2 defined by non-recursive Datalog rules whose relations are the same as E1 and E2. Suppose that they have arity n. Then for E we have the rule:

e(X1,...,Xn) :- e1 (X1,...,Xn), ¬e2 (X1,...,Xn)

Page 53: Motivation for Datalog

53

Expressiveness (So Far)

• We have shown that every RA query can be expressed as a non-recursive Datalog program with negation.

• Can we express every non-recursive Datalog program with negation as an RA query?

• Yes. We will prove this now.

Page 54: Motivation for Datalog

54

Translating Datalog to RA (1)

• We start by showing how to translate rules without negative atoms.

• We take a topological ordering p1...pn of the nodes in the dependency graph and compute relations for pi in that order, knowing that all the relations for the predicates in the body have been computed.

Page 55: Motivation for Datalog

55

Translating Datalog to RA (2)

• Basic Idea: To compute a relation for pi:

– For each rule r with pi in its head, compute the relation corresponding to the body of r. This relation has one field for each variable in the body.

– We obtain the relation for the rule r itself by projecting the relation of the body onto the components that appear in the head.

– We take a UNION over all rules with pi in the head
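The three steps of the basic idea (join the body, project onto the head, union over rules) can be sketched as follows for rules without negation. The helper names `body_relation` and `relation_for` are hypothetical, and each row of the body relation is modelled as a dictionary with one entry per body variable.

```python
class Var(str):
    """A Datalog variable; every other argument value is a constant."""

def body_relation(body, relations):
    """Join the body atoms; each resulting row has one field per variable."""
    rows = [{}]
    for pred, args in body:
        next_rows = []
        for row in rows:
            for tup in relations.get(pred, set()):
                candidate = dict(row)
                consistent = True
                for a, value in zip(args, tup):
                    if isinstance(a, Var):
                        # Bind the variable, or check an existing binding.
                        if candidate.setdefault(a, value) != value:
                            consistent = False
                            break
                    elif a != value:          # a constant must match exactly
                        consistent = False
                        break
                if consistent:
                    next_rows.append(candidate)
        rows = next_rows
    return rows

def relation_for(pred, program, relations):
    """Project each body relation onto the head and union over all rules."""
    out = set()
    for head, body in program:
        if head[0] == pred:
            for row in body_relation(body, relations):
                out.add(tuple(row[a] if isinstance(a, Var) else a for a in head[1]))
    return out

X, Y, Z = Var("X"), Var("Y"), Var("Z")
program = [
    (("ShortTrip", (X, Y)), [("Bus", (X, Y))]),
    (("ShortTrip", (X, Z)), [("Bus", (X, Y)), ("Bus", (Y, Z))]),
]
relations = {"Bus": {(1, 2), (2, 3), (3, 4)}}
relations["ShortTrip"] = relation_for("ShortTrip", program, relations)
print(sorted(relations["ShortTrip"]))
```

Storing the computed relation back into `relations` mirrors the topological order: predicates processed later can use ShortTrip in their bodies.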