Relational Algebra for Bags
CSCI 4380 Database Systems
Monday, October 4, 2010
Bags
• A bag is a multi-set.
• For a set, {1,2,2,3} = {1,2,3}
• For a bag, {1,2,2,3} ≠ {1,2,3}
• There is no specific notation for bags that is universally accepted.
• A bag model for a relation means that a tuple may appear more than once.
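The set/bag distinction can be checked directly. The following is a minimal sketch that assumes Python's `collections.Counter` as a stand-in for a bag (the slides themselves do not prescribe any notation or implementation):

```python
from collections import Counter

# As sets, repeated elements collapse, so the two literals are equal.
sets_equal = {1, 2, 2, 3} == {1, 2, 3}

# A Counter maps each element to its multiplicity, so it can model a bag.
bag_a = Counter([1, 2, 2, 3])
bag_b = Counter([1, 2, 3])
bags_equal = bag_a == bag_b

print(sets_equal)  # True
print(bags_equal)  # False
print(bag_a[2])    # 2 (the element 2 appears twice in the first bag)
```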
Bags and relational algebra
• Database implementations allow relations to be defined as bags:
• a relation may hold multiple copies of a tuple if no primary key is defined.
• Relational algebra implemented in databases uses bag semantics.
• We will now extend relational algebra to bags.
Bag operations
• Suppose a tuple t appears n times in R and m times in S. Then:
• t appears n+m times in R∪S
• t appears min(n,m) times in R∩S
• t appears max(0, n-m) times in R-S
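As a rough illustration, Python's `collections.Counter` happens to implement exactly these multiplicity rules: `+` adds counts, `&` takes the minimum, and `-` subtracts but never goes below zero. The relation contents here are made up for the example:

```python
from collections import Counter

# Tuple t appears n = 3 times in R and m = 1 time in S; u appears only in S.
t, u = (1, "a"), (2, "b")
R = Counter({t: 3})
S = Counter({t: 1, u: 2})

union = R + S          # multiplicities add: n + m
intersection = R & S   # min(n, m)
difference = R - S     # max(0, n - m); tuples with count 0 are dropped
reverse_diff = S - R   # t would get 1 - 3 < 0, so it disappears

print(union[t], union[u])    # 4 2
print(intersection[t])       # 1
print(difference[t])         # 2
print(reverse_diff[t])       # 0
```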
Other operators
• Selection, projection, Cartesian product and join are extended in the usual way.
• In selection, each tuple that passes the condition is put in the output (no duplicate elimination).
• In projection, the columns are removed but there is no duplicate elimination.
• In Cartesian product (RxS), all pairs of tuples from R and S are put in the output.
• Join is simply Cartesian product followed by bag selection.
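One way to see these definitions at work is to model a bag relation as a Python list of tuples, where list order is irrelevant and repeats are kept. This is only a sketch; the helper names `select`, `project`, `cartesian` and `join`, and the sample data, are hypothetical:

```python
from itertools import product

# Bag relations as lists of tuples; R has schema (A, B), S has schema (C,).
R = [(1, "x"), (1, "x"), (2, "y")]
S = [(1,), (1,)]

def select(rel, pred):
    """Bag selection: keep every tuple that passes, duplicates included."""
    return [t for t in rel if pred(t)]

def project(rel, cols):
    """Bag projection: keep the listed column positions, no duplicate elimination."""
    return [tuple(t[i] for i in cols) for t in rel]

def cartesian(r, s):
    """Bag product: every pair of tuples, so multiplicities multiply."""
    return [a + b for a, b in product(r, s)]

def join(r, s, pred):
    """Join as Cartesian product followed by bag selection."""
    return select(cartesian(r, s), pred)

print(select(R, lambda t: t[0] == 1))           # [(1, 'x'), (1, 'x')]
print(project(R, [0]))                          # [(1,), (1,), (2,)]
print(len(cartesian(R, S)))                     # 6
print(len(join(R, S, lambda t: t[0] == t[2])))  # 4 copies of (1, 'x', 1)
```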
New Operators
• Duplicate elimination (δ(R))
• Removes duplicate tuples, leaving one copy of each.
• Extended projection (π(R)) can output:
• attributes of relation R in the usual way, where an attribute may be repeated
• constant values, each creating a new column in which every tuple has that constant
• the results of arithmetic and string operations involving attributes of R and constants
• Output attributes can be renamed with the operation (➝).
Example
• πA+C➝E, B|D➝F, 2➝G,D,D (R)
R:
A B C D
1 a 6 c
3 d 4 e
5 a 1 c

Result:
E F G D D
7 ac 2 c c
7 de 2 e e
6 ac 2 c c
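The example can be reproduced with a short sketch, assuming rows are stored as dictionaries and that `|` denotes string concatenation. The function names `extended_project` and `delta` are illustrative only:

```python
# The relation R from the example, one dictionary per tuple.
R = [
    {"A": 1, "B": "a", "C": 6, "D": "c"},
    {"A": 3, "B": "d", "C": 4, "D": "e"},
    {"A": 5, "B": "a", "C": 1, "D": "c"},
]

def extended_project(rel):
    """Compute the columns E = A+C, F = B|D, G = 2, D, D for every tuple."""
    return [(t["A"] + t["C"], t["B"] + t["D"], 2, t["D"], t["D"]) for t in rel]

def delta(rel):
    """Duplicate elimination: keep the first copy of each tuple."""
    return list(dict.fromkeys(rel))

result = extended_project(R)
print(result)
# [(7, 'ac', 2, 'c', 'c'), (7, 'de', 2, 'e', 'e'), (6, 'ac', 2, 'c', 'c')]

# Projecting only D leaves duplicates in a bag; delta removes them.
only_d = [(t["D"],) for t in R]
print(only_d)         # [('c',), ('e',), ('c',)]
print(delta(only_d))  # [('c',), ('e',)]
```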
Join
• The join operation we have seen up to now is called the inner join.
• The inner join R ⋈c S returns the combinations of tuples from R and S that satisfy the join condition c.
• All other tuples are omitted.
Inner join
• The tuples (3, d, 4) in R and (8, z, 2) in S do not participate in the join, as they have no matching tuples in the other relation.

R:
A B C
1 a 6
3 d 4
5 a 1

S:
C D E
6 x 0
1 y 1
8 z 2
6 w 3

R ⋈ S:
A B C D E
1 a 6 x 0
1 a 6 w 3
5 a 1 y 1
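A small sketch of this inner join, assuming it is the natural join on the shared attribute C (the result above shows C only once) and modelling relations as lists of tuples:

```python
R = [(1, "a", 6), (3, "d", 4), (5, "a", 1)]               # schema (A, B, C)
S = [(6, "x", 0), (1, "y", 1), (8, "z", 2), (6, "w", 3)]  # schema (C, D, E)

def inner_join(r, s):
    """Natural join on C: pair tuples whose C values agree, keeping C once."""
    return [a + b[1:] for a in r for b in s if a[2] == b[0]]

result = inner_join(R, S)
print(result)
# [(1, 'a', 6, 'x', 0), (1, 'a', 6, 'w', 3), (5, 'a', 1, 'y', 1)]
```

The tuples (3, d, 4) and (8, z, 2) never satisfy the condition, so they leave no trace in the output.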
Outer join
• In the (full) outer join, the unmatched tuples are also included. Attributes with missing values are filled with null values (⊥).

R:
A B C
1 a 6
3 d 4
5 a 1

S:
C D E
6 x 0
1 y 1
8 z 2
6 w 3

R ⟗ S:
A B C D E
1 a 6 x 0
1 a 6 w 3
5 a 1 y 1
3 d 4 ⊥ ⊥
⊥ ⊥ 8 z 2
Left outer join
• Only the unmatched tuples from the left relation are added.
R:
A B C
1 a 6
3 d 4
5 a 1

S:
C D E
6 x 0
1 y 1
8 z 2
6 w 3

R ⟕ S:
A B C D E
1 a 6 x 0
1 a 6 w 3
5 a 1 y 1
3 d 4 ⊥ ⊥
Right outer join
• Only the unmatched tuples from the right relation are added.
R:
A B C
1 a 6
3 d 4
5 a 1

S:
C D E
6 x 0
1 y 1
8 z 2
6 w 3

R ⟖ S:
A B C D E
1 a 6 x 0
1 a 6 w 3
5 a 1 y 1
⊥ ⊥ 8 z 2
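The full, left and right outer joins differ only in which unmatched tuples are kept. The sketch below makes that explicit with a hypothetical `how` parameter; it assumes a natural join on C and uses Python's `None` in place of ⊥:

```python
NULL = None  # stands in for the null marker

R = [(1, "a", 6), (3, "d", 4), (5, "a", 1)]               # schema (A, B, C)
S = [(6, "x", 0), (1, "y", 1), (8, "z", 2), (6, "w", 3)]  # schema (C, D, E)

def outer_join(r, s, how="full"):
    """Natural join on C, padded with NULL for the unmatched tuples.

    how is "full", "left" or "right" and selects whose unmatched tuples stay.
    """
    out = [a + b[1:] for a in r for b in s if a[2] == b[0]]
    if how in ("full", "left"):
        s_keys = {b[0] for b in s}
        out += [a + (NULL, NULL) for a in r if a[2] not in s_keys]
    if how in ("full", "right"):
        r_keys = {a[2] for a in r}
        out += [(NULL, NULL) + b for b in s if b[0] not in r_keys]
    return out

full = outer_join(R, S, "full")
left = outer_join(R, S, "left")
right = outer_join(R, S, "right")

print(len(full), len(left), len(right))  # 5 4 4
print(left[-1])                          # (3, 'd', 4, None, None)
print(right[-1])                         # (None, None, 8, 'z', 2)
```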
Aggregate operators
• It is possible to compute the sum, min, max, avg (and other functions) over all tuples, either for an attribute or for the result of an arithmetic/string operation over the attributes.
• Examples, given R(A,B,C,D,E):
• sum(A), min(B+C), max(C)
• min(D|E)
Aggregate example
• ϒsum(A), min(C) R

R:
A B C D
1 a 6 c
3 d 4 e
5 a 1 c

Result:
sum(A) min(C)
9 1
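The same aggregates can be computed with Python built-ins over a list of tuples. This is a sketch only; the third aggregate, over the string expression B|D, is an extra illustration that is not part of the slide's example:

```python
R = [(1, "a", 6, "c"), (3, "d", 4, "e"), (5, "a", 1, "c")]  # schema (A, B, C, D)

sum_a = sum(t[0] for t in R)          # sum(A)
min_c = min(t[2] for t in R)          # min(C)
min_bd = min(t[1] + t[3] for t in R)  # min over the concatenation B|D (illustrative)

print(sum_a, min_c)  # 9 1
print(min_bd)        # ac
```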
Grouping operator
• Instead of computing the aggregate for all the tuples, we can compute it for groups of tuples
• Group by attributes
• Compute aggregates for each group
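Group by
• Group by B, then aggregate sum(A), min(C) within each group:

R:
A B C D
1 a 6 c
3 d 4 e
5 a 1 c
1 d 3 c
3 f 2 e

Group B = a (sum(A) = 6, min(C) = 1):
1 a 6 c
5 a 1 c

Group B = d (sum(A) = 4, min(C) = 3):
3 d 4 e
1 d 3 c

Group B = f (sum(A) = 3, min(C) = 2):
3 f 2 e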
Group by
• ϒB, sum(A), min(C)→MC R

R:
A B C D
1 a 6 c
3 d 4 e
5 a 1 c
1 d 3 c
3 f 2 e

Result:
B sum(A) MC
a 6 1
d 4 3
f 3 2
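Group by
• Group by A,D, then aggregate sum(A*C) within each group:

R:
A B C D
1 a 6 c
3 d 4 e
5 a 1 c
1 d 3 c
3 f 2 e

Group A = 1, D = c (sum(A*C) = 9):
1 a 6 c
1 d 3 c

Group A = 5, D = c (sum(A*C) = 5):
5 a 1 c

Group A = 3, D = e (sum(A*C) = 18):
3 d 4 e
3 f 2 e

• ϒA, D, sum(A*C)→AC R

Result:
A D AC
1 c 9
5 c 5
3 e 18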
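Both grouping examples follow the same two steps: partition the tuples by the grouping attributes, then aggregate inside each partition. A minimal sketch, with a hypothetical helper `group_by` and relations as lists of tuples:

```python
from collections import defaultdict

R = [(1, "a", 6, "c"), (3, "d", 4, "e"), (5, "a", 1, "c"),
     (1, "d", 3, "c"), (3, "f", 2, "e")]  # schema (A, B, C, D)

def group_by(rel, key):
    """Partition a bag into groups; key maps a tuple to its grouping value."""
    groups = defaultdict(list)
    for t in rel:
        groups[key(t)].append(t)
    return groups

# Group by B, computing sum(A) and min(C) (renamed MC) per group.
by_b = [(b, sum(t[0] for t in g), min(t[2] for t in g))
        for b, g in group_by(R, lambda t: t[1]).items()]

# Group by A and D, computing sum(A*C) (renamed AC) per group.
by_ad = [(a, d, sum(t[0] * t[2] for t in g))
         for (a, d), g in group_by(R, lambda t: (t[0], t[3])).items()]

print(sorted(by_b))   # [('a', 6, 1), ('d', 4, 3), ('f', 3, 2)]
print(sorted(by_ad))  # [(1, 'c', 9), (3, 'e', 18), (5, 'c', 5)]
```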
Datalog
• Datalog is a logic-based query language that supports queries equivalent to those of set-oriented relational algebra
• no group by or aggregate operators
• set-oriented, though bag-oriented extensions exist
Datalog
• A relation is represented as a predicate
• The attributes of the relation are arguments of the predicate
• A term is either a simple constant or a variable (variables are denoted by lower case letters)
• An atom is a predicate p(*) that contains terms for each of its arguments
Atoms
• Atoms with no variables are called facts.
• Atoms evaluate to true or false.
• A database contains a set of facts.
• The relation R below contains two facts:
• R(1,2,'S')
• R(3,4,'A')

R:
A B C
1 2 'S'
3 4 'A'
Queries
• We will represent queries as new predicates, defined using rules in terms of the predicates that represent stored relations (also called the extensional database).
• A Datalog rule is of the form:
• A ← B1,...,Bn
• where A (head), B1,...,Bn (body) are atoms.
• All rules with the same predicate p in the head represent the definition of p.
Queries
• A rule of the form
• A ← B1,...,Bn
• is interpreted as follows:
• For every assignment of values to the variables in B1,...,Bn that makes B1 and ... and Bn true, return a tuple in A.
• For every tuple in A, there exists a substitution for the variables in B1,...,Bn such that B1 and ... and Bn is true.
Safety
• A rule of the form:
• A ← B1,...,Bn
• is considered safe if every variable that appears in a negated atom also appears in a positive safe predicate.
• All predicates corresponding to database predicates are considered safe.
• If all the rules defining a new predicate p are safe, then p is also considered safe.
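The safety condition as stated here can be checked mechanically. The sketch below is one possible encoding, not a standard API: the `Atom` type, the `variables` helper and `is_safe` are all hypothetical names. It treats every positive atom in the body as a safe predicate and recognises variables by the lower-case convention used in these slides.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    pred: str              # predicate name, e.g. "R"
    args: tuple            # terms: variables (lower-case strings) or constants
    negated: bool = False  # True for "not p(...)"

def variables(atom):
    """Collect the variables of an atom (lower-case identifiers by convention)."""
    return {a for a in atom.args
            if isinstance(a, str) and a.isidentifier() and a.islower()}

def is_safe(body):
    """Safe if every variable in a negated atom also occurs in a positive atom.

    Assumption: all positive atoms refer to safe (database) predicates.
    """
    positive = set().union(*(variables(b) for b in body if not b.negated))
    negative = set().union(*(variables(b) for b in body if b.negated))
    return negative <= positive

# P(x,y,z) <- R(x,y,z), not S(x,y,z): every negated variable is bound by R.
safe_body = [Atom("R", ("x", "y", "z")), Atom("S", ("x", "y", "z"), negated=True)]
# P(x) <- R(x), not S(x,w): w occurs only under negation.
unsafe_body = [Atom("R", ("x",)), Atom("S", ("x", "w"), negated=True)]

print(is_safe(safe_body))    # True
print(is_safe(unsafe_body))  # False
```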
Example queries
• q(x,y,z) ← r(x,y,z)
• returns all the tuples in r
• q(x,y,z) ← r(x,y,z), z = 1
• returns all tuples in r where the third attribute is 1.
• q(x,y) ← r(x,y,_)
• returns all x,y from r (_ is the don’t care symbol).
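These three queries map closely onto Python set comprehensions, which also give set (not bag) results. A sketch with made-up facts for r:

```python
# Extensional database: the facts of r, with three attributes.
r = {(1, 2, 1), (3, 4, 1), (3, 4, 7)}

# q(x,y,z) <- r(x,y,z)
q_all = {(x, y, z) for (x, y, z) in r}

# q(x,y,z) <- r(x,y,z), z = 1
q_z1 = {(x, y, z) for (x, y, z) in r if z == 1}

# q(x,y) <- r(x,y,_)
q_xy = {(x, y) for (x, y, _) in r}

print(q_all == r)    # True
print(sorted(q_z1))  # [(1, 2, 1), (3, 4, 1)]
print(sorted(q_xy))  # [(1, 2), (3, 4)]
```

The last query returns two tuples rather than three because (3, 4) is derived twice and sets keep a single copy.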
Relational Algebra to Datalog
• Selection, P = σA>1 R, given R(A,B,C):
• P(x,y,z) ← R(x,y,z), x>1
• Projection, P = πA,B R:
• P(x,y) ← R(x,y,_)
Relational Algebra to Datalog
• Cartesian Product
• Example: given R(A,B,C), S(D,E)
• T = R×S is equivalent to
• T(x,y,z,w,g) ← R(x,y,z), S(w,g)
• Join: T = R ⋈C=D S
• T(x,y,z,w,g) ← R(x,y,z), S(w,g), z=w
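The product and join rules differ only by the extra condition z = w. The sketch below mirrors the two rule bodies with set comprehensions over made-up facts for R(A,B,C) and S(D,E):

```python
R = {(1, "a", 6), (5, "a", 1)}  # R(A, B, C)
S = {(6, "x"), (8, "z")}        # S(D, E)

# T(x,y,z,w,g) <- R(x,y,z), S(w,g)
t_product = {(x, y, z, w, g) for (x, y, z) in R for (w, g) in S}

# T(x,y,z,w,g) <- R(x,y,z), S(w,g), z = w
t_join = {(x, y, z, w, g) for (x, y, z) in R for (w, g) in S if z == w}

print(len(t_product))  # 4
print(t_join)          # {(1, 'a', 6, 6, 'x')}
```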
Relational Algebra to Datalog
• Set union, P = R∪S
• P(x,y,z) ← R(x,y,z)
• P(x,y,z) ← S(x,y,z)
• Set intersection, P = R∩S
• P(x,y,z) ← R(x,y,z), S(x,y,z)
• Set difference, P = R-S
• P(x,y,z) ← R(x,y,z), not S(x,y,z)
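A sketch of how each rule set reads operationally, using Python sets with made-up facts. Note that union needs two rules (one per source relation), while intersection and difference are single rules with a second positive or negated atom:

```python
R = {(1, 2, 3), (4, 5, 6)}
S = {(4, 5, 6), (7, 8, 9)}

# P(x,y,z) <- R(x,y,z)  together with  P(x,y,z) <- S(x,y,z)
p_union = {t for t in R} | {t for t in S}

# P(x,y,z) <- R(x,y,z), S(x,y,z)
p_intersection = {t for t in R if t in S}

# P(x,y,z) <- R(x,y,z), not S(x,y,z)
p_difference = {t for t in R if t not in S}

print(sorted(p_union))  # [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
print(p_intersection)   # {(4, 5, 6)}
print(p_difference)     # {(1, 2, 3)}
```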
Recursion
• It is possible to write recursive rules in Datalog; such queries cannot be expressed in relational algebra.
• Example:
• Given: parent(x,y) meaning x is a parent of y.
• ancestor(x,y) ← parent(x,y)
• ancestor(x,y) ← parent(x,z), ancestor(z,y).
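One common way to evaluate such recursive rules is naive bottom-up iteration: start from the facts given by the first rule and keep applying the second rule until nothing new is derived. The sketch below assumes that strategy (the slides do not specify an evaluation method) and uses made-up parent facts:

```python
def ancestors(parent):
    """Naive bottom-up evaluation: apply the rules until no new facts appear."""
    ancestor = set(parent)  # ancestor(x,y) <- parent(x,y)
    while True:
        # ancestor(x,y) <- parent(x,z), ancestor(z,y)
        derived = {(x, y)
                   for (x, z) in parent
                   for (z2, y) in ancestor
                   if z == z2}
        if derived <= ancestor:  # fixpoint reached
            return ancestor
        ancestor |= derived

parent = {("ann", "bob"), ("bob", "cal"), ("cal", "dee")}
result = ancestors(parent)

print(sorted(result))
# [('ann', 'bob'), ('ann', 'cal'), ('ann', 'dee'),
#  ('bob', 'cal'), ('bob', 'dee'), ('cal', 'dee')]
```

The loop terminates because each round can only add pairs over the finite set of constants that appear in parent.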