Post on 13-Jan-2016

Chapter 7

Relational Algebra

Topics in this Chapter

• Closure Revisited

• The Original Algebra: Syntax and Semantics

• What is the Algebra For?

• Further Points and Additional Operators

• Grouping and Ungrouping

Relational Algebra

• The relational algebra is a collection of operators that take relations as their operands and return a relation as their result

• Eight operators, in two groups of four

• Union, intersect, difference, Cartesian product

• Restrict, project, join, divide

• The set of possible relational operators is essentially unlimited

• The operators are “read only”

Fig. 7.1 The original eight operators (overview)

Closure Revisited

• The output from any relation operator is another relation: the closure property

• Relation expressions can be nested (analogously to arithmetic expressions)

• Every relation has a head and a body; relational algebra must address both

• Attribute type inference must be supported

• RENAME changes the name of an attribute without changing its type or content

RENAME

Expression:

S RENAME CITY AS SCITY

Value:

+------+-------+--------+--------+
| S#   | SNAME | STATUS | SCITY  |
+------+-------+--------+--------+
| S1   | Smith | 20     | London |
| S2   | Jones | 10     | Paris  |
| S3   | Blake | 30     | Paris  |
| S4   | Clark | 20     | London |
| S5   | Adams | 30     | Athens |
+------+-------+--------+--------+
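The closure property and RENAME can be sketched in Python. This is only an illustration under stated assumptions: a relation is modelled as a set of tuples, each tuple a frozenset of (attribute, value) pairs, and the helper names `relation` and `rename` are hypothetical, not part of any real library or of Tutorial D.

```python
def relation(*rows):
    # Model a relation body as a set of tuples; each tuple is a frozenset of
    # (attribute, value) pairs, so attribute order does not matter.
    return {frozenset(row.items()) for row in rows}


def rename(rel, old, new):
    # Build a new relation; the operand itself is not modified ("read only").
    return {frozenset((new if a == old else a, v) for a, v in t) for t in rel}


S = relation(
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
)

renamed = rename(S, "CITY", "SCITY")

# Closure: the result is itself a relation, so invocations can be nested.
twice = rename(rename(S, "CITY", "SCITY"), "SNAME", "NAME")
```

Because `rename` returns the same kind of value it accepts, the nested call works without any special handling, which is the point of closure.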

The Syntax of the Original Algebra

The algebra follows from the nature and definition of relations. It is independent of the notation used to describe it.

Note that the Date text defines the algebra using words rather than symbols.

Most texts use symbols to describe the syntax of the algebra. Some who prefer that say that it “looks more scientific.”

More about the symbols later.

The Syntax of the Original Algebra

BNF grammar for the relational algebra:

• ::= means “is defined as”

• < > indicates category names

• | means “or”

• […] indicates something optional

• Upper case words such as WHERE are elements of the language

• { and } are symbols in the language, not BNF

• “commalist” is used for repetition

The Syntax of the Original Algebra

• Each operator takes one or more relations as operands and returns a relation

• The result is a new relation value derived from the operands; the operands themselves are not changed

• Generically:

<relation expression> ::= RELATION { <tuple expression commalist> }

The Syntax of the Original Algebra –General Format

<relation expression>
    ::= RELATION { <tuple expression commalist> }
      | <relvar name>
      | <relation operator invocation>
      | <with expression>
      | <introduced name>
      | ( <relation expression> )

<relation operator invocation> ::= <project> | <nonproject>

<project> ::= <relation expression> { [ ALL BUT ] <attribute name commalist> }
    (The <relation expression> must not be a <nonproject>)

<nonproject> ::= <rename> | <union> | <intersect> | <minus>
               | <times> | <where> | <join> | <divide>

<rename> ::= <relation expression> RENAME <renaming commalist>
    (The <relation expression> must not be a <nonproject>)

<union> ::= <relation expression> UNION <relation expression>
    (The <relation expression>s must not be <nonproject>s, except either or both can be another <union>)

<intersect> ::= <relation expression> INTERSECT <relation expression>
    (The <relation expression>s must not be <nonproject>s, except either or both can be another <intersect>)

<minus> ::= <relation expression> MINUS <relation expression>
    (The <relation expression>s must not be <nonproject>s)

<times> ::= <relation expression> TIMES <relation expression>
    (The <relation expression>s must not be <nonproject>s, except either or both can be another <times>)

<where> ::= <relation expression> WHERE <boolean expression>
    (The <relation expression> must not be a <nonproject>)

<join> ::= <relation expression> JOIN <relation expression>
    (The <relation expression>s must not be <nonproject>s, except either or both can be another <join>)

<divide> ::= <relation expression> DIVIDEBY <relation expression> PER <per>
    (The <relation expression>s must not be <nonproject>s)

<per> ::= <relation expression>
        | ( <relation expression>, <relation expression> )
    (The <relation expression>s must not be <nonproject>s)

<with expression> ::= WITH <name intro commalist> : <expression>

<name intro> ::= <expression> AS <introduced name>

 

 

Semantics of the Original Algebra –Union

• Union operates on two relations and returns a relation that contains all tuples belonging to either or both

• Both relations must be of the same type (formerly known as union compatibility)

• Relations cannot have duplicate tuples; we say loosely that UNION “eliminates duplicates”

Semantics of the Original Algebra –Intersect and Difference

• Intersect operates on two relations and returns a relation that contains all tuples belonging to both

• Difference, written MINUS, operates on two relations and returns a relation containing all tuples occurring in the first but not the second; operand order matters

• For both Intersect and Difference, the relations operated upon must be of the same type (formerly known as union compatibility)
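Union, intersect and difference map directly onto set operations once a relation is modelled as a set of tuples. The following is a minimal sketch under that assumption; the helper names are hypothetical, the sample supplier numbers are illustrative, and the "same type" check compares attribute names only.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def attribute_names(rel):
    return {name for t in rel for name, _ in t}


def require_same_type(a, b):
    # Simplification: only attribute names are compared, not attribute types.
    if a and b and attribute_names(a) != attribute_names(b):
        raise TypeError("operands must be of the same relation type")


def union(a, b):
    require_same_type(a, b)
    return a | b  # a set cannot hold duplicates, so none survive


def intersect(a, b):
    require_same_type(a, b)
    return a & b


def minus(a, b):
    require_same_type(a, b)
    return a - b  # tuples of a that do not appear in b


in_london = relation({"S#": "S1"}, {"S#": "S4"})
supply_p1 = relation({"S#": "S1"}, {"S#": "S2"})

either = union(in_london, supply_p1)
both = intersect(in_london, supply_p1)
london_only = minus(in_london, supply_p1)
```

Swapping the operands of `minus` gives a different result, which is why difference is neither commutative nor associative.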

Semantics of the Original Algebra –Cartesian Product

• A Cartesian Product is the set of all ordered pairs such that in each pair, the first element comes from the first set, and the second element comes from the second set

• However, since the result of a relational operator is a relation, each pair is replaced by a single tuple containing all the attributes of both source tuples

• Uses keyword TIMES
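The difference between a mathematical pair and the merged tuple that TIMES produces can be sketched as follows. The set-of-frozensets model and the function names are assumptions of this illustration; the sketch also insists that the operands share no attribute names, so that merging two tuples is never ambiguous.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def times(a, b):
    names_a = {n for t in a for n, _ in t}
    names_b = {n for t in b for n, _ in t}
    if names_a & names_b:
        raise ValueError("TIMES operands must have no attribute names in common")
    # Merge each pair of source tuples into ONE tuple rather than keeping a pair.
    return {ta | tb for ta in a for tb in b}


suppliers = relation({"S#": "S1"}, {"S#": "S2"})
parts = relation({"P#": "P1"}, {"P#": "P2"}, {"P#": "P3"})

product = times(suppliers, parts)
```

With two suppliers and three parts, `product` holds six tuples, each carrying both an S# and a P# attribute.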

Semantics of the Original Algebra –Restrict

• Yields a horizontal subset – a/k/a “SELECT”

• a WHERE p

• p is called the restriction condition

• p is a predicate, and returns boolean

• If it can be evaluated by examining a single tuple it is simple; otherwise it is nonsimple
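A simple restriction condition can be sketched as a Python function applied to one tuple at a time. The relation model and the names `relation` and `restrict` are assumptions made for this illustration.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def restrict(rel, predicate):
    # Keep the tuples for which the predicate is true: a "horizontal subset".
    # The predicate sees one tuple (as a dict) at a time, so it is "simple".
    return {t for t in rel if predicate(dict(t))}


S = relation(
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
    {"S#": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
    {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"},
)

# Corresponds to: S WHERE CITY = 'London'
in_london = restrict(S, lambda t: t["CITY"] == "London")
```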

Semantics of the Original Algebra –Project

• Yields a vertical subset

• The general form is a commalist of attributes to be kept in the result

• For the attributes kept, all tuples are kept, except that duplicate tuples are eliminated

• An alternative specification is to name the attributes to be excluded:

• P { ALL BUT WEIGHT }
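Projection, including the ALL BUT form, can be sketched as below. The relation model, the function names and the two sample part rows are assumptions for illustration only.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def project(rel, names):
    # Keep only the named attributes; the set drops any duplicate tuples.
    return {frozenset((a, v) for a, v in t if a in names) for t in rel}


def project_all_but(rel, names):
    # Keep every attribute except the named ones.
    return {frozenset((a, v) for a, v in t if a not in names) for t in rel}


S = relation(
    {"S#": "S1", "CITY": "London"},
    {"S#": "S2", "CITY": "Paris"},
    {"S#": "S3", "CITY": "Paris"},
    {"S#": "S4", "CITY": "London"},
    {"S#": "S5", "CITY": "Athens"},
)
P = relation(
    {"P#": "P1", "PNAME": "Nut", "WEIGHT": 12.0},
    {"P#": "P2", "PNAME": "Bolt", "WEIGHT": 17.0},
)

cities = project(S, {"CITY"})                  # S { CITY }
no_weight = project_all_but(P, {"WEIGHT"})     # P { ALL BUT WEIGHT }
```

Five suppliers project onto three cities, because the two London tuples and the two Paris tuples collapse into one each.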

Semantics of the Original Algebra –Join – Natural Join

• When unqualified, join means “natural join”

• For two relations with one or more common attributes (same name and type), join returns one tuple, carrying all the attributes, for each pair of source tuples that agree on the common attributes

• Attributes that are not common to both source relations are retained

• If no attributes match, result is a Cartesian product

• If all attributes match, result is an Intersect
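The natural join and its two degenerate cases can be sketched as follows. The relation model and the helper names are assumptions for this illustration, and the sample rows are made up.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def join(a, b):
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            common = da.keys() & db.keys()
            # Tuples match when they agree on every common attribute.
            if all(da[k] == db[k] for k in common):
                result.add(ta | tb)  # common attributes appear only once
    return result


S = relation({"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"})
P = relation(
    {"P#": "P1", "CITY": "London"},
    {"P#": "P2", "CITY": "Paris"},
    {"P#": "P3", "CITY": "Oslo"},
)
colocated = join(S, P)

# No common attributes: the join degenerates to a Cartesian product.
no_common = join(relation({"S#": "S1"}, {"S#": "S2"}),
                 relation({"P#": "P1"}, {"P#": "P2"}, {"P#": "P3"}))

# Identical headings: the join degenerates to an intersection.
same_heading = join(relation({"S#": "S1"}, {"S#": "S2"}),
                    relation({"S#": "S2"}, {"S#": "S3"}))
```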

Semantics of the Original Algebra –Join – Theta Join

• Used to join relations on a comparison operator theta, which need not be equality (for example <, >, or ≠)

• Given relations a and b, with attributes X and Y respectively, this can be expressed as follows:

• (a TIMES b) WHERE X theta Y

• When theta is = the result can be made to be that of natural join (project away the duplicate attribute, and rename the kept one)
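The definition (a TIMES b) WHERE X theta Y translates almost word for word into code. This sketch assumes the set-of-frozensets relation model; the function names and the attribute names SCITY and PCITY are hypothetical, and theta is passed in as a comparison function from Python's standard `operator` module.

```python
import operator


def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def times(a, b):
    # Operands are assumed to have no attribute names in common.
    return {ta | tb for ta in a for tb in b}


def theta_join(a, b, x, theta, y):
    # (a TIMES b) WHERE x theta y
    result = set()
    for t in times(a, b):
        row = dict(t)
        if theta(row[x], row[y]):
            result.add(t)
    return result


S = relation({"S#": "S1", "SCITY": "London"}, {"S#": "S2", "SCITY": "Paris"})
P = relation({"P#": "P1", "PCITY": "London"}, {"P#": "P2", "PCITY": "Paris"})

# Supplier city follows part city in alphabetical order.
greater = theta_join(S, P, "SCITY", operator.gt, "PCITY")

# theta set to "=" gives an equijoin; both city attributes are still present.
equal = theta_join(S, P, "SCITY", operator.eq, "PCITY")
```

The equijoin result still carries both SCITY and PCITY, which is why an extra projection and rename are needed to arrive at the natural join.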

Semantics of the Original Algebra –Divide

• Used to “divide one relation into another”

• Small Divide uses one mediator relation in the PER clause, Great Divide uses two

• For small divide:

• a DIVIDEBY b PER c

• where a is the dividend, b is the divisor, and c is the mediator

• Used to determine which tuples in a relate to the complete set in b

Semantics of the Original Algebra –Divide - Example

• Let S be a relation of suppliers, P one of parts, and SP the mediator

• S JOIN ( S {S#} DIVIDEBY P {P#} PER SP {S#, P#} )

• Returns only the suppliers who supply all parts
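The small divide can be sketched as a "for all" test over the divisor. The relation model and the name `divide` are assumptions of this illustration, and the sample shipments are made up.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def divide(dividend, divisor, mediator):
    # Keep a dividend tuple only if, combined with EVERY divisor tuple,
    # it appears in the mediator.
    return {ta for ta in dividend
            if all((ta | tb) in mediator for tb in divisor)}


supplier_numbers = relation({"S#": "S1"}, {"S#": "S2"}, {"S#": "S3"})  # S { S# }
part_numbers = relation({"P#": "P1"}, {"P#": "P2"})                    # P { P# }
shipments = relation(                                                   # SP { S#, P# }
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S2", "P#": "P1"},
)

supply_all = divide(supplier_numbers, part_numbers, shipments)
```

Only S1 is paired with every part, so only S1 survives. With an empty divisor the test holds trivially and every dividend tuple is returned.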

Examples

Get supplier names for suppliers who supply part P2.

In SQL:

SELECT SNAME FROM S WHERE S# IN

(SELECT S# FROM SP WHERE P# = ‘P2’);

 In relational algebra:

( ( SP JOIN S ) WHERE P# = P# (‘P2’) ) { SNAME }

Get supplier names for suppliers who supply at least one red part.

SELECT SNAME FROM S WHERE S# IN

(SELECT S# FROM SP WHERE P# IN

(SELECT P# FROM P WHERE COLOR = ‘RED’) );

( ( ( P WHERE COLOR = COLOR (‘RED’) ) { P# } JOIN SP ) { S# }

JOIN S ) {SNAME}

Get supplier names for suppliers who do not supply part P2.

 

SELECT SNAME

FROM S

WHERE NOT EXISTS

( SELECT S#

FROM SP

WHERE S# = S.S#

AND P# = ‘P2’ ) ;

 

 

( ( S {S#} MINUS ( SP WHERE P# = ‘P2’ ) { S# } )

JOIN S ) { SNAME }
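The algebraic form of this query can be traced step by step in Python. This is a sketch under assumptions: relations are sets of frozensets, the helper names are hypothetical, and the sample suppliers and shipments are made up for the illustration.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def restrict(rel, predicate):
    return {t for t in rel if predicate(dict(t))}


def project(rel, names):
    return {frozenset((a, v) for a, v in t if a in names) for t in rel}


def join(a, b):
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(ta | tb)
    return result


S = relation(
    {"S#": "S1", "SNAME": "Smith"},
    {"S#": "S2", "SNAME": "Jones"},
    {"S#": "S5", "SNAME": "Adams"},
)
SP = relation(
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S2", "P#": "P1"},
)

# ( SP WHERE P# = 'P2' ) { S# }
supplies_p2 = project(restrict(SP, lambda t: t["P#"] == "P2"), {"S#"})
# S { S# } MINUS ...
non_suppliers = project(S, {"S#"}) - supplies_p2
# ( ... JOIN S ) { SNAME }
names = project(join(non_suppliers, S), {"SNAME"})
```

Each intermediate result is itself a relation, so the steps can also be written as one nested expression, as on the slide.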

Get all pairs of supplier numbers where the two suppliers are located in the same city.

SELECT FIRST.S#, SECOND.S#
FROM S FIRST, S SECOND
WHERE FIRST.CITY = SECOND.CITY
AND FIRST.S# < SECOND.S#;

( ( ( S RENAME S# AS FIRSTS# ) { FIRSTS#, CITY } JOIN
( S RENAME S# AS SECONDS# ) { SECONDS#, CITY } )
WHERE FIRSTS# < SECONDS# ) { FIRSTS#, SECONDS# }
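The same-city query shows why RENAME is needed: joining S with itself requires two differently named copies of S#. A minimal sketch, assuming the set-of-frozensets relation model and hypothetical helper names, using the supplier cities from the RENAME slide:

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def rename(rel, old, new):
    return {frozenset((new if a == old else a, v) for a, v in t) for t in rel}


def project(rel, names):
    return {frozenset((a, v) for a, v in t if a in names) for t in rel}


def restrict(rel, predicate):
    return {t for t in rel if predicate(dict(t))}


def join(a, b):
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(ta | tb)
    return result


S = relation(
    {"S#": "S1", "CITY": "London"},
    {"S#": "S2", "CITY": "Paris"},
    {"S#": "S3", "CITY": "Paris"},
    {"S#": "S4", "CITY": "London"},
    {"S#": "S5", "CITY": "Athens"},
)

first = project(rename(S, "S#", "FIRSTS#"), {"FIRSTS#", "CITY"})
second = project(rename(S, "S#", "SECONDS#"), {"SECONDS#", "CITY"})

# The join matches on CITY, the only attribute the two operands share.
same_city = join(first, second)
# FIRSTS# < SECONDS# drops self-pairs and mirrored pairs.
pairs = project(
    restrict(same_city, lambda t: t["FIRSTS#"] < t["SECONDS#"]),
    {"FIRSTS#", "SECONDS#"},
)
```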

 


Fig. 7.8 Division Examples

The “Symbolic” Form

Names of Suppliers located in Paris:

  π SNAME ( σ CITY = ‘Paris’ (S) )

 

(S WHERE CITY = CITY (‘Paris’) ){SNAME}

 

 Names of Suppliers of part ‘P2’:

π SNAME ( σ P# = ‘P2’ (S ⋈ SP) )

((S JOIN SP) WHERE P# = ‘P2’) {SNAME}

Relational Algebra Symbols

Unary Operators

Selection: σ

Projection: π

Aggregate Function: ℱ (script F in some notations)

Binary Operators

Union: ∪

Intersection: ∩

Difference: −

Cartesian product: ×

Theta Join: ⋈ with the condition θ as a subscript

Natural Join: * (or in some notations ⋈)

Left Outer Join: ⟕

Right Outer Join: ⟖

Full Outer Join: ⟗

Outer Union: *

Logic Symbols

Logical AND: ∧

Logical OR: ∨

Logical NOT: ¬

What is the Algebra for?

• The purpose of the algebra is to allow the writing of relational expressions

• Applications of the algebra: retrieval, update, defining integrity constraints, derived relvars, stability and security

• An implemented language can be said to be relationally complete if it is at least as powerful as the algebra

The Original Algebra

• Many operators are associative: Union, intersect, times, join, but not minus

• Many operators are commutative: Union, intersect, times, join, but not minus

• Join, union, intersect were originally defined as dyadic, but can be generalized to any number of operands; TABLE_DEE and TABLE_DUM are valid operands too (TABLE_DEE acts as the identity for join)
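These algebraic properties can be checked on small examples. The sketch below assumes the set-of-frozensets relation model, in which DEE is the relation with no attributes and one (empty) tuple and DUM is the relation with no attributes and no tuples; the helper names and sample rows are hypothetical.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def join(a, b):
    result = set()
    for ta in a:
        for tb in b:
            da, db = dict(ta), dict(tb)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                result.add(ta | tb)
    return result


A = relation({"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"})
B = relation({"CITY": "London", "P#": "P1"}, {"CITY": "Paris", "P#": "P2"})
C = relation({"P#": "P1", "QTY": 300})

DEE = {frozenset()}  # no attributes, exactly one tuple (the empty tuple)
DUM = set()          # no attributes, no tuples

commutative = join(A, B) == join(B, A)
associative = join(join(A, B), C) == join(A, join(B, C))

X = relation({"S#": "S1"}, {"S#": "S2"})
Y = relation({"S#": "S2"})
minus_commutative = (X - Y) == (Y - X)
```

Joining any relation with DEE returns that relation unchanged, while joining with DUM returns an empty result. The final comparison comes out false, matching the slide's remark that minus is the exception.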

Additional Relational Operators

• Semijoin returns the tuples of one relation that match at least one tuple of another (suppliers who supply a specific part, for example)

• Semidifference is the complement: the tuples of one relation that match no tuple of another (suppliers who do not supply a particular part, e.g.)

• Extend adds an attribute dynamically, but does not alter the underlying relvar

• Summarize performs vertical, or attribute-wise, aggregate computations

Semijoin

 

A SEMIJOIN B is equivalent to:

 

(A JOIN B) { X, Y }, where X and Y are the attributes of A

  

The JOIN of A and B projected over the attributes of A.

 

The tuples of A that have “counterparts” in B.
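Both readings of semijoin, "join then project" and "tuples of A with counterparts in B", can be compared directly. This is a sketch under assumptions: the relation model, the helper names and the sample data are all illustrative, and `semiminus` is a hypothetical name for the semidifference mentioned on the previous slide.

```python
def relation(*rows):
    # A relation body as a set of tuples (frozensets of attribute/value pairs).
    return {frozenset(row.items()) for row in rows}


def matches(ta, tb):
    # True when the two tuples agree on every attribute they have in common.
    da, db = dict(ta), dict(tb)
    return all(da[k] == db[k] for k in da.keys() & db.keys())


def join(a, b):
    return {ta | tb for ta in a for tb in b if matches(ta, tb)}


def project(rel, names):
    return {frozenset((a, v) for a, v in t if a in names) for t in rel}


def semijoin(a, b):
    # Tuples of a that have at least one counterpart in b.
    return {ta for ta in a if any(matches(ta, tb) for tb in b)}


def semiminus(a, b):
    # Tuples of a that have no counterpart in b.
    return {ta for ta in a if not any(matches(ta, tb) for tb in b)}


S = relation(
    {"S#": "S1", "SNAME": "Smith"},
    {"S#": "S2", "SNAME": "Jones"},
    {"S#": "S5", "SNAME": "Adams"},
)
SP_P2 = relation({"S#": "S1", "P#": "P2"}, {"S#": "S2", "P#": "P2"})

suppliers_of_p2 = semijoin(S, SP_P2)
via_join = project(join(S, SP_P2), {"S#", "SNAME"})
others = semiminus(S, SP_P2)
```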

Grouping…

• Required because relations can have attributes that are themselves relations

• Provides a map between such relations and “flat” relations

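• SP GROUP {P#, QTY} AS PQ

• Returns one tuple per supplier, with attribute PQ holding a relation of that supplier's part numbers and quantities; S#, the attribute not named in the GROUP, is the unnamed co-conspirator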

…and Ungrouping

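• Ungrouping a grouped relation returns the original relation

• In the example, the original SP relation

• If you group, you can always ungroup to get back where you started, but the converse is not necessarily true: ungrouping and then regrouping may not return the relation you began with

• This occurs when the relation being ungrouped was not validly grouped in the first place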
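Grouping, ungrouping and the one-way nature of the round trip can be sketched as follows. Everything here is an assumption of the illustration: relations are frozensets so that they can be nested as attribute values, the function names are hypothetical, and the shipment quantities are made up.

```python
def relation(*rows):
    # Frozen so that a relation can itself be used as an attribute value.
    return frozenset(frozenset(row.items()) for row in rows)


def group(rel, names, new):
    buckets = {}
    for t in rel:
        rest = frozenset((a, v) for a, v in t if a not in names)
        inner = frozenset((a, v) for a, v in t if a in names)
        buckets.setdefault(rest, set()).add(inner)
    # One result tuple per distinct "rest", with a nested relation under `new`.
    return frozenset(rest | {(new, frozenset(inner))}
                     for rest, inner in buckets.items())


def ungroup(rel, name):
    result = set()
    for t in rel:
        rest = frozenset((a, v) for a, v in t if a != name)
        for inner in dict(t)[name]:
            result.add(rest | inner)
    return frozenset(result)


SP = relation(
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S2", "P#": "P1", "QTY": 300},
)

grouped = group(SP, {"P#", "QTY"}, "PQ")   # SP GROUP { P#, QTY } AS PQ
round_trip = ungroup(grouped, "PQ")        # back to SP

# A relation that no GROUP could have produced: S1 appears twice.
odd = relation(
    {"S#": "S1", "PQ": relation({"P#": "P1", "QTY": 300})},
    {"S#": "S1", "PQ": relation({"P#": "P2", "QTY": 200})},
)
regrouped = group(ungroup(odd, "PQ"), {"P#", "QTY"}, "PQ")
```

Grouping then ungrouping restores SP, but ungrouping `odd` and regrouping merges the two S1 tuples into one, so `regrouped` differs from `odd`.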
