cis611 lecture notes algebra



Post on 18-Apr-2017


V. Matos - CIS611_LECTURE_NOTES_ALGEBRA.docx 1

Cleveland State University

CIS611 – Relational Database Systems

Lecture Notes

Prof. Victor Matos

RELATIONAL ALGEBRA


THE RELATIONAL DATA MODEL (RM) and the Relational Algebra

The relational model of data (RM) was introduced by Dr. E. F. Codd (CACM, June 1970).

The RM is simpler and more uniform than the preceding Network and Hierarchical models.

S. Todd (IBM, 1976) presented PRTV, the first implementation of a relational algebra DBMS.

A. Klug added summary functions for statistical computing (ACM SIGMOD 1982).

Roth, Ozsoyoglu et al. (1987) extended the model to allow nested data structures.

Clifford, Tansel, Navathe, and others have added time specifications to the model.

Current research is aimed at extending the model to support complex data objects, multimedia management, hyperdata, and geographical, temporal, and logical processing.


A relational database is a collection of relations

A relation is a 2-dimensional table, in which each row represents a collection of related data values

The values in a relation can be interpreted as a fact describing an entity or a relationship

Relation name: STUDENT. Attributes: Name, SSN, Address, GPA. Each row below is a tuple.

STUDENT
Name                 SSN          Address            GPA
Mary Poppins         111-22-3333  77 Picadilly St    4.00
Pepe Gonzalez        123-45-6789  123 Bonita Rd.     3.09
V. Sundarabatharan   999-88-7777  105 Calcara Ave.   3.87
Shi-Wua Yan          881-99-0101  778 Tienamen Sq.   3.88


Domains, Tuples, Attributes, and Relations

A domain D is a set of atomic values. A domain is given a name, a data type, and a format.

A relation schema R, denoted R(A1, A2, ..., An), is a set of attributes (column names). The degree of a relation is the number of attributes of its relation schema.

Each attribute Ai is the name of a role played by some domain D in R(A1, ..., An). D is called the domain of Ai and is denoted by dom(Ai).

A relation r defined on schema R(A1, ..., An), also denoted by r(R), is a set of n-tuples r = { t1, t2, ..., tm }.

Each n-tuple t is an ordered list of n values t = <v1, v2, ..., vn>, where each value vi is an element of dom(Ai) or a special null value.

A relation r(R) is a subset of the Cartesian product of the domains dom(Ai) that define R.
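The last definition can be illustrated with a small Python sketch. The two domains and the relation below are made-up values chosen only for illustration; they do not come from the notes.

```python
from itertools import product

# Hypothetical domains for a two-attribute schema R(A1, A2).
dom_a1 = {1, 2, 3}
dom_a2 = {"x", "y"}

# Every possible 2-tuple over the domains: dom(A1) x dom(A2).
all_tuples = set(product(dom_a1, dom_a2))

# A relation r(R) is any subset of that Cartesian product.
r = {(1, "x"), (3, "y")}

print(r <= all_tuples)   # True: r is a subset of the product
print(len(all_tuples))   # 6 = 3 * 2 possible tuples
```

Using a Python set for r also mirrors the rule that a relation holds no duplicate tuples.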


Characteristics of Relations

Relations are defined as a (mathematical) set of tuples.

Duplicate tuples are not allowed

Order of tuples inside a relation is immaterial

Ordering of values within a tuple is irrelevant as long as each value stays associated with its attribute name; therefore the column ordering is not important.

Each value in a tuple is atomic (not divisible)

Recent research is oriented toward removing the atomicity of First Normal Form databases


Key Attributes of a Relation

A superkey SK of relation r(R) is a set of attributes whose values uniquely identify each tuple of r(R): no two distinct tuples agree on every attribute of SK.

A key K of a relation schema R is a minimal superkey of R.

A relation schema R may have more than one key. Each of those is called a Candidate Key.

It is common to select one of the candidate keys and elevate it to Primary Key.

Convention: The attributes representing the primary key of schema R are underlined.

Example:
EMPLOYEE (SSN, Name, Address, Salary)
STOCK (PartNum, SupNum, Quantity)
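The superkey and candidate-key definitions can be checked mechanically against a relation instance. The sketch below is only an illustration: the helper names `is_superkey` and `candidate_keys` and the STOCK rows are assumptions, not part of the notes. A key is a property of the schema, so a single instance can rule a key out but can never prove it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, schema):
    """Minimal superkeys of this instance, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            minimal = not any(set(k) <= set(attrs) for k in keys)
            if minimal and is_superkey(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical STOCK instance (values invented for the example).
stock = [
    {"PartNum": 1, "SupNum": 10, "Quantity": 5},
    {"PartNum": 1, "SupNum": 20, "Quantity": 5},
    {"PartNum": 2, "SupNum": 10, "Quantity": 7},
    {"PartNum": 2, "SupNum": 20, "Quantity": 5},
]

print(candidate_keys(stock, ["PartNum", "SupNum", "Quantity"]))
# [('PartNum', 'SupNum')]
```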

Integrity Constraints

Integrity constraints are rules specified on the database schema and are expected to hold on every instance of that schema.

1. Key constraints specify the candidate keys of each relation schema R.

2. Entity integrity constraints state that no primary key value can be null.

3. Referential integrity constraints are specified between two relations and are used to maintain consistency among the tuples of the two relations. The foreign key(s) of one relation refer to primary key values in the other relation.

RELATIONAL QUERY LANGUAGES

Classification based on the underlying language 'model'

Model                          Example
1. Pure Algebraic              ISBL (Information System Base Language), IBM
2. Pure Predicate Calculus
   Tuple Oriented              QUEL (Ingres)
   Domain Oriented             QBE, STBE
3. Mixed Algebra-Calculus      SQL
4. Object Oriented             DB4O
5. Associative                 Sentences (LazySoft)
6. Other                       Cache (Object-Relational), ….

Relational Algebra

A collection of operators used to manipulate entire relations.

The result of each operation is a new relation.

It consists of two groups: operations on sets, and operations specifically designed to manipulate relational databases.

Operations on sets: UNION, DIFFERENCE, INTERSECTION, CARTESIAN PRODUCT

Operations on databases: SELECT, PROJECT, JOIN, AGGREGATE, DIVISION, RENAME

UNION

The result of this operation, denoted (r ∪ s) or (r + s), is a relation that includes all tuples that are in r, in s, or in both r and s.

Duplicate tuples are eliminated.

Combined relations must be union-compatible.

r + s = { t / t ∈ r or t ∈ s }

r
A  B  C
1  1  1
2  2  2
3  3  3

s
A  B  C
1  2  3
1  1  1
3  2  1

r + s
A  B  C
1  1  1
2  2  2
3  3  3
1  2  3
3  2  1

DIFFERENCE

The result of this operation, denoted (r - s), is a relation that includes all tuples that are in r but not in s.

Participating relations must be union-compatible.

r - s = { t / t ∈ r and t ∉ s }

Example

r
A  B  C
1  1  1
2  2  2
3  3  3

s
A  B  C
1  2  3
1  1  1
3  2  1

r - s
A  B  C
2  2  2
3  3  3

NOTE: The difference operator is not commutative, that is (in general) r - s ≠ s - r.

INTERSECTION

The result of this operation, denoted (r ∩ s), is a relation that includes all tuples that are present in both r and s.

Participating relations must be union-compatible.

r ∩ s = { t / t ∈ r and t ∈ s }

Example

r
A  B  C
1  1  1
2  2  2
3  3  3

s
A  B  C
1  2  3
1  1  1
3  2  1

r ∩ s
A  B  C
1  1  1
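The three set operators map directly onto Python's built-in set type. The sketch below replays the r and s tables from the examples, assuming each tuple is written in attribute order (A, B, C).

```python
# Relations as sets of tuples in attribute order (A, B, C).
r = {(1, 1, 1), (2, 2, 2), (3, 3, 3)}
s = {(1, 2, 3), (1, 1, 1), (3, 2, 1)}

union = r | s          # tuples in r, in s, or in both; duplicates vanish
difference = r - s     # tuples in r but not in s
intersection = r & s   # tuples present in both r and s

print(sorted(union))         # [(1, 1, 1), (1, 2, 3), (2, 2, 2), (3, 2, 1), (3, 3, 3)]
print(sorted(difference))    # [(2, 2, 2), (3, 3, 3)]
print(sorted(s - r))         # [(1, 2, 3), (3, 2, 1)], so r - s != s - r
print(sorted(intersection))  # [(1, 1, 1)]
```

Sorting is only for readable output; a relation itself has no tuple order.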

CARTESIAN PRODUCT

The operation, denoted (r × s), is also known as the cross product or cross join. The purpose of the operator is to concatenate rows from two relations, making all possible combinations of rows.

Consider relation schemas r(A1, A2, ..., An) and s(B1, B2, ..., Bm).

Relations r and s do not have to be union-compatible.

If r has |r| tuples and s has |s| tuples, then (r × s) will have a total of |r| * |s| tuples.

The resulting relation schema is (A1, A2, ..., An, B1, ..., Bm).

Example

r2
A  B  C
1  1  1
2  2  2
3  3  3

s2
D   E
10  a
20  b

r2 x s2
A  B  C  D   E
1  1  1  10  a
1  1  1  20  b
2  2  2  10  a
2  2  2  20  b
3  3  3  10  a
3  3  3  20  b
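As a rough sketch, the cross product can be reproduced with `itertools.product`, concatenating each pair of tuples. The data are the r2 and s2 tables from the example.

```python
from itertools import product

r2 = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]   # schema (A, B, C)
s2 = [(10, "a"), (20, "b")]              # schema (D, E)

# Concatenate every r2 tuple with every s2 tuple.
cross = [t + u for t, u in product(r2, s2)]

print(len(cross))   # 6 = 3 * 2
print(cross[0])     # (1, 1, 1, 10, 'a')
print(cross[-1])    # (3, 3, 3, 20, 'b')
```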

PROJECTION

The project operator extracts certain columns from the table and discards the other columns.

Syntax: Result = π_{Col}(Table)

where Col is the list of columns to be extracted from the Table.

Duplicate tuples in the resulting table are eliminated.

EXAMPLE

Evaluate the expressions
Temp1 = π_{A}(r)
Temp2 = π_{B, C}(r)

r
A  B    C
1  610  3
1  620  3
1  600  2
1  650  2
2  610  3
2  634  4

Temp1
A
1
2

Temp2
B    C
610  3
620  3
600  2
650  2
634  4
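A minimal sketch of projection, assuming rows are stored as Python dictionaries keyed by column name (the helper name `project` is hypothetical). Building the result as a set is what removes the duplicate tuples.

```python
def project(rows, cols):
    """Keep only the listed columns; the set drops duplicate tuples."""
    return {tuple(row[c] for c in cols) for row in rows}

r = [
    {"A": 1, "B": 610, "C": 3},
    {"A": 1, "B": 620, "C": 3},
    {"A": 1, "B": 600, "C": 2},
    {"A": 1, "B": 650, "C": 2},
    {"A": 2, "B": 610, "C": 3},
    {"A": 2, "B": 634, "C": 4},
]

temp1 = project(r, ["A"])
temp2 = project(r, ["B", "C"])

print(sorted(temp1))  # [(1,), (2,)]
print(sorted(temp2))  # [(600, 2), (610, 3), (620, 3), (634, 4), (650, 2)]
```

The six rows of r collapse to two rows in Temp1 and five in Temp2, because (610, 3) appears twice in r.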

SELECTION

The selection operator extracts certain rows from the table and discards the others. Retrieved tuples must satisfy a given filtering condition.

Syntax: Result = σ_{cond}(table)

where cond is a logical expression containing and, or, not operators on clauses of the form (table.column θ value) or (table.col1 θ table.col2), and θ ∈ { =, >, >=, <, <=, <> }.

Entire rows (with all of their columns) are retrieved when the condition is met.

EXAMPLE

Evaluate the expression
Temp1 = σ_{(B >= 620) and (C < 4)}(r)

r
A  B    C
1  610  3
1  620  3
1  600  2
1  650  2
2  610  3
2  634  4

Temp1
A  B    C
1  620  3
1  650  2
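Selection is essentially a row filter. The sketch below assumes rows are dictionaries and passes the condition as a Python function; the helper name `select` is hypothetical.

```python
def select(rows, cond):
    """Return the complete rows for which cond(row) is true."""
    return [row for row in rows if cond(row)]

r = [
    {"A": 1, "B": 610, "C": 3},
    {"A": 1, "B": 620, "C": 3},
    {"A": 1, "B": 600, "C": 2},
    {"A": 1, "B": 650, "C": 2},
    {"A": 2, "B": 610, "C": 3},
    {"A": 2, "B": 634, "C": 4},
]

# Condition (B >= 620) and (C < 4)
temp1 = select(r, lambda t: t["B"] >= 620 and t["C"] < 4)

print(temp1)
# [{'A': 1, 'B': 620, 'C': 3}, {'A': 1, 'B': 650, 'C': 2}]
```

The row (2, 634, 4) passes the test on B but fails C < 4, so it is discarded.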

RENAME

In some cases, we may want to rename the attributes of a relation, the relation name, or both.

The rename operator is useful to avoid situations in which a query produces columns with the same name (and perhaps different meanings).

Syntax: Result = ρ_{oldName ← newName}(table)

where oldName is a column in the table and newName is the new identification for the column.

Only the column name is changed; the data remains intact.

EXAMPLE

Evaluate the expression
Temp1 = ρ_{A ← Section, B ← Course, C ← Credits}(r)

r
A  B    C
1  610  3
1  620  3
1  600  2
1  650  2
2  610  3
2  634  4

Temp1
Section  Course  Credits
1        610     3
1        620     3
1        600     2
1        650     2
2        610     3
2        634     4

JOIN

The join operation, denoted by (Tab1 ⋈_{<join condition>} Tab2), is used to combine related tuples from two relations.

Join-condition format is: (Table1.Col1 θ Table2.Col2), where θ could be { =, >, >=, <, <=, <> }.

Restrictions of the form (Table.Col θ Value) can be and-ed or or-ed to the joining condition.

EXAMPLE

Consider relation schemas r(A, B, C) and s(D, E), and the expression:
Temp1 = ( r ⋈_{C = D} s )

r
A  B  C
1  1  1
2  2  2
3  3  3

s
D  E
1  a
2  b
2  c

Temp1
A  B  C  D  E
1  1  1  1  a
2  2  2  2  b
2  2  2  2  c
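One way to picture the join is as a nested loop that keeps each concatenated pair satisfying the condition. The sketch below assumes dictionary rows with disjoint column names; `theta_join` is a hypothetical helper, and the data replay the example with the condition C = D.

```python
def theta_join(left, right, cond):
    """Concatenate each pair of rows (t, u) for which cond(t, u) holds."""
    return [{**t, **u} for t in left for u in right if cond(t, u)]

r = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 2},
    {"A": 3, "B": 3, "C": 3},
]
s = [
    {"D": 1, "E": "a"},
    {"D": 2, "E": "b"},
    {"D": 2, "E": "c"},
]

temp1 = theta_join(r, s, lambda t, u: t["C"] == u["D"])

for row in temp1:
    print(row)
# {'A': 1, 'B': 1, 'C': 1, 'D': 1, 'E': 'a'}
# {'A': 2, 'B': 2, 'C': 2, 'D': 2, 'E': 'b'}
# {'A': 2, 'B': 2, 'C': 2, 'D': 2, 'E': 'c'}
```

The r-tuple (3, 3, 3) has no partner in s and therefore does not appear in the result.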

NATURAL JOIN

The natural join operation, denoted by (Tab1 * Tab2), is used to combine tuples of two relations under an equi-join.

Related columns must have the same name and domain.

Implicit join-conditions are: (Table1.Col = Table2.Col) for every column Col that the two tables share.

EXAMPLE

Consider relation schemas r(A, B, C) and s(B, C, D), and the expression:
Temp = ( r * s )

r
A  B  C
1  1  1
2  1  0
4  3  2

s
B  C  D
1  1  a
1  2  b
3  2  c
4  3  d

Temp
A  B  C  D
1  1  1  a
4  3  2  c

The implicit join-condition is (r.B = s.B) and (r.C = s.C).
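A natural join can be sketched by first finding the column names the two relations share and then requiring equality on all of them. The helper `natural_join` is hypothetical and assumes non-empty lists of dictionary rows with a uniform schema.

```python
def natural_join(left, right):
    """Equi-join on every column name that both relations share."""
    if not left or not right:
        return []
    shared = [c for c in left[0] if c in right[0]]
    return [
        {**t, **u}
        for t in left
        for u in right
        if all(t[c] == u[c] for c in shared)
    ]

r = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 1, "C": 0},
    {"A": 4, "B": 3, "C": 2},
]
s = [
    {"B": 1, "C": 1, "D": "a"},
    {"B": 1, "C": 2, "D": "b"},
    {"B": 3, "C": 2, "D": "c"},
    {"B": 4, "C": 3, "D": "d"},
]

temp = natural_join(r, s)
print(temp)
# [{'A': 1, 'B': 1, 'C': 1, 'D': 'a'}, {'A': 4, 'B': 3, 'C': 2, 'D': 'c'}]
```

Because the shared columns carry equal values in both rows, merging the dictionaries keeps a single copy of B and C, as the natural join requires.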

LEFT OUTER JOIN

The left outer join operation, denoted by (r ⟕_{<join condition>} s), is a special case of the general join.

LOJ keeps in the resulting table a representation of every tuple that appears in the first (or left) relation.

If no matching value for r is found in s, then the attributes of s appear in the result as null values.

EXAMPLE

Consider relation schemas r(A, B, C) and s(D, E), and the expression:
Temp1 = ( r ⟕_{A = D} s )

r
A  B  C
1  1  1
2  2  2
3  3  3

s
D  E
1  a
2  b
2  c

Temp1
A  B  C  D     E
1  1  1  1     a
2  2  2  2     b
2  2  2  2     c
3  3  3  null  null

NOTE:

Outer-join is not a primitive operator. It could be expressed as follows:

( r ⟕_{A = D} s ) = ( r ⋈_{A = D} s ) ∪ ( ( r - π_{Schema(r)}( r ⋈_{A = D} s ) ) × Null^L )

Where: Y = Schema(r) ∩ Schema(s), L = degree(s) - |Y|, and Null^L is a tuple of L null values.
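The padding behaviour can be sketched as follows. This is an illustrative nested-loop version, not an optimized algorithm: `left_outer_join` is a hypothetical helper, rows are dictionaries, and Python's `None` stands in for null.

```python
def left_outer_join(left, right, cond, right_cols):
    """Join left with right; unmatched left rows are padded with None."""
    result = []
    for t in left:
        matches = [{**t, **u} for u in right if cond(t, u)]
        if not matches:
            # No partner in the right relation: fill its columns with nulls.
            matches = [{**t, **{c: None for c in right_cols}}]
        result.extend(matches)
    return result

r = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 2, "B": 2, "C": 2},
    {"A": 3, "B": 3, "C": 3},
]
s = [
    {"D": 1, "E": "a"},
    {"D": 2, "E": "b"},
    {"D": 2, "E": "c"},
]

temp1 = left_outer_join(r, s, lambda t, u: t["A"] == u["D"], ["D", "E"])

print(len(temp1))   # 4
print(temp1[-1])    # {'A': 3, 'B': 3, 'C': 3, 'D': None, 'E': None}
```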

AGGREGATE FUNCTIONS

Originally proposed by A. Klug (1982) to extend the scope of relational algebra, allowing mathematical computation of summary functions.

Syntax: <grouping attributes> ℱ <function list> ( <relation name> )

Common functions are: MAX, MIN, AVG, SUM, COUNT.

Grouping attributes force a fragmentation of the relation; the function is computed in each independent group.

Output consists of the grouping attributes and the result of the summary functions on each group.

If no grouping field is given, the functions apply to the entire table.

EXAMPLE

Compute Temp = A ℱ Sum(B), Max(C) ( r )

r
A  B   C
1  10  1
1  2   5
2  3   3
3  6   10
3  5   7

Temp (A is the group-by field; Sum_B and Max_C are the summary data)
A  Sum_B  Max_C
1  12     5
2  3      3
3  11     10
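Grouping followed by a summary function per group can be sketched as below. The helper `aggregate` and the way functions are passed (output name mapped to a function and an input column) are assumptions made for this illustration.

```python
from collections import defaultdict

def aggregate(rows, group_col, functions):
    """functions maps an output name to (aggregate function, input column)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)   # fragment the relation by group

    result = []
    for key in sorted(groups):
        out = {group_col: key}
        for name, (func, col) in functions.items():
            out[name] = func(member[col] for member in groups[key])
        result.append(out)
    return result

r = [
    {"A": 1, "B": 10, "C": 1},
    {"A": 1, "B": 2, "C": 5},
    {"A": 2, "B": 3, "C": 3},
    {"A": 3, "B": 6, "C": 10},
    {"A": 3, "B": 5, "C": 7},
]

temp = aggregate(r, "A", {"Sum_B": (sum, "B"), "Max_C": (max, "C")})

for row in temp:
    print(row)
# {'A': 1, 'Sum_B': 12, 'Max_C': 5}
# {'A': 2, 'Sum_B': 3, 'Max_C': 3}
# {'A': 3, 'Sum_B': 11, 'Max_C': 10}
```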

DIVISION

The division operation, denoted by (r / s), is useful when you need a mechanism to identify the tuples of some table that are related to each and every one of the tuples of a second group.

EXAMPLE

Consider relation schemas r(A, B) and s(B), and the expression:
Temp1 = ( r / s )

r
A  B
1  1
1  2
1  3
1  4
2  1
2  3
3  3

s
B
1
2
3

Temp1
A
1

NOTE

Division is not a primitive operation. It could be expressed as:

r / s = π_{A}(r) - π_{A}( ( π_{A}(r) × s ) - r )

For table schemas r(A, B) and s(B), the algebraic expression r[A,B] / s[B] selects the A-values from the dividend table r[A,B] whose B-values are a super-set of the B-values held in the divisor table s[B].
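Division can be sketched two ways: directly from the "related to every tuple of s" reading, and through the standard identity r / s = π_A(r) - π_A((π_A(r) × s) - r). Both helper names are hypothetical; r is taken as a set of (A, B) pairs and s as a set of B-values.

```python
def divide(r, s):
    """A-values of r that are paired with every B-value in s."""
    a_values = {a for a, _ in r}
    return {a for a in a_values if all((a, b) in r for b in s)}

def divide_by_identity(r, s):
    """Same result via pi_A(r) - pi_A((pi_A(r) x s) - r)."""
    a_values = {a for a, _ in r}                        # pi_A(r)
    all_pairs = {(a, b) for a in a_values for b in s}   # pi_A(r) x s
    missing = all_pairs - r                             # pairs r lacks
    disqualified = {a for a, _ in missing}              # pi_A of those
    return a_values - disqualified

r = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 3)}
s = {1, 2, 3}

print(divide(r, s))              # {1}
print(divide_by_identity(r, s))  # {1}
```

A-value 2 is dropped because the pair (2, 2) is missing from r, and A-value 3 because (3, 1) and (3, 2) are missing.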

PACK

Assume A is an attribute in Schema(r). The PackA(r) operator transforms the A-values into a 'nested' representation.

A nested field is a set of related atomic values.

EXAMPLE

Consider relation schema r(A, B, C) and the expression:
temp1 = PackC(r)

r
A   B   C
b1  40  a1
b1  40  a2
b2  50  a3
b3  60  a4
b3  60  a2
b3  60  a5
b4  60  a6

temp1 = PackC(r)
A   B   C
b1  40  {a1, a2}
b2  50  {a3}
b3  60  {a2, a4, a5}
b4  60  {a6}

Consider relation schema s(A, B, C) and the expression:
temp2 = PackC(s)

s
A   B  C
m1  1  {a1, a2}
m1  1  {a3}
m2  2  {a4}
m2  1  {a5, a6}
m2  2  {a4, a7, a8}

temp2 = PackC(s)
A   B  C
m1  1  {a1, a2, a3}
m2  2  {a4, a7, a8}
m2  1  {a5, a6}

PACK (cont…)

Let A be one of the n attributes in Schema(A1…An). Assume the relation r is defined over Schema(A1…An).

Let CA = Schema(A1…An) - {A}. Therefore |CA| = n-1.

For each (n-1)-tuple g ∈ π_{CA}(r) we define the tuple Wg through its components Wg[CA] and Wg[A] as follows:

Wg[CA] = g

Wg[A] = { t[A] / t ∈ r and t[CA] = g } if A is atomic, and

Wg[A] = { x / (∃t) t ∈ r and t[CA] = g and x ∈ t[A] } otherwise.

Then PackA(r) = { Wg / g ∈ π_{CA}(r) }

Therefore, the Pack operator converts each set of r-tuples that agree on the (n-1) attributes of CA into a single tuple.
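The Pack definition can be sketched as a group-by on the remaining attributes CA that unions the A-values of each group. The helper `pack` is hypothetical; it treats Python sets as nested values, so it covers both the atomic case and the already-nested case.

```python
from collections import defaultdict

def pack(rows, attr):
    """Group rows on every column except attr and collect attr's values."""
    groups = defaultdict(set)
    for row in rows:
        # g: the (n-1)-tuple of the remaining attributes CA.
        rest = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        value = row[attr]
        if isinstance(value, (set, frozenset)):
            groups[rest] |= value      # nested value: union its elements
        else:
            groups[rest].add(value)    # atomic value: add it to the set
    return [{**dict(rest), attr: frozenset(vals)} for rest, vals in groups.items()]

r = [
    {"A": "b1", "B": 40, "C": "a1"},
    {"A": "b1", "B": 40, "C": "a2"},
    {"A": "b2", "B": 50, "C": "a3"},
    {"A": "b3", "B": 60, "C": "a4"},
    {"A": "b3", "B": 60, "C": "a2"},
    {"A": "b3", "B": 60, "C": "a5"},
    {"A": "b4", "B": 60, "C": "a6"},
]

temp1 = pack(r, "C")
for row in temp1:
    print(row["A"], row["B"], sorted(row["C"]))
# b1 40 ['a1', 'a2']
# b2 50 ['a3']
# b3 60 ['a2', 'a4', 'a5']
# b4 60 ['a6']
```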

UNPACK

Unpack is the counterpart of the Pack operator. When applied to the set-valued attribute A of a relation r, this operator transforms the single non-atomic version of the tuple into a group of records in which the attribute A is atomic.

EXAMPLE

Consider relation schema r(A, B, C) and the expression:
temp1 = UnpackC(r)

r
A   B  C
b1  1  {a1, a2}
b2  2  {a3}
b3  2  {a2, a4, a5}
b4  3  {a6}

temp1 = UnpackC(r)
A   B  C
b1  1  a1
b1  1  a2
b2  2  a3
b3  2  a4
b3  2  a2
b3  2  a5
b4  3  a6

Let A be one of the n attributes in Schema(A1…An). Assume the relation r is defined over Schema(A1…An).

Let CA = Schema(A1…An) - {A}. Therefore |CA| = n-1.

UPA({t}) = { t } if A is atomic, and

UPA({t}) = { t' / (t'[A] ∈ t[A]) and (t'[CA] = t[CA]) } otherwise.

Then UnpackA(r) = ∪_{t ∈ r} UPA({t})

If A is atomic then UPA(r) = r; otherwise UPA(r) maps each tuple t in r into a set of (decompressed) tuples such that each element in t[A] becomes the atomic A-value of a new decompressed tuple.
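A small sketch of Unpack, assuming nested values are Python sets and rows are dictionaries (the helper name `unpack` is hypothetical). The input uses (b3, 2) for the three-element row, matching the unpacked result shown in the example.

```python
def unpack(rows, attr):
    """Emit one row per element of a set-valued attr; atomic rows pass through."""
    result = []
    for row in rows:
        value = row[attr]
        if isinstance(value, (set, frozenset)):
            # One decompressed tuple per element; the other columns are copied.
            result.extend({**row, attr: x} for x in sorted(value))
        else:
            result.append(row)
    return result

r = [
    {"A": "b1", "B": 1, "C": {"a1", "a2"}},
    {"A": "b2", "B": 2, "C": {"a3"}},
    {"A": "b3", "B": 2, "C": {"a2", "a4", "a5"}},
    {"A": "b4", "B": 3, "C": {"a6"}},
]

temp1 = unpack(r, "C")
for row in temp1:
    print(row["A"], row["B"], row["C"])
# b1 1 a1
# b1 1 a2
# b2 2 a3
# b3 2 a2
# b3 2 a4
# b3 2 a5
# b4 3 a6
```

Elements are sorted only to make the output deterministic; the resulting relation is the same set of tuples in any order.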


EXAMPLE QUERIES

QUERY 1. Retrieve the name and address of all employees

who work in the 'Research' department.

QUERY 2. For every project located in 'Cleveland', list the

project number, the controlling department number, and the

department manager's last name, address, and birthdate.

QUERY 3. Find the name of employees who work on all

projects controlled by department number 5.

QUERY 4. Make a list of project numbers for projects that

involve an employee whose last name is 'Smith', either as a

worker or as a manager of the department that controls the

project.

QUERY 5. List the name of all employees with two or

more dependents.

QUERY 6. Retrieve the name of employees who have no dependents.

QUERY 7. List the name of managers who have at least one dependent.


Company Database

DEPARTMENT (DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)

DEPENDENT (ESSN, DEPENDENT_NAME, SEX, BDATE, RELATIONSHIP)

DEPT_LOCATIONS (DNUMBER, DLOCATION)

EMPLOYEE (FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO)

PROJECT (PNAME, PNUMBER, PLOCATION, DNUM)

WORKS_ON (ESSN, PNO, Hours)


IST 331 Brief Notes on Relational Algebra, V. Matos.

Consider the relation schema of the COMPANY database given below

EMPLOYEE (fname, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno)

DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate)

PROJECT (pname, pnumber, plocation, dnum)

WORKS_ON (essn, pno, hours)

DEPENDENT (essn, dependent-name, sex, bdate, relationship)

Operator, example, and comments:

Selection
Answer = σ_{(sex = 'F') and (salary >= 25000)}(Employee)
Find the female employees earning at least $25K.

Projection
Answer = π_{ssn, Fname}(Employee)
Get the Social Sec. No. and first name of each employee.

Join
Answer = Employee ⋈_{(L.dno = R.dnumber)} Department
Merge employee and department records according to matching dept. numbers.

Union
Answer = σ_{(sex = 'F') and (salary = 25000)}(Employee) ∪ σ_{(sex = 'M') and (salary >= 35000)}(Employee)
Get the male employees earning at least $35K and the female employees whose salary is exactly $25K.

Minus
Answer = Employee - σ_{dno = 4}(Employee)
Get the employees who do not work for department No. 4.

Intersection
Answer = σ_{sex = 'F'}(Employee) ∩ σ_{dno = 4}(Employee)
Get all the female employees who work for dept. 4.

Aggregation
Answer = dno, sex ℱ average(salary) (Employee)
Find the average salary of employees grouping by sex and dept. no. Put the results in the table defined as: Answer(dno, sex, average_salary).

Division
Answer = π_{essn, pno}(WorksOn) / π_{Pnumber}( σ_{plocation = 'Clev'}(Project) )
Get the SSN of employees working in each one of the projects located in Cleveland.

Rename
Answer = ρ_{fname, lname ← First, Last}(Employee)
Change the labels "fname" and "lname" in the Employee table to "First" and "Last".

Q01. Get the SSN and Last name of each of the female managers

Answer1 = π_{ssn, lname}( σ_{sex = 'F'}(Employee) ⋈_{(L.ssn = R.MgrSsn)} Department ) [1]

Q02. Get the Social Sec. No. of those employees who are married.

Answer2 = π_{essn}( σ_{(relationship = 'Spouse')}(Dependent) )

[1] Observation: The notation (e1 ⋈_{(L.a = R.b)} e2) merges the tables produced by the expressions e1 and e2. The match is dictated by the joining condition (L.a = R.b). The fragment L.a identifies the a-column as part of the table produced by the expression e1. The L and R qualifications indicate whether the source columns are located to the left or right side of the join operator ⋈.

Q03. Get the last name of the married female managers. Rename the column to "Mgr Name"

Answer3 = ρ_{lname ← "Mgr Name"}( π_{Lname}( Answer1 ⋈_{(L.ssn = R.essn)} Answer2 ) )

Q04. Get the last name of married employees who have at least one daughter and one son.

Boys = π_{essn}( σ_{(relationship = 'Son')}(Dependent) )

Girls = π_{essn}( σ_{(relationship = 'Daughter')}(Dependent) )

Married = π_{essn}( σ_{(relationship = 'Spouse')}(Dependent) )

TheSsn = Married ∩ Boys ∩ Girls

Answer = π_{Lname}( Employee ⋈_{L.ssn = R.essn} TheSsn )

Q05. Get the last name of married employees who have no children.

TheSsn = Married - (Boys ∪ Girls)

Answer = π_{Lname}( Employee ⋈_{L.ssn = R.essn} TheSsn )

Q06. Get the last name of married employees who only have daughters.

TheSsn = (Married ∩ Girls) - Boys

Answer = π_{Lname}( Employee ⋈_{L.ssn = R.essn} TheSsn )

Q07. Get the last name and salary of each employee as well as that of their corresponding (direct) supervisor.

Answer = π_{EmpSsn, EmpSalary, BossSsn, BossSalary}( ρ_{ssn, salary ← EmpSsn, EmpSalary}(Employee) ⋈_{(L.superSsn = R.BossSsn)} ρ_{ssn, salary ← BossSsn, BossSalary}(Employee) )

Q08. Get the last name of employees who work on five or more projects.

theTally = essn ℱ count(pno) (worksOn)

Answer = π_{Lname}( Employee ⋈_{L.ssn = R.essn} σ_{(count_pno >= 5)}(theTally) )


Relational Algebra – Practice Test

Last Name: ___________________________ First Name:__________________

Consider the relation schema of the COMPANY database given below

EMPLOYEE (fname, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno) KEY: ssn

DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate) KEY: dnumber

PROJECT (pname, pnumber, plocation, dnum) KEY: pnumber

WORKS_ON (essn, pno, hours) KEY: (essn, pno)

DEPENDENT (essn, dependent-name, sex, bdate, relationship) KEY: (essn, dependent-name)

Formulate the following questions in the Relational Algebra query language:

1. Give the last name of those employees who work on any project(s) where there are more female than male employees.

2. Give the last name of those female managers who work in each of the projects located in Cleveland.