
Database System Concepts, 7th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

Functional Dependencies and Normal Forms

(Part 1)

http://www.db-book.com/


Features of Good Relational Designs

Suppose we have the following table, where ID is the key

There is repetition of information

If we want to update the building of the Comp. Sci. dept., we need to do it in all rows where Comp. Sci. appears

Need to use null values (if we add a new department with no instructors)


Functional Dependencies

Suppose we have the following table, where ID is the key

The data above is repeated, but there is a pattern: the department name uniquely determines its building and budget.

This is a property of the real-world environment we want to model, and not only of the specific table above.

In this case, we say that dept_name functionally determines building and budget.


Functional Dependencies (Cont.)

There are usually a variety of constraints (rules) on the data in the real world.

For example, some of the constraints that are expected to hold in a university database are:

• Students and instructors are uniquely identified by their ID.

• Each student and instructor has only one name.

• Each instructor and student is (primarily) associated with only one department.

• Each department has only one value for its budget, and only one associated building.


Functional Dependencies (Cont.)

A legal instance of a database (i.e., what we would like to admit as a valid instance of the database) is one where all the real-world constraints are satisfied.

Some real-world constraints can be expressed via so-called functional dependencies.


Functional Dependencies Definition

Let R(U) be a relation schema. A functional dependency of R is an expression of the form

X -> Y

where X ⊆ U and Y ⊆ U

An instance r of R satisfies the functional dependency above if, whenever any two tuples t1 and t2 of r agree on the attributes X, they also agree on the attributes Y. That is,

t1[X] = t2[X] ⇒ t1[Y] = t2[Y]

Example: Consider R(A,B) with the following instance r.

A  B
1  4
1  5
3  7

The instance satisfies B -> A, but not A -> B.

To specify that our legal instances of the relation R(U) must satisfy a certain set of functional dependencies F, we write <R(U),F>
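As a rough illustration of the satisfaction condition, the following Python sketch compares every pair of tuples of an instance. The function name `satisfies` and the representation of tuples as dictionaries are assumptions made for this example only.

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_lhs and not agree_rhs:
            return False  # t1 and t2 witness a violation of lhs -> rhs
    return True

# The instance r of R(A, B) from the example above.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

b_determines_a = satisfies(r, ["B"], ["A"])  # checks B -> A
a_determines_b = satisfies(r, ["A"], ["B"])  # checks A -> B
print(b_determines_a, a_determines_b)
```

On this instance the first check succeeds and the second fails, because the two tuples with A = 1 disagree on B.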


Functional Dependencies

Suppose we have the following table

<R(ID,name,salary,dept_name,building,budget),

• ID -> name,salary,dept_name,building,budget,

• dept_name -> building,budget >


Decomposition

Functional dependencies highlight the parts of a relation where repetition of information occurs (redundancy).

We can try to avoid the repetition by decomposing the relation R into two or more relations. Functional dependencies give us suggestions on how to do it.

< Prof(ID,name,salary,dept_name), ID -> name,salary,dept_name >

< Dept(dept_name,building,budget), dept_name -> building,budget >

To reduce redundancy, we made dept_name a key of a new relation Dept, and the relation Prof only needs to refer to it with the key.

If we join the two tables together, we get the original table back.

However, we cannot decompose a relation arbitrarily; otherwise, the above property might be lost.


A Lossy Decomposition


Lossless Decomposition

Let <R(U), F> be a relation schema. A decomposition of <R(U), F> is a set of relation schemas

<R1(X1), F1>, …, <Rn(Xn), Fn>

where U = X1 ∪ ⋯ ∪ Xn

We say that the decomposition is a lossless decomposition if there is no loss of information by replacing R with the n relations

Formally, for every instance r of R that satisfies F

• r = Π_X1(r) ⋈ ⋯ ⋈ Π_Xn(r)
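The condition can be tried out on a concrete instance with a short Python sketch: project the instance onto each attribute set, join the projections, and compare with the original. The helper names and the sample data are hypothetical, and the check covers only one instance, whereas the definition quantifies over every instance that satisfies F.

```python
def relation(dict_rows):
    """Build a relation (a set of tuples) from a list of dicts."""
    return {frozenset(d.items()) for d in dict_rows}

def project(rows, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    joined = set()
    for t1 in left:
        for t2 in right:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

def rejoins_to_original(rows, *attribute_sets):
    """True if joining the projections of rows gives back exactly rows."""
    result = project(rows, attribute_sets[0])
    for attrs in attribute_sets[1:]:
        result = natural_join(result, project(rows, attrs))
    return result == rows

# Made-up sample data: two instructors who happen to share a name.
r = relation([
    {"ID": 1, "name": "Kim", "dept_name": "Physics", "building": "Watson"},
    {"ID": 2, "name": "Kim", "dept_name": "Music", "building": "Packard"},
])

# Splitting on dept_name (a key of the second part) gives r back.
good = rejoins_to_original(r, {"ID", "name", "dept_name"},
                           {"dept_name", "building"})
# Splitting on name does not: the join pairs each Kim with both departments.
bad = rejoins_to_original(r, {"ID", "name"},
                          {"name", "dept_name", "building"})
print(good, bad)
```

In the second split the join produces four tuples instead of two, so information about which instructor belongs to which department is lost.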


Normal Forms

Note that it is not always advisable to decompose a relation:

• Splitting a relation into multiple relations might make some queries less efficient (we need to do more joins).

It is up to us, as designers, to understand how much our queries are affected, and to decide whether we want to decompose or not.

If we decide to decompose our relation schema, then:

• The decomposition must be lossless (mandatory).

• The redundancies should be eliminated, or at least reduced as much as possible.

• The functional dependencies of the original schema should be preserved in the decomposition, if possible.

To achieve the above properties, we decompose our schema into another schema in a so-called normal form.

To define normal forms, we first need some auxiliary notions.


Closure of a Set of Functional Dependencies

Given a set F of functional dependencies, there are certain other functional dependencies that are logically implied by F.

• If A -> B and B -> C, then we can infer that A -> C

• etc.

A set F of FDs logically implies an FD 𝑋 → 𝑌 if every instance that

satisfies F also satisfies 𝑋 → 𝑌

The set of all functional dependencies logically implied by F is the

closure of F.

We denote the closure of F by F+.

How can we compute the closure of a set F?



We can compute F+, the closure of F, by repeatedly applying Armstrong’s Axioms:

• Reflexive rule: if Y ⊆ X, then X -> Y

• Augmentation rule: if X -> Y, then XZ -> YZ

• Transitivity rule: if X -> Y and Y -> Z, then X -> Z

These rules are

• Sound -- generate only functional dependencies that actually hold, and

• Complete -- generate all functional dependencies that hold.

Additional rules:

• Union rule: If X -> Y holds and X -> Z holds, then X -> YZ holds.

• Decomposition rule: If X -> YZ holds, then X -> Y holds and X -> Z holds.

The above rules can be inferred from Armstrong’s axioms.



Example.

• F = {A -> BCDEFGH, CE -> A, BD -> E}

CE -> A and A -> BCDEFGH imply CE -> BCDEFGH (transitivity)

BD -> E implies BDC -> CE (augmentation with C)

BDC -> CE and CE -> A imply BDC -> A (transitivity)


Closure of a set of Attributes

Another notion we are going to need is the closure of a set of attributes. This is useful for finding the superkeys of a relation.

Given a set of functional dependencies F and a set of attributes X, the closure of X, denoted X+, is the set of all attributes that are functionally determined by X. How do we compute it?

• Start from X+ = X.

• If there is an FD Z -> W in F with Z ⊆ X+, add W to X+.

• Repeat until X+ does not change.

Example. F = {A -> B, B -> C, AC -> D}. What is A+?

Starting from A+ = A, we derive B (from A -> B) and then C (from B -> C), obtaining ABC. Then AC -> D applies, and thus A+ = ABCD. Hence A -> ABCD.
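The three steps above translate almost directly into code. The following Python sketch is one possible rendering; the function name `closure` and the encoding of each FD as a pair of strings of single-letter attribute names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (here: strings of single-letter attributes).
    """
    result = set(attrs)              # start from X+ = X
    changed = True
    while changed:                   # repeat until X+ does not change
        changed = False
        for lhs, rhs in fds:
            # Z -> W applies when Z is contained in the current X+.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)   # add W to X+
                changed = True
    return result

# The example above: F = {A -> B, B -> C, AC -> D}.
F = [("A", "B"), ("B", "C"), ("AC", "D")]
a_plus = closure("A", F)
print(sorted(a_plus))  # ['A', 'B', 'C', 'D']
```

The loop stops as soon as a full pass over F adds nothing, so it runs at most once per attribute that can be added, plus one final pass.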


Closure of a set of Attributes (Cont.)

Consider <R(U), F>. The closure of a set X of attributes is very useful to check if X is a superkey of the relation R.

Just compute X+ and check whether X+ = U, i.e., whether X -> U.

A superkey X might not be a (minimal) key:

• We might be able to remove some attributes from X, and still derive U.

Example. <R(A,B,C,D), F = {A -> B, B -> C, AC -> D}>.

• A+ = ABCD. A is a superkey (in this case, even a key).

• (AC)+ = ABCD. AC is a superkey, but not a key.

• B+ = BC. B is not a superkey (and thus not a key either).
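The superkey and key tests can be sketched in Python on top of an attribute-closure routine. The names `is_superkey` and `is_key` are hypothetical, and attributes are again assumed to be single letters. For minimality it is enough to try removing one attribute at a time, since any superset of a superkey is itself a superkey.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    """X is a superkey when X+ = U."""
    return closure(attrs, fds) == set(all_attrs)

def is_key(attrs, all_attrs, fds):
    """X is a key when it is a superkey and no attribute can be dropped."""
    if not is_superkey(attrs, all_attrs, fds):
        return False
    return not any(
        is_superkey(set(attrs) - {a}, all_attrs, fds) for a in attrs
    )

# The example above: R(A,B,C,D) with F = {A -> B, B -> C, AC -> D}.
U = "ABCD"
F = [("A", "B"), ("B", "C"), ("AC", "D")]
for x in ("A", "AC", "B"):
    print(x, is_superkey(x, U, F), is_key(x, U, F))
```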


Trivial Functional Dependencies

A functional dependency X -> Y is trivial if Y ⊆ X.

A trivial functional dependency is satisfied by all instances of a relation

Example:

• ID, name -> ID

• name -> name


Boyce-Codd Normal Form

We now have all the ingredients to define our first normal form.

A relation schema <R(U), F> is in Boyce-Codd Normal Form (BCNF) if for all functional dependencies X -> Y in F+ that are not trivial

• X is a superkey of R

Intuition: the only kind of redundancy that any relevant FD can describe is the one where data is determined by a key of the relation, and nothing else.

Since a key value is unique to each tuple, there is no such redundancy in a schema in BCNF.


Boyce-Codd Normal Form (Cont.)

Example schema that is not in BCNF:

< ProfDept(ID, name, salary, dept_name, building, budget), F >

The set of functional dependencies is

• F = { ID -> name,salary,dept_name,building,budget,

dept_name -> building,budget }

The only key is ID. The second dependency violates the BCNF condition, because dept_name is not a superkey.

If we decompose the relation schema into:

< Prof(ID,name,salary,dept_name), ID -> name,salary,dept_name >

< Dept(dept_name,building,budget), dept_name -> building,budget >

then the above two schemas are in BCNF. The decomposition is also lossless.
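The BCNF test on this example can be sketched in Python as follows. The helper `bcnf_violations` is a hypothetical name, and the sketch inspects only the FDs listed in F rather than all of F+. This is a simplification that is generally regarded as sufficient when testing the schema on which F itself is stated, but not when testing a relation obtained by decomposition.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(all_attrs, fds):
    """Return the listed FDs that are non-trivial and whose left side is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        if not trivial and closure(lhs, fds) != set(all_attrs):
            violations.append((lhs, rhs))
    return violations

# ProfDept and its two FDs, as in the example above.
prof_dept = ("ID", "name", "salary", "dept_name", "building", "budget")
F = [
    (("ID",), ("name", "salary", "dept_name", "building", "budget")),
    (("dept_name",), ("building", "budget")),
]
print(bcnf_violations(prof_dept, F))

# The two schemas of the decomposition.
prof = ("ID", "name", "salary", "dept_name")
dept = ("dept_name", "building", "budget")
print(bcnf_violations(prof, [(("ID",), ("name", "salary", "dept_name"))]))
print(bcnf_violations(dept, [(("dept_name",), ("building", "budget"))]))
```

Only dept_name -> building,budget is reported for ProfDept; both decomposed schemas yield an empty list.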


Minimal cover

Computing the closure of F can be very hard and time consuming, as it can contain exponentially many FDs.

We solve the issue by focusing on a simpler equivalent version of F.

Consider a set F of functional dependencies. A minimal cover of F is a set of functional dependencies Fmin such that:

• Fmin+ = F+ (i.e., the two sets are equivalent)

• All functional dependencies in Fmin are of the form X -> A, where A is a single attribute

• If we remove one FD, or an attribute from the left side of an FD in Fmin, then Fmin is no longer equivalent to F, i.e., Fmin+ ≠ F+

So, Fmin contains a minimal amount of “information” to describe all the FDs implied by F.


Minimal cover

One can prove that a schema <R(U), F> is in BCNF iff the BCNF conditions are satisfied by 𝐹𝑚𝑖𝑛. So, we can focus on 𝐹𝑚𝑖𝑛.

How do we compute a minimal cover of F?

• Step 1: Normalize each X -> ABC… in F, into X -> A, X -> B, X -> C, …

• Step 2: Until nothing more changes,

if there is an FD XA -> B, with 𝐴 ∈ 𝑋 +, then A is redundant, and can be removed.

• Step 3: If there is an FD X -> A such that 𝑋 + contains A, even if X -> A is not used to construct 𝑋 +, then X -> A is redundant, and can be removed
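The three steps can be sketched in Python as below. This is one possible reading of the procedure, not a definitive implementation: the name `minimal_cover`, the single-letter attribute encoding, and the order in which attributes and FDs are tried are all assumptions. A different order may give a different, equally valid, minimal cover.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right-hand sides so every FD has the form X -> A.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]

    # Step 2: remove an attribute from a left-hand side when the rest of
    # that side already determines it; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(cover):
            for attr in sorted(lhs):
                rest = lhs - {attr}
                if attr in closure(rest, cover):
                    cover[i] = (rest, rhs)
                    changed = True
                    break
            if changed:
                break

    # Step 3: remove an FD when its right-hand side is still derivable
    # from its left-hand side using only the other FDs.
    reduced = list(dict.fromkeys(cover))  # drop duplicates, keep order
    for fd in list(reduced):
        lhs, rhs = fd
        others = [g for g in reduced if g != fd]
        if rhs <= closure(lhs, others):
            reduced = others
    return set(reduced)

def show(fd_set):
    """Render FDs as sorted strings such as 'A -> B' for printing."""
    return sorted("".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in fd_set)

# F = {A -> B, B -> C, AC -> D}, the set used in the attribute-closure example.
F = [("A", "B"), ("B", "C"), ("AC", "D")]
print(show(minimal_cover(F)))  # ['A -> B', 'A -> D', 'B -> C']
```

Here C is dropped from AC -> D in Step 2 because A alone already determines C, and no FD is removed in Step 3.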


Minimal cover

Example. <R(A,B,C,D,E), F>, with

F = {A -> BCE, CDB -> A, CD -> E, E -> B}

First, normalize:

A -> B, A -> C, A -> E, CDB -> A, CD -> E, E -> B

Remove left attributes:

CD derives CDEB (via CD -> E and E -> B), so B is redundant in CDB -> A, which becomes CD -> A

C+ = C (nothing to do)

D+ = D (nothing to do)

Remove redundant FDs:

A can derive B without using A -> B (it can derive ACEB), so A -> B is redundant.

CD can derive E without using CD -> E (via CD -> A and A -> E), so CD -> E is redundant as well. No other FDs are redundant.

Fmin = {A -> C, A -> E, CD -> A, E -> B}


Algorithm for BCNF decomposition

Algorithm: BCNF decomposition

• Input: <R(U), F> (where F is a minimal cover)

• Output: a decomposition of <R(U), F> in BCNF that is lossless

Choose some FD X -> A in F that violates the BCNF conditions.

Compute Y = X+ ∖ X and Z = U ∖ XY

Construct the two relation schemas:

• <R1(XY), (Π_XY F+)min>, <R2(XZ), (Π_XZ F+)min>

If one of the two schemas is not yet in BCNF, decompose it again.


Algorithm for BCNF decomposition

Example. <R(A,B,C), F = {A -> B, B -> C}>

F is already a minimal cover (nothing to remove).

The only key is A (because it is the only minimal set of attributes functionally determining all the others).

A -> B satisfies the BCNF condition, but B -> C does not.

So, we compute all attributes that B can derive: B+ = BC

We now split R into two relations:

• one relation has attributes B+ = BC,

• the other has all the remaining attributes (A),

• B must stay in both, to allow the two relations to join.

<R1(B,C), B -> C>, <R2(A,B), A -> B>

Both schemas are in BCNF. Done.
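A compact Python sketch of the decomposition is given below, under stated assumptions: the function names are hypothetical, the projection of F+ onto a sub-schema is computed by brute force over all attribute subsets (exponential, so only suitable for small examples), and the projected FDs are not reduced to a minimal cover. The superkey test does not depend on which cover is used.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project_fds(attrs, fds):
    """FDs of F+ over attrs only: for each subset X, X -> (X+ restricted to attrs) minus X."""
    attrs = frozenset(attrs)
    projected = []
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(sorted(attrs), size):
            rhs = (closure(lhs, fds) & attrs) - set(lhs)
            if rhs:
                projected.append((frozenset(lhs), frozenset(rhs)))
    return projected

def bcnf_decompose(attrs, fds):
    """Return a list of attribute sets, one per relation of the decomposition."""
    attrs = frozenset(attrs)
    for lhs, rhs in fds:
        x_plus = closure(lhs, fds) & attrs
        if not set(rhs) <= set(lhs) and x_plus != attrs:
            # lhs -> rhs violates BCNF: split into R1(XY) and R2(XZ).
            r1 = x_plus                             # XY = X+
            r2 = frozenset(lhs) | (attrs - x_plus)  # XZ = X plus the rest
            return (bcnf_decompose(r1, project_fds(r1, fds))
                    + bcnf_decompose(r2, project_fds(r2, fds)))
    return [attrs]  # no violating FD: already in BCNF

# The example above: R(A,B,C) with F = {A -> B, B -> C}.
parts = bcnf_decompose("ABC", [("A", "B"), ("B", "C")])
print(sorted("".join(sorted(p)) for p in parts))  # ['AB', 'BC']
```

Each recursive call works on a strictly smaller attribute set, so the recursion terminates. Which violating FD is picked first can change the resulting decomposition when several FDs violate BCNF.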
