Database System Concepts, 7th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Functional Dependencies and Normal Forms
(Part 1)
Features of Good Relational Designs
Suppose we have the following table, where ID is the key
There is repetition of information
If we want to update the building of the Comp. Sci. dept., we need to do it
in all rows where Comp. Sci. appears
Need to use null values (if we add a new department with no instructors)
Functional Dependencies
Suppose we have the following table, where ID is the key
The data above is repeated, but there is a pattern… the department name
uniquely determines its building and budget.
This is a property of the real-world environment we want to model, and not of
only the specific table above.
In this case, we have that dept_name functionally determines building and
budget
There are usually a variety of constraints (rules) on the data in the real
world.
For example, some of the constraints that are expected to hold in a
university database are:
• Students and instructors are uniquely identified by their ID.
• Each student and instructor has only one name.
• Each instructor and student is (primarily) associated with only one
department.
• Each department has only one value for its budget, and only one
associated building.
Functional Dependencies (Cont.)
A legal instance of a database (i.e., what we would like to admit as a valid
instance of the database) is one where all the real-world constraints are
satisfied.
Some real-world constraints can be expressed via so-called functional
dependencies
Functional Dependencies Definition
Let R(U) be a relation schema. A functional dependency of R is an expression of the form
X -> Y
where X ⊆ U and Y ⊆ U.
An instance r of R satisfies the functional dependency above if, whenever any two tuples t1 and t2 of r agree on the attributes X, they also agree on the attributes Y. That is,
t1[X] = t2[X] ⇒ t1[Y] = t2[Y]
Example: Consider R(A,B) with the following instance r.
A B
1 4
1 5
3 7
The instance satisfies B -> A, but not A -> B.
To specify that our legal instances of the relation R(U) must satisfy a certain set of functional dependencies F, we write <R(U),F>.
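The definition can be checked mechanically on a concrete instance. The following is a minimal sketch, assuming rows are represented as Python dictionaries; the function name `satisfies` is hypothetical and not part of the slides. It is applied to the instance r of R(A,B) from the example.

```python
def satisfies(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}  # maps each lhs-value tuple to the rhs-value tuple first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two rows agree on lhs but differ on rhs
    return True


# The instance r of R(A, B) from the example.
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(satisfies(r, ["B"], ["A"]))  # True:  B -> A holds in r
print(satisfies(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
```

Note that such a check only tells us whether one particular instance satisfies a dependency; whether the dependency should hold for all legal instances is a property of the real-world environment being modeled.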
Functional Dependencies
Suppose we have the following table
<R(ID,name,salary,dept_name,building,budget),
• ID -> name,salary,dept_name,building,budget,
• dept_name -> building,budget >
Decomposition
Functional dependencies highlight the parts of a relation where repetition
of information occurs (redundancy).
We can try to avoid the repetition by decomposing the relation R into two
or more relations. Functional dependencies give us suggestions on how to
do it.
< Prof(ID,name,salary,dept_name), ID -> name,salary,dept_name >
< Dept(dept_name,building,budget), dept_name -> building,budget >
To reduce redundancy, we made dept_name a key of a new relation
Dept, and the relation Prof only needs to refer to it with the key.
If we try to join the two tables together, we will get the original table.
However, we cannot decompose a relation arbitrarily, otherwise, the
above property might be lost.
A Lossy Decomposition
Lossless Decomposition
Let <R(U), F> be a relation schema. A decomposition of <R(U), F> is a
set of relation schemas
<R1(X1), F1>, …, <Rn(Xn), Fn>
where U = X1 ∪ ⋯ ∪ Xn.
We say that the decomposition is a lossless decomposition if there is
no loss of information by replacing R with the n relations.
Formally, for every instance r of R that satisfies F:
• r = Π_X1(r) ⋈ ⋯ ⋈ Π_Xn(r)
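To make the formal condition concrete, here is a small sketch that projects one instance onto the attribute sets of a decomposition, joins the projections back, and compares the result with the original. The helper names (`tup`, `project`, `natural_join`, `is_lossless_on`) and the sample data are assumptions made for illustration. The check covers a single instance only, whereas the definition quantifies over every instance that satisfies F.

```python
def tup(**kw):
    """Build a tuple as a frozenset of (attribute, value) pairs."""
    return frozenset(kw.items())


def project(rel, attrs):
    """Projection onto attrs; duplicates vanish because relations are sets."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t1 in r1:
        left = dict(t1)
        for t2 in r2:
            if all(left.get(a, v) == v for a, v in t2):
                out.add(t1 | t2)
    return out


def is_lossless_on(rel, parts):
    """Check r = join of the projections of r, for this one instance."""
    joined = project(rel, parts[0])
    for attrs in parts[1:]:
        joined = natural_join(joined, project(rel, attrs))
    return joined == rel


# Hypothetical sample instance (not taken from the slides).
r = {
    tup(ID=1, name="Kim", dept_name="Physics", building="Watson"),
    tup(ID=2, name="Kim", dept_name="Music", building="Packard"),
}

# Splitting on dept_name gives back exactly r.
print(is_lossless_on(r, [{"ID", "name", "dept_name"}, {"dept_name", "building"}]))  # True
# Splitting on name produces spurious tuples, so information is lost.
print(is_lossless_on(r, [{"ID", "name"}, {"name", "dept_name", "building"}]))  # False
```

In the second split both instructors share the name Kim, so the join pairs each ID with both departments and we can no longer tell which pairing was in the original table.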
Normal Forms
Note that decomposing a relation is not always advisable:
• splitting a relation into multiple relations might make some queries less
efficient (we need to do more joins).
It is up to us, as designers, to understand how much our queries are
affected, and decide whether to decompose or not.
If we decide to decompose our relation schema, then:
• The decomposition must be lossless (mandatory)
• The redundancies should be eliminated, or at least reduced as much
as possible
• The functional dependencies of the original schema should be
preserved in the decomposition, if possible.
To achieve the above properties, we decompose our schema into some
other schema in a so-called normal form.
To define normal forms, we first need some auxiliary notions.
Closure of a Set of Functional Dependencies
Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
• If A -> B and B -> C, then we can infer that A -> C
• etc.
A set F of FDs logically implies an FD 𝑋 → 𝑌 if every instance that
satisfies F also satisfies 𝑋 → 𝑌
The set of all functional dependencies logically implied by F is the
closure of F.
We denote the closure of F by F+.
How can we compute the closure of a set F?
Closure of a Set of Functional Dependencies
We can compute F+, the closure of F, by repeatedly applying Armstrong’s
Axioms:
• Reflexive rule: if Y ⊆ X, then X -> Y
• Augmentation rule: if X -> Y, then XZ -> YZ
• Transitivity rule: if X -> Y and Y -> Z, then X -> Z
These rules are
• Sound -- generate only functional dependencies that actually hold,
and
• Complete -- generate all functional dependencies that hold.
Additional rules:
• Union rule: If X -> Y holds and X -> Z holds, then X -> YZ holds.
• Decomposition rule: If X -> YZ holds, then X -> Y holds and X -> Z holds.
The above rules can be inferred from Armstrong’s axioms.
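As an illustration of how the additional rules follow from the three axioms, one possible derivation of the union and decomposition rules is sketched below. The step numbering is only for reference within the derivation.

```latex
\begin{align*}
&\textbf{Union rule.}\ \text{Assume } X \to Y \text{ and } X \to Z.\\
&1.\ X \to XY && \text{augmentation of } X \to Y \text{ with } X \ (\text{note } XX = X)\\
&2.\ XY \to YZ && \text{augmentation of } X \to Z \text{ with } Y\\
&3.\ X \to YZ && \text{transitivity on 1 and 2}\\[4pt]
&\textbf{Decomposition rule.}\ \text{Assume } X \to YZ.\\
&4.\ YZ \to Y \text{ and } YZ \to Z && \text{reflexivity, since } Y \subseteq YZ \text{ and } Z \subseteq YZ\\
&5.\ X \to Y \text{ and } X \to Z && \text{transitivity on the assumption and 4}
\end{align*}
```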
Closure of a Set of Functional Dependencies
Example.
• F = {A -> BCDEFGH, CE -> A, BD -> E}
CE -> A and A -> BCDEFGH imply CE -> BCDEFGH (transitivity)
BD -> E implies BDC -> CE (augmentation with C)
BDC -> CE and CE -> A imply BDC -> A (transitivity)
Closure of a set of Attributes
Another notion we are going to need is the closure of a set of attributes.
This is useful for finding the superkeys of a relation.
Given a set of functional dependencies F and a set of attributes X,
the closure of X, denoted X+, is the set of all attributes that are
functionally determined by X. How do we compute it?
• Start from X+ = X.
• If there is an FD Z -> W in F with Z ⊆ X+, add W to X+.
• Repeat until X+ does not change.
Example. F = {A -> B, B -> C, AC -> D}. What is A+?
Starting from X+ = A, we derive B, and then C, obtaining ABC. Since AC ⊆ ABC,
the FD AC -> D adds D, and thus X+ = ABCD. In other words, A -> ABCD.
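The three steps translate almost directly into code. The following is a minimal sketch, assuming each FD is represented as a pair of Python sets (left side, right side); the function name `closure` is a hypothetical choice.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)          # start from X+ = X
    changed = True
    while changed:               # repeat until X+ does not change
        changed = False
        for lhs, rhs in fds:
            # Z -> W applies when Z is contained in X+ and W adds something new
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The example: F = {A -> B, B -> C, AC -> D}
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "C"}, {"D"})]
print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C', 'D']
```

The loop terminates because each productive pass adds at least one attribute and there are finitely many attributes.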
Closure of a set of Attributes (Cont.)
Consider < R(U), F >. The closure of a set X of attributes is very useful to
check if X is a superkey of the relation R.
Just compute X+ and check whether X+ = U (that is, whether X -> U holds).
X might not be a (minimal) key:
• We might be able to remove some attributes from X, and still derive U.
Example. <R(A,B,C,D), F = {A -> B, B -> C, AC -> D}>.
• A+ = ABCD. A is a superkey (in this case, even a key).
• (AC)+ = ABCD. AC is a superkey, but not a key.
• B+ = BC. B is not a superkey (and thus not a key either).
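The superkey test, and a brute-force search for the minimal keys, can be sketched as follows. The names `is_superkey` and `candidate_keys` are hypothetical, FDs are assumed to be (left side, right side) pairs of sets, and the search enumerates all attribute subsets, so it is only meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """X is a superkey exactly when X+ = U."""
    return closure(attrs, fds) >= set(all_attrs)


def candidate_keys(all_attrs, fds):
    """Superkeys with no smaller key inside them, found by increasing size."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if is_superkey(x, all_attrs, fds) and not any(k <= x for k in keys):
                keys.append(x)
    return keys


U = {"A", "B", "C", "D"}
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A", "C"}, {"D"})]

print(is_superkey({"A", "C"}, U, F))  # True, but AC is not minimal
print(is_superkey({"B"}, U, F))       # False
print(candidate_keys(U, F))           # [{'A'}]
```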
Trivial Functional Dependencies
A functional dependency X -> Y is trivial if Y ⊆ X.
A trivial functional dependency is satisfied by all instances of a relation.
Example:
• ID, name -> ID
• name -> name
Boyce-Codd Normal Form
We now have all the ingredients to define our first normal form.
A relation schema <R(U), F> is in Boyce-Codd Normal Form (BCNF) if for all functional dependencies X -> Y in F+ that are not trivial
• X is a superkey of R
Intuition: the only kind of redundancy that any relevant FD can
describe is the one where data is determined by a key of the relation,
and nothing else.
Since a key value identifies a single tuple, a schema in BCNF has no
redundancy arising from functional dependencies.
Boyce-Codd Normal Form (Cont.)
Example schema that is not in BCNF:
< ProfDept (ID, name, salary, dept_name, building, budget ), F >
The set of functional dependencies is
• F = ID -> name,salary,dept_name,building,budget,
dept_name -> building, budget.
The only key is ID. The second dependency violates the BCNF condition, because dept_name is not a superkey.
If we decompose the relation schema into:
<Prof(ID,name,salary,dept_name), ID -> name,salary,dept_name>
<Dept(dept_name,building,budget), dept_name -> building,budget>
The above two schemas are in BCNF. The decomposition is also lossless.
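The BCNF condition can be tested with attribute closures. The sketch below, with the hypothetical helper `violations`, inspects only the FDs listed in F rather than all of F+. This is a standard shortcut that is sufficient when testing the schema for which F was given. Testing a sub-schema produced by a decomposition requires the FDs projected onto that sub-schema, which are written out by hand here.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(all_attrs, fds):
    """Non-trivial FDs in fds whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(all_attrs)
    ]


# ProfDept with its two dependencies.
U = {"ID", "name", "salary", "dept_name", "building", "budget"}
F = [({"ID"}, U - {"ID"}), ({"dept_name"}, {"building", "budget"})]
print(violations(U, F))  # only dept_name -> building, budget is reported

# The two schemas of the decomposition, each with its own dependency.
prof = {"ID", "name", "salary", "dept_name"}
dept = {"dept_name", "building", "budget"}
print(violations(prof, [({"ID"}, prof - {"ID"})]))                # []
print(violations(dept, [({"dept_name"}, dept - {"dept_name"})]))  # []
```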
Minimal cover
Computing the closure of F can be very hard and time consuming, as it can contain exponentially many FDs.
We solve the issue by focusing on a simpler equivalent version of F.
Consider a set F of functional dependencies. A minimal cover of F is a set of functional dependencies 𝐹𝑚𝑖𝑛 such that:
• (𝐹𝑚𝑖𝑛)+ = 𝐹+ (i.e., the two sets are equivalent)
• All functional dependencies in 𝐹𝑚𝑖𝑛 are of the form X -> A, where A is a single attribute
• If we remove one FD, or an attribute from the left side of an FD in 𝐹𝑚𝑖𝑛, then 𝐹𝑚𝑖𝑛 is no longer equivalent to F, i.e., (𝐹𝑚𝑖𝑛)+ ≠ 𝐹+
So, 𝐹𝑚𝑖𝑛 contains a minimal amount of “information” to describe all the FDs implied by F.
Minimal cover
One can prove that a schema <R(U), F> is in BCNF iff the BCNF conditions are satisfied by 𝐹𝑚𝑖𝑛. So, we can focus on 𝐹𝑚𝑖𝑛.
How do we compute a minimal cover of F?
• Step 1: Normalize each X -> ABC… in F into X -> A, X -> B, X -> C, …
• Step 2: Until nothing more changes,
if there is an FD XA -> B with A ∈ X+, then A is redundant and can be removed from the left side.
• Step 3: If there is an FD X -> A such that X+ contains A even when X -> A is not used to construct X+, then X -> A is redundant and can be removed.
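The three steps can be sketched in code as follows. This is an illustrative implementation under a few assumptions: FDs are (left side, right side) pairs of sets, the names `closure` and `minimal_cover` are hypothetical, Step 2 uses the test stated above (an attribute is dropped from a left side when the remaining left-side attributes already determine it), and the sample F is made up for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split right sides so every FD has the form X -> A.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop left-side attributes derivable from the rest of the left side.
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(cover):
            for b in sorted(lhs):
                rest = lhs - {b}
                if rest and b in closure(rest, cover):
                    cover[i] = (rest, rhs)
                    changed = True
                    break
            if changed:
                break  # restart the scan on the updated set
    cover = list(dict.fromkeys(cover))  # shrinking may have created duplicates

    # Step 3: drop every FD that the remaining FDs already imply.
    for fd in list(cover):
        others = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], others):
            cover = others
    return cover


# Hypothetical input: F = {A -> BC, B -> C, AB -> C}
F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```

Here AB -> C first shrinks to A -> C (because A alone determines B), and A -> C is then dropped because A -> B and B -> C already imply it.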
Minimal cover
Example. <R(A,B,C,D,E), F>, with
F = {A -> BCE, CDB -> A, CD -> E, E -> B}
First, normalize:
A -> B, A -> C, A -> E, CDB -> A, CD -> E, E -> B
Remove left attributes:
(CD)+ contains B (via CD -> E and E -> B), so B is redundant in CDB -> A, which becomes CD -> A
C+ = C (nothing to do)
D+ = D (nothing to do)
Remove redundant FDs:
A can derive B without using A -> B: it can derive ACEB, so A -> B is removed.
CD can derive E without using CD -> E: CD -> A and A -> E give E, so CD -> E is removed as well. No other FDs are redundant.
𝐹𝑚𝑖𝑛 = {A -> C, A -> E, CD -> A, E -> B}
Algorithm for BCNF decomposition
Algorithm: BCNF decomposition
• Input: <R(U), F> (where F is a minimal cover)
• Output: a decomposition of <R(U), 𝐹> in BCNF that is lossless
Choose some FD X -> A in 𝐹 that violates the BCNF conditions.
Compute Y = X+ ∖ X and Z = U ∖ XY.
Construct the two relation schemas:
• <R1(XY), (Π_XY(F+))min>, <R2(XZ), (Π_XZ(F+))min>
If one of the two schemas is not yet in BCNF, decompose it again.
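A compact sketch of the recursive split is given below. It is a simplified variant, not the exact procedure above: instead of computing projected minimal covers for each new schema, it finds a violating left side X directly by computing closures under the original F and restricting them to the current schema, and it returns only the attribute sets of the resulting relations. The names `find_violation` and `bcnf_decompose` are hypothetical, and the subset enumeration is exponential, so this is suitable for small schemas only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, XY) for some X in schema that violates BCNF, or None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            reach = closure(x, fds) & schema  # X+ restricted to this schema
            # X determines something new (non-trivial) but not everything
            if reach != x and reach != schema:
                return x, reach
    return None


def bcnf_decompose(schema, fds):
    """Split schema until no sub-schema has a BCNF violation."""
    schema = set(schema)
    found = find_violation(schema, fds)
    if found is None:
        return [schema]
    x, xy = found             # XY, with Y = X+ \ X inside this schema
    xz = x | (schema - xy)    # XZ, with Z = U \ XY; X stays in both parts
    return bcnf_decompose(xy, fds) + bcnf_decompose(xz, fds)


# R(A, B, C) with F = {A -> B, B -> C}
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print([sorted(s) for s in bcnf_decompose({"A", "B", "C"}, F)])
# [['B', 'C'], ['A', 'B']]
```

Keeping X in both parts is what allows the two relations to be joined back together.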
Algorithm for BCNF decomposition
Example. <R(A,B,C), F = {A -> B, B -> C}>
F is already a minimal cover (nothing to remove).
The only key is A (because it is the only minimal set of attributes functionally
determining all the others).
A -> B satisfies the BCNF condition, but B -> C does not.
So, we compute all attributes that B can derive: B+ = BC
We now split R into two relations:
• one relation has attributes B+ = BC,
• the other has all the remaining attributes (A)
• B must stay in both, to allow the two relations to join.
<R1(B,C), B -> C>, <R2(A,B), A -> B>
Both schemas are in BCNF. Done.