databases -normalization iii · 2012-05-16 · this lecture describes 3rd normal form. (n...
TRANSCRIPT
Databases -Normalization III
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31
This lecture
This lecture describes 3rd normal form.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31
Normal Forms
BCNF - recap
The BCNF decomposition of a relation is derived by a recursivealgorithm. A lossless-join decomposition is derived which may not bedependency preserving.
The decomposition is too restrictive. To make the decomposition in theprevious example dependency preserving we can cover the FDJP → C by adding its attributes as a relation
R1 = CSJDQV R2 = SDP R3 = JPC
We have added the required FD involving key attributes that wereprohibited by BCNF.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 3 / 31
Normal Forms
3NF
A relational schema is in 3NF if for every functional dependency X → A(where X is a subset of the attributes and A is a single attribute) either
A ∈ X (that is, X → A is a trivial FD), orX is a superkey, orA is part of some key for R.
Thus the definition of 3NF permits a few additional FDs involving keyattributes that are prohibited by BCNF.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 4 / 31
Normal Forms
The contracts example
The contracts example was not in BCNF because of the FD
SD → P
However as P is part of a key (JP), this FD does not violate theconditions for 3NF. Therefore neither of the two given FDs violate theconditions for 3NF.
Checking all the other (implied) FDs shows that contracts is already in3NF and thus need not be decomposed further.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 5 / 31
Normal Forms
Properties of 3NF
The most important property of 3NF is the result that there is always alossless-join and dependency-preserving decomposition of anyrelational schema into 3NF.
While not conceptually difficult, the algorithm to decompose a relationschema into 3NF is quite fiddly and much more complicated than thesimple algorithm for decomposition into BCNF.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 6 / 31
Normal Forms
3NF Synthesis
The BCNF algorithm is from the perspective of deriving a lossless-joindecomposition, and dependency preservation was addressed byadding extra relation schemas.
An alternative approach involves synthesis that takes the attributes ofthe original relation R and the minimal cover F of the FDs and adds arelation schema XA for each FD X → A in F.
The resulting collection of relations is 3NF and preserves all FDs. If itis not lossless-join, it is made so by adding a relation that containsthose attributes that appear in some key.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 7 / 31
Normal Forms
The two approaches
The decomposition algorithm derives the relations while maintainingthe keys, thus emphasizing the lossless-join property, and deals withdependency preservation by adding the necessary relations.
The synthesis algorithm build the relations from the set of FDs,emphasizing the dependencies, and then deals with lossless-joinissues after the fact by adding the necessary relation.
The latter is an algorithm that has polynomial complexity.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 8 / 31
3NF synthesis
The Synthesis Algorithm
Given a relation R and a set of functional dependencies X → Y whereX and Y are subsets of the attributes of R.
1 Compute the closures of the FDs.2 Use À to find all the candidate keys of the relation R.3 Find the minimal cover of the relation R.4 Use  to produce Ri decompositions in 3NF.5 If no Ri is a super key of R, add a relation containing the key.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 9 / 31
3NF synthesis
FD closures
1 Compute the closures of the FDs.
For each X → Y, derive (X)+ the set of attributes of R covered byX → Y.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 10 / 31
3NF synthesis
Candidate keys
2 Use À to find all the candidate keys of the relation R.
If any (X)+ covers all the attributes of R then X is a key. It is possibly asuper key with redundant attributes contained within X (these will beobvious).
Otherwise find the (X)+ that maximally covers R and add to Xattributes of R missing from the closure.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 11 / 31
3NF synthesis
Minimal cover3 Find the minimal cover of the relation R.
ALGORITHM:Step 1 Decompose all the FDs to the form X → A, where A is asingle attribute.Step 2 Eliminate redundant attributes from the LHS.
If XB → A, where B is a single attribute andX → A is entailed within the set, thenB was unnecessary.
If LHS has more than one attribute, compute closure of each setthat leaves out one attribute. If it includes the right side, removethe extra attribute in the functional dependency.Step 3 Delete the redundant FDs.For each dependency, compute the closure of the determinantattributes against all the other dependencies except the one you’retesting. If you find the attribute on the right side in the closure, youcan leave that dependency out.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 12 / 31
3NF synthesis
3NF relations
4 Use  to produce Ri decompositions in 3NF.
ALGORITHM:
Step 1 Combine all FDs with the same LHS
For each X such that X → A, X → B ...⇒ X → AB....
Step 2 Form a relation Ri of all attributes in each FD.
For each X → ABC, form⇒ {X,A,B,C}.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 13 / 31
3NF synthesis
3NF relations
5 If no Ri is a super key of R, add a relation containing the key.
If UVW is a key for R, and no relation Ri formed in the 3NFdecomposition contains a super key of R, then add {U,V,W} as arelation.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 14 / 31
3NF synthesis
Exercise
Consider R = {A,B,C,D,E,G} and the following FDs
1 AB → C2 C → A3 BC → D4 ACD → B5 D → EG6 BE → C7 CG → BD8 CE → AG
(AB)+ = ABCDEG
(C)+ = AC
(BC)+ = ABCDEG
(ACD)+ = ABCDEG
(D)+ = DEG
(BE)+ = ABCDEG
(CG)+ = ABCDEG
(CE)+ = ABCDEG
AB is a key
BC is a keyACD is a super key
BE is a keyCG is a keyCE is a key
Applying Armstrong’s Axioms gives BD and CD are also keys.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 15 / 31
3NF synthesis
Generate minimal cover - Working
Original FDs
1 AB → C2 C → A3 BC → D4 ACD → B5 D → EG6 BE → C7 CG → BD8 CE → AG
single RHS
1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D10 CE → A11 CE → G
redundant LHS
1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D
10 CE→ A11 CE → G
minimal FDs
1
2
3
4
5
6
7
8
9
10
11
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 16 / 31
3NF synthesis
Generate minimal cover
single RHS
1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D10 CE → A11 CE → G
redundant LHS
1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D
10 CE→ A11 CE → G
duplications
delete Ç
= È + Ã
delete É = Á
minimal FDs
1 AB → C2 C → A3 BC → D4 CD → B5 D → E6 D → G7 BE → C8
9 CG → D10
11 CE → G
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 17 / 31
3NF Synthesis
The synthesis process
Combine same LHS
1 AB → C2 C → A3 BC → D4 CD → B5 D → E, D → G6 BE → C7 CG → D8 CE → G
Form relations
1 {A,B,C}2 {C,A}3 {B,C,D}4 {C,D,B}5 {D,E,G}6 {B,E,C}7 {C,G,D}8 {C,E,G}
The 3NF decomposition
1 {A,B,C}2 Á is a proper subset of À
3 {B,C,D}4 Ã is a subset of Â
5 {D,E,G}6 {B,E,C}7 {C,G,D}8 {C,E,G}
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 18 / 31
3NF Synthesis
Final 3NF decomposition
{A,B,C}{B,C,D}{D,E,G}{B,E,C}{C,G,D}{C,E,G}.
Super key is contained in at least one relation.
We have looked at how to confirm the decomposition is a lossless-joinwhen an original relation R is decomposed into two R1 and R2.
Simply check the shared attribute(s) of R1 and R2 to see if it is asuperkey for either R1 or R2.
However when decomposed into more than two relations, we will needto use the algorithm described in the next three slides to check forlossless decomposition.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 19 / 31
3NF Synthesis
Definition - Lossless Join Property
Definition: A decomposition D = {R1,R2, ...,Rm} of R has the lossless(nonadditive) join property with respect to the set of dependencies Fon R if, for every relation state Ri of R that satisfies F, the natural join ofall the relations in D:
R1 ./ R2... ./ Rm = R
Note: The word loss in lossless refers to loss of information, not to lossof tuples. In fact, for “loss of information” a better term is “addition ofspurious information”.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 20 / 31
3NF Synthesis
Algorithm for Testing Lossless Join Property
Step 1 Create an initial matrix with one row for each decomposedrelation Ri, and one column for each attribute Aj in the originalrelation R.Step 2 For the ith row, if the jth attribute is in Ri, cell (i, j) shouldcontain value aj, otherwise, bij.
A B C D E GR1{A,B,C} a1 a2 a3 b14 b15 b16R2{B,C,D} b21 a2 a3 a4 b25 b26R3{D,E,G} b31 b32 b33 a4 a5 a6R4{B,E,C} b41 a2 a3 b44 a5 b46R5{C,G,D} b51 b52 a3 a4 b55 a6R6{C,E,G} b61 b62 a3 b64 a5 a6
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 21 / 31
3NF Synthesis
Algorithm for Testing Lossless Join Property (contd.)Step 3 Apply functional dependencies to replace the bijs with ajs.Step 4 If there is a row contains all ajs, the decomposition islossless.
A B C D E GR1{A,B,C} a1 a2 a3R2{B,C,D} a2 a3 a4R3{D,E,G} a4 a5 a6R4{B,E,C} a2 a3 a5R5{C,G,D} a3 a4 a6R6{C,E,G} a3 a5 a6
1 AB → C2 C → A3 BC → D4 CD → B5 D → E6 D → G7 BE → C8 CG → D9 CE → G
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 22 / 31
3NF Synthesis
Manual normalization
Data typically employed in a business is usually presented as below.
However to be useful in a database the data needs to be normalized.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 23 / 31
3NF Synthesis
First normal form – 1NF
First normal form (1NF) requires attribute values be atomic — that is,individual values rather than lists or sets and that a primary key isdefined.
This table is in 1NF. The primary key is (Order ID, Product ID).
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 24 / 31
3NF Synthesis
Second normal form – 2NF
A relation in first normal form (1NF) has the above functionaldependencies.
A relation is in Second normal for (2NF) if it is in 1NF and contains nopartial dependencies.
A partial functional dependency exists when a nonkey attribute isfunctionally dependent on part of, but not all of, the primary key.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 25 / 31
3NF Synthesis
Second normal form – 2NF
2NF is achieved by creating three (3) relations with the followingattributes and keys.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 26 / 31
3NF Synthesis
Third normal form – 3NF
A relation in second normal form (2NF) still has the above functionaldependencies.
A relation is in Third normal form (3NF) if it is in 2NF and contains notransitive dependencies.
A transitive functional dependency in a relation is a functionaldependency between two or more nonkey attributes.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 27 / 31
3NF Synthesis
Third normal form – 3NF
3NF is achieved by creating an additional relation with the followingattributes and keys.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 28 / 31
3NF Synthesis
Practical Use of Normalization
In practice, the DBA should be guided by the theory of normalization,but not slavishly follow it.
When the ER diagram is converted into tables, any functionaldependencies should be identified and a decision made as to whetherthe tables should be normalized or not.
Occasionally it will even be appropriate to de-normalize data if themost frequently occurring queries involve expensive joins of thenormalized tables.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 29 / 31
Aside
Multi-valued Attributes
A Single-valued attribute of an entity instance has just one value. Thishas been the typical scenario we have discussed so far and know howthis is handled in a schema definition.
A Multi-valued attribute is an attribute that can take on more than onevalue for a given entity instance. Diagrammatically this is indicated inan ERD by a double edged ellipse.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 30 / 31
Aside
Multi-valued Attributes
The mapping of an entity with a multi-valued attribute is achieved bythe following.
The first relation contains all the attributes of the entity type except themulti-valued attribute. The second relation contains the attributes thatform the primary key of the second relation. The first of these attributesis the primary key of the first relation, and hence it is a foreign key.
(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 31 / 31