databases -normalization iii · 2012-05-16 · this lecture describes 3rd normal form. (n...

31
Databases -Normalization III (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

Upload: others

Post on 24-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Databases -Normalization III

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31

Page 2: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

This lecture

This lecture describes 3rd normal form.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31

Page 3: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Normal Forms

BCNF - recap

The BCNF decomposition of a relation is derived by a recursivealgorithm. A lossless-join decomposition is derived which may not bedependency preserving.

The decomposition is too restrictive. To make the decomposition in theprevious example dependency preserving we can cover the FDJP → C by adding its attributes as a relation

R1 = CSJDQV R2 = SDP R3 = JPC

We have added the required FD involving key attributes that wereprohibited by BCNF.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 3 / 31

Page 4: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Normal Forms

3NF

A relational schema is in 3NF if for every functional dependency X → A(where X is a subset of the attributes and A is a single attribute) either

A ∈ X (that is, X → A is a trivial FD), orX is a superkey, orA is part of some key for R.

Thus the definition of 3NF permits a few additional FDs involving keyattributes that are prohibited by BCNF.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 4 / 31

Page 5: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Normal Forms

The contracts example

The contracts example was not in BCNF because of the FD

SD → P

However as P is part of a key (JP), this FD does not violate theconditions for 3NF. Therefore neither of the two given FDs violate theconditions for 3NF.

Checking all the other (implied) FDs shows that contracts is already in3NF and thus need not be decomposed further.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 5 / 31

Page 6: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Normal Forms

Properties of 3NF

The most important property of 3NF is the result that there is always alossless-join and dependency-preserving decomposition of anyrelational schema into 3NF.

While not conceptually difficult, the algorithm to decompose a relationschema into 3NF is quite fiddly and much more complicated than thesimple algorithm for decomposition into BCNF.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 6 / 31

Page 7: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Normal Forms

3NF Synthesis

The BCNF algorithm is from the perspective of deriving a lossless-joindecomposition, and dependency preservation was addressed byadding extra relation schemas.

An alternative approach involves synthesis that takes the attributes ofthe original relation R and the minimal cover F of the FDs and adds arelation schema XA for each FD X → A in F.

The resulting collection of relations is 3NF and preserves all FDs. If itis not lossless-join, it is made so by adding a relation that containsthose attributes that appear in some key.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 7 / 31

Page 8: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Normal Forms

The two approaches

The decomposition algorithm derives the relations while maintainingthe keys, thus emphasizing the lossless-join property, and deals withdependency preservation by adding the necessary relations.

The synthesis algorithm build the relations from the set of FDs,emphasizing the dependencies, and then deals with lossless-joinissues after the fact by adding the necessary relation.

The latter is an algorithm that has polynomial complexity.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 8 / 31

Page 9: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

The Synthesis Algorithm

Given a relation R and a set of functional dependencies X → Y whereX and Y are subsets of the attributes of R.

1 Compute the closures of the FDs.2 Use À to find all the candidate keys of the relation R.3 Find the minimal cover of the relation R.4 Use  to produce Ri decompositions in 3NF.5 If no Ri is a super key of R, add a relation containing the key.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 9 / 31

Page 10: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

FD closures

1 Compute the closures of the FDs.

For each X → Y, derive (X)+ the set of attributes of R covered byX → Y.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 10 / 31

Page 11: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

Candidate keys

2 Use À to find all the candidate keys of the relation R.

If any (X)+ covers all the attributes of R then X is a key. It is possibly asuper key with redundant attributes contained within X (these will beobvious).

Otherwise find the (X)+ that maximally covers R and add to Xattributes of R missing from the closure.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 11 / 31

Page 12: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

Minimal cover3 Find the minimal cover of the relation R.

ALGORITHM:Step 1 Decompose all the FDs to the form X → A, where A is asingle attribute.Step 2 Eliminate redundant attributes from the LHS.

If XB → A, where B is a single attribute andX → A is entailed within the set, thenB was unnecessary.

If LHS has more than one attribute, compute closure of each setthat leaves out one attribute. If it includes the right side, removethe extra attribute in the functional dependency.Step 3 Delete the redundant FDs.For each dependency, compute the closure of the determinantattributes against all the other dependencies except the one you’retesting. If you find the attribute on the right side in the closure, youcan leave that dependency out.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 12 / 31

Page 13: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

3NF relations

4 Use  to produce Ri decompositions in 3NF.

ALGORITHM:

Step 1 Combine all FDs with the same LHS

For each X such that X → A, X → B ...⇒ X → AB....

Step 2 Form a relation Ri of all attributes in each FD.

For each X → ABC, form⇒ {X,A,B,C}.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 13 / 31

Page 14: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

3NF relations

5 If no Ri is a super key of R, add a relation containing the key.

If UVW is a key for R, and no relation Ri formed in the 3NFdecomposition contains a super key of R, then add {U,V,W} as arelation.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 14 / 31

Page 15: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

Exercise

Consider R = {A,B,C,D,E,G} and the following FDs

1 AB → C2 C → A3 BC → D4 ACD → B5 D → EG6 BE → C7 CG → BD8 CE → AG

(AB)+ = ABCDEG

(C)+ = AC

(BC)+ = ABCDEG

(ACD)+ = ABCDEG

(D)+ = DEG

(BE)+ = ABCDEG

(CG)+ = ABCDEG

(CE)+ = ABCDEG

AB is a key

BC is a keyACD is a super key

BE is a keyCG is a keyCE is a key

Applying Armstrong’s Axioms gives BD and CD are also keys.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 15 / 31

Page 16: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

Generate minimal cover - Working

Original FDs

1 AB → C2 C → A3 BC → D4 ACD → B5 D → EG6 BE → C7 CG → BD8 CE → AG

single RHS

1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D10 CE → A11 CE → G

redundant LHS

1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D

10 CE→ A11 CE → G

minimal FDs

1

2

3

4

5

6

7

8

9

10

11

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 16 / 31

Page 17: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF synthesis

Generate minimal cover

single RHS

1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D10 CE → A11 CE → G

redundant LHS

1 AB → C2 C → A3 BC → D4 ACD → B5 D → E6 D → G7 BE → C8 CG → B9 CG → D

10 CE→ A11 CE → G

duplications

delete Ç

= È + Ã

delete É = Á

minimal FDs

1 AB → C2 C → A3 BC → D4 CD → B5 D → E6 D → G7 BE → C8

9 CG → D10

11 CE → G

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 17 / 31

Page 18: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

The synthesis process

Combine same LHS

1 AB → C2 C → A3 BC → D4 CD → B5 D → E, D → G6 BE → C7 CG → D8 CE → G

Form relations

1 {A,B,C}2 {C,A}3 {B,C,D}4 {C,D,B}5 {D,E,G}6 {B,E,C}7 {C,G,D}8 {C,E,G}

The 3NF decomposition

1 {A,B,C}2 Á is a proper subset of À

3 {B,C,D}4 Ã is a subset of Â

5 {D,E,G}6 {B,E,C}7 {C,G,D}8 {C,E,G}

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 18 / 31

Page 19: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Final 3NF decomposition

{A,B,C}{B,C,D}{D,E,G}{B,E,C}{C,G,D}{C,E,G}.

Super key is contained in at least one relation.

We have looked at how to confirm the decomposition is a lossless-joinwhen an original relation R is decomposed into two R1 and R2.

Simply check the shared attribute(s) of R1 and R2 to see if it is asuperkey for either R1 or R2.

However when decomposed into more than two relations, we will needto use the algorithm described in the next three slides to check forlossless decomposition.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 19 / 31

Page 20: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Definition - Lossless Join Property

Definition: A decomposition D = {R1,R2, ...,Rm} of R has the lossless(nonadditive) join property with respect to the set of dependencies Fon R if, for every relation state Ri of R that satisfies F, the natural join ofall the relations in D:

R1 ./ R2... ./ Rm = R

Note: The word loss in lossless refers to loss of information, not to lossof tuples. In fact, for “loss of information” a better term is “addition ofspurious information”.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 20 / 31

Page 21: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Algorithm for Testing Lossless Join Property

Step 1 Create an initial matrix with one row for each decomposedrelation Ri, and one column for each attribute Aj in the originalrelation R.Step 2 For the ith row, if the jth attribute is in Ri, cell (i, j) shouldcontain value aj, otherwise, bij.

A B C D E GR1{A,B,C} a1 a2 a3 b14 b15 b16R2{B,C,D} b21 a2 a3 a4 b25 b26R3{D,E,G} b31 b32 b33 a4 a5 a6R4{B,E,C} b41 a2 a3 b44 a5 b46R5{C,G,D} b51 b52 a3 a4 b55 a6R6{C,E,G} b61 b62 a3 b64 a5 a6

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 21 / 31

Page 22: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Algorithm for Testing Lossless Join Property (contd.)Step 3 Apply functional dependencies to replace the bijs with ajs.Step 4 If there is a row contains all ajs, the decomposition islossless.

A B C D E GR1{A,B,C} a1 a2 a3R2{B,C,D} a2 a3 a4R3{D,E,G} a4 a5 a6R4{B,E,C} a2 a3 a5R5{C,G,D} a3 a4 a6R6{C,E,G} a3 a5 a6

1 AB → C2 C → A3 BC → D4 CD → B5 D → E6 D → G7 BE → C8 CG → D9 CE → G

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 22 / 31

Page 23: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Manual normalization

Data typically employed in a business is usually presented as below.

However to be useful in a database the data needs to be normalized.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 23 / 31

Page 24: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

First normal form – 1NF

First normal form (1NF) requires attribute values be atomic — that is,individual values rather than lists or sets and that a primary key isdefined.

This table is in 1NF. The primary key is (Order ID, Product ID).

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 24 / 31

Page 25: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Second normal form – 2NF

A relation in first normal form (1NF) has the above functionaldependencies.

A relation is in Second normal for (2NF) if it is in 1NF and contains nopartial dependencies.

A partial functional dependency exists when a nonkey attribute isfunctionally dependent on part of, but not all of, the primary key.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 25 / 31

Page 26: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Second normal form – 2NF

2NF is achieved by creating three (3) relations with the followingattributes and keys.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 26 / 31

Page 27: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Third normal form – 3NF

A relation in second normal form (2NF) still has the above functionaldependencies.

A relation is in Third normal form (3NF) if it is in 2NF and contains notransitive dependencies.

A transitive functional dependency in a relation is a functionaldependency between two or more nonkey attributes.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 27 / 31

Page 28: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Third normal form – 3NF

3NF is achieved by creating an additional relation with the followingattributes and keys.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 28 / 31

Page 29: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

3NF Synthesis

Practical Use of Normalization

In practice, the DBA should be guided by the theory of normalization,but not slavishly follow it.

When the ER diagram is converted into tables, any functionaldependencies should be identified and a decision made as to whetherthe tables should be normalized or not.

Occasionally it will even be appropriate to de-normalize data if themost frequently occurring queries involve expensive joins of thenormalized tables.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 29 / 31

Page 30: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Aside

Multi-valued Attributes

A Single-valued attribute of an entity instance has just one value. Thishas been the typical scenario we have discussed so far and know howthis is handled in a schema definition.

A Multi-valued attribute is an attribute that can take on more than onevalue for a given entity instance. Diagrammatically this is indicated inan ERD by a double edged ellipse.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 30 / 31

Page 31: Databases -Normalization III · 2012-05-16 · This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 2 / 31. Normal Forms ... (N

Aside

Multi-valued Attributes

The mapping of an entity with a multi-valued attribute is achieved bythe following.

The first relation contains all the attributes of the entity type except themulti-valued attribute. The second relation contains the attributes thatform the primary key of the second relation. The first of these attributesis the primary key of the first relation, and hence it is a foreign key.

(N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 31 / 31