DECOMPOSITION & SCHEMA NORMALIZATION

CS 564 - Spring 2018

ACKs: Dan Suciu, Jignesh Patel, AnHai Doan

WHAT IS THIS LECTURE ABOUT?

• Bad schemas lead to redundancy
• To “correct” bad schemas: decompose relations
  – lossless-join
  – dependency preserving

• Desired normal forms
  – BCNF
  – 3NF

CS564 [Spring 2018] - Paris Koutris

DB DESIGN THEORY

• Helps us identify the “bad” schemas and improve them
  1. express constraints on the data: functional dependencies (FDs)
  2. use the FDs to decompose the relations

• The process, called normalization, obtains a schema in a “normal form” that guarantees certain properties
  – examples of normal forms: BCNF, 3NF, …

SCHEMA DECOMPOSITION

WHAT IS A DECOMPOSITION?

We decompose a relation R(A1, …, An) by creating
• R1(B1, …, Bm)
• R2(C1, …, Cl)
• where $\{B_1, \dots, B_m\} \cup \{C_1, \dots, C_l\} = \{A_1, \dots, A_n\}$

• The instance of R1 is the projection of R onto B1, …, Bm
• The instance of R2 is the projection of R onto C1, …, Cl

EXAMPLE: DECOMPOSITION

Original relation R:

SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

R1 (projection onto SSN, name, age):

SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

R2 (projection onto SSN, phoneNumber):

SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221
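The projections above can be reproduced with a minimal Python sketch. It assumes that a relation instance is a list of dicts (one dict per tuple) and that projection removes duplicate tuples, following set semantics. The helper name `project` is hypothetical and does not come from the lecture.

```python
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, object]  # one tuple of a relation instance (assumed representation)

def project(rows: Iterable[Row], attrs: Iterable[str]) -> List[Row]:
    """Project a relation instance onto attrs, keeping each result tuple once."""
    cols = list(attrs)
    seen: Set[Tuple[object, ...]] = set()
    result: List[Row] = []
    for row in rows:
        key = tuple(row[a] for a in cols)
        if key not in seen:  # set semantics: drop duplicate projected tuples
            seen.add(key)
            result.append(dict(zip(cols, key)))
    return result

# The original relation R from the example above.
R = [
    {"SSN": "934729837", "name": "Paris", "age": 24, "phoneNumber": "608-374-8422"},
    {"SSN": "934729837", "name": "Paris", "age": 24, "phoneNumber": "603-534-8399"},
    {"SSN": "123123645", "name": "John", "age": 30, "phoneNumber": "608-321-1163"},
    {"SSN": "384475687", "name": "Arun", "age": 20, "phoneNumber": "206-473-8221"},
]
R1 = project(R, ["SSN", "name", "age"])  # Paris's two rows collapse into one
R2 = project(R, ["SSN", "phoneNumber"])
print(len(R1), len(R2))  # 3 4
```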

DECOMPOSITION DESIDERATA

What should a good decomposition achieve?

1. minimize redundancy
2. avoid information loss (lossless-join)
3. preserve the FDs (dependency preserving)
4. ensure good query performance

EXAMPLE: INFORMATION LOSS

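name   age  phoneNumber
Paris  24   608-374-8422
John   24   608-321-1163
Arun   20   206-473-8221

Decompose into: R1(name, age) and R2(age, phoneNumber)

R1:

name   age
Paris  24
John   24
Arun   20

R2:

age  phoneNumber
24   608-374-8422
24   608-321-1163
20   206-473-8221

We can’t figure out which phoneNumber corresponds to which person! Joining R1 and R2 back on age yields five tuples instead of three, including the spurious (Paris, 24, 608-321-1163) and (John, 24, 608-374-8422).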
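The following minimal Python sketch makes the information loss concrete. It implements a natural join, assuming the same list-of-dicts representation as before (every tuple of an instance has the same attributes). The function name `natural_join` is hypothetical. Joining the two fragments from this slide on their shared attribute `age` produces the spurious tuples.

```python
from typing import Dict, List

Row = Dict[str, object]

def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Join two instances on every attribute name they share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [
        {**lt, **rt}
        for lt in left
        for rt in right
        if all(lt[a] == rt[a] for a in common)  # tuples must agree on shared attributes
    ]

R1 = [{"name": "Paris", "age": 24}, {"name": "John", "age": 24}, {"name": "Arun", "age": 20}]
R2 = [
    {"age": 24, "phoneNumber": "608-374-8422"},
    {"age": 24, "phoneNumber": "608-321-1163"},
    {"age": 20, "phoneNumber": "206-473-8221"},
]
joined = natural_join(R1, R2)
print(len(joined))  # 5, although the original relation had 3 tuples
```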

LOSSLESS-JOIN DECOMPOSITION

R(A,B,C)  →  decompose (projection)  →  R1(A,B), R2(B,C)  →  recover (natural join)  →  R’(A,B,C)

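A schema decomposition is lossless-join if for any initial instance R, R = R’

A natural join is a join on all attributes that have the same name (here, B). Since R ⊆ R’ always holds, a decomposition that is not lossless-join is one whose join adds spurious tuples.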

A LOSSLESS-JOIN CRITERION

Starting with:
• a relation R(A) + a set F of FDs
• a decomposition of R into R1(A1) and R2(A2)

we say that the decomposition is lossless-join if and only if at least one of the following FDs is in F+ (the closure of F):
1. $A_1 \cap A_2 \to A_1$
2. $A_1 \cap A_2 \to A_2$

EXAMPLE

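• relation R(A,B,C,D)
• FD 𝐴 ⟶ 𝐵, 𝐶

Lossless-join
• decomposition into R1(A,B,C) and R2(A,D)
• {A, B, C} ∩ {A, D} = {A}
• For R1 we indeed have 𝐴 ⟶ 𝐴, 𝐵, 𝐶 (from 𝐴 ⟶ 𝐵, 𝐶), so criterion 1 holds

Not lossless-join
• decomposition into R1(A,B,C) and R2(D)
• {A, B, C} ∩ {D} = ∅, and neither ∅ ⟶ A, B, C nor ∅ ⟶ D is in F+, so the join degenerates into a Cartesian product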
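This criterion can be checked mechanically, because an FD X ⟶ Y is in F+ exactly when Y is contained in the attribute closure X+. The Python sketch below is a minimal version under some assumptions: attribute names are single letters, and an FD is a pair of frozensets. The helpers `fd`, `closure` and `is_lossless_binary` are hypothetical names. The usage lines apply the test to both decompositions of the example.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("A", "BC") for A -> B, C."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure X+: apply FDs whose left-hand side is covered until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(a1: Set[str], a2: Set[str], fds: List[FD]) -> bool:
    """R1(A1), R2(A2) is lossless iff A1 ∩ A2 -> A1 or A1 ∩ A2 -> A2 is in F+."""
    common_plus = closure(a1 & a2, fds)  # X -> Y is in F+ exactly when Y ⊆ X+
    return a1 <= common_plus or a2 <= common_plus

F = [fd("A", "BC")]
print(is_lossless_binary(set("ABC"), set("AD"), F))  # True
print(is_lossless_binary(set("ABC"), set("D"), F))   # False
```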

DEPENDENCY PRESERVING

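Given R and a set of FDs F, we decompose R into R1 and R2. Suppose:
– R1 has a set of FDs F1
– R2 has a set of FDs F2
– F1 and F2 are computed from F: each Fi consists of the FDs in F+ that mention only attributes of Ri

A decomposition is dependency preserving if by enforcing F1 over R1 and F2 over R2, we can enforce F over R, that is, (F1 ∪ F2)+ = F+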
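Computing F+ explicitly is exponential. A common way around this is to test each FD X ⟶ Y in F separately. Start a set Z from X, then grow it using only closures restricted to each Ri, and finally check whether Y ⊆ Z. The Python sketch below follows that approach. The helper names are hypothetical, and attributes are modeled as strings. The usage lines apply it to the GOOD and BAD examples that follow in this lecture.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure under the full set F."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_dependency_preserving(fds: List[FD], relations: List[Set[str]]) -> bool:
    """For each X -> Y in F, check Y ⊆ Z, where Z is what X reaches via F1, ..., Fk only."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for r in relations:
                # Attributes of r determined by Z ∩ r: only FDs enforceable inside r are used.
                gained = closure(z & r, fds) & r
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False  # this FD is lost by the decomposition
    return True

person_fds = [fd(["SSN"], ["name", "age"]), fd(["age"], ["canDrink"])]
print(is_dependency_preserving(person_fds, [{"SSN", "name", "age"}, {"age", "canDrink"}]))  # True

abc_fds = [fd("A", "B"), fd("BC", "A")]
print(is_dependency_preserving(abc_fds, [{"A", "B"}, {"A", "C"}]))  # False
```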

GOOD EXAMPLE

Person(SSN, name, age, canDrink)
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘

decomposes into
• R1(SSN, name, age)
  – 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒
• R2(age, canDrink)
  – 𝑎𝑔𝑒 ⟶ 𝑐𝑎𝑛𝐷𝑟𝑖𝑛𝑘

BAD EXAMPLE

R(A,B,C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴

Decomposes into:
• R1(A,B)
  – 𝐴 ⟶ 𝐵
• R2(A,C)
  – no FDs here!!

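R1:

A   B
a1  b
a2  b

R2:

A   C
a1  c
a2  c

Recover (natural join on A):

A   B  C
a1  b  c
a2  b  c

The recovered table violates 𝐵, 𝐶 ⟶ 𝐴: both tuples agree on (B, C) = (b, c) but differ on A. Yet R1 satisfies 𝐴 ⟶ 𝐵 and R2 has no FD to check, so enforcing the FDs on each relation separately cannot catch the violation.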
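Checking whether an instance satisfies an FD is straightforward. The minimal Python sketch below recovers the table above by joining R1 and R2 on A. It then confirms that R1 satisfies 𝐴 ⟶ 𝐵 while the recovered table violates 𝐵, 𝐶 ⟶ 𝐴. The name `violates_fd` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]

def violates_fd(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if two tuples agree on every lhs attribute but differ on some rhs attribute."""
    seen: Dict[Tuple[str, ...], Tuple[str, ...]] = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:  # same lhs values seen with different rhs values
            return True
    return False

R1 = [{"A": "a1", "B": "b"}, {"A": "a2", "B": "b"}]
R2 = [{"A": "a1", "C": "c"}, {"A": "a2", "C": "c"}]
# Recover by natural join on the shared attribute A.
recovered = [{**r1, **r2} for r1 in R1 for r2 in R2 if r1["A"] == r2["A"]]

print(violates_fd(R1, ["A"], ["B"]))              # False: R1 satisfies A -> B
print(violates_fd(recovered, ["B", "C"], ["A"]))  # True: (b, c) maps to both a1 and a2
```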

NORMAL FORMS

A normal form represents a “good” schema design:

• 1NF (flat tables / atomic values)
• 2NF
• 3NF
• BCNF
• 4NF
• …

(each normal form in this list is more restrictive than the one before it)

BCNF DECOMPOSITION

BOYCE-CODD NORMAL FORM (BCNF)

A relation R is in BCNF if whenever 𝑋 ⟶ 𝐵 is a non-trivial FD, then X is a superkey in R

Equivalent definition: for every attribute set X
• either $X^+ = X$
• or $X^+$ = all attributes
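The equivalent definition suggests a brute-force test. Enumerate every proper, non-empty subset X of the attributes, then compare X+ (restricted to the relation) against X and against the full attribute set. The Python sketch below is exponential in the number of attributes, which is acceptable for slide-sized examples but not for large schemas. `is_bcnf` and the other helpers are hypothetical names. The usage lines apply the test to the relations in BCNF examples 1 to 3 of this lecture.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_bcnf(attrs: Set[str], fds: List[FD]) -> bool:
    """Every X must satisfy X+ = X or X+ = all attributes (closure restricted to attrs)."""
    for k in range(1, len(attrs)):
        for xs in combinations(sorted(attrs), k):
            x_plus = closure(xs, fds) & attrs
            if x_plus != set(xs) and x_plus != attrs:
                return False  # X determines extra attributes without being a superkey
    return True

F = [fd(["SSN"], ["name", "age"])]
print(is_bcnf({"SSN", "name", "age", "phoneNumber"}, F))  # False (example 1)
print(is_bcnf({"SSN", "name", "age"}, F))                 # True  (example 2)
print(is_bcnf({"SSN", "phoneNumber"}, F))                 # True  (example 3)
```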

BCNF EXAMPLE 1

SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒

• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 is a “bad” FD: it is non-trivial, but SSN is not a superkey
• The above relation is not in BCNF!

BCNF EXAMPLE 2

SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒

• key = {𝑆𝑆𝑁}
• The above relation is in BCNF!

BCNF EXAMPLE 3

SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221

• key = {𝑆𝑆𝑁, 𝑝ℎ𝑜𝑛𝑒𝑁𝑢𝑚𝑏𝑒𝑟}
• The above relation is in BCNF!
• Is it possible that a binary relation is not in BCNF?

BCNF DECOMPOSITION

• Find an FD that violates the BCNF condition: $A_1, A_2, \dots, A_n \to B_1, B_2, \dots, B_m$
• Decompose R into R1 and R2:
  – R1 contains the A’s and the B’s
  – R2 contains the A’s and the remaining attributes
• Continue until no BCNF violations are left
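Here is a minimal Python sketch of the decomposition loop. It makes two assumptions that go beyond the slide. First, violations are found with the brute-force closure test, so the approach is only practical for small schemas. Second, for a violating X, the B’s are taken to be everything X determines within the relation (X+ minus X), which is one common choice. All names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]
Attrs = FrozenSet[str]

def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel: Attrs, fds: List[FD]) -> Optional[Tuple[Attrs, Attrs]]:
    """Return (X, X+ within rel) for some X that breaks BCNF in rel, or None."""
    for k in range(1, len(rel)):
        for xs in combinations(sorted(rel), k):
            x_plus = frozenset(closure(xs, fds)) & rel
            if x_plus != frozenset(xs) and x_plus != rel:
                return frozenset(xs), x_plus
    return None

def bcnf_decompose(attrs: Iterable[str], fds: List[FD]) -> Set[Attrs]:
    pending = [frozenset(attrs)]
    done: Set[Attrs] = set()
    while pending:
        rel = pending.pop()
        violation = find_violation(rel, fds)
        if violation is None:
            done.add(rel)  # no BCNF violations left in this relation
            continue
        x, x_plus = violation
        pending.append(x_plus)              # R1: the A's together with the B's
        pending.append(x | (rel - x_plus))  # R2: the A's together with the remaining attributes
    return done

books_fds = [fd(["author"], ["gender"]), fd(["booktitle"], ["genre", "price"])]
for rel in bcnf_decompose(["author", "gender", "booktitle", "genre", "price"], books_fds):
    print(sorted(rel))
```

On the Books schema used later in this lecture, the sketch yields the same three relations as the slides, though the printing order follows Python’s set iteration.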

EXAMPLE

SSN        name   age  phoneNumber
934729837  Paris  24   608-374-8422
934729837  Paris  24   603-534-8399
123123645  John   30   608-321-1163
384475687  Arun   20   206-473-8221

• The FD 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒 violates BCNF
• Split into two relations R1, R2 as follows:
  – R1(SSN, name, age)
  – R2(SSN, phoneNumber)

EXAMPLE CONT’D

R1(SSN, name, age), with 𝑆𝑆𝑁 ⟶ 𝑛𝑎𝑚𝑒, 𝑎𝑔𝑒:

SSN        name   age
934729837  Paris  24
123123645  John   30
384475687  Arun   20

R2(SSN, phoneNumber):

SSN        phoneNumber
934729837  608-374-8422
934729837  603-534-8399
123123645  608-321-1163
384475687  206-473-8221

BCNF DECOMPOSITION PROPERTIES

The BCNF decomposition:
– removes certain types of redundancy
– is lossless-join
– is not always dependency preserving

BCNF IS LOSSLESS-JOIN

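Example: R(A,B,C) with 𝐴 ⟶ 𝐵 decomposes into R1(A,B) and R2(A,C)

• Here {A, B} ∩ {A, C} = {A} and 𝐴 ⟶ 𝐴, 𝐵, so criterion 1 holds
• The BCNF decomposition always satisfies the lossless-join criterion! When R is split on a violating FD with left-hand side A1, …, An, the two new relations share exactly the A’s, and the A’s determine every attribute of R1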

BCNF IS NOT DEPENDENCY PRESERVING

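R(A,B,C)
• 𝐴 ⟶ 𝐵
• 𝐵, 𝐶 ⟶ 𝐴

The BCNF decomposition is:
• R1(A,B) with FD 𝐴 ⟶ 𝐵
• R2(A,C) with no FDs

There may not exist any BCNF decomposition that is FD preserving! Here 𝐴 ⟶ 𝐵 violates BCNF (A is not a superkey), so any BCNF decomposition must split R. After the split, every relation lacks at least one of A, B, C, so 𝐵, 𝐶 ⟶ 𝐴 cannot be enforced within a single relation, and it is not implied by the FDs the smaller relations inherit.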

BCNF EXAMPLE (1)

Books(author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

What is the candidate key?
• (author, booktitle) is the only one!

Is it in BCNF?
• No, because the left-hand side of both (non-trivial) FDs is not a superkey!

BCNF EXAMPLE (2)

Books(author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author(author, gender)
  FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟, in BCNF!
• Books2(author, booktitle, genre, price)
  FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒, not in BCNF!

BCNF EXAMPLE (3)

Books(author, gender, booktitle, genre, price)
• 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟
• 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒

Splitting Books using the FD 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟:
• Author(author, gender)
  FD: 𝑎𝑢𝑡ℎ𝑜𝑟 ⟶ 𝑔𝑒𝑛𝑑𝑒𝑟, in BCNF!
• Splitting Books2(author, booktitle, genre, price):
  – BookInfo(booktitle, genre, price)
    FD: 𝑏𝑜𝑜𝑘𝑡𝑖𝑡𝑙𝑒 ⟶ 𝑔𝑒𝑛𝑟𝑒, 𝑝𝑟𝑖𝑐𝑒, in BCNF!
  – BookAuthor(author, booktitle), in BCNF!

THIRD NORMAL FORM (3NF)

3NF DEFINITION

A relation R is in 3NF if whenever 𝑋 ⟶ 𝐴 holds, one of the following is true:

• 𝐴 ∈ 𝑋 (trivial FD)
• X is a superkey
• A is part of some key of R (prime attribute)

BCNF implies 3NF!!
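Because the third condition refers to prime attributes, a 3NF test needs the candidate keys. The Python sketch below finds keys by brute force and then checks each given FD X ⟶ A, with right-hand sides split into single attributes. It assumes that F is the set of FDs of R itself; in that case, checking the FDs in F rather than all of F+ is sufficient. Function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attrs: Set[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Minimal attribute sets whose closure covers attrs, found by brute force."""
    keys: List[FrozenSet[str]] = []
    for k in range(1, len(attrs) + 1):
        for xs in combinations(sorted(attrs), k):
            x = frozenset(xs)
            if any(key <= x for key in keys):
                continue  # supersets of a key are superkeys, not keys
            if attrs <= closure(x, fds):
                keys.append(x)
    return keys

def is_3nf(attrs: Set[str], fds: List[FD]) -> bool:
    """Check each given FD X -> A, with right-hand sides split into single attributes."""
    prime: Set[str] = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        for a in rhs:
            if a in lhs:                    # trivial FD
                continue
            if attrs <= closure(lhs, fds):  # X is a superkey
                continue
            if a in prime:                  # A is part of some key
                continue
            return False
    return True

abc = [fd("AB", "C"), fd("C", "A")]
print(candidate_keys(set("ABC"), abc))  # two keys: {A, B} and {B, C}
print(is_3nf(set("ABC"), abc))          # True, although C -> A breaks BCNF

books = [fd(["author"], ["gender"]), fd(["booktitle"], ["genre", "price"])]
print(is_3nf({"author", "gender", "booktitle", "genre", "price"}, books))  # False
```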

3NF CONT’D

• Example: R(A,B,C) with 𝐴, 𝐵 ⟶ 𝐶 and 𝐶 ⟶ 𝐴
  – is in 3NF. Why?
  – is not in BCNF. Why?

• Compromise used when BCNF is not achievable: aim for BCNF and settle for 3NF

• A lossless-join and dependency-preserving decomposition into a collection of 3NF relations is always possible!

3NF ALGORITHM

1. Apply the algorithm for BCNF decomposition until all relations are in 3NF (we can stop earlier than BCNF)
2. Compute a minimal basis F’ of F
3. For each non-preserved FD 𝑋 ⟶ 𝐴 in F’, add a new relation R(X, A)
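Step 3 can be sketched in Python as follows. The sketch assumes that an FD of the minimal basis is a pair (X, A) with a single attribute on the right. It treats X ⟶ A as preserved when some relation already contains all of X and A. This is a simple sufficient test and an assumption here, not the lecture’s definition. The function name is hypothetical.

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], str]  # X -> A with one attribute on the right

def add_missing_relations(relations: List[FrozenSet[str]], basis: List[FD]) -> List[FrozenSet[str]]:
    """Add R(X, A) for every basis FD X -> A that no relation contains (assumed preservation test)."""
    result = list(relations)
    for lhs, a in basis:
        needed = lhs | {a}
        if not any(needed <= r for r in result):
            result.append(needed)  # new relation R(X, A)
    return result

# The 3NF example in this lecture: R1(B, C), R2(A, B, D) and the minimal basis F'.
relations = [frozenset("BC"), frozenset("ABD")]
basis = [(frozenset("A"), "D"), (frozenset("B"), "C"), (frozenset("D"), "A"), (frozenset("D"), "B")]
print(len(add_missing_relations(relations, basis)))  # 2: nothing to add

# The BAD EXAMPLE: R1(A, B), R2(A, C) with A -> B and B, C -> A.
bad = add_missing_relations([frozenset("AB"), frozenset("AC")],
                            [(frozenset("A"), "B"), (frozenset("BC"), "A")])
print(sorted(bad[-1]))  # ['A', 'B', 'C']
```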

3NF EXAMPLE (1)

Start with relation R(A,B,C,D) with FDs:
• 𝐴 ⟶ 𝐷
• 𝐴, 𝐵 ⟶ 𝐶
• 𝐴, 𝐷 ⟶ 𝐶
• 𝐵 ⟶ 𝐶
• 𝐷 ⟶ 𝐴, 𝐵

Step 1: find a BCNF decomposition (here by splitting on 𝐵 ⟶ 𝐶, since B+ = {B, C} is not all of R)
• R1(B, C)
• R2(A, B, D)

3NF EXAMPLE (2)

Start with relation R(A,B,C,D) with FDs:
• 𝐴 ⟶ 𝐷
• 𝐴, 𝐵 ⟶ 𝐶
• 𝐴, 𝐷 ⟶ 𝐶
• 𝐵 ⟶ 𝐶
• 𝐷 ⟶ 𝐴, 𝐵

Step 2: compute a minimal basis of the original set of FDs:
• 𝐴 ⟶ 𝐷
• 𝐵 ⟶ 𝐶
• 𝐷 ⟶ 𝐴
• 𝐷 ⟶ 𝐵
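Step 2 can be sketched with the usual three passes: split right-hand sides into single attributes, drop left-hand-side attributes that are not needed, then drop FDs implied by the rest. The Python sketch below uses hypothetical names and assumes single-letter attributes. It reproduces the basis above. Note that a minimal basis is not unique in general, so a different processing order could return a different but equally valid basis.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # X -> A, one attribute on the right

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def minimal_basis(fds: List[Tuple[str, str]]) -> List[FD]:
    """fds are (lhs, rhs) strings of single-letter attributes, e.g. ("D", "AB")."""
    # 1. Split right-hand sides: D -> A, B becomes D -> A and D -> B.
    basis: List[FD] = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # 2. Remove left-hand-side attributes that are not needed to derive A.
    for i, (lhs, a) in enumerate(basis):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, basis):
                lhs = lhs - {b}
        basis[i] = (lhs, a)
    basis = list(dict.fromkeys(basis))  # drop duplicates, keep order
    # 3. Remove FDs that the remaining ones already imply.
    for f in list(basis):
        rest = [g for g in basis if g != f]
        if f[1] in closure(f[0], rest):
            basis = rest
    return basis

F = [("A", "D"), ("AB", "C"), ("AD", "C"), ("B", "C"), ("D", "AB")]
for lhs, a in minimal_basis(F):
    print("".join(sorted(lhs)), "->", a)  # A -> D, B -> C, D -> A, D -> B
```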

3NF EXAMPLE (3)

Start with relation R(A,B,C,D) with FDs:
• 𝐴 ⟶ 𝐷
• 𝐴, 𝐵 ⟶ 𝐶
• 𝐴, 𝐷 ⟶ 𝐶
• 𝐵 ⟶ 𝐶
• 𝐷 ⟶ 𝐴, 𝐵

Step 3: add a new relation for any FD in the basis that is not preserved:
• all the dependencies in F’ are preserved: 𝐵 ⟶ 𝐶 lies within R1, and 𝐴 ⟶ 𝐷, 𝐷 ⟶ 𝐴, 𝐷 ⟶ 𝐵 lie within R2
• the resulting decomposition R1, R2 is also in BCNF!

IS NORMALIZATION ALWAYS GOOD?

• Example: suppose A and B are always used together, but normalization says they should be in different tables
  – decomposition might produce an unacceptable performance loss

• Example: data warehouses
  – huge historical DBs, rarely updated after creation
  – joins are expensive or impractical