CS 405G: Introduction to Database Systems 18. Normal Forms and Normalization

Upload: luke-clarke

Post on 13-Jan-2016


How to connect to the MySQL server

Know your user ID and password (provided by Paul Linton)

The server is installed on mysql.cs.uky.edu

04/21/23 Chen Qian @ Univ of Kentucky


Last class

• Functional dependency

• Normalization

• Decomposition


Review

Functional dependency X -> Y: X “determines” Y

If two rows agree on X, they must agree on Y

The attribute set on the LHS is known as the determinant

• X is a determinant of Y
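The rule "if two rows agree on X, they must agree on Y" can be tested directly on a table instance. The following is a minimal sketch, not part of the original slides: the helper name `fd_holds` and the sample rows are assumptions for illustration (the rows reuse the EID/PID/Ename/Hours attributes that appear in the decomposition slides). Note that an instance can only refute an FD; it cannot prove that the FD holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is not violated by this instance."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two rows agree on X but differ on Y
    return True


# Hypothetical sample rows for illustration only.
work_on = [
    {"EID": 1234, "PID": 10, "Ename": "John Smith", "Hours": 10},
    {"EID": 1123, "PID": 9, "Ename": "Ben Liu", "Hours": 40},
    {"EID": 1234, "PID": 9, "Ename": "John Smith", "Hours": 30},
]

print(fd_holds(work_on, ["EID"], ["Ename"]))  # True
print(fd_holds(work_on, ["EID"], ["Hours"]))  # False: EID 1234 has 10 and 30
```
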


Normalization

Normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency.

A normal form is a condition on a relation schema that certifies the schema is free of a particular kind of redundancy.


First Normal Form (1NF)

A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.


2nd Normal Form

An attribute A of a relation R is a nonprimary (nonprime) attribute if it is not part of any key of R; otherwise, A is a primary (prime) attribute.

R is in (general) 2nd normal form if no nonprimary attribute A in R is partially functionally dependent on any key of R.


Redundancy Example

Redundancy arises when a nonprimary attribute depends on only part of a key (a partial dependency).

e.g. with key {EID, PID}, the dependency EID, PID -> Ename is partial because EID -> Ename already holds. In this case, the attribute (Ename) should be moved, together with the part of the key it fully depends on (EID), into a new table.

So, to check whether a table includes this kind of redundancy, take every nonprimary attribute and check whether it fully depends on every key.


Decomposition

Decomposition eliminates redundancy

To get back to the original relation, use natural join.

EID   PID  Ename         email              Pname         Hours
1234  10   John Smith    [email protected]  B2B platform  10
1123  9    Ben Liu       [email protected]  CRM           40
1234  9    John Smith    [email protected]  CRM           30
1023  10   Susan Sidhuk  [email protected]  B2B platform  40

Decomposition

EID   Ename         email
1234  John Smith    [email protected]
1123  Ben Liu       [email protected]
1023  Susan Sidhuk  [email protected]

EID   PID  Pname         Hours
1234  10   B2B platform  10
1123  9    CRM           40
1234  9    CRM           30
1023  10   B2B platform  40

Foreign key: EID in the second table references EID in the first.


Decomposition

Decomposition may be applied recursively

EID   PID  Pname         Hours
1234  10   B2B platform  10
1123  9    CRM           40
1234  9    CRM           30
1023  10   B2B platform  40

Decomposed into:

PID  Pname
10   B2B platform
9    CRM

EID   PID  Hours
1234  10   10
1123  9    40
1234  9    30
1023  10   40
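The claim that a natural join gets back the original relation can be checked on the (EID, PID, Pname, Hours) table above. This is a minimal sketch using plain Python dictionaries; the helpers `project`, `natural_join`, and `as_set` are hypothetical names written for this illustration, not functions from any database library.

```python
def project(rows, attrs):
    """Projection with duplicate elimination, keeping first-seen order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left, right):
    """Combine rows that agree on every attribute the two inputs share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-insensitive view of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


rows = [
    {"EID": 1234, "PID": 10, "Pname": "B2B platform", "Hours": 10},
    {"EID": 1123, "PID": 9, "Pname": "CRM", "Hours": 40},
    {"EID": 1234, "PID": 9, "Pname": "CRM", "Hours": 30},
    {"EID": 1023, "PID": 10, "Pname": "B2B platform", "Hours": 40},
]

projects = project(rows, ["PID", "Pname"])      # each project name stored once
hours = project(rows, ["EID", "PID", "Hours"])
rejoined = natural_join(hours, projects)

print(len(projects))                     # 2
print(as_set(rejoined) == as_set(rows))  # True
```

The project name is now stored once per project instead of once per (employee, project) pair, and the join on PID restores every original row.
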


Questions about decomposition

When to decompose

How to come up with a correct decomposition (i.e., lossless join decomposition)


Third normal form

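• 3NF requires that there are no non-trivial functional dependencies of non-key attributes on anything other than a superset of a candidate key (a superkey).

• Recall: an FD X -> Y is non-trivial when Y is not a subset of X; it is completely non-trivial when the LHS has no intersection with the RHS.

• In summary, all non-key attributes are mutually independent: none is determined by other non-key attributes.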


There is no FD between Ename and email, so this relation is in 3NF

EID   Ename         email
1234  John Smith    [email protected]
1123  Ben Liu       [email protected]
1023  Susan Sidhuk  [email protected]


Boyce-Codd normal form (BCNF)

• BCNF requires that there are no non-trivial functional dependencies of attributes on anything other than a superset of a candidate key (called a superkey).

• All attributes depend on a key, the whole key, and nothing but the key (excluding trivial dependencies, like A -> A).


• A table is in BCNF if and only if it is in 3NF and every non-trivial, left-irreducible functional dependency has a candidate key as its determinant.

• In more informal terms, a table is in BCNF if it is in 3NF and the only determinants are the candidate keys.


Non-key FDs

Consider a non-trivial FD X -> Y where X is not a super key

Since X is not a super key, there are some attributes (say Z) that are not functionally determined by X

X  Y  Z
a  b  c
a  b  d

That b is always associated with a is recorded by multiple rows: redundancy, update anomaly, deletion anomaly


Dealing with Nonkey Dependency: BCNF

A relation R is in Boyce-Codd Normal Form if
• For every non-trivial FD X -> Y in R, X is a super key
• That is, all FDs follow from “key -> other attributes”

When to decompose
• As long as some relation is not in BCNF

How to come up with a correct decomposition
• Always decompose on a BCNF violation (details next)
• Then it is guaranteed to be a lossless join decomposition


BCNF decomposition algorithm

Find a BCNF violation
• That is, a non-trivial FD X -> Y in R where X is not a super key of R

Decompose R into R1 and R2, where
• R1 has attributes X Y
• R2 has attributes X Z, where Z contains all attributes of R that are in neither X nor Y (i.e. Z = attr(R) – X – Y)

Repeat until all relations are in BCNF
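The steps above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the course's reference implementation: FDs are represented as (lhs, rhs) pairs of sets, the helper names `closure`, `find_violation`, `bcnf_decompose`, and `fd` are hypothetical, and violations are found by brute force over attribute subsets using attribute closure (which takes Y to be everything X determines inside the relation, a common variant of the step on the slide). The input is the Property relation from the exercise in these slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) with X -> Y non-trivial in rel and X not a superkey, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds)
            y = (x_plus & rel) - x          # what X determines inside rel
            if y and not rel <= x_plus:     # non-trivial, and X is not a superkey
                return x, y
    return None


def bcnf_decompose(rel, fds):
    """Split rel on violations until every piece is in BCNF."""
    rel = set(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, y = violation
    r1 = x | y       # R1 has attributes X Y
    r2 = rel - y     # R2 has attributes X Z, with Z = attr(R) - X - Y
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return set(lhs.split()), set(rhs.split())


property_fds = [
    fd("Property_id#", "County_name Lot# Area Price Tax_rate"),
    fd("County_name Lot#", "Property_id# Area Price Tax_rate"),
    fd("County_name", "Tax_rate"),
    fd("Area", "Price"),
]
result = bcnf_decompose(
    "Property_id# County_name Lot# Area Price Tax_rate".split(), property_fds)
for r in result:
    print(sorted(r))
```

The sketch happens to split on Area -> Price before County_name -> Tax_rate, but for this input it ends with the same three relations as the worked exercise: (Area, Price), (County_name, Tax_rate), and (Property_id#, County_name, Lot#, Area). In general, different violation orders can produce different BCNF decompositions.
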


BCNF decomposition example

WorkOn (EID, Ename, email, PID, hours)
BCNF violation: EID -> Ename, email

Decompose into:
• Student (EID, Ename, email): BCNF
• Grade (EID, PID, hours): BCNF


Another example

WorkOn (EID, Ename, email, PID, hours)
BCNF violation: email -> EID

Decompose into:
• StudentID (email, EID): BCNF
• StudentGrade’ (email, Ename, PID, hours)
  BCNF violation: email -> Ename

Decompose StudentGrade’ into:
• StudentName (email, Ename): BCNF
• Grade (email, PID, hours): BCNF


Exercise

Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate)

• Property_id# -> County_name, Lot#, Area, Price, Tax_rate
• County_name, Lot# -> Property_id#, Area, Price, Tax_rate
• County_name -> Tax_rate
• Area -> Price


Exercise

Property(Property_id#, County_name, Lot#, Area, Price, Tax_rate)
BCNF violation: County_name -> Tax_rate

Decompose into:
• LOTS1 (County_name, Tax_rate): BCNF
• LOTS2 (Property_id#, County_name, Lot#, Area, Price)
  BCNF violation: Area -> Price

Decompose LOTS2 into:
• LOTS2A (Area, Price): BCNF
• LOTS2B (Property_id#, County_name, Lot#, Area): BCNF


Why is BCNF decomposition lossless

Given non-trivial X -> Y in R where X is not a super key of R, need to prove:

• Anything we project always comes back in the join: $R \subseteq \pi_{XY}(R) \bowtie \pi_{XZ}(R)$
  Sure; and it doesn’t depend on the FD

• Anything that comes back in the join must be in the original relation: $R \supseteq \pi_{XY}(R) \bowtie \pi_{XZ}(R)$
  Proof makes use of the fact that X -> Y
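The two directions can be written out as a short proof. This is a standard textbook argument, given here as a sketch; the tuple names $t$, $u$, $v$ and the bracket notation $t[XY]$ (the restriction of $t$ to the attributes $XY$) are notation introduced for this illustration, with $Z = \mathrm{attr}(R) - X - Y$ as in the decomposition algorithm.

```latex
\textbf{Claim.} If $X \to Y$ holds in $R$, then
\[
  R = \pi_{XY}(R) \bowtie \pi_{XZ}(R).
\]

\textbf{($\subseteq$)} Take any $t \in R$. Then $t[XY] \in \pi_{XY}(R)$ and
$t[XZ] \in \pi_{XZ}(R)$, and the two agree on $X$, so the join contains $t$.
No FD is needed for this direction.

\textbf{($\supseteq$)} Take any $t$ in the join. By the definition of
projection and join there are $u, v \in R$ with
\[
  u[XY] = t[XY], \qquad v[XZ] = t[XZ].
\]
Hence $u[X] = t[X] = v[X]$, and $X \to Y$ gives $u[Y] = v[Y]$. Therefore
\[
  v[Y] = u[Y] = t[Y] \quad\text{and}\quad v[XZ] = t[XZ]
  \;\Longrightarrow\; v = t, \text{ so } t \in R.
\]
```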


Recap

Functional dependencies: a generalization of the key concept

Partial dependencies: a source of redundancy
• Use 2nd normal form to remove partial dependencies

Non-key functional dependencies: a source of redundancy
• BCNF decomposition: a method for removing ALL functional dependency related redundancies
• Plus, BCNF decomposition is a lossless join decomposition

Normalization

There is a sequence to normal forms: 1NF is considered the weakest, 2NF is stronger than 1NF, 3NF is stronger than 2NF, and BCNF is the strongest of the four.

Also, any relation that is in BCNF is in 3NF; any relation in 3NF is in 2NF; and any relation in 2NF is in 1NF.

Normalization

[Diagram: nested sets, BCNF inside 3NF inside 2NF inside 1NF]

A relation in BCNF is also in 3NF

A relation in 3NF is also in 2NF

A relation in 2NF is also in 1NF

First Normal Form

The following is not in 1NF

EmpNum  EmpPhone  EmpDegrees
123     233-9876
333     233-1231  BA, BSc, PhD
679     233-1231  BSc, MSc

EmpDegrees is a multi-valued field:

employee 679 has two degrees: BSc and MSc

employee 333 has three degrees: BA, BSc, PhD

First Normal Form

Employee

EmpNum  EmpPhone
123     233-9876
333     233-1231
679     233-1231

EmployeeDegree

EmpNum  EmpDegree
333     BA
333     BSc
333     PhD
679     BSc
679     MSc

An outer join between Employee and EmployeeDegree will produce the information we saw before
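The move to 1NF shown here can be sketched in Python. The helper names `to_1nf` and `left_outer_join` are hypothetical, and storing the multi-valued field as a comma-separated string is an assumption made for the illustration. An outer join (rather than an inner join) is needed because employee 123 has no degrees and would otherwise disappear.

```python
# The non-1NF table: EmpDegrees holds several values in one field.
employees = [
    {"EmpNum": 123, "EmpPhone": "233-9876", "EmpDegrees": ""},
    {"EmpNum": 333, "EmpPhone": "233-1231", "EmpDegrees": "BA, BSc, PhD"},
    {"EmpNum": 679, "EmpPhone": "233-1231", "EmpDegrees": "BSc, MSc"},
]


def to_1nf(rows):
    """Split into Employee(EmpNum, EmpPhone) and EmployeeDegree(EmpNum, EmpDegree)."""
    employee, employee_degree = [], []
    for row in rows:
        employee.append({"EmpNum": row["EmpNum"], "EmpPhone": row["EmpPhone"]})
        for degree in row["EmpDegrees"].split(","):
            if degree.strip():  # skip the empty entry for employees with no degree
                employee_degree.append(
                    {"EmpNum": row["EmpNum"], "EmpDegree": degree.strip()})
    return employee, employee_degree


def left_outer_join(employee, employee_degree):
    """Keep every employee; employees without a degree get EmpDegree = None."""
    out = []
    for e in employee:
        matches = [d for d in employee_degree if d["EmpNum"] == e["EmpNum"]]
        if not matches:
            out.append({**e, "EmpDegree": None})
        out.extend({**e, **d} for d in matches)
    return out


employee, employee_degree = to_1nf(employees)
joined = left_outer_join(employee, employee_degree)
for row in joined:
    print(row)
```
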

Second Normal Form

Consider this InvLine table (in 1NF):

InvLine (InvNum, LineNum, ProdNum, Qty, InvDate)

InvNum, LineNum -> ProdNum, Qty
InvNum, ProdNum -> LineNum, Qty
InvNum -> InvDate

There are two candidate keys: {InvNum, LineNum} and {InvNum, ProdNum}.

Qty and InvDate are the non-key attributes, and InvDate is dependent on InvNum alone.

InvLine is not 2NF since there is a partial dependency of InvDate on InvNum

Since there is a determinant that is not a candidate key, InvLine is not BCNF

InvLine is only in 1NF

Second Normal Form

InvLine (InvNum, LineNum, ProdNum, Qty, InvDate)

The above relation has redundancies: the invoice date is repeated on each invoice line.

We can improve the database by decomposing the relation into two relations:

(InvNum, LineNum, ProdNum, Qty)

(InvNum, InvDate)
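A partial dependency like the one on InvDate can be found mechanically by computing the closure of each proper subset of a key. The following is a minimal sketch: the helper names `closure` and `partial_dependencies` are hypothetical, and the FD list is an assumption read off the InvLine discussion ({InvNum, LineNum} as a key determining ProdNum and Qty, and InvNum -> InvDate).

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (part, attr) pairs where a proper subset of key determines a non-key attr."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & nonprime):
                found.append((set(part), attr))
    return found


inv_fds = [
    ({"InvNum", "LineNum"}, {"ProdNum", "Qty"}),  # assumed from the key discussion
    ({"InvNum"}, {"InvDate"}),                    # the partial dependency
]
key = {"InvNum", "LineNum"}

# Original InvLine: InvDate depends on InvNum alone.
print(partial_dependencies(key, {"Qty", "InvDate"}, inv_fds))
# After moving InvDate into its own (InvNum, InvDate) relation: nothing left.
print(partial_dependencies(key, {"Qty"}, inv_fds[:1]))
```
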

Third Normal Form

Consider this Employee relation

Employee (EmpNum, EmpName, DeptNum, DeptName)

Candidate keys are? …

EmpName, DeptNum, and DeptName are non-key attributes.

DeptNum determines DeptName, a non-key attribute, and DeptNum is not a candidate key.

Is the relation in 2NF? … yes

Is the relation in 3NF? … no

Is the relation in BCNF? … no

Third Normal Form

Employee (EmpNum, EmpName, DeptNum, DeptName)

We correct the situation by decomposing the original relation into two 3NF relations. Note the decomposition is lossless.

(EmpNum, EmpName, DeptNum)

(DeptNum, DeptName)

Verify these two relations are in 3NF.

Are they in BCNF?
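The normal-form questions about the Employee relation can be checked with a small brute-force sketch. All helper names here (`closure`, `candidate_keys`, `is_2nf`, `is_3nf`, `is_bcnf`) are hypothetical, and the FD EmpNum -> EmpName, DeptNum is an assumption (the slides only state that DeptNum determines DeptName). The 3NF and BCNF tests only inspect the FDs passed in, which is sufficient when those FDs are stated over the relation being tested.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(rel, fds):
    """Minimal attribute sets whose closure covers rel (brute force)."""
    keys = []
    for size in range(1, len(rel) + 1):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(x, fds) >= rel:
                keys.append(x)
    return keys


def is_2nf(rel, fds):
    keys = candidate_keys(rel, fds)
    prime = set().union(*keys)
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) - prime:
                    return False  # part of a key determines a non-key attribute
    return True


def is_3nf(rel, fds):
    prime = set().union(*candidate_keys(rel, fds))
    return all(closure(lhs, fds) >= rel or (rhs - lhs) <= prime
               for lhs, rhs in fds)


def is_bcnf(rel, fds):
    return all(rhs <= lhs or closure(lhs, fds) >= rel for lhs, rhs in fds)


employee = {"EmpNum", "EmpName", "DeptNum", "DeptName"}
employee_fds = [
    ({"EmpNum"}, {"EmpName", "DeptNum"}),  # assumed: EmpNum identifies an employee
    ({"DeptNum"}, {"DeptName"}),           # stated on the slide
]

print(candidate_keys(employee, employee_fds))
print(is_2nf(employee, employee_fds),
      is_3nf(employee, employee_fds),
      is_bcnf(employee, employee_fds))
```

Under these assumptions the sketch reports EmpNum as the only candidate key and answers yes, no, no for 2NF, 3NF, and BCNF. Running the same checks on each decomposed relation with its own FD reports both as BCNF.
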

Boyce-Codd Normal Form

BCNF is defined very simply:

a relation is in BCNF if it is in 1NF and if every determinant is a candidate key.


(student_no, course_no, instr_no)

Instructor teaches one course only.

Student takes a course and has one instructor.

In 3NF, but not in BCNF:

{student_no, course_no} -> instr_no
instr_no -> course_no

since we have instr_no -> course_no, but instr_no is not a candidate key.


(student_no, course_no, instr_no) is decomposed into two relations, both in BCNF:

(course_no, instr_no)

(student_no, instr_no)

{student_no, instr_no} -> student_no
{student_no, instr_no} -> instr_no
instr_no -> course_no

Boyce-Codd Normal Form

(InvNum, LineNum, ProdNum, Qty)

InvNum, LineNum -> ProdNum, Qty
InvNum, ProdNum -> LineNum, Qty

This relation is about Invoice lines only.

There are two candidate keys: {InvNum, LineNum} and {InvNum, ProdNum}.

Since every determinant is a candidate key, the relation is in BCNF


inv_no line_no prod_no prod_desc qty


2NF, but not in 3NF, nor in BCNF:

(inv_no, line_no, prod_no, prod_desc, qty)

since prod_no is not a candidate key and we have:

prod_no -> prod_desc.


EmployeeDept

ename ssn bdate address dnumber dname


Summary

Philosophy behind BCNF, 4NF: Data should depend on the key, the whole key, and nothing but the key!

Philosophy behind 3NF: … But not at the expense of more expensive constraint enforcement!


Next Class

Quiz again
