1 schema refinement and normal forms chapter 19 raghu ramakrishnan and j. gehrke (second text book)...

37
Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

Post on 21-Dec-2015

226 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

1

Schema Refinement and Normal Forms

Chapter 19Raghu Ramakrishnan and J. Gehrke

(second text book)In Course Pick-up box tomorrow

Page 2: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

2

ReviewAvoiding the expense of global integrity constraints

e.g.: lno bname preserved by:

CREATE ASSERTION lno-bname CHECK ( NOT EXIST (SELECT * FROM loan-info l1, loan-info l2 WHERE l1.lno = l2.lno AND l1.bname <> l2.bname))

Expensive, requires a join for every insertion

Reducing the expense:1. Determine FD set for loan-info, F

2. Find “minimal set”, G, s.t. F= G

Page 3: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

3

Functional Dependencies (FDs)

A functional dependency X Y holds over relation R if, for every allowable instance r of R: t1 r, t2 r, (t1) = (t2) implies (t1) =

(t2) i.e., given two tuples in r, if the X values agree, then the Y

values must also agree. (X and Y are sets of attributes.) An FD is a statement about all allowable relations.

Must be identified based on semantics of application. Given some allowable instance r1 of R, we can check if it

violates some FD f, but we cannot tell if f holds over R! K is a candidate key for R means that K R

However, K R does not require K to be minimal!

X X Y Y

Page 4: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

4

Reasoning About FDs

Given some FDs, we can usually infer additional FDs: ssn did, did lot implies ssn lot

An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold. = closure of F is the set of all FDs that are implied by F.

Armstrong’s Axioms (X, Y, Z are sets of attributes): Reflexivity: If X Y, then Y X Augmentation: If X Y, then XZ YZ for any Z Transitivity: If X Y and Y Z, then X Z

These are sound and complete inference rules for FDs!

F

Page 5: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

5

Armstrong’s AxiomsA. Fundamental Rules (W, X, Y, Z: sets of attributes) 1. Reflexivity If Y X then X Y 2. Augmentation If X Y then WX WY 3. Transitivity If X Y and Y Z then XZB. Additional rules (can be proved from A) 4. UNION: If X Y and X Z then X YZ 5. Decomposition: If X YZ then X Y, X Z 6. Pseudotransitivity: If X Y and WY Z then

WX Z

Page 6: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

6

Functional DependenciesA B C D

a 1 U a 1 V a 5 W b 3 W b 3 W

A B C “ AB determines C”

two tuples with the same values for A and B will also have the same value for C

Page 7: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

7

Functional Dependencies (Cont.)

K is a superkey for relation schema R if and only if K R K is a candidate key for R if and only if

K R, and for no K, R

Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:

bor_loan = (customer_id, loan_number, amount ).

We expect this functional dependency to hold:

loan_number amount

but would not expect the following to hold:

amount customer_name

Page 8: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

8

Use of Functional Dependencies

We use functional dependencies to: test relations to see if they are legal under a given set of

functional dependencies. • If a relation r is legal under a set F of functional

dependencies, we say that r satisfies F. specify constraints on the set of legal relations

• We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.

Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of loan may, by chance,

satisfy amount customer_name.

Page 9: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

9

Functional Dependencies (Cont.)

A functional dependency is trivial if it is satisfied by all instances of a relation Example:

• customer_name, loan_number customer_name

• customer_name customer_name In general, is trivial if

Page 10: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

10

Another use of FD’s: Schema DesignExample:

bname bcity assets cname lno amt

Downtown Bkln 9M Jones L-17 1000 Downtown Bkln 9M Johnson L-23 2000 Mianus Horse 1.7M Jones L-93 500 Downtown Bkln 9M Hayes L-17 1000

R =

R: “Universal relation” tuple meaning: Jones has a loan (L-17) for $1000 taken out at the Downtown branch in Bkln which has assets of $9M

Design: + : fast queries (no need for joins!) - : redudancy: update anomalies examples? deletion anomalies

Page 11: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

11

The Evils of Redundancy

A first-cut design Universal relation (or few relations) Redundantly store some columns in multiple tables

Redundancy is at the root of several problems associated with relational schemas: redundant storage, performance (of updates,…)

suffers insert/delete/update anomalies

Page 12: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

12

Example: Constraints on Entity Set

Consider relation obtained from Hourly_Emps: Hourly_Emps (ssn, name, lot, rating, hrly_wages,

hrs_worked) Notation: We will denote this relation schema

by listing the attributes: SNLRWH This is really the set of attributes {S,N,L,R,W,H}. Sometimes, we will refer to all attributes of a

relation by using the relation name. (e.g., Hourly_Emps for SNLRWH)

Page 13: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

13

Example (Contd.)

S N L R W H

123-22-3666 Attishoo 48 8 10 40

231-31-5368 Smiley 22 8 10 30

131-24-3650 Smethurst 35 5 7 30

434-26-3751 Guldu 35 5 7 32

612-67-4134 Madayan 35 8 10 40

No Fuctional Dependencies Any instance is legal here (no constraints) No redundany

Hourly_Emps relation

Page 14: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

14

Example: Constraints on Entity Set

Add Some FDs on Hourly_Emps: ssn is the key: S SNLRWH

• Values of SSN have to be unique in the relation rating determines hrly_wages: R W

• For two rows that have same value of R, the rows also have same value for W

• Above relation stores R and W values “redundantly”

S N L R W H

123-22-3666 Attishoo 48 8 10 40

231-31-5368 Smiley 22 8 10 30

131-24-3650 Smethurst 35 5 7 30

434-26-3751 Guldu 35 5 7 32

612-67-4134 Madayan 35 8 10 40

Hourly_Emps relation

Page 15: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

15

Example (Contd.) Problems due to R W :

Update anomaly: Can we change W in just the 1st tuple of SNLRWH?

Insertion anomaly: What if we want to insert an employee and don’t know the hourly wage for his rating?

Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

S N L R W H

123-22-3666 Attishoo 48 8 10 40

231-31-5368 Smiley 22 8 10 30

131-24-3650 Smethurst 35 5 7 30

434-26-3751 Guldu 35 5 7 32

612-67-4134 Madayan 35 8 10 40

Hourly_Emps

Will 2 smaller tables be better?

Page 16: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

16

Example (Contd.)

S N L R H

123-22-3666 Attishoo 48 8 40

231-31-5368 Smiley 22 8 30

131-24-3650 Smethurst 35 5 30

434-26-3751 Guldu 35 5 32

612-67-4134 Madayan 35 8 40

R W

8 10

5 7

Hourly_Emps2 Wages

Will 2 smaller tables be better? Yes

But: what criteria should the new ‘smaller’ tables satisfy so that you can stop decomposition? What is a good design?

---- Normal Forms

Solution to avoid redundancy: Decomposition to smaller tables

Page 17: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

17

Goals of Decomposition1. Lossless Joins Want to be able to reconstruct big (e.g. universal) relation by joining smaller ones (using natural joins) (i.e. R1 R2 = R)

2. Dependency preservation Want to minimize the cost of global integrity constraints based on FD’s ( i.e. avoid big joins in assertions)

3. Redundancy Avoidance/Minimization Avoid/minimize unnecessary data duplication (the motivation for decomposition)

Why important? LJ : information loss DP: efficiency (time) RA: efficiency (space), update anomalies

Normal Forms

Page 18: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

18

Decomposition1. Decomposing the schema R = ( bname, bcity, assets, cname, lno, amt)

R1 = (bname, bcity, assets, cname) R1 = (cname, lno, amt)

2. Decomposing the instance

R = R1 U R2

bname bcity assets cname lno amt

Downtown Bkln 9M Jones L-17 1000 Downtown Bkln 9M Johnson L-23 2000 Mianus Horse 1.7M Jones L-93 500 Downtown Bkln 9M Hayes L-17 1000

bname bcity assets cname

Downtown Bkln 9M Jones Downtown Bkln 9M Johnson Mianus Horse 1.7M Jones Downtown Bkln 9M Hayes

cname lno amt

Jones L-17 1000 Johnson L-23 2000 Jones L-93 500 Hayes L-17 1000

Page 19: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

19

Dependency Goal #1: lossless joins

A bad decomposition:

bname bcity assets cname

Downtown Bkln 9M Jones Downtown Bkln 9M Johnson Mianus Horse 1.7M Jones Downtown Bkln 9M Hayes

cname lno amt

Jones L-17 1000 Johnson L-23 2000 Jones L-93 500 Hayes L-17 1000

=

bname bcity assets cname lno amt

Downtown Bkln 9M Jones L-17 1000 Downtown Bkln 9M Jones L-93 500 Downtown Bkln 9M Johnson L-23 2000 Mianus Horse 1.7M Jones L-17 1000 Mianus Horse 1.7M Jones L-93 500 Downtown Bkln 9M Hayes L-17 1000

Problem: join adds meaningless tuples “lossy join”: by adding noise, have lost meaningful information

bname bcity assets cname lno amt

Downtown Bkln 9M Jones L-17 1000 Downtown Bkln 9M Johnson L-23 2000 Mianus Horse 1.7M Jones L-93 500 Downtown Bkln 9M Hayes L-17 1000

Page 20: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

20

Dependency Goal #1: lossless joinsIs the following decomposition lossless or lossy?

bname assets cname lno

Downtown 9M Jones L-17 Downtown 9M Johnson L-23 Mianus 1.7M Jones L-93 Downtown 9M Hayes L-17

lno bcity amt

L-17 Bkln 1000 L-23 Bkln 2000 L-93 Horse 500

Ans: Lossless: R = R1 R2, it has same 4 tuples as original

R1 and R2 share the lno which is the key to R2.

Page 21: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

21

Ensuring Lossless Joins

A decomposition of R : R = R1 union R2Is lossless iff

R1 R2 R1, or

R1 R2 R2

(i.e., intersecting attributes must for a superkey for one of the resulting smaller relations)

In the previous example, lno is the common attribute and lno is the key to second relation R2

Page 22: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

22

More on Lossless Join

The decomposition of R into X and Y is lossless-join wrt F if and only if the closure of F contains: X Y X, or X Y Y

In particular, the decomposition of R into R - V and UV is lossless-join if U V holds over R (i.e., U is key of second relation)

A B C1 2 34 5 67 2 81 2 87 2 3

A B C1 2 34 5 67 2 8

A B1 24 57 2

B C2 35 62 8

Page 23: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

23

Decomposition Goal #2: Dependency preservationGoal: efficient integrity checks of FD’s

An example w/ no DP:R = ( bname, bcity, assets, cname, lno, amt) bname bcity assets lno amt bname

Decomposition: R = R1 U R2 R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt)

Lossless but not DP. Why?

Ans: bname bcity assets crosses 2 tables

Goal #2: Attributes in a dependency should be in a single relation

Page 24: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

24

Decomposition Goal #3: Redundancy Avoidance/MinimizationRedundancy

for B=x , y and zExample: A B C

a x 1 e x 1 g y 2 h y 2 m y 2 n z 1 p z 1

(1) An FD that exists in the above relation is: B C

(2) A superkey in the above relation is A, (or any set containing A)

When do you have redundancy? Ans: when there is some FD, XY covered by a relation and X is not a superkey

Criteria to determine if a design avoids/minimizes redundancy Normal Forms

Page 25: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

25

Normal Forms

Criteria to decide whether or not a design is good

Minimize/avoid the problems due to redundancy

Examples First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form

Most important BCNF and 3NF

More restrictive

Page 26: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

26

Boyce-Codd Normal Form (BCNF)

Reln R with FDs F is in BCNF if, for all X A in A X (called a trivial FD), or X is a super key for R (i.e., X contains a candidate key for

R) In other words, R is in BCNF if the only non-trivial

FDs that hold over R are key constraints. No dependency in R that can be predicted using FDs

alone. If we are shown two tuples that agree upon

the X value, we cannot infer the A value in one tuple from the A value in the other.

If example relation is in BCNF, the 2 tuples must be identical (since X is a key).

F

X Y A

x y1 a

x y2 ?

Page 27: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

27

Boyce-Codd Normal Form

is trivial (i.e., ) is a superkey for R

A relation schema R is in BCNF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form

where R and R, at least one of the following holds:

Example schema not in BCNF:

bor_loan = ( customer_id, loan_number, amount )

because loan_number amount holds on bor_loan but loan_number is not a superkey

Page 28: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

28

Decomposing a Schema into BCNF Suppose we have a schema R and a non-trivial

dependency causes a violation of BCNF.We decompose R into:

• (U )

• ( R - ( - ) )

In our example that is not in BCNF, bor_loan = ( customer_id, loan_number,

amount ) with FD loan_number amount

= loan_number = amount

and bor_loan is replaced by a BCNF (U ) = ( loan_number, amount ) ( R - ( - ) ) = ( customer_id, loan_number )

Page 29: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

29

BCNF and Dependency Preservation

Constraints, including functional dependencies, are costly to check in practice unless they pertain to only one relation

If it is sufficient to test only those dependencies on each individual relation of a decomposition in order to ensure that all functional dependencies hold, then that decomposition is dependency preserving.

Because it is not always possible to achieve both BCNF and dependency preservation, we consider a weaker normal form, known as third normal form.

Page 30: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

30

Third Normal Form (3NF)

Reln R with FDs F is in 3NF if, for all X A in1. A X (called a trivial FD), or2. X contains a key for R, or3. A is part of some key(i.e., candidate key) for R.

Minimality of a key is crucial in third condition above! i.e., Can only have candidate keys

If R is in BCNF, obviously in 3NF. If R is in 3NF, some redundancy is possible.

(why? Because (3) allows non-key based “X->A” dependencies) It is a compromise, used when BCNF not achievable (e.g., no

``good’’ decomp, or performance considerations). Lossless-join, dependency-preserving decomposition of R into a

collection of 3NF relations always possible.

F

Page 31: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

31

3NF example

R = (custid, empid, bname, type) F = {empid bname} Other candidate keys {custid, bname} Not in BCNF because empid is not a

superkey In 3NF because bname is in a candidate

key

Page 32: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

32

3NF Decomposition Algorithm

Let Fc be a canonical cover for F;i := 0;for each functional dependency in Fc do

if none of the schemas Rj, 1 j i contains then begin

i := i + 1;Ri :=

endif none of the schemas Rj, 1 j i contains a candidate key

for Rthen begin

i := i + 1;Ri := any candidate key for R;

end return (R1, R2, ..., Ri)

Page 33: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

33

3NF Decomposition Algorithm (Cont.)

Above algorithm ensures: each relation schema Ri is in 3NF

decomposition is dependency preserving and lossless-join

Page 34: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

34

3NF Decomposition: An Example

Relation schema:cust_banker_branch = (customer_id, employee_id,

branch_name, type ) The functional dependencies for this relation schema

are: customer_id, employee_id branch_name, type employee_id branch_name customer_id, branch_name employee_id

We first compute a canonical cover (or minimal cover) FC

branch_name is extraneous in the r.h.s. of the 1st dependency No other attribute is extraneous, so we get FC =

customer_id, employee_id type employee_id branch_name

customer_id, branch_name employee_id

Page 35: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

35

3NF Decompsition Example (Cont.) The for loop generates following 3NF schema:

(customer_id, employee_id, type )

(employee_id, branch_name)

(customer_id, branch_name, employee_id) Observe that (customer_id, employee_id, type ) contains a

candidate key of the original schema, so no further relation schema needs be added

Page 36: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

36

Summary of Schema Refinement

If a relation is in BCNF, it is free of redundancies that can be detected using FDs. Thus, trying to ensure that all relations are in BCNF is a good heuristic.

If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations. Must consider whether all FDs are preserved. If a

lossless-join, dependency preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), should consider decomposition into 3NF.

Decompositions should be carried out and/or re-examined while keeping performance requirements in mind.

Page 37: 1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow

37

Theory and practicePerformance tuning:

Redundancy not the sole guide to decomposition

Workload matters too!!• nature of queries run• mix of updates, queries•.....

Workload (mix of queries, updates,…) can influence: BCNF vs 3NF may further decompose a BCNF into (4NF) may denormalize (i.e., undo a decomposition or add new columns For optimizing query workloads: Materialized views,…