More on Normalization

CSC 240 (Blum) 1 More on Normalization Based on Chapter 5 in Rob and Coronel and Chapter 13 in Connolly and Begg

Upload: misty

Post on 13-Jan-2016


Page 1: More on Normalization

CSC 240 (Blum) 1

More on Normalization

Based on Chapter 5 in Rob and Coronel and Chapter 13 in Connolly and Begg

First and Second Normal Form

A table is in First Normal Form (1NF) if it has no multi-valued fields.

A table is in Second Normal Form (2NF) if it is in First Normal Form and has no partial dependencies on the primary key.

Recall that moving from an un-normalized table to 1NF increases redundancy, but the purpose of the other stages of normalization is to decrease redundancy.

Some Redundancy May Remain at 2NF

stateSymbol  stateName     stateCapital  lastName  firstName
PA           Pennsylvania  Harrisburg    Borski    Robert A.
PA           Pennsylvania  Harrisburg    Brady     Robert A.
PA           Pennsylvania  Harrisburg    Coyne     William J.
…            …             …             …         …

In the above example, the firstName, lastName combination may serve as the primary key. Each of the other attributes (stateSymbol, stateName and stateCapital) is fully dependent on the primary key, so the table is in Second Normal Form. But clearly there is still redundancy.

Third Normal Form

Eliminating any field (via table decomposition) that transitively depends on the primary key puts the table into Third Normal Form (provided the table was in Second Normal Form prior to decomposition).

Recall that if A → B (attribute A is a determinant of attribute B) and B → C, then clearly A → C. C is said to be transitively dependent on A provided that C → A is not true.
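The transitivity rule can be applied mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `closure` and the encoding of each dependency as a (left side, right side) pair of sets are assumptions made for illustration. It computes everything a set of attributes determines, using the House member attributes named earlier.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already known, the right side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Dependencies from the House member example (A -> B, then B -> C).
fds = [
    ({"firstName", "lastName"}, {"stateSymbol"}),
    ({"stateSymbol"}, {"stateName", "stateCapital"}),
]

name_closure = closure({"firstName", "lastName"}, fds)
state_closure = closure({"stateSymbol"}, fds)

print(sorted(name_closure))   # includes stateName and stateCapital, via stateSymbol
print(sorted(state_closure))  # does not include firstName or lastName
```

The second closure shows the "C → A is not true" condition: the state attributes do not lead back to the name.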

The firstName-lastName combination determines stateSymbol, which in turn determines stateName and stateCapital (a transitive dependence).

stateName and stateCapital do not determine firstName-lastName.

stateSymbol  stateName     stateCapital  lastName  firstName
PA           Pennsylvania  Harrisburg    Borski    Robert A.
PA           Pennsylvania  Harrisburg    Brady     Robert A.
PA           Pennsylvania  Harrisburg    Coyne     William J.
…            …             …             …         …

Decomposition into the Third Normal Form

Create another table whose primary key is the intermediate attribute in the transitive dependence. Put the attributes that depend on it in this new table.

HouseMember(lastName, firstName, stateSymbol)

State(stateSymbol, stateName, stateCapital)
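As a rough illustration of this decomposition, the Python sketch below splits the three sample rows into the two tables and then joins them back. The variable names are hypothetical; only the table layouts come from the slide.

```python
# Sample rows: (stateSymbol, stateName, stateCapital, lastName, firstName)
rows = [
    ("PA", "Pennsylvania", "Harrisburg", "Borski", "Robert A."),
    ("PA", "Pennsylvania", "Harrisburg", "Brady", "Robert A."),
    ("PA", "Pennsylvania", "Harrisburg", "Coyne", "William J."),
]

# HouseMember(lastName, firstName, stateSymbol)
house_member = {(last, first, sym) for sym, _, _, last, first in rows}

# State(stateSymbol, stateName, stateCapital), keyed by stateSymbol
state = {sym: (name, capital) for sym, name, capital, _, _ in rows}

# Joining the two tables on stateSymbol rebuilds the original rows.
rejoined = {(sym, *state[sym], last, first) for last, first, sym in house_member}

print(len(state))             # the state facts are now stored once
print(rejoined == set(rows))  # nothing was lost by decomposing
```

The state name and capital, which appeared three times in the original table, are stored once in State, and the join shows that the original rows can still be recovered.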

Another transitive dependence example

Customer(customerID, lastName, firstName, street, city, state, zipcode, stateTax, cityTax)

There is a simple (single-attribute) primary key, so no attribute can depend on only part of it, and the table is in Second Normal Form (2NF).

But the city tax is dependent on the city and state (there may be cities with the same name in different states), and the state tax is dependent on the state.

Zipcode Dependence (Cont.)

In fact, city and state may also be dependent on zipcode.

Sometimes a small city shares a zipcode with a bordering city or a neighborhood of a bordering city.

It also depends on whether one means a 5-digit zipcode or a 9-digit zipcode.

Another transitive dependence example (Cont.)

Customer(customerID, lastName, firstName, street, zipcode)

ZipInfo(zipcode, city, state, cityTax, stateTax)

There could be further decomposition, since stateTax depends on state.

Another transitive dependence example (Cont.)

Customer(customerID, lastName, firstName, street, zipcode)

ZipInfo(zipcode, city, state)

StateTax(state, stateTax)

CityTax(state, city, cityTax)

Recall the price

While redundancy has its price (increased storage and the possibility of update anomalies), minimizing redundancy also has a price: it introduces more tables.

More tables means more joins when it comes to querying the database.
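To make the cost concrete, here is a small sketch using Python's standard sqlite3 module with the four tables from the customer example. The sample customer, zipcode, and tax rates are made up for illustration. Looking up one customer's tax rates, which was a single-table read before decomposition, now takes three joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customerID INTEGER PRIMARY KEY, lastName TEXT,
                           firstName TEXT, street TEXT, zipcode TEXT);
    CREATE TABLE ZipInfo  (zipcode TEXT PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE StateTax (state TEXT PRIMARY KEY, stateTax REAL);
    CREATE TABLE CityTax  (state TEXT, city TEXT, cityTax REAL,
                           PRIMARY KEY (state, city));

    -- Made-up sample data, for illustration only.
    INSERT INTO Customer VALUES (1, 'Doe', 'Jane', '1 Main St', '00000');
    INSERT INTO ZipInfo  VALUES ('00000', 'Springfield', 'XX');
    INSERT INTO StateTax VALUES ('XX', 0.06);
    INSERT INTO CityTax  VALUES ('XX', 'Springfield', 0.01);
""")

# Three joins are needed to reassemble what one table used to hold.
row = conn.execute("""
    SELECT c.lastName, z.city, z.state, s.stateTax, t.cityTax
    FROM Customer c
    JOIN ZipInfo  z ON z.zipcode = c.zipcode
    JOIN StateTax s ON s.state = z.state
    JOIN CityTax  t ON t.state = z.state AND t.city = z.city
    WHERE c.customerID = 1
""").fetchone()

print(row)
```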

Some Redundancy May Remain at 3NF: Lot Example

Let us say the land within a county is broken up into lots and each lot is assigned a number.

A county is also broken down into municipalities (cities, townships, etc.).

The lots are assessed at some value.

The table might look like:

LotAssessment(lotID, county, municipality, assessment)

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The next stage is to select a primary key. There are two candidate keys:

1. LotAssessment(lotID, county, municipality, assessment), with primary key (lotID, county)

2. LotAssessment(lotID, county, municipality, assessment), with primary key (lotID, municipality)

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The next thing to note in this example is that county is functionally dependent on municipality.

1. LotAssessment(lotID, county, municipality, assessment), with primary key (lotID, county)

2. LotAssessment(lotID, county, municipality, assessment), with primary key (lotID, municipality)

Note that choice two is not in 2NF: county depends on municipality alone, which is only part of that primary key.

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The second choice, LotAssessment(lotID, county, municipality, assessment) with primary key (lotID, municipality), is decomposed as follows:

LotAssessment(lotID, municipality, assessment)

CityCounty(municipality, county)

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

The first choice, LotAssessment(lotID, county, municipality, assessment) with primary key (lotID, county), on the other hand, is in 2NF and 3NF.

There is no partial dependence on the primary key.

There is no transitive dependency on the primary key.
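The difference between the two choices can be checked mechanically. The Python sketch below is an illustration only: it assumes the first choice of primary key is (lotID, county) and the second is (lotID, municipality), and it encodes the dependencies as "lotID and county determine municipality and assessment" and "municipality determines county". The helper names `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """List (part of key, attribute) pairs where a non-key attribute depends on only part of the key."""
    found = []
    non_key = set(attributes) - set(key)
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found

attributes = {"lotID", "county", "municipality", "assessment"}
fds = [
    ({"lotID", "county"}, {"municipality", "assessment"}),  # assumed reading of the example
    ({"municipality"}, {"county"}),
]

choice_one = partial_dependencies({"lotID", "county"}, attributes, fds)
choice_two = partial_dependencies({"lotID", "municipality"}, attributes, fds)

print(choice_one)  # no partial dependence under the first choice
print(choice_two)  # county depends on municipality alone under the second choice
```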

Some Redundancy May Remain at 3NF: Lot Example (Cont.)

A possible feature of tables that are in 3NF but may still have redundancy is that there are various candidate keys from which one chooses the primary key.

We will introduce a generalization and/or extension of the Normal Form idea to ensure that we get further in our redundancy reduction independent of our initial choice of primary key.

Generalization: Primary Candidate Key

One way to avoid the type of problem that occurred in the Lot example is to extend the definitions of the Second and Third Normal Forms.

To extend the definitions of the Second and Third Normal Forms, replace the term “depends on the primary key” with “depends on any candidate key.”

Return to the lot

Note that the first lot table is not in generalized 2NF because there is a partial dependence on a candidate key.

So with the generalized version of 2NF, this table would be decomposed.

In fact, the decomposed table would have the second candidate key as its primary key.

In effect, you are forced to choose the candidate key that yields decomposition.

Boyce-Codd Normal Form

Another way to extend normal forms is to introduce the so-called Boyce-Codd Normal Form.

A table is in Boyce-Codd Normal Form if the only determinants (attributes that determine other attributes) are candidate keys.
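A minimal Python sketch of this test follows, applied to the lot example. The function names and the encoding of dependencies are assumptions for illustration, and the check only looks at the dependencies that are listed explicitly, testing whether each determinant determines the whole table (that is, whether it is a candidate key or contains one).

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the listed dependencies whose determinant does not determine the whole table."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(attributes)]

# Original lot table: municipality is a determinant but not a candidate key.
lot_attributes = {"lotID", "county", "municipality", "assessment"}
lot_fds = [
    ({"lotID", "county"}, {"municipality", "assessment"}),  # assumed reading of the example
    ({"municipality"}, {"county"}),
]
violations = bcnf_violations(lot_attributes, lot_fds)

# Decomposed LotAssessment(lotID, municipality, assessment).
decomposed = bcnf_violations(
    {"lotID", "municipality", "assessment"},
    [({"lotID", "municipality"}, {"assessment"})],
)

print(violations)
print(decomposed)
```

Under these assumptions the original table fails the test because of municipality → county, while the decomposed table has no offending determinant.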

Primary Key Choice: Tutoring Example

Let us say we have a tutoring center in which tutees (students) come in to see tutors to be tutored in a course.

A tutor is assigned to a tutee for a particular course. Thus the tutor-tutee pair is a determinant of course.

The tutor and tutee meet on a certain date at a certain time and are assigned a room for the tutoring session.

A tutor and tutee meet at most once a day.

Primary Key Choice: Tutoring Example (Cont.)

The starting table might look like:

Tutoring(tutor, tutee, course, room, date, time)

The next stage is to identify the primary key.

In this example, there are many choices, i.e. there are many candidate keys.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, at a given time, in a given room, there can only be one tutor-tutee pair studying a course.

So one choice for the primary key is (date, time, room):

Tutoring(tutor, tutee, course, room, date, time)

This table is in 2NF but not 3NF because tutor-tutee → course is the second part of a transitive dependence of course on the primary key.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, a given tutor-tutee pair meet just once to study their assigned course.

So another choice for the primary key is (date, tutor, tutee):

Tutoring(tutor, tutee, course, room, date, time)

This table is not in 2NF because course is partially dependent on the primary key: it depends on tutor and tutee alone.

Primary Key Choice: Tutoring Example (Cont.)

On a given date, at a given time, a given tutor meets a tutee in a room to study their assigned course.

So another choice for the primary key is (date, time, tutor):

Tutoring(tutor, tutee, course, room, date, time)

This table is in 2NF. It is not in 3NF, but the intermediate attribute (tutor-tutee) consists partly of a primary-key attribute (tutor) and partly of a non-key attribute (tutee).

Primary Key Choice: Tutoring Example (Cont.)

On a given date, at a given time, a given tutee meets a tutor in a room to study their assigned course.

So another choice for the primary key is (date, time, tutee):

Tutoring(tutor, tutee, course, room, date, time)

This table is in 2NF. It is not in 3NF, but the intermediate attribute (tutor-tutee) consists partly of a primary-key attribute (tutee) and partly of a non-key attribute (tutor).
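The four choices above can be recovered by brute force. The sketch below is an illustration in Python, with hypothetical helper names, and the dependency list is one reading of the rules stated for the tutoring center. It enumerates attribute sets from smallest to largest and keeps the minimal ones that determine the whole table.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole table."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # Supersets of a key already found are not minimal, so skip them.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

attributes = {"tutor", "tutee", "course", "room", "date", "time"}
fds = [
    ({"tutor", "tutee"}, {"course"}),                # a pair is assigned one course
    ({"date", "time", "room"}, {"tutor", "tutee"}),  # one pair per room and time slot
    ({"date", "tutor", "tutee"}, {"time", "room"}),  # a pair meets at most once a day
    ({"date", "time", "tutor"}, {"tutee", "room"}),  # a tutor sees one tutee at a time
    ({"date", "time", "tutee"}, {"tutor", "room"}),  # a tutee sees one tutor at a time
]

keys = candidate_keys(attributes, fds)
for key in keys:
    print(sorted(key))
```

The exhaustive search is exponential in the number of attributes, which is acceptable for a six-attribute teaching example but not a general-purpose method.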

Primary Key Choice: Tutoring Example (Cont.)

While the tutoring example has to be decomposed to reach 3NF, it does demonstrate that there can be many different choices of primary key.

If the generalized version of 2NF is used, then decomposition occurs there (because it occurs there for one of the candidate keys).

Primary Key Choice: Tutoring Example (Cont.)

The resulting tables are:

Tutoring(tutor, tutee, room, date, time)

Subject(tutor, tutee, course)

This decomposition does not force a choice of the primary key in the first table.

Many candidate keys

Another example with many possible keys is the football schedule table. We could use:

1. Week, HostName

2. Week, AwayName

3. Date, HostName

4. Date, AwayName

week  date     hostCity      hostState  hostName  awayCity   awayState  awayName  hostScore  awayScore
1     9/12/04  Philadelphia  PA         Eagles    New York   NY         Giants    31         17
1     9/13/04  Charlotte     NC         Panthers  Green Bay  WI         Packers   14         24
…     …        …             …          …         …          …          …         …          …
2     9/19/04  Kansas City   MO         Chiefs    Charlotte  NC         Panthers  17         28

Choose Date over Week

Note that Date → Week (but not vice versa).

In the third and fourth choices this is a partial dependence on the primary key, and so Second Normal Form requires a table consisting of (Date, Week).

Generalized Second Normal Form would also require the decomposition because it is a partial dependence on a candidate key.

Boyce-Codd Normal Form would require the decomposition because the determinant (Date) is not by itself a candidate key.
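The claim that Date determines Week but not the reverse can be checked against sample data. The Python sketch below uses three rows from the schedule table (only four of the columns are kept) and a hypothetical helper `holds`. Note that a sample can only refute a dependency; agreement with a few rows does not prove that the dependency holds in general.

```python
def holds(rows, lhs, rhs):
    """Check whether lhs -> rhs is consistent with the given sample rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault records the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False
    return True

schedule = [
    {"week": 1, "date": "9/12/04", "hostName": "Eagles", "awayName": "Giants"},
    {"week": 1, "date": "9/13/04", "hostName": "Panthers", "awayName": "Packers"},
    {"week": 2, "date": "9/19/04", "hostName": "Chiefs", "awayName": "Panthers"},
]

date_determines_week = holds(schedule, ["date"], ["week"])
week_determines_date = holds(schedule, ["week"], ["date"])

print(date_determines_week)  # consistent with the sample
print(week_determines_date)  # refuted: week 1 has two different dates
```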

Connolly and Begg: Review of Normalization (UNF to BCNF)

[Slides 31 to 34 carry this title; their content is not preserved in the transcript.]

Fourth Normal Form

ArtistID  Fname   Mname  Lname   Styles                          WorkForSale                                      Medium        Size
THB0001   Thomas  Hart   Benton  Mural, Genre, Figure, Graphics  Mural Study for ‘The Social History of Indiana’  Ink on Paper  6.75" x 29"
                                                                 The Cliffs, Wesquebosque, Martha's Vineyard      Oil on Board  12" x 15"

Above is the un-normalized table of our Art for Sale example. There were two multi-valued fields: Styles and WorkForSale.

The Styles field lists styles that the artist has worked in, not the specific styles of any particular work by that artist.

When you have more than one multi-valued field there is the potential for establishing a false relationship.
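One way to see the false relationship is to flatten both multi-valued fields into a single table. The Python sketch below is an illustration only: the variable names and the idea of holding the two facts in separate artist-style and artist-work tables are assumptions, not taken from the slides. Flattening pairs every style with every work, which suggests, for example, that a particular painting is in the Graphics style.

```python
from itertools import product

styles = ["Mural", "Genre", "Figure", "Graphics"]
works = [
    "Mural Study for 'The Social History of Indiana'",
    "The Cliffs, Wesquebosque, Martha's Vineyard",
]

# Flattening both multi-valued fields into one table pairs every style with every work.
flattened = [("THB0001", style, work) for style, work in product(styles, works)]

# Keeping the two independent facts in separate tables avoids the implied pairing.
artist_style = [("THB0001", style) for style in styles]
artist_work = [("THB0001", work) for work in works]

print(len(flattened))
print(len(artist_style) + len(artist_work))
```

The flattened table also grows multiplicatively (styles times works), while the separate tables grow additively.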

References

Database Systems, Rob and Coronel

Database Systems, Connolly and Begg

Fundamentals of Relational Databases, Mata-Toledo and Cushman

Concepts of Database Management, Pratt and Adamski