04 data normalization 1
TRANSCRIPT
7/27/2019 04 Data Normalization 1
http://slidepdf.com/reader/full/04-data-normalization-1 1/6
DataNormalization (1)
1 Copyright © 2010 Jerry Post. All rights reserved.
IS240 – DBMS
Lecture # 4 – 2010-02-01M. E. Kabay, PhD, CISSP-ISSMP Assoc . Prof . Informat ion As surance
School of Bus iness & Management, Norwich University
mailto:[email protected] V: 802.479.7937
Topics
Why Normalization?
Definitions
Client B illing Example
First Normal Form
This material wacreated by
Prof Jerry Postand has been
reformatted by
Prof M. E. Kabay
2 Copyright © 2010 Jerry Post. All rights reserved.
econ orma o rm
Third Normal Form
Checking Your Work (Quality Control)
Why Normalization? Need standardized data definition
Advantages of DBMS requ ire carefu l design
Define data correctly and the rest is much easier
It especially makes it easier to expand databaselater
Method a lies to most models and most DBMS
3 Copyright © 2010 Jerry Post. All rights reserved.
Similar to Entity-Relationship
Similar to Objects (without inheritance and methods)
Goal: Define tables carefull y
Save space
Minimize redundancy Protect data
Definitions Relational database: A collection of tables.
Table: A collection of columns (attributes) describing an entity.Individual objects are stored as rows of data in the table.
Property (attribute): a characteristic or descriptor of a class or entity.
Every table has a primary key.
The smallest set of c olumns that uni quely identifies any row
Pr imar ke s can s an more than one column concatenated
4 Copyright © 2010 Jerry Post. All rights reserved.
keys)
We often create a primary key to ensure uniqueness (e.g.,CustomerID, Product#, . . .) called a surrogate key.
EmployeeID TaxpayerID LastName Firs tName HomePhone Address12512 888-22-5552 Car to m Abd ul (603) 323-9893 252 So ut h St reet15293 222-55 -3737 Venet iaan Roland (804) 888-6667 937 Paramar ibo Lane22343 293-87-4343 J oh ns on Jo hn (703) 222-9384 234 Mai n Str eet29387 837-36-2933 St en hei m Su san (410) 330-9837 8934 W. Map le
Employee
Properties
Rows/ObjectsClass: Employe
Primary key
Keys
Primary key
Every table (object) must have a primary key
Uniquely identifies a row (one-to-one)
Concatenated (or composite) key
Multiple columns needed for primary key
5 Copyright © 2010 Jerry Post. All rights reserved.
Identify repeating relationsh ips (1 : M or M : N)
Key columns are underlined
First step
Collect user documents
Identify possible keys
For each attribute_1 can there be manyattribute_2? If so, make attribute_2 a key
Notation
Table name
Primary key is underlined
Table columns
Customer(CustomerID, Phone, Name, Address, City, State, ZipCode
6 Copyright © 2010 Jerry Post. All rights reserved.
CustomerID Phone LastName FirstName Address City State Zipcode
1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 421222 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 421013 502-777-7575 Washington Elroy 95 Easy Street Smith’s Grove KY 421714 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 421225 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 421026 616-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 371487 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 371488 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 370319 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 4272110 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127
7/27/2019 04 Data Normalization 1
http://slidepdf.com/reader/full/04-data-normalization-1 2/6
Identifying Key Columns
Orders
OrderItems
OrderID Date Customer 8367 5-5-04 67948368 5-6-04 9263
Each order hasonly one customer.So Customer is notpart of the key.We can uniqu ely identify each
order simply by its OrderID.
7 Copyright © 2010 Jerry Post. All rights reserved.
OrderID Item Qu ant ity8367 229 28367 253 48367 876 18368 555 48368 229 1
Each order hasmany items.Each item can appear on many orders.So OrderID and Itemare both part of the key.We need both the OderID &
the Item to uniquely identify a
row in th e order.
Surrogate Keys
Real wor ld keys sometim es cause problems in a database.
Ass umpt ions about uniqueness are of ten w rong
E.g., using l astname_first name as a key does not work!
Example: Customer
Avo id phone n umb ers: peo ple may not not ify you w hen
numbers change.
8 Copyright © 2010 Jerry Post. All rights reserved.
Avo id SSN (pr ivacy and most bus inesses are no tauthorized to ask for verification, so you coul d end upwith duplicate values)
Often best to let the DBMS generate unique values
Acc ess: AutoNumb er
SQL Server: Identit y
Oracle: Sequences (but require additional programmi ng
Drawback: Numbers are not related to any busin ess data, sthe application needs to hide them and provide other look umechanisms.
Common Order System
Customer
Order
Salesperson
OrderItem
1*
1
*1
*
9 Copyright © 2010 Jerry Post. All rights reserved.
Item
1*
Customer(Cust omerID, Name, Address, Cit y, Phone)
Salesperson(EmployeeID, Name, Commis sion, DateHired)
Order(OrderID, OrderDate, CustomerID, Emplo yeeID)OrderItem(OrderID, ItemID, Quantit y)
Item(ItemID, Description, ListPrice)
Client Billing Example
Client Billing
Client(ClientID, Name, Address, BusinessType)
Partner(PartnerID, Name, Speciality, Office, Phone)
PartnerAssignment(PartnerID, ClientID, DateAcquired)
Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled
10 Copyright © 2010 Jerry Post. All rights reserved.
Each partner can be assigned many clients.
Each client can be assigned to many partn ers.
Table names
Client(ClientID, Name, Address, BusinessType)
Partner(PartnerID, Name, Specialit y, Office, Phone)
PartnerAssignment(PartnerID, ClientID, DateAcquired)
Client Billing: UnusualBusiness Rules
combine
11 Copyright © 2010 Jerry Post. All rights reserved.
Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled)
Suppose each client is assigned to exactly o nepartner .
Cannot key PartnerID.
Would combine Client and PartnerAssignmenttables, since they have the same key.
Client Billing: More LikelyAssumptions
Cl ien tIDPar tner ID Date/Time Item Descr ip tion Hours AmountB il led
115 963 8-4-04 10:03 967 Stress analysis 2 $500
295 967 8-5-04 11:15 754 New Design 3 $750115 963 8-8-04 09:30 967 Stress analysis 2.5 $650
Billing
12 Copyright © 2010 Jerry Post. All rights reserved.
.
Each Partner may work wit h many clients.
Each client may work with many partners.
Each partner and client may wo rk together manytimes.
The identifying f eature is the date/time of the service.
What happens if we do not i nclude Date/Time as a key?
7/27/2019 04 Data Normalization 1
http://slidepdf.com/reader/full/04-data-normalization-1 3/6
Sample: Video DatabasePossible Keys
13 Copyright © 2010 Jerry Post. All rights reserved.
Initial Objects (Entities) Customers
Key: Assign aCustomerID
Sample Properties
Name
Address
RentalTransaction
Event/Relationship
Key: AssignTransactionID
Sample Properties
CustomerID
14 Copyright © 2010 Jerry Post. All rights reserved.
Videos
Key: Assign aVideoID
Sample Properties
Title
RentalPrice
Rating
Description
VideosRented
Event/Repeating list
Keys: TransactionID+ VideoID
Sample Properties
VideoCopy#
Initial Form Evaluation
Collect forms fromusers
Write down properties
Find repeating groups( . . .)
Lo ok for o tent ial k e s :
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode,(VideoID, Copy#, Title, Rent ) )
15 Copyright © 2010 Jerry Post. All rights reserved.
key
Identify computedvalues
Notation makes it easier to identify and solveproblems
Results equivalent todiagrams, but will fit onone or tw o pages
Problems with RepeatingSections
Storing data in this raw form wouldnot work very well. For example,repeating sections will causeproblems.
Note the duplication of data.
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode(VideoID, Copy#, Title, Rent ) )
Repeating Section
16 Copyright © 2010 Jerry Post. All rights reserved.
,yet checked out a movie—where dowe store that cust omer’s data?
TransID RentDate CustomerID LastName Phone Address VideoID Copy# Title Rent1 4/18/04 3 Washington 502-777-7575 95 Easy Street 1 2 2001: A Space Odyssey $1.501 4/18/04 3 Washington 502-777-7575 95 Easy Street 6 3 Clockwork Orange $1.502 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 8 1 Hopscotch $1.502 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 2 1 Apocalypse Now $2.002 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 6 1 Clockwork Orange $1.503 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 9 1 Luggage Of The Gods $2.503 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 1 5 1 Fabulous Baker Boys $2.003 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 4 1 Boy And His Dog $2.504 4/18/04 3 Washington 502-777-7575 95 Easy Street 3 1 Blues Brothers $2.004 4/18/04 3 Washington 502-777-7575 95 Easy Street 8 1 Hopscotch $1.504 4/18/04 3 Washington 502-777-7575 95 Easy Street 13 1 Surf Nazis Must Die $2.504 4/18/04 3 Washington 502-777-7575 95 Easy Street 17 1 Witches of Eastwick $2.00
Causes duplication
Problems with RepeatingSections
NamePhone AddressCityState
ZipCode
VideoID Copy# Title Rent1. 6 1 Clockwork Orange 1.50
Store repeating data
Al locate space
How much?Can’t be short
Wasted space
Customer Rentals
17 Copyright © 2010 Jerry Post. All rights reserved.
2. 8 2 Hopscotch 1.503.4.5.
{Unused Space}
Not in First Normal Form
e.g., How m any videoswill be rented at onetime?
What happens if we try tostore more than weplanned?
A bet ter def in it ioneliminates this problem.
First Normal FormRentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode,
(VideoID, Copy#, Title, Rent ) )
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
RentalLine(TransID, VideoID, Copy#, Title, Rent )
Remove repeating sections
S l it into two tab les
18 Copyright © 2010 Jerry Post. All rights reserved.
Bring key from m ain and repeating section
RentalLine(TransID, VideoID, Copy#, . . .)
Each transaction can h ave many videos (key VideoID)
Each video can be rented on many transactio ns (keyTransID)
For each TransID and VideoID, only one Copy# (no keyon Copy#)
7/27/2019 04 Data Normalization 1
http://slidepdf.com/reader/full/04-data-normalization-1 4/6
Nested Repeating Sections
Table (Key1, . . . (Key2, . . . (Key3, . . .) ) )
Table1(Key1, . . .) TableA (Key1,Key2 . . .(Key3, . . .) )
19 Copyright © 2010 Jerry Post. All rights reserved.
Nested: Table (Key1, aaa. . . (Key2, bbb. . . (Key3,ccc. . .) ) )
First Normal Form (1NF)
Table1(Key1, aaa . . .)
Table2(Key1, Key2, bbb . .)
Table3(Key1, Key2, Key3, ccc. . .)
Table2 (Key1, Key2 . . .) Table3 (Key1, Key2, Key3, . . .)
First Normal Form Problems(Data)TransID RentDate C ustID Phone LastName FirstName Address City State ZipCode1 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 421712 4/30/04 7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 371483 4/18/04 8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs T N 370314 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171
1NF split s repeatinggroups
Stil l have rob lems
20 Copyright © 2010 Jerry Post. All rights reserved.
TransID VideoID Copy# Title1 1 2 2001: A Space Odyssey1 6 3 Clockwork Orange2 8 1 Hopscotch2 2 1 Apocalypse Now2 6 1 Clockwork Orange3 9 1 Luggage Of The Gods3 15 1 Fabulous Baker Boys3 4 1 Boy And His Dog4 3 1 Blues Brothers4 8 1 Hopscotch4 13 1 Surf Nazis Must Die4 17 1 Witches of Eastwick
Replication
Hiddendependency:
If a video has notbeen rented yet,then what is itstitle?
Second Normal FormDefinition
Each non-key column Dependence (definition )
RentalLine(TransID, VideoID, Copy#, Title, Rent)
Depend only on VideoID
Depends on both TransID and VideoID
21 Copyright © 2010 Jerry Post. All rights reserved.
entire key.
Appl ies only toconcatenated keys
Some columnsdepend on only partof the key
Split those into anew table.
If, given a value for the key,you always know the valueof the property in question,then that property is said todepend on the key.
If you change part of a keyand the questionable
property does not change,then the table is not in 2NF.
Second Normal FormExample
Title depends only on VideoID
RentalLine(TransID, VideoID, Copy#, Title, Rent)
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Re
22 Copyright © 2010 Jerry Post. All rights reserved.
Rent depends on VideoID
This statement is actually a business rule.
It might be different at different stores.
Some stores might charge a different rent fo r eachvideo depending on the day (or time).
Each non-key column depends on the whole key.
Second Normal FormExample (Data)
Tr an sID Vi deo ID Cop y#1 1 21 6 32 2 1
2 6 12 8 13 4 13 9 1
VideosRented(TransID, VideoID, Copy#)
VideoID Title Rent1 2001: A Space Odyssey $1.502 Apocalypse Now $2.003 Blues Brothers $2.00
i
Videos(VideoID, Title, Rent)
23 Copyright © 2010 Jerry Post. All rights reserved.
3 15 14 3 14 8 14 13 14 17 1
4 Boy And His Dog $2.505 Brother From Another Planet $2.006 Clockwork Orange $1.507 Gods Must Be Crazy $2.008 Hopscotch $1.50
RentalForm2(TransID, RentDate, CustomerID, Phone,Name, Address, City, State, ZipCode)
(Unchanged)
Second Normal FormProblems (Data)
TransID RentDate C ustID Phone LastName FirstName Address City State ZipCode1 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 421712 4/30/04 7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148
3 4/18/04 8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs T N 370314 4/18/042 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
24 Copyright © 2010 Jerry Post. All rights reserved.
Even in 2NF, problems remain
Replication
Hidden dependency
If a customer has not rented avideo yet, where do we store their personal data?
Solution: split table.
7/27/2019 04 Data Normalization 1
http://slidepdf.com/reader/full/04-data-normalization-1 5/6
Third Normal FormDefinition
Each non-key column mus tde end on nothin but he
Dependence (definiti on)
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
Depend only on CustomerID
Depend on TransID
25 Copyright © 2010 Jerry Post. All rights reserved.
key.
Some columns depend oncolumns that are not partof the key.
Split those int o a newtable.
Example: Customer’sname does not change for every transaction.
, ,you always know the valueof the property in question,then that property is said todepend on the key.
If you change the key andthe questionable propertydoes not change, then thetable is not in 3NF.
Third Normal Form Example
Customer attributes depend only o n Customer ID
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
Rentals(TransID, RentDate, CustomerID )
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )
26 Copyright © 2010 Jerry Post. All rights reserved.
Split them into new table (Customer)
Remember to leave CustomerID in Rentals table.
We need to be able to r econnect tables.
3NF is sometimes easier to see if you identify primaryobjects at the start—then you wo uld recognize thatCustomer was a separate object.
Third Normal Form ExampleData
TransID RentDate CustomerID1 4/18/04 32 4/30/04 73 4/18/04 84 4/18/04 3
Rentals(TransID, RentDate, CustomerID )
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode
27 Copyright © 2010 Jerry Post. All rights reserved.
CustomerID Phone LastName FirstName Address City State ZipCode1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 421222 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 421013 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 421714 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 421225 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 421026 615-373-4746 Steinmetz Susan 15 Speedway DrivePortland TN 371487 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 371488 615-452-1162 Jones Charlie 867 L akeside Drive Castalian Springs T N 370319 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 4272110 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
Third Normal Form Tables(3NF)
CustomerIDPhoneLastName
Customers
TransIDRentDateCustomerID
Rentals
TransIDVideoIDCopy#
VideosRented1
*
1
*
*
28 Copyright © 2010 Jerry Post. All rights reserved.
Rentals(TransID, RentDate, CustomerID )
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )
VideosRented(TransID, VideoID, Copy#)
Videos(VideoID, Title, Rent)
FirstName AddressCityStateZipCode
VideoIDTitleRent
Videos
1
3NF Rules/Procedure
Split out r epeating sections
Be sure to incl ude a key from the parent sectionin the new piece so the two parts can be
recombined. Verify that the keys are correct
29 Copyright © 2010 Jerry Post. All rights reserved.
key?
Are one-to-many and many-to-many relationshipscorrect?
Check “ many” for keyed columns and “one” for non-key columns.
Make sure that each non-key column depends onthe whole key and nothing but the key.
No hidden dependencies.
Checking Your Work (QualityControl) Look for one-to-many relationships.
Many side should be keyed (underlined).
e.g., VideosRented(Trans ID, VideoID, . . .).
Check each column and ask if i t should be 1 : 1 or 1: M.
If add a key, r enormalize.
30 Copyright © 2010 Jerry Post. All rights reserved.
Verify no repeating s ections (1NF)
Check 3NF
Check each colum n and ask:
Does it depend on the whole key and nothing butthe key?
Verify that the tables can be reconnected (joined) toform the original tables (draw lines).
Each table represents one object.
Enter sample data—look for r eplication.
7/27/2019 04 Data Normalization 1
http://slidepdf.com/reader/full/04-data-normalization-1 6/6
HOMEWORK REQUIRED by Sun 14 Feb 2010 at 23:59
Study Chapter 3 thoroughly using SQ3R
Answer Review Questions for yourself
For 10 points each, solve and submit problems 1 & 3by creating diagrams in DBDesign for examination by
the professor
31 Copyright © 2010 Jerry Post. All rights reserved.
Sally’s Pet Store questions 8-11 (40 pts total)
OPTIONAL by Sun 14 Feb 2010 at 23:59
Rolling Thunder Bicycl es problems 12 and/or 13 (10pts each)
Corner Med problems 14 (incl optional) and/or 15 (10pts each)
DISCUSSION
32 Copyright © 2010 Jerry Post. All rights reserved.