normalization
Relational Database Design: Normalization
Prepared by Vaishali Kalaria
Design Guidelines for Relational Databases
What is relational database design?
The grouping of attributes to form "good" relation schemas
Two levels of relation schemas
The logical "user view" level
The storage "base relation" level
Design is concerned mainly with base relations
What are the criteria for "good" base relations?
1. Semantics of the Relation Attributes
Each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
Attributes of different entities should not be mixed in the same relation
Only foreign keys should be used to refer to other entities
Entity and relationship attributes should be kept apart as much as possible.
2. Redundancy and Data Anomalies
Redundant data is where we have stored the same ‘information’ more than once, i.e., the redundant data could be removed without the loss of information.
Wastes storage
Causes problems with update anomalies:
Insertion anomalies
Deletion anomalies
Modification anomalies
Design a schema that does not suffer from the insertion, deletion and update anomalies.
Example: the following relation contains staff and department details:
staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking
Such ‘redundancy’ could lead to the following ‘anomalies’:
Insert Anomaly: Need to store a value for an attribute but cannot because the value for another attribute is unknown.
• We can’t insert a dept without inserting a member of staff that works in that department.
Update Anomaly: Occurs when a change of a single attribute in one record requires changes in multiple records.
• We could change the name of the dept that SA51 works in without simultaneously changing it in the record for DS40, who works in the same dept.
Deletion Anomaly: Occurs when the removal of a record results in a loss of important information about an entity.
• By removing employee SL10 we have removed all information pertaining to the Sales dept.
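The update and deletion anomalies can be reproduced with a few lines of Python. This is only an illustrative sketch over the staff rows from the example; the helper name dept_names and the replacement department name "Finance" are assumptions made for the demonstration, not part of the original material.

```python
# Staff rows from the example: (staffNo, job, dept, dname, city)
staff = [
    ("SL10", "Salesman", 10, "Sales", "Stratford"),
    ("SA51", "Manager", 20, "Accounts", "Barking"),
    ("DS40", "Clerk", 20, "Accounts", "Barking"),
    ("OS45", "Clerk", 30, "Operations", "Barking"),
]


def dept_names(rows):
    """Map each dept number to the set of department names stored for it."""
    names = {}
    for _staff_no, _job, dept, dname, _city in rows:
        names.setdefault(dept, set()).add(dname)
    return names


# Update anomaly: rename dept 20 in SA51's row only ("Finance" is hypothetical).
updated = [
    (s, j, d, "Finance" if s == "SA51" else n, c) for s, j, d, n, c in staff
]
inconsistent = {d: n for d, n in dept_names(updated).items() if len(n) > 1}

# Deletion anomaly: removing SL10 also removes the only record of dept 10.
after_delete = [row for row in staff if row[0] != "SL10"]
lost_depts = set(dept_names(staff)) - set(dept_names(after_delete))
```

After the partial update, dept 20 carries two different names, and after the delete, dept 10 no longer appears anywhere in the relation.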
3. Null Values in Tuples
Relations should be designed such that their tuples will have as few NULL values as possible
Attributes that are NULL frequently could be placed in separate relations (with the primary key)
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist, but unavailable
Purpose of Normalization
To avoid redundancy by storing each ‘fact’ within the database only once.
To put data into a form that conforms to relational principles - no repeating groups.
To put the data into a form that is more able to accurately accommodate change.
To avoid certain updating ‘anomalies’.
To facilitate the enforcement of data constraints.
Normalization
"Normalization" refers to the process of creating an efficient, reliable, flexible, and appropriate "relational" structure for storing information. Normalized data must be in a "relational" data structure.
Usually involves dividing a database into two or more tables and defining relationships between the tables.
The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships
The Process of Normalization
• Normalization is often executed as a series of steps.
• Each step corresponds to a specific normal form that has known properties.
• As normalization proceeds, the relations become progressively more restricted in format, and less vulnerable to update anomalies.
Stages of Normalization
Unnormalized form (UNF)
  Remove repeating groups
First normal form (1NF)
  Remove partial dependencies
Second normal form (2NF)
  Remove transitive dependencies
Third normal form (3NF)
  Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
  Remove multivalued dependencies
Fourth normal form (4NF)
  Remove remaining anomalies
Fifth normal form (5NF)
Unnormalized Form (UNF)
Definition: A relation is unnormalized when it has not had any normalization rules applied to it, and it suffers from various anomalies.
It typically results from capturing attributes into a ‘Universal Relation’ from a screen layout, manual report, manual document, etc.
ClientRental relation in UNF
Unnormalized form (UNF): a table that contains one or more repeating groups.
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Figure ClientRental unnormalized table
Repeating group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
First Normal Form (1NF)
Definition: A relation is in 1NF if, and only if, all its underlying attributes contain atomic values only: the intersection of each row and column contains one and only one value.
Remove repeating groups into a new relation.
1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
1NF
There are two approaches to removing repeating groups from unnormalized tables:
1. Remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data.
2. Remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), in a separate relation. A primary key is identified for the new relation.
1NF ClientRental relation with the first approach
clientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Figure 1NF ClientRental relation with the first approach
The ClientRental relation is defined as follows:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
With the first approach, we remove the repeating group (property rented details) by entering the appropriate client data into each row.
1NF ClientRental relation with the second approach
With the second approach, we remove the repeating group (property rented details) by placing the repeating data along with a copy of the original key attribute (clientNo) in a separate relation.
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, oName)
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart
PropertyRentalOwner
clientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Figure 1NF ClientRental relation with the second approach
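The two approaches to removing a repeating group can be sketched in Python. This is a minimal illustration that keeps only a few ClientRental columns; the nested "rentals" list used to model the repeating group and the function names flatten and split are assumptions of this sketch.

```python
# Unnormalized ClientRental: each client carries a repeating group of rentals.
unf = [
    {"clientNo": "CR76", "cName": "John Kay", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Jul-00"},
        {"propertyNo": "PG16", "rentStart": "1-Sep-02"},
    ]},
    {"clientNo": "CR56", "cName": "Aline Stewart", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Sep-99"},
        {"propertyNo": "PG36", "rentStart": "10-Oct-00"},
        {"propertyNo": "PG16", "rentStart": "1-Nov-02"},
    ]},
]


def flatten(records):
    """Approach 1: repeat the client data in every rental row."""
    return [
        {"clientNo": r["clientNo"], "cName": r["cName"], **rental}
        for r in records
        for rental in r["rentals"]
    ]


def split(records):
    """Approach 2: move the repeating group to its own relation with a copy of the key."""
    clients = [{"clientNo": r["clientNo"], "cName": r["cName"]} for r in records]
    rentals = [
        {"clientNo": r["clientNo"], **rental}
        for r in records
        for rental in r["rentals"]
    ]
    return clients, rentals


flat = flatten(unf)
clients, rentals = split(unf)
```

The first approach yields five rows that each repeat cName; the second yields two client rows plus five rental rows that carry only clientNo from the client side.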
Second Normal Form (2NF)
A database table is said to be in 2NF if it is in 1NF and every non-key field/column is fully functionally dependent on the primary key.
To reach 2NF, remove any partial dependency of a non-key field on part of the primary key.
Note: It is still possible for a table in 2NF to exhibit transitive dependency; that is, one or more attributes may be functionally dependent on non-key attributes.
The process of converting the database table into 2NF:
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
2NF ClientRental relation
The ClientRental relation has the following functional dependencies:
fd1 clientNo, propertyNo -> rentStart, rentFinish (Primary key)
fd2 clientNo -> cName (Partial dependency)
fd3 propertyNo -> pAddress, rent, ownerNo, oName (Partial dependency)
fd4 ownerNo -> oName (Full dependency)
fd5 clientNo, rentStart -> propertyNo, pAddress, rentFinish, rent, ownerNo, oName (Candidate key)
fd6 propertyNo, rentStart -> clientNo, cName, rentFinish (Candidate key)
2NF ClientRental relation
After removing the partial dependencies, we obtain three new relations called Client, Rental, and PropertyOwner:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart
Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03
PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw
Figure 2NF ClientRental relation
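The step from the 1NF relation to Client, Rental and PropertyOwner can be sketched mechanically. The code below is an illustration only: it represents each functional dependency as a (determinant, dependents) pair of Python sets, uses fd1 to fd4 (the candidate-key dependencies fd5 and fd6 are left out), and the function names are assumptions of this sketch.

```python
attributes = {
    "clientNo", "propertyNo", "cName", "pAddress",
    "rentStart", "rentFinish", "rent", "ownerNo", "oName",
}
primary_key = {"clientNo", "propertyNo"}

# fd1 to fd4 of the 1NF ClientRental relation as (determinant, dependents).
fds = [
    ({"clientNo", "propertyNo"}, {"rentStart", "rentFinish"}),
    ({"clientNo"}, {"cName"}),
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),
    ({"ownerNo"}, {"oName"}),
]


def partial_dependencies(fds, key):
    """FDs whose determinant is a proper subset of the primary key."""
    return [(lhs, rhs - key) for lhs, rhs in fds if lhs < key and rhs - key]


def to_2nf(attributes, fds, key):
    """Place each partial dependency in a new relation with a copy of its determinant."""
    partial = partial_dependencies(fds, key)
    moved = set().union(*(rhs for _, rhs in partial))
    new_relations = [lhs | rhs for lhs, rhs in partial]
    return new_relations + [attributes - moved]


relations = to_2nf(attributes, fds, primary_key)
```

For this input the result is the attribute sets of Client, PropertyOwner and Rental, in that order.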
Third Normal Form (3NF)
Transitive dependency: A condition where A, B, and C are attributes of a relation such that if A -> B and B -> C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Third normal form (3NF)
A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies by placing the attribute(s) in a new relation along with a copy of the determinant.
3NF ClientRental relation
The functional dependencies for the Client, Rental and PropertyOwner relations are as follows:
Client
fd2 clientNo -> cName (Primary key)
Rental
fd1 clientNo, propertyNo -> rentStart, rentFinish (Primary key)
fd5 clientNo, rentStart -> propertyNo, rentFinish (Candidate key)
fd6 propertyNo, rentStart -> clientNo, rentFinish (Candidate key)
PropertyOwner
fd3 propertyNo -> pAddress, rent, ownerNo, oName (Primary key)
fd4 ownerNo -> oName (Transitive dependency)
3NF ClientRental relation
The resulting 3NF relations have the forms:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart
Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03
PropertyOwner
propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93
Owner
ownerNo  oName
CO40     Tina Murphy
CO93     Tony Shaw
Figure 3NF ClientRental relation
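The removal of the transitive dependency fd4 from PropertyOwner can be sketched in the same way. This is a simplified illustration that assumes the relation has a single candidate key (true for PropertyOwner here); the function names are hypothetical and the check is not a general 3NF test.

```python
# PropertyOwner after 2NF, with its primary key and fd3/fd4 from the example.
attributes = {"propertyNo", "pAddress", "rent", "ownerNo", "oName"}
primary_key = {"propertyNo"}
fds = [
    ({"propertyNo"}, {"pAddress", "rent", "ownerNo", "oName"}),  # fd3
    ({"ownerNo"}, {"oName"}),                                     # fd4
]


def transitive_dependencies(fds, key):
    """FDs from a non-key determinant to non-key attributes (single-key simplification)."""
    return [
        (lhs, rhs - key)
        for lhs, rhs in fds
        if not (lhs & key) and (rhs - key)
    ]


def to_3nf(attributes, fds, key):
    """Move transitively dependent attributes out with a copy of their determinant."""
    transitive = transitive_dependencies(fds, key)
    moved = set().union(*(rhs for _, rhs in transitive))
    return [attributes - moved] + [lhs | rhs for lhs, rhs in transitive]


relations = to_3nf(attributes, fds, primary_key)
```

The result corresponds to PropertyOwner (propertyNo, pAddress, rent, ownerNo) and Owner (ownerNo, oName); ownerNo stays behind in PropertyOwner as the link between the two.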
Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if, and only if, every determinant is a candidate key.
BCNF is a refinement of third normal form.
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
That is, every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.
3NF to BCNF
Identify all candidate keys in the relation.
Identify all functional dependencies in the relation.
If functional dependencies exist in the relation whose determinants are not candidate keys for the relation, remove them by placing them in a new relation along with a copy of their determinant.
Example of BCNF
fd1 clientNo, interviewDate -> interviewTime, staffNo, roomNo (Primary key)
fd2 staffNo, interviewDate, interviewTime -> clientNo (Candidate key)
fd3 roomNo, interviewDate, interviewTime -> clientNo, staffNo (Candidate key)
fd4 staffNo, interviewDate -> roomNo (not a candidate key)
ClientInterview
clientNo  interviewDate  interviewTime  staffNo  roomNo
CR76      13-May-02      10.30          SG5      G101
CR75      13-May-02      12.00          SG5      G101
CR74      13-May-02      12.00          SG37     G102
CR56      1-Jul-02       10.30          SG5      G102
Figure ClientInterview relation
Example of BCNF(2)
To transform the ClientInterview relation to BCNF, we must remove the violating functional dependency by creating two new relations called Interview and StaffRoom, as shown below:
Interview (clientNo, interviewDate, interviewTime, staffNo)
StaffRoom (staffNo, interviewDate, roomNo)
Interview
clientNo  interviewDate  interviewTime  staffNo
CR76      13-May-02      10.30          SG5
CR75      13-May-02      12.00          SG5
CR74      13-May-02      12.00          SG37
CR56      1-Jul-02       10.30          SG5
StaffRoom
staffNo  interviewDate  roomNo
SG5      13-May-02      G101
SG37     13-May-02      G102
SG5      1-Jul-02       G102
Figure BCNF Interview and StaffRoom relations
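The violating dependency that motivates this split can be found mechanically by computing attribute closures: a determinant is a superkey exactly when its closure contains every attribute of the relation. The sketch below applies that check to fd1 to fd4 of ClientInterview; the function names closure and bcnf_violations are assumptions of this illustration.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """FDs whose determinant is not a superkey of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != attributes]


attributes = {"clientNo", "interviewDate", "interviewTime", "staffNo", "roomNo"}
fds = [
    ({"clientNo", "interviewDate"}, {"interviewTime", "staffNo", "roomNo"}),  # fd1
    ({"staffNo", "interviewDate", "interviewTime"}, {"clientNo"}),            # fd2
    ({"roomNo", "interviewDate", "interviewTime"}, {"clientNo", "staffNo"}),  # fd3
    ({"staffNo", "interviewDate"}, {"roomNo"}),                               # fd4
]

violations = bcnf_violations(attributes, fds)
```

Only fd4 is reported: the closure of {staffNo, interviewDate} adds roomNo and nothing else, so that determinant is not a superkey.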
Example
Example - UNF to 1NF Relation
Example - 1NF to 2NF
1NF: Property_Inspection (Property_No, IDate, ITime, PAddress, Comments, Staff_No, Sname, Car_Reg)
Full Functional Dependency: (Property_No, IDate) -> (ITime, Comments, Staff_No, Sname, Car_Reg)
Partial Dependency: Property_No -> PAddress (PAddress depends on only part of the key (Property_No, IDate))
2NF:
Prop (Property_No, PAddress)
Prop_Inspection (Property_No, IDate, ITime, Comments, Staff_No, Sname, Car_Reg)
Example - 2NF to 3NF
Transitive Dependency in Prop_Inspection:
(Property_No, IDate) -> Staff_No
Staff_No -> Sname
3NF:
Staff (Staff_No, Sname)
Prop_Inspection (Property_No, IDate, ITime, Comments, Staff_No, Car_Reg)
Example - 3NF to BCNF
Prop (Property_No, PAddress)
Staff (Staff_No, Sname)
Prop_Inspection (Property_No, IDate, ITime, Comments, Staff_No, Car_Reg)
Prop and Staff are already in BCNF.
FDs of Prop_Inspection:
(Property_No, IDate) -> (ITime, Comments, Staff_No, Car_Reg)
(Staff_No, IDate) -> Car_Reg
(Car_Reg, IDate, ITime) -> (Property_No, Comments, Staff_No)
(Staff_No, IDate, ITime) -> (Property_No, Comments)
(Staff_No, IDate) is not a candidate key of Prop_Inspection, so the second FD violates BCNF.
Example – BCNF
Prop (Property_No,Paddress)
Staff (Staff_No, Sname)
Inspection (Property_No, IDate, ITime, Comments, Staff_No)
Staff_Car (Staff_No, IDate, Car_Reg)
What is Decomposition?
Decomposition: the process of breaking something down into parts or elements.
Decomposition in a database means breaking tables down into multiple tables.
From a database perspective, it means going to a higher normal form.
The aim is to break relations into smaller ones so that the data model is in a normal form and redundancy is avoided.
Decomposition of relation schema
Suppose R is a relation schema, R = {A1, A2, A3, …, An}.
It is decomposed into a set of relation schemas D = {R1, R2, R3, …, Rm}, such that
Ri ⊆ R for 1 <= i <= m, and R1 ⋃ R2 ⋃ R3 … ⋃ Rm = R
Ex: gradeInfo (rollNo, studName, course, grade)
R1 : gradeInfo (rollNo, course, grade)
R2 : studInfo (rollNo, studName)
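The two conditions on a decomposition translate directly into set operations. The following is a small sketch using the gradeInfo example; the function name is_valid_decomposition and the deliberately incomplete second decomposition are assumptions added for illustration.

```python
def is_valid_decomposition(schema, parts):
    """Check that every Ri is a subset of R and that the Ri together cover R."""
    return all(part <= schema for part in parts) and set().union(*parts) == schema


grade_info = {"rollNo", "studName", "course", "grade"}
r1 = {"rollNo", "course", "grade"}
r2 = {"rollNo", "studName"}

ok = is_valid_decomposition(grade_info, [r1, r2])

# Hypothetical bad split: studName is not covered by any part.
missing_attribute = is_valid_decomposition(grade_info, [r1, {"rollNo"}])
```

Note that this only checks attribute coverage. Whether the decomposition is lossless or dependency preserving is a separate question.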
Process of Decomposition
Decomposition
It is important that decompositions are “good”.
Two Characteristics of Good Decompositions
1) Lossless
2) Preserve dependencies
Problem with Decomposition
Given instances of the decomposed relations, we may not be able to reconstruct the corresponding instance of the original relation: information loss.
Example : Problem with Decomposition
R
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
R1
Model Name  Category
a11         Canon
s20         Nikon
a70         Canon
R2
Price  Category
100    Canon
200    Nikon
150    Canon
Example : Problem with Decomposition
Natural join of R1 and R2 (on Category)
Model Name  Price  Category
a11         100    Canon
a11         150    Canon
s20         200    Nikon
a70         100    Canon
a70         150    Canon
R (original)
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
Lossy decomposition
In previous example, additional tuples are obtained along with original tuples
Although there are more tuples, this leads to less information, because we can no longer tell which tuples belonged to the original relation
Due to the loss of information, decomposition for previous example is called lossy decomposition or lossy-join decomposition
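The extra tuples can be reproduced by projecting R onto the two schemas and joining the projections back on Category. This is a minimal sketch with relations modelled as Python sets of tuples; the variable names are assumptions of the illustration.

```python
# R(Model Name, Price, Category) from the example.
R = {("a11", 100, "Canon"), ("s20", 200, "Nikon"), ("a70", 150, "Canon")}

# Projections: R1(Model Name, Category) and R2(Price, Category).
R1 = {(model, category) for model, _price, category in R}
R2 = {(price, category) for _model, price, category in R}

# Natural join of R1 and R2 on the shared attribute Category.
joined = {
    (model, price, cat1)
    for model, cat1 in R1
    for price, cat2 in R2
    if cat1 == cat2
}

# Tuples that appear after the join but were never in R.
spurious = joined - R
```

The join contains all of R plus two spurious tuples, (a11, 150, Canon) and (a70, 100, Canon), which is why the decomposition is lossy.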
Lossy decomposition (another example)
T
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
Functional dependencies:
Employee -> Branch, Project -> Branch
Lossy decomposition
Decomposition of the previous relation
T1
Employee  Branch
Brown     L.A.
Green     San Jose
Hoskins   San Jose
T2
Project  Branch
Mars     L.A.
Jupiter  San Jose
Saturn   San Jose
Venus    San Jose
Lossy decomposition
After Natural Join
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
Green     Saturn   San Jose
Hoskins   Jupiter  San Jose
Original Relation
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
After the natural join, we get two extra tuples. Thus, there is loss of information.
What is lossless?
Lossless means functioning without a loss. In other words, retain everything.
Important for databases to have this feature.
Lossless Decomposition Property
R : relation
F : set of functional dependencies on R
X, Y : decomposition of R
Decomposition is lossless if:
X ∩ Y -> X, that is: all attributes common to both X and Y functionally determine ALL the attributes in X
OR
X ∩ Y -> Y, that is: all attributes common to both X and Y functionally determine ALL the attributes in Y
In other words, if X ∩ Y forms a superkey of either X or Y, the decomposition of R is a lossless decomposition
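This test can be sketched with an attribute-closure computation: compute the closure of X ∩ Y under F and check whether it covers X or covers Y. The example reuses the Employee/Project/Branch relation with Employee -> Branch and Project -> Branch; the second decomposition, on Employee, is a hypothetical alternative added for contrast, and the function names are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(x, y, fds):
    """True if the common attributes of X and Y determine all of X or all of Y."""
    common = closure(x & y, fds)
    return x <= common or y <= common


fds = [({"Employee"}, {"Branch"}), ({"Project"}, {"Branch"})]

# T1(Employee, Branch), T2(Project, Branch): the common attribute is Branch.
split_on_branch = is_lossless({"Employee", "Branch"}, {"Project", "Branch"}, fds)

# Hypothetical alternative: (Employee, Branch), (Employee, Project).
split_on_employee = is_lossless({"Employee", "Branch"}, {"Employee", "Project"}, fds)
```

Branch determines nothing beyond itself, so the first decomposition fails the test, which matches the extra tuples seen after the natural join. Employee determines Branch, so the alternative passes.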
Why lossless?
Ensures that attributes involved in the natural join (X ∩ Y) are a candidate key for at least one of the two relations.
This ensures we can never get the situation where false tuples are generated,
as for any value on the join attributes there will be a unique tuple in one of the relations.
Lossless Decomposition
A decomposition is lossless if we can recover the original relation:
Decompose: R(A,B,C) into R1(A,B) and R2(A,C)
Recover: join R1 and R2 to obtain R’(A,B,C)
R’(A,B,C) should be the same as R(A,B,C)
Must ensure R’ = R
Lossless Decomposition example
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB
is decomposed into
Name    Price
Word    100
Oracle  1000
Access  100
and
Name    Category
Word    WP
Oracle  DB
Access  DB
• Sometimes the same set of data is reproduced:
• (Word, 100) + (Word, WP) = (Word, 100, WP)
• (Oracle, 1000) + (Oracle, DB) = (Oracle, 1000, DB)
• (Access, 100) + (Access, DB) = (Access, 100, DB)
Lossy Decomposition
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB
is decomposed into
Category  Name
WP        Word
DB        Oracle
DB        Access
and
Category  Price
WP        100
DB        1000
DB        100
• Sometimes it’s not:
• (Word, WP) + (100, WP) = (Word, 100, WP)
• (Oracle, DB) + (1000, DB) = (Oracle, 1000, DB)
• (Oracle, DB) + (100, DB) = (Oracle, 100, DB)
• (Access, DB) + (1000, DB) = (Access, 1000, DB)
• (Access, DB) + (100, DB) = (Access, 100, DB)
What’s wrong?
Ensuring lossless decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp) is decomposed into
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An -> B1, ..., Bm or A1, ..., An -> C1, ..., Cp
Then the decomposition is lossless
Note: don’t need both
Dependency preservation
Dependency preservation refers to a specific case of lossless decomposition, such that the normalized relvars are independent of each other
Some lossless decompositions do not exhibit dependency preservation
Let relation R(A,B,C,D) have dependencies F that include A ➙ B and A ➙ C.
Decomposition: R1(A,B), R2(B,C,D)
A ➙ C cannot be preserved using only one relation.
It may not be possible to keep each and every dependency in F within a single relation,
but the decomposition preserves dependencies if those that are kept are together equivalent to F.
Let F be the set of dependencies of relation R, and let R be decomposed into R1, R2, …, Rn.
Let F1, F2, …, Fn be the dependencies implied by F that only involve attributes of R1, R2, …, Rn respectively.
The decomposition has preserved dependencies if F1 ⋃ F2 ⋃ … ⋃ Fn ➙ F, i.e., F1 ⋃ F2 ⋃ … ⋃ Fn implies every dependency in F.
If the decomposition does not preserve a dependency, then either the decomposed relations may fail to satisfy F, or an update may require a join operation to check it.
Dependency Preserving Decompositions (Contd.)
Decomposition of R into X and Y is dependency preserving if (FX ⋃ FY)+ = F+
i.e., if we consider only dependencies in the closure F+ that can be checked in X without considering Y, and in Y without considering X, these imply all dependencies in F+.
Important to consider F+ in this definition: ABC, A -> B, B -> C, C -> A, decomposed into AB and BC. Is this dependency preserving? Is C -> A preserved?
Note: F+ contains F ⋃ {A -> C, B -> A, C -> B}, so…
FAB contains A -> B and B -> A; FBC contains B -> C and C -> B. So, (FAB ⋃ FBC)+ contains C -> A.
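The check can be sketched in Python by projecting F onto each decomposed schema (using closures, so that dependencies in F+ are taken into account) and then testing whether the projected dependencies imply every dependency in F. The function names below are assumptions of this sketch. The projection enumerates all attribute subsets, so it is only practical for small schemas. The second example assumes F contains exactly A -> B and A -> C, whereas the earlier R(A,B,C,D) example only says F includes them.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project(fds, attrs):
    """Dependencies of F+ that only mention attributes in attrs."""
    projected = []
    for size in range(1, len(attrs) + 1):
        for lhs in combinations(sorted(attrs), size):
            rhs = (closure(lhs, fds) & attrs) - set(lhs)
            if rhs:
                projected.append((set(lhs), rhs))
    return projected


def preserves_dependencies(fds, decomposition):
    """True if the union of the projected FDs implies every FD in fds."""
    local = [fd for attrs in decomposition for fd in project(fds, attrs)]
    return all(rhs <= closure(lhs, local) for lhs, rhs in fds)


# ABC with A -> B, B -> C, C -> A, decomposed into AB and BC.
abc_fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"A"})]
abc_preserved = preserves_dependencies(abc_fds, [{"A", "B"}, {"B", "C"}])

# R(A,B,C,D) with A -> B and A -> C, decomposed into R1(A,B) and R2(B,C,D).
abcd_fds = [({"A"}, {"B"}), ({"A"}, {"C"})]
abcd_preserved = preserves_dependencies(abcd_fds, [{"A", "B"}, {"B", "C", "D"}])
```

In the ABC case, C -> A is recovered through C -> B and B -> A, so the decomposition preserves dependencies. In the ABCD case, A -> C cannot be derived from the local dependencies, so it is not preserved.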
Dependency Preservation
Example: decompose supplier, city, status where supplier implies city and status, and city and status imply each other
Dependency is preserved in this projection:
SC {S#, CITY}
CS {CITY, STATUS}
Dependency is not preserved in this one:
SC {S#, CITY}
CS {S#, STATUS}
Although the second is nonloss, you still cannot update them independently
Dependency Preservation
Ensures we can “easily” check whether an FD X -> Y is violated during an update to a database:
The projection of an FD set F onto a set of attributes Z, FZ, is
{X -> Y | X -> Y ∈ F+, X ⋃ Y ⊆ Z}
i.e., it is those FDs local to Z’s attributes
A decomposition R1, …, Rk is dependency preserving if F+ = (FR1 ⋃ ... ⋃ FRk)+
The decomposition hasn’t “lost” any essential FDs, so we can check without doing a join
Example of Lossless and Dependency-Preserving Decompositions
Given relation scheme R(cno, name, street, city, st, zip, item, price)
And FD set:
cno -> name
name -> street, city
street, city -> st
street, city -> zip
name, item -> price
Consider the decomposition R1(cno, name, street, city, st, zip) and R2(cno, name, item, price)
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD by name, street -> city?
Comparison of BCNF and 3NF
It is always possible to decompose a relation into a set of relations that are in 3NF such that:
the decomposition is lossless
the dependencies are preserved
It is always possible to decompose a relation into a set of relations that are in BCNF such that:
the decomposition is lossless
but it may not be possible to preserve dependencies.