Database Management Systems Unit 9
Sikkim Manipal University Page No.: 166
Unit 9 Functional Dependencies and
Normalization for Relational Databases
Structure
9.1 Introduction to Normalization
Objectives
Self Assessment Question(s) (SAQs)
9.2 Information Design Guide Lines for Relational DB
Self Assessment Question(s) (SAQs)
9.3 Normal forms Based on Primary Keys
9.3.1 Second Normal Form (2NF)
9.3.2 Third Normal Form (3NF)
Self Assessment Question(s) (SAQs)
9.4 Boyce Codd Normal Form (BCNF)
Self Assessment Question(s) (SAQs)
9.5 Fourth Normal Form (4NF)
Self Assessment Question(s) (SAQs)
9.6 Normalization using Join Dependencies
Self Assessment Question(s) (SAQs)
9.7 Summary
9.8 Terminal Questions (TQs)
9.9 Multiple Choice Questions (MCQs)
9.10 Answers to SAQs, TQs, and MCQs
9.10.1 Answers to Self Assessment Questions (SAQs)
9.10.2 Answers to Terminal Questions (TQs)
9.10.3 Answers to Multiple Choice Questions (MCQs)
9.1 Introduction to Normalization
Normalization is the process of building database structures to store data.
It matters because any application ultimately depends on its data
structures: if they are poorly designed, the application starts from a poor
foundation, and much more work is needed to make it useful and efficient.
Normalization is the formal process for deciding which attributes
should be grouped together in a relation. It serves as a tool for
validating and improving the logical design, so that the design avoids
unnecessary duplication of data, i.e. it eliminates redundancy and promotes
integrity. In the normalization process we analyze complex relations and
decompose them into smaller, simpler and well-structured relations.
Objectives
To know about
o Information Design Guide Lines for Relational DB:
o Normal Forms Based on Primary Keys:
o Second Normal Form (2NF)
o Third Normal Form (3NF)
o Boyce Codd Normal Form (BCNF)
o Fourth Normal Form (4NF)
o Normalization using Join Dependencies
Self Assessment Question(s) (SAQs) (For Section 9.1)
1. Define Normalization. Why do you need it?
9.2 Information Design Guide Lines for relational DB
Some criteria for distinguishing good relation schemas from bad ones are:
o Semantics of the attributes
o Reducing the redundant values in tuples
o Reducing the null values in tuples
o Disallowing spurious tuples
Semantics of the Attributes:
Whenever we group attributes to form a relation, we assume that a certain
meaning is associated with the attributes. This meaning is called Semantics,
and specifies how the attribute values in a tuple relate to one another.
E.g.: consider the COMPANY database schema. The relations considered
for this database are:
EMPLOYEE (ENAME, SSN [p.k.], BDATE, ADDRESS, DNUMBER [f.k.])
DEPARTMENT (DNAME, DNUMBER [p.k.], DMGRSSN [f.k.])
Fig. 9.1: Simplified version of the COMPANY relational database schema
The meaning of the EMPLOYEE relation is quite simple: each tuple represents
an employee. The DNUMBER attribute is a foreign key that represents an
implicit relationship between the EMPLOYEE and DEPARTMENT relations.
Guideline 1: Design a relation schema so that it is easy to explain its
meaning. Do not combine attributes from multiple entity types and
relationship types into a single relation.
Reducing redundant values in tuples:
Storage space is one of the most important considerations in designing a
relational schema. Improper grouping of attributes has a significant effect
on the storage space the schema requires.
Ex: Figure A
Emp.no  Emp.Name  Salary  Address
Figure B
Dept_no  Dname  D_location
In Figure B, the information about each department appears only once in the
department relation.
Suppose we integrate Figure A and Figure B into a single table, Emp_dept.
Figure C: Emp_dept
Emp.no  Emp.Name  Salary  Addr  Dept.no  D.Name  D.loc
Using Figure C causes serious problems: insertion anomalies, deletion
anomalies and modification anomalies.
Whenever we insert tuples, the department information is stored again with
each employee. If there are n employees in department 10, the Dept.no,
D.Name and D.loc values for that department are repeated n times, which
leads to data redundancy.
Insertion Anomalies:
It is difficult to insert a new department that has no employees as yet in
the Emp_dept relation. This causes a problem because Emp.no is the primary
key of Emp_dept and so cannot be left null. This problem does not occur in
the design of Figure B, because a department is entered in the department
relation whether or not any employee works for it.
Deletion Anomalies:
If we delete the last employee of a department from the Emp_dept relation,
then all the information about that department is lost. This problem
does not occur in the design of Figure B, because department tuples are
stored separately.
Modification Anomalies:
In Emp_dept, if we change the value of one of the attributes of a particular
department, say the location of department 5, we must update the tuples of
all employees who work in that department; otherwise the database will
become inconsistent.
Guideline 2:
Design the database so that no insertion, deletion or modification anomalies
are present in its relations. If any anomalies remain, note them clearly, so
that proper action can be taken.
NULL values in tuples:
These arise when a relation includes attributes that do not apply to many of
its tuples. If many of the attributes do not take any value, we insert NULL
values. This can waste space at the storage level, and it also leads to
problems in understanding the meaning of the attributes and in specifying
join operations. NULLs may also lead to counting problems when aggregate
functions are used.
Guideline 3:
As far as possible avoid using NULL values for attributes in a relation.
Disallowing spurious tuples:
Relation schemas should be designed so that they can be joined with equality
conditions without producing wrong tuples.
Figure A: Emp_loc
Emp_Name  P_loc
Figure B: Emp_project
SSN  PNO  P_Name  P_Loc
If we attempt a natural join operation on Figure A and Figure B, the result
contains many more tuples than the set of valid combinations.
The additional tuples are called spurious tuples, because they represent
wrong information.
Guideline 4:
Design relation schemas so that they can be joined with equality conditions
on attributes that are either primary keys or foreign keys. This guarantees
that no spurious tuples are generated.
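The effect can be seen on a small example. The sketch below is illustrative
only: the sample rows and the SSN-to-name lookup are assumptions, not data
from this unit. It joins Emp_loc and Emp_project on P_Loc, which is neither a
primary key nor a foreign key, and counts the tuples that pair an employee
name with another employee's SSN.

```python
# Hypothetical sample rows; names and values are illustrative only.
emp_project = [
    # (SSN, PNO, P_Name, P_Loc)
    ("111", 1, "ProductX", "Delhi"),
    ("222", 2, "ProductY", "Delhi"),
]
emp_name = {"111": "Asha", "222": "Ravi"}  # assumed SSN -> Emp_Name lookup

# Emp_loc keeps only (Emp_Name, P_loc); SSN and project number are dropped.
emp_loc = sorted({(emp_name[ssn], loc) for ssn, _, _, loc in emp_project})

# Natural join of Emp_loc and Emp_project on the shared attribute P_Loc.
joined = [
    (name, ssn, pno, pname, loc)
    for name, loc_a in emp_loc
    for ssn, pno, pname, loc in emp_project
    if loc_a == loc
]

# A joined row is spurious when its name does not belong to its SSN.
spurious = [row for row in joined if emp_name[row[1]] != row[0]]
print(len(emp_project), len(joined), len(spurious))
```

Two original tuples become four after the join, and two of those four are
spurious, because P_Loc alone does not identify which employee worked on
which project.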
Self Assessment Question(s) (SAQs) (For section 9.2)
1. List some criteria for good and bad relation schemas
9.3 Normal forms Based on Primary Keys
A relation schema R is in first normal form (1NF) if every attribute of R
takes only single atomic values. Equivalently, the intersection of each row
and column contains one and only one value. To transform an un-normalized
table (a table that contains one or more repeating groups) to first normal
form, we identify and remove the repeating groups within the table.
E.g. Figure A: Dept
D.Name  D.No  D.location
R&D     5     {England, London, Delhi}
HRD     4     Bangalore
In this figure each department can have a number of locations. The relation
is not in first normal form because D.location is not an atomic attribute:
the domain of D.location contains multiple values.
One technique to achieve first normal form is to remove the attribute
D.location that violates first normal form and place it in a separate
relation, Dept_location, along with the department number.
Ex: Dept
Dept_No  D.Name
5        R&D
4        HRD
Dept_location
Dept_No  Dept_location
5        England
5        London
5        Delhi
4        Bangalore
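The same transformation can be sketched in a few lines of Python. This is a
minimal illustration, assuming the un-normalized Dept table is held as a list
of dictionaries; the variable names are hypothetical.

```python
# Un-normalized Dept: D_location holds a repeating group (a list of values).
dept = [
    {"D_Name": "R&D", "D_No": 5, "D_location": ["England", "London", "Delhi"]},
    {"D_Name": "HRD", "D_No": 4, "D_location": ["Bangalore"]},
]

# Dept in 1NF: the multivalued attribute is removed.
dept_1nf = [{"Dept_No": d["D_No"], "D_Name": d["D_Name"]} for d in dept]

# Dept_location: one row per (department, location) pair.
dept_location = [
    {"Dept_No": d["D_No"], "Dept_location": loc}
    for d in dept
    for loc in d["D_location"]
]

for row in dept_location:
    print(row["Dept_No"], row["Dept_location"])
```

Every value in both resulting relations is atomic, and the department number
in Dept_location links each location back to its department.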
9.3.1 Second Normal Form (2NF)
Second normal form is based on the concept of full functional
dependency. A relation schema R is in second normal form if every non-prime
attribute A in R is fully functionally dependent on the primary key of R.
Figure 9.2: 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations
(b) Normalizing EMP_DEPT into 3NF relations
A Partial functional dependency is a functional dependency in which one or
more non-key attributes are functionally dependent on part of the primary
key. It creates a redundancy in that relation, which results in anomalies
when the table is updated.
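A partial dependency can be spotted on sample data with a small helper. The
sketch below is an illustration under assumptions: the attribute names (SSN,
PNUMBER, HOURS, ENAME) and the rows are hypothetical, and the primary key is
taken to be {SSN, PNUMBER}. Note that a check on sample rows can only refute
a functional dependency; it cannot prove that the dependency holds for every
legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The same lhs value must always map to the same rhs value.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical EMP_PROJ rows; assumed primary key is {SSN, PNUMBER}.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Asha"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 20, "ENAME": "Asha"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 15, "ENAME": "Ravi"},
]

# ENAME depends on SSN alone, i.e. on only part of the key: a partial dependency.
ename_partial = fd_holds(emp_proj, ["SSN"], ["ENAME"])

# HOURS is not determined by either part of the key on its own.
hours_full = not fd_holds(emp_proj, ["SSN"], ["HOURS"]) and not fd_holds(
    emp_proj, ["PNUMBER"], ["HOURS"]
)
print(ename_partial, hours_full)
```

Here ENAME is repeated for every project an employee works on, which is the
redundancy that 2NF removes by moving ENAME into a relation keyed on SSN.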
9.3.2 Third Normal Form (3NF)
This is based on the concept of transitive dependency. We should design
relational schemas in such a way that there are no transitive
dependencies, because they lead to update anomalies. A functional
dependency (FD) X -> Y in a relation schema R is a transitive dependency if
there is a set of attributes Z, neither a candidate key nor a subset of any
key of R, such that both X -> Z and Z -> Y hold. The dependency
SSN -> Dmgr is transitive through Dnum in the Emp_dept relation, because
SSN -> Dnum and Dnum -> Dmgr hold and Dnum is neither a key nor a subset
(part) of the key.
According to Codd's definition, a relation schema R is in 3NF if it satisfies
2NF and no non-prime attribute is transitively dependent on the primary key.
The Emp_dept relation is not in 3NF; we can normalize it by
decomposing it into E1 and E2.
Note: Transitive is a mathematical relation that states that if a relation is true
between the first value and the second value, and between the second
value and the 3rd value, then it is true between the 1st and the 3rd value.
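Transitive dependencies can be traced mechanically by computing the closure
of a set of attributes under a set of FDs. The sketch below is a minimal
illustration using only the two dependencies named above, SSN -> Dnum and
Dnum -> Dmgr; the function name and the data layout are assumptions.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever all of lhs is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The two FDs named in the text for Emp_dept.
fds = [(("SSN",), ("Dnum",)), (("Dnum",), ("Dmgr",))]

ssn_closure = closure({"SSN"}, fds)
dnum_closure = closure({"Dnum"}, fds)
print(sorted(ssn_closure), sorted(dnum_closure))
```

Dmgr appears in the closure of SSN only by way of Dnum, and the closure of
Dnum does not contain SSN, so Dnum is not a key: SSN -> Dmgr is transitive.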
Example 2:
Consider a relation schema LOTS, which describes parcels of land for sale
in various countries of a state. Suppose there are two candidate keys:
property_ID and {Country_name, lot#}; that is, lot numbers are unique only
within each country, but property_ID numbers are unique across countries
for the entire state.
Based on the two candidate keys property_ID and {Country_name, lot#}, we
know that the functional dependencies FD1 and FD2 hold: each candidate key
determines all the other attributes. Suppose the following two additional
functional dependencies hold in LOTS:
FD3: Country_name -> tax_rate
FD4: Area -> price
Here, FD3 says that the tax rate is fixed for a given country
(Country_name -> tax_rate), and FD4 says that the price of a lot is
determined by its area (Area -> price). The LOTS relation schema violates
2NF, because tax_rate is partially dependent on the candidate key
{Country_name, lot#}. We therefore decompose LOTS into two relations,
LOTS1 and LOTS2.
LOTS1 violates 3NF, because price is transitively dependent on the candidate
key of LOTS1 via the attribute Area. Hence we decompose LOTS1 into
LOTS1A and LOTS1B.
A relation schema R is in 3NF when every non-prime attribute of R satisfies
the conditions below.
1. It is fully functionally dependent on every key of R.
2. It is non-transitively dependent on every key of R.
Self Assessment Question(s) (SAQs) (For section 9.3)
1. Define and explain 1 NF.
2. Explain 2-NF.
3. Discuss 3-NF.
9.4 Boyce Codd Normal Form (BCNF)
Database relations are designed so that they have neither partial
dependencies nor transitive dependencies, because these types of
dependencies result in update anomalies. A functional dependency
describes the relationship between attributes in a relation. For example, if
A and B are attributes of relation R, B is functionally dependent on A
(A -> B) if each value of A is associated with exactly one value of B.
The left-hand side and the right-hand side of a functional dependency are
sometimes called the determinant and the dependent respectively.
A relation is in BCNF if and only if every determinant is a candidate key.
The difference between third normal form and BCNF is that, for a
functional dependency A -> B, third normal form allows this dependency
in a relation if B is a prime attribute (part of a candidate key) and A is
not a candidate key, whereas in BCNF A must be a candidate key. Therefore
BCNF is a stronger form of third normal form.
PRODUCT (prd#, prdname, price)
prd# -> prdname, price
CUSTOMER (cust#, custname, custaddr)
cust# -> custname, custaddr
ORDER (ord#, cust#, prd#, qty, amt)
ord# -> qty, amt
The PRODUCT schema is in BCNF, since prd# is a candidate key. Similarly,
the CUSTOMER schema is also in BCNF.
The schema ORDER, however, is not in BCNF, because ord# is not a super
key for ORDER, i.e. we could have a pair of tuples with the same
ord#.
For e.g.
(1234, 145, 13, 789)
(1234, 123, 53, 455)
Here ord# is not a candidate key, yet the FD ord# -> amt is not trivial;
therefore ORDER does not satisfy the definition of BCNF. It suffers from the
problem of repetition of information. This redundancy can be eliminated by
decomposing ORDER into ORDER1 and ORDER2.
ORDER1(ord#,cust#)
ORDER2(prd#,qty,amt)
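The BCNF condition, that every determinant must be a candidate key, can be
tested with attribute closures. The sketch below is a minimal illustration:
the helper names are assumptions, the PRODUCT schema is taken from the text,
and R(A, B, C) with AB -> C and C -> B is a hypothetical schema chosen because
it is in 3NF but not in BCNF.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= set(relation)
        if not trivial and not is_superkey:
            bad.append((lhs, rhs))
    return bad

# PRODUCT from the text: prd# determines every other attribute.
product = ("prd#", "prdname", "price")
product_fds = [(("prd#",), ("prdname", "price"))]

# Hypothetical schema: C -> B holds but C is not a superkey.
r = ("A", "B", "C")
r_fds = [(("A", "B"), ("C",)), (("C",), ("B",))]

print(bcnf_violations(product, product_fds))
print(bcnf_violations(r, r_fds))
```

PRODUCT yields no violations. For R, the dependency C -> B is reported: B is
a prime attribute, so 3NF tolerates the dependency, but BCNF does not.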
Example 2:
Consider, for example, the LOTS relation. It has four functional
dependencies, FD1 to FD4. Suppose we have thousands of lots in the relation,
but the lots are from only two countries, A and B. Suppose lot sizes in
country A are 0.5, 0.6, ..., 1.0 acres, whereas lot sizes in country B are
restricted to 1.1, 1.2, ..., 2.0 acres. In such a situation we would have an
additional functional dependency FD5: Area -> Country_name. FD5 can be
represented by 16 tuples in a separate relation R(Area, Country_name), since
there are only 16 possible area values. This representation reduces the
redundancy of repeating the same information in thousands of LOTS1A tuples.
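The count of 16 can be checked by enumerating the areas. The sketch below is
a minimal illustration, assuming lot sizes run in steps of 0.1 acre from 0.5
to 1.0 in country A and from 1.1 to 2.0 in country B; the variable names are
hypothetical.

```python
# Build R(Area, Country_name); areas are generated from tenths of an acre
# so that each value is produced by a single division.
area_country = [(tenths / 10, "A") for tenths in range(5, 11)] + [
    (tenths / 10, "B") for tenths in range(11, 21)
]

# FD5 (Area -> Country_name) lets the relation act as a lookup table.
country_of = dict(area_country)

print(len(area_country), country_of[0.5], country_of[1.5])
```

Six areas belong to country A and ten to country B, so 16 tuples record the
dependency once instead of once per lot.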
Figure 9.3: Boyce-Codd normal form (a) BCNF normalization of LOTS1A with
the functional dependency FD2 being lost in the decomposition
(b) A schematic relation with FDS; it is in 3NF but not in BCNF
Self Assessment Question(s) (SAQs) (For Section 9.4)
1. Explain the concept of BCNF.
9.5 Fourth Normal Form (4NF)
Multivalued dependencies are a consequence of first normal form,
which prohibits attributes from having a set of values. If we have two or
more multivalued independent attributes in the same relation, we get into a
situation where we have to repeat every value of one of the attributes with
every value of the other attributes to keep the relation state consistent and
to maintain independence among the attributes involved. This constraint is
specified by a multivalued dependency.
Consider a table EMPLOYEE that has the attributes name, project and hobby.
o An employee can work on more than one project and can have more than one hobby.
o The employee's projects and hobbies are independent of one another.
o A given project or hobby can be associated with any number of employees.
To keep the relation state consistent, we must have separate tuples to
represent every combination of an employee's projects and the employee's
hobbies.
The drawback of the EMPLOYEE relation is redundant data. This redundant
data leads to update anomalies. For example, if employee B takes on one
more project, say Sybase, then we must add two more tuples, one for each
hobby. The values Reading and Movies of hobby are
repeated with each value of project. This redundancy is undesirable. One
way to remove the redundancy is to decompose the EMPLOYEE relation into two
relations, PROJECT and HOBBY.
Now, if we wish to insert Sybase in the PROJECT relation, only
one entry is required.
Definition (MVD): A relation R(X, Y, Z) is said to have a multivalued
dependency X ->> Y if the set of Y values for a given (X, Z) pair does not
depend on Z, but depends only on X. We then say that "X multi-determines Y"
or "Y is multi-dependent on X". Such a dependency is called a
multivalued dependency (MVD) and is represented by a double arrow (->>).
We can also define an MVD as follows: for each value of X there is a set of
values for Y and a set of values for Z, and the sets of values for Y and Z
are independent of each other.
So wherever two independent one-to-many relationships (A:B and A:C) are
mixed in the same relation, a multivalued dependency arises. The redundancy
caused by multivalued dependencies can be avoided using fourth normal form.
EMPLOYEE
NAME  PROJECT    HOBBY
A     Microsoft  Cricket
A     Oracle     Music
A     Microsoft  Music
A     Oracle     Cricket
B     Intel      Movies
B     Sybase     Reading
B     Intel      Reading
B     Sybase     Movies
Decomposed relations to reduce redundancy
PROJECT
NAME  PROJECT
A     Microsoft
A     Oracle
B     Intel
B     Sybase
HOBBY
NAME  HOBBY
A     Cricket
A     Music
B     Movies
B     Reading
Fourth Normal Form (4NF): The definition of 4NF is violated when a relation
has undesirable multivalued dependencies; it can therefore be used to
identify such relations and decompose them into 4NF relations.
Alternate definition: A relation R is said to be in 4NF if for every MVD
A ->> B that holds over R, one of the following is true: the MVD is trivial
(B is a subset of A, or A together with B makes up all of R), or A is a
super key.
The EMPLOYEE relation is not in 4NF because of the non-trivial MVDs
NAME ->> PROJECT and NAME ->> HOBBY (the project and hobby attributes of the
EMPLOYEE relation are independent of each other), and NAME is not a super key
of EMPLOYEE. To bring this relation into 4NF, we decompose EMPLOYEE into
PROJECT and HOBBY.
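That this decomposition loses no information can be checked by projecting
EMPLOYEE onto the two smaller relations and joining them back on NAME. The
sketch below is a minimal illustration using the rows of the EMPLOYEE table
in this section, held as Python sets of tuples; the variable names are
assumptions.

```python
# EMPLOYEE(NAME, PROJECT, HOBBY) as a set of tuples.
employee = {
    ("A", "Microsoft", "Cricket"), ("A", "Oracle", "Music"),
    ("A", "Microsoft", "Music"), ("A", "Oracle", "Cricket"),
    ("B", "Intel", "Movies"), ("B", "Sybase", "Reading"),
    ("B", "Intel", "Reading"), ("B", "Sybase", "Movies"),
}

# Project onto PROJECT(NAME, PROJECT) and HOBBY(NAME, HOBBY).
project = {(name, proj) for name, proj, _ in employee}
hobby = {(name, hob) for name, _, hob in employee}

# Natural join of PROJECT and HOBBY on NAME.
rejoined = {
    (name, proj, hob)
    for name, proj in project
    for other, hob in hobby
    if name == other
}

print(len(employee), len(project), len(hobby), rejoined == employee)
```

Eight EMPLOYEE tuples are represented by four PROJECT tuples and four HOBBY
tuples, and the join reproduces EMPLOYEE exactly.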
Self Assessment Question(s) (SAQs) (For section 9.5)
1. Explain the concept of multivalued dependencies.
9.6 Normalization using join dependencies
Join dependency: fifth normal form (5NF) is also called "Project-Join
Normal Form". It is important to note that normalization into 5NF is
considered very rarely in practice.
Definition: A relation R is in 5NF if, for every join dependency
(R1, R2, ..., Rn) that holds over R, at least one of the following holds:
o (R1, R2, ..., Rn) is a trivial join dependency
o Every Ri is a candidate key for R.
For an example of a JD, the relation shown in the figure states that CSE
department offers subjects like Data structure and RDBMS, which are taken
by Leela. Similarly, the other departments offer different subjects.
However, no student takes all the subjects and no subject has all students
enrolled in it, and therefore all three fields are needed to represent the
information.
DST
Dept   Subject             Student
CSE    Data structures     Leela
Mech   Thermodynamics      Arjun
CSE    RDBMS               Leela
Maths  Discrete Structure  Parvathy
The above relation does not suffer from any MVD, because Subject and Student
are not independent. To bring this relation into 5NF we decompose it as:
DJ (Dept, Subject)
DS (Dept, Student)
SS (Subject, Student)
The three relations shown above satisfy the rules of 5NF, and their join is
lossless. One of the major differences between 4NF and 5NF is that in a
given relation R(X, Y, Z), if the attributes Y and Z are independent, the
relation violates 4NF, and if they are dependent on each other, it is a
candidate for 5NF. 4NF decomposition generally gives two relations, whereas
5NF decomposition gives three relations to keep all the information of the
original relation.
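The lossless property can be checked on the sample rows by projecting DST
onto the three relations and joining them back. The sketch below is a minimal
illustration, with the relations held as Python sets of tuples; the variable
names are assumptions.

```python
# DST(Dept, Subject, Student) as a set of tuples.
dst = {
    ("CSE", "Data structures", "Leela"),
    ("Mech", "Thermodynamics", "Arjun"),
    ("CSE", "RDBMS", "Leela"),
    ("Maths", "Discrete Structure", "Parvathy"),
}

# The three projections.
dj = {(dept, subj) for dept, subj, _ in dst}   # DJ(Dept, Subject)
ds = {(dept, stud) for dept, _, stud in dst}   # DS(Dept, Student)
ss = {(subj, stud) for _, subj, stud in dst}   # SS(Subject, Student)

# Three-way natural join: join DJ and DS on Dept, then keep only the
# combinations whose (Subject, Student) pair also appears in SS.
rejoined = {
    (dept, subj, stud)
    for dept, subj in dj
    for other, stud in ds
    if dept == other and (subj, stud) in ss
}

print(rejoined == dst)
```

On this sample the join returns exactly the original tuples. A check on
sample rows says nothing about whether the join dependency holds for every
legal state of DST; that is a property of the schema's constraints.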
Self Assessment Question(s) (SAQs) (For section 9.6)
1. What do you mean by join dependencies? Explain 5-NF
9.7 Summary
We have learnt in this unit concepts like
o Information Design Guide Lines for relational DB
o Normal forms Based on Primary Keys
o Second Normal Form (2NF)
o Third Normal Form (3NF)
o Boyce Codd Normal Form (BCNF)
o Fourth Normal Form (4NF)
o Normalization using Join Dependencies
9.8 Terminal Questions (TQs)
1. Discuss the criteria for bad relational schemas.
2. Discuss the attribute semantics as an information measure of goodness
of a relation schema.
3. Discuss the first, second & third normal forms.
4. Discuss the concept of multi-valued dependency.
9.9 Multiple Choice Questions (MCQs)
1. ______ eliminates redundancy and promotes integrity.
A) Normalization
B) Integration
C) Consistency
D) None of the above
2. A relation schema R is in ______ if every attribute of R takes only single
atomic values.
a) First Normal form
b) Second Normal form
c) Third Normal form
d) None of the above
3. ______ is a functional dependency in which one or more non-key attributes
are functionally dependent on part of the primary key.
a) A full functional dependency
b) A Partial functional dependency
c) Functional dependency
d) None of the above
4. A relation R is in ______ if for all join dependencies at least one of the
following holds:
(R1, R2, ..., Rn) is a trivial join dependency; every Ri is a candidate key for R.
a) First Normal form
b) Second Normal form
c) Fifth Normal form
d) None of the above
9.10 Answers to SAQs, TQs, and MCQs
9.10.1 Answers to Self Assessment Questions (SAQs)
For Section 9.1
1. Normalization is the process of building database structures to store
data. It is needed because any application ultimately depends on its data
structures. (Refer section 9.1)
For Section 9.2
1. Semantics of attributes; reducing the redundant values in tuples;
reducing the null values in tuples; disallowing spurious tuples.
(Refer section 9.2)
For Section 9.3
1. A relation schema R is in first normal form if every attribute of R takes
only single atomic values. (Refer section 9.3)
2. A second normal form is based on the concept of full functional
dependency. A relation is in second normal form if every non-prime
attribute A in R is fully functionally dependent on the Primary Key of R.
(Refer section 9.3.1)
3. This is based on the concept of transitive dependency. We should
design relational schema in such a way that there should not be any
transitive dependencies because they lead to update anomalies.
(Refer section 9.3.2)
For Section 9.4
1. Database relations are designed so that they have neither partial
dependencies nor transitive dependencies, because these types of
dependencies result in update anomalies. A functional dependency
describes the relationship between attributes in a relation. For e.g., if A
and B are attributes of relation R, B is functionally dependent on A
(A -> B) if each value of A is associated with exactly one value of B.
The left-hand side and the right-hand side of a functional dependency
are sometimes called the determinant and the dependent respectively.
A relation is in BCNF if and only if every determinant is a candidate key.
(Refer section 9.4)
For Section 9.5
1. Multi valued dependencies are based on the concept of first normal
form, which prohibits attributes having a set of values.
(Refer section 9.5)
For Section 9.6
1. Join dependency: 5NF is also called "Project-Join Normal Form". It is
important to note that normalization into 5NF is considered very rarely in
practice.
Definition: A relation R is in 5NF if, for all join dependencies, at least
one of the following holds:
(R1, R2, ..., Rn) is a trivial join dependency; every Ri is a candidate key for R.
(Refer section 9.6)
9.10.2 Answers to Terminal Questions (TQs)
1. Criteria for good and bad relation schemas:
Semantics of attributes; reducing the redundant values in tuples; reducing
the null values in tuples; disallowing spurious tuples.
(Refer section 9.2)
2. Whenever we group attributes to form a relation, we assume that a
certain meaning is associated with the attributes. This meaning is called
Semantics, and specifies how the attribute values in a tuple relate to one
another. (Refer section 9.2)
3. A relation schema R is in first normal form if every attribute of R takes
only single atomic values. (Refer section 9.3)
4. Multivalued dependencies are based on the concept of first normal
form, which prohibits attributes having a set of values. (Refer section 9.5)
9.10.3 Answers to Multiple Choice Questions (MCQs)
1. A
2. A
3. B
4. C