09-unit9

Upload: gaardi

Post on 22-Nov-2015

  • Database Management Systems Unit 9

    Sikkim Manipal University Page No.: 166

    Unit 9 Functional Dependencies and

    Normalization for Relational Databases

    Structure

    9.1 Introduction to Normalization

    Objectives

    Self Assessment Question(s) (SAQs)

    9.2 Information Design Guide Lines for Relational DB

    Self Assessment Question(s) (SAQs)

    9.3 Normal forms Based on Primary Keys

    9.3.1 Second Normal Form (2NF)

    9.3.2 Third Normal Form (3NF)

    Self Assessment Question(s) (SAQs)

    9.4 Boyce Codd Normal Form (BCNF)

    Self Assessment Question(s) (SAQs)

    9.5 Fourth Normal Form (4NF)

    Self Assessment Question(s) (SAQs)

    9.6 Normalization using Join Dependencies

    Self Assessment Question(s) (SAQs)

    9.7 Summary

    9.8 Terminal Questions (TQs)

    9.9 Multiple Choice Questions (MCQs)

    9.10 Answers to SAQs, TQs, and MCQs

    9.10.1 Answers to Self Assessment Questions (SAQs)

    9.10.2 Answers to Terminal Questions (TQs)

    9.10.3 Answers to Multiple Choice Questions (MCQs)

    9.1 Introduction to Normalization

    Normalization is the process of building database structures to store data,

    because any application ultimately depends on its data structures. If the

    data structures are poorly designed, the application will start from a poor

    foundation. This will require a lot more work to create a useful and efficient

    application. Normalization is the formal process for deciding which attributes

    should be grouped together in a relation. Normalization serves as a tool for

    validating and improving the logical design, so that the logical design avoids

    unnecessary duplication of data, i.e. it eliminates redundancy and promotes

    integrity. In the normalization process we analyze and decompose the

    complex relations into smaller, simpler and well-structured relations.

    Objectives

    To know about

    o Information Design Guide Lines for Relational DB:

    o Normal Forms Based on Primary Keys:

    o Second Normal Form (2NF)

    o Third Normal Form (3NF )

    o Boyce Codd Normal Form (BCNF)

    o Fourth Normal Form (4NF)

    o Normalization using Join Dependencies

    Self Assessment Question(s) (SAQs) (For Section 9.1)

    1. Define Normalization. Why do you need it?

    9.2 Information Design Guide Lines for relational DB

    Some criteria for good and bad relation schemas are:

    o Semantics of attributes
    o Reducing the redundant values in tuples
    o Reducing the null values in tuples
    o Disallowing spurious tuples

    Semantics of the Attributes:

    Whenever we group attributes to form a relation, we assume that a certain

    meaning is associated with the attributes. This meaning is called Semantics,

    and specifies how the attribute values in a tuple relate to one another.

    E.g.: consider company database schema. The various relations considered

    for this database are:

    EMPLOYEE (ENAME, SSN, BDATE, ADDRESS, DNUMBER)
        p.k.: SSN; f.k.: DNUMBER

    DEPARTMENT (DNAME, DNUMBER, DMGRSSN)
        p.k.: DNUMBER; f.k.: DMGRSSN

    Fig. 9.1: Simplified version of the COMPANY relational database schema

    The meaning of the Employee relation is quite simple, each tuple represents

    an employee. The Dnumber attribute is a foreign key that represents an

    implicit relationship between EMPLOYEE and DEPARTMENT relations.

    Guideline-1: design a relation schema so that it is easy to explain its

    meaning. Do not combine attributes from multiple entity types and

    relationship types into a single relation.

    Reducing redundant values in tuples:

    Storage space is one of the most important considerations of a relational

    schema. Improper grouping of attributes has a significant effect on the

    storage space of the relational schema.

    Example:

    Figure A:   Emp.no | Emp.Name | Salary | Address

    Figure B:   Dept_no | Dname | D_location

    In Figure B the information about each department appears only once in the
    department relation. Suppose we merge Figure A and Figure B into a single
    table, Emp_dept.

    Figure C: Emp_dept

    Emp.no | Emp.Name | Salary | Addr | Dept.no | D.Name | D.loc

    Using Figure C causes serious problems: insertion anomalies,
    deletion anomalies and modification anomalies.

    Whenever tuples are inserted into Emp_dept, department data is repeated. If
    there are n employees in department 10, the Dept.no, D.Name and D.loc values
    of that department are stored n times, which leads to data redundancy.

    Insertion Anomalies:

    It is difficult to insert a new department that has no employees as yet into
    the Emp_dept relation. This causes a problem because Emp.no is the primary
    key of Emp_dept and therefore cannot be left empty. This problem does not
    occur in the design of Figure B, because a department is entered in the
    DEPARTMENT relation whether or not any employee works for it.

    Deletion Anomalies:

    If we delete the last employee of a department from the Emp_dept relation,
    then all the information about that department is lost. This problem
    does not occur in the database of Figure B, because DEPARTMENT tuples are
    stored separately.

    Modification Anomalies:

    In Emp_dept, if we change the value of one of the attributes of a particular
    department, say the location of department 5, we must update the tuples of
    all employees who work in that department; otherwise the database becomes
    inconsistent.

    Guide-line 2:

    Design DB so that no insertion, deletion or modification anomalies are

    present in that relation. If there are any anomalies, note them clearly, so that

    proper actions can be taken.
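    The modification and deletion anomalies described above can be sketched in
    a few lines of Python. This is only an illustration: the rows, employee
    names and locations are made up, and only the column names follow the
    Emp_dept table.

```python
# Hypothetical rows of Emp_dept; Salary and Address are omitted for brevity.
# Columns: (emp_no, emp_name, dept_no, d_name, d_loc)
emp_dept = [
    (1, "Asha", 10, "Sales", "Delhi"),
    (2, "Ravi", 10, "Sales", "Delhi"),
    (3, "Mina", 20, "R&D", "Pune"),
]


def departments(rows):
    """Collect the distinct (dept_no, d_name, d_loc) facts stored in rows."""
    return {(r[2], r[3], r[4]) for r in rows}


# Modification anomaly: the location of department 10 is changed in only
# one of its two tuples, so the table now holds two conflicting locations.
partially_updated = [(1, "Asha", 10, "Sales", "Mumbai")] + emp_dept[1:]
conflicting = {d for d in departments(partially_updated) if d[0] == 10}

# Deletion anomaly: deleting the last employee of department 20 also
# deletes everything the table knew about that department.
after_delete = [r for r in emp_dept if r[0] != 3]
lost = departments(emp_dept) - departments(after_delete)

print(len(conflicting))  # 2 different descriptions of department 10
print(lost)
```

    With a separate department table, each department would be described by
    exactly one tuple, so neither situation could arise.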

    NULL values in tuples:

    If a relation includes attributes that do not apply to many of its tuples,
    those tuples hold NULL values for them. This can waste space at the storage
    level, and it also makes it harder to understand the meaning of the
    attributes and to specify join operations. NULLs may also lead to counting
    problems when aggregate functions are used.

    Guideline 3:

    As far as possible avoid using NULL values for attributes in a relation.

    Disallowing spurious tuples:

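    Relation schemas should be designed so that they can be joined with
    equality conditions without producing tuples that were not in the original
    data.

    Figure A: Emp_loc

    Emp_Name | P_loc

    Figure B: Emp_project

    SSN | PNO | P_Name | P_Loc

    If we attempt a natural join operation on Figure A and Figure B, the result
    contains many more tuples than the valid combinations of tuples, because
    the two relations share only the project location, which is neither a
    primary key nor a foreign key. The additional tuples are called spurious
    tuples, because they represent wrong information.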

    Guideline 4:

    Design relation schemas so that they can be joined with equality conditions

    on attributes that are either primary key or foreign key. It guarantees that no

    spurious tuples are generated.
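    The effect can be reproduced with a small sketch in Python. The two rows
    below are made-up sample data; Emp_loc and Emp_project are obtained by
    projecting them, and the natural join is then taken on the shared location
    attribute.

```python
# Hypothetical original facts: (ssn, emp_name, pno, p_name, p_loc)
original = {
    ("111", "Asha", 1, "ProductX", "Delhi"),
    ("222", "Ravi", 2, "ProductY", "Delhi"),
}

# Projections corresponding to Emp_loc and Emp_project
emp_loc = {(name, loc) for (_, name, _, _, loc) in original}
emp_project = {(ssn, pno, pname, loc) for (ssn, _, pno, pname, loc) in original}


def natural_join_on_loc(left, right):
    """Natural join of Emp_loc and Emp_project on the location attribute."""
    return {
        (ssn, name, pno, pname, loc)
        for (name, loc) in left
        for (ssn, pno, pname, loc2) in right
        if loc == loc2
    }


joined = natural_join_on_loc(emp_loc, emp_project)
spurious = joined - original  # tuples that were never in the original data

print(len(joined), len(spurious))
```

    Because both projects are in the same location, every employee name is
    paired with every project, and half of the joined tuples are spurious.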

    Self Assessment Question(s) (SAQs) (For section 9.2)

    1. List some criteria for good and bad relation schemas

    9.3 Normal forms Based on Primary Keys

    A relation schema R is in first normal form (1NF) if every attribute of R
    takes only single atomic values. Equivalently, the intersection of each row
    and column contains one and only one value. To transform an un-normalized
    table (a table that contains one or more repeating groups) to first normal
    form, we identify and remove the repeating groups within the table.

    E.g. Figure A: Dept

    D.Name   D.No   D.location
    R&D      5      {England, London, Delhi}
    HRD      4      Bangalore

    In this figure each department can have a number of locations. The relation
    is not in first normal form because D.location is not an atomic attribute:
    the domain of D.location contains multiple values.

    One technique to achieve first normal form is to remove the attribute
    D.location, which violates first normal form, and place it in a separate
    relation Dept_location together with the department number.

    Ex: Dept

    Dept.no   D.Name
    5         R&D
    4         HRD

    Dept_location

    Dept_location   Dept_No
    England         5
    London          5
    Delhi           5
    Bangalore       4
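    The same transformation can be sketched in Python. The dictionary keys are
    illustrative names chosen for this sketch; the data is that of the Dept
    table, with HRD as department 4.

```python
# Un-normalized Dept rows: d_location is a repeating group (a list of values)
dept_unnormalized = [
    {"d_name": "R&D", "d_no": 5, "d_location": ["England", "London", "Delhi"]},
    {"d_name": "HRD", "d_no": 4, "d_location": ["Bangalore"]},
]


def to_first_normal_form(rows):
    """Split the repeating group into its own relation keyed by d_no."""
    dept = [(r["d_no"], r["d_name"]) for r in rows]
    dept_location = [(r["d_no"], loc) for r in rows for loc in r["d_location"]]
    return dept, dept_location


dept, dept_location = to_first_normal_form(dept_unnormalized)
print(dept)
print(dept_location)
```

    Every value in both results is atomic, and the department number links
    each location back to its department.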

    9.3.1 Second Normal Form (2NF)

    Second normal form is based on the concept of full functional
    dependency. A relation R is in second normal form if every non-prime
    attribute A in R is fully functionally dependent on the primary key of R.

    Figure 9.2: 2NF and 3NF. (a) Normalizing EMP_PROJ into 2NF relations
    (b) Normalizing EMP_DEPT into 3NF relations

    A Partial functional dependency is a functional dependency in which one or

    more non-key attributes are functionally dependent on part of the primary

    key. It creates a redundancy in that relation, which results in anomalies

    when the table is updated.
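    Partial dependencies can be found mechanically by computing attribute
    closures of proper subsets of the key. The sketch below is only an
    illustration: the attribute names and the three functional dependencies
    assumed for EMP_PROJ are not given in the text and are chosen for the
    example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(result) and not rhs.issubset(result):
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, non_prime, fds):
    """Map each non-prime attribute to the proper key subsets that determine it."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in closure(subset, fds) & non_prime:
                found.setdefault(attr, []).append(set(subset))
    return found


# Assumed schema: EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
key = {"SSN", "PNUMBER"}
non_prime = {"HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

partial = partial_dependencies(key, non_prime, fds)
print(sorted(partial))
```

    Under these assumptions ENAME, PNAME and PLOCATION depend on only part of
    the key, so the relation is not in 2NF, while HOURS needs the whole key.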

    9.3.2 Third Normal Form (3NF)

    This is based on the concept of transitive dependency. We should design
    relation schemas in such a way that there are no transitive
    dependencies, because they lead to update anomalies. A functional
    dependency (FD) X -> Y in a relation schema R is a transitive dependency if
    there is a set of attributes Z, which is neither a candidate key nor a
    subset of any key of R, such that both X -> Z and Z -> Y hold. The
    dependency SSN -> Dmgr is transitive through Dnum in the Emp_dept relation
    because SSN -> Dnum and Dnum -> Dmgr hold, and Dnum is neither a key nor a
    subset (part) of the key.

    According to Codd's definition, a relation schema R is in 3NF if it
    satisfies 2NF and no non-prime attribute is transitively dependent on the
    primary key.

    The Emp_dept relation is not in 3NF. We can normalize it by
    decomposing it into E1 and E2.

    Note: Transitive is a mathematical relation that states that if a relation is true

    between the first value and the second value, and between the second

    value and the 3rd value, then it is true between the 1st and the 3rd value.

    Example 2:

    Consider a relation schema LOTS which describes parcels of land for sale
    in various countries of a state. Suppose there are two candidate keys:
    Property_ID and {Country_name, Lot#}; that is, lot numbers are unique only
    within each country, but Property_ID numbers are unique across countries
    for the entire state.

    Based on the two candidate keys Property_ID and {Country_name, Lot#}, we
    know that the functional dependencies FD1 and FD2 hold: each candidate key
    determines all the other attributes of LOTS. Suppose the following two
    additional functional dependencies hold in LOTS:

    FD3: Country_name -> Tax_rate

    FD4: Area -> Price

    Here, FD3 says that the tax rate is fixed for a given country
    (Country_name -> Tax_rate), and FD4 says that the price of a lot is
    determined by its area (Area -> Price). The LOTS relation schema violates
    2NF, because Tax_rate is partially dependent on the candidate key
    {Country_name, Lot#}. We therefore decompose LOTS into two relations,
    LOTS1 and LOTS2.

    LOTS1 violates 3NF, because Price is transitively dependent on the
    candidate keys of LOTS1 via the attribute Area. Hence we decompose LOTS1
    into LOTS1A and LOTS1B.

    A relation schema R is in 3NF when every non-prime attribute of R satisfies
    both conditions below.

    1. It is fully functionally dependent on every key of R.

    2. It is non-transitively dependent on every key of R.
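    Both violations in the LOTS example can be checked with attribute closures.
    The sketch below is a rough illustration: the attribute list of LOTS is
    collected from the names used in the example, FD1 and FD2 are written as
    "candidate key determines every attribute", and LOTS1 is assumed to be
    LOTS without Tax_rate.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(result) and not rhs.issubset(result):
                result |= rhs
                changed = True
    return result


LOTS = {"Property_ID", "Country_name", "Lot#", "Area", "Price", "Tax_rate"}
fds = [
    ({"Property_ID"}, LOTS),            # FD1 (assumed form)
    ({"Country_name", "Lot#"}, LOTS),   # FD2 (assumed form)
    ({"Country_name"}, {"Tax_rate"}),   # FD3
    ({"Area"}, {"Price"}),              # FD4
]

# 2NF: Tax_rate is determined by Country_name alone, which is only part of
# the candidate key {Country_name, Lot#}.
violates_2nf = "Tax_rate" in closure({"Country_name"}, fds)

# 3NF in LOTS1: Area determines the non-prime attribute Price,
# although Area is not a superkey of LOTS1.
LOTS1 = LOTS - {"Tax_rate"}
area_closure = closure({"Area"}, fds) & LOTS1
violates_3nf = "Price" in area_closure and area_closure != LOTS1

print(violates_2nf, violates_3nf)
```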

    Self Assessment Question(s) (SAQs) (For section 9.3)

    1. Define and explain 1 NF.

    2. Explain 2-NF.

    3. Discuss 3-NF.

    9.4 Boyce Codd Normal Form (BCNF)

    Database relations are designed so that they have neither partial
    dependencies nor transitive dependencies, because these types of
    dependencies result in update anomalies. A functional dependency
    describes the relationship between attributes in a relation. For example,
    suppose A and B are attributes in relation R. B is functionally dependent
    on A (A -> B) if each value of A is associated with exactly one value of B.

    The left-hand side and the right-hand side of a functional dependency are
    sometimes called the determinant and the dependent respectively.

    A relation is in BCNF if and only if every determinant is a candidate key.

    The difference between the third normal form and BCNF is that, for a
    functional dependency A -> B, the third normal form allows this dependency
    in a relation if B is a primary-key attribute and A is not a candidate key,
    whereas in BCNF A must be a candidate key. Therefore BCNF is a
    stronger form of the third normal form.
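    The definition of functional dependency given above (each value of A is
    associated with exactly one value of B) can be tested against the rows of
    a relation. The following is a minimal sketch with made-up rows; note that
    a set of rows can only refute a dependency, since an FD is a statement
    about all valid states of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """True if each lhs value combination maps to exactly one rhs combination."""
    seen = {}
    for row in rows:
        a = tuple(row[c] for c in lhs)
        b = tuple(row[c] for c in rhs)
        # setdefault stores b the first time a is seen and returns the stored
        # value afterwards, so a mismatch means two different B values for one A
        if seen.setdefault(a, b) != b:
            return False
    return True


# Hypothetical relation instance with attributes A and B
rows = [
    {"A": 1, "B": "x"},
    {"A": 2, "B": "x"},
    {"A": 1, "B": "x"},
]

a_determines_b = fd_holds(rows, ["A"], ["B"])
b_determines_a = fd_holds(rows, ["B"], ["A"])
print(a_determines_b, b_determines_a)
```

    Here A -> B is consistent with the rows, while B -> A is refuted because
    the value "x" of B appears with two different values of A.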

    PRODUCT (prd#, prdname, price)

    prd# -> prdname, price

    CUSTOMER (cust#, custname, custaddr)

    cust# -> custname, custaddr

    ORDER (ord#, cust#, prd#, qty, amt)

    ord# -> qty, amt

    The PRODUCT schema is in BCNF, since prd# is a candidate key.
    Similarly, the CUSTOMER schema is also in BCNF.

    The schema ORDER, however, is not in BCNF, because ord# is not a super
    key for ORDER; that is, we could have a pair of tuples with the same
    ord#.

    For e.g.

    (1234,145,13,789)

    (1234,123,53,455)

    Here ord# is not a candidate key. However, the FD ord# -> amt is not
    trivial; therefore ORDER does not satisfy the definition of BCNF. It
    suffers from the problem of repetition of information. This redundancy can
    be eliminated by decomposing ORDER into ORDER1 and ORDER2.

    ORDER1(ord#,cust#)

    ORDER2(prd#,qty,amt)
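    The BCNF test "every determinant is a candidate key" can be sketched with
    attribute closures: a non-trivial FD violates BCNF when the closure of its
    left-hand side does not cover the whole relation. In this sketch the ORDER
    attribute list (ord#, cust#, prd#, qty, amt) is an assumption, and only
    the FDs listed for PRODUCT and ORDER are used.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(result) and not rhs.issubset(result):
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs.issubset(lhs) and not attributes.issubset(closure(lhs, fds))
    ]


PRODUCT = {"prd#", "prdname", "price"}
product_fds = [({"prd#"}, {"prdname", "price"})]

ORDER = {"ord#", "cust#", "prd#", "qty", "amt"}  # assumed attribute list
order_fds = [({"ord#"}, {"qty", "amt"})]

print(bcnf_violations(PRODUCT, product_fds))
print(bcnf_violations(ORDER, order_fds))
```

    PRODUCT yields no violations, whereas for ORDER the determinant ord# does
    not determine cust# or prd#, so it is not a superkey.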

    Example 2:

    Consider, for example, the LOTS relation. It has four functional
    dependencies, FD1 to FD4. Suppose we have thousands of lots in the relation,
    but the lots are from only two countries, A and B. Suppose lot sizes in
    country A are restricted to 0.5, 0.6, ..., 1.0 acres, whereas lot sizes in
    country B are restricted to 1.1, 1.2, ..., 2.0 acres. In such a situation we
    would have the additional functional dependency FD5: Area -> Country_name.
    FD5 can be represented by 16 tuples in a separate relation
    R(Area, Country_name), since there are only 16 possible area values. This
    representation reduces the redundancy of repeating the same information in
    thousands of LOTS1A tuples.

    Figure 9.3: Boyce-Codd normal form. (a) BCNF normalization of LOTS1A with
    the functional dependency FD2 being lost in the decomposition
    (b) A schematic relation with FDs; it is in 3NF but not in BCNF

    Self Assessment Question(s) (SAQs) (For Section 9.4)

    1. Explain the concept of BCNF.

    9.5 Fourth Normal Form (4NF)

    Multi valued dependencies are based on the concept of first normal form,

    which prohibits attributes having a set of values. If we have two or more

    multi valued independent attributes in the same relation, we get into a

    situation where we have to repeat every value of one of the attributes, with

    every value of the other attributes to keep the relation state consistent, and

    to maintain independence among the attributes involved. This constraint is

    specified by a Multi valued dependency.

    Consider a table EMPLOYEE that has the attributes name, project and hobby.
    An employee can work on more than one project and can have more than one
    hobby. The employee's projects and hobbies are independent of one another,
    and a given project or hobby can be associated with any number of
    employees.

    To keep the relation state consistent we must have separate tuples to
    represent every combination of an employee's projects and that employee's
    hobbies.

    The drawback of the EMPLOYEE relation is redundant data. This redundant
    data leads to update anomalies. For example, if we wish to add one more
    project, Sybase, handled by employee B, then we must add two
    more tuples, one for each hobby of B. The values Reading and Movies of
    hobby are repeated with each value of project. This redundancy is
    undesirable. One way to remove the redundancy is to decompose the EMPLOYEE
    relation into two relations, PROJECT and HOBBY.

    Now, if we wish to insert Sybase into the PROJECT relation, only
    one entry is required.

    Definition (MVD): A relation R(X, Y, Z) is said to have the multivalued
    dependency X ->> Y if the set of Y values for a given (X, Z) pair depends
    only on X and not on Z. We then say "X multi-determines Y" or
    "Y is multi-dependent on X". Such a dependency is called a
    multivalued dependency (MVD) and is represented by a double arrow (->>).

    We can also define MVD as, for each value of X there is a set of values for

    Y, and a set of values for Z. However, the set of values for Y and Z are

    independent of each other.

    So wherever two independent one_to_many relationships (A:B and A:C) are

    mixed on the same relation, a multivalued dependency arises. Multivalued

    dependency can be avoided using the fourth normal form.

    EMPLOYEE

    NAME   PROJECT     HOBBY
    A      Microsoft   Cricket
    A      Oracle      Music
    A      Microsoft   Music
    A      Oracle      Cricket
    B      Intel       Movies
    B      Sybase      Reading
    B      Intel       Reading
    B      Sybase      Movies

    Decomposed relations to reduce redundancy

    PROJECT

    NAME   PROJECT
    A      Microsoft
    A      Oracle
    B      Intel
    B      Sybase

    HOBBY

    NAME   HOBBY
    A      Cricket
    A      Music
    B      Movies
    B      Reading
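    The decomposition can be checked with a short Python sketch that uses the
    EMPLOYEE tuples listed above. Projecting onto (NAME, PROJECT) and
    (NAME, HOBBY) and joining the projections again on NAME gives back exactly
    the original tuples, which is what the multivalued dependency
    NAME ->> PROJECT requires. The variable names are chosen for this sketch.

```python
# EMPLOYEE tuples: (name, project, hobby)
employee = {
    ("A", "Microsoft", "Cricket"),
    ("A", "Oracle", "Music"),
    ("A", "Microsoft", "Music"),
    ("A", "Oracle", "Cricket"),
    ("B", "Intel", "Movies"),
    ("B", "Sybase", "Reading"),
    ("B", "Intel", "Reading"),
    ("B", "Sybase", "Movies"),
}

# Decompose into PROJECT(name, project) and HOBBY(name, hobby)
project = {(n, p) for (n, p, _) in employee}
hobby = {(n, h) for (n, _, h) in employee}

# Join the two projections back together on name
rejoined = {(n, p, h) for (n, p) in project for (n2, h) in hobby if n == n2}
lossless = rejoined == employee

# Cost of recording one more project for employee B:
# one tuple per hobby of B in EMPLOYEE, but a single tuple in PROJECT.
tuples_needed_in_employee = len({h for (n, h) in hobby if n == "B"})

print(lossless, len(project), len(hobby), tuples_needed_in_employee)
```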

    Fourth Normal Form (4NF): The definition of 4NF is violated when a relation
    has undesirable multivalued dependencies; such relations are identified
    and decomposed into 4NF relations.

    Alternate definition: A relation R is said to be in 4NF if for every MVD
    A ->> B that holds over R, one of the following is true:

    o B is a subset of A (the MVD is trivial), or
    o AB = R, that is, A and B together contain all attributes of R, or
    o A is a super key.

    The EMPLOYEE relation is not in 4NF because of the non-trivial MVDs
    NAME ->> PROJECT and NAME ->> HOBBY (the project and hobby attributes of
    the EMPLOYEE relation are independent of each other), and NAME is not a
    super key of EMPLOYEE. To bring this relation into 4NF, decompose EMPLOYEE
    into PROJECT and HOBBY.

    Self Assessment Question(s) (SAQs) (For section 9.5)

    1. Explain the concept of multivalued dependencies.

    9.6 Normalization using join dependencies

    Join dependency: 5NF is also called "Project-Join Normal Form". It is
    important to note that normalization into 5NF is considered very rarely in
    practice.

    Definition: a relation R is in 5NF if, for all join dependencies, at least
    one of the following holds:

    o (R1, R2, ..., Rn) is a trivial join dependency.
    o Every Ri is a candidate key for R.

    For an example of a JD, the relation shown in the figure states that CSE

    department offers subjects like Data structure and RDBMS, which are taken

    by Leela. Similarly, the other departments offer different subjects.

    However, no student takes all the subjects and no subject has all students

    enrolled in it, and therefore all three fields are needed to represent the

    information.

    DST

    Dept    Subject              Student
    CSE     Data structures      Leela
    Mech    Thermodynamics       Arjun
    CSE     RDBMS                Leela
    Maths   Discrete Structure   Parvathy

    The above relation does not have a non-trivial MVD, because Subject and
    Student are not independent. To bring this relation into 5NF we decompose
    it as:

    DJ (Dept, Subject)

    DS (Dept, Student)

    SS (Subject, Student)

    The three relations shown above satisfy the rules of 5NF, and their join is
    lossless. One of the major differences between 4NF and 5NF is that in a
    given relation R(X, Y, Z), if the attributes Y and Z are independent of
    each other, the relation violates 4NF, and if they depend on each other,
    it is a case for 5NF. 4NF decomposition generally gives two relations,
    whereas 5NF decomposition gives three relations, to keep all the
    information of the original relation.
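    That the three projections keep all the information of DST can be checked
    with a small Python sketch. It uses the four DST tuples from the example;
    the variable names dj, ds and ss simply mirror the relation names and are
    otherwise arbitrary.

```python
# DST tuples: (dept, subject, student)
dst = {
    ("CSE", "Data structures", "Leela"),
    ("Mech", "Thermodynamics", "Arjun"),
    ("CSE", "RDBMS", "Leela"),
    ("Maths", "Discrete Structure", "Parvathy"),
}

# The three binary projections
dj = {(d, s) for (d, s, _) in dst}      # (Dept, Subject)
ds = {(d, st) for (d, _, st) in dst}    # (Dept, Student)
ss = {(s, st) for (_, s, st) in dst}    # (Subject, Student)

# Three-way join: a tuple is produced only if all three projections agree
rejoined = {
    (d, s, st)
    for (d, s) in dj
    for (d2, st) in ds
    if d == d2 and (s, st) in ss
}

print(rejoined == dst)
```

    For this instance the join of the three projections reproduces DST
    exactly, so no tuple is lost and no spurious tuple is created.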

    Self Assessment Question(s) (SAQs) (For section 9.6)

    1. What do you mean by join dependencies? Explain 5-NF

    9.7 Summary

    We have learnt in this unit concepts like

    o Information Design Guide Lines for relational DB:

    o Normal forms Based on Primary Keys:

    o Second Normal Form (2NF)

    o Third Normal Form (3NF )

    o Boyce Codd Normal Form (BCNF)

    o Fourth Normal Form (4NF)

    o Normalization using Join Dependencies

    9.8 Terminal Questions (TQs)

    1. Discuss the criteria for bad relational schemas.

    2. Discuss the attribute semantics as an information measure of goodness

    of a relation schema.

    3. Discuss the first, second & third normal forms.

    4. Discuss the concept of multi-valued dependency.

    9.9 Multiple Choice Questions (MCQs)

    1. --------- Eliminates redundancy and promotes integrity

    A) Normalization

    B) Integration

    C) Consistency

    D) None of the above

    2. A relation schema R is in --------- if every attribute of R takes only
    single atomic values.

    a) First Normal form

    b) Second Normal form

    c) Third Normal form

    d) None of the above

    3. --------- is a functional dependency in which one or more non-key
    attributes are functionally dependent on part of the primary key.

    a) A full functional dependency

    b) A Partial functional dependency

    c) Functional dependency

    d) None of the above

    4. A relation R is in --------- if for all join dependencies at least one
    of the following holds:

    o (R1, R2, ..., Rn) is a trivial join dependency.
    o Every Ri is a candidate key for R.

    a) First Normal form

    b) Second Normal form

    c) Fifth Normal form

    d) None of the above

    9.10 Answers to SAQs, TQs, and MCQs

    9.10.1 Answers to Self Assessment Questions (SAQs)

    For Section 9.1

    1. Normalization is the process of building database structures to store

    data, because any application ultimately depends on its data structures.

    (Refer section 9.1)

    For Section 9.2

    1. The criteria are:

    o Semantics of attributes
    o Reducing the redundant values in tuples
    o Reducing the null values in tuples
    o Disallowing spurious tuples

    (Refer section 9.2)

    For Section 9.3

    1. A relation schema R is in first normal form if every attribute of R takes

    only single atomic values. (Refer section 9.3)

    2. A second normal form is based on the concept of full functional

    dependency. A relation is in second normal form if every non-prime

    attribute A in R is fully functionally dependent on the Primary Key of R.

    (Refer section 9.3.1)

    3. This is based on the concept of transitive dependency. We should

    design relational schema in such a way that there should not be any

    transitive dependencies because they lead to update anomalies.

    (Refer section 9.3.2)

    For Section 9.4

    1. Database relations are designed so that they have neither partial
    dependencies nor transitive dependencies, because these types of
    dependencies result in update anomalies. A functional dependency
    describes the relationship between attributes in a relation. For e.g. A
    and B are attributes in relation R. B is functionally dependent on A
    (A -> B) if each value of A is associated with exactly one value of B.

    The left-hand side and the right-hand side of a functional dependency
    are sometimes called the determinant and the dependent respectively.

    A relation is in BCNF if and only if every determinant is a Candidate key.

    (Refer section 9.4)

    For Section 9.5

    1. Multi valued dependencies are based on the concept of first normal

    form, which prohibits attributes having a set of values.

    (Refer section 9.5)

    For Section 9.6

    1. Join dependency: 5NF is also called "Project-Join Normal Form". It is
    important to note that normalization into 5NF is considered very rarely in
    practice.

    Definition: a relation R is in 5NF if, for all join dependencies, at least
    one of the following holds:

    o (R1, R2, ..., Rn) is a trivial join dependency.
    o Every Ri is a candidate key for R.

    (Refer section 9.6)

    9.10.2 Answers to Terminal Questions (TQs)

    1. Criteria for good and bad relation schemas:

    o Semantics of attributes
    o Reducing the redundant values in tuples
    o Reducing the null values in tuples
    o Disallowing spurious tuples

    (Refer section 9.2)

    2. Whenever we group attributes to form a relation, we assume that a

    certain meaning is associated with the attributes. This meaning is called

    Semantics, and specifies how the attribute values in a tuple relate to one

    another. (Refer section 9.2)

    3. A relation schema R is in first normal form if every attribute of R takes

    only single atomic values. (Refer section 9.3)

    4. Multi valued dependencies are based on the concept of first normal

    form, which prohibits attributes having a set of values.(Refer section 9.5)

    9.10.3 Answers to Multiple Choice Questions (MCQs)

    1. A

    2. A

    3. B

    4. C