db systems - data modeling

1

Data modeling

Semester 1 2014-2015

Dr. Vo Thi Ngoc Chau

([email protected])

Faculty of Computer Science & Engineering

Ho Chi Minh City University of Technology

2

Main references

[1] R. Elmasri, S. R. Navathe, Fundamentals of Database Systems- 4th Edition, Pearson- Addison Wesley, 2003.

[2] H. G. Molina, J. D. Ullman, J. Widom, Database System Implementation, Prentice-Hall, 2000.

[3] H. G. Molina, J. D. Ullman, J. Widom, Database Systems: The Complete Book, Prentice-Hall, 2002

[4] A. Silberschatz, H. F. Korth, S. Sudarshan, Database System Concepts 3rd Edition, McGraw-Hill, 1999.

[5] R. Ramakrishnan, J. Gehrke, Database Management Systems- 2rd Edition, McGraw-Hill.

[6] M. P. Papazoglou, S. Spaccapietra, Z. Tari, Advances in Object-Oriented Data Modeling, MIT Press, 2000.

[7]. G. Simsion, Data Modeling: Theory and Practice, Technics Publications, LLC, 2007.

3

Content

2.1. Concepts

2.2. Why is data modeling important?

2.3. Data modeling with entity-

relationship and relational models

2.4. Dependencies and normalization

2.5. Conclusion

2.1. Concepts

Data modeling

Conceptual data model

Entity relationship model

Representational data model

Relational data model

Database design

Relational database design 4

Data modeling

Modeling - Cambridge dictionary

Model = a representation of something, either

as a physical object which is usually smaller

than the real object, or as a simple description

of the object which might be used in calculations

Modeling = constructing a representation of

something, either as a physical object which is

usually smaller than the real object, or as a

simple description of the object which might be

used in calculations 5

Data modeling

Data modeling [7], pp. 31-32

formalizing and representing the data structures of reality

Shoval and Frumermann, 1994, p28

a representation of the things of significance to an enterprise and the relationships among those things

Hay, 1996a

an attempt to capture the essence of things both concrete and abstract

Keuffel, 1996

6

Data modeling


an abstract representation of the data about entities, events, activities, and their associations within an organization

McFadden, Hoffer et al., 1999

The core idea underlying all the definitions is the same: a data model is used for describing entities and their relationships within a core domain.

Topi and Ramesh, 2002

7

Data modeling


Data modeling is generally viewed as a design activity

Srinivasan and Teeni, 1990

an activity that involves the creation of abstractions

Davydov, 1994

the art and science of arranging the structure and relationship of data

McComb, 2004, p293

data modeling is a design discipline

Simsion and Witt, 2005, p7 8


A data model (aka semantic data model)

Provides concepts close to the way many users perceive data

Provides the concepts essential for supporting the application environment at a very high non-system-specific level

Used for a conceptual schema of a database from data requirements

Example: entity-relationship model

9


Characteristics of a conceptual data model

Expressiveness

Distinctions between data, relationships, constraints

Simplicity

Simple enough for an end user to use and understand an easy diagrammatic notation

Minimality

A small number of basic concepts that are distinct and orthogonal in their meaning

Formality

Concepts must be formally defined state criteria for the validity of a schema in the model

Unique interpretation

Complete and unambiguous semantics for each modeling construct 10

S. Navathe. Evolution of data modeling for databases. Communications of the ACM 35(9)(1992) 112-123.

Representational data model

A data model (aka implementation/logical data model)

Provides concepts understood by end-users but not too far removed from the way data is organized within the computer

Hides some details of data storage but able to be implemented on a computer system in a direct way (in some DBMS)

Example: relational data model

11

Database design

Design the logical and physical structure of one or

more databases to accommodate the information

needs of the users in an organization for a defined

set of applications

The overall database design activity has to undergo

a systematic process called the design methodology,

whether the target database is managed by an

RDBMS, ODBMS, or ORDBMS, ...

The result of the design activity is a rigidly defined

database schema that cannot easily be modified

once the database is implemented. 12

Database design

Goals:

Satisfy the information content requirements of

the specified users and applications

Provide a natural and easy-to-understand

structuring of the information

Support processing requirements and any

performance objectives, such as response time,

processing time, and storage space 13

Database design

14

Figure 2.1 Phases of database

design and implementation for

large databases

[1], pp. 368

Database design

Six main phases of the overall database

design and implementation process

Requirements collection and analysis

Conceptual database design

Choice of a DBMS

Data model mapping (also called logical

database design)

Physical database design

Database system implementation and tuning 15

Database design Phase 1: Requirements Collection and Analysis

Identify the major application areas and user groups that will use the database or whole work will be affected by the database

Study and analyze existing documentation concerning the applications

Study the current operating environment and planned use of the information

Analyze the types of transactions and their frequencies as well as of the flow of information within the system

Study geographic characteristics regarding users, origin of transactions, destination of reports,

Specify the input and output data for the transactions

Collect written responses to sets of questions from the potential database users or user groups

Users priorities and the importance users place on various applications

Requirements from users and applications of the information system that will interact with the database system

Time-consuming but crucial to the success of the information system

Know and analyze the expectations of the users and the intended uses of the database in as much detail as possible 16

Database design

Phase 2: Conceptual Database Design

Phase 2a: Conceptual Schema Design

Examine the data requirements resulting from Phase 1

Produce a conceptual database schema in a DBMS-independent high-level data model

Phase 2b: Transaction Design

Design the characteristics of known database transactions (applications) in a DBMS-independent way to ensure that the database schema will include all the information required by these transactions

Identify transactions input/output and functional behavior

Group transactions into three categories: retrieval transactions, update transactions, mixed transactions 17

Database design

Phase 2: Conceptual Database Design


Goal = a complete understanding of the database structure, meaning (semantics), interrelationships, and constraints

Identify: entity types, relationship types, attributes, key attributes, cardinality and participation constraints on relationships, weak entity types, specialization/generalization hierarchies

Approaches

The centralized (or one-shot) schema design approach

The view integration approach 18

Database design


The centralized (or one-shot) schema design approach

The requirements of the different applications and user groups from Phase 1 are merged into a single set of requirements before schema design begins.

A single schema corresponding to the merged set of requirements is then designed.

The DBA is responsible for deciding how to merge the requirements and for designing the conceptual schema for the whole database

Once the conceptual schema is designed and finalized, external schemas for the various user groups and applications can be specified by the DBA.

19

Database design


The view integration approach

The requirements are not merged.

A schema (or view) is designed for each user group or application based only on its own requirements.

Each user group or application

These schemas are merged or integrated into a global conceptual schema for the entire database.

The DBA

The individual views can be reconstructed as external schemas after view integration.

20

Database design

Phase 3: Choice of a DBMS

Technical factors

Suitability & type of the DBMS for the task

Storage structures and access paths, the user and programmer interfaces, high-level query languages, development tools, architectural options, supported by DBMS

DBMS portability among different types of hardware

Non-technical factors

Cost: software acquisition cost, maintenance cost, hardware acquisition cost, database creation and conversion cost, personnel cost, training cost, operating cost

Availability of vendor services

21

Database design

Phase 4: Data Model Mapping (Logical

Database Design)

Create a conceptual schema and external

schemas in the data model of the selected DBMS

The result of this phase should be DDL

statements in the language of the chosen DBMS

that specify the conceptual and external level

schemas of the database system. 22

Database design

Phase 5: Physical Database Design

Restricted to choosing the most appropriate structures for the database files from among the options offered by that DBMS

Choose specific storage structures and access paths for the database files

Response time

Space utilization

Transaction throughput

Estimate record size and number of records in each database file

Estimate the update and retrieval patterns for the file cumulatively from all the transactions

Estimate the file growth, either in the record size because of new attributes or in the number of records 23

Database design

Phase 6: Database System Implementation and Tuning

Create the database schemas and (empty) database files

Responsibility of the DBA in conjunction with the database designers

Reformat the data for loading into the new database if needed

Load/populate with the data if needed

Implement database transactions referring to the conceptual specifications of transaction, then write and test program code with embedded DML commands

Responsibility of the application programmers

Database tuning continues as long as the database is in existence, as long as performance problems are discovered, and while the requirements keep changing.

24

Relational database design

Two main approaches

A top-down design technique

Designing a conceptual schema in a high-level data model

Mapping the conceptual schema into a relational database schema

Analyzing each of the resulting relations based on the functional dependencies and assigned primary keys

A bottom-up design technique (relational synthesis)

View relational database schema design strictly in terms of functional and other types of dependencies specified on the database attributes

Synthesize the relation schemas applying a normalization algorithm

25

2.2. Why is data modeling important?

Leverage

Make programming simpler and cheaper

Poor data organization can be expensive to fix.

Conciseness

Take more directly to the heart of the business requirements

Data quality

Problems with data quality can be traced back to a lack of consistency in:

Defining and interpreting data

Implementing mechanisms to enforce the definitions

A key role in achieving good data quality by establishing a common understanding of what is to be held in each table and column and how it is to be interpreted

26 G. C. Simsion, G. C. Witt, Data modeling essentials 3rd edition, Elsevier Inc, 2005, pp. 8-10.

2.3. Data modeling with entity-relationship and relational models

Entity-relationship model


Data model mapping

27

Database design

28



large databases

[1], pp. 368

Entity Relationship Model

P. P-S. Chen. The Entity-

Relationship Model Toward

a Unified View of Data. ACM

Transactions on Database

Systems 1(1)(March 1976)

9-36.

Entity Relationship Model P. P-S. Chen. The Entity-Relationship Model Toward a Unified View of Data. ACM Transactions on Database Systems 1(1)(March 1976) 9-36.

The entity-relationship model adopts the more natural view that the real world consists of entities and relationships.

The model can achieve a high degree of data independence and is based on set theory and relation theory.

The entity-relationship model can be used as a basis for a unified view of data.

A special diagrammatric technique, the entity-relationship diagram, is introduced as a tool for database design.

29


30 Fig. 1. Analysis of data models using multiple levels of logical views


The entity-relationship (ER) modeling

concepts

Entity types

Relationship types

Attributes

Keys

Structural constraints

31


32

Symbol Meaning Example

Entity type Employees

Weak entity type Dependents of each employee

Relationship type Employee

works_on Project

Identifying relationship of the weak entity type

Dependents dependents_of

Employees

Employee

Dependent

works_on

dependents_of


33

Symbol Meaning Example

Attribute Name of an employee

Key Attribute Distinct identifier of

an employee

Multivalued Attribute

Phone numbers of an employee

Composite Attribute

Address (Street, District, City) of an

employee

Derived Attribute Age of an employee

(derived from date of birth)

Name

EmployeeID

PhoneNumber

Address

Age

Street District City


34

Constraints

35

Figure 3.2. An ER schema diagram for the COMPANY database [1], pp. 54.

36

Figure 3.18. An ER diagram for a BANK database schema [1], pp. 81.

a. List the entity types

b. Is there a weak entity type? If so, give its name, partial key, and identifying relationship

c. What constraints do the partial key and the identifying relationship of the weak entity type specify in this diagram?

d. List the names of all relationship types, and specify the (min, max) constraint on each participation of an entity type in a relationship type.

Enhanced Entity Relationship Modeling

Class/subclass relationships and type inheritance

The relationship between a superclass and any one of its subclasses

Specialization

Define a set of subclasses of an entity type

Establish additional specific attributes with each subclass

Establish additional specific relationship types between each subclass and other entity types or other subclasses

Generalization

A reverse process of abstraction in which we suppress the differences among several entity types

Identify their common features

Generalize them into a single superclass of which the original entity types are special subclasses

37

38 Figure 4.1. EER diagram notation to represent subclasses and specialization. [1], pp. 87.

total specialization partial specialization

39

Figure 4.3. Generalization [1], pp. 91.

40 Figure 4.5. EER diagram notation for overlapping (nondisjoint) specialization. [1], pp. 93.

- Disjoint (d): the subclasses of the specialization must be disjoint.

- Overlapping (o): the subclasses are not constrained to be disjoint, their sets of entities may overlap.

41

Figure 4.7. A specialization lattice with multiple inheritance for a UNIVERSITY database

[1], pp. 96.

a. List superclasses, subclasses

of each superclass

b. List class/subclass

relationships

c. Describe constraints on each

specialization

Database design

42



large databases

[1], pp. 368

A representational data

model supported by the

chosen DBMS

Relational Data Model

E. F. Codd. A Relational

Model of Data for Large

Shared Data Banks.

Communications of the ACM

13(6)(June, 1970) 377-387.

Data model E. F. Codd. Data models in database management, ACM, 1980.

A combination of three following components

(1). A collection of data structure types (the building

blocks of any database that conforms to the model);

(2). A collection of operators or inferencing rules, which

can be applied to any valid instances of the data types

listed in (1), to retrieve or derive data from any parts

of those structures in any combinations desired;

(3). A collection of general integrity rules, which

implicitly or explicitly define the set of consistent

database states or changes of state or both -- these

rules may sometimes be expressed as insert-update-

delete rules 43

Relational Data Model E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6)(June, 1970) 377-387.

The term relation is used here in its accepted mathematical sense.

A relational database based on the relational data model is a collection of relations.

Given sets S1, S2, , Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on.

Sj is the jth domain of R.

R is said to have degree n.

More concisely, R is a subset of the Cartesian product S1xS2xxSn. 44


45

Fig. 1. A relation of degree 4

A relation of degree 4, called supply, which reflects the shipments-in-progress

of parts from specified suppliers to specified projects in specified quantities.


An n-ary relation R has the following properties:

Each row represents an n-tuple of R.

The ordering of rows is immaterial.

All rows are distinct.

The ordering of columns is significant it corresponds to the ordering S1, S2, , Sn of the domains on which R is defined.

The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

46













delete rules 47


Operations on relations

Usual set operations: union, intersection, minus

Compatible relations

Projection ()

Select certain columns of a relation (striking out the others)

Selection ()

Select a subset of the tuples from a relation that satisfy a selection condition

Cartesian product (cross product) (x)

Combine tuples from two relations in a combinatorial fashion

Join (inner/outer join, equijoin, natural join)

Combine related tuples from two relations into single tuples


50


51

T R : S













delete rules 52


Constraints inherent in the data model

Domain constraints

Key constraints

Constraints on nulls

Entity integrity constraints

Referential integrity constraints

53


Domain constraints

Within each tuple, the value of each attribute A must be an atomic value from the domain dom(A).

Key constraints

1. Two distinct tuples in any state of the relation cannot have identical values for (all) the attributes in the key.

2. It is a minimal superkey that is, a superkey from which we cannot remove any attributes and still have the uniqueness constraint in condition 1 hold.

Superkey of the relation schema R specifies a uniqueness constraint that no two distinct tuples in any state r of R can have the same value for the superkey. 54


Key constraints

A relation schema may have more than one key.

Candidate keys: primary key and secondary keys

Constraints on nulls

Specify whether null values are or are not permitted on attributes

Entity integrity constraint

No primary key value can be null.

Referential integrity constraint

Specify a referential integrity constraint between the two relation schemas R1 and R2 55


Referential integrity constraint

A set of attributes FK (foreign key) in relation

schema R1 is a foreign key of R1 that references

relation R2 if it satisfies the following two rules:

The attributes in FK have the same domain(s) as the

primary key attributes PK of R2; the attributes FK are

said to reference or refer to the relation R2.

A value of FK in a tuple t1 of the current state r1(R1)

either occurs as a value of PK for some tuple t2 in the

current state r2(R2) or is null. In the former case, we

have t1[FK] = t2[PK], and we say that the tuple t1

references or refers to the tuple t2. 56


57

Relation Employee

Relation schema: EMPLOYEE (FNAME, MINIT, LNAME, SSN,

BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO)

Attributes = FNAME, MINIT, LNAME, SSN,

Domain of Attribute SEX = {F, M}

Domain of SALARY = a set of positive integer numbers

Primary key = SSN

Foreign key = SUPERSSN

58

The SQL standard has gone through a number of revisions.

http://en.wikipedia.org/wiki/SQL (Accessed on 09/05/2012)

High-level database language on relational databases: SQL-86, SQL-89, SQL-92, SQL-99,

SQL (structured query language)

Data Definition Language (DDL): Create,

Alter, Drop

Data Manipulation Language (DML): Select,

Insert, Update, Delete

Data Control Language (DCL): Commit,

Rollback, Grant, Revoke

59


60

CREATE TABLE TableName

{(colName dataType [NOT NULL] [UNIQUE]

[DEFAULT defaultOption]

[CHECK searchCondition] [,...]}

[PRIMARY KEY (listOfColumns),]

{[UNIQUE (listOfColumns),] [,]}

{[FOREIGN KEY (listOfFKColumns)

REFERENCES ParentTableName [(listOfCKColumns)],

[ON UPDATE referentialAction]

[ON DELETE referentialAction ]] [,]}

{[CHECK (searchCondition)] [,] })


61

SELECT [DISTINCT | ALL]

{* | [columnExpression [AS newName]] [,...] }

FROM TableName [alias] [, ...]

[WHERE condition]

[GROUP BY columnList]

[HAVING condition]

[ORDER BY columnList]

Database design

62



large databases

[1], pp. 368

Data model mapping

from a conceptual

schema based on the ER

model to a relational

database schema based

on the relational data

model

Data model mapping

63

Table 7.1. Correspondence between ER and Relational Models

[1], pp. 198.

Data model mapping

Mapping of specialization or generalization

Convert each specialization with m subclasses {S1, S2, , Sm} and (generalized) superclass C, where the attributes of C are {k, a1, , an} and k is the (primary) key, into relation schemas using one of the four following options:

Multiple relations Superclass and subclasses

Multiple relations Subclass relations only (total)

Single relation with one type attribute (disjoint)

Single relation with multiple type attributes (overlapping) 64

65

Figure 4.3. Generalization [1], pp. 91.

Using Multiple relations Subclass relations only (total) Figure 7.4, pp. 200.

66

Figure 7.4. Using Single relation with multiple type attributes with Boolean fields MFlag and PFlag. [1], pp. 200.

Data model mapping

Mapping algorithms

Automatically create a relational schema from a conceptual schema design

Notes on data model mapping

Slightly different for other post-relational data models as compared to the relational data model

Constraints on databases

Inherent model-based constraints

Schema-based constraints

Application-based constraints 67

Query

72

Example: For every project located in Stafford, retrieve the project number, the controlling department number and the department managers last name, address and birthdate.

SQL query:

Relational algebra expression:

SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE

FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E

WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND

P.PLOCATION=STAFFORD;

PNUMBER, DNUM, LNAME, ADDRESS, BDATE(((PLOCATION=STAFFORD(PROJECT))

DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE))

Query: For each project on which more than two employees work, retrieve the project number, project name, and the number of employees who work on that project.

73

SELECT PNUMBER, PNAME, COUNT (*) AS PROJ_NUMBER

FROM PROJECT, WORKS_ON

WHERE PNUMBER=PNO

GROUP BY PNUMBER, PNAME

HAVING COUNT (*) > 2

Temp (PNUMBER, PNAME, PROJ_NUMBER) PNUMBER, PNAME COUNT *

(PNUMBER=PNO(PROJECT X WORKS_ON)) Result PROJ_NUMBER > 2 (Temp)

Query: retrieve the last name and first name of each employee

whose salary is greater than the max salary in department 5.

74

DNO = 5

EMPLOYEE

MAX SALARY

SALARY > C

EMPLOYEE

LNAME, FNAME

Data model mapping

Informal measures of quality for relation schema design

Semantics of the attributes

How to interpret the attribute values stored in a tuple of the relation how the attribute values in a tuple relate to one another

Reducing the redundant values in tuples

Storage space & update anomalies

Reducing the null values in tuples

Multiple interpretations of nulls

Disallowing the possibility of generating spurious tuples

Join relations with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

75


Functional dependency (FD)

The single most important concept in relational schema design theory

A constraint between two sets of attributes from the database

A property of the semantics or meaning of the attributes.

Examples

SSN ENAME

The value of an employees social security number (SSN) uniquely determines the employee name (ENAME)

PNUMBER {PNAME, PLOCATION}

{SSN, PNUMBER} HOURS 76


Functional dependency (FD)

A constraint between two sets of attributes from the database

A constraint on the possible tuples that can form a relation state r of a relation schema R = {A1, A2, , An}

Denoted by XY

X, Y: two sets of attributes that are subsets of R

For any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].

The values of Y of a tuple in r depend on (are determined by) the values of X.

The values of X of a tuple uniquely (functionally) determine the values of Y.

If XY in R, this does not say whether or not YX in R. 77


Armstrongs inference rules (IR1-3) and others (IR4-6)

78


Closure F+ of a set of functional dependencies

F specified on relation schema R

F+, the closure of F, is formally the set of all

dependencies that include F as well as all

dependencies that can be inferred from F.

79


Closure F+ of a set of functional dependencies F

F = {SSN{ENAME, BDATE, ADDRESS, DNUMBER},

DNUMBER{DNAME, DMGRSSN}}

F+ = {SSN{ENAME, BDATE, ADDRESS, DNUMBER},

DNUMBER{DNAME, DMGRSSN}, SSNSSN, ,

DNUMBER DNAME, SSN{DNAME, DMGRSSN}} 80


Closure F+ of a set of functional dependencies F

F = {{SSN, PNUMBER}HOURS, SSNENAME,

PNUMBER{PNAME, PLOCATION}}

F+ = ??? 81


Closure X+ of a set of attributes X under F

where X is a set of attributes that appears as a

left-hand side of a functional dependency in F

X+, the closure of X under F, is the set

of attributes that are functionally

determined by X based on F.

82





83





F = {{SSN, PNUMBER}HOURS, SSNENAME,

PNUMBER{PNAME, PLOCATION}}

{SSN, PNUMBER}+ = {SSN, PNUMBER, HOURS, ENAME,

PNAME, PLOCATION}

{SSN}+ = {SSN, ENAME}

{PNUMBER}+ = {PNUMBER, PNAME, PLOCATION} 84





F = {{A, B}{C}, {A}{D, E}, {B}{M}, {M}{G, H},

{D}{I, J}}

{A, B}+ = ???

{A}+ = ???

{B}+ = ???

{M}+ = ???, and {D}+ = ??? 85


Identifying a key K of R based on a set of given functional dependencies F

86


Identifying a key K of R based on a set of

given functional dependencies F

R = (A, B, C, D, E, G)

F = {CE, AEG, EAD}

K = ???

87


Identifying a key K of R based on a set of given functional dependencies F

R = (A, B, C, D, E, G)

F = {CE, AEG, EAD}

K = {A, B, C, D, E, G}

(K-{A})+ = {B, C, D, E, G, A} K = {B, C, D, E, G}

(K-{B})+ = {C, D, E, G, A} K = {B, C, D, E, G}

(K-{C})+ = {B, D, E, G, A} K = {B, C, D, E, G}

(K-{D})+ = {B, C, E, G, A, D} K = {B, C, E, G}

(K-{E})+ = {B, C, G, E, A, D} K = {B, C, G}

(K-{G})+ = {B, C, E, A, D, G} K = {B, C} 88


Identifying a key K of R based on a set of


R = (A, B, C, E, G)

F = {AB, BCE, EGA}

K = ???

89


Identifying all keys K of R based on a set of given functional dependencies F

Find N = U - fFright(f)

N is a set of the attributes not in the right side of any functional dependency in F. U is a set of all the attributes of R.

Find N+, the closure of N under F

If N+ = U then there is only one key K = N.

Otherwise, find D = fFright(f) - fFleft(f)

D is a set of the attributes only in the right side of the functional dependencies in F.

Find L = U (N+D)

L is a set of the attributes neither in the right side nor functionally determined by N.

Let Li be any subset of L. Find (NLi)+ under F.

If (NLi)+ = U then K = NLi.

Ki, Kj such that Ki Kj, remove Kj. 90


Identifying all keys K of R based on a set of given functional dependencies F

R = (A, B, C, E, G)

F = {AB, BCE, EGA}

All keys K ???

91


R = (A, B, C, E, G)

F = {AB, BCE, EGA}

N = U - fFright(f) = {C, G}

N+ = {C, G}

D = fFright(f) - fFleft(f) =

L = U (N+D) = {A, B, E}

Subsets of L: L1 = {A}, L2 = {B}, L3 = {E}, L4 = {A, B}, L5 = {A, E}, L6 = {B, E}, L7 = {A, B, E}

(NL1)+ = {C, G, A, B, E} K1 = {C, G, A}

(NL2)+ = {C, G, B, E, A} K2 = {C, G, B}

(NL3)+ = {C, G, E, A, B} K3 = {C, G, E} 92


Identifying all keys K of R based on a set of


R = (A, B, C, E, G)

F = {ABC, CGE, BG, EA}

All keys K ???

93


Membership of a functional dependency XY

with respect to a set of functional

dependencies F, i.e. XY is inferred from F,

denoted by F XY

A functional dependency XY is inferred from a set of

dependencies F specified on R if XY holds in every

legal relation state r of R; that is, whenever r satisfies

all the dependencies in F, XY also holds in r. 94





denoted by F XY

Check if F XY

Find X+, the closure of X under F

If Y X+ then F XY

95





denoted by F XY

F = {A C, AC D, E AD, E H}

F A CD

A+ under F = {A, C, D} {C, D} F A CD

F E AH ??? 96


A set of functional dependencies F is said to

cover another set of functional dependencies

E if every FD in E is also in F+; that is, if every

dependency in E can be inferred from F.

Alternatively, E is covered by F.

Two sets of functional dependencies E and F

are equivalent if E+ = F+; that is, every

dependency in E can be inferred from F and

every dependency in F can be inferred from E;

that is also, E covers F and F covers E. 97


F = {A C, AC D, E AD, E H}

G = {A CD, E AH}

Are F and G equivalent?

* Check if F covers G.

* Check if G covers F.

98


The minimal cover F of a set of functional

dependencies E is a set of functional

dependencies that satisfies the property that

every dependency in E is in the closure F+ of

F. In addition, this property is lost if any

dependency from the set F is removed; F

must have no redundancies in it, and the

dependencies in F are in a standard form. 99



dependencies E

1. Every dependency in F has a single attribute for its right-

hand side.

2. We cannot replace any dependency X A in F with a

dependency Y A, where Y is a proper subset of X, and still

have a set of dependencies that is equivalent to F.

3. We cannot remove any dependency from F and still have a

set of dependencies that is equivalent to F. 100


The minimal cover F of a set of functional dependencies E

101



dependencies E

E = {A B, ABCD I, IJ G, IJ H, ACDJ GI}

The minimal cover F of E ???

102


Normalization

A process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:

(1) minimizing redundancy

(2) minimizing the insertion, deletion, and update anomalies

Normal form of a relation

The highest normal form condition that the relation meets the degree to which it has been normalized

103


The process of normalization through

decomposition produces the relational

schemas that have:

The lossless join or non-additive join property

No spurious tuple generation problem occurs.

The dependency preservation property

Each FD is represented in some individual relation.

104


105

Table 10.1. Summary of normal forms based on primary keys and corresponding normalization ([1], pp. 321)


Pure relations based on the relational data

model are all in 1NF.

Relations in 2NF have non-key attributes

functionally dependent on the whole

primary key.

Relations in 3NF have non-key attributes

only functionally dependent on the whole

primary key. 106

107

Primary key = {SSN, PNUMBER}

Non-key attributes = HOURS, ENAME, PNAME, PLOCATION

FD1: {SSN, PNUMBER} HOURS

FD2: SSN ENAME

FD3: PNUMBER {PNAME, PLOCATION}

Functionally dependent on a part of the primary key

108

Primary key = SSN

Non-key attributes = ENAME, BDATE, ADDRESS,

DNUMBER, DNAME, DMGRSSN

FD1: SSN {ENAME, BDATE, ADDRESS, DNUMBER}

FD2: DNUMBER {DNAME, DMGRSSN} Non-key attributes are functionally dependent on another non-key attribute.


Relation: LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)

Primary key: PROPERTY_ID#

Secondary key: {COUNTY_NAME, LOT#}

Functional dependencies

FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE}

FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA, PRICE, TAX_RATE}

FD3: COUNTY_NAME TAX_RATE

FD4: AREA PRICE 109

110

Is LOTS really in 1NF?

Normalization is carried out as follows. Determine LOTS1, LOTS2, LOTS1A, and LOTS1B.

Figure 10.11 (d). Summary of the progressive normalization of LOTS.

[1], pp. 322.


Relation: LOTS (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE)




FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA, PRICE, TAX_RATE}

FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA, PRICE, TAX_RATE}

FD3: COUNTY_NAME TAX_RATE

FD4: AREA PRICE 111

2NF


Relation: LOTS1 (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA, PRICE) -- in 2NF




FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA, PRICE}

FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA, PRICE}

FD4: AREA PRICE

Relation: LOTS2 (COUNTY_NAME, TAX_RATE) -- in 3NF

Primary key: COUNTY_NAME

FD: COUNTY_NAME TAX_RATE

112

3NF


Relation: LOTS1A (PROPERTY_ID#, COUNTY_NAME, LOT#, AREA) -- in 3NF




FD1: PROPERTY_ID# {COUNTY_NAME, LOT#, AREA}

FD2: {COUNTY_NAME, LOT#} {PROPERTY_ID#, AREA}

Relation: LOTS1B (AREA, PRICE) -- in 3NF

Primary key: AREA

FD: AREA PRICE

Relation: LOTS2 (COUNTY_NAME, TAX_RATE) -- in 3NF

Primary key: COUNTY_NAME

FD: COUNTY_NAME TAX_RATE 113


Boyce-Codd normal form (BCNF)

A simpler form of 3NF, but stricter than 3NF

Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.

For a relation in 3NF, each non-key attribute must:

Fully functionally dependent on every key

Nontransitively dependent on every key

Not functionally dependent on other nonkey attributes

For a relation in BCNF, all key/non-key attributes must be functionally dependent on superkeys.

XA holds in R => X is a superkey of R.

XA holds in R and X is not a superkey of R => R is not in BCNF.

114


115

Figure 10.12. (a) BCNF normalization of LOTS1A [1], pp. 325. FD5: AREA COUNTY_NAME

AREA is not a superkey of LOTS1A. LOTS1A is in 3NF; but not in BCNF.


116

Candidate key = {STUDENT, COURSE}

FD1: {STUDENT, COURSE} INSTRUCTOR

FD2: INSTRUCTOR COURSE

Normalization???


Multivalued dependencies

A multivalued dependency X Y specified on

relation schema R, where X and Y are both

subsets of R, specifies the following constraint on

any relation state r of R: if two tuple t1 and t2

exist in r such that t1[X] = t2[X], then two tuples

t3 and t4 should also exist in r with the following

properties, where Z is used to denote (R-(XUY)):

t3[X] = t4[X] = t1[X] = t2[X]

t3[Y] = t1[Y] and t4[Y] = t2[Y]

t3[Z] = t2[Z] and t4[Z] = t1[Z] 117


Multivalued dependencies

ENAME PNAME

ENAME DNAME

No association between PNAME and DNAME 118


Multivalued dependencies (MVDs)

A multivalued dependency X Y is a trivial MVD

if:

(a) Y is a subset of X

(b) X U Y = R

A trivial MVD does not specify any significant or

meaningful constraint on R.

A nontrivial MVD satisfies neither (a) nor (b).

119


Fourth normal form

A relation schema R is in 4NF with respect to a

set of dependencies (that includes functional

dependencies and multivalued dependencies) if,

for every nontrivial multivalued dependency X Y

in F+, X is a superkey for R.

120

2.5. Conclusion

Data modeling

Database design process: 6 phases

Conceptual database design

Entity-relationship model

Choice of a DBMS a representational data model


Data model mapping

Functional/multivalued dependencies

Normalization: 1NF, 2NF, 3NF, BCNF, 4NF 121

122

Questions ???

Review - Chapter 12 in [1] 12.1. What are the six phases of database design? Discuss each phase.

12.2. Which of the six phases are considered the main activities of the database design process itself? Why?

12.3. Why is it important to design the schemas and applications in parallel?

12.4. Why is it important to use an implementation-independent data model during conceptual schema design? What models are used in current design tools? Why?

12.5. Discuss the importance of Requirements Collection and Analysis.

12.7. Discuss the characteristics that a data model for conceptual schema design should possess.

12.8. Compare and contrast the two main approaches to conceptual schema design.

12.9. Discuss the strategies for designing a single conceptual schema from its requirements.

12.10. What are the steps of the view integration approach to conceptual schema design? What are the difficulties during each step?

12.13. Discuss the factors that influence the choice of a DBMS package for the information system of an organization.

12.14. What is system-independent data model mapping? How is it different from system-dependent data model mapping?

12.15. What are the important factors that influence physical database design? 123

Review - Chapters 3, 4 in [1] 3.1. Discuss the role of a high-level data model in the database design process.

3.2. List the various cases where use of a null value would be appropriate.

3.3. Define the following terms: entity, attribute, attribute value, relationship instance, composite attribute, multivalued attribute, derived attribute, complex attribute, key attribute, value set (domain).

3.4. What is an entity type? What is an entity set? Explain the differences among an entity, an entity type, and an entity set.

3.6. What is a relationship type? Explain the differences among a relationship instance, a relationship type, and a relationship set.

3.7. What is a participation role? When is it necessary to use role names in the description of relationship types?

3.12. When is the concept of a weak entity used in data modeling? Define the terms owner entity type, weak entity type, identifying relationship type, and partial key.

4.1. What is a subclass? When is a subclass needed in data modeling?

Define the following terms: superclass of a subclass, superclass/subclass relationship, is-a relationship, specialization, generalization, category, specific (local) attributes) specific relationships.

4.6. Discuss the two main types of constraints on specializations and generalizations.

124

Review - Chapter 5 in [1]

5.1. Define the following terms: domain, attribute, n-tuple,

relation schema, relation state, degree of a relation, relational

database schema, relational database state.

5.2. Why are tuples in a relation not ordered?

5.3. Why are duplicate tuples not allowed in a relation?

5.4. What is the difference between a key and a superkey?

5.8. Discuss the entity integrity and referential integrity

constraints. Why is each considered important?

5.9. Define foreign key. What is this concept used for?

125

Review - Chapter 10 in [1] 10.1. Discuss attribute semantics as an informal measure of goodness for a relation schema.

10.2. Discuss insertion, deletion, and modification anomalies. Why are they considered bad? Illustrate with examples.

10.3. Why should nulls in a relation be avoided as far as possible? Discuss the problem of spurious tuples and how we may prevent it.

10.4. State the informal guidelines for relation schema design that we discussed. Illustrate how violation of these guidelines may be harmful.

10.5. What is a functional dependency? What are the possible sources of the information that defines the functional dependencies that hold among the attributes of a relation schema?

10.6. Why can we not infer a functional dependency automatically from a particular relation state?

10.12. What does the term unnormalized relation refer to? How did the normal forms develop historically from first normal form up to Boyce-Codd normal form?

10.13. Define first, second, and third normal forms when only primary keys are considered. How do the general definitions of 2NF and 3NF, which consider all keys of a relation, differ from those that consider only primary keys?

10.14. What undesirable dependencies are avoided when a relation is in 2NF?

10.15. What undesirable dependencies are avoided when a relation is in 3NF?

10.16. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it considered a stronger form of 3NF? 126

db systems - data modeling

Documents