database lecture note

Upload: miliyon-tilahun

Post on 04-Jun-2018

243 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Database Lecture Note

    1/106

    AAU

    2006

    Database

    Instructor Seble

    M.T

    W W W . A A U . E D U . E T

  • 8/13/2019 Database Lecture Note

    2/106

    Fundamentals of Database

  • 8/13/2019 Database Lecture Note

    3/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 1

    CHAPTER ONE

    Database System

    Database systems are designed to manage large data set in an organization. The data

    management involves both definition and the manipulation of the data which ranges from

    simple representation of the data to considerations of structures for the storage of information.

    The data management also consider the provision of mechanisms for the manipulation of

    information.

    Today, Databases are essential to every business. They are used to maintain internal records,

    to present data to customers and clients on the World-Wide-Web, and to support many other

    commercial processes. Databases are likewise found at the core of many modern

    organizations.

    The power of databases comes from a body of knowledge and technology that has developed

    over several decades and is embodied in specialized software called a database management

    system, or DBMS. A DBMS is a powerful tool for creating and managing large amounts of

    data efficiently and allowing it to persist over long periods of time, safely. These systems are

    among the most complex types of software available.

    Thus, for our question: What is a database? In essence a database is nothing more than a

    collection of shared information that exists over a long period of time, often many years. Incommon dialect, the term database refers to a collection of data that is managed by a DBMS.

    Thus the DB course is about:

    How to organize data

    Supporting multiple users

    Efficient and effective data retrieval

    Secured and reliable storage of data

    Maintaining consistent data Making information useful for decision making

    Data management passes through the different levels of development along with the

    development in technology and services. These levels could best be described by categorizing

    the levels into three levels of development. Even though there is an advantage and a problem

    overcome at each new level, all methods of data handling are in use to some extent. The major

    three levels are;

  • 8/13/2019 Database Lecture Note

    4/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 2

    1. Manual Approach

    2. Traditional File Based Approach

    3. Database Approach

    1. Manual ApproachIn the manual approach, data storage and retrieval follows the primitive and traditional way

    of information handling where cards and paper are used for the purpose.

    Files for as many event and objects as the organization has are used to store

    information.

    Each of the files containing various kinds of information is labeled and stored in one or

    more cabinets.

    The cabinets could be kept in safe places for security purpose based on the sensitivity

    of the information contained in it.

    Insertion and retrieval is done by searching first for the right cabinet then for the right

    the file then the information.

    One could have an indexing system to facilitate access to the data

    Limitations of the Manual approach

    Prone to error Difficult to update, retrieve, integrate

    You have the data but it is difficult to compile the information

    Limited to small size information

    Cross referencing is difficult

    An alternative approach of data handling is a computerized way of dealing with the

    information. The computerized approach could also be either decentralized or centralizedbase on where the data resides in the system.

    2. Traditional File Based Approach

    After the introduction of Computer for data processing to the business community, the need

    to use the device for data storage and processing increase. There were, and still are, several

  • 8/13/2019 Database Lecture Note

    5/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 3

    computer applications with file based processing used for the purpose of data handling. Even

    though the approach evolved over time, the basic structure is still similar if not identical.

    File based systems were an early attempt to computerize the manual filing system.

    This approach is the decentralized computerized data handling method.

    A collection of application programs perform services for the end-users. In such

    systems, every application program that provides service to end users, define and

    manage its own data.

    Such systems have number of programs for each of the different applications in the

    organization.

    Since every application defines and manages its own data, the system is subjected to

    serious data duplication problem.

    File, in traditional file based approach, is a collection of records which contains

    logically related data.

  • 8/13/2019 Database Lecture Note

    6/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 4

    Limitations of the Traditional File Based approach

    As business application become more complex demanding more flexible and reliable data

    handling methods, the shortcomings of the file based system became evident. These

    shortcomings include, but not limited to:

    Separation or Isolation of Data: Available information in one application may not be

    known.

    Limited data sharing

    Lengthy development and maintenance time

    Duplication or redundancy of data

    Data dependency on the application

    Incompatible file formats between different applications and programs creating

    inconsistency.

    Fixed query processing which is defined during application development

    The limitations for the traditional file based data handling approach arise from two basic

    reasons.

    1. Definition of the data is embedded in the application program which makes it

    difficult to modify the database definition easily.

    2. No control over the access and manipulation of the data beyond that imposed by

    the application programs.

    The most significant problem experienced by the traditional file based approach of data

    handling is the update anomalies. We have three types of update anomalies;

    1. Modification Anomalies: a problem experienced when one ore more data value is

    modified on one application program but not on others containing the same data set.

    2. Deletion Anomalies: a problem encountered where one record set is deleted from one

    application but remain untouched in other application programs.

    3. Insertion Anomalies: a problem encountered where one cannot decide whether the data

    to be inserted is valid and consistent with other similar data set.

  • 8/13/2019 Database Lecture Note

    7/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 5

    3. Database Approach

    Following a famous paper written by Ted Codd in 1970, database systems changed

    significantly. Codd proposed that database systems should present the user with a view of

    data organized as tables called relations. Behind the scenes, there might be a complex data

    structure that allowed rapid response to a variety of queries. But, unlike the user of earlier

    database systems, the user of a relational system would not be concerned with the storage

    structure. Queries could be expressed in a very high-level language, which greatly increased

    the efficiency of database programmers. The database approach emphasizes the integration

    and sharing of data throughout the organization.

    Thus in Database Approach:

    Database is just a computerized record keeping system or a kind of electronic filing

    cabinet.

    Database is a repository for collection of computerized data files.

    Database is a shared collection of logically related data designed to meet the information

    needs of an organization. Since it is a shared corporate resource, the database is

    integrated with minimum amount of or no duplication.

    Database is a collection of logically related data where these logically related data

    comprises entities, attributes, relationships, and business rules of an organization's

    information.

    In addition to containing data required by an organization, database also contains a

    description of the data which called as Metadata or Data Dictionary or Systems

    Catalogue or Data about Data.

    Since a database contains information about the data (metadata), it is called a self

    descriptive collection on integrated records.

    The purpose of a database is to store information and to allow users to retrieve and

    update that information on demand.

    Database is deigned once and used simultaneously by many users.

    Unlike the traditional file based approach in database approach there is program data

    independence. That is the separation of the data definition from the application. Thus the

    application is not affected by changes made in the data structure and file organization.

    Each database application will perform the combination of: Creating database, Reading,

    Updating and Deleting data.

  • 8/13/2019 Database Lecture Note

    8/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 6

    Benefits of the database approach

    Data can be shared: two or more users can access and use same data instead of storing

    data in redundant manner for each user.

    Improved accessibility of data: by using structured query languages, the users can easily

    access data without programming experience.

    Redundancy can be reduced: isolated data is integrated in database to decrease the

    redundant data stored at different applications.

    Quality data can be maintained: the different integrity constraints in the database

    approach will maintain the quality leading to better decision making

    Inconsistency can be avoided: controlled data redundancy will avoid inconsistency of the

    data in the database to some extent.

    Transaction support can be provided: basic demands of any transaction support systems

    are implanted in a full scale DBMS.

    Integrity can be maintained: data at different applications will be integrated together with

    additional constraints to facilitate shared data resource.

    Security majors can be enforced: the shared data can be secured by having different levels

    of clearance and other data security mechanisms.

    Improved decision support: the database will provide information useful for decision

    making.

    Standards can be enforced: the different ways of using and dealing with data by different

    unite of an organization can be balanced and standardized by using database

    approach.

    Compactness: since it is an electronic data handling method, the data is stored compactly

    (no voluminous papers).

    Speed: data storage and retrieval is fast as it will be using the modern fast computer

    systems.

    Less labour: unlike the other data handling methods, data maintenance will not demand

    much resource.

    Centralized information control: since relevant data in the organization will be stored at

    one repository, it can be controlled and managed at the central level.

  • 8/13/2019 Database Lecture Note

    9/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 7

    Limitations and risk of Database Approach

    Introduction of new professional and specialized personnel.

    Complexity in designing and managing data

    Te cost and risk during conversion from the old to the new system

    High cost incurred to develop and maintain

    Complex backup and recover services from the users perspective

    Reduced performance due to centralization

    High impact on the system when failure occur

  • 8/13/2019 Database Lecture Note

    10/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 8

    Database Management System (DBMS)

    Database Management System (DBMS) is a Software package used for providing EFFICIENT,

    CONVENIENT and SAFE MULTI-USER (many people/programs accessing same database, or even same

    data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT (data outlivesprograms that operate on it) data. A DBMS also provides a systematic method for creating,

    updating, storing, retrieving data in a database. DBMS also provides the service of controlling

    data access, enforcing data integrity, managing concurrency control, and recovery. Having

    this in mind, a full scale DBMS should at least have the following services to provide

    to the user.

    1. Data storage, retrieval and update in the database

    2. A user accessible catalogue

    3. Transaction support service: ALL or NONE transaction, which minimize data

    inconsistency.

    4. Concurrency Control Services: access and update on the database by different

    users simultaneously should be implemented correctly.

    5.Recovery Services: a mechanism for recovering the database after a failure must

    be available.6. Authorization Services (Security): must support the implementation of access

    and authorization service to database administrator and users.

    7. Support for Data Communication: should provide the facility to integrate with

    data transfer software or data communication managers.

    8.Integrity Services: rules about data and the change that took place on the data,

    correctness and consistency of stored data, and quality of data based on

    business constraints.

    9. Services to promote data independency between the data and the application

    10. Utility services: sets of utility service facilities like

    Importing data Statistical analysis support Index reorganization Garbage collection

  • 8/13/2019 Database Lecture Note

    11/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 9

    DBMS and Components of DBMS Environment

    A DBMS is software package used to design, manage, and maintain databases. It provides the

    following facilities:

    Data Definition Language (DDL):

    o Language used to define each data element required by the organization.

    o Commands for setting up schema of database

    Data Manipulation Language (DML):

    o Language used by end-users and programmers to store, retrieve, and access the

    data e.g. SQL

    o Also called "query language"

    Data Dictionary: tool used to store and organize information about the data

    The DBMS is software that helps to design, handle, and use data using the database approach.

    Taking a DBMS as a system, one can describe it with respect to it environment or other

    systems interacting with the DBMS. The DBMS environment has five components.

    1. Hardware: Components that are comprised of personal computers, mainframe or

    any server computers, network infrastructure, etc.

    2. Software: those components like the DBMS software, application programs,

    operating systems, network software, and other relevant software.

    3. Data: This is the most important component to the user of the database. There are two

    types of data in a database approach that is Operational andMetadata.

    The structure of the data in the database is called the schema, which is composed of the

    Entities, Properties of entities, and relationship between entities.

    4. Procedure: this is the rules and regulations on how to design and use a database. It

    includes procedures like how to log on to the DBMS, how to use facilities, how to start

    and stop transaction, how to make backup, how to treat hardware and software failure,

    how to change the structure of the database.

    5. People: people in the organization responsible to designing, implement, manage,

    administer and use of the database.

  • 8/13/2019 Database Lecture Note

    12/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 10

    Database Development Life Cycle

    As it is one component in most information system development tasks, there are several steps

    in designing a database system. Here more emphasis is given to the design phases of the

    system development life cycle. The major steps in database design are;

    1. Planning: that is identifying information gap in an organization and propose a

    database solution to solve the problem.

    2. Analysis: that concentrates more on fact finding about the problem or the

    opportunity. Feasibility analysis, requirement determination and structuring, and

    selection of best design method are also performed at this phase.

    3. Design: in database designing more emphasis is given to this phase. The phase is

    further divided into three sub-phases.

    a. Conceptual Design: concise description of the data, data type, relationship

    between data and constraints on the data.

    There is no implementation or physical detail consideration.

    Used to elicit and structure all information requirements

    b. Logical Design: a higher level conceptual abstraction with selected specific data

    model to implement the data structure.

    It is particular DBMS independent and with no other physical

    considerations.

    c. Physical Design: physical implementation of the upper level design of the

    database with respect to internal storage and file structure of the database for

    the selected DBMS.

    To develop all technology and organizational specification.

    4. Implementation: the testing and deployment of the designed database for use.

    5. Operation and Support: administering and maintaining the operation of the

    database system and providing support to users.

  • 8/13/2019 Database Lecture Note

    13/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 11

    Roles in Database Design and Use

    As people are one of the components in DBMS environment, there are group of roles played

    by different stakeholders of the designing and operation of a database system.

    1. DataBase Administrator (DBA) Responsible to oversee, control and manage the database resources (the database itself,

    the DBMS and other related software)

    Authorizing access to the database

    Coordinating and monitoring the use of the database

    Responsible for determining and acquiring hardware and software resources

    Accountable for problems like poor security, poor performance of the system

    Involves in all steps of database developmentWe can have further classifications of this role in big organizations having huge amount of

    data and user requirement.

    1. Data Administrator (DA): is responsible on management of data resources.

    Involves in database planning, development, maintenance of standards policies

    and procedures at the conceptual and logical design phases.

    2. DataBase Administrator (DBA): is more technically oriented role. Responsible

    for the physical realization of the database. Involves in physical design,

    implementation, security and integrity control of the database.

    2. DataBase Designer (DBD)

    Identifies the data to be stored and choose the appropriate structures to represent and

    store the data.

    Should understand the user requirement and should choose how the user views the

    database.

    Involve on the design phase before the implementation of the database system.

    We have two distinctions of database designers, one involving in the logical and conceptual

    design and another involving in physical design.

  • 8/13/2019 Database Lecture Note

    14/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 12

    1. Logical and Conceptual DBD

    Identifies data (entity, attributes and relationship) relevant to the

    organization

    Identifies constraints on each data

    Understand data and business rules in the organization

    Sees the database independent of any data model at conceptual level and

    consider one specific data model at logical design phase.

    2. Physical DBD

    Take logical design specification as input and decide how it should be

    physically realized.

    Map the logical data model on the specified DBMS with respect to tables and

    integrity constraints. (DBMS dependent designing)

    Select specific storage structure and access path to the database

    Design security measures required on the database

    3. Application Programmer and Systems Analyst

    System analyst determines the user requirement and how the user wants to view

    the database.

    The application programmer implements these specifications as programs; code,

    test, debug, document and maintain the application program.

    Determines the interface on how to retrieve, insert, update and delete data in the

    database.

    The application could use any high level programming language according to the

    availability, the facility and the required service.

    4. End Users

    Workers, whose job requires accessing the database frequently for various purpose. There are

    different group of users in this category.

    1. Nave Users:

    Sizable proportion of users

    Unaware of the DBMS

  • 8/13/2019 Database Lecture Note

    15/106

    Chapter One: Introduction to Database Systems

    AAU, Computer Science Department, 2013 Page 13

    Only access the database based on their access level and demand

    Use standard and pre-specified types of queries.

    2. Sophisticated Users

    Are users familiar with the structure of the Database and facilities of the

    DBMS.

    Have complex requirements

    Have higher level queries

    Are most of the time engineers, scientists, business analysts, etc

    3. Casual Users

    Users who access the database occasionally.

    Need different information from the database each time.

    Use sophisticated database queries to satisfy their needs.

    Are most of the time middle to high level managers.

    These users can be again classified as Actors on the Scene and Workers Behind the Scene.

    Actors On the Scene:

    Data Administrator

    Database Administrator

    Database Designer

    End Users

    Workers Behind the Scene

    DBMS designers and implementers: who design and implement different DBMS

    software.

    Tool Developers: experts who develop software packages that facilitates databasesystem designing and use. Prototype, simulation, code generator developers could be

    an example. Independent software vendors could also be categorized in this group.

    Operators and Maintenance Personnel: system administrators who are responsible

    for actually running and maintaining the hardware and software of the database

    system and the information technology facilities.

  • 8/13/2019 Database Lecture Note

    16/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 1

    CHAPTER TWO

    DATABASE SYSTEM ARCHITECTURE

    Data Models, Schemas, and Instances

    One fundamental characteristic of the database approach is that it provides some level of data

    abstraction by hiding details of data storage that are not needed by most database users. A data

    model is a collection of concepts that can be used to describe the structure of a database and also

    it provides the necessary means to achieve this abstraction. By structure of a database, we mean

    the data types, relationships, and constraints that should hold for the data. Most data models also

    include a set of basic operations for specifying retrievals and updates on the database.

    In addition to the basic operations provided by the data model, it is becoming more common to

    include concepts in the data model to specify the dynamic aspect or behavior of a database

    application. This allows the database designer to specify a set of valid user defined operations

    that arc allowed on the database objects. An example of a user-defined operation could be

    COMPUTE_GPA, which can be applied to a STUDENT object. On the other hand, generic

    operations to insert, delete, modify, or retrieve any kind of object are often included in the basic

    data model operations.

    Categories of Data Models

    Many data models have been proposed, which we can categorize according to the types of

    concepts they use to describe the database structure. High-level or conceptual data models

    provide concepts that are close to the way many users perceive data, whereas low-level or

    physical data models provide concepts that describe the details of how data is stored in the

    computer. Concepts provided by low-level data models are generally meant for computer

    specialists, not for typical end users. Between these two extremes is a class of representational

    (or implementation) data models, which provide concepts that may be understood by end users

    but that are not too far removed from the way data is organized within the computer.

    Representational data models hide some details of data storage but can be implemented on a

    computer system in a direct way.

    Conceptual data models use concepts such as entities, attributes, and relationships. An entity

    represents a real-world object or concept, such as an employee or a project that is described in

  • 8/13/2019 Database Lecture Note

    17/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 2

    the database. An attribute represents some property of interest that further describes an entity,

    such as the employee's name or salary. A relationship among two or more entities represents an

    association among two or more entities, for example, a works-on relationship between an

    employee and a project.

    Representational or implementation data models are the models used most frequently in

    traditional commercial DBMSs. These include the widely used relational data model, as well as

    the so-called legacy data models-the network and hierarchical models-that have been widely

    used in the past. Representational data models represent data by using record structures and

    hence are sometimes called record-based data models. We can regard object data models as a

    new family of higher-level implementation data models that are closer to conceptual data

    models.

    FIGURE 1.2 A database that stores student and course information.

  • 8/13/2019 Database Lecture Note

    18/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 3

    Hierarchical Data Model

    In the hierarchical data model, information is organized as a collection of inverted trees of

    records. The inverted trees may be of arbitrary depth. The record at the root of a tree has zero or

    more child records; the child records, in turn, serve as parent records for their immediate

    descendants. This parent-child relationship recursively continues down the tree.

    The records consists of fields, where each field may contain simple data values (e.g. integer, real,

    text)., or a pointer to a record. The pointer graph is not allowed to contain cycles. Some

    combinations of fields may form the key for a record relative to its parent.

    Applications can navigate a hierarchical database by starting at a root and successively navigate

    downward from parent to children until the desired record is found. Applications can interleave

    parent-child navigation with traversal of pointers. Searching down a hierarchical tree is very fast

    since the storage layer for hierarchical databases use contiguous storage for hierarchical

    structures. All other types of queries require sequential search techniques.

    The hierarchical data model is impoverished for expressing complex information models. Often

    a natural hierarchy does not exist and it is awkward to impose a parent-child relationship.

    Pointers partially compensate for this weakness, but it is still difficult to specify suitable

    hierarchical schemas for large models.

    Hierarchical Data Model Summary

    The simplest data model

    Record type is referred to as node or segment

    The top node is the root node

    Nodes are arranged in a hierarchical structure as sort of upside-down tree

    A parent node can have more than one child node

    A child node can only have one parent node

    The relationship between parent and child is one-to-many

    Relation is established by creating physical link between stored records (each is stored with a

    predefined access path to other records)

    Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in

    manufacturing, personnel organization in companies

  • 8/13/2019 Database Lecture Note

    19/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 4

    Fig. Example of Hierarchical data model

    Network Data Model

    In the network data model, information is organized as a collection of graphs of record that are

    related with pointers. Network data models represent data in a symmetric manner, unlike the

    hierarchical data model (distinction between a parent and a child). A network data model is more

    flexible than a hierarchical data model and still permits efficient navigation. The records consists of lists of fields (fixed or variable length with maximum length), where

    each field contains a simple value (fixed or variable size). The network data model also

    introduces the notion of indexes of fields and records, sets of pointers, and physical placement of

    records.

    Network Data Model Summary

    Allows record types to have more that one parent unlike hierarchical model

    A network data models sees records as set members

    Each set has an owner and one or more members

    Allow no many to many relationship between entities

    Like hierarchical model network model is a collection of physically linked records.

    Allow member records to have more than one owner

    Database contains a complex array of pointers that thread through a set of records.

  • 8/13/2019 Database Lecture Note

    20/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 5

    Fig. Many-to-many relationship defined using Network data type

    Relational Data Model

    In the relational data model, information is organized in relations (two-dimensional tables). Each

    relation contain a set of tuples (records). Each tuple contain a number of fields. A field may

    contain a simple value (fixed or variable size) from some domain (e.g. integer, real, text, etc.).

    The relational data model is based on a mathematical foundation, called relational algebra. This

    mathematical foundation is the cornerstone to some of the very attractive properties of relationaldatabases, since it first of all offers data independence, and offers a mathematical framework for

    many of the optimizations possible in relational databases (e.g. query optimization).

    Relational Data Model Summary

    Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model for

    Large Shared Data Banks')

    Viewed as a collection of tables called Relations equivalent to collection of record

    types

    Stores information or data in the form of tables .. rows and columns

    A row of the table is called tuple.. equivalent to record

    A column of a table is called attribute.. equivalent to field

    The tables seem to be independent but are related some how.

    Conducts searches by using data in specified columns of one table to find additional data

    in another table

    In conducting searches, a relational database matches information from a field in one

    table with information in a corresponding field of another table to produce a third table

    that combines requested data from both tables

    Can define more flexible and complex relationship

  • 8/13/2019 Database Lecture Note

    21/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 6

    Schemas, Instances, and Database State

    In any data model, it is important to distinguish between the descriptions of the database and the

    database itself. The description of a database is called the database schema, which is specified

    during database design and is not expected to change frequently. Most data models have certainconventions for displaying schemas as diagrams. A displayed schema is called a schema

    diagram. Figure 2.1 shows a schema diagram for the database shown in Figure 1.2; the diagram

    displays the structure of each record type but not the actual instances of records. We call each

    object in the schema-such as STUDENT or COURSE-a schemaconstruct.

    A schema diagram displays only some aspects of a schema, such as the names of record types

    and data items, and some types of constraints. Other aspects are not specified in the schema

    diagram; for example, Figure 2.1 shows neither the data type of each data item nor the

    relationships among the various files. Many types of constraints are not represented in schema

    diagrams. A constraint such as "students majoring in computer science must take CS1310 before

    the end of their sophomore year" is quite difficult to represent.

    The actual data in a database may change quite frequently. For example, the database shown in

    Figure 1.2 changes every time we add a student or enter a new grade for a student. The data in

    the database at a particular moment in time is called a database state or snapshot. It is also

    called the current set of occurrences or instancesin the database. In a given database state, each

    schema construct has its own current set of instances; for example, the STUDENT construct will

    contain the set of individual student entities (records) as its instances. Many database states can

    be constructed to correspond to a particular database schema. Every time we insert or delete a

    record or change the value of a data item in a record, we change one state of the database into

    another state.

    The distinction between database schema and database state is very important. When we define a

    new database, we specify its database schema only to the DBMS. At this point, the

    corresponding database state is the empty state with no data. We get the initial state of the

    databasewhen the database is first populated or loaded with the initial data. From then on, every

    time an update operation is applied to the database, we get another database state. At any point in

    time, the database has a current state. The DBMS is partly responsible for ensuring that every

    state of the database is a valid state-that is, a state that satisfies the structure and constraints

  • 8/13/2019 Database Lecture Note

    22/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 7

    specified in the schema. Hence, specifying a correct schema to the DBMS is extremely

    important, and the schema must be designed with the utmost care. The DBMS stores the

    descriptions of the schema constructs and constraints-also called the meta-data-in the DBMS

    catalog so that DBMS software can refer to the schema whenever it needs to. The schema is

    sometimes called the intension, and a database state an extensionof the schema.

    Although, as mentioned earlier, the schema is not supposed to change frequently, it is not

    uncommon that changes need to be occasionally applied to the schema as the application

    requirements change. For example, we may decide that another data item needs to be stored for

    each record in a file, such as adding the DateOfBirth to the STUDENT schema in Figure 2.1.

    This is known as schemaevolution. Most modern DBMSs include some operations for schema

    evolution that can be applied while the database is operational.

    STUDENT

    CourseName CourseNumber CreditHours Department

    COURSE

    SectionIdentifier CourseNumber Semester Year Instructor

    SECTION

    FIGURE 2.1 Schema diagram for the database in Figure 1.2.

    Name StudentNumber Class Major

  • 8/13/2019 Database Lecture Note

    23/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 8

    THREE-SCHEMA ARCHITECTURE AND DATA INDEPENDENCE

    Three of the four important characteristics of the database approach, are (1) insulation of

    program; and data (program-data and program-operation independence), (2) support of multiple

    user views, and (3) use of a catalog to store the database description (schema). In this section wespecify an architecture for database systems, called the three-schema architecture also known

    as the ANSI/SPARC architecture, that was proposed to help achieve and visualize these

    characteristics. We then further discuss the concept of data independence.

    The Three-Schema Architecture

    The goal of the three-schema architecture, illustrated in Figure 2.2, is to separate the user

    applications and the physical database. In this architecture, schemas can be defined at the

    following three levels:

    Internal Level

    The internal level has an internal schema, which describes the physical storage structure of the

    database. The internal schema uses a physical data model and describes the complete details of

    data storage and access paths for the database.

    Conceptual Level

    The conceptual level has a conceptual schema, which describes the structure of the whole

    database for a community of users. The conceptual schema hides the details of physical storage

    structures and concentrates on describing entities, data types, relationships, user operations, and

    constraints. Usually, a representational data model is used to describe the conceptual schema

    when a database system is implemented. This implementation conceptual schema is often based

    on a conceptual schema design in a high-level data model.

    External Level

    The external or view level includes a number of external schemas or user views. Each external

    schema describes the part of the database that a particular user group is interested in and hides

    the rest of the database from that user group. As in the previous case, each external schema is

    typically implemented using a representational data model, possibly based on an external schema

    design in a high level data model.

  • 8/13/2019 Database Lecture Note

    24/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 9

    The three-schema architecture is a convenient tool with which the user can visualize the schema

    levels in a database system. Most DBMSs do not separate the three levels completely, but

    support the three-schema architecture to some extent. Some DBMSs may include physical-level

    details in the conceptual schema. In most DBMSs that support user views, external schemas are

    specified in the same data model that describes the conceptual-level information. Some DBMSs

    allow different data models to be used at the conceptual and external levels.

    Notice that the three schemas are only descriptions of data; the only data that actually exists is at

    the physical level. In a DBMS based on the three-schema architecture, each user group refers

    only to its own external schema. Hence, the DBMS must transform a request specified on an

    external schema into a request against the conceptual schema, and then into a request on the

    internal schema for processing over the stored database. If the request is a database retrieval, the

    data extracted from the stored database must be reformatted to match the user's external view.

    The processes of transforming requests and results between levels are called mappings. These

    mappings may be time-consuming, so some DBMSs-especially those that are meant to support

    small databases-do not support external views. Even in such systems, however, a certain amount

    of mapping is necessary to transform requests between the conceptual and internal levels.

    ANSI-SPARC Three-level Architecture

  • 8/13/2019 Database Lecture Note

    25/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 10

    External View 1

    Staff_No FName Salary

    External View 2

    FName DOB

    Conceptual Level

    Staff_No FName LName DOB Salary Branch_No

    I nternal Level

    struct STAFF{

    int Staff_No;

    int Branch_No;

    char FName[15];char LName[15];

    struct DOB;

    float salary;

    }

    Fig. Differences between Three Levels of ANSI-SPARC Architecture

    The ANSI-SPARC Architecture Defines DBMS schemas at three levels:

    Internal schema

    o at the internal level to describe physical storage structures and access paths.

    Typically uses aphysical data model.

    Conceptual schema

    o at the conceptual level to describe the structure and constraints for the whole

    database for a community of users. Uses a conceptual or an implementation data

    model.

    External schemas

    o at the external level to describe the various user views. Usually uses the same data

    model as the conceptual level.

  • 8/13/2019 Database Lecture Note

    26/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 11

    Data Independence

    The three-schema architecture can be used to further explain the concept of data independence,

    which can be defined as the capacity to change the schema at one level of a database system

    without having to change the schema at the next higher level. We can define two types of dataindependence:

    Logical Data Independence

    Logical data independence is the capacity to change the conceptual schema without having to

    change external schemas or application programs. We may change the conceptual schema to

    expand the database (by adding a record type or data item), to change constraints, or to reduce

    the database (by removing a record type or data item). In the last case, external schemas that

    refer only to the remaining data should not be affected. For example, the external schema of

    Figure l.4a should not be affected by changing the GRADE_REPORT file shown in Figure 1.2

    into the one shown in Figure 1.5a. Only the view definition and the mappings need be changed in

    a DBMS that supports logical data independence. After the conceptual schema undergoes a

    logical reorganization, application programs that reference the external schema constructs must

    work as before. Changes to constraints can be applied to the conceptual schema without affecting

    the external schemas or application programs.

    Logical Data Independence Summary

    Refers to immunity of external schemas to changes in conceptual schema.

    Conceptual schema changes e.g. addition/removal of entities should not require changes

    to external schema or rewrites of application programs.

    The capacity to change the conceptual schema without having to change the external

    schemas and their application programs.

    Physical Data Independence

    Physical data independence is the capacity to change the internal schema without having to

    change the conceptual schema. Hence, the external schemas need not be changed as well.

    Changes to the internal schema may be needed because some physical files had to be

    reorganized-for example, by creating additional access structures-to improve the performance of

    retrieval or update. If the same data as before remains in the database, we should not have to

  • 8/13/2019 Database Lecture Note

    27/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 12

    change the conceptual schema. For example, providing an access path to improve retrieval speed

    of SECTION records (Figure 1.2) by Semester and Year should not require a query such as "list

    all sections offered in fall 1998" to be changed, although the query would be executed more

    efficiently by the DBMS by utilizing the new access path.

    Physical Data Independence Summary

    The ability to modify the physical schema without changing the logical schema

    Applications depend on the logical schema

    In general, the interfaces between the various levels and components should be well

    defined so that changes in some parts do not seriously influence others.

    The capacity to change the internal schema without having to change the conceptual

    schema

    Refers to immunity of conceptual schema to changes in the internal schema

    Internal schema changes e.g. using different file organizations, storage structures/devices

    should not require change to conceptual or external schemas.

    Whenever we have a multiple-level DBMS, its catalog must be expanded to include information

    on how to map requests and data among the various levels. The DBMS uses additional software

    to accomplish these mappings by referring to the mapping information in the catalog. Data

    independence occurs because when the schema is changed at some level, the schema at the next

    higher level remains unchanged; only the mapping between the two levels is changed. Hence,

    application programs referring to the higher-level schema need not be changed.

    The three-schema architecture can make it easier to achieve true data independence, both

    physical and logical. However, the two levels of mappings create an overhead during

    compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because

    of this, few DBMSs have implemented the full three-schema architecture.

  • 8/13/2019 Database Lecture Note

    28/106

    Fundamental of Database Systems: Data Models and Database System Architecture

    Page 13

    Fig. Data Independence and the ANSI-SPARC Three-level Architecture

  • 8/13/2019 Database Lecture Note

    29/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 1

    CHAPTER THREE

    Relational Data Model & ER-Model

    In database development life cycle, once all the requirements have been collected and analyzed,

    the next step is to create a conceptualschemafor the database, using a high-level conceptual

    datamodel(one of which is ER-Model). This step is called conceptualdesign. The conceptual

    schema is a concise description of the data requirements of the users and includes detailed

    descriptions of the entity types, relationships, and constraints; these are expressed using the

    concepts provided by the high-level data model. Because these concepts do not include

    implementation details, they are usually easier to understand and can be used to communicate

    with nontechnical users. The high-level conceptual schema enables the database designers to

    concentrate on specifying the properties of the data, without being concerned with storage

    details. Consequently, it is easier for them to come up with a good conceptual database design.

    The next step in database design is the actual implementation of the database, using a

    commercial DBMS. Most current commercial DBMSs use an implementation data model such

    as the relational or the object-relational database model-so the conceptual schema is transformed

    from the high-level data model into the implementation data model. This step is called logical

    designor datamodelmapping, and its result is a database schema in the implementation datamodel of the DBMS.

    The last step is the physicaldesignphase, during which the internal storage structures, indexes,

    access paths, and file organizations for the database files are specified. In this chapter only the

    basic ER model concepts for conceptual schema design are discussed.

    Relational Data Model

    The building blocks of the relational data model are:

    Entities: real world physical or logical object Attributes: properties used to describe each Entity or real world object. Relationship: the association between Entities Constraints: rules that should be obeyed while manipulating the data.

  • 8/13/2019 Database Lecture Note

    30/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 2

    1. Entities/Relation/TableThe basic object that the ER model represents is an entity, which is a "thing" in the real world

    with an independent existence. An entity may be an object with a physical existence (for

    example, a particular person, car, house, or employee) or it may be an object with a conceptualexistence (for example, a company, a job, or a university course). Each entity has attributes-the

    particular properties that describe it. For example, an employee entity may be described by the

    employee's name, age, address, salary, and job. A particular entity will have a value for each of

    its attributes.

    NB:

    The name given to an entity should always be a singular noun descriptive of each item tobe stored in it. E.g.: student NOT students.

    Every relation has a schema, which describes the columns, or fields The relation itself corresponds to our familiar notion of a table A relation is a collection of tuples, each of which contains values for a fixed number of

    attributes

    2. Attributes/Fields/ColumnsAttributes are pieces of information ABOUT entities. The analysis must of course identify those

    which are actually relevant to the proposed application. At this level we need to know such

    things as:

    Attribute name (be explanatory words or phrases) The domain from which attribute values are taken (A DOMAINis a set of values from

    which attribute values may be taken.) Each attribute has values taken from a domain. For

    example, the domain of Name is string and that for salary is real

    Whether the attribute is part of the entity identifier (attributes which just describe anentity and those which help to identify it uniquely)

    Whether it is permanent or time-varying (which attributes may change their values overtime)

    Whether it is required or optional for the entity (whose values will sometimes beunknown or irrelevant)

  • 8/13/2019 Database Lecture Note

    31/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 3

    Several types of attributes occur in the ER model: simple versus composite, single-valued versus

    multi-valued, and stored versus derived.

    Composite versus Simple (Atomic) AttributesComposite attributes can be divided into smaller subparts, which represent more basic

    attributes with independent meanings. For example, Address attribute of an employee entity can

    be subdivided into StreetAddress, City, State, and postal code. Attributes that are not divisible

    are called simpleor atomicattributes. Composite attributes can form a hierarchy; for example,

    StreetAddress can be further subdivided into three simple attributes: Number, Street, and

    ApartmentNumber. The value of a composite attribute is the concatenation of the values of its

    constituent simple attributes.

    Single-Valued versus Multi-valued AttributesMost attributes have a single value for a particular entity; such attributes are called single-

    valued. For example, Age is a single-valued attribute of a person. In some cases an attribute can

    have a set of values for the same entity-for example, Colors attribute for a car, or a

    CollegeDegrees attribute for a person. Cars with one color have a single value, whereas two-tone

    cars have two values for Colors. Similarly, one person may not have a college degree, another

    person may have one, and a third person may have two or more degrees; therefore, different

    persons can have different numbers of values for the CollegeDegrees attribute. Such attributes

    are called multi-valued. A multi-valued attribute may have lower and upper bounds to constrain

    the number of values allowed for each individual entity. For example, the Colors attribute of a

    car may have between one and three values, if we assume that a car can have at most three

    colors.

    Stored versus Derived AttributesIn some cases, two (or more) attribute values are related-for example, the Age and BirthDate

    attributes of a person. For a particular person entity, the value of Age can be determined from the

    current (today's) date and the value of that person's BirthDate. The Age attribute is hence called a

    derived attribute and is said to be derivable from the BirthDate attribute, which is called a

    stored attribute. Some attribute values can be derived from related entities; for example, an

  • 8/13/2019 Database Lecture Note

    32/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 4

    attribute NumberOfEmployees of a department entity can be derived by counting the number of

    employees related to (working for) that department.

    Null ValuesIn some cases a particular entity may not have an applicable value for an attribute. For example,

    the ApartmentNumber attribute of an address applies only to addresses that are in apartment

    buildings and not to other types of residences, such as single-family homes. Similarly, a

    CollegeDegrees attribute applies only to persons with college degrees. For such situations, a

    special value called nullis created. An address of a single-family home would have null for its

    ApartmentNumber attribute, and a person with no college degree would have null for

    CollegeDegrees.

    NB:

    NULL applies to attributes which are not applicable or which do not have values. You may enter the value NA (meaning not applicable) Value of a key attribute cannot be null. Default value- assumed value if no explicit value

    Key Attributes of Entity

    An important constraint on the entities of an entity is the key or uniqueness constraint on

    attributes. An entity usually has an attribute whose values are distinct for each individual entity

    in the entity set. Such an attribute is called a keyattribute (Primary Key), and its values can be

    used to identify each entity uniquely. For example, a Name attribute can be a key for a

    COMPANY entity type, because no two companies are allowed to have the same name. For the

    PERSON entity type, a typical key attribute is SocialSecurityNumber. Sometimes, several

    attributes together form a key, meaning that the combination of the attribute values must be

    distinct for each entity. If a set of attributes possesses this property, the proper way to represent

    this is to define a composite attribute and designate it as a key attribute of the entity type.

    Notice that such a composite key must be minimal; that is, all component attributes must be

    included in the composite attribute to have the uniqueness property. An entity type may also

    have nokey, in which case it is called a weakentity.

  • 8/13/2019 Database Lecture Note

    33/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 5

    Summary of Key Attributes

    o Each row of a table is uniquely identified by a PRIMARYKEYcomposed of one ormore columns

    oGroup of columns, that uniquely identifies a row in a table is called a CANDIDATEKEY

    o A column or combination of columns that matches the primary key of another table iscalled a FOREIGNKEY. Used to cross-reference tables

    Value Sets (Domains) of AttributesEach simple attribute of an entity type is associated with a value set (or domain of values), which

    specifies the set of values that may be assigned to that attribute for each individual entity. For

    instance, if the range of ages allowed for employees is between 16 and 70, we can specify the

    value set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16 and

    70. Similarly, we can specify the value set for the Name

    3. RelationshipsA relationship type R among n entity types E1, E2, --- En, defines a set of associations among

    entities. For example, consider a relationship type WORKS_FOR between two entity types

    EMPLOYEE and DEPARTMENT, which associates each employee with the department for

    which the employee works.

    Degree of a Relationship

    The degree of a relationship type is the number of participating entity types. Hence, the

    WORKS-FOR relationship is of degree two. A relationship type of degree two is called binary,

    and one of degree three is called ternary. Relationships can generally be of any degree, but the

    ones most common are binary relationships. Higher-degree relationships are

    For example, suppose that we want to capture which employees use which skills on which

    project. We might try to represent this data in a database as three binary relationships between

    skills and project, project and employee, and employee and skill.

  • 8/13/2019 Database Lecture Note

    34/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 6

    Fig. A ternary relationship

    Works-on

    o Abebe and Kebede have worked on projectsA andB. Used-on

    o Abebe has used programming skills on project x Have skill

    o An employee has a certain skill. This is different than used on because there aresome skills that an employee has that he or she may not have used on a particular

    project.

    Neededo A project needs a particular skill. This is different than used on because there may

    be some skills for which employees have not been assigned to the project yet.

    Manageso An employee manages a project. This is a completely different dimension than

    skill so it could not be captured by used on.

    Role Names and Recursive Relationships

    Each entity that participates in a relationship plays a particular role in the relationship. The role

    name signifies the role that a participating entity from the entity type plays in each relationship

    instance, and helps to explain what the relationship means. For example, in the WORKS_FOR

    relationship type, EMPLOYEE plays the role of employee or worker and DEPARTMENT plays

    the role of department or employer.

  • 8/13/2019 Database Lecture Note

    35/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 7

    In some cases the same entity participates more than once in a relationship with the same entity

    in different roles. In such cases the role name is essential for distinguishing the meaning of each

    participation. Such relationship types are called recursive relationships. For example, the

    SUPERVISION relationship type relates an employee to a supervisor, where both employee and

    supervisor entities are members of the same EMPLOYEE entity.

    Cardinality Ratios for Binary Relationships

    An important concept about relationship is the number of instances/tuples that can be associated

    with a single instance from one entity in a single relationship. The cardinalityratiofor a binary

    relationship specifies the maximum number of relationship instances that an entity can participate

    in. The number of instances participating or associated with a single instance from an entity in a

    relationship is called the CARDINALITYof the relationship.

    For example, in the WORKS_FOR binary relationship type, DEPARTMENT: EMPLOYEE is of

    cardinality ratio 1: N, meaning that each department can be related to (that is, employs) any

    number of employees. But an employee can be related to (work for) only one department. The

    possible cardinality ratios for binary relationship types are 1:1, l:M, M:l, and M:M.

    ONE-TO-ONE:one tuple is associated with only one other tuple An example of a 1:1 binary relationship is MANAGES, which relates a

    department entity to the employee who manages that department. This means any

    point in time an employee can manage only one department and a department has

    only one manager

    Another example of 1:1 relationship: Building Location as a single buildingwill be located in a single location and as a single location will only accommodate

    a single Building.

  • 8/13/2019 Database Lecture Note

    36/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 8

    ONE-TO-MANY: one tuple can be associated with many other tuples, but not thereverse.

    Eg1. Department-Student: as one department can have multiple students. Eg 2: Employee-Deprtament: as one employee can work in multiple departments.

    MANY-TO-ONE:many tuples are associated with one tuple but not the reverse. E.g. Employee Department: as many employees belong to a single

    department.

    MANY-TO-MANY: one tuple is associated with many other tuples and from theother side, with a different role name one tuple will be associated with many tuples

    E.g. StudentCourse: as a student can take many courses and a single coursecan be attended by many students.

    E.g. Employee- Project: as an employee works-on many projects and a singleproject can be done by many employees.

  • 8/13/2019 Database Lecture Note

    37/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 9

    4. Relational Constraints/Integrity Rules Relational Integrity

    o Domain Integrity: No value of the attribute should be beyond the allowablelimits

    o Entity Integrity: In a base relation, no attribute of a primary key can be nullo Referential Integrity: If a foreign key exists in a relation, either the foreign key

    value must match a candidate key in its home relation or the foreign key value

    must be null foreign key to primary key match-ups

    o Enterprise Integrity: Additional rules specified by the users or databaseadministrators of a database are incorporated

    Relational languages and views

    The languages in relational database management systems are the DDL and the DML that are

    used to define or create the database and perform manipulation on the database. We have the two

    kinds of relation in relational database. The difference is on how the relation is created, used and

    updated:

    1. Base RelationA Named Relation corresponding to an entity in the conceptual schema, whose tuples arephysically stored in the database.

    2. ViewIs the dynamic result of one or more relational operations operating on the base relations to

    produce another virtual relation. So a view virtually derived relation that does not necessarily

    exist in the database but can be produced upon request by a particular user at the time of request.

    Purpose of a view

    Hides unnecessary information from users Provide powerful flexibility and security Provide customized view of the database for users A view of one base relation can be updated. Update on views derived from various relations is not allowed

  • 8/13/2019 Database Lecture Note

    38/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 10

    Conceptual Database Design

    Conceptual design revolves around discovering and analyzing organizational and user data

    requirements. The important activities are to identify: Entities, Attributes, Relationships and

    Constraints and based on these components develop the ER model using ER diagrams.

    The Entity Relationship (E-R) Model

    Entity-Relationship modeling is used to represent conceptual view of the database. The main

    components of ER Modeling are: Entities, Attributes, Relationships and Constraints. Before

    working on the conceptual design of the database, one has to know and answer the following

    basic questions.

    What are the entities and relationships in the enterprise What information about these entities and relationships should we store in the

    database?

    What are the in tegri ty constraints that hold? Constraints on each data with respect toupdate, retrieval and store.

    Represent this information pictorially in ER diagrams, then map ER diagram into arelational database schema.

    ER-Diagrams

    Entity is represented by a RECTANGLE containing the name of the entity

    Attributes are represented by OVALS and are connected to the entity by a line

  • 8/13/2019 Database Lecture Note

    39/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 11

    A derived attribute is indicated by a DOTTED LINE

    PRIMARY KEYS are underlined

    Relationships are represented by DIAMOND shaped symbols

    Structural Constraints on Relationships

    Relationship types usually have certain constraints that limit the possible combinations of entities

    that may participate in the corresponding relationship set. These constraints are determined from

    the miniworld situation that the relationships represent. For example, if a company has a rule that

    each employee must work for exactly one department, then we would like to describe this

    constraint in the schema. We can distinguish two main types of relationship structural

    constraints: cardinality ratio and participation.

    1. Constraints on Relationship / Multiplicity/ Cardinality ConstraintsMultiplicity constraint is the number of or range of possible occurrence of an entity

    type/relation that may relate to a single occurrence/tuple of an entity type/relation through a

    particular relationship. These constraints are mostly used to insure appropriate enterprise

    constraints.

    Specifies that each entity e in E participates in at least min and at most maxrelationship instances in R

    Default(no constraint): min=0, max=n (signifying no limit) Must have minmax, min0, max 1

    Key

  • 8/13/2019 Database Lecture Note

    40/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 12

    One-to-One RelationshipRelationship Managesbetween Employee and Department

    The multiplicity of the relationship is:o

    One department can only have one managero One employee could manage either one or no departments

    01 11

    One-to-Many RelationshipRelationship Leadsbetween Employeeand Project

    The multiplicity of the relationshipo One staff may Lead one or more project(s)o One project is Lead by one employee

    0* 11

    Many-to-Many RelationshipRelationship Teachesbetween Instructor and Course

    The multiplicity of the relationshipo One Instructor Teaches one or more Course(s)o One Course is thought by zero or more instructor(s)

    1* 0*

    2. Participation of an Entity Set in a Relationship SetTotal participation (indicated by double line): every entity in the entity set participates in at

    least one relationship in the relationship set. The entity with total participation will be connected

    with the relationship using a double line.

    Employee DepartmentManages

    Employee ProjectLeads

    Instructor CourseTeaches

  • 8/13/2019 Database Lecture Note

    41/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 13

    E.g. 1: Participation of EMPLOYEE in belongs to relationship with DEPARTMENT is total

    since every employee should belong to a department.

    E.g. 2: Participation of EMPLOYEE in manages relationship with DEPARTMENT,

    DEPARTMENT will have total participation but not EMPLOYEE

    Partial participation: some entities may not participate in any relationship in the relationship

    set

    E.g. 1: Participation of EMPLOYEE in manages relationship with DEPARTMENT,

    EMPLOYEE will have partial participation since not all employees are managers.

  • 8/13/2019 Database Lecture Note

    42/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 14

    Attributes of Relationship Types

    Relationship types can also have attributes, similar to those of entity types. For example, to

    record the number of hours per week that an employee works on a particular project, we can

    include an attribute Hours for the WORKS_ON relationship type. Another example is to include

    the date on which a manager started managing a department via an attribute StartDate for the

    MANAGES relationship type.

    Notice that attributes of 1:1 or 1:M relationship types can be migrated to one of the participating

    entity types. For example, the StartDate attribute for the MANAGES relationship can be an

    attribute of either EMPLOYEE or DEPARTMENT, although conceptually it belongs to

  • 8/13/2019 Database Lecture Note

    43/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 15

    MANAGES. This is because MANAGES is a 1:1 relationship, so every department or employee

    entity participates in at most one relationship instance. Hence, the value of the StartDate attribute

    can be determined separately, either by the participating department entity or by the participating

    employee (manager) entity.

    Weak Entity Types

    Entity types that do not have key attributes of their own are called weakentitytypes. In contrast,

    regular entity types that do have a key attribute are called strongentitytypes. Entities belonging

    to a weak entity type are identified by being related to specific entities from another entity type

    in combination with one of their attribute values.

    A weak entity type always has a total participation constraint (existence dependency) with

    respect to its identifying relationship, because a weak entity cannot be identified without an

    owner entity. However, not every existence dependency results in a weak entity type. For

    example, a DRIVER_LICENSE entity cannot exist unless it is related to a PERSON entity, even

    though it has its own key (LicenseNumber) and hence is not a weak entity.

    Consider the entity type DEPENDENT, related to EMPLOYEE, which is used to keep track of

    the dependents of each employee via a l:N relationship (Figure 3.2). The attributes of

    DEPENDENT are Name (the first name of the dependent), BirthDate, Sex, and Relationship (to

    the employee). Two dependents of two distinct employees may, by chance, have the same values

    for Name, BirthDate, Sex, and Relationship, but they are still distinct entities. They are identified

    as distinct entities only after determining the particular employee entity to which each dependent

    is related. Each employee entity is said to own the dependent entities that are related to it. A

    weak entity type normally has a partial key, which is the set of attributes that can uniquely

    identify weak entities that are related to the same owner entity. In our example, if we assume that

    no two dependents of the same employee ever have the same first name, the attribute Name of

    DEPENDENT is the partial key. In the worst case, a composite attribute of all the weak entity's

    attributes will be the partial key.

  • 8/13/2019 Database Lecture Note

    44/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 16

    ER-to-Relational Mapping Algorithm

    Step 1:Mapping of Regular Entity TypesFor each regular (strong) entity type E in the ER schema, create a relation R that includes all the

    simple attributes of E. Include only the simple component attributes of a composite attribute.

    Choose one of the key attributes of E as primary key for R. If the chosen key of E is composite,

    the set of simple attributes that form it will together form the primary key of R.

    Example: For the company database ER (Figure 3.2)

    o regular entities are:EMPLOYEE, DEPARTMENT, and PROJ ECTo The foreign key and relationship attributes, if any, are not included yet; they will

    be added during subsequent steps. These include the attributes SUPERSSN and

    DNO of EMPLOYEE, MGRSSN and MGRSTARTDATE of DEPARTMENT,

    and DNUM of PROJECT.

    o In our example, we choose empid, DNUMBER, and PNUMBER as primary keysfor the relations EMPLOYEE, DEPARTMENT, and PROJECT,

    Step 2: Mapping of Weak Entity TypesFor each weak entity W in the ER schema with owner entity type E, create a relation R and

    include all simple attributes (or simple components of composite attributes) of W as attributes of

    R. In addition, include as foreign key attributes of R the primary key attributes) of the relations)

    that correspond to the owner entity types): this takes care of the identifying relationship type of

    W. The primary key of R is the combination of the primary keys of the owners and the partial

    key of the weak entity type W, if any.

    In our example, we create the relation DEPENDENT in this step to correspond to theweak entity type DEPENDENT.

    We include the primary key empId of the EMPLOYEE relation which corresponds to theowner entity type-as a foreign key attribute of DEPENDENT;

    The primary key of the DEPENDENT relation is the combination {empId,DEPENDENT_NAME} because DEPENDENT_NAME is the partial key of

    DEPENDENT.

  • 8/13/2019 Database Lecture Note

    45/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 17

    Step 3: Mapping of Binary 1:1 Relationship Types.For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that

    correspond to the entity types participating in R. Choose one of the relations-S, say-and include

    as a foreign key in S the primary key of T. It is better to choose an entity type with total

    participation in R in the role of S. Include all the simple attributes (or simple components of

    composite attributes) of the 1:1 relationship type R as attributes of S.

    In our example, we map the 1:1 relationship type MANAGES by choosing theparticipating entity type DEPARTMENT to serve in the role of S, because its

    participation in the MANAGES relationship type is total (every department has a

    manager). We include the primary key of the EMPLOYEE relation as foreign key in the

    DEPARTMENT relation and rename it MGRSSN. We also include the simple attributeSTARTDATE of the MANAGES relationship type in the DEPARTMENT relation and

    rename it MGRSTARTDATE.

    Note that it is possible to include the primary key of S as a foreign key in T instead. Inour example, this amounts to having a foreign key attribute, say

    DEPARTMENT_MANAGED in the EMPLOYEE relation, but it will have a null value

    for employee tuples who do not manage a department.

    Step 4: Mapping of Binary 1:N Relationship Types.For each regular binary 1:N relationship type R, identify the relation S that represents the

    participating entity type at the N-side of the relationship type. Include as foreign key in S the

    primary key of the relation T that represents the other entity type participating in R; this is done

    because each entity instance on the N-side is related to at most one entity instance on the 1-side

    of the relationship type. Include any simple attributes (or simple components of composite

    attributes) of the 1:N relationship type as attributes of S.

    In our example, we now map the 1:N relationship types WORKS_FOR, CONTROLS,and SUPERVISION .

    For WORKS_FOR we include the primary key DNUMBER of the DEPARTMENTrelation as foreign key in the EMPLOYEE relation and call it DNO.

  • 8/13/2019 Database Lecture Note

    46/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 18

    For SUPERVISION we include the primary key of the EMPLOYEE relation as foreignkey in the EMPLOYEE relation itself because the relationship is recursive-and call it

    SUPERSSN.

    The CONTROLS relationship is mapped to the foreign key attribute DNUM ofPROJECT, which references the primary key DNUMBER of the DEPARTMENT

    relation.

    Step 5: Mapping of Binary M:N Relationship Types.For each binary M:N relationship type R, create a new relation S to represent R. Include as

    foreign key attributes in S the primary keys of the relations that represent the participating entity

    types; their combination will form the primary key of S. Also include any simple attributes of the

    M:N relationship type (or simple components of composite attributes) as attributes of S. Notice

    that we cannot represent an M:N relationship type by a single foreign key attribute in one of the

    participating relations (as we did for 1:1 or I:N relationship types) because of the M:N

    cardinality ratio; we must create a separate relationship relationS.

    In our example, we map the M:N relationship type WORKS_ON by creating therelation WORKS_ON.

    We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keysin WORKS_ON and rename them PNO and ESSN, respectively.

    We also include an attribute HOURS in WORKS_ON to represent the HOURS attributeof the relationship type. The primary key of the WORKS_ON relation is the combination

    of the foreign key attributes {ESSN, PNO}.

    Step 6: Mapping of Multivalued Attributes.For each multivalued attribute A, create a new relation R. This relation R will include an

    attribute corresponding to A, plus the primary key attribute K-as a foreign key in R-of the

    relation that represents the entity type or relationship type that has A as an attribute. The primary

    key of R is the combination of A and K. If the multivalued attribute is composite, we include its

    simple components.

  • 8/13/2019 Database Lecture Note

    47/106

    Fundamental of Database Systems: Relational Data Model & ER-Model

    Page 19

    In our example, we create a relation DEPT_LOCATIONS. The attribute DLOCATIONrepresents the multivalued attribute LOCATIONS of DEPARTMENT, while

    DNUMBER-as foreign key represents the primary key of the DEPARTMENT relation.

    The primary key of DEPT_LOCATIONS is the combination of {DNUMBER,

    DLOCATION}. A separate tuple will exist in DEPT_LOCATIONS for each location that

    a department has.

    Step 7: Mapping of N-ary Relationship Types.For each n-ary relationship type R, where n > 2, create a new relation S to represent R. Include as

    foreign key attributes in S the primary keys of the relations that represent the participating entity

    types. Also include any simple attributes of the n-ary relationship type (or simple components of

    composite attributes) as attributes of S. The primary key of S is usually a combination of all theforeign keys that reference the relations representing the participating entity types.

    For example, consider the relationship type SUPPLY of the Figure above. This can be mapped to

    the relation SUPPLY shown in below; whose primary key is the combination of the three foreign

    keys {SNAME, PARTNO, PROJNAME}.

  • 8/13/2019 Database Lecture Note

    48/106

    Fundamental of Database Systems: Functional Dependency and Normalization

    Page 1

    CHAPTER FOUR

    Functional Dependency and Normalization

    So far, we have assumed that attributes are grouped to form a relation schema by using the common sense

    of the database designer or by mapping a database schema design from a conceptual data model such as

    the ER or some other conceptual data model. These models make the designer identify entity types and

    relationship types and their respective attributes, which leads to a natural and logical grouping of the

    attributes into relations when the mapping procedures in Chapter 3 are followed. However, we still need

    some formal measure of why one grouping of attributes into a relation schema may be better than another.

    So far in our discussion of conceptual design and its mapping into the relational model, we have not

    developed any measure of appropriateness or "goodness" to measure the quality of the design, other than

    the intuition of the designer. In this chapter we discuss some of the theory that has been developed with

    the goal of evaluating relational schemas for design quality-that is, to measure formally why one set of

    groupings of attributes into relation schemas is better than another.

    In this chapter, the concept of functional dependency, a formal constraint among attributes that is the

    main tool for formally measuring the appropriateness of attribute groupings into relation schemas is

    defined. We show how functional dependencies can be used to group attributes into relation schemas that

    are in a normal form. A relation schema is in a normal form when it satisfies certain desirable properties.

    The process of normalization consists of analyzing relations to meet increasingly more stringent normal

    forms leading to progressively better groupings of attributes. Normal forms are specified in terms offunctional dependencies-which are identified by the database designer-and key attributes of relation

    schemas.

    Informal Measures to goodness of a Relation

    The ease with which the meaning of a relation's attributes can be explained is an informal measure of how

    well the relation is designed.

    GUIDELINE 1. Design a relation schema so that it is easy to explain its meaning. Do not combine

    attributes from multiple entity types and relationship types into a single relation. Intuitively, if a relation

    schema corresponds to one entity type or one relationship type, it is straightforward to explain its

    meaning. Otherwise, if the relation corresponds to a mixture of multiple entities and relationships,

    semantic ambiguities will result and the relation cannot be easily explained.

  • 8/13/2019 Database Lecture Note

    49/106

    Fundamental of Database Systems: Functional Dependency and Normalization

    Page 2

    GUIDELINE 2. Design the base relation schemas so that no insertion, deletion, or modification

    anomalies are present in the relations. If any anomalies are present, note them clearly and make sure that

    the programs that update the database will operate correctly.

    GUIDELINE 3. As far as possible, avoid placing attributes in a base relation whose values may

    frequently be null. If nulls are unavoidable, make sure that they apply in exceptional cases only and do

    not apply to a majority of tuples in the relation.

    Normalization

    A relational database is merely a collection of data, organized in a particular manner. As the father of the

    relational database approach, Codd created a series of rules called normal forms that help define that

    organization.

    Database normalization is a series of steps followed to obtain a database design that allows for consistent

    storage and efficient access of data in a relational database. These steps reduce data redundancy and the

    risk of data becoming inconsistent.

    Normalization is the process of identifying the logical associations between data items and designing a

    database that will represent such associations but without suffering the update anomalies which are;

    Insertion Anomalies, Deletion Anomalies and Modification Anomalies.

    All the normalization rules will eventually remove the update anomalies that may exist during data

    manipulation after the implementation. The underlying ideas in normalization are simple enough.

    Through normalization we want to design for our relational database a set of tables that;

    Contain all the data necessary for the purposes that the database is to serve, Have as little redundancy as possible, Accommodate multiple values for types of data that require them, Permit efficient updates of the data in the database, and Avoid the danger of losing data unknowingly

    Normalization may reduce system performance since data will be cross referenced from many tables.

    Thus denormalization is sometimes used to improve performance, at the cost of reduced consistency

    guarantees.

  • 8/13/2019 Database Lecture Note

    50/106

    Fundamental of Database Systems: Functional Dependency and Normalization

    Page 3

    Drawbacks of Normalization

    Requires data to see the problems May reduce performance of the system

    Is time consuming, Difficult to design and apply and Prone to human error

    The type of problems that could occur in insufficiently normalized table is called update anomalies which

    includes;

    1. Insertion anomaliesAn "insertion anomaly" is a failure to place information about a new database entry into all the places in

    the database where information about that new entry needs to be stored. In a properly normalized

    database, information about a new entry needs to be inserted into only one place in the database; in an

    inadequately normalized database, information about a new entry may need to be inserted into more than

    one place and, human fallibility being what it is, some of the needed additional insertions may be missed.

    2. Deletion anomaliesA "deletion anomaly" is a failure to remove information about an existing database entry when it is time

    to remove that entry. In a properly normalized database, information about an old, to-be-gotten-rid-of

    entry needs to be deleted from only one place in the database; in an inadequately normalized database,

    information about that old entry may need to be deleted from more than one place, and, human fallibility

    being what it is, some of the needed additional deletions may be missed.

    3. Modification anomaliesA modification of a database involves changing some value of the attribute of a table. In a properly

    normalized database table, whatever information is modified by the user, the change will be effected and

    used accordingly.

    The purpose of normalization is to reduce the chances for anomalies to occur in a database.

  • 8/13/2019 Database Lecture Note

    51/106

    Fundamental of Database Systems: Functional Dependency and Normalization

    Page 4

    Example of problems related with Anomalies

    EmpID FName LName SkillID Skill SkillType School SchoolAddress Skill

    Level

    12 Abebe Mekuria 2 SQL Database AAU Sidist_Kilo 5

    16 Lemma Alemu 5 C++ Programming Unity Gerji 6

    28 Chane Kebede