Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Winter-17
Academic Session: 2018–2019
Subject: DBMS
MCA 1st Year (Sem II)
Q1
EITHER
(a) What is metadata? Differentiate between a conventional file processing system and a database system.
Ans:
Metadata is data that describes other data. "Meta" is a prefix that, in most information technology usages, means an underlying definition or description.
Metadata summarizes basic information about data, which can make finding and working with particular instances of data easier. For example, author, date created, date modified, and file size are examples of very basic document metadata. Having the ability to filter through that metadata makes it much easier for someone to locate a specific document.
In addition to document files, metadata is used for images, videos, spreadsheets, and web pages. The use of metadata on web pages can be very important. Metadata for web pages contains descriptions of the page's contents as well as keywords linked to the content. These are usually expressed in the form of metatags. The metadata containing the web page's description and summary is often displayed in search results by search engines, making its accuracy and detail very important, since it can determine whether a user decides to visit the site or not. Metatags are often evaluated by search engines to help decide a web page's relevance, and were used as the key factor in determining position in a search until the late 1990s. The rise of search engine optimization (SEO) towards the end of the 1990s led to many websites "keyword stuffing" their metadata to trick search engines into making their websites seem more relevant than others. Since then, search engines have reduced their reliance on metatags, though they are still factored in when indexing pages. Many search engines also try to halt web pages' attempts to thwart their systems by regularly changing their ranking criteria, with Google being notorious for frequently changing its closely guarded ranking algorithms.
Metadata can be created manually or by automated information processing. Manual creation tends to be more accurate, allowing the user to input any information they feel is relevant or needed to help describe the file. Automated metadata creation can be much more elementary, usually only capturing information such as file size, file extension, when the file was created, and who created the file.
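Automated metadata of this elementary kind can be read programmatically. As a small illustrative sketch (the file name is invented for the example), Python's standard library exposes a file's size, extension, and timestamps:

```python
import os
import time

def file_metadata(path):
    """Return basic automated metadata for a file: size, extension, timestamp."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,                # file size
        "extension": os.path.splitext(path)[1],  # file extension
        "modified": time.ctime(st.st_mtime),     # last-modified time
    }

# Example with a file we create ourselves:
with open("example.txt", "w") as f:
    f.write("hello")

meta = file_metadata("example.txt")
print(meta["size_bytes"], meta["extension"])  # 5 .txt
```

Note that this captures only the "elementary" metadata mentioned above; richer descriptive metadata (author, subject, keywords) would have to be entered manually.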
The differences between the file processing system and the database approach are as follows:
1. File-based system: the data and programs are inter-dependent. Database system: the data and programs are independent of each other.
2. File-based system: causes data redundancy; the same data may be duplicated in different files. Database system: controls data redundancy; each item of data appears only once in the system.
3. File-based system: causes data inconsistency; the data in different files may differ. Database system: data is always consistent, because it appears only once.
4. File-based system: data cannot easily be shared, because it is distributed across different files. Database system: data is easily shared, because it is stored in one place.
5. File-based system: data is widely spread, so the system provides poor security. Database system: provides many methods to maintain data security.
6. File-based system: does not provide consistency constraints. Database system: provides various consistency constraints to maintain data integrity.
7. File-based system: a less complex system. Database system: a very complex system.
8. File-based system: the cost is less than that of a database system. Database system: the cost is much more than that of a file processing system.
9. File-based system: takes more space, and memory is wasted. Database system: stores data more efficiently, takes less space, and memory is not wasted.
10. File-based system: generating the different reports needed to take a crucial decision is very difficult. Database system: reports can be generated very easily in the required format, because the data is stored in an organized manner and is easily retrieved.
11. File-based system: does not provide a concurrency facility. Database system: provides a concurrency facility.
12. File-based system: does not provide data atomicity. Database system: provides data atomicity.
13. File-based system: difficult to maintain, as it provides less controlling facility. Database system: provides many facilities to maintain programs.
14. File-based system: if one application fails, it does not affect the other files in the system. Database system: if the database fails, it affects all applications that depend on it.
15. File-based system: hardware cost is less. Database system: hardware cost is higher.
(b) What is a Database Management System? Explain the components of a Database Management System.
Ans:
Organizations employ Database Management Systems (DBMS) to help them effectively manage their data and derive relevant information from it. A DBMS is a technology tool that directly supports data management. It is a package designed to define, manipulate, and manage data in a database.
Some general functions of a DBMS:
o Allow the definition, creation, querying, update, and administration of databases.
o Define rules to validate the data, relieving users of framing programs for data maintenance.
o Convert an existing database, or archive a large and growing one.
o Run business applications which perform the tasks of managing business processes, interacting with end users and other applications to capture and analyze data.
Some well-known DBMSs are Microsoft SQL Server, Microsoft Access, Oracle, SAP, and others.
Components of DBMS
A DBMS has several components, each performing a significant task in the database management system environment. Below is a list of components within the database and its environment.
Software
This is the set of programs used to control and manage the overall database. It includes the DBMS software itself, the operating system, the network software used to share the data among users, and the application programs used to access data in the DBMS.
Hardware
Consists of a set of physical electronic devices such as computers, I/O devices, and storage devices. This provides the interface between the computers and the real-world systems.
Data
The DBMS exists to collect, store, process, and provide access to data, the most important component. The database contains both the actual (operational) data and the metadata.
Procedures
These are the instructions and rules that describe how to use the DBMS and how to design and run the database, using documented procedures to guide the users who operate and manage it.
Database Access Language
This is used to access the data to and from the database: to enter new data, update existing data, or retrieve required data. The user writes a set of appropriate commands in a database access language and submits these to the DBMS, which then processes the data and generates and displays a set of results in a user-readable form.
Query Processor
This transforms user queries into a series of low-level instructions. It reads the online user's query and translates it into an efficient series of operations in a form capable of being sent to the run-time data manager for execution.
Run Time Database Manager
Sometimes referred to as the database control system, this is the central software component of the DBMS that interfaces with user-submitted application programs and queries and handles database access at run time. Its function is to convert the operations in users' queries. It provides control to maintain the consistency, integrity, and security of the data.
Data Manager
Also called the cache manager, this is responsible for handling data in the database and providing a recovery mechanism that allows the system to recover the data after a failure.
Database Engine
The core service for storing, processing, and securing data. It provides controlled access and rapid transaction processing to address the requirements of the most demanding data-consuming applications. It is often used to create relational databases for online transaction processing or online analytical processing.
Data Dictionary
This is a reserved space within a database used to store information about the database itself. A data dictionary is a set of read-only tables and views containing information about the data used in the enterprise, ensuring that the database representation of the data follows one standard as defined in the dictionary.
Report Writer
Also referred to as the report generator, this is a program that extracts information from one or more files and presents the information in a specified format. Most report writers allow the user to select records that meet certain conditions, display selected fields in rows and columns, and format the data into different charts.
OR
(c) Explain the three-level architecture proposal for DBMS.
Ans:
In the previous tutorial we have seen the DBMS architectures: one-tier, two-tier, and three-tier. Here we discuss the three-level DBMS architecture in detail.
[Diagram: DBMS three-level architecture]
This architecture has three levels:
1. External level
2. Conceptual level
3. Internal level
1. External level
It is also called the view level. This level is called "view" because several users can view their desired data at this level; the data is internally fetched from the database with the help of the conceptual-level and internal-level mappings.
The user does not need to know database schema details such as the data structures, table definitions, etc. The user is concerned only with the data, which is returned to the view level after it has been fetched from the database (present at the internal level).
The external level is the "top level" of the three-level DBMS architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among data, the schema of the data, etc., is described at this level.
Database constraints and security are also implemented at this level of the architecture. This level is maintained by the DBA (database administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored on the storage devices and is also responsible for allocating space to the data. This is the lowest level of the architecture.
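As a small sketch of the idea (table and column names are invented for illustration), the external level can be approximated in SQL by a view that exposes only part of the conceptual schema, while the storage engine handles the internal level:

```python
import sqlite3

# Conceptual level: the full logical schema (a base table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO staff VALUES ('S1', 'Ann', 12000), ('S2', 'Bob', 9000)")

# External level: a view for one user group that hides the salary column.
conn.execute("CREATE VIEW staff_names AS SELECT staff_no, name FROM staff")

rows = conn.execute("SELECT name FROM staff_names ORDER BY name").fetchall()
print([r[0] for r in rows])  # ['Ann', 'Bob']
```

The internal level is invisible here: SQLite decides how the rows are physically laid out on disk, which is exactly the separation the three-level architecture describes.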
(d) Explain:
(i) Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the characteristic of being able to change the conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make changes in the conceptual view of the data, the user's view of the data is not affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the conceptual schema.
o If we make changes in the storage size of the database system server, the conceptual structure of the database is not affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
(ii) Data Integration
Ans:
Data integration involves combining data residing in different sources and providing users with a unified view of them [1]. This process becomes significant in a variety of situations, both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example). Data integration appears with increasing frequency as the volume of data (that is, big data [2]) and the need to share existing data explode [3]. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users.
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process extracts information from the source databases, transforms it, and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
The issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for the interoperability of heterogeneous databases [4]. The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991 for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms, and loads data from heterogeneous sources into a single view schema so that data from different sources become compatible [5]. By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries [6].
The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services like travel or classified advertisement web applications.
As of 2009, the trend in data integration favored loosening the coupling between data [citation needed] and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from the original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schemas of the original sources, transforming a query into specialized queries that match the schemas of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the Global-As-View (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the Local-As-View (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
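A minimal sketch of the mediated-schema idea (all source records, field names, and wrapper functions here are invented for illustration): wrapper code translates each source's records into one common schema, and queries run against the unified view rather than the original databases:

```python
# Two heterogeneous "source databases" with different schemas.
source_a = [{"fname": "Ada", "pay_usd": 50000}]
source_b = [{"employee": "Ben", "salary": 60000}]

# Wrapper code: map each source into the mediated schema (name, salary).
def wrap_a(rec):
    return {"name": rec["fname"], "salary": rec["pay_usd"]}

def wrap_b(rec):
    return {"name": rec["employee"], "salary": rec["salary"]}

def mediated_query(min_salary):
    """Run one query over the unified view of both sources."""
    unified = [wrap_a(r) for r in source_a] + [wrap_b(r) for r in source_b]
    return [r["name"] for r in unified if r["salary"] >= min_salary]

print(mediated_query(55000))  # ['Ben']
```

In GAV terms, the wrappers define the mediated schema's entities as views over the sources; adding a new source means writing one more wrapper.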
As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like "earnings", inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts. This approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking the similarities computed from different data sources on a single criterion, such as positive predictive value. This makes the data sources directly comparable, and they can be integrated even when the natures of the experiments are distinct [7].
As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology, which results in the development of disparate data models. Disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models [8]. One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationship that relates the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, these databases are integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) enterprise data warehouses. Since 2013, data lake approaches have risen to the level of data hubs. (See the popularity of all three search terms on Google Trends [9].) These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the hub.
Q2
EITHER
(a) Explain the E-R model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Building the model is an iterative, team-oriented process; all business managers (or designates) involved should validate it with a "bottom-up" approach. It has three primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types:
o Simple/Single attributes
o Composite attributes
o Multivalued attributes
o Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------ M
Many to one: M ------ 1
Many to many: M ------ M
Symbols and their meanings:
o Rectangles represent entity sets.
o Diamonds represent relationship sets.
o Lines link attributes to entity sets and entity sets to relationship sets.
o Ellipses represent attributes.
o Double ellipses represent multivalued attributes.
o Dashed ellipses denote derived attributes.
o Underlines indicate primary key attributes.
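As an illustrative sketch (the entity and attribute names are invented), the "owns" relationship mentioned above can be realized as relational tables, with the one-to-many relationship carried by a foreign key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Entity sets become tables; attributes become columns.
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
# The one-to-many 'owns' relationship: each computer references its owning company.
conn.execute("""CREATE TABLE computer (
    computer_id INTEGER PRIMARY KEY,
    model TEXT,
    company_id INTEGER REFERENCES company(company_id))""")

conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO computer VALUES (10, 'laptop', 1), (11, 'server', 1)")

# One company side, many computer side: 1 ------ M.
n = conn.execute("SELECT COUNT(*) FROM computer WHERE company_id = 1").fetchone()[0]
print(n)  # 2
```

A many-to-many relationship would instead need a third table holding pairs of keys, which is why the diamond in the diagram sometimes becomes its own table.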
Example
(b) Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other words, ER diagrams help you to explain the logical structure of databases. At first look an ER diagram appears very similar to a flowchart; however, the ER diagram includes many specialized symbols whose meanings make this model unique.
Sample ER Diagram
Facts about the ER Diagram Model:
o The ER model allows you to draw a database design.
o It is an easy-to-use graphical tool for modeling data.
o It is widely used in database design.
o It is a GUI representation of the logical structure of a database.
o It helps you to identify the entities which exist in a system and the relationships between those entities.
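A hedged sketch of how the Customer entity given above might be realized as a relational table: flattening the composite attributes (name, address, street) into simple columns is one common design choice, not the only one (the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes 'name', 'address', and 'street' flattened into columns.
conn.execute("""CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,  -- primary key attribute
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    phone_number     TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT)""")

conn.execute("""INSERT INTO customer (customer_id, first_name, last_name, city)
                VALUES (1, 'John', 'Smith', 'Nagpur')""")
row = conn.execute("SELECT first_name, city FROM customer").fetchone()
print(row)  # ('John', 'Smith') would be wrong; this prints ('John', 'Nagpur')
```

If phone_number were multivalued, it would instead move to its own table keyed by customer_id, mirroring the double-ellipse notation in the diagram.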
(b) Differentiate between the Network and Hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships.
2. Based on a parent-child relationship.
3. Retrieval algorithms are complex and asymmetric.
4. More data redundancy.
Network model
1. Many-to-many relationships.
2. Many parents as well as many children.
3. Retrieval algorithms are complex and symmetric.
4. Less data redundancy than the hierarchical model.
Relational model
1. One-to-one, one-to-many, and many-to-many relationships.
2. Based on relational data structures.
3. Retrieval algorithms are simple and symmetric.
4. Less data redundancy.
OR
(c) Draw an E-R diagram for a Library Management System.
Ans: [E-R diagram not reproduced in this transcript]
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-Sequential file
Ans:
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some search key.
o Records are chained together by pointers to permit fast retrieval in search-key order.
o Each pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; else put the new record in an overflow block.
o Adjust the pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If very few records are in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and are done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case, the pointer fields are no longer required.
The Sequential File
A fixed format is used for records:
o Records are the same length.
o All fields are the same (order and length).
o Field names and lengths are attributes of the file.
o One field is the key field; it uniquely identifies the record.
o Records are stored in key sequence.
New records are placed in a log file or transaction file, and a batch update is performed to merge the log file with the master file.
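A minimal sketch of the overflow idea described above (the record layout is invented for illustration): the main area stays physically sorted, inserts that do not fit go to an overflow area, and a logical scan merges the two until an expensive reorganization restores physical order:

```python
# Main area: records kept in physical search-key order.
main_area = [(10, "rec-A"), (20, "rec-B"), (40, "rec-D")]
# Overflow area: out-of-order inserts land here until reorganization.
overflow = []

def insert(record):
    """No free slot in the main area, so append to the overflow block."""
    overflow.append(record)

def scan_in_key_order():
    """Logical sequential scan: merge the main area with the overflow records."""
    return sorted(main_area + overflow)

def reorganize():
    """Expensive: rewrite the whole file in physical key order (done at low load)."""
    global main_area, overflow
    main_area = scan_in_key_order()
    overflow = []

insert((30, "rec-C"))
print([k for k, _ in scan_in_key_order()])  # [10, 20, 30, 40]
reorganize()
print(len(overflow))  # 0
```

The trade-off in the notes is visible here: scans stay correct while overflow is small, but every scan pays a sorting cost until the file is reorganized.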
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS Today more than 85 companies are part of the DAFS
Collaborative
Q3
EITHER
(a) Explain tuple relational calculus.
Ans:
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values, it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true, based on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Example
To find the details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Quantifiers
We can use two quantifiers to tell how many instances the predicate applies to:
o Existential quantifier ∃ ('there exists')
o Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means: 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and it is located in London.'
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means: 'For all Branch tuples, the address is not in Paris.'
We can also write ~(∃B)(B.city = 'Paris'), which means: 'There are no branches with an address in Paris.'
Well-Formed Formulae
Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
o R(Si), where Si is a tuple variable and R is a relation
o Si.a1 θ Sj.a2
o Si.a1 θ c
We can recursively build up formulae from atoms:
o An atom is a formula.
o If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ~F1.
o If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
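As a hedged sketch (the sample Staff tuples are invented), tuple relational calculus expressions map naturally onto Python comprehensions: the relation is the range of the tuple variable, and the predicate is the filter:

```python
# The Staff relation; dictionaries play the role of tuples.
staff = [
    {"staffNo": "SL21", "fName": "John",  "position": "Manager",    "salary": 30000},
    {"staffNo": "SG37", "fName": "Ann",   "position": "Assistant",  "salary": 12000},
    {"staffNo": "SG14", "fName": "David", "position": "Supervisor", "salary": 18000},
]

# {S | Staff(S) ∧ S.salary > 10000}
well_paid = [S for S in staff if S["salary"] > 10000]

# {S.fName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
manager_names = [S["fName"] for S in staff
                 if S["position"] == "Manager" and S["salary"] > 25000]

print(len(well_paid), manager_names)  # 3 ['John']
```

The safety restriction above is also visible here: iterating only over staff means the query can never produce the infinite set {S | ~Staff(S)}.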
Data Manipulation in SQL
o SELECT, UPDATE, DELETE, and INSERT statements
o Basic data retrieval
o Condition specification
o Arithmetic and aggregate operators
o SQL joins: multiple-table queries
o Set manipulation: ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
o Categorization
o Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later using INSERT.
CREATE TABLE S (
    SNO    CHAR(5),
    SNAME  CHAR(20),
    STATUS DECIMAL(3),
    CITY   CHAR(15),
    PRIMARY KEY (SNO) );
A table name and unique column names must be specified. Columns which are defined as primary keys will never have two rows with the same key value. A primary key may consist of more than one column (values unique in combination); this is called a composite key.
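Run against SQLite as a small sketch (SQLite accepts this DDL, though it treats CHAR/DECIMAL types loosely), the table above can be created and its key constraint exercised:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE S (
    SNO    CHAR(5),
    SNAME  CHAR(20),
    STATUS DECIMAL(3),
    CITY   CHAR(15),
    PRIMARY KEY (SNO) )""")

conn.execute("INSERT INTO S VALUES ('S1', 'Smith', 20, 'London')")

# A second row with the same key value violates the primary key.
try:
    conn.execute("INSERT INTO S VALUES ('S1', 'Jones', 10, 'Paris')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False
```

The rejected insert demonstrates the rule stated above: no two rows may share the same primary key value.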
(b) Explain data manipulation in SQL.
Ans:
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language [1]. Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database [2]. Other forms of DML are those used by IMS/DL/I, and by CODASYL databases such as IDMS, among others.
In SQL, the data manipulation language comprises the SQL-data change statements [3], which modify stored data but not the schema or database objects. Manipulation of persistent database objects (e.g. tables or stored procedures) via the SQL schema statements [3], rather than the data stored within them, is considered to be part of a separate data definition language (DDL). In SQL these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct in their overall function [3].
The SQL-data change statements are a subset of the SQL-data statements; this set also contains the SELECT query statement [3], which, strictly speaking, is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of the DML [4], so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation and thus is strictly considered to be DML, because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
o SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
o SELECT ... INTO ...
o INSERT INTO ... VALUES ...
o UPDATE ... SET ... WHERE ...
o DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
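As a small sketch of those verbs in action (the table and data are invented; run on SQLite, which supports this subset):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES ...
conn.execute("INSERT INTO employees (first_name, last_name, fname) "
             "VALUES ('John', 'Capita', 'xcapit00')")
conn.execute("INSERT INTO employees VALUES ('Mary', 'Major', 'mmajor01')")

# UPDATE ... SET ... WHERE ...
conn.execute("UPDATE employees SET last_name = 'Kapita' WHERE fname = 'xcapit00'")

# DELETE FROM ... WHERE ...
conn.execute("DELETE FROM employees WHERE fname = 'mmajor01'")

# SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
rows = conn.execute("SELECT first_name, last_name FROM employees").fetchall()
print(rows)  # [('John', 'Kapita')]
```

Note that only the INSERT, UPDATE, and DELETE statements here are SQL-data change statements; the final SELECT reads data without modifying it.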
OR
(c) Explain the following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
already applied in the design. There are two types of integrity mentioned in
integrity rules: entity and reference. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is for each row to have a unique
identity, so that foreign key values can properly reference primary key values.
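A minimal sketch of entity integrity enforcement, using Python's sqlite3 and a hypothetical department table (the explicit NOT NULL is needed because, as a historical quirk, SQLite otherwise permits NULL in non-INTEGER primary key columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE department (dept_id TEXT NOT NULL PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO department VALUES ('D1', 'Sales')")

# A duplicate primary key value violates entity integrity (uniqueness).
try:
    cur.execute("INSERT INTO department VALUES ('D1', 'HR')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A NULL primary key violates entity integrity (no null key values).
try:
    cur.execute("INSERT INTO department VALUES (NULL, 'IT')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(duplicate_rejected, null_rejected)  # True True
```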
Theta Join
In a theta join we apply a condition to the input relation(s), and only the selected
rows are used in the cross product to be merged and included in the output. In a
normal cross product, all the rows of one relation are mapped/merged with all the
rows of the second relation, but here only the selected rows of one relation enter
the cross product with the second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition applied during the select
operation on one relation, and only the selected rows are cross-produced with all the
rows of the second relation. For example, given two relations FACULTY and
COURSE, we first apply the select operation on the FACULTY relation to
select certain specific rows; these rows then form a cross product with the
COURSE relation. This is the difference between a cross product and a theta join.
Seeing both relations, their attributes, and the cross product carried out after the
select operation makes the difference between cross product and theta join clear.
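The difference can be sketched in plain Python with hypothetical FACULTY and COURSE data (not the relations shown in the original figure): the theta join keeps only the cross-product pairs satisfying the condition θ.

```python
# Hypothetical relations: FACULTY(fac_id, name, salary), COURSE(course_id, title, fac_id)
faculty = [("F1", "Ahmed", 3000), ("F2", "Sara", 5000)]
course = [("C1", "DBMS", "F1"), ("C2", "OS", "F2"), ("C3", "Networks", "F1")]

# Cross product: every FACULTY row paired with every COURSE row.
cross = [(f, c) for f in faculty for c in course]

# Theta join with condition faculty.fac_id = course.fac_id
# (an equijoin, which is a special case of theta join).
theta = [(f, c) for f, c in cross if f[0] == c[2]]

print(len(cross), len(theta))  # 6 3
```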
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a
valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e., the "CompanyId" field). This has resulted in an "orphaned record".
So, referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
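A sketch of these rules being enforced, using Python's sqlite3 and hypothetical company/product tables (note that SQLite checks foreign keys only when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(company_id))""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")  # valid: parent row 15 exists

# Adding a related record with no associated parent record is rejected.
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")
    orphan_insert_ok = True
except sqlite3.IntegrityError:
    orphan_insert_ok = False

# Deleting a parent record that has matching related records is rejected.
try:
    conn.execute("DELETE FROM company WHERE company_id = 15")
    parent_delete_ok = True
except sqlite3.IntegrityError:
    parent_delete_ok = False

print(orphan_insert_ok, parent_delete_ok)  # False False
```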
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the
correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and
consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of
working with databases.
(d) Explain the following domain in detail with example
Ans: Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which
a data element may contain. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male and 'F' for female, with NULL for
records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel
value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
This definition combines the concept of domain as an area over which control is exercised with
the mathematical idea of a set of values of an independent variable for which a function is
defined.
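Both kinds of domain rule can be sketched as check constraints, using Python's sqlite3 and a hypothetical person table (note that a NULL value passes a CHECK, matching the "unknown" case described above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')),   -- enumerated domain
    salary REAL CHECK (salary > 0))             -- values must be positive
""")
conn.execute("INSERT INTO person VALUES ('John', 'M', 1000)")
conn.execute("INSERT INTO person VALUES ('Pat', NULL, 500)")  # NULL passes a CHECK

# A value outside the enumerated domain is rejected.
try:
    conn.execute("INSERT INTO person VALUES ('X', 'Q', 100)")
    out_of_domain_ok = True
except sqlite3.IntegrityError:
    out_of_domain_ok = False

rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(out_of_domain_ok, rows)  # False 2
```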
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
The last is written M:N, not M:M, because the two "many" sides are in general different numbers.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; when it does, you may consider combining
the two entities into one.
For example, an employee is allocated a company car, which can only be driven by that
employee.
Therefore, there is a one-to-one relationship between employee and company car.
One-to-many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities, an employee works in
one department, but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would resolve any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely survive in a final design; normally they occur
because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
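In a relational schema, an M:N relationship is implemented by adding a junction table holding a pair of foreign keys. A sketch with hypothetical employee/project data, using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
-- junction (associative) table resolving the M:N relationship into two 1:M links
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id));
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

# One employee, many projects: the projects Ann works on.
ann = [r[0] for r in conn.execute(
    """SELECT p.title FROM project p
       JOIN works_on w ON w.proj_id = p.proj_id
       WHERE w.emp_id = 1 ORDER BY p.proj_id""")]
print(ann)  # ['Payroll', 'Website']
```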
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct, and self-contained language. The DBTG proposal is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the figure.
The architecture of the DBTG model can be divided into three different levels, like the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item, and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data
item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF, or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table,
transform data from the information source (e.g., a form) into table format with columns
and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group by
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
if A and B are attributes of a relation,
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently, 2NF means 1NF and no partial functional dependencies.
A partial functional dependency arises when one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies on the primary key exist, remove them by placing them in a new
relation along with a copy of their determinant.
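The 1NF-to-2NF step can be sketched in plain Python, using a hypothetical order_line relation with composite key (order_id, product_id), where customer_name depends only on order_id (a partial dependency):

```python
# 1NF relation: (order_id, product_id, qty, customer_name)
# customer_name depends only on order_id, part of the composite key.
order_line = [
    (1, 'P1', 2, 'Ann'),
    (1, 'P2', 1, 'Ann'),
    (2, 'P1', 5, 'Bob'),
]

# 2NF decomposition: move the partially dependent attribute into a new
# relation keyed by its determinant, keeping a copy of the determinant.
orders = sorted({(oid, cust) for oid, _, _, cust in order_line})  # (order_id, customer_name)
lines = [(oid, pid, qty) for oid, pid, qty, _ in order_line]      # (order_id, product_id, qty)

# The decomposition is lossless: joining the two relations on order_id
# reconstructs the original relation.
cust_of = dict(orders)
rejoined = [(oid, pid, qty, cust_of[oid]) for oid, pid, qty in lines]
print(rejoined == order_line)  # True
```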
Third Normal Form (3NF)
2NF and no transitive dependencies.
A transitive dependency is a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
if A, B, and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with suitable example.
Ans: As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in
4NF if and only if it is in BCNF and its multivalued dependencies are functional
dependencies. 4NF removes the unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
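A multivalued dependency and its 4NF decomposition can be sketched in plain Python, using a hypothetical course/teacher/book relation where course ↠ teacher and course ↠ book hold independently:

```python
# Every teacher of a course is paired with every textbook of that course,
# so the relation holds the MVDs course ->> teacher and course ->> book.
ctb = {("DB", t, b) for t in ("Ali", "Sara") for b in ("Elmasri", "Date")}

# 4NF decomposition: project onto (course, teacher) and (course, book).
ct = {(c, t) for c, t, _ in ctb}
cb = {(c, b) for c, _, b in ctb}

# The natural join of the two projections reproduces the original
# relation, so the decomposition is lossless.
rejoined = {(c, t, b) for c, t in ct for c2, b in cb if c == c2}
print(rejoined == ctb)  # True
```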
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of inference axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
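Derivations like the one above can be checked mechanically with the standard attribute-closure algorithm: X → Y is derivable from F iff Y is contained in the closure of X under F. A sketch in Python, using Maier's FD set from the example:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    Repeatedly apply every FD whose left-hand side is already contained
    in the closure, until no more attributes can be added.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H}
F = [({'A', 'B'}, {'E'}), ({'A', 'G'}, {'J'}), ({'B', 'E'}, {'I'}),
     ({'E'}, {'G'}), ({'G', 'I'}, {'H'})]

cl = closure({'A', 'B'}, F)
print({'G', 'H'} <= cl)  # True: AB -> GH follows from F
```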
Significance in Relational Database design: The inference axioms let a designer compute the
closure of a set of functional dependencies, and hence determine candidate keys and test whether
a decomposition is lossless or dependency-preserving; they are therefore the formal basis for
normalization. A relational database is a database structure, commonly used in GIS, in
which data is stored in two-dimensional tables and multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System is a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables that:
• are manipulated a set at a time, rather than a record at a time;
• are manipulated using SQL. The relational model was proposed by Dr. Codd in 1970
and is the basis for the relational database management system (RDBMS).
The relational model contains the following components:
• a collection of objects or relations;
• a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be dealt with in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
then, to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression "ACID transaction".
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like a semi-complete transaction. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive locks: this type of locking mechanism differentiates the locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock, since allowing more than one transaction to write to the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks
when all its operations are over. If all the locks are not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it
only releases the acquired locks.
Two-phase locking has two phases: a growing phase, where all the locks are being acquired by
the transaction, and a shrinking phase, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
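The two-phase rule itself can be sketched in a few lines of Python. This is a toy illustration of the rule for a single transaction, not a full concurrent lock manager:

```python
class TwoPhaseTxn:
    """Enforces the 2PL rule: once the first lock is released (shrinking
    phase), the transaction may not acquire any new locks."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock requested after first unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True  # the growing phase ends at the first release
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")      # growing phase: locks may still be acquired
t.unlock("A")    # first release: shrinking phase begins
try:
    t.lock("C")  # not allowed under 2PL
    violated = False
except RuntimeError:
    violated = True
print(violated)  # True
```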
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it; Strict-2PL holds all the locks until the commit point and releases them all
at once.
Strict-2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system
know when the last read and write operations were performed on the data item.
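A sketch of the basic timestamp-ordering write rule in Python (simplified: a write is rejected, forcing a rollback, when a younger transaction has already read or written the item):

```python
class Item:
    """A data item carrying the latest read and write timestamps."""

    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest transaction to read it
        self.write_ts = 0  # timestamp of the youngest transaction to write it

def write(item, ts):
    """Basic timestamp-ordering check for a write by a transaction with
    timestamp ts: reject if a younger transaction already read or wrote."""
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"  # the older transaction must restart
    item.write_ts = ts
    return "ok"

x = Item()
r1 = write(x, 5)  # transaction with ts=5 writes x
r2 = write(x, 3)  # older transaction (ts=3) conflicts with the younger write
print(r1, r2)  # ok rollback
```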
OR
(b) Explain database security mechanisms. (8)
Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in the database
The database server
The database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment against theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge based database system in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: a conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: an essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation, and Durability.
Large, long-lived data: a corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze, and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of tables that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge-
base. Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and Object-Oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext, and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes. Knowledge Management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system, to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data, since there is no way they can interfere with one another. Any practical database, though, has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
Ans: In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all of them occur or nothing occurs. A guarantee of atomicity prevents updates to the database from
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
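The bank-transfer example can be sketched with Python's built-in sqlite3 module (the table and account names here are invented for illustration): if a failure occurs between the withdrawal and the deposit, the whole transaction rolls back and neither operation takes effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(frm, to, amount, fail_midway=False):
    # 'with conn' runs both updates in one transaction: it commits on
    # success and rolls back everything if any exception is raised.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, frm))
        if fail_midway:
            raise RuntimeError("simulated crash between withdraw and deposit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, to))

try:
    transfer("A", "B", 30, fail_midway=True)   # crashes mid-transfer
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} - the partial withdrawal was undone
```

Because the withdrawal was rolled back, no money was lost: the database is exactly as it was before the failed transfer.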
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for a DBMS:
- All users should be able to access the same data.
- A user's view is immune to changes made in other views.
- Users should not need to know physical database storage details.
- The DBA should be able to change database storage structures without affecting the users' views.
- The internal structure of the database should be unaffected by changes to physical aspects of storage.
- The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a) External level
b) Conceptual level
c) Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view differs from the way data is stored in the database; it describes only a part of
the actual database. Because a user is not concerned with the entire database, only the part that
is relevant to that user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
being used; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
The objectives of the three-level architecture proposal for a DBMS are thus suitably explained
above.
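As an illustrative sketch (table, view and column names are invented), a SQL view is one concrete way an external view can be defined over a conceptual-level table; the sketch uses Python's built-in sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full base table, as the DBA defines it.
conn.execute("CREATE TABLE employee ("
             "emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Asha", "HR", 40000), (2, "Ravi", "IT", 55000)])
# External level: a view exposing only what this class of user may see;
# the salary column is hidden from the view.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name, dept FROM employee")
rows = conn.execute("SELECT * FROM emp_public ORDER BY emp_id").fetchall()
print(rows)  # [(1, 'Asha', 'HR'), (2, 'Ravi', 'IT')]
```

If the DBA later adds columns to employee, emp_public is unaffected - a small demonstration of logical data independence.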
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig.: Structure of a Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files and data items, storage
details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer into the best way to execute the query, and
then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
- Converting operations in user queries, coming from the application programs or from the combination of
DML compiler and query optimizer (together known as the query processor), from the user's logical view
to the physical file system.
- Controlling access to DBMS information stored on disk.
- Handling buffers in main memory.
- Enforcing constraints to maintain the consistency and integrity of the data.
- Synchronizing the simultaneous operations performed by concurrent users.
- Controlling backup and recovery operations.
4. Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation and accuracy, and may be regarded as an important part of the DBMS.
Importance of the Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other supporting system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of the ATM user, only one or more of his or her own accounts. Other naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise through the limited interaction they are permitted with the database via the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs may be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications and data sharing: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e., program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, since such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g., changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g., from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating of the same data in different files.
• Time wasted in entering the same data again and again.
• Needless use of computer resources.
• Great difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data, so we need to remove this duplication of
data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a
file processing system, where a programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data, and in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work the most
important and therefore its needs the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may then become necessary to ignore some
requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed for DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand; the overall view is often not considered. Building an overall view of the
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes for
backup and recovery from failures, including disk crashes, power failures and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved are quite complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The ER model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Building it is an
iterative, team-oriented process in which all business managers (or their designates) should be
involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity type is a category; an entity, strictly speaking, is an instance of a given entity type. There are
usually many instances of an entity type. Because the term "entity type" is somewhat cumbersome, most
people tend to use the term "entity" as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student entity's attributes include student ID, student name,
address, etc.
Attributes are of various types:
- Simple/single attributes
- Composite attributes
- Multivalued attributes
- Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship
between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many: 1 ------- M; Many-to-one: M ------- 1
Many-to-many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
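A hypothetical Python sketch of this Customer entity (attribute names are taken from the example above; the class layout itself is an assumption): composite attributes become nested classes, the multivalued attribute becomes a list, and a derived attribute becomes a computed property.

```python
from dataclasses import dataclass, field

@dataclass
class Name:                       # composite attribute
    first_name: str
    last_name: str
    middle_name: str = ""

@dataclass
class Address:                    # composite attribute containing Street
    city: str
    state: str
    zip_code: str
    street_name: str
    street_number: str = ""
    apartment_number: str = ""

@dataclass
class Customer:                   # entity type; customer_id plays the key role
    customer_id: int
    name: Name
    address: Address
    date_of_birth: str = ""
    phone_numbers: list = field(default_factory=list)  # multivalued attribute

    @property
    def full_name(self) -> str:   # derived attribute: computed, not stored
        return f"{self.name.first_name} {self.name.last_name}"

c = Customer(1, Name("Asha", "Patil"),
             Address("Nagpur", "MH", "440001", "Main Road"))
c.phone_numbers.append("98765xxxxx")
print(c.full_name)  # Asha Patil
```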
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: With sequential files, index-sequential files and direct files, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
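A minimal sketch of a secondary index, assuming an in-memory student file keyed by a hypothetical stud_id; note that one secondary key value ("Asha") matches several records, unlike a primary key lookup:

```python
from collections import defaultdict

# Primary file: records keyed by the primary key stud_id.
students = {
    101: {"stud_id": 101, "stud_name": "Asha", "dept": "MCA"},
    102: {"stud_id": 102, "stud_name": "Ravi", "dept": "MCA"},
    103: {"stud_id": 103, "stud_name": "Asha", "dept": "MBA"},
}

# Secondary index on stud_name: maps each name to the SET of matching
# primary keys, because a secondary key need not be unique.
name_index = defaultdict(set)
for pk, rec in students.items():
    name_index[rec["stud_name"]].add(pk)

# Secondary-key retrieval: one key value, possibly many records.
matches = [students[pk] for pk in sorted(name_index["Asha"])]
print([m["stud_id"] for m in matches])  # [101, 103]
```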
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
be non-loss decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency in the table is a consequence of its candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
composed of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be non-loss decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- you always need to know two values (pairwise);
- for any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor, to determine the vendor you must know the buyer and
the item, and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
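The decomposition can be checked mechanically. The sketch below (illustrative, using the sample data above) joins the three binary projections and confirms both the lossless join and, after Sally buys Claiborne jeans, the extra row the projections force for Mary - the pairwise cycle at work:

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

def join3(bv, bi, vi):
    # Natural join of the three binary projections.
    return {(b, v, i) for (b, v) in bv for (b2, i) in bi
            if b2 == b if (v, i) in vi}

bv = {(b, v) for (b, v, i) in buying}   # Buyer-Vendor projection
bi = {(b, i) for (b, v, i) in buying}   # Buyer-Item projection
vi = {(v, i) for (b, v, i) in buying}   # Vendor-Item projection
print(join3(bv, bi, vi) == buying)      # True: the 3-way join is lossless

# Record only Sally's Claiborne-jeans purchase; the projections now also
# imply the row (Mary, Liz Claiborne, Jeans).
bigger = buying | {("Sally", "Liz Claiborne", "Jeans")}
bv2 = {(b, v) for (b, v, i) in bigger}
bi2 = {(b, i) for (b, v, i) in bigger}
vi2 = {(v, i) for (b, v, i) in bigger}
print(("Mary", "Liz Claiborne", "Jeans") in join3(bv2, bi2, vi2))  # True
```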
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
Fig.: IMS system architecture - application programs (host language + DL/I calls) access the IMS control program through PCBs grouped into PSBs (one PSB per application, e.g. PSB-A for Application A and PSB-B for Application B); the control program maps requests onto the physical databases defined by the DBDs.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined, together with its mapping to storage, by a database description (DBD). The set of
DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers using a host language, from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are
supported via user-written online application programs; IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Ans: Functional Dependency - The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key - A possible key:
- each non-key field is functionally dependent on every candidate key;
- no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in
normalization: they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
a dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation
and has the property that every functional dependency in Y is implied by
the functional dependencies in X.
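A small, illustrative checker for whether a functional dependency holds in a given extension of a relation (the relation and attribute names here are invented): the dependency fails as soon as one determinant value maps to two different dependent values.

```python
def holds(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds in rows,
    where rows are dicts and lhs/rhs are tuples of attribute names."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant, different dependent value
    return True

emp = [
    {"emp_id": 1, "dept": "IT", "dept_city": "Pune"},
    {"emp_id": 2, "dept": "IT", "dept_city": "Pune"},
    {"emp_id": 3, "dept": "HR", "dept_city": "Nagpur"},
]
print(holds(emp, ("emp_id",), ("dept",)))  # True: emp_id determines dept
print(holds(emp, ("dept",), ("emp_id",)))  # False: "IT" maps to both 1 and 2
```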
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF;
we will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies
between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
- NF2: non-first normal form.
- 1NF: R is in 1NF iff all domain values are atomic.
- 2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key.
- 3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the key.
- BCNF: R is in BCNF iff every determinant is a candidate key.
- Determinant: an attribute on which some other attribute is fully functionally dependent.

Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are in fact functional dependencies. 4NF
thus removes an unwanted data structure: multivalued dependencies.
Either of the following conditions must hold for a relation to be in fourth normal form:
- there is no multivalued dependency in the relation; or
- there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it takes
multivalued dependencies into account.
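A multivalued dependency can be tested directly from its definition: X ->> Y holds in R(X, Y, Z) iff R equals the join of its projections on (X, Y) and (X, Z). An illustrative sketch with invented data, where instructors and textbooks for a course vary independently:

```python
def mvd_holds(rows, a, b, c):
    """Return True iff the multivalued dependency a ->> b holds in a
    relation with attributes (a, b, c): it holds iff the relation equals
    the join of its projections on (a, b) and (a, c)."""
    ab = {(r[a], r[b]) for r in rows}
    ac = {(r[a], r[c]) for r in rows}
    joined = {(x, y, z) for (x, y) in ab for (x2, z) in ac if x2 == x}
    return joined == {(r[a], r[b], r[c]) for r in rows}

# course ->> instructor: every instructor pairs with every text.
rows = [
    {"course": "DBMS", "instructor": "Rao",   "text": "Date"},
    {"course": "DBMS", "instructor": "Rao",   "text": "Navathe"},
    {"course": "DBMS", "instructor": "Mehta", "text": "Date"},
    {"course": "DBMS", "instructor": "Mehta", "text": "Navathe"},
]
print(mvd_holds(rows, "course", "instructor", "text"))       # True

# Drop one combination: instructor and text are no longer independent.
print(mvd_holds(rows[:-1], "course", "instructor", "text"))  # False
```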
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way in which the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions and account entries.
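The "pointer-following instead of joins" point can be sketched as follows (a toy illustration, not any particular OODBMS API): the account holds direct references to its transaction objects, so fetching them requires no join on a foreign key.

```python
class Transaction:
    def __init__(self, amount, kind):
        self.amount, self.kind = amount, kind

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []   # direct object references, not foreign keys

acct = Account("Sally")
acct.transactions.append(Transaction(-40, "withdrawal"))
acct.transactions.append(Transaction(100, "deposit"))

# Retrieval follows pointers instead of joining tables on account_id.
total = sum(t.amount for t in acct.transactions)
print(total)  # 60
```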
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
- both the speed and size of your transaction log backups;
- the degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can
recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index
creation is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
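The recovery model is a per-database setting. As a brief illustration (T-SQL; the database name `Sales` is an assumption, and the `sys.databases` catalog view applies to SQL Server 2005 and later), it can be inspected and changed like this:

```sql
-- Inspect the current recovery model (database name is illustrative).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'Sales';

-- Switch between the three models described above.
ALTER DATABASE Sales SET RECOVERY FULL;
-- ALTER DATABASE Sales SET RECOVERY BULK_LOGGED;
-- ALTER DATABASE Sales SET RECOVERY SIMPLE;
```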
(d) Describe deadlocks in a distributed system.
Ans
key factor in determining position in a search until the late 1990s. The increase in search engine optimization (SEO) towards the end of the 1990s led to many websites "keyword stuffing" their metadata to trick search engines, making their websites seem more relevant than others. Since then, search engines have reduced their reliance on metatags, though they are still factored in when indexing pages. Many search engines also try to halt web pages' ability to thwart their systems by regularly changing their criteria for rankings, with Google being notorious for frequently changing its highly undisclosed ranking algorithms.
Metadata can be created manually or by automated information processing. Manual creation tends to be more accurate, allowing the user to input any information they feel is relevant or needed to help describe the file. Automated metadata creation can be much more elementary, usually only displaying information such as file size, file extension, when the file was created, and who created the file.
The differences between a file processing system and a database system are as follows:
1. File-based system: the data and programs are inter-dependent. Database system: the data and programs are independent of each other.
2. File-based system: causes data redundancy; the same data may be duplicated in different files. Database system: controls data redundancy; each item of data appears only once in the system.
3. File-based system: causes data inconsistency, because the same data in different files may differ. Database system: data is always consistent, because it appears only once.
4. File-based system: data cannot easily be shared, because it is distributed across different files. Database system: data is easily shared, because it is stored in one place.
5. File-based system: data is widely spread, so it provides poor security. Database system: provides many methods to maintain data security.
6. File-based system: does not provide consistency constraints. Database system: provides various consistency constraints to maintain data integrity.
7. File-based system: less complex. Database system: very complex.
8. File-based system: costs less than a database system. Database system: costs much more than a file processing system.
9. File-based system: takes more space, and memory is wasted in this approach. Database system: stores data more efficiently; it takes less space and memory is not wasted.
10. File-based system: generating the different reports needed to take a crucial decision is very difficult. Database system: reports can be generated very easily in the required format, because data is stored in an organized manner and is easily retrieved.
11. File-based system: does not provide a concurrency facility. Database system: provides a concurrency facility.
12. File-based system: does not provide data atomicity. Database system: provides data atomicity.
13. File-based system: difficult to maintain, as it provides less control. Database system: provides many facilities to maintain programs.
14. File-based system: if one application fails, it does not affect other files in the system. Database system: if the database fails, it affects all applications that depend on it.
15. File-based system: hardware cost is less. Database system: hardware cost is higher than for a file system.
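Points 11 and 12 above (concurrency and atomicity) are the easiest to see in running code. The following is a minimal sketch, using Python's built-in `sqlite3` module with an illustrative `accounts` table, of the atomicity a database system provides: either every statement in a transaction applies, or none does.

```python
import sqlite3

# Illustrative schema and values: a two-account transfer that fails midway.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

try:
    with conn:  # the with-block is one transaction: commit on success, rollback on error
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE id = 1")
        raise RuntimeError("simulated failure mid-transfer")
        conn.execute("UPDATE accounts SET balance = balance + 80 WHERE id = 2")
except RuntimeError:
    pass  # the half-finished transaction was rolled back automatically

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # both balances unchanged: {1: 100, 2: 50}
```

A file processing system, by contrast, would leave the first file half-updated after such a failure.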
(b) What is a Database Management System? Explain the components of a Database Management System.
Ans
Organizations employ Database Management Systems (DBMS) to help them effectively manage their data and derive relevant information from it. A DBMS is a technology tool that directly supports data management: a package designed to define, manipulate, and manage data in a database.
Some general functions of a DBMS:
- Allow the definition, creation, querying, update, and administration of databases.
- Define rules to validate the data, relieving users of framing programs for data maintenance.
- Convert an existing database, or archive a large and growing one.
- Run business applications which perform the tasks of managing business processes, interacting with end-users and other applications to capture and analyze data.
Some well-known DBMSs are Microsoft SQL Server, Microsoft Access, Oracle, SAP, and others.
Components of DBMS
A DBMS has several components, each performing significant tasks in the database management system environment. Below is a list of the components within the database and its environment.
Software
This is the set of programs used to control and manage the overall database. It includes the DBMS software itself, the operating system, the network software used to share the data among users, and the application programs used to access data in the DBMS.
Hardware
This consists of the physical electronic devices such as computers, I/O devices, and storage devices. It provides the interface between the computers and real-world systems.
Data
The most important component: the DBMS exists to collect, store, process, and access data. The database contains both the actual (operational) data and the metadata.
Procedures
These are the instructions and rules that assist in using the DBMS and in designing and running the database, using documented procedures to guide the users who operate and manage it.
Database Access Language
This is used to access data to and from the database: to enter new data, update existing data, or retrieve required data. The user writes a set of appropriate commands in a database access language and submits these to the DBMS, which processes the data and generates and displays a set of results in a user-readable form.
Query Processor
This transforms user queries into a series of low-level instructions. It reads the online user's query and translates it into an efficient series of operations in a form capable of being sent to the run-time data manager for execution.
Run-Time Database Manager
Sometimes referred to as the database control system, this is the central software component of the DBMS. It interfaces with user-submitted application programs and queries and handles database access at run time; its function is to carry out the operations in users' queries. It provides control to maintain the consistency, integrity, and security of the data.
Data Manager
Also called the cache manager, this is responsible for handling data in the database, providing recovery to the system so that it can restore the data after a failure.
Database Engine
The core service for storing, processing, and securing data, this provides controlled access and rapid transaction processing to address the requirements of the most demanding data-consuming applications. It is often used to create relational databases for online transaction processing or online analytical processing.
Data Dictionary
This is a reserved space within a database used to store information about the database itself. A data dictionary is a set of read-only tables and views containing information about the data used in the enterprise, ensuring that the database representation of the data follows one standard, as defined in the dictionary.
Report Writer
Also referred to as the report generator, this is a program that extracts information from one or more files and presents the information in a specified format. Most report writers allow the user to select records that meet certain conditions, to display selected fields in rows and columns, or to format the data into different charts.
OR
(c) Explain the three-level architecture proposal for DBMS.
In the previous tutorial we have seen the DBMS architectures: one-tier, two-tier, and three-tier. In this guide we will discuss the three-level DBMS architecture in detail.
DBMS Three Level Architecture Diagram
This architecture has three levels:
1. External level
2. Conceptual level
3. Internal level
1. External level
It is also called the view level. This level is called "view" because several users can view their desired data at this level; the data is internally fetched from the database with the help of the conceptual-level and internal-level mappings.
The user doesn't need to know database schema details such as data structures or table definitions. The user is concerned only with the data, which is returned to the view level after it has been fetched from the database (present at the internal level).
The external level is the top level of the three-level DBMS architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among data and the schema of the data, is described at this level. Database constraints and security are also implemented at this level, which is maintained by the DBA (database administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored on the storage devices and is responsible for allocating space to the data. This is the lowest level of the architecture.
(d) Explain:
(i) Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the ability to modify the schema at one level of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the ability to change the conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make changes to the conceptual view of the data, the user's view of the data is not affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the conceptual schema.
o If we make changes to the storage structures of the database server, the conceptual structure of the database is not affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
(ii) Data Integration
Ans:
Data integration involves combining data residing in different sources and providing users with a unified view of them.[1] This process becomes significant in a variety of situations, both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example). Data integration appears with increasing frequency as the volume of data (that is, big data[2]) and the need to share existing data explode.[3] It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users.
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process extracts information from the source databases, transforms it, and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for interoperability of heterogeneous databases.[4] The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991, for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms, and loads data from heterogeneous sources into a single view schema, so that data from different sources become compatible.[5] By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture: because the data are already physically reconciled in a single queryable repository, it usually takes little time to resolve queries.[6]
The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services, like travel or classified-advertisement web applications.
As of 2009, the trend in data integration favored loosening the coupling between data and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from the original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schemas of the original sources, transforming a query into specialized queries that match the schemas of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the Global As View (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the Local As View (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like "earnings", inevitably have different meanings: in one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts; this approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking of the similarities computed from different data sources on a single criterion, such as positive predictive value. This enables the data sources to be directly comparable, and they can be integrated even when the natures of the experiments are distinct.[7]
As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture, in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology, which results in the development of disparate data models; disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models.[8] One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationship that relates the standardized data entities of multiple data models; multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) enterprise data warehouses. Since 2013, data lake approaches have risen to the level of data hubs. (See the popularity of all three search terms on Google Trends.[9]) These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all the data in the hub.
Q2
EITHER
(a) Explain the E-R model with a suitable example.
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an iterative, team-oriented process: all business managers (or designates) should be involved, and the result should be validated with a "bottom-up" approach. It has three primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain; when we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. An entity type is a category; an entity, strictly speaking, is an instance of a given entity type, and there are usually many instances of an entity type. Because the term "entity type" is somewhat cumbersome, most people tend to use "entity" as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student entity's attributes include student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings:
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), and street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other words, ER diagrams help you to explain the logical structure of databases. At first look, an ER diagram looks very similar to a flowchart; however, the ER diagram includes many specialized symbols, whose meanings make this model unique.
Sample ER Diagram
Facts about the ER Diagram Model:
o The ER model allows you to draw a database design.
o It is an easy-to-use graphical tool for modeling data.
o It is widely used in database design.
o It is a GUI representation of the logical structure of a database.
o It helps you to identify the entities which exist in a system and the relationships between those entities.
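The Customer entity from (b) can also be mapped to a relation, which is how an ER design is usually implemented. A minimal sketch using Python's built-in `sqlite3` (column types and the sample row are illustrative): the composite attributes name and address are flattened into their component columns, and customer_id becomes the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes (name, address, street) become individual columns.
conn.execute("""
    CREATE TABLE Customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT NOT NULL,
        middle_name      TEXT,
        last_name        TEXT NOT NULL,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO Customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    (1, "Asha", "Rao"),  # sample data
)
row = conn.execute("SELECT first_name, last_name FROM Customer").fetchone()
print(row)  # ('Asha', 'Rao')
```

A multivalued attribute (e.g. several phone numbers) would instead become a separate table keyed by customer_id.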
(b) Differentiate between the network and hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships.
2. Based on a parent-child relationship.
3. Retrieval algorithms are complex and asymmetric.
4. More data redundancy.
Network model
1. Many-to-many relationships are allowed.
2. A record can have many parents as well as many children.
3. Retrieval algorithms are complex but symmetric.
4. Less data redundancy than the hierarchical model.
Relational model
1. One-to-one, one-to-many, and many-to-many relationships.
2. Based on relational data structures (tables).
3. Retrieval algorithms are simple and symmetric.
4. Less data redundancy.
OR
(c) Draw an E-R diagram for a Library Management System.
Ans
(d) State advantages and disadvantages of following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some search key.
o Records are chained together by pointers to permit fast retrieval in search-key order.
o A pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses a problem if there is no space where the new record should go.
o If there is space, use it; else put the new record in an overflow block, and adjust the pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If there are very few records in overflow blocks, this will work well; if order is lost, reorganize the file.
o Reorganizations are expensive, and are done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case the pointer fields are no longer required.
The Sequential File
A fixed format is used for records:
Records are the same length.
All fields are the same (order and length).
Field names and lengths are attributes of the file.
One field is the key field; it uniquely identifies the record.
Records are stored in key sequence.
New records are placed in a log file or transaction file, and a batch update is performed to merge the log file with the master file.
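The batch update just described can be sketched in a few lines of Python (records and keys are illustrative): both the master file and the log file are kept sorted by key, so the merge is a single sequential pass.

```python
import heapq

# Both runs are sorted by key, as a sequential file requires.
master = [(101, "Ann"), (205, "Ben"), (309, "Eva")]  # existing master file
log = [(150, "Raj"), (400, "Mia")]                   # accumulated new records

# One sequential pass merges the two key-ordered runs into a new master.
new_master = list(heapq.merge(master, log, key=lambda rec: rec[0]))
print(new_master)
# [(101, 'Ann'), (150, 'Raj'), (205, 'Ben'), (309, 'Eva'), (400, 'Mia')]
```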
(ii) Direct file
Direct Access File System (DAFS) is a network file system, similar to Network File System (NFS) and Common Internet File System (CIFS), that allows applications to transfer data while bypassing operating system control, buffering, and network protocol operations that can bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying transport mechanism. Using VI hardware, an application transfers data to and from application buffers without using the operating system, which frees up the processor and operating system for other processes and allows files to be accessed by servers using several different operating systems. DAFS is designed and optimized for clustered, shared-file network environments, such as those commonly used for Internet, e-commerce, and database applications. DAFS is optimized for high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI, including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved, rather than how to retrieve it; there is no description of how to evaluate a query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true, based on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find the details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
the existential quantifier ∃ ('there exists');
the universal quantifier ∀ ('for all').
Tuple variables qualified by ∀ or ∃ are called bound variables; the others are called free variables.
Tuple Relational Calculus
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'there exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and it is located in London'.
Tuple Relational Calculus
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'for all Branch tuples, the address is not in Paris'. We can also use ~(∃B) (B.city = 'Paris'), which means 'there are no branches with an address in Paris'.
Tuple Relational Calculus
Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
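The calculus query {S | Staff(S) ∧ S.salary > 10000} has a direct SQL counterpart, since SQL is likewise declarative. A minimal sketch using Python's built-in `sqlite3` (the Staff rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [("S1", "Ann", 12000), ("S2", "Ben", 9000), ("S3", "Eva", 30000)])

# The WHERE clause plays the role of the predicate P(S) in the calculus query.
high_paid = [r[0] for r in conn.execute(
    "SELECT staffNo FROM Staff WHERE salary > 10000 ORDER BY staffNo")]
print(high_paid)  # ['S1', 'S3']
```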
Data Manipulations in SQL
SELECT, UPDATE, DELETE, INSERT statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL joins: multiple-table queries
Set manipulation: ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later using INSERT.
CREATE TABLE S (
    SNO    CHAR(5),
    SNAME  CHAR(20),
    STATUS DECIMAL(3),
    CITY   CHAR(15),
    PRIMARY KEY (SNO)
);
A table name and unique column names must be specified. Columns which are defined as primary keys will never have two rows with the same key value. A primary key may consist of more than one column (values unique in combination); this is called a composite key.
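A composite key can be demonstrated directly. The sketch below uses Python's built-in `sqlite3` with an illustrative Enrollment table: uniqueness holds on the column combination, so repeating one column alone is allowed, but repeating the pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrollment (
        student_no TEXT,
        course_no  TEXT,
        grade      TEXT,
        PRIMARY KEY (student_no, course_no)
    )
""")
conn.execute("INSERT INTO Enrollment VALUES ('S1', 'C1', 'A')")
conn.execute("INSERT INTO Enrollment VALUES ('S1', 'C2', 'B')")  # same student, new course: OK

try:
    conn.execute("INSERT INTO Enrollment VALUES ('S1', 'C1', 'C')")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```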
(b) Explain data manipulation in SQL.
Ans:
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those used by IMS/DL/I, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which modify stored data but not the schema or database objects. Manipulation of persistent database objects (e.g. tables or stored procedures) via the SQL schema statements,[3] rather than of the data stored within them, is considered to be part of a separate data definition language (DDL). In SQL these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this set also contains the SELECT query statement,[3] which, strictly speaking, is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
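The four verbs can be exercised in sequence. A minimal sketch using Python's built-in `sqlite3` against an employees table like the one above (the updated surname is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT, then SELECT, UPDATE, and DELETE the same row.
conn.execute("INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
             ("John", "Capita", "xcapit00"))
conn.execute("UPDATE employees SET last_name = ? WHERE fname = ?",
             ("Capita-Smith", "xcapit00"))
updated = conn.execute(
    "SELECT last_name FROM employees WHERE fname = 'xcapit00'").fetchone()[0]
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(updated, remaining)  # Capita-Smith 0
```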
OR
(c) Explain the following integrity rules:
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are already applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules, but are pertinent to database designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is to give each row a unique identity, so that foreign key values can properly reference primary key values.
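Both requirements (no null key, no duplicate key) can be seen in a short sketch using Python's built-in `sqlite3` with an illustrative Company table. Note one SQLite quirk, flagged in the comment: a non-INTEGER primary key needs an explicit NOT NULL to forbid NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# In SQLite, a TEXT primary key needs an explicit NOT NULL to reject NULL keys.
conn.execute("CREATE TABLE Company (CompanyId TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Company VALUES ('C1', 'Acme')")

errors = 0
for row in [(None, "NoKey"), ("C1", "DuplicateKey")]:  # null key, then duplicate key
    try:
        conn.execute("INSERT INTO Company VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        errors += 1
print(errors)  # 2: both violations of entity integrity were rejected
```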
Theta Join
In a theta join we apply a condition to the input relation(s), and only the selected
rows take part in the cross product that is merged and included in the output. In a
plain cross product, all the rows of one relation are merged with all the rows of the
second relation; here, only the selected rows of a relation form a cross product with
the second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition applied in the select operation
on one relation, and only the selected rows form a cross product with all the rows of
the second relation. For example, given the two relations FACULTY and COURSE, we first
apply a select operation on the FACULTY relation to select certain specific rows; those
rows then form a cross product with the COURSE relation. This is the difference between
a cross product and a theta join.
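The contrast can be sketched with Python's sqlite3; the FACULTY and COURSE attribute names and rows below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Illustrative FACULTY and COURSE relations (attribute names are assumed).
cur.execute("CREATE TABLE faculty (fac_id INTEGER, fac_name TEXT)")
cur.execute("CREATE TABLE course (course_id TEXT, fac_id INTEGER)")
cur.executemany("INSERT INTO faculty VALUES (?, ?)", [(1, "Khan"), (2, "Rao")])
cur.executemany("INSERT INTO course VALUES (?, ?)", [("DBMS", 1), ("OS", 1), ("CN", 2)])

# Cross product: every faculty row paired with every course row (2 x 3 = 6 rows).
cross = cur.execute("SELECT * FROM faculty, course").fetchall()

# Theta join: only the pairs satisfying the condition theta
# (here, equality on fac_id) appear in the output.
theta = cur.execute(
    "SELECT * FROM faculty JOIN course ON faculty.fac_id = course.fac_id").fetchall()
```

The cross product yields all six pairings, while the theta join keeps only the three that satisfy the condition.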
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So referential integrity requires that whenever a foreign key value is used, it must reference a
valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record.
If a related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field), the result is an "orphaned record".
So referential integrity will prevent users from:
• Adding records to a related table if there is no associated record in the primary table
• Changing values in a primary table that would result in orphaned records in a related table
• Deleting records from a primary table if there are matching related records
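The first and third of these protections can be demonstrated with Python's sqlite3; the company/product tables and values are illustrative, and note that SQLite enforces foreign keys only when the pragma is enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
cur = conn.cursor()
cur.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE product (
                 product_id INTEGER PRIMARY KEY,
                 company_id INTEGER REFERENCES company(company_id))""")
cur.execute("INSERT INTO company VALUES (15, 'Acme')")
cur.execute("INSERT INTO product VALUES (1, 15)")

# Adding a child row with no matching parent is rejected (would be orphaned).
try:
    cur.execute("INSERT INTO product VALUES (2, 99)")
    orphan_insert_rejected = False
except sqlite3.IntegrityError:
    orphan_insert_rejected = True

# Deleting a parent that still has matching children is rejected.
try:
    cur.execute("DELETE FROM company WHERE company_id = 15")
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True
```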
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain the following domain in detail with example.
Ans: Definition: The domain of a database attribute is the set of all allowable values that the
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male, 'F' for female, and NULL for
records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel
value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
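Both kinds of domain rule above can be sketched with Python's sqlite3; the person table, its columns and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# gender is restricted to the domain {'M', 'F'} (NULL passes the CHECK and
# stands for "unknown"); amount must be positive, as in the example above.
cur.execute("""CREATE TABLE person (
                 name   TEXT,
                 gender TEXT CHECK (gender IN ('M', 'F')),
                 amount REAL CHECK (amount > 0))""")
cur.execute("INSERT INTO person VALUES ('Mary', 'F', 10.0)")

# A value outside the declared domain is rejected.
try:
    cur.execute("INSERT INTO person VALUES ('Pat', 'X', 10.0)")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

# NULL (unknown) is still permitted by the CHECK constraint.
cur.execute("INSERT INTO person VALUES ('Sam', NULL, 5.0)")
null_allowed = True
```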
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last is written M:N rather than M:M, because the number of occurrences on each
side may differ.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A
one-to-one relationship rarely exists in practice, but it can; in that case you may consider
combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities, an employee works in one department but
a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur
because an entity has been missed.
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL: the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language, or DDL), the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of a conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, like the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data item)
defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from the information source (e.g. a form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove the repeating group either by:
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
or by:
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
If A and B are attributes of a relation,
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
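The 1NF-to-2NF steps above can be sketched as schemas; the order/order-line tables below are invented for illustration, not taken from the text. The composite key is (order_id, product_id), and order_date depends only on order_id, so it is a partial dependency that must move out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1NF relation with composite key (order_id, product_id). order_date depends
# only on order_id: a partial dependency, so the relation is not in 2NF.
cur.execute("""CREATE TABLE order_line_1nf (
                 order_id INTEGER, product_id INTEGER,
                 order_date TEXT, quantity INTEGER,
                 PRIMARY KEY (order_id, product_id))""")

# 2NF: the partially dependent attribute is placed in a new relation together
# with a copy of its determinant (order_id), exactly as the steps describe.
cur.execute("""CREATE TABLE orders (
                 order_id INTEGER PRIMARY KEY, order_date TEXT)""")
cur.execute("""CREATE TABLE order_line (
                 order_id INTEGER, product_id INTEGER, quantity INTEGER,
                 PRIMARY KEY (order_id, product_id))""")

tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```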
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency: a functional dependency between two or more non-key attributes
Based on the concept of transitive dependency:
If A, B and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF, and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
A multivalued dependency X ↠ Y holds when the set of Y-values associated with a given
X-value is independent of the remaining attributes. For example, in a relation
COURSE(course, teacher, book), if each course has an independent set of teachers and an
independent set of recommended books, then course ↠ teacher and course ↠ book.
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if
and only if it is in BCNF and every non-trivial multivalued dependency is in fact a functional
dependency. 4NF removes the unwanted structures caused by multivalued dependencies.
Either:
there is no multivalued dependency in the relation, or
there are multivalued dependencies but the determinant of each is a superkey.
One of these conditions must hold for the relation to be in fourth normal form.
The relation must also be in BCNF; fourth normal form differs from BCNF only in that it
also considers multivalued dependencies.
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
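Such a derivation can also be checked mechanically with the standard attribute-closure algorithm: AB → GH holds iff {G, H} ⊆ (AB)+. The sketch below is not part of the original solution; it encodes the FD set F from the Maier example.

```python
def closure(attrs, fds):
    """Compute the closure X+ of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs, each side a set of attributes.
    Repeatedly apply the inference rules until nothing new is derived.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F from the Maier example above (attributes written as single letters).
F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
     ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

ab_closure = closure({"A", "B"}, F)
# AB -> GH holds iff {G, H} is contained in (AB)+.
ab_implies_gh = {"G", "H"} <= ab_closure
```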
Significance in Relational Database design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables and multiple
relationships between data elements can be defined and established in an ad-hoc manner.
A Relational Database Management System is a database system made up of files with data
elements in a two-dimensional array (rows and columns). This database management system
has the capability to recombine data elements to form different relations, resulting in great
flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables.
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The relational model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order, to prevent such instances. Once a
deadlock does occur, the DBMS must have a method for detecting the deadlock,
and then to resolve it the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another.
Durability means that once a transaction commits, its effects will persist even if there are
system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state: my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total-cost row is affected by any changes in
the tax-rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
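The "resource access order" idea mentioned above for avoiding deadlock can be sketched with Python threads: give every lock a fixed rank and always acquire locks in rank order, so a circular wait (the condition for deadlock) can never form. The lock names, ranks and transactions are invented for illustration.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# Give every lock a fixed rank; transactions always acquire in rank order.
LOCK_ORDER = {id(lock_a): 0, id(lock_b): 1}

def acquire_in_order(*locks):
    # Sorting by rank guarantees no transaction holds B while waiting for A.
    for lock in sorted(locks, key=lambda l: LOCK_ORDER[id(l)]):
        lock.acquire()

def release_all(*locks):
    for lock in locks:
        lock.release()

results = []

def transaction(name):
    # Both transactions request the same two locks; with the fixed order
    # neither can block the other in a cycle, so both always complete.
    acquire_in_order(lock_a, lock_b)
    try:
        results.append(name)
    finally:
        release_all(lock_a, lock_b)

t1 = threading.Thread(target=transaction, args=("T1",))
t2 = threading.Thread(target=transaction, args=("T2",))
t1.start(); t2.start(); t1.join(); t2.join()
```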
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
• Binary Locks: a lock on a data item can be in two states; it is either locked or
unlocked.
• Shared/exclusive: this type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts: in this phase the transaction cannot demand any new locks; it
only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction, and the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock after using it: Strict-2PL holds all the locks until the commit point and releases
them all at once.
Strict-2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 00:02 would be older than all other
transactions that come after it. For example, any transaction entering the system at 00:04 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
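A minimal sketch of the read/write checks that these per-item timestamps make possible, following the common textbook formulation of basic timestamp ordering (the rules are implied rather than spelled out in the text): an operation is refused when it would violate the timestamp order, and the refused transaction would be rolled back and restarted.

```python
class Item:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest transaction that read it
        self.write_ts = 0  # timestamp of the youngest transaction that wrote it

def read(item, ts):
    # A transaction older than the last writer would read a value "from the
    # future", so the read is refused; otherwise it is allowed.
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    # A write is refused if a younger transaction has already read or
    # written the item; otherwise it is allowed.
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True

x = Item()
ok_young_write = write(x, ts=4)  # a transaction with timestamp 4 writes X
ok_old_read = read(x, ts=2)      # an older transaction must not read the newer value
ok_young_read = read(x, ts=5)    # a younger transaction may read it
```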
OR
(b) Explain database security mechanisms. (8)
Database security covers and enforces security on all aspects and components of databases. This
includes:
• Data stored in the database
• The database server
• The database management system (DBMS)
• Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment from theft and natural
disasters
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information in thousands of rows that
represented specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge
base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management, which provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However,
any practical database will have a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
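The bank-transfer example can be made concrete with Python's sqlite3: both updates run inside one transaction, and a simulated failure between them triggers a rollback, so no money is lost or created. The account names and balances are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
cur.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])

def transfer(amount, fail_midway=False):
    # Both operations run inside one transaction: either both take effect
    # or, on any error, the ROLLBACK undoes the partial withdrawal.
    cur.execute("BEGIN")
    try:
        cur.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                    (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two operations")
        cur.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                    (amount,))
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")

transfer(30, fail_midway=True)   # fails: balances must be unchanged
after_failure = dict(cur.execute("SELECT name, balance FROM account"))
transfer(30)                     # succeeds: money moves, none is lost
after_success = dict(cur.execute("SELECT name, balance FROM account"))
```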
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The three levels are explained in detail below:
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
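As a minimal sketch of this division of labour, using Python's sqlite3 (SQLite has no GRANT/REVOKE, so the DCL part is omitted; the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE id = 1")

rows = conn.execute("SELECT id, name FROM student").fetchall()
print(rows)  # [(1, 'Asha K')]
```

In a full SQL DBMS the DCL statements GRANT and REVOKE would complete the trio by controlling which users may run the DML above.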
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for a DBMS are explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query Optimizer - The DML commands such as insert, update, delete, and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute the query by
the query optimizer and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are -
It converts operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (together known as the Query Processor), from the user's logical view
to the physical file system.
It controls access to DBMS information that is stored on disk.
It also controls handling buffers in main memory.
It also enforces constraints to maintain consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4 Data Dictionary - Data Dictionary is a repository of description of data in the database It
contains information about
1 Data - names of the tables, names of attributes of each table, length of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structure,
access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities,
and their access rights.
6 Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation,
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
A data dictionary is necessary in databases due to the following reasons:
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high-level queries into low-level file access
commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve User: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. There are other such naive users, where the type and range of responses is always indicated to the user. Thus a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Users Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category The application programs could be written in a general purpose programming language such as Assembler C COBOL FORTRAN PASCAL or PLI and include the commands required to manipulate the database
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data; the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored, and such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another, e.g. from optical to magnetic storage, or from tape to disk.
Advantages:
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system
Every user group maintains its own files for handling its data files This may lead to
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors may be generated due to duplication of the same data in different files
• Time in entering data again and again is wasted
• Computer resources are needlessly used
• It is very difficult to combine information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. So we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed on changing the
data in the database.
5 Integrity can be improved - Since data of an organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of their unit as the most
important and therefore considers their needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up a database and developing and maintaining
application programs to be far lower than for similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures, and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process with all business managers (or designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world An entity may be a physical object such as a house or a car an event such as a house sale or a car service or a concept such as a customer transaction or order
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes: an attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: a relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship
between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. Types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
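As a hedged illustration of how this entity might map to a relational table (flattening the composite attributes into columns is one common choice, not the only one; the sample row is invented), using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes (name, address, street) are flattened into columns;
# the primary-key attribute customer_id is the one underlined in the ER diagram.
conn.execute("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    phone_number     TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT
)
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (1, 'Ravi', 'Sharma')")
print(conn.execute("SELECT first_name, last_name FROM customer").fetchall())
```

A multivalued attribute (e.g. several phone numbers per customer) would instead go into a separate table keyed by customer_id.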
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file, and direct file organizations we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
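The idea can be sketched in Python (the student records are invented for illustration): a secondary index on "stud_name" maps each value to the set of matching primary keys, unlike a primary index, where each key identifies exactly one record.

```python
# Build a secondary index on the non-unique attribute "stud_name".
records = [
    {"roll_no": 1, "stud_name": "Amit", "branch": "MCA"},
    {"roll_no": 2, "stud_name": "Neha", "branch": "MCA"},
    {"roll_no": 3, "stud_name": "Amit", "branch": "MBA"},
]

secondary_index = {}
for rec in records:
    # each secondary-key value maps to a LIST of primary keys
    secondary_index.setdefault(rec["stud_name"], []).append(rec["roll_no"])

# Secondary-key retrieval may yield several records for one key value.
print(secondary_index["Amit"])  # [1, 3]
```

Retrieval then fetches each record by the primary keys listed in the index entry.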
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3-EITHER
(A) Let R(A,B,C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
have a non-trivial lossless decomposition into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one value you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
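A small sketch with Python's sqlite3 (sample rows taken from the table above) shows the join dependency in action: joining the three projection tables on their common keys reconstructs the original Buying facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for ddl in (
    "CREATE TABLE buyer_vendor (buyer TEXT, vendor TEXT)",
    "CREATE TABLE buyer_item   (buyer TEXT, item   TEXT)",
    "CREATE TABLE vendor_item  (vendor TEXT, item  TEXT)",
):
    conn.execute(ddl)

conn.executemany("INSERT INTO buyer_vendor VALUES (?, ?)",
                 [("Sally", "Liz Claiborne"), ("Sally", "Jordach")])
conn.executemany("INSERT INTO buyer_item VALUES (?, ?)",
                 [("Sally", "Blouses"), ("Sally", "Jeans")])
conn.executemany("INSERT INTO vendor_item VALUES (?, ?)",
                 [("Liz Claiborne", "Blouses"), ("Jordach", "Jeans")])

# Joining the three projections on their common keys reconstructs
# the original Buying relation (the join dependency holds).
rows = conn.execute("""
    SELECT bv.buyer, bv.vendor, bi.item
    FROM buyer_vendor bv
    JOIN buyer_item  bi ON bi.buyer  = bv.buyer
    JOIN vendor_item vi ON vi.vendor = bv.vendor AND vi.item = bi.item
""").fetchall()
print(sorted(rows))
```

If Claiborne starts to sell jeans, a single row inserted into vendor_item records the fact, instead of one row per buyer in the original table.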
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Fig.: IMS architecture - application programs A and B, written in a host language plus DL/I, each access the database through PCBs grouped into a PSB (PSB-A, PSB-B); the IMS control program maps these views onto physical databases defined by DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined by the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are
supported via user-written on-line application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: a possible key.
Each non-key field is functionally dependent on every candidate key.
No attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in
normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
a dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation
and has the property that every functional dependency in Y is implied by the
functional dependencies in X.
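A functional dependency X → Y can be checked mechanically: equal values of the determinant must imply equal values of the dependent. A minimal Python sketch (the staff relation below is invented for illustration):

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`
    (a list of dicts): equal determinant values must imply equal dependents."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant, different dependent
    return True

staff = [
    {"staff_no": "S1", "branch": "B1", "city": "Pune"},
    {"staff_no": "S2", "branch": "B1", "city": "Pune"},
    {"staff_no": "S3", "branch": "B2", "city": "Nagpur"},
]

print(fd_holds(staff, ["staff_no"], ["branch"]))  # True: staff_no determines branch
print(fd_holds(staff, ["city"], ["staff_no"]))    # False: Pune maps to both S1 and S2
```

Note that a check over sample data can only refute a dependency; whether one truly holds "for all time" is a statement about the enterprise, not about one extension of the relation.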
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the
key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all multi-valued dependencies are functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
Either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes are dependent between themselves.
One of these conditions must hold true in order for the relation to be in fourth normal form.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
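A small Python sketch (the course, teacher, and book names are invented) of why a relation violating 4NF is wasteful: with independent multi-valued facts course →→ teacher and course →→ book stored in one table, every teacher/book combination must appear.

```python
# Independent multi-valued facts about one course.
teachers = {("DBMS", "Rao"), ("DBMS", "Iyer")}
books = {("DBMS", "Date"), ("DBMS", "Navathe")}

# An unnormalized table holding both facts must store the cross product:
unnormalized = {(c, t, b) for (c, t) in teachers for (c2, b) in books if c == c2}
print(len(unnormalized))  # 4 rows for 2 teachers x 2 books

# After 4NF decomposition into (course, teacher) and (course, book),
# adding one teacher adds exactly one row instead of len(books) rows.
teachers.add(("DBMS", "Sen"))
print(len(teachers))  # 3
```

The decomposition also removes the insertion anomaly: in the unnormalized form a new teacher cannot be recorded without pairing them with every book.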
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
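The "pointer following" point above can be illustrated with a small Python sketch (the Account and Transaction classes are invented for illustration): the account object references its transaction objects directly, so retrieving them needs no key lookup or join.

```python
# In an object database, an account object holds direct references to its
# transaction objects; retrieval follows pointers instead of performing a join.
class Transaction:
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []  # direct object references

acct = Account("Sally")
acct.transactions.append(Transaction(100))
acct.transactions.append(Transaction(-40))

# Navigation: no foreign-key lookup or join, just pointer following.
print(sum(t.amount for t in acct.transactions))  # 60
```

In a relational design the same query would join an account table to a transaction table on account_id; the object form trades that flexibility for fast navigational access.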
C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be more easily achieved if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred If your log file is available after the failure you can restore up to the last
transaction committed Log Marks feature allows you to place reference points in the transaction log that allow you to
recover a log mark
Logs CREATE INDEX operations Recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model
SQL Server truncates the transaction log at regular intervals, removing committed transactions
(d) Describe Deadlocks in a Distributed System.
Ans
... data in a database is stored in an organized manner and can easily be retrieved to generate different reports.
11. A file-based system does not provide a concurrency facility; a database system provides a concurrency facility.
12. A file-based system does not provide data atomicity; a database system provides data atomicity.
13. The cost of a file processing system is less than that of a database system.
14. A file-based system is difficult to maintain, as it provides fewer controlling facilities; a database system provides many facilities for maintaining programs.
15. If one application fails in a file-based system, it does not affect the other files in the system; if the database fails, it affects all applications that depend on the database.
16. Hardware cost is lower for a file-based system than for a database system.
(b) What is a Database Management System? Explain the components of a Database Management System.
Ans:
Organizations employ Database Management Systems (DBMS) to help them effectively manage their data and derive relevant information from it. A DBMS is a technology tool that directly supports data management: a package designed to define, manipulate and manage data in a database.
Some general functions of a DBMS:
Allow the definition, creation, querying, update and administration of databases.
Define rules to validate the data and relieve users of framing programs for data maintenance.
Convert an existing database, or archive a large and growing one.
Run business applications which perform the tasks of managing business processes, interacting with end-users and other applications to capture and analyze data.
Some well-known DBMSs are Microsoft SQL Server, Microsoft Access, Oracle, SAP and others.
Components of DBMS
A DBMS has several components, each performing very significant tasks in the database management system environment. Below is a list of components within the database and its environment.
Software
This is the set of programs used to control and manage the overall database. It includes the DBMS software itself, the operating system, the network software used to share the data among users, and the application programs used to access data in the DBMS.
Hardware
Consists of a set of physical electronic devices such as computers, I/O devices and storage devices; this provides the interface between the computers and the real-world systems.
Data
The DBMS exists to collect, store, process and access data, the most important component. The database contains both the actual (operational) data and the metadata.
Procedures
These are the instructions and rules that assist in how to use the DBMS and in designing and running the database, using documented procedures to guide the users that operate and manage it.
Database Access Language
This is used to access the data to and from the database: to enter new data, update existing data, or retrieve required data. The user writes a set of appropriate commands in a database access language and submits these to the DBMS, which then processes the data and generates and displays a set of results in a user-readable form.
Query Processor
This transforms user queries into a series of low-level instructions. It reads the online user's query and translates it into an efficient series of operations in a form capable of being sent to the run-time data manager for execution.
Run-Time Database Manager
Sometimes referred to as the database control system, this is the central software component of the DBMS that interfaces with user-submitted application programs and queries and handles database access at run time. Its function is to convert the operations in users' queries. It provides control to maintain the consistency, integrity and security of the data.
Data Manager
Also called the cache manager, this is responsible for handling data in the database and provides a recovery mechanism that allows the system to recover the data after a failure.
Database Engine
The core service for storing, processing and securing data, this provides controlled access and rapid transaction processing to address the requirements of the most demanding data-consuming applications. It is often used to create relational databases for online transaction processing or online analytical processing.
Data Dictionary
This is a reserved space within a database used to store information about the database itself. A data dictionary is a set of read-only tables and views containing information about the data used in the enterprise, ensuring that the database representation of the data follows one standard as defined in the dictionary.
Report Writer
Also referred to as the report generator, this is a program that extracts information from one or more files and presents the information in a specified format. Most report writers allow the user to select records that meet certain conditions and to display selected fields in rows and columns, or to format the data into different charts.
OR
(c) Explain the three-level architecture proposal for DBMS. (8)
In the previous tutorial we have seen the DBMS architectures: one-tier, two-tier and three-tier. In this guide we will discuss the three-level DBMS architecture in detail.
DBMS Three Level Architecture Diagram
This architecture has three levels:
1. External level
2. Conceptual level
3. Internal level
1. External level
It is also called the view level. The reason this level is called the "view" level is that several users can view their desired data from it; the data is internally fetched from the database with the help of the conceptual- and internal-level mappings.
The user does not need to know database schema details such as the data structures or table definitions; the user is only concerned with the data, which is returned to the view level after it has been fetched from the database (present at the internal level).
The external level is the "top level" of the three-level DBMS architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among the data and the schema of the data, is described at this level.
Database constraints and security are also implemented at this level of the architecture. This level is maintained by the DBA (database administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored on the storage devices and is also responsible for allocating space to the data. It is the lowest level of the architecture.
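The external level can be illustrated with a SQL view: users query the view without knowing the underlying (conceptual/internal) table layout. A minimal sketch, assuming an illustrative `employee` table not taken from the text:

```python
import sqlite3

# A view stands in for the external (view) level: it exposes only the
# columns a class of users should see, hiding the conceptual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000), (2, 'Ravi', 48000)")

# External level: a view that omits the sensitive salary column.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

rows = conn.execute("SELECT * FROM employee_public ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha'), (2, 'Ravi')]
```

If the internal storage of `employee` changes, queries against `employee_public` keep working, which is the data-independence point made in part (d).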
(d) Explain:
(i) Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the characteristic of being able to change the conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make any changes in the conceptual view of the data, the user view of the data will not be affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the conceptual schema.
o If we make any changes to the storage of the database system server, the conceptual structure of the database will not be affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
(ii) Data Integration
Ans:
Data integration involves combining data residing in different sources and providing users with a unified view of it[1]. This process becomes significant in a variety of situations, both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example). Data integration appears with increasing frequency as the volume of data (that is, big data[2]) and the need to share existing data explode[3]. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users.
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process extracts information from the source databases, transforms it and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s computer scientists began designing systems for the interoperability of heterogeneous databases[4]. The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991 for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms and loads data from heterogeneous sources into a single view schema so that data from different sources become compatible[5]. By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries[6].
The data warehouse approach is less feasible for data sets that are frequently updated, since it requires the extract, transform, load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services like travel or classified advertisement web applications.
As of 2009 the trend in data integration favored loosening the coupling between data[citation needed] and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from the original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schemas of the original sources, and on transforming a query into specialized queries that match the schemas of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the Global-As-View (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the Local-As-View (LAV) approach). The latter approach requires more sophisticated inference to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
As of 2010 some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like "earnings", inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts. This approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking the similarities computed from different data sources on a single criterion such as positive predictive value. This enables the data sources to be directly comparable, and they can be integrated even when the natures of the experiments are distinct[7].
As of 2011 it was determined that current data modeling methods were imparting data isolation into every data architecture in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology, which results in the development of disparate data models; disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models[8]. One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationship that relate the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) enterprise data warehouses. Since 2013, data lake approaches have risen to the level of data hubs (see the popularity of all three search terms on Google Trends[9]). These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the hub.
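The ETL step described above can be sketched in a few lines. This is a minimal, illustrative example; the source field names and the mediated schema (`name`, `sales`) are assumptions, not from the text:

```python
# Minimal ETL sketch: extract records from two heterogeneous sources,
# transform each into one mediated schema, and load the result into a
# single list that stands in for the warehouse.
source_a = [{"first": "John", "last": "Doe", "sales": 12}]
source_b = [{"name": "Jane Roe", "num_sales": "7"}]  # sales stored as text here

def transform_a(rec):
    # Combine split name fields into the mediated schema's single field.
    return {"name": f"{rec['first']} {rec['last']}", "sales": int(rec["sales"])}

def transform_b(rec):
    # Reconcile the type conflict: sales arrives as a string in source B.
    return {"name": rec["name"], "sales": int(rec["num_sales"])}

warehouse = [transform_a(r) for r in source_a] + [transform_b(r) for r in source_b]
print(warehouse)
```

The type reconciliation in `transform_b` is exactly the kind of semantic conflict (float dollars vs. integer counts) the "earnings" example above describes.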
Q2
EITHER
(a) Explain the E-R model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Modeling is an iterative, team-oriented process: all business managers (or designates) should be involved, and the model should be validated with a "bottom-up" approach. It has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. An entity type is a category; an entity, strictly speaking, is an instance of a given entity type, and there are usually many instances of an entity type. Because the term entity type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name, address, etc.
Attributes are of various types:
Simple/single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationship are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- N
Symbols and their meanings:
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes.
Example:
(b) Given entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other words, ER diagrams help you to explain the logical structure of databases. At first look, an ER diagram looks very similar to a flowchart; however, an ER diagram includes many specialized symbols whose meanings make this model unique.
Sample ER Diagram
Facts about the ER diagram model:
o The ER model allows you to draw a database design.
o It is an easy-to-use graphical tool for modeling data.
o It is widely used in database design.
o It is a GUI representation of the logical structure of a database.
o It helps you to identify the entities which exist in a system and the relationships between those entities.
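One way to make the Customer entity of part (b) concrete is to map it to a relational table, flattening the composite attributes (name, address, street) into simple columns. A minimal sketch; the column names are inferred from the question text:

```python
import sqlite3

# Composite attributes become individual columns; customer_id stays the
# primary key. Only a few columns are populated in the demo insert.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (1, 'John', 'Doe')"
)
row = conn.execute("SELECT customer_id, first_name, last_name FROM customer").fetchone()
print(row)  # (1, 'John', 'Doe')
```

A multivalued attribute such as multiple phone numbers would instead become a separate table keyed by customer_id.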
(b) Differentiate between the network and hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships.
2. Based on a parent-child relationship.
3. Retrieval algorithms are complex and asymmetric.
4. More data redundancy.
Network model
1. Many-to-many relationships.
2. A record can have many parents as well as many children.
3. Retrieval algorithms are complex but symmetric.
4. Less data redundancy than the hierarchical model.
Relational model
1. One-to-one, one-to-many and many-to-many relationships.
2. Based on relational data structures.
3. Retrieval algorithms are simple and symmetric.
4. Less data redundancy.
OR
(c) Draw an E-R diagram of a Library Management System.
Ans:
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-sequential file
Ans:
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some search key.
o Records are chained together by pointers to permit fast retrieval in search-key order.
o A pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; else put the new record in an overflow block.
o Adjust pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If there are very few records in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case the pointer fields are no longer required.
The Sequential File
A fixed format is used for records:
Records are the same length.
All fields are the same (order and length).
Field names and lengths are attributes of the file.
One field is the key field:
It uniquely identifies the record.
Records are stored in key sequence.
New records are placed in a log file or transaction file, and a batch update is performed to merge the log file with the master file.
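The batch-update step above can be sketched as a merge of two key-ordered record lists. This is an illustrative toy (in-memory lists stand in for the master and log files):

```python
# Batch update for a sequential file: the master file and the log
# (transaction) file are both sorted on the key, so one linear pass
# produces the new key-ordered master file.
master = [(10, "A"), (20, "B"), (40, "D")]  # existing records, sorted by key
log    = [(30, "C"), (50, "E")]             # new records, sorted by key

def batch_merge(master, log):
    merged, i, j = [], 0, 0
    while i < len(master) and j < len(log):
        if master[i][0] <= log[j][0]:
            merged.append(master[i]); i += 1
        else:
            merged.append(log[j]); j += 1
    # One of the two lists is exhausted; append the remainder of the other.
    return merged + master[i:] + log[j:]

print(batch_merge(master, log))
# [(10, 'A'), (20, 'B'), (30, 'C'), (40, 'D'), (50, 'E')]
```

The merge is O(n + m), which is why batch updates are preferred to inserting records one at a time into a physically ordered file.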
(ii) Direct file
The Direct Access File System (DAFS) is a network file system, similar to the Network File System (NFS) and the Common Internet File System (CIFS), that allows applications to transfer data while bypassing operating system control, buffering and network protocol operations that can bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying transport mechanism. Using VI hardware, an application transfers data to and from application buffers without using the operating system, which frees up the processor and operating system for other processes and allows files to be accessed by servers using several different operating systems. DAFS is designed and optimized for clustered, shared-file network environments that are commonly used for Internet, e-commerce and database applications. It is optimized for high-bandwidth InfiniBand networks, and it works with any interconnect that supports VI, including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus.
Ans:
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it: there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true, based on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
The existential quantifier ∃ ('there exists')
The universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple, S, and it is located in London'.
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris'.
We can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an address in Paris'.
Tuple Relational Calculus
Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
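Example (a) above has a direct SQL counterpart. A minimal sketch, with an illustrative Staff table populated with made-up rows:

```python
import sqlite3

# The calculus query {S.fName, S.lName | Staff(S) AND S.position = 'Manager'
# AND S.salary > 25000} expressed as SQL. Data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT, fName TEXT, lName TEXT, position TEXT, salary REAL)"
)
conn.executemany("INSERT INTO Staff VALUES (?,?,?,?,?)", [
    ("S1", "Ann", "Beech", "Manager", 30000),
    ("S2", "Bob", "Lee", "Assistant", 12000),
])
rows = conn.execute(
    "SELECT fName, lName FROM Staff WHERE position = 'Manager' AND salary > 25000"
).fetchall()
print(rows)  # [('Ann', 'Beech')]
```

The WHERE clause plays the role of the predicate P(S); the projected columns correspond to the attributes listed before the bar.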
Data Manipulation in SQL
SELECT, UPDATE, DELETE and INSERT statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL joins: multiple-table queries
Set manipulation: ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later using INSERT.
CREATE TABLE S (SNO CHAR(5),
                SNAME CHAR(20),
                STATUS DECIMAL(3),
                CITY CHAR(15),
                PRIMARY KEY (SNO));
A table name and unique column names must be specified. Columns which are defined as primary keys will never have two rows with the same key value. A primary key may consist of more than one column (values unique in combination); this is called a composite key.
(b) Explain data manipulation in SQL.
Ans:
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language[1]. Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database[2]. Other forms of DML are those used by IMS/DL/I, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements[3], which modify stored data but not the schema or database objects. Manipulation of persistent database objects (e.g. tables or stored procedures) via the SQL schema statements[3], rather than of the data stored within them, is considered to be part of a separate data definition language (DDL). In SQL these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct in their overall function[3].
The SQL-data change statements are a subset of the SQL-data statements; the latter also contains the SELECT query statement[3], which strictly speaking is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of the DML[4], so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00');
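The four verbs can be exercised end to end against a small table. A self-contained sketch using SQLite (the `employees` columns follow the example above; the updated value is made up):

```python
import sqlite3

# Demonstrate INSERT, UPDATE, SELECT and DELETE in sequence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES (the example from the text, with string quoting)
conn.execute(
    "INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00')"
)

# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET fname = 'jcapita' WHERE last_name = 'Capita'")

# SELECT ... FROM ... WHERE (strictly speaking, DQL)
rows = conn.execute(
    "SELECT first_name, fname FROM employees WHERE last_name = 'Capita'"
).fetchall()
print(rows)  # [('John', 'jcapita')]

# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE last_name = 'Capita'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(remaining)  # 0
```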
OR
(c) Explain the following integrity rules.
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules but are pertinent to database designs are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is for each row to have a unique identity, so that foreign key values can properly reference primary key values.
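Both requirements (no NULL key, no duplicate key) are enforced by the DBMS. A minimal sketch with an illustrative `student` table:

```python
import sqlite3

# Entity integrity: the primary key must be non-NULL and unique, so a
# NULL key and a duplicate key are both rejected by the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO student VALUES ('s1', 'Mina')")

null_rejected = duplicate_rejected = False
try:
    conn.execute("INSERT INTO student VALUES (NULL, 'NoKey')")     # NULL key
except sqlite3.IntegrityError:
    null_rejected = True
try:
    conn.execute("INSERT INTO student VALUES ('s1', 'Duplicate')")  # duplicate key
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(null_rejected, duplicate_rejected)  # True True
```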
Theta Join
In a theta join we apply a condition on the input relation(s), and then only the selected rows are used in the cross product to be merged and included in the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation, but here only selected rows of one relation are cross-producted with the second relation. It is denoted as ⋈θ.
If R and S are two relations, then θ is the condition applied in a select operation on one relation; only the selected rows are then cross-producted with all the rows of the second relation. For example, given the two relations FACULTY and COURSE, we first apply a select operation on the FACULTY relation to select certain specific rows; these rows then form a cross product with the COURSE relation. This is the difference between the cross product and the theta join. Showing both relations, their attributes, and the cross product after the select operation makes the difference between cross product and theta join clear.
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records. Otherwise we would end up with an orphaned record: the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e. the "CompanyId" field), resulting in an "orphaned record".
So referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table.
Changing values in a primary table that result in orphaned records in a related table.
Deleting records from a primary table if there are matching related records.
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database because they're never returned in queries or reports. It could also result in strange results appearing in reports (such as products without an associated company). Or, worse yet, it could result in customers not receiving products they paid for. Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
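The three prevented actions can be demonstrated with a foreign key constraint. A minimal sketch using the company/product example above (table and column names are illustrative):

```python
import sqlite3

# Referential integrity: a child row cannot reference a missing parent,
# and a referenced parent cannot be deleted.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(company_id))""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")  # valid reference

orphan_rejected = delete_rejected = False
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")  # no company 99 exists
except sqlite3.IntegrityError:
    orphan_rejected = True
try:
    conn.execute("DELETE FROM company WHERE company_id = 15")  # still referenced
except sqlite3.IntegrityError:
    delete_rejected = True
print(orphan_rejected, delete_rejected)  # True True
```

This is exactly the "record number 15" scenario from the text: the delete is blocked while a product still references company 15.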
(d)Explain following domain in details with example
AnsDefinition The domain of a database attribute is the set of all allowable values that
attribute may assume
Examples
A field for gender may have the domain male female unknown where those three values are
the only permitted entries in that column
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
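The check-constraint approach described above can be sketched for the gender example using SQLite. The table and column names here are illustrative, not from the original answer; note that a CHECK constraint only rejects values that make the condition false, so NULL (unknown) is still admitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint enforces the domain {'M', 'F'} at the database level.
conn.execute("""CREATE TABLE Person (
    Name   TEXT,
    Gender TEXT CHECK (Gender IN ('M', 'F')))""")

conn.execute("INSERT INTO Person VALUES ('Mary', 'F')")
conn.execute("INSERT INTO Person VALUES ('Pat', NULL)")  # unknown is allowed

# A value outside the domain is rejected by the DBMS.
try:
    conn.execute("INSERT INTO Person VALUES ('Sam', 'X')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Only the two conforming rows are stored; the out-of-domain value never enters the table.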
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the many-to-many form is correctly written M:N, not M:M.
One-to-one (11)
This is where one occurrence of an entity relates to only one occurrence in another entityA one-
to-one relationship rarely exists in practice but it can However you may consider combining
them into one entity
For example an employee is allocated a company car which can only be driven by that
employee
Therefore there is a one-to-one relationship between employee and company car
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities shown on the previous page, an employee works in
one department but a department has many employees.
Therefore there is a one-to-many relationship between department and employee
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships many-to-many relationships rarely exist Normally they occur
because an entity has been missed
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
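In a relational schema, an M:N relationship such as employee/project is usually implemented with a junction table, which resolves the one M:N relationship into two 1:M relationships. A minimal sketch in SQLite (the table names, columns, and sample data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Project  (ProjId INTEGER PRIMARY KEY, Title TEXT);
-- Junction table: each row links one employee to one project,
-- resolving the M:N relationship into two 1:M relationships.
CREATE TABLE WorksOn (
    EmpId  INTEGER REFERENCES Employee(EmpId),
    ProjId INTEGER REFERENCES Project(ProjId),
    PRIMARY KEY (EmpId, ProjId));
INSERT INTO Employee VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO Project  VALUES (10, 'Payroll'), (20, 'Inventory');
INSERT INTO WorksOn  VALUES (1, 10), (1, 20), (2, 10);
""")

# Employee 1 works on several projects, and project 10 has a team of several
# employees, exactly the many-to-many situation described in the text.
asha_projects = conn.execute(
    "SELECT COUNT(*) FROM WorksOn WHERE EmpId = 1").fetchone()[0]
team_10 = conn.execute(
    "SELECT COUNT(*) FROM WorksOn WHERE ProjId = 10").fetchone()[0]
```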
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, as in the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item, and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language such as COBOL that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item)
defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form.
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data along with copy of the original key attribute(s) into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
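The 1NF-to-2NF steps above can be illustrated with a hypothetical order-line relation whose key is (OrderId, ProductId): ProductName depends only on ProductId, a partial dependency, so it is placed in a new relation along with a copy of its determinant. The table and column names below are invented for illustration; a sketch in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: OrderLine(OrderId, ProductId, Qty, ProductName) is 1NF but not 2NF,
-- because ProductName depends only on part of the key (ProductId).
-- After the 2NF decomposition, the partial dependency lives in its own
-- relation together with a copy of its determinant:
CREATE TABLE Product (
    ProductId   INTEGER PRIMARY KEY,
    ProductName TEXT);
CREATE TABLE OrderLine (
    OrderId   INTEGER,
    ProductId INTEGER REFERENCES Product(ProductId),
    Qty       INTEGER,
    PRIMARY KEY (OrderId, ProductId));
INSERT INTO Product VALUES (7, 'Widget');
INSERT INTO OrderLine VALUES (100, 7, 3), (101, 7, 5);
""")

# The product name is stored once, however many orders mention it,
# and a join reconstructs the original information without redundancy.
rows = conn.execute("""
    SELECT o.OrderId, p.ProductName, o.Qty
    FROM OrderLine o JOIN Product p ON o.ProductId = p.ProductId
    ORDER BY o.OrderId""").fetchall()
```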
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
One of these conditions must hold for a relation to be in fourth normal form:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrongrsquos Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}
We want to show Street Zip → City Street Zip
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}
Show that AB → GH is derived by F:
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
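A derivation like the one above can also be checked mechanically with the standard attribute-closure algorithm: AB → GH holds if and only if G and H appear in the closure of AB under F. This is a short sketch in Python, using the FDs from the Maier example:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    Each FD is a (determinant, dependent) pair of attribute strings.
    Repeatedly apply any FD whose left side is already in the result
    until no new attributes can be added.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the Maier example above, written as (determinant, dependent) pairs.
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]

ab_closure = closure("AB", F)
# AB -> GH holds iff both G and H are in the closure of AB.
```

Running this yields the closure {A, B, E, G, I, J, H}, which contains both G and H, confirming step 12 of the derivation.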
Significance in Relational Database design: A relational database is a database structure, commonly used in GIS, in
which data is stored based on two-dimensional tables, where multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System is a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in a great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases; the model was proposed by Dr. Codd in 1970
• It is the basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that
is locked by the other. It can be addressed in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways to break a deadlock
after it happens. One way to prevent deadlocks is to require a transaction to request
all necessary locks at one time, ensuring it gains access to everything it needs or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a fixed order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
to resolve it, the DBMS must select a transaction to cancel and revert that entire
transaction until the resources required become available, allowing one transaction to
complete while the other is reprocessed at a later time.
The meaning of the expression ACID transaction:
ACID stands for Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic, that is, it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system
failures.
The purpose of transaction isolation levels:
Transaction isolation levels affect how the database operates while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I am processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any change in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
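The resource-access-order rule mentioned in the deadlock answer above can be sketched with two threads that always acquire their locks in one fixed global order (ordering by object id here is an illustrative choice), so a cycle of mutual waits can never form even though the two "transactions" name the locks in opposite order:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def transfer(src_lock, dst_lock, action):
    # Deadlock avoidance: always acquire locks in a fixed global order
    # (here, by id), so no two transactions can wait on each other cyclically.
    first, second = sorted((src_lock, dst_lock), key=id)
    with first:
        with second:
            action()

results = []
# The two threads request the same pair of locks in opposite order,
# which without the ordering rule could deadlock.
t1 = threading.Thread(target=transfer,
                      args=(lock_a, lock_b, lambda: results.append("t1")))
t2 = threading.Thread(target=transfer,
                      args=(lock_b, lock_a, lambda: results.append("t2")))
t1.start(); t2.start()
t1.join(); t2.join()
```

Both threads run to completion because the lock order is globally consistent.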
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
• Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
• Shared/exclusive: this type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
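The two-phase rule itself (no lock may be acquired after the first release) can be sketched as a tiny lock-tracking class. This is a hypothetical illustration of the rule, not a real DBMS API:

```python
class TwoPhaseTransaction:
    """Sketch of the two-phase rule: once any lock is released
    (the shrinking phase begins), no further lock may be acquired."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: cannot lock after first unlock")
        self.held.add(item)          # growing phase: acquire freely

    def unlock(self, item):
        self.shrinking = True        # first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseTransaction()
t.lock("x")
t.lock("y")        # growing phase: both locks acquired
t.unlock("x")      # shrinking phase begins
try:
    t.lock("z")    # violates the two-phase rule
    violated = False
except RuntimeError:
    violated = True
```

Any schedule in which every transaction obeys this rule is conflict-serializable, which is why the protocol matters.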
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it. Strict-2PL holds all the locks until the commit point and releases all the locks
at one time.
Strict-2PL does not have cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it and the ordering is determined by the age
of the transaction A transaction created at 0002 clock time would be older than all other
transactions that come after it For example any transaction y entering the system at 0004 is
two seconds younger and the priority would be given to the older one
In addition every data item is given the latest read and write-timestamp This lets the system
know when the last lsquoread and writersquo operation was performed on the data item
OR
(b) Explain database security mechanisms8
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example, to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of table rows that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts which mostly occur with a multi-
user system It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of respective databases
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client. At one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
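The account-transfer example can be sketched with SQLite, whose connection context manager wraps the two updates in one atomic transaction that either commits both or rolls both back. The account names and the simulated mid-transfer failure are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (Id TEXT PRIMARY KEY, Balance INTEGER)")
conn.executemany("INSERT INTO Account VALUES (?, ?)",
                 [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Withdraw from src and deposit to dst atomically."""
    try:
        with conn:  # sqlite3 context manager = one transaction
            conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Id = ?",
                         (amount, src))
            conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Id = ?",
                         (amount, dst))
            # Simulate a failure after both updates but before commit:
            if amount > 100:
                raise RuntimeError("simulated crash mid-transfer")
    except RuntimeError:
        pass  # the 'with' block has already rolled both updates back

transfer(conn, "A", "B", 30)    # succeeds: both updates commit
transfer(conn, "A", "B", 999)   # fails: neither update survives
balances = dict(conn.execute("SELECT Id, Balance FROM Account"))
```

After the failed transfer the balances are exactly as the first transfer left them, and no money is lost or created.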
(B) Give the three-level architecture proposal for DBMS
Ans Objective of three level architecture proposal for DBMS
All users should be able to access same data
A users view is immune to changes made in other views
Users should not need to know physical database storage details
DBA should be able to change database storage structures without affecting the users views
Internal structure of database should be unaffected by changes to physical aspects of storage
DBA should be able to change conceptual structure of database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
Above three points are explain in detail given bellow-
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database A query language is a
combination of three subordinate language
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
So that objective of three level of architecture proposal for DBMS are suitable explain in
above
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert, update, delete,
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute a query by
the query optimizer and then sent to the data manager.
3 Data Manager - The data manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the data manager are -
It converts operations in users' queries, coming from the application programs or from the
DML compiler and query optimizer (together known as the query processor), from the user's
logical view to the physical file system.
It controls access to the DBMS information that is stored on disk.
It also controls the handling of buffers in main memory.
It also enforces constraints to maintain the consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4 Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities
and their access rights.
6 Usage statistics, such as frequencies of queries and transactions.
The data dictionary is used to actually control data integrity, database operation
and accuracy, and it may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the results of every design phase and the design decisions.
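As a small illustration of the idea (my own sketch, not part of the original answer), SQLite exposes its data dictionary as the read-only catalog table sqlite_master, which can be queried like any other table; the table name below is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

# Query the catalog the way a DBA would consult a data dictionary
names = [row[0] for row in
         conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(names)  # ['course']
```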
5 Data Files - These contain the data portion of the database.
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users - Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database - in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of responses is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users - These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers - Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator - Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of the physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, as such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources being needlessly used.
• Great difficulty in combining information.
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in the
database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work as the most
important, and therefore its needs as the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may thus become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as per the needs of particular
applications, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or their designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes - An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship - A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many (1 : M)
Many to one (M : 1)
Many to many (M : N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
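One common way to map such an entity to a relation is to flatten the composite attributes into simple columns. The sketch below (my own illustration using Python's sqlite3; the exact column layout is an assumption, not from the paper) shows the Customer entity as a table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,                    -- key attribute (underlined)
        first_name TEXT, middle_name TEXT, last_name TEXT,  -- composite attribute: name
        phone_number TEXT,
        date_of_birth TEXT,
        city TEXT, state TEXT, zip_code TEXT,               -- composite attribute: address
        street_name TEXT, street_number TEXT, apartment_number TEXT  -- composite: street
    )
""")
# Inspect the flattened columns via the catalog
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(len(cols))  # 12
```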
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans In sequential files, index sequential files and direct files we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
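The "stud_name" example above can be sketched concretely (my own illustration, using Python's sqlite3; names and roll numbers are invented). A secondary index on the non-unique attribute lets one key value retrieve several records:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, stud_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, 'Amit'), (2, 'Neha'), (3, 'Amit')])

# Secondary index on the non-unique attribute stud_name
conn.execute("CREATE INDEX idx_stud_name ON student(stud_name)")

# Secondary key retrieval: several records satisfy one key value
matches = conn.execute(
    "SELECT roll_no FROM student WHERE stud_name = 'Amit' ORDER BY roll_no"
).fetchall()
print(matches)  # [(1,), (3,)]
```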
(D) Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A,B,C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables.
Another way of expressing this is: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one value you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
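The decomposition just described can be tried out concretely. The sketch below (my own illustration using Python's sqlite3) projects the sample Buying table onto the three pairwise tables and joins them back on the common keys; for this sample data the three-way join reproduces exactly the original five rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT)")
rows = [('Sally', 'Liz Claiborne', 'Blouses'),
        ('Mary',  'Liz Claiborne', 'Blouses'),
        ('Sally', 'Jordach', 'Jeans'),
        ('Mary',  'Jordach', 'Jeans'),
        ('Sally', 'Jordach', 'Sneakers')]
cur.executemany("INSERT INTO buying VALUES (?, ?, ?)", rows)

# Project onto the three pairwise tables
cur.execute("CREATE TABLE bv AS SELECT DISTINCT buyer, vendor FROM buying")
cur.execute("CREATE TABLE bi AS SELECT DISTINCT buyer, item  FROM buying")
cur.execute("CREATE TABLE vi AS SELECT DISTINCT vendor, item FROM buying")

# Join them back on the common keys
joined = cur.execute("""
    SELECT DISTINCT bv.buyer, bv.vendor, vi.item
    FROM bv JOIN vi ON bv.vendor = vi.vendor
            JOIN bi ON bi.buyer = bv.buyer AND bi.item = vi.item
""").fetchall()
print(len(joined))  # 5 - the original table is recovered losslessly
```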
(B) Explain the architecture of an IMS System
Ans Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Figure: IMS system structure - each application (A, B) is written in a host language plus DL/I, has a PSB consisting of PCBs, and the IMS control program maps these through the DBDs to the stored databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined, together with its mapping to storage, by a database description (DBD). The set of all
DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description) - Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
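A rough sketch of the hierarchy such a DBD defines (using Python dicts rather than real IMS storage; the field values are invented) shows COURSE as the root segment with PREREQ and OFFERING children, and how a pre-order walk visits parent segments before their children, as hierarchical access does:

```python
# One database record occurrence, as nested dicts (values invented)
course = {
    "segment": "COURSE", "COURSE#": "M23", "TITLE": "Dynamics",
    "children": [
        {"segment": "PREREQ", "COURSE#": "M16", "TITLE": "Trigonometry",
         "children": []},
        {"segment": "OFFERING", "DATE": "730813", "LOCATION": "OSLO",
         "children": [
             {"segment": "STUDENT", "EMP#": "864173", "NAME": "Hansen",
              "children": []},
         ]},
    ],
}

def segments(node):
    """Pre-order walk: each parent segment is visited before its children."""
    yield node["segment"]
    for child in node["children"]:
        yield from segments(child)

order = list(segments(course))
print(order)  # ['COURSE', 'PREREQ', 'OFFERING', 'STUDENT']
```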
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block) - Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block) - The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT - The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency - The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key - A possible key. Each non-key field is functionally dependent on every candidate
key, and no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in normalization:
They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and has the property that every functional dependency in Y is implied by the
functional dependencies in X.
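The definition above can be checked mechanically: X → Y holds in a relation iff no two tuples agree on X but disagree on Y. The helper below is my own sketch (function name and sample data invented, not from the text):

```python
def fd_holds(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # two tuples agree on lhs but disagree on rhs
        seen[x] = y
    return True

staff = [
    {"staff_no": "S1", "branch": "B1", "position": "Manager"},
    {"staff_no": "S2", "branch": "B1", "position": "Assistant"},
    {"staff_no": "S3", "branch": "B2", "position": "Manager"},
]
print(fd_holds(staff, ["staff_no"], ["branch"]))  # True: staff_no determines branch
print(fd_holds(staff, ["position"], ["branch"]))  # False: Manager maps to B1 and B2
```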
(D) Explain 4 NF with examples
Ans Normalization - The process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional
dependencies between its attributes.
It is often executed as a series of steps, where each step corresponds to a specific normal form with
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multi-valued dependencies are functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
Either of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses
multivalued dependencies.
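A small illustration (my own sketch; the course/teacher/text data is invented, not from the paper): when COURSE →→ TEACHER and COURSE →→ TEXT hold, teachers and texts vary independently, so the table is split into two projections whose natural join reconstructs the original rows:

```python
# Relation with two independent multivalued facts about each course
ctx = [("Physics", "Green", "Mechanics"),
       ("Physics", "Green", "Optics"),
       ("Physics", "Brown", "Mechanics"),
       ("Physics", "Brown", "Optics")]

# 4NF decomposition: one table per multivalued dependency
course_teacher = sorted({(c, t) for c, t, _ in ctx})
course_text    = sorted({(c, b) for c, _, b in ctx})
print(course_teacher)  # [('Physics', 'Brown'), ('Physics', 'Green')]
print(course_text)     # [('Physics', 'Mechanics'), ('Physics', 'Optics')]

# The natural join of the two projections reconstructs the original rows
rejoined = sorted((c, t, b) for c, t in course_teacher
                  for c2, b in course_text if c == c2)
print(rejoined == sorted(ctx))  # True - the decomposition is lossless
```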
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions, account information entries, etc.
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can
recover to a log mark.
This model logs CREATE INDEX operations; recovery from a transaction log backup that includes index
creations is done at a faster pace, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans A deadlock occurs when a set of transactions wait for one another in a circular chain, each holding a lock that another transaction in the chain needs, so that none of them can proceed. In a distributed system the transactions and the data items they lock may reside at different sites, so the circular wait can span several sites, and no single site can see the whole cycle in its local wait-for graph. Distributed deadlocks are therefore handled by detection (constructing a global wait-for graph, either at a central coordinator or by passing probe messages between sites, and aborting a victim transaction when a cycle is found), by prevention schemes such as the timestamp-based wound-wait and wait-die protocols, or simply by timeouts.
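As a hedged illustration of distributed deadlock detection (all transaction and site names are invented), each site's local wait-for graph can be acyclic while the merged global graph contains a cycle, which is why detection needs a global view:

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: [txns it waits for]}."""
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wait_for.get(t, ()):
            if u in on_stack or (u not in visited and dfs(u)):
                return True
        on_stack.discard(t)
        return False

    return any(dfs(t) for t in wait_for if t not in visited)

# Site 1 sees only T1 waiting for T2; site 2 sees only T2 waiting for T1.
site1 = {"T1": ["T2"]}
site2 = {"T2": ["T1"]}
merged = {"T1": ["T2"], "T2": ["T1"]}  # the global wait-for graph

# Neither local graph shows the cycle, but the global graph does
print(has_cycle(site1), has_cycle(site2), has_cycle(merged))  # False False True
```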
Some well-known DBMSs are Microsoft SQL Server Microsoft Access Oracle SAP and
others
Components of DBMS
DBMS have several components each performing very significant tasks in the database
management system environment Below is a list of components within the database and its
environment
Software
This is the set of programs used to control and manage the overall database This includes the
DBMS software itself the Operating System the network software being used to share the data
among users and the application programs used to access data in the DBMS
Hardware
Consists of a set of physical electronic devices such as computers, I/O devices, storage devices,
etc.; this provides the interface between the computers and the real-world systems.
Data
DBMS exists to collect, store, process and access data, the most important component. The
database contains both the actual (or operational) data and the metadata.
Procedures
These are the instructions and rules that assist in how to use the DBMS, and in designing and
running the database, using documented procedures to guide the users that operate and manage
it.
Database Access Language
This is used to access the data to and from the database: to enter new data, update existing data,
or retrieve required data from the database. The user writes a set of appropriate commands in a
database access language and submits these to the DBMS, which then processes the data and
generates and displays a set of results in a user-readable form.
Query Processor
This transforms the user queries into a series of low-level instructions. It reads the online
user's query and translates it into an efficient series of operations, in a form capable of being sent
to the run-time data manager for execution.
Run Time Database Manager
Sometimes referred to as the database control system, this is the central software component of
the DBMS that interfaces with user-submitted application programs and queries, and handles
database access at run time. Its function is to convert the operations in users' queries. It provides
control to maintain the consistency, integrity and security of the data.
Data Manager
Also called the cache manager, this is responsible for handling data in the database, providing
recovery to the system and allowing it to recover the data after a failure.
Database Engine
The core service for storing, processing and securing data; this provides controlled access and
rapid transaction processing to address the requirements of the most demanding data-consuming
applications. It is often used to create relational databases for online transaction processing or
online analytical processing.
Data Dictionary
This is a reserved space within a database used to store information about the database itself. A
data dictionary is a set of read-only tables and views containing information about the data used
in the enterprise, ensuring that the database representation of the data follows one standard as
defined in the dictionary.
Report Writer
Also referred to as the report generator, it is a program that extracts information from one or
more files and presents the information in a specified format. Most report writers allow the user
to select records that meet certain conditions, to display selected fields in rows and columns,
or to format the data into different charts.
OR
(C) Explain the three level architecture proposal for DBMS.
In the previous tutorial we have seen the DBMS architectures - one-tier, two-tier and three-tier. In
this guide we will discuss the three level DBMS architecture in detail.
DBMS Three Level Architecture Diagram
This architecture has three levels
1 External level
2 Conceptual level
3 Internal level
1 External level
It is also called the view level. This level is called the "view" level because several users can view their desired data from it; the data is internally fetched from the database with the help of the conceptual and internal level mappings.
The user does not need to know database schema details such as the data structure, table definitions, etc.; the user is concerned only with the data, which is returned to the view level after it has been fetched from the database (present at the internal level).
The external level is the top level of the three-level DBMS architecture.
2 Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among the data, the schema of the data, etc., is described at this level.
Database constraints and security are also implemented at this level of the architecture. This level is maintained by the DBA (database administrator).
3 Internal level
This level is also known as the physical level. It describes how the data is actually stored on the storage devices and is also responsible for allocating space to the data. This is the lowest level of the architecture.
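One way the external level shows up in practice is as SQL views defined over the conceptual schema. The sketch below is only an illustration of that idea; the table, view and column names are invented, not part of the question.

```python
# Sketch: the external level realized as a SQL view over one conceptual schema.
# All names here are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full logical schema of the enterprise.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'HR', 30000), (2, 'Ravi', 'IT', 45000)")
# External level: this user group sees only names and departments, never salaries.
conn.execute("CREATE VIEW hr_view AS SELECT name, dept FROM employee")
rows = conn.execute("SELECT * FROM hr_view ORDER BY name").fetchall()
print(rows)  # [('Asha', 'HR'), ('Ravi', 'IT')]
```

The internal level (how SQLite lays the rows out in pages) stays hidden from both the view and the base-table user.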
(d) Explain
(i) Data Independence
o Data independence can be explained using the three-schema architecture
o Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level
There are two types of data independence
1 Logical Data Independence
o Logical data independence refers to the characteristic of being able to change the conceptual schema without having to change the external schema
o Logical data independence is used to separate the external level from the conceptual view
o If we make changes in the conceptual view of the data, the user's view of the data will not be affected
o Logical data independence occurs at the user interface level
2 Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the conceptual schema
o If we make changes in the storage structures of the database system server, the conceptual structure of the database will not be affected
o Physical data independence is used to separate the conceptual level from the internal level
o Physical data independence occurs at the logical interface level
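Logical data independence can be demonstrated concretely: a change to the conceptual schema (adding a column) leaves an existing external view working unchanged. This is a minimal sketch with invented names.

```python
# Sketch: logical data independence - the conceptual schema changes,
# but an existing external view keeps working. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")
conn.execute("INSERT INTO student (roll, name) VALUES (1, 'Meera')")
# Conceptual schema changes: a new column is added ...
conn.execute("ALTER TABLE student ADD COLUMN phone TEXT")
# ... but the external view is unaffected by the change.
names = conn.execute("SELECT name FROM student_names").fetchall()
print(names)  # [('Meera',)]
```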
(ii) Data Integration
Ans
Data integration involves combining data residing in different sources and providing users with a unified view of them [1]. This process becomes significant in a variety of situations, which include both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example) domains. Data integration appears with increasing frequency as the volume (that is, big data [2]) and the need to share existing data explode [3]. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users.
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process extracts information from the source databases, transforms it and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s computer scientists began designing systems for interoperability of heterogeneous databases [4]. The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991 for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms and loads data from heterogeneous sources into a single view schema so that data from different sources become compatible [5]. By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries [6].
The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization.
Difficulties also arise in constructing data warehouses when one has only a query interface to
summary data sources and no access to the full data This problem frequently emerges when
integrating several commercial query services like travel or classified advertisement web
applications
As of 2009, the trend in data integration favored loosening the coupling between data and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from the original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schema of the original sources, and on transforming a query into specialized queries to match the schema of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the Global As View (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the Local As View (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like earnings, inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts; this approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking of the similarities, computed from different data sources, on a single criterion such as positive predictive value. This enables the data sources to be directly comparable, and they can be integrated even when the natures of the experiments are distinct [7].
As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology that results in the development of disparate data models; disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models [8]. One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationship that relates the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) enterprise data warehouses. Since 2013, data lake approaches have risen to the level of data hubs (see the three search terms' popularity on Google Trends [9]). These approaches combine unstructured or varied data into one location but do not necessarily require an (often complex) master relational schema to structure and define all data in the hub.
Q2
EITHER
(a) Explain E-R Model with suitable example
Ans The E-R (entity-relationship) model is a "top-down" approach: it allows us to describe how data is used in a real-world enterprise. Designing an E-R model is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the design should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes. Many notation methods exist; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. An entity type is a category; an entity, strictly speaking, is an instance of a given entity type, and there are usually many instances of an entity type. Because the term entity type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes An attribute is a characteristic of an entity. A Student entity's attributes include student ID, student name, address, etc.
Attributes are of various types
Simple/Single attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number)
An entity-relationship diagram displays the relationships of the entity sets stored in a database. In other words, we can say that ER diagrams help you to explain the logical structure of databases. At first look an ER diagram looks very similar to a flowchart; however, an ER diagram includes many specialized symbols whose meanings make this model unique.
Sample ER Diagram
Facts about the ER Diagram Model
o The ER model allows you to draw a database design
o It is an easy-to-use graphical tool for modeling data
o It is widely used in database design
o It is a GUI representation of the logical structure of a database
o It helps you to identify the entities which exist in a system and the relationships between those entities
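The Customer entity from question (b) can be mapped to a relational table by flattening its composite attributes (name, address, street) into plain columns. The sketch below shows one common mapping; the column types are assumptions for illustration only.

```python
# Sketch: one way to map the Customer entity of question (b) to a relational
# table - composite attributes are flattened into columns. Types are assumed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    (101, 'John', 'Doe'),
)
row = conn.execute("SELECT customer_id, first_name, last_name FROM customer").fetchone()
print(row)  # (101, 'John', 'Doe')
```

A multivalued attribute (e.g. several phone numbers) would instead go into its own table keyed by customer_id.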
(b) Differentiate between the Network and Hierarchical data models in DBMS.
Ans Hierarchical model
1 One-to-many or one-to-one relationships
2 Based on the parent-child relationship
3 Retrieval algorithms are complex and asymmetric
4 More data redundancy
Network model
1 Many-to-many relationships
2 Many parents as well as many children
3 Retrieval algorithms are complex and symmetric
4 Less data redundancy than the hierarchical model
Relational model
1 One-to-one, one-to-many and many-to-many relationships
2 Based on relational data structures
3 Retrieval algorithms are simple and symmetric
4 Less data redundancy
OR
(c) Draw an E-R diagram for a Library Management System.
Ans
(d) State advantages and disadvantages of the following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1 A sequential file is designed for efficient processing of records in sorted order on some
search key
o Records are chained together by pointers to permit fast retrieval in search key
order
o Pointer points to next record in order
o Records are stored physically in search key order (or as close to this as possible)
o This minimizes number of block accesses
o Figure 10.15 shows an example with bname as the search key
2 It is difficult to maintain physical sequential order as records are inserted and deleted
o Deletion can be managed with the pointer chains
o Insertion poses problems if there is no space where the new record should go
o If there is space, use it; else put the new record in an overflow block
o Adjust pointers accordingly
o Figure 10.16 shows the previous example after an insertion
o Problem: we now have some records out of physical sequential order
o If very few records in overflow blocks this will work well
o If order is lost reorganize the file
o Reorganizations are expensive and done when system load is low
3 If insertions rarely occur we could keep the file in physically sorted order and reorganize
when insertion occurs In this case the pointer fields are no longer required
The Sequential File
Fixed format used for records
Records are the same length
All fields the same (order and length)
Field names and lengths are attributes of the file
One field is the key field
Uniquely identifies the record
Records are stored in key sequence
The Sequential File
New records are placed in a log file or transaction file
Batch update is performed to merge the log file with the master file
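The batch update just described, merging a sorted log (transaction) file into the key-ordered master file, is an ordinary two-way merge. A minimal sketch, with invented data:

```python
# Sketch of the batch update described above: new records accumulate in a
# log (transaction) file and are later merged with the key-ordered master file.
def merge_master(master, log):
    """Merge two lists of (key, record) pairs, each sorted by key."""
    result, i, j = [], 0, 0
    while i < len(master) and j < len(log):
        if master[i][0] <= log[j][0]:
            result.append(master[i]); i += 1
        else:
            result.append(log[j]); j += 1
    return result + master[i:] + log[j:]

master = [(10, 'A'), (20, 'B'), (40, 'D')]
log = [(15, 'X'), (30, 'Y')]   # records inserted since the last reorganization
merged = merge_master(master, log)
print(merged)  # [(10, 'A'), (15, 'X'), (20, 'B'), (30, 'Y'), (40, 'D')]
```

Because each list is read sequentially exactly once, the merge is cheap even for large files, which is why it is scheduled as a batch job.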
(ii) Direct file
Direct Access File System (DAFS) is a network file system, similar to Network File System (NFS) and Common Internet File System (CIFS), that allows applications to transfer data while bypassing operating system control, buffering and network protocol operations that can bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying transport mechanism. Using VI hardware, an application transfers data to and from application buffers without using the operating system, which frees up the processor and operating system for other processes and allows files to be accessed by servers using several different operating systems. DAFS is designed and optimized for clustered, shared-file network environments that are commonly used for Internet, e-commerce and database applications. DAFS is optimized for high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI, including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.
When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
Relational Calculus
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true; the calculus is based on the use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10000:
{S | Staff(S) ∧ S.salary > 10000}
To retrieve a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
Tuple Relational Calculus
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S that is located in London'.
Tuple Relational Calculus
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris'.
We can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an address in Paris'.
Tuple Relational Calculus
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2 and negation ~F1
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25000:
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
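A tuple-calculus expression such as {S | Staff(S) ∧ S.salary > 10000} reads naturally as a comprehension over the Staff relation. The sketch below is only an analogy with invented data, not part of the calculus itself.

```python
# Sketch: tuple-calculus queries read as comprehensions over a relation.
# The Staff data here is invented for illustration.
from collections import namedtuple

Staff = namedtuple('Staff', ['staffNo', 'name', 'salary'])
staff = [Staff('S1', 'Ann', 12000), Staff('S2', 'Ben', 9000), Staff('S3', 'Cal', 15000)]

# {S | Staff(S) ∧ S.salary > 10000}: whole tuples satisfying the predicate.
high_paid = [s for s in staff if s.salary > 10000]
# {S.salary | Staff(S) ∧ S.salary > 10000}: project out one attribute.
salaries = [s.salary for s in staff if s.salary > 10000]
print([s.name for s in high_paid])  # ['Ann', 'Cal']
print(salaries)                     # [12000, 15000]
```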
Data Manipulations in SQL
Select, Update, Delete, Insert statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL join: multiple-table queries
Set manipulation:
Any, In, Contains, All, Not In, Not Contains, Exists, Union, Minus, Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement
Data must be entered later using INSERT
CREATE TABLE S ( SNO CHAR(5),
SNAME CHAR(20),
STATUS DECIMAL(3),
CITY CHAR(15),
PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified
Columns which are defined as primary keys will never have two rows with the same key
value
A primary key may consist of more than one column (values unique in combination); this is called a composite key
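A composite key can be demonstrated with a supplier-part table, where only the combination of the two columns must be unique. This is a sketch; the SP table and its data are illustrative.

```python
# Sketch: a composite primary key (unique in combination), as described above.
# The supplier-part table SP and its data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SP (
        SNO CHAR(5),
        PNO CHAR(6),
        QTY DECIMAL(9),
        PRIMARY KEY (SNO, PNO)
    )
""")
conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")
conn.execute("INSERT INTO SP VALUES ('S1', 'P2', 200)")      # same SNO, new PNO: allowed
try:
    conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 100)")  # duplicate combination: rejected
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```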
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language [1]. Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL) which is
used to retrieve and manipulate data in a relational database[2] Other forms of DML are those
used by IMSDLI CODASYL databases such as IDMS and others
In SQL the data manipulation language comprises the SQL-data change statements[3] which
modify stored data but not the schema or database objects Manipulation of persistent database
objects eg tables or stored procedures via the SQL schema statements[3] rather than the data
stored within them is considered to be part of a separate data definition language (DDL) In SQL
these two categories are similar in their detailed syntax data types expressions etc but distinct
in their overall function[3]
The SQL-data change statements are a subset of the SQL-data statements; this set also contains the SELECT query statement [3], which, strictly speaking, is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of the DML [4], so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT ... FROM ... WHERE (strictly speaking, DQL)
SELECT ... INTO
INSERT INTO ... VALUES
UPDATE ... SET ... WHERE
DELETE FROM ... WHERE
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
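The four verbs listed above can be run end to end against a throwaway table. This sketch reuses the employees example; the extra column values are invented.

```python
# Sketch: the DML statements listed above, run against a throwaway table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")
# INSERT INTO ... VALUES
conn.execute("INSERT INTO employees (first_name, last_name, fname) "
             "VALUES ('John', 'Capita', 'xcapit00')")
# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = 'Smith' WHERE fname = 'xcapit00'")
# SELECT ... FROM ... WHERE (strictly speaking, DQL)
row = conn.execute("SELECT first_name, last_name FROM employees "
                   "WHERE fname = 'xcapit00'").fetchone()
print(row)  # ('John', 'Smith')
# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count)  # 0
```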
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are already applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules but are pertinent to database designs are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is to give each row a unique identity so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition on the input relation(s), and then only the selected rows are used in the cross product to be merged and included in the output. In a normal cross product all the rows of one relation are mapped/merged with all the rows of the second relation, but here only the selected rows of a relation take part in the cross product with the second relation.
If R and S are two relations, then θ is the condition which is applied for the select operation on one relation, and then only the selected rows form a cross product with all the rows of the second relation. For example, given the two relations FACULTY and COURSE, we first apply the select operation on the FACULTY relation to select certain specific rows, and then these rows form a cross product with the COURSE relation; this is the difference between the cross product and the theta join. Examining both relations, their different attributes, and finally the cross product after carrying out the select operation makes the difference between the cross product and the theta join clear.
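The FACULTY/COURSE example can be sketched in a few lines: the theta join is simply the cross product filtered by the join condition. The data and attribute layout below are invented for illustration.

```python
# Sketch: theta join as a selection over the cross product, using invented
# FACULTY and COURSE relations as in the example above.
faculty = [(1, 'Khan'), (2, 'Rao')]          # (fac_id, name)
course = [('C1', 1), ('C2', 1), ('C3', 2)]   # (course_id, fac_id)

# Cross product: every FACULTY row paired with every COURSE row ...
cross = [(f, c) for f in faculty for c in course]
# ... while the theta join keeps only pairs satisfying fac_id = fac_id.
theta = [(f, c) for f in faculty for c in course if f[0] == c[1]]
print(len(cross))  # 6
print(theta)
```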
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used it must reference a valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records; otherwise we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that would result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
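All three rules can be seen in action with a foreign key constraint. The sketch below uses the CompanyId example from above with invented table contents; note that SQLite only enforces foreign keys when the pragma is switched on.

```python
# Sketch: a foreign key enforcing the three referential-integrity rules above.
# SQLite needs enforcement enabled explicitly; data is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE company (CompanyId INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE product (
        ProductId INTEGER PRIMARY KEY,
        CompanyId INTEGER REFERENCES company(CompanyId)
    )
""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")        # valid parent: allowed
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")    # no such company: orphan rejected
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False
try:
    conn.execute("DELETE FROM company WHERE CompanyId = 15")  # would orphan product 1
    delete_allowed = True
except sqlite3.IntegrityError:
    delete_allowed = False
print(orphan_allowed, delete_allowed)  # False False
```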
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain the following in detail with examples:
(i) Domain
Ans Definition: The domain of a database attribute is the set of all allowable values that attribute may assume.
Examples
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type and allowed to have one of two known code values: M for male, F for female, and NULL for records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel value). The data domain for the gender column is {M, F}.
In a normalized data model the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value (excluding NULL). Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check constraint or, in more complex cases, in a database trigger. For example, a column requiring positive numeric values may have a check constraint declaring that the values must be greater than zero.
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
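The gender domain {M, F} described above can be enforced directly with a check constraint. A minimal sketch with invented names:

```python
# Sketch: enforcing the gender domain {M, F} with a CHECK constraint,
# as described above. Table and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name   TEXT,
        gender TEXT CHECK (gender IN ('M', 'F'))  -- NULL still passes, for 'unknown'
    )
""")
conn.execute("INSERT INTO person VALUES ('Asha', 'F')")
conn.execute("INSERT INTO person VALUES ('Unknown', NULL)")  # NULL passes the check
try:
    conn.execute("INSERT INTO person VALUES ('Bad', 'X')")   # outside the domain
    out_of_domain = True
except sqlite3.IntegrityError:
    out_of_domain = False
print(out_of_domain)  # False
```

A CHECK condition that evaluates to NULL is treated as satisfied, which is why the NULL row is accepted; a NOT NULL constraint would be needed to forbid it.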
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1 one-to-one (1:1)
2 one-to-many (1:M)
3 many-to-many (M:N)
The last is written M:N rather than M:M because the two sides need not have the same number of occurrences.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; if one does occur, you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a team of many employees.
Therefore there is a many-to-many relationship between employee and project.
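In a relational design the employee-project M:N relationship is usually resolved into two 1:M relationships through a junction (link) table. This is a sketch with invented names and data.

```python
# Sketch: resolving the many-to-many employee-project relationship with a
# junction (link) table, as is usual in relational design. Names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- The M:N relationship becomes two 1:M relationships via works_on.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")
# Asha works on two projects; project 10 has two employees.
pairs = conn.execute("""
    SELECT e.name, p.title FROM employee e
    JOIN works_on w ON w.emp_id = e.emp_id
    JOIN project p ON p.proj_id = w.proj_id
    ORDER BY e.name, p.title
""").fetchall()
print(pairs)  # [('Asha', 'Payroll'), ('Asha', 'Website'), ('Ravi', 'Payroll')]
```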
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG is intended to meet the requirements of many distinct programming languages, not just COBOL; the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
The DBTG proposal is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of a conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the figure. It can be divided into three different levels, like the architecture of a database system:
o Storage Schema (corresponds to the Internal View of the database)
o Schema (corresponds to the Conceptual View of the database)
o Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data items they contain and the sets into which they are grouped. (Here logical record types are referred to as record types; the fields in a logical record format are called data items.)
(m) Subschema
(n) The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item, and set are excluded.
(o) In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema using the COBOL Data Base Facility; for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each type of
data-item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table, transform data from the information source (e.g. a form)
into table format with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table)
Or by:
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
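The second option above (splitting the repeating group into a separate relation with a copy of the key) can be sketched in Python; the Student/Course data and names below are hypothetical illustrations, not from the source:

```python
# Unnormalized table: one row per student, with a repeating group of courses.
unf = [
    {"student_id": 1, "name": "Asha", "courses": ["DBMS", "OS"]},
    {"student_id": 2, "name": "Ravi", "courses": ["DBMS"]},
]

# 1NF: move the repeating group into a separate relation,
# carrying a copy of the original key (student_id).
students = [{"student_id": r["student_id"], "name": r["name"]} for r in unf]
enrolments = [
    {"student_id": r["student_id"], "course": c}
    for r in unf
    for c in r["courses"]
]

# Every row/column intersection now holds a single atomic value.
print(enrolments)  # one row per (student, course) pair
```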
Second Normal Form (2NF)
Based on concept of full functional dependency
Let A and B be attributes of a relation.
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies on the primary key exist, remove them by placing them in a new
relation along with a copy of their determinant.
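As a sketch of these steps (the Order/Product relation and its FDs are hypothetical examples, not from the source): with composite key {order_id, product_id}, the FD product_id → product_name is partial, so it is moved to a new relation together with a copy of its determinant:

```python
# 1NF relation with composite key (order_id, product_id).
# FD: product_id -> product_name depends on only part of the key (partial).
order_lines = [
    {"order_id": 1, "product_id": "P1", "qty": 2, "product_name": "Pen"},
    {"order_id": 1, "product_id": "P2", "qty": 1, "product_name": "Pad"},
    {"order_id": 2, "product_id": "P1", "qty": 5, "product_name": "Pen"},
]

# 2NF: place the partially dependent attribute in a new relation
# along with a copy of its determinant (product_id).
products = {r["product_id"]: r["product_name"] for r in order_lines}
order_lines_2nf = [
    {"order_id": r["order_id"], "product_id": r["product_id"], "qty": r["qty"]}
    for r in order_lines
]

print(products)  # each product name now stored exactly once
```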
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
If A, B, and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF - a relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1 NF2: non-first normal form
2 1NF: R is in 1NF iff all domain values are atomic
3 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5 BCNF: R is in BCNF iff every determinant is a candidate key
6 Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in
4NF if and only if it is in BCNF and its multivalued dependencies are in fact functional
dependencies; 4NF thus removes the unwanted structures caused by multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
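A textbook-style illustration (the Course/Teacher/Book relation here is a hypothetical example): if the teachers and books of a course are independent of each other, the MVDs Course →→ Teacher and Course →→ Book force every teacher/book combination to be stored, and the 4NF decomposition splits the relation in two:

```python
from itertools import product

teachers = {"DBMS": ["Rao", "Iyer"]}
books = {"DBMS": ["Navathe", "Date"]}

# Single relation Course-Teacher-Book with MVDs Course ->> Teacher
# and Course ->> Book: every combination must appear (redundancy).
ctb = [
    (c, t, b)
    for c in teachers
    for t, b in product(teachers[c], books[c])
]

# 4NF decomposition: two relations, no spurious combinations stored.
course_teacher = [("DBMS", t) for t in teachers["DBMS"]]
course_book = [("DBMS", b) for b in books["DBMS"]]

print(len(ctb))  # 2 teachers x 2 books = 4 rows before decomposition
```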
(d) What are inference axioms? Explain their significance in relational
database design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule stating that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1 Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1 Zip → City – given
2 Street Zip → Street City – augmentation of (1) by Street
3 City Street → Zip – given
4 City Street → City Street Zip – augmentation of (3) by City Street
5 Street Zip → City Street Zip – transitivity of (2) and (4)
[From Maier]
1 Let R = (A, B, C, D, E, G, H, I, J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1 AB → E – given
2 AB → AB – reflexivity
3 AB → B – projectivity from (2)
4 AB → BE – additivity from (1) and (3)
5 BE → I – given
6 AB → I – transitivity from (4) and (5)
7 E → G – given
8 AB → G – transitivity from (1) and (7)
9 AB → GI – additivity from (6) and (8)
10 GI → H – given
11 AB → H – transitivity from (9) and (10)
12 AB → GH – additivity from (8) and (11)
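Derivations like the one above can be checked mechanically by computing the attribute closure X⁺ of the left-hand side: by Armstrong's axioms, X → Y holds iff Y ⊆ X⁺. A minimal sketch:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attributes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F from the Maier example: AB->E, AG->J, BE->I, E->G, GI->H
F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}),
     ({"B", "E"}, {"I"}), ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

print(closure({"A", "B"}, F))  # contains G and H, so AB -> GH holds
```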
Significance in relational database design: A relational database is a database structure, commonly used in GIS, in
which data is stored in two-dimensional tables and multiple relationships between data
elements can be defined and established in an ad hoc manner. A Relational Database Management
System is a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• The tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is a deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be dealt with in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
then, to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the required resources become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like a semi-complete state. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I am processing a change to the tax rate in my state; my store clerk should not be able
to read the total cost of a blue shirt, because the total-cost row is affected by any change in
the tax-rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks - A lock on a data item can be in two states: it is either locked or
unlocked.
Shared/exclusive locks - This type of locking mechanism differentiates the locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock: allowing more than one transaction to write the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it
only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction, and the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
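A minimal sketch of the two-phase discipline (a hypothetical single-transaction simulation; a real lock manager would also handle lock modes, blocking, and deadlock):

```python
class TwoPhaseTransaction:
    """Enforces 2PL: once any lock is released, no new lock may be acquired."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.locks.add(item)  # growing phase

    def unlock(self, item):
        self.shrinking = True  # shrinking phase begins
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock("A")
t.lock("B")    # growing phase: acquire all locks
t.unlock("A")  # first release: shrinking phase starts
# t.lock("C") would now raise: not allowed under 2PL
```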
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it: Strict-2PL holds all the locks until the commit point and releases them all
at once.
Strict-2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it; for example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
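The basic timestamp-ordering checks implied above can be sketched as follows (simplified; a real scheduler would also restart aborted transactions and might apply Thomas's write rule):

```python
class TimestampError(Exception):
    pass

class Item:
    def __init__(self):
        self.read_ts = 0   # latest read-timestamp
        self.write_ts = 0  # latest write-timestamp

def read(item, ts):
    # A transaction may not read a value already written by a younger one.
    if ts < item.write_ts:
        raise TimestampError("abort: transaction too old to read")
    item.read_ts = max(item.read_ts, ts)

def write(item, ts):
    # A transaction may not overwrite data already read or written
    # by a younger transaction.
    if ts < item.read_ts or ts < item.write_ts:
        raise TimestampError("abort: transaction too old to write")
    item.write_ts = ts

x = Item()
read(x, 5)   # transaction with timestamp 5 reads x
write(x, 7)  # younger transaction (7) writes x: allowed
# write(x, 3) would abort: a younger transaction already used x
```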
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or other information security professionals.
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of information technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties
Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation, and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze, and reuse
knowledge; for example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world: for example, to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of tables that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge-
base; representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext, and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes. Knowledge Management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without their conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
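The transfer example can be sketched with Python's built-in sqlite3 module, whose connection context manager commits the transaction on success and rolls it back on error (the account names and amounts are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Atomically move `amount` from src to dst, or leave both unchanged."""
    try:
        with conn:  # transaction: commit on success, rollback on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            bal = conn.execute("SELECT balance FROM account WHERE name = ?",
                               (src,)).fetchone()[0]
            if bal < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # whole transaction rolled back: money neither lost nor created

transfer(conn, "A", "B", 30)    # succeeds: A = 70, B = 80
transfer(conn, "A", "B", 1000)  # fails: balances left unchanged
```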
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access same data
A user's view is immune to changes made in other views
Users should not need to know physical database storage details
DBA should be able to change database storage structures without affecting the users views
Internal structure of database should be unaffected by changes to physical aspects of storage
DBA should be able to change conceptual structure of database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below -
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
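The first two sublanguages can be illustrated with Python's sqlite3 module (SQLite does not implement DCL statements such as GRANT/REVOKE, so that part appears only as a comment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
rows = conn.execute("SELECT name FROM student").fetchall()
print(rows)  # [('Asha',)]

# DCL (not supported by SQLite; syntax as used in server DBMSs):
#   GRANT SELECT ON student TO clerk;
```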
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus, the objectives of the three-level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete, and
retrieve, from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute the query by
the query optimizer and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
Converts operations in users' queries, coming from the application programs or the combination of
DML compiler and query optimizer (known as the query processor), from the user's logical view
to the physical file system.
Controls DBMS information access that is stored on disk.
Handles buffers in main memory.
Enforces constraints to maintain the consistency and integrity of the data.
Synchronizes the simultaneous operations performed by concurrent users.
Controls the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5 Access authorization - the description of database users, their responsibilities,
and their access rights.
6 Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation,
and accuracy; it may be used as an important part of the DBMS.
Importance of the Data Dictionary -
The Data Dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands, known as compiled DML.
7 End Users - The users of the database system can be classified in the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those to whom the type and range of responses is always indicated. Thus, even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structures and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It also implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling data redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating the same data in different files
• Time wasted in entering the same data again and again
• Needless use of computer resources
• Difficulty in combining information
2. Elimination of inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file; otherwise the data become inconsistent. This duplication of data across multiple files therefore needs to be removed to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Use of a DBMS should also allow users who do not know programming to interact with the data more easily, unlike a file processing system, where a programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, such changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to which parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. The organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's overall requirements and to balance the needs of the competing units, so it may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages developed for DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, files are more likely to be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Backup and recovery are provided - Centralizing a database provides schemes such as backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain the E-R model with a suitable example.
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process in which all business managers (or their designates) are involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world An entity may be a physical object such as a house or a car an event such as a house sale or a car service or a concept such as a customer transaction or order
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, middle_name, last_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
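The Customer entity above can be mapped to relational tables; the following sqlite3 sketch is illustrative (the table and column names follow the attributes listed, and the separate phone table is an assumption for the case where phone_number is treated as a multivalued attribute):

```python
import sqlite3

# Illustrative mapping of the Customer entity: composite attributes (name,
# address, street) are flattened into columns; a multivalued attribute such
# as several phone numbers goes into its own table keyed by customer_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT
);
CREATE TABLE customer_phone (          -- multivalued attribute
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Asha', 'Rao')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0101')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0102')")
phones = [r[0] for r in conn.execute(
    "SELECT phone_number FROM customer_phone "
    "WHERE customer_id = 1 ORDER BY phone_number")]
print(phones)  # two phone numbers for one customer
```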
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations we have considered retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
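A minimal sqlite3 sketch of the point above (table and column names are illustrative): stud_id is the primary key, while an index on stud_name acts as a secondary key under which several records can satisfy one key value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student "
             "(stud_id INTEGER PRIMARY KEY, stud_name TEXT, dept TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ravi", "MCA"), (2, "Priya", "MBA"), (3, "Ravi", "MCA")])
# Secondary index on a non-primary-key attribute:
conn.execute("CREATE INDEX idx_stud_name ON student(stud_name)")
rows = conn.execute("SELECT stud_id FROM student "
                    "WHERE stud_name = 'Ravi' ORDER BY stud_id").fetchall()
print(rows)  # multiple records match the one secondary-key value
```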
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot be non-loss decomposed any further into smaller tables.
Another way of expressing this is: each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one value you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and the vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
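The decomposition above can be checked in sqlite3 (a sketch using the sample data from the table): joining the three projections on their common keys reproduces exactly the original five rows, and recording "Claiborne sells Jeans" then needs only one new Vendor-Item row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buyer_vendor (buyer TEXT, vendor TEXT, PRIMARY KEY (buyer, vendor));
CREATE TABLE buyer_item   (buyer TEXT, item   TEXT, PRIMARY KEY (buyer, item));
CREATE TABLE vendor_item  (vendor TEXT, item  TEXT, PRIMARY KEY (vendor, item));
""")
conn.executemany("INSERT INTO buyer_vendor VALUES (?,?)",
    [("Sally", "Liz Claiborne"), ("Mary", "Liz Claiborne"),
     ("Sally", "Jordach"), ("Mary", "Jordach")])
conn.executemany("INSERT INTO buyer_item VALUES (?,?)",
    [("Sally", "Blouses"), ("Mary", "Blouses"),
     ("Sally", "Jeans"), ("Mary", "Jeans"), ("Sally", "Sneakers")])
conn.executemany("INSERT INTO vendor_item VALUES (?,?)",
    [("Liz Claiborne", "Blouses"), ("Jordach", "Jeans"), ("Jordach", "Sneakers")])
# Rejoin the three projections on their common keys:
rows = conn.execute("""
    SELECT bv.buyer, bv.vendor, bi.item
    FROM buyer_vendor bv
    JOIN vendor_item vi ON vi.vendor = bv.vendor
    JOIN buyer_item  bi ON bi.buyer = bv.buyer AND bi.item = vi.item
    ORDER BY bv.buyer, bv.vendor, bi.item
""").fetchall()
print(len(rows))  # 5: exactly the original table is recovered
```

Note that Mary-Jordach-Sneakers does not appear in the join, because Mary-Sneakers is not in Buyer-Item; the decomposition loses nothing and invents nothing for this data.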
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Figure: IMS system architecture - Applications A and B, each written in a host language plus DL/I calls; each application's PSB (PSB-A, PSB-B) contains its PCBs; the PCBs map through DBDs to the IMS control program.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the logical database and the corresponding physical database.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
• They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency.
• They hold for all time.
• They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
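Whether a dependency X → Y holds in sample data can be checked mechanically: group the rows by the determinant and verify that each group carries exactly one dependent value. A small sketch (the relation and attribute names are illustrative):

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`
    (a list of dicts): every determinant value must map to one and only
    one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False   # same determinant, two different dependents
    return True

employees = [
    {"emp_id": 1, "dept": "Sales", "dept_head": "Iyer"},
    {"emp_id": 2, "dept": "Sales", "dept_head": "Iyer"},
    {"emp_id": 3, "dept": "HR",    "dept_head": "Khan"},
]
print(fd_holds(employees, ["dept"], ["dept_head"]))    # True
print(fd_holds(employees, ["dept_head"], ["emp_id"]))  # False: Iyer -> 1 and 2
```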
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that it meets, and hence indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; here we pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form. Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF thus removes the unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
• There is no multivalued dependency in the relation; or
• There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also considers multivalued dependencies.
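A sketch of a 4NF decomposition (the tables and data are illustrative, using the classic employee-skill-language situation): when emp ↠ skill and emp ↠ language are independent multivalued dependencies, one table would have to store every skill paired with every language; splitting into two tables removes that redundancy, and the join recovers all the combinations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_skill    (emp TEXT, skill TEXT,    PRIMARY KEY (emp, skill));
CREATE TABLE emp_language (emp TEXT, language TEXT, PRIMARY KEY (emp, language));
""")
conn.executemany("INSERT INTO emp_skill VALUES (?,?)",
                 [("Anil", "Typing"), ("Anil", "Filing")])
conn.executemany("INSERT INTO emp_language VALUES (?,?)",
                 [("Anil", "Hindi"), ("Anil", "English")])
# The unnormalized relation is recovered as the join of the two projections:
rows = conn.execute("""
    SELECT s.emp, s.skill, l.language
    FROM emp_skill s JOIN emp_language l ON s.emp = l.emp
    ORDER BY s.skill, l.language
""").fetchall()
print(len(rows))  # 4 combinations reproduced from only 2 + 2 stored rows
```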
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could get a user's account information and provide it efficiently, together with extensive related information such as transactions and account entries.
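The "retrieve the whole object directly, no joins" point can be sketched in plain Python; this is not a real ODBMS, just the stdlib shelve module persisting a whole object graph under one key (the key and field names are illustrative):

```python
import os
import shelve
import tempfile

# An "account" object with its nested transaction list is stored and
# fetched as a unit, pointer-style, instead of being reassembled by joins.
path = os.path.join(tempfile.mkdtemp(), "bank.db")
with shelve.open(path) as db:
    db["acct:1001"] = {"owner": "Meera",
                       "transactions": [("deposit", 500), ("withdraw", 200)]}
with shelve.open(path) as db:
    acct = db["acct:1001"]  # one lookup retrieves the whole object graph
print(acct["owner"], len(acct["transactions"]))
```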
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
• The speed and size of your transaction log backups
• The degree to which you are at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery model available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified point in time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
• CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
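The guarantee all of these models are tuned around is that committed work survives a failure while in-flight work is undone. An illustrative sketch (sqlite3, not SQL Server; the "failure" is simulated with an exception):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()                                  # committed: durable
try:
    conn.execute("UPDATE account SET balance = balance - 999 WHERE id = 1")
    raise RuntimeError("simulated failure mid-transaction")
    conn.commit()                              # never reached
except RuntimeError:
    conn.rollback()                            # recovery undoes in-flight work
balance = conn.execute(
    "SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 100: the last committed state is restored
```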
(D) Describe deadlocks in a distributed system.
Ans
Database Engine: The core service for storing, processing and securing data. It provides controlled access and rapid transaction processing to address the requirements of the most demanding data-consuming applications, and it is often used to create relational databases for online transaction processing or online analytical processing.
Data Dictionary: A reserved space within a database used to store information about the database itself. A data dictionary is a set of read-only tables and views containing information about the data used in the enterprise, ensuring that the database representation of the data follows one standard as defined in the dictionary.
Report Writer: Also referred to as the report generator, this is a program that extracts information from one or more files and presents the information in a specified format. Most report writers allow the user to select records that meet certain conditions, to display selected fields in rows and columns, and also to format the data into different charts.
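The data dictionary idea can be seen in any DBMS's catalog; as an illustrative sketch, sqlite3 exposes its catalog as the read-only sqlite_master table, which describes the database itself rather than the data in it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_no INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON course(title)")
# Query the catalog: metadata about tables and indexes, not the data itself.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()
print(catalog)
```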
OR
(c) Explain the three-level architecture proposal for DBMS. (8)
We have already seen the DBMS architectures - one-tier, two-tier and three-tier. Here we discuss the three-level DBMS architecture in detail.
DBMS Three Level Architecture Diagram
This architecture has three levels
1 External level
2 Conceptual level
3 Internal level
1 External level
It is also called the view level. The reason this level is called the "view" level is that several users can view their desired data from this level; the data is internally fetched from the database with the help of the conceptual- and internal-level mappings.
The user does not need to know the database schema details, such as the data structures or table definitions; the user is concerned only with the data, which is returned to the view level after being fetched from the database (present at the internal level).
The external level is the top level of the three-level DBMS architecture.
2 Conceptual level
It is also called logical level The whole design of the database such as relationship among data
schema of data etc are described in this level
Database constraints and security are also implemented in this level of architecture This level is
maintained by DBA (database administrator)
3 Internal level
This level is also known as physical level This level describes how the data is actually stored in
the storage devices This level is also responsible for allocating space to the data This is the
lowest level of the architecture
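A small sqlite3 sketch of the external level sitting above the stored data (the table, view and column names are illustrative): the base table belongs to the conceptual/internal side, while a view is one user's external schema that hides columns the user should not see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee "
             "(emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Ravi', 50000)")
# External level: a view exposing only non-sensitive columns.
conn.execute("CREATE VIEW employee_public AS SELECT emp_id, name FROM employee")
cols = [d[0] for d in conn.execute("SELECT * FROM employee_public").description]
print(cols)  # the external view exposes only emp_id and name
```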
(d) Explain
(i) Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level.
There are two types of data independence
1 Logical Data Independence
o Logical data independence refers to the characteristic of being able to change the conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make any changes in the conceptual view of the data, the user's view of the data will not be affected.
o Logical data independence occurs at the user interface level.
2 Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the conceptual schema.
o If we make any changes in the storage structures of the database system, the conceptual structure of the database will not be affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
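Logical data independence can be demonstrated concretely; in this illustrative sqlite3 sketch, the external view keeps answering the same query even after the conceptual schema changes (a new column is added to the base table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Priya')")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")
before = conn.execute("SELECT name FROM student_names").fetchall()
# Conceptual-level change: the external view's definition is untouched.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after = conn.execute("SELECT name FROM student_names").fetchall()
print(before == after)  # True: the user's view is unaffected
```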
(ii) Data Integration
Ans:
Data integration involves combining data residing in different sources and providing users with a unified view of them.[1] This process becomes significant in a variety of situations, both commercial (such as when two similar companies need to merge their databases) and scientific (combining research results from different bioinformatics repositories, for example). Data integration appears with increasing frequency as the volume of data (that is, big data[2]) and the need to share existing data explode.[3] It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users.
Figure 1: Simple schematic for a data warehouse. The extract-transform-load (ETL) process extracts information from the source databases, transforms it, and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for the interoperability of heterogeneous databases.[4] The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991, for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms and loads data from heterogeneous sources into a single view schema, so that data from different sources become compatible.[5] By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture, because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries.[6]
The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract-transform-load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services, like travel or classified-advertisement web applications.
As of 2009, the trend in data integration favored loosening the coupling between data and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from the original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schemas of the original sources, and on transforming a query into specialized queries to match the schemas of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the Global-As-View (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the Local-As-View (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like "earnings", inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts; this approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking of the similarities computed from different data sources on a single criterion, such as positive predictive value. This enables the data sources to be directly comparable, and they can be integrated even when the natures of the experiments are distinct.[7]
As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture, in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology, which results in the development of disparate data models; disparate data models, when instantiated as databases, form disparate databases. Enhanced data-model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models.[8] One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationship that relates the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically
relational) Enterprise Data Warehouses. Since 2013, data lake approaches have risen to the level
of data hubs. (See all three search terms' popularity on Google Trends.[9]) These approaches
combine unstructured or varied data into one location, but do not necessarily require an (often
complex) master relational schema to structure and define all data in the hub.
Q2
EITHER
(a) Explain E-R Model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data
is used in a real-world enterprise. Design is an iterative, team-oriented process: all business
managers (or their designates) should be involved, and the model should be validated with a
"bottom-up" approach. There are many notation methods; Chen's notation was the first to
become established.
The building blocks of the E-R model are its three primary components: entities, relationships,
and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified. An entity is an abstraction from the
complexities of some domain. When we speak of an entity, we normally speak of some aspect of
the real world which can be distinguished from other aspects of the real world. An entity may be
a physical object such as a house or a car, an event such as a house sale or a car service, or a
concept such as a customer transaction or order. An entity-type is a category; an entity, strictly
speaking, is an instance of a given entity-type. There are usually many instances of an entity-type.
Because the term entity-type is somewhat cumbersome, most people tend to use the term
entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as
student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs linking two or more nouns. Examples: an owns
relationship between a company and a computer, a supervises relationship between an employee
and a department, a performs relationship between an artist and a song, a proved relationship
between a mathematician and a theorem. Relationships are represented as diamonds connected
by lines to each of the entities in the relationship. Types of relationships are as follows:
One to many   1 ------- M
Many to one   M ------- 1
Many to many  M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given: Entity Customer with attributes customer_id (primary key), name (first_name,
last_name, middle_name), phone_number, date_of_birth,
address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In
other words, ER diagrams help you to explain the logical structure of databases. At first look, an
ER diagram looks very similar to a flowchart; however, an ER diagram includes many specialized
symbols, and their meanings make this model unique.
Sample ER Diagram
Facts about ER Diagram Model:
o ER model allows you to draw a database design
o It is an easy-to-use graphical tool for modeling data
o Widely used in database design
o It is a graphical representation of the logical structure of a database
o It helps you to identify the entities which exist in a system and the relationships
between those entities
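A minimal sketch of how the Customer entity from (b) could be mapped to a relational table (using SQLite; flattening the composite attributes into simple columns, and all column choices here, are illustrative assumptions, not part of the question paper):

```python
import sqlite3

# Sketch: mapping the Customer entity to a relational table. Composite
# attributes (name, address, street) are flattened into their simple
# components; a multivalued attribute would instead go to its own table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,  -- key attribute
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,                 -- components of composite 'name'
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,                 -- components of composite 'address'
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT                  -- components of composite 'street'
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    (1, "John", "Smith"),
)
row = conn.execute(
    "SELECT first_name, last_name FROM customer WHERE customer_id = 1"
).fetchone()
print(row)  # ('John', 'Smith')
```

A derived attribute such as age would not be stored at all; it would be computed from date_of_birth at query time.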
(b) Differentiate between Network and Hierarchical data model in DBMS.
Ans: Hierarchical model
1. Supports one-to-many and one-to-one relationships
2. Based on a parent-child relationship; each child has exactly one parent
3. Retrieval algorithms are complex and asymmetric
4. More data redundancy
Network model
1. Supports many-to-many relationships
2. A record can have many parents as well as many children
3. Retrieval algorithms are complex but symmetric
4. Less data redundancy than the hierarchical model
Relational model
1. Supports one-to-one, one-to-many, and many-to-many relationships
2. Based on relational data structures (tables)
3. Retrieval algorithms are simple and symmetric
4. Least data redundancy
OR
(c)Draw E-R diagram on Library Management System
Ans
(d) State advantages and disadvantages of following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some
search key.
o Records are chained together by pointers to permit fast retrieval in search-key
order.
o Each pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If space is available, use it; else put the new record in an overflow block.
o Adjust pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If there are very few records in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and are done when system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize
when an insertion occurs. In this case the pointer fields are no longer required.
The Sequential File
A fixed format is used for records:
Records are the same length.
All fields are the same (order and length).
Field names and lengths are attributes of the file.
One field is the key field:
It uniquely identifies the record.
Records are stored in key sequence.
New records are placed in a log file or transaction file, and a batch update is performed to
merge the log file with the master file.
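The batch update described above can be sketched as a two-way merge of key-sorted records (a toy in-memory illustration under the assumption of (key, payload) tuples; real systems merge disk blocks, not Python lists):

```python
# Sketch of the batch update: merging a key-sorted master file with a
# key-sorted transaction log of new records into a new master.
def batch_merge(master, log):
    """Merge two key-sorted lists of (key, payload) records."""
    merged, i, j = [], 0, 0
    while i < len(master) and j < len(log):
        if master[i][0] <= log[j][0]:
            merged.append(master[i]); i += 1
        else:
            merged.append(log[j]); j += 1
    merged.extend(master[i:])  # leftover master records
    merged.extend(log[j:])     # leftover log records
    return merged

master = [(10, "Ada"), (30, "Cid")]
log = [(20, "Bob")]
print(batch_merge(master, log))  # [(10, 'Ada'), (20, 'Bob'), (30, 'Cid')]
```

Because both inputs are already sorted, one sequential pass over each file suffices, which is exactly why sequential organizations favor periodic batch updates over in-place insertion.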
(ii) Direct file
Ans: Direct Access File System (DAFS) is a network file system, similar to Network File System
(NFS) and Common Internet File System (CIFS), that allows applications to transfer data while
bypassing operating system control, buffering, and network protocol operations that can
bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism. Using VI hardware, an application transfers data to and from application
buffers without using the operating system, which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems. DAFS is designed and optimized for clustered, shared-file network environments that
are commonly used for Internet, e-commerce, and database applications. It is optimized for
high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI,
including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS; today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there
is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.
When we substitute values for the arguments, the function yields an expression, called a
proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for other
values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true; the calculus is based on the use
of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable
whose only permitted values are tuples of the relation.
Specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Example: to find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means: 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S, and it is located in London.'
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means: 'For all Branch tuples, the address is not in Paris.'
Can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an
address in Paris.'
Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
Formulae can be built up recursively from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and
negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Examples - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, add the restriction that all values in the result must be values in the domain
of the expression.
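Such calculus expressions map directly to declarative SQL: the tuple variable becomes the FROM clause and the predicate becomes the WHERE clause. A small sketch using SQLite with the Staff/salary example above (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, fName TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("S1", "Ann", 9000), ("S2", "Ben", 12000), ("S3", "Cara", 15000)],
)
# TRC: {S | Staff(S) ∧ S.salary > 10000}
# The tuple variable S ranges over Staff; the predicate is the WHERE clause.
rows = conn.execute("SELECT * FROM Staff WHERE salary > 10000").fetchall()
print([r[0] for r in rows])  # ['S2', 'S3']
```

This is the sense in which both the calculus and SQL are declarative: neither says how the tuples are to be found, only which tuples qualify.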
Data Manipulations in SQL
SELECT, UPDATE, DELETE, INSERT statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL joins: multiple-table queries
Set manipulation:
ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later
using INSERT.
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) );
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key value.
A primary key may consist of more than one column (values unique in combination); this is
called a composite key.
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting, and modifying (updating) data in a database. A DML is often
a sublanguage of a broader database language such as SQL, with the DML comprising some of
the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL), but it is closely related and sometimes also
considered a component of a DML; some operators may perform both selecting (reading) and
writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is
used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those
used by IMS/DL/I, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which
modify stored data but not the schema or database objects. Manipulation of persistent database
objects (e.g. tables or stored procedures) via the SQL schema statements,[3] rather than the data
stored within them, is considered to be part of a separate data definition language (DDL). In SQL
these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct
in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; the latter also contains
the SELECT query statement,[3] which strictly speaking is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT ... INTO form combines both selection and manipulation,
and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
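A runnable sketch of these DML verbs against a throwaway SQLite table (the table and column names follow the insert example above; the data is illustrative):

```python
import sqlite3

# Sketch: the four DML verbs in sequence against an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT: add a row
conn.execute("INSERT INTO employees VALUES ('John', 'Capita', 'xcapit00')")
# UPDATE: modify existing data
conn.execute("UPDATE employees SET last_name = 'Capital' WHERE fname = 'xcapit00'")
# SELECT: read data back (strictly speaking DQL)
row = conn.execute("SELECT first_name, last_name FROM employees").fetchone()
print(row)  # ('John', 'Capital')
# DELETE: remove the row again
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```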
OR
(c) Explain the following integrity rules.
(i) Entity Integrity
Ans: Integrity rules are imperative to a good database design. Most RDBMSs enforce these
rules automatically, but it is safer to make sure that the rules are already applied in the design.
There are two types of integrity mentioned in integrity rules: entity and referential. Two
additional rules that aren't necessarily included in integrity rules, but are pertinent to database
designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this
ensures that each row is uniquely identified by the primary key. One requirement for entity
integrity is that a primary key cannot have a null value. The purpose of this integrity is to give
each row a unique identity, so that foreign key values can properly reference primary key
values.
Theta Join
In a theta join, we apply a condition on the input relation(s), and only the selected rows are
used in the cross product to be merged and included in the output. In a normal cross product,
all the rows of one relation are mapped/merged with all the rows of the second relation, but
here only selected rows of a relation take part in the cross product with the second relation.
If R and S are two relations, then θ is the condition applied in the select operation on one
relation, after which only the selected rows are cross-producted with all the rows of the second
relation. For example, given two relations FACULTY and COURSE, we first apply the select
operation on the FACULTY relation to select certain specific rows; then these rows take part in
a cross product with the COURSE relation. This is the difference between a cross product and a
theta join: looking at both relations, their attributes, and the cross product carried out after the
select operation, the difference between cross product and theta join becomes clear.
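A theta join can be sketched as a Cartesian product filtered by the condition θ (a toy in-memory illustration; the FACULTY/COURSE attribute names are invented for the example):

```python
# Sketch: theta join as a filtered Cartesian product over lists of dicts.
def theta_join(r, s, theta):
    """Return combined rows of the cross product of r and s where theta holds."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

faculty = [
    {"fac_id": 1, "dept": "CS"},
    {"fac_id": 2, "dept": "Math"},
]
course = [
    {"course": "DBMS", "dept": "CS"},
    {"course": "Algebra", "dept": "Math"},
]
# Theta condition: matching department (equality makes this an equi-join,
# a special case of theta join; theta may be any comparison).
result = theta_join(faculty, course, lambda a, b: a["dept"] == b["dept"])
print([(r["fac_id"], r["course"]) for r in result])  # [(1, 'DBMS'), (2, 'Algebra')]
```

With `theta=lambda a, b: True` the same function degenerates to the plain cross product, which makes the difference between the two operations concrete.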
(ii) Referential Integrity
Ans: Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a
valid, existing primary key in the parent table.
Example:
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise, we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So, referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
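This enforcement can be sketched in SQLite, which checks foreign keys only after `PRAGMA foreign_keys = ON` (the table names follow the CompanyId example above; the data is illustrative):

```python
import sqlite3

# Sketch: SQLite rejecting operations that would break referential integrity.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE company (CompanyId INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE product (
        ProductId INTEGER PRIMARY KEY,
        CompanyId INTEGER REFERENCES company(CompanyId)
    )
""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")  # valid parent: accepted
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")  # no parent 99: rejected
except sqlite3.IntegrityError as e:
    print("insert rejected:", e)
try:
    # would orphan product 1, so it is rejected
    conn.execute("DELETE FROM company WHERE CompanyId = 15")
except sqlite3.IntegrityError as e:
    print("delete rejected:", e)
```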
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database, because
they're never returned in queries or reports. It could also result in strange results appearing in
reports (such as products without an associated company). Or, worse yet, it could result in
customers not receiving products they paid for. Worse still, it could affect life-and-death
situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team
not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and
consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of
working with databases.
(d) Explain the following domains in detail with example.
Ans: Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which
a data element may contain. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male, 'F' for female, and NULL for
records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel
value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL). Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined.
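Both kinds of domain rule mentioned above, an enumerated value list and a positive-number rule, can be sketched as CHECK constraints in SQLite (the `person` table and its columns are illustrative):

```python
import sqlite3

# Sketch: enforcing data domains with CHECK constraints.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name   TEXT,
        gender TEXT CHECK (gender IN ('M', 'F')),  -- enumerated domain
        salary REAL CHECK (salary > 0)             -- positive-number domain
    )
""")
conn.execute("INSERT INTO person VALUES ('Ann', 'F', 50000)")      # accepted
try:
    conn.execute("INSERT INTO person VALUES ('Bob', 'X', 40000)")  # 'X' not in domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)
try:
    conn.execute("INSERT INTO person VALUES ('Cid', 'M', -5)")     # violates salary > 0
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note that a NULL gender would still be accepted by the CHECK, matching the text's treatment of NULL as "unknown or not applicable".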
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
(The last is conventionally written M:N rather than M:M, since the two sides may differ.)
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A
one-to-one relationship rarely exists in practice, but it can; if it does, you may consider
combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee. Therefore, there is a one-to-one relationship between employee and company car.
One-to-many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities shown on the previous page, an employee
works in one department, but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness. As with one-to-one relationships, many-to-many
relationships rarely exist; normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL) the group responsible for standardization of the
programming language COBOL The DBTG final report appeared in Apri1971 it
introduced a new distinct and self-contained language The DBTG is intended to meet the
requirements of many distinct programming languages not just COBOL the user in a
DBTG system is considered to be an ordinary application programmer and the language
therefore is not biased toward any single specific programming language
(b) It is based on network model In addition to proposing a formal notation for networks (the
Data Definition Language or DDL) the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of conceptual scheme that
was itself defined using the Data Definition Language It also proposed a Data
Manipulation Language (DML) suitable for writing applications programs that
manipulate the conceptual scheme or a view
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in the figure. The architecture of the DBTG
model can be divided into three different levels, like the architecture of a database system:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data-items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item, and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item)
defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties. Normalization in industry pays particular attention to
normalization up to 3NF, BCNF, or 4NF; we will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table, transform data from an information source (e.g. a form) into
table format with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group either by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
If A and B are attributes of a relation, B is fully dependent on A if B is functionally
dependent on A but not on any proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
A partial functional dependency exists when one or more non-key attributes are functionally
dependent on part of the primary key. Every non-key attribute must be defined by the entire
key, not just by part of the key. If a relation has a single attribute as its key, then it is
automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies on the primary key exist, remove them by placing them in a new
relation along with a copy of their determinant.
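The 1NF-to-2NF step can be sketched on a hypothetical OrderLine relation (not from the question paper) with key (order_id, product_id), where product_name depends only on product_id, a partial dependency:

```python
import sqlite3

# Hypothetical 1NF relation with key (order_id, product_id); product_name
# depends only on product_id — a partial dependency on part of the key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line_1nf (
        order_id INTEGER, product_id INTEGER, product_name TEXT, qty INTEGER,
        PRIMARY KEY (order_id, product_id)
    )
""")
conn.executemany("INSERT INTO order_line_1nf VALUES (?, ?, ?, ?)", [
    (1, 10, "Pen", 3), (1, 11, "Pad", 1), (2, 10, "Pen", 5),
])

# 2NF decomposition: move product_name into a new relation together with
# a copy of its determinant, product_id.
conn.execute("""
    CREATE TABLE product AS
    SELECT DISTINCT product_id, product_name FROM order_line_1nf
""")
conn.execute("""
    CREATE TABLE order_line AS
    SELECT order_id, product_id, qty FROM order_line_1nf
""")
# 'Pen' is now stored once instead of once per order line.
print(conn.execute("SELECT COUNT(*) FROM product").fetchone()[0])  # 2
```

Joining `order_line` back to `product` on product_id recovers the original relation, so the decomposition is lossless while removing the redundancy.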
Third Normal Form (3NF)
2NF and no transitive dependencies.
A transitive dependency is a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
If A, B, and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with suitable example.
Ans: As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies:
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form (4NF)
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and all its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
One of these conditions must hold in order for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
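A common textbook example of a multivalued dependency (assumed here for illustration, not taken from the source): in CTB(course, teacher, book), the teachers and books of a course are independent of each other, so course ->> teacher and course ->> book hold, and the relation equals the natural join of its two projections:

```python
# Sketch: a multivalued dependency and its 4NF decomposition.
# The 4NF design stores the two independent facts separately...
teachers = {"DBMS": ["Rao", "Sen"]}       # course ->> teacher
books = {"DBMS": ["Navathe", "Ullman"]}   # course ->> book

ct = [(c, t) for c, ts in teachers.items() for t in ts]  # CT projection
cb = [(c, b) for c, bs in books.items() for b in bs]     # CB projection

# ...and the original CTB relation is recovered losslessly by a natural
# join on course: every teacher is paired with every book of the course.
ctb = [(c, t, b) for (c, t) in ct for (c2, b) in cb if c == c2]
print(len(ctb))  # 4
```

The unnormalized CTB table must store all four combinations; adding a third book would force two new rows (one per teacher), which is exactly the redundancy 4NF removes.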
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms, or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs:
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of inference axioms:
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show: Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F:
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
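Derivations like AB → GH can also be checked mechanically with the standard attribute-closure algorithm: AB → GH holds iff GH lies in the closure of AB under F. A sketch in Python using the FD set from the Maier example:

```python
# Sketch: attribute closure under a set of functional dependencies.
def closure(attrs, fds):
    """fds: list of (lhs, rhs) frozenset pairs; returns the closure of attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is in the closure, pull in the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H} from the example above.
F = [(frozenset(l), frozenset(r)) for l, r in
     [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]]
c = closure({"A", "B"}, F)
print(sorted(c))       # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print({"G", "H"} <= c)  # True: AB -> GH holds
```

Note that J also lands in the closure (via AG → J once G is derived), which the step-by-step proof did not need but the algorithm finds automatically.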
Significance in Relational Database Design: the inference axioms let a designer derive all
functional dependencies implied by a given set, which underpins the relational model. A
relational database is a database structure, commonly used in GIS, in which data is stored in
two-dimensional tables and multiple relationships between data elements can be defined and
established in an ad-hoc manner. A Relational Database Management System (RDBMS) is a
database system made up of files with data elements in a two-dimensional array (rows and
columns); it has the capability to recombine data elements to form different relations, resulting
in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
a collection of objects or relations, and a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be handled in two ways: one is to take measures that
prevent deadlocks from happening, and the other is to provide a way to break a deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring the transaction gains access to everything it needs or
nothing at all. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a fixed order, which prevents circular waiting. Once a
deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS
must select a transaction to cancel and revert the entire transaction until the resources required
become available, allowing one transaction to complete while the other is reprocessed at a
later time.
Explain the meaning of the expression ACID transaction.
ACID stands for Atomicity, Consistency, Isolation, Durability. Any transaction
should be atomic: it should either complete fully or not at all; there should be
nothing like a semi-complete transaction. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system
failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let us
say I am processing a change to the tax rate in my state; my store clerk should not be able
to read the total cost of a blue shirt, because the total cost row is affected by any change in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not yet been committed is the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
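The resource-ordering scheme described above can be sketched in a few lines: if every transaction acquires its locks in one fixed global order (here, by resource name), a circular wait can never form. This is a toy illustration, not any DBMS's actual lock manager; the resource names are made up.

```python
import threading

# Two shared resources guarded by locks; names are illustrative.
locks = {"accounts": threading.Lock(), "orders": threading.Lock()}

def transaction(needed, work):
    """Acquire all needed locks in the global (sorted-name) order,
    do the work, then release in reverse order."""
    for name in sorted(needed):
        locks[name].acquire()
    try:
        work()
    finally:
        for name in sorted(needed, reverse=True):
            locks[name].release()

done = []
# T1 asks for (accounts, orders); T2 asks for (orders, accounts) - without
# the ordering rule this interleaving is the classic deadlock scenario.
t1 = threading.Thread(target=transaction, args=(["accounts", "orders"], lambda: done.append("T1")))
t2 = threading.Thread(target=transaction, args=(["orders", "accounts"], lambda: done.append("T2")))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # ['T1', 'T2'] - both transactions finish; no deadlock
```

Because both threads sort their lock requests, they compete for "accounts" first, and whichever loses simply waits rather than holding a lock the other needs.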
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure the atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
• Binary locks – A lock on a data item can be in two states: it is either locked or
unlocked.
• Shared/exclusive locks – This type of locking mechanism differentiates the locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write to the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
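The shared/exclusive rule reduces to a tiny compatibility check: a shared request is compatible only with other shared locks, and an exclusive request is compatible with nothing. A minimal sketch (not any particular DBMS's implementation):

```python
def can_grant(requested, held):
    """Return True if a lock request is compatible with the locks already
    held on the data item. 'S' = shared (read), 'X' = exclusive (write)."""
    if requested == "S":
        # Readers may share: compatible as long as no writer holds the item.
        return all(h == "S" for h in held)
    # A writer needs the item to be completely unlocked.
    return len(held) == 0

print(can_grant("S", ["S", "S"]))  # True: many readers coexist
print(can_grant("X", ["S"]))       # False: writer waits for readers
print(can_grant("X", []))          # True: writer gets an idle item
```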
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks
when all its operations are over. If all the locks are not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts; in this phase the transaction cannot demand any new locks, it
only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, in which all the locks are being acquired by
the transaction, and a shrinking phase, in which the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it; it holds all the locks until the commit point and releases them all
at one time.
Strict-2PL does not suffer from cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it; for example, any transaction y entering the system at 0004 is
two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
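The per-item read/write timestamps drive the basic timestamp-ordering rule: an operation that arrives "too late" relative to a younger transaction's access is rejected and its transaction aborted. A minimal sketch of that rule, assuming one read_ts/write_ts pair per data item as described above:

```python
class Item:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far

def read(item, ts):
    """Reject a read that arrives after a younger transaction's write."""
    if ts < item.write_ts:
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    """Reject a write that a younger transaction has already read past
    or overwritten."""
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"
    item.write_ts = ts
    return "ok"

x = Item()
print(read(x, 5))   # ok: T5 reads x
print(write(x, 3))  # abort: older T3 arrives too late to write
print(write(x, 7))  # ok: younger T7 may write
```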
OR
(b) Explain database security mechanisms.
Ans: Database security covers and enforces security on all aspects and components of databases. This
includes:
• Data stored in the database
• The database server
• The database management system (DBMS)
• Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professionals.
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls.
• Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload.
• Physical security of the database server and backup equipment against theft and natural
disasters.
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d) Explain knowledge based database systems in detail.
Ans:
The term knowledge base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
• Flat data: Data was usually represented in a tabular format, with strings or numbers in each
field.
• Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
• Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation and Durability.
• Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store thousands of rows that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge
base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge base was the Internet. With the rise of the Internet,
documents, hypertext and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term "knowledge base" to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from the Ancient Greek átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
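The transfer example can be demonstrated with the standard-library sqlite3 module, whose connection context manager commits on success and rolls back on error. The table and account names here are illustrative; the point is that a failed debit also undoes the already-applied credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# CHECK enforces that no account may go negative.
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    try:
        with conn:  # one atomic transaction: commit on success, rollback on error
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
    except sqlite3.IntegrityError:
        pass  # CHECK failed: the credit to B is rolled back as well

transfer(500)  # would overdraw A, so neither update survives
transfer(30)   # a valid transfer commits both updates
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 70, 'B': 80} - money neither lost nor created
```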
(B) Give the three level architecture proposal for DBMS.
Ans: Objectives of the three level architecture proposal for a DBMS:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to the physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
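The DDL/DML distinction can be seen concretely with the standard-library sqlite3 module. This is only an illustration with made-up table names; note that SQLite itself has no DCL (GRANT/REVOKE), so that sublanguage is only mentioned in the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare the database object.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE student_id = 1")
rows = conn.execute("SELECT name FROM student").fetchall()
print(rows)  # [('Asha K',)]

# DCL (e.g. GRANT SELECT ON student TO clerk) would control access rights,
# but SQLite does not implement it; server DBMSs such as PostgreSQL do.
```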
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three level architecture proposal for a DBMS are explained
above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieve) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig: Structure of Database Management System
Components of DBMS:
• DDL Compiler
• Data Manager
• File Manager
• Disk Manager
• Query Processor
• Telecommunication System
• Data Files
• Data Dictionary
• Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It stores metadata information such as the names of the files, the data items, the storage
details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete and
retrieve, from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized into the best way to execute the query by
the query optimizer and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
• Converting operations in users' queries, coming from the application programs or from the combination of
the DML compiler and query optimizer (known as the Query Processor), from the user's logical view
to the physical file system.
• Controlling DBMS information access that is stored on disk.
• Handling buffers in main memory.
• Enforcing constraints to maintain the consistency and integrity of the data.
• Synchronizing the simultaneous operations performed by concurrent users.
• Controlling backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
• Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
• Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
• Constraints on data, i.e. the range of values permitted.
• Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
• Access authorization - a description of database users, their responsibilities
and their access rights.
• Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary - The data dictionary is necessary in databases due to the following reasons:
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database: in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus, a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to duplication of the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources being needlessly used.
• Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
conventional systems, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad-hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of its unit as the most
important, and therefore considers its needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of the
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backups from failures, including disk crashes, power failures and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type; there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address etc.
Attributes are of various types:
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
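As a concrete sketch, a secondary key can be implemented as an index on a non-key attribute. The example below uses Python's built-in sqlite3 module; the table and column names are invented for illustration, not taken from any particular system in the syllabus.

```python
import sqlite3

# In-memory database: roll_no is the primary key of the student table
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, stud_name TEXT)")
con.execute("CREATE INDEX idx_stud_name ON student (stud_name)")  # secondary key
con.executemany("INSERT INTO student VALUES (?, ?)",
                [(1, "Asha"), (2, "Ravi"), (3, "Asha")])

# Secondary key retrieval: several records can satisfy the given key value
rows = con.execute("SELECT roll_no FROM student WHERE stud_name = ?",
                   ("Asha",)).fetchall()
print(rows)  # → [(1,), (3,)]
```

Unlike a primary-key lookup, which returns at most one record, the secondary-key query returns the whole set of matching records.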
(D) Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot be further losslessly decomposed into any number of smaller tables. Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency - if a relation cannot be losslessly decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
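The decomposition can be checked mechanically. The short Python sketch below (sample rows from the table above; variable names invented) projects Buying onto the three pairwise tables and natural-joins them back, confirming that the join reproduces the original relation.

```python
# Buying(buyer, vendor, item), using the sample data from the text
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three pairwise projections: Buyer-Vendor, Buyer-Item, Vendor-Item
bv = {(b, v) for b, v, i in buying}
bi = {(b, i) for b, v, i in buying}
vi = {(v, i) for b, v, i in buying}

# Natural join of the three projections on their common attributes
joined = {(b, v, i)
          for b, v in bv
          for b2, i in bi if b2 == b
          if (v, i) in vi}

# The join dependency holds: the join reproduces the original table
print(joined == buying)  # → True
```

When the join dependency holds, the three smaller tables carry the same information with no pairwise cyclical dependency, so recording "Claiborne sells Jeans" needs only one new Vendor-Item row.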
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Architecture diagram: Application A and Application B are each written in a host language with DL/I calls. Each application has its own PSB (PSB-A, PSB-B) consisting of PCBs; the PCBs map through the DBDs to the physical databases managed by the IMS control program.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD); the mapping of the physical database to storage is also specified in the DBD. The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled and the object form is stored in a system library, from which it may be extracted when required by the IMS control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example:
Example
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written on-line application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
they have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency;
they hold for all time;
they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
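A minimal sketch of how such a dependency can be tested on sample data (the relation and attribute names below are invented, not from the question paper):

```python
def holds(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on lhs also agrees on rhs,
    i.e. the functional dependency lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same determinant, different dependent
            return False
    return True

staff = [
    {"staffNo": "S1", "branchNo": "B1", "city": "London"},
    {"staffNo": "S2", "branchNo": "B1", "city": "London"},
    {"staffNo": "S3", "branchNo": "B2", "city": "Paris"},
]

print(holds(staff, ["branchNo"], ["city"]))   # branchNo -> city: True
print(holds(staff, ["city"], ["staffNo"]))    # city -> staffNo: False
```

Note that a check over one instance can only refute a dependency; whether it holds "for all time" is a statement about the schema's semantics.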
(D) Explain 4NF with examples.
Ans: Normalization: The process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, each corresponding to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes an unwanted kind of data structure: multivalued dependencies.
For a relation to be in fourth normal form, either:
there is no multivalued dependency in the relation; or
there are multivalued dependencies, but the dependent attributes are dependent between themselves.
One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form differs from BCNF only in that it takes multivalued dependencies into account.
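A multivalued dependency X ↠ Y can be tested on sample data by checking that, within each X-group, the relation equals the cross product of its Y-values and its remaining-attribute values. The Python sketch below uses an invented course/teacher/text relation:

```python
from itertools import product

def mvd_holds(rows, attrs, x, y):
    """Test X ->> Y: for each X-group, the (Y, Z) pairs present must be the
    full cross product of the group's Y-values and Z-values (Z = the rest)."""
    z = [a for a in attrs if a not in x + y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        groups.setdefault(key, set()).add(
            (tuple(row[a] for a in y), tuple(row[a] for a in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

course = [  # teachers and texts of a course are independent of each other
    {"course": "DB", "teacher": "Ann", "text": "Korth"},
    {"course": "DB", "teacher": "Ann", "text": "Date"},
    {"course": "DB", "teacher": "Bob", "text": "Korth"},
    {"course": "DB", "teacher": "Bob", "text": "Date"},
]
print(mvd_holds(course, ["course", "teacher", "text"],
                ["course"], ["teacher"]))  # → True
```

Deleting any one of the four rows breaks the cross-product property, so the MVD would no longer hold; this is exactly the redundancy that decomposing to 4NF removes.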
Q5
Either
(A) What are object oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems; this is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls:
the speed and size of your transaction log backups;
the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans:
This architecture has three levels:
1. External level
2. Conceptual level
3. Internal level
1. External level
It is also called the view level. This level is called "view" because several users can view their desired data from this level; the data is internally fetched from the database with the help of the conceptual- and internal-level mappings.
The user does not need to know database schema details such as data structures, table definitions etc.; the user is only concerned with the data, which is what is returned to the view level after it has been fetched from the database (present at the internal level).
The external level is the "top level" of the Three Level DBMS Architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among data, the schema of the data etc., is described at this level.
Database constraints and security are also implemented at this level of the architecture, which is maintained by the DBA (database administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored on the storage devices, and it is also responsible for allocating space to the data. This is the lowest level of the architecture.
(d) Explain
(i) Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one level of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the characteristic of being able to change the conceptual schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make any changes in the conceptual view of the data, the user view of the data is not affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema without having to change the conceptual schema.
o If we make any changes in the storage size of the database system server, the conceptual structure of the database is not affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
(ii) Data Integration
Ans:
Data integration involves combining data residing in different sources and providing users with
a unified view of them[1] This process becomes significant in a variety of situations which
include both commercial (such as when two similar companies need to merge their databases)
and scientific (combining research results from different bioinformatics repositories for
example) domains Data integration appears with increasing frequency as the volume (that is big
data[2]) and the need to share existing data explodes[3] It has become the focus of extensive
theoretical work and numerous open problems remain unsolved Data integration encourages
collaboration between internal as well as external users
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process extracts information from the source databases, transforms it, and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for the interoperability of heterogeneous databases.[4] The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991 for
system driven by structured metadata was designed at the University of Minnesota in 1991 for
the Integrated Public Use Microdata Series (IPUMS) IPUMS used a data warehousing approach
which extracts transforms and loads data from heterogeneous sources into a single
view schema so data from different sources become compatible[5] By making thousands of
population databases interoperable IPUMS demonstrated the feasibility of large-scale data
integration The data warehouse approach offers a tightly coupled architecture because the data
are already physically reconciled in a single queryable repository so it usually takes little time to
resolve queries[6]
The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization.
Difficulties also arise in constructing data warehouses when one has only a query interface to
summary data sources and no access to the full data This problem frequently emerges when
integrating several commercial query services like travel or classified advertisement web
applications
As of 2009, the trend in data integration favored loosening the coupling between data[citation needed] and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from the original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between
the mediated schema and the schema of original sources and transforming a query into
specialized queries to match the schema of the original databases Such mappings can be
specified in two ways as a mapping from entities in the mediated schema to entities in the
original sources (the Global As View (GAV) approach) or as a mapping from entities in the
original sources to the mediated schema (the Local As View (LAV) approach) The latter
approach requires more sophisticated inferences to resolve a query on the mediated schema but
makes it easier to add new data sources to a (stable) mediated schema
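As an illustration of the GAV approach, the sketch below defines a mediated relation as a view (an ordinary function) over two hypothetical sources; all names and data are invented for illustration.

```python
# Two hypothetical source databases with their own local schemas
source_a = [("Alice", "NY"), ("Bob", "LA")]        # person(name, city)
source_b = [("Alice", "alice@example.com")]        # email(name, addr)

def mediated_person():
    """GAV mapping: the mediated relation person(name, city, email) is
    defined as a view (here, a join) over the source databases."""
    emails = dict(source_b)
    for name, city in source_a:
        yield (name, city, emails.get(name))       # None when no email known

# A query against the mediated schema, unaware of the underlying sources:
print([p for p in mediated_person() if p[1] == "NY"])
# → [('Alice', 'NY', 'alice@example.com')]
```

Under LAV the direction would be reversed: each source would be described as a view over the mediated schema, and answering a query would require rewriting it in terms of those views.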
As of 2010 some of the work in data integration research concerns the semantic
integration problem This problem addresses not the structuring of the architecture of the
integration but how to resolve semantic conflicts between heterogeneous data sources For
example if two companies merge their databases certain concepts and definitions in their
respective schemas like earnings inevitably have different meanings In one database it may
mean profits in dollars (a floating-point number) while in the other it might represent the
number of sales (an integer) A common strategy for the resolution of such problems involves the
use of ontologies which explicitly define schema terms and thus help to resolve semantic
conflicts. This approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking of the similarities, computed from different data sources, on a single criterion such as positive predictive value. This enables the data sources to be directly comparable, and they can be integrated even when the natures of the experiments are distinct.[7]
As of 2011 it was determined that current data modeling methods were imparting data isolation
into every data architecture in the form of islands of disparate data and information silos This
data isolation is an unintended artifact of the data modeling methodology that results in the
development of disparate data models Disparate data models when instantiated as databases
form disparate databases Enhanced data model methodologies have been developed to eliminate
the data isolation artifact and to promote the development of integrated data models[8] One
enhanced data modeling method recasts data models by augmenting them with
structural metadata in the form of standardized data entities As a result of recasting multiple data
models the set of recast data models will now share one or more commonality relationships that
relate the structural metadata now common to these data models Commonality relationships are
a peer-to-peer type of entity relationships that relate the standardized data entities of multiple
data models Multiple data models that contain the same standard data entity may participate in
the same commonality relationship When integrated data models are instantiated as databases
and are properly populated from a common set of master data then these databases are
integrated
Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) Enterprise Data Warehouses. Since 2013, data lake approaches have risen to the level of data hubs (see all three search terms' popularity on Google Trends[9]). These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the hub.
Q2
EITHER
(a) Explain E-R Model with suitable example
Ans: Refer to the answer for Q.2 (A) above - the E-R model is a "top-down" approach whose building blocks are entities, relationships and attributes.
(b) Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other words, we can say that ER diagrams help you to explain the logical structure of databases. At first look, an ER diagram appears very similar to a flowchart; however, the ER diagram includes many specialized symbols, and its meanings make this model unique.
[Sample ER Diagram]
Facts about the ER Diagram Model:
o The ER model allows you to draw a database design.
o It is an easy-to-use graphical tool for modeling data.
o It is widely used in database design.
o It is a GUI representation of the logical structure of a database.
o It helps you to identify the entities which exist in a system and the relationships between those entities.
(b) Differentiate between the Network and Hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships.
2. Based on a parent-child relationship.
3. Retrieval algorithms are complex and asymmetric.
4. Data redundancy is high.
Network model
1. Many-to-many relationships.
2. A record can have many parents as well as many children.
3. Retrieval algorithms are complex but symmetric.
4. Data redundancy is lower than in the hierarchical model.
Relational model
1. One-to-one, one-to-many and many-to-many relationships.
2. Based on relational data structures.
3. Retrieval algorithms are simple and symmetric.
4. Data redundancy is low.
OR
(c) Draw an E-R diagram for a Library Management System.
Ans:
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-sequential file
Ans:
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some search key.
o Records are chained together by pointers to permit fast retrieval in search-key order.
o A pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; else put the new record in an overflow block, adjusting pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If there are very few records in overflow blocks, this will work well; if order is lost, reorganize the file.
o Reorganizations are expensive and are done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case the pointer fields are no longer required.
The Sequential File
Fixed format used for records:
Records are the same length.
All fields are the same (order and length).
Field names and lengths are attributes of the file.
One field is the key field:
It uniquely identifies the record.
Records are stored in key sequence.
The Sequential File
New records are placed in a log file or transaction file.
A batch update is performed to merge the log file with the master file.
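The batch update step can be sketched as a classic two-way merge of two files sorted on the key; the record layout and data below are invented for illustration.

```python
def batch_update(master, log):
    """Two-way merge of key-sorted (key, record) lists: a log entry whose
    key matches a master record replaces it; new keys are inserted."""
    out, i, j = [], 0, 0
    while i < len(master) and j < len(log):
        if master[i][0] < log[j][0]:
            out.append(master[i]); i += 1
        elif master[i][0] > log[j][0]:
            out.append(log[j]); j += 1
        else:                          # equal keys: log record replaces master
            out.append(log[j]); i += 1; j += 1
    return out + master[i:] + log[j:]  # append whichever file has records left

master = [(10, "Ann"), (20, "Bob"), (40, "Dan")]   # sorted master file
log    = [(20, "Bob R."), (30, "Cat")]             # sorted transaction file
print(batch_update(master, log))
# → [(10, 'Ann'), (20, 'Bob R.'), (30, 'Cat'), (40, 'Dan')]
```

Because both files are in key sequence, the new master file is produced in a single sequential pass over each, which is what makes the batch approach efficient for this organization.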
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
Relational calculus query specifies what is to be retrieved rather than how to retrieve it
No description of how to evaluate a query
In first-order logic (or predicate calculus) predicate is a truth-valued function
with arguments
When we substitute values for the arguments function yields an expression
called a proposition which can be either true or false
Relational Calculus
If a predicate contains a variable (eg 'x is a member of staff'), there must be a range for x
When we substitute some values of this range for x, the proposition may be true; for
other values it may be false
When applied to databases relational calculus has forms tuple and domain
Tuple Relational Calculus
Interested in finding tuples for which a predicate is true Based on use of tuple variables
Tuple variable is a variable that 'ranges over' a named relation ie a variable
whose only permitted values are tuples of the relation
Specify range of a tuple variable S as the Staff relation as
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
Can use two quantifiers to tell how many instances the predicate applies to
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables
Tuple Relational Calculus
Existential quantifier used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧
(B.branchNo = S.branchNo) ∧ B.city = 'London')
Means 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S that is located in London'
Tuple Relational Calculus
Universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
Means 'For all Branch tuples, the address is not in Paris'
Can also use ~(∃B) (B.city = 'Paris'), which means 'There are no branches with an
address in Paris'
Tuple Relational Calculus
Formulae should be unambiguous and make sense
A (well-formed) formula is made out of atoms
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2, where θ is a comparison operator
Si.a1 θ c, where c is a constant
Can recursively build up formulae from atoms
An atom is a formula
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2, and negation ~F1
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25000
{S.fName, S.lName | Staff(S) ∧
S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set For example
{S | ~Staff(S)}
To avoid this add restriction that all values in result must be values in the domain
of the expression
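The calculus expressions above read naturally as Python comprehensions, since both describe what to retrieve rather than how. A small illustrative sketch; the Staff and Branch rows below are invented sample data, not from the text.

```python
# Invented sample relations, represented as lists of dict "tuples".
Staff = [
    {'staffNo': 'S1', 'fName': 'Ann', 'lName': 'Beech',
     'position': 'Manager', 'salary': 30000, 'branchNo': 'B3'},
    {'staffNo': 'S2', 'fName': 'Bob', 'lName': 'Ford',
     'position': 'Assistant', 'salary': 9000, 'branchNo': 'B5'},
    {'staffNo': 'S3', 'fName': 'Cid', 'lName': 'Howe',
     'position': 'Manager', 'salary': 24000, 'branchNo': 'B3'},
]
Branch = [
    {'branchNo': 'B3', 'city': 'London'},
    {'branchNo': 'B5', 'city': 'Paris'},
]

# {S | Staff(S) and S.salary > 10000}
high_paid = [S for S in Staff if S['salary'] > 10000]

# {S.fName, S.lName | Staff(S) and S.position = 'Manager' and S.salary > 25000}
rich_managers = [(S['fName'], S['lName'])
                 for S in Staff
                 if S['position'] == 'Manager' and S['salary'] > 25000]

# Staff(S) and (there exists B)(Branch(B) and B.branchNo = S.branchNo
#                                and B.city = 'London') -- any() plays the
# role of the existential quantifier.
london_staff = [S for S in Staff
                if any(B['branchNo'] == S['branchNo'] and B['city'] == 'London'
                       for B in Branch)]

print(rich_managers)                         # [('Ann', 'Beech')]
print([S['staffNo'] for S in london_staff])  # ['S1', 'S3']
```

Note how `any(...)` plays the role of ∃ ranging over Branch, exactly as in the bound-variable examples above.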
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any In Contains All Not In Not Contains Exists Union Minus Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement
Data must be entered later using INSERT
CREATE TABLE S ( SNO CHAR(5),
SNAME CHAR(20),
STATUS DECIMAL(3),
CITY CHAR(15),
PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified
Columns which are defined as primary keys will never have two rows with the same key
value
Primary key may consist of more than one column (values unique in combination)
called composite key
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting) deleting and modifying (updating) data in a database A DML is often
a sublanguage of a broader database language such as SQL with the DML comprising some of
the operators in the language[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL) but it is closely related and sometimes also
considered a component of a DML some operators may perform both selecting (reading) and
writing
A popular data manipulation language is that of Structured Query Language (SQL) which is
used to retrieve and manipulate data in a relational database[2] Other forms of DML are those
used by IMSDLI CODASYL databases such as IDMS and others
In SQL the data manipulation language comprises the SQL-data change statements[3] which
modify stored data but not the schema or database objects Manipulation of persistent database
objects eg tables or stored procedures via the SQL schema statements[3] rather than the data
stored within them is considered to be part of a separate data definition language (DDL) In SQL
these two categories are similar in their detailed syntax data types expressions etc but distinct
in their overall function[3]
The SQL-data change statements are a subset of the SQL-data statements this also contains
the SELECT query statement[3] which strictly speaking is part of the DQL not the DML In
common practice though this distinction is not made and SELECT is widely considered to be
part of DML[4] so the DML consists of all SQL-data statements not only the SQL-data
change statements The SELECT INTO form combines both selection and manipulation
and thus is strictly considered to be DML because it manipulates (ie modifies) data
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT FROM WHERE (strictly speaking DQL)
SELECT INTO
INSERT INTO VALUES
UPDATE SET WHERE
DELETE FROM WHERE
For example the command to insert a row into table employees
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita',
'xcapit00');
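The DML verbs listed above can be exercised end to end with Python's sqlite3 module; the employees table and values are the ones from the example, with an invented UPDATE and DELETE added.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES
con.execute("INSERT INTO employees (first_name, last_name, fname) "
            "VALUES ('John', 'Capita', 'xcapit00')")

# UPDATE ... SET ... WHERE
con.execute("UPDATE employees SET last_name = 'Capital' WHERE fname = 'xcapit00'")

# SELECT ... FROM ... WHERE (strictly speaking DQL, as noted above)
row = con.execute("SELECT first_name, last_name FROM employees "
                  "WHERE fname = 'xcapit00'").fetchone()
print(row)                                                   # ('John', 'Capital')

# DELETE FROM ... WHERE
con.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
print(con.execute("SELECT COUNT(*) FROM employees").fetchone()[0])   # 0
```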
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
already applied in the design There are two types of integrity mentioned in
integrity rules: entity and referential Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key One requirement for entity integrity is that a primary key cannot have a
null value The purpose of this integrity is for each row to have a unique
identity, so that foreign key values can properly reference primary key values
Theta Join
In a theta join we apply a condition θ on the input relation(s), and then only the selected
rows are used in the cross product and included in the output It means
that in a normal cross product all the rows of one relation are mapped/merged with all
the rows of the second relation, but here only selected rows of
a relation take part in the cross
product with the second relation It is denoted R ⋈θ S
If R and S are two relations, then θ is the condition applied for the select
operation on one relation; only the selected rows form a cross product with all the
rows of the second relation For example, given the two relations FACULTY and
COURSE, we first apply a select operation on the FACULTY relation to
select certain specific rows; these rows then form a cross product with the
COURSE relation This is the difference between a cross product and a theta join
We would now show both relations, their different attributes, and finally the
cross product after carrying out the select operation on the relation
From this example the difference between cross product and theta join becomes clear
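The definition above, a cross product restricted by the condition θ, can be sketched directly; the FACULTY and COURSE rows and the experience attribute are invented for illustration.

```python
# Invented sample relations: (facId, name, experience) and (courseId, title).
FACULTY = [('F1', 'Ali', 18), ('F2', 'Beg', 30)]
COURSE  = [('C1', 'DBMS'), ('C2', 'OS')]

def theta_join(r, s, theta):
    """Cross product of r and s, keeping only the pairs where theta holds."""
    return [a + b for a in r for b in s if theta(a, b)]

# Only FACULTY rows with experience > 20 take part in the cross product:
result = theta_join(FACULTY, COURSE, lambda f, c: f[2] > 20)
print(result)
# [('F2', 'Beg', 30, 'C1', 'DBMS'), ('F2', 'Beg', 30, 'C2', 'OS')]
```

With `theta = lambda f, c: True` the same function degenerates into a plain cross product, which is exactly the difference the text describes.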
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship
In relationships data is linked between two or more tables This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary ndash or
parent ndash table) Because of this we need to ensure that data on both sides of the relationship
remain intact
So referential integrity requires that whenever a foreign key value is used it must reference a
valid existing primary key in the parent table
Example
For example if we delete record number 15 in a primary table we need to be sure that there's no
foreign key in any related table with the value of 15 We should only be able to delete a primary
key if there are no associated records Otherwise we would end up with an orphaned record
Here the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (ie the "CompanyId" field) This has resulted in an "orphaned record"
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
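The three rules above can be demonstrated with SQLite, whose foreign-key enforcement must be switched on with a PRAGMA; the company/product tables mirror the CompanyId example, with invented rows.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("PRAGMA foreign_keys = ON")   # SQLite: enforcement is opt-in
con.execute("CREATE TABLE company (CompanyId INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE product (
    ProductId INTEGER PRIMARY KEY,
    CompanyId INTEGER REFERENCES company(CompanyId))""")

con.execute("INSERT INTO company VALUES (15, 'Acme')")
con.execute("INSERT INTO product VALUES (1, 15)")        # valid parent exists

try:
    # Adding a record to the related table with no associated primary record:
    con.execute("INSERT INTO product VALUES (2, 99)")
except sqlite3.IntegrityError:
    print('orphan insert rejected')

try:
    # Deleting a primary record that still has matching related records:
    con.execute("DELETE FROM company WHERE CompanyId = 15")
except sqlite3.IntegrityError:
    print('parent delete rejected')
```

Both violations raise an error instead of silently creating orphaned records.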
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned usually
with no indication of an error This could result in records being "lost" in the database because
they're never returned in queries or reports
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain following in detail with example
(i) Domain
Ans Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume
Examples
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example a database table that has information about people with one record per person
might have a gender column This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male, 'F' for female, and NULL for
records where gender is unknown or not applicable (or arguably 'U' for unknown as a sentinel
value) The data domain for the gender column is {'M', 'F'}
In a normalized data model the reference domain is typically specified in a reference table
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL) Reference tables are formally related to other tables in a
database by the use of foreign keys
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
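The check-constraint approach described above can be sketched with SQLite; the person table is invented, and the constraint enforces the {'M', 'F'} gender domain while still admitting NULL for unknown.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')))""")

con.execute("INSERT INTO person VALUES ('Ann', 'F')")
con.execute("INSERT INTO person VALUES ('Pat', NULL)")  # unknown: NULL passes,
                                                        # since a CHECK only
                                                        # fails when FALSE
try:
    con.execute("INSERT INTO person VALUES ('Bob', 'X')")  # outside the domain
except sqlite3.IntegrityError:
    print('value outside domain rejected')
```

A numeric "must be positive" domain would be declared the same way, eg `qty INTEGER CHECK (qty > 0)`.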
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another
There are three degrees of relationship known as
1 one-to-one (1:1)
2 one-to-many (1:M)
3 many-to-many (M:N)
Note that the latter is written M:N and not M:M
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity A one-
to-one relationship rarely exists in practice, but it can; if it does, you may consider combining
the two entities into one
For example an employee is allocated a company car which can only be driven by that
employee
Therefore there is a one-to-one relationship between employee and company car
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity For
example, taking the employee and department entities shown on the previous page, an employee
works in one department but a department has many employees
Therefore there is a one-to-many relationship between department and employee
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships many-to-many relationships rarely exist Normally they occur
because an entity has been missed
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL) the group responsible for standardization of the
programming language COBOL The DBTG final report appeared in April 1971; it
introduced a new distinct and self-contained language The DBTG is intended to meet the
requirements of many distinct programming languages not just COBOL the user in a
DBTG system is considered to be an ordinary application programmer and the language
therefore is not biased toward any single specific programming language
It is based on the network model In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure
The architecture of the DBTG model can be divided into three different levels, as in the
architecture of a database system These are
bull Storage Schema (corresponds to Internal View of database)
bull Schema (corresponds to Conceptual View of database)
bull Subschema (corresponds to External View of database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL)
Schema
In DBTG the Conceptual View is defined by the schema The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped (Here logical record types are referred
to as record types; the fields in a logical record format are called data items)
Subschema
The External View (not a DBTG term) is defined by a subschema A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider By default all
other types of record, data item and set are excluded
In the DBTG model the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program This
invocation provides the definition of the user work area (UWA) for that program The
UWA contains a distinct location for each type of record (and hence for each data-
item) defined in the subschema The program may refer to these data-item and record
locations by the names defined in the subschema
Q5
EITHER
(a) Define Normalization Explain first and second normal form
Ans Normalization: the process of decomposing unsatisfactory ('bad') relations by
breaking up their attributes into smaller relations
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2 non-first normal form
1NF R is in 1NF iff all domain values are atomic
2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data along with copy of the original key attribute(s) into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
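The 1NF-to-2NF steps above can be sketched in code. The OrderLine relation below is invented: its key is (orderNo, productNo), but productName depends only on productNo, a partial dependency, so it is moved out with a copy of its determinant.

```python
# Invented 1NF relation with key (orderNo, productNo):
# productName is partially dependent (on productNo alone).
order_line = [
    ('O1', 'P1', 'Bolt', 10),
    ('O1', 'P2', 'Nut',   5),
    ('O2', 'P1', 'Bolt',  7),
]

# Remove the partial dependency: place productName in a new relation along
# with a copy of its determinant, productNo.
product    = sorted({(p, name) for (_, p, name, _) in order_line})

# The remaining relation is fully dependent on the whole key (orderNo, productNo).
order_item = [(o, p, qty) for (o, p, _, qty) in order_line]

print(product)      # [('P1', 'Bolt'), ('P2', 'Nut')]
print(order_item)   # [('O1', 'P1', 10), ('O1', 'P2', 5), ('O2', 'P1', 7)]
```

Note the redundancy ('Bolt' stored twice) disappears from the decomposed design, which is exactly what removing partial dependencies buys.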
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (Provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with suitable example
Ans
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies
1 NF2 non-first normal form
2 1NF R is in 1NF iff all domain values are atomic
3 2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4 3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively
dependent on the key
5 BCNF R is in BCNF iff every determinant is a candidate key
6 Determinant an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key A table is said to be in
4NF if and only if it is in BCNF and its multi-valued dependencies are functional
dependencies 4NF removes unwanted data structures: multi-valued dependencies
Either of these conditions must hold in order to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies but the attributes are dependent between themselves
The relation must also be in BCNF Fourth normal form differs from BCNF only in that it
uses multivalued dependencies
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrongrsquos Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1 Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}
We want to show Street Zip → City Street Zip
Proof
1 Zip → City – Given
2 Street Zip → Street City – Augmentation of (1) by Street
3 City Street → Zip – Given
4 City Street → City Street Zip – Augmentation of (3) by City Street
5 Street Zip → City Street Zip – Transitivity from (2) and (4)
[From Maier]
1 Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}
Show that AB → GH is derived by F
1 AB → E – Given
2 AB → AB – Reflexivity
3 AB → B – Projectivity from (2)
4 AB → BE – Additivity from (1) and (3)
5 BE → I – Given
6 AB → I – Transitivity from (4) and (5)
7 E → G – Given
8 AB → G – Transitivity from (1) and (7)
9 AB → GI – Additivity from (6) and (8)
10 GI → H – Given
11 AB → H – Transitivity from (9) and (10)
12 AB → GH – Additivity from (8) and (11)
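Derivations like the one above can be checked mechanically with the standard attribute-closure algorithm (not from the text): AB → GH holds iff GH lies in the closure of AB under F.

```python
# Compute the attribute closure of a set of attributes under a set of FDs,
# then use it to verify the derivation AB -> GH.
def closure(attrs, fds):
    """attrs: string of single-letter attributes; fds: list of (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, pull in the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [('AB', 'E'), ('AG', 'J'), ('BE', 'I'), ('E', 'G'), ('GI', 'H')]
c = closure('AB', F)
print(sorted(c))        # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print(set('GH') <= c)   # True: AB -> GH follows from F
```

This is the practical significance of the inference axioms in design: closures decide whether a dependency is implied, which in turn identifies candidate keys and normal-form violations.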
Significance in Relational Database design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables and multiple
relationships between data elements can be defined and established in an ad-hoc manner
A Relational Database Management System is a database system made up of files with data
elements in a two-dimensional array (rows and columns) This database management system
has the capability to recombine data elements to form different relations, resulting in great
flexibility of data usage
A database that is perceived by the user as a collection of two-dimensional tables
bull Tables are manipulated a set at a time rather than a record at a time
bull SQL is used to manipulate relational databases Proposed by Dr Codd in 1970
bull The basis for the relational database management system (RDBMS)
bull The relational model contains the following components
bull Collection of objects or relations
bull Set of operations to act on the relations
Q6
EITHER
(a) What is deadlock How can it be avoided How can it be
resolved once it occurs
Ans A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances Essentially,
once a deadlock does occur the DBMS must have a method for detecting the deadlock,
and then to resolve it the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time
Explain the meaning of the expression ACID transaction
ACID means Atomicity, Consistency, Isolation, Durability When any transaction happens it
should be atomic: it should either be complete or fully incomplete, with nothing like
semi-complete The database state should remain consistent after the completion of the
transaction If there is more than one transaction, the transactions should be scheduled in
such a fashion that they remain in isolation from one another Durability means that once
a transaction commits, its effects will persist even if there are system failures
What is the purpose of transaction isolation levels
Transaction isolation levels affect how the database is to operate while transactions are in
the process of being changed The purpose is to ensure consistency throughout the database
For example, if I am changing a row which affects the calculations or outputs of several
other rows, then all rows that are affected or possibly affected by a change in the row I'm
working on will be locked from changes until I am complete with my change This isolates
the change and ensures that the data interaction remains accurate and consistent, and is
known as transaction-level consistency The transaction being changed, which may affect
several other pieces of data or rows of input, could also affect how those rows are read
So let's say I'm processing a change to the tax rate in my state; my store clerk shouldn't
be able to read the total cost of a blue shirt, because the total cost row is affected by any
changes in the tax rate row Essentially, how you deal with the reading and viewing of
data while a change is being processed but hasn't been committed is known as the
transaction isolation level Its purpose is to ensure that no one is misinformed prior to
a transaction being committed
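The "resource access order" rule mentioned above can be sketched with Python threads: two transactions that always acquire locks in the same global order cannot end up waiting on each other. Names and the two-lock scenario are invented for illustration.

```python
import threading

# Two shared resources; the global rule is: always lock A before B.
lock_a, lock_b = threading.Lock(), threading.Lock()
completed = []

def transaction(name):
    # Both transactions follow the same lock order, so a circular wait
    # (T1 holds A wants B, T2 holds B wants A) can never arise.
    with lock_a:
        with lock_b:
            completed.append(name)

t1 = threading.Thread(target=transaction, args=('T1',))
t2 = threading.Thread(target=transaction, args=('T2',))
t1.start(); t2.start()
t1.join();  t2.join()
print(sorted(completed))   # ['T1', 'T2'] -- both finish, no deadlock
```

Reversing the lock order in only one of the two functions would reintroduce the circular-wait risk the rule exists to prevent.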
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity isolation and serializability of concurrent
transactions Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or
unlocked
Shared/exclusive: This type of locking mechanism differentiates the locks based on
their uses If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state Read locks are shared because no data
value is being changed
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed Transactions may unlock the data item after completing the
'write' operation
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking 2PL
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
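The strict-2PL behaviour described above, acquire locks as needed during execution, release them all only at commit, can be sketched with an invented toy lock table (a real DBMS would block rather than raise, and would distinguish shared from exclusive locks).

```python
# Minimal sketch of strict two-phase locking: no lock is released before
# commit, so there is no shrinking phase until the transaction ends.
class StrictTwoPhaseTxn:
    def __init__(self, lock_table):
        self.lock_table = lock_table   # item -> holding txn (None if free)
        self.held = []

    def lock(self, item):
        # Growing phase: acquire locks one by one as operations need them.
        if self.lock_table.get(item) not in (None, self):
            raise RuntimeError(f'{item} is locked by another transaction')
        self.lock_table[item] = self
        self.held.append(item)

    def commit(self):
        # Strict 2PL: release every held lock at once, at the commit point.
        for item in self.held:
            self.lock_table[item] = None
        self.held = []

table = {}
t1 = StrictTwoPhaseTxn(table)
t1.lock('x'); t1.lock('y')         # t1 grows, holding both locks

t2 = StrictTwoPhaseTxn(table)
try:
    t2.lock('x')                   # a real system would block; we raise
except RuntimeError as e:
    print(e)

t1.commit()                        # all of t1's locks released together
t2.lock('x')                       # now available to t2
print(table['x'] is t2)            # True
```

Because t1 never exposes a partially released state, t2 can never read values written by an uncommitted transaction, which is why strict 2PL avoids cascading aborts.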
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it and the ordering is determined by the age
of the transaction A transaction created at 0002 clock time would be older than all other
transactions that come after it For example any transaction y entering the system at 0004 is
two seconds younger and the priority would be given to the older one
In addition, every data item is given the latest read and write timestamp This lets the system
know when the last 'read and write' operation was performed on the data item
OR
(b) Explain database security mechanisms
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing existing system for any known or unknown vulnerabilities and defining and
implementing a road mapplan to mitigate them
(d) Explain knowledge-based database systems in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called
ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but would instead need to store information about thousands of rows that
represented specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge-base.
Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
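The contrast above can be sketched in a few lines of Python. This is only an illustrative sketch, not any particular knowledge-base product: the table `customers`, the rule list, and the `infer` helper are all hypothetical names invented for this example. A database stores one explicit row per specific fact, while a knowledge base stores the general rule once and derives the specific conclusions by forward chaining:

```python
# Database view: one explicit row per specific fact.
customers = [
    {"name": "George", "age": 42, "species": "human"},
    {"name": "Mary",   "age": 35, "species": "human"},
]

# Knowledge-base view: store "All humans are mortal" once, as a rule.
rules = [("human", "mortal")]

def infer(facts, rules):
    """Forward chaining: apply is-a rules until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for entity, category in list(derived):
            for premise, conclusion in rules:
                if category == premise and (entity, conclusion) not in derived:
                    derived.add((entity, conclusion))
                    changed = True
    return derived

facts = {(c["name"], c["species"]) for c in customers}
print(sorted(infer(facts, rules)))
# [('George', 'human'), ('George', 'mortal'), ('Mary', 'human'), ('Mary', 'mortal')]
```

The database never stored "George is mortal"; the knowledge base derived it from the one general rule, which is exactly the division of labour the text describes.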
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution of the term knowledge-base came with the Internet. With the rise of the Internet,
documents, hypertext and multimedia support became critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system, to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without their conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However, any
practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
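The read-then-write conflict described above can be sketched with ordinary threads and a lock. This is a minimal illustration, not a DBMS implementation: the shared `balance`, the `deposit` function and the counts are all invented for the example. The lock plays the role of a write lock granted by a lock-based concurrency control scheme, serializing the read-modify-write so no update is lost:

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    """Each call is a stream of tiny read-modify-write 'transactions'."""
    global balance
    for _ in range(times):
        with lock:                 # acquire the "write lock" on the data item
            current = balance      # READ
            balance = current + 1  # WRITE; without the lock, two threads could
                                   # read the same value and one update is lost

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000 — every one of the 4 x 10000 updates survived
```

Removing the `with lock:` line reintroduces exactly the lost-update anomaly that concurrency control exists to prevent.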
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
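The bank-transfer example can be run directly with Python's built-in `sqlite3` module, which provides real transactions. The table name, account ids and amounts are invented for the sketch; the atomicity itself comes from the database engine, since `with conn:` commits on success and rolls back the whole series on any exception:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates occur, or neither."""
    try:
        with conn:  # transaction: commit on success, rollback on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            row = conn.execute("SELECT balance FROM account WHERE id = ?",
                               (src,)).fetchone()
            if row[0] < 0:
                raise ValueError("insufficient funds")  # abort the transaction
    except ValueError:
        pass  # rolled back: the database is exactly as it was before

transfer(conn, "A", "B", 60)    # succeeds: A = 40, B = 60
transfer(conn, "A", "B", 500)   # fails mid-way and rolls back: unchanged
print(dict(conn.execute("SELECT id, balance FROM account")))  # {'A': 40, 'B': 60}
```

After the failed transfer, neither the withdrawal nor the deposit is visible: money was neither lost nor created, which is precisely the atomicity guarantee.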
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for a DBMS are explained
above.
(C) Describe the structure of DBMS
Ans: The DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieve) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Definition Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files, data items, storage
details of each file, mapping information, constraints, etc.
2 DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute the query by
the query optimizer and sent to the data manager.
3 Data Manager - The data manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the data manager are:
Converting operations in users' queries, coming from the application programs or the combination of
DML compiler and query optimizer (known as the query processor), from the user's logical view
to the physical file system
Controlling access to DBMS information that is stored on disk
Handling buffers in main memory
Enforcing constraints to maintain the consistency and integrity of the data
Synchronizing the simultaneous operations performed by concurrent users
Controlling backup and recovery operations
4 Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed
3. Constraints on data, i.e. the range of values permitted
4. Detailed information on physical database design, such as storage structure, access paths, and file and record sizes
5. Access authorization - a description of database users, their responsibilities and their access rights
6. Usage statistics, such as the frequency of queries and transactions
The data dictionary is used to control data integrity, database operation and accuracy. It may be used as an important part of the DBMS.
Importance of the Data Dictionary -
The data dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system and the users' understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - These contain the data portion of the database.
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database — in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus, a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating from one device to
another, e.g. from optical to magnetic storage, or from tape to disk.
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted in entering data again and again
• Computer resources needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. So we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when changing the
data in the database.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data, the
structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of their unit as the most
important, and therefore considers their needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise; it is an
iterative, team-oriented process with all business managers (or designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship, attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type; there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 <------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number)
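Mapping this ER example to a relational schema can be sketched with `sqlite3`. This is one plausible mapping, not the only one: here the composite attributes name(...) and address(...) are flattened into simple columns of a single invented `customer` table, and `customer_id` carries over as the primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes are flattened into their component simple attributes.
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,  -- key attribute from the ER diagram
        first_name       TEXT,                 -- components of composite: name
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,                 -- components of composite: address
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,                 -- components of composite: street
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Sally', 'Smith')")
print(conn.execute("SELECT first_name FROM customer").fetchone())  # ('Sally',)
```

A multivalued attribute (say, several phone numbers) would instead become a separate table keyed by `customer_id`, since a relational column holds one atomic value per row.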
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we may get the set of
records which satisfy the given value.
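The construction can be sketched as a secondary index built over a primary-keyed file. The student records and field names below are invented for the illustration; the point is that the secondary key "stud_name" is not unique, so each index entry maps one key value to the set of matching record ids:

```python
from collections import defaultdict

# Student "file" keyed by the primary key stud_id (unique per record)
students = {
    101: {"stud_id": 101, "stud_name": "Ravi", "dept": "MCA"},
    102: {"stud_id": 102, "stud_name": "Asha", "dept": "MCA"},
    103: {"stud_id": 103, "stud_name": "Ravi", "dept": "MBA"},
}

# Secondary index on the non-unique attribute stud_name: each key value
# maps to the LIST of primary keys of records that satisfy it.
by_name = defaultdict(list)
for rec in students.values():
    by_name[rec["stud_name"]].append(rec["stud_id"])

print(by_name["Ravi"])  # [101, 103] — multiple records satisfy one key value
```

Retrieval then follows the index entries back to the primary file, exactly the two-step lookup that a real secondary index performs on disk.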
(D) Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
be further non-loss decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer    vendor          item
Sally    Liz Claiborne   Blouses
Mary     Liz Claiborne   Blouses
Sally    Jordach         Jeans
Mary     Jordach         Jeans
Sally    Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
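The decomposition can be checked mechanically on the sample data above. This sketch represents each table as a set of tuples and performs the three-way natural join by hand; whether the join dependency actually holds is a business rule, but for this sample the join of the three projections reconstructs the original table exactly:

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three pairwise tables of the 5NF decomposition
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Three-way natural join: keep (b, v, i) present in all three projections
rejoined = {(b, v, i)
            for b, v in buyer_vendor
            for b2, i in buyer_item if b2 == b
            for v2, i2 in vendor_item if v2 == v and i2 == i}

print(rejoined == buying)  # True — the decomposition is lossless for this data
```

Note that "Claiborne starts to sell jeans" now costs a single row in Vendor-Item, instead of one new row in the original table per buyer who buys jeans.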
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Fig. IMS system architecture: application programs A and B (host language + DL/I) access the data through PCBs grouped into PSBs (PSB-A, PSB-B); the IMS control program maps these views onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined by the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate key: A possible key.
Each non-key field is functionally dependent on every candidate key, and
no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand
sides of a dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a
manageable size. We need to identify a set of functional dependencies (X) for a
relation that is smaller than the complete set of functional dependencies (Y) for
that relation, and that has the property that every functional dependency in Y is
implied by the functional dependencies in X.
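The definition above can be checked mechanically: a dependency X → Y holds in a relation instance exactly when no two tuples agree on X but differ on Y. A minimal Python sketch over hypothetical sample data (the Staff rows below are illustrative, not from the question paper):

```python
def fd_holds(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds
    in this relation instance (rows are dicts, attribute name -> value)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # two tuples agree on lhs but differ on rhs
        seen[x] = y
    return True

# Hypothetical relation instance
staff = [
    {"staffNo": "S1", "name": "Ann", "branch": "B1"},
    {"staffNo": "S2", "name": "Bob", "branch": "B1"},
    {"staffNo": "S3", "name": "Ann", "branch": "B2"},
]

print(fd_holds(staff, ["staffNo"], ["name"]))  # True: staffNo determines name
print(fd_holds(staff, ["name"], ["branch"]))   # False: Ann appears with B1 and B2
```

Note that this only tests one instance: a dependency that happens to hold in today's data is not necessarily a constraint that holds for all time, which is why the characteristics above matter.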
(D) Explain 4NF with examples.
Ans: Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normal forms up to 3NF, BCNF, or 4NF;
we will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes.
It is often executed as a series of steps; each step corresponds to a specific normal form which has
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and its only multi-valued dependencies are functional dependencies. 4NF
removes an unwanted data structure: multi-valued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses
multivalued dependencies.
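As an illustration, consider a hypothetical CourseTeacherText relation in which teachers and texts are independent multi-valued facts about a course (course →→ teacher and course →→ text). The 4NF decomposition puts each multi-valued dependency in its own relation, and joining the pieces back reconstructs the original without loss; the data below is made up:

```python
from itertools import product

# Unnormalized relation: every teacher of "DB" is paired with every text,
# because teachers and texts are independent facts about the course.
ctx = {
    ("DB", "Smith", "Korth"), ("DB", "Smith", "Date"),
    ("DB", "Jones", "Korth"), ("DB", "Jones", "Date"),
}

# 4NF decomposition: one relation per multi-valued dependency.
course_teacher = {(c, t) for c, t, _ in ctx}
course_text    = {(c, b) for c, _, b in ctx}

# Joining the two 4NF relations on course reconstructs the original
# (a lossless join), so no information was discarded.
rejoined = {(c1, t, b)
            for (c1, t), (c2, b) in product(course_teacher, course_text)
            if c1 == c2}
print(rejoined == ctx)  # True
```

The decomposed relations store 2 + 2 rows instead of 4 combined rows, and adding a new text no longer requires one insert per teacher.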
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by
relational database management systems (RDBMS). Object databases have been considered since the
early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a
more declarative programming approach. It is in the area of object query languages, and the
integration of the query and navigational interfaces, that the biggest differences between products are
found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a
relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the
same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are
responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as
the set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could retrieve a user's account information
and efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used
determines how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model. It controls the following:
the speed and size of your transaction log backups, and
the degree to which you are at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, and to
recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes
index creation is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under
it, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
2. Conceptual level
This level is also called the logical level. The whole design of the database, such as the relationships
among data and the schema of the data, is described at this level. Database constraints and security are
also implemented at this level of the architecture, which is maintained by the DBA (database
administrator).
3. Internal level
This level is also known as the physical level. It describes how the data is actually stored in the
storage devices, and is responsible for allocating space to the data. This is the lowest level of the
architecture.
(d) Explain
(i) Data Independence
o Data independence can be explained using the three-schema architecture.
o Data independence refers to the characteristic of being able to modify the schema at one level
of the database system without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the characteristic of being able to change the conceptual
schema without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual
view.
o If we make any changes in the conceptual view of the data, then the user view of the data
will not be affected.
o Logical data independence occurs at the user interface level.
2. Physical Data Independence
o Physical data independence can be defined as the capacity to change the internal schema
without having to change the conceptual schema.
o If we make any changes in the storage size of the database system server, then the
conceptual structure of the database will not be affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
(ii) Data Integration
Ans: Data integration involves combining data residing in different sources and providing users with
a unified view of them[1]. This process becomes significant in a variety of situations, which
include both commercial (such as when two similar companies need to merge their databases)
and scientific (combining research results from different bioinformatics repositories, for
example) domains. Data integration appears with increasing frequency as the volume (that is, big
data[2]) and the need to share existing data explode[3]. It has become the focus of extensive
theoretical work, and numerous open problems remain unsolved. Data integration encourages
collaboration between internal as well as external users.
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process
extracts information from the source databases, transforms it, and then loads it into the data
warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a
mediated schema against which users can run queries. The virtual database interfaces with the
source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a
single query interface have existed for some time. In the early 1980s, computer scientists began
designing systems for interoperability of heterogeneous databases[4]. The first data integration
system driven by structured metadata was designed at the University of Minnesota in 1991 for
the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach,
which extracts, transforms, and loads data from heterogeneous sources into a single
view schema, so that data from different sources become compatible[5]. By making thousands of
population databases interoperable, IPUMS demonstrated the feasibility of large-scale data
integration. The data warehouse approach offers a tightly coupled architecture because the data
are already physically reconciled in a single queryable repository, so it usually takes little time to
resolve queries[6].
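The warehouse-style ETL flow just described can be sketched in a few lines: extract records from two hypothetical sources with different schemas and coding conventions, transform them into one shared schema (reconciling the incompatible gender codes), and load the results into one store. All names and values here are illustrative:

```python
# Two hypothetical source "databases" with incompatible schemas.
source_a = [{"id": 1, "fullname": "Ann Lee", "gender": "F"}]
source_b = [{"pid": 7, "name": "Bob Roy", "sex": "male"}]

def transform_a(rec):
    # Map source A's fields onto the shared warehouse schema.
    return {"person_id": f"A{rec['id']}", "name": rec["fullname"],
            "gender": rec["gender"]}

def transform_b(rec):
    # Source B spells gender out in words; reconcile to the shared coding.
    code = {"male": "M", "female": "F"}[rec["sex"]]
    return {"person_id": f"B{rec['pid']}", "name": rec["name"],
            "gender": code}

# Load: both sources now answer queries through one view schema.
warehouse = [transform_a(r) for r in source_a] + [transform_b(r) for r in source_b]
print(warehouse)
```

The tight coupling is visible here: any change to a source schema means rewriting its transform, which is exactly the synchronization burden the next paragraph describes.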
The data warehouse approach is less feasible for data sets that are frequently updated, requiring
the extract, transform, load (ETL) process to be continuously re-executed for synchronization.
Difficulties also arise in constructing data warehouses when one has only a query interface to
summary data sources and no access to the full data. This problem frequently emerges when
integrating several commercial query services like travel or classified-advertisement web
applications.
As of 2009 the trend in data integration favored loosening the coupling between data
and providing a unified query interface to access real-time data over a mediated schema
(see Figure 2), which allows information to be retrieved directly from the original databases. This is
consistent with the SOA approach popular in that era. This approach relies on mappings between
the mediated schema and the schemas of the original sources, and on transforming a query into
specialized queries to match the schemas of the original databases. Such mappings can be
specified in two ways: as a mapping from entities in the mediated schema to entities in the
original sources (the Global As View (GAV) approach), or as a mapping from entities in the
original sources to the mediated schema (the Local As View (LAV) approach). The latter
approach requires more sophisticated inferences to resolve a query on the mediated schema, but
makes it easier to add new data sources to a (stable) mediated schema.
As of 2010 some of the work in data integration research concerns the semantic
integration problem. This problem addresses not the structuring of the architecture of the
integration, but how to resolve semantic conflicts between heterogeneous data sources. For
example, if two companies merge their databases, certain concepts and definitions in their
respective schemas, like earnings, inevitably have different meanings. In one database it may
mean profits in dollars (a floating-point number), while in the other it might represent the
number of sales (an integer). A common strategy for the resolution of such problems involves the
use of ontologies, which explicitly define schema terms and thus help to resolve semantic
conflicts. This approach represents ontology-based data integration. On the other hand, the
problem of combining research results from different bioinformatics repositories requires
benchmarking of the similarities computed from different data sources on a single criterion, such as
positive predictive value. This enables the data sources to be directly comparable, and they can be
integrated even when the natures of the experiments are distinct[7].
As of 2011 it was determined that current data modeling methods were imparting data isolation
into every data architecture in the form of islands of disparate data and information silos. This
data isolation is an unintended artifact of the data modeling methodology that results in the
development of disparate data models; disparate data models, when instantiated as databases,
form disparate databases. Enhanced data-model methodologies have been developed to eliminate
the data isolation artifact and to promote the development of integrated data models[8]. One
enhanced data modeling method recasts data models by augmenting them with
structural metadata in the form of standardized data entities. As a result of recasting multiple data
models, the set of recast data models will now share one or more commonality relationships that
relate the structural metadata now common to these data models. Commonality relationships are
a peer-to-peer type of entity relationship that relates the standardized data entities of multiple
data models. Multiple data models that contain the same standard data entity may participate in
the same commonality relationship. When integrated data models are instantiated as databases,
and are properly populated from a common set of master data, then these databases are
integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically
relational) enterprise data warehouses. Since 2013, data lake approaches have risen to the level
of data hubs (see the popularity of all three search terms on Google Trends[9]). These approaches
combine unstructured or varied data into one location, but do not necessarily require an (often
complex) master relational schema to structure and define all the data in the hub.
Q2
EITHER
(a) Explain the E-R model with a suitable example.
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is
used in a real-world enterprise. Modeling is an iterative, team-oriented process in which all business
managers (or their designates) should be involved, and the design should be validated with a
"bottom-up" approach. The model has three primary components: entities, relationships, and
attributes. There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified. An entity is an abstraction from the
complexities of some domain. When we speak of an entity, we normally speak of some aspect of
the real world which can be distinguished from other aspects of the real world. An entity may be
a physical object, such as a house or a car; an event, such as a house sale or a car service; or a
concept, such as a customer transaction or order. An entity type is a category; an entity, strictly
speaking, is an instance of a given entity type, and there are usually many instances of an entity
type. Because the term entity type is somewhat cumbersome, most people tend to use the term
entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as
student ID, student name, address, etc.
Attributes are of various types:
Simple/single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs linking two or more nouns. Examples: an owns
relationship between a company and a computer; a supervises relationship between an employee
and a department; a performs relationship between an artist and a song; a proved relationship
between a mathematician and a theorem. Relationships are represented as diamonds connected
by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many: 1 <------ M
Many-to-one: M ------> 1
Many-to-many: M ------ M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given: entity Customer with attributes customer_id (primary key), name
(first_name, last_name, middle_name), phone_number, date_of_birth,
address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
An entity-relationship diagram displays the relationships of the entity sets stored in a database. In
other words, we can say that ER diagrams help you to explain the logical structure of databases. At
first look, an ER diagram looks very similar to a flowchart; however, an ER diagram includes
many specialized symbols, and its meanings make this model unique.
Sample ER Diagram
Facts about the ER Diagram Model:
o The ER model allows you to draw a database design.
o It is an easy-to-use graphical tool for modeling data.
o It is widely used in database design.
o It is a GUI representation of the logical structure of a database.
o It helps you to identify the entities which exist in a system and the relationships
between those entities.
(b) Differentiate between the network and hierarchical data models in a DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships
2. Based on parent-child relationships
3. Retrieval algorithms are complex and asymmetric
4. More data redundancy
Network model
1. Many-to-many relationships
2. Many parents as well as many children
3. Retrieval algorithms are complex and symmetric
4. Less data redundancy than in the hierarchical model
Relational model
1. One-to-one, one-to-many, and many-to-many relationships
2. Based on relational data structures
3. Retrieval algorithms are simple and symmetric
4. Less data redundancy
OR
(c) Draw an E-R diagram of a Library Management System.
Ans
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-sequential file
Ans
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some
search key.
o Records are chained together by pointers to permit fast retrieval in search-key
order.
o Each pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; else put the new record in an overflow block
and adjust the pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If very few records are in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and are done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize
when an insertion occurs. In this case the pointer fields are no longer required.
The Sequential File
A fixed format is used for records: all records are the same length, with the same fields
in the same order and of the same length. Field names and lengths are attributes of the file.
One field is the key field; it uniquely identifies the record, and records are stored in key
sequence.
New records are placed in a log file or transaction file, and a batch update is performed to
merge the log file with the master file.
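The batch update just described (merging a key-ordered master file with a key-ordered log of new records) can be sketched as follows; the record layout is hypothetical:

```python
import heapq

def batch_update(master, log):
    """Merge a sorted master file with a sorted log (transaction) file,
    keeping all records in key sequence. Records are (key, data) pairs."""
    return list(heapq.merge(master, log, key=lambda rec: rec[0]))

# Hypothetical master and log files, already in key sequence.
master = [(10, "Ann"), (30, "Bob"), (50, "Cid")]
log    = [(20, "Dee"), (40, "Eve")]

updated = batch_update(master, log)
print([key for key, _ in updated])  # [10, 20, 30, 40, 50]
```

Because both inputs are already sorted, the merge is a single sequential pass, which is exactly why sequential files defer insertions to a log and apply them in batch.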
(ii) Direct file
Direct Access File System (DAFS) is a network file system, similar to Network File System
(NFS) and Common Internet File System (CIFS), that allows applications to transfer data while
bypassing operating system control, buffering, and network protocol operations that can
bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism. Using VI hardware, an application transfers data to and from application
buffers without using the operating system, which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems. DAFS is designed and optimized for clustered, shared-file network environments that
are commonly used for Internet, e-commerce, and database applications. It is optimized for
high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI,
including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS; today more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it;
there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function
with arguments. When we substitute values for the arguments, the function yields an
expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for
other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
Here we are interested in finding tuples for which a predicate is true, based on the use of tuple
variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable
whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find the details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To retrieve a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
We can use two quantifiers to tell how many instances the predicate applies to:
the existential quantifier ∃ ('there exists'), and
the universal quantifier ∀ ('for all').
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧
(B.branchNo = S.branchNo) ∧ B.city = 'London')
This means: 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S, and located in London.'
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means: 'For all Branch tuples, the address is not in Paris.'
We can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an
address in Paris.'
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
an atom is a formula;
if F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2, and negation ~F1;
if F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧
S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain
of the expression.
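The declarative flavor of tuple relational calculus maps naturally onto set comprehensions. As a rough analogy (with hypothetical Staff and Branch rows, and dictionaries standing in for tuples), the queries above can be written as:

```python
# Hypothetical relation instances; each dict plays the role of a tuple.
staff = [
    {"staffNo": "S1", "position": "Manager",   "salary": 30000, "branchNo": "B1"},
    {"staffNo": "S2", "position": "Assistant", "salary": 9000,  "branchNo": "B1"},
    {"staffNo": "S3", "position": "Manager",   "salary": 12000, "branchNo": "B2"},
]
branch = [{"branchNo": "B1", "city": "London"}]

# {S.staffNo | Staff(S) ∧ S.salary > 10000}
high_paid = {S["staffNo"] for S in staff if S["salary"] > 10000}

# Existential quantifier: {S.staffNo | Staff(S) ∧ (∃B)(Branch(B) ∧
#   B.branchNo = S.branchNo ∧ B.city = 'London')} -- any() plays the role of ∃.
in_london = {S["staffNo"] for S in staff
             if any(B["branchNo"] == S["branchNo"] and B["city"] == "London"
                    for B in branch)}
print(high_paid, in_london)
```

Like the calculus, the comprehensions say which tuples qualify without prescribing an evaluation strategy; the restriction to tuples of Staff (avoiding the infinite ~Staff(S) set) is what iterating over a concrete relation enforces.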
Data Manipulation in SQL
SELECT, UPDATE, DELETE, and INSERT statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL joins: multiple-table queries
Set manipulation: ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement;
data must be entered later using INSERT.
CREATE TABLE S ( SNO CHAR(5),
SNAME CHAR(20),
STATUS DECIMAL(3),
CITY CHAR(15),
PRIMARY KEY (SNO) );
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key
value.
A primary key may consist of more than one column (values unique in combination);
this is called a composite key.
(b) Explain data manipulation in SQL.
Ans:
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting, and modifying (updating) data in a database. A DML is often
a sublanguage of a broader database language such as SQL, with the DML comprising some of
the operators in the language[1]. Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL), but it is closely related and sometimes also
considered a component of a DML; some operators may perform both selecting (reading) and
writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is
used to retrieve and manipulate data in a relational database[2]. Other forms of DML are those
used by IMS/DL/I, and by CODASYL databases such as IDMS, among others.
In SQL, the data manipulation language comprises the SQL-data change statements[3], which
modify stored data but not the schema or database objects. Manipulation of persistent database
objects (e.g. tables or stored procedures) via the SQL schema statements[3], rather than of the data
stored within them, is considered to be part of a separate data definition language (DDL). In SQL
these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct
in their overall function[3].
The SQL-data change statements are a subset of the SQL-data statements; this subset also contains
the SELECT query statement[3], which, strictly speaking, is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of DML[4], so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT ... INTO form combines both selection and manipulation,
and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement, which is almost always a verb. In the case of SQL these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita',
'xcapit00');
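The verbs above can be exercised end to end against an in-memory SQLite database; this is only a sketch, and the table and values are illustrative rather than a prescribed schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES ...
con.execute("INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
            ("John", "Capita", "xcapit00"))

# UPDATE ... SET ... WHERE ...
con.execute("UPDATE employees SET last_name = ? WHERE fname = ?",
            ("Capita Jr", "xcapit00"))

# SELECT ... FROM ... WHERE ... (strictly DQL, but used alongside the DML verbs)
rows = con.execute("SELECT first_name, last_name FROM employees").fetchall()
print(rows)  # [('John', 'Capita Jr')]

# DELETE FROM ... WHERE ...
con.execute("DELETE FROM employees WHERE fname = ?", ("xcapit00",))
print(con.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```

Using `?` placeholders rather than string concatenation is the idiomatic way to pass values, since the driver handles quoting.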
OR
(c) Explain the following integrity rules.
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
applied in the design. There are two types of integrity mentioned in
integrity rules: entity and referential. Two additional rules that aren't
necessarily included in integrity rules, but are pertinent to database designs,
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is to give each row a unique
identity, so that foreign key values can properly reference primary key values.
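Both halves of the rule (primary keys must be unique and non-null) can be demonstrated with a hypothetical table in SQLite; note that SQLite needs the NOT NULL spelled out on non-INTEGER primary key columns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer "
            "(customer_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
con.execute("INSERT INTO customer VALUES ('C1', 'Ann')")

# Entity integrity violations: a duplicate key, then a null key.
for bad_row in [("C1", "Bob"), (None, "Eve")]:
    try:
        con.execute("INSERT INTO customer VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)  # both inserts are refused
```

After the two failed inserts the table still holds exactly one uniquely identified row, which is the guarantee entity integrity exists to provide.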
Theta Join
In a theta join we apply a condition on the input relation(s), and then only the
selected rows are used in the cross product to be merged and included in the output.
In a normal cross product, all the rows of one relation are mapped/merged with all
the rows of the second relation, but here only selected rows of a relation are
cross-producted with the second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition which is applied for the select
operation on one relation; only the selected rows are then cross-producted with all the
rows of the second relation. For example, given the two relations FACULTY and
COURSE, we first apply a select operation on the FACULTY relation to
select certain specific rows, and then these rows are cross-producted with the
COURSE relation. This is the difference between a cross product and a theta join.
We would now show both relations, their different attributes, and finally the
cross product after carrying out the select operation on the relation; from such an
example the difference between cross product and theta join becomes clear.
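A theta join reduces to a cross product filtered by the condition θ. This sketch uses hypothetical FACULTY and COURSE rows (the original example tables are not reproduced in the text) and applies a select on FACULTY first, as described above:

```python
# Hypothetical relations; the attribute names are illustrative.
faculty = [{"fac_id": 1, "dept": "CS"}, {"fac_id": 2, "dept": "Math"}]
course  = [{"c_id": "C1", "fac_id": 1},
           {"c_id": "C2", "fac_id": 2},
           {"c_id": "C3", "fac_id": 1}]

def theta_join(r, s, theta):
    """Cross product of r and s, keeping only pairs that satisfy theta."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

# Select on FACULTY first (only CS rows), then theta-join on matching fac_id.
cs_only = [f for f in faculty if f["dept"] == "CS"]
result = theta_join(cs_only, course, lambda f, c: f["fac_id"] == c["fac_id"])
print([row["c_id"] for row in result])  # ['C1', 'C3']
```

A plain cross product of the two relations would produce 2 × 3 = 6 rows; selecting first and filtering with θ leaves only the two meaningful pairings.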
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So referential integrity requires that whenever a foreign key value is used, it must reference a
valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record.
Here the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from:
adding records to a related table if there is no associated record in the primary table;
changing values in a primary table that result in orphaned records in a related table;
deleting records from a primary table if there are matching related records.
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database, because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company), or, worse yet, in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the
correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain domain in detail with example.
Ans: Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
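The enumerated domain and the positive-number check constraint described above can be sketched in SQLite; the Person table and its columns are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Gender is restricted to the domain {M, F} (NULL allowed for unknown);
# Age carries the "values must be greater than zero" check constraint.
conn.execute("""CREATE TABLE Person (
    Name   TEXT,
    Gender TEXT CHECK (Gender IN ('M', 'F') OR Gender IS NULL),
    Age    INTEGER CHECK (Age > 0))""")
conn.execute("INSERT INTO Person VALUES ('Mary', 'F', 30)")  # inside the domain

try:
    conn.execute("INSERT INTO Person VALUES ('Sam', 'X', 25)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

try:
    conn.execute("INSERT INTO Person VALUES ('Joe', 'M', -5)")  # fails the Age check
    neg_ok = True
except sqlite3.IntegrityError:
    neg_ok = False
```

The DBMS rejects any value outside the declared domain boundary, enforcing the rule at the schema level rather than in application code.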
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another
There are three degrees of relationship known as
1 one-to-one (11)
2 one-to-many (1M)
3 many-to-many (MN)
Note that the correct notation for the last one is M:N, not M:M.
One-to-one (11)
This is where one occurrence of an entity relates to only one occurrence in another entity. A
one-to-one relationship rarely exists in practice, but it can; in that case you may consider
combining the two entities into one.
For example an employee is allocated a company car which can only be driven by that
employee
Therefore there is a one-to-one relationship between employee and company car
One-to-Many (1M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities shown on the previous page, an employee
works in one department but a department has many employees.
Therefore there is a one-to-many relationship between department and employee
Many-to-Many (MN)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships many-to-many relationships rarely exist Normally they occur
because an entity has been missed
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
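In a relational schema, an M:N relationship such as employee-project is normally resolved with a link (junction) table whose composite key is a pair of foreign keys. A sketch with invented table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmpId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Project  (ProjId INTEGER PRIMARY KEY, Title TEXT);
-- The junction table: one row per (employee, project) pairing.
CREATE TABLE WorksOn  (
    EmpId  INTEGER REFERENCES Employee(EmpId),
    ProjId INTEGER REFERENCES Project(ProjId),
    PRIMARY KEY (EmpId, ProjId));
INSERT INTO Employee VALUES (1,'Ann'),(2,'Bob');
INSERT INTO Project  VALUES (10,'Payroll'),(20,'Billing');
-- Ann works on both projects; both employees work on Payroll.
INSERT INTO WorksOn VALUES (1,10),(1,20),(2,10);
""")
payroll_team = [r[0] for r in conn.execute(
    "SELECT Name FROM Employee JOIN WorksOn USING (EmpId) "
    "WHERE ProjId = 10 ORDER BY Name")]
```

Each side of the M:N relationship becomes a 1:M relationship to the junction table, which is why the normalisation process mentioned above eliminates direct M:N links.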
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL) the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct, and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages not just COBOL the user in a
DBTG system is considered to be an ordinary application programmer and the language
therefore is not biased toward any single specific programming language
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, like the
architecture of a database system. These are:
• Storage Schema (corresponds to Internal View of database)
• Schema (corresponds to Conceptual View of database)
• Subschema (corresponds to External View of database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item, and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary
programming language such as COBOL that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item)
defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory (bad) relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data along with copy of the original key attribute(s) into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
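The 1NF-to-2NF steps above can be sketched with an illustrative order-line relation (table and column names are invented): ProductName depends only on ProductId, part of the key (OrderId, ProductId), so the partial dependency is moved to its own relation together with a copy of its determinant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1NF relation with key (OrderId, ProductId); ProductName depends only
-- on ProductId, a partial functional dependency.
CREATE TABLE OrderLine1NF (OrderId INT, ProductId INT, ProductName TEXT, Qty INT,
                           PRIMARY KEY (OrderId, ProductId));
INSERT INTO OrderLine1NF VALUES
  (1, 100, 'Pen', 3), (1, 101, 'Pad', 1), (2, 100, 'Pen', 5);

-- 2NF decomposition: the partially dependent attribute goes to a new
-- relation along with a copy of its determinant (ProductId).
CREATE TABLE Product   AS SELECT DISTINCT ProductId, ProductName FROM OrderLine1NF;
CREATE TABLE OrderLine AS SELECT OrderId, ProductId, Qty FROM OrderLine1NF;
""")
# The natural join of the two smaller relations recovers all original rows.
rejoined = conn.execute("""SELECT COUNT(*) FROM OrderLine
                           JOIN Product USING (ProductId)""").fetchone()[0]
# Each product name is now stored exactly once.
products = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
```

The decomposition is lossless (the join gives back the original three rows) while removing the redundancy that caused 'Pen' to be stored twice.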
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in
4NF if and only if it is in BCNF and every multi-valued dependency is a functional
dependency. 4NF removes the unwanted data structures: multi-valued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
Example: In a relation COURSE(Course, Teacher, Book), if each course can be taught by
several teachers and uses several books, with teachers and books independent of each other,
then Course ↠ Teacher and Course ↠ Book are multivalued dependencies; the relation should
be decomposed into (Course, Teacher) and (Course, Book) to reach 4NF.
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}
We want to show: Street Zip → Street Zip City
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → Street Zip City – Transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
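These axioms also justify the attribute-closure algorithm: X → Y is derivable from F exactly when Y lies inside the closure X+ of X. A small sketch in Python, run against the Maier FD set above (note the closure also picks up J via AG → J, which the step-by-step proof did not need):

```python
def closure(attrs, fds):
    """Return the closure of attribute set `attrs` under the FDs `fds`
    (each FD is a (lhs, rhs) pair of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure,
            # the right side joins it too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
ab_plus = closure("AB", F)        # closure of {A, B} under F
derivable = set("GH") <= ab_plus  # AB -> GH holds iff {G, H} is in AB+
```

This is the mechanical counterpart of the twelve-step derivation: one loop over F until no new attributes appear.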
Significance in Relational Database design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables and multiple
relationships between data elements can be defined and established in an ad-hoc manner. A
Relational Database Management System is a database system made up of files with data
elements in two-dimensional arrays (rows and columns). This database management system has
the capability to recombine data elements to form different relations, resulting in great
flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables.
• Tables are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases; the model was proposed by Dr. Codd in 1970
• It is the basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be handled in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, sometimes they can be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur the DBMS must have a method for detecting the deadlock,
and then to resolve it the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
Ans: ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens
it should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation of one another. Durability
means that once a transaction commits, its effects will persist even if there are system
failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected or possibly affected by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
• Binary Locks – A lock on a data item can be in two states: it is either locked or
unlocked.
• Shared/exclusive – This type of locking mechanism differentiates the locks based on
their uses If a lock is acquired on a data item to perform a write operation it is an
exclusive lock Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state Read locks are shared because no data
value is being changed
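The shared/exclusive rule reduces to a small compatibility matrix, sketched here (S = shared/read, X = exclusive/write; a simplification of real lock managers, which track queues of waiters as well):

```python
def compatible(held, requested):
    """'S' = shared (read) lock, 'X' = exclusive (write) lock.
    Two lock requests on the same item are compatible only when
    both are shared; an exclusive lock conflicts with everything."""
    return held == "S" and requested == "S"

# Build the full 2x2 compatibility matrix.
matrix = {(h, r): compatible(h, r) for h in "SX" for r in "SX"}
```

Any number of readers may hold shared locks together, but a writer must wait for every other lock, which is precisely why allowing two writers would drive the database into an inconsistent state.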
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed Transactions may unlock the data item after completing the
'write' operation
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
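The growing/shrinking discipline can be checked mechanically. This sketch (operation names are invented for illustration) validates whether a single transaction's lock/unlock sequence obeys the two-phase rule:

```python
def is_two_phase(ops):
    """ops: list of ('lock'|'unlock', item) pairs for one transaction.
    Returns True iff no new lock is requested after the first unlock,
    i.e. the growing phase strictly precedes the shrinking phase."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True          # shrinking phase has begun
        elif shrinking:
            return False              # a lock request after an unlock
    return True

ok  = is_two_phase([("lock", "A"), ("lock", "B"),
                    ("unlock", "A"), ("unlock", "B")])
bad = is_two_phase([("lock", "A"), ("unlock", "A"), ("lock", "B")])
```

The second schedule violates 2PL because the transaction demands a new lock after releasing one; under Strict-2PL all the unlocks would additionally be held back until the commit point.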
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it. Strict-2PL holds all the locks until the commit point and releases them all
at once.
Strict-2PL does not have cascading abort as 2PL does
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it and the ordering is determined by the age
of the transaction. A transaction created at 00:02 clock time would be older than all other
transactions that come after it. For example, any transaction y entering the system at 00:04 is
two seconds younger, and priority would be given to the older one.
In addition every data item is given the latest read and write-timestamp This lets the system
know when the last 'read and write' operation was performed on the data item.
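A minimal sketch of the basic timestamp-ordering test for a write, using the read/write timestamps just described (a simplified rule; real protocols, e.g. with Thomas's write rule, handle more cases):

```python
def try_write(ts, item):
    """Basic timestamp-ordering write check: transaction with timestamp
    `ts` may write `item` only if no younger transaction has already
    read or written it; otherwise the transaction is rolled back."""
    if ts < item["read_ts"] or ts < item["write_ts"]:
        return "rollback"            # a younger transaction got there first
    item["write_ts"] = max(item["write_ts"], ts)
    return "ok"

q = {"read_ts": 0, "write_ts": 0}
first  = try_write(5, q)  # nothing younger has touched q: allowed
second = try_write(3, q)  # ts 3 < write_ts 5: older writer is rejected
```

Because the ordering is fixed by transaction age rather than by lock acquisition, conflicts are resolved as soon as they are detected, with no waiting and hence no deadlock.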
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without their conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (/ˌætəˈmɪsəti/; from Ancient Greek ἄτομος, translit. átomos, lit. 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails.
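The transfer example can be sketched with SQLite, whose connection context manager commits or rolls back a transaction as a unit (account names, balances, and the CHECK constraint standing in for the failing withdrawal are all invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account "
             "(Id TEXT PRIMARY KEY, Balance INTEGER CHECK (Balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # one atomic transaction: both updates commit, or neither does
        conn.execute("UPDATE Account SET Balance = Balance + 200 WHERE Id = 'B'")
        # The withdrawal fails (A cannot go negative), aborting the transaction.
        conn.execute("UPDATE Account SET Balance = Balance - 200 WHERE Id = 'A'")
except sqlite3.IntegrityError:
    pass  # the whole transfer is rolled back; B's credit is undone too

balances = dict(conn.execute("SELECT Id, Balance FROM Account"))
```

Although B was credited first inside the transaction, the failed withdrawal rolls the whole series back, so the database ends in the same consistent state it started in.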
(B) Give the three level architecture proposal for DBMS.
Ans: Objectives of the three level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
Above three points are explain in detail given bellow-
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database A query language is a
combination of three subordinate language
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for a DBMS are explained above.
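The external level can be made concrete as a SQL view defined over the conceptual schema. The sketch below uses Python's sqlite3 module only as a convenient embedded DBMS; the table, view and column names are invented for illustration.

```python
import sqlite3

# Illustrative sketch: an external view exposes only the part of the
# conceptual schema relevant to a user group (names invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Meena', 50000), (2, 'Joshi', 60000)")

# End users query the view; the salary column stays hidden, and the
# underlying table can change physically without affecting this view.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name FROM employee")
view_rows = conn.execute("SELECT * FROM emp_public ORDER BY emp_id").fetchall()
print(view_rows)  # → [(1, 'Meena'), (2, 'Joshi')]
```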
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Definition Language compiler processes the schema definitions specified in the DDL and stores metadata information such as the names of files, the data items, the storage details of each file, mapping information, and constraints.
2 DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete and retrieve, from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer to find the best way to execute the query, and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
It converts operations in users' queries, coming from the application programs or from the combination of the DML compiler and query optimizer (together known as the Query Processor), from the user's logical view to the physical file system.
It controls access to the DBMS information that is stored on disk.
It handles the buffers in main memory.
It enforces constraints to maintain the consistency and integrity of the data.
It synchronizes the simultaneous operations performed by concurrent users.
It controls the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1 Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
2 Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structure, access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities and their access rights.
6 Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary - The data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users' understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of the design decisions.
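As a concrete sketch, the data dictionary of most DBMSs can itself be queried like a table. In SQLite (used below via Python purely for illustration; the table names are invented) the catalog is exposed as sqlite_master:

```python
import sqlite3

# Illustrative sketch: querying the DBMS's own data dictionary.
# SQLite exposes its catalog as the sqlite_master table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")
conn.execute("CREATE INDEX idx_title ON course(title)")

# The dictionary records every stored object: tables, indexes, views ...
objs = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name").fetchall()
print(objs)  # → [('table', 'course'), ('index', 'idx_title')]
```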
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naiumlve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users - Naive users need not be aware of the presence of the database system or of any other system. A user of an automatic teller machine falls into this category: the user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of the ATM user, one or more of his or her own accounts. Other naive users are those for whom the type and range of responses is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users - These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers - Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator - Centralized control of the database is exercised by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It also implies separation of the physical storage from the use of the data by an application program, i.e. program/data independence. The user, programmer or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data, so we need to remove this duplication of data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its currency are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system where the programmer may need to write new programs to meet every new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5 Integrity can be improved - Since the data of the organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates may sometimes lead to entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purposes of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8 Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher using the non-procedural languages developed with DBMSs than using procedural languages.
10 A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Designing with it is an iterative, team-oriented process involving all business managers (or their designates), and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity - An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type; there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes - An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship - A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to one    1 ------ 1
One to many   1 ------ M
Many to one   M ------ 1
Many to many  M ------ M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
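One possible (illustrative, not prescribed) mapping of this entity to a relational table flattens the composite attributes into their simple components. Note the assumption stated in the comment: the multivalued attribute phone_number would properly go into its own table.

```python
import sqlite3

# Illustrative relational mapping of the Customer entity: composite
# attributes (name, address, street) are flattened into simple columns.
# The multivalued attribute phone_number really belongs in a separate
# table customer_phone(customer_id, phone_number); it is kept inline
# here only to keep the sketch short.
ddl = """
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    phone_number     TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT
)"""
conn = sqlite3.connect(":memory:")
conn.execute(ddl)

# Read the column list back from the catalog to confirm the mapping.
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(len(cols), cols[0])  # → 12 customer_id
```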
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example
Ans In sequential files, index sequential files and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
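The idea in (i)-(iii) can be sketched as follows, building a secondary index from the non-unique attribute stud_name to the matching records (the sample records are invented):

```python
from collections import defaultdict

# Illustrative sketch of secondary-key retrieval: the primary key
# roll_no identifies one record; the secondary key stud_name may
# match several records.
records = [
    {"roll_no": 1, "stud_name": "Amit"},
    {"roll_no": 2, "stud_name": "Neha"},
    {"roll_no": 3, "stud_name": "Amit"},  # same name, different student
]

# Build the secondary index: value of stud_name -> list of primary keys.
secondary_index = defaultdict(list)
for rec in records:
    secondary_index[rec["stud_name"]].append(rec["roll_no"])

# Unlike a primary-key lookup, a secondary-key lookup returns the SET
# of matching records.
print(secondary_index["Amit"])  # → [1, 3]
```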
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A, B, C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1 If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and every non-trivial lossless decomposition of it into smaller tables is implied by its candidate keys.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency - if a relation cannot be decomposed any further then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and the vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
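That this three-way decomposition is lossless for the sample data can be checked mechanically. The sketch below (in Python, with the rows taken from the sample data above) projects the Buying table onto the three two-column tables and natural-joins them back:

```python
# Illustrative check that the 5NF decomposition above is lossless for
# the sample data (rows copied from the Buying table in the answer).
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

bv = {(b, v) for b, v, i in buying}  # Buyer-Vendor projection
bi = {(b, i) for b, v, i in buying}  # Buyer-Item projection
vi = {(v, i) for b, v, i in buying}  # Vendor-Item projection

# Natural join of the three projections on their common attributes.
rejoined = {(b, v, i)
            for (b, v) in bv
            for (b2, i) in bi if b == b2 and (v, i) in vi}
print(rejoined == buying)  # → True: the join dependency holds
```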
(B) Explain the architecture of an IMS System
Ans Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs
Fig: Architecture of an IMS system - application programs A and B (each written in a host language with embedded DL/I calls) access the IMS control program through their program specification blocks (PSB-A, PSB-B); each PSB consists of PCBs, which map onto the DBDs that define the physical databases.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description) - Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block) - Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block) - The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT - The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency - The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key - A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of a dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
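Whether a functional dependency holds in a given relation instance can be checked mechanically: no two rows may agree on the determinant but differ on the dependent attributes. A small sketch (the sample data is invented):

```python
# Illustrative checker for a functional dependency X -> Y in a relation
# instance: the FD fails exactly when two rows agree on X but differ on Y.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # same determinant, different value
            return False
    return True

students = [
    {"student_id": 1, "name": "Amit", "city": "Nagpur"},
    {"student_id": 2, "name": "Neha", "city": "Pune"},
    {"student_id": 3, "name": "Amit", "city": "Mumbai"},
]

print(fd_holds(students, ["student_id"], ["name"]))  # → True (key determines name)
print(fd_holds(students, ["name"], ["city"]))        # → False (same name, two cities)
```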
(D) Explain 4 NF with examples
Ans Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF. Here we will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes.
It is often executed as a series of steps; each step corresponds to a specific normal form which has known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and every non-trivial multivalued dependency is in fact a functional dependency. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also considers multivalued dependencies.
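The effect of a multivalued dependency can be sketched with the classic course/teacher/book example (the data is invented): teachers and books of a course vary independently, so the relation decomposes losslessly into (course, teacher) and (course, book):

```python
# Illustrative sketch of a multivalued dependency: in ctx(course,
# teacher, book), teachers and books vary independently, i.e.
# course ->-> teacher and course ->-> book, so ctx is not in 4NF.
ctx = {
    ("DBMS", "Rao",   "Date"),
    ("DBMS", "Rao",   "Korth"),
    ("DBMS", "Mehta", "Date"),
    ("DBMS", "Mehta", "Korth"),
}

# The 4NF decomposition into two binary relations.
course_teacher = {(c, t) for c, t, b in ctx}
course_book    = {(c, b) for c, t, b in ctx}

# The natural join of the projections reconstructs the original
# relation, so the decomposition is lossless.
rejoined = {(c, t, b)
            for (c, t) in course_teacher
            for (c2, b) in course_book if c == c2}
print(rejoined == ctx)  # → True
```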
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and of the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
This model also logs CREATE INDEX operations. Recovery from a transaction log backup that includes index creations is done at a faster pace, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans
o Physical data independence is used to separate conceptual levels from the internal levels
o Physical data independence occurs at the logical interface level
(ii) Data Integration
Ans
Data integration involves combining data residing in different sources and providing users with
a unified view of them[1] This process becomes significant in a variety of situations which
include both commercial (such as when two similar companies need to merge their databases)
and scientific (combining research results from different bioinformatics repositories for
example) domains Data integration appears with increasing frequency as the volume (that is big
data[2]) and the need to share existing data explodes[3] It has become the focus of extensive
theoretical work and numerous open problems remain unsolved Data integration encourages
collaboration between internal as well as external users
Figure 1: Simple schematic for a data warehouse. The extract, transform, load (ETL) process extracts information from the source databases, transforms it and then loads it into the data warehouse.
Figure 2: Simple schematic for a data-integration solution. A system designer constructs a mediated schema against which users can run queries. The virtual database interfaces with the source databases via wrapper code if required.
Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for interoperability of heterogeneous databases [4]. The first data integration system driven by structured metadata was designed at the University of Minnesota in 1991 for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms, and loads data from heterogeneous sources into a single view schema so that data from different sources become compatible [5]. By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries [6].
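The extract, transform, load steps above can be sketched in a few lines. The two source databases, their schemas, and the transformation below are invented for illustration, using Python's built-in sqlite3:

```python
import sqlite3

def build_source(ddl, insert, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn

# Extract: two heterogeneous sources with differently named tables/columns.
src_a = build_source("CREATE TABLE clients (full_name TEXT, city TEXT)",
                     "INSERT INTO clients VALUES (?, ?)", [("Alice", "NY")])
src_b = build_source("CREATE TABLE customers (name TEXT, location TEXT)",
                     "INSERT INTO customers VALUES (?, ?)", [("bob", "la")])

# Transform: normalise casing so data from both sources become compatible.
rows = [(n.title(), c.upper())
        for n, c in src_a.execute("SELECT full_name, city FROM clients")]
rows += [(n.title(), c.upper())
         for n, c in src_b.execute("SELECT name, location FROM customers")]

# Load: a single queryable warehouse table (the "single view schema").
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE customer (name TEXT, city TEXT)")
wh.executemany("INSERT INTO customer VALUES (?, ?)", rows)
print(sorted(wh.execute("SELECT name, city FROM customer")))
# → [('Alice', 'NY'), ('Bob', 'LA')]
```

Because the data are physically reconciled at load time, queries against `customer` need no access to the original sources.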
The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services like travel or classified advertisement web applications.
As of 2009, the trend in data integration favored loosening the coupling between data and providing a unified query interface to access real-time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schemas of the original sources, and on transforming a query into specialized queries to match the schemas of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the Global As View (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the Local As View (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema.
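The GAV idea can be sketched as follows; the source records, field names, and the mediated entity `work(title, year)` are all invented for illustration. Each mediated-schema entity is defined as a view over the sources, so a user query "unfolds" through that view into source accesses:

```python
# Two sources with incompatible local schemas (illustrative data).
src_books = [{"title": "SQL Primer", "yr": 1999}]
src_films = [{"name": "DB Story", "year": 2005}]

def mediated_work():
    """GAV mapping: the mediated entity work(title, year) is defined
    as a view over both sources."""
    for b in src_books:
        yield {"title": b["title"], "year": b["yr"]}
    for f in src_films:
        yield {"title": f["name"], "year": f["year"]}

# A query posed against the mediated schema; the user never touches
# the source schemas directly.
recent = [w["title"] for w in mediated_work() if w["year"] > 2000]
print(recent)  # → ['DB Story']
```

Under LAV the mapping direction is reversed: each source is described as a view over the mediated schema, and answering the query requires inference over those descriptions.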
As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas, like earnings, inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies, which explicitly define schema terms and thus help to resolve semantic conflicts. This approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires benchmarking of the similarities, computed from different data sources, on a single criterion such as positive predictive value. This enables the data sources to be directly comparable and to be integrated even when the natures of the experiments are distinct [7].
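The "earnings" conflict above can be made concrete with a toy ontology-style mapping; the source field names and shared concepts are invented for illustration:

```python
# A tiny "ontology": each source-local term maps onto one shared concept
# plus an explicit unit, so the two kinds of "earnings" are never conflated.
ontology = {
    "db_a.earnings": ("profit", "usd"),         # floating-point dollars
    "db_b.earnings": ("sales_count", "units"),  # integer number of sales
}

def integrate(record):
    # Rewrite each local field into its shared (concept, unit) term.
    return {ontology[field]: value for field, value in record.items()}

merged = [integrate({"db_a.earnings": 1250.50}),
          integrate({"db_b.earnings": 42})]
print(merged)  # → [{('profit', 'usd'): 1250.5}, {('sales_count', 'units'): 42}]
```

Because the unit is carried explicitly, a downstream query cannot accidentally sum dollars with sale counts.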
As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology that results in the development of disparate data models. Disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models [8]. One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will now share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationship that relates the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated.
Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) enterprise data warehouses. Since 2013, data lake approaches have risen to the level of data hubs (see the popularity of all three search terms on Google Trends [9]). These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the hub.
Q2
EITHER
(a) Explain the E-R model with a suitable example.
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modeling is an iterative, team-oriented process: all business managers (or designates) should be involved, and the result should be validated with a "bottom-up" approach. Many notation methods exist; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types
Simple/Single Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many (1:M)
Many-to-one (M:1)
Many-to-many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other words, ER diagrams help you to explain the logical structure of databases. At first glance, an ER diagram looks very similar to a flowchart; however, an ER diagram includes many specialized symbols whose meanings make this model unique.
Sample ER Diagram
Facts about the ER Diagram Model:
o The ER model allows you to draw a database design
o It is an easy-to-use graphical tool for modeling data
o Widely used in database design
o It is a graphical representation of the logical structure of a database
o It helps you to identify the entities which exist in a system and the relationships between those entities
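A common way to map the Customer entity from part (b) to a relation is to flatten its composite attributes (name, address, street) into individual columns. This is a sketch of that standard ER-to-relational rule; the column names are derived from the attribute names above, and sqlite3 is used only to hold the schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes flatten into their component columns; the
# multivalued/derived cases would need separate tables or computation.
conn.execute("""CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    phone_number     TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT)""")
cols = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(len(cols))  # → 12 flattened columns
```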
(b) Differentiate between the Network and Hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships
2. Based on a parent-child relationship
3. Retrieval algorithms are complex and asymmetric
4. More data redundancy
Network model
1. Many-to-many relationships
2. A record can have many parents as well as many children
3. Retrieval algorithms are complex but symmetric
4. Less data redundancy than the hierarchical model
Relational model
1. One-to-one, one-to-many, and many-to-many relationships
2. Based on relational data structures (tables)
3. Retrieval algorithms are simple and symmetric
4. Least data redundancy
OR
(c) Draw an E-R diagram for a Library Management System.
Ans:
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-Sequential file
Ans
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some search key.
o Records are chained together by pointers to permit fast retrieval in search-key order.
o Each pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; otherwise put the new record in an overflow block.
o Adjust pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If very few records are in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and are done when the system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize when an insertion occurs. In this case, the pointer fields are no longer required.
The Sequential File
A fixed format is used for records:
Records are the same length
All fields are the same (order and length)
Field names and lengths are attributes of the file
One field is the key field
Uniquely identifies the record
Records are stored in key sequence
The Sequential File
New records are placed in a log file or transaction file
Batch update is performed to merge the log file with the master file
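The batch update described above can be sketched directly: because the master file and the transaction (log) file are both in key sequence, a single merge pass rebuilds the master in key order with no random access. File names and records below are invented for illustration:

```python
import heapq

master = [(101, "Ives"), (205, "Khan"), (310, "Lopez")]  # sorted by key field
log    = [(150, "Mehta"), (400, "Ng")]                   # new records, sorted

# One sequential pass merges the two sorted runs into a new master file.
new_master = list(heapq.merge(master, log, key=lambda rec: rec[0]))
print([key for key, _ in new_master])  # → [101, 150, 205, 310, 400]
```

This is why sequential organization suits batch processing: the merge touches each record exactly once, in key order.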
(ii) Direct file
Ans:
Direct Access File System (DAFS) is a network file system, similar to Network File System (NFS) and Common Internet File System (CIFS), that allows applications to transfer data while bypassing operating-system control, buffering, and network protocol operations that can bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying transport mechanism. Using VI hardware, an application transfers data to and from application buffers without using the operating system, which frees up the processor and operating system for other processes and allows files to be accessed by servers using several different operating systems. DAFS is designed and optimized for clustered, shared-file network environments that are commonly used for Internet, e-commerce, and database applications. DAFS is optimized for high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI, including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it.
There is no description of how to evaluate a query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.
When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
Relational Calculus
If a predicate contains a variable (e.g., 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
When applied to databases relational calculus has forms tuple and domain
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true. It is based on the use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e., a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find the details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To retrieve a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
Tuple Relational Calculus
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means: 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and it is located in London.'
Tuple Relational Calculus
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means: 'For all Branch tuples, the address is not in Paris.'
We can also write ¬(∃B)(B.city = 'Paris'), which means 'There are no branches with an address in Paris.'
Tuple Relational Calculus
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ¬F1
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
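The calculus query for managers earning more than $25,000 states what to retrieve, not how; SQL is its declarative counterpart. A minimal sketch with invented sample data, run against an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Staff (
    staffNo TEXT, fName TEXT, lName TEXT, position TEXT, salary INT)""")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?, ?)", [
    ("S1", "Ann", "Beech", "Manager", 30000),     # satisfies the predicate
    ("S2", "Ben", "White", "Assistant", 12000),   # filtered out
])
# SQL equivalent of {S.fName, S.lName | Staff(S) ∧ S.position = 'Manager'
#                    ∧ S.salary > 25000}
rows = conn.execute(
    "SELECT fName, lName FROM Staff "
    "WHERE position = 'Manager' AND salary > 25000").fetchall()
print(rows)  # → [('Ann', 'Beech')]
```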
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ¬Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any, In, Contains, All, Not In, Not Contains, Exists, Union, Minus, Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement.
Data must be entered later using INSERT.
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) );
Creating Tables
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key value.
A primary key may consist of more than one column (values unique in combination); this is called a composite key.
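The S table above can be created, populated later with INSERT, and queried; this sketch runs the same DDL against an in-memory SQLite database (SQLite maps CHAR/DECIMAL onto its own type affinities, but the statement is accepted as written):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE S (
    SNO    CHAR(5),
    SNAME  CHAR(20),
    STATUS DECIMAL(3),
    CITY   CHAR(15),
    PRIMARY KEY (SNO))""")
# The table starts empty; rows are entered later using INSERT.
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)",
                 [("S1", "Smith", 20, "London"),
                  ("S2", "Jones", 10, "Paris")])
result = conn.execute("SELECT SNO, CITY FROM S ORDER BY SNO").fetchall()
print(result)  # → [('S1', 'London'), ('S2', 'Paris')]
```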
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language [1]. Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database [2]. Other forms of DML are those used by IMS/DL1 and by CODASYL databases such as IDMS, among others.
In SQL, the data manipulation language comprises the SQL-data change statements [3], which modify stored data but not the schema or database objects. Manipulation of persistent database objects, e.g., tables or stored procedures, via the SQL schema statements [3], rather than the data stored within them, is considered to be part of a separate data definition language (DDL). In SQL these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct in their overall function [3].
The SQL-data change statements are a subset of the SQL-data statements; the latter also contains the SELECT query statement [3], which, strictly speaking, is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of DML [4], so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e., modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
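The DML verbs can be exercised in sequence; the table and values mirror the example above (with quoting added), run here against an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")
# INSERT: add a row.
conn.execute("INSERT INTO employees (first_name, last_name, fname) "
             "VALUES ('John', 'Capita', 'xcapit00')")
# UPDATE: modify the row just inserted.
conn.execute("UPDATE employees SET last_name = 'Capito' WHERE fname = 'xcapit00'")
updated = conn.execute("SELECT first_name, last_name FROM employees").fetchall()
print(updated)  # → [('John', 'Capito')]
# DELETE: remove it again.
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(remaining)  # → 0
```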
OR
(c) Explain the following integrity rules:
(i) Entity Integrity
Ans:
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules but are pertinent to database designs are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is for each row to have a unique identity, so that foreign key values can properly reference primary key values.
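Both requirements (unique, non-null primary key) can be demonstrated with a small sketch; the table and data are invented. Note that SQLite deviates from the SQL standard and allows NULL in a PRIMARY KEY column unless NOT NULL is also declared, so it is declared explicitly here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO emp VALUES ('E1', 'Rao')")

errors = []
for bad in [("E1", "Dup"), (None, "NoKey")]:  # duplicate key, then NULL key
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?)", bad)
    except sqlite3.IntegrityError as e:
        errors.append(str(e))  # both violations are rejected
print(len(errors))  # → 2
```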
Theta Join
In a theta join, we apply a condition on the input relation(s), and then only the selected rows are used in the cross product to be merged and included in the output. This means that in a normal cross product all the rows of one relation are mapped/merged with all the rows of the second relation, but here only the selected rows of a relation take part in the cross product with the second relation. It is denoted as ⋈θ.
If R and S are two relations, then θ is the condition which is applied for the select operation on one relation; only the selected rows are then combined in a cross product with all the rows of the second relation. For example, given two relations FACULTY and COURSE, we first apply a select operation on the FACULTY relation to select certain specific rows, and then these rows form a cross product with the COURSE relation. From this example, the difference between a cross product and a theta join becomes clear.
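The FACULTY/COURSE example can be sketched as follows; the relation contents and the θ condition are invented for illustration:

```python
# Relations as lists of tuples (illustrative data).
faculty = [("F1", "Ali", "CS"), ("F2", "Sara", "Math")]
course = [("C1", "DBMS"), ("C2", "Algebra")]

theta = lambda f: f[2] == "CS"  # the θ condition applied to FACULTY

# Theta join: select the FACULTY rows satisfying θ, then cross-product
# only those survivors with every COURSE row.
result = [f + c for f in faculty if theta(f) for c in course]
print(result)
# → [('F1', 'Ali', 'CS', 'C1', 'DBMS'), ('F1', 'Ali', 'CS', 'C2', 'Algebra')]
```

A plain cross product would have produced four rows; the θ selection first cuts FACULTY to one row, so only two combined rows appear.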
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Example:
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records. Otherwise, we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e., the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
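The three rules above can be demonstrated with a small sketch; the tables and the CompanyId example follow the text, with sqlite3 used as the engine (SQLite enforces foreign keys only after the PRAGMA shown):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this switched on
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(company_id))""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")

errors = []
# Deleting a parent row that still has matching child rows is rejected.
try:
    conn.execute("DELETE FROM company WHERE company_id = 15")
except sqlite3.IntegrityError as e:
    errors.append(str(e))
# Adding a child with no associated parent record is rejected too.
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")
except sqlite3.IntegrityError as e:
    errors.append(str(e))
print(len(errors))  # → 2, and no orphaned record was created
```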
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
(d) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values [1].
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type and allowed to have one of two known code values: 'M' for male and 'F' for female, plus NULL for records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value, excluding NULL. Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check constraint or, in more complex cases, in a database trigger. For example, a column requiring positive numeric values may have a check constraint declaring that the values must be greater than zero.
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
(Note that the correct notation for the last is M:N, not M:M.)
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; if one arises, you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee.
Therefore, there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department, but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist in a finished design. Normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct, and self-contained language. The DBTG is intended to meet the requirements of many distinct programming languages, not just COBOL: the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of the conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the Figure. The architecture of the DBTG model can be divided into three different levels, as in the architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data-items they contain, and the sets into which they are grouped. (Here, logical record types are referred to as record types; the fields in a logical record format are called data-items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data-items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data-item, and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary programming language such as COBOL that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema; using the COBOL Data Base Facility, for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data-item) defined in the subschema. The program may refer to these data-item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to forms up to 3NF, BCNF, or 4NF.
We will pay particular attention to forms up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g., a form) into table format with columns and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table)
or by
placing the repeating data, along with a copy of the original key attribute(s), into a separate relation
Second Normal Form (2NF)
Based on the concept of full functional dependency:
A and B are attributes of a relation; B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Second Normal Form (2NF)
1NF and no partial functional dependencies.
Partial functional dependency: when one or more non-key attributes are functionally dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
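The removal step can be sketched on a toy relation; the relation Order(order_no, item_no, item_desc, qty) with key (order_no, item_no) and its data are invented for illustration. Here item_desc depends only on item_no, a partial dependency, so it moves to a new relation together with a copy of its determinant:

```python
# 1NF relation with a partial dependency: item_desc depends on item_no alone.
order_1nf = [
    (1, "A", "Bolt", 10),
    (1, "B", "Nut", 5),
    (2, "A", "Bolt", 7),
]

# 2NF decomposition: OrderLine keeps the full key and qty;
# Item holds the partial dependency with its determinant item_no.
order_line = sorted({(o, i, q) for o, i, _, q in order_1nf})
item = sorted({(i, d) for _, i, d, _ in order_1nf})
print(order_line)  # item_desc is no longer repeated per order
print(item)        # → [('A', 'Bolt'), ('B', 'Nut')]
```

The duplicate "Bolt" description from the 1NF table now appears exactly once, which is the update-anomaly reduction 2NF aims at.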
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on something other than a superset of a candidate key A table is said to be in
4NF if and only if it is in the BCNF and multi-valued dependencies are functional
dependencies The 4NF removes unwanted data structures: multi-valued dependencies
One of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes involved are dependent between themselves
The relation must also be in BCNF Fourth normal form differs from BCNF only in that it
uses multivalued dependencies
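A suitable example is the classic course/teacher/book relation (hypothetical data): in CTX(course, teacher, book), course →→ teacher and course →→ book, because a course's teachers and its textbooks are independent of each other, so the relation must store their full cross product. The sketch below shows the redundancy and the 4NF decomposition:

```python
from itertools import product

# course ->> teacher and course ->> book: teachers and books for a
# course are independent, so CTX must hold the full cross product.
teachers = {"AI": ["Smith", "Jones"]}
books = {"AI": ["ML Basics", "Logic"]}

ctx = {("AI", t, b) for t, b in product(teachers["AI"], books["AI"])}
assert len(ctx) == 4  # 2 teachers x 2 books -> redundant tuples

# 4NF decomposition splits the two independent multivalued facts:
ct = {(c, t) for c, t, _ in ctx}   # course-teacher
cb = {(c, b) for c, _, b in ctx}   # course-book

# Lossless: the natural join on course rebuilds the original relation.
rejoined = {(c, t, b) for (c, t) in ct for (c2, b) in cb if c == c2}
assert rejoined == ctx
```

After decomposition, adding a third teacher adds one tuple to ct instead of one tuple per book in CTX.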
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1 Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}
We want to show Street Zip → City Street Zip
Proof
1 Zip → City – given
2 Street Zip → Street City – augmentation of (1) by Street
3 City Street → Zip – given
4 City Street → City Street Zip – augmentation of (3) by City Street
5 Street Zip → City Street Zip – transitivity of (2) and (4)
[From Maier]
1 Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}
Show that AB → GH is derivable from F
1 AB → E – given
2 AB → AB – reflexivity
3 AB → B – projectivity from (2)
4 AB → BE – additivity from (1) and (3)
5 BE → I – given
6 AB → I – transitivity from (4) and (5)
7 E → G – given
8 AB → G – transitivity from (1) and (7)
9 AB → GI – additivity from (6) and (8)
10 GI → H – given
11 AB → H – transitivity from (9) and (10)
12 AB → GH – additivity from (8) and (11)
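Derivations like these can be checked mechanically with the standard attribute-closure algorithm, which repeatedly applies the given FDs (the axioms are what justify each step). A minimal sketch:

```python
def closure(attrs, fds):
    """Return the closure of the attribute set under the FDs.
    fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Maier's example: F = {AB->E, AG->J, BE->I, E->G, GI->H}
F = [(set("AB"), set("E")), (set("AG"), set("J")), (set("BE"), set("I")),
     (set("E"), set("G")), (set("GI"), set("H"))]
assert set("GH") <= closure(set("AB"), F)   # hence AB -> GH holds

# Ullman's example: F = {City Street -> Zip, Zip -> City}
G = [({"City", "Street"}, {"Zip"}), ({"Zip"}, {"City"})]
assert closure({"Street", "Zip"}, G) == {"Street", "Zip", "City"}
```

X → Y holds iff Y is contained in the closure of X, so this one loop replaces the hand proof.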
Significance in Relational Database design: Armstrong's axioms are sound and complete, so they
derive exactly the functional dependencies in the closure F+; this is what makes them useful for
computing attribute closures and candidate keys during relational database design
A relational database is a database structure in which data is stored in
two-dimensional tables, where multiple relationships between data
elements can be defined and established in an ad-hoc manner A Relational Database Management
System is a database system made up of files with data elements in a two-dimensional array (rows
and columns) This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage
A relational database is perceived by the user as a collection of two-dimensional tables
• Tables are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases Proposed by Dr E F Codd in 1970
• The basis for the relational database management system (RDBMS)
• The relational model contains the following components
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances Once a deadlock
does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a
transaction to cancel and revert that entire transaction until the required resources become
available, allowing one transaction to complete while the other is reprocessed at a later time
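The second prevention technique above, fixing a global resource access order, can be sketched with Python threads and two hypothetical accounts (names and amounts are illustrative):

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
balance = {"A": 100, "B": 50}

def transfer(src, dst, amount):
    # Always acquire locks in a fixed (alphabetical) order, regardless
    # of transfer direction, so two concurrent transfers can never hold
    # one lock each while waiting for the other -> no deadlock.
    first, second = sorted([src, dst])
    locks = {"A": lock_a, "B": lock_b}
    with locks[first]:
        with locks[second]:
            balance[src] -= amount
            balance[dst] += amount

# Opposite-direction transfers: without the fixed order, these could
# deadlock (each holding one lock, waiting on the other).
t1 = threading.Thread(target=transfer, args=("A", "B", 10))
t2 = threading.Thread(target=transfer, args=("B", "A", 5))
t1.start(); t2.start(); t1.join(); t2.join()
assert balance == {"A": 95, "B": 55}
```

With the sorted-order rule both threads contend for lock_a first, so a circular wait is impossible.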
(b) Explain the meaning of the expression ACID transaction
Ans ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it
should be atomic: it should either complete fully or not at all; there should not be anything
like a semi-complete transaction. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions should be
scheduled in such a fashion that they remain in isolation of one another. Durability means that
once a transaction commits, its effects will persist even if there are system failures.
(c) What is the purpose of transaction isolation levels?
Ans Transaction isolation
levels affect how the database is to operate while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. Say I am
processing a change to the tax rate in my state; my store clerk should not be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
Binary locks: A lock on a data item can be in two states; it is either locked or
unlocked
Shared/exclusive: This type of locking mechanism differentiates the locks based on
their uses If a lock is acquired on a data item to perform a write operation it is an
exclusive lock Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state Read locks are shared because no data
value is being changed
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed Transactions may unlock the data item after completing the
'write' operation
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
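The two-phase rule itself, that no new lock may be acquired after the first release, can be sketched in a few lines (this is a minimal illustration of the rule, not a full lock manager):

```python
class TwoPhaseTxn:
    """Tracks one transaction's locks and enforces the 2PL rule."""

    def __init__(self):
        self.held = set()
        self.shrinking = False   # flips when the first lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)      # growing phase

    def unlock(self, item):
        self.shrinking = True    # growing phase ends at the first unlock
        self.held.discard(item)

txn = TwoPhaseTxn()
txn.lock("x"); txn.lock("y")     # growing phase
txn.unlock("x")                  # shrinking phase begins
try:
    txn.lock("z")                # illegal under 2PL
except RuntimeError as e:
    print(e)                     # -> 2PL violation: lock after first unlock
```

Enforcing this single rule on every transaction is what guarantees conflict-serializable schedules.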
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction A transaction created at clock time 00:02 would be older than all other
transactions that come after it For example, any transaction y entering the system at 00:04 is
two seconds younger, and priority would be given to the older one
In addition, every data item is given the latest read-timestamp and write-timestamp This lets the
system know when the last 'read' and 'write' operations were performed on the data item
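The basic timestamp-ordering rules for a single data item can be sketched as follows (a simplified illustration, assuming each transaction carries the timestamp it was assigned at start):

```python
class Item:
    """One data item with the latest read/write timestamps."""

    def __init__(self):
        self.read_ts = 0    # latest read timestamp
        self.write_ts = 0   # latest write timestamp

    def read(self, ts):
        if ts < self.write_ts:   # item already overwritten by a younger txn
            return False         # reject: reader must abort and restart
        self.read_ts = max(self.read_ts, ts)
        return True

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            return False         # reject: this write arrived too late
        self.write_ts = ts
        return True

x = Item()
assert x.write(2)       # transaction with timestamp 2 writes x
assert not x.read(1)    # older transaction 1 arrives later: read rejected
assert x.read(3)        # younger transaction 3 may read
```

A rejected operation means the transaction restarts with a new timestamp, which is how the protocol serializes transactions by age without using locks.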
OR
(b) Explain database security mechanisms
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or under user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans Concurrency control is the procedure in DBMS for managing simultaneous
operations without them conflicting with one another Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another However, any
practical database has a mix of READ and WRITE operations, and hence concurrency is a
challenge
Concurrency control is used to address such conflicts, which mostly occur with a multi-
user system It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases
Therefore concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright As a consequence, the transaction cannot be observed to be in progress by another
database client At one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress)
An example of an atomic transaction is a monetary transfer from bank account A to account B
It consists of two operations: withdrawing the money from account A and saving it to account B
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails
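The transfer described above can be sketched with hypothetical in-memory accounts, where a before-image (as in an undo log) makes the pair of operations all-or-nothing:

```python
accounts = {"A": 100, "B": 50}

def transfer(src, dst, amount):
    snapshot = dict(accounts)          # before-image, as in an undo log
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount        # any failure here gets rolled back
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # undo: restore the before-image
        raise

transfer("A", "B", 30)
assert accounts == {"A": 70, "B": 80}

try:
    transfer("A", "B", 999)            # fails: neither operation applies
except ValueError:
    pass
assert accounts == {"A": 70, "B": 80}  # state unchanged: atomicity
```

Real DBMSs achieve the same effect with write-ahead logging rather than a full snapshot, but the guarantee is identical: commit applies both operations, abort applies neither.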
(B) Give the three level architecture proposal for DBMS
Ans Objectives of the three level architecture proposal for DBMS:
All users should be able to access the same data
A user's view is immune to changes made in other views
Users should not need to know physical database storage details
The DBA should be able to change database storage structures without affecting the users' views
The internal structure of the database should be unaffected by changes to physical aspects of storage
The DBA should be able to change the conceptual structure of the database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below -
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access
The end user uses a query language to access data from the database A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects The data control language is used to
control the user's access to database objects
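The three sublanguages can be seen in action using Python's built-in sqlite3 module (table and column names here are illustrative; note SQLite itself has no DCL GRANT/REVOKE, so the DCL statement is shown only as a comment):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define and declare the database object
con.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on the object
con.execute("INSERT INTO student VALUES (1, 'Asha')")
con.execute("INSERT INTO student VALUES (2, 'Ravi')")
rows = con.execute("SELECT name FROM student ORDER BY roll").fetchall()
print(rows)   # -> [('Asha',), ('Ravi',)]

# DCL (not supported by SQLite; in other DBMSs it would look like):
#   GRANT SELECT ON student TO some_user;
```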
Conceptual Level - This level comes between the external and the internal levels The
conceptual level represents the entire database as a whole and is used by the DBA This level is
the view of the data 'as it really is' The user's view of the data is constrained by the language
that they are using At the conceptual level the data is viewed without any of these constraints
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
So that objective of three level of architecture proposal for DBMS are suitable explain in
above
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System
The main functions of the Data Manager are -
Convert operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the Query Processor), from the
user's logical view to the physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - Data Dictionary is a repository of descriptions of the data in the database It
contains information about
1 Data - names of the tables, names of attributes of each table, length of attributes, and
number of rows in each table
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data
definitions are changed
3 Constraints on data, ie the range of values permitted
4 Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes
5 Access authorization - the description of database users, their responsibilities
and their access rights
6 Usage statistics, such as frequency of queries and transactions
The data dictionary is used to actually control the data integrity, database operation
and accuracy It may be used as an important part of the DBMS
Importance of Data Dictionary -
Data Dictionary is necessary in the databases due to the following reasons
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system
• It helps in documenting the database design process by storing documentation of the
result of every design phase and design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system A user of an automatic teller machine falls under this category The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts Other such naive users are those for whom the type and range of response is always indicated Thus even a very competent database designer could be allowed to use a particular database system only as a naive user
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly Online users can also be naive users requiring help, such as menus
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator This person or group is referred to as the database administrator (DBA) They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS
(D) What are the advantages of using a DBMS over the conventional
file processing system
Ans A database is a collection of non-redundant data which can be shared by different application
systems This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency It implies separation of physical storage from the use
of the data by an application program, ie program/data independence The user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user Changes can be made to the data without affecting other components of
the system, eg changing the format of data items (real to integer arithmetic operations),
changing the file structure (reorganizing data internally or changing the mode of access), or
relocating data from one device to another, eg from optical to magnetic storage, or from tape to disk
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data This may lead to
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updation of the same data in different files
• Time wasted in entering data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file This may lead to inconsistent data, so we need to remove this duplication of
data in multiple files to eliminate inconsistency
3 Better service to the users - A DBMS is often used to provide better services to the users In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it
easy to respond to anticipated information requests
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike
a file processing system where the programmer may need to write new programs to meet every
new demand
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system Application programs need not be changed when changing the
data in the database
5 Integrity can be improved - Since data of an organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity constraints
In conventional systems, because the data is duplicated in multiple files, updating or
changing data may sometimes lead to entry of incorrect data in some of the files where it exists
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems, applications are developed in an
ad-hoc/temporary manner Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult Setting up of a database makes it easier to enforce security restrictions since data is
now centralized It is easier to control who has access to what parts of the database Different
checks can be established for each type of access (retrieve modify delete etc) to each piece
of information in the database
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. It may become necessary to ignore some
requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed with DBMSs than when using procedural languages.
10 A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed to meet the demands of particular
applications; the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database makes it possible to provide schemes
for backup and recovery from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Modelling is an
iterative, team-oriented process in which all business managers (or their designates)
should be involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entity,
relationship and attribute.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes An attribute is a characteristic of an entity. A Student entity, for example, has attributes such as
student ID, student name and address.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship
between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
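The entity above can be sketched in code. This is a minimal illustration, not any DBMS's API: composite attributes become nested types, the multivalued phone attribute becomes a list, and age is shown as a derived attribute computed from date_of_birth (the reference date and all data values are invented for the example).

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Street:                      # composite attribute nested inside Address
    street_name: str
    street_number: str
    apartment_number: str

@dataclass(frozen=True)
class Address:                     # composite attribute of Customer
    city: str
    state: str
    zip_code: str
    street: Street

@dataclass
class Customer:
    customer_id: int               # primary key attribute
    first_name: str
    last_name: str
    middle_name: str
    phone_numbers: list            # multivalued attribute
    date_of_birth: date
    address: Address

    @property
    def age(self) -> int:          # derived attribute: computed, not stored
        today = date(2020, 1, 1)   # fixed reference date for the example
        return today.year - self.date_of_birth.year

c = Customer(1, "Asha", "Rao", "K", ["555-0101"], date(1990, 5, 4),
             Address("Nagpur", "MH", "440001", Street("Main", "12", "3A")))
print(c.age)  # derived from date_of_birth: 30
```

The derived attribute is the key point: it never appears in storage, only in computation.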
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans In sequential files, index-sequential files and direct files, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we get the set of
records which satisfy the given value.
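The construction can be sketched as an inverted index: a mapping from each secondary-key value to the list of primary keys of the matching records. The record layout and field names below are invented for illustration.

```python
from collections import defaultdict

# Primary organization: records keyed by the primary key stud_id.
records = {
    101: {"stud_name": "Amit", "branch": "MCA"},
    102: {"stud_name": "Neha", "branch": "MCA"},
    103: {"stud_name": "Amit", "branch": "MBA"},
}

# Build the secondary index on the non-unique attribute "stud_name":
# each key value maps to a list of primary keys.
by_name = defaultdict(list)
for stud_id, rec in records.items():
    by_name[rec["stud_name"]].append(stud_id)

# Secondary key retrieval may yield several records for one key value.
hits = [records[i] for i in by_name["Amit"]]
print(len(hits))  # 2 records satisfy stud_name = "Amit"
```

Note the contrast with primary-key retrieval: here one key value fetches a set of records, not a single record.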
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A, B, C) be a schema and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1 If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
be further losslessly decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise)
For any one, you must know the other two (cyclical)
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data
Buyer Vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item, you must know the buyer and vendor; to determine the vendor, you must know the buyer and
the item; and finally, to know the buyer, you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
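The decomposition above can be checked on the sample data: projecting the single table onto the three pairs and taking their natural join gives back exactly the original rows, with no spurious tuples. A minimal sketch using Python sets:

```python
# The sample Buying(buyer, vendor, item) rows from the table above.
rows = {("Sally", "Liz Claiborne", "Blouses"),
        ("Mary",  "Liz Claiborne", "Blouses"),
        ("Sally", "Jordach", "Jeans"),
        ("Mary",  "Jordach", "Jeans"),
        ("Sally", "Jordach", "Sneakers")}

bv = {(b, v) for b, v, _ in rows}          # Buyer-Vendor projection
bi = {(b, i) for b, _, i in rows}          # Buyer-Item projection
vi = {(v, i) for _, v, i in rows}          # Vendor-Item projection

# Natural join of the three projections.
rejoined = {(b, v, i) for (b, v) in bv for (v2, i) in vi
            if v == v2 and (b, i) in bi}
assert rejoined == rows   # the join dependency holds: no spurious tuples
```

Recording "Liz Claiborne sells Jeans" now takes a single row in the Vendor-Item table, rather than one row per buyer in the original single table.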
(B) Explain the architecture of an IMS System
Ans Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Figure: IMS system architecture - application programs A and B, each written in a host language plus DL/I, access the data through their program specification blocks (PSB-A, PSB-B), each consisting of PCBs; the IMS control program maps these, via the DBDs, onto the stored databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description) Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
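The segment hierarchy this DBD defines (COURSE as root; PREREQ and OFFERING as its children; TEACHER and STUDENT as children of OFFERING) can be sketched with nested structures. The data values below are invented for illustration, and this is only a conceptual model, not how IMS stores segments.

```python
# One COURSE occurrence with its dependent segment occurrences.
course = {
    "COURSE#": "M23", "TITLE": "Dynamics",
    "PREREQ":   [{"COURSE#": "M16", "TITLE": "Trigonometry"}],
    "OFFERING": [{
        "DATE": "730813", "LOCATION": "Oslo", "FORMAT": "F3",
        "TEACHER": [{"EMP#": "421633", "NAME": "Gill"}],
        "STUDENT": [{"EMP#": "183009", "NAME": "Anne", "GRADE": "A"}],
    }],
}

# Navigating from parent to child, roughly what a DL/I
# "get next within parent" sequence of calls would do:
students = [s["NAME"] for off in course["OFFERING"] for s in off["STUDENT"]]
print(students)
```

The point of the hierarchy is that a child segment (e.g. a STUDENT) is reachable only through its parent chain COURSE → OFFERING → STUDENT.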
External View
The user does not operate directly at the physical database level but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block) Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block) The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency; they hold for all time; they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce that set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation
and has the property that every functional dependency in Y is implied by the
functional dependencies in X.
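Whether one FD set implies another can be tested mechanically with the standard attribute-closure algorithm: X → Y is implied by a set of FDs exactly when Y is contained in the closure of X. A minimal sketch, with an invented relation R(A, B, C, D):

```python
def closure(attrs, fds):
    """Compute the closure of the attribute set `attrs` under `fds`,
    where fds is a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is in the closure, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Illustrative FDs on R(A, B, C, D): A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
assert closure({"A"}, fds) == {"A", "B", "C"}   # so A -> C is implied
assert "D" not in closure({"A"}, fds)           # A -> D is not implied
```

The same routine is the workhorse for finding candidate keys: X is a superkey exactly when the closure of X is the full attribute set.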
(D) Explain 4 NF with examples
Ans Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF;
here we will pay particular attention to forms up to 3NF.
Database designers need not normalize to the highest possible normal form.
The database designers need not normalize to the highest possible normal form
Normalization is a formal technique for analyzing a relation based on its primary key and the functional
dependencies between its attributes.
It is often executed as a series of steps; each step corresponds to a specific normal form, which has
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF thus
removes an unwanted kind of data structure: multivalued dependencies.
For a relation to be in fourth normal form, either:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the dependent attributes depend on each other.
Either of these conditions must hold true.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
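A multivalued dependency X →→ Y holds on an instance exactly when the relation equals the join of its projections onto (X, Y) and (X, rest), i.e. when the binary decomposition is lossless. A sketch on an invented instance of R(course, teacher, book):

```python
# Every teacher of a course uses every book of that course,
# the classic situation where course ->> teacher holds.
rows = {("DB", "Rao", "Date"), ("DB", "Rao", "Korth"),
        ("DB", "Sen", "Date"), ("DB", "Sen", "Korth")}

xy = {(c, t) for c, t, _ in rows}      # projection on (course, teacher)
xz = {(c, b) for c, _, b in rows}      # projection on (course, book)

# course ->> teacher holds iff joining the two projections on course
# gives back exactly the original relation.
rejoined = {(c, t, b) for (c, t) in xy for (c2, b) in xz if c == c2}
assert rejoined == rows                # MVD holds; decompose to reach 4NF
```

When the MVD holds but course is not a superkey, the relation violates 4NF, and the two projections are exactly the 4NF decomposition.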
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions. Also, object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions and account entries.
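The "pointer following" idea above can be sketched in plain objects: the account holds direct references to its transactions, so retrieval navigates references instead of joining tables on a foreign key. Class and attribute names are invented for illustration.

```python
class Transaction:
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []    # direct object references, not foreign keys

acct = Account("R. Kumar")
acct.transactions.append(Transaction(500))
acct.transactions.append(Transaction(-120))

# Retrieval navigates references; a relational design would instead
# join an accounts table to a transactions table on account_id.
balance = sum(t.amount for t in acct.transactions)
print(balance)  # 380
```

An object database makes such references persistent, which is what lets it skip the join at query time.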
(c) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will
determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
transaction committed.
The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
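The role the transaction log plays in all three models can be illustrated with a toy redo-only replay: after a crash, only the writes of committed transactions are reapplied. This is a deliberately simplified sketch of the idea, not SQL Server's actual recovery algorithm, and the log records are invented.

```python
# A write-ahead log as a list of records: (kind, txn, [key, value]).
log = [
    ("begin",  "T1"),
    ("write",  "T1", "x", 10),
    ("commit", "T1"),
    ("begin",  "T2"),
    ("write",  "T2", "y", 99),   # T2 never commits before the crash
]

# Pass 1: find which transactions committed.
committed = {rec[1] for rec in log if rec[0] == "commit"}

# Pass 2 (redo): replay only the writes of committed transactions.
db = {}
for rec in log:
    if rec[0] == "write" and rec[1] in committed:
        _, _, key, value = rec
        db[key] = value

print(db)  # the uncommitted write to y is discarded
```

The recovery-model choice then amounts to how much of such a log is retained: the full model keeps everything for point-in-time restore, while the simple model truncates committed portions and gives up that ability.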
(d) Describe deadlocks in a distributed system.
Ans
Issues with combining heterogeneous data sources often referred to as information silos under a
single query interface have existed for some time In the early 1980s computer scientists began
designing systems for interoperability of heterogeneous databases[4] The first data integration
system driven by structured metadata was designed at the University of Minnesota in 1991 for
the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach,
which extracts, transforms and loads data from heterogeneous sources into a single
view schema so that data from different sources become compatible[5] By making thousands of
population databases interoperable IPUMS demonstrated the feasibility of large-scale data
integration The data warehouse approach offers a tightly coupled architecture because the data
are already physically reconciled in a single queryable repository so it usually takes little time to
resolve queries[6]
The data warehouse approach is less feasible for data sets that are frequently updated, requiring
the extract, transform, load (ETL) process to be continuously re-executed for synchronization.
Difficulties also arise in constructing data warehouses when one has only a query interface to
summary data sources and no access to the full data This problem frequently emerges when
integrating several commercial query services like travel or classified advertisement web
applications
As of 2009 the trend in data integration favored loosening the coupling between data sources
and providing a unified query interface to access real-time data over a mediated schema
(see Figure 2), which allows information to be retrieved directly from the original databases. This is
consistent with the SOA approach popular in that era This approach relies on mappings between
the mediated schema and the schema of original sources and transforming a query into
specialized queries to match the schema of the original databases Such mappings can be
specified in two ways as a mapping from entities in the mediated schema to entities in the
original sources (the Global As View (GAV) approach) or as a mapping from entities in the
original sources to the mediated schema (the Local As View (LAV) approach) The latter
approach requires more sophisticated inferences to resolve a query on the mediated schema but
makes it easier to add new data sources to a (stable) mediated schema
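The GAV idea can be sketched concretely: the mediated schema is defined as a view, i.e. a function computed over the sources, and a query over the mediated schema is answered by unfolding that mapping. Source shapes and field names below are invented.

```python
# Two heterogeneous sources with different shapes.
source_a = [{"fname": "Ada",  "city": "Pune"}]
source_b = [{"name": "Turing, Alan", "location": "Delhi"}]

def mediated_person():
    """GAV mapping: each mediated Person entity is defined
    as a view computed over the sources."""
    for r in source_a:
        yield {"name": r["fname"], "city": r["city"]}
    for r in source_b:
        last, first = r["name"].split(", ")     # normalize "Last, First"
        yield {"name": first, "city": r["location"]}

# A query over the mediated schema is answered by unfolding the mapping:
names = sorted(p["name"] for p in mediated_person())
print(names)
```

The LAV alternative would instead describe each source as a view over the mediated schema, which makes adding a source easy but makes answering a query a harder inference problem.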
As of 2010 some of the work in data integration research concerns the semantic
integration problem This problem addresses not the structuring of the architecture of the
integration but how to resolve semantic conflicts between heterogeneous data sources For
example if two companies merge their databases certain concepts and definitions in their
respective schemas like earnings inevitably have different meanings In one database it may
mean profits in dollars (a floating-point number) while in the other it might represent the
number of sales (an integer) A common strategy for the resolution of such problems involves the
use of ontologies which explicitly define schema terms and thus help to resolve semantic
conflicts This approach represents ontology-based data integration On the other hand the
problem of combining research results from different bioinformatics repositories requires bench-
marking of the similarities computed from different data sources on a single criterion such as
positive predictive value This enables the data sources to be directly comparable and can be
integrated even when the natures of experiments are distinct[7]
As of 2011 it was determined that current data modeling methods were imparting data isolation
into every data architecture in the form of islands of disparate data and information silos This
data isolation is an unintended artifact of the data modeling methodology that results in the
development of disparate data models Disparate data models when instantiated as databases
form disparate databases Enhanced data model methodologies have been developed to eliminate
the data isolation artifact and to promote the development of integrated data models[8] One
enhanced data modeling method recasts data models by augmenting them with
structural metadata in the form of standardized data entities As a result of recasting multiple data
models the set of recast data models will now share one or more commonality relationships that
relate the structural metadata now common to these data models Commonality relationships are
a peer-to-peer type of entity relationships that relate the standardized data entities of multiple
data models Multiple data models that contain the same standard data entity may participate in
the same commonality relationship When integrated data models are instantiated as databases
and are properly populated from a common set of master data then these databases are
integrated
Since 2011, data hub approaches have been of greater interest than fully structured (typically
relational) Enterprise Data Warehouses. Since 2013, data lake approaches have risen to the level
of data hubs (see the popularity of all three search terms on Google Trends[9]). These approaches
combine unstructured or varied data into one location but do not necessarily require an (often
complex) master relational schema to structure and define all data in the Hub
Q2
EITHER
(a) Explain E-R Model with suitable example
Ans The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise.
Modelling is an iterative, team-oriented process in which all business managers (or
their designates) should be involved, and the result should be validated with a "bottom-up" approach.
The model has three primary components: entity, relationship and attribute.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified. An entity is an abstraction from the
complexities of some domain. When we speak of an entity, we normally speak of some aspect of
the real world which can be distinguished from other aspects of the real world. An entity may be
a physical object, such as a house or a car; an event, such as a house sale or a car service; or a
concept, such as a customer transaction or order. An entity-type is a category; an entity, strictly
speaking, is an instance of a given entity-type. There are usually many instances of an entity-type.
Because the term entity-type is somewhat cumbersome, most people tend to use the term
entity as a synonym for it.
Attributes An attribute is a characteristic of an entity. A Student entity, for example, has attributes
such as student ID, student name and address.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs linking two or more nouns. Examples: an owns
relationship between a company and a computer; a supervises relationship between an employee
and a department; a performs relationship between an artist and a song; a proved relationship
between a mathematician and a theorem. Relationships are represented as diamonds connected
by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many 1 ------- M
Many to one M------1
Many to many M------M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given: Entity Customer with attributes customer_id (primary key), name
(first_name, last_name, middle_name), phone_number, date_of_birth,
address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other
words, we can say that ER diagrams help you to explain the logical structure of databases. At
first glance, an ER diagram looks very similar to a flowchart; however, an ER diagram includes
many specialized symbols, and their meanings make this model unique.
[Figure: Sample ER diagram]
Facts about ER Diagram Model
o ER model allows you to draw Database Design
o It is an easy to use graphical tool for modeling data
o Widely used in Database Design
o It is a GUI representation of the logical structure of a Database
o It helps you to identify the entities which exist in a system and the relationships
between those entities
(b) Differentiate between the network and hierarchical data models in DBMS.
Ans Hierarchical model
1 One to many or one to one relationships
2 Based on parent child relationship
3 Retrieve algorithms are complex and asymmetric
4 Data Redundancy more
Network model
1 Many to many relationships
2 Many parents as well as many children
3 Retrieve algorithms are complex and symmetric
4 Data redundancy less than in the hierarchical model
Relational model
1 One to OneOne to many Many to many relationships
2 Based on relational data structures
3 Retrieve algorithms are simple and symmetric
4 Data Redundancy less
OR
(c) Draw an E-R diagram for a Library Management System.
Ans
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-sequential file
Ans
Sequential File Organization
1 A sequential file is designed for efficient processing of records in sorted order on some
search key
o Records are chained together by pointers to permit fast retrieval in search key
order
o Pointer points to next record in order
o Records are stored physically in search key order (or as close to this as possible)
o This minimizes number of block accesses
o Figure 10.15 shows an example with bname as the search key
2 It is difficult to maintain physical sequential order as records are inserted and deleted
o Deletion can be managed with the pointer chains
o Insertion poses a problem if there is no space where the new record should go
o If there is space, use it; else put the new record in an overflow block
o Adjust pointers accordingly
o Figure 10.16 shows the previous example after an insertion
o Problem: we now have some records out of physical sequential order
o If very few records are in overflow blocks, this will work well
o If order is lost, reorganize the file
o Reorganizations are expensive and are done when the system load is low
3 If insertions rarely occur we could keep the file in physically sorted order and reorganize
when insertion occurs In this case the pointer fields are no longer required
The Sequential File
Fixed format used for records
Records are the same length
All fields the same (order and length)
Field names and lengths are attributes of the file
One field is the key field
Uniquely identifies the record
Records are stored in key sequence
The Sequential File
New records are placed in a log file or transaction file
Batch update is performed to merge the log file with the master file
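The batch update above is a merge of two key-ordered files. A minimal sketch, with invented records; `heapq.merge` streams both sequences in key order without loading all records into memory, which matches how a sequential-file batch merge works on tape or disk.

```python
import heapq

# Master file and transaction (log) file, both sorted by the key field.
master = [(101, "Amit"), (103, "Neha"), (107, "Ravi")]
log    = [(102, "Kiran"), (105, "Mona")]   # new records accumulated since last merge

# Merge the log into the master, preserving key sequence.
new_master = list(heapq.merge(master, log))
print([k for k, _ in new_master])  # [101, 102, 103, 105, 107]
```

A real merge pass would also apply deletions and replacements recorded in the log, but the key-ordered single pass over both files is the essential idea.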
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS Today more than 85 companies are part of the DAFS
Collaborative
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it;
there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function
with arguments.
When we substitute values for the arguments, the function yields an expression,
called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for
other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
Tuple relational calculus is interested in finding tuples for which a predicate is true. It is based on the use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable
whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true, we write:
{S | P(S)}
Tuple Relational Calculus - Example
To find the details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧
(B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S that is located in London'.
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris'.
Can also use ¬(∃B) (B.city = 'Paris'), which means 'There are no branches with an
address in Paris'.
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2, where θ is a comparison operator
Si.a1 θ c, where c is a constant
Can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2, and negation ¬F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧
S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ¬Staff(S)}
To avoid this, add the restriction that all values in the result must be values in the domain
of the expression.
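Tuple relational calculus expressions read almost directly as set comprehensions. The following sketch mirrors the queries above in Python; the Staff relation and its rows are illustrative sample data, not from the paper.

```python
# Illustrative Staff relation as a list of dicts (sample data).
staff = [
    {"staffNo": "SL21", "fName": "John", "position": "Manager", "salary": 30000},
    {"staffNo": "SG37", "fName": "Ann", "position": "Assistant", "salary": 12000},
    {"staffNo": "SG14", "fName": "David", "position": "Supervisor", "salary": 9000},
]

# {S | Staff(S) ∧ S.salary > 10000} -- whole tuples satisfying the predicate
high_paid = [s for s in staff if s["salary"] > 10000]

# {S.salary | Staff(S) ∧ S.salary > 10000} -- project a single attribute
salaries = [s["salary"] for s in staff if s["salary"] > 10000]

print([s["staffNo"] for s in high_paid])  # ['SL21', 'SG37']
print(salaries)                           # [30000, 12000]
```

The comprehension states *what* tuples qualify, not *how* to scan for them, which is exactly the declarative character of the calculus.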
Data Manipulation in SQL
Select, Update, Delete, Insert statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL Join: multiple-table queries
Set manipulation:
Any, In, Contains, All, Not In, Not Contains, Exists, Union, Minus, Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement.
Data must be entered later using INSERT.
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) );
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key
value.
A primary key may consist of more than one column (values unique in combination);
this is called a composite key.
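As a runnable sketch of the statement above, the same table can be created in SQLite from Python (SQLite maps CHAR/DECIMAL onto its own type affinities), and the primary key can be seen rejecting a duplicate:

```python
import sqlite3

# In-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE S ( SNO    CHAR(5),
                     SNAME  CHAR(20),
                     STATUS DECIMAL(3),
                     CITY   CHAR(15),
                     PRIMARY KEY (SNO) )
""")
conn.execute("INSERT INTO S VALUES ('S1', 'Smith', 20, 'London')")

# The primary key guarantees no two rows share the same SNO.
try:
    conn.execute("INSERT INTO S VALUES ('S1', 'Jones', 10, 'Paris')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```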
(b) Explain Data Manipulation in SQL
Ans:
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting, and modifying (updating) data in a database. A DML is often
a sublanguage of a broader database language such as SQL, with the DML comprising some of
the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL), but it is closely related and sometimes also
considered a component of a DML; some operators may perform both selecting (reading) and
writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is
used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those
used by IMS/DL/I, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which
modify stored data but not the schema or database objects. Manipulation of persistent database
objects, e.g. tables or stored procedures, via the SQL schema statements,[3] rather than the data
stored within them, is considered to be part of a separate data definition language (DDL). In SQL
these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct
in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this also contains
the SELECT query statement,[3] which strictly speaking is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT ... INTO form combines both selection and manipulation
and thus is strictly considered to be DML, because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement, which is almost always a verb. In the case of SQL these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita',
'xcapit00');
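The four DML verbs can be exercised in one round trip. This sketch uses SQLite from Python with the same illustrative employees table as above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES
conn.execute("INSERT INTO employees VALUES ('John', 'Capita', 'xcapit00')")

# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = 'Capital' WHERE fname = 'xcapit00'")

# SELECT ... FROM ... WHERE (strictly speaking, DQL)
row = conn.execute(
    "SELECT last_name FROM employees WHERE fname = 'xcapit00'").fetchone()
print(row[0])  # Capital

# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```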
OR
(c) Explain the following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
already applied in the design. There are two types of integrity mentioned in
integrity rules: entity and referential. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is to have each row carry a unique
identity, so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition on the input relation(s), and then only those
selected rows are used in the cross product to be merged and included in the output.
In a normal cross product, all the rows of one relation are mapped/merged with all
the rows of the second relation, but here only the selected rows of a relation are
cross-produced with the second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition applied for the select
operation on one relation; only the selected rows are then cross-produced with all the
rows of the second relation. For example, given two relations FACULTY and
COURSE, we first apply the select operation on the FACULTY relation to select
certain specific rows; these rows then have a cross product with the
COURSE relation. Viewing both relations, their attributes, and the resulting cross
product after the select operation makes the difference between a cross product and
a theta join clear.
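The theta join described above can be sketched with list comprehensions. The FACULTY and COURSE tuples below are illustrative stand-ins (the paper's actual example tables are not reproduced), with θ taken to be equality on a department attribute:

```python
# Toy relations: (id, name, dept) for faculty, (id, title, dept) for courses.
faculty = [("F1", "Khan", "CS"), ("F2", "Riaz", "Math")]
course = [("C1", "DBMS", "CS"), ("C2", "Calculus", "Math"), ("C3", "OS", "CS")]

# Plain cross product: every FACULTY row paired with every COURSE row.
cross = [(f, c) for f in faculty for c in course]

# Theta join with the condition θ: faculty.dept = course.dept,
# so only matching pairs survive.
theta = [(f, c) for f in faculty for c in course if f[2] == c[2]]

print(len(cross))  # 6 pairs
print(len(theta))  # 3 matching pairs
```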
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So referential integrity requires that whenever a foreign key value is used it must reference a
valid existing primary key in the parent table
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record.
Here the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
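The three rules above can be seen in action. This sketch uses SQLite (which requires foreign keys to be switched on per connection); the company/product tables and the CompanyId value 15 mirror the illustrative example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE company (CompanyId INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE product (
        ProductId INTEGER PRIMARY KEY,
        CompanyId INTEGER REFERENCES company(CompanyId)
    )
""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")

# Deleting the parent while a child still references it is refused,
# so no orphaned record can appear.
try:
    conn.execute("DELETE FROM company WHERE CompanyId = 15")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```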
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database, because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain the following domains in detail with examples
Ans: Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male, 'F' for female, and NULL for
records where gender is unknown or not applicable (or arguably 'U' for unknown as a sentinel
value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
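Both kinds of domain rule described above can be expressed with CHECK constraints. The schema in this sketch is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# gender is limited to the domain {'M', 'F'}; salary must be positive.
conn.execute("""
    CREATE TABLE person (
        name   TEXT,
        gender TEXT    CHECK (gender IN ('M', 'F')),
        salary NUMERIC CHECK (salary > 0)
    )
""")
conn.execute("INSERT INTO person VALUES ('Ann', 'F', 1200)")

# A value outside the declared domain is rejected by the constraint.
try:
    conn.execute("INSERT INTO person VALUES ('Bob', 'X', 500)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```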
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last is written M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A
one-to-one relationship rarely exists in practice, but it can; if it does, you may consider
combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities shown on the previous page, an employee
works in one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur
because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore there is a many-to-many relationship between employee and project.
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct, and self-contained language. The DBTG proposal is intended to meet
the requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three levels, like the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item, and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data
item) defined in the subschema. The program may refer to these data item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory (bad) relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF, or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g. a form) into table format with columns
and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group by:
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
If A and B are attributes of a relation, B is fully dependent on A if B is functionally
dependent on A but not on any proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
Partial functional dependency: when one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant.
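The 1NF-to-2NF steps above can be sketched on a toy relation. Here the key is (orderNo, productNo) and productName depends only on productNo, a partial dependency, so it is moved to a new relation with a copy of its determinant; all names and rows are illustrative:

```python
# 1NF relation with key (orderNo, productNo); productName depends
# only on productNo -- a partial dependency.
order_lines = [
    ("O1", "P1", "Bolt", 10),
    ("O1", "P2", "Nut", 20),
    ("O2", "P1", "Bolt", 5),
]

# Remove the partial dependency productNo -> productName by splitting it
# into its own relation, keeping a copy of the determinant productNo.
products = {(p, name) for (_, p, name, _) in order_lines}
order_lines_2nf = [(o, p, qty) for (o, p, _, qty) in order_lines]

print(sorted(products))     # [('P1', 'Bolt'), ('P2', 'Nut')]
print(order_lines_2nf)      # [('O1', 'P1', 10), ('O1', 'P2', 20), ('O2', 'P1', 5)]
```

The duplicated product names disappear, and each fact is now recorded exactly once.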
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
If A, B, and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally
dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and all its multivalued dependencies are functional
dependencies. 4NF removes unwanted data structures: multivalued dependencies.
Either of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
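A multivalued dependency X →→ Y holds when, for tuples agreeing on X, the Y-values and the remaining attributes vary independently. The checker below tests this on the classic course/teacher/book example; the relation, attribute names, and data are illustrative:

```python
def holds_mvd(rel, x, y):
    """Check the MVD x ->> y on a relation given as a list of dicts:
    for every pair t1, t2 agreeing on x, the tuple combining t1's
    x+y values with t2's remaining values must also be in the relation."""
    z = [a for a in rel[0] if a not in x + y]       # the "rest" attributes
    tuples = {tuple(sorted(t.items())) for t in rel}
    for t1 in rel:
        for t2 in rel:
            if all(t1[a] == t2[a] for a in x):
                t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
                if tuple(sorted(t3.items())) not in tuples:
                    return False
    return True

# Every teacher of DB is paired with every book of DB,
# so course ->> teacher (and course ->> book) hold.
r = [
    {"course": "DB", "teacher": "Ali", "book": "Elmasri"},
    {"course": "DB", "teacher": "Ali", "book": "Date"},
    {"course": "DB", "teacher": "Sara", "book": "Elmasri"},
    {"course": "DB", "teacher": "Sara", "book": "Date"},
]
print(holds_mvd(r, ["course"], ["teacher"]))  # True
```

Dropping any one of the four rows breaks the independence, and the checker returns False, which is exactly the redundancy pattern 4NF is designed to eliminate.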
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I, J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
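Derivations like the one above can be mechanized with the attribute-closure algorithm: grow X+ until no FD adds anything, then X → Y is derivable iff Y ⊆ X+. The sketch below uses the FDs from the Maier example:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of
    functional dependencies given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is in the closure, pull in the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
print(sorted(closure("AB", fds)))  # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
# G and H are both in AB+, so AB -> GH is derivable, matching the proof.
```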
Significance in Relational Database Design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables and multiple
relationships between data elements can be defined and established in an ad-hoc manner. A
Relational Database Management System (RDBMS) is a database system made up of files with
data elements in a two-dimensional array (rows and columns). This database management
system has the capability to recombine data elements to form different relations, resulting in
great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The relational model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• a collection of objects or relations
• a set of operations to act on the relations
Q6
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Once
a deadlock does occur, the DBMS must have a method for detecting the deadlock;
to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another.
Durability means that once a transaction commits, its effects will persist even if there are
system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency.
The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
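The resource-access-order prevention scheme described in (a) can be sketched with two threads. Both sort their locks into one fixed global order before acquiring, so a circular wait can never form; the locks and the ordering table are illustrative:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
order = {id(lock_a): 0, id(lock_b): 1}  # fixed global lock order

def locked_in_order(*locks):
    # Every transaction acquires locks in the same global order.
    return sorted(locks, key=lambda l: order[id(l)])

def transfer(first, second, results):
    locks = locked_in_order(first, second)
    for l in locks:
        l.acquire()
    results.append("done")  # critical section stand-in
    for l in reversed(locks):
        l.release()

# The two threads request the locks in opposite order, which would
# deadlock without the sorting step above.
results = []
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, results))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)  # ['done', 'done']
```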
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive locks: this type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read and write timestamp. This lets the system
know when the last 'read' and 'write' operation was performed on the data item.
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or another information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls.
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload.
Physical security of the database server and backup equipment from theft and natural
disasters.
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work on the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world: for example, to represent
the statement that "all humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store thousands of rows that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge
base. Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
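This contrast can be sketched in a few lines of Python (all names hypothetical): a knowledge base stores the general rule "all humans are mortal" once, while a database-style store would need a fact per specific person.

```python
# Database-style storage: one specific fact per individual.
humans = {"George", "Mary", "Sam", "Jenna", "Mike"}

def is_mortal(name):
    """Knowledge-base-style inference: apply the general rule
    human(x) -> mortal(x) instead of storing mortality per row."""
    return name in humans

assert is_mortal("Mary")        # derived from the rule, not stored
assert not is_mortal("Rex")     # not known to be human
```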
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018–2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any
practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
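As an illustrative sketch (not actual DBMS internals), a lock shows the core idea: a conflicting read-modify-write sequence must be serialized, or concurrent writers can lose updates.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(amount, times):
    """Each deposit is a read-modify-write; the lock serializes the
    critical section so concurrent transactions cannot interleave."""
    global balance
    for _ in range(times):
        with lock:                  # concurrency control via mutual exclusion
            balance += amount       # now effectively atomic

threads = [threading.Thread(target=deposit, args=(1, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert balance == 40_000            # no lost updates
```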
(ii) Atomicity property
Ans: In database systems, atomicity (from Ancient Greek átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails.
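The bank-transfer example can be demonstrated with Python's sqlite3 module, whose connection context manager commits the transaction on success and rolls it back if an exception is raised, so both UPDATEs happen or neither does (account names and amounts are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INT)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
con.commit()

def transfer(amount):
    try:
        with con:  # one transaction: COMMIT on success, ROLLBACK on error
            con.execute("UPDATE account SET balance = balance - ? "
                        "WHERE name = 'A'", (amount,))
            (bal,) = con.execute("SELECT balance FROM account "
                                 "WHERE name = 'A'").fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")  # forces rollback
            con.execute("UPDATE account SET balance = balance + ? "
                        "WHERE name = 'B'", (amount,))
    except ValueError:
        pass  # the partial withdrawal was undone

transfer(200)   # would overdraw A: nothing persists
assert dict(con.execute("SELECT name, balance FROM account")) == {"A": 100, "B": 50}
transfer(30)    # succeeds atomically
assert dict(con.execute("SELECT name, balance FROM account")) == {"A": 70, "B": 80}
```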
(B) Give the three level architecture proposal for DBMS.
Ans: Objectives of the three level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
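A small sketch using SQLite through Python illustrates the DDL/DML split (DCL statements such as GRANT/REVOKE belong to multi-user DBMSs and are not available in SQLite, so they are omitted here):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define and declare the database object
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on the object
con.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
con.execute("UPDATE student SET name = 'Asha K' WHERE id = 1")

rows = con.execute("SELECT id, name FROM student").fetchall()
assert rows == [(1, "Asha K")]
```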
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
[Fig.: Structure of Database Management System]
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query Optimizer - The DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the best
way to execute the query, and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
Converting operations in users' queries, coming from the application programs or from the
combination of the DML compiler and query optimizer (together known as the Query Processor),
from the user's logical view to the physical file system.
Controlling access to the DBMS information that is stored on disk.
Handling buffers in main memory.
Enforcing constraints to maintain the consistency and integrity of the data.
Synchronizing the simultaneous operations performed by concurrent users.
Controlling the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - the description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation,
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and the design decisions.
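In practice the data dictionary (catalog) is itself queryable. As a sketch, SQLite keeps its catalog in the sqlite_master table, which records the name and definition of every table and index:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")
con.execute("CREATE INDEX idx_title ON course(title)")

# Query the catalog the same way as any user table.
catalog = con.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()

assert ("table", "course") in catalog     # table recorded in the dictionary
assert ("index", "idx_title") in catalog  # so is its index
```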
5 Data Files - These contain the data portion of the database.
6 Compiled DML - The DML compiler converts the high level queries into low level file access
commands known as compiled DML.
7 End Users - The users of the database system can be classified in the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve Users: Naïve users need not be aware of the presence of the database system or any other supporting system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another, e.g. from optical to magnetic storage, or from tape to disk.
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating the same data in different files
• Time wasted in entering data again and again
• Needless use of computer resources
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system So changes made in one file may be necessary be carried over to
another file This may lead to inconsistent data So we need to remove this duplication of
data in multiple file to eliminate inconsistency
3. Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can obtain new and combined
information easily that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since data of the organization using database approach is
centralized and would be used by a number of users at a time It is essential to enforce
integrity-constraints
In the conventional systems because the data is duplicated in multiple files so updating or
changes may sometimes lead to entry of incorrect data in some files where it exists
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment enforcing security can be quite
difficult. Setting up of a database makes it easier to enforce security restrictions, since data is
now centralized It is easier to control who has access to what parts of the database Different
checks can be established for each type of access (retrieve modify delete etc) to each piece
of information in the database
8 Organizations requirement can be identified - All organizations have sections and
departments and each of these units often consider the work of their unit as the most
important and therefore consider their need as the most important Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages
that have been developed for DBMSs than when using procedural languages.
10. Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for an organization be built. In
conventional systems, it is more likely that files will be designed as per the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost effective in the long term.
11. Provides backup and recovery - Centralizing a database allows schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
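The Customer entity above, with its composite attributes, can be sketched as nested data classes (sample values are hypothetical); address and street become nested types, and customer_id plays the role of the primary key:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Street:
    street_name: str
    street_number: str
    apartment_number: str

@dataclass(frozen=True)
class Address:          # composite attribute of Customer
    city: str
    state: str
    zip_code: str
    street: Street      # composite attribute nested inside address

@dataclass(frozen=True)
class Customer:
    customer_id: int    # primary key
    first_name: str
    last_name: str
    middle_name: str
    phone_number: str
    date_of_birth: str
    address: Address

c = Customer(1, "Asha", "Rao", "", "555-0100", "1990-01-01",
             Address("Nagpur", "MH", "440001",
                     Street("MG Road", "12", "3A")))
assert c.address.street.street_name == "MG Road"
```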
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
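The stud_name example can be sketched as follows (record contents hypothetical): a secondary index maps each non-unique key value to the list of matching primary keys, so one lookup returns several records.

```python
# Primary index: stud_id -> record
students = {
    1: {"stud_name": "Ravi",  "marks": 70},
    2: {"stud_name": "Ravi",  "marks": 82},
    3: {"stud_name": "Meena", "marks": 90},
}

# Build a secondary index on the non-unique attribute stud_name.
secondary = {}
for sid, rec in students.items():
    secondary.setdefault(rec["stud_name"], []).append(sid)

# Secondary key retrieval: one key value, multiple records.
matches = [students[sid] for sid in secondary.get("Ravi", [])]
assert len(matches) == 2
```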
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it
cannot be non-loss decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be non-loss decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
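The decomposition can be checked in a few lines of Python: project Buying onto the three pairwise tables, then rejoin them. For the sample data above the natural join reproduces exactly the original rows (a lossless join), which is what the join dependency requires.

```python
buying = {("Sally", "Liz Claiborne", "Blouses"),
          ("Mary",  "Liz Claiborne", "Blouses"),
          ("Sally", "Jordach", "Jeans"),
          ("Mary",  "Jordach", "Jeans"),
          ("Sally", "Jordach", "Sneakers")}

# The three projections: Buyer-Vendor, Buyer-Item and Vendor-Item.
buyer_vendor = {(b, v) for b, v, _ in buying}
buyer_item   = {(b, i) for b, _, i in buying}
vendor_item  = {(v, i) for _, v, i in buying}

# Natural join of the three projections.
rejoined = {(b, v, i)
            for b, v in buyer_vendor
            for b2, i in buyer_item if b2 == b
            for v2, i2 in vendor_item if v2 == v and i2 == i}

assert rejoined == buying   # lossless: the join recreates the original table
```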
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Fig.: IMS system structure - application programs A and B, each written in a host language
plus DL/I, access the physical databases (defined by DBDs) through the PCBs of their
respective PSBs, under the IMS control program]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following:
(i) Functional dependency
Ans: Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key.
Each non-key field is functionally dependent on every candidate key.
No attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in
normalization:
They have a 1:1 relationship between the attribute(s) on the left and right-hand side of
a dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation
and has the property that every functional dependency in Y is implied by the
functional dependencies in X.
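Whether a dependency X → Y holds in a given relation instance can be checked mechanically: every pair of rows that agrees on X must also agree on Y. A small sketch (relation and attribute names hypothetical):

```python
def holds(rows, X, Y):
    """Return True iff the functional dependency X -> Y holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in X)   # determinant value
        val = tuple(row[a] for a in Y)   # dependent value
        if seen.setdefault(key, val) != val:
            return False                 # same determinant, different dependent
    return True

emp = [{"id": 1, "dept": "CS", "building": "B1"},
       {"id": 2, "dept": "CS", "building": "B1"},
       {"id": 3, "dept": "EE", "building": "B2"}]

assert holds(emp, ["id"], ["dept"])          # id determines dept
assert holds(emp, ["dept"], ["building"])    # dept -> building
assert not holds(emp, ["building"], ["id"])  # building does not determine id
```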
(D) Explain 4 NF with examples
Ans: Normalization: The process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the
key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and every non-trivial multi-valued dependency is implied by a candidate key. 4NF
thus removes an unwanted data structure: multi-valued dependencies.
Either there is no multivalued dependency in the relation, or
there are multivalued dependencies but the attributes are dependent between themselves.
One of these conditions must hold for the relation to be in fourth normal form.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
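The classic illustration is a relation CTB(course, teacher, book) where course ->> teacher and course ->> book are independent multivalued facts. A sketch (not from the source; names invented) showing that the 4NF decomposition into CT and CB is lossless:

```python
# CTB violates 4NF: course ->> teacher and course ->> book hold independently,
# so every teacher/book combination must be stored. Decomposing into CT and CB
# is lossless: their natural join on course reconstructs CTB exactly.
from itertools import product

ctb = {
    ("DB", "Smith", "Date"), ("DB", "Smith", "Ullman"),
    ("DB", "Jones", "Date"), ("DB", "Jones", "Ullman"),
}
ct = {(c, t) for c, t, _ in ctb}   # projection on (course, teacher)
cb = {(c, b) for c, _, b in ctb}   # projection on (course, book)

# natural join of the two projections on course
rejoined = {(c, t, b) for (c, t), (c2, b) in product(ct, cb) if c == c2}
print(rejoined == ctb)  # True: the decomposition is lossless
```

Each projection stores two rows instead of the four combination rows, which is exactly the redundancy 4NF removes.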
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s,
but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could however be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available
Full Recovery Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction. The Log Marks feature allows you to place reference points in the transaction log
and recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
form disparate databases Enhanced data model methodologies have been developed to eliminate
the data isolation artifact and to promote the development of integrated data models[8] One
enhanced data modeling method recasts data models by augmenting them with
structural metadata in the form of standardized data entities As a result of recasting multiple data
models the set of recast data models will now share one or more commonality relationships that
relate the structural metadata now common to these data models Commonality relationships are
a peer-to-peer type of entity relationships that relate the standardized data entities of multiple
data models Multiple data models that contain the same standard data entity may participate in
the same commonality relationship When integrated data models are instantiated as databases
and are properly populated from a common set of master data then these databases are
integrated
Since 2011 data hub approaches have been of greater interest than fully structured (typically
relational) Enterprise Data Warehouses. Since 2013 data lake approaches have risen to the level
of Data Hubs (see all three search terms' popularity on Google Trends[9]). These approaches
combine unstructured or varied data into one location, but do not necessarily require an (often
complex) master relational schema to structure and define all data in the Hub.
Q2
EITHER
(a) Explain E-R Model with suitable example
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise.
It is an iterative, team-oriented process in which all business managers (or their
designates) are involved, and it should be validated with a "bottom-up" approach. It has three primary
components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified An entity is an abstraction from the
complexities of some domain When we speak of an entity we normally speak of some aspect of
the real world which can be distinguished from other aspects of the real world An entity may be
a physical object such as a house or a car an event such as a house sale or a car service or a
concept such as a customer transaction or order An entity-type is a category An entity strictly
speaking is an instance of a given entity-type There are usually many instances of an entity-
type Because the term entity-type is somewhat cumbersome most people tend to use the term
entity as a synonym for this term
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student
name, address, etc.
Attributes are of various types
Simple/Single Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs linking two or more nouns. Examples: an owns
relationship between a company and a computer, a supervises relationship between an employee
and a department, a performs relationship between an artist and a song, a proved relationship
between a mathematician and a theorem. Relationships are represented as diamonds connected
by lines to each of the entities in the relationship. Types of relationships are as follows:
One to many (1:M)
Many to one (M:1)
Many to many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
(b) Given entity Customer with attributes customer_id (primary key), name
(first_name, last_name, middle_name), phone_number, date_of_birth,
address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
Entity relationship diagram displays the relationships of entity set stored in a database In other
words we can say that ER diagrams help you to explain the logical structure of databases At
first look an ER diagram looks very similar to the flowchart However ER Diagram includes
many specialized symbols and its meanings make this model unique
Sample ER
Diagram
Facts about ER Diagram Model
o ER model allows you to draw Database Design
o It is an easy to use graphical tool for modeling data
o Widely used in Database Design
o It is a GUI representation of the logical structure of a Database
o It helps you to identify the entities which exist in a system and the relationships
between those entities
(b)Differentiate between Network and Hierarchical data model in DBMS
Ans Hierarchical model
1 One to many or one to one relationships
2 Based on parent child relationship
3 Retrieve algorithms are complex and asymmetric
4 Data Redundancy more
Network model
1 Many to many relationships
2 Many parents as well as many children
3 Retrieve algorithms are complex and symmetric
4 Data redundancy less than in the hierarchical model, since a record can have many parents
Relational model
1 One to OneOne to many Many to many relationships
2 Based on relational data structures
3 Retrieve algorithms are simple and symmetric
4 Data Redundancy less
OR
(c)Draw E-R diagram on Library Management System
Ans
(d) State advantages and disadvantages of the following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1 A sequential file is designed for efficient processing of records in sorted order on some
search key
o Records are chained together by pointers to permit fast retrieval in search key
order
o Pointer points to next record in order
o Records are stored physically in search key order (or as close to this as possible)
o This minimizes number of block accesses
o Figure 10.15 shows an example with bname as the search key
2 It is difficult to maintain physical sequential order as records are inserted and deleted
o Deletion can be managed with the pointer chains
o Insertion poses problems if no space where new record should go
o If space use it else put new record in an overflow block
o Adjust pointers accordingly
o Figure 10.16 shows the previous example after an insertion
o Problem we now have some records out of physical sequential order
o If very few records in overflow blocks this will work well
o If order is lost reorganize the file
o Reorganizations are expensive and done when system load is low
3 If insertions rarely occur we could keep the file in physically sorted order and reorganize
when insertion occurs In this case the pointer fields are no longer required
The Sequential File
Fixed format used for records
Records are the same length
All fields the same (order and length)
Field names and lengths are attributes of the file
One field is the key field
It uniquely identifies the record
Records are stored in key sequence
The Sequential File
New records are placed in a log file or transaction file
Batch update is performed to merge the log file with the master file
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS. Today more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
Relational calculus query specifies what is to be retrieved rather than how to retrieve it
No description of how to evaluate a query
In first-order logic (or predicate calculus), a predicate is a truth-valued function
with arguments.
When we substitute values for the arguments, the function yields an expression,
called a proposition, which can be either true or false.
Relational Calculus
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for
other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
Interested in finding tuples for which a predicate is true. Based on use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable
whose only permitted values are tuples of the relation.
Specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
Can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables
Tuple Relational Calculus
Existential quantifier used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧
(B.branchNo = S.branchNo) ∧ B.city = 'London')
Means 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S, and is located in London'
Tuple Relational Calculus
Universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
Means 'For all Branch tuples, the address is not in Paris'
Can also use ¬(∃B) (B.city = 'Paris'), which means 'There are no branches with an
address in Paris'
Tuple Relational Calculus
Formulae should be unambiguous and make sense
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
Can recursively build up formulae from atoms:
An atom is a formula
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2, and negation ¬F1
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧
S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ¬Staff(S)}
To avoid this add restriction that all values in result must be values in the domain
of the expression
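The calculus query {S | Staff(S) ∧ S.salary > 10000} maps directly to a declarative SQL query. A sketch using Python's built-in sqlite3 (the table contents are invented for illustration):

```python
# The tuple-relational-calculus query {S | Staff(S) ∧ S.salary > 10000}
# expressed as SQL: the WHERE clause plays the role of the predicate P(S).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary INTEGER)")
con.executemany("INSERT INTO Staff VALUES (?, ?)",
                [("S1", 9000), ("S2", 12000), ("S3", 30000)])

rows = sorted(con.execute("SELECT * FROM Staff WHERE salary > 10000").fetchall())
print(rows)  # [('S2', 12000), ('S3', 30000)]
```

The projected form {S.salary | …} corresponds to `SELECT salary FROM Staff WHERE salary > 10000`.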
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any In Contains All Not In Not Contains Exists Union Minus Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement
Data must be entered later using INSERT
CREATE TABLE S ( SNO CHAR(5),
SNAME CHAR(20),
STATUS DECIMAL(3),
CITY CHAR(15),
PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified
Columns which are defined as primary keys will never have two rows with the same key
value
Primary key may consist of more than one column (values unique in combination)
called composite key
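A composite key can be exercised directly; a sketch with Python's sqlite3 (the SP supplier-part table is invented, in the style of the S table above):

```python
# Composite primary key: the pair (SNO, PNO) must be unique in combination,
# though each column alone may repeat.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE SP (
    SNO CHAR(5),
    PNO CHAR(6),
    QTY DECIMAL(9),
    PRIMARY KEY (SNO, PNO))""")
con.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")
con.execute("INSERT INTO SP VALUES ('S1', 'P2', 200)")  # same SNO, new pair: OK
try:
    con.execute("INSERT INTO SP VALUES ('S1', 'P1', 999)")  # duplicate pair
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```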
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting) deleting and modifying (updating) data in a database A DML is often
a sublanguage of a broader database language such as SQL with the DML comprising some of
the operators in the language[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL) but it is closely related and sometimes also
considered a component of a DML some operators may perform both selecting (reading) and
writing
A popular data manipulation language is that of Structured Query Language (SQL) which is
used to retrieve and manipulate data in a relational database[2] Other forms of DML are those
used by IMSDLI CODASYL databases such as IDMS and others
In SQL the data manipulation language comprises the SQL-data change statements[3] which
modify stored data but not the schema or database objects Manipulation of persistent database
objects eg tables or stored procedures via the SQL schema statements[3] rather than the data
stored within them is considered to be part of a separate data definition language (DDL) In SQL
these two categories are similar in their detailed syntax data types expressions etc but distinct
in their overall function[3]
The SQL-data change statements are a subset of the SQL-data statements this also contains
the SELECT query statement[3] which strictly speaking is part of the DQL not the DML In
common practice though this distinction is not made and SELECT is widely considered to be
part of DML[4] so the DML consists of all SQL-datastatements not only the SQL-data
change statements The SELECT INTO form combines both selection and manipulation
and thus is strictly considered to be DML because it manipulates (ie modifies) data
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT FROM WHERE (strictly speaking DQL)
SELECT INTO
INSERT INTO VALUES
UPDATE SET WHERE
DELETE FROM WHERE
For example the command to insert a row into table employees
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita',
'xcapit00')
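The DML verbs listed above can be run end to end; a sketch using Python's sqlite3 against an invented employees table (the column names mirror the example above):

```python
# The four DML verbs in sequence: INSERT, UPDATE, SELECT (strictly DQL), DELETE.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")
con.execute("INSERT INTO employees (first_name, last_name, fname) "
            "VALUES ('John', 'Capita', 'xcapit00')")              # INSERT
con.execute("UPDATE employees SET last_name = 'Smith' "
            "WHERE fname = 'xcapit00'")                           # UPDATE
names = con.execute("SELECT first_name, last_name FROM employees "
                    "WHERE fname = 'xcapit00'").fetchall()        # SELECT
print(names)  # [('John', 'Smith')]
con.execute("DELETE FROM employees WHERE fname = 'xcapit00'")     # DELETE
print(con.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```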
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
applied in the design. There are two types of integrity mentioned in
integrity rules: entity and referential. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is to give each row a unique
identity, so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition on the input relation(s), and then only those selected
rows are used in the cross product to be merged and included in the output. It means
that in a normal cross product all the rows of one relation are mapped/merged with all
the rows of the second relation, but here only selected rows of a relation are made cross
product with the second relation. It is denoted as ⋈θ.
If R and S are two relations, then θ is the condition which is applied for the select
operation on one relation, and then only the selected rows are cross producted with all the
rows of the second relation. For example, take two relations FACULTY and
COURSE: we first apply the select operation on the FACULTY relation to
select certain specific rows, then these rows have a cross product with the
COURSE relation. This is the difference between cross product and theta join.
Seeing both relations with their different attributes, and then the
cross product after carrying out the select operation on the relation,
makes the difference between cross product and theta join clear.
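A theta join can be written out in plain Python as a cross product filtered by the condition θ; a sketch (the FACULTY and COURSE contents are invented):

```python
# Theta join as a filtered cross product: only the pairs satisfying θ
# (here, equality on the faculty id) appear in the output.
faculty = [("F1", "Ali"), ("F2", "Bano")]            # (fid, fname)
course = [("C1", "F1"), ("C2", "F2"), ("C3", "F1")]  # (cid, fid)

theta_join = [(fid, fname, cid)
              for (fid, fname) in faculty
              for (cid, cfid) in course
              if fid == cfid]  # θ: FACULTY.fid = COURSE.fid
print(theta_join)
# [('F1', 'Ali', 'C1'), ('F1', 'Ali', 'C3'), ('F2', 'Bano', 'C2')]
```

The plain cross product would have 2 × 3 = 6 rows; θ keeps only the 3 matching ones.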
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship
In relationships data is linked between two or more tables This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this we need to ensure that data on both sides of the relationship
remain intact
So referential integrity requires that whenever a foreign key value is used it must reference a
valid existing primary key in the parent table
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record.
Here the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
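The three prevented operations listed above can be observed directly; a sketch with Python's sqlite3 (tables invented; note SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`):

```python
# Referential integrity: inserting an orphaned foreign key value, or deleting
# a referenced parent row, is rejected once enforcement is switched on.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE company (CompanyId INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE product (ProductId INTEGER PRIMARY KEY, "
            "CompanyId INTEGER REFERENCES company(CompanyId))")
con.execute("INSERT INTO company VALUES (15)")
con.execute("INSERT INTO product VALUES (1, 15)")
for stmt in ["INSERT INTO product VALUES (2, 99)",       # no company 99: orphan
             "DELETE FROM company WHERE CompanyId = 15"]:  # row 15 is referenced
    try:
        con.execute(stmt)
    except sqlite3.IntegrityError:
        print("rejected:", stmt)
```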
(d) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the
attribute may assume.
Examples
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or arguably U for unknown, as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
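A database-enforced domain boundary of the kind described can be sketched with a CHECK constraint, here using Python's sqlite3 (the person table is invented):

```python
# Domain enforcement via CHECK: gender only admits 'M' or 'F'. NULL still
# passes (representing "unknown"), since a CHECK whose result is NULL is
# treated as satisfied.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, gender TEXT "
            "CHECK (gender IN ('M', 'F')))")
con.execute("INSERT INTO person VALUES ('Ann', 'F')")
con.execute("INSERT INTO person VALUES ('Pat', NULL)")     # unknown: allowed
try:
    con.execute("INSERT INTO person VALUES ('Sam', 'X')")  # outside the domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

A positive-number domain would be written the same way, e.g. `CHECK (qty > 0)`.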
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another
There are three degrees of relationship known as
1 one-to-one (11)
2 one-to-many (1M)
3 many-to-many (MN)
Note that the last is written M:N, not M:M.
One-to-one (11)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; however, you may consider combining
the two entities into one.
For example an employee is allocated a company car which can only be driven by that
employee
Therefore there is a one-to-one relationship between employee and company car
One-to-Many (1M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities shown on the previous page, an employee works in
one department but a department has many employees.
Therefore there is a one-to-many relationship between department and employee
Many-to-Many (MN)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships many-to-many relationships rarely exist Normally they occur
because an entity has been missed
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
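Relationally, the degrees above are implemented in standard ways: 1:M as a foreign key on the "many" side, and M:N as a separate junction (link) table. A sketch with Python's sqlite3 (the schema and data are invented):

```python
# 1:M (department-employee) via a foreign key; M:N (employee-project) via the
# works_on junction table holding pairs of foreign keys.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                       dept_id INTEGER REFERENCES department(dept_id));
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (emp_id INTEGER REFERENCES employee(emp_id),
                       proj_id INTEGER REFERENCES project(proj_id),
                       PRIMARY KEY (emp_id, proj_id));
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (10, 'Ann', 1), (11, 'Bob', 1);
INSERT INTO project VALUES (100, 'Audit'), (101, 'Launch');
INSERT INTO works_on VALUES (10, 100), (10, 101), (11, 100);
""")
pairs = con.execute("""SELECT e.name, p.title FROM employee e
                       JOIN works_on w ON w.emp_id = e.emp_id
                       JOIN project p ON p.proj_id = w.proj_id
                       ORDER BY e.name, p.title""").fetchall()
print(pairs)  # [('Ann', 'Audit'), ('Ann', 'Launch'), ('Bob', 'Audit')]
```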
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, as in the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default all
other types of record, data-item and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary
programming language such as COBOL that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item)
defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data along with copy of the original key attribute(s) into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
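The UNF/1NF-to-2NF procedure above can be sketched in code. A toy example with invented attribute names: in a relation keyed on (order_id, product_id), product_name depends only on product_id (a partial dependency), so it is moved into a new relation together with a copy of its determinant.

```python
# Hypothetical 1NF relation keyed on (order_id, product_id).
# product_name depends only on product_id -> partial dependency.
order_lines = [
    {"order_id": 1, "product_id": 10, "product_name": "Pen", "qty": 3},
    {"order_id": 1, "product_id": 20, "product_name": "Ink", "qty": 1},
    {"order_id": 2, "product_id": 10, "product_name": "Pen", "qty": 5},
]

# Remove the partial dependency: place product_name in a new relation
# along with a copy of its determinant, product_id.
products = {r["product_id"]: {"product_id": r["product_id"],
                              "product_name": r["product_name"]}
            for r in order_lines}

# The remaining relation keeps only attributes fully dependent on the key.
order_items = [{"order_id": r["order_id"], "product_id": r["product_id"],
                "qty": r["qty"]} for r in order_lines]
```

Both relations are now in 2NF: every non-key attribute depends on the whole key of its relation.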
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B, and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
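The transitive-dependency removal that produces 3NF can be sketched the same way (all attribute names invented: staff_id → branch_id and branch_id → branch_addr, so branch_addr depends on the key only transitively, via branch_id).

```python
# Hypothetical 2NF relation: staff_id -> branch_id, branch_id -> branch_addr,
# so branch_addr is transitively dependent on the key staff_id via branch_id.
staff = [
    {"staff_id": 1, "branch_id": "B1", "branch_addr": "Main St"},
    {"staff_id": 2, "branch_id": "B1", "branch_addr": "Main St"},
    {"staff_id": 3, "branch_id": "B2", "branch_addr": "High St"},
]

# Remove the transitive dependency by placing branch_addr in a new
# relation keyed on its determinant, branch_id.
branches = {r["branch_id"]: r["branch_addr"] for r in staff}
staff_3nf = [{"staff_id": r["staff_id"], "branch_id": r["branch_id"]}
             for r in staff]
```

Note how the duplicated "Main St" address, a source of update anomalies, is now stored once.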
OR
(c) Explain multivalued dependency with suitable example
As normalization proceeds relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies
Ans
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and all its multivalued dependencies are functional
dependencies. 4NF thus removes the unwanted structures caused by multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
also considers multivalued dependencies.
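A small sketch of a multivalued dependency and its 4NF decomposition (the relation and its values are invented for illustration): in Course(course, teacher, book), the teachers and books for a course vary independently, so course ↠ teacher and course ↠ book hold, and projecting each MVD into its own relation loses no information.

```python
# Hypothetical relation with MVDs course ->> teacher and course ->> book:
# because teachers and books are independent, every (teacher, book)
# combination for a course must appear, which bloats the relation.
course = {
    ("DB", "Smith", "Ullman"), ("DB", "Smith", "Date"),
    ("DB", "Jones", "Ullman"), ("DB", "Jones", "Date"),
}

# 4NF decomposition: project each multivalued dependency into its own relation.
course_teacher = {(c, t) for c, t, b in course}
course_book = {(c, b) for c, t, b in course}

# The decomposition is lossless: the natural join restores the original.
rejoined = {(c, t, b) for c, t in course_teacher
            for c2, b in course_book if c == c2}
```

Four tuples shrink to two relations of two tuples each, and adding a new book no longer requires one new row per teacher.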
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
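Derivations like the two above can be checked mechanically with the attribute-closure algorithm, which relies only on consequences of the inference axioms: X → Y is derivable from F iff Y ⊆ closure(X). A minimal sketch:

```python
# Attribute-closure algorithm: repeatedly add the right-hand side of any
# FD whose left-hand side is already contained in the closure.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Maier's example: F = {AB->E, AG->J, BE->I, E->G, GI->H}
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
print(sorted(closure("AB", F)))  # prints ['A', 'B', 'E', 'G', 'H', 'I', 'J']
# G and H are both in the closure of AB, so AB -> GH holds, matching
# the twelve-step axiomatic derivation above.
```

The same function verifies the Ullman example by checking that City ∈ closure({Street, Zip}).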
Significance in Relational Database design: A relational database is a database structure, commonly used in GIS, in
which data is stored based on two-dimensional tables, where multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System is a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A database that is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time
• SQL is used to manipulate relational databases; proposed by Dr. Codd in 1970
• The basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or to avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, they can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order, to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
then, to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
921 Explain the meaning of the expression ACID transaction
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures.
924 What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
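The resource-ordering prevention strategy mentioned in the deadlock answer above can be sketched in a few lines (the lock names and helper functions are invented for the example): every transaction acquires its locks in one agreed global order, so a cycle of mutually waiting transactions can never form.

```python
import threading

# A fixed set of named resources, each guarded by a lock.
locks = {name: threading.Lock() for name in ("accounts", "orders")}

def acquire_in_order(names):
    """Acquire the named locks in a fixed global order (here, sorted
    by name) and return the order in which they were taken."""
    acquired = []
    for name in sorted(names):          # the agreed-upon global order
        locks[name].acquire()
        acquired.append(name)
    return acquired

def release(names):
    # Release in reverse order of acquisition.
    for name in reversed(names):
        locks[name].release()

# Two transactions both needing {orders, accounts} will each lock
# "accounts" first, so neither can hold one lock while waiting for
# the other lock in the opposite order.
held = acquire_in_order({"orders", "accounts"})
release(held)
```

This is the "resources must be locked in a certain order" idea; the other strategy (requesting all locks at one time) corresponds to the pre-claiming protocol discussed below.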
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary Locks – A lock on a data item can be in two states: it is either locked or
unlocked.
Shared/exclusive – This type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
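The growing/shrinking discipline of 2PL can be sketched as a small guard object (a toy model with invented names, not a full lock manager): once a transaction releases any lock, it enters the shrinking phase and may not acquire new ones.

```python
# Two-phase locking sketch: after the first unlock (the start of the
# shrinking phase), any further lock request is a protocol violation.
class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True          # the growing phase is over
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")               # growing phase
t.unlock("A")                          # shrinking phase begins
try:
    t.lock("C")                        # not allowed under 2PL
except RuntimeError as e:
    violation = str(e)
```

Strict 2PL, described next, is this same discipline with all unlocks deferred to the commit point.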
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system
know when the last 'read' and 'write' operation was performed on the data item.
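The write rule implied by the description above can be sketched as follows (a simplified basic timestamp-ordering check; rollback bookkeeping and Thomas's write rule are omitted, and all names are invented): a write by transaction T on item X is rejected if a younger transaction has already read or written X.

```python
# Basic timestamp-ordering write rule (sketch): transaction with
# timestamp ts may write item only if no younger transaction has
# already read or written it; otherwise it must be rolled back.
def try_write(ts, item, read_ts, write_ts):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False                     # conflict: roll the writer back
    write_ts[item] = ts                  # record the latest write-timestamp
    return True

read_ts, write_ts = {}, {}
ok1 = try_write(5, "X", read_ts, write_ts)   # first write succeeds
read_ts["X"] = 8                             # a younger transaction reads X
ok2 = try_write(6, "X", read_ts, write_ts)   # older write is now rejected
```

This shows why the protocol "starts working as soon as a transaction is created": the outcome depends only on the transactions' timestamps, not on lock acquisition order.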
OR
(b) Explain database security mechanisms (8)
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation, and Durability.
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in DBMS for managing simultaneous
operations without conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any practical database will have a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts which mostly occur with a multi-
user system It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of respective databases
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
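The A-to-B transfer described above can be sketched with SQLite, whose connection context manager commits on success and rolls back on an exception (the account names and amounts are invented for the example):

```python
import sqlite3

# Atomic transfer sketch: both UPDATEs commit together, or the
# rollback leaves the balances exactly as they were.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # the with-block is one transaction
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
            balance = conn.execute(
                "SELECT balance FROM account WHERE id = ?",
                (src,)).fetchone()[0]
            if balance < 0:
                raise ValueError("insufficient funds")  # forces rollback
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
    except ValueError:
        pass  # rollback already happened; balances are unchanged

transfer(conn, "A", "B", 30)    # succeeds: A=70, B=80
transfer(conn, "A", "B", 500)   # fails and rolls back: still A=70, B=80
```

After the failed transfer, the first UPDATE inside the transaction is undone as well: no money is lost or created.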
(B) Give the three level architecture proposal for DBMS
Ans: Objectives of the three level architecture proposal for DBMS:
All users should be able to access same data
A users view is immune to changes made in other views
Users should not need to know physical database storage details
DBA should be able to change database storage structures without affecting the users views
Internal structure of database should be unaffected by changes to physical aspects of storage
DBA should be able to change conceptual structure of database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below:
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database object while the data
manipulation language performs operations on these objects The data control language is used to
control the user's access to database objects.
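The three sublanguages can be shown in a short sketch (SQLite via Python; the table and user names are invented, and since SQLite has no user accounts, the DCL statement is shown only as illustrative text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare the database object.
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
rows = conn.execute("SELECT name FROM student").fetchall()

# DCL (illustrative only; not supported by SQLite):
#   GRANT SELECT ON student TO clerk;
```

The split mirrors the text: DDL creates objects, DML manipulates their contents, DCL governs who may access them.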
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus, the objectives of the three level architecture proposal for DBMS are explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2. DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete, and
retrieve, from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute a query by
the query optimizer and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
Converts operations in users' queries, coming from the application programs or the combination of
the DML compiler and query optimizer (known as the Query Processor), from the user's logical view
to the physical file system.
Controls DBMS information access that is stored on disk.
It also controls handling buffers in main memory.
It also enforces constraints to maintain consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structure,
access paths, and file and record sizes.
5. Access authorization - the description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation,
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system and users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and the design decisions.
5 Data Files - It contains the data portion of the database
6. Compiled DML - The DML compiler converts the high-level queries into low-level file access
commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve User: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of her or his own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications and data sharing: the spatial database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. change of format of data items (real to integer arithmetic operations), change of file
structure (reorganize data internally or change mode of access), or relocation from one device to
another (e.g. from optical to magnetic storage, from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors may be generated due to duplication of the same data in different files
• Time in entering data again and again is wasted
• Computer resources are needlessly used
• It is very difficult to combine information
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data, so we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike
a file processing system where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed on changing the
data in the database.
5 Integrity can be improved - Since the data of the organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data, the
structure of the data, etc. Standardizing stored data formats is usually desirable for the purposes
of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of their unit as the most
important and therefore considers their needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as per the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backups from failures, including disk crashes, power failures and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process with all business managers (or designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship
between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. Types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number)
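As a brief sketch (not part of the original answer), the Customer entity above can be flattened into a single relational table: the composite attributes name and address expand into simple columns. SQLite is used here only as a convenient in-memory engine, and the sample row values are assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes (name, address, street) flattened into simple columns.
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name, city) "
    "VALUES (?, ?, ?, ?)",
    (1, "John", "Smith", "Nagpur"),
)
row = conn.execute(
    "SELECT first_name, city FROM customer WHERE customer_id = 1"
).fetchone()
```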
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans In sequential files, index-sequential files and direct files, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For e.g., if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
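The point in (ii) and (iii) can be shown concretely. In this small sketch (table and data assumed), a secondary index is built on the non-unique attribute stud_name, and a lookup on one name returns several records:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, stud_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Amit"), (2, "Priya"), (3, "Amit")])
# Secondary key: an index on a non-unique attribute.
conn.execute("CREATE INDEX idx_stud_name ON student(stud_name)")
# One secondary-key value matches a SET of records, not a single one.
matches = conn.execute(
    "SELECT roll_no FROM student WHERE stud_name = ?", ("Amit",)
).fetchall()
```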
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(ABC) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 - r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables
Another way of expressing this is: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor, and to determine the vendor you must know the buyer and
the item, and finally to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
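The decomposition above can be checked mechanically. This sketch (SQLite used only as a convenient engine) projects the sample Buying table into the three suggested tables, then joins them back; with this data the three-way join loses nothing and adds no spurious rows, which is exactly the join dependency 5NF is about:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT);
    INSERT INTO buying VALUES
        ('Sally', 'Liz Claiborne', 'Blouses'),
        ('Mary',  'Liz Claiborne', 'Blouses'),
        ('Sally', 'Jordach', 'Jeans'),
        ('Mary',  'Jordach', 'Jeans'),
        ('Sally', 'Jordach', 'Sneakers');
    -- the three projections suggested by the 5NF decomposition
    CREATE TABLE bv AS SELECT DISTINCT buyer, vendor FROM buying;
    CREATE TABLE bi AS SELECT DISTINCT buyer, item   FROM buying;
    CREATE TABLE vi AS SELECT DISTINCT vendor, item  FROM buying;
""")
# Rejoin the projections on their common attributes.
rejoined = conn.execute("""
    SELECT bv.buyer, bv.vendor, bi.item
    FROM bv
    JOIN bi ON bi.buyer = bv.buyer
    JOIN vi ON vi.vendor = bv.vendor AND vi.item = bi.item
""").fetchall()
original = conn.execute("SELECT buyer, vendor, item FROM buying").fetchall()
```

When Claiborne starts selling Jeans, only one row needs to be added to each of bv/vi rather than one row per buyer in the single wide table.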
(B) Explain the architecture of an IMS System
Ans Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Architecture diagram: Application A and Application B, each written in a host language + DL/I, access the IMS control program through their program specification blocks (PSB-A, PSB-B), each consisting of PCBs; the IMS control program in turn uses the DBDs to access the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD    NAME=EDUCPDBD
2  SEGM   NAME=COURSE,BYTES=256
3  FIELD  NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD  NAME=TITLE,BYTES=33,START=4
5  FIELD  NAME=DESCRIPN,BYTES=220,START=37
6  SEGM   NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD  NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD  NAME=TITLE,BYTES=33,START=4
9  SEGM   NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD  NAME=(DATE,SEQM),BYTES=6,START=1
11 FIELD  NAME=LOCATION,BYTES=12,START=7
12 FIELD  NAME=FORMAT,BYTES=2,START=19
13 SEGM   NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD  NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD  NAME=NAME,BYTES=18,START=7
16 SEGM   NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD  NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD  NAME=NAME,BYTES=18,START=7
19 FIELD  NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are
supported via user-written on-line application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key.
Each non-key field is functionally dependent on every candidate key.
No attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in normalization:
they have a 1:1 relationship between the attribute(s) on the left and right-hand side of
a dependency, hold for all time, and are nontrivial.
Complete set of functional dependencies for a given relation can be very
large
Important to find an approach that can reduce set to a manageable size
Need to identify set of functional dependencies (X) for a relation that is
smaller than complete set of functional dependencies (Y) for that relation
and has property that every functional dependency in Y is implied by
functional dependencies in X
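To make the definition above concrete, here is a small illustrative helper (not from the source): a functional dependency X → Y holds in a relation if no two tuples agree on X but differ on Y. The staff sample data below is assumed.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the given list of dict-tuples."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # two tuples agree on lhs but differ on rhs
        seen[x] = y
    return True

staff = [
    {"staffNo": "S1", "branchNo": "B1", "city": "London"},
    {"staffNo": "S2", "branchNo": "B1", "city": "London"},
    {"staffNo": "S3", "branchNo": "B2", "city": "Paris"},
]
```

For this data, staffNo → branchNo and branchNo → city hold, but city → staffNo does not (two London tuples have different staff numbers).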
(D) Explain 4 NF with examples
Ans Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that a relation meets and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
7 NF2: non-first normal form
8 1NF: R is in 1NF iff all domain values are atomic
9 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
10 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
11 BCNF: R is in BCNF iff every determinant is a candidate key
12 Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and its multi-valued dependencies are functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
Either there is no multivalued dependency in the relation, or
there are multivalued dependencies but the attributes are dependent between themselves.
Either of these conditions must hold true in order to be in fourth normal form.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
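A short sketch of the classic 4NF situation (example data assumed, not from the source): an employee's skills and languages are independent multi-valued facts, so EMP ->> SKILL and EMP ->> LANG hold, and the wide relation should be split into two tables. Rejoining the two projections reconstructs the original exactly:

```python
# Wide relation: every (skill, language) combination must be stored,
# because the two facts are independent -- the 4NF redundancy.
emp = {
    ("Jones", "Typing", "English"),
    ("Jones", "Typing", "French"),
    ("Jones", "Filing", "English"),
    ("Jones", "Filing", "French"),
}
# 4NF decomposition: one table per independent multi-valued fact.
emp_skill = {(e, s) for e, s, _ in emp}
emp_lang = {(e, l) for e, _, l in emp}
# Natural join of the two projections on the employee name.
rejoined = {(e, s, l) for e, s in emp_skill for e2, l in emp_lang if e == e2}
```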
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions, account information entries, etc.
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be more easily achieved if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
It logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system
Ans
Example
(b) Given Entity Customer with attributes customer_id(primary key) name(
first_name last_name middle_name) phone_number date_of_birth
address(citystatezip_codestreet)
Street(Street_namestreet_numberapartment_number
An entity relationship diagram displays the relationships of the entity sets stored in a database. In other
words, we can say that ER diagrams help you to explain the logical structure of databases. At
first look an ER diagram looks very similar to a flowchart; however, an ER diagram includes
many specialized symbols, and their meanings make this model unique.
Sample ER Diagram
Facts about ER Diagram Model
o ER model allows you to draw Database Design
o It is an easy to use graphical tool for modeling data
o Widely used in Database Design
o It is a GUI representation of the logical structure of a Database
o It helps you to identify the entities which exist in a system and the relationships
between those entities
(b) Differentiate between Network and Hierarchical data model in DBMS
Ans Hierarchical model
1 One to many or one to one relationships
2 Based on parent child relationship
3 Retrieve algorithms are complex and asymmetric
4 Data Redundancy more
Network model
1 Many to many relationships
2 Many parents as well as many children
3 Retrieve algorithms are complex and symmetric
4 Data Redundancy less than in the hierarchical model
Relational model
1 One to OneOne to many Many to many relationships
2 Based on relational data structures
3 Retrieve algorithms are simple and symmetric
4 Data Redundancy less
OR
(c)Draw E-R diagram on Library Management System
Ans
(d) State advantages and disadvantages of following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1 A sequential file is designed for efficient processing of records in sorted order on some
search key
o Records are chained together by pointers to permit fast retrieval in search key
order
o Pointer points to next record in order
o Records are stored physically in search key order (or as close to this as possible)
o This minimizes number of block accesses
o Figure 10.15 shows an example with bname as the search key
2 It is difficult to maintain physical sequential order as records are inserted and deleted
o Deletion can be managed with the pointer chains
o Insertion poses problems if no space where new record should go
o If space use it else put new record in an overflow block
o Adjust pointers accordingly
o Figure 10.16 shows the previous example after an insertion
o Problem we now have some records out of physical sequential order
o If very few records in overflow blocks this will work well
o If order is lost reorganize the file
o Reorganizations are expensive and done when system load is low
3 If insertions rarely occur we could keep the file in physically sorted order and reorganize
when insertion occurs In this case the pointer fields are no longer required
The Sequential File
Fixed format used for records
Records are the same length
All fields the same (order and length)
Field names and lengths are attributes of the file
One field is the key field
Uniquely identifies the record
Records are stored in key sequence
The Sequential File
New records are placed in a log file or transaction file
Batch update is performed to merge the log file with the master file
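The batch update described above can be sketched in a few lines (file contents and record layout are assumed): new records accumulate in a sorted transaction (log) file and are merged with the sorted master file on the key field.

```python
import heapq

# Master and log files, both already sorted on the key field (first element).
master = [(101, "Arun"), (105, "Meena"), (110, "Ravi")]
log = [(103, "Sita"), (108, "Kiran")]

# Batch update: merge the sorted log into the sorted master in one pass.
new_master = list(heapq.merge(master, log, key=lambda rec: rec[0]))
```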
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS Today more than 85 companies are part of the DAFS
Collaborative
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
Relational calculus query specifies what is to be retrieved rather than how to retrieve it
No description of how to evaluate a query
In first-order logic (or predicate calculus) predicate is a truth-valued function
with arguments
When we substitute values for the arguments function yields an expression
called a proposition which can be either true or false
Relational Calculus
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for
other values it may be false.
When applied to databases relational calculus has forms tuple and domain
Tuple Relational Calculus
Interested in finding tuples for which a predicate is true. Based on use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable
whose only permitted values are tuples of the relation.
Specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
Can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables
Tuple Relational Calculus
Existential quantifier used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧
(B.branchNo = S.branchNo) ∧ B.city = 'London')
Means 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S, and it is located in London'.
Tuple Relational Calculus
Universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
Means 'For all Branch tuples, the address is not in Paris'.
Can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an
address in Paris'.
Tuple Relational Calculus
Formulae should be unambiguous and make sense
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
Can recursively build up formulae from atoms:
An atom is a formula
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2, and negation ~F1
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000.
{S.fName, S.lName | Staff(S) ∧
S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this add restriction that all values in result must be values in the domain
of the expression
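Since tuple calculus is declarative ("what", not "how"), a Python comprehension is a close analogue of expressions like {S | Staff(S) ∧ S.salary > 10000}. The sample Staff tuples below are assumed for illustration:

```python
# Sample Staff relation as a list of tuples (dicts), assumed for this sketch.
staff = [
    {"staffNo": "SL21", "position": "Manager", "salary": 30000},
    {"staffNo": "SG37", "position": "Assistant", "salary": 12000},
    {"staffNo": "SA9", "position": "Assistant", "salary": 9000},
]

# {S.staffNo | Staff(S) AND S.salary > 10000}
high_paid = [s["staffNo"] for s in staff if s["salary"] > 10000]

# {S.staffNo | Staff(S) AND S.position = 'Manager' AND S.salary > 25000}
managers = [s["staffNo"] for s in staff
            if s["position"] == "Manager" and s["salary"] > 25000]
```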
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any In Contains All Not In Not Contains Exists Union Minus Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement
Data must be entered later using INSERT
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) );
Creating Tables
A table name and unique column names must be specified
Columns which are defined as primary keys will never have two rows with the same key
value
Primary key may consist of more than one column (values unique in combination)
called composite key
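The composite-key point can be demonstrated with a throwaway supplier-part table (the sp table and its columns are assumed for illustration): the same SNO may repeat, but the (SNO, PNO) pair must be unique in combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sp (
        sno CHAR(5),
        pno CHAR(6),
        qty INTEGER,
        PRIMARY KEY (sno, pno)   -- composite key: unique in combination
    )
""")
conn.execute("INSERT INTO sp VALUES ('S1', 'P1', 300)")
conn.execute("INSERT INTO sp VALUES ('S1', 'P2', 200)")  # same sno is fine
try:
    conn.execute("INSERT INTO sp VALUES ('S1', 'P1', 400)")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```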
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting, and modifying (updating) data in a database. A DML is often
a sublanguage of a broader database language such as SQL, with the DML comprising some of
the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL), but it is closely related and sometimes also
considered a component of a DML; some operators may perform both selecting (reading) and
writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is
used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those
used by IMS/DL/I, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which
modify stored data but not the schema or database objects. Manipulation of persistent database
objects, e.g., tables or stored procedures, via the SQL schema statements,[3] rather than the data
stored within them, is considered to be part of a separate data definition language (DDL). In SQL
these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct
in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this also contains
the SELECT query statement,[3] which, strictly speaking, is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of the DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT ... INTO form combines both selection and manipulation
and thus is strictly considered to be DML because it manipulates (i.e., modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement, which is almost always a verb. In the case of SQL these verbs are:
SELECT ... FROM ... WHERE (strictly speaking, DQL)
SELECT ... INTO
INSERT INTO ... VALUES
UPDATE ... SET ... WHERE
DELETE FROM ... WHERE
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00');
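A hedged sketch of these DML verbs in action, using Python's sqlite3 module and the employees table from the INSERT example; the table schema and the updated value are assumptions for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES
conn.execute(
    "INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
    ("John", "Capita", "xcapit00"),
)

# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = ? WHERE fname = ?", ("Capital", "xcapit00"))

# SELECT ... FROM ... WHERE (strictly speaking, DQL)
rows = conn.execute(
    "SELECT first_name, last_name FROM employees WHERE fname = 'xcapit00'"
).fetchall()

# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(rows, remaining)  # [('John', 'Capital')] 0
```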
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs have
these rules automatically, but it is safer to just make sure that the rules are
already applied in the design. There are two types of integrity mentioned in
integrity rules: entity and reference. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is to have each row carry a unique
identity, so that foreign key values can properly reference primary key values.
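The null-key requirement of entity integrity can be demonstrated with a small sketch; the dept table is hypothetical. Note that NOT NULL is spelled out because SQLite, unlike the SQL standard, otherwise permits NULLs in a non-INTEGER primary key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity integrity: the primary key must be unique AND non-null.
conn.execute("CREATE TABLE dept (dept_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO dept VALUES ('D1', 'Sales')")

# A row with a NULL primary key has no identity and is rejected.
try:
    conn.execute("INSERT INTO dept VALUES (NULL, 'Marketing')")
    null_key_allowed = True
except sqlite3.IntegrityError:
    null_key_allowed = False

print(null_key_allowed)  # False
```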
Theta Join
In theta join we apply a condition on the input relation(s), and only the selected
rows are used in the cross product to be merged and included in the output. In a
normal cross product, all the rows of one relation are mapped/merged with all
the rows of the second relation, but here only selected rows of a relation are
cross-producted with the second relation. It is denoted as R ⋈θ S.
If R and S are two relations, then θ is the condition which is applied for the select
operation on one relation, and then only the selected rows are cross-producted with all the
rows of the second relation. For example, given the two relations FACULTY and
COURSE, we first apply the select operation on the FACULTY relation to
select certain specific rows; these rows then have a cross product with the
COURSE relation. This is the difference between cross product and theta join.
We will now see both relations, their different attributes, and then finally the
cross product after carrying out the select operation on the relation.
From this example the difference between cross product and theta join becomes clear.
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So, referential integrity requires that whenever a foreign key value is used it must reference a
valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records. Otherwise, we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e., the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned usually
with no indication of an error This could result in records being ldquolostrdquo in the database because
theyrsquore never returned in queries or reports
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
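The orphaned-record scenario above can be sketched with SQLite's foreign key enforcement; the company/product tables and key values, including the missing key 15, are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(company_id))""")

conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES (10, 1)")  # valid parent: accepted

# An orphaned record (company 15 does not exist) is rejected.
try:
    conn.execute("INSERT INTO product VALUES (11, 15)")
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Deleting a parent that still has matching child records is likewise rejected.
try:
    conn.execute("DELETE FROM company WHERE company_id = 1")
    parent_delete_allowed = True
except sqlite3.IntegrityError:
    parent_delete_allowed = False

print(orphan_allowed, parent_delete_allowed)  # False False
```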
(d) Explain the following domain in detail with example.
Ans: Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male, 'F' for female, and NULL for
records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel
value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL). Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
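A minimal sketch of database-enforced domain rules via check constraints, combining the gender code list and the positive-value rule described above; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The gender column's domain {M, F} and a positive-salary rule are both
# declared as CHECK constraints, so the DBMS enforces the domain boundary.
conn.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')),
    salary NUMERIC CHECK (salary > 0))""")

conn.execute("INSERT INTO person VALUES ('Mary', 'F', 30000)")  # in-domain: accepted

# A value outside the enumerated domain is rejected.
try:
    conn.execute("INSERT INTO person VALUES ('Sam', 'X', 30000)")
    out_of_domain_allowed = True
except sqlite3.IntegrityError:
    out_of_domain_allowed = False

print(out_of_domain_allowed)  # False
```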
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last is written M:N and not M:M, since the number of occurrences on each side need not be equal.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; in that case you may consider combining
the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities shown on the previous page, an employee works in
one department but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur
because an entity has been missed.
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct, and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, as with the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item, and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary
programming language such as COBOL that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item)
defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from the information source (e.g., a form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by:
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data, along with a copy of the original key attribute(s), into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
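The 1NF-to-2NF step described above can be sketched in plain Python on a hypothetical OrderLine relation whose composite key is (order_no, product_no); product_name depends only on product_no, i.e., a partial dependency that must be removed.

```python
# 1NF relation with a partial dependency:
# key = (order_no, product_no), but product_name depends on product_no alone.
order_line_1nf = [
    # (order_no, product_no, product_name, qty)
    (1, "P1", "Bolt", 10),
    (1, "P2", "Nut", 5),
    (2, "P1", "Bolt", 7),
]

# Remove the partial dependency: product_name moves into a new relation
# keyed by product_no, together with a copy of its determinant.
product = {p_no: p_name for (_, p_no, p_name, _) in order_line_1nf}

# What remains is fully dependent on the whole composite key: 2NF.
order_line_2nf = [(o_no, p_no, qty) for (o_no, p_no, _, qty) in order_line_1nf]

print(product)         # {'P1': 'Bolt', 'P2': 'Nut'}
print(order_line_2nf)  # [(1, 'P1', 10), (1, 'P2', 5), (2, 'P1', 7)]
```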
OR
(c)Explain multivalued dependency with suitable example
As normalization proceeds relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies
Ans
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
There is no multivalued dependency in the relation, or
there are multivalued dependencies but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrongrsquos Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 (Reflexivity): X → X
F2 (Augmentation): If Z ⊆ W and X → Y, then XW → YZ
F3 (Additivity): If X → Y and X → Z, then X → YZ
F4 (Projectivity): If X → YZ, then X → Y
F5 (Transitivity): If X → Y and Y → Z, then X → Z
F6 (Pseudotransitivity): If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (A B C D E G H I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
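Such derivations can also be checked mechanically by computing an attribute closure. The sketch below implements the standard closure algorithm and applies it to the FD set of the second example to confirm that GH is in the closure of AB.

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H} from the example above.
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]

ab_closure = closure("AB", F)
print(sorted(ab_closure))       # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print(set("GH") <= ab_closure)  # True, so AB -> GH holds
```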
Significance in Relational Database design: A database structure commonly used in GIS in
which data is stored based on two-dimensional tables, where multiple relationships between data
elements can be defined and established in an ad-hoc manner. Relational Database Management
System: a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A database that is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time rather than a record at a time.
• SQL is used to manipulate relational databases. Proposed by Dr. Codd in 1970.
• The basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or to avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur the DBMS must have a method for detecting the deadlock,
and then to resolve it the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system
failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected or possibly affected by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
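The "resource access order" idea mentioned above can be sketched with ordinary threading locks: if every transaction acquires its locks in one agreed global order, a circular wait (and hence a deadlock) cannot form. The transaction bodies here are, of course, stand-ins.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
LOCK_ORDER = [lock_a, lock_b]  # every transaction must acquire in this order

log = []

def transaction(name):
    # Both transactions need both locks, but because they honour the
    # same global order, neither can hold one lock while waiting for
    # a lock the other holds: no circular wait is possible.
    for lock in LOCK_ORDER:
        lock.acquire()
    try:
        log.append(name)  # stand-in for the real work
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()

threads = [threading.Thread(target=transaction, args=(n,)) for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(log))  # ['T1', 'T2']
```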
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
• Binary Locks: A lock on a data item can be in two states; it is either locked or
unlocked.
• Shared/exclusive: This type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking 2PL
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
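A small sketch of the 2PL rule itself: a checker that scans a transaction's lock/unlock sequence and rejects any lock request made after the first release, i.e., after the shrinking phase has begun. The (op, item) action format is an assumption for illustration.

```python
def is_two_phase(actions):
    """Return True iff the lock/unlock sequence obeys two-phase locking:
    once any lock is released (shrinking phase), no new lock is acquired."""
    shrinking = False
    for op, _item in actions:
        if op == "unlock":
            shrinking = True           # shrinking phase has begun
        elif op == "lock" and shrinking:
            return False               # growing after shrinking: violation
    return True

# Acquires everything before releasing anything: valid 2PL schedule.
ok = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
# Acquires y after releasing x: violates 2PL.
bad = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]

print(is_two_phase(ok), is_two_phase(bad))  # True False
```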
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock after using it. Strict-2PL holds all the locks until the commit point and releases
them all at once.
Strict-2PL does not have cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and the priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
OR
(b) Explain database security mechanisms
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
• Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment from theft and natural
disasters
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties:
• Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
• Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
• Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called ACID
properties: Atomicity, Consistency, Isolation, and Durability.
• Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets. From the AI and Object-Oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge base was the Internet. With the rise of the Internet, documents, hypertext, and multimedia support were now critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term "knowledge base" to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018-2019
Subject: DBMS
MCA 1st year (Sem. II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data: there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
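The core problem can be sketched in a few lines of Python (a minimal illustration, not a DBMS implementation): several concurrent "transactions" each perform a read-modify-write cycle on shared data, and a lock serializes the critical section so that no update is lost.

```python
import threading

# Shared "account balance" updated by several concurrent transactions.
# Without coordination, interleaved read-modify-write cycles can lose
# updates; the lock enforces mutual exclusion over the critical section.
balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:                      # concurrency control
            current = balance           # READ
            balance = current + amount  # WRITE

threads = [threading.Thread(target=deposit, args=(1, 10000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000: no updates were lost
```

Real DBMSs achieve the same effect with finer-grained mechanisms such as two-phase locking or timestamp ordering rather than a single global lock.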
(ii) Atomicity property
Ans: In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
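The bank transfer example can be demonstrated with Python's built-in sqlite3 module, where the connection's context manager commits on success and rolls back on error (table and account names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
con.commit()

def transfer(con, src, dst, amount):
    """Move `amount` from src to dst atomically: both updates or neither."""
    try:
        with con:  # commits on normal exit, rolls back on exception
            con.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                        (amount, src))
            con.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                        (amount, dst))
            cur = con.execute("SELECT balance FROM account WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # abort mid-transaction
    except ValueError:
        pass  # transaction rolled back; database unchanged

transfer(con, "A", "B", 30)    # succeeds: both updates applied
transfer(con, "A", "B", 1000)  # fails after the withdrawal, rolled back
balances = dict(con.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 70, 'B': 80} -- money neither lost nor created
```

The second transfer fails after the withdrawal has already executed, yet neither account changes: the whole series of operations is undone, which is exactly the atomicity guarantee.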
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user view is different from the way data is stored in the database; this view describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to the user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
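The division of labour between the three sublanguages can be seen with sqlite3 (SQLite has no user accounts, so the DCL statement is shown only as a comment of the form it takes in a server DBMS):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define and declare a database object
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object
con.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
con.execute("UPDATE student SET name = 'Asha K' WHERE id = 1")
rows = con.execute("SELECT name FROM student").fetchall()
print(rows)  # [('Asha K',)]

# DCL controls user access; in a server DBMS it would look like:
#   GRANT SELECT ON student TO some_user;
#   REVOKE SELECT ON student FROM some_user;
```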
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained above.
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig.: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified in the DDL. It stores metadata information such as the names of the files, data items, storage details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer into the best way to execute the query, and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
It converts operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the Query Processor), from the user's logical view to the physical file system.
It controls access to the DBMS information that is stored on disk.
It also controls the handling of buffers in main memory.
It also enforces constraints to maintain the consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, files and record sizes.
5. Access authorization - the description of database users, their responsibilities and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation and accuracy, and may be used as an important part of the DBMS.
Importance of the Data Dictionary - A data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users' understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database - in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, as such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating of the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources being needlessly used.
• Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3. Better Service to the Users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the System is Improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be Improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be Enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be Improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. The Organization's Requirements can be Identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may thus become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall Cost of Developing and Maintaining Systems is Lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.
10. A Data Model must be Developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand; the overall view is often not considered. Building an overall view of the organization's data is usually cost-effective in the long term.
11. Provides Backup and Recovery - Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
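The backup-and-restore idea in point 11 can be sketched with sqlite3's backup API (a minimal illustration; the table and values are made up for the example):

```python
import sqlite3

# Copy a live database to a backup, "lose" the original to a simulated
# failure, then restore the pre-failure state from the backup copy.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.execute("INSERT INTO orders VALUES (1, 99.5)")
live.commit()

backup = sqlite3.connect(":memory:")
live.backup(backup)               # full backup of the consistent state

live.execute("DROP TABLE orders")  # simulated failure/corruption

restored = sqlite3.connect(":memory:")
backup.backup(restored)           # recovery from the backup copy
row = restored.execute("SELECT total FROM orders WHERE id = 1").fetchone()
print(row)  # (99.5,)
```

Production DBMSs combine such full backups with transaction logs so that committed work after the last backup can also be replayed.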
QUE 2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process with all business managers (or their designates) involved, and the model should be validated with a "bottom-up" approach. It has three primary components: entity, relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------ M
Many to one: M ------ 1
Many to many: M ------ M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
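One common way to realize this entity relationally (a sketch, assuming phone_number is multivalued) is to flatten the composite attributes name and address into simple columns, and to move the multivalued phone_number into its own table keyed by the entity's primary key:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,  -- composite: name
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,                  -- composite: address
    street_name TEXT, street_number TEXT, apartment_number TEXT
);
CREATE TABLE customer_phone (                              -- multivalued attribute
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
con.execute("INSERT INTO customer (customer_id, first_name, last_name) "
            "VALUES (1, 'Ravi', 'Mehta')")
con.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                [(1, '555-0100'), (1, '555-0101')])
n = con.execute("SELECT COUNT(*) FROM customer_phone "
                "WHERE customer_id = 1").fetchone()[0]
print(n)  # 2 -- one customer, two phone numbers
```

A derived attribute such as age would not be stored at all; it would be computed from date_of_birth at query time.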
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
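The "stud_name" example can be reproduced with sqlite3 (illustrative table and names): a secondary index is built on a non-primary-key attribute, and a lookup on it returns a set of matching records rather than at most one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, stud_name TEXT)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [(1, 'Amit'), (2, 'Priya'), (3, 'Amit'), (4, 'Rahul')])

# Secondary index on a non-primary-key attribute
con.execute("CREATE INDEX idx_stud_name ON student (stud_name)")

# Unlike a primary-key lookup, a secondary-key value can match many records
matches = con.execute(
    "SELECT roll_no FROM student WHERE stud_name = ? ORDER BY roll_no",
    ('Amit',)).fetchall()
print(matches)  # [(1,), (3,)] -- the set of records satisfying the value
```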
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and every join dependency in it is a consequence of the candidate keys.
Another way of expressing this is that a table in 5NF cannot be non-loss decomposed any further.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise);
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:

buyer | vendor        | item
------+---------------+---------
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
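The decomposition can be checked mechanically with sqlite3: project Buying onto the three pairwise tables, then confirm that the three-way join on the common keys reconstructs the original rows (the lossless-join property a join dependency guarantees).

```python
import sqlite3

con = sqlite3.connect(":memory:")
buying = [('Sally', 'Liz Claiborne', 'Blouses'),
          ('Mary',  'Liz Claiborne', 'Blouses'),
          ('Sally', 'Jordach', 'Jeans'),
          ('Mary',  'Jordach', 'Jeans'),
          ('Sally', 'Jordach', 'Sneakers')]
con.execute("CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT)")
con.executemany("INSERT INTO buying VALUES (?, ?, ?)", buying)

# Decompose into the three pairwise projections
con.executescript("""
CREATE TABLE buyer_vendor AS SELECT DISTINCT buyer, vendor FROM buying;
CREATE TABLE vendor_item  AS SELECT DISTINCT vendor, item  FROM buying;
CREATE TABLE buyer_item   AS SELECT DISTINCT buyer, item   FROM buying;
""")

# The three-way join on common keys reconstructs the original table
rejoined = con.execute("""
    SELECT bv.buyer, bv.vendor, vi.item
    FROM buyer_vendor bv
    JOIN vendor_item vi ON bv.vendor = vi.vendor
    JOIN buyer_item  bi ON bi.buyer = bv.buyer AND bi.item = vi.item
""").fetchall()
print(sorted(rejoined) == sorted(buying))  # True: the decomposition is lossless
```

In the 5NF design, recording that Claiborne starts to sell jeans takes a single new row in vendor_item, instead of one row per buyer in the original table.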
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.

Fig.: IMS system structure - application programs A and B, each written in a host language plus DL/I, are linked through their PSBs (PSB-A, PSB-B), each containing PCBs, to the IMS control program, which maps requests onto the physical databases defined by the DBDs.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD), which also gives the mapping of the physical database to storage. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAMEEDUCPDBD
2 SEGM NAME=COURSEBYTES=256
3 FILED NAME=(COURSESEQ)BYTES=3START=1 4 FIELD NAME=TITLE BYTES=33START=4
5 FIELD NAME=DESCRIPNBYETS=220START=37
6 SEGM NAME=PREREQPARENT=COURSEBYTES=36 7 FILED NAME=(COURSESEQ)BYTES=3START=1
8 FIELD NAME=TITLE BYTES=33START=4
9 SEGM NAME=OFFERINGPARENT=COURSEBYTES=20 10 FILED NAME=(DATESEQM)BYTES=6START=1
11 FIELD NAME=LOCATION BYTES=12START=7
12 FIELD NAME=FORMATBYETS=2START=19 13 SEGM NAME=TEACHERPARENT=OFFERINGBYTES=24
14 FIELD NAME=(EMPSEQ) BYTES=6START=1
15 FIELD NAME=NAMEBYETS=18START=7
16 SEGM NAME=STUDENTPARENT=OFFERINGBYTES=25
17 FILED NAME=(EMPSEQ)BYTES=6START=1
18 FIELD NAME=NAME BYTES=18START=7
19 FIELD NAME=GRADEBYTES=1START=25
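The hierarchy this DBD defines (COURSE at the root, with PREREQ and OFFERING children, and TEACHER and STUDENT beneath OFFERING) can be sketched in Python as a tree of segment occurrences; the traversal below loosely mimics DL/I retrieving every occurrence of a segment type beneath a given parent. Segment values are invented for illustration.

```python
# A hypothetical in-memory occurrence of part of the EDUCPDBD hierarchy:
# each physical database is a tree of segments, not a flat table.
course = {
    "segment": "COURSE", "COURSE#": "M23", "TITLE": "Database Systems",
    "children": [
        {"segment": "OFFERING", "DATE#": "730813", "LOCATION": "Nagpur",
         "children": [
             {"segment": "STUDENT", "EMP#": "861001", "NAME": "Sarah",
              "children": []},
             {"segment": "STUDENT", "EMP#": "861002", "NAME": "Vikram",
              "children": []},
         ]},
    ],
}

def get_under_parent(root, segment_name):
    """Depth-first walk collecting all occurrences of a segment type
    beneath the given parent segment."""
    found = []
    for child in root["children"]:
        if child["segment"] == segment_name:
            found.append(child)
        found.extend(get_under_parent(child, segment_name))
    return found

students = get_under_parent(course, "STUDENT")
print([s["NAME"] for s in students])  # ['Sarah', 'Vikram']
```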
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on each segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Ans: Functional Dependency: the value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
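Whether a given dependency holds in a particular relation instance can be checked directly from the definition: equal determinant values must imply equal dependent values. A small sketch (the sample rows and attribute names are invented):

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`
    (each row a dict): rows agreeing on lhs must agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)  # determinant values
        val = tuple(row[a] for a in rhs)  # dependent values
        if key in seen and seen[key] != val:
            return False                  # same determinant, different dependent
        seen[key] = val
    return True

rows = [
    {"stud_id": 1, "stud_name": "Asha", "dept": "MCA"},
    {"stud_id": 2, "stud_name": "Ravi", "dept": "MCA"},
    {"stud_id": 1, "stud_name": "Asha", "dept": "MCA"},
]
print(holds(rows, ["stud_id"], ["stud_name"]))  # True: stud_id determines stud_name
print(holds(rows, ["dept"], ["stud_name"]))     # False: same dept, different names
```

Note that such a check can only refute a dependency from data; whether it truly "holds for all time" is a semantic decision about the enterprise, not a property of one instance.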
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that it meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; here we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
(Determinant: an attribute on which some other attribute is fully functionally dependent.)
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multivalued dependencies are in fact functional dependencies. 4NF thus removes an unwanted data structure: multivalued dependencies.
One of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation; or
There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
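A small worked example (invented data): if an employee's skills are independent of the languages they speak, the single table has the multivalued dependencies emp ->-> skill and emp ->-> language and must store the full cross product per employee. The 4NF decomposition into two tables removes that redundancy and remains lossless.

```python
from itertools import product

skills    = {"Rita": {"SQL", "COBOL", "C"}}
languages = {"Rita": {"English", "Marathi"}}

# Single table: full cross product of skills x languages per employee
flat = {(e, s, l) for e in skills for s, l in product(skills[e], languages[e])}
print(len(flat))  # 6 rows for 3 skills x 2 languages

# 4NF decomposition: two tables, no cross-product redundancy
emp_skill = {(e, s) for (e, s, l) in flat}
emp_lang  = {(e, l) for (e, s, l) in flat}
print(len(emp_skill) + len(emp_lang))  # 3 + 2 = 5 rows

# The decomposition is lossless: joining on emp restores the original
rejoined = {(e, s, l) for (e, s) in emp_skill
                      for (e2, l) in emp_lang if e == e2}
print(rejoined == flat)  # True
```

The saving grows with the data: adding one new skill for Rita costs one row in emp_skill instead of one row per language in the flat table.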
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market dominated by relational
database management systems (RDBMS) Object databases have been considered since the early 1980s
and 1990s but they have made little impact on mainstream commercial data proc
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get the user's account information and
efficiently provide extensive information such as transactions, account information entries, etc.
(c) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery can be more easily achieved if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
- Both the speed and size of your transaction log backups
- The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a Distributed System.
Ans
Sample ER Diagram
Facts about the ER Diagram Model:
o The ER model allows you to draw a database design.
o It is an easy-to-use graphical tool for modeling data.
o It is widely used in database design.
o It is a GUI representation of the logical structure of a database.
o It helps you to identify the entities which exist in a system and the relationships
between those entities.
(b) Differentiate between the Network and Hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships.
2. Based on a parent-child relationship.
3. Retrieval algorithms are complex and asymmetric.
4. More data redundancy.
Network model
1. Many-to-many relationships.
2. Many parents as well as many children.
3. Retrieval algorithms are complex and symmetric.
4. Less data redundancy than the hierarchical model.
Relational model
1. One-to-one, one-to-many, and many-to-many relationships.
2. Based on relational data structures.
3. Retrieval algorithms are simple and symmetric.
4. Less data redundancy.
OR
(c) Draw an E-R diagram for a Library Management System.
Ans
(d) State the advantages and disadvantages of the following file organizations:
(i) Index-sequential file
Ans
Sequential File Organization
1. A sequential file is designed for efficient processing of records in sorted order on some
search key.
o Records are chained together by pointers to permit fast retrieval in search-key
order.
o A pointer points to the next record in order.
o Records are stored physically in search-key order (or as close to this as possible).
o This minimizes the number of block accesses.
o Figure 10.15 shows an example with bname as the search key.
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
o Deletion can be managed with the pointer chains.
o Insertion poses problems if there is no space where the new record should go.
o If there is space, use it; else put the new record in an overflow block.
o Adjust pointers accordingly.
o Figure 10.16 shows the previous example after an insertion.
o Problem: we now have some records out of physical sequential order.
o If there are very few records in overflow blocks, this will work well.
o If order is lost, reorganize the file.
o Reorganizations are expensive and done when system load is low.
3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize
when an insertion occurs. In this case the pointer fields are no longer required.
The Sequential File
A fixed format is used for records:
Records are the same length.
All fields are the same (order and length).
Field names and lengths are attributes of the file.
One field is the key field:
It uniquely identifies the record.
Records are stored in key sequence.
New records are placed in a log file or transaction file.
A batch update is performed to merge the log file with the master file.
(ii) Direct file
Direct Access File System (DAFS) is a network file system, similar to Network File System
(NFS) and Common Internet File System (CIFS), that allows applications to transfer data while
bypassing operating system control, buffering, and network protocol operations that can
bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism. Using VI hardware, an application transfers data to and from application
buffers without using the operating system, which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems. DAFS is designed and optimized for clustered, shared-file network environments that
are commonly used for Internet, e-commerce, and database applications. DAFS is optimized for
high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI,
including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and
promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it:
there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function
with arguments.
When we substitute values for the arguments, the function yields an expression,
called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for
other values, it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true. It is based on the use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable
whose only permitted values are tuples of the relation.
Specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
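These calculus expressions correspond directly to SQL queries. A minimal sketch using Python's sqlite3, with a hypothetical Staff table and invented sample rows (the column set is reduced for illustration):

```python
import sqlite3

# In-memory database with a hypothetical, simplified Staff relation
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, fName TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [("S1", "Ann", 12000), ("S2", "Bob", 9000)])

# {S | Staff(S) AND S.salary > 10000}  -- whole tuples
rows = conn.execute("SELECT * FROM Staff WHERE salary > 10000").fetchall()
print(rows)  # [('S1', 'Ann', 12000)]

# {S.salary | Staff(S) AND S.salary > 10000}  -- a single attribute
salaries = conn.execute("SELECT salary FROM Staff WHERE salary > 10000").fetchall()
print(salaries)  # [(12000,)]
```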
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called
free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧
(B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S, and it is located in London'.
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris'.
We can also use ~(∃B) (B.city = 'Paris'), which means 'There are no branches with an
address in Paris'.
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2, and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25000:
{S.fName, S.lName | Staff(S) ∧
S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain
of the expression.
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any, In, Contains, All, Not In, Not Contains, Exists, Union, Minus, Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement.
Data must be entered later using INSERT.
CREATE TABLE S ( SNO CHAR(5),
SNAME CHAR(20),
STATUS DECIMAL(3),
CITY CHAR(15),
PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key
value.
A primary key may consist of more than one column (values unique in combination);
this is called a composite key.
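The composite-key behaviour described above can be demonstrated with Python's sqlite3 module. The SP table and its rows are hypothetical, chosen only to show that key values must be unique in combination:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite (two-column) primary key: values must be unique in combination
conn.execute("""CREATE TABLE SP (
    SNO CHAR(5),
    PNO CHAR(5),
    QTY DECIMAL(5),
    PRIMARY KEY (SNO, PNO))""")

conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")
conn.execute("INSERT INTO SP VALUES ('S1', 'P2', 200)")  # same SNO, new PNO: allowed

rejected = False
try:
    conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 400)")  # duplicate combination
except sqlite3.IntegrityError:
    rejected = True  # the DBMS refuses the second ('S1', 'P1') row
print(rejected)  # True
```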
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting, and modifying (updating) data in a database. A DML is often
a sublanguage of a broader database language such as SQL, with the DML comprising some of
the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL), but it is closely related and sometimes also
considered a component of a DML; some operators may perform both selecting (reading) and
writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is
used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those
used by IMS/DLI, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which
modify stored data but not the schema or database objects. Manipulation of persistent database
objects, e.g. tables or stored procedures, via the SQL schema statements,[3] rather than the data
stored within them, is considered to be part of a separate data definition language (DDL). In SQL
these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct
in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this also contains
the SELECT query statement,[3] which, strictly speaking, is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT ... INTO form combines both selection and manipulation
and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT FROM WHERE (strictly speaking DQL)
SELECT INTO
INSERT INTO VALUES
UPDATE SET WHERE
DELETE FROM WHERE
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
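The DML verbs listed above can be exercised end-to-end with Python's sqlite3 module. The table layout follows the employees example; the updated values are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES
conn.execute("INSERT INTO employees (first_name, last_name, fname) "
             "VALUES ('John', 'Capita', 'xcapit00')")

# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = 'Capital' WHERE fname = 'xcapit00'")

# SELECT ... FROM ... WHERE (strictly speaking, DQL)
row = conn.execute("SELECT first_name, last_name FROM employees "
                   "WHERE fname = 'xcapit00'").fetchone()
print(row)  # ('John', 'Capital')

# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(remaining)  # 0
```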
OR
(c) Explain the following integrity rules:
(i) Entity integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
already applied in the design. There are two types of integrity mentioned in
integrity rules: entity and referential. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is for each row to have a unique
identity, so that foreign key values can properly reference primary key values.
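The two entity-integrity requirements (key values unique and never null) can be seen in action with Python's sqlite3; the company table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY enforces uniqueness; NOT NULL forbids null key values
conn.execute("CREATE TABLE company (CompanyId INT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")

violations = []
for stmt in ("INSERT INTO company VALUES (15, 'Duplicate')",   # duplicate key
             "INSERT INTO company VALUES (NULL, 'No key')"):   # null key
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        violations.append(stmt)  # both rows are rejected by the DBMS

print(len(violations))  # 2
```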
Theta Join
In a theta join we apply a condition on the input relation(s), and only the selected
rows are used in the cross product to be merged and included in the output. In a
normal cross product, all the rows of one relation are mapped/merged with all
the rows of the second relation, but here only selected rows of one relation are
cross-producted with the second relation. It is denoted as R ⋈θ S.
If R and S are two relations, then θ is the condition which is applied for the select
operation on one relation, and then only the selected rows are cross-producted with all the
rows of the second relation. For example, given two relations FACULTY and
COURSE, we first apply the select operation on the FACULTY relation to
select certain specific rows; then these rows have a cross product with the
COURSE relation.
From this example the difference between a cross product and a theta join becomes clear.
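The idea that a theta join is a selection applied over the cross product can be sketched in a few lines of Python; the FACULTY and COURSE tuples below are invented for illustration:

```python
# Hypothetical relations: FACULTY(fid, name, course_id) and COURSE(course_id, title)
FACULTY = [("F1", "Khan", 10), ("F2", "Riaz", 20)]
COURSE = [(10, "DBMS"), (20, "OS"), (30, "AI")]

def theta_join(r, s, theta):
    """Cross product of r and s, keeping only pairs satisfying predicate theta."""
    return [a + b for a in r for b in s if theta(a, b)]

# theta: FACULTY.course_id = COURSE.course_id (an equijoin, a special case)
result = theta_join(FACULTY, COURSE, lambda f, c: f[2] == c[0])
print(result)
# [('F1', 'Khan', 10, 10, 'DBMS'), ('F2', 'Riaz', 20, 20, 'OS')]
```

With a predicate that is always true, the same function degenerates into the plain cross product, which is exactly the relationship the text describes.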
(ii) Referential integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So referential integrity requires that, whenever a foreign key value is used, it must reference a
valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records. Otherwise, we would end up with an orphaned record.
If the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field), the result is an "orphaned record".
So referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table.
Changing values in a primary table that result in orphaned records in a related table.
Deleting records from a primary table if there are matching related records.
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database, because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company).
Worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the
correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and
consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of
working with databases.
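A minimal demonstration of referential-integrity enforcement, using Python's sqlite3 and the CompanyId example above. The table layouts are assumed for illustration; note that SQLite only enforces foreign keys when the pragma is switched on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE company (CompanyId INT PRIMARY KEY)")
conn.execute("CREATE TABLE product (ProductId INT PRIMARY KEY, "
             "CompanyId INT REFERENCES company(CompanyId))")
conn.execute("INSERT INTO company VALUES (15)")
conn.execute("INSERT INTO product VALUES (1, 15)")  # parent exists: accepted

blocked = []
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")  # no parent: orphan
except sqlite3.IntegrityError:
    blocked.append("insert")

try:
    conn.execute("DELETE FROM company WHERE CompanyId = 15")  # child still exists
except sqlite3.IntegrityError:
    blocked.append("delete")

print(blocked)  # ['insert', 'delete'] -- both violations are rejected
```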
(d) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which
a data element may contain. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: 'M' for male and 'F' for female, plus NULL for
records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel
value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL). Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
This definition combines the concept of domain as an area over which control is exercised with
the mathematical idea of a set of values of an independent variable for which a function is
defined.
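Check constraints of the kind just described can be sketched with Python's sqlite3. The person table is hypothetical; both the gender domain and the positive-value rule are enforced declaratively by the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain rules as CHECK constraints: gender limited to 'M'/'F', age must be positive
conn.execute("""CREATE TABLE person (
    name TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')),
    age INTEGER CHECK (age > 0))""")

conn.execute("INSERT INTO person VALUES ('Ann', 'F', 30)")  # within both domains

rejected = 0
for stmt in ("INSERT INTO person VALUES ('Bob', 'X', 25)",   # outside gender domain
             "INSERT INTO person VALUES ('Cat', 'M', -5)"):  # non-positive age
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        rejected += 1  # the check constraint blocks the row
print(rejected)  # 2
```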
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last is written M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; if it does, you may consider combining
the two entities into one.
For example, an employee is allocated a company car, which can only be driven by that
employee.
Therefore, there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities shown on the previous page, an employee works in
one department, but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur
because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct, and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, like the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item, and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item
type) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory, 'bad' relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF, or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g. a form) into table format, with columns
and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
if A and B are attributes of a relation,
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
A partial functional dependency exists when one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant.
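The 1NF-to-2NF steps above can be sketched with Python's sqlite3. The OrderLine relation, its key (orderNo, productNo), and the partial dependency productNo → productName are all hypothetical, chosen so that productName depends on only part of the key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical 1NF relation with key (orderNo, productNo); productName depends
# only on productNo: a partial dependency, so the relation is not in 2NF
rows = [(1, "P1", "Bolt", 10), (1, "P2", "Nut", 5), (2, "P1", "Bolt", 7)]
conn.execute("CREATE TABLE OrderLine (orderNo INT, productNo TEXT, "
             "productName TEXT, qty INT, PRIMARY KEY (orderNo, productNo))")
conn.executemany("INSERT INTO OrderLine VALUES (?, ?, ?, ?)", rows)

# 2NF decomposition: move the partial dependency into its own relation,
# together with a copy of its determinant (productNo)
conn.execute("CREATE TABLE Product AS "
             "SELECT DISTINCT productNo, productName FROM OrderLine")
conn.execute("CREATE TABLE OrderQty AS "
             "SELECT orderNo, productNo, qty FROM OrderLine")

# Joining the two new relations reproduces the original data (lossless)
joined = conn.execute(
    "SELECT o.orderNo, o.productNo, p.productName, o.qty "
    "FROM OrderQty o JOIN Product p ON o.productNo = p.productNo "
    "ORDER BY o.orderNo, o.productNo").fetchall()
print(joined == rows)  # True
```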
Third Normal Form (3NF)
2NF and no transitive dependencies.
A transitive dependency is a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
if A, B, and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and all its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies. For a
relation to be in fourth normal form, either of the following conditions must hold true:
- There is no multivalued dependency in the relation, or
- There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
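A small illustration of why 4NF decomposition works: a relation holding two independent multivalued facts equals the natural join of its two projections, so splitting it loses nothing. The employee data below (skills and languages, i.e. name →→ skill and name →→ language) is invented:

```python
# Hypothetical relation with two independent multivalued facts about an employee
emp = {("Ali", "SQL", "English"), ("Ali", "SQL", "Urdu"),
       ("Ali", "Java", "English"), ("Ali", "Java", "Urdu")}

# 4NF decomposition: project each multivalued fact into its own relation
skills = {(n, s) for n, s, _ in emp}   # (name, skill)
langs = {(n, l) for n, _, l in emp}    # (name, language)

# The natural join of the projections reproduces the original relation,
# so the decomposition is lossless
rejoined = {(n, s, l) for n, s in skills for m, l in langs if n == m}
print(rejoined == emp)  # True
```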
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 (Reflexivity): X → X
F2 (Augmentation): If Z ⊆ W and X → Y, then XW → YZ
F3 (Additivity): If X → Y and X → Z, then X → YZ
F4 (Projectivity): If X → YZ, then X → Y
F5 (Transitivity): If X → Y and Y → Z, then X → Z
F6 (Pseudotransitivity): If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I, J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F:
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
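Derivations like the one above can be checked mechanically with the standard attribute-closure algorithm: AB → GH is implied by F if and only if GH ⊆ (AB)+. A minimal Python sketch using the same FD set:

```python
# F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H}, each FD as (lhs, rhs) sets
fds = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
       ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

def closure(attrs, fds):
    """Grow the attribute set by applying FDs until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# (AB)+ contains G and H, confirming that AB -> GH is implied by F
print(sorted(closure({"A", "B"}, fds)))  # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
```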
Significance in Relational Database design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables and multiple
relationships between data elements can be defined and established in an ad-hoc manner. A
Relational Database Management System (RDBMS) is a database system made up of files with
data elements in a two-dimensional array (rows and columns). Such a system has the capability
to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables.
• The tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is deadlock How can it be avoided How can it be
resolved once it occurs Ans A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user It can be avoided in 2 ways 1 is to set measures which
prevent deadlocks from happening and 2 is to set ways in which to break the deadlock
after it happens One way to prevent or to avoid deadlocks is to require the user to request
all necessary locks atone time ensuring they gain access to everything they need or
nothing Secondly sometimes they can be avoided by setting resource access order
meaning resources must be locked in a certain order to prevent such instances Essentially
once a deadlock does occur the DBMS must have a method for detecting the deadlock
and then to resolve it the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available allowing one transaction to
complete while the other has to be reprocessed at a later time 921 Explain the meaning
of the expression ACID transaction
ACID stands for Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either complete fully or not at all; there should be nothing like a
semi-complete transaction. The database state should remain consistent after the completion of
the transaction. If there is more than one transaction, the transactions should be scheduled in
such a fashion that they remain in isolation from one another. Durability means that once a
transaction commits, its effects will persist even if there are system failures.

9.24 What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row that affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by the change will be locked from changes
until my change is complete. This isolates the change and ensures that data interaction remains
accurate and consistent, and is known as transaction-level consistency. The transaction being
changed, which may affect several other pieces of data or rows of input, can also affect how
those rows are read. Say I am processing a change to the tax rate in my state: my store clerk
should not be able to read the total cost of a blue shirt, because the total-cost row is affected by
any change in the tax-rate row. Essentially, how you deal with the reading and viewing of data
while a change is being processed but has not yet been committed is the transaction isolation
level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
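The lock-ordering prevention strategy from part (a) can be sketched with Python's standard threading module. This is an illustrative toy, not a DBMS lock manager; the transaction names and resources are invented for the example:

```python
import threading

# Two shared resources; a global ordering (here, by object id) is fixed in advance.
lock_a = threading.Lock()
lock_b = threading.Lock()

def acquire_in_order(*locks):
    """Acquire locks in one canonical order to rule out circular waits."""
    for lock in sorted(locks, key=id):
        lock.acquire()

def release_all(*locks):
    for lock in locks:
        lock.release()

results = []

def transaction(name):
    # Both transactions touch both resources, but always lock them in the
    # same global order, so neither can hold one lock while waiting for
    # the other in a cycle.
    acquire_in_order(lock_a, lock_b)
    try:
        results.append(name)
    finally:
        release_all(lock_a, lock_b)

t1 = threading.Thread(target=transaction, args=("T1",))
t2 = threading.Thread(target=transaction, args=("T2",))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # ['T1', 'T2'] — both transactions complete
```

Because both transactions always request the locks in the same global order, the circular wait needed for a deadlock can never form.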
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
• Binary locks - A lock on a data item can be in two states: it is either locked or unlocked.
• Shared/exclusive - This type of locking mechanism differentiates the locks based on their
use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock;
allowing more than one transaction to write to the same data item would lead the database into
an inconsistent state. Read locks are shared, because no data value is being changed.
There are four types of lock protocol available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock a data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
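The growing and shrinking phases can be sketched in a few lines of Python (a minimal illustration, not a real lock manager; the class name is invented):

```python
# Minimal sketch of two-phase locking for one transaction: locks may be
# acquired only while the transaction is growing; the first release
# switches it to shrinking, after which no new lock may be requested.

class Transaction2PL:
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True  # growing phase ends at the first release
        self.held.discard(item)

t = Transaction2PL()
t.lock("A")
t.lock("B")
t.unlock("A")        # shrinking phase begins here
try:
    t.lock("C")      # illegal under 2PL
    violated = False
except RuntimeError:
    violated = True
print(violated)  # True
```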
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock after using it: it holds all the locks until the commit point and releases them
all at once. Strict-2PL therefore does not suffer from the cascading aborts that 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as the timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at execution
time, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 00:02 is older than all transactions that
come after it; for example, a transaction y entering the system at 00:04 is two seconds younger,
and priority is given to the older one.
In addition, every data item carries the latest read-timestamp and write-timestamp. These let the
system know when the last read and write operations were performed on the data item.
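A minimal sketch of these timestamp checks in Python (illustrative only; a real scheduler would also restart rolled-back transactions and may apply refinements such as Thomas's write rule):

```python
# Each data item tracks the newest read- and write-timestamps; an
# operation arriving from an "older" transaction after a younger one
# has already touched the item is rejected (the transaction would be
# rolled back and restarted with a new timestamp).

class DataItem:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far
        self.value = None

def read(item, ts):
    if ts < item.write_ts:             # a younger txn already wrote: too late
        return "rollback"
    item.read_ts = max(item.read_ts, ts)
    return item.value

def write(item, ts, value):
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"              # a younger txn already read or wrote
    item.write_ts = ts
    item.value = value
    return "ok"

x = DataItem()
print(write(x, ts=2, value=10))   # ok
print(read(x, ts=3))              # 10 — a younger reader is fine
print(write(x, ts=1, value=99))   # rollback — older writer arrives too late
```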
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include
• Restricting unauthorized access and use by implementing strong and multifactor access
and data-management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial-of-service (DDoS) attack or user overload
• Physical security of the database server and backup equipment against theft and natural
disasters
• Reviewing the existing system for known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
• Flat data - Data was usually represented in a tabular format, with strings or numbers in each
field.
• Multiple users - A conventional database needed to support more than one user or system
logged into the same data at the same time.
• Transactions - An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called ACID
properties: Atomicity, Consistency, Isolation and Durability.
• Large, long-lived data - A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal
representation for a knowledge base is an object model (often called an ontology in the
artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was no critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and that they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge; for example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example to represent
the statement that "all humans are mortal". A database typically could not represent this general
knowledge, but would instead need to store information about thousands of specific humans.
Representing that all humans are mortal, and being able to reason about any given human that
they are mortal, is the work of a knowledge base; representing that George, Mary, Sam, Jenna,
Mike and hundreds of thousands of other customers are all humans with specific ages, sex,
address, etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged: systems designed from the
ground up to support object-oriented capabilities, but also to support standard
database services as well. On the other hand, large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution of the term knowledge base was driven by the Internet. With the rise of the
Internet, documents, hypertext and multimedia support became critical for any corporate
database. It was no longer enough to support large tables of data or relatively small objects that
lived primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge management
vendors, such as Lotus Notes. Knowledge management actually predated the Internet, but with
the Internet there was great synergy between the two areas. Knowledge management products
adopted the term knowledge base to describe their repositories, but the meaning had a subtle
difference. In the case of previous knowledge-based systems, the knowledge was primarily for
the use of an automated system, to reason about and draw conclusions about the world. With
knowledge management products, the knowledge was primarily meant for humans, for example
to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system performing automated reasoning, or knowledge-based in the sense of knowledge
management providing knowledge in the form of documents and media that could be leveraged
by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations
without them conflicting with one another. Concurrent access is quite easy if all users are just
reading data, as there is no way they can interfere with one another. However, any practical
database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you make sure that database transactions are performed concurrently without
violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data are
executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring
only partially, which can cause greater problems than rejecting the whole series outright. As a
consequence, the transaction cannot be observed to be in progress by another database client: at
one moment in time it has not yet happened, and at the next it has already occurred in whole (or
nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and depositing it into
account B. Performing these operations in an atomic transaction ensures that the database
remains in a consistent state, that is, money is neither lost nor created if either of those two
operations fails.
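The transfer example can be sketched with Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on error. The schema and account names are invented for the example:

```python
import sqlite3

# Hypothetical two-account schema; a transfer either fully commits
# or is fully rolled back, so money is neither lost nor created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # one atomic transaction: commit on success, rollback on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM account WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

print(transfer(conn, "A", "B", 30))    # True  — A=70, B=80
print(transfer(conn, "A", "B", 1000))  # False — rolled back, balances unchanged
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
print(total)  # 150 — the total is preserved either way
```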
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part
that is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer uses
either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
The data definition language defines and declares database objects, while the data manipulation
language performs operations on these objects. The data control language is used to control the
user's access to database objects.
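As a small illustration of these three sub-languages, here is a sketch using Python's built-in sqlite3 module (SQLite has no user accounts, so the DCL statement appears only as a comment of what it would look like in a server DBMS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on the object.
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
rows = conn.execute("SELECT name FROM student").fetchall()
print(rows)  # [('Asha',)]

# DCL: controls user access. SQLite has no users, but in a server DBMS
# it would look like:  GRANT SELECT ON student TO clerk;
```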
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieval) on the database. The components of the DBMS perform these requested operations on
the database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Definition Language compiler processes schema definitions
specified in the DDL and records metadata such as the names of the files, the data items, the
storage details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the
best way to execute the query, and sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also
known as the Database Control System.
The main functions of the Data Manager are:
• Convert operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the Query Processor), from the
user's logical view to the physical file system.
• Control DBMS information access that is stored on disk.
• Handle buffers in main memory.
• Enforce constraints to maintain the consistency and integrity of the data.
• Synchronize the simultaneous operations performed by concurrent users.
• Control the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the
database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes,
and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data
definitions are changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access
paths, and file and record sizes.
5. Access authorization - descriptions of database users, their responsibilities and their
access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation and accuracy,
and may be used as an important part of the DBMS.
Importance of the Data Dictionary - the data dictionary is necessary in databases for the
following reasons:
• It improves the DBA's control over the information system and the users' understanding
of the system.
• It helps in documenting the database design process by storing documentation of the
results of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6. Compiled DML - The DML compiler converts high-level queries into low-level file-access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users - Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the ATM user, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of responses is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users - These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers - Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator - Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different
application systems. This stresses the importance of multiple applications sharing data: the
database becomes a common resource for an agency. It implies separation of physical storage
from use of the data by an application program, i.e., program/data independence. The user,
programmer or application specialist need not know the details of how the data are stored; such
details are transparent to the user. Changes can be made to the data without affecting other
components of the system, e.g., changing the format of data items (real to integer arithmetic),
changing the file structure (reorganizing data internally or changing the mode of access), or
relocating data from one device to another (e.g., from optical to magnetic storage, or from tape
to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group
maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated when the same data is updated in different files.
• Time wasted in entering the same data again and again.
• Computer resources used needlessly.
• Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to another file.
This may lead to inconsistent data, so this duplication of data across multiple files must be
removed to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
conventional systems, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it
easy to respond to unanticipated information requests.
Centralizing the data in a database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS should
allow users who do not know programming to interact with the data more easily, unlike a file
processing system, where the programmer may need to write new programs to meet every new
demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in the
database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce integrity
constraints.
In conventional systems, because data is duplicated in multiple files, updates may sometimes
leave incorrect data in some of the files where the data exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the
purposes of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad-hoc/temporary manner. Often different systems of an organization access different
components of the operational data, and in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to which parts of the database, and
different checks can be established for each type of access (retrieve, modify, delete, etc.) to
each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own needs,
as the most important. Once a database has been set up with centralized control, it becomes
necessary to identify the organization's requirements and to balance the needs of the competing
units. It may become necessary to ignore some requests for information if they conflict with
higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, the
overall cost of setting up the database and developing and maintaining application programs is
normally expected to be far lower than for a similar service using conventional systems, since
the productivity of programmers can be higher using the non-procedural languages that have
been developed alongside DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of
an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database makes it possible to provide
schemes for backup and recovery from failures, including disk crashes, power failures and
software errors, which may help the database recover from an inconsistent state to the state that
existed prior to the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or designates) involved, and
should be validated with a "bottom-up" approach. It has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity - An entity may be defined as a thing which is recognized as being capable of an
independent existence and which can be uniquely identified. An entity is an abstraction from the
complexities of some domain. When we speak of an entity, we normally speak of some aspect of
the real world which can be distinguished from other aspects of the real world. An entity may be
a physical object such as a house or a car, an event such as a house sale or a car service, or a
concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type,
and there are usually many instances of an entity-type. Because the term entity-type is somewhat
cumbersome, most people tend to use the term entity as a synonym.
Attributes - An attribute is a characteristic of an entity. A Student entity's attributes are, for
example, student ID, student name, address, etc.
Attributes are of various types:
• Simple/single attributes
• Composite attributes
• Multivalued attributes
• Derived attributes
Relationship - A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs linking two or more nouns. Examples: an owns
relationship between a company and a computer, a supervises relationship between an employee
and a department, a performs relationship between an artist and a song, a proved relationship
between a mathematician and a theorem. Relationships are represented as diamonds connected
by lines to each of the entities in the relationship. The types of relationships are as follows:
• One to many: 1 ------- M
• Many to one: M ------- 1
• Many to many: M ------- M
Symbols and their meanings:
• Rectangles represent entity sets.
• Diamonds represent relationship sets.
• Lines link attributes to entity sets and entity sets to relationship sets.
• Ellipses represent attributes.
• Double ellipses represent multivalued attributes.
• Dashed ellipses denote derived attributes.
• An underline indicates primary-key attributes.
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
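The Customer entity above can be mapped to relational tables: composite attributes (name, address) are flattened into simple columns, and the multivalued phone_number attribute moves into its own table. A minimal sketch using Python's built-in sqlite3 module (the column layout shown is one possible mapping, not prescribed by the question paper):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (                 -- composite attributes flattened
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,
    street_name TEXT, street_number TEXT, apartment_number TEXT
);
CREATE TABLE customer_phone (           -- multivalued attribute: one row per number
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) VALUES (1, 'Asha', 'Rao')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0101')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0102')")
phones = conn.execute("SELECT COUNT(*) FROM customer_phone WHERE customer_id = 1").fetchone()[0]
print(phones)   # one customer, two phone numbers
```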
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files, and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
Q3 EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
Q4 EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot be further decomposed losslessly into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
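For the Buying example, the decomposition into Buyer-Vendor, Buyer-Item, and Vendor-Item can be checked mechanically: joining the three projections back on their common columns reproduces exactly the five original rows (the join is lossless for this sample data). A sketch with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT);
INSERT INTO buying VALUES
 ('Sally','Liz Claiborne','Blouses'),
 ('Mary','Liz Claiborne','Blouses'),
 ('Sally','Jordach','Jeans'),
 ('Mary','Jordach','Jeans'),
 ('Sally','Jordach','Sneakers');
-- the three projections of the 5NF decomposition
CREATE TABLE buyer_vendor AS SELECT DISTINCT buyer, vendor FROM buying;
CREATE TABLE buyer_item   AS SELECT DISTINCT buyer, item   FROM buying;
CREATE TABLE vendor_item  AS SELECT DISTINCT vendor, item  FROM buying;
""")
# Joining the projections back on their common columns:
rejoined = conn.execute("""
 SELECT DISTINCT bv.buyer, bv.vendor, vi.item
 FROM buyer_vendor bv
 JOIN vendor_item vi ON bv.vendor = vi.vendor
 JOIN buyer_item  bi ON bi.buyer = bv.buyer AND bi.item = vi.item
""").fetchall()
print(len(rejoined))   # same five rows as the original buying table
```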
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Figure: IMS system architecture. Application programs A and B, each written in a host language plus DL/I calls, access the database through their own PSBs (PSB-A, PSB-B), each consisting of PCBs; the IMS control program maps these, via the DBDs, onto the stored physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of functional dependencies used in normalization:
They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency.
They hold for all time.
They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
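One standard way to test whether a dependency in Y is implied by the dependencies in X is to compute an attribute closure. A short illustrative Python sketch (the helper name `closure` and the sample relation are my own, not from the paper):

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under functional dependencies.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    X -> Y is implied by fds iff Y is a subset of closure(X, fds).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already derivable, add the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example: R(A, B, C) with A -> B and B -> C.
fds = [({'A'}, {'B'}), ({'B'}, {'C'})]
print(sorted(closure({'A'}, fds)))   # A -> C is implied transitively
```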
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, each corresponding to a specific normal form with known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation; or
There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found using a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
(b) Differentiate between the network and hierarchical data models in DBMS.
Ans: Hierarchical model
1. One-to-many or one-to-one relationships.
2. Based on parent-child relationships.
3. Retrieval algorithms are complex and asymmetric.
4. Data redundancy: more.
Network model
1. Many-to-many relationships.
2. Many parents as well as many children.
3. Retrieval algorithms are complex and symmetric.
4. Data redundancy: less.
Relational model
1. One-to-one, one-to-many, and many-to-many relationships.
2. Based on relational data structures.
3. Retrieval algorithms are simple and symmetric.
4. Data redundancy: less.
OR
(c) Draw an E-R diagram for a Library Management System.
Ans
(d) State advantages and disadvantages of following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1 A sequential file is designed for efficient processing of records in sorted order on some
search key
o Records are chained together by pointers to permit fast retrieval in search key
order
o Pointer points to next record in order
o Records are stored physically in search key order (or as close to this as possible)
o This minimizes number of block accesses
o Figure 10.15 shows an example with bname as the search key
2 It is difficult to maintain physical sequential order as records are inserted and deleted
o Deletion can be managed with the pointer chains
o Insertion poses problems if no space where new record should go
o If space use it else put new record in an overflow block
o Adjust pointers accordingly
o Figure 10.16 shows the previous example after an insertion
o Problem we now have some records out of physical sequential order
o If very few records in overflow blocks this will work well
o If order is lost reorganize the file
o Reorganizations are expensive and done when system load is low
3 If insertions rarely occur we could keep the file in physically sorted order and reorganize
when insertion occurs In this case the pointer fields are no longer required
The Sequential File
Fixed format used for records
Records are the same length
All fields the same (order and length)
Field names and lengths are attributes of the file
One field is the key field; it uniquely identifies the record
Records are stored in key sequence
New records are placed in a log file or transaction file, and a batch update is performed periodically to merge the log file with the master file.
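The batch-update step can be sketched as a single sequential merge pass over the two files, both sorted on the key field. An illustrative Python fragment (the sample records are invented):

```python
import heapq

# Both files are kept sorted on the key field, so the new master file is
# produced by a single sequential merge pass, O(n + m) record reads.
master = [(101, 'Ames'), (204, 'Breen'), (350, 'Chen')]   # sorted on key
log    = [(150, 'Diaz'), (400, 'Evans')]                  # sorted on key

new_master = list(heapq.merge(master, log))               # one merge pass
print([k for k, _ in new_master])   # keys remain in sequence
```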
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g., 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true, based on the use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e., a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true, we write:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To retrieve a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Two quantifiers can be used to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means: 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and located in London.'
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means: 'For all Branch tuples, the address is not in Paris.'
We can also write ~(∃B)(B.city = 'Paris'), which means: 'There are no branches with an address in Paris.'
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
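A tuple relational calculus expression such as {S | Staff(S) ∧ S.salary > 10000} corresponds directly to a declarative SQL query. A runnable sketch using Python's sqlite3 module (the Staff rows here are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT, fName TEXT, lName TEXT, position TEXT, salary INTEGER);
INSERT INTO Staff VALUES
 ('SL21','John','White','Manager',30000),
 ('SG37','Ann','Beech','Assistant',12000),
 ('SG14','David','Ford','Supervisor',18000),
 ('SA9','Mary','Howe','Assistant',9000);
""")
# {S | Staff(S) AND S.salary > 10000}  ==  SELECT ... FROM Staff WHERE salary > 10000
rows = conn.execute("SELECT staffNo FROM Staff WHERE salary > 10000").fetchall()
print(sorted(r[0] for r in rows))   # everyone except the $9,000 earner
```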
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any, In, Contains, All, Not In, Not Contains, Exists, Union, Minus, Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement
Data must be entered later using INSERT
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified
Columns which are defined as primary keys will never have two rows with the same key
value
A primary key may consist of more than one column (values unique in combination); this is called a composite key.
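A composite key can be demonstrated with a hypothetical shipment table SP (a companion to S above; SP is my own example, not defined in the paper). Rows sharing one key column are allowed, but a duplicate combination is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SP (                 -- hypothetical shipment table
    SNO CHAR(5),
    PNO CHAR(6),
    QTY DECIMAL(9),
    PRIMARY KEY (SNO, PNO)        -- composite key: unique in combination
)""")
conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")
conn.execute("INSERT INTO SP VALUES ('S1', 'P2', 200)")      # same SNO, new PNO: allowed
try:
    conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 400)")  # duplicate (SNO, PNO)
    result = "accepted"
except sqlite3.IntegrityError:
    result = "rejected"
print(result)
```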
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting) deleting and modifying (updating) data in a database A DML is often
a sublanguage of a broader database language such as SQL with the DML comprising some of
the operators in the language[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL) but it is closely related and sometimes also
considered a component of a DML some operators may perform both selecting (reading) and
writing
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those used by IMS/DL/I and CODASYL databases such as IDMS, among others.
In SQL the data manipulation language comprises the SQL-data change statements[3] which
modify stored data but not the schema or database objects Manipulation of persistent database
objects eg tables or stored procedures via the SQL schema statements[3] rather than the data
stored within them is considered to be part of a separate data definition language (DDL) In SQL
these two categories are similar in their detailed syntax data types expressions etc but distinct
in their overall function[3]
The SQL-data change statements are a subset of the SQL-data statements; the latter also contains the SELECT query statement,[3] which strictly speaking is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML, because it manipulates (i.e., modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT ... FROM ... WHERE (strictly speaking, DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE
DELETE FROM ... WHERE
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00');
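The four DML verbs can be exercised end-to-end with Python's sqlite3 module. A sketch reusing the employees table from the INSERT example above (column types are my own assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")
# INSERT INTO ... VALUES
conn.execute("INSERT INTO employees VALUES ('John', 'Capita', 'xcapit00')")
# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = 'Kapital' WHERE fname = 'xcapit00'")
# SELECT ... FROM ... WHERE
row = conn.execute("SELECT last_name FROM employees WHERE fname = 'xcapit00'").fetchone()
print(row[0])
# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(remaining)
```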
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are already applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules, but are pertinent to database designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is for each row to have a unique identity, so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition on the input relation(s), and only the selected rows are then used in the cross product to be merged and included in the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation; here, only selected rows of a relation go into the cross product with the second relation.
If R and S are two relations, then θ is the condition applied in the select operation on one relation, and only the selected rows are then cross-producted with all the rows of the second relation. For example, given the two relations FACULTY and COURSE, we first apply the select operation on the FACULTY relation to select certain specific rows; these rows then form a cross product with the COURSE relation. This is the difference between a cross product and a theta join. Looking first at both relations and their attributes, and then at the cross product after carrying out the select operation, the difference between cross product and theta join becomes clear.
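In SQL, a theta join is a join whose ON condition may use any comparison operator, not just equality. A sketch with invented FACULTY and COURSE data (column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FACULTY (fac_id INTEGER, fac_name TEXT, max_load INTEGER);
CREATE TABLE COURSE  (course_id TEXT, credits INTEGER);
INSERT INTO FACULTY VALUES (1, 'Khan', 3), (2, 'Iyer', 6);
INSERT INTO COURSE  VALUES ('CS101', 4), ('CS102', 5);
""")
# Theta join: the cross product FACULTY x COURSE, keeping only rows
# where the theta condition max_load > credits holds.
rows = conn.execute("""
 SELECT fac_name, course_id
 FROM FACULTY JOIN COURSE ON max_load > credits
""").fetchall()
print(sorted(rows))   # only Iyer's load exceeds either course's credits
```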
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship
In relationships data is linked between two or more tables This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary ndash or
parent ndash table) Because of this we need to ensure that data on both sides of the relationship
remain intact
So referential integrity requires that whenever a foreign key value is used it must reference a
valid existing primary key in the parent table
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there is no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records; otherwise we would end up with an orphaned record: a related table containing a foreign key value that doesn't exist in the primary key field of the primary table (e.g., the "CompanyId" field).
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
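These protections can be seen in action once foreign-key enforcement is switched on. A sqlite3 sketch reusing the CompanyId example (table and column names are illustrative; note that SQLite enforces foreign keys only after the PRAGMA below):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite needs this enabled explicitly
conn.executescript("""
CREATE TABLE company (CompanyId INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (
    ProductId INTEGER PRIMARY KEY,
    CompanyId INTEGER REFERENCES company(CompanyId)
);
""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")

outcomes = []
try:
    conn.execute("DELETE FROM company WHERE CompanyId = 15")   # would orphan product 1
except sqlite3.IntegrityError:
    outcomes.append("delete rejected")
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")         # no parent with key 99
except sqlite3.IntegrityError:
    outcomes.append("insert rejected")
print(outcomes)
```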
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain the following in detail with examples.
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type and allowed to have one of two known code values (M for male, F for female), plus NULL for records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value (excluding NULL). Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
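Such a check constraint might look as follows in a sqlite3 sketch (the table name and columns are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')),   -- enumerated domain
    salary NUMERIC CHECK (salary > 0)           -- values must be greater than zero
)""")
conn.execute("INSERT INTO person VALUES ('Ann', 'F', 52000)")   # within the domain
try:
    conn.execute("INSERT INTO person VALUES ('Bob', 'X', 40000)")  # outside the domain
    result = "accepted"
except sqlite3.IntegrityError:
    result = "rejected"
print(result)
```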
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last one is written M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; if it does, you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities, an employee works in one department but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely survive in a final design; normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a team of many employees.
Therefore there is a many-to-many relationship between employee and project.
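In a relational schema, an M:N relationship such as employee-project is resolved with a junction table, turning it into two 1:M relationships. An illustrative sqlite3 sketch (table and column names are my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
-- junction table resolving the M:N relationship into two 1:M ones
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Lena'), (2, 'Omar');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Intranet');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);   -- Lena is on two projects
""")
n = conn.execute("SELECT COUNT(*) FROM works_on WHERE emp_id = 1").fetchone()[0]
print(n)   # number of projects employee 1 works on
```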
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL) the group responsible for standardization of the
programming language COBOL The DBTG final report appeared in April 1971; it
introduced a new distinct and self-contained language The DBTG is intended to meet the
requirements of many distinct programming languages not just COBOL the user in a
DBTG system is considered to be an ordinary application programmer and the language
therefore is not biased toward any single specific programming language
(b) It is based on network model In addition to proposing a formal notation for networks (the
Data Definition Language or DDL) the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of conceptual scheme that
was itself defined using the Data Definition Language It also proposed a Data
Manipulation Language (DML) suitable for writing applications programs that
manipulate the conceptual scheme or a view
(c) Architecture of DBTG Model
(d) The architecture of a DBTG system is illustrated in Figure
(e) The architecture of DBTG model can be divided in three different levels as the
architecture of a database system These are
(f) • Storage Schema (corresponds to Internal View of database)
(g) • Schema (corresponds to Conceptual View of database)
(h) • Subschema (corresponds to External View of database)
(i) Storage Schema
(j) The storage structure (Internal View) of the database is described by the storage schema
written in a Data Storage Description Language (DSDL)
(k) Schema
(l) In DBTG the Conceptual View is defined by the schema The schema consists
essentially of definitions of the various type of record in the database the data-items they
contain and the sets into which they are grouped (Here logical record types are referred
to as record types; the fields in a logical record format are called data items)
(m) Subschema
(n) The External view (not a DBTG term) is defined by a subschema A subschema consists
essentially of a specification of which schema record types the user is interested in which
schema data-items he or she wishes to see in those records and which schema
relationships (sets) linking those records he or she wishes to consider By default all
other types of record data-item and set are excluded
(o) In DBTG model the users are application programmers writing in an ordinary
programming language such as COBOL that has been extended to include the DBTG
data manipulation language Each application program invokes the corresponding
subschema using the COBOL Data Base Facility for example the programmer simply
specifies the name of the required subschema in the Data Division of the program This
invocation provides the definition of the user work area (UWA) for that program The
UWA contains a distinct location for each type of record (and hence for each type of data-
item) defined in the subschema The program may refer to these data-item and record
locations by the names defined in the subschema
Q5
EITHER
(a) Define Normalization Explain first and second normal form
Ans Normalization The process of decomposing unsatisfactory (bad) relations by
breaking up their attributes into smaller relations
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2 non-first normal form
1NF R is in 1NF iff all domain values are atomic
2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data along with copy of the original key attribute(s) into a
separate relation
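The second option (moving the repeating group into a separate relation, carrying a copy of the original key) can be sketched with plain data structures; the student/course data below is invented for illustration:

```python
# Unnormalised data: one row per key value, with a repeating group of courses.
unf = [
    {"student_id": 1, "name": "Mary", "courses": ["DBMS", "OS"]},
    {"student_id": 2, "name": "John", "courses": ["DBMS"]},
]

# 1NF: place the repeating data in a separate relation, together with
# a copy of the original key attribute (student_id).
student = [{"student_id": r["student_id"], "name": r["name"]} for r in unf]
enrolment = [
    {"student_id": r["student_id"], "course": c}
    for r in unf
    for c in r["courses"]
]

print(student)
print(enrolment)
```

After the split, every attribute value in both relations is atomic, which is exactly the 1NF condition.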
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
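These steps can be sketched on an invented enrolment relation whose composite key is (student_id, course) and where `name` depends only on student_id, a partial dependency that violates 2NF:

```python
# 1NF relation with composite key (student_id, course); "name" is
# functionally dependent on student_id alone (a partial dependency).
enrolment_1nf = [
    ("S1", "Mary", "DBMS", "A"),
    ("S1", "Mary", "OS",   "B"),
    ("S2", "John", "DBMS", "A"),
]

# Remove the partial dependency: place (student_id, name) in a new
# relation together with a copy of its determinant, student_id.
student = sorted({(sid, name) for sid, name, _, _ in enrolment_1nf})
grade   = [(sid, course, g) for sid, _, course, g in enrolment_1nf]

print(student)
print(grade)
```

In the resulting `grade` relation every non-key attribute depends on the whole key, so both relations are in 2NF.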
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with suitable example
Ans
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies
1 NF2 non-first normal form
2 1NF R is in 1NF iff all domain values are atomic
3 2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4 3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively
dependent on the key
5 BCNF R is in BCNF iff every determinant is a candidate key
6 Determinant an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on something other than a superset of a candidate key A table is said to be in
4NF if and only if it is in BCNF and every multi-valued dependency is a functional
dependency The 4NF removes unwanted data structures: multi-valued dependencies
There is no multivalued dependency in the relation
There are multivalued dependencies, but the attributes are dependent between themselves
Either of these conditions must hold true for a relation to be in fourth normal form
The relation must also be in BCNF Fourth normal form differs from BCNF only in that it
uses Multivalued dependencies
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrongrsquos Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1 Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}
We want to show Street Zip → Street Zip City
Proof
1 Zip → City - Given
2 Street Zip → Street City - Augmentation of (1) by Street
3 City Street → Zip - Given
4 City Street → City Street Zip - Augmentation of (3) by City Street
5 Street Zip → City Street Zip - Transitivity of (2) and (4)
[From Maier]
1 Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}
Show that AB → GH is derived by F
1 AB → E - Given
2 AB → AB - Reflexivity
3 AB → B - Projectivity from (2)
4 AB → BE - Additivity from (1) and (3)
5 BE → I - Given
6 AB → I - Transitivity from (4) and (5)
7 E → G - Given
8 AB → G - Transitivity from (1) and (7)
9 AB → GI - Additivity from (6) and (8)
10 GI → H - Given
11 AB → H - Transitivity from (9) and (10)
12 AB → GH - Additivity from (8) and (11)
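A derivation like the one above can be checked mechanically by computing the attribute closure, which repeatedly applies the given FDs until no new attributes are added. The `closure` helper below is a sketch of that standard algorithm, not code from the source:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (left, right) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for left, right in fds:
            # If the left side is already in the closure, pull in the right side.
            if left <= result and not right <= result:
                result |= right
                changed = True
    return result

# Maier's example: F = {AB->E, AG->J, BE->I, E->G, GI->H}
fds = [(frozenset(l), frozenset(r)) for l, r in
       [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]]

# AB -> GH holds iff GH is contained in the closure of AB.
print(closure("AB", fds) >= set("GH"))
```

Running it confirms the twelve-step proof: G and H (and also E, I and J) all appear in the closure of AB.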
Significance in Relational Database design A database structure commonly used in GIS in
which data is stored based on two-dimensional tables where multiple relationships between data
elements can be defined and established in an ad-hoc manner
Relational Database Management System - a database system made up of files with data elements in two-dimensional array (rows
and columns) This database management system has the capability to recombine data elements
to form different relations resulting in a great flexibility of data usage
A database that is perceived by the user as a collection of two-dimensional tables
• Are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases Proposed by Dr Codd in 1970
• The basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock How can it be avoided How can it be
resolved once it occurs Ans A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user It can be avoided in 2 ways 1 is to set measures which
prevent deadlocks from happening and 2 is to set ways in which to break the deadlock
after it happens One way to prevent or to avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing Secondly sometimes they can be avoided by setting resource access order,
meaning resources must be locked in a certain order to prevent such instances Essentially
once a deadlock does occur the DBMS must have a method for detecting the deadlock
and then to resolve it the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time
Explain the meaning of the expression ACID transaction
ACID means Atomicity, Consistency, Isolation, Durability so when any transaction happens it
should be Atomic: it should either be complete or fully incomplete There should not
be anything like semi-complete The database state should remain consistent after the
completion of the transaction If there is more than one transaction then the transactions
should be scheduled in such a fashion that they remain in isolation of one another Durability
means that once a transaction commits, its effects will persist even if there are system failures
What is the purpose of transaction isolation levels
Transaction isolation levels affect how the database is to operate while transactions are in process of being
changed Their purpose is to ensure consistency throughout the database; for example if I
am changing a row which affects the calculations or outputs of several other rows then
all rows that are affected or possibly affected by a change in the row I'm working on will
be locked from changes until I am complete with my change This isolates the change and
ensures that the data interaction remains accurate and consistent and is known as
transaction-level consistency The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt because the total cost row is affected by any changes in
the tax rate row Essentially how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level Its purpose is to ensure that no one is misinformed prior to a transaction
being committed
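The resource-ordering rule for avoiding deadlock described above can be sketched with ordinary threading locks standing in for locked data items; the ordering key and the transaction bodies here are illustrative:

```python
import threading

# Two "data items"; acquiring them in one fixed global order prevents
# the circular wait that causes deadlock.
lock_a = threading.Lock()
lock_b = threading.Lock()

def transaction(locks, work):
    # Resource-ordering rule: always acquire in a fixed, globally agreed
    # order (sorting by id() is one arbitrary but consistent choice).
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    try:
        work()  # run the "transaction" while holding every lock it needs
    finally:
        for lk in reversed(ordered):
            lk.release()

results = []
t1 = threading.Thread(target=transaction,
                      args=([lock_a, lock_b], lambda: results.append("t1")))
t2 = threading.Thread(target=transaction,
                      args=([lock_b, lock_a], lambda: results.append("t2")))
t1.start(); t2.start(); t1.join(); t2.join()
print(sorted(results))  # both transactions complete; no deadlock
```

Although the two threads request the locks in opposite orders, both acquire them in the same global order, so neither can end up holding one lock while waiting forever for the other.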
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity isolation and serializability of concurrent
transactions Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or
unlocked
Shared/exclusive: This type of locking mechanism differentiates the locks based on
their uses If a lock is acquired on a data item to perform a write operation it is an
exclusive lock Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state Read locks are shared because no data
value is being changed
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed Transactions may unlock the data item after completing the
'write' operation
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking 2PL
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it and the ordering is determined by the age
of the transaction A transaction created at 0002 clock time would be older than all other
transactions that come after it For example any transaction y entering the system at 0004 is
two seconds younger and the priority would be given to the older one
In addition every data item is given the latest read and write-timestamp This lets the system
know when the last 'read and write' operation was performed on the data item
OR
(b) Explain database security mechanisms8
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Loadstress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing existing system for any known or unknown vulnerabilities and defining and
implementing a road mapplan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well On the other hand the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans Concurrency control is the procedure in DBMS for managing simultaneous
operations without conflicting with one another Concurrent access is quite easy if all
users are just reading data There is no way they can interfere with one another Though any practical database would have a mix of READ and WRITE operations and
hence the concurrency is a challenge
Concurrency control is used to address such conflicts which mostly occur with a multi-
user system It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of respective databases
Therefore concurrency control is a most important element for the proper functioning of a system where two or multiple database transactions that require access to the same data
are executed simultaneously
(ii) Atomicity property
In database systems atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs[1] A guarantee of atomicity prevents updates to the database
occurring only partially which can cause greater problems than rejecting the whole series
outright As a consequence the transaction cannot be observed to be in progress by another
database client At one moment in time it has not yet happened and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress)
An example of an atomic transaction is a monetary transfer from bank account A to account B It consists of two operations: withdrawing the money from account A and saving it to account B
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails
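The bank-transfer example can be sketched in SQLite, whose connection context manager commits the transaction on success and rolls it back on failure; the table and account names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    try:
        with conn:  # the with-block is one atomic transaction
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE name = 'A'", (amount,))
            # Simulate a crash between the withdrawal and the deposit:
            raise RuntimeError("crash mid-transfer")
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE name = 'B'", (amount,))  # never reached here
    except RuntimeError:
        pass  # the whole transaction has been rolled back

transfer(30)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # money neither lost nor created
```

Because the failure happens inside the atomic block, the withdrawal is undone along with everything else, and both balances are exactly as they were before the transfer started.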
(B) Give the three level architecture proposal for DBMS
Ans Objective of three level architecture proposal for DBMS
All users should be able to access same data
A users view is immune to changes made in other views
Users should not need to know physical database storage details
DBA should be able to change database storage structures without affecting the users views
Internal structure of database should be unaffected by changes to physical aspects of storage
DBA should be able to change conceptual structure of database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
Above three points are explain in detail given bellow-
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as Visual FoxPro or MS Access
The end user uses a query language to access data from the database A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database object while the data
manipulation language performs operations on these objects The data control language is used to
control the user's access to database objects
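The DDL/DML split can be illustrated in SQLite; note that SQLite is serverless and has no user accounts, so DCL statements such as GRANT and REVOKE exist only in client-server DBMSs, and the table below is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
conn.execute("CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, dname TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO dept VALUES (10, 'Accounts')")
conn.execute("UPDATE dept SET dname = 'Accounting' WHERE dept_no = 10")
row = conn.execute("SELECT dname FROM dept WHERE dept_no = 10").fetchone()
print(row[0])

# DCL (GRANT/REVOKE) would control access rights to dept, but SQLite,
# having no notion of users, does not support it.
```

The CREATE statement changes the schema, while the INSERT/UPDATE/SELECT statements operate on the data held under that schema, which is exactly the DDL versus DML distinction.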
Conceptual Level - This level comes between the external and the internal levels The
conceptual level represents the entire database as a whole and is used by the DBA This level is
the view of the data "as it really is" The user's view of the data is constrained by the language
that they are using At the conceptual level the data is viewed without any of these constraints
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
So that objective of three level of architecture proposal for DBMS are suitable explain in
above
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS also knows
as Database Control System
The Main Functions Of Data Manager Are ndash
Convert operations in users Queries coming from the application programs or combination of
DML Compiler and Query optimizer which is known as Query Processor from users logical view
to physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - Data Dictionary is a repository of description of data in the database It
contains information about
1 Data - names of the tables, names of attributes of each table, length of attributes, and number of rows in each table
2 Relationships between database transactions and data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed
3 Constraints on data, ie range of values permitted
4 Detailed information on physical database design such as storage structure, access paths, files and record sizes
5 Access Authorization - the description of database users, their responsibilities and their access rights
6 Usage statistics such as frequency of query and transactions
7 Data dictionary is used to actually control the data integrity, database operation and accuracy It may be used as an important part of the DBMS
Importance of Data Dictionary -
Data Dictionary is necessary in the databases due to following reasons:
• It improves the control of DBA over the information system and users' understanding of use of the system
• It helps in documenting the database design process by storing documentation of the result of every design phase and design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high level queries into low level file access
commands known as compiled DML
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve User Naïve users need not be aware of the presence of the database system or any other system A user of an automatic teller machine falls under this category The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of her or his own accounts Other such naïve users are those where the type and range of response is always indicated to the user Thus a very competent database designer could be allowed to use a particular database system only as a naïve user
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence. The user, programmer or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
Duplication of the same data in different files.
Wastage of storage space, since duplicated data is stored.
Errors generated due to updating of the same data in different files.
Time wasted in entering the same data again and again.
Needless use of computer resources.
Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since data in the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization would access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database: different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work the most important, and therefore its own needs the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may thus become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages developed for DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand; the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process with all business managers (or their designates) involved, and should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world An entity may be a physical object such as a house or a car an event such as a house sale or a car service or a concept such as a customer transaction or order
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name, address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many:  1 -------< M
Many to one:  M >------- 1
Many to many: M >------< M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
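The Customer entity above can be mapped to a relational table by flattening its composite attributes (name, address, street) into individual columns. A minimal sketch using Python's sqlite3 module; the column layout and sample data are illustrative assumptions, not part of the original answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite attributes are flattened into separate columns;
# customer_id remains the primary key.
conn.execute("""
    CREATE TABLE Customer (
        customer_id   INTEGER PRIMARY KEY,
        first_name    TEXT,
        middle_name   TEXT,
        last_name     TEXT,
        phone_number  TEXT,
        date_of_birth TEXT,
        city          TEXT,
        state         TEXT,
        zip_code      TEXT,
        street_name   TEXT,
        street_number TEXT
    )
""")

# Hypothetical sample row
conn.execute(
    "INSERT INTO Customer VALUES (1, 'Asha', NULL, 'Rao', '555-0101', "
    "'1990-04-12', 'Nagpur', 'MH', '440001', 'Main St', '12')"
)

row = conn.execute(
    "SELECT first_name, city FROM Customer WHERE customer_id = 1"
).fetchone()
print(row)  # ('Asha', 'Nagpur')
```

Multivalued attributes (e.g. several phone numbers) would instead go into a separate table keyed on customer_id, and derived attributes (e.g. age from date_of_birth) are computed rather than stored.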
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we may get a set of records which satisfy the given value.
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE3- EITHER
(A) Let R(A,B,C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot be non-loss decomposed any further into smaller tables. Another way of expressing this is that every join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Architecture diagram: Applications A and B, each written in a host language with DL/I calls, access the database through their program specification blocks (PSB-A, PSB-B), each consisting of PCBs; the IMS control program maps these via DBDs onto the stored physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD), which also defines the mapping of the physical database to storage. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of functional dependencies used in normalization: they have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and has the property that every functional dependency in Y is implied by the functional dependencies in X.
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that the relation meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
One of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive details such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified point in time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
(d) State advantages and disadvantages of following file organizations
(i) Index-Sequential file
Ans
Sequential File Organization
1 A sequential file is designed for efficient processing of records in sorted order on some
search key
o Records are chained together by pointers to permit fast retrieval in search key
order
o Pointer points to next record in order
o Records are stored physically in search key order (or as close to this as possible)
o This minimizes number of block accesses
o Figure 10.15 shows an example with bname as the search key
2 It is difficult to maintain physical sequential order as records are inserted and deleted
o Deletion can be managed with the pointer chains
o Insertion poses problems if no space where new record should go
o If space use it else put new record in an overflow block
o Adjust pointers accordingly
o Figure 10.16 shows the previous example after an insertion
o Problem we now have some records out of physical sequential order
o If very few records in overflow blocks this will work well
o If order is lost reorganize the file
o Reorganizations are expensive and done when system load is low
3 If insertions rarely occur we could keep the file in physically sorted order and reorganize
when insertion occurs In this case the pointer fields are no longer required
The Sequential File
A fixed format is used for records: records are the same length, with all fields the same (order and length). Field names and lengths are attributes of the file.
One field is the key field; it uniquely identifies the record, and records are stored in key sequence.
New records are placed in a log file or transaction file, and a batch update is performed to merge the log file with the master file.
(ii) Direct file
Direct Access File System (DAFS) is a network file system similar to Network File System
(NFS) and Common Internet File System (CIFS) that allows applications to transfer data while
bypassing operating system control buffering and network protocol operations that can
bottleneck throughput DAFS uses the Virtual Interface (VI) architecture as its underlying
transport mechanism Using VI hardware an application transfers data to and from application
buffers without using the operating system which frees up the processor and operating system
for other processes and allows files to be accessed by servers using several different operating
systems DAFS is designed and optimized for clustered shared-file network environments that
are commonly used for Internet e-commerce and database applications DAFS is optimized for
high-bandwidth InfiniBand networks and it works with any interconnection that supports VI
including Fibre Channel and Ethernet
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true, based on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
We can use two quantifiers to tell how many instances the predicate applies to:
the existential quantifier ∃ ('there exists') and the universal quantifier ∀ ('for all').
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means: 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and which is located in London.'
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means: 'For all Branch tuples, the address is not in Paris.'
We can also use ~(∃B)(B.city = 'Paris'), which means: 'There are no branches with an address in Paris.'
Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
Data Manipulation in SQL
SELECT, UPDATE, DELETE, INSERT statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL joins: multiple-table queries
Set manipulation: ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later using INSERT.
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key
value.
A primary key may consist of more than one column (values unique in combination);
this is called a composite key.
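A composite key can be demonstrated concretely. The sketch below (via Python's sqlite3; the supplier-part table SP and its values are illustrative assumptions) shows that repeating one key column is fine, but repeating the combination is rejected:

```python
import sqlite3

# Composite (multi-column) primary key: values must be unique in combination.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SP (
        SNO CHAR(5),
        PNO CHAR(5),
        QTY DECIMAL(5),
        PRIMARY KEY (SNO, PNO)
    )
""")
conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")
conn.execute("INSERT INTO SP VALUES ('S1', 'P2', 200)")  # same SNO, new PNO: fine
try:
    conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 400)")  # repeats the (SNO, PNO) pair
    duplicate_pair_blocked = False
except sqlite3.IntegrityError:
    duplicate_pair_blocked = True
row_count = conn.execute("SELECT COUNT(*) FROM SP").fetchone()[0]
```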
(b) Explain Data Manipulation in SQL.
Ans:
A data manipulation language (DML) is a computer programming language used for adding
(inserting), deleting and modifying (updating) data in a database. A DML is often
a sublanguage of a broader database language such as SQL, with the DML comprising some of
the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL), but it is closely related and sometimes also
considered a component of a DML; some operators may perform both selecting (reading) and
writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is
used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those
used by IMS/DL/I, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which
modify stored data but not the schema or database objects. Manipulation of persistent database
objects, e.g. tables or stored procedures, via the SQL schema statements,[3] rather than the data
stored within them, is considered to be part of a separate data definition language (DDL). In SQL
these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct
in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this set also contains
the SELECT query statement,[3] which, strictly speaking, is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT ... INTO form combines both selection and manipulation,
and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita',
'xcapit00');
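The four DML verbs can be exercised end to end. A small sketch via Python's sqlite3, reusing the employees example above (the updated surname is an illustrative value):

```python
import sqlite3

# The DML verbs from the list above, run against the employees example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES
conn.execute("INSERT INTO employees (first_name, last_name, fname)"
             " VALUES ('John', 'Capita', 'xcapit00')")

# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = 'Capita-Smith'"
             " WHERE fname = 'xcapit00'")

# SELECT ... FROM ... WHERE (strictly DQL)
row = conn.execute("SELECT last_name FROM employees"
                   " WHERE fname = 'xcapit00'").fetchone()

# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```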
OR
(c) Explain the following integrity rules:
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
already applied in the design. There are two types of integrity mentioned in
integrity rules: entity and referential. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is to give each row a unique
identity, so that foreign key values can properly reference primary key values.
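Both halves of entity integrity, uniqueness and no nulls, can be checked in a few lines. A sketch via Python's sqlite3 (the Dept table is an illustrative assumption; note SQLite has a legacy quirk allowing NULL in non-INTEGER primary keys, so NOT NULL is spelled out to get the textbook behaviour):

```python
import sqlite3

# Entity integrity: the primary key must be unique and non-null.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dept (deptNo TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO Dept VALUES ('D1', 'Sales')")
try:
    conn.execute("INSERT INTO Dept VALUES ('D1', 'Marketing')")  # duplicate key
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True
try:
    conn.execute("INSERT INTO Dept VALUES (NULL, 'HR')")  # null key
    null_blocked = False
except sqlite3.IntegrityError:
    null_blocked = True
```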
Theta Join
In a theta join we apply a condition on the input relation(s), and then only the selected
rows are used in the cross product to be merged and included in the output. In a normal
cross product, all the rows of one relation are mapped/merged with all the rows of the
second relation, but here only selected rows of a relation enter the cross product with the
second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition which is applied for the select
operation on one relation, and then only the selected rows are cross-producted with all the
rows of the second relation. For example, take two relations FACULTY and
COURSE: we first apply the select operation on the FACULTY relation to
select certain specific rows, then these rows have a cross product with the
COURSE relation. This is the difference between cross product and theta join.
Looking at both relations, their different attributes, and finally the cross product after
carrying out the select operation, the difference between cross product and theta join
becomes clear.
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So referential integrity requires that whenever a foreign key value is used, it must reference a
valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records. Otherwise we would end up with an orphaned record:
the related table would contain a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field), resulting in an "orphaned record".
So referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life and death situations, such as a hospital patient not receiving the
correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and
consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of
working with databases.
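The CompanyId scenario above can be replayed directly. A sketch via Python's sqlite3 (the Company/Product tables are illustrative; SQLite needs the foreign_keys pragma switched on to enforce the constraint):

```python
import sqlite3

# Referential integrity: the orphan insert and the parent delete both fail.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Product (
        ProductId INTEGER PRIMARY KEY,
        CompanyId INTEGER REFERENCES Company(CompanyId)
    )
""")
conn.execute("INSERT INTO Company VALUES (15, 'Acme')")
conn.execute("INSERT INTO Product VALUES (1, 15)")       # valid parent: accepted
try:
    conn.execute("INSERT INTO Product VALUES (2, 99)")   # no Company 99: orphan
    orphan_blocked = False
except sqlite3.IntegrityError:
    orphan_blocked = True
try:
    # deleting record 15 would orphan Product 1, as in the example above
    conn.execute("DELETE FROM Company WHERE CompanyId = 15")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```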
(d) Explain the following domain in detail with example.
Ans: Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which
a data element may contain. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
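Both domain rules described above, the enumerated gender codes and the positive-value check constraint, can be sketched via Python's sqlite3 (the Person table and its columns are illustrative assumptions):

```python
import sqlite3

# Domain enforcement with CHECK constraints: gender limited to 'M'/'F',
# and a values-greater-than-zero rule on qty.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        name   TEXT,
        gender TEXT CHECK (gender IN ('M', 'F')),
        qty    INTEGER CHECK (qty > 0)
    )
""")
conn.execute("INSERT INTO Person VALUES ('Ann', 'F', 3)")
try:
    conn.execute("INSERT INTO Person VALUES ('Bob', 'X', 1)")  # outside the domain
    bad_gender_blocked = False
except sqlite3.IntegrityError:
    bad_gender_blocked = True
try:
    conn.execute("INSERT INTO Person VALUES ('Cid', 'M', -5)")  # violates qty > 0
    bad_qty_blocked = False
except sqlite3.IntegrityError:
    bad_qty_blocked = True
```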
This definition combines the concept of a domain as an area over which control is exercised
with the mathematical idea of a set of values of an independent variable for which a function is
defined.
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
The latter is written M:N, not M:M, because the two sides need not be equal.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A
one-to-one relationship rarely exists in practice, but it can; you may consider combining
the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore, there is a one-to-one relationship between employee and company car.
One-to-many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities, an employee works in
one department but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur
because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
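In a relational schema, an M:N relationship like employee/project is resolved with a link (junction) table whose composite key pairs the two entities. A sketch via Python's sqlite3; the table and column names are illustrative:

```python
import sqlite3

# Many-to-many employee/project relationship via a junction table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (empNo INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Project  (projNo INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE WorksOn (
        empNo  INTEGER REFERENCES Employee(empNo),
        projNo INTEGER REFERENCES Project(projNo),
        PRIMARY KEY (empNo, projNo)
    );
    INSERT INTO Employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Project  VALUES (10, 'Payroll'), (20, 'Website');
    INSERT INTO WorksOn  VALUES (1, 10), (1, 20), (2, 10);
""")
# Ann works on several projects; project 10 has a team of several employees
ann_projects = conn.execute(
    "SELECT COUNT(*) FROM WorksOn WHERE empNo = 1").fetchone()[0]
team_size = conn.execute(
    "SELECT COUNT(*) FROM WorksOn WHERE projNo = 10").fetchone()[0]
```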
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, as in the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data-items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data-items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data-item and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-
item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory, "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g. a form) into table format with columns
and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
A and B are attributes of a relation.
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF - A relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Second Normal Form (2NF)
1NF and no partial functional dependencies.
Partial functional dependency: when one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant.
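The 1NF-to-2NF step can be illustrated with a concrete decomposition. A sketch via Python's sqlite3, under assumed names: in OrderLine(orderNo, productNo, productName, qty), productName depends only on productNo (part of the composite key), so it moves to its own relation with a copy of its determinant:

```python
import sqlite3

# After removing the partial dependency productNo -> productName,
# the product name is stored once, however many order lines mention it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (
        productNo   INTEGER PRIMARY KEY,   -- the determinant, copied out
        productName TEXT
    );
    CREATE TABLE OrderLine (
        orderNo   INTEGER,
        productNo INTEGER REFERENCES Product(productNo),
        qty       INTEGER,
        PRIMARY KEY (orderNo, productNo)   -- composite key remains
    );
    INSERT INTO Product   VALUES (7, 'Widget');
    INSERT INTO OrderLine VALUES (100, 7, 3), (101, 7, 5);
""")
name = conn.execute(
    "SELECT productName FROM Product WHERE productNo = 7").fetchone()[0]
lines = conn.execute("SELECT COUNT(*) FROM OrderLine").fetchone()[0]
```

With the dependency removed, renaming the product is a single-row update instead of one per order line, which is exactly the update anomaly 2NF eliminates.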
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF - A relation that is in 1NF and 2NF, and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
Either of these conditions must hold true for a relation to be in fourth normal form:
there is no multivalued dependency in the relation, or
there are multivalued dependencies but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of inference axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
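Derivations like the one above can be checked mechanically by computing the attribute closure: repeatedly applying the given FDs is itself an application of the inference axioms. A minimal sketch in Python, using Maier's FD set:

```python
# Attribute closure: the set of all attributes functionally determined by attrs.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is covered, transitivity lets us add the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Maier's example: F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H}
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
c = closure("AB", F)
# G and H are in AB+, so AB -> GH holds, confirming the 12-step derivation
```

Closures are the standard way inference axioms are put to work in design: they decide FD membership, find candidate keys, and test whether a decomposition is lossless.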
Significance in Relational Database Design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables where multiple
relationships between data elements can be defined and established in an ad hoc manner. A
Relational Database Management System is a database system made up of files with data
elements in a two-dimensional array (rows and columns); it has the capability to recombine
data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. Proposed by Dr Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or to avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
then, to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression "ACID transaction".
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another.
Durability means that once a transaction commits, its effects will persist even if there are
system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state: my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
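The resource-access-order strategy for avoiding deadlock, described in the answer above, can be sketched with two ordinary thread locks standing in for database locks (the transaction names are illustrative):

```python
import threading

# Deadlock avoidance by resource-access order: every "transaction" acquires
# the two locks in the same fixed global order (lock_a before lock_b), so no
# circular wait can form.
lock_a, lock_b = threading.Lock(), threading.Lock()

def transaction(name, results):
    with lock_a:          # always first
        with lock_b:      # always second
            results.append(name)

results = []
threads = [threading.Thread(target=transaction, args=(n, results))
           for n in ("T1", "T2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Both transactions complete; if one thread took lock_a first and the other
# took lock_b first, they could each wait on the other forever.
```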
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive: this type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks
when all its operations are over. If all the locks are not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it
only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction, and the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it. Strict-2PL holds all the locks until the commit point and releases all the locks
at one time.
Strict-2PL does not have cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the system
know when the last 'read and write' operation was performed on the data item.
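The per-item read/write timestamps just described drive basic timestamp ordering. A minimal sketch (not a full protocol; abort is modelled as simply returning False) of the rule that an operation is rejected when a younger transaction has already touched the item:

```python
# Basic timestamp ordering, sketched: each data item carries the latest
# read-timestamp and write-timestamp, as described above.
class Item:
    def __init__(self):
        self.read_ts = 0    # latest read timestamp
        self.write_ts = 0   # latest write timestamp

def read(item, ts):
    if ts < item.write_ts:              # a younger transaction already wrote:
        return False                    # the reader must abort
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return False                    # would invalidate a younger read/write
    item.write_ts = ts
    return True

x = Item()
write(x, 5)           # transaction with timestamp 5 writes x
ok_old = read(x, 3)   # older transaction (ts 3) must abort: x written at ts 5
ok_new = read(x, 7)   # younger transaction (ts 7) reads successfully
```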
OR
(b) Explain database security mechanisms. (8)
Ans: Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in the database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: a conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: an essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: a corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world: for example, to represent
the statement that "all humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of tables that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge-
base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and Object-Oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge Management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, versus knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st Year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data, since there is no way they can interfere with one another. However,
any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps ensure that database transactions are performed concurrently
without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
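One common technique is optimistic concurrency control: each row carries a version counter, and an update succeeds only if no other writer has changed the row since it was read. The sketch below is illustrative only (the `accounts` table and its columns are assumptions, not part of the question), using Python's sqlite3:

```python
import sqlite3

# Set up a tiny illustrative table with a version column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 0)")
conn.commit()

def update_balance(conn, acct_id, new_balance, expected_version):
    """Succeed only if no other transaction has bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, acct_id, expected_version))
    conn.commit()
    return cur.rowcount == 1   # 0 rows touched => a concurrent writer got there first

# Two clients both read version 0; only the first write is accepted.
first  = update_balance(conn, 1, 150, expected_version=0)
second = update_balance(conn, 1, 90,  expected_version=0)
print(first, second)  # True False
```

The second update silently matching zero rows, rather than overwriting the first, is exactly the "lost update" conflict that concurrency control exists to prevent.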
(ii) Atomicity property
Ans: In database systems, atomicity (from the Greek átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and depositing it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
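The A-to-B transfer above can be sketched with sqlite3 transactions (the table and amounts are illustrative assumptions): both updates commit together, or an error rolls both back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Withdraw and deposit as one atomic unit: both happen or neither does."""
    try:
        with conn:  # transaction: commits on success, rolls back on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            (bal,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
            if bal < 0:
                raise ValueError("insufficient funds")  # aborts the whole transfer
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

transfer(conn, "A", "B", 30)    # succeeds: A=70, B=80
transfer(conn, "A", "B", 500)   # fails: the withdrawal is rolled back too
```

After the failed transfer the balances are unchanged; no money was lost or created.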
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
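The three sublanguages can be seen side by side in a short sketch (Python's sqlite3 here; note that SQLite has no user accounts, so the DCL statement is shown only as the text a server DBMS such as PostgreSQL would accept, not executed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare the database object.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on the object.
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
rows = conn.execute("SELECT name FROM student").fetchall()

# DCL: control access to the object.  SQLite has no users, so this is
# only the statement a server DBMS would accept:
dcl = "GRANT SELECT ON student TO report_user"

print(rows)  # [('Asha',)]
```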
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are explained above.
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieval) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It records metadata information such as the names of the files, data items, storage
details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the best
way to execute the query, and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
It converts operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the Query Processor), from the
user's logical view to the physical file system.
It controls access to the DBMS information that is stored on disk.
It also handles buffers in main memory.
It also enforces constraints to maintain the consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - the description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation
and accuracy. It may be used as an important part of the DBMS.
Importance of the Data Dictionary -
The Data Dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other supporting system. A user of an automatic teller machine falls into this category: the user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the ATM user, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, as such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in the
database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, as the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining the
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand; the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes for
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE 2 - EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data
is used in a real-world enterprise. Modelling is an iterative, team-oriented process in which all
business managers (or their designates) are involved, and the model should be validated with a
"bottom-up" approach. The model has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer, with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where
street is itself composite (street_name, street_number, apartment_number).
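One way this entity could be mapped to tables (a sketch, not the only valid mapping): composite attributes such as name, address and street are flattened into columns, while the multivalued attribute phone_number becomes a separate table keyed by customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Composite attributes (name, address, street) flattened into columns.
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT
);
-- The multivalued attribute (phone_number) becomes its own table.
CREATE TABLE customer_phone (
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) VALUES (1, 'John', 'Doe')")
conn.executemany("INSERT INTO customer_phone VALUES (1, ?)", [("555-0100",), ("555-0199",)])
phones = [p for (p,) in conn.execute(
    "SELECT phone_number FROM customer_phone WHERE customer_id = 1 ORDER BY phone_number")]
print(phones)
```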
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
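The point can be sketched in a few lines (sample data invented for illustration): unlike the primary index, the secondary index on stud_name maps each key value to a list of records, because several records may share it.

```python
from collections import defaultdict

records = {                      # primary index: stud_id -> record
    101: {"stud_id": 101, "stud_name": "Ravi",  "course": "MCA"},
    102: {"stud_id": 102, "stud_name": "Priya", "course": "MCA"},
    103: {"stud_id": 103, "stud_name": "Ravi",  "course": "MBA"},
}

secondary = defaultdict(list)    # secondary index: stud_name -> [stud_id, ...]
for rid, rec in records.items():
    secondary[rec["stud_name"]].append(rid)

def find_by_name(name):
    """Return every record whose stud_name matches; possibly several."""
    return [records[rid] for rid in secondary.get(name, [])]

matches = find_by_name("Ravi")
print(len(matches))  # 2 -- the secondary key value is not unique
```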
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and
every join dependency in it is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be losslessly decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
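The decomposition above can be checked mechanically: project Buying onto the three pairwise tables, then rejoin them. Because the join dependency holds on this sample data, the three-way join reconstructs exactly the original rows, with no spurious tuples.

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Three-way natural join of the projections.
rejoined = {
    (b, v, i)
    for b, v in buyer_vendor
    for v2, i in vendor_item if v2 == v
    if (b, i) in buyer_item
}

print(rejoined == buying)  # True: the decomposition is lossless
```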
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
Fig: IMS architecture - application programs A and B, each written in a host language plus DL/I,
access the IMS control program through the PCBs of their PSBs (PSB-A, PSB-B); the control
program in turn uses the DBDs that define the physical databases.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
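The hierarchical flavor of this model can be suggested with a small sketch. This is not the real DL/I API, just an invented in-memory structure: segments form a tree (COURSE at the root, OFFERING below it, STUDENT below that), and an application walks child segments "within parent" rather than issuing joins.

```python
# Invented sample data shaped like the COURSE -> OFFERING -> STUDENT hierarchy.
course = {
    "name": "M23",
    "offerings": [
        {"date": "730813", "location": "Oslo",
         "students": [{"emp": "E1", "grade": "A"}, {"emp": "E2", "grade": "B"}]},
        {"date": "731021", "location": "Dublin",
         "students": [{"emp": "E3", "grade": "A"}]},
    ],
}

def students_within_course(course):
    """Loose analogue of repeated 'get next within parent' over STUDENT segments."""
    for offering in course["offerings"]:      # walk child OFFERING segments
        for student in offering["students"]:  # then their child STUDENT segments
            yield student["emp"]

print(list(students_within_course(course)))  # ['E1', 'E2', 'E3']
```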
OR
(C) Explain the following:
(i) Functional dependency
Ans: Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in
normalization:
They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency; they hold for all time; and they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by the
functional dependencies in X.
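The definition above can be turned into a small checker (the relation and attribute names are invented for illustration): X → Y holds when no two tuples agree on X but differ on Y.

```python
def fd_holds(rows, X, Y):
    """Return True iff the functional dependency X -> Y holds in rows."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        # setdefault returns the previously seen Y-value for this X-value.
        if seen.setdefault(x_val, y_val) != y_val:
            return False          # same determinant, different dependent
    return True

rows = [
    {"stud_id": 1, "dept": "CS", "dept_head": "Rao"},
    {"stud_id": 2, "dept": "CS", "dept_head": "Rao"},
    {"stud_id": 3, "dept": "EE", "dept_head": "Iyer"},
]
print(fd_holds(rows, ["dept"], ["dept_head"]))    # True: dept determines its head
print(fd_holds(rows, ["dept_head"], ["stud_id"]))  # False: Rao maps to both 1 and 2
```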
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF;
we will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies
between its attributes.
It is often executed as a series of steps, where each step corresponds to a specific normal form with
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
(Determinant: an attribute on which some other attribute is fully functionally dependent.)
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all of its multivalued dependencies are functional dependencies. 4NF
removes unwanted multivalued dependencies. For a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
Either of these conditions must hold. The relation must also be in BCNF; fourth normal form differs
from BCNF only in that it considers multivalued dependencies.
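The defining test for a multivalued dependency X ↠ Y in R(X, Y, Z) can be sketched directly (sample course/teacher/book data invented for illustration): the MVD holds exactly when R equals the join of its projections on (X, Y) and (X, Z), which is the redundancy pattern 4NF is designed to eliminate.

```python
def mvd_holds(rows, X, Y, Z):
    """Return True iff the MVD X ->> Y holds in R(X, Y, Z)."""
    xy = {tuple(r[a] for a in X + Y) for r in rows}   # projection on (X, Y)
    xz = {tuple(r[a] for a in X + Z) for r in rows}   # projection on (X, Z)
    # Natural join of the two projections on X.
    joined = {
        xyv + xzv[len(X):]
        for xyv in xy
        for xzv in xz
        if xyv[:len(X)] == xzv[:len(X)]
    }
    original = {tuple(r[a] for a in X + Y + Z) for r in rows}
    return joined == original

# course ->> teacher holds: teachers and books vary independently per course.
rows = [
    {"course": "DB", "teacher": "Rao",  "book": "Date"},
    {"course": "DB", "teacher": "Rao",  "book": "Korth"},
    {"course": "DB", "teacher": "Iyer", "book": "Date"},
    {"course": "DB", "teacher": "Iyer", "book": "Korth"},
]
print(mvd_holds(rows, ["course"], ["teacher"], ["book"]))  # True
```

Dropping one of the four rows breaks the independence, and the check returns False.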
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found by a more
declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a
relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions, account information entries, etc.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery models available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log that allow you to recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index creations is done at a faster pace because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans:
New records are placed in a log file or transaction file.
A batch update is performed to merge the log file with the master file.
(ii) Direct file
Direct Access File System (DAFS) is a network file system, similar to Network File System (NFS) and Common Internet File System (CIFS), that allows applications to transfer data while bypassing operating system control, buffering, and network protocol operations that can bottleneck throughput. DAFS uses the Virtual Interface (VI) architecture as its underlying transport mechanism. Using VI hardware, an application transfers data to and from application buffers without using the operating system, which frees up the processor and operating system for other processes and allows files to be accessed by servers using several different operating systems. DAFS is designed and optimized for clustered, shared-file network environments that are commonly used for Internet, e-commerce, and database applications. DAFS is optimized for high-bandwidth InfiniBand networks, and it works with any interconnection that supports VI, including Fibre Channel and Ethernet.
Network Appliance and Intel formed the DAFS Collaborative as an industry group to specify and promote DAFS. Today, more than 85 companies are part of the DAFS Collaborative.
Q3
EITHER
(a) Explain tuple relational calculus.
Ans:
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments.
When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
Relational Calculus
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x.
When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true. This is based on the use of tuple variables.
A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
Specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
Existential quantifier ∃ ('there exists')
Universal quantifier ∀ ('for all')
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
Tuple Relational Calculus
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'There exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and it is located in London'.
Tuple Relational Calculus
The universal quantifier is used in statements about every instance, such as:
(∀B) (B.city ≠ 'Paris')
This means 'For all Branch tuples, the address is not in Paris'.
We can also use ~(∃B) (B.city = 'Paris'), which means 'There are no branches with an address in Paris'.
Tuple Relational Calculus
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction F1 ∨ F2, and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P) (PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Tuple Relational Calculus
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
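The declarative flavour of tuple relational calculus maps directly onto SQL. A small sketch using Python's sqlite3 module (the Staff rows here are invented for illustration; only the attribute names follow the examples above) shows both example queries:

```python
import sqlite3

# Illustrative Staff rows; attribute names follow the Staff relation above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, fName TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [("S1", "Ann", 12000), ("S2", "Bob", 9000), ("S3", "Eve", 30000)])

# {S | Staff(S) AND S.salary > 10000} -- whole tuples:
high_paid = conn.execute("SELECT * FROM Staff WHERE salary > 10000").fetchall()

# {S.salary | Staff(S) AND S.salary > 10000} -- a single projected attribute:
salaries = [row[0] for row in
            conn.execute("SELECT salary FROM Staff WHERE salary > 10000")]
```

In both cases the query states only the predicate to satisfy; the DBMS decides how to evaluate it.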
Data Manipulations in SQL
Select, Update, Delete, Insert Statement
Basic Data Retrieval
Condition Specification
Arithmetic and Aggregate Operators
SQL Join: Multiple Table Queries
Set Manipulation
Any, In, Contains, All, Not In, Not Contains, Exists, Union, Minus, Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement.
Data must be entered later using INSERT.
CREATE TABLE S ( SNO CHAR(5),
SNAME CHAR(20),
STATUS DECIMAL(3),
CITY CHAR(15),
PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified.
Columns which are defined as primary keys will never have two rows with the same key value.
A primary key may consist of more than one column (values unique in combination); this is called a composite key.
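Both cases can be sketched with Python's sqlite3 module: the S table from the text, plus a hypothetical supplier-part table SP (not in the original) whose composite key (SNO, PNO) must be unique in combination, not individually:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The supplier table S from the text, with SNO as its primary key:
conn.execute("""CREATE TABLE S (
    SNO    CHAR(5),
    SNAME  CHAR(20),
    STATUS DECIMAL(3),
    CITY   CHAR(15),
    PRIMARY KEY (SNO))""")
conn.execute("INSERT INTO S VALUES ('S1', 'Smith', 20, 'London')")

# A hypothetical table with a composite key (SNO, PNO):
conn.execute("""CREATE TABLE SP (
    SNO CHAR(5), PNO CHAR(6), QTY INTEGER,
    PRIMARY KEY (SNO, PNO))""")
conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")
conn.execute("INSERT INTO SP VALUES ('S1', 'P2', 200)")  # same SNO alone is fine
try:
    conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 999)")  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the DBMS rejects the repeated combination

sp_rows = conn.execute("SELECT COUNT(*) FROM SP").fetchone()[0]
```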
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting, and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those used by IMS/DLI, CODASYL databases such as IDMS, and others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which modify stored data but not the schema or database objects. Manipulation of persistent database objects, e.g. tables or stored procedures, via the SQL schema statements,[3] rather than the data stored within them, is considered to be part of a separate data definition language (DDL). In SQL these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this set also contains the SELECT query statement,[3] which strictly speaking is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00');
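The DML verbs above can be exercised end-to-end with Python's sqlite3 module. The employees table mirrors the insert example; the updated value 'Capital' is invented purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES
conn.execute("INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
             ("John", "Capita", "xcapit00"))
# UPDATE ... SET ... WHERE
conn.execute("UPDATE employees SET last_name = 'Capital' WHERE fname = 'xcapit00'")
# SELECT ... FROM ... WHERE (strictly speaking DQL)
row = conn.execute("SELECT last_name FROM employees WHERE fname = 'xcapit00'").fetchone()
# DELETE FROM ... WHERE
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```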
OR
(c) Explain the following integrity rules.
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are already applied in the design. There are two types of integrity mentioned in integrity rules: entity and reference. Two additional rules that aren't necessarily included in integrity rules, but are pertinent to database designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is to give each row a unique identity, so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition on the input relation(s), and then only the selected rows are used in the cross product to be merged and included in the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation, but here only selected rows of a relation are made part of the cross product with the second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition which is applied for the select operation on one relation; only the selected rows then form a cross product with all the rows of the second relation. For example, take two relations, FACULTY and COURSE: we first apply the select operation on the FACULTY relation to select certain specific rows, and then these rows have a cross product with the COURSE relation. This is the difference between cross product and theta join.
We will now see both relations, their different attributes, and then finally the cross product after carrying out the select operation on the relation.
From this example the difference between cross product and theta join becomes clear.
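The operation can be sketched directly as a filtered cross product. The FACULTY and COURSE rows and attributes below are invented for illustration; only the relation names follow the text:

```python
# Illustrative relations (tuples are hypothetical):
FACULTY = [("F1", "Ali", "CS"), ("F2", "Sara", "Math")]
COURSE  = [("C1", "Databases", "F1"), ("C2", "Calculus", "F2"),
           ("C3", "Algorithms", "F1")]

def theta_join(r, s, theta):
    """Merge every pair of tuples from the cross product r x s that satisfies theta."""
    return [a + b for a in r for b in s if theta(a, b)]

# Here theta compares the faculty id with the course's teacher id
# (an equality condition, i.e. an equi-join, a special case of theta join).
result = theta_join(FACULTY, COURSE, lambda f, c: f[0] == c[2])
```

Replacing the lambda with any other comparison (for instance `!=` or `<` on comparable attributes) gives the general theta join; with no condition at all the function degenerates to the plain cross product.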
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records. Otherwise, we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So, referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life and death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
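The CompanyId example can be demonstrated with sqlite3 (the product table is hypothetical; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.execute("CREATE TABLE company (CompanyId INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE product (
    ProductId INTEGER PRIMARY KEY,
    CompanyId INTEGER REFERENCES company(CompanyId))""")
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")

# Adding a related row with no associated parent record is rejected:
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")
    orphan_insert_ok = True
except sqlite3.IntegrityError:
    orphan_insert_ok = False

# Deleting a parent that still has matching related records is rejected too:
try:
    conn.execute("DELETE FROM company WHERE CompanyId = 15")
    parent_delete_ok = True
except sqlite3.IntegrityError:
    parent_delete_ok = False
```

Both rejected statements are exactly the operations listed above that referential integrity must prevent; no orphaned record can be created.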
(d) Explain the following in detail with examples.
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type, and allowed to have one of two known code values: M for male, F for female, and NULL for records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value, excluding NULL. Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check constraint or, in more complex cases, in a database trigger. For example, a column requiring positive numeric values may have a check constraint declaring that the values must be greater than zero.
This definition combines the concept of domain as an area over which control is exercised with the mathematical idea of a set of values of an independent variable for which a function is defined.
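Both domain rules described above, the enumerated gender domain and the positive-value rule, can be enforced with CHECK constraints; a sketch in sqlite3 (table names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The {M, F} domain from the text, with NULL allowed for unknown gender
# (a CHECK that evaluates to NULL does not reject the row):
conn.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')))""")
conn.execute("INSERT INTO person VALUES ('Ann', 'F')")
conn.execute("INSERT INTO person VALUES ('Unknown', NULL)")

try:
    conn.execute("INSERT INTO person VALUES ('Bad', 'X')")  # outside the domain
    out_of_domain_ok = True
except sqlite3.IntegrityError:
    out_of_domain_ok = False

# The numeric domain rule: values must be greater than zero.
conn.execute("CREATE TABLE item (qty INTEGER CHECK (qty > 0))")
try:
    conn.execute("INSERT INTO item VALUES (-3)")
    negative_ok = True
except sqlite3.IntegrityError:
    negative_ok = False
```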
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
The latter is correctly written M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; in that case you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a team of many employees.
Therefore there is a many-to-many relationship between employee and project.
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG is intended to meet the requirements of many distinct programming languages, not just COBOL: the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of the conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure. The architecture of the DBTG model can be divided into three different levels, as in the architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data items they contain, and the sets into which they are grouped. (Here, logical record types are referred to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data item and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary programming language such as COBOL that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema; using the COBOL Data Base Facility, for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data item) defined in the subschema. The program may refer to these data item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF. We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g. a form) into table format with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing the repeating data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
A and B are attributes of a relation.
B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Second Normal Form (2NF)
1NF and no partial functional dependencies.
Partial functional dependency: when one or more non-key attributes are functionally dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
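The three steps above can be walked through on a small hypothetical relation (not from the text): OrderLine(orderNo, productNo, qty, productDesc) with composite key (orderNo, productNo), where productDesc depends only on productNo, a partial dependency that 2NF removes:

```python
# 1NF relation with a partial dependency: productDesc depends only on
# productNo, part of the composite key (orderNo, productNo).
order_line = [
    ("O1", "P1", 3, "Pen"),
    ("O1", "P2", 1, "Pad"),
    ("O2", "P1", 5, "Pen"),   # "Pen" repeated: the redundancy 2NF removes
]

# Step 3: move the partially dependent attribute into a new relation
# together with a copy of its determinant (productNo)...
product = sorted({(p_no, desc) for (_, p_no, _, desc) in order_line})

# ...and drop it from the original relation, which is now in 2NF.
order_line_2nf = [(o, p, q) for (o, p, q, _) in order_line]
```

After the decomposition each product description is stored exactly once, so updating it can no longer leave the relation inconsistent.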
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (or 4NF) requires that there be no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multi-valued dependencies are functional dependencies. 4NF removes the unwanted data structures: multi-valued dependencies.
One of the following conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
(d) What are inference axioms? Explain their significance in Relational Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity from (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
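The Maier derivation can be checked mechanically with the standard attribute-closure algorithm: AB → GH holds iff G and H appear in the closure of {A, B} under F. A sketch in Python using the FDs given above:

```python
# FDs from the Maier example: AB->E, AG->J, BE->I, E->G, GI->H.
FDS = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
       ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

def closure(attrs, fds):
    """Repeatedly fire every FD whose left side is covered until nothing new appears."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

ab_closure = closure({"A", "B"}, FDS)  # contains G and H, so AB -> GH holds
```

This closure computation is the practical significance of the axioms: it is how a design tool decides which dependencies F implies, and hence what the candidate keys are.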
Significance in Relational Database Design: The inference axioms allow all the functional dependencies implied by a given set F (the closure F+) to be derived; this is the basis for finding candidate keys and for normalizing relations. More broadly, they underpin the relational model: a database structure in which data is stored in two-dimensional tables, where multiple relationships between data elements can be defined and established in an ad-hoc manner. A relational database management system (RDBMS) is a database system made up of files with data elements in a two-dimensional array (rows and columns); it has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables that:
• are manipulated a set at a time, rather than a record at a time;
• are manipulated using SQL.
The relational model was proposed by Dr Codd in 1970 and is the basis for the relational database management system (RDBMS). The relational model contains the following components:
• a collection of objects or relations
• a set of operations to act on the relations
Q5
EITHER
(a) What is a deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that is being locked by the other user. It can be dealt with in two ways: one is to set measures which prevent deadlocks from happening, and the other is to set ways in which to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a transaction to cancel and revert the entire transaction until the required resources become available, allowing one transaction to complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it should be atomic: it should either be complete or fully incomplete; there should not be anything like semi-complete. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by a change in the row I am working on will be locked from changes until my change is complete. This isolates the change and ensures that the data interaction remains accurate and consistent, and is known as transaction-level consistency. The transaction being changed, which may affect several other pieces of data or rows of input, could also affect how those rows are read. So, say I am processing a change to the tax rate in my state: my store clerk shouldn't be able to read the total cost of a blue shirt, because the total cost row is affected by any change in the tax rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but hasn't been committed is known as the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
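Atomicity is easy to observe in practice. A sketch using sqlite3 as a stand-in DBMS (the account table and amounts are invented): a transfer interrupted by a simulated failure leaves the database unchanged, while the same transfer without the failure commits both writes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, amount, fail_midway=False):
    with conn:  # sqlite3 wraps the block in a transaction: commit or roll back
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 'A'",
                     (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'",
                     (amount,))

try:
    transfer(conn, 30, fail_midway=True)
except RuntimeError:
    pass  # the partial debit was rolled back: no semi-complete state
after_failure = dict(conn.execute("SELECT id, balance FROM account"))

transfer(conn, 30)  # the same transfer without the crash commits both writes
after_success = dict(conn.execute("SELECT id, balance FROM account"))
```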
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
Binary locks: A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive locks: This type of locking mechanism differentiates locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock; allowing more than one transaction to write to the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
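The shared/exclusive compatibility rule can be sketched as a small lock table (a toy illustration; the class and method names are ours, not any particular DBMS's API):

```python
class LockTable:
    """Toy shared/exclusive lock table: many readers OR one writer per item."""
    def __init__(self):
        self.sharers = {}    # item -> set of transaction ids holding a read lock
        self.writer = {}     # item -> transaction id holding the write lock

    def acquire_shared(self, txn, item):
        # A read lock is compatible with other read locks,
        # but not with an exclusive lock held by another transaction.
        if item in self.writer and self.writer[item] != txn:
            return False
        self.sharers.setdefault(item, set()).add(txn)
        return True

    def acquire_exclusive(self, txn, item):
        # A write lock is incompatible with any lock held by others.
        others_read = self.sharers.get(item, set()) - {txn}
        if others_read or (item in self.writer and self.writer[item] != txn):
            return False
        self.writer[item] = txn
        return True

    def release_all(self, txn):
        for holders in self.sharers.values():
            holders.discard(txn)
        self.writer = {i: t for i, t in self.writer.items() if t != txn}

lt = LockTable()
print(lt.acquire_shared("T1", "x"))     # True  - first reader
print(lt.acquire_shared("T2", "x"))     # True  - read locks are shared
print(lt.acquire_exclusive("T3", "x"))  # False - readers block the writer
lt.release_all("T1"); lt.release_all("T2")
print(lt.acquire_exclusive("T3", "x"))  # True  - item now free
```

A real lock manager would also queue waiters and detect deadlocks; the compatibility matrix, however, is exactly this.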
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, where all the locks are being acquired by the transaction, and a shrinking phase, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
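The growing/shrinking discipline can be enforced with a single flag, as in this sketch (illustrative names; a real system would combine this with the lock table above):

```python
class TwoPhaseTxn:
    """Enforces the 2PL discipline: no lock may be acquired
    after the first lock has been released."""
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # flips to True at the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot acquire in shrinking phase")
        self.locks.add(item)     # growing phase

    def unlock(self, item):
        self.shrinking = True    # entering the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.lock("A"); t.lock("B")   # growing phase: acquire everything needed
t.unlock("A")              # first release starts the shrinking phase
try:
    t.lock("C")            # illegal under 2PL
except RuntimeError as e:
    print(e)               # 2PL violation: cannot acquire in shrinking phase
```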
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: it holds all the locks until the commit point and releases them all at once.
Strict-2PL therefore does not suffer from cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all transactions that come after it; for example, any transaction y entering the system at 0004 is two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system know when the last 'read' and 'write' operations were performed on the data item.
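The basic timestamp-ordering checks that use these per-item timestamps can be sketched as follows (a simplified illustration; a real scheduler would also roll back and restart the aborted transaction with a new timestamp):

```python
# Basic timestamp-ordering checks. Each item carries the timestamps of the
# last read and last write; an older transaction must not act "after" a
# younger one on the same item.
read_ts, write_ts = {}, {}   # item -> timestamp of last read / last write

def read(ts, item):
    # An older transaction may not read a value written by a younger one.
    if ts < write_ts.get(item, 0):
        return "abort"
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return "ok"

def write(ts, item):
    # An older transaction may not overwrite data already read or
    # written by a younger one.
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return "abort"
    write_ts[item] = ts
    return "ok"

print(write(2, "x"))   # ok    - first write, W-ts(x) becomes 2
print(read(4, "x"))    # ok    - younger reader, R-ts(x) becomes 4
print(write(3, "x"))   # abort - T(3) is older than the last reader T(4)
print(read(1, "x"))    # abort - T(1) is older than the last writer T(2)
```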
OR
(b) Explain database security mechanisms. (8)
Database security covers and enforces security on all aspects and components of databases. This includes:
Data stored in the database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment against theft and natural disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
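One concrete application-level access control, sketched here with Python's built-in sqlite3 (the users table and payload are hypothetical): passing user input as a bound parameter instead of splicing it into the SQL string prevents SQL injection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, role TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?)",
               [("alice", "admin"), ("bob", "clerk")])

malicious = "nobody' OR '1'='1"   # classic injection payload

# Unsafe: string splicing turns the payload into live SQL
# and returns every row in the table.
unsafe = db.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'").fetchall()

# Safe: the driver binds the payload as a plain value,
# so it matches nothing.
safe = db.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```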
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of Information Technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each field.
Multiple users: A conventional database needed to support more than one user or system logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional database. The knowledge-base needed to know facts about the world, for example to represent the statement that "all humans are mortal". A database typically could not represent this general knowledge, but would instead need to store thousands of rows of information about specific humans. Representing that all humans are mortal, and being able to reason that any given human is mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
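The contrast can be sketched in a few lines (a toy illustration, not a production inference engine): the database stores one explicit fact per individual, while the knowledge base stores a general rule and derives the specific facts on demand.

```python
# Database style: one explicit row per individual.
customers = [("George", "human"), ("Mary", "human"), ("Sam", "human")]

# Knowledge-base style: a general rule applied to known individuals.
def is_mortal(entity_type):
    # Rule: all humans are mortal.
    return entity_type == "human"

derived = [name for name, kind in customers if is_mortal(kind)]
print(derived)  # ['George', 'Mary', 'Sam']
```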
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged; these were systems designed from the ground up to have support for object-oriented capabilities, but also to support standard database services. On the other hand, the large database vendors, such as Oracle, added capabilities to their products that provided support for knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet, documents, hypertext and multimedia support were now critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory; support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors, such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge-base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by us humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data; there is no way they can interfere with one another. Any practical database, though, will have a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you to make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, translit. átomos, lit. 'undividable') is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
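The transfer example can be sketched with sqlite3 (the accounts and amounts are hypothetical; the simulated crash stands in for any failure between the two operations):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
db.executemany("INSERT INTO account VALUES (?, ?)",
               [("A", 100.0), ("B", 50.0)])
db.commit()

def transfer(amount):
    """Withdraw from A and deposit to B as one atomic unit."""
    try:
        with db:  # the connection as a context manager = one transaction
            db.execute("UPDATE account SET balance = balance - ? "
                       "WHERE name = 'A'", (amount,))
            # Simulate a failure between the withdrawal and the deposit.
            raise RuntimeError("crash before deposit")
            db.execute("UPDATE account SET balance = balance + ? "
                       "WHERE name = 'B'", (amount,))
    except RuntimeError:
        pass  # the 'with' block rolled the withdrawal back

transfer(30.0)
balances = dict(db.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100.0, 'B': 50.0} - money neither lost nor created
```

Because the whole transfer is one transaction, the failed run leaves both balances untouched rather than leaving account A debited with nothing credited to B.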
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user view is different from the way data is stored in the database; this view describes only a part of the actual database. Each user is not concerned with the entire database, so only the part that is relevant to the user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
Conceptual Level: This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level: This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
The objectives of the three-level architecture proposal for DBMS are thus suitably explained above.
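The external level can be illustrated with a SQL view (sketched with sqlite3; the employee schema is hypothetical): each user class sees only the part of the conceptual schema relevant to it, while the stored table stays unchanged.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Conceptual level: the full employee relation.
db.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL)")
db.execute("INSERT INTO employee VALUES (1, 'Asha', 52000.0)")

# External level: a receptionist's view hides the salary column.
db.execute("CREATE VIEW phone_list AS SELECT id, name FROM employee")

row = db.execute("SELECT * FROM phone_list").fetchone()
cols = [d[0] for d in db.execute("SELECT * FROM phone_list").description]
print(cols, row)  # ['id', 'name'] (1, 'Asha')
```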
(C) Describe the structure of DBMS.
Ans: The DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig.: Structure of Database Management System
Components of DBMS:
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler: The Data Definition Language compiler processes schema definitions specified in the DDL. It includes metadata information such as the names of the files and data items, storage details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer: DML commands such as insert, update, delete and retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer into the best way to execute the query, and then sent to the data manager.
3. Data Manager: The data manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the data manager are:
It converts operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the query processor), from the user's logical view to the physical file system.
It controls access to DBMS information that is stored on disk.
It also handles buffers in main memory.
It also enforces constraints to maintain the consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4. Data Dictionary: The data dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data: names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structure, access paths, and file and record sizes.
5. Access authorization: a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary:
The data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users' understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
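Most DBMSs expose the data dictionary as queryable system tables. A sketch with SQLite, whose catalog is the sqlite_master table (the student schema is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE INDEX idx_name ON student(name)")

# SQLite's data dictionary is the sqlite_master catalog: it records
# every table and index together with the DDL that defined it.
catalog = db.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()
print(catalog)  # [('index', 'idx_name'), ('table', 'student')]
```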
5. Data Files: These contain the data portion of the database.
6. Compiled DML: The DML compiler converts high-level queries into low-level file access commands known as compiled DML.
7. End Users: The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve Users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database: in the case of the user of an automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus, even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies the separation of physical storage from the use of the data by an application program, i.e. program/data independence. The user, programmer or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy: In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating the same data in different files
• Time wasted in entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency: In the file processing system, information is duplicated throughout the system, so changes made in one file may have to be carried over to another file. This may lead to inconsistent data, so we need to remove this duplication of data in multiple files to eliminate inconsistency.
3. Better Service to the Users: A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users that don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the System is Improved: Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be Improved: Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updating or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be Enforced: Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be Improved: In conventional systems, applications are developed in an ad hoc/temporary manner. Often different systems of an organization would access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's Requirements can be Identified: All organizations have sections and departments, and each of these units often considers the work of its unit as the most important, and therefore considers its needs as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall Cost of Developing and Maintaining Systems is Lower: It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages that have been developed for DBMSs than when using procedural languages.
10. A Data Model must be Developed: Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as per the needs of particular applications, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides Backup and Recovery: Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE 2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an iterative, team-oriented process, with all business managers (or designates) involved, and should be validated with a "bottom-up" approach. It has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name, address, etc.
Attributes are of various types:
Simple/single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings:
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes.
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
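One common way to realize this entity in a relational schema is to flatten the composite attributes into columns, sketched here with sqlite3 (the DDL and column names are one illustrative mapping, not the only possible one):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Composite attributes (name, address, street) flattened into columns;
# a derived attribute such as age would be computed, not stored.
db.execute("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT,
    middle_name      TEXT,
    last_name        TEXT,
    phone_number     TEXT,
    date_of_birth    TEXT,
    city             TEXT,
    state            TEXT,
    zip_code         TEXT,
    street_name      TEXT,
    street_number    TEXT,
    apartment_number TEXT
)""")
db.execute("INSERT INTO customer (customer_id, first_name, last_name) "
           "VALUES (1, 'Jane', 'Doe')")
n = db.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(n)  # 1
```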
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
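A sketch with sqlite3 (hypothetical student data): a secondary index on the non-key attribute supports this kind of retrieval, and, unlike a primary-key lookup, the search can return several records.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, stud_name TEXT)")
db.executemany("INSERT INTO student VALUES (?, ?)",
               [(1, "Ravi"), (2, "Priya"), (3, "Ravi")])

# A secondary index on the non-key attribute speeds up this retrieval.
db.execute("CREATE INDEX idx_stud_name ON student(stud_name)")

# A secondary-key search can match multiple records.
rows = db.execute(
    "SELECT roll FROM student WHERE stud_name = ? ORDER BY roll",
    ("Ravi",)).fetchall()
print(rows)  # [(1,), (3,)]
```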
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:

buyer | vendor        | item
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
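The decomposition above can be sketched with SQLite; this is a minimal illustration (table and column names are my own, not from the original answer) showing that joining the three projections back on their common keys reconstructs exactly the original Buying rows:

```python
import sqlite3

# In-memory database to demonstrate the 5NF decomposition of Buying(buyer, vendor, item).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

rows = [
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary", "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary", "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
]

# The three projections that replace the single Buying table.
cur.execute("CREATE TABLE buyer_vendor (buyer TEXT, vendor TEXT, PRIMARY KEY (buyer, vendor))")
cur.execute("CREATE TABLE buyer_item (buyer TEXT, item TEXT, PRIMARY KEY (buyer, item))")
cur.execute("CREATE TABLE vendor_item (vendor TEXT, item TEXT, PRIMARY KEY (vendor, item))")

cur.executemany("INSERT OR IGNORE INTO buyer_vendor VALUES (?, ?)", [(b, v) for b, v, _ in rows])
cur.executemany("INSERT OR IGNORE INTO buyer_item VALUES (?, ?)", [(b, i) for b, _, i in rows])
cur.executemany("INSERT OR IGNORE INTO vendor_item VALUES (?, ?)", [(v, i) for _, v, i in rows])

# Joining the three projections back on common keys reconstructs the original rows.
joined = cur.execute("""
    SELECT bv.buyer, bv.vendor, bi.item
    FROM buyer_vendor bv
    JOIN buyer_item  bi ON bi.buyer  = bv.buyer
    JOIN vendor_item vi ON vi.vendor = bv.vendor AND vi.item = bi.item
""").fetchall()
print(sorted(joined) == sorted(rows))
```

For this particular data the join dependency holds, so the three-way join is lossless; in general, 5NF requires that every such join dependency follows from the candidate keys.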
(b) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Figure: IMS architecture. Each application program (Application A, Application B) is written in a host language with DL/I calls and runs against its own PSB (PSB-A, PSB-B); each PSB is a set of PCBs, and the IMS control program maps the PCBs onto the DBDs that define the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The collection of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example:
1  DBD    NAME=EDUCPDBD
2  SEGM   NAME=COURSE,BYTES=256
3  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD  NAME=TITLE,BYTES=33,START=4
5  FIELD  NAME=DESCRIPN,BYTES=220,START=37
6  SEGM   NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD  NAME=TITLE,BYTES=33,START=4
9  SEGM   NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD  NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD  NAME=LOCATION,BYTES=12,START=7
12 FIELD  NAME=FORMAT,BYTES=2,START=19
13 SEGM   NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD  NAME=NAME,BYTES=18,START=7
16 SEGM   NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD  NAME=NAME,BYTES=18,START=7
19 FIELD  NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers using a host language, from which the IMS data manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(c) Explain the following:
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency, that they hold for all time, and that they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
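As a minimal sketch (sample relation and attribute names are invented for illustration), a functional dependency X → Y holds in a relation exactly when no two tuples agree on X but differ on Y:

```python
# Check whether a functional dependency lhs -> rhs holds in a list of tuples (dicts).
# The dependency holds when no two rows agree on the lhs attributes
# while differing on the rhs attributes.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

# Illustrative staff relation (not from the original answer).
staff = [
    {"staffNo": "S1", "name": "Ann", "branch": "B1"},
    {"staffNo": "S2", "name": "Bob", "branch": "B1"},
    {"staffNo": "S3", "name": "Ann", "branch": "B2"},
]

print(fd_holds(staff, ["staffNo"], ["name"]))  # staffNo -> name holds: True
print(fd_holds(staff, ["name"], ["branch"]))   # name -> branch fails (two Anns): False
```

Note that a check like this only shows a dependency holds *for the sample data*; a real functional dependency is a statement about all possible states of the relation.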
(d) Explain 4NF with examples.
Ans: Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Practice in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, each of which corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form it must be in BCNF, and one of these conditions must hold: either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes are dependent between themselves. Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
Q5
Either
(a) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database); an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions and account entries.
(c) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups take, and how great your risk of data loss is when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
the speed and size of your transaction log backups; and
the degree to which you are at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified point in time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
Q3
EITHER
(a) Explain tuple relational calculus
Ans
Relational Calculus
A relational calculus query specifies what is to be retrieved rather than how to retrieve it; there is no description of how to evaluate the query.
In first-order logic (or predicate calculus), a predicate is a truth-valued function with arguments. When we substitute values for the arguments, the function yields an expression, called a proposition, which can be either true or false.
If a predicate contains a variable (e.g. 'x is a member of staff'), there must be a range for x. When we substitute some values of this range for x, the proposition may be true; for other values it may be false.
When applied to databases, relational calculus has two forms: tuple and domain.
Tuple Relational Calculus
We are interested in finding tuples for which a predicate is true, based on the use of tuple variables. A tuple variable is a variable that 'ranges over' a named relation, i.e. a variable whose only permitted values are tuples of the relation.
We specify the range of a tuple variable S as the Staff relation as:
Staff(S)
To find the set of all tuples S such that P(S) is true:
{S | P(S)}
Tuple Relational Calculus - Example
To find the details of all staff earning more than $10,000:
{S | Staff(S) ∧ S.salary > 10000}
To find a particular attribute, such as salary, write:
{S.salary | Staff(S) ∧ S.salary > 10000}
Tuple Relational Calculus
We can use two quantifiers to tell how many instances the predicate applies to:
the existential quantifier ∃ ('there exists'), and
the universal quantifier ∀ ('for all').
Tuple variables qualified by ∀ or ∃ are called bound variables; otherwise they are called free variables.
The existential quantifier is used in formulae that must be true for at least one instance, such as:
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
This means 'there exists a Branch tuple with the same branchNo as the branchNo of the current Staff tuple S, and which is located in London'.
The universal quantifier is used in statements about every instance, such as:
(∀B)(B.city ≠ 'Paris')
This means 'for all Branch tuples, the address is not in Paris'.
We can also use ~(∃B)(B.city = 'Paris'), which means 'there are no branches with an address in Paris'.
Tuple Relational Calculus
Formulae should be unambiguous and make sense. A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, their disjunction F1 ∨ F2, and the negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000:
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow:
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
Expressions can generate an infinite set. For example:
{S | ~Staff(S)}
To avoid this, we add the restriction that all values in the result must be values in the domain of the expression.
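As a sketch (schema and sample data invented for illustration), the tuple-calculus query {S | Staff(S) ∧ S.salary > 10000} corresponds directly to a declarative SQL SELECT:

```python
import sqlite3

# Tiny Staff relation, invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Staff (staffNo TEXT, fName TEXT, position TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?, ?)",
    [("S1", "Ann", "Manager", 30000),
     ("S2", "Bob", "Assistant", 9000),
     ("S3", "Cid", "Manager", 12000)],
)

# {S | Staff(S) AND S.salary > 10000} -- whole tuples
rich = cur.execute("SELECT * FROM Staff WHERE salary > 10000").fetchall()

# {S.salary | Staff(S) AND S.salary > 10000} -- a single attribute
salaries = [r[0] for r in cur.execute("SELECT salary FROM Staff WHERE salary > 10000")]
print(len(rich), sorted(salaries))
```

Like the calculus expression, the query says only *what* tuples qualify; the DBMS decides how to evaluate it.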
Data Manipulations in SQL
SELECT, UPDATE, DELETE and INSERT statements
Basic data retrieval
Condition specification
Arithmetic and aggregate operators
SQL joins: multiple-table queries
Set manipulation: ANY, IN, CONTAINS, ALL, NOT IN, NOT CONTAINS, EXISTS, UNION, MINUS, INTERSECT
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later using INSERT.
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) )
A table name and unique column names must be specified. Columns which are defined as primary keys will never have two rows with the same key value. A primary key may consist of more than one column (values unique in combination); this is called a composite key.
(b) Explain data manipulation in SQL.
Ans:
A data manipulation language (DML) is a computer programming language used for adding (inserting), deleting and modifying (updating) data in a database. A DML is often a sublanguage of a broader database language such as SQL, with the DML comprising some of the operators in the language.[1] Read-only selecting of data is sometimes distinguished as being part of a separate data query language (DQL), but it is closely related and sometimes also considered a component of a DML; some operators may perform both selecting (reading) and writing.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in a relational database.[2] Other forms of DML are those used by IMS/DL/I, and by CODASYL databases such as IDMS, among others.
In SQL, the data manipulation language comprises the SQL-data change statements,[3] which modify stored data but not the schema or database objects. Manipulation of persistent database objects, e.g. tables or stored procedures, via the SQL schema statements,[3] rather than the data stored within them, is considered to be part of a separate data definition language (DDL). In SQL these two categories are similar in their detailed syntax, data types, expressions, etc., but distinct in their overall function.[3]
The SQL-data change statements are a subset of the SQL-data statements; this set also contains the SELECT query statement,[3] which, strictly speaking, is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT INTO form combines both selection and manipulation, and thus is strictly considered to be DML, because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00');
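A minimal sketch of these verbs in action, using SQLite (table and data are illustrative, not from the original answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT INTO ... VALUES ...
cur.execute("INSERT INTO employees (first_name, last_name, fname) "
            "VALUES ('John', 'Capita', 'xcapit00')")

# UPDATE ... SET ... WHERE ...
cur.execute("UPDATE employees SET last_name = 'Capital' WHERE fname = 'xcapit00'")

# SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
row = cur.execute("SELECT first_name, last_name FROM employees "
                  "WHERE fname = 'xcapit00'").fetchone()
print(row)  # ('John', 'Capital')

# DELETE FROM ... WHERE ...
cur.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
count = cur.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count)  # 0
```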
OR
(c) Explain the following integrity rules:
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs apply these rules automatically, but it is safer to make sure that the rules are already applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules, but are pertinent to database designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is for each row to have a unique identity, so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition on the input relation(s), and only the selected rows are then used in the cross product to be merged and included in the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation; here, only selected rows of a relation are crossed with the second relation. It is denoted R ⋈θ S.
If R and S are two relations, then θ is the condition which is applied for the select operation on one relation, after which only the selected rows are crossed with all the rows of the second relation. For example, given two relations FACULTY and COURSE, we first apply a select operation on the FACULTY relation to select certain specific rows; these rows then form a cross product with the COURSE relation. This is the difference between the cross product and the theta join.
Seeing both relations, their attributes, and the cross product after the select operation makes the difference between cross product and theta join clear.
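A sketch in SQLite (FACULTY and COURSE schemas invented for illustration): the theta join is simply the cross product restricted by a condition:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE faculty (fac_id INTEGER, dept TEXT)")
cur.execute("CREATE TABLE course (course_id TEXT, fac_id INTEGER)")
cur.executemany("INSERT INTO faculty VALUES (?, ?)", [(1, "CS"), (2, "EE")])
cur.executemany("INSERT INTO course VALUES (?, ?)", [("C1", 1), ("C2", 1), ("C3", 2)])

# Cross product: every faculty row paired with every course row (2 x 3 = 6 rows).
cross = cur.execute("SELECT * FROM faculty, course").fetchall()

# Theta join: the cross product restricted by the condition f.fac_id = c.fac_id.
theta = cur.execute(
    "SELECT f.fac_id, f.dept, c.course_id "
    "FROM faculty f, course c WHERE f.fac_id = c.fac_id"
).fetchall()
print(len(cross), len(theta))  # 6 3
```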
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records. Otherwise, we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So, referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table.
Changing values in a primary table that result in orphaned records in a related table.
Deleting records from a primary table if there are matching related records.
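SQLite can demonstrate this enforcement; the schema below is invented for illustration (note that SQLite only checks foreign keys when the pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is set
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, "
    "company_id INTEGER REFERENCES company(company_id))"
)
conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES (10, 1)")  # valid: parent row exists

# Adding a related record with no associated primary record is rejected.
try:
    conn.execute("INSERT INTO product VALUES (11, 15)")  # no company 15
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False

# Deleting a primary record that has matching related records is rejected too.
try:
    conn.execute("DELETE FROM company WHERE company_id = 1")
    delete_allowed = True
except sqlite3.IntegrityError:
    delete_allowed = False

print(orphan_allowed, delete_allowed)  # False False
```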
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
(d) Explain the following in detail, with examples:
(i) Domain
Ans: Definition: the domain of a database attribute is the set of all allowable values that attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type, and allowed to have one of two known code values: M for male, F for female, and NULL for records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value, excluding NULL. Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check constraint or, in more complex cases, a database trigger. For example, a column requiring positive numeric values may have a check constraint declaring that the values must be greater than zero.
This definition combines the concept of domain as an area over which control is exercised with the mathematical idea of a set of values of an independent variable for which a function is defined.
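Domain rules of this kind can be sketched as check constraints in SQLite (table and column names invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clauses restrict each column to its domain.
conn.execute(
    "CREATE TABLE person ("
    "  name   TEXT,"
    "  gender TEXT CHECK (gender IN ('M', 'F')),"
    "  age    INTEGER CHECK (age > 0)"
    ")"
)
conn.execute("INSERT INTO person VALUES ('Ann', 'F', 34)")  # within both domains

rejected = []
for row in [("Bob", "X", 40), ("Cid", "M", -1)]:  # 'X' and -1 fall outside the domains
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

print(rejected)  # ['Bob', 'Cid']
```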
(ii) Degree and cardinality
The degree of a relationship (also known as its cardinality) is the number of occurrences in one entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last is written M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; if one does, you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee. Therefore, there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department, but a department has many employees. Therefore, there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity. The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur because an entity has been missed. For example, an employee may work on several projects at the same time, and a project has a team of many employees. Therefore, there is a many-to-many relationship between employee and project.
Q4
EITHER
(a) Explain DBTG data manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the requirements of many distinct programming languages, not just COBOL; the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
The DBTG model is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of a conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the figure. It can be divided into three different levels, as in the architecture of a database system:
- Storage schema (corresponds to the internal view of the database)
- Schema (corresponds to the conceptual view of the database)
- Subschema (corresponds to the external view of the database)
Storage Schema
The storage structure (internal view) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG the conceptual view is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data items they contain, and the sets into which they are grouped. (Here, logical record types are referred to as record types; the fields in a logical record format are called data items.)
Subschema
The external view (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data item and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary programming language, such as COBOL, that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema; using the COBOL Data Base Facility, for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data item) defined in the subschema. The program may refer to these data item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Practice in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table, transform the data from the information source (e.g. a form) into table format, with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value. If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group, either by entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table), or by placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
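As a sketch of these steps, consider a hypothetical 1NF relation with composite key (student_id, course); student_name depends on student_id alone, a partial dependency that is removed by placing it in a new relation together with a copy of its determinant:

```python
# Hypothetical 1NF relation: key is (student_id, course), but
# student_name is determined by student_id alone (a partial dependency).
grades_1nf = [
    (1, "DBMS", "Asha", "A"),
    (1, "OS",   "Asha", "B"),
    (2, "DBMS", "Ravi", "A"),
]

# Remove the partial dependency: student_name moves to a new relation
# together with a copy of its determinant, student_id.
student = sorted({(sid, name) for sid, _, name, _ in grades_1nf})
grade = [(sid, course, g) for sid, course, _, g in grades_1nf]
```

Both resulting relations are in 2NF: every non-key attribute now depends on the whole of its relation's key.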
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c)Explain multivalued dependency with suitable example
As normalization proceeds relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies
Ans
1. NF²: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in 4NF
if and only if it is in BCNF and every non-trivial multivalued dependency in it is in fact a
functional dependency. 4NF thus removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
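A suitable example, sketched in Python: in the classic relation (course, teacher, book), the teachers and recommended books of a course are independent of each other, giving course →→ teacher and course →→ book. The relation contents below are hypothetical; the check applies the MVD definition directly:

```python
from itertools import product

# Hypothetical relation (course, teacher, book): the teachers and the
# recommended books for a course are independent of each other, so the
# relation must hold their full cross product.
r = {("DBMS", t, b)
     for t, b in product(["Smith", "Jones"], ["Ullman", "Date"])}

def holds_mvd(rel):
    """course ->-> teacher: for every pair of tuples agreeing on course,
    the tuple mixing t1's teacher with t2's book must also be present."""
    return all((t1[0], t1[1], t2[2]) in rel
               for t1, t2 in product(rel, rel) if t1[0] == t2[0])
```

Removing any one tuple from the cross product breaks the dependency, which is exactly the redundancy 4NF decomposition eliminates.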
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrongrsquos Axioms)
An inference axiom is a rule that states if a relation satisfies certain FDs then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – given
2. Street Zip → Street City – augmentation of (1) by Street
3. City Street → Zip – given
4. City Street → City Street Zip – augmentation of (3) by City Street
5. Street Zip → City Street Zip – transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I, J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F:
1. AB → E – given
2. AB → AB – reflexivity
3. AB → B – projectivity from (2)
4. AB → BE – additivity from (1) and (3)
5. BE → I – given
6. AB → I – transitivity from (4) and (5)
7. E → G – given
8. AB → G – transitivity from (1) and (7)
9. AB → GI – additivity from (6) and (8)
10. GI → H – given
11. AB → H – transitivity from (9) and (10)
12. AB → GH – additivity from (8) and (11)
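A practical way to verify such a derivation is to compute the attribute closure. The sketch below is a standard closure algorithm (not from the source) applied to Maier's example; since G and H appear in the closure of AB, AB → GH holds:

```python
def closure(attrs, fds):
    """Compute the closure X+ of an attribute set under a list of
    functional dependencies given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, pull in the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Maier's example: F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H}
F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
     ({"E"}, {"G"}), ({"G", "I"}, {"H"})]
```

This is the same reasoning as the step-by-step axiom proof, mechanized: each loop pass applies transitivity/additivity until nothing new can be added.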
Significance in relational database design: A relational database is a database structure, commonly used in
GIS, in which data is stored in two-dimensional tables where multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System (RDBMS) is a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage. The inference axioms are significant
here because they let the designer compute attribute closures, derive all functional dependencies implied
by a given set, identify candidate keys, and test decompositions for normal forms such as 3NF and BCNF.
A database that is perceived by the user as a collection of two-dimensional tables
• Are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases. Proposed by Dr Codd in 1970
• The basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• Collection of objects or relations
• Set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that
is locked by the other. It can be dealt with in two ways: measures can be set which
prevent deadlocks from happening, or ways can be set in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order, which prevents circular waits. Essentially,
once a deadlock does occur the DBMS must have a method for detecting the deadlock;
to resolve it, the DBMS must select a transaction to cancel and revert that entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it
should be atomic: it should either be complete or not happen at all; there should not
be anything like a semi-complete transaction. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let us
say I am processing a change to the tax rate in my state: my store clerk should not be able
to read the total cost of a blue shirt, because the total cost row is affected by any change in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
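The resource-access-order prevention strategy described above can be sketched as follows (the lock ids and helper function are hypothetical; the point is that every transaction acquires locks in one fixed global order, so a circular wait can never form):

```python
import threading

# Fixed global order over lock ids: all transactions acquire in sorted order.
locks = {1: threading.Lock(), 2: threading.Lock()}

def run_locked(ids, action):
    """Acquire all needed locks in a fixed (sorted) order up front,
    run the work, then release in reverse order."""
    held = []
    try:
        for i in sorted(ids):
            locks[i].acquire()
            held.append(i)
        return action()
    finally:
        for i in reversed(held):
            locks[i].release()
```

Two transactions that *request* locks in opposite orders, e.g. `run_locked([1, 2], ...)` and `run_locked([2, 1], ...)`, still *acquire* them in the same sorted order, so neither can end up waiting for the other in a cycle.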
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment where multiple transactions can be executed
simultaneously it is highly important to control the concurrency of transactions We have
concurrency control protocols to ensure atomicity isolation and serializability of concurrent
transactions Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
Binary Locks – A lock on a data item can be in two states; it is either locked or
unlocked
Shared/exclusive – This type of locking mechanism differentiates the locks based on
their uses If a lock is acquired on a data item to perform a write operation it is an
exclusive lock Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state Read locks are shared because no data
value is being changed
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed Transactions may unlock the data item after completing the
'write' operation
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
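The shared/exclusive compatibility rules these lock protocols rely on can be sketched minimally as below (the class and method names are invented for illustration; a real lock manager would also queue and block waiting transactions rather than just refuse):

```python
class LockTable:
    """Minimal shared/exclusive lock table: many readers may share an
    item; a writer needs exclusive access to it."""
    def __init__(self):
        self.readers = {}   # item -> set of txn ids holding S (read) locks
        self.writer = {}    # item -> txn id holding the X (write) lock

    def lock_s(self, txn, item):
        # A shared lock is refused only if another txn holds the X lock.
        if self.writer.get(item) not in (None, txn):
            return False
        self.readers.setdefault(item, set()).add(txn)
        return True

    def lock_x(self, txn, item):
        # An exclusive lock is refused if any other txn holds any lock.
        others = self.readers.get(item, set()) - {txn}
        if others or self.writer.get(item) not in (None, txn):
            return False
        self.writer[item] = txn
        return True
```

Read locks are compatible with each other but not with a write lock, which is exactly the inconsistency-avoidance rule stated above.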
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it and the ordering is determined by the age
of the transaction. A transaction created at 00:02 clock time would be older than all other
transactions that come after it. For example, any transaction y entering the system at 00:04 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
OR
(b) Explain database security mechanisms
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well On the other hand the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any
practical database has a mix of READ and WRITE operations, and
hence the concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails.
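The bank-transfer example can be sketched as an all-or-nothing operation that rolls back to a snapshot on any failure (a simplification: a real DBMS uses write-ahead logs rather than in-memory snapshots; the account data is hypothetical):

```python
accounts = {"A": 100, "B": 50}

def transfer(src, dst, amount):
    """All-or-nothing transfer: on any failure the snapshot is restored,
    so no partially updated state is ever left visible."""
    snapshot = dict(accounts)
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)   # roll back: atomicity preserved
        raise
```

After a successful call both operations have happened; after a failed call neither has, so the total amount of money is conserved in either case.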
(B) Give the three level architecture proposal for DBMS
Ans: Objectives of the three level architecture proposal for DBMS:
All users should be able to access the same data
A user's view is immune to changes made in other views
Users should not need to know physical database storage details
The DBA should be able to change database storage structures without affecting the users' views
The internal structure of the database should be unaffected by changes to physical aspects of storage
The DBA should be able to change the conceptual structure of the database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
Above three points are explain in detail given bellow-
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database object while the data
manipulation language performs operations on these objects The data control language is used to
control the user's access to database objects
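The DDL/DML split can be illustrated with Python's built-in sqlite3 module (SQLite is used only as a convenient, self-contained example; being an embedded engine with no user accounts, it has no DCL statements such as GRANT/REVOKE, so DCL appears only as a comment):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# DDL: define and declare the database object.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
cur.execute("INSERT INTO student VALUES (1, 'Asha')")
rows = cur.execute("SELECT name FROM student").fetchall()

# DCL (e.g. GRANT SELECT ON student TO clerk) would control access
# rights in a client/server DBMS; SQLite has no users to grant to.
```
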
Conceptual Level - This level comes between the external and the internal levels The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data 'as it really is'. The user's view of the data is constrained by the language
that they are using. At the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System
The main functions of the Data Manager are:
Convert operations in users Queries coming from the application programs or combination of
DML Compiler and Query optimizer which is known as Query Processor from users logical view
to physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of attributes of each table, length of attributes, and
number of rows in each table
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data
definitions are changed
3. Constraints on data, i.e. the range of values permitted
4. Detailed information on physical database design, such as storage structure, access
paths, and file and record sizes
5. Access authorization - the description of database users, their responsibilities and
their access rights
6. Usage statistics, such as frequency of queries and transactions
The data dictionary is used to actually control the data integrity, database operation
and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary - the data dictionary is necessary in databases due to the
following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system
It helps in documenting the database design process by storing documentation of
the result of every design phase and design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML complier converts the high level Queries into low level file access
commands known as compiled DML
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: There are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another, e.g. from optical to magnetic storage, or from tape to disk.
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system
Every user group maintains its own files for handling its data files This may lead to
• Duplication of same data in different files
• Wastage of storage space since duplicated data is stored
• Errors may be generated due to duplication of the same data in different files
• Time in entering data again and again is wasted
• Computer Resources are needlessly used
• It is very difficult to combine information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system So changes made in one file may be necessary be carried over to
another file This may lead to inconsistent data So we need to remove this duplication of
data in multiple file to eliminate inconsistency
3 Better service to the users - A DBMS is often used to provide better services to the users In
conventional system availability of information is often poor since it normally difficult to
obtain information that the existing systems were not designed for Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests
Centralizing the data in the database also means that user can obtain new and combined
information easily that would have been impossible to obtain otherwise Also use of DBMS
should allow users that don't know programming to interact with the data more easily, unlike
file processing system where the programmer may need to write new programs to meet every
new demand
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system these changes are made more easily in a centralized database
than in a conventional system Applications programs need not to be changed on changing the
data in the database
5 Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to incorrect data being entered in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the
purpose of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data, and in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, as the most important. Once a database has been set up with centralized control, it
becomes necessary to identify the organization's requirements and to balance the needs of
the competing units. It may even become necessary to ignore some requests for information
if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The ER model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process in which all business managers (or their designates) should be
involved, and it should be validated with a "bottom-up" approach. It has three primary
components: entity, relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship
between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds, connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------- M    Many to one: M ------- 1
Many to many: M ------- N
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where
street is itself composite: (street_name, street_number, apartment_number).
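One plausible mapping of this entity to relations, sketched in Python with SQLite (table and column names here are illustrative, not prescribed by the text): composite attributes such as name and address flatten into individual columns, while a multivalued attribute such as phone_number moves to its own table.

```python
import sqlite3

# Hypothetical mapping of the Customer entity: composite attributes are
# flattened into columns; the multivalued phone_number gets its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT,
    middle_name   TEXT,
    last_name     TEXT,
    date_of_birth TEXT,
    city          TEXT,
    state         TEXT,
    zip_code      TEXT,
    street_name   TEXT,
    street_number TEXT
);
CREATE TABLE customer_phone (        -- one row per phone number per customer
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Asha', 'Rao')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0101')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0102')")
phones = [r[0] for r in conn.execute(
    "SELECT phone_number FROM customer_phone "
    "WHERE customer_id = 1 ORDER BY phone_number")]
print(phones)  # one customer, several phone numbers
```

A derived attribute (e.g. age) would not be stored at all but computed from date_of_birth at query time.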
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans In sequential files, index sequential files and direct files we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we can get the set of
records which satisfy the given value.
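The student-file example above can be sketched with SQLite (the table and sample data are illustrative assumptions): an index on the secondary key "stud_name" supports retrieval, and one key value matches several records.

```python
import sqlite3

# Sketch: retrieval on a secondary key can return many records,
# unlike the primary key, which identifies exactly one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, "
             "stud_name TEXT, branch TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, 'Amit', 'MCA'), (2, 'Priya', 'MCA'), (3, 'Amit', 'MBA')])
conn.execute("CREATE INDEX idx_stud_name ON student(stud_name)")  # secondary index
rows = conn.execute("SELECT roll_no FROM student "
                    "WHERE stud_name = 'Amit' ORDER BY roll_no").fetchall()
print(len(rows))  # multiple records satisfy one secondary-key value
```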
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed thus: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to
determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
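The decomposition just described can be sketched with SQLite: the Buying table from the sample data above is projected onto its three attribute pairs, and the natural join of all three projections reconstructs the original rows (the join dependency). Table names follow the text; the code itself is an illustration, not part of the original answer.

```python
import sqlite3

# 5NF sketch: Buying(buyer, vendor, item) is split into three pairwise
# projections; joining all three back together reconstructs the original.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT)")
data = [('Sally', 'Liz Claiborne', 'Blouses'),
        ('Mary',  'Liz Claiborne', 'Blouses'),
        ('Sally', 'Jordach',       'Jeans'),
        ('Mary',  'Jordach',       'Jeans'),
        ('Sally', 'Jordach',       'Sneakers')]
conn.executemany("INSERT INTO buying VALUES (?, ?, ?)", data)
conn.executescript("""
CREATE TABLE buyer_vendor AS SELECT DISTINCT buyer, vendor FROM buying;
CREATE TABLE buyer_item   AS SELECT DISTINCT buyer, item   FROM buying;
CREATE TABLE vendor_item  AS SELECT DISTINCT vendor, item  FROM buying;
""")
# Natural join of the three projections on their common columns.
rejoined = conn.execute("""
    SELECT DISTINCT bv.buyer, bv.vendor, vi.item
    FROM buyer_vendor bv
    JOIN vendor_item vi ON bv.vendor = vi.vendor
    JOIN buyer_item  bi ON bi.buyer = bv.buyer AND bi.item = vi.item
    ORDER BY bv.buyer, bv.vendor, vi.item
""").fetchall()
print(len(rejoined))  # same five rows as the original table
```

Note that after the split, recording "Claiborne sells Jeans" takes one row in Vendor-Item instead of one row per buyer.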
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Figure: IMS system architecture. Application A and Application B, each written in a host
language + DL/I, access the IMS control program through their own program specification
blocks (PSB-A, PSB-B), each consisting of PCBs; the control program in turn uses the DBDs
to reach the stored databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also given in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description) Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD    NAME=EDUCPDBD
2  SEGM   NAME=COURSE,BYTES=256
3  FIELD  NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD  NAME=TITLE,BYTES=33,START=4
5  FIELD  NAME=DESCRIPN,BYTES=220,START=37
6  SEGM   NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD  NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD  NAME=TITLE,BYTES=33,START=4
9  SEGM   NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD  NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD  NAME=LOCATION,BYTES=12,START=7
12 FIELD  NAME=FORMAT,BYTES=2,START=19
13 SEGM   NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD  NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD  NAME=NAME,BYTES=18,START=7
16 SEGM   NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD  NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD  NAME=NAME,BYTES=18,START=7
19 FIELD  NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block) Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block) The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are
supported via user-written on-line application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency The value of one attribute (the determinant)
determines the value of another attribute
Candidate Key A possible key
Each non-key field is functionally dependent on every candidate key
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of functional dependencies used in normalization: they have a 1:1
relationship between the attribute(s) on the left- and right-hand sides of the dependency,
they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is smaller than
the complete set of functional dependencies (Y) for that relation and has the property that
every functional dependency in Y is implied by the functional dependencies in X.
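Whether a dependency X → Y holds in a given relation instance can be checked mechanically: no two tuples may agree on X yet disagree on Y. The helper below is an illustrative sketch (the function name and sample data are assumptions, not from the text):

```python
# Sketch: X -> Y holds in a relation instance when no two tuples
# agree on the X attributes but disagree on the Y attributes.
def holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)  # determinant value
        y = tuple(row[a] for a in rhs)  # dependent value
        if x in seen and seen[x] != y:
            return False                # same X, different Y: FD violated
        seen[x] = y
    return True

staff = [
    {'staffNo': 'S1', 'branchNo': 'B1', 'city': 'London'},
    {'staffNo': 'S2', 'branchNo': 'B1', 'city': 'London'},
    {'staffNo': 'S3', 'branchNo': 'B2', 'city': 'Glasgow'},
]
print(holds(staff, ['staffNo'], ['branchNo']))  # staffNo determines branchNo
print(holds(staff, ['city'], ['staffNo']))      # same city, different staff
```

Note this only tests one instance; a real FD is a constraint that must hold for all time, as stated above.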
(D) Explain 4 NF with examples
Ans Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest
normal-form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and
meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all of its multi-valued dependencies are functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
there is no multivalued dependency in the relation; or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it deals with
multivalued dependencies.
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will
determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred If your log file is available after the failure you can restore up to the last
transaction committed.
The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
It logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under
this model SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system
Ans
Staff(S) ∧ (∃B)(Branch(B) ∧ (B.branchNo = S.branchNo) ∧ B.city = 'London')
means 'There exists a Branch tuple with the same branchNo as the branchNo of the current
Staff tuple S, and it is located in London.'
Tuple Relational Calculus
The universal quantifier is used in statements about every instance, such as
(∀B)(B.city ≠ 'Paris')
which means 'For all Branch tuples, the address is not in Paris.'
We can also use ~(∃B)(B.city = 'Paris'), which means 'There are no branches with an
address in Paris.'
Tuple Relational Calculus
Formulae should be unambiguous and make sense.
A (well-formed) formula is made out of atoms:
R(Si), where Si is a tuple variable and R is a relation
Si.a1 θ Sj.a2
Si.a1 θ c
We can recursively build up formulae from atoms:
An atom is a formula.
If F1 and F2 are formulae, so are their conjunction F1 ∧ F2, disjunction
F1 ∨ F2 and negation ~F1.
If F is a formula with free variable X, then (∃X)(F) and (∀X)(F) are also
formulae.
Example - Tuple Relational Calculus
a) List the names of all managers who earn more than $25,000.
{S.fName, S.lName | Staff(S) ∧ S.position = 'Manager' ∧ S.salary > 25000}
b) List the staff who manage properties for rent in Glasgow.
{S | Staff(S) ∧ (∃P)(PropertyForRent(P) ∧ (P.staffNo = S.staffNo) ∧ P.city = 'Glasgow')}
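Query (a) above can be read as a filter over a set of tuples; a rough Python analogue, using illustrative tuples that are not part of the original text, makes the declarative reading concrete:

```python
# Rough analogue of query (a): the set of (fName, lName) pairs of Staff
# tuples whose position is 'Manager' and whose salary exceeds 25000.
staff = [
    {'fName': 'John',  'lName': 'White', 'position': 'Manager',   'salary': 30000},
    {'fName': 'Ann',   'lName': 'Beech', 'position': 'Assistant', 'salary': 12000},
    {'fName': 'Susan', 'lName': 'Brand', 'position': 'Manager',   'salary': 24000},
]
result = [(s['fName'], s['lName'])
          for s in staff
          if s['position'] == 'Manager' and s['salary'] > 25000]
print(result)  # only managers earning more than 25000
```

The comprehension mirrors the calculus: the generator ranges over Staff(S), and the `if` clause is the conjunction of predicates.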
Tuple Relational Calculus
Expressions can generate an infinite set For example
S | ~Staff(S)
To avoid this add restriction that all values in result must be values in the domain
of the expression
Data Manipulations in SQL
Select Update Delete Insert Statement
Basic Data retrieval
Condition Specification
Arithmetic and Aggregate operators
SQL Join Multiple Table Queries
Set Manipulation
Any In Contains All Not In Not Contains Exists Union Minus Intersect
Categorization
Updates
Creating Tables
Empty tables are constructed using the CREATE TABLE statement
Data must be entered later using INSERT
CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) )
Creating Tables
A table name and unique column names must be specified
Columns which are defined as primary keys will never have two rows with the same key
value
A primary key may consist of more than one column (values unique in combination); this is
called a composite key.
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting) deleting and modifying (updating) data in a database A DML is often
a sublanguage of a broader database language such as SQL with the DML comprising some of
the operators in the language[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL) but it is closely related and sometimes also
considered a component of a DML some operators may perform both selecting (reading) and
writing
A popular data manipulation language is that of Structured Query Language (SQL) which is
used to retrieve and manipulate data in a relational database[2] Other forms of DML are those
used by IMSDLI CODASYL databases such as IDMS and others
In SQL the data manipulation language comprises the SQL-data change statements[3] which
modify stored data but not the schema or database objects Manipulation of persistent database
objects eg tables or stored procedures via the SQL schema statements[3] rather than the data
stored within them is considered to be part of a separate data definition language (DDL) In SQL
these two categories are similar in their detailed syntax data types expressions etc but distinct
in their overall function[3]
The SQL-data change statements are a subset of the SQL-data statements; the latter also contains
the SELECT query statement,[3] which strictly speaking is part of the DQL, not the DML. In
common practice, though, this distinction is not made, and SELECT is widely considered to be
part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data
change statements. The SELECT INTO form combines both selection and manipulation,
and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT ... FROM ... WHERE ... (strictly speaking DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example the command to insert a row into table employees
INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00')
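The DML verbs listed above can be exercised end to end with SQLite (the employees table mirrors the INSERT example; the UPDATE and DELETE values are illustrative assumptions):

```python
import sqlite3

# The DML verbs from the list above, run against a throwaway table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")
conn.execute("INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
             ('John', 'Capita', 'xcapit00'))
conn.execute("UPDATE employees SET last_name = ? WHERE fname = ?",
             ('Capital', 'xcapit00'))
row = conn.execute("SELECT first_name, last_name FROM employees "
                   "WHERE fname = 'xcapit00'").fetchone()
conn.execute("DELETE FROM employees WHERE fname = 'xcapit00'")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(row, remaining)  # the updated row, then the count after deletion
```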
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce
these rules automatically, but it is safer to make sure that the rules are
applied in the design. There are two types of integrity mentioned in
integrity rules: entity and referential. Two additional rules that aren't
necessarily included in integrity rules but are pertinent to database designs
are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that
is unique; this ensures that each row is uniquely identified by the primary
key. One requirement for entity integrity is that a primary key cannot have a
null value. The purpose of this integrity is for each row to have a unique
identity, so that foreign key values can properly reference primary key values.
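Both requirements, uniqueness and no nulls, can be seen enforced in SQLite (the company table is an illustrative assumption):

```python
import sqlite3

# Sketch: entity integrity means the primary key is unique and non-null;
# a declared PRIMARY KEY ... NOT NULL column rejects both violations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company "
             "(company_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO company VALUES ('C1', 'Acme')")
errors = []
for bad_row in [('C1', 'Duplicate'), (None, 'NullKey')]:
    try:
        conn.execute("INSERT INTO company VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as e:
        errors.append(type(e).__name__)
print(errors)  # both the duplicate key and the null key are rejected
```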
Theta Join
In a theta join we apply a condition on the input relation(s), and only the selected rows are
used in the cross product to be merged and included in the output. In a normal cross product,
all the rows of one relation are mapped/merged with all the rows of the second relation, but
here only selected rows of a relation enter the cross product with the second relation. It is
denoted R ⋈θ S.
If R and S are two relations, then θ is the condition applied in the select operation on one
relation, and then only the selected rows are cross-producted with all the rows of the second
relation. For example, given two relations FACULTY and COURSE, we first apply the select
operation on the FACULTY relation to select certain specific rows, and then these rows are
cross-producted with the COURSE relation; this is the difference between a cross product and
a theta join.
Looking at both relations, their different attributes, and finally the cross product after
carrying out the select operation on one relation, the difference between a cross product and
a theta join becomes clear.
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact.
So referential integrity requires that whenever a foreign key value is used it must reference a
valid existing primary key in the parent table
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record:
the related table would contain a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field), resulting in an "orphaned record".
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
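These three rules can be watched in action with SQLite, which enforces foreign keys once the corresponding pragma is enabled (the company/product tables are illustrative assumptions):

```python
import sqlite3

# Sketch of the rules above: with foreign keys enforced, an orphaning
# insert and an orphaning delete are both rejected.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY,
                      company_id INTEGER REFERENCES company(company_id));
INSERT INTO company VALUES (15, 'Acme');
INSERT INTO product VALUES (1, 15);
""")
violations = 0
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")  # no company 99 exists
except sqlite3.IntegrityError:
    violations += 1
try:
    conn.execute("DELETE FROM company WHERE company_id = 15")  # would orphan product 1
except sqlite3.IntegrityError:
    violations += 1
print(violations)  # both operations are refused
```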
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database, because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain the following in detail with examples:
(i) Domain
Ans Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL). Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules if database-enforced may be implemented through a check
constraint or in more complex cases in a database trigger For example a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero
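A database-enforced domain boundary of this kind can be sketched in SQLite with a CHECK constraint (the person table is an illustrative assumption):

```python
import sqlite3

# Sketch: a CHECK constraint restricts the gender column to its data
# domain {M, F}, while still permitting NULL for "unknown".
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F'))
)""")
conn.execute("INSERT INTO person VALUES ('Ann', 'F')")
conn.execute("INSERT INTO person VALUES ('Sam', NULL)")  # unknown: CHECK passes
try:
    conn.execute("INSERT INTO person VALUES ('Pat', 'X')")  # outside the domain
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # the out-of-domain value is refused
```

(In SQL, a CHECK whose expression evaluates to NULL is treated as satisfied, which is why the NULL row is accepted.)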
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
(For the last of these, M:N is the correct notation, not M:M.)
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A
one-to-one relationship rarely exists in practice, but it can; if it does, you may consider
combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities shown on the previous page, an employee
works in one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist in a finished design;
normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore there is a many-to-many relationship between employee and project.
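The three degrees can be seen in how tables are declared: a 1:M link is a foreign key on the "many" side, while an M:N link needs a junction table. A minimal SQLite sketch (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1:M -- one department has many employees (foreign key on the 'many' side)
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee   (emp_id  INTEGER PRIMARY KEY, name TEXT,
                         dept_id INTEGER REFERENCES department(dept_id));
-- M:N -- employee/project is resolved by a junction table works_on
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (emp_id  INTEGER REFERENCES employee(emp_id),
                       proj_id INTEGER REFERENCES project(proj_id),
                       PRIMARY KEY (emp_id, proj_id));

INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee   VALUES (10, 'Asha', 1), (11, 'Ravi', 1);
INSERT INTO project    VALUES (100, 'CRM'), (101, 'Audit');
INSERT INTO works_on   VALUES (10, 100), (10, 101), (11, 100);
""")
# Read the M:N relationship back through the junction table
per_project = dict(conn.execute(
    "SELECT title, COUNT(*) FROM works_on"
    " JOIN project USING (proj_id) GROUP BY title"))
print(per_project)
```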
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG proposal is intended to
meet the requirements of many distinct programming languages, not just COBOL; the user
in a DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
The DBTG proposal is based on the network model. In addition to proposing a formal notation
for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in the figure. It can be divided into three
different levels, as in the architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data
item) defined in the subschema. The program may refer to these data item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g. a form) into table format with columns
and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group by
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency:
if A and B are attributes of a relation, B is fully dependent on A if B is functionally
dependent on A but not on any proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
A partial functional dependency exists when one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies on the primary key exist, remove them by placing them in a new
relation along with a copy of their determinant.
Third Normal Form (3NF)
2NF and no transitive dependencies.
A transitive dependency is a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
if A, B and C are attributes of a relation such that A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
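The 1NF-to-2NF step can be illustrated with plain Python sets. Assume a hypothetical relation OrderLine(order_id, product_id, qty, product_name) whose key is {order_id, product_id}; product_name depends only on product_id, a partial dependency, so it is moved to its own relation together with a copy of its determinant:

```python
# 1NF rows: (order_id, product_id, qty, product_name)
rows = [
    (1, 101, 2, "Widget"),
    (1, 102, 1, "Gadget"),
    (2, 101, 5, "Widget"),
]

# 2NF decomposition: the partially dependent attribute product_name
# goes into a separate relation keyed by its determinant product_id.
order_line = {(o, p, q) for (o, p, q, _) in rows}      # key: {order_id, product_id}
product    = {(p, name) for (_, p, _, name) in rows}   # key: {product_id}

print(sorted(order_line))
print(sorted(product))
```

Note how "Widget" is now stored once instead of once per order line, which is exactly the redundancy 2NF removes.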
OR
(c) Explain multivalued dependency with suitable example.
Ans: As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute (or set of attributes) on which some other attribute is fully
functionally dependent
Fourth Normal Form (4NF)
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in
4NF if and only if it is in BCNF and its multivalued dependencies are functional
dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
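A multivalued dependency X ↠ Y holds when, for rows agreeing on X, the Y-values and the remaining attributes combine freely. A small sketch with an assumed relation R(course, teacher, book), where course ↠ teacher and course ↠ book:

```python
from itertools import product

# For course "db": every teacher is paired with every book.
rows = {
    ("db", "smith", "ullman"), ("db", "smith", "maier"),
    ("db", "jones", "ullman"), ("db", "jones", "maier"),
}

def mvd_holds(rows):
    """Check course ->> teacher: for each course, the (teacher, book)
    pairs must be the full cartesian product of the teachers and the
    books that occur with that course."""
    by_course = {}
    for course, teacher, book in rows:
        by_course.setdefault(course, set()).add((teacher, book))
    return all(
        pairs == set(product({t for t, _ in pairs}, {b for _, b in pairs}))
        for pairs in by_course.values())

print(mvd_holds(rows))  # True: the MVD holds, so for 4NF the relation
                        # should be split into (course, teacher) and (course, book)
```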
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must
satisfy certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
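Derivations like the one above can be checked mechanically by computing the closure of the left-hand side under F: AB → GH holds iff both G and H are in the closure of {A, B}. A small sketch:

```python
def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, the right side is too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Maier's example: F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H}
F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
     ({"E"}, {"G"}), ({"G", "I"}, {"H"})]
print(sorted(closure({"A", "B"}, F)))  # contains G and H, so AB -> GH follows
```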
Significance in Relational Database Design: the relational model is a database structure in
which data is stored in two-dimensional tables and multiple relationships between data
elements can be defined and established in an ad hoc manner. A Relational Database
Management System (RDBMS) is a database system made up of files with data elements in
two-dimensional arrays (rows and columns); it has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage. The inference axioms
matter in this setting because they let the designer derive all functional dependencies implied
by a given set, and hence identify keys and verify normal forms during design.
A relational database is perceived by the user as a collection of two-dimensional tables that:
• are manipulated a set at a time, rather than a record at a time;
• are manipulated using SQL.
• The model was proposed by Dr. Codd in 1970 and is the basis for the relational database
management system (RDBMS).
• The relational model contains the following components:
• a collection of objects or relations;
• a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be dealt with in two ways: one is to set measures
which prevent deadlocks from happening, and the other is to set ways in which to break the
deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to
request all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Once a
deadlock does occur, the DBMS must have a method for detecting the deadlock; then, to
resolve it, the DBMS must select a transaction to cancel and revert that entire transaction
until the resources required become available, allowing one transaction to complete while
the other has to be reprocessed at a later time.
9.21 Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or not happen at all; there should not be
anything like "semi-complete". The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions should
be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system
failures.
9.24 What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database.
For example, if I am changing a row which affects the calculations or outputs of several
other rows, then all rows that are affected, or possibly affected, by a change in the row I am
working on will be locked from changes until I am complete with my change. This isolates
the change and ensures that the data interaction remains accurate and consistent, and is
known as transaction-level consistency. The transaction being changed, which may affect
several other pieces of data or rows of input, could also affect how those rows are read. So
let's say I am processing a change to the tax rate in my state: my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
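The resource-ordering idea from the answer above (locks must always be taken in a fixed order, so no circular wait can form) can be sketched with two threads and two locks; the names are illustrative:

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
counter = 0

def transaction(n_times):
    """Every transaction acquires lock_a before lock_b, so two
    transactions can never end up waiting on each other in a cycle."""
    global counter
    for _ in range(n_times):
        with lock_a:
            with lock_b:
                counter += 1  # critical section touching both resources

t1 = threading.Thread(target=transaction, args=(1000,))
t2 = threading.Thread(target=transaction, args=(1000,))
t1.start(); t2.start()
t1.join(); t2.join()
print("counter =", counter)  # both transactions completed; no deadlock
```

If one thread instead took lock_b first, the classic deadlock (each holding one lock while waiting for the other) would become possible.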
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of
two kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive locks: this type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks
it needs beforehand. If all the locks are granted, the transaction executes and releases all the
locks when all its operations are over. If all the locks are not granted, the transaction rolls back
and waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases
its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks,
it only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction, and the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and
then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase,
the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock after using it: Strict-2PL holds all the locks until the commit point and releases
all the locks at one time.
Strict-2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at
the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it; for example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamps. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
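The basic timestamp-ordering rule built on those per-item timestamps can be sketched as follows: a read is rejected if a younger transaction already wrote the item, and a write is rejected if a younger transaction already read or wrote it. This is a simplified sketch of the textbook rule, not a full scheduler:

```python
class Item:
    """A data item carrying the latest read- and write-timestamps."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    if ts < item.write_ts:      # a younger transaction already wrote the item
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:  # a younger transaction got there first
        return "abort"
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, 5))  # ok: nothing newer has touched x
print(read(x, 3))   # abort: transaction 3 is older than the last writer (5)
print(read(x, 7))   # ok: 7 is younger than the last writer
```

Aborted transactions are typically restarted with a new, younger timestamp.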
OR
(b) Explain database security mechanisms. [8]
Database security covers and enforces security on all aspects and components of databases.
This includes:
Data stored in the database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database
administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge based database system in detail.
Ans:
The term knowledge base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: a conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: an essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: a corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal
representation for a knowledge base is an object model (often called an ontology in the
artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical
demand to store large amounts of data back to a permanent memory store. A more precise
statement would be that, given the technologies available, researchers compromised and did
without these capabilities because they realized they were beyond what could be expected, and
they could develop useful solutions to non-trivial problems without them. Even from the
beginning, the more astute researchers realized the potential benefits of being able to store,
analyze and reuse knowledge; for example, see the discussion of Corporate Memory in the
earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example to represent
the statement that all humans are mortal. A database typically could not represent this general
knowledge, but instead would need to store information in thousands of rows that
represented specific humans. Representing that all humans are mortal, and being able to
reason about any given human that they are mortal, is the work of a knowledge base;
representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc., is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate
environments, the requirements for their data storage rapidly started to overlap with the
standard database requirements for multiple, distributed users with support for transactions.
Initially, the demand could be seen in two different but competitive markets. From the AI
and object-oriented communities, object-oriented databases such as Versant emerged; these
were systems designed from the ground up to have support for object-oriented capabilities but
also to support standard database services as well. On the other hand, the large database
vendors such as Oracle added capabilities to their products that provided support for
knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge base was the Internet. With the rise of the Internet,
documents, hypertext and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents; this created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system, to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between
the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find
a system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 - 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any
practical database has a mix of READ and WRITE operations, and hence concurrency is a
challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account
B. Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails.
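The bank-transfer example can be sketched with SQLite, whose connection context manager commits on success and rolls back on error; the account names and balances are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY,"
             " balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

# Atomic transfer: both updates commit together or neither does.
try:
    with conn:  # commits the transaction on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 200"
                     " WHERE name = 'A'")  # would go negative: CHECK fails
        conn.execute("UPDATE account SET balance = balance + 200"
                     " WHERE name = 'B'")
except sqlite3.IntegrityError:
    pass  # the whole transfer is undone, not just the failing half

print(dict(conn.execute("SELECT name, balance FROM account")))
```

Money is neither lost nor created: after the failed transfer, both balances are exactly as they were.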
(B) Give the three level architecture proposal for DBMS.
Ans: Objectives of the three level architecture proposal for a DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The
user view is different from the way data is stored in the database; this view describes only a
part of the actual database. Because each user is not concerned with the entire database, only
the part that is relevant to the user is visible. For example, end users and application
programmers get different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-
generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is
used to control the user's access to database objects.
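The DDL/DML split can be seen in a few lines of SQLite (SQLite has no DCL such as GRANT/REVOKE, so only the first two sublanguages are shown; the student table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare the database object
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: operate on the object
conn.execute("INSERT INTO student (name) VALUES ('Asha')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE id = 1")
names = [row[0] for row in conn.execute("SELECT name FROM student")]
print(names)
```

In a server DBMS, a DCL statement such as GRANT SELECT ON student TO some_user would complete the trio.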
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level
is the view of the data "as it really is". The user's view of the data is constrained by the
language that they are using; at the conceptual level, the data is viewed without any of these
constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three level architecture proposal for a DBMS are explained above.
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update
and retrieval) on the database. The components of the DBMS perform these requested
operations on the database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions
specified in the DDL. It includes metadata information such as the names of the files, the data
items, storage details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete
and retrieve, from the application program are sent to the DML compiler for compilation into
object code for database access. The object code is then optimized in the best way to execute
the query by the query optimizer and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also
known as the Database Control System.
The main functions of the Data Manager are:
Convert operations in users' queries, coming from the application programs or the
combination of DML compiler and query optimizer (known as the Query Processor), from
the user's logical view to the physical file system.
Control DBMS information access that is stored on disk.
Handle buffers in main memory.
Enforce constraints to maintain consistency and integrity of the data.
Synchronize the simultaneous operations performed by concurrent users.
Control the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the
database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes,
and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data
definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access
paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their
access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation and
accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the
result of every design phase and the design decisions.
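Most engines expose their data dictionary as an ordinary queryable catalog. In SQLite, for instance, table definitions live in sqlite_master and column details behind PRAGMA table_info (the employee table here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY,"
             " name TEXT, salary REAL)")

# The catalog is itself data: table names...
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
# ...and per-column metadata (name, type, nullability, key membership)
columns = [r[1] for r in conn.execute("PRAGMA table_info(employee)")]
print(tables, columns)
```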
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts the high-level queries into low-level file
access commands, known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naiumlve User Naive users who need not have aware of the present of the database system or any other system A user of an automatic teller falls under this category The user is instructed through each step of a transaction he or she responds by pressing a coded key or entering a numeric value The operations that can be performed by this calls of users are very limited and affect a precise portion of the database in case of the user of the automatic teller machine only one or more of her or his own accounts Other such naive users are where the type and range of response is always indicated to the user Thus a very competent database designer could be allowed to use a particular database system only as a naive user
ii) Online users: These are users who communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e., program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, since such details are transparent to the user. Changes can be made to the data without affecting other components of the system - e.g., changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g., from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated when the same data is updated in different files
• Time wasted in entering the same data again and again
• Needless use of computer resources
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data, so we need to remove this duplication of data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, the availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its currency are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in a database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who do not know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may thus become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar service using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages developed with DBMSs than when using procedural languages.
10. Data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes for backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Building it is an iterative, team-oriented process, with all business managers (or designates) involved, and the model should be validated with a "bottom-up" approach. It has three primary components: entities, relationships and attributes. There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student entity's attributes: student ID, student name, address, etc.
Attributes are of various types
Simple/Single attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many   1 ------- M
Many-to-one   M ------- 1
Many-to-many  M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
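As an illustration, the Customer entity above can be mapped onto a relational table by flattening its composite attributes into their components. A sketch using SQLite; the column names follow the attribute list, and the sample row is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes (name, address, street) are flattened into components.
conn.execute("""CREATE TABLE customer (
                    customer_id   INTEGER PRIMARY KEY,
                    first_name    TEXT,
                    middle_name   TEXT,
                    last_name     TEXT,
                    phone_number  TEXT,
                    date_of_birth TEXT,
                    city TEXT, state TEXT, zip_code TEXT,
                    street_name TEXT, street_number TEXT, apartment_number TEXT)""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'John', 'Doe')")
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 1
```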
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we can get the set of records which satisfy the given value.
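The idea can be sketched with a small inverted index on "stud_name" (the student records are invented for illustration; a real system would keep an index structure on disk):

```python
from collections import defaultdict

students = [
    {"roll": 1, "stud_name": "Asha"},
    {"roll": 2, "stud_name": "Ravi"},
    {"roll": 3, "stud_name": "Asha"},
]

# Secondary index: stud_name -> list of matching records.
index = defaultdict(list)
for rec in students:
    index[rec["stud_name"]].append(rec)

# Unlike a primary-key lookup, a secondary-key lookup may return many records.
print([r["roll"] for r in index["Asha"]])  # [1, 3]
```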
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE3- EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot be losslessly decomposed into any number of smaller tables. Another way of expressing this is that every join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Liz Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
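The decomposition can be checked mechanically: project Buying onto the three pairwise tables and rejoin them. A sketch in Python over the sample data above (the join dependency holds here, so the rejoin is lossless):

```python
# The Buying(buyer, vendor, item) rows from the text's sample data.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three pairwise tables.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    if (v, i) in vendor_item
}

print(rejoined == buying)  # True: the three-way decomposition is lossless
```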
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Figure: IMS architecture. Application A and Application B, each written in a host language with embedded DL/I calls, communicate with the IMS control program through their program specification blocks (PSB-A, PSB-B), each made up of PCBs; the control program in turn works from the DBDs that define the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library, from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of functional dependencies used in normalization:
• They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency.
• They hold for all time.
• They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
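A functional dependency X → Y can be tested on sample data by checking that no two rows agree on X but differ on Y. A minimal sketch (the student rows are invented for illustration):

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # A key that maps to two different values violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

students = [
    {"id": 1, "name": "Asha", "city": "Nagpur"},
    {"id": 2, "name": "Ravi", "city": "Pune"},
    {"id": 3, "name": "Asha", "city": "Mumbai"},
]
print(fd_holds(students, ["id"], ["name"]))    # True: id determines name
print(fd_holds(students, ["name"], ["city"]))  # False: Asha maps to two cities
```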
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that it meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; here we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format, and also less vulnerable to update anomalies.
• NF2: non-first normal form
• 1NF: R is in 1NF iff all domain values are atomic
• 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
• 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
• BCNF: R is in BCNF iff every determinant is a candidate key
• Determinant: an attribute on which some other attribute is fully functionally dependent

Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multi-valued dependencies are functional dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
Either of the following conditions must hold for a relation to be in fourth normal form:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it deals with multivalued dependencies.
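The effect of a multivalued dependency can be sketched with invented data: two independent multi-valued facts about an employee force a cross product in a single table, and decomposing on the MVD loses nothing:

```python
# A relation with two independent multi-valued facts about an employee
# (skills and languages) violates 4NF; the rows form a cross product.
emp = {
    ("Raj", "SQL",    "Hindi"),
    ("Raj", "SQL",    "English"),
    ("Raj", "Python", "Hindi"),
    ("Raj", "Python", "English"),
}

# Decompose on the MVD emp ->> skill (and, symmetrically, emp ->> language).
emp_skill = {(e, s) for e, s, l in emp}
emp_lang  = {(e, l) for e, s, l in emp}

# The natural join of the two projections reproduces the original relation.
rejoined = {(e, s, l) for (e, s) in emp_skill for (e2, l) in emp_lang if e2 == e}
print(rejoined == emp)  # True: the decomposition is lossless
```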
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility The model used will determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery is easier if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
• The speed and size of your transaction log backups
• The degree to which you are at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery model available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
• Database restoration up to any specified point in time can be achieved after media failure for a database file. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log and recover to a log mark.
• CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans
Creating Tables
Empty tables are constructed using the CREATE TABLE statement; data must be entered later using INSERT.

CREATE TABLE S ( SNO    CHAR(5),
                 SNAME  CHAR(20),
                 STATUS DECIMAL(3),
                 CITY   CHAR(15),
                 PRIMARY KEY (SNO) );
• A table name and unique column names must be specified.
• Columns which are defined as primary keys will never have two rows with the same key value.
• A primary key may consist of more than one column (values unique in combination); this is called a composite key.
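The statements above can be exercised with SQLite from Python; the duplicate-key insert shows the primary key doing its job (a sketch using the supplier table S from the example, with invented rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE S ( SNO    CHAR(5),
                     SNAME  CHAR(20),
                     STATUS DECIMAL(3),
                     CITY   CHAR(15),
                     PRIMARY KEY (SNO) )
""")
conn.execute("INSERT INTO S VALUES ('S1', 'Smith', 20, 'London')")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO S VALUES ('S1', 'Jones', 10, 'Paris')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```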
(b) Explain Data Manipulation in SQL
Ans
A data manipulation language (DML) is a computer programming language used for adding
(inserting) deleting and modifying (updating) data in a database A DML is often
a sublanguage of a broader database language such as SQL with the DML comprising some of
the operators in the language[1] Read-only selecting of data is sometimes distinguished as being
part of a separate data query language (DQL) but it is closely related and sometimes also
considered a component of a DML some operators may perform both selecting (reading) and
writing
A popular data manipulation language is that of Structured Query Language (SQL) which is
used to retrieve and manipulate data in a relational database[2] Other forms of DML are those
used by IMSDLI CODASYL databases such as IDMS and others
In SQL the data manipulation language comprises the SQL-data change statements[3] which
modify stored data but not the schema or database objects Manipulation of persistent database
objects eg tables or stored procedures via the SQL schema statements[3] rather than the data
stored within them is considered to be part of a separate data definition language (DDL) In SQL
these two categories are similar in their detailed syntax data types expressions etc but distinct
in their overall function[3]
The SQL-data change statements are a subset of the SQL-data statements; this set also contains the SELECT query statement,[3] which strictly speaking is part of the DQL, not the DML. In common practice, though, this distinction is not made, and SELECT is widely considered to be part of DML,[4] so the DML consists of all SQL-data statements, not only the SQL-data change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a
statement which is almost always a verb In the case of SQL these verbs are
SELECT ... FROM ... WHERE ... (strictly speaking, DQL)
SELECT ... INTO ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
For example, the command to insert a row into the table employees:

INSERT INTO employees (first_name, last_name, fname)
VALUES ('John', 'Capita', 'xcapit00');
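The four DML verbs can be demonstrated end-to-end with SQLite (a sketch; the employees table mirrors the example above, and parameter markers are used rather than string literals):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# INSERT, UPDATE, SELECT, DELETE in turn.
conn.execute("INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
             ("John", "Capita", "xcapit00"))
conn.execute("UPDATE employees SET last_name = ? WHERE fname = ?",
             ("Capital", "xcapit00"))
rows = conn.execute("SELECT first_name, last_name FROM employees").fetchall()
print(rows)                                                          # [('John', 'Capital')]
conn.execute("DELETE FROM employees WHERE fname = ?", ("xcapit00",))
print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])  # 0
```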
OR
(c) Explain following integrity rules
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules, but are pertinent to database designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is for each row to have a unique identity, so that foreign key values can properly reference primary key values.
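Entity integrity can be demonstrated with SQLite. One caveat: SQLite, unlike most RDBMSs, allows NULL in a non-INTEGER primary key unless NOT NULL is declared explicitly, so the sketch adds it to get the standard behaviour described above (the table and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO customer VALUES ('C1', 'Asha')")

# Entity integrity: no duplicate primary keys, no NULL primary keys.
for bad_id, why in [("C1", "duplicate key"), (None, "null key")]:
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?)", (bad_id, "x"))
    except sqlite3.IntegrityError:
        print("rejected:", why)
```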
Theta Join
In a theta join we apply a condition to the input relation(s), and only the selected rows are used in the cross product to be merged and included in the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation, but here only selected rows of a relation take part in the cross product with the second relation. It is denoted R ⋈θ S, where θ is the condition applied in the select operation on one relation; only the selected rows then form the cross product with all the rows of the second relation. For example, given the two relations FACULTY and COURSE, we first apply a select operation on the FACULTY relation to pick certain specific rows, and these rows then form a cross product with the COURSE relation. Looking at both relations, their attributes, and finally the cross product after carrying out the select operation, the difference between a cross product and a theta join becomes clear.
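A theta join is just a cross product filtered by the condition θ. A sketch with invented FACULTY and COURSE rows (the attribute names are assumptions, not from the text):

```python
faculty = [{"fno": 1, "fdept": "CS"}, {"fno": 2, "fdept": "Math"}]
course  = [{"cno": "C1", "cdept": "CS"}, {"cno": "C2", "cdept": "CS"},
           {"cno": "C3", "cdept": "Math"}]

def theta_join(r, s, theta):
    """Cross product of r and s, keeping only the pairs that satisfy theta."""
    return [{**a, **b} for a in r for b in s if theta(a, b)]

# theta: the faculty member's department matches the course's department.
result = theta_join(faculty, course, lambda f, c: f["fdept"] == c["cdept"])
print(len(result))  # 3 matching (faculty, course) pairs out of 6 in the cross product
```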
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship. In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records. Otherwise, we would end up with an orphaned record: the related table would contain a foreign key value that doesn't exist in the primary key field of the primary table (i.e., the "CompanyId" field).
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and
consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of
working with databases.
(d) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the
attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which
a data element may contain. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type, and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL). Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
This definition combines the concept of domain as an area over which control is exercised with
the mathematical idea of a set of values of an independent variable for which a function is
defined.
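Both the enumerated gender domain and the "greater than zero" rule above can be sketched as check constraints in sqlite3 (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain of gender restricted to the enumerated codes; NULL stands for
# unknown, and a CHECK that evaluates to NULL does not reject the row
conn.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT CHECK (gender IN ('M', 'F')),
    age    INTEGER CHECK (age > 0))""")

conn.execute("INSERT INTO person VALUES ('Mary', 'F', 30)")   # within domain
conn.execute("INSERT INTO person VALUES ('Sam', NULL, 25)")   # unknown gender

try:
    conn.execute("INSERT INTO person VALUES ('Bob', 'X', -5)")  # out of domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

More complex boundary rules (for example, ones that consult other tables) would need a trigger instead of a CHECK.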
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the last is written M:N rather than M:M, because the number of occurrences on each
side of the relationship may differ.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; when it does, you may consider combining
the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For
example, taking the employee and department entities, an employee works in
one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would remove any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist in a finished design.
Normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore there is a many-to-many relationship between employee and project.
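In a relational design, the employee/project M:N relationship above is normally resolved by introducing a linking (junction) table, turning one M:N link into two 1:M links. A sketch in sqlite3 (names and sample data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
-- junction table: each row records one employee working on one project
CREATE TABLE assignment (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id));
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(10, "Payroll"), (20, "Website")])
# Ann works on both projects; the Payroll project has a team of two
conn.executemany("INSERT INTO assignment VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])

team = conn.execute("""SELECT e.name FROM employee e
                       JOIN assignment a ON a.emp_id = e.emp_id
                       WHERE a.proj_id = 10 ORDER BY e.name""").fetchall()
print([t[0] for t in team])  # → ['Ann', 'Ben']
```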
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language, or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of a conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the figure. The architecture of the DBTG
model can be divided into three different levels, like the architecture of a database system.
These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item, and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data
item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory, "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
• transform data from the information source (e.g. a form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form:
• Every relation has a unique name
• Every attribute value is atomic (single-valued)
• Every row is unique
• Attributes in tables have unique names
• The order of the columns is irrelevant
• The order of the rows is irrelevant
UNF to 1NF
• Nominate an attribute or group of attributes to act as the key for the unnormalized table.
• Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
• Remove the repeating group by:
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
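The 'flattening' step can be illustrated in plain Python with hypothetical customer/order data (the names are invented for the example; each unnormalized row carries a repeating group of order numbers):

```python
# Unnormalized: each customer row carries a repeating group of orders
unf = [
    {"cust_id": 1, "name": "Ann", "orders": [101, 102]},
    {"cust_id": 2, "name": "Ben", "orders": [103]},
]

# 1NF: one atomic value per row/column intersection -- the key attributes
# are repeated for every member of the repeating group
flat = [
    {"cust_id": c["cust_id"], "name": c["name"], "order_no": o}
    for c in unf
    for o in c["orders"]
]
for row in flat:
    print(row)
```

The alternative route would put (cust_id, order_no) pairs into a separate Order relation instead of repeating the customer data.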
Second Normal Form (2NF)
Based on the concept of full functional dependency:
• A and B are attributes of a relation.
• B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
• Partial functional dependency: one or more non-key attributes are functionally
dependent on part of the primary key.
• Every non-key attribute must be defined by the entire key, not just by part of the key.
• If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
• Identify the primary key for the 1NF relation.
• Identify the functional dependencies in the relation.
• If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant.
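A sketch of this decomposition in plain Python, using a hypothetical OrderLine(order_no, product_no, product_desc, qty) relation with key (order_no, product_no), where product_desc depends on product_no alone (a partial dependency):

```python
# 1NF relation with key (order_no, product_no); product_desc depends on
# product_no alone, so the relation is not in 2NF and "Widget" is stored twice
order_line = [
    (1, "P1", "Widget", 5),
    (1, "P2", "Gadget", 2),
    (2, "P1", "Widget", 7),
]

# Remove the partial dependency: product_desc moves to a new relation
# together with a copy of its determinant, product_no
product = {(p_no, desc) for (_, p_no, desc, _) in order_line}
order_line_2nf = [(o_no, p_no, qty) for (o_no, p_no, _, qty) in order_line]

print(sorted(product))   # each product description now stored exactly once
print(order_line_2nf)    # remaining attributes fully depend on the whole key
```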
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
• A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format, and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form (4NF)
Fourth normal form (or 4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and its multivalued dependencies are functional
dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
• there is no multivalued dependency in the relation, or
• there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms, or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of the Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1. Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
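The same result can be checked mechanically with the standard attribute-closure algorithm, which repeatedly applies the given FDs until no new attributes can be added; a sketch in Python:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs,
    each FD given as a (lhs, rhs) pair of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # an FD fires when its whole left side is already in the closure
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the Maier example: AB->E, AG->J, BE->I, E->G, GI->H
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]

print(sorted(closure("AB", F)))       # → ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print(set("GH") <= closure("AB", F))  # AB -> GH holds iff GH is in the closure
```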
Significance in Relational Database Design: A relational database is a database structure,
commonly used in GIS, in which data is stored in two-dimensional tables, where multiple
relationships between data elements can be defined and established in an ad-hoc manner. A
Relational Database Management System is a database system made up of files with data
elements in a two-dimensional array (rows and columns). Such a system has the capability to
recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables that:
• are manipulated a set at a time, rather than a record at a time;
• are manipulated using SQL. The relational model, proposed by Dr. Codd in 1970, is
the basis for the relational database management system (RDBMS).
The relational model contains the following components:
• a collection of objects or relations;
• a set of operations to act on the relations.
Q5
EITHER
(a) What is a deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be avoided in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need, or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
then, to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression "ACID transaction".
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be Atomic: it should either complete fully or not at all; there should not
be anything like a semi-complete transaction. The database state should remain Consistent after
the completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in Isolation from one another. Durability
means that, once a transaction commits, its effects will persist even if there are system
failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database is to operate while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
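The resource-access-order rule mentioned in the deadlock answer above (lock resources in a fixed global order) can be sketched with two Python threads standing in for two transactions; the resource names and ranking are invented for the example:

```python
import threading

# Two hypothetical resources; a fixed global rank decides the order in
# which every transaction must lock them
lock_a, lock_b = threading.Lock(), threading.Lock()
RANK = {id(lock_a): 0, id(lock_b): 1}

completed = []

def transfer(src, dst):
    # Avoidance rule: always acquire locks in the global order, so no
    # circular wait (and hence no deadlock) can arise
    first, second = sorted((src, dst), key=lambda l: RANK[id(l)])
    with first, second:
        completed.append(True)

# Without the ordering rule these two could lock a->b and b->a and
# deadlock; with it, both lock a first and then b
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a))
t1.start(); t2.start(); t1.join(); t2.join()
print("transactions completed:", len(completed))
```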
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
• Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
• Shared/exclusive locks: this type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks
when all its operations are over. If all the locks are not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it
only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction; the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction may first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first
phase, the transaction continues to execute normally. But, in contrast to 2PL, Strict-2PL does not
release a lock after using it: Strict-2PL holds all the locks until the commit point and releases
them all at one time.
Strict-2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the system
know when the last 'read' and 'write' operation was performed on the data item.
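The core read/write rules of basic timestamp ordering can be sketched as follows (a simplification of the full protocol: "rollback" stands for aborting and restarting the too-old transaction):

```python
class DataItem:
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest transaction to read it
        self.write_ts = 0   # timestamp of the youngest transaction to write it

def write(item, ts):
    """Reject a write by a transaction older than the youngest reader
    or writer of the item; otherwise record the new write timestamp."""
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"   # too old -- would violate the timestamp order
    item.write_ts = ts
    return "ok"

def read(item, ts):
    """Reject a read if a younger transaction has already written the item."""
    if ts < item.write_ts:
        return "rollback"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

x = DataItem()
print(write(x, ts=5))   # ok
print(read(x, ts=7))    # ok: the reader is younger than the last writer
print(write(x, ts=6))   # rollback: transaction 7 has already read the item
```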
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases. This
includes:
• Data stored in the database
• The database server
• The database management system (DBMS)
• Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment from theft and natural
disasters
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
• Flat data: data was usually represented in a tabular format with strings or numbers in each
field.
• Multiple users: a conventional database needed to support more than one user or system
logged into the same data at the same time.
• Transactions: an essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation and Durability.
• Large, long-lived data: a corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades,
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users, or for the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work on the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world: for example, to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of tables that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge-
base. Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge management
vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system, to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without their conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. Any practical
database, however, has a mix of read and write operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from the Ancient Greek "atomos", meaning undividable) is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state; that is, money is neither lost nor created if either of those two operations fails.
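The bank-transfer example above can be demonstrated with sqlite3, whose connection object acts as a transaction context manager (committing on success, rolling back on error); the account names and balances are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # one atomic transaction: commit on success, rollback on error
        conn.execute("UPDATE account SET balance = balance + 200 WHERE id='B'")
        # the withdrawal overdraws A and fails the CHECK constraint...
        conn.execute("UPDATE account SET balance = balance - 200 WHERE id='A'")
except sqlite3.IntegrityError:
    pass  # ...so the credit to B is rolled back too: all or nothing

print(dict(conn.execute("SELECT id, balance FROM account")))
```

Neither account changes: the half-finished transfer is never visible, which is exactly the atomicity guarantee described above.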
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
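The division of labour between DDL and DML can be sketched via sqlite3 (table and data names are illustrative; SQLite has no user accounts, so the DCL statement is shown only as a comment, as a server DBMS would accept it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE roll = 1")
rows = conn.execute("SELECT name FROM student").fetchall()
print(rows)

# DCL controls access to objects; in a server DBMS this would look like:
#   GRANT SELECT ON student TO clerk
#   REVOKE UPDATE ON student FROM clerk
```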
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are explained
above.
(C) Describe the structure of DBMS.
Ans: The DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieval) on the database. The components of the DBMS perform these requested operations on
the database and provide the necessary data to the users.
Fig.: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Definition Language compiler processes schema definitions specified in the DDL. It records metadata information such as the names of the files and data items, storage details of each file, mapping information, and constraints.
2 DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete and retrieve, from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized into the best way to execute the query by the query optimizer and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are -
It converts operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (together known as the Query Processor), from the user's logical view to the physical file system.
It controls access to the DBMS information stored on disk.
It handles buffers in main memory.
It enforces constraints to maintain the consistency and integrity of the data.
It synchronizes the simultaneous operations performed by concurrent users.
It controls backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - the description of database users, their responsibilities and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high-level queries into low-level file access commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users - Naive users need not be aware of the presence of the database system or any other system supporting them. A user of an automatic teller machine falls into this category: the user is instructed through each step of a transaction, and responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of the ATM user, only one or more of his or her own accounts. Other naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users - These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers - Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator - Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, as such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8 Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may therefore become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand. The overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods can be very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modeling is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and it should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity - An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type; there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes - An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship - A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself composite: street (street_name, street_number, apartment_number).
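The Customer entity above can be mapped to tables, sketched here with Python's sqlite3: composite attributes (name, address) are flattened into columns, and the multivalued attribute (phone_number) becomes its own table. The sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Composite attributes are flattened into individual columns.
cur.execute("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,
    street_name TEXT, street_number TEXT, apartment_number TEXT
)""")

# A multivalued attribute becomes a separate table keyed by the
# owning entity's primary key.
cur.execute("""
CREATE TABLE customer_phone (
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
)""")

cur.execute("INSERT INTO customer (customer_id, first_name, last_name) "
            "VALUES (1, 'John', 'Smith')")
cur.executemany("INSERT INTO customer_phone VALUES (1, ?)",
                [("555-0100",), ("555-0101",)])
phones = [p for (p,) in cur.execute(
    "SELECT phone_number FROM customer_phone "
    "WHERE customer_id = 1 ORDER BY phone_number")]
print(phones)  # ['555-0100', '555-0101']
conn.close()
```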
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example
Ans In sequential files, index sequential files and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
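The idea can be sketched in Python: a secondary index maps a non-unique key ("stud_name") to every matching record, unlike a primary index, which maps each key to exactly one record. The sample records are invented.

```python
from collections import defaultdict

# A small student file; roll_no is the primary key, stud_name is not unique.
students = [
    {"roll_no": 1, "stud_name": "Ravi"},
    {"roll_no": 2, "stud_name": "Meena"},
    {"roll_no": 3, "stud_name": "Ravi"},
]

# Build a secondary index on stud_name.
name_index = defaultdict(list)
for rec in students:
    name_index[rec["stud_name"]].append(rec["roll_no"])

# Retrieval by secondary key returns a set of records, not just one.
print(name_index["Ravi"])  # [1, 3]
```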
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
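The three queries are the relational set operations, which can be sketched directly with Python sets of tuples over schema R(A, B, C); the tuple values are invented.

```python
# r1 and r2 are relations on the same schema R(A, B, C).
r1 = {(1, "x", 10), (2, "y", 20)}
r2 = {(2, "y", 20), (3, "z", 30)}

union        = r1 | r2   # r1 ∪ r2: tuples in either relation
intersection = r1 & r2   # r1 ∩ r2: tuples in both relations
difference   = r1 - r2   # r1 − r2: tuples in r1 but not r2

print(len(union), intersection, difference)
# 3 {(2, 'y', 20)} {(1, 'x', 10)}
```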
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot be non-trivially decomposed any further into smaller tables by lossless joins.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise);
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
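The decomposition into Buyer-Vendor, Buyer-Item and Vendor-Item can be sketched in Python: with the sample data above, the natural join of the three projections reproduces the original table, which is exactly the join dependency that 5NF captures.

```python
# The Buying table from the sample data.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three pairwise projections (the 5NF decomposition).
buyer_vendor = {(b, v) for b, v, _ in buying}
buyer_item   = {(b, i) for b, _, i in buying}
vendor_item  = {(v, i) for _, v, i in buying}

# Natural join of the three projections.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    if (v, i) in vendor_item
}
print(rejoined == buying)  # True: the decomposition is lossless
```

Recording "Claiborne starts to sell jeans" then becomes a single insert into Vendor-Item rather than one row per buyer.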
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Fig: Structure of an IMS system - application programs A and B, each written in a host language plus DL/I, access data through the PCBs of their respective PSBs; the IMS control program maps these onto the physical databases defined by DBDs.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description) - Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block) - Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block) - The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT - The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency - The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key - A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
• they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency;
• they hold for all time;
• they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
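A functional dependency X → Y can be checked mechanically on a relation instance: no two tuples may agree on X yet differ on Y. A small sketch, with an invented employee relation:

```python
def holds(rows, x, y):
    """Return True if the FD x -> y holds in the given relation instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        # If this X-value was seen before with a different Y-value, FD fails.
        if seen.setdefault(key, val) != val:
            return False
    return True

emp = [
    {"emp_id": 1, "dept": "HR", "dept_head": "Asha"},
    {"emp_id": 2, "dept": "HR", "dept_head": "Asha"},
    {"emp_id": 3, "dept": "IT", "dept_head": "Ravi"},
]
print(holds(emp, ["dept"], ["dept_head"]))   # True:  dept -> dept_head
print(holds(emp, ["dept_head"], ["emp_id"])) # False: dept_head does not determine emp_id
```

Note that this tests only one instance; a true FD must hold for all time, as stated above.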
(D) Explain 4 NF with examples
Ans Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that it meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normal forms up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and all of its multivalued dependencies are in fact functional dependencies. 4NF thus removes an unwanted kind of data structure: multivalued dependencies.
For a relation to be in fourth normal form, either:
• there is no multivalued dependency in the relation, or
• there are multivalued dependencies, but the attributes are dependent between themselves.
One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model that controls the following:
• the speed and size of your transaction log backups;
• the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
• CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a Distributed System.
Ans
change statements. The SELECT ... INTO form combines both selection and manipulation, and thus is strictly considered to be DML because it manipulates (i.e. modifies) data.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
• SELECT ... FROM ... WHERE (strictly speaking, DQL)
• SELECT ... INTO
• INSERT INTO ... VALUES
• UPDATE ... SET ... WHERE
• DELETE FROM ... WHERE
For example, the command to insert a row into table employees:
INSERT INTO employees (first_name, last_name, fname) VALUES ('John', 'Capita', 'xcapit00');
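The same insert can be sketched through Python's DB-API with placeholders, so the values travel separately from the SQL text. The column names follow the example above; the table itself is created here (with assumed TEXT columns) only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (first_name TEXT, last_name TEXT, fname TEXT)")

# Placeholders (?) keep the DML verb structure while avoiding string
# concatenation of values into the statement.
cur.execute("INSERT INTO employees (first_name, last_name, fname) VALUES (?, ?, ?)",
            ("John", "Capita", "xcapit00"))
row = cur.execute("SELECT first_name, last_name FROM employees").fetchone()
print(row)  # ('John', 'Capita')
conn.close()
```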
OR
(c) Explain the following integrity rules -
(i) Entity Integrity
Integrity rules are imperative to a good database design. Most RDBMSs enforce these rules automatically, but it is safer to make sure that the rules are already applied in the design. There are two types of integrity mentioned in integrity rules: entity and referential. Two additional rules that aren't necessarily included in integrity rules, but are pertinent to database designs, are business rules and domain rules.
Entity integrity exists when each primary key within a table has a value that is unique; this ensures that each row is uniquely identified by the primary key. One requirement for entity integrity is that a primary key cannot have a null value. The purpose of this integrity is for each row to have a unique identity, so that foreign key values can properly reference primary key values.
Theta Join
In a theta join we apply a condition to the input relation(s), and then only the selected rows are used in the cross product to be merged and included in the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation, but here only selected rows of a relation enter the cross product with the second relation. It is denoted with the symbol θ.
If R and S are two relations, then θ is the condition which is applied in a select operation on one relation, and only the selected rows enter the cross product with all the rows of the second relation. For example, given two relations FACULTY and COURSE, we first apply a select operation on the FACULTY relation to select certain specific rows; these rows then take part in a cross product with the COURSE relation. From this example, the difference between a plain cross product and a theta join becomes clear.
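The contrast between a theta join and an unrestricted cross product can be sketched in Python; the FACULTY and COURSE schemas and rows below are invented for illustration.

```python
faculty = [
    {"fac_id": 1, "name": "Dr. Khan", "dept": "CS"},
    {"fac_id": 2, "name": "Dr. Iyer", "dept": "Math"},
]
course = [
    {"course_id": "C101", "dept": "CS",   "title": "DBMS"},
    {"course_id": "M201", "dept": "Math", "title": "Algebra"},
]

# Theta join: only pairs satisfying the condition faculty.dept = course.dept.
theta = [(f["name"], c["title"])
         for f in faculty for c in course
         if f["dept"] == c["dept"]]

# Unrestricted cross product: every row paired with every row.
cross = [(f["name"], c["title"]) for f in faculty for c in course]

print(len(cross), theta)
# 4 [('Dr. Khan', 'DBMS'), ('Dr. Iyer', 'Algebra')]
```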
(ii) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship
In relationships data is linked between two or more tables This is achieved by having
the foreign key (in the associated table) reference a primary key value (in the primary, or
parent, table). Because of this, we need to ensure that data on both sides of the relationship
remain intact
So referential integrity requires that whenever a foreign key value is used it must reference a
valid existing primary key in the parent table
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record:
a related table containing a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field).
So referential integrity will prevent users from:
• Adding records to a related table if there is no associated record in the primary table
• Changing values in a primary table that result in orphaned records in a related table
• Deleting records from a primary table if there are matching related records
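A minimal sketch of these rules, using Python's sqlite3 module with foreign-key enforcement switched on (the CompanyId-style schema is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE product (
           product_id INTEGER PRIMARY KEY,
           company_id INTEGER REFERENCES company(company_id)
       )"""
)
conn.execute("INSERT INTO company VALUES (15, 'Acme')")
conn.execute("INSERT INTO product VALUES (1, 15)")

# Adding a child row with no matching parent is rejected (no orphans)...
try:
    conn.execute("INSERT INTO product VALUES (2, 99)")
    orphan_insert_rejected = False
except sqlite3.IntegrityError:
    orphan_insert_rejected = True

# ...and so is deleting a parent that still has matching child rows.
try:
    conn.execute("DELETE FROM company WHERE company_id = 15")
    parent_delete_rejected = False
except sqlite3.IntegrityError:
    parent_delete_rejected = True

print(orphan_insert_rejected, parent_delete_rejected)  # True True
```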
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually
with no indication of an error. This could result in records being "lost" in the database because
they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated
company)
Or worse yet it could result in customers not receiving products they paid for
Worse still it could affect life and death situations such as a hospital patient not receiving the
correct treatment or a disaster relief team not receiving the correct supplies or information
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
(d) Explain the following domains in detail with examples
Ans. Definition: The domain of a database attribute is the set of all allowable values that
attribute may assume.
Examples
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis a data domain refers to all the unique values which
a data element may contain The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values[1]
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male, F for female, and NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
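A short sketch of such domain rules, using Python's sqlite3 module on a hypothetical table: one CHECK constraint enumerates the gender domain and another requires positive values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE person (
           name   TEXT,
           gender TEXT CHECK (gender IN ('M', 'F')),   -- domain {M, F}
           age    INTEGER CHECK (age > 0)              -- positive values only
       )"""
)
conn.execute("INSERT INTO person VALUES ('Mary', 'F', 30)")

try:
    conn.execute("INSERT INTO person VALUES ('Sam', 'X', 25)")  # outside the domain
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

print(out_of_domain_rejected)  # True
```

Note that in standard SQL a NULL value passes a CHECK constraint (the predicate is unknown, not false), which is consistent with using NULL for "gender unknown" above.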
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the correct notation for the last one is M:N, not M:M, because the number of
occurrences on each side need not be the same.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; if one does, you may consider combining
the two entities into one.
For example an employee is allocated a company car which can only be driven by that
employee
Therefore there is a one-to-one relationship between employee and company car
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities, an employee works in
one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist in a finished design;
normally they occur because an entity has been missed.
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
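In an implemented schema, a many-to-many relationship is usually resolved into two one-to-many relationships through an associative (junction) table. A sketch with Python's sqlite3 module, using hypothetical employee and project tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- The M:N relationship becomes two 1:M relationships through
    -- this associative (junction) table.
    CREATE TABLE works_on (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
    INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
    """
)
# Employee 1 works on two projects, while project 10 has a team of two.
emp_projects = conn.execute(
    "SELECT COUNT(*) FROM works_on WHERE emp_id = 1"
).fetchone()[0]
proj_team = conn.execute(
    "SELECT COUNT(*) FROM works_on WHERE proj_id = 10"
).fetchone()[0]
print(emp_projects, proj_team)  # 2 2
```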
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans. The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the
requirements of many distinct programming languages, not just COBOL; the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
(b) It is based on network model In addition to proposing a formal notation for networks (the
Data Definition Language or DDL) the DBTG has proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of conceptual scheme that
was itself defined using the Data Definition Language It also proposed a Data
Manipulation Language (DML) suitable for writing applications programs that
manipulate the conceptual scheme or a view
(c) Architecture of DBTG Model
(d) The architecture of a DBTG system is illustrated in Figure
(e) The architecture of DBTG model can be divided in three different levels as the
architecture of a database system These are
(f) • Storage Schema (corresponds to the Internal View of the database)
(g) • Schema (corresponds to the Conceptual View of the database)
(h) • Subschema (corresponds to the External View of the database)
(i) Storage Schema
(j) The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
(k) Schema
(l) In DBTG, the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data-items they
contain, and the sets into which they are grouped. (Here, logical record types are referred
to as record types; the fields in a logical record format are called data-items.)
(m) Subschema
(n) The External view (not a DBTG term) is defined by a subschema A subschema consists
essentially of a specification of which schema record types the user is interested in which
schema data-items he or she wishes to see in those records and which schema
relationships (sets) linking those records he or she wishes to consider By default all
other types of record data-item and set are excluded
(o) In DBTG model the users are application programmers writing in an ordinary
programming language such as COBOL that has been extended to include the DBTG
data manipulation language Each application program invokes the corresponding
subschema using the COBOL Data Base Facility for example the programmer simply
specifies the name of the required subschema in the Data Division of the program This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each type of data-
item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans. Normalization: The process of decomposing unsatisfactory (bad) relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data along with copy of the original key attribute(s) into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant
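The 1NF-to-2NF step can be sketched in plain Python on a hypothetical enrolment relation whose key is (student_id, course_id); student_name depends only on student_id, so it is moved into a new relation together with a copy of its determinant.

```python
# 1NF relation with key (student_id, course_id); student_name depends only on
# student_id, a partial dependency that 2NF removes by decomposition.
enrolment_1nf = [
    {"student_id": 1, "course_id": "C1", "student_name": "Asha", "grade": "A"},
    {"student_id": 1, "course_id": "C2", "student_name": "Asha", "grade": "B"},
    {"student_id": 2, "course_id": "C1", "student_name": "Ravi", "grade": "A"},
]

# New relation: the partially dependent attribute plus its determinant.
names = {r["student_id"]: r["student_name"] for r in enrolment_1nf}
student_rel = [
    {"student_id": sid, "student_name": name} for sid, name in sorted(names.items())
]

# Remaining relation: the full key and the fully dependent attribute.
enrolment_2nf = [
    {"student_id": r["student_id"], "course_id": r["course_id"], "grade": r["grade"]}
    for r in enrolment_1nf
]

print(len(student_rel), len(enrolment_2nf))  # 2 3
```

The redundant repetition of each student's name (once per course) disappears; it is now stored exactly once per student.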
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency: a functional dependency between two or more non-key attributes
Based on the concept of transitive dependency:
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with a suitable example.
Ans. As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
Either of the following conditions must hold for a relation to be in fourth normal form:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the dependent attributes depend on each other.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
involves multivalued dependencies.
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans. Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → City Street Zip.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity from (2) and (4)
[From Maier]
1. Let R = (A B C D E G H I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived from F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
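A practical way to check such derivations mechanically is to compute the closure of an attribute set under F: X → Y is derivable exactly when Y is contained in the closure of X. A short Python sketch, run on the Maier example above:

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, pull in the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Maier's example: F = {AB->E, AG->J, BE->I, E->G, GI->H}
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
ab_plus = closure("AB", F)
print(sorted(ab_plus))  # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
# G and H are in AB+, so AB -> GH holds, matching the 12-step proof.
```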
Significance in Relational Database design: A relational database is a database structure, commonly used in
GIS, in which data is stored in two-dimensional tables and multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System is a database system made up of files with data elements in two-dimensional arrays (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A database that is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases; the relational model was proposed by Dr. Codd in 1970
• It is the basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans. A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be handled in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
then, to resolve it, the DBMS must select a transaction to cancel and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it
should be atomic: it should either complete fully or not at all; there should not
be anything like a semi-complete transaction. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system
failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until I am complete with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any changes in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
(b) Explain concurrency control and database recovery in detail
Ans. In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
• Binary locks: A lock on a data item can be in two states; it is either locked or
unlocked.
• Shared/exclusive: This type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking 2PL
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
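The growing/shrinking discipline can be sketched in Python (a toy illustration, not a real lock manager): once a transaction has released its first lock, any further lock request is a 2PL violation.

```python
import threading

class TwoPhaseTransaction:
    """Toy sketch of two-phase locking: after the first release (the start of
    the shrinking phase), no new lock may be demanded."""

    def __init__(self):
        self.held = {}
        self.shrinking = False

    def lock(self, item, lk):
        if self.shrinking:
            raise RuntimeError("2PL violated: cannot lock after first unlock")
        lk.acquire()
        self.held[item] = lk

    def release(self, item):
        self.shrinking = True          # entering the shrinking phase
        self.held.pop(item).release()

a, b = threading.Lock(), threading.Lock()
txn = TwoPhaseTransaction()
txn.lock("A", a)        # growing phase: acquire all needed locks
txn.lock("B", b)
txn.release("A")        # first release starts the shrinking phase
try:
    txn.lock("A", a)    # demanding a new lock now violates the protocol
    violated = False
except RuntimeError:
    violated = True
txn.release("B")
print(violated)  # True
```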
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it: Strict-2PL holds all the locks until the commit point and releases them all
at one time.
Strict-2PL does not have cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 00:02 clock time would be older than all other
transactions that come after it; for example, any transaction y entering the system at 00:04 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read and write timestamp. This lets the system
know when the last 'read and write' operation was performed on the data item.
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes:
• Data stored in the database
• The database server
• The database management system (DBMS)
• Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment against theft and natural
disasters
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans.
The term knowledge base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties
• Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
• Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
• Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation and Durability.
• Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work on the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge base needed to know facts about the world, for example, to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store thousands of rows in tables that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge
base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and Object-Oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge base to describe their repositories, but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans. Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another.
However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and depositing it in account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
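The transfer example can be sketched with Python's sqlite3 module, where the connection's context manager gives all-or-nothing behaviour: a simulated failure between the two updates rolls the withdrawal back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

# Transfer 30 from A to B: both updates commit together or not at all.
try:
    with conn:  # one transaction; an exception inside rolls it back
        conn.execute("UPDATE account SET balance = balance - 30 WHERE name = 'A'")
        raise RuntimeError("simulated crash between the two operations")
        conn.execute("UPDATE account SET balance = balance + 30 WHERE name = 'B'")
except RuntimeError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} - the partial withdrawal was rolled back
```

Money is neither lost nor created: because the withdrawal and the deposit form one atomic unit, the failed transfer leaves both balances unchanged.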
(B) Give the three level architecture proposal for DBMS
Ans. Objectives of the three level architecture proposal for DBMS:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
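The division of labour between these sublanguages can be sketched with Python's built-in sqlite3 module (the student table is an illustrative assumption). Note that SQLite has no user accounts, so the DCL part is shown only as a comment with the syntax a full DBMS would typically use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and declare a database object (a table)
cur.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object
cur.execute("INSERT INTO student VALUES (1, 'Asha')")
cur.execute("UPDATE student SET name = 'Asha K' WHERE student_id = 1")
rows = cur.execute("SELECT * FROM student").fetchall()
print(rows)  # [(1, 'Asha K')]

# DCL controls access rights; in a multi-user DBMS this would look like:
#   GRANT SELECT ON student TO some_user;
#   REVOKE UPDATE ON student FROM some_user;
```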
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus, the objectives of the three-level architecture proposal for a DBMS are explained
above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieve) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Definition Language compiler processes the schema definitions specified
in the DDL. It stores metadata information such as the names of the files and data items, storage
details of each file, mapping information, constraints, etc.
2 DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete and
retrieve, from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized into the best way to execute the query by
the query optimizer, and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
It converts operations in users' queries, coming from the application programs or from the
combination of the DML compiler and query optimizer (together known as the Query Processor), from the user's logical view
to the physical file system.
It controls access to the DBMS information that is stored on disk.
It also controls the handling of buffers in main memory.
It also enforces constraints to maintain the consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structure,
access paths, and file and record sizes.
5 Access authorization - the description of database users, their responsibilities
and their access rights.
6 Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation
and accuracy, and may be used as an important part of the DBMS.
Importance of the Data Dictionary -
The data dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system, and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high-level queries into low-level file access
commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. There are other such naive users wherever the type and range of response is always indicated to the user. Thus, even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence. The user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating from one device to
another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted in entering data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. So we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since data of an organization using the database approach is
centralized and used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to which parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of its unit as the most
important, and therefore considers its needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed with DBMSs than when using procedural languages.
10 A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of the
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or designates)
involved, and should be validated with a "bottom-up" approach. It has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many (1:M)
Many to one (M:1)
Many to many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
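One common way to realize the Customer entity above is to flatten its composite attributes into relational columns. The sketch below uses Python's built-in sqlite3 module; the exact column names and sample values are assumptions for illustration (a multivalued attribute, had there been one, would instead go into a separate table keyed by customer_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id     INTEGER PRIMARY KEY,                     -- key attribute
    first_name      TEXT, middle_name TEXT, last_name TEXT,  -- composite 'name'
    phone_number    TEXT,
    date_of_birth   TEXT,
    city            TEXT, state TEXT, zip_code TEXT,         -- composite 'address'
    street_name     TEXT, street_number TEXT, apartment_number TEXT  -- 'street'
)
""")
conn.execute(
    "INSERT INTO customer VALUES (1, 'Asha', 'R', 'Kale', '555-0100', "
    "'1990-01-01', 'Nagpur', 'MH', '440001', 'Main', '12', '3')")
print(conn.execute("SELECT customer_id, first_name FROM customer").fetchall())
```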
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we may get a set of
records which satisfy the given value.
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1 If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
have a nontrivial lossless decomposition into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor, to determine the vendor you must know the buyer and
the item, and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
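The three-way decomposition above can be checked in code: project the sample Buying relation onto its three attribute pairs and verify that their natural join reproduces exactly the original rows (the join dependency holds). This is a minimal sketch in plain Python sets:

```python
# The sample Buying relation from the text, as (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three pairwise projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
bv = {(b, v) for b, v, i in buying}
bi = {(b, i) for b, v, i in buying}
vi = {(v, i) for b, v, i in buying}

# Natural join of the three projections on their common attributes.
rejoined = {(b, v, i)
            for b, v in bv
            for b2, i in bi if b2 == b
            for v2, i2 in vi if v2 == v and i2 == i}

print(rejoined == buying)  # True: the decomposition is lossless for this data
```

With the decomposition in place, recording "Claiborne starts to sell jeans" takes a single new (vendor, item) row, and the join then derives the implied buyer combinations automatically.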
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Fig.: IMS system architecture - each application (host language + DL/I) accesses the physical
databases (defined by DBDs) through the PCBs of its PSB, under the IMS control program]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined by the DBD. The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key.
Each non-key field is functionally dependent on every candidate key.
No attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in
normalization:
They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very
large.
It is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by the
functional dependencies in X.
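The definition above ("the determinant determines the dependent attribute") can be tested mechanically against a concrete relation: an FD X → Y holds iff no two rows agree on X but disagree on Y. A minimal sketch, with an illustrative student relation (not from the source):

```python
def fd_holds(rows, lhs, rhs):
    """Return True iff every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)  # determinant values
        val = tuple(row[a] for a in rhs)  # dependent values
        if key in seen and seen[key] != val:
            return False  # same determinant, different dependent value
        seen[key] = val
    return True

students = [
    {"student_id": 1, "name": "Asha", "city": "Nagpur"},
    {"student_id": 2, "name": "Ravi", "city": "Pune"},
    {"student_id": 3, "name": "Asha", "city": "Nagpur"},
]

print(fd_holds(students, ["student_id"], ["name"]))  # True: id determines name
print(fd_holds(students, ["name"], ["student_id"]))  # False: two ids share 'Asha'
```

Note that this only checks one relation instance; a real FD is a constraint that must hold for all time, as stated above.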
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form.
It is a formal technique for analyzing a relation based on its primary key and the functional dependencies
between its attributes.
It is often executed as a series of steps. Each step corresponds to a specific normal form which has
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format, and
also less vulnerable to update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF
removes unwanted data structures: multivalued dependencies.
Either:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
One of these conditions must hold true in order for the relation to be in fourth normal form.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
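A multivalued dependency can be sketched with a classic example (the course/teacher/book relation below is an illustrative assumption, not from the source): if course ↠ teacher and course ↠ book, every teacher of a course is paired with every book of that course, so the relation decomposes losslessly into two binary tables:

```python
# A relation violating 4NF: course ->> teacher and course ->> book,
# so all teacher/book combinations for a course must be stored.
teaching = {
    ("DBMS", "Dr. Rao", "Navathe"),
    ("DBMS", "Dr. Rao", "Date"),
    ("DBMS", "Dr. Sen", "Navathe"),
    ("DBMS", "Dr. Sen", "Date"),
}

# 4NF decomposition: one relation per independent multivalued fact.
course_teacher = {(c, t) for c, t, b in teaching}
course_book    = {(c, b) for c, t, b in teaching}

# The natural join on 'course' reconstructs the original without loss.
rejoined = {(c, t, b)
            for c, t in course_teacher
            for c2, b in course_book if c2 == c}
print(rejoined == teaching)  # True
```

After the decomposition, adding a new book for DBMS takes one row instead of one row per teacher, which is exactly the update anomaly 4NF removes.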
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions. Also, object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get the user's account information and
efficiently provide extensive information such as transactions and account information entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery can be more easily achieved if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model, which controls the following:
The speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is done at a faster pace, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a Distributed System.
Ans:
Theta Join
In a theta join we apply a condition θ to the input relation(s), and only the selected rows take part in the cross product that is merged into the output. In a normal cross product, all the rows of one relation are mapped/merged with all the rows of the second relation; here, only selected rows of a relation are cross-producted with the second relation. It is denoted as R ⋈θ S.
If R and S are two relations, then θ is the condition applied in the select operation on one relation, and only the selected rows are cross-producted with all the rows of the second relation. For example, given two relations FACULTY and COURSE, we first apply a select operation on the FACULTY relation to pick certain specific rows; those rows then form a cross product with the COURSE relation. This is the difference between a cross product and a theta join.
Looking at both relations, their attributes, and finally the cross product carried out after the select operation on a relation, the difference between a cross product and a theta join becomes clear.
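The filtered-cross-product idea above can be sketched in a few lines of Python. The FACULTY and COURSE data and column names below are hypothetical illustrations, not taken from the paper:

```python
# Hypothetical example relations, represented as lists of dicts (one dict per tuple).
faculty = [{"fac_id": 1, "fac_name": "Ali"}, {"fac_id": 2, "fac_name": "Sana"}]
course = [{"c_id": "CS1", "fac_id": 1}, {"c_id": "CS2", "fac_id": 2}]

def theta_join(r, s, theta):
    """Keep only those rows of the cross product r x s that satisfy predicate theta."""
    return [{**t1, **t2} for t1 in r for t2 in s if theta(t1, t2)]

# theta here is FACULTY.fac_id = COURSE.fac_id (an equi-join, a special case of theta join).
result = theta_join(faculty, course, lambda t1, t2: t1["fac_id"] == t2["fac_id"])
print(len(result))  # 2 matching rows, out of 4 in the full cross product
```

With a condition that is always true, the same function degenerates into the plain cross product, which makes the relationship between the two operations explicit.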
(i) Referential Integrity
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.
So, referential integrity requires that whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no foreign key in any related table with the value of 15. We should only be able to delete a primary key if there are no associated records. Otherwise, we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from:
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
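These three rules can be demonstrated with SQLite from Python. The company/product schema below is a hypothetical stand-in for the "CompanyId" example; note that SQLite only enforces foreign keys when the pragma is switched on:

```python
import sqlite3

# Minimal sketch of referential integrity enforcement (schema is hypothetical).
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs without this
con.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    company_id INTEGER REFERENCES company(company_id))""")

con.execute("INSERT INTO company VALUES (15, 'Acme')")
con.execute("INSERT INTO product VALUES (1, 15)")  # OK: parent row 15 exists

try:
    # Deleting the parent would orphan product row 1, so the DBMS rejects it.
    con.execute("DELETE FROM company WHERE company_id = 15")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Inserting a product that references a non-existent company is rejected in the same way, which is exactly the "adding records to a related table" rule above.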
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or, worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
(d) Explain the following domain in detail with example.
Ans: Definition: The domain of a database attribute is the set of all allowable values that the attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type and allowed to have one of two known code values: M for male, F for female, and NULL for records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value (excluding NULL). Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check constraint or, in more complex cases, in a database trigger. For example, a column requiring positive numeric values may have a check constraint declaring that the values must be greater than zero.
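Both kinds of domain rule can be written as CHECK constraints in SQLite. The person table below is a hypothetical illustration combining the gender domain above with a positive-salary rule:

```python
import sqlite3

# Sketch: enforcing simple domains with CHECK constraints (schema is hypothetical).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE person (
    name   TEXT,
    gender TEXT    CHECK (gender IN ('M', 'F') OR gender IS NULL),
    salary NUMERIC CHECK (salary > 0))""")

con.execute("INSERT INTO person VALUES ('Mary', 'F', 50000)")  # within both domains

try:
    con.execute("INSERT INTO person VALUES ('Sam', 'X', 50000)")  # 'X' is outside {M, F}
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

A value outside the enumerated list, or a non-positive salary, is rejected by the engine itself rather than by application code.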
This definition combines the concept of domain as an area over which control is exercised with the mathematical idea of a set of values of an independent variable for which a function is defined.
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) to the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the conventional notation for the last one is M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; if it does, you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee.
Therefore, there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department, but a department has many employees.
Therefore, there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a team of many employees.
Therefore, there is a many-to-many relationship between employee and project.
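In a relational schema, an M:N relationship such as employee-project is normally resolved with an associative (junction) table holding a foreign key to each side. A minimal SQLite sketch, with hypothetical table and column names:

```python
import sqlite3

# Sketch: resolving the employee/project M:N relationship via a junction table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id));
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
-- Asha works on both projects; Ravi works on Payroll only.
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")

rows = con.execute("""SELECT e.name, p.title
                      FROM employee e
                      JOIN works_on w ON e.emp_id = w.emp_id
                      JOIN project  p ON w.proj_id = p.proj_id
                      ORDER BY e.name, p.title""").fetchall()
print(rows)  # one row per employee-project pairing
```

Each row of works_on records one pairing, so both "an employee works on many projects" and "a project has many employees" are represented without repeating employee or project details.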
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG is intended to meet the requirements of many distinct programming languages, not just COBOL: the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
(b) It is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language or DDL), the DBTG has proposed a Subschema Data Definition Language (Subschema DDL) for defining views of the conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
(c) Architecture of DBTG Model
(d) The architecture of a DBTG system is illustrated in Figure.
(e) The architecture of the DBTG model can be divided into three different levels, as with the architecture of a database system. These are:
(f) • Storage Schema (corresponds to the Internal View of the database)
(g) • Schema (corresponds to the Conceptual View of the database)
(h) • Subschema (corresponds to the External View of the database)
(i) Storage Schema
(j) The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
(k) Schema
(l) In DBTG the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data-items they contain, and the sets into which they are grouped. (Here, logical record types are referred to as record types; the fields in a logical record format are called data-items.)
(m) Subschema
(n) The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data-items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data-item and set are excluded.
(o) In the DBTG model, the users are application programmers writing in an ordinary programming language, such as COBOL, that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema; using the COBOL Data Base Facility, for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data-item) defined in the subschema. The program may refer to these data-item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF. We will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table, transform data from the information source (e.g. a form) into table format with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value. If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group by:
entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table)
or by
placing the repeating data, along with a copy of the original key attribute(s), into a separate relation
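The 'flattening' step can be sketched in Python. The customer/order data below is a hypothetical example of a repeating group:

```python
# Hypothetical unnormalized data: each row holds a customer plus a repeating
# group of order numbers (a non-atomic value, so this is not in 1NF).
unnormalized = [
    {"cust_id": 1, "name": "Mary", "orders": [101, 102]},
    {"cust_id": 2, "name": "Sam",  "orders": [103]},
]

# 1NF: one atomic value per row/column intersection - repeat the key attributes
# once for every member of the repeating group.
first_nf = [
    {"cust_id": row["cust_id"], "name": row["name"], "order_no": o}
    for row in unnormalized
    for o in row["orders"]
]
for r in first_nf:
    print(r)
```

The two unnormalized rows become three 1NF rows, one per order, each carrying a copy of the key.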
Second Normal Form (2NF)
Based on the concept of full functional dependency:
A and B are attributes of a relation.
B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.
2NF: A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Equivalently, 2NF means 1NF and no partial functional dependencies. A partial functional dependency exists when one or more non-key attributes are functionally dependent on part of the primary key. Every non-key attribute must be defined by the entire key, not just by part of the key. If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
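A small Python sketch of that decomposition, using a hypothetical ORDER_LINE relation whose key is (order_no, product_no) and in which product_name depends only on product_no (a partial dependency):

```python
# Hypothetical 1NF relation with a partial dependency:
# key = (order_no, product_no), but product_name depends on product_no alone.
order_line = [
    {"order_no": 1, "product_no": "P1", "product_name": "Pen",    "qty": 3},
    {"order_no": 1, "product_no": "P2", "product_name": "Pencil", "qty": 1},
    {"order_no": 2, "product_no": "P1", "product_name": "Pen",    "qty": 5},
]

# New relation PRODUCT(product_no -> product_name): the partially dependent
# attribute moves out, together with a copy of its determinant.
product = {r["product_no"]: r["product_name"] for r in order_line}

# The remaining relation keeps only attributes fully dependent on the whole key.
order_line_2nf = [
    {"order_no": r["order_no"], "product_no": r["product_no"], "qty": r["qty"]}
    for r in order_line
]
print(product)
print(order_line_2nf)
```

Note that 'Pen' is now stored once in PRODUCT instead of once per order line, which is exactly the redundancy 2NF removes.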
Third Normal Form (3NF)
2NF and no transitive dependencies.
A transitive dependency is a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency: A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF: A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
Ans
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multi-valued dependencies are functional dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
Either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes are dependent between themselves. One of these conditions must hold for the relation to be in fourth normal form.
The relation must also be in BCNF; fourth normal form differs from BCNF only in that it uses multivalued dependencies.
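An MVD X →→ Y holds when, for any two tuples agreeing on X, swapping their Y-values also yields tuples of the relation. A sketch of that check in Python, using the classic (and here hypothetical) course/teacher/book example where teachers and books are independent:

```python
from itertools import product

# Sketch: test whether the MVD X ->> Y holds in a relation (tuples as dicts).
def satisfies_mvd(rel, x, y):
    attrs = set(rel[0])
    z = attrs - set(x) - set(y)  # the remaining attributes

    def proj(t, a):
        return tuple(t[k] for k in sorted(a))

    for t1, t2 in product(rel, rel):
        if proj(t1, x) != proj(t2, x):
            continue
        # The MVD requires a tuple combining t1's Y-values with t2's Z-values.
        needed = {**{k: t1[k] for k in x},
                  **{k: t1[k] for k in y},
                  **{k: t2[k] for k in z}}
        if needed not in rel:
            return False
    return True

# Hypothetical relation where course ->> teacher (teachers and books independent).
rel = [
    {"course": "DB", "teacher": "Ali",  "book": "Elmasri"},
    {"course": "DB", "teacher": "Ali",  "book": "Date"},
    {"course": "DB", "teacher": "Sana", "book": "Elmasri"},
    {"course": "DB", "teacher": "Sana", "book": "Date"},
]
print(satisfies_mvd(rel, ["course"], ["teacher"]))       # True
print(satisfies_mvd(rel[:-1], ["course"], ["teacher"]))  # False: one combination missing
```

Dropping one teacher/book combination breaks the independence, so the MVD no longer holds; a relation like this one is decomposed in 4NF into (course, teacher) and (course, book).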
(d) What are inference axioms? Explain their significance in Relational Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy certain other FDs:
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City - Given
2. Street Zip → Street City - Augmentation of (1) by Street
3. City Street → Zip - Given
4. City Street → City Street Zip - Augmentation of (3) by City Street
5. Street Zip → City Street Zip - Transitivity of (2) and (4)
[From Maier]
2. Let R = (ABCDEGHI), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E - Given
2. AB → AB - Reflexivity
3. AB → B - Projectivity from (2)
4. AB → BE - Additivity from (1) and (3)
5. BE → I - Given
6. AB → I - Transitivity from (4) and (5)
7. E → G - Given
8. AB → G - Transitivity from (1) and (7)
9. AB → GI - Additivity from (6) and (8)
10. GI → H - Given
11. AB → H - Transitivity from (9) and (10)
12. AB → GH - Additivity from (8) and (11)
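Derivations like the one above can be checked mechanically by computing the closure of an attribute set under F (the standard attribute-closure algorithm, not something specific to this paper):

```python
# Sketch: closure of an attribute set under a set of functional dependencies.
def closure(attrs, fds):
    """fds is a list of (lhs, rhs) pairs; attribute sets are written as strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the Maier example (AG -> J kept exactly as given in the text).
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
print(sorted(closure("AB", F)))  # contains G and H, so AB -> GH holds
```

Since G and H both appear in the closure of AB, the twelve-step derivation is confirmed in one call.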
Significance in Relational Database design: A relational database is a database structure, commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad-hoc manner. A Relational Database Management System is a database system made up of files with data elements in a two-dimensional array (rows and columns). This database management system has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time
• SQL is used to manipulate relational databases; the model was proposed by Dr Codd in 1970
• It is the basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• a collection of objects or relations
• a set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that is being locked by the other user. It can be dealt with in two ways: one is to set measures which prevent deadlocks from happening, and the other is to set ways in which to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or to nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order to prevent such instances. Once a deadlock does occur, the DBMS must have a method for detecting the deadlock; then, to resolve it, the DBMS must select a transaction to cancel and revert that entire transaction until the resources required become available, allowing one transaction to complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it should be atomic: it should either be complete or fully incomplete; there should not be anything like semi-complete. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by a change in the row I'm working on will be locked from changes until I am complete with my change. This isolates the change and ensures that the data interaction remains accurate and consistent, and is known as transaction-level consistency. The transaction being changed may affect several other pieces of data, or rows of input, and could also affect how those rows are read. So let's say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able to read the total cost of a blue shirt, because the total cost row is affected by any changes in the tax rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but hasn't been committed is known as the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
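The detection step mentioned in the deadlock answer above is usually implemented as a cycle check on the wait-for graph, where an edge T1 → T2 means transaction T1 is waiting for a lock held by T2. A minimal sketch:

```python
# Sketch: deadlock detection as cycle detection in a wait-for graph.
# wait_for maps each transaction to the transactions it is waiting on.
def has_deadlock(wait_for):
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wait_for.get(t, []):
            # A back edge to a node on the current DFS stack closes a cycle.
            if u in on_stack or (u not in visited and dfs(u)):
                return True
        on_stack.discard(t)
        return False

    return any(t not in visited and dfs(t) for t in wait_for)

print(has_deadlock({"T1": ["T2"], "T2": []}))      # False: no cycle
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))  # True: T1 and T2 wait on each other
```

When a cycle is found, the DBMS picks a victim transaction on the cycle, rolls it back, and lets the others proceed.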
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
Binary locks: A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive: This type of locking mechanism differentiates the locks based on their uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock: allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by the transaction, and the second phase is shrinking, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
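The growing/shrinking discipline for a single transaction can be sketched as follows (a simplified illustration; a real lock manager would also track lock modes and conflicts between transactions):

```python
# Sketch: the two-phase rule for one transaction - once the first lock is
# released (shrinking phase), no further lock may be acquired.
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.locks.add(item)        # growing phase

    def unlock(self, item):
        self.shrinking = True       # first unlock starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")     # growing phase: acquiring locks is allowed
t.unlock("A")   # shrinking phase begins here
try:
    t.lock("C")  # forbidden under 2PL
except RuntimeError as e:
    print(e)
```

Strict 2PL, described next, simply defers every unlock to commit time, so the shrinking phase collapses into a single release at the end.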
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: Strict-2PL holds all the locks until the commit point and releases them all at once.
Strict-2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at 0002 clock time would be older than all other transactions that come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system know when the last 'read' and 'write' operation was performed on the data item.
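The basic timestamp-ordering rules built on those per-item read/write timestamps can be sketched like this (a simplified illustration of the standard rules, without the Thomas write rule or restart logic):

```python
# Sketch: basic timestamp-ordering rules. Each data item carries the largest
# timestamp that has read it (read_ts) and written it (write_ts).
def try_read(txn_ts, item):
    if txn_ts < item["write_ts"]:
        return "rollback"          # a younger transaction already overwrote the item
    item["read_ts"] = max(item["read_ts"], txn_ts)
    return "ok"

def try_write(txn_ts, item):
    if txn_ts < item["read_ts"] or txn_ts < item["write_ts"]:
        return "rollback"          # a younger transaction already read or wrote it
    item["write_ts"] = txn_ts
    return "ok"

x = {"read_ts": 0, "write_ts": 0}
print(try_write(5, x))  # ok: transaction with timestamp 5 writes x
print(try_read(3, x))   # rollback: an older transaction may not read the newer value
```

A rolled-back transaction is restarted with a fresh (younger) timestamp, so the schedule always respects the age ordering.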
OR
(b) Explain database security mechanisms. (8)
Database security covers and enforces security on all aspects and components of databases. This includes:
Data stored in the database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of Information Technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each field.
Multiple users: A conventional database needed to support more than one user or system logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional database. The knowledge-base needed to know facts about the world, for example to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but instead would need to store information in thousands of tables that represented information about specific humans. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and Object-Oriented communities, object-oriented databases such as Versant emerged. These were systems designed from the ground up to have support for object-oriented capabilities but also to support standard database services as well. On the other hand, the large database vendors such as Oracle added capabilities to their products that provided support for knowledge-base requirements such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet, documents, hypertext and multimedia support were now critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors, such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge-base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data; there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-user system. It helps you to make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (/ˌætəˈmɪsəti/; from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
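The A-to-B transfer can be demonstrated with SQLite's transaction support (account names and amounts are hypothetical; the RuntimeError stands in for a failure between the two operations):

```python
import sqlite3

# Sketch: the transfer either commits as a whole or rolls back entirely.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

def transfer(con, amount, fail=False):
    try:
        con.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail:
            raise RuntimeError("simulated failure before crediting B")
        con.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        con.commit()
    except RuntimeError:
        con.rollback()  # atomicity: the partial withdrawal is undone

transfer(con, 60, fail=True)   # fails mid-way: nothing changes
print(dict(con.execute("SELECT name, balance FROM account")))  # {'A': 100, 'B': 0}

transfer(con, 60)              # succeeds: both operations take effect together
print(dict(con.execute("SELECT name, balance FROM account")))  # {'A': 40, 'B': 60}
```

After the simulated failure the withdrawal is rolled back, so money is neither lost nor created, exactly as the definition requires.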
(B) Give the level architecture proposal for DBMS
Ans: Objectives of the three level architecture proposal for a DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change the database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below:
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to that user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
being used; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three level architecture proposal for a DBMS are explained above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update, and
retrieval) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It records metadata information such as the names of the files, the data items, the storage
details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete, and
retrieve, from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the best way
to execute the query, and then sent to the data manager.
3. Data Manager - The data manager is the central software component of the DBMS, also known
as the database control system.
The main functions of the data manager are:
It converts operations in users' queries, coming from the application programs or from the combination of
the DML compiler and query optimizer (together known as the query processor), from the user's logical view
to the physical file system.
It controls access to the DBMS information that is stored on disk.
It handles buffers in main memory.
It enforces constraints to maintain the consistency and integrity of the data.
It synchronizes the simultaneous operations performed by concurrent users.
It controls the backup and recovery operations.
4. Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data: names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization: a description of the database users, their responsibilities, and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation, and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary - A data dictionary is necessary in databases for the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database — in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other naïve users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to duplication of the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources being needlessly used.
• Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency is likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in the
database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work the most
important, and therefore its needs the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (database administrator) to structure the database system
to provide the overall service that is best for the organization.
9. The overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Backup and recovery are provided - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures, and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved can be very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a
real-world enterprise. Modeling is an iterative, team-oriented process in which all business managers (or their
designates) are involved; the result should be validated with a "bottom-up" approach. The model has three
primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many: 1 ------ M
Many-to-one: M ------ 1
Many-to-many: M ------ M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
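As an added illustration (not part of the original answer), the Customer entity above can be mapped to a relational table, with the composite attributes name, address, and street flattened into simple columns — one common mapping convention, not the only one. SQLite via Python's sqlite3 is used here only as a convenient engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes are flattened into simple columns.
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT NOT NULL,
        middle_name      TEXT,
        last_name        TEXT NOT NULL,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    (1, "Asha", "Rao"),  # hypothetical sample data
)
row = conn.execute("SELECT first_name, last_name FROM customer").fetchone()
print(row)  # ('Asha', 'Rao')
```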
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file, and direct file organizations, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
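A minimal sketch of secondary key retrieval, using SQLite from Python (the student data is hypothetical). The index on stud_name acts as the secondary key; note that one key value matches several records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, stud_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ravi"), (2, "Meena"), (3, "Ravi")])

# A secondary index on the non-unique attribute stud_name:
conn.execute("CREATE INDEX idx_stud_name ON student (stud_name)")

# Retrieval by secondary key may return several records:
rows = conn.execute("SELECT stud_id FROM student WHERE stud_name = ?",
                    ("Ravi",)).fetchall()
print(sorted(rows))  # [(1,), (3,)]
```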
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
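The lossless three-way decomposition can be checked directly on the sample data above (an added sketch in Python; the set comprehensions play the role of projection and natural join).

```python
# The Buying relation from the example above.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# Project onto the three attribute pairs (Buyer-Vendor, Buyer-Item, Vendor-Item).
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (v2, i) in vendor_item if v2 == v
    if (b, i) in buyer_item
}

# The join dependency holds: no tuples are lost and none are invented.
print(rejoined == buying)  # True
```

Joining only two of the projections would produce the spurious tuple (Mary, Jordach, Sneakers); the third projection filters it out, which is why all three tables are needed.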
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Fig.: IMS system architecture - application programs A and B, each written in a host language plus DL/I, access the IMS control program through the PCBs of their respective PSBs (PSB-A, PSB-B); the control program maps these onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined by the DBD. The set of DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD   NAME=EDUCPDBD
2 SEGM  NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the logical
database (LDB) and the corresponding physical database (PDB).
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of functional dependencies used in normalization are:
They have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency.
They hold for all time.
They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify a set
of functional dependencies (X) for a relation that is smaller than the complete set of functional
dependencies (Y) for that relation, and that has the property that every functional dependency in Y is
implied by the functional dependencies in X.
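As an added illustration, a functional dependency X → Y can be tested mechanically on a relation's extension: the dependency fails exactly when two tuples agree on X but differ on Y. The helper function and the sample rows below are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`
    (a list of dicts): it holds iff no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

students = [
    {"stud_id": 1, "name": "Ravi",  "dept": "CS"},
    {"stud_id": 2, "name": "Meena", "dept": "CS"},
    {"stud_id": 3, "name": "Ravi",  "dept": "EE"},
]
print(fd_holds(students, ["stud_id"], ["name"]))  # True: stud_id determines name
print(fd_holds(students, ["name"], ["dept"]))     # False: two Ravis, different depts
```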
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF;
we will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional
dependencies between its attributes. It is often executed as a series of steps, where each step corresponds
to a specific normal form with known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and its multi-valued dependencies are functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
One of the following conditions must hold in order for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it deals with
multivalued dependencies.
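A multivalued dependency X →→ Y can likewise be checked on a relation's extension (an added sketch; the course/teacher/book relation is the classic textbook example, with hypothetical values): whenever two tuples agree on X, the tuple that mixes the first tuple's Y value with the second tuple's remaining attributes must also be present.

```python
def mvd_holds(rows, x, y, z):
    """Check X ->> Y in rows (list of dicts), where z holds the remaining
    attributes: for any two tuples agreeing on X, the tuple combining one
    tuple's Y with the other's Z must also appear in the relation."""
    s = {(tuple(r[a] for a in x),
          tuple(r[a] for a in y),
          tuple(r[a] for a in z)) for r in rows}
    for (xa, ya, za) in s:
        for (xb, yb, zb) in s:
            if xa == xb and (xa, ya, zb) not in s:
                return False
    return True

# course ->> teacher: teachers and books of a course vary independently.
ctx = [
    {"course": "DB", "teacher": "Smith", "book": "Date"},
    {"course": "DB", "teacher": "Smith", "book": "Ullman"},
    {"course": "DB", "teacher": "Jones", "book": "Date"},
    {"course": "DB", "teacher": "Jones", "book": "Ullman"},
]
print(mvd_holds(ctx, ["course"], ["teacher"], ["book"]))       # True
# Remove one teacher/book combination and the MVD no longer holds:
print(mvd_holds(ctx[:-1], ["course"], ["teacher"], ["book"]))  # False
```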
Q5
Either
(A) What are object oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log that let you
recover to a log mark.
CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index
creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
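The recovery models above are specific to SQL Server; as a loose, hedged analogy of a full backup followed by a restore, Python's sqlite3 module exposes a backup API (Connection.backup, available since Python 3.7). The schema and data are hypothetical.

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
src.execute("INSERT INTO orders VALUES (1, 'widget')")
src.commit()

# Take a full backup into a second database (here also in memory).
backup = sqlite3.connect(":memory:")
src.backup(backup)

# Simulate losing the original, then read from the backup.
src.close()
row = backup.execute("SELECT item FROM orders WHERE id = 1").fetchone()
print(row[0])  # widget
```

A full backup like this protects committed data up to the moment of the backup; the transaction-log models described above additionally allow point-in-time recovery between backups.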
(D) Describe deadlocks in a distributed system.
Ans
Example
For example, if we delete record number 15 in a primary table, we need to be sure that there's no
foreign key in any related table with the value of 15. We should only be able to delete a primary
key if there are no associated records; otherwise we would end up with an orphaned record.
Here, the related table contains a foreign key value that doesn't exist in the primary key field of
the primary table (i.e. the "CompanyId" field). This has resulted in an "orphaned record".
So referential integrity will prevent users from
Adding records to a related table if there is no associated record in the primary table
Changing values in a primary table that result in orphaned records in a related table
Deleting records from a primary table if there are matching related records
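As a sketch of how a DBMS enforces these rules, the scenario above can be reproduced with Python's built-in sqlite3 module (the Company/Product table names are illustrative, chosen to match the "CompanyId" field mentioned above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE Company (CompanyId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE Product (
    ProductId INTEGER PRIMARY KEY,
    CompanyId INTEGER NOT NULL REFERENCES Company(CompanyId))""")

conn.execute("INSERT INTO Company VALUES (15, 'Acme')")
conn.execute("INSERT INTO Product VALUES (1, 15)")

# Deleting the parent row would orphan Product 1, so the DBMS refuses.
try:
    conn.execute("DELETE FROM Company WHERE CompanyId = 15")
except sqlite3.IntegrityError as e:
    print("delete blocked:", e)

# Inserting a child with no matching parent is refused for the same reason.
try:
    conn.execute("INSERT INTO Product VALUES (2, 99)")
except sqlite3.IntegrityError as e:
    print("insert blocked:", e)
```

Both statements fail with a foreign-key constraint error, so no orphaned record can ever be created.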
Consequences of a Lack of Referential Integrity
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity, which is concerned with the accuracy and consistency of all data (relationship or otherwise). Maintaining data integrity is a crucial part of working with databases.
(d) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type, and allowed to have one of two known code values: M for male, F for female, and NULL for records where gender is unknown or not applicable (or arguably U for unknown as a sentinel value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table. Following the previous example, a Gender reference table would have exactly two records, one per allowed value, excluding NULL. Reference tables are formally related to other tables in a database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check constraint or, in more complex cases, in a database trigger. For example, a column requiring positive numeric values may have a check constraint declaring that the values must be greater than zero.
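A minimal sketch of such database-enforced domain rules, using Python's sqlite3 module (the Person table and its columns are hypothetical, combining the gender and positive-value examples above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each CHECK constraint below is a database-enforced domain rule.
conn.execute("""CREATE TABLE Person (
    Name   TEXT,
    Gender TEXT CHECK (Gender IN ('M', 'F')),   -- enumerated domain {M, F}
    Salary NUMERIC CHECK (Salary > 0)           -- values must be positive
)""")

conn.execute("INSERT INTO Person VALUES ('Ann', 'F', 50000)")   # inside the domain
conn.execute("INSERT INTO Person VALUES ('Dee', NULL, 30000)")  # NULL = unknown, allowed

for bad_row in [('Bob', 'X', 40000),    # 'X' is outside the Gender domain
                ('Cal', 'M', -10)]:     # a negative salary violates the check
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as e:
        print("rejected:", bad_row, "-", e)
```

Note that, following SQL semantics, a NULL value passes a CHECK constraint, which is what allows "unknown" gender records through.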
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
The last is written M:N rather than M:M, because the number of occurrences on each side may differ.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; in that case you may consider combining the two entities into one.
For example, an employee is allocated a company car which can only be driven by that employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a team of many employees.
Therefore there is a many-to-many relationship between employee and project.
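The three degrees can be sketched as table definitions (using Python's sqlite3; the schema below is one illustrative reading of the employee/department/project examples, with a junction table resolving the M:N relationship):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- 1:1  an employee is allocated exactly one company car (UNIQUE foreign key)
CREATE TABLE Employee   (EmpId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE CompanyCar (CarId INTEGER PRIMARY KEY,
                         EmpId INTEGER UNIQUE REFERENCES Employee(EmpId));

-- 1:M  a department has many employees: FK on the "many" side, not unique
CREATE TABLE Department (DeptId INTEGER PRIMARY KEY, Name TEXT);
ALTER TABLE Employee ADD COLUMN DeptId INTEGER REFERENCES Department(DeptId);

-- M:N  employees work on projects: resolved with a junction (link) table
CREATE TABLE Project (ProjId INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE WorksOn (EmpId  INTEGER REFERENCES Employee(EmpId),
                      ProjId INTEGER REFERENCES Project(ProjId),
                      PRIMARY KEY (EmpId, ProjId));
""")
print("tables created")
```

The junction table turns one M:N relationship into two 1:M relationships, which is how relational schemas normally implement it.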
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG is intended to meet the requirements of many distinct programming languages, not just COBOL: the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of the conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure.
The architecture of the DBTG model can be divided into three different levels, like the architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data items they contain, and the sets into which they are grouped. (Here, logical record types are referred to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data item and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary programming language such as COBOL that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema; using the COBOL Data Base Facility, for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data item) defined in the subschema. The program may refer to these data-item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization: the process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF. We will pay particular attention up to 3NF.
NF²: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table)
or by
placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
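The 'flattening' route can be sketched in a few lines of Python (the customer/order data is invented purely for illustration):

```python
# One unnormalized row per customer, each holding a repeating group of orders.
unf = [
    {"CustId": 1, "Name": "Ann", "Orders": [("O1", 250), ("O2", 90)]},
    {"CustId": 2, "Name": "Bob", "Orders": [("O3", 40)]},
]

# 'Flatten' the table: repeat the key columns for each member of the group,
# so every row/column intersection holds exactly one atomic value (1NF).
first_nf = [
    {"CustId": c["CustId"], "Name": c["Name"], "OrderNo": no, "Amount": amt}
    for c in unf
    for (no, amt) in c["Orders"]
]

for row in first_nf:
    print(row)
```

The resulting three rows have no repeating groups; the key of the flattened table becomes (CustId, OrderNo).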
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
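These steps can be sketched in Python on a hypothetical OrderLine relation with composite key (OrderNo, ProductNo), where ProductName depends only on ProductNo, i.e. a partial dependency:

```python
# 1NF relation with composite key (OrderNo, ProductNo).  ProductName depends
# only on ProductNo -- a partial dependency, so the relation is not in 2NF.
order_lines = [
    ("O1", "P1", "Bolt", 10),
    ("O1", "P2", "Nut", 20),
    ("O2", "P1", "Bolt", 5),
]

# Remove the partial dependency: place ProductName in a new relation together
# with a copy of its determinant, ProductNo.
product = {(p_no, p_name) for (_, p_no, p_name, _) in order_lines}
order_line = [(o_no, p_no, qty) for (o_no, p_no, _, qty) in order_lines]

print(sorted(product))   # the new Product relation, one row per product
print(order_line)        # the remaining OrderLine relation, now in 2NF
```

After the split, every non-key attribute in each relation depends on the whole of that relation's key.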
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency: a functional dependency between two or more non-key attributes
Based on the concept of transitive dependency:
A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
As normalization proceeds, relations become progressively more restricted (stronger) in format, and also less vulnerable to update anomalies.
Ans:
1. NF²: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multi-valued dependencies are functional dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
For a relation to be in fourth normal form, either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes are dependent between themselves. One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also considers multivalued dependencies.
(d) What are inference axioms? Explain their significance in relational database design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy certain other FDs:
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – given
2. Street Zip → Street City – augmentation of (1) by Street
3. City Street → Zip – given
4. City Street → City Street Zip – augmentation of (3) by City Street
5. Street Zip → City Street Zip – transitivity of (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I, J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – given
2. AB → AB – reflexivity
3. AB → B – projectivity from (2)
4. AB → BE – additivity from (1) and (3)
5. BE → I – given
6. AB → I – transitivity from (4) and (5)
7. E → G – given
8. AB → G – transitivity from (1) and (7)
9. AB → GI – additivity from (6) and (8)
10. GI → H – given
11. AB → H – transitivity from (9) and (10)
12. AB → GH – additivity from (8) and (11)
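Derivations like the one above can be checked mechanically by computing the closure of the left-hand side under F. A small Python sketch (the fixpoint loop applies the axioms implicitly; attributes are single characters here for brevity):

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs:
    repeatedly add the right side of any FD whose left side is covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the Maier example; each (lhs, rhs) pair is one FD.
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]

print(sorted(closure("AB", F)))       # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print(set("GH") <= closure("AB", F))  # True: F implies AB -> GH
```

Since G and H both appear in the closure of AB, the derivation AB → GH is confirmed (note that J also enters the closure, via AG → J, once G has been derived).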
Significance in relational database design: A relational database is a database structure, commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad-hoc manner. A Relational Database Management System is a database system made up of files with data elements in a two-dimensional array (rows and columns). This database management system has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components: a collection of objects or relations, and a set of operations to act on the relations.
Q5
EITHER
(a) What is a deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that is being locked by the other user. It can be dealt with in two ways: one is to set measures which prevent deadlocks from happening, and the other is to set ways in which to break the deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or nothing. Secondly, they can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order to prevent such instances. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a transaction to cancel and revert the entire transaction until the resources required become available, allowing one transaction to complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression "ACID transaction".
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it should be atomic: it should either be complete or fully incomplete; there should not be anything like semi-complete. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by a change in the row I'm working on will be locked from changes until I am complete with my change. This isolates the change and ensures that the data interaction remains accurate and consistent; this is known as transaction-level consistency. The transaction being changed, which may affect several other pieces of data or rows of input, could also affect how those rows are read. So let's say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able to read the total cost of a blue shirt, because the total cost row is affected by any changes in the tax rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but hasn't been committed is known as the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive: This type of locking mechanism differentiates the locks based on their uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock. Allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
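A minimal single-process sketch of shared/exclusive compatibility, using Python's threading primitives (this illustrates the rule only; it is not a DBMS lock manager):

```python
import threading

class SharedExclusiveLock:
    """Minimal shared/exclusive lock: many readers OR one writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cond:
            while self._writer:            # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:   # a writer needs the item alone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

lock = SharedExclusiveLock()
lock.acquire_shared()        # T1 reads
lock.acquire_shared()        # T2 reads concurrently: shared locks are compatible
lock.release_shared(); lock.release_shared()
lock.acquire_exclusive()     # T3 writes alone: exclusive with everything
lock.release_exclusive()
print("compatibility demonstrated")
```

The two `while` loops encode the compatibility matrix: shared/shared is allowed, while any combination involving exclusive must wait.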
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts: in this phase, the transaction cannot demand any new locks; it only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by the transaction, and the second phase is shrinking, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: Strict-2PL holds all the locks until the commit point and releases them all at once.
Strict-2PL does not have cascading aborts, as 2PL does.
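The growing/shrinking rule of basic 2PL can be sketched as a small Python class (illustrative only; a Strict-2PL variant would simply defer every unlock to the commit point):

```python
class TwoPhaseTransaction:
    """Sketch of the 2PL rule: once a transaction releases any lock
    (entering the shrinking phase), it may not acquire new ones."""
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(
                f"{self.name}: cannot lock {item!r} after first unlock (2PL)")
        self.locks.add(item)           # growing phase

    def unlock(self, item):
        self.shrinking = True          # first unlock starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTransaction("T1")
t.lock("A"); t.lock("B")   # growing phase: acquire freely
t.unlock("A")              # shrinking phase begins here
try:
    t.lock("C")            # violates the two-phase rule
except RuntimeError as e:
    print(e)
```

The point at which the first lock is released is the transaction's lock point; 2PL serializability follows from ordering transactions by their lock points.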
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all other transactions that come after it. For example, any transaction y entering the system at 0004 is two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system know when the last 'read and write' operation was performed on the data item.
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional database. The knowledge-base needed to know facts about the world: for example, to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but instead would need to store information in thousands of tables about specific humans. Representing that all humans are mortal, and being able to reason that any given human is mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged: systems designed from the ground up to support object-oriented capabilities, but also to support standard database services as well. On the other hand, the large database vendors such as Oracle added capabilities to their products that provided support for knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data: there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-user system. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
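The bank-transfer example can be sketched with Python's sqlite3 module, where commit/rollback supplies the all-or-nothing guarantee (the `fail` flag simulating a crash is of course invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(conn, src, dst, amount, fail=False):
    """Both UPDATEs run inside one transaction: either both commit or neither."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail:                        # simulate a crash between the two steps
            raise RuntimeError("system failure mid-transfer")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()
    except Exception:
        conn.rollback()                 # atomicity: the partial withdrawal is undone

transfer(conn, "A", "B", 40, fail=True)
print(dict(conn.execute("SELECT * FROM account")))  # {'A': 100, 'B': 0}

transfer(conn, "A", "B", 40)
print(dict(conn.execute("SELECT * FROM account")))  # {'A': 60, 'B': 40}
```

After the simulated failure the balances are unchanged; only the successful run moves the money, and the total is preserved in both cases.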
(B) Give the level architecture proposal for DBMS
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
Above three points are explain in detail given bellow-
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
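A small sketch of the first two sublanguages in action, using SQLite via Python's sqlite3 (table and column names are illustrative assumptions; SQLite has no user accounts, so the DCL statement appears only as a comment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object (illustrative table)
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE roll_no = 1")
rows = conn.execute("SELECT name FROM student").fetchall()

# DCL would control access, e.g.:
#   GRANT SELECT ON student TO some_user;
# SQLite has no user management, so this is shown as a comment only.
```

After the DML statements, `rows` holds the single updated record.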
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. It describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are explained above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes schema definitions specified in the DDL. It stores metadata such as the names of the files and data items, storage details of each file, mapping information, and constraints.
2 DML Compiler and Query Optimizer - DML commands such as insert, update, delete and retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer into the best way to execute the query and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
- Converting operations in users' queries, coming from the application programs or from the DML compiler and query optimizer (together known as the Query Processor), from the user's logical view to the physical file system.
- Controlling access to DBMS information stored on disk.
- Handling buffers in main memory.
- Enforcing constraints to maintain the consistency and integrity of the data.
- Synchronizing the simultaneous operations performed by concurrent users.
- Controlling backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and the number of rows in each table.
2 Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities, and their access rights.
6 Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation, and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary - The data dictionary is necessary in databases for the following reasons:
- It improves the control of the DBA over the information system and the users' understanding of the use of the system.
- It helps in documenting the database design process by storing documentation of the results of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access commands known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online users
3 Application programmers
4 Database administrator
i) Naïve Users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of an automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structures and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies the separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, since such details are transparent to the user. Changes can be made to data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
- Duplication of the same data in different files.
- Wastage of storage space, since duplicated data is stored.
- Errors generated due to updating of the same data in different files.
- Time wasted in entering the same data again and again.
- Needless use of computer resources.
- Difficulty in combining information.
2 Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3 Better Service to the Users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined into one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4 Flexibility of the System Is Improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5 Integrity Can Be Improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards Can Be Enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7 Security Can Be Improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8 The Organization's Requirements Can Be Identified - All organizations have sections and departments, and each of these units often considers the work of its own unit as the most important, and therefore considers its needs as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. So it may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 The Overall Cost of Developing and Maintaining Systems Is Lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.
10 A Data Model Must Be Developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11 Provides Backup and Recovery - Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. It is an iterative, team-oriented process in which all business managers (or their designates) are involved, and it should be validated with a "bottom-up" approach. It has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many (1:M), Many-to-one (M:1),
Many-to-many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
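One common way to realize this entity relationally is to flatten the composite attributes (name, address, street) into individual columns. The sketch below uses SQLite via Python's sqlite3; treating phone_number as multivalued and moving it to its own table is a hypothetical design choice for illustration, not stated in the source:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes flatten into individual columns.
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        first_name    TEXT, middle_name TEXT, last_name TEXT,
        date_of_birth TEXT,
        city TEXT, state TEXT, zip_code TEXT,
        street_name TEXT, street_number TEXT, apartment_number TEXT
    )
""")
# If phone_number were multivalued, it would move to a separate table
# keyed by customer_id (illustrative assumption).
conn.execute("""
    CREATE TABLE customer_phone (
        customer_id  INTEGER REFERENCES customer(customer_id),
        phone_number TEXT
    )
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Ravi', 'Shah')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0101'), (1, '555-0102')")
n_phones = conn.execute("SELECT COUNT(*) FROM customer_phone "
                        "WHERE customer_id = 1").fetchone()[0]
```

Each customer row stays in first normal form while still allowing any number of phone numbers.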
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
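A minimal in-memory sketch of the idea: the primary index maps each key to exactly one record, while the secondary index on "stud_name" maps a key value to a list of matching records. The records and field names are invented for illustration:

```python
from collections import defaultdict

# Student file with primary key roll_no; stud_name acts as a secondary key.
students = [
    {"roll_no": 1, "stud_name": "Amit",  "city": "Nagpur"},
    {"roll_no": 2, "stud_name": "Priya", "city": "Pune"},
    {"roll_no": 3, "stud_name": "Amit",  "city": "Mumbai"},
]

# Primary index: one record per key value.
primary = {s["roll_no"]: s for s in students}

# Secondary index: one key value may map to MANY records.
secondary = defaultdict(list)
for s in students:
    secondary[s["stud_name"]].append(s["roll_no"])

# Secondary key retrieval: all records whose stud_name is "Amit".
matches = [primary[r] for r in secondary["Amit"]]
```

Unlike the primary-key lookup, the secondary-key lookup returns a set of records.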
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key composed of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- You always need to know two values (pairwise).
- For any one, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and the vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
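Using the sample data above, the decomposition into Buyer-Vendor, Buyer-Item and Vendor-Item and the subsequent rejoin can be sketched with Python sets. For this particular instance the natural join of the three projections reproduces the original table exactly, which is what the join dependency requires:

```python
# Buying(buyer, vendor, item) sample data from the table above.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three binary projections (the proposed 5NF tables).
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections on their common attributes.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    for (v2, i2) in vendor_item if v2 == v and i2 == i
}
```

Because `rejoined` equals `buying`, the decomposition is lossless here; recording "Claiborne sells jeans" then needs only one new Vendor-Item row instead of one row per buyer.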
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
Fig: IMS system structure - application programs A and B (host language + DL/I) each operate through a PSB (a set of PCBs); the PCBs map through the IMS control program to the DBDs that define the physical databases.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also specified in the DBD. The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps; each step corresponds to a specific normal form which has known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
- NF2: non-first normal form.
- 1NF: R is in 1NF iff all domain values are atomic.
- 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
- 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
- BCNF: R is in BCNF iff every determinant is a candidate key.
- Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multi-valued dependencies are functional dependencies. 4NF removes the unwanted data structures: multi-valued dependencies.
Either of these conditions must hold for a relation to be in fourth normal form:
- There is no multi-valued dependency in the relation; or
- There are multi-valued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses multi-valued dependencies.
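A hypothetical relation with two independent multivalued facts (names and data invented for illustration) shows what a 4NF decomposition does: each multi-valued dependency gets its own table, and their join still reconstructs the original without loss:

```python
# Employee(name, skill, language), where skill and language are independent
# multivalued facts: name ->> skill and name ->> language (illustrative data).
emp = {
    ("Ann", "SQL",  "English"),
    ("Ann", "SQL",  "Hindi"),
    ("Ann", "Java", "English"),
    ("Ann", "Java", "Hindi"),
}

# 4NF decomposition: one table per multi-valued dependency.
emp_skill = {(n, s) for n, s, l in emp}
emp_lang  = {(n, l) for n, s, l in emp}

# Natural join of the two projections reconstructs the original relation.
rejoined = {(n, s, l) for (n, s) in emp_skill for (n2, l) in emp_lang if n2 == n}
```

The decomposed tables store 2 + 2 rows instead of the 2 × 2 cross-product, and adding a new skill no longer requires one row per language.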
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could get a user's account information and efficiently provide extensive details such as transactions, account entries, etc.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model, which controls the following:
- Both the speed and size of your transaction log backups.
- The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available
- Full Recovery
- Bulk Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
- It logs CREATE INDEX operations, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans
A lack of referential integrity in a database can lead to incomplete data being returned, usually with no indication of an error. This could result in records being "lost" in the database, because they're never returned in queries or reports.
It could also result in strange results appearing in reports (such as products without an associated company).
Or worse yet, it could result in customers not receiving products they paid for.
Worse still, it could affect life-and-death situations, such as a hospital patient not receiving the correct treatment, or a disaster relief team not receiving the correct supplies or information.
Data Integrity
Referential integrity is a subset of data integrity which is concerned with the accuracy and
consistency of all data (relationship or otherwise) Maintaining data integrity is a crucial part of
working with databases
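A minimal sketch of database-enforced referential integrity, using SQLite via Python's sqlite3 (table names are illustrative; note that SQLite only enforces foreign keys after the `foreign_keys` pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires this pragma
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE product (
        id         INTEGER PRIMARY KEY,
        company_id INTEGER NOT NULL REFERENCES company(id)
    )
""")
conn.execute("INSERT INTO company VALUES (1, 'Acme')")
conn.execute("INSERT INTO product VALUES (10, 1)")       # valid parent row
try:
    conn.execute("INSERT INTO product VALUES (11, 99)")  # no such company
    orphan_allowed = True
except sqlite3.IntegrityError:
    orphan_allowed = False  # the orphan row is rejected
```

The foreign-key constraint prevents exactly the "product without an associated company" situation described above.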
(D) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the attribute may assume.
Examples:
A field for gender may have the domain {male, female, unknown}, where those three values are the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which a data element may contain. The rule for determining the domain boundary may be as simple as a data type with an enumerated list of values.[1]
For example, a database table that has information about people, with one record per person, might have a gender column. This gender column might be declared as a string data type and allowed to have one of two known code values: 'M' for male and 'F' for female, with NULL for records where gender is unknown or not applicable (or, arguably, 'U' for unknown as a sentinel value). The data domain for the gender column is {'M', 'F'}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value, excluding NULL. Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
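As a concrete illustration, the reference-table and check-constraint mechanisms described above can be sketched in SQLite (the table and column names here are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only after this opt-in

# Reference table holding the domain {M, F}
conn.execute("CREATE TABLE gender_ref (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO gender_ref VALUES (?)", [("M",), ("F",)])

conn.execute("""
    CREATE TABLE person (
        name   TEXT,
        gender TEXT REFERENCES gender_ref(code),   -- domain via reference table
        salary NUMERIC CHECK (salary > 0)          -- domain via check constraint
    )""")

conn.execute("INSERT INTO person VALUES ('Alice', 'F', 50000)")   # accepted

try:
    conn.execute("INSERT INTO person VALUES ('Bob', 'X', 40000)")  # 'X' not in domain
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # the foreign key rejects the row

try:
    conn.execute("INSERT INTO person VALUES ('Carol', 'M', -5)")   # violates CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # the check constraint rejects the row
```

Both bad inserts raise IntegrityError, so only the valid row is stored.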
This definition combines the concepts of domain as an area over which control is exercised and
the mathematical idea of a set of values of an independent variable for which a function is
defined
(ii) Degree and cardinality
The degree of relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) to the number of occurrences in another
There are three degrees of relationship known as
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the conventional notation for the last of these is M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; if it does, you may consider combining
the two entities into one.
For example an employee is allocated a company car which can only be driven by that
employee
Therefore there is a one-to-one relationship between employee and company car
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities shown on the previous page, an employee works in
one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships many-to-many relationships rarely exist Normally they occur
because an entity has been missed
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
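A common way to implement such an M:N relationship is a junction table that decomposes it into two 1:M relationships. A minimal sketch in SQLite, with invented employee and project data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
    -- The junction table turns one M:N link into two 1:M links.
    CREATE TABLE assignment (
        emp_id  INTEGER REFERENCES employee(emp_id),
        proj_id INTEGER REFERENCES project(proj_id),
        PRIMARY KEY (emp_id, proj_id)
    );
    INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
    INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# Each employee may appear on many projects and vice versa.
rows = conn.execute("""
    SELECT e.name, p.title
    FROM employee e JOIN assignment a ON e.emp_id = a.emp_id
                    JOIN project p    ON p.proj_id = a.proj_id
    ORDER BY e.name, p.title
""").fetchall()
print(rows)   # [('Asha', 'Inventory'), ('Asha', 'Payroll'), ('Ravi', 'Payroll')]
```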
Q4
EITHER
(a) Explain DBTG Data Manipulation
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL: the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of a conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in the figure.
The architecture of the DBTG model can be divided into three different levels, like the
architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data
item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory (bad) relations by
breaking up their attributes into smaller relations
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from the information source (e.g. a form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by:
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
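The 'flattening' step can be illustrated with a small sketch (student and course data invented for the example): each value in the repeating group becomes its own row, with the key attribute copied down:

```python
# Unnormalized: each student row carries a repeating group of courses.
unnormalized = [
    # (student_id, name, [courses]  <- repeating group)
    (1, "Mary", ["DBMS", "OS"]),
    (2, "John", ["DBMS"]),
]

# Flatten: one row per value of the repeating group, key copied into each row.
first_normal_form = [
    (student_id, name, course)
    for student_id, name, courses in unnormalized
    for course in courses
]

for row in first_normal_form:
    print(row)
# (1, 'Mary', 'DBMS')
# (1, 'Mary', 'OS')
# (2, 'John', 'DBMS')
```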
Second Normal Form (2NF)
Based on the concept of full functional dependency:
A and B are attributes of a relation.
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF: A relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant.
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C).
3NF: A relation that is in 1NF and 2NF, and in which no non-primary-key
attribute is transitively dependent on the primary key.
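To make the 2NF test concrete, here is a small sketch (data invented for the example) that checks whether a functional dependency holds over sample rows, and uses it to expose a partial dependency on part of a composite key:

```python
def fd_holds(rows, lhs, rhs):
    """True if, whenever two rows agree on the lhs attributes, they agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

# Relation with composite key (student_id, course); student_name depends
# only on student_id, i.e. on part of the key -> a partial dependency.
rows = [
    {"student_id": 1, "course": "DBMS", "student_name": "Mary", "grade": "A"},
    {"student_id": 1, "course": "OS",   "student_name": "Mary", "grade": "B"},
    {"student_id": 2, "course": "DBMS", "student_name": "John", "grade": "A"},
]

print(fd_holds(rows, ["student_id"], ["student_name"]))     # True: partial dependency
print(fd_holds(rows, ["student_id", "course"], ["grade"]))  # True: full dependency
# To reach 2NF, move (student_id, student_name) into its own relation
# keyed by student_id alone.
```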
OR
(c) Explain multivalued dependency with suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally
dependent.
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and its multi-valued dependencies are functional
dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
Either there is no multivalued dependency in the relation, or there are multivalued
dependencies but the dependent attributes depend on each other.
One of these conditions must hold for the relation to be in fourth normal form.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
deals with multivalued dependencies.
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule stating that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If X → Y, then XZ → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → City Street Zip.
Proof:
1. Zip → City — Given
2. Street Zip → Street City — Augmentation of (1) by Street
3. City Street → Zip — Given
4. City Street → City Street Zip — Augmentation of (3) by City Street
5. Street Zip → City Street Zip — Transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I, J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E — Given
2. AB → AB — Reflexivity
3. AB → B — Projectivity from (2)
4. AB → BE — Additivity from (1) and (3)
5. BE → I — Given
6. AB → I — Transitivity from (4) and (5)
7. E → G — Given
8. AB → G — Transitivity from (1) and (7)
9. AB → GI — Additivity from (6) and (8)
10. GI → H — Given
11. AB → H — Transitivity from (9) and (10)
12. AB → GH — Additivity from (8) and (11)
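The same derivation can be checked mechanically by computing the attribute closure of AB under F: AB → GH holds exactly when G and H appear in that closure. A sketch:

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is covered, absorb the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F from the example above: AB->E, AG->J, BE->I, E->G, GI->H
F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
     ({"E"}, {"G"}), ({"G", "I"}, {"H"})]

c = closure({"A", "B"}, F)
print(sorted(c))          # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print({"G", "H"} <= c)    # True: AB -> GH is derivable
```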
Significance in Relational Database Design: The relational model is a database structure (commonly used in GIS) in
which data is stored in two-dimensional tables, where multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System is a database system made up of files with data elements in two-dimensional arrays (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables.
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• a collection of objects or relations;
• a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be handled in two ways: one is to put measures in place that
prevent deadlocks from happening, and the other is to provide ways to break a deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or to
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
to resolve it, the DBMS must select a transaction to cancel, and revert the entire
transaction until the resources required become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression "ACID transaction".
Ans: ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or not happen at all; there should not
be anything like a semi-complete state. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I'm working on will
be locked from changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, can also affect how those rows are read. So let's
say I'm processing a change to the tax rate in my state; my store clerk shouldn't be able
to read the total cost of a blue shirt, because the total cost row is affected by any change in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but hasn't been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
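The resource-ordering idea from part (a), locking resources in one fixed global order so that circular waits cannot form, can be sketched with ordinary threading locks (account names and amounts are invented for the example):

```python
import threading

accounts = {"A": 100, "B": 100}
locks = {name: threading.Lock() for name in accounts}

def transfer(src, dst, amount):
    # Always acquire locks in one fixed global order (here: sorted by name),
    # so two transfers in opposite directions can never deadlock.
    first, second = sorted((src, dst))
    with locks[first], locks[second]:
        accounts[src] -= amount
        accounts[dst] += amount

t1 = threading.Thread(target=transfer, args=("A", "B", 30))
t2 = threading.Thread(target=transfer, args=("B", "A", 10))  # opposite direction
t1.start(); t2.start()
t1.join(); t2.join()
print(accounts)   # {'A': 80, 'B': 120}
```

Without the sorted() line, t1 locking A then B while t2 locks B then A could deadlock; the fixed order makes that impossible.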
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive: This type of locking mechanism differentiates the locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock. Allowing more than one transaction to write to the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks. Before initiating an execution, the transaction requests the system for all the locks it
needs beforehand. If all the locks are granted, the transaction executes and releases all the locks
when all its operations are over. If all the locks are not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts. In this phase, the transaction cannot demand any new locks; it
only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction, and the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it. Strict-2PL holds all the locks until the commit point and releases all the locks
at one time.
Strict-2PL does not have the cascading aborts that 2PL may have.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
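The core timestamp-ordering rules can be sketched as follows (a simplified model, not a full protocol: transaction restarts and commit handling are omitted):

```python
class DataItem:
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest reader so far
        self.write_ts = 0   # timestamp of the youngest writer so far

def try_write(item, ts):
    """A transaction may write only if no younger transaction has read or
    written the item; otherwise the older transaction must roll back."""
    if ts < item.read_ts or ts < item.write_ts:
        return False        # conflict with a younger transaction: abort
    item.write_ts = ts
    return True

def try_read(item, ts):
    """A transaction may read only if no younger transaction has written."""
    if ts < item.write_ts:
        return False        # the value it needs has been overwritten: abort
    item.read_ts = max(item.read_ts, ts)
    return True

x = DataItem()
print(try_write(x, ts=2))   # True: first writer
print(try_read(x, ts=3))    # True: the reader is younger than the writer
print(try_write(x, ts=1))   # False: an older transaction arrives too late
```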
OR
(b) Explain database security mechanisms.
Ans: Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls.
Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload.
Physical security of the database server and backup equipment from theft and natural
disasters.
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-
called ACID properties: Atomicity, Consistency, Isolation, and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work on the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information in thousands of rows that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge
base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state; that is, money is neither lost nor created if either of those two operations fails.
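The transfer example can be sketched in SQLite, where wrapping both updates in one transaction gives the all-or-nothing guarantee (account names and amounts are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance NUMERIC)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

def transfer(amount, fail=False):
    try:
        with conn:  # this block commits on success, rolls back on any exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
            if fail:
                raise RuntimeError("crash between the two operations")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
    except RuntimeError:
        pass  # transfer aborted; the withdrawal was rolled back too

transfer(40, fail=True)    # simulated failure: nothing is applied
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 100), ('B', 0)]   -- money neither lost nor created

transfer(40)               # successful transfer: both updates applied
print(conn.execute("SELECT name, balance FROM account ORDER BY name").fetchall())
# [('A', 60), ('B', 40)]
```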
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below.
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
being used; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are explained above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieve) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files and data items, storage
details of each file, mapping information, constraints, etc.
2 DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized, to find the best way to execute the query, by
the query optimizer and then sent to the data manager.
3. Data Manager - The data manager is the central software component of the DBMS, also known
as the database control system.
The main functions of the data manager are:
• Converting operations in user queries, coming from the application programs or from the
DML compiler and query optimizer (together known as the query processor), from the user's logical view
to the physical file system.
• Controlling access to the DBMS information that is stored on disk.
• Handling buffers in main memory.
• Enforcing constraints to maintain the consistency and integrity of the data.
• Synchronizing the simultaneous operations performed by concurrent users.
• Controlling the backup and recovery operations.
4. Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities, and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation, and accuracy, and may be considered an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
• It improves the DBA's control over the information system and the users' understanding of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
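As an analogy, SQLite keeps its own data dictionary in the catalog table sqlite_master, which records the name, type, and defining SQL of every object (the table created below is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

# sqlite_master is SQLite's built-in catalog: metadata about every object.
meta = cur.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'course'"
).fetchone()
print(meta[0], meta[1])  # table course
conn.close()
```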
5 Data Files - It contains the data portion of the database
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database (in the case of the ATM user, only one or more of his or her own accounts). Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise through the limited interaction they are permitted with the database via the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It also implies the separation of physical storage from the use of the
data by an application program, i.e., program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored, since such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g., changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g., from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating of the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources being needlessly used.
• Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. The use of a DBMS
should also allow users who don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in
the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
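A minimal sketch of centralized integrity enforcement, using a SQLite CHECK constraint (the table and column names are illustrative): the DBMS rejects bad data regardless of which application submits it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The constraint lives in the database, not in any one application.
cur.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
    "balance REAL CHECK (balance >= 0))"
)
cur.execute("INSERT INTO account VALUES (1, 100.0)")

rejected = False
try:
    cur.execute("INSERT INTO account VALUES (2, -50.0)")  # violates CHECK
except sqlite3.IntegrityError:
    rejected = True

remaining = cur.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(rejected, remaining)  # True 1
conn.close()
```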
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to which parts of the database: different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own needs,
as the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may therefore become necessary to ignore some
requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database allows schemes for backup and
recovery from failures, including disk crashes, power failures, and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an
iterative, team-oriented process in which all business managers (or their designates) should be
involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities,
relationships, and attributes.
Of the many notation methods, Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name,
address, etc.
Attributes are of various types:
• Simple/single attributes
• Composite attributes
• Multivalued attributes
• Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many: 1 ------- M
Many-to-one: M ------- 1
Many-to-many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where
street is itself composite: (street_name, street_number, apartment_number).
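One possible mapping of this Customer entity to relations, sketched in SQLite: the composite attributes name and address are flattened into columns, and the multivalued attribute phone_number becomes a separate table (names follow the example above; the sample data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Composite attributes (name, address, street) flattened into columns;
# the multivalued attribute phone_number gets its own table.
cur.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,
    street_name TEXT, street_number TEXT, apartment_number TEXT
);
CREATE TABLE customer_phone (
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
cur.execute("INSERT INTO customer (customer_id, first_name, last_name) "
            "VALUES (1, 'Ravi', 'Shah')")
cur.execute("INSERT INTO customer_phone VALUES (1, '555-0101')")
cur.execute("INSERT INTO customer_phone VALUES (1, '555-0102')")
n = cur.execute("SELECT COUNT(*) FROM customer_phone "
                "WHERE customer_id = 1").fetchone()[0]
print(n)  # 2: one customer, many phone numbers
conn.close()
```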
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file, and direct file organizations, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
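A minimal in-memory sketch of a secondary index on stud_name, mapping one key value to the set of matching records (the sample data is illustrative):

```python
from collections import defaultdict

# Student records, keyed by the primary key stud_id.
students = [
    {"stud_id": 1, "stud_name": "Amit"},
    {"stud_id": 2, "stud_name": "Priya"},
    {"stud_id": 3, "stud_name": "Amit"},
]

# Secondary index on stud_name: one key value maps to MANY records,
# unlike a primary key, which identifies exactly one.
by_name = defaultdict(list)
for rec in students:
    by_name[rec["stud_name"]].append(rec["stud_id"])

print(by_name["Amit"])  # [1, 3]
```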
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables.
Another way of expressing this is: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
• You always need to know two values (pairwise).
• For any one value you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key: in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
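The three-table decomposition can be sketched with Python sets, using the sample data above: projecting Buying onto Buyer-Vendor, Buyer-Item, and Vendor-Item and then re-joining the projections reproduces the original table, so the decomposition is lossless for this data.

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three pairs of attributes.
bv = {(b, v) for b, v, i in buying}   # Buyer-Vendor
bi = {(b, i) for b, v, i in buying}   # Buyer-Item
vi = {(v, i) for b, v, i in buying}   # Vendor-Item

# Natural join of the three projections on the shared attributes.
joined = {(b, v, i)
          for b, v in bv
          for b2, i in bi if b == b2
          if (v, i) in vi}

print(joined == buying)  # True: lossless for this data
```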
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Fig.: IMS system structure - each application program (A, B) is written in a host language plus DL/I, has its own PSB made up of PCBs, and communicates through the IMS control program, which uses the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate
key, and no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in normalization:
• They have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of
the dependency.
• They hold for all time.
• They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by the
functional dependencies in X.
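A small sketch of checking whether a functional dependency holds in a relation (represented as a list of dicts; the staff/branch data is illustrative): equal determinant values must always imply equal dependent values.

```python
def holds(relation, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds:
    rows that agree on lhs must also agree on rhs."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # same determinant, different dependent value
        seen[key] = val
    return True

staff = [
    {"staff_no": 1, "branch": "B5", "city": "London"},
    {"staff_no": 2, "branch": "B5", "city": "London"},
    {"staff_no": 3, "branch": "B3", "city": "Glasgow"},
]
print(holds(staff, ["branch"], ["city"]))    # True
print(holds(staff, ["city"], ["staff_no"]))  # False
```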
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF.
We will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies
between its attributes.
It is often executed as a series of steps, where each step corresponds to a specific normal form with
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
• NF2: non-first normal form.
• 1NF: R is in 1NF iff all domain values are atomic.
• 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
• 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
• BCNF: R is in BCNF iff every determinant is a candidate key.
• Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and all of its multivalued dependencies are functional dependencies. 4NF
removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
• There is no multivalued dependency in the relation; or
• There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also uses
multivalued dependencies.
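The effect of a multivalued dependency, and the 4NF fix, can be sketched with the classic employee-skill-language situation (the data is illustrative, not from the source): skill and language vary independently, so each multivalued fact is moved to its own table, and the join over the key reconstructs the original without loss.

```python
# Unnormalized: every (skill, language) combination must be stored,
# because skill and language vary independently of each other.
emp = {
    ("Smith", "Typing",  "English"),
    ("Smith", "Typing",  "French"),
    ("Smith", "Welding", "English"),
    ("Smith", "Welding", "French"),
}

# 4NF decomposition: one table per multivalued fact.
emp_skill = {(e, s) for e, s, l in emp}
emp_lang  = {(e, l) for e, s, l in emp}

# Joining over the key (ename) reconstructs the original relation.
rejoined = {(e, s, l)
            for e, s in emp_skill
            for e2, l in emp_lang if e == e2}
print(rejoined == emp)  # True
```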
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
• Both the speed and size of your transaction log backups.
• The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
• CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
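A common basis for deadlock handling in a distributed system is a (global) wait-for graph, in which an edge from T1 to T2 means transaction T1 is waiting for a lock held by T2; a cycle in this graph indicates a deadlock. A minimal detection sketch (the transaction names are illustrative):

```python
def has_cycle(wait_for):
    """Detect a cycle (deadlock) in the wait-for graph by depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {t: WHITE for t in wait_for}

    def dfs(t):
        color[t] = GREY
        for u in wait_for.get(t, []):
            if color.get(u, WHITE) == GREY:
                return True        # back edge: cycle found
            if color.get(u, WHITE) == WHITE and dfs(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and dfs(t) for t in wait_for)

# T1 (at site A) waits for T2 (at site B), which waits for T1: deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": []}))      # False
```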
(D) Explain the following in detail with examples:
(i) Domain
Ans: Definition: The domain of a database attribute is the set of all allowable values that the
attribute may assume.
Examples:
Examples
A field for gender may have the domain {male, female, unknown}, where those three values are
the only permitted entries in that column.
In data management and database analysis, a data domain refers to all the unique values which
a data element may contain. The rule for determining the domain boundary may be as simple as
a data type with an enumerated list of values.
For example, a database table that has information about people, with one record per person,
might have a gender column. This gender column might be declared as a string data type and
allowed to have one of two known code values: M for male and F for female, plus NULL for
records where gender is unknown or not applicable (or, arguably, U for unknown as a sentinel
value). The data domain for the gender column is {M, F}.
In a normalized data model, the reference domain is typically specified in a reference table.
Following the previous example, a Gender reference table would have exactly two records, one
per allowed value (excluding NULL). Reference tables are formally related to other tables in a
database by the use of foreign keys.
Less simple domain boundary rules, if database-enforced, may be implemented through a check
constraint or, in more complex cases, in a database trigger. For example, a column requiring
positive numeric values may have a check constraint declaring that the values must be greater
than zero.
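A minimal sketch of such a database-enforced domain, using a SQLite CHECK constraint for the gender example above (the table name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# The domain {M, F} (plus NULL for unknown) enforced by a CHECK constraint.
cur.execute("""
CREATE TABLE person (
    id     INTEGER PRIMARY KEY,
    gender TEXT CHECK (gender IN ('M', 'F') OR gender IS NULL)
)""")
cur.execute("INSERT INTO person VALUES (1, 'F')")

rejected = False
try:
    cur.execute("INSERT INTO person VALUES (2, 'X')")  # outside the domain
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
conn.close()
```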
This definition combines the concept of domain as an area over which control is exercised with
the mathematical idea of a set of values of an independent variable for which a function is
defined.
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one
entity which are associated (or linked) with the number of occurrences in another.
There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
Note that the correct notation for the last is M:N, not M:M.
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-
to-one relationship rarely exists in practice, but it can; in that case you may consider combining
the two entities into one.
For example, an employee is allocated a company car which can only be driven by that
employee.
Therefore there is a one-to-one relationship between employee and company car.
One-to-many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example,
taking the employee and department entities shown on the previous page, an employee works in
one department, but a department has many employees.
Therefore there is a one-to-many relationship between department and employee.
Many-to-many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity.
The normalisation process discussed earlier would prevent any such relationships, but the
definition is included here for completeness.
As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur
because an entity has been missed.
For example, an employee may work on several projects at the same time, and a project has a
team of many employees.
Therefore there is a many-to-many relationship between employee and project.
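In a relational design, the employee-project M:N relationship above is normally resolved with a junction (link) table holding one row per assignment; a minimal SQLite sketch (all names and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# works_on is the junction table: one row per (employee, project) assignment.
cur.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (proj_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Billing');
INSERT INTO works_on VALUES (1, 10), (1, 20), (2, 10);
""")
# Employee 1 works on many projects; project 10 has many employees.
projs = [r[0] for r in cur.execute(
    "SELECT title FROM project JOIN works_on USING (proj_id) "
    "WHERE emp_id = 1 ORDER BY proj_id")]
print(projs)  # ['Payroll', 'Billing']
conn.close()
```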
Q4
EITHER
(a) Explain DBTG data manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on
Data Systems Languages (CODASYL), the group responsible for standardization of the
programming language COBOL. The DBTG final report appeared in April 1971; it
introduced a new, distinct, and self-contained language. The DBTG is intended to meet the
requirements of many distinct programming languages, not just COBOL: the user in a
DBTG system is considered to be an ordinary application programmer, and the language
therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the
Data Definition Language or DDL), the DBTG proposed a Subschema Data
Definition Language (Subschema DDL) for defining views of the conceptual scheme that
was itself defined using the Data Definition Language. It also proposed a Data
Manipulation Language (DML) suitable for writing application programs that
manipulate the conceptual scheme or a view
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in the figure. The architecture of the
DBTG model can be divided into three different levels, like the architecture of a database
system. These are
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema,
written in a Data Storage Description Language (DSDL)
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists
essentially of definitions of the various types of record in the database, the data items they
contain, and the sets into which they are grouped. (Here logical record types are referred
to as record types; the fields in a logical record format are called data items)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists
essentially of a specification of which schema record types the user is interested in, which
schema data items he or she wishes to see in those records, and which schema
relationships (sets) linking those records he or she wishes to consider. By default, all
other types of record, data item and set are excluded
In the DBTG model the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema; using the COBOL Data Base Facility, for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data
item) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema
Q5
EITHER
(a) Define Normalization Explain first and second normal form
Ans Normalization The process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF or 4NF
We will pay particular attention up to 3NF
NF2 non-first normal form
1NF R is in 1NF iff all domain values are atomic
2NF R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the
key
3NF R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table
transform data from information source (eg form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove repeating group by
entering appropriate data into the empty columns of rows containing repeating
data ('flattening' the table)
Or by
placing repeating data, along with a copy of the original key attribute(s), into a
separate relation
Second Normal Form (2NF)
Based on concept of full functional dependency
A and B are attributes of a relation
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A
2NF - A relation that is in 1NF and every non-primary-key attribute is fully
functionally dependent on the primary key
Second Normal Form (2NF)
1NF and no partial functional dependencies
Partial functional dependency when one or more non-key attributes are functionally
dependent on part of the primary key
Every non-key attribute must be defined by the entire key not just by part of the key
If a relation has a single attribute as its key then it is automatically in 2NF
1NF to 2NF
Identify primary key for the 1NF relation
Identify functional dependencies in the relation
If partial dependencies exist on the primary key remove them by placing them in a new
relation along with copy of their determinant
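The 1NF-to-2NF step above can be illustrated in a few lines of Python. The enrolment relation below is hypothetical: its key is (student_id, course_id), and course_title depends only on course_id, a partial dependency that the decomposition removes. The natural join of the pieces reproduces the original relation:

```python
# Hypothetical 1NF relation: (student_id, course_id, course_title, grade)
# with key (student_id, course_id); course_title depends only on
# course_id, so the relation is not in 2NF.
enrolment = [
    ("S1", "C1", "Databases", "A"),
    ("S1", "C2", "Networks",  "B"),
    ("S2", "C1", "Databases", "C"),
]

# Remove the partial dependency: course_title moves to a new
# relation keyed by course_id alone (the 2NF decomposition).
course = {(c, t) for (_, c, t, _) in enrolment}
grade = [(s, c, g) for (s, c, _, g) in enrolment]

# The natural join of the two relations reproduces the original,
# so the decomposition is lossless.
rejoined = sorted((s, c, t, g)
                  for (s, c, g) in grade
                  for (c2, t) in course if c2 == c)
```

Note that "Databases" is now stored once in `course` rather than once per enrolment row.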
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency a functional dependency between two or more non-key attributes
Based on concept of transitive dependency
A, B and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
OR
(c) Explain multivalued dependency with suitable example
Ans As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies
1 NF2 non-first normal form
2 1NF R is in 1NF iff all domain values are atomic
3 2NF R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4 3NF R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5 BCNF R is in BCNF iff every determinant is a candidate key
6 Determinant an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in
4NF if and only if it is in BCNF and all of its multivalued dependencies are functional
dependencies. 4NF thus removes the unwanted data structures caused by multivalued dependencies
For a relation to be in fourth normal form, either of these conditions must hold
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies
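A multivalued dependency can be checked mechanically: X →→ Y holds iff, within every group of tuples agreeing on X, the Y-values and the remaining attributes combine freely (a cross product). A minimal Python sketch, using the classic employee/skill/language example (the relation and names are illustrative, not from the text above):

```python
from itertools import product

def holds_mvd(rows, x, y):
    """Check the MVD X ->-> Y on a relation given as a list of dicts.
    Holds iff, within every X-group, the Y-values and the remaining
    attributes Z combine freely (the group is a cross product)."""
    attrs = rows[0].keys()
    z = [a for a in attrs if a not in x and a not in y]
    groups = {}
    for r in rows:
        groups.setdefault(tuple(r[a] for a in x), []).append(r)
    for grp in groups.values():
        ys = {tuple(r[a] for a in y) for r in grp}
        zs = {tuple(r[a] for a in z) for r in grp}
        seen = {tuple(r[a] for a in y + z) for r in grp}
        if seen != {yv + zv for yv, zv in product(ys, zs)}:
            return False
    return True

# An employee's skills and spoken languages vary independently,
# so emp ->-> skill (and symmetrically emp ->-> lang) holds.
rows = [
    {"emp": "E1", "skill": "SQL",  "lang": "EN"},
    {"emp": "E1", "skill": "SQL",  "lang": "FR"},
    {"emp": "E1", "skill": "Java", "lang": "EN"},
    {"emp": "E1", "skill": "Java", "lang": "FR"},
]
```

Deleting any one of the four rows breaks the cross product and the MVD no longer holds, which is exactly the redundancy 4NF decomposition removes.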
(d) What are inference axioms Explain its significance in Relational
Database Design
Ans Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1 Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}
We want to show Street Zip → Street Zip City
Proof
1 Zip → City – Given
2 Street Zip → Street City – Augmentation of (1) by Street
3 City Street → Zip – Given
4 City Street → City Street Zip – Augmentation of (3) by City Street
5 Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
1 Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}
Show that AB → GH is derived by F
1 AB → E – Given
2 AB → AB – Reflexivity
3 AB → B – Projectivity from (2)
4 AB → BE – Additivity from (1) and (3)
5 BE → I – Given
6 AB → I – Transitivity from (4) and (5)
7 E → G – Given
8 AB → G – Transitivity from (1) and (7)
9 AB → GI – Additivity from (6) and (8)
10 GI → H – Given
11 AB → H – Transitivity from (9) and (10)
12 AB → GH – Additivity from (8) and (11)
Significance in Relational Database design A relational database is a database structure, commonly used in
GIS, in which data is stored in two-dimensional tables and multiple relationships between data
elements can be defined and established in an ad-hoc manner. A Relational Database Management
System is a database system made up of files with data elements in two-dimensional arrays (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage
A database that is perceived by the user as a collection of two-dimensional tables
• Relations are manipulated a set at a time rather than a record at a time
• SQL is used to manipulate relational databases; the model was proposed by Dr Codd in 1970
• It is the basis for the relational database management system (RDBMS)
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is deadlock How can it be avoided How can it be
resolved once it occurs
Ans A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be handled in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order,
meaning resources must be locked in a certain order to prevent such instances. Once a deadlock
does occur, the DBMS must have a method for detecting it, and then to resolve it the DBMS
must select a transaction to cancel and revert the entire transaction until the resources
required become available, allowing one transaction to complete while the other has to be
reprocessed at a later time
9.21 Explain the meaning of the expression ACID transaction
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it
should be atomic, that is, it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, then the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures
9.24 What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I am processing a change to the tax rate in my state; my store clerk should not be able
to read the total cost of a blue shirt, because the total cost row is affected by any change in
the tax rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed
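The resource-ordering strategy from the deadlock answer above can be sketched with two Python threads that always acquire their locks in one fixed global order, so the circular wait a deadlock requires can never form (thread and lock names are invented for illustration):

```python
import threading

# Two resources; every "transaction" locks them in a fixed global
# order (here, by object id), regardless of the order it names them.
lock_a, lock_b = threading.Lock(), threading.Lock()

def transfer(first, second, log, name):
    # Sort the locks so all threads acquire in the same order;
    # this removes the circular-wait condition for deadlock.
    ordered = sorted((first, second), key=id)
    for lk in ordered:
        lk.acquire()
    log.append(name)          # critical section
    for lk in reversed(ordered):
        lk.release()

log = []
# T1 and T2 name the locks in opposite orders -- the classic
# deadlock setup -- but the ordering discipline makes it safe.
t1 = threading.Thread(target=transfer, args=(lock_a, lock_b, log, "T1"))
t2 = threading.Thread(target=transfer, args=(lock_b, lock_a, log, "T2"))
t1.start(); t2.start()
t1.join(); t2.join()
```

Without the `sorted(...)` step, T1 could hold lock_a while T2 holds lock_b, each waiting forever for the other, which is exactly the cycle a DBMS deadlock detector looks for.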
(b) Explain concurrency control and database recovery in detail
Ans In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or
unlocked
Shared/exclusive: This type of locking mechanism differentiates the locks based on
their uses. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts; in this phase the transaction cannot demand any new locks, it
only releases the acquired locks
Two-phase locking thus has two phases: a growing phase, where all the locks are being acquired by
the transaction, and a shrinking phase, where the locks held by the transaction are
being released
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it: it holds all the locks until the commit point and releases them all at once
Unlike 2PL, Strict-2PL does not suffer from cascading aborts
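The 2PL rule described above can be sketched in a few lines: once the first lock is released the shrinking phase begins, and any further lock request must be rejected. A minimal illustrative class (not a real DBMS API):

```python
class TwoPhaseTxn:
    """Minimal two-phase locking sketch: after the first unlock
    (start of the shrinking phase), no new lock may be acquired."""
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock after unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True          # shrinking phase begins
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("x"); t.lock("y")   # growing phase
t.unlock("x")              # first release: shrinking phase
violated = False
try:
    t.lock("z")            # illegal under 2PL
except RuntimeError:
    violated = True
```

Strict-2PL would go further and forbid `unlock` itself until commit, which is what removes cascading aborts.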
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it; for example, any transaction y entering the system at 0004 is
two seconds younger, and priority is given to the older one
In addition, every data item is given the latest read and write timestamp. This lets the system
know when the last read and write operation was performed on the data item
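Those per-item read and write timestamps are what the protocol consults: in the basic timestamp-ordering test, a transaction's write is rejected (and the transaction rolled back) if a younger transaction has already read or written the item. A simplified sketch (class and field names are illustrative):

```python
class Item:
    """A data item carrying the latest read and write timestamps."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def write(item, ts):
    """Basic timestamp-ordering write check: an older transaction
    may not overwrite what a younger one has already seen or written."""
    if ts < item.read_ts or ts < item.write_ts:
        return False           # too late: transaction must roll back
    item.write_ts = ts
    return True

x = Item()
ok_first = write(x, 5)         # transaction T5 writes x first
x.read_ts = 7                  # a younger T7 then reads x
ok_late = write(x, 6)          # older T6 arrives late: rejected
```

The symmetric check for reads (reject a read if a younger transaction has already written the item) completes the protocol.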
OR
(b) Explain database security mechanisms8
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes, subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that All humans are mortal A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well On the other hand the large database vendors such as Oracleadded
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. Any practical
database, though, has a mix of READ and WRITE operations, and hence concurrency is a challenge
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress)
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails
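The bank-transfer example can be demonstrated with SQLite, whose connection object, used as a context manager, wraps a block of statements in a single transaction that commits on success and rolls back on any error (the table, column names and amounts are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account(name TEXT PRIMARY KEY, "
           "balance INT CHECK (balance >= 0))")
db.executemany("INSERT INTO account VALUES (?, ?)",
               [("A", 100), ("B", 50)])
db.commit()

# Attempt to transfer 200 from A to B. The credit succeeds, but
# the debit violates the CHECK constraint, so the whole
# transaction rolls back: neither update survives.
try:
    with db:   # one transaction: commit on success, rollback on error
        db.execute("UPDATE account SET balance = balance + 200 "
                   "WHERE name = 'B'")
        db.execute("UPDATE account SET balance = balance - 200 "
                   "WHERE name = 'A'")   # fails: balance would go negative
except sqlite3.IntegrityError:
    pass   # transfer aborted atomically

a = db.execute("SELECT balance FROM account "
               "WHERE name = 'A'").fetchone()[0]
total = db.execute("SELECT SUM(balance) FROM account").fetchone()[0]
```

After the failed transfer both balances are untouched: money has been neither lost nor created, exactly the guarantee described above.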
(B) Give the level architecture proposal for DBMS
Ans Objective of three level architecture proposal for DBMS
All users should be able to access the same data
A user's view is immune to changes made in other views
Users should not need to know physical database storage details
The DBA should be able to change database storage structures without affecting the users' views
The internal structure of the database should be unaffected by changes to physical aspects of storage
The DBA should be able to change the conceptual structure of the database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The three levels are explained in detail below
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database A query language is a
combination of three subordinate language
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for a DBMS are explained
above
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System
The Main Functions Of Data Manager Are ndash
Converting operations in users' queries, coming from the application programs or from the
combination of the DML Compiler and Query Optimizer (known as the Query Processor), from the user's
logical view to the physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed
3 Constraints on data, ie the range of values permitted
4 Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes
5 Access authorization - the description of database users, their responsibilities
and their access rights
6 Usage statistics, such as frequency of queries and transactions
The data dictionary is used to actually control the data integrity, database operation
and accuracy. It may be used as an important part of the DBMS
Importance of Data Dictionary -
The data dictionary is necessary in databases due to the following reasons
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system
• It helps in documenting the database design process by storing documentation of the result of every design phase and the design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands, known as compiled DML
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve User Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus a very competent database designer could be allowed to use a particular database system only as a naive user
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Users Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category The application programs could be written in a general purpose programming language such as Assembler C COBOL FORTRAN PASCAL or PLI and include the commands required to manipulate the database
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA us the custodian of the data and controls the database structure The DBA administers the three levels of the database and in consultation with the overall user community sets up the definition of the global view or conceptual level of the database The DBA further specifies the external view of the various users and applications and is responsible for definition and implementation of the internal level including the storage structure and access methods to be used for the optimum performance of the DBMS
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, since such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy: In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• duplication of the same data in different files;
• wastage of storage space, since duplicated data is stored;
• errors generated due to updating of the same data in different files;
• time wasted in entering the same data again and again;
• needless use of computer resources;
• difficulty in combining information.
2. Elimination of Inconsistency: In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3. Better service to the users: A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users that don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved: Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved: Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced: Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved: In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data, and in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's requirements can be identified: All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units, so it may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization. It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower: It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.
10. A data model must be developed: Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery: Centralizing a database provides schemes for backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes. Many notation methods exist; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc. Attributes are of various types:
• Simple/single attributes
• Composite attributes
• Multivalued attributes
• Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
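As an illustration only (Python, with invented sample values), the Customer entity above, with its simple, composite, multivalued and derived attributes, might be modelled as:

```python
from dataclasses import dataclass, field
from datetime import date

# Composite attribute: name is made of simple sub-attributes.
@dataclass
class Name:
    first_name: str
    last_name: str
    middle_name: str = ""

# Composite attribute nested inside another composite (address -> street).
@dataclass
class Street:
    street_name: str
    street_number: str
    apartment_number: str = ""

@dataclass
class Address:
    city: str
    state: str
    zip_code: str
    street: Street

# Entity: Customer, with customer_id as the primary key.
@dataclass
class Customer:
    customer_id: int                                    # primary key (simple attribute)
    name: Name                                          # composite attribute
    phone_numbers: list = field(default_factory=list)   # multivalued attribute
    date_of_birth: date = None
    address: Address = None

    @property
    def age(self):
        # Derived attribute: computed from date_of_birth, never stored.
        today = date.today()
        born = self.date_of_birth
        return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
```

The mapping is direct: each ellipse in the ER diagram becomes a field, composite attributes become nested types, the multivalued attribute becomes a list, and the derived attribute becomes a computed property.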
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
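A small sketch of this idea using Python's built-in sqlite3 module (table and names invented for the example): a secondary index on stud_name lets us retrieve the whole set of matching records, while primary-key retrieval yields at most one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, stud_name TEXT, city TEXT)")
# Secondary index on the non-primary-key attribute stud_name.
conn.execute("CREATE INDEX idx_stud_name ON student(stud_name)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Asha", "Pune"), (2, "Ravi", "Nagpur"), (3, "Asha", "Mumbai")])

# Primary-key retrieval: at most one record satisfies the key value.
one = conn.execute("SELECT * FROM student WHERE stud_id = 2").fetchall()

# Secondary-key retrieval: a set of records may satisfy the key value.
many = conn.execute("SELECT * FROM student WHERE stud_name = 'Asha'").fetchall()
```

Here the two rows named "Asha" are both returned by the secondary-key search, exactly the multiple-record case described in point (ii).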
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE3 - EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∪ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows: if a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables. Another way of expressing this is that each join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes. Anomalies can occur in relations in 4NF if the primary key has three or more fields. 5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
• you always need to know two values (pairwise);
• for any one value you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy. Take the following sample data:

buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
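The decomposition can be checked on the sample data itself. A Python sketch showing that the three-way natural join of the pairwise projections reconstructs the original Buying table, i.e. that the join dependency holds for this data:

```python
# The single Buying(buyer, vendor, item) table from the example.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three pairwise tables of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

def join3(bv, bi, vi):
    """Natural join of the three binary projections back into triples."""
    return {(b, v, i)
            for b, v in bv
            for b2, i in bi if b2 == b
            for v2, i2 in vi if v2 == v and i2 == i}

rejoined = join3(buyer_vendor, buyer_item, vendor_item)
```

The rejoined set equals the original relation, so no information is lost. If Claiborne starts to sell jeans, the new fact is recorded by adding the single pair ("Liz Claiborne", "Jeans") to vendor_item, rather than one row per buyer in the unfactored table.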
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
(Figure: two applications, A and B, each written in a host language plus DL/I; each application has its own PSB (PSB-A, PSB-B) containing PCBs; the PCBs communicate through the IMS control program with the DBDs that define the physical databases.)
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD), and the mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE, BYTES=256
3  FIELD NAME=(COURSE,SEQ), BYTES=3, START=1
4  FIELD NAME=TITLE, BYTES=33, START=4
5  FIELD NAME=DESCRIPN, BYTES=220, START=37
6  SEGM  NAME=PREREQ, PARENT=COURSE, BYTES=36
7  FIELD NAME=(COURSE,SEQ), BYTES=3, START=1
8  FIELD NAME=TITLE, BYTES=33, START=4
9  SEGM  NAME=OFFERING, PARENT=COURSE, BYTES=20
10 FIELD NAME=(DATE,SEQ,M), BYTES=6, START=1
11 FIELD NAME=LOCATION, BYTES=12, START=7
12 FIELD NAME=FORMAT, BYTES=2, START=19
13 SEGM  NAME=TEACHER, PARENT=OFFERING, BYTES=24
14 FIELD NAME=(EMP,SEQ), BYTES=6, START=1
15 FIELD NAME=NAME, BYTES=18, START=7
16 SEGM  NAME=STUDENT, PARENT=OFFERING, BYTES=25
17 FIELD NAME=(EMP,SEQ), BYTES=6, START=1
18 FIELD NAME=NAME, BYTES=18, START=7
19 FIELD NAME=GRADE, BYTES=1, START=25
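The segment hierarchy this DBD defines can be sketched as a nested structure (a Python illustration only, not IMS syntax): COURSE is the root segment, PREREQ and OFFERING are its children, and TEACHER and STUDENT are children of OFFERING.

```python
# Nested dict mirroring the SEGM/PARENT clauses of the DBD above.
hierarchy = {
    "COURSE": {
        "PREREQ": {},
        "OFFERING": {
            "TEACHER": {},
            "STUDENT": {},
        },
    },
}

def parent_of(tree, target, parent=None):
    """Return the parent segment name of `target`, or None for the root."""
    for name, children in tree.items():
        if name == target:
            return parent
        found = parent_of(children, target, name)
        if found is not None:
            return found
    return None
```

Walking the structure reproduces the PARENT= relationships: STUDENT's parent is OFFERING, and OFFERING's parent is COURSE, which is the root.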
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB, DBNAME=EDUCPDBD, KEYLEN=15
2 SENSEG NAME=COURSE, PROCOPT=G
3 SENSEG NAME=OFFERING, PARENT=COURSE, PROCOPT=G
4 SENSEG NAME=STUDENT, PARENT=OFFERING, PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
A functional dependency means that the value of one attribute (the determinant) determines the value of another attribute.
Candidate key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
• they have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency;
• they hold for all time;
• they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
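As an illustration of the definition above, a small checker can test whether a dependency holds in a sample relation (the relation and attribute names here are invented for the example):

```python
def holds(relation, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `relation`.

    relation: list of dicts (rows); lhs, rhs: tuples of attribute names.
    The determinant (lhs) determines rhs iff no two rows agree on lhs
    but disagree on rhs.
    """
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False   # same determinant value, different dependent value
        seen[key] = val
    return True

# Sample relation: each department has exactly one head.
rows = [
    {"emp_id": 1, "dept": "Sales", "dept_head": "Iyer"},
    {"emp_id": 2, "dept": "Sales", "dept_head": "Iyer"},
    {"emp_id": 3, "dept": "HR",    "dept_head": "Rao"},
]
```

Note the check only verifies a dependency against one instance; a true functional dependency must hold for all time, as the characteristics above state.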
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that the relation meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
• NF2: non-first normal form.
• 1NF: R is in 1NF iff all domain values are atomic.
• 2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key.
• 3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the key.
• BCNF: R is in BCNF iff every determinant is a candidate key.
• Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies. One of the following conditions must hold for a relation to be in fourth normal form:
• there is no multivalued dependency in the relation; or
• there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
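A hedged sketch of the 4NF idea, with invented sample data: an employee's skills and languages are independent multivalued facts (emp ->> skill, emp ->> language), so storing them in one relation forces a cross product of rows; the 4NF decomposition splits them and rejoins losslessly.

```python
# A relation violating 4NF: emp ->> skill and emp ->> language are
# independent, so every (skill, language) combination must be stored.
emp_skill_lang = {
    ("Asha", "DBA",   "English"),
    ("Asha", "DBA",   "Hindi"),
    ("Asha", "Coder", "English"),
    ("Asha", "Coder", "Hindi"),
}

# 4NF decomposition: one relation per independent multivalued fact.
emp_skill = {(e, s) for e, s, l in emp_skill_lang}
emp_lang  = {(e, l) for e, s, l in emp_skill_lang}

# The natural join of the two projections recovers the original,
# so the decomposition is lossless.
rejoined = {(e, s, l)
            for e, s in emp_skill
            for e2, l in emp_lang if e2 == e}
```

After the decomposition, adding a new language for Asha means inserting one row into emp_lang instead of one row per skill, which is exactly the update anomaly 4NF removes.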
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive details such as transactions and account information entries.
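The banking example can be sketched in plain Python (class and attribute names invented): navigational access follows object references directly, instead of joining CUSTOMER, ACCOUNT and TRANSACTION tables on foreign keys.

```python
class Transaction:
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self):
        # Direct object references: no foreign-key join is needed later.
        self.transactions = []

class Customer:
    def __init__(self, name):
        self.name = name
        self.accounts = []

cust = Customer("R. Sharma")
acct = Account()
acct.transactions = [Transaction(500), Transaction(-120)]
cust.accounts.append(acct)

# Navigational access: follow pointers from customer to transactions.
total = sum(t.amount for a in cust.accounts for t in a.transactions)
```

This is the pointer-following retrieval described above; a relational system would reach the same data through key-based joins.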
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs. System breakdowns happen all the time, even to the best-configured systems; this is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
• the speed and size of your transaction log backups;
• the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery model available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log that let you recover to a log mark.
• It logs CREATE INDEX operations. Recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
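The data-loss window that the recovery model controls can be sketched with Python's built-in sqlite3 module. This is only an analogy (SQLite is a different engine with no recovery-model setting): work committed after a full backup is lost if you must restore from that backup alone, which is the gap that log backups in the Full Recovery model close.

```python
import sqlite3

# Live database with one committed row.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
src.execute("INSERT INTO t VALUES (1, 'committed before backup')")
src.commit()

# Take a full backup of the live database.
backup = sqlite3.connect(":memory:")
src.backup(backup)

# Work committed after the backup is not in the backup copy:
# this is the data-loss window a recovery model is meant to control.
src.execute("INSERT INTO t VALUES (2, 'committed after backup')")
src.commit()

rows_in_backup = backup.execute("SELECT COUNT(*) FROM t").fetchone()[0]
rows_in_live = src.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Restoring from the backup recovers only the first row; recovering the second requires the transaction log, which is exactly what the Full Recovery model preserves.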
(d) Describe deadlocks in a distributed system.
Ans:
(ii) Degree and cardinality
The degree of a relationship (also known as cardinality) is the number of occurrences in one entity which are associated (or linked) with the number of occurrences in another. There are three degrees of relationship, known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)
(Note that the last is conventionally written M:N, not M:M.)
One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; if one does, you may consider combining the two entities into one. For example, an employee is allocated a company car which can only be driven by that employee. Therefore there is a one-to-one relationship between employee and company car.
One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, taking the employee and department entities shown on the previous page, an employee works in one department, but a department has many employees. Therefore there is a one-to-many relationship between department and employee.
Many-to-Many (M:N)
This is where many occurrences in an entity relate to many occurrences in another entity. The normalisation process discussed earlier would prevent any such relationships, but the definition is included here for completeness. As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur because an entity has been missed. For example, an employee may work on several projects at the same time, and a project has a team of many employees. Therefore there is a many-to-many relationship between employee and project.
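The three degrees can be checked mechanically from sample occurrence pairs. A small hypothetical helper (names and data invented) that classifies a left:right relationship by the maximum fan-out in each direction:

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify (left, right) occurrence pairs as '1:1', '1:M', 'M:1' or 'M:N'."""
    fan_left = defaultdict(set)    # left occurrence  -> linked right occurrences
    fan_right = defaultdict(set)   # right occurrence -> linked left occurrences
    for a, b in pairs:
        fan_left[a].add(b)
        fan_right[b].add(a)
    left_max = max(len(s) for s in fan_left.values())
    right_max = max(len(s) for s in fan_right.values())
    if left_max == 1 and right_max == 1:
        return "1:1"
    if left_max == 1:
        return "M:1"   # each left links to one right; a right links to many lefts
    if right_max == 1:
        return "1:M"
    return "M:N"

# (employee, department): each employee in one department, Sales has two employees.
emp_dept = [("e1", "Sales"), ("e2", "Sales"), ("e3", "HR")]
```

For the employee-department data this reports M:1 (and dept:employee is 1:M), matching the text; the employee-project case comes out M:N.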
Q4
EITHER
(a) Explain DBTG data manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the requirements of many distinct programming languages, not just COBOL: the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
The DBTG proposal is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of a conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the figure. It can be divided into three different levels, as in the architecture of a database system:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data items they contain, and the sets into which they are grouped. (Here logical record types are referred to as record types; the fields in a logical record format are called data items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data item and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary programming language, such as COBOL, that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema using the COBOL Data Base Facility; for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data item) defined in the subschema. The program may refer to these data item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF.
• NF2: non-first normal form.
• 1NF: R is in 1NF iff all domain values are atomic.
• 2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key.
• 3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups. To create an unnormalized table, transform data from the information source (e.g. a form) into table format, with columns and rows.
First Normal Form (1NF)
A relation in which intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify repeating group(s) in unnormalized table which repeats for the key attribute(s)
Remove the repeating group either by:
• entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table); or by
• placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
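A minimal sketch of the 'flattening' option, with an invented order table: the key is repeated for each member of the repeating group, so every row holds only atomic values and the relation is in 1NF.

```python
# Unnormalized: one row per order, with a repeating group of (item, qty) pairs.
unf = [
    {"order_id": 101, "items": [("pen", 2), ("pad", 1)]},
    {"order_id": 102, "items": [("pen", 5)]},
]

# 'Flattening': repeat the key attribute for each member of the
# repeating group, giving one atomic-valued row per item (1NF).
first_nf = [
    {"order_id": o["order_id"], "item": name, "qty": qty}
    for o in unf
    for name, qty in o["items"]
]
```

Each intersection of a row and column now contains one and only one value, which is the 1NF definition given above.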
Second Normal Form (2NF)
Based on the concept of full functional dependency: where A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A. 2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies. A partial functional dependency exists when one or more non-key attributes are functionally dependent on part of the primary key; every non-key attribute must be defined by the entire key, not just by part of the key. If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies on the primary key exist, remove them by placing them in a new relation along with a copy of their determinant.
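The 1NF-to-2NF steps above can be sketched concretely. The relation, attribute names and data below are hypothetical, invented for illustration: an order-line relation with key (order_id, product_id), where product_name depends only on product_id (a partial dependency), so it is moved into a new relation together with a copy of its determinant.

```python
# Sketch (not from the original text): removing a partial dependency to reach 2NF.
# Hypothetical 1NF relation with key (order_id, product_id); product_name
# depends only on product_id, so it is only partially dependent on the key.
rows = [
    {"order_id": 1, "product_id": 10, "qty": 2, "product_name": "Pen"},
    {"order_id": 1, "product_id": 20, "qty": 1, "product_name": "Pad"},
    {"order_id": 2, "product_id": 10, "qty": 5, "product_name": "Pen"},
]

# Place the partially dependent attribute in a new relation with a copy
# of its determinant (product_id), and drop it from the original relation.
order_line = [{k: r[k] for k in ("order_id", "product_id", "qty")} for r in rows]
product = {r["product_id"]: r["product_name"] for r in rows}

print(product)  # {10: 'Pen', 20: 'Pad'} -- each product name is now stored once
```

The duplication of "Pen" in the unnormalized rows disappears: the name is recorded once per product, which is exactly the redundancy 2NF removes.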
Third Normal Form (3NF)
2NF and no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency:
A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with suitable example.
Ans:
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form.
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute (or set of attributes) on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes an unwanted kind of data structure: multivalued dependencies.
Example: in a relation Course(course, teacher, book), where a course has several teachers and several recommended books chosen independently of each other, the multivalued dependencies course ↠ teacher and course ↠ book hold, and every teacher of a course must be paired with every book of that course.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation, or
every multivalued dependency present is implied by the candidate keys.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also takes multivalued dependencies into account.
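A common illustration (not taken from the text above; the course/teacher/book relation is a standard hypothetical example): when course ↠ teacher and course ↠ book hold, projecting onto (course, teacher) and (course, book) gives a 4NF decomposition whose natural join losslessly recovers the original relation.

```python
# Sketch (illustrative): a relation with the multivalued dependencies
# course ->> teacher and course ->> book. Every teacher of a course is
# paired with every book of that course, so the rows multiply.
r = {
    ("DBMS", "Smith", "Korth"),
    ("DBMS", "Smith", "Ullman"),
    ("DBMS", "Jones", "Korth"),
    ("DBMS", "Jones", "Ullman"),
}

# 4NF decomposition: project onto (course, teacher) and (course, book).
ct = {(c, t) for c, t, b in r}
cb = {(c, b) for c, t, b in r}

# The natural join on course recovers the original relation exactly (lossless).
joined = {(c, t, b) for c, t in ct for c2, b in cb if c == c2}
print(joined == r)  # True
```

The two projections hold two rows each instead of four, and adding a new book no longer requires one new row per teacher.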
(d) What are inference axioms? Explain their significance in Relational Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy certain other FDs:
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity of (2) and (4)
[From Maier]
2. Let R = (A B C D E G H I J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F:
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
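Derivations like the one above can also be checked mechanically by computing the attribute closure of AB under F, a standard algorithm. The sketch below represents each FD as a (left side, right side) pair of attribute sets; this encoding is an implementation choice, not something from the source.

```python
# Sketch: compute the closure X+ of an attribute set under a set of FDs,
# then check AB -> GH from the Maier example mechanically.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already in the closure,
            # every right-side attribute follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [(frozenset("AB"), frozenset("E")),
     (frozenset("AG"), frozenset("J")),
     (frozenset("BE"), frozenset("I")),
     (frozenset("E"),  frozenset("G")),
     (frozenset("GI"), frozenset("H"))]

print(sorted(closure({"A", "B"}, F)))  # AB+ contains G and H, so AB -> GH holds
```

AB → GH holds exactly when {G, H} ⊆ AB⁺, which is how FD membership is tested in practice instead of searching for an axiom-by-axiom proof.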
Significance in Relational Database design: A relational database is a database structure, commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad-hoc manner. A Relational Database Management System is a database system made up of files with data elements in two-dimensional arrays (rows and columns); it has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• The tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• A collection of objects or relations
• A set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that is locked by the other. It can be dealt with in two ways: one is to set measures which prevent deadlocks from happening, and the other is to set ways in which to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or to nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order to prevent such instances. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a transaction to cancel and revert that entire transaction until the required resources become available, allowing one transaction to complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
Ans: ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens it should be atomic: it should either complete fully or not happen at all; there should not be anything like a semi-complete transaction. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by a change in the row I am working on will be locked from changes until my change is complete. This isolates the change and ensures that the data interaction remains accurate and consistent, and is known as transaction-level consistency. The transaction being changed, which may affect several other pieces of data or rows of input, could also affect how those rows are read. So let's say I am processing a change to the tax rate in my state; my store clerk shouldn't be able to read the total cost of a blue shirt, because the total cost row is affected by any change in the tax rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but hasn't been committed is the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
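The resource-ordering prevention strategy mentioned in the deadlock answer above can be sketched as follows. This is an illustrative sketch, not a DBMS internal: the lock names and transactions are invented, and sorting lock names simply fixes one global acquisition order so a circular wait cannot form.

```python
# Sketch of deadlock prevention by resource ordering: if every transaction
# acquires its locks in one fixed global order, a circular wait cannot form.
import threading

locks = {"accounts": threading.Lock(), "orders": threading.Lock()}

def acquire_in_order(names):
    # Sort by name so all transactions lock resources in the same order,
    # regardless of the order in which they asked for them.
    ordered = sorted(names)
    for n in ordered:
        locks[n].acquire()
    return ordered

def release(ordered):
    for n in reversed(ordered):
        locks[n].release()

# Two transactions may request the same resources in different orders,
# but both end up locking "accounts" before "orders".
held = acquire_in_order(["orders", "accounts"])
print(held)  # ['accounts', 'orders']
release(held)
```

With this discipline, a transaction holding "accounts" and waiting for "orders" can never meet one holding "orders" and waiting for "accounts", which is exactly the cycle a deadlock requires.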
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
Binary Locks: A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive: This type of locking mechanism differentiates the locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock; allowing more than one transaction to write the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, where all the locks are being acquired by the transaction, and a shrinking phase, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction may first acquire a shared (read) lock and then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: Strict-2PL holds all the locks until the commit point and releases them all at one time.
Strict-2PL does not suffer from cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 00:02 would be older than all transactions that come after it; for example, a transaction y entering the system at 00:04 is two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system know when the last 'read' and 'write' operations were performed on the data item.
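The basic timestamp-ordering rules described above can be sketched as follows. This is a simplified, illustrative model (the Item class and rejection-by-return-value are assumptions, not a real DBMS API): each data item keeps the latest read- and write-timestamps, and an operation from a transaction that is "too old" is rejected, which in a real system would trigger a rollback and restart.

```python
# Sketch of basic timestamp-ordering rules (illustrative, not a DBMS API).
class Item:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far

def read(item, ts):
    if ts < item.write_ts:           # a younger transaction already wrote it
        return False                 # reject: the reader would be rolled back
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return False                 # reject: the writer would be rolled back
    item.write_ts = ts
    return True

x = Item()
print(write(x, ts=5))   # True
print(read(x, ts=3))    # False: transaction 3 is older than the last write
print(read(x, ts=7))    # True
```

The rejected read shows the ordering rule at work: transaction 3 started before the value written at timestamp 5 existed, so letting it read would violate the serial order fixed by transaction age.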
OR
(b) Explain database security mechanisms. (8)
Ans: Database security covers and enforces security on all aspects and components of databases. This includes:
Data stored in the database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash under a distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment against theft and natural disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge based database system in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of Information Technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each field.
Multiple users: A conventional database needed to support more than one user or system logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional database. The knowledge base needed to know facts about the world, for example to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but instead would need to store information about thousands of specific humans. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and Object-Oriented communities, object-oriented databases such as Versant emerged. These were systems designed from the ground up to have support for object-oriented capabilities, but also to support standard database services as well. On the other hand, the large database vendors such as Oracle added capabilities to their products that provided support for knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet, documents, hypertext and multimedia support were now critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors, such as Lotus Notes. Knowledge Management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge-base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data; there is no way they can interfere with one another. However, any practical database has a mix of read and write operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur with a multi-user system. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from the Ancient Greek átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
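The A-to-B transfer example above can be demonstrated with SQLite, whose transactions commit or roll back as a unit. The account table and balances here are hypothetical, invented for illustration; the simulated crash between the two updates shows the partial withdrawal being rolled back.

```python
# Sketch of the transfer example above using SQLite (hypothetical schema):
# both updates commit together, or a failure rolls both back.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

try:
    with con:  # one atomic transaction: commits on success, rolls back on error
        con.execute("UPDATE account SET balance = balance - 40 WHERE name = 'A'")
        raise RuntimeError("crash before crediting B")  # simulate a failure
        con.execute("UPDATE account SET balance = balance + 40 WHERE name = 'B'")
except RuntimeError:
    pass

# The withdrawal was rolled back: money is neither lost nor created.
print(dict(con.execute("SELECT name, balance FROM account")))  # {'A': 100, 'B': 0}
```

Because the failure happened inside the transaction, the debit of account A was undone along with everything else, which is exactly the all-or-nothing guarantee described above.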
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user view is different from the way data is stored in the database; this view describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to the user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained above.
(C) Describe the structure of DBMS.
Ans: The DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
[Fig.: Structure of Database Management System]
Components of DBMS:
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified in the DDL. It includes metadata information such as the names of the files and data items, storage details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete and retrieve, from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized in the best way to execute the query by the query optimizer and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
Convert operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the Query Processor), from the users' logical view to the physical file system.
Control DBMS information access that is stored on disk.
Handle buffers in main memory.
Enforce constraints to maintain the consistency and integrity of the data.
Synchronize the simultaneous operations performed by concurrent users.
Control the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation and accuracy; it may be used as an important part of the DBMS.
Importance of Data Dictionary - The Data Dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users' understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts the high-level queries into low-level file access commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interactions with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of an automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. So we need to remove this duplication of data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad-hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. The organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it will be necessary to identify the organization's requirements and to balance the needs of the competing units. It may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as per the needs of particular applications; the overall view is often not considered. Building an overall view of an organization's data is usually cost effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
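The backup-and-recovery idea can be sketched in a few lines. The example below is an illustrative sketch only, using Python's sqlite3 module as a stand-in for a full DBMS; the account table and its rows are invented for the demonstration.

```python
import sqlite3

# Create a "live" database with some committed data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
live.execute("INSERT INTO account VALUES (1, 100), (2, 200)")
live.commit()

# Take a full backup (sqlite3's backup API copies the whole database).
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate a failure that loses the live data...
live.execute("DELETE FROM account")
live.commit()

# ...and recover by restoring from the backup copy.
backup.backup(live)
restored = live.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(restored)  # [(1, 100), (2, 200)]
```

A real DBMS layers transaction-log replay on top of such full copies so that committed work after the last backup can also be recovered.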
QUE 2 - EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. It is an iterative, team-oriented process with all business managers (or designates) involved, and it should be validated with a "bottom-up" approach. It has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many (1:M)
Many to one (M:1)
Many to many (M:M)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
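To show how such an ER specification maps onto a concrete schema, here is a hedged sketch using Python's sqlite3 module: the composite attributes name and address are flattened into simple columns, and the sample row is invented purely for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Composite attributes name(first, middle, last) and
# address(city, state, zip_code, street(...)) become simple columns;
# customer_id stays the primary key.
con.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
# A multivalued attribute such as phone_number would instead go in its
# own table (customer_id, phone_number), one row per value.
con.execute(
    "INSERT INTO customer (customer_id, first_name, last_name, city) "
    "VALUES (1, 'Asha', 'Rao', 'Nagpur')")
row = con.execute(
    "SELECT first_name, last_name, city FROM customer WHERE customer_id = 1"
).fetchone()
print(row)  # ('Asha', 'Rao', 'Nagpur')
```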
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
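The idea can be illustrated with a small sketch (Python's sqlite3 as a stand-in for the student file; the names and roll numbers are invented): a secondary index on stud_name lets us fetch the whole set of matching records.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, stud_name TEXT)")
# A secondary index on the non-primary attribute stud_name supports
# secondary key retrieval without scanning the whole file.
con.execute("CREATE INDEX idx_stud_name ON student (stud_name)")
con.executemany("INSERT INTO student VALUES (?, ?)",
                [(1, "Amit"), (2, "Priya"), (3, "Amit")])

# Unlike a primary-key lookup, a secondary-key value can match
# multiple records.
matches = con.execute(
    "SELECT roll_no FROM student WHERE stud_name = ? ORDER BY roll_no",
    ("Amit",)).fetchall()
print(matches)  # [(1,), (3,)] -- the set of records satisfying the value
```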
(D) Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4 - EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot be losslessly decomposed into any number of smaller tables except as implied by its candidate keys. Another way of expressing this is that every join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
you always need to know two values (pairwise), and
for any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer | vendor        | item
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
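The decomposition into Buyer-Vendor, Buyer-Item and Vendor-Item can be checked mechanically. The sketch below uses plain Python sets on the sample data above: projecting the table onto the three pairwise tables and joining them back reproduces the original, which is exactly the lossless join the join dependency requires.

```python
# The Buying(buyer, vendor, item) sample data from the example.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# Project onto the three pairwise tables of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# The join dependency says the three-way join of the projections
# must reproduce the original table (a lossless decomposition).
rejoined = {(b, v, i)
            for (b, v) in buyer_vendor
            for (b2, i) in buyer_item if b2 == b
            for (v2, i2) in vendor_item if v2 == v and i2 == i}
print(rejoined == buying)  # True
```

Recording "Claiborne sells Jeans" then takes a single new row in Vendor-Item instead of one row per buyer in the original table.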
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Figure: IMS architecture - Applications A and B, each written in a host language plus DL/I, invoke IMS through their own PSB (PSB-A, PSB-B) containing PCBs; the IMS control program maps the PCBs onto the DBDs of the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of functional dependencies used in normalization: they have a 1:1 relationship between the attribute(s) on the left- and right-hand side of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation and has the property that every functional dependency in Y is implied by the functional dependencies in X.
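A functional dependency can be checked mechanically against a sample relation. The sketch below is illustrative only (the students data is invented): X → Y holds when equal X-values always come with equal Y-values.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in a
    relation given as a list of dicts: equal lhs values must always
    determine equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant, different dependent value
    return True

students = [
    {"student_id": 1, "name": "Asha", "city": "Pune"},
    {"student_id": 2, "name": "Ravi", "city": "Pune"},
    {"student_id": 3, "name": "Asha", "city": "Nagpur"},
]
print(fd_holds(students, ["student_id"], ["name"]))  # True
print(fd_holds(students, ["name"], ["city"]))        # False: Asha maps to two cities
```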
(D) Explain 4NF with examples.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to normal forms up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
Either of these conditions must hold true for a relation to be in fourth normal form:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
Q5
Either
(A) What are object oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could get a user's account information and efficiently provide extensive information such as transactions, account information entries, etc.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000. It has a built-in feature, known as the database recovery model, that controls the following:
both the speed and size of your transaction log backups, and
the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log so that you can recover to a log mark.
CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans:
Many-to-Many (MN)
This is where many occurrences in an entity relate to many occurrences in another entity
The normalisation process discussed earlier would prevent any such relationships but the
definition is included here for completeness
As with one-to-one relationships, many-to-many relationships rarely exist. Normally they occur because an entity has been missed.
For example an employee may work on several projects at the same time and a project has a
team of many employees
Therefore there is a many-to-many relationship between employee and project
Q4
EITHER
(a) Explain DBTG data manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG is intended to meet the requirements of many distinct programming languages, not just COBOL; the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language, or DDL), the DBTG proposed a Subschema Data Definition Language (Subschema DDL) for defining views of the conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of the DBTG Model
The architecture of a DBTG system is illustrated in the figure. The architecture of the DBTG model can be divided into three different levels, like the architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG, the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data-items they contain, and the sets into which they are grouped. (Here logical record types are referred to as record types; the fields in a logical record format are called data-items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data-items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data-item and set are excluded.
In the DBTG model, the users are application programmers writing in an ordinary programming language, such as COBOL, that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema using the COBOL Data Base Facility; for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data-item) defined in the subschema. The program may refer to these data-item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table: transform data from the information source (e.g. a form) into table format with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value. If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table), or by
placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency: where A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.
2NF - A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies. A partial functional dependency occurs when one or more non-key attributes are functionally dependent on part of the primary key. Every non-key attribute must be defined by the entire key, not just by part of the key. If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Third Normal Form (3NF)
2NF and no transitive dependencies. A transitive dependency is a functional dependency between two or more non-key attributes.
Based on the concept of transitive dependency: A, B and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF - A relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
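The 1NF-to-2NF step above can be sketched in code. Assume an invented 1NF relation OrderLine(order_id, product_id, product_name, qty) with key (order_id, product_id); product_name depends only on product_id, a partial dependency, so it is moved to a new relation together with a copy of its determinant.

```python
# A 1NF relation with key (order_id, product_id); product_name depends
# only on product_id -- a partial dependency that 2NF removes.
order_lines = [
    (1, "P1", "Pen", 10),
    (1, "P2", "Pad", 5),
    (2, "P1", "Pen", 3),
]

# Place the partial dependency in a new relation along with a copy of
# its determinant, as the 1NF-to-2NF step describes.
products = {(pid, pname) for _, pid, pname, _ in order_lines}
order_line_2nf = [(oid, pid, qty) for oid, pid, _, qty in order_lines]

# The decomposition is lossless: joining the two relations back on
# product_id reproduces the original rows.
rejoined = [(oid, pid, pname, qty)
            for oid, pid, qty in order_line_2nf
            for pid2, pname in products if pid2 == pid]
print(sorted(rejoined) == sorted(order_lines))  # True
```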
OR
(c) Explain multivalued dependency with a suitable example.
Ans: As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
Either of these conditions must hold true for a relation to be in fourth normal form:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
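A multivalued dependency can be tested directly from its definition. The sketch below is illustrative (the course/teacher/book rows are invented): X →→ Y holds if, for any two tuples that agree on X, swapping their Y-values yields tuples that are also in the relation, i.e. the Y-values and the remaining values vary independently.

```python
def mvd_holds(rows, x, y):
    """Check whether the multivalued dependency X ->> Y holds in a
    relation given as a list of dicts (Z = all remaining attributes):
    for every two tuples agreeing on X, the tuple taking X and Z from
    one and Y from the other must also be present."""
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                swapped = dict(t2)   # X and Z values from t2...
                for a in y:
                    swapped[a] = t1[a]  # ...and Y values from t1
                if tuple(sorted(swapped.items())) not in present:
                    return False
    return True

# Classic example: a course's teachers and its textbooks vary
# independently, so course ->> teacher holds.
ctb = [
    {"course": "DB", "teacher": "Rao",  "book": "Date"},
    {"course": "DB", "teacher": "Rao",  "book": "Ullman"},
    {"course": "DB", "teacher": "Iyer", "book": "Date"},
    {"course": "DB", "teacher": "Iyer", "book": "Ullman"},
]
print(mvd_holds(ctb, ["course"], ["teacher"]))       # True
# Drop one teacher/book combination and the MVD no longer holds.
print(mvd_holds(ctb[:-1], ["course"], ["teacher"]))  # False
```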
(d) What are inference axioms? Explain their significance in relational database design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule stating that if a relation satisfies certain FDs, then it must satisfy certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of inference axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City - Given
2. Street Zip → Street City - Augmentation of (1) by Street
3. City Street → Zip - Given
4. City Street → City Street Zip - Augmentation of (3) by City Street
5. Street Zip → City Street Zip - Transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E - Given
2. AB → AB - Reflexivity
3. AB → B - Projectivity from (2)
4. AB → BE - Additivity from (1) and (3)
5. BE → I - Given
6. AB → I - Transitivity from (4) and (5)
7. E → G - Given
8. AB → G - Transitivity from (1) and (7)
9. AB → GI - Additivity from (6) and (8)
10. GI → H - Given
11. AB → H - Transitivity from (9) and (10)
12. AB → GH - Additivity from (8) and (11)
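Such derivations can be checked mechanically by computing an attribute closure (a standard consequence of the axioms above, not something the text itself defines): AB → GH is derivable from F iff G and H are in the closure of AB under F. A small sketch:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs
    given as (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in the closure, pull in the
            # right side (transitivity + additivity in one step).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Maier's example: F = {AB->E, AG->J, BE->I, E->G, GI->H}
fds = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
       ({"E"}, {"G"}), ({"G", "I"}, {"H"})]
ab_closure = closure({"A", "B"}, fds)
print({"G", "H"} <= ab_closure)  # True: AB -> GH is derivable from F
```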
Significance in relational database design: A relational database is a database structure, commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad-hoc manner. A Relational Database Management System is a database system made up of files with data elements in a two-dimensional array (rows and columns); it has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
• Tables are manipulated a set at a time, rather than a record at a time.
• SQL is used to manipulate relational databases. The model was proposed by Dr. Codd in 1970.
• It is the basis for the relational database management system (RDBMS).
• The relational model contains the following components:
• a collection of objects or relations, and
• a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock How can it be avoided How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that is locked by the other. It can be handled in two ways: (1) set measures which prevent deadlocks from happening, and (2) set ways in which to break a deadlock after it happens. One way to prevent deadlocks is to require the user to request all necessary locks at one time, ensuring the transaction gains access to everything it needs or to nothing. Deadlocks can also sometimes be avoided by setting a resource access order, meaning resources must be locked in a fixed order to prevent such instances. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS selects a victim transaction to cancel and reverts the entire transaction until the resources it held become available, allowing one transaction to complete while the other is reprocessed at a later time.
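The resource-ordering idea above can be sketched in a few lines of Python. This is a toy illustration with threads and locks, not actual DBMS internals; the helper names are invented for the sketch. Both transactions ask for the two locks in opposite orders, but because locks are always acquired in one fixed global order, the classic hold-and-wait cycle cannot form.

```python
import threading

# Two resources that transactions may both need.
lock_a = threading.Lock()
lock_b = threading.Lock()

def locks_in_order(*locks):
    # Always acquire locks in a fixed global order (here: by object id),
    # so two transactions can never hold them in opposite orders.
    return sorted(locks, key=id)

def transaction(name, need, results):
    ordered = locks_in_order(*need)
    for lk in ordered:
        lk.acquire()
    try:
        results.append(name)  # critical section: use both resources
    finally:
        for lk in reversed(ordered):
            lk.release()

results = []
# T1 asks for (A, B); T2 asks for (B, A): without ordering this can deadlock.
t1 = threading.Thread(target=transaction, args=("T1", (lock_a, lock_b), results))
t2 = threading.Thread(target=transaction, args=("T2", (lock_b, lock_a), results))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # → ['T1', 'T2'] — both finish, no deadlock
```

The same idea appears in real systems as a fixed lock-acquisition order on tables or rows.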
Q: Explain the meaning of the expression "ACID transaction".
Ans: ACID stands for Atomicity, Consistency, Isolation, Durability. When any transaction happens it should be atomic: it should either complete fully or not at all; there should be nothing like a semi-complete transaction. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.
Q: What is the purpose of transaction isolation levels?
Ans: Transaction isolation
levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row that affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by the change will be locked from changes until my change is complete. This isolates the change and ensures that the data interaction remains accurate and consistent; this is known as transaction-level consistency. A transaction being changed, which may affect several other pieces of data or rows of input, can also affect how those rows are read. Say I am processing a change to the tax rate in my state; my store clerk should not be able to read the total cost of a blue shirt, because the total-cost row is affected by any change in the tax-rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but has not yet been committed is known as the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock based protocols
Time stamp based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it Locks are of two
kinds:
Binary Locks: a lock on a data item can be in one of two states; it is either locked or unlocked.
Shared/exclusive: this type of locking mechanism differentiates the locks based on
their uses If a lock is acquired on a data item to perform a write operation it is an
exclusive lock Allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state Read locks are shared because no data
value is being changed
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
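The growing/shrinking rule above can be made concrete with a toy sketch. The class below is a hypothetical bookkeeping object, not a real lock manager: once the transaction releases its first lock, the shrinking phase begins and any further lock request is rejected.

```python
class TwoPhaseTransaction:
    """Toy bookkeeping for 2PL: once any lock is released (shrinking
    phase begins), requesting a new lock is an error."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True   # the first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseTransaction()
t.lock("A"); t.lock("B")   # growing phase
t.unlock("A")              # shrinking phase begins here
try:
    t.lock("C")            # illegal under 2PL
except RuntimeError as e:
    print(e)  # → 2PL violation: cannot lock after first unlock
```

Strict-2PL, described next, simply delays every unlock() until commit, which is what prevents cascading aborts.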
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: it holds all the locks until the commit point and releases them all at once.
Strict-2PL does not have cascading abort as 2PL does
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all transactions that come after it; for example, a transaction entering the system at 0004 is two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system know when the last 'read' and 'write' operations were performed on the data item.
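The read- and write-timestamps just described drive the basic timestamp-ordering checks. The sketch below is a simplified illustration (the Item class and the convention of rejecting by returning False are assumptions of the sketch; a real DBMS would roll the transaction back and restart it with a new timestamp).

```python
# Hypothetical data item carrying the latest read/write timestamps,
# as described above; timestamps are just increasing integers.
class Item:
    def __init__(self):
        self.r_ts = 0   # timestamp of the youngest reader so far
        self.w_ts = 0   # timestamp of the youngest writer so far

def read(item, ts):
    # An older transaction may not read a value written by a younger one.
    if ts < item.w_ts:
        return False            # reject: roll back the reader
    item.r_ts = max(item.r_ts, ts)
    return True

def write(item, ts):
    # An older transaction may not overwrite data already read or
    # written by a younger transaction.
    if ts < item.r_ts or ts < item.w_ts:
        return False            # reject: roll back the writer
    item.w_ts = ts
    return True

x = Item()
print(write(x, 5))   # → True  (youngest writer so far is now 5)
print(read(x, 3))    # → False (transaction 3 is older than writer 5)
print(read(x, 8))    # → True
print(write(x, 6))   # → False (6 is older than reader 8)
```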
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash under a distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data: data was usually represented in a tabular format, with strings or numbers in each field.
Multiple users: a conventional database needed to support more than one user or system logged into the same data at the same time.
Transactions: an essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: a corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional database. The knowledge-base needed to know facts about the world, for example to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but instead would need to store thousands of rows of information about specific humans. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work of a database.[3][4]
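The contrast can be made concrete with a small sketch (illustrative Python, not any particular knowledge-base system): the database side enumerates rows about individuals, while the knowledge-base side states the rule once and infers the rest.

```python
# The database way: enumerate facts about specific individuals (rows).
customers = [
    {"name": "George", "species": "human"},
    {"name": "Mary",   "species": "human"},
]

# The knowledge-base way: one general rule plus inference.
def is_mortal(individual):
    # Rule: all humans are mortal.
    return individual["species"] == "human"

# Inference applies the general rule to any individual the database knows.
print([c["name"] for c in customers if is_mortal(c)])  # → ['George', 'Mary']
```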
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge management vendors such as Lotus Notes. Knowledge Management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified either as knowledge-based in the sense of an expert system that performed automated reasoning, or as knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data; there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts which mostly occur with a multi-
user system It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of respective databases
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek átomos, 'undividable') is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of the two operations fails.
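The transfer example above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration: the account table, balances and the insufficient-funds check are assumptions of the sketch. The key point is that `with conn:` wraps both UPDATEs in one transaction that commits on success and rolls back on error, so money is never half-moved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # one atomic transaction: commit on success, rollback on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM account WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces the rollback
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

print(transfer(conn, "A", "B", 30))    # → True
print(transfer(conn, "A", "B", 500))   # → False: rolled back, nothing half-done
print(dict(conn.execute("SELECT name, balance FROM account")))  # → {'A': 70, 'B': 80}
```

After the failed transfer the balances are exactly what they were before it started, which is the atomicity guarantee in miniature.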
(B) Give the three-level architecture proposal for DBMS.
Ans Objective of three level architecture proposal for DBMS
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below:
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user view differs from the way data is stored in the database; this view describes only the part of the actual database that is relevant to the user, since each user is not concerned with the entire database. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data 'as it really is'. The user's view of the data is constrained by the language they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for a DBMS are explained above.
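One way to see the external level in practice is through SQL views, sketched here with Python's sqlite3 module. The employee table and the view are invented for the illustration: the base table is the conceptual level, and the view exposes only the columns a given user group may see, independent of how the table is physically stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full employee relation.
conn.execute("""CREATE TABLE employee
                (emp_id INTEGER PRIMARY KEY, name TEXT,
                 dept TEXT, salary INTEGER)""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                 [(1, "Asha", "Sales", 40000), (2, "Ravi", "HR", 35000)])

# External level: a view exposing only what this user group may see
# (no salary column). The user's view is immune to how the base
# table is actually stored or later reorganized.
conn.execute("CREATE VIEW emp_public AS SELECT emp_id, name, dept FROM employee")

print(conn.execute("SELECT * FROM emp_public").fetchall())
# → [(1, 'Asha', 'Sales'), (2, 'Ravi', 'HR')]
```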
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig.: Structure of a Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query Optimizer - The DML commands (insert, update, delete, retrieve) from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer to find the best way to execute the query, and sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
Converting operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the query processor), from the user's logical view to the physical file system.
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation and accuracy; it may be used as an important part of the DBMS.
Importance of Data Dictionary - the data dictionary is necessary in databases for the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These users may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, since such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In a file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This can lead to inconsistent data, so we need to remove this duplication of data across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it easy to respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system where the programmer may need to write new programs to meet every new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purposes of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an ad-hoc/temporary manner. Often different systems of an organization would access different components of the operational data; in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8 Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages developed with DBMSs than using procedural languages.
10 A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems it is more likely that files will be designed as the needs of particular applications demand; the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a 'top-down' approach. This data model allows us to describe how data is used in a real-world enterprise; building it is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and it should be validated with a 'bottom-up' approach. It has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes include student ID, student name, address, etc.
Attributes are of various types
Simple/Single attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an 'owns' relationship between a company and a computer; a 'supervises' relationship between an employee and a department; a 'performs' relationship between an artist and a song; a 'proved' relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many (1 : M)
Many to one (M : 1)
Many to many (M : M)
Symbols and their meanings:
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes.
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
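The attribute categories above (composite, multivalued, derived) can be illustrated with a small Python sketch; the class and field layout below mirrors the Customer example and is invented purely for illustration:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Street:                      # composite component of the address attribute
    street_name: str
    street_number: str
    apartment_number: str = ""

@dataclass
class Address:                     # composite attribute of Customer
    city: str
    state: str
    zip_code: str
    street: Street

@dataclass
class Customer:                    # the entity; customer_id is the primary key
    customer_id: int
    first_name: str                # components of the composite attribute "name"
    last_name: str
    date_of_birth: date
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute
    address: Optional[Address] = None

    def age(self, today: date) -> int:
        # derived attribute: computed from date_of_birth, never stored
        return today.year - self.date_of_birth.year
```

A derived attribute such as age is recomputed on demand, which is why it is drawn as a dashed ellipse rather than stored with the entity.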
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
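A minimal sketch of secondary-key retrieval in Python; the student records and the build_secondary_index helper are invented for this illustration:

```python
from collections import defaultdict

# Student file: records keyed by the primary key stud_id.
students = {
    101: {"stud_id": 101, "stud_name": "Asha", "dept": "MCA"},
    102: {"stud_id": 102, "stud_name": "Ravi", "dept": "MCA"},
    103: {"stud_id": 103, "stud_name": "Asha", "dept": "MBA"},
}

def build_secondary_index(records, attr):
    """Map each secondary-key value to the list of primary keys holding it."""
    index = defaultdict(list)
    for pk, rec in records.items():
        index[rec[attr]].append(pk)
    return index

name_index = build_secondary_index(students, "stud_name")

# Secondary-key retrieval: one value may match many records.
matches = [students[pk] for pk in name_index["Asha"]]
```

Unlike a primary-key lookup, the search on "stud_name" returns a set of records, which is exactly point (ii) above.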
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

Q3
EITHER
(A) Let R(A, B, C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
Q4
EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot be non-loss decomposed any further into smaller tables.
Another way of expressing this is: every join dependency in the table is a consequence of its candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:

buyer | vendor        | item
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
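The decomposition and its lossless rejoin can be sketched in Python; this is illustrative only, and the join3 helper is a naive triple natural join:

```python
# Original Buying relation as a set of (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# Project the table onto the three pairwise tables of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

def join3(bv, bi, vi):
    """Natural join of the three projections on their shared columns."""
    return {(b, v, i)
            for b, v in bv
            for b2, i in bi if b2 == b
            for v2, i2 in vi if v2 == v and i2 == i}
```

Because the join dependency holds for this data, joining the three projections reconstructs the original table exactly. Recording that Claiborne starts to sell jeans then needs only one new pair in vendor_item; the join derives the buyer combinations.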
(B) Explain the architecture of an IMS System.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Figure: IMS architecture. Each application program (A, B) is written in a host language plus DL/I and runs against its own PSB, which is a set of PCBs; the PCBs map onto the physical databases, each defined by a DBD, all coordinated by the IMS control program.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of DBDs thus corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left- and right-hand side of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
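Whether a functional dependency holds in a given table extension can be checked mechanically: equal determinant values must imply equal dependent values. A small illustrative Python sketch (fd_holds and the sample rows are invented for this example):

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in `rows` (each row a dict):
    rows that agree on all lhs attributes must agree on all rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False            # same determinant, different dependent value
    return True

rows = [
    {"stud_id": 1, "dept": "MCA", "hod": "Rao"},
    {"stud_id": 2, "dept": "MCA", "hod": "Rao"},
    {"stud_id": 3, "dept": "MBA", "hod": "Sen"},
]
```

Here dept → hod holds (each department has one head), while hod → stud_id does not, since "Rao" is paired with two student IDs.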
(D) Explain 4NF with examples.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. The database designers need not normalize to the highest possible normal form.
It is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes, often executed as a series of steps; each step corresponds to a specific normal form which has known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
Either of these conditions must hold for the relation to be in fourth normal form:
There is no multivalued dependency in the relation; or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
Q5
Either
(A) What are object oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been developed since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could get a user's account information and efficiently provide extensive information such as transactions, account information entries, etc.
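The pointer-following access described above can be sketched in Python; Account and Transaction are hypothetical classes invented for this illustration:

```python
# In an object database, an Account object holds direct references to its
# Transaction objects, so retrieval follows pointers instead of joining
# an accounts table with a transactions table on a foreign key.
class Transaction:
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []    # direct object references, no foreign key

acct = Account("Alice")
acct.transactions.append(Transaction(100))
acct.transactions.append(Transaction(-40))

# Navigational access: no search over a separate table is needed.
balance = sum(t.amount for t in acct.transactions)
```

The relational equivalent would scan or index a transactions table for the matching account id; here the references are followed directly.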
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index creations is done at a faster pace, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Q4
EITHER
(a) Explain DBTG Data Manipulation.
Ans: The acronym DBTG refers to the Data Base Task Group of the Conference on Data Systems Languages (CODASYL), the group responsible for standardization of the programming language COBOL. The DBTG final report appeared in April 1971; it introduced a new, distinct and self-contained language. The DBTG proposal is intended to meet the requirements of many distinct programming languages, not just COBOL; the user in a DBTG system is considered to be an ordinary application programmer, and the language therefore is not biased toward any single specific programming language.
It is based on the network model. In addition to proposing a formal notation for networks (the Data Definition Language or DDL), the DBTG has proposed a Subschema Data Definition Language (Subschema DDL) for defining views of the conceptual scheme that was itself defined using the Data Definition Language. It also proposed a Data Manipulation Language (DML) suitable for writing application programs that manipulate the conceptual scheme or a view.
Architecture of DBTG Model
The architecture of a DBTG system is illustrated in Figure. The architecture of the DBTG model can be divided into three different levels, as in the architecture of a database system. These are:
• Storage Schema (corresponds to the Internal View of the database)
• Schema (corresponds to the Conceptual View of the database)
• Subschema (corresponds to the External View of the database)
Storage Schema
The storage structure (Internal View) of the database is described by the storage schema, written in a Data Storage Description Language (DSDL).
Schema
In DBTG the Conceptual View is defined by the schema. The schema consists essentially of definitions of the various types of record in the database, the data-items they contain, and the sets into which they are grouped. (Here logical record types are referred to as record types; the fields in a logical record format are called data-items.)
Subschema
The External View (not a DBTG term) is defined by a subschema. A subschema consists essentially of a specification of which schema record types the user is interested in, which schema data-items he or she wishes to see in those records, and which schema relationships (sets) linking those records he or she wishes to consider. By default, all other types of record, data-item and set are excluded.
In the DBTG model the users are application programmers writing in an ordinary programming language, such as COBOL, that has been extended to include the DBTG data manipulation language. Each application program invokes the corresponding subschema using the COBOL Data Base Facility; for example, the programmer simply specifies the name of the required subschema in the Data Division of the program. This invocation provides the definition of the user work area (UWA) for that program. The UWA contains a distinct location for each type of record (and hence for each data-item) defined in the subschema. The program may refer to these data-item and record locations by the names defined in the subschema.
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table: transform data from the information source (e.g. a form) into table format with columns and rows.
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value. If a table of data meets the definition of a relation, it is in first normal form:
Every relation has a unique name.
Every attribute value is atomic (single-valued).
Every row is unique.
Attributes in tables have unique names.
The order of the columns is irrelevant.
The order of the rows is irrelevant.
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group either by entering appropriate data into the empty columns of rows containing repeating data ('flattening' the table), or by placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
Second Normal Form (2NF)
Based on the concept of full functional dependency: if A and B are attributes of a relation, B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key; equivalently, 1NF and no partial functional dependencies.
A partial functional dependency arises when one or more non-key attributes are functionally dependent on part of the primary key. Every non-key attribute must be defined by the entire key, not just by part of the key. If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies on the primary key exist, remove them by placing them in a new relation along with a copy of their determinant.
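The 1NF-to-2NF step can be sketched in Python; the enrolment relation and its partial dependency stud_id → stud_name are invented for this illustration:

```python
# 1NF relation with composite key (stud_id, course_id). stud_name depends
# on stud_id alone -- a partial dependency that violates 2NF.
enrolment = [
    {"stud_id": 1, "course_id": "C1", "stud_name": "Asha", "grade": "A"},
    {"stud_id": 1, "course_id": "C2", "stud_name": "Asha", "grade": "B"},
    {"stud_id": 2, "course_id": "C1", "stud_name": "Ravi", "grade": "A"},
]

# Move the partially dependent attribute into a new relation together with
# a copy of its determinant; the attribute dependent on the full key stays.
student = {(r["stud_id"], r["stud_name"]) for r in enrolment}
grades  = {(r["stud_id"], r["course_id"], r["grade"]) for r in enrolment}
```

The duplicated name "Asha" collapses to a single row in the new student relation, which is exactly the redundancy 2NF removes.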
Third Normal Form (3NF)
2NF and no transitive dependencies. A transitive dependency is a functional dependency between two or more non-key attributes: if A, B and C are attributes of a relation such that A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF, and in which no non-primary-key attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with suitable example.
Ans: A multivalued dependency X →→ Y holds in a relation R when, for each value of X, the set of associated Y values is determined by X alone, independently of the remaining attributes of R.
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form.
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
Either of these conditions must hold for the relation to be in fourth normal form:
There is no multivalued dependency in the relation; or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
(d) What are inference axioms? Explain their significance in Relational Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity from (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
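Derivations like the one above can be checked mechanically with an attribute-closure routine, which in effect applies transitivity and additivity until a fixed point. A sketch in Python, assuming the FDs are given as (lhs, rhs) pairs of sets:

```python
def closure(attrs, fds):
    """Closure of an attribute set under FDs given as (lhs, rhs) pairs of
    sets: repeatedly add rhs whenever lhs is already contained."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Maier's example: F = {AB -> E, AG -> J, BE -> I, E -> G, GI -> H}
F = [({"A", "B"}, {"E"}), ({"A", "G"}, {"J"}), ({"B", "E"}, {"I"}),
     ({"E"}, {"G"}), ({"G", "I"}, {"H"})]
```

AB → GH holds exactly when GH is contained in the closure of {A, B}, which the routine confirms without tracing the twelve-step proof by hand.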
Significance in Relational Database Design: A relational database is a structure, commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad-hoc manner. A Relational Database Management System is a database system made up of files with data elements in a two-dimensional array (rows and columns); it has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage. The inference axioms are significant because they let the designer derive all functional dependencies implied by a given set, and hence compute attribute closures, find candidate keys and verify normal forms.
A relational database is perceived by the user as a collection of two-dimensional tables:
• the tables are manipulated a set at a time, rather than a record at a time;
• SQL is used to manipulate relational databases.
The relational model was proposed by Dr. Codd in 1970 and is the basis for the relational database management system (RDBMS). The relational model contains the following components:
• a collection of objects or relations;
• a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that is being locked by the other user. It can be dealt with in two ways: one is to set measures which prevent deadlocks from happening, and the other is to set ways in which to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or to nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order to prevent such instances. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a transaction to cancel and revert that entire transaction until the required resources become available, allowing one transaction to complete while the other has to be reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it should be atomic: it should either be complete or fully incomplete; there should not be anything like semi-complete. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, then the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by a change in the row I am working on will be locked from changes until my change is complete. This isolates the change, ensures that the data interaction remains accurate and consistent, and is known as transaction-level consistency. The transaction being changed, which may affect several other pieces of data or rows of input, could also affect how those rows are read. So let us say I am processing a change to the tax rate in my state: my store clerk should not be able to read the total cost of a blue shirt, because the total-cost row is affected by any change in the tax-rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but has not yet been committed is known as the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
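Deadlock detection is usually described as finding a cycle in a wait-for graph (an edge T1 → T2 means T1 waits for a lock held by T2). A minimal Python sketch; has_deadlock is an illustrative name:

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph {txn: set of txns it waits on}."""
    visited, on_stack = set(), set()

    def visit(t):
        if t in on_stack:
            return True            # back edge: a cycle, hence a deadlock
        if t in visited:
            return False
        visited.add(t)
        on_stack.add(t)
        if any(visit(u) for u in wait_for.get(t, ())):
            return True
        on_stack.discard(t)
        return False

    return any(visit(t) for t in wait_for)
```

When a cycle is found, the DBMS picks a victim transaction on the cycle, aborts it and rolls it back, which is the resolution strategy described above.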
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive locks: this type of locking mechanism differentiates the locks based on their uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock; allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by the transaction, and the second is shrinking, where the locks held by the transaction are being released. To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it; it holds all the locks until the commit point and releases them all at one time. Strict-2PL therefore does not have cascading aborts, as 2PL does.
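The growing/shrinking discipline of 2PL can be sketched in Python; this is a toy, single-threaded illustration that ignores lock modes, conflicts between transactions, and blocking:

```python
class TwoPhaseLockingError(Exception):
    pass

class Transaction2PL:
    """Growing phase acquires locks; after the first release (the shrinking
    phase begins) no further lock may be requested."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError("no new locks in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True      # first release starts the shrinking phase
        self.locks.discard(item)

t = Transaction2PL()
t.lock("A")
t.lock("B")          # growing phase: locks may still be acquired
t.unlock("A")        # shrinking phase begins; no new locks from here on
```

Strict-2PL would simply never call unlock before commit, releasing everything at the commit point instead.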
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp. Lock-based protocols manage the order between conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at 00:02 clock time would be older than all other transactions that come after it; for example, any transaction y entering the system at 00:04 is two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system know when the last 'read' and 'write' operations were performed on the data item.
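The per-item read/write timestamp checks can be sketched in Python; this is a simplified version that ignores Thomas's write rule and the restart of rolled-back transactions:

```python
class TimestampError(Exception):
    pass

class DataItem:
    """Basic timestamp-ordering checks: reject operations arriving
    'too late' relative to younger transactions' timestamps."""
    def __init__(self):
        self.read_ts = 0     # timestamp of the youngest reader so far
        self.write_ts = 0    # timestamp of the youngest writer so far

    def read(self, ts):
        if ts < self.write_ts:          # item already overwritten by a younger txn
            raise TimestampError("read rejected: roll back transaction")
        self.read_ts = max(self.read_ts, ts)

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            raise TimestampError("write rejected: roll back transaction")
        self.write_ts = ts

x = DataItem()
x.read(ts=2)     # transaction with timestamp 2 reads x
x.write(ts=3)    # younger transaction (timestamp 3) writes x
```

An older transaction (say timestamp 1) attempting to write x afterwards would be rejected and rolled back, preserving the timestamp order.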
OR
(b) Explain database security mechanisms.
Ans: Database security covers and enforces security on all aspects and components of databases. This includes:
Data stored in the database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access and data management controls.
Load/stress testing and capacity testing of a database to ensure it does not crash in a distributed denial of service (DDoS) attack or user overload.
Physical security of the database server and backup equipment from theft and natural disasters.
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s) virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of Information Technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: Data was usually represented in a tabular format, with strings or numbers in each field.
Multiple users: A conventional database needed to support more than one user or system logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional database. The knowledge-base needed to know facts about the world, for example to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but would instead need to store thousands of rows representing information about specific humans. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
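The contrast can be sketched in a few lines of Python. This is a deliberately tiny toy, not a real inference engine; the names and data are invented for the illustration.

```python
# Database-style storage lists one fact per individual; knowledge-base-style
# storage keeps the general rule and derives the per-individual fact.
facts = {"George": "human", "Mary": "human"}   # rows about specific individuals
rules = {("human", "mortal")}                   # the general rule: humans are mortal

def is_mortal(name):
    # Reason from class membership plus the rule, rather than looking up
    # a stored "is mortal" column for each person.
    return (facts.get(name), "mortal") in rules

print(is_mortal("George"))  # True, derived rather than stored per row
```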
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged. These were systems designed from the ground up to support object-oriented capabilities, but also to support standard database services as well. On the other hand, the large database vendors such as Oracle added capabilities to their products that provided support for knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet, documents, hypertext and multimedia support were now critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors, such as Lotus Notes. Knowledge Management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge-base to describe their repositories, but the meaning had a subtle difference. In the case of the previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1 -
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data, as there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you to make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (/ˌætəˈmɪsəti/; from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
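The transfer example can be demonstrated with Python's built-in sqlite3 module, whose commit/rollback calls provide exactly this all-or-nothing behaviour. The table layout and the simulated failure are invented for the sketch.

```python
# Atomicity sketch: both updates commit together, or a rollback leaves
# the balances exactly as they were before the transfer started.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 100), ("B", 50)])
conn.commit()

def transfer(amount):
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                     (amount,))
        # Simulate a failure between the withdrawal and the deposit.
        if amount > 100:
            raise RuntimeError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                     (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()  # atomicity: the partial withdrawal is undone

transfer(500)  # fails mid-way: rollback restores A = 100, B = 50
print(dict(conn.execute("SELECT name, balance FROM account")))
```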
(B) Give the three level architecture proposal for DBMS.
Ans: Objectives of the three level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user view is different from the way data is stored in the database; this view describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to that user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three level architecture proposal for DBMS are suitably explained above.
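The external level can be illustrated with a SQL view, here created through Python's sqlite3 module. The schema and names are invented for the sketch: the view exposes only the part of the conceptual schema relevant to one class of user, hiding the salary column.

```python
# Three-level architecture sketch: the base table is the conceptual level,
# the view is one external view, and SQLite's file/page layout (not visible
# here) is the internal level.
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the full employee relation.
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 40000), (2, 'Ravi', 55000)")
# External level: an end-user view that hides the salary column.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

print(conn.execute("SELECT * FROM employee_public").fetchall())
```

Changing how the base table is stored, or adding columns to it, leaves queries against the view unaffected, which is exactly the data independence the three-level proposal aims for.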
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified in the DDL. It includes metadata information such as the names of the files and data items, the storage details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer into the best way to execute the query, and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
Converting operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the Query Processor), from the user's logical view to the physical file system.
Controlling access to the DBMS information that is stored on disk.
Handling buffers in main memory.
Enforcing constraints to maintain the consistency and integrity of the data.
Synchronizing the simultaneous operations performed by concurrent users.
Controlling the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and the design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high level queries into low level file access commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general purpose programming language, such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies the separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer or application specialist need not know the details of how the data are stored, as such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file structure (reorganizing data internally or changing the mode of access), or relocating from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. So we need to remove this duplication of data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In conventional systems, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it easy to respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who do not know programming to interact with the data more easily, unlike a file processing system where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc/temporary manner. Often different systems of an organization would access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database. Different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers the work of its unit as the most important, and therefore considers its needs as the most important. Once a database has been set up with centralized control, it will be necessary to identify the organization's requirements and to balance the needs of the competing units. So it may become necessary to ignore some requests for information if they conflict with a higher priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher when using the non-procedural languages developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as per the needs of particular applications, and the overall view is often not considered. Building an overall view of an organization's data is usually cost effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as recovery and backup from failures, including disk crashes, power failures and software errors, which may help the database to recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods are very complex.
QUE 2 - EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an iterative, team-oriented process with all business managers (or designates) involved, and should be validated with a "bottom-up" approach. It has three primary components: entity, relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many (1:M)
Many to one (M:1)
Many to many (M:M)
Symbols and their meanings:
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets, and entity sets to relationship sets.
Ellipses represent attributes.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes.
Example:
Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
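One common way to realize such an entity in a relational schema is to flatten its composite attributes into simple columns. A hedged sqlite3 sketch of the Customer entity above (column choices are one possible mapping, not the only one; apartment_number and street_number are omitted for brevity):

```python
# Mapping the Customer entity to a relation: the key attribute becomes the
# primary key, and composite attributes (name, address) become simple columns.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,  -- underlined key attribute
        first_name    TEXT,                 -- components of composite 'name'
        middle_name   TEXT,
        last_name     TEXT,
        phone_number  TEXT,
        date_of_birth TEXT,
        city          TEXT,                 -- components of composite 'address'
        state         TEXT,
        zip_code      TEXT,
        street_name   TEXT
    )
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Asha', 'Patil')")
print(conn.execute("SELECT customer_id, first_name FROM customer").fetchall())
```

A multivalued attribute (say, several phone numbers) would instead be moved to its own table keyed by customer_id, and a derived attribute such as age would be computed from date_of_birth rather than stored.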
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
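A minimal sketch of such a secondary index (the field names and sample records are invented for the illustration): unlike the primary key, one secondary-key value can map to several records.

```python
# Secondary key retrieval: stud_id is the primary key (one record per value),
# while stud_name is a secondary key (possibly many records per value).
from collections import defaultdict

records = {                     # primary index: stud_id -> record
    1: {"stud_id": 1, "stud_name": "Anil"},
    2: {"stud_id": 2, "stud_name": "Sunita"},
    3: {"stud_id": 3, "stud_name": "Anil"},
}

name_index = defaultdict(list)  # secondary index: stud_name -> [stud_id, ...]
for sid, rec in records.items():
    name_index[rec["stud_name"]].append(sid)

print(name_index["Anil"])       # multiple records satisfy one key value
```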
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4 - EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
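The decomposition can be checked mechanically: joining the three projections of the sample data reconstructs exactly the original Buying table. That lossless three-way join is the join dependency which keeps the table out of 5NF. A small Python sketch (set-based, for this sample data only):

```python
# Decompose Buying(buyer, vendor, item) into Buyer-Vendor, Buyer-Item and
# Vendor-Item, then rejoin and compare with the original relation.
from itertools import product

buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

bv = {(b, v) for b, v, i in buying}   # Buyer-Vendor projection
bi = {(b, i) for b, v, i in buying}   # Buyer-Item projection
vi = {(v, i) for b, v, i in buying}   # Vendor-Item projection

# Natural join of the three projections on their common columns.
rejoined = {(b, v, i)
            for (b, v), (b2, i) in product(bv, bi)
            if b == b2 and (v, i) in vi}

print(rejoined == buying)   # the three-way join is lossless here
```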
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Fig: IMS architecture - applications A and B, each written in a host language plus DL/I, access the data through PCBs grouped into PSBs (PSB-A, PSB-B); the IMS control program maps these onto the physical databases defined by DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The set of all DBDs, including the mapping of the physical databases to storage, corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitting to
perform on this segment In this example the entry is G (ldquogetrdquo) indicating retrieval only Other
possible values are I(ldquoinsertrdquo) R(ldquoreplacerdquo) and D(ldquodeleterdquo)
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written on-line application programs; IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant) determines the value of
another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate
key, and no attribute in the key can be deleted without destroying the property of unique
identification.
Main characteristics of functional dependencies used in normalization:
They have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of a
dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify a
set of functional dependencies (X) for a relation that is smaller than the complete set of functional
dependencies (Y) for that relation, and that has the property that every functional dependency in Y
is implied by the functional dependencies in X.
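Whether a given dependency holds in a relation can be checked mechanically. The sketch below is illustrative (the sample relation, its attribute names, and the helper `holds` are invented for this example): a dependency X → Y holds if no two tuples agree on X but disagree on Y.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts, one per tuple of the relation
    lhs, rhs: tuples of attribute names
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, different dependent value
    return True

emp = [
    {"empno": 1, "dept": "D1", "city": "Pune"},
    {"empno": 2, "dept": "D1", "city": "Pune"},
    {"empno": 3, "dept": "D2", "city": "Nagpur"},
]
print(holds(emp, ("empno",), ("dept",)))  # True: empno determines dept
print(holds(emp, ("dept",), ("empno",)))  # False: D1 maps to two empnos
```

Note that a check over one sample extension can only refute a dependency; whether it holds "for all time" is a statement about the schema, not the data.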
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF;
we will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, where each
step corresponds to a specific normal form with known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the
key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF
removes an unwanted kind of data structure: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses
multivalued dependencies.
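As an illustration, suppose a course has a set of recommended books and a set of teachers that vary independently of each other, giving the multivalued dependencies course ↠ book and course ↠ teacher. The sketch below (the data is invented) shows the 4NF decomposition into one table per multivalued dependency, and checks that joining the projections reconstructs the original relation, i.e. the decomposition is lossless.

```python
from itertools import product

# Unnormalised facts: every (course, book, teacher) combination must be
# stored, because book and teacher vary independently of each other.
ctb = {
    ("DB", "Date", "Smith"), ("DB", "Date", "Jones"),
    ("DB", "Ullman", "Smith"), ("DB", "Ullman", "Jones"),
}

# 4NF decomposition: one relation per multivalued dependency.
course_book = {(c, b) for c, b, _ in ctb}
course_teacher = {(c, t) for c, _, t in ctb}

# The natural join of the two projections reconstructs the original
# relation, confirming the decomposition is lossless.
rejoined = {(c, b, t)
            for (c, b), (c2, t) in product(course_book, course_teacher)
            if c == c2}
print(rejoined == ctb)  # True
```

The decomposed form stores 2 + 2 tuples instead of 4 combination tuples; adding a third book would add one row rather than two.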
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by
relational database management systems (RDBMS). Object databases have been considered since the
early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a
more declarative programming approach. It is in the area of object query languages, and the
integration of the query and navigational interfaces, that the biggest differences between products are
found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a
relational database): an object can be retrieved directly, without a search, by following pointers. (It
could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used
determines how much time and space your backups take, and how great your risk of data loss is
when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
The speed and size of your transaction log backups.
The degree to which you are at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction. The Log Marks feature allows you to place reference points in the transaction
log and recover to a log mark.
This model also logs CREATE INDEX operations; recovery from a transaction log backup that
includes index creation is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
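The mechanism underlying all of these models is a transaction log that can be replayed after a failure. The sketch below is a deliberately simplified redo-log replay illustrating the general idea (it is not SQL Server's actual log format; the record layout and transaction names are invented): only work belonging to committed transactions is redone.

```python
# Each log record: (transaction id, operation, key, value).
log = [
    ("T1", "begin", None, None),
    ("T1", "write", "A", 100),
    ("T1", "commit", None, None),
    ("T2", "begin", None, None),
    ("T2", "write", "B", 50),   # T2 never committed: its work is discarded
]

def recover(log):
    """Rebuild database state from the log: redo committed writes only."""
    committed = {txn for txn, op, _, _ in log if op == "commit"}
    db = {}
    for txn, op, key, value in log:
        if op == "write" and txn in committed:
            db[key] = value
    return db

print(recover(log))  # {'A': 100}
```

Truncating the log at regular intervals, as the simple recovery model does, amounts to discarding records for transactions whose effects are already safely in the data files, which is why point-in-time restore is then impossible.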
(d) Describe deadlocks in a distributed system.
Ans:
(o) In the DBTG model, the users are application programmers writing in an ordinary
programming language, such as COBOL, that has been extended to include the DBTG
data manipulation language. Each application program invokes the corresponding
subschema using the COBOL Data Base Facility; for example, the programmer simply
specifies the name of the required subschema in the Data Division of the program. This
invocation provides the definition of the user work area (UWA) for that program. The
UWA contains a distinct location for each type of record (and hence for each data-item
type) defined in the subschema. The program may refer to these data-item and record
locations by the names defined in the subschema.
Q5
EITHER
(a) Define normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties.
Normalization in industry pays particular attention to
normalization up to 3NF, BCNF, or 4NF.
We will pay particular attention up to 3NF.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the
key.
3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent
on the key.
Unnormalized Form (UNF)
A table that contains one or more repeating groups
To create an unnormalized table:
transform data from the information source (e.g. a form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value.
If a table of data meets the definition of a relation, it is in first normal form.
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table.
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
Remove each repeating group by
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table),
or by
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation.
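The 'flattening' option can be sketched as follows (the student and course data and the field names are invented for illustration): each list-valued attribute is replaced by one row per value, so every attribute value becomes atomic.

```python
# Unnormalized form: each student row carries a repeating group of courses.
unf = [
    {"sid": "S1", "name": "Asha", "courses": ["DBMS", "OS"]},
    {"sid": "S2", "name": "Ravi", "courses": ["DBMS"]},
]

# 1NF: one row per (student, course) pair; every value is atomic.
flat = [
    {"sid": r["sid"], "name": r["name"], "course": c}
    for r in unf
    for c in r["courses"]
]
for row in flat:
    print(row)
```

The key of the flattened relation is now the composite (sid, course), which is what makes partial dependencies, and hence 2NF, relevant in the next step.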
Second Normal Form (2NF)
Based on the concept of full functional dependency: if A and B are attributes of a relation,
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
A partial functional dependency arises when one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new
relation along with a copy of their determinant.
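These steps can be illustrated with toy data (the names are invented): the key of the 1NF relation is (sid, course), and name depends on sid alone, a partial dependency, so it moves to its own relation together with a copy of its determinant sid.

```python
# 1NF relation with key (sid, course); 'name' depends only on sid.
enrol_1nf = [
    {"sid": "S1", "name": "Asha", "course": "DBMS", "grade": "A"},
    {"sid": "S1", "name": "Asha", "course": "OS", "grade": "B"},
    {"sid": "S2", "name": "Ravi", "course": "DBMS", "grade": "A"},
]

# 2NF decomposition: the partial dependency sid -> name gets its own relation.
student = {(r["sid"], r["name"]) for r in enrol_1nf}
enrolment = {(r["sid"], r["course"], r["grade"]) for r in enrol_1nf}

print(sorted(student))    # each student's name is now stored exactly once
print(sorted(enrolment))
```

Storing the name once removes the update anomaly: renaming a student no longer requires touching every enrolment row.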
Third Normal Form (3NF)
2NF and no transitive dependencies. A transitive dependency is a functional dependency
between two or more non-key attributes.
Based on the concept of transitive dependency: if A, B, and C are attributes of a relation
such that A → B and B → C, then C is transitively dependent on A through B (provided
that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key.
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on
the key.
3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively
dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally
dependent.
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is in
4NF if and only if it is in BCNF and all its multivalued dependencies are functional
dependencies. 4NF removes an unwanted kind of data structure: multivalued dependencies.
For example, in a relation Course(course, teacher, book) where the teachers and the
recommended books of a course vary independently of each other, the relation satisfies the
multivalued dependencies course ↠ teacher and course ↠ book.
For a relation to be in fourth normal form, one of these conditions must hold:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
(d) What are inference axioms? Explain their significance in relational
database design.
Ans: Inference Axioms (A-axioms, or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X.
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ.
F3 Additivity: if X → Y and X → Z, then X → YZ.
F4 Projectivity: if X → YZ, then X → Y.
F5 Transitivity: if X → Y and Y → Z, then X → Z.
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W.
Examples of the use of the inference axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City (given)
2. Street Zip → Street City (augmentation of (1) by Street)
3. City Street → Zip (given)
4. City Street → City Street Zip (augmentation of (3) by City Street)
5. Street Zip → City Street Zip (transitivity, from (2) and (4))
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E (given)
2. AB → AB (reflexivity)
3. AB → B (projectivity, from (2))
4. AB → BE (additivity, from (1) and (3))
5. BE → I (given)
6. AB → I (transitivity, from (4) and (5))
7. E → G (given)
8. AB → G (transitivity, from (1) and (7))
9. AB → GI (additivity, from (6) and (8))
10. GI → H (given)
11. AB → H (transitivity, from (9) and (10))
12. AB → GH (additivity, from (8) and (11))
Significance in relational database design: a relational database is a database structure, commonly
used in GIS, in which data is stored in two-dimensional tables where multiple relationships between
data elements can be defined and established in an ad-hoc manner. A relational database management
system is a database system made up of files with data elements in two-dimensional arrays (rows
and columns); it has the capability to recombine data elements to form different relations, resulting
in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables that
are manipulated a set at a time rather than a record at a time.
SQL is used to manipulate relational databases. The relational model was proposed by Dr. Codd in 1970
and is the basis for the relational database management system (RDBMS).
The relational model contains the following components:
a collection of objects or relations, and
a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that
is locked by the other. It can be dealt with in two ways: one is to take measures which
prevent deadlocks from happening, and the other is to provide ways to break a deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or to
nothing. Alternatively, deadlocks can sometimes be avoided by setting a resource-access order,
meaning resources must be locked in a certain order to prevent such instances. Once a
deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the
DBMS must select a victim transaction to cancel and revert that entire transaction so that the
resources it held become available, allowing one transaction to complete while the other is
reprocessed at a later time.
Explain the meaning of the expression ACID transaction.
ACID stands for Atomicity, Consistency, Isolation, and Durability. When any transaction happens,
it should be atomic: it should either be complete or fully incomplete; there should not
be anything like semi-complete. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another.
Durability means that once a transaction commits, its effects will persist even if there are
system failures.
What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the
process of being changed. Their purpose is to ensure consistency throughout the database. For
example, if I am changing a row which affects the calculations or outputs of several other rows,
then all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked against changes until my change is complete. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, can also affect how those rows are read. So let's
say I am processing a change to the tax rate in my state; my store clerk should not be able
to read the total cost of a blue shirt, because the total-cost row is affected by any change in
the tax-rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not yet been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
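The resource-access-order idea can be sketched with two threads standing in for two transactions (the lock names and functions are invented for illustration): because both acquire the locks in the same agreed order, the circular wait needed for a deadlock can never form.

```python
import threading

a_lock, b_lock = threading.Lock(), threading.Lock()

def transfer_ab():
    # Global lock order: always a_lock before b_lock.
    with a_lock:
        with b_lock:
            pass  # work on both resources

def transfer_ba():
    # Logically touches B "first", but still locks in the agreed order,
    # so it cannot deadlock against transfer_ab.
    with a_lock:
        with b_lock:
            pass

t1 = threading.Thread(target=transfer_ab)
t2 = threading.Thread(target=transfer_ba)
t1.start(); t2.start()
t1.join(); t2.join()
print("both transactions finished without deadlock")
```

If transfer_ba instead took b_lock first, the two threads could each hold one lock and wait forever for the other, which is exactly the circular-wait condition the ordering rule removes.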
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive locks: this type of locking mechanism differentiates locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock, since allowing more than one transaction to write the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocol available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
Strict Two-Phase Locking
The first phase of Strict-2PL is same as 2PL After acquiring all the locks in the first phase the
transaction continues to execute normally But in contrast to 2PL Strict-2PL does not release a
lock after using it Strict-2PL holds all the locks until the commit point and releases all the locks
at a time
Strict-2PL does not have cascading abort as 2PL does
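The growing/shrinking rule of 2PL can be sketched as a toy class (illustrative only, not a real lock manager): once the transaction releases any lock, it enters the shrinking phase and any further lock request is a protocol violation.

```python
class TwoPhaseTxn:
    """Toy transaction enforcing the 2PL rule: once any lock is released
    (shrinking phase), no new lock may be acquired."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True   # the first release starts the shrinking phase
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A")
t.lock("B")      # growing phase: allowed
t.unlock("A")    # shrinking phase begins
try:
    t.lock("C")  # illegal under 2PL
except RuntimeError as e:
    print(e)
```

Strict 2PL corresponds to never calling unlock until commit, which is what prevents other transactions from reading values that might later be rolled back.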
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 00:02 would be older than all other
transactions that come after it; for example, any transaction y entering the system at 00:04 is
two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the
system know when the last read and write operations were performed on the data item.
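The read- and write-timestamp rules can be sketched as follows (a simplified basic timestamp-ordering check, without the Thomas write rule; names are illustrative): a transaction may read an item only if no younger transaction has already written it, and may write only if no younger transaction has already read or written it; otherwise it is rolled back.

```python
class Item:
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest reader so far
        self.write_ts = 0   # timestamp of the youngest writer so far

def read(item, ts):
    if ts < item.write_ts:
        return "rollback"   # a younger transaction already overwrote the item
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"   # a younger transaction already saw or wrote it
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, 5))   # ok: first write, by transaction with timestamp 5
print(read(x, 3))    # rollback: transaction 3 is older than the write at 5
print(read(x, 7))    # ok: 7 is younger than the writer
print(write(x, 6))   # rollback: transaction 7 has already read the item
```

A rolled-back transaction is typically restarted with a fresh (younger) timestamp, so it eventually succeeds.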
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls.
Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload.
Physical security of the database server and backup equipment against theft and natural
disasters.
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world: for example, to represent
the statement "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of specific humans.
Representing that all humans are mortal, and being able to reason about any given human that
they are mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna,
Mike, and hundreds of thousands of other customers are all humans with specific ages, sexes,
addresses, etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors, such as Oracle, added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However,
any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Concurrency control is therefore a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails.
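The transfer example can be sketched with SQLite's transaction support (the table, account names, and amounts are invented for illustration): both updates run inside one transaction, so a failure between them rolls the database back to a consistent state.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

def transfer(con, amount, fail_midway=False):
    # The with-block is one atomic transaction: it commits on success
    # and rolls back automatically if an exception escapes.
    with con:
        con.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                    (amount,))
        if fail_midway:
            raise RuntimeError("crash between withdrawal and deposit")
        con.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                    (amount,))

try:
    transfer(con, 60, fail_midway=True)
except RuntimeError:
    pass
# The partial withdrawal was rolled back: no money was lost or created.
print(con.execute("SELECT SUM(balance) FROM account").fetchone()[0])  # 100

transfer(con, 60)  # succeeds: both updates commit together
print(con.execute("SELECT balance FROM account WHERE name = 'B'").fetchone()[0])  # 60
```

The total balance is 100 before, after the failed attempt, and after the successful transfer, which is exactly the consistency guarantee atomicity provides here.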
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for a DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below:
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The Data Definition Language defines and declares database objects, while the Data
Manipulation Language performs operations on these objects. The Data Control Language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus, the objectives of the three level architecture proposal for DBMS are explained above.
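One common way the external level is realized in practice is through views: the conceptual level holds the full table, while each user sees only a derived subset. A minimal sketch using Python's built-in sqlite3 (table and column names are invented for illustration):

```python
import sqlite3

# Conceptual level: the full employee table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 50000), (2, "Ravi", 60000)])

# External level: a view that hides the salary column from ordinary users.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")
print(conn.execute("SELECT * FROM employee_public").fetchall())
```

Because users query the view rather than the base table, the DBA can reorganize the underlying storage or add columns without affecting this external view, which is the data-independence objective listed above.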
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files, data items, storage
details of each file, mapping information, constraints, etc.
2 DML Compiler and Query Optimizer - The DML commands (insert, update, delete,
retrieve) from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized into the best way to execute the query by
the query optimizer, and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are -
It converts operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (together known as the Query Processor), from the user's logical view
to the physical file system.
It controls access to the DBMS information that is stored on disk.
It controls the handling of buffers in main memory.
It enforces constraints to maintain the consistency and integrity of the data.
It synchronizes the simultaneous operations performed by concurrent users.
It controls the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities and their access rights.
6 Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation
and accuracy, and may be used as an important part of the DBMS.
Importance of the Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high-level queries into low-level file access
commands known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naiumlve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naive users: Naive users need not be aware of the presence of the database system or of any other system supporting their usage. A user of an automatic teller machine falls under this category: the user is instructed through each step of a transaction and responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of the ATM user, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise through the limited interaction they are permitted with the database via the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structures and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g. from optical to magnetic storage, from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating of the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources needlessly used.
• Difficulty in combining information.
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data, so we need to remove this duplication of
data across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, the availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized It is easier to control who has access to what parts of the database Different
checks can be established for each type of access (retrieve modify delete etc) to each piece
of information in the database
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work the most
important, and therefore its own needs the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. It may therefore become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 A Data Model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for backup and
recovery from failures, including disk crashes, power failures and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods can be very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process in which all business managers (or their designates)
should be involved, and it should be validated with a "bottom-up" approach. It has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many (1 : M)
Many to one (M : 1)
Many to many (M : N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
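One common way the Customer entity above is mapped to relations: composite attributes are flattened into simple columns, and the multivalued phone_number goes in its own table. A sketch using Python's built-in sqlite3 (the mapping shown is one conventional choice, not the only one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,  -- composite: name
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,                  -- composite: address
    street_name TEXT, street_number TEXT, apartment_number TEXT
);
CREATE TABLE customer_phone (                              -- multivalued attribute
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Jane', 'Doe')")
conn.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                 [(1, '555-0100'), (1, '555-0199')])
# One customer, two phone numbers: the multivalued attribute needs its own rows.
print(conn.execute("SELECT COUNT(*) FROM customer_phone "
                   "WHERE customer_id = 1").fetchone()[0])
```

A derived attribute such as age would not be stored at all; it would be computed from date_of_birth at query time.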
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
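The stud_name example above can be sketched as a secondary index: a mapping from a non-unique key to every matching record, unlike the primary index, which maps a unique key to exactly one record (the student data is invented):

```python
# Sketch of secondary key retrieval: an index on the non-unique attribute
# stud_name returns the set of all matching records.
students = [
    {"roll_no": 1, "stud_name": "Amit"},
    {"roll_no": 2, "stud_name": "Priya"},
    {"roll_no": 3, "stud_name": "Amit"},
]

# Build the secondary index: stud_name -> list of primary keys (roll numbers).
secondary_index = {}
for rec in students:
    secondary_index.setdefault(rec["stud_name"], []).append(rec["roll_no"])

print(secondary_index["Amit"])  # multiple records satisfy one key value: [1, 3]
```

Point (ii) above is visible directly: a single secondary key value ("Amit") maps to a set of records, not one.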
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1 If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
be further non-trivially decomposed (losslessly) into smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be non-trivially decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key: to determine
the item you must know the buyer and vendor, to determine the vendor you must know the buyer and
the item, and to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
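The decomposition can be checked mechanically. The sketch below projects the sample Buying data onto the three two-column tables and rejoins them; for this instance the three-way join reproduces the original relation exactly, which is what the join dependency behind 5NF requires:

```python
from itertools import product

# The sample Buying relation from the table above.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
buyer_vendor = {(b, v) for b, v, _ in buying}
buyer_item   = {(b, i) for b, _, i in buying}
vendor_item  = {(v, i) for _, v, i in buying}

# Natural join of all three projections: keep a triple only if each of its
# three pairs appears in the corresponding projection.
rejoined = {
    (b, v, i)
    for (b, v), (v2, i) in product(buyer_vendor, vendor_item)
    if v == v2 and (b, i) in buyer_item
}
print(rejoined == buying)  # True: the decomposition is lossless here
```

If the join dependency did not hold, the rejoin would contain spurious tuples not present in the original; 5NF guarantees that every such decomposition onto the candidate keys joins back losslessly.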
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Fig.: Architecture of an IMS system - application programs A and B (host language + DL/I), each with its PSB containing PCBs, mapped through DBDs to the IMS control program]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD), which also defines the mapping of the physical database to storage.
The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs; IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency The value of one attribute (the determinant)
determines the value of another attribute
Candidate Key A possible key
Each non-key field is functionally dependent on every candidate key
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of the functional dependencies used in
normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and has the property that every functional dependency in Y is implied by the
functional dependencies in X.
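The definition above can be checked mechanically: X → Y holds in a relation instance iff no two rows agree on X but disagree on Y. A minimal sketch (the relation and attribute names are invented for illustration):

```python
# Check whether the functional dependency lhs -> rhs holds in a set of rows:
# it fails exactly when two rows share the same lhs values but differ on rhs.
def holds_fd(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant, different dependent values
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "dept_head": "Asha"},
    {"emp_id": 2, "dept": "Sales", "dept_head": "Asha"},
    {"emp_id": 3, "dept": "HR",    "dept_head": "Ravi"},
]
print(holds_fd(emp, ["dept"], ["dept_head"]))   # True: dept determines dept_head
print(holds_fd(emp, ["dept_head"], ["emp_id"])) # False: Asha maps to two emp_ids
```

Note that an instance can only refute a dependency, never prove it "for all time"; the check above confirms that a proposed FD is at least consistent with the data at hand.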
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multi-valued dependencies are in fact functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
Either of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it deals with
multivalued dependencies.
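The idea can be illustrated with a small invented relation (not from the question paper): in course(course, teacher, book), teachers and books vary independently, so course ->> teacher and course ->> book are non-trivial MVDs, and 4NF calls for splitting the table in two:

```python
from itertools import product

# A relation violating 4NF: every teacher of "DB" is paired with every book,
# because the two attributes are independent given the course.
ctb = {
    ("DB", "Asha", "Date"), ("DB", "Asha", "Korth"),
    ("DB", "Ravi", "Date"), ("DB", "Ravi", "Korth"),
}

# 4NF decomposition: one table per multivalued dependency.
course_teacher = {(c, t) for c, t, _ in ctb}
course_book    = {(c, b) for c, _, b in ctb}

# Joining the two projections on course reconstructs the original relation.
rejoined = {(c, t, b)
            for (c, t), (c2, b) in product(course_teacher, course_book)
            if c == c2}
print(rejoined == ctb)  # True: the MVD makes this two-way join lossless
```

The undecomposed table must store 2 × 2 combination rows; after the 4NF split, adding a new teacher or book means inserting one row instead of several, which is the redundancy 4NF removes.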
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and of the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could retrieve a user's account information and
efficiently provide extensive details such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log, so that you can
recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is faster, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans
Q5
EITHER
(a) Define Normalization. Explain first and second normal form.
Ans: Normalization is the process of decomposing unsatisfactory "bad" relations by
breaking up their attributes into smaller relations.
Normalization is carried out in practice so that the resulting designs are of high quality
and meet the desirable properties
Normalization in industry pays particular attention to
normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent
on the key
Unnormalized Form (UNF)
A table that contains one or more repeating groups.
To create an unnormalized table:
transform data from the information source (e.g., a form) into table format with columns
and rows
First Normal Form (1NF)
A relation in which the intersection of each row and column contains one and only one value
If a table of data meets the definition of a relation it is in first normal form
Every relation has a unique name
Every attribute value is atomic (single-valued)
Every row is unique
Attributes in tables have unique names
The order of the columns is irrelevant
The order of the rows is irrelevant
UNF to 1NF
Nominate an attribute or group of attributes to act as the key for the unnormalized table
Identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s)
Remove the repeating group by:
entering appropriate data into the empty columns of rows containing the repeating
data ('flattening' the table)
Or by:
placing the repeating data, along with a copy of the original key attribute(s), into a
separate relation
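The second option above, placing the repeating data with a copy of the key into a separate relation, can be sketched in Python (the student/course data is invented for illustration):

```python
# UNF table: one row per student, with a repeating group of courses.
unf = [
    {"student_id": 1, "name": "Ann", "courses": ["DBMS", "OS"]},
    {"student_id": 2, "name": "Raj", "courses": ["DBMS"]},
]

# 1NF: move the repeating group into a separate relation,
# carrying a copy of the original key (student_id).
students = [{"student_id": r["student_id"], "name": r["name"]} for r in unf]
enrolments = [
    {"student_id": r["student_id"], "course": c}
    for r in unf
    for c in r["courses"]
]

print(students)    # every attribute value is now atomic
print(enrolments)  # one row per (student, course) pair
```

After the split, every intersection of a row and column holds a single value, so both relations satisfy 1NF.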
Second Normal Form (2NF)
Based on the concept of full functional dependency:
A and B are attributes of a relation.
B is fully dependent on A if B is functionally dependent on A but not on any
proper subset of A.
2NF: a relation that is in 1NF and in which every non-primary-key attribute is fully
functionally dependent on the primary key.
Equivalently: 1NF and no partial functional dependencies.
Partial functional dependency: one or more non-key attributes are functionally
dependent on part of the primary key.
Every non-key attribute must be defined by the entire key, not just by part of the key.
If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF
Identify the primary key for the 1NF relation
Identify the functional dependencies in the relation
If partial dependencies on the primary key exist, remove them by placing them in a new
relation along with a copy of their determinant
Third Normal Form (3NF)
2NF and no transitive dependencies
Transitive dependency: a functional dependency between two or more non-key attributes
Based on the concept of transitive dependency:
A, B, and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A through B (provided that A is not functionally
dependent on B or C)
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key
attribute is transitively dependent on the primary key
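A small worked sketch of the 1NF-to-2NF step (the relation, key, and dependency are invented): in R(order_id, product_id, product_name, qty) with key {order_id, product_id} and product_id → product_name, product_name is only partially dependent on the key, so it moves to a new relation together with its determinant:

```python
# 1NF relation with a partial dependency:
# key = (order_id, product_id), but product_id -> product_name.
order_lines = [
    {"order_id": 1, "product_id": "P1", "product_name": "Pen", "qty": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Pad", "qty": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Pen", "qty": 5},
]

# 2NF: place the partially dependent attribute in a new relation
# along with a copy of its determinant (product_id).
products = {r["product_id"]: r["product_name"] for r in order_lines}
order_lines_2nf = [
    {"order_id": r["order_id"], "product_id": r["product_id"], "qty": r["qty"]}
    for r in order_lines
]

print(sorted(products.items()))  # the redundant name is now stored once per product
```

Note how the decomposition also removes the redundancy: "Pen" was stored twice in the 1NF table and only once afterwards.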
OR
(c) Explain multivalued dependency with a suitable example.
As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
Ans
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally
dependent
Fourth Normal Form (4NF)
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies
of attribute sets on anything other than a superset of a candidate key. A table is said to be in
4NF if and only if it is in BCNF and every non-trivial multivalued dependency is also a functional
dependency. 4NF removes unwanted data structures: multivalued dependencies.
Either of these conditions must hold in order for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
uses multivalued dependencies.
Example: in a relation R(Course, Teacher, Book), the set of teachers and the set of textbooks for
a course are independent of each other, so Course →→ Teacher and Course →→ Book are multivalued
dependencies; R must be decomposed into (Course, Teacher) and (Course, Book) to reach 4NF.
(d) What are inference axioms? Explain their significance in Relational
Database Design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City (given)
2. Street Zip → Street City (augmentation of (1) by Street)
3. City Street → Zip (given)
4. City Street → City Street Zip (augmentation of (3) by City Street)
5. Street Zip → City Street Zip (transitivity of (2) and (4))
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E (given)
2. AB → AB (reflexivity)
3. AB → B (projectivity from (2))
4. AB → BE (additivity from (1) and (3))
5. BE → I (given)
6. AB → I (transitivity from (4) and (5))
7. E → G (given)
8. AB → G (transitivity from (1) and (7))
9. AB → GI (additivity from (6) and (8))
10. GI → H (given)
11. AB → H (transitivity from (9) and (10))
12. AB → GH (additivity from (8) and (11))
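Derivations like this can be checked mechanically: compute the attribute closure X+ under F, and X → Y holds iff Y ⊆ X+. A small sketch using the same F:

```python
# Attribute closure under a set of functional dependencies.
def closure(attrs, fds):
    """fds: list of (lhs, rhs) pairs, each a set of attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is in the closure, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F from the Maier example: AB->E, AG->J, BE->I, E->G, GI->H
F = [
    ({"A", "B"}, {"E"}),
    ({"A", "G"}, {"J"}),
    ({"B", "E"}, {"I"}),
    ({"E"}, {"G"}),
    ({"G", "I"}, {"H"}),
]

ab_plus = closure({"A", "B"}, F)
print(sorted(ab_plus))        # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print({"G", "H"} <= ab_plus)  # True, so AB -> GH holds
```

The loop simply applies the axioms to a fixpoint, which is why closure computation is the standard way to test whether an FD follows from F.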
Significance in Relational Database design: A relational database is a database structure, commonly used in GIS, in
which data is stored in two-dimensional tables and multiple relationships between data
elements can be defined and established in an ad hoc manner. A Relational Database Management
System is a database system made up of files with data elements in a two-dimensional array (rows
and columns). This database management system has the capability to recombine data elements
to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables.
Tables are manipulated a set at a time rather than a record at a time.
SQL is used to manipulate relational databases. The relational model was proposed by Dr. Codd in 1970
and is the basis for the relational database management system (RDBMS).
The relational model contains the following components:
a collection of objects or relations
a set of operations to act on the relations
Q5
EITHER
(a) What is a deadlock? How can it be avoided? How can it be
resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that
is locked by the other. It can be handled in two ways: one is to set measures which
prevent deadlocks from happening, and the other is to set ways in which to break the deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or to
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource-access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock;
to resolve it, the DBMS must select a transaction to cancel and revert that entire
transaction until the resources required become available, allowing one transaction to
complete while the other is reprocessed at a later time.
9.21 Explain the meaning of the expression ACID transaction.
ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either complete fully or not at all; there should
not be anything like a semi-complete transaction. The database state should remain consistent after the
completion of the transaction. If there is more than one transaction, the transactions
should be scheduled in such a fashion that they remain in isolation from one another. Durability
means that once a transaction commits, its effects will persist even if there are system failures.
9.24 What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being
changed. Their purpose is to ensure consistency throughout the database. For example, if I
am changing a row which affects the calculations or outputs of several other rows, then
all rows that are affected, or possibly affected, by a change in the row I am working on will
be locked from changes until I am finished with my change. This isolates the change and
ensures that the data interaction remains accurate and consistent, and is known as
transaction-level consistency. The transaction being changed, which may affect several
other pieces of data or rows of input, could also affect how those rows are read. So let's
say I am processing a change to the tax rate in my state; my store clerk should not be able
to read the total cost of a blue shirt, because the total-cost row is affected by any change in
the tax-rate row. Essentially, how you deal with the reading and viewing of data while a
change is being processed but has not been committed is known as the transaction
isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction
being committed.
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks: a lock on a data item can be in two states; it is either locked or
unlocked.
Shared/exclusive: this type of locking mechanism differentiates the locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it
only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by
the transaction; the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it: Strict-2PL holds all the locks until the commit point and releases them all
at once.
Strict-2PL therefore does not suffer the cascading aborts that 2PL can.
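The growing/shrinking discipline can be illustrated with a toy, single-threaded sketch (the class and method names are invented; a real lock manager must also arbitrate conflicts between transactions):

```python
# Minimal sketch of the two-phase rule: once a transaction releases
# its first lock (shrinking phase), it may not acquire any new locks.
class TwoPhaseTxn:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # becomes True after the first release

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: 2PL violated - lock after release")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # the growing phase is over
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.lock("A")
t.lock("B")      # growing phase: allowed
t.unlock("A")    # shrinking phase begins
try:
    t.lock("C")  # violates two-phase locking
except RuntimeError as e:
    print(e)
```

Strict-2PL would additionally defer every `unlock` until commit, which is what prevents cascading aborts.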
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol This protocol
uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 00:02 would be older than all other
transactions that come after it; for example, any transaction y entering the system at 00:04 is
two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read and write timestamp. This lets the system
know when the last 'read' and 'write' operation was performed on the data item.
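These per-item read/write timestamps drive the basic timestamp-ordering checks. A simplified sketch (it ignores Thomas's write rule and recovery concerns; the names are illustrative):

```python
# Basic timestamp-ordering checks: an operation arriving "too late"
# relative to a younger transaction forces a rollback.
class Item:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far

def read(item, ts):
    if ts < item.write_ts:           # a younger txn already wrote: too late
        return "rollback"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"            # a younger txn already read or wrote
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, ts=2))   # ok: first write, by the txn with timestamp 2
print(read(x, ts=1))    # rollback: txn 1 is older than the last writer
print(read(x, ts=3))    # ok: txn 3 is younger
print(write(x, ts=2))   # rollback: txn 3 has already read x
```

The older transaction is always the one allowed to proceed, matching the priority rule described above.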
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in the database
The database server
The database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d)Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example, to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of tables that
represented information about specific humans. Representing that all humans are mortal, and
being able to reason about any given human that they are mortal, is the work of a knowledge
base. Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other
customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. Any practical
database, though, has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
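The transfer example can be demonstrated with Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on an exception (the account names and amounts are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # one atomic transaction: commit on success, rollback on error
        conn.execute("UPDATE account SET balance = balance - 70 WHERE name = 'A'")
        raise RuntimeError("crash before crediting B")  # simulated failure
        conn.execute("UPDATE account SET balance = balance + 70 WHERE name = 'B'")
except RuntimeError:
    pass

# The withdrawal was rolled back: money is neither lost nor created.
print(conn.execute("SELECT balance FROM account ORDER BY name").fetchall())
# [(100,), (50,)]
```

Because the failure happened inside the transaction, the partial withdrawal from A never becomes visible, which is exactly the atomicity guarantee described above.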
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below:
External Level
This is the highest level, the one closest to the user; it is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files and data items, storage
details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete, and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute the query by
the query optimizer and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
Converts operations in users' queries, coming from the application programs or the combination of
DML compiler and query optimizer (known as the Query Processor), from the user's logical view
to the physical file system.
Controls DBMS information access that is stored on disk.
Handles buffers in main memory.
Enforces constraints to maintain the consistency and integrity of the data.
Synchronizes the simultaneous operations performed by concurrent users.
Controls the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - the description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation,
and accuracy; it may be used as an important part of the DBMS.
Importance of Data Dictionary - the data dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and of the design decisions.
5. Data Files - They contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve users: users who need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus, even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, PASCAL, or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the spatial database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e., program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g., changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g., from optical to magnetic storage, or from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted in entering the same data again and again
• Needless use of computer resources
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data, so this duplication of data across multiple files needs to be removed to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, the use of a DBMS should allow users who do not know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not change when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates may sometimes lead to the entry of incorrect data in some of the files where it exists.
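The point about centrally enforced integrity constraints can be sketched with SQLite; the table and column names here are illustrative assumptions, not part of the question. Every application using the database gets the same checks for free:

```python
import sqlite3

# Hedged sketch: hypothetical dept/emp tables showing how a DBMS enforces
# integrity constraints centrally instead of in every application program.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE emp (
    emp_id  INTEGER PRIMARY KEY,
    salary  REAL CHECK (salary > 0),          -- domain constraint
    dept_id INTEGER REFERENCES dept(dept_id)  -- referential integrity
)""")
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO emp VALUES (100, 50000.0, 1)")    # accepted

try:
    conn.execute("INSERT INTO emp VALUES (101, -10.0, 1)")  # violates CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)

try:
    conn.execute("INSERT INTO emp VALUES (102, 40000.0, 9)")  # no such dept
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Both bad inserts are rejected by the DBMS itself, so no application can slip inconsistent data past the constraints.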
6. Standards can be enforced - Since all access to the database must go through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to which parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. The organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages developed with DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand; the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes for backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods involved are very complex.
QUE 2 - EITHER
(A) Explain the E-R model with a suitable example.
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Building the model is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types:
Simple/single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many: 1 ------< M
Many-to-one: M >------ 1
Many-to-many: M >-----< M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Entity: Customer, with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
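The Customer entity above can be mapped to a relational table. Here is a minimal sketch using Python's sqlite3; flattening the composite attributes into simple columns is one possible mapping, chosen for illustration:

```python
import sqlite3

# Sketch: the Customer entity mapped to a table. Composite attributes
# (name, address, street) are flattened into simple columns;
# customer_id is the primary key. Column names follow the example above.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT, middle_name TEXT, last_name TEXT,  -- composite: name
    phone_number     TEXT,
    date_of_birth    TEXT,
    city TEXT, state TEXT, zip_code TEXT,                     -- composite: address
    street_name TEXT, street_number TEXT, apartment_number TEXT
)""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name, city) "
             "VALUES (1, 'Asha', 'Rao', 'Nagpur')")
print(conn.execute("SELECT first_name, city FROM customer").fetchone())
# ('Asha', 'Nagpur')
```

A multivalued attribute (for example, several phone numbers per customer) could not be flattened this way; it would go into a separate table keyed by customer_id.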
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
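A minimal sketch of the example above, assuming a hypothetical student table: the primary key stud_id identifies exactly one record, while a secondary index on stud_name can match several:

```python
import sqlite3

# Sketch of secondary-key retrieval: stud_id is the primary key, while
# stud_name is a secondary key on which several records may match.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, stud_name TEXT)")
conn.execute("CREATE INDEX idx_name ON student(stud_name)")  # secondary index
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Amit"), (2, "Priya"), (3, "Amit")])

rows = conn.execute(
    "SELECT stud_id FROM student WHERE stud_name = 'Amit'").fetchall()
print(rows)  # two records satisfy the same secondary-key value
```

Primary-key retrieval returns at most one record; the secondary-key lookup here returns a set of records, which is exactly point (ii) above.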
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE 3 - EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4 - EITHER
(A) What is join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and cannot be further losslessly decomposed into any number of smaller tables. Another way of expressing this is that each join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise);
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This relation is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
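The decomposition above can be checked on the sample data: project Buying onto the three two-column tables, then join them back. When the join dependency holds for an instance, the three-way join reproduces exactly the original rows. A sketch in Python, using sets of tuples:

```python
# The Buying relation from the sample data above.
buying = {("Sally", "Liz Claiborne", "Blouses"),
          ("Mary",  "Liz Claiborne", "Blouses"),
          ("Sally", "Jordach", "Jeans"),
          ("Mary",  "Jordach", "Jeans"),
          ("Sally", "Jordach", "Sneakers")}

# The three projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Three-way natural join of the projections.
rejoined = {(b, v, i)
            for b, v in buyer_vendor
            for b2, i in buyer_item if b2 == b
            for v2, i2 in vendor_item if v2 == v and i2 == i}

print(rejoined == buying)  # True: no spurious tuples for this instance
```

If Claiborne starts selling jeans, only one row per projection needs adding; in the unnormalized table, a row per buyer would be required, which is the anomaly 5NF removes.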
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Figure: IMS system architecture. Application programs A and B, each written in a host language plus DL/I, access the system through the PCBs of their respective PSBs (PSB-A, PSB-B); the IMS control program maps these external views onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined, together with its mapping to storage, by a database description (DBD). The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operations that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency, that they hold for all time, and that they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set (Y) for that relation and has the property that every functional dependency in Y is implied by the functional dependencies in X.
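As a sketch of what "the determinant determines the dependent" means on a concrete relation instance, the following hypothetical helper tests whether X → Y holds in a list of rows; the attribute names and data are illustrative assumptions:

```python
# X -> Y holds in an instance iff every pair of rows agreeing on X
# also agrees on Y.
def holds(rows, X, Y):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in X)
        y = tuple(row[a] for a in Y)
        if x in seen and seen[x] != y:
            return False          # same determinant, different dependent
        seen[x] = y
    return True

students = [
    {"stud_id": 1, "stud_name": "Amit",  "city": "Nagpur"},
    {"stud_id": 2, "stud_name": "Priya", "city": "Pune"},
    {"stud_id": 3, "stud_name": "Amit",  "city": "Nagpur"},
]
print(holds(students, ["stud_id"], ["stud_name"]))   # True: the key determines the name
print(holds(students, ["stud_name"], ["stud_id"]))   # False: "Amit" maps to both 1 and 3
```

Note the asymmetry: an instance can only refute a dependency, not prove it for all time, which is why the FDs used in normalization must "hold for all time" as stated above.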
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.

Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes are dependent between themselves. One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best configured systems, which is why you have to explore the available options in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
• The speed and size of your transaction log backups
• The degree to which you might be at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery models available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified point in time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log so that you can recover to a log mark.
• CREATE INDEX operations are logged; recovery from a transaction log backup that includes index creations is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
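The backup-and-restore idea can be illustrated in miniature with SQLite's backup API. This only sketches the general strategy; the FULL/BULK_LOGGED/SIMPLE models discussed above are specific to SQL Server, and the table here is a made-up example:

```python
import sqlite3

# Sketch: take a full backup, simulate data loss, restore from the backup.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
src.execute("INSERT INTO account VALUES (1, 100.0)")
src.commit()

backup = sqlite3.connect(":memory:")
src.backup(backup)                      # take a full backup

src.execute("DELETE FROM account")      # simulate data loss
src.commit()

backup.backup(src)                      # restore from the backup
print(src.execute("SELECT balance FROM account WHERE id = 1").fetchone())
# (100.0,)
```

What the recovery models above add on top of this basic copy is the transaction log, which allows rolling forward committed transactions made after the last full backup.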
(D) Describe deadlocks in a distributed system.
Ans:
B is fully dependent on A if B is functionally dependent on A but not on any proper subset of A.
Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every non-primary-key attribute is fully functionally dependent on the primary key; that is, 1NF plus no partial functional dependencies. A partial functional dependency occurs when one or more non-key attributes are functionally dependent on part of the primary key. Every non-key attribute must be defined by the entire key, not just by part of the key. If a relation has a single attribute as its key, then it is automatically in 2NF.
1NF to 2NF:
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Third Normal Form (3NF)
A relation is in 3NF if it is in 1NF and 2NF and no non-primary-key attribute is transitively dependent on the primary key. 3NF is based on the concept of transitive dependency: a functional dependency between two or more non-key attributes. If A, B and C are attributes of a relation such that A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
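The 2NF-to-3NF step above can be sketched on a toy relation with a transitive dependency: stud_id → dept_id and dept_id → dept_name, so dept_name depends on the key only through dept_id. All names and data here are illustrative assumptions:

```python
# Toy relation Student(stud_id, dept_id, dept_name) with a transitive
# dependency stud_id -> dept_id -> dept_name.
students = [
    (1, 10, "CS"),
    (2, 10, "CS"),      # dept_name "CS" is stored redundantly
    (3, 20, "Math"),
]

# Remove the transitive dependency by projecting it into its own relation.
student = {(sid, did) for sid, did, _ in students}      # stud_id -> dept_id
dept    = {(did, dname) for _, did, dname in students}  # dept_id -> dept_name

print(sorted(student))
print(sorted(dept))  # each dept_name now stored once per department
```

Renaming a department now means updating one row in dept rather than every matching student row, which is the update anomaly 3NF removes.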
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
1. NF2: non-first normal form
2. 1NF: R is in 1NF iff all domain values are atomic
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
5. BCNF: R is in BCNF iff every determinant is a candidate key
6. Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either there is no multivalued dependency in the relation, or there are multivalued dependencies but the attributes are dependent between themselves. One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
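A common illustration of a multivalued dependency, using a hypothetical Course(course, teacher, book) relation in which teachers and books vary independently; the names and data are assumptions for the example:

```python
# course ->> teacher and course ->> book: teachers and books for a course
# are independent, so the relation must hold every combination.
course = {("DBMS", "Rao",  "Korth"),
          ("DBMS", "Rao",  "Navathe"),
          ("DBMS", "Shah", "Korth"),
          ("DBMS", "Shah", "Navathe")}

teachers = {t for c, t, b in course}
books    = {b for c, t, b in course}

# The MVD holds for this instance: the relation equals the cross product.
print(course == {("DBMS", t, b) for t in teachers for b in books})  # True
```

The redundancy is visible: adding one new book forces one row per teacher. The 4NF decomposition splits the relation into (course, teacher) and (course, book), after which each fact is stored once.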
(d) What are inference axioms? Explain their significance in relational database design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule stating that if a relation satisfies certain FDs, then it must satisfy certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: if Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: if X → Y and X → Z, then X → YZ
F4 Projectivity: if X → YZ, then X → Y
F5 Transitivity: if X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: if X → Y and YZ → W, then XZ → W
Examples of the use of the inference axioms:
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City - Given
2. Street Zip → Street City - Augmentation of (1) by Street
3. City Street → Zip - Given
4. City Street → City Street Zip - Augmentation of (3) by City Street
5. Street Zip → City Street Zip - Transitivity of (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived from F.
1. AB → E - Given
2. AB → AB - Reflexivity
3. AB → B - Projectivity from (2)
4. AB → BE - Additivity from (1) and (3)
5. BE → I - Given
6. AB → I - Transitivity from (4) and (5)
7. E → G - Given
8. AB → G - Transitivity from (1) and (7)
9. AB → GI - Additivity from (6) and (8)
10. GI → H - Given
11. AB → H - Transitivity from (9) and (10)
12. AB → GH - Additivity from (8) and (11)
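The Maier derivation above can be checked mechanically by computing the attribute closure of AB under F: since G and H both appear in the closure, AB → GH follows. A sketch in Python, with attribute sets written as strings:

```python
# Attribute closure: repeatedly apply FDs whose left side is already
# contained in the result until nothing more can be added.
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
c = closure("AB", F)
print({"G", "H"} <= c)  # True, so AB -> GH is derivable
```

Note that J also lands in the closure (via AG → J once G is derived), echoing the source's slight inconsistency of listing J in F but not in R.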
Significance in relational database design: The inference axioms let the designer derive the complete set of functional dependencies implied by a given set, which is the basis for finding keys and for normalization. The relational model itself is a database structure, commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad hoc manner. A relational database management system (RDBMS) is a database system made up of files with data elements in two-dimensional arrays (rows and columns); it has the capability to recombine the data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables which:
• are manipulated a set at a time, rather than a record at a time;
• are manipulated using SQL.
The relational model was proposed by Dr. Codd in 1970 and is the basis for the relational database management system (RDBMS). It contains the following components:
• a collection of objects or relations;
• a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that is being locked by the other. It can be dealt with in two ways: one is to take measures which prevent deadlocks from happening, and the other is to provide ways to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or to nothing. Alternatively, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order, which prevents such cycles from forming. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a victim transaction to cancel and revert that entire transaction until the resources required become available, allowing one transaction to complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression "ACID transaction":
ACID stands for Atomicity, Consistency, Isolation, Durability. When any transaction happens, it should be atomic: it should either complete fully or not at all; there should be no such thing as a semi-complete transaction. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed.
changed Itrsquos purpose is to ensure consistency throughout the database for example if I
am changing a row which effects the calculations or outputs of several other rows then
all rows that are effected or possibly effected by a change in the row Irsquom working on will
be locked from changes until I am complete with my change This isolates the change and
ensures that the data interaction remains accurate and consistent and is known as
transaction level consistencyThe transaction being changed which may effect serveral
other pieces of data or rows of input could also effect how those rows are read So lets
say Irsquom processing a change to the tax rate inmy state so my store clerk shouldnrsquot be able
to read the total cost of a blue shirt because the total cost row is effected by any changes in
the tax rate row Essentially how you deal with the reading and viewing of data while a
change is being processed but hasnrsquot been committed is known as the transaction
isolation level Itrsquos purpose is to ensure that no one is misinformed prior to a transaction
be committed
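The resource-access-order avoidance strategy described above can be sketched with two threads that transfer between two accounts in opposite directions. Because both acquire locks in a fixed global order, no wait cycle can form; the account ids and the transfer function are illustrative assumptions:

```python
import threading

# One lock per account; ids define the global lock order (an assumption).
locks = {1: threading.Lock(), 2: threading.Lock()}

def transfer(src, dst, log):
    # Always lock in id order, regardless of transfer direction.
    first, second = sorted((src, dst))
    with locks[first]:
        with locks[second]:
            log.append((src, dst))

log = []
t1 = threading.Thread(target=transfer, args=(1, 2, log))
t2 = threading.Thread(target=transfer, args=(2, 1, log))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(log))  # 2: both transfers complete, no deadlock
```

If each thread instead locked its source account first, t1 could hold lock 1 while t2 holds lock 2, and each would wait on the other forever; that is exactly the cycle the fixed ordering rules out.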
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure the atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
• Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive locks: this type of locking mechanism differentiates the locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock, since allowing more than one transaction to write to the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts In the first
part when the transaction starts executing it seeks permission for the locks it requires The
second part is where the transaction acquires all the locks As soon as the transaction releases its
first lock the third phase starts In this phase the transaction cannot demand any new locks it
only releases the acquired locks
Two-phase locking has two phases one is growing where all the locks are being acquired by
the transaction and the second phase is shrinking where the locks held by the transaction are
being released
To claim an exclusive (write) lock a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock
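The growing/shrinking rule can be sketched in a few lines of Python (a toy illustration, not a real lock manager; the names are invented): once the transaction releases its first lock, any further lock request is rejected.

```python
# Toy sketch of the two-phase rule.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()       # items currently locked by this transaction
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            # 2PL: no new locks once the shrinking phase has begun
            raise RuntimeError("2PL violation: cannot lock after first unlock")
        self.locks.add(item)     # growing phase

    def unlock(self, item):
        self.shrinking = True    # the first release ends the growing phase
        self.locks.discard(item)
```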
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it: Strict-2PL holds all the locks until the commit point and releases them all
at one time.
Strict-2PL does not have cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either system time or a logical counter as the timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the
time of execution whereas timestamp-based protocols start working as soon as a transaction is
created
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 00:02 would be older than all other
transactions that come after it. For example, any transaction y entering the system at 00:04 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
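A simplified sketch of basic timestamp ordering (illustrative; the names are invented, and real implementations roll the transaction back and restart it rather than just returning False): each item keeps its latest read- and write-timestamps, and an operation arriving with a timestamp older than a conflicting later one is rejected.

```python
import itertools

_clock = itertools.count(1)      # logical counter used as the timestamp source

class Item:
    def __init__(self):
        self.read_ts = 0         # timestamp of the latest read
        self.write_ts = 0        # timestamp of the latest write

def new_transaction_ts():
    """Each transaction gets its timestamp when it is created."""
    return next(_clock)

def read(item, ts):
    if ts < item.write_ts:       # a younger transaction already overwrote it
        return False             # reject: the reading transaction must restart
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return False             # too-late write: the transaction must restart
    item.write_ts = ts
    return True
```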
OR
(b) Explain database security mechanisms. (8)
Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professional
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge based database system in detail
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data: Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database The knowledge-base needed to know facts about the world For example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge but instead would need to store information about thousands of tables that
represented information about specific humans Representing that all humans are mortal and
being able to reason about any given human that they are mortal is the work of a knowledge-
base Representing that George Mary Sam Jenna Mike and hundreds of thousands of other
customers are all humans with specific ages sex address etc is the work for a database[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified either as knowledge-based in the sense of an expert
system that performed automated reasoning, or as knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in DBMS for managing simultaneous
operations without their conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts which mostly occur with a multi-
user system It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of respective databases
Therefore, concurrency control is an essential element for the proper functioning of a system where two or more database transactions that require access to the same data
are executed simultaneously
(ii) Atomicity property
In database systems, atomicity (/ˌætəˈmɪsəti/; from Ancient Greek ἄτομος, translit. átomos, lit. 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs[1] A guarantee of atomicity prevents updates to the database
occurring only partially which can cause greater problems than rejecting the whole series
outright As a consequence the transaction cannot be observed to be in progress by another
database client At one moment in time it has not yet happened and at the next it has already
occurred in whole (or nothing happened if the transaction was cancelled in progress)
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and depositing it into account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state; that is, money is neither lost nor created if either of those two operations fails.
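The A-to-B transfer can be sketched with Python's sqlite3 (an illustrative toy; the account names and amounts are invented): both UPDATEs sit in one transaction, so a failure rolls the partial withdrawal back and the balances stay consistent.

```python
import sqlite3

# Invented accounts and balances for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
db.commit()

def transfer(amount):
    try:
        db.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'",
                   (amount,))
        (bal,) = db.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")  # force the whole transfer to fail
        db.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'",
                   (amount,))
        db.commit()              # both updates become durable together
    except Exception:
        db.rollback()            # atomicity: the partial withdrawal is undone

transfer(30)    # succeeds: A = 70, B = 80
transfer(999)   # fails and rolls back: balances unchanged
```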
(B) Give the three-level architecture proposal for DBMS
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data
A user's view is immune to changes made in other views
Users should not need to know physical database storage details
DBA should be able to change database storage structures without affecting the users' views
Internal structure of database should be unaffected by changes to physical aspects of storage
DBA should be able to change the conceptual structure of the database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The above three levels are explained in detail below:
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using. At the conceptual level the data is viewed without any of these constraints
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus, the objectives of the three-level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then sent to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System
The main functions of the Data Manager are -
Converts the operations in users' queries, coming from the application programs or from the combination of
the DML Compiler and Query Optimizer (together known as the Query Processor), from the user's logical view
to the physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - Data Dictionary is a repository of description of the data in the database. It
contains information about:
1 Data - names of the tables, names of attributes of each table, length of attributes, and number of rows in each table
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed
3 Constraints on data, i.e. the range of values permitted
4 Detailed information on physical database design, such as storage structure,
access paths, files and record sizes
5 Access Authorization - the description of database users, their responsibilities
and their access rights
6 Usage statistics, such as frequency of queries and transactions
The data dictionary is used to actually control the data integrity, database operation
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
Data Dictionary is necessary in the databases due to the following reasons:
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system
• It helps in documenting the database design process by storing documentation of the result of every design phase and design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high-level queries into low-level file access
commands known as compiled DML
7 End Users - The users of the database system can be classified in the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve User: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus a very competent database designer could be allowed to use a particular database system only as a naive user
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS
(D) What are the advantages of using a DBMS over the conventional
file processing system
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications and data sharing: the spatial database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, since such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating from one device to
another, e.g. from optical to magnetic storage, or from tape to disk
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updation of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data, so we need to remove this duplication of
data across multiple files to eliminate inconsistency
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests
Centralizing the data in the database also means that user can obtain new and combined
information easily that would have been impossible to obtain otherwise Also use of DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes
5 Integrity can be improved - Since the data of the organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity-constraints
In the conventional systems because the data is duplicated in multiple files so updating or
changes may sometimes lead to entry of incorrect data in some files where it exists
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems applications are developed in an
adhoctemporary manner Often different system of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult Setting up of a database makes it easier to enforce security restrictions since data is
now centralized It is easier to control who has access to what parts of the database Different
checks can be established for each type of access (retrieve modify delete etc) to each piece
of information in the database
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of its unit as the most
important and therefore considers its needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher priority need of the organization
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system Although the initial cost of setting up of a database can be large
one normally expects the overall cost of setting up a database and developing and maintaining
application programs to be far lower than for similar service using conventional systems
Since the productivity of programmers can be higher in using non-procedural languages that
have been developed with DBMS than using procedural languages
10 Data Model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for an organization be built. In
conventional systems it is more likely that files will be designed as per the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term
11 Provides backup and Recovery - Centralizing a database provides schemes such as
recovery and backups from failures, including disk crashes, power failures and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process; all business managers (or their designates) should be
involved, and the model should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship and attributes.
Many notation methods Chen was the first to become established
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address etc
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship
between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. Types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number)
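As an illustrative sketch only (the Python names below are invented; an ER model is a design notation, not code), the Customer entity's attribute types can be mirrored with dataclasses: a composite name, a multivalued phone_numbers, and a derived age computed from date_of_birth rather than stored.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                      # composite attribute: made of simple parts
    first_name: str
    last_name: str
    middle_name: str = ""

@dataclass
class Customer:
    customer_id: int             # primary-key attribute
    name: Name                   # composite attribute
    date_of_birth: date
    phone_numbers: list = field(default_factory=list)   # multivalued attribute

    @property
    def age(self) -> int:        # derived attribute: computed, never stored
        today = date.today()
        before_birthday = (today.month, today.day) < (
            self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - before_birthday
```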
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans: In sequential files, index-sequential files and direct files we have considered the retrieval and
update of data based on the primary key.
(i) We can retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
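A minimal sketch of the idea (illustrative Python; the student records are invented): a secondary index on stud_name maps each name to the set of primary keys carrying it, so one key value can return several records.

```python
from collections import defaultdict

# Invented student file: stud_id is the primary key.
students = {
    1: {"stud_name": "Asha", "dept": "MCA"},
    2: {"stud_name": "Ravi", "dept": "MCA"},
    3: {"stud_name": "Asha", "dept": "MBA"},
}

# Secondary index: stud_name -> set of primary keys with that name.
name_index = defaultdict(set)
for sid, rec in students.items():
    name_index[rec["stud_name"]].add(sid)

def find_by_name(name):
    """Secondary key retrieval: may return several matching records."""
    return [students[sid] for sid in sorted(name_index.get(name, ()))]
```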
(D) Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A,B,C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
be non-loss decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pairwise cyclical dependency means that:
You always need to know two values (pairwise)
For any one you must know the other two (cyclical)
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item
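The decomposition can be checked with a small Python sketch using the sample data above: project Buying onto the three pairwise tables, then natural-join them back. For this data the join reproduces exactly the original rows, which is what the join dependency guarantees.

```python
# Sample data from the table above, as (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three pairwise projections (the proposed 5NF tables).
buyer_vendor = {(b, v) for b, v, _ in buying}
buyer_item   = {(b, i) for b, _, i in buying}
vendor_item  = {(v, i) for _, v, i in buying}

# Natural join of the three projections on their common columns.
rejoined = {(b, v, i)
            for (b, v) in buyer_vendor
            for (b2, i) in buyer_item if b2 == b
            for (v2, i2) in vendor_item if v2 == v and i2 == i}
```

When Claiborne starts selling jeans, only the single row ('Liz Claiborne', 'Jeans') need be added to vendor_item; the join then yields the new fact for every buyer who buys jeans from Claiborne.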
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Fig: IMS system structure - application programs A and B (host language + DL/I) access the data
through PCBs grouped into PSBs (PSB-A, PSB-B); the IMS control program uses the DBDs to map
the requests onto the physical databases]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:

1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:

1 PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written on-line application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of functional dependencies used in normalization:
- They have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency.
- They hold for all time.
- They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
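To make the definition concrete, here is a small illustrative Python helper; the function name and the staff rows are hypothetical examples, not from the question paper. It tests whether a functional dependency X → Y holds in a given set of rows:

```python
# Hypothetical helper: check whether the FD lhs -> rhs holds in a list of row dicts.
def holds_fd(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, different dependent value
    return True

# Assumed sample data: staff_no determines branch, but branch does not determine salary.
staff = [
    {"staff_no": "S1", "branch": "B1", "salary": 30000},
    {"staff_no": "S2", "branch": "B1", "salary": 35000},
]
print(holds_fd(staff, ["staff_no"], ["branch"]))  # True
print(holds_fd(staff, ["branch"], ["salary"]))    # False
```

A check like this can only refute a dependency from sample data; that an FD "holds for all time" is a statement about the semantics of the relation, not about any one extension of it.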
(D) Explain 4NF with examples.
Ans: Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF; we will pay particular attention up to 3NF. The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form which has known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form.
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and every non-trivial multi-valued dependency is in fact a functional dependency. 4NF thus removes an unwanted data structure: multi-valued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
- There is no multivalued dependency in the relation, or
- There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
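The standard course/teacher/book illustration of a multivalued dependency can be sketched in Python (the sample values are assumed for illustration). With course ->> teacher and course ->> book, every teacher must be paired with every book, so the 4NF decomposition into two tables removes the redundancy while remaining lossless:

```python
# Classic MVD example: CTX(course, teacher, book), with course ->> teacher and course ->> book.
ctx = {
    ("Physics", "Green", "Mechanics"),
    ("Physics", "Green", "Optics"),
    ("Physics", "Brown", "Mechanics"),
    ("Physics", "Brown", "Optics"),
}

# 4NF decomposition: one table per multivalued fact.
course_teacher = {(c, t) for c, t, b in ctx}
course_book    = {(c, b) for c, t, b in ctx}

# The join of the two projections recovers the original relation (lossless).
rejoined = {(c, t, b)
            for c, t in course_teacher
            for c2, b in course_book if c2 == c}
print(rejoined == ctx)  # True
```

In the decomposed design, adding a new book for Physics is one insert into course_book instead of one insert per teacher, which is exactly the update anomaly 4NF eliminates.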
Q5
EITHER
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
- Most object databases offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
- Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
- Another area of variation between products is the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
- Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
- Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
- The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions and account entries.
C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
- The speed and size of your transaction log backups.
- The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
1NF to 2NF
- Identify the primary key for the 1NF relation.
- Identify the functional dependencies in the relation.
- If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Third Normal Form (3NF)
A relation is in 3NF if it is in 2NF and has no transitive dependencies.
Transitive dependency: a functional dependency between two or more non-key attributes. 3NF is based on the concept of transitive dependency: if A, B, and C are attributes of a relation such that A → B and B → C, then C is transitively dependent on A through B (provided that A is not functionally dependent on B or C).
3NF: a relation that is in 1NF and 2NF and in which no non-primary-key attribute is transitively dependent on the primary key.
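Removing a transitive dependency can be illustrated with a small Python sketch; the staff/branch attribute names and values are assumed examples. With staff_no → branch and branch → branch_addr, the address is transitively dependent on staff_no, so it is moved out with its determinant:

```python
# Hypothetical 1NF relation with a transitive dependency staff_no -> branch -> branch_addr.
staff = [
    {"staff_no": "S1", "branch": "B1", "branch_addr": "Main St"},
    {"staff_no": "S2", "branch": "B1", "branch_addr": "Main St"},
    {"staff_no": "S3", "branch": "B2", "branch_addr": "High St"},
]

# 3NF decomposition: keep staff_no -> branch, and move branch -> branch_addr
# into its own relation keyed by the determinant (branch).
staff_branch = [{"staff_no": r["staff_no"], "branch": r["branch"]} for r in staff]
branch = {r["branch"]: r["branch_addr"] for r in staff}

print(branch)  # the address is now stored once per branch, not once per staff member
```

After the split, changing a branch address is a single update, removing the update anomaly that the transitive dependency caused.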
OR
(c) Explain multivalued dependency with a suitable example.
Ans:
1. NF2: non-first normal form.
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and every non-trivial multi-valued dependency is in fact a functional dependency. 4NF thus removes an unwanted data structure: multi-valued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
- There is no multivalued dependency in the relation, or
- There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
(d) What are inference axioms? Explain their significance in relational database design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule that states that if a relation satisfies certain FDs, then it must satisfy certain other FDs:
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of inference axioms:
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City — given
2. Street Zip → Street City — augmentation of (1) by Street
3. City Street → Zip — given
4. City Street → City Street Zip — augmentation of (3) by City Street
5. Street Zip → City Street Zip — transitivity of (2) and (4)
[From Maier]
1. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F:
1. AB → E — given
2. AB → AB — reflexivity
3. AB → B — projectivity from (2)
4. AB → BE — additivity from (1) and (3)
5. BE → I — given
6. AB → I — transitivity from (4) and (5)
7. E → G — given
8. AB → G — transitivity from (1) and (7)
9. AB → GI — additivity from (6) and (8)
10. GI → H — given
11. AB → H — transitivity from (9) and (10)
12. AB → GH — additivity from (8) and (11)
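The same result can be checked mechanically with the standard attribute-closure algorithm. The following Python sketch computes the closure of AB under F and confirms that it contains G and H, so AB → GH follows:

```python
# Closure of an attribute set under a set of FDs (each FD is a (lhs, rhs) pair of strings).
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is contained in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
print(sorted(closure("AB", F)))  # ['A', 'B', 'E', 'G', 'H', 'I', 'J'] -- contains G and H
```

Since G and H are both in the closure of AB, additivity gives AB → GH, matching the step-by-step derivation above.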
Significance in relational database design: A relational database is a database structure, commonly used in GIS, in which data is stored in two-dimensional tables, and in which multiple relationships between data elements can be defined and established in an ad-hoc manner. A relational database management system (RDBMS) is a database system made up of files with data elements in two-dimensional arrays (rows and columns); it has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
- Tables are manipulated a set at a time, rather than a record at a time.
- SQL is used to manipulate relational databases. The relational model was proposed by Dr. Codd in 1970.
- It is the basis for the relational database management system (RDBMS).
- The relational model contains the following components:
  - A collection of objects or relations.
  - A set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that is being locked by the other. It can be handled in two ways: by setting measures that prevent deadlocks from happening, or by setting ways to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a victim transaction and roll that entire transaction back until the resources it held become available, allowing one transaction to complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression "ACID transaction".
Ans: ACID means Atomicity, Consistency, Isolation, Durability. Any transaction should be atomic: it should either complete fully or have no effect at all; there is no such thing as a semi-complete transaction. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, they should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected (or possibly affected) by my change will be locked from changes until my change is complete. This isolates the change and ensures that the data interaction remains accurate and consistent, and is known as transaction-level consistency. The transaction being changed may also affect how other rows are read: say I am processing a change to the tax rate in my state; my store clerk should not be able to read the total cost of a blue shirt, because the total cost row is affected by any change in the tax rate row. Essentially, how the reading and viewing of data is handled while a change is being processed but has not yet been committed is known as the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
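The deadlock-detection step described in this answer is usually implemented with a wait-for graph: a cycle in the graph means a deadlock exists and a victim must be chosen. A minimal Python sketch (transaction names are assumed):

```python
# Deadlock detection via a wait-for graph cycle check (depth-first search).
def has_deadlock(wait_for):
    """wait_for maps each transaction to the set of transactions it is waiting on.
    A cycle in this graph means a deadlock."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:
                return True            # back edge: cycle found
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in list(wait_for))

# T1 waits on T2 and T2 waits on T1: the classic two-transaction deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))   # False
```

In a real DBMS the graph is maintained by the lock manager and checked periodically; when a cycle is found, one transaction on the cycle is aborted as the victim.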
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
- Lock-based protocols
- Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
- Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
- Shared/exclusive locks: this type of locking mechanism differentiates the locks based on their uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock; allowing more than one transaction to write on the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking has two phases: a growing phase, where all the locks are being acquired by the transaction, and a shrinking phase, where the locks held by the transaction are being released. To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: it holds all the locks until the commit point and releases them all at once. Strict-2PL does not have cascading aborts, as 2PL does.
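A minimal sketch of a shared/exclusive lock table in the spirit of Strict-2PL (the class and method names are assumptions, and blocking/queueing of waiters is omitted; a refused request would wait or abort in a real system):

```python
# Minimal shared/exclusive lock table; release_all models Strict-2PL's release-at-commit.
class LockTable:
    def __init__(self):
        self.locks = {}  # item -> (mode "S" or "X", set of holder transactions)

    def acquire(self, txn, item, mode):
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)           # shared locks are compatible with each other
            return True
        if holders == {txn}:           # lone holder may upgrade S -> X
            self.locks[item] = ("X" if mode == "X" else held_mode, holders)
            return True
        return False                   # conflict: caller must wait or abort

    def release_all(self, txn):
        # Strict-2PL: all locks are released together at commit/abort time.
        for item in list(self.locks):
            mode, holders = self.locks[item]
            holders.discard(txn)
            if not holders:
                del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))  # True
print(lt.acquire("T2", "A", "S"))  # True: shared locks are compatible
print(lt.acquire("T2", "A", "X"))  # False: T1 also holds a shared lock on A
```

Only after `lt.release_all("T1")` (T1's commit) could T2's upgrade to an exclusive lock succeed, which is exactly the behavior Strict-2PL prescribes.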
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created. Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all other transactions that come after it; for example, any transaction 'y' entering the system at 0004 is two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system know when the last 'read' and 'write' operations were performed on the data item.
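The read/write checks of basic timestamp ordering can be sketched as follows (the `Item` structure and function names are assumptions, and the Thomas write rule is omitted). A transaction whose timestamp is too old to respect a younger transaction's access is aborted:

```python
# Basic timestamp-ordering checks on a single data item.
class Item:
    def __init__(self):
        self.read_ts = 0   # latest read timestamp
        self.write_ts = 0  # latest write timestamp

def read(item, ts):
    if ts < item.write_ts:
        return "abort"             # a younger transaction already wrote the item
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"             # would invalidate a younger transaction's read/write
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, 5))   # ok
print(read(x, 3))    # abort: transaction 3 is older than the writer (ts 5)
print(read(x, 8))    # ok
print(write(x, 6))   # abort: transaction 8 already read x
```

An aborted transaction is restarted with a fresh (younger) timestamp, so the schedule always stays equivalent to the serial order of the timestamps.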
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases. This includes:
- Data stored in the database
- The database server
- The database management system (DBMS)
- Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
- Restricting unauthorized access and use by implementing strong and multifactor access and data management controls.
- Load/stress testing and capacity testing of the database to ensure it does not crash under a distributed denial of service (DDoS) attack or user overload.
- Physical security of the database server and backup equipment against theft and natural disasters.
- Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of Information Technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
- Flat data: data was usually represented in a tabular format, with strings or numbers in each field.
- Multiple users: a conventional database needed to support more than one user or system logged into the same data at the same time.
- Transactions: an essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation, and Durability.
- Large, long-lived data: a corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze, and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional database. The knowledge-base needed to know facts about the world, for example to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but would instead need to store information about thousands of specific humans in its tables. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge-base; representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged. These were systems designed from the ground up to have support for object-oriented capabilities, but also to support standard database services as well. On the other hand, the large database vendors, such as Oracle, added capabilities to their products that provided support for knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet, documents, hypertext, and multimedia support were now critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors, such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge-base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, versus knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session: 2018 – 2019
Subject: DBMS
MCA 1st year (Sem. II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data, since there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
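The bank-transfer example can be sketched with Python's built-in sqlite3 module, whose connection context manager commits on success and rolls back on error; the account names and amounts are illustrative:

```python
import sqlite3

# In-memory database with two accounts, as in the A-to-B transfer example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # opens a transaction; commits on success, rolls back on exception
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM account WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces rollback of the withdrawal
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

print(transfer(conn, "A", "B", 30))   # True: both updates applied together
print(transfer(conn, "A", "B", 999))  # False: the withdrawal is rolled back
print(dict(conn.execute("SELECT * FROM account")))  # {'A': 70, 'B': 80}
```

The failed second transfer leaves both balances exactly as they were: either both operations happen or neither does, which is the atomicity guarantee described above.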
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
- All users should be able to access the same data.
- A user's view is immune to changes made in other views.
- Users should not need to know physical database storage details.
- The DBA should be able to change database storage structures without affecting the users' views.
- The internal structure of the database should be unaffected by changes to physical aspects of storage.
- The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
The three levels are explained in detail below:
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to that user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
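A small illustration of the first two sublanguages, using Python's built-in sqlite3 module (SQLite executes DDL and DML; DCL statements such as GRANT/REVOKE require a multi-user DBMS, so one is shown only as a comment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, name TEXT)")

# DML: operate on that object.
conn.execute("INSERT INTO student (stud_id, name) VALUES (1, 'Asha')")
rows = conn.execute("SELECT name FROM student").fetchall()
print(rows)  # [('Asha',)]

# DCL (illustrative only -- SQLite has no user accounts, so this statement
# would run on a server DBMS such as Oracle or MySQL, not here):
# GRANT SELECT ON student TO some_user;
```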
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for DBMS are explained
above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieval) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig: Structure of a Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It records metadata information such as the names of the files, data items, storage
details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the best way to execute the
query, and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
- Converting operations in users' queries, coming from the application programs or from the
DML compiler and query optimizer (together known as the Query Processor), from the user's logical view
to the physical file system.
- Controlling access to the DBMS information stored on disk.
- Handling buffers in main memory.
- Enforcing constraints to maintain the consistency and integrity of the data.
- Synchronizing the simultaneous operations performed by concurrent users.
- Controlling backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and accuracy,
and may be regarded as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
- It improves the DBA's control over the information system and the users'
understanding of the use of the system.
- It helps in documenting the database design process by storing documentation of the result of every design phase and of the design decisions.
5 Data Files - It contains the data portion of the database
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system supporting their interaction. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of the ATM user, only one or more of his or her own accounts. Other naive users are those for whom the type and range of responses is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
- Duplication of the same data in different files.
- Wastage of storage space, since duplicated data is stored.
- Errors generated due to updating of the same data in different files.
- Time wasted in entering the same data again and again.
- Needless use of computer resources.
- Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purposes
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data, and in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own needs,
the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may therefore become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10. Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Building the
model is an iterative, team-oriented process in which all business managers (or their designates)
are involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name,
address, etc.
Attributes are of various types
Simple/Single Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship
between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships
are represented as diamonds, connected by lines to each of the entities in the relationship. The types of
relationship are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: entity Customer, with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), and
street (street_name, street_number, apartment_number).
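One common way to realise this design in a relational DBMS is to flatten the composite attributes into simple columns, keyed on customer_id (a hedged sketch using Python's sqlite3; the exact column mapping is one possible choice, not prescribed by the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes (name, address, street) are flattened into
# simple columns; customer_id remains the primary key.
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
# A multivalued attribute (e.g. several phone numbers) would instead go
# into its own table referencing customer_id; a derived attribute such
# as age would be computed from date_of_birth rather than stored.
conn.execute("INSERT INTO customer (customer_id, first_name, last_name, city) "
             "VALUES (1, 'Ravi', 'Kumar', 'Nagpur')")
print(conn.execute("SELECT first_name, city FROM customer").fetchall())
```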
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we may get a set of
records that satisfy the given value.
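The idea can be sketched as a secondary index built over the non-key attribute stud_name: unlike a primary index, one key value may map to several records (the sample records are invented):

```python
from collections import defaultdict

# A student file keyed on the primary key stud_id.
students = [
    {"stud_id": 1, "stud_name": "Amit",  "city": "Nagpur"},
    {"stud_id": 2, "stud_name": "Priya", "city": "Pune"},
    {"stud_id": 3, "stud_name": "Amit",  "city": "Mumbai"},
]

# Build a secondary index on the non-key attribute stud_name.
# Each key value maps to the list of matching primary keys.
by_name = defaultdict(list)
for rec in students:
    by_name[rec["stud_name"]].append(rec["stud_id"])

print(by_name["Amit"])  # both records with stud_name = 'Amit'
```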
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R = (A, B, C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
be further decomposed losslessly into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- you always need to know two values (pairwise);
- for any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item).
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key: in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
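The decomposition can be checked mechanically: project the sample data onto the three two-column tables and rejoin them. For the sample data above the join dependency holds, so the natural join returns exactly the original rows (a Python sketch):

```python
# Sample Buying table from above, as (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# The three projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
bv = {(b, v) for b, v, i in buying}
bi = {(b, i) for b, v, i in buying}
vi = {(v, i) for b, v, i in buying}

# Natural join of all three projections.
rejoined = {(b, v, i)
            for b, v in bv
            for b2, i in bi if b2 == b
            if (v, i) in vi}

print(rejoined == buying)  # lossless for this data: True
```

If Claiborne starts selling jeans, only one row each needs adding to the Buyer-Vendor and Vendor-Item projections, instead of one row per buyer in the original table.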
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product designed to support
both batch and online application programs.
[Fig: Structure of an IMS system. Application A and Application B are each written in a host
language plus DL/I calls; each has its own PSB (PSB-A, PSB-B) made up of PCBs. The IMS
control program sits between the applications and the physical databases, which are defined by DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also given in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of
unique identification.
The main characteristics of functional dependencies used in
normalization are that they:
- have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency;
- hold for all time;
- are non-trivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by
the functional dependencies in X.
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF;
we will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies
between its attributes.
It is often executed as a series of steps; each step corresponds to a specific normal form which has
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF
removes unwanted data structures: multivalued dependencies.
One of these conditions must hold for a relation to be in fourth normal form:
- there is no multivalued dependency in the relation, or
- there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
Q5
Either
(A) What are object oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way in which the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could retrieve a user's account information and
efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is easier if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model. It controls the following:
- the speed and size of your transaction log backups;
- the degree to which you are at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified point in time can be achieved after media failure for a database
file. If your log file is available after the failure, you can restore up to the last
committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can
recover to a log mark.
- CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index
creation is faster, because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans
OR
(C) Explain multivalued dependency with a suitable example.
Ans: As normalization proceeds, relations become progressively more restricted
(stronger) in format and also less vulnerable to update anomalies.
1. NF2: non-first normal form.
2. 1NF: R is in 1NF iff all domain values are atomic.
3. 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on
the key.
4. 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively
dependent on the key.
5. BCNF: R is in BCNF iff every determinant is a candidate key.
6. Determinant: an attribute on which some other attribute is fully functionally
dependent.
Fourth Normal Form
A multivalued dependency X ↠ Y holds in a relation when, for each value of X, the associated
set of Y values is independent of the values of the remaining attributes. For example, if a course
can be taught by several teachers and use several books, and the teachers and books are
independent of each other, then Course ↠ Teacher and Course ↠ Book hold.
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF
if and only if it is in BCNF and its non-trivial multivalued dependencies are functional
dependencies. 4NF thus removes an unwanted data structure: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
- there is no multivalued dependency in the relation, or
- there are multivalued dependencies, but the dependent attributes are dependent between
themselves.
The relation must also be in BCNF; fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
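The independence expressed by a multivalued dependency can be checked concretely: if Course ↠ Teacher holds in a Course-Teacher-Book relation, splitting the relation into (Course, Teacher) and (Course, Book) loses nothing, because their natural join reproduces the original rows. The data below is invented for illustration, assuming teachers and books are independent:

```python
# A relation exhibiting the MVDs Course ->> Teacher and Course ->> Book:
# every teacher of a course is paired with every book of that course.
ctb = {
    ("DB", "Smith", "Korth"), ("DB", "Smith", "Navathe"),
    ("DB", "Jones", "Korth"), ("DB", "Jones", "Navathe"),
}

# 4NF decomposition: project onto (Course, Teacher) and (Course, Book).
ct = {(c, t) for c, t, _ in ctb}
cb = {(c, b) for c, _, b in ctb}

# The natural join on Course reconstructs the original relation exactly,
# so the decomposition is lossless.
joined = {(c, t, b) for c, t in ct for c2, b in cb if c == c2}
print(joined == ctb)  # True
```

The two projected tables store four rows in total instead of four redundant combinations, which is precisely the redundancy 4NF eliminates.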
(d) What are inference axioms? Explain their significance in relational database design.
Ans: Inference Axioms (A-axioms or Armstrong's Axioms)
An inference axiom is a rule stating that if a relation satisfies certain FDs, then it must satisfy
certain other FDs.
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of Inference Axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City (given)
2. Street Zip → Street City (augmentation of (1) by Street)
3. City Street → Zip (given)
4. City Street → City Street Zip (augmentation of (3) by City Street)
5. Street Zip → City Street Zip (transitivity of (2) and (4))
[From Maier]
1. Let R = (A B C D E G H I J), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derivable from F.
1. AB → E (given)
2. AB → AB (reflexivity)
3. AB → B (projectivity from (2))
4. AB → BE (additivity from (1) and (3))
5. BE → I (given)
6. AB → I (transitivity from (4) and (5))
7. E → G (given)
8. AB → G (transitivity from (1) and (7))
9. AB → GI (additivity from (6) and (8))
10. GI → H (given)
11. AB → H (transitivity from (9) and (10))
12. AB → GH (additivity from (8) and (11))
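Derivations like the one above can be checked mechanically with the standard attribute-closure algorithm: AB → GH holds under F exactly when GH ⊆ (AB)+. A short sketch, using the FD set from the Maier example:

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is contained in the closure, absorb the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F from the example: AB->E, AG->J, BE->I, E->G, GI->H
F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]

print(sorted(closure("AB", F)))       # ['A', 'B', 'E', 'G', 'H', 'I', 'J']
print(set("GH") <= closure("AB", F))  # True: AB -> GH is derivable
```

Note the closure also picks up J (via AG → J, once G is derived), showing how the algorithm finds every attribute determined by AB, not just the ones in the manual proof.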
Significance in relational database design: The inference axioms allow a designer to derive,
from a given set F of functional dependencies, all the dependencies that F logically implies
(the closure F+). This is essential for finding candidate keys, checking the equivalence of
dependency sets, and verifying that decompositions are lossless and dependency-preserving
during normalization.
Recall the setting in which these axioms are applied. A relational database stores data in
two-dimensional tables, in which multiple relationships between data elements can be defined
and established in an ad-hoc manner. A Relational Database Management System (RDBMS) is a
database system made up of files with data elements in a two-dimensional array (rows and
columns); it has the capability to recombine data elements to form different relations, resulting
in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables that:
bull are manipulated a set at a time rather than a record at a time;
bull are manipulated using SQL.
The relational model was proposed by Dr. Codd in 1970 and is the basis for the relational
database management system (RDBMS). It contains the following components:
bull a collection of objects, or relations;
bull a set of operations to act on the relations.
Q5
EITHER
(a) What is a deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data
that is locked by the other. It can be handled in two ways: by taking measures that prevent
deadlocks from happening, and by providing ways to break a deadlock after it happens. One
way to prevent deadlocks is to require the user to request all necessary locks at one time,
ensuring they gain access to everything they need or to nothing. Deadlocks can also sometimes
be avoided by setting a resource-access order, meaning resources must be locked in a fixed
order, which prevents a circular wait from arising. Once a deadlock does occur, the DBMS must
have a method for detecting it; to resolve it, the DBMS selects a victim transaction to cancel
and reverts that entire transaction so that the resources it held become available, allowing one
transaction to complete while the other is reprocessed at a later time.
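Deadlock detection is typically implemented by maintaining a wait-for graph (an edge T1 → T2 means T1 is waiting for a lock held by T2) and searching it for a cycle. A minimal sketch:

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph given as {txn: [txns it waits for]}."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, []):
            if color.get(u, WHITE) == GREY:   # back edge: a cycle, i.e. a deadlock
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}))  # True
print(has_cycle({"T1": ["T2"], "T2": []}))      # False
```

When the check returns True, the DBMS picks one transaction on the cycle as the victim and rolls it back, which removes its node from the graph and breaks the cycle.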
(b) Explain the meaning of the expression 'ACID transaction'.
Ans: ACID stands for Atomicity, Consistency, Isolation, Durability. Any transaction should be
atomic: it either completes fully or has no effect at all; there is no such thing as a semi-complete
transaction. The database state should remain consistent after the completion of the transaction.
If there is more than one transaction, they should be scheduled in such a fashion that they remain
isolated from one another. Durability means that once a transaction commits, its effects will
persist even if there are system failures.
(c) What is the purpose of transaction isolation levels?
Ans: Transaction isolation
levels affect how the database operates while transactions are being processed. Their purpose is
to ensure consistency throughout the database. For example, if I am changing a row that affects
the calculations or outputs of several other rows, then all rows that are (or may be) affected by
the change are locked until my change is complete. This isolates the change and ensures that
data interaction remains accurate and consistent; it is known as transaction-level consistency.
A transaction in progress can also affect how other rows are read. Say I am processing a change
to the tax rate in my state; my store clerk should not be able to read the total cost of a blue shirt,
because the total-cost row is affected by any change in the tax-rate row. How the reading and
viewing of data is handled while a change is being processed but has not yet been committed is
what the transaction isolation level controls. Its purpose is to ensure that no one is misinformed
prior to a transaction being committed.
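The tax-rate scenario can be reproduced with SQLite, whose default journaling lets readers see only committed data. The table and values below are invented for illustration:

```python
import os
import sqlite3
import tempfile

# A shared on-disk database so that two connections see the same data.
path = os.path.join(tempfile.mkdtemp(), "shop.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE tax (rate REAL)")
conn.execute("INSERT INTO tax VALUES (0.05)")
conn.commit()
conn.close()

writer = sqlite3.connect(path)   # the transaction changing the tax rate
reader = sqlite3.connect(path)   # the store clerk

writer.execute("UPDATE tax SET rate = 0.08")  # change pending, not committed

# The clerk is not misinformed: only the committed rate is visible.
before = reader.execute("SELECT rate FROM tax").fetchall()[0][0]
writer.commit()
after = reader.execute("SELECT rate FROM tax").fetchall()[0][0]
print(before, after)  # 0.05 0.08
```

SQLite offers only a few isolation behaviors; server DBMSs expose the full range (read uncommitted through serializable), but the principle shown, hiding uncommitted changes from other sessions, is the same.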
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. Concurrency
control protocols ensure the atomicity, isolation, and serializability of concurrent transactions.
They can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction
cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
Binary locks - A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive locks - This type of locking mechanism differentiates locks based on their
use. If a lock is acquired on a data item in order to perform a write operation, it is an
exclusive lock, since allowing more than one transaction to write to the same data item
would lead the database into an inconsistent state. Read locks are shared because no data
value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow a transaction to obtain a lock on every object before a
write operation is performed. The transaction may unlock a data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of the data items on which
they need locks. Before initiating execution, the transaction requests all the locks it needs from
the system. If all the locks are granted, the transaction executes and releases all the locks when
its operations are over. If the locks are not all granted, the transaction rolls back and waits until
they are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first part,
when the transaction starts executing, it seeks permission for the locks it requires. The second
part is where the transaction acquires the locks. As soon as the transaction releases its first lock,
the third phase starts; in this phase the transaction cannot demand any new locks, it only
releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, in which the transaction acquires all
its locks, and a shrinking phase, in which the locks held by the transaction are released.
To claim an exclusive (write) lock, a transaction may first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
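The two-phase rule can be enforced with a simple transaction-side check: once a lock is released (the shrinking phase begins), any further acquire is rejected. This is a minimal sketch that ignores lock modes and conflicts between transactions:

```python
class TwoPhaseTxn:
    """Tracks one transaction's locks and enforces the two-phase rule."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips to True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquiring after a release")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # the growing phase is over for good
        self.locks.discard(item)

t = TwoPhaseTxn()
t.acquire("A")
t.acquire("B")
t.release("A")           # shrinking phase begins
try:
    t.acquire("C")       # illegal under 2PL
except RuntimeError as e:
    print(e)             # 2PL violation: acquiring after a release
```

Strict 2PL corresponds to never calling release() until commit, which is why it avoids cascading aborts.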
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all its locks in the first phase,
the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not
release a lock after using it: it holds all the locks until the commit point and releases them all at
once. Strict-2PL therefore avoids the cascading aborts that 2PL can suffer from.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at execution
time, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and ordering is determined by the age of
the transaction: a transaction created at clock time 0002 is older than all transactions that enter
after it. For example, a transaction y entering the system at 0004 is two seconds younger, and
priority is given to the older one.
In addition, every data item carries its latest read-timestamp and write-timestamp, which let the
system know when the last read and write operations were performed on it.
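The core rule of basic timestamp ordering follows directly from those per-item timestamps: an operation is allowed only if it does not arrive "too late" relative to younger transactions, otherwise the transaction must abort. A simplified sketch (real schedulers also handle transaction restarts and Thomas's write rule):

```python
class TimestampScheduler:
    """Basic timestamp-ordering checks, one read/write timestamp per item."""

    def __init__(self):
        self.read_ts = {}    # latest read timestamp per data item
        self.write_ts = {}   # latest write timestamp per data item

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return False     # item was overwritten by a younger txn: abort
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return True

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return False     # a younger txn already read or wrote it: abort
        self.write_ts[item] = ts
        return True

s = TimestampScheduler()
print(s.read(ts=5, item="X"))   # True: first access to X
print(s.write(ts=3, item="X"))  # False: an older txn cannot overwrite a newer read
print(s.write(ts=7, item="X"))  # True
```

The rejected write at timestamp 3 is how the protocol enforces the serialization order "older before younger" without using locks at all.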
OR
(b) Explain database security mechanisms.
Ans: Database security covers and enforces security on all aspects and components of databases. This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or other information security professionals.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong, multifactor access and
data-management controls.
Load/stress testing and capacity testing of the database to ensure it does not crash under a
distributed denial-of-service (DDoS) attack or user overload.
Physical security of the database server and backup equipment against theft and natural
disasters.
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties
Flat data Data was usually represented in a tabular format with strings or numbers in each
field
Multiple users A conventional database needed to support more than one user or system
logged into the same data at the same time
Transactions An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users These are the so-
called ACID properties Atomicity Consistency Isolation and Durability
Large long-lived data A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data Such a database usually needed to persist past
the specific uses of any individual program it needed to store data for years and decades
rather than for the life of a program
The first knowledge-based systems had data needs that were the opposite of these database
requirements An expert system requires structured data Not just tables with numbers and
strings but pointers to other objects that in turn have additional pointers The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes subclasses and instances
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example to represent
the statement that 'all humans are mortal'. A database typically could not represent this general
knowledge, but would instead need to store information about thousands of specific humans.
Representing that all humans are mortal, and being able to reason that any given human is
mortal, is the work of a knowledge-base; representing that George, Mary, Sam, Jenna, Mike,
and hundreds of thousands of other customers are all humans with specific ages, sexes,
addresses, etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between
the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, versus knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018–2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations
without their conflicting with one another. Concurrent access is quite easy if all users are just
reading data, since there is no way they can interfere with one another. Any practical database,
though, has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address the conflicts that mostly occur in a multi-user system.
It helps you make sure that database transactions are performed concurrently without violating
the data integrity of the respective databases.
Concurrency control is therefore a most important element for the proper functioning of a
system in which two or more database transactions that require access to the same data are
executed simultaneously.
(ii) Atomicity property
Ans: In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or none occurs.[1] The guarantee of atomicity prevents updates to the database occurring
only partially, which can cause greater problems than rejecting the whole series outright. As a
consequence, the transaction cannot be observed to be in progress by another database client:
at one moment in time it has not yet happened, and at the next it has already occurred in whole
(or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of the two operations fails.
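The bank-transfer example can be demonstrated with SQLite: wrapping both updates in one transaction and rolling back on failure leaves the balances untouched. Account names and amounts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        cur = conn.cursor()
        cur.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                    (amount, src))
        # Simulate a failure between the two operations of the transfer.
        if amount > 100:
            raise RuntimeError("insufficient funds")
        cur.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                    (amount, dst))
        conn.commit()            # both updates become durable together
    except Exception:
        conn.rollback()          # neither update survives: money is not lost

transfer(conn, "A", "B", 500)    # fails midway and rolls back
print(conn.execute("SELECT balance FROM account ORDER BY id").fetchall())
# [(100,), (50,)]: the failed transfer left the database consistent
```

Even though the withdrawal from A had already executed when the failure occurred, the rollback erases it, which is exactly the all-or-nothing guarantee described above.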
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; it describes only the part of the
actual database that is relevant to the user, because each user is not concerned with the entire
database. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database A query language is a
combination of three subordinate language
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used
to control the user's access to database objects.
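The first two sub-languages can be illustrated with SQLite from Python (DCL statements such as GRANT/REVOKE require a server DBMS, so only DDL and DML are shown; the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
conn.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE roll = 1")

print(conn.execute("SELECT name FROM student ORDER BY roll").fetchall())
# [('Asha K',), ('Ravi',)]
```

In a server DBMS, a DCL statement like `GRANT SELECT ON student TO clerk` would complete the trio by restricting which users may run the DML above.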
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data 'as it really is'. The user's view of the data is constrained by the language
being used; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for a DBMS are explained above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions
specified in the DDL. It includes metadata information such as the names of files and data
items, storage details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete, and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer into the
best way to execute the query and sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also
known as the Database Control System. The main functions of the Data Manager are:
Convert operations on user queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the Query Processor), from
the user's logical view to the physical file system.
Control DBMS information access that is stored on disk.
Handle buffers in main memory.
Enforce constraints to maintain the consistency and integrity of the data.
Synchronize the simultaneous operations performed by concurrent users.
Control the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the
database. It contains information about:
1. Data: names of the tables, names of the attributes of each table, lengths of attributes, and
number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which
is useful in determining which transactions are affected when certain data definitions are
changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths,
and file and record sizes.
5. Access authorization: a description of database users, their responsibilities, and their access
rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation, and accuracy, and may
be regarded as an important part of the DBMS.
Importance of the Data Dictionary - The data dictionary is necessary in databases for the
following reasons:
It improves the DBA's control over the information system and the users' understanding of
how to use the system.
It helps in documenting the database design process by storing documentation of the results
of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands, known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction and responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database; in the case of the ATM user, only one or more of his or her own accounts. Other such naïve uses are ones where the type and range of responses is always indicated to the user. Thus, even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL, or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different
application systems. This stresses the importance of multiple applications sharing data: the
database becomes a common resource for an agency. It implies separation of the physical
storage from the use of the data by an application program, i.e., program/data independence.
The user, programmer, or application specialist need not know the details of how the data are
stored; such details are transparent to the user. Changes can be made to the data without
affecting other components of the system, e.g., changing the format of data items (real to
integer), changing the file structure (reorganizing data internally or changing the mode of
access), or relocating data from one device to another (e.g., from optical to magnetic storage,
or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group
maintains its own files for handling its data. This may lead to:
bull duplication of the same data in different files;
bull wastage of storage space, since duplicated data is stored;
bull errors generated due to updating of the same data in different files;
bull time wasted in entering the same data again and again;
bull computer resources needlessly used;
bull great difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to another
file. This can leave the data inconsistent, so we need to remove this duplication of data across
multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined into one centralized database, the availability of information and its
currency is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in a database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike
a file processing system in which a programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in
the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce integrity
constraints. In conventional systems, because the data is duplicated in multiple files, updates
may sometimes lead to entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the
purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad-hoc/temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and
different checks can be established for each type of access (retrieve, modify, delete, etc.) to
each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, the most important. Once a database has been set up with centralized control, it
becomes necessary to identify the organization's requirements and to balance the needs of the
competing units, so it may become necessary to ignore some requests for information if they
conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages
provided with a DBMS than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, files are more likely to be designed to meet the demands of particular
applications, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - A centralized database provides schemes for backup and
recovery from failures, including disk crashes, power failures and software errors, which
help the database recover from an inconsistent state to the state that existed prior to the
failure, though the recovery methods themselves are complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data
is used in a real-world enterprise. Building the model is an iterative, team-oriented process in
which all business managers (or their designates) should be involved, and the result should be
validated with a "bottom-up" approach. Many notation methods exist; Chen's was the first to
become established.
The building blocks of the E-R model are its three primary components: entities, relationships
and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID,
student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many (1:M)
Many-to-one (M:1)
Many-to-many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is
itself composite (street_name, street_number, apartment_number).
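The composite and multivalued attribute kinds listed above can be sketched as plain Python structures (illustrative only; the field values below are invented, and phone numbers are shown as multivalued for demonstration):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Name:                   # composite attribute
    first_name: str
    last_name: str
    middle_name: str

@dataclass
class Address:                # composite attribute (street kept simple here)
    city: str
    state: str
    zip_code: str
    street: str

@dataclass
class Customer:               # the entity
    customer_id: int          # primary key attribute
    name: Name
    phone_numbers: List[str]  # multivalued attribute
    date_of_birth: str
    address: Address

c = Customer(1, Name("Asha", "Deshpande", "R"), ["555-0100", "555-0101"],
             "1990-05-17", Address("Nagpur", "MH", "440001", "Main St"))
```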
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we get the set of
records which satisfy the given value.
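A minimal sketch of the idea in Python (the student records are hypothetical; the secondary index maps each non-unique stud_name to all matching primary keys):

```python
from collections import defaultdict

# Student records keyed by primary key (roll number)
students = {
    1: {"stud_name": "Asha", "city": "Nagpur"},
    2: {"stud_name": "Ravi", "city": "Pune"},
    3: {"stud_name": "Asha", "city": "Mumbai"},
}

# Secondary index on the non-unique attribute stud_name:
# each name maps to the list of primary keys that carry it.
by_name = defaultdict(list)
for roll_no, rec in students.items():
    by_name[rec["stud_name"]].append(roll_no)

# Secondary key retrieval may return several records
matches = [students[r] for r in by_name["Asha"]]
```

Unlike a primary-key lookup, the result is a set of records, which is exactly the point made in (ii) above.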
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables. Another way of expressing this is
that every join dependency is a consequence of the candidate keys. It can also be expressed as:
there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
you always need to know two values (pairwise), and
for any one you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:

buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
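The decomposition can be checked on the sample data with a small sketch (plain Python, set-based; the natural join is spelled out by hand):

```python
# Sample Buying(buyer, vendor, item) rows from the example above
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three pairwise tables of the 5NF decomposition
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (v2, i) in vendor_item if v2 == v
    if (b, i) in buyer_item
}

# The join dependency holds on this data: joining the three
# projections neither loses rows nor invents spurious ones.
assert rejoined == buying
```

Note that joining only two of the projections (e.g. Buyer-Vendor with Vendor-Item) would produce the spurious row (Mary, Jordach, Sneakers); it is the third table that filters it out, which is why all three are needed.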
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product designed to support
both batch and online application programs.
[Architecture diagram: Application A and Application B, each written in a host language + DL/I.
Each application has its own PSB (PSB-A, PSB-B), made up of PCBs; the PCBs refer to the DBDs
that define the physical databases; the IMS control program mediates between the applications
and the stored data.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD), which also defines its mapping to storage.
The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal
mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the logical
database and the corresponding physical database.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
1 PCB TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are
supported via user-written online application programs; IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant) determines the value of
another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate
key, and no attribute in the key can be deleted without destroying the property of unique
identification.
The main characteristics of the functional dependencies used in normalization are that they have
a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency,
they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify a
set of functional dependencies (X) for the relation that is smaller than the complete set of
functional dependencies (Y) for that relation, and that has the property that every functional
dependency in Y is implied by the functional dependencies in X.
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF;
here we pay particular attention up to 3NF. Database designers need not normalize to the highest
possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, each of
which corresponds to a specific normal form with known properties. As normalization proceeds,
relations become progressively more restricted (stronger) in format and also less vulnerable to
update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are in fact functional dependencies; 4NF
thus removes the unwanted structures, namely multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF; fourth normal form differs from BCNF only in that it also
considers multivalued dependencies.
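As an example, consider a hypothetical relation (course, teacher, book) in which teachers and recommended books are independent facts about a course, so course ->> teacher and course ->> book hold. The 4NF decomposition and its losslessness can be sketched in Python (the data below is invented for illustration):

```python
# Relation with a multivalued dependency: because teachers and books
# are independent, every (teacher, book) combination appears per course.
ctb = {
    ("DBMS", "Rao",   "Date"),
    ("DBMS", "Rao",   "Ullman"),
    ("DBMS", "Joshi", "Date"),
    ("DBMS", "Joshi", "Ullman"),
}

# 4NF decomposition: split the two independent facts apart
course_teacher = {(c, t) for c, t, b in ctb}
course_book    = {(c, b) for c, t, b in ctb}

# Natural join on course reconstructs the original relation,
# so the decomposition is lossless.
rejoined = {
    (c, t, b)
    for (c, t) in course_teacher
    for (c2, b) in course_book if c2 == c
}
assert rejoined == ctb
```

The decomposed tables also avoid the redundancy: adding a third teacher means inserting one row into course_teacher, not one row per book.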
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and in the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide them with extensive information such as transactions and account-information entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
the speed and size of your transaction log backups, and
the degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified point in time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log so that you can
recover to a log mark.
This model also logs CREATE INDEX operations, so recovery from a transaction log backup that
includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
F1 Reflexivity: X → X
F2 Augmentation: If Z ⊆ W and X → Y, then XW → YZ
F3 Additivity: If X → Y and X → Z, then X → YZ
F4 Projectivity: If X → YZ, then X → Y
F5 Transitivity: If X → Y and Y → Z, then X → Z
F6 Pseudotransitivity: If X → Y and YZ → W, then XZ → W
Examples of the use of inference axioms
[From Ullman]
1. Consider R = (Street, Zip, City), F = {City Street → Zip, Zip → City}.
We want to show Street Zip → Street Zip City.
Proof:
1. Zip → City - given
2. Street Zip → Street City - augmentation of (1) by Street
3. City Street → Zip - given
4. City Street → City Street Zip - augmentation of (3) by City Street
5. Street Zip → City Street Zip - transitivity from (2) and (4)
[From Maier]
2. Let R = (A, B, C, D, E, G, H, I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E - given
2. AB → AB - reflexivity
3. AB → B - projectivity from (2)
4. AB → BE - additivity from (1) and (3)
5. BE → I - given
6. AB → I - transitivity from (4) and (5)
7. E → G - given
8. AB → G - transitivity from (1) and (7)
9. AB → GI - additivity from (6) and (8)
10. GI → H - given
11. AB → H - transitivity from (9) and (10)
12. AB → GH - additivity from (8) and (11)
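The same derivation can be checked mechanically with the standard attribute-closure algorithm, sketched here in Python: AB → GH follows from F exactly when G and H are both in the closure AB+.

```python
def closure(attrs, fds):
    """Compute the attribute closure attrs+ under a set of FDs.

    fds is a list of (lhs, rhs) pairs, each side a set of attributes:
    repeatedly apply every FD whose left side is already contained
    in the result, until nothing more can be added.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# F from the Maier example above
F = [({"A", "B"}, {"E"}),
    ({"A", "G"}, {"J"}),
    ({"B", "E"}, {"I"}),
    ({"E"},      {"G"}),
    ({"G", "I"}, {"H"})]

# AB+ contains both G and H, so AB -> GH is implied by F
ab_plus = closure({"A", "B"}, F)
```

The closure grows as {A, B} → +E (by AB → E) → +I (by BE → I) → +G (by E → G) → +H (by GI → H), matching the twelve-step proof above.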
Significance in relational database design: A relational database is a database structure, commonly
used in GIS, in which data is stored in two-dimensional tables and multiple relationships between
data elements can be defined and established in an ad-hoc manner. A Relational Database Management
System (RDBMS) is a database system made up of files with data elements in a two-dimensional
array (rows and columns); it has the capability to recombine data elements to form different relations,
resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables that are
manipulated a set at a time rather than a record at a time; SQL is used to manipulate relational
databases. The relational model was proposed by Dr. Codd in 1970 and is the basis for the relational
database management system (RDBMS). The relational model contains the following components:
a collection of objects or relations, and
a set of operations to act on the relations.
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that
is being locked by the other user. It can be dealt with in two ways: one is to take measures which
prevent deadlocks from happening, and the other is to provide ways to break a deadlock
after it happens. One way to prevent or avoid deadlocks is to require the user to request
all necessary locks at one time, ensuring they gain access to everything they need or to
nothing. Secondly, deadlocks can sometimes be avoided by setting a resource-access order,
meaning resources must be locked in a certain order to prevent such instances. Essentially,
once a deadlock does occur, the DBMS must have a method for detecting the deadlock,
and to resolve it the DBMS must select a transaction to cancel and revert that entire
transaction until the required resources become available, allowing one transaction to
complete while the other has to be reprocessed at a later time.

Explain the meaning of the expression ACID transaction.
Ans: ACID means Atomicity, Consistency, Isolation, Durability. When any transaction happens, it
should be atomic: it should either be complete or fully incomplete; there should not be anything
like semi-complete. The database state should remain consistent after the completion of the
transaction. If there is more than one transaction, the transactions should be scheduled in such a
fashion that they remain in isolation from one another. Durability means that once a transaction
commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process
of being changed. Their purpose is to ensure consistency throughout the database. For example, if
I am changing a row which affects the calculations or outputs of several other rows, then all rows
that are affected, or possibly affected, by a change in the row I am working on will be locked from
changes until my change is complete. This isolates the change and ensures that the data
interaction remains accurate and consistent, and is known as transaction-level consistency. The
transaction being changed may also affect how other rows are read. Say I am processing a change
to the tax rate in my state: my store clerk should not be able to read the total cost of a blue shirt,
because the total-cost row is affected by any change in the tax-rate row. Essentially, how you deal
with the reading and viewing of data while a change is being processed but has not yet been
committed is the transaction isolation level. Its purpose is to ensure that no one is misinformed
prior to a transaction being committed.
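The detect-and-cancel approach described in (a) is usually implemented with a wait-for graph: an edge T1 → T2 means T1 is waiting for a lock held by T2, and a cycle means deadlock. A minimal sketch (illustrative Python; the transaction names are hypothetical):

```python
# Wait-for graph: edge "T1" -> "T2" means T1 waits for a lock held by T2.
waits_for = {
    "T1": ["T2"],
    "T2": ["T3"],
    "T3": ["T1"],   # T3 waits on T1, closing a cycle: deadlock
    "T4": [],
}

def find_cycle(graph):
    """Return one cycle as a list of transactions, or None."""
    def dfs(node, path, on_path):
        if node in on_path:
            return path[path.index(node):]   # cycle found
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path, on_path)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        return None

    for start in graph:
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None

victim = None
cycle = find_cycle(waits_for)
if cycle:
    # The DBMS picks a victim in the cycle and rolls it back,
    # freeing its locks so the remaining transactions can finish.
    victim = cycle[0]
```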
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions. We have
concurrency control protocols to ensure atomicity, isolation and serializability of concurrent
transactions. Concurrency control protocols can be broadly divided into two categories:
Lock-based protocols
Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a
transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two
kinds:
Binary locks - A lock on a data item can be in two states: it is either locked or
unlocked.
Shared/exclusive locks - This type of locking mechanism differentiates the locks based on
their use. If a lock is acquired on a data item to perform a write operation, it is an
exclusive lock; allowing more than one transaction to write on the same data item
would lead the database into an inconsistent state. Read locks are shared, because no data
value is being changed.
There are four types of lock protocol available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
write operation is performed. Transactions may unlock the data item after completing the
'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of the data items on which they
need locks. Before initiating an execution, the transaction requests all the locks it needs from the
system. If all the locks are granted, the transaction executes, releasing all the locks
when all its operations are over. If any lock is not granted, the transaction rolls back and
waits until all the locks are granted.
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction releases its
first lock, the third phase starts; in this phase the transaction cannot demand any new locks, it
only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, where all the locks are being acquired by
the transaction, and a shrinking phase, where the locks held by the transaction are being
released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock after using it: Strict-2PL holds all the locks until the commit point and releases them all
at once.
Strict-2PL does not have cascading aborts, as 2PL does.
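The growing/shrinking rule can be sketched as a tiny per-transaction checker (illustrative Python only; a real DBMS also tracks lock modes, queues and owners per data item):

```python
class TwoPhaseTransaction:
    """Enforces the 2PL rule: no lock may be acquired
    after the first lock has been released."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False   # flips when the first lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True    # growing phase is over
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock("A")
t.lock("B")      # growing phase: acquiring more locks is fine
t.unlock("A")    # shrinking phase begins
# t.lock("C") would now raise RuntimeError: the protocol forbids it
```

Strict-2PL corresponds to never calling unlock until commit, at which point all locks are released together.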
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as the timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it; for example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp, which lets the system
know when the last 'read and write' operation was performed on the data item.
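The read- and write-timestamps are used to reject operations that arrive "out of order". A sketch of the basic timestamp-ordering checks (illustrative Python; real protocols also assign new timestamps to restarted transactions):

```python
# Per-item timestamps: when the item was last read and last written.
item = {"read_ts": 0, "write_ts": 0}

def read(ts):
    # A transaction may not read a value written by a younger transaction
    if ts < item["write_ts"]:
        return "abort"
    item["read_ts"] = max(item["read_ts"], ts)
    return "ok"

def write(ts):
    # A write is rejected if a younger transaction has already
    # read or written the item
    if ts < item["read_ts"] or ts < item["write_ts"]:
        return "abort"
    item["write_ts"] = ts
    return "ok"

read(5)      # ok: read_ts becomes 5
write(3)     # abort: transaction 3 is older than the last reader
write(7)     # ok: write_ts becomes 7
```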
OR
(b) Explain database security mechanisms.
Ans: Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in the database
The database server
The database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator
and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial of service (DDoS) attack or user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: a conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: an essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called ACID
properties: Atomicity, Consistency, Isolation and Durability.
Large, long-lived data: a corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in artificial
intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example to represent
the statement that "all humans are mortal". A database typically could not represent this general
knowledge, but would instead need to store information about thousands of specific humans.
Representing that all humans are mortal, and being able to reason that any given human is
mortal, is the work of a knowledge-base; representing that George, Mary, Sam, Jenna, Mike and
hundreds of thousands of other customers are all humans with specific ages, sexes, addresses,
etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities, but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge base was the Internet. With the rise of the Internet,
documents, hypertext, and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors, such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system, to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data: there is no way they can interfere with one another. However, any
practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system
where two or more database transactions that require access to the same data
are executed simultaneously.
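The lost-update problem that concurrency control prevents can be sketched in a few lines of Python (an illustration added here, not part of the exam answer), with threads standing in for transactions and a lock standing in for the DBMS lock manager:

```python
import threading

balance = 100
lock = threading.Lock()  # serializes the read-modify-write cycle

def deposit(amount):
    global balance
    with lock:                       # only one "transaction" at a time
        current = balance            # READ the shared value
        balance = current + amount   # WRITE it back

threads = [threading.Thread(target=deposit, args=(1,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 200: no deposit was lost
```

Without the lock, two threads could both read 100 and both write 101, losing one update; the lock forces the interleaved operations into a serial order.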
(ii) Atomicity property
Ans: In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
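The bank-transfer example can be sketched with Python's sqlite3 module, whose connection context manager gives exactly this all-or-nothing behaviour (the table and account names here are illustrative, not from the syllabus):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # one atomic transaction: commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            bal = conn.execute("SELECT balance FROM account WHERE name = ?",
                               (src,)).fetchone()[0]
            if bal < 0:
                raise ValueError("insufficient funds")  # forces rollback of the withdrawal
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except ValueError:
        pass  # transfer rejected; the database is unchanged

transfer(conn, "A", "B", 30)   # succeeds: both updates commit together
transfer(conn, "A", "B", 500)  # fails: both updates are rolled back together
print(dict(conn.execute("SELECT name, balance FROM account")))  # {'A': 70, 'B': 80}
```

The failed 500-unit transfer leaves no trace: the withdrawal that had already executed is undone, so money is neither lost nor created.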
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
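The three sublanguages can be sketched with Python's sqlite3 module (an illustration added here; note that SQLite has no user accounts, so the DCL statement is shown only as the text a full DBMS such as Oracle would accept):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha K' WHERE id = 1")

# DCL: control access to the object. SQLite cannot execute GRANT/REVOKE,
# so the statement is kept as text for illustration only.
dcl_example = "GRANT SELECT ON student TO report_user"

print(conn.execute("SELECT name FROM student").fetchone()[0])  # Asha K
```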
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieval) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig.: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files and data items, storage
details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized for the best way to execute the query by
the query optimizer and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
• It converts operations in users' queries, coming from the application programs or from the
combination of the DML compiler and query optimizer (known as the Query Processor), from the
user's logical view to the physical file system.
• It controls access to DBMS information that is stored on disk.
• It controls the handling of buffers in main memory.
• It enforces constraints to maintain the consistency and integrity of the data.
• It synchronizes the simultaneous operations performed by concurrent users.
• It controls the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases due to the following reasons:
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the
results of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts the high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interactions with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve Users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus, a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing file
structure (reorganizing data internally or changing the mode of access), or relocating from one device to
another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating of the same data in different files.
• Time wasted in entering data again and again.
• Computer resources needlessly used.
• Great difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. So we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when changing the
data in the database.
5. Integrity can be improved - Since data of the organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updating or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of its unit as the most
important, and therefore considers its needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10. Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as per the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
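Backup and recovery can be sketched with Python's sqlite3 backup API (an illustration added here, using SQLite as a stand-in for the recovery schemes a full DBMS provides; table and values are hypothetical):

```python
import sqlite3

# Set up a small database and take a consistent backup copy of it.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")
src.commit()

backup = sqlite3.connect(":memory:")
src.backup(backup)            # snapshot the whole database

src.execute("DELETE FROM t")  # simulate a failure that destroys the data
src.commit()

# Recovery: restore the database from the backup copy.
backup.backup(src)
print(src.execute("SELECT x FROM t").fetchone()[0])  # 42: the prior state is back
```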
QUE 2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or designates) involved, and
should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for this term.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship
between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings:
Rectangles represent entity sets.
Diamonds represent relationship sets.
Lines link attributes to entity sets and entity sets to relationship sets.
Ellipses represent attributes.
Double ellipses represent multivalued attributes.
Dashed ellipses denote derived attributes.
Underline indicates primary key attributes.
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
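One way to realize this Customer entity as a relation (a hypothetical mapping added here: the composite attributes name, address and street are flattened into separate columns, and the underlined customer_id becomes the primary key):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes become groups of simple columns.
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT, middle_name TEXT, last_name TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city TEXT, state TEXT, zip_code TEXT,
        street_name TEXT, street_number TEXT, apartment_number TEXT
    )
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name, city) "
             "VALUES (1, 'Ravi', 'Patil', 'Nagpur')")
row = conn.execute("SELECT first_name, city FROM customer "
                   "WHERE customer_id = 1").fetchone()
print(row)  # ('Ravi', 'Nagpur')
```

A multivalued attribute (e.g. several phone numbers) would instead go into its own table keyed by customer_id.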
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the
retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
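A minimal sketch of such a secondary index (an illustration added here, with hypothetical student records): the index maps each value of the non-unique key stud_name to the positions of all matching records, so one lookup can return several records.

```python
# Sample student file; stud_id is the primary key, stud_name a secondary key.
records = [
    {"stud_id": 1, "stud_name": "Amit"},
    {"stud_id": 2, "stud_name": "Neha"},
    {"stud_id": 3, "stud_name": "Amit"},
]

# Build the secondary index: key value -> list of record positions.
secondary_index = {}
for pos, rec in enumerate(records):
    secondary_index.setdefault(rec["stud_name"], []).append(pos)

# Unlike a primary-key lookup, one key value can match multiple records.
matches = [records[p] for p in secondary_index.get("Amit", [])]
print([r["stud_id"] for r in matches])  # [1, 3]
```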
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it
cannot be non-loss decomposed any further into smaller tables.
Another way of expressing this is: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
• You always need to know two values (pairwise).
• For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor, to determine the vendor you must know the buyer and
the item, and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
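The decomposition can be checked with a short Python sketch (added here as an illustration): project Buying onto the three two-column tables, natural-join them back, and then see what one new Vendor-Item row implies.

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# The three projections of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

def join3(bv, bi, vi):
    # Natural join of the three projections on their shared columns.
    return {(b, v, i) for b, v in bv for b2, i in bi if b2 == b and (v, i) in vi}

print(join3(buyer_vendor, buyer_item, vendor_item) == buying)  # True: lossless

# When Claiborne starts selling jeans, ONE new Vendor-Item row suffices;
# the join then derives the new buying facts for both existing jeans buyers.
vendor_item.add(("Liz Claiborne", "Jeans"))
print(len(join3(buyer_vendor, buyer_item, vendor_item) - buying))  # 2 new facts
```

In the single-table design the same change would require inserting one row per jeans buyer by hand, which is exactly the update anomaly 5NF removes.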
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
Application A                 Application B
(Host Language + DL/I)        (Host Language + DL/I)
        |                             |
      PSB-A                         PSB-B
   (PCB, PCB)                    (PCB, PCB)
         \                           /
            IMS control program
                    |
   DBD   DBD   DBD   DBD   DBD   DBD ...
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following:
(i) Functional dependency
Ans: Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key.
Each non-key field is functionally dependent on every candidate key.
No attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in normalization:
• They have a 1:1 relationship between the attribute(s) on the left- and right-hand
side of the dependency.
• They hold for all time.
• They are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by the
functional dependencies in X.
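The definition can be made concrete with a small Python check (an illustration added here; the relation and attribute names are hypothetical): X → Y holds in a relation iff no two rows agree on X but differ on Y.

```python
def holds(rows, x, y):
    """Return True iff the functional dependency x -> y holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        # If this X-value was seen before with a different Y-value, FD fails.
        if seen.setdefault(key, val) != val:
            return False
    return True

employees = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 3, "dept": "HR",    "dept_city": "Nagpur"},
]

print(holds(employees, ["dept"], ["dept_city"]))   # True: dept -> dept_city
print(holds(employees, ["dept_city"], ["emp_id"])) # False: two Pune rows differ
```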
(D) Explain 4NF with examples.
Ans: Normalization: The process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form.
It is a formal technique for analyzing a relation based on its primary key and the functional
dependencies between its attributes.
It is often executed as a series of steps. Each step corresponds to a specific normal form, which has
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
• NF2: non-first normal form.
• 1NF: R is in 1NF iff all domain values are atomic.
• 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
• 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
• BCNF: R is in BCNF iff every determinant is a candidate key.
• Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF
removes unwanted data structures: multivalued dependencies.
Either of the following conditions must hold for a relation to be in fourth normal form:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
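A hypothetical illustration of a multivalued dependency (added here, not part of the exam answer): one student's courses are independent of his hobbies, so a single table must pair every course with every hobby, and the 4NF decomposition removes that redundancy.

```python
courses = ["DBMS", "OS", "Networks"]  # one student's courses
hobbies = ["chess", "cricket"]        # the same student's hobbies

# name ->> course and name ->> hobby: the single table is a cross product.
single_table = [("Ravi", c, h) for c in courses for h in hobbies]
print(len(single_table))  # 6 rows for 3 courses x 2 hobbies

# The 4NF decomposition stores the two independent facts separately.
student_course = [("Ravi", c) for c in courses]
student_hobby  = [("Ravi", h) for h in hobbies]
print(len(student_course) + len(student_hobby))  # 5 rows, no redundant pairing
```

Adding a fourth course costs one row in the decomposed design but two redundant rows (one per hobby) in the single table, which is the update anomaly 4NF avoids.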
Q5
EITHER
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and in the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas that demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive data such as transactions and account entries.
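A tiny sketch of the "pointer following" idea described above, using plain Python objects to stand in for database objects. The class names are invented for illustration.

```python
# Sketch: navigational ("pointer following") access in an object database,
# modeled with plain Python objects.

class Customer:
    def __init__(self, name):
        self.name = name

class Account:
    def __init__(self, number, owner):
        self.number = number
        self.owner = owner        # direct reference to another object
        self.transactions = []    # related objects, no foreign keys

alice = Customer("Alice")
acct = Account("A-1", alice)
acct.transactions.append({"amount": -50})

# No join: retrieve the owner by following the stored reference.
print(acct.owner.name)  # -> Alice

# A relational schema would instead store an owner_id column and match it
# against a customer table at query time, i.e. perform a join.
```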
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups take and how great your risk of data loss is when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
- The speed and size of your transaction log backups.
- The degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified time can be achieved after a media failure for a database file. If your log file is available after the failure, you can restore up to the last committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
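SQL Server's recovery models cannot be demonstrated portably here, but the underlying idea — work that was never committed is rolled back during recovery, so the database comes back in its last consistent state — can be sketched with SQLite. The file and table names are invented for illustration.

```python
# Sketch (SQLite, not SQL Server): an uncommitted transaction is rolled
# back, so reopening the database "recovers" the last consistent state.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x)")
con.execute("INSERT INTO t VALUES (1)")
con.commit()                               # durable: survives the "crash"

con.execute("INSERT INTO t VALUES (2)")    # in-flight, never committed
con.close()                                # simulate a crash before COMMIT

con = sqlite3.connect(path)                # reopen: only committed work remains
print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # -> 1
```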
(d) Describe deadlocks in a distributed system.
Ans:
Join Dependencies (JD)
1. Zip → City – Given
2. Street Zip → Street City – Augmentation of (1) by Street
3. City Street → Zip – Given
4. City Street → City Street Zip – Augmentation of (3) by City Street
5. Street Zip → City Street Zip – Transitivity from (2) and (4)
[From Maier]
1. Let R = (A B C D E G H I), F = {AB → E, AG → J, BE → I, E → G, GI → H}.
Show that AB → GH is derived by F.
1. AB → E – Given
2. AB → AB – Reflexivity
3. AB → B – Projectivity from (2)
4. AB → BE – Additivity from (1) and (3)
5. BE → I – Given
6. AB → I – Transitivity from (4) and (5)
7. E → G – Given
8. AB → G – Transitivity from (1) and (7)
9. AB → GI – Additivity from (6) and (8)
10. GI → H – Given
11. AB → H – Transitivity from (9) and (10)
12. AB → GH – Additivity from (8) and (11)
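The derivation can also be checked mechanically by computing the attribute closure of AB under F. This is a sketch; note that the FD AG → J also fires during the computation, so J appears in the closure even though the step-by-step derivation does not need it.

```python
# Sketch: verify AB -> GH by computing the closure of AB under
# F = {AB->E, AG->J, BE->I, E->G, GI->H}.

def closure(attrs, fds):
    """Return the closure of `attrs` under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "E"), ("AG", "J"), ("BE", "I"), ("E", "G"), ("GI", "H")]
print(sorted(closure("AB", F)))  # -> ['A', 'B', 'E', 'G', 'H', 'I', 'J']
# G and H are both in the closure of AB, so AB -> GH holds.
```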
Significance in relational database design: a database structure commonly used in GIS, in which data is stored in two-dimensional tables and multiple relationships between data elements can be defined and established in an ad-hoc manner. A relational database management system is a database system made up of files with data elements arranged in a two-dimensional array (rows and columns). Such a system has the capability to recombine data elements to form different relations, resulting in great flexibility of data usage.
A relational database is perceived by the user as a collection of two-dimensional tables:
- Tables are manipulated a set at a time rather than a record at a time.
- SQL is used to manipulate relational databases.
- The model was proposed by Dr. Codd in 1970 and is the basis for the relational database management system (RDBMS).
- The relational model contains the following components:
  - a collection of objects, or relations
  - a set of operations to act on the relations
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions each require access to data that is locked by the other. It can be dealt with in two ways: one is to take measures that prevent deadlocks from happening, and the other is to define ways to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or to nothing. Secondly, deadlocks can sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order to prevent such instances. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a transaction to cancel and revert the entire transaction until the resources required become available, allowing one transaction to complete while the other has to be reprocessed at a later time.
9.2.1 Explain the meaning of the expression "ACID transaction".
Ans: ACID stands for Atomicity, Consistency, Isolation, Durability. A transaction should be atomic: it should either complete fully or not at all; there should be nothing like a semi-complete transaction. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.
9.2.4 What is the purpose of transaction isolation levels?
Ans: Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row that affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by my change will be locked from changes until my change is complete. This isolates the change and ensures that the data interaction remains accurate and consistent; this is known as transaction-level consistency. The transaction being changed, which may affect several other pieces of data or rows, can also affect how those rows are read. Say I am processing a change to the tax rate in my state: my store clerk should not be able to read the total cost of a blue shirt, because the total-cost row is affected by any change in the tax-rate row. Essentially, how the reading and viewing of data is handled while a change is being processed but has not yet been committed is the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
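A rough sketch of this visibility rule, using SQLite rather than a full client-server DBMS: a row inserted but not yet committed by one connection is invisible to a second connection, so no reader is misinformed by in-progress changes. The file and table names are invented for illustration.

```python
# Sketch: uncommitted changes are isolated from other connections.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "iso.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE price(item TEXT, total REAL)")
writer.commit()

writer.execute("INSERT INTO price VALUES ('blue shirt', 21.99)")  # uncommitted
before = reader.execute("SELECT COUNT(*) FROM price").fetchone()[0]

writer.commit()                                                   # now visible
after = reader.execute("SELECT COUNT(*) FROM price").fetchone()[0]
print(before, after)  # -> 0 1
```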
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
- Lock-based protocols
- Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
- Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
- Shared/exclusive locks: this type of locking mechanism differentiates locks based on their use. If a lock is acquired on a data item in order to perform a write operation, it is an exclusive lock, because allowing more than one transaction to write the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the write operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, in which all the locks are being acquired by the transaction, and a shrinking phase, in which the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
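The growing/shrinking rule itself can be sketched as a small guard class. This is an illustrative sketch, not a real lock manager; the class and method names are invented.

```python
# Sketch of the two-phase rule: once a transaction releases any lock,
# it may not acquire new ones.

class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.shrinking = False      # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: lock after an unlock")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True       # the growing phase ends here
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")            # growing phase
t.unlock("A")                       # shrinking phase begins
try:
    t.lock("C")                     # illegal under 2PL
except RuntimeError as e:
    print(e)                        # -> 2PL violated: lock after an unlock
```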
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock just after using it: it holds all the locks until the commit point and releases them all at one time.
Strict-2PL does not suffer the cascading aborts that 2PL can.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 0002 would be older than all transactions that come after it; for example, any transaction y entering the system at 0004 is two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the system know when the last read and write operations were performed on the data item.
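A minimal sketch of the basic timestamp-ordering rules implied above: an operation is rejected when it arrives "too late" relative to the item's recorded read/write timestamps. Names are invented; a real system would also restart the rolled-back transaction with a new timestamp.

```python
# Sketch of basic timestamp ordering (TO) on a single data item.

class Item:
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest reader so far
        self.write_ts = 0   # timestamp of the youngest writer so far

def read(item, ts):
    if ts < item.write_ts:              # a younger txn already wrote it
        return "rollback"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "rollback"               # write would arrive out of order
    item.write_ts = ts
    return "ok"

x = Item()
print(read(x, 2))    # -> ok
print(write(x, 3))   # -> ok
print(read(x, 1))    # -> rollback (older than the last writer)
```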
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases This
includes
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a distributed denial-of-service (DDoS) attack or under user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties:
- Flat data: data was usually represented in a tabular format with strings or numbers in each field.
- Multiple users: a conventional database needed to support more than one user or system logged into the same data at the same time.
- Transactions: an essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation, and Durability.
- Large, long-lived data: a corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data The data for the early expert systems was used to
arrive at a specific answer such as a medical diagnosis the design of a molecule or a response
to an emergency[1] Once the solution to the problem was known there was not a critical demand
to store large amounts of data back to a permanent memory store A more precise statement
would be that given the technologies available researchers compromised and did without these
capabilities because they realized they were beyond what could be expected and they could
develop useful solutions to non-trivial problems without them Even from the beginning the
more astute researchers realized the potential benefits of being able to store analyze and reuse
knowledge For example see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al[2]
The volume requirements were also different for a knowledge base compared to a conventional database. The knowledge base needed to know facts about the world, for example to represent the statement "All humans are mortal". A database typically could not represent this general knowledge, but would instead need to store information about thousands of tables representing information about specific humans. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge base. Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added capabilities to their products that provided support for knowledge-base requirements, such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies best practices reusable designs and code etc In both cases the distinctions between the
uses and kinds of systems were ill-defined As the technology scaled up it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning and knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by us humans
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data; there is no way they can interfere with one another. Any practical database, though, has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system in which two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
Ans: In database systems, atomicity (from the Greek átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
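The A-to-B transfer above can be sketched with SQLite: the two UPDATEs either both take effect (COMMIT) or neither does (ROLLBACK after a simulated failure). Table and function names are invented for illustration.

```python
# Sketch: atomicity of a two-step money transfer.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account(name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

def transfer(con, amount, fail=False):
    try:
        con.execute("UPDATE account SET balance = balance - ? "
                    "WHERE name = 'A'", (amount,))
        if fail:                                  # simulated crash between
            raise RuntimeError("system failure")  # withdrawal and deposit
        con.execute("UPDATE account SET balance = balance + ? "
                    "WHERE name = 'B'", (amount,))
        con.commit()
    except RuntimeError:
        con.rollback()            # atomicity: the withdrawal is undone too

transfer(con, 50, fail=True)
print(dict(con.execute("SELECT * FROM account")))  # -> {'A': 100, 'B': 0}
transfer(con, 50)
print(dict(con.execute("SELECT * FROM account")))  # -> {'A': 50, 'B': 50}
```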
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
- All users should be able to access the same data.
- A user's view is immune to changes made in other views.
- Users should not need to know physical database storage details.
- The DBA should be able to change database storage structures without affecting the users' views.
- The internal structure of the database should be unaffected by changes to physical aspects of storage.
- The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
The three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user view differs from the way data is stored in the database; this view describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to the user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Data Control Language (DCL)
The data definition language defines and declares database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
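A minimal illustration of the first two sub-languages, using SQLite. SQLite has no DCL (no GRANT/REVOKE), so that part is shown only as a comment; the table name is invented.

```python
# Sketch: DDL defines an object, DML operates on it.
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define and declare a database object.
con.execute("CREATE TABLE employee(id INTEGER PRIMARY KEY, name TEXT)")

# DML: insert and retrieve data from the object.
con.execute("INSERT INTO employee(name) VALUES ('Ada')")
rows = con.execute("SELECT name FROM employee").fetchall()
print(rows)  # -> [('Ada',)]

# DCL (not supported by SQLite; in other systems it would look like):
# GRANT SELECT ON employee TO some_user;
```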
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query Optimizer - The DML commands (insert, update, delete, retrieve) from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized in the best way to execute the query by the query optimizer and sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
Convert operations in users Queries coming from the application programs or combination of
DML Compiler and Query optimizer which is known as Query Processor from users logical view
to physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities, and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation, and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary - the data dictionary is necessary in databases for the following reasons:
- It improves the control of the DBA over the information system and the users' understanding of the use of the system.
- It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
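As a concrete illustration, SQLite exposes exactly this kind of catalog through its sqlite_master table and the table_info pragma: table names, attribute names, and types are metadata queryable like ordinary data. The table and column names below are invented.

```python
# Sketch: SQLite's catalog playing the role of a data dictionary.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student(id INTEGER, name TEXT)")

# "Names of the tables" entry of the data dictionary:
tables = [r[0] for r in
          con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # -> ['student']

# "Names of attributes of each table" (with their declared types):
cols = [(r[1], r[2]) for r in con.execute("PRAGMA table_info(student)")]
print(cols)    # -> [('id', 'INTEGER'), ('name', 'TEXT')]
```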
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access commands known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer, or application specialist need not know the details of how the data are stored, since such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
- duplication of the same data in different files;
- wastage of storage space, since duplicated data is stored;
- errors generated due to updating of the same data in different files;
- time wasted entering the same data again and again;
- needless use of computer resources;
- difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In conventional systems, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS allows users who do not know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may even become necessary to ignore some requests for information if they conflict with higher-priority needs of the organization. It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed alongside DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand; the overall view is often not considered. Building an overall view of the organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes for backup and recovery from failures, including disk crashes, power failures and software errors, which help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships and attributes. There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many (1 : M)
Many to one (M : 1)
Many to many (M : M)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
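The Customer entity above can be sketched as a small class model. The following is an illustrative Python sketch (the class and field names follow the example, not any particular DBMS), showing how composite (address, street), multivalued (phone_numbers) and derived (age) attributes differ:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Street:                       # composite attribute nested inside Address
    street_name: str
    street_number: str
    apartment_number: str = ""

@dataclass
class Address:                      # composite attribute of Customer
    city: str
    state: str
    zip_code: str
    street: Street

@dataclass
class Customer:                     # the entity; customer_id is the primary key
    customer_id: int
    first_name: str
    last_name: str
    date_of_birth: date
    address: Address
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

    @property
    def age(self) -> int:           # derived attribute: computed, never stored
        today = date.today()
        return today.year - self.date_of_birth.year - (
            (today.month, today.day)
            < (self.date_of_birth.month, self.date_of_birth.day))
```

A derived attribute such as age is recomputed on demand, which is why an E-R diagram draws it with a dashed ellipse rather than storing it like a simple attribute.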
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
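A minimal sketch of the idea, assuming an in-memory student file keyed by a hypothetical primary key roll_no, with a secondary index on stud_name mapping one key value to many records:

```python
from collections import defaultdict

# Primary file: records keyed by the primary key (roll_no).
students = {
    1: {"roll_no": 1, "stud_name": "Asha", "branch": "MCA"},
    2: {"roll_no": 2, "stud_name": "Ravi", "branch": "MCA"},
    3: {"roll_no": 3, "stud_name": "Asha", "branch": "MBA"},
}

# Secondary index: maps a secondary-key value to the SET of matching primary keys.
name_index = defaultdict(list)
for roll_no, rec in students.items():
    name_index[rec["stud_name"]].append(roll_no)

def find_by_name(name):
    """Secondary-key retrieval: may return many records for one key value."""
    return [students[r] for r in name_index.get(name, [])]
```

Unlike a primary-key lookup, find_by_name("Asha") returns two records, which is exactly the multiple-records-per-key behaviour described in point (ii).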
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and every join dependency in it is a consequence of its candidate keys. Another way of expressing this is that there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be non-loss decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise);
For any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
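The decomposition can be checked mechanically. A small Python sketch (using the sample data above; the relation names come from the example) projects Buying onto the three binary tables and natural-joins them back, showing that the join is lossless and that "Claiborne sells jeans" now needs only one new Vendor-Item row:

```python
# Original 5NF-candidate relation as a set of (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# Project onto the three binary relations of the decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

def rejoin(bv, bi, vi):
    """Natural-join the three projections back into (buyer, vendor, item)."""
    return {(b, v, i)
            for b, v in bv
            for b2, i in bi if b == b2
            for v2, i2 in vi if v == v2 and i == i2}
```

Because the join dependency holds, rejoin() reproduces exactly the original five tuples, and recording that Liz Claiborne now sells jeans takes a single ("Liz Claiborne", "Jeans") tuple in Vendor-Item instead of one row per buyer.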
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product designed to support both batch and online application programs.
[Diagram: IMS architecture - Applications A and B, each written in a host language plus DL/I, access the database through the PCBs of their own PSB (PSB-A, PSB-B); the IMS control program maps these external views onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD), and the mapping of the physical database to storage is also given in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on the segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs. IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
• they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency;
• they hold for all time;
• they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, with the property that every functional dependency in Y is implied by the functional dependencies in X.
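Whether a dependency holds in a given relation instance can be checked mechanically. The helper below is a hedged illustration (the function name and the sample attributes are invented for the example): an FD lhs → rhs holds when no two rows agree on lhs but differ on rhs.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`
    (each row is a dict). Every determinant value must map to one rhs value."""
    seen = {}
    for row in rows:
        det = tuple(row[a] for a in lhs)   # determinant value
        dep = tuple(row[a] for a in rhs)   # dependent value
        if seen.setdefault(det, dep) != dep:
            return False                   # same determinant, different dependent
    return True
```

Note that passing this check on one instance only shows the FD is not violated by that data; a real FD is a statement about all possible instances of the relation.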
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, each corresponding to a specific normal form with known properties. As normalization proceeds, the relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
(A determinant is an attribute on which some other attribute is fully functionally dependent.)
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and every non-trivial multivalued dependency in it is in fact a functional dependency; 4NF thus removes the unwanted structures caused by multivalued dependencies. For a relation to be in fourth normal form, one of the following must hold:
• there is no multivalued dependency in the relation, or
• there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also considers multivalued dependencies.
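The effect of a multivalued dependency can be illustrated in a few lines of Python (the course/teacher/book relation is a standard textbook-style example, not taken from the question): when course ↠ teacher holds, splitting the table into (course, teacher) and (course, book) and rejoining loses nothing.

```python
# A relation violating 4NF: teachers and books for a course are independent,
# so every (teacher, book) combination must be stored.
course = {
    ("DB", "Smith", "Date"),
    ("DB", "Smith", "Ullman"),
    ("DB", "Jones", "Date"),
    ("DB", "Jones", "Ullman"),
}

# 4NF decomposition into two binary relations.
course_teacher = {(c, t) for c, t, b in course}
course_book    = {(c, b) for c, t, b in course}

# Natural join of the two projections on `course`.
rejoined = {(c, t, b)
            for c, t in course_teacher
            for c2, b in course_book if c == c2}
```

The rejoined relation equals the original, confirming the decomposition is lossless; adding a new book for the course now takes one row instead of one row per teacher.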
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and provide it efficiently, with extensive detail such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups take, and how great your risk of data loss will be when a breakdown occurs. System breakdowns happen all the time, even to the best-configured systems; this is why you have to explore the options available in order to prepare for the worst.
Database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls:
• the speed and size of your transaction log backups;
• the degree to which you are at risk of losing committed transactions in the event of media failure.
Models
There are three database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified point in time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log and to recover to a log mark.
• CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans:
Q5
EITHER
(a) What is deadlock? How can it be avoided? How can it be resolved once it occurs?
Ans: A deadlock occurs when two different users or transactions require access to data that is being locked by the other. It can be dealt with in two ways: by taking measures that prevent deadlocks from happening, and by providing ways to break a deadlock after it happens. One way to prevent or avoid deadlocks is to require the user to request all necessary locks at one time, ensuring they gain access to everything they need or nothing. Deadlocks can also sometimes be avoided by setting a resource access order, meaning resources must be locked in a certain order. Once a deadlock does occur, the DBMS must have a method for detecting it; to resolve it, the DBMS must select a victim transaction to cancel and revert that entire transaction, so that the resources required become available, allowing one transaction to complete while the other is reprocessed at a later time.

Explain the meaning of the expression "ACID transaction".
ACID stands for Atomicity, Consistency, Isolation, Durability. When any transaction happens it should be atomic, that is, it should either complete fully or not at all; there should not be anything like a semi-complete transaction. The database state should remain consistent after the completion of the transaction. If there is more than one transaction, the transactions should be scheduled in such a fashion that they remain in isolation from one another. Durability means that once a transaction commits, its effects will persist even if there are system failures.

What is the purpose of transaction isolation levels?
Transaction isolation levels affect how the database operates while transactions are in the process of being changed. Their purpose is to ensure consistency throughout the database. For example, if I am changing a row which affects the calculations or outputs of several other rows, then all rows that are affected, or possibly affected, by the change will be locked from changes until my change is complete. This isolates the change and ensures that the data interaction remains accurate and consistent; this is known as transaction-level consistency. The transaction being changed may also affect how other rows are read. Say I am processing a change to the tax rate in my state: my store clerk should not be able to read the total cost of a blue shirt, because the total-cost row is affected by any change in the tax-rate row. Essentially, how you deal with the reading and viewing of data while a change is being processed but has not yet been committed is the transaction isolation level. Its purpose is to ensure that no one is misinformed prior to a transaction being committed.
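The deadlock-detection step described earlier is usually implemented as a cycle search over a wait-for graph, with an edge from each transaction to the transactions whose locks it is waiting on. A minimal sketch (the function and the graph encoding are illustrative, not a specific DBMS API):

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph {txn: set of txns it waits on}.
    A cycle in this graph means the transactions on it are deadlocked."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / done
    colour = {t: WHITE for t in wait_for}

    def dfs(t):
        colour[t] = GREY
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GREY:
                return True               # back edge => cycle => deadlock
            if colour.get(u, WHITE) == WHITE and dfs(u):
                return True
        colour[t] = BLACK
        return False

    return any(colour[t] == WHITE and dfs(t) for t in wait_for)
```

When the check returns True, the DBMS picks a victim on the cycle and rolls it back, which is exactly the resolution strategy described above.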
(b) Explain concurrency control and database recovery in detail
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure the atomicity, isolation and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
• Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive locks: this type of locking mechanism differentiates the locks based on their use. If a lock is acquired on a data item in order to perform a write operation, it is an exclusive lock, since allowing more than one transaction to write the same data item would lead the database into an inconsistent state. Read locks are shared, because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the "write" operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they
need locks Before initiating an execution the transaction requests the system for all the locks it
needs beforehand If all the locks are granted the transaction executes and releases all the locks
when all its operations are over If all the locks are not granted the transaction rolls back and
waits until all the locks are granted
Two-Phase Locking (2PL)
This locking protocol divides the execution phase of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase, the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking has two phases: one is growing, where all the locks are being acquired by the transaction; the second phase is shrinking, where the locks held by the transaction are being released. To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as that of 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock after using it: it holds all the locks until the commit point and releases them all at one time. Strict-2PL therefore does not suffer from cascading aborts, as 2PL can.
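The two-phase rule itself is easy to state in code. Below is an illustrative sketch (not a real lock manager: there is no waiting, conflict handling or shared/exclusive distinction) that simply enforces "no acquire after the first release":

```python
class TwoPhaseTxn:
    """Enforces the two-phase rule: once any lock is released (shrinking
    phase), the transaction may not acquire further locks."""

    def __init__(self):
        self.locks = set()
        self.shrinking = False      # flips to True at the first release

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        self.locks.add(item)        # growing phase

    def release(self, item):
        self.shrinking = True       # shrinking phase begins
        self.locks.discard(item)
```

Strict-2PL, by contrast, would never call release() until commit; modelling it here would simply mean releasing every lock in one step at commit time.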
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp. Lock-based protocols manage the order between conflicting pairs of transactions at the time of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at 0002 clock time would be older than all transactions that come after it; for example, a transaction y entering the system at 0004 is two seconds younger, and priority is given to the older one.
In addition, every data item carries the latest read- and write-timestamps, which let the system know when the last "read" and "write" operations were performed on the data item.
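The read/write rules of basic timestamp ordering can be sketched as follows (a simplified illustration; a real system would also roll back and restart the aborted transaction with a new timestamp):

```python
class Item:
    """A data item carrying the latest read- and write-timestamps."""
    def __init__(self):
        self.read_ts = 0    # timestamp of the youngest transaction that read it
        self.write_ts = 0   # timestamp of the youngest transaction that wrote it

def read(item, ts):
    """A reader older than the last writer must abort (it would see the future)."""
    if ts < item.write_ts:
        return False                        # abort: overwritten by a younger txn
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    """A writer older than the last reader or writer must abort."""
    if ts < item.read_ts or ts < item.write_ts:
        return False                        # abort: would invalidate a younger txn
    item.write_ts = ts
    return True
```

The checks encode exactly the age-based priority described above: an operation that would run "behind" a younger transaction's view of the item is rejected.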
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases. This includes:
Data stored in database
Database server
Database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented and maintained by a database administrator and/or other information security professional.
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
• Load/stress testing and capacity testing of the database to ensure it does not crash under a distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment against theft and natural disasters
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans:
The term knowledge-base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of information technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
• Flat data: data was usually represented in a tabular format, with strings or numbers in each field.
• Multiple users: a conventional database needed to support more than one user or system logged into the same data at the same time.
• Transactions: an essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
• Large, long-lived data: a corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users, or for the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work on the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store information about thousands of specific humans.
Representing that all humans are mortal, and being able to reason about any given human that
they are mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna,
Mike, and hundreds of thousands of other customers are all humans with specific ages, sex,
address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged. These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext, and multimedia support were now critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge management
vendors such as Lotus Notes. Knowledge management actually predated the Internet, but with
the Internet there was great synergy between the two areas. Knowledge management products
adopted the term knowledge-base to describe their repositories, but the meaning had a subtle
difference. In the case of previous knowledge-based systems, the knowledge was primarily for
the use of an automated system, to reason about and draw conclusions about the world. With
knowledge management products, the knowledge was primarily meant for humans, for example
to serve as a repository of manuals, procedures, policies, best practices, reusable designs and
code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined.
As the technology scaled up, it was rare to find a system that could really be cleanly classified as
knowledge-based in the sense of an expert system that performed automated reasoning, or
knowledge-based in the sense of knowledge management that provided knowledge in the form
of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018–2019
Subject: DBMS
MCA 1st Year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations
without them conflicting with each other. Concurrent access is quite easy if all users are just
reading data; there is no way they can interfere with one another. However, any practical
database will have a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you make sure that database transactions are performed concurrently without
violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system
where two or more database transactions that require access to the same data are executed
simultaneously.
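The conflict described above can be sketched in a few lines of Python. This is a minimal illustration, not a DBMS: a lock plays the role of a write lock, making each read-modify-write on a shared "balance" atomic so that four concurrent writers cannot interleave incorrectly. All names here are invented for the example.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    """Each call performs `times` read-modify-write operations on the shared balance."""
    global balance
    for _ in range(times):
        with lock:        # plays the role of a DBMS write lock
            balance += 1  # read-modify-write, now atomic per iteration

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # with the lock, always 40000
```

Without the lock, the interleaved read-modify-write sequences could lose updates, which is exactly the kind of conflict a DBMS concurrency-control mechanism prevents.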
(ii) Atomicity property
Ans: In database systems, atomicity (from the Greek átomos, "undividable") is one of the
ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring
only partially, which can cause greater problems than rejecting the whole series outright. As a
consequence, the transaction cannot be observed to be in progress by another database client:
at one moment in time it has not yet happened, and at the next it has already occurred in whole
(or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a
consistent state, that is, money is neither lost nor created if either of those two operations fails.
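The bank-transfer example can be sketched with Python's built-in sqlite3 module, whose connection object commits a transaction on success and rolls it back on an exception. The table and account names are invented for illustration; this is a sketch of the atomicity idea, not a production banking design.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
con.commit()

def transfer(con, src, dst, amount):
    try:
        with con:  # transaction: commits on success, rolls back on exception
            con.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                        (amount, src))
            cur = con.execute("SELECT balance FROM account WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # forces a rollback
            con.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                        (amount, dst))
    except ValueError:
        pass  # the whole transfer is undone; neither UPDATE persists

transfer(con, 'A', 'B', 60)    # succeeds
transfer(con, 'A', 'B', 100)   # fails and rolls back: balances unchanged
print(dict(con.execute("SELECT name, balance FROM account")))
```

The second transfer attempts the withdrawal, detects the negative balance, and the rollback undoes it, so money is neither lost nor created.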
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below:
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Each user is not concerned with the entire database; only the part that is
relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is
used to control the user's access to database objects.
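The three sublanguages can be illustrated with a short sketch using Python's sqlite3 module. The table and column names are invented for the example; note that SQLite itself has no DCL statements such as GRANT/REVOKE, so the DCL part is shown only as a comment.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define and declare a database object
con.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object
con.execute("INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi')")
con.execute("UPDATE student SET name = 'Ravi K' WHERE student_id = 2")
rows = con.execute(
    "SELECT student_id, name FROM student ORDER BY student_id"
).fetchall()
print(rows)  # [(1, 'Asha'), (2, 'Ravi K')]

# DCL (not supported by SQLite; shown for completeness):
#   GRANT SELECT ON student TO some_user;
```

In a full client-server DBMS, the GRANT statement would control which users may run the SELECT above.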
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are explained above.
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update, and
retrieval) on the database. The components of the DBMS perform these requested operations on
the database and provide the necessary data to the users.
Fig.: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes schema definitions
specified in the DDL. It stores metadata such as the names of the files and data items, storage
details of each file, mapping information, and constraints.
2 DML Compiler and Query Optimizer - DML commands such as insert, update, delete, and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the
best way to execute the query, and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also
known as the Database Control System.
The main functions of the Data Manager are:
Converting operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the Query Processor), from the
user's logical view to the physical file system.
Controlling access to DBMS information that is stored on disk.
Handling buffers in main memory.
Enforcing constraints to maintain the consistency and integrity of the data.
Synchronizing the simultaneous operations performed by concurrent users.
Controlling the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the
database. It contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities, and their access rights.
6 Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation, and accuracy, and may be considered an important part of the DBMS.
Importance of Data Dictionary - The data dictionary is necessary in databases due to the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online users
3 Application programmers
4 Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus, a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over a conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database becomes
a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from
one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system, every user group
maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since duplicated data is stored.
• Errors generated due to updating of the same data in different files.
• Time wasted in entering data again and again.
• Computer resources being needlessly used.
• Difficulty in combining information.
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to other files.
This may lead to inconsistent data. We therefore need to remove this duplication of data
across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
timeliness are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in
the database changes.
5 Integrity can be improved - Since data in the database approach is centralized and used by a
number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes
may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the
purpose of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work as the most important and
therefore its needs as the most important. Once a database has been set up with centralized
control, it will be necessary to identify the organization's requirements and to balance the needs
of the competing units. It may become necessary to ignore some requests for information if
they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, one
normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages
developed with DBMSs than when using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand; the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for recovery and
backup from failures, including disk crashes, power failures, and software errors, which may
help the database recover from an inconsistent state to the state that existed prior to the
occurrence of the failure, though the methods are very complex.
QUE 2- EITHER
(A) Explain the ER model with a suitable example.
Ans: It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process, with all business managers (or designates) involved, and
should be validated with a "bottom-up" approach. It has three primary components: entity,
relationship, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID,
student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
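One plausible way to map the Customer example above into relational tables, sketched with sqlite3: the composite attributes (name, address, street) are flattened into individual columns, and the multivalued phone_number attribute becomes a separate table keyed by the owning entity. The column names follow the example; the sample data is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# composite attributes flattened into columns
con.execute("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,
    first_name       TEXT, middle_name TEXT, last_name TEXT,
    date_of_birth    TEXT,
    city TEXT, state TEXT, zip_code TEXT,
    street_name TEXT, street_number TEXT, apartment_number TEXT
)""")
# multivalued attribute -> its own table referencing the entity's key
con.execute("""
CREATE TABLE customer_phone (
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
)""")

con.execute("INSERT INTO customer (customer_id, first_name, last_name) "
            "VALUES (1, 'John', 'Doe')")
con.execute("INSERT INTO customer_phone VALUES (1, '555-0101'), (1, '555-0102')")
n = con.execute("SELECT COUNT(*) FROM customer_phone "
                "WHERE customer_id = 1").fetchone()[0]
print(n)  # 2: one customer, two phone numbers
```

A derived attribute such as age would not be stored at all; it would be computed from date_of_birth at query time.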
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file, and direct file organizations we have
considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the
set of records which satisfy the given value.
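A minimal sketch of a secondary index in Python: the primary index maps the unique stud_id to a record, while the secondary index maps the non-unique stud_name to the list of matching record ids. The records and names are invented for illustration.

```python
# primary index: stud_id -> record
students = {
    101: {"stud_name": "Amit",  "city": "Nagpur"},
    102: {"stud_name": "Priya", "city": "Pune"},
    103: {"stud_name": "Amit",  "city": "Mumbai"},
}

# build the secondary index: stud_name -> [stud_ids]
secondary = {}
for sid, rec in students.items():
    secondary.setdefault(rec["stud_name"], []).append(sid)

# retrieval on the secondary key returns a SET of records, not just one
matches = [students[sid] for sid in secondary.get("Amit", [])]
print(matches)  # two records satisfy stud_name = 'Amit'
```

This is exactly point (ii) above: unlike a primary-key lookup, the secondary-key lookup for "Amit" yields multiple records.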
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A,B,C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows: if a table can be decomposed into three or more
smaller tables, it must be capable of being joined again on common keys to form the original
table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it
cannot have a lossless decomposition into any number of smaller tables. Another way of
expressing this is that each join dependency is a consequence of the candidate keys. It can also
be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three
or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
you always need to know two values (pairwise);
for any one, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you
create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order
to determine the item you must know the buyer and vendor, to determine the vendor you must
know the buyer and the item, and finally, to know the buyer you must know the vendor and the
item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and
Vendor-Item.
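The decomposition described above can be checked in a few lines of Python: project the Buying sample data onto the three pairwise tables, then rejoin them on the common attributes and verify that exactly the original rows come back (i.e. the join dependency holds for this data, so the three-way split is lossless).

```python
# the Buying(buyer, vendor, item) sample data from the text
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# the three projections: Buyer-Vendor, Buyer-Item, Vendor-Item
buyer_vendor = {(b, v) for b, v, _ in buying}
buyer_item   = {(b, i) for b, _, i in buying}
vendor_item  = {(v, i) for _, v, i in buying}

# natural join of the three projections on their common attributes
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    for (v2, i2) in vendor_item if v2 == v and i2 == i
}
print(rejoined == buying)  # True: the decomposition is lossless
```

When Claiborne starts selling jeans, only one row is added to Vendor-Item, instead of one Buying row per jeans-buying customer.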
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to
support both batch and online application programs.
[Fig.: IMS architecture - Applications A and B, each written in a host language plus DL/I, access
the IMS control program through their PSBs (PSB-A, PSB-B), each consisting of PCBs; the
control program maps these onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of
another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every
candidate key, and no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the
dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify
a set of functional dependencies (X) for a relation that is smaller than the complete set of
functional dependencies (Y) for that relation and has the property that every functional
dependency in Y is implied by the functional dependencies in X.
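The definition above can be made concrete with a small sketch: a functional dependency X -> Y holds in a relation instance if each value of the determinant X maps to exactly one value of Y. The helper name and the attribute values are invented for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`
    (a list of dicts). Returns False on the first violating pair."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same determinant value, different dependent value
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Nagpur"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Nagpur"},
    {"emp_id": 3, "dept": "HR",    "dept_city": "Pune"},
]

print(fd_holds(emp, ["dept"], ["dept_city"]))   # True: dept determines dept_city
print(fd_holds(emp, ["dept_city"], ["emp_id"])) # False: not a determinant
```

Note that such a check only confirms a dependency for one instance; a real functional dependency must hold for all time, as stated above.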
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal
form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet
the desirable properties.
Normalization in industry pays particular attention to normal forms up to 3NF, BCNF, or 4NF;
we will pay particular attention to forms up to 3NF. Database designers need not normalize to
the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, where
each step corresponds to a specific normal form with known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format
and also less vulnerable to update anomalies.
NF²: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
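A small worked example may help. The relation below (course, teacher, book) is invented for illustration; it has the multivalued dependency course ↠ teacher (and, independently, course ↠ book), so it is not in 4NF. Decomposing it into its two projections removes the redundancy, and rejoining them recovers the original relation, confirming the decomposition is lossless:

```python
# Hypothetical relation with the multivalued dependency course ->-> teacher
# (and independently course ->-> book): every teacher of a course is
# paired with every book used in that course.
ctb = {("DB", "Rao", "Elmasri"), ("DB", "Rao", "Date"),
       ("DB", "Shah", "Elmasri"), ("DB", "Shah", "Date")}

# 4NF decomposition: project onto (course, teacher) and (course, book).
ct = {(c, t) for c, t, b in ctb}
cb = {(c, b) for c, t, b in ctb}

# Lossless-join check: joining the projections gives back the original.
rejoined = {(c, t, b) for (c, t) in ct for (c2, b) in cb if c == c2}
print(rejoined == ctb)  # True: the decomposition is lossless
```

Note how the original relation needs four tuples to record two teachers and two books, while the 4NF projections need only two tuples each.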
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language (OQL).
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases (for example, VOSS) offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions, account entries, etc.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
• The speed and size of your transaction log backups
• The degree to which you might be at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery models available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified time can be achieved after a media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log so that you can recover to a log mark.
• CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations proceeds faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
(b) Explain concurrency control and database recovery in detail.
Ans: In a multiprogramming environment, where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions. We have concurrency control protocols to ensure atomicity, isolation, and serializability of concurrent transactions. Concurrency control protocols can be broadly divided into two categories:
• Lock-based protocols
• Timestamp-based protocols
Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds:
• Binary locks: a lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive locks: this type of locking mechanism differentiates locks based on their use. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock, since allowing more than one transaction to write to the same data item would lead the database into an inconsistent state. Read locks are shared because no data value is being changed.
There are four types of lock protocols available:
Simplistic Lock Protocol
Simplistic lock-based protocols require transactions to obtain a lock on every object before a write operation is performed. Transactions may unlock the data item after completing the 'write' operation.
Pre-claiming Lock Protocol
Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks. Before initiating an execution, the transaction requests the system for all the locks it needs beforehand. If all the locks are granted, the transaction executes and releases all the locks when all its operations are over. If the locks are not all granted, the transaction rolls back and waits until they are.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first part, when the transaction starts executing, it seeks permission for the locks it requires. The second part is where the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase starts; in this phase the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking thus has two phases: a growing phase, where all the locks are being acquired by the transaction, and a shrinking phase, where the locks held by the transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL. After acquiring all the locks in the first phase, the transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a lock immediately after using it: it holds all the locks until the commit point and releases them all at once. Strict-2PL therefore does not have cascading aborts, as 2PL does.
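The shared/exclusive compatibility rules described above can be sketched as a minimal lock table. This is an illustration only, not a full DBMS lock manager: there is no wait queue and no deadlock detection, and the class and transaction names are invented.

```python
# Minimal shared/exclusive lock table sketch.
class LockTable:
    def __init__(self):
        self.locks = {}            # item -> (mode, set of holding txns)

    def acquire(self, txn, item, mode):        # mode is "S" or "X"
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})   # item was free
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)                   # shared locks are compatible
            return True
        if holders == {txn}:                   # lone holder may upgrade S -> X
            self.locks[item] = ("X" if mode == "X" else held_mode, holders)
            return True
        return False                           # conflict: caller must wait or roll back

    def release(self, txn, item):
        mode, holders = self.locks[item]
        holders.discard(txn)
        if not holders:
            del self.locks[item]

lt = LockTable()
print(lt.acquire("T1", "A", "S"))  # True
print(lt.acquire("T2", "A", "S"))  # True: two readers share item A
print(lt.acquire("T2", "A", "X"))  # False: T1 still holds a shared lock on A
```

Under 2PL a transaction would call `acquire` only during its growing phase and `release` only during its shrinking phase; under Strict-2PL all the `release` calls happen together at commit.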
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol uses either system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of transactions at execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the transaction. A transaction created at clock time 00:02 would be older than all transactions that come after it; for example, any transaction y entering the system at 00:04 is two seconds younger, and priority is given to the older one.
In addition, every data item is given the latest read- and write-timestamps. This lets the system know when the last read and write operations were performed on the data item.
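The basic timestamp-ordering rules implied by this description can be sketched as follows. This is a simplified illustration under the assumption that larger timestamps mean younger transactions; the function names and the "rollback" signalling are invented for the example.

```python
# Basic timestamp-ordering sketch: each item tracks the largest read and
# write timestamps seen so far, and an operation that arrives "too late"
# forces its transaction to roll back.
read_ts, write_ts = {}, {}        # item -> latest timestamp (0 if never touched)

def read(txn_ts, item):
    if txn_ts < write_ts.get(item, 0):
        return "rollback"          # a younger txn already wrote this item
    read_ts[item] = max(read_ts.get(item, 0), txn_ts)
    return "ok"

def write(txn_ts, item):
    if txn_ts < read_ts.get(item, 0) or txn_ts < write_ts.get(item, 0):
        return "rollback"          # a younger txn already read or wrote it
    write_ts[item] = txn_ts
    return "ok"

print(write(5, "A"))   # ok: first write to A
print(read(3, "A"))    # rollback: txn 3 is older than the writer (ts 5)
print(read(7, "A"))    # ok
print(write(6, "A"))   # rollback: txn 7 has already read A
```

The contrast with locking is visible here: no transaction ever waits; a conflicting operation simply aborts its transaction, which restarts with a new timestamp.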
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases. This includes:
• Data stored in the database
• The database server
• The database management system (DBMS)
• Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator and/or other information security professionals.
Some of the ways database security is analyzed and implemented include:
• Restricting unauthorized access and use by implementing strong and multifactor access and data management controls
• Load/stress testing and capacity testing of a database to ensure it does not crash under a distributed denial of service (DDoS) attack or user overload
• Physical security of the database server and backup equipment against theft and natural disasters
• Reviewing the existing system for any known or unknown vulnerabilities, and defining and implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large Management Information Systems stored their data in some type of hierarchical or relational database. At this point in the history of Information Technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
• Flat data: data was usually represented in a tabular format with strings or numbers in each field.
• Multiple users: a conventional database needed to support more than one user or system logged into the same data at the same time.
• Transactions: an essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation, and Durability.
• Large, long-lived data: a corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in the artificial intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze, and reuse knowledge. For example, see the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional database. The knowledge base needed to know facts about the world, for example to represent the statement that "All humans are mortal". A database typically could not represent this general knowledge, but would instead need to store information about thousands of rows that represented information about specific humans. Representing that all humans are mortal, and being able to reason about any given human that they are mortal, is the work of a knowledge base. Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc., is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged; these were systems designed from the ground up to support object-oriented capabilities but also to support standard database services. On the other hand, the large database vendors, such as Oracle, added capabilities to their products that provided support for knowledge-base requirements such as class–subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge base was the Internet. With the rise of the Internet, documents, hypertext, and multimedia support became critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors, such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018–2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data, as there is no way they can interfere with one another. However, any practical database will have a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you to make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
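The transfer example can be sketched in a few lines. This is an illustration of the all-or-nothing idea only, not how a real DBMS implements atomicity (which uses logging rather than a full snapshot); the function and account names are invented.

```python
# All-or-nothing transfer sketch: either both the debit and the credit
# apply, or the balances are restored to their pre-transaction values.
def transfer(accounts, src, dst, amount):
    snapshot = dict(accounts)          # cheap "undo log" for this sketch
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # roll back: nothing happened
        return False
    return True                        # commit: everything happened

accounts = {"A": 100, "B": 50}
print(transfer(accounts, "A", "B", 30), accounts)   # True {'A': 70, 'B': 80}
print(transfer(accounts, "A", "B", 999), accounts)  # False, balances unchanged
```

Note that the total of the two balances is the same before and after every call, whether the transfer committed or rolled back, which is exactly the consistency that atomicity protects.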
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user view is different from the way data is stored in the database; this view describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to the user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
• Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
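The division of labour between DDL and DML can be seen in one short session. SQLite (via Python's standard `sqlite3` module) is used here purely for illustration; it has no DCL such as GRANT/REVOKE, which would appear in a multi-user DBMS, and the table and names are invented.

```python
import sqlite3

# DDL defines the object; DML populates and queries it.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT)")  # DDL
con.execute("INSERT INTO student VALUES (1, 'Asha')")                      # DML
con.execute("INSERT INTO student VALUES (2, 'Ravi')")                      # DML
rows = con.execute("SELECT name FROM student ORDER BY roll").fetchall()    # DML
print(rows)  # [('Asha',), ('Ravi',)]
con.close()
```

In a server DBMS, a DCL statement such as `GRANT SELECT ON student TO some_user` would complete the trio by controlling who may run the DML above.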
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language they are using; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are explained above.
(C) Describe the structure of DBMS
Ans: The DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update, and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified in the DDL. It includes metadata information such as the names of files and data items, storage details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete, and retrieve from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer to find the best way to execute the query, and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
• Converting operations in user queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the Query Processor), from the user's logical view to the physical file system.
• Controlling DBMS information access that is stored on disk.
• Handling buffers in main memory.
• Enforcing constraints to maintain the consistency and integrity of the data.
• Synchronizing the simultaneous operations performed by concurrent users.
• Controlling backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data: names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization: a description of database users, their responsibilities, and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control the data integrity, database operation, and accuracy, and may be used as an important part of the DBMS.
Importance of the Data Dictionary - The data dictionary is necessary in databases for the following reasons:
• It improves the DBA's control of the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the automatic teller machine, only one or more of the user's own accounts. For other such naïve users, the type and range of response is always indicated to the user. Thus, a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, PASCAL, or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e., program/data independence: the user, programmer, or application specialist need not know the details of how the data are stored, as such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g., changing the format of data items (real to integer arithmetic operations), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g., from optical to magnetic storage, or from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating the same data in different files
• Time wasted in entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness is likely to improve, since the data can now be shared, and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system, where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints. In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often, different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organizational requirements can be identified - All organizations have sections and departments, and each of these units often considers the work of its unit as the most important, and therefore its needs as the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may thus become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages
developed alongside DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes for backup and
recovery from failures, including disk crashes, power failures, and software errors, which may
help the database recover from an inconsistent state to the state that existed prior to the
failure, though the methods involved are complex.
QUE 2 - EITHER
(A) Explain the ER model with a suitable example.
Ans. It is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Building it is an
iterative, team-oriented process in which all business managers (or their designates) should be
involved, and the result should be validated with a "bottom-up" approach. The model has three
primary components: entity, relationship, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or
a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there
are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student (entity) has attributes such as student ID,
student name, address, etc.
Attributes are of various types:
Simple/single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a
computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 ------ M
Many to one: M ------ 1
Many to many: M ------ M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where
street is itself composite (street_name, street_number, apartment_number).
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans. In the sequential file, index sequential file, and direct file organizations we have considered the
retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
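Secondary key retrieval can be sketched in a few lines of Python. The sample student records and the index structure below are invented for illustration; the point is that one secondary key value ("stud_name") maps to a set of matching records, not a single one:

```python
# Illustrative sketch: a secondary-key index maps a non-unique
# attribute such as stud_name to the positions of all matching
# records, so one lookup can return many records.
from collections import defaultdict

students = [
    {"roll_no": 1, "stud_name": "Asha", "city": "Nagpur"},
    {"roll_no": 2, "stud_name": "Ravi", "city": "Pune"},
    {"roll_no": 3, "stud_name": "Asha", "city": "Mumbai"},
]

# Build a secondary index on stud_name (the secondary key).
name_index = defaultdict(list)
for pos, rec in enumerate(students):
    name_index[rec["stud_name"]].append(pos)

# Secondary-key retrieval: one key value, possibly many records.
matches = [students[p] for p in name_index["Asha"]]
```

Contrast this with primary key retrieval on roll_no, where at most one record can match.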
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C) be a schema and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4 - EITHER
(A) What is join dependency? Discuss 5NF.
Ans. Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
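The decomposition can be checked mechanically. The Python sketch below (using the sample data above) projects Buying onto its three binary tables and re-joins them; getting back exactly the original rows is the join dependency on which 5NF rests:

```python
# The Buying(buyer, vendor, item) sample from the text, decomposed
# into its three binary projections; their natural join reconstructs
# the original relation (a lossless three-way decomposition).
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three projections (Buyer-Vendor, Buyer-Item, Vendor-Item).
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of all three projections.
rejoined = {
    (b, v, i)
    for b, v in buyer_vendor
    for b2, i in buyer_item if b2 == b
    for v2, i2 in vendor_item if v2 == v and i2 == i
}
```

With the three-table design, recording that Claiborne starts to sell jeans is a single new row in Vendor-Item rather than one row per buyer.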
(B) Explain the architecture of an IMS system.
Ans. Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Figure: IMS system architecture - application programs A and B, each written in a host language plus
DL/I, access the IMS control program through their program specification blocks (PSB-A, PSB-B); each
PSB contains PCBs, which map onto the DBDs describing the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD), which also defines the mapping of the physical
database to storage. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs; IMS does not provide an integrated query
language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of
another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate
key, and no attribute in the key can be deleted without destroying the property of unique
identification.
The main characteristics of functional dependencies used in normalization are that they have a
1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency,
they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify
a set of functional dependencies (X) for a relation that is smaller than the complete set of
functional dependencies (Y) for that relation and has the property that every functional
dependency in Y is implied by the functional dependencies in X.
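Whether an FD holds in a given relation instance can be checked directly from the definition: X -> Y fails exactly when two rows agree on X but differ on Y. A small sketch (the employee rows are invented for illustration):

```python
# Check a functional dependency X -> Y against a relation instance:
# it fails iff two rows share the same X-values but differ on Y.
def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # same determinant, different dependent value
        seen[key] = val
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Nagpur"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Nagpur"},
    {"emp_id": 3, "dept": "HR",    "dept_city": "Pune"},
]
```

Here emp_id -> dept and dept -> dept_city hold, while dept_city -> emp_id does not (two employees share the city "Nagpur"). Note that holding in one instance does not prove the FD holds "for all time"; that is a design decision about the enterprise.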
(D) Explain 4NF with examples.
Ans. Normalization: The process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties. In industry, particular attention is paid to normalization up to 3NF, BCNF, or
4NF; here we will pay particular attention to forms up to 3NF. Database designers need not
normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, where each
step corresponds to a specific normal form with known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all of its multivalued dependencies are functional dependencies. 4NF
removes the unwanted data structures caused by multivalued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
Q5
EITHER
(A) What are object-oriented database systems? What are their features?
Ans. Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a
more declarative programming approach. It is in the area of object query languages, and the integration
of the query and navigational interfaces, that the biggest differences between products are found. An
attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation
of a relational database). This is because an object can be retrieved directly, without a search, by
following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer
following.)
Another area of variation between products is in the way that the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans. SQL Server database recovery models give you backup-and-restore flexibility. The model used will
determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups.
The degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log, and when data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified point in time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log so that you can
recover to a log mark.
This model also logs CREATE INDEX operations; recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this
model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans.
Two-Phase Locking (2PL)
This locking protocol divides the execution of a transaction into three parts. In the first part, when
the transaction starts executing, it seeks permission for the locks it requires. The second part is where
the transaction acquires all the locks. As soon as the transaction releases its first lock, the third phase
starts; in this phase the transaction cannot demand any new locks, it only releases the acquired locks.
Two-phase locking thus has two phases: one is growing, where all the locks are being acquired by
the transaction, and the second phase is shrinking, where the locks held by the transaction are
being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then
upgrade it to an exclusive lock.
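The growing/shrinking rule can be captured in a few lines. The class below is a minimal invented sketch (it tracks only one transaction's lock set, not conflicts between transactions): once the first release happens, any further acquisition is a protocol violation.

```python
# Minimal sketch of the 2PL rule: locks may only be acquired during
# the growing phase; the first release starts the shrinking phase,
# after which any new lock request violates the protocol.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after first release")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True  # first release ends the growing phase
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.acquire("A")
t.acquire("B")
t.release("A")  # shrinking phase starts here; acquire() now raises
```

Strict 2PL, described next, is the special case where release() is deferred for every lock until commit.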
Strict Two-Phase Locking
The first phase of Strict-2PL is the same as in 2PL: after acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release a
lock immediately after using it; it holds all the locks until the commit point and releases them all at
one time.
Strict-2PL does not suffer from cascading aborts as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs of operations among transactions at
the time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it; for example, a transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read-timestamp and write-timestamp. This lets the
system know when the last read and write operations were performed on the data item.
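The per-item read/write timestamps drive a simple accept/reject rule. The sketch below is a simplified illustration of basic timestamp ordering (not a full scheduler; in a real system a rejected operation rolls the transaction back and restarts it with a new timestamp):

```python
# Simplified timestamp-ordering checks: an operation is rejected when
# a younger transaction has already accessed the item in a
# conflicting way.
class Item:
    def __init__(self):
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far

def read(item, ts):
    if ts < item.write_ts:          # a younger txn already wrote it
        return False                # reject: transaction must roll back
    item.read_ts = max(item.read_ts, ts)
    return True

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return False                # reject: would invalidate younger ops
    item.write_ts = ts
    return True
```

For example, after a transaction with timestamp 2 writes an item, a read by an older transaction (timestamp 1) is rejected, while a read by a younger one (timestamp 3) succeeds.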
OR
(b) Explain database security mechanisms.
Database security covers and enforces security on all aspects and components of databases. This
includes:
Data stored in the database
The database server
The database management system (DBMS)
Other database workflow applications
Database security is generally planned, implemented, and maintained by a database administrator
and/or other information security professionals.
Some of the ways database security is analyzed and implemented include:
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls.
Load/stress testing and capacity testing of the database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload.
Physical security of the database server and backup equipment against theft and natural
disasters.
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans.
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of Information Technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: Data was usually represented in a tabular format with strings or numbers in each
field.
Multiple users: A conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: An essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called ACID
properties: Atomicity, Consistency, Isolation, and Durability.
Large, long-lived data: A corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze, and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store thousands of rows that represented information
about specific humans. Representing that all humans are mortal, and being able to reason about
any given human that they are mortal, is the work of a knowledge base. Representing that
George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other customers are all humans
with specific ages, sex, address, etc. is the work of a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments,
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged: systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution of the term knowledge-base was driven by the Internet. With the rise of the Internet,
documents, hypertext, and multimedia support became critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents; this created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes. Knowledge management actually predated the
Internet, but with the Internet there was great synergy between the two areas. Knowledge
management products adopted the term knowledge-base to describe their repositories, but the
meaning had a subtle difference. In the case of previous knowledge-based systems, the
knowledge was primarily for the use of an automated system, to reason about and draw
conclusions about the world. With knowledge management products, the knowledge was
primarily meant for humans, for example to serve as a repository of manuals, procedures,
policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering amp Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 ndash 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1 -
(A) Explain the following in detail:
(i) Concurrency control
Ans. Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data, as there is no way they can interfere with one another. However,
any practical database has a mix of READ and WRITE operations, and hence concurrency is
a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you make sure that database transactions are performed concurrently
without violating the data integrity of the respective databases.
Concurrency control is therefore a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data are
executed simultaneously.
(ii) Atomicity property
In database systems, atomicity (from the Greek átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of the two operations fails.
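The all-or-nothing behaviour of the bank-transfer example can be sketched as follows. This is an illustrative toy (a copy-and-swap in memory, not how a real DBMS implements atomicity via logging): both updates are made on a tentative copy, and the real state is replaced only if every step succeeds.

```python
# Toy sketch of atomicity: apply both operations to a tentative copy;
# commit (replace the real state) only if every step succeeds, so a
# failed transfer leaves the accounts exactly as they were.
accounts = {"A": 100, "B": 50}

def transfer(state, src, dst, amount):
    working = dict(state)            # tentative copy of the database
    working[src] -= amount
    if working[src] < 0:
        raise ValueError("insufficient funds")  # abort: nothing applied
    working[dst] += amount
    state.update(working)            # commit: both updates appear at once

transfer(accounts, "A", "B", 30)     # succeeds: both updates applied
try:
    transfer(accounts, "A", "B", 500)
except ValueError:
    pass                             # aborted: accounts left unchanged
```

In both outcomes the total money in the system is preserved, which is the consistency guarantee the text describes.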
(B) Give the three-level architecture proposal for DBMS.
Ans. The objectives of the three-level architecture proposal for DBMS are:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one that is closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for a DBMS are explained
above.
(C) Describe the structure of DBMS
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update and
retrieval) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes the schema definitions specified
in the DDL. It includes metadata information such as the names of the files, the data items, the storage
details of each file, mapping information, constraints, etc.
2 DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized into the best way to execute the query by
the query optimizer, and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
• converting operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the Query Processor), from the user's
logical view to the physical file system;
• controlling access to the DBMS information that is stored on disk;
• handling buffers in main memory;
• enforcing constraints to maintain the consistency and integrity of the data;
• synchronizing the simultaneous operations performed by concurrent users;
• controlling the backup and recovery operations.
4 Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
2. Relationships between database transactions and the data items they reference,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities,
and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and accuracy, and
may be regarded as an important part of the DBMS.
Importance of Data Dictionary -
A data dictionary is necessary in databases for the following reasons:
• It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the results of every design phase and the design decisions made.
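As a small analogue of a data dictionary, SQLite records the name, type and defining DDL of every database object in its built-in catalogue table sqlite_master. A sketch (the table is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")

# sqlite_master plays the role of a (very small) data dictionary:
# one row per object, describing what it is and how it was defined.
entry = con.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'course'"
).fetchone()
print(entry[0], entry[1])  # table course
con.close()
```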
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands, known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database; in the case of an ATM user, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of responses is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing the application programs or user interfaces used by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language, such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exercised by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of the physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, since such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In a conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• duplication of the same data in different files;
• wastage of storage space, since duplicated data is stored;
• errors generated due to updating of the same data in different files;
• time wasted in entering the same data again and again;
• computer resources being needlessly used;
• difficulty in combining information.
2 Elimination of Inconsistency - In a file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This can lead to inconsistent data, so we need to remove this duplication of
data across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in a database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not change when the data in the
database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
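One concrete way a DBMS enforces such integrity rules centrally is a declarative CHECK constraint. A sketch using Python's sqlite3 (the table and the rule are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The DBMS, not the application, enforces the rule "balance must be non-negative".
con.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY,"
    " balance REAL CHECK (balance >= 0))"
)
con.execute("INSERT INTO account VALUES (1, 100.0)")

try:
    con.execute("INSERT INTO account VALUES (2, -50.0)")  # violates the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True: the DBMS refused the inconsistent row
con.close()
```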
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purposes
of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data, and in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work the most
important, and therefore its own needs the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may become necessary to ignore some
requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages that
have been developed for DBMSs than when using procedural languages.
10 Data Model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for backup and
recovery from failures, including disk crashes, power failures and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods involved can be very complex.
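As a miniature illustration of whole-database backup under centralized control, Python's sqlite3 exposes a backup API that copies one database into another as a unit (the data is made up):

```python
import sqlite3

# "Production" database with one row of data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (42)")

# Because the data is centralized, the whole database can be copied in one
# operation; Connection.backup performs a consistent page-level copy.
dst = sqlite3.connect(":memory:")
src.backup(dst)

restored = dst.execute("SELECT x FROM t").fetchone()[0]
print(restored)  # 42: the backup contains the committed data
```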
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Building the
model is an iterative, team-oriented process in which all business managers (or their designates)
should be involved, and the result should be validated with a "bottom-up" approach. It has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship
between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many (1 : M), Many-to-one (M : 1),
Many-to-many (M : N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: entity Customer, with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), and
street itself composite (street_name, street_number, apartment_number).
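One way the Customer entity above could be mapped to a relation is by flattening the composite attributes (name, address, street) into simple columns. A sketch in Python's sqlite3; the column types are assumptions, not part of the original example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Each component of a composite attribute becomes its own simple column.
con.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,  -- primary key attribute
        first_name       TEXT,
        last_name        TEXT,
        middle_name      TEXT,                 -- components of 'name'
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,                 -- components of 'address'
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT                  -- components of 'street'
    )
""")
cols = [row[1] for row in con.execute("PRAGMA table_info(customer)")]
print(len(cols))  # 12 simple columns
con.close()
```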
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file and direct file organizations, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
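The idea can be sketched with an in-memory secondary index that maps each stud_name value to all matching primary keys; the records are made up for illustration:

```python
from collections import defaultdict

# Student records keyed by the primary key (roll number); data is invented.
students = {
    1: {"stud_name": "Ravi", "branch": "MCA"},
    2: {"stud_name": "Asha", "branch": "MCA"},
    3: {"stud_name": "Ravi", "branch": "MBA"},
}

# A secondary index on the non-unique attribute stud_name maps each value
# to the list of matching primary keys, since several records may qualify.
by_name = defaultdict(list)
for key, rec in students.items():
    by_name[rec["stud_name"]].append(key)

matches = by_name["Ravi"]
print(matches)  # [1, 3] -- more than one record satisfies the key value
```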
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A,B,C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
be further non-loss decomposed into any number of smaller tables.
Another way of expressing this is: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
• you always need to know two values (pairwise);
• for any one value, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
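The decomposition into the three pairwise tables, and the fact that joining them back reconstructs the original Buying table losslessly, can be checked directly on the sample data above. A sketch in Python:

```python
# The sample Buying data from the text, as (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The 5NF decomposition: project onto the three pairs.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural-join all three projections back together.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    for (v2, i2) in vendor_item if v2 == v and i2 == i
}
print(rejoined == buying)  # True: the join dependency holds for this data
```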
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Fig: IMS system architecture - application programs A and B (host language + DL/I) each have a
PSB containing PCBs; the IMS control program maps these onto the DBDs that define the physical
databases.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also given in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of the functional dependencies used in
normalization:
• they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency;
• they hold for all time;
• they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by the
functional dependencies in X.
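A functional dependency X → Y can be checked mechanically against a relation instance: equal X-values must never map to different Y-values. A sketch in Python; the rows and attribute names are illustrative:

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows:
    equal determinant values must always give equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # same determinant, different dependent value
        seen[key] = val
    return True

# Illustrative student rows (invented data).
rows = [
    {"roll": 1, "name": "Ravi", "dept": "MCA"},
    {"roll": 2, "name": "Asha", "dept": "MCA"},
    {"roll": 1, "name": "Ravi", "dept": "MCA"},
]
print(holds(rows, ["roll"], ["name"]))  # True: roll determines name
print(holds(rows, ["dept"], ["name"]))  # False: same dept, different names
```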
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF or 4NF.
We will pay particular attention up to 3NF.
The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies
between its attributes.
It is often executed as a series of steps, where each step corresponds to a specific normal form with
known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format, and
also less vulnerable to update anomalies.
• NF2: non-first normal form
• 1NF: R is in 1NF iff all domain values are atomic
• 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
• 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
• BCNF: R is in BCNF iff every determinant is a candidate key
• Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and every non-trivial multivalued dependency is in fact a functional dependency. 4NF
removes an unwanted data structure: multivalued dependencies.
One of the following conditions must hold for a relation to be in fourth normal form:
• there is no multivalued dependency in the relation; or
• there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
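A 4NF-style decomposition of a relation with a multivalued dependency can be sketched in Python. The course/teacher/book data is hypothetical: for each course, the set of teachers is independent of the set of books (course ↠ teacher and course ↠ book):

```python
# Hypothetical relation with a multivalued dependency.
ctb = {
    ("DBMS", "Rao", "Date"),
    ("DBMS", "Rao", "Navathe"),
    ("DBMS", "Sen", "Date"),
    ("DBMS", "Sen", "Navathe"),
}

# 4NF decomposition: split the two independent facts into separate tables.
course_teacher = {(c, t) for c, t, b in ctb}
course_book    = {(c, b) for c, t, b in ctb}

# The natural join of the two projections recreates the original rows exactly,
# so the decomposition is lossless.
rejoined = {(c, t, b)
            for (c, t) in course_teacher
            for (c2, b) in course_book if c2 == c}
print(rejoined == ctb)  # True
```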
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found by a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
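The "retrieval by pointer following" point above can be sketched in Python: the related object is reached through a direct reference held in the object itself, with no join or key lookup. The classes and fields are illustrative:

```python
# Object navigation: an account holds a direct reference to its owner.
class Customer:
    def __init__(self, name):
        self.name = name

class Account:
    def __init__(self, number, owner):
        self.number = number
        self.owner = owner  # direct pointer to a Customer object

alice = Customer("Alice")
acct = Account("ACC-1", alice)

# No join and no key lookup: just follow the reference, as an object
# database does when traversing inter-object pointers.
print(acct.owner.name)  # Alice
```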
C) How database recovery it done Discuss its different types
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature, known as the database recovery model, that controls:
• both the speed and size of your transaction log backups;
• the degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified point in time can be achieved after a media failure for a
database file. If your log file is available after the failure, you can restore up to the last
committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log and
recover to a log mark.
• CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
Strict 2PL does not have cascading aborts, as 2PL does.
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp-based protocol. This protocol
uses either the system time or a logical counter as a timestamp.
Lock-based protocols manage the order between conflicting pairs among transactions at the
time of execution, whereas timestamp-based protocols start working as soon as a transaction is
created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age
of the transaction. A transaction created at clock time 0002 would be older than all other
transactions that come after it. For example, any transaction y entering the system at 0004 is
two seconds younger, and priority would be given to the older one.
In addition, every data item is given the latest read- and write-timestamp. This lets the system
know when the last 'read' and 'write' operations were performed on the data item.
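The read/write timestamp rules described above can be sketched as basic timestamp ordering for a single data item; the class, function and field names below are my own, not from the text:

```python
class Item:
    """A data item carrying the timestamps of its youngest reader/writer."""
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def read(item, ts):
    # A transaction may not read a value written by a younger transaction.
    if ts < item.write_ts:
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    # A write is rejected if a younger transaction already read or wrote the item.
    if ts < item.read_ts or ts < item.write_ts:
        return "abort"
    item.write_ts = ts
    return "ok"

x = Item()
print(write(x, 5))  # ok: first write, write_ts becomes 5
print(read(x, 3))   # abort: transaction 3 is older than the writer (ts 5)
print(read(x, 8))   # ok: read_ts becomes 8
print(write(x, 6))  # abort: a younger transaction (ts 8) already read x
```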
OR
(b) Explain database security mechanisms. (8)
Ans: Database security covers and enforces security on all aspects and components of databases. This
includes:
• data stored in the database;
• the database server;
• the database management system (DBMS);
• other database workflow applications.
Database security is generally planned implemented and maintained by a database administrator
and or other information security professional
Some of the ways database security is analyzed and implemented include:
• restricting unauthorized access and use by implementing strong and multifactor access
and data management controls;
• load/stress testing and capacity testing of a database to ensure it does not crash under a
distributed denial of service (DDoS) attack or user overload;
• physical security of the database server and backup equipment against theft and natural
disasters;
• reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them.
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge base was coined to distinguish this form of knowledge store from the
more common and widely used term database. At the time (the 1970s), virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database. At this point in the history of information technology, the distinction
between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
Flat data: data was usually represented in a tabular format, with strings or numbers in each
field.
Multiple users: a conventional database needed to support more than one user or system
logged into the same data at the same time.
Transactions: an essential requirement for a database was to maintain integrity and
consistency among data accessed by concurrent users. These are the so-called ACID
properties: Atomicity, Consistency, Isolation, and Durability.
Large, long-lived data: a corporate database needed to support not just thousands but
hundreds of thousands or more rows of data. Such a database usually needed to persist past
the specific uses of any individual program; it needed to store data for years and decades
rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users or the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze, and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge-base compared to a conventional
database. The knowledge-base needed to know facts about the world, for example to represent
the statement that "All humans are mortal". A database typically could not represent this general
knowledge, but would instead need to store thousands of rows of information about specific
humans. Representing that all humans are mortal, and being able to reason that any given human
is mortal, is the work of a knowledge-base. Representing that George, Mary, Sam, Jenna, Mike,
and hundreds of thousands of other customers are all humans with specific ages, sex, address,
etc. is the work of a database.[3][4]
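The "all humans are mortal" contrast can be made concrete with a toy rule engine (purely illustrative; the fact and rule representation here is invented): the general rule is stored once and mortality of any individual is derived, rather than stored row by row as a database would.

```python
# Toy knowledge-base: a set of ground facts plus one general rule.
# A database would instead store a "mortal" value for every individual row.

facts = {("human", "George"), ("human", "Mary"), ("human", "Sam")}
rules = [("human", "mortal")]   # rule: if X is human then X is mortal

def is_a(kind, name):
    """Derive class membership from stored facts plus general rules."""
    if (kind, name) in facts:
        return True
    # apply each rule whose conclusion matches the kind being asked about
    return any(conclusion == kind and is_a(premise, name)
               for premise, conclusion in rules)
```

Asking `is_a("mortal", "George")` succeeds even though no "mortal" fact was ever stored; that derivation step is what separates a knowledge-base from a plain table of rows.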
As expert systems moved from being prototypes to systems deployed in corporate environments,
their data storage requirements rapidly started to overlap with the standard database
requirements for multiple, distributed users with support for transactions. Initially, the demand
could be seen in two different but competitive markets. From the AI and object-oriented
communities, object-oriented databases such as Versant emerged: systems designed
from the ground up to support object-oriented capabilities, but also to support standard
database services. On the other hand, the large database vendors such as Oracle added
capabilities to their products that supported knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet. With the rise of the Internet,
documents, hypertext, and multimedia support became critical for any corporate database. It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory. Support for corporate web sites required persistence and
transactions for documents. This created a whole new discipline known as Web Content
Management. The other driver for document support was the rise of knowledge management
vendors such as Lotus Notes. Knowledge management actually predated the Internet, but with
the Internet there was great synergy between the two areas. Knowledge management products
adopted the term knowledge-base to describe their repositories, but the meaning had a subtle
difference. In the case of previous knowledge-based systems, the knowledge was primarily for
the use of an automated system, to reason about and draw conclusions about the world. With
knowledge management products, the knowledge was primarily meant for humans, for example
to serve as a repository of manuals, procedures, policies, best practices, reusable designs and
code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined.
As the technology scaled up, it was rare to find a system that could really be cleanly classified as
knowledge-based in the sense of an expert system that performed automated reasoning, or
knowledge-based in the sense of knowledge management that provided knowledge in the form
of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data, since there is no way they can interfere with one another.
However, any practical database has a mix of READ and WRITE operations, and hence
concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user
system. It helps you to make sure that database transactions are performed concurrently
without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system
where two or more database transactions that require access to the same data are executed
simultaneously.
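The conflict that concurrency control prevents can be simulated deterministically. The classic lost-update interleaving (a toy sketch with invented names, not threads) has two transactions read the same balance before either writes, so the second write clobbers the first:

```python
# Deterministic simulation of the lost-update problem: T1 and T2 both
# deposit into the same account, but both read before either writes.

def interleaved_deposits(balance, deposit_a, deposit_b):
    """Interleaved schedule: both reads happen before either write."""
    read_a = balance                  # T1 reads the balance
    read_b = balance                  # T2 reads before T1 has written
    balance = read_a + deposit_a      # T1 writes its result
    balance = read_b + deposit_b      # T2 writes, overwriting T1's update
    return balance

def serial_deposits(balance, deposit_a, deposit_b):
    """Correct result when the two transactions run one after the other."""
    return balance + deposit_a + deposit_b
```

Starting from 100 with deposits of 50 and 30, the interleaved schedule yields 130 (T1's deposit is lost) while the serial schedule yields 180; a concurrency control protocol forces every allowed schedule to be equivalent to some serial one.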
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, 'undividable') is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all
occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B.
It consists of two operations: withdrawing the money from account A and depositing it in
account B. Performing these operations in an atomic transaction ensures that the database
remains in a consistent state, that is, money is neither lost nor created if either of the two
operations fails.
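The bank-transfer example can be sketched with Python's sqlite3 module, whose connection context manager commits the two updates together or rolls both back on an exception (a simplified illustration; the table and account names are invented):

```python
# Atomic transfer: both UPDATEs commit together, or neither does.
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically; roll back on failure."""
    try:
        with conn:  # commits on normal exit, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            cur = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,))
            if cur.fetchone()[0] < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")
```

A transfer that would overdraw account A raises inside the `with` block, so the withdrawal that already ran is rolled back and no money is lost or created.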
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for a DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know the physical database storage details.
The DBA should be able to change the database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part
that is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used
to control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for a DBMS are explained above.
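One way to make the external/conceptual distinction concrete is a SQL view: the base table plays the conceptual level, a view is one user's external level, and the file format underneath is the internal level the user never sees (an illustrative sqlite3 sketch; the table and column names are invented):

```python
# Conceptual level: the full employee table.
# External level: a view for users who must not see salaries.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 50000), (2, "Ravi", 65000)])
# External view: salaries are simply absent from this user's world.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")
```

The DBA can later change the base table's storage, or even add columns, without the view's users noticing, which is exactly the data independence the three-level objectives call for.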
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update, and
retrieval) on the database. The components of the DBMS perform these requested operations on
the database and provide the necessary data to the users.
Fig: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions
specified in the DDL. It includes metadata information such as the names of the files and data
items, storage details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - The DML commands (insert, update, delete,
retrieve) from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized for the best way to execute the
query by the query optimizer and then sent to the data manager.
3. Data Manager - The data manager is the central software component of the DBMS, also
known as the Database Control System.
The main functions of the data manager are:
It converts operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (known as the query processor), from the
user's logical view to the physical file system.
It controls access to the DBMS information that is stored on disk.
It handles buffers in main memory.
It enforces constraints to maintain the consistency and integrity of the data.
It synchronizes the simultaneous operations performed by concurrent users.
It controls backup and recovery operations.
4. Data Dictionary - The data dictionary is a repository of descriptions of the data in the
database. It contains information about:
1. Data - the names of the tables, the names of the attributes of each table, the lengths of
attributes, and the number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data
definitions are changed.
3. Constraints on data, i.e., the ranges of values permitted.
4. Detailed information on the physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - descriptions of database users, their responsibilities, and their
access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation, and accuracy, and may
be used as an important part of the DBMS.
Importance of the Data Dictionary - A data dictionary is necessary in databases for the
following reasons:
• It improves the DBA's control over the information system and the users' understanding
of the use of the system.
• It helps in documenting the database design process by storing documentation of the
results of every design phase and of the design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category: the user is instructed through each step of a transaction, and he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database - in the case of an ATM user, only one or more of his or her own accounts. Other naïve users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different
application systems. This stresses the importance of multiple applications sharing data: the
database becomes a common resource for an agency. It also implies separation of the physical
storage from the use of the data by an application program, i.e., program/data independence.
The user, programmer, or application specialist need not know the details of how the data are
stored; such details are transparent to the user. Changes can be made to the data without
affecting other components of the system, e.g., changing the format of data items (real to
integer arithmetic), changing the file structure (reorganizing data internally or changing the
mode of access), or relocating data from one device to another (e.g., from optical to magnetic
storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user
group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files.
• Wastage of storage space, since the duplicated data is stored.
• Errors generated due to duplication of the same data in different files.
• Time wasted in entering the same data again and again.
• Computer resources being needlessly used.
• Difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to another
file. This may lead to inconsistent data, so we need to remove this duplication of data in
multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In
conventional systems, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it
easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in
the database changes.
5. Integrity can be improved - Since the data of the organization using the database approach
is centralized and is used by a number of users at a time, it is essential to enforce integrity
constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes
may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the
purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad-hoc/temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to which parts of the database, and
different checks can be established for each type of access (retrieve, modify, delete, etc.) to
each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, as the most important. Once a database has been set up with centralized control, it will
be necessary to identify the organization's requirements and to balance the needs of the
competing units. It may become necessary to ignore some requests for information if they
conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond
to unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, one
normally expects the overall cost of setting up the database and developing and maintaining
the application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages
that have been developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand; the overall view is often not considered. Building an overall view of the
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures, and software errors,
which may help the database recover from an inconsistent state to the state that existed prior
to the occurrence of the failure, though the methods are very complex.
QUE 2 - EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data
is used in a real-world enterprise. Modeling is an iterative, team-oriented process in which all
business managers (or their designates) should be involved, and the result should be validated
with a "bottom-up" approach. The model has three primary components: entities, relationships,
and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or
a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student
name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships
can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company
and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 : M
Many to one: M : 1
Many to many: M : M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), where street is itself
composite (street_name, street_number, apartment_number).
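As an illustrative follow-on (not part of the original answer), one conventional relational mapping of this Customer entity flattens the composite attributes into separate columns while customer_id stays the primary key:

```python
# Mapping the Customer entity to a table: composite attributes (name,
# address, street) become individual columns. Multivalued attributes,
# if any, would go in their own table instead.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        first_name    TEXT,
        middle_name   TEXT,
        last_name     TEXT,
        phone_number  TEXT,
        date_of_birth TEXT,
        city          TEXT,
        state         TEXT,
        zip_code      TEXT,
        street_name   TEXT,
        street_number TEXT,
        apartment_number TEXT
    )
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name, city) "
             "VALUES (1, 'Sally', 'Rao', 'Nagpur')")
```

This is only one possible mapping (the sample row is invented); a design that treated phone_number as multivalued would split it into a separate customer_phone table keyed by customer_id.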
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In the sequential file, index sequential file, and direct file organizations, we have considered
the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the
set of records which satisfy the given value.
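A secondary index over stud_name can be sketched as a map from each name to the list of matching primary keys (a toy illustration; the records are invented). Unlike the primary key, one secondary-key value can select several records:

```python
# Primary index: unique student id -> record.
# Secondary index: stud_name -> list of ids (names are not unique).

records = {
    101: {"stud_name": "Asha", "dept": "MCA"},
    102: {"stud_name": "Ravi", "dept": "MCA"},
    103: {"stud_name": "Asha", "dept": "MBA"},
}

secondary = {}  # stud_name -> list of primary keys
for pk, rec in records.items():
    secondary.setdefault(rec["stud_name"], []).append(pk)

def find_by_name(name):
    """Secondary-key retrieval: may return several records for one value."""
    return [records[pk] for pk in secondary.get(name, [])]
```

Searching on "Asha" returns two records, which is exactly the point made in (ii) above.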
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4 - EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependency (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF
and it cannot have a lossless decomposition into any number of smaller tables.
Another way of expressing this: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further,
then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you
create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order
to determine the item you must know the buyer and vendor; to determine the vendor you must
know the buyer and the item; and finally, to know the buyer you must know the vendor and the
item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and
Vendor-Item.
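The three-way decomposition can be checked mechanically: projecting the sample table onto the three attribute pairs and natural-joining the projections reconstructs exactly the original rows, which is what the join dependency asserts (an illustrative sketch using Python sets):

```python
# Verify the lossless three-way decomposition of the Buying table.

buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three projections (the proposed 5NF tables).
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections on their shared columns.
rejoined = {(b, v, i)
            for b, v in buyer_vendor
            for b2, i in buyer_item if b2 == b
            for v2, i2 in vendor_item if v2 == v and i2 == i}
```

For this sample data `rejoined == buying`, so no information is lost; and when Claiborne starts selling jeans, only one row (Claiborne, Jeans) needs to be added to the Vendor-Item table instead of one row per buyer.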
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to
support both batch and online application programs.
[Figure: IMS system architecture - each application (A and B) is written in a host language plus
DL/I calls; each application has a PSB (PSB-A, PSB-B) consisting of PCBs, and the IMS control
program maps these PCBs onto the DBDs that define the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is
somewhat misleading in this context, since the user does not see such a database exactly as it is
stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage
structure. Each physical database is defined by a database description (DBD). The mapping of
the physical database to storage is also defined in the DBD. The set of all DBDs corresponds to
the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and
the object form is stored in a system library from which it may be extracted when required by
the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external
view" of the data. A particular user's external view consists of a collection of "logical
databases", where each logical database is a subset of the corresponding physical database. Each
logical database is defined by means of a program communication block (PCB). The set of all
PCBs for one user, corresponding to the external schema plus the associated mapping
definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the
LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example:
1 PCB TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted
to perform on this segment. In this example the entry is G ("get"), indicating retrieval only.
Other possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS
data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call.
End users are supported via user-written online application programs. IMS does not provide an
integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
- They have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency.
- They hold for all time.
- They are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that reduces the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
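The definition above can be checked mechanically: X → Y holds in a relation instance iff no two tuples agree on X but differ on Y. A minimal sketch in Python (the relation and attribute names are made up for illustration):

```python
def holds_fd(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts, each dict one tuple of the relation
    lhs, rhs: tuples of attribute names
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # two tuples agree on lhs but differ on rhs
        seen[key] = val
    return True

# hypothetical enrolment relation
enrol = [
    {"sid": 1, "sname": "Ann", "course": "DBMS"},
    {"sid": 1, "sname": "Ann", "course": "OS"},
    {"sid": 2, "sname": "Raj", "course": "DBMS"},
]
print(holds_fd(enrol, ("sid",), ("sname",)))   # True: sid determines sname
print(holds_fd(enrol, ("sid",), ("course",)))  # False: sid does not determine course
```

Note this only verifies a dependency against one instance; a true FD is an assertion about all legal instances of the relation.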
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF;
we will pay particular attention up to 3NF.
Database designers need not normalize to the highest possible normal form.
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and its only multi-valued dependencies are functional dependencies; 4NF
thus removes the unwanted structures caused by multi-valued dependencies.
For a relation to be in fourth normal form, it must be in BCNF and one of the following must hold:
- There is no multivalued dependency in the relation, or
- There are multivalued dependencies, but the attributes involved are dependent between themselves.
Fourth normal form differs from BCNF only in that it also considers multivalued dependencies.
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions, account information entries, etc.
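The "pointer following instead of joins" point above can be sketched with plain objects; this is an illustrative toy, not any particular object database's API:

```python
class Account:
    def __init__(self, number, balance):
        self.number, self.balance = number, balance

class Customer:
    def __init__(self, name):
        self.name = name
        self.accounts = []  # direct references, the object-database analogue of stored pointers

alice = Customer("Alice")
alice.accounts.append(Account("A-1", 500))
alice.accounts.append(Account("A-2", 1200))

# navigational access: follow the references from the customer object;
# no join between a customer table and an account table is needed
total = sum(a.balance for a in alice.accounts)
print(total)  # 1700
```

In a relational schema the same query would join `customer` to `account` on a foreign key; here the relationship is materialized as an in-object reference.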
C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model. It controls the following:
- The speed and size of your transaction log backups
- The degree to which you are at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery models available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
- Database restoration up to any specified time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log so that you can
recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans
Restricting unauthorized access and use by implementing strong and multifactor access
and data management controls
Load/stress testing and capacity testing of a database to ensure it does not crash in a
distributed denial-of-service (DDoS) attack or under user overload
Physical security of the database server and backup equipment from theft and natural
disasters
Reviewing the existing system for any known or unknown vulnerabilities, and defining and
implementing a road map/plan to mitigate them
(d) Explain knowledge-based database systems in detail.
Ans
The term knowledge-base was coined to distinguish this form of knowledge store from the
more common and widely used term database At the time (the 1970s) virtually all
large Management Information Systems stored their data in some type of hierarchical or
relational database At this point in the history of Information Technology the distinction
between a database and a knowledge base was clear and unambiguous
A database had the following properties:
- Flat data: Data was usually represented in a tabular format with strings or numbers in each field.
- Multiple users: A conventional database needed to support more than one user or system logged into the same data at the same time.
- Transactions: An essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation, and Durability.
- Large, long-lived data: A corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database
requirements. An expert system requires structured data: not just tables with numbers and
strings, but pointers to other objects that in turn have additional pointers. The ideal representation
for a knowledge base is an object model (often called an ontology in the artificial
intelligence literature) with classes, subclasses, and instances.
Early expert systems also had little need for multiple users, or for the complexity that comes with
requiring transactional properties on data. The data for the early expert systems was used to
arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response
to an emergency.[1] Once the solution to the problem was known, there was not a critical demand
to store large amounts of data back to a permanent memory store. A more precise statement
would be that, given the technologies available, researchers compromised and did without these
capabilities because they realized they were beyond what could be expected, and they could
develop useful solutions to non-trivial problems without them. Even from the beginning, the
more astute researchers realized the potential benefits of being able to store, analyze, and reuse
knowledge. For example, see the discussion of Corporate Memory in the earliest work of the
Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional
database. The knowledge base needed to know facts about the world: for example, to represent
the statement that "all humans are mortal". A database typically could not represent this general
knowledge, but instead would need to store thousands of rows that represented information
about specific humans. Representing that all humans are mortal, and being able to reason that
any given human is mortal, is the work of a knowledge base; representing that George, Mary,
Sam, Jenna, Mike, and hundreds of thousands of other customers are all humans with specific
ages, sex, address, etc. is the work of a database.[3][4]
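The distinction can be made concrete with a tiny object-model sketch: the general rule "all humans are mortal" is stored once as knowledge about a class, while the database-style facts are the individual instances. The class and attribute names here are invented for illustration:

```python
class Human:                     # a class in the ontology sense
    pass

class Customer(Human):           # a subclass: every customer is a human
    def __init__(self, name, age):
        self.name, self.age = name, age

def is_mortal(thing):
    # the knowledge base stores the rule once, against the Human class;
    # isinstance() walks the class/subclass hierarchy to apply it
    return isinstance(thing, Human)

# database-style facts: specific individuals with specific attributes
george = Customer("George", 42)
mary = Customer("Mary", 35)

print(is_mortal(george))  # True, derived from the rule, not stored per row
```

A relational database would instead need an explicit column or row per individual to record mortality; the inference here comes for free from the class hierarchy.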
As expert systems moved from being prototypes to systems deployed in corporate environments
the requirements for their data storage rapidly started to overlap with the standard database
requirements for multiple distributed users with support for transactions Initially the demand
could be seen in two different but competitive markets From the AI and Object-Oriented
communities object-oriented databases such as Versant emerged These were systems designed
from the ground up to have support for object-oriented capabilities but also to support standard
database services as well. On the other hand, the large database vendors such as Oracle added
capabilities to their products that provided support for knowledge-base requirements, such as
class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge-base was the Internet With the rise of the Internet
documents hypertext and multimedia support were now critical for any corporate database It
was no longer enough to support large tables of data or relatively small objects that lived
primarily in computer memory Support for corporate web sites required persistence and
transactions for documents This created a whole new discipline known as Web Content
Management The other driver for document support was the rise of knowledge
management vendors such as Lotus Notes Knowledge Management actually predated the
Internet but with the Internet there was great synergy between the two areas Knowledge
management products adopted the term knowledge-base to describe their repositories but the
meaning had a subtle difference In the case of previous knowledge-based systems the
knowledge was primarily for the use of an automated system to reason about and draw
conclusions about the world With knowledge management products the knowledge was
primarily meant for humans for example to serve as a repository of manuals procedures
policies, best practices, reusable designs and code, etc. In both cases the distinctions between the
uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a
system that could really be cleanly classified as knowledge-based in the sense of an expert
system that performed automated reasoning, or as knowledge-based in the sense of knowledge
management that provided knowledge in the form of documents and media that could be
leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 - 2019
Subject: DBMS
MCA 1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data; there is no way they can interfere with one another. However, any
practical database will have a mix of READ and WRITE operations, and hence
concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
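The lost-update problem that concurrency control prevents can be sketched with a lock serializing a read-modify-write, the same idea a DBMS applies with lock-based protocols (the counter stands in for a shared data item; the scenario is illustrative):

```python
import threading

counter = 0                # shared data item
lock = threading.Lock()    # concurrency control: serializes conflicting writes

def deposit(times):
    global counter
    for _ in range(times):
        with lock:         # without this lock, interleaved read-modify-writes lose updates
            counter += 1

threads = [threading.Thread(target=deposit, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every update survives; no lost updates
```

Dropping the `with lock:` line makes the final count nondeterministic, which is exactly the kind of integrity violation concurrency control exists to rule out.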
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, translit. átomos, lit. "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
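The transfer example can be demonstrated with SQLite, whose connection object rolls back the whole transaction if anything inside it fails (the account names and amounts are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
con.commit()

try:
    with con:  # one atomic transaction: commits on success, rolls back on any exception
        con.execute("UPDATE account SET balance = balance - 70 WHERE name = 'A'")
        raise RuntimeError("simulated crash between the two operations")
        # the matching deposit to B is never reached
except RuntimeError:
    pass

balances = dict(con.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 0}: the partial withdrawal was rolled back
```

Without the transaction, account A would be left at 30 with the 70 vanished; atomicity guarantees the all-or-nothing outcome.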
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for DBMS:
- All users should be able to access the same data.
- A user's view is immune to changes made in other views.
- Users should not need to know physical database storage details.
- The DBA should be able to change database storage structures without affecting the users' views.
- The internal structure of the database should be unaffected by changes to physical aspects of storage.
- The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below.
External Level
This is the highest level one that is closest to the user It is also called the user view The user
view is different from the way data is stored in the database This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
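The DDL/DML split can be seen directly with SQLite from Python: DDL defines the object, DML operates on it. (SQLite has no DCL such as GRANT/REVOKE, so that sublanguage is not shown; the table and names are illustrative.)

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define and declare the database object
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on the object
con.execute("INSERT INTO student VALUES (1, 'Asha')")
rows = con.execute("SELECT name FROM student").fetchall()
print(rows)  # [('Asha',)]
```

In a full client-server DBMS the DCL layer (e.g. GRANT SELECT ON student TO some_user) would sit alongside these two.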
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
that they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus, the objectives of the three-level architecture proposal for DBMS are explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
- Convert operations in user queries, coming from the application programs or from the combination of the DML Compiler and Query Optimizer (known as the Query Processor), from the user's logical view to the physical file system.
- Control DBMS information access that is stored on disk.
- Handle buffers in main memory.
- Enforce constraints to maintain the consistency and integrity of the data.
- Synchronize the simultaneous operations performed by concurrent users.
- Control the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5 Access authorization - a description of database users, their responsibilities, and their access rights.
6 Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation, and accuracy, and may be used as an important part of the DBMS.
Importance of the Data Dictionary - the data dictionary is necessary in databases for the following reasons:
- It improves the control of the DBA over the information system and the users' understanding of the use of the system.
- It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML complier converts the high level Queries into low level file access
commands known as compiled DML
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus, a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored, as such details are
transparent to the user. Changes can be made to data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data, so we need to remove this duplication of
data across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since data of the organization using the database approach is
centralized and would be used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers the work of its unit as the most
important, and therefore considers its needs as the most important. Once a database has been
set up with centralized control, it will be necessary to identify the organization's requirements and
to balance the needs of the competing units. It may become necessary to ignore some
requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up a database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as per the needs of particular
applications, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures, and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The ER model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attribute: A characteristic of an entity. A Student entity, for example, has attributes such as student ID, student name and address.
Attributes are of various types:
- Simple/single attributes
- Composite attributes
- Multivalued attributes
- Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
- One-to-many (1:M)
- Many-to-one (M:1)
- Many-to-many (M:N)
Symbols and their meanings:
- Rectangles represent entity sets.
- Diamonds represent relationship sets.
- Lines link attributes to entity sets and entity sets to relationship sets.
- Ellipses represent attributes.
- Double ellipses represent multivalued attributes.
- Dashed ellipses denote derived attributes.
- Underlining indicates primary key attributes.
Example:
Entity: Customer, with attributes customer_id (primary key), name (composite: first_name, last_name, middle_name), phone_number, date_of_birth, and address (composite: city, state, zip_code, street), where street is itself composite (street_name, street_number, apartment_number).
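One common way to realise such an E-R design is to map it onto relational tables: composite attributes are flattened into their component columns, and a multivalued attribute (phone_number here, if a customer may have several) becomes a separate table. A minimal sketch using Python's built-in sqlite3 (column choices and sample values are illustrative only):

```python
import sqlite3

# Flatten the composite attributes (name, address, street) into columns;
# put the multivalued attribute phone_number in its own table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,
    street_name TEXT, street_number TEXT, apartment_number TEXT
);
CREATE TABLE customer_phone (     -- one row per phone number
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Asha', 'Rao')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0101')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0102')")

# One customer, many phone numbers: the multivalued attribute is preserved.
phones = [r[0] for r in conn.execute(
    "SELECT phone_number FROM customer_phone WHERE customer_id = 1 "
    "ORDER BY phone_number")]
print(phones)
```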
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files and direct files we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we can get the set of records which satisfy the given value.
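The points above can be sketched in a few lines: the file is organised by its primary key, and a secondary index on stud_name maps each name to the set of matching records. All record values here are made up for illustration.

```python
# Primary organisation: roll_no (primary key) -> record
students = {
    101: {"stud_name": "Amit", "branch": "MCA"},
    102: {"stud_name": "Neha", "branch": "MCA"},
    103: {"stud_name": "Amit", "branch": "MBA"},
}

# Build a secondary index on stud_name. Unlike the primary key,
# a secondary key value need not be unique.
name_index = {}
for roll_no, rec in students.items():
    name_index.setdefault(rec["stud_name"], []).append(roll_no)

# Secondary key retrieval: one key value may match many records.
matches = sorted(name_index["Amit"])
print(matches)
```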
(D) Define the following terms:
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 - r2
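QBE itself is form-based, so the answers are filled-in skeleton tables; but the three queries are simply the set operations union, intersection and difference. As a quick non-QBE illustration, relations on schema R(A, B, C) can be modelled as Python sets of tuples (the sample tuples are invented):

```python
# Two relations on schema R(A, B, C), represented as sets of tuples.
r1 = {(1, "x", 10), (2, "y", 20)}
r2 = {(2, "y", 20), (3, "z", 30)}

union        = r1 | r2   # r1 ∪ r2: tuples in either relation
intersection = r1 & r2   # r1 ∩ r2: tuples in both
difference   = r1 - r2   # r1 - r2: tuples of r1 not in r2

print(union, intersection, difference)
```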
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows: if a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes. Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- You always need to know two values (pairwise).
- For any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:

buyer | vendor        | item
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and the vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
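The decomposition above can be checked mechanically: project Buying onto the three pairs of attributes, then natural-join the projections. For this sample data the three-way join recovers exactly the original relation, which is what the join dependency asserts.

```python
# The Buying relation from the sample data above, as a set of tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three binary projections of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of all three projections on their common attributes.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    for (v2, i2) in vendor_item if v2 == v and i2 == i
}
print(rejoined == buying)  # the join is lossless: no spurious tuples
```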
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product designed to support both batch and online application programs.
Fig: IMS system architecture. Each application program (Application A, Application B) is written in a host language plus DL/I. Each application has its own PSB (PSB-A, PSB-B) containing PCBs, and the IMS control program maps these PCBs onto the DBDs that define the physical databases.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD), which also gives the mapping of the physical database to storage. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled and the object form is stored in a system library, from which it may be extracted when required by the IMS control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example:
1  DBD    NAME=EDUCPDBD
2  SEGM   NAME=COURSE,BYTES=256
3  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD  NAME=TITLE,BYTES=33,START=4
5  FIELD  NAME=DESCRIPN,BYTES=220,START=37
6  SEGM   NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD  NAME=TITLE,BYTES=33,START=4
9  SEGM   NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD  NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD  NAME=LOCATION,BYTES=12,START=7
12 FIELD  NAME=FORMAT,BYTES=2,START=19
13 SEGM   NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD  NAME=NAME,BYTES=18,START=7
16 SEGM   NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD  NAME=NAME,BYTES=18,START=7
19 FIELD  NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the logical database and the corresponding physical database.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional dependency: the value of one attribute (the determinant) determines the value of another attribute.
Candidate key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
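The definition above is directly checkable: X -> Y holds in a relation instance exactly when no two rows agree on X but disagree on Y. A small sketch (the emp table and its values are invented for illustration):

```python
def fd_holds(rows, determinant, dependent):
    """Check X -> Y on a relation instance.

    rows: list of dicts (tuples of the relation)
    determinant, dependent: tuples of attribute names (X and Y)
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        # Same determinant value must always map to the same dependent value.
        if seen.setdefault(x, y) != y:
            return False
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "city": "Pune"},
    {"emp_id": 2, "dept": "Sales", "city": "Pune"},
    {"emp_id": 3, "dept": "HR",    "city": "Nagpur"},
]
ok  = fd_holds(emp, ("emp_id",), ("dept",))   # emp_id -> dept holds
bad = fd_holds(emp, ("city",),  ("emp_id",))  # city -> emp_id fails: Pune maps to 1 and 2
print(ok, bad)
```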
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Industry pays particular attention to normalization up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
- NF2: non-first normal form
- 1NF: R is in 1NF iff all domain values are atomic
- 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
- 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
- BCNF: R is in BCNF iff every determinant is a candidate key
- Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF thus removes the unwanted data structures caused by multivalued dependencies.
For a relation to be in fourth normal form, it must be in BCNF and one of these conditions must hold:
- There is no multivalued dependency in the relation, or
- There are multivalued dependencies, but the attributes involved are dependent between themselves.
Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
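A classic illustration (the Course relation here is a made-up example, not from the question paper): if a course's teachers and books are independent multivalued facts, the unnormalized table must store every teacher/book combination. Splitting it into one relation per independent fact gives 4NF, and joining them back loses nothing.

```python
# course ->> teacher and course ->> book are independent multivalued
# dependencies, so all combinations appear in the unnormalized relation.
course = {
    ("DBMS", "Rao",  "Date"),
    ("DBMS", "Rao",  "Navathe"),
    ("DBMS", "Shah", "Date"),
    ("DBMS", "Shah", "Navathe"),
}

# 4NF decomposition: one relation per independent multivalued fact.
course_teacher = {(c, t) for c, t, b in course}
course_book    = {(c, b) for c, t, b in course}

# Natural join of the two projections reproduces the original relation.
rejoined = {(c, t, b) for c, t in course_teacher
                      for c2, b in course_book if c2 == c}
print(rejoined == course)  # lossless: the decomposition loses nothing
```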
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could get a user's account information and efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems, which is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
- the speed and size of your transaction log backups;
- the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery model available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified point in time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans:
(D) Explain knowledge-based database systems in detail.
Ans: The term knowledge base was coined to distinguish this form of knowledge store from the more common and widely used term database. At the time (the 1970s), virtually all large management information systems stored their data in some type of hierarchical or relational database. At this point in the history of information technology, the distinction between a database and a knowledge base was clear and unambiguous.
A database had the following properties:
- Flat data: data was usually represented in a tabular format, with strings or numbers in each field.
- Multiple users: a conventional database needed to support more than one user or system logged into the same data at the same time.
- Transactions: an essential requirement for a database was to maintain integrity and consistency among data accessed by concurrent users. These are the so-called ACID properties: Atomicity, Consistency, Isolation and Durability.
- Large, long-lived data: a corporate database needed to support not just thousands but hundreds of thousands or more rows of data. Such a database usually needed to persist past the specific uses of any individual program; it needed to store data for years and decades rather than for the life of a program.
The first knowledge-based systems had data needs that were the opposite of these database requirements. An expert system requires structured data: not just tables with numbers and strings, but pointers to other objects that in turn have additional pointers. The ideal representation for a knowledge base is an object model (often called an ontology in artificial intelligence literature) with classes, subclasses and instances.
Early expert systems also had little need for multiple users or the complexity that comes with requiring transactional properties on data. The data for the early expert systems was used to arrive at a specific answer, such as a medical diagnosis, the design of a molecule, or a response to an emergency.[1] Once the solution to the problem was known, there was not a critical demand to store large amounts of data back to a permanent memory store. A more precise statement would be that, given the technologies available, researchers compromised and did without these capabilities because they realized they were beyond what could be expected, and they could develop useful solutions to non-trivial problems without them. Even from the beginning, the more astute researchers realized the potential benefits of being able to store, analyze and reuse knowledge; see, for example, the discussion of Corporate Memory in the earliest work of the Knowledge-Based Software Assistant program by Cordell Green et al.[2]
The volume requirements were also different for a knowledge base compared to a conventional database. The knowledge base needed to know facts about the world, for example to represent the statement "All humans are mortal". A database typically could not represent this general knowledge, but would instead need to store information about thousands of specific humans. Representing that all humans are mortal, and being able to reason that any given human is therefore mortal, is the work of a knowledge base. Representing that George, Mary, Sam, Jenna, Mike and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work of a database.[3][4]
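The distinction can be made concrete with a toy sketch (not any particular KB system; all names are invented): the "database" part stores specific facts, while the "knowledge base" part adds a general rule that lets the system derive facts it never stored.

```python
# Database-style storage: specific, enumerated facts.
facts = {("human", "George"), ("human", "Mary"), ("human", "Sam")}

# Knowledge-base-style rule: "all humans are mortal". The conclusion
# is derived at query time, never stored as a row.
def is_mortal(name):
    return ("mortal", name) in facts or ("human", name) in facts

print(is_mortal("Mary"))  # derived from the rule, not looked up
```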
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged: systems designed from the ground up to support object-oriented capabilities, but also to support standard database services. On the other hand, the large database vendors, such as Oracle, added capabilities to their products that provided support for knowledge-base requirements such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution for the term knowledge base was the Internet. With the rise of the Internet, documents, hypertext and multimedia support became critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge-management vendors such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge-management products adopted the term knowledge base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge-management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media to be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018–2019
Subject: DBMS
MCA 1st year (Sem. II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data, since there is no way they can interfere with one another. Any practical database, however, has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-user system. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Concurrency control is therefore a most important element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
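The lost-update conflict described above can be sketched with ordinary threads: several "transactions" perform a read-modify-write on a shared balance, and a lock serialises the critical section so no update is lost. This illustrates the idea behind lock-based concurrency control, not a real DBMS implementation.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    """Each call plays the role of a transaction making many small updates."""
    global balance
    for _ in range(times):
        with lock:          # critical section: read, modify, write as a unit
            balance += 1

threads = [threading.Thread(target=deposit, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # with the lock, all 40000 updates survive
```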
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of the two operations fails.
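The transfer example can be demonstrated with Python's built-in sqlite3. The CHECK constraint below is artificial, added only to force the second operation to fail; the point is that the rollback also undoes the first operation, so the transfer is all-or-nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Artificial upper bound on balances, just to make the credit to B fail.
conn.execute("CREATE TABLE account ("
             "name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance BETWEEN 0 AND 100))")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

try:
    # Transfer 80 from A to B: the debit succeeds (A drops to 20)...
    conn.execute("UPDATE account SET balance = balance - 80 WHERE name = 'A'")
    # ...but the credit would push B to 130, violating the CHECK.
    conn.execute("UPDATE account SET balance = balance + 80 WHERE name = 'B'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()   # atomicity: the successful debit of A is undone too

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # unchanged: money was neither lost nor created
```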
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for a DBMS:
- All users should be able to access the same data.
- A user's view is immune to changes made in other views.
- Users should not need to know physical database storage details.
- The DBA should be able to change database storage structures without affecting the users' views.
- The internal structure of the database should be unaffected by changes to physical aspects of storage.
- The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user; it is also called the user view. The user view is different from the way data is stored in the database, and describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to the user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
- Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language being used; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for a DBMS are explained above.
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update and retrieval) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified in the DDL. It includes metadata information such as the names of the files, the data items, the storage details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - The DML commands, such as insert, update, delete and retrieve, from the application program are sent to the DML compiler for compilation into object code for database access. The object code is then optimized by the query optimizer to find the best way to execute the query, and then sent to the data manager.
3. Data Manager - The data manager is the central software component of the DBMS, also known as the database control system.
The main functions of the data manager are:
- Converting operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (known as the query processor), from the user's logical view to the physical file system.
- Controlling access to DBMS information that is stored on disk.
- Handling buffers in main memory.
- Enforcing constraints to maintain the consistency and integrity of the data.
- Synchronizing the simultaneous operations performed by concurrent users.
- Controlling the backup and recovery operations.
4. Data Dictionary - The data dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation and accuracy, and may be used as an important part of the DBMS.
Importance of the Data Dictionary - the data dictionary is necessary in databases for the following reasons:
- It improves the DBA's control of the information system and the users' understanding of the use of the system.
- It helps in documenting the database design process by storing documentation of the result of every design phase and of design decisions.
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access commands, known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups, depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other system; a user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction, and responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database - in the case of the automatic teller machine, only one or more of the user's own accounts. Other such naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It also implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer, or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to the data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updating of the same data in different files
• Time wasted entering the same data again and again
• Computer resources being needlessly used
• Great difficulty in combining information
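The duplication problem above can be illustrated with a small sketch (the customer data and file names are hypothetical): two applications keep their own copy of the same record, and an update applied to only one copy leaves the data inconsistent.

```python
# Sketch: the same customer record duplicated in two application "files".
sales_file = {"C001": {"name": "Ravi", "address": "12 MG Road"}}
billing_file = {"C001": {"name": "Ravi", "address": "12 MG Road"}}

# The address changes, but only the sales application is updated.
sales_file["C001"]["address"] = "45 Nehru Street"

def inconsistent(key):
    """Report whether the two files disagree for a given customer."""
    return sales_file[key] != billing_file[key]

print(inconsistent("C001"))  # True: the two copies have diverged
```

A centralized database stores the address once, so there is only one copy to update.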
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data across multiple files to eliminate inconsistency.
3. Better Service to the Users - A DBMS is often used to provide better service to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared, and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who do not know programming to interact with the data more easily, unlike a file processing system, where a programmer may need to write new programs to meet every new demand.
4. Flexibility of the System is Improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity Can Be Improved - Since the data of an organization using the database approach is centralized and used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards Can Be Enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose of data interchange or migration between systems.
7. Security Can Be Improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization access different components of the operational data; in such an environment, enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. The Organization's Requirements Can Be Identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. It may become necessary to ignore some requests for information if they conflict with higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system to provide the overall service that is best for the organization.
9. Overall Cost of Developing and Maintaining Systems is Lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for similar services using conventional systems, since the productivity of programmers can be higher using the non-procedural languages that have been developed with DBMSs than using procedural languages.
10. A Data Model Must Be Developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand; the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides Backup and Recovery - Centralizing a database makes it possible to provide schemes for backup and recovery from failures, including disk crashes, power failures, and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process, with all business managers (or their designates) involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, and address.
Attributes are of various types:
• Simple/single attributes
• Composite attributes
• Multivalued attributes
• Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds, connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many: 1 ------- M
Many-to-one: M ------- 1
Many-to-many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer, with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where street is itself composite: (street_name, street_number, apartment_number).
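As an illustrative sketch (not part of the original answer), the Customer entity above can be mapped to a relational table by flattening the composite name and address attributes into simple columns; sqlite3 is used here only as a convenient stand-in for any relational DBMS, and the sample row is invented.

```python
import sqlite3

# Sketch: the Customer entity flattened into one relational table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name, city) "
    "VALUES (1, 'Asha', 'Rao', 'Nagpur')"
)
row = conn.execute(
    "SELECT first_name, last_name FROM customer WHERE customer_id = 1"
).fetchone()
print(row)  # ('Asha', 'Rao')
```

A multivalued attribute such as phone_number would normally be moved to its own table keyed by customer_id, since one column can hold only one value per row.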
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential file, index-sequential file, and direct file organizations, we have considered the retrieval and update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we get the set of records which satisfy the given value.
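A minimal sketch of this, assuming a hypothetical student table: a lookup on the non-primary attribute stud_name returns a set of matching records rather than at most one, and a secondary index can speed up such lookups.

```python
import sqlite3

# Sketch: retrieval on a secondary key ("stud_name") can match many records,
# unlike a primary-key lookup, which matches at most one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, stud_name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(1, "Amit"), (2, "Priya"), (3, "Amit")],
)
# A secondary index on the non-primary attribute speeds up these lookups.
conn.execute("CREATE INDEX idx_stud_name ON student (stud_name)")

rows = conn.execute(
    "SELECT roll_no FROM student WHERE stud_name = 'Amit' ORDER BY roll_no"
).fetchall()
print(rows)  # [(1,), (3,)] -- a set of records, not a single one
```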
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE3- EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and cannot be losslessly decomposed any further into smaller tables. Another way of expressing this is that each join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields. 5NF is based on the concept of join dependency: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
• You always need to know two values (pairwise).
• For any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy. Take the following sample data:

buyer | vendor        | item
------+---------------+---------
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, to determine the item you must know the buyer and the vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
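The decomposition suggested above can be checked mechanically. The sketch below, using plain Python sets, projects the sample Buying data onto the three binary tables and joins them back; the join reproduces exactly the original five rows, which is the join dependency at work.

```python
from itertools import product

# Sketch: Buying(buyer, vendor, item) decomposed into its three binary
# projections joins back losslessly -- the join dependency.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}
buyer_vendor = {(b, v) for b, v, _ in buying}   # Buyer-Vendor projection
buyer_item   = {(b, i) for b, _, i in buying}   # Buyer-Item projection
vendor_item  = {(v, i) for _, v, i in buying}   # Vendor-Item projection

# Natural join of all three projections.
rejoined = {
    (b, v, i)
    for (b, v), (v2, i) in product(buyer_vendor, vendor_item)
    if v == v2 and (b, i) in buyer_item
}
print(rejoined == buying)  # True: no spurious tuples, the decomposition is lossless
```

Note that joining only two of the projections would produce a spurious (Mary, Jordach, Sneakers) tuple; all three are needed, which is exactly why this is a join dependency rather than a pair of multivalued dependencies.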
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product designed to support both batch and online application programs.
[Figure: IMS system architecture. Each application program (host language + DL/I calls) communicates with the IMS control program through the PCBs of its PSB (PSB-A for Application A, PSB-B for Application B); the control program accesses the physical databases, each defined by a DBD.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also specified in the DBD. The set of DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example:
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are supported via user-written online application programs. IMS does not provide an integrated query language.
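As a rough sketch (the segment values are invented, and Python dictionaries stand in for IMS's stored hierarchy), the COURSE/OFFERING/STUDENT hierarchy defined by the DBD above can be pictured as nested records traversed top-down, much as a DL/I program would navigate it.

```python
# Sketch (hypothetical data): the EDUCPDBD hierarchy -- COURSE at the root,
# PREREQ and OFFERING as its children, TEACHER and STUDENT under OFFERING --
# modelled as nested dictionaries.
course = {
    "COURSE": "M23", "TITLE": "Dynamics of Databases",
    "PREREQ": [{"COURSE": "M19", "TITLE": "Fundamentals"}],
    "OFFERING": [
        {
            "DATE": "730813", "LOCATION": "Nagpur",
            "TEACHER": [{"EMP": "421633", "NAME": "S. Deshpande"}],
            "STUDENT": [{"EMP": "183009", "NAME": "A. Kulkarni", "GRADE": "A"}],
        }
    ],
}

def students_of(course_seg):
    """Walk COURSE -> OFFERING -> STUDENT, the hierarchical access path."""
    return [s["NAME"] for off in course_seg["OFFERING"] for s in off["STUDENT"]]

print(students_of(course))  # ['A. Kulkarni']
```

The point of the sketch is that every access path starts at the root segment, which is why IMS retrieval is navigational rather than query-based.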
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they:
• have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency;
• hold for all time;
• are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
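The definition above can be made concrete with a small checker (the staff data is hypothetical): a functional dependency X → Y holds in a relation exactly when no two tuples agree on X but disagree on Y.

```python
# Sketch: checking whether a functional dependency X -> Y holds in a relation
# represented as a list of dictionaries (one dict per tuple).
def fd_holds(relation, lhs, rhs):
    """True iff tuples agreeing on all lhs attributes agree on all rhs attributes."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False  # same determinant, different dependent value
        seen[key] = val
    return True

staff = [
    {"staff_no": "S1", "branch": "B1", "city": "Pune"},
    {"staff_no": "S2", "branch": "B1", "city": "Pune"},
    {"staff_no": "S3", "branch": "B2", "city": "Mumbai"},
]
print(fd_holds(staff, ["branch"], ["city"]))    # True: branch determines city
print(fd_holds(staff, ["city"], ["staff_no"]))  # False: Pune maps to both S1 and S2
```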
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that it meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to forms up to 3NF, BCNF, or 4NF; here we pay particular attention to forms up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
• NF2: non-first normal form
• 1NF: R is in 1NF iff all domain values are atomic
• 2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key
• 3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the key
• BCNF: R is in BCNF iff every determinant is a candidate key
• Determinant: an attribute on which some other attribute is fully functionally dependent

Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multivalued dependencies are functional dependencies. 4NF removes unwanted data structures: multivalued dependencies.
One of the following conditions must hold for a relation to be in fourth normal form:
• There is no multivalued dependency in the relation, or
• There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers multivalued dependencies.
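A sketch of the 4NF situation, with invented data: an employee determines a set of skills and, independently, a set of languages (two multivalued dependencies). The relation is in BCNF but not 4NF, and decomposing it into two binary tables loses nothing, which is why 4NF calls for the split.

```python
from itertools import product

# Sketch (hypothetical data): emp ->> skill and emp ->> language are
# multivalued dependencies; skills and languages vary independently, so the
# single table stores every combination redundantly.
emp_skill_lang = {
    ("Riya", "SQL",    "English"),
    ("Riya", "SQL",    "Hindi"),
    ("Riya", "Python", "English"),
    ("Riya", "Python", "Hindi"),
}
emp_skill = {(e, s) for e, s, _ in emp_skill_lang}  # 4NF table 1
emp_lang  = {(e, l) for e, _, l in emp_skill_lang}  # 4NF table 2

# Because of the multivalued dependencies, joining the two projections
# reproduces the original relation exactly.
rejoined = {
    (e, s, l)
    for (e, s), (e2, l) in product(emp_skill, emp_lang)
    if e == e2
}
print(rejoined == emp_skill_lang)  # True: the 4NF decomposition is lossless
```

Adding a new language for Riya requires only one row in emp_lang, instead of one row per skill in the undecomposed table.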
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could get a user's account information and efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following:
• The speed and size of your transaction log backups
• The degree to which you might be at risk of losing committed transactions in the event of media failure
Models
There are three types of database recovery model available:
• Full Recovery
• Bulk-Logged Recovery
• Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
• Database restoration up to any specified point in time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
• The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
• CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations proceeds faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans:
Representing that George, Mary, Sam, Jenna, Mike, and hundreds of thousands of other customers are all humans with specific ages, sex, address, etc. is the work for a database.[3][4]
As expert systems moved from being prototypes to systems deployed in corporate environments, the requirements for their data storage rapidly started to overlap with the standard database requirements for multiple, distributed users with support for transactions. Initially, the demand could be seen in two different but competitive markets. From the AI and object-oriented communities, object-oriented databases such as Versant emerged. These were systems designed from the ground up to support object-oriented capabilities, but also to support standard database services as well. On the other hand, the large database vendors, such as Oracle, added capabilities to their products that provided support for knowledge-base requirements such as class-subclass relations and rules.
Internet as a knowledge base
The next evolution of the term knowledge base was driven by the Internet. With the rise of the Internet, documents, hypertext, and multimedia support became critical for any corporate database. It was no longer enough to support large tables of data or relatively small objects that lived primarily in computer memory. Support for corporate web sites required persistence and transactions for documents. This created a whole new discipline known as Web Content Management. The other driver for document support was the rise of knowledge management vendors such as Lotus Notes. Knowledge management actually predated the Internet, but with the Internet there was great synergy between the two areas. Knowledge management products adopted the term knowledge base to describe their repositories, but the meaning had a subtle difference. In the case of previous knowledge-based systems, the knowledge was primarily for the use of an automated system, to reason about and draw conclusions about the world. With knowledge management products, the knowledge was primarily meant for humans, for example to serve as a repository of manuals, procedures, policies, best practices, reusable designs and code, etc. In both cases, the distinctions between the uses and kinds of systems were ill-defined. As the technology scaled up, it was rare to find a system that could really be cleanly classified as knowledge-based in the sense of an expert system that performed automated reasoning, or knowledge-based in the sense of knowledge management that provided knowledge in the form of documents and media that could be leveraged by humans.
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question Paper Solution
Summer-17
Academic Session 2018 – 2019
Subject: DBMS
MCA 1st Year (Sem II)
QUE 1-
(A) Explain the following in detail:
(i) Concurrency control
Ans: Concurrency control is the procedure in a DBMS for managing simultaneous operations without them conflicting with one another. Concurrent access is quite easy if all users are just reading data, since there is no way they can interfere with one another. However, any practical database has a mix of READ and WRITE operations, and hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in multi-user systems. It helps you make sure that database transactions are performed concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is an essential element for the proper functioning of a system where two or more database transactions that require access to the same data are executed simultaneously.
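A minimal sketch of the problem concurrency control solves, using a simple lock in place of a DBMS's locking protocol (the amounts and thread counts are arbitrary): without the lock, interleaved read-modify-write sequences could lose updates; with it, every deposit is applied.

```python
import threading

# Sketch: the "lost update" conflict avoided by serializing access with a lock,
# the simplest form of the concurrency control a DBMS performs.
balance = 0
lock = threading.Lock()

def deposit(amount, times):
    global balance
    for _ in range(times):
        with lock:  # serialize the read-modify-write sequence
            current = balance
            balance = current + amount

threads = [threading.Thread(target=deposit, args=(1, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance)  # 40000: no updates were lost
```

Removing the `with lock:` line allows two threads to read the same `current` value and overwrite each other's deposit, which is precisely the interference described above.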
(ii) Atomicity property
In database systems, atomicity (/ˌætəˈmɪsəti/; from Ancient Greek ἄτομος, átomos, "undividable") is one of the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client: at one moment in time it has not yet happened, and at the next it has already occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B. Performing these operations in an atomic transaction ensures that the database remains in a consistent state, that is, money is neither lost nor created if either of those two operations fails.
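The transfer example can be sketched with sqlite3, whose connection context manager gives exactly this all-or-nothing behaviour (the account names and amounts are invented):

```python
import sqlite3

# Sketch: both transfer operations commit together or neither does; a failure
# mid-transfer rolls the database back to its prior consistent state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # the with-block is one atomic transaction
        conn.execute("UPDATE account SET balance = balance - 70 WHERE name = 'A'")
        # Simulate a crash between the withdrawal and the deposit to B.
        raise RuntimeError("failure before the deposit to B")
except RuntimeError:
    pass  # the withdrawal from A was rolled back automatically

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} -- unchanged, as if nothing happened
```

Had the with-block completed, both updates would have been committed together; the partial state (A debited, B not yet credited) is never visible after the transaction ends.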
(B) Give the three-level architecture proposal for DBMS.
Ans: The objectives of the three-level architecture proposal for a DBMS are:
• All users should be able to access the same data.
• A user's view is immune to changes made in other views.
• Users should not need to know physical database storage details.
• The DBA should be able to change database storage structures without affecting the users' views.
• The internal structure of the database should be unaffected by changes to the physical aspects of storage.
• The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a. External level
b. Conceptual level
c. Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user view is different from the way data is stored in the database; this view describes only a part of the actual database. Because each user is not concerned with the entire database, only the part that is relevant to that user is visible. For example, end users and application programmers get different external views.
Each user uses a language to carry out database operations. The application programmer uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data manipulation language performs operations on these objects. The data control language is used to control the user's access to database objects.
Conceptual Level - This level comes between the external and the internal levels. The conceptual level represents the entire database as a whole and is used by the DBA. This level is the view of the data "as it really is". The user's view of the data is constrained by the language being used; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of the architecture. The internal level describes the physical sequence of the stored records.
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained above.
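One common way the external level is realized in SQL systems is through views. The sketch below (a hypothetical employee table, with sqlite3 as a stand-in for any relational DBMS) gives one user group an external view that hides the salary column while the conceptual-level table remains unchanged.

```python
import sqlite3

# Sketch: a base table (conceptual level) and a SQL view serving as one user
# group's restricted external view of the same data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Meera", 50000), (2, "Rahul", 60000)],
)

# External view for users who must not see salaries.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

rows = conn.execute("SELECT * FROM employee_public ORDER BY id").fetchall()
print(rows)  # [(1, 'Meera'), (2, 'Rahul')] -- the salary column is hidden
```

Changing the base table's storage or adding columns does not disturb this view, which is the data-independence objective listed above.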
(C) Describe the structure of DBMS.
Ans: A DBMS (Database Management System) acts as an interface between the user and the database. The user requests the DBMS to perform various operations (insert, delete, update, and retrieve) on the database. The components of the DBMS perform these requested operations on the database and provide the necessary data to the users.
Fig: Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It includes metadata information such as the names of the files, the data items, storage
details of each file, mapping information, and constraints.
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized in the best way to execute a query by
the query optimizer and then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are -
It converts operations in users' queries, coming from the application programs or from the
DML compiler and query optimizer (together known as the Query Processor), from the user's
logical view to the physical file system.
It controls access to the DBMS information that is stored on disk.
It also handles buffers in main memory.
It also enforces constraints to maintain the consistency and integrity of the data.
It also synchronizes the simultaneous operations performed by concurrent users.
It also controls the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3 Constraints on data, i.e. the range of values permitted.
4 Detailed information on physical database design, such as storage structure,
access paths, and file and record sizes.
5 Access authorization - descriptions of database users, their responsibilities,
and their access rights.
6 Usage statistics, such as frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation,
and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the result of every design phase and the design decisions.
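Most DBMSs expose the data dictionary through a system catalogue. As a small sketch (using SQLite, whose sqlite_master table and PRAGMA table_info play the role of the dictionary; the course table is invented for the example), the table names and column descriptions above can be read back like this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_no TEXT PRIMARY KEY, title TEXT)")

# Table names come from the system catalogue (SQLite's sqlite_master)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Column descriptions (name, declared type) for each table
columns = {t: [(c[1], c[2]) for c in conn.execute(f"PRAGMA table_info({t})")]
           for t in tables}
print(columns)  # {'course': [('course_no', 'TEXT'), ('title', 'TEXT')]}
```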
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7 End Users - The users of the database system can be classified in the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online users
3 Application programmers
4 Database administrator
i) Naïve Users - Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database - in the case of the ATM user, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online Users - These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application Programmers - Professional programmers who are responsible for developing application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system
Ans A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications and data sharing: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored; such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g. from optical to magnetic storage, from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted in entering the same data again and again
• Computer resources needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data across multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work the most
important, and therefore its needs the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may therefore become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for
recovery and backup from failures, including disk crashes, power failures and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process: all business managers (or their designates)
should be involved, and the model should be validated with a "bottom-up" approach. It has three primary components: entities,
relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
Simple/Single attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship Relationship captures how two or more entities are related to one another Relationships can
be thought of as verbs linking two or more nouns Examples an owns relationship between a company and a computer a supervises relationship between an employee and a department a performs relationship
between an artist and a song a proved relationship between a mathematician and a theorem Relationships
are represented as diamonds connected by lines to each of the entities in the relationship Types of
relationships are as follows
One to many (1:M)
Many to one (M:1)
Many to many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number)
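The attribute types above can be sketched in code. The following Python dataclasses are only an illustration (the sample values are invented): Address models a composite attribute, the phone-number list a multivalued attribute, and the computed age a derived attribute.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class Address:            # composite attribute: built from simple parts
    city: str
    state: str
    zip_code: str

@dataclass
class Customer:           # entity type; each instance is one entity
    customer_id: int      # primary key attribute (simple/single)
    first_name: str
    last_name: str
    phone_numbers: List[str] = field(default_factory=list)  # multivalued
    date_of_birth: date = date(2000, 1, 1)

    @property
    def age_years(self) -> int:   # derived attribute: computed, not stored
        return (date.today() - self.date_of_birth).days // 365

c = Customer(1, "Asha", "Rao", ["555-0101", "555-0102"])
print(c.customer_id, len(c.phone_numbers))  # 1 2
```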
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans In sequential files, index-sequential files and direct files, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we may get a set of
records which satisfy the given value.
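The student-file example can be sketched as a secondary index built in Python (records and names invented for illustration): each value of the non-unique attribute stud_name maps to the list of primary keys of the matching records.

```python
from collections import defaultdict

# A student file keyed on the primary key roll_no
students = [
    {"roll_no": 1, "stud_name": "Ravi", "branch": "MCA"},
    {"roll_no": 2, "stud_name": "Meena", "branch": "MCA"},
    {"roll_no": 3, "stud_name": "Ravi", "branch": "MBA"},
]

# Secondary index on stud_name: each key value maps to the set of
# matching records (here, their primary keys)
by_name = defaultdict(list)
for rec in students:
    by_name[rec["stud_name"]].append(rec["roll_no"])

print(by_name["Ravi"])  # [1, 3] -- multiple records satisfy one key value
```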
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A,B,C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries -
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
have a non-trivial lossless decomposition into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pair wise cyclical dependency means that
You always need to know two values (pair wise)
For any one you must know the other two (cyclical)
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
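A small Python sketch (using the sample data above) shows that the Buying table join-decomposes losslessly into its three binary projections, which is exactly the join dependency that 5NF is about. After the decomposition, "Claiborne starts to sell Jeans" needs only one new Vendor-Item tuple instead of one row per buyer.

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# Project onto the three pairs: Buyer-Vendor, Buyer-Item, Vendor-Item
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections
rejoined = {(b, v, i)
            for b, v in buyer_vendor
            for b2, i in buyer_item if b2 == b
            for v2, i2 in vendor_item if v2 == v and i2 == i}

print(rejoined == buying)  # True: the decomposition is lossless
```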
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Fig IMS system structure: application programs A and B (host language + DL/I) access the IMS
control program through their PCBs, which are grouped into PSB-A and PSB-B; the physical
databases are defined by DBDs.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of DBDs corresponds to the conceptual schema plus the associated
conceptual/internal mapping definition.
DBD (Database Description) - Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
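The hierarchy this DBD defines (COURSE at the root; PREREQ and OFFERING as its children; TEACHER and STUDENT under OFFERING) can be sketched as nested records in Python. This is only an in-memory stand-in with made-up field values, not real IMS; the preorder walk mirrors the hierarchical-sequence order that a DL/I get-next traversal follows.

```python
# Hypothetical in-memory stand-in for one EDUCPDBD database record
course = {
    "segment": "COURSE", "fields": {"COURSE": "M16", "TITLE": "Databases"},
    "children": [
        {"segment": "PREREQ", "fields": {"COURSE": "M08"}, "children": []},
        {"segment": "OFFERING",
         "fields": {"DATE": "730813", "LOCATION": "Nagpur"},
         "children": [
             {"segment": "TEACHER", "fields": {"EMP": "421633"}, "children": []},
             {"segment": "STUDENT", "fields": {"EMP": "183009"}, "children": []},
         ]},
    ],
}

def get_next(seg):
    """Preorder walk: the hierarchical sequence a get-next call follows."""
    yield seg["segment"]
    for child in seg["children"]:
        yield from get_next(child)

order = list(get_next(course))
print(order)  # ['COURSE', 'PREREQ', 'OFFERING', 'TEACHER', 'STUDENT']
```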
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block) - Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block) - The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT - The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency The value of one attribute (the determinant)
determines the value of another attribute
Candidate Key A possible key
Each non-key field is functionally dependent on every candidate key
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of functional dependencies used in
normalization:
They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and has the property that every functional dependency in Y is implied by the
functional dependencies in X.
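The definition "the determinant determines the value of another attribute" can be checked mechanically on a relation instance. A minimal sketch (relation and attribute names invented): a dependency lhs → rhs holds iff no two rows agree on lhs but differ on rhs.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False   # same determinant, two different dependent values
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 3, "dept": "HR",    "dept_city": "Nagpur"},
]
print(holds(emp, ["dept"], ["dept_city"]))    # True:  dept -> dept_city
print(holds(emp, ["dept_city"], ["emp_id"]))  # False: not a determinant
```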
(D) Explain 4 NF with examples
Ans Normalization - The process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF
removes unwanted data structures: multivalued dependencies.
One of these conditions must hold for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it
considers multivalued dependencies.
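A classic 4NF example can be sketched in Python (the employee, skills and languages are invented). When emp ->> skill and emp ->> language hold independently, a single flat table must pair every skill with every language; the 4NF decomposition stores each multivalued fact in its own table, and rejoining the projections is lossless.

```python
from itertools import product

# emp ->> skill and emp ->> language hold independently, so the flat
# table must contain every skill/language combination (redundancy)
flat = {("Ram", s, l)
        for s, l in product(["C", "Java"], ["Hindi", "English"])}

# 4NF decomposition: one table per multivalued fact
emp_skill = {(e, s) for e, s, l in flat}
emp_lang  = {(e, l) for e, s, l in flat}

# Lossless: joining the two projections recreates the flat table
rejoined = {(e, s, l)
            for e, s in emp_skill
            for e2, l in emp_lang if e2 == e}

print(len(flat), len(emp_skill) + len(emp_lang))  # 4 rows vs 2 + 2
```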
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
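The "faster access because joins are often not needed" point above can be sketched as follows (a toy illustration, not any particular object database's API): in the tabular style, finding an account's owner requires a key lookup across tables, whereas an object simply holds a direct reference that is followed as a pointer.

```python
# Relational style: two "tables" related through a foreign key
accounts = {101: {"owner_id": 7, "balance": 500}}
users = {7: {"name": "Asha"}}
owner_name = users[accounts[101]["owner_id"]]["name"]  # explicit join/lookup

# Object style: the account holds a direct reference to its owner
class User:
    def __init__(self, name):
        self.name = name

class Account:
    def __init__(self, owner, balance):
        self.owner = owner        # a pointer to the object, not a key value
        self.balance = balance

acct = Account(User("Asha"), 500)
print(owner_name, acct.owner.name)  # same answer; no key lookup in the second
```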
C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be more easily achieved if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified point in time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log so that you can
recover to a log mark.
It logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system
Ans
Tulsiramji Gaikwad-Patil College of Engineering & Technology
Department of MCA
Question paper Solution
Summer-17
Academic Session 2018 – 2019
Subject DBMS
MCA-1st year (Sem II)
QUE 1-
(A) Explain the following in the detail
(i) Concurrency control
Ans Concurrency control is the procedure in a DBMS for managing simultaneous
operations without them conflicting with one another. Concurrent access is quite easy if all
users are just reading data, as there is no way they can interfere with one another. Any
practical database, though, will have a mix of read and write operations, and
hence concurrency is a challenge.
Concurrency control is used to address such conflicts, which mostly occur in a multi-
user system. It helps you to make sure that database transactions are performed
concurrently without violating the data integrity of the respective databases.
Therefore, concurrency control is a most important element for the proper functioning of a
system where two or more database transactions that require access to the same data
are executed simultaneously.
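The simplest concurrency-control mechanism, locking, can be sketched with Python threads (a toy stand-in for a DBMS lock manager, not a real one): four writers increment a shared balance, and the lock serialises each read-modify-write so no update is lost.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    global balance
    for _ in range(times):
        with lock:            # serialise the read-modify-write cycle
            balance += 1

threads = [threading.Thread(target=deposit, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 40000 -- without the lock, lost updates could lower this
```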
(ii) Atomicity property
In database systems, atomicity (from Ancient Greek ἄτομος, átomos, "undividable") is one of
the ACID (Atomicity, Consistency, Isolation, Durability) transaction properties. An atomic
transaction is an indivisible and irreducible series of database operations such that either all occur or nothing occurs.[1] A guarantee of atomicity prevents updates to the database
occurring only partially, which can cause greater problems than rejecting the whole series
outright. As a consequence, the transaction cannot be observed to be in progress by another
database client: at one moment in time it has not yet happened, and at the next it has already
occurred in whole (or nothing happened, if the transaction was cancelled in progress).
An example of an atomic transaction is a monetary transfer from bank account A to account B. It consists of two operations: withdrawing the money from account A and saving it to account B.
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state, that is, money is neither lost nor created if either of those two operations fails.
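The transfer example can be demonstrated with sqlite3 (account names and amounts invented): the connection's context manager wraps both updates in one transaction, and a simulated crash between them rolls the withdrawal back, leaving the balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

try:
    with conn:  # one atomic transaction: both updates commit, or neither does
        conn.execute(
            "UPDATE account SET balance = balance - 70 WHERE name = 'A'")
        raise RuntimeError("crash before crediting B")  # simulated failure
        conn.execute(
            "UPDATE account SET balance = balance + 70 WHERE name = 'B'")
except RuntimeError:
    pass  # the context manager has already rolled the transaction back

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} -- the withdrawal was undone
```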
(B) Give the level architecture proposal for DBMS
Ans Objective of three level architecture proposal for DBMS
All users should be able to access same data
A users view is immune to changes made in other views
Users should not need to know physical database storage details
DBA should be able to change database storage structures without affecting the users views
Internal structure of database should be unaffected by changes to physical aspects of storage
DBA should be able to change conceptual structure of database without affecting all users
The architecture of a database management system can be broadly divided into three levels
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database. This view describes only a part of
the actual database Because each user is not concerned with the entire database only the part that
is relevant to the user is visible For example end users and application programmers get
different external views
Each user uses a language to carry out database operations The application programmer
uses either a conventional third-generation language such as COBOL or C or a fourth-generation
language specific to the DBMS such as visual FoxPro or MS Access
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares the database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
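The three sublanguages can be sketched as follows (an illustration using SQLite from Python; the `student` table is hypothetical, and since SQLite has no DCL, the GRANT statement is shown as text only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL -- defines and declares a database object:
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, name TEXT)")

# DML -- performs operations on that object:
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("UPDATE student SET name = 'Asha R' WHERE stud_id = 1")
rows = conn.execute("SELECT name FROM student").fetchall()

# DCL -- controls access; SQLite has no GRANT/REVOKE, so this statement is
# illustrative only (it would run on a server DBMS such as PostgreSQL):
dcl_example = "GRANT SELECT ON student TO some_user"

print(rows)  # [('Asha R',)]
```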
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
they are using; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for DBMS are suitably explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. This includes metadata information such as the names of the files, the data items, storage
details of each file, mapping information, constraints, etc.
2 DML Compiler and Query Optimizer - The DML commands (insert, update, delete,
retrieve) from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the best way to execute the query, and
then sent to the data manager.
3 Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
It converts operations in users' queries, coming from the application programs or from the combination of
DML compiler and query optimizer (known as the Query Processor), from the user's logical view
to the physical file system
It controls access to the DBMS information that is stored on disk
It also controls the handling of buffers in main memory
It also enforces constraints to maintain the consistency and integrity of the data
It also synchronizes the simultaneous operations performed by concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1 Data - names of the tables, names of the attributes of each table, length of attributes, and number of rows in each table
2 Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed
3 Constraints on data, i.e., the range of values permitted
4 Detailed information on physical database design, such as storage structure,
access paths, and file and record sizes
5 Access authorization - the description of database users, their responsibilities,
and their access rights
6 Usage statistics, such as frequency of queries and transactions
The data dictionary is used to actually control data integrity, database operation,
and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary -
A data dictionary is necessary in databases due to the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system
It helps in documenting the database design process by storing documentation of the results of every design phase and of design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7 End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the user of the automatic teller machine, only one or more of his or her own accounts. Other naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the spatial database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e., program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored, since such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g., changing the format of data items (real to integer arithmetic operations), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one device to
another (e.g., from optical to magnetic storage, or from tape to disk).
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data
4 Sharing of data
5 Improved data integrity
6 Improved security
7 Enforcement of standards
8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to updation of the same data in different files
• Time wasted in entering data again and again
• Computer resources being needlessly used
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data, so we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
up-to-dateness is likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to anticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, use of a DBMS
should allow users who don't know programming to interact with the data more easily, unlike a
file processing system where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since data of an organization using the database approach is
centralized and used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc/temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work the most
important, and therefore its needs the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may therefore become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed for DBMSs than using procedural languages.
10 Data Model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as per the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backups from failures, including disk crashes, power failures, and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Modelling is an
iterative, team-oriented process in which all business managers (or their designates) should be
involved, and the result should be validated with a "bottom-up" approach. The model has three
primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. Types of
relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where
street is itself composite (street_name, street_number, apartment_number).
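One conventional way to map the Customer entity above to relational tables can be sketched as follows (a hypothetical design, not the only valid one: composite attributes are flattened into columns, and the multivalued phone_number attribute becomes a separate table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT, middle_name TEXT, last_name TEXT,  -- composite: name
    date_of_birth TEXT,
    city TEXT, state TEXT, zip_code TEXT,                  -- composite: address
    street_name TEXT, street_number TEXT, apartment_number TEXT
);
CREATE TABLE customer_phone (                              -- multivalued attribute
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT,
    PRIMARY KEY (customer_id, phone_number)
);
""")
```

A derived attribute such as age would not be stored at all; it would be computed from date_of_birth when needed.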
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans In sequential files, index-sequential files, and direct files we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
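The stud_name example can be sketched with a secondary index (hypothetical table and data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stud_id INTEGER PRIMARY KEY, stud_name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ravi"), (2, "Meena"), (3, "Ravi")])
# stud_name is not the primary key, so one value may match several records;
# a secondary index speeds up retrieval on it:
conn.execute("CREATE INDEX idx_name ON student(stud_name)")

# Primary-key retrieval returns at most one record ...
one = conn.execute("SELECT * FROM student WHERE stud_id = 2").fetchall()
# ... secondary-key retrieval may return a set of records:
many = conn.execute("SELECT * FROM student WHERE stud_name = 'Ravi'").fetchall()
print(len(one), len(many))  # 1 2
```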
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A, B, C) and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
if a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed thus: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pairwise cyclical dependency means that:
You always need to know two values (pairwise)
For any one you must know the other two (cyclical)
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
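The Buyer-Vendor / Buyer-Item / Vendor-Item decomposition can be demonstrated on the sample data above (a sketch using SQLite): joining the three pairwise projections on their common columns reconstructs the original Buying table exactly, which is the join dependency at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT)")
conn.executemany("INSERT INTO buying VALUES (?, ?, ?)", [
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
])
# The three pairwise projections (the 5NF decomposition):
conn.executescript("""
CREATE TABLE bv AS SELECT DISTINCT buyer, vendor FROM buying;
CREATE TABLE bi AS SELECT DISTINCT buyer, item   FROM buying;
CREATE TABLE vi AS SELECT DISTINCT vendor, item  FROM buying;
""")
# Joining them back reproduces the original rows, and no spurious ones:
rejoined = conn.execute("""
    SELECT DISTINCT bv.buyer, bv.vendor, vi.item
    FROM bv JOIN vi ON bv.vendor = vi.vendor
            JOIN bi ON bi.buyer = bv.buyer AND bi.item = vi.item
    ORDER BY 1, 2, 3
""").fetchall()
original = conn.execute(
    "SELECT DISTINCT buyer, vendor, item FROM buying ORDER BY 1, 2, 3").fetchall()
print(rejoined == original)
```

With this design, recording that Claiborne starts to sell Jeans is a single row in the Vendor-Item table rather than one row per interested buyer.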
(B) Explain the architecture of an IMS System
Ans Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Fig IMS system structure: Application A and Application B, each written in a host language + DL/I, access the IMS control program through their PSBs (PSB-A, PSB-B); each PSB is a set of PCBs, which map onto the DBDs of the physical databases]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on each segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are
supported via user-written on-line application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key.
Each non-key field is functionally dependent on every candidate key.
No attribute in the key can be deleted without destroying the property of
unique identification.
Main characteristics of functional dependencies used in
normalization:
They have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
a dependency; they hold for all time; they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation
and has the property that every functional dependency in Y is implied by
the functional dependencies in X.
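The defining property, that the determinant fixes the dependent value, can be sketched as a small check over a relation's rows (hypothetical data): a functional dependency X → Y holds when no two rows agree on X but differ on Y.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # same determinant value, different dependent value
        seen[x] = y
    return True

rows = [
    {"stud_id": 1, "name": "Ravi",  "dept": "MCA"},
    {"stud_id": 2, "name": "Meena", "dept": "MCA"},
    {"stud_id": 3, "name": "Ravi",  "dept": "MBA"},
]
print(fd_holds(rows, ["stud_id"], ["name"]))  # True: stud_id determines name
print(fd_holds(rows, ["name"], ["dept"]))     # False: Ravi maps to two depts
```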
(D) Explain 4 NF with examples
Ans Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (or 4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF thus
removes an unwanted kind of data structure: multivalued dependencies.
Either there is no multivalued dependency in the relation, or there are multivalued dependencies
but the attributes involved are dependent between themselves.
One of these conditions must hold for the relation to be in fourth normal form.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it considers
multivalued dependencies.
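A 4NF decomposition can be sketched on a hypothetical emp(name, skill, language) relation, where skills and languages vary independently (the multivalued dependencies name →→ skill and name →→ language), so the table splits losslessly in two:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, skill TEXT, language TEXT)")
# Every (skill, language) combination must appear, because the two
# facts are independent -- the redundancy that 4NF removes:
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("Asha", "SQL",    "English"),
    ("Asha", "SQL",    "Hindi"),
    ("Asha", "Python", "English"),
    ("Asha", "Python", "Hindi"),
])
# The 4NF decomposition: one table per independent multivalued fact.
conn.executescript("""
CREATE TABLE emp_skill    AS SELECT DISTINCT name, skill    FROM emp;
CREATE TABLE emp_language AS SELECT DISTINCT name, language FROM emp;
""")
# Joining the two projections on name reconstructs the original losslessly:
rejoined = conn.execute("""
    SELECT s.name, s.skill, l.language
    FROM emp_skill s JOIN emp_language l ON s.name = l.name
    ORDER BY 1, 2, 3
""").fetchall()
original = conn.execute(
    "SELECT name, skill, language FROM emp ORDER BY 1, 2, 3").fetchall()
print(rejoined == original)
```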
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility The model used will determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery model available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log that let you
recover to a log mark.
This model logs CREATE INDEX operations. Recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(D) Describe deadlocks in a distributed system.
Ans
An example of an atomic transaction is a monetary transfer from bank account A to account B It consists of two operations withdrawing the money from account A and saving it to account B
Performing these operations in an atomic transaction ensures that the database remains in a consistent
state that is money is neither lost nor created if either of those two operations fai
(B) Give the three-level architecture proposal for DBMS.
Ans: Objectives of the three-level architecture proposal for a DBMS:
All users should be able to access the same data.
A user's view is immune to changes made in other views.
Users should not need to know physical database storage details.
The DBA should be able to change the database storage structures without affecting the users' views.
The internal structure of the database should be unaffected by changes to the physical aspects of storage.
The DBA should be able to change the conceptual structure of the database without affecting all users.
The architecture of a database management system can be broadly divided into three levels:
a External level
b Conceptual level
c Internal level
These three levels are explained in detail below.
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; it describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to that user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
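As a small illustration of the three sublanguages, using Python's built-in sqlite3 module (the table and data are invented for the example; SQLite has no user accounts, so the DCL statement appears only as a comment):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and declare a database object.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
cur.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
cur.execute("INSERT INTO student (id, name) VALUES (2, 'Ravi')")
rows = cur.execute("SELECT name FROM student ORDER BY id").fetchall()
print(rows)  # [('Asha',), ('Ravi',)]

# DCL statements such as GRANT/REVOKE control access rights; in a
# server DBMS with user accounts this would look like:
#   GRANT SELECT ON student TO some_user;
```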
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
being used; at the conceptual level the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture The internal level describes the physical sequence of the stored records
Thus the objectives of the three-level architecture proposal for a DBMS are met, as explained
above.
(C) Describe the structure of DBMS
Ans DBMS (Database Management System) acts as an interface between the user and the
database The user requests the DBMS to perform various operations (insert delete update and
retrieval) on the database The components of DBMS perform these requested operations on the
database and provide necessary data to the users
Fig.: Structure of a Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL and stores them as metadata: the names of the files and data items, storage
details of each file, mapping information, constraints, etc.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer to find the best
way to execute the query, and sent on to the data manager.
3. Data Manager - The data manager is the central software component of the DBMS, also known
as the database control system.
The main functions of the data manager are:
It converts operations in users' queries, coming from the application programs or from the
DML compiler and query optimizer (together known as the query processor), from the user's logical view
to the physical file system.
It controls access to the DBMS information stored on disk.
It handles buffers in main memory.
It enforces constraints to maintain the consistency and integrity of the data.
It synchronizes the simultaneous operations performed by concurrent users.
It controls backup and recovery operations.
4. Data Dictionary - A data dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data: names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization: a description of database users, their responsibilities
and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to control data integrity, database operation and accuracy,
and may be regarded as an important part of the DBMS.
Importance of the Data Dictionary -
A data dictionary is necessary in databases for the following reasons:
It improves the control of the DBA over the information system and the users'
understanding of the use of the system.
It helps in documenting the database design process by storing documentation of the results of every design phase and of design decisions.
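A toy sketch of what a data dictionary records (all names and figures invented) might look like this in Python:

```python
# A toy data dictionary: metadata ABOUT tables, not the data itself.
data_dictionary = {
    "student": {
        "attributes": {"id":   {"type": "INTEGER", "length": 4},
                       "name": {"type": "TEXT",    "length": 30}},
        "row_count": 2,
        "constraints": ["PRIMARY KEY (id)"],
        "access_rights": {"dba":   ["SELECT", "INSERT", "UPDATE", "DELETE"],
                          "clerk": ["SELECT"]},
    },
}

def attributes_of(table):
    """Answer a typical dictionary query: which attributes does a table have?"""
    return sorted(data_dictionary[table]["attributes"])

print(attributes_of("student"))  # ['id', 'name']
```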
5. Data Files - These contain the data portion of the database.
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1. Naïve users
2. Online users
3. Application programmers
4. Database administrator
i) Naïve users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction and responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database; in the case of the ATM user, only one or more of his or her own accounts. Other naive users are those for whom the type and range of responses is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users: These users may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naive users requiring help, such as menus.
iii) Application programmers: Professional programmers who are responsible for developing the application programs or user interfaces used by the naive and online users fall into this category. The application programs may be written in a general-purpose programming language such as assembler, C, COBOL, FORTRAN, Pascal or PL/I, and include the commands required to manipulate the database.
iv) Database administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data that can be shared by different application
systems. This stresses the importance of sharing data across multiple applications: the database
becomes a common resource for an agency. It also implies the separation of physical storage from the
use of the data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, since such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
duplication of the same data in different files;
wastage of storage space, since duplicated data is stored;
errors generated due to updating of the same data in different files;
time wasted in entering the same data again and again;
needless use of computer resources;
difficulty in combining information.
2. Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This can lead to inconsistent data, so we need to remove this duplication of
data across multiple files to eliminate inconsistency.
3. Better service to the users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in a database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users who do not know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, as the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. It may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. The overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed for DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Backup and recovery are provided - Centralizing a database makes it possible to provide
backup and recovery schemes for failures such as disk crashes, power failures and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the failure, though the methods involved are complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The ER model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Modelling is an
iterative, team-oriented process in which all business managers (or their designates)
should be involved, and the result should be validated with a "bottom-up" approach. The model has
three primary components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain: when we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student entity's attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer; a supervises relationship between an employee and a department; a performs relationship
between an artist and a song; a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many: 1 <------- M
Many-to-one: M ------- 1
Many-to-many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: entity Customer, with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
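A hedged sketch of how this Customer entity, with its composite, multivalued and derived attributes, might be modelled with Python dataclasses (all field values invented):

```python
from dataclasses import dataclass
from datetime import date

# Composite attributes become nested types, following the ER example.
@dataclass
class Name:                        # composite attribute
    first_name: str
    last_name: str
    middle_name: str

@dataclass
class Street:                      # composite attribute nested inside address
    street_name: str
    street_number: str
    apartment_number: str

@dataclass
class Address:
    city: str
    state: str
    zip_code: str
    street: Street

@dataclass
class Customer:                    # the entity; customer_id is the primary key
    customer_id: int
    name: Name
    phone_numbers: list[str]       # multivalued attribute
    date_of_birth: date
    address: Address

    @property
    def age(self) -> int:          # derived attribute: computed, not stored
        today = date.today()
        born = self.date_of_birth
        return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

c = Customer(1, Name("Asha", "Rao", "K"), ["98765"], date(2000, 5, 17),
             Address("Nagpur", "MH", "440001", Street("MG Road", "12", "3B")))
```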
--------------------------------------------------------------------------------------------------------
(C) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we can get the set of
records that satisfy the given value.
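A minimal Python sketch of a secondary index on stud_name (sample records invented), showing that one key value can retrieve several records:

```python
from collections import defaultdict

# Student file: records addressed by the primary key stud_id.
students = {
    101: {"stud_name": "Asha", "branch": "MCA"},
    102: {"stud_name": "Ravi", "branch": "MCA"},
    103: {"stud_name": "Asha", "branch": "MBA"},
}

# Secondary index on the non-unique attribute stud_name: each key
# value maps to the LIST of primary keys that carry that value.
name_index = defaultdict(list)
for stud_id, rec in students.items():
    name_index[rec["stud_name"]].append(stud_id)

def find_by_name(name):
    """Secondary-key retrieval: may return several records."""
    return [students[sid] for sid in name_index.get(name, [])]

print([r["branch"] for r in find_by_name("Asha")])  # ['MCA', 'MBA']
```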
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
if a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
be non-loss decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
you always need to know two values (pairwise);
for any one value you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
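The join dependency can be checked mechanically: project the sample Buying data onto the three pairwise tables and verify that their natural join reproduces the original rows exactly (a Python sketch):

```python
from itertools import product

# Sample Buying relation from the example above.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three pairwise tables.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections.
def join3(bv, bi, vi):
    return {(b, v, i)
            for (b, v), (b2, i) in product(bv, bi)
            if b == b2 and (v, i) in vi}

rejoined = join3(buyer_vendor, buyer_item, vendor_item)
# The join dependency holds only if the join reproduces the original
# rows exactly, with no spurious tuples.
print(rejoined == buying)  # True
```

With this decomposition, if Claiborne starts to sell jeans only one row, (Liz Claiborne, Jeans), is added to the Vendor-Item table; the join then derives the new Buying facts for every buyer concerned.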
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Fig.: IMS system structure - application programs A and B, each written in a host language with DL/I calls, access the IMS control program through their program specification blocks (PSB-A, PSB-B), each consisting of PCBs; the control program in turn uses the DBDs that define the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined, together with its mapping to storage, by a database description (DBD). The set of
all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE,BYTES=256
3 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
4 FIELD NAME=TITLE,BYTES=33,START=4
5 FIELD NAME=DESCRIPN,BYTES=220,START=37
6 SEGM NAME=PREREQ,PARENT=COURSE,BYTES=36
7 FIELD NAME=(COURSE#,SEQ),BYTES=3,START=1
8 FIELD NAME=TITLE,BYTES=33,START=4
9 SEGM NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE#,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs; IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional dependency: the value of one attribute (the determinant) determines the value of
another attribute.
Candidate key: a possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency;
they hold for all time;
they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify a
set of functional dependencies (X) for a relation that is smaller than the complete set of functional
dependencies (Y) for that relation and has the property that every functional dependency in Y is
implied by the functional dependencies in X.
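Whether a given functional dependency holds in a relation instance can be tested directly (sample data invented):

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in `rows`.

    `rows` is a list of dicts. The FD holds iff rows that agree on all
    lhs attributes also agree on all rhs attributes.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key] != val:
            return False          # same determinant, different dependent
        seen[key] = val
    return True

staff = [
    {"staff_id": 1, "branch": "B5", "city": "London"},
    {"staff_id": 2, "branch": "B5", "city": "London"},
    {"staff_id": 3, "branch": "B7", "city": "Aberdeen"},
]

print(holds(staff, ["staff_id"], ["branch"]))  # True: staff_id determines branch
print(holds(staff, ["branch"], ["city"]))      # True
print(holds(staff, ["city"], ["staff_id"]))    # False: London maps to 1 and 2
```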
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF²: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and
only if it is in BCNF and all of its multivalued dependencies are in fact functional dependencies; 4NF
thus removes the unwanted structures caused by multivalued dependencies.
For a relation to be in fourth normal form, one of the following must hold:
there is no multivalued dependency in the relation; or
there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also takes
multivalued dependencies into account.
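A small Python sketch of the 4NF idea (sample data invented): when two multivalued facts about an employee are independent, the relation can be split into two tables whose natural join loses nothing:

```python
from itertools import product

# employee ->-> skill and employee ->-> language hold independently:
# every (skill, language) combination appears for the employee.
emp = {
    ("Jones", "typing",    "French"),
    ("Jones", "typing",    "German"),
    ("Jones", "shorthand", "French"),
    ("Jones", "shorthand", "German"),
}

# The 4NF decomposition separates the two independent multivalued facts.
emp_skill = {(e, s) for e, s, l in emp}
emp_lang  = {(e, l) for e, s, l in emp}

# The natural join of the projections reproduces the original relation,
# so the decomposition is lossless: the multivalued dependencies held.
rejoined = {(e, s, l)
            for (e, s), (e2, l) in product(emp_skill, emp_lang)
            if e == e2}
print(rejoined == emp)  # True
```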
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s,
but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
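A toy illustration of the direct-reference idea (using Python's pickle, which is not an OODBMS, merely a way to persist an object graph; class and data names invented):

```python
import pickle

# Objects reference each other directly; persisting the object graph
# preserves those references, so retrieval needs no join over tables.
class Account:
    def __init__(self, number, balance):
        self.number, self.balance = number, balance

class Customer:
    def __init__(self, name, accounts):
        self.name = name
        self.accounts = accounts   # direct references, not foreign keys

c = Customer("Asha", [Account("A-1", 100), Account("A-2", 250)])
blob = pickle.dumps(c)             # persist the whole object graph
restored = pickle.loads(blob)

# Navigate by following pointers; no join is needed.
print([a.number for a in restored.accounts])  # ['A-1', 'A-2']
```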
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used
determines how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the available options in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
the speed and size of your transaction log backups;
the degree to which you are at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or a BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up,
and the database can be restored up to any specified point in time after the failure. If your log file is
available after the failure, you can restore up to the last committed transaction.
The log marks feature allows you to place reference points in the transaction log so that you can
recover to a specific log mark.
This model also logs CREATE INDEX operations, so recovery from a transaction log backup that includes
index creation is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance and the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
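The principle behind log-based recovery can be sketched in a few lines of Python (this illustrates the idea of replaying only committed transactions; it is not SQL Server's actual log format):

```python
# Minimal sketch of log-based recovery: operations are written to a log
# before being applied, so after a crash the database can be rebuilt by
# replaying the log entries of committed transactions.
log = []          # the transaction log: (txn_id, key, value) records
db = {}

def commit(txn_id, writes):
    for key, value in writes.items():
        log.append((txn_id, key, value))   # write-ahead: log first
    log.append((txn_id, "COMMIT", None))   # commit record
    for key, value in writes.items():
        db[key] = value                    # then apply to the database

def recover(log):
    """Rebuild the database by replaying only committed transactions."""
    committed = {t for t, k, v in log if k == "COMMIT"}
    restored = {}
    for txn_id, key, value in log:
        if key != "COMMIT" and txn_id in committed:
            restored[key] = value
    return restored

commit("T1", {"A": 70, "B": 80})
db.clear()                                  # simulate a media failure
print(recover(log) == {"A": 70, "B": 80})   # True: restored to last commit
```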
(d)Describe Deadlocks a Distributed System
Ans
- Components of DBMS
- DBMS Three Level Architecture Diagram
- 1 External level
- 2 Conceptual level
- 3 Internal level
- 1 Logical Data Independence
- 2 Physical Data Independence
- Sequential File Organization
- Example
- Consequences of a Lack of Referential Integrity
- One-to-one (11)
- One-to-Many (1M)
- Many-to-Many (MN)
- Lock-based Protocols
-
- Simplistic Lock Protocol
- Pre-claiming Lock Protocol
- Two-Phase Locking 2PL
- Strict Two-Phase Locking
-
- Timestamp-based Protocols
- Internet as a knowledge base[edit]
-
- Ans Join Dependencies (JD)
-
Above three points are explain in detail given bellow-
External Level
This is the highest level, the one closest to the user. It is also called the user view. The user
view is different from the way data is stored in the database; this view describes only a part of
the actual database. Because each user is not concerned with the entire database, only the part that
is relevant to the user is visible. For example, end users and application programmers get
different external views.
Each user uses a language to carry out database operations. The application programmer
uses either a conventional third-generation language, such as COBOL or C, or a fourth-generation
language specific to the DBMS, such as Visual FoxPro or MS Access.
The end user uses a query language to access data from the database. A query language is a
combination of three subordinate languages:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
The data definition language defines and declares database objects, while the data
manipulation language performs operations on these objects. The data control language is used to
control the user's access to database objects.
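As a hedged illustration, the three sub-languages can be seen in action with Python's built-in sqlite3 module. SQLite is only a stand-in here: it supports DDL and DML, but it has no DCL such as GRANT/REVOKE, so that part appears only as a comment.

```python
import sqlite3

# DDL and DML demonstrated with the stdlib sqlite3 module.
conn = sqlite3.connect(":memory:")

# DDL: define (declare) a database object.
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")

# DML: perform operations on that object.
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
rows = conn.execute("SELECT name FROM student ORDER BY student_id").fetchall()
print([r[0] for r in rows])  # ['Asha', 'Ravi']

# DCL (not supported by SQLite) would look like:
#   GRANT SELECT ON student TO some_user;
```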
Conceptual Level - This level comes between the external and the internal levels. The
conceptual level represents the entire database as a whole and is used by the DBA. This level is
the view of the data "as it really is". The user's view of the data is constrained by the language
being used; at the conceptual level, the data is viewed without any of these constraints.
Internal Level - This level deals with the physical storage of data and is the lowest level of
the architecture. The internal level describes the physical sequence of the stored records.
This explains the objectives of the three-level architecture proposed for a DBMS.
(C) Describe the structure of DBMS
Ans: DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations (insert, delete, update, and
retrieve) on the database. The components of the DBMS perform these requested operations on the
database and provide the necessary data to the users.
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1. DDL Compiler - The Data Description Language compiler processes schema definitions specified
in the DDL. It stores metadata such as the names of the files, data items, storage
details of each file, mapping information, and constraints.
2. DML Compiler and Query Optimizer - DML commands such as insert, update, delete, and
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access. The object code is then optimized by the query optimizer into the best
way to execute the query, and then sent to the data manager.
3. Data Manager - The Data Manager is the central software component of the DBMS, also known
as the Database Control System.
The main functions of the Data Manager are:
• It converts operations in users' queries, coming from the application programs or from the
combination of DML compiler and query optimizer (together known as the query processor), from the
user's logical view to the physical file system.
• It controls access to the DBMS information that is stored on disk.
• It handles buffers in main memory.
• It enforces constraints to maintain the consistency and integrity of the data.
• It synchronizes the simultaneous operations performed by concurrent users.
• It controls the backup and recovery operations.
4. Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It
contains information about:
1. Data - names of the tables, names of the attributes of each table, lengths of attributes, and number of rows in each table.
2. Relationships between database transactions and the data items referenced by them,
which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e., the range of values permitted.
4. Detailed information on physical database design, such as storage structures,
access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities,
and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation,
and accuracy, and may be used as an important part of the DBMS.
Importance of Data Dictionary - the data dictionary is necessary in databases for the following reasons:
• It improves the DBA's control over the information system and the users'
understanding of the system.
• It helps in documenting the database design process by storing documentation of the results of every design phase and the design decisions.
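As a sketch of this idea, most DBMSs expose the data dictionary as a queryable system catalog. The catalog names below (sqlite_master, PRAGMA table_info) are SQLite-specific; SQLite is used only because Python ships with the sqlite3 module.

```python
import sqlite3

# Reading a DBMS's data dictionary (system catalog): metadata about the
# tables and attributes, not the data itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# Which tables exist?
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['course']

# What are the attributes of a table, and what are their declared types?
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(course)")]
print(columns)  # [('course_id', 'INTEGER'), ('title', 'TEXT')]
```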
5 Data Files - It contains the data portion of the database
6. Compiled DML - The DML compiler converts high-level queries into low-level file access
commands, known as compiled DML.
7. End Users - The users of the database system can be classified into the following groups,
depending on their degree of expertise or the mode of their interaction with the DBMS:
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naïve users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls into this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect only a precise portion of the database: in the case of the ATM user, only one or more of his or her own accounts. Other such naïve users are those for whom the type and range of responses is always indicated. Thus, even a very competent database designer could be allowed to use a particular database system only as a naïve user.
ii) Online Users: These are users who may communicate with the database directly via an online terminal, or indirectly via a user interface and application program. They are aware of the presence of the database system and may have acquired a certain amount of expertise within the limited interaction they are permitted with the database through the intermediate application program. The more sophisticated of these users may also use a data manipulation language to manipulate the database directly. Online users can also be naïve users requiring help, such as menus.
iii) Application Programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naïve and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view, or conceptual level, of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and the access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data that can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from use of the
data by an application program, i.e., program/data independence: the user, programmer, or
application specialist need not know the details of how the data are stored, because such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g., changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another (e.g., from optical to magnetic storage, or from tape to disk).
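A minimal sketch of this program/data independence, with purely illustrative class names: the application is written against an abstract insert/all interface, so the internal storage organization can change without the program changing.

```python
# Two different "physical" organizations behind one logical interface.
class ListStorage:
    def __init__(self):
        self._rows = []
    def insert(self, row):
        self._rows.append(row)
    def all(self):
        return list(self._rows)

class DictStorage:
    """A different internal organization: keyed slots instead of a list."""
    def __init__(self):
        self._rows = {}
    def insert(self, row):
        self._rows[len(self._rows)] = row
    def all(self):
        return list(self._rows.values())

def application(storage):
    """Application program: unaware of the physical organization."""
    storage.insert({"id": 1, "name": "Asha"})
    return storage.all()

# The same program runs unchanged against either storage structure.
print(application(ListStorage()) == application(DictStorage()))  # True
```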
Advantages:
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted in entering the same data again and again
• Needless use of computer resources
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. We therefore need to remove this duplication of
data across multiple files to eliminate inconsistency.
3. Better Service to the Users - A DBMS is often used to provide better services to the users. In a
conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency are likely to improve, since the data can now be shared and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Use of a DBMS
should also allow users who don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4. Flexibility of the System is Improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not change when the data in the
database changes.
5. Integrity can be Improved - Since the data of an organization using the database approach is
centralized and is used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6. Standards can be Enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7. Security can be Improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization access different
components of the operational data; in such an environment, enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to which parts of the database, and different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8. The Organization's Requirements can be Identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, the most important. Once a database has been set up with centralized control, it becomes
necessary to identify the organization's requirements and to balance the needs of the competing
units. It may become necessary to ignore some requests for information if they conflict with
higher-priority needs of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. Overall Cost of Developing and Maintaining Systems is Lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for similar services using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages
developed with DBMSs than when using procedural languages.
10. A Data Model must be Developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, files are more likely to be designed as the needs of particular
applications demand, and the overall view is often not considered. Building an overall view of the
organization's data is usually cost-effective in the long term.
11. Provides Backup and Recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures, and software errors,
which help the database recover from an inconsistent state to the state that existed
prior to the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. It is an
iterative, team-oriented process in which all business managers (or their designates)
should be involved, and it should be validated with a "bottom-up" approach. It has three primary
components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object, such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types:
Simple/Single attributes
Composite attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an "owns" relationship between a company and a computer, a "supervises" relationship between an employee and a department, a "performs" relationship
between an artist and a song, a "proved" relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many: 1 ------- M
Many-to-one: M ------- 1
Many-to-many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
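The entity above can be sketched in code. The field names follow the example; the concrete types and sample values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Name:                      # composite attribute
    first_name: str
    last_name: str
    middle_name: str = ""

@dataclass
class Address:                   # composite attribute
    city: str
    state: str
    zip_code: str
    street: str

@dataclass
class Customer:                  # entity; customer_id plays the primary-key role
    customer_id: int
    name: Name
    address: Address
    date_of_birth: str
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

c = Customer(1, Name("Asha", "Rao"),
             Address("Nagpur", "MH", "440001", "MG Road"),
             "1990-01-01", ["555-0100"])
print(c.name.first_name, c.customer_id)  # Asha 1
```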
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index sequential files, and direct files, we have considered the retrieval and
update of data based on a primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
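A small sketch of the idea, with hypothetical student records: a primary index maps a unique key to exactly one record, while a secondary index on stud_name maps each value to the set of matching records.

```python
from collections import defaultdict

students = [
    {"roll_no": 1, "stud_name": "Asha", "branch": "MCA"},
    {"roll_no": 2, "stud_name": "Ravi", "branch": "MCA"},
    {"roll_no": 3, "stud_name": "Asha", "branch": "MBA"},
]

# Primary index: unique key -> one record.
by_roll = {s["roll_no"]: s for s in students}

# Secondary index: non-unique key -> all matching records.
by_name = defaultdict(list)
for s in students:
    by_name[s["stud_name"]].append(s)

print(by_roll[2]["stud_name"])                   # Ravi
print([s["roll_no"] for s in by_name["Asha"]])   # [1, 3]
```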
(D) Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
if a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
be further non-loss-decomposed into any number of smaller tables.
Another way of expressing this is that every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
you always need to know two values (pairwise), and
for any one you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
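The decomposition can be checked mechanically on the sample data above: joining the three pairwise projections reconstructs the original table (this is the join dependency), and the "Claiborne starts to sell jeans" fact then takes a single inserted row, from which the join derives the per-buyer facts.

```python
# Sample Buying relation as a set of (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three pairwise projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

def rejoin(bv, bi, vi):
    """Natural join of the three projections on their common columns."""
    return {(b, v, i)
            for (b, v) in bv
            for (b2, i) in bi if b2 == b
            for (v2, i2) in vi if v2 == v and i2 == i}

lossless = rejoin(buyer_vendor, buyer_item, vendor_item) == buying
print(lossless)  # True: the join dependency holds for this data

# Claiborne starts to sell jeans: ONE row in Vendor-Item records it.
vendor_item.add(("Liz Claiborne", "Jeans"))
print(len(rejoin(buyer_vendor, buyer_item, vendor_item)))  # 7
```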
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product designed to support
both batch and online application programs.
Fig.: IMS architecture - application programs A and B, each written in a host language plus DL/I,
access the physical databases (defined by DBDs) through PCBs; the PCBs of each application are
grouped into its PSB (PSB-A, PSB-B) under the IMS control program.
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined by a database description (DBD). The mapping of the physical database to storage
is also defined in the DBD. The set of all DBDs corresponds to the conceptual schema plus the
associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library, from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1 DBD NAME=EDUCPDBD
2 SEGM NAME=COURSE, BYTES=256
3 FIELD NAME=(COURSE,SEQ), BYTES=3, START=1
4 FIELD NAME=TITLE, BYTES=33, START=4
5 FIELD NAME=DESCRIPN, BYTES=220, START=37
6 SEGM NAME=PREREQ, PARENT=COURSE, BYTES=36
7 FIELD NAME=(COURSE,SEQ), BYTES=3, START=1
8 FIELD NAME=TITLE, BYTES=33, START=4
9 SEGM NAME=OFFERING, PARENT=COURSE, BYTES=20
10 FIELD NAME=(DATE,SEQ,M), BYTES=6, START=1
11 FIELD NAME=LOCATION, BYTES=12, START=7
12 FIELD NAME=FORMAT, BYTES=2, START=19
13 SEGM NAME=TEACHER, PARENT=OFFERING, BYTES=24
14 FIELD NAME=(EMP,SEQ), BYTES=6, START=1
15 FIELD NAME=NAME, BYTES=18, START=7
16 SEGM NAME=STUDENT, PARENT=OFFERING, BYTES=25
17 FIELD NAME=(EMP,SEQ), BYTES=6, START=1
18 FIELD NAME=NAME, BYTES=18, START=7
19 FIELD NAME=GRADE, BYTES=1, START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB TYPE=DB, DBNAME=EDUCPDBD, KEYLEN=15
2 SENSEG NAME=COURSE, PROCOPT=G
3 SENSEG NAME=OFFERING, PARENT=COURSE, PROCOPT=G
4 SENSEG NAME=STUDENT, PARENT=OFFERING, PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key,
and no attribute in the key can be deleted without destroying the property of
unique identification.
The main characteristics of the functional dependencies used in normalization are that they
have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by the
functional dependencies in X.
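The defining property, that rows agreeing on the determinant must agree on the dependent attribute, can be sketched as a small checker; the relation and attribute names here are illustrative.

```python
def fd_holds(rows, lhs, rhs):
    """True iff the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same determinant value, different dependent value
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 3, "dept": "HR",    "dept_city": "Nagpur"},
]
print(fd_holds(emp, ["dept"], ["dept_city"]))    # True: dept -> dept_city
print(fd_holds(emp, ["dept_city"], ["emp_id"]))  # False: city does not determine emp_id
```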
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF;
we will pay particular attention up to 3NF. Database designers need not normalize to the highest
possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, where each
step corresponds to a specific normal form with known properties. As normalization proceeds,
relations become progressively more restricted (stronger) in format and also less vulnerable to
update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
(Determinant: an attribute on which some other attribute is fully functionally dependent.)
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are in fact functional dependencies. 4NF
removes the unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either:
there is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
One of these conditions must hold, and the relation must also be in BCNF. Fourth normal form
differs from BCNF only in that it considers multivalued dependencies.
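A multivalued dependency X →→ Y can be checked mechanically: for each X-value, every Y-value seen with it must pair with every value of the remaining attributes. The course/book/lecturer relation below is an illustrative example, not taken from the question paper.

```python
def mvd_holds(rows, x, y, z):
    """True iff the multivalued dependency x ->> y holds in R(x, y, z)."""
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        ys, zs, pairs = groups.setdefault(key, (set(), set(), set()))
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys.add(yv); zs.add(zv); pairs.add((yv, zv))
    # MVD holds iff, per X-value, the observed (Y, Z) pairs form a full cross product.
    return all(len(pairs) == len(ys) * len(zs) for ys, zs, pairs in groups.values())

teaches = [  # books and lecturers for a course vary independently
    {"course": "DBMS", "book": "Date",    "lecturer": "Rao"},
    {"course": "DBMS", "book": "Navathe", "lecturer": "Rao"},
    {"course": "DBMS", "book": "Date",    "lecturer": "Iyer"},
    {"course": "DBMS", "book": "Navathe", "lecturer": "Iyer"},
]
print(mvd_holds(teaches, ["course"], ["book"], ["lecturer"]))      # True
print(mvd_holds(teaches[:3], ["course"], ["book"], ["lecturer"]))  # False
```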
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
Most object databases offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could get a user's account information and
efficiently provide extensive information such as transactions and account entries.
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will
determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000,
which has a built-in feature, known as the database recovery model, that controls the following:
• the speed and size of your transaction log backups;
• the degree to which you might be at risk of losing committed transactions in the event of
media failure.
Models
There are three types of database recovery model available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified point in time can be achieved after a media failure has
occurred for a database file. If your log file is available after the failure, you can restore up to the last
committed transaction. The Log Marks feature allows you to place reference points in the transaction
log, so that you can recover to a log mark.
This model also logs CREATE INDEX operations, so recovery from a transaction log backup that includes
index creation is faster, because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
It allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
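On SQL Server 2000 and later, the recovery model is switched per database with ALTER DATABASE; the database name below is illustrative, so adapt it to your environment.

```sql
-- Choose one recovery model per database (SQL Server 2000+ syntax).
ALTER DATABASE SampleDB SET RECOVERY FULL;
ALTER DATABASE SampleDB SET RECOVERY BULK_LOGGED;
ALTER DATABASE SampleDB SET RECOVERY SIMPLE;
```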
(d)Describe Deadlocks a Distributed System
Ans
- Components of DBMS
- DBMS Three Level Architecture Diagram
- 1 External level
- 2 Conceptual level
- 3 Internal level
- 1 Logical Data Independence
- 2 Physical Data Independence
- Sequential File Organization
- Example
- Consequences of a Lack of Referential Integrity
- One-to-one (11)
- One-to-Many (1M)
- Many-to-Many (MN)
- Lock-based Protocols
-
- Simplistic Lock Protocol
- Pre-claiming Lock Protocol
- Two-Phase Locking 2PL
- Strict Two-Phase Locking
-
- Timestamp-based Protocols
- Internet as a knowledge base[edit]
-
- Ans Join Dependencies (JD)
-
Fig Structure of Database Management System
Components of DBMS -
DDL Compiler
Data Manager
File Manager
Disk Manager
Query Processor
Telecommunication System
Data Files
Data Dictionary
Access Aids
1 DDL Compiler - Data Description Language compiler processes schema definitions specified
in the DDL It includes metadata information such as the name of the files data items storage
details of each file mapping information and constraints etc
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS, also known as the Database Control System.
The main functions of the Data Manager are:
• Converting operations in users' queries, coming from the application programs or from the combination of DML compiler and query optimizer (together known as the Query Processor), from the user's logical view to the physical file system.
• Controlling access to the DBMS information that is stored on disk.
• Handling buffers in main memory.
• Enforcing constraints to maintain the consistency and integrity of the data.
• Synchronizing the simultaneous operations performed by concurrent users.
• Controlling the backup and recovery operations.
4 Data Dictionary - The Data Dictionary is a repository of descriptions of the data in the database. It contains information about:
1. Data - the names of the tables, the names of the attributes of each table, the lengths of attributes, and the number of rows in each table.
2. Relationships between database transactions and the data items referenced by them, which is useful in determining which transactions are affected when certain data definitions are changed.
3. Constraints on data, i.e. the range of values permitted.
4. Detailed information on physical database design, such as storage structure, access paths, and file and record sizes.
5. Access authorization - a description of database users, their responsibilities and their access rights.
6. Usage statistics, such as the frequency of queries and transactions.
The data dictionary is used to actually control data integrity, database operation and accuracy. It may be used as an important part of the DBMS.
Importance of Data Dictionary -
The data dictionary is necessary in databases due to the following reasons:
• It improves the control of the DBA over the information system and the users' understanding of the use of the system.
• It helps in documenting the database design process by storing documentation of the result of every design phase and of the design decisions.
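The shape of such a catalog can be sketched in a few lines of Python. This is purely an illustration of the idea (table names, attribute metadata, constraints and usage counts in one repository); the table name, attribute names and row count below are made up, and real DBMSs keep this information in internal system tables, not in an application-level dictionary.

```python
# A toy data dictionary: a catalog describing tables, their attributes,
# constraints and row counts. Illustrative only -- not a real DBMS catalog.
data_dictionary = {
    "student": {
        "attributes": {
            # attribute name -> (type, length)
            "stud_id":   ("INT", 4),
            "stud_name": ("VARCHAR", 30),
            "address":   ("VARCHAR", 100),
        },
        "row_count": 1250,
        "constraints": ["PRIMARY KEY (stud_id)"],
    },
}

def describe(table):
    """Return the attribute names of a table, as a DBA tool might."""
    return list(data_dictionary[table]["attributes"])

print(describe("student"))  # ['stud_id', 'stud_name', 'address']
```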
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML compiler converts the high-level queries into low-level file access commands known as compiled DML.
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naïve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naïve Users: Naive users need not be aware of the presence of the database system or any other system. A user of an automatic teller machine falls under this category. The user is instructed through each step of a transaction; he or she responds by pressing a coded key or entering a numeric value. The operations that can be performed by this class of users are very limited and affect a precise portion of the database; in the case of the automatic teller machine user, only one or more of his or her own accounts. Other naive users are those for whom the type and range of response is always indicated. Thus even a very competent database designer could be allowed to use a particular database system only as a naive user.
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Programmers: Professional programmers who are responsible for developing the application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, PASCAL or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans A database is a collection of non-redundant data which can be shared by different application
systems. This stresses the importance of multiple applications sharing data: the database
becomes a common resource for an agency. It implies separation of physical storage from the use of the
data by an application program, i.e. program/data independence: the user, programmer or
application specialist need not know the details of how the data are stored, as such details are
transparent to the user. Changes can be made to the data without affecting other components of the
system, e.g. changing the format of data items (real to integer arithmetic), changing the file
structure (reorganizing data internally or changing the mode of access), or relocating data from one
device to another, e.g. from optical to magnetic storage, or from tape to disk.
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system,
every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted in entering the same data again and again
• Needless use of computer resources
• Difficulty in combining information
2 Elimination of Inconsistency - In the file processing system, information is duplicated
throughout the system, so changes made in one file may need to be carried over to
another file. This may lead to inconsistent data. So we need to remove this duplication of
data in multiple files to eliminate inconsistency.
3 Better service to the users - A DBMS is often used to provide better services to the users. In
a conventional system, availability of information is often poor, since it is normally difficult to
obtain information that the existing systems were not designed for. Once several conventional
systems are combined to form one centralized database, the availability of information and its
currency are likely to improve, since the data can now be shared, and the DBMS makes it easy to
respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined
information that would have been impossible to obtain otherwise. Also, the use of a DBMS
should allow users that don't know programming to interact with the data more easily, unlike a
file processing system, where the programmer may need to write new programs to meet every
new demand.
4 Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the
data in the database changes.
5 Integrity can be improved - Since the data of an organization using the database approach is
centralized and used by a number of users at a time, it is essential to enforce
integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or
changes may sometimes lead to the entry of incorrect data in some of the files where it exists.
6 Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems.
7 Security can be improved - In conventional systems, applications are developed in an
ad hoc, temporary manner. Often different systems of an organization would access different
components of the operational data; in such an environment enforcing security can be quite
difficult. Setting up a database makes it easier to enforce security restrictions, since the data is
now centralized. It is easier to control who has access to what parts of the database. Different
checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece
of information in the database.
8 Organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own needs,
as the most important. Once a database has been
set up with centralized control, it becomes necessary to identify the organization's requirements and
to balance the needs of the competing units. So it may become necessary to ignore some
requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large,
one normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages that
have been developed with DBMSs than using procedural languages.
10 Data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems it is more likely that files will be designed as per the needs of particular
applications. The overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11 Provides backup and recovery - Centralizing a database provides schemes for
recovery and backup from failures, including disk crashes, power failures and software errors,
which may help the database recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Modelling is an
iterative, team-oriented process in which all business managers (or their designates) should be
involved, and the result should be validated with a "bottom-up" approach. The model has three primary
components: entities, relationships and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes An attribute is a characteristic of an entity. A Student's (entity) attributes: student ID, student name,
address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One-to-many: 1 <-------> M
Many-to-one: M <-------> 1
Many-to-many: M <-------> M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
where street is itself composite (street_name, street_number, apartment_number).
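The composite attributes in the example above can be mirrored directly as nested types. The following is a minimal Python sketch of the Customer entity (the sample values are invented for illustration); composite attributes such as name and address become nested dataclasses:

```python
from dataclasses import dataclass

# Composite attributes of the ER example become nested types.
@dataclass
class Name:
    first_name: str
    last_name: str
    middle_name: str = ""

@dataclass
class Address:
    city: str
    state: str
    zip_code: str
    street: str

@dataclass
class Customer:
    customer_id: int          # primary key
    name: Name                # composite attribute
    phone_number: str
    date_of_birth: str
    address: Address          # composite attribute

c = Customer(1, Name("Asha", "Rao"), "555-0101", "1990-01-01",
             Address("Nagpur", "MH", "440001", "12 Main St"))
print(c.name.first_name)  # Asha
```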
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans In the sequential file, index sequential file and direct file organizations we have considered retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
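The construction can be sketched in Python (the student data below is invented): records are stored by primary key, and a secondary index on the non-unique attribute maps each value to a list of primary keys, since several records may match one secondary-key value.

```python
# Records stored by primary key (stud_id).
records = {
    1: {"stud_name": "Asha", "city": "Nagpur"},
    2: {"stud_name": "Ravi", "city": "Pune"},
    3: {"stud_name": "Asha", "city": "Mumbai"},
}

# Secondary index on stud_name: each name maps to a LIST of primary keys,
# because a secondary key value need not be unique.
secondary_index = {}
for pk, rec in records.items():
    secondary_index.setdefault(rec["stud_name"], []).append(pk)

def lookup_by_name(name):
    """Secondary key retrieval: return all records matching the name."""
    return [records[pk] for pk in secondary_index.get(name, [])]

print([r["city"] for r in lookup_by_name("Asha")])  # ['Nagpur', 'Mumbai']
```

Note that, unlike a primary-key lookup, the result is a set of records rather than a single record.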
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows:
If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot
be non-loss decomposed any further into smaller tables.
Another way of expressing this is that each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise)
For any one you must know the other two (cyclical)
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer | vendor        | item
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor; to determine the vendor you must know the buyer and
the item; and to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
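The decomposition can be checked mechanically. The sketch below (in Python, using sets of tuples as relations) projects the Buying table onto its three pairwise projections and natural-joins them back; for this sample data the three-way join reproduces exactly the original rows, which is the join dependency on which 5NF rests:

```python
# The Buying relation from the example, as a set of (buyer, vendor, item) rows.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three pairwise projections (the three smaller tables).
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of all three projections on their shared columns.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (v2, i) in vendor_item if v2 == v
    if (b, i) in buyer_item
}

print(rejoined == buying)  # True: the three-way join is lossless here
```

Joining only two of the projections would produce spurious tuples (e.g. Mary-Jordach-Sneakers); it is the third projection that filters them out.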
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
[Figure: IMS system structure - application programs A and B, written in a host language with DL/I calls, each operate through a PSB made up of PCBs; the IMS control program maps these external views onto the physical databases defined by DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined, together with its mapping to storage, by a database description (DBD). The set of
all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description) Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
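The hierarchy that this DBD defines can be pictured as a tree of segments. The following Python sketch represents that tree as nested dictionaries (the field-name spellings are simplified from the listing above, and the traversal helper is our own illustration, not anything IMS provides):

```python
# The segment hierarchy defined by EDUCPDBD, as nested dicts:
# each segment lists its fields and its child segments.
educpdbd = {
    "COURSE": {
        "fields": ["COURSE", "TITLE", "DESCRIPN"],
        "children": {
            "PREREQ": {"fields": ["COURSE", "TITLE"], "children": {}},
            "OFFERING": {
                "fields": ["DATE", "LOCATION", "FORMAT"],
                "children": {
                    "TEACHER": {"fields": ["EMP", "NAME"], "children": {}},
                    "STUDENT": {"fields": ["EMP", "NAME", "GRADE"], "children": {}},
                },
            },
        },
    },
}

def segments(tree):
    """List all segment names depth-first, parents before children."""
    out = []
    for name, node in tree.items():
        out.append(name)
        out.extend(segments(node["children"]))
    return out

print(segments(educpdbd))
# ['COURSE', 'PREREQ', 'OFFERING', 'TEACHER', 'STUDENT']
```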
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block) Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block) The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are
supported via user-written on-line application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency The value of one attribute (the determinant)
determines the value of another attribute
Candidate Key A possible key
Each non-key field is functionally dependent on every candidate key
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of functional dependencies used in
normalization:
they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of
the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very
large.
It is important to find an approach that can reduce that set to a manageable size.
We need to identify a set of functional dependencies (X) for a relation that is
smaller than the complete set of functional dependencies (Y) for that relation,
and that has the property that every functional dependency in Y is implied by
the functional dependencies in X.
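Whether a given functional dependency holds in a relation instance can be tested directly from the definition: equal determinant values must imply equal dependent values. A small Python sketch (with invented employee data) makes this concrete:

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in `rows` (a list of dicts):
    rows that agree on the lhs attributes must agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # same determinant, different dependent value
    return True

emp = [
    {"emp_id": 1, "dept": "HR", "dept_loc": "Delhi"},
    {"emp_id": 2, "dept": "IT", "dept_loc": "Pune"},
    {"emp_id": 3, "dept": "HR", "dept_loc": "Delhi"},
]

print(fd_holds(emp, ["dept"], ["dept_loc"]))    # True: dept determines dept_loc
print(fd_holds(emp, ["dept_loc"], ["emp_id"]))  # False: Delhi maps to two ids
```

Note that a test like this can only refute an FD from sample data; that an FD holds "for all time" is a statement about the semantics of the relation, not about one instance.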
(D) Explain 4 NF with examples
Ans Normalization is the process of decomposing unsatisfactory "bad" relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multi-valued dependencies are functional dependencies. 4NF
removes unwanted data structures: multi-valued dependencies.
Either of these conditions must hold in order for a relation to be in fourth normal form:
There is no multivalued dependency in the relation, or
there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also considers
multivalued dependencies.
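A classic illustration (the course/teacher/book data below is hypothetical) is a relation in which a course's teachers are independent of its books, so the MVD course ->> teacher holds and the relation is not in 4NF. Decomposing on the MVD is lossless, as this Python sketch checks:

```python
# Not in 4NF: teachers and books of a course are independent of each other,
# so every teacher must be paired with every book (redundancy).
course_teacher_book = {
    ("DB", "Rao",  "Date"),
    ("DB", "Rao",  "Elmasri"),
    ("DB", "Iyer", "Date"),
    ("DB", "Iyer", "Elmasri"),
}

# Decompose on the MVD course ->> teacher (and, symmetrically, course ->> book).
course_teacher = {(c, t) for c, t, b in course_teacher_book}
course_book    = {(c, b) for c, t, b in course_teacher_book}

# The decomposition is lossless: joining the two projections on `course`
# reconstructs the original relation exactly.
rejoined = {(c, t, b)
            for (c, t) in course_teacher
            for (c2, b) in course_book if c2 == c}

print(rejoined == course_teacher_book)  # True
```

After the decomposition, adding a third book means inserting one row into course_book instead of one row per teacher in the original table.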
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
(C) How is database recovery done? Discuss its different types.
Ans SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available:
Full Recovery
Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
The Log Marks feature allows you to place reference points in the transaction log that allow you to
recover to a log mark.
This model logs CREATE INDEX operations, so recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model,
SQL Server truncates the transaction log at regular intervals, removing committed transactions.
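What "truncating the log" means can be illustrated with a toy sketch in Python (the log records and transaction numbers are invented; this is a simplified model of the idea, not of SQL Server's actual log format): records of committed transactions are discarded, so only active work remains in the log.

```python
# A toy transaction log: each record notes its transaction and commit status.
log = [
    {"txn": 1, "op": "UPDATE accounts", "committed": True},
    {"txn": 2, "op": "INSERT orders",   "committed": False},
    {"txn": 3, "op": "DELETE temp",     "committed": True},
]

def truncate(log):
    """Simple-recovery-style truncation: keep only records of transactions
    that have not yet committed; committed work no longer needs the log."""
    return [rec for rec in log if not rec["committed"]]

log = truncate(log)
print([rec["txn"] for rec in log])  # [2] -- only the active transaction remains
```

The trade-off the text describes follows directly: the log stays small, but once committed records are discarded you cannot replay them, so point-in-time restore after media failure is lost.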
(d)Describe Deadlocks a Distributed System
Ans
- Components of DBMS
- DBMS Three Level Architecture Diagram
- 1 External level
- 2 Conceptual level
- 3 Internal level
- 1 Logical Data Independence
- 2 Physical Data Independence
- Sequential File Organization
- Example
- Consequences of a Lack of Referential Integrity
- One-to-one (11)
- One-to-Many (1M)
- Many-to-Many (MN)
- Lock-based Protocols
-
- Simplistic Lock Protocol
- Pre-claiming Lock Protocol
- Two-Phase Locking 2PL
- Strict Two-Phase Locking
-
- Timestamp-based Protocols
- Internet as a knowledge base[edit]
-
- Ans Join Dependencies (JD)
-
2 DML Compiler and Query optimizer - The DML commands such as insert update delete
retrieve from the application program are sent to the DML compiler for compilation into object
code for database access The object code is then optimized in the best way to execute a query by
the query optimizer and then send to the data manager
3 Data Manager - The Data Manager is the central software component of the DBMS also knows
as Database Control System
The Main Functions Of Data Manager Are ndash
Convert operations in users Queries coming from the application programs or combination of
DML Compiler and Query optimizer which is known as Query Processor from users logical view
to physical file system
Controls DBMS information access that is stored on disk
It also controls handling buffers in main memory
It also enforces constraints to maintain consistency and integrity of the data
It also synchronizes the simultaneous operations performed by the concurrent users
It also controls the backup and recovery operations
4 Data Dictionary - Data Dictionary is a repository of description of data in the database It
contains information about
1 Data - names of the tables names of attributes of each table length of attributes and number of rows in each table
2 Relationships between database transactions and data items referenced by them
which is useful in determining which transactions are affected when certain data definitions are changed
3 Constraints on data ie range of values permitted
4 Detailed information on physical database design such as storage structure
access paths files and record sizes 5 Access Authorization - is the Description of database users their responsibilities
and their access rights
6 Usage statistics such as frequency of query and transactions 7 Data dictionary is used to actually control the data integrity database operation
and accuracy It may be used as a important part of the DBMS
8 Importance of Data Dictionary -
9 Data Dictionary is necessary in the databases due to following reasons 10 It improves the control of DBA over the information system and users
understanding of use of the system
11 bull It helps in document ting the database design process by storing documentation of the result of every design phase and design decisions
5 Data Files - It contains the data portion of the database
6 Compiled DML - The DML complier converts the high level Queries into low level file access
commands known as compiled DML
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naiumlve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naiumlve User Naive users who need not have aware of the present of the database system or any other system A user of an automatic teller falls under this category The user is instructed through each step of a transaction he or she responds by pressing a coded key or entering a numeric value The operations that can be performed by this calls of users are very limited and affect a precise portion of the database in case of the user of the automatic teller machine only one or more of her or his own accounts Other such naive users are where the type and range of response is always indicated to the user Thus a very competent database designer could be allowed to use a particular database system only as a naive user
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Programmers: Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category. The application programs could be written in a general-purpose programming language such as Assembler, C, COBOL, FORTRAN, Pascal, or PL/I, and include the commands required to manipulate the database.
iv) Database Administrator: Centralized control of the database is exerted by a person or group of persons under the supervision of a high-level administrator. This person or group is referred to as the database administrator (DBA). They are the users who are most familiar with the database and are responsible for creating, modifying, and maintaining its three levels.
The DBA is the custodian of the data and controls the database structure. The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications, and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS.
(D) What are the advantages of using a DBMS over the conventional
file processing system?
Ans: A database is a collection of non-redundant data which can be shared by different application systems. This stresses the importance of multiple applications sharing data: the database becomes a common resource for an agency. It implies separation of physical storage from the use of the data by an application program, i.e. program/data independence: the user, programmer, or application specialist need not know the details of how the data are stored; such details are transparent to the user. Changes can be made to data without affecting other components of the system, e.g. changing the format of data items (real to integer arithmetic), changing the file structure (reorganizing data internally or changing the mode of access), or relocating data from one device to another (e.g. from optical to magnetic storage, or from tape to disk).
Advantages
1. Control of data redundancy
2. Data consistency
3. More information from the same amount of data
4. Sharing of data
5. Improved data integrity
6. Improved security
7. Enforcement of standards
8. Economy of scale
1. Controlling Data Redundancy - In the conventional file processing system, every user group maintains its own files for handling its data. This may lead to:
• Duplication of the same data in different files
• Wastage of storage space, since duplicated data is stored
• Errors generated due to duplication of the same data in different files
• Time wasted entering the same data again and again
• Computer resources needlessly used
• Difficulty in combining information
2. Elimination of Inconsistency - In the file processing system, information is duplicated throughout the system, so changes made in one file may need to be carried over to another file. This may lead to inconsistent data. We therefore need to remove this duplication of data in multiple files to eliminate inconsistency.
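The inconsistency risk described above can be sketched in a few lines; the file and field names here are purely illustrative, not from the source.

```python
# A small sketch of the inconsistency risk: the same address duplicated in
# two application files can drift apart after an update reaches only one copy.
payroll_file  = {"emp_id": 7, "address": "12 MG Road"}
benefits_file = {"emp_id": 7, "address": "12 MG Road"}

payroll_file["address"] = "45 FC Road"          # update reaches one file only
inconsistent = payroll_file["address"] != benefits_file["address"]

# With a centralized database, both applications read one shared record,
# so a single update is seen everywhere.
shared = {"emp_id": 7, "address": "12 MG Road"}
shared["address"] = "45 FC Road"
print(inconsistent, shared["address"])
```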
3. Better service to the users - A DBMS is often used to provide better services to the users. In a conventional system, availability of information is often poor, since it is normally difficult to obtain information that the existing systems were not designed for. Once several conventional systems are combined to form one centralized database, the availability of information and its up-to-dateness are likely to improve, since the data can now be shared and the DBMS makes it easy to respond to unanticipated information requests.
Centralizing the data in the database also means that users can easily obtain new and combined information that would have been impossible to obtain otherwise. Also, use of a DBMS should allow users who don't know programming to interact with the data more easily, unlike a file processing system where the programmer may need to write new programs to meet every new demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of the data stored in any system, these changes are made more easily in a centralized database than in a conventional system. Application programs need not be changed when the data in the database changes.
5. Integrity can be improved - Since data of an organization using the database approach is centralized and is used by a number of users at a time, it is essential to enforce integrity constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes may sometimes lead to entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS, standards are easier to enforce. Standards may relate to the naming of data, the format of data, the structure of the data, etc. Standardizing stored data formats is usually desirable for the purposes of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc, temporary manner. Often different systems of an organization would access different components of the operational data; in such an environment enforcing security can be quite difficult. Setting up a database makes it easier to enforce security restrictions, since the data is now centralized. It is easier to control who has access to what parts of the database, and different checks can be established for each type of access (retrieve, modify, delete, etc.) to each piece of information in the database.
8. Organization's requirements can be identified - All organizations have sections and departments, and each of these units often considers its own work, and therefore its own needs, the most important. Once a database has been set up with centralized control, it becomes necessary to identify the organization's requirements and to balance the needs of the competing units. So it may become necessary to ignore some requests for information if they conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9. Overall cost of developing and maintaining systems is lower - It is much easier to respond to unanticipated requests when data is centralized in a database than when it is stored in a conventional file system. Although the initial cost of setting up a database can be large, one normally expects the overall cost of setting up the database and developing and maintaining application programs to be far lower than for a similar service using conventional systems, since the productivity of programmers can be higher using the non-procedural languages developed with DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a database system is the requirement that an overall data model for the organization be built. In conventional systems, it is more likely that files will be designed as the needs of particular applications demand, and the overall view is often not considered. Building an overall view of an organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database allows schemes for backup and recovery from failures, including disk crashes, power failures, and software errors, which may help the database recover from an inconsistent state to the state that existed prior to the occurrence of the failure, though the methods involved are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is used in a real-world enterprise. Modelling is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the result should be validated with a "bottom-up" approach. The model has three primary components: entity, relationship, and attributes.
Many notation methods exist; Chen's was the first to become established.
The building blocks of E-R model are entities relationships and attributes
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity, we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID, student name, address, etc.
Attributes are of various types:
Simple/Single attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many: 1 ------- M
Many-to-one: M ------- 1
Many-to-many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
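As a sketch, the Customer entity above can be translated into relational DDL. The composite attributes (name, address, street) are flattened into simple columns, and the multivalued phone_number gets its own table; the table/column names follow the example, while the sample rows are invented for illustration.

```python
import sqlite3

# Translate the Customer entity into SQL DDL (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    first_name    TEXT,
    middle_name   TEXT,
    last_name     TEXT,
    date_of_birth TEXT,
    city          TEXT,
    state         TEXT,
    zip_code      TEXT,
    street_name   TEXT,
    street_number TEXT,
    apartment_number TEXT
);
-- the multivalued attribute phone_number becomes a separate table
CREATE TABLE customer_phone (
    customer_id  INTEGER REFERENCES customer(customer_id),
    phone_number TEXT
);
""")
conn.execute("INSERT INTO customer (customer_id, first_name, last_name) "
             "VALUES (1, 'Asha', 'Rao')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0100')")
conn.execute("INSERT INTO customer_phone VALUES (1, '555-0101')")

# One customer, several phone numbers: the multivalued attribute in action.
phones = [r[0] for r in conn.execute(
    "SELECT phone_number FROM customer_phone WHERE customer_id = 1")]
print(phones)
```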
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files, and direct files, we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of records which satisfy the given value.
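The "stud_name" example above can be sketched as follows: the file is keyed on a primary key (stud_id), and a secondary index on stud_name maps each name to the keys of all matching records. The record values are invented for illustration.

```python
# Records keyed on the primary key stud_id.
records = {
    1: {"stud_id": 1, "stud_name": "Ravi",  "dept": "MCA"},
    2: {"stud_id": 2, "stud_name": "Priya", "dept": "MBA"},
    3: {"stud_id": 3, "stud_name": "Ravi",  "dept": "MSc"},
}

# Build a secondary index: names are not unique, so each name maps
# to a list of primary keys.
secondary_index = {}
for key, rec in records.items():
    secondary_index.setdefault(rec["stud_name"], []).append(key)

# A secondary-key lookup can return several records for one key value.
matches = [records[k] for k in secondary_index.get("Ravi", [])]
print(len(matches))  # 2
```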
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation

QUE 3-EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
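Although QBE itself is a graphical language, the semantics of the three queries can be sketched with Python sets, since relations are sets of tuples over R(A, B, C); the sample tuples are invented.

```python
# Two relations on schema R(A, B, C), modelled as sets of tuples.
r1 = {(1, "a", "x"), (2, "b", "y"), (3, "c", "z")}
r2 = {(2, "b", "y"), (4, "d", "w")}

union        = r1 | r2   # tuples in r1 or r2
intersection = r1 & r2   # tuples in both relations
difference   = r1 - r2   # tuples in r1 but not in r2

print(len(union), len(intersection), len(difference))  # 4 1 2
```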
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot have a non-trivial lossless decomposition into any number of smaller tables. Another way of expressing this is that each join dependency is a consequence of the candidate keys. It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- you always need to know two values (pairwise);
- for any one value, you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer   vendor          item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and finally, to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
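The decomposition into Buyer-Vendor, Buyer-Item, and Vendor-Item can be sketched directly on the sample data above; for this data, the three-way natural join recovers the original table exactly, which is what the join dependency requires.

```python
# The Buying table from the example, as a set of (buyer, vendor, item) tuples.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# Project onto the three pairwise tables of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections on their shared attributes.
rejoined = {
    (b, v, i)
    for b, v in buyer_vendor
    for v2, i in vendor_item if v == v2
    if (b, i) in buyer_item
}

print(rejoined == buying)  # the decomposition is lossless for this data
```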
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Diagram: IMS system architecture. Application A and Application B are each written in a host language with embedded DL/I calls. Each application runs against its own PSB (PSB-A, PSB-B), which consists of PCBs; the IMS control program uses the PCBs together with the DBDs to map requests onto the stored physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD), which also defines the mapping of the physical database to storage. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All names of DBDs in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called a program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of functional dependencies used in normalization are that they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency, hold for all time, and are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
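The definition above can be checked mechanically on concrete data: X → Y holds in a relation if every value of X determines exactly one value of Y. The sample relation and attribute names below are invented for illustration.

```python
# Check whether the functional dependency X -> Y holds in a relation
# (a list of row dictionaries): each X-value must map to one Y-value.
def fd_holds(rows, x, y):
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        # setdefault records the first Y-value seen for this X-value;
        # any later, different Y-value violates the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

students = [
    {"stud_id": 1, "stud_name": "Ravi",  "dept": "MCA"},
    {"stud_id": 2, "stud_name": "Priya", "dept": "MBA"},
    {"stud_id": 3, "stud_name": "Ravi",  "dept": "MSc"},
]

print(fd_holds(students, ["stud_id"], ["stud_name"]))  # True
print(fd_holds(students, ["stud_name"], ["dept"]))     # False: Ravi -> MCA, MSc
```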
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory, "bad" relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF; here we pay particular attention up to 3NF. The database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form which has known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF²: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent

Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in BCNF and its multi-valued dependencies are functional dependencies. 4NF removes unwanted data structures: multi-valued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
- there is no multivalued dependency in the relation; or
- there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses multivalued dependencies.
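A small sketch of a 4NF decomposition: suppose a course's teachers and its textbooks are independent multivalued facts (course ↠ teacher and course ↠ book), so the single table must hold every teacher-book combination. Splitting it removes the redundancy, and rejoining on course is lossless. The course, teacher, and book names are invented.

```python
# One row per (course, teacher, book) combination: the MVDs force a
# cross product of teachers and books for each course.
course_info = {
    ("DBMS", "Dr. Rao",   "Korth"),
    ("DBMS", "Dr. Rao",   "Navathe"),
    ("DBMS", "Dr. Mehta", "Korth"),
    ("DBMS", "Dr. Mehta", "Navathe"),
}

# 4NF decomposition: one table per independent multivalued fact.
course_teacher = {(c, t) for c, t, b in course_info}
course_book    = {(c, b) for c, t, b in course_info}

# Rejoining on course recovers the original rows, so the split is lossless.
rejoined = {(c, t, b)
            for c, t in course_teacher
            for c2, b in course_book if c == c2}
print(rejoined == course_info)  # True
```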
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions, account entries, etc.
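The "no joins needed" point above can be sketched as pointer following: the account objects are reached directly through references held by the customer object, with no join over a separate table. The class and field names are illustrative, not from any particular object database product.

```python
# Navigational access in an object database, sketched with plain objects:
# a Customer holds direct references to its Account objects.
class Account:
    def __init__(self, number, balance):
        self.number, self.balance = number, balance

class Customer:
    def __init__(self, name):
        self.name, self.accounts = name, []

cust = Customer("Asha")
cust.accounts.append(Account("A-100", 2500))
cust.accounts.append(Account("A-101", 400))

# Follow the references directly: no key lookup or join is needed.
total = sum(a.balance for a in cust.accounts)
print(total)  # 2900
```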
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will determine how much time and space your backups will take, and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL Server database recovery can be more easily achieved if you are running at least SQL Server 2000. It has a built-in feature known as the database recovery model that controls the following:
- the speed and size of your transaction log backups;
- the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- Database restoration up to any specified time can be achieved after media failure for a database file has occurred. If your log file is available after the failure, you can restore up to the last committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log that let you recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes index creations is faster because the index does not have to be rebuilt.
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
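The key difference between the models can be illustrated with a toy sketch (this is an analogy, not SQL Server's actual log format): under the full model every record is retained for point-in-time restore, while under the simple model committed work is truncated at each checkpoint.

```python
# A toy transaction log: (transaction, state) pairs.
log = [("T1", "committed"), ("T2", "committed"), ("T3", "active")]

def checkpoint(log, model):
    if model == "simple":
        # Committed work is truncated; only active transactions remain,
        # so point-in-time restore of committed work is no longer possible.
        return [rec for rec in log if rec[1] == "active"]
    # Full model: everything is retained until the log is backed up.
    return list(log)

print(len(checkpoint(log, "full")), len(checkpoint(log, "simple")))  # 3 1
```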
(d) Describe deadlocks in a distributed system.
Ans
7 End Users The users of the database system can be classified in the following groups
depending on their degree of expertise or the mode of their interactions with the DBMS
1 Naiumlve users
2 Online Users
3 Application Programmers
4 Database administrator
i) Naiumlve User Naive users who need not have aware of the present of the database system or any other system A user of an automatic teller falls under this category The user is instructed through each step of a transaction he or she responds by pressing a coded key or entering a numeric value The operations that can be performed by this calls of users are very limited and affect a precise portion of the database in case of the user of the automatic teller machine only one or more of her or his own accounts Other such naive users are where the type and range of response is always indicated to the user Thus a very competent database designer could be allowed to use a particular database system only as a naive user
ii) Online users There are users who may communicate with the database directly via an online terminal or indirectly via a user interface and application program These users are aware of the presence of the database system and may have acquired a certain amount of expertise in the limited interaction they are permitted with the database through the intermediate application program The more sophisticated of these users may also use a data manipulation language to manipulate the database directly On-line users can also be naive users requiring help such as menus
iii) Application Users Professional programmers who are responsible for developing application programs or user interfaces utilized by the naive and online users fall into this category The application programs could be written in a general purpose programming language such as Assembler C COBOL FORTRAN PASCAL or PLI and include the commands required to manipulate the database
iv) Database Administrator Centralized control of the database is exerted by a person or group of persons under the supervision of a high level administrator This person or group is referred to as the database administrator (DBA) They are users who are the most familiar with the database and are responsible for creating modifying and maintaining its three levels
The DBA us the custodian of the data and controls the database structure The DBA administers the three levels of the database and in consultation with the overall user community sets up the definition of the global view or conceptual level of the database The DBA further specifies the external view of the various users and applications and is responsible for definition and implementation of the internal level including the storage structure and access methods to be used for the optimum performance of the DBMS
(D) What are the advantage o f using a DBMS over the conventional
fole processing system
Ans A database is a collection of non-redundant data which can be shared by different application
systems stresses the importance of multiple applications data sharing the spatial database
becomes a common resource for an agency implies separation of physical storage from use of the
data by an application program ie programdata independence the user or programmer or
application specialist need not know the details of how the data are stored such details are
transparent to the user changes can be made to data without affecting other components of the
system eg change format of data items (real to integer arithmetic operations) change file
structure (reorganize data internally or change mode of access) relocate from one device to
another eg from optical to magnetic storage from tape to disk
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data 4 Sharing of data
5 Improved data integrity
6 Improved security 7 Enforcement of standards 8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system
Every user group maintains its own files for handling its data files This may lead to
bull Duplication of same data in different files
bull Wastage of storage space since duplicated data is stored
bull Errors may be generated due to pupation of the same data in different files
bull Time in entering data again and again is wasted
bull Computer Resources are needlessly used
bull It is very difficult to combine information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system So changes made in one file may be necessary be carried over to
another file This may lead to inconsistent data So we need to remove this duplication of
data in multiple file to eliminate inconsistency
3 Better service to the users - A DBMS is often used to provide better services to the users In
conventional system availability of information is often poor since it normally difficult to
obtain information that the existing systems were not designed for Once several conventional
systems are combined to form one centralized database the availability of information and its
update ness is likely to improve since the data can now be shared and DBMS makes it easy to
respond to anticipated information requests
Centralizing the data in the database also means that user can obtain new and combined
information easily that would have been impossible to obtain otherwise Also use of DBMS
should allow users that dont know programming to interact with the data more easily unlike
file processing system where the programmer may need to write new programs to meet every
new demand
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system these changes are made more easily in a centralized database
than in a conventional system Applications programs need not to be changed on changing the
data in the database
5 Integrity can be improved - Since data of the organization using database approach is
centralized and would be used by a number of users at a time It is essential to enforce
integrity-constraints
In the conventional systems because the data is duplicated in multiple files so updating or
changes may sometimes lead to entry of incorrect data in some files where it exists
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems applications are developed in an
adhoctemporary manner Often different system of an organization would access different
components of the operational data in such an environment enforcing security can be quiet
difficult Setting up of a database makes it easier to enforce security restrictions since data is
now centralized It is easier to control who has access to what parts of the database Different
checks can be established for each type of access (retrieve modify delete etc) to each piece
of information in the database
8. The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, as the most important. Once a database has been set up with centralized control, it
becomes necessary to identify the organization's requirements and to balance the needs of the
competing units. So it may become necessary to ignore some requests for information if they
conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. The overall cost of developing and maintaining systems is lower - It is much easier to respond
to unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, one
normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher when using the non-procedural languages
that have been developed with DBMSs than when using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand. The overall view is often not considered. Building an overall view of the
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as
recovery and backup from failures, including disk crashes, power failures, and software errors,
which may help the database to recover from an inconsistent state to the state that existed
prior to the occurrence of the failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Modeling
with it is an iterative, team-oriented process in which all business managers (or their
designates) are involved, and the result should be validated with a "bottom-up" approach. The
model has three primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of some
domain. When we speak of an entity, we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world. An entity may be a physical object such as a house or a
car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are
usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student (entity) has attributes such as student ID,
student name, address, etc.
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can
be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and
a computer, a supervises relationship between an employee and a department, a performs relationship
between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships
are represented as diamonds connected by lines to each of the entities in the relationship. The types of
relationships are as follows:
One to many: 1 <------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example:
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, address (city, state, zip_code, street),
street (street_name, street_number, apartment_number).
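As a worked sketch, the Customer entity above can be mapped onto a single relation by flattening its composite attributes (name, address) into their simple components. The table and column names below just follow the example; inlining all parts into one table is one illustrative choice, not the only correct mapping.

```python
import sqlite3

# In-memory database; the Customer entity flattened into one relation.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        middle_name      TEXT,
        last_name        TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name, city) "
    "VALUES (1, 'Sally', 'Jones', 'Nagpur')"
)
row = conn.execute("SELECT first_name, city FROM customer").fetchone()
print(row)  # ('Sally', 'Nagpur')
```

In a fuller design, a multivalued attribute such as phone_number would normally be moved to its own table keyed by customer_id.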
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: For sequential files, index-sequential files, and direct files, we have considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file based on the attribute "stud_name", we can get the set of
records which satisfy the given value.
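A minimal sketch of the idea, with an in-memory student file and invented records: a secondary index on stud_name maps each name to all matching records, so one key value can return several records (unlike a primary key, which identifies exactly one).

```python
# Invented student records; roll_no is the primary key, stud_name is not unique.
students = [
    {"roll_no": 101, "stud_name": "Asha"},
    {"roll_no": 102, "stud_name": "Ravi"},
    {"roll_no": 103, "stud_name": "Asha"},
]

# Secondary index: stud_name -> list of matching records.
index = {}
for rec in students:
    index.setdefault(rec["stud_name"], []).append(rec)

# One secondary-key value retrieves a set of records.
matches = [r["roll_no"] for r in index["Asha"]]
print(matches)  # [101, 103]
```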
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3- EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined
again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot
have a further lossless decomposition into any number of smaller tables.
Another way of expressing this is: each join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key
comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
You always need to know two values (pairwise).
For any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:
Buyer Vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to
record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine
the item you must know the buyer and vendor, to determine the vendor you must know the buyer and
the item, and finally, to know the buyer you must know the vendor and the item. The solution is to break
this one table into three tables: Buyer-Vendor, Buyer-Item, and Vendor-Item.
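The decomposition can be sketched in a few lines. The sets below hold the sample rows from the example, and the natural join of the three two-column projections recovers exactly the original table, which is the join dependency that 5NF relies on:

```python
# The Buying table from the example, as a set of (buyer, vendor, item) rows.
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach", "Jeans"),
    ("Mary",  "Jordach", "Jeans"),
    ("Sally", "Jordach", "Sneakers"),
}

# The three two-column projections: Buyer-Vendor, Buyer-Item, Vendor-Item.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections: keep a (buyer, vendor, item) triple
# only if all three of its pairs appear in the projections.
rejoined = {
    (b, v, i)
    for b, v in buyer_vendor
    for b2, i in buyer_item if b2 == b
    if (v, i) in vendor_item
}
print(rejoined == buying)  # True: the three-way decomposition is lossless
```

Note that a spurious row such as (Mary, Jordach, Sneakers) is filtered out, because (Mary, Sneakers) never appears in the Buyer-Item projection.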
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product that is designed to support
both batch and online application programs.
[Architecture diagram: Application A and Application B, each written in a host language + DL/I, access the
IMS control program through their own PSBs (PSB-A, PSB-B), each PSB being a set of PCBs; the IMS
control program in turn uses the DBDs that define the physical databases.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat
misleading in this context, since the user does not see such a database exactly as it is stored; indeed,
IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical
database is defined, together with its mapping to storage, by a database description (DBD). The set of
all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping
definition.
DBD (Database Description): Each physical database is defined, together with its mapping to
storage, by a database description (DBD). The source form of the DBD is written using special
System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called a program specification block
(PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the LDB and
the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on the segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are
supported via user-written online application programs. IMS does not provide an integrated query
language.
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of
another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate
key, and no attribute in the key can be deleted without destroying the property of unique
identification.
Main characteristics of functional dependencies used in normalization:
There is a 1:1 relationship between the attribute(s) on the left- and right-hand sides of a
dependency; they hold for all time; and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is
important to find an approach that can reduce the set to a manageable size. We need to identify a
set of functional dependencies (X) for a relation that is smaller than the complete set of functional
dependencies (Y) for that relation, and that has the property that every functional dependency in Y
is implied by the functional dependencies in X.
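As an illustration, a functional dependency X -> Y can be checked mechanically over a small relation (the student rows below are invented for the example): the dependency holds iff no two rows agree on X but disagree on Y.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in rows (a list of dict rows)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # two rows agree on lhs but disagree on rhs
        seen[x] = y
    return True

# Invented sample relation.
students = [
    {"student_id": 1, "name": "Asha", "city": "Nagpur"},
    {"student_id": 2, "name": "Ravi", "city": "Pune"},
    {"student_id": 3, "name": "Asha", "city": "Mumbai"},
]

print(fd_holds(students, ["student_id"], ["name"]))  # True: the key determines name
print(fd_holds(students, ["name"], ["city"]))        # False: two Ashas, different cities
```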
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties. Normalization in industry pays particular attention to normalization up to 3NF,
BCNF, or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to
the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, where each
step corresponds to a specific normal form which has known properties. As normalization proceeds,
relations become progressively more restricted (stronger) in format and also less vulnerable to
update anomalies.
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is said to be in 4NF if and
only if it is in BCNF and all its multivalued dependencies are functional dependencies. 4NF removes
the unwanted data structures: multivalued dependencies.
For a relation to be in fourth normal form, either of these conditions must hold:
There is no multivalued dependency in the relation, or
There are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it uses
multivalued dependencies.
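A multivalued dependency X ->> Y can likewise be tested mechanically. In the sketch below (the course/teacher/book rows are invented), X ->> Y holds iff, within each group of rows sharing an X value, every combination of a Y value and a rest-of-row value also appears as a row:

```python
from itertools import product

def mvd_holds(rows, x_attrs, y_attrs):
    """Check the MVD x_attrs ->> y_attrs; z = all remaining attributes."""
    z_attrs = [a for a in rows[0] if a not in x_attrs + y_attrs]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x_attrs)
        ys, zs, yz = groups.setdefault(key, (set(), set(), set()))
        y = tuple(row[a] for a in y_attrs)
        z = tuple(row[a] for a in z_attrs)
        ys.add(y)
        zs.add(z)
        yz.add((y, z))
    # MVD holds iff, per group, observed (y, z) pairs form the full cross product.
    return all(set(product(ys, zs)) == yz for ys, zs, yz in groups.values())

# Invented relation: teachers and books for a course vary independently.
course = [
    {"course": "DBMS", "teacher": "Rao",  "book": "Date"},
    {"course": "DBMS", "teacher": "Rao",  "book": "Navathe"},
    {"course": "DBMS", "teacher": "Iyer", "book": "Date"},
    {"course": "DBMS", "teacher": "Iyer", "book": "Navathe"},
]
print(mvd_holds(course, ["course"], ["teacher"]))  # True: course ->> teacher
```

Dropping the last row breaks the cross product, and the same call then returns False.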
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been considered since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more
declarative programming approach. It is in the area of object query languages, and the integration of the
query and navigational interfaces, that the biggest differences between products are found. An attempt at
standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a
relational database). This is because an object can be retrieved directly, without a search, by following
pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way that the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the same
type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the
set of all its versions, and object versions can be treated as objects in their own right. Some object
databases also provide systematic support for triggers and constraints, which are the basis of active
databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item. For example, a banking institution could retrieve a user's account information and
efficiently provide extensive information such as transactions and account entries.
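The pointer-following idea above can be sketched with plain objects; the classes below are illustrative and not a real OODBMS API. An account holds direct references to its transaction objects, so fetching them is simple navigation rather than a relational join on an account_id column:

```python
from dataclasses import dataclass, field

@dataclass
class Transaction:
    amount: float
    kind: str

@dataclass
class Account:
    owner: str
    # Direct references to the account's Transaction objects.
    transactions: list = field(default_factory=list)

acct = Account("Sally")
acct.transactions.append(Transaction(-40.0, "withdrawal"))
acct.transactions.append(Transaction(250.0, "deposit"))

# Navigate from the account object straight to its transactions.
kinds = [t.kind for t in acct.transactions]
print(kinds)  # ['withdrawal', 'deposit']
```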
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used will
determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have to
explore the options available in order to prepare for the worst.
SQL Server database recovery can be achieved more easily if you are running at least SQL Server 2000.
It has a built-in feature known as the database recovery model that controls the following:
Both the speed and size of your transaction log backups
The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available:
Full Recovery
Bulk-Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery. The SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction. The Log Marks feature allows you to place reference points in the transaction
log that allow you to recover to a log mark.
This model also logs CREATE INDEX operations. Recovery from a transaction log backup that includes
index creations is done at a faster pace because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this
model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans: A deadlock occurs when a set of transactions wait for one another in a cycle, so that none of them
can proceed: each transaction holds a lock that another transaction in the cycle is waiting for. In a
distributed system, the transactions and the data items they lock may reside at different sites, so no
single site can see the complete wait-for graph, which makes deadlocks harder to detect. The common
approaches are:
Detection - Maintain a wait-for graph. This may be done at a central coordinator site that collects local
wait-for information and checks the combined graph for cycles (centralized detection), or the sites may
exchange wait-for information among themselves (distributed detection). A cycle in the global graph
means a deadlock, which is broken by rolling back a victim transaction.
Prevention - Use timestamp-based schemes such as wait-die and wound-wait, which order transactions
by their timestamps and allow waiting in only one direction, so that a cycle can never form.
Timeouts - A transaction that has waited longer than a threshold is simply rolled back, on the
assumption that it may be deadlocked. This is simple, but may roll back transactions that are not
actually deadlocked.
Ans A database is a collection of non-redundant data which can be shared by different application
systems stresses the importance of multiple applications data sharing the spatial database
becomes a common resource for an agency implies separation of physical storage from use of the
data by an application program ie programdata independence the user or programmer or
application specialist need not know the details of how the data are stored such details are
transparent to the user changes can be made to data without affecting other components of the
system eg change format of data items (real to integer arithmetic operations) change file
structure (reorganize data internally or change mode of access) relocate from one device to
another eg from optical to magnetic storage from tape to disk
Advantages
1 Control of data redundancy
2 Data consistency
3 More information from the same amount of data 4 Sharing of data
5 Improved data integrity
6 Improved security 7 Enforcement of standards 8 Economy of scale
1 Controlling Data Redundancy - In the conventional file processing system
Every user group maintains its own files for handling its data files This may lead to
bull Duplication of same data in different files
bull Wastage of storage space since duplicated data is stored
bull Errors may be generated due to pupation of the same data in different files
bull Time in entering data again and again is wasted
bull Computer Resources are needlessly used
bull It is very difficult to combine information
2 Elimination of Inconsistency - In the file processing system information is duplicated
throughout the system So changes made in one file may be necessary be carried over to
another file This may lead to inconsistent data So we need to remove this duplication of
data in multiple file to eliminate inconsistency
3 Better service to the users - A DBMS is often used to provide better services to the users In
conventional system availability of information is often poor since it normally difficult to
obtain information that the existing systems were not designed for Once several conventional
systems are combined to form one centralized database the availability of information and its
update ness is likely to improve since the data can now be shared and DBMS makes it easy to
respond to anticipated information requests
Centralizing the data in the database also means that user can obtain new and combined
information easily that would have been impossible to obtain otherwise Also use of DBMS
should allow users that dont know programming to interact with the data more easily unlike
file processing system where the programmer may need to write new programs to meet every
new demand
4 Flexibility of the System is improved - Since changes are often necessary to the contents of
the data stored in any system these changes are made more easily in a centralized database
than in a conventional system Applications programs need not to be changed on changing the
data in the database
5 Integrity can be improved - Since data of the organization using database approach is
centralized and would be used by a number of users at a time It is essential to enforce
integrity-constraints
In the conventional systems because the data is duplicated in multiple files so updating or
changes may sometimes lead to entry of incorrect data in some files where it exists
6 Standards can be enforced - Since all access to the database must be through DBMS so
standards are easier to enforce Standards may relate to the naming of data format of data
structure of the data etc Standardizing stored data formats is usually desirable for the purpose
of data interchange or migration between systems
7 Security can be improved - In conventional systems applications are developed in an
adhoctemporary manner Often different system of an organization would access different
components of the operational data in such an environment enforcing security can be quiet
difficult Setting up of a database makes it easier to enforce security restrictions since data is
now centralized It is easier to control who has access to what parts of the database Different
checks can be established for each type of access (retrieve modify delete etc) to each piece
of information in the database
8 Organizations requirement can be identified - All organizations have sections and
departments and each of these units often consider the work of their unit as the most
important and therefore consider their need as the most important Once a database has been
setup with centralized control it will be necessary to identify organizations requirement and
to balance the needs of the competating units So it may become necessary to ignore some
requests for information if they conflict with higher priority need of the organization
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for an organization
9 Overall cost of developing and maintaining systems is lower - It is much easier to respond to
unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system Although the initial cost of setting up of a database can be large
one normal expects the overall cost of setting up of a database developing and maintaining
application programs to be far lower than for similar service using conventional systems
Since the productivity of programmers can be higher in using non-procedural languages that
have been developed with DBMS than using procedural languages
10 Data Model must be developed - Perhaps the most important advantage of setting up of
database system is the requirement that an overall data model for an organization be build In
conventional systems it is more likely that files will be designed as per need of particular
applications demand The overall view is often not considered Building an overall view of an
organizations data is usual cost effective in the long terms
11 Provides backup and Recovery - Centralizing a database provides the schemes such as
recovery and backups from the failures including disk crash power failures software errors
which may help the database to recover from the inconsistent state to the state that existed
prior to the occurrence of the failure though methods are very complex
QUE2- EITHER
(A) Explain ER model with suitable example
Ans It is a ldquotop-downrdquo approach
This data model allows us to describe how data is used in a real-world enterprise an
iterative process A team-oriented process with all business managers (or designates)
involved should validate with a ldquobottom-uprdquo approach Has three primary components entity
relationship attributes
Many notation methods Chen was the first to become established
The building blocks of E-R model are entities relationships and attributes
Entity An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified An entity is an abstraction from the complexities of some
domain When we speak of an entity we normally speak of some aspect of the real world which can be
distinguished from other aspects of the real world An entity may be a physical object such as a house or a car an event such as a house sale or a car service or a concept such as a customer transaction or order
An entity-type is a category An entity strictly speaking is an instance of a given entity-type There are
usually many instances of an entity-type Because the term entity-type is somewhat cumbersome most
people tend to use the term entity as a synonym for this term
Attributes It is a Characteristic of an entity Studentrsquos (entity) attributes student ID student name
address etc
Attributes are of various types
SimpleSingle Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship Relationship captures how two or more entities are related to one another Relationships can
be thought of as verbs linking two or more nouns Examples an owns relationship between a company and a computer a supervises relationship between an employee and a department a performs relationship
between an artist and a song a proved relationship between a mathematician and a theorem Relationships
are represented as diamonds connected by lines to each of the entities in the relationship Types of
relationships are as follows
One to many 1lt------- M Many to one M------1
Many to many M------M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given Entity Customer with attributes customer_id(primary key) name( first_name last_name
middle_name) phone_number date_of_birth address(citystatezip_codestreet)
Street(Street_namestreet_numberapartment_number)
--------------------------------------------------------------------------------------------------------
(c)Illustrate the construction of secondrery key retrieval with a suitable example
Ans In sequential File Index Sequential file and Direct File we have considered the retrieval and
update of data based on primary key
(i)We can retrieve and update data based on secondary key called as secondary key retrieval
(ii)In secondary key retrieval there are multiple records satisfying a given key value
(iii)For eg if we search a student file based on the attribute ldquostud_namerdquo we can get the set of
records which satisfy the given value
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation QUE 3-EITHER
(A) Let R(ABC) and Let r1 and r2 both be relations on schema R given the equivalent QBE
expression for each of the following queries -
(i) Y1 u y2
(ii) Y1 u y2
(iii) R1-r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables
Another way of expressing this is and each join dependency is a consequence of the candidate keys
It can also be expressed as there are no pair wise cyclical dependencies in the primary key
comprised of three or more attributes
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pair wise cyclical dependency means that
You always need to know two values (pair wise)
For any one you must know the other two (cyclical)
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Host Language
+
DLI
Host Language
+
DLI
PCB PCB
DBD DBD DBD DBD DBD DBD hellip
IMS
Control
program
PCB PCB
PSB-B PSB-A
Application A Application B
Conceptual View
The conceptual view consists of collection of physical database The ldquophysicalrdquo is somewhat
misleading in this context since the user does not see such a database exactly as it is stored indeed
IMS provides a fairely high degree of insulation of the user from the storage structure Each physical
database is defined by a database description (DBD) The mapping of the physical database to storage
is also DBDrsquos corresponds to the conceptual schema plus the associated conceptualinternal mapping
definition
DBD (Database Description) Each physical databse is defined together with its mapping to
storage by a databse description (DBD) The source form of the DBD is written using a special
System370 Assembler Language macro statements Once written the DBD is assembled and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program
All names of DBDrsquos in IMS are limited to a maximum length of eight characters
Example
DBD   NAME=EDUCPDBD
SEGM  NAME=COURSE,BYTES=256
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
FIELD NAME=DESCRIPN,BYTES=220,START=37
SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
FIELD NAME=TITLE,BYTES=33,START=4
SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
FIELD NAME=LOCATION,BYTES=12,START=7
FIELD NAME=FORMAT,BYTES=2,START=19
SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
FIELD NAME=(EMP,SEQ),BYTES=6,START=1
FIELD NAME=NAME,BYTES=18,START=7
FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of
the data. A particular user's external view consists of a collection of "logical databases", where each
logical database is a subset of the corresponding physical database. Each logical database is defined
by means of a program communication block (PCB). The set of all PCBs for one user, corresponding
to the external schema plus the associated mapping definition, is called the program specification
block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program
communication block (PCB). The PCB includes a specification of the mapping between the logical
database (LDB) and the corresponding physical database (PDB).
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's
program specification block (PSB).
Example
PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
SENSEG NAME=COURSE,PROCOPT=G
SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to
perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other
possible values are I ("insert"), R ("replace"), and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data
manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End users are
supported via user-written online application programs; IMS does not provide an integrated query
language.
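The hierarchical access pattern that DL/I serves can be illustrated with a small sketch. The following toy Python model is hypothetical (the segment key values are invented samples, and this is not real DL/I syntax): it builds a fragment of the EDUCPDBD hierarchy from the DBD example above and implements a rough analogue of a "get unique" call as a preorder search.

```python
# Toy model of a hierarchical database: a tree of typed segments, searched
# in preorder. get_unique is loosely analogous to a DL/I GU call.

class Segment:
    def __init__(self, seg_type, key, children=None):
        self.seg_type, self.key = seg_type, key
        self.children = children or []

    def preorder(self):
        # Parent before children, matching hierarchical sequence.
        yield self
        for child in self.children:
            yield from child.preorder()

# A fragment of the EDUCPDBD hierarchy (sample key values are invented).
db = Segment("COURSE", "M23", [
    Segment("PREREQ", "M16"),
    Segment("OFFERING", "730813", [
        Segment("TEACHER", "421633"),
        Segment("STUDENT", "102141"),
        Segment("STUDENT", "183009"),
    ]),
])

def get_unique(root, seg_type, key):
    """First segment of the given type with the given key, or None."""
    return next((s for s in root.preorder()
                 if s.seg_type == seg_type and s.key == key), None)

offering = get_unique(db, "OFFERING", "730813")
print([s.key for s in offering.children if s.seg_type == "STUDENT"])
# ['102141', '183009']
```

In real IMS the application would issue successive GU/GN calls through its PCB rather than walk the tree directly; the point here is only the parent-to-child navigation implied by the SEGM/PARENT structure.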
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant)
determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent
on every candidate key, and no attribute in the key can be deleted without
destroying the property of unique identification.
Main characteristics of functional dependencies used in normalization:
- They have a 1:1 relationship between the attribute(s) on the left- and
right-hand sides of the dependency.
- They hold for all time.
- They are nontrivial.
The complete set of functional dependencies for a given relation can be very
large, so it is important to find an approach that can reduce the set to a
manageable size. We need to identify a set of functional dependencies (X) for
a relation that is smaller than the complete set of functional dependencies
(Y) for that relation, and that has the property that every functional
dependency in Y is implied by the functional dependencies in X.
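A standard way to test whether a dependency in Y is implied by the smaller set X is to compute attribute closures. A minimal sketch of the closure algorithm (the FDs used are illustrative assumptions):

```python
# Compute the closure X+ of an attribute set under a set of FDs:
# repeatedly add the right-hand side of any FD whose left-hand side
# is already contained in the result, until nothing changes.

def closure(attrs, fds):
    """attrs: a set of attributes; fds: list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed example: A -> B and B -> C.
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C"))]

print(sorted(closure(frozenset("A"), fds)))  # ['A', 'B', 'C']
```

Since C is in the closure of {A}, the dependency A → C is implied by the two stored FDs and need not be listed separately; this is exactly how a large set Y is reduced to a manageable set X.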
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up
their attributes into smaller relations. The normal form of a relation refers to the highest normal-form
condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties.
Industry practice pays particular attention to normalization up to 3NF, BCNF, or 4NF; we will pay
particular attention up to 3NF. Database designers need not normalize to the highest possible normal
form.
Normalization is a formal technique for analyzing a relation based on its primary key and the
functional dependencies between its attributes. It is often executed as a series of steps, where each
step corresponds to a specific normal form with known properties. As normalization proceeds,
relations become progressively more restricted (stronger) in format and also less vulnerable to update
anomalies.
- NF2: non-first normal form.
- 1NF: R is in 1NF iff all domain values are atomic.
- 2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
- 3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
- BCNF: R is in BCNF iff every determinant is a candidate key.
- Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multi-valued dependencies of
attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it
is in BCNF and every multi-valued dependency is in fact a functional dependency. 4NF thus removes
the unwanted structures: multi-valued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
- there is no multivalued dependency in the relation, or
- there are multivalued dependencies, but the attributes are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it deals with
multivalued dependencies.
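The effect of a multi-valued dependency can be shown with a small sketch (the course/teacher/book data is a hypothetical example, not taken from the text): because Course →→ Teacher and Course →→ Book hold independently, the single table must pair every teacher with every book, and splitting on the MVDs removes that redundancy losslessly.

```python
# A table with two independent multi-valued facts about Physics: its
# teachers and its books. The table is forced to hold their cross product.

course_teacher_book = {
    ("Physics", "Prof. Green", "Mechanics"),
    ("Physics", "Prof. Green", "Optics"),
    ("Physics", "Prof. White", "Mechanics"),
    ("Physics", "Prof. White", "Optics"),
}

# 4NF decomposition: one table per multi-valued dependency.
course_teacher = {(c, t) for (c, t, b) in course_teacher_book}
course_book    = {(c, b) for (c, t, b) in course_teacher_book}

# Joining the two projections on course restores the original table.
rejoined = {(c, t, b)
            for (c, t) in course_teacher
            for (c2, b) in course_book if c2 == c}

print(rejoined == course_teacher_book)  # True: lossless split into two 4NF tables
```

After the split, adding a new book requires one row in course_book instead of one row per teacher, which is precisely the update anomaly 4NF eliminates.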
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by
relational database management systems (RDBMS). Object databases have been around since the
early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
- Most object databases offer some kind of query language, allowing objects to be found by a more
declarative programming approach. It is in the area of object query languages, and the integration of
the query and navigational interfaces, that the biggest differences between products are found. An
attempt at standardization was made by the ODMG with the Object Query Language, OQL.
- Access to data can be faster because joins are often not needed (as in a tabular implementation of a
relational database). This is because an object can be retrieved directly, without a search, by
following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer
following.)
- Another area of variation between products is the way the schema of a database is defined. A
general characteristic, however, is that the programming language and the database schema use the
same type definitions.
- Multimedia applications are facilitated because the class methods associated with the data are
responsible for its correct interpretation.
- Many object databases, for example VOSS, offer support for versioning. An object can be viewed
as the set of all its versions, and object versions can be treated as objects in their own right. Some
object databases also provide systematic support for triggers and constraints, which are the basis of
active databases.
- The efficiency of such a database is also greatly improved in areas that demand massive amounts of
data about one item. For example, a banking institution could retrieve a user's account information
and efficiently provide extensive information such as transactions and account entries.
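The banking example can be sketched as follows (the class names are hypothetical illustrations): the account object holds direct references to its transactions, so retrieval follows pointers rather than joining on a foreign key.

```python
# In an object database, related objects are reached by following the
# references an object already holds, not by joining tables on account_id.

class Transaction:
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []   # direct object references, no foreign keys

acct = Account("alice")
acct.transactions += [Transaction(120.0), Transaction(-45.5)]

# "Join-free" retrieval: traverse the pointers held by the account object.
print(sum(t.amount for t in acct.transactions))  # 74.5
```

In a relational schema the same query would scan or index a transactions table for rows matching the account's key; here the references are the access path.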
C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used
determines how much time and space your backups will take, and how great your risk of data loss
will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems. This is why you have
to explore the available options in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000,
which has a built-in feature known as the database recovery model that controls the following:
- the speed and size of your transaction log backups, and
- the degree to which you might be at risk of losing committed transactions in the event of media
failure.
Models
There are three types of database recovery model available:
- Full Recovery
- Bulk-Logged Recovery
- Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to
the transaction log. When data files are lost because of media failure, the transaction log can be
backed up.
- Database restoration up to any specified time can be achieved after a media failure for a database
file has occurred. If your log file is available after the failure, you can restore up to the last
committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log that let you
recover to a log mark.
- CREATE INDEX operations are logged. Recovery from a transaction log backup that includes
index creations is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using
the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows the fastest bulk operations and the simplest backup-and-restore strategy. Under
this model, SQL Server truncates the transaction log at regular intervals, removing committed
transactions.
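The idea common to all three models, replaying a transaction log and keeping only committed work, can be illustrated with a minimal sketch. This is a toy model of log replay, not SQL Server internals:

```python
# A transaction log as a list of records. On recovery, only operations
# belonging to transactions that reached "commit" are re-applied.

log = [
    ("T1", "set", "x", 10),
    ("T2", "set", "y", 20),
    ("T1", "commit", None, None),
    ("T2", "set", "z", 30),   # T2 never commits: its work is lost on recovery
]

def recover(log):
    committed = {txid for (txid, op, k, v) in log if op == "commit"}
    state = {}
    for txid, op, key, value in log:
        if op == "set" and txid in committed:
            state[key] = value
    return state

print(recover(log))  # {'x': 10}
```

The recovery models differ in how much of such a log is retained: full recovery keeps everything (so any point in time is reachable), while simple recovery truncates committed portions, trading point-in-time restore for smaller logs.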
(d) Describe deadlocks in a distributed system
Ans
information easily that would have been impossible to obtain otherwise. The use of a DBMS also
allows users who don't know programming to interact with the data more easily, unlike a file
processing system, where a programmer may need to write new programs to meet every new
demand.
4. Flexibility of the system is improved - Since changes are often necessary to the contents of
the data stored in any system, these changes are made more easily in a centralized database
than in a conventional system. Application programs need not be changed when the data in the
database changes.
5. Integrity can be improved - Since the data of an organization using the database approach is
centralized and used by a number of users at a time, it is essential to enforce integrity
constraints.
In conventional systems, because the data is duplicated in multiple files, updates or changes
may sometimes lead to the entry of incorrect data in some of the files where it exists.
6. Standards can be enforced - Since all access to the database must be through the DBMS,
standards are easier to enforce. Standards may relate to the naming of data, the format of data,
the structure of the data, etc. Standardizing stored data formats is usually desirable for the
purpose of data interchange or migration between systems.
7. Security can be improved - In conventional systems, applications are developed in an ad hoc,
temporary manner. Often different systems of an organization access different components of
the operational data; in such an environment, enforcing security can be quite difficult. Setting
up a database makes it easier to enforce security restrictions, since the data is now centralized.
It is easier to control who has access to what parts of the database, and different checks can be
established for each type of access (retrieve, modify, delete, etc.) to each piece of information
in the database.
8. The organization's requirements can be identified - All organizations have sections and
departments, and each of these units often considers its own work, and therefore its own
needs, the most important. Once a database has been set up with centralized control, it
becomes necessary to identify the organization's requirements and to balance the needs of the
competing units. It may become necessary to ignore some requests for information if they
conflict with a higher-priority need of the organization.
It is the responsibility of the DBA (Database Administrator) to structure the database system
to provide the overall service that is best for the organization.
9. The overall cost of developing and maintaining systems is lower - It is much easier to respond
to unanticipated requests when data is centralized in a database than when it is stored in a
conventional file system. Although the initial cost of setting up a database can be large, one
normally expects the overall cost of setting up the database and developing and maintaining
application programs to be far lower than for a similar service using conventional systems,
since the productivity of programmers can be higher using the non-procedural languages
developed for DBMSs than using procedural languages.
10. A data model must be developed - Perhaps the most important advantage of setting up a
database system is the requirement that an overall data model for the organization be built. In
conventional systems, it is more likely that files will be designed as the needs of particular
applications demand; the overall view is often not considered. Building an overall view of an
organization's data is usually cost-effective in the long term.
11. Provides backup and recovery - Centralizing a database provides schemes such as recovery
and backup from failures, including disk crashes, power failures, and software errors, which
may help the database recover from an inconsistent state to the state that existed prior to the
failure, though the methods are very complex.
QUE2- EITHER
(A) Explain ER model with suitable example
Ans: The E-R model is a "top-down" approach. This data model allows us to describe how data is
used in a real-world enterprise. Modelling is an iterative, team-oriented process involving all
business managers (or their designates), and the result should be validated with a "bottom-up"
approach. The model has three primary components: entities, relationships, and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships, and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities of
some domain. When we speak of an entity, we normally speak of some aspect of the real world
which can be distinguished from other aspects of the real world. An entity may be a physical object,
such as a house or a car; an event, such as a house sale or a car service; or a concept, such as a
customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and
there are usually many instances of an entity-type. Because the term entity-type is somewhat
cumbersome, most people tend to use the term entity as a synonym.
Attributes: An attribute is a characteristic of an entity. A Student entity's attributes might be student
ID, student name, address, etc.
Attributes are of various types
Simple/Single Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another.
Relationships can be thought of as verbs linking two or more nouns. Examples: an "owns"
relationship between a company and a computer, a "supervises" relationship between an employee
and a department, a "performs" relationship between an artist and a song, a "proved" relationship
between a mathematician and a theorem. Relationships are represented as diamonds connected by
lines to each of the entities in the relationship. The types of relationships are as follows:
One-to-many (1:M)
Many-to-one (M:1)
Many-to-many (M:N)
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name,
middle_name), phone_number, date_of_birth, and address (city, state, zip_code, street), where street
is itself composite (street_name, street_number, apartment_number).
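One way to make the composite-attribute structure concrete is to model the entity directly. The sketch below is one possible mapping of the attributes above into nested types; the class layout is an illustration, not the only correct design.

```python
# The Customer entity with its composite attributes (address, street)
# modelled as nested dataclasses, and the primary key as a plain field.

from dataclasses import dataclass

@dataclass
class Street:
    street_name: str
    street_number: str
    apartment_number: str = ""   # optional component

@dataclass
class Address:
    city: str
    state: str
    zip_code: str
    street: Street               # composite attribute nested in a composite

@dataclass
class Customer:
    customer_id: int             # primary key
    first_name: str
    last_name: str
    phone_number: str
    date_of_birth: str
    address: Address
    middle_name: str = ""

c = Customer(1, "Ada", "Lovelace", "555-0100", "1815-12-10",
             Address("London", "", "N1", Street("Main St", "12")))
print(c.address.street.street_name)  # Main St
```

A multivalued attribute (say, several phone numbers) would become a list field, and a derived attribute such as age would be computed from date_of_birth rather than stored.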
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example
Ans: In sequential files, index-sequential files, and direct files, we considered the retrieval and
update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key
retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we can get the set of
records which satisfy the given value.
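A secondary index over stud_name can be sketched as a map from each attribute value to the list of matching record positions. The sample records below are hypothetical:

```python
# Secondary-key retrieval: stud_name is not unique, so the index maps each
# name to ALL record positions holding that value.

students = [
    {"stud_id": 1, "stud_name": "Asha", "dept": "MCA"},
    {"stud_id": 2, "stud_name": "Ravi", "dept": "MCA"},
    {"stud_id": 3, "stud_name": "Asha", "dept": "MBA"},
]

# Build the secondary index: value of stud_name -> record positions.
name_index = {}
for pos, rec in enumerate(students):
    name_index.setdefault(rec["stud_name"], []).append(pos)

# One secondary-key lookup may return several records.
matches = [students[p] for p in name_index.get("Asha", [])]
print([m["stud_id"] for m in matches])  # [1, 3]
```

This contrasts with primary-key retrieval, where the index entry for a key value points to exactly one record.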
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE
expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
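Assuming the intended queries are the union, intersection, and difference of r1 and r2, their results can be sketched with Python sets standing in for the relations (the sample tuples are assumed for illustration):

```python
# r1 and r2 are union-compatible relations on schema R(A, B, C),
# modelled as sets of (A, B, C) tuples.

r1 = {(1, "a", True), (2, "b", False)}
r2 = {(2, "b", False), (3, "c", True)}

union        = r1 | r2   # r1 ∪ r2: tuples in either relation
intersection = r1 & r2   # r1 ∩ r2: tuples in both relations
difference   = r1 - r2   # r1 − r2: tuples in r1 but not in r2

print(len(union), len(intersection), len(difference))  # 3 1 1
```

In QBE each of these is expressed with example rows in the two table skeletons (with negation for the difference); the set semantics shown here is what those skeletons compute.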
QUE4- EITHER
(A) What is join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows: if a table can be decomposed into three or more
smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it
cannot have a lossless decomposition into any number of smaller tables. Another way of expressing
this is that every join dependency is a consequence of the candidate keys. It can also be expressed as:
there are no pairwise cyclical dependencies in a primary key composed of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further,
then it is in 5NF.
Pairwise cyclical dependency means that:
- you always need to know two values (pairwise);
- for any one value, you must know the other two (cyclical).
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Host Language
+
DLI
Host Language
+
DLI
PCB PCB
DBD DBD DBD DBD DBD DBD hellip
IMS
Control
program
PCB PCB
PSB-B PSB-A
Application A Application B
Conceptual View
The conceptual view consists of collection of physical database The ldquophysicalrdquo is somewhat
misleading in this context since the user does not see such a database exactly as it is stored indeed
IMS provides a fairely high degree of insulation of the user from the storage structure Each physical
database is defined by a database description (DBD) The mapping of the physical database to storage
is also DBDrsquos corresponds to the conceptual schema plus the associated conceptualinternal mapping
definition
DBD (Database Description) Each physical databse is defined together with its mapping to
storage by a databse description (DBD) The source form of the DBD is written using a special
System370 Assembler Language macro statements Once written the DBD is assembled and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program
All names of DBDrsquos in IMS are limited to a maximum length of eight characters
Example
1 DBD NAMEEDUCPDBD
2 SEGM NAME=COURSEBYTES=256
3 FILED NAME=(COURSESEQ)BYTES=3START=1 4 FIELD NAME=TITLE BYTES=33START=4
5 FIELD NAME=DESCRIPNBYETS=220START=37
6 SEGM NAME=PREREQPARENT=COURSEBYTES=36 7 FILED NAME=(COURSESEQ)BYTES=3START=1
8 FIELD NAME=TITLE BYTES=33START=4
9 SEGM NAME=OFFERINGPARENT=COURSEBYTES=20 10 FILED NAME=(DATESEQM)BYTES=6START=1
11 FIELD NAME=LOCATION BYTES=12START=7
12 FIELD NAME=FORMATBYETS=2START=19 13 SEGM NAME=TEACHERPARENT=OFFERINGBYTES=24
14 FIELD NAME=(EMPSEQ) BYTES=6START=1
15 FIELD NAME=NAMEBYETS=18START=7
16 SEGM NAME=STUDENTPARENT=OFFERINGBYTES=25
17 FILED NAME=(EMPSEQ)BYTES=6START=1
18 FIELD NAME=NAME BYTES=18START=7
19 FIELD NAME=GRADEBYTES=1START=25
External View
The user does not operate directly at the physical database level but rather on an ldquoexternal viewrdquo of
the data A particular userrsquos external view consists of a collection of ldquological databasesrdquo where each
logical database is a subset of the corresponding physical database Each logical database is defined
by means of a program communication block (PCB) The set of all PCBrsquos for one user corresponding
to the external schema plus the associated mapping definition is called program specification block
(PSB)
PCB Program Communication BLOCK Each logical Database is defined by a program
communication block (PCB) The PCB includes a specification of the mapping between the LDB and
the corresponding PDB
PSB Program Specification BLOCK The set of all PCBrsquos for a given user forms that userrsquos
program specification block (PSB)
Example
1 PCB TYPE=DBDBNAME=EDUCPDBDKEYLEN=15
2 SENSEG NAME=COURSEPROCOPT=G 3 SENSEG NAME=OFFERINGPARENT=COURSEPROCOPT=G
4 SENSEG NAME=STUDENTPARENT=OFFERINGPROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitting to
perform on this segment In this example the entry is G (ldquogetrdquo) indicating retrieval only Other
possible values are I(ldquoinsertrdquo) R(ldquoreplacerdquo) and D(ldquodeleterdquo)
Internal View
The users are ordinary application programmers using a host language from which the IMS data
manipulation language DLI- ldquoData LanguageIrdquo- may be invoked by subroutine call End-users are
supported via user-written on-line application programs IMS does not provide an integrated query
language
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency The value of one attribute (the determinant)
determines the value of another attribute
Candidate Key A possible key
Each non-key field is functionally dependent on every candidate key
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of functional dependencies used in
normalization
have a 11 relationship between attribute(s) on left and right-hand side of
a dependency hold for all time are nontrivial
Complete set of functional dependencies for a given relation can be very
large
Important to find an approach that can reduce set to a manageable size
Need to identify set of functional dependencies (X) for a relation that is
smaller than complete set of functional dependencies (Y) for that relation
and has property that every functional dependency in Y is implied by
functional dependencies in X
(D) Explain 4 NF with examples
Ans Normalization The process of decomposing unsatisfactory ldquobadrdquo relations by breaking up
their attributes into smaller relationsThe normal form of a relation refers to the highest normal form
condition that a relation meets and indicates the degree to which it has been normalized
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
7 NF2 non-first normal form 8 1NF R is in 1NF iff all domain values are atomic2
9 2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
10 3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the
key 11 BCNF R is in BCNF iff every determinant is a candidate key
12 Determinant an attribute on which some other attribute is fully functionally dependent Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on something other than a superset of a candidate key A table is said to be in 4NF if and
only if it is in the BCNF and multi-valued dependencies are functional dependencies The 4NF
removes unwanted data structures multi-valued dependencies
There is no Multivalued dependency in the relation
There are Multivalued dependency but the attributes are dependent between themselves
Either of these conditions must hold true in order to be fourth normal form
The relation must also be in BCNF Fourth normal form differs from BCNF only in that it uses
Multivalued dependencies
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market dominated by relational
database management systems (RDBMS) Object databases have been considered since the early 1980s
and 1990s but they have made little impact on mainstream commercial data proc
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
C) How database recovery it done Discuss its different types
Ans SQL Server database recovery models give you backup-and-restore flexibility The model used will determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL server database recovery can be easier achieved if you are running on at least the SQL server 2000
It has a built in feature known as the database recovery model that controls the following
Both the speed and size of your transaction log backups The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available
Full Recovery Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred If your log file is available after the failure you can restore up to the last
transaction committed Log Marks feature allows you to place reference points in the transaction log that allow you to
recover a log mark
Logs CREATE INDEX operations Recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model It allows for the fastest bulk operations and the simplest backup-and-restore strategy Under this model
SQL Server truncates the transaction log at regular intervals removing committed transactions
(d)Describe Deadlocks a Distributed System
Ans
- Components of DBMS
- DBMS Three Level Architecture Diagram
- 1 External level
- 2 Conceptual level
- 3 Internal level
- 1 Logical Data Independence
- 2 Physical Data Independence
- Sequential File Organization
- Example
- Consequences of a Lack of Referential Integrity
- One-to-one (11)
- One-to-Many (1M)
- Many-to-Many (MN)
- Lock-based Protocols
-
- Simplistic Lock Protocol
- Pre-claiming Lock Protocol
- Two-Phase Locking 2PL
- Strict Two-Phase Locking
-
- Timestamp-based Protocols
- Internet as a knowledge base[edit]
-
- Ans Join Dependencies (JD)
-
10 Data Model must be developed - Perhaps the most important advantage of setting up of
database system is the requirement that an overall data model for an organization be build In
conventional systems it is more likely that files will be designed as per need of particular
applications demand The overall view is often not considered Building an overall view of an
organizations data is usual cost effective in the long terms
11 Provides backup and Recovery - Centralizing a database provides the schemes such as
recovery and backups from the failures including disk crash power failures software errors
which may help the database to recover from the inconsistent state to the state that existed
prior to the occurrence of the failure though methods are very complex
QUE 2 - EITHER
(A) Explain the ER model with a suitable example.
Ans: The ER model is a "top-down" approach.
This data model allows us to describe how data is used in a real-world enterprise. Building it is an iterative, team-oriented process in which all business managers (or their designates) should be involved, and the result should be validated with a "bottom-up" approach. It has three primary components: entity, relationship and attributes.
There are many notation methods; Chen's was the first to become established.
The building blocks of the E-R model are entities, relationships and attributes.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain; when we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world. An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order.
An entity-type is a category; an entity, strictly speaking, is an instance of a given entity-type, and there are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for it.
Attributes: An attribute is a characteristic of an entity. A Student entity's attributes include student ID, student name, address, etc.
Attributes are of various types
Simple/Single Attributes
Composite Attributes
Multivalued attributes
Derived attributes
Relationship: A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem. Relationships are represented as diamonds connected by lines to each of the entities in the relationship. The types of relationships are as follows:
One to many: 1 ------- M
Many to one: M ------- 1
Many to many: M ------- M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given: Entity Customer with attributes customer_id (primary key), name (first_name, last_name, middle_name), phone_number, date_of_birth, address (city, state, zip_code, street), street (street_name, street_number, apartment_number).
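A common way to realize such an ER design is to map the entity to a relational table, flattening the composite attributes (name, address, street) into their component columns. A sketch using Python's built-in sqlite3 module (the sample row is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite attributes are flattened into their components;
# customer_id is the primary key from the ER diagram.
conn.execute("""
    CREATE TABLE customer (
        customer_id      INTEGER PRIMARY KEY,
        first_name       TEXT,
        last_name        TEXT,
        middle_name      TEXT,
        phone_number     TEXT,
        date_of_birth    TEXT,
        city             TEXT,
        state            TEXT,
        zip_code         TEXT,
        street_name      TEXT,
        street_number    TEXT,
        apartment_number TEXT
    )
""")
conn.execute(
    "INSERT INTO customer (customer_id, first_name, last_name) VALUES (?, ?, ?)",
    (1, "Asha", "Rao"),
)
print(conn.execute("SELECT first_name, last_name FROM customer").fetchall())
```

A multivalued attribute (for example, several phone numbers per customer) would instead go into its own table keyed by customer_id, since a single column cannot hold a set of values in 1NF.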
--------------------------------------------------------------------------------------------------------
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: In sequential files, index-sequential files and direct files we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we may get a set of records which satisfy the given value.
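The idea can be sketched in Python: alongside the primary organization (student ID to record), a secondary index maps each stud_name value to the list of matching primary keys, so one lookup may return several records. Names and data here are invented for illustration:

```python
# Primary organization: records addressed by primary key (stud_id).
students = {
    1: {"stud_id": 1, "stud_name": "Patil", "city": "Nagpur"},
    2: {"stud_id": 2, "stud_name": "Deshmukh", "city": "Pune"},
    3: {"stud_id": 3, "stud_name": "Patil", "city": "Mumbai"},
}

# Secondary index on the non-unique attribute stud_name: value -> list of primary keys.
name_index = {}
for pk, rec in students.items():
    name_index.setdefault(rec["stud_name"], []).append(pk)

def lookup_by_name(name):
    """Secondary key retrieval: may return multiple records for one key value."""
    return [students[pk] for pk in name_index.get(name, [])]

print([r["stud_id"] for r in lookup_by_name("Patil")])  # [1, 3] -- two records match
```

This is exactly the contrast with primary key retrieval: the primary index yields at most one record, while the secondary index yields a set.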
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation
QUE 3 - EITHER
(A) Let R(A, B, C), and let r1 and r2 both be relations on schema R. Give an equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows:
1. If a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJ/NF), if it is in 4NF and it cannot be further non-loss decomposed into any number of smaller tables. Another way of expressing this is that every join dependency in the table is a consequence of its candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes. Anomalies can occur in 4NF relations if the primary key has three or more fields.
5NF is based on the concept of join dependency: if a relation cannot be non-loss decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- you always need to know two values (pairwise);
- for any one value you must know the other two (cyclical).
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
Buyer   Vendor          Item
Sally   Liz Claiborne   Blouses
Mary    Liz Claiborne   Blouses
Sally   Jordach         Jeans
Mary    Jordach         Jeans
Sally   Jordach         Sneakers
The question is: what do you do if Claiborne starts to sell jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
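The decomposition and its lossless reconstruction can be checked mechanically. A small Python sketch using the sample data above: the three binary projections, natural-joined on their common attributes, reproduce exactly the original Buying table, which is the join dependency that 5NF exploits.

```python
buying = {
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
}

# Project onto the three binary tables of the 5NF decomposition.
buyer_vendor = {(b, v) for b, v, i in buying}
buyer_item   = {(b, i) for b, v, i in buying}
vendor_item  = {(v, i) for b, v, i in buying}

# Natural join of the three projections on their shared attributes.
rejoined = {
    (b, v, i)
    for (b, v) in buyer_vendor
    for (b2, i) in buyer_item if b2 == b
    for (v2, i2) in vendor_item if v2 == v and i2 == i
}

print(rejoined == buying)  # True: the join dependency holds, no spurious tuples
```

Recording "Claiborne starts to sell jeans" in the decomposed design then takes a single row in Vendor-Item, instead of one three-column row per buyer in the original table.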
(B) Explain the architecture of an IMS System
Ans: Information Management System (IMS) is an IBM program product designed to support both batch and online application programs.
[Architecture diagram: Application A and Application B, each written in a host language with embedded DL/I calls, access the IMS control program through their own program specification blocks (PSB-A, PSB-B), each consisting of PCBs; the IMS control program maps these onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined, together with its mapping to storage, by a database description (DBD). The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE,BYTES=256
3  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
4  FIELD NAME=TITLE,BYTES=33,START=4
5  FIELD NAME=DESCRIPN,BYTES=220,START=37
6  SEGM  NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD NAME=(COURSE,SEQ),BYTES=3,START=1
8  FIELD NAME=TITLE,BYTES=33,START=4
9  SEGM  NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD NAME=LOCATION,BYTES=12,START=7
12 FIELD NAME=FORMAT,BYTES=2,START=19
13 SEGM  NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
15 FIELD NAME=NAME,BYTES=18,START=7
16 SEGM  NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD NAME=(EMP,SEQ),BYTES=6,START=1
18 FIELD NAME=NAME,BYTES=18,START=7
19 FIELD NAME=GRADE,BYTES=1,START=25
External View
The user does not operate directly at the physical database level but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the logical database (LDB) and the corresponding physical database (PDB).
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example
1 PCB    TYPE=DB,DBNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on the segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written online application programs. IMS does not provide an integrated query language.
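The hierarchy the DBD above describes (COURSE at the root, with OFFERING beneath it and TEACHER and STUDENT beneath OFFERING) can be mimicked with nested records. A toy Python sketch of navigating such a database top-down, the way a sequence of DL/I get-next calls would; the segment contents are invented for illustration:

```python
# A COURSE root segment with its dependent segments nested as in the DBD tree.
course = {
    "COURSE": "M23",
    "TITLE": "Dynamics",
    "OFFERING": [
        {"DATE": "730813", "LOCATION": "Oslo",
         "TEACHER": [{"EMP": "421633", "NAME": "Kristiansen"}],
         "STUDENT": [{"EMP": "183009", "NAME": "Andersen", "GRADE": "A"},
                     {"EMP": "761620", "NAME": "Berg",     "GRADE": "B"}]},
    ],
}

def students_of(course_seg):
    """Walk COURSE -> OFFERING -> STUDENT, as repeated get-next calls would."""
    for offering in course_seg["OFFERING"]:
        for student in offering["STUDENT"]:
            yield student["NAME"]

print(list(students_of(course)))
```

The point of the sketch is that access paths follow the parent-child structure: you reach a STUDENT segment only through its OFFERING, and an OFFERING only through its COURSE.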
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
The main characteristics of the functional dependencies used in normalization are that they:
- have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency;
- hold for all time;
- are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation and has the property that every functional dependency in Y is implied by the functional dependencies in X.
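Whether a dependency X → Y holds in a given relation instance can be tested directly from the definition: any two tuples that agree on X must agree on Y. A small sketch (the emp relation is invented for illustration):

```python
def holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in this relation instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:   # same determinant, different dependent value
            return False
    return True

emp = [
    {"emp_id": 1, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 2, "dept": "Sales", "dept_city": "Pune"},
    {"emp_id": 3, "dept": "HR",    "dept_city": "Nagpur"},
]

print(holds(emp, ["dept"], ["dept_city"]))   # True: dept determines dept_city
print(holds(emp, ["dept_city"], ["emp_id"])) # False: Pune maps to two emp_ids
```

Note that a check on one instance can only refute a dependency, not prove it: as the text says, a functional dependency is a statement that must hold for all time, not just for the current data.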
(D) Explain 4 NF with examples
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that it meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties. Normalization in industry pays particular attention to the normal forms up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, each of which corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and every non-trivial multivalued dependency in it is in fact a functional dependency. 4NF thus removes the unwanted structures caused by multivalued dependencies.
One of the following conditions must hold for a relation to be in fourth normal form:
- there is no multivalued dependency in the relation; or
- there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF; fourth normal form differs from BCNF only in that it considers multivalued dependencies.
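A classic illustration (an example supplied here, not taken from the question paper) is a relation Course(course, teacher, book) where course →→ teacher and course →→ book hold independently: each course's set of teachers must pair with its full set of books, so the table stores a cross product. Decomposing into (course, teacher) and (course, book) removes the redundancy, and their natural join restores the original:

```python
from itertools import product

teachers = {"DB": ["Rao", "Iyer"]}
books    = {"DB": ["Korth", "Date"]}

# The unnormalized relation: every teacher pairs with every book (the MVD).
course_tb = {(c, t, b)
             for c in teachers
             for t, b in product(teachers[c], books[c])}
assert len(course_tb) == 4          # 2 teachers x 2 books = redundant storage

# 4NF decomposition: one table per independent multivalued fact.
course_teacher = {(c, t) for c, t, b in course_tb}
course_book    = {(c, b) for c, t, b in course_tb}

# Natural join on course reconstructs the original without loss.
rejoined = {(c, t, b)
            for (c, t) in course_teacher
            for (c2, b) in course_book if c2 == c}
print(rejoined == course_tb)  # True
```

Adding a third book to the course now takes one row in course_book instead of one row per teacher in the original table, which is exactly the anomaly 4NF removes.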
Q5
Either
(A) What are object oriented database systems What are its features
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been studied since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database). This is because an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way that the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive details such as transactions and account entries.
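The "no join needed" point can be seen in a toy sketch: once the account object is in hand, its transactions are reached by following object references rather than by matching key values across tables. The class names and fields below are invented for illustration:

```python
class Transaction:
    def __init__(self, amount, memo):
        self.amount = amount
        self.memo = memo

class Account:
    def __init__(self, owner):
        self.owner = owner
        self.transactions = []   # direct object references, not foreign keys

    def post(self, amount, memo):
        self.transactions.append(Transaction(amount, memo))
        return self

acct = Account("R. Kulkarni").post(1500, "deposit").post(-200, "ATM")

# Navigation: follow pointers from the account to its transactions -- no join.
balance = sum(t.amount for t in acct.transactions)
print(balance)  # 1300
```

In a relational design the same query would join an accounts table to a transactions table on account_id; here the reference is stored once and followed directly.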
(C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups will take and how great your risk of data loss will be when a breakdown occurs.
System breakdowns happen all the time, even to the best-configured systems, which is why you have to explore the available options in order to prepare for the worst.
SQL Server database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls:
- both the speed and size of your transaction log backups;
- the degree to which you are at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery model available:
1. Full Recovery
2. Bulk-Logged Recovery
3. Simple Recovery
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log, and when data files are lost because of media failure the transaction log can be backed up.
- Database restoration up to any specified point in time can be achieved after media failure of a database file. If your log file is available after the failure, you can restore up to the last committed transaction.
- The Log Marks feature allows you to place reference points in the transaction log, so that you can recover to a log mark.
- Logs CREATE INDEX operations. Recovery from a transaction log backup that includes index creations is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance, using the least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
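The common thread of all three models is the transaction log: after a failure, committed work is redone from the log and uncommitted work is discarded. A deliberately simplified redo sketch (an illustration of the principle, not SQL Server's actual algorithm):

```python
# Each log record: (txn_id, action, payload). T2 never reached COMMIT
# before the crash, so its write must not survive recovery.
log = [
    ("T1", "WRITE", ("x", 10)),
    ("T2", "WRITE", ("y", 99)),
    ("T1", "WRITE", ("z", 5)),
    ("T1", "COMMIT", None),
]

def recover(log):
    committed = {t for t, a, _ in log if a == "COMMIT"}
    db = {}
    for txn, action, payload in log:       # redo pass, in log order
        if action == "WRITE" and txn in committed:
            key, value = payload
            db[key] = value
    return db

print(recover(log))  # {'x': 10, 'z': 5} -- T2's write to y is discarded
```

Under the simple recovery model this log is truncated after each checkpoint, which is precisely why point-in-time restore is lost: the records needed to redo earlier work are gone.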
(d)Describe Deadlocks a Distributed System
Ans
- Components of DBMS
- DBMS Three Level Architecture Diagram
- 1 External level
- 2 Conceptual level
- 3 Internal level
- 1 Logical Data Independence
- 2 Physical Data Independence
- Sequential File Organization
- Example
- Consequences of a Lack of Referential Integrity
- One-to-one (11)
- One-to-Many (1M)
- Many-to-Many (MN)
- Lock-based Protocols
-
- Simplistic Lock Protocol
- Pre-claiming Lock Protocol
- Two-Phase Locking 2PL
- Strict Two-Phase Locking
-
- Timestamp-based Protocols
- Internet as a knowledge base[edit]
-
- Ans Join Dependencies (JD)
-
Multivalued attributes
Derived attributes
Relationship Relationship captures how two or more entities are related to one another Relationships can
be thought of as verbs linking two or more nouns Examples an owns relationship between a company and a computer a supervises relationship between an employee and a department a performs relationship
between an artist and a song a proved relationship between a mathematician and a theorem Relationships
are represented as diamonds connected by lines to each of the entities in the relationship Types of
relationships are as follows
One to many 1lt------- M Many to one M------1
Many to many M------M
Symbols and their meanings
Rectangles represent entity sets
Diamonds represent relationship sets
Lines link attributes to entity sets and entity sets to relationship sets
Ellipses represent attributes
Double ellipses represent Multivalued attributes
Dashed ellipses denote derived attributes
Underline indicates primary key attributes
Example
Given Entity Customer with attributes customer_id(primary key) name( first_name last_name
middle_name) phone_number date_of_birth address(citystatezip_codestreet)
Street(Street_namestreet_numberapartment_number)
--------------------------------------------------------------------------------------------------------
(c)Illustrate the construction of secondrery key retrieval with a suitable example
Ans In sequential File Index Sequential file and Direct File we have considered the retrieval and
update of data based on primary key
(i)We can retrieve and update data based on secondary key called as secondary key retrieval
(ii)In secondary key retrieval there are multiple records satisfying a given key value
(iii)For eg if we search a student file based on the attribute ldquostud_namerdquo we can get the set of
records which satisfy the given value
(D)Define the following terms -
(i) Specialization
(ii) Association
(iii) Relationship
(iv) Aggregation QUE 3-EITHER
(A) Let R(ABC) and Let r1 and r2 both be relations on schema R given the equivalent QBE
expression for each of the following queries -
(i) Y1 u y2
(ii) Y1 u y2
(iii) R1-r2
QUE4- EITHER
(A) What is join dependency Discuss 5NF
Ans Join Dependencies (JD)
A join dependency can be described as follows
1 If a table can be decomposed into three or more smaller tables it must be capable of being joined
again on common keys to form the original table
A table is in fifth normal form (5NF) or Projection-Join Normal Form (PJNF) if it is in 4NF and it cannot
have a lossless decomposition into any number of smaller tables
Another way of expressing this is and each join dependency is a consequence of the candidate keys
It can also be expressed as there are no pair wise cyclical dependencies in the primary key
comprised of three or more attributes
Anomalies can occur in relations in 4NF if the primary key has three or more fields
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it is in 5NF
Pair wise cyclical dependency means that
You always need to know two values (pair wise)
For any one you must know the other two (cyclical)
Example Buying(buyer vendor item)
This is used to track buyers what they buy and from whom they buy
Take the following sample data
buyer vendor Item
Sally Liz Claiborne Blouses
Mary Liz Claiborne Blouses
Sally Jordach Jeans
Mary Jordach Jeans
Sally Jordach Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Host Language
+
DLI
Host Language
+
DLI
PCB PCB
DBD DBD DBD DBD DBD DBD hellip
IMS
Control
program
PCB PCB
PSB-B PSB-A
Application A Application B
Conceptual View
The conceptual view consists of collection of physical database The ldquophysicalrdquo is somewhat
misleading in this context since the user does not see such a database exactly as it is stored indeed
IMS provides a fairely high degree of insulation of the user from the storage structure Each physical
database is defined by a database description (DBD) The mapping of the physical database to storage
is also DBDrsquos corresponds to the conceptual schema plus the associated conceptualinternal mapping
definition
DBD (Database Description) Each physical databse is defined together with its mapping to
storage by a databse description (DBD) The source form of the DBD is written using a special
System370 Assembler Language macro statements Once written the DBD is assembled and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program
All names of DBDrsquos in IMS are limited to a maximum length of eight characters
Example
1 DBD NAMEEDUCPDBD
2 SEGM NAME=COURSEBYTES=256
3 FILED NAME=(COURSESEQ)BYTES=3START=1 4 FIELD NAME=TITLE BYTES=33START=4
5 FIELD NAME=DESCRIPNBYETS=220START=37
6 SEGM NAME=PREREQPARENT=COURSEBYTES=36 7 FILED NAME=(COURSESEQ)BYTES=3START=1
8 FIELD NAME=TITLE BYTES=33START=4
9 SEGM NAME=OFFERINGPARENT=COURSEBYTES=20 10 FILED NAME=(DATESEQM)BYTES=6START=1
11 FIELD NAME=LOCATION BYTES=12START=7
12 FIELD NAME=FORMATBYETS=2START=19 13 SEGM NAME=TEACHERPARENT=OFFERINGBYTES=24
14 FIELD NAME=(EMPSEQ) BYTES=6START=1
15 FIELD NAME=NAMEBYETS=18START=7
16 SEGM NAME=STUDENTPARENT=OFFERINGBYTES=25
17 FILED NAME=(EMPSEQ)BYTES=6START=1
18 FIELD NAME=NAME BYTES=18START=7
19 FIELD NAME=GRADEBYTES=1START=25
External View
The user does not operate directly at the physical database level but rather on an ldquoexternal viewrdquo of
the data A particular userrsquos external view consists of a collection of ldquological databasesrdquo where each
logical database is a subset of the corresponding physical database Each logical database is defined
by means of a program communication block (PCB) The set of all PCBrsquos for one user corresponding
to the external schema plus the associated mapping definition is called program specification block
(PSB)
PCB Program Communication BLOCK Each logical Database is defined by a program
communication block (PCB) The PCB includes a specification of the mapping between the LDB and
the corresponding PDB
PSB Program Specification BLOCK The set of all PCBrsquos for a given user forms that userrsquos
program specification block (PSB)
Example
1 PCB TYPE=DBDBNAME=EDUCPDBDKEYLEN=15
2 SENSEG NAME=COURSEPROCOPT=G 3 SENSEG NAME=OFFERINGPARENT=COURSEPROCOPT=G
4 SENSEG NAME=STUDENTPARENT=OFFERINGPROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitting to
perform on this segment In this example the entry is G (ldquogetrdquo) indicating retrieval only Other
possible values are I(ldquoinsertrdquo) R(ldquoreplacerdquo) and D(ldquodeleterdquo)
Internal View
The users are ordinary application programmers using a host language from which the IMS data
manipulation language DLI- ldquoData LanguageIrdquo- may be invoked by subroutine call End-users are
supported via user-written on-line application programs IMS does not provide an integrated query
language
OR
(C) Explain the following -
(i) Functional dependency
Functional Dependency The value of one attribute (the determinant)
determines the value of another attribute
Candidate Key A possible key
Each non-key field is functionally dependent on every candidate key
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of functional dependencies used in
normalization
have a 11 relationship between attribute(s) on left and right-hand side of
a dependency hold for all time are nontrivial
Complete set of functional dependencies for a given relation can be very
large
Important to find an approach that can reduce set to a manageable size
Need to identify set of functional dependencies (X) for a relation that is
smaller than complete set of functional dependencies (Y) for that relation
and has property that every functional dependency in Y is implied by
functional dependencies in X
(D) Explain 4 NF with examples
Ans Normalization The process of decomposing unsatisfactory ldquobadrdquo relations by breaking up
their attributes into smaller relationsThe normal form of a relation refers to the highest normal form
condition that a relation meets and indicates the degree to which it has been normalized
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
7 NF2 non-first normal form 8 1NF R is in 1NF iff all domain values are atomic2
9 2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
10 3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the
key 11 BCNF R is in BCNF iff every determinant is a candidate key
12 Determinant an attribute on which some other attribute is fully functionally dependent Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on something other than a superset of a candidate key A table is said to be in 4NF if and
only if it is in the BCNF and multi-valued dependencies are functional dependencies The 4NF
removes unwanted data structures multi-valued dependencies
There is no Multivalued dependency in the relation
There are Multivalued dependency but the attributes are dependent between themselves
Either of these conditions must hold true in order to be fourth normal form
The relation must also be in BCNF Fourth normal form differs from BCNF only in that it uses
Multivalued dependencies
Q5
Either
(A) What are object oriented database systems What are its features
Ans Object databases are a niche field within the broader DBMS market dominated by relational
database management systems (RDBMS) Object databases have been considered since the early 1980s
and 1990s but they have made little impact on mainstream commercial data proc
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
C) How database recovery it done Discuss its different types
Ans SQL Server database recovery models give you backup-and-restore flexibility The model used will determine how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs
System breakdowns happen all the time even to the best configured systems This is why you have to
explore the options available in order to prepare for the worst
SQL server database recovery can be easier achieved if you are running on at least the SQL server 2000
It has a built in feature known as the database recovery model that controls the following
Both the speed and size of your transaction log backups The degree to which you might be at risk of losing committed transactions in the event of
media failure
Models
There are three types of database recovery models available
Full Recovery Bulk Logged Recovery
Simple Recovery
Full Recovery
This is your best guarantee for full data recovery The SQL Server fully logs all operations so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log When data files are lost because of media failure the transaction log can be backed up
Database restoration up to any specified time can be achieved after media failure for a database
file has occurred If your log file is available after the failure you can restore up to the last
transaction committed Log Marks feature allows you to place reference points in the transaction log that allow you to
recover a log mark
Logs CREATE INDEX operations Recovery from a transaction log backup that includes index
creations is done at a faster pace because the index does not have to be rebuilt
Bulk Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the
least log space for certain bulk operations including BULK INSERT bcp CREATE INDEX
WRITETEXT and UPDATETEXT
Simple Recovery Model It allows for the fastest bulk operations and the simplest backup-and-restore strategy Under this model
SQL Server truncates the transaction log at regular intervals removing committed transactions
(d) Describe deadlocks in a distributed system.
Ans:
A deadlock occurs when a set of transactions wait for one another in a cycle: each transaction holds a lock that the next transaction in the cycle needs, so none of them can ever proceed. In a distributed system the transactions, and the data items they lock, may reside at different sites, so no single site's local wait-for graph necessarily reveals the cycle.
Common approaches to handling deadlocks in a distributed system:
- Detection: each site maintains a local wait-for graph; a global (centralized or hierarchically combined) wait-for graph is constructed, and a cycle in it indicates a deadlock. One transaction in the cycle is chosen as the victim and rolled back.
- Prevention: schemes such as wait-die and wound-wait use transaction timestamps to decide whether a requesting transaction waits or is rolled back, so that a waiting cycle can never form.
- Timeouts: a transaction that has waited longer than a threshold is assumed to be deadlocked and is rolled back. This is simple to implement but may roll back transactions that are not actually deadlocked.
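As a concrete sketch of the detection approach, a deadlock shows up as a cycle in the wait-for graph built from the (local or combined global) lock tables. The transaction names and edges below are hypothetical illustration data, not from the source:

```python
# Wait-for graph: each transaction maps to the transactions it is waiting on.
# T1 -> T2 -> T3 -> T1 forms a cycle, i.e. a deadlock; T4 waits on nobody.
waits_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": []}

def has_deadlock(graph):
    """Return True iff the wait-for graph contains a cycle (a deadlock)."""
    state = {}  # node -> 1 (on current DFS path) or 2 (fully explored)

    def visit(node):
        if state.get(node) == 1:      # back edge to the current path: cycle found
            return True
        if state.get(node) == 2:      # already explored, no cycle through here
            return False
        state[node] = 1
        if any(visit(n) for n in graph.get(node, [])):
            return True
        state[node] = 2
        return False

    return any(visit(t) for t in graph)

assert has_deadlock(waits_for)
assert not has_deadlock({"T1": ["T2"], "T2": []})
```

The chosen victim would then be one transaction on the detected cycle, typically the one with the least work done so far.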
(c) Illustrate the construction of secondary key retrieval with a suitable example.
Ans: For sequential files, index-sequential files and direct files, we have considered the retrieval and update of data based on the primary key.
(i) We can also retrieve and update data based on a secondary key; this is called secondary key retrieval.
(ii) In secondary key retrieval, there may be multiple records satisfying a given key value.
(iii) For example, if we search a student file on the attribute "stud_name", we can get the set of records which satisfy the given value.
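A minimal sketch of a secondary index on a non-unique attribute, following the "stud_name" example above (the records themselves are hypothetical): unlike a primary index, one key value maps to a set of records.

```python
from collections import defaultdict

# Hypothetical student file; roll_no is the primary key, stud_name is not unique.
students = [
    {"roll_no": 1, "stud_name": "Asha"},
    {"roll_no": 2, "stud_name": "Ravi"},
    {"roll_no": 3, "stud_name": "Asha"},
]

# Secondary index: stud_name -> list of primary keys of ALL matching records.
index = defaultdict(list)
for rec in students:
    index[rec["stud_name"]].append(rec["roll_no"])

# Retrieval by secondary key returns a set of records, not a single record.
assert index["Asha"] == [1, 3]
```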
(D) Define the following terms:
(i) Specialization: the process of defining one or more subclasses of an entity type, each grouping entities that share distinguishing attributes or relationships (e.g. EMPLOYEE specialized into SECRETARY and ENGINEER).
(ii) Association: a logical connection between two otherwise independent objects or entity types, typically realized as a relationship between them.
(iii) Relationship: an association among two or more entities (e.g. a STUDENT enrolls in a COURSE).
(iv) Aggregation: an abstraction in which a relationship between entities is treated as a higher-level entity, so that it can itself participate in further relationships.
QUE 3- EITHER
(A) Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give the equivalent QBE expression for each of the following queries:
(i) r1 ∪ r2
(ii) r1 ∩ r2
(iii) r1 − r2
QUE 4- EITHER
(A) What is a join dependency? Discuss 5NF.
Ans: Join Dependencies (JD)
A join dependency can be described as follows: if a table can be decomposed into three or more smaller tables, it must be capable of being joined again on common keys to form the original table.
A table is in fifth normal form (5NF), or Projection-Join Normal Form (PJNF), if it is in 4NF and it cannot have a lossless decomposition into any number of smaller tables.
Another way of expressing this is: every join dependency is a consequence of the candidate keys.
It can also be expressed as: there are no pairwise cyclical dependencies in a primary key comprised of three or more attributes.
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence: if a relation cannot be decomposed any further, then it is in 5NF.
Pairwise cyclical dependency means that:
- you always need to know two values (pairwise);
- for any one value, you must know the other two (cyclical).
Example: Buying(buyer, vendor, item)
This is used to track buyers, what they buy, and from whom they buy.
Take the following sample data:

buyer | vendor        | item
------+---------------+---------
Sally | Liz Claiborne | Blouses
Mary  | Liz Claiborne | Blouses
Sally | Jordach       | Jeans
Mary  | Jordach       | Jeans
Sally | Jordach       | Sneakers
The question is: what do you do if Claiborne starts to sell Jeans? How many records must you create to record this fact?
The problem is that there are pairwise cyclical dependencies in the primary key. That is, in order to determine the item you must know the buyer and vendor; to determine the vendor you must know the buyer and the item; and to know the buyer you must know the vendor and the item. The solution is to break this one table into three tables: Buyer-Vendor, Buyer-Item and Vendor-Item.
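The decomposition above can be sketched with Python's built-in sqlite3 module (the table and column names follow the example; the code itself is illustrative, not from the source). It shows that the three pairwise projections join back losslessly to the original relation, and that the "Claiborne starts to sell Jeans" fact then needs only one new row:

```python
import sqlite3

# Sample data from the Buying(buyer, vendor, item) example.
rows = [
    ("Sally", "Liz Claiborne", "Blouses"),
    ("Mary",  "Liz Claiborne", "Blouses"),
    ("Sally", "Jordach",       "Jeans"),
    ("Mary",  "Jordach",       "Jeans"),
    ("Sally", "Jordach",       "Sneakers"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE buying (buyer TEXT, vendor TEXT, item TEXT)")
con.executemany("INSERT INTO buying VALUES (?, ?, ?)", rows)

# The three smaller tables are just projections of the original.
con.execute("CREATE TABLE bv AS SELECT DISTINCT buyer, vendor FROM buying")
con.execute("CREATE TABLE bi AS SELECT DISTINCT buyer, item   FROM buying")
con.execute("CREATE TABLE vi AS SELECT DISTINCT vendor, item  FROM buying")

# Natural join of the three projections on their common columns.
rejoined = con.execute("""
    SELECT DISTINCT bv.buyer, bv.vendor, bi.item
    FROM bv
    JOIN bi ON bi.buyer  = bv.buyer
    JOIN vi ON vi.vendor = bv.vendor AND vi.item = bi.item
""").fetchall()

# Lossless: the join dependency holds, so nothing is lost or invented.
assert set(rejoined) == set(rows)

# If Liz Claiborne starts selling Jeans, ONE insert into vi is enough;
# the join then derives the fact for every buyer automatically.
con.execute("INSERT INTO vi VALUES ('Liz Claiborne', 'Jeans')")
```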
(B) Explain the architecture of an IMS system.
Ans: Information Management System (IMS) is an IBM program product that is designed to support both batch and online application programs.
[Architecture diagram: each application program (host language + DL/I) accesses its logical databases through PCBs grouped into a PSB (PSB-A for Application A, PSB-B for Application B); the IMS control program maps these onto the physical databases defined by the DBDs.]
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined by a database description (DBD). The mapping of the physical database to storage is also given in the DBD. The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program.
All DBD names in IMS are limited to a maximum length of eight characters.
Example:
1  DBD   NAME=EDUCPDBD
2  SEGM  NAME=COURSE, BYTES=256
3  FIELD NAME=(COURSE,SEQ), BYTES=3, START=1
4  FIELD NAME=TITLE, BYTES=33, START=4
5  FIELD NAME=DESCRIPN, BYTES=220, START=37
6  SEGM  NAME=PREREQ, PARENT=COURSE, BYTES=36
7  FIELD NAME=(COURSE,SEQ), BYTES=3, START=1
8  FIELD NAME=TITLE, BYTES=33, START=4
9  SEGM  NAME=OFFERING, PARENT=COURSE, BYTES=20
10 FIELD NAME=(DATE,SEQ,M), BYTES=6, START=1
11 FIELD NAME=LOCATION, BYTES=12, START=7
12 FIELD NAME=FORMAT, BYTES=2, START=19
13 SEGM  NAME=TEACHER, PARENT=OFFERING, BYTES=24
14 FIELD NAME=(EMP,SEQ), BYTES=6, START=1
15 FIELD NAME=NAME, BYTES=18, START=7
16 SEGM  NAME=STUDENT, PARENT=OFFERING, BYTES=25
17 FIELD NAME=(EMP,SEQ), BYTES=6, START=1
18 FIELD NAME=NAME, BYTES=18, START=7
19 FIELD NAME=GRADE, BYTES=1, START=25
External View
The user does not operate directly at the physical database level, but rather on an "external view" of the data. A particular user's external view consists of a collection of "logical databases", where each logical database is a subset of the corresponding physical database. Each logical database is defined by means of a program communication block (PCB). The set of all PCBs for one user, corresponding to the external schema plus the associated mapping definition, is called the program specification block (PSB).
PCB (Program Communication Block): Each logical database is defined by a program communication block (PCB). The PCB includes a specification of the mapping between the LDB and the corresponding PDB.
PSB (Program Specification Block): The set of all PCBs for a given user forms that user's program specification block (PSB).
Example:
1 PCB    TYPE=DB, DBDNAME=EDUCPDBD, KEYLEN=15
2 SENSEG NAME=COURSE, PROCOPT=G
3 SENSEG NAME=OFFERING, PARENT=COURSE, PROCOPT=G
4 SENSEG NAME=STUDENT, PARENT=OFFERING, PROCOPT=G
PROCOPT: The PROCOPT entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other possible values are I ("insert"), R ("replace") and D ("delete").
Internal View
The users are ordinary application programmers, using a host language from which the IMS data manipulation language DL/I ("Data Language/I") may be invoked by subroutine call. End-users are supported via user-written online application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: A possible key. Each non-key field is functionally dependent on every candidate key, and no attribute in the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization:
- they have a 1:1 relationship between the attribute(s) on the left- and right-hand sides of the dependency;
- they hold for all time;
- they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation, and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
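A small illustrative check of whether a functional dependency holds in a given relation instance (the sample relation and attribute names are hypothetical): X → Y fails exactly when two rows agree on X but differ on Y.

```python
def holds(rows, lhs, rhs):
    """Return True iff the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts (one per tuple); lhs, rhs: tuples of attribute names.
    """
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two rows agree on lhs but differ on rhs
    return True

# Hypothetical sample relation: staff(emp, dept, manager).
staff = [
    {"emp": "e1", "dept": "sales", "manager": "m1"},
    {"emp": "e2", "dept": "sales", "manager": "m1"},
    {"emp": "e3", "dept": "hr",    "manager": "m2"},
]

assert holds(staff, ("emp",), ("dept",))      # emp determines dept
assert holds(staff, ("dept",), ("manager",))  # dept determines manager
assert not holds(staff, ("dept",), ("emp",))  # dept does NOT determine emp
```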
(D) Explain 4NF with examples.
Ans: Normalization: the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal form condition that the relation meets, and indicates the degree to which it has been normalized.
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normal forms up to 3NF, BCNF or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties.
As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multivalued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and every multivalued dependency in it is in fact a functional dependency. 4NF thus removes an unwanted data structure: multivalued dependencies.
For a relation to be in fourth normal form, one of these conditions must hold:
- there is no multivalued dependency in the relation; or
- there are multivalued dependencies, but the attributes involved are dependent between themselves.
The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it also considers multivalued dependencies.
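A small sketch of the classic 4NF situation (the course/teacher/book data are hypothetical): for each course, teachers and books vary independently, so the (teacher, book) pairs form a full cross product. Detecting that cross-product shape is detecting the multivalued dependency, and projecting onto (course, teacher) and (course, book) removes the redundancy losslessly.

```python
# Relation teaches(course, teacher, book) with course ->> teacher and
# course ->> book: every teacher of a course uses every book of that course.
rows = {
    ("db", "smith", "date"),
    ("db", "smith", "ullman"),
    ("db", "jones", "date"),
    ("db", "jones", "ullman"),
}

def mvd_holds(triples):
    """True iff, per course, the (teacher, book) pairs form a cross product."""
    by_course = {}
    for c, t, b in triples:
        by_course.setdefault(c, set()).add((t, b))
    for pairs in by_course.values():
        teachers = {t for t, _ in pairs}
        books = {b for _, b in pairs}
        if pairs != {(t, b) for t in teachers for b in books}:
            return False
    return True

assert mvd_holds(rows)

# 4NF decomposition into two binary projections.
ct = {(c, t) for c, t, _ in rows}
cb = {(c, b) for c, _, b in rows}

# The natural join of the projections reconstructs the original: lossless.
rejoined = {(c, t, b) for (c, t) in ct for (c2, b) in cb if c == c2}
assert rejoined == rows
```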
Q5
Either
(A) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s, but they have made little impact on mainstream commercial data processing.
Features of object-oriented database systems:
- Most object databases offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
- Access to data can be faster because joins are often not needed (as they are in a tabular implementation of a relational database): an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
- Another area of variation between products is the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
- Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
- Many object databases, for example VOSS, offer support for versioning. An object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
- The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account information and efficiently provide extensive information such as transactions and account entries.
Mary Jordach Jeans
Sally Jordach Sneakers
The question is what do you do if Claiborne starts to sell Jeans How many records must you create to
record this fact
The problem is there are pairwise cyclical dependencies in the primary key That is in order to determine
the item you must know the buyer and vendor and to determine the vendor you must know the buyer and
the item and finally to know the buyer you must know the vendor and the item The solution is to break
this one table into three tables Buyer-Vendor Buyer-Item and Vendor-Item
(B) Explain the architecture of an IMS System
Ans Information Management system (IMS) is an IBM program product that is designed to support
both batch and online application programs
Host Language
+
DLI
Host Language
+
DLI
PCB PCB
DBD DBD DBD DBD DBD DBD hellip
IMS
Control
program
PCB PCB
PSB-B PSB-A
Application A Application B
Conceptual View
The conceptual view consists of collection of physical database The ldquophysicalrdquo is somewhat
misleading in this context since the user does not see such a database exactly as it is stored indeed
IMS provides a fairely high degree of insulation of the user from the storage structure Each physical
database is defined by a database description (DBD) The mapping of the physical database to storage
is also DBDrsquos corresponds to the conceptual schema plus the associated conceptualinternal mapping
definition
DBD (Database Description) Each physical databse is defined together with its mapping to
storage by a databse description (DBD) The source form of the DBD is written using a special
System370 Assembler Language macro statements Once written the DBD is assembled and the
object form is stored in a system library from which it may be extracted when required by the IMS
control program
All names of DBDrsquos in IMS are limited to a maximum length of eight characters
Example
1 DBD NAMEEDUCPDBD
2 SEGM NAME=COURSEBYTES=256
3 FILED NAME=(COURSESEQ)BYTES=3START=1 4 FIELD NAME=TITLE BYTES=33START=4
5 FIELD NAME=DESCRIPNBYETS=220START=37
6 SEGM NAME=PREREQPARENT=COURSEBYTES=36 7 FILED NAME=(COURSESEQ)BYTES=3START=1
8 FIELD NAME=TITLE BYTES=33START=4
9 SEGM NAME=OFFERINGPARENT=COURSEBYTES=20 10 FILED NAME=(DATESEQM)BYTES=6START=1
11 FIELD NAME=LOCATION BYTES=12START=7
12 FIELD NAME=FORMATBYETS=2START=19 13 SEGM NAME=TEACHERPARENT=OFFERINGBYTES=24
14 FIELD NAME=(EMPSEQ) BYTES=6START=1
15 FIELD NAME=NAMEBYETS=18START=7
16 SEGM NAME=STUDENTPARENT=OFFERINGBYTES=25
17 FILED NAME=(EMPSEQ)BYTES=6START=1
18 FIELD NAME=NAME BYTES=18START=7
19 FIELD NAME=GRADEBYTES=1START=25
External View
The user does not operate directly at the physical database level but rather on an ldquoexternal viewrdquo of
the data A particular userrsquos external view consists of a collection of ldquological databasesrdquo where each
logical database is a subset of the corresponding physical database Each logical database is defined
by means of a program communication block (PCB) The set of all PCBrsquos for one user corresponding
to the external schema plus the associated mapping definition is called program specification block
(PSB)
PCB Program Communication BLOCK Each logical Database is defined by a program
communication block (PCB) The PCB includes a specification of the mapping between the LDB and
the corresponding PDB
PSB Program Specification BLOCK The set of all PCBrsquos for a given user forms that userrsquos
program specification block (PSB)
Example
1 PCB TYPE=DBDBNAME=EDUCPDBDKEYLEN=15
2 SENSEG NAME=COURSEPROCOPT=G 3 SENSEG NAME=OFFERINGPARENT=COURSEPROCOPT=G
4 SENSEG NAME=STUDENTPARENT=OFFERINGPROCOPT=G
PROCOPT The PROCOPT entry specifies the types of operation that the user will be permitting to
perform on this segment In this example the entry is G (ldquogetrdquo) indicating retrieval only Other
possible values are I(ldquoinsertrdquo) R(ldquoreplacerdquo) and D(ldquodeleterdquo)
Internal View
The users are ordinary application programmers using a host language from which the IMS data manipulation language, DL/I ("Data Language/I"), may be invoked by subroutine call. End-users are supported via user-written on-line application programs; IMS does not provide an integrated query language.
OR
(C) Explain the following:
(i) Functional dependency
Functional Dependency: the value of one attribute (the determinant) determines the value of another attribute.
Candidate Key: a possible key. Each non-key field is functionally dependent on every candidate key, and no attribute of the key can be deleted without destroying the property of unique identification.
Main characteristics of the functional dependencies used in normalization: they have a 1:1 relationship between the attribute(s) on the left-hand and right-hand sides of the dependency, they hold for all time, and they are nontrivial.
The complete set of functional dependencies for a given relation can be very large, so it is important to find an approach that can reduce the set to a manageable size. We need to identify a set of functional dependencies (X) for a relation that is smaller than the complete set of functional dependencies (Y) for that relation and that has the property that every functional dependency in Y is implied by the functional dependencies in X.
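The definition above, that the determinant determines the dependent attribute, can be checked mechanically on sample data. A minimal Python sketch (the relation and attribute names are invented for illustration):

```python
# Illustrative sketch (data invented): testing whether a proposed
# functional dependency X -> Y holds in a sample relation.

def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same determinant value, different dependent value
    return True

staff = [
    {"emp_no": 1, "name": "Asha",  "dept": "CS", "dept_head": "Rao"},
    {"emp_no": 2, "name": "Ravi",  "dept": "CS", "dept_head": "Rao"},
    {"emp_no": 3, "name": "Meena", "dept": "EE", "dept_head": "Iyer"},
]

print(fd_holds(staff, ["emp_no"], ["name", "dept"]))  # True: emp_no is a determinant
print(fd_holds(staff, ["dept"], ["dept_head"]))       # True: dept -> dept_head
print(fd_holds(staff, ["dept"], ["name"]))            # False: CS maps to two names
```

Note that a check against sample data can only refute a dependency; whether an FD truly holds "for all time" is a statement about the meaning of the data, not about one extension of the relation.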
(D) Explain 4NF with examples.
Ans: Normalization is the process of decomposing unsatisfactory ("bad") relations by breaking up their attributes into smaller relations. The normal form of a relation refers to the highest normal-form condition that the relation meets, and indicates the degree to which it has been normalized. Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
Normalization in industry pays particular attention to normalization up to 3NF, BCNF, or 4NF; we will pay particular attention up to 3NF. Database designers need not normalize to the highest possible normal form.
Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between its attributes. It is often executed as a series of steps, where each step corresponds to a specific normal form with known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
NF2: non-first normal form.
1NF: R is in 1NF iff all domain values are atomic.
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key.
BCNF: R is in BCNF iff every determinant is a candidate key.
Determinant: an attribute on which some other attribute is fully functionally dependent.
Fourth Normal Form
Fourth normal form (4NF) requires that there are no non-trivial multi-valued dependencies of attribute sets on anything other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and every non-trivial multi-valued dependency is also a functional dependency; 4NF thus removes an unwanted data structure, the multi-valued dependency.
For a relation to be in fourth normal form, one of these conditions must hold: either there is no multi-valued dependency in the relation, or every multi-valued dependency in it is also a functional dependency. The relation must also be in BCNF. Fourth normal form differs from BCNF only in that it additionally considers multi-valued dependencies.
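As an illustrative example (the data are invented), consider the classic Course-Teacher-Text relation: the teachers of a course and the texts used for it are independent multi-valued facts, so the relation violates 4NF and should be decomposed. A Python sketch of the decomposition and its lossless re-join:

```python
# Illustrative sketch (data invented): the CTX relation stores two
# independent multi-valued facts, COURSE ->> TEACHER and COURSE ->> TEXT,
# so every teacher must be paired with every text -- a 4NF violation.
ctx = {
    ("Physics", "Prof. Green", "Mechanics"),
    ("Physics", "Prof. Green", "Optics"),
    ("Physics", "Prof. Brown", "Mechanics"),
    ("Physics", "Prof. Brown", "Optics"),
}

# 4NF decomposition: project each independent fact into its own relation.
course_teacher = {(c, t) for c, t, _ in ctx}
course_text    = {(c, x) for c, _, x in ctx}

# The natural join of the projections reconstructs CTX losslessly.
rejoined = {(c, t, x)
            for (c, t) in course_teacher
            for (c2, x) in course_text if c == c2}
assert rejoined == ctx
print(sorted(course_teacher))  # [('Physics', 'Prof. Brown'), ('Physics', 'Prof. Green')]
```

After decomposition, adding a new text for a course means inserting one row into course_text instead of one row per teacher, which is exactly the update anomaly 4NF removes.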
Q5
Either
(A) What are object oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational database management systems (RDBMS). Object databases have been considered since the early 1980s and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems:
Most object databases offer some kind of query language, allowing objects to be found through a more declarative programming approach. It is in the area of object query languages, and the integration of the query and navigational interfaces, that the biggest differences between products are found. An attempt at standardization was made by the ODMG with the Object Query Language, OQL.
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database): an object can be retrieved directly, without a search, by following pointers. (It could, however, be argued that joining is a higher-level abstraction of pointer following.)
Another area of variation between products is the way the schema of a database is defined. A general characteristic, however, is that the programming language and the database schema use the same type definitions.
Multimedia applications are facilitated because the class methods associated with the data are responsible for its correct interpretation.
Many object databases, for example VOSS, offer support for versioning: an object can be viewed as the set of all its versions, and object versions can be treated as objects in their own right. Some object databases also provide systematic support for triggers and constraints, which are the basis of active databases.
The efficiency of such a database is also greatly improved in areas which demand massive amounts of data about one item. For example, a banking institution could retrieve a user's account object and efficiently provide extensive information such as transactions and account entries.
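The "pointer following instead of joins" point above can be sketched in a few lines of Python (the classes and data are invented; a real object database would persist such objects transparently):

```python
# Hypothetical sketch: in an object database, related objects are reached by
# following object references rather than by joining tables on key columns.
class Transaction:
    def __init__(self, amount):
        self.amount = amount

class Account:
    def __init__(self, number, balance):
        self.number = number
        self.balance = balance
        self.transactions = []  # direct references to Transaction objects

acct = Account("SB-1001", 5000)
acct.transactions.append(Transaction(-200))
acct.transactions.append(Transaction(1500))

# Navigational access: no join, just follow the pointers from the account.
print(sum(t.amount for t in acct.transactions))  # 1300
```

In a relational schema the same query would join an accounts table to a transactions table on the account number; here the relationship is stored as a reference inside the object itself.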
C) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used determines how much time and space your backups will take and how great your risk of data loss will be when a breakdown occurs. System breakdowns happen all the time, even to the best-configured systems, which is why you have to explore the options available in order to prepare for the worst.
SQL Server database recovery is more easily achieved if you are running at least SQL Server 2000, which has a built-in feature known as the database recovery model that controls the following: the speed and size of your transaction log backups, and the degree to which you might be at risk of losing committed transactions in the event of media failure.
Models
There are three types of database recovery models available: Full Recovery, Bulk-Logged Recovery, and Simple Recovery.
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the transaction log, and when data files are lost because of media failure the transaction log can be backed up.
Database restoration up to any specified point in time can be achieved after a media failure for a database file has occurred; if your log file is available after the failure, you can restore up to the last committed transaction. The Log Marks feature allows you to place reference points in the transaction log and recover to a log mark.
Full recovery also logs CREATE INDEX operations, so recovery from a transaction log backup that includes index creation is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery Model
This model allows for recovery in case of media failure and gives you the best performance using the least log space for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX, WRITETEXT, and UPDATETEXT.
Simple Recovery Model
This model allows for the fastest bulk operations and the simplest backup-and-restore strategy. Under this model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
(d) Describe deadlocks in a distributed system.
Ans:
Conceptual View
The conceptual view consists of a collection of physical databases. The term "physical" is somewhat misleading in this context, since the user does not see such a database exactly as it is stored; indeed, IMS provides a fairly high degree of insulation of the user from the storage structure. Each physical database is defined, together with its mapping to storage, by a database description (DBD). The set of all DBDs corresponds to the conceptual schema plus the associated conceptual/internal mapping definition.
DBD (Database Description): each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler Language macro statements. Once written, the DBD is assembled, and the object form is stored in a system library from which it may be extracted when required by the IMS control program. All DBD names in IMS are limited to a maximum length of eight characters.
No attribute in the key can be deleted without destroying the property of
unique identification
Main characteristics of functional dependencies used in
normalization
have a 11 relationship between attribute(s) on left and right-hand side of
a dependency hold for all time are nontrivial
Complete set of functional dependencies for a given relation can be very
large
Important to find an approach that can reduce set to a manageable size
Need to identify set of functional dependencies (X) for a relation that is
smaller than complete set of functional dependencies (Y) for that relation
and has property that every functional dependency in Y is implied by
functional dependencies in X
(D) Explain 4 NF with examples
Ans Normalization The process of decomposing unsatisfactory ldquobadrdquo relations by breaking up
their attributes into smaller relationsThe normal form of a relation refers to the highest normal form
condition that a relation meets and indicates the degree to which it has been normalized
Normalization is carried out in practice so that the resulting designs are of high quality and meet the
desirable properties
Normalization in industry pays particular attention to normalization up to 3NF BCNF or 4NF
We will pay particular attention up to 3NF
The database designers need not normalize to the highest possible normal form
Formal technique for analyzing a relation based on its primary key and functional dependencies
between its attributes
Often executed as a series of steps Each step corresponds to a specific normal form which has
known properties
As normalization proceeds relations become progressively more restricted (stronger) in format and
also less vulnerable to update anomalies
7 NF2 non-first normal form 8 1NF R is in 1NF iff all domain values are atomic2
9 2NF R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
10 3NF R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the
key 11 BCNF R is in BCNF iff every determinant is a candidate key
12 Determinant an attribute on which some other attribute is fully functionally dependent Fourth Normal Form
Fourth normal form (or 4NF) requires that there are no non-trivial multi-valued dependencies of
attribute sets on something other than a superset of a candidate key A table is said to be in 4NF if and
only if it is in the BCNF and multi-valued dependencies are functional dependencies The 4NF
removes unwanted data structures multi-valued dependencies
There is no Multivalued dependency in the relation
There are Multivalued dependency but the attributes are dependent between themselves
Either of these conditions must hold true in order to be fourth normal form
The relation must also be in BCNF Fourth normal form differs from BCNF only in that it uses
Multivalued dependencies
Q5
Either
(a) What are object-oriented database systems? What are their features?
Ans: Object databases are a niche field within the broader DBMS market, which is dominated by relational
database management systems (RDBMS). Object databases have been studied since the early 1980s
and 1990s, but they have made little impact on mainstream commercial data processing.
Features of object oriented database systems
Most object databases also offer some kind of query language allowing objects to be found by a more declarative programming approach It is in the area of object query languages and the integration of the
query and navigational interfaces that the biggest differences between products are found An attempt at
standardization was made by the ODMG with the Object Query Language OQL
Access to data can be faster because joins are often not needed (as in a tabular implementation of a relational database) This is because an object can be retrieved directly without a search by following
pointers (It could however be argued that joining is a higher-level abstraction of pointer following)
Another area of variation between products is in the way that the schema of a database is defined A
general characteristic however is that the programming language and the database schema use the same
type definitions
Multimedia applications are facilitated because the class methods associated with the data are responsible
for its correct interpretation
Many object databases for example VOSS offer support for versioning An object can be viewed as the
set of all its versions Also object versions can be treated as objects in their own right Some object
databases also provide systematic support for triggers and constraints which are the basis of active
databases
The efficiency of such a database is also greatly improved in areas which demand massive amounts of
data about one item For example a banking institution could get the users account information and
provide them efficiently with extensive information such as transactions account information entries etc
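The "no joins needed" point above can be sketched in plain Python (the Account and Customer classes are invented for illustration): once an object holds direct references to its related objects, retrieval is simple pointer following, whereas the relational style resolves a key by searching a table of rows:

```python
# Sketch: object navigation via direct references, versus the key-based
# lookup that a relational join performs. The classes are hypothetical.

class Account:
    def __init__(self, number, balance):
        self.number = number
        self.balance = balance

class Customer:
    def __init__(self, name, accounts):
        self.name = name
        self.accounts = accounts  # direct object references, not foreign keys

alice = Customer("Alice", [Account("A-1", 100), Account("A-2", 250)])

# Object style: follow the pointers -- no search required.
total = sum(a.balance for a in alice.accounts)

# Relational style: resolve a foreign key by scanning a table of rows.
accounts_table = [("A-1", "Alice", 100), ("A-2", "Alice", 250)]
total_join = sum(bal for num, owner, bal in accounts_table if owner == "Alice")

print(total, total_join)  # both 350
```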
(c) How is database recovery done? Discuss its different types.
Ans: SQL Server database recovery models give you backup-and-restore flexibility. The model used
determines how much time and space your backups will take and how great your risk of data loss will
be when a breakdown occurs. System breakdowns happen all the time, even to the best-configured
systems, which is why you have to explore the available options in order to prepare for the worst.
Database recovery is easier to achieve if you are running at least SQL Server 2000, which has a built-in
feature known as the database recovery model. It controls:
- the speed and size of your transaction log backups, and
- the degree to which you are at risk of losing committed transactions in the event of media failure.
Models
There are three database recovery models available: Full Recovery, Bulk-Logged Recovery, and
Simple Recovery.
Full Recovery
This is your best guarantee of full data recovery. SQL Server fully logs all operations, so every row
inserted through a bulk copy program (bcp) or BULK INSERT operation is written in its entirety to the
transaction log. When data files are lost because of media failure, the transaction log can be backed up.
- The database can be restored up to any specified point in time after a media failure. If the log file is
available after the failure, you can restore up to the last committed transaction.
- The log-marks feature lets you place reference points in the transaction log so that you can recover
to a named mark.
- CREATE INDEX operations are logged, so recovery from a transaction log backup that includes index
creations is faster because the index does not have to be rebuilt.
Bulk-Logged Recovery
This model allows for recovery in case of media failure and gives the best performance, using the
least log space, for certain bulk operations, including BULK INSERT, bcp, CREATE INDEX,
WRITETEXT, and UPDATETEXT.
Simple Recovery
This model allows the fastest bulk operations and the simplest backup-and-restore strategy. Under this
model, SQL Server truncates the transaction log at regular intervals, removing committed transactions.
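The effect of the simple model on the log can be sketched with a toy write-ahead log (this illustrates the truncation idea only; it is not SQL Server's actual implementation): at each checkpoint, records belonging to committed transactions are discarded, so the log stays small but cannot be used for point-in-time restores:

```python
# Toy sketch of transaction-log truncation under the simple recovery
# model: committed transactions are dropped from the log at checkpoints.
# Illustrative only; not SQL Server's real log implementation.

log = []            # write-ahead log: (txn_id, operation) records
committed = set()   # transaction ids known to be committed

def write(txn, op):
    log.append((txn, op))

def commit(txn):
    committed.add(txn)

def checkpoint():
    """Simple model: truncate the records of committed transactions."""
    global log
    log = [(t, op) for t, op in log if t not in committed]

write(1, "INSERT row A")
write(2, "UPDATE row B")
commit(1)
checkpoint()  # txn 1's records are truncated away

print(log)  # only the uncommitted txn 2 remains: [(2, 'UPDATE row B')]
```

Under the full model, by contrast, those committed records would be kept until a log backup, which is what makes point-in-time restore possible.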
(d) Describe deadlocks in a distributed system.
Ans: