dbms 6 er modeling

Upload: rahul

Post on 18-Nov-2014

©Silberschatz, Korth and Sudarshan, Database System Concepts

Entity-Relationship Model

HOW TO START MODELING

ENTERPRISE MODEL

BUSINESS PROCESS MODEL

REQUIREMENT OF INFORMATION

RELATIONSHIP AMONG INFORMATION

DECIDE ON THE TYPE OF MODEL

HOW TO CONSTRUCT A HOUSE

MOBILISE THE FINANCES

SELECT A PLACE/SITE & BUY IT

REQUIREMENT OF MATERIALS AS PER PLAN

HIRING OF LABOUR FORCE

COMPLETE CONSTRUCTION & OCCUPY

PUBLISHING A BOOK

DECIDE THE SUBJECT ON WHICH YOU ARE GOING TO WRITE A BOOK AND PUBLISH.

COLLECT ALL RELEVANT DATA FROM VARIOUS SOURCES

ORGANISE THE SUBJECT CHAPTERWISE & SEQUENCE THE CHAPTERS.

DEVELOP THE SUBJECT IN EACH CHAPTER

ADD ILLUSTRATIONS, EXAMPLES, QUOTES

SUMMARY, REVIEW QUESTIONS, SELF STUDY WORK, CASE STUDY AT THE END OF EACH CHAPTER

ONCE ALL THE CHAPTERS ARE READY, INDEX/CONTENT

PREFACE, APPENDICES, BIBLIOGRAPHY, REFERENCES

NAME INDEX, SUBJECT INDEX

PUBLISHING AGENCY, PRICING, MARKETING

WHAT IS DATA MODELING

It is a conceptual framework which defines the logical relationships among the data elements needed to support a basic business process or other activities.

It is the underlying structure of a database.

It is a map or diagram that represents entities and their relationships.

A collection of tools for describing data, data relationships, data semantics and data constraints.

What is database all about

A database model is defined as a combination of the following components:

A collection of data object types, which form the basic building blocks for any database that conforms to the model.

A collection of general integrity rules, which constrain the set of occurrences of these object types that can legally appear in any such database.

A collection of operators, which can be applied to such object occurrences for retrieval and other purposes.

TYPES OF MODELS AVAILABLE

OBJECT BASED LOGICAL MODELS: Entity-relationship (ER) model, Object oriented model, Semantic data model, Functional data model

RECORD BASED LOGICAL MODELS: Relational model, Network model, Hierarchical model

PHYSICAL MODELS: Unifying model, Frame-memory model

What is this Object based Logical Model

Object based logical models are used in describing data at the logical and view levels.

They are characterised by the fact that they provide fairly flexible structuring capabilities.

They allow data constraints to be specified explicitly.

What is this ER model

Used at the design stage of the database.

It plays the same role as the flow chart in computer programming.

Just as flow charts use certain symbols, ER modeling uses its own set of symbols.

An entity is a thing that exists, living or otherwise, and that is distinguishable from other objects. It is described by some of its characteristics, called attributes.

Entity Sets

A database can be modeled as: a collection of entities, and relationships among entities.

An entity is an object that exists and is distinguishable from other objects. Example: specific person, company, event, plant

Entities have attributes. Example: people have names and addresses

An entity set is a set of entities of the same type that share the same properties. Example: set of all persons, companies, trees, holidays

Entity Relationship Model

Entities (objects). E.g. customers, accounts, bank branch

Relationships between entities. E.g. Account A-101 is held by customer Sridhar. Relationship set depositor associates customers with accounts

Widely used for database design

Database design in the E-R model is usually converted to a design in the relational model (coming up next), which is used for storage and processing

Entity Sets customer and loan

[Figure: sample entities of the entity sets customer (customer-id, customer-name, customer-street, customer-city) and loan (loan-number, amount). Values surviving from the figure: ANJU, PATTA, MARY, KUNDU, NISHU, MADAN, MUKHU, TIRUVALLA, CHE’CHERY, ANAIYUR, KESTOPUR, RANCHI, NEW DELHI, KOLKATA.]

Attributes

An entity is represented by a set of attributes, that is, descriptive properties possessed by all members of an entity set.

Domain: the set of permitted values for each attribute

Attribute types:

Simple and composite attributes.

Single-valued and multi-valued attributes. E.g. multivalued attribute: phone-numbers

Derived attributes: can be computed from other attributes. E.g. age, given date of birth

Example:

customer = (customer-id, customer-name, customer-street, customer-city)

loan = (loan-number, amount)
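
The attribute types can be sketched in Python. This is only an illustration and not part of the original slides: the class layout, the `phone_numbers` and `date_of_birth` fields and the `age` method are assumptions, added to show a multivalued and a derived attribute next to the simple attributes of customer and loan.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Customer:
    # Simple, single-valued attributes taken from the customer schema.
    customer_id: str
    customer_name: str
    customer_street: str
    customer_city: str
    # Multivalued attribute (hypothetical field): zero or more phone numbers.
    phone_numbers: List[str] = field(default_factory=list)
    # Hypothetical stored attribute from which age is derived.
    date_of_birth: Optional[date] = None

    def age(self, today: date) -> Optional[int]:
        """Derived attribute: computed from date_of_birth, never stored."""
        if self.date_of_birth is None:
            return None
        dob = self.date_of_birth
        had_birthday = (today.month, today.day) >= (dob.month, dob.day)
        return today.year - dob.year - (0 if had_birthday else 1)


@dataclass
class Loan:
    loan_number: str
    amount: float
```

Because `age` is computed on demand, it can never disagree with the stored date of birth, which is the usual reason for treating it as derived.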

Composite Attributes

Relationship Sets

A relationship is an association among several entities

Example: Sridhar (customer entity), depositor (relationship set), A-102 (account entity)

A relationship set is a mathematical relation among $n \geq 2$ entities, each taken from entity sets

$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$

where $(e_1, e_2, \ldots, e_n)$ is a relationship

Example:

$(\text{Sridhar}, \text{A-102}) \in \textit{depositor}$
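
As a rough sketch of this definition, a relationship set can be held as a Python set of tuples whose i-th component is drawn from the i-th entity set. The helper name `is_relationship_set` and the sample entities other than Sridhar and A-102 are assumptions made for the illustration.

```python
def is_relationship_set(relation, *entity_sets):
    """Check that every tuple takes its i-th component from the i-th entity set."""
    return all(
        len(tup) == len(entity_sets)
        and all(entity in es for entity, es in zip(tup, entity_sets))
        for tup in relation
    )


# Hypothetical entity sets; only Sridhar and A-102 come from the slide.
customer = {"Sridhar", "Anju"}
account = {"A-101", "A-102"}

# depositor is a subset of customer x account.
depositor = {("Sridhar", "A-102"), ("Anju", "A-101")}
```

Membership tests such as `("Sridhar", "A-102") in depositor` then correspond directly to the set notation used on the slide.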

Relationship Set borrower

[Figure: the relationship set borrower, linking entities of customer (customer-id, customer-name, customer-street, customer-city) to entities of loan (loan-number, amount). Values surviving from the figure: ANJU, PATTA, MARY, KUNDU, NISHU, MADAN, MUKHU, TIRUVALLA, CHE’CHERY, ANAIYUR, KESTOPUR, RANCHI, NEW DELHI, KOLKATA.]

Relationship Sets (Cont.)

An attribute can also be a property of a relationship set.

For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date

Degree of a Relationship Set

Refers to the number of entity sets that participate in a relationship set.

Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.

Relationship sets may involve more than two entity sets: Binary (Two), Ternary (Three), Quaternary (Four), Quinary (Five), Senary (Six) and so on.

Relationships between more than two entity sets are rare. Most relationships are binary. (More on this later.)

E.g. Suppose employees of a bank may have jobs (responsibilities) at multiple branches, with different jobs at different branches. Then there is a ternary relationship set between entity sets employee, job and branch

Mapping Cardinalities

Express the number of entities to which another entity can be associated via a relationship set.

Most useful in describing binary relationship sets.

For a binary relationship set the mapping cardinality must be one of the following types:

One to one

One to many

Many to one

Many to many
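
The four types can be told apart by counting how many partners each entity has. The sketch below classifies one instance of a binary relationship set given as (a, b) pairs; the function name is an assumption. Note that a mapping cardinality is really a constraint on every legal instance, so checking a single instance can only show which constraints that instance happens to satisfy.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify one instance of a binary relationship set from A to B."""
    pairs = set(pairs)
    partners_of_a = Counter(a for a, _ in pairs)  # number of B's per A entity
    partners_of_b = Counter(b for _, b in pairs)  # number of A's per B entity
    a_has_many = any(n > 1 for n in partners_of_a.values())
    b_has_many = any(n > 1 for n in partners_of_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"   # one A entity maps to many B entities
    if b_has_many:
        return "many to one"   # many A entities map to one B entity
    return "one to one"
```

Entities that appear in no pair are simply ignored, which matches the note that some elements of A and B may not be mapped at all.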

Mapping Cardinalities

One to one One to many

Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities

Many to one Many to many

Note: Some elements in A and B may not be mapped to any elements in the other set

Mapping Cardinalities affect ER Design

Can make access-date an attribute of account, instead of a relationship attribute, if each account can have only one customer

I.e., the relationship from account to customer is many to one, or equivalently, customer to account is one to many

E-R Diagrams

Rectangles represent entity sets.

Diamonds represent relationship sets.

Lines link attributes to entity sets and entity sets to relationship sets.

Ellipses represent attributes

Double ellipses represent multivalued attributes.

Dashed ellipses denote derived attributes.

Underline indicates primary key attributes (will study later)

E-R Diagram With Composite, Multivalued, and Derived Attributes

Relationship Sets with Attributes

Roles

Entity sets of a relationship need not be distinct

The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.

Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.

Role labels are optional, and are used to clarify semantics of the relationship

Cardinality Constraints

We express cardinality constraints by drawing either a directed line (→), signifying “one,” or an undirected line (─), signifying “many,” between the relationship set and the entity set.

E.g.: One-to-one relationship:

A customer is associated with at most one loan via the relationship borrower

A loan is associated with at most one customer via borrower

One-To-Many Relationship

In the one-to-many relationship a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower

Many-To-One Relationships

In a many-to-one relationship a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower

Many-To-Many Relationship

A customer is associated with several (possibly 0) loans via borrower

A loan is associated with several (possibly 0) customers via borrower

Participation of an Entity Set in a Relationship Set

Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set

E.g. participation of loan in borrower is total: every loan must have a customer associated to it via borrower

Partial participation: some entities may not participate in any relationship in the relationship set

E.g. participation of customer in borrower is partial
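
A small check makes the distinction concrete. In this sketch the function name, the sample customers and the loan numbers are all hypothetical; the test only inspects one instance, whereas the participation constraint itself is a rule that every instance must obey.

```python
def participation(entity_set, relationship_set, position):
    """Return 'total' if every entity appears in some relationship, else 'partial'.

    position is the index of the entity set's component within each tuple.
    """
    participating = {tup[position] for tup in relationship_set}
    return "total" if entity_set <= participating else "partial"


# Hypothetical sample data.
customers = {"Anju", "Mary", "Nishu"}
loans = {"L-17", "L-23"}
borrower = {("Anju", "L-17"), ("Mary", "L-23"), ("Anju", "L-23")}
```

Here every loan has a borrower, while the customer Nishu has no loan, so loan participates totally and customer only partially.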

Alternative Notation for Cardinality Limits

Cardinality limits can also express participation constraints

Keys

A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.

A candidate key of an entity set is a minimal super key

customer-id is candidate key of customer

account-number is candidate key of account

Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.
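
The relationship between super keys and candidate keys can be sketched by brute force over a small table. The function names and the sample rows are assumptions. A key is a constraint declared by the designer for all possible data, so a check like this can rule an attribute set out on the evidence of one instance but cannot prove that it is a key.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys of this instance, smallest attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical customer rows.
rows = [
    {"customer_id": "C1", "customer_name": "Anju", "customer_city": "Ranchi"},
    {"customer_id": "C2", "customer_name": "Anju", "customer_city": "Kolkata"},
    {"customer_id": "C3", "customer_name": "Mary", "customer_city": "Kolkata"},
]
```

In this sample, `customer_id` alone is a candidate key, and the pair (`customer_name`, `customer_city`) happens to be unique as well, which shows why the designer's knowledge of the enterprise, not the data at hand, should decide the primary key.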

Keys for Relationship Sets

The combination of primary keys of the participating entity sets forms a super key of a relationship set.

(customer-id, account-number) is the super key of depositor

NOTE: this means a pair of entities can have at most one relationship in a particular relationship set.

E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for each access. We can use a multivalued attribute though

Must consider the mapping cardinality of the relationship set when deciding what the candidate keys are

Need to consider semantics of relationship set in selecting the primary key in case of more than one candidate key

E-R Diagram with a Ternary Relationship

Cardinality Constraints on Ternary Relationship

We allow at most one arrow out of a ternary (or greater degree) relationship to indicate a cardinality constraint

E.g. an arrow from works-on to job indicates each employee works on at most one job at any branch.

If there is more than one arrow, there are two ways of defining the meaning.

E.g. a ternary relationship R between A, B and C with arrows to B and C could mean

1. each A entity is associated with a unique entity from B and C, or

2. each pair of entities from (A, B) is associated with a unique C entity, and each pair (A, C) is associated with a unique B

Each alternative has been used in different formalisms

To avoid confusion we outlaw more than one arrow

Binary Vs. Non-Binary Relationships

Some relationships that appear to be non-binary may be better represented using binary relationships

E.g. A ternary relationship parents, relating a child to his/her father and mother, is best replaced by two binary relationships, father and mother

Using two binary relationships allows partial information (e.g. only mother being known)

But there are some relationships that are naturally non-binary

E.g. works-on

Converting Non-Binary Relationships to Binary Form

In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.

Replace R between entity sets A, B and C by an entity set E, and three relationship sets:

1. RA, relating E and A

2. RB, relating E and B

3. RC, relating E and C

Create a special identifying attribute for E

Add any attributes of R to E

For each relationship (ai, bi, ci) in R, create

1. a new entity ei in the entity set E

2. add (ei, ai) to RA

3. add (ei, bi) to RB

4. add (ei, ci) to RC
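
The construction can be sketched in a few lines of Python. The function name `to_binary`, the generated identifiers `e1`, `e2`, ... and the sample works-on data are assumptions; attributes of R are left out to keep the sketch short.

```python
def to_binary(r):
    """Replace a ternary relationship set R by E, RA, RB and RC."""
    e, ra, rb, rc = [], set(), set(), set()
    # Sorting only makes the generated identifiers deterministic.
    for i, (a, b, c) in enumerate(sorted(r), start=1):
        ei = f"e{i}"          # special identifying attribute for E
        e.append(ei)
        ra.add((ei, a))
        rb.add((ei, b))
        rc.add((ei, c))
    return e, ra, rb, rc


# Hypothetical instance of works-on (employee, job, branch).
works_on = {
    ("emp1", "teller", "Ranchi"),
    ("emp1", "auditor", "Kolkata"),
}
e, ra, rb, rc = to_binary(works_on)
```

Joining RA, RB and RC back on the artificial entity recovers the original ternary relationships, which is a quick way to convince oneself that no information is lost in this direction.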

Converting Non-Binary Relationships (Cont.)

Also need to translate constraints

Translating all constraints may not be possible

There may be instances in the translated schema that cannot correspond to any instance of R

Exercise: add constraints to the relationships RA, RB and RC to ensure that a newly created entity corresponds to exactly one entity in each of entity sets A, B and C

We can avoid creating an identifying attribute by making E a weak entity set (described shortly) identified by the three relationship sets

Design Issues

Use of entity sets vs. attributes: Choice mainly depends on the structure of the enterprise being modeled, and on the semantics associated with the attribute in question.

Use of entity sets vs. relationship sets: Possible guideline is to designate a relationship set to describe an action that occurs between entities

Binary versus n-ary relationship sets: Although it is possible to replace any nonbinary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.

Placement of relationship attributes

How about doing an ER design interactively on the board?

Suggest an application to be modeled.

E-R Diagram for a Banking Enterprise

Summary of Symbols Used in E-R Notation

Summary of Symbols (Cont.)

Alternative E-R Notations

DATABASE MODELING

The better we understand the data, the more effective the discovery and retrieval will be.

What is this Record Based Logical Model

To describe data at the logical and view levels.

Can be used to specify the overall logical structure.

Can also be used to provide higher level description of the implementation.

Fixed format records from structured database.

Database Models

[Figure with three panels: Relational Structure, Network Structure, Hierarchical Structure. The relational panel shows an employee table with columns Empno, Ename, Etitle, Dept and a department table with columns Dept, Dname, Dloc, Dmgr. Node labels surviving from the other panels: Dept, Dept A, Dept B, Project A, Project B, Employee1, Employee2, Employee3.]

Hierarchical Database Model

• Organize data in tree structure

• Hierarchy of parent-child relationship

• One-to-many relationship

• Restricts the child segment to having only one parent segment

• Redundant data

• Used for structured, routine types of transactions

• Not flexible in support of databases

• Cannot easily handle ad hoc requests

Relational Database Model

• Organizes data in the form of tables consisting of rows and columns

• Many-to-many relationship

• Based on Relational Algebra such as the application of Boolean operators, Projection, Cartesian Product

• Redundancy in the data can be avoided by normalization rules

• Used for unstructured types of transactions

• More flexible in support of databases

• Can easily handle ad hoc requests
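
Two of the relational algebra operations named above, projection and Cartesian product, can be sketched over tables held as lists of dictionaries. The function names, the sample rows and the assumption that the two tables have no column names in common are all choices made for this illustration.

```python
from itertools import product


def project(rows, attrs):
    """Projection: keep only the named columns and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def cartesian_product(left, right):
    """Cartesian product: pair every row of left with every row of right."""
    return [{**l, **r} for l, r in product(left, right)]


# Hypothetical tables with disjoint column names.
employee = [
    {"Empno": 1, "Ename": "Anju", "Dept": "A"},
    {"Empno": 2, "Ename": "Mary", "Dept": "A"},
]
dept = [{"Dname": "Accounts"}, {"Dname": "Loans"}]
```

Projection removes duplicates because a relation is a set of rows; the Cartesian product of a 2-row and a 2-row table has 4 rows.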

What is DBMS

D: DEFINE THE (TYPE OF) DATABASE

B: BUILD THE STRUCTURE (FOR STORAGE)

M: MANIPULATE DATA (FOR RETRIEVAL)

S: SAFETY OF THE DATA (SYSTEM CRASH, UNAUTHORISED INTRUSIONS)

DDL AND DML

DBMS

DATA DEFINITION PART: DATA DEFINITION LANGUAGES (DDL). COBOL, SQL, Creation of tables, Entering data, Normalisation

DATA MANIPULATION PART: DATA MANIPULATION LANGUAGES (DML). Structured query language (SQL), QBE. Query, Report generation and writing, Checking errors, other control features

Database Management Systems

DBMS is a set of computer programs that facilitates:

Creation of database by programmers

Maintenance and security of database by DBAs

Application of database by end users.

DDL - Data Definition Language is for defining tables (files)

DML - Data Manipulation Language is for processing data and records (e.g., update, sort, …)

DCL - Data Control Language is a computer language for controlling access to data in a database. Examples of DCL commands are GRANT and REVOKE

Data dictionary - computer-based catalog or directory containing metadata.
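
The difference between DDL and DML can be seen in a short session with Python's built-in `sqlite3` module. The table and the account data are hypothetical. SQLite is used only because it ships with Python; it has no GRANT or REVOKE, so DCL is not shown, and its `sqlite_master` table stands in here for a data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of a table.
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL NOT NULL)"
)

# DML: insert, update and query rows.
conn.execute("INSERT INTO account VALUES (?, ?)", ("A-101", 500.0))
conn.execute("INSERT INTO account VALUES (?, ?)", ("A-102", 400.0))
conn.execute(
    "UPDATE account SET balance = balance + 100 WHERE account_number = ?",
    ("A-101",),
)
rows = conn.execute(
    "SELECT account_number, balance FROM account ORDER BY account_number"
).fetchall()

# Catalog lookup: metadata about the tables defined so far.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

After this runs, `rows` holds the two accounts with the updated balance and `catalog` lists the single table `account`.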

DATA DICTIONARY

It is a database management catalog, prepared by database designers to help individuals to enter data.

It contains metadata, i.e. Data on data, like, why the data item is needed, how often it should be updated, on which form and reports the data appears.

It relies on a DBMS software component to manage a database of data definitions i.e. metadata about the structure, data elements, and other characteristics of a database.

It contains the names and descriptions of all types of data records and their inter relationships.

It also has information on end user’s access requirements, use of application programs, database maintenance and security.

It can be queried by the database administrator, who can also make changes to it.

Concept of primary key

Types of keys

Primary key (simple or composite)

Secondary key

Foreign key

Reference key

------------------------------------------------------------------------------

Primary key: Unique identifier attribute (or set of attributes) for an entity.

Simple primary key: Based on a single attribute.

Composite primary key: Based on two or more attributes.

Secondary key: An attribute, or set of attributes, other than the primary key that is used to look up records; it need not be unique.

Foreign key: An attribute in one table whose values refer to the primary key of another table.

Reference key: The key in the parent table, normally its primary key, that a foreign key in the child table refers to.
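
Primary and foreign keys can be tried out with Python's built-in `sqlite3` module. The sketch reuses the column names Empno, Ename and Dept from the relational structure shown earlier; the table names and row values are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, which is why that statement comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Parent table: dept is its primary key (the referenced key).
conn.execute("CREATE TABLE department (dept TEXT PRIMARY KEY, dname TEXT)")

# Child table: empno is the primary key, dept is a foreign key.
conn.execute(
    "CREATE TABLE employee ("
    " empno INTEGER PRIMARY KEY,"
    " ename TEXT,"
    " dept TEXT REFERENCES department(dept))"
)

conn.execute("INSERT INTO department VALUES ('A', 'Accounts')")
conn.execute("INSERT INTO employee VALUES (1, 'Anju', 'A')")

try:
    # Department 'Z' does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO employee VALUES (2, 'Mary', 'Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The second insert fails because its `dept` value has no matching row in the parent table, so only one employee is stored.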

TYPES OF DATABASES

OPERATIONAL DATABASE

ANALYTICAL DATABASE

DATA WAREHOUSING

DISTRIBUTED DATABASE

HYPERMEDIA DATABASE

OPERATIONAL DATABASE

DATABASE ABOUT THE OPERATIONS OF AN ORGANISATION.

CAN BE SUB-DIVIDED INTO SUBJECT AREAS LIKE: TRANSACTION DATABASE, MARKETING DATABASE, PRODUCTION DATABASE

STORES INFORMATION RELATED TO DAY TO DAY OPERATIONS, RELATED TO CUSTOMERS, INVENTORY, EMPLOYEES

TRANSACTION PROCESSING SYSTEMS (TPS) USE THIS DATABASE.

ANALYTICAL DATABASE

INFORMATION EXTRACTED FROM OPERATIONAL AND EXTERNAL DATABASES.

INFORMATION REQUIRED FOR MIDDLE LEVEL MANAGERS TO TAKE DECISIONS.

CAN ALSO BE CALLED MANAGEMENT DATABASES.

USE MULTI-DIMENSIONAL DATA STRUCTURE TO ORGANISE DATA.

ONLINE ANALYTICAL PROCESSING, DECISION SUPPORT SYSTEMS, EXECUTIVE SUPPORT SYSTEMS USE THIS DATABASE.

DISTRIBUTED DATABASE

It is a database where the data is not stored in one physical location, but is distributed over many locations (computers/servers), which are geographically dispersed and connected by a network.

Example. Advantages. Disadvantages.

HYPERMEDIA DATABASES

Hypermedia databases are a repository of home pages and other hyperlinked pages of multimedia.

Can store text, graphics, photo or video files.

World wide web uses hypermedia databases to access HTML files, GIF files and video files.

Data Warehousing

Large organizations have complex internal organizations, and have data stored at different locations, on different operational (transaction processing) systems, under different schemas

Data sources often store only current data, not historical data

Corporate decision making requires a unified view of all organizational data, including historical data

A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site

Greatly simplifies querying, permits study of historical trends

Shifts decision support query load away from transaction processing systems

Definition of data warehousing

According to W.H.Inmon

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.

Data Warehousing

DATA WAREHOUSING

A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.

Important for large businesses which generate data from multiple divisions, possibly at multiple sites

Data may also be purchased externally

This has been depicted pictorially in the next slide.

The concept is much like that of physical warehouses (godowns).

It supports online analysis of sales, inventory and other vital business data that has been culled from operational systems.

It also makes it possible to examine past history in order to improve future operations, provide more customer satisfaction, introduce customer-oriented schemes, etc.

DATA MART

One way to describe a data mart is:

“It is a data warehouse that has limited or specific scope”.

It contains selected information from the data warehouse such that each separate data mart is customised for the decision support applications of a particular end-user group.

Data stored in the centralised location of a data warehouse is edited, standardised, integrated and updated so that it can be used by managers for decision making.

Some organisations, instead of having one centralised warehouse, might opt for multiple data marts, each one concentrating on a specific aspect.

DATA MART

Data warehouse

Data mart(Production)

Data mart(Finance)

Data mart(Marketing)

Data Mining

Broadly speaking, data mining is the process of semi-automatically analyzing large databases to find useful patterns

Like knowledge discovery in artificial intelligence data mining discovers statistical rules and patterns

Differs from machine learning in that it deals with large volumes of data stored primarily on disk.

Some types of knowledge discovered from a database can be represented by a set of rules.

e.g.,: “Young women with annual incomes greater than $50,000 are most likely to buy sports cars”

Other types of knowledge represented by equations, or by prediction functions

Some manual intervention is usually required

Pre-processing of data, choice of which type of pattern to find, postprocessing to find novel patterns

What is Data Mining?

Data Mining is:

(1) The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets

(2) The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner

What is Data Mining?

Very little functionality in database systems to support mining applications

Beyond SQL Querying:

SQL (OLAP) Query:

- How many computers did we sell in the 1st Qtr of 1999 in Chennai vs New Delhi?

Data Mining Queries:

- Which sales region (Chennai, Mumbai, Kolkata & New Delhi) had anomalous sales in the 1st Qtr of 2005?

- How do the buyers of computers in Chennai and New Delhi differ?

- What else do the buyers of computers in Chennai buy along with computers, as compared to New Delhi?
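
The SQL (OLAP) query is the kind of question a plain aggregate query can answer. A minimal sketch with Python's built-in `sqlite3` module follows; the `sales` table, its columns and every figure in it are invented for the illustration. The data mining questions have no such one-line query, which is the point the slide is making.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    " city TEXT, year INTEGER, quarter INTEGER, product TEXT, units INTEGER)"
)
# Hypothetical sales figures.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("Chennai", 1999, 1, "computer", 120),
        ("Chennai", 1999, 1, "computer", 80),
        ("New Delhi", 1999, 1, "computer", 150),
        ("New Delhi", 1999, 2, "computer", 90),
        ("Chennai", 1999, 1, "printer", 40),
    ],
)

# Computers sold in the 1st quarter of 1999, Chennai vs New Delhi.
totals = conn.execute(
    "SELECT city, SUM(units) FROM sales"
    " WHERE product = 'computer' AND year = 1999 AND quarter = 1"
    " AND city IN ('Chennai', 'New Delhi')"
    " GROUP BY city ORDER BY city"
).fetchall()
```

With this sample data the query returns one summed row per city.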

Relation between data mining and OLAP

Data mining can be viewed as an advanced stage of On-line Analytical Processing (OLAP).

However, data mining goes far beyond the narrow scope of summarization style analytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.

Applications of Data Mining

Prediction based on past history

Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ..) and past history

Predict if a customer is likely to switch brand loyalty

Predict if a customer is likely to respond to “junk mail”

Predict if a pattern of phone calling card usage is likely to be fraudulent

Descriptive Patterns

Associations

Find books that are often bought by the same customers. If a new customer buys one such book, suggest that he buys the others too.

Other similar applications: camera accessories, clothes, etc.
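
The association idea can be sketched by counting how often two items appear in the same purchase. This is a deliberately simple pair count, not a full association rule algorithm; the function names, the `min_support` threshold and the baskets are all assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Count item pairs bought together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


def suggest(item, pairs):
    """Items that frequently accompany the given item."""
    return sorted(
        {other for pair in pairs if item in pair for other in pair if other != item}
    )


# Hypothetical purchase histories.
baskets = [
    {"db book", "sql book"},
    {"db book", "sql book", "camera"},
    {"camera", "lens"},
    {"db book", "camera"},
]
frequent = frequent_pairs(baskets, min_support=2)
```

A customer who buys the SQL book would be offered the database book, because that pair clears the support threshold in the sample data.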

Data mining is the method used by the companies to sort and analyse information to better understand their customers, products, markets or any other aspect/phase of their business.

OLTP AND OLAP

ONLINE TRANSACTION PROCESSING (OLTP) AND ONLINE ANALYTICAL PROCESSING (OLAP) ARE TWO IMPORTANT ASPECTS OF DATA MINING.

OLTP: It refers to the immediate and automated response to the requests of the user.

Example: Send a query mail to HDFC or ICICI bank, you will get immediate response (sort of acknowledgement).

OLTP is designed to handle multiple concurrent transactions from the customers.

It has a fixed number of inputs (type of query) and standard format output.

OLTP is a big part of interactive e-Commerce applications.

EXAMPLE OF OLTP

From:           [email protected]  [[email protected]]Sent:           Saturday, Jun 4 2005 2:26PMTo:             [email protected] [[email protected]]Subject:        Banker's Cheque Inquiry

Name       : MR.     T MOHANAKRISHNANCustomer ID: 7488678Account No : 0821050216585

Sir/Madam, I have deposited a US cheque for $250, about 15 days back, through Lloyds Road ATM center. I would like to get confirmation of receipt of this cheque, as well as approximate time to realise the cheque, please.Thanking you,T Mohanakrishnan.

EXAMPLE FOR OLTP

Dear Customer,

Thank you for writing to us. This auto acknowledgement confirms the receipt of your e-mail. This interaction is being tracked through the subject line. We request you not to change the subject line.

If you are an existing account holder and have not mentioned your Customer Identification Number / Account Number in your earlier mail, please re-send your mail with the details.

Please ignore this message if you have quoted your Customer Identification Number / Account Number.

We will get back to you shortly.

Warm regards,

HDFC Bank Ltd

OLAP

Graphical software tools that provide complex analysis of data stored in a database.

Why is it called “Online” analytical processing?

The purpose of the OLAP server is to link data and special functions.

Analysis by OLAP goes beyond normal database queries.

It can provide time series and trend analysis views of the data.

“What if” and “why” analysis.

WHAT IF ANALYSIS

A CAPABILITY OF SOME INFORMATION SYSTEMS (EX: DECISION SUPPORT SYSTEM) THAT ALLOWS USER TO MAKE HYPOTHETICAL CHANGES TO THE DATA ASSOCIATED WITH THE PROBLEM AND OBSERVE HOW THESE CHANGES INFLUENCE THE RESULTS OR OUTCOME.

WHAT IF ANALYSIS

Take the case of pay roll of a company.

Basic pay, DA, other allowances.

If DA is increased by some %, what is the net result or load on the exchequer?

If Basic pay is increased, what are the effects? (HRA, DA etc., are % of Basic)
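
The payroll question can be worked through with a few lines of Python. The function name, the basic pay figures and the DA and HRA percentages are all hypothetical; the only thing taken from the slide is that DA and HRA are percentages of basic pay.

```python
def monthly_payroll(basic_pays, da_pct, hra_pct):
    """Total monthly pay when DA and HRA are percentages of basic pay."""
    return sum(basic * (100 + da_pct + hra_pct) / 100 for basic in basic_pays)


# Hypothetical figures: two employees, DA at 50% and HRA at 20% of basic.
basic_pays = [10000, 20000]
current = monthly_payroll(basic_pays, da_pct=50, hra_pct=20)

# What if DA is raised from 50% to 55%?
what_if = monthly_payroll(basic_pays, da_pct=55, hra_pct=20)
extra_load = what_if - current
```

Changing one input and recomputing the total is the whole of what-if analysis at this scale; a decision support system does the same over far more data and more interdependent rules.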

OLAP

OLAP applications are found in the area of financial modeling (budgeting, planning), sales forecasting, customer and product profitability etc

Provides leverage to library managers by providing the ability to model real life projections and a more efficient use of resources

OLAP enables the organization as a whole to respond more quickly to market demands and improve revenue and profitability

Additional Database Models

[Figure: Multidimensional Database Structure. Labels surviving from the figure: East, West, Denver; Feb; Actual, Budget; Margin, Sales; TV, VCR.]

[Figure: Object-Oriented Database Structure, with three objects.]

Bank Account Object. Attributes: Customer, Balance. Operations: Deposit, Withdraw.

Checking Account Object. Attributes: Credit Line, Mthly Statement. Operations: Calculate Interest, Print Mthly Statement.

Savings Account Object. Attributes: Credit Line, Mthly Statement. Operations: Calculate Interest, Print Mthly Statement.

Multidimensional Model

Multidimensional Model Enables Realistic Description of Data

The multidimensional model is also a natural fit for describing and storing complex data. Developers can create data structures that accurately represent real-world data, thus making it faster to develop applications and easier to maintain them.

Object-Oriented Database Model

• Offers more complex data types to overcome the restriction of normalization rules for relational databases.

• Object-Oriented databases are based on the following principles:

• Encapsulation

• Inheritance

• Polymorphism

• Aggregation

OBJECT ORIENTED DATABASE

Object-Oriented Database Model

• Encapsulation: An application (another object) can only communicate with an object via messages. The operations provided by an object define the set of messages which can be understood by it; no other operations can be applied to an object.

• Inheritance: New object classes can be derived from another class (the super-class) by inheritance. The new classes inherit the attributes and methods of the super-class and offer additional attributes and operations. The relation between a derived class and its super-class is called the “is-a” relation because an instance of the derived class is also an instance of the super-class.

• Polymorphism: This feature is closely connected to inheritance. Derived classes may re-define methods of their super-class(es). This is very useful for achieving class-specific behaviour using messages already available for the super-class.

• Aggregation: Composite objects may be constructed as consisting of a set of elementary objects. The container object can communicate with its contained objects via their methods. The relation between the container object and its components is called the “part-of” relation because a component is a part of the container object.
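
The four principles can be sketched with ordinary Python classes modelled on the bank account objects shown earlier. This illustrates the principles only and is not an object-oriented database; the `Branch` class, the interest rate and the method names are assumptions. Python marks the balance as private by naming convention (`_balance`) rather than enforcing it.

```python
class BankAccount:
    """Encapsulation: the balance changes only through deposit and withdraw."""

    def __init__(self, customer, balance=0):
        self.customer = customer
        self._balance = balance

    def deposit(self, amount):
        self._balance += amount

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    def balance(self):
        return self._balance

    def monthly_statement(self):
        return f"{self.customer}: {self._balance}"


class SavingsAccount(BankAccount):
    """Inheritance: a savings account is-a bank account with extra operations."""

    def __init__(self, customer, balance=0, rate_pct=4):
        super().__init__(customer, balance)
        self.rate_pct = rate_pct  # hypothetical annual rate in percent

    def calculate_interest(self):
        return self._balance * self.rate_pct / 100

    def monthly_statement(self):
        # Polymorphism: same message, class-specific behaviour.
        return "Savings " + super().monthly_statement()


class Branch:
    """Aggregation: each account is part-of the branch that contains it."""

    def __init__(self, name):
        self.name = name
        self.accounts = []

    def add(self, account):
        self.accounts.append(account)

    def statements(self):
        return [account.monthly_statement() for account in self.accounts]
```

A `Branch` can call `monthly_statement` on every account it holds without knowing which class each one belongs to, which is where polymorphism and aggregation meet.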

CHARACTERISTICS OF A WELL DEFINED DATABASE SYSTEM

DATA INTEGRITY: Ability to navigate and ability to manipulate to produce results.

DATA INDEPENDENCE: Logical independence: application programs are not affected by a change in the logical structure of the data. Physical independence: application programs are not affected by a change in how the data is physically stored.

PREVENTION OF DATA REDUNDANCY & INCONSISTENCY: Redundancy is proportional to storage space

SHARING OF DATA: Flexibility in accessing & viewing data by different users. Availability of data to existing as well as new applications.

DATA MAINTENANCE: Checking the integrity of the system and data. Provision for modification, addition, deletion of records

DATA SECURITY and REGULATION OF STANDARDS: System failure/crash, backup. Unauthorised users. SOP, provision of passwords, DBA’s tasks