Chapter 12 File Processing and Data Management Concepts

Upload: jocelyn-allton

Post on 14-Dec-2015


Presentation Outline

I. Terminology

II. Database Technology

III. The Architecture of a Database Management System (DBMS)

IV. The Database Administrator

I. Terminology

A. Field

B. Data Occurrences

C. Fixed vs. Variable Length Records

D. Record Key

E. Sort Keys

A. Field

A field is the smallest block of data that will be stored and retrieved in the information system.

Other names for field include data item, attribute, or element.

Field 1 Field 2

B. Data Occurrences

A data occurrence is a specific set of data values for a record in a file.

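[Table: general ledger account records, rows numbered 1 to 5; the cell values were not preserved in this transcript.]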

The above table contains 5 occurrences of account records for the general ledger account file.

C. Fixed vs. Variable Length Records

Fixed Length Records

Both the number of fields and the length of each field are fixed.

Strength: Easier to manipulate records.

Weakness: Must accommodate maximum sizes.

Variable Length Records

Both the number of fields and the length of each field are variable. (See Fig. 15-1 on p. 603)

Strength: Less waste of memory when maximum sizes do not have to be accommodated.

Weakness: Record manipulation is more difficult.
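
The trade-off can be made concrete with a minimal Python sketch. The record layout (account number, 20-byte name, balance) and the sample account names are assumptions made for illustration; they do not come from the slides.

```python
import struct

# Hypothetical fixed-length layout: 4-byte account number,
# 20-byte name, 8-byte balance (32 bytes per record, no padding).
RECORD = struct.Struct("<i20sd")

def pack_fixed(acct_no, name, balance):
    # struct pads the name with NUL bytes up to 20 bytes
    # (and would truncate a longer name).
    return RECORD.pack(acct_no, name.encode("ascii"), balance)

def read_fixed(buffer, n):
    # Record n always starts at n * RECORD.size, so no scanning is needed.
    acct_no, raw_name, balance = RECORD.unpack_from(buffer, n * RECORD.size)
    return acct_no, raw_name.rstrip(b"\0").decode("ascii"), balance

def pack_variable(acct_no, name, balance):
    # Variable-length form: a 1-byte length prefix, then only the bytes needed.
    data = name.encode("ascii")
    return struct.pack("<iB", acct_no, len(data)) + data + struct.pack("<d", balance)

fixed_file = pack_fixed(1110, "Cash", 500.0) + pack_fixed(1500, "Inventory", 1250.0)
variable_file = pack_variable(1110, "Cash", 500.0) + pack_variable(1500, "Inventory", 1250.0)

print(len(fixed_file), len(variable_file))
print(read_fixed(fixed_file, 1))
```

The fixed layout spends space on padding but lets any record be located by arithmetic alone. The variable layout is smaller, but finding the second record requires reading the first record's length prefix.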

D. Record Key

A record key is a field or combination of fields that uniquely identifies a particular record in a file.

Example record key values: 1110, 1500, 2105, 2110

E. Sort Keys

Primary sort key: The first field used to sort the data occurrences in a record set.

Secondary sort key: A field used to determine relative position among a set of data occurrences in a record set.

Tertiary sort key: Additional fields beyond primary and secondary sort keys that are required to uniquely identify data occurrences in a record set.

Last Name   First Name   Age
Adams       Tom          25
Jones       Alisa        36
Jones       Julie        19
Jones       Julie        21
Young       Sam          22
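
As a rough illustration, the ordering in this table can be reproduced in Python by sorting on a tuple of (last name, first name, age), which act as the primary, secondary, and tertiary sort keys. The shuffled input order is an assumption made for the example.

```python
# Rows from the table above, deliberately out of order.
rows = [
    ("Jones", "Julie", 21),
    ("Young", "Sam", 22),
    ("Jones", "Alisa", 36),
    ("Adams", "Tom", 25),
    ("Jones", "Julie", 19),
]

# Tuples compare element by element: last name first (primary key),
# then first name (secondary), then age (tertiary) to break remaining ties.
ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

for last, first, age in ordered:
    print(last, first, age)
```

The two Julie Jones rows are only put in a definite order once the tertiary key (age) is consulted.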

II. Database Technology

A. The Problem of Redundancy

B. The Components of a Database Management System

A. The Problem of Redundancy

Redundancy occurs when different areas of an organization use the information system to store the same information in more than one place.

This results in update anomalies: one copy of the data is changed while the other copies are not, so the records disagree.

"That is not what we show per our records."

B. The Components of a Database Management System

1. Data Description Language (DDL)

2. Data Manipulation Language (DML)

3. Data Query Language (DQL)

1. Data Description Language (DDL)

Defines the logical structure of the database (known as the schema). Defines the following:

Name of data fields.

Type of data (numeric, alphabetic, etc.).

Number of positions (length of field).

May also define subschemas (i.e., individual user views).
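
A minimal sketch of what DDL statements look like, using Python's built-in sqlite3 module. The table name, column names, and view name are hypothetical, and SQLite is only one possible DBMS; note that SQLite records the declared lengths but does not enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: name each field, give its data type and its length (schema).
# Hypothetical general ledger table; SQLite stores but does not enforce
# the declared lengths such as VARCHAR(30).
conn.execute("""
    CREATE TABLE gl_account (
        acct_no   INTEGER PRIMARY KEY,
        acct_name VARCHAR(30) NOT NULL,
        balance   NUMERIC(12,2)
    )
""")

# A view plays the role of a subschema: one user's restricted view of the data.
conn.execute(
    "CREATE VIEW account_names AS SELECT acct_no, acct_name FROM gl_account"
)

# Read the schema back from the catalog: (field name, declared type) pairs.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(gl_account)")]
views = conn.execute("SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
print(columns)
print(views)
```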

2. Data Manipulation Language (DML)

The DML consists of the commands for updating, editing, manipulating, and extracting data.

Structured query language (SQL) is a common DML in relational settings.

Pull a trial balance.

Structured Query Language (SQL)
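
As an illustration of DML commands, the sketch below inserts, updates, and extracts data with SQL through Python's sqlite3 module, ending with a simple trial balance. The table layout, account names, and amounts are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gl_account ("
    "acct_no INTEGER PRIMARY KEY, acct_name TEXT, debit REAL, credit REAL)"
)

# DML: add records (hypothetical accounts and balances).
conn.executemany(
    "INSERT INTO gl_account VALUES (?, ?, ?, ?)",
    [
        (1110, "Cash", 500.0, 0.0),
        (1500, "Inventory", 300.0, 0.0),
        (2105, "Accounts Payable", 0.0, 200.0),
        (2110, "Notes Payable", 0.0, 600.0),
    ],
)

# DML: edit an existing record.
conn.execute("UPDATE gl_account SET acct_name = 'Cash on Hand' WHERE acct_no = 1110")

# DML: extract data. List every account, then total the two columns.
trial_balance = conn.execute(
    "SELECT acct_no, acct_name, debit, credit FROM gl_account ORDER BY acct_no"
).fetchall()
totals = conn.execute("SELECT SUM(debit), SUM(credit) FROM gl_account").fetchone()

for line in trial_balance:
    print(line)
print("Totals:", totals)
```

In a trial balance the two totals should agree; here the sample amounts were chosen so that they do.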

3. Data Query Language (DQL)

A data query language is a user-friendly language or interface that allows the user to request information by simply filling in blanks. It represents a special type of DML.

Query by Example (QBE)

III. The Architecture of a Database Management System (DBMS)

A. The Database Architecture

B. The Conceptual Architecture and Entity-Relationship (ER) Diagrams

C. Logical Data Structures

D. The Physical Structure

A. The Database Architecture

Conceptual Level: database contents, uses of the database, desired reports, information to be viewed.

Logical Level: logical data structures (tree, network, relational).

Physical Level: access methods (sequential access, indexed files).

B. The Conceptual Architecture and Entity-Relationship (ER) Diagrams

Square boxes are used for entities (separate tables).

Ellipses are used for attributes (table columns).

Diamond-shaped boxes depict relationships.

[ER diagram: entity PART (attributes PART_NO, NAME, COST) connected by the relationship STORED AT to entity LOCATION (attributes WHSE, ADDRESS).]

C. Logical Data Structures

1. Tree or Hierarchical Structure

2. Network Structure

3. Relational Structure

a. Selection

b. Projection

c. Join

1. Tree or Hierarchical Structure

A parent record can have many children. However, a child record can have only one parent.

Can only model 1:1 (one-to-one) and 1:* (one-to-many) relationships.

Commonly used with accounting data.

Can only access data by going from a parent to a child.

Balance Sheet
    Assets
        Current Assets
        Long-term Assets
    Liabilities
        Current Liabilities
        Long-term Liabilities
    Equity
        Revenues
        Expenses
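
A small Python sketch of the hierarchy may help. It assumes the balance sheet tree is stored as a parent-to-children mapping, and it assumes that Revenues and Expenses sit under Equity, which is inferred from the slide layout rather than stated in it.

```python
# Parent -> children mapping; every child appears under exactly one parent.
tree = {
    "Balance Sheet": ["Assets", "Liabilities", "Equity"],
    "Assets": ["Current Assets", "Long-term Assets"],
    "Liabilities": ["Current Liabilities", "Long-term Liabilities"],
    "Equity": ["Revenues", "Expenses"],  # placement assumed from the figure
}

def find_path(node, target, children=tree):
    """Reach target the only way a tree allows: walking down from a parent."""
    if node == target:
        return [node]
    for child in children.get(node, []):
        path = find_path(child, target, children)
        if path:
            return [node] + path
    return None

# Verify the one-parent rule: no child is listed under two parents.
all_children = [c for kids in tree.values() for c in kids]
one_parent_each = len(all_children) == len(set(all_children))

print(find_path("Balance Sheet", "Expenses"))
print(one_parent_each)
```

The search always starts at the root, which mirrors the restriction that data can only be reached by going from a parent to a child.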

2. Network Structure

Eliminates the distinction between parent and child records. A parent can have many children and a child can have many parents.

Can model 1:1 (one-to-one), 1:* (one-to-many), and *:* (many-to-many) relationships.

Must know the physical structure of the data in order to access it.

3. Relational Structure

Relational databases organize and store data in two-dimensional tables consisting of rows and columns.

Relationships among tables are represented by common data values in different tables.

Straightforward in terms of organizing and searching the data.

Possesses ad hoc search capabilities.

3a. Selection

Produces a horizontal subset (includes entire rows) of a relation, keeping the rows that satisfy a Boolean predicate.

Savings Table

Name   Acct #   Balance
John   123      35.75
Bill   205      3.95
Mary   707      7.95
Joe    127      4.05

Selection: Balance < 5.00 (Savings)

Name   Acct #   Balance
Bill   205      3.95
Joe    127      4.05
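
The selection above can be sketched in Python as a filter over rows. Representing the relation as a list of dictionaries and the name `select` are choices made for this example only.

```python
# The Savings table from the slide, one dictionary per row.
savings = [
    {"Name": "John", "Acct #": 123, "Balance": 35.75},
    {"Name": "Bill", "Acct #": 205, "Balance": 3.95},
    {"Name": "Mary", "Acct #": 707, "Balance": 7.95},
    {"Name": "Joe", "Acct #": 127, "Balance": 4.05},
]

def select(relation, predicate):
    # Keep whole rows for which the Boolean predicate is true.
    return [row for row in relation if predicate(row)]

low_balance = select(savings, lambda row: row["Balance"] < 5.00)

for row in low_balance:
    print(row["Name"], row["Acct #"], row["Balance"])
```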

3b. Projection

Constructs a vertical subset of a relation. The subset is obtained by selecting specified attributes and removing the others.

Savings Table

Name   Acct #   Balance
John   123      35.75
Bill   205      3.95
Mary   707      7.95
Joe    127      4.05

Projection on Name of the selection Balance < 5.00 (Savings):

Name
Bill
Joe
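
A minimal Python sketch of projection, applied to the result of the selection Balance < 5.00. The list-of-dictionaries representation and the name `project` are assumptions for illustration.

```python
savings = [
    {"Name": "John", "Acct #": 123, "Balance": 35.75},
    {"Name": "Bill", "Acct #": 205, "Balance": 3.95},
    {"Name": "Mary", "Acct #": 707, "Balance": 7.95},
    {"Name": "Joe", "Acct #": 127, "Balance": 4.05},
]

def project(relation, attributes):
    # Keep only the named columns; a relation is a set, so drop duplicate rows.
    result = []
    for row in relation:
        reduced = tuple(row[a] for a in attributes)
        if reduced not in result:
            result.append(reduced)
    return result

# First the selection (rows), then the projection (columns).
selected = [row for row in savings if row["Balance"] < 5.00]
names = project(selected, ["Name"])
print(names)
```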

3c. Join

A join is used to combine two tables. The attribute used to join must be in both tables.

Table R

A    B    C
a1   b1   c1
a2   b2   c2
a2   b2   c3
a4   b2   c2

Table S

C    D    E
c2   d1   e1
c3   d2   e3
c2   d1   e2

R |X| S

A    B    C    D    E
a2   b2   c2   d1   e1
a2   b2   c2   d1   e2
a2   b2   c3   d2   e3
a4   b2   c2   d1   e1
a4   b2   c2   d1   e2
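
The join of R and S on their common attribute C can be sketched with a nested loop in Python. This is only one simple way to compute a join, chosen for readability; the function name is hypothetical.

```python
# Tables R(A, B, C) and S(C, D, E) from the slide.
R = [("a1", "b1", "c1"), ("a2", "b2", "c2"), ("a2", "b2", "c3"), ("a4", "b2", "c2")]
S = [("c2", "d1", "e1"), ("c3", "d2", "e3"), ("c2", "d1", "e2")]

def join_on_c(r_rows, s_rows):
    # Pair every R row with every S row that has the same value of C.
    result = []
    for a, b, c in r_rows:
        for s_c, d, e in s_rows:
            if c == s_c:
                result.append((a, b, c, d, e))
    return result

joined = join_on_c(R, S)
for row in joined:
    print(*row)
```

The row (a1, b1, c1) drops out because no row of S has C = c1, which is why the result has five rows rather than six or more.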

D. The Physical Structure

1. Sequential Access

2. Indexed Files

1. Sequential Access

Records can only be accessed in a predefined sequence. For example, if there are 100 records in a file, one must access the first 99 records before accessing the last record.

Generally useful for batch processing when nearly all records must be accessed.

2. Indexed Files

Any attribute can be extracted from the records in a primary file and used to build a new file whose purpose is to provide an index to the original file.

First, the index is searched to find a specified value of an attribute, such as a customer account number.

Second, the disk addresses stored in the index are used to directly retrieve the desired records.

See Fig. 12-13 on p. 427.
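
The two-step lookup can be sketched in Python under some simplifying assumptions: an in-memory byte buffer stands in for the disk file, byte offsets stand in for disk addresses, and the record layout and customer data are hypothetical.

```python
import io
import struct

# Hypothetical layout: 4-byte account number + 20-byte name (24 bytes per record).
RECORD = struct.Struct("<i20s")

def build_file(customers):
    data = io.BytesIO()  # stands in for the primary file on disk
    for acct_no, name in customers:
        data.write(RECORD.pack(acct_no, name.encode("ascii")))
    return data

def build_index(data):
    # One sequential pass: extract the key and note where each record starts.
    index = {}
    data.seek(0)
    while True:
        offset = data.tell()
        chunk = data.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        acct_no, _ = RECORD.unpack(chunk)
        index[acct_no] = offset
    return index

def fetch(data, index, acct_no):
    # Step 1: search the index. Step 2: jump straight to the stored address.
    data.seek(index[acct_no])
    acct, raw_name = RECORD.unpack(data.read(RECORD.size))
    return acct, raw_name.rstrip(b"\0").decode("ascii")

primary = build_file([(707, "Mary"), (123, "John"), (205, "Bill")])
index = build_index(primary)
print(index)
print(fetch(primary, index, 205))
```

Unlike sequential access, the third record is read without touching the first two once the index exists.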

IV. The Database Administrator

The database administrator is a person who coordinates data management activities such as approving the physical contents and user views of the database.

"This is not quite what we need."

Summary

Fields and keys

Three Components of a DBMS

Three Types of Database Architecture

The Database Administrator