Database Management Fall 2003 Data Integrity Chapter 18

Post on 22-Dec-2015


Database Management, Fall 2003

Data Integrity

Chapter 18

Hand out slides!

Strategies for data integrity

• Protecting existence
  – Preventative
    » Isolation
  – Remedial
    » Database backup and recovery
• Maintaining quality
  – Update authorization
  – Integrity constraints
  – Data validation
  – Concurrent update control
• Ensuring confidentiality
  – Data access control
  – Encryption

Strategies for data integrity

• Legal
  – Privacy laws
• Administrative
  – Storing database backups in a locked vault
• Technical
  – Using the DBMS to enforce referential integrity constraints

Data Integrity Issues

• Data consistency
  – Making sure the data at all times accurately depicts the real world
  – Solution:
    » Transaction processing
• Concurrency
  – Maintaining consistency when multiple people are making changes to the database at once
  – Solutions:
    » Before image for read consistency
    » Data locking

Transaction Processing

• Consider a banking application that processes fund transfers from one account to another.

• Steps to transfer from savings to checking:
  – Credit checking account
  – Debit savings account

• If a system problem after the first step prevents the second from completing, the account information will be inaccurate.

• Solution: encapsulate the steps in a transaction that can either be committed or rolled back (undone) as a single unit.
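The funds transfer above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a banking implementation: the account table, its CHECK constraint, the starting balances, and the transfer helper are all assumptions made for the example.

```python
import sqlite3

# isolation_level=None lets the example issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('savings', 500), ('checking', 100)")


def transfer(conn, amount):
    """Move `amount` from savings to checking as a single unit of work."""
    conn.execute("BEGIN")
    try:
        # Step 1: credit the checking account.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'checking'",
            (amount,),
        )
        # Step 2: debit the savings account. The CHECK constraint makes this
        # step fail if savings would go negative.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'savings'",
            (amount,),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes step 1 as well
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


first = transfer(conn, 200)    # both steps succeed, so the transfer commits
second = transfer(conn, 1000)  # step 2 fails, so step 1 is rolled back too
final = balances(conn)
```

After the failed second transfer the checking account does not keep the credit from step 1, because both steps were rolled back together.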

Transaction processing

• A transaction is a series of actions to be taken on the database such that they must be entirely completed or aborted
• A transaction is a logical unit of work (LUW)
• Example

    BEGIN TRANSACTION;
    EXEC SQL INSERT …;
    EXEC SQL UPDATE …;
    EXEC SQL INSERT …;
    COMMIT TRANSACTION;

• COMMIT TRANSACTION
  – makes the modifications permanent
• ROLLBACK TRANSACTION
  – returns database to the state at ‘BEGIN TRANSACTION’

ACID

• Atomicity: If a transaction has two or more discrete pieces of information, either all of the pieces are committed or none are
• Consistency: A transaction either creates a valid new database state, or, if any failure occurs, the transaction manager returns the database to its prior state
• Isolation: A transaction in process and not yet committed must remain isolated from any other transaction (read consistency)
• Durability: Committed data are saved by the DBMS so that, in the event of a failure and system recovery, these data are available in their correct state

Transaction Processing

• Transaction processing requires the use of a before image, also known as a ROLLBACK log or an UNDO log, and an after image, also known as a transaction or REDO log.

• The rollback and transaction logs are areas of memory or disk reserved by the database for the purpose of recording changes to the data

Rollback log or journal

• Referred to in the textbook as a before image

• Records a backup copy of all data before they are changed

• Used by the database to undo changes when a transaction is rolled back

• Can also be used when another transaction executes a SELECT against the same data before the changes are committed
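A toy sketch of a before-image log may help make these points concrete. The UndoLog class and its method names are hypothetical and in-memory only; a real DBMS keeps this log in reserved memory or disk areas and handles many details omitted here.

```python
_MISSING = object()  # marks "this key did not exist before the change"


class UndoLog:
    """Toy in-memory table with a before-image (rollback) log."""

    def __init__(self, rows):
        self.rows = dict(rows)    # current data, including uncommitted changes
        self.before_images = []   # (key, old value) pairs, oldest first

    def update(self, key, value):
        # Record the before image first, then change the data.
        self.before_images.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = value

    def commit(self):
        # Changes become permanent; the before images are no longer needed.
        self.before_images.clear()

    def rollback(self):
        # Undo changes newest-first by restoring each before image.
        while self.before_images:
            key, old = self.before_images.pop()
            if old is _MISSING:
                del self.rows[key]
            else:
                self.rows[key] = old

    def read_committed(self, key):
        # What another transaction's SELECT should see: the oldest before
        # image for the key if it has uncommitted changes, else the stored value.
        for logged_key, old in self.before_images:
            if logged_key == key:
                return None if old is _MISSING else old
        return self.rows.get(key)


db = UndoLog({"P10": 40})
db.update("P10", 120)
db.update("P10", 100)
seen_by_others = db.read_committed("P10")  # still the committed value
db.rollback()
after_rollback = dict(db.rows)
```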

Read Consistency

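[Figure: processing of an update transaction in nine numbered steps. Labels: update transaction, get record, obtain record, retrieved record, process record, updated record, write updated record, output message. The CPU logs the update transaction, the before image of the record, and the after image of the record. A periodic database backup is also shown.]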

Read Consistency

• When a transaction changes data, no other transactions may see the changes until they are committed.

• The transaction that makes the changes does see them before commit.

• If a COMMIT occurs while another person is doing a SELECT, the SELECT will not see those changes.
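Read consistency can be observed with two connections to the same SQLite database file. This is only a sketch: the part table, the temporary file, and the quantity helper are assumptions for the example, and other DBMSs implement read consistency with different mechanisms.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file stand in for two users.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)
reader = sqlite3.connect(path, isolation_level=None)

writer.execute("CREATE TABLE part (partno TEXT PRIMARY KEY, quantity INTEGER)")
writer.execute("INSERT INTO part VALUES ('P10', 40)")


def quantity(conn):
    # fetchall() runs the query to completion so no read lock lingers.
    rows = conn.execute(
        "SELECT quantity FROM part WHERE partno = 'P10'"
    ).fetchall()
    return rows[0][0]


writer.execute("BEGIN")
writer.execute("UPDATE part SET quantity = 120 WHERE partno = 'P10'")

seen_by_writer = quantity(writer)   # the changing transaction sees its own update
seen_by_reader = quantity(reader)   # the other connection still sees the old value

writer.execute("COMMIT")
seen_after_commit = quantity(reader)  # visible to others only after COMMIT

writer.close()
reader.close()
```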

Concurrent update

• The lost data problem

Time | Action | Database record (Part# Quantity)
     |        | P10 40
T1 | User A receives paperwork for a delivery of 80 units of P10 |
T2 | User A reads P10 | P10 40
T3 | User B sells 20 units of P10 |
T4 | User B reads P10 | P10 40
T5 | User A processes the delivery (40 + 80 = 120) |
T6 | User A updates the file | P10 120
T7 | User B processes the sales (40 - 20 = 20) |
T8 | User B updates the file | P10 20
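The timeline above can be replayed in a few lines of Python. The dictionary standing in for the database file and the variable names are assumptions for illustration only.

```python
stock = {"P10": 40}          # the database record before T1

a_copy = stock["P10"]        # T2: User A reads P10
b_copy = stock["P10"]        # T4: User B reads P10 (the same, now stale, value)

stock["P10"] = a_copy + 80   # T5-T6: User A processes the delivery and updates
after_a = stock["P10"]

stock["P10"] = b_copy - 20   # T7-T8: User B overwrites User A's update
after_b = stock["P10"]

expected = 40 + 80 - 20      # what the record should hold after both events
```

User B's update is computed from the value read at T4, so the delivery of 80 units recorded by User A is lost.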

Locking

• Locking data prevents other users from modifying it
• Different levels of locking
  – Table lock
  – Page lock
  – Row lock
• Different kinds of locks
• Shared lock
  – Allows INSERT, UPDATE or DELETE of other rows
  – Allows other shared locks but not exclusive locks
• Exclusive lock
  – Prevents INSERT, UPDATE or DELETE
  – Does not allow any other locks
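The compatibility rule between shared and exclusive locks can be sketched as a small lock table. The LockTable class, the "S" and "X" mode names, and the row identifier are hypothetical; the sketch models only which requests are granted, not waiting or lock levels.

```python
class LockTable:
    """Toy lock table: shared (S) locks coexist, exclusive (X) locks do not."""

    def __init__(self):
        self.held = {}  # resource -> list of (owner, mode)

    def request(self, owner, resource, mode):
        """Grant the lock and return True, or return False on a conflict."""
        holders = self.held.setdefault(resource, [])
        for other, other_mode in holders:
            # A conflict exists when another owner is involved and either
            # the held lock or the requested lock is exclusive.
            if other != owner and "X" in (mode, other_mode):
                return False
        holders.append((owner, mode))
        return True

    def release(self, owner, resource):
        self.held[resource] = [
            h for h in self.held.get(resource, []) if h[0] != owner
        ]


locks = LockTable()
a_shared = locks.request("A", "row 1", "S")     # granted
b_shared = locks.request("B", "row 1", "S")     # granted: shared with shared
c_exclusive = locks.request("C", "row 1", "X")  # refused: shared locks are held
locks.release("A", "row 1")
locks.release("B", "row 1")
c_retry = locks.request("C", "row 1", "X")      # granted once the row is free
a_blocked = locks.request("A", "row 1", "S")    # refused: exclusive lock is held
```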

Concurrent update

• Avoiding the lost data problem - locking

Time | Action | Database record (Part# Quantity)
     |        | P10 40
T1 | User A receives paperwork for a delivery of 80 units of P10 |
T2 | User A reads P10 | P10 40
T3 | User B sells 20 units of P10 |
T4 | User B attempts to read P10 (denied) | P10 40
T5 | User A processes the delivery (40 + 80 = 120) |
T6 | User A updates the file | P10 120
T7 | User B reads P10 | P10 120
T8 | User B processes the sales (120 - 20 = 100) |
T9 | User B updates the file | P10 100
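The same idea can be sketched with Python threads, where a threading.Lock plays the role of the record lock. The stock dictionary and the adjust helper are assumptions for the example; a DBMS would hold the lock on the row, page, or table instead.

```python
import threading

stock = {"P10": 40}
p10_lock = threading.Lock()  # stands in for a lock on the P10 record


def adjust(delta):
    # The read and the update happen while the lock is held, so the other
    # user cannot read a value that is about to change.
    with p10_lock:
        quantity = stock["P10"]
        stock["P10"] = quantity + delta


user_a = threading.Thread(target=adjust, args=(80,))   # delivery of 80 units
user_b = threading.Thread(target=adjust, args=(-20,))  # sale of 20 units
user_a.start()
user_b.start()
user_a.join()
user_b.join()

final_quantity = stock["P10"]
```

Whichever user obtains the lock first, the other waits, so both changes are applied and the record ends at 100.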

Concurrent update

• The deadly embrace
  – User A’s update transaction locks record 1
  – User B’s update transaction locks record 2
  – User A attempts to read record 2 for update
  – User B attempts to read record 1 for update

[Figure: update transaction (User A) and update transaction (User B) contending for record 1 and record 2. Numbered labels: 1 Lock record 1; 2 Lock record 2; 3 Attempt to lock record 1; 4 Attempt to lock record 2.]
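The four steps can be modeled to show why neither transaction can proceed. This sketch is an illustration only: the lock and find_deadlock helpers and the wait-for bookkeeping are assumptions, and the slides do not say how a particular DBMS detects or resolves the situation.

```python
locks = {}      # record number -> transaction holding the lock
waits_for = {}  # transaction -> transaction it is waiting on


def lock(txn, record):
    """Take the lock if it is free; otherwise note who txn must wait for."""
    owner = locks.setdefault(record, txn)
    if owner != txn:
        waits_for[txn] = owner
        return False
    return True


def find_deadlock(waits_for):
    """Return a list of transactions that wait on each other, or None."""
    for start in waits_for:
        path = [start]
        current = waits_for.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = waits_for.get(current)
        if current == start:
            return path
    return None


step1 = lock("A", 1)  # User A locks record 1
step2 = lock("B", 2)  # User B locks record 2
step3 = lock("A", 2)  # User A must wait for User B
step4 = lock("B", 1)  # User B must wait for User A
cycle = find_deadlock(waits_for)
```

Each transaction holds the record the other one needs, so the wait-for relationships form a cycle.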

Database Backups

Why back up databases?

• Things go wrong
• Things go really wrong with computers
• Hardware failure
  – Disk crash
  – Power failure
  – Natural disasters
• Software data corruption
  – Application defects
  – Operating system failures
  – User input errors

Database update process

[Figure: update transactions A, B, and C move the database from state 1 through states 2 and 3 to state 4; a backup holds a copy of the database in state 2.]

• Database should be recoverable to any given state
• Transaction or REDO logs are required to restore the database to a given state

Backup options

Objective | Action
Complete copy of database | Dual recording of data (mirroring)
Past states of the database (also known as database dumps) | Database backup; data export
Changes to the database | Before image log or journal; after image log or journal; incremental backup
Transactions that caused a change in the state of the database | Transaction log or journal

Mirroring Disks

• Mirroring disk drives
  – Two or more duplicate copies of database
  – Two-way mirror protects against disk failure but not server failure
  – Mirror can be ‘broken’ to make backup copies
  – Mirror is ‘resilvered’ after the backup

[Figure: 3-way mirror. One third of the mirror is ‘broken’ to make a backup copy of the database while the server is still running.]

Standby Database

• Standby database is a duplicate database server, preferably in a separate location
• Database software keeps the databases in sync (often using the transaction logs)
• In the event of a crisis, the system will ‘fail over’ to the standby database

[Figure: production database (Denver, CO) sends transaction logs to the standby database (Boston, MA).]

Transaction Logging

• The transaction log, or journal, is a file where the database records every data change

• If necessary, the database can automatically apply all the previous transactions (or “roll forward”) to any point in time

• Rolling forward requires having a saved copy of the database from a previous point in time
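Rolling forward can be sketched as replaying logged after images onto a saved copy. The log format used here, a list of (timestamp, key, after image) entries, and the roll_forward function are assumptions for the example; real transaction logs record far more detail.

```python
def roll_forward(backup, log, until):
    """Replay logged changes with timestamp <= until onto a copy of backup."""
    state = dict(backup)  # never modify the saved copy itself
    for timestamp, key, after_image in sorted(log, key=lambda entry: entry[0]):
        if timestamp > until:
            break
        state[key] = after_image
    return state


# Saved copy of the database from an earlier point in time.
backup = {"P10": 40}

# Transaction log entries written after the backup was taken.
log = [
    (1, "P10", 120),  # delivery of 80 units
    (2, "P10", 100),  # sale of 20 units
    (3, "P20", 15),   # a new part is added
]

state_at_2 = roll_forward(backup, log, until=2)
state_at_3 = roll_forward(backup, log, until=3)
```

Choosing a different value for until recovers the database to a different point in time from the same backup and log.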

Database Dumps

• A snapshot of the database at a particular point in time
• Copies of the database files are made to another disk, server or to tape
• Cold backup
  – Shutdown database before copying
  – Faster, more reliable, but database is down
• Hot backup
  – Make copies while database is running
  – Slower, less consistent, but database stays up
  – Requires use of transaction logs
• Incremental backup
  – Just the changes since the last complete backup

Database Dumps

[Figure: typical backup schedule. Labels: cold backup, hot backup, Day 1, Day 2, incremental (Day 3, etc.), transaction logs.]

Recovery strategies

• Switch to a duplicate database
  – RAID technology approach
• Backward recovery or rollback
  – Return to prior state by applying before-images
• Forward recovery or rollforward
  – Recreate by applying after-images to prior backup
• Rollback uncommitted transactions
  – Use saved before-images to rollback
• Reprocess transactions

Database Recovery

Recover database to its state at 12:00 noon on day 3:

• Copy hot backup (Day 2) back to database
• Rerun transactions (Day 3) using transaction logs

Backups and transaction logs combine to enable point-in-time recovery.

[Figure labels: Backward Recovery, Forward Recovery]

Data recovery

Problem | Recovery procedures
Storage medium destruction (database is unreadable) | *Switch to duplicate database (this can be transparent with RAID); forward recovery; reprocess transactions
Abnormal termination of an update transaction (transaction error or system failure) | *Backward recovery; forward recovery or reprocess transactions (bring forward to the state just before termination of the transaction)
Incorrect data detected (database has been incorrectly updated) | *Backward recovery; reprocess transactions (excluding those from the update program that created incorrect data)
Database crash (power failure, etc.) | Rollback transactions; forward recovery

* preferred strategy

Data quality

• Definition
  – Data are high quality if they fit their intended uses in operations, decision making, and planning. They are fit for use if they are free of defects and possess desired features.
• Determined by the customer
• Relative to the task

Integrity constraints

Type of constraint | Explanation | Example
TYPE | Validating a data item value against a specified data type. | Supplier number is numeric.
SIZE | Defining and validating the minimum and maximum size of a data item. | Delivery number must be at least 3 digits and at most 5.
VALUES | Providing a list of acceptable values for a data item. | Item colors must match the list provided.
RANGE | Providing one or more ranges within which the data item must fall or must NOT fall. | Employee numbers must be in the range 1-100.
PATTERN | Providing a pattern of allowable characters which define permissible formats for data values. | Department phone number must be of the form 542-nnnn (nnnn stands for exactly four decimal digits).
PROCEDURE | Providing a procedure to be invoked to validate data items. | A delivery must have valid itemname, department, and supplier values before it can be added to the database. (Tables are checked for valid entries.)
CONDITIONAL | Providing one or more conditions to apply against data values. | If item type is ‘Y’, then color is null.
NOT NULL (MANDATORY) | Indicating whether the data item value is mandatory (not null) or optional. The not null option is required for primary keys. | Employee number is mandatory.
UNIQUE | Indicating whether stored values for this data item must be unique (unique compared to other values of the item within the same table or record type). The unique option is also required for identifiers. | Supplier number is unique.
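Several of these constraint types can be sketched as small validation functions. The function names are hypothetical, and the list of allowed colors is invented for the example because the slides do not give the actual list; a DBMS would normally enforce such rules declaratively.

```python
import re

# Hypothetical list of acceptable colors (the real list is not given).
ALLOWED_COLORS = {"red", "green", "blue"}


def valid_supplier_number(value):
    """TYPE: supplier number is numeric."""
    return re.fullmatch(r"[0-9]+", value) is not None


def valid_delivery_number(value):
    """SIZE: delivery number has at least 3 digits and at most 5."""
    return re.fullmatch(r"[0-9]{3,5}", value) is not None


def valid_item_color(value):
    """VALUES: item color must match the list provided."""
    return value in ALLOWED_COLORS


def valid_employee_number(value):
    """RANGE: employee numbers must be in the range 1-100."""
    return 1 <= value <= 100


def valid_department_phone(value):
    """PATTERN: phone number has the form 542-nnnn."""
    return re.fullmatch(r"542-[0-9]{4}", value) is not None


def valid_item(item_type, color):
    """CONDITIONAL: if item type is 'Y', then color is null."""
    return color is None if item_type == "Y" else True
```

For example, valid_delivery_number("12") is rejected for being too short, and valid_department_phone("542-12345") is rejected for having five digits after the prefix.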

Integrity constraints

Example

    CREATE TABLE stock (
        stkcode CHAR(3),
        …,
        natcode CHAR(3),
        PRIMARY KEY(stkcode),
        CONSTRAINT fk_stock_nation FOREIGN KEY (natcode)
            REFERENCES nation ON DELETE RESTRICT);

Explanation

• Column stkcode must always be assigned a value of 3 or fewer alphanumeric characters. stkcode must be unique because it is a primary key.
• Column natcode must be assigned a value of 3 or fewer alphanumeric characters and must exist as the primary key of nation.
• Do not allow the deletion of a row in nation while there still exist rows in stock containing the corresponding value of natcode.
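The effect of these constraints can be tried out with Python's sqlite3 module. This is a sketch under assumptions: only the columns shown in the example are created, the nation table is reduced to its key, and the sample rows are invented. SQLite also differs from other DBMSs in two ways that matter here: foreign key checks must be switched on per connection, and CHAR(3) lengths are not enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.execute("CREATE TABLE nation (natcode CHAR(3), PRIMARY KEY (natcode))")
conn.execute(
    "CREATE TABLE stock ("
    " stkcode CHAR(3),"
    " natcode CHAR(3),"
    " PRIMARY KEY (stkcode),"
    " CONSTRAINT fk_stock_nation FOREIGN KEY (natcode)"
    " REFERENCES nation ON DELETE RESTRICT)"
)

# Invented sample rows.
conn.execute("INSERT INTO nation VALUES ('UK')")
conn.execute("INSERT INTO stock VALUES ('FC', 'UK')")


def attempt(sql):
    """Run a statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# natcode 'ZZ' does not exist in nation.
orphan = attempt("INSERT INTO stock VALUES ('XX', 'ZZ')")
# A stock row still refers to 'UK', so ON DELETE RESTRICT blocks the delete.
delete_parent = attempt("DELETE FROM nation WHERE natcode = 'UK'")
# stkcode 'FC' already exists, so the primary key rejects the duplicate.
duplicate = attempt("INSERT INTO stock VALUES ('FC', 'UK')")
```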

Data Security

• Three levels of security
• Authentication
  – Who are you?
• Authorization
  – What do you have privileges to do?
• Encryption
  – Keeping private data secret

A general model of data security

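[Figure: a general model of data security. The user supplies a userid, and identification is checked against identification data held in the user profiles and authorization tables, giving either DBMS access denied or DBMS access approved. A retrieval request then has its authorization checked against user privileges data, giving either request denied or request approved. For an approved request, data is retrieved from the database, passing through encryption processing, and the results of the request are returned to the user.]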

Authenticating mechanisms

• Information remembered by the person
  – Name
  – Account number
  – Password
• Object possessed by the person
  – Badge
  – Plastic card
  – Key
• Personal characteristic
  – Fingerprint
  – Signature
  – Voiceprint
  – Hand size

Firewall

• A device placed between an organization’s network and the Internet
• Monitors and controls traffic between the Internet and the intranet
• Approaches
  – Restrict packets to those with designated IP addresses
  – Restrict access to applications

Authorization tables (views)

Subject/Client | Action | Object | Constraint
Accounting department | Insert | Supplier record | None
Purchase department clerk | Insert | Supplier record | If quantity < 200
Purchase department supervisor | Insert | Delivery record | If quantity ≥ 200
Production department | Read | Delivery record | None
Todd | Modify | Item record | Type and color only
Order processing program | Modify | Sale record | None
Brier | Delete | Supplier record | None

• Indicate authority of each user or group
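An authorization table like this one can be sketched as a list of rules checked before each request. The is_authorized function and the shape of the request dictionary (a quantity value, or a set of fields being modified) are assumptions made for the example.

```python
# (subject, action, object, constraint); constraint is None or a function
# that inspects the request and returns True when the rule applies.
AUTHORIZATION_TABLE = [
    ("Accounting department", "Insert", "Supplier record", None),
    ("Purchase department clerk", "Insert", "Supplier record",
     lambda request: request["quantity"] < 200),
    ("Purchase department supervisor", "Insert", "Delivery record",
     lambda request: request["quantity"] >= 200),
    ("Production department", "Read", "Delivery record", None),
    ("Todd", "Modify", "Item record",
     lambda request: set(request["fields"]) <= {"type", "color"}),
    ("Order processing program", "Modify", "Sale record", None),
    ("Brier", "Delete", "Supplier record", None),
]


def is_authorized(subject, action, obj, request=None):
    """Return True only if a matching rule exists and its constraint holds."""
    for rule_subject, rule_action, rule_obj, constraint in AUTHORIZATION_TABLE:
        if (rule_subject, rule_action, rule_obj) == (subject, action, obj):
            if constraint is None:
                return True
            if request is None:
                return False  # a constrained rule needs request details
            return constraint(request)
    return False  # no rule means no authority
```

For example, is_authorized("Todd", "Modify", "Item record", {"fields": ["price"]}) is refused because Todd may change only type and color.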

SQL authorization

• Views
• Grant
  – Giving object privileges to users
• Revoke
  – Removing privileges
• Roles
  – Assigning groups of privileges to groups of people
• System privileges
  – Privileges to create, alter or drop objects

Encryption

• Encryption is as old as writing
• Sensitive information needs to remain secure
• Critical to electronic commerce
• Encryption hides the meaning of a message
• Decryption reveals the meaning of an encrypted message

Public key encryption

[Figure: the sender encrypts with the receiver’s public key; the receiver decrypts with the receiver’s private key.]

• Receiver has two keys: public and private
• Sender uses public key to encrypt
• Receiver uses private (secret) key to decrypt
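The roles of the two keys can be illustrated with textbook RSA on tiny numbers. This is a toy sketch only: the slides do not name an algorithm, RSA is swapped in as one well-known example, and the primes, exponent, and message are illustrative values far too small to be secure. The modular inverse via pow requires Python 3.8 or later.

```python
# Receiver generates a key pair (tiny illustrative primes, not secure).
p, q = 61, 53
n = p * q                 # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)       # published by the receiver
private_key = (d, n)      # kept secret by the receiver


def encrypt(message, key):
    """Sender: encrypt an integer message (0 <= message < n) with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Receiver: recover the message with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
```

Anyone may hold the public key and encrypt, but only the holder of the private key can recover the message.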

Signing

• Message authentication
• Identifies the message sender
• Requires digital certificates issued by a certificate authority (CA), such as Verisign®

[Figure: the sender signs with the sender’s private key; the receiver verifies with the sender’s public key.]
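Signing reverses the roles of the keys, which a toy RSA sketch can show. As before, this is an illustration under assumptions: the algorithm choice, the tiny primes, the reduction of the SHA-256 digest modulo n, and the sample message are all invented for the example and are not secure. Certificates, which tie a public key to an identity, are not modeled.

```python
import hashlib

# Sender's key pair (tiny illustrative primes, not secure).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # sender's private exponent (Python 3.8+)


def digest(message):
    """Reduce a SHA-256 digest to a number below n (toy-sized on purpose)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message, private_exponent):
    """Sender: sign the message digest with the private key."""
    return pow(digest(message), private_exponent, n)


def verify(message, signature, public_exponent):
    """Receiver: check the signature with the sender's public key."""
    return pow(signature, public_exponent, n) == digest(message)


message = b"Ship 80 units of P10"
signature = sign(message, d)
genuine = verify(message, signature, e)
forged = verify(message, (signature + 1) % n, e)  # an altered signature fails
```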

Monitoring activity

• Audit trail analysis
  – Time and date stamp all transactions
  – Audit trail must be kept secure
• Monitor a sequence of queries
  – Tracker queries
  – Transaction log mining