What is a database?
data          database
information   information base
knowledge     knowledge base
wisdom        ???
philosophy    !!!
???           ...
020-Intro: 1HKU CSIS0278[AB] 2002-2003
Introduction to Database Systems
What is a database
◆ A database is a collection of data items
• Usually owned by a single enterprise or organization
• Contains facts the enterprise or organization cares
about
◆ The data items can be text, numbers, dates, sound files,
music, video, among others
◆ Searched by using a key
Database application example: Using an ATM
◆ A database in the bank keeps data about your account
◆ Passwords are verified to allow transactions to be done
on your account
◆ Transactions are recorded in the central database of the
bank
◆ Ensures that no two transactions can be done in parallel
in a way that creates anomalies
Database application example: Searching books in a library
◆ The library INNOPAC system keeps data such as book
titles, call numbers, locations, table of contents, and
user loan records
◆ The content of the database is searched when you query
it for records of the title of a book
◆ The loan status of a book and the user's borrowing status
are changed when you check out a book
◆ The system allows multiple database transactions to be
carried out at the same time
Database application example: Purchasing from a supermarket
◆ The supermarket database keeps data such as product
bar codes, product names, and prices
◆ Products are scanned at the checkout counter and their
prices are looked up
◆ Promotion discount information is also kept
◆ The database is also used in acquisition of products by
the supermarket
Database application example: ICQ (uh oh!)
◆ An ICQ server is a database containing user information,
contact lists (from 2001b on), and online status
◆ When you connect, your ICQ number is sent to an ICQ
server
◆ The server checks the online status of those in your
contact list and shows it in your list
◆ The server also informs those in your contact list of your
online status
Database application example: Hostname resolution
◆ Host names (e.g., virtue.csis.hku.hk) need to be
resolved into IP addresses (147.8.176.10)
◆ Each machine may keep a host table in which
host-name-to-IP-address mappings are kept
◆ The table can be seen as a local database
◆ If a name is not found in the table, the machine may
consult a Domain Name System (DNS) server
◆ The DNS server may be located on a remote machine
◆ A DNS server may refer your request to a higher-level
DNS server for address resolution
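The lookup order described above can be sketched in a few lines of Python. This is only a toy model, not how a real resolver works: the NameServer class, the resolve function, and the example.org names and 192.0.2.x addresses are all made up for illustration. Only the virtue.csis.hku.hk entry comes from the slide.

```python
# Local host table: a tiny "database" of name -> address mappings.
HOST_TABLE = {"virtue.csis.hku.hk": "147.8.176.10"}


class NameServer:
    """A toy name server that may refer unknown names to a higher-level server."""

    def __init__(self, records, parent=None):
        self.records = records    # names this server knows about (hypothetical data)
        self.parent = parent      # higher-level server, or None at the top

    def resolve(self, name):
        if name in self.records:
            return self.records[name]
        if self.parent is not None:
            return self.parent.resolve(name)   # refer the request upward
        return None                            # nobody knows this name


def resolve(name, host_table, dns):
    """Try the local host table first, then fall back to the name server."""
    if name in host_table:
        return host_table[name]
    return dns.resolve(name)


# Hypothetical two-level hierarchy of name servers.
root = NameServer({"www.example.org": "192.0.2.7"})
local_dns = NameServer({"mail.example.org": "192.0.2.25"}, parent=root)

print(resolve("virtue.csis.hku.hk", HOST_TABLE, local_dns))  # found locally
print(resolve("www.example.org", HOST_TABLE, local_dns))     # referred to root
```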
Advanced database applications: Deriving data from databases
◆ Aggregate query:
• Given a database about your favorite singer:
◦ album titles
◦ album release date
◦ song titles in each album
• How many songs does her/his latest album contain?
• How many songs has he/she released in total?
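As a rough illustration, the two questions above can be written as aggregate queries in SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and all album and song data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album (title TEXT PRIMARY KEY, release_date TEXT);
    CREATE TABLE song  (title TEXT, album_title TEXT REFERENCES album(title));
    -- Hypothetical data, for illustration only.
    INSERT INTO album VALUES ('First Album', '2000-05-01');
    INSERT INTO album VALUES ('Second Album', '2002-09-15');
    INSERT INTO song VALUES ('Song A', 'First Album');
    INSERT INTO song VALUES ('Song B', 'First Album');
    INSERT INTO song VALUES ('Song C', 'Second Album');
    INSERT INTO song VALUES ('Song D', 'Second Album');
    INSERT INTO song VALUES ('Song E', 'Second Album');
""")

# How many songs does the latest album contain?
latest_count = conn.execute("""
    SELECT COUNT(*) FROM song
    WHERE album_title = (SELECT title FROM album
                         ORDER BY release_date DESC LIMIT 1)
""").fetchone()[0]

# How many songs have been released in total?
total_count = conn.execute("SELECT COUNT(*) FROM song").fetchone()[0]

print(latest_count, total_count)
```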
Advanced database applications: Discovering information
◆ Discovering information. Remember the spectrum from
data to philosophy?
◆ Association rule mining:
• Given: a supermarket database containing transaction
information about the set of items bought together
by customers
• Which combinations of items are most frequently
bought together?
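A minimal sketch of the simplest version of this question: count how often each pair of items appears in the same transaction. This is only the pair-counting step, not a full association rule mining algorithm, and the transactions below are invented sample data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "eggs", "milk"},
]


def pair_counts(transactions):
    """Count, for every pair of items, the transactions that contain both."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
most_common_pair, support = counts.most_common(1)[0]
print(most_common_pair, support)
```

Counting every pair does not scale to large databases or larger item sets, which is why dedicated mining algorithms exist.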
What are the main issues of database design?
{effective, efficient} × {storage, retrieval} of data items
Effective retrieval
◆ Convenient and painless retrieval of data
◆ Special programs available to suit applications
◆ Example: ATMs retrieve your account information
effectively: only a card and a few keypresses are
needed. (By the way, what does ATM stand for?)
Efficient retrieval
◆ Fast response to retrieval requests
◆ Enterprises maintain huge databases
• How many credit cards do you have?
• How many mobile phone numbers are there in Hong
Kong? (Visit the OFTA site for an answer)
◆ Indexes on data are required for efficient data access
◆ Concurrent access to databases
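To see why an index helps, compare a linear scan with a lookup through an index built once on the search key. The sketch below uses a Python dict as a stand-in for an index. The record layout and the function names scan and lookup are assumptions made for the example.

```python
# Hypothetical records; only the account numbers and names echo the slides.
records = [
    {"accountNo": "0102000001", "name": "Ogino Chihiro"},
    {"accountNo": "0102000002", "name": "Haku"},
]


def scan(records, account_no):
    """Linear scan: examines records one by one until the key matches."""
    for rec in records:
        if rec["accountNo"] == account_no:
            return rec
    return None


# Build the index once: key -> position of the record.
index = {rec["accountNo"]: pos for pos, rec in enumerate(records)}


def lookup(records, index, account_no):
    """Indexed lookup: jumps straight to the record position."""
    pos = index.get(account_no)
    return records[pos] if pos is not None else None


print(lookup(records, index, "0102000002")["name"])
```

The scan takes time proportional to the number of records, while the indexed lookup does not, at the cost of keeping the index up to date whenever records change.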
Effective storage
◆ Convenient creation and modification of data
◆ Retrieved data is consistent with stored data
◆ Special programs available to suit application
◆ Example: file systems are effective in storage of digital
data
◆ What will happen if one process deletes a file while
another is accessing it in Unix? In Windows?
Efficient storage
◆ Data items use up only a limited amount of storage
space
◆ Enterprises maintain huge databases
• How much disk space is needed for a credit card
database in which each record is 64 KB in size?
◆ Reduction of redundant information is needed
◆ Sharing of information via suitable database design
Database management systems
◆ To achieve the effectiveness and efficiency goals, we
need database management systems (DBMSes)
◆ A DBMS should:
• Hide low-level implementation details of the database
from most users
• Provide database operations
• Implement database operations efficiently
• Allow multiple users to access the database
concurrently
File systems store data effectively,
why not use flat files for database storage?
Flat file
◆ General purpose operating systems support file systems
of one kind or another
◆ A file can be seen as a stream of bytes
◆ Data items can be serialized and modeled as a stream of
bytes
◆ Files can be used to implement databases
Flat file example
Savings Account: array of record
    accountNo: char(10); (* unique account number *)
    balance: integer; (* balance *)
    name: char(18); (* customer name *)
    address: char(64); (* customer address *)
end record;
Current Account: array of record
    accountNo: char(10); (* unique account number *)
    balance: integer; (* balance *)
    overdraftLimit: integer; (* overdraft limit *)
    name: char(18); (* customer name *)
    address: char(64); (* customer address *)
end record;
Flat files as byte streams
Savings Account:
    accountNo: char(10);
    balance: integer;
    name: char(18);
    address: char(64);
0102000001????Ogino Chihiro
Somewhere in Japan
0102000002????Haku
Aburaya
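One way to picture the serialization is with Python's struct module. The sketch below packs a savings record into a fixed-size byte string following the layout above. It assumes a 4-byte big-endian integer for the balance and NUL padding for short text fields. The balance of 500 is a made-up value, since the slide shows the balance bytes only as "????".

```python
import struct

# ">"  : big-endian byte order, no padding between fields
# 10s  : accountNo, char(10)
# i    : balance, 4-byte integer (assumed size)
# 18s  : name, char(18)
# 64s  : address, char(64)
SAVINGS = struct.Struct(">10si18s64s")


def pack_savings(account_no, balance, name, address):
    """Serialize one savings record; short strings are padded with NUL bytes."""
    return SAVINGS.pack(account_no.encode("ascii"), balance,
                        name.encode("ascii"), address.encode("ascii"))


# The balance 500 is hypothetical.
record = pack_savings("0102000001", 500, "Ogino Chihiro", "Somewhere in Japan")

print(SAVINGS.size)        # bytes per record
print(record[10:14].hex()) # the four balance bytes
```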
Flat files as byte streams
Current Account:
    accountNo: char(10);
    balance: integer;
    overdraftLimit: integer;
    name: char(18);
    address: char(64);
0102000001????????Ogino Chihiro
Somewhere in Japan
0102000002????????Haku
Aburaya
So what?
◆ The file formats are different
◆ Discrepancies are possible because different people may
handle different parts of the database
◆ How do you synchronize data in different address books?
• Mobile phone book
• Society member/classmate list
• Little handy cards prepared by friends
• The one in your spreadsheet file
• The one in your PDA
Problems!
◆ Data redundancy
• Wastes storage space
• May cause data inconsistency
◆ Data dependence
• Causes proliferation of application programs
• Data correctness depends on file formats
◆ Data isolation
◆ Atomicity problem
◆ Concurrent access anomalies
◆ Data access control problem
Data redundancy
Savings Account:
0102000001????Ogino Chihiro Somewhere in Japan
0102000002????Haku Aburaya
Current Account:
0102000001????????Ogino Chihiro Somewhere in Japan
0102000002????????Haku Aburaya
◆ The same piece of information may be recorded multiple
times
• Names of account owners
• Addresses of account owners
◆ Wastes storage space
◆ May cause data inconsistency
Data inconsistency
Savings Account:
0102000001????Ogino Chihiro Somewhere in Japan
0102000002????Haku Aburaya
Current Account:
0102000001????????OGINO Chihiro Somewhere in Japan
0102000002????????Haku Aburaya
◆ Chihiro’s name is not consistent across the two files
◆ Suppose Chihiro has changed her address to Tokyo.
Under what circumstances would the two files become inconsistent?
Data dependence
◆ List the names of all customers who live in Pokfulam:
write a special program
◆ List the names of all customers who live in Pokfulam
having more than 10000 dollars in their balance: write
another special program
◆ A special program needs to be written for every query
File format dependence
◆ Correctness of data may depend on the file format
◆ For example, the integer 291 decimal, which is 123
hexadecimal, is stored as follows:
Big endian mode: 00 00 01 23
Little endian mode: 23 01 00 00
◆ C and C++ storage depends on the byte order ("byte sex")
of the CPU
◆ Java stores integers in big endian
◆ Correctly interpreting the integer depends on the CPU
and programming language used
◆ 587268096 is very different from 291!
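The byte patterns above can be reproduced with Python's struct module, which lets the byte order be chosen explicitly. This is a small sketch that assumes 4-byte integers. It packs 291 both ways and then deliberately reads the little-endian bytes as if they were big-endian.

```python
import struct

big = struct.pack(">i", 291)      # big endian: most significant byte first
little = struct.pack("<i", 291)   # little endian: least significant byte first

# Reading little-endian bytes with the wrong (big-endian) assumption.
misread = struct.unpack(">i", little)[0]

print(big.hex(" "))     # 00 00 01 23
print(little.hex(" "))  # 23 01 00 00
print(misread)          # 587268096
```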
Data access: what and how
◆ Programs tell the computer how to obtain the required
data
◆ Example: the 11th to 14th bytes of the Savings account
file contain the balance of the first record as an integer
stored in big endian format
◆ User queries specify what is needed
◆ Example: what is the address of Haku?
◆ Programmers who know the file format can transform
the “what” to the “how”
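A sketch of the "how" in Python: the program has to know the record size, the field offsets, and the byte order before it can answer even a simple question. The 96-byte record size follows from the savings layout (10 + 4 + 18 + 64) under the assumption of 4-byte integers and NUL padding. The balances are hypothetical.

```python
import struct

LAYOUT = ">10si18s64s"                    # accountNo, balance, name, address
RECORD_SIZE = struct.calcsize(LAYOUT)     # 96 bytes under these assumptions


def read_balance(data, record_no):
    """The 'how': skip whole records, then the 10-byte accountNo, read 4 bytes."""
    start = record_no * RECORD_SIZE + 10
    return struct.unpack(">i", data[start:start + 4])[0]


def find_address(data, wanted_name):
    """Answer 'what is the address of X?' by decoding every record in turn."""
    for offset in range(0, len(data), RECORD_SIZE):
        _, _, name, address = struct.unpack(LAYOUT, data[offset:offset + RECORD_SIZE])
        if name.rstrip(b"\x00").decode("ascii") == wanted_name:
            return address.rstrip(b"\x00").decode("ascii")
    return None


# Stand-in for the contents of the Savings account file (balances are made up).
data = (struct.pack(LAYOUT, b"0102000001", 500, b"Ogino Chihiro", b"Somewhere in Japan")
        + struct.pack(LAYOUT, b"0102000002", 300, b"Haku", b"Aburaya"))

print(read_balance(data, 0))
print(find_address(data, "Haku"))
```

If the file format changes, for example when a field is added or the byte order differs, both functions have to be rewritten.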
More “what”, less “how”
◆ A way out: hide implementation details (file formats,
byte order issues, etc.) from users
◆ Query languages are designed to do that: users only
need to write statements that tell a DBMS what they
want, rather than programs that contain instructions
on how to obtain the required data
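For comparison, here is the question "what is the address of Haku?" stated in a query language. The sketch uses SQL through Python's built-in sqlite3 module. The table name savings and the balances are assumptions made for the example. Notice that the query mentions no offsets, record sizes, or byte order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE savings (
    accountNo TEXT PRIMARY KEY, balance INTEGER, name TEXT, address TEXT)""")
conn.executemany("INSERT INTO savings VALUES (?, ?, ?, ?)", [
    ("0102000001", 500, "Ogino Chihiro", "Somewhere in Japan"),  # balance is made up
    ("0102000002", 300, "Haku", "Aburaya"),                      # balance is made up
])

# The query states WHAT is wanted; the DBMS decides HOW to find it.
row = conn.execute("SELECT address FROM savings WHERE name = ?", ("Haku",)).fetchone()
haku_address = row[0]
print(haku_address)
```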
Data isolation
◆ Find the names of all customers who have both savings
and current accounts. How?
◆ What if there are more account types?
◆ Scattering data in different files (and probably handled
by different people) makes programs that require access
to more than one file difficult to write
◆ DBMSes provide a central repository of data shared
among different users to avoid the data isolation
problem
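Once both account types live in one repository, the first question above becomes a single query. The sketch below uses SQLite through Python's sqlite3 module. The table names and the data are hypothetical, and matching customers by name alone is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE savings_account (accountNo TEXT, name TEXT);
    CREATE TABLE current_account (accountNo TEXT, name TEXT);
    -- Hypothetical data: here only Chihiro holds both kinds of account.
    INSERT INTO savings_account VALUES ('0102000001', 'Ogino Chihiro');
    INSERT INTO savings_account VALUES ('0102000002', 'Haku');
    INSERT INTO current_account VALUES ('0102000001', 'Ogino Chihiro');
""")

# Customers who appear in both tables.
both = sorted(row[0] for row in conn.execute("""
    SELECT name FROM savings_account
    INTERSECT
    SELECT name FROM current_account
"""))
print(both)
```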
Atomicity problem: A fund transfer scenario
◆ Suppose Chihiro wants to transfer HKD 500 from her
Savings account to Haku’s Current account
◆ A program for fund transfer can be used to handle that
◆ The program has to modify the contents of files for
both accounts
◆ Assume that files used are Savings account and
Current account
Atomicity problem: Fund transfer steps
1. Open the file Savings account
2. Open the file Current account
3. Retrieve record for Chihiro’s savings account
4. Deduct HKD 500 from Chihiro’s record
5. Write the updated record to the Savings account file
6. Retrieve record for Haku’s current account
7. Add HKD 500 to Haku’s record
8. Write the updated record to the Current account file
9. Close the Current account file
10. Close the Savings account file
Fund transfer troubles
◆ Consider the ten fund transfer steps listed earlier. Assume
that the file system doesn’t do buffering; write operations
are immediately reflected in files. What if the system
crashes after Step ??? Step ??? Step ??? Step ???
◆ What if there is buffering?
Atomicity
◆ Computers may fail: power failure, disk crash, virus
infection, hacker intrusion, . . .
◆ These should not cause data corruption
◆ Half-executed transactions (e.g., fund transfer
operations) should be completed or undone
◆ Transactions should be atomic
◆ All-or-nothing property needed
◆ DBMSes handle database transactions atomically
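The all-or-nothing behaviour can be tried out with SQLite through Python's sqlite3 module. In this sketch an exception stands in for a crash between the two updates. The account table, the starting balances, and the crash_midway flag are assumptions made for the example, and real crash recovery involves more than is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Chihiro", 1000), ("Haku", 200)])   # hypothetical balances
conn.commit()


def transfer(conn, src, dst, amount, crash_midway=False):
    """Move money as one transaction: both updates happen, or neither does."""
    # "with conn" commits if the block completes and rolls back if it raises.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE owner = ?",
                     (amount, src))
        if crash_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE account SET balance = balance + ? WHERE owner = ?",
                     (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM account"))


try:
    transfer(conn, "Chihiro", "Haku", 500, crash_midway=True)
except RuntimeError:
    pass
after_crash = balances(conn)      # the half-done deduction has been undone

transfer(conn, "Chihiro", "Haku", 500)
after_success = balances(conn)

print(after_crash, after_success)
```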
Concurrent access: handling deposits
◆ Suppose the following method is used in a bank to
handle deposits:
void deposit(Acct acct, double sum) {
    acct.open();
    double bal = acct.getBalance();  // read the balance into a local copy
    bal += sum;                      // update only the local copy
    acct.setBalance(bal);            // write the local copy back
    acct.close();
}
A serial deposit scenario
Deadline is far, only early birds pay the tuition fee:
Time↓
// in HKU branch                  HKUAcct.   // in Central branch
deposit(Account.HKUAcct,21050)    balance    deposit(Account.HKUAcct,21050)
------------------------------    --------   ------------------------------
acct.open();                      0
double bal=acct.getBalance();     0
bal+=sum;                         0
acct.setBalance(bal);             21050
acct.close();                     21050
                                  21050      acct.open();
                                  21050      double bal=acct.getBalance();
                                  21050      bal+=sum;
                                  42100      acct.setBalance(bal);
                                  42100      acct.close();
A parallel deposit scenario
Today is the deadline, people rush to pay the tuition fee:
Time↓
// in HKU branch                  HKUAcct.   // in Central branch
deposit(Account.HKUAcct,21050)    balance    deposit(Account.HKUAcct,21050)
------------------------------    --------   ------------------------------
acct.open();                      42100
double bal=acct.getBalance();     42100
bal+=sum;                         42100
                                  42100      acct.open();
                                  42100      double bal=acct.getBalance();
                                  42100      bal+=sum;
                                  63150      acct.setBalance(bal);
                                  63150      acct.close();
acct.setBalance(bal);             63150
acct.close();                     63150
Two people have paid, why isn’t the balance 84200?
Concurrency control
◆ Changes in local copies of data items do not affect
others using the data item
◆ Read/write access to files is not restricted
◆ Concurrency control needed: disallow conflicting
accesses to the same data item by different
transactions
◆ A simple example: lock the record whenever it needs to
be accessed
◆ Transactions may need to wait, abort, or restart
◆ Deadlock/livelock problem
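The lost update and the simple locking remedy can be sketched in Python. The first function replays the problematic interleaving by hand, so the result is the same on every run. The second guards the read-modify-write with a lock and runs two real threads. The Account class and the function names are assumptions made for this example, and a DBMS uses more elaborate schemes than a single lock per record.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()   # one lock per record


def lost_update(acct, amount):
    """Replay the interleaving in which both branches read before either writes."""
    bal_hku = acct.balance        # HKU branch reads the balance
    bal_hku += amount
    bal_central = acct.balance    # Central branch reads the same old balance
    bal_central += amount
    acct.balance = bal_central    # Central branch writes its local copy
    acct.balance = bal_hku        # HKU branch overwrites it: one deposit is lost
    return acct.balance


def deposit_locked(acct, amount):
    """Hold the record's lock for the whole read-modify-write sequence."""
    with acct.lock:
        bal = acct.balance
        bal += amount
        acct.balance = bal


unsafe = Account(42100)
unsafe_result = lost_update(unsafe, 21050)

safe = Account(42100)
threads = [threading.Thread(target=deposit_locked, args=(safe, 21050))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(unsafe_result, safe.balance)
```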
Data access control
◆ The direct sales department of the bank only needs to
know customer names and addresses
◆ Account balance is sensitive information
◆ No way to restrict access to only these two fields
◆ Difficult to limit access to part of a flat file
◆ DBMSes provide different views of databases (subsets of
data in the database) to different users
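A view can be sketched with SQLite through Python's sqlite3 module. The view direct_sales below exposes only the name and address columns. The table name, view name, and balance value are invented for the example. SQLite itself has no user accounts or GRANT statement, so this only shows the "subset of data" idea; in a multi-user DBMS, the direct sales department would be given permission on the view and not on the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE savings (accountNo TEXT, balance INTEGER, name TEXT, address TEXT);
    -- The balance value is hypothetical.
    INSERT INTO savings VALUES ('0102000001', 500, 'Ogino Chihiro', 'Somewhere in Japan');
    -- The view leaves out the account number and the sensitive balance.
    CREATE VIEW direct_sales AS SELECT name, address FROM savings;
""")

cursor = conn.execute("SELECT * FROM direct_sales")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(columns)
print(rows)
```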
How can DBMSes handle all those problems?
◆ Data redundancy
• Wastes storage space
• May cause data inconsistency
◆ Data dependence
• Causes proliferation of application programs
• Data correctness depends on file formats
◆ Data isolation
◆ Atomicity problem
◆ Concurrent access anomalies
◆ Data access control problem
Let’s see what DBMSes can offer.