DatabasesDatabasesDatabasesDatabases
INTD 388 – Spring 2008INTD 388 – Spring 2008
by A.Guercio and E. Walker
Definitions • A database is a collection of related data
• A database can be of any size and of varying complexity
• A database management system (DBMS) is a collection of programs that enables users to define, create and maintain a database
DatabaseNomenclature
Data Mart: A searchable databasesystem organized according to the user's needs. It has data that focus onspecific to a particular research project
Data Repository: A database used as an information storage facility, with minimal analysis or querying functionality.Used to mirror existing databasefor concurrent access to data.
Data Warehouse: A central database,frequently large, that can provide authenticated researchers with access to all of an institution's information. It is populated of data from a variety of non-compatible sources.
Data Life Cycle
Data Dictionary: Defines data format standards.Ex. Date of Birth: MM/DD/YYis tranformed to the standardData of Birth: DD/MM/YY
Creation & Acquisition:collection and insertionof data in the system
Use: Many, including simulation,statistical analysis, visualization,communication. Of course private data(i.e. not releasable) should be protected at this stage.
Archiving: Making data available for future use.Issues: Space Requirements, Completeness of Data, Longevity, Security, Archival Process, …
Repurposing: When data are archived, available data can be used for different purposes.Ex. Linear sequences captured to discover new genes are commonly repurposed to support 3D visualization of protein structures
Data Life Cycle
Centralized: Concentrates all organizational activity in one location.Disadvantages: When the central system crashes all the data are not available. Distributed: Supports a
variety of hardware and software and uses physically disparate storage media, each computationally independentDisadvantages: We need to develop interfaces between databases (n DB require 2n-1 interfaces)
Database Architecture
What is a Database System?
• Computerized System for the management of records.– Example: Oracle, Sybase, SQL Server, DB2…
• The user of the system can perform various operations:
•add a new file to the database;•insert new data in an existing file;•search for data;•update data;•remove data; remove file.
Advantages of Database Systems
• Redundancy Control• Data Consistency and Integrity• Data Sharing and Integration • Security• Improved Maintenance• Concurrency (without data loss)• Backup & Recovery services
Disadvantages…• Complexity• Size• Cost (Hardware & Software)• Performance • Conversion• Risk of failure
ExampleBin Wine Producer Year Bottles Ready
2 Chardonnay Buena Vista 88 1 91
3 Chardonnay Louis Martini 89 5 90
21 Fume Blanc Robt. Mondavi
88 4 91
22 Fume Blanc Mirassou 87 2 90
72 Gamay Robt. Mondavi
85 2 90
SearchSearchSearchSearch
SELECT WINE, BIN, PRODUCERSELECT WINE, BIN, PRODUCER
FROM CELLAR WHEREFROM CELLAR WHERE
READY=91;READY=91;
InsertionInsertionInsertionInsertion
INSERT INTO CELLAR INSERT INTO CELLAR
VALUES (S3, ‘PINOT NOIR’,VALUES (S3, ‘PINOT NOIR’,
‘‘SAINTSBURY’, 87, 1, 93);SAINTSBURY’, 87, 1, 93);
UpdateUpdateUpdateUpdate
UPDATE CELLAR UPDATE CELLAR
SET BOTTLES=4SET BOTTLES=4
WHERE BIN=3;WHERE BIN=3;
The Relational Data Model
• The relational model was introduced by E.F. Codd in 1970
• The model is based on the mathematical structure known as the relation – Informally, a relation resembles a table– More precisely, a relation is a set of n-tuples r = {t1, t2, …, tm} (i.e. a set of rows) and there is no order in the rows or in the tuples.Therefore in order to distinguish one row from another you need a unique key
•Ex. of a customer record:S.S. First_Name Last_NameAddress
– S.S. is the unique key that distinguishes each customer record
Parts of a Relational Database
Schema
Table(relation)
name address age
Column*(attribute)
Row (tuple)
* Values of an attribute are chosen from its domain,e.g. the domain of “age” is integers > 0
Relational Model Constraints
• No two tuples are equal, thus some set of attributes must be unique, they are the key– If there are multiple keys in a schema, they are called candidate keys
– One of the candidate keys may be designated as a primary key and be used to identify tuples in the relation
– A foreign key is an attribute in a relation which is the primary key of another relation Ex. Customer(S.S., First_Name, Last_Name, Address)Checking Account(Account_Number, S.S., Balance)
Relational Model Constraints
• The entity integrity constraint states that no primary key value can be null– We don’t store descriptions of entities we can’t identify• Ex. We do not allow to insert a customer without inserting the S.S.
• The referential integrity constraint is used to maintain consistency among the tuples of two relations
• Ex. We do not insert the checking account of a customer for whom there is no data in the Customer table
• The integrity constraints may be displayed diagrammatically by drawing an arrow from the relation with the foreign key to the relation with the primary key
What is good in relational DB?
• Advantage: We have data organized efficiently and in a way that saves space.– Ex. We do need to repeat all the information about the customer who has 500 accounts in our bank!
– Flat files with their independent tables would not allow that.
Flat File (One Table)
Name Class Location Professor
Minnie Mouse
English 101
BC H. Higgins
Minnie Mouse
Psych 240
Gerst. 120 Lucy van Pelt
Minnie Mouse
Accounting 110
Hins. 209 Scrooge McD
Goofy Psych 240
Gerst. 120 Lucy van Pelt
Goofy English 101
BC H. Higgins
Relational Database Has Multiple Tables
Class Location
Prof
English 101
BC H. Higgins
Psych 240
Gerst. 120
Lucy van Pelt
Accounting 110
Hins. 209
Scrooge McD
Name Class
Minnie Mouse
English 101
Minnie Mouse
Psych 240
Minnie Mouse
Accounting 110
Goofy Psych 240
Goofy English 101
Schedule Enrollment
SQL A Relational DB Language
• SQL (Structured Query Language) was developed at IBM as part of an experimental relational system called System R
• Along with QUEL (based on the relational calculus), SQL is one of the earliest DB languages
• SQL includes statements to define, manipulate and interrogate the DB
• Ex: create table Student ( Name char[20], GradYear integer, )
Basic SQL Select statement
• SELECT attribute• FROM table(s)• WHERE conditions
• Examples– Select Class from Enrollment where Name = ‘Minnie Mouse’
– Select Class from Schedule where Location = ‘BC’
More Complex Queries
• Multiple Tables joined together• More conditions in where, joined by and, or, not
– Select Name, Class, Locationfrom Schedule, Enrollmentwhere Name = Goofy and
Schedule.Class = Enrollment.Class
– Select Prof, Namefrom Schedule, Enrollmentwhere Prof = ‘Scrooge McD’ and
Schedule.Class = Enrollment.Class
More WHERE conditions
SELECT staffNO, fName, lName, salaryFROM StaffWHERE…
…salary BETWEEN 20,000 and 30,000…position IN {‘Manager’, ‘Supervisor’}…fName LIKE {‘Al%n’} Allen or Alan or Allison …
…lName LIKE {‘_ _ _ _’} all 4-letter names…mName IS NOT NULL has a middle name
Sorting the Results• List all rooms, sorted by price (most expensive first)SELECT hotelName, roomNo, roomType, priceFROM Hotel, RoomWHERE Hotel.hotelNo = Room.hotelNoORDER BY price DESC;
• Multiple sort keys can be specified, e.g.…ORDER BY lName, fName ASC;