relational database design good database design principles
Relational Database Design 1
RELATIONAL DATABASE DESIGN
Basic Concepts
• a database is a collection of logically related records
• a relational database stores its data in two-dimensional tables
• a table is a two-dimensional structure made up of rows (tuples, records) and columns (attributes, fields)
• example: a table of students engaged in sports activities, where a student is allowed to participate in at most one activity

StudentID  Activity  Fee
100        Skiing    200
150        Swimming   50
175        Squash     50
200        Swimming   50

Table Characteristics
• each row is unique and stores data about one entity
• row order is unimportant
• each column has a unique attribute name
• each column (attribute) description (metadata) is stored in the database
  • Access metadata is stored and manipulated via the Table Design View grid
• column order is unimportant
• all entries in a column have the same data type
  • Access examples: Text(50), Number(Integer), Date/Time
• each cell contains atomic data: no lists or sub-tables
Relational Database Design 2
Primary Keys
• a primary key is an attribute or a collection of attributes whose value(s) uniquely identify each row in a relation
• a primary key should be minimal: it should not contain unnecessary attributes

StudentID  Activity  Fee
100        Skiing    200
150        Swimming   50
175        Squash     50
200        Swimming   50

• we assume that a student is allowed to participate in at most one activity
• the only possible primary key in the above table is StudentID
• sometimes there is more than one possible choice; each possible choice is called a candidate key
• what if we allow the students to participate in more than one activity?

StudentID  Activity  Fee
100        Skiing    200
100        Golf       65
175        Squash     50
175        Swimming   50
200        Swimming   50
200        Golf       65

• now the only possible primary key is the combined value of (StudentID, Activity)
• such a multi-attribute primary key is called a composite key or concatenated key
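The uniqueness requirement can be checked mechanically against the two sample tables. The sketch below is only an illustration: the helper name `is_unique_key` is hypothetical, and passing the check on sample data is necessary but not sufficient, since a primary key is a rule about all rows the table may ever hold.

```python
def is_unique_key(rows, columns):
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


COLUMNS = ("StudentID", "Activity", "Fee")

# At most one activity per student (first table).
one_activity = [dict(zip(COLUMNS, r)) for r in [
    (100, "Skiing", 200), (150, "Swimming", 50),
    (175, "Squash", 50), (200, "Swimming", 50),
]]

# Several activities per student (second table).
many_activities = [dict(zip(COLUMNS, r)) for r in [
    (100, "Skiing", 200), (100, "Golf", 65),
    (175, "Squash", 50), (175, "Swimming", 50),
    (200, "Swimming", 50), (200, "Golf", 65),
]]

print(is_unique_key(one_activity, ["StudentID"]))                 # True
print(is_unique_key(many_activities, ["StudentID"]))              # False
print(is_unique_key(many_activities, ["StudentID", "Activity"]))  # True
```

Once students may take several activities, StudentID repeats, so only the pair (StudentID, Activity) still identifies a row.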
Relational Database Design 3
Composite Keys
• a table can only have one primary key
• but sometimes the primary key can be made up of several fields
• concatenation means putting two things next to one another: the concatenation of “burger” and “foo” is “burgerfoo”
• consider the following table of cars

LicensePlate  State  Make    Model     Year
LVR120        NJ     Honda   Accord    2003
BCX50P        NJ     Buick   Regal     1998
LVR120        CT     Toyota  Corolla   2002
908HYY        MA     Ford    Windstar  2001
UHP33X        NJ     Nissan  Altima    2006

• LicensePlate is not a possible primary key, because two different cars can have the same license plate number if they’re from different states
• but if we concatenate LicensePlate and State, the resulting value of (LicensePlate, State) must be unique
  • example: “LVR120NJ” and “LVR120CT”
• therefore, (LicensePlate, State) is a possible primary key (a candidate key)
• sometimes we may invent a new attribute to serve as a primary key (sometimes called a synthetic key)
  • if no suitable primary key is available
  • or, to avoid composite keys
  • in Access, “Autonumber” fields can serve this purpose
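As a rough illustration of how a database engine enforces a composite key, the sketch below declares (LicensePlate, State) as the primary key in SQLite through Python's standard `sqlite3` module. The slides use Access; SQLite, the table name CARS, and the extra "Ford Focus" row are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CARS (
        LicensePlate TEXT NOT NULL,
        State        TEXT NOT NULL,
        Make         TEXT,
        Model        TEXT,
        Year         INTEGER,
        PRIMARY KEY (LicensePlate, State)  -- composite key
    )
""")
conn.executemany("INSERT INTO CARS VALUES (?, ?, ?, ?, ?)", [
    ("LVR120", "NJ", "Honda", "Accord", 2003),
    ("BCX50P", "NJ", "Buick", "Regal", 1998),
    ("LVR120", "CT", "Toyota", "Corolla", 2002),  # same plate, different state: allowed
    ("908HYY", "MA", "Ford", "Windstar", 2001),
    ("UHP33X", "NJ", "Nissan", "Altima", 2006),
])

# Hypothetical extra row that repeats an existing (LicensePlate, State) pair.
try:
    conn.execute("INSERT INTO CARS VALUES ('LVR120', 'NJ', 'Ford', 'Focus', 2010)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

car_count = conn.execute("SELECT COUNT(*) FROM CARS").fetchone()[0]
print(duplicate_rejected, car_count)
```

The two LVR120 cars coexist because their states differ, while a second LVR120/NJ row is refused.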
Relational Database Design 4
Foreign Keys
• a foreign key is an attribute or a collection of attributes whose values are intended to match the primary key of some related record (usually in a different table)
• example: the STATE and CITY tables below

STATE table:
StateAbbrev  StateName     UnionOrder  StateBird       StatePopulation
CT           Connecticut    5          American robin        3,287,116
MI           Michigan      26          robin                 9,295,297
SD           South Dakota  40          pheasant                696,004
TN           Tennessee     16          mocking bird          4,877,185
TX           Texas         28          mocking bird         16,986,510

CITY table:
StateAbbrev  CityName   CityPopulation
CT           Hartford          139,739
CT           Madison            14,031
CT           Portland            8,418
MI           Lansing           127,321
SD           Madison             6,257
SD           Pierre             12,906
TN           Nashville         488,374
TX           Austin            465,622
TX           Portland           12,224

• primary key in STATE relation: StateAbbrev
• primary key in CITY relation: (StateAbbrev, CityName)
• foreign key in CITY relation: StateAbbrev
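The STATE/CITY link can be sketched in SQLite (an assumption; the slides use Access) to show what "intended to match" means in practice. Only two STATE columns and a few rows are loaded for brevity, and the 'ZZ' row is a made-up orphan used to trigger the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE STATE (StateAbbrev TEXT PRIMARY KEY, StateName TEXT)")
conn.execute("""
    CREATE TABLE CITY (
        StateAbbrev    TEXT NOT NULL REFERENCES STATE(StateAbbrev),  -- foreign key
        CityName       TEXT NOT NULL,
        CityPopulation INTEGER,
        PRIMARY KEY (StateAbbrev, CityName)
    )
""")
conn.executemany("INSERT INTO STATE VALUES (?, ?)",
                 [("CT", "Connecticut"), ("SD", "South Dakota")])
conn.executemany("INSERT INTO CITY VALUES (?, ?, ?)",
                 [("CT", "Madison", 14031), ("SD", "Madison", 6257)])

# A city whose StateAbbrev matches no STATE row (hypothetical data) is refused.
try:
    conn.execute("INSERT INTO CITY VALUES ('ZZ', 'Nowhere', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

madison_rows = conn.execute(
    "SELECT StateAbbrev FROM CITY WHERE CityName = 'Madison' ORDER BY StateAbbrev"
).fetchall()
print(orphan_rejected, madison_rows)
```

Both Madison rows are accepted because the composite key (StateAbbrev, CityName) differs, and each one points at an existing state.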
Relational Database Design 5
Outline Notation
STATE(StateAbbrev, StateName, UnionOrder, StateBird, StatePopulation)
CITY(StateAbbrev, CityName, CityPopulation)
    StateAbbrev foreign key to STATE
• Underline all parts of each primary key
• Note foreign keys with “attribute foreign key to TABLE”
Entity-Relationship Diagrams
• one-to-many relationships: to determine the direction, always start with “one”
  • “one city is in one state”
  • “one state contains many cities”
• the foreign key is always in “the many”; otherwise it could not be atomic (it would have to be a list)
• We will study other kinds of relationships (one-to-one and many-to-many) shortly
Relational Database Design 6
Functional Dependency
• attribute B is functionally dependent on attribute A if, given a value of attribute A, there is only one possible corresponding value of attribute B
• that is, any two rows with the same value of A must have the same value for B
• attribute A is the determinant of attribute B if attribute B is functionally dependent on attribute A
• in the STATE relation above, StateAbbrev is a determinant of all other attributes
• in the STATE relation, the attribute StateName is also a determinant of all other attributes
• so, StateAbbrev and StateName are both candidate keys for STATE
• in the CITY relation above, the attributes (StateAbbrev, CityName) together are a determinant of the attribute CityPopulation
• in the CITY relation, the attribute CityName is not a determinant of the attribute CityPopulation because multiple cities in the table may have the same name
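The "any two rows with the same value of A must have the same value for B" test translates directly into a few lines of Python. This is a minimal sketch run on the CITY sample rows; the helper name `determines` is hypothetical, and sample data can only refute a dependency, never prove that it holds for all future rows.

```python
def determines(rows, determinant, dependent):
    """True if rows that agree on `determinant` always agree on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


CITY = [dict(zip(("StateAbbrev", "CityName", "CityPopulation"), r)) for r in [
    ("CT", "Hartford", 139739), ("CT", "Madison", 14031), ("CT", "Portland", 8418),
    ("MI", "Lansing", 127321), ("SD", "Madison", 6257), ("SD", "Pierre", 12906),
    ("TN", "Nashville", 488374), ("TX", "Austin", 465622), ("TX", "Portland", 12224),
]]

# Two cities named Madison have different populations, so CityName alone fails.
print(determines(CITY, ["CityName"], ["CityPopulation"]))                 # False
print(determines(CITY, ["StateAbbrev", "CityName"], ["CityPopulation"]))  # True
```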
Relational Database Design 7
Dependency Diagrams
• a dependency diagram or bubble diagram is a pictorial representation of functional dependencies
• an attribute is represented by an oval
• you draw an arrow from A to B when attribute A is a determinant of attribute B
• example: when students were only allowed one sports activity, we have ACTIVITY(StudentID, Activity, Fee), with primary key StudentID
• example: when students can have multiple activities, we have ACTIVITY(StudentID, Activity, Fee), with primary key (StudentID, Activity)
[dependency diagrams: ovals for StudentID, Activity, and Fee]
Relational Database Design 8
Partial Dependencies
• a partial dependency is a functional dependency whose determinant is part of the primary key (but not all of it)
• example: ACTIVITY(StudentID, Activity, Fee) with primary key (StudentID, Activity): Fee depends on Activity alone
Transitive Dependencies
• a transitive dependency is a functional dependency whose determinant is not the primary key, part of the primary key, or a candidate key
• example: ACTIVITY(StudentID, Activity, Fee) with primary key StudentID: Fee depends on Activity, which is not a key
[dependency diagram: ovals for StudentID, Activity, and Fee]
Relational Database Design 9
Database Anomalies
• anomalies are problems caused by bad database design
• example: ACTIVITY(StudentID, Activity, Fee)

StudentID  Activity  Fee
100        Skiing    200
100        Golf       65
175        Squash     50
175        Swimming   50
200        Swimming   50
200        Golf       65

• an insertion anomaly occurs when a row cannot be added to a relation, because not all data are available (or one has to invent “dummy” data)
  • example: we want to store that scuba diving costs $175, but have no place to put this information until a student takes up scuba diving (unless we create a fake student)
• a deletion anomaly occurs when data are deleted from a relation, and other critical data are unintentionally lost
  • example: if we delete the record with StudentID = 100, we forget that skiing costs $200
• an update anomaly occurs when one must make many changes to reflect the modification of a single datum
  • example: if the cost of swimming changes, then all entries with swimming Activity must be changed too
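The deletion and update anomalies can be reproduced on the sample rows with plain Python lists. This is only a sketch of the effect; the helper name `known_fees` is hypothetical.

```python
# The single-table design: (StudentID, Activity, Fee)
activity = [
    (100, "Skiing", 200), (100, "Golf", 65),
    (175, "Squash", 50), (175, "Swimming", 50),
    (200, "Swimming", 50), (200, "Golf", 65),
]


def known_fees(rows):
    """Collect the fee of every activity that appears in at least one row."""
    return {act: fee for _, act, fee in rows}


# Deletion anomaly: removing student 100 also removes the only record of the skiing fee.
after_delete = [row for row in activity if row[0] != 100]
print("Skiing" in known_fees(activity))      # True
print("Skiing" in known_fees(after_delete))  # False

# Update anomaly: changing the swimming fee means touching every swimming row.
rows_to_update = sum(1 for _, act, _ in activity if act == "Swimming")
print(rows_to_update)  # 2
```

The golf fee survives the deletion only because student 200 also plays golf, which shows how accidental the retained information is in this design.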
Relational Database Design 10
Cause of Anomalies
• anomalies are primarily caused by:
  • data redundancy: replication of the same field in multiple tables, other than foreign keys
  • functional dependencies whose determinants are not candidate keys, including
    • partial dependency
    • transitive dependency
• example: ACTIVITY(StudentID, Activity, Fee)

StudentID  Activity  Fee
100        Skiing    200
100        Golf       65
175        Squash     50
175        Swimming   50
200        Swimming   50
200        Golf       65

[dependency diagram: ovals for StudentID, Activity, and Fee]
• Activity by itself is not a candidate key, so we get anomalies (in this case, from a partial dependency)
Relational Database Design 11
Fixing Anomalies (Normalizing)
• Break up tables so all dependencies are from primary (or candidate) keys
PARTICIPATING(StudentID, Activity)
    Activity foreign key to ACTIVITY
ACTIVITY(Activity, Fee)

PARTICIPATING table:
StudentID  Activity
100        Skiing
100        Golf
150        Swimming
175        Squash
175        Swimming
200        Swimming
200        Golf

ACTIVITY table:
Activity     Fee
Skiing       200
Golf          65
Swimming      50
Squash        50
ScubaDiving  200
Relational Database Design 12
• the above relations do not have any of the anomalies
• we can add the cost of diving in ACTIVITY even though no one has taken it in PARTICIPATING
• if StudentID 100 drops Skiing, no skiing-related data will be lost
• if the cost of swimming changes, that cost need be changed in one place only (the ACTIVITY table)
• the Activity field is in both tables, but that’s needed to relate (“join”) the information in the two tables
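A quick way to convince yourself that the split design avoids the anomalies is to load the two tables into SQLite and try the problem operations. SQLite stands in for Access here, and the new swimming fee of 60 is a made-up value used only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE ACTIVITY (Activity TEXT PRIMARY KEY, Fee INTEGER)")
conn.execute("""
    CREATE TABLE PARTICIPATING (
        StudentID INTEGER NOT NULL,
        Activity  TEXT NOT NULL REFERENCES ACTIVITY(Activity),
        PRIMARY KEY (StudentID, Activity)
    )
""")
# ScubaDiving is stored even though no student takes it: no insertion anomaly.
conn.executemany("INSERT INTO ACTIVITY VALUES (?, ?)", [
    ("Skiing", 200), ("Golf", 65), ("Swimming", 50), ("Squash", 50), ("ScubaDiving", 200),
])
conn.executemany("INSERT INTO PARTICIPATING VALUES (?, ?)", [
    (100, "Skiing"), (100, "Golf"), (150, "Swimming"), (175, "Squash"),
    (175, "Swimming"), (200, "Swimming"), (200, "Golf"),
])

# Student 100 leaves: the skiing fee is still on record (no deletion anomaly).
conn.execute("DELETE FROM PARTICIPATING WHERE StudentID = 100")
skiing_fee = conn.execute(
    "SELECT Fee FROM ACTIVITY WHERE Activity = 'Skiing'").fetchone()[0]

# The swimming fee changes (hypothetical new value): exactly one row is updated.
rows_changed = conn.execute(
    "UPDATE ACTIVITY SET Fee = 60 WHERE Activity = 'Swimming'").rowcount

# Joining on Activity rebuilds the combined view.
joined = conn.execute("""
    SELECT p.StudentID, p.Activity, a.Fee
    FROM PARTICIPATING p JOIN ACTIVITY a ON a.Activity = p.Activity
    ORDER BY p.StudentID, p.Activity
""").fetchall()
print(skiing_fee, rows_changed, joined)
```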
Relational Database Design 13
Good Database Design Principles
1. no redundancy
  • a field is stored in only one table, unless it happens to be a foreign key
  • replication of foreign keys is permissible, because they allow two tables to be joined together
2. no “bad” dependencies
  • in the dependency diagram of any relation in the database, the determinant should be the whole primary key, or a candidate key. Violations of this rule include:
    • partial dependencies
    • transitive dependencies
• normalization is the process of eliminating “bad” dependencies by splitting up tables and linking them with foreign keys
  • “normal forms” are categories that classify how completely a table has been normalized
  • there are six recognized normal forms (NF):
    • First Normal Form (1NF)
    • Second Normal Form (2NF)
    • Third Normal Form (3NF)
    • Boyce-Codd Normal Form (BCNF)
    • Fourth Normal Form (4NF)
    • Fifth Normal Form (5NF)
Relational Database Design 14
First Normal Form
• a table is said to be in the first normal form (1NF) if all its attributes are atomic. Attributes that are not atomic go by the names
  • Nested relations, nested tables, or sub-tables
  • Repeating groups or repeating sections
  • List-valued attributes
• example of a table that is not in first normal form:

ClientID  ClientName         VetID  VetName  PetID  PetName  PetType
2173      Barbara Hennessey  27     PetVet   1      Sam      Bird
                                             2      Hoober   Dog
                                             3      Tom      Hamster
4519      Vernon Noordsy     31     PetCare  2      Charlie  Cat
8005      Sandra Amidon      27     PetVet   1      Beefer   Dog
                                             2      Kirby    Cat
8112      Helen Wandzell     24     PetsRUs  3      Kirby    Dog

CLIENT(ClientID, ClientName, VetID, VetName, PET(PetID, PetName, PetType))
• This kind of nested or hierarchical form is a very natural way for people to think about or view data.
• However, the relational database philosophy claims that it may not be a very good way for computers to store some kinds of data.
• Over the years, a lot of information systems have stored data in this kind of format, but they were not relational databases.
Relational Database Design 15
• In order to eliminate the nested relation, pull out the nested relation and form a new table.
• Be sure to include the old key in the new table so that you can connect the tables back together.
CLIENT(ClientID, ClientName, VetID, VetName)
PET(ClientID, PetID, PetName, PetType)
    ClientID foreign key to CLIENT
[dependency diagrams: CLIENT with ClientID, ClientName, VetID, VetName; PET with ClientID, PetID, PetName, PetType]
• In this particular example, note that PetID is only unique within sets of pets with the same owner.
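Pulling out the nested relation amounts to one loop that copies the owner's key into every pet row. The sketch below represents the nested table as Python dictionaries with a "Pets" list, which is an assumed encoding chosen for illustration.

```python
# Nested (non-1NF) form: each client carries a list of pets.
nested_clients = [
    {"ClientID": 2173, "ClientName": "Barbara Hennessey", "VetID": 27, "VetName": "PetVet",
     "Pets": [(1, "Sam", "Bird"), (2, "Hoober", "Dog"), (3, "Tom", "Hamster")]},
    {"ClientID": 4519, "ClientName": "Vernon Noordsy", "VetID": 31, "VetName": "PetCare",
     "Pets": [(2, "Charlie", "Cat")]},
    {"ClientID": 8005, "ClientName": "Sandra Amidon", "VetID": 27, "VetName": "PetVet",
     "Pets": [(1, "Beefer", "Dog"), (2, "Kirby", "Cat")]},
    {"ClientID": 8112, "ClientName": "Helen Wandzell", "VetID": 24, "VetName": "PetsRUs",
     "Pets": [(3, "Kirby", "Dog")]},
]

client_rows = []  # CLIENT(ClientID, ClientName, VetID, VetName)
pet_rows = []     # PET(ClientID, PetID, PetName, PetType)
for client in nested_clients:
    client_rows.append((client["ClientID"], client["ClientName"],
                        client["VetID"], client["VetName"]))
    for pet_id, pet_name, pet_type in client["Pets"]:
        # Copy the old key (ClientID) into the new table so the rows can be joined back.
        pet_rows.append((client["ClientID"], pet_id, pet_name, pet_type))

pet_ids = [row[1] for row in pet_rows]
pet_keys = [(row[0], row[1]) for row in pet_rows]
print(len(client_rows), len(pet_rows))       # 4 7
print(len(set(pet_ids)) == len(pet_ids))     # False: PetID alone repeats
print(len(set(pet_keys)) == len(pet_keys))   # True: (ClientID, PetID) is unique
```

Every cell in both flat tables is now atomic, and PET needs the composite key (ClientID, PetID) because PetID repeats across owners.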
Relational Database Design 16
Second Normal Form
• Recall: a partial dependency occurs when
  • You have a composite primary key
  • A non-key attribute depends on part of the primary key, but not all of it
• A table in 1NF is said to be in the second normal form (2NF) if it does not contain any partial dependencies.
• Example of a partial dependency: ACTIVITY(StudentID, Activity, Fee) on pages 6, 7, and 9
[dependency diagram: ovals for StudentID, Activity, and Fee]
• Our new CLIENT-PET database does not have any partial dependencies
• So, it is already in second normal form
• But it still has a transitive dependency:
[dependency diagram for CLIENT: ovals for ClientID, ClientName, VetID, and VetName; VetName depends on VetID, which is not a key]
Relational Database Design 17
Third Normal Form
• Recall: a transitive dependency happens when a non-key attribute depends on another non-key attribute, and that attribute could not have been used as an alternative primary key (or the same thing for a composition of several attributes).
• A table in 2NF is said to be in the third normal form (3NF) if it does not contain any transitive dependencies.
• In order to eliminate the transitive dependency, we split the CLIENT table again:
CLIENTS(ClientID, ClientName, VetID)
    VetID foreign key to VETS
PETS(ClientID, PetID, PetName, PetType)
    ClientID foreign key to CLIENTS
VETS(VetID, VetName)
[dependency diagrams: CLIENTS with ClientID, ClientName, VetID; PETS with ClientID, PetID, PetName, PetType; VETS with VetID, VetName]
Relational Database Design 18
Third Normal Form (Cont.)
• CLIENTS-PETS-VETS database in third normal form, with MS Access table relationships:

VETS table:
VetID  VetName
27     PetVet
31     PetCare
24     PetsRUs

CLIENTS table:
ClientID  ClientName         VetID
2173      Barbara Hennessey  27
4519      Vernon Noordsy     31
8005      Sandra Amidon      27
8112      Helen Wandzell     24

PETS table:
ClientID  PetID  PetName  PetType
2173      1      Sam      Bird
2173      2      Hoober   Dog
2173      3      Tom      Hamster
4519      2      Charlie  Cat
8005      1      Beefer   Dog
8005      2      Kirby    Cat
8112      3      Kirby    Dog

• the database consists of three types of entities, stored as distinct relations in separate tables:
  • clients (CLIENTS)
  • pets (PETS)
  • vets (VETS)
• there is no redundancy (only foreign keys are replicated)
• there are no partial or transitive dependencies
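Nothing is lost by the split: joining the three tables on their foreign keys rebuilds the original client/vet/pet listing. The sketch below uses SQLite rather than Access (an assumption), and the REFERENCES clauses are there to document the foreign keys rather than to enforce them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VETS    (VetID INTEGER PRIMARY KEY, VetName TEXT);
    CREATE TABLE CLIENTS (ClientID INTEGER PRIMARY KEY, ClientName TEXT,
                          VetID INTEGER REFERENCES VETS(VetID));
    CREATE TABLE PETS    (ClientID INTEGER REFERENCES CLIENTS(ClientID),
                          PetID INTEGER, PetName TEXT, PetType TEXT,
                          PRIMARY KEY (ClientID, PetID));
""")
conn.executemany("INSERT INTO VETS VALUES (?, ?)",
                 [(27, "PetVet"), (31, "PetCare"), (24, "PetsRUs")])
conn.executemany("INSERT INTO CLIENTS VALUES (?, ?, ?)", [
    (2173, "Barbara Hennessey", 27), (4519, "Vernon Noordsy", 31),
    (8005, "Sandra Amidon", 27), (8112, "Helen Wandzell", 24),
])
conn.executemany("INSERT INTO PETS VALUES (?, ?, ?, ?)", [
    (2173, 1, "Sam", "Bird"), (2173, 2, "Hoober", "Dog"), (2173, 3, "Tom", "Hamster"),
    (4519, 2, "Charlie", "Cat"), (8005, 1, "Beefer", "Dog"), (8005, 2, "Kirby", "Cat"),
    (8112, 3, "Kirby", "Dog"),
])

# Follow PETS -> CLIENTS -> VETS through the foreign keys.
listing = conn.execute("""
    SELECT c.ClientName, v.VetName, p.PetName, p.PetType
    FROM PETS p
    JOIN CLIENTS c ON c.ClientID = p.ClientID
    JOIN VETS v    ON v.VetID = c.VetID
    ORDER BY p.ClientID, p.PetID
""").fetchall()
for row in listing:
    print(row)
```

The name "PetVet" is stored once in VETS yet appears for both of its clients in the joined result.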
Relational Database Design 19
Normal Forms and Normalization
• The distinctions between third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF) are subtle.
• They have to do with overlapping sets of attributes that could be used as primary keys (composite candidate keys).
• For our purposes, it’s enough to know about 3NF.
  • You need to be able to put a database in 3NF.
  • That is more important than recognizing 1NF and 2NF.
• Key factors to recognize 3NF:
  • All attributes atomic – gives you 1NF.
  • Every determinant in every relationship is the whole primary key (or could have been chosen as an alternative primary key) – guarantees no partial or transitive dependencies.
• Redesigning a database so it’s in 3NF is called normalization.
Relational Database Design 20
Example With Multiple Candidate Keys
DRIVER(License#, SocialSecurity#, Gender, BirthDate)
[dependency diagram: ovals for License#, SocialSecurity#, Gender, and BirthDate]
• The dependencies SocialSecurity# → Gender and SocialSecurity# → BirthDate are not considered transitive because we could have chosen SocialSecurity# as the primary key for the table.
• This kind of design will not give rise to anomalies.
Relational Database Design 21
Normalization Example: Hardware Store Database
• the ORDERS table:

OrderNum  CustCode  OrderDate  CustName  ProdDescr       ProdPrice  Quantity
10001     5217      11/22/94   Williams  Hammer          $8.99      2
10001     5217      11/22/94   Williams  Screwdriver     $4.45      1
10002     5021      11/22/94   Johnson   Clipper         $18.22     1
10002     5021      11/22/94   Johnson   Screwdriver     $4.45      3
10002     5021      11/22/94   Johnson   Crowbar         $11.07     1
10002     5021      11/22/94   Johnson   Saw             $14.99     1
10003     4118      11/22/94   Lorenzo   Hammer          $8.99      1
10004     6002      11/22/94   Kopiusko  Saw             $14.99     1
10004     6002      11/22/94   Kopiusko  Screwdriver     $4.45      2
10005     5021      11/23/94   Johnson   Cordless drill  $34.95     1

• Note: in practice, we would want to have product codes as well as descriptions, and use the product codes as keys to identify products. Here, we’ll identify products by their ProdDescr to keep the number of fields down.
Relational Database Design 22
Example: Hardware Store Database (Cont.)
ORDERS(OrderNum, ProdDescr, CustCode, OrderDate, CustName, ProdPrice, Quantity)
• conversion of the hardware store database to 2NF
QUANTITY(OrderNum, ProdDescr, Quantity)
    OrderNum foreign key to ORDERS
    ProdDescr foreign key to PRODUCTS
PRODUCTS(ProdDescr, ProdPrice)
ORDERS(OrderNum, CustCode, OrderDate, CustName)
[dependency diagrams: (OrderNum, ProdDescr) → Quantity; ProdDescr → ProdPrice; OrderNum → CustCode, OrderDate, CustName, with CustCode → CustName marked as transitive]
Relational Database Design 23
Example: Hardware Store Database (Cont.)
• conversion of the ORDERS relation to 3NF
QUANTITY(OrderNum, ProdDescr, Quantity)
    OrderNum foreign key to ORDERS
    ProdDescr foreign key to PRODUCTS
PRODUCTS(ProdDescr, ProdPrice)
ORDERS(OrderNum, CustCode, OrderDate)
    CustCode foreign key to CUSTOMERS
CUSTOMERS(CustCode, CustName)
[dependency diagrams: (OrderNum, ProdDescr) → Quantity; ProdDescr → ProdPrice; OrderNum → CustCode, OrderDate; CustCode → CustName]
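The 3NF split of the hardware store data can be derived from the flat ORDERS rows by projecting onto each new relation's attributes, then checked by joining the pieces back together. This is a sketch in plain Python; the variable names are illustrative, and prices are written as floats without the dollar sign.

```python
# Flat rows: (OrderNum, CustCode, OrderDate, CustName, ProdDescr, ProdPrice, Quantity)
flat_orders = [
    (10001, 5217, "11/22/94", "Williams", "Hammer", 8.99, 2),
    (10001, 5217, "11/22/94", "Williams", "Screwdriver", 4.45, 1),
    (10002, 5021, "11/22/94", "Johnson", "Clipper", 18.22, 1),
    (10002, 5021, "11/22/94", "Johnson", "Screwdriver", 4.45, 3),
    (10002, 5021, "11/22/94", "Johnson", "Crowbar", 11.07, 1),
    (10002, 5021, "11/22/94", "Johnson", "Saw", 14.99, 1),
    (10003, 4118, "11/22/94", "Lorenzo", "Hammer", 8.99, 1),
    (10004, 6002, "11/22/94", "Kopiusko", "Saw", 14.99, 1),
    (10004, 6002, "11/22/94", "Kopiusko", "Screwdriver", 4.45, 2),
    (10005, 5021, "11/23/94", "Johnson", "Cordless drill", 34.95, 1),
]

# Project onto each 3NF relation; sets drop the repeated facts.
products = {(descr, price) for _, _, _, _, descr, price, _ in flat_orders}
customers = {(code, name) for _, code, _, name, _, _, _ in flat_orders}
orders = {(num, code, date) for num, code, date, _, _, _, _ in flat_orders}
quantity = {(num, descr, qty) for num, _, _, _, descr, _, qty in flat_orders}

# Join back through the foreign keys to confirm that no information was lost.
price_of = dict(products)
name_of = dict(customers)
order_info = {num: (code, date) for num, code, date in orders}
rejoined = set()
for num, descr, qty in quantity:
    code, date = order_info[num]
    rejoined.add((num, code, date, name_of[code], descr, price_of[descr], qty))

print(len(products), len(customers), len(orders), len(quantity))  # 6 4 5 10
print(rejoined == set(flat_orders))                               # True
```

Each price and each customer name is now stored once, while QUANTITY keeps one row per order line.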
Relational Database Design 24
Example: Video Store Database
• the CUSTOMER relation:

CustomerID  Phone         LastName    FirstName  Address          City           State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main St.     Alvaton        KY     42122
2           502-888-6464  Smith       Jack       873 Elm St.      Bowling Green  KY     42101
3           502-777-7575  Washington  Elroy      95 Easy St.      Smith’s Grove  KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Dr.    Alvation       KY     42122
5           502-474-4746  Steinmetz   Susan      15 Speedway Dr.  Portland       TN     37148
…..         …….           ……          ……         ……               …..            …..    …..

• the RENTALFORM relation:

TransID  RentDate  CustomerID  VideoID  Copy#  Title                Rent
1        4/18/95   3           1        2      2001: Space Odyssey  $1.50
1        4/18/95   3           6        3      Clockwork Orange     $1.50
2        4/18/95   7           8        1      Hopscotch            $1.50
2        4/18/95   7           2        1      Apocalypse Now       $2.00
2        4/18/95   7           6        1      Clockwork Orange     $1.50
3        4/18/95   8           9        1      Luggage of the Gods  $2.50
…..      …….       ……          ……       ……     …..                  …..

• a customer can rent multiple videos as part of the same transaction
• multiple copies of the same video exist
• the Copy# field stores the number of the copy, which is unique only within copies of that same video
• one customer cannot rent two copies of the same video at the same time
• although it has two tables, the database still contains some anomalies
Relational Database Design 25
Example: Video Store Database (Cont.)
• relations for the video store database
  • CUSTOMER(CustomerID, Phone, Name, Address, City, State, ZipCode)
  • RENTALFORM(TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent)
• dependency diagram for the video store database
[dependency diagram: CUSTOMER with CustomerID, Phone, Name, Address, City, State, Zip; RENTALFORM with TransID, RentDate, CustomerID, VideoID, Copy#, Title, Rent]
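Since the arrows of the diagram did not survive in this transcript, one way to recover likely dependencies is to test candidates against the RENTALFORM sample rows. The sketch below does that with a hypothetical helper `determines`; agreement with six sample rows is only suggestive, and rents are written as floats without the dollar sign.

```python
def determines(rows, determinant, dependent):
    """True if rows that agree on `determinant` always agree on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


FIELDS = ("TransID", "RentDate", "CustomerID", "VideoID", "Copy#", "Title", "Rent")
rental_form = [dict(zip(FIELDS, r)) for r in [
    (1, "4/18/95", 3, 1, 2, "2001: Space Odyssey", 1.50),
    (1, "4/18/95", 3, 6, 3, "Clockwork Orange", 1.50),
    (2, "4/18/95", 7, 8, 1, "Hopscotch", 1.50),
    (2, "4/18/95", 7, 2, 1, "Apocalypse Now", 2.00),
    (2, "4/18/95", 7, 6, 1, "Clockwork Orange", 1.50),
    (3, "4/18/95", 8, 9, 1, "Luggage of the Gods", 2.50),
]]

# A transaction fixes its date and customer, but not which video was rented.
trans_to_customer = determines(rental_form, ["TransID"], ["RentDate", "CustomerID"])
trans_to_video = determines(rental_form, ["TransID"], ["VideoID"])
# A video fixes its title and rent; the pair (TransID, VideoID) fixes the copy.
video_to_title = determines(rental_form, ["VideoID"], ["Title", "Rent"])
pair_to_copy = determines(rental_form, ["TransID", "VideoID"], ["Copy#"])

print(trans_to_customer, trans_to_video, video_to_title, pair_to_copy)
```

Determinants such as TransID alone or VideoID alone are only parts of a row's identity, which is the kind of dependency that normalization moves into separate tables.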
Relational Database Design 26
Example: Video Store Database (Cont.)
• video store database after eliminating partial and transitive dependencies
CUSTOMER(CustomerID, Phone, Name, Address, City, State, ZipCode)
RENTAL(TransID, RentDate, CustomerID)
    CustomerID foreign key to CUSTOMER
VIDEO(VideoID, Title, Rent)
VIDEOSRENTED(TransID, VideoID, Copy#)
    TransID foreign key to RENTAL
    VideoID foreign key to VIDEO
[dependency diagrams: CUSTOMER with CustomerID, Phone, Name, Address, City, State, Zip; RENTAL with TransID, RentDate, CustomerID; VIDEO with VideoID, Title, Rent; VIDEOSRENTED with TransID, VideoID, Copy#]
Relational Database Design 27
Example: Video Store Database (Cont.)
• table relationships for the video store database
Relational Database Design 28
Summary of Guidelines for Database Design
• identify the entities involved in the database
• identify the fields relevant for each entity and define the corresponding relations
• determine the primary key of each relation
• avoid data redundancy, but have some common fields so that tables can be joined together
• ensure that all the required database processing can be done using the defined relations
• normalize the relations by splitting them into smaller ones