Lecture #3
Functional Dependencies
Normalization
Relational Algebra
Thursday, October 12, 2000
Administration
• Homework #1 due today.
• Project descriptions & groups due today.
• Homework #2 available today.
• Exam date is looking like December 7th
– Complaints?
• Projects: tell us if you need to use the lab.
Functional Dependencies
Definition:
If two tuples agree on the attributes
A1, A2, …, An
then they must also agree on the attributes
B1, B2, …, Bm
Formally:
A1, A2, …, An -> B1, B2, …, Bm
Motivating example for the study of functional dependencies:
Name Social Security Number Phone Number
Examples
• EmpID -> Name, Phone, Position
• Position -> Phone
• but not Phone -> Position
EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   lawyer
In General
• To check A -> B, erase all other columns
• check if the remaining relation is many-one (called functional in mathematics)
…  A   …  B
   X1     Y1
   X2     Y2
   …      …
Example
EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   lawyer
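The "erase all other columns and check for many-one" test can be sketched in a few lines of Python. This is only an illustration: the function name `fd_holds` and the representation of rows as dictionaries are assumptions, not part of the lecture.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds in this instance."""
    seen = {}  # maps each combination of lhs values to the rhs values seen with it
    for row in rows:
        left = tuple(row[a] for a in lhs)    # keep only the lhs columns
        right = tuple(row[a] for a in rhs)   # keep only the rhs columns
        # setdefault stores `right` the first time `left` appears
        if seen.setdefault(left, right) != right:
            return False  # same lhs values paired with different rhs values
    return True

employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": 1234, "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": 9876, "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": 1234, "Position": "lawyer"},
]

print(fd_holds(employees, ["EmpID"], ["Name", "Phone", "Position"]))  # True
print(fd_holds(employees, ["Position"], ["Phone"]))                   # True
print(fd_holds(employees, ["Phone"], ["Position"]))                   # False
```

Note that this only tests one instance. A dependency is a statement about every legal instance, so a single table can refute it but never prove it.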
More Examples
Product: name -> price, manufacturer
Person: ssn -> name, age
Company: name -> stock price, president
Key of a relation is a set of attributes that:
- functionally determines all the attributes of the relation
- none of its subsets determines all the attributes.
Superkey: a set of attributes that contains a key.
Finding the Keys of a Relation
Given a relation constructed from an E/R diagram, what is its key?
Rules:
1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.
[E/R diagram: entity set Person with attributes address, name, ssn]
Rules for Binary Relationships
Several cases are possible for a binary relationship E1 - E2:
1. Many-many: the key includes the key of E1 together with the key of E2.
What happens for:
2. Many-one:
3. One-one:
[E/R diagram: Person (name, ssn) buys Product (name, price)]
Keys in Multiway Relationships
If there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.
[E/R diagram: relationship Purchase connecting Product, Person, Store, and Payment Method]
Rules in FD’s
Splitting/Combining Rule:
A1, A2, …, An -> B1, B2, …, Bm
is equivalent to
A1, A2, …, An -> B1
A1, A2, …, An -> B2
…
A1, A2, …, An -> Bm
Rules in FD’s (continued)
Trivial Dependency:
A1, A2, …, An -> Ai always holds.
Why ?
Rules in FD’s (continued)
Transitive Closure Rule:
If A1, A2, …, An -> B1, B2, …, Bm
and B1, B2, …, Bm -> C1, C2, …, Cp
then A1, A2, …, An -> C1, C2, …, Cp
Why ?
Closure of a set of Attributes
Given a set of attributes {A1, …, An} and a set of dependencies S.
Problem: find all attributes B such that:
any relation which satisfies S also satisfies: A1, …, An -> B
The closure of {A1, …, An}, denoted {A1, …, An}+, is the set of all such attributes B
Closure Algorithm
Start with X = {A1, …, An}.
Repeat until X doesn’t change:
if B1, B2, …, Bn -> C is in S, and
B1, B2, …, Bn are all in X, and
C is not in X
then add C to X.
Example
A, B -> C
A, D -> E
B -> D
A, F -> B
Closure of {A, B}: X = {A, B, }
Closure of {A, F}: X = {A, F, }
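The closure algorithm can be tried out on this example with a short Python sketch. The function name `closure` and the encoding of each dependency as a `(lhs, rhs)` pair of strings with one character per attribute are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)                    # start with X = {A1, ..., An}
    changed = True
    while changed:                    # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # all left-hand attributes are in X and rhs adds something new
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x

# The four dependencies of the example, one character per attribute.
fds = [("AB", "C"), ("AD", "E"), ("B", "D"), ("AF", "B")]

print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("AF", fds)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Since {A, F}+ contains every attribute, {A, F} is a superkey for a relation over A through F with these dependencies.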
Why Is the Algorithm Correct ?
• Show the following by induction:
– For every B in X:
• A1, …, An -> B
• Initially X = {A1, …, An} -- holds
• Induction step: B1, …, Bm in X
– Implies A1, …, An -> B1, …, Bm
– We also have B1, …, Bm -> C
– By transitivity we have A1, …, An -> C
• This shows that the algorithm is sound; need to show it is complete
Relational Schema Design
Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational schema
Relational Schema Design
Recall set attributes (persons with several phones):
Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000
Note: SSN is NOT a key here
Problems:
- redundancy
- update anomalies
- deletion anomalies
Relation Decomposition
Break the relation into two:

SSN         Name
123-321-99  Fred
909-438-44  Joe

SSN         Phone Number
123-321-99  (201) 555-1234
123-321-99  (206) 572-4312
909-438-44  (908) 464-0028
909-438-44  (212) 555-4000
Decompositions in General
Let R be a relation with attributes A1, A2, …, An
Create two relations R1 and R2 with attributes B1, B2, …, Bm and C1, C2, …, Cl
Such that: {B1, B2, …, Bm} U {C1, C2, …, Cl} = {A1, A2, …, An}
And -- R1 is the projection of R on B1, B2, …, Bm
    -- R2 is the projection of R on C1, C2, …, Cl
Incorrect Decomposition
• Sometimes it is incorrect:
Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
DoubleClick  29.99  Camera
Decompose on: Name, Category and Price, Category
Incorrect Decomposition
Name         Category
Gizmo        Gadget
OneClick     Camera
DoubleClick  Camera

Price  Category
19.99  Gadget
24.99  Camera
29.99  Camera

When we put it back:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
OneClick     29.99  Camera
DoubleClick  24.99  Camera
DoubleClick  29.99  Camera

Cannot recover information
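The loss of information can be reproduced with a small Python sketch that projects the product table and joins the pieces back. The helper names `project` and `natural_join` are assumptions for illustration; rows are plain dictionaries.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(r1, r2):
    """Pair up tuples that agree on every attribute the relations share."""
    out = []
    for t in r1:
        for u in r2:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

products = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]

# Decompose on (Name, Category) and (Price, Category), then join back.
bad = natural_join(project(products, ["Name", "Category"]),
                   project(products, ["Price", "Category"]))
print(len(products), len(bad))   # 3 5

# Decomposing on (Name, Price) and (Name, Category) joins back to 3 tuples.
good = natural_join(project(products, ["Name", "Price"]),
                    project(products, ["Name", "Category"]))
print(len(good))                 # 3
```

The two extra tuples in the first join pair each camera with the other camera's price, which is exactly the bogus information shown in the table.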
Boyce-Codd Normal Form
A simple condition for removing anomalies from relations:
A relation R is in BCNF if and only if:
Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R.
In English (though a bit vague):
Whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
BCNF Decomposition
Find a dependency that violates the BCNF condition:
A1, A2, …, An -> B1, B2, …, Bm
Heuristic: choose B1, B2, …, Bm “as large as possible”
Decompose:
[Diagram: the attributes of R drawn as three groups (the A’s, the B’s, and the others), covered by R1 and R2, which overlap on the A’s]
Continue until there are no BCNF violations left.
Find a 2-attribute relation that is not in BCNF.
Example Decomposition
Person:
Name SSN Age EyeColor PhoneNumber
Functional dependencies: SSN -> Name, Age, EyeColor
BCNF: Person1(SSN, Name, Age, EyeColor), Person2(SSN, PhoneNumber)
What if we also had an attribute Draft-worthy, and the FD: Age -> Draft-worthy
Other Example
• R(A,B,C,D) A -> B, B -> C
• Key: A, D
• Violations of BCNF: A -> B, A -> C, A -> BC
• Pick A -> BC: split into R1(A,B,C) R2(A,D)
• What happens if we pick A -> B first ?
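The decomposition procedure can be sketched in Python for small schemas. This is an illustrative sketch, not the lecture's own code: the names `closure`, `find_violation` and `bcnf_decompose` are assumptions, attributes are single characters, and violations are found by brute force over all attribute subsets, which is only practical for a handful of attributes. The right-hand side is taken "as large as possible" by using the closure of the left-hand side.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= x and not set(rhs) <= x:
                x |= set(rhs)
                changed = True
    return x

def find_violation(attrs, fds):
    """Return (X, X+ restricted to attrs) for some X violating BCNF, else None."""
    for size in range(1, len(attrs)):
        for lhs in combinations(sorted(attrs), size):
            determined = closure(lhs, fds) & attrs
            # nontrivial (determines something new) but not a superkey
            if determined != set(lhs) and determined != attrs:
                return set(lhs), determined
    return None

def bcnf_decompose(attrs, fds):
    """Split a schema recursively until no BCNF violation is left."""
    attrs = set(attrs)
    violation = find_violation(attrs, fds)
    if violation is None:
        return [attrs]
    lhs, determined = violation
    r1 = determined                   # the A's plus everything they determine
    r2 = lhs | (attrs - determined)   # the A's plus the other attributes
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)

fds = [("A", "B"), ("B", "C")]
for schema in bcnf_decompose("ABCD", fds):
    print(sorted(schema))
# ['B', 'C']
# ['A', 'B']
# ['A', 'D']
```

The first split is the one on the slide, R1(A,B,C) and R2(A,D). The sketch then splits R1 once more, because B -> C still holds inside R1 and B is not a superkey of R1.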
Correct Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
  decompose
R1(A,B)  R2(A,C)
  recover
R’(A,B,C) = R(A,B,C)
R’ is in general larger than R. Must ensure R’ = R
Decomposition Based on BCNF is Necessarily Lossless
Attributes A, B, C. FD: A -> C
Relations R1(A,B) R2(A,C)
Tuple in R: (a,b,c)
Tuples in R1: (a,b), (a,b’)
Tuples in R2: (a,c), (a,c’)
Tuples in the join of R1 and R2: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’)
Can (a,b,c’) be a bogus tuple? What about (a,b’,c’) ?
Example
Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000
What are the dependencies?
What are the keys?
Is it in BCNF?
And Now?
SSN         Name
123-321-99  Fred
909-438-44  Joe

SSN         Phone Number
123-321-99  (201) 555-1234
123-321-99  (206) 572-4312
909-438-44  (908) 464-0028
909-438-44  (212) 555-4000
3NF: A Problem with BCNF
Unit Company Product
FD’s: Unit -> Company; Company, Product -> Unit
So, there is a BCNF violation, and we decompose.
Unit Company    FD: Unit -> Company
Unit Product    No FDs
So What’s the Problem?
Unit      Company      Unit      Product
Galaga99  UW           Galaga99  databases
Bingo     UW           Bingo     databases
No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again:
Unit      Company  Product
Galaga99  UW       databases
Bingo     UW       databases
Violates the dependency: company, product -> unit!
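The problem can be checked mechanically. In the Python sketch below (the helper name `fd_holds` and the dictionary representation are assumptions for illustration), each decomposed table passes its local check, yet the joined table breaks Company, Product -> Unit.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

unit_company = [{"Unit": "Galaga99", "Company": "UW"},
                {"Unit": "Bingo", "Company": "UW"}]
unit_product = [{"Unit": "Galaga99", "Product": "databases"},
                {"Unit": "Bingo", "Product": "databases"}]

# Join the two tables back on their shared attribute, Unit.
joined = [{**uc, **up} for uc in unit_company for up in unit_product
          if uc["Unit"] == up["Unit"]]

print(fd_holds(unit_company, ["Unit"], ["Company"]))        # True
print(fd_holds(joined, ["Company", "Product"], ["Unit"]))   # False
```

The violated dependency mentions attributes from both tables, so neither table can enforce it on its own.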
Solution: 3rd Normal Form (3NF)
A simple condition for removing anomalies from relations:
A relation R is in 3rd normal form if and only if:
Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R, or B is part of a key.
What happened to first and second normal forms?
Will we have more normal forms?
Multi-valued Dependencies
SSN         Phone Number    Course
123-321-99  (206) 572-4312  CSE-444
123-321-99  (206) 572-4312  CSE-341
123-321-99  (206) 432-8954  CSE-444
123-321-99  (206) 432-8954  CSE-341
The multi-valued dependencies are:
SSN ->> Phone Number
SSN ->> Course
Definition of Multi-valued Dependency
Given R(A1,…,An,B1,…,Bm,C1,…,Cp)
the MVD A1,…,An ->> B1,…,Bm holds if:
for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp
Definition of MVDs Continued
Equivalently: the decomposition into
R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp)
is lossless
Note: an MVD A1,…,An ->> B1,…,Bm
Implicitly talks about “the other” attributes C1,…,Cp
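One way to make "independent" concrete is the usual tuple-swapping test: whenever two tuples agree on the A's, the tuple that takes its B's from the first and its C's from the second must also be in the relation. The Python sketch below checks this on one instance; the function name `mvd_holds` and the shortened column name `Phone` are assumptions for illustration.

```python
def mvd_holds(rows, lhs, mid):
    """Check lhs ->> mid on rows, a list of dicts that share the same keys."""
    attrs = list(rows[0].keys())
    existing = {tuple(r[a] for a in attrs) for r in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in lhs):
                # mix: the `mid` values from t, every other value from u
                mixed = {**u, **{a: t[a] for a in mid}}
                if tuple(mixed[a] for a in attrs) not in existing:
                    return False
    return True

rows = [
    {"SSN": "123-321-99", "Phone": "(206) 572-4312", "Course": "CSE-444"},
    {"SSN": "123-321-99", "Phone": "(206) 572-4312", "Course": "CSE-341"},
    {"SSN": "123-321-99", "Phone": "(206) 432-8954", "Course": "CSE-444"},
    {"SSN": "123-321-99", "Phone": "(206) 432-8954", "Course": "CSE-341"},
]

print(mvd_holds(rows, ["SSN"], ["Phone"]))      # True
print(mvd_holds(rows[:3], ["SSN"], ["Phone"]))  # False: one combination is missing
```

Dropping any one of the four tuples breaks the MVD, because the phone numbers and the courses no longer appear in every combination.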
Rules for MVDs
If A1,…,An -> B1,…,Bm
then A1,…,An ->> B1,…,Bm
Other rules in the book
4th Normal Form (4NF)
R is in 4NF if whenever:
A1,…,An ->> B1,…,Bm
is a nontrivial MVD, then A1,…,An is a superkey
Same as BCNF with FDs replaced by MVDs
Confused by Normal Forms ?
3NF
BCNF
4NF
In practice: (1) 3NF is enough, (2) don’t overdo it !
Querying the Database
• How do we specify what we want from our database? Find all the employees who earn more than $50,000 and pay taxes in New Jersey.
• We design high-level query languages:
– SQL (used everywhere)
– Datalog (used by database theoreticians, their students, friends and family)
• Relational algebra: a basic set of operations on relations that provide the basic principles.
Relational Algebra at a Glance
• Operators: relations as input, new relation as output
• Five basic RA operators:
– Basic Set Operators
• union, difference (no intersection, no complement)
– Selection: σ
– Projection: π
– Cartesian Product: X
• Derived operators:
– Intersection, complement
– Joins (natural, equi-join, theta join, semi-join)
• When our relations have attribute names:
– Renaming: ρ
Set Operations
• Binary operations
• Union: all tuples in R1 or R2
– R1 U R2
– Example:
• ActiveEmployees U RetiredEmployees
• Difference: all tuples in R1 and not in R2
– R1 – R2
– Example
• AllEmployees - RetiredEmployees
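Python's built-in sets behave like these two operators, which makes for a quick sketch. The employee tuples below are made-up sample data, and the relation names simply mirror the examples above.

```python
# Each relation is a set of (SSN, Name) tuples; the data is hypothetical.
active_employees = {("111", "Ann"), ("222", "Bob")}
retired_employees = {("333", "Cy")}

all_employees = active_employees | retired_employees   # union
still_working = all_employees - retired_employees      # difference
print(sorted(still_working))  # [('111', 'Ann'), ('222', 'Bob')]

# Intersection is a derived operator: R1 ∩ R2 = R1 - (R1 - R2)
r1 = {1, 2, 3}
r2 = {2, 3, 4}
print(sorted(r1 - (r1 - r2)))  # [2, 3]
```

The last two lines show why intersection is not needed among the basic operators: it can be written with difference alone.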
Selection
• Unary operation: returns a subset of the tuples which satisfy some condition
• Notation: σ_c(R)
• c is a condition:
– =, <, >, and, or, not
• Find all employees with salary more than $40,000:
– σ_{Salary > 40000}(Employee)
Selection Example
Find all employees with salary more than $40,000.

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000
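Selection amounts to filtering rows with a predicate. A minimal Python sketch, assuming rows are dictionaries and the condition is passed as a function (the name `select` is an assumption):

```python
def select(rows, condition):
    """Selection: keep the tuples that satisfy the condition."""
    return [row for row in rows if condition(row)]

employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

well_paid = select(employee, lambda row: row["Salary"] > 40000)
print([row["Name"] for row in well_paid])  # ['Alice']
```

The result has the same schema as the input; only the set of tuples shrinks.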
Projection
• Unary operation: returns certain columns
• Eliminates duplicate tuples !
• Notation: π_{A1,…,An}(R)
• Example: project social-security number and names:
– π_{SSN, Name}(Employee)
Projection Example
Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

SSN        Name
999999999  John
777777777  Tony
888888888  Alice
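Projection keeps the requested columns and then drops duplicate tuples. A minimal Python sketch under the same assumptions as before (rows as dictionaries, `project` as an illustrative name):

```python
def project(rows, attrs):
    """Projection: keep only attrs and eliminate duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:   # set semantics: no duplicates
            result.append(projected)
    return result

employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

print(len(project(employee, ["SSN", "Name"])))   # 3
print(project(employee, ["DepartmentID"]))       # [{'DepartmentID': 1}, {'DepartmentID': 2}]
```

Projecting on DepartmentID returns two tuples rather than three, since John and Tony share a department.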
Cartesian Product
• Binary Operation
• Result is tuples combining any element of R1 with any element of R2, for R1 X R2
• Schema is union of Schema(R1) & Schema(R2)
• Notation: R1 x R2
• Example: Employee x Dependents
• Very rare in practice; but joins are very common.
Cartesian Product Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee_Dependents
Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe
Join (Natural)
• Most important, expensive and exciting.
• Combines two relations, selecting only related tuples
• Equivalent to a cross product followed by selection
• Resulting schema has all attributes of the two relations, but one copy of join condition attributes
Join Example
Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee_Dependents
Name  SSN        Dname
John  999999999  Emily
Tony  777777777  Joe
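The "cross product followed by selection" reading of a join can be spelled out step by step in Python. This sketch assumes rows are dictionaries; because the two tables name the join column differently (SSN and EmployeeSSN), the match is written as an explicit equality and the duplicate column is dropped at the end.

```python
employee = [{"Name": "John", "SSN": "999999999"},
            {"Name": "Tony", "SSN": "777777777"}]
dependents = [{"EmployeeSSN": "999999999", "Dname": "Emily"},
              {"EmployeeSSN": "777777777", "Dname": "Joe"}]

# Step 1: Cartesian product, every employee paired with every dependent.
product = [{**e, **d} for e in employee for d in dependents]

# Step 2: selection, keeping only the related pairs.
matched = [row for row in product if row["SSN"] == row["EmployeeSSN"]]

# Step 3: keep one copy of the join attribute.
joined = [{"Name": r["Name"], "SSN": r["SSN"], "Dname": r["Dname"]}
          for r in matched]

print(len(product), len(matched))  # 4 2
print(joined[0])  # {'Name': 'John', 'SSN': '999999999', 'Dname': 'Emily'}
```

Building the full product first is the expensive part; practical join algorithms avoid materializing it, but the result is the same relation.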
Other Joins and Renaming
• Theta join: the join involves a predicate θ
– R ⋈_θ S
• Semi-join: the attributes of one relation are included in the other.
• Renaming: ρ
Complex Queries
Product (pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person (per-name, phone number, city)
Find phone numbers of people who bought gizmos from Fred.
Find telephony products that somebody bought
Exercises
Product (pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person (per-name, phone number, city)
Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and did not buy French products.
Ex #4: Find names of people who bought American products and live in Seattle.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.
Operations on Bags (and why we care)
Basic operations:
Projection
Selection
Union
Intersection
Set difference
Cartesian product
Join (natural join, theta join)