1

CS 564: Database Management Systems: Design and Implementation

Lecture 2: Relational Model; ER to Relational

Chapter 3 in Cow Book

Slide ACKs: AnHai Doan, Jeff Naughton, and Jignesh Patel

Arun Kumar

2

@Wait list students:

You should have received an invite to enroll in Sec 2. If not, email me before EOD!

3

General Dos and Do NOTs

Do:

Raise your hand before asking questions during Lectures

Participate in class discussions and use our Piazza page

Use “[CS564]” as subject prefix for all emails to me/TAs

Do NOT:

Take this class if you cannot attend on Fridays also

Use laptops, tablets, mobile phones, or any other electronic devices during Lectures

Use email as primary communication mechanism for doubts/questions instead of Office Hours and Discussions

Record/quote my anecdotes outside of class!

4

Hands up if you have watched and rated movies on Netflix!

5

Surprise ER Review Exercise!

Cool. Now, please draw a full-fledged ER diagram for Netflix’s “movie recommendation system” with two Entities: Users and Movies, and one Relationship: Ratings.

Attach the following attributes appropriately (reuse allowed):

UserID, MovieID, RatingID, NumStars, Name, Timestamp, Age, Director, ReleaseDate, JoinDate

6

Review Exercise Answer

[ER diagram: entity User (UserID, Name, Age, JoinDate) and entity Movie (MovieID, Name, Director, ReleaseDate), connected by relationship Rating (RatingID, NumStars, Timestamp)]

7

Database Design Process: ~6 steps

1. Requirements Analysis

2. Conceptual Database Design

3. Logical Database Design

4. Schema Refinement

5. Physical Database Design

6. Application and Security Design

Step 2: Entity-Relationship Modeling

Steps 3 and 4: Relational Model and Normalization

Step 5: Indexing, etc.

8

Relational data model in a nutshell

Basically, Relation:Table :: Pilot:Driver (okay, a bit more)

The model formalizes “operations” to manipulate relations

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   19      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

9

Ugh! Another data model! How does it differ from the ER Model?

10

ER Model vs. Relational Model

Key purpose:
Ease of capturing user app requirements vs. ease of (semi-)automated management by computer

Concepts and structure:
Many concepts in a rich, complex graph vs. a single, simple, “flat” concept: “relation”

Data management functionality:
No notion of “operations” vs. a rich “algebra” of relational operations

11

Relational Model: Basic Terms

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

What is a Relation?
A glorified table!

What are Attributes?
The named columns: RatingID, NumStars, Timestamp, UserID, MovieID

What are Domains?
The mathematical “domains” for the attributes, e.g., Integers, Reals, …

What is Arity?
Number of attributes (5 for Ratings)

12

Relational Model: Basic Terms

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

What are Tuples?
The rows of the relation

What is Cardinality?
Number of tuples
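To make arity and cardinality concrete, here is a minimal sketch that models a relation instance as a Python set of tuples. The names ATTRIBUTES and ratings are illustrative assumptions, and only the three rows visible on the slide are included.

```python
# A relation instance modeled as a set of tuples (the three visible sample rows).
ATTRIBUTES = ("RatingID", "NumStars", "Timestamp", "UserID", "MovieID")

ratings = {
    (1, 3.5, "08/27/15", 79, 20),
    (2, 4.0, "07/20/15", 4232, 293),
    (3, 2.5, "08/02/15", 54551, 846),
}

arity = len(ATTRIBUTES)      # number of attributes (columns)
cardinality = len(ratings)   # number of tuples (rows) in this instance

print("arity:", arity, "cardinality:", cardinality)
```

Arity is a property of the schema, so it stays fixed; cardinality belongs to the instance and changes as tuples are added or removed.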

13

Referring to “tuples”: Two notations

1. Without using attribute names (positional/sequence): t[1] = 3.5

2. Using attribute names (named/set): t.NumStars = 3.5

Ratings (R)

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

A tuple t: the first row of R
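The two notations can be tried out with a small sketch using Python's namedtuple, which supports both positional and named access. The type name Rating is an assumption for illustration; note that Python counts positions from 0, so index 1 is the second attribute.

```python
from collections import namedtuple

# One tuple of Ratings, accessible by position or by attribute name.
Rating = namedtuple(
    "Rating", ["RatingID", "NumStars", "Timestamp", "UserID", "MovieID"]
)
t = Rating(1, 3.5, "08/27/15", 79, 20)

by_position = t[1]      # positional notation (0-based in Python)
by_name = t.NumStars    # named notation

print(by_position, by_name)
```

Named access keeps working if the attribute order changes, while positional access silently picks up a different attribute.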

14

Relational Model: Basic Terms

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

What is Schema?

The relation name, and the names and logical descriptions of the attributes (including domains)

Aka “metadata”

15

Relational Model: Basic Terms

What is an Instance?

A given relation populated with a set of tuples

(loose analogy: schema:instance::type:value in PL)

Ratings, Instance 1

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
2         4.0       07/20/15   4232    293
3         2.5       08/02/15   54551   846
…         …         …          …       …

Ratings, Instance 2

RatingID  NumStars  Timestamp  UserID  MovieID
3292      1.5       06/27/14   794     10
294122    4.0       07/10/14   232     329
74423     0.5       03/08/14   8451    846
…         …         …          …       …

16

Relational Model: Basic Terms

What is a Relational Database?

A collection of relations; similarly, schema vs. instance

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
…         …         …          …       …

Users

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
…       …      …    …

Movies

MovieID  Name       ReleaseDate  Director
20       Inception  07/13/2010   Christopher Nolan
…        …          …            …

17

Wait a minute! Did we not see this schema earlier (the Netflix question)?

18

Spot six differences!

[ER diagram: entity User (UserID, Name, Age, JoinDate) and entity Movie (MovieID, Name, Director, ReleaseDate), connected by relationship Rating (RatingID, NumStars, Timestamp)]

19

Spot six differences!

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
…         …         …          …       …

Users

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
…       …      …    …

Movies

MovieID  Name       ReleaseDate  Director
20       Inception  07/13/2010   Christopher Nolan
…        …          …            …

20

ER Model vs. Relational Model

Key purpose:
Ease of capturing user app requirements vs. ease of (semi-)automated management by computer

Concepts and structure:
Many concepts in a rich, complex graph vs. a single, simple, “flat” concept: “relation”

Data management functionality:
No notion of “operations” vs. a rich “algebra” of relational operations

21

“Write Operations” on a Relation

Insert

Add tuples to a relation

Delete

Remove tuples from a relation (typically based on “predicate” matches, e.g., “NumStars <= 4.5”)

Modify

Logically, deletes + inserts, but typically implemented as in-place updates to a relation instance
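As a rough illustration of the three write operations, the sketch below runs them against an in-memory SQLite database through Python's sqlite3 module. The column types and the delete predicate (NumStars < 3.0) are assumptions chosen so that the sample rows show a visible effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Ratings (
        RatingID INTEGER PRIMARY KEY,
        NumStars REAL,
        Timestamp TEXT,
        UserID INTEGER,
        MovieID INTEGER)
""")

# Insert: add tuples to the relation.
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3.5, "08/27/15", 79, 20),
        (2, 4.0, "07/20/15", 4232, 293),
        (3, 2.5, "08/02/15", 54551, 846),
    ],
)

# Delete: remove every tuple that matches a predicate.
conn.execute("DELETE FROM Ratings WHERE NumStars < 3.0")

# Modify: logically a delete plus an insert, expressed as an in-place UPDATE.
conn.execute("UPDATE Ratings SET NumStars = 4.5 WHERE RatingID = 1")

remaining = conn.execute(
    "SELECT RatingID, NumStars FROM Ratings ORDER BY RatingID"
).fetchall()
print(remaining)
```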

22

“Read Operations” on a Relation

“Select”

Select all tuples from Ratings with “UserID == 19”

“Project”

Select only Director attribute from Movies

“Aggregate”

Select Average of all NumStars in Ratings

And a few more formal operations …

(Sneak peek: SQL to express both the write/read ops!)
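The three read operations above map onto simple SQL queries. The following sketch runs them on SQLite via Python's sqlite3 module; the column types are assumptions, and the sample rows are the ones shown on the slides (with UserID 19 for the first rating, so that the selection has a match).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Ratings (RatingID INTEGER, NumStars REAL, "
    "Timestamp TEXT, UserID INTEGER, MovieID INTEGER)"
)
conn.execute(
    "CREATE TABLE Movies (MovieID INTEGER, Name TEXT, "
    "ReleaseDate TEXT, Director TEXT)"
)
conn.executemany(
    "INSERT INTO Ratings VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3.5, "08/27/15", 19, 20),
        (2, 4.0, "07/20/15", 4232, 293),
        (3, 2.5, "08/02/15", 54551, 846),
    ],
)
conn.execute(
    "INSERT INTO Movies VALUES (20, 'Inception', '07/13/2010', 'Christopher Nolan')"
)

# "Select": keep only the tuples that satisfy a predicate.
selected = conn.execute("SELECT * FROM Ratings WHERE UserID = 19").fetchall()

# "Project": keep only some attributes of every tuple.
directors = conn.execute("SELECT Director FROM Movies").fetchall()

# "Aggregate": collapse many tuples into one summary value.
avg_stars = conn.execute("SELECT AVG(NumStars) FROM Ratings").fetchone()[0]

print(selected, directors, avg_stars)
```

Note the naming clash: the relational “Select” corresponds to the WHERE clause, while the SQL keyword SELECT lists the projected attributes.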

23

Bottom line:

ER model is for conceptual schema modeling; no notion of operations

Relational model includes operations on data; implementable as fast software

24

So, how do we go from ER model to a relational model in an application?

25

ER to Relational: Entity as a Table

[ER diagram: entity User (UserID, Name, Age, JoinDate) and entity Movie (MovieID, Name, Director, ReleaseYear), connected by relationship Rating (RatingID, NumStars, Timestamp)]

Users

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
…       …      …    …

Entity Name → Relation Name

Attribute Names → Attribute Names

Entity Set → Relation Instance (Tuples)

26

ER to Relational: Key Constraint

UserID uniquely identifies a User

Underline it in the relation too!

“Primary Key”

[ER diagram: entity User (UserID, Name, Age, JoinDate)]

Users

UserID  Name   Age  JoinDate
79      Alice  23   01/10/13
…       …      …    …

27

ER to Relational: More Examples

[ER diagram: entity Movie (MovieID, Name, Director, ReleaseDate)]

Movies

MovieID  Name       ReleaseDate  Director
20       Inception  07/13/2010   Christopher Nolan
…        …          …            …

28

It is nice pictorially, but how do we define/create a relation precisely?

29

Introducing SQL

Structured English QUEry Language (SEQUEL); TL;DR name is SQL

Invented at - you guessed it - IBM!

30

What is SQL?

SQL is a query language for relational data

Simple English-based syntax, but precise, formal semantics (compiled down to relational algebra)

Key advantages:

Physical Data Independence (“how” data is stored on machine is independent of “what”, i.e., SQL queries)

Logical Data Independence (notion of views in SQL enables simpler queries on same schema)

31

Major SQL Components

Data Definition Language (DDL)

Data Manipulation Language (DML)

Embedded and Dynamic SQL

Triggers and Cursors

Security

Transaction Management

Remote Database Access

32

Creating a table for an Entity in SQL

Movies

MovieID  Name       ReleaseDate  Director
20       Inception  07/13/2010   Christopher Nolan
…        …          …            …

CREATE TABLE Movies (
    MovieID INTEGER,
    Name CHAR(30),
    ReleaseDate DATE,
    Director CHAR(20),
    PRIMARY KEY (MovieID))
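To see the definition in action, here is a small sketch that runs the CREATE TABLE statement in an in-memory SQLite database through Python's sqlite3 module and checks that the primary key rejects a duplicate MovieID. The second movie's values are made up for illustration, and SQLite is only one possible engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        MovieID INTEGER,
        Name CHAR(30),
        ReleaseDate DATE,
        Director CHAR(20),
        PRIMARY KEY (MovieID))
""")
conn.execute(
    "INSERT INTO Movies VALUES (20, 'Inception', '2010-07-13', 'Christopher Nolan')"
)

# A second tuple with the same MovieID violates the key constraint.
try:
    conn.execute("INSERT INTO Movies VALUES (20, 'Some Other Movie', NULL, NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

movie_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(duplicate_rejected, movie_count)
```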

33

Relationship as a Relation?

[ER diagram: entity User (UserID, Name, Age, JoinDate) and entity Movie (MovieID, Name, Director, ReleaseYear), connected by relationship Rating (RatingID, NumStars, Timestamp)]

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
…         …         …          …       …

“Foreign Keys”: UserID and MovieID

34

Table for Relationships in SQL

CREATE TABLE Ratings (
    RatingID INTEGER,
    NumStars REAL,
    Timestamp DATE,
    UserID INTEGER,
    MovieID INTEGER,
    PRIMARY KEY (RatingID),
    FOREIGN KEY (UserID) REFERENCES Users(UserID),
    FOREIGN KEY (MovieID) REFERENCES Movies(MovieID))

Ratings

RatingID  NumStars  Timestamp  UserID  MovieID
1         3.5       08/27/15   79      20
…         …         …          …       …
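A quick way to see what the foreign keys buy us is to try inserting a rating that points at a user and movie that do not exist. The sketch below does this on SQLite via Python's sqlite3 module. The Users column types are assumptions (the slides do not give them), and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name CHAR(30), "
    "Age INTEGER, JoinDate DATE)"
)
conn.execute(
    "CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Name CHAR(30), "
    "ReleaseDate DATE, Director CHAR(20))"
)
conn.execute("""
    CREATE TABLE Ratings (
        RatingID INTEGER,
        NumStars REAL,
        Timestamp DATE,
        UserID INTEGER,
        MovieID INTEGER,
        PRIMARY KEY (RatingID),
        FOREIGN KEY (UserID) REFERENCES Users(UserID),
        FOREIGN KEY (MovieID) REFERENCES Movies(MovieID))
""")
conn.execute("INSERT INTO Users VALUES (79, 'Alice', 23, '2013-01-10')")
conn.execute(
    "INSERT INTO Movies VALUES (20, 'Inception', '2010-07-13', 'Christopher Nolan')"
)

# References an existing user and movie: accepted.
conn.execute("INSERT INTO Ratings VALUES (1, 3.5, '2015-08-27', 79, 20)")

# References user 4232 and movie 293, which are not in this instance: rejected.
try:
    conn.execute("INSERT INTO Ratings VALUES (2, 4.0, '2015-07-20', 4232, 293)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

rating_count = conn.execute("SELECT COUNT(*) FROM Ratings").fetchone()[0]
print(dangling_rejected, rating_count)
```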

35

Q: Why not this?

AllStuff

UserID  Name   Age  JoinDate  RatingID  NumStars  Timestamp  MovieID  Name       ReleaseYear  Director
79      Alice  23   01/10/13  1         3.5       08/27/15   20       Inception  2010         Christopher Nolan
…       …      …    …         …         …         …          …        …          …            …

(Sneak peek: Redundancy in the data! Waste of storage; causes write anomalies! Mitigated by schema normalization)

36

How to represent self-relationships?

[ER diagram: entity Movie (MovieID, Name, Director, ReleaseYear) with a self-relationship Mention]

Mention

MovieID  MentionMovieID
20       313
…        …

2 Foreign Keys

Q: How to express in SQL?

Q: What is the primary key?

37

NULL Values in Relations

[ER diagram: entity User (UserID, Name, Age, JoinDate)]

Users

UserID  Name   Age   JoinDate
79      Alice  23    01/10/13
48      John   NULL  04/08/15
…       …      …     …

NULL “stands in” for attribute values that are missing/unknown

(Sneak peek: A headache for SQL query processing!)

SQL has “NOT NULL”, e.g., “Age REAL NOT NULL”
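The sketch below, run on SQLite via Python's sqlite3 module, hints at why NULL is a headache and shows NOT NULL rejecting a missing value. The table name StrictUsers and the age threshold of 21 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Name CHAR(30), "
    "Age REAL, JoinDate DATE)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [(79, "Alice", 23, "01/10/13"), (48, "John", None, "04/08/15")],
)

def count(where):
    # Helper: number of Users tuples that satisfy the given predicate.
    return conn.execute(f"SELECT COUNT(*) FROM Users WHERE {where}").fetchone()[0]

total = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
at_least_21 = count("Age >= 21")
under_21 = count("Age < 21")   # John matches neither predicate: his Age is unknown

# With NOT NULL, the missing value is rejected at insert time instead.
conn.execute(
    "CREATE TABLE StrictUsers (UserID INTEGER PRIMARY KEY, Name CHAR(30), "
    "Age REAL NOT NULL, JoinDate DATE)"
)
try:
    conn.execute("INSERT INTO StrictUsers VALUES (48, 'John', NULL, '04/08/15')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(total, at_least_21, under_21, null_rejected)
```

The two predicates look exhaustive, yet their counts do not add up to the total, because a comparison with NULL is neither true nor false.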

38

What about many-to-one?

[ER diagram: entities Student (SID, Name, Age) and Department (DID, Name, Address), connected by many-to-one relationship Major]

Students

SID  Name   Age
79   Alice  19
13   Bob    21
48   John   NULL
…    …      …

Department

DID  Name               Address
CS   Computer Sciences  1210 …
ST   Statistics         Blah
…    …                  …

Major

SID  DID
79   CS
48   ST
…    …

39

What about many-to-one?

[ER diagram: entities Student (SID, Name, Age) and Department (DID, Name, Address), connected by many-to-one relationship Major]

Students (MajorDID: Foreign Key?)

SID  Name   Age   MajorDID
79   Alice  19    CS
13   Bob    21    NULL
48   John   NULL  ST
…    …      …     …

Department

DID  Name               Address
CS   Computer Sciences  1210 …
ST   Statistics         Blah
…    …                  …

40

More Advanced Stuff

Integrity constraints

Key constraints

Referential integrity and participation constraints

Weak entity set

“Is A” hierarchy

41

Integrity Constraint (IC)

A logical condition (invariant) that must hold true on any instance of a given database schema

A legal relation instance satisfies all ICs

Overuse/abuse of ICs is a danger!

Part of schema; cannot infer from data exactly!

Two main types:

Key Constraint

Referential Integrity Constraint

42

Key Constraint in SQL

Key vs. Superkey

Primary key vs. Candidate key vs. Alternate key

Movies

MovieID  Name  ReleaseDate  Director  IMDB_URL

CREATE TABLE Movies (
    MovieID INTEGER,
    Name CHAR(30),
    ReleaseDate DATE,
    Director CHAR(20),
    IMDB_URL CHAR(20),
    PRIMARY KEY (MovieID),
    UNIQUE (IMDB_URL))
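A short sketch of the difference between the primary key and another candidate key declared with UNIQUE, run on SQLite via Python's sqlite3 module. The URL string and the second movie are made-up placeholder values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies (
        MovieID INTEGER,
        Name CHAR(30),
        ReleaseDate DATE,
        Director CHAR(20),
        IMDB_URL CHAR(20),
        PRIMARY KEY (MovieID),
        UNIQUE (IMDB_URL))
""")
conn.execute(
    "INSERT INTO Movies VALUES "
    "(20, 'Inception', '2010-07-13', 'Christopher Nolan', 'url-of-inception')"
)

# A different MovieID but the same IMDB_URL: the primary key is satisfied,
# but the UNIQUE (candidate key) constraint is not.
try:
    conn.execute(
        "INSERT INTO Movies VALUES (21, 'Another Movie', NULL, NULL, 'url-of-inception')"
    )
    duplicate_url_rejected = False
except sqlite3.IntegrityError:
    duplicate_url_rejected = True

movie_ids = [row[0] for row in conn.execute("SELECT MovieID FROM Movies")]
print(duplicate_url_rejected, movie_ids)
```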

43

Referential Integrity Constraint (RIC)

A Foreign Key value must match the key of some tuple in the referenced relation; it should not dangle (and, ideally, should not be NULL)!

[ER diagram: entities Student (SID, Name, Age) and Department (DID, Name, Address), connected by many-to-one relationship Major]

Students (MajorDID: Foreign Key?)

SID  Name   Age   MajorDID
79   Alice  19    CS
13   Bob    21    NULL
48   John   NULL  ST
…    …      …     …

Department

DID  Name               Address
CS   Computer Sciences  1210 …
ST   Statistics         Blah
…    …                  …

44

What if a Department tuple is deleted?!

45

Enforcing RIC

We have 3 options:

1. Refuse to allow the deletion!

2. Delete all tuples in Students that reference the deleted DID in Department

3. Set the corresponding DID in Students to some default value, or in the worst case, NULL

46

Enforcing RIC in SQL

Refuse to allow the deletion!

CREATE TABLE Student (
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE NO ACTION)

47

Enforcing RIC in SQL

Delete all tuples in Students that reference the deleted DID in Department

CREATE TABLE Student (
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE CASCADE)

48

Enforcing RIC in SQL

Set the corresponding DID in Students to some default value, or in the worst case, NULL

CREATE TABLE Student (
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE SET DEFAULT)
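To compare the options side by side, here is a sketch that rebuilds the Student table with each ON DELETE action and then deletes the referenced Department tuple. It runs on SQLite via Python's sqlite3 module. Assumptions: integer DIDs as in the CREATE TABLE statement, a column default of 0 pointing at a made-up “Undeclared” department, and SET NULL added for the “worst case” variant.

```python
import sqlite3

def students_after_delete(on_delete):
    """Delete department 1 and report what happens to the student majoring in it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Department (DID INTEGER PRIMARY KEY, Name CHAR(30))")
    conn.execute(f"""
        CREATE TABLE Student (
            SID INTEGER,
            Name CHAR(30),
            Age INTEGER,
            DID INTEGER DEFAULT 0,
            PRIMARY KEY (SID),
            FOREIGN KEY (DID) REFERENCES Department(DID)
                ON DELETE {on_delete})
    """)
    conn.executemany(
        "INSERT INTO Department VALUES (?, ?)",
        [(0, "Undeclared"), (1, "Computer Sciences")],
    )
    conn.execute("INSERT INTO Student VALUES (79, 'Alice', 19, 1)")
    try:
        conn.execute("DELETE FROM Department WHERE DID = 1")
    except sqlite3.IntegrityError:
        return "refused"
    return conn.execute("SELECT SID, DID FROM Student").fetchall()

results = {
    action: students_after_delete(action)
    for action in ("NO ACTION", "CASCADE", "SET DEFAULT", "SET NULL")
}
for action, outcome in results.items():
    print(action, "->", outcome)
```

SET DEFAULT only works here because the default value 0 itself refers to an existing Department tuple; otherwise the delete would be refused as well.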

49

Participation Constraint in SQL

[ER diagram: entities Student (SID, Name, Age) and Department (DID, Name, Address), connected by relationship Major]

CREATE TABLE Student (
    SID INTEGER,
    Name CHAR(30),
    Age INTEGER,
    DID INTEGER NOT NULL,
    PRIMARY KEY (SID),
    FOREIGN KEY (DID) REFERENCES Department(DID)
        ON DELETE NO ACTION)

50

Translating a Weak Entity Set

[ER diagram: weak entity Floor (Number, NumRooms) and entity Department (DID, Name, Address), connected by relationship PartOf]

Floor

Number  NumRooms  DID
1       14        CS
2       12        CS
…       …         …

Department

DID  Name               Address
CS   Computer Sciences  1210 …
ST   Statistics         Blah
…    …                  …

Q. What ICs are needed on Floor?
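One common answer, given here as a sketch and not necessarily the one intended in class: the key of Floor combines the owner's key with the partial key (DID, Number), DID is a NOT NULL foreign key, and floors are deleted along with their department. The column types and the Statistics floor are assumptions; the code runs on SQLite via Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Department (DID CHAR(2) PRIMARY KEY, Name CHAR(30), Address CHAR(30))"
)
conn.execute("""
    CREATE TABLE Floor (
        Number INTEGER,
        NumRooms INTEGER,
        DID CHAR(2) NOT NULL,
        PRIMARY KEY (DID, Number),
        FOREIGN KEY (DID) REFERENCES Department(DID)
            ON DELETE CASCADE)
""")
conn.executemany(
    "INSERT INTO Department VALUES (?, ?, ?)",
    [("CS", "Computer Sciences", "1210 ..."), ("ST", "Statistics", "Blah")],
)
# Floor number 1 can exist in both departments: Number alone is not a key.
conn.executemany(
    "INSERT INTO Floor VALUES (?, ?, ?)",
    [(1, 14, "CS"), (2, 12, "CS"), (1, 10, "ST")],
)

# A weak entity cannot outlive its owner: deleting CS removes its floors too.
conn.execute("DELETE FROM Department WHERE DID = 'CS'")
floors_left = conn.execute("SELECT Number, DID FROM Floor").fetchall()
print(floors_left)
```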

51

Translating an “Is A” Hierarchy

[ER diagram: entity Student (SID, Name, Age) with “Is A” subclasses Undergrad (IsHonors), Masters (ByThesis), and Doctoral (QualScore)]

52

Translating an “Is A” Hierarchy

We have 3 options:

1. “OOPL Approach”: separate relations for each Entity Set; an entity present in exactly one

2. “TrueER Approach”: relations for “subclasses” have foreign key to parent

3. “Truly Terrible Approach”: AllStuff with NULL!

53

Translating an “Is A” Hierarchy

“OOPL Approach”: separate relations for each Entity Set; an entity present in exactly one

Students(SID, Name, Age)

Undergrad(SID, Name, Age, IsHonors)

Masters(SID, Name, Age, IsThesis)

Doctoral(SID, Name, Age, QualScore)

54

Translating an “Is A” Hierarchy

“TrueER Approach”: relations for “subclasses” have foreign key to parent

Students(SID, Name, Age)

Undergrad(SID, IsHonors)

Masters(SID, IsThesis)

Doctoral(SID, QualScore)

Q. How do the ICs here differ from OOPL?
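A minimal sketch of the “TrueER Approach” for one subclass, run on SQLite via Python's sqlite3 module. Assumptions: column types, IsHonors stored as 0/1, and ON DELETE CASCADE on the subclass foreign key. Only Undergrad is shown; Masters and Doctoral would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE Students (SID INTEGER PRIMARY KEY, Name CHAR(30), Age INTEGER)"
)
conn.execute("""
    CREATE TABLE Undergrad (
        SID INTEGER PRIMARY KEY,
        IsHonors INTEGER,
        FOREIGN KEY (SID) REFERENCES Students(SID)
            ON DELETE CASCADE)
""")
conn.execute("INSERT INTO Students VALUES (79, 'Alice', 19)")
conn.execute("INSERT INTO Undergrad VALUES (79, 1)")

# The full picture of an undergrad needs a join with the parent relation.
undergrads = conn.execute("""
    SELECT s.SID, s.Name, s.Age, u.IsHonors
    FROM Students s JOIN Undergrad u ON s.SID = u.SID
""").fetchall()

# A subclass tuple without a parent tuple violates the foreign key.
try:
    conn.execute("INSERT INTO Undergrad VALUES (13, 0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(undergrads, orphan_rejected)
```

The shared attributes live only in Students, so there is no duplication, at the price of a join whenever subclass attributes are needed.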

55

Translating an “Is A” Hierarchy

“Truly Terrible Approach”: AllStuff with NULL!

AllStuff(SID, Name, Age, IsHonors, IsThesis, QualScore)

Q. How do the ICs here differ from the other 2?

NULL is awful!

56

Review: Relational; ER to Relational

1. Basic Terms of Relational Model

2. Introduction to SQL; CREATE TABLE

3. Translating ER to Relational

4. Integrity Constraints

5. Weak Entities; “IsA” Hierarchies