Relational Model & Algebra Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 13, 2005 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan


Administrivia

Homework assignments will normally be given out on Thursdays and are due the following Thursday unless otherwise directed

Start thinking about which project you want to do and who you might work with
  You will need to form groups and pick a project by the end of next week (I’ll announce more)

We will soon be holding an extra session on the project, as well as on essential tools and skills (stay tuned)


Thinking Back to Last Time…

There are a variety of ways of representing data, each with trade-offs:
  Free text
  Classes and subclasses
  Shapes/points in space
  “Objects” with “properties”

In general, our emphasis will be on the last item
  … though there are spatial databases, OO databases, text databases, and the like…


The Relational Data Model (1970)

Lessons from the Codd paper:
  Let’s separate physical implementation from logical
  Model the data independently from how it will be used (accessed, printed, etc.)
  Describe the data minimally and mathematically
    A relation describes an association between data items – tuples with attributes
    We generally think of tables and rows, but that’s somewhat imprecise
    Use standard mathematical (logical) operations over the data – these are the relational algebra or relational calculus

How does this model relate to objects, properties? What are its abilities and limitations?


Why Did It Take So Many Years to Implement Relational Databases?

Codd’s original work: 1969-70
Earliest relational database research: ~1976
Oracle “2.0”: 1979

Why the gap?
  1. “You could do the same thing in other ways”
  2. “Nobody wants to write math formulas”
  3. “Why would I turn my data into tables?”
  4. “It won’t perform well”

What do you think?


Getting More Concrete: Building a Database and Application

1. Start with a conceptual model
  “On paper” using certain techniques we’ll discuss next week
  We ignore low-level details – focus on logical representation

2. Design & implement schema
  Design and codify (in SQL) the relations/tables
  Do physical layout – indexes, etc.

3. Import the data

4. Write applications using DBMS and other tools
  Many of the hard problems are taken care of by other people (DBMS, API writers, library authors, web server, etc.)


Conceptual Design for CIS Student Course Survey

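[Figure: entity-relationship diagram. STUDENT (sid, name) and COURSE (cid, name) are linked by Takes (exp-grade); PROFESSOR (fid, name) is linked to COURSE by Teaches; a semester attribute also appears in the diagram.]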

“Who’s taking what, and what grade do they expect?”

This design is independent of the final form of the report!


Example Schema

Our focus now: relational schema – set of tables

Can have other kinds of schemas – XML, object, …

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    C          500-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105


Some Terminology

Columns of a relation are called attributes or fields
  The number of these columns is the arity of the relation
The rows of a relation are called tuples
Each attribute has values taken from a domain, e.g., subj has domain string

Theoretically: a relation is a set of tuples; no tuple can occur more than once
  Real systems may allow duplicates for efficiency or other reasons – we’ll ignore this for now
  Objects and XML may also have the same content with different “identity”
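As a rough illustration of these terms, here is a minimal Python sketch. The representation (a tuple of attribute names plus a set of rows) is an assumption made for this example, not something the slides prescribe; the rows are the COURSE rows from the example schema.

```python
# Hypothetical representation: a relation is (attribute names, set of tuples).
course_attrs = ("cid", "subj", "sem")
course_rows = {
    ("550-0105", "DB", "F05"),
    ("700-1005", "AI", "S05"),
    ("501-0105", "Arch", "F05"),
}

# The arity is the number of attributes (columns).
arity = len(course_attrs)

# Because the rows form a set, adding a tuple that is already present
# leaves the relation unchanged: no tuple can occur more than once.
course_rows.add(("550-0105", "DB", "F05"))
cardinality = len(course_rows)

print(arity, cardinality)
```

A Python list in place of the set would behave like the "real systems" case, where duplicates are allowed.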


Describing Relations

A schema can be represented many ways
  To the DBMS, use data definition language (DDL) – like programming language type definitions
  In relational DBs, we use relation(attribute:domain)

STUDENT(sid:int, name:string)
Takes(sid:int, exp-grade:char[2], cid:string)
COURSE(cid:string, subj:string, sem:char[3])
Teaches(fid:int, cid:string)
PROFESSOR(fid:int, name:string)
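The relation(attribute:domain) notation can be mimicked in a few lines of Python. This is only a sketch: the dictionary encoding, the `conforms` helper, and the treatment of char[2] as "a string of at most two characters" are assumptions made for illustration.

```python
# Hypothetical encoding of Takes(sid:int, exp-grade:char[2], cid:string):
# each attribute maps to (Python type, maximum length or None).
TAKES_SCHEMA = {"sid": (int, None), "exp-grade": (str, 2), "cid": (str, None)}

def conforms(row, schema):
    """Return True if the row has exactly the schema's attributes and
    every value belongs to the declared domain."""
    if set(row) != set(schema):
        return False
    for attr, (py_type, max_len) in schema.items():
        value = row[attr]
        if not isinstance(value, py_type):
            return False
        if max_len is not None and len(value) > max_len:
            return False
    return True

good = {"sid": 1, "exp-grade": "A", "cid": "550-0105"}
bad = {"sid": "one", "exp-grade": "A", "cid": "550-0105"}  # sid is not an int
print(conforms(good, TAKES_SCHEMA), conforms(bad, TAKES_SCHEMA))
```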


More on Attribute Domains

Relational DBMSs have very limited “built-in” domains: either tables or scalar attributes – int, string, byte sequence, date, etc.

But more generally:
  We can have “nested relations”
  Object-oriented, object-relational systems allow complex, user-defined domains – lists, classes, etc.
  XML systems allow for XML trees (or lists of trees) that follow certain structural constraints

Database people, when they are discussing design, often assume domains are evident to the reader:
  STUDENT(sid, name)


Integrity Constraints

Domains and schemas are one form of constraint on a valid data instance

Other important constraints include:
  Key constraints:
    Subset of fields that uniquely identifies a tuple, and for which no proper subset of the key has this property
    May have several candidate keys; one is chosen as the primary key
    A superkey is a subset of fields that includes a key
  Inclusion dependencies (referential integrity constraints):
    A field in one relation may refer to a tuple in another relation by including its key
    The referenced tuple must exist in the other relation for the database instance to be valid
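Both kinds of constraint can be checked mechanically. The sketch below is one possible way to do it in Python; the helper names `is_key` and `dangling_refs` and the (attributes, rows) layout are assumptions for this example, and the rows are copied from the Takes and COURSE tables in the example schema.

```python
def is_key(rows, attrs, key):
    """True if no two rows agree on every attribute in `key`.
    Only uniqueness is checked; minimality of the key is not."""
    idx = [attrs.index(a) for a in key]
    seen = set()
    for row in rows:
        k = tuple(row[i] for i in idx)
        if k in seen:
            return False
        seen.add(k)
    return True

def dangling_refs(rows, attrs, fk, ref_rows, ref_attrs, ref_key):
    """Values of the referencing field `fk` with no matching `ref_key`
    value in the referenced relation (empty set: the dependency holds)."""
    i, j = attrs.index(fk), ref_attrs.index(ref_key)
    existing = {r[j] for r in ref_rows}
    return {row[i] for row in rows} - existing

takes_attrs = ("sid", "exp-grade", "cid")
takes = {(1, "A", "550-0105"), (1, "A", "700-1005"), (3, "C", "500-0105")}
course_attrs = ("cid", "subj", "sem")
course = {("550-0105", "DB", "F05"), ("700-1005", "AI", "S05"),
          ("501-0105", "Arch", "F05")}

print(is_key(takes, takes_attrs, ("sid", "cid")))  # (sid, cid) is unique
print(is_key(takes, takes_attrs, ("sid",)))        # sid alone repeats
print(dangling_refs(takes, takes_attrs, "cid", course, course_attrs, "cid"))
```

With the rows exactly as they appear in the example tables, the last call reports cid 500-0105, which occurs in Takes but not in COURSE (where the closest entry is 501-0105).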


SQL: Structured Query Language

The standard language for relational data
  Invented by folks at IBM, esp. Don Chamberlin
  Actually not a great language…
  Beat a more elegant competing standard, QUEL, from Berkeley

Separated into a DML & DDL

DML based on relational algebra & (mostly) calculus, which we discuss this week


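Table Definition: SQL-92 DDL and Constraints

-- "exp-grade" is quoted because a hyphen is not allowed in a plain identifier
-- CHAR(8) stands in for STRING(8), which is not an SQL-92 type
CREATE TABLE Takes (sid INTEGER, "exp-grade" CHAR(2), cid CHAR(8),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES STUDENT,
    FOREIGN KEY (cid) REFERENCES COURSE
)

CREATE TABLE STUDENT (sid INTEGER, name CHAR(20),
    PRIMARY KEY (sid)  -- assumed: the slide's definition breaks off after the comma
)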


Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    C          500-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105


From Tables → SQL → Application

<html>
<body>
  <!-- hypotheticalEmbeddedSQL:
       SELECT *
       FROM STUDENT, Takes, COURSE
       WHERE STUDENT.sid = Takes.sid
         AND Takes.cid = COURSE.cid
  -->
</body>
</html>

C -> machine code sequence -> microprocessor
Java -> bytecode sequence -> JVM
SQL -> relational algebra expression -> query execution engine
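To see the three-table query run end to end, one option is Python's built-in sqlite3 module with an in-memory database. This is a sketch under assumptions: SQLite is simply a convenient engine (the slides do not name one), the column is called exp_grade instead of exp-grade to avoid quoting, and the query selects three named columns, not SELECT *.

```python
import sqlite3

# In-memory database loaded with the example instance from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE COURSE  (cid TEXT PRIMARY KEY, subj TEXT, sem TEXT);
    CREATE TABLE Takes   (sid INTEGER, exp_grade TEXT, cid TEXT,
                          PRIMARY KEY (sid, cid));
    INSERT INTO STUDENT VALUES (1, 'Jill'), (2, 'Qun'), (3, 'Nitin');
    INSERT INTO COURSE  VALUES ('550-0105', 'DB', 'F05'),
                               ('700-1005', 'AI', 'S05'),
                               ('501-0105', 'Arch', 'F05');
    INSERT INTO Takes   VALUES (1, 'A', '550-0105'), (1, 'A', '700-1005'),
                               (3, 'C', '500-0105');
""")

# "Who's taking what, and what grade do they expect?"
rows = conn.execute("""
    SELECT STUDENT.name, COURSE.subj, Takes.exp_grade
    FROM STUDENT, Takes, COURSE
    WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
    ORDER BY COURSE.subj
""").fetchall()

for name, subj, grade in rows:
    print(name, subj, grade)
```

The engine, not the application, decides how to evaluate the two join conditions; the application only states which combinations of tuples it wants.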


Codd’s Relational Algebra

A set of mathematical operators that compose, modify, and combine tuples within different relations

Relational algebra operations operate on relations and produce relations (“closure”)
  f: Relation → Relation
  f: Relation × Relation → Relation


Codd’s Logical Operations: The Relational Algebra

Six basic operations:
  Projection    π_α(R)
  Selection     σ_θ(R)
  Union         R1 ∪ R2
  Difference    R1 – R2
  Product       R1 × R2
  (Rename)      ρ_{α→β}(R)

And some other useful ones:
  Join          R1 ⋈ R2
  Semijoin      R1 ⊲ R2
  Intersection  R1 ∩ R2
  Division      R1 ÷ R2


Data Instance for Operator Examples

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    A          700-1005
3    C          500-0105
4    C          500-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105


Projection, π
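A minimal Python sketch of projection, assuming (for illustration only) that a relation is a pair of an attribute tuple and a set of rows. The `project` helper is hypothetical; the rows are the Takes rows from the operator-example instance.

```python
def project(relation, keep):
    """Projection onto the attributes in `keep`. Duplicate rows collapse
    automatically because the result is stored as a set."""
    attrs, rows = relation
    idx = [attrs.index(a) for a in keep]
    return (tuple(keep), {tuple(row[i] for i in idx) for row in rows})

TAKES = (("sid", "exp-grade", "cid"),
         {(1, "A", "550-0105"), (1, "A", "700-1005"), (3, "A", "700-1005"),
          (3, "C", "500-0105"), (4, "C", "500-0105")})

# Project Takes onto (exp-grade, cid): five input rows, fewer output rows.
attrs, rows = project(TAKES, ["exp-grade", "cid"])
print(attrs, sorted(rows))
```

Dropping sid makes two pairs of rows indistinguishable, so the result has three rows where the input had five.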


Selection, σ
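Selection can be sketched the same way. The `select` helper and the (attributes, rows) layout are assumptions made for this example, and the predicate is an arbitrary one chosen for illustration; the rows are the STUDENT rows from the operator-example instance.

```python
def select(relation, predicate):
    """Selection: keep the rows for which predicate(row as dict) is true.
    The schema is unchanged."""
    attrs, rows = relation
    return (attrs, {row for row in rows if predicate(dict(zip(attrs, row)))})

STUDENT = (("sid", "name"),
           {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")})

# Students whose sid is at least 3.
attrs, rows = select(STUDENT, lambda t: t["sid"] >= 3)
print(attrs, sorted(rows))
```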


Product ×


Join, ⋈: A Combination of Product and Selection
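The relationship between product and join can be made concrete with a small Python sketch. The helpers below are hypothetical, and the sid attributes are written with relation-name prefixes (an assumption of this example) so that the product's attribute list has no clashes.

```python
def product(r, s):
    """Cartesian product: every row of r concatenated with every row of s
    (assumes the two attribute lists share no names)."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    return (r_attrs + s_attrs, {a + b for a in r_rows for b in s_rows})

def select(relation, predicate):
    """Selection: keep rows satisfying the predicate."""
    attrs, rows = relation
    return (attrs, {row for row in rows if predicate(dict(zip(attrs, row)))})

def join(r, s, predicate):
    """Join, defined literally as a selection applied to a product."""
    return select(product(r, s), predicate)

STUDENT = (("STUDENT.sid", "name"),
           {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")})
TAKES = (("Takes.sid", "exp-grade", "cid"),
         {(1, "A", "550-0105"), (1, "A", "700-1005"), (3, "A", "700-1005"),
          (3, "C", "500-0105"), (4, "C", "500-0105")})

pairs = product(STUDENT, TAKES)
enrolled = join(STUDENT, TAKES, lambda t: t["STUDENT.sid"] == t["Takes.sid"])
print(len(pairs[1]), len(enrolled[1]))
```

The product pairs each of the 4 students with each of the 5 Takes rows; the join keeps only the pairs whose sids match. A real engine would avoid building the full product first.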


Union ∪


Difference –
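Union and difference both require the two inputs to have the same schema. A minimal Python sketch, with hypothetical helpers and two small single-attribute relations derived by hand from the Takes rows of the operator-example instance (the sids expecting an A somewhere, and the sids expecting a C somewhere):

```python
def union(r, s):
    """Rows in r or in s; schemas must match."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if r_attrs != s_attrs:
        raise ValueError("union needs identical schemas")
    return (r_attrs, r_rows | s_rows)

def difference(r, s):
    """Rows in r that are not in s; schemas must match."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    if r_attrs != s_attrs:
        raise ValueError("difference needs identical schemas")
    return (r_attrs, r_rows - s_rows)

EXPECT_A = (("sid",), {(1,), (3,)})  # sids with an expected grade of A
EXPECT_C = (("sid",), {(3,), (4,)})  # sids with an expected grade of C

either = union(EXPECT_A, EXPECT_C)
only_a = difference(EXPECT_A, EXPECT_C)
print(sorted(either[1]), sorted(only_a[1]))
```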


Rename, ρ

The rename operator can be expressed several ways:
  The book has a very odd definition that’s not algebraic
  An alternate definition: ρ_{α→β}(x)
    Takes the relation x with schema α
    Returns a relation with the attribute list β

Rename isn’t all that useful, except if you join a relation with itself

Why would it be useful here?
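One case where rename earns its keep is a self-join. The Python sketch below joins Takes with itself to find pairs of students in the same course; the helper names, the new attribute names, and the query itself are chosen for illustration and are not taken from the slides.

```python
def rename(relation, new_attrs):
    """Rename: same rows, new attribute list of the same arity."""
    attrs, rows = relation
    if len(new_attrs) != len(attrs):
        raise ValueError("rename must preserve arity")
    return (tuple(new_attrs), rows)

def join(r, s, predicate):
    """Join as a filtered product (attribute lists must not clash)."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    attrs = r_attrs + s_attrs
    return (attrs, {a + b for a in r_rows for b in s_rows
                    if predicate(dict(zip(attrs, a + b)))})

TAKES = (("sid", "exp-grade", "cid"),
         {(1, "A", "550-0105"), (1, "A", "700-1005"), (3, "A", "700-1005"),
          (3, "C", "500-0105"), (4, "C", "500-0105")})

# Without the renames, both copies would expose the same attribute names.
T1 = rename(TAKES, ["sid1", "grade1", "cid1"])
T2 = rename(TAKES, ["sid2", "grade2", "cid2"])
same_course = join(T1, T2,
                   lambda t: t["cid1"] == t["cid2"] and t["sid1"] < t["sid2"])

# Keep just the two sids (positions 0 and 3 of the joined rows).
pairs = {(row[0], row[3]) for row in same_course[1]}
print(sorted(pairs))
```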


Mini-Quiz

This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations. Try writing queries for these:
  The names of students named “Bob”
  The names of students expecting an “A”
  The names of students in Milo Martin’s 501 class
  The sids and names of students not enrolled


Deriving Intersection

Intersection: as with set operations, derivable from difference

[Figure: Venn diagram of A and B, with the regions A–B and B–A labeled]

A ∩ B ≡ (A ∪ B) – (A – B) – (B – A)
      ≡ A – (A – B)
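A quick check of the derivation, using Python's built-in set type as a stand-in for two union-compatible relations (the sample values are arbitrary). It evaluates the union-based form, (A ∪ B) – (A – B) – (B – A), and the shorter difference-only form, A – (A – B), and compares both with the built-in intersection.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# (A ∪ B) – (A – B) – (B – A): remove what is only in A, then what is only in B.
via_union = (A | B) - (A - B) - (B - A)

# A – (A – B): remove from A everything that is not also in B.
via_difference = A - (A - B)

print(via_union, via_difference, A & B)
```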


Division

A somewhat messy operation that can be expressed in terms of the operations we have already defined

Used to express queries such as “The fids of faculty who have taught all subjects”

Paraphrased: “The fids of professors for which there does not exist a subject that they haven’t taught”


Division Using Our Existing Operators

All possible teaching assignments:
  Allpairs = π_{fid,subj}(PROFESSOR × π_{subj}(COURSE))

NotTaught, all (fid,subj) pairs for which professor fid has not taught subj:
  NotTaught = Allpairs – π_{fid,subj}(Teaches ⋈ COURSE)

Answer is all faculty not in NotTaught:
  π_{fid}(PROFESSOR) – π_{fid}(NotTaught)
  ≡ π_{fid}(PROFESSOR) – π_{fid}(π_{fid,subj}(PROFESSOR × π_{subj}(COURSE)) – π_{fid,subj}(Teaches ⋈ COURSE))
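The three steps (all pairs, not-taught pairs, final answer) can be traced in Python with plain sets of tuples. This is a sketch: with the Teaches rows from the slides alone nobody has taught every subject, so two hypothetical Teaches rows are added (and marked) to give a non-empty answer.

```python
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Martin")}        # (fid, name)
COURSE = {("550-0105", "DB", "F05"), ("700-1005", "AI", "S05"),
          ("501-0105", "Arch", "F05")}                        # (cid, subj, sem)
TEACHES = {(1, "550-0105"), (2, "700-1005"), (8, "501-0105"),
           (1, "700-1005"), (1, "501-0105")}  # last two rows are HYPOTHETICAL

fids = {fid for fid, _name in PROFESSOR}          # projection of PROFESSOR on fid
subjects = {subj for _cid, subj, _sem in COURSE}  # projection of COURSE on subj

# Allpairs: every professor paired with every subject.
all_pairs = {(fid, subj) for fid in fids for subj in subjects}

# (fid, subj) pairs actually taught: Teaches joined with COURSE on cid.
taught = {(fid, subj) for fid, t_cid in TEACHES
          for c_cid, subj, _sem in COURSE if t_cid == c_cid}

# NotTaught: pairs that are possible but never happened.
not_taught = all_pairs - taught

# Answer: professors with no entry in NotTaught.
answer = fids - {fid for fid, _subj in not_taught}
print(answer)
```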


Division: R1 ÷ R2

Requirement: schema(R1) ⊃ schema(R2)
Result schema: schema(R1) – schema(R2)

“Professors who have taught all courses”:
  π_{fid}(π_{fid,subj}(Teaches ⋈ COURSE) ÷ π_{subj}(COURSE))

What about “Courses that have been taught by all faculty”?
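Division can also be written directly as a "for all" test. The sketch below handles only the simplest case, a binary R1(x, y) divided by a unary R2(y); the `divide` helper is hypothetical, and the (fid, subj) rows for fid 1 beyond DB are made up so that the result is non-empty.

```python
def divide(r_pairs, s_values):
    """Division for binary R1(x, y) and unary R2(y): the x values that
    appear in R1 together with every y in R2."""
    xs = {x for x, _y in r_pairs}
    return {x for x in xs if all((x, y) in r_pairs for y in s_values)}

# (fid, subj) pairs, written out by hand; (1, "AI") and (1, "Arch") are
# HYPOTHETICAL additions to the slide data.
taught = {(1, "DB"), (1, "AI"), (1, "Arch"), (2, "AI"), (8, "Arch")}
subjects = {"DB", "AI", "Arch"}  # the subj column of COURSE

print(divide(taught, subjects))
```

The result schema is just x, which matches the rule schema(R1) – schema(R2).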


The Big Picture: SQL to Algebra to Query Plan to Web Page

SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sid
  AND Takes.cid = COURSE.cid

[Figure: system architecture. Components: Web Server / UI / etc., Optimizer, Execution Engine, Storage Subsystem. Query plan – an operator tree – over STUDENT, Takes, and COURSE, with Merge and Hash operators labeled “by cid”.]


Hint of Future Things: Optimization Is Based on Algebraic Equivalences

Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics

They may be different in cost of evaluation!

σ_{c ∨ d}(R) ≡ σ_c(R) ∪ σ_d(R)

σ_c(R1 × R2) ≡ R1 ⋈_c R2

σ_{c ∧ d}(R) ≡ σ_c(σ_d(R))

Query optimization finds the most efficient representation to evaluate (or one that’s not bad)
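The three equivalences (a disjunctive selection as a union of selections, a selection over a product as a join, and a conjunctive selection as nested selections) can be spot-checked on small data. This Python sketch uses hypothetical rows and predicates; it shows that both sides return the same rows, not that they cost the same.

```python
def select(rows, pred):
    """Selection over a plain set of tuples."""
    return {row for row in rows if pred(row)}

R = {(1, "A"), (1, "B"), (3, "A"), (3, "C"), (4, "C")}  # hypothetical (sid, grade)
c = lambda row: row[0] >= 3
d = lambda row: row[1] == "A"

# Selection on (c OR d) equals the union of the two selections.
lhs_or = select(R, lambda row: c(row) or d(row))
rhs_or = select(R, c) | select(R, d)

# Selection on (c AND d) equals one selection applied after the other.
lhs_and = select(R, lambda row: c(row) and d(row))
rhs_and = select(select(R, d), c)

# Selection over a product equals a join: the left side materializes the
# whole product, the right side filters pairs as they are generated.
R1 = {(1,), (3,), (4,)}
R2 = {(1, "Jill"), (3, "Nitin")}
lhs_join = select({a + b for a in R1 for b in R2}, lambda row: row[0] == row[1])
rhs_join = {a + b for a in R1 for b in R2 if a[0] == b[0]}

print(lhs_or == rhs_or, lhs_and == rhs_and, lhs_join == rhs_join)
```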


Next Time: An Equivalent, But Very Different, Formalism

Codd invented a relational calculus that he proved was equivalent in expressiveness
  Based on a subset of first-order logic – declarative, without an implicit order of evaluation

More convenient for describing certain things, and for certain kinds of manipulations

… And, in fact, the basis of SQL!