
From the Calculus to the Structured Query Language

Zachary G. Ives, University of Pennsylvania

CIS 550 – Database & Information Systems

September 22, 2005

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan


Administrivia

Homework 1 due Tuesday

Homework 2 will also be handed out
  Will involve writing SQL
  Oracle is set up on eniac.seas.upenn.edu (also eniac-l.seas.upenn.edu)

Go to: www.seas.upenn.edu/~zives/cis550/oracle-faq.html
  Click on the “create Oracle account” link
  Enter your login info so you’ll get an Oracle account


Tuple Relational Calculus (in More Detail)

Queries of form:

{T | p}, where p is the predicate

Predicate: boolean expression over Tx attributes

Expressions:
  Tx ∈ R
  Tx.a op Ty.b
  Tx.a op const
  const op Tx.a
  T.a = Tx.a
where op is one of =, ≠, <, >, ≤, ≥; Tx, … are tuple variables; and Tx.a, … are attributes

Complex expressions: e1 ∧ e2, e1 ∨ e2, ¬e, and e1 → e2

Universal and existential quantifiers: ∀, ∃


Domain Relational Calculus to Tuple Relational Calculus

{<subj> | ∃ cid, sem, sid (<cid, subj, sem> ∈ COURSE ∧ <sid, “C”, cid> ∈ Takes)}

{<cid> | ∃ s1, s2 (<cid, s1, s2> ∈ COURSE ∧ ∃ cid2, s3, s4 (<cid2, s3, s4> ∈ COURSE ∧ (cid > cid2)))}
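One way to read the two domain-calculus queries above is as set comprehensions. The sketch below is only an illustration: it assumes Python sets of tuples as a stand-in for relations, uses the COURSE and Takes rows from this lecture's example instance, and the names `subjects_with_c` and `not_smallest` are made up for the example.

```python
# Relations as sets of tuples (rows from the lecture's example instance).
COURSE = {("550-0105", "DB", "F05"),
          ("700-1005", "AI", "S05"),
          ("501-0105", "Arch", "F05")}
TAKES = {(1, "A", "550-0105"),
         (1, "A", "700-1005"),
         (3, "C", "501-0105")}

# {<subj> | ∃ cid, sem, sid (<cid, subj, sem> ∈ COURSE ∧ <sid, "C", cid> ∈ Takes)}
# Each "for" clause plays the role of an existential quantifier over a relation.
subjects_with_c = {subj
                   for (cid, subj, sem) in COURSE
                   for (sid, grade, cid2) in TAKES
                   if grade == "C" and cid2 == cid}

# {<cid> | ∃ s1, s2 (<cid, s1, s2> ∈ COURSE ∧
#          ∃ cid2, s3, s4 (<cid2, s3, s4> ∈ COURSE ∧ cid > cid2))}
# The inner ∃ becomes any(); course IDs are compared as strings here.
not_smallest = {cid
                for (cid, s1, s2) in COURSE
                if any(cid > cid2 for (cid2, s3, s4) in COURSE)}

print(subjects_with_c)
print(sorted(not_smallest))
```

The second query returns every course ID except the smallest one, which is why the nested quantifier is needed.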


Mini-Quiz on the Relational Calculus

How do you write:

DRC: Which students have taken more than one course from the same professor?

TRC: Which faculty teach every course?


Algebra vs. Calculus

We’ve claimed that the calculus (when safe) and the algebra are equivalent

Thus (core) SQL => calculus ⇔ algebra makes sense

Let’s look more closely at this…

SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid

[Figure: tree over STUDENT, Takes, and COURSE; label: Calculus]


Translating from RA to DRC

Core of relational algebra: π, σ, ∪, ×, −

We need to work our way through the structure of an RA expression, translating each possible form.

Let TR[e] be the translation of RA expression e into DRC.

Relation names: For the RA expression R, the DRC expression is {<x1, x2, …, xn> | <x1, x2, …, xn> ∈ R}


Selection: TR[σ_θ R]

Suppose we have σ_θ(e’), where e’ is another RA expression that translates as:

TR[e’] = {<x1, x2, …, xn> | p}

Then the translation of σ_θ(e’) is

{<x1, x2, …, xn> | p ∧ θ’}

where θ’ is obtained from θ by replacing each attribute with the corresponding variable

Example: TR[σ_{#1=#2 ∧ #4>2.5} R] (if R has arity 4) is

{<x1, x2, x3, x4> | <x1, x2, x3, x4> ∈ R ∧ x1 = x2 ∧ x4 > 2.5}


Projection: TR[π_{i1,…,im}(e)]

If TR[e] = {<x1, x2, …, xn> | p} then

TR[π_{i1,i2,…,im}(e)] = {<x_i1, x_i2, …, x_im> | ∃ x_j1, x_j2, …, x_jk. p},

where x_j1, x_j2, …, x_jk are the variables in x1, x2, …, xn that are not in x_i1, x_i2, …, x_im

Example: With R as before, TR[π_{#1,#3}(R)] = {<x1, x3> | ∃ x2, x4. <x1, x2, x3, x4> ∈ R}


Union: TR[R1 ∪ R2]

R1 and R2 must have the same arity

For e1 ∪ e2, where e1, e2 are algebra expressions:
TR[e1] = {<x1, …, xn> | p} and TR[e2] = {<y1, …, yn> | q}

Relabel the variables in the second: TR[e2] = {<x1, …, xn> | q’}

This may involve relabeling bound variables in q to avoid clashes

TR[e1 ∪ e2] = {<x1, …, xn> | p ∨ q’}

Example: TR[R1 ∪ R2] = {<x1, x2, x3, x4> | <x1, x2, x3, x4> ∈ R1 ∨ <x1, x2, x3, x4> ∈ R2}


Other Binary Operators

Difference: The same conditions hold as for union

If TR[e1] = {<x1, …, xn> | p} and TR[e2] = {<x1, …, xn> | q}

Then TR[e1 − e2] = {<x1, …, xn> | p ∧ ¬q}

Product:

If TR[e1] = {<x1, …, xn> | p} and TR[e2] = {<y1, …, ym> | q}

Then TR[e1 × e2] = {<x1, …, xn, y1, …, ym> | p ∧ q}

Example: TR[R × S] = {<x1, …, xn, y1, …, ym> | <x1, …, xn> ∈ R ∧ <y1, …, ym> ∈ S}
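The translation rules for relation names, selection, projection, union, difference, and product can be sketched as a small recursive function. This is an illustration under assumptions, not the lecture's own code: RA expressions are encoded as hypothetical nested tuples, selection conditions are strings that use #i for the i-th attribute, and projection indexes are assumed to be distinct. Instead of translating and then relabeling, the sketch passes the head variables down, so union and difference share variable names by construction; projected results are renumbered x1, x2, … and bound variables are named z1, z2, ….

```python
import itertools
import re

def arity(e):
    """Number of columns produced by RA expression e."""
    kind = e[0]
    if kind == "rel":                # ("rel", name, arity)
        return e[2]
    if kind == "select":             # ("select", condition, e1)
        return arity(e[2])
    if kind == "project":            # ("project", [i1, ..., im], e1), 1-based
        return len(e[1])
    if kind in ("union", "diff"):    # ("union", e1, e2) / ("diff", e1, e2)
        return arity(e[1])
    if kind == "product":            # ("product", e1, e2)
        return arity(e[1]) + arity(e[2])
    raise ValueError(kind)

def formula(e, xs, fresh):
    """DRC formula for e whose free variables are the names in xs."""
    kind = e[0]
    if kind == "rel":
        return f"<{', '.join(xs)}> ∈ {e[1]}"
    if kind == "select":
        # θ': replace each attribute reference #i by the variable xs[i-1]
        theta = re.sub(r"#(\d+)", lambda m: xs[int(m.group(1)) - 1], e[1])
        return f"({formula(e[2], xs, fresh)} ∧ {theta})"
    if kind == "project":
        inner = [None] * arity(e[2])
        for x, i in zip(xs, e[1]):   # kept columns reuse the head variables
            inner[i - 1] = x
        bound = []
        for j, v in enumerate(inner):
            if v is None:            # dropped columns get bound variables
                inner[j] = f"z{next(fresh)}"
                bound.append(inner[j])
        body = formula(e[2], inner, fresh)
        return f"∃ {', '.join(bound)}. {body}" if bound else body
    if kind == "union":
        return f"({formula(e[1], xs, fresh)} ∨ {formula(e[2], xs, fresh)})"
    if kind == "diff":
        return f"({formula(e[1], xs, fresh)} ∧ ¬({formula(e[2], xs, fresh)}))"
    if kind == "product":
        n = arity(e[1])
        return f"({formula(e[1], xs[:n], fresh)} ∧ {formula(e[2], xs[n:], fresh)})"
    raise ValueError(kind)

def TR(e):
    """Translate RA expression e into a DRC query string."""
    xs = [f"x{i}" for i in range(1, arity(e) + 1)]
    fresh = itertools.count(1)       # numbering for bound variables z1, z2, ...
    return f"{{<{', '.join(xs)}> | {formula(e, xs, fresh)}}}"

R = ("rel", "R", 4)
print(TR(("select", "#1=#2 ∧ #4>2.5", R)))
print(TR(("project", [1, 3], R)))
```

The two printed queries correspond to the selection and projection examples above, up to variable naming.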


Relational Algebra vs. Calculus

Can translate relational algebra into relational calculus

Given syntactic restrictions that guarantee safety of calculus query, can translate back to relational algebra

These are the principles behind the initial development of relational databases

SQL is close to calculus; query plan is close to algebra

But SQL can do other things (recursion, aggregation) that RA/RC can’t

Great example of theory leading to practice!


Basic SQL: A Friendly Face Over the Tuple Relational Calculus

SELECT [DISTINCT] {T1.attrib, …, T2.attrib}    (select-list)
FROM {relation} T1, {relation} T2, …            (from-list)
WHERE {predicates}                              (qualification)

Let’s do some examples, which will leverage your knowledge of the relational calculus…
  Faculty ids
  Course IDs for courses with students expecting a “C”
  Courses taken by Jill


Our Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0105
1    A          700-1005
3    C          501-0105

COURSE
cid       subj  sem
550-0105  DB    F05
700-1005  AI    S05
501-0105  Arch  F05

PROFESSOR
fid  name
1    Ives
2    Saul
8    Martin

Teaches
fid  cid
1    550-0105
2    700-1005
8    501-0105
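The three basic-SQL examples (faculty ids, course IDs with a student expecting a “C”, courses taken by Jill) can be tried against this instance. The sketch below is a possible solution, not the one given in class. It assumes an in-memory SQLite database in place of the course's Oracle setup, renames exp-grade to exp_grade so it is a legal column name, and sorts rows in Python only to make the display stable.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (an assumption of this sketch).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE PROFESSOR(fid INTEGER, name TEXT);
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);  -- exp-grade, renamed
CREATE TABLE Teaches(fid INTEGER, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO PROFESSOR VALUES (1,'Ives'), (2,'Saul'), (8,'Martin');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
INSERT INTO Teaches VALUES (1,'550-0105'), (2,'700-1005'), (8,'501-0105');
""")

def rows(sql):
    """Run a query and return its rows sorted, for stable display."""
    return sorted(db.execute(sql).fetchall())

# Faculty ids
faculty_ids = rows("SELECT P.fid FROM PROFESSOR P")

# Course IDs for courses with students expecting a "C"
c_courses = rows("SELECT DISTINCT T.cid FROM Takes T WHERE T.exp_grade = 'C'")

# Courses taken by Jill: one range variable per relation, joined in WHERE
jill_courses = rows("""
    SELECT C.cid, C.subj
    FROM STUDENT S, Takes T, COURSE C
    WHERE S.name = 'Jill' AND S.sid = T.sid AND T.cid = C.cid
""")

print(faculty_ids, c_courses, jill_courses)
```

Each range variable in the from-list corresponds to a tuple variable in the calculus, and the qualification is the predicate.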


Some Nice Features

SELECT *
  All STUDENTs

AS
  As a “range variable” (tuple variable): optional
  As an attribute rename operator

Example: Which students (names) have taken more than one course from the same professor?
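A possible answer to this example uses AS in both roles: as a range variable, so the same relation can be joined with itself, and as an attribute rename. The sketch assumes SQLite, the exp_grade column name, and two hypothetical extra rows (course 550-0106, not in the slides) so that the result is not empty.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
CREATE TABLE Teaches(fid INTEGER, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
INSERT INTO Teaches VALUES (1,'550-0105'), (2,'700-1005'), (8,'501-0105');
""")

# Two range variables over Takes and two over Teaches: the same student,
# two different courses, one and the same professor.
REPEAT_STUDENTS = """
SELECT DISTINCT S.name AS student
FROM STUDENT AS S, Takes AS T1, Takes AS T2, Teaches AS P1, Teaches AS P2
WHERE S.sid = T1.sid AND S.sid = T2.sid
  AND T1.cid <> T2.cid
  AND T1.cid = P1.cid AND T2.cid = P2.cid
  AND P1.fid = P2.fid
"""

on_slide_data = db.execute(REPEAT_STUDENTS).fetchall()

# Hypothetical rows, not in the slides: Jill takes a second course taught by fid 1.
db.execute("INSERT INTO Takes VALUES (1, 'B', '550-0106')")
db.execute("INSERT INTO Teaches VALUES (1, '550-0106')")
with_extra_rows = db.execute(REPEAT_STUDENTS).fetchall()

print(on_slide_data, with_extra_rows)
```

On the slide data alone the result is empty, because Jill's two courses have different professors.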


Expressions in SQL

Can do computation over scalars (int, real or string) in the select-list or the qualification
  Show all student IDs decremented by 1

Strings:
  Fixed (CHAR(x)) or variable length (VARCHAR(x))
  Use single quotes: 'A string'
  Special comparison operator: LIKE
  Not equal: <>

Typecasting:
  CAST(S.sid AS VARCHAR(255))
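The features listed here can be exercised on the STUDENT table. This is a sketch assuming SQLite, where VARCHAR(255) is treated as a text type; other systems may format the cast result differently.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
""")

def rows(sql):
    """Run a query and return its rows sorted, for stable display."""
    return sorted(db.execute(sql).fetchall())

# Arithmetic in the select-list: all student IDs decremented by 1
decremented = rows("SELECT S.sid - 1 FROM STUDENT S")

# String literal in single quotes, pattern match with LIKE (% matches any suffix)
j_names = rows("SELECT S.name FROM STUDENT S WHERE S.name LIKE 'J%'")

# Not equal is written <>
not_jill = rows("SELECT S.name FROM STUDENT S WHERE S.name <> 'Jill'")

# Typecasting an integer ID to a string
as_text = rows("SELECT CAST(S.sid AS VARCHAR(255)) FROM STUDENT S")

print(decremented, j_names, not_jill, as_text)
```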


Set Operations

Set operations default to set semantics, not bag semantics:

(SELECT … FROM … WHERE …)
{op}
(SELECT … FROM … WHERE …)

Where op is one of:
  UNION
  INTERSECT, MINUS/EXCEPT (many DBs don’t support these last ones!)

Bag semantics: ALL


Exercise

Find all students who have taken DB but not AI

Hint: use EXCEPT
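A possible solution to the exercise, together with a comparison of UNION and UNION ALL, is sketched below. It assumes SQLite, which spells the operator EXCEPT (not MINUS) and does not accept parentheses around the operands of a set operation. The helper `took` and the extra row for Qun are hypothetical; the row is added so that the difference is not empty.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
-- Hypothetical row, not in the slides: Qun also takes DB.
INSERT INTO Takes VALUES (2,'B','550-0105');
""")

def took(subj):
    """SELECT block for students who took a course in subj (fixed strings only)."""
    return (
        "SELECT S.sid, S.name FROM STUDENT S, Takes T, COURSE C "
        f"WHERE S.sid = T.sid AND T.cid = C.cid AND C.subj = '{subj}'"
    )

def rows(sql):
    """Run a query and return its rows sorted, for stable display."""
    return sorted(db.execute(sql).fetchall())

union = rows(took("DB") + " UNION " + took("AI"))          # set semantics
union_all = rows(took("DB") + " UNION ALL " + took("AI"))  # bag semantics
intersect = rows(took("DB") + " INTERSECT " + took("AI"))
db_not_ai = rows(took("DB") + " EXCEPT " + took("AI"))     # the exercise

print(union, union_all, intersect, db_not_ai)
```

Jill appears once under UNION and twice under UNION ALL, since she took both DB and AI.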


Nested Queries in SQL

Simplest: IN/NOT IN

Example: Students who have taken subjects that have (at any point) been taught by Martin
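One way to write this example with IN is sketched below, assuming SQLite and the lecture's example instance. The inner query collects the subjects Martin has taught; the outer query keeps students who took any course in one of those subjects.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE PROFESSOR(fid INTEGER, name TEXT);
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
CREATE TABLE Teaches(fid INTEGER, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO PROFESSOR VALUES (1,'Ives'), (2,'Saul'), (8,'Martin');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
INSERT INTO Teaches VALUES (1,'550-0105'), (2,'700-1005'), (8,'501-0105');
""")

martin_students = sorted(db.execute("""
    SELECT DISTINCT S.name
    FROM STUDENT S, Takes T, COURSE C
    WHERE S.sid = T.sid AND T.cid = C.cid
      AND C.subj IN (SELECT C2.subj
                     FROM COURSE C2, Teaches TE, PROFESSOR P
                     WHERE C2.cid = TE.cid AND TE.fid = P.fid
                       AND P.name = 'Martin')
""").fetchall())

print(martin_students)
```

The subquery does not mention any outer range variable, so it can be evaluated once; that is what distinguishes it from a correlated subquery.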


Correlated Subqueries

Most common: EXISTS/NOT EXISTS

Find all students who have taken DB but not AI
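A possible formulation with EXISTS and NOT EXISTS is sketched below, assuming SQLite. Each subquery refers to the outer range variable S, which is what makes it correlated. The extra row for Qun is hypothetical and is added only after the query has been run once on the slide data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
""")

# S.sid inside each subquery ties it to the current outer tuple.
DB_NOT_AI = """
SELECT S.name
FROM STUDENT S
WHERE EXISTS (SELECT * FROM Takes T, COURSE C
              WHERE T.sid = S.sid AND T.cid = C.cid AND C.subj = 'DB')
  AND NOT EXISTS (SELECT * FROM Takes T, COURSE C
                  WHERE T.sid = S.sid AND T.cid = C.cid AND C.subj = 'AI')
"""

on_slide_data = db.execute(DB_NOT_AI).fetchall()

# Hypothetical row, not in the slides: Qun takes DB.
db.execute("INSERT INTO Takes VALUES (2, 'B', '550-0105')")
with_extra_row = db.execute(DB_NOT_AI).fetchall()

print(on_slide_data, with_extra_row)
```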


Universal and Existential Quantification

Generally used with subqueries: {op} ANY, {op} ALL

Find the students with the best expected grades
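The sketch below shows one way to express this query. It rests on several assumptions: letter grades compare as strings, so 'A' is the smallest and therefore the best; no grade is NULL; and SQLite is the engine. SQLite does not implement ANY/ALL, so the ALL form is kept only as a string for reference, and the query actually run is the NOT EXISTS rewriting of the same universal quantifier.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
""")

# Quantified form, for reference only (not executed: SQLite has no ANY/ALL).
WITH_ALL = """
SELECT DISTINCT S.name
FROM STUDENT S, Takes T
WHERE S.sid = T.sid
  AND T.exp_grade <= ALL (SELECT T2.exp_grade FROM Takes T2)
"""

# "For all T2: T.grade <= T2.grade" is the same as
# "there is no T2 with T2.grade < T.grade" (assuming no NULL grades).
WITHOUT_ALL = """
SELECT DISTINCT S.name
FROM STUDENT S, Takes T
WHERE S.sid = T.sid
  AND NOT EXISTS (SELECT * FROM Takes T2 WHERE T2.exp_grade < T.exp_grade)
"""

best = db.execute(WITHOUT_ALL).fetchall()
print(best)
```

The rewriting mirrors the calculus identity ∀x. p ≡ ¬∃x. ¬p.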


Table Expressions

Can substitute a subquery for any relation in the FROM clause:

SELECT S.sid
FROM (SELECT sid FROM STUDENT WHERE sid = 5) S
WHERE S.sid = 4

Notice that we can actually simplify this query!

What is this equivalent to?


Aggregation

GROUP BY

SELECT {group-attribs}, {aggregate-operator}(attrib)
FROM {relation} T1, {relation} T2, …
WHERE {predicates}
GROUP BY {group-list}

Aggregate operators
  AVG, COUNT, SUM, MAX, MIN
  DISTINCT keyword for AVG, COUNT, SUM


Some Examples

Number of students in each course offering

Number of different grades expected for each course offering

Number of (distinct) students taking AI courses
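Possible queries for these three examples are sketched below, assuming SQLite and the lecture's example instance with exp-grade renamed to exp_grade. A course offering is identified by its cid.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
""")

def rows(sql):
    """Run a query and return its rows sorted, for stable display."""
    return sorted(db.execute(sql).fetchall())

# Number of students in each course offering: one group per cid
student_counts = rows("SELECT T.cid, COUNT(T.sid) FROM Takes T GROUP BY T.cid")

# Number of different grades expected for each course offering
grade_counts = rows(
    "SELECT T.cid, COUNT(DISTINCT T.exp_grade) FROM Takes T GROUP BY T.cid"
)

# Number of (distinct) students taking AI courses: no GROUP BY, a single group
ai_students = rows("""
    SELECT COUNT(DISTINCT T.sid)
    FROM Takes T, COURSE C
    WHERE T.cid = C.cid AND C.subj = 'AI'
""")

print(student_counts, grade_counts, ai_students)
```

Offerings with no enrolled students would not appear in the first two results, since they have no rows in Takes.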


What If You Want to Only Show Some Groups?

The HAVING clause lets you do a selection based on an aggregate (there must be 1 value per group):

SELECT C.subj, COUNT(S.sid)
FROM STUDENT S, Takes T, COURSE C
WHERE S.sid = T.sid AND T.cid = C.cid
GROUP BY subj
HAVING COUNT(S.sid) > 5

Exercise: For each subject taught by at least two professors, list the minimum expected grade
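One possible reading of this exercise is sketched below, assuming SQLite. MIN is applied to the grade letters as strings, so it returns the alphabetically smallest letter; whether that is the intended meaning of "minimum expected grade" is left open. The second DB offering (550-0106) and its rows are hypothetical, added so that some subject has two professors.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
CREATE TABLE Teaches(fid INTEGER, cid TEXT);
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
INSERT INTO Teaches VALUES (1,'550-0105'), (2,'700-1005'), (8,'501-0105');
""")

# HAVING filters whole groups; COUNT(DISTINCT ...) ignores the duplicates
# that the join of Teaches and Takes produces.
MIN_GRADE = """
SELECT C.subj, MIN(T.exp_grade)
FROM COURSE C, Teaches TE, Takes T
WHERE C.cid = TE.cid AND C.cid = T.cid
GROUP BY C.subj
HAVING COUNT(DISTINCT TE.fid) >= 2
"""

on_slide_data = db.execute(MIN_GRADE).fetchall()

# Hypothetical rows, not in the slides: a second DB offering taught by fid 2.
db.executescript("""
INSERT INTO COURSE VALUES ('550-0106','DB','S06');
INSERT INTO Teaches VALUES (2,'550-0106');
INSERT INTO Takes VALUES (3,'B','550-0106');
""")
with_extra_rows = db.execute(MIN_GRADE).fetchall()

print(on_slide_data, with_extra_rows)
```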


Aggregation and Table Expressions

Sometimes need to compute results over the results of a previous aggregation:

SELECT subj, AVG(size)
FROM (SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
      FROM STUDENT S, Takes T, COURSE C
      WHERE S.sid = T.sid AND T.cid = C.cid
      GROUP BY C.cid, C.subj)
GROUP BY subj
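The nested aggregation can be run against the example instance as a check. The sketch assumes SQLite and qualifies the inner grouping columns as C.cid and C.subj, because an unqualified cid would be ambiguous between Takes and COURSE.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE STUDENT(sid INTEGER, name TEXT);
CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
INSERT INTO STUDENT VALUES (1,'Jill'), (2,'Qun'), (3,'Nitin');
INSERT INTO COURSE VALUES ('550-0105','DB','F05'), ('700-1005','AI','S05'),
                          ('501-0105','Arch','F05');
INSERT INTO Takes VALUES (1,'A','550-0105'), (1,'A','700-1005'), (3,'C','501-0105');
""")

# Inner query: class size per course offering.
# Outer query: average class size per subject.
avg_sizes = sorted(db.execute("""
    SELECT subj, AVG(size)
    FROM (SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
          FROM STUDENT S, Takes T, COURSE C
          WHERE S.sid = T.sid AND T.cid = C.cid
          GROUP BY C.cid, C.subj)
    GROUP BY subj
""").fetchall())

print(avg_sizes)
```

Every offering in the example instance has exactly one student, so each subject's average is 1.0.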


Something to Ponder

Tables are great, but…
  Not everyone is uniform: I may have a cell phone but not a fax
  We may simply be missing certain information
  We may be unsure about values

How do we handle these things?