
Copyright © 2003-2008 Curt Hill

Queries in SQL: More options


Duplicates
• A select usually joins several tables, creating large unique tuples
• The temporary table has an unspecified key
• If the select removes portions of the key, then duplicates can occur
• Consider the query that links faculty to the students taking any of their classes

The query
SELECT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid

This produces 238 rows
What is the key?


The Key
• Does not need to be specified
• In this case it is the linking fields
  – f_naid (or ct_naid)
  – ct_dept
  – ct_number
  – s_id
• Since some of these fields will be removed by the Select, duplicates occur


Removing duplicates
• In this query duplicates occur when a student takes multiple classes from the teacher
• The result is not a set (which eliminates duplicates) but a multi-set (which allows duplicates)
• Placing the reserved word DISTINCT immediately after the Select removes these
• The new query follows:

Revised query
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid

This produces 213 rows
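
The effect of DISTINCT can be reproduced on a toy database. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS. The table and column names follow the slides, but the column types and every data value are hypothetical, so the row counts differ from the 238 and 213 reported above.

```python
import sqlite3

# Hypothetical miniature of the course database; only the names come from the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Hill'), (2, 'Smith');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 161), (2, 'MATH', 101);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 161),
                                (11, 'CS', 160), (11, 'MATH', 101);
""")

# The join shared by both queries, exactly as in the slides.
JOIN = """
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
"""

plain_rows = con.execute("SELECT f_name, s_name" + JOIN).fetchall()
distinct_rows = con.execute("SELECT DISTINCT f_name, s_name" + JOIN).fetchall()

print(len(plain_rows), len(distinct_rows))
```

In this made-up data Ann takes two classes from Hill, so the pair (Hill, Ann) appears twice in the plain result and once in the DISTINCT result.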


How does this work?
• Removing duplicates is not trivial
• There are several ways, but all require work
• One possibility is to sort the tuples
  – Duplicates then must be adjacent
• Another is to hash them
  – Duplicates have the same hash key
• Small queries could be done in memory, larger ones cannot
• We will consider sorting and hashing later
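
The two strategies can be sketched in a few lines. This is an in-memory illustration only, not how any particular DBMS implements DISTINCT; the function names and sample rows are hypothetical.

```python
from itertools import groupby

def dedup_by_sorting(rows):
    """Sort so duplicates become adjacent, then keep one row per run of equals."""
    return [row for row, _run in groupby(sorted(rows))]

def dedup_by_hashing(rows):
    """Remember each row in a hash set; emit a row only the first time it is seen."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:      # equal rows hash to the same value
            seen.add(row)
            unique.append(row)
    return unique

# Hypothetical (f_name, s_name) tuples containing one duplicate.
rows = [("Hill", "Ann"), ("Hill", "Bob"), ("Hill", "Ann"), ("Smith", "Bob")]

print(dedup_by_sorting(rows))
print(dedup_by_hashing(rows))
```

The sorting version returns rows in sorted order as a side effect, while the hashing version preserves first-seen order. Both need all distinct rows to fit in memory here, which is the limitation the slide notes for larger queries.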


Deception
• The difference between the two queries is just one keyword
• That keyword forces the DBMS to do substantial extra work
• It looks like no big deal but actually is
• Hence the query is deceptively different
• However, make the database do its job


All
• The opposite of Distinct is All
• Specifies that duplicates should not be eliminated
• Since elimination is expensive, it is usually not done
  – Thus All gives the same result whether present or absent
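
A quick check that All changes nothing, sketched with Python's built-in sqlite3 module. The grades rows are hypothetical, and the helper name depts is introduced only for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE grades (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO grades VALUES (10, 'CS', 160), (10, 'CS', 161),
                              (11, 'CS', 160), (11, 'MATH', 101);
""")

def depts(keyword):
    """Run SELECT <keyword> g_dept and return the values, sorted for comparison."""
    rows = con.execute(f"SELECT {keyword} g_dept FROM grades").fetchall()
    return sorted(dept for (dept,) in rows)

default_rows = depts("")           # no keyword at all
all_rows = depts("ALL")            # explicit ALL
distinct_rows = depts("DISTINCT")  # duplicates removed

print(default_rows == all_rows, distinct_rows)
```

With and without ALL the query returns the same four values; only DISTINCT reduces them to two.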

Order
• The order of the output table is dependent on many unpredictable things
• Different DBMSs may give different orderings, even with the same data
  – Based on how they process the data
• The order of the above queries is different on Oracle and MySQL
• Worse yet, neither will put all the students of one faculty member together



Order by clause
• Order by follows the Where
• It specifies a sort order for the output
• May specify one or more fields
• Fields do not have to be displayed

Sorted query 1
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
ORDER BY f_name, s_name


Sorting
• The default behavior is to sort:
  – In a case-sensitive way
  – In ascending order (lowest to highest)
• Usually we sort on the display values
  – Oracle only allows this
  – SQL Server and MySQL allow sorts on other fields


Sorted query 2
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
ORDER BY f_naid, s_id


Sort Order
• The default is to sort in ascending order for all sort keys
• The key may be followed by ASC or DESC
• ASC makes ascending order
• DESC is descending order
• These keywords may not be spelled out in full
• If left out, ASC is the default


Sorted query 3
SELECT DISTINCT f_name, s_name
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
ORDER BY f_name DESC, s_name ASC
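
Sorted query 3 can be tried on a toy database. The sketch below assumes Python's built-in sqlite3 module; the names follow the slides, but the column types and all rows are hypothetical.

```python
import sqlite3

# Hypothetical miniature of the course database; only the names come from the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Hill'), (2, 'Smith');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 161), (2, 'MATH', 101);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 161),
                                (11, 'CS', 160), (11, 'MATH', 101);
""")

# Faculty names descending, student names ascending within each faculty member.
sorted_rows = con.execute("""
    SELECT DISTINCT f_name, s_name
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    ORDER BY f_name DESC, s_name ASC
""").fetchall()

for row in sorted_rows:
    print(row)
```

Smith sorts before Hill because of DESC, while Hill's students stay in ascending order (Ann, then Bob).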


Aggregate operations
• We can collapse several rows into one
• This produces a summary report
• Several rows of a table become one row of output
• This requires the Group By clause with Aggregate functions
• The Group By follows the Where
• Aggregate functions are in the Select


Group By and Aggregate functions
• Each of these Aggregate functions specifies a field:
  – Count
  – Avg
  – Sum
  – Max
  – Min
• Usually used with Group by but not always
• Group by follows Where
• Specifies the groups as changes in fields

Grouped Query 1
SELECT f_name, count(s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name

This produces 16 rows



Commentary
• Group by forces a sort
• This is only a means to ensure that the items are together
• The DISTINCT keyword may be used within aggregate functions:
  – Count
  – Avg
  – Sum

Grouped Query 2
SELECT f_name, count(DISTINCT s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name

This produces 16 rows but different counts
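
The difference between the two grouped queries shows up even on a tiny data set. This sketch assumes Python's built-in sqlite3 module; the names follow the slides, but the column types and rows are hypothetical, and the ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# Hypothetical miniature of the course database; only the names come from the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Hill'), (2, 'Smith');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 161), (2, 'MATH', 101);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 161),
                                (11, 'CS', 160), (11, 'MATH', 101);
""")

# Both counts side by side: enrollments versus distinct student names.
grouped_rows = con.execute("""
    SELECT f_name, count(s_name), count(DISTINCT s_name)
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    GROUP BY f_name
    ORDER BY f_name
""").fetchall()

for row in grouped_rows:
    print(row)
```

Both forms yield one row per faculty member, but Hill's plain count is 3 (three enrollments) while the DISTINCT count is 2 (two different student names).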



Secondary Selection
• The Where does an initial selection
  – It eliminates numerous combinations of tuples of no interest
• We may also wish to remove aggregated rows
• This must occur after the Where but before the final table
• This is done with the HAVING clause of the GROUP BY

Having
• The Having clause follows the Group By fields
• It gives a selection criterion for rows
• Usually based upon the aggregate functions
• Form:
  Having comparison
• See following


Grouped Query 3
SELECT f_name, count(DISTINCT s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name
HAVING count(*) > 10


Commentary
• This produces 9 rows
• Notice the * is the parameter of count
• Other Aggregate functions could be used as well
• A Having without a Group By is like a Where
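
A Having filter can be tried on a toy database as follows. The sketch assumes Python's built-in sqlite3 module; the names follow the slides, while the column types, the rows, and the threshold of 1 (instead of 10) are hypothetical choices that suit the tiny data set.

```python
import sqlite3

# Hypothetical miniature of the course database; only the names come from the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Hill'), (2, 'Smith');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 161), (2, 'MATH', 101);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 161),
                                (11, 'CS', 160), (11, 'MATH', 101);
""")

# Where filters joined tuples first; Having then filters the grouped rows.
having_rows = con.execute("""
    SELECT f_name, count(DISTINCT s_name)
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    GROUP BY f_name
    HAVING count(*) > 1
""").fetchall()

print(having_rows)
```

Smith's group contains a single joined row, so it fails the Having test and only Hill's row survives.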


Ungrouped Query
• Suppose we just want a count or sum
• Then we can use an aggregate function without Group By
• This will generally collapse the entire table into a single row
• Consider the next screen


Aggregates
• Counting rows:
  Select count(*) from faculty
  – Results in one row with a count of 19
• Sum of student balances:
  Select sum(s_balance) from students
  – Results in one row with the sum: 93240.34
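
Both ungrouped aggregates can be tried with a short script. This is a sketch assuming Python's built-in sqlite3 module; the table and column names follow the slides, but the column types and rows are hypothetical, so the results differ from the 19 and 93240.34 above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE students (s_id INTEGER, s_name TEXT, s_balance REAL);
    INSERT INTO faculty  VALUES (1, 'Hill'), (2, 'Smith');
    INSERT INTO students VALUES (10, 'Ann', 100.0), (11, 'Bob', 50.5);
""")

# Without Group By, each aggregate collapses the whole table into one row.
count_result = con.execute("SELECT count(*) FROM faculty").fetchall()
sum_result = con.execute("SELECT sum(s_balance) FROM students").fetchall()

faculty_count = count_result[0][0]
balance_total = sum_result[0][0]
print(faculty_count, balance_total)
```

Each query returns exactly one row with one column, regardless of how many rows the table holds.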


Variations
• Recall this query
  SELECT f_name, count(DISTINCT s_name)
  …
  GROUP BY f_name
• Suppose f_naid were included in the Select
  SELECT f_name, f_naid, …
• In Oracle and SQL Server it would also have to be part of the Group By
  – But not in MySQL


Bad Oracle Query
SELECT f_name, f_naid, count(DISTINCT s_name)
FROM faculty, c_teach, students, grades
WHERE f_naid = ct_naid
  AND ct_dept = g_dept
  AND ct_number = g_course
  AND s_id = g_naid
GROUP BY f_name
– Receives an error: ORA-00979: not a GROUP BY expression
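
One way to make the query acceptable everywhere is to list every non-aggregated Select field in the Group By. The sketch below assumes Python's built-in sqlite3 module purely as a convenient in-memory engine; it does not reproduce the Oracle error, and the column types and rows are hypothetical. The ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# Hypothetical miniature of the course database; only the names come from the slides.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE faculty  (f_naid INTEGER, f_name TEXT);
    CREATE TABLE c_teach  (ct_naid INTEGER, ct_dept TEXT, ct_number INTEGER);
    CREATE TABLE students (s_id INTEGER, s_name TEXT);
    CREATE TABLE grades   (g_naid INTEGER, g_dept TEXT, g_course INTEGER);
    INSERT INTO faculty  VALUES (1, 'Hill'), (2, 'Smith');
    INSERT INTO c_teach  VALUES (1, 'CS', 160), (1, 'CS', 161), (2, 'MATH', 101);
    INSERT INTO students VALUES (10, 'Ann'), (11, 'Bob');
    INSERT INTO grades   VALUES (10, 'CS', 160), (10, 'CS', 161),
                                (11, 'CS', 160), (11, 'MATH', 101);
""")

# f_naid now appears in both the Select and the Group By.
fixed_rows = con.execute("""
    SELECT f_name, f_naid, count(DISTINCT s_name)
    FROM faculty, c_teach, students, grades
    WHERE f_naid = ct_naid AND ct_dept = g_dept
      AND ct_number = g_course AND s_id = g_naid
    GROUP BY f_name, f_naid
    ORDER BY f_name
""").fetchall()

for row in fixed_rows:
    print(row)
```

Because f_naid identifies a faculty member, adding it to the Group By does not change the groups in this data; it only satisfies the rule that every displayed field is either grouped or aggregated.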

