introductory databases 2016-17 - university of birminghamrjh/courses/introtodbs/2016-17... · 3...

Introductory Databases 2016-17

SSC Introduction to DataBases 1

Introductory Databases

[email protected] Office: 236

Classes

Lectures:

Tuesday 3:00 Arts Main Lecture Theatre

Friday 2:00 G15 Muirhead

Assessment

Assessment:

80% Examination

20% coursework

Class test

Java Exercise

Provisional Exercise Timetable

Exercise Title Date Weight

1 SQL Class test Week 10%

2 Java & DB End week 10%

Exercise 1 is an unseen class test

Exercise 2 is assessed by viva

All work must be submitted through Canvas

Introductory Databases

Textbooks

Databases: Ramakrishnan & Gehrke, Data Base Management Systems

Java & DBs

Horstman & Cornell, Core Java, Vol 2

Horstman, Big Java

[Note: not all of these books!]

mailto:[email protected]



DataBase Systems

In the School: PostgreSql:

See: http://supportweb.cs.bham.ac.uk/documentation/postgres

A ‘proper’ DBMS which is freely available

On your own machines:

Postgres can be installed or you can use your favourite system … but be careful with non-standard features!

Or you can use ssh or vpn to connect remotely

Coursework must be demonstrated in the lab.!!!!! …. So it must work with Postgres

[Also: note that some implementations include non-standard features – so take care!]

Course Content

An Introduction to design and use of Database systems

Background, alternatives and justification of DBMS

Relational Databases: Relational model

Introduction to SQL creating & manipulating DBs

Introduction to Transactions and concurrency

Database Design – ER diagrams and mapping to DB

Java & SQL – using a DB through JDBC

[Focus on using DBMS]

Database Management Systems - Background

How can we store (and share) data?

Store? Share?

How can we store (and communicate) data?

Store Share

In memory:

Persistently?

Java objects

Flat/Structured data files:

XML

Interchange formats

….

Shared memory

Share files locally

Share files remotely

Document Repository

Data Server –

fetch data as required

……

All of these have their applications:

Small, volatile datasets

Documents, data files

Diaries, address books ….

Email

Media libraries

…..

Typically: Small

Limited shared access

Lowmedium data security requirements

Lowmedium consistency

Lowmedium availability requirements

…

Light weight solutions are fast, adequate, cheap, agile ….

http://supportweb.cs.bham.ac.uk/documentation/postgres



… but what if we need:

Integration with existing (DBMS-based) systems?

Add new functions

Add web clients

Add new functionality eg eCommerce

….

Very large amounts of data

High availability

High levels of data integrity

Concurrent access from a large number of users

High performance

Lecture 2

What is a DataBase?

What properties should they have?

What models are there?

Relational Databases

How does this fit into the bigger picture?

A bad example

So – what is a DBMS?

Core data base Representation of

data

Retrieval and maintenance of data

Collection of tools Design and build and

maintenance

E.g. standard applications

Provides: An abstraction from

implementation

Efficient implementation

Integrity, fault tolerance, security …

Standard (and bespoke) interfaces For instance SQL for

querying DB

DB models

There are different models available:

Network/graph model

Hierarchical model

Relational model

… and many more

Relational databases are the pre-eminent choice for general purpose data base systems:

Especially for commercial/ enterprise systems

Consistency, reliability etc. are critical

DB models

We don’t always need all the benefits of a relational database Consistency, integrity,

reliability, fault tolerance etc.

We may not even write to or update our database

We may need other things: Performance:

Fast

Low power consumption

Scaleability Very large number of users

Flexibility

So other technologies may be better

Data-warehousing and data-mining

Geographical systems

Network infrastructure

Mobile devices/IoT

Search engines

Financial trading

……

But: These often look like proper

RDBMS (on the surface) SQL

Databases & Desiderata

An organised collection of data

For a purpose

To facilitate some set of activities

Organised so that it can be accessed and maintained efficiently

Desirable properties Data independence

From internal or physical representation

Minimise redundancy Store data once

Maximise consistency One underlying

representation

Enable integration and sharing

Facilitate change

Logical organisation [of course these apply to most things!]



ACID

Necessary properties:

Atomicity – a transaction happens as a whole (or not at all)

Consistency – a DB is always in a consistent state (according to the rules defined)

Isolation – the effect of two operations happening in parallel is the same as if they had happened sequentially

Durability – once something is stored it won’t disappear

All this, even if the plug is pulled!

ANSI/SPARC DB Architecture

View or external data model

A view onto (a subset of) conceptual model

Conceptual data model

An abstract model independent of implementation

Physical (internal) data model

Sales Accounts Manufacture Views

Conceptual model

Physical model

DB Applications

Users are usually shielded from the underlying DB:

Application programs

Web interfaces

For convenience and

security …

DB program

Web server & servlets

Browser

Web Browser Web Server HTTP

Requests

resources

DataBase

Server

JDBC

?

Firewall

Actually more complex than this, but aim is to restrict access to critical components

Mechanisms are general – not just http and browsers


Requests

resources

DataBase

Server

JDBC

?

Firewall

Actually more complex than this, but

aim is to restrict access to critical components

Mechanisms are general – not just

http and browsers

Relational Databases

Main model for data base systems

Data in form of sets and relations

Data connected using basic set theory

Selecting, combining etc.

Normally viewed as tables

Queries specify result not how it is computed

declarative

Example – just for illustration!

Student sid dob login course

1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs

5 8/7/1989 edf Se Marks sid mid mark

1 277 23

2 277 65

1 264 78

2 266 52

3 277 99

Data Definition Language

CREATE TABLE Student (

sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

)

Query Language

SELECT sid, login

FROM Student

WHERE course=‘Se’

Tuple

(or row or record)

Column

(or field or attribute)

Table name Field or attribute names

Defines DataBase

Manipulates Data





1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs


1 277 23

2 277 65

1 264 78

2 266 52

3 277 99



sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

)

Query Language

SELECT sid, login

FROM Student


Tuple

(or row or record)

Column



What’s bad about this?



1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs


1 277 23

2 277 65

1 264 78

2 266 52

3 277 99



sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

)

Query Language

SELECT sid, login

FROM Student


Tuple

(or row or record)

Column






1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs


1 277 23

2 277 65

1 264 78

2 266 52

3 277 99



sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

)

Query Language

SELECT sid, login

FROM Student


Tuple

(or row or record)

Column




Actually, we need a much more sophisticated model

than this!

Lecture 3

Some basics

To help you read the texts

SQL

A start ….

A digression - Sets

Sets are collections of items Usually we are interested in sets where the items

are of the same type Numbers, dates etc.

Sets are unordered [Actually, here, we use ordered sets]

Sets do not contain duplicates

Written as: {2, 4, 6, 8, 10}

{2n | 1<= n <=5}

Sets (contd.)

The items in a set are its elements 3 ∊ {3, 5, 7, 9, 11}

4 ∉ {3, 5, 7, 9, 11}

If all the elements of a set are also elements of a second set then the first set is a subset of the second (and the second is a superset of the first. {3, 5} ⊆ {3, 5, 7, 9}

If the second set has extra elements then the first is a proper subset of the second.

{3,5} ⊂ {3, 5, 7, 9}



Sets (contd)

Union – the set made up of all elements of the two sets: {1,2}∪{2,3}={1,2,3}

Intersection – The set whose elements are in both operand sets: {1,2}∩{2,3}={2}

Difference – the sets

whose elements are in the first but not the second:

{1,2}-{2,3}={1}

The empty set

{}

Why is this important?

You must be clear about the logic you are trying to express in a query

Then you can map it to an actual query

E.g:

Give the phone number of everyone who has not made a call to London:

????????

Relational Definitions (to help follow texts)

Domain – D

An arbitrary (non-empty) set of atomic values

Attribute name – A

A symbol with an associated domain –

dom(A)

Relational Schema – R

A finite set of attribute names

Tuple – t of a Relational Schema R A mapping from the

attributes A of a relational schema R to the union of their domains dom(A) st t(A) ∈ dom(A)

Relation – r – of a relational schema R A finite set of tuples of

Relational Schema R

… and more

Degree (Arity) of a relation schema

Number of attributes

Cardinality of a relation

Number of tuples

That’s the formal definition – the next side should make it clear what this means!

Example - revisited


1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs

5 8/7/1989 edf Se

Tuple

(or row or record)

Column



Relation

Arity = 4

Cardinality = 5

Relational Schema

Keys

Superkey – a set of attributes that can always be used to differentiate one tuple from another (within a relation)

Key – a minimal superkey

Concatenated key – a key with more than one attribute

Candidate key – any key

Primary key – one of the candidate keys

Foreign key – an attribute of the relation which is a key for another relation



SQL

SQL provides the mechanism that is normally used to manipulate a (relational) data base.

There are several components to it, but we will only look at:

Data Definition Language – used to create, modify

and delete parts of the definition of the data base

Data Manipulation Language (DML) used to manipulate the data

SQL

Conventions

Case and whitespace are not significant

… but there are conventions that you

should follow

SQL keywords in upper case

Table names capitalised

Field names lower case

Layout

DDL - CREATE

Creates a table

Name

Fields – name and

type

Constraints on values

… CREATE TABLE Student (

sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

)


1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs

5 8/7/1989 edf Se

DDL - CREATE

Constraints on the data can (should) be included NOT NULL

UNIQUE

One field or across several

It’s usual to define constraints when you define the table, but:

They can be added later

[Sometimes this is necessary]


sid INTEGER NOT NULL UNIQUE,

dob CHAR(10),

login CHAR(20) UNIQUE,

course CHAR(10)

)

DDL – DROP&ALTER

DROP deletes the table DROP TABLE Student

Create, drop, alter can be applied to other things

ALTER modifies the table definition

Adding nulls to the table

ALTER TABLE Student

ADD COLUMN year_of_study INTEGER

See documentation for full details

DDL – DROP&ALTER

DROP deletes the table DROP TABLE Student

Create, drop, alter can be applied to other things

ALTER modifies the table definition

Adding nulls to the table

ALTER TABLE Student

ADD COLUMN year_of_study INTEGER

See documentation for full details

Remember data bases are persistent:

•They may be business critical

•So, we cannot rebuild them

•They may be ‘live’ for a very long time

•So: we ‘evolve’ them ‘on the fly’



Types

BOOLEAN – TRUE, FALSE or NULL!

CHAR(size) or CHARACTER(size)

Strings E.g. VARCHAR

INTEGER or INT Several variations

REAL, FLOAT or DOUBLE

NUMERIC or DECIMAL

DATE

TIME - A time of

day value.

… and many more

Lecture 4

More on Constraints

Constraints place restrictions on the values (or set of values) that can be inserted into the data base. They are hard constraints

The DBMS will check and enforce them

They are metadata used to: Make explicit domain constraints

An age must be >= 0

Maintain the integrity of the data base Not just locally to a table:

i.e. Only record marks for students who are registered

Constraints – a note

This is similar to the checks made in Java code to maintain consistency & correctness i.e. (trivially) in set

methods

But …

declarative - so no need to write code

Guaranteed to be enforced

But:

Not an excuse for laziness ….

Important to distinguish between:

Hard constraints

Every student has a unique sid

Desirable constraints

Every student has a login

Constraints

We have already seen some examples:

Domain e.g. INTEGER

NOT NULL

UNIQUE

We can also add

Restrictions on the range of values

Information about keys

These are especially important to ensure integrity across tables

Arbitrary checks

If constraints are violated then the operation fails

Keys

A primary key constraint PRIMARY KEY sid

Relations written: Student(sid, dob, login, course)

Enforces: UNIQUE

NOT NULL

Usually a single column but can be any key

Often (actually, usually) a synthetic UID

DBMS may be able to optimise

Note: signals importance to DB structure rather than just a data

constraint



Primary Key example


sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

PRIMARY KEY (sid)

)

Or


sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10),

CONSTRAINT StudentsKey PRIMARY KEY (sid)

)

If we name a constraint then we can manipulate it later:

•Remember, we don’t want to rebuild our database!

Keys

Foreign Key Defines a link to another

table Usually specifies the

primary key in that other table (default)

Any row inserted into this table must satisfy the constraint that: The foreign key in this

table must be matched by a key (usually the primary key) in the other table


sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

PRIMARY KEY (sid)

)

CREATE TABLE Marks (

sid INTEGER,

mid INTEGER,

mark INTEGER,

FOREIGN KEY (sid) REFERENCES Student(sid),

FOREIGN KEY (mid) REFERENCES Course(cid)

)

Keys

Foreign Key Defines a link to another

table Usually specifies the

primary key in that other table (default)

Any row inserted into this table must satisfy the constraint that: The foreign key in this

table must be matched by a key (usually the primary key) in the other table.


sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

PRIMARY KEY (sid)

)


sid INTEGER,

mid INTEGER,

mark INTEGER,


FOREIGN KEY (mid) REFERENCES Modules(mid)

)

This isn’t just checked on insertion:

•It is guaranteed to always be true!

So we can have multiple rows in Marks referencing a single row in Student

What happens if we modify the Student table (so that a referenced sid is removed? ON DELETE RESTRICT

ON DELETE NO ACTION

ON DELETE CASCADE

ON DELETE SET DEFAULT/NULL

Keys CREATE TABLE Student (

sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

PRIMARY KEY (sid)

)


sid INTEGER,

mid INTEGER,

mark INTEGER,



)

The integrity of our database is

broken if we have marks for students or courses that don’t

exist!

Keys


sid INTEGER,

cid INTEGER,

mark INTEGER,

FOREIGN KEY (sid) REFERENCES Student(sid)

ON DELETE CASCADE

ON UPDATE CASCADE,


)

So, the integrity of the DB is maintained:

We typically either:

Forbid the operation (RESTRICT or NO ACTION)

Delete rows that reference the deleted key (CASCADE)

NO ACTION is the default: so, we cannot delete/change cid in Courses (IFF it is referenced!)

Setting a default/NULL is rarely a good idea!

Of course, we probably want to be cleverer than this!

Keys


sid INTEGER,

mid INTEGER,

mark INTEGER,


ON DELETE CASCADE

ON UPDATE CASCADE,


)

So, the integrity of the DB is maintained:

We typically either:

Forbid the operation (RESTRICT or NO ACTION)

Delete rows that reference the deleted key (CASCADE)

NO ACTION is the default: so, we cannot delete/change cid in Courses

Setting a default/NULL is rarely a good idea!

Of course, we probably want to be cleverer than this!

RESTRICT & NOACTION differ upon whether the test is done: before or after the action

•RESTRICT is before the operation

•No ACTION is after

This means that:

•Changes to sid in the Students table propagate through to the Marks table

•Changes to cid in the Courses table

are forbidden

• The default is NOACTION

It’s a bit more complicated than we’ve described

•It doesn’t just apply to primary keys

•There are various syntactic forms for expressing this

What to do is a design decision!



Keys

Notes:

Constraints are specified in the ‘child’ table and can affect operations on both ‘parent’ and ‘child’

Operations are atomic See transactions, later

The constraints are guaranteed not to be violated Operations succeed or

fail

Student

Marks Contact

Relationship

Notes:

Not all databases fully support all of this properly:

E.g. some versions of MySQL

Typically the DBMS will create an index to do the checks

efficiently

The efficiency of this varies across implementations

One practice is: For major operations

(e.g. to archive graduated students) Remove constraints

Perform operation

Add constraints

That’s why naming constraints is useful

Domain Constraints

We can also enforce additional constraints over values:

Local constraints

Constraints over more than one table

These can be arbitrarily complex (to be revisited)

See Triggers

CHECK (age >= 0 AND age <=120)


sid INTEGER,

mid INTEGER,

mark INTEGER,


ON DELETE CASCADE

ON UPDATE CASCADE,

FOREIGN KEY (mid) REFERENCES Modules(mid),

CHECK (mark>=0 AND mark <= 100)

)

Constraints: Summary

Constraints express metadata about our domain:

They are declarative:

So no need to write code

They are guaranteed to be enforced:

Not just checks on insertion!

When designing:

Capture all hard domain constraints:

… and express them in the DB

Do not include constraints which are desirable but not inviolable.

Note: This is really hard to do in Java

Lecture 5

Setting up Postgres

SQL

INSERT

UPDATE

DELETE

SELECT

Selection criteria

Aggregate functions

….. and more

Using PostgreSQL

>psql -h mod-intro-databases -d <username>

<username=> SELECT * FROM Books;

See module page for a cribsheet

The name of DB host The name of

DB



Manipulating Data: INSERT

INSERT

Inserts data into a table

INSERT INTO Student VALUES

(28,’28/1/09’, ‘reh’, ‘CS’)

INSERT INTO

Student (sid, dob, login, course)

VALUES (28,’28/1/09’, ‘reh’, ‘CS’)

In general we can specify columns, rows and values to insert

Manipulating Data: UPDATE

UPDATE

Modify values in rows:

UPDATE Student

SET course=‘CS’

WHERE sid=23

Specify table and rows and attributes and their new values

All rows that meet the condition will be updated

Manipulating Data: DELETE

DELETE

Deletes rows from the table

DELETE FROM Student

WHERE sid=23

So, we specify rows to be deleted.

All rows meeting the condition will be deleted

Data Manipulation

In general:

We can use any SQL command within these statement

Arbitrary operations

All these statements are subject to constraint checking:

If the constraints would be violated then:

The operation fails

An error is raised

Queries: SELECT

SELECT is used to retrieve data from tables:

One table or (more usually) combining data from several

Result is a table:

that meets the specification in the SELECT statement

SELECT allows the specification of:

Columns in the result

Conditions on rows

Which can be arbitrarily complex

Ordering of results

Aggregate functions

… and much more

Simple form: selecting from one table

SELECT *

FROM Student

SELECT sid, login

FROM Student

WHERE course=‘cs’

SELECT sid, login

FROM Student

SELECT Student.sid, Student.login

FROM Student

WHERE Student.course=‘cs’ Note: can have expressions rather than just column names



Additionally

DISTINCT will remove duplicate rows: SELECT DISTINCT course

FROM Student

AS attaches a label to a column

Aggregate functions can be applied to the table: COUNT, SUM, AVG, MAX, MIN …

SELECT COUNT (DISTINCT course) AS NoC FROM Student Or we might include selection criteria

.. And more

ORDER BY allows the order of rows to be specified:

Uses the natural order of the column

ASC/DESC

Can specify multiple columns

SELECT *

FROM Student

ORDER BY course, login

Remember: Tables represent ordered sets.

The default order is a vestige of the retrieval algorithm

.. And yet more

Can also specify a window onto result

Start row

End row

… and a few other things, which we’ll

ignore

Retrieval Criteria

The selection criteria in the WHERE clause can involve arbitrarily complex expressions:

AND/OR

Comparison operators:

<, <=, =, <>

IS [NOT] NULL IS [NOT] TRUE

[NOT] IN Course IN (‘Cs’, ‘Se’)

Arithmetic

[NOT] BETWEEN Age BETWEEN 18 AND 65

Note: There are some nasties with some of this!

LIKE

LIKE provides pattern matching on strings _ matches any

character

% matches 0 or more characters

[most implementations also allow regexps]

Name LIKE ‘_ob%’

Other operators

There are, typically, a large number of functions and operators

The set (and semantics) may vary with the implementation

Mathematical functions

String operations

Date manipulation

Bitwise operations

Formatting

And so on



Lecture 6

Joins

INNER JOIN

Correlation names

Other JOINs

Nested queries

GROUP BY & HAVING

Example

Joins

Selecting data from a single table is obviously limiting

In general we need to join data from several tables

A join returns a table: The columns are

specified in the head of the SELECT clause

The rows are a function of the data in the joined rows

SELECT *

FROM Student, Marks

Or more sensibly:

SELECT *

FROM Student, Marks

WHERE Student.sid=Marks.sid

Or:

SELECT Student.sid, Marks.mid, Marks.mark

FROM Student, Marks


Joins

There are several different flavours of join:

First we will look at the inner join – which is the most common

Others follow the same pattern

SELECT Student.sid, Marks.cid, Marks.mark

FROM Student, Marks


Can also be written:


FROM Student

INNER JOIN Marks

ON Student.sid=Marks.sid

The examples that follow will use the first style:

It’s compact

Referred to as ‘implicit notation’

House styles may prefer one or the other

Join – inner join

Conceptually:

Form cross product

Of the set of tables

Select columns

Listed in the select

Select rows

Only those that meet the WHERE condition

[if DISTINCT then remove duplicates]

Notes

The result of a join is a table

The WHERE clause is an arbitrary expression:

If it evaluates to true then the row is included

The columns are specified in the head of the SELECT:

Often these are just selected from the columns of the joined tables

… but we can specify arbitrary expressions

Notes

If there is ambiguity in a field name (even if they are known to be equal) they must be fully

qualified

One style is to always fully qualify


FROM Student, Marks

Where Student.sid=Marks.sid



Correlation names

A name can be given to the table

Sometimes as a style

Briefer and clearer

Sometimes it is necessary

[Also see later example]

SELECT S.sid, M.cid, M.mark

FROM Student S, Marks M

Where S.sid=M.sid

SELECT S1.sid, S2.sid, S1.dob

FROM Student S1, Student S2

Where S1.dob=S2.dob

What does this try to do?

What is wrong with it?

Correlation names

A name can be given to the table

Sometimes as a style

Briefer and clearer

Sometimes it is necessary

[Also see later example]

SELECT S.sid, M.cid, M.mark

FROM Student S, Marks M

Where S.sid=M.sid

SELECT S1.sid, S2.sid, S1.dob

FROM Student S1, Student S2

Where S1.dob=S2.dob

What does this try to do?

What is wrong with it?

Every student shares a birthday with himself

If student A shares a birthday with student B then we will also be told student B shares a

birthday with student A

How can we fix this?

Other joins

LEFT JOIN

This is just like an INNER JOIN, but

Includes rows in the left table that are not matched

NULLs are added

SELECT sid, marks

FROM Students

LEFT JOIN Marks


Other joins

RIGHT JOIN


Includes rows in the right table that are not matched

NULLs are added

SELECT sid, marks

FROM Students

RIGHT JOIN Marks


Other joins

FULL JOIN


Includes rows in the left & right tables that are not matched

NULLs are added

SELECT sid, marks

FROM Students

FULL JOIN Marks


Other joins

EQUI JOIN

This is just an INNER JOIN where the condition only includes equality

NATURAL JOIN

Here the equality test is implicit based on the names of attributes

Don’t use it!

CROSS JOIN

This is just the default without any selection

It gives every permutation of the rows in each table



Notation

Another syntactic form is:

This is equivalent to:


FROM Student

INNER JOIN Marks

USING (sid)


FROM Student

INNER JOIN Marks


UNION, INTERSECT, EXCEPT

A query can be formulated as a set operation on two results: This can be easier and

clearer than a complex condition

It can be more efficient

Note: the only constraint is that the tables must be compatible in number and type of attributes

SELECT S1.sid, S1.course

FROM Student S1

Where S1.course=‘Cs’

UNION

SELECT S2.sid, S2.course

FROM Student S2

Where S2.course=‘Se’

This is not obviously very useful in this example (although it might be faster) – but there are many cases where a query is much simpler, clearer and more efficient.

Nested Queries/Subqueries

It is possible to have subqueries whose results are then used in the outer query:

Easier to formulate

Clearer

More efficient

There are many operators that can be used with the sub-result:

[NOT] IN

[NOT] EXISTS

<op> ANY

<op> ALL

Example

SELECT S1.sid

FROM Student S1

WHERE S1.dob > ALL (SELECT S2.dob

FROM Student S2

WHERE S2.course=‘Cs’)

Notes:

We can reference the outer row within the nested query ie. reference S1 within the nested query

Notice, that we don’t need to use different names for Student in the two queries – but if the inner query references the outer, then we must.

Nesting can be arbitrarily deep

Allows operations over groups of rows GROUP BY specifies

grouping of rows SELECT customerName, SUM(orderQuantity)

FROM Sales GROUP BY customerName

HAVING specifies selection criteria for groups

SELECT customerName, SUM(orderQuantity) FROM Sales GROUP BY customerName HAVING SUM(orderQuantity) > 5

Can specify fields and aggregate operations over the groups

GROUP BY/HAVING

So, what this does is:

Join tables

Filter rows using WHERE clause

Group the table using the attribute(s) specified in the GROUP BY clause

Returning one row per group

Remove rows where the HAVING clause does not return TRUE

Not a problem!

Notes:

The columns must either be: Specified in the GROUP

BY clause

An aggregate applied over the group

[Guaranteed to be unique (because of a functional dependency)]

This is useful: In it’s own right

As a nested query

GROUP BY/HAVING

Exercise:

Return a table which lists each student registered on course ‘CS’ with their average mark over all modules: Extend to only include

students with average >=70

and all marks >= 40



Remember:

We can usually use an arbitrary expression where a value is required

This can be:

Useful

Or obscure

select b.isbn,

(select count(borrowerid)

from loans

where loans.isbn=b.isbn ) as NoLoans

from books b;

This is not the right way to do this!

How should it be done?

Views

Views provide a mechanism whereby a user/programmer can be given an alternative view

onto the underlying data base

For

Convenience

Security

Views

A view is specified in terms of the underlying tables

Derived tables vs Base tables

Operations are still (conceptually) performed on the base tables

This may have implications for some operations

The specification of a view can be arbitrarily complex

Views - Motivation

For convenience

Rather than working with a large and complex system the data can be repackaged to provide a simpler and more appropriate view of the data

Filtering

Re-organising

Abstracting

Security

Hide data that is sensitive

Apply security policies to views

Whilst this seems like a good idea sometimes we may prefer to do this in a ‘batch’ mode.

SQL - summary

We have covered the core of SQL There are many

more features Triggers

Transactions – to be revisited

Security

Implementation & efficiency

And more ….

The features we have covered can be combined to build arbitrarily complex statements

But as always there are techniques for managing complexity

Think! Get the logic right. Encode in SQL!

Student Records Database: Done right!



Lecture 9&10

Java and SQL

Basics

Data Manipulation

How to do it – patterns etc.

Transactions

Summary

JDBC

JDBC provides A mechanism for

connection to database systems

An API for: Managing this

connection

Sending statements to the DB

Receiving results from the DB

As usual, the API is:

easy to use

.. but general and complex if you need it

Although we focus on relational databases this is not a restriction

And other languages provide equivalent features.

Java DB JDBC

Requests

Results

•Make connection

•Issue requests/ process results

•Close connection

•Connections to multiple databases

•Connections from multiple programs

Mechanism

•Architecture will be more complex

Some Warnings

Java is fairly strongly typed but …: The interaction with

the data base is in terms of strings and resultsets: It is the programmers

responsibility to check they are well formed and type compatible

There are more sophisticated (and ultimately easier) ways to manage the mapping between

java objects and data base relations: E.g Hibernate

Style

Java & DBMSs have different strengths:

Use each for the appropriate tasks:

Java for interaction, complex algorithms, etc.

DB for data manipulation

Don’t fetch all the

data and then search it!

Notes

Whilst you can have a Java application which: Creates tables

Populates DB

Performs operations

Drops tables

[and this is a good pattern for development!]

In general, the DB will be persistent. So:

Java application should connect to an existing DB

For development/ debugging convenience, either:

Use DBMS separately to setup

Use separate java programs to create and destroy your DB



Notes - testing

The usual techniques apply to testing.

Both:

Java code

DB code

In general the point of using a DB is that there is a lot of data.

You can use a Java program to:

Read data files and insert into DB

Generate synthetic data

Or, you can use other languages

Using jdbc

One pattern:

Setup JDBC

Establish DB connection

Repeat

Execute query

Process result

Close DB connection

Note:

Establishing the connection can be slow:

Inefficient to make/close a connection per statement

Connection pooling is a more elegant solution

Preparing for jdbc

Packages

import java.sql.*

import javax.sql.*

In practice your IDE will do most of this for you

Drivers

System.setProperty(“jdbc.drivers”,”org.postgresql.Driver”)

Note: see notes on jdbc at:

http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build

/tutorials/jdbc/jdbc.html

DB server

Define location of server:

String dbName = “jdbc:postgresql://mod-intro-databases/<name>”

Establish a connection:

Connection dbConn =

DriverManager.getConnection(dbName, <User>, <Password>”)

Note: There are variations on this. In particular, it is common to use a property file (especially for user and password)

Name of

database

Should be fully

qualified

DB server

Define location of server:

String dbName =

“jdbc:postgresql://dbteach/<name>”

Establish a connection:

Connection dbConn =

DriverManager.getConnection(dbName, <User>, <Password>”)

Note: There are variations on this. In particular, it is common to use a property file (especially for user and password)

try {

Class.forName("org.postgresql.Driver");

} catch (ClassNotFoundException ex) {

System.out.println("Driver not found");

}

System.out.println("PostgreSQL driver registered.");

Connection conn = null;

try {

conn = DriverManager.getConnection(url, username, pass);

} catch (SQLException ex) {

ex.printStackTrace();

}

if (conn != null) {

System.out.println("Database accessed!");

} else {

System.out.println("Failed to make connection");

}

Connection

It is usual to embed statements in a try-catch-finally block

Where the finally closes the connection

SQLException - later

Connection dbConn …..

try

{

<code to use DB>

}

finally

{

dbConn.close()

}

http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/jdbc/jdbc.html

http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/jdbc/jdbc.html



Executing Statements

To execute a statement you create a Statement object

Or

PreparedStatement object

PreparedStatements are preferred

These provide methods which allow you to manipulate and query the data base.

Query is specified: Statement: when

executed

PreparedStatement: when object created

createStatement

Statement stmt=dbConn.createStatement()

This creates a new Statement object

The SQL statement is provided as a string parameter later (in the call)

It is common to re-use the Statement object for successive SQL statements

But you obviously lose the result of the previous request

Can have multiple Statement objects

Executing statements

execute(String)

Executes any SQL

statement executeUpdate(String)

For insert, update, delete etc.

executeQuery(String)

For queries

It is more convenient to use executeUpdate or executeQuery as appropriate

See API for details

UPDATE, DELETE, INSERT + DDL

Result is the number of rows affected

Of course, we would build this string dynamically, in practice.

Using values read or computed

Statement stmt=dbConn.createStatement();

n = stmt.executeUpdate(“INSERT INTO Students VALUES (28,’28/1/09’, ‘reh’, ‘CS’)”

Or (better)

n = stmt.executeUpdate(“INSERT INTO Students”

+” VALUES (28,’28/1/09’, ‘reh’, ‘CS’)”)

Or (even better)

String SQLcommand =

“INSERT INTO Students”

+ “ VALUES (28,’28/1/09’, ‘reh’, ‘CS’)”;

n=stmt.executeUpdate(SQLCommand);

executeQuery

Takes the query as a string parameter

Returns a resultSet object Table

Iterate through rows using next() Note: usage of next

Access field using position or name Note: position starts at

1

Note: <Type>

ResultSet rs = stmt.executeQuery(

“SELECT *”

+ “ FROM Students”);

while (rs.next())

{ sid=rs.getInt(1);

sid=rs.getInt(“sid”)

}

executeQuery

Takes the query as a string parameter

Returns a resultSet object Table

Iterate through rows using next() Note: usage of next

Access field using position or name Note: position starts at

1

Note: <Type>


“SELECT *”

+ “ FROM Students”

+ “ WHERE sid = “

+ selectedSID);

while rs.next()

{ sid=rs.getInt(1);

sid=rs.getInt(“sid”)

}



PreparedStatements

Allows a SQL statement to be parameterised:

More efficient

If it’s run many times

More secure

May be more convenient:

Ie. generic command with parameters supplied by user

quotes

Provide the SQL statement when the object is created

Use’?’ to mark

parameter

Use set methods to set parameter

Type defined in name

Specify by position

setInt (1, 123)

Example

PreparedStatement studentQuery = dbConn.prepareStatement(

“SELECT *”


+ “ WHERE sid = ?”);

studentQuery.setInt (1, 123);

ResultSet rs = studentQuery.executeQuery();

In all respects this is the same as a statement Except:

SQL statement defined in constructor

So, method calls do not specify

Analysed only once, when created

Nb: SQL injection

SQL Injection

User types in their sid: 12345

Executes: SELECT * FROM Students WHERE sid =‘12345’

What if they type: 12345’ OR ‘1’=‘1

12345’; DROP TABLE Students; SELECT * …….’

12345’ --

Estimates: ~90% data breaches due to

SQL injection

60-80% of websites found to be vulnerable

There are similar problems with other tools e.g. Javascript


“SELECT *”


+ “ WHERE sid = ‘ “

+ stringReadFromUser

+ “’ ”);

SQL Injection & PreparedStatements

If you use PreparedStatements then you are safe from SQL injection

Unless you contrive to make it vulnerable

E.g. by constructing the command string using user input

PreparedStatements are a win-win:

Safer

Faster

Clearer

Metadata

It is possible to obtain metadata about:

The DBMS e.g. its capabilities

The DB e.g. the tables used

A resultSet e.g. the number of rows and columns, their types or names

Cf. reflection in Java

This can be useful for many reasons: To build applications that

are robust to differences between DBMS

To build general purpose tools

To allow for changes to the DB e.g. Changes to attribute

names

Changes to the number of attributes

To adapt to changes or to detect and report an error

Scrollable, updateable, dynamic & cached resultSets

The default resultSet only allows forward movement through a static table: This may be inconvenient

It may be possible to have more convenient access But it may not:

Depends on DBMS/drivers and query

And it may be inefficient!

Scrollable – move to

any position

Updateable – modify

resultSet and write changes back into DB

Dynamic – have

resultSet updated to reflect changes in DB

CachedRowSets –

allows work offline



SQLException

Errors are signalled as Exceptions as you would expect: And should be

caught and handled appropriately

You should have multiple catch clauses for different exception classes

SQLExceptions are unusual (compared to most java Exceptions):

Ideally, you use the information in the exception object to decide what action to take.

Close

You can (should) explicitly close resultSets, statements and connections when

they are no longer needed

Closing a connection closes all its statements

Closing a statement closes all its resultsets cached rowsets

Transactions

There are some compound operations which must be performed in their entirety (as though they were atomic) E.g. move money from

one account to another

If they are partially completed then: The DB is inconsistent

There are real-world consequences

But:

What if the application crashes part way

Or a communications failure occurs

The DB server crashes

Another user interferes

e.g. concurrently accesses these accounts

We cannot complete the operation

etc

The problem

Find Balance

Acc1

Subtract

transfer

Store new

value

Find Balance

Acc2

Add

transfer

Store new

value

Server

Crash

The problem (2)

Find Balance

Acc1

Subtract

transfer

Store new

value

Find Balance

Acc2

Add

transfer

Store new

value

Find Balance

Acc1

Store new

value

Subtract

Transfer

Find Balance

Acc2

Store new

value

Subtract

Transfer

So, if we can have multiple concurrent

accesses then we can have a problem

Transactions

The DBMS will provide support through transactions Guarantees correct

behaviour ACID properties

Implements mechanisms that ensure correct behaviour and robustness Possibly through locks and

logs

… there are other ways

No need (usually) to worry about implementation!

Note: this abstraction from the underlying mechanism has some similarity to the way Java provides synchronisation of threads

In JDBC:

By default each operation is an independent transaction: Conn.setAutoCommit (false)

The program must now explicitly handle transactions

To commit a transaction Conn.commit()

The operations are now permanent

To abort a transaction: Conn.rollback()

The operations since the last commit are undone

(As usual) there is the option for finer grain control e.g. named checkpoints



Transactions – why?

Why do we need transactions?

Concurrent access Across multiple

applications

Maintain correct behaviour

Robustness against failure Hardware

Software

Communications

Queries (updates etc) may fail part way through:

E.g. Violated constraints

An application may abort a transaction part way through

E.g. payment failure

Transactions – how?

Typically:

Each transaction maintains it’s own copy of the database.

(conceptually)

If the transaction is successful:

Write the changes through:

They are committed in the permanent database

If the transaction is unsuccessful:

These changes are rolled back

Failure can be for several reasons:

Explicit or implicit failure

Conflict with other transactions

Deadlock detection

Transactions & performance

Classically, the handling of transactions is divided as:

Optimistic

We assume there won’t be a problem:

If there is we detect & fix it

E.g. we detect dirty reads (that data has changed) and repeat the transaction

Pessimistic: We assume there is

likely to be a conflict: So we prevent it

happening

E.g. we lock parts of our database until the transaction is committed

[And there are stages in between]

Transactions & performance (2)

The best DBMSs use very sophisticated techniques to:

Deliver correct behaviour

High performance

[The implication of this is that some use naïve methods which perform poorly or incorrectly]

Usually, the programmer can use their knowledge to achieve finer grain control:

To improve performance

Transactions & Performance (3)

Transactions inevitably come with a performance overhead:

If you don’t need them turn them off:

Read only databases

Maintenance operations

E.g. batching updates will be much quicker than individual transactions for each update [Actually, this is for

several reasons not just transaction overhead]

Transactions (finally)

Transactions provide an abstraction from how the DBMSs implements the correct behaviour.

The efficiency with

which different systems implement this varies

The programmer can have fine grain control over how this works

But implementations vary



An aside ….

Different DBMS implementations have different goals: So you need to

choose appropriately

You also need to consider what is important: Cost?, Speed?

Robustness? ….

One critical factor is the profile of your data & operations:

Large amount of data?

Frequent writes?

Large amount of concurrency?

Of what type?

Availability requirements etc.

And another …

For reliability:

Ups, multiple power/network connections …

Hot backups

…..

For performance:

Multiple servers

Distributed DB?

How?

Partitioned database?

Summary

JDBC provides a convenient and efficient mechanism to access databases from Java programs

Use the strengths of Java & DBMS appropriately

JDBC programs are (fairly) portable across DBMSs So stick to standards

Java programs can be used to populate databases with real or synthetic data for testing etc.

Connection can be expensive and connections are precious

Prepared statements are preferred to statements security

There is limited type checking so greater care is needed

Metadata is accessible

There may be more convenient ways to perform some operations

There are better ways of doing this than raw JDBC hibernate

Care should be taken to make code readable

See online documentation for much more detail.

Lecture 12,13,14 DataBase Design

Background

What is involved? Phases

Methodologies and notations

Data Base Design What is involved?

The bigger picture

ER Diagrams For large data bases

Mapping to DB Schema

Example

Background

Data bases are part of many information systems: Existing data bases

New data bases

They may serve one purpose: … but often they serve

several

… and often their integrating role is as important

It is necessary to describe data bases:

As part of the design process

To document an existing system

As part of the modification and extension of existing systems

What is Involved?

There are many aspects e.g.:

Requirements analysis

Design

Conceptual/architectural/detailed …

Coding

Testing

Maintenance

Documentation

Project management

These may be: Undertaken at different

levels of abstraction/detail

Iterative

The emphasis will vary depending on the application

There is little that is special about DB design So lessons & techniques

carry across



Methodologies and Notations

Methodologies: Approaches to the task

of: Analysing the problem

Creating solutions

Testing the result

Etc.

Usually include some heuristics and some metrics

Different methodologies for different phases/classes of problem etc

Notations:

A means of describing:

The problem

The solution

The process

Etc.

Provide a means to:

Record

Communicate

Database Design

A database is usually part of some larger system: Capture data requirements

Capture constraints

Non-functional requirements: Reliability, consistency,

extensibility, security, performance …

These drive design, analysis, testing etc. of DB component and integrated systems

Need to generate a description of the data, its organisation and its mapping onto the data base system

There are, obviously, many different ways to meet a set of requirements. Aim is to optimise the soft constraints on the design: Maximise

generality/extensibility

Maximise maintainability

Minimise redundancy

… And many more

The balance is problem specific

DB Design

There are many different methodologies that can drive this process and

notations that can be used

Almost universally a data design is described using: Entity Relationship

(ER) diagrams Capture conceptual

design

Data model

These can be easily mapped to data base schema

They may be partitioned for large complex designs

Entity Relationship Diagrams

Components of ER diagrams: Oval – attributes

Rectangle – entity set

Diamond – relation set

Lines – showing links

Also: Labels on lines

indicating role

Cardinality

Double outlined boxes show weak entity set

Notes

There are a lot of detail variations Cardinality and

optionality Explicitly

As annotations to the diagram

Some different conventions about details E.g. primary key

There are additional features which can be used: Colour coding

Type information

Derived values

Super/subclasses

Constraints

…. But keep it simple for now

Attributes & Entity (sets)

Attributes written in ovals

Primary key underlined Dashed in weak entity sets

Can have composite attributes e.g. address

May be derived e.g. age Written with dashed oval

May be multivalued drawn with double outline

Entity set written in rectangle

[Notice: we are not interested in some details at this stage!]

sid

Student

login dob course

sid

Student

login dob course age

address

road city country



Relationships

Association between entities

Name inside diamond

Lines linking to:

Entity sets

Attributes

Relations

Roles can label the links if necessary:

sid

Student

login dob course

tutors

tutors Tutored-by

Cardinality Restrictions

We can describe constraints on the participation of entities in a relation. For instance: A student must have at

least 1 and possibly several tutors

A tutor can tutor 0 or several students

A car must have exactly 1 keeper (but that keeper may have >1 cars)

Every driver must have exactly one licence

A customer can have 0 or more accounts

What you are trying to express is: Whether an entity must

participate (total) or whether it is optional (partial)

If it participates, what are the limits on the number of times it can participate – max

So we normally end up specifying: Minimum – 0 or 1

(typically)

Maximum – 1 or many (typically)

Cardinality in ER Diagrams

There are three main ways these are typically used:

Cardinality Representation:

Connectivity Ratio Representation:

Crow’s Feet: Same interpretation as connectivity ratio, different

syntax

BOOK PUBLISHER PUBLISHED (0,N) (1,1)

BOOK PUBLISHER PUBLISHE

D 1 N

BOOK PUBLISHER PUBLISHED

Cardinality

There are also (many) other ways to describe this.

The clearest way is to explicitly label the maximum and

minimum participation of the entity in the relation

SB (1,L)

(0,M)

TB

(1,N)

T

R

S SA

TA

X U

UB

UA

Weak entity sets

A weak entity set is one where the entities cannot (necessarily) be distinguished – there is no key

Drawn using double lined boxes (for entity set and relationship)

They can only be identified uniquely in the context of an owner: E.g. dependents of

an employee

Emergency contacts for a student

Note:

The critical point is an overview of the data design Additional detail tends to

obscure this And can be stored

separately e.g. in the design tool or separate documentation

But, more importantly, we want to describe the conceptual design!

There will be several ways to solve the problem Consider alternatives

Choose the most appropriate

There is more than we have covered.

Remember

Don’t duplicate data:

Space etc

Consistency

Unpack data:

E.g. address

Drive by needs of:

operations/users

Design for the future



Mapping from ER diagrams to database design

The mapping from ER diagrams to a DB schema can be straightforward:

Almost mechanical

Some judgement & refinement is needed

The rules that follow should be seen as guidelines – you

may want to preserve more of the structure at a cost in terms of space & computation

As usual there are different ways to do this

The design is mapped to:

Tables

Constraints within those tables

Reflect modifications back into the ER diagram

Entity sets

These map straight into tables

We know

The table name

Attributes and their names

Primary key

For composite include the simple components

Or, maybe create a table

We should also know:

Domain of each attribute

Any other constraints

For weak entity sets

As for the normal case but:

Include the key of the owner entity

Note:

Since the weak entity cannot exist without the owner:

Deletion of the owner must be propagated to the weak entity:

ON DELETE CASCADE

Relationship sets

Here the obvious approach is to map to a table: This is necessary in some

cases

This does have the virtues of: Maintaining the

structure

Avoiding unnecessary attributes in other tables

But there can be a cost in the operations

The ‘standard’

approach is to avoid creating an unnecessary table if possible

1:1 relationship

For each 1:1 relationship

add key of one entity as foreign key of other.

If possible, choose the recipient to be totally participating in the relationship.

Move attributes of the relationship to that entity

(1,1)

(0,1)

T

R

S SA

SB

TA

X

TB

S(SA, SB)

T(TA, TB) T(TA, TB)

Foreign

key

S(SA, SB, X, TA)

1:N relationship

For each 1:N relationship

add key of the entity on the N side as foreign key of the other.

Move attributes of the relationship to that entity

(1,N)

(1,1)

T

R

S SA

SB

TA

X

TB

S(SA, SB)

T(TA, TB)

S(SA, SB)

Foreign

key

T(TA, TB, X, SA)



M:N relationship

For each M:N relationship add a new relation

containing the keys of the

participating entities as foreign keys

The attributes of the relationship

The key of the new relation is the combination of the foreign keys

Foreign

keys

R(SA, TA, X)

(1,N)

(1,M)

T

R

S SA

SB

TA

X

TB

S(SA, SB)

T(TA, TB)

T(TA, TB)

S(SA, SB)

N-ary relations

For each n-ary relationship

as M:N relationships

The key is all foreign keys.

If one of the values is 1

the key can be reduced to just that foreign key

Foreign keys

S(SA, SB)

T(TA, TB)

S(SA, SB)

T(TA, TB)

R(SA, TA, UA, X)

U(UA, UB)

SB (1,L)

(0,M)

TB

(1,N)

T

R

S SA

TA

X U

UB

UA

U(UA, UB)

Mapping multi-valued attributes

For each multivalued attribute add a new table

containing: an attribute

representing the attribute itself

The containing entity’s primary key as a foreign key

The key is the foreign key and one or more of the attributes

Foreign

key

S(SA, SB)

S(SA, SB)

SM(SA, SM)

S

SA

SB

SM

Summary

ER diagrams are the standard way to describe the conceptual structure of a data base.

There are many detail variations on the representation

There is more than we have covered

But this is the core

Mapping from ER diagrams to a database design is straightforward

There are rules that can drive it

Refinement of the design may occur during this mapping

Remember: this is only part of a larger process

Lecture 15 & 16

Case Study

A data base for a car hire system

Example - revisited

Car Hire Data Base

A car hire company needs to build a data base to manage its rentals:

Every rental must record the date and time at which the rental started and finished, the details of the driver and the details of the car rented. Each car has its registration number recorded, the date it was purchased and a record of all maintenance work undertaken. For a driver, the company needs to have their name, driver’s licence number, address and at least one telephone number.



A solution

start DT end DT

Purchase Date Driver Car Rental

Maint

Reg No

DL number

(0,n) (0,n)

Maint.Action

(0,n)

(1,1)

Nos

Address

Street

Cost Provider

Family Name

City Country

Name

Description

First Name Title

A solution

start DT end DT

Purchase Date Driver Car Rental

Maint

Reg No

DL number

(0,n) (0,n)

Maint.Action

(0,n)

(1,1)

Nos

Address

Street

Cost Provider

Family Name

City Country

Name

Description

First Name Title

Is this a weak

entity or should we have an Id:

Mid?

Is it sufficient to specify driver & car? Do we need a unique id?

There could be Others!

Is this okay?

Shall we Generate

a unique Id?

Map this to tables Example – Let’s do it better!


1 23/2/1989 ttt Cs

2 11/3/1989 rfd Se

3 22/7/1998 ggg Se

4 7/11/1998 hjk M/Cs


1 277 23

2 277 65

1 264 78

2 266 52

3 277 99



sid INTEGER,

dob CHAR(10),

login CHAR(20),

course CHAR(10)

)

Query Language

SELECT sid, login

FROM Student


Tuple

(or row or record)

Column



Summary

This was an introduction to database design (really only ER diagrams and mapping to DB), SQL and JDBC. See Database course

Documentation etc

SE methodologies

There are a lot of details which cannot be covered But ….

Data bases are just a tool for data representation They should serve the

larger system rather than drive it

Databases are part of many ISs

Because they are already there

Because they provide a useful tool

DBMSs provide a lot of essential services

so using them is sensible

SQL is declarative and efficiently implemented

Use SQL rather than Java

JDBC is well designed

Simple to use

Powerful and flexible

… but there are probably better,

more abstract, approaches which

can be used on large problems

Lecture 17 & 18

Introduction to Web applications

Introduction

HTTP

Servlets

JSP

Session Tracking

Security issues



Client Server Systems

Consider running Java programs:

Application

Applet

Client-Server models

Application

Installed on local machine

Runs on local machine

Accesses resources on local machine

[Of course - it’s never as simple as this!]

Applet

Runs in ‘Browser’

Runs on local machine

Downloaded from server

Accesses remote resources [and possibly local ones]

Once held up as the paradigm for delivering applications

Appropriate when need local processing and interactivity

Issues when [local or remote] data must be manipulated (e.g. security)

Client Server Systems

Split applications Client

user of services

Server provider of services

At its simplest Client interacts with user

Server provides services

Classic business model is 3 tier - client/server/DB

Examples

File-server

Name server

Web server

Display server e.g. Xwindows etc.

Mail server

DB server

Etc. etc.

Web services


Requests

resources



... And usually:


Requests

resources

DataBase

Server

... And usually:


Requests

resources

DataBase

Server

JDBC

?

Firewall

Actually more complex than this, but aim is to restrict access to critical components

Mechanisms are general – not just http and browsers

HTTP

Protocol Requests:

get, post, put, head, delete, trace, options, connect

URL

Parameters

Deliver resources: Text (e.g. HTML)

Images

Javascript

Etc

Again, it is usually more complex than this

Webserver

Receives requests: Parses request

Processes request

Returns response: Deliver local resources

Re-direct

Error

Often there is complex processing: Build response:

Computation

DB access

Presentation

Webserver

Initially delivered static resource

File – HTML, Images etc

Need to be able to generate dynamic content:

dependent on state

for efficiency

Need to process input

Server side processing

Historically, CGI provided a mechanism to integrate server side processing Web server could invoke arbitrary programs

Typically used a scripting language – e.g. Perl

Could integrate with databases etc

But this is expensive: Load & start application for each request

Request is, typically, very small



Java server side processing

Java provides two primary mechanisms for implementing server side processing

Servlets

JSP

These are actually just different views on the same mechanism

Provide full power of Java

Classes to support this application

Efficient for light weight processing

[Note: there are many alternatives – both in Java and other frameworks]

Servlets

Java code that implements server side functions

Invoked by web server to handle requests

Specified by URL

Access to parameters of HTTP request

Performs processing

(Typically) generates a page as response

[Actually much more general than this]

What happens when servlet is invoked?

Lifecycle:

[Initialise] init (ServletConfig config)

called once to initialise servlet

Call servlet to service request service (ServletRequest request, ServletResponse response)

typically runs in a new thread

[Cleanup and close] destroy()

So, it’s lightweight:

little overhead for each request

So …..

The servlet is initialised once - typically when first invoked

Each request is handled within a separate thread So code should be thread safe!

See SingleThreadModel

Cleanup takes place when servlet no longer required

The aim:

To create a lightweight mechanism

With the full power of Java

To create a general mechanism

Easy to build standard applications:

Framework

Handling of low-level details provided:

Request loop, dispatching, io objects …

Parameter parsing …

So, how do we do it?

Create a class to implement the servlet This must extend one of:

GenericServlet

HttpServlet

These implement the interface Servlet

These two (abstract) classes provide default implementations of methods

Over-ride the definition of some methods to implement required function



Interface Servlet

void init (ServletConfig config)

ServletConfig getServletConfig()

String getServletInfo() void service (ServletRequest request, ServletResponse response)

void destroy()

HttpServlet

Provides default implementations of methods

Methods to handle requests void doGet(HttpServletRequest request, HttpServletResponse response)

void doPut(HttpServletRequest request, HttpServletResponse response)

etc

At its simplest:

over ride implementations of request handler(s)

Parameters

HttpServletRequest interface has methods to obtain parameters

Enumeration getParameterNames()

String getParameter(String name)

String[] getParameterValues (String name)

and many more ……..

Output

HttpServletResponse interface provides methods to handle the response from the servlet void setContentType (String type)

PrintWriter getWriter()

ServletOutputStream getOutputStream()

and many more ……..

Skeleton for servlet

import javax.servlet.*

import javax.servlet.http.*

import java.io.*

public class myServlet extends HttpServlet

{

void doGet (HttpServletRequest request, HttpServletResponse response)

throws ServletException, IOException

{

response.setContentType (“text/html”);

PrintWriter out =response.getWriter();

.

.

out.close()

}

}

import java.io.*;

import javax.servlet.*;

import javax.servlet.http.*;

public final class Hello extends HttpServlet

{

public void doGet(HttpServletRequest request, HttpServletResponse response)

throws IOException, ServletException

{

response.setContentType("text/html");

PrintWriter writer = response.getWriter();

writer.println("<html>");

writer.println("<head>");

writer.println("<title>Test Servlet</title>");

writer.println("</head>");

writer.println("<body>");

writer.println("<h1>Test Servlet</h1>");

writer.println("This is a minimal test servlet.");

writer.println("</body>");

writer.println("</html>");

}

}



Output

<html>

<head>

<title>Test Servlet</title>

</head>

<body>

<h1>Test Servlet</h1>

This is a minimal test servlet.

</body>

</html>

Test Servlet

This is a minimal test servlet.

Accessing ServletRequest object

public class HelloDB extends HttpServlet {

protected void doGet(HttpServletRequest request, HttpServletResponse response)

throws ServletException, IOException

{

response.setContentType("text/html");

PrintWriter writer = response.getWriter();

writer.println("<html>");

writer.println("<head>");

writer.println("<title>Test Parameter Servlet</title>");

writer.println("</head>");

writer.println("<body>");

writer.println("<h1>Test Parameter Servlet</h1>");

writer.println("This is a servlet to test getParameter.");

writer.println("<br><br>It outputs the value of a parameter (testparam)");

writer.println("<br> <br> Parameter: " + request.getParameter("testparam") + "<br>");

Simple Pattern: Request-response cycle

User calls URL

Web page is delivered including HTML form

User fills and submits form to servlet address

Servlet processes request and generates web page as a response

HTML Form

<HTML>

<HEAD>

<TITLE>A Sample FORM using GET

</TITLE>

</HEAD>

<BODY>

<H1 ALIGN="CENTER">A sample form</H1>

<FORM ACTION=“TestForm“ METHOD="GET">

Student Id: <INPUT TYPE="TEXT" NAME=“FirstParameter">

<BR>

<CENTER>

<INPUT TYPE="SUBMIT" VALUE="Submit Lookup">

</CENTER>

</FORM>

</BODY>

</HTML>

HTML Form

<HTML>

<HEAD>

<TITLE>A Sample FORM using GET

</TITLE>

</HEAD>

<BODY>

<H1 ALIGN="CENTER">A sample form</H1>

<FORM ACTION=“TestForm“ METHOD="GET">

Student Id: <INPUT TYPE="TEXT" NAME=“FirstParameter">

<BR>

<CENTER>

<INPUT TYPE="SUBMIT" VALUE="Submit Lookup">

</CENTER>

</FORM>

</BODY>

</HTML>

This specifies the URL and

method Type and name

of parameter

Submit button

Lecture 18

JSP

Scriptlets

Directives

Tags



JSP

Underlying mechanism is the same as servlets

JSP is translated into a servlet before execution

Oriented towards output rather than algorithms Output (e.g. html) with embedded code

rather than code generating output

What happens when JSP is invoked?

Translate into servlet

Initialise

jspInit

Call servlet to service request

_jspService

Cleanup and close

jspDestroy

So what is in a JSP?

Unless special action is signalled contents of page are output

Scriptlets provide a mechanism to execute java code

Directives instruct JSP container to perform actions

Tags provide a higher level mechanism to access java functionality

A minimal JSP

<html>

<head>

<title>Test JSP</title>

</head>

<body>

<h1>Our first JSP</h1>

This is a minimal (and pointless) JSP.

</body>

</html>

Scriptlets

<%= ……….%> encloses a Java expression

the expression is evaluated at execution time and its string representation is output

<%= new java.util.Date() %>

This creates a new date object and outputs its string representation as part of the page

Example with a simple scriptlet

<%@page contentType="text/html" pageEncoding="UTF-8"%>

<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>JSP Page</title>

</head>

<body>

<h1>Hello World!</h1>


</body>

</html>



Example with a simple scriptlet Generated Servlet

<%@page contentType="text/html" pageEncoding="UTF-8"%>

<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<title>JSP Page</title>

</head>

<body>

<h1>Hello World!</h1>


</body>

</html>

out.write("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n");

out.write(" \"http://www.w3.org/TR/html4/loose.dtd\">\n");

out.write("\n");

out.write("\n");

out.write(" \n");

out.write(" <html>\n");

out.write(" <head>\n");

out.write(" <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n");

out.write(" <title>JSP Page</title>\n");

out.write(" </head>\n");

out.write(" <body>\n");

out.write(" <h1>Hello World!</h1>\n");

out.write(" ");

out.print( new java.util.Date() );

out.write("\n");

out.write("\n");

out.write(" </body>\n");

out.write("</html>\n");

out.write("\n");

Scriptlets

<% ……….%>

encloses a fragment of Java code

this code is copied into the _jspService method

can include arbitrary Java (but obviously must be correct!)

Scriptlet in fragment of JSP

<body>

Is A greater than B?

The value of A is <%= a %>

The value of B is <%= b %>

A is

<% if (a<=b) { %>

not

<% } %>

greater than b

</body>

Scriptlet in fragment of JSP

<body>




A is

<% if (a<=b) { %>

not

<% } %>

greater than b

</body>

Fix it!

Scriptlets in fragment of JSP generated fragment of servlet

<body>




A is

<% if (a<=b) { %>

not

<% } %>

greater than b

</body>

out.write(" <body>\n");

out.write(" ");

out.write("\n");

out.write(" ");

out.write("\n");

out.write("Is A greater than B? <br> <br>\n");

out.write("The value of A is ");

out.print( a );

out.write("<br>\n");

out.write("The value of B is ");

out.print( b );



out.write("A is\n");

if (a<=b) {

out.write("\n");

out.write("not\n");

}

out.write("\n");

out.write("greater than b\n");

out.write("</body>\n");

out.write("</html>\n");

Scriptlets

<%! ………..%>

encloses a Java declaration

Variables are declared as instance variables of the servlet class

Methods belong to the servlet class



Comments

<%-- ……….. --%>

JSP comments

Java comments can be used inside scriptlets

HTML comments can be used to generate comments within output page

Directives

These are like compiler directives

They control translation time options

Indicated by

<%@ …….. %>

Directives

page set attributes of this JSP e.g. errorpage, isErrorPage, contentType

include substitute the contents of the file

<%@ include file = “ foo.html” %>

taglib specifies additional tag libraries to use

At compile time!

Tags

Provide a convenient mechanism to support common actions

There are standard tag libraries to support standard actions

A programmer can define their own tag libraries

Use eg. <jsp:action> …….. </jsp:action>

Tag library - jsp

<jsp:include>

<jsp:forward>

<jsp:param>

<jsp:useBean>

……..

Jsp:include

Includes dynamic content

<jsp:include page=URL flush=true/>

This will include the content returned from the specified resource

E.g. a servlet

Dynamically at execution time



jsp:forward

Forwards the request to the specified URL

<jsp:forward page=URL/>

Terminates the current JSP and transfers the request to the specified resource

jsp:param

Supplies parameters to the tag

<jsp:param name=“ParamName”

value=“ParamValue”/>

Parameters

<jsp:forward page=“foo.jsp”>

<jsp:param name=“mode”

value=“brief” />

</jsp:forward>

Implicit objects

There are a several objects which are implicitly created and which the jsp can access

application

config, exception, out, response …

request

session

JSP vs Servlets

Use Servlets when you are mostly writing algorithms

Use JSP when mostly generating output

A typical application will utilise both Servlets and JSP Separate logic and presentation

Model 2

Example of MVC pattern

Lecture 19

Feedback on Exercise 1

Discussion of Exercise 2



Lecture 20

Session Tracking

Hidden forms

URL rewriting

Cookies

Session Tracking

HTTP

HTTP is a stateless protocol No session – each request is independent

Appropriate when just retrieving resources: Request

Response

Efficient and simple: Server (and client) do not need to manage

connections

A good solution for web browsing!

Requirements for more sophisticated systems

Need to track a user within a session: Transaction requires multiple requests

Need to pass state between requests

e.g. User login

Non-trivial transactions

shopping basket

Need to track a user across sessions: Personalise

Track usage

How?

Four main options [Hidden] forms

URL rewriting

Cookies

[Sessions]

Notes:

only first two are guaranteed to work

This is not Java specific

JavaScript, Applets, images and many other techniques can also be used to track users

For example: Registration & Login

Used to:

Provide some security

Identify user

Can be counter productive:

Reluctance to register

Not (necessarily) secure

Really needs user tracking over session

We don’t want to login to every page

URL rewriting

Embed state information in the URL of links of the response:

Usually include parameters to carry state

Can use Address in simple cases

e.g. language

Reliable - doesn’t rely on protocol version,

browser, security settings

Cumbersome, session only, some problems (e.g. back)



Hidden forms

Embed state in hidden form in response: This information is submitted with next

request

Reliable - should (nearly) always work

Session only

Again, some problems with e.g. back, cumbersome, session only

Cookies

Cookies are embedded in the HTTP request and response

Cookies are stored on the client

Cookies are transmitted with each request from the client (e.g. browser)

Cookies are name value pairs

The value is a string

Cookies

A cookie is stored with:

Domain

Normally the domain that set the cookie

Path

By default, the path that set the cookie

A cookie will be sent with requests that match its domain and path

Cookies

Convenient

May not always work: Security concerns:

Browser, proxy, firewall

Stored locally – so machine specific

Requires cookie support

So should allow for failure

Can work across sessions

Java support for cookies

Servlets and JSP provide convenient support for cookies

Accessible through HTTPServletRequest

To get cookies

Accessible through HTTPServletResponse

To set cookies

Java support for cookies

Cookie myCookie (“Language”, “English”)

response.addCookie(myCookie)

Cookie[] myCookies = request.getCookies()



Methods for Cookies

getName()

getValue()

getDomain()

getPath()

getMaxAge()

…...

setName(String)

setValue(String)

setDomain(String)

setPath(String)

setMaxAge(int)

……….

Sessions

Sessions provide a reliable abstraction to support session tracking:

Normally implemented through cookies

Fallback to URL re-writing

Session is identified by a session ID

Java support for sessions

Sessions are manipulated through an HttpSession object

HttpSession mySession=request.getSession()

Attributes can be objects (not just strings)

setAttribute(name, value)

getAttribute(name)

getAttributeNames()

Using Sessions and Cookies

HttpSession session = request.getSession(true);

ShoppingCart currentItems = (ShoppingCart)session.getValue(“currentItems");

Sessions have a finite life

For ‘standard’ applications (e.g. Shopping Carts) you should probably use a class library rather than implementing your own

Lecture 21

Normalisation

A word about query execution

Conclusions

Data Base Normalisation

In lectures 15 & 16 we went through some examples of how to design a good database

This took an approach of good software design

Data base normalisation is a more formal approach to the same problems:

If you are a good designer you should end up withy a normalised database



Why should we normalise our database?

The key point is that we want to remove functional dependencies from a table

The problems that can arise without normalisation are usually categorised as:

Insertion anomalies

Update anomalies

Deletion anomalies

Example

Empid name job salary project manager plength pcost

1 Jones Engineer 40000 P1 Hands 5 50000

2 Smith PR 35000 P1 Hands 5 50000

1 Jones Engineer 40000 P2 Michaels 8 40000

2 Smith PR 35000 P3 Lead 2 100000

5 Adams Sec 20000 P4 Walker 11 200000

6 Watson Engineer 50000 P4 Philips 11 200000

Insertion anomalies

What happens if we insert a new employee?

But they:

Are not allocated to a project

They don’t have a manager

Update anomalies

What happens if we change:

Jones Salary?

Deletion Anomalies

What happens if:

Jones leaves

Smith leaves?

So what are normal forms?

1NF

2NF

3NF

BCNF

4NF

5NF

…….

Typically, these all work by:

Identifying

Removing

Functional dependencies that exist in the data

By creating new tables & unpacking the data



Is this a good idea?

Of course!

Data is only represented once

Consistency is improved

Maintenance is easier

……

No!

It slows everything down:

Need more joins

Data modelling is more complex

I don’t need it!

…..

So what’s the truth?

It all depends …..

For a large, corporate system

At least 3NF is essential

E.g. HR system

Many insertions, deletions, updates

Consistency is essential

For small systems?

Is the cost of design worth the benefits?

For systems without updates?

Where efficiency is critical

So, de-normalisation is often important

Efficiency

What actually happens when we run a query?

Consider a Java example

for (int i=1;i<10000;i++)

x=3*12*17*42;

The compiler will:

• Analyse code

• Generate byte code

• Generate machine code

We can optimise based on:

• Static analysis

• Empirical knowledge

• Knowledge of target architecture

• Remember

• optimisation comes with several costs!

• Performance will vary with implementation & switches

So, for a Data Base query?

Again:

• Analysis

• Query planning

• Optimisation

• Execution

Query planning & optimisation:

• [Remember SQL is (mostly) declarative]

• Static analysis

• Optimise to minimise computation

• Cost estimation:

• Compare estimates for different query plans • On this database

Tuning

The performance of a query will depend on many things:

• DBMS implementation

• Tables

• Size

• Constraints, transactions and so on

Techniques to improve performance

DBMS dependent

So:

query A may run much faster than query B on system 1

But the opposite on system 2



Choosing a technology

For corporate data bases:

Oracle

SQL server

Postgres

Will usually have best performance, correctness, robustness

For more specialised applications:

Read only

Real time/power constraints

Massive data

Then others will be better

NoSQL, Hadoop etc.

…. finally

Good design is key:

But this depends on:

Requirements

Fast

Consistent

Reliable

Scaleable

Data size

requests

……

Environment

Whilst conventional DBMSs are a no-brainer for most corporate applications:

Other technologies are more appropriate elsewhere.

See 3rd year module