cs 405g: introduction to database systems lecture 4: relational model instructor: chen qian

30
CS 405G: Introduction to Database Systems Lecture 4: Relational Model Instructor: Chen Qian

Upload: gabriel-tyler

Post on 21-Dec-2015

213 views

Category:

Documents


0 download

TRANSCRIPT

CS 405G: Introduction to Database Systems

Lecture 4: Relational ModelInstructor: Chen Qian

04/19/23 2

Review

A data model is a group of concepts for describing data.

What are the two terms used by ER model to describe a miniworld? Entity Relationship

What makes a good conceptual database design1010111101

?

204/19/23

04/19/23 3

Today’s Outline

Relational Model

Relational Model and Relational Database Schemas Informal definition, not so formal, and formal

Relational Model Constraints

04/19/23 4

Why Study the Relational Model?

Most widely used model. “Legacy systems” in older models

e.g., IBM’s IMS Object-oriented concepts merged in

“Object-Relational” model Early work done in POSTGRES research project at Berkeley

XML features in most relational systems Can export XML interfaces Can embed XML inside relational fields

04/19/23 5

Historically

The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper:"A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.

The above paper caused a major revolution in the field of Database management and earned Ted Codd the coveted ACM Turing Award.

The picture is from wikipedia

04/19/23 6

Database Design

04/19/23 7

Relational Model Concepts

Relational database: a set of relations. Relation: made up of 2 parts:

Schema : specifies name of relation, plus the name and type of each attribute. E.g. Students(sid: string, name: string, login:

string, age: integer, gpa: real) Instance : a table, with rows and columns.

#rows = cardinality #fields = degree / arity

04/19/23 8

Relation

RELATION: A table of values A relation may be thought of as a set of rows (table view). Each row represents a fact that corresponds to a real-world

entity or relationship. Each row has a value of an item or set of items that uniquely

identifies that row in the table.

04/19/23 9

Relation

Sometimes row-ids or sequential numbers are assigned to identify the rows in the table.

A relation may alternately be thought of as a set of columns (schema view).

Each column typically is called by its column name or column header or attribute name.

04/19/23 10

A (Slightly) Formal Definition

A database is a collection of relations (or tables) Each relation is identified by a name and a list of

attributes (or columns) Each attribute has a name and a domain (or type)

Such as SID:string Set-valued attributes not allowed

Simplicity is a virtue!

Schemas

11

Relation schema = relation name + attributes + types of attributes in order Example: Beers(name, manf) or Beers(name: string, manf:

string)

Database = collection of relations. Database schema = set of all relation schemas in the

database.

04/19/23 12

Schema versus instance

Schema (metadata) Students(sid: string, name: string, login: string, age:

integer, gpa: real) Specification of how data is to be structured logically Defined at set-up; Rarely changes

Instance Content Changes rapidly, but always conforms to the schema

Compare to type and objects of type in a programming language

Entity and entity type?

04/19/23 13

Example

Schema Student (SID integer, name string, age integer, GPA float) Course (CID string, title string) Enroll (SID integer, CID integer)

Instance { 142, Amy, 20, 3.3, 123, Bob, 22, 3.1, ...} { CS405G, Intro. to Database Systems, ...} { 142, CS405G, 142, CS314, ...}

04/19/23 14

Formal Definition (Set Theory)

Formally, given sets D1, D2, …. Dn a relation r is a

subset of D1 x D2 x … x Dn

x: Cartesian product For sets A and B, the Cartesian product A × B is the set of

all ordered pairs (a, b) where a ∈ A and b ∈ B.

Thus, a relation is a set of n-tuples (a1, a2, …, an) where

each ai Di

04/19/23 15

Example

Example: If

customer_name = {Jones, Smith, Curry, Lindsay, …}

customer_street = {Main, North, Park, …}

customer_city = {Harrison, Rye, Pittsfield, …}

Then r = { (Jones, Main, Harrison), (Smith, North, Rye),

(Curry, North, Rye), (Lindsay, Park, Pittsfield) }

is a relation over

customer_name × customer_street × customer_city

04/19/23 16

Attribute Types

Each attribute of a relation has a name, designating the role of the attribute

The set of allowed values for each attribute is called the domain of the attribute

Attribute values (domain members) are required to be atomic; that is, indivisible E.g. the value of an attribute can be an account

number, but cannot be a set of account numbers Domain is said to be atomic if all its members are

atomic The special value null is a member of every domain

04/19/23 17

Relation Schema

A1, A2, …, An are attributes

R = (A1, A2, …, An ) is a relation schema

Example:

Customer_schema = (customer_name,

customer_street, customer_city) r(R) denotes a relation r on the relation schema R

Example:

customer (Customer_schema)

04/19/23 18

Relation Instance

The current values (relation instance) of a relation are specified by a table

An element t of r is a tuple, represented by a row in a table

JonesSmithCurryLindsay

customer_name

MainNorthNorthPark

customer_street

HarrisonRyeRyePittsfield

customer_city

customer

attributes(or columns)

tuples(or rows)

04/19/23 19

Definition Summary

Informal Terms Formal Terms

Table Relation

Column Attribute/Domain

Row Tuple

Values in a column Domain

Table Definition Schema of a Relation

Populated Table Extension

04/19/23 20

Characteristics of Relation

The tuples in a ration r(R) are not considered to be ordered, even though they appear to be in the tabular form.

We consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to be ordered .

All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.

04/19/23 21

Characteristics of Relation

Notation: we refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).

Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.

04/19/23 22

Relational Integrity Constraints

Integrity Constraints are conditions that must hold on all valid relation instances.

There are four main types of constraints:1. Domain constraints

1. The value of a attribute must come from its domain

2. Key constraints

3. Entity integrity constraints

4. Referential integrity constraints

Primary Key Constraints

A set of fields is a candidate key (abbreviated as key) for a relation if :1. No two distinct tuples can have same values in all key

fields, and

2. Property 1 is not true for any subset of the key.

What if Part 2 is false? A super key: a set of fields that contains a key.

If there are multiple keys for a relation, one of the keys is chosen (by DBA) to be the primary key.

04/19/23 24

Key Example

E.g., given a schema Student(sid: string, name: string, gpa: float) we have: sid is a key for Students. (What about name?) The set

{sid, gpa} is a superkey.

CAR (licence_num: string, Engine_serial_num: string, make: string, model: string, year: integer) What is the candidate key(s) Which one you may use as a primary key What are the super keys

04/19/23 25

Entity Integrity

Entity Integrity: The primary key attributes (PK) of each relation schema R cannot have null values in any tuple of r(R). Other attributes of R may be similarly constrained to

disallow null values, even though they are not members of the primary key.

Foreign Keys, Referential Integrity

Foreign key : Set of fields in one relation that is used to `refer’ to a tuple in another relation. (Must correspond to primary key of the second relation.) Like a `logical pointer’.

Foreign key constraint: The foreign key in the referencing relation must match the primary key of the referenced relation.

E.g. sid is a foreign key referring to Students: Student(sid: string, name: string, gpa: float) Enrolled(sid: string, cid: string, grade: string) If all foreign key constraints are enforced, referential

integrity is achieved, i.e., no dangling references.

Foreign Key constraints

Only students listed in the Students relation should be allowed to enroll for courses.

sid name login age gpa

53666 Jones jones@cs 18 3.453688 Smith smith@eecs 18 3.253650 Smith smith@math 19 3.8

sid cid grade53666 Carnatic101 C53666 Reggae203 B53650 Topology112 A53666 History105 B

EnrolledStudents

Possible violation: Add <50000, History105, B> to Enrolled. Possible violation: delete <53650, Smith, …> from Students.

04/19/23 28

Other Types of Constraints

Semantic Integrity Constraints: based on application semantics and cannot be

expressed by the model per se e.g., “the max. no. of hours per employee for all

projects he or she works on is 56 hrs per week” A constraint specification language may have to be

used to express these SQL-99 allows triggers and ASSERTIONS to

allow for some of these

Next class

Logical design of databases using the relational model.

04/19/23 29

Home work 1, due 2/6

04/19/23 30