CS 405G: Introduction to Database Systems. Lecture 4: Relational Model. Instructor: Chen Qian
Post on 21-Dec-2015
04/19/23 2
Review
A data model is a group of concepts for describing data.
What are the two terms used by the ER model to describe a miniworld? Entity and relationship.
What makes a good conceptual database design?
Today’s Outline
Relational Model
Relational Model and Relational Database Schemas
  Informal definition, not-so-formal definition, and formal definition
Relational Model Constraints
Why Study the Relational Model?
Most widely used model.
"Legacy systems" in older models, e.g., IBM's IMS.
Object-oriented concepts merged in: the "Object-Relational" model.
  Early work done in the POSTGRES research project at Berkeley.
XML features in most relational systems:
  Can export XML interfaces.
  Can embed XML inside relational fields.
Historically
The model was first proposed by Dr. E.F. Codd of IBM in 1970 in the following paper: "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970.
The above paper caused a major revolution in the field of database management and earned Ted Codd the coveted ACM Turing Award.
Relational Model Concepts
Relational database: a set of relations.
Relation: made up of 2 parts:
  Schema: specifies the name of the relation, plus the name and type of each attribute.
    E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)
  Instance: a table, with rows and columns.
    #rows = cardinality
    #fields = degree / arity
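As a rough illustration of the two parts, here is a minimal Python sketch (Python is an assumption; the slides use no programming language). The schema is held as an ordered list of name/type pairs and the instance as a set of tuples. The sample rows are borrowed from the Students table used later in the lecture.

```python
# Schema: relation name plus (attribute name, type) pairs, in order.
students_schema = ("Students", [("sid", str), ("name", str), ("login", str),
                                ("age", int), ("gpa", float)])

# Instance: a set of tuples that conform to the schema.
students = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}

cardinality = len(students)          # number of rows
degree = len(students_schema[1])     # number of fields (arity)
print(cardinality, degree)
```

The schema stays fixed while the set of tuples can grow or shrink, so the cardinality changes over time but the degree does not.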
Relation
RELATION: a table of values.
A relation may be thought of as a set of rows (table view).
Each row represents a fact that corresponds to a real-world entity or relationship.
Each row has a value of an item or set of items that uniquely identifies that row in the table.
Relation
Sometimes row-ids or sequential numbers are assigned to identify the rows in the table.
A relation may alternately be thought of as a set of columns (schema view).
Each column typically is called by its column name or column header or attribute name.
A (Slightly) Formal Definition
A database is a collection of relations (or tables).
Each relation is identified by a name and a list of attributes (or columns).
Each attribute has a name and a domain (or type), such as SID: string.
  Set-valued attributes are not allowed.
Simplicity is a virtue!
Schemas
Relation schema = relation name + attributes + types of attributes, in order.
  Example: Beers(name, manf) or Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation schemas in the database.
Schema versus instance
Schema (metadata)
  Students(sid: string, name: string, login: string, age: integer, gpa: real)
  Specification of how data is to be structured logically
  Defined at set-up; rarely changes
Instance
  Content
  Changes rapidly, but always conforms to the schema
Compare to a type and the objects of that type in a programming language.
Entity and entity type?
Example
Schema
  Student (SID integer, name string, age integer, GPA float)
  Course (CID string, title string)
  Enroll (SID integer, CID string)
Instance
  Student: { (142, Amy, 20, 3.3), (123, Bob, 22, 3.1), ... }
  Course: { (CS405G, Intro. to Database Systems), ... }
  Enroll: { (142, CS405G), (142, CS314), ... }
Formal Definition (Set Theory)
Formally, given sets D1, D2, …, Dn, a relation r is a subset of D1 × D2 × … × Dn.
×: Cartesian product. For sets A and B, the Cartesian product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai ∈ Di.
Example
Example: If
customer_name = {Jones, Smith, Curry, Lindsay, …}
customer_street = {Main, North, Park, …}
customer_city = {Harrison, Rye, Pittsfield, …}
Then r = { (Jones, Main, Harrison), (Smith, North, Rye),
(Curry, North, Rye), (Lindsay, Park, Pittsfield) }
is a relation over
customer_name × customer_street × customer_city
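The subset claim can be checked mechanically. The sketch below (Python, chosen here for illustration) builds the Cartesian product with `itertools.product` and tests that r is a subset of it. One assumption: the domains contain only the values listed on the slide, since the "…" entries are unknown.

```python
from itertools import product

# Only the values listed on the slide; the "…" in each domain is left out.
customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

# D1 × D2 × D3: every possible ordered triple.
all_triples = set(product(customer_name, customer_street, customer_city))

r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

is_relation = r <= all_triples   # subset test
print(len(all_triples), is_relation)
```

With these truncated domains the product holds 4 × 3 × 3 = 36 triples, of which r selects four. Order within a triple matters: ("Jones", "Rye", "Main") is not in the product because "Rye" is not a street.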
Attribute Types
Each attribute of a relation has a name, designating the role of the attribute.
The set of allowed values for each attribute is called the domain of the attribute.
Attribute values (domain members) are required to be atomic; that is, indivisible.
  E.g. the value of an attribute can be an account number, but cannot be a set of account numbers.
A domain is said to be atomic if all its members are atomic.
The special value null is a member of every domain.
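A minimal sketch of a domain check, assuming Python types stand in for domains and `None` stands in for null. The `conforms` helper and the two-attribute account schema are hypothetical and exist only for this illustration.

```python
def conforms(row, domains):
    """True if every value is None (null) or an instance of its attribute's type."""
    return len(row) == len(domains) and all(
        value is None or isinstance(value, domain)
        for value, domain in zip(row, domains)
    )

# Hypothetical schema: (account_number, balance)
account_domains = (str, float)

atomic_ok = conforms(("A-101", 500.0), account_domains)
set_valued = conforms((frozenset({"A-101", "A-102"}), 500.0), account_domains)
with_null = conforms(("A-101", None), account_domains)
print(atomic_ok, set_valued, with_null)
```

The set of account numbers is rejected because it is not a single string, while the null balance passes because null belongs to every domain.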
Relation Schema
A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
Example:
Customer_schema = (customer_name, customer_street, customer_city)
r(R) denotes a relation r on the relation schema R
Example:
customer (Customer_schema)
Relation Instance
The current values (relation instance) of a relation are specified by a table.
An element t of r is a tuple, represented by a row in a table.

customer
customer_name   customer_street   customer_city
Jones           Main              Harrison
Smith           North             Rye
Curry           North             Rye
Lindsay         Park              Pittsfield

The columns are the attributes; the rows are the tuples.
Definition Summary
Informal Terms        Formal Terms
Table                 Relation
Column                Attribute/Domain
Row                   Tuple
Values in a column    Domain
Table Definition      Schema of a Relation
Populated Table       Extension
Characteristics of Relation
The tuples in a relation r(R) are not considered to be ordered, even though they appear to be ordered in tabular form.
We consider the attributes in R(A1, A2, ..., An) and the values in t = <v1, v2, ..., vn> to be ordered.
All values are considered atomic (indivisible). A special null value is used to represent values that are unknown or inapplicable to certain tuples.
Characteristics of Relation
Notation: we refer to component values of a tuple t by t[Ai] = vi (the value of attribute Ai for tuple t).
Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes Au, Av, ..., Aw, respectively.
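The notation maps directly onto positional lookup. A small Python sketch follows; the helper names `component` and `subtuple` are made up for this illustration, and the tuple is taken from the customer example.

```python
attributes = ("customer_name", "customer_street", "customer_city")
t = ("Jones", "Main", "Harrison")

def component(t, attributes, name):
    """t[Ai]: the value of attribute `name` in tuple t."""
    return t[attributes.index(name)]

def subtuple(t, attributes, names):
    """t[Au, ..., Aw]: values of the listed attributes, in the listed order."""
    return tuple(component(t, attributes, n) for n in names)

city = component(t, attributes, "customer_city")
city_and_name = subtuple(t, attributes, ["customer_city", "customer_name"])
print(city, city_and_name)
```

Because attributes and values are ordered, the attribute name is enough to locate the value, and a subtuple lists values in the order the attributes were requested.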
Relational Integrity Constraints
Integrity Constraints are conditions that must hold on all valid relation instances.
There are four main types of constraints:
1. Domain constraints: the value of an attribute must come from its domain
2. Key constraints
3. Entity integrity constraints
4. Referential integrity constraints
Primary Key Constraints
A set of fields is a candidate key (abbreviated as key) for a relation if:
1. No two distinct tuples can have the same values in all key fields, and
2. Property 1 is not true for any proper subset of the key.
What if part 2 is false? Then the set is a superkey: a set of fields that contains a key.
If there are multiple keys for a relation, one of the keys is chosen (by the DBA) to be the primary key.
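The two properties can be tested against a given instance, as in the Python sketch below. The helper names are hypothetical and the three sample rows are an assumption. Note the limit of the approach: a key is a constraint on every valid instance, so a check on one instance can refute a key but never prove one.

```python
from itertools import combinations

def is_superkey(rows, attributes, fields):
    """True if no two distinct rows agree on all of `fields` (in this instance)."""
    idx = [attributes.index(f) for f in fields]
    seen = set()
    for row in rows:
        key = tuple(row[i] for i in idx)
        if key in seen:
            return False
        seen.add(key)
    return True

def is_candidate_key(rows, attributes, fields):
    """Property 1 holds for `fields` and for no non-empty proper subset of it."""
    if not is_superkey(rows, attributes, fields):
        return False
    return not any(
        is_superkey(rows, attributes, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )

attributes = ("sid", "name", "gpa")
rows = {("53666", "Jones", 3.4), ("53688", "Smith", 3.2), ("53650", "Smith", 3.8)}

print(is_candidate_key(rows, attributes, ("sid",)))        # sid alone is unique
print(is_superkey(rows, attributes, ("name",)))            # two students named Smith
print(is_candidate_key(rows, attributes, ("sid", "gpa")))  # superkey, but not minimal
```

In this sample gpa also happens to be unique, so the check would accept it as a key. That is exactly why the choice of keys comes from the meaning of the data, not from inspecting one table.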
Key Example
E.g., given the schema Students(sid: string, name: string, gpa: float), we have:
  sid is a key for Students. (What about name?)
  The set {sid, gpa} is a superkey.
CAR(licence_num: string, Engine_serial_num: string, make: string, model: string, year: integer)
  What are the candidate keys?
  Which one might you use as the primary key?
  What are the superkeys?
Entity Integrity
Entity integrity: the primary key attributes (PK) of each relation schema R cannot have null values in any tuple of r(R).
Other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.
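A sketch of an entity integrity check in Python, assuming `None` represents null. The function name and the sample rows are invented for this illustration.

```python
def null_primary_keys(rows, attributes, primary_key):
    """Return the rows that have a null (None) in any primary-key attribute."""
    idx = [attributes.index(a) for a in primary_key]
    return [row for row in rows if any(row[i] is None for i in idx)]

attributes = ("sid", "name", "gpa")
rows = [
    ("53666", "Jones", 3.4),
    (None, "Smith", 3.2),      # violates entity integrity: sid is null
    ("53650", None, 3.8),      # allowed here: name is not part of the primary key
]

violations = null_primary_keys(rows, attributes, ("sid",))
print(violations)
```

Passing `("sid", "name")` as the second argument's key list would flag the third row as well, which mirrors declaring name as not-null in addition to the primary key.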
Foreign Keys, Referential Integrity
Foreign key: a set of fields in one relation that is used to "refer" to a tuple in another relation. (Must correspond to the primary key of the second relation.) Like a "logical pointer".
Foreign key constraint: the foreign key in the referencing relation must match the primary key of the referenced relation.
E.g. sid is a foreign key referring to Students:
  Students(sid: string, name: string, gpa: float)
  Enrolled(sid: string, cid: string, grade: string)
If all foreign key constraints are enforced, referential integrity is achieved, i.e., no dangling references.
Foreign Key constraints
Only students listed in the Students relation should be allowed to enroll for courses.
Students
sid     name    login        age   gpa
53666   Jones   jones@cs     18    3.4
53688   Smith   smith@eecs   18    3.2
53650   Smith   smith@math   19    3.8

Enrolled
sid     cid           grade
53666   Carnatic101   C
53666   Reggae203     B
53650   Topology112   A
53666   History105    B

Possible violation: add <50000, History105, B> to Enrolled.
Possible violation: delete <53650, Smith, …> from Students.
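Both violations can be reproduced with a small Python sketch that looks for dangling references. The `dangling_references` helper is hypothetical; the data is the Students and Enrolled tables above, with sid stored as a string.

```python
students = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}
enrolled = {
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
}

def dangling_references(referencing, fk_index, referenced, pk_index):
    """Rows of `referencing` whose foreign key matches no primary key in `referenced`."""
    pk_values = {row[pk_index] for row in referenced}
    return {row for row in referencing if row[fk_index] not in pk_values}

ok = dangling_references(enrolled, 0, students, 0)

# Violation 1: enroll a student who is not in Students.
after_insert = dangling_references(
    enrolled | {("50000", "History105", "B")}, 0, students, 0)

# Violation 2: delete student 53650 while an Enrolled row still refers to it.
remaining = {row for row in students if row[0] != "53650"}
after_delete = dangling_references(enrolled, 0, remaining, 0)

print(ok, after_insert, after_delete)
```

A database system would normally reject the insert, and would either reject the delete or propagate it to the referencing rows; this sketch only detects the inconsistency after the fact.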
Other Types of Constraints
Semantic integrity constraints: based on application semantics and cannot be expressed by the model per se.
  E.g., "the max. no. of hours per employee for all projects he or she works on is 56 hrs per week"
A constraint specification language may have to be used to express these.
SQL-99 provides triggers and ASSERTIONS to express some of these.
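To make the 56-hour example concrete, here is a sketch of the check written as application code in Python rather than as a trigger or assertion. The `works_on` relation, its (employee, project, hours) layout, and the sample rows are assumptions made for this illustration.

```python
from collections import defaultdict

MAX_HOURS_PER_WEEK = 56

def overworked(works_on):
    """Return employees whose total weekly hours across all projects exceed the limit."""
    totals = defaultdict(float)
    for employee, _project, hours in works_on:
        totals[employee] += hours
    return {e for e, total in totals.items() if total > MAX_HOURS_PER_WEEK}

# Hypothetical instance of works_on(employee, project, hours)
works_on = {
    ("e1", "p1", 30.0),
    ("e1", "p2", 30.0),   # e1 totals 60 hours: violates the constraint
    ("e2", "p1", 40.0),
}

violators = overworked(works_on)
print(violators)
```

The constraint spans several tuples at once, which is why domain, key, and foreign key declarations cannot express it.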