Upload: katherine-rodgers

Post on 27-Dec-2015


Page 1: 1 CS351 Introduction to Database systems quan.wang@wsu.edu


CS351 Introduction to Database systems

quan.wang@wsu.edu


About me

Quan Wang
– developer at Oracle Corp.
  • JDBC
  • Web Services
  • Cloud


Content

Introduction
– Why databases?

Data Model


Why databases?

It used to be about boring stuff: employee records, bank records, etc.

Today, the field covers all the largest sources of data, with many new ideas:
– Web search
– Data mining
– Scientific and medical databases
– Integrating information


Why databases?

You may not notice it, but databases are behind almost everything you do on the Web:
– Google searches
– Queries at Amazon, eBay, etc.


Why databases?

Databases often have unique concurrency-control problems.
– Many activities (transactions) run against the database at all times.
– Actions must not be confused, e.g., two withdrawals from the same account must each debit the account.
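
As a minimal sketch of the withdrawal example, the Python snippet below uses the standard sqlite3 module with a hypothetical accounts table (the table name, column names, and amounts are assumptions for illustration, not part of the course material). Each withdrawal is a single UPDATE inside its own transaction, so both debits are applied. The two calls run one after the other here; the snippet only illustrates the expected outcome, not real concurrent scheduling.

```python
import sqlite3

# Hypothetical in-memory database with one account (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('David', 500.75)")
conn.commit()


def withdraw(conn, name, amount):
    """Debit the account in one transaction using a single UPDATE."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, name),
        )


# Two withdrawals from the same account: each one must debit the account.
withdraw(conn, "David", 100)
withdraw(conn, "David", 100)

final_balance = conn.execute(
    "SELECT balance FROM accounts WHERE name = 'David'"
).fetchone()[0]
print(final_balance)
```

Letting the database compute `balance - ?` avoids reading the balance into the program and writing back a stale value.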


Why databases?

Changing landscape
– System R: Oracle, MySQL, IBM, MS
– NoSQL: MongoDB
– NewSQL: VoltDB
– Cloud: Amazon RDS, Google BigQuery
– Big data: Hadoop, MapR


Why databases?

Job prospects
– Startups
– Massive industry


Content

Introduction
– Why databases?
– What to cover?

Data Model


What to cover?

Database
– a set of inter-related data pertaining to some organization
– e.g., airline, university, store

Database Management System (DBMS)
– one or more databases
– algorithms accessing the databases


What to cover?

Database theory
– relational model and algebra
– normal forms

Design of databases
– normalization
– E/R, UML


What to cover?

Database programming
– SQL, stored procedures

Integration?
– ODBC, JDBC
– O-R mapping


What is not covered?

Not covered
– DB tuning
– DB administration (DBA)
– DB implementation
– Programming languages
  • You may need to learn the languages and their DB-related aspects


Content

Introduction
– Why databases?
– What to cover?
– Logistics

Data Model


Logistics

Email: quan.wang@wsu.edu

Office hours
– Tuesday & Thursday
– 1 pm to 2 pm
– 201L


Logistics

No TA
No detailed comments on assignments
Distribute answer keys when appropriate
Check discrepancies
Raise issues


Logistics

Course page
– http://encs.vancouver.wsu.edu/~quwang
– weekly schedule, assignments

Submission page
– http://lms.wsu.edu
– Use drop box for assignments and project submission


Logistics

Assignments 30%
Team project 20%
Midterms 25%
Final 25%


Logistics

Team project
– mini-eBay
– Max 3 people per team
– Single grade for each team

Assignments and project
– Late submission: 10% penalty


Logistics

Databases
– MySQL
– SQLite
  • /usr/bin/sqlite3


Logistics

Exams
– Lectures, textbook, and assigned readings
– Only areas covered in the lectures
– Not curved


Content

Introduction
– Why databases?
– What to cover?
– Logistics

Data Model
– Why not flat files?


Why not flat files?

data

savings.dat

NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

operations
– check balance
– deposit or withdraw
– change address


Why not flat files?

data

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems
– code coupled with data format
  • changing delimiter would require code change
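
To make the coupling concrete, here is a small hypothetical reader for the savings records, written in Python. The function name `check_balance` and the in-memory list of lines are assumptions for illustration. The delimiter and the position of the balance field are baked into the code, so a change in the file format silently breaks the lookup.

```python
# Hypothetical flat-file reader: the delimiter and the column order
# are hard-coded, so the code is coupled to the data format.
SAVINGS_LINES = [
    "David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005",
    "Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006",
]


def check_balance(lines, name, delimiter=","):
    """Return the balance for `name`, or None if the name is not found."""
    for line in lines:
        fields = [field.strip() for field in line.split(delimiter)]
        if fields[0] == name:
            return float(fields[2])  # position 2 is BALANCE by convention
    return None


print(check_balance(SAVINGS_LINES, "Lisa"))

# If the file switches to ';' as its delimiter, the unchanged call no
# longer finds the record; every caller has to be updated.
semicolon_lines = [line.replace(",", ";") for line in SAVINGS_LINES]
print(check_balance(semicolon_lines, "Lisa"))
print(check_balance(semicolon_lines, "Lisa", delimiter=";"))
```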


Why not flat files?

data

Savings.data:
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

Checkings.data:
David, 789 NE Sandy Blvd, 1500, 08/20/1998, 0.001
Lisa, 432 NE Alberta St, 3500, 09/20/1990, 0

problems
– redundancy leads to inconsistency


Why not flat files?

data

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems
– concurrency
  • withdraw calls for file lock
    – withdraw $100 from the same account at the same time: synchronized
    – withdraw $100 from different accounts at the same time: synchronized unnecessarily


Why not flat files?

data

David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006

problems
– user interface
  • file based algorithms


Why not flat files?

summary
– code coupled with data
– redundancy leading to inconsistency
– low concurrency
– unfriendly user interface


Why not flat files?

database as abstraction
– data model hides data format
– SQL hides algorithms
– runtime handles concurrency


Content

Introduction

Data Model
– why not flat files?
– relational data model
  • relations
  • schema
  • keys


What is a Data Model?

1. Underlying structure of a database.
2. Mathematical representation of data.
   Examples: relational model = tables; semistructured model = trees/graphs; key-value model = key-value pairs.
3. Operations on data.
4. Constraints.


A Relation is a Table

Relation name: Beers

name         manf              (attributes = column headers)
Winterbrew   Pete’s            (tuples = rows)
Bud Lite     Anheuser-Busch


Schemas

Relation schema = relation name and attribute list.
– Optionally: types of attributes.
– Example: Beers(name, manf) or Beers(name: string, manf: string)

Database = collection of relations.

Database schema = set of all relation schemas in the database.


Why Relations?

Very simple model.
Often matches how we think about data.
Abstract model that underlies SQL, the most important database language today.


Example

Sells( bar, beer, price )

bar     beer     price
Joe’s   Bud      2.50
Joe’s   Miller   2.75
Sue’s   Bud      2.50
Sue’s   Coors    3.00

Bars( name, addr )

name    addr
Joe’s   Maple St.
Sue’s   River Rd.


Database Schemas in SQL

SQL is primarily a query language, for getting information from a database.

But SQL also includes a data-definition component, DDL, for describing database schemas.


Creating (Declaring) a Relation

Simplest form is:
    CREATE TABLE <name> (<list of elements>);

To delete a relation:
    DROP TABLE <name>;


Elements of Table Declarations

Most basic element: an attribute and its type.

The most common types are:
– INT or INTEGER (synonyms).
– REAL or FLOAT (synonyms).
– CHAR(n) = fixed-length string of n characters.
– VARCHAR(n) = variable-length string of up to n characters.


Example: Create Table

CREATE TABLE Sells (
    bar   CHAR(20),
    beer  VARCHAR(20),
    price REAL
);
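
One way to try this declaration is through Python's standard sqlite3 module, as in the sketch below. The in-memory database and the use of `PRAGMA table_info` to inspect the result are choices made for this illustration. Note that SQLite records the declared types but is lenient about enforcing lengths such as CHAR(20).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute("""
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL
    )
""")

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Sells)")]
print(columns)

# DROP TABLE removes the relation again.
conn.execute("DROP TABLE Sells")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(remaining)
```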


SQL Values

Integers and reals are represented as you would expect.

Strings are too, except they require single quotes.
– Two single quotes = real quote, e.g., 'Joe''s Bar'.

Any value can be NULL.
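
The quoting rule and NULL can be checked with a short Python sketch using the standard sqlite3 module. The Bars table follows the earlier example; the inserted row is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bars (name VARCHAR(20), addr VARCHAR(20))")

# Two single quotes inside a string literal stand for one real quote.
conn.execute("INSERT INTO Bars VALUES ('Joe''s Bar', NULL)")

name, addr = conn.execute("SELECT name, addr FROM Bars").fetchone()
print(name)  # the stored string contains one single-quote character
print(addr)  # SQL NULL comes back to Python as None
```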


Dates and Times

DATE and TIME are types in SQL.

The form of a date value is:
    DATE 'yyyy-mm-dd'
– Example: DATE '2007-09-30' for Sept. 30, 2007.


Times as Values

The form of a time value is:
    TIME 'hh:mm:ss'
with an optional decimal point and fractions of a second following.
– Example: TIME '15:30:02.5' = two and a half seconds after 3:30 PM.


Declaring Keys

An attribute or list of attributes may be declared PRIMARY KEY or UNIQUE.

Either says that no two tuples of the relation may agree in all the attribute(s) on the list.

There are a few distinctions to be mentioned later.


Declaring Single-Attribute Keys

Place PRIMARY KEY or UNIQUE after the type in the declaration of the attribute.

Example:
    CREATE TABLE Beers (
        name CHAR(20) UNIQUE,
        manf CHAR(20)
    );


Declaring Multiattribute Keys

A key declaration can also be another element in the list of elements of a CREATE TABLE statement.

This form is essential if the key consists of more than one attribute.
– May be used even for one-attribute keys.


Example: Multiattribute Key

The bar and beer together are the key for Sells:
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL,
        PRIMARY KEY (bar, beer)
    );
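
A quick way to see what the two-attribute key enforces is to insert rows through Python's standard sqlite3 module, as sketched below. The sample rows reuse the Sells data from the earlier example; the final row with price 3.00 is invented to trigger the violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sells (
        bar   CHAR(20),
        beer  VARCHAR(20),
        price REAL,
        PRIMARY KEY (bar, beer)
    )
""")

conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', 2.50)")
# Same bar, different beer: allowed.
conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Miller', 2.75)")
# Same beer, different bar: allowed.
conn.execute("INSERT INTO Sells VALUES ('Sue''s', 'Bud', 2.50)")

# Same (bar, beer) pair as the first row: rejected by the key.
try:
    conn.execute("INSERT INTO Sells VALUES ('Joe''s', 'Bud', 3.00)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Sells").fetchone()[0]
print(duplicate_rejected, row_count)
```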


PRIMARY KEY vs. UNIQUE

1. There can be only one PRIMARY KEY for a relation, but several UNIQUE attributes.

2. No attribute of a PRIMARY KEY can ever be NULL in any tuple. But attributes declared UNIQUE may have NULLs, and there may be several tuples with NULL.
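
The UNIQUE half of the second point can be observed in SQLite through Python's standard sqlite3 module; the sketch below is an illustration with made-up rows, and other database systems may treat NULLs under UNIQUE differently. The PRIMARY KEY half is not shown because SQLite is known to be lenient about NULLs in non-integer primary keys unless NOT NULL is also declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name CHAR(20) UNIQUE, manf CHAR(20))")

# Several tuples may have NULL in a UNIQUE attribute.
conn.execute("INSERT INTO Beers VALUES (NULL, 'Pete''s')")
conn.execute("INSERT INTO Beers VALUES (NULL, 'Anheuser-Busch')")
conn.execute("INSERT INTO Beers VALUES ('Bud Lite', 'Anheuser-Busch')")

# A repeated non-NULL value is still rejected.
try:
    conn.execute("INSERT INTO Beers VALUES ('Bud Lite', 'Other')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute(
    "SELECT COUNT(*) FROM Beers WHERE name IS NULL"
).fetchone()[0]
print(duplicate_rejected, null_rows)
```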


Content

Introduction

Data Model
– why not flat files?
– relational data model
  • relations
  • schema
  • keys
– why relations?


Why relations?

data

savings.dat

NAME,ADDR,BALANCE,OPENED,INTEREST
David, 123 NW Flanders St, 500.75, 08/20/2010, 0.005
Lisa, 432 NE Alberta St, 4000.00, 09/20/2001, 0.006


Terminology

DDL: Data Definition Language
– CREATE TABLE

DML: Data Manipulation Language
– INSERT

SQL: Structured Query Language
– SELECT...FROM...WHERE


Why relations?

schema

    Savings(
        name     CHAR(50),
        addr     CHAR(50),
        balance  REAL,
        opened   DATE,
        interest REAL);


Why relations?

relation

name    addr                 balance   opened   interest
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006

operations
– check balance
– deposit or withdraw
– change address


Why relations?

relation

advantage
– code coupled with data format?

name    addr                 balance   opened   interest
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006


Why relations?

relation

advantage
– concurrency
  • simultaneous withdraw calls

name    addr                 balance   opened   interest
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006


Why relations?

relations

advantage – redundancy?

Savings(name, addr, balance, opened, interest)

Checkings(name, addr, balance, opened, interest)


Why relations?

relations

advantage
– redundancy?

Savings

name    addr                 balance   opened   interest
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006

Checkings

name    addr                 balance   opened   interest
David   123 Sandy Blvd       1500      1998     0.001
Lisa    432 Alberta St       3500      1990     0


Why relations?

relations

advantage – redundancy?

Customers(cust_id, name, addr)

Savings(cust_id, balance, opened, interest)

Checkings(cust_id, balance, opened, interest)


Why relations?

relations

advantage
– redundancy?

Customers

cust_id   name    addr
c1        David   123 NW Flanders St
c2        Lisa    432 Alberta St

Savings

cust_id   balance   opened   interest
c1        500.75    2010     0.005
c2        4000.00   2001     0.006

Checkings

cust_id   balance   opened   interest
c1        1200      1998     0.001
c2        3500      1990     0
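
The effect of splitting out Customers can be sketched with Python's standard sqlite3 module. The column types, the new address value, and the helper `address_via` are assumptions made for this illustration. Because the address is stored once, a single UPDATE changes what both account tables see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cust_id CHAR(10) PRIMARY KEY, name CHAR(50), addr CHAR(50));
    CREATE TABLE Savings   (cust_id CHAR(10), balance REAL, opened INT, interest REAL);
    CREATE TABLE Checkings (cust_id CHAR(10), balance REAL, opened INT, interest REAL);

    INSERT INTO Customers VALUES ('c1', 'David', '123 NW Flanders St');
    INSERT INTO Customers VALUES ('c2', 'Lisa',  '432 Alberta St');
    INSERT INTO Savings   VALUES ('c1', 500.75, 2010, 0.005);
    INSERT INTO Savings   VALUES ('c2', 4000.00, 2001, 0.006);
    INSERT INTO Checkings VALUES ('c1', 1200, 1998, 0.001);
    INSERT INTO Checkings VALUES ('c2', 3500, 1990, 0);
""")

# Change David's address once, in Customers only (value is made up).
conn.execute("UPDATE Customers SET addr = '789 NE Sandy Blvd' WHERE cust_id = 'c1'")


def address_via(account_table):
    """Look up David's address by joining Customers with an account table."""
    query = (
        "SELECT c.addr FROM Customers c "
        f"JOIN {account_table} a ON a.cust_id = c.cust_id "
        "WHERE c.name = 'David'"
    )
    return conn.execute(query).fetchone()[0]


savings_addr = address_via("Savings")
checkings_addr = address_via("Checkings")
print(savings_addr, checkings_addr)
```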


Why relations?

relation

advantage
– user interface
  • natural-language-like support desirable

name    addr                 balance   opened   interest
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006


Why relations?

relation

advantage
– user interface

    SELECT name, addr
    FROM savings
    WHERE balance > 2000

NAME    ADDR                 BALANCE   OPENED   INTEREST
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006


Why relations?

relation

advantage
– user interface

    SELECT name, addr, balance
    FROM savings
    ORDER BY balance

NAME    ADDR                 BALANCE   OPENED   INTEREST
David   123 NW Flanders St   500.75    2010     0.005
Lisa    432 Alberta St       4000.00   2001     0.006
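
Both queries can be run against the two sample rows with Python's standard sqlite3 module, as in the sketch below. The column types (in particular storing opened as an integer year) are assumptions for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE savings (
        name CHAR(50), addr CHAR(50), balance REAL, opened INT, interest REAL
    )
""")
conn.executemany(
    "INSERT INTO savings VALUES (?, ?, ?, ?, ?)",
    [
        ("David", "123 NW Flanders St", 500.75, 2010, 0.005),
        ("Lisa", "432 Alberta St", 4000.00, 2001, 0.006),
    ],
)

# Selection: only rows with a balance above 2000.
over_2000 = conn.execute(
    "SELECT name, addr FROM savings WHERE balance > 2000"
).fetchall()

# Ordering: all rows, sorted by ascending balance.
by_balance = conn.execute(
    "SELECT name, addr, balance FROM savings ORDER BY balance"
).fetchall()

print(over_2000)
print(by_balance)
```

The queries state what is wanted; how the rows are scanned, filtered, and sorted is left to the DBMS.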


Content

Introduction

Data Model
– why not flat files?
– relational data model
– why relations?


Content

Introduction

Data Model
– why not flat files?
– relational data model
– why relations?
– semi-structured data models
  • XML and JSON


XML

<Book>
  <Title>Parsing Techniques</Title>
  <Authors>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
  </Authors>
  <Date>2007</Date>
  <Publisher>Springer</Publisher>
</Book>


JSON

{
  "Book": {
    "Title": "Parsing Techniques",
    "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
    "Date": "2007",
    "Publisher": "Springer"
  }
}


XML and JSON, side-by-side

<Book>
  <Title>Parsing Techniques</Title>
  <Authors>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
  </Authors>
  <Date>2007</Date>
  <Publisher>Springer</Publisher>
</Book>

{
  "Book": {
    "Title": "Parsing Techniques",
    "Authors": [ "Dick Grune", "Ceriel J.H. Jacobs" ],
    "Date": "2007",
    "Publisher": "Springer"
  }
}
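
To see that the two documents carry the same information, the sketch below parses both with Python's standard library (`xml.etree.ElementTree` and `json`) and compares the results. The dictionary layout used for the XML side is a choice made for this illustration.

```python
import json
import xml.etree.ElementTree as ET

XML_TEXT = """
<Book>
  <Title>Parsing Techniques</Title>
  <Authors>
    <Author>Dick Grune</Author>
    <Author>Ceriel J.H. Jacobs</Author>
  </Authors>
  <Date>2007</Date>
  <Publisher>Springer</Publisher>
</Book>
"""

JSON_TEXT = """
{ "Book": { "Title": "Parsing Techniques",
            "Authors": ["Dick Grune", "Ceriel J.H. Jacobs"],
            "Date": "2007",
            "Publisher": "Springer" } }
"""

# XML: walk the element tree by tag name and build a dict by hand.
book_xml = ET.fromstring(XML_TEXT)
xml_record = {
    "Title": book_xml.findtext("Title"),
    "Authors": [author.text for author in book_xml.findall("Authors/Author")],
    "Date": book_xml.findtext("Date"),
    "Publisher": book_xml.findtext("Publisher"),
}

# JSON: the parser already returns nested dicts and lists.
json_record = json.loads(JSON_TEXT)["Book"]

print(xml_record == json_record)
```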


XML Schema

<xs:element name="Book">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Title" type="xs:string" />
      <xs:element name="Authors">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Author" type="xs:string" maxOccurs="5"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="Date" type="xs:gYear" />
      <xs:element name="Publisher" minOccurs="0">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Springer" />
            <xs:enumeration value="MIT Press" />
            <xs:enumeration value="Harvard Press" />
          </xs:restriction>
        </xs:simpleType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>


JSON Schema

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "Book": {
      "type": "object",
      "properties": {
        "Title": {"type": "string"},
        "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": {"type": "string"}},
        "Date": {"type": "string", "pattern": "^[0-9]{4}$"},
        "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}
      },
      "required": ["Title", "Authors", "Date"],
      "additionalProperties": false
    }
  },
  "required": ["Book"],
  "additionalProperties": false
}


String type

JSON Schema:
    "Title": {"type": "string"}

XML Schema:
    <xs:element name="Title" type="xs:string" />


List type

JSON Schema:
    "Authors": {"type": "array", "minItems": 1, "maxItems": 5, "items": {"type": "string"}}

XML Schema:
    <xs:element name="Authors">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Author" type="xs:string" maxOccurs="5"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>


Year type

JSON Schema:
    "Date": {"type": "string", "pattern": "^[0-9]{4}$"}

XML Schema:
    <xs:element name="Date" type="xs:gYear" />


Enumeration type

JSON Schema:
    "Publisher": {"type": "string", "enum": ["Springer", "MIT Press", "Harvard Press"]}

XML Schema:
    <xs:element name="Publisher" minOccurs="0">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="Springer" />
          <xs:enumeration value="MIT Press" />
          <xs:enumeration value="Harvard Press" />
        </xs:restriction>
      </xs:simpleType>
    </xs:element>


Semi-Structured Data Models

XML Adoption
– Web Services (SOAP)
– XML DB

JSON Adoption
– Web Services (REST)
– MongoDB


Content

Introduction

Data Model
– why not flat files?
– relational data model
– why relations?
– semi-structured data models
– history


History

Hierarchical (IMS): late 1960’s and 1970’s
Network (CODASYL): 1970’s
Relational: 1970’s and early 1980’s
Entity-Relationship: 1970’s
Object-oriented: late 1980’s and early 1990’s
Object-relational: late 1980’s and early 1990’s
Semi-structured (XML): late 1990’s to the present