relational databases week 8 information technologies 17:610:550:01 - fall 2008 -

Post on 31-Mar-2015


Relational databases

Week 8

Information Technologies 17:610:550:01

- Fall 2008 -

Announcements

• Quiz grades are in

• Assignment 3 is due next week

• MS Access 2007 CDs available

Agenda

• Recap Excel Tables

• Relational Databases
– Basic concepts
– MS Access

• End of class feedback questionnaire

Last class we covered…

• … Basic Excel capabilities
– Automatic calculation using formulas/functions
– Making sense of data
  • Using conditional formatting
  • Charts

• But there is much more, which we will not cover in this class but will give the gist of
– Working with large amounts of data: tables
– Making sense of data
  • Sort/Filter
  • Pivot Table

Excel Tables

First row contains field names

Each row is a record

Excel Tables

• A table is an area in the worksheet that contains rows and columns of similar or related information
– Can be used as part of a database or organized collection of related information
– Worksheet rows represent the records; worksheet columns represent the fields in a record

• The first row contains the column labels or field names
– Identifies data to be entered in the columns

• Each row in the table contains a record

Excel Tables

• Every cell in the table area, except the field names, contains a specific value for a specific field in a specific record

• Every record (row) contains the same fields (columns) in the same order as every other record

Create Tables

• Create table from data already in a spreadsheet:
– Select the range of cells that contains the data
– Click the Insert tab and click Table in the Tables group
  • The Create Table dialog box appears; make appropriate changes
– Click OK to complete the table creation and display the contextual Design tab

• Create table and then add the data:
– Select a range of cells on a sheet
– Click the Insert tab and click Table in the Tables group
  • The Create Table dialog box appears asking for the range of data for the table
– Click OK to display the contextual Design tab

Sorting Data

• Sorting arranges records in a table by the value in field(s) within a table

• The sort command puts lists in ascending or descending order according to specified keys

• Keys are the fields on which records are sorted
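The idea of sort keys can also be sketched outside Excel. The short Python example below is only an illustration: the records and their values are invented, and the field names (instrument, class) are assumptions chosen to echo the slides.

```python
from operator import itemgetter

# Hypothetical records (invented values); each dict is one record (row).
records = [
    {"name": "Ana", "instrument": "violin", "class": "junior"},
    {"name": "Ben", "instrument": "cello", "class": "senior"},
    {"name": "Cy", "instrument": "violin", "class": "freshman"},
    {"name": "Di", "instrument": "cello", "class": "junior"},
]

# Single-key sort: ascending by the "instrument" field.
by_instrument = sorted(records, key=itemgetter("instrument"))

# Multiple-level sort: "instrument" is the first key, "name" breaks ties.
by_instrument_then_name = sorted(records, key=itemgetter("instrument", "name"))

# Descending order on one key.
by_name_desc = sorted(records, key=itemgetter("name"), reverse=True)

print([r["name"] for r in by_instrument_then_name])
```

Python's sort is stable, so records with equal keys keep their original relative order.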

Sorting Data (continued)

Sorted by instrument Sorted by class

Multiple Level Sorts

Filtering and Totaling Data

• Data refers to a fact or facts about a specific record or sets of records

• Information is data that has been arranged in some form and viewed as useful

Use AutoFilters

• A quick way to display a subset of data from a table

• Filtered data displays only the records that meet the criteria you specify

• To apply a simple AutoFilter to a data table, click the arrow in the column header

Using AutoFilters (continued)

Filter drop-down list

List filtered to display only juniors

Agenda

• Recap Excel Tables

• Relational Databases
– Basic concepts
– MS Access

• End of class feedback questionnaire

What is a database system?

• Database:
– a large, integrated collection of data

• Models something about the real world
– Entities (e.g., teams, games)
– Relationships (e.g., the Red Sox won the World Series)

• A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases
– Today’s focus on relational databases

And here…

Is the WWW a DBMS?

• Fairly sophisticated search available
– Crawler indexes pages on the Web
– Keyword-based search for pages

• But, currently
– Data is mostly unstructured and untyped
– Can’t modify the data
– Can’t get summaries, complex combinations of data
– Few guarantees provided for freshness of data, consistency across data items, fault tolerance, …

• The picture is changing
– New standards, e.g., XML, Semantic Web, etc., can provide richer models of data

Database Basics

• What is a database?
– Collection of data, organized to support access
– Models some aspects of reality

• Components of a relational database:
– Field = an “atomic” unit of data
– Record = a collection of related fields
– Table = a collection of related records
  • Each record is one row in the table
  • Each field is one column in the table
– Database = a collection of tables
– Primary Key = the field that uniquely identifies a record

A Simple Example

Name DOB SSN

John Doe 04/15/1970 153-78-9082

Jane Smith 08/31/1985 768-91-2376

Mary Adams 11/05/1972 891-13-3057

Field

Field Name

Record

Primary Key

Table
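As a rough sketch of how these terms map onto a real table, the example below builds the same three records with Python's built-in sqlite3 module. It assumes SSN is the field labeled as the primary key; the table name `person` and the ISO date format are choices made here for illustration, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Table = collection of records; each column is a field.
conn.execute("""
    CREATE TABLE person (
        name TEXT NOT NULL,
        dob  TEXT NOT NULL,      -- stored as an ISO date string (assumption)
        ssn  TEXT PRIMARY KEY    -- assumed primary key: unique per record
    )
""")

rows = [
    ("John Doe", "1970-04-15", "153-78-9082"),
    ("Jane Smith", "1985-08-31", "768-91-2376"),
    ("Mary Adams", "1972-11-05", "891-13-3057"),
]
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", rows)

# The primary key must be unique, so a second record with John Doe's SSN fails.
try:
    conn.execute(
        "INSERT INTO person VALUES (?, ?, ?)",
        ("Someone Else", "2000-01-01", "153-78-9082"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The primary key identifies exactly one record.
record = conn.execute(
    "SELECT name, dob FROM person WHERE ssn = ?", ("768-91-2376",)
).fetchone()
print(duplicate_rejected, record)
```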

Why “Relational”?

• Databases model some aspects of reality

• A relational database is a database that groups data using common attributes found in the data set
– The resulting "clumps" of organized data are much easier for people to understand
– The grouping uses the relational model

• MS Access is a relational database management system, or RDBMS

Relational Database Terminology

Relational DB terminology    MS Access

relation, base relvar        table

derived relvar               query

tuple                        row/record

attribute                    column/field

• A relation is defined as a set of tuples that have the same attributes.

• A relation is usually described as a table, which is organized into rows and columns.

• All the data referenced by an attribute are in the same domain and conform to the same constraints.

The Registrar Example

• What do we need to know (i.e., model)?
– Something about the students (e.g., first name, last name, email, department)
– Something about the courses (e.g., course ID, description, enrolled students, grades)
– Which students are in which courses

A First Try

Put everything in a big table…

Discussion: Why is this a bad idea?

1  Arrows  John   EE    EE          lis550   Information Technology  90  jarrows@wam
1  Arrows  John   EE    Elec Engin  ee750    Communication           95  ja_2002@yahoo
2  Peters  Kathy  HIST  HIST        lis550   Informatino Technology  95  kpeters2@wam
2  Peters  Kathy  HIST  history     hist405  American History        80  kpeters2@wma
3  Smith   Chris  HIST  history     hist405  American History        90  smith2002@glue
4  Smith   John   CLIS  Info Sci    lis550   Information Technology  98  js03@wam

Goals of “Normalization”

• Remove duplicates/Save space
– Save each fact only once

• More rapid updates
– Every fact only needs to be updated once

• More rapid search
– Finding something once is good enough

• Avoid inconsistency
– Changing data once changes it everywhere

Another Try...

Department Table

Department ID  Department
EE             Electrical Engineering
HIST           History
CLIS           Information Studies

Course Table

Course ID  Course Name
lis550     Information Technology
ee750      Communication
hist405    American History

Enrollment Table

Student ID  Course ID  Grade
1           lis550     90
1           ee750      95
2           lis550     95
2           hist405    80
3           hist405    90
4           lis550     98

Student Table

Student ID  Last Name  First Name  Department ID  email
1           Arrows     John        EE             jarrows@wam
2           Peters     Kathy       HIST           kpeters2@wam
3           Smith      Chris       HIST           smith2002@glue
4           Smith      John        CLIS           js03@wam

Approaches to Normalization

• For simple problems (like the homework):
– Start with “binary relationships”: pairs of fields that are related
– Group together wherever possible
– Add keys where necessary

• For more complicated problems:
– Entity relationship modeling

Some Lingo

• “Primary Key” uniquely identifies a record
– e.g., student ID in the student table

• “Foreign Key” is a field that holds the primary key of another table
– It need not be unique in this table

The Data Model

Department Table

Department ID  Department
EE             Electrical Engineering
HIST           History
CLIS           Information Studies

Course Table

Course ID  Course Name
lis550     Information Technology
ee750      Communication
hist405    American History

Enrollment Table

Student ID  Course ID  Grade
1           lis550     90
1           ee750      95
2           lis550     95
2           hist405    80
3           hist405    90
4           lis550     98

Student Table

Student ID  Last Name  First Name  Department ID  email
1           Arrows     John        EE             jarrows@wam
2           Peters     Kathy       HIST           kpeters2@wam
3           Smith      Chris       HIST           smith2002@glue
4           Smith      John        CLIS           js03@wam

Primary key

Primary key

Primary key

Primary key

Foreign key
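One possible way to write this data model down is sketched below with Python's sqlite3 module. The column names, the composite key on the enrollment table, and the choice of which fields are foreign keys are assumptions made for illustration; the course ID lis550 follows the earlier slides. The last part illustrates the normalization goal that changing data once changes it everywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        department_id TEXT PRIMARY KEY,
        department    TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id    INTEGER PRIMARY KEY,
        last_name     TEXT NOT NULL,
        first_name    TEXT NOT NULL,
        department_id TEXT REFERENCES department(department_id),  -- foreign key
        email         TEXT
    );
    CREATE TABLE course (
        course_id   TEXT PRIMARY KEY,
        course_name TEXT NOT NULL
    );
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  TEXT REFERENCES course(course_id),
        grade      INTEGER,
        PRIMARY KEY (student_id, course_id)  -- assumed composite key
    );
""")

conn.executemany("INSERT INTO department VALUES (?, ?)", [
    ("EE", "Electrical Engineering"), ("HIST", "History"),
    ("CLIS", "Information Studies"),
])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    (1, "Arrows", "John", "EE", "jarrows@wam"),
    (2, "Peters", "Kathy", "HIST", "kpeters2@wam"),
    (3, "Smith", "Chris", "HIST", "smith2002@glue"),
    (4, "Smith", "John", "CLIS", "js03@wam"),
])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("lis550", "Information Technology"), ("ee750", "Communication"),
    ("hist405", "American History"),
])
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", [
    (1, "lis550", 90), (1, "ee750", 95), (2, "lis550", 95),
    (2, "hist405", 80), (3, "hist405", 90), (4, "lis550", 98),
])

# Each fact is stored once, so renaming a course touches exactly one row ...
renamed = conn.execute(
    "UPDATE course SET course_name = ? WHERE course_id = ?",
    ("Information Technologies", "lis550"),
).rowcount

# ... yet every lis550 enrollment sees the new name through the course table.
names_seen = conn.execute("""
    SELECT e.student_id, c.course_name
    FROM enrollment AS e
    JOIN course AS c ON c.course_id = e.course_id
    WHERE e.course_id = 'lis550'
    ORDER BY e.student_id
""").fetchall()
print(renamed, names_seen)
```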

Relational operations: Join

“Joined” Table

Student ID  Last Name  First Name  Dept ID  Department              email
1           Arrows     John        EE       Electrical Engineering  jarrows@wam
2           Peters     Kathy       HIST     History                 kpeters2@wam
3           Smith      Chris       HIST     History                 smith2002@glue
4           Smith      John        CLIS     Information Studies     js03@wam

Student Table

Student ID  Last Name  First Name  Department ID  email
1           Arrows     John        EE             jarrows@wam
2           Peters     Kathy       HIST           kpeters2@wam
3           Smith      Chris       HIST           smith2002@glue
4           Smith      John        CLIS           js03@wam

Department Table

Department ID  Department
EE             Electrical Engineering
HIST           History
CLIS           Information Studies

Relational Operations: Project

SELECT Student ID, Department

Student ID  Last Name  First Name  Dept ID  Department              email
1           Arrows     John        EE       Electrical Engineering  jarrows@wam
2           Peters     Kathy       HIST     History                 kpeters2@wam
3           Smith      Chris       HIST     History                 smith2002@glue
4           Smith      John        CLIS     Information Studies     js03@wam

Student ID  Department
1           Electrical Engineering
2           History
3           History
4           Information Studies

Relational operations: Restrict

Student ID  Last Name  First Name  Dept ID  Department              email
1           Arrows     John        EE       Electrical Engineering  jarrows@wam
2           Peters     Kathy       HIST     History                 kpeters2@wam
3           Smith      Chris       HIST     History                 smith2002@glue
4           Smith      John        CLIS     Information Studies     js03@wam

WHERE Department ID = “HIST”

Student ID  Last Name  First Name  Department ID  Department  email
2           Peters     Kathy       HIST           History     kpeters2@wam
3           Smith      Chris       HIST           History     smith2002@glue

Relational Operations

• Joining tables: JOIN

• Choosing columns: SELECT
– Based on their label

• Choosing rows: WHERE
– Based on their contents

• These can be specified together

department ID = “HIST”

SELECT Student ID, Dept WHERE Dept = “History”
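The three operations can be combined in a single SQL statement. The sketch below uses Python's sqlite3 module with the student and department tables from the slides; the column names (student_id, department_id, and so on) are assumptions, since real SQL identifiers cannot contain unquoted spaces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        department_id TEXT PRIMARY KEY,
        department    TEXT
    );
    CREATE TABLE student (
        student_id    INTEGER PRIMARY KEY,
        last_name     TEXT,
        first_name    TEXT,
        department_id TEXT,
        email         TEXT
    );
    INSERT INTO department VALUES
        ('EE', 'Electrical Engineering'),
        ('HIST', 'History'),
        ('CLIS', 'Information Studies');
    INSERT INTO student VALUES
        (1, 'Arrows', 'John', 'EE', 'jarrows@wam'),
        (2, 'Peters', 'Kathy', 'HIST', 'kpeters2@wam'),
        (3, 'Smith', 'Chris', 'HIST', 'smith2002@glue'),
        (4, 'Smith', 'John', 'CLIS', 'js03@wam');
""")

history_students = conn.execute("""
    SELECT s.student_id, d.department                  -- project: choose columns
    FROM student AS s
    JOIN department AS d
      ON d.department_id = s.department_id             -- join: combine tables
    WHERE d.department = 'History'                     -- restrict: choose rows
    ORDER BY s.student_id
""").fetchall()
print(history_students)
```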

Querying a Database

• Queries allow us to ask a question about data and receive an answer back by returning a subset of the table data.

• Querying a database uses some or all of the relational operations mentioned previously (JOIN, SELECT, RESTRICT)

Database Integrity

• Registrar database must be internally consistent
– All enrolled students must have an entry in the student table
– All courses must have a name
– …

• What happens:
– When a student withdraws from the university?
– When a course is taken off the books?

Integrity Constraints

• Conditions that must be true of the database at any time
– Specified when the database is designed
– Checked when the database is modified

• RDBMS ensures that integrity constraints are always kept
– So that database contents remain faithful to the real world
– Helps avoid data entry errors
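A minimal sketch of constraints being specified at design time and checked at modification time, using Python's sqlite3 module rather than MS Access. The schema is a cut-down, assumed version of the registrar example. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; other systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id   TEXT PRIMARY KEY,
        course_name TEXT NOT NULL            -- all courses must have a name
    );
    CREATE TABLE enrollment (
        -- every enrolled student must have an entry in the student table
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  TEXT NOT NULL REFERENCES course(course_id),
        grade      INTEGER
    );
    INSERT INTO student VALUES (1, 'Arrows');
    INSERT INTO course VALUES ('lis550', 'Information Technology');
""")


def violates(sql, params=()):
    """Return True if the database rejects the statement as an integrity violation."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# Student 99 does not exist, so the enrollment is rejected.
unknown_student = violates("INSERT INTO enrollment VALUES (?, ?, ?)", (99, "lis550", 88))
# A course without a name is rejected.
nameless_course = violates("INSERT INTO course VALUES (?, ?)", ("ee750", None))
# A valid enrollment is accepted.
valid_rejected = violates("INSERT INTO enrollment VALUES (?, ?, ?)", (1, "lis550", 90))
# Deleting a student who still has enrollments is blocked by default.
withdrawal_blocked = violates("DELETE FROM student WHERE student_id = ?", (1,))
print(unknown_student, nameless_course, valid_rejected, withdrawal_blocked)
```

The last check is one possible answer to the question of what happens when a student withdraws: with no cascade rule defined, the delete is refused while enrollment records still point at the student.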

Discussion Point

• How is a relational database different from a spreadsheet?

Spreadsheet or Relational DB?

Use Relational DB when:

• You are working with large amounts of data

• You need to create relationships between your data

• You rely on external databases to analyze data

Use spreadsheets when:

• Your data is of a manageable size

• There is no need for relationships between data

• You are primarily creating calculations and statistics

Agenda

• Recap Excel Tables

• Relational Databases
– Basic concepts
– MS Access

• End of class feedback questionnaire

MS Access

• Intro to MS Access

• Relational Databases and Multi-table queries

Open a Database

Open Recent Documents list

• Choose Open to browse for a file or choose a database from the Recent Documents list

Open a Database

Open Recent Database list

Open a New database

Open from a Template

• When you first open MS Access this is the first screen you see.

MS Access Database Terminology

• Field

• Record

• Table

• Database

A database is made up of one or more tables

Individual tables in a database

Records

Individual fields

Objects

• Tables

• Queries

• Reports

• Forms

• Macros

• Modules

Objects

Work with Table Views

• Datasheet View – used to add, modify, delete and view records

• Design View – used to create and modify the fields in a table

Design View

Datasheet View

Work with Table Views

• Click the Home tab

• Click View from the View ribbon

Table View Options

Datasheet View

• Primary Key – a field that identifies each record as being unique

Primary Key

Design View

• Press F6 to switch between the upper and lower panes

Key symbol identifies primary key field

Set field properties in the lower pane

Backing-up and Renaming Access Files

• Save As – different in Access than other Office applications
– Save As saves only the current object, not the entire database

• To save a database with a new name you must either:
– Back up the database
– Copy, paste, and rename the database

Backing-up a Database

• Backing-up an Access file will produce a copy of your file with a default filename

Default filename of a backup file is the name of the database and the current date

Compact and Repair

Compact and Repair is located under the Manage menu

• Fixes problems due to inefficient file storage and growth of a database
– Should be performed every day
– Often decreases the file size by 50% or more

Filters

• Create a subset of records

• Do not change underlying table data

• Two types
– Filter by Selection
– Filter by Form

Filter by Selection

• Selects only the records that match pre-selected criteria

Table before filter by selection

Results of filter

Filter by selection being applied from pre-determined criteria

Applying and Removing a Filter

• Once a filter is applied, the Toggle Filter icon will be available

• The Toggle Filter icon can be used to apply and remove the current filter as many times as desired

Filter icon in the Sort and Filter group

Toggle Filter icon

Sorting Table Data

• Lists records in ascending or descending order according to one or more fields

Last Name field sorted ascending

Last Name field sorted descending

Recap: Excel or Access?

Use Access when:

• You are working with large amounts of data

• You need to create relationships between your data

• You rely on external databases to analyze data

Use Excel when:

• Your data is of a manageable size

• There is no need for relationships between data

• You are primarily creating calculations and statistics

Recap: Relational Database-RDBMS

• Relational database management systems allow data to be grouped based on common attributes. Data is grouped into tables and relationships are created between the tables

• This is much more efficient than a flat file, which is the opposite of an RDBMS. Flat files store data in one single file with no special groupings or collections

MS Access

• Intro to MS Access

• Relational Databases and Multi-table queries
– Tables/Database Design Consideration
– Creating Tables
– Creating relationships between tables
– Querying the Database

Table Design Considerations

Just as you first create a blueprint to build a house, you should first sketch or outline the design of a database table

Careful pre-planning will save you much time in the future

All design decisions are made in the Design View of the table

Design Considerations – Field Size Property

• Set the field size in Table Design View

• Always anticipate that the current field size may one day need to be larger

Set field size in the Field Properties grid of Table Design View

Design Considerations – Validation Rules

• Used to avoid data entry errors by restricting what can be entered

• Validation text can be used to provide an explanation of the type of data that is allowed in a field

• Eg: <>0 will not allow 0 to be a value of the field

Set validation rules in the Field Properties grid of Table Design View
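Outside Access, the closest common equivalent of a validation rule is a CHECK constraint. The sketch below, using Python's sqlite3 module, mirrors the <>0 example; the table `order_line`, its columns, and the message text are hypothetical names invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_line (
        item     TEXT NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity <> 0)  -- the validation rule
    )
""")


def try_insert(item, quantity):
    """Insert a row; return a message playing the role of validation text."""
    try:
        conn.execute("INSERT INTO order_line VALUES (?, ?)", (item, quantity))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected: quantity must not be 0"


ok = try_insert("pen", 3)
bad = try_insert("ink", 0)
print(ok, "|", bad)
```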

Design Considerations - Indexing

• Indexing speeds up sorting and searching

• Can be set to disallow duplicates, which is sometimes needed (can you think of an example?)

• Relates the field values to the records that contain the field value

Indexed Property
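As a hedged illustration of the two index settings (duplicates allowed versus duplicates disallowed), the sqlite3 sketch below indexes a last-name field normally and an email field uniquely. The index names and the student "Jones" are invented; an email address is just one plausible example of a field where duplicates should be disallowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        last_name  TEXT,
        email      TEXT
    );
    -- Ordinary index: speeds up lookups, duplicates are fine.
    CREATE INDEX idx_student_last_name ON student (last_name);
    -- Unique index: also rejects duplicate values.
    CREATE UNIQUE INDEX idx_student_email ON student (email);
    INSERT INTO student VALUES (3, 'Smith', 'smith2002@glue');
""")


def insert(student_id, last_name, email):
    """Return True if the row was stored, False if an index rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)",
                     (student_id, last_name, email))
        return True
    except sqlite3.IntegrityError:
        return False


same_last_name_ok = insert(4, "Smith", "js03@wam")   # duplicate last name allowed
same_email_ok = insert(5, "Jones", "js03@wam")       # duplicate email rejected
print(same_last_name_ok, same_email_ok)
```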

Design Considerations – Store Data in its Smallest part

• For greater flexibility, store data in its smallest part
– Instead of one field for an address, use many
– Instead of one field for a name, use two or three

Like this

Not like this

Design Consideration - Plan for Date Arithmetic

• Using a data type of date/time for all date fields allows the use of date arithmetic

Fields declared as a data type of Date/Time
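Why a real date type matters can be shown with a few lines of Python. The dates below are invented for illustration; the point is that date values support subtraction, addition, and correct comparison, while the same dates stored as text do not.

```python
from datetime import date, timedelta

# Hypothetical dates stored as real date values.
start = date(2008, 9, 2)
end = date(2008, 10, 21)

# Date arithmetic: subtracting two dates gives the number of days between them.
days_between = (end - start).days

# Adding an interval to a date gives a new date.
ninety_days_later = start + timedelta(days=90)

# Dates compare chronologically ...
dates_compare = date(2008, 10, 21) > date(2009, 9, 2)
# ... but the same dates stored as text compare character by character.
text_compares = "10/21/2008" > "09/02/2009"

print(days_between, ninety_days_later, dates_compare, text_compares)
```

The text comparison reports that October 2008 comes after September 2009, which is why sorting and filtering on dates stored as text gives misleading results.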

Design Considerations – Design Multiple Tables

• Using multiple tables helps reduce redundancy
– The process is also referred to as normalization

Multiple table tabs identify open tables

Multiple tables shown in the Navigation pane

MS Access

• Intro to MS Access

• Relational Databases and Multi-table queries
– Tables/Database Design Consideration
– Creating Tables
– Creating relationships between tables
– Querying the Database

Creating Tables – From the Create Tab

Enter data directly into a table, including the field names

Enter field names, data types and descriptions in Table Design View

Begin with a template

• When you create a new table, it is good practice to start with Table Design View to enter field names, data types, and properties

Creating Tables – From the Import Tab

• Click the application from which to import, or

• Choose the type of file you wish to import

Click the appropriate application button

Choose a file type to import

Recap: Work with Table Views

• Click the Home tab

• Click View from the View ribbon

Table View Options

Recap: Table Design View

Key symbol identifies primary key field

Set field properties in the lower pane

Create Tables – Specifying field names

• After choosing your method of creation, begin implementing the table design
– Use CamelCase notation for field names (e.g., LastName; i.e., no spaces)
– Specify data types
– Establish a primary key
– Consider the need for a foreign key

Table

Table Design View

Add field in Table View

Create Tables – Primary Key

• Tables are automatically created with an AutoNumber field which serves as the primary key

• To change the primary key
– Select a field in Table Design View
– Click the primary key icon

Primary Key Field

Primary Key icon

Create tables - Field Properties

• Field Properties can be used to specify characteristics for individual fields

• Located in the lower pane of Table Design View

Field Size property

Indexing

Recap: Datasheet View

• Here you enter values in your table after you designed your table in the Table Design View

• Primary Key – a field that identifies each record as being unique

Primary Key

MS Access

• Intro to MS Access

• Relational Databases and Multi-table queries
– Tables/Database Design Consideration
– Creating Tables
– Creating relationships between tables
– Querying the Database

Table Relationships

• The strength of Access is the fact that it is a relational database
– This means you can have multiple tables and create relationships between them
– This helps eliminate redundant data

Relationship between two tables

Primary key

Foreign key

Foreign Key

Customer ID - Primary Key in Customer Table
Customer ID will appear in only one record; there must be exactly one unique ID per customer

Customer ID - Regular Field in Orders Table
Customer ID may appear many times; one customer can place many orders

• Foreign key is used to establish relationships between tables

• Based on the above example:
– Customer ID is the foreign key in the Orders table
– This is referred to as a One to Many Relationship

Establishing Relationships - Using the Relationship Window

• Click the Database Tools tab and click the Relationships icon

• The first time you open it, the window will be empty and you need to add tables

• Add the tables or queries from the Show Table dialog box

Relationships icon

Show Table dialog box

Relationship window

Establishing Relationships

• In the Relationship window, click and drag a field name from one table to a field name in a related table

Click and drag to create a relationship

Establishing Relationships

• Enter the appropriate settings in the Edit relationships dialog box and click Create

• A join line will appear when one table is joined to another

Infinity symbol notes referential integrity has been applied

Set referential integrity and cascades

Referential Integrity

• Ensures that references between related data are accurate

• Established when creating the relationship between two tables

Enforce Referential Integrity

Cascades

• When active, data changed in one table that is in a relationship will be changed in its related tables

• Can be set when establishing relationships between tables

Cascade update and cascade delete
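Cascade update and cascade delete have direct SQL counterparts. The sketch below uses Python's sqlite3 module with a customer/orders pair like the one in the foreign key example; the key values ('C1', 'C2'), names, and items are invented, and SQLite needs `PRAGMA foreign_keys = ON` before it applies the cascade rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK handling
conn.executescript("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id TEXT NOT NULL
            REFERENCES customer(customer_id)
            ON UPDATE CASCADE      -- cascade update
            ON DELETE CASCADE,     -- cascade delete
        item        TEXT
    );
    INSERT INTO customer VALUES ('C1', 'Ada'), ('C2', 'Grace');
    INSERT INTO orders VALUES (10, 'C1', 'paper'), (11, 'C1', 'ink'), (12, 'C2', 'pens');
""")

# Cascade update: changing the primary key updates the related orders too.
conn.execute("UPDATE customer SET customer_id = 'C100' WHERE customer_id = 'C1'")
after_update = conn.execute(
    "SELECT order_id, customer_id FROM orders ORDER BY order_id"
).fetchall()

# Cascade delete: deleting the customer deletes that customer's orders.
conn.execute("DELETE FROM customer WHERE customer_id = 'C100'")
after_delete = conn.execute(
    "SELECT order_id FROM orders ORDER BY order_id"
).fetchall()
print(after_update, after_delete)
```

Cascade delete removes data silently, so it is worth deciding deliberately whether related records should disappear with their parent.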

MS Access

• Intro to MS Access

• Relational Databases and Multi-table queries
– Tables/Database Design Consideration
– Creating Tables
– Creating relationships between tables
– Querying the Database

Queries

• Queries allow us to ask questions about data

• The record set that answers our question is called a dataset

Employees table

Dataset resulting from querying table for only employees who are Sales Representatives

Create Queries - Using Query Design View

• From the Create tab, select Query Design from the Other group

• Two panes – the table pane and the design pane

• Striking the F6 key will toggle you between sections

Tables pane

Design pane

Create Tab

Select Query

• Searches associated tables and returns a dataset that matches the query parameters

• Changes made to the dataset will be reflected in the associated tables

Specifying Criteria in a Select Query

• Field row – displays the field name

• Sort row – enables you to sort the dataset

• Show row – controls whether or not you see a field in the dataset

• Criteria row – determines the records that will be selected for display

Fields in design grid allow us to specify criteria for the dataset

Specifying Criteria – Currency and Operators

• Specify criteria with currency
– Without the dollar sign
– With or without the decimal point

• Use comparison operators such as:
– Less than and greater than
– Equal to or not equal to

Greater than (>) operator

Currency amount entered without dollar sign

Specifying Criteria – Wildcards

• Asterisk - searches for a pattern that includes any number of characters in the position of the asterisk

• Question mark - searches for a pattern that includes a single character in the position of the question mark

Query with asterisk wildcard and resulting dataset

Query with question mark and asterisk wildcard and resulting dataset to specify criteria for the dataset
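The asterisk and question mark behave much like shell-style wildcards, which Python's standard fnmatch module also implements. The sketch below is only an analogy on an invented list of last names. One difference is an assumption worth stating: `fnmatchcase` is case-sensitive, whereas Access text criteria are normally not.

```python
from fnmatch import fnmatchcase

# Hypothetical field values to search.
last_names = ["Smith", "Smyth", "Smithson", "Peters", "Arrows"]

# Asterisk: any number of characters in that position.
starts_with_sm = [n for n in last_names if fnmatchcase(n, "Sm*")]

# Question mark: exactly one character in that position.
sm_any_th = [n for n in last_names if fnmatchcase(n, "Sm?th")]

# Both wildcards combined in one pattern.
sm_any_th_more = [n for n in last_names if fnmatchcase(n, "Sm?th*")]

print(starts_with_sm, sm_any_th, sm_any_th_more)
```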

Specifying Criteria – Null Values

• IS NULL finds only records that have no value

• IS NOT NULL excludes Null value records

Is Null criteria and resulting dataset

IS NOT NULL criteria and partial resulting dataset
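The same two criteria exist in SQL. In the sqlite3 sketch below, one student's email has been deliberately left empty (a change to the slide data, made only for illustration). The last query shows a common pitfall: comparing with `= NULL` matches nothing, which is why IS NULL exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        last_name  TEXT,
        email      TEXT
    );
    INSERT INTO student VALUES
        (1, 'Arrows', 'jarrows@wam'),
        (2, 'Peters', NULL),            -- no value entered (illustrative)
        (3, 'Smith', 'smith2002@glue');
""")

# IS NULL: only records that have no value in the field.
missing_email = conn.execute(
    "SELECT last_name FROM student WHERE email IS NULL"
).fetchall()

# IS NOT NULL: excludes the records with no value.
has_email = conn.execute(
    "SELECT last_name FROM student WHERE email IS NOT NULL ORDER BY student_id"
).fetchall()

# Pitfall: '= NULL' is never true, so this returns no rows.
equals_null = conn.execute(
    "SELECT last_name FROM student WHERE email = NULL"
).fetchall()
print(missing_email, has_email, equals_null)
```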

Specifying Criteria – And and Or

• OR finds records that can match one or more conditions

• AND finds records that must match all criteria specified

Or Criterion and resulting dataset

And criterion and resulting dataset

Copy a Query

• Sometimes we might want to create several queries that differ only by one or two attributes

• Right-click on the query and choose Copy from the shortcut menu

• Right-click and choose Paste

• In the Paste As dialog box, give the query a new name

Run a Query

• Running, or executing, a query is done by clicking the Run command

Run command

Creating Queries – Using the Query Wizard

• From the Create tab, choose Query Wizard from the Other group

• Choose query type from the New Query dialog box

Select Simple Query Wizard

Query Wizard icon

Creating Queries – Using the Query Wizard: continued

• Select the Table/Queries to include and choose the desired fields

• Select aggregate totals needed in the Summary Options box

Creating Queries – Using the Query Wizard: continued

• Title your query and open in Datasheet View or Query Design View

Sharing Data with Excel

• Data can be imported from Excel
– It may be appended to an existing table
– It may be used to create a new table

Excel icon External Data tab

Sharing Data with Excel

• Select the Excel file you would like to import

• Select how you would like to import the data
– Appended – added to the end of an existing table
– New table – creates a new table in a database
– Linked – creates a new table that is linked to the source file in Excel

Select the Source

Select the destination
