Relational databases
Week 8
Information Technologies 17:610:550:01
- Fall 2008 -
Announcements
• Quiz grades are in
• Assignment 3 is due next week
• MS Access 2007 CDs available
Agenda
• Recap Excel Tables
• Relational Databases
– Basic concepts
– MS Access
• End-of-class feedback questionnaire
Last class we covered…
• … Basic Excel capabilities
– Automatic calculation using formulas/functions
– Making sense of data
  • Using conditional formatting
  • Charts
• But there is much more that we will not cover in depth in this class; today we give only the gist
– Working with large amounts of data: tables
– Making sense of data
  • Sort/Filter
  • Pivot Table
Excel Tables
First row contains field names
Each row is a record
Excel Tables
• A table is an area in the worksheet that contains rows and columns of similar or related information
– Can be used as part of a database or organized collection of related information
– Worksheet rows represent the records; worksheet columns represent the fields in a record
• The first row contains the column labels or field names
– Identifies the data to be entered in the columns
• Each row in the table contains a record
Excel Tables
• Every cell in the table area, except the field names, contains a specific value for a specific field in a specific record
• Every record (row) contains the same fields (columns) in the same order as every other record
Create Tables
• Create a table from data already in a spreadsheet:
– Select the range of cells that contains the data
– Click the Insert tab and click Table in the Tables group
  • The Create Table dialog box appears; make appropriate changes
– Click OK to complete the table creation and display the contextual Design tab
• Create a table and then add the data:
– Select a range of cells on a sheet
– Click the Insert tab and click Table in the Tables group
  • The Create Table dialog box appears asking for the range of data for the table
– Click OK to display the contextual Design tab
Sorting Data
• Sorting arranges records in a table by the value in field(s) within a table
• The sort command puts lists in ascending or descending order according to specified keys
• Keys are the fields on which records are sorted
Sorting Data (continued)
Sorted by instrument
Sorted by class
Multiple Level Sorts
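The idea of sort keys can also be sketched outside Excel. The short Python example below is only an illustration: the roster and its field names (name, instrument, class) are made up, and it is not how Excel implements sorting. It shows a two-level sort (first key, then second key) and a descending sort.

```python
# Hypothetical roster; the names and values are illustrative only.
records = [
    {"name": "Ana", "instrument": "Flute", "class": "Junior"},
    {"name": "Ben", "instrument": "Drums", "class": "Senior"},
    {"name": "Cam", "instrument": "Flute", "class": "Freshman"},
    {"name": "Dee", "instrument": "Drums", "class": "Junior"},
]

# Multiple-level sort: the first key is instrument, the second key is name.
by_instrument_then_name = sorted(
    records, key=lambda r: (r["instrument"], r["name"])
)

# Single-key sort in descending order.
by_name_desc = sorted(records, key=lambda r: r["name"], reverse=True)

print([r["name"] for r in by_instrument_then_name])
print([r["name"] for r in by_name_desc])
```

Records with the same first key (same instrument) are ordered by the second key, which is what a multiple-level sort does.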
Filtering and Totaling Data
• Data refers to a fact or facts about a specific record or sets of records
• Information is data that has been arranged in some form and viewed as useful
Use AutoFilters
• A quick way to display a subset of data from a table
• Filtered data displays only the records that meet the criteria you specify
• To apply a simple AutoFilter to a data table, click the arrow in the column header
Using AutoFilters (continued)
Filter drop-down list
List filtered to display only juniors
Agenda
• Recap Excel Tables
• Relational Databases
– Basic concepts
– MS Access
• End-of-class feedback questionnaire
What is a database system?
• Database:
– A large, integrated collection of data
• Models something about the real world
– Entities (e.g., teams, games)
– Relationships (e.g., the Red Sox won the World Series)
• A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases
– Today’s focus is on relational databases
Databases Now…
And here…
Is the WWW a DBMS?
• Fairly sophisticated search available
– Crawler indexes pages on the Web
– Keyword-based search for pages
• But, currently
– Data is mostly unstructured and untyped
– Can’t modify the data
– Can’t get summaries, complex combinations of data
– Few guarantees provided for freshness of data, consistency across data items, fault tolerance, …
• The picture is changing
– New standards, e.g., XML, Semantic Web, etc., can provide richer models of data
Database Basics
• What is a database?
– Collection of data, organized to support access
– Models some aspects of reality
• Components of a relational database:
– Field = an “atomic” unit of data
– Record = a collection of related fields
– Table = a collection of related records
  • Each record is one row in the table
  • Each field is one column in the table
– Database = a collection of tables
– Primary Key = the field that uniquely identifies a record
A Simple Example
Name         DOB          SSN
John Doe     04/15/1970   153-78-9082
Jane Smith   08/31/1985   768-91-2376
Mary Adams   11/05/1972   891-13-3057
Field
Field Name
Record
Primary Key
Table
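As a rough sketch of the same example in code, the snippet below builds the three-record table with Python's built-in sqlite3 module (SQLite rather than MS Access, which is an assumption made so the example runs anywhere). The table name person and the extra record are hypothetical. It shows what "primary key" means in practice: a second record with an existing SSN is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Three fields per record; ssn is declared as the primary key.
conn.execute("CREATE TABLE person (name TEXT, dob TEXT, ssn TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("John Doe", "04/15/1970", "153-78-9082"),
        ("Jane Smith", "08/31/1985", "768-91-2376"),
        ("Mary Adams", "11/05/1972", "891-13-3057"),
    ],
)

# Hypothetical extra record that reuses John Doe's SSN.
try:
    conn.execute(
        "INSERT INTO person VALUES ('Someone Else', '01/01/1990', '153-78-9082')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the primary key must stay unique

record_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(duplicate_rejected, record_count)
```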
Why “Relational”?
• Databases model some aspects of reality
• A relational database is a database that groups data using common attributes found in the data set
– The resulting "clumps" of organized data are much easier for people to understand
– The grouping uses the relational model
• MS Access is a relational database management system, or RDBMS
Relational Database Terminology
Relational DB terminology    MS Access
relation, base relvar        table
derived relvar               query
tuple                        row/record
attribute                    column/field
• A relation is defined as a set of tuples that have the same attributes.
• A relation is usually described as a table, which is organized into rows and columns.
• All the data referenced by an attribute are in the same domain and conform to the same constraints.
The Registrar Example
• What do we need to know (i.e., model)?
– Something about the students (e.g., first name, last name, email, department)
– Something about the courses (e.g., course ID, description, enrolled students, grades)
– Which students are in which courses
A First Try
Put everything in a big table…
Discussion: Why is this a bad idea?
Student ID  Last Name  First Name  Department ID  Department  Course ID  Course Name             Grade  email
1           Arrows     John        EE             EE          lis550     Information Technology  90     jarrows@wam
1           Arrows     John        EE             Elec Engin  ee750      Communication           95     ja_2002@yahoo
2           Peters     Kathy       HIST           HIST        lis550     Informatino Technology  95     kpeters2@wam
2           Peters     Kathy       HIST           history     hist405    American History        80     kpeters2@wma
3           Smith      Chris       HIST           history     hist405    American History        90     smith2002@glue
4           Smith      John        CLIS           Info Sci    lis550     Information Technology  98     js03@wam
Goals of “Normalization”
• Remove duplicates/Save space
– Save each fact only once
• More rapid updates
– Every fact only needs to be updated once
• More rapid search
– Finding something once is good enough
• Avoid inconsistency
– Changing data once changes it everywhere
Another Try...
Department Table
Department ID  Department
EE             Electrical Engineering
HIST           History
CLIS           Information Studies

Course Table
Course ID  Course Name
lis550     Information Technology
ee750      Communication
hist405    American History

Enrollment Table
Student ID  Course ID  Grade
1           lis550     90
1           ee750      95
2           lis550     95
2           hist405    80
3           hist405    90
4           lis550     98

Student Table
Student ID  Last Name  First Name  Department ID  email
1           Arrows     John        EE             jarrows@wam
2           Peters     Kathy       HIST           kpeters2@wam
3           Smith      Chris       HIST           smith2002@glue
4           Smith      John        CLIS           js03@wam
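The four-table design can be sketched in code to show the payoff of normalization. This is a minimal illustration using Python's built-in sqlite3 module, not MS Access; the lowercase table and column names, the composite key on the enrollment table, and the replacement course name are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (department_id TEXT PRIMARY KEY, department TEXT);
CREATE TABLE course (course_id TEXT PRIMARY KEY, course_name TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT,
    department_id TEXT REFERENCES department(department_id), email TEXT);
-- Assumption: a student takes a given course at most once.
CREATE TABLE enrollment (
    student_id INTEGER REFERENCES student(student_id),
    course_id TEXT REFERENCES course(course_id),
    grade INTEGER,
    PRIMARY KEY (student_id, course_id));
""")
conn.executemany("INSERT INTO department VALUES (?, ?)", [
    ("EE", "Electrical Engineering"), ("HIST", "History"),
    ("CLIS", "Information Studies")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    ("lis550", "Information Technology"), ("ee750", "Communication"),
    ("hist405", "American History")])
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    (1, "Arrows", "John", "EE", "jarrows@wam"),
    (2, "Peters", "Kathy", "HIST", "kpeters2@wam"),
    (3, "Smith", "Chris", "HIST", "smith2002@glue"),
    (4, "Smith", "John", "CLIS", "js03@wam")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", [
    (1, "lis550", 90), (1, "ee750", 95), (2, "lis550", 95),
    (2, "hist405", 80), (3, "hist405", 90), (4, "lis550", 98)])

# The course name is stored once, so changing it is a one-row update
# (the new name is made up for the demonstration).
cur = conn.execute(
    "UPDATE course SET course_name = 'Information Technologies' "
    "WHERE course_id = 'lis550'")
rows_changed = cur.rowcount
enrolled_in_lis550 = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE course_id = 'lis550'").fetchone()[0]
print(rows_changed, enrolled_in_lis550)
```

In the single big table, the same change would have to be made on every row for that course, which is how the misspellings in the first try can creep in.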
Approaches to Normalization
• For simple problems (like the homework):
– Start with “binary relationships”: pairs of fields that are related
– Group together wherever possible
– Add keys where necessary
• For more complicated problems:
– Entity relationship modeling
Some Lingo
• “Primary Key” uniquely identifies a record
– e.g., student ID in the student table
• “Foreign Key” is a field that holds the primary key of another table
– It need not be unique in this table
The Data Model
Department Table
Department ID  Department
EE             Electrical Engineering
HIST           History
CLIS           Information Studies

Course Table
Course ID  Course Name
lis550     Information Technology
ee750      Communication
hist405    American History

Enrollment Table
Student ID  Course ID  Grade
1           lis550     90
1           ee750      95
2           lis550     95
2           hist405    80
3           hist405    90
4           lis550     98

Student Table
Student ID  Last Name  First Name  Department ID  email
1           Arrows     John        EE             jarrows@wam
2           Peters     Kathy       HIST           kpeters2@wam
3           Smith      Chris       HIST           smith2002@glue
4           Smith      John        CLIS           js03@wam
Primary key
Primary key
Primary key
Primary key
Foreign key
Relational operations: Join
Student Table
Student ID  Last Name  First Name  Department ID  email
1           Arrows     John        EE             jarrows@wam
2           Peters     Kathy       HIST           kpeters2@wam
3           Smith      Chris       HIST           smith2002@glue
4           Smith      John        CLIS           js03@wam

Department Table
Department ID  Department
EE             Electrical Engineering
HIST           History
CLIS           Information Studies

“Joined” Table
Student ID  Last Name  First Name  Dept ID  Department              email
1           Arrows     John        EE       Electrical Engineering  jarrows@wam
2           Peters     Kathy       HIST     History                 kpeters2@wam
3           Smith      Chris       HIST     History                 smith2002@glue
4           Smith      John        CLIS     Information Studies     js03@wam
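What a join does mechanically can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how a DBMS implements joins: each student row is matched with the department row whose Department ID equals the student's Department ID (the foreign key).

```python
# Student rows: (student_id, last_name, first_name, department_id, email)
students = [
    (1, "Arrows", "John", "EE", "jarrows@wam"),
    (2, "Peters", "Kathy", "HIST", "kpeters2@wam"),
    (3, "Smith", "Chris", "HIST", "smith2002@glue"),
    (4, "Smith", "John", "CLIS", "js03@wam"),
]

# Department rows, keyed by their primary key (department_id).
departments = {
    "EE": "Electrical Engineering",
    "HIST": "History",
    "CLIS": "Information Studies",
}

# Join: look up each student's department_id in the department table
# and add the matching department name to the row.
joined = [
    (sid, last, first, dept_id, departments[dept_id], email)
    for (sid, last, first, dept_id, email) in students
]

for row in joined:
    print(row)
```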
Relational Operations: Project
SELECT Student ID, Department

Student ID  Last Name  First Name  Dept ID  Department              email
1           Arrows     John        EE       Electrical Engineering  jarrows@wam
2           Peters     Kathy       HIST     History                 kpeters2@wam
3           Smith      Chris       HIST     History                 smith2002@glue
4           Smith      John        CLIS     Information Studies     js03@wam

Student ID  Department
1           Electrical Engineering
2           History
3           History
4           Information Studies
Relational operations: Restrict
Student ID  Last Name  First Name  Dept ID  Department              email
1           Arrows     John        EE       Electrical Engineering  jarrows@wam
2           Peters     Kathy       HIST     History                 kpeters2@wam
3           Smith      Chris       HIST     History                 smith2002@glue
4           Smith      John        CLIS     Information Studies     js03@wam

WHERE Department ID = “HIST”

Student ID  Last Name  First Name  Department ID  Department  email
2           Peters     Kathy       HIST           History     kpeters2@wam
3           Smith      Chris       HIST           History     smith2002@glue
Relational Operations
• Joining tables: JOIN
• Choosing columns: SELECT
– Based on their label
• Choosing rows: WHERE
– Based on their contents
• These can be specified together
department ID = “HIST”
SELECT Student ID, Dept WHERE Dept = “History”
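The slide's query is shorthand; a complete SQL statement also needs a FROM clause and column names without spaces. As a hedged sketch, here is one way the three operations might be combined in a single query, run through Python's built-in sqlite3 module (SQLite, not Access; the lowercase table and column names are assumptions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (department_id TEXT PRIMARY KEY, department TEXT);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT,
    department_id TEXT, email TEXT);
INSERT INTO department VALUES
    ('EE', 'Electrical Engineering'), ('HIST', 'History'),
    ('CLIS', 'Information Studies');
INSERT INTO student VALUES
    (1, 'Arrows', 'John', 'EE', 'jarrows@wam'),
    (2, 'Peters', 'Kathy', 'HIST', 'kpeters2@wam'),
    (3, 'Smith', 'Chris', 'HIST', 'smith2002@glue'),
    (4, 'Smith', 'John', 'CLIS', 'js03@wam');
""")

result = conn.execute("""
    SELECT s.student_id, d.department         -- project: choose columns
    FROM student AS s
    JOIN department AS d                      -- join: combine the tables
      ON s.department_id = d.department_id
    WHERE d.department = 'History'            -- restrict: choose rows
    ORDER BY s.student_id
""").fetchall()

print(result)
```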
Querying a Database
• Queries allow us to ask a question about data and receive an answer back by returning a subset of the table data.
• Querying a database makes use of some or all of the relational operations described previously (join, project, restrict)
Database Integrity
• Registrar database must be internally consistent
– All enrolled students must have an entry in the student table
– All courses must have a name
– …
• What happens:
– When a student withdraws from the university?
– When a course is taken off the books?
Integrity Constraints
• Conditions that must be true of the database at any time
– Specified when the database is designed
– Checked when the database is modified
• RDBMS ensures that integrity constraints are always kept
– So that database contents remain faithful to the real world
– Helps avoid data entry errors
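To make "specified at design time, checked at modification time" concrete, here is a small sketch using Python's built-in sqlite3 module (an assumption; Access expresses the same rules through its own dialogs). The two registrar rules from the earlier slide are declared as constraints, and the helper try_sql is a hypothetical name introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only checks foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Constraints are specified when the tables are designed.
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL);
CREATE TABLE course (course_id TEXT PRIMARY KEY, course_name TEXT NOT NULL);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id TEXT NOT NULL REFERENCES course(course_id),
    grade INTEGER);
INSERT INTO student VALUES (1, 'Arrows');
INSERT INTO course VALUES ('lis550', 'Information Technology');
""")


def try_sql(sql):
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Constraints are checked when the data is modified.
valid_enrollment = try_sql("INSERT INTO enrollment VALUES (1, 'lis550', 90)")
unknown_student = try_sql("INSERT INTO enrollment VALUES (99, 'lis550', 85)")
nameless_course = try_sql("INSERT INTO course VALUES ('ee750', NULL)")

print(valid_enrollment, unknown_student, nameless_course)
```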
Discussion Point
• How is a relational database different from a spreadsheet?
Spreadsheet or Relational DB?
Use Relational DB when:
• You are working with large amounts of data
• You need to create relationships between your data
• You rely on external databases to analyze data
Use spreadsheets when:
• Your data is of a manageable size
• There is no need for relationships between data
• You are primarily creating calculations and statistics
Agenda
• Recap Excel Tables
• Relational Databases
– Basic concepts
– MS Access
• End-of-class feedback questionnaire
MS Access
• Intro to MS Access
• Relational Databases and Multi-table queries
Open a Database
Open Recent Documents list
• Choose Open to browse for a file or choose a database from the Recent Documents list
Open a Database
Open Recent Database list
Open a New database
Open from a Template
• When you first open MS Access this is the first screen you see.
MS Access Database Terminology
• Field
• Record
• Table
• Database
A database is made up of one or more tables
Individual tables in a database
Records
Individual fields
Objects
• Tables
• Queries
• Reports
• Forms
• Macros
• Modules
Objects
Work with Table Views
• Datasheet View – used to add, modify, delete and view records
• Design View – used to create and modify the fields in a table
Design View
Datasheet View
Work with Table Views
• Click the Home tab
• Click View from the View ribbon
Table View Options
Datasheet View
• Primary Key – a field that identifies each record as being unique
Primary Key
Design View
• Press F6 to switch between the upper and lower panes
Key symbol identifies primary key field
Set field properties in the lower pane
Backing-up and Renaming Access Files
• Save As – different in Access than in other Office applications
– Save As saves only the current object, not the entire database
• To save a database with a new name you must either:
– Back up the database
– Copy, paste, and rename the database
Backing-up a Database
• Backing-up an Access file will produce a copy of your file with a default filename
Default filename of a backup file is the name of the database and the current date
Compact and Repair
Compact and Repair is located under the Manage menu
• Fixes problems due to inefficient file storage and growth of a database
– Should be performed every day
– Often decreases the file size by 50% or more
Filters
• Create a subset of records
• Do not change underlying table data
• Two types
– Filter by Selection
– Filter by Form
Filter by Selection
• Selects only the records that match pre-selected criteria
Table before filter by selection
Results of filter
Filter by selection being applied from pre-determined criteria
Applying and Removing a Filter
• Once a filter is applied, the Toggle Filter icon will be available
• The Toggle Filter icon can be used to apply and remove the current filter as many times as desired
Filter icon in the Sort and Filter group
Toggle Filter icon
Sorting Table Data
• Lists records in ascending or descending order according to one or more fields
Last Name field sorted ascending
Last Name field sorted descending
Recap: Excel or Access?
Use Access when:
• You are working with large amounts of data
• You need to create relationships between your data
• You rely on external databases to analyze data
Use Excel when:
• Your data is of a manageable size
• There is no need for relationships between data
• You are primarily creating calculations and statistics
Recap: Relational Database-RDBMS
• Relational database management systems allow data to be grouped based on common attributes. Data is grouped into tables and relationships are created between the tables
• This is much more efficient than a flat file, which is the opposite of an RDBMS. A flat file stores data in a single file with no special groupings or collections
MS Access
• Intro to MS Access
• Relational Databases and Multi-table queries
– Tables/Database Design Considerations
– Creating Tables
– Creating relationships between tables
– Querying the Database
Table Design Considerations
Just as you first create a blueprint to build a house, you should first sketch or outline the design of a database table
Careful pre-planning will save you much time in the future
All design decisions are made when you are in the Design View of the table
Design Considerations – Field Size Property
• Set the field size in Table Design View
• Always anticipate that the current field size may one day need to be larger
Set field size in the Field Properties grid of Table Design View
Design Considerations – Validation Rules
• Used to avoid data entry errors by restricting what can be entered
• Validation text can be used to provide an explanation of the type of data that is allowed in a field
• E.g., <>0 will not allow 0 to be a value of the field
Set validation rules in the Field Properties grid of Table Design View
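A validation rule such as <>0 corresponds closely to a CHECK constraint in SQL. The sketch below uses Python's built-in sqlite3 module rather than Access; the table order_item, the field quantity, the message text, and the helper add_item are all hypothetical names chosen for the illustration. The message plays the role of Access's validation text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rule "<>0" on a hypothetical quantity field, written as a CHECK constraint.
conn.execute(
    "CREATE TABLE order_item (item TEXT, quantity INTEGER CHECK (quantity <> 0))"
)

# Plays the role of the validation text shown to the person entering data.
VALIDATION_TEXT = "Quantity must not be 0."


def add_item(item, quantity):
    """Try to store one record; return the validation text if the rule fails."""
    try:
        conn.execute("INSERT INTO order_item VALUES (?, ?)", (item, quantity))
        return "saved"
    except sqlite3.IntegrityError:
        return VALIDATION_TEXT


print(add_item("pencil", 12))
print(add_item("eraser", 0))
```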
Design Considerations - Indexing
• Indexing speeds up sorting and searching
• Can be set to disallow duplicates (which is sometimes needed) – can you think of an example where it is needed?
• Relates the field values to the records that contain the field value
Indexed Property
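One possible answer to the duplicates question is an email address: two students may share a last name, but not a mailbox. The sketch below shows both kinds of index with Python's built-in sqlite3 module (SQLite instead of Access; the index names and the fifth record are made up). It roughly mirrors the Access choices "Yes (Duplicates OK)" and "Yes (No Duplicates)".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT, email TEXT)"
)

# Ordinary index: speeds up searching and sorting by last name, duplicates allowed.
conn.execute("CREATE INDEX idx_student_last_name ON student (last_name)")
# Unique index: the same email may not appear in two records.
conn.execute("CREATE UNIQUE INDEX idx_student_email ON student (email)")

# Two students named Smith are fine.
conn.execute("INSERT INTO student VALUES (3, 'Smith', 'smith2002@glue')")
conn.execute("INSERT INTO student VALUES (4, 'Smith', 'js03@wam')")

# A hypothetical fifth record reusing an existing email is refused.
try:
    conn.execute("INSERT INTO student VALUES (5, 'Peters', 'js03@wam')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

smith_count = conn.execute(
    "SELECT COUNT(*) FROM student WHERE last_name = 'Smith'"
).fetchone()[0]
print(smith_count, duplicate_email_rejected)
```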
Design Considerations – Store Data in its Smallest part
• For greater flexibility, store data in its smallest part
– Instead of one field for an address, use many
– Instead of one field for a name, use two or three
Like this
Not like this
Design Consideration - Plan for Date Arithmetic
• Using a data type of date/time for all date fields allows the use of date arithmetic
Fields declared as a data type of Date/Time
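Why the data type matters can be shown with a short Python sketch (Python's datetime module stands in for Access's Date/Time type here, and the loan scenario and dates are invented for the example). Real date values support arithmetic; the same dates stored as text do not.

```python
from datetime import date, timedelta

# With a real date type, arithmetic works directly.
checked_out = date(2008, 10, 20)
due = checked_out + timedelta(days=14)  # two weeks later
days_between = (date(2008, 11, 1) - date(2008, 10, 1)).days

# The same values stored as text cannot be subtracted.
try:
    "11/01/2008" - "10/01/2008"
    text_dates_subtract = True
except TypeError:
    text_dates_subtract = False

print(due.isoformat(), days_between, text_dates_subtract)
```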
Design Considerations – Design Multiple Tables
• Using multiple tables helps reduce redundancy
– The process is also referred to as normalization
Multiple table tabs identify open tables
Multiple tables shown in the Navigation pane
MS Access
• Intro to MS Access
• Relational Databases and Multi-table queries
– Tables/Database Design Considerations
– Creating Tables
– Creating relationships between tables
– Querying the Database
Creating Tables – From the Create Tab
Enter data directly into a table, including the field names
Enter field names, data types and descriptions in Table Design View
Begin with a template
• When you create a new table, it is good practice to start in Table Design View to enter field names, data types, and properties
Creating Tables – From the Import Tab
• Click the application from which to import, or
• Choose the type of file you wish to import
Click the appropriate application button
Choose a file type to import
Recap: Work with Table Views
• Click the Home tab
• Click View from the View ribbon
Table View Options
Recap: Table Design View
Key symbol identifies primary key field
Set field properties in the lower pane
Create Tables – Specifying field names
• After choosing your method of creation, begin implementing the table design
– Use CamelCase notation for field names (e.g., LastName; i.e., no spaces)
– Specify data types
– Establish a primary key
– Consider the need for a foreign key
Table
Table Design View
Add field in Table View
Create Tables – Primary Key
• Tables are automatically created with an AutoNumber field which serves as the primary key
• To change the primary key
– Select a field in Table Design View
– Click the primary key icon
Primary Key Field
Primary Key icon
Create tables - Field Properties
• Field Properties can be used to specify characteristics for individual fields
• Located in the lower pane of Table Design View
Field Size property
Indexing
Recap: Datasheet View
• Here you enter values into the table after designing it in Table Design View
• Primary Key – a field that identifies each record as being unique
Primary Key
MS Access
• Intro to MS Access
• Relational Databases and Multi-table queries
– Tables/Database Design Considerations
– Creating Tables
– Creating relationships between tables
– Querying the Database
Table Relationships
• The strength of Access is the fact that it is a relational database
– This means you can have multiple tables and create relationships between the tables
– This helps eliminate redundant data
Relationship between two tables
Primary key
Foreign key
Foreign Key
Customer ID - Primary Key in Customer Table
Customer ID will appear in only one record - there must be only one unique ID per customer
Customer ID - Regular Field in Orders Table
Customer ID may appear many times – one customer can place many orders
• Foreign key is used to establish relationships between tables
• Based on the above example:
– Customer ID is the foreign key in the Orders table
– This is referred to as a One to Many Relationship
Establishing Relationships - Using the Relationship Window
• Click the Database Tools tab and click the Relationships icon
• The first time you open it, the window will be empty and you need to add tables
• Add the tables or queries from the Show Table dialog box
Relationships icon
Show Table dialog box
Relationship window
Establishing Relationships
• In the Relationship window, click and drag a field name from one table to a field name in a related table
Click and drag to create a relationship
Establishing Relationships
• Enter the appropriate settings in the Edit relationships dialog box and click Create
• A join line will appear when one table is joined to another
Infinity symbol denotes that referential integrity has been applied
Set referential integrity and cascades
Referential Integrity
• Ensures that the references between related data are accurate
• Established when creating the relationship between two tables
Enforce Referential Integrity
Cascades
• When active, data changed in one table that is in a relationship will be changed in its related tables
• Can be set when establishing relationships between tables
Cascade update and cascade delete
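Cascade update and cascade delete can be sketched with the customer/orders example. The code below uses Python's built-in sqlite3 module, where the Access checkboxes correspond roughly to ON UPDATE CASCADE and ON DELETE CASCADE; the table names, customer names, and ID values are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id)
        ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO customer VALUES (1, 'Customer A'), (2, 'Customer B');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Cascade update: changing the customer's ID also changes it in their orders.
conn.execute("UPDATE customer SET customer_id = 100 WHERE customer_id = 1")
ids_after_update = [
    row[0] for row in conn.execute("SELECT customer_id FROM orders ORDER BY order_id")
]

# Cascade delete: deleting a customer also deletes that customer's orders.
conn.execute("DELETE FROM customer WHERE customer_id = 2")
orders_left = [
    row[0] for row in conn.execute("SELECT order_id FROM orders ORDER BY order_id")
]

print(ids_after_update, orders_left)
```

Cascade delete removes related records without asking, so it is worth deciding deliberately whether to turn it on.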
MS Access
• Intro to MS Access
• Relational Databases and Multi-table queries
– Tables/Database Design Considerations
– Creating Tables
– Creating relationships between tables
– Querying the Database
Queries
• Queries allow us to ask questions about data
• The set of records that answers our question is called a dataset
Employees table
Dataset resulting from querying table for only employees who are Sales Representatives
Create Queries - Using Query Design View
• From the Create tab, select Query Design from the Other group
• Two panes – the table pane and the design pane
• Pressing the F6 key toggles between the panes
Tables pane
Design pane
Create Tab
Select Query
• Searches associated tables and returns a dataset that matches the query parameters
• Changes made to the dataset will be reflected in the associated tables
Specifying Criteria in a Select Query
• Field row – displays the field name
• Sort row – enables you to sort the dataset
• Show row – controls whether or not you see a field in the dataset
• Criteria row – determines the records that will be selected for display
Fields in design grid allow us to specify criteria for the dataset
Specifying Criteria – Currency and Operators
• Specify criteria with currency
– Without the dollar sign
– With or without the decimal point
• Use comparison operators such as:
– Less than and greater than
– Equal to or not equal to
Greater than (>) operator
Currency amount entered without dollar sign
Specifying Criteria – Wildcards
• Asterisk - searches for a pattern that includes any number of characters in the position of the asterisk
• Question mark - searches for a pattern that includes a single character in the position of the question mark
Query with asterisk wildcard and resulting dataset
Query with question mark and asterisk wildcards and resulting dataset
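Python's fnmatch module happens to use the same two wildcard characters, so it gives a quick way to try patterns out. This is a sketch of the pattern idea only, not of Access itself: the name list is assembled from earlier examples, and fnmatchcase is case-sensitive, whereas Access criteria normally ignore case.

```python
from fnmatch import fnmatchcase

last_names = ["Arrows", "Peters", "Smith", "Adams", "Doe"]


def matching(pattern):
    """Return the names that fit a pattern using * and ? wildcards."""
    return [name for name in last_names if fnmatchcase(name, pattern)]


# * stands for any number of characters (including none).
starts_with_s = matching("S*")
starts_a_ends_s = matching("A*s")

# ? stands for exactly one character: any first letter, then "e", then anything.
second_letter_e = matching("?e*")

print(starts_with_s, starts_a_ends_s, second_letter_e)
```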
Specifying Criteria – Null Values
• IS NULL finds only records that have no value
• IS NOT NULL excludes Null value records
Is Null criteria and resulting dataset
IS NOT NULL criteria and partial resulting dataset
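The two Null criteria behave the same way in SQL generally. Below is a small sketch with Python's built-in sqlite3 module (SQLite, not Access); the missing email addresses are invented for the example. It also shows a common pitfall: comparing with "= NULL" finds nothing, which is why IS NULL exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT, email TEXT)"
)
# None is stored as Null; the missing emails are made up for the example.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "Arrows", "jarrows@wam"),
        (2, "Peters", None),
        (3, "Smith", "smith2002@glue"),
        (4, "Smith", None),
    ],
)


def ids(where_clause):
    """Return the student IDs selected by the given criterion."""
    sql = f"SELECT student_id FROM student WHERE {where_clause} ORDER BY student_id"
    return [row[0] for row in conn.execute(sql)]


no_email = ids("email IS NULL")
has_email = ids("email IS NOT NULL")
equals_null = ids("email = NULL")  # a comparison with Null is never true

print(no_email, has_email, equals_null)
```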
Specifying Criteria – And and Or
• OR finds records that can match one or more conditions
• AND finds records that must match all criteria specified
Or Criterion and resulting dataset
And criterion and resulting dataset
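The difference between And and Or can be sketched with the enrollment records from the registrar example. This is plain Python rather than an Access query, and the two conditions (course is lis550, grade is at least 95) are chosen only for illustration.

```python
# Enrollment rows: (student_id, course_id, grade)
enrollments = [
    (1, "lis550", 90),
    (1, "ee750", 95),
    (2, "lis550", 95),
    (2, "hist405", 80),
    (3, "hist405", 90),
    (4, "lis550", 98),
]

# AND: a record must satisfy both conditions.
and_rows = [e for e in enrollments if e[1] == "lis550" and e[2] >= 95]

# OR: a record needs to satisfy at least one condition.
or_rows = [e for e in enrollments if e[1] == "lis550" or e[2] >= 95]

print(and_rows)
print(or_rows)
```

Every record selected by And is also selected by Or, so Or never returns fewer records than And for the same two conditions.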
Copy a Query
• Sometimes we might want to run several queries that differ by only one or two attributes
• Right-click on the query and choose Copy from the shortcut menu
• Right-click and choose Paste
• In the Paste As dialog box, give the query a new name
Run a Query
• Running, or executing, a query is done by clicking the Run command
Run command
Creating Queries – Using the Query Wizard
• From the Create tab, choose Query Wizard from the Other group
• Choose query type from the New Query dialog box
Select Simple Query Wizard
Query Wizard icon
Creating Queries – Using the Query Wizard: continued
• Select the Table/Queries to include and choose the desired fields
• Select aggregate totals needed in the Summary Options box
Creating Queries – Using the Query Wizard: continued
• Title your query and open in Datasheet View or Query Design View
Sharing Data with Excel
• Data can be imported from Excel
– It may be appended to an existing table
– It may be used to create a new table
Excel icon
External Data tab
Sharing Data with Excel
• Select the Excel file you would like to import
• Select how you would like to import the data
– Appended – added to the end of an existing table
– New table – creates a new table in a database
– Linked – creates a new table that is linked to the source file in Excel
Select the Source
Select the destination