Post on 21-Dec-2015
CH10.1
CSE4100
Compiler Concepts for Database Systems
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT [email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
CH10.2
Overview
Motivation and Background
Database System Architecture
    Exploring its Capabilities
    Focusing on Compiler-Related Concepts
Compile Time Issues in Database Systems
    The SQL Query Language
    Optimization Issues in Database Systems
    Typing
Runtime Issues in Database Systems
    Transaction Processing
    Execution for Complex Joins
CH10.3
Database System Architecture
What are the Various Components?
How do they Relate to Compilers?
CH10.4
How Does it Compare to Java Environment?
CH10.5
Database Concepts - Summary
Schema vs. Data
    Database: Structured Collection of Data Describing Objects of the Universe of Discourse being Modeled
    A Database Consists of Schema and Data
    Schema: Describes the Intension (Type) of Objects
    Data: Describes the Extension (Instances) of Objects
What is Schema w.r.t. Compilers? What is Data?
CH10.6
What is a DBMS?
A Database Management System (DBMS) is the Generalized Tool that Facilitates the Management of and Access to the Database
Main Functions:
    Defining a Database: Specifying Data Types, Structures, and Constraints
    Constructing a Database: the Process of Storing the Data Itself on Some Storage Medium
    Manipulating a Database: Functions for Querying Specific Data in the Database and Updating the Database
What are the Analogies of Each of the Main Functions w.r.t. Programming Languages and Compilers?
CH10.7
What is a DBMS?
Additional Functions:
    Interaction with File Manager
        So that Details Related to Data Storage and Access are Removed From Application Programs
    Integrity Enforcement
        Guarantee Correctness, Validity, Consistency
    Security Enforcement
        Prevent Data From Illegal Uses
    Concurrency Control
        Control the Interference Between Concurrent Programs
    Recovery from Failure
    Query Processing and Optimization
Again: What are Relevant Compiler Concepts?
CH10.8
DBMS Architecture
DBMS Languages
    Data Definition Language (DDL)
    Data Manipulation Language (DML)
        From Embedded Queries or DB Commands Within a Program
        "Stand-alone" Query Language
Host Language:
    DML Specification (e.g., SQL) is Embedded in a "Host" Programming Language (e.g., Java, C++)
DBMS Interfaces
    Menu-Based Interface
    Graphical Interface
    Forms-Based Interface
    Interface for DBA (DB Administrator)
CH10.9
DBMS Architecture
Main DBMS Modules
    DDL Compiler
    DML Compiler
    Ad-hoc (Interactive) Query Compiler
    Run-time Database Processor
    Stored Data Manager
    Concurrency/Back-Up/Recovery Subsystem
DBMS Utility Modules
    Loading Routines
    Backup Utility
    System Catalog/Data Dictionary
CH10.10
Components of a DBMS
CH10.11
ANSI/SPARC - Three Schema Architecture
External Data Schema (Users' View)
Conceptual Data Schema (Logical Schema)
Internal Data Schema (Physical Schema)
What are the Programming Language Analogies?
CH10.12
Conceptual Schema
Describes the Meaning of Data in the Universe of Discourse
    Emphasizes General, Conceptually Relevant, and Often Time-Invariant Structural Aspects of the Universe of Discourse
Excludes the Physical Organization and Access Aspects of the Data
This could be a UML Design that Realizes a Set of Classes (no data) or Java Class Declarations (APIs)
CH10.13
Conceptual Schema
Another Example - A Programming Language Level Definition
CH10.14
External Schema
Describes Parts of the Information in the Conceptual Schema in a Form Convenient to a Particular User Group's View
Derived from the Conceptual Schema
What is the View of the Outside World in OO?
Akin to Public Interface
CH10.15
External Schema
Another Example
CH10.16
Internal Schema
Describes How the Information Described in the Conceptual Schema is Physically Represented in a Database to Provide the Overall Best Performance
CH10.17
Internal Schema
Another Example
This Corresponds to Data Typing and Layout in Compilers from Runtime Environment!
CH10.18
Unified Example of Three Schemas
CH10.19
Database Access Process
What Does This Access Process Resemble?
Akin to Runtime Execution Environment!
A More Complex Activation Process!
CH10.20
Metadata vs. Data
Recall Introspection and Reflection in Java where you Can "Look" into the Class Definitions Themselves!
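The reflection analogy can be tried directly, since a DBMS lets a program query its own catalog. The following is a minimal sketch using Python's standard sqlite3 module; the `student` table and its single row are hypothetical, and `sqlite_master` and `PRAGMA table_info` are SQLite's own catalog mechanisms (other systems expose their metadata differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and row, used only for illustration.
conn.execute("CREATE TABLE student (name TEXT, ssn TEXT PRIMARY KEY, gpa REAL)")
conn.execute("INSERT INTO student VALUES ('Ann', '111223333', 3.7)")

# Metadata: ask the catalog which tables exist and how their columns are typed.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

# Data: the instances stored under that schema.
rows = conn.execute("SELECT name, ssn, gpa FROM student").fetchall()

print(tables)   # table names read from the catalog
print(columns)  # (column name, declared type) pairs
print(rows)     # the stored tuples
```

The first two queries read the schema (intension) and the third reads the data (extension), much as Java reflection inspects a class separately from its instances.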
CH10.21
Data Independence
Ability that Allows Application Programs Not to be Affected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods
Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users
Recall Software Engineering Concepts:
    Abstraction: the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design
    Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users
Realized in Today's Modern PLs!
CH10.22
What are System Components?
How are these Similar to Compiler/PL Concepts?
CH10.23
Relational Model
Relational Model of Data Based on the Concept of a Relation
Relation - a Mathematical Concept Based on Sets
Strength of the Relational Approach to Data Management Comes From the Formal Foundation Provided by the Theory of Relations
RELATION: A Table of Values
    A Relation May Be Thought of as a Set of Rows
    A Relation May Alternately be Thought of as a Set of Columns
    Each Row of the Relation May Be Given an Identifier
    Each Column Typically is Called by its Column Name or Column Header or Attribute Name
CH10.24
Relational Tables - Rows/Columns/Tuples
CH10.25
Relational Database Definition
    CREATE TABLE Student: Name(CHAR(30)), SSN(CHAR(9)), Gpa(FLOAT(2))
    CREATE TABLE Faculty: Name(CHAR(30)), SSN(CHAR(9)), Ophone(CHAR(7))
    CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6))
    CREATE TABLE Formats: Section#(INTEGER(3)), Quarter(CHAR(10)), Campus(CHAR(15))
    CREATE TABLE TakeorTeach: SSN(CHAR(9)), Course#(CHAR(6)), Section#(INTEGER(3))
    CREATE TABLE COfferings: Course#(CHAR(6)), Section#(INTEGER(3))

    Student(Name*, SSN, Gpa)
    Faculty(Name*, SSN, Ophone)
    Courses(Course#*, Title, Descrip, PCourse#*)
    Formats(Section#*, Quarter, Campus)
    TakeorTeach(SSN, Course#, Section#)
    COfferings(Course#, Section#)
CH10.26
Relational Views
Two Views Derived From Prior Tables
    Student Transcript View
    Course Prerequisite View
CH10.27
SQL: Tuple Relational Calculus-Based
SQL is a Partial Example of a Tuple Relational Language
    Simple Queries are all Declarative
    More Complex Queries are both Declarative and Procedural (e.g., joins, nested queries)
Find the names of employees working on the CAD/CAM project
    SELECT EMP.ENAME
    FROM EMP, WORKS, PROJ
    WHERE (EMP.ENO = WORKS.ENO) AND (WORKS.PNO = PROJ.PNO) AND (PROJ.PNAME = 'CAD/CAM')
SQL Defines a Programming Language and Associated Semantics for Usage and Processing
CH10.28
SQL Components
Data Definition Language (DDL)
    For External and Conceptual Schemas
    Views - DDL for External Schemas
Data Manipulation Language (DML)
    Interactive DML Against External and Conceptual Schemas
    Embedded DML in Host PLs (EQL, JDBC, etc.)
Note: Separation of Definition (DDL) from Usage (DML) - Is there Something Similar in PLs?
Others
    Integrity (Allowable Values/Referential)
    Transaction Control (Long-Duration and Batch)
    Authorization (Who can Do What When)
CH10.29
SQL DDL and DML
Data Definition Language (DDL) - Declarations
    Defining the Relational Schema - Relations, Attributes, Domains - The Meta-Data
    CREATE TABLE Student: Name(CHAR(30)), SSN(CHAR(9)), GPA(FLOAT(2))
    CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6))
Data Manipulation Language (DML) - Code
    Defining the Queries Against the Schema
    SELECT Name, SSN
    FROM Student
    WHERE GPA > 3.00
CH10.30
Data Definition Language - DDL
A Pre-Defined Set of Primitive Types
    Numeric
    Character-string
    Bit-string
    Additional Types
Defining Domains
Defining Schema
Defining Tables
Defining Views
Note: Each DBMS May have its Own DBMS-Specific Data Types - Is this Good or Bad?
    What is this Similar to re. Different C++ Compilers?
These are Akin to PL Data Types!
CH10.31
DDL - Primitive Types
Numeric
    INTEGER (or INT), SMALLINT
    REAL, DOUBLE PRECISION
    FLOAT(N): Floating Point with at Least N Digits
    DECIMAL(P,D) (DEC(P,D) or NUMERIC(P,D)): P Total Digits with D to the Right of the Decimal Point
Note that INTs and REALs are Machine Dependent (Based on Hardware/OS Platform)
Again - this is Similar to PLs/Compilers and Code Generation - Data Layout
CH10.32
DDL - Primitive Types
Character-String
    CHAR(N) or CHARACTER(N) - Fixed
    VARCHAR(N), CHAR VARYING(N), or CHARACTER VARYING(N): Variable with at Most N Characters
Bit-Strings
    BIT(N) - Fixed
    VARBIT(N) or BIT VARYING(N): Variable with at Most N Bits
CH10.33
DDL - Primitive Types
These Specialized Primitive Types are Used to:
    Simplify Modeling Process
    Include "Popular" Types
    Reduce Composite Attributes/Programming
DATE: YYYY-MM-DD
TIME: HH-MM-SS
TIME(I): HH-MM-SS-F....F - I Fraction Seconds
TIME WITH TIME ZONE: HH-MM-SS-HH-MM
TIME-STAMP: YYYY-MM-DD-HH-MM-SS-F...F{-HH-MM}
PLs also have Specialized Types!
Problem: Different Database Systems Sometimes Implement these Types very Differently
    This Impacts Portability!
CH10.34
What is a SQL Schema?
A Schema in SQL is the Major Meta-Data Construct
Supports the Definition of:
    Relation - Table with Name
    Attributes - Columns and their Types
    Identification - Primary Key
    Constraints - Referential Integrity (FK)
Two Part Definition
    CREATE Schema - Named Database or Conceptually Related Tables
    CREATE Table - Individual Tables of the Schema
CH10.35
DDL - Create/Drop a Schema
Creating a Schema:
    CREATE SCHEMA MY_COMPANY AUTHORIZATION Demurjian;
    Schema MY_COMPANY has Been Created and is Owned by the User "Demurjian"
    Tables can now be Created and Added to Schema
Dropping a Schema:
    DROP SCHEMA MY_COMPANY RESTRICT;
    DROP SCHEMA MY_COMPANY CASCADE;
    Restrict: Drop Operation Fails If Schema is Not Empty
    Cascade: Drop Operation Removes Everything in the Schema
CH10.36
DDL - Create Tables
CREATE TABLE EMPLOYEE (
    FNAME VARCHAR(15) NOT NULL,
    MINIT CHAR,
    LNAME VARCHAR(15) NOT NULL,
    SSN CHAR(9) NOT NULL,
    BDATE DATE,
    ADDRESS VARCHAR(30),
    SEX CHAR,
    SALARY DECIMAL(10,2),
    SUPERSSN CHAR(9),
    DNO INT NOT NULL,
    PRIMARY KEY (SSN),
    FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN),
    FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER)
);
CH10.37
DDL - Create Tables (continued)
CREATE TABLE DEPARTMENT (
    DNAME VARCHAR(15) NOT NULL,
    DNUMBER INT NOT NULL,
    MGRSSN CHAR(9) NOT NULL,
    MGRSTARTDATE DATE,
    PRIMARY KEY (DNUMBER),
    UNIQUE (DNAME),
    FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN)
);
CREATE TABLE DEPT_LOCATIONS (
    DNUMBER INT NOT NULL,
    DLOCATION VARCHAR(15) NOT NULL,
    PRIMARY KEY (DNUMBER, DLOCATION),
    FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT(DNUMBER)
);
CH10.38
DDL - Create Tables (continued)
CREATE TABLE PROJECT (
    PNAME VARCHAR(15) NOT NULL,
    PNUMBER INT NOT NULL,
    PLOCATION VARCHAR(15),
    DNUM INT NOT NULL,
    PRIMARY KEY (PNUMBER),
    UNIQUE (PNAME),
    FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER)
);
CREATE TABLE WORKS_ON (
    ESSN CHAR(9) NOT NULL,
    PNO INT NOT NULL,
    HOURS DECIMAL(3,1) NOT NULL,
    PRIMARY KEY (ESSN, PNO),
    FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN),
    FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER)
);
CH10.39
DDL - Create Tables with Constraints
CREATE TABLE EMPLOYEE (
    . . . ,
    DNO INT NOT NULL DEFAULT 1,
    CONSTRAINT EMPPK
        PRIMARY KEY (SSN),
    CONSTRAINT EMPSUPERFK
        FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN)
        ON DELETE SET NULL ON UPDATE CASCADE,
    CONSTRAINT EMPDEPTFK
        FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER)
        ON DELETE SET DEFAULT ON UPDATE CASCADE
);
CH10.40
DDL - Create Tables with Constraints
Is there an Equivalent to Keys and Constraints in PLs?
What Does Java Have Internally?
Constraints Facilitate Type Checking at Data Level!
CREATE TABLE DEPARTMENT (
    . . . ,
    MGRSSN CHAR(9) NOT NULL DEFAULT '888665555',
    . . . ,
    CONSTRAINT DEPTPK
        PRIMARY KEY (DNUMBER),
    CONSTRAINT DEPTSK
        UNIQUE (DNAME),
    CONSTRAINT DEPTMGRFK
        FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN)
        ON DELETE SET DEFAULT ON UPDATE CASCADE
);
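The referential actions named in these constraints can be observed with a small experiment. This is a sketch under stated assumptions: it uses Python's sqlite3 module, keeps only a cut-down EMPLOYEE table with the self-referencing supervisor key, and the sample rows ('Borg', 'Wong', 'Smith') are hypothetical. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("""
    CREATE TABLE employee (
        ssn      CHAR(9) NOT NULL,
        lname    VARCHAR(15) NOT NULL,
        superssn CHAR(9),
        CONSTRAINT emppk PRIMARY KEY (ssn),
        CONSTRAINT empsuperfk FOREIGN KEY (superssn) REFERENCES employee(ssn)
            ON DELETE SET NULL ON UPDATE CASCADE
    )""")
conn.execute("INSERT INTO employee VALUES ('888665555', 'Borg', NULL)")
conn.execute("INSERT INTO employee VALUES ('333445555', 'Wong', '888665555')")

# A supervisor SSN that matches no employee is rejected by the constraint.
try:
    conn.execute("INSERT INTO employee VALUES ('123456789', 'Smith', '000000000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# ON UPDATE CASCADE: changing the supervisor's key rewrites the referencing row.
conn.execute("UPDATE employee SET ssn = '999999999' WHERE ssn = '888665555'")
after_update = conn.execute(
    "SELECT superssn FROM employee WHERE lname = 'Wong'").fetchone()[0]

# ON DELETE SET NULL: deleting the supervisor nulls the reference.
conn.execute("DELETE FROM employee WHERE ssn = '999999999'")
after_delete = conn.execute(
    "SELECT superssn FROM employee WHERE lname = 'Wong'").fetchone()[0]

print(rejected, after_update, after_delete)
```

The rejected insert is the data-level counterpart of a type error: the value has the right form but does not denote an existing object.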
CH10.41
Data Manipulation Language - DML
SQL has the SELECT Statement for Retrieving Info. from a Database (Not Relational Algebra Select)
SQL vs. Formal Relational Model
    SQL Allows a Table (Relation) to have Two or More Tuples that are Identical in All Their Attribute Values
    Hence, an SQL Table is a Multi-set (Sometimes Called a Bag) of Tuples; it is Not a Set of Tuples
SQL Relations Can Be Constrained to Sets by
    PRIMARY KEY or UNIQUE Attributes
    Using the DISTINCT Option in a Query
Implied Processing and Procedural Semantics
    SQL Queries have Specific Semantics
    These Semantics Dictate Processing
    Includes Code Generation, Optimization, etc.
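The bag-versus-set distinction is easy to see in a running system. The sketch below assumes Python's sqlite3 module and two hypothetical cut-down versions of WORKS_ON, one without a key and one with a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No key declared: the table is a multiset and accepts identical tuples.
conn.execute("CREATE TABLE works_on (essn CHAR(9), pno INT)")
conn.executemany("INSERT INTO works_on VALUES (?, ?)",
                 [("123456789", 1), ("123456789", 1), ("453453453", 2)])

bag = conn.execute(
    "SELECT essn, pno FROM works_on ORDER BY essn, pno").fetchall()
as_set = conn.execute(
    "SELECT DISTINCT essn, pno FROM works_on ORDER BY essn, pno").fetchall()

# With a PRIMARY KEY the DBMS itself constrains the table to a set.
conn.execute(
    "CREATE TABLE works_on_keyed (essn CHAR(9), pno INT, PRIMARY KEY (essn, pno))")
conn.execute("INSERT INTO works_on_keyed VALUES ('123456789', 1)")
try:
    conn.execute("INSERT INTO works_on_keyed VALUES ('123456789', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(len(bag), len(as_set), duplicate_rejected)
```

DISTINCT removes duplicates per query, while the key prevents them from being stored at all.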
CH10.42
Interactive DML - Main Components
Select-from-where Statement Contains:
    Select Clause - Chosen Attributes/Columns
    From Clause - Involved Tables
    Where Clause - Constrain Tuple Values
    Tuple Variables - Distinguish Among Same Names in Different Tables
    String Matching - Detailed Matching Including
        Exact
        Starts With
        Near
    Ordering of Rows - Sorting Tuple Results
CH10.43
Recall Prior Schema
CH10.44
...and Corresponding DB Tables
Which Represent Tuples/Instances of Each Relation
CH10.45
...and Corresponding DB Tables
CH10.46
Simple SQL Queries
Query 0: Retrieve the Birthdate and Address of the Employee whose Name is 'John B. Smith'.
    SELECT BDATE, ADDRESS
    FROM EMPLOYEE
    WHERE FNAME='John' AND MINIT='B' AND LNAME='Smith'
Which Row(s) are Selected?
Note: While All of these Next Queries are from Chapter 8, Some are From "Earlier" Edition
CH10.47
Simple SQL Queries
Query 1: Retrieve Name and Address of all Employees who Work for the 'Research' Department
    SELECT FNAME, MINIT, LNAME, ADDRESS, DNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME='Research' AND DNUMBER=DNO
What Action is Being Performed? Join! Cartesian Product!
CH10.48
Simple SQL Queries - Result
Theta Join on DNO=DNUMBER
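To make the "Join! Cartesian Product!" remark concrete, the sketch below runs a cut-down Query 1 in SQLite and then recomputes it by hand as a Cartesian product followed by a filter. The table contents are hypothetical sample rows and the columns are reduced to those the join needs. A real DBMS would normally avoid materializing the full product.

```python
import sqlite3
from itertools import product

# Hypothetical sample rows: (fname, lname, dno) and (dname, dnumber).
employees = [("John", "Smith", 5), ("Alicia", "Zelaya", 4), ("Ramesh", "Narayan", 5)]
departments = [("Research", 5), ("Administration", 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT, dno INT)")
conn.execute("CREATE TABLE department (dname TEXT, dnumber INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", employees)
conn.executemany("INSERT INTO department VALUES (?, ?)", departments)

sql_result = conn.execute(
    "SELECT fname, lname, dname FROM employee, department "
    "WHERE dname = 'Research' AND dnumber = dno ORDER BY lname").fetchall()

# The same semantics spelled out: every employee paired with every department,
# then the WHERE clause applied as a filter on the pairs.
pairs = list(product(employees, departments))
by_hand = sorted(
    ((fname, lname, dname)
     for (fname, lname, dno), (dname, dnumber) in pairs
     if dname == "Research" and dnumber == dno),
    key=lambda row: row[1])

print(len(pairs), sql_result == by_hand)
```

Three employees and two departments give six candidate pairs, of which only two survive the filter; that gap is what the optimization material later in the deck tries to exploit.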
CH10.49
Simple SQL Queries
Query 2: For Every Project in 'Stafford', list the Project Number, the Controlling Dept. Number, and the Dept. Manager's Last Name, Address, and Birthdate
    SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'
In Q2, there are Two Join Conditions:
    The Join Condition DNUM=DNUMBER Relates a Project to its Controlling Department
    The Join Condition MGRSSN=SSN Relates the Controlling Department to the Employee who Manages that Department
CH10.50
Query Results
    SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
    FROM PROJECT, DEPARTMENT, EMPLOYEE
    WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'
CH10.51
Qualification of Attributes
In SQL, the Same Name for Two (or More) Attributes is Allowed if Attributes are in Different Relations
In Those Cases, Query Must Qualify by Prefixing the Relation Name to the Attribute Name
    EMPLOYEE.LNAME, DEPARTMENT.DNAME
Aliases: When Queries Must Refer to the Same Relation Twice
    Alias is Akin to a Variable - Reference in PL!
    In These Situations, it is Considered that there are Two Different Copies of the Same Relation
Let's See Examples of Both Concepts
CH10.52
Attribute Qualification
Query 8: For Each Employee, Retrieve the Employee's Name, and Name of his or her Immediate Supervisor
    SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE E, EMPLOYEE S
    WHERE E.SUPERSSN=S.SSN
E and S are Aliases for the EMPLOYEE Relation
    E Represents Employees in the Role of Supervisees
    S Represents Employees in the Role of Supervisor
Another Form of Query 8 is:
    SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.SUPERSSN=S.SSN
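Query 8 can be run as written once a small EMPLOYEE table exists. The sketch below uses Python's sqlite3 module; the three rows are hypothetical sample data, and only the columns the query touches are declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (fname TEXT, lname TEXT, ssn TEXT PRIMARY KEY, superssn TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("James", "Borg", "888665555", None),          # has no supervisor
    ("Franklin", "Wong", "333445555", "888665555"),
    ("John", "Smith", "123456789", "333445555"),
])

# E and S range over the same table, like two loop variables over one array.
pairs = conn.execute("""
    SELECT E.fname, E.lname, S.fname, S.lname
    FROM employee AS E, employee AS S
    WHERE E.superssn = S.ssn
    ORDER BY E.lname""").fetchall()

for row in pairs:
    print(row)
```

The employee with a NULL supervisor does not appear in the result, because the comparison `E.superssn = S.ssn` is never true for a NULL value.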
CH10.53
Query Results
    SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.SUPERSSN=S.SSN
CH10.54
Nested Queries
SQL SELECT Nested Query is Specified within WHERE-clause of another Query (the Outer Query)
Query 1A: Retrieve the Name and Address of all Employees who Work for the 'Research' Department
    SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE
    WHERE DNO IN
        (SELECT DNUMBER
         FROM DEPARTMENT
         WHERE DNAME='Research')
Note: This Reformulates Earlier Query 1
The End Result is Essentially:
    Outer and Inner For/While Loops!
CH10.55
How Does Nested Query Work?
The Nested Query Selects Number of 'Research' Dept.
The Outer Query Selects an EMPLOYEE Tuple If Its DNO Value Is in the Result of the Nested Query
IN Represents Set Inclusion of Result Set
We Can Have Several Levels of Nested Queries
    SELECT FNAME, LNAME, ADDRESS
    FROM EMPLOYEE
    WHERE DNO IN
        (SELECT DNUMBER
         FROM DEPARTMENT
         WHERE DNAME='Research')
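The loop reading of Query 1A can be written out in ordinary code. The sketch below is an illustration, not how any particular DBMS executes the query: the rows are hypothetical, relations are plain Python lists of dictionaries, and the second function shows one rewrite an optimizer might apply, which resembles hoisting a loop-invariant computation in a compiler.

```python
# Hypothetical sample relations.
employees = [
    {"fname": "John", "lname": "Smith", "dno": 5},
    {"fname": "Alicia", "lname": "Zelaya", "dno": 4},
    {"fname": "Joyce", "lname": "English", "dno": 5},
]
departments = [
    {"dname": "Research", "dnumber": 5},
    {"dname": "Administration", "dnumber": 4},
]

def query_1a_naive(employees, departments):
    """Re-run the inner query for every outer tuple: an outer and an inner loop."""
    result = []
    for emp in employees:                       # outer query
        for dept in departments:                # nested query
            if dept["dname"] == "Research" and emp["dno"] == dept["dnumber"]:
                result.append((emp["fname"], emp["lname"]))
                break                           # IN needs only one match
    return result

def query_1a_hoisted(employees, departments):
    """Evaluate the uncorrelated inner query once, then test set membership."""
    research = {d["dnumber"] for d in departments if d["dname"] == "Research"}
    return [(e["fname"], e["lname"]) for e in employees if e["dno"] in research]

naive = query_1a_naive(employees, departments)
hoisted = query_1a_hoisted(employees, departments)
print(naive == hoisted, naive)
```

Both functions return the same tuples; the second does the inner scan once rather than once per employee, which is legal here because the nested query does not refer to the outer tuple.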
CH10.56
NULLS in SQL Queries
SQL Allows Queries that Check if a Value is NULL (Missing or Undefined or not Applicable)
SQL uses IS or IS NOT to Compare NULLs since it Considers each NULL Value Distinct from other NULL Values, so Equality Comparison is not Appropriate
Query 18: Retrieve the Names of all Employees who do not have Supervisors.
    SELECT FNAME, LNAME
    FROM EMPLOYEE
    WHERE SUPERSSN IS NULL
Why Would Such a Capability be Useful?
    Downloading/Crossloading a Database
    Promoting an Attribute to PK/FK
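The difference between `IS NULL` and an equality test can be checked in a couple of lines. This sketch assumes Python's sqlite3 module and a hypothetical two-row EMPLOYEE table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, superssn TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Borg", None), ("Wong", "888665555")])

# IS NULL finds the row whose supervisor is missing.
with_is_null = conn.execute(
    "SELECT lname FROM employee WHERE superssn IS NULL").fetchall()

# "= NULL" never evaluates to true, so it selects nothing.
with_equals = conn.execute(
    "SELECT lname FROM employee WHERE superssn = NULL").fetchall()

print(with_is_null, with_equals)
```

The equality version is syntactically valid, so the mistake shows up only as an empty result, not as an error.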
CH10.57
Aggregate Functions in SQL Queries
Query 19: Find Maximum Salary, Minimum Salary, and Average Salary among all Employees
    SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
    FROM EMPLOYEE
Query 20: Find Maximum and Minimum Salaries among 'Research' Department Employees
    SELECT MAX(SALARY), MIN(SALARY)
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME='Research' AND DNUMBER=DNO
What Does Query 22 Do?
    SELECT COUNT(*)
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME='Research' AND DNUMBER=DNO
CH10.58
Grouping in SQL Queries
Query 24: For Each Department, Retrieve the DNO, Number of Employees, and Their Average Salary
    SELECT DNO, COUNT(*), AVG(SALARY)
    FROM EMPLOYEE
    GROUP BY DNO
EMPLOYEE Tuples are Divided into Groups; each Group has the Same Value for Grouping Attribute DNO
COUNT and AVG Functions are Applied to each Group of Tuples Separately
SELECT-clause Includes only the Grouping Attribute and the Functions to be Applied on each Tuple Group
Are there PL Equivalents to these Data Oriented Actions? Yes - in Specific APIs but Not PL Itself!
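As one illustration of the "in specific APIs" answer, the sketch below runs Query 24 in SQLite and then reproduces the grouping with a dictionary from Python's standard library. The three employee rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows: (lname, salary, dno).
employees = [("Smith", 30000, 5), ("Wong", 40000, 5), ("Zelaya", 25000, 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary REAL, dno INT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", employees)
sql_groups = conn.execute(
    "SELECT dno, COUNT(*), AVG(salary) FROM employee "
    "GROUP BY dno ORDER BY dno").fetchall()

# The same computation by hand: partition on dno, then aggregate each group.
groups = defaultdict(list)
for _lname, salary, dno in employees:
    groups[dno].append(salary)
by_hand = [(dno, len(salaries), sum(salaries) / len(salaries))
           for dno, salaries in sorted(groups.items())]

print(sql_groups)
print(by_hand)
```

The hand-written version makes the two phases explicit: partition the tuples by the grouping attribute, then apply COUNT and AVG to each partition separately.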
CH10.59
Results of Query 24:
    SELECT DNO, COUNT(*), AVG(SALARY)
    FROM EMPLOYEE
    GROUP BY DNO
CH10.60
INSERT SQL Queries
Add one or more Tuples to a Relation, with Attribute Values Listed in the Order Specified in the CREATE
Update 1:
    INSERT INTO EMPLOYEE
    VALUES ('Richard', 'K', 'Marini', '653298653', '30-DEC-52', '98 Oak Forest,Katy,TX', 'M', 37000, '987654321', 4)
Another Form of Update 1:
    INSERT INTO EMPLOYEE (FNAME, LNAME, SSN)
    VALUES ('Richard', 'Marini', '653298653')
All PK and FK Values must be Provided
Nulls are Allowed
DDL Constraints are Enforced
Another Form of "Type Checking" at Instance Level
    This is Akin to Dynamic Type Checking!
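The dynamic-type-checking analogy can be watched in action: the DBMS checks each INSERT against the declared constraints at run time and raises an error on violation. The sketch below assumes Python's sqlite3 module and a cut-down EMPLOYEE table; `try_insert` is a hypothetical helper written for this example, and the second and third inserts are deliberately invalid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        fname VARCHAR(15) NOT NULL,
        lname VARCHAR(15) NOT NULL,
        ssn   CHAR(9) NOT NULL,
        dno   INT NOT NULL DEFAULT 1,
        PRIMARY KEY (ssn)
    )""")

def try_insert(sql, params=()):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

first = try_insert("INSERT INTO employee (fname, lname, ssn) VALUES (?, ?, ?)",
                   ("Richard", "Marini", "653298653"))
# Same primary key again: violates PRIMARY KEY (ssn).
duplicate_key = try_insert("INSERT INTO employee (fname, lname, ssn) VALUES (?, ?, ?)",
                           ("Rick", "Marini", "653298653"))
# fname omitted: violates NOT NULL because the column has no default.
missing_name = try_insert("INSERT INTO employee (lname, ssn) VALUES (?, ?)",
                          ("Jones", "111111111"))

default_dno = conn.execute("SELECT dno FROM employee").fetchone()[0]
print(first, duplicate_key, missing_name, default_dno, sep="\n")
```

The accepted row also picks up `DEFAULT 1` for the omitted DNO column, which is why the short form of the INSERT passes the NOT NULL check on that column.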
CH10.61
DELETE SQL Queries
Sample Deletes Include
    DELETE FROM EMPLOYEE
    WHERE LNAME='Brown'

    DELETE FROM EMPLOYEE
    WHERE SSN='123456789'

    DELETE FROM EMPLOYEE
    WHERE DNO IN
        (SELECT DNUMBER
         FROM DEPARTMENT
         WHERE DNAME='Research')

    DELETE FROM EMPLOYEE
No. of Tuples Deleted Dependent on WHERE Clause
Referential Integrity (Type Checking!) is Enforced During DELETE
CH10.62
UPDATE SQL Queries
Give all Employees in the 'Research' Dept. a 10% Raise
    UPDATE EMPLOYEE
    SET SALARY = SALARY * 1.1
    WHERE DNO IN
        (SELECT DNUMBER
         FROM DEPARTMENT
         WHERE DNAME='Research')
Modified SALARY Value Depends on the Original SALARY Value in each Tuple
SALARY = SALARY * 1.1 - Use PL Interpretation
CH10.63
Query Processing and Optimization
What are the Processing Issues for DBs?
    Database Applications of Today and Tomorrow Require High Volumes of Information!
    Increase of Information Still Requires High Performance!
        Throughput and Response Time
    Where's the Bottleneck in DBS?
        CPU ??
        Main Memory Size/Speed ??
        Virtual Memory Limitations ??
        Communications Bus ??
        I/O Channel ??
How Does this Relate to Compilers/PLs?
CH10.64
90-10 Rule for Database Processing
Load (Transactions per Second) vs. Performance (Response Time of Transactions)
Processing of Large Amounts of Raw Data
    Addressed in Secondary Storage
    Staged to Main Memory
Identifying Relevant Data
    Large Amounts of Raw Data Discarded
    Focus on Data Most Likely to Contain Answers
    Possible Loss of CPU and Main Memory Cycles
This is Double Jeopardy!
    Load of DBS Must be Reduced
    Performance of DBS Degrades
CH10.65
90-10 Rule for Conventional DBS
[Figure: Application Programs, Operating System, Database Functions, On-Line I/O, Disk I/O]
Only 10% of Raw Data is Relevant
Only 10% of Relevant Data has Answers
Note: Naive Approach to Database Searching Often Occurs (Little or No Indexing in Practice!)
CH10.66
CSE4100
Query Optimization Goal
Limit Costly Join Operation by Reducing Data to be Scanned or that Participates in the Join
While Improving Selection and Projection can Help, the Main Objective is Join
  In Worst Case - Cartesian Product
  Can Improve by Introducing Indices on the Join Attributes (R.B and S.C) to Limit "Product"
  Can Further Improve by Sorting on the Join Attributes (R.B and S.C)
  This Reduces Block Accesses by Limiting the Number of Blocks that Must be Examined in a Join
  If B's Values Range from 0 to 100 and C's from 50 to 150, Only Need to Compare from 50 to 100
Focus is on Reducing Costly Ops - Same as PL Optimization to Replace * with +
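The range-limiting idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the join attributes R.B and S.C hold integers, tuples are plain Python tuples, and the helper names `overlap` and `range_limited_join` are hypothetical, not part of any DBMS.

```python
def overlap(r_range, s_range):
    """Return the (low, high) interval shared by two closed ranges, or None."""
    low = max(r_range[0], s_range[0])
    high = min(r_range[1], s_range[1])
    return (low, high) if low <= high else None


def range_limited_join(r_rows, s_rows):
    """Equijoin on R.B = S.C, comparing only values inside the shared range.

    r_rows holds (a, b) tuples and s_rows holds (c, d) tuples (assumed layout).
    """
    if not r_rows or not s_rows:
        return []
    shared = overlap(
        (min(b for _, b in r_rows), max(b for _, b in r_rows)),
        (min(c for c, _ in s_rows), max(c for c, _ in s_rows)),
    )
    if shared is None:
        return []
    low, high = shared
    # Discard tuples that cannot match before doing any comparison work.
    r_kept = [row for row in r_rows if low <= row[1] <= high]
    s_kept = [row for row in s_rows if low <= row[0] <= high]
    # Index S by join value so each R tuple needs one dictionary lookup.
    index = {}
    for c, d in s_kept:
        index.setdefault(c, []).append(d)
    return [(a, b, d) for a, b in r_kept for d in index.get(b, [])]
```

With B ranging over 0 to 100 and C over 50 to 150, `overlap((0, 100), (50, 150))` yields the 50 to 100 window named on the slide, and tuples outside it are never compared.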
CH10.67
CSE4100
Query Processing
Internal Data Structure
  Memory Hierarchy: Main Memory + Secondary Memory
  Information Must be Staged from Secondary to Primary Memory for Database Operation
Sequential Search
  Brute Force Approach
Direct Access (Indexed Search)
  Hash, Inverted Index File, Binary Search Tree, B-tree, B+-tree
  Improves Selection by Focusing on Subset of Tuples that are Involved in the Answer, and Equijoin by Not Having to Compare All Blocks in Two Relations
CH10.68
CSE4100
Algorithms for Database Query Operators
Largely Fall into Three Classes: Sorting-Based Methods, Hash-Based Methods, Index-Based Methods
Such Algorithms are Divided into Three Degrees of Difficulty and Cost (Limiting Factor is Size of Data)
  One-pass Algorithms
    Where Data is Only Read Once From Disk
  Two-pass Algorithms
    Data is Read from Disk, Processed in Some Way, Written Back to Disk, Read Again for Processing, etc.
  Multi-pass Algorithms
    Where 3 or More Passes Are Required, i.e., Recursive Generalization of the Two-pass Algorithms
Akin to Multiple Pass Compilers at Data Level
CH10.69
CSE4100
Database Join and Sort are External
[Figure: memory buffer of blocks numbered 1, 2, 3, ..., 1000]
Suppose that your DBS has 1,000 1K Blocks of Memory Available for Performing Operations (e.g., Select, Project, Join, Union, Aggregation, etc.)
Suppose Sort R by R.B
  R Contains 5000 Blocks
  In order to Perform a Sort/Merge - You Must Use External Algorithm since all 5000 Blocks Cannot Fit Into Memory at the Same Time
Suppose Join R (500 Blocks) and S (800 Blocks)
  Again - their Total (1,300 Blocks) Exceeds Memory - Hence you Must Take an Approach that Compares One Block of R with All Blocks of S, etc. (Slides 22,23)
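A two-pass external sort/merge for the sort scenario above might look like the following Python sketch. It simulates disk blocks as in-memory lists, so it shows only the control flow; the function name `external_sort` and the block layout are assumptions made for illustration.

```python
import heapq


def external_sort(blocks, memory_blocks):
    """Two-pass sort/merge over a list of blocks (each block is a list of keys).

    Pass 1 sorts runs of at most `memory_blocks` blocks; pass 2 merges the runs.
    Returns the sorted keys and the number of runs created in pass 1.
    """
    runs = []
    for start in range(0, len(blocks), memory_blocks):
        chunk = blocks[start:start + memory_blocks]
        # Only this chunk is "in memory"; a real DBMS would write the run to disk.
        run = sorted(key for block in chunk for key in block)
        runs.append(run)
    # heapq.merge consumes the sorted runs lazily, one element at a time.
    merged = list(heapq.merge(*runs))
    return merged, len(runs)
```

For R with 5000 blocks and 1000 blocks of memory, pass 1 produces five sorted runs and pass 2 merges them, so each block is read twice.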
CH10.70
CSE4100
Database Join and Sort are External
What's True about Today's DBMS Like Oracle?
Oracle Recommends 2 Gigabytes of Primary Memory
That 2 Gigabytes Must be Shared by:
  Operating System
  Other Applications Running on "Same" Server (Web Server, etc.)
  Database Management Software
Even if there was 1.5 Gigabytes Available, Modern DBs can Exceed that Size Very Easily
Moreover,
  Cartesian Product Could Exceed Available Mem.
  Join Could Require External Approach Since All Tables Involved in Join Can't Fit in 1.5 Gigabytes
External Sorting/Block Oriented Processing is Norm
CH10.71
CSE4100
The System Catalog
Store the Meta Information that Describes Each Database, Including a Description of
  Conceptual Database Schema (Logical Data Model)
    Relations, Attributes, Keys, Indexes, Views
  Internal Schema
  External Schema
Store Information Needed by Specific DBMS Modules
  Query Optimization Module
  Security and Authorization
CH10.72
CSE4100
Example of Catalog Information
CH10.73
CSE4100
Relational DBMS Catalog
All Metadata Stored as Relations
Example of Metadata Tables are:
CH10.74
CSE4100
Uses of System Catalog
DDL Compilers:
  Correct Definition of Relations and Attributes
DML (Query) Compiler:
  DML Parser
    Guided by the Description of DML Syntax and the Schema Information in the Catalog, Generates a Query Tree after Parsing
  Optimizer
    Generates Access Paths that are Relatively Optimal for Executing a Query/DML Command, by Accessing the Database Structure Information (Schemas), and Mapping High-level SQL Queries Into Low-level File Access Commands
Example Query:
  SELECT EMP.ENAME
  FROM EMP, WORKS, PROJ
  WHERE (EMP.ENO = WORKS.ENO)
  AND (WORKS.PNO = PROJ.PNO)
  AND (PROJ.PNAME = 'CAD/CAM')
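To make the catalog's role in query compilation concrete, here is a minimal Python sketch of a semantic check against catalog metadata. The catalog contents are an assumption inferred from the attributes the example query mentions (the real catalog tables are not shown in this transcript), and `check_references` is a hypothetical helper, not a DBMS API.

```python
# Hypothetical catalog: relation name -> set of attribute names.
CATALOG = {
    "EMP": {"ENO", "ENAME"},
    "WORKS": {"ENO", "PNO"},
    "PROJ": {"PNO", "PNAME"},
}


def check_references(relations, attribute_refs, catalog=CATALOG):
    """Return a list of error strings for names the catalog does not define."""
    errors = []
    for rel in relations:
        if rel not in catalog:
            errors.append(f"unknown relation {rel}")
    for ref in attribute_refs:
        rel, attr = ref.split(".")
        if rel not in relations:
            errors.append(f"{ref}: relation {rel} not in FROM clause")
        elif rel in catalog and attr not in catalog[rel]:
            errors.append(f"{ref}: no attribute {attr} in {rel}")
    return errors
```

Called with the relations and qualified attributes of the example query, the check returns an empty list; a reference such as `EMP.SALARY` would be reported before any optimization work begins.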
CH10.75
CSE4100
Revisit Typical Database Processing
[Figure: processing pipeline from User Transaction to Response to User, with these components]
  Pre-Processing: Parser/Lexical; Optimizer/Views
  High-Level Processing: Enqueue Trans.; Request Locks; Release Locks; Dequeue Trans.
  Low-Level Processing: Enqueue Trans.; Request Locks; Issue I/Os; Process Returned Data; Integrity Checks; Security Checks; Logging for Recovery; Release Locks; Dequeue Trans.
  Post-Processing: Collection of Results; Aggregation Operations; Security Checks
  Concurrency Control: Lock Request; Response to Lock Request
  Disk I/O: I/O Request; Results
  Recovery
  Arrow labels: Parsed and Optimized User Trans.; Errors; Results
CH10.76
CSE4100
Typical Database Processing
Pre-Processing
  Actions Taken Upon Receipt of a Query from User
    SQL Query via Query Tool or JDBC Call
  "Compilation" of DB Query
    Check Syntax, Semantics, Optimize, Develop Run-Time Strategy (Similar to PL Compilation)
  Query is Translated to DB Transaction
    A Transaction Contains Multiple DB Operations
    Transaction has Explicit Order of Operations
  Database Transaction Must Succeed or Fail
    There is no Intermediate State
    Completely Executed and Committed, or Aborts at any Point and is Undone
    New State or Previous State of DB
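The all-or-nothing behavior described above can be observed with Python's standard `sqlite3` module, used here purely as a stand-in for a full DBMS. The `account` table, its contents, and the `transfer` and `balances` helpers are made up for this sketch.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    # Used as a context manager, the connection commits on success and
    # rolls back if the block raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")


def balances(conn):
    """Return {account id: balance} for every row."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

A transfer that would overdraw the source account raises inside the `with` block, so both updates are undone and the database returns to its previous state; a valid transfer leaves the new state.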
CH10.77
CSE4100
Typical Database Processing
High-Level Processing
  Enqueue Transaction from Pre-Processing
    Transaction Must Wait for "Earlier" Transactions
    Remember - Shared DB State!
  Request Locks from Concurrency Control
    All Locks Before Proceeding vs. Locks as Needed
    Avoid Deadlock and Livelock
  Release Locks
    As Use of Data Completes to Increase Availability
    What Happens if a Later Step in the Transaction Fails?
  Dequeue Transaction
    Completes Transaction Processing
    Return "Result" to Post-Processing
CH10.78
CSE4100
Typical Database Processing
Low-Level Processing
  Enqueue Transaction - Do Actual DB Operations
  Request Locks - Lower Granularity Level
  Issue I/Os - Based on Operations to Access "Correct" and "Relevant" DB Records
  Process Returned Data - Aggregation, Sorting
  Integrity Checks: Do I/D/U Satisfy Constraints?
  Security Checks: Is DB R/I/D/U Allowed?
  Logging for Recovery - Commit the Transaction
  Release Locks - Available to Others
  Dequeue Transaction - Return Results to High-Level Processing
Note: The Multiple Operations of Each DB Transaction All Must be Successful
CH10.79
CSE4100
Typical Database Processing
Post-Processing
  Collection of Results
    May be Passed Portions of Results as they Complete
    For Example, Sorted Blocks of Data that are then Merged in a Final Step
  Aggregation Operations
    May be Passed Aggregate Intermediate Results
    Sum for Different Departments to be Totaled
  Security Checks
    Last Step Filtering to Ensure Only Allowed Data is Returned
    May Execute Query but Only See Aggregate Result
  Send Results to User
CH10.80
CSE4100
Typical Database Processing
Concurrency Control
  Control Access to Information
    Data and Metadata
    Prevent Simultaneous Updates
    Ensure Database Always Correct and Consistent
    Serial Schedule vs. Serializable Transaction
  Two Types
    Pessimistic - Locking-Based - Assume Collisions Will Occur - e.g., Peoplesoft Course Registration
    Optimistic - Time-Based - Fix Problems After the Fact - e.g., ATM Machines Example
  CC Manages Locks at Different Granularity Levels (Table, Attribute, View, Tuple, Metadata, etc.)
CH10.81
CSE4100
Typical Database Processing
Disk I/O
  Performs the Actual Disk I/O for Read/Writes
    Block Oriented Activity
  Maintain Queue of All I/O Requests
    Ordering is Critical
    Related to Concurrency Control and Consistency
  Single DB Transactions can have Multiple DB Operations
    Disk I/Os for Different Operations at Different Times
    High and Low Level Processing will Determine What Operations Needed When
  Disk I/O - Relatively "Dumb"
CH10.82
CSE4100
Typical Database Processing
Recovery
  Tightly Tied to DB Transaction Concept
  Transactions Must be:
    Atomic - Happens or Doesn't
    Durable - Once Committed, Results Survive Failure
    Consistent - Follows Protocol/Correct DB State
  When Failure Occurs, Can we:
    Recover to a Correct "Earlier" State
    Reconcile all "Active" Transactions that were Executing at Failure Time
  Involves Logging of Database Actions
  Objective: High Availability and Reliability
CH10.83
CSE4100
Query Optimization
Not Really Optimizing, but Planning to Avoid Bad Execution Strategies
Models
  Heuristics-Based
    Apply Transformation Rules According to a General Strategy
    Focus on Relational Algebra that Underlies Each Query
    Improve the "Order" of Relational Operations
  Cost-Based
    Minimize a Cost Function: I/O Cost + CPU Cost
    Subject to a Set of Constraints
CH10.84
CSE4100
Query Processing Methodology
[Figure: High-level Calculus-based Query, then Query Preprocessing, then Algebraic Query (a tree structure), then Query Optimization, then Execution Schedule (file access plan); schema boxes labeled EXTERNAL SCHEMA, LOGICAL SCHEMA, and INTERNAL SCHEMA]
CH10.85
CSE4100
Refute Incorrect Queries
Example:
  E(ENAME, ENO), P(JNO,JNAME), W(ENO,PNO,DUR)
  SELECT ENAME, PNAME
  FROM E, P, W
  WHERE DUR > 27 AND DUR < 25
Incorrect
  Disjoint Components are Useless
  Multiple Relations, Missing Joins, May Not be Incorrect, but May Indicate Cartesian Product
Contradictory
  Qualification Cannot be Satisfied by any Tuple
  DUR > 27 AND DUR < 25
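One way to detect a contradictory qualification is to intersect the bounds that the ANDed comparisons place on an attribute. The Python sketch below assumes a single numeric attribute with a dense (real-valued) domain and only the four ordering operators; `satisfiable_range` is a hypothetical helper, not a description of how any particular DBMS does it.

```python
import math


def satisfiable_range(predicates):
    """Intersect single-attribute comparisons joined by AND.

    `predicates` is a list of (op, value) pairs with op in {">", ">=", "<", "<="}.
    Returns (low, low_inclusive, high, high_inclusive) or None if no value fits.
    """
    low, low_inc = -math.inf, False
    high, high_inc = math.inf, False
    for op, value in predicates:
        if op in (">", ">="):
            inclusive = op == ">="
            # Keep the tighter lower bound; on ties a strict bound is tighter.
            if value > low or (value == low and not inclusive):
                low, low_inc = value, inclusive
        elif op in ("<", "<="):
            inclusive = op == "<="
            if value < high or (value == high and not inclusive):
                high, high_inc = value, inclusive
        else:
            raise ValueError(f"unsupported operator {op}")
    if low > high or (low == high and not (low_inc and high_inc)):
        return None
    return low, low_inc, high, high_inc
```

For `DUR > 27 AND DUR < 25` the lower bound exceeds the upper bound, so the function returns `None` and the query can be refuted without touching the data. The same intersection also drops redundant bounds: `DUR > 27 AND DUR > 25` reduces to `DUR > 27`.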
CH10.86
CSE4100
Simplification
Why Simplify?
  The Simpler the Query, the Less Work there is and the Better the Performance
How? Use Transformation Rules
  Elimination of Redundancy
    Idempotency Rules
  Application of Transitivity
  Use of Integrity Rules
Example
  x > a and x > b
  DUR > 27 AND DUR > 25 Simplifies to DUR > 27
CH10.87
CSE4100
Restructuring
Convert Relational Calculus to Relational Algebra
Make use of Query Trees
Example
  Find the names of employees other than J. Doe who worked on the CAD/CAM project for either 1 or 2 years.
  SELECT ENAME
  FROM E, W, P
  WHERE E.ENO=W.ENO
  AND W.JNO=P.JNO
  AND E.ENAME<>"J. Doe"
  AND P.JNAME="CAD/CAM"
  AND (W.DUR=12 OR W.DUR=24)
[Figure: query tree, from root to leaves]
  Project: ENAME
  Select: (DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME<>"J. DOE"
  Join: on JNO and on ENO
  Leaves: P, W, E
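A query tree like the one in this example can be represented with a handful of node types. The sketch below is illustrative only: the class names, the `render` helper, and the assumed join grouping (W joined with E on ENO, then joined with P on JNO) are choices made here, since the figure's exact layout did not survive extraction.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Join:
    attr: str
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Select:
    predicate: str
    child: "Node"


@dataclass(frozen=True)
class Project:
    attrs: tuple
    child: "Node"


Node = Union[Relation, Join, Select, Project]


def render(node):
    """Render a query tree as a one-line relational algebra string."""
    if isinstance(node, Relation):
        return node.name
    if isinstance(node, Join):
        return f"({render(node.left)} JOIN[{node.attr}] {render(node.right)})"
    if isinstance(node, Select):
        return f"SELECT[{node.predicate}]({render(node.child)})"
    return f"PROJECT[{','.join(node.attrs)}]({render(node.child)})"


# Assumed grouping of the joins; the leaves P, W, E come from the slide.
tree = Project(
    ("ENAME",),
    Select(
        '(DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME<>"J. Doe"',
        Join("JNO", Relation("P"), Join("ENO", Relation("W"), Relation("E"))),
    ),
)
```

Because the nodes are immutable values, an optimizer pass can build a rewritten tree (for example, with a selection moved below a join) without disturbing the original.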
CH10.88
CSE4100
Query Optimization Objectives
Improving Performance
Arriving at a Query Plan of Execution
Analyzing the Relational Algebra Query
  Replace Costly Operations
  Do Selections and Projections Early
Optimization Heuristics for the Relational Algebra
  Performing Selection and Projection Before Join
  Combining Several Selections Over a Single Relation Into One Selection
  Find Common Subexpressions
  Algebraic Rewriting/Transformation Rules
General Transformation Rules for Relational Algebra (Equivalence-preserving Algebraic Rewriting Rules)
CH10.89
CSE4100
Query Optimization: An Example
Why is it important?
  SELECT ENAME
  FROM E, W
  WHERE E.ENO = W.ENO AND W.RESP = "Manager"
Strategy 1
  $\pi_{ENAME}(\sigma_{RESP=\text{"Manager"} \wedge E.ENO=W.ENO}(E \times W))$
Strategy 2
  $\pi_{ENAME}(E \bowtie_{ENO} \sigma_{RESP=\text{"Manager"}}(W))$
CH10.90
CSE4100
Cost of Alternatives
Assume:
  card(E) = 4,000; card(W) = 10,000
  10% of Tuples in W Satisfy RESP="Manager" (Selection Generates 1,000 Tuples)
  Execution Time Proportional to the Sum of the Cardinalities of the Temporary Relations
  Searching is Done by Sequential Scanning

Strategy 1                           Strategy 2
Cartesian prod.    = 40,000,000      Selection over W    = 10,000
Search over all    = 40,000,000      Join (4000*1000)    = 4,000,000
Total              = 80,000,000      Total               = 4,010,000
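The arithmetic behind the two totals can be reproduced with a short Python sketch. It follows the slide's cost model (sum of the cardinalities processed at each step); the function name and parameter names are chosen here for illustration.

```python
def strategy_costs(card_e, card_w, selected_w):
    """Sum of temporary-relation cardinalities for the two strategies.

    `selected_w` is the number of W tuples that satisfy the selection.
    """
    # Strategy 1: build E x W, then scan all of it for the selection.
    cartesian = card_e * card_w
    strategy_1 = cartesian + cartesian
    # Strategy 2: scan W for the selection, then join E with the survivors.
    strategy_2 = card_w + card_e * selected_w
    return strategy_1, strategy_2
```

With card(E) = 4,000, card(W) = 10,000, and 1,000 selected tuples, this gives 80,000,000 versus 4,010,000, so selecting first is roughly 20 times cheaper under this model.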
CH10.91
CSE4100
General Query Optimization Strategy
Perform Selections Early
  Yields Smaller Intermediate Results
  Direct Impact on Subsequent Join/Cartesian Prod.
Combine Selections with a Prior Cartesian Product into a Theta or Equi Join
  Join is a Cheaper Operation
Combine (Cascade) Selections and Projections
  $\pi_{A}(\pi_{B}(R)) \equiv \pi_{A}(R)$, where $A \subseteq B$
  $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_1 \wedge p_2}(R)$
  This Results in One Pass Instead of Two over Table
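The effect of cascading two selections into one can be shown with a small Python sketch. The rows, predicates, and the helpers `select` and `combine` are invented for illustration; each call to `select` stands for one sequential pass.

```python
def select(predicate, rows):
    """One sequential pass over `rows`, keeping those that satisfy `predicate`."""
    return [row for row in rows if predicate(row)]


def combine(*predicates):
    """Fold several predicates into one conjunction (p1 AND p2 AND ...)."""
    return lambda row: all(p(row) for p in predicates)


rows = [{"ENO": i, "DUR": d} for i, d in enumerate([6, 12, 24, 30, 36])]
p1 = lambda row: row["DUR"] > 10
p2 = lambda row: row["DUR"] < 30

two_pass = select(p1, select(p2, rows))     # scans rows, then the intermediate
one_pass = select(combine(p1, p2), rows)    # scans rows once
```

Both forms return the same tuples, but the combined predicate avoids materializing and rescanning the intermediate result.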
CH10.92
CSE4100
General Query Optimization Strategy
Identify Common Subexpressions
  Compute Once and Store
  Use Stored Version for Subsequent Times
  Often Useful When Views are Employed
Preprocess Data via Sorts and Indexes
  Speeds up Searches and Joins by Limiting Scope
Evaluate and Assess Different Options
  For Cartesian Product, Use Smaller Relation for Comparison
  Use System Catalog (Meta-data) to Effect Order in Query Execution Plan
CH10.93
CSE4100
Relational Algebra Transformations
Cascade of Selection
  $\sigma_{p_1 \wedge p_2 \wedge \dots \wedge p_n}(R) \equiv \sigma_{p_1}(\sigma_{p_2}(\dots(\sigma_{p_n}(R))\dots))$
Commutativity of Selection
  $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_2}(\sigma_{p_1}(R))$
  $\sigma_{p_1 \vee p_2}(R) \equiv \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$
Cascade of Projection
  $\pi_{A_1}(\pi_{A_2}(\dots(\pi_{A_n}(R))\dots)) \equiv \pi_{A_1}(R)$ if $A_1 \subseteq A_2 \subseteq \dots \subseteq A_n$
Commuting Selection with Projection
  $\pi_{A_1,A_2,\dots,A_n}(\sigma_{p}(R)) \equiv \sigma_{p}(\pi_{A_1,A_2,\dots,A_n}(R))$, where $p$ Refers Only to Attributes in $A_1,\dots,A_n$
CH10.94
CSE4100
Relational Algebra Transformations
Commutativity of Theta Join and Cartesian Product
  $R \bowtie_{A} S \equiv S \bowtie_{A} R$
  $R \times S \equiv S \times R$
Commuting Selection with Theta Join (Cartesian)
  $\sigma_{p(A)}(R \times S) \equiv (\sigma_{p(A)}(R)) \times S$ (A Defined on R Only)
  $\sigma_{p(A) \wedge p(B)}(R \times S) \equiv (\sigma_{p(A)}(R)) \times (\sigma_{p(B)}(S))$ (A Defined on R, B Defined on S)
  Also Holds for Theta Join
Commuting Projection with Theta Join (Cartesian)
  $\pi_{C}(R \times S) \equiv \pi_{A}(R) \times \pi_{B}(S)$ where $A \cup B = C$
  A are Attributes in C for R and B are Attributes in C for S
CH10.95
CSE4100
Relational Algebra Transformations
Commutativity of Set Operations
  $R \cup S \equiv S \cup R$
  $R \cap S \equiv S \cap R$
Associativity of Set Operations
  $(R \cup S) \cup T \equiv R \cup (S \cup T)$
  $(R \cap S) \cap T \equiv R \cap (S \cap T)$
  $(R \times S) \times T \equiv R \times (S \times T)$
  $(R \bowtie S) \bowtie T \equiv R \bowtie (S \bowtie T)$
Commuting Select with Set Operations
  $\sigma_{p(A_i)}(R \cup T) \equiv \sigma_{p(A_i)}(R) \cup \sigma_{p(A_i)}(T)$ where $A_i$ is Defined on Both R and T
  $\sigma_{p(A_i)}(R \cap T) \equiv \sigma_{p(A_i)}(R) \cap \sigma_{p(A_i)}(T)$ where $A_i$ is Defined on Both R and T
CH10.96
CSE4100
Relational Algebra Transformations
11. Commuting Projection with Union
  $\pi_{C}(R \bowtie_{q(A_j,B_k)} S) \equiv \pi_{A'}(R) \bowtie_{q(A_j,B_k)} \pi_{B'}(S)$
  $\pi_{C}(R \times S) \equiv \pi_{A'}(R) \times \pi_{B'}(S)$
  where R[A] and S[B], $C = A' \cup B'$, $A' \subseteq A$, $B' \subseteq B$
12. Converting Selection/Cartesian Into Theta Join
  $\sigma_{C}(R \times S) \equiv R \bowtie_{C} S$
CH10.97
CSE4100
Heuristic Optimization: Example
E(ENAME, ENO)
P(JNO,JNAME)
W(ENO,PNO,DUR)
Canonical query tree at the end of query preprocessing phase:
  $\pi_{ENAME}(\sigma_{(DUR=12 \vee DUR=24) \wedge JNAME=\text{"CAD/CAM"} \wedge ENAME \neq \text{"J. Doe"}}(P \bowtie_{JNO} (W \bowtie_{ENO} E)))$
CH10.98
CSE4100
Heuristic Optimization: Example
Use cascading of selections rule to decompose selections:
  $\pi_{ENAME}(\sigma_{DUR=12 \vee DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(\sigma_{ENAME \neq \text{"J. Doe"}}(P \bowtie_{JNO} (W \bowtie_{ENO} E)))))$
CH10.99
CSE4100
Heuristic Optimization: Example
Push selection down using commutativity of selection over join:
  $\pi_{ENAME}(\sigma_{DUR=12 \vee DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(P \bowtie_{JNO} (W \bowtie_{ENO} \sigma_{ENAME \neq \text{"J. Doe"}}(E)))))$
CH10.100
CSE4100
Heuristic Optimization: Example
Push selection down using commutativity of selection over join:
  $\pi_{ENAME}(\sigma_{DUR=12 \vee DUR=24}(\sigma_{JNAME=\text{"CAD/CAM"}}(P) \bowtie_{JNO} (W \bowtie_{ENO} \sigma_{ENAME \neq \text{"J. Doe"}}(E))))$
CH10.101
CSE4100
Heuristic Optimization: Example
Push selection down:
  $\pi_{ENAME}(\sigma_{JNAME=\text{"CAD/CAM"}}(P) \bowtie_{JNO} (\sigma_{DUR=12 \vee DUR=24}(W) \bowtie_{ENO} \sigma_{ENAME \neq \text{"J. Doe"}}(E)))$
CH10.102
CSE4100
Heuristic Optimization: Example
Do early projection:
  $\pi_{ENAME}(\pi_{JNO}(\sigma_{JNAME=\text{"CAD/CAM"}}(P)) \bowtie_{JNO} \pi_{JNO,ENAME}(\pi_{JNO,ENO}(\sigma_{DUR=12 \vee DUR=24}(W)) \bowtie_{ENO} \pi_{ENO,ENAME}(\sigma_{ENAME \neq \text{"J. Doe"}}(E))))$
CH10.103
CSE4100
Heuristic Optimization: Example
Identify subtrees that can be implemented in one algorithm
[Figure: the query tree over E, W, and P after early projection, with its projection, selection, and join operators grouped into subtrees]
CH10.104
CSE4100
Heuristic Optimization: A Second Example
[Figure: query tree over Books, Loans, and Borrower]
  Project: Title
  Select: on Date and 1/1/88; Borrower.Card_No = Loans.Card_No; Books.LC_No = Loans.LC_No
  Cartesian Products (X) Combining the Three Relations
  Early Projections: Loans.LC_No, Loans.Card_No; Loans.LC_No; Borr.Card_No; Books.LC_No, Title
What is the Final Step?
  Combine Select and Cartesian Product
  Result: Equijoins!
CH10.105
CSE4100
Cost-Based Optimization
Reduce Defined Cost of Executing Queries
What is Involved in the Cost of Executing a Query?
  Access Cost to Secondary Storage
    Search for Data Block (Index)
    Read/Write Index and Data Blocks
  Storage Cost
    Index and Data Blocks
    Intermediate Files
  Computation Cost
    Query Planning - Optimization Effort
    Record Search, Sort, Merge
    Actual Transaction/Query Operations
  Communications Cost
    Transfer of Results to the User
CH10.106
CSE4100
Complexity of Relational Operations
Assuming
  Relations of Cardinality n
  Sequential Scan of Data in each Relation
Complexity of Each Operation is Indicated

Operation                                      Complexity
Select, Project (w/o duplicate elimination)    O(n)
Project (with duplicate elimination), Group    O(n log n)
Join, Division, Set Operators                  O(n log n)
Cartesian Product                              O(n^2)

Avoid Cartesian Product at All Costs!
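To illustrate why a join can be cheaper than a Cartesian product followed by a filter, the Python sketch below contrasts a sort-merge equijoin with a compare-every-pair baseline. It is a simplified in-memory illustration; the function names and the (key, payload) tuple layout are assumptions made here.

```python
def sort_merge_join(r, s):
    """Equijoin two lists of (key, payload) pairs by sorting then merging."""
    r = sorted(r, key=lambda pair: pair[0])
    s = sorted(s, key=lambda pair: pair[0])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Collect the run of equal keys on each side, then pair them up.
            i_end = i
            while i_end < len(r) and r[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            out.extend((key, a, b) for _, a in r[i:i_end] for _, b in s[j:j_end])
            i, j = i_end, j_end
    return out


def nested_loop_join(r, s):
    """Baseline: compare every pair, as a Cartesian product plus filter would."""
    return [(k1, a, b) for k1, a in r for k2, b in s if k1 == k2]
```

The sorting steps dominate the merge version at O(n log n) when matches are few, while the baseline always performs n times n comparisons.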
CH10.107
CSE4100
Cost-Based Optimization
To Understand Cost-Based Operations, we Must Focus on Implementation Strategy of:
  Select
  Project
  Join
For Select and Project - There is a Fixed Cost that we Must Live With
For Join
  Implementation Strategy
  Different Join Strategies
Objective:
  Minimize the Number of Blocks Involved
Note that Cost-Based and Relational Algebra Heuristic Optimization Can Complement One Another
CH10.108
CSE4100
Optimization Summary
Most Systems Implement Only a Few Strategies
The Number of Strategies that are Considered by Any Query Optimizer is Limited
Some Systems Reduce the Number of Strategies by Making a Heuristic Guess of Strategy for Each Query
  The Optimizer Considers Every Possible Strategy, but Terminates as Soon as it Determines the Cost is Greater than the Pre-chosen Strategy
  Thus Only a Few Competing Strategies Require Full Analysis of the Cost
  The Overhead of Query Optimization is Reduced
Remember - Trade off in Optimization Time
  For PL - Optimization is Pre-Execution (Compile)
  For DB - Optimization is Part of Execution (Run)
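The early-termination idea in the summary can be sketched as follows. This Python example assumes each candidate plan is described by a list of per-step cost estimates; the plan names, the numbers for the third plan, and the function `choose_plan` are hypothetical.

```python
def choose_plan(plans, initial_guess):
    """Pick the cheapest plan, abandoning a candidate once it exceeds the best.

    `plans` maps a plan name to a list of per-step costs (hypothetical numbers).
    `initial_guess` names the heuristically pre-chosen plan.
    Returns (best_name, best_cost, names_costed_in_full).
    """
    best_name = initial_guess
    best_cost = sum(plans[initial_guess])
    fully_costed = [initial_guess]
    for name, steps in plans.items():
        if name == initial_guess:
            continue
        running = 0
        for step_cost in steps:
            running += step_cost
            if running >= best_cost:
                break  # already no better than the current best; stop costing
        else:
            # Every step was costed and the total beat the current best.
            best_name, best_cost = name, running
            fully_costed.append(name)
    return best_name, best_cost, fully_costed


plans = {
    "cartesian_then_select": [40_000_000, 40_000_000],
    "select_then_join": [10_000, 4_000_000],
    "index_join": [10_000, 1_000_000, 500_000],  # made-up third alternative
}
```

Starting from the heuristic guess `select_then_join`, the Cartesian plan is abandoned after its first step, so only the plans that could actually win are costed in full.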