CH10.1 CSE 4100 Compiler Concepts for Database Systems
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Unit 2155
Storrs, CT 06269-3155
[email protected]
http://www.engr.uconn.edu/~steve
(860) 486-4818

Post on 21-Dec-2015


Page 1: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut


Page 2: Overview
Motivation and Background
Database System Architecture
  Exploring its Capabilities
  Focusing on Compiler-Related Concepts
Compile Time Issues in Database Systems
  The SQL Query Language
  Optimization Issues in Database Systems
  Typing
Runtime Issues in Database Systems
  Transaction Processing
  Execution for Complex Joins

Page 3: Database System Architecture
What are the Various Components?
How do they Relate to Compilers?

Page 4: How Does it Compare to the Java Environment?

Page 5: Database Concepts - Summary
Schema vs. Data
  Database: Structured Collection of Data Describing Objects of the Universe of Discourse being Modeled
  A Database Consists of Schema and Data
    Schema: Describes the Intension (Type) of Objects
    Data: Describes the Extension (Instances) of Objects
What is Schema w.r.t. Compilers? What is Data?

Page 6: What is a DBMS?
A Database Management System (DBMS) is the Generalized Tool that Facilitates the Management of and Access to the Database
Main Functions:
  Defining a Database: Specifying Data Types, Structures, and Constraints
  Constructing a Database: the Process of Storing the Data Itself on Some Storage Medium
  Manipulating a Database: Functions for Querying Specific Data in the Database and Updating the Database
What are the Analogies of Each of the Main Functions w.r.t. Programming Languages and Compilers?

Page 7: What is a DBMS?
Additional Functions:
  Interaction with File Manager
    So that Details Related to Data Storage and Access are Removed From Application Programs
  Integrity Enforcement
    Guarantee Correctness, Validity, Consistency
  Security Enforcement
    Prevent Data From Illegal Uses
  Concurrency Control
    Control the Interference Between Concurrent Programs
  Recovery from Failure
  Query Processing and Optimization
Again – What are Relevant Compiler Concepts?

Page 8: DBMS Architecture
DBMS Languages
  Data Definition Language (DDL)
  Data Manipulation Language (DML)
    From Embedded Queries or DB Commands Within a Program
    “Stand-alone” Query Language
Host Language:
  DML Specification (e.g., SQL) is Embedded in a “Host” Programming Language (e.g., Java, C++)
DBMS Interfaces
  Menu-Based Interface
  Graphical Interface
  Forms-Based Interface
  Interface for DBA (DB Administrator)
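As a rough illustration of DML embedded in a host language, the sketch below uses Python with its standard sqlite3 module in place of the Java or C++ hosts named on the slide. The table, the sample rows, and the function name employees_in_department are assumptions made only for this example.

```python
import sqlite3

def employees_in_department(conn, dno):
    """Host-language function wrapping an embedded DML query.

    The '?' placeholder binds the host variable dno into the SQL text.
    """
    cur = conn.execute(
        "SELECT LNAME FROM EMPLOYEE WHERE DNO = ? ORDER BY LNAME", (dno,)
    )
    return [row[0] for row in cur.fetchall()]

# Hypothetical in-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT NOT NULL, DNO INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Smith", 5), ("Wong", 5), ("Wallace", 4)],
)

dept5_staff = employees_in_department(conn, 5)
print(dept5_staff)
```

The host program owns control flow and variables; the DBMS compiles and runs the SQL string it is handed.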

Page 9: DBMS Architecture
Main DBMS Modules
  DDL Compiler
  DML Compiler
  Ad-hoc (Interactive) Query Compiler
  Run-time Database Processor
  Stored Data Manager
  Concurrency/Back-Up/Recovery Subsystem
DBMS Utility Modules
  Loading Routines
  Backup Utility
  System Catalog/Data Dictionary

Page 10: Components of a DBMS

Page 11: ANSI/SPARC - Three Schema Architecture
External Data Schema (Users’ view)
Conceptual Data Schema (Logical Schema)
Internal Data Schema (Physical Schema)
What are the Programming Language Analogies?

Page 12: Conceptual Schema
Describes the Meaning of Data in the Universe of Discourse
  Emphasizes General, Conceptually Relevant, and Often Time-Invariant Structural Aspects of the Universe of Discourse
Excludes the Physical Organization and Access Aspects of the Data
This could be a UML Design that Realizes a Set of Classes (no data) or Java Class Declarations (APIs)

Page 13: Conceptual Schema
Another Example – A Programming Language Level Definition

Page 14: External Schema
Describes Parts of the Information in the Conceptual Schema in a Form Convenient to a Particular User Group’s View
Derived from the Conceptual Schema
What is the View of the Outside World in OO?
  Akin to Public Interface

Page 15: External Schema
Another Example

Page 16: Internal Schema
Describes How the Information Described in the Conceptual Schema is Physically Represented in a Database to Provide the Overall Best Performance

Page 17: Internal Schema
Another Example
This Corresponds to Data Typing and Layout in Compilers from Runtime Environment!

Page 18: Unified Example of Three Schemas

Page 19: Database Access Process
What Does This Access Process Resemble?
  Akin to Runtime Execution Environment!
  A More Complex Activation Process!

Page 20: Metadata vs. Data
Recall Introspection and Reflection in Java where you Can “Look” into the Class Definitions Themselves!
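The reflection analogy can be made concrete with a small sketch: a program asks the DBMS catalog about table definitions, much as reflection inspects class definitions. The example assumes SQLite (through Python's sqlite3 module), whose catalog table is sqlite_master; other systems expose their catalogs differently. The Student row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name CHAR(30), SSN CHAR(9), Gpa FLOAT)")
conn.execute("INSERT INTO Student VALUES ('Ann', '123456789', 3.5)")  # made-up row

# Metadata: which tables exist, according to SQLite's catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# Metadata: column names and declared types of one table.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Student)")]

# Data: the instances stored under that schema.
rows = conn.execute("SELECT * FROM Student").fetchall()

print(tables, columns, rows)
```

The first two queries read the schema (intension); only the last one touches the data (extension).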

Page 21: Data Independence
Ability that Allows Application Programs Not to be Affected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods
Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users
Recall Software Engineering Concepts:
  Abstraction: the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design
  Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users
Realized in Today’s Modern PLs!

Page 22: What are System Components?
How are these Similar to Compiler/PL Concepts?

Page 23: Relational Model
Relational Model of Data Based on the Concept of a Relation
Relation - a Mathematical Concept Based on Sets
Strength of the Relational Approach to Data Management Comes From the Formal Foundation Provided by the Theory of Relations
RELATION: A Table of Values
  A Relation May Be Thought of as a Set of Rows
  A Relation May Alternately be Thought of as a Set of Columns
  Each Row of the Relation May Be Given an Identifier
  Each Column Typically is Called by its Column Name or Column Header or Attribute Name

Page 24: Relational Tables - Rows/Columns/Tuples

Page 25: Relational Database Definition
  CREATE TABLE Student: Name(CHAR(30)), SSN(CHAR(9)), Gpa(FLOAT(2))
  CREATE TABLE Faculty: Name(CHAR(30)), SSN(CHAR(9)), Ophone(CHAR(7))
  CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6))
  CREATE TABLE Formats: Section#(INTEGER(3)), Quarter(CHAR(10)), Campus(CHAR(15))
  CREATE TABLE TakeorTeach: SSN(CHAR(9)), Course#(CHAR(6)), Section#(INTEGER(3))
  CREATE TABLE COfferings: Course#(CHAR(6)), Section#(INTEGER(3))

Student(Name*, SSN, Gpa)
Faculty(Name*, SSN, Ophone)
Courses(Course#*, Title, Descrip, PCourse#*)
Formats(Section#*, Quarter, Campus)
TakeorTeach(SSN, Course#, Section#)
COfferings(Course#, Section#)

Page 26: Relational Views
Two Views Derived From Prior Tables
  Student Transcript View
  Course Prerequisite View

Page 27: SQL: Tuple Relational Calculus-Based
SQL is a Partial Example of a Tuple Relational Language
  Simple Queries are all Declarative
  More Complex Queries are both Declarative and Procedural (e.g., joins, nested queries)
Find the names of employees working on the CAD/CAM project
  SELECT EMP.ENAME
  FROM EMP, WORKS, PROJ
  WHERE (EMP.ENO = WORKS.ENO) AND (WORKS.PNO = PROJ.PNO) AND (PROJ.PNAME = 'CAD/CAM')
SQL Defines a Programming Language and Associated Semantics for Usage and Processing

Page 28: SQL Components
Data Definition Language (DDL)
  For External and Conceptual Schemas
  Views - DDL for External Schemas
Data Manipulation Language (DML)
  Interactive DML Against External and Conceptual Schemas
  Embedded DML in Host PLs (EQL, JDBC, etc.)
Note: Separation of Definition (DDL) from Usage (DML) – Is there Something Similar in PLs?
Others
  Integrity (Allowable Values/Referential)
  Transaction Control (Long-Duration and Batch)
  Authorization (Who can Do What When)

Page 29: SQL DDL and DML
Data Definition Language (DDL) - Declarations
  Defining the Relational Schema - Relations, Attributes, Domains - The Meta-Data
    CREATE TABLE Student: Name(CHAR(30)), SSN(CHAR(9)), GPA(FLOAT(2))
    CREATE TABLE Courses: Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6))
Data Manipulation Language (DML) - Code
  Defining the Queries Against the Schema
    SELECT Name, SSN
    FROM Student
    WHERE GPA > 3.00
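The split between declarations (DDL) and code (DML) can be tried out with a minimal sketch. It assumes SQLite through Python's sqlite3 module, rewrites the slide's shorthand CREATE TABLE notation in standard column-list syntax, and uses made-up student rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the declaration of the relation (its name, attributes, and types).
conn.execute("""
    CREATE TABLE Student (
        Name CHAR(30),
        SSN  CHAR(9),
        GPA  FLOAT
    )""")

# Made-up instances, loaded so the query has something to run against.
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("Ann", "111111111", 3.6), ("Bob", "222222222", 2.9), ("Cy", "333333333", 3.0)],
)

# DML: the query from the slide, run against the declared schema.
honor_roll = conn.execute(
    "SELECT Name, SSN FROM Student WHERE GPA > 3.00").fetchall()
print(honor_roll)
```

As with a type declaration followed by code that uses it, the DML only makes sense once the DDL has been processed.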

Page 30: Data Definition Language - DDL
A Pre-Defined set of Primitive Types
  Numeric
  Character-string
  Bit-string
  Additional Types
Defining Domains
Defining Schema
Defining Tables
Defining Views
Note: Each DBMS May have its Own DBMS-Specific Data Types - Is this Good or Bad?
  What is this Similar to re. Different C++ Compilers?
  These are Akin to PL Data Types!

Page 31: DDL - Primitive Types
Numeric
  INTEGER (or INT), SMALLINT
  REAL, DOUBLE PRECISION
  FLOAT(N): Floating Point with at Least N Digits
  DECIMAL(P,D) (DEC(P,D) or NUMERIC(P,D)) have P Total Digits with D to Right of Decimal
Note that INTs and REALs are Machine Dependent (Based on Hardware/OS Platform)
Again – this is Similar to PLs/Compilers and Code Generation – Data Layout

Page 32: DDL - Primitive Types
Character-String
  CHAR(N) or CHARACTER(N) - Fixed
  VARCHAR(N), CHAR VARYING(N), or CHARACTER VARYING(N) - Variable with at Most N Characters
Bit-Strings
  BIT(N) - Fixed
  VARBIT(N) or BIT VARYING(N) - Variable with at Most N Bits

Page 33: DDL - Primitive Types
These Specialized Primitive Types are Used to:
  Simplify Modeling Process
  Include “Popular” Types
  Reduce Composite Attributes/Programming
DATE: YYYY-MM-DD
TIME: HH-MM-SS
TIME(I): HH-MM-SS-F....F - I Fraction Seconds
TIME WITH TIME ZONE: HH-MM-SS-HH-MM
TIME-STAMP: YYYY-MM-DD-HH-MM-SS-F...F{-HH-MM}
PLs also have Specialized Types!
Problem: Different Database Systems Sometimes Implement these Types very Differently
  This Impacts Portability!

Page 34: What is a SQL Schema?
A Schema in SQL is the Major Meta-Data Construct
Supports the Definition of:
  Relation - Table with Name
  Attributes - Columns and their Types
  Identification - Primary Key
  Constraints - Referential Integrity (FK)
Two Part Definition
  CREATE Schema - Named Database or Conceptually Related Tables
  CREATE Table - Individual Tables of the Schema

Page 35: DDL - Create/Drop a Schema
Creating a Schema:
  CREATE SCHEMA MY_COMPANY AUTHORIZATION Demurjian;
  Schema MY_COMPANY has Been Created and is Owned by the User “Demurjian”
  Tables can now be Created and Added to Schema
Dropping a Schema:
  DROP SCHEMA MY_COMPANY RESTRICT;
  DROP SCHEMA MY_COMPANY CASCADE;
  Restrict: Drop Operation Fails If Schema is Not Empty
  Cascade: Drop Operation Removes Everything in the Schema

Page 36: DDL - Create Tables

CREATE TABLE EMPLOYEE
( FNAME VARCHAR(15) NOT NULL,
  MINIT CHAR,
  LNAME VARCHAR(15) NOT NULL,
  SSN CHAR(9) NOT NULL,
  BDATE DATE,
  ADDRESS VARCHAR(30),
  SEX CHAR,
  SALARY DECIMAL(10,2),
  SUPERSSN CHAR(9),
  DNO INT NOT NULL,
  PRIMARY KEY (SSN),
  FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN),
  FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER) );

Page 37: DDL - Create Tables (continued)

CREATE TABLE DEPARTMENT
( DNAME VARCHAR(15) NOT NULL,
  DNUMBER INT NOT NULL,
  MGRSSN CHAR(9) NOT NULL,
  MGRSTARTDATE DATE,
  PRIMARY KEY (DNUMBER),
  UNIQUE (DNAME),
  FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN) );

CREATE TABLE DEPT_LOCATIONS
( DNUMBER INT NOT NULL,
  DLOCATION VARCHAR(15) NOT NULL,
  PRIMARY KEY (DNUMBER, DLOCATION),
  FOREIGN KEY (DNUMBER) REFERENCES DEPARTMENT(DNUMBER) );

Page 38: DDL - Create Tables (continued)

CREATE TABLE PROJECT
( PNAME VARCHAR(15) NOT NULL,
  PNUMBER INT NOT NULL,
  PLOCATION VARCHAR(15),
  DNUM INT NOT NULL,
  PRIMARY KEY (PNUMBER),
  UNIQUE (PNAME),
  FOREIGN KEY (DNUM) REFERENCES DEPARTMENT(DNUMBER) );

CREATE TABLE WORKS_ON
( ESSN CHAR(9) NOT NULL,
  PNO INT NOT NULL,
  HOURS DECIMAL(3,1) NOT NULL,
  PRIMARY KEY (ESSN, PNO),
  FOREIGN KEY (ESSN) REFERENCES EMPLOYEE(SSN),
  FOREIGN KEY (PNO) REFERENCES PROJECT(PNUMBER) );

Page 39: DDL - Create Tables with Constraints

CREATE TABLE EMPLOYEE
( . . . ,
  DNO INT NOT NULL DEFAULT 1,
  CONSTRAINT EMPPK
    PRIMARY KEY (SSN),
  CONSTRAINT EMPSUPERFK
    FOREIGN KEY (SUPERSSN) REFERENCES EMPLOYEE(SSN)
    ON DELETE SET NULL ON UPDATE CASCADE,
  CONSTRAINT EMPDEPTFK
    FOREIGN KEY (DNO) REFERENCES DEPARTMENT(DNUMBER)
    ON DELETE SET DEFAULT ON UPDATE CASCADE );

Page 40: DDL - Create Tables with Constraints
Is there an Equivalent to Keys and Constraints in PLs?
What Does Java Have Internally?
Constraints Facilitate Type Checking at Data Level!

CREATE TABLE DEPARTMENT
( . . . ,
  MGRSSN CHAR(9) NOT NULL DEFAULT '888665555',
  . . . ,
  CONSTRAINT DEPTPK
    PRIMARY KEY (DNUMBER),
  CONSTRAINT DEPTSK
    UNIQUE (DNAME),
  CONSTRAINT DEPTMGRFK
    FOREIGN KEY (MGRSSN) REFERENCES EMPLOYEE(SSN)
    ON DELETE SET DEFAULT ON UPDATE CASCADE );
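To see constraints acting as data-level type checking, the sketch below runs a cut-down EMPLOYEE table under SQLite through Python's sqlite3 module. It is only an illustration: the column subset, the sample SSNs, and the variable names are assumptions, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default

# Reduced version of the EMPLOYEE table (hypothetical column subset).
conn.execute("""
    CREATE TABLE EMPLOYEE (
        SSN      CHAR(9) NOT NULL,
        LNAME    VARCHAR(15) NOT NULL,
        SUPERSSN CHAR(9),
        CONSTRAINT EMPPK PRIMARY KEY (SSN),
        CONSTRAINT EMPSUPERFK FOREIGN KEY (SUPERSSN)
            REFERENCES EMPLOYEE(SSN)
            ON DELETE SET NULL ON UPDATE CASCADE
    )""")
conn.execute("INSERT INTO EMPLOYEE VALUES ('111111111', 'Boss', NULL)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('222222222', 'Worker', '111111111')")

# A supervisor SSN that matches no employee violates EMPSUPERFK.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('333333333', 'Ghost', '999999999')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the supervisor triggers ON DELETE SET NULL on the supervisee row.
conn.execute("DELETE FROM EMPLOYEE WHERE SSN = '111111111'")
supervisor_after_delete = conn.execute(
    "SELECT SUPERSSN FROM EMPLOYEE WHERE SSN = '222222222'").fetchone()[0]

print(rejected, supervisor_after_delete)
```

The rejected insert plays the role of a type error caught at run time: the value has the right column type but fails the declared constraint.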

Page 41: Data Manipulation Language - DML
SQL has the SELECT Statement for Retrieving Info. from a Database (Not Relational Algebra Select)
SQL vs. Formal Relational Model
  SQL Allows a Table (Relation) to have Two or More Tuples that are Identical in All Their Attribute Values
  Hence, an SQL Table is a Multi-set (Sometimes Called a Bag) of Tuples; it is Not a Set of Tuples
SQL Relations Can Be Constrained to Sets by
  PRIMARY KEY or UNIQUE Attributes
  Using the DISTINCT Option in a Query
Implied Processing and Procedural Semantics
  SQL Queries have Specific Semantics
  These Semantics Dictate Processing
  Includes Code Generation, Optimization, etc.
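The bag-versus-set distinction is easy to observe in a small sketch. It assumes SQLite via Python's sqlite3 module and a key-less table named ASSIGNMENT with made-up rows; both the table name and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No PRIMARY KEY or UNIQUE constraint, so identical tuples are allowed.
conn.execute("CREATE TABLE ASSIGNMENT (ESSN CHAR(9), PNO INT)")
conn.executemany(
    "INSERT INTO ASSIGNMENT VALUES (?, ?)",
    [("123", 1), ("123", 1), ("456", 2)],
)

# Plain SELECT returns a multi-set: the duplicate tuple appears twice.
bag = conn.execute(
    "SELECT ESSN, PNO FROM ASSIGNMENT ORDER BY ESSN").fetchall()

# DISTINCT constrains the result to a set.
as_set = conn.execute(
    "SELECT DISTINCT ESSN, PNO FROM ASSIGNMENT ORDER BY ESSN").fetchall()

print(bag)
print(as_set)
```

Declaring PRIMARY KEY (ESSN, PNO) instead would have rejected the second identical insert, which is the other route to set semantics listed above.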

Page 42: Interactive DML - Main Components
Select-from-where Statement Contains:
  Select Clause - Chosen Attributes/Columns
  From Clause - Involved Tables
  Where Clause - Constrain Tuple Values
  Tuple Variables - Distinguish Among Same Names in Different Tables
  String Matching - Detailed Matching Including
    Exact
    Starts With
    Near
  Ordering of Rows - Sorting Tuple Results

Page 43: Recall Prior Schema

Page 44: …and Corresponding DB Tables
Which Represent Tuples/Instances of Each Relation

Page 45: …and Corresponding DB Tables

Page 46: Simple SQL Queries
Query 0: Retrieve the Birthdate and Address of the Employee whose Name is 'John B. Smith'.
  SELECT BDATE, ADDRESS
  FROM EMPLOYEE
  WHERE FNAME='John' AND MINIT='B' AND LNAME='Smith'
Which Row(s) are Selected?
Note: While All of these Next Queries are from Chapter 8, Some are From “Earlier” Edition

Page 47: Simple SQL Queries
Query 1: Retrieve Name and Address of all Employees who work for the 'Research' Department
  SELECT FNAME, MINIT, LNAME, ADDRESS, DNAME
  FROM EMPLOYEE, DEPARTMENT
  WHERE DNAME='Research' AND DNUMBER=DNO
What Action is Being Performed? Join! Cartesian Product!

Page 48: Simple SQL Queries - Result
Theta Join on DNO=DNUMBER
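One way to read "Join! Cartesian Product!" is that a comma-separated FROM list pairs every EMPLOYEE row with every DEPARTMENT row, and the WHERE clause keeps only the pairs that satisfy the join condition. The sketch below checks that reading against SQLite, assuming Python's sqlite3 module; the reduced columns, the sample rows, and the helper name join_by_product are made up for illustration, and a real DBMS would normally pick a cheaper join strategy.

```python
import sqlite3
from itertools import product

employees = [("John", 5), ("Alicia", 4), ("Franklin", 5)]   # (FNAME, DNO), made-up rows
departments = [("Research", 5), ("Administration", 4)]       # (DNAME, DNUMBER)

def join_by_product(emps, depts, dname):
    """Cartesian product of both tables, filtered by the WHERE conditions."""
    return [
        (fname, d_name)
        for (fname, dno), (d_name, dnumber) in product(emps, depts)
        if d_name == dname and dnumber == dno
    ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (FNAME TEXT, DNO INT)")
conn.execute("CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", employees)
conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?)", departments)

sql_result = conn.execute("""
    SELECT FNAME, DNAME
    FROM EMPLOYEE, DEPARTMENT
    WHERE DNAME = 'Research' AND DNUMBER = DNO
    ORDER BY FNAME""").fetchall()
loop_result = sorted(join_by_product(employees, departments, "Research"))

print(sql_result == loop_result, sql_result)
```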

Page 49: Simple SQL Queries
Query 2: For Every Project in 'Stafford', list the Project Number, the Controlling Dept. Number, and the Dept. Manager's Last Name, Address, and Birthdate
  SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
  FROM PROJECT, DEPARTMENT, EMPLOYEE
  WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'
In Q2, there are Two Join Conditions:
  The Join Condition DNUM=DNUMBER Relates a Project to its Controlling Department
  The Join Condition MGRSSN=SSN Relates the Controlling Department to the Employee who Manages that Department

Page 50: Query Results
  SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
  FROM PROJECT, DEPARTMENT, EMPLOYEE
  WHERE DNUM=DNUMBER AND MGRSSN=SSN AND PLOCATION='Stafford'

Page 51: Qualification of Attributes
In SQL, the Same Name for Two (or More) Attributes is Allowed if Attributes are in Different Relations
In Those Cases, Query Must Qualify by Prefixing the Relation Name to the Attribute Name
  EMPLOYEE.LNAME, DEPARTMENT.DNAME
Aliases: When Queries Must Refer to the Same Relation Twice
  Alias is Akin to a Variable – Reference in PL!
  In These Situations, it is Considered that there are Two Different Copies of the Same Relation
Let’s See Examples of Both Concepts

Page 52: Attribute Qualification
Query 8: For Each Employee, Retrieve the Employee's Name, and Name of his or her Immediate Supervisor
  SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
  FROM EMPLOYEE E, EMPLOYEE S
  WHERE E.SUPERSSN=S.SSN
E and S are aliases for the EMPLOYEE relation
  E Represents Employees in the Role of Supervisees
  S Represents Employees in the Role of Supervisor
Another Form of Query 8 is:
  SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
  FROM EMPLOYEE AS E, EMPLOYEE AS S
  WHERE E.SUPERSSN=S.SSN
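A quick way to see the two "copies" of EMPLOYEE is to run Query 8 on a tiny table. The sketch assumes SQLite through Python's sqlite3 module; the three rows and the single-digit SSN values are made up, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, SUPERSSN TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("James", "Borg", "1", None),     # top of the hierarchy: no supervisor
        ("Franklin", "Wong", "2", "1"),
        ("John", "Smith", "3", "2"),
    ],
)

# E ranges over supervisees, S over supervisors; both alias the same table.
pairs = conn.execute("""
    SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE AS E, EMPLOYEE AS S
    WHERE E.SUPERSSN = S.SSN
    ORDER BY E.LNAME""").fetchall()

for row in pairs:
    print(row)
```

The employee with a NULL SUPERSSN does not appear as a supervisee, because NULL never satisfies the equality in the WHERE clause.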

Page 53: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Results
[Figure: result table for the query below; some supervisor entries are null]
  SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
  FROM   EMPLOYEE AS E, EMPLOYEE AS S
  WHERE  E.SUPERSSN=S.SSN

Page 54: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Nested Queries
A SQL SELECT Nested Query is Specified within the WHERE-clause of another Query (the Outer Query)
Query 1A: Retrieve the Name and Address of all Employees who Work for the 'Research' Department
  SELECT FNAME, LNAME, ADDRESS
  FROM   EMPLOYEE
  WHERE  DNO IN
         (SELECT DNUMBER
          FROM   DEPARTMENT
          WHERE  DNAME='Research')
Note: This Reformulates Earlier Query 1
The End Result is Essentially:
  Outer and Inner For/While Loops!
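
The "outer and inner loops" reading of Query 1A can be sketched in plain Python as follows. The list-of-dicts representation and the sample tuples are assumptions for illustration; a real DBMS is free to evaluate the nested query once and reuse its result rather than loop naively.

```python
# Hypothetical in-memory relations.
employees = [
    {"FNAME": "John", "LNAME": "Smith", "ADDRESS": "Houston", "DNO": 5},
    {"FNAME": "Alicia", "LNAME": "Zelaya", "ADDRESS": "Spring", "DNO": 4},
]
departments = [
    {"DNAME": "Research", "DNUMBER": 5},
    {"DNAME": "Administration", "DNUMBER": 4},
]


def query_1a(employees, departments):
    """Naive nested-loop evaluation of Query 1A."""
    result = []
    for e in employees:        # outer query: scan EMPLOYEE
        for d in departments:  # inner (nested) query: scan DEPARTMENT
            if d["DNAME"] == "Research" and e["DNO"] == d["DNUMBER"]:
                result.append((e["FNAME"], e["LNAME"], e["ADDRESS"]))
                break          # IN only needs one match
    return result


research_staff = query_1a(employees, departments)
print(research_staff)
```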

Page 55: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

How Does Nested Query Work?
The Nested Query Selects the Number of the 'Research' Dept.
The Outer Query Selects an EMPLOYEE Tuple If Its DNO Value Is in the Result of the Nested Query
IN Represents Set Inclusion in the Result Set
We Can Have Several Levels of Nested Queries
  SELECT FNAME, LNAME, ADDRESS
  FROM   EMPLOYEE
  WHERE  DNO IN
         (SELECT DNUMBER
          FROM   DEPARTMENT
          WHERE  DNAME='Research')

Page 56: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

NULLs in SQL Queries
SQL Allows Queries that Check if a Value is NULL (Missing or Undefined or not Applicable)
SQL uses IS or IS NOT to Compare NULLs since it Considers each NULL Value Distinct from other NULL Values, so Equality Comparison is not Appropriate
Query 18: Retrieve the Names of all Employees who do not have Supervisors.
  SELECT FNAME, LNAME
  FROM   EMPLOYEE
  WHERE  SUPERSSN IS NULL
Why Would Such a Capability be Useful?
  Downloading/Crossloading a Database
  Promoting an Attribute to PK/FK
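
The difference between IS NULL and an equality test can be checked with a small sqlite3 sketch. The two-column EMPLOYEE table and its rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (LNAME TEXT, SUPERSSN TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Borg", None), ("Wong", "888665555")],
)

# IS NULL finds the tuple whose supervisor is missing.
with_is_null = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN IS NULL"
).fetchall()

# "= NULL" never evaluates to true, so no tuple qualifies.
with_equals = conn.execute(
    "SELECT LNAME FROM EMPLOYEE WHERE SUPERSSN = NULL"
).fetchall()

print(with_is_null, with_equals)
```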

Page 57: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Aggregate Functions in SQL Queries
Query 19: Find Maximum Salary, Minimum Salary, and Average Salary among all Employees
  SELECT MAX(SALARY), MIN(SALARY), AVG(SALARY)
  FROM   EMPLOYEE
Query 20: Find Maximum and Minimum Salaries among 'Research' Department Employees
  SELECT MAX(SALARY), MIN(SALARY)
  FROM   EMPLOYEE, DEPARTMENT
  WHERE  DNAME='Research' AND DNUMBER=DNO
What Does Query 22 Do?
  SELECT COUNT(*)
  FROM   EMPLOYEE, DEPARTMENT
  WHERE  DNAME='Research' AND DNUMBER=DNO

Page 58: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Grouping in SQL Queries
Query 24: For Each Department, Retrieve the DNO, Number of Employees, and Their Average Salary
  SELECT   DNO, COUNT (*), AVG (SALARY)
  FROM     EMPLOYEE
  GROUP BY DNO
EMPLOYEE Tuples are Divided into Groups; each Group has the Same Value for the Grouping Attribute DNO
COUNT and AVG Functions are Applied to each Group of Tuples Separately
SELECT-clause Includes only the Grouping Attribute and the Functions to be Applied on each Tuple Group
Are there PL Equivalents to these Data Oriented Actions? Yes – in Specific APIs but Not PL Itself!
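
As one possible PL-side counterpart of Query 24, the grouping and per-group aggregation can be sketched with a dictionary. The helper name group_count_avg and the sample (DNO, SALARY) pairs are hypothetical.

```python
from collections import defaultdict


def group_count_avg(rows):
    """Group (dno, salary) pairs by dno; return {dno: (count, average salary)}."""
    groups = defaultdict(list)
    for dno, salary in rows:           # divide tuples into groups by DNO
        groups[dno].append(salary)
    # apply COUNT and AVG to each group separately
    return {dno: (len(s), sum(s) / len(s)) for dno, s in groups.items()}


rows = [(5, 30000), (5, 40000), (4, 25000)]
summary = group_count_avg(rows)
print(summary)
```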

Page 59: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Results of Query 24:
  SELECT   DNO, COUNT (*), AVG (SALARY)
  FROM     EMPLOYEE
  GROUP BY DNO
[Figure: EMPLOYEE tuples grouped by DNO and the resulting table]

Page 60: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

INSERT SQL Queries
Add one or more Tuples to a Relation, with Attribute Values Listed in the Order Specified in the CREATE TABLE Statement
Update 1:
  INSERT INTO EMPLOYEE
  VALUES ('Richard','K','Marini', '653298653', '30-DEC-52', '98 Oak Forest,Katy,TX', 'M', 37000,'987654321', 4)
Another Form of Update 1:
  INSERT INTO EMPLOYEE (FNAME, LNAME, SSN)
  VALUES ('Richard','Marini','653298653')
All PK and FK Values must be Provided
Nulls are Allowed
DDL Constraints are Enforced
Another Form of "Type Checking" at Instance Level
  This is Akin to Dynamic Type Checking!

Page 61: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

DELETE SQL Queries
Sample Deletes Include
  DELETE FROM EMPLOYEE
  WHERE  LNAME='Brown'

  DELETE FROM EMPLOYEE
  WHERE  SSN='123456789'

  DELETE FROM EMPLOYEE
  WHERE  DNO IN
         (SELECT DNUMBER
          FROM   DEPARTMENT
          WHERE  DNAME='Research')

  DELETE FROM EMPLOYEE
No. of Tuples Deleted is Dependent on the WHERE Clause
Referential Integrity (Type Checking!) is Enforced During DELETE

Page 62: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

UPDATE SQL Queries
Give all Employees in the 'Research' Dept. a 10% Raise
  UPDATE EMPLOYEE
  SET    SALARY = SALARY * 1.1
  WHERE  DNO IN
         (SELECT DNUMBER
          FROM   DEPARTMENT
          WHERE  DNAME='Research')
Modified SALARY Value Depends on the Original SALARY Value in each Tuple
SALARY = SALARY * 1.1 - Use PL Interpretation

Page 63: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Processing and Optimization
What are the Processing Issues for DBs?
  Database Applications of Today and Tomorrow Require High Volumes of Information!
  Increase of Information Still Requires High Performance!
    Throughput and Response Time
  Where's the Bottleneck in DBS?
    CPU ?? Main Memory Size/Speed ?? Virtual Memory Limitations ?? Communications Bus ?? I/O Channel ??
How Does this Relate to Compilers/PLs?

Page 64: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

90-10 Rule for Database Processing
Load (Transactions per Second) vs. Performance (Response Time of Transactions)
Processing of Large Amounts of Raw Data
  Addressed in Secondary Storage
  Staged to Main Memory
Identifying Relevant Data
  Large Amounts of Raw Data Discarded
  Focus on Data Most Likely to Contain Answers
  Possible Loss of CPU and Main Memory Cycles
This is Double Jeopardy!
  Load of DBS Must be Reduced
  Performance of DBS Degrades

Page 65: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

90-10 Rule for Conventional DBS
[Figure: layered diagram with Application Programs, Operating System, Database Functions, On-Line I/O, and Disk I/O]
Only 10% of Raw Data is Relevant
Only 10% of Relevant Data has Answers
Note: Naive Approach to Database Searching Often Occurs (Little or No Indexing in Practice!)

Page 66: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Optimization Goal
Limit Costly Join Operation by Reducing Data to be Scanned or that Participates in the Join
While Improving Selection and Projection can Help, the Main Objective is Join
  In Worst Case - Cartesian Product
  Can Improve by Introducing Indices on the Join Attributes (R.B and S.C) to Limit "Product"
  Can Further Improve by Sorting on the Join Attributes (R.B and S.C)
    This Reduces Block Accesses by Limiting the Number of Blocks that Must be Examined in a Join
    If B's Values Range from 0 to 100 and C's from 50 to 150, only need to Compare from 50 to 100
Focus is on Reducing Costly Ops – Same as PL Optimization to Replace * with +
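
The range argument above (B from 0 to 100, C from 50 to 150) amounts to intersecting two intervals. A minimal sketch, assuming the minimum and maximum join-attribute values are known (for example from sorted files or catalog statistics); the function name join_key_range is hypothetical.

```python
def join_key_range(r_min, r_max, s_min, s_max):
    """Return the (low, high) range of join values that can possibly match,
    or None when the two ranges do not overlap at all."""
    low, high = max(r_min, s_min), min(r_max, s_max)
    return (low, high) if low <= high else None


# R.B ranges over 0..100 and S.C over 50..150, as in the slide.
overlap = join_key_range(0, 100, 50, 150)
print(overlap)
```

Blocks of R holding only values below 50 and blocks of S holding only values above 100 never need to be examined for the equijoin.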

Page 67: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Processing
Internal Data Structure
  Memory Hierarchy
    Main Memory + Secondary Memory
    Information Must be Staged from Secondary to Primary Memory for Database Operation
  Sequential Search
    Brute Force Approach
  Direct Access (Indexed Search)
    Hash, Inverted Index File, Binary Search Tree, B-tree, B+-tree
    Improves Selection by Focusing on the Subset of Tuples Involved in the Answer, and Equijoin by Not Having to Compare All Blocks in Two Relations

Page 68: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Algorithms for Database Query Operators
Largely Fall into Three Classes: Sorting-Based Methods, Hash-Based Methods, Index-Based Methods
Such Algorithms are Divided into Three Degrees of Difficulty and Cost (Limiting Factor is Size of Data)
  One-pass Algorithms
    Where Data is Only Read Once From Disk
  Two-pass Algorithms
    Data is Read from Disk, Processed in Some Way, Written Back to Disk, Read Again for Processing, etc.
  Multi-pass Algorithms
    Where 3 or More Passes Are Required, i.e., Recursive Generalization of the Two-pass Algorithms
Akin to Multiple Pass Compilers at Data Level

Page 69: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Database Join and Sort are External
[Figure: main memory drawn as blocks numbered 1, 2, 3, ..., 1000]
Suppose that your DBS has 1,000 1K Blocks of Memory Available for Performing Operations (e.g., Select, Project, Join, Union, Aggregation, etc.)
Suppose Sort R by R.B
  R Contains 5000 Blocks
  In order to Perform a Sort/Merge - You Must Use an External Algorithm since all 5000 Blocks Cannot Fit Into Memory at the Same Time
Suppose Join R (500 Blocks) and S (800 Blocks)
  Again - their Total Exceeds Memory - Hence you Must Take an Approach that Compares One Block of R with All Blocks of S, etc. (Slides 22,23)
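
The sort scenario above (5000 blocks of data, 1000 blocks of memory) can be illustrated with a two-pass external merge sort sketch. This is a simplified model under stated assumptions: Python lists stand in for disk blocks and runs, each block holds a single key, and the function name external_sort is hypothetical.

```python
import heapq


def external_sort(blocks, memory_blocks):
    """Two-pass sort of `blocks` (a list of lists of keys) using at most
    `memory_blocks` blocks of working memory per run.
    Returns (sorted keys, number of runs created in pass 1)."""
    runs = []
    # Pass 1: read memory-sized groups of blocks, sort each group in memory,
    # and emit it as a sorted run (a real DBMS would write each run to disk).
    for start in range(0, len(blocks), memory_blocks):
        group = blocks[start:start + memory_blocks]
        runs.append(sorted(key for block in group for key in block))
    # Pass 2: merge the sorted runs; only the head of each run is needed at a time.
    merged = list(heapq.merge(*runs))
    return merged, len(runs)


# Hypothetical data: 5000 one-key blocks in scrambled order, 1000 blocks of memory.
blocks = [[(i * 7919) % 5000] for i in range(5000)]
sorted_keys, run_count = external_sort(blocks, memory_blocks=1000)
print(run_count, sorted_keys[:5])
```

With these sizes pass 1 produces five sorted runs, and pass 2 merges them while keeping only a small part of each run in memory.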

Page 70: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Database Join and Sort are External
What's True about Today's DBMS Like Oracle?
Oracle Recommends 2 Gigabytes of Primary Memory
That 2 Gigabytes Must be Shared by:
  Operating System
  Other Applications Running on "Same" Server (Web Server, etc.)
  Database Management Software
Even if there were 1.5 Gigabytes Available, Modern DBs can Exceed that Size Very Easily
Moreover,
  Cartesian Product Could Exceed Available Mem.
  Join Could Require External Approach Since All Tables Involved in Join Can't Fit in 1.5 Gigabytes
External Sorting/Block Oriented Processing is the Norm

Page 71: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

The System Catalog
Stores the Meta Information that Describes Each Database, Including a Description of
  Conceptual Database Schema (Logical Data Model)
    Relations, Attributes, Keys, Indexes, Views
  Internal Schema
  External Schema
Stores Information Needed by Specific DBMS Modules
  Query Optimization Module
  Security and Authorization

Page 72: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Example of Catalog Information

Page 73: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Relational DBMS Catalog
All Metadata Stored as Relations
Examples of Metadata Tables are:

Page 74: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Uses of System Catalog
Example Query:
  SELECT EMP.ENAME
  FROM   EMP, WORKS, PROJ
  WHERE  (EMP.ENO = WORKS.ENO)
  AND    (WORKS.PNO = PROJ.PNO)
  AND    (PROJ.PNAME = 'CAD/CAM')
DDL Compilers:
  Correct Definition of Relations and Attributes
DML (Query) Compiler:
  DML Parser
    Guided by the Description of DML Syntax and the Schema Information in the Catalog, Generates a Query Tree after Parsing
  Optimizer
    Generates an Access Path that is Relatively Optimal for Executing a Query/DML Command, by Accessing the Database Structure Information (Schemas), and Mapping High-level SQL Queries Into Low-level File Access Commands

Page 75: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Revisit Typical Database Processing
[Figure: block diagram of database processing; the recoverable labels follow]
Components:
  Pre-Processing: Parser/Lexical, Optimizer/Views
  High-Level Processing: Enqueue Trans., Request Locks, Release Locks, Dequeue Trans.
  Low-Level Processing: Enqueue Trans., Request Locks, Issue I/Os, Process Returned Data, Integrity Checks, Security Checks, Logging for Recovery, Release Locks, Dequeue Trans.
  Post-Processing: Collection of Results, Aggregation Operations, Security Checks
  Concurrency Control
  Disk I/O
  Recovery
Flow Labels: User Transaction, Parsed and Optimized User Trans., Lock Request, Response Lock Request, I/O Request, Results, Errors, Response to User

Page 76: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
Pre-Processing
  Actions Taken Upon Receipt of a Query from User
    SQL Query via Query Tool or JDBC Call
  "Compilation" of DB Query
    Check Syntax, Semantics, Optimize, Develop Run-Time Strategy (Similar to PL Compilation)
  Query is Translated to DB Transaction
    A Transaction Contains Multiple DB Operations
    Transaction has Explicit Order of Operations
  Database Transaction Must Succeed or Fail
    There is no Intermediate State
    Completely Executed and Committed, or Aborts at any Point and is Undone
    New State or Previous State of DB
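
The "succeed or fail, no intermediate state" behavior can be observed with a short sqlite3 sketch. The ACCOUNT table, its CHECK constraint, and the transfer function are hypothetical and serve only as an illustration; they do not come from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT (ID INTEGER PRIMARY KEY, BALANCE INTEGER CHECK (BALANCE >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Run two updates as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# The second update would drive account 1 negative, so the whole transaction is undone.
ok = transfer(conn, 1, 2, 500)
balances = conn.execute("SELECT ID, BALANCE FROM ACCOUNT ORDER BY ID").fetchall()
print(ok, balances)
```

Although the first update inside the failed transfer succeeded on its own, the database returns to its previous state because the transaction as a whole aborted.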

Page 77: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
High-Level Processing
  Enqueue Transaction from Pre-Processing
    Transaction Must Wait for "Earlier" Transactions
    Remember - Shared DB State!
  Request Locks from Concurrency Control
    All Locks Before Proceeding vs. Locks as Needed
    Avoid Deadlock and Livelock
  Release Locks
    As Use of Data Completes to Increase Availability
    What Happens if a Later Step in the Transaction Fails?
  Dequeue Transaction
    Completes Transaction Processing
    Return "Result" to Post-Processing

Page 78: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
Low-Level Processing
  Enqueue Transaction - Do Actual DB Operations
  Request Locks - Lower Granularity Level
  Issue I/Os - Based on Operations to Access "Correct" and "Relevant" DB Records
  Process Returned Data - Aggregation, Sorting
  Integrity Checks: Do I/D/U Satisfy Constraints?
  Security Checks: Is DB R/I/D/U Allowed?
  Logging for Recovery - Commit the Transaction
  Release Locks - Available to Others
  Dequeue Transaction - Return Results to High-Level Processing
  Note: The Multiple Operations of Each DB Transaction All Must be Successful

Page 79: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
Post-Processing
  Collection of Results
    May be Passed Portions of Results as they Complete
    For Example, Sorted Blocks of Data that are then Merged in a Final Step
  Aggregation Operations
    May be Passed Aggregate Intermediate Results
    Sum for Different Departments to be Totaled
  Security Checks
    Last Step Filtering to Ensure Only Allowed Data is Returned
    May Execute Query but Only See Aggregate Result
  Send Results to User

Page 80: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
Concurrency Control
  Control Access to Information
    Data and Metadata
    Prevent Simultaneous Updates
    Ensure Database Always Correct and Consistent
    Serial Schedule vs. Serializable Transaction
  Two Types
    Pessimistic - Locking-Based - Assume Collisions Will Occur - e.g., Peoplesoft Course Registration
    Optimistic - Time-Based - Fix Problems After the Fact - e.g., ATM Machines
  CC Manages Locks at Different Granularity Levels (Table, Attribute, View, Tuple, Metadata, etc.)

Page 81: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
Disk I/O
  Performs the Actual Disk I/O for Read/Writes
    Block Oriented Activity
  Maintain Queue of All I/O Requests
    Ordering is Critical
    Related to Concurrency Control and Consistency
  Single DB Transactions can have Multiple DB Operations
    Disk I/Os for Different Operations at Different Times
  High and Low Level Processing will Determine What Operations are Needed When
  Disk I/O - Relatively "Dumb"

Page 82: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Typical Database Processing
Recovery
  Tightly Tied to DB Transaction Concept
  Transactions Must be:
    Atomic - Happens or Doesn't
    Durable - Once Committed, Results Survive Failure
    Consistent - Follows Protocol/Correct DB State
  When Failure Occurs, Can we:
    Recover to a Correct "Earlier" State
    Reconcile all "Active" Transactions that were Executing at Failure Time
  Involves Logging of Database Actions
  Objective: High Availability and Reliability

Page 83: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Optimization
Not Really Optimizing, but Planning to Avoid Bad Execution Strategies
Models
  Heuristics-Based
    Apply Transformation Rules According to a General Strategy
    Focus on Relational Algebra that Underlies Each Query
    Improve the "Order" of Relational Operations
  Cost-Based
    Minimize a Cost Function
      I/O Cost + CPU Cost
    Subject to a Set of Constraints

Page 84: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Processing Methodology
[Figure: pipeline from query to execution schedule; the recoverable labels follow]
  High-level Calculus-based Query
  Query Preprocessing
  Algebraic Query (a tree structure)
  Query Optimization
  Execution Schedule (file access plan)
  Schemas Shown: EXTERNAL SCHEMA, LOGICAL SCHEMA, INTERNAL SCHEMA

Page 85: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Refute Incorrect Queries
Example:
  E(ENAME, ENO), P(JNO, JNAME), W(ENO, JNO, DUR)
  SELECT ENAME, JNAME
  FROM   E, P, W
  WHERE  DUR > 27 AND DUR < 25
Incorrect
  Disjoint Components are Useless
  Multiple Relations with Missing Joins may not be Incorrect, but may Indicate a Cartesian Product
Contradictory
  Qualification cannot be Satisfied by any Tuple
  DUR > 27 AND DUR < 25

Page 86: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Simplification
Why Simplify?
  The Simpler the Query, the Less Work there is and the Better the Performance
How? Use Transformation Rules
  Elimination of Redundancy
    Idempotency Rules
  Application of Transitivity
  Use of Integrity Rules
Example
  x > a and x > b
  DUR > 27 AND DUR > 25 Reduces to DUR > 27
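
One way to picture this kind of simplification, together with the contradiction check from the previous slide, is to fold a conjunction of range predicates on one attribute into a single lower and upper bound. The sketch below is an assumption-laden illustration: it handles only the strict operators > and <, treats values as real numbers, and the name simplify_bounds is hypothetical.

```python
def simplify_bounds(predicates):
    """Fold a conjunction of (op, value) predicates on ONE attribute,
    with op in {'>', '<'}, into the tightest (lower, upper) pair.
    A bound of None means "unbounded". Returns None if the conjunction
    is contradictory, i.e. no value can satisfy it."""
    lower = upper = None
    for op, value in predicates:
        if op == ">":
            lower = value if lower is None else max(lower, value)
        elif op == "<":
            upper = value if upper is None else min(upper, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    if lower is not None and upper is not None and lower >= upper:
        return None  # e.g. DUR > 27 AND DUR < 25
    return lower, upper


redundant = simplify_bounds([(">", 27), (">", 25)])      # DUR > 27 AND DUR > 25
contradictory = simplify_bounds([(">", 27), ("<", 25)])  # DUR > 27 AND DUR < 25
print(redundant, contradictory)
```

The first call keeps only the stronger bound (DUR > 27); the second detects that the qualification can never be satisfied, so the query can be refuted without touching the data.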

Page 87: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Restructuring
Convert Relational Calculus to Relational Algebra
Make Use of Query Trees
Example
  Find the Names of Employees other than J. Doe who Worked on the CAD/CAM Project for either 1 or 2 Years.
  SELECT ENAME
  FROM   E, W, P
  WHERE  E.ENO=W.ENO
  AND    W.JNO=P.JNO
  AND    E.ENAME<>"J. Doe"
  AND    P.JNAME="CAD/CAM"
  AND    (W.DUR=12 OR W.DUR=24)
Query Tree (Root First):
  Project: ENAME
    Select: (DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME<>"J. DOE"
      Join: JNO
        P
        Join: ENO
          W
          E

Page 88: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Optimization Objectives
Improving Performance
Arriving at a Query Plan of Execution
Analyzing the Relational Algebra Query
  Replace Costly Operations
  Do Selections and Projections Early
Optimization Heuristics for the Relational Algebra
  Performing Selection and Projection Before Join
  Combining Several Selections Over a Single Relation Into One Selection
  Find Common Subexpressions
  Algebraic Rewriting/Transformation Rules
General Transformation Rules for Relational Algebra (Equivalence-preserving Algebraic Rewriting Rules)

Page 89: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Query Optimization: An Example
Why is it important?
  SELECT ENAME
  FROM   E, W
  WHERE  E.ENO = W.ENO AND W.RESP = "Manager"
Strategy 1
  $\pi_{ENAME}(\sigma_{RESP=\text{"Manager"} \wedge E.ENO=W.ENO}(E \times W))$
Strategy 2
  $\pi_{ENAME}(E \bowtie_{ENO} (\sigma_{RESP=\text{"Manager"}}(W)))$

Page 90: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Cost of Alternatives
Assume:
  card(E) = 4,000; card(W) = 10,000
  10% of Tuples in W Satisfy RESP="Manager" (Selection Generates 1,000 Tuples)
Execution Time is Proportional to the Sum of the Cardinalities of the Temporary Relations
Searching is Done by Sequential Scanning

Strategy 1                         Strategy 2
Cartesian prod.  = 40,000,000      Selection over W  =    10,000
Search over all  = 40,000,000      Join (4000*1000)  = 4,000,000
Total            = 80,000,000      Total             = 4,010,000
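
The totals for the two strategies follow directly from the stated cardinalities. A small sketch of the arithmetic, using hypothetical variable names and the cost model given above (sum of the cardinalities touched, sequential scanning):

```python
card_e = 4_000              # tuples in E
card_w = 10_000             # tuples in W
manager_percent = 10        # 10% of W satisfies RESP = "Manager"

# Strategy 1: build the Cartesian product, then search all of it.
cartesian = card_e * card_w
strategy_1 = cartesian + cartesian

# Strategy 2: scan W for the selection, then join E with the selected tuples.
selected = card_w * manager_percent // 100
strategy_2 = card_w + card_e * selected

print(strategy_1, strategy_2, strategy_1 / strategy_2)
```

Under this cost model, performing the selection first is roughly twenty times cheaper than selecting over the Cartesian product.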

Page 91: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

General Query Optimization Strategy
Perform Selections Early
  Yields Smaller Intermediate Results
  Direct Impact on Subsequent Join/Cartesian Prod.
Combine Selections with a Prior Cartesian Product into a Theta or Equi Join
  Join is a Cheaper Operation
Combine (Cascade) Selections and Projections
  [Projection cascade equation over attributes A and B; operator symbols lost in extraction: AB(B(R)) AB(R)]
  $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_1 \wedge p_2}(R)$
This Results in One Pass Instead of Two over the Table
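
The cascade-of-selections rule can be checked on a toy relation. In this sketch the relation R, its attributes, and the predicates p1 and p2 are made up for illustration, and select is a hypothetical helper that makes one pass over its input.

```python
def select(pred, relation):
    """One sequential pass over `relation`, keeping tuples that satisfy `pred`."""
    return [t for t in relation if pred(t)]


# Hypothetical relation R with two attributes.
R = [
    {"DUR": 12, "RESP": "Manager"},
    {"DUR": 24, "RESP": "Analyst"},
    {"DUR": 36, "RESP": "Manager"},
]


def p1(t):
    return t["RESP"] == "Manager"


def p2(t):
    return t["DUR"] >= 24


# Two passes: select on p2, then select on p1 over the intermediate result.
two_pass = select(p1, select(p2, R))
# One pass: a single selection on the conjunction p1 AND p2.
one_pass = select(lambda t: p1(t) and p2(t), R)

print(two_pass == one_pass, one_pass)
```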

Page 92: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

General Query Optimization Strategy
Identify Common Subexpressions
  Compute Once and Store
  Use Stored Version for Subsequent Times
  Often Useful When Views are Employed
Preprocess Data via Sorts and Indexes
  Speeds up Searches and Joins by Limiting Scope
Evaluate and Assess Different Options
  For Cartesian Product, Use Smaller Relation for Comparison
  Use System Catalog (Meta-data) to Effect Order in Query Execution Plan

Page 93: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

Relational Algebra Transformations
Cascade of Selection
  $\sigma_{p_1 \wedge p_2 \wedge \dots \wedge p_n}(R) \equiv \sigma_{p_1}(\sigma_{p_2}(\dots(\sigma_{p_n}(R))\dots))$
Commutativity of Selection
  $\sigma_{p_1}(\sigma_{p_2}(R)) \equiv \sigma_{p_2}(\sigma_{p_1}(R))$
  $\sigma_{p_1 \vee p_2}(R) \equiv \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$
Cascade of Projection
  $\pi_{A_1}(\pi_{A_2}(\dots(\pi_{A_n}(R))\dots)) \equiv \pi_{A_1}(R)$ if $A_1 \subseteq A_2 \subseteq \dots \subseteq A_n$
Commuting Selection with Projection
  $\pi_{A_1,A_2,\dots,A_n}(\sigma_p(R)) \equiv \sigma_p(\pi_{A_1,A_2,\dots,A_n}(R))$ where p Involves Only $A_1, A_2, \dots, A_n$

Page 94: CH10.1 CSE 4100 Compiler Concepts for Database Systems Prof. Steven A. Demurjian Computer Science & Engineering Department The University of Connecticut

CH10.94

CSE4100

Relational Algebra Transformations
Commutativity of Theta Join and Cartesian Product
    R ⋈_A S ≡ S ⋈_A R
    R × S ≡ S × R
Commuting Selection with Theta Join (Cartesian)
    σ_{p(A)}(R × S) ≡ (σ_{p(A)}(R)) × S
        A defined on R only
    σ_{p(A) ∧ p(B)}(R × S) ≡ (σ_{p(A)}(R)) × (σ_{p(B)}(S))
        A defined on R, B defined on S
    Also Holds for Theta Join
Commuting Projection with Theta Join (Cartesian)
    π_C(R × S) ≡ π_A(R) × π_B(S) where A ∪ B = C
    A are Attributes in C for R and B are Attributes in C for S
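To see why commuting selection with the Cartesian product matters, the Python sketch below compares the size of the product that has to be built when the selection is applied late versus early. The relations, attribute names, and predicates are invented for illustration.

```python
from itertools import product

R = [{"a": i} for i in range(6)]   # hypothetical R(a)
S = [{"b": j} for j in range(4)]   # hypothetical S(b)

def cartesian(r, s):
    return [{**x, **y} for x, y in product(r, s)]

def select(pred, rel):
    return [t for t in rel if pred(t)]

pA = lambda t: t["a"] < 2    # predicate on R's attribute only
pB = lambda t: t["b"] == 3   # predicate on S's attribute only

# Selection applied after the product: the full product is materialized.
late_input = cartesian(R, S)
late = select(lambda t: pA(t) and pB(t), late_input)

# Selection pushed below the product: only the survivors are combined.
early = cartesian(select(pA, R), select(pB, S))

print(late == early, len(late_input), len(early))
```

The result is the same either way, but the early form never builds the 24-tuple product.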

Page 95:

Relational Algebra Transformations
Commutativity of Set Operations
    R ∪ S ≡ S ∪ R
    R ∩ S ≡ S ∩ R
Associativity of Set Operations
    (R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
    (R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
    (R × S) × T ≡ R × (S × T)
    (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
Commuting Select with Set Operations
    σ_{p(Ai)}(R ∪ T) ≡ σ_{p(Ai)}(R) ∪ σ_{p(Ai)}(T)
        where Ai is defined on both R and T
    σ_{p(Ai)}(R ∩ T) ≡ σ_{p(Ai)}(R) ∩ σ_{p(Ai)}(T)
        where Ai is defined on both R and T
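Python's built-in sets make it easy to spot-check the union and intersection forms of these rules. The sketch below uses invented, union-compatible relations `R`, `S`, and `T` and a hypothetical predicate `p`; it checks the identities on one data set and is not a proof.

```python
# Union-compatible relations as sets of tuples (illustrative data only).
R = {(1, "x"), (2, "y"), (3, "z")}
S = {(2, "y"), (4, "w")}
T = {(3, "z"), (2, "y"), (5, "v")}

def select(pred, rel):
    return {t for t in rel if pred(t)}

p = lambda t: t[0] >= 2   # predicate on the first attribute, present in R and T

commutative = (R | S == S | R) and (R & S == S & R)
associative = ((R | S) | T == R | (S | T)) and ((R & S) & T == R & (S & T))
select_union = select(p, R | T) == select(p, R) | select(p, T)
select_intersect = select(p, R & T) == select(p, R) & select(p, T)

print(commutative, associative, select_union, select_intersect)
```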

Page 96:

Relational Algebra Transformations
11. Commuting Projection with Union
    π_C(R ⋈_{q(Aj,Bk)} S) ≡ π_{A'}(R) ⋈_{q(Aj,Bk)} π_{B'}(S)
    π_C(R ∪ S) ≡ π_{A'}(R) ∪ π_{B'}(S)
    where R[A] and S[B], C = A' ∪ B', A' ⊆ A, B' ⊆ B
12. Converting Selection/Cartesian Into Theta Join
    σ_C(R × S) ≡ R ⋈_C S
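The conversion of a selection over a Cartesian product into a theta join can be sketched as follows. Both functions return the same tuples; the difference is that the first materializes every pair before filtering. The relations, the condition, and the function names are assumptions for illustration, and a nested loop is only one of several ways a join might be implemented.

```python
from itertools import product

R = [(1, "a"), (2, "b"), (3, "c")]   # hypothetical R(k, x)
S = [(2, "p"), (3, "q"), (3, "r")]   # hypothetical S(k, y)

def select_over_product(cond, r, s):
    """sigma_C(R x S): materialize every pair, then filter."""
    pairs = [x + y for x, y in product(r, s)]
    return [t for t in pairs if cond(t)], len(pairs)

def theta_join(cond, r, s):
    """R join_C S as a nested loop: test each pair, keep only matches."""
    out = []
    for x in r:
        for y in s:
            t = x + y
            if cond(t):
                out.append(t)
    return out

cond = lambda t: t[0] == t[2]   # equality on k, so this is an equijoin
filtered, materialized = select_over_product(cond, R, S)
joined = theta_join(cond, R, S)

print(filtered == joined, materialized, len(joined))
```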

Page 97:

Heuristic Optimization: Example
Relations: E(ENAME, ENO), P(JNO, JNAME), W(ENO, JNO, DUR)
[Figure: canonical query tree at the end of the query preprocessing phase]
    π_{ENAME}(σ_{(DUR=12 OR DUR=24) AND JNAME="CAD/CAM" AND ENAME="J. Doe"}((W ⋈_{ENO} E) ⋈_{JNO} P))
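For reference, the query behind this tree can be written in SQL and run with Python's standard `sqlite3` module. This is a sketch under assumptions: the column types and sample rows are invented, and W's project column is named JNO to match the JNO join in the tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE E (ENO INTEGER, ENAME TEXT);
    CREATE TABLE P (JNO INTEGER, JNAME TEXT);
    CREATE TABLE W (ENO INTEGER, JNO INTEGER, DUR INTEGER);
    INSERT INTO E VALUES (1, 'J. Doe'), (2, 'M. Smith');
    INSERT INTO P VALUES (10, 'CAD/CAM'), (20, 'Maintenance');
    INSERT INTO W VALUES (1, 10, 12), (1, 20, 24), (2, 10, 24), (2, 20, 36);
""")

# Joins on ENO and JNO, the three-part selection, and the projection on ENAME.
query = """
    SELECT E.ENAME
    FROM E
    JOIN W ON W.ENO = E.ENO
    JOIN P ON P.JNO = W.JNO
    WHERE (W.DUR = 12 OR W.DUR = 24)
      AND P.JNAME = 'CAD/CAM'
      AND E.ENAME = 'J. Doe'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The SQL text says nothing about evaluation order; choosing that order is exactly the job of the optimizer.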

Page 98:

Heuristic Optimization: Example
Use the cascade-of-selections rule to decompose the selection
[Figure: query tree with the selection split into three]
    π_{ENAME}(σ_{DUR=12 OR DUR=24}(σ_{JNAME="CAD/CAM"}(σ_{ENAME="J. Doe"}((W ⋈_{ENO} E) ⋈_{JNO} P))))

Page 99:

Heuristic Optimization: Example
Push selection down using commutativity of selection over join
[Figure: query tree with the ENAME selection moved onto E]
    π_{ENAME}(σ_{DUR=12 OR DUR=24}(σ_{JNAME="CAD/CAM"}((W ⋈_{ENO} σ_{ENAME="J. Doe"}(E)) ⋈_{JNO} P)))

Page 100:

Heuristic Optimization: Example
Push selection down using commutativity of selection over join
[Figure: query tree with the JNAME selection moved onto P]
    π_{ENAME}(σ_{DUR=12 OR DUR=24}((W ⋈_{ENO} σ_{ENAME="J. Doe"}(E)) ⋈_{JNO} σ_{JNAME="CAD/CAM"}(P)))

Page 101:

Heuristic Optimization: Example
Push selection down
[Figure: query tree with the DUR selection moved onto W]
    π_{ENAME}((σ_{DUR=12 OR DUR=24}(W) ⋈_{ENO} σ_{ENAME="J. Doe"}(E)) ⋈_{JNO} σ_{JNAME="CAD/CAM"}(P))

Page 102:

Heuristic Optimization: Example
Do early projection
[Figure: query tree with projections inserted below the joins]
    π_{ENAME}(π_{JNO,ENAME}(π_{JNO,ENO}(σ_{DUR=12 OR DUR=24}(W)) ⋈_{ENO} σ_{ENAME="J. Doe"}(E)) ⋈_{JNO} π_{JNO}(σ_{JNAME="CAD/CAM"}(P)))

Page 103:

Heuristic Optimization: Example
Identify subtrees that can be implemented in one algorithm
[Figure: final query tree, with subtrees grouped for implementation by a single algorithm]
    π_{ENAME}(π_{JNO,ENAME}(π_{JNO,ENO}(σ_{DUR=12 OR DUR=24}(W)) ⋈_{ENO} σ_{ENAME="J. Doe"}(E)) ⋈_{JNO} π_{JNO}(σ_{JNAME="CAD/CAM"}(P)))
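The effect of the whole sequence of rewrites can be sketched by running the canonical plan and the final plan on the same data and recording the size of each join result. Everything here is an assumption for illustration: the sample rows are invented, W's project column is named JNO to match the join, the helpers use simple nested loops, and projection keeps duplicates for brevity.

```python
# Invented sample data for E(ENAME, ENO), P(JNO, JNAME), W(ENO, JNO, DUR).
E = [{"ENO": i, "ENAME": n} for i, n in enumerate(["J. Doe", "M. Smith", "A. Lee"], 1)]
P = [{"JNO": 10, "JNAME": "CAD/CAM"}, {"JNO": 20, "JNAME": "Maintenance"}]
W = [{"ENO": e, "JNO": j, "DUR": d}
     for e, j, d in [(1, 10, 12), (1, 20, 24), (2, 10, 24), (2, 20, 36), (3, 10, 48)]]

sizes = {"canonical": [], "optimized": []}

def join(left, right, attr, plan):
    """Nested-loop equijoin on attr; records the intermediate result size."""
    out = [{**l, **r} for l in left for r in right if l[attr] == r[attr]]
    sizes[plan].append(len(out))
    return out

def select(pred, rel):
    return [t for t in rel if pred(t)]

def project(attrs, rel):
    return [{a: t[a] for a in attrs} for t in rel]

dur_ok = lambda t: t["DUR"] in (12, 24)
is_cad = lambda t: t["JNAME"] == "CAD/CAM"
is_doe = lambda t: t["ENAME"] == "J. Doe"

# Canonical plan: join everything, then select, then project.
joined = join(join(W, E, "ENO", "canonical"), P, "JNO", "canonical")
canonical = project(["ENAME"], select(lambda t: dur_ok(t) and is_cad(t) and is_doe(t), joined))

# Optimized plan: selections and projections pushed below the joins.
w_e = join(project(["JNO", "ENO"], select(dur_ok, W)), select(is_doe, E), "ENO", "optimized")
optimized = project(["ENAME"], join(project(["JNO"], select(is_cad, P)),
                                    project(["JNO", "ENAME"], w_e), "JNO", "optimized"))

print(canonical == optimized, sizes)
```

On this data the answers match, while the join results shrink from 5 and 5 tuples to 2 and 1.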

Page 104:

Heuristic Optimization: A Second Example
[Figure: query tree over Books, Loans, and Borrower with a projection on Title. Two Cartesian products (X) are paired with the selections Borrower.Card_No = Loans.Card_No and Books.LC_No = Loans.LC_No, and a selection compares Date with 1/1/88. Early projections shown: Loans.LC_No, Loans.Card_No; Loans.LC_No; Borr.Card_No; Books.LC_No, Title.]
What is the Final Step?
    Combine Select and Cartesian Product
    Result: Equijoins!

Page 105:

Cost-Based Optimization
Reduce Defined Cost of Executing Queries
What is Involved in the Cost of Executing a Query?
    Access Cost to Secondary Storage
        Search for Data Block (Index)
        Read/Write Index and Data Blocks
    Storage Cost
        Index and Data Blocks
        Intermediate Files
    Computation Cost
        Query Planning - Optimization Effort
        Record Search, Sort, Merge
        Actual Transaction/Query Operations
    Communications Cost
        Transfer of Results to the User

Page 106:

Complexity of Relational Operations
Assuming
    Relations of Cardinality n
    Sequential Scan of Data in each Relation
Complexity of Each Operation is Indicated
Avoid Cartesian Product at All Costs!

Operation                                      Complexity
Select, Project (w/o duplicate elimination)    O(n)
Project (with duplicate elimination), Group    O(n log n)
Join, Division, Set Operators                  O(n log n)
Cartesian Product                              O(n^2)
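Two rows of the table can be made concrete with a short Python sketch. It assumes sort-based duplicate elimination (one common way to reach O(n log n)) and uses invented relations; the function names are hypothetical.

```python
from itertools import groupby, product

def project_distinct(rel, pos):
    """Projection with duplicate elimination: sort, then one linear pass."""
    column = sorted(t[pos] for t in rel)
    return [value for value, _ in groupby(column)]

def cartesian_size(r, s):
    """A Cartesian product yields |R| * |S| tuples, n^2 when both have n."""
    return sum(1 for _ in product(r, s))

n = 200
R = [(i, i % 7) for i in range(n)]
S = [(i, i % 5) for i in range(n)]

print(project_distinct(R, 1))   # distinct values of R's second column
print(cartesian_size(R, S))     # number of pairs in R x S
```

With only 200 tuples per relation the product already has 40,000 tuples, which is why the slide singles it out.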

Page 107:

Cost-Based Optimization
To Understand Cost-Based Operations, we Must Focus on Implementation Strategy of:
    Select
    Project
    Join
For Select and Project - There is a Fixed Cost that we Must Live With
For Join
    Implementation Strategy
    Different Join Strategies
Objective:
    Minimize the Number of Blocks Involved
Note that Cost-Based and Relational Algebra Heuristic Optimization Can Complement One Another
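As one example of counting blocks, the sketch below applies the usual textbook estimate for a block nested-loop join: read the outer relation once, and re-read the inner relation once per buffer-load of outer blocks. The slides do not give this formula, so the function, the buffer assumption, and the block counts are all illustrative.

```python
import math

def nested_loop_block_reads(b_outer, b_inner, n_buffers):
    """Estimated block reads for a block nested-loop join.

    Assumes n_buffers - 2 buffers hold outer blocks, one holds the current
    inner block, and one is reserved for output (writes are not counted).
    """
    if n_buffers < 3:
        raise ValueError("need at least 3 buffers")
    outer_chunks = math.ceil(b_outer / (n_buffers - 2))
    return b_outer + outer_chunks * b_inner

b_R, b_S, n_buffers = 2000, 10, 7
print(nested_loop_block_reads(b_R, b_S, n_buffers))   # R as the outer relation
print(nested_loop_block_reads(b_S, b_R, n_buffers))   # S as the outer relation
```

Under these assumptions, putting the smaller relation in the outer loop cuts the estimate from 6000 to 4010 block reads.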

Page 108:

Optimization Summary
Most Systems Implement Only a Few Strategies
The Number of Strategies that are Considered by Any Query Optimizer is Limited
Some Systems Reduce the Number of Strategies by Making a Heuristic Guess of Strategy for Each Query
    The Optimizer Considers Every Possible Strategy, but Terminates as Soon as it Determines the Cost is Greater than the Pre-chosen Strategy
    Thus Only a Few Competing Strategies Require Full Analysis of the Cost
    The Overhead of Query Optimization is Reduced
Remember - Trade off in Optimization Time
    For PL - Optimization is Pre-Execution (Compile)
    For DB - Optimization is Part of Execution (Run)