A Structured Approach to SQL Query Design
DESCRIPTION
This document outlines a graphical design methodology for complex SQL queries. The approach is first outlined, and then demonstrated using a real (custom) SQL report within the Oracle Applications modules Order Management and Inventory. The background to the writing of the document is outlined below.
I was asked to redesign an Oracle AR invoice print program, as part of a project to implement intercompany invoicing along with a move from one to four operating units. I found that the program had five separate versions, and each one had more than 30 queries, scattered over the data model and the field and report triggers. I produced a design based on having a single report, with a single query. The design was approved, and the new report was written and is now in production, where it runs about a hundred times faster than the old ones.
The new program is much simpler overall than the old ones, but of course the single query is quite complex, and during reviews of the design document I was asked whether I could extend the section on the query. I had included an entity-relationship diagram for the query, but agreed that it was difficult to understand the structure purely from this diagram. I realised that I needed a design methodology that went beyond the ERD, and tried to find suitable techniques on the internet. I was not successful, and therefore, during a relatively quiet period when our project was in UAT, I made my own attempt to fill this gap. I used a simpler reporting requirement that had arisen as a testing ground, and this document is the result.
TRANSCRIPT
AIM
TECHNICAL DESIGN
A Structured Approach to SQL Query Design
Team: Technology
Creation Date: 22 May 2009
Created By: Brendan Furey
Last Updated: 19 August 2009
Control: document.doc
Version: 1.1
Approvals:
Document Control
Change Record
Date         Author    Version  Change Reference
22-May-2009  BP Furey  1.0      Initial
19-Aug-2009  BP Furey  1.1      Entity Overview: Cardinality reversal (Account); Entity/Subtype Definitions: Added sizes; Join Sequences: Note on outer joins; Other: Minor changes
File Ref: document.doc
Contents
Document Control
    Change Record
Introduction
Technical Overview
    Design Process
    Implementation Notes
        Diagramming Tool
        Subtypes
        Attributes
        Notation
    Additional Advantages of Approach
        Performance Tuning
        Documentation
        Package Design
Worked Example: COGS No Charge Report
    Requirement Summary
    Entity Relationship Diagrams
        Entity Overview
    Entity/Subtype Listings
        Entity/Subtype Structure
        Entity/Subtype Definitions
    Query Diagrams
        Query Structure
        Main Query
        Transaction View
    Join Sequences
        Notes
    Query Code
        Text
        Notes
Issues
    Issues
References
Introduction
SQL is a declarative language for manipulating data stored in a relational database. Oracle’s PL/SQL is a procedural extension intended to implement logic that cannot be performed directly in SQL. It’s generally accepted that developing software procedurally involves greater effort and results in more complex systems than using declarative languages; in the case of SQL, performance is usually also much better when implementing a requirement entirely in SQL. This leads to a Best Practice guideline, sometimes succinctly expressed (eg REF-1), as:
‘Do it in a single SQL statement if at all possible’
Unfortunately, this guideline appears to be followed surprisingly rarely, particularly in ERP environments. Often, in both batch programs and reports, a set of data that could be selected in a single query will instead be selected by a large and complex program with multiple small SQL queries scattered throughout. There are a couple of possible reasons that may explain why this is so:
With each major release, Oracle increases the power of SQL and its ability to do internally what previously had to be programmed, but the developer community can be slow to keep pace with advances
ERP systems in particular tend to have very complex, highly granular data models, owing to the need for generality. This makes for rather complex SQL, which can be daunting to develop without good design techniques. In practice SQL is hardly ever designed and the temptation is to design a procedural program with simpler embedded SQL statements
The purpose of this document is to describe a structured, graphical approach to the design of SQL queries that may be a useful way of handling the complexity without reverting to procedural design. It focuses on subquery structure and join orders, rather than on other areas such as grouping and aggregation, or design patterns. The author has used it to design complex queries with up to 48 table instances, and the approach is demonstrated using a real (rather simpler) example of a custom report within Oracle’s Order Management and Inventory modules (see REF-2 for Oracle’s table specifications).
Technical Overview
Design Process
The approach is based on entity-relationship diagramming, but applied in a different way from its usage in database design. The main steps in the design process are:
Produce one or more entity-relationship diagrams that include all the physical entities required for the query
o Where necessary, follow a top-down approach using higher level entities to group related entities together, and break them down in secondary diagrams
o Use subtypes to show the logical structure as well as the physical, based on the query requirements (eg display Ship To and Bill To customers as distinct subtypes for an Invoice Print query)
Tabulate the entity and subtype structures
o Include definition of subtypes
o Map bottom-level entities to physical tables
Produce a query structure diagram, showing proposed subqueries, including inline views and each section of any unions
o Use notes to explain the reasoning behind the structure
Produce one or more entity-relationship diagrams for each subquery (including the main query)
o Where necessary, follow a top-down approach using higher level entities to group related entities together, and break them down in secondary diagrams
o Mark which entities are constraining, or possibly constraining
Define a route through each diagram that a query plan could reasonably take, marking with numbered arrows the sequence of entities visited
o Begin with a possible driving table, then pass to entities that are linked to entities already visited, favouring the most constraining entities
o The sequence represents the order in which the tables will be joined in the code, but need not be that followed by the SQL engine
o The join sequence will be a good starting point in analysing any performance problems that may occur
Tabulate the complete join sequence
o Group by subquery, and any entity groupings for convenience
Implementation Notes
Diagramming Tool
It is important for clarity that entities and links can be sized and positioned flexibly. This effectively means that diagrams need to be constructed manually rather than generated, and Oracle Designer and similar tools do not appear suitable. We have used Microsoft Visio.
Subtypes
Subtypes are often used at the logical phase of database design to represent the partitioning of an entity into a number of subentities, followed by a physical implementation in one of a number of ways: for example, a Party in Oracle's customer model may be one of several types, including Organisation and Person, and this is physically implemented by a party_type column on the table. The concept is used here more generally and more dynamically, to represent a division of an entity into groups of records, according to any data conditions specified in the query. Subtypes are depicted as two or more entities within another entity (take care not to confuse them with distinct entities within an entity group), and may be nested. Subtypes within a query diagram normally correspond to distinct table instances.
Attributes
Attributes do not appear on the diagrams, as they are not necessary for our purposes and cause clutter and distortion of entities, reducing clarity.
Notation
The following two points refer to both ERDs and query diagrams.
Entities
o Rounded boxes
o Broken lines indicate a complex entity containing subentities
o Entities appearing within another solid-lined entity are subtypes
Relationships
o Straight lines between entities
o Perpendicular bar denotes referencing end
o Circle denotes optionality
o Triple ending denotes many end of many to one relationship
The remaining points apply to query diagrams only
Constraining entities
o Asterisk against the name
Join sequence
o Numbered arrows
Additional Advantages of Approach
Performance Tuning
The design process followed here results in a logical join sequence. When the Cost Based Optimiser fails to find a good execution plan, the starting point in analysis is usually to compare its join order with what the developer would expect, and if necessary hints (such as LEADING) can be added to obtain a better plan. This tends to be more of an issue with large queries.
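As a rough illustration of how a designed join sequence carries over into a hint, the sketch below is a cut-down version of the Configuration subquery from the worked example later in this document. The aliases in the LEADING hint are listed in the designed join order, and USE_NL requests nested loop joins to mta and gcc. The bind variable :period_name is an assumption made for this sketch (the report itself uses SQL*Plus substitution variables), and the statement needs the Oracle Applications tables in order to run.

```sql
-- Sketch only: designed join sequence per -> mta -> gcc expressed as a hint.
-- :period_name is a hypothetical bind variable, not part of the original report.
SELECT /*+ LEADING (per mta gcc) USE_NL (mta gcc) */
       per.period_name,
       mta.transaction_id,
       gcc.segment4
  FROM gl_periods per,
       mtl_transaction_accounts mta,
       gl_code_combinations gcc
 WHERE per.period_name = :period_name
   AND per.period_type = '1'
   AND mta.transaction_date BETWEEN per.start_date AND per.end_date
   AND gcc.code_combination_id = mta.reference_account;
```

Comparing the join order in the resulting Explain Plan with the order in the hint is then a quick check that the optimiser is following the design.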
Documentation
The design process described results in a document that makes large queries much easier to understand for support staff.
Package Design
The type of ERDs shown here can be used in designing package structures for maintenance of logical entities. For example, where Oracle’s customer model is used, the logical entities are usually at a higher level than Oracle’s physical model and maintenance procedures would correspond to the logical level represented on Entity-Relationship diagrams.
Worked Example: COGS No Charge Report
Requirement Summary
This is a custom report based on Oracle Applications (11.5.10) Inventory and Order Management modules (the table definitions can be obtained from REF-2). It lists order lines that are sold at zero price, and includes the Inventory COGS (Cost of Goods Sold) account distributions in two categories (material and overhead). Briefly, the requirements are:
List order lines with zero unit price, showing COGS Material and Overhead Inventory costs (where they exist), along with Warehouse, Ship To and Item data
Order lines may be of type Configuration or Non-Configuration, and the latter are non-shippable, so do not have Inventory records, in which case print the Order Line records with zero for the COGS costs
Report driven by the dates of a GL period, applied to the Inventory records for Configuration Lines, and the Order Lines for Non-Configuration Lines
Entity Relationship Diagrams
Entity Overview
The diagram below gives an overview, showing how the main entity groups relate to each other, with the complex entities broken down subsequently. Broken lines denote complex entities (but don’t always come out in Word!).
Entity/Subtype Listings
Entity/Subtype Structure
The table below shows the entity structure and the subtype structure where applicable. Italics denote complex entities referenced within others.
Entity 1         Entity 2                 Entity 3          Subtype 1          Subtype 2
GL Period
MTL Transaction                                             Logical
                                                            Physical
Account          MTL Transaction Account
                 GL Account                                 Overhead
                                                            Material
                                                            Other
Warehouse
Order Header
Line             Order Line                                 Configuration      Model
                                                                               Component
                                                            Non-Configuration
                 Line Type
Ship To          Ship To Site Use
                 Address                  Customer Site
                                          Party Site
                                          Location
                 Customer                 Customer Account
                                          Party
Item             Inventory Item
                 Item Category
                 Product Line Category    Category
                                          Category Set      Product Line
                                                            Other
Entity/Subtype Definitions
The table below displays all the bottom level entities, with the tables that implement them, and the subtype conditions where applicable.
Entity                   Table                      Size in 1000s Subtype            Condition
GL Period                gl_periods                 0.4
MTL Transaction Account  mtl_transaction_accounts   54,943
GL Account               gl_code_combinations       117
                                                                  Overhead           Condition on column values here and in linked MTL Transaction Account record
                                                                  Material           Condition on column values here and in linked MTL Transaction Account record
                                                                  Other              Other
MTL Transaction          mtl_material_transactions  26,782
                                                                  Physical           Has child transaction
                                                                  Logical            Has parent transaction (these arise from intercompany orders using virtual warehouses)
Warehouse                mtl_parameters             0.1
Order Header             oe_order_headers_all       822
Order Line               oe_order_lines_all         3,230
                                                                  Configuration      Line of type Model with linked Component Lines
                                                                  Non-Configuration  Line not of type Model or Component
                                                                  Model/Component    Component Lines are linked to a Model Line
Line Type                oe_transaction_types_tl    0.1
Ship To Site Use         hz_cust_site_uses_all      489
Customer Site            hz_cust_acct_sites_all     327
Party Site               hz_party_sites             216
Location                 hz_locations               145
Customer Account         hz_cust_accounts           54
Party                    hz_parties                 409
Inventory Item           mtl_system_items_b         1,565
Item Category            mtl_item_categories        34,294
Category                 mtl_categories             64
Category Set             mtl_category_sets          0.1
                                                                  Product Line       Set name = 'PLINE'
                                                                  Other              Other name
Query Diagrams
Query Structure
Notes
The query is driven by two different sets of source records, requiring the inner union
The union goes into an inline view in order to avoid duplicating all the other tables for each section
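The structure described in these notes can be sketched in miniature as follows. This is an illustration under stated assumptions rather than the report query itself: cfg_src, non_cfg_src and headers are hypothetical stand-ins built from dual so that the statement is self-contained, with headers playing the role of all the tables that would otherwise have to be joined in each section of the union.

```sql
-- Sketch only: two differently-sourced row sets are unioned inside an
-- inline view (ilv), and the shared lookup is joined once, outside it.
-- cfg_src, non_cfg_src and headers are hypothetical stand-in data sets.
WITH cfg_src AS (
        SELECT 1 header_id, 1 line_number, 12.5 material FROM dual
     ),
     non_cfg_src AS (
        SELECT 2 header_id, 1 line_number FROM dual
     ),
     headers AS (
        SELECT 1 header_id, 'SO-1' order_number FROM dual UNION ALL
        SELECT 2, 'SO-2' FROM dual
     )
SELECT hdr.order_number,
       ilv.line_number,
       ilv.material
  FROM (SELECT header_id, line_number, material
          FROM cfg_src
         UNION
        SELECT header_id, line_number, 0
          FROM non_cfg_src
       ) ilv,
       headers hdr
 WHERE hdr.header_id = ilv.header_id
 ORDER BY 1, 2;
```

Had the union been placed at the top level instead, the join to headers (and, in the real report, to the item, customer and warehouse tables) would need to be repeated in both sections.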
Transaction View
Notes
Observe that we have shown the GL Account entity without the subtypes that appeared in the ERD. This represents a design decision not to link to separate instances of the table for the subtypes, but instead to link to a single instance and use the row-column pivoting technique to obtain the two COGS amounts on a single line. See the notes section after the query code for an explanation.
We have used a different subtyping for our MTL Transaction from that in the ERD. The record linked to may be a logical or a physical transaction, and we link from it to its parent, if it exists (logical case)
Join Sequences
The table below shows the (possible) join sequences by subquery.
Entity 1                        Entity 2                        Entity 3                     Table
Subquery: Configuration
GL Period                                                                                    gl_periods
Account                         MTL Transaction Account                                      mtl_transaction_accounts
                                GL Account                                                   gl_code_combinations
MTL Transaction                 Any                                                          mtl_material_transactions
                                Physical (if previous Logical)                               mtl_material_transactions (+)
Order Line (Configuration)      Component                                                    oe_order_lines_all
                                Model                                                        oe_order_lines_all
Line Type                                                                                    oe_transaction_types_tl
Subquery: Non-Configuration
GL Period                                                                                    gl_periods
Order Line (Non-Configuration)                                                               oe_order_lines_all
Line Type                                                                                    oe_transaction_types_tl
Main Query
Transaction (Inline View)
Item                            Inventory Item                                               mtl_system_items_b
                                Item Category                                                mtl_item_categories
                                Product Line Category           Category                     mtl_categories
                                                                Category Set (Product Line)  mtl_category_sets
Order Header                                                                                 oe_order_headers_all
Ship To                         Ship To Site Use                                             hz_cust_site_uses_all
                                Address                         Customer Site                hz_cust_acct_sites_all
                                                                Party Site                   hz_party_sites
                                                                Location                     hz_locations
                                Customer                        Customer Account             hz_cust_accounts
                                                                Party                        hz_parties
Warehouse                                                                                    mtl_parameters
Notes
Outer Joins
Outer joins are a frequent source of errors in SQL, either by the join being incorrectly specified as outer (or inner), or by the outer join syntax being incorrectly implemented (usually by including the (+) on only some of the relevant clauses). ANSI join syntax (see notes on next section) makes it harder to get the implementation wrong. Specification errors will be less likely if the join type is part of the design. Outer joins are indicated by the same symbol as in Oracle native SQL - (+) - in the table above.
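To make the point about partially applied (+) markers concrete, here is a minimal sketch based on the parent-transaction join used in the worked example. The trx data set is hypothetical and built from dual so that the statements are self-contained; the column names follow mtl_material_transactions. The second statement is the ANSI form of the same join, shown for comparison only.

```sql
-- Sketch only: transaction 101 is logical (has a parent, 100);
-- transactions 100 and 102 are physical (no parent). Data is hypothetical.
WITH trx AS (
        SELECT 101 transaction_id, 100 parent_transaction_id,
               CAST (NULL AS NUMBER) trx_source_line_id FROM dual UNION ALL
        SELECT 100, NULL, 5001 FROM dual UNION ALL
        SELECT 102, NULL, 5002 FROM dual
     )
SELECT trx.transaction_id,
       Nvl (trx.trx_source_line_id, trx_p.trx_source_line_id) line_id
  FROM trx,
       trx trx_p
 WHERE trx_p.transaction_id (+) = trx.parent_transaction_id
 -- Any further condition on trx_p must also carry the (+), for example
 --   AND trx_p.trx_source_line_id (+) > 0
 -- Written without the (+), it would remove transactions 100 and 102,
 -- silently turning the outer join into an inner join.
 ORDER BY trx.transaction_id;

-- ANSI equivalent: all conditions on trx_p sit in the ON clause, so the
-- outer join cannot be applied to only some of them.
WITH trx AS (
        SELECT 101 transaction_id, 100 parent_transaction_id,
               CAST (NULL AS NUMBER) trx_source_line_id FROM dual UNION ALL
        SELECT 100, NULL, 5001 FROM dual UNION ALL
        SELECT 102, NULL, 5002 FROM dual
     )
SELECT trx.transaction_id,
       Nvl (trx.trx_source_line_id, trx_p.trx_source_line_id) line_id
  FROM trx
  LEFT OUTER JOIN trx trx_p
    ON trx_p.transaction_id = trx.parent_transaction_id
 ORDER BY trx.transaction_id;
```

Both statements return all three transactions, with line_id 5001 for transactions 100 and 101 and 5002 for transaction 102.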
Query Code
Text
SELECT /*+ LEADING (ilv) USE_NL (ooh) */
       war.organization_code orgcode,
       loc.country,
       ooh.order_number order_num,
       ilv.line_number line_num,
       ilv.line_type,
       ilv.dept,
       ilv.reason,
       ooh.cust_po_number ponum,
       Substr (par.party_name, 1, 25) cust_name,
       ilv.shipped_quantity qty,
       Nvl (ilv.material * ilv.shipped_quantity, 0) extd_mat,
       Nvl (ilv.OH * ilv.shipped_quantity, 0) extd_ovh,
       msi.segment1 partnum,
       cat.segment1 prodline,
       ilv.period_name period
  FROM (
        -- Configuration section: driven by Inventory transaction accounts,
        -- with the two COGS amounts pivoted onto one line
        SELECT /*+ LEADING (per mta gcc trx trx_p ool_cpt ool_mdl lty) USE_NL (gcc trx trx_p ool_cpt ool_mdl lty) */
               ool_mdl.header_id,
               ool_mdl.line_number,
               ool_mdl.inventory_item_id,
               ool_mdl.ship_to_org_id,
               ool_cpt.shipped_quantity,
               Max (CASE WHEN mta.cost_element_id = 1 OR gcc.segment4 IN ('1360', '1361') THEN
                           mta.rate_or_amount END) material,
               Max (CASE WHEN mta.cost_element_id = 2 OR gcc.segment4 IN ('1330', '1331') THEN
                           mta.rate_or_amount END) OH,
               lty.name line_type,
               ool_mdl.attribute10 dept,
               ool_mdl.attribute9 reason,
               mta.organization_id,
               per.period_name
          FROM gl_periods per,
               mtl_transaction_accounts mta,
               gl_code_combinations gcc,
               mtl_material_transactions trx,
               mtl_material_transactions trx_p,
               oe_order_lines_all ool_cpt,
               oe_order_lines_all ool_mdl,
               oe_transaction_types_tl lty
         WHERE mta.transaction_date BETWEEN per.start_date AND per.end_date
           AND per.period_name = '&&1'
           AND per.period_type = '1'
           AND gcc.code_combination_id = mta.reference_account
           AND trx.transaction_id = mta.transaction_id
           AND trx_p.transaction_id (+) = trx.parent_transaction_id
           AND ool_cpt.line_id = Nvl (trx.trx_source_line_id, trx_p.trx_source_line_id)
           AND ool_mdl.line_id = ool_cpt.link_to_line_id
           AND lty.transaction_type_id = ool_mdl.line_type_id
           AND mta.transaction_source_type_id = 2
           AND mta.accounting_line_type = 1
           AND (mta.cost_element_id IN (1, 2) OR
                gcc.segment4 IN ('1360', '1361', '1330', '1331'))
           AND ool_mdl.unit_selling_price = 0
           AND Nvl (ool_mdl.attribute9, '&&2') BETWEEN '&&2' AND '&&3'
           AND lty.name BETWEEN Nvl ('&&4', 'BO') AND Nvl ('&&5', 'US')
           AND lty.name <> 'TO'
         GROUP BY ool_mdl.header_id,
               ool_mdl.line_number,
               ool_mdl.inventory_item_id,
               ool_mdl.ship_to_org_id,
               ool_cpt.shipped_quantity,
               lty.name,
               ool_mdl.attribute10,
               ool_mdl.attribute9,
               mta.organization_id,
               per.period_name
         UNION
        -- Non-Configuration section: driven by Order Lines, zero COGS amounts
        SELECT /*+ LEADING (per ool lty) USE_NL (lty) */
               ool.header_id,
               ool.line_number,
               ool.inventory_item_id,
               ool.ship_to_org_id,
               Nvl (ool.ordered_quantity, 0),
               0,
               0,
               lty.name,
               ool.attribute10,
               ool.attribute9,
               ool.ship_from_org_id,
               per.period_name
          FROM gl_periods per,
               oe_order_lines_all ool,
               oe_transaction_types_tl lty
         WHERE ool.request_date BETWEEN per.start_date AND per.end_date
           AND per.period_name = '&&1'
           AND per.period_type = '1'
           AND lty.transaction_type_id = ool.line_type_id
           AND ool.unit_selling_price = 0
           AND lty.name BETWEEN Nvl ('&&4', 'BO') AND Nvl ('&&5', 'TO')
           AND lty.name IN ('BO', 'TO')
           AND ool.link_to_line_id IS NULL
           AND ( (ool.top_model_line_id IS NOT NULL AND
                  ool.top_model_line_id != ool.line_id) OR
                 (ool.top_model_line_id IS NULL) )
       ) ilv,
       mtl_system_items msi,
       mtl_item_categories mic,
       mtl_categories cat,
       mtl_category_sets mcs,
       oe_order_headers_all ooh,
       hz_cust_site_uses_all csu,
       hz_cust_acct_sites_all sit,
       hz_party_sites pst,
       hz_locations loc,
       hz_parties par,
       hz_cust_accounts cus,
       mtl_parameters war
       -- Main query: Item, Order Header, Ship To and Warehouse joined once to the inline view
 WHERE msi.inventory_item_id = ilv.inventory_item_id
   AND msi.organization_id = ilv.organization_id
   AND mic.inventory_item_id = msi.inventory_item_id
   AND mic.organization_id = ilv.organization_id
   AND mic.category_set_id = mcs.category_set_id
   AND mcs.category_set_name = 'PLINE'
   AND cat.category_id = mic.category_id
   AND ooh.header_id = ilv.header_id
   AND csu.site_use_id = ilv.ship_to_org_id
   AND sit.cust_acct_site_id = csu.cust_acct_site_id
   AND pst.party_site_id = sit.party_site_id
   AND loc.location_id = pst.location_id
   AND cus.cust_account_id = sit.cust_account_id
   AND par.party_id = cus.party_id
   AND war.organization_id = ilv.organization_id
 ORDER BY 1, 6, 2, 3, 4
Notes
Row-Column Pivoting
The GL Account is regarded as having three subtypes for the purpose of this query. The query needs to bring back two of them for a given transaction, and display cost values in two columns corresponding to the subtypes, but one or the other may be missing. We prefer to avoid the complications and likely inefficiency of attempting to achieve this by outer-joining to two instances and instead use a row-column pivoting method, which is a useful general purpose technique that goes as follows (let's say we have n columns, COL_1 to COL_n, whose values are obtained by expressions EXPRESSION_1 to EXPRESSION_n, with corresponding Where conditions CONDITION_1 to CONDITION_n):
Join the table once for all conditions corresponding to the columns
o WHERE (CONDITION_1 OR … CONDITION_n)
Group by all columns except COL_1-COL_n
Add lines to Select list for i = 1 to n:
o Max (CASE WHEN CONDITION_i THEN EXPRESSION_i END) COL_i
In Oracle 11g, there is a PIVOT clause native to SQL.
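The steps above can be sketched with n = 2 as follows. The txn_costs data set, its columns and its values are hypothetical, built from dual so that the statements are self-contained. The first statement follows the Max (CASE ...) method described here; the second is an assumed equivalent using the PIVOT clause, and so needs Oracle 11g or later.

```sql
-- Sketch only: txn_costs is a hypothetical data set with one row per
-- transaction and cost type; transaction 2 has no OVERHEAD row.
WITH txn_costs AS (
        SELECT 1 txn_id, 'MATERIAL' cost_type, 10 amount FROM dual UNION ALL
        SELECT 1, 'OVERHEAD', 2 FROM dual UNION ALL
        SELECT 1, 'OTHER', 99 FROM dual UNION ALL
        SELECT 2, 'MATERIAL', 7 FROM dual
     )
SELECT txn_id,
       Max (CASE WHEN cost_type = 'MATERIAL' THEN amount END) material,
       Max (CASE WHEN cost_type = 'OVERHEAD' THEN amount END) oh
  FROM txn_costs
 WHERE (cost_type = 'MATERIAL' OR cost_type = 'OVERHEAD')  -- CONDITION_1 OR CONDITION_2
 GROUP BY txn_id                                           -- all columns except COL_1, COL_2
 ORDER BY txn_id;

-- Same result using the native PIVOT clause (Oracle 11g onwards).
WITH txn_costs AS (
        SELECT 1 txn_id, 'MATERIAL' cost_type, 10 amount FROM dual UNION ALL
        SELECT 1, 'OVERHEAD', 2 FROM dual UNION ALL
        SELECT 1, 'OTHER', 99 FROM dual UNION ALL
        SELECT 2, 'MATERIAL', 7 FROM dual
     )
SELECT txn_id, material, oh
  FROM txn_costs
 PIVOT (Max (amount) FOR cost_type IN ('MATERIAL' AS material, 'OVERHEAD' AS oh))
 ORDER BY txn_id;
```

Each statement returns one row per transaction: material 10 and oh 2 for transaction 1, and material 7 with a null oh for transaction 2, which is the "one or the other may be missing" case handled without any outer join.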
Hints
The Explain Plan for the query was found to favour hash joins, and poor performance was obtained. As a result a LEADING hint was added to each subquery, giving the preferred join order, following directly from the design sequences, and where necessary USE_NL hints were added to ensure nested loop joins. This gave much improved performance.
ANSI SQL
We would prefer to use ANSI join syntax, but cannot because the version of Oracle we are using (10g, 10.2.0.3.0) has a bug that causes some ANSI queries to fail spuriously with ORA-01445.
Issues
Issues
#  Issue             Description            Note if closed
1  ANSI join syntax  Oracle bug, see above
References
REF    Document                                           Location
REF-1  Oracle, Ask Tom, 'Considering SQL as a Service'    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:672724700346558185
REF-2  Oracle, eTRM, R11.5.10                             https://etrm.oracle.com/pls/trm11510/etrm_search.search