page 1 querying relational databases without explicit joins ramon lawrence, ken barker querying...

32
Page 1 Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker R am on Law rence K en B arker U niversity ofIow a U niversity ofC algary [email protected] [email protected] Querying Relational Databases without Explicit Joins

Post on 19-Dec-2015

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Ramon Lawrence Ken BarkerUniversity of Iowa University of Calgary

[email protected] [email protected]

Querying Relational Databaseswithout Explicit Joins

Querying Relational Databaseswithout Explicit Joins

Page 2

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Outline

Introduction, Motivation, and Background What is wrong with SQL? How can we replace SQL?

Querying by Context using Semantic Names An example query

Query Architecture Term dictionary, X-Specs, query processor

Query Processor Algorithms Field/table mapping discovery, join selection GUI extensions to simplify query construction

Future work and conclusions

Page 3

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Introduction and Motivation

Despite improvements in “core” database technology, advances in database query languages have not kept pace.

SQL is still the fundamental basis for most access languages and tools.

SQL is often difficult to use for beginning users or when formulating queries on large and complex database schema.

Thus, the motivation for designing a high-level query language for users that is also backwards compatible with SQL.

Page 4

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

What is wrong with SQL?

There is nothing wrong with SQL. However, SQL is not a simple query language for many reasons:

Querying by structure does not hide complexities introduced due to database normalization.

Structures (fields and tables) may be assigned poor names that do not adequately describe their semantics.

Notion of a “join” is confusing for beginning users especially when multiple joins are present.

SQL forces structural access which does not provide logical query transparency and restricts logical schema evolution.

Page 5

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

SQL as a Standard

SQL is an universally-accepted standard supported by all major relational DB vendors.

Object-oriented query languages are also developed similar to SQL.

Most vendors provide graphical query tools that simplify SQL construction.

SQL is “relatively” standardized across database systems and platforms.

Conclusion: Any new query language should be backwards compatible with SQL to guarantee its usefulness.

However, it is desirable to totally hide SQL formulation at the user-level.

Page 6

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Previous Work

There have been several research systems and prototypes in the area. They fall in 3 categories:

1) Graphical Query Tools and Models Query by example (QBE), database query tools

2) Query by Word Phrases SemQL (WordNet) Information retrieval techniques, web searching

3) User-directed Querying Kaleidoscope (logic)

Page 7

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Query Issues

A desirable query language should: Allow the user to be able to systematically and

deterministically produce results. Provide a suitable formalism for representing and

querying data. Hide the naming and structures of the database by

providing logical and physical access transparency. Allow the user to browse the contents of the

database to determine query concepts. Provide a graphical query interface. Be generally applicable to multiple database systems

and/or data models.

Page 8

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Querying by Context (QBC)

Querying by context (QBC) is a methodology for querying relational databases by semantics.

Querying is performed by selecting semantic names that represent query concepts.

Semantic names are assigned once by the DBA to describe database semantics.

Users query the database by selecting semantic names from the context view.

The context view contains all concepts present in the database with appropriate semantic names.

A query processor maps the user’s selections and criteria to an actual SQL query.

Page 9

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Querying by Context Example

Order Database SchemaTables Fields

Categories CategoryID, CategoryNameCustomers CustomerID, CompanyNameEmployees EmployeeID, LastName, FirstNameOrderDetails OrderID, ProductID, UnitPrice, QuantityOrder OrderID, CustomerID, EmployeeID, OrderDate,

ShipviaProducts ProductID, ProductName, SupplierID, CategoryIDShippers ShipperID, CompanyNameSuppliers SupplierID, CompanyName

Page 10

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Querying by Context Example (2)

User query: Retrieve all orders and customer names where an order contains a product from category ‘Produce’.

SQL: SELECT O.OrderID, CU.CompanyName

FROM [Categories] AS C, [Customers] AS CU, [Orders] AS O, [Products] AS P, [Order Details] AS OD

WHERE C.CategoryName = 'PRODUCE’ AND (C.CategoryID = P.CategoryID) AND (P.ProductID = OD.ProductID) AND (OD.OrderID = O.OrderID) AND (O.CustomerID = CU.CustomerID)

Page 13

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

The query architecture consists of three separate processes:

Capture process: independently extracts database schema information and metadata into a XML document called a X-Spec.

Integration process: combines X-Spec(s) into a structurally-neutral hierarchy of database concepts called an integrated context view.

Query process: allows the user to formulate queries on the integrated view that are mapped by the query processor to structural queries (SQL) and the results are integrated and formatted.

Query by Context Processes

Page 14

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

The Integrated Context View

Before semantic querying can begin, the DBA must assign semantic names to each field and table in the database.

Semantic names and schema mappings are stored in an XML document called an X-Spec.

Integration of one or more X-Specs produces a structurally-neutral hierarchy of concepts called an integrated context view.

We will only look at querying one database.

A context view (CV) is a valid Universal Relation. Each field is assigned a semantic name which uniquely

identifies its semantic connotation.

Query Architecture

Architecture Components: 1) Integrated Context View

• user’s view of integration 2) X-Spec Editor

• stores schema & metadata• uses XML

3) Standard Dictionary• terms to express semantics

4) Integration Algorithm• combines X-Specs into integrated context view

5) Query Processor• accepts query on view• determines data source mappings and joins• executes queries and formats results

Local Transactions

X-Spec

X-Spec Editor

Standard Dictionary

Integration Algorithm

Integrated Context View

Query Processor and ODBC Manager

Database

Client

Subtransactions

Client

Global Query Layer

Database

X-Spec

Page 16

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

What is a semantic name? A semantic name is a universal, semantic

identifier in a domain. Similar to a field name in the Universal Relation. Semantics are guaranteed unique by construction.

Users query based on the semantic names provided by the DBA.

Systematic construction of semantic names allows:

The system to insert join conditions into query. The context view to be organized hierarchically by

semantic concept to reduce burden on user.

Page 17

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

User Query Formulation

Users query the context view by: browsing and selecting semantic names from the

context view for selection/projection specifying ordering and selection criteria

Users do not have to specify joins between concepts:

Some joins are implicit by virtue of hierarchical semantic names. (e.g. [Order;Product] of [Order]).

Joins are automatically inserted when required if not directly specified by user.

User can explicitly direct joins by browsing CV which automatically connects concepts.

Page 18

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Query Processor

Query processor translates from semantic query to SQL which requires:

Mapping from semantic names to field and table names using supplied X-Spec mappings. One challenge is determining a particular field mapping to use

if multiple are present. (e.g. [Order] Id)

Insertion of joins to preserve user query semantics. If user specifies no relationship between concepts, chose

shortest semantic join path, otherwise select shortest physical join path.

If user specifies (some) join semantics, then join determination is simplified. System does not prevent user from specifying complex, non-

standard joins like outer joins.

Page 19

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Selecting Joins Using Join Graphs Determining joins to use is simplified by defining

a join graph. A join graph is an undirected graph where:

Each node Ni is a table in the database. There is a link from node Ni to node Nj if there is a join

between the two tables.

A join path is a sequence of joins connecting two nodes in the graph.

A join tree is a set of joins connecting two or more nodes.

A join matrix M stores the shortest join paths between any two nodes (tables).

Page 20

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Join Graph for Order Database

O rders

O rderD etails

Em p loyeesProducts

C ustom ers

C ategoriesProducts

Products

Suppliers

Shippers

1 N

N

1

1N

1N1 N

1

N

1

N

Products

Page 21

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Join Discovery Results

Join discovery in a database with a connected, acyclic join graph is simple as there exists only one join tree for any set of tables.

For a cyclic join graph, there may exist more than one join tree for a set of tables and each tree may have different semantics.

Users may eliminate possibilities when browsing the context view or using direct specification.

Otherwise, system selects shortest join path first considering semantic names, and then physical join paths.

Page 22

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Reducing Join Ambiguity Join ambiguity can be reduced during query

formulation without the user’s knowledge. Example: Retrieve all orders and customer names where

an order contains a product in category ‘Produce’.

Semantic names selected: [Order] Id, [Customer] Name, [Category] Name no join ambiguity if acyclic graph, however…

Semantic names with no ambiguity: [Order] Id, [Order;Customer] Name,

[Order;Product;Category] Name

Names indicate path from starting context “Order”.

Page 23

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Reducing Join Ambiguity (2) From a user interface perspective, the second

set of semantic names can be determined if: When the user browses the [Order] context, the

[Customer] name information is merged into the [Order] context using a hidden connection on [Order] Id.

Similarly, connect [Product] and [Category] information through [Order] Id (Order to Product) and [Product] Id (Product to Category).

As the user browses the view and uses these connections, this information can be exploited to determine appropriate join paths.

Page 24

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Query by Context Discussion

Important benefits of querying by context: System table and field names are not presented to

the user who queries based on semantic names. Database structure is not shown to the user. Field and table mappings are automatically

determined based on X-Spec information. Join conditions are inserted as needed when

available to join tables and will preserve user’s query semantics as specified.

Structural neutrality of context view allows QBC to be extended to non-relational databases and be used as a query language for integrated databases.

Page 25

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Conclusions

User querying can be simplified by semantic naming of schema constructs (by the DBA) that hierarchically organizes concepts into a view.

Query by context provides logical query transparency that is suited for databases with schema evolution or integrated systems.

Users are able to transparently query integrated systems by concept instead of structure.

Handling join ambiguity is an important component in mapping to SQL.

Page 26

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

Future Work

Continuing to refine a prototype of the system called Unity.

A comparison study of query by context versus traditional SQL, database query tools, and query by example.

The query processor is being extended to resolve more complex queries.

Joins on non-standard keys, joins across databases, etc.

Page 27

Querying Relational Databases without Explicit Joins Ramon Lawrence, Ken Barker

References Publications:

Unity - A Database Integration Tool, R. Lawrence and K. Barker, TRLabs Emerging Technology Bulletin, January 2000.

Multidatabase Querying by Context, R. Lawrence and K. Barker, DataSem2000, pg 127-136, Oct. 2000.

Using Unity to Semi-Automatically Integrate Relational Schema, Demonstration to appear at ICDE’2002.

Integrating Relational Database Schemas using a Standardized Dictionary, SAC’2001 - ACM Symposium on Applied Computing, March, 2001.

Further Information: http://www.cs.uiowa.edu/~rlawrenc/