A Query Translation Scheme for Rapid Implementation of Wrappers

Presented by Preetham Swaminathan, 03/22/2007

Yannis Papakonstantinou, Ashish Gupta, Hector Garcia-Molina, Jeffrey Ullman

Introduction

• As part of the TSIMMIS project, many hard-coded wrappers have been developed for a variety of sources, including legacy systems.
• Some observations:
  – Only a small part of the code deals with the access details of the source.
  – A lot of the code deals with communication, buffering, etc.
  – Or the code implements query and data transformations that can be expressed in a high-level declarative fashion.

Introduction

• Based on these observations, a wrapper implementation toolkit for rapid wrapper building was developed.
• The toolkit contains:
  – A library of commonly used functions.
  – A facility to translate queries into source-specific commands and queries.
  – A facility to translate results into a model useful to the application.
• The main focus here is the query translation component of the toolkit (the converter).

Converter

• Converter: the query translation component of the toolkit.
• An implementor gives the converter a set of templates.
  – These templates describe the queries accepted by the wrapper.
  – For each template, the implementor provides an action to be taken when an application query matches the template.
  – The action is executed to produce a native query for the source, which answers the application query.

Example

• Consider a data source that can only perform selections on the attribute dept.
• The source does not understand the notion of projecting attributes.
• Template describing the source:
  select * from $X where $X.dept = 'toy'
• The following query does not match this template because it contains a projection:
  select emp.name from emp where emp.dept = 'toy'

Example

• The wrapper could process the above query as follows:
  – Transform the query into one without a projection.
  – Perform the projection on the result of that query, a step also known as filtering.
• The wrapper toolkit can handle this type of query transformation.
  – The converter generates not only native queries for the source but also filters describing additional processing on the results.
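The split into a native query plus a filter can be sketched in Python as follows. This is only an illustration under stated assumptions: the EMP table, the function names, and the representation of a projection as a list of attribute names are all hypothetical and do not come from the toolkit.

```python
# Hypothetical in-memory stand-in for a source that can only select on dept.
EMP = [
    {"name": "Ann", "dept": "toy", "phone": "x101"},
    {"name": "Bob", "dept": "shoe", "phone": "x102"},
    {"name": "Cy", "dept": "toy", "phone": "x103"},
]


def source_select_by_dept(dept):
    """Native capability: return full records for one dept (no projection)."""
    return [dict(row) for row in EMP if row["dept"] == dept]


def translate(projection, dept):
    """Split an application query into a native query and a result filter.

    projection is a list of attribute names, or None for 'select *'.
    """
    def native_query():
        return source_select_by_dept(dept)

    def result_filter(rows):
        if projection is None:
            return rows
        return [{attr: row[attr] for attr in projection} for row in rows]

    return native_query, result_filter


# select emp.name from emp where emp.dept = 'toy'
native_query, result_filter = translate(["name"], "toy")
answer = result_filter(native_query())
```

The source still returns full records; the projection happens only in the wrapper, which is the role the filter plays.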

Converter

• The converter in the toolkit targets the MSL query language.
• MSL is a logic-based language for a simple object-oriented data model called OEM.
• The converter is configured with templates written in QDTL.
• Each template is associated with an action.
• The converter takes an MSL query as input and generates:
  – Commands for the source, and
  – A filter to be applied to the results.

Converter

• The converter will process:
  – Directly supported queries: queries that syntactically match a template.
  – Logically supported queries: queries that are logically equivalent to a directly supported query.
  – Indirectly supported queries: queries that can be processed as a combination of a direct query and a filter.

OEM Model

• OEM stands for Object Exchange Model.
• OEM does not support classes, methods, and inheritance.
• Classes and methods can be emulated.
• Example:
  <ob1, person, {sub1, sub2, sub3, sub4, sub5}>
  <sub1, last_name, 'Smith'>
  <sub2, first_name, 'John'>
  <sub3, role, 'faculty'>
  <sub4, department, 'CS'>
  <sub5, telephone, '415-514-1292'>
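One way to picture the example object in a program is a small recursive record type. The sketch below is an assumption made for illustration: the class name OEMObject and the helper field are hypothetical and not part of TSIMMIS.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    """An OEM object: object id, label, and an atomic value or a set of sub-objects."""
    oid: str
    label: str
    value: Union[str, List["OEMObject"]]

    def is_set(self) -> bool:
        return isinstance(self.value, list)


# The person object from the slide.
person = OEMObject("ob1", "person", [
    OEMObject("sub1", "last_name", "Smith"),
    OEMObject("sub2", "first_name", "John"),
    OEMObject("sub3", "role", "faculty"),
    OEMObject("sub4", "department", "CS"),
    OEMObject("sub5", "telephone", "415-514-1292"),
])


def field(obj, label):
    """Return the values of all direct sub-objects that carry the given label."""
    if not obj.is_set():
        return []
    return [sub.value for sub in obj.value if sub.label == label]
```

Because labels travel with the data rather than with a class definition, two person objects may carry different sets of sub-objects.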

OEM Model

• At each source, top-level OEM objects are defined.
  – They provide entry points into the object structure.
  – Sub-objects can be requested with an MSL query such as the following:
    (Q1) *P :- <P person {<L last_name 'Smith'>}>
• The tail is of the form <object-id label value>.
• Matching:
  – When a field is a constant, the pattern binds only with objects that have the same constant value in that field.
  – When a field is a variable, the pattern can bind with any OEM object.
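The two matching rules can be sketched as a small recursive matcher. This is a simplified illustration, not the MSL evaluator: objects are modelled as (oid, label, value) tuples, the Var class and function names are hypothetical, and sub-patterns are matched greedily without backtracking.

```python
class Var:
    """A pattern variable such as P or L in an MSL tail."""
    def __init__(self, name):
        self.name = name


def match_field(pattern_field, actual, bindings):
    """A constant must equal the actual value; a variable binds to anything,
    as long as it stays consistent with an earlier binding."""
    if isinstance(pattern_field, Var):
        if pattern_field.name in bindings:
            return bindings[pattern_field.name] == actual
        bindings[pattern_field.name] = actual
        return True
    return pattern_field == actual


def match(pattern, obj, bindings=None):
    """Match a pattern (oid, label, value) against an object; return bindings or None."""
    bindings = dict(bindings or {})
    p_oid, p_label, p_value = pattern
    oid, label, value = obj
    if not match_field(p_oid, oid, bindings):
        return None
    if not match_field(p_label, label, bindings):
        return None
    if isinstance(p_value, list):
        if not isinstance(value, list):
            return None
        # Every sub-pattern must match some sub-object.
        for sub_pattern in p_value:
            for sub in value:
                result = match(sub_pattern, sub, bindings)
                if result is not None:
                    bindings = result
                    break
            else:
                return None
        return bindings
    return bindings if match_field(p_value, value, bindings) else None


smith = ("ob1", "person", [("sub1", "last_name", "Smith"), ("sub2", "first_name", "John")])
jones = ("ob2", "person", [("sub3", "last_name", "Jones")])

# (Q1) *P :- <P person {<L last_name 'Smith'>}>
q1 = (Var("P"), "person", [(Var("L"), "last_name", "Smith")])
```

Here match(q1, smith) binds P and L, while match(q1, jones) fails because the constant 'Smith' does not equal 'Jones'.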

A Detailed Query Translation Example

• Build a wrapper for a university "lookup" facility that contains information about employees and students.
• The facility is accessed from the command line and offers limited query capabilities.
  – It can return only the full records of persons, including all fields such as first name, last name, and telephone.
  – There is no way for the user to retrieve just one field.

Query Translation

• The only queries accepted are:
  – Retrieve person records by specifying the last name.
    (L2) lookup -ln Smith
  – Retrieve person records by specifying the first and last name.
    (L3) lookup -ln Smith -fn John
  – Retrieve all person records.
    (L4) lookup

Query Translation

• Using the Query Description and Translation Language (QDTL), the description for the lookup facility can be written as below.
  (D1)
  (QT1.1) Query ::= *O :- <O person {<last_name $LN>}>
  (QT1.2) Query ::= *O :- <O person {<last_name $LN> <first_name $FN>}>
  (QT1.3) Query ::= *O :- <O person V>
• Identifiers preceded by $ are constant placeholders.
• Upper-case identifiers are variable placeholders.

Query Translation

• Each template describes many more queries than those that match it syntactically.
• Each template describes the following classes of queries:
  – Directly supported queries.
  – Logically supported queries.
  – Indirectly supported queries.

Query Translation

• Directly supported queries
  – A query q is directly supported by a template t if q can be derived by replacing the constant placeholders of t with constants and the variables of t with variables.
  – *P :- <P person {<last_name 'Smith'>}> is directly supported by template QT1.1, by replacing O with P and $LN with 'Smith'.

Query Translation

• Logically supported queries
  – A query q is logically supported by a template t if q is logically equivalent to some query q' directly supported by t.
    *O :- <O person {<first_name 'John'> <last_name 'Smith'>}>
    *O :- <O person {<last_name 'Smith'> <first_name 'John'>}>
    *O :- <O person {<LO last_name 'Smith'>}> AND <O person {<LO L V> <first_name 'John'>}>
  – All these queries are equivalent to
    *O :- <O person {<first_name 'John'> <last_name 'Smith'>}>
    (supported by QT1.2)
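The difference between direct and logical support can be illustrated with a deliberately narrow Python sketch. It assumes queries and templates are reduced to ordered lists of (label, value) pairs and handles only one kind of equivalence, reordering of conditions inside a set pattern; the AND form shown above is out of its scope. The function names are hypothetical.

```python
from itertools import permutations

# Templates from description D1, reduced to ordered (label, placeholder) pairs.
TEMPLATES = {
    "QT1.1": [("last_name", "$LN")],
    "QT1.2": [("last_name", "$LN"), ("first_name", "$FN")],
}


def directly_supported(query, template):
    """Syntactic match: same labels in the same order. Returns bindings or None."""
    if len(query) != len(template):
        return None
    bindings = {}
    for (q_label, constant), (t_label, placeholder) in zip(query, template):
        if q_label != t_label:
            return None
        bindings[placeholder] = constant
    return bindings


def logically_supported(query, template):
    """Try every reordering of the conditions, since a set pattern is unordered."""
    for reordered in permutations(query):
        bindings = directly_supported(list(reordered), template)
        if bindings is not None:
            return bindings
    return None


query = [("first_name", "John"), ("last_name", "Smith")]
```

With this input, directly_supported(query, TEMPLATES["QT1.2"]) fails on label order, while logically_supported finds the reordering that matches and binds $LN and $FN.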

Query Translation

• Indirectly supported queries
  – A query q is indirectly supported by a template t if q can be broken down into a directly supported query and a filter that is applied to its results.
    (Q6) *Q :- <Q person {<last_name 'Smith'> <role 'student'>}>
  – The above query is not logically supported by any template in the description.

Query Translation

• The converter realizes that the answer to the following query contains the answers to the original query (the answer to Q6 is a subset of the answer to Q7):
  (Q7) *Q :- <Q person {<last_name 'Smith'>}>
• Thus the converter matches Q6 to template QT1.1 as if it were Q7, binding $LN to 'Smith', and generates the filter
  *O :- <O person {<role 'student'>}>
• The filter is an MSL query that is applied to the result of Q7 to produce the result of Q6.
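The Q6/Q7 decomposition can be sketched as follows. This is a simplified stand-in for the converter, assuming a query is just a dictionary of label-to-constant conditions; the function names and the rule "pick the first template whose labels all occur in the query" are assumptions made for illustration, not the paper's algorithm.

```python
# Templates from description D1, most specific first (label -> placeholder).
TEMPLATES = [
    ("QT1.2", {"last_name": "$LN", "first_name": "$FN"}),
    ("QT1.1", {"last_name": "$LN"}),
    ("QT1.3", {}),
]


def convert(conditions):
    """Return (template name, bindings, filter conditions) for a query.

    Conditions covered by the template become bindings for the native
    query; the leftover conditions become the filter.
    """
    for name, template in TEMPLATES:
        if set(template) <= set(conditions):
            bindings = {ph: conditions[label] for label, ph in template.items()}
            residual = {l: v for l, v in conditions.items() if l not in template}
            return name, bindings, residual
    return None


def apply_filter(records, residual):
    """Keep only the records that satisfy every leftover condition."""
    return [r for r in records if all(r.get(l) == v for l, v in residual.items())]


# (Q6): last_name = 'Smith' and role = 'student'
q6 = {"last_name": "Smith", "role": "student"}
template_name, bindings, residual = convert(q6)

# Hypothetical result of the native query for Q7 (all Smiths).
q7_result = [
    {"last_name": "Smith", "first_name": "John", "role": "student"},
    {"last_name": "Smith", "first_name": "Mary", "role": "faculty"},
]
q6_result = apply_filter(q7_result, residual)
```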

Native Query Formulation

(D2)
(QT2.1) Query ::= *O :- <O person {<last_name $LN>}>
(AC2.1) { sprintf(lookup_query, 'lookup -ln %s', $LN); }
(QT2.2) Query ::= *O :- <O person {<last_name $LN> <first_name $FN>}>
(AC2.2) { sprintf(lookup_query, 'lookup -ln %s -fn %s', $LN, $FN); }
(QT2.3) Query ::= *O :- <O person V>
(AC2.3) { sprintf(lookup_query, 'lookup'); }
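The pairing of templates with actions in D2 can be mimicked in Python as a table of (template, action) entries. This is a sketch only: the DESCRIPTION table, the exact-label matching rule, and the native_query function are hypothetical, and QT2.3 is simplified to "no conditions" even though its variable V can bind any value.

```python
# Each entry: template name, labels the template expects, action building the command.
DESCRIPTION = [
    ("QT2.1", ("last_name",),
     lambda b: "lookup -ln {}".format(b["last_name"])),
    ("QT2.2", ("last_name", "first_name"),
     lambda b: "lookup -ln {} -fn {}".format(b["last_name"], b["first_name"])),
    ("QT2.3", (),
     lambda b: "lookup"),
]


def native_query(conditions):
    """Find the template whose labels equal the query's labels and run its action."""
    for name, labels, action in DESCRIPTION:
        if set(labels) == set(conditions):
            return name, action(conditions)
    return None  # not directly supported by this description


command = native_query({"last_name": "Smith", "first_name": "John"})
```

A real wrapper would also need to quote the bound constants before handing the command to a shell; that step is omitted here.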

Nonterminals

(D4) /* A description with nonterminals */
(QT4.1) Query ::= *OP :- <OP person {_OptLN _OptFN _OptRole}> /* query template */
(NT4.2) _OptLN ::= <last_name $LN> /* nonterminal template */
(NT4.3) _OptLN ::= /* empty nonterminal template */
(NT4.4) _OptFN ::= <first_name $FN>
(NT4.5) _OptFN ::= /* empty */
(NT4.6) _OptRole ::= <role $R>
(NT4.7) _OptRole ::= /* empty */

Nonterminals - Actions

(D5)
(QT5.1) Query ::= *OP :- <OP person {_OptLN _OptFN _OptRole}>
(AC5.1) { sprintf(lookup_query, 'lookup %s %s %s', $_OptLN, $_OptFN, $_OptRole); }
(NT5.2) _OptLN ::= <last_name $LN>
(AC5.2) { sprintf($_OptLN, '-ln %s', $LN); }
(NT5.3) _OptLN ::=
(AC5.3) { $_OptLN = ''; }
(NT5.4) _OptFN ::= <first_name $FN>
(AC5.4) { sprintf($_OptFN, '-fn %s', $FN); }
(NT5.5) _OptFN ::=
(AC5.5) { $_OptFN = ''; }
(NT5.6) _OptRole ::= <role $R>
(AC5.6) { sprintf($_OptRole, '-role %s', $R); }
(NT5.7) _OptRole ::=
(AC5.7) { $_OptRole = ''; }
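The way nonterminal actions compose can be sketched in Python: each optional nonterminal yields either a command fragment or an empty string, and the query-level action joins the fragments. The helper names are hypothetical, and unlike AC5.1 this sketch drops empty fragments so the command has no stray spaces.

```python
def opt(flag, value):
    """Action of an optional nonterminal: '-flag value' if the condition is
    present, '' for the empty alternative."""
    return "-{} {}".format(flag, value) if value is not None else ""


def lookup_query(conditions):
    """Query-level action in the spirit of AC5.1, built from the three fragments."""
    opt_ln = opt("ln", conditions.get("last_name"))
    opt_fn = opt("fn", conditions.get("first_name"))
    opt_role = opt("role", conditions.get("role"))
    parts = ["lookup", opt_ln, opt_fn, opt_role]
    return " ".join(part for part in parts if part)


command = lookup_query({"last_name": "Smith", "role": "student"})
```

Three optional nonterminals cover all eight combinations of conditions, which would otherwise require eight separate query templates.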

Wrapper Architecture

• Building a wrapper involves:
  – The implementor, who
    • provides the driver, which has the primary control of query processing;
    • provides the QDTL description for the converter;
    • provides the Data Extraction (DEX) template for the extractor component of the toolkit.
  – The converter
  – The driver

Wrapper Architecture

• Wrappers generated with the toolkit behave as servers in a client-server architecture.
• Clients use the client support library to issue queries and receive OEM results.
• The server support library component of the toolkit receives queries and sends them to the driver component for processing.
• The driver invokes the converter, which finds a template that supports the input query and returns the corresponding native queries.

Wrapper Architecture

• The driver submits the native queries to the information source and receives the result as OEM objects.
• If a filter was generated during conversion, the driver passes the OEM result and the filter to the filter processor.
• The Data Extractor (DEX) is used to parse the result and identify the required data.
• DEX is configured with a description of the source output and of which parts of that output need to be extracted.
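The control flow through the driver might look roughly like the sketch below. Everything here is hypothetical: the function signatures, the comma-separated source output, and the assumption that extraction runs before filtering are illustrative choices, not the toolkit's actual interfaces.

```python
def run_driver(msl_query, converter, source, extractor, filter_processor):
    """Convert the query, run the native queries, extract objects, then filter."""
    native_queries, result_filter = converter(msl_query)
    objects = []
    for native in native_queries:
        raw_output = source(native)            # raw text from the source
        objects.extend(extractor(raw_output))  # extractor turns text into objects
    if result_filter is not None:
        objects = filter_processor(objects, result_filter)
    return objects


# Hypothetical stand-ins for the four components, for demonstration only.
def demo_converter(query):
    return ["lookup -ln Smith"], {"role": "student"}


def demo_source(native_query):
    return "Smith,John,student\nSmith,Mary,faculty"


def demo_extractor(raw_output):
    fields = ("last_name", "first_name", "role")
    return [dict(zip(fields, line.split(","))) for line in raw_output.splitlines()]


def demo_filter(objects, result_filter):
    return [o for o in objects
            if all(o.get(k) == v for k, v in result_filter.items())]


result = run_driver("Q6", demo_converter, demo_source, demo_extractor, demo_filter)
```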

Correspondence of OEM to Relational Models

• OEM objects are represented relationally by flattening them into tuples of three relations: top, object, and member.
• OEM objects can be converted using a few straightforward rules.
  – For an object o with object id oid, label l, and atomic value v, we introduce the tuple
    object(oid, l, v)
  – If o is a set object, the tuple becomes
    object(oid, l, set)

OEM to SQL

  – If o has sub-objects o_i, 1 ≤ i ≤ n, identified by oid_i, then we introduce the tuples
    member(oid, oid_i)
  – Finally, if o is a top-level object identified by oid, we introduce the tuple
    top(oid)
  – The relational representation of an MSL query is obtained by querying the top, object, and member relations that represent the object structure referenced in the query.
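The flattening rules can be written down as a short recursive function. In this sketch an OEM object is modelled as an (oid, label, value) tuple whose value is either an atomic value or a list of sub-objects; that encoding and the function name flatten are assumptions made for illustration.

```python
def flatten(obj, top_level=True, relations=None):
    """Flatten an OEM object into the top, object, and member relations."""
    if relations is None:
        relations = {"top": set(), "object": set(), "member": set()}
    oid, label, value = obj
    if top_level:
        relations["top"].add((oid,))
    if isinstance(value, list):
        # Set object: record object(oid, l, set) plus one member tuple per sub-object.
        relations["object"].add((oid, label, "set"))
        for sub in value:
            relations["member"].add((oid, sub[0]))
            flatten(sub, top_level=False, relations=relations)
    else:
        # Atomic object: record object(oid, l, v).
        relations["object"].add((oid, label, value))
    return relations


person = ("ob1", "person", [
    ("sub1", "last_name", "Smith"),
    ("sub2", "first_name", "John"),
])
relations = flatten(person)
```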

Example

• Consider the query
  *O :- <O person {<LM last_name 'Smith'>}>
• The above MSL query can be written as the following Datalog query:
  answer(O) :- top(O), object(O, person, set), member(O, LM), object(LM, last_name, 'Smith')
• The paper contains an algorithm that, for a given MSL query, finds supporting queries from the QDTL description and, if required, creates a filter to be applied to the OEM result objects.
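To see how the Datalog rule evaluates, the sketch below runs it as a join over three small relations. The tuples for ob1 and ob2 are made-up sample data, and the set comprehension is just one straightforward way to express the join in Python.

```python
# Sample relations for two hypothetical person objects.
TOP = {("ob1",), ("ob2",)}
OBJECT = {
    ("ob1", "person", "set"),
    ("sub1", "last_name", "Smith"),
    ("ob2", "person", "set"),
    ("sub2", "last_name", "Jones"),
}
MEMBER = {("ob1", "sub1"), ("ob2", "sub2")}


def answer():
    """answer(O) :- top(O), object(O, person, set),
                    member(O, LM), object(LM, last_name, 'Smith')"""
    return {
        o
        for (o,) in TOP
        if (o, "person", "set") in OBJECT
        for (parent, lm) in MEMBER
        if parent == o and (lm, "last_name", "Smith") in OBJECT
    }


result = answer()
```

Each subgoal of the rule becomes one membership test or one loop, and the shared variables O and LM become the join conditions.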

Conclusions

• A toolkit that facilitates the implementation of wrappers was developed.
• The heart of the toolkit is the converter, which maps incoming queries into native commands of the source.
• The converter provides the translation flexibility of systems like Yacc but gives substantially more power (it translates a wider class of queries).