Combining Keyword Search and Forms for Ad Hoc Querying of Databases (Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton) Computer Sciences Department University of Wisconsin-Madison {ericc, baid, xchai, anhai, naughton} @cs.wisc.edu

Upload: aleta

Post on 31-Jan-2016


Page 1: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan, Jeffrey Naughton

Computer Sciences Department, University of Wisconsin-Madison

{ericc, baid, xchai, anhai, naughton}@cs.wisc.edu

Page 2: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Contents

• Motivation
• Query Forms
• Generating Forms
• Keyword Search for Forms
• Displaying Returned Forms
• Experimental Analysis
• Related Work and References

Page 3: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Traditional Access Methods for Databases

Relational/XML databases are structured or semi-structured, with rich metadata. They are typically accessed through structured query languages such as SQL.

• Advantages: high-quality results
• Disadvantages:
– Query languages: long learning curves
– Schemas: complex
– Small user population

“The usability of a database is as important as its capability”

Page 4: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Motivation

Information discovery in databases requires:
• Knowledge of the schema
• Knowledge of a query language (for example, SQL)

Challenge: this is hard for users who are uncomfortable with a formal query language.

Page 5: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Motivation

What is the solution? Form-based interfaces combined with keyword search.

Approach:
• User submits a keyword query.
• System returns a ranked list of relevant forms.
• User selects one of the forms and builds a structured query.

Page 6: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Relational Schema of DBLife

Entity tables:

person(id, name, homepage, title, group,organization, country)

publication(id, name, booktitle, year, pages, cites, clink, link)

topic(id, name)

organization(id, name)

conference(id, name)

Page 7: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Relationship Tables

related_people(rid, pid1, pid2, strength)

related_topic(rid, pid, tid, strength)

related_organization(rid, pid, oid, strength)

give_tutorial(rid, pid, cid)

give_conf_talk(rid, pid, cid)

give_org_talk(rid, pid, oid)

serve_conf(rid, pid, cid, assignment)

write_pub(rid, pid, pub_id, position)

co_author(rid, pid1, pid2, strength)

Page 8: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Query Forms

A query form is an interface for a query template.

Example: a completed form over the person relation of DBLife.

Page 9: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

• The query represented by the completed form is

SELECT * FROM person WHERE organization = 'Microsoft Research'

• The general template for this form is

SELECT * FROM person
WHERE name op value AND homepage op value AND title op value
AND group op value AND organization op value AND country op value
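The relationship between a template and a completed form can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `instantiate`, the `?` placeholder style, the operator whitelist, and the rule that unfilled fields are dropped from the WHERE clause are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Non-key attributes of person, taken from the DBLife schema in the slides.
PERSON_ATTRS = ["name", "homepage", "title", "group", "organization", "country"]

# Hypothetical operator whitelist; the slides only say "op".
ALLOWED_OPS = {"=", "<", ">", "<=", ">=", "<>", "LIKE"}


def instantiate(table: str, attrs: List[str],
                filled: Dict[str, Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Turn the fields a user filled in into a parameterized query.

    `filled` maps attribute -> (operator, value). Attributes the user left
    empty are omitted (an assumption about how a form treats blanks).
    `table` and `attrs` come from the fixed schema, never from user input.
    """
    clauses, params = [], []
    for attr in attrs:
        if attr not in filled:
            continue
        op, value = filled[attr]
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        # Quote the column name because "group" is a reserved word in SQL.
        clauses.append(f'"{attr}" {op} ?')
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = instantiate("person", PERSON_ATTRS,
                          {"organization": ("=", "Microsoft Research")})
print(sql, params)
```

Keeping the value as a bound parameter rather than splicing it into the SQL string is a design choice of this sketch; it avoids quoting problems when the user's input contains apostrophes.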

Page 10: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

How to generate forms?

Step 1: Specify a subset of SQL as the target language for implementing the queries supported by forms. Call this subset SQL’.

Page 11: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

SQL’:

Let B =
SELECT select-list
FROM from-list
WHERE qualification
[GROUP BY grouping-list
HAVING group-qualification]

A query in SQL’ is a block B, or two such blocks combined with UNION | INTERSECT.

Note: Nested queries are not allowed in the FROM and WHERE clauses.

Page 12: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Step 2: Determine a set of skeleton templates that specify the main clauses and join conditions, based on the chosen subset of SQL and the database schema SD.

Let Ri be a relation following a relation schema Si ∈ SD.

Case 1: Ri does not reference other relations with foreign keys.

SELECT * FROM Ri WHERE predicate-list

Case 2: Ri references other relations with foreign keys.

SELECT * FROM <Ri and the relations it references>
WHERE <join conditions, plus an “attr op value” predicate for each attribute>
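The two cases can be sketched as one Python function. This is an illustration under stated assumptions, not the paper's generator: the `Schema` layout, the function name `skeleton`, the aliases `t0`, `t1`, ..., and the decision to list only non-key attributes as predicates are all hypothetical choices.

```python
from typing import Dict, List, Tuple

# Hypothetical schema description:
#   table -> (non-key attributes, {fk column: (referenced table, referenced column)})
Schema = Dict[str, Tuple[List[str], Dict[str, Tuple[str, str]]]]

SCHEMA: Schema = {
    "person": (["name", "homepage", "title", "group", "organization", "country"], {}),
    "conference": (["name"], {}),
    "give_tutorial": ([], {"pid": ("person", "id"), "cid": ("conference", "id")}),
}


def skeleton(table: str, schema: Schema) -> str:
    """Build the skeleton template for `table`.

    With no foreign keys this yields Case 1; otherwise it adds each
    referenced relation to FROM and a join condition to WHERE (Case 2).
    """
    attrs, fks = schema[table]
    from_list = [f"{table} t0"]
    joins: List[str] = []
    preds = [f"t0.{a} op value" for a in attrs]
    for i, (col, (ref_table, ref_col)) in enumerate(fks.items(), start=1):
        alias = f"t{i}"  # aliases keep self-referencing tables unambiguous
        from_list.append(f"{ref_table} {alias}")
        joins.append(f"t0.{col} = {alias}.{ref_col}")
        preds += [f"{alias}.{a} op value" for a in schema[ref_table][0]]
    sql = "SELECT * FROM " + ", ".join(from_list)
    where = joins + preds
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


print(skeleton("conference", SCHEMA))
print(skeleton("give_tutorial", SCHEMA))
```

The `op value` placeholders are left in the output on purpose: a skeleton is a template, and the later steps decide which parameters stay open and which are bound.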

Page 13: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Example:

Relation: give_tutorial(rid, pid, cid)

Relations referenced: person and conference
person(id, name, homepage, title, group, organization, country)
conference(id, name)

Skeleton template:

SELECT *
FROM give_tutorial t, person p, conference c
WHERE t.pid = p.id AND t.cid = c.id AND p.name op expr AND … AND c.name op expr

Page 14: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Step 3: Finalize templates by modifying the skeleton templates based on form specificity: how specific or general do we want the forms to be?

Form specificity has two components:
• Form complexity
• Data specificity

Page 15: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Initial State of the form

Adjusting form specificity:

• Increase its complexity by adding more parameters.
• Decrease its complexity by removing parameters.
• Increase data specificity by binding more existing parameters to constants.
• Decrease data specificity by unbinding parameters with fixed values.

Page 16: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Approach followed in this paper:

To adjust form complexity, divide SQL’ into 4 query classes:
• SELECT: basic SELECT-FROM-WHERE construct
• AGGR: SELECT with aggregation
• GROUP: AGGR with GROUP BY and HAVING clauses
• UNION-INTERSECT: a UNION or INTERSECT of two SELECT queries

To adjust data specificity:
• Bind the “value” fields of the “attr op value” predicates in the WHERE clause to data values.

Page 17: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Step 4: Map each template to a form

Standard form components:
• Label
• Drop-down list
• Input box
• Button

Page 18: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Keyword Search for Forms

Basic idea: keyword search is used to find relevant forms, which are then used to pose structured queries.

Basic approaches:
• Naïve AND: returns forms that contain all the terms from the keyword query.
• Naïve OR: returns forms that contain at least one term from the keyword query.

Drawback: the keyword query must contain schema term(s).
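The two naïve strategies amount to intersecting or unioning posting lists in an inverted index over form text. The sketch below assumes a toy in-memory index; the form ids, the index contents, and the function names `naive_and` and `naive_or` are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical inverted index: term -> ids of forms whose text contains it.
FORM_INDEX: Dict[str, Set[int]] = {
    "person": {1, 2, 3},
    "conference": {2, 4},
    "name": {1, 2, 3, 4},
}


def naive_or(query: List[str], index: Dict[str, Set[int]]) -> Set[int]:
    """Forms containing at least one query term."""
    result: Set[int] = set()
    for term in query:
        result |= index.get(term.lower(), set())
    return result


def naive_and(query: List[str], index: Dict[str, Set[int]]) -> Set[int]:
    """Forms containing every query term."""
    postings = [index.get(term.lower(), set()) for term in query]
    return set.intersection(*postings) if postings else set()


# A data term such as "Widom" is not on any form, which shows the drawback:
print(naive_and(["Widom", "conference"], FORM_INDEX))
print(naive_or(["Widom", "conference"], FORM_INDEX))
```

With this toy index, the AND query comes back empty while the OR query silently ignores the data term, which is the behavior the drawback describes.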

Page 19: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Approaches proposed in this paper:

Check whether data terms from the user query appear in the database. If they do, rewrite the query with the relevant schema terms.

• Double Index OR (DI OR): the rewritten query is evaluated with OR semantics.

• Double Index AND (DI AND): the rewritten queries are evaluated with AND semantics.

Page 20: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Example:

Information need: for which conferences has a researcher named “Widom” served on the program committee?

Keyword query: “Widom conference”. Here the data term is “Widom” and the schema term is “conference”.

Results obtained:
• Naïve AND: no forms are returned, as “Widom” does not appear on any form.
• Naïve OR: ignores “Widom” and returns all forms that contain “conference”.
• DI OR: the rewritten query is “Widom person conference”, since “Widom” appears in the person table; it is evaluated with OR semantics.
• DI AND: two queries are generated, “person conference” and “Widom conference”; each is evaluated with AND semantics, and the union of the results is returned.

Relevant DBLife tables:

person(id, name, homepage, title, group, organization, country)

conference(id, name)

Page 21: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Double Index OR Implementation

Indexes used:
• DataIndex: takes a data term and returns a set of <table, tuple-id> pairs.
• FormIndex: takes a term and returns a set of form-ids.

Input: a keyword query
Output: a set of form-ids

Step 1:
• Probe DataIndex with each query term qi in a query Q.
• If qi is a data term, DataIndex returns a set of <table, tuple-id> pairs.
• Add each table to the set FormTerms.
• Add qi to FormTerms.

Step 2:
• Probe FormIndex with the terms in FormTerms.
• Return the forms containing at least one of these terms.

Page 22: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

DI OR

Input: A keyword query Q = [q1 q2 ... qn]
Output: A set of form-ids F’
Algorithm:
FormTerms = {}, F’ = {}
// Replace any data terms with table names
for each qi ∈ Q
    if DataIndex(qi) returns <table, tuple-id> pairs
        Add each table to FormTerms
    Add qi to FormTerms // qi could be a form term
// Get form-ids based on FormTerms
FormIndex(FormTerms) => F’ // OR semantics
return F’
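The DI OR algorithm translates almost line for line into Python. The sketch below is a hedged illustration: both indexes are modeled as plain dictionaries, and their contents (form ids, the tuple-id 17) are made up for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DataIndex: data term -> set of (table, tuple-id) pairs.
DATA_INDEX: Dict[str, Set[Tuple[str, int]]] = {
    "widom": {("person", 17)},
}

# Hypothetical FormIndex: form term -> set of form-ids.
FORM_INDEX: Dict[str, Set[int]] = {
    "person": {1, 2, 3},
    "conference": {2, 4},
}


def di_or(query: List[str],
          data_index: Dict[str, Set[Tuple[str, int]]],
          form_index: Dict[str, Set[int]]) -> Set[int]:
    """Double Index OR: expand data terms to table names, then OR-match forms."""
    form_terms: Set[str] = set()
    for q in query:
        q = q.lower()
        # If q is a data term, add every table it occurs in.
        for table, _tuple_id in data_index.get(q, set()):
            form_terms.add(table)
        # q itself could also be a form term.
        form_terms.add(q)
    forms: Set[int] = set()
    for term in form_terms:  # OR semantics on the FormIndex
        forms |= form_index.get(term, set())
    return forms


print(di_or(["Widom", "conference"], DATA_INDEX, FORM_INDEX))
```

For the “Widom conference” query, FormTerms becomes {widom, person, conference}, matching the rewritten query from the earlier example.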

Page 23: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Double Index AND

• Generate all possible queries that result from replacing user-supplied data terms with schema terms.

• Evaluate each query with AND semantics and return the union of the query results.

Problem: performing a single AND query with all the terms in FormTerms is wrong.

Why? A data term may appear in multiple unrelated tables, such that no form contains all of these tables.

Concept of a bucket:
• The query “q1 AND q2” becomes “a ∈ Sq1 AND b ∈ Sq2”, where Sqi is a “bucket” containing the form terms associated with qi, and a and b are form terms from Sq1 and Sq2 respectively.

Page 24: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Double Index AND Implementation

Input: a keyword query
Output: a set of form-ids

Step 1:
• For each qi, the bucket Sqi is initially empty.
• If qi is a data term, DataIndex returns <table, tuple-id> pairs.
• For each table, add the table to Sqi and FormTerms.
• Add qi to Sqi and FormTerms.

Step 2:
• Generate and add to SQ’ all distinct queries, each of which takes one term from each Sqi.
• For each query in SQ’, probe the FormIndex and retrieve the forms that contain all terms in the query.

Page 25: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

DI AND

Input: A keyword query Q = [q1 q2 ... qn]
Output: A set of form-ids F’
Algorithm:
FormTerms = {}, F’ = {}
// Replace any data terms with table names
for each qi ∈ Q
    Sqi = {} // Bucket for qi
    if DataIndex(qi) returns <table, tuple-id> pairs
        for each table
            if table ∉ FormTerms
                Add table to Sqi and FormTerms
    if qi ∉ FormTerms
        Add qi to Sqi and FormTerms
// Get form-ids based on Sqi
SQ’ = EnumQueries(∀Sqi) // Enumerate all unique queries,
                        // each having one term from each Sqi
for each Q’ ∈ SQ’
    FormIndex(Q’) => F’ // AND semantics on FormIndex
return F’
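A hedged Python sketch of DI AND follows. It builds one bucket per query term and enumerates queries with `itertools.product`. Two simplifications are assumptions of this sketch rather than part of the pseudocode: every table a term occurs in is added to that term's bucket (the `FormTerms` membership check is dropped), and the index contents are invented.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical indexes, in the same shape as the DI OR description.
DATA_INDEX: Dict[str, Set[Tuple[str, int]]] = {"widom": {("person", 17)}}
FORM_INDEX: Dict[str, Set[int]] = {"person": {1, 2, 3}, "conference": {2, 4}}


def di_and(query: List[str],
           data_index: Dict[str, Set[Tuple[str, int]]],
           form_index: Dict[str, Set[int]]) -> Set[int]:
    """Double Index AND: one bucket per term, AND within each enumerated query."""
    if not query:
        return set()
    buckets: List[Set[str]] = []
    for q in query:
        q = q.lower()
        # Bucket Sqi: the tables q occurs in, plus q itself.
        bucket = {table for table, _tuple_id in data_index.get(q, set())}
        bucket.add(q)
        buckets.append(bucket)
    forms: Set[int] = set()
    # Enumerate all distinct queries taking one term from each bucket.
    for combo in {frozenset(c) for c in product(*buckets)}:
        postings = [form_index.get(term, set()) for term in combo]
        forms |= set.intersection(*postings)  # AND semantics per query
    return forms  # union over all enumerated queries


print(di_and(["Widom", "conference"], DATA_INDEX, FORM_INDEX))
```

For “Widom conference” the buckets are {widom, person} and {conference}, so the enumerated queries are “person conference” and “widom conference”, as in the earlier example.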

Page 26: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Example:
• The user wants to search for a person named “John Doe”.
• “John Doe” is present in the person table but is not involved in any relationship.

What will be the output? Forms over the person table, plus forms over tables that reference person, will be returned.

User action: the user enters “John Doe” in the name field of a form that is a join of, say, the person and conference tables.

Output? No results are returned. Such forms are DEAD FORMS.

Page 27: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Double Index Join

• Checks whether a form would return an answer if instantiated with the data terms in the user query.

How is the check performed?

Step 1:
• Given a keyword query Q, probe DataIndex with each query term qi.
• When qi is a data term that leads to a set of <table, tuple-id> pairs, look up each table T in a schema graph for SD and find the tables (refTables) that reference T.
• For each refTable, check whether it contains any of the tuple-ids of T.
• If not, retrieve the forms that contain both T and refTable and record these “dead” forms in a set X.

Step 2:
• Return F’ - X. This filters out the dead forms.

Page 28: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

DI Join

Input: A keyword query Q = [q1 q2 ... qn]
Output: A set of form-ids F’
Algorithm:
FormTerms = {}, F’ = {}, X = {}
for each qi ∈ Q
    Sqi = {}
    if DataIndex(qi) returns <table, tuple-id> pairs
        for each table T
            let I be the set of tuple-ids from T
            if T ∉ FormTerms
                Add T to Sqi and FormTerms
            SchemaGraph(T) returns refTables
            for each refTable
                if DataIndex(refTable:tid) is NULL for every tid ∈ I
                    FormIndex(T AND refTable) => X
    if qi ∉ FormTerms
        Add qi to Sqi and FormTerms
// Get form-ids based on form terms
SQ’ = EnumQueries(∀Sqi)
for each Q’ ∈ SQ’
    FormIndex(Q’) => F’
return F’ - X
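The dead-form filter can be sketched by extending the DI AND idea. Everything about the data layout here is an assumption for illustration: `schema_graph` maps a table to the tables that reference it, and `referenced` stands in for the `DataIndex(refTable:tid)` probe by listing which (table, tuple-id) pairs each referencing table actually points to. The form ids and tuple-ids are invented.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

DATA_INDEX: Dict[str, Set[Tuple[str, int]]] = {"doe": {("person", 42)}}
FORM_INDEX: Dict[str, Set[int]] = {
    "person": {1, 2, 3}, "conference": {2, 4},
    "give_tutorial": {2}, "write_pub": {3},
}
# Hypothetical schema graph: table -> tables that reference it.
SCHEMA_GRAPH: Dict[str, List[str]] = {"person": ["give_tutorial", "write_pub"]}
# Hypothetical stand-in for DataIndex(refTable:tid):
# referencing table -> (table, tuple-id) pairs it points to.
REFERENCED: Dict[str, Set[Tuple[str, int]]] = {
    "give_tutorial": {("person", 7)},
    "write_pub": {("person", 42)},
}


def di_join(query, data_index, form_index, schema_graph, referenced) -> Set[int]:
    """DI AND plus removal of forms that cannot return an answer."""
    if not query:
        return set()
    buckets: List[Set[str]] = []
    dead: Set[int] = set()  # the set X in the pseudocode
    for q in query:
        q = q.lower()
        hits = data_index.get(q, set())
        bucket = {q}
        for table in {t for t, _ in hits}:
            bucket.add(table)
            ids = {tid for t, tid in hits if t == table}  # the set I
            for ref_table in schema_graph.get(table, []):
                used = referenced.get(ref_table, set())
                if not any((table, tid) in used for tid in ids):
                    # No tuple of ref_table joins with the matching tuples:
                    # forms containing both tables are dead for this query.
                    dead |= (form_index.get(table, set())
                             & form_index.get(ref_table, set()))
        buckets.append(bucket)
    forms: Set[int] = set()
    for combo in {frozenset(c) for c in product(*buckets)}:
        forms |= set.intersection(*(form_index.get(t, set()) for t in combo))
    return forms - dead


print(di_join(["Doe"], DATA_INDEX, FORM_INDEX, SCHEMA_GRAPH, REFERENCED))
```

In this toy data, tuple 42 never appears in give_tutorial, so form 2 (which joins person and give_tutorial) is filtered out, while form 3 over write_pub survives.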

Page 29: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Displaying Returned Forms

How are the returned forms ranked?

• Based on the scoring function of the Lucene index.

• The Lucene score for a query Q and a document D is:

score(Q,D) = coord(Q,D) * queryNorm(Q) * Σ_{t in Q} ( tf(t in D) * idf(t)^2 * t.getBoost() * norm(t,D) )
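To make the factors concrete, the formula can be evaluated by hand in Python. This is a simplified sketch, not Lucene code: the definitions used for tf, idf, coord, queryNorm and norm follow the defaults of Lucene's classic TF-IDF similarity as commonly documented, all boosts are assumed to be 1, and real Lucene stores norms in a lossy encoded form, so actual scores will differ. The function name `classic_score` and the toy documents are hypothetical.

```python
import math
from typing import Dict, List


def classic_score(query_terms: List[str], doc_terms: List[str],
                  doc_freq: Dict[str, int], num_docs: int) -> float:
    """Evaluate the slide's formula with all boosts fixed to 1 (assumption)."""

    def idf(t: str) -> float:
        return 1.0 + math.log(num_docs / (doc_freq.get(t, 0) + 1))

    def tf(t: str) -> float:
        return math.sqrt(doc_terms.count(t))

    matched = [t for t in query_terms if t in doc_terms]
    if not matched:
        return 0.0
    coord = len(matched) / len(query_terms)          # fraction of terms matched
    query_norm = 1.0 / math.sqrt(sum(idf(t) ** 2 for t in query_terms))
    norm = 1.0 / math.sqrt(len(doc_terms))           # shorter documents score higher
    return coord * query_norm * sum(tf(t) * idf(t) ** 2 * norm for t in matched)


# Two toy "forms", each represented by the terms that appear on it.
form_a = ["person", "conference", "name"]
form_b = ["person", "name", "title"]
df = {"person": 2, "conference": 1}
score_a = classic_score(["person", "conference"], form_a, df, num_docs=2)
score_b = classic_score(["person", "conference"], form_b, df, num_docs=2)
print(score_a > score_b)
```

The coord factor is what rewards a form for matching more of the query terms, which is why the form containing both terms ranks first here.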

Page 30: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Problem: “sister forms”

Illustration: user query “Widom”

Result of the query: [screenshot not reproduced in this transcript]

It is impossible to find what the user is looking for.

Page 31: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

What is the solution? Grouping forms.

Approach 1:
• Group consecutive sister forms with the same score; these are the first-level groups.
• Within each group, group forms by the four query classes.
• Display the classes in the order SELECT, AGGR, GROUP, UNION-INTERSECT.

Result of the “Widom” query: [screenshot not reproduced in this transcript]

Problem: non-consecutive sister forms end up in different first-level groups that have the same description.

Page 32: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Solution? Approach 2:
• First group the returned forms by their table.
• Order the groups by the sum of their scores.
• Advantage: no repetition.
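Approach 2 can be sketched in a few lines of Python. The input shape is an assumption: each returned form is a hypothetical `(form_id, table, score)` triple, with one table label per form. Sorting forms by score inside each group is also a choice of this sketch; the slides only specify how the groups themselves are ordered.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Form = Tuple[int, str, float]  # (form_id, table, score), hypothetical shape


def group_by_table(ranked: List[Form]) -> List[Tuple[str, List[Tuple[int, float]]]]:
    """Group forms by table, then order groups by the sum of their scores."""
    groups: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
    for form_id, table, score in ranked:
        groups[table].append((form_id, score))
    ordered = sorted(groups.items(),
                     key=lambda kv: sum(score for _fid, score in kv[1]),
                     reverse=True)
    # Within a group, show the best-scoring form first (sketch assumption).
    return [(table, sorted(forms, key=lambda f: f[1], reverse=True))
            for table, forms in ordered]


ranked = [(1, "person", 0.9), (2, "give_tutorial", 0.8),
          (3, "person", 0.5), (4, "give_tutorial", 0.3)]
for table, forms in group_by_table(ranked):
    print(table, forms)
```

Because every table yields exactly one group, sister forms that were not adjacent in the ranked list still land together, which is the "no repetition" advantage.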

Page 33: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Experimental Analysis

Experimental Setup
• Data set: DBLife
• Generated set of forms: F1
• 14 skeleton templates, one for each of the 5 entity tables and 9 relationship tables
• Templates created per skeleton: 1 SELECT, 5 AGGR, 6 GROUP, and 2 UNION-INTERSECT (14 in total), so F1 had 14 × 14 = 196 forms
• A user study was conducted with 7 graduate students, who found answers for 6 information needs

Page 34: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Experimental Analysis

• Comparing Naïve, Double-Index, and Double-Index-Join
• Ranking and displaying forms
• Which is the best approach? Why? Let’s find out.


Page 35: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Related Work and References

• Jayapandian [11] proposed automatic form generation for a database based on a sample query workload.

[11] M. Jayapandian, H. V. Jagadish. Automating the Design and Construction of Query Forms. ICDE 2006.

• Liu [14] proposed automatically distinguishing between schema terms and value terms in a keyword query.

[14] F. Liu, C. Yu, W. Meng, A. Chowdhury. Effective Keyword Search in Relational Databases. SIGMOD 2006.

• BANKS [3] proposed supporting the “attribute = value” construct in keyword queries.

[3] G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS. ICDE 2002.

• Luo [16] proposed detecting empty-result queries by “remembering” results from previously executed empty-result queries.

[16] G. Luo. Efficient Detection of Empty-Result Queries. VLDB 2006.

Page 36: Combining Keyword Search and Forms for Ad Hoc Querying of Databases

Thank You!