Variations in Searching for Information

CMPT 455/826 - Week 11, Day 2

Approximate Query Processing

• Abstract [1]

– This article describes query processing in the DBO database system.

– Like other database systems designed for ad hoc analytic processing, DBO is able to compute the exact answers to queries over a large relational database in a scalable fashion.

– Unlike any other system designed for analytic processing, DBO can constantly maintain a guess as to the final answer to an aggregate query throughout execution, along with statistically meaningful bounds for the guess’s accuracy.

– As DBO gathers more and more information, the guess gets more and more accurate, until it is 100% accurate as the query is completed.

– This allows users to stop the execution as soon as they are happy with the query accuracy, and thus encourages exploratory data analysis.

1. Scalable Approximate Query Processing with the DBO Engine by Chris Jermaine, Subramanian Arumugam, Abhijit Pol, and Alin Dobra

Approximate Query Processing

• Purpose:

– To get fast intermediate results on queries that could take longer than the extra precision is worth

• Technique:

– Uses random sampling rather than sequential processing to keep accumulating more and more exact information

• Comments:

– The paper is very technical, but the concept is what is important to consider
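The concept can be illustrated with a minimal sketch. This is not the DBO algorithm (which handles joins over large on-disk relations); it only shows the general idea for a SUM over a single in-memory column, assuming rows are consumed in random order. The function name `online_sum` and its parameters are hypothetical, and the bound is a plain normal-approximation interval with a finite population correction.

```python
import math
import random

def online_sum(values, report_every=1000, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) while scanning in random order.

    Assumption: a shuffled in-memory list stands in for random sampling
    without replacement from a table of n rows.
    """
    n = len(values)
    order = list(values)
    random.Random(seed).shuffle(order)
    total = 0.0
    total_sq = 0.0
    for k, v in enumerate(order, start=1):
        total += v
        total_sq += v * v
        if k % report_every == 0 or k == n:
            mean = total / k
            # Sample variance of the rows seen so far.
            var = (total_sq - k * mean * mean) / (k - 1) if k > 1 else 0.0
            var = max(var, 0.0)
            # Finite population correction: the bound shrinks to 0 at k == n.
            fpc = (n - k) / n
            half_width = z * n * math.sqrt(var / k * fpc)
            yield k, n * mean, half_width

if __name__ == "__main__":
    column = [i % 100 for i in range(10000)]
    for rows, estimate, bound in online_sum(column, report_every=2500):
        print(f"{rows:>6} rows: SUM is about {estimate:.0f} +/- {bound:.0f}")
```

A user watching this output could stop the scan as soon as the bound is tight enough; once every row has been read, the estimate equals the exact sum and the bound is zero.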

Inconsistent Databases

• Abstract [2]

– Query answering from inconsistent databases amounts to finding “meaningful” answers to queries posed over database instances that do not satisfy integrity constraints specified over their schema.

– A declarative approach to this problem relies on the notion of repair, that is, a database that satisfies integrity constraints and is obtained from the original inconsistent database by “minimally” adding and/or deleting tuples.

2. Repair Localization for Query Answering from Inconsistent Databases by Thomas Eiter, Michael Fink, Gianluigi Greco, and Domenico Lembo


• Purpose:

– A database may become inconsistent in many ways

  • This is particularly challenging in the context of data integration, where a number of data sources, heterogeneous and widely distributed, must be presented to the user as if they were a single (virtual) centralized database, which is often equipped with a rich set of constraints expressing important semantic properties of the application at hand.

– Since, in general, the integrated sources are autonomous, the data resulting from the integration are likely to violate these constraints.

– The standard approach through data cleaning may be insufficient even if only a few inconsistencies are present in the data


• Technique:

– The notion of a repair for an inconsistent database

  • A repair is a new database which satisfies the constraints in the schema and minimally differs from the original one.

  • The suitability of a possible repair depends on

    » the underlying semantics adopted for the inconsistent database,

    » and on the kinds of integrity constraints allowed on the schema.

  • Multiple repairs might be possible

  • The standard way of answering a user query is to compute the answers that are true in every possible repair
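The idea of answers that are true in every possible repair can be sketched in a few lines. This is a brute-force illustration, not the paper's repair localization technique: it assumes the only constraint is a primary key on one attribute and that repairs are obtained by deleting tuples only. The names `repairs` and `consistent_answers`, and the sample data, are hypothetical.

```python
from itertools import product

def repairs(rows, key):
    """Enumerate repairs of `rows` under a primary-key constraint on `key`.

    Each repair keeps exactly one row per key value, which is a minimal
    set of deletions that restores the constraint.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    for choice in product(*groups.values()):
        yield list(choice)

def consistent_answers(rows, key, query):
    """Return the answers that `query` produces in every repair."""
    result = None
    for repair in repairs(rows, key):
        answers = set(query(repair))
        result = answers if result is None else result & answers
    return result if result is not None else set()

# "ann" violates the key on name: two sources disagree about her department.
employees = [
    {"name": "ann", "dept": "sales"},
    {"name": "ann", "dept": "hr"},
    {"name": "bob", "dept": "sales"},
]

def in_sales(rows):
    return {r["name"] for r in rows if r["dept"] == "sales"}

print(consistent_answers(employees, "name", in_sales))  # {'bob'}
```

Only "bob" is in sales in both repairs, so only "bob" is a consistent answer. The number of repairs grows exponentially with the number of conflicting keys, so enumerating them like this is suitable for illustration only.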


• Comments:

– The major problem here is having inconsistent information in a database.

  • A more important problem is the reason behind the inconsistency in information throughout the database.

– It is difficult to decide what form information should be represented in when combining differing database schemes.

  • If this is not done carefully it is likely that the database will end up with misleading or inconsistent data.

– The query is checked against all the possible repairs to the database.

  • The answer is based on some evaluation between the repairs that are available, but how likely is it that the query was answered in the desired way?

– Instead of doing extra work rewriting queries as they are asked, why not use the information found by these techniques to determine a more permanent fix for the inconsistency of the data?

– If a consistent answer can be determined from an inconsistent database, then it seems likely that the information could be made consistent in the database for future queries.

Dynamic Spatial Queries

• Abstract [3]

– Conventional spatial queries are usually meaningless in dynamic environments since their results may be invalidated as soon as the query or data objects move.

– In this paper we formulate two novel query types:

  • A time-parameterized query

  • A continuous query

3. Spatial Queries in Dynamic Environments by Yufei Tao and Dimitris Papadias


• Purpose:

– As opposed to traditional, “instantaneous” queries that are evaluated only once to return a single result,

– continuous queries may require constant evaluation and updates of the results as the query conditions or database contents change


• Technique:

– A time-parameterized (TP) query returns:

  • the objects that satisfy the corresponding spatial query at the time when the query is issued

  • the expiry time of the result given the current motion of the query and database objects

  • the change that causes the expiration of the result

– A continuous query retrieves tuples of the form <result, interval>, where each result is accompanied by a future interval during which it is valid.

  • NOTE: A continuous query can be answered by repetitive execution of TP queries until some termination clause is satisfied.
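The two query types can be sketched for a deliberately simple setting. The assumptions here are not from the paper: objects move along a line with constant velocity, each given as an `(x, v)` pair, and the query is a static window `[lo, hi]` (a moving window could be handled by subtracting its velocity from each object's). The function names are hypothetical, and the paper's index-based algorithms are not reproduced.

```python
import math

def inside_interval(obj, lo, hi):
    """Time span [t_in, t_out) during which obj = (x, v) lies in [lo, hi]."""
    x, v = obj
    if v == 0:
        return (-math.inf, math.inf) if lo <= x <= hi else None
    t_a, t_b = (lo - x) / v, (hi - x) / v
    return (min(t_a, t_b), max(t_a, t_b))

def tp_query(objects, lo, hi, t0):
    """Return (result, expiry, changes) for the window query issued at t0."""
    result, expiry, changes = set(), math.inf, []
    for name, obj in objects.items():
        span = inside_interval(obj, lo, hi)
        if span is None:
            continue  # stationary and outside: never affects the result
        t_in, t_out = span
        if t_in <= t0 < t_out:
            result.add(name)
        for t, kind in ((t_in, "enters"), (t_out, "leaves")):
            if not (t0 < t < math.inf):
                continue  # only finite events strictly after t0 matter
            if t < expiry:
                expiry, changes = t, [(name, kind)]
            elif t == expiry:
                changes.append((name, kind))
    return result, expiry, changes

def continuous_query(objects, lo, hi, t_start, t_end):
    """Answer a continuous query by repeating TP queries until t_end."""
    t, out = t_start, []
    while t < t_end:
        result, expiry, _ = tp_query(objects, lo, hi, t)
        out.append((result, (t, min(expiry, t_end))))
        t = expiry
    return out

objects = {"a": (0.0, 1.0), "b": (5.0, 0.0), "c": (10.0, -2.0)}
print(tp_query(objects, 4.0, 8.0, 0.0))
# ({'b'}, 1.0, [('c', 'enters')])
```

The expiry time is only valid under the motion assumed when the query is issued; if an object changes velocity, the TP query would have to be issued again.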


• Comments:

– In addition to getting the correct result from the spatial queries, the paper should have addressed how a dynamic database could be updated.

  • E.g., a dynamic environment such as an automated car park involves both vehicles moving in and out of the parking lot and the database being updated with the number of available lots at a given time.

– There are open issues in how expiry time is dealt with:

  • What happens when the entity changes direction or velocity? Does the expiry time remain valid?

Querying the Semantic Web

• Abstract [4]

– The Resource Description Framework (RDF) enables the creation and exchange of metadata as any other Web data.

– There is a need for sufficiently expressive declarative query languages for querying Web pages that make use of RDF.

– We propose RQL, a new query language adapting the functionality of semistructured or XML query languages to the peculiarities of RDF but also extending this functionality in order to uniformly query both RDF descriptions and schemas.

4. Querying the Semantic Web with RQL by G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle


• Purpose:

– RQL adapts the functionality of semistructured or XML query languages to the peculiarities of RDF, but also extends this functionality in order to uniformly query both RDF descriptions and schemas.

– With RQL, users are able to query resources described according to their preferred schema, while discovering how the same resources are also described using another classification schema.


• Technique:

– We introduce a formal data model and type system for description bases created according to the RDF Model & Syntax and Schema specifications

– In order to support superimposed RDF descriptions, the main modeling challenge is

  • to represent properties as self-existent individuals,

  • as well as to introduce a graph instantiation mechanism permitting multiple classification of resources.
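The two modeling points can be made concrete with a toy description base. This is not RQL and not the paper's formal model; it is a small Python sketch in which properties are stored as entries of their own (with a domain and range) rather than as fields of a class, and a resource may be classified under several classes at once. The class `DescriptionBase`, its methods, and the sample schema are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DescriptionBase:
    subclass_of: dict = field(default_factory=dict)  # class -> direct superclasses
    properties: dict = field(default_factory=dict)   # property -> (domain, range)
    types: dict = field(default_factory=dict)        # resource -> set of classes
    statements: list = field(default_factory=list)   # (subject, property, object)

    def superclasses(self, cls):
        """Return cls together with all of its ancestors."""
        seen, stack = set(), [cls]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.subclass_of.get(c, ()))
        return seen

    def instances(self, cls):
        """Data query: resources classified under cls or one of its subclasses."""
        return {r for r, classes in self.types.items()
                if any(cls in self.superclasses(c) for c in classes)}

    def properties_for(self, cls):
        """Schema query: properties whose domain is cls or an ancestor of cls."""
        ancestors = self.superclasses(cls)
        return {p for p, (dom, _) in self.properties.items() if dom in ancestors}

    def values(self, prop):
        """Data query: (subject, object) pairs stated for a property."""
        return {(s, o) for s, p, o in self.statements if p == prop}

db = DescriptionBase(
    subclass_of={"Painter": {"Artist"}, "Painting": {"Artifact"}},
    properties={"creates": ("Artist", "Artifact"), "paints": ("Painter", "Painting")},
    # "guernica" is classified under two unrelated classes at once.
    types={"picasso": {"Painter"}, "guernica": {"Painting", "ExhibitItem"}},
    statements=[("picasso", "paints", "guernica")],
)

print(db.instances("Artist"))          # {'picasso'}
print(db.instances("ExhibitItem"))     # {'guernica'}
print(db.values("paints"))             # {('picasso', 'guernica')}
```

Because schema and data live in the same structure, the same object answers both schema questions (`properties_for`) and data questions (`instances`, `values`), which is the kind of uniform querying the slide describes.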


• Comments:

– The type system used for RQL is extremely useful in that it is actually read from the RDF schema: the type system is specific to the schema being used.

– However, all types fit into a finite list of types, which contains literal types, resource types, class types, property types and others.

– The discussion on typing as it relates to RDF would be useful in considering various other approaches to typing for other means of modeling (ER or class diagrams).

– In ER modeling this could be achieved through choosing property names/attributes for a relationship and including them in the diagram (and not just “is-a”).

Entity Search Engine

• Abstract [5]

– The Web has become a rich collection of data-rich pages, on the “surface Web” of static URLs as well as the “deep Web” of database-backed contents.

– The richness of data, while a promising opportunity, has challenged us to effectively find data we need, from one or multiple sources.

– We are motivated by the need of large scale on-the-fly integration for online structured data.

5. Entity Search Engine: Towards Agile Best Effort Information Integration over the Web by Tao Cheng and Kevin Chen-Chuan Chang


• Purpose:

– How do we identify and integrate the structured data embedded in unstructured result pages?


• Technique:

– Search engines, such as Google, Yahoo, or MSN, search for pages by keywords.

  • While being “IR-style” with a scalable text processing framework, they are not data aware.

– Integration services, such as Expedia.com or PriceGrabber.com, exist online for specific domains.

  • They provide “DB-style” precise querying, but they can hardly scale to the amount of data and the number of sources on the Web.

– We propose a solution where the two extremes meet, with a synergistic “marriage” in the middle.


• Comments:

– There are still problems with sites that embed their data in inaccessible formats that cannot be queried
