interfaces for querying collections. information retrieval activities selecting a collection...

Interfaces for Querying Collections

Information Retrieval Activities

Selecting a collection
– Lists, overviews, wizards, automatic selection

Submitting a request
– Queries & expressiveness
– Graphical interfaces
– Natural language

Examining the response
– Next class

Simple Query Interface

Complex Query Interface

Primary HCI Styles

Command language

Form filling

Menu selection

Direct manipulation

Natural language

Others?

Boolean Queries

Most commercial full-text retrieval systems (until recently) supported only Boolean queries.

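Many studies show that users have difficulty with Boolean expressions
– AND and OR are not used as in everyday English
• “cats and dogs” (in English this often means material about either animal; Boolean AND requires both)
• “tea or coffee” (in English this often means one or the other; Boolean OR is inclusive)
– Syntax specifying nesting is often cryptic

The Boolean model does not include ranking
– Earlier systems used reverse chronological order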
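To make the AND/OR mismatch and the lack of ranking concrete, here is a minimal Python sketch of Boolean retrieval over a toy inverted index. The collection, dates, and helper names (`term`, `conjunction`, `disjunction`, `by_recency`) are hypothetical. Because the Boolean model yields only a set, the sketch falls back on reverse chronological order, as the earlier systems did.

```python
from datetime import date

# Toy collection (hypothetical): doc id -> (publication date, text)
DOCS = {
    1: (date(1998, 3, 1), "cats and dogs as pets"),
    2: (date(2001, 7, 9), "caring for cats"),
    3: (date(1995, 1, 20), "training dogs"),
    4: (date(2003, 5, 5), "tea or coffee for breakfast"),
}


def build_index(docs):
    """Inverted index: lowercase term -> set of ids of docs containing it."""
    index = {}
    for doc_id, (_, text) in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Posting set for one query term (empty if the term is unknown)."""
    return INDEX.get(word.lower(), set())


def conjunction(*sets):
    """Boolean AND: documents in every set."""
    return set.intersection(*sets)


def disjunction(*sets):
    """Boolean OR: documents in at least one set (inclusive, unlike English "or")."""
    return set.union(*sets)


def by_recency(doc_ids):
    """The Boolean model gives no relevance score, so order newest first."""
    return sorted(doc_ids, key=lambda d: DOCS[d][0], reverse=True)


# "cats and dogs" read as Boolean AND finds only the doc mentioning both.
print(by_recency(conjunction(term("cats"), term("dogs"))))  # [1]
print(by_recency(disjunction(term("cats"), term("dogs"))))  # [2, 1, 3]
```

A user who typed “cats and dogs” hoping for material on either animal gets one document rather than three.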

Web-based Boolean Queries

Search engines built on Boolean or extended Boolean engines had to make their systems usable by the general Web audience

Reduce expressiveness for ease of use
– Use “all the words” and “any of the words”
– Boolean-based search engines added the + prefix to mark a required term

Ranking performed using statistical algorithms and Web-specific heuristics
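A minimal sketch of how a Web engine might accept “all the words”, “any of the words”, and the + prefix. It assumes that a + prefix marks a required term and that all other words are optional in “any” mode. The word-count score is only a crude stand-in for the statistical ranking mentioned above, and `PAGES`, `parse_query`, and `search` are hypothetical names.

```python
# Toy Web pages (hypothetical): page id -> text
PAGES = {
    1: "jaguar car dealers",
    2: "jaguar habitat in the rainforest jaguar",
    3: "used car prices",
}


def parse_query(query):
    """Split a query into required (+term) and optional terms."""
    required, optional = [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        else:
            optional.append(token)
    return required, optional


def search(pages, query, mode="any"):
    """mode="all": every word required ("all the words").
    mode="any": plain words optional ("any of the words"); +words always required.
    """
    required, optional = parse_query(query)
    if mode == "all":
        required, optional = required + optional, []
    scored = []
    for page_id, text in pages.items():
        words = text.lower().split()
        if not all(t in words for t in required):
            continue
        # Crude stand-in for statistical ranking: total query-term occurrences.
        score = sum(words.count(t) for t in required + optional)
        if score > 0:
            scored.append((score, page_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for _, page_id in scored]


print(search(PAGES, "jaguar car"))              # [1, 2, 3]
print(search(PAGES, "jaguar car", mode="all"))  # [1]
print(search(PAGES, "+car jaguar"))             # [1, 3]
```

The + prefix gives back some of the expressiveness of AND without asking users to write Boolean syntax.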

Command Line Search

Command line interfaces for search

Example queries from Melvyl:
– FIND PA darwin and TW species or TW descent
– FIND TW Mt St. Helens AND DATE 1981
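The sketch below splits Melvyl-style commands into field-qualified clauses, which shows how much structure a user must get right. Treating PA as personal author and TW as title word is an assumption, as are the code set and the function name `parse_find`. The sketch deliberately does not decide how the mixed AND/OR in the first query should be nested. That ambiguity is exactly what makes such syntax cryptic.

```python
# Hypothetical subset of index codes; the PA/TW/DATE meanings are assumptions.
FIELD_CODES = {"PA", "TW", "DATE"}
OPERATORS = {"AND", "OR"}


def parse_find(command):
    """Split 'FIND <code> <words> [AND|OR <code> <words>]...' into clauses.

    Returns a flat list alternating (code, value) tuples and operator strings.
    Operator precedence is deliberately left unresolved.
    """
    words = command.split()
    if not words or words[0].upper() != "FIND":
        raise ValueError("command must start with FIND")
    clauses, code, value = [], None, []
    for word in words[1:]:
        upper = word.upper()
        if upper in OPERATORS:
            if code is None or not value:
                raise ValueError(f"operator {word!r} without a preceding clause")
            clauses += [(code, " ".join(value)), upper]
            code, value = None, []
        elif code is None:
            if upper not in FIELD_CODES:
                raise ValueError(f"expected an index code, got {word!r}")
            code = upper
        else:
            value.append(word)
    if code is None or not value:
        raise ValueError("command ends with an incomplete clause")
    clauses.append((code, " ".join(value)))
    return clauses


print(parse_find("FIND PA darwin and TW species or TW descent"))
# [('PA', 'darwin'), 'AND', ('TW', 'species'), 'OR', ('TW', 'descent')]
print(parse_find("FIND TW Mt St. Helens AND DATE 1981"))
# [('TW', 'Mt St. Helens'), 'AND', ('DATE', '1981')]
```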

Command Line Search

Still in use …

Forms and Menus: Melvyl

Faceted Queries

Boolean queries often return too many or too few results

– Conjunctions reduce sets too quickly
– Disjunctions grow sets too quickly

Solution:
– Try out smaller queries to see whether each has an appropriately sized set of results
– Combine the smaller queries that are successful into a larger query

Example:

1. (osteoporosis OR “bone loss”)
2. (drugs OR pharmaceuticals)
3. (prevention OR cure)
4. 1 AND 2 AND 3
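The faceted strategy can be sketched in Python. Each facet is a disjunction of alternative wordings, and its result size is checked on its own. The successful facets are then intersected. The abstracts and the `facet` and `contains` helpers are hypothetical.

```python
# Toy abstracts (hypothetical)
DOCS = {
    1: "drugs for the prevention of bone loss",
    2: "osteoporosis cure trials with new pharmaceuticals",
    3: "osteoporosis and diet",
    4: "pharmaceuticals market report",
    5: "bone loss prevention through exercise",
}


def contains(text, phrase):
    """True if the phrase occurs as consecutive words in the text."""
    words, target = text.lower().split(), phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def facet(*alternatives):
    """One facet: a disjunction of alternative wordings of a single concept."""
    return {d for d, text in DOCS.items() if any(contains(text, a) for a in alternatives)}


# Steps 1-3: try each small query and check that its result set is reasonably sized.
facets = [
    facet("osteoporosis", "bone loss"),
    facet("drugs", "pharmaceuticals"),
    facet("prevention", "cure"),
]
print([len(f) for f in facets])  # [4, 3, 3]

# Step 4: combine the successful facets with AND.
result = set.intersection(*facets)
print(sorted(result))  # [1, 2]
```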

Post-Coordinate or Quorum Ranking

Results are first ranked based on how many facets of the query they match.

Faceted search with quorum ranking lets the user specify each concept in multiple ways, yet ranks documents by the number of concepts they include.

A further extension is to let users weight each facet
– Found on the Web to help balance different goals of a search (e.g., selecting a car or a house)
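Quorum ranking, including the optional per-facet weights, can be sketched as follows. The sketch makes three assumptions:
– Facets are lists of single-word alternatives
– A document's score is the sum of the weights of the facets it matches, with every weight equal to 1 for plain quorum ranking
– The listings and `quorum_rank` are hypothetical

```python
# Toy listings (hypothetical) for a house search.
HOMES = {
    1: "cheap downtown studio",
    2: "spacious central loft",
    3: "affordable spacious house downtown",
    4: "large garden cottage",
    5: "luxury penthouse",
}

# Each facet lists alternative single-word terms for one concept.
FACETS = [
    ["cheap", "affordable"],   # price
    ["downtown", "central"],   # location
    ["spacious", "large"],     # size
]


def quorum_rank(docs, facets, weights=None):
    """Rank documents by the (weighted) number of facets they match.

    With weights=None every facet counts 1, i.e. plain quorum ranking.
    Documents matching no facet are dropped; ties break by id.
    """
    if weights is None:
        weights = [1] * len(facets)
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        score = sum(w for alts, w in zip(facets, weights) if words & set(alts))
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


print(quorum_rank(HOMES, FACETS))                     # [3, 1, 2, 4]
print(quorum_rank(HOMES, FACETS, weights=[1, 1, 3]))  # [3, 2, 4, 1]
```

Weighting the size facet moves the spacious and large listings ahead of the cheap downtown studio, which shows how weights trade one goal off against another.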

Result Size Problem Occurs with Web Search Too

Graphical Query Specification

Graphical interfaces can be static, direct manipulation, or combine the two.

Direct manipulation
– Continuous representation of objects
– Physical actions replace complex syntax
– Rapid, incremental, reversible operations on objects
– Immediate feedback on actions

Graphical Boolean Queries

In some studies, graphical queries were more accurate and faster than command-line queries

Venn diagrams are a common graphical approach
– Limited to three elements in a conjunction

VQuery
– Lets users draw ellipses to create their own queries
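One way to see what a Venn-diagram interface computes is to treat each term's circle as a document set. The diagram then splits the documents into 2^n - 1 disjoint regions, and the query the user draws is the union of the shaded regions. This minimal Python sketch assumes that region-shading model. The terms, document ids, and function names are hypothetical.

```python
from itertools import product


def venn_regions(term_sets):
    """Split documents into the disjoint regions of a Venn diagram.

    term_sets maps each term to the set of doc ids containing it.
    Keys of the result are tuples of booleans: True = inside that term's circle.
    The region outside every circle is omitted.
    """
    terms = list(term_sets)
    universe = set().union(*term_sets.values())
    regions = {}
    for inside in product([True, False], repeat=len(terms)):
        if not any(inside):
            continue
        docs = set(universe)
        for t, flag in zip(terms, inside):
            docs &= (term_sets[t] if flag else universe - term_sets[t])
        regions[inside] = docs
    return regions


def query_from_regions(regions, selected):
    """The query the user 'draws' is the union of the shaded regions."""
    return set().union(*(regions[r] for r in selected))


regions = venn_regions({"cats": {1, 2, 3}, "dogs": {2, 3, 4}, "birds": {3, 5}})
# cats AND dogs = the two regions inside both the cats and dogs circles
print(query_from_regions(regions, [(True, True, True), (True, True, False)]))  # {2, 3}
# cats AND dogs AND NOT birds = a single region
print(regions[(True, True, False)])  # {2}
```

With three terms there are already seven regions. This helps explain why Venn-style interfaces stop at three elements.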

VQuery

Process-Based Graphs

The query can be represented graphically as a process of selection.

Filter-flow model presents a set of filters
– One attribute and a set of potential values per filter; multiple values are treated as a disjunction
– Branches in the flow indicate disjunctions
– Serialized filters indicate conjunctions

Users made fewer errors with filter-flow than with SQL
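The filter-flow semantics map naturally onto composable functions. In this minimal sketch, a filter ORs the values of one attribute, filters in series AND together, and parallel branches OR together. The records, attribute names, and the helpers `filter_step`, `series`, and `branch` are hypothetical.

```python
# Toy catalog records (hypothetical attributes and values).
BOOKS = [
    {"id": 1, "subject": "history", "lang": "en", "format": "print"},
    {"id": 2, "subject": "history", "lang": "fr", "format": "ebook"},
    {"id": 3, "subject": "biology", "lang": "en", "format": "ebook"},
    {"id": 4, "subject": "history", "lang": "de", "format": "print"},
]


def filter_step(attribute, values):
    """One filter: one attribute; the listed values are ORed together."""
    allowed = set(values)
    return lambda records: [r for r in records if r[attribute] in allowed]


def series(*steps):
    """Filters placed one after another: each narrows the last (AND)."""
    def run(records):
        for step in steps:
            records = step(records)
        return records
    return run


def branch(*flows):
    """Parallel branches: keep records passed by any branch (OR), in input order."""
    def run(records):
        kept = set()
        for flow in flows:
            kept |= {r["id"] for r in flow(records)}
        return [r for r in records if r["id"] in kept]
    return run


# subject = history AND (lang = en OR format = ebook)
flow = series(
    filter_step("subject", ["history"]),
    branch(filter_step("lang", ["en"]), filter_step("format", ["ebook"])),
)
print([r["id"] for r in flow(BOOKS)])  # [1, 2]
```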

Filter-Flow

Block-diagram Visualization

Users arrange blocks to specify a query.

STARS
– Users initially type in a natural language query
– Query terms are turned into blocks
– Blocks are then arranged into a query
– Blocks in the same row represent conjunction
– Blocks in the same column represent disjunction
– Allows previewing the query results by simple rearrangement of blocks
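Here is a sketch of evaluating a STARS-style block arrangement. It assumes the grid is stored as a list of columns. The columns, sitting side by side in a row, are ANDed, and the blocks stacked in each column are ORed. The index and `evaluate_grid` are hypothetical. Moving one block between columns is enough to preview a different query.

```python
# Toy inverted index (hypothetical): term -> doc ids
INDEX = {
    "solar": {1, 2, 3},
    "wind": {3, 4},
    "energy": {1, 3, 4},
}


def evaluate_grid(columns, index):
    """Evaluate a block arrangement given as a list of columns of terms.

    Assumed reading: blocks stacked in one column are ORed; the columns,
    which sit side by side in a row, are ANDed.
    """
    result = None
    for column in columns:
        column_docs = set().union(*(index.get(t, set()) for t in column))
        result = column_docs if result is None else result & column_docs
    return result if result is not None else set()


# Three blocks side by side: solar AND wind AND energy
print(evaluate_grid([["solar"], ["wind"], ["energy"]], INDEX))  # {3}
# Drag "wind" under "solar": (solar OR wind) AND energy
print(evaluate_grid([["solar", "wind"], ["energy"]], INDEX))    # {1, 3, 4}
```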

Magic Lenses

Lenses act as filters on an overview visualization
– Disjunction is represented by independent lenses
– Conjunction is expressed by placing multiple lenses over one another
– Lenses can include additional information
• Where the term must appear
• Term frequency requirements
• Switches to use stemming
• …
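Magic-lens filtering can be sketched as predicates over documents, where overlapping lenses are ANDed and independent lenses are ORed. Each hypothetical `Lens` carries the extra options listed above: a field, a minimum term count, and a stemming switch. The one-letter suffix stripper is a toy stand-in for a real stemmer, and the documents are invented.

```python
from dataclasses import dataclass


def crude_stem(word):
    """Toy 'stemmer': strip one trailing s (not a real algorithm like Porter's)."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


@dataclass(frozen=True)
class Lens:
    term: str
    field: str = "body"   # where the term must appear
    min_count: int = 1    # term frequency requirement
    stem: bool = False    # switch for (toy) stemming

    def passes(self, doc):
        words = doc.get(self.field, "").lower().split()
        target = self.term.lower()
        if self.stem:
            words = [crude_stem(w) for w in words]
            target = crude_stem(target)
        return words.count(target) >= self.min_count


def highlighted(docs, *stacks):
    """Ids of docs shown through the lenses.

    Each stack is a tuple of overlapping lenses (AND); separate stacks
    are independent lenses (OR).
    """
    return [d["id"] for d in docs
            if any(all(lens.passes(d) for lens in stack) for stack in stacks)]


DOCS = [
    {"id": 1, "title": "city maps", "body": "maps of the city and a transit map"},
    {"id": 2, "title": "transit planning", "body": "bus routes for the city"},
    {"id": 3, "title": "hiking", "body": "trail map for hikers"},
]

city_maps = (Lens("map", stem=True, min_count=2), Lens("city", field="title"))
transit = (Lens("transit", field="title"),)
print(highlighted(DOCS, city_maps, transit))  # [1, 2]
print(highlighted(DOCS, (Lens("map"),)))      # [1, 3]
```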

Magic Lenses

Phrases and Proximity

Specifying phrase and proximity constraints can greatly improve precision.

Phrase search is often used on the Web
– But the phrase must be literal
– “President Lincoln” does not match “President Abraham Lincoln”

Proximity constraints allow for more general queries
– Examples:
• LEXIS-NEXIS “white w/3 house” means “white within three words of house”
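Here is a minimal sketch contrasting literal phrase matching with a proximity operator. It drops punctuation, splits text into words, and takes “within k words” to mean word positions that differ by at most k, in either order. A real system such as LEXIS-NEXIS may count differently. Function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def phrase_match(text, phrase):
    """Literal phrase search: the words must be adjacent and in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def within(text, a, b, k):
    """True if a and b occur within k words of each other, in either order.

    Assumption: 'within k' means word positions differ by at most k; a
    particular system (e.g. LEXIS-NEXIS w/k) may count slightly differently.
    """
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= k for i in pos_a for j in pos_b)


text = "President Abraham Lincoln spoke at the White House."
print(phrase_match(text, "President Lincoln"))  # False
print(within(text, "President", "Lincoln", 3))  # True
print(within(text, "white", "house", 3))        # True
print(within("white fences line the old house", "white", "house", 3))  # False
```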

Natural Language and Free Text Queries

Many systems treat the question as a bag of words

Natural language processing can be used to try to better determine the information need.
– Extract noun (and verb) phrases
– Find noun (and verb) phrases in the same sentence

Ask.com uses sites preselected to answer particular question forms.
– Needs to recognize the type of question
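The sketch below shows two things. First, a bag of words loses information: two questions with different meanings produce the same bag. Second, it gives a crude way to recognize question types from surface patterns. The stopword list, the patterns, and the answer-type labels are hypothetical. This is not how Ask.com works internally. Real noun-phrase extraction would need a part-of-speech tagger, which is not attempted here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (real systems use much larger ones).
STOPWORDS = {"who", "what", "when", "where", "how", "did", "does", "do",
             "is", "the", "a", "an", "of", "in", "to"}


def bag_of_words(question):
    """Order-free term counts: word order and syntax are discarded."""
    return Counter(w for w in re.findall(r"[a-z0-9]+", question.lower())
                   if w not in STOPWORDS)


# Hypothetical surface patterns mapping question forms to expected answer types.
QUESTION_TYPES = [
    (re.compile(r"^\s*who\b", re.I), "PERSON"),
    (re.compile(r"^\s*when\b|\bwhat year\b", re.I), "DATE"),
    (re.compile(r"^\s*where\b", re.I), "LOCATION"),
    (re.compile(r"^\s*how (many|much)\b", re.I), "QUANTITY"),
]


def question_type(question):
    """Return the first matching answer type, or OTHER."""
    for pattern, label in QUESTION_TYPES:
        if pattern.search(question):
            return label
    return "OTHER"


# Different questions, identical bags of words:
print(bag_of_words("Did the dog bite the man?") == bag_of_words("Did the man bite the dog?"))  # True
print(question_type("Who shot Lincoln?"))               # PERSON
print(question_type("How many moons does Mars have?"))  # QUANTITY
```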

Ask.com
