Enhancing Internet Search Engines to Achieve Concept-based Retrieval. F. Lu, T. Johnsten, V. Raghavan, and D. Traylor. InForum ’99, May 5-6, 1999.

Page 1: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Enhancing Internet Search Engines to Achieve Concept-based Retrieval

F. Lu, T. Johnsten, V. Raghavan, and D. Traylor

InForum ’99

May 5-6, 1999

Page 2: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Agenda

• Information on the Internet.

• Boolean Retrieval Model and the Internet.

• Concept-Based Retrieval (RUBRIC / CS3).

• CS3 and Boolean Search Engines.

• Future Work.

Page 3: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Information on the Internet

• Large volume.

• Rapid growth rate.

• Wide variations in quality and type.

Page 4: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Boolean Retrieval Model and the Internet

• Most Internet search engines are based on the Boolean Retrieval Model.

• The Boolean Retrieval Model is relatively easy to implement.

• Limitations:

– Inability to assign weights to query or document terms.

– Inability to rank retrieved documents.

– Naïve users have difficulty in using
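
The ranking limitation can be illustrated with a minimal sketch of Boolean matching. The toy collection, the document ids, and the helper names `boolean_and` and `boolean_or` are assumptions made for illustration; they do not come from any particular search engine.

```python
# Toy collection; document ids and texts are invented for illustration.
DOCS = {
    "d1": "hydrogeology of shallow aquifers",
    "d2": "colloid transport in groundwater",
    "d3": "hydrogeology and colloid transport",
}


def boolean_and(terms, docs=DOCS):
    """Return the ids of documents that contain every query term."""
    return {doc_id for doc_id, text in docs.items()
            if all(t in text.split() for t in terms)}


def boolean_or(terms, docs=DOCS):
    """Return the ids of documents that contain at least one query term."""
    return {doc_id for doc_id, text in docs.items()
            if any(t in text.split() for t in terms)}


# The result is an unordered set: a document either matches or it does not.
matches = boolean_or(["hydrogeology", "colloid"])
```

Here `d3` contains both terms, yet nothing in the result distinguishes it from `d1` or `d2`, and the query has no place to say that one term matters more than the other.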

Page 5: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Concept-Based Retrieval

• Addresses the shortcomings of the Boolean Retrieval Model.

• Search requests are specified in terms of concepts structured as rule-base trees.

Page 6: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Development of Rule-Base Trees (General)

• Top-down refinement strategy.

• Support for AND / OR relationships.

• Support for user-defined weights.
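
As a rough sketch of the three points above, a rule-base tree could be held in a small recursive structure. The `Node` class, its field names, the choice of sub-concepts, and all weight values are hypothetical; the slides do not specify a data layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A concept (inner node) or a search term (leaf) in a rule-base tree."""
    label: str
    op: str = "TERM"        # "AND", "OR", or "TERM" for a leaf
    weight: float = 1.0     # user-defined weight on the edge to the parent
    children: List["Node"] = field(default_factory=list)


# Top-down refinement: start from the concept, then refine it into
# sub-concepts and finally into terms. Weights are invented.
hydro = Node("Hydrogeology", op="OR", children=[
    Node("hydrogeology", weight=0.9),
    Node("dnapl", weight=0.7),
    Node("colloid transport", op="AND", weight=0.6, children=[
        Node("colloid"),
        Node("environmental transport"),
    ]),
])


def leaves(node):
    """Collect the terms at the leaves, left to right."""
    if not node.children:
        return [node.label]
    return [term for child in node.children for term in leaves(child)]
```

Calling `leaves(hydro)` lists the terms the concept ultimately depends on.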

Page 8: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Development of Rule-Base Trees (CS3)

• Concept-Set Structuring System (CS3).

• CS3 supports the creation, storage, and modification of user-defined concepts.

• Post-processing of results of sub-queries.

• CS3 user-interface.

Page 9: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

CS3 User Interface

Page 10: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Evaluation of Rule-Base Trees (RUBRIC)

• Run-time, bottom-up analysis.

• Propagation of weight values (MIN / MAX).

• Disadvantage of run-time analysis.
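
A minimal sketch of this kind of evaluation follows. It assumes that a leaf scores 1.0 when its term occurs in the document, that each edge weight scales the child's score, and that AND takes the MIN and OR takes the MAX of the scaled scores. The tuple encoding, the function name `evaluate`, and the weights are illustrative assumptions, not the RUBRIC implementation.

```python
def evaluate(node, doc_terms):
    """Score one document against a rule-base tree, bottom-up."""
    kind = node[0]
    if kind == "TERM":
        return 1.0 if node[1] in doc_terms else 0.0
    # Each child is paired with an edge weight that scales its score.
    scores = [w * evaluate(child, doc_terms) for w, child in node[1]]
    return min(scores) if kind == "AND" else max(scores)


# Invented tree: Hydrogeology = hydrogeology OR dnapl OR (colloid AND transport)
tree = ("OR", [
    (0.9, ("TERM", "hydrogeology")),
    (0.7, ("TERM", "dnapl")),
    (0.6, ("AND", [
        (1.0, ("TERM", "colloid")),
        (1.0, ("TERM", "transport")),
    ])),
])

score_a = evaluate(tree, {"colloid", "transport"})  # only the AND branch fires
score_b = evaluate(tree, {"dnapl", "colloid"})      # the AND branch fails
```

Because `evaluate` walks the whole tree once per document, the cost is paid at query time for every candidate document, which is one way to read the disadvantage noted above.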

Page 12: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Evaluation of Rule-Base Trees (CS3)

• Static, bottom-up analysis.

• Construct Minimal Term Set (MTS).

• Propagation of terms.

• CS3 user-interface.

Page 13: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

MTS: Minimal Term Set

An MTS for a topic is a set of terms such that, if every term in the set appears in a document, the document receives a retrieval status value (RSV) larger than 0. If not, the RSV is 0.

A topic can have more than one MTS. A user can choose among those MTSs to perform a search suited to his or her needs.
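
One way to picture the static, bottom-up construction is to propagate term sets from the leaves to the root: an OR node collects the sets of its children, and an AND node combines one set from each child. The sketch below assumes exactly that; the tuple encoding and the name `minimal_term_sets` are hypothetical, and weights are left out to keep the example short.

```python
from itertools import product


def minimal_term_sets(node):
    """Return the minimal term sets of a rule-base tree, bottom-up."""
    kind = node[0]
    if kind == "TERM":
        return [frozenset([node[1]])]
    child_sets = [minimal_term_sets(child) for child in node[1]]
    if kind == "OR":
        # Any single child is enough, so gather the children's sets.
        candidates = [s for sets in child_sets for s in sets]
    else:
        # AND: pick one set from every child and merge the picks.
        candidates = [frozenset().union(*combo)
                      for combo in product(*child_sets)]
    # Drop duplicates, then drop any set that strictly contains another.
    unique = list(dict.fromkeys(candidates))
    return [s for s in unique if not any(other < s for other in unique)]


tree = ("OR", [
    ("TERM", "hydrogeology"),
    ("TERM", "dnapl"),
    ("AND", [("TERM", "colloid"), ("TERM", "environmental transport")]),
])

mts_list = minimal_term_sets(tree)
```

For this tree the result holds three sets: one with `hydrogeology`, one with `dnapl`, and one with both `colloid` and `environmental transport`. The analysis runs once per tree rather than once per document.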

Page 18: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Concept-Based Retrieval and Boolean Search Engines

• CS3 is designed to interface with existing Boolean search engines.

• U.S. Department of Energy’s “Information-Bridge” search engine.

• U.S. Department of Transportation’s “National Transportation Library” search engine.

Page 19: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

System Architecture

Diagram components: Client (Java applet); CORBA; CGI; Server (Java); Server (Java/C++); JDBC; ORACLE; DOE InfoBridge…, etc.

Page 20: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Information-Bridge and CS3

• Search request: Boolean vs. concept.

• Output: non-ranked vs. ranked.

• Calculation of RSV:

– Given a document D and a set S of MTS expressions satisfied by D, the RSV of D is equal to the sum of the weights of all expressions in S plus the maximum weight in S.
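
The RSV rule stated above can be written out directly. In this sketch each MTS is a set of terms paired with a weight; the function name `rsv`, the pairing of terms with weights, and the weight values are assumptions chosen for illustration.

```python
def rsv(doc_terms, weighted_mts):
    """RSV = sum of the weights of satisfied MTSs + the largest such weight."""
    satisfied = [weight for terms, weight in weighted_mts
                 if terms <= doc_terms]  # MTS satisfied if all its terms occur
    if not satisfied:
        return 0.0
    return sum(satisfied) + max(satisfied)


# Invented weights for three MTS expressions of one concept.
weighted_mts = [
    (frozenset({"hydrogeology"}), 0.75),
    (frozenset({"dnapl"}), 0.5),
    (frozenset({"colloid", "environmental transport"}), 0.25),
]

doc = {"hydrogeology", "dnapl", "aquifer"}
score = rsv(doc, weighted_mts)  # 0.75 + 0.5 satisfied, max is 0.75
```

With these weights the document satisfies two expressions, so its RSV is (0.75 + 0.5) + 0.75 = 2.0, while a document that satisfies none of them gets 0. Sorting documents by this value gives the ranked output that a plain Boolean engine lacks.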

Page 21: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Information-Bridge and CS3 (Example)

• Boolean search request (“Environmental Science Network” form):

– (“Hydrogeology” OR “Dnapl” OR (“Colloid*” AND “Environmental Transport”)).

• Concept (CS3):

– “Hydrogeology”.

– Rule-base tree.
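
To connect the two forms of request, a set of term sets can be rendered as a Boolean expression of the shape shown above: terms within a set are joined with AND, and the sets are joined with OR. This is only a sketch; the function name `to_boolean_query` is hypothetical, the query syntax accepted by Information-Bridge is not given in the slides, and the output uses straight quotes.

```python
def to_boolean_query(term_sets):
    """Render term sets as an OR of AND-clauses with quoted terms."""
    clauses = []
    for terms in term_sets:
        quoted = [f'"{term}"' for term in terms]
        if len(quoted) == 1:
            clauses.append(quoted[0])
        else:
            clauses.append("(" + " AND ".join(quoted) + ")")
    return "(" + " OR ".join(clauses) + ")"


query = to_boolean_query([
    ["Hydrogeology"],
    ["Dnapl"],
    ["Colloid*", "Environmental Transport"],
])
```

Lists are used instead of sets here so that the order of terms in the generated string is predictable.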

Page 22: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

CS3 Hydrogeology Rule Base

Page 23: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

CS3 search results

Page 24: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Current and Future Work

• Conduct experiments to evaluate effectiveness (future).

• Investigate alternative methods to compute RSVs [KADR00, KDR01*].

• Learning edge weights through relevance feedback [KR00].

• Thesaurus-based rule-base generation [KLR00].

Page 25: Enhancing Internet Search Engines to Achieve Concept-based Retrieval

Relevant URLs

www.cacs.usl.edu/~linc-projects/cs3/

[LJRT99*]

Raghavan Home, Publications since 1991