The University of Kansas Vitalseek Dr. Susan Gauch

Post on 21-Dec-2015


The University of Kansas

Vitalseek

Dr. Susan Gauch

Overview

• Provide technical and research capabilities for a Kansas City startup company

• Partner with Today Communications, Inc. to provide high-quality online medical information

• Develop innovative, quality-based rankings of online Web pages

• Transition the technology so it is easy to adapt for potential clients and easy for the sponsoring company to maintain

Project Goals

• System to support online entry of human judgments for a wide variety of medical Web sites on a large number of criteria

• Novel search engine combining traditional keyword-based retrieval with user-selected quality criteria

• Speed

• Scalability

• Reliability

Ranking System

• Ranking System

– Online entry, viewing, validation, modification

– Over 150 criteria per site

– Sites rated

• Overall

• Per Topic (50 topics)
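
A minimal sketch of how ratings like these might be stored, with overall judgments kept apart from per-topic ones. The class name `SiteRating`, the 0 to 5 score scale, and the criterion names are illustrative assumptions; the slides do not describe the actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SiteRating:
    """Human judgments for one site (hypothetical structure)."""
    url: str
    # criterion name -> score for the site as a whole
    overall: dict = field(default_factory=dict)
    # topic -> {criterion name -> score}
    per_topic: dict = field(default_factory=dict)

    def set_score(self, criterion, score, topic=None):
        # Validation on entry: reject scores outside the assumed 0..5 scale.
        if not 0 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        if topic is None:
            self.overall[criterion] = score
        else:
            self.per_topic.setdefault(topic, {})[criterion] = score

    def score(self, criterion, topic=None):
        # Prefer the topic-specific judgment, fall back to the overall one.
        if topic is not None and criterion in self.per_topic.get(topic, {}):
            return self.per_topic[topic][criterion]
        return self.overall.get(criterion)


rating = SiteRating("http://example.org")
rating.set_score("privacy", 4)
rating.set_score("authority", 5, topic="diabetes")
```

Falling back from the topic score to the overall score is one possible design choice; a real system might instead treat a missing per-topic judgment as unrated.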

Spider

• Automatically collect Web pages from Web sites

– Keys off of sites as they are entered in ratings database

• Continuous loop

– Visit sites

– Index content

– Revisit sites

• Multiple, concurrent spiders on a dedicated machine

– Co-ordination

– Speed
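
The visit, index, revisit loop with several concurrent spiders can be sketched with a shared work queue, which also handles co-ordination: a site is held by only one spider at a time. This is an illustrative sketch only. `run_spiders`, `fetch`, and `index` are hypothetical names, the fetch and index steps are supplied by the caller, and the loop stops after a fixed number of passes so the example terminates, whereas the slides describe a continuous loop.

```python
import queue
import threading


def run_spiders(sites, fetch, index, num_spiders=3, passes=2):
    """Visit and index each site `passes` times using concurrent workers.

    `fetch(site)` and `index(site, pages)` are caller-supplied stand-ins.
    """
    work = queue.Queue()
    for site in sites:
        work.put((site, 1))

    def worker():
        while True:
            item = work.get()
            if item is None:  # sentinel: shut this worker down
                work.task_done()
                return
            site, visit = item
            index(site, fetch(site))  # visit the site, then index its content
            if visit < passes:
                work.put((site, visit + 1))  # schedule the revisit
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_spiders)]
    for t in threads:
        t.start()
    work.join()  # block until every scheduled visit has been processed
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
```

Because a site is re-queued only after its current visit finishes, two spiders never process the same site simultaneously.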

Indexing Documents

• Initially, all documents are indexed together

– Time Bottleneck (4+ days to index)

– Space Bottleneck (resulting file exceeds system limits)

• Revised version

– Each site indexed separately

• Can visit, index in a loop site by site

– But, must select a subset of the collections to process for each query

• Classic distributed information retrieval problem
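
Per-site indexing can be illustrated with one small inverted index per site instead of a single global one. This is a sketch under simplifying assumptions: whitespace tokenization, in-memory dictionaries, and the hypothetical function names `build_site_index` and `index_sites`. The slides do not specify the actual index format.

```python
from collections import defaultdict


def build_site_index(pages):
    """Build an inverted index for one site: term -> {page URL: term count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return dict(index)


def index_sites(site_pages):
    """Index each site separately so that no single index file grows unbounded."""
    return {site: build_site_index(pages) for site, pages in site_pages.items()}


site_pages = {
    "a.org": {"a.org/1": "Kidney disease and kidney care"},
    "b.org": {"b.org/1": "Diabetes care"},
}
indexes = index_sites(site_pages)
```

With this layout a single site can be re-indexed after a revisit without touching the others, at the cost of having to choose which site indexes to consult for each query.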

Retrieval System – Broker

• Given a query and a set of criteria

• Phase I – Broker

– Select those web sites that meet the criteria

• E.g., Privacy, Authority, Navigation

– Select those sites that have the best content from among the first set

• Number of documents with the query words

– Send the query to the top N sites (approx. 10)
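
The broker phase can be sketched as a filter followed by a top-N selection. The sketch assumes each criterion is a yes/no property of a site and counts a document as matching if it contains at least one query word; both are assumptions, as are the name `select_sites` and the data layout (site -> term -> {page URL: count}).

```python
def select_sites(query_terms, required, site_criteria, site_indexes, top_n=10):
    """Phase I broker: filter sites by criteria, then keep the top N by content."""
    # Step 1: keep only sites that satisfy every user-selected criterion.
    eligible = [s for s, met in site_criteria.items() if required <= met]

    # Step 2: score each remaining site by how many of its documents
    # contain at least one query word.
    def matching_docs(site):
        docs = set()
        for term in query_terms:
            docs.update(site_indexes.get(site, {}).get(term, {}))
        return len(docs)

    scored = [(matching_docs(s), s) for s in eligible]
    scored = [(n, s) for n, s in scored if n > 0]
    # Step 3: highest document count first; ties broken by site name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in scored[:top_n]]


site_criteria = {
    "a.org": {"privacy", "authority"},
    "b.org": {"privacy"},
    "c.org": {"privacy", "authority"},
}
site_indexes = {
    "a.org": {"kidney": {"a/1": 1}},
    "b.org": {"kidney": {"b/1": 1, "b/2": 1, "b/3": 1}},
    "c.org": {"kidney": {"c/1": 1, "c/2": 1}},
}
chosen = select_sites(["kidney"], {"privacy", "authority"}, site_criteria, site_indexes)
```

Here `b.org` has the most matching documents but is excluded because it does not meet the Authority criterion.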

Retrieval System – Query Processing

• For each site

– Identify the top documents for the query

• Page weight with respect to query terms

• Site weight with respect to user criteria

• Combine these factors and rank the pages

• Fuse results from all sites

– Merge the lists of pages based on weights

– Rearrange as necessary to provide results from a mix of sites on each page
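
One way to picture the fusion step is to merge all hits by weight and then cap how many hits a single site may contribute to each results page. The slides do not say how the page and site weights are combined or how the rearrangement is done, so the product in `combined_weight`, the per-site cap, and all names here are assumptions made for illustration.

```python
def combined_weight(page_weight, site_weight):
    # Assumption: a simple product of the two factors.
    return page_weight * site_weight


def fuse(results, page_size=10, max_per_site=3):
    """Merge per-site results by weight, limiting each site's share of a page.

    `results` is a list of (site, url, weight) tuples, where weight is the
    already-combined page/site score. Returns a list of result pages.
    """
    remaining = sorted(results, key=lambda r: -r[2])
    pages = []
    while remaining:
        page, counts, deferred = [], {}, []
        for hit in remaining:
            site = hit[0]
            if len(page) < page_size and counts.get(site, 0) < max_per_site:
                page.append(hit)
                counts[site] = counts.get(site, 0) + 1
            else:
                deferred.append(hit)  # pushed to a later results page
        pages.append(page)
        remaining = deferred
    return pages


results = [
    ("a", "a/1", 0.9), ("a", "a/2", 0.8), ("a", "a/3", 0.7),
    ("b", "b/1", 0.6), ("c", "c/1", 0.5),
]
pages = fuse(results, page_size=3, max_per_site=2)
```

In this example the third hit from site `a` is pushed to the second results page so that site `b` appears on the first.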

Partner System

• Allows Vitalseek to be a back-end search engine

• Results appear as though from partner

• Web-based system for

– Entering partners

– Customizing results

– Customizing search criteria

Challenges

• Combining user criteria and keywords

– Initial versions used a weighted combination

– Abandoned in favor of filtering version

• Scalability

– Thousands of sites

– Millions of pages

• Spidering and indexing speed

• System limits

– Priority-based pruning of index files

• High-tech start-up demands, university research lab schedule
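
The difference between the two ways of combining user criteria with keywords can be shown on toy data. This is a hedged illustration, not the project's code: the score fields, the blending parameter `alpha`, and the `threshold` are hypothetical.

```python
def rank_weighted(pages, alpha=0.5):
    """Weighted combination: blend the keyword score with the criteria score."""
    return sorted(
        pages,
        key=lambda p: -(alpha * p["keyword"] + (1 - alpha) * p["criteria"]),
    )


def rank_filtered(pages, threshold=0.5):
    """Filtering: drop pages that fail the criteria, then rank by keywords only."""
    kept = [p for p in pages if p["criteria"] >= threshold]
    return sorted(kept, key=lambda p: -p["keyword"])


pages = [
    {"url": "x", "keyword": 1.0, "criteria": 0.1},
    {"url": "y", "keyword": 0.6, "criteria": 0.9},
]
weighted_order = [p["url"] for p in rank_weighted(pages, alpha=0.8)]
filtered_order = [p["url"] for p in rank_filtered(pages)]
```

With a weighted blend, a page with a strong keyword match can outrank a page from a better-rated site, as `x` does here; with filtering, a page that fails the user's criteria never appears at all.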

Vitalseek.com

Viewpoint filters

Type of Site Filters

Site Filters

Content Filters

Topic Filters

Resource Filters

Query: kidney

Query: kidney in Diabetes

URAC accredited sites only