CHORUS
What is « Search »
A functional view
2008-04-21
Henri Gouraud, WP2



Overall goal

Break down search into essential (necessary) components
Identify issues associated with each component
Facilitate matching of use-cases with the functional overview
For a given use-case, identify “critical” components
– Those for which there is no known solution
– Those for which existing solutions do not perform well enough
Identify use-cases where the model breaks
– Repair/extend model
– Identify potential « new models »
----> Prepare Gap Analysis

This analysis tries to be « Media » independent

Functions are media independent
– Document discovery
– Meta-data extraction
– User Interface
– .....
Techniques necessary to implement each function are media dependent ...
– Text extraction
– Speech to text
– Image signatures
– ....
... and are at varying levels of maturity and performance

Top level vision

Search engines come into play when « direct » search into the document repository fails (volume, performance, ...)

[Figure: block diagram with the components Documents, Indexing, Data-base, Matching, Querying]
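The contrast between « direct » search and an indexed search can be sketched as follows. This is a minimal illustration, not the architecture of any particular engine: the toy repository `DOCUMENTS` and the function names are assumptions introduced here, and whitespace tokenization stands in for real text analysis.

```python
from collections import defaultdict

# Hypothetical toy repository: document id -> text content.
DOCUMENTS = {
    "d1": "search engines index documents",
    "d2": "users query the index",
    "d3": "documents are crawled then indexed",
}

def direct_search(term, documents):
    """Scan every document at query time: cost grows with repository size."""
    return sorted(doc_id for doc_id, text in documents.items()
                  if term in text.split())

def build_index(documents):
    """Indexing step: build a term -> document ids table ahead of time."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def indexed_search(term, index):
    """Matching step: a single lookup in the prebuilt data-base."""
    return sorted(index.get(term, set()))

INDEX = build_index(DOCUMENTS)
```

Both functions return the same answers; the difference is that `direct_search` reads the whole repository for each query, while `indexed_search` pays that cost once, at indexing time.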

At the core: matching

[Figure: block diagram with the components Matching, Data-base, Query-meta-data, Document-meta-data]

Matching happens between two « computer based » chunks of data
– Query-meta-data, derived from the user input (and his context)
– Document-meta-data, derived from the documents being searched

The Matching process

Simple or boolean
– AND, OR, NEAR, Parentheses, Regular expression, ...
Accurate or fuzzy
– Spelling, phonetic, « similar to », ...
Typed
– Author:xx, Title:xx, ...
Centralized/distributed
– Across single LAN, across WAN, peer 2 peer, ...
Issues
– New media types: algorithms
– Performance
  • single query response time
  • query throughput
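The matching modes listed above (boolean, accurate or fuzzy, typed) can be combined in a few lines. The sketch below is only illustrative: the `D_META` records, the helper names and the 0.8 similarity threshold are assumptions made for this example, and the fuzzy test uses the standard library's `difflib.SequenceMatcher` as one possible stand-in for spelling-tolerant matching.

```python
from difflib import SequenceMatcher

# Hypothetical D-meta-data: one record of typed fields per document.
D_META = {
    "d1": {"author": "gouraud", "title": "what is search"},
    "d2": {"author": "smith", "title": "image signatures"},
    "d3": {"author": "gourand", "title": "speech to text search"},
}

def exact(field, value):
    """Typed, accurate match: the word must appear in the named field."""
    return lambda meta: value in meta.get(field, "").split()

def fuzzy(field, value, threshold=0.8):
    """Typed, fuzzy match: some word in the field is similar enough."""
    def test(meta):
        return any(SequenceMatcher(None, value, word).ratio() >= threshold
                   for word in meta.get(field, "").split())
    return test

def AND(*tests):
    """Boolean conjunction of matching tests."""
    return lambda meta: all(t(meta) for t in tests)

def OR(*tests):
    """Boolean disjunction of matching tests."""
    return lambda meta: any(t(meta) for t in tests)

def match(test, d_meta=D_META):
    """Return the ids of all documents whose meta-data passes the test."""
    return sorted(doc_id for doc_id, meta in d_meta.items() if test(meta))
```

For instance, `match(AND(exact("title", "search"), fuzzy("author", "gouraud")))` would select documents whose title contains « search » and whose author is spelled approximately like « gouraud ». Because this sketch scans every record, it says nothing about the response-time and throughput issues listed above.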

The document side

[Figure: block diagram with the components Document, Crawl, Pull, Push, Build, Transform, Content, D-meta-data, Data-base, Matching]

The main issue: the « Transform » step
– Extracting useful information from the documents

The document side

Document discovery
– Pull=crawling, push=OK
– Completeness, freshness
Building the SE data-base
– Scalability, reliability
– Incremental
– Distributed
Transform: elaborating D-meta-data
– Deal with existing meta-data, multi pass process, ...
– Dealing with multiplicity of content type and formats
– For each type, specific meta-data elaboration process
Issues
– Algorithm (for each media type)
– Performance (relates to document repository size and churn rate)
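The document-side steps above (a per-type « Transform », then an incremental build of the data-base) might look like the following sketch. Everything here is hypothetical: the `SearchDatabase` class, the `TRANSFORMS` table and the audio transform, which simply assumes a transcript is already available instead of performing speech to text.

```python
from collections import defaultdict

def transform_text(content):
    """Text documents: the terms are the lower-cased words themselves."""
    return set(content.lower().split())

def transform_audio(content):
    """Placeholder for speech to text: assumes a transcript is supplied."""
    return set(content["transcript"].lower().split())

# One hypothetical transform per media type.
TRANSFORMS = {"text": transform_text, "audio": transform_audio}

class SearchDatabase:
    def __init__(self):
        self.index = defaultdict(set)  # term -> ids of documents containing it
        self.terms_of = {}             # doc id -> terms currently indexed for it

    def push(self, doc_id, media_type, content):
        """Add or refresh one document without rebuilding the whole index."""
        self.remove(doc_id)
        terms = TRANSFORMS[media_type](content)
        self.terms_of[doc_id] = terms
        for term in terms:
            self.index[term].add(doc_id)

    def remove(self, doc_id):
        """Drop a document's D-meta-data (no-op if it was never indexed)."""
        for term in self.terms_of.pop(doc_id, set()):
            self.index[term].discard(doc_id)

    def lookup(self, term):
        return sorted(self.index.get(term, set()))
```

Keeping `terms_of` alongside the index is what makes updates incremental: a changed document only touches its own terms, which matters when the churn rate of the repository is high.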

The user side

[Figure: block diagram with the components User, UI, Query, Transform, Q-meta-data, Matching, Data-base, Organize, Results, Navigation]

The two main issues
– Transforming the user query into Q-meta-data
– Organizing the results into manageable form

The user side

Capturing the « user intent »
– The DWIM (« Do What I Mean ») dream
– Providing useful hints (what is « searchable »?)
Organizing the results
– Assume multiple results, i.e. choice or refinement
Issues
– Algorithm (for each media type)
– Clustering, structuring, summarizing, ...
– User Interface (for each terminal type)
– Performance (under the ½ sec threshold)
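The two user-side steps, transforming the query into Q-meta-data and organizing the results, can be illustrated with a small sketch. The `field:value` syntax follows the « Author:xx » notation used for typed matching; the function names, the result records and the grouping key are assumptions, and grouping by a single attribute is only the simplest possible stand-in for clustering.

```python
from collections import defaultdict

def transform_query(user_input):
    """Turn raw user input into Q-meta-data: typed fields plus free terms."""
    q_meta = {"fields": {}, "terms": []}
    for token in user_input.lower().split():
        field, sep, value = token.partition(":")
        if sep and value:
            q_meta["fields"][field] = value   # e.g. "author:gouraud"
        else:
            q_meta["terms"].append(token)     # untyped search term
    return q_meta

def organize(results, key="media"):
    """Group a flat result list into sets the user can browse or refine."""
    groups = defaultdict(list)
    for result in results:
        groups[result.get(key, "unknown")].append(result["id"])
    return dict(groups)
```

A call such as `transform_query("Author:Gouraud search engines")` separates the typed constraint from the free terms, and `organize` turns a long flat list into a few groups, which is one way to offer the choice or refinement mentioned above.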

The big picture

[Figure: full functional diagram with the components Librarian, User, UI, Query, Transform, Q-meta-data, Matching, Navigation, Organize, Results, Intra-doc navigation, Data-base, Content, D-meta-data, Transform, Build, Crawl, Push, Pull, Document]

The big picture issues

On the document side, acquiring D-meta-data that will speed up the matching process
– Performance trade-off
On the document side, acquiring D-meta-data that will be relevant on the user side
– That will fit « naturally » with the potential user queries
– That will assist in organizing results into « manageable » form

Context, personalization

[Figure: the full functional diagram (Librarian, User, UI, Query, Transform, Q-meta-data, Matching, Navigation, Organize, Results, Intra-doc navigation, Data-base, Content, D-meta-data, Build, Crawl, Push, Pull, Document) with User context and Content context added]

A functional breakdown of a Search Engine (it is much more complex)

[Figure: the full functional diagram (Librarian, User context, Content context, User, UI, Query, Transform, Q-meta-data, Matching, Navigation, Organize, Results, Intra-doc navigation, Data-base, Content, D-meta-data, Build, Crawl, Push, Pull, Document) with Corpora added]

Search vs Alerts

[Figure: the full functional diagram (Librarian, User context, Content context, User, UI, Query, Transform, Q-meta-data, Matching, Navigation, Organize, Results, Intra-doc navigation, Data-base, Content, D-meta-data, Build, Crawl, Push, Pull, Document) with Stored queries added]
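The slide gives only a diagram, so the following is one possible reading of the « Stored queries » box, stated as an assumption: in a search, a new query is matched against documents that are already indexed, whereas in an alert, a new document is matched against queries that are already stored. The `AlertService` class and its all-terms-present rule are hypothetical.

```python
class AlertService:
    """Stored queries are matched against each document as it arrives."""

    def __init__(self):
        self.stored_queries = {}  # user -> set of terms that must all appear

    def subscribe(self, user, query):
        """Store a user's query so future documents can be tested against it."""
        self.stored_queries[user] = set(query.lower().split())

    def on_new_document(self, text):
        """Return the users whose stored query is fully covered by the document."""
        terms = set(text.lower().split())
        return sorted(user for user, query in self.stored_queries.items()
                      if query <= terms)
```

The matching logic is the same as on the search side; only the roles are swapped, with the queries persistent and the documents transient.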

Acting on results

[Figure: the full functional diagram (Librarian, User context, Content context, User, UI, Query, Transform, Q-meta-data, Stored queries, Matching, Navigation, Organize, Results, Intra-doc navigation, Data-base, Content, D-meta-data, Build, Crawl, Push, Pull, Document) with Act and User as a “librarian” added]

Some global cross-functional issues

IP, access rights, usage rights, Security, privacy, …
Business model
Architecture, APIs, standards, …
Software engineering
Scalability

The Research triangle for Search Engines

[Figure: the full functional diagram with the components Librarian, User context, Content context, User, UI, Query, Transform, Q-meta-data, Matching, Navigation, Organize, Results, Intra-doc navigation, Data-base, Content, D-meta-data, Build, Crawl, Push, Pull, Document]

Next steps

Quantify limits associated with each functional component
– Main driving parameter (size/churn, user population, media type, ...)
– Influence on other functional components
--> Identify main use-case typology terms
Compare/describe research and industry use-cases according to the proposed functional description
– Prepare for gap analysis
– Identify expected functional level progress
– Identify « mismatch » cases, alternative/complementary models
