CHORUS: What is « Search »? A functional view

2008-04-21

Henri Gouraud, WP2

Overall goal

Break down search into essential (necessary) components

Identify issues associated with each component

Facilitate matching of use-cases with functional overview

For a given use-case, identify “critical” components
– Those for which there is no known solution
– Those for which existing solutions are not performing

Identify use-cases where the model breaks
– Repair/extend model
– Identify potential « new models »

----> Prepare Gap Analysis

This analysis tries to be « Media » independent

Functions are media independent
– Document discovery
– Meta-data extraction
– User Interface
– ...

Techniques necessary to implement each function are media dependent ...
– Text extraction
– Speech to text
– Image signatures
– ...

... and are at varying levels of maturity and performance

Top level vision

Search engines come into play when « direct » search into the document repository fails (volume, performance, ...)

[Diagram: Documents, Indexing, Data-base, Matching, Querying]

At the core: matching

[Diagram: Query-meta-data, Matching, Document-meta-data, Data-base]

Matching happens between two « computer based » chunks of data
– Query-meta-data, derived from the user input (and his context)
– Document-meta-data, derived from the documents being searched

The Matching process

Simple or boolean
– AND, OR, NEAR, Parentheses, Regular expression, ...

Accurate or fuzzy
– Spelling, phonetic, « similar to », ...

Typed
– Author:xx, Title:xx, ...

Centralized/distributed
– Across single LAN, across WAN, peer 2 peer, ...

Issues
– New media types: algorithms
– Performance
  • single query response time
  • query throughput
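The matching styles listed above (boolean, accurate or fuzzy, typed) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DOCS` sample data, the helper names `exact`, `fuzzy`, `all_of`, `any_of` and `match`, and the use of `difflib.SequenceMatcher` as a stand-in for « similar to » matching. It does not describe how any particular engine implements matching.

```python
from difflib import SequenceMatcher

# Hypothetical document meta-data: one dict of typed fields per document.
DOCS = {
    "d1": {"author": "gouraud", "title": "what is search a functional view"},
    "d2": {"author": "smith", "title": "speech to text for video archives"},
    "d3": {"author": "smyth", "title": "image signatures and search"},
}


def exact(field, term):
    """Accurate, typed test: `term` must be a whole word of `field`."""
    return lambda meta: term in meta.get(field, "").split()


def fuzzy(field, term, threshold=0.75):
    """Fuzzy, typed test: some word of `field` is similar enough to `term`."""
    def test(meta):
        words = meta.get(field, "").split()
        return any(
            SequenceMatcher(None, term, word).ratio() >= threshold
            for word in words
        )
    return test


def all_of(*tests):
    """Boolean AND of several tests."""
    return lambda meta: all(t(meta) for t in tests)


def any_of(*tests):
    """Boolean OR of several tests."""
    return lambda meta: any(t(meta) for t in tests)


def match(test, docs=DOCS):
    """Return the sorted ids of the documents whose meta-data satisfy `test`."""
    return sorted(doc_id for doc_id, meta in docs.items() if test(meta))


print(match(all_of(exact("title", "search"), exact("author", "gouraud"))))
print(match(fuzzy("author", "smith")))
```

This sketch scans every document for every query, which is exactly where the two performance issues above (single query response time, query throughput) would show up on a large repository.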

The document side

[Diagram: Document, Content, Transform, D-meta-data, Data-base, Matching]

The main issue: the « Transform » step
– Extracting useful information from the documents

The document side

Document discovery
– Pull=crawling, push=OK
– Completeness, freshness

Building the SE data-base
– Scalability, reliability
– Incremental
– Distributed

Transform: elaborating D-meta-data
– Deal with existing meta-data, multi pass process, ...
– Dealing with multiplicity of content type and formats
– For each type, specific meta-data elaboration process

Issues
– Algorithm (for each media type)
– Performance (relates to document repository size and churn rate)
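As a rough illustration of the points above, the following Python sketch pairs a per-media-type « Transform » step with a data-base that is updated incrementally. It is a sketch under stated assumptions, not a description of a real engine: the `Index` class, the `TRANSFORMS` table and both transform functions are hypothetical, and the "speech" transform simply assumes that a transcript is already available.

```python
from collections import defaultdict


def text_to_terms(content):
    """Text documents: the lower-cased words serve as D-meta-data."""
    return set(content.lower().split())


def transcript_to_terms(content):
    """Speech stand-in: assumes `content` is an existing transcript."""
    return set(content.lower().split())


# One meta-data elaboration process per content type (hypothetical table).
TRANSFORMS = {"text": text_to_terms, "speech": transcript_to_terms}


class Index:
    """A tiny inverted index that supports incremental updates."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents
        self.terms_of = {}                # document id -> its terms

    def remove(self, doc_id):
        """Forget a document, for instance before re-indexing it."""
        for term in self.terms_of.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def add(self, doc_id, media_type, content):
        """Add a new document or refresh a changed one."""
        self.remove(doc_id)
        terms = TRANSFORMS[media_type](content)
        self.terms_of[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def lookup(self, term):
        """Return the sorted ids of the documents that contain `term`."""
        return sorted(self.postings.get(term, set()))


index = Index()
index.add("a", "text", "Search engines")
index.add("b", "speech", "search by voice")
print(index.lookup("search"))
```

Re-adding a document under the same id replaces its previous entries, so a high churn rate translates directly into update work on the data-base.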

The user side

[Diagram: UI, Query, Transform, Q-meta-data, Matching, Data-base, Organize, Results, Navigation]

The two main issues
– Transforming the user query into Q-meta-data
– Organizing the results into manageable form

The user side

Capturing the « user intent »
– The DWIM dream
– Providing useful hints (what is « searchable »?)

Organizing the results
– Assume multiple results, i.e. choice or refinement

Issues
– Algorithm (for each media type)
– Clustering, structuring, summarizing, ...
– User Interface (for each terminal type)
– Performance (under the ½ sec threshold)
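The two user-side steps can be sketched in a few lines of Python: turning the text a user types into Q-meta-data, and organizing results into groups the user can choose from or refine. The query syntax (`field:value` tokens mixed with free-text terms), the shape of the result records and the function names `parse_query` and `organize` are all assumptions made for this example.

```python
from collections import defaultdict


def parse_query(text):
    """Turn a query such as 'author:gouraud functional view' into Q-meta-data.

    Tokens of the form field:value become typed constraints; every other
    token is kept as a free-text term.
    """
    query = {"fields": {}, "terms": []}
    for token in text.lower().split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            query["fields"][field] = value
        else:
            query["terms"].append(token)
    return query


def organize(results, key="media"):
    """Group result records by one attribute to support refinement."""
    groups = defaultdict(list)
    for record in results:
        groups[record.get(key, "unknown")].append(record["id"])
    return dict(groups)


print(parse_query("Author:Gouraud functional view"))
print(organize([
    {"id": "d1", "media": "text"},
    {"id": "d2", "media": "video"},
    {"id": "d3", "media": "text"},
]))
```

Grouping by a single attribute is the simplest possible form of « organizing »; the clustering and summarizing mentioned above would replace `organize` with something far richer.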

The big picture

[Diagram: Document, Content, Transform, D-meta-data, Data-base, Matching, Q-meta-data, Transform, Query, Organize, Results, Navigation, Intra-doc navigation, Librarian]

The big picture issues

On the document side, acquiring D-meta-data that will speed up the matching process
– Performance trade-off

On the document side, acquiring D-meta-data that will be relevant on the user side
– That will fit « naturally » with the potential user queries
– That will assist in organizing results into « manageable » form

Context, personalization

[Diagram: User context, Content context, Document, Content, Transform, D-meta-data, Data-base, Matching, Q-meta-data, Transform, Query, Organize, Results, Navigation, Librarian]

A Functional breakdown of Search Engine (it is much more complex)

[Diagram: Corpora, Document, Content, Transform, D-meta-data, Data-base, Matching, Q-meta-data, Transform, UI, Organize, Results, Navigation, Librarian]

Search vs Alerts

[Diagram: Document, Content, Transform, D-meta-data, Data-base, Matching, Q-meta-data, Transform, Query, Stored queries, Organize, Results, Navigation, Librarian]
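The slide itself only adds a « Stored queries » box to the diagram. One possible reading, offered here as an assumption rather than something the slide states, is that an alert reverses the usual direction: the queries are kept, and each newly discovered document is matched against them. The Python sketch below illustrates that reading; the function name `matching_alerts` and the representation of a stored query as a list of required terms are hypothetical.

```python
def matching_alerts(stored_queries, doc_terms):
    """Return the names of the stored queries satisfied by a new document.

    A stored query is assumed to be a list of terms that must all occur in
    the D-meta-data (`doc_terms`) of the incoming document.
    """
    doc_terms = set(doc_terms)
    return sorted(
        name
        for name, terms in stored_queries.items()
        if set(terms) <= doc_terms
    )


stored = {
    "video-search": ["video", "search"],
    "speech": ["speech"],
}
print(matching_alerts(stored, ["new", "video", "search", "engine"]))
```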

Acting on results

[Diagram: Document, Content, Transform, D-meta-data, Data-base, Matching, Q-meta-data, Transform, UI, Query, Stored queries, Organize, Results, Navigation, User as a “librarian”]

Some global cross-functional issues

IP, access rights, usage rights, Security, privacy, …

Business model

Architecture, APIs, standards, …

Software engineering

Scalability

The Research triangle for Search Engines

[Diagram: Document, Content, Transform, D-meta-data, Data-base, Matching, Q-meta-data, Transform, UI, Organize, Results, Navigation, Librarian]

Next steps

Quantify limits associated with each functional component
– Main driving parameter (size/churn, user population, media type, ...)
– Influence on other functional components

--> Identify main use-case typology terms

Compare/describe research and industry use-cases according to the proposed functional description
– Prepare for gap analysis
– Identify expected functional level progress
– Identify « mismatch » cases, alternative/complementary models
