CHORUS What is « Search »
A functional view
-------------------------
2008-04-21
Henri Gouraud, WP2

Overall goal
Break down search into essential (necessary) components
Identify issues associated with each component
Facilitate matching of use-cases with the functional overview
For a given use-case, identify “critical” components
  – Those for which there is no known solution
  – Those for which existing solutions do not perform adequately
Identify use-cases where the model breaks
  – Repair/extend model
  – Identify potential « new models »
----> Prepare Gap Analysis

This analysis tries to be « Media » independent
Functions are media independent
  – Document discovery
  – Meta-data extraction
  – User Interface
  – .....
Techniques necessary to implement each function are media dependent ...
  – Text extraction
  – Speech to text
  – Image signatures
  – ....
... and are at varying levels of maturity and performance

Top level vision
Search engines come into play when « direct » search into the document repository fails (volume, performance, ...)
[Diagram components: Documents, Indexing, Data-base, Querying, Matching]
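
The contrast between « direct » search and an engine built on an index can be sketched as follows. This is a minimal illustration, assuming a toy in-memory corpus and whole-word lookup; the names `direct_search`, `build_index` and `indexed_search` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Toy corpus standing in for the document repository (illustrative data only).
DOCUMENTS = {
    1: "speech to text for broadcast news",
    2: "image signatures for photo search",
    3: "text extraction from scanned news pages",
}

def direct_search(term, documents):
    """Scan every document at query time: cost grows with repository size."""
    return sorted(doc_id for doc_id, text in documents.items()
                  if term in text.split())

def build_index(documents):
    """Indexing step: map each term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index

def indexed_search(term, index):
    """Matching step: one lookup in the pre-built data-base."""
    return sorted(index.get(term, set()))

INDEX = build_index(DOCUMENTS)
print(direct_search("news", DOCUMENTS))
print(indexed_search("news", INDEX))
```

Both calls return the same document ids; the difference is that the indexed version pays the scanning cost once, at indexing time, instead of on every query.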

At the core: matching
[Diagram components: Query-meta-data, Matching, Document-meta-data, Data-base]
Matching happens between two « computer based » chunks of data
  – Query-meta-data, derived from the user input (and the user's context)
  – Document-meta-data, derived from the documents being searched
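
One way to picture the two chunks of data is as a pair of small records and a function that compares them. The sketch below is an assumption-laden illustration: the class names, the bag-of-words representation and the "all query terms must be present" rule are hypothetical choices, not something the slides prescribe.

```python
from dataclasses import dataclass, field

@dataclass
class QueryMetaData:
    """Machine-usable form of the user input and the user's context."""
    terms: set
    context: dict = field(default_factory=dict)

@dataclass
class DocumentMetaData:
    """Machine-usable form of one document."""
    doc_id: str
    terms: set

def derive_query_meta_data(user_input, context=None):
    # Assumption: lower-cased whitespace tokens are enough for this sketch.
    return QueryMetaData(terms=set(user_input.lower().split()),
                         context=context or {})

def derive_document_meta_data(doc_id, text):
    return DocumentMetaData(doc_id=doc_id, terms=set(text.lower().split()))

def match(query, database):
    """Return ids of documents whose meta-data contains every query term."""
    return [d.doc_id for d in database if query.terms <= d.terms]

DATABASE = [
    derive_document_meta_data("doc-a", "Functional view of search engines"),
    derive_document_meta_data("doc-b", "Image search by signature"),
]
query = derive_query_meta_data("search engines", context={"language": "en"})
print(match(query, DATABASE))
```

The matcher never sees the raw user input or the raw documents, only the derived meta-data on each side.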

The Matching process
Simple or boolean
  – AND, OR, NEAR, parentheses, regular expressions, ...
Accurate or fuzzy
  – Spelling, phonetic, « similar to », ...
Typed
  – Author:xx, Title:xx, ...
Centralized/distributed
  – Across a single LAN, across a WAN, peer-to-peer, ...
Issues
  – New media types: algorithms
  – Performance
    • single query response time
    • query throughput
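
The boolean, fuzzy and typed variants listed above can be illustrated on a toy text corpus. This is only a sketch: the field names, the helper functions and the use of Python's `difflib` for « similar to » matching are assumptions made for the example, and NEAR, regular expressions and distributed matching are left out.

```python
import difflib

# Hypothetical D-meta-data: one dict of typed text fields per document.
DOCS = {
    "d1": {"author": "gouraud", "title": "what is search",
           "body": "matching indexing querying"},
    "d2": {"author": "smith", "title": "image retrieval",
           "body": "image signatures and matching"},
}

def terms(doc, field_name=None):
    """Terms of one field (typed match) or of all fields (untyped match)."""
    texts = [doc[field_name]] if field_name else list(doc.values())
    return {term for text in texts for term in text.split()}

def match_and(doc, *words):
    return all(word in terms(doc) for word in words)

def match_or(doc, *words):
    return any(word in terms(doc) for word in words)

def match_typed(doc, field_name, word):
    return word in terms(doc, field_name)

def match_fuzzy(doc, word, cutoff=0.8):
    # difflib scores string similarity; the cutoff value is an arbitrary choice.
    candidates = sorted(terms(doc))
    return bool(difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff))

def search(predicate):
    return sorted(doc_id for doc_id, doc in DOCS.items() if predicate(doc))

print(search(lambda d: match_and(d, "matching", "indexing")))
print(search(lambda d: match_or(d, "indexing", "signatures")))
print(search(lambda d: match_typed(d, "title", "image")))
print(search(lambda d: match_fuzzy(d, "indexng")))  # misspelled on purpose
```

Every predicate here scans all documents, which is exactly the response-time and throughput issue the slide raises for larger repositories.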

The document side
[Diagram components: Document, Crawl, Pull, Push, Content, Transform, D-meta-data, Build, Data-base, Matching]
The main issue: the « Transform » step
  – Extracting useful information from the documents

The document side
Document discovery
  – Pull=crawling, push=OK
  – Completeness, freshness
Building the SE data-base
  – Scalability, reliability
  – Incremental
  – Distributed
Transform: elaborating D-meta-data
  – Deal with existing meta-data, multi pass process, ...
  – Dealing with multiplicity of content types and formats
  – For each type, specific meta-data elaboration process
Issues
  – Algorithm (for each media type)
  – Performance (relates to document repository size and churn rate)
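
The discovery, transform and build steps above can be sketched as a small pipeline. Everything named here is hypothetical: the content-type keys, the `TRANSFORMS` registry, the `SearchDatabase` class and the idea that audio arrives with an existing transcript are assumptions chosen to keep the example short, not a description of any real engine.

```python
from collections import defaultdict

def transform_text(payload):
    """Text: whitespace tokens stand in for real text extraction."""
    return set(payload.lower().split())

def transform_audio(payload):
    """Audio: reuse existing meta-data; a real system would run speech-to-text."""
    return set(payload.get("transcript", "").lower().split())

# One meta-data elaboration process per content type.
TRANSFORMS = {"text/plain": transform_text, "audio/wav": transform_audio}

class SearchDatabase:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> document ids
        self.doc_terms = {}               # document id -> terms, for updates

    def remove(self, doc_id):
        for term in self.doc_terms.pop(doc_id, set()):
            self.postings[term].discard(doc_id)

    def upsert(self, doc_id, content_type, payload):
        """Push entry point; incremental, so only this document is re-indexed."""
        transform = TRANSFORMS[content_type]
        self.remove(doc_id)
        terms = transform(payload)
        self.doc_terms[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def lookup(self, term):
        return sorted(self.postings.get(term, set()))

def crawl(repository, database):
    """Pull: the engine walks the repository and indexes what it finds."""
    for doc_id, (content_type, payload) in repository.items():
        database.upsert(doc_id, content_type, payload)

db = SearchDatabase()
repository = {
    "memo": ("text/plain", "Gap analysis draft"),
    "talk": ("audio/wav", {"transcript": "search gap overview"}),
}
crawl(repository, db)
db.upsert("memo", "text/plain", "Final analysis")  # the memo changed
print(db.lookup("gap"), db.lookup("analysis"), db.lookup("draft"))
```

Keeping the per-document term sets is what makes the update incremental: stale postings for a changed document can be removed without rebuilding the whole data-base.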

The user side
[Diagram components: User, UI, Query, Transform, Q-meta-data, Matching, Data-base, Organize, Results, Navigation]
The two main issues
  – Transforming the user query into Q-meta-data
  – Organizing the results into manageable form

The user side
Capturing the « user intent »
  – The DWIM dream
  – Providing useful hints (what is « searchable »?)
Organizing the results
  – Assume multiple results, i.e. choice or refinement
Issues
  – Algorithm (for each media type)
  – Clustering, structuring, summarizing, ...
  – User Interface (for each terminal type)
  – Performance (under the ½ sec threshold)
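
The two user-side transformations, query to Q-meta-data and flat results to an organized view, can be sketched as below. The `field:value` query syntax, the function names and the grouping-by-one-field strategy are assumptions made for illustration; real clustering or summarizing would be considerably more involved.

```python
from collections import defaultdict

def transform_query(raw_query):
    """Split user input into typed terms (field:value) and free terms."""
    q_meta_data = {"typed": {}, "free": []}
    for token in raw_query.lower().split():
        field_name, separator, value = token.partition(":")
        if separator and field_name and value:
            q_meta_data["typed"].setdefault(field_name, []).append(value)
        else:
            q_meta_data["free"].append(token)
    return q_meta_data

def organize(results, key):
    """Group a flat result list by one meta-data field (a crude clustering)."""
    groups = defaultdict(list)
    for result in results:
        groups[result.get(key, "unknown")].append(result["title"])
    return dict(groups)

# Hypothetical results as they might come back from the matching step.
RESULTS = [
    {"title": "Slides", "media": "text"},
    {"title": "Talk", "media": "audio"},
    {"title": "Report", "media": "text"},
]

print(transform_query("Author:Gouraud functional view"))
print(organize(RESULTS, "media"))
```

Grouping only works if the document side produced the field being grouped on, which is why the choice of D-meta-data matters for the user side as well.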

The big picture
[Diagram components: User, UI, Query, Transform, Q-meta-data, Matching, Data-base, Content, D-meta-data, Transform, Build, Crawl, Pull, Push, Document, Organize, Results, UI, Navigation, Intra-doc navigation, Librarian]

The big picture issues
On the document side, acquiring D-meta-data that will speed up the matching process
  – Performance trade-off
On the document side, acquiring D-meta-data that will be relevant on the user side
  – That will fit « naturally » with the potential user queries
  – That will assist in organizing results into « manageable » form

Context, personalization
[Diagram: the big-picture components, with User context and Content context added]

A functional breakdown of a Search Engine (it is much more complex)
[Diagram: the big-picture components, with User context, Content context and Corpora added]

Search vs Alerts
[Diagram: the big-picture components, with User context, Content context and Stored queries added]
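
The « Stored queries » box suggests that an alert inverts the usual flow: the query is kept and each newly arriving document is matched against it. That reading is an interpretation of the diagram, and the sketch below is hypothetical throughout (the `AlertService` class, its method names and the all-terms-present rule are assumptions).

```python
class AlertService:
    def __init__(self):
        self.stored_queries = {}  # alert name -> set of required terms

    def subscribe(self, name, query_text):
        """Store a query instead of running it once."""
        self.stored_queries[name] = set(query_text.lower().split())

    def on_new_document(self, text):
        """Return the names of the stored queries that the new document satisfies."""
        doc_terms = set(text.lower().split())
        return sorted(name for name, wanted in self.stored_queries.items()
                      if wanted <= doc_terms)

alerts = AlertService()
alerts.subscribe("speech", "speech recognition")
alerts.subscribe("image", "image search")
print(alerts.on_new_document("New results in speech recognition benchmarks"))
```

In search the documents are stored and the query is transient; in this sketch of alerts the roles are swapped, while the matching rule itself stays the same.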

Acting on results
[Diagram: the big-picture components, with User context, Content context, Stored queries and Act added; annotation: User as a “librarian”]

Some global cross-functional issues
IP, access rights, usage rights
Security, privacy, …
Business model
Architecture, APIs, standards, …
Software engineering
Scalability

The Research triangle for Search Engines
[Diagram: the big-picture components, with User context and Content context added]

Next steps
Quantify limits associated with each functional component
  – Main driving parameter (size/churn, user population, media type, ...)
  – Influence on other functional components
--> Identify main use-case typology terms
Compare/describe research and industry use-cases according to the proposed functional description
  – Prepare for gap analysis
  – Identify expected functional level progress
  – Identify « mismatch » cases, alternative/complementary models