Posted on 17-Jan-2016
INTEGRATION OF THE TEXTUAL DATA FOR INFORMATION RETRIEVAL: RE-USE THE LINGUISTIC INFORMATION OF VICINITY
Omar LAROUK
ELICO - ENSSIB
University of Lyon, France
Computational linguistics & Information Science
École Nationale Supérieure des Sciences de l’Information et des Bibliothèques
ENSSIB-Lyon, Université de Lyon, France
FRAMEWORK PRESENTATION
- IMPORTANT PRODUCTION OF TEXTS (WEB)
- IMPORTANT RETRIEVAL OF TEXTS BY USERS
INFORMATION
• AIM OF THE PROJECT: DEVELOP A SYSTEM THAT CAN:
- AUTOMATE THE MANUAL PROCESS OF INDEXING
- STRUCTURE THE INFORMATION
- MAKE IT POSSIBLE TO OBTAIN 'pertinent' INFORMATION
FRAMEWORK PRESENTATION
METHOD:
Building filtering databases from information extracted from the textual data
– Use of linguistic techniques for processing documents (textual data).
– Search strategies that allow the user to exploit the documents.
Why use linguistic reference in information retrieval?
• We consider that searching for information relies on a presupposition of the speaker in a context: at indexing time (when the document is created), it is as if the designer uttered a contextual sentence.
• The absence of the presupposed words from the answer returned by the server (search engine) can be interpreted as this sentence not being true in that indexing context.
Information Retrieval on the WEB: co-operative process for reformulation
[Figure: user(s) send queries Q0, Q1, ..., Qk, ..., Qn to the WEB through a search engine (ex: Google, Yahoo) and receive answers A0, A1, ..., Ak, ..., An, each displayed as title, abstract, ...]
• A search engine is used to retrieve electronic documents on the Web, but it does not use linguistic analysis. When a user types a query:
– the engine returns a ranked list of documents that contain exactly all the words of the request (query processing).
– to look for documents, the user must write a query of a few words, and the search engine then returns the URLs of ranked documents containing exactly those words.
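The exact-match behaviour described above can be sketched as a conjunctive (AND) lookup with no linguistic processing. This is a minimal illustration, not how Google or Yahoo actually work: the names build_index and search, the whitespace tokenisation and the occurrence-count ranking are all assumptions made for the example.

```python
from collections import Counter


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Map each document id to the counts of its lowercased, space-separated words."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}


def search(index: dict[str, Counter], query: str) -> list[str]:
    """Return the ids of documents containing every query word exactly as typed.

    No stemming, lemmatisation or synonym handling is applied, so 'laws'
    does not match 'law'. Documents are ranked by the total number of
    occurrences of the query words (a crude, hypothetical stand-in for a
    real ranking function).
    """
    terms = query.lower().split()
    hits = []
    for doc_id, counts in index.items():
        # Counter returns 0 for absent words, so this is a conjunctive (AND) match.
        if all(counts[t] > 0 for t in terms):
            hits.append((sum(counts[t] for t in terms), doc_id))
    # Highest score first; ties broken by document id for a stable order.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


docs = {
    "d1": "project of law on the white flag",
    "d2": "the law project amends the law",
    "d3": "projects of laws",
}
index = build_index(docs)
print(search(index, "law project"))
```

Here d3 is never returned for "law project" because it only contains the inflected forms "projects" and "laws", which is exactly the kind of gap a linguistic approach is meant to close.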
Linguistic problem of search engines:
[Figure: user queries and system answers over steps s0, s1, ..., sk, ..., sn-1, sn. At each step sk the user's query Qk receives the system answer Ak (Q0 and A0, Q1 and A1, ..., Qn-1 and An-1, Qn and An); An is the final filtered answer, oriented to the user's needs.]
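The step-by-step exchange in the figure can be sketched as a loop in which each reformulated query Qk yields an answer set Ak. This is a simplified sketch under stated assumptions: answers are computed by plain word inclusion, and the stopping rule (a hypothetical max_answers threshold) stands in for the user deciding that the answers meet their needs; the presentation does not specify either.

```python
def answer(docs: dict[str, str], query: str) -> set[str]:
    """A_k: ids of documents whose words include every word of the query Q_k."""
    terms = set(query.lower().split())
    return {doc_id for doc_id, text in docs.items() if terms <= set(text.lower().split())}


def reformulate(
    docs: dict[str, str], queries: list[str], max_answers: int = 1
) -> list[tuple[str, set[str]]]:
    """Submit Q0, Q1, ... in turn (steps s0, s1, ...) and record each answer set.

    Stops at the first non-empty answer set with at most `max_answers`
    documents; this threshold is a hypothetical stand-in for the user
    judging that the answers are satisfactory.
    """
    trace = []
    for query in queries:
        a_k = answer(docs, query)
        trace.append((query, a_k))
        if 0 < len(a_k) <= max_answers:
            break
    return trace


docs = {"d1": "white flag", "d2": "white and black flag", "d3": "black station"}
trace = reformulate(docs, ["flag", "white flag", "white black flag"])
for step, (q, a) in enumerate(trace):
    print(f"s{step}: Q{step}={q!r} -> A{step}={sorted(a)}")
```

Each reformulation adds a predicate, so the answer sets shrink until only the document matching the user's oriented need remains.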
Documentary information system
• Documentary automation is held back by the problems of indexing and querying.
• Use natural language as textual data for automatic indexing, because language carries contextual and temporal factors through connectors, verbs, adverbs, etc.
• We use the LINGUISTIC APPROACH
Queries: simple or complex predicates
• Questions: simple predicates: /project/, /law/, /station/, /flag/, /black/, /white/, ...
• Combining predicates: /white flag/, /white and black/, /.../
• Propagation of the predicates to the left and/or right: /... project of law .../; /... flag white and black .../; ...
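Propagating a predicate to the left and/or right can be sketched as extracting a word window around each occurrence of a simple or combined predicate. This is only an illustration, assuming whitespace tokenisation and fixed window sizes; the function name propagate and its left/right parameters are hypothetical, not part of the presented system.

```python
def propagate(tokens: list[str], predicate: str, left: int = 1, right: int = 1) -> list[str]:
    """Extend every occurrence of `predicate` (one or more words) by up to
    `left` words on its left and `right` words on its right."""
    pred = predicate.lower().split()
    words = [t.lower() for t in tokens]
    n = len(pred)
    expressions = []
    for i in range(len(words) - n + 1):
        if words[i:i + n] == pred:
            start = max(0, i - left)               # clip at the start of the text
            end = min(len(words), i + n + right)   # clip at the end of the text
            expressions.append(" ".join(words[start:end]))
    return expressions


text = "the new project of law on the white flag".split()
print(propagate(text, "project", left=0, right=2))
print(propagate(text, "white flag", left=1, right=0))
print(propagate("a flag white and black was raised".split(), "flag", left=0, right=3))
```

The simple predicate /project/ propagated two words to the right gives "project of law", and the combined predicate /white flag/ propagated one word to the left gives "the white flag".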
LINGUISTIC INFORMATION OF VICINITY: Hierarchic informational
[Figure: an NP or linguistic expression is structured as left_predicate + Predicate + predicate_right. Example: "Les produits informatiques de la société" in French; "computer products company" in English.]
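The left_predicate / Predicate / predicate_right structure can be represented as a small record. This is a hypothetical sketch: it assumes the core predicate is already known and simply splits the expression around it; how the presented approach actually identifies the core predicate is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Vicinity:
    """left_predicate + Predicate + predicate_right, as in the diagram."""
    left_predicate: list[str]
    predicate: str
    predicate_right: list[str]


def split_around(expression: str, core: str) -> Vicinity:
    """Split an expression around its core predicate (given, not detected)."""
    words = expression.split()
    i = words.index(core)  # raises ValueError if the core word is absent
    return Vicinity(words[:i], words[i], words[i + 1:])


print(split_around("computer products company", "products"))
print(split_around("Les produits informatiques de la société", "produits"))
```

The two examples show why the vicinity differs by language: in English the modifier "computer" sits to the left of the core predicate, while in French "informatiques" sits to its right.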
Reformulation of questions
We propose to process natural language by reformulating queries within a human-computer dialogue.
Reformulating natural-language questions is a classical technique for information retrieval in databases, but newer question-answering systems must deal with the very open content of the Web.
We present a study based on formulating 'key questions' from 'core predicates', enriched to the left or right of the central term.
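One simple way to picture left/right enrichment of a core predicate is to count the words that occur immediately beside it in a corpus and turn the most frequent ones into candidate key questions. This frequency heuristic, and the names neighbours and key_questions, are assumptions for illustration; the study itself may select enrichments differently.

```python
from collections import Counter


def neighbours(corpus: list[str], core: str, top: int = 3):
    """Most frequent words found immediately left and right of `core`."""
    left, right = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            if word == core:
                if i > 0:
                    left[words[i - 1]] += 1
                if i + 1 < len(words):
                    right[words[i + 1]] += 1
    return left.most_common(top), right.most_common(top)


def key_questions(corpus: list[str], core: str, top: int = 3) -> list[str]:
    """Candidate enriched queries: the core predicate plus one left or right neighbour."""
    left, right = neighbours(corpus, core, top)
    return [f"{w} {core}" for w, _ in left] + [f"{core} {w}" for w, _ in right]


corpus = [
    "the law project was voted",
    "a new law project",
    "the project of law",
    "the project of reform",
]
print(key_questions(corpus, "project"))
```

No stop-word filtering is done, so a weak candidate such as "the project" appears alongside "law project"; a practical version would filter function words before ranking.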
How does the search engine respond?
• For example, a search for 'informati*' does not return documents about computerization (informatisation in French, ….).
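The example above can be reproduced with plain truncation matching: a trailing * matches any suffix, so 'informati*' reaches French forms but never the English "computerization". The sketch below adds a small equivalence table to show how linguistic knowledge could close that gap; the EQUIVALENTS table is hypothetical and only for illustration.

```python
def wildcard_match(pattern: str, word: str) -> bool:
    """Truncation as in 'informati*': a trailing * matches any suffix."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


# Hypothetical French-to-English equivalence table, for illustration only.
EQUIVALENTS = {
    "informatisation": {"computerization", "computerisation"},
    "informatique": {"computing"},
}


def expand(pattern: str, vocabulary: list[str]) -> set[str]:
    """Words matched by the truncated pattern, plus their listed equivalents."""
    matched = {w for w in vocabulary if wildcard_match(pattern, w)}
    expanded = set(matched)
    for w in matched:
        expanded |= EQUIVALENTS.get(w, set())
    return expanded


vocabulary = ["informatique", "informatisation", "information", "computerization"]
print(sorted(w for w in vocabulary if wildcard_match("informati*", w)))
print(sorted(expand("informati*", vocabulary)))
```

Truncation alone also matches "information", which is not a synonym of computerization, so prefix matching both misses relevant words and admits unrelated ones.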
Lack of technical language in search engines
In the following areas, search tools and web databases are not equipped with linguistic tools:
- Information retrieval in structured documents
- Multilingual information retrieval
- Personalized information retrieval
- Automatic natural language processing
- Ontology-based information search
- Multimedia information retrieval
- Question answering
- etc.