INTEGRATION OF THE TEXTUAL DATA FOR INFORMATION RETRIEVAL: RE-USE THE LINGUISTIC INFORMATION OF VICINITY. Omar LAROUK, ELICO - ENSSIB, University of Lyon, France. Computational linguistics & Information Science. École Nationale Supérieure des Sciences de l’Information et des Bibliothèques, ENSSIB-Lyon, Université de Lyon, France



INTEGRATION OF THE TEXTUAL DATA FOR INFORMATION RETRIEVAL: RE-USE THE LINGUISTIC INFORMATION OF VICINITY

Omar LAROUK

ELICO - ENSSIB

University of Lyon-France

Computational linguistics & Information Science

École Nationale Supérieure des Sciences de l’Information et des Bibliothèques

ENSSIB-Lyon, Université de Lyon, France


FRAMEWORK PRESENTATION

- LARGE PRODUCTION OF TEXTS (web)
- LARGE RETRIEVAL OF TEXTS BY USERS

INFORMATION

• AIM OF PROJECT: LOOK FOR A SYSTEM THAT CAN:

- AUTOMATE THE MANUAL PROCESS OF INDEXING
- STRUCTURE THE INFORMATION
- RETURN ‘pertinent’ (relevant) INFORMATION


FRAMEWORK PRESENTATION

METHOD:

Elaboration of filtering DATABASES from information extracted from the textual data

– Use of linguistic techniques for processing documents (textual data).

– Search strategies that allow the user to exploit documents.


Why a linguistic reference in information retrieval?

• We think that looking for information rests on a presupposition made by the speaker, set first in the context in which the document was indexed (at creation time), as if the designer had pronounced a contextual sentence.

• The absence of the presupposed words from the answer given by the server (search engine) can be interpreted as that sentence failing to be true in this indexing context.


Information Retrieval on the WEB: co-operative process for reformulation

[Diagram: user(s) send queries Q0, Q1, ..., Qk, ..., Qn to the WEB (ex: Google, Yahoo) and receive answers A0, A1, ..., Ak, ..., An. Further labels: Title, Abstract, ...]


Linguistic problem of Search Engines:

• A search engine is used to retrieve electronic documents on the Web, but it does not use linguistic analysis. When a user types a query:

– the engine returns a ranked list of documents that contain exactly all the words of the request (query processing).

– looking for documents requires the user to write a query of a few words, so that the search engine returns the URLs of ranked documents containing exactly those words.
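The exact-word behaviour described on this slide can be illustrated with a minimal sketch. It assumes a toy in-memory collection and a plain inverted index; the names `build_index`, `exact_search` and the sample documents are hypothetical and not taken from the presentation, and ranking is left out.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def exact_search(index, query):
    """Return ids of documents that contain every query word exactly as typed."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents (illustration only).
docs = {
    "d1": "project of law on the white flag",
    "d2": "the laws of the project",
    "d3": "a black and white flag",
}
index = build_index(docs)
flag_hits = exact_search(index, "white flag")
law_hits = exact_search(index, "project law")
```

With no linguistic analysis, the query "project law" misses document d2, because "laws" is a different string from "law".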


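Steps of the dialogue between user queries and system answers:

[Diagram: at steps s0, s1, ..., sk, ..., sn-1, sn the user issues queries Q0, Q1, Q2, ..., Qk, ..., Qn-1, Qn and the system returns answers A0, A1, A2, ..., Ak+1, ..., An-1, An.]

An is the filtered final set of answers, oriented to the user's needs.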
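One way to read the sequence of queries Q0, ..., Qn and answers A0, ..., An is as successive filtering, where each new query narrows the previous answer set. The sketch below is an assumption about that process, not the author's system; `refine`, `dialogue` and the sample corpus are hypothetical names introduced for illustration.

```python
def refine(answers, query_terms):
    """Keep only the answers that contain every term of the current query."""
    terms = [t.lower() for t in query_terms]
    return [a for a in answers if all(t in a.lower().split() for t in terms)]

def dialogue(initial_answers, queries):
    """Apply the queries in order; return the answer list after each step."""
    history = []
    current = list(initial_answers)
    for query in queries:
        current = refine(current, query)
        history.append(current)
    return history

# Hypothetical corpus and query sequence (illustration only).
corpus = [
    "project of law on flags",
    "white flag of the station",
    "flag white and black",
    "black station",
]
steps = dialogue(corpus, [["flag"], ["flag", "white"], ["flag", "white", "black"]])
final_answers = steps[-1]
```

Here the last element of `steps` plays the role of the filtered final answer set.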


Documentary Information system

• Documentary automation is held back by the problems of indexing and querying.

• Use natural language as textual data for automatic indexing, because language carries contextual and temporal factors through connectors, verbs, adverbs, etc.

• We use the LINGUISTIC APPROACH


Queries: simple or complex Predicates

• Questions: simple predicates
– /project/, /law/, /station/, /flag/, /black/, /white/, ...

• Combining predicates:
– /white flag/, /white and black/, /.....

• Propagation of the predicates to the left and/or right
– /...... project of law ....../ ;
– /...... flag white and black ....../ ; ...
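Propagation of a predicate to the left and/or right can be pictured as collecting the words in its vicinity. The following is a minimal sketch under that assumption, using plain whitespace tokens rather than real linguistic analysis; the function `vicinity` and its window parameters are hypothetical.

```python
def vicinity(text, predicate, left=1, right=1):
    """Return (left_words, predicate, right_words) for each occurrence."""
    words = text.lower().split()
    hits = []
    for i, word in enumerate(words):
        if word == predicate:
            left_words = words[max(0, i - left):i]
            right_words = words[i + 1:i + 1 + right]
            hits.append((left_words, word, right_words))
    return hits

# Sample sentence built from the predicates listed on the slide.
text = "the project of law on the flag white and black"
law_ctx = vicinity(text, "law", left=2, right=0)    # propagate to the left
flag_ctx = vicinity(text, "flag", left=0, right=3)  # propagate to the right
```

Starting from the simple predicates /law/ and /flag/, the windows recover the enriched expressions /project of law/ and /flag white and black/.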


LINGUISTIC INFORMATION OF VICINITY: Hierarchic informational

[Diagram: an NP or linguistic expression decomposed into left_predicate, Predicate and predicate_right]

In French: Les produits informatiques de la société

In English: computer products company


Reformulation of questions

We propose to process natural language by reformulating queries for human-computer dialogue.

Reformulating questions in natural language is a classical technique for information retrieval in databases, but the new question/answer systems have to deal with web content, which is very open.

We present a study based on the formulation of ‘key questions’ built from ‘core predicates’ enriched to the left or right of the central term.


How does the search engine give responses?

• For example, if you search for 'informati*', the results do not contain computerization (informatisation in French, ….).
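The truncation example can be reproduced with a small sketch, assuming the trailing `*` means "any sequence of characters", as in shell-style wildcards. The vocabulary and the function name `truncation_search` are hypothetical; Python's standard `fnmatch` module does the matching.

```python
from fnmatch import fnmatchcase

def truncation_search(pattern, vocabulary):
    """Return the words matching a truncated pattern such as 'informati*'."""
    pattern = pattern.lower()
    return [w for w in vocabulary if fnmatchcase(w.lower(), pattern)]

# Hypothetical vocabulary mixing French and English forms.
vocabulary = [
    "information",
    "informatique",
    "informatisation",
    "computerization",
    "computer",
]
matches = truncation_search("informati*", vocabulary)
```

The pattern matches character strings only: it finds "informatisation" but not its English equivalent "computerization", since no linguistic or multilingual knowledge is involved.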


Lack of linguistic techniques in search engines

The following search tools and web databases are not equipped with linguistic tools:

- Information retrieval in structured documents
- Multilingual information retrieval
- Personalized information retrieval
- Automatic processing of natural-language information
- Ontology-based information search
- Multimedia information retrieval
- Questions/Answers
- etc.