ch1 intro to information retrieval-lina nemri

Upload: ihab-mortada

Post on 05-Apr-2018


TRANSCRIPT

  • 8/2/2019 Ch1 intro to information retrieval-Lina Nemri

    1/23

    Information Retrieval

    Lebanese University

    Faculty of Economics and Business Administration, 1st Branch

    Class: M1

    Instructor: Dr. Lina A. Nimri


    Introduction

    Modern Information Retrieval, Chapter 1

    Ricardo Baeza-Yates, Berthier Ribeiro-Neto

    http://www.sims.berkeley.edu/~hearst/irbook/

    Introduction

    Examples of information needs in the context of the World Wide Web:

    Find all documents containing information on computer courses which:

    (1) are offered by universities in South England, and

    (2) are accredited by the BCS/IEE bodies.

    To be relevant, a document must include information on admission requirements, and an e-mail address and phone number for contact purposes.

    Find all documents containing information on college tennis teams which:

    (1) are maintained by a USA university, and

    (2) participate in the NCAA tournament.


    Information Retrieval

    [Figure: retrieval system diagram. Labels: User Information Need, Query, Retrieval System (Search Engine), Documents, Set of retrieved documents, Useful or relevant information to the user]

    Primary goal of an IR system: retrieve all the documents which are relevant to a user query, while retrieving as few non-relevant documents as possible.

    Representation, storage, organisation of, and access to information items

    (Usually) keyword-based representation


    Data Retrieval

    Determining which documents contain the keywords in the user query is not always enough to satisfy the user's information need.

    Data retrieval retrieves objects which satisfy clearly defined conditions, such as regular expressions or relational algebra expressions.

    A data retrieval system deals with data that has a well-defined structure and semantics.
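
    As a rough illustration of data retrieval, the sketch below selects records that satisfy exact conditions: one expressed as a regular expression, one as a relational-style selection on an attribute. The `courses` records and the `select` helper are hypothetical, invented here for illustration only.

```python
import re

# Hypothetical records with a well-defined structure (illustrative data only).
courses = [
    {"code": "CS101", "title": "Introduction to Programming", "credits": 3},
    {"code": "CS240", "title": "Databases", "credits": 4},
    {"code": "EC110", "title": "Microeconomics", "credits": 3},
]

def select(records, predicate):
    """Relational-style selection: keep exactly the records satisfying the predicate."""
    return [r for r in records if predicate(r)]

# Condition 1: a regular expression on the course code.
cs_pattern = re.compile(r"CS\d{3}")
cs_courses = select(courses, lambda r: cs_pattern.fullmatch(r["code"]) is not None)

# Condition 2: an equality test on a numeric attribute.
three_credit = select(courses, lambda r: r["credits"] == 3)

print([r["code"] for r in cs_courses])
print([r["code"] for r in three_credit])
```

    Every record either satisfies the condition or does not; there is no notion of partial relevance or ranking, which is what separates this from information retrieval.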


    Information Retrieval System

    Retrieves information about a subject.

    Deals with natural language text, which is not well structured and can be semantically ambiguous.

    It must interpret the contents of documents and rank them according to their degree of relevance to the user need.


    Area of interest

    Digital Libraries

    Information experts

    World Wide Web: a very difficult task

    The hyperspace is vast

    The absence of a well-defined data model (format or representation form)


    Effective retrieval

    The effective retrieval of relevant information is directly affected by:

    The user task

    The logical view of the documents (document representation) adopted by the retrieval system.


    Pulling

    The user can browse the documents when the main objectives are not clear at the beginning and may change during the interaction with the system.

    Combining retrieval and browsing is not yet a well-established approach.

    [Figure: user tasks diagram. Labels: Retrieval, Browsing, Database]


    Documents

    Unit of retrieval

    A passage of free text

        composed of text, strings of characters from an alphabet

        composed of natural language: a newspaper article, a journal paper, a dictionary definition, email messages

        size of documents is arbitrary: newspaper article vs. journal paper vs. email


    What is a document?


    Representation of documents

    Documents are represented through a set of index terms, keywords, or term descriptors

        extracted directly from the text

        specified by human subjects (information science)

        metadata

    Index terms are the most concise representation, but give poor quality of retrieval

    Full text representation: most complete representation, but high computational cost

    Large collections: reduce the set of representative keywords

        Elimination of stop words

        Stemming

        Identification of noun phrases

        Further compression and indexing

    [Figure labels: Document term descriptors to access texts; Generation of descriptors for text: by hand, or by analysing the text]
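
    A minimal sketch of generating descriptors by analysing the text: tokenise, drop stop words, then stem. The stop-word list and the suffix rules are assumptions made for this example; the suffix stripping is a crude stand-in for a real stemmer such as Porter's, not the method prescribed by the slides.

```python
import re

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "on", "the", "to"}

# Crude suffix stripping, standing in for a real stemming algorithm.
SUFFIXES = ("ing", "es", "s")

def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def index_terms(text):
    """Reduce a full text to a sorted set of index terms."""
    return sorted({stem(t) for t in tokenize(text) if t not in STOP_WORDS})

terms = index_terms("The indexing of documents is guided by the terms in the documents")
print(terms)
```

    The full text of eleven words shrinks to four index terms, which shows the trade-off named above: a far more concise representation, at the cost of discarding word order and some meaning.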


    Logical View of the documents

    [Figure: logical view of the documents. Text operations: accents and spacing, stopwords, noun groups, stemming, manual indexing. Resulting views: structure, full text, index terms]


    The retrieval functions

    [Figure: the retrieval functions. Labels: Information need, Formulation, Query, Documents, Indexing, Document representation, Retrieval functions, Retrieved documents, Relevance feedback]


    Queries

    Information need

    Simple queries: composed of two or three, perhaps even dozens, of keywords, e.g., as in web retrieval

    Boolean queries: e.g., neural networks AND speech recognition

    Context queries: proximity search, phrase queries

    [Figure label: User term descriptors characterising the user need]
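
    The Boolean and phrase queries above can be sketched over a small positional index. The three documents are invented for illustration, and reading "neural networks" and "speech recognition" as phrases (rather than as single index terms) is an assumption of this sketch.

```python
from collections import defaultdict

# Toy collection (illustrative documents, not taken from the slides).
docs = {
    1: "neural networks for speech recognition",
    2: "speech recognition with hidden markov models",
    3: "neural networks for image classification",
}

# Positional index: term -> {doc_id: [positions of the term in that document]}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for pos, term in enumerate(text.split()):
        index[term].setdefault(doc_id, []).append(pos)

def phrase(words):
    """Phrase query: documents in which the words occur consecutively."""
    first, *rest = words.split()
    hits = set()
    for doc_id, positions in index.get(first, {}).items():
        for start in positions:
            if all(start + k + 1 in index.get(w, {}).get(doc_id, [])
                   for k, w in enumerate(rest)):
                hits.add(doc_id)
                break
    return hits

def boolean_and(left, right):
    """Boolean AND: documents present in both result sets."""
    return left & right

result = boolean_and(phrase("neural networks"), phrase("speech recognition"))
print(sorted(result))
```

    A proximity search would relax the `start + k + 1` test to allow a bounded gap between the word positions instead of requiring strict adjacency.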


    Best-Match retrieval

    Compare the terms in a document and the query

    Compute the similarity between each document in the collection and the query, based on the terms that they have in common

    Sort the documents in order of decreasing similarity with the query

    The output is a ranked list displayed to the user: the top documents are the most relevant, as judged by the system

    [Figure labels: Document term descriptors to access texts; User term descriptors characterising the user need]
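
    The steps above can be sketched as follows. The slides do not name a similarity measure, so the Jaccard coefficient used here is one possible choice, assumed for illustration; the `collection` contents and function names are likewise hypothetical.

```python
def terms(text):
    """Represent a text as the set of its lower-cased terms."""
    return set(text.lower().split())

def similarity(doc_terms, query_terms):
    """Jaccard coefficient: shared terms divided by all distinct terms."""
    union = doc_terms | query_terms
    return len(doc_terms & query_terms) / len(union) if union else 0.0

def best_match(collection, query):
    """Score every document against the query and return a ranked list."""
    q = terms(query)
    scored = [(similarity(terms(text), q), doc_id)
              for doc_id, text in collection.items()]
    # Decreasing similarity; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Documents sharing no term with the query are left out of the output.
    return [(doc_id, score) for score, doc_id in scored if score > 0]

collection = {
    "d1": "information retrieval ranks documents",
    "d2": "data retrieval uses exact conditions",
    "d3": "cooking recipes",
}
ranking = best_match(collection, "information retrieval")
print(ranking)
```

    Unlike a Boolean query, a document that matches only part of the query (here `d2`) is still returned, just lower in the list.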


    Conceptual view of text retrieval system

    [Figure labels: Queries, Documents, Similarity Computation, Retrieved Documents]


    Expanded view of text retrieval system

    [Figure labels: Queries, Documents, Indexing, Indexed Documents, Similarity Computation, Retrieved Documents, Ranked Documents]


    Process of retrieving info

    [Figure: the retrieval process. Components: User Interface, Text Operations, Query Operations, Indexing, Similarity Computation (Searching), Ranking, Document Repository Manager, Index (inverted file), Text repository. Data flows: user need, text, logical view, query, retrieved docs, ranked docs, user feedback]
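
    A compact sketch of how the stages named in this diagram might fit together: text operations produce a logical view, indexing builds an inverted file, searching accumulates scores from it, and ranking orders the results. The function names, the term-frequency scoring, and the `repository` contents are assumptions made for this example, not details given in the slides.

```python
from collections import Counter, defaultdict

def text_operations(text):
    """Logical view of a text: here, simply lower-cased whitespace tokens."""
    return text.lower().split()

def build_inverted_file(repository):
    """Indexing: map each term to {doc_id: term frequency}."""
    inverted = defaultdict(dict)
    for doc_id, text in repository.items():
        for term, freq in Counter(text_operations(text)).items():
            inverted[term][doc_id] = freq
    return inverted

def search(inverted, query):
    """Searching: add up, per document, the frequencies of the query terms."""
    scores = Counter()
    for term in text_operations(query):
        for doc_id, freq in inverted.get(term, {}).items():
            scores[doc_id] += freq
    return scores

def rank(scores):
    """Ranking: order documents by decreasing score, then by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))

repository = {
    "d1": "inverted file index for text retrieval",
    "d2": "text repository and operations",
    "d3": "user interface",
}
inverted_file = build_inverted_file(repository)
ranked_docs = rank(search(inverted_file, "text index"))
print(ranked_docs)
```

    Only the postings of the query terms are touched during the search, which is the reason the inverted file is built ahead of time rather than scanning the text repository for each query.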


    Key Topics

    Indexing text documents

    Retrieving text documents

    Evaluation

    Query reformulations

    Search Engines = IR + Link Structure + Name Interpretation


    Information Retrieval vs Information Extraction

    Information Retrieval: given a set of query terms and a set of document terms, select only the most relevant documents [precision], and preferably all the relevant ones [recall].

    Information Extraction: extract from the text what the document means.

    IR systems can FIND documents but need not understand them
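
    Precision and recall, as bracketed above, can be computed from the set of retrieved documents and the set of relevant documents. The document ids below are made up for illustration, and returning 0.0 for an empty set is a convention chosen for this sketch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d2", "d3", "d4"]  # what the system returned
relevant = ["d1", "d3", "d5"]         # what the user actually needed
p, r = precision_recall(retrieved, relevant)
print(p, r)
```

    Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall 2/3).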
