cs533-information retrieval systems · cs533-information retrieval systems mehmet ali aasoĞlu...
TRANSCRIPT
![Page 1: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/1.jpg)
CS533-Information Retrieval Systems
Mehmet Ali ABBASOĞLU
Gülsüm Ece BIÇAKCI
![Page 2: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/2.jpg)
2/16Bilkent University, CS533
Introduction
Conclusion
Empirical Results
Previous Work
Model Description
![Page 3: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/3.jpg)
Introduction
Document indexing & retrieval
Difficult
Reason: lack of indexing model
Bilkent University, CS533 3/16
![Page 4: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/4.jpg)
Introduction
Document Document New
Indexing Retrieval Language
Model
Non-Parametric
Bilkent University, CS533 4/16
![Page 5: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/5.jpg)
Introduction
• Model
Bilkent University, CS533 5/16
abstraction of the retrieval task itself
probabilistic sense, an explanatory model of the data
Predict the probability of the next word in an ordered sequence
![Page 6: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/6.jpg)
Introduction
• Model
Bilkent University, CS533 6/16
Estimate the probability of generating query
Rank the documents to these probabilities
Approach to Retrieval
![Page 7: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/7.jpg)
Introduction
• Term frequency
• Document frequency
• Document length statistics
integral part of the model,
but not used heuristically
Bilkent University, CS533 7/16
![Page 8: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/8.jpg)
Previous Work
2- Poisson
- a subset of terms occuring in a document
Robertson & Sparck Jones – Croft & Harper
- estimate the probability of the relevance of each document to the query
Bilkent University, CS533 8/16
![Page 9: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/9.jpg)
Previous Work
Fuhr
- integration of indexing and retrieval models
INQUERY inference by Turtle & Croft
- making inferences of concepts from features
Bilkent University, CS533 9/16
![Page 10: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/10.jpg)
Model Description
• Aim is to estimate p( Q | Md ), probability of the query given the language model of document d.
• They define maximum likelihood estimate of the probability of term t
Bilkent University, CS533 10/16
![Page 11: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/11.jpg)
Model Description
• For non-occuring terms instead of taking Pml=0, they take cft / cs
• For the situations that arbitrary sized sampled data, maximum likelihood estimator could be reasonably confident. To estimate larger amount of data:
Bilkent University, CS533 11/16
![Page 12: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/12.jpg)
Model Description
• They define risk as
• They define p(Q | Md) as
Bilkent University, CS533 12/16
![Page 13: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/13.jpg)
Results
• They performed recall / precision experiments on two data sets
– TREC topics 202-250 on TREC disks 2 and 3, TREC 4 ad-hoc task
– TREC topics 51-100 on TREC disk 3 using concept fields.
• Implementation is done using Labradorinformation retrieval engine.
Bilkent University, CS533 13/16
![Page 14: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/14.jpg)
Results
• On the eleven point recall/precision:
– The language modeling approach achieves better precision at all levels of recall.
– Most of the improvements are statistically significant according to Wilcoxon test.
Bilkent University, CS533 14/16
![Page 15: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/15.jpg)
Conclusion
• They presented a novel way of text retrieval based on probabilistic language modeling.
• Their language models are accurate representations of the data
• They claim that – Users of their methodology can understand their
approach to retrieval.– Users also will have some sense of term distribution.
• Their language modeling has significantly better recall/precision results than INQUERY.
Bilkent University, CS533 15/16
![Page 16: CS533-Information Retrieval Systems · CS533-Information Retrieval Systems Mehmet Ali AASOĞLU Gülsüm Ece IÇAKI](https://reader030.vdocuments.net/reader030/viewer/2022041113/5f20dd497d9b224e3f4e9ad1/html5/thumbnails/16.jpg)
Bilkent University, CS533 16/16