Andisheh Keikha (Ryerson University) and Ebrahim Bagheri (Ryerson University), May 7, 2014


  • Slide 1
  • Andisheh Keikha (Ryerson University), Ebrahim Bagheri (Ryerson University), May 7, 2014
  • Slide 2
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions
  • Slide 3
  • Simple search: the query is a set of keywords; find the documents that contain those keywords, rank them with respect to the query, and return the ranked list of documents.
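
A minimal sketch of this pipeline, assuming a toy in-memory corpus and TF-IDF weighting (the slide names no particular ranking function, so the scoring scheme here is only illustrative):

```python
import math
from collections import Counter

# Toy corpus standing in for an indexed document collection.
docs = {
    "d1": "how to gain weight and build muscle mass",
    "d2": "weight loss tips and diet plans",
    "d3": "strength training builds muscle and increases body mass",
}

def tokenize(text):
    return text.lower().split()

# Document frequency of each term, used for IDF.
df = Counter()
for text in docs.values():
    df.update(set(tokenize(text)))

def tfidf_score(query, text):
    """Sum of tf * idf over the query keywords that appear in the document."""
    tf = Counter(tokenize(text))
    return sum(
        tf[term] * math.log(len(docs) / df[term])
        for term in tokenize(query)
        if term in tf
    )

query = "gain weight"
ranking = sorted(docs, key=lambda d: tfidf_score(query, docs[d]), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']: documents containing the query keywords come first
```
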
  • Slide 4
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions
  • Slide 5
  • Query length is correlated with performance in the search task. A query is a small collection of keywords, and it is hard to find relevant documents based on only two or three words. Solutions: query reformulation and query expansion.
  • Slide 6
  • Query expansion: selection of new terms, drawn from relevant documents or from WordNet (synonyms, hyponyms, ...), together with disambiguation.
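
A sketch of collecting candidate expansion terms from WordNet through NLTK (assumes the WordNet corpus has been downloaded via nltk.download('wordnet'); note that it gathers lemmas from every sense of the term, which is exactly why the disambiguation step above matters):

```python
from nltk.corpus import wordnet as wn  # requires a prior nltk.download('wordnet')

def wordnet_candidates(term):
    """Synonyms and hyponyms of every sense of `term`, as expansion candidates."""
    candidates = set()
    for synset in wn.synsets(term):
        for lemma in synset.lemmas():            # synonyms: lemmas of the same synset
            candidates.add(lemma.name().replace("_", " "))
        for hypo in synset.hyponyms():           # hyponyms: more specific concepts
            for lemma in hypo.lemmas():
                candidates.add(lemma.name().replace("_", " "))
    candidates.discard(term)
    return candidates

print(sorted(wordnet_candidates("gain"))[:10])
```
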
  • Slide 7
  • Query expansion: selection of the new terms, and weighting of those terms.
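
For the weighting task, one simple and common convention is to give expansion terms a smaller weight than the original query terms; the interpolation weight of 0.4 below is an arbitrary illustration rather than a value from the slides:

```python
def weighted_query(original_terms, expansion_terms, expansion_weight=0.4):
    """Weighted query: original terms keep full weight, expansion terms get a reduced weight."""
    weights = {t: 1.0 for t in original_terms}
    for t in expansion_terms:
        weights.setdefault(t, expansion_weight)  # never override an original term
    return weights

print(weighted_query(["gain", "weight"], ["muscle", "mass", "fat"]))
# {'gain': 1.0, 'weight': 1.0, 'muscle': 0.4, 'mass': 0.4, 'fat': 0.4}
```
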
  • Slide 8
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions
  • Slide 9
  • Probabilistic methods: what is the probability that this document is relevant to this query? Relevance is the event that the document is judged relevant to the query, and it is estimated from the document description.
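
The best-known ranking functions in this probabilistic family are the Robertson and Sparck Jones models cited in the references; a compact BM25-style sketch (with the conventional parameters k1 = 1.2 and b = 0.75) shows how the relevance probability is approximated from the document description:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_tokens, df, n_docs, avg_len, k1=1.2, b=0.75):
    """BM25: a widely used ranking function from the probabilistic relevance framework."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        # Robertson/Sparck Jones style inverse document frequency.
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
        # Term frequency saturated by k1 and normalized by document length via b.
        norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc_tokens) / avg_len))
        score += idf * norm_tf
    return score

docs = [["gain", "weight", "muscle", "mass"], ["weight", "loss", "diet", "plans"]]
df = Counter(t for d in docs for t in set(d))
avg_len = sum(len(d) for d in docs) / len(docs)
print(bm25_score(["gain", "weight"], docs[0], df, len(docs), avg_len))
```
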
  • Slide 10
  • Language models: what is the probability of generating query Q, given document d with language model M_d? Each term probability is a maximum likelihood estimate.
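
In symbols, the query-likelihood model scores a document by P(Q | M_d) = prod_i P(q_i | M_d), where the maximum likelihood estimate is P(q_i | M_d) = tf(q_i, d) / |d|. The sketch below adds Jelinek-Mercer smoothing with a collection model so that unseen query terms do not zero out the product; the lambda = 0.5 value is my own choice:

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_tokens, collection_tokens, lam=0.5):
    """log P(Q | M_d): P(q | d) = lam * tf(q, d)/|d| + (1 - lam) * tf(q, C)/|C|.
    Assumes every query term occurs somewhere in the collection."""
    doc_tf, coll_tf = Counter(doc_tokens), Counter(collection_tokens)
    log_p = 0.0
    for q in query_terms:
        p_doc = doc_tf[q] / len(doc_tokens)            # MLE on the document
        p_coll = coll_tf[q] / len(collection_tokens)   # MLE on the whole collection
        log_p += math.log(lam * p_doc + (1 - lam) * p_coll)
    return log_p

doc = "gain weight by building muscle mass".split()
collection = doc + "weight loss diet plans and exercise tips".split()
print(query_likelihood(["gain", "weight"], doc, collection))
```
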
  • Slide 11
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions
  • Slide 12
  • Slide 13
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions
  • Slide 14
  • Searching on Google
  • Slide 15
  • Searching on Google: I want all of these searches to show the same results, since they have the same meaning, and it is the intent of the user to know all of them when searching for any one of them.
  • Slide 16
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions: Query Expansion, Query Expansion (Tasks to Decide), Document Ranking
  • Slide 17
  • How? With a new semantic query expansion method and a new semantic document ranking method.
  • Slide 18
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions: Query Expansion, Query Expansion (Tasks to Decide), Document Ranking
  • Slide 19
  • Example: 'gain weight'. Desirable keywords in the expanded query: gain, weight, muscle, mass, fat. What are the relations between 'gain weight' and muscle, mass, and fat?
  • Slide 20
  • Digging into DBpedia and Wikipedia: http://en.wikipedia.org/wiki/Weight_gain, http://dbpedia.org/page/Muscle, http://dbpedia.org/page/Adipose_tissue
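
A sketch of pulling related entities from DBpedia programmatically, via its public SPARQL endpoint and the SPARQLWrapper library; the choice of the dct:subject property here is purely illustrative, since which properties to follow is exactly one of the open questions on the next slides:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
# Ask which categories the Muscle entity belongs to (the property choice is illustrative only).
sparql.setQuery("""
    PREFIX dct: <http://purl.org/dc/terms/>
    SELECT ?category WHERE {
        <http://dbpedia.org/resource/Muscle> dct:subject ?category .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["category"]["value"])
```
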
  • Slide 21
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions: Query Expansion, Query Expansion (Tasks to Decide), Document Ranking
  • Slide 22
  • How do we map query phrases onto Wikipedia components? Which properties and their related entities should be selected? Can those properties be selected automatically for each phrase, or should they be fixed for the whole algorithm? If the selection is automatic, what is the process?
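
For the first question, one baseline for generating candidate mappings is simply to ask the MediaWiki search API which pages match a query phrase; choosing among the returned candidates is the selection and disambiguation problem the slide raises. A sketch under that assumption:

```python
import requests

def wikipedia_candidates(phrase, limit=5):
    """Candidate Wikipedia page titles for a query phrase, via the MediaWiki opensearch API."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "opensearch", "search": phrase, "limit": limit, "format": "json"},
        headers={"User-Agent": "query-expansion-demo"},
    )
    # opensearch returns [query, titles, descriptions, urls]; the titles are the candidates.
    return resp.json()[1]

print(wikipedia_candidates("gain weight"))
```
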
  • Slide 23
  • Are DBpedia and Wikipedia enough to decide, or should we use other ontologies? How should we weight the extracted entities (terms, senses) in order to select the expanded query among them?
  • Slide 24
  • Outline: Search Process (Query Processing, Document Ranking, Search Result Clustering and Diversification); What is the Goal; Contributions: Query Expansion, Query Expansion (Tasks to Decide), Document Ranking
  • Slide 25
  • Are the documents annotated? If yes: rank the documents using the entities extracted in the query expansion phase. If no: rank the documents based on the semantics of the expanded query rather than its terms or phrases, defining probabilities over senses rather than terms in the query and the documents.
  • Slide 26
  • Are the documents annotated? If yes: rank the documents using the entities extracted in the query expansion phase. If no: rank the documents based on the semantics of the expanded query rather than its terms or phrases, defining probabilities over senses rather than terms in the query and the documents. Our documents are not annotated, so how?
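
One way to read "probabilities over senses rather than terms" is to reuse the query-likelihood model from the document-ranking slides, but computed over sense or entity identifiers instead of surface words. The sense IDs below are hypothetical outputs of the expansion phase, not something the slides specify:

```python
import math
from collections import Counter

def sense_likelihood(query_senses, doc_senses, collection_senses, lam=0.5):
    """Smoothed query likelihood computed over sense IDs rather than terms."""
    d, c = Counter(doc_senses), Counter(collection_senses)
    return sum(
        math.log(lam * d[s] / len(doc_senses) + (1 - lam) * c[s] / len(collection_senses))
        for s in query_senses
    )

# Hypothetical DBpedia entities produced for the query "gain weight" and for one document.
query_senses = ["dbr:Weight_gain", "dbr:Muscle"]
doc_senses = ["dbr:Weight_gain", "dbr:Muscle", "dbr:Adipose_tissue", "dbr:Dieting"]
collection_senses = doc_senses + ["dbr:Weight_loss", "dbr:Exercise"] * 3
print(sense_likelihood(query_senses, doc_senses, collection_senses))
```
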
  • Slide 27
  • Semantic similarity between two non-annotated documents (the expanded query and the document): there are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (or phrases or words), but their application to information retrieval has not been explored yet.
  • Slide 28
  • Semantic similarity between two non-annotated documents (the expanded query and the document): there are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (or phrases or words), but their application to information retrieval has not been explored yet. One direction is to find the aspects of the different algorithms that are most beneficial in the information retrieval domain (comparing two large documents).
  • Slide 29
  • Semantic similarity between two non-annotated documents (the expanded query and the document): there are papers that use the WordNet ontology with a topic-specific PageRank algorithm to measure the similarity of two sentences (or phrases or words), but their application to information retrieval has not been explored yet. A more reasonable direction is to apply the algorithm on DBpedia (instead of WordNet), in the entity domain (instead of the sense domain).
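
A compact sketch of the topic-specific (personalized) PageRank idea those papers build on (e.g. Pilehvar et al. in the references), here over a tiny hand-built graph with networkx; in the proposed approach the graph would hold DBpedia entities rather than WordNet senses:

```python
import networkx as nx

# Tiny stand-in for a semantic graph (WordNet senses or, as proposed here, DBpedia entities).
G = nx.Graph()
G.add_edges_from([
    ("Weight_gain", "Muscle"), ("Weight_gain", "Adipose_tissue"),
    ("Muscle", "Exercise"), ("Adipose_tissue", "Dieting"), ("Exercise", "Dieting"),
])

def semantic_signature(seed_nodes, graph):
    """PageRank restarted on the seed nodes: the resulting distribution acts as a
    'semantic signature' of the text that produced those seeds."""
    personalization = {n: (1.0 if n in seed_nodes else 0.0) for n in graph}
    return nx.pagerank(graph, alpha=0.85, personalization=personalization)

def cosine(sig_a, sig_b):
    """Cosine similarity of two signatures as a query-document semantic similarity."""
    dot = sum(sig_a[n] * sig_b[n] for n in sig_a)
    norm = (sum(v * v for v in sig_a.values()) ** 0.5) * (sum(v * v for v in sig_b.values()) ** 0.5)
    return dot / norm

query_sig = semantic_signature({"Weight_gain", "Muscle"}, G)
doc_sig = semantic_signature({"Exercise", "Dieting"}, G)
print(cosine(query_sig, doc_sig))
```
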
  • Slide 30
  • Apply search result clustering and diversification based on the different semantics (senses) of the query.
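
A toy sketch of that step: each result is assigned to the query sense it matches best, and the final list interleaves one result per sense so the top of the ranking covers all senses. The similarity numbers are invented placeholders for the semantic scores computed above:

```python
from itertools import zip_longest

# Invented similarity of each result to each induced sense of the query "gain weight".
results = {
    "r1": {"body_weight": 0.9, "financial_gain": 0.1},
    "r2": {"body_weight": 0.2, "financial_gain": 0.8},
    "r3": {"body_weight": 0.7, "financial_gain": 0.3},
}

# Cluster: assign every result to its best-matching query sense.
clusters = {}
for doc, sims in results.items():
    best_sense = max(sims, key=sims.get)
    clusters.setdefault(best_sense, []).append(doc)

# Diversify: interleave the clusters so the top of the list covers all senses.
diversified = [d for group in zip_longest(*clusters.values()) for d in group if d is not None]
print(clusters)      # {'body_weight': ['r1', 'r3'], 'financial_gain': ['r2']}
print(diversified)   # ['r1', 'r2', 'r3']
```
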
  • Slide 31
  • References:
    1. Selvaretnam, B., and Belkhatir, M. "Natural Language Technology and Query Expansion: Issues, State-of-the-Art and Perspectives." Journal of Intelligent Information Systems 38(3), 709-740, 2011.
    2. Carpineto, C., and Romano, G. "A Survey of Automatic Query Expansion in Information Retrieval." ACM Computing Surveys 44(1), 1-50, 2012.
    3. Hiemstra, D. "A Linguistically Motivated Probabilistic Model of Information Retrieval." Research and Advanced Technology for Digital Libraries, 569-584, Springer, 1998.
    4. Sparck Jones, K., Walker, S., and Robertson, S. E. "A Probabilistic Model of Information Retrieval: Development and Comparative Experiments, Part 1." Information Processing & Management 36(6), 779-808, 2000.
    5. Sparck Jones, K., Walker, S., and Robertson, S. E. "A Probabilistic Model of Information Retrieval: Development and Comparative Experiments, Part 2." Information Processing & Management 36(6), 809-840, 2000.
    6. Di Marco, A., and Navigli, R. "Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction." Computational Linguistics 39(3), 709-754, 2013.
    7. Pilehvar, M. T., Jurgens, D., and Navigli, R. "Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), 2013.