
A Graph-Based Technical Paper Recommender System

Using Fused Similarity Metrics - Related Work

By Janelle Blankenburg

Outline
● Introduction
● Related Work
  ○ Collaborative Filtering Methods
  ○ Content-Based Filtering Methods
    ■ Citation-Based Methods
      ● Context-Aware Methods
    ■ Keyword-Based Methods
● Conclusion
● Project Idea


Introduction


Problem Description
● Technical paper recommendation
● General process
  ○ Identify keywords related to research interests
  ○ Input keywords into online textual search service
    ■ Google Scholar, CiteSeer, arXiv
  ○ Modify keyword list and repeat


Motivation and Challenges
● Simple keyword search is not sufficient
  ○ Highly technical terms
  ○ Variations in the phrasing of terms
● To alleviate this repetitive process, create technical paper recommender systems that rely on more input than simple keywords


Motivation and Challenges Cont.
● Sample Inputs:
  ○ Authorship details
  ○ Access data
  ○ One or multiple papers
● Systems use input data to generate a user model as basis for providing recommendations


Related Work


Collaborative Filtering (CF)
● Provide recommendations to the user based on the preferences of like-minded users
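The idea of "like-minded users" can be illustrated with a minimal user-based sketch. It assumes each user is represented by a dictionary of paper ratings and that like-mindedness is measured by cosine similarity; the function names, the toy data, and the choice of similarity measure are illustrative assumptions, not taken from any of the surveyed systems.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {paper: rating}."""
    shared = set(u) & set(v)
    dot = sum(u[p] * v[p] for p in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_cf(ratings, user, top_n=3):
    """Score unseen papers by the similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0.0:
            continue  # ignore users with no overlap in taste
        for paper, rating in other_ratings.items():
            if paper not in ratings[user]:
                scores[paper] = scores.get(paper, 0.0) + sim * rating
    # Highest score first; ties broken by paper id for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


# Hypothetical toy data: 1 means the user saved or liked the paper.
ratings = {
    "alice": {"p1": 1, "p2": 1},
    "bob": {"p1": 1, "p2": 1, "p3": 1},
    "carol": {"p4": 1},
}
print(recommend_cf(ratings, "alice"))
```

Here only "bob" overlaps with "alice", so only his unseen paper is suggested; "carol" shares nothing with her and contributes no votes.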


Content-Based Filtering (CBF)
● Provide recommendations based on similarity of new items to the user’s existing set of items
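A minimal content-based sketch, assuming that each item is a short text, that the user's existing set is summarized as a single bag-of-words profile, and that similarity is cosine similarity over raw term counts. All names and the toy texts are hypothetical; real systems typically use richer representations.

```python
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_cbf(user_papers, candidates, top_n=2):
    """Rank candidate papers by similarity to the user's combined profile."""
    profile = Counter()
    for text in user_papers:
        profile.update(bag_of_words(text))
    scored = [(cosine(profile, bag_of_words(text)), title)
              for title, text in candidates.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical toy data.
user_papers = ["citation graph ranking", "random walk on citation graph"]
candidates = {
    "A": "citation graph analysis",
    "B": "protein folding simulation",
}
print(recommend_cbf(user_papers, candidates, top_n=1))
```

Unlike the collaborative approach, no other users are involved: the ranking depends only on the text of the candidate items and the user's own items.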


Citation-Based Methods

These methods use an existing citation graph to make recommendations


Finding Relevant Papers Based On Citation Relations

Y. Liang, Q. Li, and T. Qian, “Finding relevant papers based on citation relations,” Web-age information management, pp. 403–414, 2011.

Finding Relevant Papers Based On Citation Relations
● Local Relation Strength
● Global Relation Strength


Research Paper Rec. Systems: A random-walk based approach
● Citation Graph
● Connectivity Matrix
● Correlation Matrix


M. Gori and A. Pucci, “Research paper recommender systems: A random-walk based approach,” in Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on, pp. 778–781, IEEE, 2006.

Research Paper Rec. Systems: A random-walk based approach
● Classic PageRank
● Biased PageRank
● PaperRank
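The difference between classic and biased PageRank can be sketched with a small power iteration. This is a generic textbook formulation, not the exact PaperRank definition from Gori and Pucci: the damping factor of 0.85, the handling of dangling nodes, and the function and parameter names are assumptions made for illustration. Passing no `bias` gives a uniform teleport vector (classic); passing a `bias` dictionary concentrates the teleport on chosen papers (biased).

```python
def pagerank(out_links, damping=0.85, bias=None, iterations=100):
    """Power-iteration PageRank over {paper: [papers it cites]}.

    bias: optional {paper: weight} teleport preferences with a positive
    total; None means uniform teleportation (classic PageRank).
    """
    nodes = sorted(set(out_links) |
                   {v for targets in out_links.values() for v in targets})
    n = len(nodes)
    if bias is None:
        teleport = {v: 1.0 / n for v in nodes}
    else:
        total = sum(bias.values())
        teleport = {v: bias.get(v, 0.0) / total for v in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling paper (cites nothing): spread its rank
                # according to the teleport vector.
                for v in nodes:
                    new_rank[v] += damping * rank[u] * teleport[v]
        rank = new_rank
    return rank


# Hypothetical citation graph: a and b both cite c.
citations = {"a": ["c"], "b": ["c"], "c": []}
classic = pagerank(citations)
biased = pagerank(citations, bias={"a": 1.0})
print(classic)
print(biased)
```

In the biased run, the random walk always restarts at paper "a", so the scores reflect relevance to that paper rather than global popularity; this restart-at-the-query idea is what makes random-walk scores usable for recommendation.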


Automatically Building Research Reading Lists


M. D. Ekstrand, P. Kannan, J. A. Stemper, J. T. Butler, J. A. Konstan, and J. T. Riedl, “Automatically building research reading lists,” in Proceedings of the fourth ACM conference on Recommender systems, pp. 159–166, ACM, 2010.

Automatically Building Research Reading Lists
● Half-life Utility
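The formulas on this slide did not survive extraction. As a rough illustration, the sketch below follows the half-life utility commonly attributed to Breese et al. (1998), in which the value of a relevant item decays exponentially with its rank position and halves at position `half_life`. This may differ from the variant used in the cited paper; the function name, parameter names, and default values are assumptions.

```python
def half_life_utility(ranked_items, relevance, half_life=5, default=0.0):
    """Half-life utility of a ranked list.

    ranked_items: item ids in recommended order (best first).
    relevance:    {item: true relevance score}; missing items get `default`.
    half_life:    rank at which an item's contribution is halved (must be > 1).
    """
    total = 0.0
    for position, item in enumerate(ranked_items, start=1):
        gain = max(relevance.get(item, default) - default, 0.0)
        total += gain / 2 ** ((position - 1) / (half_life - 1))
    return total


# Hypothetical example: one relevant paper, placed first or second.
relevant = {"a": 1.0}
print(half_life_utility(["a", "b"], relevant, half_life=2))
print(half_life_utility(["b", "a"], relevant, half_life=2))
```

With a half-life of 2, moving the single relevant paper from the first to the second position halves the score, which is the behavior the metric is named for.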


Context-Aware Methods

These methods use words around a given citation to recommend which paper to cite in that specified location


Utilizing Context in Generative Bayesian Models for Linked Corpus

S. Kataria, P. Mitra, and S. Bhatia, “Utilizing context in generative bayesian models for linked corpus,” in AAAI, vol. 10, p. 1, 2010.

A Supervised Learning Method for Context-Aware Cit. Rec. in a Large Corpus
● 30 Features in 4 categories:
  ○ Popularity-Based
  ○ Candidate Meta-Data
  ○ Author-Based
  ○ Meta-Data Similarity


L. Rokach, P. Mitra, S. Kataria, W. Huang, and L. Giles, “A supervised learning method for context-aware citation recommendation in a large corpus,” INVITED SPEAKER: Analyzing the Performance of Top-K Retrieval Algorithms, p. 1978, 1978.

A Supervised Learning Method for Context-Aware Cit. Rec. in a Large Corpus
● Full ML-based Recommender System
  ○ Uses all 30 Features
  ○ Combines LibSVM, Logistic Model Tree, LogitBoost, Artificial Neural Networks using backpropagation, Random Forest, Rotation Forest, and AdaBoost
    ■ Averages the estimated probabilities of the classifiers
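The averaging step can be sketched as follows, assuming each trained classifier is available as a function that maps a feature vector to an estimated probability that the candidate should be cited. The stand-in classifiers, feature values, and function names below are hypothetical placeholders, not the models from the paper.

```python
def average_probability(classifiers, features):
    """Unweighted mean of the probabilities estimated by each classifier."""
    estimates = [classify(features) for classify in classifiers]
    return sum(estimates) / len(estimates)


def rank_candidates(classifiers, candidates):
    """Order candidate papers by their averaged probability, best first."""
    scored = {name: average_probability(classifiers, feats)
              for name, feats in candidates.items()}
    return sorted(scored, key=lambda name: (-scored[name], name))


# Hypothetical stand-ins for trained models; each returns a probability.
classifiers = [
    lambda f: min(1.0, f["popularity"]),
    lambda f: min(1.0, f["similarity"]),
    lambda f: 0.5 * (f["popularity"] + f["similarity"]),
]
candidates = {
    "paper_x": {"popularity": 0.9, "similarity": 0.7},
    "paper_y": {"popularity": 0.2, "similarity": 0.4},
}
print(rank_candidates(classifiers, candidates))
```

Averaging lets classifiers with different inductive biases compensate for one another's errors without requiring a second training stage.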


Keyword-Based Methods

These methods analyze various components of a paper, such as the title and abstract, in order to provide recommendations


PubMed: A Probabilistic Topic Based Model for Content Similarity
● Uses terms from title and abstract
● Fit Poisson distributions for each term in the document

J. Lin and W. J. Wilbur, “PubMed related articles: a probabilistic topic based model for content similarity,” BMC bioinformatics, vol. 8, no. 1, p. 423, 2007.
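The slide's formula is missing, so the following is only a loose illustration of how Poisson distributions can be used to weight terms. It assumes a two-Poisson mixture in which a term is either central to the document's topic ("elite") or incidental, each case having its own per-word occurrence rate, and applies Bayes' rule to the observed count. The rates, the prior, and all names are made-up values for demonstration and are not the parameters or the exact weighting from the cited paper.

```python
from math import exp, factorial


def poisson_pmf(k, mean):
    """Probability of observing k events under a Poisson with the given mean."""
    return mean ** k * exp(-mean) / factorial(k)


def elite_posterior(count, doc_length, elite_rate, other_rate, prior=0.5):
    """P(term is 'elite' | it occurs `count` times in a doc of `doc_length` words).

    elite_rate / other_rate: assumed per-word occurrence rates of the term
    when it is, or is not, central to the document's topic.
    """
    p_elite = prior * poisson_pmf(count, elite_rate * doc_length)
    p_other = (1.0 - prior) * poisson_pmf(count, other_rate * doc_length)
    return p_elite / (p_elite + p_other)


# Hypothetical rates: 5 expected occurrences if elite, 1 otherwise.
frequent = elite_posterior(5, 100, elite_rate=0.05, other_rate=0.01)
absent = elite_posterior(0, 100, elite_rate=0.05, other_rate=0.01)
print(round(frequent, 3), round(absent, 3))
```

Terms with a high posterior would then contribute more to the similarity between two documents than terms that appear only incidentally.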

Who Should I Cite: Learning Lit. Search Models From Cit. Behavior
● Retrieval model feature score:
● 18 Features in 6 categories:
  ○ Similar Terms
  ○ Cited By Others
  ○ Recency
  ○ Cited Using Similar Terms
  ○ Similar Topics
  ○ Social Habits


S. Bethard and D. Jurafsky, “Who should I cite: learning literature search models from citation behavior,” in Proceedings of the 19th ACM international conference on Information and knowledge management, pp. 609–618, ACM, 2010.

Who Should I Cite: Learning Lit. Search Models From Cit. Behavior
● Apply Iterative Training Algorithm to build a retrieval model

● Types of Classifiers

○ Logistic classifier trained with quadratic prior

○ SVM-MAP classifier

● Results:


Recommending Academic Papers Via Users’ Reading Purposes
● Main Components:
  ○ Abstract Splitting
  ○ Paper Similarity Model
    ■ TF*IDF Model
    ■ Topic Model
    ■ Concept-Based Topic Model
  ○ Candidate Selection
  ○ Paper Ranking

Y. Jiang, A. Jia, Y. Feng, and D. Zhao, “Recommending academic papers via users’ reading purposes,” in Proceedings of the sixth ACM conference on Recommender systems, pp. 241–244, ACM, 2012.
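Of the paper similarity models listed above, TF*IDF is the simplest to sketch. The version below assumes whitespace tokenization, length-normalized term frequency, an inverse document frequency of log(N / df), and cosine similarity between the resulting vectors. These are common textbook choices rather than the exact weighting used by the cited system, and the toy abstracts are invented.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF*IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical abstracts.
abstracts = [
    "citation graph ranking for papers",
    "random walk ranking on a citation graph",
    "protein folding simulation for drug design",
]
vecs = tfidf_vectors(abstracts)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The first two abstracts share three topical terms, while the first and third share only the word "for", so the first pair receives the higher similarity.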

Conclusion


Conclusion
● Content-Based Filtering Methods
  ○ Citation-Based Methods
    ■ Use existing citation graph to make recommendations
  ○ Context-Aware Methods
    ■ A subset of citation-based methods
    ■ Use words around a given citation to recommend papers to cite
  ○ Keyword-Based Methods
    ■ Analyze various components of paper (title and abstract) to make recommendations


Project Idea


Brief Description of Project
● Content-Based Filtering Method
  ○ Title, abstract, and body of user input paper
● Create a graph of papers in dataset
  ○ Edge exists if fused similarity metric between papers exceeds a specified threshold
● Compare input paper to those in graph
  ○ Rank via highest similarity based on fused metric
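The slides do not define the fused metric, so the sketch below makes its own assumptions: each paper is a dictionary of text fields, per-field similarity is Jaccard overlap of word sets, and the fused score is a weighted sum over fields. The weights, threshold, field names, and function names are all hypothetical, and the body field is left out of the toy data for brevity.

```python
def jaccard(text_a, text_b):
    """Jaccard overlap between the word sets of two texts."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def fused_similarity(p, q, weights):
    """Weighted sum of per-field similarities (assumed fusion rule)."""
    return sum(w * jaccard(p[field], q[field]) for field, w in weights.items())


def build_graph(papers, weights, threshold):
    """Return {(id_a, id_b): score} for pairs whose fused score exceeds threshold."""
    ids = sorted(papers)
    edges = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = fused_similarity(papers[a], papers[b], weights)
            if score > threshold:
                edges[(a, b)] = score
    return edges


def rank_papers(input_paper, papers, weights):
    """Order dataset papers by fused similarity to the input paper."""
    return sorted(
        papers,
        key=lambda pid: (-fused_similarity(input_paper, papers[pid], weights), pid),
    )


# Hypothetical dataset and input paper.
papers = {
    "p1": {"title": "citation graph ranking",
           "abstract": "ranking papers with a citation graph"},
    "p2": {"title": "citation graph search",
           "abstract": "searching papers with a citation graph"},
    "p3": {"title": "protein folding",
           "abstract": "simulating protein folding dynamics"},
}
weights = {"title": 0.5, "abstract": 0.5}
query = {"title": "graph ranking", "abstract": "ranking with a citation graph"}

print(build_graph(papers, weights, threshold=0.3))
print(rank_papers(query, papers, weights))
```

With this toy data only the two citation-graph papers are connected by an edge, and the input paper ranks them ahead of the unrelated one.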


Questions?

