
Page 1: Context Based Citation Recommendation

CONTEXT BASED CITATION RECOMMENDATION

Which work are you referring to? Know similar works …

Guided By : Dr. Animesh Mukherjee & Dr. Pawan Goyal

Page 2: Context Based Citation Recommendation

Citation and Citation Context

• Citations – crucial for assigning academic credit; they help support claims in one's own work.

• Citation Context (c) – sequence of words that appear around a particular citation.

• Citation context contains words that describe or summarize the cited papers.

• The semantics of a cited document should be close to those of its citation context.[1]

• Citation Recommendation Engine – helps check the completeness of citations when authoring a paper, find prior work related to the topic under investigation, and find missing relevant citations.[1]

[1] http://www.cse.psu.edu/~zzw109/pubs/AAAI2015-NeuralProbabilisticModelCitationRecommendation.pdf

Page 4: Context Based Citation Recommendation

Motivation and Approach

• Motivation : Since all the words in the citation context are used to describe the same citation, the semantics of these words should be similar.

• This motivates learning semantic embeddings for the words in citation contexts and for cited documents, and recommending citations based on semantic distance.

• We propose to learn distributed semantic representations of words and documents.

• Using these representations, we train a neural network model that estimates the probability of citing a paper given a citation context.[1]

• The neural network model tunes the distributed representations of words and documents so that the semantic similarity between a citation context and the cited paper is high.[1]


Page 5: Context Based Citation Recommendation

Related work : Global Citation Recommendation

[1] McNee et al. (2002) : www-users.cs.umn.edu/~mcnee/konstan-nectar-2006.pdf

• Global recommendation suggests a list of references for an entire given manuscript.

McNee et al. (2002) used a partial list of references as the input query and recommended additional references based on collaborative filtering.[1]

Strohman et al. (2007) assumed that the input is an incomplete paper manuscript and recommended papers with high text and bibliography similarity.

Bethard and Jurafsky (2010) introduced features such as topical similarity, author behavioral patterns, citation count, and recency of publication to train a classifier for global recommendation.

Page 6: Context Based Citation Recommendation

Related work : Local Citation Recommendation


• The input query is a particular context in which a citation should be made; the output is a short list of papers that should be cited in that context.

He et al. (2010) assumed that the user has provided placeholders for citations in a query manuscript. A probabilistic model was trained to measure the relevance between citation contexts and the cited documents.[1]

In another work (He et al. 2011), they first segmented the document into a sequence of disjointed candidate citation contexts, and then applied the dependency feature model to retrieve a ranked list of papers for each context.[1]

Lu et al. (2011) used the IBM translation model-1 to learn the translation probabilities between words in the citation context and words in the cited document, i.e., the probability of citing a document given a word.[1]

Page 7: Context Based Citation Recommendation

Citation Recommendation Model : Problem Definition


• We propose to model the citation context given the cited paper: p(c|d).

• Given a dataset of training samples with |C| pairs of citation context and cited document, the objective is to maximize the log-likelihood:[1]

  max Σ_{(c,d) ∈ C} log p(c|d)

• By assuming that the words in a citation context are mutually conditionally independent given the cited document:

  p(c|d) = Π_{w ∈ c} p(w|d)

• The objective function can now be written as:

  max Σ_{(c,d) ∈ C} Σ_{w ∈ c} log p(w|d)
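As a concrete illustration, the objective above can be evaluated for a toy corpus. This is a minimal sketch, not the paper's implementation; `p_w_given_d` is a hypothetical stand-in for any model of p(w|d):

```python
import numpy as np

def log_likelihood(pairs, p_w_given_d):
    """Log-likelihood of citation contexts under the independence assumption.

    pairs       -- list of (context_words, doc_id) training pairs
    p_w_given_d -- hypothetical callable returning p(w | d)
    """
    total = 0.0
    for words, d in pairs:
        # log p(c|d) = sum over context words of log p(w|d)
        total += sum(np.log(p_w_given_d(w, d)) for w in words)
    return total
```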

Page 8: Context Based Citation Recommendation

[2] http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

Neural Probabilistic Model

• In a neural probabilistic model, the conditional probability p(w|d) can be defined using a softmax function:[1,2]

  p(w|d) = exp(s(w, d; θ)) / Σ_{i=1..|V|} exp(s(wᵢ, d; θ))

where |V| is the size of the vocabulary consisting of all words that appear in the citation contexts, and s(·) is a neural-network-based scoring function with parameters θ.

The scoring function is defined as:

  s(w, d; θ) = σ(f(w) · f(d))

where σ is a logistic function that rescales the inner product of the word representation f(w) and the document representation f(d) to (0, 1).
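A minimal numerical sketch of this softmax, assuming each vocabulary word and the document are already embedded as n-dimensional vectors (`word_vecs` holds one hypothetical row f(w) per word):

```python
import numpy as np

def sigmoid(x):
    """Logistic function rescaling scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def p_w_given_d(word_vecs, doc_vec):
    """Softmax over s(w, d) = sigmoid(f(w) . f(d)) across the vocabulary."""
    scores = sigmoid(word_vecs @ doc_vec)  # one score per vocabulary word
    exp_s = np.exp(scores)
    return exp_s / exp_s.sum()             # normalized probabilities
```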

Page 9: Context Based Citation Recommendation

[3] http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

Word Representation Learning

• Negative sampling is used to learn the distributed representations of words from their surrounding words.[1]

• Given a pair of citation context c and cited document d, the skip-gram model goes through all the words in the citation context using a sliding window.[1]

• For each word wᵢ that appears in citation context c, the words that appear within M words before/after it are treated as positive samples. Suppose wⱼ is one of the positive samples for wᵢ. The training objective is defined as:

  log σ(f(wⱼ) · f(wᵢ)) + Σ_{t=1..k} E_{wₜ ~ Pₙ(w)} [ log σ(−f(wₜ) · f(wᵢ)) ]

where wₜ is a negative sample randomly drawn from the noise distribution Pₙ(w), and k is the number of negative samples.[1,2,3]
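This objective can be sketched numerically; `vecs` is a hypothetical embedding matrix and the k negative word ids are assumed to be pre-drawn from Pₙ(w):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(vecs, center, positive, negatives):
    """Skip-gram negative-sampling objective for one (center, positive) pair.

    vecs      -- embedding matrix, one row f(w) per vocabulary word
    negatives -- k word ids assumed drawn from the noise distribution
    """
    pos = np.log(sigmoid(vecs[positive] @ vecs[center]))
    neg = sum(np.log(sigmoid(-vecs[n] @ vecs[center])) for n in negatives)
    return pos + neg  # maximized during training
```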

Page 10: Context Based Citation Recommendation


Document Representation Learning

• Given a pair of citation context c and cited document d, we assume that each word w that appears in the context c and the cited document d follows the real data distribution p̃(w|d).[1,2]

• Any other random words are noise generated from a noise distribution pₙ(w).

• Assumption: the size of the noise data is k times the size of the real data.

• The posterior probabilities that a pair of word w and document d comes from the real/noise data distribution are:

  p(real | w, d) = p̃(w|d) / (p̃(w|d) + k·pₙ(w))
  p(noise | w, d) = k·pₙ(w) / (p̃(w|d) + k·pₙ(w))

Page 11: Context Based Citation Recommendation

Document Representation Learning (Cont …)


• Since we want to fit the neural probabilistic model p(w|d; θ) to the real distribution p̃(w|d), we can rewrite:

  p(real | w, d) = p(w|d; θ) / (p(w|d; θ) + k·pₙ(w))

• Given each word wᵢ that appears in the citation context c and the cited document d, we compute its contribution to the log-likelihood, along with k randomly generated noise words wᵢ,₁ … wᵢ,ₖ, using the following objective function:

  J = log p(real | wᵢ, d) + Σ_{t=1..k} log p(noise | wᵢ,ₜ, d)

• The training objective is to learn both the word representations and the document representations that maximize this objective.[1]
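The per-word contribution can be sketched directly from the posterior formulas above; all probability values here are hypothetical inputs rather than outputs of a trained model:

```python
import numpy as np

def nce_contribution(p_model, p_noise, noise_p_model, noise_p_noise, k):
    """NCE contribution of one observed word plus its k noise words:
    log p(real | w_i, d) + sum over t of log p(noise | w_{i,t}, d).

    p_model / p_noise           -- p(w_i|d; theta) and p_n(w_i) for the real word
    noise_p_model/noise_p_noise -- the same quantities for each noise word
    """
    obj = np.log(p_model / (p_model + k * p_noise))
    for pm, pn in zip(noise_p_model, noise_p_noise):
        obj += np.log(k * pn / (pm + k * pn))
    return obj
```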

Page 12: Context Based Citation Recommendation


Document Representation Learning (Cont …)

• Noise-contrastive estimation treats the normalization constant of p(w|d; θ) as a constant Z(d), so we can rewrite the softmax[1,2]

  p(w|d; θ) = exp(s(w, d; θ)) / Σ_{i=1..|V|} exp(s(wᵢ, d; θ))

as

  p(w|d; θ) = exp(s(w, d; θ)) / Z(d)

• The parameters of the neural network are θ and Z(d), where the projection function f in θ maps words and documents into the n-dimensional space.

• During learning we tune both the document representation matrix and the word representation matrix to optimize the probability p(w|d; θ).

Page 14: Context Based Citation Recommendation

Citation Recommendation


• Using the fine-tuned word and document representations, we can get the normalized probability distribution p(w|d).

• The table of p(d|w) is pre-calculated using Bayes' rule and stored as an inverted index.

• Given a query q = [w₁, …, w_|q|], the task is to recommend a list of documents R = [d₁, …, d_|R|] that should be cited.[1]

• We use term frequency – inverse context frequency (TF-ICF) to weight the query words when scoring candidate documents.[1]
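The recommendation step can be sketched as a weighted lookup in the inverted index; the `inverted_index` and `icf` tables are hypothetical stand-ins for the precomputed p(d|w) and TF-ICF statistics:

```python
from collections import defaultdict

def recommend(query_words, inverted_index, icf, top_n=10):
    """Rank candidate documents by sum over query words of tf-icf(w) * p(d|w).

    inverted_index -- word -> {doc_id: p(d|w)}  (hypothetical precomputed table)
    icf            -- word -> inverse context frequency (hypothetical)
    """
    tf = defaultdict(int)
    for w in query_words:
        tf[w] += 1                      # term frequency within the query
    scores = defaultdict(float)
    for w, n in tf.items():
        weight = n * icf.get(w, 0.0)    # tf-icf weight for this query word
        for d, p in inverted_index.get(w, {}).items():
            scores[d] += weight * p
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```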

Page 15: Context Based Citation Recommendation

Experiment (Data and Metrics)


• Snapshot of the CiteSeer paper and citation database – Oct. 2013.

• The dataset is split into two parts – papers crawled before 2011 (training data) and papers crawled after 2011 (testing data).

• Citations are extracted along with their citation contexts (one sentence before and after the sentence where the citation occurs).

• Number of citation-context/citation pairs: |C| = 8,992,476 (training set) and 1,628,698 (testing set).[1]

• The number of recommendations is limited to 10 per query. The dimension of the representation vectors is set to n = 600 and the number of negative samples to k = 10.[1]

• Metrics : MAP, Recall, MRR, nDCG.
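As an example of one of these metrics, MRR can be computed as the average reciprocal rank of the first correctly recommended paper per query; this is a minimal sketch, not tied to any particular evaluation toolkit:

```python
def mean_reciprocal_rank(rankings, relevant):
    """MRR over queries: reciprocal rank of the first relevant document
    in each ranked list (0 if none of the relevant documents appears)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        total += next((1.0 / i for i, d in enumerate(ranking, 1) if d in rel), 0.0)
    return total / len(rankings)
```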

Page 19: Context Based Citation Recommendation

Conclusion and Future Work

• Used the distributed representations of words and documents to build a neural network based citation recommendation model.

• The proposed model then learns the word and document representations to calculate the probability of citing a document given a citation context.

• A comparative study on a snapshot of CiteSeer dataset with existing state-of-the-art methods showed that the proposed model significantly improves the quality of context-based citation recommendation.

• Since only neural probabilistic models were considered for context-based local recommendation, a model that combines local and global recommendation could be explored.

Page 20: Context Based Citation Recommendation

References

1. Wenyi Huang, Zhaohui Wu, Chen Liang, Prasenjit Mitra, C. Lee Giles - A Neural Probabilistic Model for Context Based Citation Recommendation - http://www.cse.psu.edu/~zzw109/pubs/AAAI2015NeuralProbabilisticModelCitationRecommendation.pdf

2. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean - Distributed Representations of Words and Phrases and their Compositionality - http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

3. McNee et al. (2002) : www-users.cs.umn.edu/~mcnee/konstan-nectar-2006.pdf

4. Deep Learning, NLP, and Representations – (Posted : July 7, 2014) http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

5. Standard IR measures : https://en.wikipedia.org/wiki/Information_retrieval#Performance_and_correctness_measures

Page 21: Context Based Citation Recommendation

Any Questions?

Page 22: Context Based Citation Recommendation

Thank You !!