Temporal Link Prediction in Knowledge Networks
Julia Perl, Jérôme Kunegis
Wikipedia Knowledge Network
The knowledge network consists of interlinked articles
Nodes = Wikipedia articles
Links = Links between Wikipedia articles
Appropriate links provide instant pathways to locations within and outside the project that are likely to increase readers' understanding of the topic at hand. When writing or editing an article, it is important to consider not only what to put in the article, but what links to include to help the reader find related information [...] An article is said to be underlinked if words are not linked that are needed to aid understanding of the article. [...] An overlinked article contains an excessive number of links, making it difficult to identify links likely to aid the reader's understanding significantly. [http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking, last accessed Dec. 14, 2013]
Our Research Questions
How to predict new interlinks between articles to avoid underlinking? (Link Prediction)
How to predict interlinks between articles that should be removed to avoid overlinking or wrong links? (Unlink Prediction)
Hypothesis: Structural changes can be predicted from the network structure.
Link and Unlink Prediction
[Figure: link additions and removals during the training period. Panels: "Link Prediction Problem" and "Unlink Prediction Problem" in "The Snapshot View"]
Unlink prediction is more difficult than link prediction
The snapshot view does not provide information on links that have been removed.
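The point about snapshots can be illustrated with a minimal sketch. It assumes that two snapshots of the network are available as sets of undirected links; the function name `diff_snapshots` and the toy data are hypothetical and not taken from the slides.

```python
def diff_snapshots(old_links, new_links):
    """Compare two snapshots, each a set of frozenset({i, j}) links."""
    additions = new_links - old_links   # links that appeared
    removals = old_links - new_links    # links that disappeared
    return additions, removals


def link(i, j):
    """Helper: undirected link as a hashable frozenset (an assumption)."""
    return frozenset((i, j))


# Toy data: the link 0-1 is removed and the link 2-3 is added.
snapshot_old = {link(0, 1), link(0, 2), link(1, 2)}
snapshot_new = {link(0, 2), link(1, 2), link(2, 3)}

additions, removals = diff_snapshots(snapshot_old, snapshot_new)
```

Looking at `snapshot_new` alone, nothing indicates that the link 0-1 ever existed; the removal only becomes visible once an earlier state is kept.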
Temporal Link and Unlink Prediction
Prediction Models
Model 0: Baseline Model. Snapshot model: measures computed from the adjacency matrix
Model 1: Add-Remove Model. Classic adjacency matrix and removal adjacency matrix
Model 2: Temporal Add-Remove Model. Temporal values in the adjacency and removal adjacency matrices
Model 3: Temporal Preferential Attachment & Preferential Detachment. Estimate growth and decay for each node based on temporal evolution
[Diagram labels: Snapshot View, Temporal data]
Hypothesis: Usage of temporal information improves the classification of links and unlinks significantly.
Model 0: Baseline Model
Snapshot View: all measures are computed from the adjacency matrix
Adjacency matrix A (training period)
Compute characteristics from A:
d(i): degree of article i
CN(i,j): number of common neighbors of articles i and j
P3(i,j): number of paths of length 3 between articles i and j
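The three snapshot measures can be sketched with plain matrix powers. This is only an illustration under stated assumptions: the network is treated as undirected, the toy matrix and the function names are hypothetical, and the third power of A counts walks of length 3, which may revisit nodes, so it is an approximation of the path count named on the slide.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def snapshot_features(A):
    """Return degrees, common-neighbor counts (A^2) and length-3 walk counts (A^3)."""
    A2 = matmul(A, A)
    A3 = matmul(A2, A)
    degree = [sum(row) for row in A]
    return degree, A2, A3


# Toy undirected network with links 0-1, 0-2, 1-2, 2-3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

degree, CN, P3 = snapshot_features(A)
# CN[0][3] is 1 (shared neighbor 2); P3[0][3] is 1 (the walk 0-1-2-3).
```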
Model 1: Add-Remove Model
Classic adjacency matrix and removal adjacency matrix
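Adjacency matrix A
Removal adjacency matrix A⁻
Compute characteristics from A⁻:
d⁻(i): remove-degree of article i
dRatio(i): ratio of removed to added links of article i
CN⁻(i,j): number of common neighbors of articles i and j in A⁻
P3⁻(i,j): number of paths of length 3 between articles i and j in A⁻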
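A minimal sketch of the add-remove features, assuming the edit history is available as a list of (i, j, kind) events on an undirected network. The event format, the function name and the choice to count every link that was ever added when forming the ratio are assumptions made for illustration, not details given on the slides.

```python
def add_remove_features(n, events):
    """events: iterable of (i, j, kind) with kind "add" or "remove"."""
    added = [[0] * n for _ in range(n)]    # 1 if the link was ever added
    removed = [[0] * n for _ in range(n)]  # 1 if the link was ever removed
    for i, j, kind in events:
        target = added if kind == "add" else removed
        target[i][j] = target[j][i] = 1

    add_degree = [sum(row) for row in added]
    remove_degree = [sum(row) for row in removed]
    # Ratio of removals to additions per article; 0.0 if nothing was added.
    d_ratio = [r / a if a else 0.0 for r, a in zip(remove_degree, add_degree)]
    # Common neighbors in the removal matrix: shared "removed" partners.
    cn_removed = [[sum(removed[i][k] * removed[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
    return remove_degree, d_ratio, cn_removed


events = [
    (0, 1, "add"), (0, 2, "add"), (1, 2, "add"), (2, 3, "add"),
    (0, 1, "remove"), (0, 2, "remove"),
]
remove_degree, d_ratio, cn_removed = add_remove_features(4, events)
```

In this toy history, article 0 lost both of its links, so its ratio is 1.0, and articles 1 and 2 share article 0 as a common neighbor in the removal matrix.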
Model 2: Temporal Add-Remove Model
Temporal Values in adjacency and removal adjacency matrix
Distinguishes whether two articles connected to the same other articles recently or long ago.
More recent common neighbors → higher likelihood of a link
Functions decreasing in time (plot labels: Seconds, Years)
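The idea of time-weighted common neighbors can be sketched as follows. The slides do not preserve which decreasing functions were used, so the exponential decay with a half-life is an assumption chosen purely for illustration; the function names and the toy timestamps are hypothetical as well.

```python
YEAR = 365 * 24 * 3600  # seconds per year (ignoring leap years)


def decay(age_seconds, half_life_seconds):
    """Assumed weighting function: halves with every half-life that passes."""
    return 0.5 ** (age_seconds / half_life_seconds)


def temporal_common_neighbors(link_times, i, j, now, half_life):
    """link_times maps frozenset({u, v}) to the link's creation time in seconds."""
    def weight(u, v):
        t = link_times.get(frozenset((u, v)))
        return 0.0 if t is None else decay(now - t, half_life)

    nodes = {u for edge in link_times for u in edge}
    # Each common neighbor k contributes the product of its two link weights.
    return sum(weight(i, k) * weight(k, j) for k in nodes - {i, j})


now = 10 * YEAR
link_times = {
    frozenset((0, 2)): 9 * YEAR,  # articles 0 and 1 linked to 2 one year ago
    frozenset((1, 2)): 9 * YEAR,
    frozenset((4, 3)): 1 * YEAR,  # articles 4 and 5 linked to 3 nine years ago
    frozenset((5, 3)): 1 * YEAR,
}

recent = temporal_common_neighbors(link_times, 0, 1, now, half_life=YEAR)
old = temporal_common_neighbors(link_times, 4, 5, now, half_life=YEAR)
```

Both pairs have exactly one common neighbor, so a snapshot measure would rate them equally; with the decay applied, the recently connected pair scores far higher.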
Model 3: Temporal PA & PD
Estimate growth and decay for each node based on temporal evolution
Preferential Attachment (PA): the number of new links is proportional to the node degree
Classic PA disregards temporal evolution
Based on temporal evolution, estimate the number of
new links (Link Prediction)
removed links (Unlink Prediction)
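One way to picture per-node growth and decay estimates is to count each node's added and removed links inside a time window and divide by the window length. The slides do not say how the estimates are actually computed, so the windowed rate, the product score for a pair, and all names below are assumptions for illustration only.

```python
from collections import Counter


def node_rates(events, window_start, window_end):
    """events: iterable of (time, i, j, kind) with kind "add" or "remove"."""
    span = window_end - window_start
    adds, removes = Counter(), Counter()
    for time, i, j, kind in events:
        if window_start <= time < window_end:
            counter = adds if kind == "add" else removes
            counter[i] += 1
            counter[j] += 1
    growth = {node: count / span for node, count in adds.items()}
    decay = {node: count / span for node, count in removes.items()}
    return growth, decay


def pair_score(rates, i, j):
    """PA-style score: product of the two nodes' estimated rates."""
    return rates.get(i, 0.0) * rates.get(j, 0.0)


events = [
    (1, 0, 1, "add"), (2, 0, 2, "add"), (3, 0, 3, "add"),
    (4, 1, 2, "remove"), (5, 1, 3, "remove"),
]
growth, decay = node_rates(events, window_start=0, window_end=10)

link_score = pair_score(growth, 0, 1)    # used for link prediction
unlink_score = pair_score(decay, 1, 2)   # used for unlink prediction
```

Using the growth rates gives a temporal analogue of preferential attachment, and using the decay rates gives the corresponding detachment score.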
Set Up
Five large Wikipedia datasets
Our datasets comprise several years (up to ten) of data.
Compute AUC-value of features for Link and Unlink Prediction
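For a single feature, the AUC can be computed directly from its definition as the probability that a positive pair receives a higher score than a negative pair. The sketch below is a plain pairwise implementation with ties counted as one half; the function name and the toy scores are hypothetical, and a library routine would be preferable for large datasets.

```python
def auc(scores, labels):
    """Area under the ROC curve; labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy feature values for four candidate article pairs.
feature_values = [3, 1, 2, 0]
perfect = auc(feature_values, [1, 0, 1, 0])
partial = auc(feature_values, [1, 1, 0, 0])
```

A value of 0.5 corresponds to a feature that ranks no better than chance, and 1.0 to a feature that separates the two classes completely.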
Ready for your feedback and other interesting datasets :)
Julia [email protected] Workshop 12/16/13
Julia [email protected] 05/14/2013