COSS, Machine Learning and Modelling for Social Networks. Lloyd Sanders, Olivia Woolley, Iza Moize, Nino Antulov-Fantulin. D-GESS: Computational Social Science. Introduction to Link Prediction.


Page 1: Introduction to Link Prediction - ETH Zürich


Machine Learning and Modelling for Social Networks Lloyd Sanders, Olivia Woolley, Iza Moize, Nino Antulov-Fantulin D-GESS: Computational Social Science

Introduction to Link Prediction

Page 2:

§ What is link prediction?
§ What are some examples of link prediction?
§ A How To
§ The setup
§ Local Methods
§ Global Methods
§ Other Methods
§ Challenges of Link Prediction
§ References

Overview

Page 3:

Abstractly: Given a snapshot of a network, can one predict the next most likely links to form in the network?


Statement of the Problem

http://ifisc.uib-csic.es/~jramasco/ComplexNets.html

Zachary Karate Club

Page 4:

Link Prediction in Industry (Recommender Systems)

Proposing matches

Proposing friendships

Proposing items to purchase

Page 5:

Link Prediction in Science

Artist’s rendition of the human metabolic network (Wikipedia)

Page 6:

Link Prediction in Science

Wikipedia

Investigating link connections in biological networks is costly and time-consuming.

Page 7:

Link Prediction

RESEARCH ARTICLE

Link Prediction in Criminal Networks: A Tool for Criminal Intelligence Analysis
Giulia Berlusconi¹, Francesco Calderoni¹*, Nicola Parolini², Marco Verani², Carlo Piccardi³*

1 Università Cattolica del Sacro Cuore and Transcrime, Milano, Italy; 2 MOX, Department of Mathematics, Politecnico di Milano, Milano, Italy; 3 Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy

* [email protected] (FC); [email protected] (CP)

Abstract: The problem of link prediction has recently received increasing attention from scholars in network science. In social network analysis, one of its aims is to recover missing links, namely connections among actors which are likely to exist but have not been reported because data are incomplete or subject to various types of uncertainty. In the field of criminal investigations, problems of incomplete information are encountered almost by definition, given the obvious anti-detection strategies set up by criminals and the limited investigative resources. In this paper, we work on a specific dataset obtained from a real investigation, and we propose a strategy to identify missing links in a criminal network on the basis of the topological analysis of the links classified as marginal, i.e. removed during the investigation procedure. The main assumption is that missing links should have opposite features with respect to marginal ones. Measures of node similarity turn out to provide the best characterization in this sense. The inspection of the judicial source documents confirms that the predicted links, in most instances, do relate actors with large likelihood of co-participation in illicit activities.

Introduction: Criminal intelligence analysis aims at supporting investigations, e.g. by producing link charts to identify and target key actors. Law enforcement agencies increasingly use Social Network Analysis (SNA) for criminal intelligence, analyzing the relations among individuals based on information on activities, events, and places derived from various investigative activities [1–3]. SNA provides added value compared to more traditional approaches like link analysis, by enabling in-depth assessment of the internal structure of criminal groups and by providing strategic and tactical advantages. For instance, SNA can inform law enforcement officers in the identification of aliases during large investigations and in the collection of evidence for prosecution [2]. Furthermore, the network analysis of criminal groups under investigation may help identify effective strategies to achieve network destabilization or disruption [3, 4].

Citation: Berlusconi G, Calderoni F, Parolini N, Verani M, Piccardi C (2016) Link Prediction in Criminal Networks: A Tool for Criminal Intelligence Analysis. PLoS ONE 11(4): e0154244. doi:10.1371/journal.pone.0154244. Published: April 22, 2016. Open access (Creative Commons Attribution License). Data Availability: All network data are available at Figshare (https://dx.doi.org/10.6084/m9.figshare.3156067). Funding: The Polisocial Award program is the only funding source; it supports the authors NP and MV. Competing Interests: The authors have declared that no competing interests exist.

Possible missing links predicted between suspects in organized crime

Page 8:

Link prediction is used to predict future links in a network (e.g., Facebook friendships), or to predict missing links due to incomplete data (e.g., food webs; this is related to the sampling issues Olivia spoke of earlier).

Link Prediction

Page 9:

Abstractly: Given a snapshot of a network, can one predict the next most likely links to form in the network?


Statement of the Problem

Graph:

\[ G = (V, E), \qquad G_t = (V_t, E_t), \qquad G_{t'} = (V_t, E_{t'}) \]

The number of all possible links:

\[ |U| = \frac{|V|\,(|V| - 1)}{2} \]

Edge, defined by two nodes:

\[ e = (x, y) \]

Similarity score of two nodes:

\[ s_{xy} \]

Page 10:

§ For a given graph
§ Split the data into a training set and a test set
§ Choose a link prediction algorithm
§ Run the algorithm on the training set, and test it on the test set
§ Check the accuracy
§ Compare other link prediction algorithms

How To: The Setup
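The recipe above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: it assumes networkx and its built-in Zachary Karate Club graph, and uses common neighbours as the example algorithm.

```python
import random
import networkx as nx

# Minimal sketch of the setup: split the edges into a training set E^T and
# a probe set E^P, score non-observed pairs on the training graph, rank them.
G = nx.karate_club_graph()
edges = list(G.edges())
random.seed(42)
random.shuffle(edges)

n_probe = len(edges) // 10      # hold out ~10% of edges as the probe set
probe = set(edges[:n_probe])    # E^P: edges hidden from the algorithm
train = edges[n_probe:]         # E^T: edges the algorithm is allowed to see

G_train = nx.Graph()
G_train.add_nodes_from(G.nodes())
G_train.add_edges_from(train)

# Score every non-observed pair by its number of common neighbours,
# then rank descending: the top of the list is the prediction.
scores = {(u, v): len(list(nx.common_neighbors(G_train, u, v)))
          for u, v in nx.non_edges(G_train)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[:n_probe])   # top-n predicted edges, to be compared with E^P
```

Swapping in another scoring function is the only change needed to compare algorithms on the same split.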

Page 11:

How To: The Setup

Split Edges into a training and test (probe) set

Set of all possible edges on graph

\[ E = E^{T} \cup E^{P} \]

\[ |U| = \frac{|V|\,(|V| - 1)}{2} \]

Page 12:

The link prediction algorithm outputs a ranked list of edges, with those most likely to appear at the top, in descending order of score.

How To: Output of algorithm

Taking the first n links from the list and computing their intersection with the probe set (also of size n) gives a simple measure of accuracy.
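This top-n overlap is easy to compute. A hedged sketch follows; the `precision_at_n` helper and the toy `ranked` and `probe` values are illustrative, not from the lecture.

```python
# Sketch: precision@n as the overlap between the top-n ranked predictions
# and the probe set, with n = |probe|. Toy data below is illustrative.
def precision_at_n(ranked, probe):
    """Fraction of the top-|probe| predictions that are true probe edges."""
    n = len(probe)
    # Treat edges as unordered pairs so (u, v) and (v, u) match.
    top = {frozenset(e) for e in ranked[:n]}
    hits = top & {frozenset(e) for e in probe}
    return len(hits) / n

ranked = [(1, 4), (4, 5), (1, 2), (3, 4)]   # toy ranked predictions
probe = [(4, 5), (1, 3)]                    # toy held-out probe set
print(precision_at_n(ranked, probe))        # (1, 4) misses, (4, 5) hits -> 0.5
```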

Page 13:

Local methods predict links based solely on a node's local contact structure. The main idea is that of triangle closing.

Similarity measures: Local

Next most likely link?

Page 14:

Local methods predict links based solely on a node's local contact structure. The main idea is that of triangle closing.

Similarity measures: Local

Page 15:

How to: Similarity Measures

L. Lü, T. Zhou / Physica A 390 (2011) 1150–1170 1153

s45 = 0.6. To calculate AUC, we need to compare the scores of a probe link and a nonexistent link. There are six pairs in total: s13 > s12, s13 < s14, s13 = s34, s45 > s12, s45 = s14 and s45 > s34. Hence, the AUC value equals (3 × 1 + 2 × 0.5)/6 ≈ 0.67. For precision, if L = 2, the predicted links are (1, 4) and (4, 5). Clearly, the former is wrong while the latter is right, and thus the precision equals 0.5.

3. Similarity-based algorithms

The simplest framework of link prediction methods is the similarity-based algorithm, where each pair of nodes, x and y, is assigned a score s_xy, which is directly defined as the similarity (or called proximity in the literature) between x and y. All non-observed links are ranked according to their scores, and the links connecting more similar nodes are supposed to be of higher existence likelihoods. In spite of its simplicity, the study of similarity-based algorithms is the mainstream issue. In fact, the definition of node similarity is a nontrivial challenge. A similarity index can be very simple or very complicated, and it may work well for some networks while failing for others. In addition, the similarities can be used in a more skilled way, such as being locally integrated under the collaborative filtering⁵ framework [34].

Node similarity can be defined by using the essential attributes of nodes: two nodes are considered to be similar if they have many common features [35]. However, the attributes of nodes are generally hidden, and thus we focus on another group of similarity indices, called structural similarity, which is based solely on the network structure. The structural similarity indices can be classified in various ways, such as local vs. global, parameter-free vs. parameter-dependent, node-dependent vs. path-dependent, and so on. The similarity indices can also be classified as structural equivalence and regular equivalence. The former embodies a latent assumption that the link itself indicates a similarity between two endpoints (see, for example, the Leicht–Holme–Newman index [36] and transferring similarity [37]), while the latter assumes that two nodes are similar if their neighbors are similar. Readers are encouraged to see Ref. [38] for the mathematical definition of regular equivalence and Ref. [39] for a recent application on the prediction of protein functions.

Here we adopt the simplest method, where 20 similarity indices are classified into three categories: the first 10 are local indices, followed by 7 global indices, and the last 3 are quasi-local indices, which do not require global topological information but make use of more information than local indices.

3.1. Local similarity indices

(1) Common Neighbors (CN). For a node x, let Γ(x) denote the set of neighbors of x. In common sense, two nodes, x and y, are more likely to have a link if they have many common neighbors. The simplest measure of this neighborhood overlap is the direct count, namely

\[ s^{CN}_{xy} = |\Gamma(x) \cap \Gamma(y)| \quad (2) \]

where |Q| is the cardinality of the set Q. It is obvious that s_xy = (A²)_xy, where A is the adjacency matrix: A_xy = 1 if x and y are directly connected and A_xy = 0 otherwise. Note that (A²)_xy is also the number of different paths of length 2 connecting x and y. Newman [40] used this quantity in the study of collaboration networks, showing a positive correlation between the number of common neighbors and the probability that two scientists will collaborate in the future. Kossinets and Watts [14] analyzed a large-scale social network, suggesting that two students having many mutual friends are very probable to be friends in the future. The following six indices are also based on the number of common neighbors, yet with different normalization methods.

(2) Salton Index [6]. It is defined as

\[ s^{Salton}_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\sqrt{k_x \times k_y}} \quad (3) \]

where k_x is the degree of node x. The Salton index is also called the cosine similarity in the literature.

(3) Jaccard Index [41]. This index was proposed by Jaccard over a hundred years ago, and is defined as

\[ s^{Jaccard}_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{|\Gamma(x) \cup \Gamma(y)|} \quad (4) \]

(4) Sørensen Index [42]. This index is used mainly for ecological community data, and is defined as

\[ s^{Sørensen}_{xy} = \frac{2\,|\Gamma(x) \cap \Gamma(y)|}{k_x + k_y} \quad (5) \]

⁵ Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. [33].

Common Neighbours

Counting number of paths of a certain length

Γ(x) is the set of nearest neighbours of x; its cardinality is the degree of the node.
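The AUC calculation in the excerpt can be checked directly. In the sketch below, the individual scores s12, s13, s14, s34 are assumed values chosen only to reproduce the excerpt's inequalities (the excerpt fixes s45 = 0.6 but not all of the others).

```python
# Sketch of the AUC measure from the excerpt: compare the score of every
# probe link against every nonexistent link, counting 1 when the probe
# score is higher and 0.5 for a tie. Scores are illustrative assumptions.
probe_scores = [0.4, 0.6]             # s13, s45 (probe links)
nonexistent_scores = [0.2, 0.6, 0.4]  # s12, s14, s34 (nonexistent links)

n, n_higher, n_tied = 0, 0, 0
for sp in probe_scores:
    for sn in nonexistent_scores:
        n += 1
        if sp > sn:
            n_higher += 1
        elif sp == sn:
            n_tied += 1

auc = (n_higher + 0.5 * n_tied) / n
print(round(auc, 2))   # (3 * 1 + 2 * 0.5) / 6 ≈ 0.67, as in the excerpt
```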

Page 16:

How to: Similarity Measures

(5) Hub Promoted Index (HPI) [43]. This index is proposed for quantifying the topological overlap of pairs of substrates in metabolic networks, and is defined as

\[ s^{HPI}_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\min\{k_x, k_y\}} \quad (6) \]

Under this measurement, the links adjacent to hubs are likely to be assigned high scores since the denominator is determined by the lower degree only.

(6) Hub Depressed Index (HDI). Analogously to the above index, we also consider a measurement with the opposite effect on hubs, defined as

\[ s^{HDI}_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{\max\{k_x, k_y\}} \quad (7) \]

(7) Leicht–Holme–Newman Index (LHN1) [36]. This index assigns high similarity to node pairs that have many common neighbors compared not to the possible maximum, but to the expected number of such neighbors. It is defined as

\[ s^{LHN1}_{xy} = \frac{|\Gamma(x) \cap \Gamma(y)|}{k_x \times k_y} \quad (8) \]

where the denominator, k_x × k_y, is proportional to the expected number of common neighbors of nodes x and y in the configuration model [44]. We use the abbreviation LHN1 to distinguish this index from another index (named the LHN2 index) also proposed by Leicht, Holme and Newman.

(8) Preferential Attachment Index (PA). The mechanism of preferential attachment can be used to generate evolving scale-free networks, where the probability that a new link is connected to the node x is proportional to k_x [45]. A similar mechanism can also lead to scale-free networks without growth [46], where at each time step, an old link is removed and a new link is generated. The probability that this new link will connect x and y is proportional to k_x × k_y. Motivated by this mechanism, the corresponding similarity index can be defined as

\[ s^{PA}_{xy} = k_x \times k_y \quad (9) \]

which has been widely used to quantify the functional significance of links subject to various network-based dynamics, such as percolation [47], synchronization [48] and transportation [49]. Note that this index does not require the information of the neighborhood of each node; as a consequence, it has the least computational complexity.

(9) Adamic–Adar Index (AA) [50]. This index refines the simple counting of common neighbors by assigning the less-connected neighbors more weight, and is defined as

\[ s^{AA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z} \quad (10) \]

(10) Resource Allocation Index (RA) [51]. This index is motivated by the resource allocation dynamics on complex networks [52]. Consider a pair of nodes, x and y, which are not directly connected. The node x can send some resource to y, with their common neighbors playing the role of transmitters. In the simplest case, we assume that each transmitter has a unit of resource, and will equally distribute it to all its neighbors. The similarity between x and y can be defined as the amount of resource y received from x, which is

\[ s^{RA}_{xy} = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z} \quad (11) \]

Clearly, this measure is symmetric, namely s_xy = s_yx. Note that, although resulting from different motivations, the AA index and the RA index have very similar forms. Indeed, they both depress the contribution of the high-degree common neighbors. The AA index takes the form (log k_z)⁻¹ while the RA index takes the form k_z⁻¹. The difference is insignificant when the degree k_z is small, while it is considerable when k_z is large. In other words, the RA index punishes the high-degree common neighbors more heavily than AA.

Liben-Nowell et al. [58] and Zhou et al. [51] systematically compared a number of local similarity indices on many real networks: the former [58] focuses on social collaboration networks and the latter [51] considers disparate networks including the protein–protein interaction network, electronic grid, Internet, US airport network, etc. According to extensive experimental results on real networks (see results in Table 1), the RA index performs best, while the AA and CN indices have the second best overall performance among all the above-mentioned local indices.

The PA index has the worst overall performance, yet we are interested in it for it requires the least information. Note that PA performs even worse than pure chance for the Internet at router level and the power grid. In these two networks, the nodes have well-defined positions and the links are physical lines. Actually, geography plays a significant role and links with very long geographical distances are rare. As local centers, the high-degree nodes have longer geographical distances to each other than average, and thus have a lower probability of directly connecting to each other, which leads to the bad performance of PA.


Jaccard Index

Resource Allocation

Many more similarity measures available

Intuition: if two nodes share neighbours of low degree, they are likely to be connected; being connected through a hub is ‘meaningless’.
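The local indices above are a few lines each once you have the neighbour sets. This is a hedged sketch written directly from the definitions, on an assumed toy graph (the karate club again); networkx also ships reference implementations (`jaccard_coefficient`, `adamic_adar_index`, `resource_allocation_index`).

```python
import math
import networkx as nx

# Sketch of four local similarity indices, written from the
# neighbour-set definitions in the excerpt above.
def local_scores(G, x, y):
    cn = set(G[x]) & set(G[y])       # common neighbours
    union = set(G[x]) | set(G[y])
    return {
        "CN": len(cn),                                         # eq. (2)
        "Jaccard": len(cn) / len(union),                       # eq. (4)
        "AA": sum(1 / math.log(G.degree(z)) for z in cn),      # eq. (10)
        "RA": sum(1 / G.degree(z) for z in cn),                # eq. (11)
    }

G = nx.karate_club_graph()
print(local_scores(G, 0, 33))
```

Note that AA divides by log k_z, so it is only defined when every common neighbour has degree greater than one.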

Page 17:

§ Given the adjacency matrix, one can take the global structure of the graph into account when making predictions.
§ These are often based on path-counting measures (e.g., numbers of paths, or shortest paths, between nodes).
§ Common neighbours is simply the adjacency matrix squared.

Similarity measures: Global

\[ E^{P} \subseteq U - E^{T} \]

List output by the algorithm: links ranked from most likely to least likely to form on the graph,

\[ L : \; e_L \in U - E^{T} \]

Common neighbours as a global method:

\[ s^{CN}_{xy} = (A^2)_{xy} \]

\[ s_{xy} = \exp(\alpha A)\big|_{xy} = \sum_{i=0}^{\infty} \frac{\alpha^i}{i!}\, A^i \Big|_{xy} \]

Try it for yourself to verify it!

Page 18:

§ What about paths of greater length? How can we include those? We can weight longer paths less.

Global Methods

\[ s_{xy} = \exp(\alpha A)\big|_{xy} = \sum_{i=0}^{\infty} \frac{\alpha^i}{i!}\, A^i \Big|_{xy} \]

\[ \exp(\alpha A) = 1 + \alpha A + \frac{(\alpha A)^2}{2!} + \frac{(\alpha A)^3}{3!} + \cdots \]

Global method, splitting the data further:

\[ E^{T} = E^{T'} \cup E^{P'} \]

Suppose we want to carry out the following mapping:

\[ F(A^{T'}) \approx A^{P'} \]

\[ \min \big\| F(A^{T'}) - A^{P'} \big\|_{\mathrm{Frobenius}} \]

Page 19:

Link Prediction on Bipartite Graphs

Wikipedia

Page 20:

Link Prediction on Bipartite Graphs

Fig. 2. In this curve fitting plot of the Slovak Wikipedia, the hyperbolic sine is a good match, indicating that the hyperbolic sine pseudokernel performs well.

where coefficients are decreasing with path length. Keeping only the odd component, we arrive at the matrix hyperbolic sine [16].

\[ \sinh(\alpha A) = \sum_{i=0}^{\infty} \frac{\alpha^{1+2i}}{(1+2i)!}\, A^{1+2i} \]

Figure 2 shows the hyperbolic sine applied to the (positive) spectrum of the bipartite Slovak Wikipedia user–article edit network.

The Odd von Neumann Pseudokernel. The von Neumann kernel for unipartite graphs is given by the following expression [13].

\[ K_{\mathrm{NEU}}(A) = (I - \alpha A)^{-1} = \sum_{i=0}^{\infty} \alpha^{i} A^{i} \]

We call its odd component the odd von Neumann pseudokernel:

\[ K^{\mathrm{odd}}_{\mathrm{NEU}}(A) = \alpha A\,(I - \alpha^{2} A^{2})^{-1} = \sum_{i=0}^{\infty} \alpha^{1+2i} A^{1+2i} \]

The hyperbolic sine and von Neumann pseudokernels are compared in Figure 3, based on the path weights they produce.

Consider an odd function for the graph kernel

\[ \exp(\alpha A) = 1 + \alpha A + \frac{(\alpha A)^2}{2!} + \frac{(\alpha A)^3}{3!} + \cdots \]

Keeping only the odd terms:

\[ \sinh(\alpha A) = \alpha A + \frac{(\alpha A)^3}{3!} + \frac{(\alpha A)^5}{5!} + \cdots \]

Or something more sophisticated.

Eigenvalue decomposition:

\[ A = U \Lambda U^{\mathrm{tr}} \]

\[ \min \big\| U F(\Lambda) U^{\mathrm{tr}} - A^{P'} \big\|_{\mathrm{Frobenius}} \]

\[ \min \big\| F(\Lambda) - U^{\mathrm{tr}} A^{P'} U \big\|_{\mathrm{Frobenius}} \]
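The odd pseudokernel idea can be tried on a tiny bipartite graph. This is a hedged sketch, not the paper's code: the user-item graph and α = 0.5 are illustrative assumptions, and scipy's `sinhm` computes the matrix hyperbolic sine.

```python
import numpy as np
import networkx as nx
from scipy.linalg import sinhm

# Sketch: score node pairs of a small bipartite graph with the
# hyperbolic-sine pseudokernel sinh(alpha * A).
B = nx.Graph()
B.add_edges_from([("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")])
nodes = sorted(B.nodes())
A = nx.to_numpy_array(B, nodelist=nodes)

alpha = 0.5
S = sinhm(alpha * A)   # odd powers of A only: paths of odd length

# In a bipartite graph, odd-length paths run between the two sides, so S
# only scores cross-side pairs, e.g. user u2 with the unseen item i1
# (connected by the length-3 path u2-i2-u1-i1).
i, j = nodes.index("u2"), nodes.index("i1")
print(S[i, j])
```

Same-side entries of S are exactly zero, which is the point of restricting to odd powers: in a bipartite graph only cross-side links can ever form.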

Page 21:

Your choice of link prediction algorithm makes an implicit assumption about the graph kernel, i.e. the mechanism by which the graph grows. Different graph types (social, academic, user-item) grow under different mechanisms, so different link prediction algorithms will work better on different graphs.

Key Tip

Page 22:

Objective Function

Taking the Frobenius Norm

Page 23:

Matrix Decomposition

In this section, we describe how algebraic link prediction methods apply to bipartite networks. Let G = (V, E) be a (not necessarily bipartite) graph. Algebraic link prediction algorithms are based on the eigenvalue decomposition of its adjacency matrix A:

A = UΛU^T

To predict links, a spectral transformation is usually applied:

F(A) = U F(Λ) U^T

where F(Λ) applies a real function f(λ) to each eigenvalue λ_i. F(A) then contains link prediction scores that, for each node, give a ranking of all other nodes, which is then used for link prediction. If f(λ_i) is positive, F is a graph kernel; otherwise, we will call F a pseudokernel.

Several spectral transformations can be written as polynomials of the adjacency matrix in the following way. The matrix power A^i gives, for each vertex pair (u, v), the number of paths of length i between u and v. Therefore, a polynomial of A gives, for a pair (u, v), the sum of all paths between u and v, weighted by the polynomial coefficients. This fact can be exploited to find link prediction functions that fulfill the two following requirements:

– The link prediction score should be higher when two nodes are connected by many paths.

– The link prediction score should be higher when paths are short.

These requirements suggest the use of polynomials f with decreasing coefficients.

3.1 Odd Pseudokernels

In bipartite networks, only paths of odd length are significant, since an edge can only appear between two vertices if they are already connected by paths of odd lengths. Therefore, only odd powers are relevant, and we can restrict the spectral transformation to odd polynomials, i.e. polynomials with odd powers.

The resulting spectral transformation is then an odd function and, except in the trivial and undesired case of a constant zero function, will be negative at some point. Therefore, all spectral transformations described below are only pseudokernels and not kernels.

The Hyperbolic Sine. In unipartite networks, a basic link prediction function is given by the matrix exponential of the adjacency matrix [13–15]. The matrix exponential can be derived by considering the sum

exp(αA) = Σ_{i=0}^∞ (α^i / i!) A^i


(Spectral) Matrix Decomposition/Eigendecomposition

Adjacency matrices are real, square, and symmetric, so we can break them down into eigenvectors and eigenvalues. The set of eigenvalues is the spectrum of the graph, hence the name.

Easy computation of functions on the Adj. matrix
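Because the adjacency matrix is real and symmetric, numpy's `eigh` gives the decomposition directly, and any spectral transformation F(A) = U F(Λ) U^T is just a function applied to the eigenvalues. A small sketch (the toy adjacency matrix and the value of α are made up) computing the exponential kernel both ways:

```python
import numpy as np

# Toy symmetric adjacency matrix, made up for illustration.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Eigendecomposition A = U Λ U^T (eigh exploits symmetry).
lam, U = np.linalg.eigh(A)

# Spectral transformation: apply f(λ) = exp(αλ) to each eigenvalue.
alpha = 0.1  # illustrative decay parameter
F = U @ np.diag(np.exp(alpha * lam)) @ U.T

# Cross-check against the power series exp(αA) = 1 + αA + (αA)^2/2! + ...
series = np.zeros_like(A)
term = np.eye(A.shape[0])
for i in range(1, 30):
    series += term
    term = term @ (alpha * A) / i

print(np.allclose(F, series))  # True
```

The same pattern works for any f, e.g. f(λ) = sinh(αλ) for the odd pseudokernel on bipartite graphs.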

Page 24:

Decomposition

Page 25:

Link Prediction to Regression

Taking the F-norm is analogous to the RMSE between the two matrices. The function F acting upon Λ only affects the diagonal elements, so in minimizing the F-norm one can only adjust the diagonal of the source matrix to be equal (or close) to that of the right-hand term. Hence the problem reduces to a curve-fitting problem, as stated above.
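In code, that curve fit is a least-squares problem on the eigenvalues. The sketch below uses made-up random training/probe matrices and an arbitrary cubic for f; it illustrates the reduction, not the lecture's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric(n):
    # Made-up Erdős–Rényi-style adjacency matrix, p = 0.5, no self-loops.
    M = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
    return (M + M.T).astype(float)

n = 30
A_train = random_symmetric(n)   # stands in for A_{T'}
A_probe = random_symmetric(n)   # stands in for A_{P'}

# Diagonalize the training matrix: A_{T'} = U Λ U^T.
lam, U = np.linalg.eigh(A_train)

# min ||F(Λ) − U^T A_{P'} U||_F: F(Λ) is diagonal, so only the diagonal of the
# rotated probe matrix can be matched. Fit f(λ_i) ≈ (U^T A_{P'} U)_ii.
target = np.diag(U.T @ A_probe @ U)
f = np.poly1d(np.polyfit(lam, target, deg=3))  # cubic chosen arbitrarily

# Link prediction scores from the fitted spectral transformation.
scores = U @ np.diag(f(lam)) @ U.T
```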

Page 26:

Choosing your Graph Kernel

Kunegis et al. 2013

Page 27:

Predictions

Page 28:

§  For the various kernels it is important to check their generalization accuracy, to see which is an appropriate descriptor of the link-generation process

§  To measure the notion “accuracy”, we use a measurement for ordered lists: Mean Average Precision

§  This should be your ‘go to’ when assessing the accuracy of ranked lists.

Mean Average Precision

Page 29:

Average Precision

Average Precision is a method to evaluate the precision and ranking of a predicted list of retrieved objects.

Page 30:

Average Precision Example

Example 1: of five ranked links, the relevant ones appear at ranks 1 and 3.
Precision at each rank: 1.0, 0.5, 0.67, 0.5, 0.4
AP = ½ (1.0 + 0.67) ≈ 0.84

Example 2: the relevant links are retrieved at ranks 3 and 4 (with three relevant links in total, one never retrieved).
Precision at each rank: 0, 0, 0.33, 0.5, 0.4
AP = ⅓ (0.33 + 0.5) ≈ 0.28

MAP = (0.84 + 0.28)/2 = 0.56
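The examples above can be reproduced with a minimal AP/MAP implementation (the link identifiers below are placeholders). Note that the second list's ⅓ factor implies a third relevant link that is never retrieved.

```python
def average_precision(retrieved, relevant):
    """AP: sum of precision@k at each rank k where a relevant item appears,
    divided by the total number of relevant items."""
    if not relevant:
        return 0.0  # convention from the slides: no relevant items -> AP = 0
    hits, precisions = 0, []
    for k, item in enumerate(retrieved, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant)

# First example: relevant links at ranks 1 and 3 of five.
ap1 = average_precision(["a", "x", "b", "y", "z"], {"a", "b"})       # (1.0 + 2/3)/2
# Second example: three relevant links, two retrieved at ranks 3 and 4.
ap2 = average_precision(["x", "y", "a", "b", "z"], {"a", "b", "c"})  # (1/3 + 1/2)/3
map_score = (ap1 + ap2) / 2
```

Rounding each precision first, as the slide does, gives 0.84 and 0.28; the exact values are 0.83 and 0.28, with a MAP of about 0.56.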

Page 31:

Comparison of Algorithms

Page 32:

§  Random walks on graphs – Average Commute Time
§  Maximum Likelihood methods – constructing graphs
§  Spectral Extrapolation methods – continuing eigenvalue trends
§  And more… [Lü & Zhou, 2011]

Other Methods for Link Prediction
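As one example from the list above, the average-commute-time score can be computed from the Moore–Penrose pseudoinverse of the graph Laplacian: the commute time between x and y is proportional to l⁺_xx + l⁺_yy − 2 l⁺_xy, and a smaller commute time suggests a likelier link. A sketch on a made-up 4-node graph:

```python
import numpy as np

# Toy symmetric adjacency matrix (made up for illustration).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Graph Laplacian L = D - A and its pseudoinverse.
L = np.diag(A.sum(axis=1)) - A
L_plus = np.linalg.pinv(L)

def commute_time_score(x, y):
    # Commute time is proportional to l+_xx + l+_yy - 2 l+_xy (the effective
    # resistance); score by the reciprocal so closer nodes rank higher.
    return 1.0 / (L_plus[x, x] + L_plus[y, y] - 2 * L_plus[x, y])
```

On this graph the quantity in the denominator equals the effective resistance: 2/3 between nodes 0 and 1, 5/3 between nodes 0 and 3, so the adjacent pair scores higher.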

Page 33:

§  Cold Start Problem
§  Temporal Networks
    §  Links can be created and destroyed over time
    §  New nodes come in and out of the system
§  Link prediction with links of a different nature – e.g., ‘negative’ links

Challenges for Link Prediction

Page 34:

§  Link prediction in complex networks: A survey; Lü and Zhou; Physica A; 2011
§  The Link Prediction Problem for Social Networks; Liben-Nowell & Kleinberg
§  Spectral evolution in dynamic networks; Kunegis et al.; Knowl. Inf. Syst.; 2013
§  Online Dating Recommender Systems: The split-complex number approach; Kunegis et al.; ACM; 2012
§  Mean Average Precision
    §  https://en.wikipedia.org/wiki/Precision_and_recall
    §  https://en.wikipedia.org/wiki/Information_retrieval
    §  Victor Lavrenko: youtube.com/watch?v=pM6DJ0ZZee0
    §  https://web-beta.archive.org/web/20110504130953/http://sas.uwaterloo.ca/stats_navigation/techreports/04WorkingPapers/2004-09.pdf

References

Page 35:

Our algorithms consider only the edges that connect the same set of nodes in the training and test set – usually taken as the nodes of the giant component of the graph.

How To: The Setup – quick note

networkx.github.io

This assumes the graph is static – no new nodes enter the system

By definition, the algorithm will not predict edges for nodes outside the giant component. More about this later.
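The setup note above can be sketched as follows: find the giant component, keep only its edges, then split them into E_T and E_P. The edge list, split ratio, and seed are all made up for illustration; in practice one would use networkx for the component computation.

```python
import random
from collections import Counter

# Made-up edge list; nodes 0-5 form the giant component, 6-7 a small one.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]

# Union-find to locate connected components.
parent = {}

def find(u):
    parent.setdefault(u, u)
    while parent[u] != u:
        parent[u] = parent[parent[u]]  # path halving
        u = parent[u]
    return u

def union(u, v):
    parent[find(u)] = find(v)

for u, v in edges:
    union(u, v)

# Keep only edges inside the giant (largest) component.
sizes = Counter(find(u) for u in parent)
giant = sizes.most_common(1)[0][0]
giant_edges = [e for e in edges if find(e[0]) == giant]

# Random split E = E_T ∪ E_P (an illustrative 80/20 split).
random.seed(0)
shuffled = giant_edges[:]
random.shuffle(shuffled)
cut = int(0.8 * len(shuffled))
E_T, E_P = shuffled[:cut], shuffled[cut:]
```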

Page 36:

§  Why do you take the average of the precisions at the relevant documents, and not at all positions?
§  My intuition is about the ranking: relevant docs/links should come first. If you take the first example we had and sum the precision over all docs/links retrieved, you get a lower MAP. Suppose you have only 2 relevant docs in a list of five and you rank them first: your AP is 1, which is what you would want. But if you divide by 5 instead of 2, you get 0.4, which is not in line with what we would want from an accuracy metric for a ranked list.
§  And taking the precision at each relevant rank has the effect of weighting the order of the correct guesses. Think: 1 relevant doc @ pos 1: AP = 1; @ pos 3: AP = 0.33.

MAP/AP FAQs

Page 37:

§  What if the denominator is zero (e.g., there are no relevant docs/links)?
§  One simply sets the AP to zero. One can think of this as a penalty on the algorithm for retrieving/predicting docs/links when there should be none.

MAP/AP FAQs

Page 38:

The Cold Start Problem – new nodes


Page 39:

Algebraic Methods – How to

Split the data appropriately, giving another split, to allow for learning certain parameters of graph kernels.

Kunegis et al. 2013
