
Spectral Clustering and Biclustering

Marianna Bolla, Institute of Mathematics

Budapest University of Technology and Economics, [email protected]

Vienna, April 16, 2012

Outline: Motivation; Spectral clustering of graphs; Noisy random graphs; Biclustering of contingency tables

Motivation

To recover the structure of large edge-weighted graphs, for example: biological, social, economic, or communication networks.

To find a clustering (partition) of the vertices such that the induced subgraphs on them and the bipartite subgraphs between any pair of them exhibit regular behavior of information flow within or between the vertex subsets.

To find a biclustering of a contingency table (e.g., microarray) such that clusters of equally functioning genes equally influence conditions of the same cluster.


Spectral clustering of edge-weighted graphs

$G = (V, W)$: edge-weighted graph, $|V| = n$, $W$: weight matrix of the edges,
$w_{ij} = w_{ji} \ge 0$ ($i \ne j$) and $w_{ii} = 0$ ($i = 1, \dots, n$).

$d_i := \sum_{j=1}^{n} w_{ij}$ ($i = 1, \dots, n$): generalized degrees

$d := (d_1, \dots, d_n)^T$: degree vector, $\sqrt{d} := (\sqrt{d_1}, \dots, \sqrt{d_n})^T$

$D := \mathrm{diag}(d_1, \dots, d_n)$: degree matrix

W.l.o.g., $\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} = 1$ will be supposed.

Laplacian and modularity matrices

$L = D - W$: Laplacian
$L_D = I - D^{-1/2} W D^{-1/2}$: normalized Laplacian, $\mathrm{Spec}(L_D) \subset [0, 2]$.
If $G$ is connected ($W$ is irreducible), then 0 is a single eigenvalue with corresponding unit-norm eigenvector $\sqrt{d}$.

$M_D = D^{-1/2} W D^{-1/2} - \sqrt{d}\sqrt{d}^T$: normalized modularity matrix (B, Phys. Rev. E (2011)), $\mathrm{Spec}(M_D) \subset [-1, 1]$.
1 cannot be an eigenvalue if $G$ is connected, and 0 is always an eigenvalue with eigenvector $\sqrt{d}$.

The spectral gap of $G$: $1 - \|M_D\|$ (spectral norm).
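A minimal numpy sketch (not part of the original slides) of how these matrices can be formed from a weight matrix; the small example matrix below is an assumption for illustration.

```python
import numpy as np

# Illustrative symmetric weight matrix with zero diagonal, normalized so that
# the total weight is 1, as supposed on the previous slide.
W = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
W = W / W.sum()

d = W.sum(axis=1)                        # generalized degrees d_i
sqrt_d = np.sqrt(d)                      # the vector sqrt(d)
D_inv_sqrt = np.diag(1.0 / sqrt_d)       # D^{-1/2}

L   = np.diag(d) - W                                           # Laplacian
L_D = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt             # normalized Laplacian
M_D = D_inv_sqrt @ W @ D_inv_sqrt - np.outer(sqrt_d, sqrt_d)   # normalized modularity

# sqrt(d) is an eigenvector with eigenvalue 0 of both L_D and M_D:
print(np.allclose(L_D @ sqrt_d, 0.0), np.allclose(M_D @ sqrt_d, 0.0))

# Spectral gap of G: 1 - ||M_D|| (spectral norm).
print("spectral gap:", 1.0 - np.linalg.norm(M_D, 2))
```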

Quadratic placement problems

Fact: the spectral decomposition of either $L_D$ or $M_D$ solves the following quadratic placement problem: for a given positive integer $k$ ($1 < k < n$), minimize
\[
Q_k(X) = \sum_{i<j} w_{ij} \|r_i - r_j\|^2
\]
subject to the conditions
\[
\sum_{i=1}^{n} d_i r_i r_i^T = I_{k-1}, \qquad \sum_{i=1}^{n} d_i r_i = 0,
\]
where the vectors $r_1, \dots, r_n$ are $(k-1)$-dimensional representatives of the vertices, which form the row vectors of the $n \times (k-1)$ matrix $X$.

Normalized Laplacian eigenvalues

$G$ is connected, $0 = \lambda_0 < \lambda_1 \le \dots \le \lambda_{n-1} \le 2$: eigenvalues of $L_D$ with corresponding unit-norm, pairwise orthogonal eigenvectors $u_0 = \sqrt{d}, u_1, \dots, u_{n-1}$.

In B, Tusnády, Discrete Math. (1994): the minimum of $Q_k(X)$ under the constraints for the representatives is
\[
\sum_{i=1}^{k-1} \lambda_i
\]
and is attained by the following representation: $r_1^*, \dots, r_n^*$ are the row vectors of the matrix
\[
X^* = (D^{-1/2} u_1, \dots, D^{-1/2} u_{k-1}).
\]
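A short sketch of this representation, assuming a random toy weight matrix; scipy's eigh returns the eigenvalues of $L_D$ in increasing order, so $u_1, \dots, u_{k-1}$ are columns 1 to $k-1$.

```python
import numpy as np
from scipy.linalg import eigh

def representatives(W, k):
    """Vertex representatives X* = (D^{-1/2} u_1, ..., D^{-1/2} u_{k-1}) and the
    normalized Laplacian eigenvalues (ascending), as in the theorem above."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_D = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    lam, U = eigh(L_D)               # lam[0] ~ 0 with eigenvector ~ sqrt(d)
    X_star = D_inv_sqrt @ U[:, 1:k]  # drop u_0, transform u_1, ..., u_{k-1}
    return X_star, lam

# Toy weight matrix (an assumption for illustration), normalized to total weight 1.
rng = np.random.default_rng(0)
W = np.triu(rng.random((6, 6)), 1)
W = W + W.T
W /= W.sum()

# By the theorem, min Q_k equals lam[1] + ... + lam[k-1].
X_star, lam = representatives(W, k=3)
print("min Q_3 =", lam[1:3].sum())
```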

Explanation

Instead of $X$, the augmented $n \times k$ matrix $\tilde{X}$ can just as well be used, which is obtained from $X$ by inserting the column $x_0 = \mathbf{1}$ of all 1's. In fact, $x_0 = D^{-1/2} u_0 = \mathbf{1}$, where $u_0 = \sqrt{d}$ is the eigenvector belonging to the eigenvalue 0 of $L_D$. Then
\[
Q_k(\tilde{X}) = Q_k(X) = \mathrm{tr}\, (D^{1/2}\tilde{X})^T (I_n - D^{-1/2} W D^{-1/2})(D^{1/2}\tilde{X}),
\]
and $Q_k(\tilde{X})$ is minimized under the constraint $\tilde{X}^T D \tilde{X} = I_k$, or equivalently, $D^{1/2}\tilde{X}$ is suborthogonal.

Continuous relaxation of a discrete minimization

This problem is the continuous relaxation of minimizing
\[
Q_k(X(P_k)) = \mathrm{tr}\, (D^{1/2} X(P_k))^T (I_n - D^{-1/2} W D^{-1/2})(D^{1/2} X(P_k))
\]
over the set of $k$-partitions $P_k = (V_1, \dots, V_k)$ of the vertices, where $P_k$ is planted into $X$ in the sense that the columns of $X(P_k)$ are so-called partition vectors belonging to $P_k$: the coordinates of the $i$th column are zeros, except those indexing vertices of $V_i$, which are equal to $1/\sqrt{\mathrm{Vol}(V_i)}$, $i = 1, \dots, k$.

Representation, spectral relaxation

$Q_k(X(P_k))$ is the normalized cut of $P_k = (V_1, \dots, V_k)$:
\[
Q_k(X(P_k)) = \sum_{a=1}^{k-1} \sum_{b=a+1}^{k} \left( \frac{1}{\mathrm{Vol}(V_a)} + \frac{1}{\mathrm{Vol}(V_b)} \right) w(V_a, V_b)
= \sum_{a=1}^{k} \frac{w(V_a, \bar{V}_a)}{\mathrm{Vol}(V_a)}
= k - \sum_{a=1}^{k} \frac{w(V_a, V_a)}{\mathrm{Vol}(V_a)}.
\]

Minimum $k$-way normalized cut of $G = (V, W)$:
\[
f_k(G) = \min_{P_k \in \mathcal{P}_k} Q_k(X(P_k)),
\]
where $\mathrm{Vol}(U) = \sum_{i \in U} d_i$ is the volume of $U \subset V$, and $w(X, Y) = \sum_{i \in X} \sum_{j \in Y} w_{ij}$ is the weighted cut between $X, Y \subset V$.
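A sketch of evaluating $Q_k(X(P_k))$ for a given $k$-partition, using the last expression above; the two-triangle toy weight matrix is an assumption for illustration.

```python
import numpy as np

def normalized_cut(W, labels):
    """k-way normalized cut Q_k(X(P_k)) = k - sum_a w(V_a, V_a) / Vol(V_a)."""
    d = W.sum(axis=1)
    clusters = np.unique(labels)
    ncut = len(clusters)
    for a in clusters:
        mask = (labels == a)
        vol = d[mask].sum()                  # Vol(V_a)
        w_aa = W[np.ix_(mask, mask)].sum()   # w(V_a, V_a)
        ncut -= w_aa / vol
    return ncut

# Toy check: two triangles joined by one light edge give a small 2-way cut.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1
W /= W.sum()
print(normalized_cut(W, np.array([0, 0, 0, 1, 1, 1])))
```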

Estimation, references

Because of the spectral relaxation:
\[
f_k(G) \ge \sum_{i=0}^{k-1} \lambda_i = \sum_{i=1}^{k-1} \lambda_i.
\]

B, Tusnády, Discrete Math. (1994): general $k$, called weighted cut
Azran, Ghahramani, SIAM J. Comput. (2000): general $k$
Meila and Shi, NIPS (2001): $k = 2$
B, M.-Sáska, Studia Sci. Math. Hung. (2002): general $k$

Upper estimate: depends on the corresponding eigenvectors.
Point of spectral clustering: optimizing over $P_k$ is NP-hard.

Isoperimetric number

Definition

The Cheeger constant of the weighted graph $G = (V, W)$ is
\[
h(G) = \min_{\substack{U \subset V \\ \mathrm{Vol}(U) \le 1/2}} \frac{w(U, \bar{U})}{\mathrm{Vol}(U)}.
\]

Theorem

(B, M.-Sáska, Discrete Math. (2004)). Let $\lambda_1$ be the smallest positive eigenvalue of $L_D$. Then
\[
\frac{\lambda_1}{2} \le h(G) \le \min\{1, \sqrt{2\lambda_1}\}.
\]
If $\lambda_1 \le 1$ ($G$ is not the complete graph), then
\[
h(G) \le \sqrt{\lambda_1(2 - \lambda_1)}.
\]
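For very small graphs the Cheeger constant can be computed by brute force and compared with the eigenvalue bounds of the theorem; the sketch below does this on an assumed random weight matrix (using the unconditional upper bound $\min\{1, \sqrt{2\lambda_1}\}$).

```python
import numpy as np
from itertools import combinations
from scipy.linalg import eigh

def cheeger_constant(W):
    """Brute-force h(G) over all U with Vol(U) <= 1/2 (tiny graphs only);
    assumes the total weight of W is normalized to 1."""
    n = W.shape[0]
    d = W.sum(axis=1)
    best = np.inf
    for size in range(1, n):
        for U in combinations(range(n), size):
            U = list(U)
            vol = d[U].sum()
            if vol <= 0.5:
                comp = [i for i in range(n) if i not in U]
                cut = W[np.ix_(U, comp)].sum()     # w(U, complement of U)
                best = min(best, cut / vol)
    return best

# Compare with lambda_1 / 2 <= h(G) <= min{1, sqrt(2 lambda_1)}.
rng = np.random.default_rng(1)
W = np.triu(rng.random((7, 7)), 1)
W = W + W.T
W /= W.sum()
d = W.sum(axis=1)
L_D = np.eye(7) - np.diag(d**-0.5) @ W @ np.diag(d**-0.5)
lam1 = eigh(L_D, eigvals_only=True)[1]             # smallest positive eigenvalue
h = cheeger_constant(W)
print(lam1 / 2 <= h <= min(1.0, np.sqrt(2 * lam1)))
```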

Normalized Newman–Girvan modularity

Newman–Girvan, Physical Review E (2004): N–G modularity.
B, Physical Review E (2011): normalized N–G modularity:
\[
M_k(W, P_k) = \sum_{a=1}^{k} \frac{1}{\mathrm{Vol}(V_a)} \sum_{i,j \in V_a} (w_{ij} - d_i d_j) = \sum_{a=1}^{k} \frac{w(V_a, V_a)}{\mathrm{Vol}(V_a)} - 1.
\]

Since
\[
M_k(W, P_k) = k - 1 - Q_k(X(P_k)),
\]
maximizing the $k$-way normalized Newman–Girvan modularity is equivalent to the normalized cut problem, and it can be solved by the same spectral relaxation.

Spectral gap and variance

Weighted $k$-variance of the vertex representatives:
\[
S_k^2(X) = \min_{P_k = (V_1, \dots, V_k)} \sum_{a=1}^{k} \sum_{j \in V_a} d_j \|r_j - c_a\|^2,
\]
where $c_a = \frac{1}{\mathrm{Vol}(V_a)} \sum_{j \in V_a} d_j r_j$.

In B, Tusnády, Discrete Math. (1994):

Theorem

In the representation $X^* = (D^{-1/2} u_0, D^{-1/2} u_1) = (\mathbf{1}, D^{-1/2} u_1)$:
\[
S_2^2(X^*) \le \frac{\lambda_1}{\lambda_2}.
\]

$f_2(G)$ is the symmetric version of $h(G)$: $f_2(G) \le 2 h(G)$, which implies
\[
f_2(G) \le 2\sqrt{\lambda_1(2 - \lambda_1)}, \qquad \lambda_1 \le 1.
\]

Theorem

Suppose that $G = (V, W)$ is connected, and the $\lambda_i$'s are the eigenvalues of $L_D$. Then $\sum_{i=1}^{k-1} \lambda_i \le f_k(G)$; and if the optimal $k$-dimensional representatives can be classified into $k$ well-separated clusters in such a way that the maximum cluster diameter $\varepsilon$ satisfies
\[
\varepsilon \le \min\Big\{ \frac{1}{\sqrt{2k}},\ \sqrt{2}\,\min_i \sqrt{\mathrm{Vol}(V_i)} \Big\}
\]
with the $k$-partition $(V_1, \dots, V_k)$ induced by the clusters above, then
\[
f_k(G) \le c^2 \sum_{i=1}^{k-1} \lambda_i,
\]
where $c = 1 + \varepsilon c' / (\sqrt{2} - \varepsilon c')$ and $c' = 1 / \min_i \sqrt{\mathrm{Vol}(V_i)}$.

Normalized modularity eigenvalues

\[
M_D = I - L_D - \sqrt{d}\sqrt{d}^T
\]
Eigenvalues: $1 - \lambda_1 \ge \dots \ge 1 - \lambda_{n-1} \ge -1$ with the same eigenvectors, and 0 with eigenvector $\sqrt{d}$.

1 cannot be an eigenvalue if $G$ is connected / $W$ is irreducible.

Positive eigenvalues of $M_D$ of large absolute value are responsible for clusters with high intra- and low inter-cluster densities.

If we minimize $M_k(W, P_k)$ over $P_k$ instead of maximizing it: negative eigenvalues of $M_D$ of large absolute value are responsible for clusters with low intra- and high inter-cluster densities.

If we take into account eigenvalues from both ends of the normalized modularity spectrum, we can recover so-called regular cluster pairs.

Volume regularity

Lemma (Expander Mixing Lemma for weighted graphs)

Supposing $\mathrm{Vol}(V) = 1$, for all $X, Y \subset V$:
\[
|w(X, Y) - \mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \|M_D\| \cdot \sqrt{\mathrm{Vol}(X)\mathrm{Vol}(Y)}.
\]

For simple graphs: Alon, Combinatorica (1986); Hoory, Linial, Wigderson, Bulletin of the AMS (2006).
For edge-weighted graphs: Chung, Graham, Random Structures and Algorithms (2008), in the context of quasi-random properties.

What if the gap is not at the ends of the spectrum?

We want to partition the vertices into clusters so that a relation like the one formulated in the Lemma (the 1-cluster case) holds between the edge-densities and volumes of the cluster pairs. We will use a slightly modified version of the notion of volume regularity introduced by Alon, Coja-Oghlan, Han, Kang, Rödl, and Schacht, SIAM J. Comput. (2010):

Definition

Let $G = (V, W)$ be a weighted graph with $\mathrm{Vol}(V) = 1$. The disjoint pair $(A, B)$ is $\alpha$-volume regular if for all $X \subset A$, $Y \subset B$ we have
\[
|w(X, Y) - \rho(A, B)\,\mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \alpha \sqrt{\mathrm{Vol}(A)\mathrm{Vol}(B)},
\]
where $\rho(A, B) = \frac{w(A, B)}{\mathrm{Vol}(A)\mathrm{Vol}(B)}$ is the relative inter-cluster density of $(A, B)$.
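The definition quantifies over all subsets $X \subset A$, $Y \subset B$, which is infeasible to check exactly; the following sketch (not part of the original material) only estimates the discrepancy by random sampling of subsets, assuming the total weight of W is 1.

```python
import numpy as np

def regularity_discrepancy(W, A, B, n_samples=2000, rng=None):
    """Heuristic estimate of the smallest alpha for which the disjoint pair
    (A, B) of vertex index lists looks alpha-volume regular, by sampling
    random subsets X of A and Y of B (the definition requires all subsets)."""
    rng = rng or np.random.default_rng(0)
    d = W.sum(axis=1)                          # assumes Vol(V) = total weight = 1
    vol = lambda S: d[list(S)].sum()
    cut = lambda X, Y: W[np.ix_(list(X), list(Y))].sum()
    rho = cut(A, B) / (vol(A) * vol(B))        # relative inter-cluster density
    worst = 0.0
    for _ in range(n_samples):
        X = [i for i in A if rng.random() < 0.5] or list(A)
        Y = [j for j in B if rng.random() < 0.5] or list(B)
        dev = abs(cut(X, Y) - rho * vol(X) * vol(Y))
        worst = max(worst, dev / np.sqrt(vol(A) * vol(B)))
    return worst
```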

Outline

For general deterministic edge-weighted graphs we will prove that the existence of $k-1$ eigenvalues of $M_D$ separated from 0 by $\varepsilon$ is an indication of a $k$-cluster structure, while the eigenvalues accumulating around 0 are responsible for the pairwise regularities.

The clusters themselves can be recovered by applying the k-means algorithm to the vertex representatives obtained from the eigenvectors corresponding to the structural eigenvalues; a sketch of this pipeline is given below.

Our theorem bounds the volume-regularity constants of the different cluster pairs in terms of $\varepsilon$ and the $k$-variance of the vertex representatives (based on the structural eigenvectors). Estimates for the intra-cluster densities are also given.
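A sketch of the clustering pipeline just outlined: take the $k-1$ eigenvalues of $M_D$ of largest absolute value (from both ends of the spectrum), build the transformed representatives, and run k-means. Plain (unweighted) k-means is used here for simplicity, although the theory works with the weighted k-variance.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def modularity_spectral_clustering(W, k):
    """Cluster the vertices of an edge-weighted graph (weight matrix W) into k
    groups via the k-1 structural eigenvalues of the normalized modularity
    matrix and k-means on the transformed vertex representatives."""
    Wn = W / W.sum()                        # normalize so that Vol(V) = 1
    d = Wn.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    M_D = D_inv_sqrt @ Wn @ D_inv_sqrt - np.outer(np.sqrt(d), np.sqrt(d))
    mu, U = eigh(M_D)                       # eigenvalues in ascending order
    # pick the k-1 eigenvalues of largest absolute value (both ends of the spectrum)
    order = np.argsort(-np.abs(mu))[:k - 1]
    X_star = D_inv_sqrt @ U[:, order]       # transformed structural eigenvectors
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_star)
    return labels, mu[order]
```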

Result

Theorem

$G = (V, W)$ is an edge-weighted graph on $n$ vertices, $\mathrm{Vol}(V) = 1$, and there are no dominant vertices: $d_i = \Theta(1/n)$, $i = 1, \dots, n$, as $n \to \infty$. The eigenvalues of $M_D$, in decreasing absolute value, are
\[
1 > |\mu_1| \ge \dots \ge |\mu_{k-1}| > \varepsilon \ge |\mu_i|, \quad i \ge k.
\]
The partition $(V_1, \dots, V_k)$ of $V$ is defined so that it minimizes the weighted $k$-variance $s^2 = S_k^2(X^*)$ of the vertex representatives. Suppose that there is a constant $0 < c \le \frac{1}{k}$ such that $|V_i| \ge cn$, $i = 1, \dots, k$. Then the $(V_i, V_j)$ pairs are $O(\sqrt{2k}\,s + \varepsilon)$-volume regular ($i \ne j$), and for the clusters $V_i$ ($i = 1, \dots, k$) the following holds: for all $X, Y \subset V_i$,
\[
|w(X, Y) - \rho(V_i)\,\mathrm{Vol}(X)\mathrm{Vol}(Y)| = O(\sqrt{2k}\,s + \varepsilon)\,\mathrm{Vol}(V_i),
\]
where $\rho(V_i) = \frac{w(V_i, V_i)}{\mathrm{Vol}^2(V_i)}$ is the relative intra-cluster density of $V_i$.

Remark

The case $k = 2$ was treated separately in B, International Journal of Combinatorics (2011):

Under the same conditions and with the notations $|\mu_1| = \theta$, $|\mu_2| = \varepsilon$, the $(V_1, V_2)$ pair is $O\!\left(\sqrt{\frac{1-\theta}{1-\varepsilon}}\right)$-volume regular.

Spectral clustering of data points

Given data points $x_1, \dots, x_n \in \mathbb{R}^d$, form one of the following graphs on $n$ vertices:

$\varepsilon$-level graph

$k$ nearest neighbor graph

complete graph with Gaussian weights:
\[
w_{ij} = K_{\mathrm{Gauss}}(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}},
\]
where $\sigma > 0$ is a parameter.

The data are mapped into a Reproducing Kernel Hilbert Space, but we are not interested in the mapped data, just in the new kernel, which is the weight matrix of an edge-weighted graph. Sometimes several positive definite kernels, characterizing brightness, color, texture, etc. of pixels, are multiplied together (Schur/Hadamard product).
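A short sketch of the third construction (complete graph with Gaussian weights); the two-blob data set below is an assumed toy example, and the resulting W can be fed to the clustering sketch given earlier.

```python
import numpy as np

def gaussian_weight_matrix(points, sigma):
    """Complete graph on the data points with Gaussian (RBF) edge weights
    w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) and zero diagonal."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                 # no loops
    return W

# Example: two Gaussian blobs in the plane.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
W = gaussian_weight_matrix(points, sigma=1.0)
```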

Random graphs, Wigner-noise

Definition

The $n \times n$ symmetric real matrix $W$ is a Wigner-noise if its entries $w_{ij}$, $1 \le i \le j \le n$, are independent random variables, $\mathbb{E} w_{ij} = 0$, $\mathrm{Var}\, w_{ij} \le \sigma^2$ with some $0 < \sigma < \infty$, and the $w_{ij}$'s are uniformly bounded (there is a constant $K > 0$ such that $|w_{ij}| \le K$).

Füredi, Komlós, Combinatorica (1981):
\[
\max_{1 \le i \le n} |\lambda_i(W)| \le 2\sigma\sqrt{n} + O(n^{1/3} \log n)
\]
with probability tending to 1 as $n \to \infty$.
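The Füredi–Komlós bound can be observed numerically; the sketch below uses uniformly distributed entries (one admissible choice of a Wigner-noise, an assumption for illustration) and prints the ratio of the spectral norm to $2\sigma\sqrt{n}$, which should stay close to 1.

```python
import numpy as np

def wigner_noise(n, sigma=0.5, rng=None):
    """Symmetric n x n Wigner-noise with uniformly bounded, zero-mean entries:
    uniform on [-a, a], where a is chosen so that the variance is sigma^2."""
    rng = rng or np.random.default_rng(0)
    a = sigma * np.sqrt(3.0)                 # Var(Uniform[-a, a]) = a^2 / 3
    upper = rng.uniform(-a, a, size=(n, n))
    W = np.triu(upper)
    W = W + np.triu(W, 1).T                  # symmetrize, keep one diagonal
    return W

for n in (200, 800, 1600):
    W = wigner_noise(n)
    print(n, np.linalg.norm(W, 2) / (2 * 0.5 * np.sqrt(n)))
```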

Sharp concentration theorem

Theorem

$W$ is an $n \times n$ real symmetric matrix whose entries in and above the main diagonal are independent random variables with absolute value at most 1. Let $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ be the eigenvalues of $W$. For any $t > 0$:
\[
\mathbb{P}\big(|\lambda_i - \mathbb{E}(\lambda_i)| > t\big) \le \exp\!\left( \frac{-(1 - o(1))\, t^2}{32\, i^2} \right) \quad \text{when } i \le \frac{n}{2},
\]
and the same estimate holds for the probability
\[
\mathbb{P}\big(|\lambda_{n-i+1} - \mathbb{E}(\lambda_{n-i+1})| > t\big).
\]

Alon, Krivelevich, Vu, Israel J. Math. (2002)

Previous results imply:

Lemma

There exist positive constants $C_1$ and $C_2$, depending on the common bound $K$ for the entries of the Wigner-noise $W$, such that
\[
\mathbb{P}\big(\|W\| > C_1 \sqrt{n}\big) \le \exp(-C_2 n).
\]

By the Borel–Cantelli Lemma, the spectral norm of $W$ is $O(\sqrt{n})$ almost surely.

Perturbation results for weighted graphs

$A = B + W$, where
$W$: $n \times n$ Wigner-noise;
$B$: $n \times n$ blown-up matrix of $P$ with blow-up sizes $n_1, \dots, n_k$, $\sum_{i=1}^{k} n_i = n$;
$P$: $k \times k$ pattern matrix.
$k$ is kept fixed as $n_1, \dots, n_k \to \infty$ "at the same rate": there is a constant $c$ such that $\frac{n_i}{n} \ge c$, $i = 1, \dots, k$ (growth rate condition, g.r.c.).

Adjacency spectrum of a noisy graph

$G_n = (V, A)$, $A = B + W$ is $n \times n$, $n \to \infty$; $B$ induces a planted partition $P_k = (V_1, \dots, V_k)$ of $V$.

Weyl's perturbation theorem implies the following for the adjacency spectrum of $G_n$: under the g.r.c. there are $k$ structural eigenvalues of order $n$ (in absolute value), and the others are $O(\sqrt{n})$, almost surely.

The eigenvectors $X = (x_1, \dots, x_k)$ corresponding to the structural eigenvalues are "not far" from the subspace of stepwise constant vectors on $P_k$:
\[
S_k^2(X) \le S_k^2(P_k, X) = O\!\left(\frac{1}{n}\right), \quad \text{almost surely as } n \to \infty.
\]

This extends to the normalized Laplacian and modularity spectra.

Noisy graph is simple with appropriate noise

The uniform bound $K$ on the entries of $W$ is such that $A = B + W$ has entries in $[0, 1]$. With an appropriate Wigner-noise the noisy matrix $A$ is a generalized random graph: edges between $V_a$ and $V_b$ exist with probability $0 < p_{ab} < 1$.
For $1 \le a \le b \le k$ and $i \in V_a$, $j \in V_b$, let
\[
w_{ij} :=
\begin{cases}
1 - p_{ab}, & \text{with probability } p_{ab}, \\
-p_{ab}, & \text{with probability } 1 - p_{ab}
\end{cases}
\]
be independent random variables; otherwise $W$ is symmetric. The entries have zero expectation and bounded variance:
\[
\sigma^2 = \max_{1 \le a \le b \le k} p_{ab}(1 - p_{ab}) \le \frac{1}{4}.
\]
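A sketch of generating such a noisy graph directly as a generalized random graph (equivalently, $A = B + W$ with the blown-up matrix $B$ and the Wigner-noise above); the pattern matrix and block sizes below are assumptions for illustration.

```python
import numpy as np

def generalized_random_graph(block_sizes, P, rng=None):
    """0-1 adjacency matrix of a generalized random graph: vertices i in V_a and
    j in V_b are connected with probability P[a, b], independently; no loops."""
    rng = rng or np.random.default_rng(0)
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    B = P[np.ix_(labels, labels)]            # blown-up matrix of the pattern P
    U = rng.random(B.shape)
    A = np.triu((U < B).astype(float), 1)    # independent edges above the diagonal
    A = A + A.T                              # symmetric, zero diagonal
    return A, labels

P = np.array([[0.7, 0.1, 0.2],
              [0.1, 0.6, 0.1],
              [0.2, 0.1, 0.8]])
A, labels = generalized_random_graph([50, 60, 70], P)
```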

Generalized random graphs

Ideal $k$-cluster case: given the partition $(V_1, \dots, V_k)$ of $V$, vertices $i \in V_a$ and $j \in V_b$ are connected with probability $p_{ab}$, independently of each other, $1 \le a, b \le k$.
Generalized random graph: random simple graph = edge-weighted graph with a special block structure + random noise. Spectral characterization in B, Discrete Math. (2008):

If $k$ is fixed and $n \to \infty$ such that $\frac{|V_a|}{n} \ge c$ ($a = 1, \dots, k$) with some $0 < c \le \frac{1}{k}$, then there exists a number $0 < \theta \le 1$, independent of $n$, such that for every $0 < \tau < 1/2$:

there are exactly $k - 1$ eigenvalues of $M_D$ greater than $\theta - n^{-\tau}$, while all the others are at most $n^{-\tau}$ in absolute value;

the $k$-variance of the vertex representatives constructed from the $k - 1$ transformed structural eigenvectors is $O(n^{-2\tau})$;

with any "small" $\alpha > 0$, the $V_a, V_b$ pairs are $\alpha$-volume regular,

almost surely.

[Figures: 10-, 20-, 30-, 40-, 50-, 60-, 70-, 80-, 90-, and 100-fold blow up; before sorting and clustering the vertices.]

Parameter estimation via the EM-algorithm

Bickel and Chen (PNAS, 2009) introduced a random block model which is, in fact, a generalized random graph.

For a given $k$, vertices independently belong to cluster $V_a$ with probability $\pi_a$, $a = 1, \dots, k$; $\sum_{a=1}^{k} \pi_a = 1$.

Vertices of $V_a$ and $V_b$ are connected independently of each other with probabilities $\mathbb{P}(i \sim j \mid i \in V_a, j \in V_b) = p_{ab}$, $1 \le a, b \le k$.

The parameters are collected in the vector $\pi = (\pi_1, \dots, \pi_k)$ and in the $k \times k$ symmetric matrix $P$ of the $p_{ab}$'s.
Our statistical sample is the $n \times n$ symmetric 0-1 adjacency matrix $A = (a_{ij})$ of a simple graph on $n$ vertices. There are no loops, so the diagonal entries are zeros. We want to estimate the parameters of the above block model.

By the theorem of mutually exclusive and exhaustive events, the likelihood function is
\[
\frac{1}{2} \sum_{1 \le a, b \le k} \pi_a \pi_b \prod_{i \in C_a,\, j \in C_b,\, i \ne j} p_{ab}^{a_{ij}} (1 - p_{ab})^{1 - a_{ij}}
= \frac{1}{2} \sum_{1 \le a, b \le k} \pi_a \pi_b \cdot p_{ab}^{e_{ab}} (1 - p_{ab})^{n_{ab} - e_{ab}},
\]
reminiscent of a mixture of binomial distributions, where
$e_{ab}$ is the number of edges connecting vertices of $V_a$ and $V_b$ ($a \ne b$);
$e_{aa}$ is twice the number of edges with both endpoints in $V_a$;
$n_{ab} = |V_a| \cdot |V_b|$ if $a \ne b$, and $n_{aa} = |V_a| \cdot (|V_a| - 1)$, $a = 1, \dots, k$.

EM-algorithm for incomplete data

Here $A$ is the incomplete data specification, as the cluster memberships are missing. Therefore we complete our data matrix $A$ by latent membership vectors $\Delta_1, \dots, \Delta_n$ of the vertices, which are $k$-dimensional i.i.d. $\mathrm{Poly}(1, \pi)$ random vectors: $\Delta_i = (\Delta_{1i}, \dots, \Delta_{ki})$, where $\Delta_{ai} = 1$ if $i \in V_a$ and zero otherwise. Thus the sum of the coordinates of any $\Delta_i$ is 1, and $\mathbb{P}(\Delta_{ai} = 1) = \pi_a$.
The likelihood function above is
\[
\frac{1}{2} \sum_{1 \le a, b \le k} \pi_a \pi_b \cdot p_{ab}^{\sum_{i,j:\, i \ne j} \Delta_{ai} \Delta_{bj} a_{ij}} \cdot (1 - p_{ab})^{\sum_{i,j:\, i \ne j} \Delta_{ai} \Delta_{bj} (1 - a_{ij})},
\]
which is maximized in the alternating E and M steps of the EM-algorithm.

We remark that the complete likelihood would be the square root of
\[
\prod_{1 \le a, b \le k} p_{ab}^{e_{ab}} (1 - p_{ab})^{n_{ab} - e_{ab}}
= \prod_{a=1}^{k} \prod_{i=1}^{n} \prod_{b=1}^{k} \left[ p_{ab}^{\sum_{j:\, j \ne i} \Delta_{bj} a_{ij}} \cdot (1 - p_{ab})^{\sum_{j:\, j \ne i} \Delta_{bj} (1 - a_{ij})} \right]^{\Delta_{ai}},
\]
which is valid only in the case of known cluster memberships.
Starting with initial parameter values $\pi^{(0)}$, $P^{(0)}$ and membership vectors $\Delta_1^{(0)}, \dots, \Delta_n^{(0)}$, the $t$-th step of the iteration is the following ($t = 1, 2, \dots$).

E-step

We calculate the conditional expectation of each $\Delta_i$ conditioned on the model parameters and on the other cluster assignments obtained in the $(t-1)$-th step (denoted by $M^{(t-1)}$).
By Bayes' theorem, the responsibility of vertex $i$ for cluster $a$ is
\[
\pi_{ai}^{(t)} = \mathbb{E}(\Delta_{ai} \mid M^{(t-1)}) = \mathbb{P}(\Delta_{ai} = 1 \mid M^{(t-1)})
= \frac{\mathbb{P}(M^{(t-1)} \mid \Delta_{ai} = 1) \cdot \pi_a^{(t-1)}}{\sum_{l=1}^{k} \mathbb{P}(M^{(t-1)} \mid \Delta_{li} = 1) \cdot \pi_l^{(t-1)}}
\]
($a = 1, \dots, k$; $i = 1, \dots, n$). Thus, for each $i$, $\pi_{ai}^{(t)}$ is proportional to the numerator, where
\[
\mathbb{P}(M^{(t-1)} \mid \Delta_{ai} = 1) = \prod_{b=1}^{k} (p_{ab}^{(t-1)})^{\sum_{j \ne i} \Delta_{bj}^{(t-1)} a_{ij}} \cdot (1 - p_{ab}^{(t-1)})^{\sum_{j \ne i} \Delta_{bj}^{(t-1)} (1 - a_{ij})}
\]
is the part of the likelihood affecting vertex $i$ under the condition $\Delta_{ai} = 1$.

M-step

For all $a, b$ pairs separately, we maximize the truncated binomial likelihood
\[
p_{ab}^{\sum_{i,j:\, i \ne j} \pi_{ai}^{(t)} \pi_{bj}^{(t)} a_{ij}} \cdot (1 - p_{ab})^{\sum_{i,j:\, i \ne j} \pi_{ai}^{(t)} \pi_{bj}^{(t)} (1 - a_{ij})}
\]
with respect to the parameter $p_{ab}$. Obviously, the maximum is attained by the following estimators of the $p_{ab}$'s, comprising the symmetric matrix $P^{(t)}$:
\[
p_{ab}^{(t)} = \frac{\sum_{i,j:\, i \ne j} \pi_{ai}^{(t)} \pi_{bj}^{(t)} a_{ij}}{\sum_{i,j:\, i \ne j} \pi_{ai}^{(t)} \pi_{bj}^{(t)}}, \quad 1 \le a \le b \le k,
\]
where edges connecting vertices of clusters $a$ and $b$ are counted fractionally, weighted by the membership probabilities of their endpoints.

The ML-estimator of $\pi$ in the $t$-th step is $\pi^{(t)}$ with coordinates $\pi_a^{(t)} = \frac{1}{n} \sum_{i=1}^{n} \pi_{ai}^{(t)}$ ($a = 1, \dots, k$), while that of the membership vector $\Delta_i$ is obtained by discrete maximization: $\Delta_{ai}^{(t)} = 1$ if $\pi_{ai}^{(t)} = \max_{b \in \{1, \dots, k\}} \pi_{bi}^{(t)}$, and 0 otherwise. (In case of ambiguity, the cluster with the smallest index is selected.) This choice of $\pi$ will increase the likelihood.
The above algorithm is a special case of so-called Collaborative Filtering; see Hofmann, T., Puzicha, J., Ungar, L., Foster, D.
According to the general theory of the EM-algorithm (Dempster, Laird, Rubin, J. R. Statist. Soc. B 39, 1977), in exponential families (as in the present case) convergence to a local maximum can be guaranteed (depending on the starting values), but it runs in polynomial time in $n$. A compact numerical sketch of these E- and M-steps is given below.
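The following sketch keeps soft responsibilities throughout, starts from a random initialization, and runs the M-step before the E-step in each sweep; it is only an illustration of the block-model EM described above, not the authors' implementation.

```python
import numpy as np

def em_block_model(A, k, n_iter=50, rng=None):
    """EM sketch for the block model: A is the symmetric 0-1 adjacency matrix
    (zero diagonal), k the number of clusters. Returns estimates of pi and P
    and the hard cluster memberships."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    resp = rng.dirichlet(np.ones(k), size=n)          # soft memberships pi_{ai} (n x k)
    for _ in range(n_iter):
        # M-step: pi_a and p_ab from the current responsibilities.
        pi = np.clip(resp.mean(axis=0), 1e-12, 1.0)
        edges = resp.T @ A @ resp                      # fractionally counted edges
        pairs = resp.T @ (1 - np.eye(n)) @ resp        # fractionally counted pairs (i != j)
        P = np.clip(edges / pairs, 1e-6, 1 - 1e-6)
        # E-step: responsibilities via Bayes' theorem (log scale for stability).
        ones = (A @ resp)                              # sum_{j != i} resp_jb * a_ij
        zeros = ((1 - A - np.eye(n)) @ resp)           # sum_{j != i} resp_jb * (1 - a_ij)
        log_post = np.log(pi) + ones @ np.log(P).T + zeros @ np.log(1 - P).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return pi, P, resp.argmax(axis=1)
```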

Testable weighted graph parameters

Both edges and vertices may have nonnegative weights.

Convergent graph sequences: the $G_n$'s become more and more similar as $n \to \infty$ (homomorphism densities, convergence in the cut metric, graphons); Lovász, L. et al. (2006–).

$f$ is a testable weighted graph parameter if

$f(G_n)$ converges whenever $G_n$ converges;

it can be consistently estimated based on a sufficiently large sample selected from the huge graph by an appropriate randomization.

Examples:

Balanced minimum multiway cuts, normalized and modularity cuts (statistical physics, Hamiltonian).

If the vertex weights are the generalized degrees, the normalized modularity spectrum is testable:
\[
\mu_{n,i}(M_D(G_n)) \to \lambda_i(P_W) \quad (n \to \infty), \quad i = 1, 2, \dots,
\]
where $\mu_{n,i}(M_D(G_n))$ is the $i$th largest absolute value eigenvalue of the normalized modularity matrix of $G_n$, and $\lambda_i(P_W)$ is the $i$th largest absolute value eigenvalue of the operator taking conditional expectation with respect to the joint measure embodied by $W$; this operator is compact with discrete spectrum.

For any $k > 1$, the eigen-subspace belonging to the transformed $\mu_{n,1}(M_D(G_n)), \dots, \mu_{n,k-1}(M_D(G_n))$ also converges to the corresponding eigen-subspace of $P_W$.

The $k$-variances also converge.

Biclustering of contingency tables

$(Row, Col, C)$: contingency table.
Row set: $Row = \{1, \dots, n\}$; column set: $Col = \{1, \dots, m\}$;
$C$: $n \times m$ matrix of entries $c_{ij} \ge 0$.
$c_{ij}$: some kind of interaction between the objects representing row $i$ and column $j$, where 0 means no interaction at all.
\[
d_{row,i} = \sum_{j=1}^{m} c_{ij}, \quad i = 1, \dots, n; \qquad
d_{col,j} = \sum_{i=1}^{n} c_{ij}, \quad j = 1, \dots, m;
\]
\[
D_{row} = \mathrm{diag}(d_{row,1}, \dots, d_{row,n}), \qquad D_{col} = \mathrm{diag}(d_{col,1}, \dots, d_{col,m}).
\]

Quadratic placement problem

Given the integer $1 \le k \le \min\{n, m\}$: find $k$-dimensional representatives $r_1, \dots, r_n \in \mathbb{R}^k$ of the rows and $c_1, \dots, c_m \in \mathbb{R}^k$ of the columns such that they minimize
\[
Q_k = \sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} \|r_i - c_j\|^2
\]
under the conditions
\[
\sum_{i=1}^{n} d_{row,i}\, r_i r_i^T = I_k, \qquad \sum_{j=1}^{m} d_{col,j}\, c_j c_j^T = I_k.
\]

Equivalence to correspondence analysis

\[
X := (r_1^T, \dots, r_n^T)^T = (x_1, \dots, x_k) \quad (n \times k), \qquad
Y := (c_1^T, \dots, c_m^T)^T = (y_1, \dots, y_k) \quad (m \times k).
\]
Constraints:
\[
X^T D_{row} X = I_k, \qquad Y^T D_{col} Y = I_k.
\]
\[
Q_k = \sum_{i=1}^{n} d_{row,i} \|r_i\|^2 + \sum_{j=1}^{m} d_{col,j} \|c_j\|^2 - 2\sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij}\, r_i^T c_j
= 2k - 2\,\mathrm{tr}\, X^T C Y
= 2k - 2\,\mathrm{tr}\, (D_{row}^{1/2} X)^T (D_{row}^{-1/2} C D_{col}^{-1/2}) (D_{col}^{1/2} Y),
\]
where $D_{row}^{1/2} X$ and $D_{col}^{1/2} Y$ are suborthogonal matrices.

Correspondence matrix / normalized contingency table

\[
C_{corr} := D_{row}^{-1/2} C D_{col}^{-1/2}
\]
SVD:
\[
C_{corr} = \sum_{i=1}^{r} s_i v_i u_i^T,
\]
where $r \le \min\{n, m\}$ is the rank of $C_{corr}$, or equivalently (as there are no identically zero rows or columns), the rank of $C$.
$1 = s_1 \ge s_2 \ge \dots \ge s_r > 0$: non-zero singular values of $C_{corr}$ with singular vector pairs $v_i, u_i$ ($i = 1, \dots, r$).
1 is a single singular value if $C_{corr}$ (or equivalently, $C$) is irreducible. In this case
\[
v_1 = (\sqrt{d_{row,1}}, \dots, \sqrt{d_{row,n}})^T \quad \text{and} \quad u_1 = (\sqrt{d_{col,1}}, \dots, \sqrt{d_{col,m}})^T.
\]

Representation theorem for contingency tables

Theorem

Let $(Row, Col, C)$ be an irreducible contingency table with the above SVD of its correspondence matrix $C_{corr}$. Let $k \le r$ be a positive integer such that $s_k > s_{k+1}$. Then the minimum of $Q_k$ under the given constraints is $2k - 2\sum_{i=1}^{k} s_i$, and it is attained with the optimum row representatives $r_1^*, \dots, r_n^*$ and column representatives $c_1^*, \dots, c_m^*$, the transposes of which are the row vectors of $X^* = D_{row}^{-1/2}(v_1, \dots, v_k)$ and $Y^* = D_{col}^{-1/2}(u_1, \dots, u_k)$, respectively.

Remark: if 1 is a single singular value, the first columns of $X^*$ and $Y^*$, namely $D_{row}^{-1/2} v_1$ and $D_{col}^{-1/2} u_1$, are the constantly 1 vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively.
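A sketch of computing the optimum row and column representatives from the SVD of the correspondence matrix, as in the theorem above; numpy's SVD returns the singular values in decreasing order, so $s_1 = 1$ comes first.

```python
import numpy as np

def correspondence_representatives(C, k):
    """Row and column representatives X*, Y* of a nonnegative contingency table C,
    from the SVD of C_corr = D_row^{-1/2} C D_col^{-1/2}; the trivial first
    dimension (singular value 1) is included as the first coordinate."""
    d_row = C.sum(axis=1)
    d_col = C.sum(axis=0)
    C_corr = C / np.sqrt(np.outer(d_row, d_col))
    V, s, Ut = np.linalg.svd(C_corr, full_matrices=False)   # C_corr = V diag(s) Ut
    X_star = V[:, :k] / np.sqrt(d_row)[:, None]     # D_row^{-1/2} (v_1, ..., v_k)
    Y_star = Ut.T[:, :k] / np.sqrt(d_col)[:, None]  # D_col^{-1/2} (u_1, ..., u_k)
    return X_star, Y_star, s[:k]
```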

Normalized two-way cuts of a contingency table

$(Row, Col, C)$: $n \times m$ contingency table; $k$ ($0 < k \le r$): fixed integer.
Partition the rows and columns simultaneously into disjoint, nonempty subsets
\[
Row = R_1 \cup \dots \cup R_k, \qquad Col = C_1 \cup \dots \cup C_k
\]
such that the cuts
\[
c(R_a, C_b) = \sum_{i \in R_a} \sum_{j \in C_b} c_{ij}, \quad a, b = 1, \dots, k,
\]
between the row–column cluster pairs are as homogeneous as possible.

The normalized two-way cut of the contingency table with respect to the above $k$-partitions $P_{row} = (R_1, \dots, R_k)$ and $P_{col} = (C_1, \dots, C_k)$ of its rows and columns and to the collection of signs $\sigma$:
\[
\nu_k(P_{row}, P_{col}, \sigma) = \sum_{a=1}^{k} \sum_{b=1}^{k} \left( \frac{1}{\mathrm{Vol}(R_a)} + \frac{1}{\mathrm{Vol}(C_b)} + \frac{2\delta_{ab}\sigma_{ab}}{\sqrt{\mathrm{Vol}(R_a)\mathrm{Vol}(C_b)}} \right) c(R_a, C_b),
\]
where
\[
\mathrm{Vol}(R_a) = \sum_{i \in R_a} \sum_{j=1}^{m} c_{ij}, \qquad \mathrm{Vol}(C_b) = \sum_{j \in C_b} \sum_{i=1}^{n} c_{ij}
\]
are the volumes of the clusters, and $\sigma = (\sigma_{11}, \dots, \sigma_{kk})$ with $\sigma_{aa} = \pm 1$ ($a = 1, \dots, k$), whereas $\sigma_{ab}$ has no relevance if $a \ne b$.
The objective function also penalizes clusters of extremely different volumes.

Theorem

The normalized two-way cut of the contingency table $C$ satisfies
\[
\nu_k(C) := \min_{P_{row}, P_{col}, \sigma} \nu_k(P_{row}, P_{col}, \sigma) \ge 2k - 2\sum_{i=1}^{k} s_i.
\]

Proof: $\nu_k(P_{row}, P_{col}, \sigma)$ is $Q_k$ in the special representation where the column vectors of $X$ and $Y$ are partition vectors belonging to $P_{row}$ and $P_{col}$:
\[
x_{ia} := \frac{1}{\sqrt{\mathrm{Vol}(R_a)}} \text{ if } i \in R_a \text{ and } 0 \text{ otherwise} \quad (a = 1, \dots, k),
\]
\[
y_{jb} := \frac{\sigma_{bb}}{\sqrt{\mathrm{Vol}(C_b)}} \text{ if } j \in C_b \text{ and } 0 \text{ otherwise} \quad (b = 1, \dots, k).
\]
$X = (x_1, \dots, x_k)$ and $Y = (y_1, \dots, y_k)$ satisfy the conditions imposed on the representatives, and
\[
\|r_i - c_j\|^2 = \frac{1}{\mathrm{Vol}(R_a)} + \frac{1}{\mathrm{Vol}(C_b)} + \frac{2\delta_{ab}\sigma_{bb}}{\sqrt{\mathrm{Vol}(R_a)\mathrm{Vol}(C_b)}} \quad \text{if } i \in R_a,\ j \in C_b.
\]

Symmetric contingency table = edge-weighted graph

If the $k-1$ largest absolute value eigenvalues of the normalized modularity matrix are all positive: the $k-1$ largest singular values of $C_{corr}$ (apart from the 1) are identical to the $k-1$ largest eigenvalues of $M_D$, and the left and right singular vectors are identical to the corresponding eigenvectors with the same orientation; hence $r_i = c_i$ for all $(k-1)$-dimensional row and column representatives, and $\nu_k(C) = 2 f_k(G)$, so the normalized two-way cut favors $k$-partitions with low inter-cluster edge-densities.

If all the $k-1$ largest absolute value eigenvalues of the normalized modularity matrix are negative: $r_i = -c_i$, and either one (but only one) of them can serve as the corresponding vertex representative; $\nu_k(C)$ differs from $f_k(G)$ in that it also counts the edge-weights within the clusters. Here, by minimizing $\nu_k(C)$, rather a so-called anti-community structure is detected.

Regular row-column cluster pairs

In the generic case, for given $k$, if the clusters are formed by applying the k-means algorithm to the row and column representatives, respectively, then the row–column cluster pairs obtained in this way are homogeneous in the sense that they form equally dense parts of the contingency table; a sketch of this biclustering step is given after the definition below.

Definition

The row–column cluster pair $R \subset Row$, $C \subset Col$ of the contingency table $(Row, Col, C)$ (where the sum of the entries is 1) is $\gamma$-volume regular if for all $X \subset R$ and $Y \subset C$ the relation
\[
|c(X, Y) - \rho(R, C)\,\mathrm{Vol}(X)\mathrm{Vol}(Y)| \le \gamma \sqrt{\mathrm{Vol}(R)\mathrm{Vol}(C)}
\]
holds, where $\rho(R, C) = \frac{c(R, C)}{\mathrm{Vol}(R)\mathrm{Vol}(C)}$ is the relative inter-cluster density of the row–column pair $R, C$.
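The biclustering sketch below applies k-means separately to the row and column representatives; it reuses the correspondence_representatives sketch given earlier, plain k-means again stands in for the weighted k-variance criterion, and the trivial first coordinate is dropped, as noted on the next slide.

```python
import numpy as np
from sklearn.cluster import KMeans

def bicluster(C, k):
    """Bicluster a nonnegative contingency table C into k row clusters and
    k column clusters via k-means on the SVD-based representatives."""
    X_star, Y_star, _ = correspondence_representatives(C, k)   # sketch given earlier
    row_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_star[:, 1:])
    col_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y_star[:, 1:])
    return row_labels, col_labels
```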

Weighted k-variances

The weighted $k$-variance of the $k$-dimensional row representatives:
\[
S_k^2(X) = \min_{P_{row,k} \in \mathcal{P}_{row,k}} S_k^2(P_k, X) = \min_{(R_1, \dots, R_k)} \sum_{a=1}^{k} \sum_{j \in R_a} d_{row,j} \|r_j - b_a\|^2,
\]
where $b_a = \frac{1}{\mathrm{Vol}(R_a)} \sum_{j \in R_a} d_{row,j} r_j$ ($a = 1, \dots, k$).

The weighted $k$-variance of the $k$-dimensional column representatives:
\[
S_k^2(Y) = \min_{Q_{col,k} \in \mathcal{P}_{col,k}} S_k^2(Q_k, Y) = \min_{(C_1, \dots, C_k)} \sum_{b=1}^{k} \sum_{j \in C_b} d_{col,j} \|c_j - b_b\|^2,
\]
where $b_b = \frac{1}{\mathrm{Vol}(C_b)} \sum_{j \in C_b} d_{col,j} c_j$ ($b = 1, \dots, k$).

Observe that the trivial vector components can be omitted, and the $k$-variance of the resulting $(k-1)$-dimensional representatives will be the same.

Volume regularity versus spectral properties

Theorem

Let $(Row, Col, C)$ be a contingency table of $n$ rows and $m$ columns, with row sums $d_{row,1}, \dots, d_{row,n}$ and column sums $d_{col,1}, \dots, d_{col,m}$. Suppose that $\sum_{i=1}^{n} \sum_{j=1}^{m} c_{ij} = 1$ and there are no dominant rows or columns: $d_{row,i} = \Theta(1/n)$ ($i = 1, \dots, n$) and $d_{col,j} = \Theta(1/m)$ ($j = 1, \dots, m$) as $n, m \to \infty$. Let the singular values of $C_{corr}$ be
\[
1 = s_1 > s_2 \ge \dots \ge s_k > \varepsilon \ge s_i, \quad i \ge k+1.
\]
The partition $(R_1, \dots, R_k)$ of $Row$ and $(C_1, \dots, C_k)$ of $Col$ are defined so that they minimize the weighted $k$-variances $S_k^2(X^*)$ and $S_k^2(Y^*)$ of the row and column representatives. Suppose that there are constants $0 < K_1, K_2 \le \frac{1}{k}$ such that $|R_i| \ge K_1 n$ and $|C_i| \ge K_2 m$ ($i = 1, \dots, k$), respectively. Then the $R_i, C_j$ pairs are $O\big(\sqrt{2k}\,(S_k(X^*) + S_k(Y^*)) + \varepsilon\big)$-volume regular ($i, j = 1, \dots, k$).

Noisy contingency table sequences

Figure: noisy table; table close to the limit; approximation by SVD

THE END