
Consistency of Spectral Algorithms for Hypergraphs under the Planted Partition Model

Debarghya Ghoshdastidar

Ph.D. Thesis Defense

Advisor: Prof. Ambedkar Dukkipati

January 2, 2017


Overview

Purpose of the work:

Theoretical study of spectral methods for hypergraph partitioning

Contributions:

Model for random hypergraphs with planted partition

Error bounds for partitioning planted hypergraphs

New algorithms with improved error rates

Analysis of edge sampling strategies

Bi-partite hypergraph coloring


Spectral Algorithm for Graph Partitioning: Spectral Clustering


Graph Partitioning

Objective:

High connectivity within clusters

Few edges across clusters (small cut)

Balanced partitions

Applications:

Network partitioning, data clustering, image segmentation


Spectral Graph Partitioning / Spectral Clustering

Pipeline: input graph → (normalized) adjacency matrix → find k dominant eigenvectors → run k-means on the rows → good balanced cut.

Spectral Clustering (in practice): the same pipeline applied to a real input graph.
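For concreteness, a minimal sketch of this pipeline in Python (assuming numpy and scikit-learn; `adj`, a symmetric adjacency matrix, and `k`, the number of clusters, are placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_partition(adj, k):
    """Normalize the adjacency matrix, embed nodes via the k dominant
    eigenvectors, then cluster the embedded rows with k-means."""
    deg = adj.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    # Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; take the top k
    _, vecs = np.linalg.eigh(norm_adj)
    embedding = vecs[:, -k:]
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```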

Theoretical analysis

Stochastic block model: [Holland, Laskey & Leinhardt '83]

Random graph (V, E) on |V| = n nodes

Nodes have (hidden) class labels, $\psi : \{1, \ldots, n\} \to \{1, \ldots, k\}$

$P(e_{uv} \in E)$ depends on the labels of $u, v$

Question:

$$\mathrm{Error}(\psi, \psi') = \min_{\sigma} \sum_{i=1}^{n} \mathbf{1}\{\psi_i \neq \sigma(\psi'_i)\}$$

($\psi'$ is the output labeling; the minimum is over permutations $\sigma$ of the $k$ labels)

Find $\beta_n$ such that

$$\mathrm{Error}(\psi, \psi') \leq \beta_n \quad \text{with probability } 1 - o(1)$$

Consistency of algorithms:

Weakly consistent if βn = o(n); Strongly consistent if βn = o(1)

Spectral clustering is weakly consistent [Rohe, Chatterjee & Yu '11]

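For small k, this error can be computed directly by minimizing over all label permutations; a minimal sketch (assuming numpy; `psi_true` and `psi_pred` are hypothetical integer label arrays):

```python
import numpy as np
from itertools import permutations

def partition_error(psi_true, psi_pred, k):
    """Error(psi, psi') = min over label permutations sigma of the
    number of nodes with psi_i != sigma(psi'_i). Exhaustive over
    all k! permutations, so only practical for small k."""
    psi_true = np.asarray(psi_true)
    psi_pred = np.asarray(psi_pred)
    best = len(psi_true)
    for sigma in permutations(range(k)):
        relabeled = np.array(sigma)[psi_pred]  # apply sigma to predicted labels
        best = min(best, int(np.sum(psi_true != relabeled)))
    return best
```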

Hypergraph Partitioning: Applications and Algorithms


Hypergraphs

Collection of sets / Generalization of graphs

Each edge can connect more than two nodes

Graph (2-uniform hypergraph) vs. 3-uniform hypergraph

m-uniform hypergraph:

Each edge connects m nodes

Adjacencies can be represented by an $m$th-order tensor:

$$A_{i_1 i_2 \ldots i_m} = \begin{cases} 1 & \text{if there is an edge on } \{i_1, i_2, \ldots, i_m\} \\ 0 & \text{otherwise} \end{cases}$$

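A minimal sketch of constructing this tensor for a 3-uniform hypergraph (assuming numpy; `edges`, a list of 3-node tuples, is hypothetical):

```python
import numpy as np
from itertools import permutations

def adjacency_tensor(n, edges):
    """Symmetric 3rd-order adjacency tensor of a 3-uniform hypergraph:
    entry 1 for every ordering of each edge's nodes, 0 elsewhere."""
    A = np.zeros((n, n, n))
    for edge in edges:
        for i, j, l in permutations(edge):  # symmetrize over node orderings
            A[i, j, l] = 1.0
    return A

# Example: 5 nodes, two 3-edges
A = adjacency_tensor(5, [(0, 1, 2), (2, 3, 4)])
```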

Hypergraphs in Databases [Gibson, Kleinberg & Raghavan '00]

Gender   Male  Female  Male  Male   Female
Hair     Red   Black   Bald  Black  Red
Glasses  Yes   No      Yes   No     No

Glasses Yes No Yes No No

Edges can be of varying sizes (non-uniform hypergraph)

Example edges: Male, Black hair, Without glasses, and so on . . .


Hypergraphs in Computer Vision [Agarwal et al. '05]

Subspace clustering Motion segmentation

Matching / Image Registration

Involves 3-way / 4-way similarities (uniform hypergraph)


Hypergraph Partitioning Methods

Partitioning circuits [Schweikert & Kernighan '79]

Graph approximation for hypergraphs [Hadley '95]

Spectral hypergraph partitioning [Zien et al. '99]

hMETIS for VLSI design [Karypis & Kumar '00]

Uniform hypergraph in databases [Gibson et al. '00]

Uniform hypergraph in vision [Agarwal et al. '05]

Tensor based algorithms [Govindu '05; Chen & Lerman '09]

Learning with non-uniform hypergraph [Zhou et al. '07]

Higher order learning [Duchenne et al. '11; Rota Bulò & Pelillo '13; etc.]


Algorithms studied in our work

HOSVD / SCC: [Govindu '05; Chen & Lerman '09]

Uniform hypergraph partitioning using higher-order SVD of the adjacency tensor.

TTM / TeTrIS: (proposed)

Uniform hypergraph partitioning by solving a tensor trace maximization problem.

TeTrIS is an efficient (sampled) version of TTM.

NH-Cut: [Zhou, Huang & Schölkopf '07]

Non-uniform hypergraph partitioning by minimizing the normalized hypergraph cut.

COLOR: (proposed)

Vertex 2-coloring of bi-partite non-uniform hypergraphs.


Uniform Hypergraph Partitioning: Spectral Algorithms

Approach 1: Higher order SVD of adjacency tensor

Approach 2: Associativity or tensor trace maximization


Approach 1: Higher Order SVD

Matrix eigendecomposition:

$$A = U \Sigma U^T \qquad (U \text{ orthonormal}, \ \Sigma \text{ diagonal})$$

HOSVD of a 3rd-order tensor: [De Lathauwer et al. '00]

$$\mathcal{A} = \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)} \qquad (\mathcal{S} \text{ core tensor}, \ U^{(i)} \text{ orthonormal factors})$$


HOSVD based Partitioning [Govindu '05]

For an m-uniform hypergraph:

Pipeline: adjacency tensor A → flattened matrix A → find dominant left singular vectors → run k-means on the rows.
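A minimal sketch of the HOSVD-based pipeline (assuming numpy and scikit-learn; `A_tensor` is a 3rd-order adjacency tensor as constructed above):

```python
import numpy as np
from sklearn.cluster import KMeans

def hosvd_partition(A_tensor, k):
    """Flatten the adjacency tensor along mode 1, take the k dominant
    left singular vectors, and cluster their rows with k-means."""
    n = A_tensor.shape[0]
    flat = A_tensor.reshape(n, -1)  # mode-1 flattening: n x n^(m-1)
    U, _, _ = np.linalg.svd(flat, full_matrices=False)
    return KMeans(n_clusters=k, n_init=10).fit_predict(U[:, :k])
```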

Approach 2: Associativity Maximization

Normalized associativity:

For any cluster $V_1 \subset V$:

$$\mathrm{associativity}(V_1) = \sum_{e \subset V_1} w(e), \qquad \mathrm{volume}(V_1) = \sum_{v \in V_1} \mathrm{degree}(v)$$

Normalized associativity of a partition:

$$\text{N-assoc}(V_1, \ldots, V_k) = \sum_{j=1}^{k} \frac{\mathrm{associativity}(V_j)}{\mathrm{volume}(V_j)}$$

Problem:

Find the partition that maximizes N-assoc$(V_1, \ldots, V_k)$

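A small sketch that evaluates this objective for a given partition of a weighted hypergraph (assuming numpy; `edges`, a list of (node-tuple, weight) pairs, and `labels` are hypothetical; degree(v) is taken as the total weight of edges containing v):

```python
import numpy as np

def n_assoc(edges, labels, k):
    """Normalized associativity: sum over clusters of
    (total weight of edges inside the cluster) / (cluster volume)."""
    labels = np.asarray(labels)
    assoc = np.zeros(k)
    degree = np.zeros(len(labels))
    for nodes, w in edges:
        for v in nodes:
            degree[v] += w              # degree(v) = weight of incident edges
        clusters = set(labels[v] for v in nodes)
        if len(clusters) == 1:          # edge lies entirely inside one cluster
            assoc[clusters.pop()] += w
    volume = np.array([degree[labels == j].sum() for j in range(k)])
    nz = volume > 0
    return float(np.sum(assoc[nz] / volume[nz]))
```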

Tensor Trace Maximization (TTM)

Problem (reformulated):

For an m-uniform hypergraph,

$$\text{N-assoc}(V_1, \ldots, V_k) = \frac{1}{m!} \operatorname{Trace}\left( \mathcal{A} \times_1 Y^{b_1} \times_2 \ldots \times_m Y^{b_m} \right)$$

where $Y \in \mathbb{R}^{n \times k}$ has orthogonal columns and $\sum_j b_j = 1$.

Spectral relaxation of TTM:

Set $b_1 = b_2 = \tfrac{1}{2}$, $b_3 = \ldots = b_m = 0$, and $X = Y^{1/2}$

Optimize over all orthonormal $X$


Spectral TTM Algorithm [Ghoshdastidar & Dukkipati, ICML'15]

For an m-uniform hypergraph:

Pipeline: adjacency tensor A → add slices of the tensor to form matrix A → find k dominant eigenvectors → run k-means on the rows.

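A minimal sketch of this algorithm for a 3-uniform hypergraph (assuming numpy and scikit-learn; the symmetric degree normalization is an assumption, mirroring the "normalized" matrix that appears in the consistency results below):

```python
import numpy as np
from sklearn.cluster import KMeans

def ttm_partition(A_tensor, k):
    """Collapse the 3rd-order adjacency tensor to a matrix by summing
    its slices, then spectrally cluster that matrix."""
    A = A_tensor.sum(axis=2)            # add slices: A_ij = sum_l A_ijl
    deg = A.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    norm_A = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(norm_A)    # ascending eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, -k:])
```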

Uniform Hypergraph Partitioning: Consistency

Planted partition model for uniform hypergraphs

Error bounds for algorithms


Planted Partition Model (graph)

Sparse Stochastic Block Model: [Lei & Rinaldo '15]

Given n nodes, and k (hidden) classes

An unknown symmetric matrix $B \in [0, 1]^{k \times k}$

An unknown sparsity factor αn

Independent edges with probabilities depending on labels

Nodes belong to classes: Class-1, Class-2, Class-3, . . .

$\mathrm{Prob}(\text{edge between class-}i\text{ and class-}j) = \alpha_n B_{ij}$, e.g. $\alpha_n B_{11},\ \alpha_n B_{12},\ \alpha_n B_{13}, \ldots$


Planted Partition Model (uniform hypergraph)

Extension of Sparse SBM: (proposed)

Given n nodes, and k (hidden) classes

Unknown $m$th-order tensor $B \in [0, 1]^{k \times k \times \cdots \times k}$

Unknown sparsity factor $\alpha_n$

Independent edges with label-dependent distribution:

Unweighted hypergraph: $\mathrm{Prob}(\text{edge}) = \alpha_n B_{i_1 i_2 \ldots i_m}$

Weighted hypergraph: $w(\text{edge}) \in [0, 1]$ with $\mathbb{E}[w(\text{edge})] = \alpha_n B_{i_1 i_2 \ldots i_m}$

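A minimal sketch of sampling an unweighted 3-uniform hypergraph from this model (assuming numpy; `labels`, `B` and `alpha` are hypothetical inputs: hidden classes, the k×k×k probability tensor, and the sparsity factor):

```python
import numpy as np
from itertools import combinations

def sample_planted_hypergraph(labels, B, alpha, rng=None):
    """Draw each 3-node edge independently with probability
    alpha * B[psi_i, psi_j, psi_l], given hidden labels psi."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(labels)
    edges = []
    for i, j, l in combinations(range(n), 3):
        p = alpha * B[labels[i], labels[j], labels[l]]
        if rng.random() < p:
            edges.append((i, j, l))
    return edges
```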

Consistency of HOSVD [Ghoshdastidar & Dukkipati, NIPS, 2014]

Define:

$n_{\max}$ (resp. $n_{\min}$) = maximum (minimum) cluster size

$\mathcal{A} = \mathbb{E}[AA^T]$ and $\mathcal{A}_{\min} = \min_{i,j} \{ \mathcal{A}_{ij} : \mathcal{A}_{ij} > 0 \}$

$\delta$ = $k$th eigen-gap of the normalized $\mathcal{A}$

Theorem

There exists a constant $C > 0$ such that, if

$$\delta > 0 \quad \text{and} \quad \mathcal{A}_{\min} > \frac{C\,k\,n_{\max}(\log n)^2}{n_{\min}\,\delta^2},$$

then with probability $1 - o(1)$,

$$\mathrm{Error}(\psi,\psi') = O\!\left(\frac{k\,n_{\max}\log n}{\delta^2\,\mathcal{A}_{\min}}\right) = o(n).$$


Consistency of TTM [Ghoshdastidar & Dukkipati, ICML, 2015]

Define:

$d = \min_i \mathbb{E}[\mathrm{degree}(i)] = \min_i \sum_{e \ni i} \mathbb{E}[w(e)]$

$\delta$ = $k$th eigen-gap of the normalized $\mathbb{E}[A]$

Theorem

There exists a constant $C > 0$ such that, if

$$\delta > 0 \quad \text{and} \quad d > \frac{C\,k\,n_{\max}(\log n)^2}{n_{\min}\,\delta^2},$$

then with probability $1 - o(1)$,

$$\mathrm{Error}(\psi,\psi') = O\!\left(\frac{k\,n_{\max}\log n}{\delta^2 d}\right) = o(n).$$


Special Case

m-uniform hypergraph

k = O(log n) clusters of equal size

Edge probabilities:

$$\mathrm{Prob}(\text{edge}) = \begin{cases} \alpha_n p & \text{if the edge lies within a cluster} \\ \alpha_n q & \text{otherwise} \end{cases} \qquad (p > q)$$

                          HOSVD                                       TTM
Allowable sparsity:       α_n = Ω( (log n)^(m+1.5) / n^((m−1)/2) )    α_n = Ω( (log n)^(2m+1) / n^(m−1) )
Dense hypergraph
(α_n = 1):                error = O( (log n)^(2m+1) / n^(m−2) )       error = O( (log n)^(2m−1) / n^(m−2) )


Non-uniform Hypergraph Partitioning: Algorithm and Consistency

Approach 3: Normalized hypergraph cut minimization

Planted partition model for non-uniform hypergraphs

Consistency result (with proof sketch)


Normalized Hypergraph Cut

Approach: [Zhou, Huang & Schölkopf '07]

Solve spectral relaxation of minimizing normalized hypergraph cut

Reduction to graph:

$A, D \in \mathbb{R}^{n \times n}$ with

$$A_{ij} = \sum_{e \ni i,j} \frac{1}{|e|}, \qquad D_{ii} = \mathrm{degree}(i)$$

Spectral clustering:

Normalized Laplacian: $L = I - D^{-1/2} A D^{-1/2}$

Compute the $k$ leading orthonormal eigenvectors of $L$

Run k-means on the normalized rows of the eigenvector matrix

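A minimal sketch of this reduction and the subsequent spectral step (assuming numpy and scikit-learn; `edges`, a list of node tuples of varying sizes, is hypothetical):

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans

def nh_cut_partition(n, edges, k):
    """NH-Cut style clustering: reduce the hypergraph to a graph with
    A_ij = sum over edges containing i,j of 1/|e| and D_ii = degree(i),
    then spectrally cluster the normalized Laplacian."""
    A = np.zeros((n, n))
    deg = np.zeros(n)
    for e in edges:
        for i in e:
            deg[i] += 1.0                    # degree(i) = #edges containing i
        for i, j in combinations(e, 2):
            A[i, j] += 1.0 / len(e)
            A[j, i] += 1.0 / len(e)
    d_inv_sqrt = np.zeros(n)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)              # ascending eigenvalues
    X = vecs[:, :k]                          # leading = smallest eigenvalues of L
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)  # k-means on normalized rows
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```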

Planted Partition Model (non-uniform hypergraph)

Model: (proposed)

Given n nodes, and k (hidden) classes

Maximum edge cardinality M

Unknown $m$th-order tensors $B^{(m)} \in [0, 1]^{k \times k \times \cdots \times k}$

Unknown sparsity factors $\alpha_{m,n}$, $m = 2, 3, \ldots, M$

Independent edges with label-dependent distribution:

$$\mathrm{Prob}(m\text{-edge}) = \alpha_{m,n} B^{(m)}_{i_1 i_2 \ldots i_m}$$


Consistency of NH-Cut [Ghoshdastidar & Dukkipati, Ann. Stat., 2017]

Define:

$\mathcal{A} = \mathbb{E}[A]$, $\mathcal{D} = \mathbb{E}[D]$ and $\mathcal{L} = I - \mathcal{D}^{-1/2}\mathcal{A}\mathcal{D}^{-1/2}$

$d = \min_i \mathbb{E}[\mathrm{degree}(i)]$

$\delta$ = $k$th eigen-gap of $\mathcal{L}$

Theorem

There exists a constant $C > 0$ such that, if

$$\delta > 0 \quad \text{and} \quad d > \frac{C\,k\,n_{\max}(\log n)^2}{n_{\min}\,\delta^2},$$

then with probability $1 - o(1)$,

$$\mathrm{Error}(\psi,\psi') = O\!\left(\frac{k\,n_{\max}\log n}{\delta^2 d}\right) = o(n).$$


Proof of consistency

Stage 1: (expected case)

If $\delta > 0$, then $\mathcal{A}$ is essentially of rank $k$

If $\mathcal{A}$ were used instead of $A$, then Error = 0

Stage 2: (matrix concentration)

$A$ can be expressed as a sum of random matrices,

$$A = \sum_{e \in 2^V} \mathbf{1}\{e \in E\} \left(\frac{1}{|e|} h_e h_e^T\right),$$

where $h_e \in \{0,1\}^n$ denotes the indicator vector of edge $e$.

If $d > 9 \log n$ for all large $n$, then with probability $1 - \frac{4}{n^2}$,

$$\|L - \mathcal{L}\|_2 \leq 12\sqrt{\frac{\log n}{d}}$$

Proof uses a matrix concentration inequality [Tropp '12]


Proof of consistency

Stage 3: (matrix perturbation)

$X, \mathcal{X}$ = row-normalized eigenvector matrices of $L, \mathcal{L}$

If $\delta > 24\sqrt{\frac{\log n}{d}}$ for all large $n$, then with probability $1 - \frac{4}{n^2}$,

$$\|X - \mathcal{X}\|_F \leq \frac{24}{\delta}\sqrt{\frac{2\,k\,n_{\max}\log n}{d}}$$

Proof uses matrix perturbation bounds [Davis & Kahan '70]

Stage 4: (analyzing k-means)

Rows of $\mathcal{X}$ are $\varepsilon$-separable for $\varepsilon = (\log n)^{-1/2}$

k-means succeeds with probability $1 - o(1)$, and

$$\mathrm{Error} = O(\|X - \mathcal{X}\|_F^2)$$

Based on guarantees for k-means [Ostrovsky et al. '12]


Sampling Hypergraph Edges

Consistency of partitioning with edge sampling

Approach 4: TTM with iterative sampling

Numerical comparison


Edge Sampling (weighted m-uniform hypergraph)

Complexity of tensor methods:

$O(n^m)$ runtime to compute all edge weights

Typically m = 3 to 8 in practice

Efficient variant: use only $N \ll n^m$ sampled edges

Question:

Edges sampled with replacement

Sampling distribution $(p_e)_{e \in E}$

Find min. number of samples needed for consistency

Sampling bound for TTM: [Ghoshdastidar & Dukkipati, arXiv:1602.06516]

(Special case) Error = o(n) if:

Uniform sampling: $N = \Omega\!\left( \alpha_n^{-1} k^{2m-1} n (\log n)^2 \right)$

Weighted sampling, $p_e \propto w(e)$: $N = \Omega\!\left( k^{2m-1} n (\log n)^2 \right)$

TTM with Iterative Sampling (TeTrIS)

Iterative Sampling:

Principle:

Sample edges with large weight more frequently

Edges within clusters usually have large weight

Approach (SCC): [Chen & Lerman '09]

Sample a few edges

Cluster using the HOSVD based method

Re-sample with preference for within-cluster edges

Re-cluster and repeat till convergence

TeTrIS Algorithm: (proposed)

Replace the HOSVD step by TTM

The sampling bound for TTM justifies the usefulness of sampling large-weight edges via iterative sampling


Numerical Comparison

Motion Segmentation:

Cluster motion trajectories

Posed as subspace clustering problem

Each motion – subspace of dimension ≤ 4

Mean clustering error on the Hopkins 155 data set (%):

Method         2 motion (120 videos)   3 motion (35 videos)   All
k-means        19.57                   26.16                  21.06
k-flats        13.05                   15.78                  13.67
SSC            1.53                    4.40                   2.18
LRR            2.13                    4.03                   2.56
NSN            3.62                    8.28                   4.67
SCC (HOSVD)    2.38                    5.71                   3.13
TeTrIS (TTM)   1.36                    5.38                   2.27


Hypergraph vertex 2-coloring

Objective: no edge may be monochromatic

Assume: planted bi-partite hypergraph with M = O(1) and E[#edges] ≥ C n log n

Algorithm: [Ghoshdastidar & Dukkipati, arXiv:1507.00763]

Spectral step:

Let $A \in \mathbb{R}^{n \times n}$ with $A_{ij} = \sum_{e \ni i,j} \frac{1}{|e|}$

Compute the eigenvector $x$ for the smallest eigenvalue of $A$

Color node $i$ red if $x_i > 0$, else blue

⇒ Achieves error < cn for some c ≪ 1

Iterative refinement:

Re-color node $i$ red if $\sum_{j \in V_R} A_{ij} < \sum_{j \in V_B} A_{ij}$, else blue

⇒ Error reduces by half in each iteration ($\log_2 n$ steps suffice)

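A minimal sketch of the spectral step plus refinement (assuming numpy; `edges` is a hypothetical list of node tuples of the bi-partite hypergraph):

```python
import numpy as np
from itertools import combinations

def color_hypergraph(n, edges, n_rounds):
    """Spectral 2-coloring: sign of the eigenvector for the smallest
    eigenvalue of A, followed by re-coloring refinement rounds."""
    A = np.zeros((n, n))
    for e in edges:
        for i, j in combinations(e, 2):
            A[i, j] += 1.0 / len(e)
            A[j, i] += 1.0 / len(e)
    _, vecs = np.linalg.eigh(A)
    red = vecs[:, 0] > 0                 # eigenvector of smallest eigenvalue
    for _ in range(n_rounds):            # ~log2(n) rounds suffice
        # Re-color node i red if its connection to red nodes is weaker
        to_red = A[:, red].sum(axis=1) < A[:, ~red].sum(axis=1)
        red = to_red
    return red                           # True = red, False = blue
```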

Summary

Hypergraph partitioning can be done efficiently

First study of planted non-uniform hypergraphs

Literature considers only planted k-SAT / 2-coloring

Statistical analysis of tensor based methods

Popular in practice, but no known error bound

Removing the assumptions on k-means

First study of sampled spectral algorithms

Justification for iterative sampling


Further works & open questions

Extension to large-scale hypergraph partitioning: down-sampling of hypergraphs

Analysis of other approaches under the planted model: move-based strategies; optimization-based algorithms

Study of sparse planted hypergraphs: overlapping communities / degree heterogeneity; algorithmic barriers for partitioning [Angelini et al. '15; Florescu & Perkins '16]

Generalization of graph problems to hypergraphs: theoretical studies; applications


Thank You

Acknowledgment: This work was supported by a Google Ph.D. Fellowship in Statistical Learning Theory.


References

Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D. & Belongie, S. (2005). In IEEE Computer Vision and Pattern Recognition, 838-845.

Angelini, M. C., Caltagirone, F., Krzakala, F. & Zdeborová, L. (2015). In Annual Allerton Conference on Communication, Control, and Computing.

Chen, G. & Lerman, G. (2009). International Journal of Computer Vision 81(3), 317-330.

Davis, C. & Kahan, W. M. (1970). SIAM Journal on Numerical Analysis 7(1), 1-46.

De Lathauwer, L., De Moor, B. & Vandewalle, J. (2000). SIAM Journal on Matrix Analysis and Applications 21(4), 1253-1278.

Duchenne, O., Bach, F., Kweon, I.-S. & Ponce, J. (2011). IEEE Transactions on Pattern Analysis and Machine Intelligence 33(12), 2383-2395.

Florescu, L. & Perkins, W. (2016). In Conference on Learning Theory.

Ghoshdastidar, D. & Dukkipati, A. (2014). In Advances in Neural Information Processing Systems, 397-405.


References

Ghoshdastidar, D. & Dukkipati, A. (2015). In International Conference on Machine Learning.

Ghoshdastidar, D. & Dukkipati, A. (2015). Annals of Statistics (in press).

Ghoshdastidar, D. & Dukkipati, A. (2015). arXiv preprint 1507.00763.

Ghoshdastidar, D. & Dukkipati, A. (2016). arXiv preprint 1602.06516.

Gibson, D., Kleinberg, J. & Raghavan, P. (2000). VLDB Journal 8, 222-236.

Govindu, V. M. (2005). In IEEE Computer Vision and Pattern Recognition, 1150-1157.

Hadley, S. W. (1995). Discrete Applied Mathematics 59, 115-127.

Hein, M., Setzer, S., Jost, L. & Rangapuram, S. (2013). In Advances in Neural Information Processing Systems, 2427-2435.

Holland, P. W., Laskey, K. B. & Leinhardt, S. (1983). Social Networks 5, 109-137.

Karypis, G. & Kumar, V. (2000). VLSI Design 11, 285-300.

Lei, J. & Rinaldo, A. (2015). Annals of Statistics 43, 215-237.


References

Ostrovsky, R., Rabani, Y., Schulman, L. J. & Swamy, C. (2012). Journal of the ACM 59(6), Article 28.

Rohe, K., Chatterjee, S. & Yu, B. (2011). Annals of Statistics 39, 1878-1915.

Rota Bulò, S. & Pelillo, M. (2013). IEEE Transactions on Pattern Analysis and Machine Intelligence 35(6), 1312-1327.

Schweikert, G. & Kernighan, B. W. (1979). In Design Automation Workshop, 57-62.

Tropp, J. A. (2012). Foundations of Computational Mathematics 12(4), 389-434.

Zien, J. Y., Schlag, M. D. F. & Chan, P. K. (1999). IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 13(9), 1088-1096.

Zhou, D., Huang, J. & Schölkopf, B. (2007). In Advances in Neural Information Processing Systems, 1601-1608.


Consistency of Sampled TTM (General Case)

Define:

$\gamma = \max_e \frac{w(e)}{p(e)}$, where $p(e) = P(e \text{ is sampled})$

$d = \min_i \mathbb{E}[\mathrm{degree}(i)]$, $\delta$ = $k$th eigen-gap of the normalized $\mathbb{E}[A]$

Theorem [Ghoshdastidar & Dukkipati '16]

There exist constants $C, C' > 0$ such that, if

$$\delta > 0, \qquad d > \frac{C\,k\,n_{\max}(\log n)^2}{n_{\min}\,\delta^2} \qquad \text{and} \qquad N > C'\left(1 + \frac{2\gamma}{d}\right)\frac{k\,n_{\max}(\log n)^2}{n_{\min}\,\delta^2},$$

then with probability $1 - o(1)$,

$$\mathrm{Error}(\psi,\psi') = o(n).$$


More Numerical Results


Numerical Comparison (uniform hypergraph)

Subspace clustering

60 points in 5-dim ambient space

Data from union of three random lines (1-dim subspaces)

Data perturbed by Gaussian noise of standard deviation σa

Fractional error (over 20 runs):

Algorithm   σa = 0.02   σa = 0.05
SNTF        0.025       0.086
hMETIS      0.045       0.118
HGT         0.083       0.222
HOSVD       0.052       0.126
TTM         0.033       0.103


Numerical Comparison (sampled uniform hypergraph)

Subspace clustering

5-dim ambient space

Data from union of five 3-dim subspaces (added noise)

[Figure: fractional error (over 50 runs) vs. number of points in each subspace, n/k, at varying noise levels σa]


Numerical Comparison (non-uniform hypergraph)

Categorical data clustering

Data set         #instances   #attributes   #attr. values
Voting records   435          16            3
Mushroom         8124         22            varies

Fractional error:

Data set   ROCK   CoolCat   LIMBO   hMETIS   NH-Cut
Voting     0.16   0.15      0.13    0.24     0.12
Mushroom   0.43   0.27      0.11    0.48     0.11
