information and computation limits for finding a hidden ...jx77/jiaming-aps17.pdf · information...

Post on 23-May-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Information and Computation Limits for Finding aHidden Community in Networks

Jiaming Xu

Krannert School of ManagementPurdue University

Joint work with Bruce Hajek (Illinois) and Yihong Wu (Yale)

Applied Probability Society Conference, July 12, 2017

A single hidden community – graph view

G(n, s, p, q)

1 A community of s vertices are chosen randomly2 For every pair of nodes in the community, add an edge w.p. p3 For other pairs of nodes, add an edge w.p. q

Jiaming Xu Finding a Hidden Community in Networks 2

A single hidden community – graph view

G(n, s, p, q)

1 A community of s vertices are chosen randomly

2 For every pair of nodes in the community, add an edge w.p. p3 For other pairs of nodes, add an edge w.p. q

••

••

••

••••

••

••

••

••••

••

••

••••

•••• ••

••

••

••

••

••••••

••

••

••

••

••

••

•••••• ••••

••••••

••

••

••

••

••

••

••

••

••

••

••••

••

••

••

••

••

••

••

••

••

••••

•• ••

••

••

••••

••

••

••

••

••

••

••

••

••

••

••

•• ••

Jiaming Xu Finding a Hidden Community in Networks 2

A single hidden community – graph view

G(n, s, p, q)

1 A community of s vertices are chosen randomly2 For every pair of nodes in the community, add an edge w.p. p

3 For other pairs of nodes, add an edge w.p. q

••

••

••

••••

••

••

••

••••

••

••

••••

•••• ••

••

••

••

••

••••••

••

••

••

••

••

••

•••••• ••••

••••••

••

••

••

••

••

••

••

••

••

••

••••

••

••

••

••

••

••

••

••

••

••••

•• ••

••

••

••••

••

••

••

••

••

••

••

••

••

••

••

•• ••

Jiaming Xu Finding a Hidden Community in Networks 2

A single hidden community – graph view

G(n, s, p, q)

1 A community of s vertices are chosen randomly2 For every pair of nodes in the community, add an edge w.p. p3 For other pairs of nodes, add an edge w.p. q

••

••

••

••••

••

••

••

••••

••

••

••••

•••• ••

••

••

••

••

••••••

••

••

••

••

••

••

•••••• ••••

••••••

••

••

••

••

••

••

••

••

••

••

••••

••

••

••

••

••

••

••

••

••

••••

•• ••

••

••

••••

••

••

••

••

••

••

••

••

••

••

••

•• ••

Jiaming Xu Finding a Hidden Community in Networks 2

A single hidden community – graph view

G(n, s, p, q)

1 A community of s vertices are chosen randomly2 For every pair of nodes in the community, add an edge w.p. p3 For other pairs of nodes, add an edge w.p. q

Jiaming Xu Finding a Hidden Community in Networks 2

A single hidden community – adjacency matrix view

Of course not ordered

n = 200, s = 50, p = 0.3, q = 0.1

Jiaming Xu Finding a Hidden Community in Networks 3

A single hidden community – adjacency matrix view

Of course not ordered

n = 200, s = 50, p = 0.3, q = 0.1

Jiaming Xu Finding a Hidden Community in Networks 3

A single hidden community – adjacency matrix view

Of course not ordered

n = 200, s = 50, p = 0.3, q = 0.1

Jiaming Xu Finding a Hidden Community in Networks 3

Computational gap in planted clique

p = 1

q = 1/2

• s ≥ 2(1 + ε) log2 n: Possible via ML estimator (exhaustive search)

• s &√n log n: Trivial by counting degrees [Kucera ’95]

• s ≥ ε√n: Polynomial-time recoverable [Alon-Krivelevich-Sudakov ’98]

[Feige-Ron ’10] [Dekel–Gurel-Gurevich–Peres ’11] [Deshpande-Montanari

’13]...

• s = o(√n): Believed to be hard [Jerrum’92] [Feige-Krauthegamer ’03]

[Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]...

Jiaming Xu Finding a Hidden Community in Networks 4

Computational gap in planted clique

p = 1

q = 1/2

• s ≥ 2(1 + ε) log2 n: Possible via ML estimator (exhaustive search)

• s &√n log n: Trivial by counting degrees [Kucera ’95]

• s ≥ ε√n: Polynomial-time recoverable [Alon-Krivelevich-Sudakov ’98]

[Feige-Ron ’10] [Dekel–Gurel-Gurevich–Peres ’11] [Deshpande-Montanari

’13]...

• s = o(√n): Believed to be hard [Jerrum’92] [Feige-Krauthegamer ’03]

[Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]...

Jiaming Xu Finding a Hidden Community in Networks 4

Computational gap in planted clique

p = 1

q = 1/2

• s ≥ 2(1 + ε) log2 n: Possible via ML estimator (exhaustive search)

• s &√n log n: Trivial by counting degrees [Kucera ’95]

• s ≥ ε√n: Polynomial-time recoverable [Alon-Krivelevich-Sudakov ’98]

[Feige-Ron ’10] [Dekel–Gurel-Gurevich–Peres ’11] [Deshpande-Montanari

’13]...

• s = o(√n): Believed to be hard [Jerrum’92] [Feige-Krauthegamer ’03]

[Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]...

Jiaming Xu Finding a Hidden Community in Networks 4

Computational gap in planted clique

p = 1

q = 1/2

• s ≥ 2(1 + ε) log2 n: Possible via ML estimator (exhaustive search)

• s &√n log n: Trivial by counting degrees [Kucera ’95]

• s ≥ ε√n: Polynomial-time recoverable [Alon-Krivelevich-Sudakov ’98]

[Feige-Ron ’10] [Dekel–Gurel-Gurevich–Peres ’11] [Deshpande-Montanari

’13]...

• s = o(√n): Believed to be hard [Jerrum’92] [Feige-Krauthegamer ’03]

[Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]...

Jiaming Xu Finding a Hidden Community in Networks 4

Linear community size and relatively sparse graph

• Linear community size: s = ρ n

• p = a lognn and q = b logn

n

Theorem (Hajek-Wu-X., Trans. IT 16)

• If ρ > ρ∗, exact recovery is possible in polynomial-time.

• If ρ < ρ∗, exact recovery is impossible.

Remarks

• ρ∗ = 1/(a− τ∗ log eaτ∗ ) with τ∗ = a−b

log a−log b• Convex relaxation (semi-definite programming) works

• No computational gap for linear s!

Jiaming Xu Finding a Hidden Community in Networks 5

Linear community size and relatively sparse graph

• Linear community size: s = ρ n

• p = a lognn and q = b logn

n

Theorem (Hajek-Wu-X., Trans. IT 16)

• If ρ > ρ∗, exact recovery is possible in polynomial-time.

• If ρ < ρ∗, exact recovery is impossible.

Remarks

• ρ∗ = 1/(a− τ∗ log eaτ∗ ) with τ∗ = a−b

log a−log b• Convex relaxation (semi-definite programming) works

• No computational gap for linear s!

Jiaming Xu Finding a Hidden Community in Networks 5

Sublinear community size

[Hajek-Wu-X., COLT ’15]

1

1

p = cq = Θ(n−α)

s = Θ(nβ)

1/2

O α

β

2/3

impossible

easy

1/2PC hard

?

• s = Ω(n): SDP works

• s = n1−ε: no known poly-time algorithm

• Question: When does the computation barrier starts to emerge?s = Θ( n

logn)

Jiaming Xu Finding a Hidden Community in Networks 6

Sublinear community size

[Hajek-Wu-X., COLT ’15]

1

1

p = cq = Θ(n−α)

s = Θ(nβ)

1/2

O α

β

2/3

impossible

easy

1/2PC hard

?

• s = Ω(n): SDP works

• s = n1−ε: no known poly-time algorithm

• Question: When does the computation barrier starts to emerge?

s = Θ( nlogn)

Jiaming Xu Finding a Hidden Community in Networks 6

Sublinear community size

[Hajek-Wu-X., COLT ’15]

1

1

p = cq = Θ(n−α)

s = Θ(nβ)

1/2

O α

β

2/3

impossible

easy

1/2PC hard

?

• s = Ω(n): SDP works

• s = n1−ε: no known poly-time algorithm

• Question: When does the computation barrier starts to emerge?s = Θ( n

logn)

Jiaming Xu Finding a Hidden Community in Networks 6

Belief propagation vs. IT Limits: exact recovery

[Hajek-Wu-X., 15] There is a constant CBP(p/q), such that

• s ≥ CBPn

logn : BP attains the IT limit with sharp constants

• s = (CBP − ε) nlogn : BP is order-wise optimal, but strictly

suboptimal by a constant factor

• s = o( nlogn) and s→∞: BP is order-wise suboptimal

Remarks

• Negative results apply to all local algorithms

Jiaming Xu Finding a Hidden Community in Networks 7

Belief propagation vs. IT Limits: exact recovery

[Hajek-Wu-X., 15] There is a constant CBP(p/q), such that

• s ≥ CBPn

logn : BP attains the IT limit with sharp constants

• s = (CBP − ε) nlogn : BP is order-wise optimal, but strictly

suboptimal by a constant factor

• s = o( nlogn) and s→∞: BP is order-wise suboptimal

Remarks

• Negative results apply to all local algorithms

Jiaming Xu Finding a Hidden Community in Networks 7

Outline of the remainder of the talk

1 Weak recovery via belief propagation

E [# of misclassified vertices ] = o(s)

2 Exact recovery via weak recovery plus voting

P no misclassification n→∞−−−→ 1

Jiaming Xu Finding a Hidden Community in Networks 8

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

Jiaming Xu Finding a Hidden Community in Networks 9

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

degreenq s(p− q) + nqs(p− q)

λ ,[s(p− q)]2

nqSignal-to-noise ratio

Jiaming Xu Finding a Hidden Community in Networks 9

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

degreenq s(p− q) + nqs(p− q)

λ ,

[s(p− q)]2

nqSignal-to-noise ratio

Jiaming Xu Finding a Hidden Community in Networks 9

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

degreenq s(p− q) + nqs(p− q)

λ ,

[s(p− q)]2

nq

Signal-to-noise ratio

Jiaming Xu Finding a Hidden Community in Networks 9

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

degreenq s(p− q) + nqs(p− q)

λ ,[s(p− q)]2

nqSignal-to-noise ratio

Jiaming Xu Finding a Hidden Community in Networks 9

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

degreenq s(p− q) + nqs(p− q)

λ ,[s(p− q)]2

nqSignal-to-noise ratio

Jiaming Xu Finding a Hidden Community in Networks 9

A naıve degree thresholding

• • • • •Binom(n− 1, q)

outside cluster •

• • • • • • • • •Binom(s− 1, p) + Binom(n− s, q)

in cluster

degree

small λ

λ ,[s(p− q)]2

nqSignal-to-noise ratio

Jiaming Xu Finding a Hidden Community in Networks 9

Message passing

widely used in iterative decoding, distributed computing, networking,information spreading, combinatorial optimization...

Jiaming Xu Finding a Hidden Community in Networks 10

tmk i→1tmi+→

tmj i→

Picture courtesy of David Gamarnik

• Iterative, distributed algorithms• Using minimal computation and little memory• Time complexity in each iteration is linear in number of edges

Jiaming Xu Finding a Hidden Community in Networks 11

Belief propagation for inferring the hidden community

For np = no(1), the t-local neighborhood is locally a Poisson tree

• • • • • • • • •i

`

π(i)

•••••••• ••••••

mt+1i→π(i) = −s(p− q) +

∑`∈∂i

f(mt`→i)

• m0`→i ≡ 0 and f(x) = log

(ex(sp/(n−s)q)+1exs/(n−s)+1

)(Bayes’ rule)

• m1i→j corresponds to degree information

Jiaming Xu Finding a Hidden Community in Networks 12

Belief propagation for inferring the hidden community

For np = no(1), the t-local neighborhood is locally a Poisson tree

• • • • • • • • •i

`

π(i)

•••••••• ••••••

mt+1i→π(i) = −s(p− q) +

∑`∈∂i

f(mt`→i)

• m0`→i ≡ 0 and f(x) = log

(ex(sp/(n−s)q)+1exs/(n−s)+1

)(Bayes’ rule)

• m1i→j corresponds to degree information

Jiaming Xu Finding a Hidden Community in Networks 12

mti→j : i /∈ C∗ mt

i→j : i ∈ C∗

t = 1

at

Analysis techniques

• Couple the local nbrhd of a given node to a Poisson tree• Study the recursions of exponential moments of messages on tree

(Bhattacharyya coef.) ρB = E[em

ti→j/2|i /∈ C∗

]a2t+1 ≈ λea

2t

Jiaming Xu Finding a Hidden Community in Networks 13

mti→j : i /∈ C∗ mt

i→j : i ∈ C∗

t = 2

at

Analysis techniques

• Couple the local nbrhd of a given node to a Poisson tree• Study the recursions of exponential moments of messages on tree

(Bhattacharyya coef.) ρB = E[em

ti→j/2|i /∈ C∗

]a2t+1 ≈ λea

2t

Jiaming Xu Finding a Hidden Community in Networks 13

mti→j : i /∈ C∗ mt

i→j : i ∈ C∗

t = 3

at

Analysis techniques

• Couple the local nbrhd of a given node to a Poisson tree• Study the recursions of exponential moments of messages on tree

(Bhattacharyya coef.) ρB = E[em

ti→j/2|i /∈ C∗

]a2t+1 ≈ λea

2t

Jiaming Xu Finding a Hidden Community in Networks 13

mti→j : i /∈ C∗ mt

i→j : i ∈ C∗

t = 3

at

Analysis techniques

• Couple the local nbrhd of a given node to a Poisson tree• Study the recursions of exponential moments of messages on tree

(Bhattacharyya coef.) ρB = E[em

ti→j/2|i /∈ C∗

]

a2t+1 ≈ λea

2t

Jiaming Xu Finding a Hidden Community in Networks 13

mti→j : i /∈ C∗ mt

i→j : i ∈ C∗

t = 3

at

Analysis techniques

• Couple the local nbrhd of a given node to a Poisson tree• Study the recursions of exponential moments of messages on tree

(Bhattacharyya coef.) ρB = E[em

ti→j/2|i /∈ C∗

]a2t+1 ≈ λea

2t

Jiaming Xu Finding a Hidden Community in Networks 13

a2t+1 ≈ λea

2t

x

yλ > 1/e

y = λex

•••

at →∞

x

yλ ≤ 1/e y = λex

1

•• ••

at ≤ 1

Jiaming Xu Finding a Hidden Community in Networks 14

a2t+1 ≈ λea

2t

x

yλ > 1/e

y = λex

•••

at →∞

x

yλ ≤ 1/e y = λex

1

•• ••

at ≤ 1

Jiaming Xu Finding a Hidden Community in Networks 14

Phase transition for belief propagation

Theorem (Hajek-Wu-X. ’15)

Assume s = o(n) and np = no(1). For weak recovery (misclassifies o(s)nodes),

0λ = s2(p−q)2

nq1/e

BP fails BP succeeds

Remarks:

• Needs log∗(n) iterations. For n ∈ (65536, 265536], log∗(n) = 5

• The critical point 1/e is predicted by [Montanari ’15]

• Belief propagation for community detection is proposed by[Decelle-Krzakala-Moore-Zdeborova ’13]

Jiaming Xu Finding a Hidden Community in Networks 15

Voting for exact recovery given weak recovery

• • • • • • • • •

Hypothesis testing for a single vertex

H0 vs. H1

priors: π0 = 1− s/n π1 = s/ndistributions: Binom(s− 1, q) Binom(s− 1, p)

• Exact recovery is guaranteed if pe = o(1/n), which requires

λ = Θ

(s log n

n

)• The idea of weak recovery plus voting ⇒ exact recovery is also used

in detecting multiple communities [Abbe-Bandeira-Hall ’15][Mossel-Neeman-Sly ’15]

Jiaming Xu Finding a Hidden Community in Networks 16

Voting for exact recovery given weak recovery

• • • • • • • • •

Hypothesis testing for a single vertex

H0 vs. H1

priors: π0 = 1− s/n π1 = s/ndistributions: Binom(s− 1, q) Binom(s− 1, p)

• Exact recovery is guaranteed if pe = o(1/n), which requires

λ = Θ

(s log n

n

)• The idea of weak recovery plus voting ⇒ exact recovery is also used

in detecting multiple communities [Abbe-Bandeira-Hall ’15][Mossel-Neeman-Sly ’15]

Jiaming Xu Finding a Hidden Community in Networks 16

Summary

C nlogn

s

λ

log n

1/e

optimalBP subpotimal

λ = Θ(s lognn

)IT limit of exact recovery

BP limit of weak recovery

BP plus voting becomes suboptimal for exact recoverywhen community size s falls below threshold C n

logn

Jiaming Xu Finding a Hidden Community in Networks 17

Conclusion

1

1

p = cq = Θ(n−α)

s = Θ(nβ)

1/2

O α

β

2/3

impossible

easy

1/2PC hard

?

[Hajek-Wu-X., COLT ’16]: Semidefinite programming relaxation becomes

suboptimal as soon as s = Θ(

nlogn

).

Jiaming Xu Finding a Hidden Community in Networks 18

Selected references

• Y. Deshpande and A. Montanari. Finding hidden cliques of size√

N/e in nearlylinear time. Foundations of Computational Mathematics, 15(4):1069–1128,August 2015.

• A. Montanari. Finding one community in a sparse random graph. arXiv1502.05680, Feb 2015.

• C. Bordenave, M. Lelarge, and L. Massoulie. Non-backtracking spectrum ofrandom graphs: community detection and non-regular Ramanujan graphs. arXiv1501.06087, January 2015.

• B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hiddencommunity. arXiv 1509.07859, September 2015.

• B. Hajek, Y. Wu, and J. Xu. Recovering a hidden community beyond thespectral limit in O(|E| log∗ |V |) time. arXiv 1510.02786, October 2015.

• B. Hajek, Y. Wu, and J. Xu. Computational lower bounds for communitydetection on random graphs. COLT 2015; arXiv:1406.6625

Thanks!

Jiaming Xu Finding a Hidden Community in Networks 19

Selected references

• Y. Deshpande and A. Montanari. Finding hidden cliques of size√

N/e in nearlylinear time. Foundations of Computational Mathematics, 15(4):1069–1128,August 2015.

• A. Montanari. Finding one community in a sparse random graph. arXiv1502.05680, Feb 2015.

• C. Bordenave, M. Lelarge, and L. Massoulie. Non-backtracking spectrum ofrandom graphs: community detection and non-regular Ramanujan graphs. arXiv1501.06087, January 2015.

• B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hiddencommunity. arXiv 1509.07859, September 2015.

• B. Hajek, Y. Wu, and J. Xu. Recovering a hidden community beyond thespectral limit in O(|E| log∗ |V |) time. arXiv 1510.02786, October 2015.

• B. Hajek, Y. Wu, and J. Xu. Computational lower bounds for communitydetection on random graphs. COLT 2015; arXiv:1406.6625

Thanks!

Jiaming Xu Finding a Hidden Community in Networks 19

top related