Information and Computation Limits for Finding a Hidden Community in Networks
Jiaming Xu
Krannert School of Management, Purdue University
Joint work with Bruce Hajek (Illinois) and Yihong Wu (Yale)
Applied Probability Society Conference, July 12, 2017
A single hidden community – graph view
G(n, s, p, q)
1 A community of s vertices is chosen at random
2 For every pair of nodes in the community, add an edge w.p. p
3 For other pairs of nodes, add an edge w.p. q
[Figure: sample graph drawn from G(n, s, p, q), built up in the three steps above]
Jiaming Xu Finding a Hidden Community in Networks 2
A single hidden community – adjacency matrix view
Of course, in the observed matrix the vertices are not ordered by community
n = 200, s = 50, p = 0.3, q = 0.1
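As a minimal sketch of the model on this slide, the snippet below samples an adjacency matrix from G(n, s, p, q) with the parameters shown. The function name `sample_hidden_community` and the fixed seed are illustrative assumptions, not part of the talk; only the Python standard library is used.

```python
import random

def sample_hidden_community(n, s, p, q, seed=0):
    """Sample (adjacency matrix, community set) from G(n, s, p, q)."""
    rng = random.Random(seed)
    # Step 1: choose the s community vertices at random.
    community = set(rng.sample(range(n), s))
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Steps 2 and 3: edge probability is p inside the community, q otherwise.
            prob = p if (i in community and j in community) else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1
    return adj, community

adj, community = sample_hidden_community(n=200, s=50, p=0.3, q=0.1)
```

Sorting rows and columns so that the community vertices come first would expose a denser s × s block; the recovery problem is hard precisely because that ordering is not observed.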
Computational gap in planted clique
p = 1
q = 1/2
• $s \ge 2(1 + \epsilon) \log_2 n$: Possible via ML estimator (exhaustive search)
• $s \gtrsim \sqrt{n \log n}$: Trivial by counting degrees [Kucera ’95]
• $s \ge \epsilon \sqrt{n}$: Polynomial-time recoverable [Alon-Krivelevich-Sudakov ’98] [Feige-Ron ’10] [Dekel–Gurel-Gurevich–Peres ’11] [Deshpande-Montanari ’13]...
• $s = o(\sqrt{n})$: Believed to be hard [Jerrum ’92] [Feige-Krauthgamer ’03] [Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]...
Linear community size and relatively sparse graph
• Linear community size: s = ρ n
• $p = a \log n / n$ and $q = b \log n / n$
Theorem (Hajek-Wu-X., Trans. IT 16)
• If ρ > ρ∗, exact recovery is possible in polynomial-time.
• If ρ < ρ∗, exact recovery is impossible.
Remarks
• $\rho^* = 1 / \left(a - \tau^* \log\frac{e a}{\tau^*}\right)$ with $\tau^* = \frac{a - b}{\log a - \log b}$
• Convex relaxation (semi-definite programming) works
• No computational gap for linear s!
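To make the threshold concrete, here is a small sketch that evaluates ρ∗ from the formula in the remarks. The function name `rho_star` and the sample values of (a, b) are assumptions chosen for illustration; the restriction a > b > 0 is also an assumption of this sketch.

```python
import math

def rho_star(a, b):
    """rho* = 1 / (a - tau* log(e a / tau*)), with tau* = (a - b) / (log a - log b)."""
    if not (a > b > 0):
        raise ValueError("this sketch assumes a > b > 0")
    tau = (a - b) / (math.log(a) - math.log(b))
    return 1.0 / (a - tau * math.log(math.e * a / tau))

for a, b in [(8.0, 2.0), (20.0, 5.0)]:
    print(f"a={a}, b={b}: rho* = {rho_star(a, b):.4f}")
```

Since ρ = s/n cannot exceed 1, a pair (a, b) for which the computed value is above 1 admits no community fraction satisfying ρ > ρ∗.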
Sublinear community size
[Hajek-Wu-X., COLT ’15]
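[Figure: phase diagram in the (α, β) plane, where $p = cq = \Theta(n^{-\alpha})$ and $s = \Theta(n^{\beta})$; axis ticks at 1/2, 2/3, and 1; regions labeled impossible, easy, PC hard, and ?]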
• s = Ω(n): SDP works
• $s = n^{1-\epsilon}$: no known poly-time algorithm
• Question: When does the computational barrier start to emerge? $s = \Theta(n / \log n)$
Belief propagation vs. IT Limits: exact recovery
[Hajek-Wu-X. ’15] There is a constant $C_{BP}(p/q)$ such that
• $s \ge C_{BP}\, n / \log n$: BP attains the IT limit with sharp constants
• $s = (C_{BP} - \epsilon)\, n / \log n$: BP is order-wise optimal, but strictly suboptimal by a constant factor
• $s = o(n / \log n)$ and $s \to \infty$: BP is order-wise suboptimal
Remarks
• Negative results apply to all local algorithms
Outline of the remainder of the talk
1 Weak recovery via belief propagation
$\mathbb{E}[\#\text{ of misclassified vertices}] = o(s)$
2 Exact recovery via weak recovery plus voting
$\mathbb{P}\{\text{no misclassification}\} \to 1$ as $n \to \infty$
A naïve degree thresholding
• Outside cluster: degree ~ Binom(n − 1, q)
• In cluster: degree ~ Binom(s − 1, p) + Binom(n − s, q)
[Figure: the two degree distributions, centered near nq and s(p − q) + nq, separated by s(p − q)]
Signal-to-noise ratio: $\lambda \triangleq \frac{[s(p - q)]^2}{nq}$
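The following sketch illustrates degree thresholding on a sampled graph together with the signal-to-noise ratio λ. The midpoint threshold nq + s(p − q)/2, the function names, and the parameter values are assumptions made for this illustration (the slide does not specify a threshold), and the parameters are deliberately chosen so that λ is large.

```python
import random

def snr(n, s, p, q):
    """Signal-to-noise ratio lambda = [s (p - q)]^2 / (n q)."""
    return (s * (p - q)) ** 2 / (n * q)

def degree_threshold(n, s, p, q, seed=0):
    """Sample G(n, s, p, q) and classify vertices by thresholding their degrees."""
    rng = random.Random(seed)
    community = set(rng.sample(range(n), s))
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if (i in community and j in community) else q
            if rng.random() < prob:
                degree[i] += 1
                degree[j] += 1
    # Assumed threshold: halfway between the approximate means nq and nq + s(p - q).
    threshold = n * q + s * (p - q) / 2
    estimate = {i for i in range(n) if degree[i] > threshold}
    return community, estimate

community, estimate = degree_threshold(n=400, s=100, p=0.9, q=0.05)
errors = len(community ^ estimate)  # size of the symmetric difference
print(f"lambda = {snr(400, 100, 0.9, 0.05):.2f}, misclassified = {errors}")
```

With much smaller λ the two degree distributions overlap and the same rule misclassifies many vertices, which is what motivates the message-passing approach that follows.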
[Figure: the same degree distributions for small λ]
Message passing
Widely used in iterative decoding, distributed computing, networking, information spreading, combinatorial optimization...
[Figure: node $i$ combines incoming messages $m^{t}_{k \to i}$ and $m^{t}_{j \to i}$ into an outgoing message $m^{t+1}_{i \to \cdot}$]
Picture courtesy of David Gamarnik
• Iterative, distributed algorithms
• Using minimal computation and little memory
• Time complexity in each iteration is linear in number of edges
Belief propagation for inferring the hidden community
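For $np = n^{o(1)}$, the t-local neighborhood is locally a Poisson tree
[Figure: tree neighborhood of vertex $i$, with parent $\pi(i)$ and children $\ell \in \partial i$]
$m^{t+1}_{i \to \pi(i)} = -s(p - q) + \sum_{\ell \in \partial i} f\left(m^{t}_{\ell \to i}\right)$
• $m^{0}_{\ell \to i} \equiv 0$ and $f(x) = \log\left(\frac{e^{x}\,\frac{sp}{(n-s)q} + 1}{e^{x}\,\frac{s}{n-s} + 1}\right)$ (Bayes’ rule)
• $m^{1}_{i \to j}$ corresponds to degree information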
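A minimal sketch of one round of this update on a general graph is given below. It assumes messages are stored per directed edge and that the children of i relative to a target j are all neighbors of i except j; the function names, the toy star graph, and the parameter values are hypothetical.

```python
import math

def f(x, n, s, p, q):
    """Message nonlinearity from the slide (Bayes' rule in log-likelihood form)."""
    num = math.exp(x) * (s * p) / ((n - s) * q) + 1.0
    den = math.exp(x) * s / (n - s) + 1.0
    return math.log(num / den)

def bp_iteration(neighbors, messages, n, s, p, q):
    """One synchronous update of all messages, with messages[(i, j)] = m_{i -> j}."""
    new = {}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            # Sum over the neighbors of i other than the target j.
            incoming = sum(f(messages[(l, i)], n, s, p, q) for l in nbrs if l != j)
            new[(i, j)] = -s * (p - q) + incoming
    return new

# Toy example (hypothetical): a star with center 0 and leaves 1, 2, 3.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
n, s, p, q = 1000, 100, 0.02, 0.005
m0 = {(i, j): 0.0 for i in neighbors for j in neighbors[i]}
m1 = bp_iteration(neighbors, m0, n, s, p, q)
```

With all-zero initial messages, `m1[(i, j)]` equals −s(p − q) + (deg(i) − 1)·f(0), which is the sense in which the first iteration carries only degree information. For very large messages `math.exp` can overflow, so a numerically stable form of `f` would be needed in practice.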
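[Figure: histograms of the messages $m^{t}_{i \to j}$ for $i \notin C^*$ and for $i \in C^*$, shown for t = 1, 2, 3, with $a_t$ marked]
Analysis techniques
• Couple the local neighborhood of a given node to a Poisson tree
• Study the recursions of exponential moments of messages on tree
(Bhattacharyya coef.) $\rho_B = \mathbb{E}\left[e^{m^{t}_{i \to j}/2} \mid i \notin C^*\right]$
$a_{t+1}^2 \approx \lambda e^{a_t^2}$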
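$a_{t+1}^2 \approx \lambda e^{a_t^2}$
[Figure: iterates of the recursion plotted against the curve $y = \lambda e^{x}$. Left panel, $\lambda > 1/e$: $a_t \to \infty$. Right panel, $\lambda \le 1/e$: $a_t \le 1$.]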
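The dichotomy at 1/e can be checked numerically with the sketch below, which iterates x ← λeˣ with x standing for $a_t^2$. The starting value x = 0, the cap used to declare divergence, and the function name `iterate` are assumptions of this illustration.

```python
import math

def iterate(lam, steps=200, cap=50.0):
    """Iterate x_{t+1} = lam * exp(x_t) from x_0 = 0, where x_t stands for a_t^2.

    Stops early once an iterate exceeds `cap`, which is treated here as
    divergence and also guards against overflow in math.exp.
    """
    xs = [0.0]
    for _ in range(steps):
        x = lam * math.exp(xs[-1])
        xs.append(x)
        if x > cap:
            break
    return xs

below = iterate(1 / math.e)  # at the critical value
above = iterate(0.4)         # slightly above 1/e, which is about 0.368
print(f"lambda = 1/e: last iterate {below[-1]:.4f} after {len(below) - 1} steps")
print(f"lambda = 0.4: exceeded the cap after {len(above) - 1} steps")
```

For λ ≤ 1/e the curve y = λeˣ meets the line y = x at a point with x ≤ 1, so the iterates are trapped below it; for λ > 1/e there is no fixed point and the iterates escape.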
Phase transition for belief propagation
Theorem (Hajek-Wu-X. ’15)
Assume s = o(n) and $np = n^{o(1)}$. For weak recovery (misclassifying o(s) nodes),
[Figure: number line for $\lambda = \frac{s^2 (p - q)^2}{nq}$ starting at 0, with threshold 1/e; BP fails below the threshold and BP succeeds above it]
Remarks:
• Needs $\log^*(n)$ iterations. For $n \in (65536, 2^{65536}]$, $\log^*(n) = 5$
• The critical point 1/e is predicted by [Montanari ’15]
• Belief propagation for community detection was proposed by [Decelle-Krzakala-Moore-Zdeborova ’13]
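As a quick check of the remark on log∗, the sketch below computes the base-2 iterated logarithm using exact integer arithmetic. Defining it through a tower of 2s (rather than repeated floating-point logarithms) is a choice made here to avoid rounding issues; the function name `log_star` is an assumption.

```python
def log_star(n):
    """Smallest k such that the height-k tower of 2s (1, 2, 4, 16, 65536, ...) is >= n."""
    k, tower = 0, 1
    while tower < n:
        tower = 2 ** tower
        k += 1
    return k

for n in (65536, 65537, 10**9, 2**65536):
    print(log_star(n))
```

This sketch is only intended for n up to 2^65536; beyond that the next tower value is far too large to represent.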
Voting for exact recovery given weak recovery
Hypothesis testing for a single vertex
                H0                 H1
priors:         π0 = 1 − s/n       π1 = s/n
distributions:  Binom(s − 1, q)    Binom(s − 1, p)
• Exact recovery is guaranteed if $p_e = o(1/n)$, which requires $\lambda = \Theta\left(\frac{s \log n}{n}\right)$
• The idea of weak recovery plus voting ⇒ exact recovery is also used in detecting multiple communities [Abbe-Bandeira-Hall ’15] [Mossel-Neeman-Sly ’15]
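To get a feel for the per-vertex test, the sketch below computes the exact Bayes (MAP) error for deciding between the two binomial hypotheses with the stated priors. Interpreting $p_e$ as this MAP error is an assumption of the sketch, as are the function names and the parameter values.

```python
import math

def binom_pmf(k, m, r):
    """P[Binom(m, r) = k]."""
    return math.comb(m, k) * r**k * (1 - r) ** (m - k)

def map_error(n, s, p, q):
    """Bayes (MAP) error for H0: Binom(s-1, q) vs. H1: Binom(s-1, p),
    with priors pi0 = 1 - s/n and pi1 = s/n."""
    pi0, pi1 = 1 - s / n, s / n
    return sum(
        min(pi0 * binom_pmf(k, s - 1, q), pi1 * binom_pmf(k, s - 1, p))
        for k in range(s)
    )

n, s, q = 200, 50, 0.1
for p in (0.3, 0.6, 0.9):
    pe = map_error(n, s, p, q)
    print(f"p = {p}: p_e = {pe:.3e}, n * p_e = {n * pe:.3e}")
```

The product n · p_e is the quantity to watch: a union bound over the n vertices shows that no vertex is misclassified with high probability once it tends to zero.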
Summary
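[Figure: phase diagram with community size s on the horizontal axis (tick at $C\, n / \log n$) and λ on the vertical axis (ticks at 1/e and log n). The curve $\lambda = \Theta\left(\frac{s \log n}{n}\right)$ is the IT limit of exact recovery and the level λ = 1/e is the BP limit of weak recovery; regions are labeled "optimal" and "BP suboptimal".]
BP plus voting becomes suboptimal for exact recovery when community size s falls below threshold $C\, n / \log n$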
Conclusion
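[Figure: phase diagram in the (α, β) plane, where $p = cq = \Theta(n^{-\alpha})$ and $s = \Theta(n^{\beta})$; axis ticks at 1/2, 2/3, and 1; regions labeled impossible, easy, PC hard, and ?]
[Hajek-Wu-X., COLT ’16]: Semidefinite programming relaxation becomes suboptimal as soon as $s = \Theta\left(\frac{n}{\log n}\right)$.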
Selected references
• Y. Deshpande and A. Montanari. Finding hidden cliques of size $\sqrt{N/e}$ in nearly linear time. Foundations of Computational Mathematics, 15(4):1069–1128, August 2015.
• A. Montanari. Finding one community in a sparse random graph. arXiv 1502.05680, February 2015.
• C. Bordenave, M. Lelarge, and L. Massoulie. Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs. arXiv 1501.06087, January 2015.
• B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hidden community. arXiv 1509.07859, September 2015.
• B. Hajek, Y. Wu, and J. Xu. Recovering a hidden community beyond the spectral limit in O(|E| log∗ |V|) time. arXiv 1510.02786, October 2015.
• B. Hajek, Y. Wu, and J. Xu. Computational lower bounds for community detection on random graphs. COLT 2015; arXiv 1406.6625.
Thanks!