Some Recent Progress in Combinatorial Statistics Elchanan Mossel, UC Berkeley + Weizmann Institute http://www.stat.berkeley.edu/~mossel


TRANSCRIPT

Page 1: Some Recent Progress in Combinatorial Statistics

Some Recent Progress in Combinatorial Statistics

Elchanan Mossel,

UC Berkeley + Weizmann Institute

http://www.stat.berkeley.edu/~mossel

Page 2: Some Recent Progress in Combinatorial Statistics

What is Combinatorial Statistics?

• "Combinatorial Statistics" deals with inference of discrete parameters such as graphs, trees, or partitions,

• with the goal of providing algorithms with guarantees on sampling complexity, computational complexity, and success probability.

• Alternatively: sampling or complexity lower bounds.

Page 3: Some Recent Progress in Combinatorial Statistics

Example 1: Graphical Model reconstruction

Page 4: Some Recent Progress in Combinatorial Statistics

Markov random fields / Graphical Models

• A common model for stochastic networks:

• a bounded-degree graph G = (V, E);

• weight functions ψ_C : Σ^|C| → R≥0 for every clique C.

[Figure: example graph on nodes 1, 2, 3, 4 with edges (1, 2), (2, 3), (3, 4), (1, 3).]

Page 5: Some Recent Progress in Combinatorial Statistics

Markov Random Fields / Graphical Models

• A common model for stochastic networks:

• a bounded-degree graph G = (V, E);

• weight functions ψ_C : Σ^|C| → R≥0 for every clique C;

• nodes v are assigned values a_v in an alphabet Σ;

• the distribution over states Σ^V is given by

Pr[a] ~ ∏_C ψ_C(a_u : u ∈ C),

where the product goes over all cliques C.

[Figure: the same four-node graph, now with a clique potential ψ attached to each of the edges (1, 2), (2, 3), (3, 4), (1, 3).]
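The definition above can be made concrete with a small Python sketch. The graph is the four-node example from the figure (0-indexed), but the agreement-favoring potentials and binary alphabet are our own toy choices, not from the talk; the normalizing constant is computed by brute-force enumeration:

```python
from itertools import product

# Hypothetical pairwise MRF on the four-node example graph
# (0-indexed versions of the edges (1,2), (2,3), (3,4), (1,3)), alphabet {0, 1}.
CLIQUES = [(0, 1), (1, 2), (2, 3), (0, 2)]

def psi(values):
    """Toy clique potential (our own choice): favors agreement on an edge."""
    u, v = values
    return 2.0 if u == v else 1.0

def unnormalized(a):
    """Pr[a] ~ product over cliques C of psi_C(a_u : u in C)."""
    p = 1.0
    for c in CLIQUES:
        p *= psi(tuple(a[v] for v in c))
    return p

# Normalize by brute-force enumeration over all |Sigma|^|V| = 2^4 states.
states = list(product([0, 1], repeat=4))
Z = sum(unnormalized(a) for a in states)
prob = {a: unnormalized(a) / Z for a in states}
```

Enumeration is exponential in |V|, which is exactly why the reconstruction algorithms discussed next only ever look at local (neighborhood-sized) statistics.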

Page 6: Some Recent Progress in Combinatorial Statistics

Reconstruction task for Markov random fields

• Suppose we can obtain independent samples from the Markov random field

• Given observed data at the nodes, is it possible to reconstruct the model (network)?

• Important: Do we see data at all of the nodes or only a subset?

Page 7: Some Recent Progress in Combinatorial Statistics

Reconstruction problem

Problem: Given k independent samples of σ = (σ_1, …, σ_n) at all the nodes, find the graph G.

(Given activity at all the nodes, can the network be reconstructed?)

• Restrict attention to graphs with max degree d.

• A structure estimator is a map from the k samples to a graph on n nodes.

Questions:

1. How many samples k are required (asymptotically) in order to reconstruct MRFs with n nodes and max degree d with probability approaching 1?

2. Want an efficient algorithm for reconstruction.

Page 8: Some Recent Progress in Combinatorial Statistics

Related work

• Tree Markov fields can be reconstructed efficiently (even with hidden nodes): [Erdős, Steel, Székely, Warnow, '99], [Daskalakis, Mossel, Roch, '06].

• PAC setup: [Abbeel, Koller, Ng, '06] produce a factorized distribution that is ε-close in Kullback-Leibler divergence to the true distribution.

• No guarantee to reconstruct the correct graph; running time and sampling complexity is n^O(d).

• More restricted problem studied by [Wainwright, Ravikumar, Lafferty, '06]: restricted to the Ising model, sample complexity O(d^5 log n), difficult-to-verify convergence condition; technique based on L1 regularization. Moreover, works for graphs, not for graphical models (clique potentials not allowed).

• Subsequent to our results, [Santhanam, Wainwright, '08] determine the information-theoretic sampling complexity and [Wainwright, Ravikumar, Lafferty, '08] get O(d log n) sampling (restricted to Ising models; still no checkable guarantee for convergence).

Page 9: Some Recent Progress in Combinatorial Statistics

Related work

Method                | Abbeel et al.              | Wainwright et al.           | Bresler et al.
Generative model      | MRF, general               | Collection of edges, Ising  | MRF, general
Reconstruct           | Dist. of small KL distance | Graph                       | Graph
Additional conditions | No                         | Yes (very hard to check)    | No
Running time          | n^d                        | n^5                         | n^d
Sampling complexity   | poly(n)                    | d^5 log n (later: d log n)  | d log n

Page 10: Some Recent Progress in Combinatorial Statistics

Reconstructing General Networks - New Results

Theorem (Bresler-M-Sly-08; lower bound on sample complexity):
• In order to recover G of max degree d, one needs at least c d log n samples, for some constant c.
• Proof follows by "counting the number of networks"; information-theoretic lower bounds.
• More formally: given any prior distribution which is uniform over degree-d graphs (no restrictions on the potentials), in order to recover the correct graph with probability ≥ 2^-n one needs at least c d log n samples.

Theorem (Bresler-M-Sly-08; asymptotically optimal algorithm):
• If the distribution is "non-degenerate", c d log n samples suffice to reconstruct the model with probability ≥ 1 - 1/n^100, for some (other) constant c.
• Running time is n^O(d).

• (Sampling complexity is tight up to a constant factor; optimal running time: unknown.)
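The counting argument can be sketched in one line (our own back-of-the-envelope version, not the paper's proof): there are roughly n^Θ(dn) graphs of max degree d, while k samples over alphabet Σ reveal at most kn log|Σ| bits, so

```latex
k \, n \log|\Sigma| \;\ge\; \log\bigl(\#\text{graphs of max degree } d\bigr)
\;\ge\; c_1 \, d \, n \log n
\quad\Longrightarrow\quad
k \;\ge\; \frac{c_1}{\log|\Sigma|} \, d \log n .
```

which matches the c d log n lower bound up to the constant.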

Page 11: Some Recent Progress in Combinatorial Statistics

Intuition Behind Algorithms

• Observation: knowing the graph is the same as knowing the neighborhoods.
• But a neighborhood is determined by the Markov property.
• Same intuition behind the work of Abbeel et al.

"Algorithm":

Step 1. Compute empirical probabilities, which are concentrated around the true probabilities.

Step 2. For each node, simply test the Markov property of each candidate neighborhood.

Main challenge: show non-degeneracy ⇒ the algorithm works.
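The two steps above can be sketched in Python. The entropy-based test of the Markov property is our own simplification (the actual algorithm tests empirical conditional probabilities directly), and the chain-shaped demo data is hypothetical:

```python
import random
from collections import Counter
from itertools import combinations
from math import log

def cond_entropy(samples, v, S):
    """Empirical conditional entropy H(sigma_v | sigma_S), in nats."""
    joint, marg = Counter(), Counter()
    for s in samples:
        key = tuple(s[u] for u in S)
        joint[(key, s[v])] += 1
        marg[key] += 1
    n = len(samples)
    return -sum(c / n * log(c / marg[key]) for (key, _), c in joint.items())

def estimate_neighborhood(samples, v, nodes, d, tol=0.01):
    """Smallest candidate set S (|S| <= d) that explains sigma_v essentially
    as well as any candidate does: a stand-in for 'test the Markov property
    of each candidate neighborhood'."""
    others = [u for u in nodes if u != v]
    scores = {S: cond_entropy(samples, v, S)
              for k in range(d + 1) for S in combinations(others, k)}
    best = min(scores.values())
    for k in range(d + 1):
        for S in combinations(others, k):
            if scores[S] <= best + tol:
                return set(S)

# Demo on synthetic data from a 3-node chain 0 - 1 - 2 (hypothetical model).
rng = random.Random(0)
samples = []
for _ in range(5000):
    s0 = rng.randint(0, 1)
    s1 = s0 if rng.random() < 0.9 else 1 - s0   # edge (0, 1)
    s2 = s1 if rng.random() < 0.9 else 1 - s1   # edge (1, 2)
    samples.append((s0, s1, s2))
nbhd = estimate_neighborhood(samples, 2, [0, 1, 2], d=2)   # should be {1}
```

The exhaustive loop over candidate sets of size up to d is where the n^O(d) running time comes from.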

Page 12: Some Recent Progress in Combinatorial Statistics

Example 2: Reconstruction of MRF with Hidden nodes

[Figure: the four-node example graph with clique potentials on the edges (1, 2), (2, 3), (3, 4), (1, 3).]

• In many applications only some of the nodes can be observed:

• visible nodes W ⊆ V;

• the Markov random field over the visible nodes is σ_W = (σ_w : w ∈ W).

• Is reconstruction still possible?

• What does "reconstruction" even mean?

Page 13: Some Recent Progress in Combinatorial Statistics

Reconstruction versus distinguishing

• Easy to construct many models that lead to the same distribution (statistically unidentifiable).

• Assuming "this is not a problem", are there computational obstacles to reconstruction?

• In particular: how hard is it to distinguish statistically different models?

Page 14: Some Recent Progress in Combinatorial Statistics

Distinguishing problems

• Let M1, M2 be two models with hidden nodes.

• Problem 1: Can you tell if M1 and M2 are statistically close or far apart (on the visible nodes)?

• Problem 2: Assuming M1 and M2 are statistically far apart, and given access to samples from one of them, can you tell where the samples came from?

Page 15: Some Recent Progress in Combinatorial Statistics

Hardness result with hidden nodes

• In Bogdanov-M-Vadhan-08: Problems 1 and 2 are intractable (in the worst case) unless NP = RP.

• Conversely, if NP = RP then distinguishing (and other forms of reconstruction) are achievable.

Page 16: Some Recent Progress in Combinatorial Statistics

A possible objection

• The “hard” models M1, M2 describe distributions that are not efficiently samplable

• But if nature is efficient, we never need to worry about such distributions!

Page 17: Some Recent Progress in Combinatorial Statistics

Distinguishing problem for samplable distributions

• Problem 3: If M1 and M2 are statistically far apart, and given access to samples from one of them, can you tell where the samples came from, assuming M1 and M2 are efficiently samplable?

• Theorem: Problem 3 is intractable unless computational zero knowledge is trivial.

– We don't know if this is tight.

Page 18: Some Recent Progress in Combinatorial Statistics

Example 3: Consensus Ranking and the Mallows Model

Problem: Given a set of rankings {π_1, π_2, ..., π_N}, find the consensus ranking (or central ranking)

π* = argmin_π Σ_i d(π, π_i),

for d a distance on the set of permutations of n objects. Most natural is d_K, the Kendall distance.
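The objective above can be made concrete with a short Python sketch: the O(n^2) pairwise definition of the Kendall distance, plus a brute-force minimizer that is exponential in n and only meant to illustrate the consensus objective (the efficient algorithms come later):

```python
from itertools import combinations, permutations

def kendall_distance(p, q):
    """d_K(p, q): number of pairs of items that the two rankings order
    differently (O(n^2) pairwise version, fine for illustration)."""
    pos_p = {x: i for i, x in enumerate(p)}
    pos_q = {x: i for i, x in enumerate(q)}
    return sum(1 for a, b in combinations(list(p), 2)
               if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0)

def consensus(rankings, items):
    """Brute-force central ranking: argmin over pi of sum_i d_K(pi, pi_i)."""
    return min(permutations(items),
               key=lambda pi: sum(kendall_distance(pi, r) for r in rankings))
```

For example, `consensus([[1, 2, 3], [1, 2, 3], [2, 1, 3]], [1, 2, 3])` returns `(1, 2, 3)`, with total Kendall distance 1 to the three input rankings.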

Page 19: Some Recent Progress in Combinatorial Statistics

The Mallows Model

Exponential family model in β:

P(π | π_0) = Z(β)^-1 exp(-β d_K(π, π_0))

β = 0: uniform distribution over permutations.

β > 0: π_0 is the unique mode of P_{β, π_0}.

Theorem [Meila, Phadnis, Patterson, Bilmes '07]:

Consensus ranking (i.e. ML estimation of π_0 for constant β) can be solved exactly by a branch and bound (B&B) algorithm.

The B&B algorithm can take (super-)exponential time in some cases, but seems to perform well on simulated data.
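For small n the model can be written out exactly, which makes its two basic properties above easy to check; this sketch (our own illustration) computes Z(β) by brute-force enumeration rather than using the known product formula:

```python
from itertools import combinations, permutations
from math import exp

def d_K(p, q):
    """Kendall distance between two permutations (tuples of items)."""
    pos_q = {x: i for i, x in enumerate(q)}
    return sum(1 for i, j in combinations(range(len(p)), 2)
               if pos_q[p[i]] > pos_q[p[j]])

def mallows(pi0, beta):
    """The full Mallows distribution P(pi | pi0) = Z(beta)^-1 exp(-beta d_K),
    with Z(beta) computed by brute-force enumeration (fine for small n)."""
    perms = list(permutations(pi0))
    w = [exp(-beta * d_K(p, pi0)) for p in perms]
    Z = sum(w)
    return {p: wi / Z for p, wi in zip(perms, w)}
```

With `beta = 0` every permutation gets mass 1/n!, and for `beta > 0` the unique mode is `pi0`, matching the two bullet points above.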

Page 20: Some Recent Progress in Combinatorial Statistics

Related work

ML estimation: [Fligner & Verducci '86] introduce the generalized Mallows model; [Fligner & Verducci '88] (FV) give a heuristic for π_0 estimation.

Consensus ranking: [Cohen, Schapire, Singer '99] greedy algorithm (CSS), + improvement by finding strongly connected components, + missing data (not all π_i rank all n items).

[Ailon, Newman, Charikar '05] randomized algorithm with a guaranteed 11/7-factor approximation (ANC).

[Mathieu '07] (1+ε)-approximation, time O(n^6/ε + 2^(2^O(1/ε))).

[Davenport, Kalagnanam '03] heuristics based on edge-disjoint cycles; [Conitzer, D, K '05] exact algorithm based on integer programming, better bounds.

Page 21: Some Recent Progress in Combinatorial Statistics

Efficient Sorting of the Mallows Model

• [Braverman-Mossel-09]: Given r independent samples from the Mallows model, find the ML solution exactly(!) in time n^b, where

• b = 1 + O((β r)^-1), where r is the number of samples,

• with high probability (say ≥ 1 - n^-100).

Page 22: Some Recent Progress in Combinatorial Statistics

Sorting the Mallows Model

• [Braverman-M-09]: The optimal order can be found in polynomial time and O(n log n) queries.

• Proof ingredient 1: "statistical properties" of the generated permutations π_i in terms of the original order π_0. With high probability:

Σ_x |π_i(x) - π_0(x)| = O(n),

max_x |π_i(x) - π_0(x)| = O(log n).

• Additional ingredient: a dynamic programming algorithm to find the optimum given a starting point where each element is at most k away, with running time O(n 2^(6k)).
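The bounded-displacement structure can be exploited even by a much simpler routine than the paper's dynamic program: if every element sits at most k positions from its sorted place, a sliding min-heap of size k+1 sorts in O(n log k). This is a standard stand-in to illustrate the structure, not the O(n·2^(6k)) DP itself:

```python
import heapq

def sort_nearly_sorted(a, k):
    """Sort a list in which every element is at most k positions away from
    its sorted position, using a sliding (k+1)-element min-heap."""
    heap = list(a[:k + 1])
    heapq.heapify(heap)
    out = []
    for x in a[k + 1:]:
        # The global minimum of the unsorted remainder must lie in the
        # current window, so the heap's minimum is the next output element.
        out.append(heapq.heappop(heap))
        heapq.heappush(heap, x)
    while heap:
        out.append(heapq.heappop(heap))
    return out
```

For example, `sort_nearly_sorted([2, 1, 4, 3, 6, 5], 1)` returns `[1, 2, 3, 4, 5, 6]`. With the Mallows model, k = O(log n) displacement with high probability is what keeps such window-based procedures polynomial.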

Page 23: Some Recent Progress in Combinatorial Statistics

Example 4: MCMC Diagnostics

• Sampling from a joint probability distribution π on Ω.
• Construct a Markov chain X_0, X_1, … which converges to π.

Applications:
• Multidimensional integration.
• Bayesian inference (sampling from "posterior distributions" of the model).
• Computational physics.
• Computational biology.
• Vision, image segmentation.
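A minimal textbook example of such a chain (our own generic sketch, not a specific sampler from the talk) is the Metropolis chain on {0,1}^n: propose a single-bit flip and accept it with probability min(1, π(y)/π(x)), which needs π only up to normalization:

```python
import random

def metropolis(pi, n_bits, steps, seed=0):
    """Generic single-bit-flip Metropolis chain on {0,1}^n targeting pi,
    where pi is an unnormalized weight function. Returns the final state."""
    rng = random.Random(seed)
    x = tuple(rng.randint(0, 1) for _ in range(n_bits))
    for _ in range(steps):
        i = rng.randrange(n_bits)
        y = x[:i] + (1 - x[i],) + x[i + 1:]     # flip one coordinate
        if rng.random() < min(1.0, pi(y) / pi(x)):
            x = y
    return x
```

The catch, and the subject of the next slides, is choosing `steps`: nothing in the code tells you when the chain's distribution is actually close to π.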

Page 24: Some Recent Progress in Combinatorial Statistics

Convergence Diagnostics

• A method to determine t such that P^t(X_0, ·) is "sufficiently close" to π. - MCMC in Practice, Gilks, Richardson & Spiegelhalter

Are we there yet?
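For a chain small enough to write down explicitly, the quantity the diagnostic asks about can be computed exactly. This brute-force sketch (our own illustration, infeasible for the exponential state spaces the hardness results concern) finds the smallest t with d_TV(P^t(X_0, ·), π) < 1/4:

```python
def tv_distance(p, q):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, x0, eps=0.25):
    """Smallest t with d_TV(P^t(x0, .), pi) < eps, for an explicit transition
    matrix P (list of rows). pi is found by power iteration, so the chain is
    assumed ergodic; brute force, for toy chains only."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(10000):   # iterate to (numerical) stationarity
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    dist = [1.0 if j == x0 else 0.0 for j in range(n)]
    t = 0
    while tv_distance(dist, pi) >= eps:
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        t += 1
    return t
```

For the lazy two-state walk `[[0.9, 0.1], [0.1, 0.9]]` started at state 0, the distance is 0.5·0.8^t, so `mixing_time` returns 4. The hardness results below say that no efficient procedure can do this, even approximately, when the state space is {0,1}^n.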

Page 25: Some Recent Progress in Combinatorial Statistics

Convergence Diagnostics in the Literature

Visual inspection of functions of samples. - MCMC in Practice, Gilks, Richardson & Spiegelhalter

[Raftery-Lewis] Bounding the variance of estimates of quantiles of functions of parameters.

[Gelman-Rubin] Convergence achieved if the separate empirical distributions are approximately the same as the combined distribution.

[Cowles-Carlin]
• Survey of 13 diagnostics designed to identify specific types of convergence failure.
• Recommend combinations of strategies: running parallel chains, computing autocorrelations.

Page 26: Some Recent Progress in Combinatorial Statistics

Diagnostic Algorithm

Diagnostic algorithm A:

Input: a Markov chain P on Ω ⊆ {0,1}^n and a time t such that either
• τ(1/4) < t, or
• τ(1/4) > 100t.

Output:
• "Yes" if τ(1/4) < t;
• "No" if τ(1/4) > 100t.

Page 27: Some Recent Progress in Combinatorial Statistics

Results - The General Case

Theorem [Bhatnagar-Bogdanov-M-09]: Given

1. a Markov chain P on Ω ⊆ {0,1}^n,
2. a time t such that either τ(1/4) < t or τ(1/4) > 100t,

it is a PSPACE-complete problem to distinguish the two.

Page 28: Some Recent Progress in Combinatorial Statistics

PSPACE complete problems

[Crasmaru-Tromp] Ladder problems in Go: does black have a winning strategy?

PSPACE-hard problems are "much harder" than NP-hard problems.

Page 29: Some Recent Progress in Combinatorial Statistics

Results - Polynomial Mixing

Theorem 2 [Bhatnagar-Bogdanov-M-09]: Given

1. a Markov chain P on Ω ⊆ {0,1}^n with a promise that τ(1/4) < poly(n),
2. a time t such that either τ(1/4) < t or τ(1/4) > 100t,

it is as hard as any coNP problem to distinguish the two.

[Figure: two 3-CNF formulas, e.g. (x0+x3+x7)∧(x2+x7+xn)∧…∧(…), one mapped to a chain with τ(1/4) < t and one to a chain with τ(1/4) > 100t.]

Page 30: Some Recent Progress in Combinatorial Statistics

Results - Mixing from a Given State

Theorem 3 [Bhatnagar-Bogdanov-M-09]: Given

1. a Markov chain P on Ω ⊆ {0,1}^n such that τ(1/4) < poly(n),
2. a starting state X_0,
3. a time t such that either τ_X0(1/4) < t or τ_X0(1/4) > 100t,

a diagnostic algorithm can be used to distinguish whether two efficiently samplable distributions are close or far in statistical distance - a "Statistical Zero Knowledge"-complete problem [Sahai-Vadhan].

[Figure: two circuits C1 and C2 mapping random bits r_1 … r_n and r'_1 … r'_n to outputs a_1 … a_m.]

Page 31: Some Recent Progress in Combinatorial Statistics

Comb. Stat. vs. Common ML Approaches

• Some features of Combinatorial Statistics in comparison to ML:

• +: Uses explicit, well-defined models.

• +: Explicit non-asymptotic running time / sampling bounds.

• -: Often not "statistically efficient".

• -: Inputs must be generated from the correct distribution or close to it (but often the problem is NP-hard otherwise, so this is true of all methods).

• +/-: Need to understand the combinatorial structure of each new problem, vs. applying general statistical techniques where the structure is hidden in programming/design, etc.

Page 32: Some Recent Progress in Combinatorial Statistics

Conclusions and Questions

• [Cowles-Carlin] Survey of 13 diagnostics designed to identify specific types of convergence failure. Each can fail to detect the type of convergence it was designed for.

• [Arora-Barak] Efficient algorithms are not believed to exist for PSPACE-complete, coNP-complete or SZK-complete problems.

• Diagnostic algorithms do not exist for large classes of MCMC algorithms, like Gibbs samplers, unless there are efficient algorithms for PSPACE, coNP, or SZK.

• For what classes of samplers are convergence diagnostics possible?

• Hardness of testing convergence for more restricted classes of samplers.

Page 33: Some Recent Progress in Combinatorial Statistics

Collaborators

Salil Vadhan, Harvard

Andrej Bogdanov, Hong Kong

Guy Bresler, Berkeley

Allan Sly, Berkeley

Nayantara Bhatnagar, Berkeley

Mark Braverman, Microsoft

Page 34: Some Recent Progress in Combinatorial Statistics

Thanks!!