parallel gibbs sampling from colored fields to thin junction trees

59
Carnegie Mellon Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees Yuchen g Low Arthur Gretton Carlos Guestr in Joseph Gonzale z

Upload: kina

Post on 23-Feb-2016

26 views

Category:

Documents


0 download

DESCRIPTION

Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees. Joseph Gonzalez. Yucheng Low. Arthur Gretton. Carlos Guestrin. Sampling as an Inference Procedure. Suppose we wanted to know the probability that coin lands “heads” We use the same idea for graphical model inference. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Parallel Gibbs SamplingFrom Colored Fields to Thin Junction Trees

Yucheng Low

Arthur Gretton

Carlos Guestrin

Joseph Gonzalez

Page 2: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

2

Inference:Inference:Graphical

Model

Sampling as an Inference ProcedureSuppose we wanted to know the probability that coin lands “heads”

We use the same idea for graphical model inference

4x Heads“Draw Samples”

Counts

6x Tails

X1

X2

X3

X4

X5

X6

Page 3: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Terminology: Graphical ModelsFocus on discrete factorized models with sparse structure:

X1 X2

X5

X3 X4

f1,2

f1,3 f2,4

f3,4

f2,4,5

X1 X2

X5

X3 X4

FactorGraph

MarkovRandom

Field

Page 4: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Terminology: ErgodicityThe goal is to estimate:

Example: marginal estimation

If the sampler is ergodic the following is true*:

*Consult your statistician about potential risks before using.

Page 5: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

5

Gibbs Sampling [Geman & Geman, 1984]

Sequentially for each variable in the modelSelect variableConstruct conditional given adjacent assignments Flip coin and updateassignment to variable

Initi

al A

ssig

nmen

t

Page 6: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Why Study Parallel Gibbs Sampling?

“The Gibbs sampler ... might be considered the workhorse of the MCMC world.”

–Robert and Casella

Ergodic with geometric convergenceGreat for high-dimensional models

No need to tune a joint proposal

Easy to construct algorithmicallyWinBUGS

Important Properties that help Parallelization:Sparse structure factorized computation

Page 7: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Is the Gibbs Sampler trivially parallel?

Page 8: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

8

From the original paper on Gibbs Sampling:

“…the MRF can be divided into collections of [variables] with each collection assigned to an independently running asynchronous processor.”

Converges to the wrong distribution!

-- Stuart and Donald Geman, 1984.

Page 9: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

9

The problem with Synchronous Gibbs

Adjacent variables cannot be sampled simultaneously.

Strong PositiveCorrelation

t=0

Parallel Execution

t=2 t=3

Strong PositiveCorrelation

t=1

Sequential

Execution

Strong NegativeCorrelation

Page 10: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

How has the machine learning community solved

this problem?

Page 11: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Two Decades later

Same problem as the original Geman paperParallel version of the sampler is not ergodic.

Unlike Geman, the recent work:Recognizes the issueIgnores the issue Propose an “approximate” solution

1. Newman et al., Scalable Parallel Topic Models. Jnl. Intelligen. Comm. R&D, 2006.2. Newman et al., Distributed Inference for Latent Dirichlet Allocation. NIPS, 2007.3. Asuncion et al., Asynchronous Distributed Learning of Topic Models. NIPS, 2008.4. Doshi-Velez et al., Large Scale Nonparametric Bayesian Inference: Data

Parallelization in the Indian Buffet Process. NIPS 20095. Yan et al., Parallel Inference for Latent Dirichlet Allocation on GPUs. NIPS, 2009.

Page 12: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Two Decades AgoParallel computing community studied:

Construct an Equivalent Parallel Algorithm

Tim

e

Sequential Algorithm Directed Acyclic Dependency Graph

UsingGraph Coloring

Page 13: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

13

Time

Chromatic Sampler

Compute a k-coloring of the graphical modelSample all variables with same color in parallel

Sequential Consistency:

Page 14: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Chromatic Sampler Algorithm

For t from 1 to T do

For k from 1 to K do

Parfor i in color k:

Page 15: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Asymptotic Properties

Quantifiable acceleration in mixing

Speedup:

Time to updateall variables once

# Variables# Colors# Processors

Penalty Term

Page 16: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Proof of ErgodicityVersion 1 (Sequential Consistency):

Chromatic Gibbs Sampler is equivalent to a Sequential Scan Gibbs Sampler

Version 2 (Probabilistic Interpretation):Variables in same color are Conditionally Independent

Joint Sample is equivalent to Parallel Independent Samples

Time

Page 17: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Special Properties of 2-Colorable Models

Many common models have two colorings

For the [Incorrect] Synchronous Gibbs SamplersProvide a method to correct the chainsDerive the stationary distribution

Page 18: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

t=2 t=3 t=4t=1

Correcting the Synchronous Gibbs Sampler

We can derive two valid chains:

Strong PositiveCorrelation

t=0

Invalid Sequence

t=0 t=1 t=2 t=3 t=4 t=5

18

Page 19: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

t=2 t=3 t=4t=1

We can derive two valid chains:

Strong PositiveCorrelation

t=0

Invalid Sequence

Chain 1

Chain 2

19

Converges to the Correct Distribution

Correcting the Synchronous Gibbs Sampler

Page 20: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

20

Theoretical Contributions on 2-colorable models

Stationary distribution of Synchronous Gibbs:

Variables in Color 1

Variables in Color 2

Page 21: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

21

Theoretical Contributions on 2-colorable models

Stationary distribution of Synchronous Gibbs

Corollary: Synchronous Gibbs sampler is correct for single variable marginals.

Variables in Color 1

Variables in Color 2

Page 22: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

From Colored Fields to Thin Junction Trees

Chromatic Gibbs Sampler

Ideal for:Rapid mixing modelsConditional structure does not admit Splash

Splash Gibbs Sampler

Ideal for:Slowly mixing modelsConditional structure admits Splash

Discrete models

Slowly Mixing Models

?

Page 23: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

23

Models With Strong Dependencies

Single variable Gibbs updates tend to mix slowly:

Ideally we would like to draw joint samples.Blocking

X1

X2

Single site changes move slowly with strong correlation.

Page 24: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Blocking Gibbs SamplerBased on the papers:1. Jensen et al., Blocking Gibbs Sampling for Linkage Analysis in Large

Pedigrees with Many Loops. TR 19962. Hamze et al., From Fields to Trees. UAI 2004.

Page 25: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Carnegie Mellon

An asynchronous Gibbs Sampler that adaptively addresses strong dependencies.

Splash Gibbs Sampler

25

Page 26: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

26

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel:

ConditionallyIndependent

Page 27: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

27

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel:

ConditionallyIndependent

Tree-width = 1

Page 28: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

28

Splash Gibbs Sampler

Step 1: Grow multiple Splashes in parallel:

ConditionallyIndependent

Tree-width = 2

Page 29: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

29

Splash Gibbs Sampler

Step 2: Calibrate the trees in parallel

Page 30: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

30

Splash Gibbs Sampler

Step 3: Sample trees in parallel

Page 31: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

31

Higher Treewidth Splashes

Recall:

Tree-width = 2

Junction Trees

Page 32: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Junction TreesData structure used for exact inference in loopy graphical models

A BfAB

D CfCD

fBCfAD

A BfAB

fADD

B C DfBC

fCD

E

fDE fCEC D E

fDE

fCE

Tree-width = 2

Page 33: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash Thin Junction TreeParallel Splash Junction Tree Algorithm

Construct multiple conditionally independent thin (bounded treewidth) junction trees Splashes

Sequential junction tree extension

Calibrate the each thin junction tree in parallelParallel belief propagation

Exact backward samplingParallel exact sampling

Page 34: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationFrontier extension algorithm:

A

Markov Random Field Corresponding Junction tree

A

Page 35: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationFrontier extension algorithm:

A B

Markov Random Field Corresponding Junction tree

A

B

Page 36: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

A

B CB

A BC

Page 37: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

A

BC

D

B C DA B D

Page 38: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

A

BC

D

B C DA B D

E

A D E

Page 39: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

A

BC

D

B C DA B D

E

A D EF

A E F

Page 40: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

A

BC

D

E F

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

B C DA B D

A D E A E FG

A G

Page 41: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

H

GA

BC

D

E F

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

B C DA B D

A D E A E F

A G

B G H

Page 42: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

H

GA

BC

D

E F

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

B C DA B D

A B D E A B E F

A B G

B G H

Page 43: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

A

BC

D

E F

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

B C DA B D

A D E A E FG

A G

H

Page 44: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

A

BC

D

E F

Splash generationFrontier extension algorithm:

Markov Random Field Corresponding Junction tree

B C DA B D

A D E A E FG

A G

H

I

D I

Page 45: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationChallenge:

Efficiently reject vertices that violate treewidth constraintEfficiently extend the junction treeChoosing the next vertex

Solution Splash Junction Trees:Variable elimination with reverse visit ordering

I,G,F,E,D,C,B,A

Add new clique and update RIPIf a clique is created which exceeds treewidth terminate extension

Adaptive prioritize boundary

A

BC

D

E F

G

H

I

Page 46: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Incremental Junction TreesFirst 3 Rounds:

4

{4}

2

5

1

4

3

6

Junction Tree:

Elim. Order:

4 4,5{5,4}

2

5

1

4

3

6

{2,5,4}

2

5

1

4

3

6

4 4,5 2,5

Page 47: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Incremental Junction TreesResult of third round:

Fourth round:

{1,2,5,4}

2

5

1

4

3

6

4

4,5

2,5 1,2,4

Fix RIP

4

4,5

2,4,5 1,2,4

{2,5,4}2

5

1

4

3

64 4,5 2,5

Page 48: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Incremental Junction TreesResults from 4th round:

5th Round:

{6,1,2,5,4} 2

5

1

4

3

6

4

4,5

2,4,5 1,2,4

5,6

{1,2,5,4}2

5

1

4

3

6

4

4,5

2,4,5 1,2,4

Page 49: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Incremental Junction TreesResults from 5th round:

6th Round:

{6,1,2,5,4} 2

5

1

4

3

6

4

4,5

2,4,5 1,2,4

5,6

{3,6,1,2,5,4}2

5

1

4

3

6

4

4,5

2,4,5 1,2,4

5,6

1,2,3, 6

Page 50: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Incremental Junction TreesFinishing 6th round:

{3,6,1,2,5,4}2

5

1

4

3

6

4

4,5

2,4,5 1,2,4

5,6

1,2,3, 6

4

4,5

2,4,5 1,2,4

1,2,3,6

1,2,5,6

4

4,5

2,4,5 1,2,4,5

1,2,3,6

1,2,5,6

4

4,5

2,4,5 1,2,4

1,2,3,6

1,2,5,6

Page 51: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Algorithm Block [Skip]Finishing 6th round:

Page 52: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Splash generationChallenge:

Efficiently reject vertices that violate treewidth constraintEfficiently extend the junction treeChoosing the next vertex

Solution Splash Junction Trees:Variable elimination with reverse visit ordering

I,G,F,E,D,C,B,A

Add new clique and update RIPIf a clique is created which exceeds treewidth terminate extension

Adaptive prioritize boundary

A

BC

D

E F

G

H

I

Page 53: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Adaptive Vertex PrioritiesAssign priorities to boundary vertices:

Can be computed using only factors that depend on Xv

Based on current sampleCaptures difference between marginalizing out the variable (in Splash) fixing its assignment (out of Splash)Exponential in treewidth

Could consider other metrics …

Page 54: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

54

Adaptively Prioritized Splashes

Adapt the shape of the Splash to span strongly coupled variables:

Provably converges to the correct distributionRequires vanishing adaptationIdentify a bug in the Levine & Casella seminal work in adaptive random scan

Noisy Image BFS Splashes Adaptive Splashes

Page 55: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

ExperimentsImplemented using GraphLab

Treewidth = 1 :Parallel tree construction, calibration, and samplingNo incremental junction trees needed

Treewidth > 1 : Sequential tree construction (use multiple Splashes)Parallel calibration and samplingRequires incremental junction trees

Relies heavily on:Edge consistency model to prove ergodicityFIFO/ Prioritized scheduling to construct Splashes

Evaluated on 32 core Nehalem Server

Page 56: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

56

Rapidly Mixing ModelGrid MRF with weak attractive potentials

40K Variables 80K Factors

The Chromatic sampler slightly outperforms the Splash Sampler

Likelihood Final Sample “Mixing” Speedup

Page 57: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

57

Slowly Mixing ModelMarkov logic network with strong dependencies

10K Variables 28K Factors

The Splash sampler outperforms the Chromatic sampler on models with strong dependencies

Likelihood Final Sample “Mixing”

Speedup in Sample Generation

Page 58: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

58

Conclusion

Chromatic Gibbs sampler for models with weak dependencies

Converges to the correct distributionQuantifiable improvement in mixing

Theoretical analysis of the Synchronous Gibbs sampler on 2-colorable models

Proved marginal convergence on 2-colorable models

Splash Gibbs sampler for models with strong dependencies

Adaptive asynchronous tree constructionExperimental evaluation demonstrates an improvement in mixing

Page 59: Parallel Gibbs Sampling From Colored Fields to Thin Junction Trees

Future WorkExtend Splash algorithm to models with continuous variables

Requires continuous junction trees (Kernel BP)

Consider “freezing” the junction tree setReduce the cost of tree generation?

Develop better adaptation heuristicsEliminate the need for vanishing adaptation?

Challenges of Gibbs sampling in high-coloring modelsCollapsed LDA

High dimensional pseudorandom numbers Not currently addressed in the MCMC literature