scalable map inference in bayesian networks based on a map-reduce approach (pgm2016)

Scalable MAP inference in Bayesiannetworks based on a Map-Reduce

approach

Darío Ramos-López1, Antonio Salmerón1, Rafael Rumí1,Ana M. Martínez2, Thomas D. Nielsen2, Andrés R. Masegosa3,

Helge Langseth3, Anders L. Madsen2,4

1Department of Mathematics, University of Almería, Spain2Department of Computer Science, Aalborg University, Denmark

3 Department of Computer and Information Science,The Norwegian University of Science and Technology, Norway

4 Hugin Expert A/S, Aalborg, Denmark

8th Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016 1

Outline

1 Motivation

2 MAP in CLG networks

3 Scalable MAP

4 Experimental results

5 Conclusions


Outline

1 Motivation


3 Scalable MAP


5 Conclusions


Motivation

I Aim: Provide scalable solutions to the MAP problem.

I Challenges:I Data coming in streams at high speed, and

a quick response is required.

I For each observation in the stream, themost likely configuration of a set ofvariables of interest is sought.

I MAP inference is highly complex.

I Hybrid models come along with specificdifficulties.


Context

I The AMiDST project: Analysis of MassIve Data STreamshttp://www.amidst.eu

I Large number of variablesI Queries to be answered in real timeI Hybrid Bayesian networks (involving discrete and continuous

variables)I Conditional linear Gaussian networks


Outline

1 Motivation


3 Scalable MAP


5 Conclusions


Querying a Bayesian network

I Belief update: Computing the posterior distribution of a variable:

p(xi |xE ) =

∑xD

∫xC

p(x , xE )dxC

∑xDi

∫xCi

p(x , xE )dxCi

I Maximum a posteriori (MAP): For a set of target variables X I , seek

x∗I = argmaxx I

p(x I |XE = xE )

where p(x I |XE = xE ) is obtained by first marginalizing out fromp(x) the variables not in X I and not in XE

I Most probable explanation (MPE): A particular case of MAP whereX I includes all the unobserved variables


Outline

1 Motivation


3 Scalable MAP


5 Conclusions


MAP by Hill Climbing

HC_MAP(B ,x I ,xE ,r)while stopping criterion not satisfied do

x∗I ← GenerateConfiguration(B , x I , xE , r)

if p(x∗I , xE ) ≥ p(x I , xE ) then

x I ← x∗I

endendreturn x I

I Max. number of non-improving iterationsI Target prob. thresholdI Max. number of iterations

I Never move to a worse configuration

I Estimated using importance sampling:

p(x I , xE ) =∑

x∗∈ΩX∗

p(x I , xE , x∗) =∑

x∗∈ΩX∗

p(x I , xE , x∗)f ∗(x∗)

f ∗(x∗)

= Ef ∗

[p(x I , xE , x∗)

f ∗(x∗)

]≈ 1

n

n∑i=1

p(x I , xE , x∗(i)

)f ∗(x∗(i)

) ,

I We’ll see later


MAP by Simulated Annealing

SA_MAP(B ,x I ,xE ,r)T ← 1 000; α← 0.90; ε > 0while T ≥ ε do


Simulate a random number τ ∼ U(0, 1)if p(x∗

I , xE ) > p(x I , xE )/(T · ln(1/τ)) thenx I ← x∗

I

endT ← α · T

endreturn x I

I Default values of the temperature parametersI T 1: almost completely randomI T 1: almost completely greedy

I Accept x∗I if its prob. increases

or decreases < T · ln(1/τ)I Cool down the temperature


MAP by Simulated Annealing

SA_MAP(B ,x I ,xE ,r)T ← 1 000; α← 0.90; ε > 0while T ≥ ε do


Simulate a random number τ ∼ U(0, 1)if p(x∗

I , xE ) > p(x I , xE )/(T · ln(1/τ)) thenx I ← x∗

I

endT ← α · T

endreturn x I

I Default values of the temperature parametersI T 1: almost completely randomI T 1: almost completely greedy

I Accept x∗I if its prob. increases

or decreases < T · ln(1/τ)

I Cool down the temperature


Generating a new configuration of variables

Only discrete variablesI The new values are chosen at random

A variable whose parents are discrete or observed

Y

W

TU

S

P(Y ) = (0.5, 0.5)P(S) = (0.1, 0.9)f (w |Y = 0) = N (w ;−1, 1)f (w |Y = 1) = N (w ; 2, 1)f (t|w ,S = 0) = N (t;−w , 1)f (t|w ,S = 1) = N (t;w , 1)f (u|w) = N (u;w , 1)



Hybrid modelsI We take advantage of the properties of the CLG distribution


Y

W

TU

S

P(Y ) = (0.5, 0.5)P(S) = (0.1, 0.9)f (w |Y = 0) = N (w ;−1, 1)f (w |Y = 1) = N (w ; 2, 1)f (t|w ,S = 0) = N (t;−w , 1)f (t|w ,S = 1) = N (t;w , 1)f (u|w) = N (u;w , 1)





Y

W

TU

S

Return its conditional mean:

f (w |Y = 0) = N (w ;−1, 1)f (w |Y = 1) = N (w ; 2, 1)

A variable with unobserved continuous parents

Y

W

TU

S

Simulate a value usingits conditional distribution:

f (t|w ,S = 0) = N (t;−w , 1)f (t|w ,S = 1) = N (t;w , 1)




A variable with unobserved continuous parents

Y

W

TU

S

Simulate a value usingits conditional distribution:

f (t|w ,S = 0) = N (t;−w , 1)f (t|w ,S = 1) = N (t;w , 1)


Scalable implementation


Outline

1 Motivation


3 Scalable MAP


5 Conclusions


Experimental analysis

PurposeAnalyze the scalability in terms of

I SpeedI Accuracy

Experimental setupI Synthetic networks with 200 variables (50% discrete)I 70% of the variables observed at randomI 10% of the variables selected as target ⇒ 20% to be

marginalized out


Experimental analysis

Computing environmentI AMIDST Toolbox with Apache FlinkI Multi-core environment based on a dual-processor AMD

Opteron 2.8 GHz server with 32 cores and 64 GB of RAM,running Ubuntu Linux 14.04.1 LTS

I Multi-node environment based on Amazon Web Services(AWS)


Scalability: run times

100

200

300

400

500

600

1 2 4 8 16 24 32Number of cores

Exe

cutio

n tim

e (

s)

method HC Global HC Local SA Global SA Local

Scalability of MAP in a multi−core node

5

10

15

Sp

ee

d-u

p fa

ctor 5

10

15

20

1 2 4 8 16 24 32Number of nodes (4 vCPUs per node)

Sp

ee

d−

up fact

or


Scalability of MAP in a multi−node cluster

25

50

75

100

125

1 2 4 8 16 24 32Number of nodes (4 vCPUs per node)

Exe

cutio

n t

ime

(s)


Scalability of MAP in a multi−node cluster

5

10

15

20

Sp

ee

d-u

p fa

ctor


Scalability: accuracy (Simulated Annealing)

−230

−220

−210

−200

−190

1 2 4 8 16 32Number of cores

log

Pro

ba

bility

cores

1

2

4

8

16

32

Simulated Annealing global

−200

−195

−190

−185


log

Pro

ba

bility

cores

1

2

4

8

16

32

Simulated Annealing local

Estimated log-probabilities of the MAP configurations found by each algorithm


Scalability: accuracy (Hill Climbing)

−240

−220

−200


log

Pro

ba

bility

cores

1

2

4

8

16

32

Hill Climbing global

−240

−220

−200


log

Pro

ba

bility

cores

1

2

4

8

16

32

Hill Climbing local

Estimated log-probabilities of the MAP configurations found by each algorithm


Outline

1 Motivation


3 Scalable MAP


5 Conclusions


Conclusions

I Scalable MAP for CLG models in terms of accuracy and runtime

I Available in the AMIDST ToolboxI Valid for multi-cores and cluster systemsI MapReduce-based design on top of Apache Flink


Thank you for your attentionYou can download our open source Java toolbox:

http://www.amidsttoolbox.com

Acknowledgments: This project has received funding from the European Union’sSeventh Framework Programme for research, technological development and

demonstration under grant agreement no 619209


http://www.amidsttoolbox.com