
Page 1: Locally Averaged Bayesian Dirichlet Metrics

Locally averaged Bayesian Dirichlet metrics

A. Cano, M. Gomez-Olmedo, A. R. Masegosa and S. Moral

Department of Computer Science and Artificial Intelligence

University of Granada (Spain)

Belfast, July 2011

European Conference on Symbolic and Quantitative Approaches to Reasoning under Uncertainty


Page 2: Locally Averaged Bayesian Dirichlet Metrics

Outline

1 Introduction

2 Bayesian Dirichlet Metrics

3 Locally Averaged Bayesian Dirichlet Metrics

4 Experimental Evaluation

5 Conclusions & Future Work

Page 3: Locally Averaged Bayesian Dirichlet Metrics

Introduction

Part I

Introduction


Page 4: Locally Averaged Bayesian Dirichlet Metrics

Introduction

Bayesian Networks

Excellent models for graphically representing the dependency structure of the underlying distribution in multivariate domains.

This dependency structure in a multivariate problem domain represents a very relevant source of knowledge (direct interactions, conditional independencies, ...).


Page 5: Locally Averaged Bayesian Dirichlet Metrics

Introduction

Learning Bayesian Networks from Data

Learning Algorithms

Constraint-based learning, based on hypothesis tests, such as the PC algorithm.

Score+Search methods, which employ a search algorithm guided by a score function.

The model with the highest score is selected.


Page 6: Locally Averaged Bayesian Dirichlet Metrics

Introduction

Bayesian Score Metrics

Marginal Likelihood of the data

$$P(D \mid G) = \int P(D \mid \theta, G)\, P(\theta \mid G)\, d\theta$$

Bayesian Dirichlet Equivalent Metric (BDe)

It satisfies the likelihood equivalence property.

A global Dirichlet distribution is assumed in order to guarantee the likelihood equivalence property.

The parametrization depends on the equivalent sample size (ESS) parameter.

$$\mathrm{score}(G : D) = \prod_{i} \prod_{j=1}^{|U_i|} \frac{\Gamma\!\left(\frac{ESS}{|U_i|}\right)}{\Gamma\!\left(\frac{ESS}{|U_i|} + N_{ij}\right)} \prod_{k=1}^{|X_i|} \frac{\Gamma\!\left(\frac{ESS}{|U_i|\,|X_i|} + N_{ijk}\right)}{\Gamma\!\left(\frac{ESS}{|U_i|\,|X_i|}\right)}$$

where $|U_i|$ is the number of configurations of the parents of $X_i$.


Page 8: Locally Averaged Bayesian Dirichlet Metrics

Introduction

Sensitivity to ESS parameter

Experimental Evaluations [Silander et al.2007]

The global MAP BN was computed with an exhaustive-search-based algorithm for 20 UCI data sets.

They found that different ESS values lead to different optimal BN models.

For some data sets (e.g., the Yeast database), the optimal BN model monotonically goes from the empty graph to the fully connected graph as the ESS value increases.

[Figure: number of arcs in the optimal BN vs. ESS value]


Page 9: Locally Averaged Bayesian Dirichlet Metrics

Introduction

Our approach

Solution: Marginalizing the ESS parameter

As first suggested in [Silander et al. 2007], a possible solution is to employ a Bayesian approach:

Assume a prior distribution on the ESS parameter and marginalize it out.

Locally Averaged Bayesian Dirichlet Metrics

It is based on a local averaging approach to marginalize the ESS parameter.

We experimentally justify that this approach is superior:

It is able to adapt to more complex parameter spaces.

This approach removes the sensitivity of the Bayesian Dirichlet metric to the ESS parameter.



Page 11: Locally Averaged Bayesian Dirichlet Metrics

Bayesian Dirichlet Metrics

Part II

Bayesian Dirichlet Metrics


Page 12: Locally Averaged Bayesian Dirichlet Metrics

Bayesian Dirichlet Metrics

Notation

Let X = (X1, ..., Xn) be a set of n multinomial random variables.

|Xi| is the number of values of Xi.

We also assume a fully observed multinomial data set D.

A Bayesian network B can be described by:

G is a directed acyclic graph:

G = (Pa(X1), ..., Pa(Xn)).

θG is a set of parameter vectors:

P(Xi | Pa(Xi) = j) = θij.

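To make the notation concrete, the following is a minimal sketch (not part of the slides) of a discrete Bayesian network as a data structure; the class and field names are illustrative only.

```python
# Minimal sketch (not from the slides): the notation above as a data structure.
# A discrete BN is a parent-set list (the DAG G) plus one conditional
# probability table per variable (the parameter vectors theta_ij).
from dataclasses import dataclass

import numpy as np


@dataclass
class DiscreteBN:
    parents: list[list[int]]   # parents[i] = indices of Pa(X_i) in G
    cpts: list[np.ndarray]     # cpts[i][j, k] = P(X_i = k | Pa(X_i) = j)


# Example: X1 -> X2 (indices 0 and 1), both binary; each CPT row sums to one.
bn = DiscreteBN(
    parents=[[], [0]],
    cpts=[np.array([[0.7, 0.3]]),
          np.array([[0.9, 0.1],
                    [0.2, 0.8]])],
)
```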

Page 13: Locally Averaged Bayesian Dirichlet Metrics

Bayesian Dirichlet Metrics

Bayesian Dirichlet equivalent metric

Marginal likelihood of a graph structure:

$$P(D \mid G) = \int P(D \mid \theta, G)\, P(\theta \mid G)\, d\theta$$

It is computed under the following assumptions:

Complete labelled training data.

The prior distributions over the parameters are Dirichlet distributions:

$$\theta_{ij} \sim \mathrm{Dirichlet}(\alpha_{ij1}, \ldots, \alpha_{ij|X_i|})$$

Parameters are globally and locally independent:

$$\mathrm{score}_{BDeu}(G \mid D) = \prod_{i=1}^{n} \prod_{j=1}^{|Pa^G(X_i)|} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{|X_i|} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$

where $\alpha_{ij} = \sum_k \alpha_{ijk}$.

The BDe metric sets the alpha values as follows, in order to guarantee the likelihood equivalence property:

$$\alpha_{ijk} = \frac{S}{|X_i|\,|Pa(X_i)|}$$
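As a concrete reading of the score, the following minimal sketch (not part of the slides) computes the log of the BDeu product from the count tables N_ijk; the count-table layout and function names are assumptions of this example.

```python
# Minimal sketch (not from the slides): log BDeu score from the counts N_ijk,
# with alpha_ijk = S / (|X_i| * |Pa(X_i)|). counts[i] is assumed to be a
# q_i x r_i array, where q_i is the number of parent configurations of X_i
# and r_i = |X_i|; the value returned is the log of the product on the slide.
import numpy as np
from scipy.special import gammaln  # log Gamma, to avoid overflow


def log_bdeu_family(counts_i: np.ndarray, ess: float) -> float:
    """Log BDeu term for one variable X_i, given its q_i x r_i count table."""
    q_i, r_i = counts_i.shape
    alpha_ij = ess / q_i               # prior mass per parent configuration
    alpha_ijk = ess / (q_i * r_i)      # prior mass per table cell
    n_ij = counts_i.sum(axis=1)        # N_ij = sum_k N_ijk
    return float(np.sum(gammaln(alpha_ij) - gammaln(alpha_ij + n_ij))
                 + np.sum(gammaln(alpha_ijk + counts_i) - gammaln(alpha_ijk)))


def log_bdeu(counts: list, ess: float) -> float:
    """Log BDeu score of a graph: the sum of its local family terms."""
    return sum(log_bdeu_family(c, ess) for c in counts)


# Hypothetical example: one binary variable with one binary parent.
counts = [np.array([[30.0, 5.0], [4.0, 61.0]])]
print(log_bdeu(counts, ess=1.0))
```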


Page 16: Locally Averaged Bayesian Dirichlet Metrics

Bayesian Dirichlet Metrics

Sensitivity to the ESS

The problem is that the αijk values become exponentially small with either the number or the cardinality of the parents: $\alpha_{ijk} = \frac{S}{|X_i|\,|Pa(X_i)|}$.

[Figure: Beta(1, 1), Beta(0.5, 0.5), Beta(0.25, 0.25), Beta(0.125, 0.125)]

[Steck & Jaakkola 2002, Steck 2008, Ueno 2010]: small αijk values tend to favor the absence of an edge Y → X over its presence (even if X and Y are not conditionally independent).

Especially if the empirical P̂(X|Y) is not very extreme (i.e., it does not match the prior assumptions).


Page 18: Locally Averaged Bayesian Dirichlet Metrics

Bayesian Dirichlet Metrics

Sensitivity to the ESS

If we increase the S value, we implicitly assume that the marginal distributions P(Xi) = θi are very symmetrical.

[Figure: Beta(1, 1), Beta(2, 2), Beta(4, 4), Beta(8, 8)]

[Steck & Jaakkola 2002, Steck 2008, Ueno 2010]: larger S values tend to favor the presence of an edge Y → X over its absence (even if X and Y are conditionally independent).

Especially if there is notable skewness in both distributions P(X|PaX) and P(Y|PaY).
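The tendency above can be checked numerically. The sketch below (not part of the slides) scores a hypothetical 2x2 count table and prints, for a range of S values, the log BDe difference between the model with the edge Y → X and the empty model; the counts and function name are illustrative assumptions.

```python
# Minimal sketch (not from the slides): how the choice of S shifts the BDe
# comparison between "edge Y -> X" and "no edge" for one fixed count table.
# Y's own family term is identical in both models (Y has no parents either
# way), so it cancels in the difference and is omitted here.
import numpy as np
from scipy.special import gammaln


def log_bde_family(counts_i: np.ndarray, s: float) -> float:
    """Log BDe term for one variable from its q_i x r_i table of N_ijk."""
    q, r = counts_i.shape
    a_j, a_jk = s / q, s / (q * r)
    return float(np.sum(gammaln(a_j) - gammaln(a_j + counts_i.sum(axis=1)))
                 + np.sum(gammaln(a_jk + counts_i) - gammaln(a_jk)))


# Hypothetical joint counts N(Y=j, X=k) for 1000 samples with mild dependence.
joint = np.array([[480.0, 320.0], [80.0, 120.0]])
x_alone = joint.sum(axis=0, keepdims=True)       # X with no parents: 1 x 2

for s in [2.0**k for k in range(-6, 7)]:
    diff = log_bde_family(joint, s) - log_bde_family(x_alone, s)
    print(f"S = {s:9.4f}   log-score(Y -> X) - log-score(no edge) = {diff:+.3f}")
```

Positive differences favour adding the edge; running this over the S grid shows how the preference shifts for a given count table.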


Page 20: Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Part III

Locally Averaged Bayesian Dirichlet Metrics

Page 21: Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Globally Averaged Bayesian Dirichlet Metrics

[Silander et al. 2007] proposed a Bayesian solution to the problem of selecting an optimal ESS:

Consider S as a random variable, place a prior on S and marginalize it out.

$$P(D \mid G) = \int P(D \mid G, s)\, P(s \mid G)\, ds$$

where P(D|G, s) is the classic marginal likelihood, which depends on the equivalent sample size.

It is assumed that P(S|G) is uniform, and the integral is approximated by simple averaging:

$$P(D \mid G) = \frac{1}{|\mathcal{S}|} \sum_{s \in \mathcal{S}} \prod_{i} \prod_{j=1}^{|U_i|} \frac{\Gamma\!\left(\frac{s}{|U_i|}\right)}{\Gamma\!\left(\frac{s}{|U_i|} + N_{ij}\right)} \prod_{k=1}^{|X_i|} \frac{\Gamma\!\left(\frac{s}{|U_i|\,|X_i|} + N_{ijk}\right)}{\Gamma\!\left(\frac{s}{|U_i|\,|X_i|}\right)}$$

where $\mathcal{S}$ is a finite set of different S values.

It satisfies the likelihood equivalence property, but it is not locally decomposable.
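A minimal sketch (not part of the slides) of this globally averaged score: the plain BDe likelihoods are averaged over a finite grid of S values, using logsumexp because the average is taken in probability space, not in log space. The count-table layout and names are assumptions of the example.

```python
# Minimal sketch (not from the slides): globally averaged BDe score, i.e. the
# average of the BDe marginal likelihoods over a finite grid of S values,
# with a single S shared by the whole network in each term of the average.
import numpy as np
from scipy.special import gammaln, logsumexp


def log_bde(counts, s):
    """Log BDe score for one S value; counts[i] is a q_i x r_i table of N_ijk."""
    total = 0.0
    for c in counts:
        q, r = c.shape
        a_j, a_jk = s / q, s / (q * r)
        total += np.sum(gammaln(a_j) - gammaln(a_j + c.sum(axis=1)))
        total += np.sum(gammaln(a_jk + c) - gammaln(a_jk))
    return total


def log_global_avg_bde(counts, s_grid):
    """log of (1/|S|) * sum over s of BDe(G, s)."""
    logs = np.array([log_bde(counts, s) for s in s_grid])
    return logsumexp(logs) - np.log(len(s_grid))


# Hypothetical example with the S_1 grid {0.5, 1, 2} from the slides.
counts = [np.array([[30.0, 5.0], [4.0, 61.0]])]
print(log_global_avg_bde(counts, [0.5, 1.0, 2.0]))
```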


Page 24: Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Sensitivity to the ESS

A toy example:

Z and Y have very skewed marginal distributions.

P(X|Z) is not notably far from uniform.

We generate 1000 data samples.

We evaluate the BN with the highest score.


Page 25: Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Globally Averaged Bayesian Dirichlet Metrics

Different averaging sets S_L were tested:

S_1 = {0.5, 1, 2}, S_2 = {0.25, 0.5, 1, 2, 4}, ..., S_10 = {2^-10, 2^-9, ..., 2^9, 2^10}.

S ≪ 1 (very skewed), S < 1 (skewed), S ≈ 1 (uniform), S ≫ 1 (strongly uniform).

Results

It always retrieves the empty graph, without any edges.

Reasons:

We assume a global distribution (either strongly uniform, uniform, skewed, or very skewed) for all parameters at the same time.

This assumption does not fit the parameter space of this Bayesian network.


Page 28: Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

The marginalization of the parameter S is carried out locally:

We assume that each parameter vector θij is drawn from a different Dirichlet distribution, where the S parameters are independent.

$$P(D \mid G) = \frac{1}{|\mathcal{S}|} \prod_{i} \prod_{j=1}^{|Pa(X_i)|} \sum_{s \in \mathcal{S}} \frac{\Gamma\!\left(\frac{s}{|Pa(X_i)|}\right)}{\Gamma\!\left(\frac{s}{|Pa(X_i)|} + N_{ij}\right)} \prod_{k=1}^{|X_i|} \frac{\Gamma\!\left(\frac{s}{|Pa(X_i)|\,|X_i|} + N_{ijk}\right)}{\Gamma\!\left(\frac{s}{|Pa(X_i)|\,|X_i|}\right)}$$

where $\mathcal{S}$ is a finite set of different S values.

The metric is now locally decomposable, but it loses the likelihood equivalence property.
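A minimal sketch (not part of the slides) of the locally averaged score. Following the stated assumption that each θij has its own independent S, the uniform 1/|S| weight is applied inside each local (i, j) factor in this sketch (the slide writes the constant once in front); the count-table layout and names are assumptions of the example.

```python
# Minimal sketch (not from the slides): locally averaged BDe score, with one
# independent average over the S grid per parent configuration (i, j).
import numpy as np
from scipy.special import gammaln, logsumexp


def log_local_avg_bde(counts, s_grid):
    """counts[i]: q_i x r_i table of N_ijk; s_grid: the finite set of S values."""
    s = np.asarray(s_grid, dtype=float)
    total = 0.0
    for c in counts:
        q, r = c.shape
        a_j = s / q                      # one alpha_ij per candidate S value
        a_jk = s / (q * r)               # one alpha_ijk per candidate S value
        for j in range(q):               # independent average per theta_ij
            n_ij = c[j].sum()
            log_terms = (gammaln(a_j) - gammaln(a_j + n_ij)
                         + np.sum(gammaln(a_jk[:, None] + c[j])
                                  - gammaln(a_jk)[:, None], axis=1))
            total += logsumexp(log_terms) - np.log(len(s))   # uniform 1/|S|
    return total


# Hypothetical example with the S_5 grid {2^-5, ..., 2^5} from the slides.
counts = [np.array([[30.0, 5.0], [4.0, 61.0]])]
print(log_local_avg_bde(counts, [2.0**k for k in range(-5, 6)]))
```

Because the sum over S sits inside the product over families, the score decomposes over (i, j) factors, which is the local decomposability mentioned above.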


Page 31: Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Locally Averaged Bayesian Dirichlet Metrics

Different averaging sets S_L were tested:

S_1 = {0.5, 1, 2}, S_2 = {0.25, 0.5, 1, 2, 4}, ..., S_10 = {2^-10, 2^-9, ..., 2^9, 2^10}.

S ≪ 1 (very skewed), S < 1 (skewed), S ≈ 1 (uniform), S ≫ 1 (strongly uniform).

Results

When L ≥ 5, we always retrieve the right graph.

We assume that each parameter vector follows a different Dirichlet distribution (either strongly uniform, uniform, skewed, or very skewed), independent of the rest of the parameters.

This assumption allows fitting much more complex parameter spaces.


Page 34: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

Part IV

Experimental Evaluation


Page 35: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

Experimental Set-up

Bayesian Networks:

alarm (37 nodes), boblo (23 nodes), boerlage-92 (23 nodes), hailfinder (56 nodes), insurance (27 nodes).

Data Sets:

We ran the algorithms 10 times with 1000 data samples (other sample sizes were also evaluated).

Evaluation Measures

Number of missing/extra links, Kullback-Leibler distance, ...

Algorithms

A greedy search algorithm is used, assuming we are given a correct topological order of the variables.

Different S_L sets are used to perform the averaging: L = 1, ..., 10 (displayed on the x-axis).
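For reference, a one-line sketch (not part of the slides) of how the nested S_L grids described above can be built; the variable name is illustrative.

```python
# Minimal sketch (not from the slides): the nested averaging grids
# S_L = {2^-L, ..., 2^0, ..., 2^L} for L = 1, ..., 10.
s_grids = {L: [2.0**k for k in range(-L, L + 1)] for L in range(1, 11)}
print(s_grids[1])   # [0.5, 1.0, 2.0] -> the S_1 set from the slides
```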


Page 39: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

BDe with different S values I

[Figure: Missing+Extra Links (left) and KL Distance (right) vs. log of the S value, for the Alarm, Boblo, Boerlage, Hailfinder and Insurance networks]

Analysis

The BDe metric is very sensitive to the S value in some problem domains.

There is an optimal S value, which is different for each problem.

Page 40: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

BDe with different S values II

[Figure: Missing Links (left) and Extra Links (right) vs. log of the S value, for the Alarm, Boblo, Boerlage, Hailfinder and Insurance networks]

Analysis

We can see that the theoretically predicted tendencies appear:

Higher S values have a tendency to add edges.

Lower S values have a tendency to remove edges.

Page 41: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

Locally Averaged Bayesian Dirichlet metrics

[Figure: Missing+Extra Links (left) and KL Distance (right) vs. L value, for the Alarm, Boblo, Boerlage, Hailfinder and Insurance networks]

Analysis

The higher the L value, the wider the set of averaged S values.

In some domains, the error measures improve with the size of the averaged S set.

In other domains, the error does not improve, but it does not get worse either.

Page 42: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

Globally Averaged Bayesian Dirichlet metrics

[Figure: Missing+Extra Links (left) and KL Distance (right) vs. L value, for the Alarm, Boblo, Boerlage, Hailfinder and Insurance networks]

Analysis

Similar behavior to the locally averaged metrics.

Page 43: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

Globally vs Locally Averaged Bayesian Dirichlet metrics

Global-AvBD error minus Local-AvBD error

[Figure: Missing+Extra difference (Global-AvBD error minus Local-AvBD error) vs. L value, for the Alarm, Boblo, Boerlage, Hailfinder and Insurance networks]

Analysis

In Alarm, Boblo and Boerlage, there are hardly any differences between them.

In Hailfinder and Insurance, the Local-AvBD metric performs better.

The performance depends on the complexity of the parameter space.

Page 44: Locally Averaged Bayesian Dirichlet Metrics

Experimental Evaluation

BDe metric vs Locally Averaged Bayesian Dirichlet metrics

BD error minus Local-AvBD error

[Figure: Missing+Extra difference (BD error minus Local-AvBD error) vs. L value, for the Alarm, Boblo, Boerlage, Hailfinder and Insurance networks]

Analysis

For the BD metric, the model with the lowest error over all S values in the set S_L is selected.

The Local-AvBD metric performs at least as well as the BD metric with an optimal S value.

In some domains (Hailfinder and Insurance), the Local-AvBD metric yields better inferences.

Page 45: Locally Averaged Bayesian Dirichlet Metrics

Conclusions and Future Work

Part V

Conclusions and Future Work

Page 46: Locally Averaged Bayesian Dirichlet Metrics

Conclusions and Future Work

Conclusions and Future Work

Conclusions

The locally averaged Bayesian Dirichlet metric robustly infers more accurate models than the BDe metric with an optimal selection of the ESS parameter.

It is able to adapt to complex parameter spaces.

This metric is worthwhile for knowledge discovery tasks: the inferences do not depend on any free parameter, and it matches the performance of an optimal solution.

Future Work

Extend this method to the parameter estimation of a BN model:

$$P(X_i = k \mid Pa(X_i) = j) = \frac{n_{ijk} + \frac{S}{|X_i|\,|Pa(X_i)|}}{n_{ij} + \frac{S}{|Pa(X_i)|}}$$
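As a concrete reading of this estimator, here is a minimal sketch (not part of the slides) of the Dirichlet-smoothed conditional probability table for one variable and a fixed S; the function name and count layout are assumptions.

```python
# Minimal sketch (not from the slides): the future-work estimator, i.e. a
# Dirichlet-smoothed CPT built from the counts n_ijk of one variable X_i.
import numpy as np


def smoothed_cpt(counts_i: np.ndarray, s: float) -> np.ndarray:
    """counts_i[j, k] = n_ijk; returns P(X_i = k | Pa(X_i) = j)."""
    q, r = counts_i.shape
    num = counts_i + s / (q * r)                        # n_ijk + S/(|X_i||Pa(X_i)|)
    den = counts_i.sum(axis=1, keepdims=True) + s / q   # n_ij  + S/|Pa(X_i)|
    return num / den


# Hypothetical counts for a binary variable with one binary parent, S = 2.
print(smoothed_cpt(np.array([[30.0, 5.0], [4.0, 61.0]]), s=2.0))
```

Averaging this estimator over the S grid, in the same spirit as the structure scores above, is what the future-work item proposes to explore.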


Page 48: Locally Averaged Bayesian Dirichlet Metrics

Conclusions and Future Work

Thanks for your attention!