TRANSCRIPT
Two Approximate Algorithms for Belief Updating
Mini-Clustering - MC
Robert Mateescu, Rina Dechter, Kalev Kask. "Tree Approximation for Belief Updating", AAAI-2002
Iterative Join-Graph Propagation - IJGP
Rina Dechter, Kalev Kask and Robert Mateescu. "Iterative Join-Graph Propagation", UAI 2002
What is Mini-Clustering?
Mini-Clustering (MC) is an approximate algorithm for belief updating in Bayesian networks
MC is an anytime version of join-tree clustering
MC applies message passing along a cluster tree
The complexity of MC is controlled by a user-adjustable parameter, the i-bound
Empirical evaluation shows that MC is a very effective algorithm, in many cases superior to other approximate schemes (IBP, Gibbs Sampling)
The belief updating problem is the task of computing the posterior probability P(Y|e) of query nodes Y ⊆ X given evidence e. We focus on the basic case where Y is a single variable Xi.
Belief networks
A belief network is a quadruple BN = <X, D, G, P>, where:
- X = {X1, ..., Xn} is a set of random variables
- D = {D1, ..., Dn} is the set of their domains
- G is a DAG (directed acyclic graph) over X
- P = {p1, ..., pn}, pi = P(Xi | pai), are CPTs (conditional probability tables)
[Figure: example belief network, a DAG over the variables A, B, C, D, E, F, G]
Tree decompositions
A tree decomposition for a belief network BN = <X, D, G, P> is a triple <T, χ, ψ>, where T = (V, E) is a tree and χ and ψ are labeling functions, associating with each vertex v ∈ V two sets χ(v) ⊆ X and ψ(v) ⊆ P, satisfying:
1. For each function pi ∈ P there is exactly one vertex v such that pi ∈ ψ(v) and scope(pi) ⊆ χ(v).
2. For each variable Xi ∈ X, the set {v ∈ V | Xi ∈ χ(v)} forms a connected subtree (running intersection property).
1: {A, B, C}     p(a), p(b|a), p(c|a,b)
2: {B, C, D, F}  p(d|b), p(f|c,d)
3: {B, E, F}     p(e|b,f)
4: {E, F, G}     p(g|e,f)
The clusters form a chain 1-2-3-4 with separators BC, BF, EF.
[Figure: the belief network over A..G (left) and its tree decomposition (right)]
Belief network / Tree decomposition
Cluster Tree Elimination
Cluster Tree Elimination (CTE) is an exact algorithm
It works by passing messages along a tree decomposition
Basic idea:
- Each node sends only one message to each of its neighbors
- Node u sends a message to its neighbor v only when u has received messages from all its other neighbors
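The activation rule above (send to v once messages from all other neighbors have arrived) fully determines a message order on a tree. A minimal sketch of that schedule, under the assumption that the tree is given as an adjacency dict:

```python
# Sketch of the CTE message schedule: node u may send to neighbor v as soon
# as u has received messages from all of its other neighbors.
from collections import deque

def cte_schedule(neighbors):
    """Return an order of directed messages (u, v) for a tree given as an
    adjacency dict. Exactly two messages cross every edge."""
    pending = {u: set(vs) for u, vs in neighbors.items()}  # still-awaited senders
    sent, order = set(), []
    queue = deque((u, v) for u in neighbors for v in neighbors[u]
                  if not pending[u] - {v})                 # leaves can fire first
    while queue:
        u, v = queue.popleft()
        if (u, v) in sent:
            continue
        order.append((u, v))
        sent.add((u, v))
        pending[v].discard(u)                              # v has now heard from u
        for w in neighbors[v]:
            if (v, w) not in sent and not pending[v] - {w}:
                queue.append((v, w))
    return order

# The chain 1-2-3-4 from the running example:
tree = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
order = cte_schedule(tree)
```

On the chain, the two leaves fire first and the messages sweep inward and back out, so each direction of each edge carries exactly one message, six in total.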
Cluster Tree Elimination
Previous work on tree clustering:
- Lauritzen, Spiegelhalter - '88 (probabilities)
- Jensen, Lauritzen, Olesen - '90 (probabilities)
- Shenoy, Shafer - '90; Shenoy - '97 (general)
- Dechter, Pearl - '89 (constraints)
- Gottlob, Leone, Scarcello - '00 (constraints)
Belief Propagation
A node u with neighbors x1, ..., xn and v collects its own functions and the incoming messages:
cluster(u) = ψ(u) ∪ {h(x1,u), h(x2,u), ..., h(xn,u), h(v,u)}
Compute the message:
h(u,v) = Σ_{elim(u,v)} Π { f | f ∈ cluster(u), f ≠ h(v,u) }
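The message computation h(u,v) = Σ_{elim(u,v)} Π f can be sketched directly as a nested sum-product. This is an illustrative sketch with placeholder tables, not the authors' implementation; factors are dicts from assignment tuples to numbers.

```python
# Sketch of one CTE message: sum out elim(u,v) from the product of the
# cluster's factors, leaving a function over the separator sep(u,v).
from itertools import product

def message(factors, scopes, sep_vars, elim_vars, domains):
    """factors[i] maps an assignment tuple over scopes[i] to a number.
    Returns a dict over assignments of sep_vars."""
    h = {}
    for sep in product(*(domains[v] for v in sep_vars)):
        total = 0.0
        for el in product(*(domains[v] for v in elim_vars)):
            assignment = dict(zip(sep_vars, sep))
            assignment.update(zip(elim_vars, el))
            prod = 1.0
            for f, scope in zip(factors, scopes):
                prod *= f[tuple(assignment[v] for v in scope)]
            total += prod
        h[sep] = total
    return h

# h(1,2)(b,c) = sum_a p(a) p(b|a) p(c|a,b), with placeholder uniform CPTs:
dom = {"A": (0, 1), "B": (0, 1), "C": (0, 1)}
p_a = {(a,): 0.5 for a in (0, 1)}
p_b = {(a, b): 0.5 for a in (0, 1) for b in (0, 1)}
p_c = {(a, b, c): 0.5 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
h12 = message([p_a, p_b, p_c], [("A",), ("A", "B"), ("A", "B", "C")],
              ("B", "C"), ("A",), dom)
```

With uniform tables each entry of h(1,2)(b,c) is Σ_a 0.5·0.5·0.5 = 0.25, and the four entries sum to 1.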
Cluster Tree Elimination - example
[Figure: the belief network over A..G and the cluster tree 1:{A,B,C} - 2:{B,C,D,F} - 3:{B,E,F} - 4:{E,F,G} with separators BC, BF, EF]
h(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h(2,1)(b,c) = Σ_{d,f} p(d|b) p(f|c,d) h(3,2)(b,f)
h(2,3)(b,f) = Σ_{c,d} p(d|b) p(f|c,d) h(1,2)(b,c)
h(3,2)(b,f) = Σ_e p(e|b,f) h(4,3)(e,f)
h(3,4)(e,f) = Σ_b p(e|b,f) h(2,3)(b,f)
h(4,3)(e,f) = p(G=g_e|e,f)
Cluster Tree Elimination - the messages
1: {A, B, C}     p(a), p(b|a), p(c|a,b)
2: {B, C, D, F}  p(d|b), p(f|c,d), h(1,2)(b,c)
3: {B, E, F}     p(e|b,f), h(2,3)(b,f)
4: {E, F, G}     p(g|e,f)
sep(2,3) = {B, F}
elim(2,3) = {C, D}
h(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h(2,3)(b,f) = Σ_{c,d} p(d|b) p(f|c,d) h(1,2)(b,c)
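The separator and eliminator of an edge follow directly from the cluster labels: sep(u,v) = χ(u) ∩ χ(v) and elim(u,v) = χ(u) \ χ(v). A minimal sketch over the example decomposition:

```python
# Separator and eliminator of an edge (u, v) in the example tree decomposition.
chi = {1: {"A", "B", "C"}, 2: {"B", "C", "D", "F"},
       3: {"B", "E", "F"}, 4: {"E", "F", "G"}}

def sep(u, v):
    """Variables shared by the two clusters: sep(u,v) = chi(u) & chi(v)."""
    return chi[u] & chi[v]

def elim(u, v):
    """Variables summed out when u sends to v: elim(u,v) = chi(u) - chi(v)."""
    return chi[u] - chi[v]
```

For the edge (2,3) this reproduces sep(2,3) = {B, F} and elim(2,3) = {C, D} from the slide.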
Cluster Tree Elimination - properties
Correctness and completeness: Algorithm CTE is correct, i.e. it computes the exact joint probability of a single variable and the evidence.
Time complexity: O( deg × (n+N) × d^(w*+1) )
Space complexity: O( N × d^sep )
where:
- deg = the maximum degree of a node
- n = number of variables (= number of CPTs)
- N = number of nodes in the tree decomposition
- d = the maximum domain size of a variable
- w* = the induced width
- sep = the separator size
Mini-Clustering - motivation
Time and space complexity of Cluster Tree Elimination depend exponentially on the induced width w* of the problem
When the induced width w* is large, the CTE algorithm becomes infeasible
Mini-Clustering - the basic idea
- Try to reduce the size of the cluster (the exponent): partition each cluster into mini-clusters with fewer variables
- Accuracy parameter i = maximum number of variables in a mini-cluster
- The idea was explored for variable elimination (Mini-Buckets)
- Suppose cluster(u) is partitioned into p mini-clusters mc(1), ..., mc(p), each containing at most i variables
- CTE computes the exact message:
  h(u,v) = Σ_{elim(u,v)} Π_{k=1}^{p} Π_{f ∈ mc(k)} f
- We want to process each Π_{f ∈ mc(k)} f separately
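The partitioning step itself can be sketched in a few lines. The paper does not prescribe a particular partitioning heuristic, so the first-fit rule below is one simple, assumed choice: pack each function into the first mini-cluster whose combined scope stays within the i-bound.

```python
# Sketch of a greedy (first-fit) partitioning of a cluster's functions into
# mini-clusters whose combined scopes have at most i_bound variables.
# The first-fit rule is an illustrative choice, not the paper's heuristic.
def partition(scopes, i_bound):
    """scopes: list of variable sets, one per function in the cluster.
    Returns a list of (function-index list, combined scope) mini-clusters."""
    minis = []
    for idx, scope in enumerate(scopes):
        for members, combined in minis:
            if len(combined | scope) <= i_bound:   # fits this mini-cluster
                members.append(idx)
                combined |= scope
                break
        else:
            minis.append(([idx], set(scope)))      # open a new mini-cluster
    return minis
```

For cluster 2 of the running example, with functions p(d|b) over {B,D}, p(f|c,d) over {C,D,F} and h(1,2) over {B,C} and i = 3, this yields the partition {p(d|b), h(1,2)} with scope {B,C,D} and {p(f|c,d)} with scope {C,D,F}, matching the i=3 example later in the talk.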
Mini-Clustering
Approximate each Π_{f ∈ mc(k)} f, k = 2, ..., p, and take it outside the summation.
How to process the mini-clusters to obtain approximations or bounds:
- Process all mini-clusters by summation - this gives an upper bound on the joint probability
- A tighter upper bound: process one mini-cluster by summation and the others by maximization
- Can also use the mean operator (average) - this gives an approximation of the joint probability
Splitting a cluster into mini-clusters bounds the complexity: an exponential decrease from O(e^n) to O(e^r) + O(e^(n-r)), r < n.
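The per-mini-cluster processing reduces to eliminating variables from a factor with one operator or another. A minimal sketch, with a hypothetical CPT p(f|c,d) (the numbers are made up for illustration):

```python
# Sketch of processing a mini-cluster: eliminate the non-separator variables
# by summation (exact contribution) or maximization (upper bound).
from itertools import product

DOM = (0, 1)  # all variables binary, as in the experiments

def process(table, scope, sep_vars, op):
    """Eliminate scope - sep_vars from `table` using binary `op` (e.g. max).
    `table` maps full assignments over `scope` to numbers."""
    out = {}
    for full, value in table.items():
        key = tuple(v for s, v in zip(scope, full) if s in sep_vars)
        out[key] = value if key not in out else op(out[key], value)
    return out

# Hypothetical CPT p(f|c,d): f tends to follow c OR d.
p_f = {(c, d, f): (0.9 if f == (c | d) else 0.1)
       for c, d, f in product(DOM, repeat=3)}

# Second mini-cluster of node 2, processed by maximization:
h2_23 = process(p_f, ("C", "D", "F"), {"F"}, max)   # h2_(2,3)(f) = max_{c,d} p(f|c,d)
# Same factor processed by summation, for comparison:
h_sum = process(p_f, ("C", "D", "F"), {"F"}, lambda a, b: a + b)
```

With these numbers, max_{c,d} p(f|c,d) = 0.9 for both values of f; the summed version instead accumulates all four (c,d) rows. Replacing `max` with a mean over the eliminated rows would give the averaging variant mentioned above.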
Idea of Mini-Clustering
[Figure: the cluster tree 1:{A,B,C} - 2:{B,C,D,F} - 3:{B,E,F} - 4:{E,F,G} with separators BC, BF, EF, annotated with the sets of mini-messages]
H(1,2) = { h1(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b) }
H(2,1) = { h1(2,1)(b) = Σ_{d,f} p(d|b) h1(3,2)(b,f),
           h2(2,1)(c) = max_{d,f} p(f|c,d) }
H(2,3) = { h1(2,3)(b) = Σ_{c,d} p(d|b) h1(1,2)(b,c),
           h2(2,3)(f) = max_{c,d} p(f|c,d) }
H(3,2) = { h1(3,2)(b,f) = Σ_e p(e|b,f) h1(4,3)(e,f) }
H(3,4) = { h1(3,4)(e,f) = Σ_b p(e|b,f) h1(2,3)(b) h2(2,3)(f) }
H(4,3) = { h1(4,3)(e,f) = p(G=g_e|e,f) }
Mini-Clustering - example
Mini-Clustering - the messages, i=3
1:  {A, B, C}  p(a), p(b|a), p(c|a,b)
2a: {B, C, D}  p(d|b), h(1,2)(b,c)
2b: {C, D, F}  p(f|c,d)
3:  {B, E, F}  p(e|b,f), h1(2,3)(b), h2(2,3)(f)
4:  {E, F, G}  p(g|e,f)
sep(2,3) = {B, F}
elim(2,3) = {C, D}
h1(1,2)(b,c) = Σ_a p(a) p(b|a) p(c|a,b)
h1(2,3)(b) = Σ_{c,d} p(d|b) h1(1,2)(b,c)
h2(2,3)(f) = max_{c,d} p(f|c,d)
Cluster Tree Elimination vs. Mini-Clustering
CTE messages:      MC mini-messages:
h(1,2)(b,c)        H(1,2) = { h1(1,2)(b,c) }
h(2,1)(b,c)        H(2,1) = { h1(2,1)(b), h2(2,1)(c) }
h(2,3)(b,f)        H(2,3) = { h1(2,3)(b), h2(2,3)(f) }
h(3,2)(b,f)        H(3,2) = { h1(3,2)(b,f) }
h(3,4)(e,f)        H(3,4) = { h1(3,4)(e,f) }
h(4,3)(e,f)        H(4,3) = { h1(4,3)(e,f) }
[Figure: the cluster tree 1:{A,B,C} - 2:{B,C,D,F} - 3:{B,E,F} - 4:{E,F,G} shown twice, once with the exact CTE messages and once with the MC mini-messages]
Mini-Clustering - properties
Correctness and completeness: Algorithm MC(i) computes a bound (or an approximation) on the joint probability P(Xi,e) of each variable and each of its values.
Time & space complexity: O( n × hw* × d^i )
where hw* = max_u |{ f | scope(f) ∩ χ(u) ≠ ∅ }|
Normalization
Algorithms for the belief updating problem compute, in general, the joint probability:
P(Xi, e), Xi = query node, e = evidence
Computing the conditional probability:
P(Xi | e), Xi = query node, e = evidence
is easy to do if exact algorithms can be applied, but becomes an important issue for approximate algorithms.
- MC can compute an (upper) bound on the joint P(Xi,e)
- Deriving a bound on the conditional P(Xi|e) is not easy when the exact P(e) is not available
- If a lower bound on P(e) were available, we could use P(Xi,e) / P(e) with the lower bound in the denominator as an upper bound on the posterior
- In our experiments we normalized the results and regarded them as approximations of the posterior P(Xi|e)
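The normalization used in the experiments is simply dividing the approximate joint by its sum over the values of Xi, which plays the role of P(e). A minimal sketch:

```python
# Sketch of the normalization step: treat the normalized approximate joint
# P(X_i, e) as an approximation of the posterior P(X_i | e).
def normalize(joint_values):
    """joint_values: approximate P(X_i = x, e) for each value x of X_i."""
    z = sum(joint_values)        # stands in for P(e)
    return [v / z for v in joint_values]
```

Note that normalizing a set of upper bounds does not in general preserve the bound: each normalized entry is an approximation, not a guaranteed upper bound on the posterior, which is exactly the issue raised above.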
Experimental results
Algorithms:
- Exact
- IBP
- Gibbs sampling (GS)
- MC with normalization (approximate)
Networks (all variables are binary):
- Coding networks
- CPCS 54, 360, 422
- Grid networks (MxM)
- Random noisy-OR networks
- Random networks
We tested MC with max and mean operators.
Measures:
- Normalized Hamming Distance (NHD)
- BER (Bit Error Rate)
- Absolute error
- Relative error
- Time
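Some of the measures above are straightforward to state in code. The following sketches absolute error, relative error, and BER under the usual definitions (assumed here, since the talk does not spell them out); the inputs are flat lists of corresponding posterior entries or bits.

```python
# Sketches of the evaluation measures, averaged over corresponding entries.
def absolute_error(exact, approx):
    """Mean |exact - approx| over corresponding posterior entries."""
    return sum(abs(e - a) for e, a in zip(exact, approx)) / len(exact)

def relative_error(exact, approx):
    """Mean |exact - approx| / exact (defined when every exact entry > 0)."""
    return sum(abs(e - a) / e for e, a in zip(exact, approx)) / len(exact)

def bit_error_rate(sent_bits, decoded_bits):
    """Fraction of bits decoded incorrectly (the coding-network measure)."""
    return sum(s != d for s, d in zip(sent_bits, decoded_bits)) / len(sent_bits)
```

NHD is analogous to BER but compares the most-likely value under the approximate posterior against the exact one, per variable.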
Random networks - Absolute error
[Figure: two panels, evidence=0 and evidence=10. Random networks, N=50, P=2, k=2, w*=10, 50 instances. Absolute error (0.00-0.16) vs. i-bound for MC, Gibbs Sampling and IBP.]
Coding networks - Bit Error Rate
[Figure: two panels, sigma=0.22 and sigma=0.51. Coding networks, N=100, P=4, w*=12, 50 instances. Bit Error Rate vs. i-bound for MC and IBP.]
Noisy-OR networks - Absolute error
[Figure: two panels, evidence=10 and evidence=20. Noisy-OR networks, N=50, P=3, w*=16, 25 instances. Absolute error (log scale, 1e-5 to 1e+0) vs. i-bound for MC, IBP and Gibbs Sampling.]
CPCS422 - Absolute error
[Figure: two panels, evidence=0 and evidence=10. CPCS 422, w*=23, 1 instance. Absolute error (0.00-0.05) vs. i-bound for MC and IBP.]
Grid 15x15 - 0 evidence
[Figure: four panels for Grid 15x15, evid=0, w*=22, 10 instances: NHD, absolute error, relative error and time (seconds) vs. i-bound for MC and IBP.]
Grid 15x15 - 10 evidence
[Figure: four panels for Grid 15x15, evid=10, w*=22, 10 instances: NHD, absolute error, relative error and time (seconds) vs. i-bound for MC and IBP.]
Grid 15x15 - 20 evidence
[Figure: four panels for Grid 15x15, evid=20, w*=22, 10 instances: NHD, absolute error and relative error (log scale) and time (seconds) vs. i-bound for MC, IBP and Gibbs Sampling.]
Conclusion
- MC extends the partition-based approximation from mini-buckets to general tree decompositions for the problem of belief updating
- Empirical evaluation demonstrates its effectiveness and superiority (for certain types of problems, with respect to the measures considered) relative to other existing algorithms