sum-product and related algorithms for inference
TRANSCRIPT
Factor Graphs Algorithms
Motivation
JPDF
Factorization of JPDF
Graphical models
Sum-Product Algorithm
Sum-product and related algorithms for inference
Manuel Yguel (1)
Person in charge: Olivier Aycard (2)
(1) Institut National Polytechnique de Grenoble
(2) Université Joseph Fourier, Grenoble
Master II, IVR, 3I, I.C.A.
Outline
1 Motivation
2 JPDF
   Definitions
   Definitions and rules for JPDF
3 Factorization of JPDF
   Product rule
   Independencies
4 Graphical models
   Bayesian networks
   Factor Graphs
5 Sum-Product Algorithm
   Single marginal function
   Marginal for a chain
   Marginal for a tree
Probabilistic modelling
The modelling of phenomena is almost surely uncertain:
• communication between entities is subject to random perturbations,
• sensor records are uncertain: pixels of an image, range measurements of a laser range-finder, etc.
• knowledge is approximate: camera extrinsic and intrinsic parameters, sensor or robot localization, goals of people, etc.
• algorithms are approximate: approximations for real-time, first-order approximations for optimization and control, numerical precision, etc.
Framework
• $x_1, x_2, \ldots, x_n$ is a set of variables,
• for every $i$, $x_i$ takes values in some (usually finite) domain (or alphabet) $A_i$,
• let $g(x_1, \ldots, x_n)$ be a $[0; 1]$-valued function of $x_1, \ldots, x_n$; $g$ is called the joint probability distribution function (JPDF),
• the domain of $g$ is $S = A_1 \times A_2 \times \ldots \times A_n$ and is called the configuration space,
• each element of $S$ is a particular configuration of the variables, also called an event.
Example: the robot start problem
The robot does not start. The possible causes are:
1 the battery is down,
2 a wire is disconnected.
Furthermore, an observation can be made of the battery voltage. 4 variables can be defined:

Variable          Alphabet or domain
Start?            {yes, no}
Power State?      {up, down}
Connected?        {connected, disconnected}
Voltage Measure   {[i V; (i + 1) V[ | i = 0, ..., 199}

e = (no, up, disconnected, [24 V; 25 V[) is a configuration of the 4 variables.
g(e) ∈ [0; 1] is defined for every possible event; it is also written P(e), as a probability.
Variable partition
For each problem, the set of variables is partitioned into three subsets:
1 the set of questioned variables Q,
2 the set of known variables K (possibly empty),
3 the set of unknown variables U (possibly empty).
Example: the robot start problem
Power State evaluation:
1 questioned variables Q = {Power State?},
2 known variables K = {Start?, Voltage Measure},
3 unknown variables U = {Connected?}.
Connection evaluation:
1 questioned variables Q = {Connected?},
2 known variables K = {Start?, Voltage Measure},
3 unknown variables U = {Power State?}.
Goal
The goal of a probabilistic model is to calculate the conditional JPDF
$$B := P(x_{q(1)}, \ldots, x_{q(p)} \mid x_{k(1)}, \ldots, x_{k(q)})$$
where $\forall (i, j)$, $x_{q(i)} \in Q$ and $x_{k(j)} \in K$.
It is a set of functions, each one indexed by a different configuration of the known variables:
$$(a_{k(1)}, \ldots, a_{k(q)}) \longmapsto \left(\mathrm{pdf} : A_{q(1)} \times \ldots \times A_{q(p)} \longrightarrow [0; 1]\right)$$
Example: the robot start problem
Power State evaluation:
$$B_1 := P(PS \mid St, VM)$$
The variables Start?, Power State?, Connected? and Voltage Measure are abbreviated St, PS, C and VM respectively.
For each configuration of Start? and Voltage Measure (∈ {yes, no} × [0 V; 200 V]), it defines a probability function over the possible values of Power State?.
[Figure: bar chart of the distribution associated with the configuration (no, 24 V), over the values up and down of Power State?; vertical axis from 0 to 1.]
Use of probabilistic definitions

              brown hair   blond hair   red hair   light brown hair
brown eyes    22%          5%           3%         15%
blue eyes     8%           11%          9%         7%
green eyes    6%           2%           6%         6%
Marginal probability: calculating the probability of having blond hair.
$$P(\text{blond hair}) = \sum_{\text{eye color}} P(\text{blond hair}, \text{eye color}) = 5\% + 11\% + 2\% = 18\%$$
Conditional probability of eye color given blond hair: $P(\text{eye color} \mid \text{blond hair})$.

              blond hair
brown eyes    5% / 18%
blue eyes     11% / 18%
green eyes    2% / 18%
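The marginal and conditional computations on the eye/hair table can be sketched in a few lines of Python. The table values come from the slide; the dictionary layout and the function names are illustrative choices, not part of the original material.

```python
# Joint distribution P(eye color, hair color) from the table, in percent.
joint = {
    "brown eyes": {"brown hair": 22, "blond hair": 5, "red hair": 3, "light brown hair": 15},
    "blue eyes": {"brown hair": 8, "blond hair": 11, "red hair": 9, "light brown hair": 7},
    "green eyes": {"brown hair": 6, "blond hair": 2, "red hair": 6, "light brown hair": 6},
}

def marginal_hair(hair):
    """Sum rule: add up the joint over every eye color."""
    return sum(row[hair] for row in joint.values())

def conditional_eyes_given_hair(hair):
    """Conditional probability: joint divided by the marginal of the known variable."""
    total = marginal_hair(hair)
    return {eyes: row[hair] / total for eyes, row in joint.items()}

print(marginal_hair("blond hair"))                # 18
print(conditional_eyes_given_hair("blond hair"))  # 5/18, 11/18 and 2/18
```

The conditional values sum to 1, which is the normalization that dividing by the marginal is meant to provide.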
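Probabilistic definitions
Conditional probability:
$$P(x_{q(1)}, \ldots, x_{q(p)} \mid x_{k(1)}, \ldots, x_{k(q)}) = \frac{P(x_{q(1)}, \ldots, x_{q(p)}, x_{k(1)}, \ldots, x_{k(q)})}{P(x_{k(1)}, \ldots, x_{k(q)})}$$
Marginalization (also called sum rule):
• $$P(x_{q(1)}, \ldots, x_{q(p)}, x_{k(1)}, \ldots, x_{k(q)}) = \sum_{(a_{u(1)}, \ldots, a_{u(r)}) \in A_{u(1)} \times \ldots \times A_{u(r)}} g(x_1, \ldots, x_n)$$
• $$P(x_{k(1)}, \ldots, x_{k(q)}) = \sum_{(a_{q(1)}, \ldots, a_{q(p)}) \in A_{q(1)} \times \ldots \times A_{q(p)}} \; \sum_{(a_{u(1)}, \ldots, a_{u(r)}) \in A_{u(1)} \times \ldots \times A_{u(r)}} g(x_1, \ldots, x_n)$$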
Expanded expression of the goal function
Following the definitions of conditional and marginal probability, B equals:
$$B = \frac{\sum_{(a_{u(1)}, \ldots, a_{u(r)})} g(x_1, \ldots, x_n)}{\sum_{(a_{q(1)}, \ldots, a_{q(p)})} \sum_{(a_{u(1)}, \ldots, a_{u(r)})} g(x_1, \ldots, x_n)}$$
Example: the robot start problem
Power State evaluation:
$$B_1 := P(PS \mid St, VM) = \frac{\sum_{c \in \{connected, disconnected\}} g(St, PS, c, VM)}{\sum_{ps \in \{up, down\}} \sum_{c \in \{connected, disconnected\}} g(St, ps, c, VM)}$$
Worst inference complexity
If each variable takes values in a finite alphabet of size K, $K^{p+r}$ sums are required, where p and r are the numbers of variables in the questioned and unknown sets respectively.
EXPONENTIAL COMPLEXITY in the number of variables.
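To make the cost concrete, here is a minimal brute-force sketch of the goal function B: it enumerates every configuration of the questioned and unknown variables, which is exactly the exponential behaviour described above. The function `conditional`, its argument layout and the toy three-variable joint are assumptions made for illustration; they do not appear in the slides.

```python
from itertools import product

def conditional(g, domains, query, known):
    """Brute-force P(query | known) from a joint g.

    g: function mapping a dict {variable: value} to a probability.
    domains: dict {variable: list of values}.
    query: list of questioned variable names.
    known: dict {variable: observed value}.
    """
    unknown = [v for v in domains if v not in query and v not in known]

    def sum_over(variables, fixed):
        total = 0.0
        for values in product(*(domains[v] for v in variables)):
            total += g({**fixed, **dict(zip(variables, values))})
        return total

    # Denominator: sum over the questioned and the unknown variables.
    denominator = sum_over(query + unknown, known)
    result = {}
    for q_values in product(*(domains[v] for v in query)):
        fixed = {**known, **dict(zip(query, q_values))}
        # Numerator: sum over the unknown variables only.
        result[q_values] = sum_over(unknown, fixed) / denominator
    return result

# Hypothetical toy joint over three binary variables a, b, c.
p_a = [0.3, 0.7]
p_b_given_a = [[0.9, 0.1], [0.2, 0.8]]
p_c = [0.5, 0.5]

def toy_joint(e):
    return p_a[e["a"]] * p_b_given_a[e["a"]][e["b"]] * p_c[e["c"]]

domains = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
print(conditional(toy_joint, domains, query=["a"], known={"b": 1}))
```

Every call to `sum_over` visits one configuration per element of a Cartesian product, so the running time grows as a power of the alphabet size.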
Product rule
Let $o(i)$, $i \in \{1, \ldots, n\}$, be any permutation of the variable indices. Then
$$g(x_1, \ldots, x_n) = P(x_{o(1)}) \prod_{i=2}^{n} P(x_{o(i)} \mid x_{o(1)}, \ldots, x_{o(i-1)})$$
(easy to demonstrate by induction: just replace the conditional probabilities by their definition)
Example: the robot start problem
$$g(St, PS, C, VM) = P(PS)\, P(C \mid PS)\, P(VM \mid PS, C)\, P(St \mid PS, C, VM)$$
Probabilistic independence and conditional independence
Two variables $x_i$ and $x_j$ are said to be independent if and only if:
$$\forall (a_i, a_j) \in A_i \times A_j, \quad p(a_i, a_j) = p(a_i)\, p(a_j)$$
Two variables $x_i$ and $x_j$ are said to be conditionally independent given $x_k$ if and only if:
$$\forall (a_i, a_j, a_k) \in A_i \times A_j \times A_k, \quad p(a_i, a_j \mid a_k) = p(a_i \mid a_k)\, p(a_j \mid a_k)$$
Example: the robot start problem
In most cases, many independencies or conditional independencies arise as reasonable hypotheses in probabilistic modelling.
Reasonable hypotheses:
• Power State? and Connected? are independent,
• Voltage Measure and Connected? are conditionally independent given Power State?,
• Start? and Voltage Measure are conditionally independent given Power State? and Connected?.
Each hypothesis strikes out one conditioning variable of the product rule (PS in the second factor, C in the third, VM in the fourth):
$$g(St, PS, C, VM) = P(PS)\, P(C \mid PS)\, P(VM \mid PS, C)\, P(St \mid PS, C, VM)$$
$$= P(PS)\, P(C)\, P(VM \mid PS)\, P(St \mid PS, C)$$
Example: the robot start problem
Simple substitutions derive these simplifications from the hypotheses:
1 Power State? and Connected? are independent: $P(PS, C) = P(PS)\, P(C)$.
$$P(C \mid PS) = \frac{P(C, PS)}{P(PS)} = \frac{P(PS)\, P(C)}{P(PS)} = P(C)$$
2 Voltage Measure and Connected? are conditionally independent given Power State?: $P(VM, C \mid PS) = P(VM \mid PS)\, P(C \mid PS)$.
Use of the conditional probability definition:
$$P(VM \mid PS, C) = \frac{P(VM, PS, C)}{P(PS, C)}$$
Use of the product rule:
$$P(VM, PS, C) = P(PS)\, P(VM, C \mid PS) = P(PS)\, P(VM \mid PS)\, P(C \mid PS)$$
Substituting, then replacing $P(C \mid PS)$ by $P(C, PS) / P(PS)$ and cancelling:
$$P(VM \mid PS, C) = \frac{P(PS)\, P(VM \mid PS)\, P(C \mid PS)}{P(PS, C)} = \frac{P(PS)\, P(VM \mid PS)}{P(PS, C)} \cdot \frac{P(C, PS)}{P(PS)} = P(VM \mid PS)$$
3 Start? and Voltage Measure are conditionally independent given Power State? and Connected?: $P(St, VM \mid PS, C) = P(St \mid PS, C)\, P(VM \mid PS, C)$.
The derivation is the same as for hypothesis (2), with St in the role of VM, VM in the role of C, and the group (PS, C) in the role of PS. It gives $P(St \mid PS, C, VM) = P(St \mid PS, C)$.
Independencies immediate utility: memory gain
If each variable takes values in a finite alphabet of size K and no independence assumption is made, $g(x_1, \ldots, x_n)$ requires a grid of size $K^n$.
The memory size of $P(x_i \mid x_1, \ldots, x_{i-1})$ is $K \times K^{i-1} = K^i$.
If p conditional independence assumptions are made, the memory size is reduced to $K \times K^{i-1-p} = K^{i-p}$.

                 no independencies       with independencies
factorization    g(St, PS, C, VM)        P(PS) P(C) P(VM|PS) P(St|PS, C)
memory size      2^3 × 200 = 1600        2 + 2 + 200 × 2 + 2 × 4 = 412
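The two memory figures can be rechecked with a short computation. The alphabet sizes are those of the robot start example; the helper `table_size` is an illustrative name, not something defined in the slides.

```python
from math import prod

# Alphabet sizes from the robot start example.
sizes = {"St": 2, "PS": 2, "C": 2, "VM": 200}

def table_size(child, parents=()):
    """Number of entries in a table storing P(child | parents)."""
    return sizes[child] * prod(sizes[p] for p in parents)

# Without independencies: one entry per configuration of the four variables.
full_joint = prod(sizes.values())

# With independencies: P(PS) P(C) P(VM|PS) P(St|PS, C).
factored = (table_size("PS") + table_size("C")
            + table_size("VM", ["PS"]) + table_size("St", ["PS", "C"]))

print(full_joint, factored)  # 1600 412
```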
Example: the robot start problem
[Figure: Bayesian network over the nodes Power State?, Connected?, Voltage Measure and Start?, built up one factor at a time.]
• P(PS) P(C|PS)
• P(PS) P(C|PS) P(VM|PS, C)
• Full graph: P(PS) P(C|PS) P(VM|PS, C) P(St|PS, C, VM)
The independence hypotheses then strike out one conditioning variable at a time:
• PS is struck out of P(C|PS): P(PS) P(C) P(VM|PS, C) P(St|PS, C, VM)
• C is struck out of P(VM|PS, C): P(PS) P(C) P(VM|PS) P(St|PS, C, VM)
• VM is struck out of P(St|PS, C, VM): P(PS) P(C) P(VM|PS) P(St|PS, C)
Bayesian networks (BN): definition (1)
• BNs are Directed Acyclic Graphs (DAGs) that express a certain factorization of a JPDF.
• The graph has a polytree structure: it is possible to define an order o over the nodes such that, if there is a directed path from $x_i$ to $x_j$ in the graph, then $o(j) > o(i)$.
Let $x_1, \ldots, x_n$ be ordered following o.
Bayesian networks: definition (2)
• o is used in the factorization of the JPDF induced by the product rule:
$$g(x_1, \ldots, x_n) = P(x_1) \prod_{i=2}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$$
If there is no edge from $x_k$ to $x_j$, then $x_k$ can be dropped from the right-hand side of the factor $P(x_j \mid x_1, \ldots, x_{j-1})$:
$$P(x_j \mid x_1, \ldots, x_k, \ldots, x_{j-1}) := P(x_j \mid x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_{j-1})$$
Bayesian networks: definition (3)
• The JPDF is equivalently defined by the BN or by the following factorization:
$$g(x_1, \ldots, x_n) := \prod_{j=1}^{n} P(x_j \mid pa_j)$$
where $pa_j$ is the set of parents of $x_j$, i.e. the set of variables $x_i$ such that there exists an edge from $x_i$ to $x_j$ in the BN.
Interests and drawbacks of Bayesian networks
+ each term of the factorization is a probability distribution:
  1 clear semantics,
  2 normalized;
− they do not represent every possible decomposition into probability distributions, for example:
$$P(A, B, C, D, E) = P(C)\, P(D \mid C)\, P(A, B \mid C, D)\, P(E \mid C, B);$$
− they do not represent every possible factorization of the JPDF.
Factor Graphs
Hypothesis: $g(x_1, \ldots, x_n)$ factors into a product of several local functions, each having some subset of $\{x_1, \ldots, x_n\}$ as arguments:
$$g(x_1, \ldots, x_n) = \prod_{j \in J} f_j(X_j)$$
where J is a discrete index set, $X_j$ is a subset of $\{x_1, \ldots, x_n\}$ and $f_j(X_j)$ is a function that depends only on the variables in $X_j$.
If $X_j = \{v_1, \ldots, v_p\}$, $f_j(X_j) = f_j(v_1, \ldots, v_p)$.
Factor graphs represent all possible factorizations of the JPDF.
Factor Graphs
Sometimes factor graphs for a JPDF are expressed as follows:
$$g(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{j \in J} f_j(X_j)$$
where $Z = \sum_{(x_1, \cdots, x_n)} \prod_{j \in J} f_j(X_j)$, so that g is normalized.
It is possible to consider a special factor node $f_0 = \frac{1}{Z}$ with $X_0 = \emptyset$, so that $f_0$ is not linked to any variable node.
In those cases, factors are not necessarily normalized anymore and thus are not necessarily probability distributions.
Factor Graphs
Definition: a factor graph is a bipartite graph that expressesthe structure of the factorization hypothesis. A factor graph
has a variable node xi for each variable xi and factor
node fjfor each local function fj . The nodes of the
graph only connect a variable node to a factor node(bipartite property). A variable node xi is edge-connected toa factor node fj if and only if xi is an argument of fj or xi ∈ Xj .
Example: the robot start problem
[Figure: factor graph of the robot start problem.]
Variable nodes: Voltage Measure, Power State?, Start?, Connected?
Factor nodes: P(VM|PS), P(PS), P(St|PS, C), P(C)
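One possible way to hold this factor graph in memory is a map from each factor node to the variable nodes it touches, as in the sketch below. The dictionary layout and the helper `is_tree` are assumptions made for illustration; the check relies on the standard fact that a connected graph with one edge fewer than its number of nodes is a tree.

```python
# Each factor node lists the variable nodes it is connected to.
factors = {
    "P(VM|PS)": ["VM", "PS"],
    "P(PS)": ["PS"],
    "P(St|PS,C)": ["St", "PS", "C"],
    "P(C)": ["C"],
}

# Invert the map to get the factor nodes adjacent to each variable node.
variable_neighbours = {}
for factor, variables in factors.items():
    for v in variables:
        variable_neighbours.setdefault(v, []).append(factor)

def is_tree(factors, variable_neighbours):
    """A connected graph with (number of nodes - 1) edges is a tree."""
    n_nodes = len(factors) + len(variable_neighbours)
    n_edges = sum(len(vs) for vs in factors.values())
    # Depth-first search from any variable node to check connectivity.
    start = next(iter(variable_neighbours))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node in variable_neighbours:
            adjacent = variable_neighbours[node]
        else:
            adjacent = factors[node]
        for other in adjacent:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return len(seen) == n_nodes and n_edges == n_nodes - 1

print(variable_neighbours["PS"])
print(is_tree(factors, variable_neighbours))
```

Edges only ever join a key of `factors` to one of its listed variables, so the bipartite property holds by construction.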
Bayesian inference for factor graphs: the sum-product algorithm
Reminder: the goal of a probabilistic model is to calculate the conditional JPDF
$$B := P(x_{q(1)}, \ldots, x_{q(p)} \mid x_{k(1)}, \ldots, x_{k(q)})$$
NOW: exploit the factorization of the JPDF to speed up Bayesian inference.
Factorization property
Idea: reorganize sums of products into products of sums following the distributive law:
$$ab + ac = a(b + c)$$
2 MULT, 1 ADD become 1 MULT, 1 ADD
$$ab + ac + ad + ae + \cdots + az = a(b + c + \cdots + z)$$
25 MULT, 24 ADD become 1 MULT, 24 ADD
Marginal for a chain
Chain definition (1)
A path without cycle, or chain, describes a JPDF in which each variable has at most one parent and at most one child, from a BN point of view. It leads to the following factor graph:
$$g(x_1, \ldots, x_n) = P(x_1) \prod_{j=2}^{n} P(x_j \mid x_{j-1}) = f_1(x_1)\, f_2(x_2, x_1) \cdots f_n(x_n, x_{n-1})$$
Marginal for a chain
Chain definition (2)
[Figure: chain factor graph with variable nodes x1, ..., x6 and factor nodes f1, ..., f6.]
Definition: a chain without cycle is a sequence of vertices and edges in a graph:
$$c = v_0, e_1, v_1, e_2, \cdots, v_{n-1}, e_n, v_n$$
such that the edge $e_i$ joins the vertices $v_{i-1}$ and $v_i$ and each vertex appears only once.
Marginal for a chain
Sum rearrangements and message definition (1)
Example: $g(x_1, \ldots, x_6) = f_1(x_1) \prod_{j=2}^{6} f_j(x_j, x_{j-1})$.
Marginal for $x_4$:
$$P(x_4) = \sum_{x_1, x_2, x_3, x_5, x_6} g(x_1, \ldots, x_6) = \sum_{\sim\{x_4\}} g(x_1, \ldots, x_6)$$
where the notation $\sim\{x_i\}$ stands for $x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n$, i.e. all variables except $x_i$.
Marginal for a chain
x4 is a pivot for factorization
The factors on either side of $x_4$ share no variable other than $x_4$, so the sum splits into a product of two sums:
$$\sum_{\sim\{x_4\}} g(x_1, \ldots, x_6) = \underbrace{\left\{ \sum_{x_1, x_2, x_3} f_1(x_1)\, f_2(x_2, x_1)\, f_3(x_3, x_2)\, f_4(x_4, x_3) \right\}}_{\mu_\alpha(x_4)} \times \underbrace{\left\{ \sum_{x_5, x_6} f_5(x_5, x_4)\, f_6(x_6, x_5) \right\}}_{\mu_\beta(x_4)}$$
The same rearrangement is applied recursively inside each bracket:
$$\mu_\alpha(x_4) = \sum_{x_3} f_4(x_4, x_3) \underbrace{\left\{ \sum_{x_2} f_3(x_3, x_2) \underbrace{\left\{ \sum_{x_1} f_1(x_1)\, f_2(x_2, x_1) \right\}}_{\mu_\alpha(x_2)} \right\}}_{\mu_\alpha(x_3)}$$
$$\mu_\beta(x_4) = \sum_{x_5} f_5(x_5, x_4) \underbrace{\left\{ \sum_{x_6} f_6(x_6, x_5) \right\}}_{\mu_\beta(x_5)}$$
Graphical definition of messages using recursion
[Figure: chain factor graph with variable nodes x1, ..., x6 and factor nodes f1, ..., f6, annotated with the messages µα(x2), µα(x3), µα(x4) and µβ(x5), µβ(x4).]
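The nested sums can be checked numerically. The sketch below builds a six-variable chain with arbitrary non-negative factor tables (so the result is an unnormalized marginal), computes the sum over all variables except one by message recursion, and compares it with plain enumeration. The table layout, the random values, the starting message `alpha = f1` and the function names are assumptions made for illustration.

```python
from itertools import product
import random

K, N = 3, 6                 # alphabet size and chain length (illustrative values)
rng = random.Random(0)

# f[1][a] stands for f1(x1 = a); f[j][a][b] stands for fj(xj = a, xj-1 = b), j >= 2.
f = {1: [rng.random() for _ in range(K)]}
for j in range(2, N + 1):
    f[j] = [[rng.random() for _ in range(K)] for _ in range(K)]

def g(x):
    """Product of all factors for a configuration x = (x1, ..., xN)."""
    value = f[1][x[0]]
    for j in range(2, N + 1):
        value *= f[j][x[j - 1]][x[j - 2]]
    return value

def marginal_by_messages(i):
    """Sum of g over all variables except xi, using the two message recursions."""
    # mu_alpha messages, propagated from the left end of the chain up to xi.
    alpha = f[1][:]
    for j in range(2, i + 1):
        alpha = [sum(f[j][a][b] * alpha[b] for b in range(K)) for a in range(K)]
    # mu_beta messages, propagated from the right end of the chain down to xi.
    beta = [1.0] * K
    for j in range(N, i, -1):
        beta = [sum(f[j][a][b] * beta[a] for a in range(K)) for b in range(K)]
    # The marginal is the element-wise product of the two incoming messages.
    return [alpha[a] * beta[a] for a in range(K)]

def marginal_by_enumeration(i):
    """Same quantity by summing g over every configuration."""
    out = [0.0] * K
    for x in product(range(K), repeat=N):
        out[x[i - 1]] += g(x)
    return out

print(marginal_by_messages(4))
print(marginal_by_enumeration(4))
```

The enumeration visits K^N configurations, whereas each message update only needs a double loop over K values.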
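Marginal for a chain
Mathematical insight of the message operations (1)
• $\sum_{\sim\{x_4\}} g(x_1, \ldots, x_6) = \mu_\alpha(x_4) \times \mu_\beta(x_4)$
• $x_4$ can take K values: $\{a_4^1, \ldots, a_4^K\}$
• the product of two messages is a vector (element-wise product):
$$\begin{bmatrix} \mu_\alpha(x_4)_1 \times \mu_\beta(x_4)_1 \\ \vdots \\ \mu_\alpha(x_4)_K \times \mu_\beta(x_4)_K \end{bmatrix}$$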
Marginal for a chain
Mathematical insight of the message operations (2)
• $\mu_\beta(x_4) = \sum_{x_5} f_5(x_5, x_4) \sum_{x_6} f_6(x_6, x_5) = \sum_{x_5} f_5(x_5, x_4)\, \mu_\beta(x_5)$
• $x_5$ can take P values: $\{a_5^1, \ldots, a_5^P\}$
• ($f_5(x_5, x_4)$ is discretized) the next message is obtained by a matrix-vector operation:
$$\begin{bmatrix} \mu_\beta(x_4)_1 \\ \vdots \\ \mu_\beta(x_4)_K \end{bmatrix} = \begin{bmatrix} f_5(a_5^1, a_4^1) & f_5(a_5^2, a_4^1) & \ldots & f_5(a_5^P, a_4^1) \\ f_5(a_5^1, a_4^2) & & \ldots & f_5(a_5^P, a_4^2) \\ \vdots & & \ddots & \vdots \\ f_5(a_5^1, a_4^K) & f_5(a_5^2, a_4^K) & \ldots & f_5(a_5^P, a_4^K) \end{bmatrix} \begin{bmatrix} \mu_\beta(x_5)_1 \\ \mu_\beta(x_5)_2 \\ \vdots \\ \mu_\beta(x_5)_P \end{bmatrix}$$
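As a small numeric illustration of this matrix-vector step, the sketch below uses K = 2 values for x4 and P = 3 values for x5. All numbers are made up, and `mat_vec` is a hypothetical helper written in plain Python rather than a library call.

```python
# Rows are indexed by the K values of x4, columns by the P values of x5:
# F5[k][p] stands for f5(x5 = a5^p, x4 = a4^k). The numbers are made up.
F5 = [
    [0.2, 0.5, 0.3],
    [0.6, 0.1, 0.3],
]

# Incoming message mu_beta(x5), one entry per value of x5 (also made up).
mu_beta_x5 = [1.0, 2.0, 0.5]

def mat_vec(matrix, vector):
    """Plain matrix-vector product: one dot product per row."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Outgoing message mu_beta(x4), one entry per value of x4.
mu_beta_x4 = mat_vec(F5, mu_beta_x5)
print(mu_beta_x4)
```

Each output entry is the sum over x5 of f5(x5, x4) times the incoming message, which is the recursion written above for one fixed value of x4.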
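Marginal for a chain
Factor node information representation
In a discretized factor node, the matrix $f_5(x_5, x_4)$ is stored, with K rows (values of $x_4$) and P columns (values of $x_5$):
$$\begin{bmatrix} f_5(a_5^1, a_4^1) & f_5(a_5^2, a_4^1) & \ldots & f_5(a_5^P, a_4^1) \\ f_5(a_5^1, a_4^2) & & \ldots & f_5(a_5^P, a_4^2) \\ \vdots & & \ddots & \vdots \\ f_5(a_5^1, a_4^K) & f_5(a_5^2, a_4^K) & \ldots & f_5(a_5^P, a_4^K) \end{bmatrix}$$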
Marginal for a chain
Inference complexity
For discretized variables (all with K cases):
$$\begin{bmatrix} \mu_\beta(x_4)_1 \\ \mu_\beta(x_4)_2 \\ \vdots \\ \mu_\beta(x_4)_K \end{bmatrix} = \begin{bmatrix} f_5(a_5^1, a_4^1) & f_5(a_5^2, a_4^1) & \ldots & f_5(a_5^K, a_4^1) \\ f_5(a_5^1, a_4^2) & & \ldots & f_5(a_5^K, a_4^2) \\ \vdots & & \ddots & \vdots \\ f_5(a_5^1, a_4^K) & f_5(a_5^2, a_4^K) & \ldots & f_5(a_5^K, a_4^K) \end{bmatrix} \begin{bmatrix} \mu_\beta(x_5)_1 \\ \mu_\beta(x_5)_2 \\ \vdots \\ \mu_\beta(x_5)_K \end{bmatrix}$$
• for a message: $K^2$ sums and $K^2$ products,
• marginalizing over one variable among N: N − 1 messages,
inference for a chain:
$$O((N - 1) K^2) \text{ operations}$$
to be compared with $O(K^{N-1})$ in the general case.
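To get a feel for the gap between the two orders of magnitude, they can be tabulated for a few chain lengths. The two functions below simply evaluate the expressions given above; their names and the choice K = 10 are illustrative.

```python
def chain_cost(n, k):
    """Order of magnitude for a chain: (N - 1) messages of about K^2 operations."""
    return (n - 1) * k ** 2

def general_cost(n, k):
    """Order of magnitude for the unfactored sum: K^(N - 1)."""
    return k ** (n - 1)

for n in (3, 6, 10, 20):
    print(n, chain_cost(n, 10), general_cost(n, 10))
```

For N = 6 and K = 10 this gives 500 against 100000, and the gap widens by a factor of K with every additional variable.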
Marginal for a chain: Continuous factor nodes
Where a variable is continuous, sums become integrals.
−→ computational overhead
BUT only the definition of the function $f_5(x_5, x_4)$ is needed (instead of matrices).
Example:
\[
f_5(x_5, x_4) = \mathcal{N}(x_4, \sigma_4)(x_5) = \frac{1}{\sigma_4\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_5 - x_4}{\sigma_4}\right)^2}
\]
Warning: at factor nodes, storage = function code definition + parameters (here $\sigma_4$)
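A minimal sketch of such a continuous factor node, assuming a Gaussian factor as in the example. The helper names `make_gaussian_factor` and `message_to_x4`, the integration bounds and the midpoint rule are illustrative choices, not something prescribed by the slides.

```python
import math


def make_gaussian_factor(sigma4):
    """Return f5(x5, x4) = N(x4, sigma4)(x5); only the parameter sigma4 is stored."""
    def f5(x5, x4):
        z = (x5 - x4) / sigma4
        return math.exp(-0.5 * z * z) / (sigma4 * math.sqrt(2.0 * math.pi))
    return f5


def message_to_x4(f5, mu_x5, x4, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of f5(x5, x4) * mu_x5(x5) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        x5 = lo + (j + 0.5) * h
        total += f5(x5, x4) * mu_x5(x5)
    return h * total


f5 = make_gaussian_factor(sigma4=0.5)
# With a constant incoming message the integral of the density is close to 1.
value = message_to_x4(f5, lambda x5: 1.0, x4=0.0, lo=-5.0, hi=5.0)
print(value)
```

The sum over x5 of the discrete case is replaced by a numerical integral, which is where the computational overhead comes from.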
Marginal for a tree: Sum rearrangements (1)
Extension of the chain algorithm to a tree is possible.
Let the graph of $g(x_1, \dots, x_n) = \prod_j f_j(X_j)$ be a tree.
For a marginal over $x_i$:
\[
P(x_i) = \sum_{\sim\{x_i\}} \prod_j f_j(X_j)
\]
Pick a variable node, $x_i$, as the root of the tree (that is always possible with trees).
Marginal for a tree: Example: the robot start problem
For a marginal calculation over Start?:
[Figure: factor graph of the robot start problem, with variable nodes Voltage Measure, Power State?, Start?, Connected? and factor nodes P(VM | PS), P(PS), P(St | PS, C), P(C). The same graph is drawn a second time in a rearranged layout.]
Marginal for a tree: Sum rearrangements (2)
Let $st(x_i)$ be the set of all the subtrees connected to $x_i$. They are disjoint subtrees, and then:
\[
P(x_i) = \sum_{\sim\{x_i\}} \prod_j f_j(X_j) = \sum_{\sim\{x_i\}} \prod_{s \in st(x_i)} F_s(x_i, Y_s)
\]
• $Y_s$ is the set of all the variables in the subtree $s$,
• $F_s(x_i, Y_s)$ is the product of all the factors in the subtree $s$,
• in a tree there is at most one path that links one node to another, so
\[
\forall (s_1, s_2) \in st(x_i)^2,\ s_1 \neq s_2:\quad Y_{s_1} \cap Y_{s_2} = \emptyset
\]
the factors of different subtrees work on disjoint variables.
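Marginal for a tree: Example: the robot start problem
[Figure: the robot start factor graph, with variable nodes Voltage Measure, Power State?, Start?, Connected? and factor nodes P(VM | PS), P(PS), P(St | PS, C), P(C).]
\[
\sum_{\sim\{St\}} g(St, PS, C, VM) = \sum_{\sim\{St\}} \{P(VM|PS)\}\,\{P(C)\,P(St|PS,C)\}\,\{P(PS)\}
\]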
\[
\sum_{\sim\{St\}} g(St, PS, C, VM) = \sum_{\sim\{St\}} F_j(VM, PS)\,F_b(St, PS, C)\,F_g(PS)
\]
\[
\sum_{\sim\{St\}} g(St, PS, C, VM) = \sum_{\sim\{St\}} f_j(VM, PS)\,f_{b2}(C)\,f_{b1}(St, PS, C)\,f_g(PS)
\]
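The robot start marginal can be checked numerically. In the sketch below every probability table is invented for illustration (the slides give no numbers), all variables are assumed binary, and the names `P_PS`, `P_C`, `P_VM_PS`, `P_ST`, `brute_force` and `with_messages` are hypothetical.

```python
from itertools import product

# All tables are hypothetical; index 0 = "no", 1 = "yes".
P_PS = [0.25, 0.75]                        # P(PS)
P_C = [0.5, 0.5]                           # P(C)
P_VM_PS = [[0.75, 0.25], [0.125, 0.875]]   # P(VM | PS), indexed [ps][vm]
P_ST = [[[1.0, 0.0], [1.0, 0.0]],          # P(St | PS, C), indexed [ps][c][st]
        [[1.0, 0.0], [0.25, 0.75]]]


def brute_force(st):
    """Sum the full product over every (PS, C, VM) configuration."""
    return sum(P_VM_PS[ps][vm] * P_PS[ps] * P_ST[ps][c][st] * P_C[c]
               for ps, c, vm in product((0, 1), repeat=3))


def with_messages(st):
    """Push each sum next to the only factors that depend on the summed variable."""
    mu_vm = [sum(P_VM_PS[ps]) for ps in (0, 1)]       # VM summed out, a function of PS
    mu_ps = [P_PS[ps] * mu_vm[ps] for ps in (0, 1)]   # what Power State? passes on
    return sum(P_ST[ps][c][st] * mu_ps[ps] * P_C[c]
               for ps, c in product((0, 1), repeat=2))


print(brute_force(1), with_messages(1))
```

Both functions return the same value of P(St); the second one never enumerates VM jointly with the other variables.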
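Marginal for a tree: Sum rearrangements (3)
As the factors of different subtrees work on disjoint variables, it is possible to exchange sums and products locally:
\begin{align}
P(x_i) &= \sum_{\sim\{x_i\}} \prod_j f_j(X_j) \tag{1} \\
&= \sum_{\sim\{x_i\}} \prod_{s \in st(x_i)} F_s(x_i, Y_s) \tag{2} \\
&= \prod_{s \in st(x_i)} \sum_{Y_s} F_s(x_i, Y_s) \tag{3} \\
&= \prod_{s \in st(x_i)} \mu_{F_s \to x_i} \tag{4}
\end{align}
where $\mu_{F_s \to x_i} := \sum_{Y_s} F_s(x_i, Y_s)$.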
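The exchange between lines (2) and (3) can be verified on a tiny case. The sketch assumes two subtrees, each with a single binary variable, and made-up tables `F1` and `F2`; the function name `marginal_two_ways` is likewise hypothetical.

```python
from itertools import product

# Hypothetical subtree products F1(x, y1) and F2(x, y2); x, y1, y2 are binary.
F1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
F2 = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}


def marginal_two_ways(x):
    """Return the value of line (2) and the value of lines (3)-(4) for a given x."""
    # Line (2): one joint sum over (y1, y2) of the product of subtree terms.
    joint = sum(F1[x, y1] * F2[x, y2] for y1, y2 in product((0, 1), repeat=2))
    # Lines (3)-(4): product of the per-subtree sums, i.e. of the two messages.
    mu1 = sum(F1[x, y1] for y1 in (0, 1))
    mu2 = sum(F2[x, y2] for y2 in (0, 1))
    return joint, mu1 * mu2


for x in (0, 1):
    print(x, marginal_two_ways(x))
```

The two values agree because y1 only appears in F1 and y2 only in F2, which is exactly the disjointness used in the derivation.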
Marginal for a tree: Factor to node messages definition (1)
Each subtree $s$ is connected to the variable node through a unique factor node $f_s$, due to the bipartite property of factor graphs, so that the message from factor $s$ to node $i$ is defined as:
\[
\mu_{f_s \to x_i} := \mu_{F_s \to x_i}.
\]
[Figure: the robot start factor graph (variable nodes Voltage Measure, Power State?, Start?, Connected?; factor nodes P(VM | PS), P(PS), P(St | PS, C), P(C)) annotated with the messages $\mu_{f_j \to PS}$, $\mu_{f_b \to PS}$ and $\mu_{f_v \to PS}$.]
Marginal for a tree: Node to factor messages definition (1)
For each subtree, the process of pushing the sums deeper is continued:
\begin{align*}
\mu_{f_s \to x_i} &:= \sum_{Y_s} F_s(x_i, Y_s) \\
&= \sum_{Y_s} f_s(x_i, X_s) \prod_{m \in st(f_s)} F_m(Y_m)
\end{align*}
where:
• $st(f_s)$ is the set of all the subtrees connected to the factor node $f_s$;
• each subtree $m$ is connected to $f_s$ through a unique variable node $x_m$;
• $F_m(Y_m)$ is the product of all the factors in the subtree $m$;
• $Y_m$ is the set of all the variables in the subtree $m$.
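Marginal for a tree: Node to factor messages definition (2)
\begin{align*}
\mu_{f_s \to x_i} &:= \sum_{Y_s} F_s(x_i, Y_s) \\
&= \sum_{Y_s} f_s(x_i, X_s) \prod_{m \in st(f_s)} F_m(Y_m) \\
&= \sum_{X_s \setminus \{x_i\}} f_s(x_i, X_s) \prod_{m \in st(f_s)} \sum_{Y_m \setminus X_s} F_m(Y_m) \\
&= \sum_{X_s \setminus \{x_i\}} f_s(x_i, X_s) \prod_{m \in st(f_s)} \mu_{x_m \to f_s}
\end{align*}
$\mu_{x_m \to f_s} := \sum_{Y_m \setminus X_s} F_m(Y_m)$ is the message from node $x_m$ to factor $f_s$.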
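Marginal for a tree: Node to factor messages definition (3)
\[
Y_s = \bigcup_{m \in st(f_s)} Y_m \quad \text{and} \quad Y_m \setminus X_s = Y_m \setminus \{x_m\}
\]
[Figure: factor graph fragment with variable nodes $x_a$, $x_b$, $x_t$ and factor nodes $f_1$, $f_2$, $f_3$, $f_i$, $f_j$.]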
Marginal for a tree: Factor to node messages definition (2)
Finally: from the previous expansion, the factor to node message from $f_s$ to $x_i$ can be written completely recursively from $x_i$, $X_s$ and the messages sent to $f_s$ by the nodes other than $x_i$.
\begin{align*}
\mu_{f_s \to x_i} &:= \sum_{Y_s} F_s(x_i, Y_s) \\
&= \sum_{X_s \setminus \{x_i\}} f_s(x_i, X_s) \prod_{x_m \in X_s \setminus \{x_i\}} \mu_{x_m \to f_s}
\end{align*}
Marginal for a tree: Node to factor messages definition (4)
It is possible to expand the messages from node to factor as done previously.
\[
\mu_{x_m \to f_s} := \sum_{Y_m \setminus \{x_m\}} F_m(Y_m)
\]
As:
• $F_m(Y_m) = \prod_{k \in st(x_m)} F_k(Y_k)$, considering all subtrees attached to the variable node $x_m$,
• $Y_m \setminus \{x_m\} = \bigcup_k Y_k$,
• all the sets of variables of the subtrees are disjoint,
\begin{align*}
\sum_{Y_m \setminus \{x_m\}} F_m(Y_m) &= \prod_{k \in st(x_m)} \sum_{Y_k} F_k(Y_k) \\
&= \prod_{k \in st(x_m)} \mu_{f_k \to x_m}
\end{align*}
\[
\mu_{x_m \to f_s} = \prod_{k \in st(x_m)} \mu_{f_k \to x_m}
\]
The recursion is complete: node to factor messages are products of factor to node messages.
Marginal for a tree: Node to factor messages definition (5)
It is worth noticing that the factors in the subtrees attached to $x_m$ are all the factors attached to $x_m$ except the preceding one in the path: $f_s$.
In general, we denote by $ne(v)$ the set of all the neighbour nodes of the node $v$ in the factor graph.
So that:
\[
\mu_{x_m \to f_s} = \prod_{f_k \in ne(x_m) \setminus \{f_s\}} \mu_{f_k \to x_m}
\]
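Marginal for a tree: Factor graph messages formulae: update rules
FACTOR TO NODE MESSAGE FORMULA:
\[
\mu_{f_s \to x_i} = \sum_{x_m \in ne(f_s) \setminus \{x_i\}} f_s(x_i, X_s) \prod_{x_m \in ne(f_s) \setminus \{x_i\}} \mu_{x_m \to f_s}
\]
NODE TO FACTOR MESSAGE FORMULA:
\[
\mu_{x_m \to f_s} = \prod_{f_k \in ne(x_m) \setminus \{f_s\}} \mu_{f_k \to x_m}
\]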
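The factor to node rule can be sketched for a factor with any number of discrete neighbours. The representation below (a Python function over a dict of values, dict-based messages) and the example factor are assumptions made for illustration only.

```python
from itertools import product


def factor_to_node(f, domains, target, incoming):
    """Sketch of the factor to node update rule.

    f        : function taking a dict {variable name: value}
    domains  : {variable name: list of values} for every neighbour of the factor
    target   : name of x_i, the variable the message is sent to
    incoming : {variable name: {value: message value}} for all neighbours except x_i
    """
    others = [v for v in domains if v != target]
    message = {}
    for xi in domains[target]:
        total = 0.0
        # Sum over every joint value of the neighbours other than x_i.
        for values in product(*(domains[v] for v in others)):
            assignment = dict(zip(others, values))
            term = f({**assignment, target: xi})
            # Product of the node to factor messages at those values.
            for v in others:
                term *= incoming[v][assignment[v]]
            total += term
        message[xi] = total
    return message


# Hypothetical factor over three binary variables x, y, z.
domains = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
incoming = {"y": {0: 1.0, 1: 2.0}, "z": {0: 1.0, 1: 0.5}}
message = factor_to_node(lambda a: 1.0 + a["x"] + 2.0 * a["y"] * a["z"],
                         domains, "x", incoming)
print(message)
```

The cost of one message grows with the product of the domain sizes of all neighbours, which reduces to the K^2 of the chain case when the factor has two neighbours.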
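Factor graph messages definition: End messages
Definition of end messages:
• for a node to factor message (leaf variable node $x$ sending to $f$): a vector of ones
\[
\mu_{x \to f}(x) = 1, \qquad \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}
\]
• for a factor to node message (leaf factor node $f$ sending to $x$): a vector of function values
\[
\mu_{f \to x}(x) = f(x), \qquad \begin{bmatrix} f_i(a_l^1) \\ \vdots \\ f_i(a_l^K) \end{bmatrix}
\]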
Sum-Product algorithm for trees
1 start from the leaves: broadcast end messages to the respective neighbours,
2 apply the message update rules recursively,
3 in the root: multiply all the incoming messages.
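The three steps can be sketched as a pair of mutually recursive functions. This is one possible arrangement, not the slides' implementation: the recursion starts at the root and reaches the leaves, where the end messages appear naturally. The tree, the factor values and all names (`domains`, `factors`, `var_to_factor`, `factor_to_var`, `marginal`) are hypothetical.

```python
from itertools import product

# Hypothetical tree: x1 - fa - x2 - fb - x3, plus a leaf factor fc on x2.
domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}
factors = {
    "fa": (("x1", "x2"), lambda x1, x2: 1.0 + x1 + 2.0 * x2),
    "fb": (("x2", "x3"), lambda x2, x3: 1.0 + x2 * x3),
    "fc": (("x2",), lambda x2: 2.0 - x2),
}


def var_to_factor(x, f):
    """Product of the messages from every factor around x except f (all ones at a leaf)."""
    msg = {v: 1.0 for v in domains[x]}
    for g, (scope, _) in factors.items():
        if g != f and x in scope:
            incoming = factor_to_var(g, x)
            msg = {v: msg[v] * incoming[v] for v in msg}
    return msg


def factor_to_var(f, x):
    """Sum over the other variables of f of the factor times their incoming messages."""
    scope, fn = factors[f]
    others = [y for y in scope if y != x]
    incoming = {y: var_to_factor(y, f) for y in others}
    msg = {}
    for v in domains[x]:
        total = 0.0
        # For a leaf factor, others is empty and the loop runs once: msg[v] = f(v).
        for values in product(*(domains[y] for y in others)):
            args = dict(zip(others, values), **{x: v})
            term = fn(**args)
            for y, val in zip(others, values):
                term *= incoming[y][val]
            total += term
        msg[v] = total
    return msg


def marginal(root):
    """Unnormalised marginal: product of all the messages arriving at the root."""
    return var_to_factor(root, None)


print(marginal("x2"), marginal("x1"))
```

Calling `marginal` on another root recomputes every message from scratch; caching the messages would avoid that, at the price of a slightly longer sketch.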
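Exercise: factor graph message definition for a chain
Chain example:
[Figure: chain factor graph with variable nodes $x_1, \dots, x_6$ and factor nodes $f_1, \dots, f_6$. The α messages run from the $x_1$ end towards $x_4$: $\mu^\alpha_{f_1 \to x_1}$, $\mu^\alpha_{x_1 \to f_2}$, $\mu^\alpha_{f_2 \to x_2}$, $\mu^\alpha_{x_2 \to f_3}$, $\mu^\alpha_{f_3 \to x_3}$, $\mu^\alpha_{x_3 \to f_4}$, $\mu^\alpha_{f_4 \to x_4}$. The β messages run from the $x_6$ end towards $x_4$: $\mu^\beta_{x_6 \to f_6}$, $\mu^\beta_{f_6 \to x_5}$, $\mu^\beta_{x_5 \to f_5}$, $\mu^\beta_{f_5 \to x_4}$.]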
Solution: factor graph message definition for a chain
Factor to node message for a chain:
\[
\mu^\beta_{f_i \to x_{i-1}} := \sum_{x_i} f_i(x_i, x_{i-1})\, \mu^\beta_{x_i \to f_i}
\]
It is a sum of products, with no real simplification, except that there is no product over several children, since $f_i$ has a single child.
Node to factor message for a chain:
\[
\mu^\beta_{x_i \to f_i} := \mu^\beta_{f_{i+1} \to x_i}
\]
In this case there is a big simplification: the message is exactly the one sent by the unique child. Again, there is no product over several children, since $x_i$ has a single child.
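A sketch of the chain recursions on the six-variable example, assuming binary variables and invented factors `f1` and `f`. It also builds the α messages as the mirror image of the β ones and multiplies the two messages arriving at x4, then compares with a direct enumeration.

```python
from itertools import product

K = 2   # hypothetical: every variable is binary
N = 6   # chain x1 ... x6 as in the exercise


def f1(x1):
    """Hypothetical leaf factor on x1."""
    return 1.0 + x1


def f(i, xi, xprev):
    """Hypothetical pairwise factor f_i(x_i, x_{i-1}) for i = 2 ... 6."""
    return 1.0 + ((i * xi + xprev) % 3)


def beta_f_to_x(i):
    """beta message f_i -> x_{i-1}; the node message x_i -> f_i is just f_{i+1} -> x_i."""
    from_child = [1.0] * K if i == N else beta_f_to_x(i + 1)
    return [sum(f(i, xi, xp) * from_child[xi] for xi in range(K)) for xp in range(K)]


def alpha_f_to_x(i):
    """alpha message f_i -> x_i, the mirror image of the beta recursion."""
    if i == 1:
        return [f1(x) for x in range(K)]
    from_child = alpha_f_to_x(i - 1)
    return [sum(f(i, xi, xp) * from_child[xp] for xp in range(K)) for xi in range(K)]


def marginal_x4():
    """Unnormalised marginal of x4: product of the two messages arriving at x4."""
    a, b = alpha_f_to_x(4), beta_f_to_x(5)
    return [a[v] * b[v] for v in range(K)]


def brute_force_x4():
    """Same quantity by summing the full product over every configuration."""
    out = [0.0] * K
    for xs in product(range(K), repeat=N):   # xs[j] is the value of x_{j+1}
        term = f1(xs[0])
        for i in range(2, N + 1):
            term *= f(i, xs[i - 1], xs[i - 2])
        out[xs[3]] += term
    return out


print(marginal_x4(), brute_force_x4())
```

Neither recursion contains a product over children, which is the simplification noted in the solution.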
Same remarks hold for α messages.