Recall the chain rule:
P(v1 and v2 and ~v3 and v4 and ~v5)
= P(v1)P(v2|v1)P(~v3|v1,v2)P(v4|v1,v2,~v3)P(~v5|v1,v2,~v3,v4)
(the leading factors multiply out to the partial joints P(v1,v2), P(v1,v2,~v3), P(v1,v2,~v3,v4), and P(v1,v2,~v3,v4,~v5))

Assume each Vi is a binary-valued variable (T or F). The factorization above uses one ordering of the variables. An alternative ordering:

P(v1 and v2 and ~v3 and v4 and ~v5)
= P(v4)P(v2|v4)P(~v3|v4,v2)P(v1|v4,v2,~v3)P(~v5|v4,v2,~v3,v1)

Consider an ordering:
P(v1 and v2 and ~v3 and v4 and ~v5)
= P(v1)P(v2|v1)P(~v3|v1,v2)P(v4|v1,v2,~v3)P(~v5|v1,v2,~v3,v4)
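The chain rule holds for any ordering, and this can be checked numerically against a full joint table. A minimal sketch, using a hypothetical 3-variable joint (the table values below are made up purely for illustration):

```python
# Numerical check of the chain rule on a small joint distribution.
from itertools import product

# Hypothetical joint P(v1, v2, v3), indexed by True/False assignments;
# the eight entries sum to 1.
vals = [0.02, 0.08, 0.05, 0.10, 0.15, 0.20, 0.25, 0.15]
joint = dict(zip(product([True, False], repeat=3), vals))

def marg(fixed):
    """Marginal probability of a partial assignment {index: value}."""
    return sum(p for a, p in joint.items()
               if all(a[i] == v for i, v in fixed.items()))

def cond(target, given):
    """P(target | given), both partial assignments {index: value}."""
    return marg({**given, **target}) / marg(given)

a = (True, True, False)          # the event v1, v2, ~v3
# Ordering v1, v2, v3:
p1 = marg({0: a[0]}) * cond({1: a[1]}, {0: a[0]}) * cond({2: a[2]}, {0: a[0], 1: a[1]})
# Alternative ordering v3, v2, v1:
p2 = marg({2: a[2]}) * cond({1: a[1]}, {2: a[2]}) * cond({0: a[0]}, {2: a[2], 1: a[1]})
assert abs(p1 - joint[a]) < 1e-12 and abs(p2 - joint[a]) < 1e-12
```

Both orderings reproduce the stored joint entry exactly, as the chain rule guarantees.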
Assume the following conditional independencies:

P(v2|v1) = P(v2), and likewise P(v2|~v1) = P(v2), P(~v2|v1) = P(~v2), P(~v2|~v1) = P(~v2)

P(~v3|v1,v2) = P(~v3|v1), and likewise P(~v3|v1,~v2) = P(~v3|v1), P(~v3|~v1,v2) = P(~v3|~v1), P(~v3|~v1,~v2) = P(~v3|~v1), P(v3|v1,v2) = P(v3|v1), P(v3|v1,~v2) = P(v3|v1), P(v3|~v1,v2) = P(v3|~v1), P(v3|~v1,~v2) = P(v3|~v1)

P(v4|v1,v2,~v3) = P(v4|v2,~v3) and …

P(~v5|v1,v2,~v3,v4) = P(~v5|~v3) and …
In this factorization ordering:
v2 is independent of v1
v3 is independent of v2 conditioned on v1
Then P(v1 and v2 and ~v3 and v4 and ~v5)
= P(v1)P(v2|v1)P(~v3|v1,v2)P(v4|v1,v2,~v3)P(~v5|v1,v2,~v3,v4)
= P(v1)P(v2)P(~v3|v1)P(v4|v2,~v3)P(~v5|~v3)
How many probabilities need be stored?
P(v1), P(~v1): 2 probabilities (actually only one, since P(~v1) = 1 – P(v1))
P(v2|v1) = P(v2), P(v2|~v1) = P(v2), P(~v2|v1) = P(~v2), P(~v2|~v1) = P(~v2): 2 probabilities (or 1) instead of 4 (or 2)
P(~v3|v1,v2) = P(~v3|v1), P(~v3|v1,~v2) = P(~v3|v1), P(~v3|~v1,v2) = P(~v3|~v1), P(~v3|~v1,~v2) = P(~v3|~v1), and the corresponding equalities for v3, with P(~v3|v1) = 1 – P(v3|v1): 4 probabilities (or 2) instead of 8 (or 4)
P(v4|v1,v2,~v3) = P(v4|v2,~v3) and …: 8 probabilities (or 4) instead of 16 (or 8)
P(~v5|v1,v2,~v3,v4) = P(~v5|~v3) and …: 4 probabilities (or 2) instead of 32 (or 16)
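The savings can be tallied in a few lines. A sketch (the variable names are mine):

```python
# Count the stored probabilities for 5 binary variables.
# Full joint: 2**5 entries, of which 2**5 - 1 are independent.
# Factored model P(v1)P(v2)P(v3|v1)P(v4|v2,v3)P(v5|v3): one independent
# probability per configuration of each variable's parents.
n_parents = {"v1": 0, "v2": 0, "v3": 1, "v4": 2, "v5": 1}

full_joint = 2**5 - 1                              # 31 independent numbers
factored = sum(2**k for k in n_parents.values())   # 1 + 1 + 2 + 4 + 2 = 10
```

The gap widens exponentially as the number of variables grows, which is the point of exploiting conditional independence.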
A Bayesian Network is a graphical representation of a joint probability distribution with (conditional) independence relationships made explicit
For a particular factorization ordering, construct a Bayesian network as follows (p. 440 of text):
v1 is the first variable in the ordering, so v1 is a "root". Its node stores P(v1) and P(~v1); e.g., P(v1) = 0.75, P(~v1) = 0.25 = 1 – P(v1).
v2 is the second variable in the ordering. If v2 is independent of a subset of its predecessors (possibly the empty set) in the ordering conditioned on a disjoint subset of predecessors (including possibly all its predecessors), then the latter subset is its parents; else v2 is a "root". Since P(v2|v1) = P(v2), v2 is also a root.
[network so far: roots v1, storing P(v1), and v2, storing P(v2)]
v3 is the third variable in the ordering. Since P(v3|v1,v2) = P(v3|v1), v1 becomes the parent of v3.
[network so far: v1 → v3, v2 a root; v3 stores P(v3|v1) and P(v3|~v1), with P(~v3|v1) = 1 – P(v3|v1) and P(~v3|~v1) = 1 – P(v3|~v1)]
Since P(v4|v1,v2,v3) = P(v4|v2,v3), v2 and v3 become the parents of v4.
[network so far: v1 → v3; v2 → v4; v3 → v4; v4 stores P(v4|v2,v3), P(v4|v2,~v3), P(v4|~v2,v3), P(v4|~v2,~v3), with each P(~v4|·) = 1 – P(v4|·)]
Since P(v5|v1,v2,v3,v4) = P(v5|v3), v3 becomes the parent of v5.
[final network: v1 → v3; v2 → v4; v3 → v4; v3 → v5; v5 stores P(v5|v3) and P(v5|~v3)]
Components of a Bayesian Network: a topology (graph) that qualitatively displays the conditional independencies, and probability tables at each node
Semantics of the graphical component: each variable v is independent of all of its non-descendants conditioned on its parents
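As a concrete sketch, the finished network can be stored as one table per node and the factored joint evaluated directly. Only P(x = True | parents) is kept, since P(x = False | ·) = 1 – P(x = True | ·). All numeric values except P(v1) = 0.75 are hypothetical:

```python
# Tables for the network built above (numbers are illustrative only).
p_v1 = 0.75                                      # from the slide
p_v2 = 0.4                                       # hypothetical
p_v3 = {True: 0.2, False: 0.6}                   # P(v3 | v1), hypothetical
p_v4 = {(True, True): 0.9, (True, False): 0.5,
        (False, True): 0.3, (False, False): 0.1} # P(v4 | v2, v3), hypothetical
p_v5 = {True: 0.7, False: 0.25}                  # P(v5 | v3), hypothetical

def joint(v1, v2, v3, v4, v5):
    """P(v1,...,v5) via the factorization P(v1)P(v2)P(v3|v1)P(v4|v2,v3)P(v5|v3)."""
    def t(p, x):                                 # P(x) from the stored P(True)
        return p if x else 1 - p
    return (t(p_v1, v1) * t(p_v2, v2) * t(p_v3[v1], v3)
            * t(p_v4[(v2, v3)], v4) * t(p_v5[v3], v5))

# P(v1, v2, ~v3, v4, ~v5) = P(v1)P(v2)P(~v3|v1)P(v4|v2,~v3)P(~v5|~v3)
p = joint(True, True, False, True, False)
```

With these illustrative numbers, p = 0.75 · 0.4 · 0.8 · 0.5 · 0.75 = 0.09.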
Where does knowledge of conditional independence come from?
a) From data. Consider congressional voting records. Suppose that we have data on House votes (and political party). Suppose variables are ordered Party, Immigration, StarWars, ….
Party: P(Republican) = 0.52 (226/435 Republicans, 209/435 Democrats)
To determine the relationship between Party and Immigration, we count:

Actual Counts (Immigration):
              Yes    No
Republican     17   209
Democrat      160    49

Predicted Counts (if Immigration and Party were independent):
              Yes    No
Republican     92   134
Democrat       85   124

Each predicted count is N·P(Party)·P(Immigration); e.g., P(Rep)·P(Yes)·435 = 0.52 · (17+160)/435 · 435 ≈ 92.
Very different distributions – conclude dependent
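The predicted-count computation above can be sketched as:

```python
# Independence check for Party and Immigration from the counts above.
# Under independence, each expected count is N * P(Party) * P(Immigration).
actual = {("Rep", "Yes"): 17, ("Rep", "No"): 209,
          ("Dem", "Yes"): 160, ("Dem", "No"): 49}
n = sum(actual.values())                                  # 435 House members

p_party = {"Rep": 226 / n, "Dem": 209 / n}                # row marginals
p_imm = {"Yes": (17 + 160) / n, "No": (209 + 49) / n}     # column marginals

expected = {(r, c): n * p_party[r] * p_imm[c] for (r, c) in actual}
# The expected table (about 92, 134, 85, 124) is far from the actual one
# (17, 209, 160, 49), so we conclude Party and Immigration are dependent.
```

A formal version of this comparison is the chi-square test, but eyeballing the gap suffices here.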
Since Party and Immigration are dependent, Immigration becomes a child of Party, with table:
Immigration: P(Yes|Rep) = 0.075 (= 17/226), P(Yes|Dem) = 0.765 (= 160/209)
Consider StarWars.
Is StarWars independent of Party and Immigration? (i.e., is P(StarWars|Party, Immigration) approximately equal to P(StarWars) for all combinations of variable values?) If yes, then stop and make StarWars a "root"; else continue.
Is StarWars independent of Immigration conditioned on Party? If yes, then stop and make StarWars a child of Party; else continue.
Is StarWars independent of Party conditioned on Immigration? If yes, then stop and make StarWars a child of Immigration; else continue.
Otherwise, make StarWars a child of both Party and Immigration.
The StarWars votes break down as:

Actual Counts (StarWars):
              Yes    No
Republican    219     7
Democrat       24   185
Consider StarWars: is StarWars independent of Party and Immigration?

Actual Counts (StarWars, within Party × Immigration):
                       SW=Yes   SW=No
Republican, Imm=Yes        14       3
Republican, Imm=No        205       4
Democrat,   Imm=Yes         8     152
Democrat,   Imm=No         16      33

Predicted Counts (if StarWars were independent of Party and Immigration), each computed as, e.g., P(Rep & Imm=Y)·P(SW=Y)·435:
                       SW=Yes   SW=No
Republican, Imm=Yes       9.5     7.5
Republican, Imm=No        117      92
Democrat,   Imm=Yes        89      71
Democrat,   Imm=No         27      22

The distributions are very different – not independent.
Further tests might indicate the network Party → Immigration, Party → StarWars;
i.e., Immigration and StarWars are independent conditioned on Party.
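One such further test checks whether P(StarWars | Party, Immigration) is approximately P(StarWars | Party). A sketch using the counts above (note that the Republican/Immigration=Yes cell holds only 17 votes, so a formal test would weight cells by sample size):

```python
# Test whether StarWars is independent of Immigration conditioned on Party,
# using the three-way counts above: (SW=Yes, SW=No) per (Party, Immigration).
counts = {("Rep", "Yes"): (14, 3),  ("Rep", "No"): (205, 4),
          ("Dem", "Yes"): (8, 152), ("Dem", "No"): (16, 33)}

def p_sw_yes(party, imm):
    """P(SW=Yes | Party, Immigration) estimated from the counts."""
    yes, no = counts[(party, imm)]
    return yes / (yes + no)

def p_sw_yes_given_party(party):
    """P(SW=Yes | Party), summing the Immigration values out."""
    yes = sum(counts[(party, imm)][0] for imm in ("Yes", "No"))
    no = sum(counts[(party, imm)][1] for imm in ("Yes", "No"))
    return yes / (yes + no)

# If StarWars were independent of Immigration given Party, each conditional
# P(SW=Yes | Party, Imm) would be close to P(SW=Yes | Party).
for party in ("Rep", "Dem"):
    print(party, round(p_sw_yes_given_party(party), 3),
          round(p_sw_yes(party, "Yes"), 3), round(p_sw_yes(party, "No"), 3))
```

Whether the conditionals are "close enough" is a judgment call on data this small.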
Where does knowledge of conditional independence come from?
b) “First principles”
For example, suppose that the grounds keeper sets sprinkler timers to a fixed schedule that depends on the season (Summer, Winter, Spring, Fall), and suppose that the probability that it rains or not is dependent on season. We might write:

[network: Season → Rains, Season → Sprinkler]

This model might differ from one in which a homeowner manually turns on a sprinkler.

[network for the homeowner model, with a different topology as in the original figure]
Consider the following network:

[graph: H → E, H → L, B → L, E → F, L → D]

H: Hardware problems (h) or not (~h)
B: Bugs in code (b) or not (~b)
E: Editor running (e) or not (~e)
L: Lisp interpreter running (l) or not (~l)
F: Cursor flashing (f) or not (~f)
D: Prompt displayed (d) or not (~d)

Tables stored at the nodes: P(h); P(b); P(e|h), P(e|~h); P(l|h,b), P(l|~h,b), P(l|h,~b), P(l|~h,~b); P(d|l), P(d|~l); P(f|e), P(f|~e).
P(h,~b,e,~l,f,d) ?
P(h,~b,e,~l,f,d)
= P(h)P(~b|h)P(e|h,~b)P(~l|h,~b,e)P(f|h,~b,e,~l)P(d|h,~b,e,~l,f)   [always true, by the chain rule]
= P(h)P(~b)P(e|h)P(~l|h,~b)P(f|e)P(d|~l)   [true given the BN above]
To evaluate, use P(~b) = 1 – P(b) and P(~l|h,~b) = 1 – P(l|h,~b).
Note that the H,B,E,L,F,D order is consistent with the BN, but so are others such as B,H,L,D,E,F (a node comes after all its ancestors).
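With hypothetical numeric tables (every probability below is made up for illustration), the factored joint can be evaluated directly:

```python
# Factored joint for the H, B, E, L, F, D network above.
# Only P(x=True | parents) is stored; P(x=False | .) = 1 - P(x=True | .).
P = {
    "h": 0.1, "b": 0.3,                            # P(h), P(b) -- hypothetical
    "e": {True: 0.9, False: 0.5},                  # P(e | h)
    "l": {(True, True): 0.1, (True, False): 0.3,
          (False, True): 0.4, (False, False): 0.95},  # P(l | h, b)
    "d": {True: 0.9, False: 0.05},                 # P(d | l)
    "f": {True: 0.95, False: 0.2},                 # P(f | e)
}

def t(p, x):                                       # P(x) from the stored P(True)
    return p if x else 1 - p

def joint(h, b, e, l, f, d):
    """P(h,b,e,l,f,d) = P(h)P(b)P(e|h)P(l|h,b)P(f|e)P(d|l)."""
    return (t(P["h"], h) * t(P["b"], b) * t(P["e"][h], e)
            * t(P["l"][(h, b)], l) * t(P["f"][e], f) * t(P["d"][l], d))

# P(h, ~b, e, ~l, f, d) = P(h)P(~b)P(e|h)P(~l|h,~b)P(f|e)P(d|~l)
p = joint(True, False, True, False, True, True)
```

With these numbers, p = 0.1 · 0.7 · 0.9 · 0.7 · 0.95 · 0.05 ≈ 0.0021.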
Consider the following:
P(~l,~b) ?
P(~l,~b) = P(~l,~b,h) + P(~l,~b,~h)
= P(~l|~b,h)P(~b)P(h) + P(~l|~b,~h)P(~b)P(~h)   [since B and H are independent in this network, P(~b,h) = P(~b)P(h)]
= (1 – P(l|~b,h))(1 – P(b))P(h) + (1 – P(l|~b,~h))(1 – P(b))(1 – P(h))
P(f, d)?
P(f, d) = P(f, d, e, h, l, b) + … + P(f, d, ~e, ~h, ~l, ~b) = …
(a sum over all 16 assignments of E, H, L, B, with each term factored via the network)
P(~l|h,~b) ?
P(~l|h,~b) = 1 – P(l|h,~b) (essentially a table lookup)
Causal inference
P(~l|h) ?
P(~l|h) = P(~l,h)/P(h)
= [P(~l,h,b) + P(~l,h,~b)] / P(h)
= [P(~l|h,b)P(h,b) + P(~l|h,~b)P(h,~b)] / P(h)
= [P(~l|h,b)P(h)P(b) + P(~l|h,~b)P(h)P(~b)] / P(h)
= P(~l|h,b)P(b) + P(~l|h,~b)P(~b)
= (1 – P(l|h,b))P(b) + (1 – P(l|h,~b))(1 – P(b))
P(~l) ?
P(~l) = P(~l,h,b) + P(~l,h,~b) + P(~l,~h,b) + P(~l,~h,~b)
= P(~l|h,b)P(h)P(b) + P(~l|h,~b)P(h)P(~b) + P(~l|~h,b)P(~h)P(b) + P(~l|~h,~b)P(~h)P(~b)
= (1 – P(l|h,b))P(h)P(b) + (1 – P(l|h,~b))P(h)(1 – P(b)) + …
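The marginalization can be checked by enumeration. A sketch with hypothetical table values:

```python
# P(~l) by summing the factored joint over H and B, compared against the
# closed form derived above. Table values are hypothetical.
p_h, p_b = 0.1, 0.3                                   # P(h), P(b)
p_l = {(True, True): 0.1, (True, False): 0.3,
       (False, True): 0.4, (False, False): 0.95}      # P(l | h, b)

def t(p, x):                                          # P(x) from stored P(True)
    return p if x else 1 - p

# Enumeration: P(~l) = sum over h, b of P(~l | h, b) P(h) P(b)
p_not_l = sum((1 - p_l[(h, b)]) * t(p_h, h) * t(p_b, b)
              for h in (True, False) for b in (True, False))

# Closed form from the derivation above:
closed = ((1 - p_l[(True, True)]) * p_h * p_b
          + (1 - p_l[(True, False)]) * p_h * (1 - p_b)
          + (1 - p_l[(False, True)]) * (1 - p_h) * p_b
          + (1 - p_l[(False, False)]) * (1 - p_h) * (1 - p_b))
assert abs(p_not_l - closed) < 1e-12
```

The same enumerate-and-sum pattern answers the remaining queries below, at exponential cost in the number of summed-out variables.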
P(d|h) ?
P(d|h) = P(d,h)/P(h)
= [P(d,l,b,h) + P(d,l,~b,h) + P(d,~l,b,h) + P(d,~l,~b,h)] / P(h)
= P(d|l)P(l|h,b)P(b) + P(d|l)P(l|h,~b)P(~b) + P(d|~l)P(~l|h,b)P(b) + P(d|~l)P(~l|h,~b)P(~b)
= …
(each joint term factors as, e.g., P(d,l,b,h) = P(d|l)P(l|h,b)P(b)P(h), so the P(h) cancels)
P(h|l) ?
P(h|l) = P(h,l)/P(l)
= [P(l,b,h) + P(l,~b,h)] / [P(l,b,h) + P(l,~b,h) + P(l,b,~h) + P(l,~b,~h)]
= … (simplify, factoring each term via the network, e.g., P(l,b,h) = P(l|h,b)P(b)P(h))
Consider the following:
P(f | l) = ?
P(f | l, h) = ?
P(f | b) = ?
P(f | b, l) = ?
Consider the following belief network:

[graph: nodes A and B, then C, D, and E, then F and G; edges as in the original figure]

Each variable is a binary-valued variable (e.g., A has values a and ~a, B has values b and ~b). Probability tables are stored at each node. For each of the following, involve the minimal number of variables necessary.

Express P(a, d, ~b, e, c) only in terms of the probabilities found in the probability tables of the network.
Express P(f | b) only in terms of probabilities found in the probability tables of the network.
Express P(a | ~e) only in terms of probabilities found in the probability tables of the network.