Learning With Bayesian Networks
Markus Kalisch
ETH Zürich
Inference in BNs - Review
• Exact Inference:
– P(b|j,m) = α Σ_e Σ_a P(b) P(e) P(a|b,e) P(j|a) P(m|a), where α is a normalization constant
– Deal with sums in a clever way: Variable elimination, message passing
– Singly connected: linear in space/time
– Multiply connected: exponential in space/time (worst case)
• Approximate Inference:
– Direct sampling
– Likelihood weighting
– MCMC methods
Example query: P(Burglary | JohnCalls=TRUE, MaryCalls=TRUE)
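As a concrete illustration (not on the slides), here is a minimal Python sketch of this query by enumeration, the brute-force form that variable elimination speeds up. The CPT values are the standard ones from the Russell/Norvig burglary network; adjust them if your edition differs.

```python
# Exact inference by enumeration in the Russell/Norvig burglary network.
from itertools import product

P_B = 0.001                      # P(Burglary)
P_E = 0.002                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
P_J = {True: 0.90, False: 0.05}  # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Full joint probability via the chain-rule factorization of the BN."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(b|j,m) = alpha * sum_e sum_a P(b) P(e) P(a|b,e) P(j|a) P(m|a)
scores = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in [True, False]}
alpha = 1 / sum(scores.values())
print("P(Burglary | j, m) =", alpha * scores[True])   # roughly 0.284
```

Variable elimination obtains the same answer by pushing the sums inside the product instead of enumerating all assignments.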
Learning BNs - Overview
• Brief summary of Heckerman Tutorial
• Recent provably correct search methods:
– Greedy Equivalence Search (GES)
– PC-algorithm
• Discussion
Abstract and Introduction
Graphical modeling offers:
• Easy handling of missing data
• Easy modeling of causal relationships
• Easy combination of prior information and data
• Easy avoidance of overfitting
Bayesian Approach
• Degree of belief
• Rules of probability are a good tool to deal with beliefs
• Probability assessment: Precision & Accuracy
• Running Example: Multinomial Sampling with Dirichlet Prior
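As a hedged sketch of this running example (the prior and data below are made up for illustration): with a Dirichlet prior on the parameters of a multinomial, the posterior after observing counts N_1, ..., N_k is again Dirichlet with parameters α_i + N_i, so Bayesian updating reduces to adding counts.

```python
# Dirichlet-multinomial conjugate update: posterior = Dirichlet(alpha + counts).
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])        # uniform Dirichlet prior pseudo-counts
data = np.array([0, 2, 1, 1, 2, 2, 0])   # observed outcomes of a 3-sided "die"

counts = np.bincount(data, minlength=len(alpha))
posterior = alpha + counts               # conjugacy: closed-form posterior

# Posterior predictive: P(next = i) = (alpha_i + N_i) / sum_j (alpha_j + N_j)
predictive = posterior / posterior.sum()
print("posterior parameters:", posterior)   # [3. 3. 4.]
print("posterior predictive:", predictive)  # [0.3 0.3 0.4]
```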
Bayesian Networks (BN)
Define a BN by
• a network structure
• local probability distributions

To learn a BN, we have to
• choose the variables of the model
• choose the structure of the model
• assess local probability distributions
Inference
We have seen up to now:
• Book by Russell / Norvig:
– exact inference
– variable elimination
– approximate methods
• Talk by Prof. Loeliger:
– factor graphs / belief propagation / message passing
• Probabilistic inference in BNs is NP-hard: approximations or special-case solutions are needed
Learning Parameters (structure given)
• Prof. Loeliger: trainable parameters can be added to the factor graph and thereby be inferred
• Complete data
– reduces to the one-variable case (see the sketch after this list)
• Incomplete Data (missing at random)
– the formula for the posterior grows exponentially in the number of incomplete cases
– Gibbs-Sampling
– Gaussian approximation; obtain the MAP estimate by gradient-based optimization or the EM algorithm
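A hedged sketch of the complete-data case (toy data; symmetric Dirichlet prior): because the posterior factorizes over nodes and parent configurations, each row of a conditional probability table is a separate one-variable Dirichlet-multinomial update.

```python
# Complete-data parameter learning for one binary edge Parent -> Child.
# With no missing values, each CPT row is estimated independently.
import numpy as np
from collections import Counter

data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0), (1, 1)]  # (parent, child)
prior = 1.0                                  # symmetric Dirichlet pseudo-count

counts = Counter(data)
for parent in (0, 1):
    row = np.array([counts[(parent, 0)] + prior,
                    counts[(parent, 1)] + prior])
    print(f"posterior mean of P(Child | Parent={parent}):", row / row.sum())
```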
Learning Parameters AND structure
• Can learn structure only up to likelihood equivalence
• Averaging over all structures is infeasible: Space of DAGs and of equivalence classes grows super-exponentially in the number of nodes.
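To make "super-exponentially" concrete: the number of DAGs on n labeled nodes satisfies Robinson's recurrence a(n) = Σ_{k=1..n} (-1)^{k+1} C(n,k) 2^{k(n-k)} a(n-k) with a(0) = 1. A short sketch that tabulates it:

```python
# Count DAGs on n labeled nodes via Robinson's recurrence, to illustrate
# why averaging (or exhaustive search) over all structures is infeasible.
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def num_dags(n):
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 11):
    print(n, num_dags(n))
# n = 5 already gives 29281 DAGs; n = 10 gives about 4.2e18.
```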
Model Selection
• Don't average over all structures, but select a good one (Model Selection)
• A good scoring criterion is the log posterior probability:
log P(D, S) = log P(S) + log P(D|S)
Priors: Dirichlet for the parameters, uniform over structures
• Complete data: this can be computed exactly
• Incomplete data: a Gaussian approximation and further simplification lead to the BIC:
log P(D|S) ≈ log P(D | ML parameters, S) – d/2 * log(N)
where d is the number of parameters and N the sample size. This is what is usually used in practice.
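A hedged sketch of how one node's contribution to the BIC is computed for a discrete network (toy data; the score is decomposable, so the full network score is the sum of such per-node terms):

```python
# BIC contribution of a binary node C with binary parent P:
# log-likelihood at the ML parameters minus (d/2) * log(N).
import numpy as np
from collections import Counter

data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0), (1, 1), (0, 1)]
N = len(data)

counts = Counter(data)
loglik, d = 0.0, 0
for parent in (0, 1):
    row = np.array([counts[(parent, c)] for c in (0, 1)], dtype=float)
    theta = row / row.sum()              # ML estimate of P(C | P=parent)
    loglik += (row * np.log(theta)).sum()
    d += len(row) - 1                    # free parameters per CPT row

print("BIC contribution of node C:", loglik - d / 2 * np.log(N))
```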
Search Methods
• Learning BNs over discrete nodes (when 3 or more parents are allowed) is NP-hard (Heckerman 2004)
• There are provably (asymptotically) correct search methods:
– Search-and-score methods: Greedy Equivalence Search (GES; Chickering 2002)
– Constraint-based methods: PC-algorithm (Spirtes et al. 2000)
GES – The Idea
• Restrict the search space to equivalence classes
• Score: BIC, a "separable search criterion" (the score decomposes over nodes, so local moves are cheap to rescore) => fast
• Greedy Search for “best” equivalence class
• In theory (asymptotic): Correct equivalence class is found
GES – The Algorithm
GES is a two-stage greedy algorithm (sketched in code below):
• Initialize with the equivalence class E containing the empty DAG
• Stage 1 (forward): repeatedly replace E with the member of E+(E) that has the highest score, until no such replacement increases the score
• Stage 2 (backward): repeatedly replace E with the member of E-(E) that has the highest score, until no such replacement increases the score
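A minimal Python skeleton of the two-stage loop, assuming hypothetical helpers: score(E) for the decomposable score, and forward_neighbors / backward_neighbors for the operators E+(E) and E-(E). These are placeholders to be supplied, not a library API; a real implementation works directly on equivalence classes (CPDAGs).

```python
# Skeleton of the two-stage GES loop with user-supplied score and
# neighborhood generators.
def ges(score, forward_neighbors, backward_neighbors, empty_class):
    E = empty_class

    def greedy_stage(neighbors):
        nonlocal E
        improved = True
        while improved:
            improved = False
            candidates = [(score(F), F) for F in neighbors(E)]
            if candidates:
                best_score, best = max(candidates, key=lambda t: t[0])
                if best_score > score(E):
                    E, improved = best, True

    greedy_stage(forward_neighbors)   # Stage 1: greedily add edges
    greedy_stage(backward_neighbors)  # Stage 2: greedily remove edges
    return E
```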
PC – The idea
• Start: Complete, undirected graph
• Recursive conditional independence tests for deleting edges
• Afterwards: Add arrowheads
• In theory (asymptotic): Correct equivalence class is found
PC – The Algorithm
Form the complete, undirected graph G
l = -1
repeat
  l = l + 1
  repeat
    select an ordered pair of adjacent nodes A, B in G
    select a neighborhood N of A with size l (if possible)
    delete the edge A,B in G if A and B are cond. indep. given N
  until all ordered pairs have been tested
until all neighborhoods are of size smaller than l

Add arrowheads by applying a couple of simple rules
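A minimal Python sketch of the skeleton phase (the edge-deletion loops above); ci_test(a, b, S) is a placeholder oracle returning True when A and B are conditionally independent given S, and the orientation rules are omitted:

```python
# PC skeleton phase: start from the complete graph and remove edges found
# conditionally independent given neighborhoods of growing size l.
from itertools import combinations

def pc_skeleton(nodes, ci_test):
    adj = {v: set(nodes) - {v} for v in nodes}   # complete undirected graph
    l = 0
    # stop once no adjacent pair has a large enough neighborhood to test
    while any(len(adj[a] - {b}) >= l for a in nodes for b in adj[a]):
        for a in nodes:
            for b in list(adj[a]):
                if b not in adj[a]:              # removed earlier this round
                    continue
                # all conditioning sets of size l among A's other neighbors
                for S in combinations(sorted(adj[a] - {b}), l):
                    if ci_test(a, b, set(S)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        break
        l += 1
    return adj
```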
Example
Conditional independencies: l=0: none; l=1: (listed on the slide)
PC-algorithm: correct skeleton
[Figure: three diagrams of a four-node graph on A, B, C, D, showing the edge-deletion steps of the PC-algorithm.]
Sample Version of PC-algorithm
• Real world: the conditional independence relations are not known
• Instead: use a statistical test for conditional independence
• Theory: using a statistical test instead of the true conditional independence relations is often OK
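One common concrete choice (an assumption here, not stated on the slide) is, for multivariate Gaussian data, a test based on Fisher's z-transform of the sample partial correlation; this could serve as the ci_test oracle in the skeleton sketch above:

```python
# Gaussian conditional-independence test: Fisher z-transform of the
# partial correlation of columns a and b of `data`, given the columns in S.
import numpy as np
from scipy.stats import norm

def gauss_ci_test(data, a, b, S, alpha=0.01):
    """Return True if a and b are judged cond. independent given S."""
    idx = [a, b] + list(S)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)                           # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    z = 0.5 * np.log((1 + r) / (1 - r))                  # Fisher z-transform
    stat = np.sqrt(data.shape[0] - len(S) - 3) * abs(z)
    return stat <= norm.ppf(1 - alpha / 2)               # fail to reject
```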
Comparing PC and GES
The PC-algorithm
• finds fewer edges
• finds true edges with higher reliability
• is fast for sparse graphs (e.g. p = 100, n = 1000, E[N] = 3: T = 13 sec)
For p = 10, n = 50, E[N] = 0.9, 50 replicates (parentheses: variability over the replicates):

Method   ave[TPR]       ave[FPR]       ave[TDR]
PC       0.57 (0.06)    0.02 (0.01)    0.91 (0.05)
GES      0.85 (0.05)    0.13 (0.04)    0.71 (0.07)
Learning Causal Relationships
• Causal Markov Condition: if C is a causal graph for X, then C is also a Bayesian-network structure for the pdf of X
• Use this to infer causal relationships
Conclusion
• Using a BN: inference (NP-hard)
– exact inference, variable elimination, message passing (factor graphs)
– approximate methods
• Learning a BN:
– parameters: exact, factor graphs, Monte Carlo, Gaussian approximation
– structure: GES, PC-algorithm (NP-hard)