CS 188: Artificial Intelligence, Spring 2007
Lecture 14: Bayes Nets III
3/1/2007
Srini Narayanan – ICSI and UC Berkeley
Announcements
Office Hours this week will be on Friday (11-1).
Assignment 2 grading.
Midterm 3/13.
Review 3/8 (next Thursday).
Midterm review materials up over the weekend.
Extended office hours next week (Thursday 11-1, Friday 2:30-4:30).
Representing Knowledge
Inference
Inference: calculating some statistic from a joint probability distribution

Examples:
Posterior probability: P(Q | E_1 = e_1, ..., E_k = e_k)
Most likely explanation: argmax_q P(Q = q | E_1 = e_1, ..., E_k = e_k)

[Example Bayes net with nodes R, T, B, D, L, T']
Inference in Graphical Models
Queries:
Value of information: What evidence should I seek next?
Sensitivity analysis: What probability values are most critical?
Explanation: e.g., Why do I need a new starter motor?
Prediction: e.g., What would happen if my fuel pump stops working?
Inference Techniques
Exact inference: inference by enumeration, variable elimination
Approximate inference / Monte Carlo: prior sampling, rejection sampling, likelihood weighting, Markov chain Monte Carlo (MCMC)
Reminder: Alarm Network
Inference by Enumeration
Given unlimited time, inference in BNs is easy. Recipe:
State the marginal probabilities you need
Figure out ALL the atomic probabilities you need
Calculate and combine them

Example:
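To make the recipe concrete, here is a minimal inference-by-enumeration sketch (not from the slides) for the standard burglary/earthquake alarm network, assuming the usual textbook CPT values: it builds every atomic joint entry consistent with the query and evidence and sums them.

```python
import itertools

# CPTs for the standard alarm network (values assumed from the usual textbook example).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=True | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=True | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=True | A)

def joint(b, e, a, j, m):
    """Atomic joint entry P(b, e, a, j, m) via the BN factorization."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def enumerate_posterior(j, m):
    """P(B | j, m): sum all consistent atomic entries, then normalize."""
    unnorm = {}
    for b in (True, False):
        unnorm[b] = sum(joint(b, e, a, j, m)
                        for e, a in itertools.product((True, False), repeat=2))
    z = sum(unnorm.values())
    return {b: p / z for b, p in unnorm.items()}

print(enumerate_posterior(j=True, m=True))   # roughly {True: 0.284, False: 0.716}
```

The inner sum ranges over every assignment of the hidden variables, which is why plain enumeration is exponential in the number of hidden nodes.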
Example
Where did we use the BN structure?
We didn’t!
Example
In this simple method, we only need the BN to synthesize the joint entries
Nesting Sums
Atomic inference is extremely slow!
Slightly clever way to save work: move the sums as far right as possible.

Example:
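As a reconstruction of the kind of rearrangement the slide has in mind, assuming the alarm-network query P(B | j, m) from the enumeration sketch above, pushing the sums to the right pulls every factor that does not mention the inner summation variable outside:

```latex
P(B \mid j, m) \propto \sum_{e} \sum_{a} P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)
             = P(B) \sum_{e} P(e) \sum_{a} P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)
```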
Evaluation Tree
View the nested sums as a computation tree:
Still repeated work: calculate P(m | a) P(j | a) twice, etc.
Variable Elimination: Idea
Lots of redundant work in the computation tree
We can save time if we carry out the summation right to left and cache all intermediate results into objects called factors
This is the basic idea behind variable elimination
Basic Objects

Track objects called factors. Initial factors are local CPTs. During elimination, create new factors.

Anatomy of a factor:
Variables introduced
Variables summed out
Factor argument variables
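One possible concrete representation (a hypothetical sketch, not the course's code): store a factor as its tuple of argument variables plus a table mapping each assignment of those variables to a number.

```python
from itertools import product

# A factor over argument variables ('A', 'B'): a table mapping each assignment
# of those variables to a number. Here the entries happen to form a CPT P(B | A).
f_ab = {
    "vars": ("A", "B"),
    "table": {
        (True, True): 0.9,  (True, False): 0.1,
        (False, True): 0.2, (False, False): 0.8,
    },
}

# Every assignment of the argument variables gets exactly one entry.
assert set(f_ab["table"]) == set(product((True, False), repeat=len(f_ab["vars"])))
```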
Basic Operations
First basic operation: join factors.
Combining two factors: just like a database join. Build a factor over the union of the domains.

Example:
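A minimal join sketch, reusing the hypothetical dict-based factor layout above and assuming boolean variables: the result's argument variables are the union of the inputs', and each entry is the product of the matching rows.

```python
from itertools import product

def join(f, g):
    """Pointwise product of two factors over the union of their argument variables."""
    out_vars = tuple(dict.fromkeys(f["vars"] + g["vars"]))  # union, order-preserving
    table = {}
    for assignment in product((True, False), repeat=len(out_vars)):
        row = dict(zip(out_vars, assignment))
        f_key = tuple(row[v] for v in f["vars"])
        g_key = tuple(row[v] for v in g["vars"])
        table[assignment] = f["table"][f_key] * g["table"][g_key]
    return {"vars": out_vars, "table": table}

# Example: join P(A) with P(B | A) to get a factor over (A, B) equal to P(A, B).
f_a = {"vars": ("A",), "table": {(True,): 0.3, (False,): 0.7}}
f_ab = {"vars": ("A", "B"),
        "table": {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.2, (False, False): 0.8}}
print(join(f_a, f_ab)["table"][(True, True)])   # P(A=T, B=T) = 0.3 * 0.9
```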
Basic Operations
Second basic operation: marginalization.
Take a factor and sum out a variable. Shrinks a factor to a smaller one. A projection operation.

Example:
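And a matching sketch of summing a variable out of a factor, under the same hypothetical representation:

```python
def sum_out(f, var):
    """Marginalize `var` out of factor `f`, producing a factor over the remaining variables."""
    idx = f["vars"].index(var)
    out_vars = f["vars"][:idx] + f["vars"][idx + 1:]
    table = {}
    for assignment, value in f["table"].items():
        reduced = assignment[:idx] + assignment[idx + 1:]
        table[reduced] = table.get(reduced, 0.0) + value
    return {"vars": out_vars, "table": table}

# Example: summing A out of P(A, B) leaves the marginal P(B).
f_joint = {"vars": ("A", "B"),
           "table": {(True, True): 0.27, (True, False): 0.03,
                     (False, True): 0.14, (False, False): 0.56}}
print(sum_out(f_joint, "A")["table"])   # about {(True,): 0.41, (False,): 0.59}
```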
Example
General Variable Elimination
Query: P(Q | E_1 = e_1, ..., E_k = e_k)

Start with initial factors: local CPTs (but instantiated by evidence).

While there are still hidden variables (not Q or evidence):
Pick a hidden variable H
Join all factors mentioning H
Project out H

Join all remaining factors and normalize.
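Putting the operations together, here is a small self-contained variable-elimination sketch on a made-up three-node chain A -> B -> C (all CPT numbers invented for illustration); the join and sum_out helpers repeat the earlier sketches so the block runs on its own.

```python
from itertools import product

def join(f, g):
    out_vars = tuple(dict.fromkeys(f["vars"] + g["vars"]))
    table = {}
    for assignment in product((True, False), repeat=len(out_vars)):
        row = dict(zip(out_vars, assignment))
        table[assignment] = (f["table"][tuple(row[v] for v in f["vars"])] *
                             g["table"][tuple(row[v] for v in g["vars"])])
    return {"vars": out_vars, "table": table}

def sum_out(f, var):
    idx = f["vars"].index(var)
    table = {}
    for assignment, value in f["table"].items():
        reduced = assignment[:idx] + assignment[idx + 1:]
        table[reduced] = table.get(reduced, 0.0) + value
    return {"vars": f["vars"][:idx] + f["vars"][idx + 1:], "table": table}

def eliminate(factors, hidden):
    """VE loop: join-and-project each hidden variable, then join what remains and normalize."""
    for h in hidden:
        mentioning = [f for f in factors if h in f["vars"]]
        rest = [f for f in factors if h not in f["vars"]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f)
        factors = rest + [sum_out(joined, h)]
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f)
    z = sum(result["table"].values())
    return {k: v / z for k, v in result["table"].items()}

# Hypothetical chain A -> B -> C with invented CPTs; evidence C = True is folded in
# by keeping only the C = True rows of P(C | B), giving a reduced factor over B alone.
f_a = {"vars": ("A",), "table": {(True,): 0.6, (False,): 0.4}}
f_ba = {"vars": ("A", "B"), "table": {(True, True): 0.7, (True, False): 0.3,
                                      (False, True): 0.1, (False, False): 0.9}}
f_c_true = {"vars": ("B",), "table": {(True,): 0.8, (False,): 0.2}}  # P(C=True | B)

# P(A | C=True): eliminate the hidden variable B, then normalize over A.
print(eliminate([f_a, f_ba, f_c_true], hidden=["B"]))
```

Each pass joins every factor that mentions the chosen hidden variable and immediately projects it out, so no intermediate factor has to grow to the full joint.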
Variable Elimination
What you need to know:
VE caches intermediate computations
Polynomial time for tree-structured graphs!
Saves time by marginalizing variables as soon as possible, rather than at the end

Approximations:
Exact inference is slow, especially when you have a lot of hidden nodes
Approximate methods give you a (close) answer, faster
Sampling
Basic idea:
Draw N samples from a sampling distribution S
Compute an approximate posterior probability
Show this converges to the true probability P

Outline:
Sampling from an empty network
Rejection sampling: reject samples disagreeing with evidence
Likelihood weighting: use evidence to weight samples
Prior Sampling
[Cloudy / Sprinkler / Rain / WetGrass network with its CPTs]
Prior Sampling
This process generates samples with probability

S_PS(x_1, ..., x_n) = ∏_i P(x_i | Parents(X_i))

...i.e. the BN's joint probability P(x_1, ..., x_n).

Let the number of samples of an event be N_PS(x_1, ..., x_n). Then

lim (N → ∞) N_PS(x_1, ..., x_n) / N = S_PS(x_1, ..., x_n) = P(x_1, ..., x_n)

I.e., the sampling procedure is consistent.
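A minimal prior-sampling sketch for the Cloudy / Sprinkler / Rain / WetGrass network; the CPT numbers are the usual textbook values, assumed here rather than read off the slides. Each variable is sampled in topological order from its CPT given its already-sampled parents, and counting samples gives consistent estimates.

```python
import random

# CPTs (standard textbook numbers, assumed). Each entry is the probability the
# variable is True given its parents' values.
P_C = 0.5
P_S = {True: 0.10, False: 0.50}                    # P(S=True | C)
P_R = {True: 0.80, False: 0.20}                    # P(R=True | C)
P_W = {(True, True): 0.99, (True, False): 0.90,    # P(W=True | S, R)
       (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    """Sample (c, s, r, w) in topological order, each from its CPT given its parents."""
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

# Estimate P(W) by counting: the fraction of samples with w=True converges to P(w).
samples = [prior_sample() for _ in range(100_000)]
p_w = sum(1 for _, _, _, w in samples if w) / len(samples)
print(round(p_w, 3))   # roughly 0.65 with these CPTs
```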
Example
We'll get a bunch of samples from the BN:
c, s, r, w
c, s, r, w
c, s, r, w
c, s, r, w
c, s, r, w

If we want to know P(W):
We have counts <w: 4, ¬w: 1>
Normalize to get P(W) = <w: 0.8, ¬w: 0.2>
This will get closer to the true distribution with more samples
Can estimate anything else, too
What about P(C | r)? P(C | r, w)?
Rejection Sampling
Let's say we want P(C): no point keeping all samples around, just tally counts of C outcomes.

Let's say we want P(C | s): same thing, tally C outcomes, but ignore (reject) samples which don't have S = s.

This is rejection sampling. It is also consistent (correct in the limit).
[Five example samples from the Cloudy / Sprinkler / Rain / WetGrass network]
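A rejection-sampling sketch for estimating P(C | s) in the same network (CPTs assumed as before, repeated so the snippet is self-contained): draw prior samples and tally C only over samples that agree with the evidence S = true.

```python
import random

P_C, P_S = 0.5, {True: 0.10, False: 0.50}
P_R = {True: 0.80, False: 0.20}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def prior_sample():
    c = random.random() < P_C
    s = random.random() < P_S[c]
    r = random.random() < P_R[c]
    w = random.random() < P_W[(s, r)]
    return c, s, r, w

def rejection_sample_p_c_given_s(n):
    """Estimate P(C=True | S=True): keep only samples that agree with the evidence."""
    kept = [c for c, s, r, w in (prior_sample() for _ in range(n)) if s]
    return sum(kept) / len(kept)   # fraction of kept samples with C=True

print(round(rejection_sample_p_c_given_s(100_000), 3))   # roughly 0.167 with these CPTs
```

With P(s) = 0.3 under these CPTs, roughly 70% of the generated samples are thrown away, which is exactly the weakness likelihood weighting addresses next.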
Likelihood Weighting
Problem with rejection sampling:
If evidence is unlikely, you reject a lot of samples
You don't exploit your evidence as you sample
Consider P(B | a)

Idea: fix evidence variables and sample the rest

Problem: sample distribution not consistent!
Solution: weight by probability of evidence given parents
Likelihood Sampling
Likelihood Weighting
Sampling distribution if z is sampled and e is fixed evidence:

S_WS(z, e) = ∏_i P(z_i | Parents(Z_i))

Now, samples have weights:

w(z, e) = ∏_i P(e_i | Parents(E_i))

Together, the weighted sampling distribution is consistent:

S_WS(z, e) · w(z, e) = ∏_i P(z_i | Parents(Z_i)) · ∏_i P(e_i | Parents(E_i)) = P(z, e)
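A likelihood-weighting sketch for the same hypothetical Cloudy / Sprinkler / Rain / WetGrass setup (CPTs assumed as before): the evidence variables S and W are clamped to their observed values rather than sampled, and each sample's weight multiplies in P(evidence value | parents), matching the w(z, e) product above.

```python
import random

P_C, P_S = 0.5, {True: 0.10, False: 0.50}
P_R = {True: 0.80, False: 0.20}
P_W = {(True, True): 0.99, (True, False): 0.90, (False, True): 0.90, (False, False): 0.00}

def weighted_sample(s_evidence=True, w_evidence=True):
    """Clamp S and W to the evidence; sample C and R; weight by P(evidence | parents)."""
    weight = 1.0
    c = random.random() < P_C                       # sampled (not evidence)
    weight *= P_S[c] if s_evidence else 1 - P_S[c]  # S is evidence: multiply in P(s | c)
    r = random.random() < P_R[c]                    # sampled (not evidence)
    p_w = P_W[(s_evidence, r)]
    weight *= p_w if w_evidence else 1 - p_w        # W is evidence: multiply in P(w | s, r)
    return c, weight

def lw_estimate_p_c(n=100_000):
    """P(C=True | S=True, W=True) as a ratio of weighted counts."""
    totals = {True: 0.0, False: 0.0}
    for _ in range(n):
        c, weight = weighted_sample()
        totals[c] += weight
    return totals[True] / (totals[True] + totals[False])

print(round(lw_estimate_p_c(), 3))   # roughly 0.17 with these CPTs
```

Every sample contributes, so none of the work is wasted; the cost is that weights can become very small when the evidence is unlikely.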
Likelihood Weighting
Note that likelihood weighting doesn’t solve all our problems
Rare evidence is taken into account for downstream variables, but not upstream ones
A better solution is Markov chain Monte Carlo (MCMC), which is more advanced
We’ll return to sampling for robot localization and tracking in dynamic BNs
Summary
Exact inference in graphical models is tractable only for polytrees.
Variable elimination caches intermediate results to make exact inference more efficient for a given query.
Approximate inference techniques use a variety of sampling methods and can scale to large models.
NEXT: Adding dynamics and change to graphical models
Bayes Net for insurance