L2 Probabilistic Reasoning
Post on 14-Apr-2018
-
7/29/2019 L2 Probabilistic Reasoning
1/39
-
The process of probabilistic inference
1. Define a model of the problem
2. Derive posterior distributions and estimators
3. Estimate parameters from data
4. Evaluate model accuracy
-
Axioms of probability
Axioms (Kolmogorov):
- 0 ≤ P(A) ≤ 1
- P(true) = 1
- P(false) = 0
- P(A or B) = P(A) + P(B) − P(A and B)

Corollaries:
- The probabilities of a single random variable's values must sum to 1: Σ_{i=1}^n P(D = d_i) = 1
- The joint probability of a set of variables must also sum to 1.
- If A and B are mutually exclusive: P(A or B) = P(A) + P(B)
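The axioms and corollaries can be sanity-checked numerically. A minimal Python sketch, using a made-up distribution for a loaded six-sided die (the numbers are illustrative, not from the slides):

```python
# Hypothetical loaded die: each P(D = d_i) lies in [0, 1], values sum to 1.
die = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.2, 5: 0.2, 6: 0.2}

assert all(0.0 <= p <= 1.0 for p in die.values())
assert abs(sum(die.values()) - 1.0) < 1e-12

# For mutually exclusive events A ("even") and B ("rolled a 1"):
# P(A or B) = P(A) + P(B), since no outcome is both even and equal to 1.
p_even = sum(p for d, p in die.items() if d % 2 == 0)
p_one = die[1]
p_even_or_one = sum(p for d, p in die.items() if d % 2 == 0 or d == 1)
assert abs(p_even_or_one - (p_even + p_one)) < 1e-12
print(round(p_even_or_one, 10))  # 0.6
```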
-
Rules of probability
conditional probability:
P(A|B) = P(A and B) / P(B),  P(B) > 0

product rule:
P(B|A) P(A) = P(A and B) = P(A|B) P(B)

corollary (Bayes' rule):
P(B|A) = P(A|B) P(B) / P(A)
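These rules are easy to verify numerically. A small sketch with an invented joint distribution over two binary events A and B (the 0.2/0.3/0.1/0.4 numbers are assumptions for illustration):

```python
# Made-up joint distribution over (A, B); checks the product rule and Bayes' rule.
joint = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.1, (False, False): 0.4}

p_a = sum(p for (a, b), p in joint.items() if a)   # marginal P(A)
p_b = sum(p for (a, b), p in joint.items() if b)   # marginal P(B)
p_ab = joint[(True, True)]                         # P(A and B)

p_a_given_b = p_ab / p_b                           # conditional probability
p_b_given_a = p_ab / p_a

# Bayes' rule: P(B|A) = P(A|B) P(B) / P(A)
assert abs(p_b_given_a - p_a_given_b * p_b / p_a) < 1e-12
```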
-
Basic concepts
Making rational decisions when faced with uncertainty:
- Probability: the precise representation of knowledge and uncertainty
- Probability theory: how to optimally update your knowledge based on new information
- Decision theory (probability theory + utility theory): how to use this information to achieve maximum expected utility

Basic probability theory:
- random variables
- probability distributions (discrete) and probability densities (continuous)
- rules of probability
- expectation and the computation of 1st and 2nd moments
- joint and multivariate probability distributions and densities
- covariance and principal components
-
The Joint Distribution
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.

Example: Boolean variables A, B, C

A B C | Prob
0 0 0 | 0.30
0 0 1 | 0.05
0 1 0 | 0.10
0 1 1 | 0.05
1 0 0 | 0.05
1 0 1 | 0.10
1 1 0 | 0.25
1 1 1 | 0.10

(The slide also shows the same numbers arranged as a Venn diagram over A, B, C.)

All the nice-looking slides like this one are from Andrew Moore, CMU.
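The recipe can be transcribed directly. A sketch that stores the A, B, C table as a Python dict keyed by value triples and checks the axioms:

```python
# The A, B, C joint table from the slide, keyed by (A, B, C) value triples.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

assert len(joint) == 2 ** 3            # 2^M rows for M = 3 Boolean variables
assert abs(sum(joint.values()) - 1.0) < 1e-12   # axiom: rows sum to 1
```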
-
Using the Joint
Once you have the JD you can ask for the probability of any logical expression involving your attributes:

P(E) = Σ_{rows matching E} P(row)
P(Poor Male) = 0.4654
P(Poor) = 0.7604
-
Inference with the Joint

P(E1|E2) = P(E1 and E2) / P(E2) = Σ_{rows matching E1 and E2} P(row) / Σ_{rows matching E2} P(row)
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
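The same two-sum recipe runs directly in code. The Poor/Male census table is not reproduced in this transcript, so the sketch below reuses the A, B, C joint table from the earlier slide instead:

```python
# A, B, C joint table from the earlier slide.
joint = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

def prob(event):
    """P(E): sum of P(row) over rows matching the logical expression `event`."""
    return sum(p for row, p in joint.items() if event(*row))

def cond_prob(e1, e2):
    """P(E1|E2) = P(E1 and E2) / P(E2)."""
    return prob(lambda a, b, c: e1(a, b, c) and e2(a, b, c)) / prob(e2)

p = cond_prob(lambda a, b, c: a == 1, lambda a, b, c: b == 1)
print(round(p, 10))  # P(A=1 | B=1) = (0.25 + 0.10) / 0.50 = 0.7
```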
-
Continuous probability distributions
- probability density function (pdf)
- joint probability density
- marginal probability
- calculating probabilities using the pdf
- Bayes' rule
-
A PDF of American Ages in 2000
(more of Andrew's nice slides)
-
A PDF of American Ages in 2000
Let X be a continuous random variable. If p(x) is a Probability Density Function for X then

P(a ≤ X ≤ b) = ∫_a^b p(x) dx

P(30 ≤ Age ≤ 50) = ∫_30^50 p(age) d(age) = 0.36
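The integral can be approximated numerically. The actual age pdf is not reproduced in this transcript, so the sketch below uses a made-up exponential density with mean 40 as a stand-in:

```python
import math

def p(x):
    """Stand-in pdf: exponential density with scale 40 (an assumption)."""
    return math.exp(-x / 40.0) / 40.0

def prob_between(a, b, n=100_000):
    """Midpoint-rule approximation of P(a <= X <= b) = integral of p over [a, b]."""
    h = (b - a) / n
    return sum(p(a + (i + 0.5) * h) for i in range(n)) * h

# For this density the integral has a closed form: exp(-30/40) - exp(-50/40).
exact = math.exp(-30 / 40) - math.exp(-50 / 40)
assert abs(prob_between(30, 50) - exact) < 1e-6
```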
-
What does p(x) mean?
It does not mean a probability!
First of all, it's not a value between 0 and 1. It's just a value, and an arbitrary one at that. The value p(a) can only be compared relative to other values p(b). It indicates the relative probability of the integrated density over a small delta:

If p(a) = α p(b), then

lim_{h→0} P(a − h ≤ X ≤ a + h) / P(b − h ≤ X ≤ b + h) = α
-
Expectations
E[X] = the expected value of random variable X
= the average value we'd see if we took a very large number of random samples of X
= ∫_{x = −∞}^{∞} x p(x) dx
-
= the first moment of the shape formed by the axes and the blue curve
= the best value to choose if you must guess an unknown person's age and you'll be fined the square of your error

E[age] = 35.897
-
Expectation of a function
μ = E[f(X)] = the expected value of f(X), where x is drawn from X's distribution
= the average value we'd see if we took a very large number of random samples of f(X)
= ∫_{x = −∞}^{∞} f(x) p(x) dx

Note that in general: E[f(X)] ≠ f(E[X])

E[age²] = 1786.64
(E[age])² = 1288.62
-
Variance
σ² = Var[X] = the expected squared difference between X and E[X]
= ∫_{x = −∞}^{∞} (x − μ)² p(x) dx
= the amount you'd expect to lose if you must guess an unknown person's age and you'll be fined the square of your error, assuming you play optimally

Var[age] = 498.02
-
Standard Deviation
σ = Standard Deviation = typical deviation of X from its mean
σ = √Var[X]

Var[age] = 498.02
σ = 22.32
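Expectation, variance and standard deviation can all be computed as numeric integrals over a pdf. The sketch below uses a made-up exponential density with mean 40 as a stand-in for the age pdf, so its numbers differ from the slide's age figures:

```python
import math

def p(x):
    """Stand-in pdf: exponential density with scale 40 (an assumption)."""
    return math.exp(-x / 40.0) / 40.0

def integrate(f, a=0.0, b=2000.0, n=200_000):
    """Midpoint-rule integral of f over [a, b]; the tail beyond b is negligible."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

mean = integrate(lambda x: x * p(x))               # E[X] = first moment
var = integrate(lambda x: (x - mean) ** 2 * p(x))  # Var[X] = E[(X - mu)^2]
std = math.sqrt(var)                               # standard deviation

# For an exponential with scale 40: E[X] = 40, Var[X] = 1600, sigma = 40.
assert abs(mean - 40.0) < 0.1
assert abs(var - 1600.0) < 1.0
assert abs(std - 40.0) < 0.1
```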
-
Simple example: medical test results
A test report for a rare disease comes back positive, and the test is 90% accurate. What's the probability that you have the disease?
What if the test is repeated?
This is the simplest example of reasoning by combining sources of information.
-
How do we model the problem?
Which is the correct description of "Test is 90% accurate"?

P(T = true) = 0.9
P(T = true|D = true) = 0.9
P(D = true|T = true) = 0.9

What do we want to know? P(D = true|T = true)

More compact notation: write P(T|D) for P(T = true|D = true), and P(¬T|¬D) for P(T = false|D = false).
-
Evaluating the posterior probability through Bayesian inference
We want P(D|T), the probability of having the disease given a positive test. Use Bayes' rule to relate it to what we know, P(T|D):

P(D|T) = P(T|D) P(D) / P(T)
(posterior = likelihood × prior / normalizing constant)

What's the prior P(D)? The disease is rare, so let's assume P(D) = 0.001.
What about P(T)? What's the interpretation of that?
-
Evaluating the normalizing constant
P(T) is the marginal probability of P(T, D) = P(T|D) P(D), so compute it with summation:

P(T) = Σ_{all values of D} P(T|D) P(D)

For true-or-false propositions:

P(T) = P(T|D) P(D) + P(T|¬D) P(¬D)

What are these?
-
Refining our model of the test
We also have to consider the negative case to incorporate all information:

P(T|D) = 0.9
P(T|¬D) = ? What should it be?

What about P(¬D)?
-
Same problem, different situation
Suppose we have a test to determine if you won the lottery. It's 90% accurate.
What is P($ = true | T = true) then?
-
Playing around with the numbers
What if the test were 100% reliable?

P(D|T) = (1.0 × 0.001) / (1.0 × 0.001 + 0.0 × 0.999) = 1.0

What if the test were the same, but the disease weren't so rare, say P(D) = 0.1?

P(D|T) = P(T|D) P(D) / (P(T|D) P(D) + P(T|¬D) P(¬D))
       = (0.9 × 0.1) / (0.9 × 0.1 + 0.1 × 0.9) = 0.5
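These "what if" calculations fit in one small function. A sketch reproducing all three scenarios (perfect test, common disease, rare disease):

```python
def posterior(p_t_given_d, p_t_given_not_d, p_d):
    """P(D|T) = P(T|D)P(D) / (P(T|D)P(D) + P(T|not D)P(not D))."""
    num = p_t_given_d * p_d
    return num / (num + p_t_given_not_d * (1.0 - p_d))

print(posterior(1.0, 0.0, 0.001))            # perfectly reliable test -> 1.0
print(posterior(0.9, 0.1, 0.1))              # 90% test, common disease -> 0.5
print(round(posterior(0.9, 0.1, 0.001), 4))  # 90% test, rare disease -> 0.0089
```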
-
Repeating the test
We can relax, P(D|T) = 0.0089, right? Just to be sure, the doctor recommends repeating the test.
How do we represent this? P(D|T1, T2)
Again, we apply Bayes' rule:

P(D|T1, T2) = P(T1, T2|D) P(D) / P(T1, T2)

How do we model P(T1, T2|D)?
-
Modeling repeated tests
Easiest is to assume the tests are independent:

P(T1, T2|D) = P(T1|D) P(T2|D)

This also implies: P(T1, T2) = P(T1) P(T2)

Plugging these in, we have

P(D|T1, T2) = P(T1|D) P(T2|D) P(D) / (P(T1) P(T2))
-
Evaluating the normalizing constant again
Expanding as before, we have

P(D|T1, T2) = P(T1|D) P(T2|D) P(D) / Σ_{D = {t, f}} P(T1|D) P(T2|D) P(D)

Plugging in the numbers gives us

P(D|T1, T2) = (0.9 × 0.9 × 0.001) / (0.9 × 0.9 × 0.001 + 0.1 × 0.1 × 0.999) = 0.075

Another way to think about this:
- What's the chance of 1 false positive from the test?
- What's the chance of 2 false positives?
The chance of 2 false positives is still 10x more likely than the prior probability of having the disease.
-
Simpler: Combining information the Bayesian way
Let's look at the equation again:

P(D|T1, T2) = P(T1|D) P(T2|D) P(D) / (P(T1) P(T2))

If we rearrange slightly:

P(D|T1, T2) = [P(T2|D) / P(T2)] × [P(T1|D) P(D) / P(T1)]

We've seen this before! It's the posterior for the first test, which we just computed:

P(D|T1) = P(T1|D) P(D) / P(T1)
-
The old posterior is the new prior
We can just plug in the value of the old posterior; it plays exactly the same role as our old prior:

P(D|T1, T2) = P(T2|D) P(D|T1) / P(T2) = P(T2|D) × 0.0089 / P(T2)

Plugging in the numbers gives the same answer:

P(D|T1, T2) = P(T2|D) P(D|T1) / (P(T2|D) P(D|T1) + P(T2|¬D) P(¬D|T1))
            = (0.9 × 0.0089) / (0.9 × 0.0089 + 0.1 × 0.9911) = 0.075

This is how Bayesian reasoning combines old information with new information to update our belief states.
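This update loop can be written directly: after each test, the old posterior is fed back in as the next prior. A sketch reproducing the slide's numbers:

```python
def update(prior, p_t_given_d=0.9, p_t_given_not_d=0.1):
    """One Bayesian update on a positive test result."""
    num = p_t_given_d * prior
    return num / (num + p_t_given_not_d * (1.0 - prior))

belief = 0.001              # prior: the disease is rare
belief = update(belief)     # first positive test
print(round(belief, 4))     # 0.0089
belief = update(belief)     # second positive test: old posterior is the new prior
print(round(belief, 3))     # 0.075
```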
-
Example 1.2 (Hamburgers). Consider the following fictitious scientific information: doctors find that people with Kreuzfeld-Jacob disease (KJ) almost invariably ate hamburgers, thus p(Hamburger Eater|KJ) = 0.9. The probability of an individual having KJ is currently rather low, about one in 100,000.

1. Assuming eating lots of hamburgers is rather widespread, say p(Hamburger Eater) = 0.5, what is the probability that a hamburger eater will have Kreuzfeld-Jacob disease? This may be computed as

p(KJ|Hamburger Eater) = p(Hamburger Eater, KJ) / p(Hamburger Eater) = p(Hamburger Eater|KJ) p(KJ) / p(Hamburger Eater)   (1.2.1)
= (9/10 × 1/100,000) / (1/2) = 1.8 × 10⁻⁵   (1.2.2)

2. If the fraction of people eating hamburgers was rather small, p(Hamburger Eater) = 0.001, what is the probability that a regular hamburger eater will have Kreuzfeld-Jacob disease? Repeating the above calculation, this is given by

(9/10 × 1/100,000) / (1/1000) ≈ 1/100   (1.2.3)

This is much higher than in scenario (1), since here we can be more sure that eating hamburgers is related to the illness.
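Both hamburger scenarios are one application of Bayes' rule each; a sketch:

```python
def p_kj_given_eater(p_eater_given_kj, p_kj, p_eater):
    """Bayes' rule: p(KJ|Eater) = p(Eater|KJ) p(KJ) / p(Eater)."""
    return p_eater_given_kj * p_kj / p_eater

print(round(p_kj_given_eater(0.9, 1e-5, 0.5), 7))    # scenario 1: 1.8e-05
print(round(p_kj_given_eater(0.9, 1e-5, 0.001), 6))  # scenario 2: 0.009, about 1/100
```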
-
Example 1.3 (Inspector Clouseau). Inspector Clouseau arrives at the scene of a crime. The victim lies dead in the room alongside the possible murder weapon, a knife. The Butler (B) and Maid (M) are the inspector's main suspects, and the inspector has a prior belief of 0.6 that the Butler is the murderer, and a prior belief of 0.2 that the Maid is the murderer. These beliefs are independent in the sense that p(B, M) = p(B)p(M). (It is possible that both the Butler and the Maid murdered the victim, or neither.) The inspector's prior criminal knowledge can be formulated mathematically as follows:

dom(B) = dom(M) = {murderer, not murderer},  dom(K) = {knife used, knife not used}   (1.2.4)

p(B = murderer) = 0.6,  p(M = murderer) = 0.2   (1.2.5)

p(knife used|B = not murderer, M = not murderer) = 0.3
p(knife used|B = not murderer, M = murderer) = 0.2
p(knife used|B = murderer, M = not murderer) = 0.6
p(knife used|B = murderer, M = murderer) = 0.1   (1.2.6)

In addition, p(K, B, M) = p(K|B, M) p(B) p(M). Assuming that the knife is the murder weapon, what is the probability that the Butler is the murderer? (Remember that it might be that neither is the murderer.) Using b for the two states of B and m for the two states of M:

p(B|K) = Σ_m p(B, m|K) = Σ_m p(B, m, K) / p(K) = Σ_m p(K|B, m) p(B, m) / Σ_{m,b} p(K|b, m) p(b, m)
       = p(B) Σ_m p(K|B, m) p(m) / [Σ_b p(b) Σ_m p(K|b, m) p(m)]   (1.2.7)
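The sum over the Maid's states in (1.2.7) can be evaluated directly; a sketch using the numbers from (1.2.5) and (1.2.6):

```python
# Priors and the knife-use model; True means "murderer" / "knife used".
p_b = {True: 0.6, False: 0.4}            # Butler prior
p_m = {True: 0.2, False: 0.8}            # Maid prior
p_k = {(False, False): 0.3, (False, True): 0.2,
       (True, False): 0.6, (True, True): 0.1}   # p(knife used | B, M)

def joint_k(b):
    """p(B = b) * sum_m p(K | b, m) p(m): one term of (1.2.7)."""
    return p_b[b] * sum(p_k[(b, m)] * p_m[m] for m in (True, False))

posterior = joint_k(True) / (joint_k(True) + joint_k(False))
print(round(posterior, 4))   # 0.7282: the knife makes the Butler more suspect
```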
-
Example 1.5 (Aristotle: Resolution). We can represent the statement "All apples are fruit" by p(F = tr|A = tr) = 1. Similarly, "All fruits grow on trees" may be represented by p(T = tr|F = tr) = 1. Additionally we assume that whether or not something grows on a tree depends only on whether or not it is a fruit, p(T|A, F) = p(T|F). From this we can compute

p(T = tr|A = tr) = Σ_F p(T = tr|F, A = tr) p(F|A = tr) = Σ_F p(T = tr|F) p(F|A = tr)
= p(T = tr|F = fa) p(F = fa|A = tr) + p(T = tr|F = tr) p(F = tr|A = tr)
= 0 + 1 × 1 = 1   (1.2.16)

where the first term vanishes since p(F = fa|A = tr) = 0. In other words, we have deduced that "All apples grow on trees" is a true statement, based on the information presented. (This kind of reasoning is called resolution and is a form of transitivity: from the statements A ⇒ F and F ⇒ T we can infer A ⇒ T.)
-
Next time
Bayesian belief networks