Probability Theory Review
Belief representation: how do we represent our belief (hypothesis) of where the robot is located?
• Continuous map with a single-hypothesis probability distribution
• Continuous map with a multiple-hypothesis probability distribution
• Discretized map with a multiple-hypothesis probability distribution
• Discretized topological map with a multiple-hypothesis probability distribution
Belief representation
• Single-hypothesis belief: the robot's belief about its position is expressed as a single point on a map.
  • Advantage: no ambiguity; simplifies planning and decision making.
  • Disadvantage: does not represent ambiguity/uncertainty.
• Multi-hypothesis belief: allows the robot to track (possibly infinitely) many possible positions.

In both of the above, the beliefs are represented as probabilities.
Discrete Random Variables
• X denotes a random variable.
• X can take on a finite number of values in {x1, x2, …, xn}.
• P(X = xi), or P(xi), is the probability that the random variable X takes on value xi.
• P(xi) is called the probability mass function.
• E.g., P(Raining) = 0.2
Continuous Random Variables
• X takes on values in the continuum.
• p(X = x), or p(x), is a probability density function.
• E.g., the probability that X falls in an interval (a, b):

  P(x ∈ (a, b)) = ∫_a^b p(x) dx

[Figure: a density curve p(x) plotted against x]
Probability Density Function
Since continuous probability functions are defined for an infinite number of points over a continuous interval, the probability at a single point is always 0.
[Figure: a density curve p(x) plotted against x]

The magnitude of the curve may be greater than 1 in some areas, but the total area under the curve must add up to 1.
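The two facts above can be checked numerically. The sketch below (an illustration, not part of the original slides) uses an assumed uniform density on [0, 0.5], whose value is 2 everywhere on that interval, yet whose total area is still 1.

```python
# Sketch (not from the slides): a density can exceed 1 pointwise,
# yet its total area must integrate to 1.
# Assumed example: uniform density on [0, 0.5], so p(x) = 2 there, 0 elsewhere.

def p(x):
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

# Riemann-sum approximation of the total area under p over [0, 1].
n = 100_000
dx = 1.0 / n
area = sum(p(i * dx) * dx for i in range(n))

print(round(area, 3))  # ≈ 1.0, even though p(x) = 2 > 1 on [0, 0.5]
```

This also illustrates why the probability at any single point is 0: the area of a single point under the curve has zero width.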
Joint Probability
• Notation: P(X = x and Y = y) = P(x, y)
• If X and Y are independent, then P(x, y) = P(x) P(y)
Conditional Probability
• P(x | y) is the probability of x given y:

  P(x | y) = P(x, y) / P(y)
  P(x, y) = P(x | y) P(y) = P(y | x) P(x)

• If X and Y are independent, then P(x | y) = P(x)
An Example

Roll two dice, observe x1 and x2. We know that there are 36 possible outcomes, all of which are equally likely (assuming the dice are fair). It's easy to compute probabilities by simply counting outcomes:

• Probability x1 = 6:
  (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)  ⇒  P = 6/36 = 1/6
• Probability x1 = 6 and x2 is even:
  (6,2), (6,4), (6,6)  ⇒  P = 3/36 = 1/12
• Probability x1 is even:
  (2,1), (2,2), (2,3), (2,4), (2,5), (2,6)
  (4,1), (4,2), (4,3), (4,4), (4,5), (4,6)
  (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)  ⇒  P = 18/36 = 1/2
Let's apply rules of conditional and joint probabilities. Define events: A: x1 is even; B: x1 = 6; C: x2 is even; D: x2 = 5.

From the previous page, we easily compute the following:

  P(A) = 1/2,  P(B) = 1/6,  P(C) = 1/2,  P(D) = 1/6.
Let's look at some combinations of events:

• P(A, B) = 1/6  ≠  P(A) P(B) = 1/2 × 1/6 = 1/12  ⇒  NOT independent
• P(A, C) = 1/4  =  P(A) P(C) = 1/2 × 1/2  ⇒  independent
• P(B | A) = P(A, B) / P(A) = (1/6) / (1/2) = 1/3

This agrees with our intuition, since x1 = 6 in one third of the cases of x1 being even:
  (2,1), (2,2), (2,3), (2,4), (2,5), (2,6)
  (4,1), (4,2), (4,3), (4,4), (4,5), (4,6)
  (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)
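The counting arguments above can be verified with a short script (an illustrative sketch, not part of the original slides), using the events A, B, C defined above.

```python
from fractions import Fraction
from itertools import product

# Sketch of the two-dice example: enumerate all 36 equally likely
# outcomes and compute event probabilities by counting.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

A = lambda o: o[0] % 2 == 0      # x1 is even
B = lambda o: o[0] == 6          # x1 = 6
C = lambda o: o[1] % 2 == 0      # x2 is even

P_A  = prob(A)                           # 1/2
P_B  = prob(B)                           # 1/6
P_C  = prob(C)                           # 1/2
P_AB = prob(lambda o: A(o) and B(o))     # 1/6 != P(A)*P(B) -> NOT independent
P_AC = prob(lambda o: A(o) and C(o))     # 1/4 == P(A)*P(C) -> independent
P_B_given_A = P_AB / P_A                 # 1/3

print(P_A, P_B, P_AB, P_AC, P_B_given_A)
```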
Law of Total Probability
Discrete case:

  Σ_x P(x) = 1
  P(x) = Σ_y P(x, y)
  P(x) = Σ_y P(x | y) P(y)

Continuous case:

  ∫ p(x) dx = 1
  p(x) = ∫ p(x, y) dy
  p(x) = ∫ p(x | y) p(y) dy
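The discrete identities can be checked on a small example. The joint distribution below is a made-up illustration (not from the slides), chosen only to show that marginalizing the joint and applying the law of total probability give the same P(x).

```python
from fractions import Fraction

# Assumed joint distribution P(x, y) over X in {0,1}, Y in {0,1},
# used purely to verify the discrete law of total probability.
P_xy = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

# P(y) = sum_x P(x, y)
P_y = {y: sum(p for (x, yy), p in P_xy.items() if yy == y) for y in (0, 1)}

# Marginalization: P(x) = sum_y P(x, y)
P_x_marginal = {x: sum(p for (xx, y), p in P_xy.items() if xx == x) for x in (0, 1)}

# Total probability: P(x) = sum_y P(x | y) P(y), where P(x | y) = P(x, y) / P(y)
P_x_total = {x: sum((P_xy[(x, y)] / P_y[y]) * P_y[y] for y in (0, 1)) for x in (0, 1)}

print(P_x_marginal == P_x_total)       # the two routes agree
print(sum(P_x_marginal.values()))      # and P(x) sums to 1
```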
Bayes Theorem

We know that conjunction is commutative:

  P(A, B) = P(B, A)

Using the definition of conditional probability:

  P(B | A) P(A) = P(B, A) = P(A, B) = P(A | B) P(B)
  P(B | A) P(A) = P(A | B) P(B)
  P(B | A) = P(A | B) P(B) / P(A)
Example
We roll one die, and an observer tells us things about the outcome. We want to know if X = 4.

• Before we know anything, we believe P(X = 4) = 1/6.  [PRIOR]
• Now, suppose the observer tells us that X is even.  [EVIDENCE]

  P(X = 4 | X even) = P(X = 4, X even) / P(X even) = (1/6) / (1/2) = 1/3  [BAYES]

• We could also use Bayes to infer P(X even | X = 4):

  P(X even | X = 4) = P(X = 4, X even) / P(X = 4) = (1/6) / (1/6) = 1
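The single-die update can be reproduced by counting (an illustrative sketch, not part of the original slides):

```python
from fractions import Fraction

# Sketch of the single-die example: update the belief that X = 4
# after the observer reports that X is even.
die = range(1, 7)

P_four      = Fraction(1, 6)                                   # prior P(X = 4)
P_even      = Fraction(sum(1 for x in die if x % 2 == 0), 6)   # P(X even) = 1/2
P_four_even = Fraction(1, 6)                                   # P(X = 4 and X even)

posterior        = P_four_even / P_even   # P(X = 4 | X even) = 1/3
likelihood_check = P_four_even / P_four   # P(X even | X = 4) = 1

print(posterior, likelihood_check)
```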
Bayes Rule

  P(x | y) = P(y | x) P(x) / P(y)  =  (likelihood × prior) / evidence

• P(x | y): posterior (conditional) probability distribution
• P(y | x): likelihood, a model of the characteristics of the event
• P(x): prior probability distribution
• P(y): evidence; does not depend on x

Here x is the robot pose and y is the sensor data.
About likelihoodsβ¦
Why do we call the conditional probability P(y | x) a likelihood, but call P(x | y) the posterior?

We define the likelihood ℓ(x) to be a function of x, not a function of y:

  ℓ(x) = P(y | x)

Note: ℓ(x) is not a probability. In particular,

  Σ_x ℓ(x) ≠ 1
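A tiny numerical illustration of this point, using the sensor-model values that appear in the door example later in these slides (the dictionary below is a sketch, not the slides' notation):

```python
# Sketch: the likelihood l(x) = P(z | x) is a function of the state x,
# and its values need not sum to 1 over x.
# Values taken from the door example: P(z | open) = 0.6, P(z | closed) = 0.3.
P_z_given_x = {"open": 0.6, "closed": 0.3}   # P(z | x) for one fixed observation z

total = sum(P_z_given_x.values())
print(total)   # 0.9 != 1 -> the likelihood is not a distribution over x
```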
Normalization Coefficient
  P(x | z) = P(z | x) P(x) / P(z)

Note that the denominator is independent of x, and as a result will typically be the same for any value of x in the posterior P(x | z).

Therefore, we typically represent the normalization term by the coefficient η = [P(z)]^(-1), and Bayes' equation is written as

  P(x | z) = η P(z | x) P(x)
Simple Example of State Estimation
• Suppose a robot obtains measurement z (e.g., a distance sensor reports an obstacle 40 cm in front of the robot).
• What is P(open | z)?
Causal vs. Diagnostic Reasoning
• P(open | z) is diagnostic.
• P(z | open) is causal.
• Often causal knowledge is easier to obtain.
• Bayes rule allows us to use causal knowledge:

  P(open | z) = P(z | open) P(open) / P(z)

P(z | open) comes from the sensor model. For the evidence term, use the law of total probability: P(z) = Σ_y P(z | y) P(y).
Example
Given: P(z|open) = 0.6, P(z|¬open) = 0.3, P(open) = P(¬open) = 0.5.

  P(open | z) = P(z | open) P(open) / (P(z | open) P(open) + P(z | ¬open) P(¬open))
              = (0.6 × 0.5) / (0.6 × 0.5 + 0.3 × 0.5)
              = 2/3 ≈ 0.67

z raises the probability that the door is open.
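The same computation as a short script (a sketch using the slide's numbers):

```python
# Door example: Bayes rule with the law of total probability
# supplying the evidence term P(z).
P_z_given_open, P_z_given_notopen = 0.6, 0.3
P_open, P_notopen = 0.5, 0.5

P_z = P_z_given_open * P_open + P_z_given_notopen * P_notopen   # = 0.45
P_open_given_z = P_z_given_open * P_open / P_z                  # = 2/3

print(round(P_open_given_z, 2))   # 0.67 > 0.5, so z raises P(open)
```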
Let's try the measurement again…

  P(x | z) = η P(z | x) P(x)

Given information: P(z1|open) = 0.6, P(z1|closed) = 0.3, P(open) = 0.5, P(closed) = 0.5.

  P(open | z1) = η P(z1 | open) P(open) = η × 0.6 × 0.5 = η × 0.3

Unlike before, we don't yet have the answer, because we still have the unknown coefficient η, which indicates that we need to normalize to get the true probability.

  P(closed | z1) = η P(z1 | closed) P(closed) = η × 0.3 × 0.5 = η × 0.15

  η = (0.3 + 0.15)^(-1) ≈ 2.22

  P(open | z1) = 2.22 × 0.3 ≈ 0.67,  P(closed | z1) = 2.22 × 0.15 ≈ 0.33
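The η-style update is easy to mechanize: compute the unnormalized products first, then divide by their sum. A sketch with the slide's numbers:

```python
# Door update written with the normalization coefficient eta:
# eta is just one over the sum of the unnormalized products.
P_z1_given = {"open": 0.6, "closed": 0.3}   # sensor model P(z1 | x)
prior      = {"open": 0.5, "closed": 0.5}   # prior P(x)

unnormalized = {x: P_z1_given[x] * prior[x] for x in prior}   # {open: 0.3, closed: 0.15}
eta = 1.0 / sum(unnormalized.values())                        # 1 / 0.45 ≈ 2.22

posterior = {x: eta * v for x, v in unnormalized.items()}
print({x: round(p, 2) for x, p in posterior.items()})         # open ≈ 0.67, closed ≈ 0.33
```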
Combining Evidence
• Suppose our robot obtains another observation z2, e.g., a second reading with the same sensor, which reports an obstacle 35 cm away.
• How can we integrate this new information?
• More generally, how can we estimate P(x | z1, …, zn)?
Generalizing the Condition with Bayes Theorem

The usual version of Bayes is conditioned on a single event:

  P(x | z) = P(z | x) P(x) / P(z)

In fact, we can add arbitrary context variables on the right side of the conditioning bar, so long as we apply them in every term:

  P(x | z, Anything) = P(z | x, Anything) P(x | Anything) / P(z | Anything)
Multiple Measurements
  P(x | z, Anything) = P(z | x, Anything) P(x | Anything) / P(z | Anything)

Substituting the earlier measurement z1 for "Anything":

  P(x | z2, z1) = P(z2 | x, z1) P(x | z1) / P(z2 | z1)

At time t = 2, everything earlier is merely context information.
Multiple Measurements (cont)
π π₯ π§3, π§2 =π π§3 π₯, π§2)π(π₯|π§2)
π(π§3|π§2)
At time π‘ = 2β’ π(π₯|π§2) is the priorβ¦ what we believe about the state π₯, based on
history of measurements before π‘ = 2β’ π π₯ π§3, π§2 is the posteriorβ¦ what we believe about the state π₯,
based on history of measurements, including π‘ = 2β’ π π§3 π₯, π§2) β¦ If we really know the state π₯, then what we measured
at time π‘ = 1 wonβt affect what we expect to measure at time π‘ = 2
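Putting the pieces together, the recursive update can be sketched as below: the posterior after z1 becomes the prior for z2. This assumes measurements are conditionally independent given the state (the last bullet above), and, as an additional assumption not stated in the slides, that z2 has the same sensor model as z1.

```python
# Sketch of the recursive Bayes update for the door example.
# bayes_update folds one measurement into the current belief
# using the eta-normalization form P(x|z) = eta * P(z|x) * P(x).
def bayes_update(prior, likelihood):
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    eta = 1.0 / sum(unnormalized.values())
    return {x: eta * v for x, v in unnormalized.items()}

sensor = {"open": 0.6, "closed": 0.3}   # assumed identical model for z1 and z2
belief = {"open": 0.5, "closed": 0.5}   # initial prior

belief = bayes_update(belief, sensor)   # after z1: open ≈ 0.67
belief = bayes_update(belief, sensor)   # after z2: open ≈ 0.80

print({x: round(p, 2) for x, p in belief.items()})
```

Each consistent measurement pushes the belief further toward "open", which is exactly the evidence-combination behavior the slides describe.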