Probability

• FOL fails for a domain due to:
  – Laziness: it is too much work to list the complete set of rules, and too hard to use the enormous set of rules that results
  – Theoretical ignorance: there is no complete theory for the domain
  – Practical ignorance: we have not run, or cannot run, all necessary tests
Probability

• Probability = a degree of belief
• Probability comes from:
  – Frequentist: experiments and statistical assessment
  – Objectivist: real aspects of the universe
  – Subjectivist: a way of characterizing an agent’s beliefs
• Decision theory = probability + utility theory
Probability

• Prior probability: probability in the absence of any other information

P(Dice = 2) = 1/6

random variable: Dice
domain = <1, 2, 3, 4, 5, 6>
probability distribution: P(Dice) = <1/6, 1/6, 1/6, 1/6, 1/6, 1/6>
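As a minimal illustration in Python (added, not part of the original slides), a discrete prior distribution is just a table from domain values to probabilities that sums to 1:

from fractions import Fraction

# Uniform prior P(Dice) over the domain <1, ..., 6>.
P_dice = {face: Fraction(1, 6) for face in range(1, 7)}

assert sum(P_dice.values()) == 1   # any distribution must sum to 1
print(P_dice[2])                   # P(Dice = 2) = 1/6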
Probability

• Conditional probability: probability in the presence of some evidence

P(Dice = 2 | Dice is even) = 1/3
P(Dice = 2 | Dice is odd) = 0

• P(A | B) = P(A ∧ B)/P(B)
  P(A ∧ B) = P(A | B)·P(B)
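A short Python sketch of the definition (added for illustration), reproducing the dice numbers above:

from fractions import Fraction

P_dice = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """P(event), where an event is a set of outcomes."""
    return sum(P_dice[face] for face in event)

even, odd = {2, 4, 6}, {1, 3, 5}
print(prob({2} & even) / prob(even))   # P(Dice = 2 | even) = 1/3
print(prob({2} & odd) / prob(odd))     # P(Dice = 2 | odd)  = 0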
Probability

• Example:

S = stiff neck
M = meningitis
P(S | M) = 0.5
P(M) = 1/50000
P(S) = 1/20

P(M | S) = P(S | M)·P(M)/P(S) = 1/5000
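Checking the arithmetic in Python (an added illustration):

# Bayes' rule for the stiff-neck example above.
P_S_given_M = 0.5
P_M = 1 / 50000
P_S = 1 / 20

print(P_S_given_M * P_M / P_S)   # 0.0002 = 1/5000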
Probability

• Joint probability distributions:

X: <x1, …, xm>, Y: <y1, …, yn>

P(X = xi, Y = yj)
Probability

• Axioms:

0 ≤ P(A) ≤ 1
P(true) = 1 and P(false) = 0
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Probability

• Derived properties:

P(¬A) = 1 − P(A)

P(U) = P(A1) + P(A2) + ... + P(An)
where U = A1 ∨ A2 ∨ ... ∨ An (collectively exhaustive)
and Ai ∧ Aj = false for i ≠ j (mutually exclusive)
Probability

• Bayes’ theorem:

P(Hi | E) = P(Hi ∧ E)/P(E) = P(E | Hi)·P(Hi) / Σj P(E | Hj)·P(Hj)

where the Hi’s are collectively exhaustive & mutually exclusive
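A Python sketch of the normalized form, with made-up priors and likelihoods for three exhaustive, mutually exclusive hypotheses (the numbers are illustrative, not from the slides):

priors = [0.7, 0.2, 0.1]        # P(Hi); exhaustive, so they sum to 1
likelihoods = [0.1, 0.5, 0.9]   # P(E | Hi)

P_E = sum(l * p for l, p in zip(likelihoods, priors))            # denominator
posteriors = [l * p / P_E for l, p in zip(likelihoods, priors)]
print(posteriors, sum(posteriors))   # posteriors sum to 1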
Probability

• Problem: a full joint probability distribution P(X1, X2, ..., Xn) is sufficient for computing any probability on the Xi's, but the number of joint probabilities is exponential: for n Boolean variables the table has 2^n entries.
Probability

• Independence:

P(A ∧ B) = P(A)·P(B) ⇔ P(A) = P(A | B)

• Conditional independence:

P(A ∧ B | E) = P(A | E)·P(B | E) ⇔ P(A | E) = P(A | E ∧ B)
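A quick numeric illustration in Python (added; the two events are my choice): for a fair die, "even" and "≤ 2" happen to be independent:

from fractions import Fraction

P_dice = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    return sum(P_dice[f] for f in event)

A = {2, 4, 6}   # Dice is even
B = {1, 2}      # Dice <= 2
# Independence: P(A AND B) = P(A) * P(B)  (here 1/6 = 1/2 * 1/3)
assert prob(A & B) == prob(A) * prob(B)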
Probability

• Example:

P(Toothache | Cavity ∧ Catch) = P(Toothache | Cavity)
P(Catch | Cavity ∧ Toothache) = P(Catch | Cavity)
Probability

"In John's and Mary's house, an alarm is installed to sound in case of burglary or earthquake. When the alarm sounds, John and Mary may make a call for help or rescue."

Q1: If an earthquake happens, how likely is John to make a call?
Q2: If the alarm sounds, how likely is it that the house was burglarized?
Bayesian Networks

• Pearl, J. (1982). Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach. In Proceedings of the Second National Conference on Artificial Intelligence (AAAI-82), Pittsburgh, Pennsylvania.
• 2000 AAAI Classic Paper Award
Bayesian Networks

[Figure: the burglary network. Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

P(B) = .001    P(E) = .002

B E | P(A)
T T | .95
T F | .94
F T | .29
F F | .001

A | P(J)
T | .90
F | .05

A | P(M)
T | .70
F | .01
Bayesian Networks

• Syntax:
  – A set of random variables making up the nodes
  – A set of directed links connecting pairs of nodes
  – Each node has a conditional probability table that quantifies the effects of its parent nodes
  – The graph has no directed cycles

[Figure: two graphs over nodes A, B, C, D: an acyclic one (yes) and one with a directed cycle (no)]
Bayesian Networks

• Semantics:

Order the nodes so that Xi is a predecessor of Xj ⇒ i < j

P(X1, X2, …, Xn)
  = P(Xn | Xn−1, …, X1)·P(Xn−1 | Xn−2, …, X1)· … ·P(X2 | X1)·P(X1)
  = Πi P(Xi | Xi−1, …, X1)
  = Πi P(Xi | Parents(Xi))

since P(Xi | Xi−1, …, X1) = P(Xi | Parents(Xi)) when Parents(Xi) ⊆ {Xi−1, …, X1}

Each node is conditionally independent of its predecessors given its parents
Bayesian Networks

• Example:

P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A)·P(M | A)·P(A | ¬B ∧ ¬E)·P(¬B)·P(¬E)
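Evaluating this product in Python with the CPT values from the network above (an added check; the negations follow the standard AIMA version of this example):

# P(J, M, A, not-B, not-E) via the network factorization.
P_B, P_E = 0.001, 0.002
P_A_given_notB_notE = 0.001
P_J_given_A, P_M_given_A = 0.90, 0.70

p = (P_J_given_A * P_M_given_A * P_A_given_notB_notE
     * (1 - P_B) * (1 - P_E))
print(p)   # about 0.000628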
Bayesian Networks

• P(Query | Evidence) = ?
  – Diagnostic (from effects to causes): P(B | J)
  – Causal (from causes to effects): P(J | B)
  – Intercausal (between causes of a common effect): P(B | A, E)
  – Mixed: P(A | J, E), P(B | J, E)
Bayesian Networks

P(B | A) = P(B ∧ A)/P(A) = α·P(B ∧ A), where α = 1/P(A) is a normalization constant
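A sketch of this query by enumeration in Python (added; it uses the CPTs above, and the variable names are mine): sum the joint over the hidden variable E, then normalize:

from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """P(B=b, E=e, A=a) from the network factorization."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    return pb * pe * pa

p_b_and_a = sum(joint(True, e, True) for e in (True, False))
p_a = sum(joint(b, e, True) for b, e in product((True, False), repeat=2))
print(p_b_and_a / p_a)   # P(B | A) is about 0.374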
Bayesian Networks

• Why Bayesian Networks?
General Conditional Independence

[Figure: a node X with parents U1, …, Um; children Y1, …, Yn; and the children's other parents Z1j, …, Znj]

A node (X) is conditionally independent of its non-descendants (the Zij's), given its parents (the Ui's)
General Conditional Independence

[Figure: the same node X with parents Ui, children Yi, and children's other parents Zij]

A node (X) is conditionally independent of all other nodes, given its parents (Ui's), children (Yi's), and children's parents (Zij's): its Markov blanket
General Conditional Independence

X and Y are conditionally independent given E if every path between X and Y is blocked by E (d-separation)

[Figure: X and Y separated by evidence E, with each connecting path blocked at some node Z]
General Conditional Independence

Example:

[Figure: the car network. Battery is the parent of Radio and Ignition; Ignition and Gas are the parents of Starts; Starts is the parent of Moves.]
General Conditional Independence

Example:

[Figure: the same car network: Battery → Radio and Battery → Ignition; Ignition and Gas → Starts; Starts → Moves]

Gas and Radio are independent given no evidence at all
Gas and Radio are not independent given Starts
Gas and Radio are not independent given Moves
Network Construction

• Wrong ordering:
  – Produces a more complicated net
  – Requires more probabilities to be specified
  – Produces slight relationships that require difficult and unnatural probability judgments
  – Fails to represent all the conditional independence relationships

⇒ a lot of unnecessary probabilities
Network Construction

• Models:
  – Diagnostic: symptoms to causes
  – Causal: causes to symptoms

Add "root causes" first, then the variables they influence

⇒ fewer and more comprehensible probabilities
Dempster-Shafer Theory

• Example: disease diagnostics

D = {Allergy, Flu, Cold, Pneumonia}

• Basic probability assignment:

m: 2^D → [0, 1]

m1: {Flu, Cold, Pneumonia} ↦ 0.6
    D ↦ 0.4
Dempster-Shafer Theory

• Belief:

Bel: 2^D → [0, 1]
Bel(S) = Σ_{X ⊆ S} m(X)

• Plausibility:

Pl(p) = 1 − Bel(¬p)
p: [Bel(p), Pl(p)]
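A small Python sketch (added, not from the slides) computing Bel and Pl for the assignment m1 above:

D = frozenset({"Allergy", "Flu", "Cold", "Pneumonia"})

# Basic probability assignment m1 from the example above.
m1 = {frozenset({"Flu", "Cold", "Pneumonia"}): 0.6, D: 0.4}

def bel(S, m):
    """Bel(S): total mass committed to subsets of S."""
    return sum(p for X, p in m.items() if X <= S)

def pl(S, m):
    """Pl(S) = 1 - Bel(complement of S)."""
    return 1 - bel(D - S, m)

S = frozenset({"Flu", "Cold", "Pneumonia"})
print(bel(S, m1), pl(S, m1))        # interval [0.6, 1.0]
print(bel(S, m1) + bel(D - S, m1))  # 0.6 < 1: the gap is ignorance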
Dempster-Shafer Theory

• Ignorance:

Bel(p) + Bel(¬p) ≤ 1
Dempster-Shafer Theory

• Combination rules: given m1 and m2,

m(Z) = Σ_{X ∩ Y = Z} m1(X)·m2(Y) / (1 − Σ_{X ∩ Y = ∅} m1(X)·m2(Y))
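A Python sketch of Dempster's combination rule (added; m2 is a hypothetical second evidence source, not from the slides):

from collections import defaultdict

def combine(m1, m2):
    """Dempster's rule: combine two basic probability assignments."""
    raw = defaultdict(float)
    conflict = 0.0
    for X, p1 in m1.items():
        for Y, p2 in m2.items():
            Z = X & Y
            if Z:
                raw[Z] += p1 * p2
            else:
                conflict += p1 * p2   # mass falling on empty intersections
    return {Z: p / (1 - conflict) for Z, p in raw.items()}

D = frozenset({"Allergy", "Flu", "Cold", "Pneumonia"})
m1 = {frozenset({"Flu", "Cold", "Pneumonia"}): 0.6, D: 0.4}
m2 = {frozenset({"Allergy", "Flu", "Cold"}): 0.8, D: 0.2}  # hypothetical
print(combine(m1, m2))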
Exercises

• In Russell and Norvig's AIMA: Chapters 13-14.