The mind is a neural computer, fitted by natural selection



“The mind is a neural computer, fitted by natural selection with combinatorial algorithms for causal and probabilistic reasoning about plants, animals, objects, and people.”

... “In a universe with any regularities at all, decisions informed about the past are better than decisions made at random. That has always been true, and we would expect organisms, especially informavores such as humans, to have evolved acute intuitions about probability. The founders of probability, like the founders of logic, assumed they were just formalizing common sense.”

Steven Pinker, How the Mind Works, 1997, pp. 524, 343.

© D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.1


Learning Objectives

At the end of the class you should be able to:

justify the use and semantics of probability

know how to compute marginals and apply Bayes’ theorem

build a belief network for a domain

predict the inferences for a belief network

explain the predictions of a causal model


Using Uncertain Knowledge

Agents don’t have complete knowledge about the world.

Agents need to make (informed) decisions given their uncertainty.

It isn’t enough to assume what the world is like. Example: wearing a seat belt.

An agent needs to reason about its uncertainty.

When an agent takes an action under uncertainty, it is gambling ⟹ probability.


Probability

Probability is an agent’s measure of belief in some proposition — subjective probability.

An agent’s belief depends on its prior belief and what it observes.

Example: an agent’s probability of a particular bird flying
▶ Other agents may have different probabilities.
▶ An agent’s belief in a bird’s flying ability is affected by what the agent knows about that bird.


Random Variables

A random variable’s name starts with an upper-case letter.

The domain of a variable X, written domain(X), is the set of values X can take. (Sometimes called the “range”, “frame”, or “possible values”.)

A tuple of random variables ⟨X1, . . . , Xn⟩ is a complex random variable with domain domain(X1) × · · · × domain(Xn). Often the tuple is written as X1, . . . , Xn.

Assignment X = x means variable X has value x.

A proposition is a Boolean formula made from assignments of values to variables or inequalities (e.g., <, ≤, . . . ) between variables and values.


Possible World Semantics

A possible world specifies an assignment of one value to each random variable.

A random variable is a function from possible worlds into the domain of the random variable.

ω ⊨ X = x means variable X is assigned value x in world ω.

Logical connectives have their standard meaning:

ω ⊨ α ∧ β if ω ⊨ α and ω ⊨ β

ω ⊨ α ∨ β if ω ⊨ α or ω ⊨ β

ω ⊨ ¬α if ω ⊭ α

Let Ω be the set of all possible worlds.


Semantics of Probability

Probability defines a measure on sets of possible worlds. A probability measure is a function µ from sets of worlds into the non-negative real numbers such that:

µ(Ω) = 1

µ(S1 ∪ S2) = µ(S1) + µ(S2) if S1 ∩ S2 = ∅.

Then P(α) = µ({ω | ω ⊨ α}).
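This semantics is easy to make concrete. A minimal Python sketch (the variables Shape and Color and all the weights are invented for illustration): each possible world is an assignment of a value to every random variable, µ is a weight on worlds, and P(α) sums the weights of the worlds satisfying α.

```python
# Possible worlds: one assignment of a value to each random variable.
# Variables (Shape, Color) and weights are hypothetical.
worlds = [
    ({"Shape": "circle", "Color": "orange"}, 0.3),
    ({"Shape": "circle", "Color": "blue"},   0.2),
    ({"Shape": "star",   "Color": "orange"}, 0.1),
    ({"Shape": "star",   "Color": "blue"},   0.4),
]

def mu(satisfies):
    """Measure of the set of worlds where satisfies(world) holds."""
    return sum(w for world, w in worlds if satisfies(world))

def P(alpha):
    """P(alpha) = mu({omega | omega satisfies alpha})."""
    return mu(alpha)

assert abs(mu(lambda w: True) - 1.0) < 1e-9  # mu(Omega) = 1
print(P(lambda w: w["Shape"] == "circle"))   # 0.5
```

A proposition is just a predicate on worlds, so conjunctions and disjunctions fall out of ordinary `and`/`or` inside the lambda.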


Semantics

Possible Worlds:

Suppose the measure of each singleton world is 0.1.

What is the probability of circle?

What is the probability of star?

What is the probability of triangle?

What is the probability of orange?

What is the probability of orange and star?

What are the random variables?


Axioms of Probability (finite case)

Three axioms define what follows from a set of probabilities:

Axiom 1 0 ≤ P(a) for any proposition a.

Axiom 2 P(true) = 1

Axiom 3 P(a ∨ b) = P(a) + P(b) if a and b cannot both be true.

These axioms are sound and complete with respect to the semantics.


Conditioning

Probabilistic conditioning specifies how to revise beliefs based on new information.

An agent builds a probabilistic model taking all background information into account. This gives a prior probability.

All other information must be conditioned on.

If evidence e is all of the information obtained subsequently, the conditional probability P(h | e) of h given e is the posterior probability of h.


Semantics of Conditional Probability

Evidence e rules out possible worlds incompatible with e.

Evidence e induces a new measure, µe, over possible worlds:

µe(S) = c × µ(S)  if ω ⊨ e for all ω ∈ S
µe(S) = 0         if ω ⊭ e for all ω ∈ S

We can show c = 1/P(e).

The conditional probability of formula h given evidence e is

P(h | e) = µe({ω : ω ⊨ h}) = P(h ∧ e) / P(e)
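In code, conditioning is just renormalization: drop the worlds incompatible with e and rescale by 1/P(e), which is the same as computing P(h ∧ e)/P(e). A sketch with invented worlds and weights:

```python
# Conditioning over possible worlds; worlds and weights are hypothetical.
worlds = [
    ({"Flu": True,  "Sneeze": True},  0.1),
    ({"Flu": True,  "Sneeze": False}, 0.1),
    ({"Flu": False, "Sneeze": True},  0.2),
    ({"Flu": False, "Sneeze": False}, 0.6),
]

def P(alpha):
    return sum(w for world, w in worlds if alpha(world))

def P_given(h, e):
    """P(h | e) = P(h and e) / P(e); undefined when P(e) = 0."""
    pe = P(e)
    if pe == 0:
        raise ValueError("P(e) = 0: cannot condition on e")
    return P(lambda w: h(w) and e(w)) / pe

# P(Flu | Sneeze) = 0.1 / 0.3
print(P_given(lambda w: w["Flu"], lambda w: w["Sneeze"]))
```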


Conditioning

Possible Worlds:

Observe Color = orange:


Exercise

Flu    Sneeze  Snore   µ
true   true    true    0.064
true   true    false   0.096
true   false   true    0.016
true   false   false   0.024
false  true    true    0.096
false  true    false   0.144
false  false   true    0.224
false  false   false   0.336

What is:

(a) P(flu ∧ sneeze)

(b) P(flu ∧ ¬sneeze)

(c) P(flu)

(d) P(sneeze | flu)

(e) P(¬flu ∧ sneeze)

(f) P(flu | sneeze)

(g) P(sneeze | flu∧snore)

(h) P(flu | sneeze∧snore)
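The joint distribution is small enough to answer these by brute-force enumeration. One way to check your answers in Python (the tuple encoding is mine); parts (a), (c), and (d) are shown, and the rest follow the same pattern:

```python
# Joint distribution from the table: (flu, sneeze, snore) -> mu
joint = {
    (True,  True,  True):  0.064, (True,  True,  False): 0.096,
    (True,  False, True):  0.016, (True,  False, False): 0.024,
    (False, True,  True):  0.096, (False, True,  False): 0.144,
    (False, False, True):  0.224, (False, False, False): 0.336,
}

def P(pred):
    """Probability of a proposition: sum the measure of matching rows."""
    return sum(p for (flu, sneeze, snore), p in joint.items()
               if pred(flu, sneeze, snore))

p_flu_sneeze = P(lambda f, sz, sn: f and sz)   # (a)
p_flu = P(lambda f, sz, sn: f)                 # (c)
p_sneeze_given_flu = p_flu_sneeze / p_flu      # (d)
print(round(p_flu_sneeze, 3), round(p_flu, 3),
      round(p_sneeze_given_flu, 3))            # 0.16 0.2 0.8
```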


Chain Rule

P(f1 ∧ f2 ∧ · · · ∧ fn)
= P(fn | f1 ∧ · · · ∧ fn−1) × P(f1 ∧ · · · ∧ fn−1)
= P(fn | f1 ∧ · · · ∧ fn−1) × P(fn−1 | f1 ∧ · · · ∧ fn−2) × P(f1 ∧ · · · ∧ fn−2)
= P(fn | f1 ∧ · · · ∧ fn−1) × P(fn−1 | f1 ∧ · · · ∧ fn−2) × · · · × P(f3 | f1 ∧ f2) × P(f2 | f1) × P(f1)
= ∏ᵢ₌₁ⁿ P(fi | f1 ∧ · · · ∧ fi−1)
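A numeric sanity check of the chain rule, using the flu/sneeze/snore joint from the exercise above with the ordering f1 = flu, f2 = sneeze, f3 = snore (the decomposition must reproduce the joint entry exactly):

```python
# Verify P(f1 ∧ f2 ∧ f3) = P(f1) × P(f2 | f1) × P(f3 | f1 ∧ f2)
# on the flu/sneeze/snore joint from the exercise slide.
joint = {
    (True,  True,  True):  0.064, (True,  True,  False): 0.096,
    (True,  False, True):  0.016, (True,  False, False): 0.024,
    (False, True,  True):  0.096, (False, True,  False): 0.144,
    (False, False, True):  0.224, (False, False, False): 0.336,
}

def P(pred):
    return sum(p for row, p in joint.items() if pred(*row))

lhs = P(lambda f, sz, sn: f and sz and sn)  # P(flu ∧ sneeze ∧ snore)
rhs = (P(lambda f, sz, sn: f)                                        # P(f1)
       * P(lambda f, sz, sn: f and sz) / P(lambda f, sz, sn: f)      # P(f2|f1)
       * lhs / P(lambda f, sz, sn: f and sz))                        # P(f3|f1∧f2)
assert abs(lhs - rhs) < 1e-12
```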


Bayes’ theorem

The chain rule and commutativity of conjunction (h ∧ e is equivalent to e ∧ h) give us:

P(h ∧ e) = P(h | e) × P(e)
         = P(e | h) × P(h).

If P(e) ≠ 0, divide the right-hand sides by P(e):

P(h | e) = P(e | h) × P(h) / P(e).

This is Bayes’ theorem.
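As a computation this is a one-liner; the fire/alarm numbers below are invented for illustration:

```python
def bayes(p_e_given_h, p_h, p_e):
    """P(h | e) = P(e | h) * P(h) / P(e), assuming P(e) != 0."""
    if p_e == 0:
        raise ValueError("P(e) must be non-zero")
    return p_e_given_h * p_h / p_e

# Hypothetical numbers: P(alarm | fire) = 0.99, P(fire) = 0.01,
# P(alarm) = 0.1  =>  P(fire | alarm)
print(round(bayes(0.99, 0.01, 0.1), 3))  # 0.099
```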


Why is Bayes’ theorem interesting?

Often you have causal knowledge:
P(symptom | disease)
P(light is off | status of switches and switch positions)
P(alarm | fire)
P(image | a tree is in front of a car)

and want to do evidential reasoning:
P(disease | symptom)
P(status of switches | light is off and switch positions)
P(fire | alarm)
P(a tree is in front of a car | image)


Exercise

A cab was involved in a hit-and-run accident at night. Two cab companies, the Green and the Blue, operate in the city. You are given the following data:

85% of the cabs in the city are Green and 15% are Blue.

A witness identified the cab as Blue. The court tested the reliability of the witness in the circumstances that existed on the night of the accident and concluded that the witness correctly identified each of the two colours 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue?

[From D. Kahneman, Thinking, Fast and Slow, 2011, p. 166.]
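One way to set up the computation with Bayes’ theorem (note: this sketch gives the answer away):

```python
# Bayes' theorem applied to the cab problem.
p_blue = 0.15                 # prior: proportion of Blue cabs
p_green = 0.85
p_say_blue_given_blue = 0.80  # witness reliability
p_say_blue_given_green = 0.20

# P(witness says Blue), by reasoning over the two cases:
p_say_blue = (p_say_blue_given_blue * p_blue
              + p_say_blue_given_green * p_green)

p_blue_given_say_blue = p_say_blue_given_blue * p_blue / p_say_blue
print(round(p_blue_given_say_blue, 3))  # 0.414
```

Despite the reliable witness, the low prior keeps the posterior under one half.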


Conditional independence

Random variable X is independent of random variable Y given random variable(s) Z if

P(X | Y , Z) = P(X | Z)

i.e., for all x ∈ dom(X), y, y′ ∈ dom(Y), and z ∈ dom(Z),

P(X = x | Y = y ∧ Z = z)
= P(X = x | Y = y′ ∧ Z = z)
= P(X = x | Z = z).

That is, knowledge of Y’s value doesn’t affect the belief in the value of X, given a value of Z.
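The definition can be checked by enumeration. In the flu/sneeze/snore joint from the earlier exercise, Sneeze turns out to be independent of Snore given Flu: additionally conditioning on either value of Snore leaves P(Sneeze | Flu) unchanged. A sketch:

```python
# Joint from the earlier exercise: (flu, sneeze, snore) -> mu
joint = {
    (True,  True,  True):  0.064, (True,  True,  False): 0.096,
    (True,  False, True):  0.016, (True,  False, False): 0.024,
    (False, True,  True):  0.096, (False, True,  False): 0.144,
    (False, False, True):  0.224, (False, False, False): 0.336,
}

def P(pred):
    return sum(p for row, p in joint.items() if pred(*row))

def cond(h, e):
    """P(h | e) = P(h and e) / P(e)."""
    return P(lambda *r: h(*r) and e(*r)) / P(e)

# Sneeze independent of Snore given Flu: for every value of Flu
# and Snore, P(Sneeze | Flu ∧ Snore) equals P(Sneeze | Flu).
for flu_val in (True, False):
    for snore_val in (True, False):
        lhs = cond(lambda f, sz, sn: sz,
                   lambda f, sz, sn: f == flu_val and sn == snore_val)
        rhs = cond(lambda f, sz, sn: sz,
                   lambda f, sz, sn: f == flu_val)
        assert abs(lhs - rhs) < 1e-9
```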

© D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.2


Example

Consider a student writing an exam. What are reasonable independences among the following?

Whether the student works hard

Whether the student is intelligent

The student’s answers on the exam

The student’s mark on an exam


Example domain (diagnostic assistant)

[Figure: wiring diagram of the house — outside power feeding circuit breakers cb1 and cb2, power outlets p1 and p2, switches s1–s3 (including a two-way switch, each with on/off positions), wires w0–w6, and lights l1 and l2.]


Examples of conditional independence?

Suppose you know whether there was power in w1 and whether there was power in w2. What information is relevant to whether light l1 is lit? What is independent?

Whether light l1 is lit is independent of the position of light switch s2 given what?

Every other variable may be independent of whether light l1 is lit given whether there is power in wire w0 and the status of light l1 (if it’s ok, or if not, how it’s broken).


Idea of belief networks

Whether l1 is lit (L1_lit) depends only on the status of the light (L1_st) and whether there is power in wire w0.

In a belief network, W0 and L1_st are parents of L1_lit.

[Figure: network fragment — w1, w2, s2_pos, and s2_st are parents of w0; w0 and l1_st are parents of l1_lit.]

W0 depends only on whether there is power in w1, whether there is power in w2, the position of switch s2 (S2_pos), and the status of switch s2 (S2_st).

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 9 5 / 12

Page 62: The mind is a neural computer, fitted by natural selection

Idea of belief networks

l1 is lit (L1_lit) depends only on the status of the light (L1_st) and whether there is power in wire w0.

In a belief network, W0 and L1_st are parents of L1_lit.

[Figure: network fragment with w1, w2, s2_pos, and s2_st as parents of w0, and w0 and l1_st as parents of l1_lit]

W0 depends only on whether there is power in w1, whether there is power in w2, the position of switch s2 (S2_pos), and the status of switch s2 (S2_st).

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 10 5 / 12

Page 64: The mind is a neural computer, fitted by natural selection

Belief networks

Totally order the variables of interest: X1, . . . , Xn

Theorem of probability theory (chain rule):
P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | X1, . . . , Xi−1)

The parents parents(Xi) of Xi are those predecessors of Xi that render Xi independent of the other predecessors. That is, parents(Xi) ⊆ {X1, . . . , Xi−1} and
P(Xi | parents(Xi)) = P(Xi | X1, . . . , Xi−1)

So P(X1, . . . , Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))

A belief network is a graph: the nodes are random variables; there is an arc from the parents of each node into that node.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 12 6 / 12
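The parent factorization P(X1, . . . , Xn) = ∏ P(Xi | parents(Xi)) can be checked on a tiny network. A minimal sketch, assuming a hypothetical three-variable chain Fire → Alarm → Leaving with made-up numbers:

```python
# Hypothetical CPTs for the chain Fire -> Alarm -> Leaving (illustrative numbers).
P_fire = {True: 0.01, False: 0.99}                   # P(Fire)
P_alarm = {True: {True: 0.95, False: 0.05},          # P(Alarm | Fire)
           False: {True: 0.01, False: 0.99}}
P_leaving = {True: {True: 0.88, False: 0.12},        # P(Leaving | Alarm)
             False: {True: 0.001, False: 0.999}}

def joint(fire, alarm, leaving):
    """P(Fire, Alarm, Leaving) via the parent factorization."""
    return P_fire[fire] * P_alarm[fire][alarm] * P_leaving[alarm][leaving]

# The factorized joint sums to 1 over all assignments, as a distribution must.
total = sum(joint(f, a, l)
            for f in (True, False) for a in (True, False) for l in (True, False))
```

The network needs 1 + 2 + 2 = 5 independent numbers here instead of the 7 a full joint table over three binary variables would need.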

Page 65: The mind is a neural computer, fitted by natural selection

Example: fire alarm belief network

Variables:

Fire: there is a fire in the building

Tampering: someone has been tampering with the fire alarm

Smoke: what appears to be smoke is coming from an upstairs window

Alarm: the fire alarm goes off

Leaving: people are leaving the building en masse.

Report: a colleague says that people are leaving the building en masse. (A noisy sensor for leaving.)

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 13 7 / 12

Page 66: The mind is a neural computer, fitted by natural selection

Components of a belief network

A belief network consists of:

a directed acyclic graph with nodes labeled with random variables

a domain for each random variable

a set of conditional probability tables for each variable given its parents (including prior probabilities for nodes with no parents).

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 14 8 / 12

Page 67: The mind is a neural computer, fitted by natural selection

Example belief network

[Figure: belief network for the electrical domain, with nodes Outside_power, Cb1_st, Cb2_st, W0–W6, S1_pos, S2_pos, S3_pos, S1_st, S2_st, S3_st, P1, P2, L1_st, L1_lit, L2_st, L2_lit]

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 15 9 / 12

Page 68: The mind is a neural computer, fitted by natural selection

Example belief network (continued)

The belief network also specifies:

The domain of the variables:
W0, . . . , W6 have domain {live, dead}
S1_pos, S2_pos, and S3_pos have domain {up, down}
S1_st has domain {ok, upside_down, short, intermittent, broken}

Conditional probabilities, including:
P(W1 = live | S1_pos = up ∧ S1_st = ok ∧ W3 = live)
P(W1 = live | S1_pos = up ∧ S1_st = ok ∧ W3 = dead)
P(S1_pos = up)
P(S1_st = upside_down)

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 16 10 / 12

Page 69: The mind is a neural computer, fitted by natural selection

Belief network summary

A belief network is a directed acyclic graph (DAG) where nodes are random variables.

The parents of a node n are those variables on which n directly depends.

A belief network is automatically acyclic by construction.

A belief network is a graphical representation of dependence and independence:

I A variable is independent of its non-descendants given its parents.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 17 11 / 12

Page 70: The mind is a neural computer, fitted by natural selection

Constructing belief networks

To represent a domain in a belief network, you need to consider:

What are the relevant variables?
I What will you observe?
I What would you like to find out (query)?
I What other features make the model simpler?

What values should these variables take?

What is the relationship between them? This should be expressed in terms of a directed graph, representing how each variable is generated from its predecessors.

How does the value of each variable depend on its parents? This is expressed in terms of the conditional probabilities.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.2, Page 18 12 / 12

Page 75: The mind is a neural computer, fitted by natural selection

Understanding Independence: Common ancestors

[Figure: fire with arcs to alarm and to smoke]

alarm and smoke are dependent

alarm and smoke are independent given fire

Intuitively, fire can explain alarm and smoke; learning one can affect the other by changing your belief in fire.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 5 1 / 20

Page 80: The mind is a neural computer, fitted by natural selection

Understanding Independence: Chain

[Figure: alarm with an arc to leaving, and leaving with an arc to report]

alarm and report are dependent

alarm and report are independent given leaving

Intuitively, the only way that the alarm affects report is by affecting leaving.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 10 2 / 20

Page 85: The mind is a neural computer, fitted by natural selection

Understanding Independence: Common descendants

[Figure: tampering and fire, each with an arc to alarm]

tampering and fire are independent

tampering and fire are dependent given alarm

Intuitively, tampering can explain away fire.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 15 3 / 20

Page 86: The mind is a neural computer, fitted by natural selection

Understanding independence: example

[Figure: example belief network over variables A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S]

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 16 4 / 20

Page 87: The mind is a neural computer, fitted by natural selection

Understanding independence: questions

1. On which given probabilities does P(N) depend?

2. If you were to observe a value for B, which variables’ probabilities will change?

3. If you were to observe a value for N, which variables’ probabilities will change?

4. Suppose you had observed a value for M; if you were to then observe a value for N, which variables’ probabilities will change?

5. Suppose you had observed B and Q; which variables’ probabilities will change when you observe N?

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 17 5 / 20

Page 92: The mind is a neural computer, fitted by natural selection

What variables are affected by observing?

If you observe variable(s) Y, the variables whose posterior probability is different from their prior are:

I The ancestors of Y and
I their descendants.

Intuitively (if you have a causal belief network):
I You do abduction to possible causes and
I prediction from the causes.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 22 6 / 20

Page 93: The mind is a neural computer, fitted by natural selection

d-separation

A connection is a meeting of arcs in a belief network. Whether a connection is open is defined as follows:

I If there are arcs A → B and B → C such that B ∉ Z, then the connection at B between A and C is open.

I If there are arcs B → A and B → C such that B ∉ Z, then the connection at B between A and C is open.

I If there are arcs A → B and C → B such that B (or a descendant of B) is in Z, then the connection at B between A and C is open.

X is d-connected to Y given Z if there is a path from X to Y along open connections.

X is d-separated from Y given Z if it is not d-connected.

X is independent of Y given Z for all conditional probabilities iff X is d-separated from Y given Z.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 23 7 / 20
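The three open-connection rules translate directly into a path-checking procedure. A sketch (not the book's code) on the fire-alarm network: `d_separated` enumerates undirected paths, blocking a path at a chain or fork whose middle node is in Z, and at a collider whose middle node and its descendants all lie outside Z:

```python
# Assumed edge set for the fire-alarm network from the earlier slide.
edges = {("tampering", "alarm"), ("fire", "alarm"), ("fire", "smoke"),
         ("alarm", "leaving"), ("leaving", "report")}

def parents(v):
    return {a for (a, b) in edges if b == v}

def children(v):
    return {b for (a, b) in edges if a == v}

def descendants(v):
    out, stack = set(), [v]
    while stack:
        for c in children(stack.pop()):
            if c not in out:
                out.add(c)
                stack.append(c)
    return out

def paths(x, y, path):
    # All undirected paths from x to y that do not revisit a node.
    if x == y:
        yield path
        return
    for n in parents(x) | children(x):
        if n not in path:
            yield from paths(n, y, path + [n])

def d_separated(x, y, z):
    """True iff every path from x to y is blocked by the evidence set z."""
    for path in paths(x, y, [x]):
        open_path = True
        for a, b, c in zip(path, path[1:], path[2:]):
            if a in parents(b) and c in parents(b):       # collider at b
                open_conn = b in z or descendants(b) & z
            else:                                         # chain or fork at b
                open_conn = b not in z
            if not open_conn:
                open_path = False
                break
        if open_path:
            return False      # found an open (active) path
    return True
```

Observing alarm, or its descendant report, opens the tampering–alarm–fire collider, reproducing the "explaining away" slide.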

Page 96: The mind is a neural computer, fitted by natural selection

Markov Random Field

A Markov random field is composed of:

a set of discrete-valued random variables X = {X1, X2, . . . , Xn}, and

a set of factors f1, . . . , fm, where a factor is a non-negative function of a subset of the variables,

and defines a joint probability distribution:

P(X = x) = (1/Z) ∏_k fk(Xk = xk)

Z = ∑_x ∏_k fk(Xk = xk)

where fk(Xk) is a factor on Xk ⊆ X, and xk is x projected onto Xk. Z is a normalization constant known as the partition function.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 26 8 / 20
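The definition is easy to check numerically. A minimal sketch with two hypothetical factors over binary variables A, B, C; the partition function Z normalizes the product of factors into a distribution:

```python
from itertools import product

# Hypothetical factors f1(A, B) and f2(B, C) over binary variables.
f1 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
f2 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

# Partition function: sum of the factor product over all assignments.
Z = sum(f1[a, b] * f2[b, c] for a, b, c in product((0, 1), repeat=3))

def prob(a, b, c):
    """P(A=a, B=b, C=c) = f1(a, b) * f2(b, c) / Z."""
    return f1[a, b] * f2[b, c] / Z

total = sum(prob(a, b, c) for a, b, c in product((0, 1), repeat=3))
```

Note the factors themselves need not sum to anything in particular; Z absorbs the scale, which is why many parameterizations give the same distribution.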

Page 97: The mind is a neural computer, fitted by natural selection

Markov Networks and Factor graphs

A Markov network is a graphical representation of a Markov random field where the nodes are the random variables and there is an arc between any two variables that are in a factor together.

A factor graph is a bipartite graph, which contains a variable node for each random variable and a factor node for each factor. There is an edge between a variable node and a factor node if the variable appears in the factor.

A belief network is a type of Markov random field where the factors represent conditional probabilities, there is a factor for each variable, and the directed graph is acyclic.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 27 9 / 20

Page 101: The mind is a neural computer, fitted by natural selection

Independence in a Markov Network

The Markov blanket of a variable X is the set of variables that are in factors with X.

A variable is independent of the other variables given its Markov blanket.

X is connected to Y given Z if there is a path from X to Y in the Markov network that does not contain an element of Z.

X is separated from Y given Z if it is not connected.

A positive distribution is one that does not contain zero probabilities.

X is independent of Y given Z for all positive distributions iff X is separated from Y given Z.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 31 10 / 20
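Both notions above follow from the factor scopes alone. A sketch, assuming a hypothetical chain-shaped network over A, B, C, D:

```python
# Hypothetical Markov network given by its factor scopes: A-B, B-C, C-D.
factors = [{"A", "B"}, {"B", "C"}, {"C", "D"}]

def markov_blanket(x):
    """Variables that share some factor with x."""
    return set().union(*(s for s in factors if x in s)) - {x}

def separated(x, y, z):
    """True iff every path from x to y in the Markov network hits z."""
    frontier, seen = [x], {x}
    while frontier:
        u = frontier.pop()
        for s in factors:
            if u not in s:
                continue
            for v in s - {u}:
                if v == y:
                    return False          # reached y without entering z
                if v not in seen and v not in z:
                    seen.add(v)
                    frontier.append(v)
    return True
```

Unlike d-separation, observing more variables here can only block more paths; there is no collider case.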

Page 104: The mind is a neural computer, fitted by natural selection

Canonical Representations

The parameters of a graphical model are the numbers that define the model.

A belief network is a canonical representation: given the structure and the distribution, the parameters are uniquely determined.

A Markov random field is not a canonical representation. Many different parameterizations result in the same distribution.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 34 11 / 20

Page 105: The mind is a neural computer, fitted by natural selection

Representations of Conditional Probabilities

There are many representations of conditional probabilities and factors:

Tables

Decision Trees

Rules

Weighted Logical Formulae

Contextual Tables

Logistic Functions

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 35 12 / 20

Page 106: The mind is a neural computer, fitted by natural selection

Tabular Representation

P(D | A, B, C):

A B C D Prob
true true true true 0.9
true true true false 0.1
true true false true 0.9
true true false false 0.1
true false true true 0.2
true false true false 0.8
true false false true 0.2
true false false false 0.8
false true true true 0.3
false true true false 0.7
false true false true 0.4
false true false false 0.6
false false true true 0.3
false false true false 0.7
false false false true 0.4
false false false false 0.6

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 36 13 / 20

Page 107: The mind is a neural computer, fitted by natural selection

Decision Tree Representation

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 37 14 / 20

Page 108: The mind is a neural computer, fitted by natural selection

Rule Representation

0.9 : d ← a ∧ b

0.2 : d ← a ∧ ¬b

0.3 : d ← ¬a ∧ c

0.4 : d ← ¬a ∧ ¬c

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 38 15 / 20

Page 109: The mind is a neural computer, fitted by natural selection

Weighted Logical Formulae

d ↔ ((a ∧ b ∧ n0)
  ∨ (a ∧ ¬b ∧ n1)
  ∨ (¬a ∧ c ∧ n2)
  ∨ (¬a ∧ ¬c ∧ n3))

The ni are independent:

P(n0) = 0.9
P(n1) = 0.2
P(n2) = 0.3
P(n3) = 0.4

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 39 16 / 20
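Summing out the independent noise variables recovers the rule probabilities of the previous slide, e.g. P(d | a ∧ b) = P(n0) = 0.9, because exactly one disjunct's context holds for any assignment of a, b, c. A sketch:

```python
from itertools import product

p_n = [0.9, 0.2, 0.3, 0.4]   # P(n0), P(n1), P(n2), P(n3)

def p_d(a, b, c):
    """P(d | A=a, B=b, C=c), summing over the independent noise variables."""
    total = 0.0
    for n in product((True, False), repeat=4):
        weight = 1.0
        for ni, pi in zip(n, p_n):
            weight *= pi if ni else 1 - pi       # P(n0, n1, n2, n3)
        d = ((a and b and n[0]) or (a and not b and n[1])
             or (not a and c and n[2]) or (not a and not c and n[3]))
        if d:
            total += weight
    return total
```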

Page 110: The mind is a neural computer, fitted by natural selection

Contextual-Table Representation

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 40 17 / 20

Page 111: The mind is a neural computer, fitted by natural selection

Logistic Functions

P(h | e) = P(h ∧ e) / P(e)
  = P(h ∧ e) / (P(h ∧ e) + P(¬h ∧ e))
  = 1 / (1 + P(¬h ∧ e)/P(h ∧ e))
  = 1 / (1 + e^(−log(P(h ∧ e)/P(¬h ∧ e))))
  = sigmoid(log odds(h | e))

where

sigmoid(x) = 1 / (1 + e^(−x))

odds(h | e) = P(h ∧ e) / P(¬h ∧ e)

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 41 18 / 20
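The identity P(h | e) = sigmoid(log odds(h | e)) can be checked numerically. A sketch with hypothetical joint probabilities:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Hypothetical joint probabilities P(h ∧ e) and P(¬h ∧ e).
p_he, p_nhe = 0.15, 0.05

direct = p_he / (p_he + p_nhe)              # P(h | e) from the definition
via_odds = sigmoid(math.log(p_he / p_nhe))  # sigmoid of the log-odds
```

Both routes give 0.75 here, agreeing up to floating-point rounding.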

Page 116: The mind is a neural computer, fitted by natural selection

Logistic Functions

A conditional probability is the sigmoid of the log-odds.

[Figure: plot of 1/(1 + e^(−x)) for x from −10 to 10, rising from 0 toward 1]

sigmoid(x) = 1 / (1 + e^(−x))

A logistic function is the sigmoid of a linear function.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 46 19 / 20

Page 117: The mind is a neural computer, fitted by natural selection

Logistic Representation of Conditional Probability

P(d | A, B, C) = sigmoid(0.9† ∗ A ∗ B
  + 0.2† ∗ A ∗ (1 − B)
  + 0.3† ∗ (1 − A) ∗ C
  + 0.4† ∗ (1 − A) ∗ (1 − C))

where 0.9† is sigmoid⁻¹(0.9).

P(d | A, B, C) = sigmoid(0.4†
  + (0.2† − 0.4†) ∗ A
  + (0.9† − 0.2†) ∗ A ∗ B
  + ...)

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.3, Page 47 20 / 20
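With sigmoid⁻¹ written as the logit function, the first form above reproduces the tabular CPT exactly, since exactly one indicator product is 1 for any assignment of A, B, C. A sketch:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def logit(p):
    """sigmoid^{-1}: the value written 0.9-dagger on the slide is logit(0.9)."""
    return math.log(p / (1 - p))

def p_d(a, b, c):
    # First logistic form: one indicator product per context (a, b, c in {0, 1}).
    return sigmoid(logit(0.9) * a * b
                   + logit(0.2) * a * (1 - b)
                   + logit(0.3) * (1 - a) * c
                   + logit(0.4) * (1 - a) * (1 - c))
```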

Page 119: The mind is a neural computer, fitted by natural selection

Belief network inference

Main approaches to determine posterior distributions in graphical models:

Variable Elimination, recursive conditioning: exploit the structure of the network to eliminate (sum out) the non-observed, non-query variables one at a time.

Stochastic simulation: random cases are generated according to the probability distributions.

Variational methods: find the closest tractable distribution to the (posterior) distribution we are interested in.

Bounding approaches: bound the conditional probabilities above and below and iteratively reduce the bounds.

. . .

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.4, Page 1 1 / 17
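Stochastic simulation is the easiest of these to sketch: sample each variable from its conditional distribution given its already-sampled parents, then average. A minimal forward-sampling sketch with hypothetical numbers for a Fire → Alarm fragment:

```python
import random

random.seed(0)  # reproducible run

def sample_alarm():
    """One forward sample through the hypothetical network Fire -> Alarm."""
    fire = random.random() < 0.01
    alarm = random.random() < (0.95 if fire else 0.01)
    return alarm

n = 100_000
estimate = sum(sample_alarm() for _ in range(n)) / n
# Exact marginal for comparison: P(alarm) = 0.01 * 0.95 + 0.99 * 0.01 = 0.0194
```

The estimate converges to 0.0194 at the usual Monte Carlo rate, O(1/√n).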

Page 123: The mind is a neural computer, fitted by natural selection

Factors

A factor is a representation of a function from a tuple of random variables into a number.

We write factor f on variables X1, . . . , Xj as f(X1, . . . , Xj).

We can assign some or all of the variables of a factor:
I f(X1 = v1, X2, . . . , Xj), where v1 ∈ dom(X1), is a factor on X2, . . . , Xj.
I f(X1 = v1, X2 = v2, . . . , Xj = vj) is a number that is the value of f when each Xi has value vi.

The former is also written as f(X1, X2, . . . , Xj)_{X1 = v1}, etc.

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.4, Page 5 2 / 17

Page 125: The mind is a neural computer, tted by natural selection

Example factors

r(X, Y, Z):

X Y Z   val
t t t   0.1
t t f   0.9
t f t   0.2
t f f   0.8
f t t   0.4
f t f   0.6
f f t   0.3
f f f   0.7

r(X=t, Y, Z):

Y Z   val
t t   0.1
t f   0.9
f t   0.2
f f   0.8

r(X=t, Y, Z=f):

Y   val
t   0.9
f   0.8

r(X=t, Y=f, Z=f) = 0.8

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 3/17
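The factors in the tables above can be sketched concretely. As an illustrative assumption (the slides define factors abstractly), represent a factor as a tuple of variable names plus a table mapping each assignment of values to a number; assigning a variable then just filters and shrinks the table:

```python
def assign(variables, table, var, value):
    """Assign one variable of a factor to a value, returning a smaller factor."""
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {key[:i] + key[i + 1:]: v
                 for key, v in table.items() if key[i] == value}
    return new_vars, new_table

# The factor r(X, Y, Z) from the tables above:
r_vars = ("X", "Y", "Z")
r_table = {("t", "t", "t"): 0.1, ("t", "t", "f"): 0.9,
           ("t", "f", "t"): 0.2, ("t", "f", "f"): 0.8,
           ("f", "t", "t"): 0.4, ("f", "t", "f"): 0.6,
           ("f", "f", "t"): 0.3, ("f", "f", "f"): 0.7}

yz_vars, r_xt = assign(r_vars, r_table, "X", "t")    # r(X=t, Y, Z)
y_vars, r_xt_zf = assign(yz_vars, r_xt, "Z", "f")    # r(X=t, Y, Z=f)
print(r_xt_zf[("f",)])                               # r(X=t, Y=f, Z=f) -> 0.8
```

Each assignment removes one column from the table, matching the shrinking tables on the slide.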


Multiplying factors

The product of factor f1(X, Y) and f2(Y, Z), where Y are the variables in common, is the factor (f1 ∗ f2)(X, Y, Z) defined by:

(f1 ∗ f2)(X, Y, Z) = f1(X, Y) f2(Y, Z).

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 4/17


Multiplying factors example

f1:

A B   val
t t   0.1
t f   0.9
f t   0.2
f f   0.8

f2:

B C   val
t t   0.3
t f   0.7
f t   0.6
f f   0.4

f1 ∗ f2:

A B C   val
t t t   0.03
t t f   0.07
t f t   0.54
t f f   0.36
f t t   0.06
f t f   0.14
f f t   0.48
f f f   0.32

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 5/17
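Using a table representation like the one above (an illustrative assumption, with boolean variables as in the example), the product can be computed by enumerating assignments to the union of the variables and looking up each operand's row:

```python
from itertools import product

def multiply(vars1, table1, vars2, table2):
    """Product of two factors; rows are matched on shared variable names."""
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    domain = ("t", "f")  # boolean variables, as in the example
    out = {}
    for key in product(domain, repeat=len(out_vars)):
        val = dict(zip(out_vars, key))
        k1 = tuple(val[v] for v in vars1)
        k2 = tuple(val[v] for v in vars2)
        out[key] = table1[k1] * table2[k2]
    return out_vars, out

f1 = {("t", "t"): 0.1, ("t", "f"): 0.9, ("f", "t"): 0.2, ("f", "f"): 0.8}
f2 = {("t", "t"): 0.3, ("t", "f"): 0.7, ("f", "t"): 0.6, ("f", "f"): 0.4}
abc_vars, f12 = multiply(("A", "B"), f1, ("B", "C"), f2)
print(abc_vars, f12[("t", "t", "t")])   # ('A', 'B', 'C') 0.03
```

Note the result has one row per assignment of A, B, C, reproducing the f1 ∗ f2 table on the slide.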


Summing out variables

We can sum out a variable, say X1 with range {v1, . . . , vk}, from factor f(X1, . . . , Xj), resulting in a factor on X2, . . . , Xj defined by:

(∑_{X1} f)(X2, . . . , Xj) = f(X1 = v1, . . . , Xj) + · · · + f(X1 = vk, . . . , Xj)

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 6/17


Summing out a variable example

f3:

A B C   val
t t t   0.03
t t f   0.07
t f t   0.54
t f f   0.36
f t t   0.06
f t f   0.14
f f t   0.48
f f f   0.32

∑_B f3:

A C   val
t t   0.57
t f   0.43
f t   0.54
f f   0.46

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 7/17
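With the same table representation (again an illustrative assumption), summing out a variable adds up the rows that agree on the remaining variables, as in the ∑_B f3 table above:

```python
def sum_out(variables, table, var):
    """Sum out `var` from a factor, returning a factor on the remaining variables."""
    i = variables.index(var)
    new_vars = variables[:i] + variables[i + 1:]
    new_table = {}
    for key, v in table.items():
        reduced = key[:i] + key[i + 1:]          # drop the column for `var`
        new_table[reduced] = new_table.get(reduced, 0.0) + v
    return new_vars, new_table

f3 = {("t", "t", "t"): 0.03, ("t", "t", "f"): 0.07,
      ("t", "f", "t"): 0.54, ("t", "f", "f"): 0.36,
      ("f", "t", "t"): 0.06, ("f", "t", "f"): 0.14,
      ("f", "f", "t"): 0.48, ("f", "f", "f"): 0.32}

ac_vars, f3_sum_b = sum_out(("A", "B", "C"), f3, "B")
print(ac_vars, round(f3_sum_b[("t", "t")], 2))   # ('A', 'C') 0.57
```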


Exercise

Given factors:

s:

A   val
t   0.75
f   0.25

t:

A B   val
t t   0.6
t f   0.4
f t   0.2
f f   0.8

o:

A   val
t   0.3
f   0.1

What are the following factors over?

(a) s ∗ t
(b) ∑_A s ∗ t
(c) ∑_B s ∗ t
(d) ∑_A ∑_B s ∗ t
(e) s ∗ t ∗ o
(f) ∑_B s ∗ t ∗ o

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 8/17


Evidence

If we want to compute the posterior probability of Z given evidence Y1 = v1 ∧ . . . ∧ Yj = vj:

P(Z | Y1 = v1, . . . , Yj = vj)

= P(Z, Y1 = v1, . . . , Yj = vj) / P(Y1 = v1, . . . , Yj = vj)

= P(Z, Y1 = v1, . . . , Yj = vj) / ∑_Z P(Z, Y1 = v1, . . . , Yj = vj).

So the computation reduces to computing P(Z, Y1 = v1, . . . , Yj = vj).

Normalize at the end, by summing out Z and dividing.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 9/17
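Once the unnormalized values P(Z = z, Y1 = v1, . . . , Yj = vj) are available for each z, the final division is a one-line normalization (the numbers below are hypothetical, chosen only to illustrate the step):

```python
# Hypothetical unnormalized values of P(Z=z, Y1=v1, ..., Yj=vj) for a boolean Z:
unnormalized = {"t": 0.036, "f": 0.084}

total = sum(unnormalized.values())           # P(Y1=v1, ..., Yj=vj), by summing out Z
posterior = {z: p / total for z, p in unnormalized.items()}
print(posterior["t"])                        # 0.036 / 0.12 = 0.3
```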


Probability of a conjunction

Suppose the variables of the belief network are X1, . . . , Xn. To compute P(Z, Y1 = v1, . . . , Yj = vj), we sum out the other variables, {Z1, . . . , Zk} = {X1, . . . , Xn} − {Z} − {Y1, . . . , Yj}. We order the Zi into an elimination ordering.

P(Z, Y1 = v1, . . . , Yj = vj)

= ∑_{Zk} · · · ∑_{Z1} P(X1, . . . , Xn)_{Y1 = v1, . . . , Yj = vj}

= ∑_{Zk} · · · ∑_{Z1} ∏_{i=1}^{n} P(Xi | parents(Xi))_{Y1 = v1, . . . , Yj = vj}.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 10/17


Computing sums of products

Computation in belief networks reduces to computing sums of products.

How can we compute ab + ac efficiently?

Distribute out the a, giving a(b + c).

How can we compute ∑_{Z1} ∏_{i=1}^{n} P(Xi | parents(Xi)) efficiently?

Distribute out those factors that don't involve Z1.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 11/17


Variable elimination algorithm

To compute P(Z | Y1 = v1 ∧ . . . ∧ Yj = vj):

Construct a factor for each conditional probability.

Set the observed variables to their observed values.

Sum out each of the other variables (the Z1, . . . , Zk) according to some elimination ordering.

Multiply the remaining factors. Normalize by dividing the resulting factor f(Z) by ∑_Z f(Z).

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 12/17
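The four steps can be sketched end to end on a toy two-variable network A → B. The network, its numbers, and the table-based factor representation are all illustrative assumptions, not from the slides:

```python
from itertools import product

DOM = ("t", "f")  # boolean variables

def observe(vars_, table, var, value):
    """Step 2 helper: set an observed variable to its value."""
    i = vars_.index(var)
    return (vars_[:i] + vars_[i + 1:],
            {k[:i] + k[i + 1:]: v for k, v in table.items() if k[i] == value})

def multiply(vars1, t1, vars2, t2):
    """Step 4 helper: product of two factors, joined on shared variables."""
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out = {}
    for key in product(DOM, repeat=len(out_vars)):
        a = dict(zip(out_vars, key))
        out[key] = t1[tuple(a[v] for v in vars1)] * t2[tuple(a[v] for v in vars2)]
    return out_vars, out

# Step 1: one factor per conditional probability (made-up numbers).
pA = (("A",), {("t",): 0.4, ("f",): 0.6})
pBgA = (("A", "B"), {("t", "t"): 0.9, ("t", "f"): 0.1,
                     ("f", "t"): 0.2, ("f", "f"): 0.8})

# Step 2: set the observed variable B to its observed value t.
fB = observe(*pBgA, "B", "t")

# Step 3: nothing to sum out in this tiny network (no other variables).

# Step 4: multiply the remaining factors and normalize.
q_vars, q = multiply(pA[0], pA[1], fB[0], fB[1])
total = sum(q.values())
posterior = {k: v / total for k, v in q.items()}
print(posterior[("t",)])   # P(A=t | B=t) = 0.36 / 0.48 = 0.75
```

In a larger network, step 3 would repeatedly multiply the factors containing the next variable in the elimination ordering and sum it out, exactly as described on the following slide.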


Summing out a variable

To sum out a variable Zj from a product f1, . . . , fk of factors:

Partition the factors into

  those that don't contain Zj, say f1, . . . , fi,
  those that contain Zj, say fi+1, . . . , fk.

We know:

∑_{Zj} f1 ∗ · · · ∗ fk = f1 ∗ · · · ∗ fi ∗ (∑_{Zj} fi+1 ∗ · · · ∗ fk).

Explicitly construct a representation of the rightmost factor. Replace the factors fi+1, . . . , fk by the new factor.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 13/17


Example

(Belief network over A, B, C, D, E, F, G with factors P(A), P(B|AC), P(C), P(D|C), P(E|B), P(F|E), P(G|ED).)

P(E | g) = P(E ∧ g) / ∑_E P(E ∧ g)

P(E ∧ g)

= ∑_F ∑_B ∑_C ∑_A ∑_D P(A) P(B|AC) P(C) P(D|C) P(E|B) P(F|E) P(g|ED)

= (∑_F P(F|E)) ∑_B P(E|B) ∑_C (P(C) (∑_A P(A) P(B|AC)) (∑_D P(D|C) P(g|ED)))

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 14/17


Variable elimination example

(Belief network over A, B, C, D, E, F, G, H, I with factors P(A), P(B|A), P(C), P(D|BC), P(E|C), P(F|D), P(G|FE), P(H|G), P(I|G).)

P(A)P(B|A)  --elim A-->  f1(B)

P(C)P(D|BC)P(E|C)  --elim C-->  f2(BDE)

P(F|D)P(G|FE)

P(H|G)  --obs H-->  f3(G)

P(I|G)  --elim I-->  f4(G)

P(D, h) = ... (∑_A P(A)P(B|A)) (∑_I P(I|G))

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 15/17


Variable Elimination example

(Belief network: chain A → B → C → D → E → F, with G a child of C and H a child of E.)

Query: P(G | f); elimination ordering: A, H, E, D, B, C

P(G | f) ∝ ∑_C ∑_B ∑_D ∑_E ∑_H ∑_A P(A) P(B|A) P(C|B) P(D|C) P(E|D) P(f|E) P(G|C) P(H|E)

= ∑_C (∑_B (∑_A P(A) P(B|A)) P(C|B)) P(G|C) (∑_D P(D|C) (∑_E P(E|D) P(f|E) ∑_H P(H|E)))

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 16/17


Pruning Irrelevant Variables (Belief networks)

Suppose you want to compute P(X | e1 . . . ek):

Prune any variables that have no observed or queried descendants.

Connect the parents of any observed variable.

Remove arc directions.

Remove observed variables.

Remove any variables not connected to X in the resulting (undirected) graph.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.4, Slide 17/17


Markov chain

A Markov chain is a special sort of belief network:

S0 → S1 → S2 → S3 → S4

What probabilities need to be specified?

P(S0) specifies the initial conditions.

P(Si+1 | Si) specifies the dynamics.

What independence assumptions are made?

P(Si+1 | S0, . . . , Si) = P(Si+1 | Si).

Often St represents the state at time t. Intuitively, St conveys all of the information about the history that can affect the future states.

"The future is independent of the past given the present."

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Slide 1/23
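These two distributions, P(S0) and P(Si+1 | Si), are all that is needed to simulate the chain. A minimal sketch with a made-up two-state weather model (the states and numbers are assumptions for illustration):

```python
import random

random.seed(0)
states = ("sunny", "rainy")
p_s0 = {"sunny": 0.8, "rainy": 0.2}                  # P(S0): initial conditions
p_next = {"sunny": {"sunny": 0.9, "rainy": 0.1},     # P(S_{i+1} | S_i): dynamics
          "rainy": {"sunny": 0.5, "rainy": 0.5}}

def sample(dist):
    """Draw one state from a {state: probability} distribution."""
    r = random.random()
    for s, p in dist.items():
        if r < p:
            return s
        r -= p
    return s  # guard against floating-point round-off

# Sample a trajectory S0, ..., S10; each step conditions only on the current state.
s = sample(p_s0)
trajectory = [s]
for _ in range(10):
    s = sample(p_next[s])     # the Markov assumption: the past is ignored
    trajectory.append(s)
print(len(trajectory))         # 11
```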


Stationary Markov chain

A Markov chain is stationary when, for all i > 0 and i′ > 0, P(Si+1 | Si) = P(Si′+1 | Si′).

We specify P(S0) and P(Si+1 | Si).

  Simple model, easy to specify
  Often the natural model
  The network can extend indefinitely

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Slide 2/23


Stationary Distribution

A distribution over states, P, is a stationary distribution if for each state s, P(Si+1 = s) = P(Si = s).

Every Markov chain has a stationary distribution.

A Markov chain is ergodic if, for any two states s1 and s2, there is a non-zero probability of eventually reaching s2 from s1.

A Markov chain is periodic if there is a strict temporal regularity in visiting states: a state is only visited at times t with t mod n = m, for some fixed n and m.

An ergodic and aperiodic Markov chain has a unique stationary distribution P, and P(s) = lim_{i→∞} P_i(s) (the equilibrium distribution).

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Slide 3/23
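For a finite chain, the equilibrium distribution can be approached by iterating P_{i+1}(s′) = ∑_s P_i(s) P(S_{i+1} = s′ | S_i = s) until it stops changing. A sketch on a made-up two-state ergodic, aperiodic chain (the transition numbers are assumptions):

```python
# Transition probabilities P(S_{i+1} | S_i) for a two-state chain:
T = {"a": {"a": 0.7, "b": 0.3},
     "b": {"a": 0.4, "b": 0.6}}

dist = {"a": 1.0, "b": 0.0}   # an arbitrary P(S0); the limit does not depend on it
for _ in range(200):
    dist = {s2: sum(dist[s1] * T[s1][s2] for s1 in T) for s2 in T["a"]}

# Fixed-point check: applying the dynamics once more leaves dist unchanged.
nxt = {s2: sum(dist[s1] * T[s1][s2] for s1 in T) for s2 in T["a"]}
print(round(dist["a"], 4))    # converges to 4/7, approximately 0.5714
```

Solving P(a) = 0.7 P(a) + 0.4 P(b) with P(a) + P(b) = 1 gives P(a) = 4/7, matching the iteration.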

Page 191: The mind is a neural computer, tted by natural selection

Stationary Distribution

A distribution over states, P is a stationary distribution iffor each state s, P(Si+1=s) = P(Si=s).

Every Markov chain has a stationary distribution.

A Markov chain is ergodic if, for any two states s1 and s2,there is a non-zero probability of eventually reaching s2from s1.

A Markov chain is periodic if there is a strict temporalregularity in visiting states. A state is only visited divisibleat time t if t mod n = m for some n,m.

An ergodic and aperiodic Markov chain has a uniquestationary distribution P andP(s) = limi→∞ Pi(s) — equilibrium distribution

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.5, Page 9 3 / 23

Page 192: The mind is a neural computer, tted by natural selection

Stationary Distribution

A distribution over states, P is a stationary distribution iffor each state s, P(Si+1=s) = P(Si=s).

Every Markov chain has a stationary distribution.

A Markov chain is ergodic if, for any two states s1 and s2,there is a non-zero probability of eventually reaching s2from s1.

A Markov chain is periodic if there is a strict temporalregularity in visiting states. A state is only visited divisibleat time t if t mod n = m for some n,m.

An ergodic and aperiodic Markov chain has a uniquestationary distribution P andP(s) = limi→∞ Pi(s) — equilibrium distribution

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.5, Page 10 3 / 23


Pagerank

Consider the Markov chain:

The domain of Si is the set of all web pages.

P(S0) is uniform: P(S0 = pj) = 1/N.

P(Si+1 = pj | Si = pk) = (1 − d)/N + d × { 1/nk if pk links to pj; 1/N if pk has no links; 0 otherwise }

where there are N web pages and nk links from page pk.

d ≈ 0.85 is the probability that someone keeps surfing the web.

This Markov chain converges to a distribution over web pages (the original computation used P(Si) at i = 52 over 322 million links): Pagerank, the basis for Google's initial search engine.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 13
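The fixed point of this update can be computed by power iteration. The following is an illustrative sketch, not Google's implementation; the toy link structure and the tolerance are made up:

```python
import numpy as np

def pagerank(links, d=0.85, tol=1e-10):
    """Iterate P(S_{i+1} = pj) = (1-d)/N + d * (link-following term) to a fixed point.
    links[k] lists the pages that page k links to."""
    n = len(links)
    p = np.full(n, 1.0 / n)                  # P(S0) is uniform
    while True:
        new = np.full(n, (1.0 - d) / n)      # teleport term (1-d)/N
        for k, out in enumerate(links):
            if out:                          # pk links to each pj in `out`: 1/nk each
                for j in out:
                    new[j] += d * p[k] / len(out)
            else:                            # pk has no links: 1/N to every page
                new += d * p[k] / n
        if np.abs(new - p).sum() < tol:
            return new
        p = new

# Toy web: page 0 -> 1, page 1 -> 0 and 2, page 2 has no links.
ranks = pagerank([[1], [0, 2], []])
```

Because of the teleport term, the chain is ergodic and aperiodic, so the iteration converges to the unique stationary distribution.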


Hidden Markov Model

A Hidden Markov Model (HMM) is a belief network:

[Network diagram: a chain S0 → S1 → S2 → S3 → S4, with each Si having an observation child Oi.]

The probabilities that need to be specified:

P(S0) specifies the initial conditions.

P(Si+1 | Si) specifies the dynamics.

P(Oi | Si) specifies the sensor model.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 15


Filtering

Filtering: P(Si | o1, . . . , oi)

What is the current belief state, given the observation history?

P(Si | o1, . . . , oi) ∝ P(oi | Si, o1, . . . , oi−1) P(Si | o1, . . . , oi−1)
= P(oi | Si) ΣSi−1 P(Si, Si−1 | o1, . . . , oi−1)
= P(oi | Si) ΣSi−1 P(Si | Si−1) P(Si−1 | o1, . . . , oi−1)

Observe O0, query S0; then observe O1, query S1; then observe O2, query S2; . . .

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 17
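The two steps of the derivation (predict with the dynamics, then weight by the sensor model and renormalize) can be sketched for a toy two-state HMM. All of the numbers here are invented for illustration:

```python
import numpy as np

P0 = np.array([0.5, 0.5])        # P(S0)
T = np.array([[0.7, 0.3],        # T[s, s'] = P(S_{i+1}=s' | S_i=s)
              [0.4, 0.6]])
O = np.array([[0.9, 0.1],        # O[s, o] = P(O_i=o | S_i=s)
              [0.2, 0.8]])

def filter_step(belief, obs):
    """One recursive step of P(S_i | o_1..o_i)."""
    predicted = belief @ T            # sum out S_{i-1} using the dynamics
    updated = predicted * O[:, obs]   # multiply by P(o_i | S_i)
    return updated / updated.sum()    # renormalize

belief = P0 * O[:, 0]                 # observe O0 = 0, query S0
belief = belief / belief.sum()
belief = filter_step(belief, 1)       # observe O1 = 1, query S1
```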


Example: localization

Suppose a robot wants to determine its location based on its actions and its sensor readings: Localization.

This can be represented by the augmented HMM:

[Network diagram: a chain Loc0 → Loc1 → Loc2 → Loc3 → Loc4, with observation Obsi at each Loci, and action Acti influencing Loci+1.]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 22

Example localization domain

Circular corridor, with 16 locations:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Doors at positions: 2, 4, 7, 11.

Noisy sensors.

Stochastic dynamics.

The robot starts at an unknown location and must determine where it is.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 23


Example Sensor Model

P(Observe Door | At Door) = 0.8

P(Observe Door | Not At Door) = 0.1

c©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.5, Page 24 9 / 23

Example Dynamics Model

P(loct+1 = L | actiont = goRight ∧ loct = L) = 0.1
P(loct+1 = L + 1 | actiont = goRight ∧ loct = L) = 0.8
P(loct+1 = L + 2 | actiont = goRight ∧ loct = L) = 0.074
P(loct+1 = L′ | actiont = goRight ∧ loct = L) = 0.002 for any other location L′.

- All location arithmetic is modulo 16.
- The action goLeft works the same, but to the left.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 25
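Putting the sensor and dynamics models together, HMM filtering on the circular corridor can be sketched as follows. The model numbers are the ones above; the observation/action sequence is invented:

```python
import numpy as np

N = 16
doors = {2, 4, 7, 11}

# Sensor model: P(observe door | at door) = 0.8, P(observe door | not at door) = 0.1
sensor = np.array([0.8 if l in doors else 0.1 for l in range(N)])

# Dynamics for goRight: stay 0.1, +1 0.8, +2 0.074, any other location 0.002
T = np.full((N, N), 0.002)
for l in range(N):
    T[l, l] = 0.1
    T[l, (l + 1) % N] = 0.8        # location arithmetic modulo 16
    T[l, (l + 2) % N] = 0.074
T = T / T.sum(axis=1, keepdims=True)   # guard against floating-point slack

belief = np.full(N, 1.0 / N)   # unknown starting location
belief = belief * sensor       # observe a door
belief /= belief.sum()
belief = belief @ T            # goRight
belief = belief * sensor       # observe a door again
belief /= belief.sum()
```

After two door observations with a goRight in between, the belief concentrates on door locations that can be reached from another door in one step.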

Combining sensor information

Example: we can combine information from a light sensor and the door sensor — Sensor Fusion.

[Network diagram: a chain Loc0 → . . . → Loc4 with actions Act0..Act3, and a door sensor Di and a light sensor Li observed at each step.]

Loct — robot location at time t
Dt — door sensor value at time t
Lt — light sensor value at time t

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 26

Naive Bayes Classifier: User's request for help

[Network diagram: class variable H with word children "able", "absent", "add", . . . , "zoom".]

H is the help page the user is interested in.

What probabilities are required?

P(hi) for each help page hi. The user is interested in one best page, so Σi P(hi) = 1.

P(wj | hi) for each word wj given page hi. There can be multiple words used in a query.

Given a help query: condition on the words in the query and display the most likely help page.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 27
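A minimal sketch of this classifier, with hypothetical pages, words, and probabilities (a real system would learn P(wj | hi) from data, and a full set-of-words model would also condition on absent words):

```python
prior = {"h1": 0.6, "h2": 0.4}                 # P(hi); sums to 1
like = {                                       # P(word | hi); values invented
    "h1": {"print": 0.8, "zoom": 0.1},
    "h2": {"print": 0.2, "zoom": 0.7},
}

def most_likely_page(query_words):
    """Condition on the words in the query; return the most likely page
    and its (normalized) posterior probability."""
    scores = {}
    for h, p in prior.items():
        score = p
        for w in query_words:
            score *= like[h].get(w, 0.01)      # small default for unmodelled words
        scores[h] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())
```

For the query ["print"], page h1 scores 0.6 × 0.8 = 0.48 against h2's 0.4 × 0.2 = 0.08, so h1 is returned.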


Simple Language Models: set-of-words

Sentence: w1, w2, w3, . . .
Set-of-words model:

[Network diagram: independent Boolean variables "a", "aardvark", . . . , "zzz".]

Each variable is Boolean: true when the word is in the sentence and false otherwise.

What probabilities are provided?
- P("a"), P("aardvark"), . . . , P("zzz")

How do we condition on the question "how can I phone my phone"?

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 33


Simple Language Models: bag-of-words

Sentence: w1, w2, w3, . . . , wn.
Bag-of-words or unigram:

[Network diagram: independent variables W1, W2, W3, . . . , Wn.]

The domain of each variable is the set of all words.

What probabilities are provided?
- P(wi) is a distribution over words, for each position.

How do we condition on the question "how can I phone my phone"?

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 37


Simple Language Models: bigram

Sentence: w1, w2, w3, . . . , wn.
Bigram:

[Network diagram: a chain W1 → W2 → W3 → . . . → Wn.]

The domain of each variable is the set of all words.

What probabilities are provided?
- P(wi | wi−1) is a distribution over words for each position, given the previous word.

How do we condition on the question "how can I phone my phone"?

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 41


Simple Language Models: trigram

Sentence: w1, w2, w3, . . . , wn.
Trigram:

[Network diagram: W1, . . . , Wn, with each Wi having parents Wi−1 and Wi−2.]

The domain of each variable is the set of all words.

What probabilities are provided?
- P(wi | wi−1, wi−2)

N-gram:
- P(wi | wi−1, . . . , wi−n+1) is a distribution over words given the previous n − 1 words.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 45
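A bigram model can be estimated from counts. This sketch uses a tiny invented corpus and maximum-likelihood estimates; a real model would smooth and train on far more data:

```python
from collections import Counter

# Hypothetical training corpus (a single sentence for illustration).
corpus = "how can i phone my phone".split()

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of (w_{i-1}, w_i) pairs
unigrams = Counter(corpus[:-1])              # counts of contexts w_{i-1}

def p_bigram(w, prev):
    """Maximum-likelihood estimate of P(w_i = w | w_{i-1} = prev)."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
```

An n-gram model is the same idea with a context of n − 1 previous words instead of one.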


Logic, Probability, Statistics, Ontology over time

[Figure: word frequencies over time, from the Google Books Ngram Viewer (https://books.google.com/ngrams).]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 48

Predictive Typing and Error Correction

[Network diagram: a word chain W1 → W2 → W3 → . . . , with letter observations L1i, L2i, . . . , Lki for each word Wi.]

domain(Wi) = {"a", "aardvark", . . . , "zzz", "⊥", "?"}
domain(Lji) = {"a", "b", "c", . . . , "z", "1", "2", . . . }

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 49

Beyond N-grams

A man with a big hairy cat drank the cold milk.

Who or what drank the milk?

Simple syntax diagram:

[s [np [np a man] [pp with [np a big hairy cat]]] [vp drank [np the cold milk]]]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 50


Topic Model

[Network diagram: topic variables such as tools, food, fish, and finance, with word children such as "nut", "bolt", "tuna", "shark", "salmon".]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 52


Google's Rephil

[Network diagram: 900,000 topics connected to 12,000,000 words ("Aaron's beard", "aardvark", . . . , "zzz") by 350,000,000 links.]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 54

Deep Belief Networks

[Network diagram: multiple layers of variables, each layer connected to the next.]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.5, Page 55

Stochastic Simulation

Idea: probabilities ↔ samples

Get probabilities from samples:

  X      count          X      probability
  x1     n1             x1     n1/m
  ...    ...            ...    ...
  xk     nk             xk     nk/m
  total  m

If we could sample from a variable's (posterior) probability, we could estimate its (posterior) probability.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 1

Generating samples from a distribution

For a variable X with a discrete domain or a (one-dimensional) real domain:

Totally order the values of the domain of X.

Generate the cumulative probability distribution: f(x) = P(X ≤ x).

Select a value y uniformly in the range [0, 1].

Select the x such that f(x) = y.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 2
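For a discrete domain, the four steps above amount to inverse-CDF sampling, sketched as:

```python
import random

def sample_discrete(values, probs):
    """Inverse-CDF sampling: draw y uniformly in [0, 1] and return the first
    value whose cumulative probability f(x) = P(X <= x) reaches y."""
    y = random.random()
    cum = 0.0
    for v, p in zip(values, probs):
        cum += p
        if y <= cum:
            return v
    return values[-1]   # guard against floating-point slack in the cumulative sum
```

Over many draws, each value is returned with frequency close to its probability.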

Cumulative Distribution

[Figure: a probability distribution P(X) over values v1, . . . , v4, and its cumulative distribution f(X), which rises from 0 to 1.]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 3

Hoeffding's inequality

Theorem (Hoeffding): Suppose p is the true probability, and s is the sample average from n independent samples; then

P(|s − p| > ε) ≤ 2 exp(−2nε²).

This guarantees a probably approximately correct estimate of the probability.

If you are willing to have an error greater than ε in δ of the cases, solve 2 exp(−2nε²) < δ for n, which gives

n > −ln(δ/2) / (2ε²).

  ε      δ      n
  0.1    0.05   185
  0.01   0.05   18,445
  0.1    0.01   265

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 4
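The sample-size formula can be checked directly; `samples_needed` is a helper name introduced here:

```python
import math

def samples_needed(eps, delta):
    """Smallest n satisfying 2*exp(-2*n*eps**2) < delta,
    i.e. n > -ln(delta/2) / (2*eps**2), from Hoeffding's bound."""
    return math.ceil(-math.log(delta / 2) / (2 * eps ** 2))
```

This reproduces the table: tightening ε by a factor of 10 multiplies n by 100, while tightening δ only adds a logarithmic factor.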


Forward sampling in a belief network

Sample the variables one at a time; sample the parents of X before sampling X.

Given values for the parents of X, sample from the probability of X given its parents.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 7

Rejection Sampling

To estimate a posterior probability given evidence Y1 = v1 ∧ . . . ∧ Yj = vj:

Reject any sample that assigns some Yi a value other than vi.

The non-rejected samples are distributed according to the posterior probability:

P(α | evidence) ≈ (Σ sample⊨α 1) / (Σ sample 1)

where we consider only samples consistent with the evidence.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 8
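A sketch of rejection sampling on a hypothetical two-variable network A → B, with made-up probabilities P(a) = 0.3, P(b | a) = 0.9, P(b | ¬a) = 0.2, estimating P(a | b):

```python
import random

def estimate_p_a_given_b(n=100_000, seed=0):
    """Forward-sample A then B; reject samples inconsistent with evidence b."""
    rng = random.Random(seed)
    accepted = consistent = 0
    for _ in range(n):
        a = rng.random() < 0.3                    # sample A from P(a)
        b = rng.random() < (0.9 if a else 0.2)    # sample B from P(b | A)
        if b:                                      # evidence: B = true
            consistent += 1
            if a:
                accepted += 1
    return accepted / consistent

p_a_given_b = estimate_p_a_given_b()   # exact posterior is 0.27/0.41 ≈ 0.659
```

Note that about 59% of the samples are rejected here; when the evidence is unlikely (as on the next slide, where P(sm) = 0.02), almost all samples are wasted.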

Rejection Sampling Example: P(ta | sm, re)

[Belief network: Ta and Fi are parents of Al; Fi is the parent of Sm; Al is the parent of Le; Le is the parent of Re.]

Observe Sm = true, Re = true.

        Ta      Fi      Al      Sm      Le      Re
  s1    false   true    false   true    false   false   ✗ rejected
  s2    false   true    true    true    true    true    ✓
  s3    true    false   true    false   —       —       ✗ rejected
  s4    true    true    true    true    true    true    ✓
  . . .
  s1000 false   false   false   false   —       —       ✗ rejected

P(sm) = 0.02
P(re | sm) = 0.32
How many samples are rejected?
How many samples are used?

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 9


Importance Sampling

Samples have weights: a real number associated with each sample that takes the evidence into account.

The probability of a proposition is the weighted average of the samples:

P(α | evidence) ≈ (Σ sample⊨α weight(sample)) / (Σ sample weight(sample))

Mix exact inference with sampling: don't sample all of the variables, but weight each sample according to P(evidence | sample).

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 15

Importance Sampling (Likelihood Weighting)

procedure likelihood_weighting(Bn, e, Q, n):
    ans[1 : k] := 0, where k is the size of dom(Q)
    repeat n times:
        weight := 1
        for each variable Xi in order:
            if Xi = oi is observed:
                weight := weight × P(Xi = oi | parents(Xi))
            else:
                assign Xi a random sample of P(Xi | parents(Xi))
        if Q has value v:
            ans[v] := ans[v] + weight
    return ans / Σv ans[v]

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 16
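The procedure, specialized to a hypothetical two-variable network A → B with P(a) = 0.3, P(b | a) = 0.9, P(b | ¬a) = 0.2, evidence b and query A. The unobserved variable A is sampled; the observed variable B contributes a weight instead of being sampled, so no samples are rejected:

```python
import random

def likelihood_weighting(n=100_000, seed=0):
    """Estimate P(a | b) by weighting each sample by P(b | A)."""
    rng = random.Random(seed)
    ans = {True: 0.0, False: 0.0}
    for _ in range(n):
        a = rng.random() < 0.3          # A is not observed: sample it
        weight = 0.9 if a else 0.2      # B is observed: weight by P(b | A)
        ans[a] += weight
    return ans[True] / (ans[True] + ans[False])

p_a_given_b = likelihood_weighting()    # exact posterior is 0.27/0.41 ≈ 0.659
```

Every sample is used here, unlike rejection sampling, which discards samples inconsistent with the evidence.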

Importance Sampling Example: P(ta | sm, re)

[Belief network over Ta, Fi, Al, Sm, Le, Re, as in the rejection sampling example.]

        Ta      Fi      Al      Le      Weight
  s1    true    false   true    false   0.01 × 0.01
  s2    false   true    false   false   0.9 × 0.01
  s3    false   true    true    true    0.9 × 0.75
  s4    true    true    true    true    0.9 × 0.75
  . . .
  s1000 false   false   true    true    0.01 × 0.75

P(sm | fi) = 0.9
P(sm | ¬fi) = 0.01
P(re | le) = 0.75
P(re | ¬le) = 0.01

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 17


Importance Sampling Example: P(le | sm, ta, ¬re)

[Belief network over Ta, Fi, Al, Sm, Le, Re, as in the rejection sampling example.]

P(ta) = 0.02
P(fi) = 0.01
P(al | fi ∧ ta) = 0.5
P(al | fi ∧ ¬ta) = 0.99
P(al | ¬fi ∧ ta) = 0.85
P(al | ¬fi ∧ ¬ta) = 0.0001
P(sm | fi) = 0.9
P(sm | ¬fi) = 0.01
P(le | al) = 0.88
P(le | ¬al) = 0.001
P(re | le) = 0.75
P(re | ¬le) = 0.01

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 19

Computing Expectations & Proposal Distributions

Expected value of f with respect to distribution P:

EP(f) = Σw f(w) P(w) ≈ (1/n) Σs f(s)

where s is sampled with probability P, and there are n samples.

EP(f) = Σw f(w) (P(w)/Q(w)) Q(w) ≈ (1/n) Σs f(s) P(s)/Q(s)

where s is selected according to the distribution Q.

The distribution Q is called a proposal distribution.

Requirement: if P(c) > 0 then Q(c) > 0.

©D. Poole and A. Mackworth 2017, Artificial Intelligence, Lecture 8.6, Page 20


Page 261: The mind is a neural computer, tted by natural selection

Particle Filtering

Importance sampling can be seen as:

    for each particle:
        for each variable:
            sample / absorb evidence / update query

where a particle is one of the samples.

Instead we could do:

    for each variable:
        for each particle:
            sample / absorb evidence / update query

Why?

We can have a new operation of resampling.

It works with infinitely many variables (e.g., an HMM).

©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.6, Page 24 13 / 15
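The difference between the two loop orders can be made concrete. The sketch below is runnable Python, but everything in it (binary variables, the 0.9/0.1 likelihood, the helper names) is a made-up stand-in, not the book's code; only the loop structure is the point. Resampling fits only the second ordering, where all particles are at the same variable at the same time.

```python
import random

random.seed(0)

def sample_var(particle, var):
    # Sample a (made-up) binary value for this variable.
    particle[var] = random.random() < 0.5

def absorb_evidence(particle, var):
    # Weight the particle by a made-up P(evidence | var).
    particle["weight"] *= 0.9 if particle[var] else 0.1

def resample(particles):
    # Select particles in proportion to weight; duplicates are possible.
    # All new particles get the same weight.
    chosen = random.choices(particles,
                            weights=[p["weight"] for p in particles],
                            k=len(particles))
    return [dict(p, weight=1.0) for p in chosen]

variables = ["X1", "X2", "X3"]

# Importance sampling: outer loop over particles; each sample is
# completed independently, so there is no population to resample.
particles = [{"weight": 1.0} for _ in range(1000)]
for p in particles:
    for v in variables:
        sample_var(p, v)
        absorb_evidence(p, v)

# Particle filtering: outer loop over variables; the whole population
# advances in lockstep, so it can be resampled after each variable.
particles = [{"weight": 1.0} for _ in range(1000)]
for v in variables:
    for p in particles:
        sample_var(p, v)
        absorb_evidence(p, v)
    particles = resample(particles)
```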



Particle Filtering for HMMs

Start with randomly chosen particles (say 1000).

Each particle represents a history.

Initially, sample states in proportion to their probability.

Repeat:
- Absorb evidence: weight each particle by the probability of the evidence given the state of the particle.
- Resample: select each particle at random, in proportion to the weight of the particle. Some particles may be duplicated, some may be removed. All new particles have the same weight.
- Transition: sample the next state for each particle according to the transition probabilities.

To answer a query about the current state, use the set of particles as data.

©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.6, Page 27 14 / 15
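The steps above can be sketched on a hypothetical two-state HMM; the transition and observation probabilities below, and the evidence sequence, are made up for illustration.

```python
import random

random.seed(1)
states = [0, 1]
init   = [0.5, 0.5]                  # P(S_0)
trans  = [[0.8, 0.2], [0.3, 0.7]]    # trans[s][t] = P(S_{i+1}=t | S_i=s)
obs    = [[0.9, 0.1], [0.2, 0.8]]    # obs[s][e]   = P(E=e | S=s)

n = 1000
# Initially, sample states in proportion to their probability.
particles = random.choices(states, weights=init, k=n)

for e in [1, 1, 0, 1]:               # an observed evidence sequence
    # Absorb evidence: weight each particle by P(e | its state).
    weights = [obs[s][e] for s in particles]
    # Resample: select particles in proportion to weight
    # (duplicates possible); all new particles have the same weight.
    particles = random.choices(particles, weights=weights, k=n)
    # Transition: sample each particle's next state.
    particles = [random.choices(states, weights=trans[s])[0]
                 for s in particles]

# Answer a query about the current state using the particles as data.
p_state1 = sum(particles) / n        # estimated P(current state = 1)
print(p_state1)
```

Each particle is a single state here rather than a full history; keeping the whole history per particle is a straightforward extension.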


Markov Chain Monte Carlo

To sample from a distribution P:

- Create an (ergodic and aperiodic) Markov chain with P as its equilibrium distribution.
- Let T(S_{i+1} | S_i) be the transition probability. Given state s, sample state s' from T(S | s).
- After a while, the states sampled will be distributed according to P.
- Ignore the first samples ("burn-in"); use the remaining samples.
- Samples are not independent of each other ("autocorrelation"). Sometimes only a subset (e.g., 1/1000) of them is used ("thinning").
- Gibbs sampler: sample each non-observed variable from the distribution of the variable given the current (or observed) values of the variables in its Markov blanket.

©D. Poole and A. Mackworth 2017 Artificial Intelligence, Lecture 8.6, Page 28 15 / 15
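A minimal sketch of the idea, with a made-up target distribution P over three states: a Metropolis chain (one simple way to build a Markov chain with P as its equilibrium distribution, using a symmetric uniform proposal), followed by burn-in and thinning as described above.

```python
import random

random.seed(2)
P = {0: 0.1, 1: 0.3, 2: 0.6}     # hypothetical target distribution

def transition(s):
    # Propose a uniformly random state (a symmetric proposal) and
    # accept with probability min(1, P(s')/P(s)); otherwise stay at s.
    # This chain has P as its equilibrium distribution.
    s_new = random.choice(list(P))
    return s_new if random.random() < min(1.0, P[s_new] / P[s]) else s

s, chain = 0, []
for _ in range(200_000):
    s = transition(s)
    chain.append(s)

burned  = chain[10_000:]         # "burn-in": ignore the first samples
thinned = burned[::100]          # "thinning": keep every 100th sample

freq = {w: thinned.count(w) / len(thinned) for w in P}
print(freq)                      # approximately {0: 0.1, 1: 0.3, 2: 0.6}
```

For a belief network one would instead use the Gibbs sampler from the last bullet, resampling each non-observed variable from its distribution given its Markov blanket; the burn-in and thinning steps are the same.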
