CS 416 Artificial Intelligence, Lecture 14: Uncertainty (Chapters 13 and 14)


Page 1

CS 416 Artificial Intelligence

Lecture 14

Uncertainty

Chapters 13 and 14

Page 2

TA Office Hours

Chris cannot attend today’s office hours

He will be available Wed, 3:30 – 4:30

Page 3

Conditional probability

The probability of a given all we know is b

• P (a | b)

Written as an unconditional probability
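The defining equation was only on the slide image; the standard Chapter 13 form is

    P (a | b) = P (a ∧ b) / P (b), whenever P (b) > 0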

Page 4

Conditioning

A distribution over Y can be obtained by summing out all the other variables from any joint distribution containing Y

P(Y) = Σz P(Y | z) P(z)
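As a minimal sketch of summing out, here is the same idea on a tiny hypothetical joint over Y and Z (the numbers are made up for illustration):

    # Sum out Z from a small joint distribution P(Y, Z) to obtain P(Y).
    joint = {('y', 'z'): 0.3, ('y', 'not z'): 0.1,
             ('not y', 'z'): 0.2, ('not y', 'not z'): 0.4}

    p_y = {}
    for (y, z), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p   # P(Y) = SUM_z P(Y, z)

    print(p_y)   # {'y': 0.4, 'not y': 0.6}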

Page 5

Independence

Independence of variables in a domain can dramatically reduce the amount of information necessary to specify the full joint distribution

Page 6

Bayes’ Rule
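The rule itself appeared only as an image on the slide; the standard statement is

    P (b | a) = P (a | b) P (b) / P (a)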

Page 7

Conditional independence

In general, when a single cause influences multiple effects, all of which are conditionally independent (given the cause)

2n + 1

2 · n · (2²) = 8n

Assuming binary variables
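The counts above refer to the factorization this pattern licenses. In the standard form from Chapter 13,

    P (Cause, Effect1, …, Effectn) = P (Cause) · Π i P (Effecti | Cause)

which, for binary variables, needs only 2n + 1 independently specifiable numbers rather than a table that grows exponentially with n.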

Page 8

Wumpus

Are there pits in (1,3), (2,2), (3,1) given breezes in (1,2) and (2,1)?

One way to solve…

• Find the full joint distribution

– P (P1,1, …, P4,4, B1,1, B1,2, B2,1)

Page 9

Find the full joint distribution

• Remember the product rule

• P (P1,1, …, P4,4, B1,1, B1,2, B2,1)

• = P (B1,1, B1,2, B2,1 | P1,1, …, P4,4) P (P1,1, …, P4,4)

– Solve this for all P and B values

Page 10

Find the full joint distribution

• P (B1,1, B1,2, B2,1 | P1,1, …, P4,4) P (P1,1, …, P4,4)

– Givens:

the rules relating breezes to pits

each square contains a pit with probability = 0.2

– For any given P1,1, …, P4,4 setting with n pits

The rules of breezes tell us the value of P (B | P)

0.2^n * 0.8^(16-n) tells us the value of P (P)

Page 11

Solving an instance

We have the following facts:

Query: P (P1,3 | known, b)

Page 12

Solving an instance

Query: P (P1,3 | known, b)

Page 13

Solving: P (P1,3 | known, b)

• We know the full joint probability, so we can solve this

– 2^12 = 4096 terms must be summed
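A brute-force sketch of that summation, under stated assumptions matching the textbook version of this example (4x4 grid, pit prior 0.2 per square, breezes observed in (1,2) and (2,1), and the visited squares (1,1), (1,2), (2,1) known to be pit-free):

    from itertools import product

    CELLS = [(x, y) for x in range(1, 5) for y in range(1, 5)]
    KNOWN_CLEAR = {(1, 1), (1, 2), (2, 1)}                 # visited, so no pits
    BREEZES = {(1, 2): True, (2, 1): True}                 # observed breeze evidence
    UNKNOWN = [c for c in CELLS if c not in KNOWN_CLEAR]   # 13 cells, incl. (1,3)

    def neighbors(cell):
        x, y = cell
        return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

    def consistent(pits):
        # A square is breezy exactly when some neighboring square holds a pit.
        return all(any(n in pits for n in neighbors(sq)) == breezy
                   for sq, breezy in BREEZES.items())

    totals = {True: 0.0, False: 0.0}
    for assignment in product([True, False], repeat=len(UNKNOWN)):
        pits = {c for c, has_pit in zip(UNKNOWN, assignment) if has_pit}
        if consistent(pits):
            n = len(pits)
            totals[(1, 3) in pits] += 0.2 ** n * 0.8 ** (16 - n)   # prior weight

    print(totals[True] / (totals[True] + totals[False]))   # P(pit in (1,3)), about 0.31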

Page 14

Solving an instance more quickly

Independence

• The contents of [4,4] don’t affect the presence of a pit at [1,3]

• Create Fringe and Other

– Fringe = pitness of the cells on the fringe

– Other = pitness of the remaining cells

– Breezes are conditionally independent of the Other variables

[Grid figure labeling the Query, Fringe, and Other cells]

Page 15

Independence

(by Bayes and summing out)

(by independence of fringe and other)

Page 16

Independence

(relocate summation)

(by independence)

(relocate summation)

(new alpha & sum 1)
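The equations these step labels annotate were images and did not survive the transcript; a reconstruction of the standard derivation (the parenthetical labels below match the ones above) is

    P (P1,3 | known, b)
      = α Σ unknown P (P1,3, unknown, known, b)                                        (sum out the unknown squares)
      = α Σ fringe Σ other P (b | known, P1,3, fringe) P (P1,3, known, fringe, other)  (by Bayes and independence of b from other)
      = α Σ fringe P (b | known, P1,3, fringe) Σ other P (P1,3, known, fringe, other)  (relocate summation)
      = α′ P (P1,3) Σ fringe P (b | known, P1,3, fringe) P (fringe)                    (independence of the pit variables; the new α′ absorbs constants and Σ other P (other) = 1)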

Page 17

Independence

4096 terms dropped to 4

• Fringe has two cells, four possible pitness combinations
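The final numbers were only on the slide image; working the four fringe cases by hand under the 0.2 pit prior (a reconstruction of the textbook calculation) gives

    P (P1,3 | known, b) = α ⟨0.2 (0.04 + 0.16 + 0.16), 0.8 (0.04 + 0.16)⟩ = α ⟨0.072, 0.16⟩ ≈ ⟨0.31, 0.69⟩

so a pit in (1,3) has probability of roughly 0.31.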

Page 18

Chapter 14

Probabilistic Reasoning

• First, Bayesian Networks

• Then, Inference

Page 19

Bayesian Networks

It is difficult to build a probability table with a large amount of data

• Independence and conditional independence reduce the complexity (and time) of building the full joint distribution

A Bayesian network captures these dependencies

Page 20

Bayesian Network

Directed Acyclic Graph (DAG)

• Random variables are the nodes

• Arcs indicate direct influence; missing arcs encode conditional independence relationships

• Each node is labeled with P(Xi | Parents(Xi))

Page 21

Another example

Burglar Alarm

• Goes off when there is an intruder (usually)

• Goes off during an earthquake (sometimes)

• Neighbor John calls when he hears the alarm, but he also calls when he confuses the phone for the alarm

• Neighbor Mary calls when she hears the alarm, but she doesn’t hear it when listening to music

Page 22

Another example

Burglar Alarm

Note the absence of information about John and Mary’s errors.

Note the presence of Conditional Probability Tables (CPTs).

Page 23

Full joint distribution

The Bayesian network describes the full joint distribution

P (X1 = x1 ∧ X2 = x2 ∧ … ∧ Xn = xn)

abbreviated as…

P (x1, x2, …, xn) = Π i P (xi | parents(Xi)), the product of the relevant CPT entries

Page 24

Burglar alarm example

P (John calls, Mary calls, alarm goes off, no burglar or earthquake)
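The CPT numbers were only in the slide image; the sketch below assumes the values from the standard textbook version of this network (P(B) = 0.001, P(E) = 0.002, and so on) and multiplies the chain-rule factors for this query:

    # Full-joint entry P(j, m, a, ~b, ~e) for the burglary network, using the
    # CPT values of the standard textbook example (assumed here).
    P_B = 0.001                         # P(Burglary)
    P_E = 0.002                         # P(Earthquake)
    P_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(Alarm | B, E)
    P_J = {True: 0.90, False: 0.05}     # P(JohnCalls | Alarm)
    P_M = {True: 0.70, False: 0.01}     # P(MaryCalls | Alarm)

    b, e, a = False, False, True        # no burglary, no earthquake, alarm rings
    p = ((P_B if b else 1 - P_B) *
         (P_E if e else 1 - P_E) *
         (P_A[(b, e)] if a else 1 - P_A[(b, e)]) *
         P_J[a] *                       # John calls
         P_M[a])                        # Mary calls

    print(p)                            # about 0.000628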

Page 25

Constructing a Bayesian Network

• Top-down is more likely to work

• Causal rules are better

• Adding arcs is a judgment call

– Consider the decision not to add error info about John/Mary

No reference to telephones or music playing in the network

Page 26

Conditional distributions

It can be time consuming to fill in all the CPTs of discrete random variables

• Sometimes standard templates can be used

– The canonical 20% of the work solves 80% of the problem

Thanks, Pareto and Juran

• Sometimes simple logic summarizes a table, as in the sketch below

– A ∨ B ∨ C ⇒ D
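A deterministic node like this needs no probabilities of its own; as a minimal sketch (using the slide’s A, B, C, D as placeholder variables), its CPT can be generated directly from the logic:

    from itertools import product

    # Deterministic CPT for D = A or B or C: every entry is 0 or 1.
    cpt = {(a, b, c): float(a or b or c)          # P(D = true | a, b, c)
           for a, b, c in product([True, False], repeat=3)}

    print(cpt[(False, False, False)])   # 0.0 -- D is false only when all parents are false
    print(cpt[(True, False, False)])    # 1.0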

Page 27

Conditional distributions

Continuous random variables

• Discretization

– Subdivide the continuous region into a fixed set of intervals

Where do you put the regions?

• Standard Probability Density Functions (PDFs)

– P(X) = Gaussian, where only the mean and variance need to be specified
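For reference (a standard fact, not shown on the slide), the density those two parameters specify is

    P(x) = N(x; μ, σ²) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²))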

Page 28

Conditional distributions

Mixing discrete and continuous

Example:

• Probability I buy fruit is a function of its cost

• Its cost is a function of the harvest quality and the presence of government subsidies

How do we mix the items?

[Network figure: discrete Subsidy and continuous Harvest are parents of continuous Cost, which is the parent of discrete Buys]

Page 29

Hybrid Bayesians

P(Cost | Harvest, Subsidy)

• P (Cost | Harvest, subsidy)

• P (Cost | Harvest, ~subsidy)

Enumerate the discrete choices

Page 30

Hybrid Bayesians

How does Cost change as a function of Harvest?

• Linear Gaussian

– Cost is a Gaussian distribution whose mean varies linearly with the value of the parent and whose standard deviation is constant

Need two of these… one for each subsidy value
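Written out (a standard form; the slide’s own equations were images), the two conditional densities are

    P (c | h, subsidy)  = N(c; a_t h + b_t, σ_t²)
    P (c | h, ~subsidy) = N(c; a_f h + b_f, σ_f²)

where the slope a, intercept b, and variance σ² are the parameters to specify for each subsidy value.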

Page 31

Page 32

Multivariate Gaussian

A network of continuous variables with linear Gaussian distributions has a joint distribution that is a multivariate Gaussian distribution over all the variables

• A surface in n-dimensional space with a peak at the point whose coordinates are each dimension’s mean

• It drops off in all directions from the mean
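For reference (a standard definition, not in the transcript), the multivariate Gaussian density over x ∈ Rⁿ is

    N(x; μ, Σ) = (2π)^(−n/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

with its single peak at the mean vector μ.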

Page 33

Conditional Gaussian

Adding discrete variables to a multivariate Gaussian results in a conditional Gaussian

• Given any assignment to the discrete variables, the distribution over the continuous ones is multivariate Gaussian

Page 34

Discrete variables with cont. parents

Either you buy or you don’t

• But there is a soft threshold around your desired cost

Page 35

Discrete variables with cont. parents

Normal Dist.
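The body of this last slide was an image that did not survive the transcript; the soft threshold it illustrates is usually written as a probit model (an assumption based on the “Normal Dist.” label and the fruit-buying example):

    P (buys | Cost = c) = Φ((−c + μ) / σ)

where Φ is the cumulative distribution of the standard normal, μ is the cost at which the buyer is indifferent, and σ controls how soft the threshold is.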