
Page 1: Bayesnetwork

Bayesian Networks

CSC 371: Spring 2012

Page 2: Bayesnetwork

Today’s Lecture

• Recap: joint distribution, independence, marginal independence, conditional independence

• Bayesian networks

• Reading: Sections 14.1-14.4 in AIMA [Russell & Norvig]

Page 3: Bayesnetwork

Marginal Independence

• Intuitively: if X ╨ Y, then
  – learning that Y=y does not change your belief in X
  – and this is true for all values y that Y could take

• For example, the weather is marginally independent of the result of a coin toss

Page 4: Bayesnetwork

Marginal Independence


Page 5: Bayesnetwork

Conditional Independence

• Intuitively: if X ╨ Y | Z, then
  – learning that Y=y does not change your belief in X when we already know Z=z
  – and this is true for all values y that Y could take and all values z that Z could take

• For example, ExamGrade ╨ AssignmentGrade | UnderstoodMaterial

Page 6: Bayesnetwork

Conditional Independence

Page 7: Bayesnetwork

“…probability theory is more fundamentally concerned with the structure of reasoning and causation than with numbers.”

Glenn Shafer and Judea Pearl, Introduction to Readings in Uncertain Reasoning, Morgan Kaufmann, 1990

Page 8: Bayesnetwork

Bayesian Network Motivation

• We want a representation and reasoning system that is based on conditional (and marginal) independence
  – Compact yet expressive representation
  – Efficient reasoning procedures

• Bayesian (belief) networks are such a representation
  – Named after Thomas Bayes (ca. 1702-1761)
  – Term coined in 1985 by Judea Pearl (1936- )
  – Their invention changed the primary focus of AI from logic to probability!

[Portraits: Thomas Bayes, Judea Pearl]

Page 9: Bayesnetwork

Bayesian Networks: Intuition

• A graphical representation for a joint probability distribution
  – Nodes are random variables
    • Can be assigned (observed) or unassigned (unobserved)
  – Arcs are interactions between nodes
    • Encode conditional independence
    • An arrow from one variable to another indicates direct influence
    • Directed arcs between nodes reflect dependence
  – A compact specification of full joint distributions

• Some informal examples:

[Diagrams: UnderstoodMaterial with children AssignmentGrade and ExamGrade; a second example over Fire, Alarm, and Smoking At Sensor]

Page 10: Bayesnetwork

Example of a simple Bayesian network

[Diagram: A → C ← B]

• Probability model has a simple factored form: p(A,B,C) = p(A) p(B) p(C|A,B)

• Directed edges => direct dependence

• Absence of an edge => conditional independence

• Also known as belief networks, graphical models, causal networks

• Other formulations exist, e.g., undirected graphical models
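To see the factored form in action, here is a minimal Python sketch (the probability values are made-up, purely for illustration) that reconstructs the full eight-entry joint for this network from its three factors:

```python
from itertools import product

# Made-up illustrative numbers for the three factors of this network.
p_A = {True: 0.3, False: 0.7}                 # p(A)
p_B = {True: 0.6, False: 0.4}                 # p(B)
p_C_true = {(True, True): 0.9, (True, False): 0.5,
            (False, True): 0.4, (False, False): 0.1}   # p(C=True | A, B)

def joint(a, b, c):
    """p(A=a, B=b, C=c) = p(A) p(B) p(C|A,B)."""
    pc = p_C_true[(a, b)]
    return p_A[a] * p_B[b] * (pc if c else 1 - pc)

# Sanity check: the eight joint entries sum to 1.
print(sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3)))
```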

Page 11: Bayesnetwork

Bayesian Networks: Definition


Page 12: Bayesnetwork

Bayesian Networks: Definition

• Discrete Bayesian networks:
  – Domain of each variable is finite
  – Conditional probability distribution is a conditional probability table
  – We will assume this discrete case

• But everything we say about independence (marginal & conditional) carries over to the continuous case

Page 13: Bayesnetwork

Examples of 3-way Bayesian Networks

[Diagram: A, B, C with no edges]

Marginal independence: p(A,B,C) = p(A) p(B) p(C)

Page 14: Bayesnetwork

Examples of 3-way Bayesian Networks

[Diagram: A → B, A → C]

Conditionally independent effects: p(A,B,C) = p(B|A) p(C|A) p(A)

B and C are conditionally independent given A.

e.g., A is a disease, and we model B and C as conditionally independent symptoms given A
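A quick numeric check of this structure, with hypothetical numbers for the disease/symptom reading: once A is known, additionally observing C leaves the belief in B unchanged.

```python
# Hypothetical numbers: A is the disease, B and C are symptoms.
p_A = {True: 0.01, False: 0.99}
p_B_true = {True: 0.8, False: 0.1}    # P(B=True | A)
p_C_true = {True: 0.7, False: 0.05}   # P(C=True | A)

def joint(a, b, c):
    """p(A,B,C) = p(B|A) p(C|A) p(A)."""
    pb, pc = p_B_true[a], p_C_true[a]
    return p_A[a] * (pb if b else 1 - pb) * (pc if c else 1 - pc)

# P(B=True | A=a, C=True) equals P(B=True | A=a): C tells us nothing new.
for a in (True, False):
    p = joint(a, True, True) / (joint(a, True, True) + joint(a, False, True))
    print(a, round(p, 6), "vs", p_B_true[a])
```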

Page 15: Bayesnetwork

Examples of 3-way Bayesian Networks

[Diagram: A → C ← B]

Independent causes: p(A,B,C) = p(C|A,B) p(A) p(B)

A and B are (marginally) independent but become dependent once C is known.

“Explaining away” effect: given C, observing A makes B less likely, e.g., the earthquake/burglary/alarm example.
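The effect is easy to verify numerically. The sketch below uses the CPT values of the burglary/earthquake/alarm network from AIMA: the alarm alone makes a burglary fairly likely, but also observing the earthquake “explains away” the alarm and the burglary probability collapses.

```python
# CPT values from the AIMA burglary/earthquake/alarm example.
p_B, p_E = 0.001, 0.002
p_A_true = {(True, True): 0.95, (True, False): 0.94,
            (False, True): 0.29, (False, False): 0.001}  # P(A=True | B, E)

def joint(b, e, a):
    pa = p_A_true[(b, e)]
    return ((p_B if b else 1 - p_B) * (p_E if e else 1 - p_E)
            * (pa if a else 1 - pa))

# P(B=true | alarm): sum over E, normalize over B.
num = sum(joint(True, e, True) for e in (True, False))
den = num + sum(joint(False, e, True) for e in (True, False))
print(f"P(B | a)    = {num / den:.4f}")    # ~0.374

# Also observing the earthquake explains the alarm away:
num2 = joint(True, True, True)
den2 = num2 + joint(False, True, True)
print(f"P(B | a, e) = {num2 / den2:.4f}")  # ~0.0033
```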

Page 16: Bayesnetwork

Examples of 3-way Bayesian Networks

[Diagram: A → B → C]

Markov dependence: p(A,B,C) = p(C|B) p(B|A) p(A)
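And the analogous check for the chain, again with hypothetical numbers: P(C | A, B) computed from the joint collapses to P(C | B), so A ╨ C | B.

```python
# Hypothetical numbers for the chain A -> B -> C.
p_A = {True: 0.2, False: 0.8}
p_B_true = {True: 0.9, False: 0.3}   # P(B=True | A)
p_C_true = {True: 0.6, False: 0.1}   # P(C=True | B)

def joint(a, b, c):
    """p(A,B,C) = p(C|B) p(B|A) p(A)."""
    pb, pc = p_B_true[a], p_C_true[b]
    return p_A[a] * (pb if b else 1 - pb) * (pc if c else 1 - pc)

# P(C=True | A=a, B=b) depends only on b: A ╨ C | B.
for a in (True, False):
    for b in (True, False):
        p = joint(a, b, True) / (joint(a, b, True) + joint(a, b, False))
        print(a, b, round(p, 6), "vs", p_C_true[b])
```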

Page 17: Bayesnetwork

Example: Burglar Alarm

• I have a burglar alarm that is sometimes set off by minor earthquakes. My two neighbors, John and Mary, promised to call me at work if they hear the alarm
  – Example inference task: suppose Mary calls and John doesn’t call. What is the probability of a burglary?

• What are the random variables?
  – Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Page 18: Bayesnetwork

Example

5 binary variables:
  B = a burglary occurs at your house
  E = an earthquake occurs at your house
  A = the alarm goes off
  J = John calls to report the alarm
  M = Mary calls to report the alarm

What is P(B | M, J), for example? We can use the full joint distribution to answer this question
  – Requires 2^5 = 32 probabilities

Can we use prior domain knowledge to come up with a Bayesian network that requires fewer probabilities? What are the direct influence relationships?
  – A burglary can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call

Page 19: Bayesnetwork

Example: Burglar Alarm

What are the model parameters?

Page 20: Bayesnetwork

Conditional probability distributions

• To specify the full joint distribution, we need to specify a conditional distribution for each node given its parents: P(X | Parents(X))

[Diagram: parents Z1, Z2, …, Zn each with an arrow into X, which carries the table P(X | Z1, …, Zn)]

Page 21: Bayesnetwork

Example: Burglar Alarm

Page 22: Bayesnetwork

The joint probability distribution

• For each node Xi, we know P(Xi | Parents(Xi))

• How do we get the full joint distribution P(X1, …, Xn)?

• Using the chain rule:

  P(X1, …, Xn) = ∏i P(Xi | X1, …, Xi−1) = ∏i P(Xi | Parents(Xi)), with the product over i = 1, …, n

• For example, P(j, m, a, b, e) = P(b) P(e) P(a | b, e) P(j | a) P(m | a)
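As a concrete sketch of this factorization, using the standard CPT values from the AIMA burglary example: each of the 2^5 = 32 joint entries is a product of five locally stored numbers, and only 10 numbers are stored in total.

```python
from itertools import product

# Standard CPT values from the AIMA burglary example.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=true | A)

def bern(p, v):
    """Probability that a variable with P(true) = p takes value v."""
    return p if v else 1 - p

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# All 2^5 = 32 entries come from just 10 stored numbers, and they sum to 1.
print(sum(joint(*v) for v in product([True, False], repeat=5)))
```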

Page 23: Bayesnetwork

Constructing a Bayesian Network: Step 1

• Order the variables in terms of causality (may be a partial order), e.g., {E, B} -> {A} -> {J, M}

• P(J, M, A, E, B) = P(J, M | A, E, B) P(A | E, B) P(E, B)
                   ≈ P(J, M | A) P(A | E, B) P(E) P(B)
                   ≈ P(J | A) P(M | A) P(A | E, B) P(E) P(B)

These conditional independence (CI) assumptions are reflected in the graph structure of the Bayesian network.

Page 24: Bayesnetwork

Constructing this Bayesian Network: Step 2

• P(J, M, A, E, B) = P(J | A) P(M | A) P(A | E, B) P(E) P(B)

• There are 3 conditional probability tables (CPTs) to be determined: P(J | A), P(M | A), P(A | E, B)
  – Requiring 2 + 2 + 4 = 8 probabilities

• And 2 marginal probabilities P(E), P(B) -> 2 more probabilities

• 2 + 2 + 4 + 1 + 1 = 10 numbers (vs. 2^5 − 1 = 31)

• Where do these probabilities come from?
  – Expert knowledge
  – From data (relative frequency estimates)
  – Or a combination of both

Page 25: Bayesnetwork

Number of Probabilities in Bayesian Networks

• Consider n binary variables

• Unconstrained joint distribution requires O(2^n) probabilities

• If we have a Bayesian network with a maximum of k parents for any node, then we need O(n · 2^k) probabilities

• Example
  – Full unconstrained joint distribution
    • n = 30: need 10^9 probabilities for the full joint distribution
  – Bayesian network
    • n = 30, k = 4: need 480 probabilities
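The counting argument is easy to mechanize; a small helper (assuming binary variables, with one stored number per parent configuration) reproduces both the 10-number burglary count and the n = 30, k = 4 figure:

```python
def bn_param_count(parent_counts):
    """Total CPT entries for binary nodes: a node with k binary parents
    needs 2**k numbers (one P(X=true | ...) per parent configuration)."""
    return sum(2 ** k for k in parent_counts)

# Burglary network: B and E have 0 parents, A has 2, J and M have 1 each.
print(bn_param_count([0, 0, 2, 1, 1]))   # 10, vs. 2**5 - 1 = 31 unconstrained

# n = 30 binary variables with at most k = 4 parents each:
print(bn_param_count([4] * 30))          # 480, vs. 2**30 - 1 ≈ 10**9
```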

Page 26: Bayesnetwork

Constructing Bayesian networks

1. Choose an ordering of variables X1, …, Xn

2. For i = 1 to n
   – add Xi to the network
   – select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)
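A minimal sketch of this loop, assuming a hypothetical oracle ci_holds(x, others, given) that judges whether X is conditionally independent of the variables in `others` given `given` (supplied by domain knowledge or a statistical test); nothing here is a real library API:

```python
from itertools import combinations

def build_network(variables, ci_holds):
    """For each Xi (in the chosen order), pick a smallest set S of earlier
    variables with P(Xi | S) = P(Xi | X1, ..., Xi-1), i.e. Xi independent
    of its remaining predecessors given S. Smaller sets are tried first,
    which keeps the resulting network compact."""
    parents = {}
    for i, x in enumerate(variables):
        preds = variables[:i]
        for size in range(len(preds) + 1):
            chosen = None
            for cand in combinations(preds, size):
                rest = [v for v in preds if v not in cand]
                if not rest or ci_holds(x, rest, list(cand)):
                    chosen = list(cand)
                    break
            if chosen is not None:
                parents[x] = chosen
                break
    return parents

# e.g. build_network(["E", "B", "A", "J", "M"], my_ci_judgment)
```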

Page 27: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?

Page 28: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No

Page 29: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A)?
P(A | J, M) = P(A | J)?
P(A | J, M) = P(A | M)?

Page 30: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A | M)? No

Page 31: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A | M)? No
P(B | A, J, M) = P(B)?
P(B | A, J, M) = P(B | A)?

Page 32: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A | M)? No
P(B | A, J, M) = P(B)? No
P(B | A, J, M) = P(B | A)? Yes

Page 33: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A | M)? No
P(B | A, J, M) = P(B)? No
P(B | A, J, M) = P(B | A)? Yes
P(E | B, A, J, M) = P(E)?
P(E | B, A, J, M) = P(E | A, B)?

Page 34: Bayesnetwork

Example

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No
P(A | J, M) = P(A)? No
P(A | J, M) = P(A | J)? No
P(A | J, M) = P(A | M)? No
P(B | A, J, M) = P(B)? No
P(B | A, J, M) = P(B | A)? Yes
P(E | B, A, J, M) = P(E)? No
P(E | B, A, J, M) = P(E | A, B)? Yes

Page 35: Bayesnetwork

Example contd.

• Deciding conditional independence is hard in noncausal directions
  – The causal direction seems much more natural

• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Page 36: Bayesnetwork

A more realistic Bayes Network: Car diagnosis

• Initial observation: car won’t start
• Orange: “broken, so fix it” nodes
• Green: testable evidence
• Gray: “hidden variables” to ensure sparse structure, reduce parameters

Page 37: Bayesnetwork

The Bayesian Network from a different Variable Ordering

Page 38: Bayesnetwork

Given a graph, can we “read off” conditional independencies?

Page 39: Bayesnetwork

Are there wrong network structures?

• Some variable orderings yield more compact, some less compact structures
  – Compact ones are better
  – But all representations resulting from this process are correct
  – One extreme: the fully connected network is always correct but rarely the best choice

• How can a network structure be wrong?
  – If it misses directed edges that are required

Page 40: Bayesnetwork

Summary

• Bayesian networks provide a natural representation for (causally induced) conditional independence

• Topology + conditional probability tables

• Generally easy for domain experts to construct

Page 41: Bayesnetwork

Probabilistic inference

• A general scenario:
  – Query variables: X
  – Evidence (observed) variables: E = e
  – Unobserved variables: Y

• If we know the full joint distribution P(X, E, Y), how can we perform inference about X?

  P(X | E = e) = P(X, e) / P(e) ∝ Σy P(X, e, y)

• Problems
  – Full joint distributions are too large
  – Marginalizing out Y may involve too many summation terms
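To make the scenario concrete, here is a minimal enumeration sketch for the earlier burglar-alarm query (Mary calls, John doesn’t), using the standard AIMA CPT values; the inner sum over (E, A) is exactly the “marginalize out Y” step, which grows exponentially in the number of unobserved variables.

```python
from itertools import product

# Standard AIMA CPT values for the burglary network.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bern(p, v):
    return p if v else 1 - p

def joint(b, e, a, j, m):
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def prob_burglary(j, m):
    """P(B=true | J=j, M=m): sum out the unobserved Y = {E, A}, normalize."""
    score = {b: sum(joint(b, e, a, j, m)
                    for e, a in product([True, False], repeat=2))
             for b in (True, False)}
    return score[True] / (score[True] + score[False])

# Mary calls, John doesn't:
print(f"P(B | m, not j) = {prob_burglary(False, True):.4f}")  # ~0.0069
```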

Page 42: Bayesnetwork

Conclusions

• Full joint distributions are intractable to work with
  – Conditional independence assumptions allow us to model real-world phenomena with much simpler models
  – Bayesian networks are a systematic way to construct parsimonious structured distributions

• How do we do inference (reasoning) in Bayesian networks?
  – Systematic algorithms exist
  – Complexity depends on the structure of the graph