Learning with Bayesian Networks David Heckerman Presented by Colin Rickert


Page 1

Learning with Bayesian Networks

David Heckerman

Presented by Colin Rickert

Page 2

Introduction to Bayesian Networks

Bayesian networks apply general Bayesian probability to many interrelated variables at once.

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest.

The model has several advantages for data analysis over rule-based decision trees.

Page 3

Outline

1. Bayesian vs. classical probability methods

2. Advantages of Bayesian techniques

3. The coin toss prediction model from a Bayesian perspective

4. Constructing a Bayesian network with prior knowledge

5. Optimizing a Bayesian network with observed knowledge (data)

6. Exam questions

Page 4

Bayesian vs. the Classical Approach

The Bayesian probability of an event x represents a person's degree of belief or confidence in that event's occurrence, based on prior and observed facts.

Classical probability refers to the true or actual probability of the event and is not concerned with observed behavior.

Page 5

Bayesian vs. the Classical Approach

The Bayesian approach restricts its prediction to the next (N+1st) occurrence of an event, given the N previously observed events.

The classical approach is to predict the likelihood of any given event regardless of the number of observed occurrences.

Page 6

Example

Imagine a coin with irregular surfaces such that the probability of landing heads or tails is not equal.

The classical approach would be to analyze the surfaces to create a physical model of how the coin is likely to land on any given throw.

The Bayesian approach simply restricts attention to predicting the next toss based on previous tosses.

Page 7

Advantages of Bayesian Techniques

How do Bayesian techniques compare to other learning models?

1. Bayesian networks can readily handle incomplete data sets.

2. Bayesian networks allow one to learn about causal relationships.

3. Bayesian networks readily facilitate the use of prior knowledge.

4. Bayesian methods provide an efficient way to prevent the overfitting of data (there is no need for pre-processing).

Page 8

Handling of Incomplete Data

Imagine a data sample where two attribute values are strongly anti-correlated.

With decision trees, both values must be present to avoid confusing the learning model.

Bayesian networks need only one of the values to be present and can infer the absence of the other: imagine two variables, one for gun-owner and the other for peace activist. The data should indicate that you do not need to check both values.
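To make this concrete, here is a minimal sketch (not from the slides; the joint probabilities are invented) of how a two-variable model infers the missing attribute from the one that was observed:

```python
# Hypothetical joint distribution over (gun_owner, peace_activist), chosen so
# the two attributes are strongly anti-correlated.
joint = {
    (True, True): 0.02, (True, False): 0.38,
    (False, True): 0.48, (False, False): 0.12,
}

def p_activist_given_gun(gun_value):
    """P(peace_activist = True | gun_owner = gun_value), by conditioning the joint."""
    num = joint[(gun_value, True)]
    den = joint[(gun_value, True)] + joint[(gun_value, False)]
    return num / den

# Observing only gun_owner is enough to infer the likely value of the other attribute.
print(p_activist_given_gun(True))   # 0.05: activist very unlikely for a gun owner
print(p_activist_given_gun(False))  # 0.80: activist likely for a non gun owner
```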

Page 9

Learning about Causal Relationships

We can use observed knowledge to determine the validity of the acyclic graph that represents the Bayesian network.

For instance, is running a cause of knee damage? Prior knowledge may indicate that this is the case. Observed knowledge may strengthen or weaken this argument.

Page 10

Use of Prior Knowledge and Observed Behavior

Prior knowledge is constructed in a relatively straightforward way by adding "causal" edges between any two factors that are believed to be correlated.

Causal networks represent prior knowledge, whereas the weights of the directed edges can be updated a posteriori based on new data.

Page 11

Avoidance of Overfitting Data

Contradictions do not need to be removed from the data.

Data can be "smoothed" so that all available data can be used.

Page 12

The “Irregular” Coin Toss from a Bayesian Perspective

Start with the set of probabilities Θ = {θ1,…,θn} for our hypothesis.

For the coin toss we have only one parameter, θ, representing our belief that we will toss "heads", with 1-θ for tails.

Predict the outcome of the next (N+1st) flip based on the previous N flips, D = {X1=x1,…, XN=xN}. We want to know the probability that XN+1 = heads.

ξ represents the information we have observed thus far (i.e. ξ = {D}).

Page 13

Bayesian Probabilities

Posterior probability, p(θ|D,ξ): probability of a particular value of θ given that D has been observed (our final value of θ). In this case ξ = {D}.

Prior probability, p(θ|ξ): probability of a particular value of θ given no observed data (our previous "belief").

Observed probability or "likelihood", p(D|θ,ξ): likelihood of the sequence of coin tosses D being observed given that θ takes a particular value. In this case ξ = {θ}.

p(D|ξ): raw probability of observing D.

Page 14

Bayesian Formulas for Weighted Coin Toss (Irregular Coin)

By Bayes' rule,

p(θ|D,ξ) = p(D|θ,ξ) p(θ|ξ) / p(D|ξ)

where

p(D|ξ) = ∫ p(D|θ,ξ) p(θ|ξ) dθ

*Only p(D|θ,ξ) and p(θ|ξ) need to be specified; the rest can be derived.

Page 15

Integration

To find the probability that XN+1 = heads, we must integrate over all possible values of θ to find the average value of θ, which yields:

p(XN+1 = heads | D, ξ) = ∫ θ p(θ|D,ξ) dθ

Page 16

Expansion of Terms

1. Expand the observed probability (likelihood) p(D|θ,ξ). For N tosses containing h heads and t tails:

p(D|θ,ξ) = θ^h (1-θ)^t

2. Expand the prior probability p(θ|ξ) as a Beta distribution with hyperparameters αh and αt:

p(θ|ξ) = Beta(θ | αh, αt) ∝ θ^(αh-1) (1-θ)^(αt-1)

*The "Beta" function, once normalized by integration, yields a bell-shaped curve, which is a typical probability distribution. It can be viewed as our expectation of the shape of the curve.

Page 17

Beta Function and Integration

Combining the product of the prior and the likelihood yields the posterior:

p(θ|D,ξ) ∝ θ^(αh+h-1) (1-θ)^(αt+t-1), i.e. a Beta(αh+h, αt+t) distribution.

Integrating gives the desired result:

p(XN+1 = heads | D, ξ) = (αh + h) / (αh + αt + N)
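As a quick sanity check (not from the slides; the prior and counts below are invented), a short numerical sketch comparing the closed-form result above with direct numerical averaging of θ under the posterior:

```python
import numpy as np

# Assumed illustrative values: Beta(2, 2) prior, N = 10 tosses with 7 heads.
alpha_h, alpha_t = 2.0, 2.0
h, t = 7, 3
N = h + t

# Closed-form posterior predictive for the next toss being heads.
closed_form = (alpha_h + h) / (alpha_h + alpha_t + N)

# Numerical check: average theta under the (unnormalized) posterior
# theta^(alpha_h+h-1) * (1-theta)^(alpha_t+t-1) on a fine grid.
theta = np.linspace(1e-6, 1 - 1e-6, 200_000)
post = theta**(alpha_h + h - 1) * (1 - theta)**(alpha_t + t - 1)
numeric = np.sum(theta * post) / np.sum(post)

print(closed_form)  # 0.642857...
print(numeric)      # ~0.642857, matching the closed form
```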

Page 18

Key Points

Multiply the result of the Beta function (prior probability) by the result of the coin toss likelihood for θ (observed probability). The result is our confidence in this value of θ.

Integrating the product of the two with respect to θ, over all values 0 < θ < 1, yields the average value of θ that best fits the observed facts plus prior knowledge.

Page 19

Bayesian Networks

1. Construct prior knowledge from a graph of causal relationships among variables.

2. Update the weights of the edges to reflect confidence in each causal link based on observed data (i.e. posterior knowledge).

Page 20

Example Network

Consider a credit fraud network designed to determine the probability of credit fraud based on certain events.

Variables include:

Fraud (f): whether or not fraud occurred

Gas (g): whether gas was purchased within the last 24 hours

Jewelry (j): whether jewelry was purchased within the last 24 hours

Age (a): age of the card holder

Sex (s): sex of the card holder

The task of determining which variables to include is not trivial and involves decision analysis.
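A minimal sketch of how this network's prior-knowledge structure and tables might be represented in plain Python (the graph follows the slides; the probability values are invented placeholders):

```python
# Structure of the credit fraud network: fraud, age and sex are root nodes;
# gas depends on fraud; jewelry depends on fraud, age and sex.
parents = {
    "fraud": [],
    "age": [],
    "sex": [],
    "gas": ["fraud"],
    "jewelry": ["fraud", "age", "sex"],
}

# Illustrative (invented) conditional probability tables: each row is one
# configuration j of the parents, and the entry is P(node = yes | j).
p_gas_yes = {True: 0.20, False: 0.01}              # keyed by fraud
p_jewelry_yes = {
    (True, "young", "male"): 0.05,                 # keyed by (fraud, age, sex)
    (False, "young", "male"): 0.0001,
}

def p_jewelry_given(fraud, age, sex):
    """Look up P(jewelry = yes | fraud, age, sex) from the prior-knowledge table."""
    return p_jewelry_yes[(fraud, age, sex)]

print(p_jewelry_given(True, "young", "male"))   # 0.05
```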

Page 21

Construct Graph Based on Prior Knowledge

If we examine all possible orderings of the variables when searching for the best causal network, there are n! possibilities to consider.

The search space can be reduced with domain knowledge:

If A is thought to be a "cause" of B, then add the edge A -> B.

For all pairs of nodes that do not have a causal link, we can check for conditional independence between those nodes.
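One way to carry out such a check (a minimal sketch, not from the slides; the synthetic data and tolerance are invented) is to compare the empirical conditional distribution with the marginal, e.g. testing whether p(a|f) is close to p(a):

```python
import numpy as np

# Synthetic binary samples for fraud (f) and an age indicator (a),
# generated independently here, so the check should pass.
rng = np.random.default_rng(0)
f = rng.random(10_000) < 0.01    # fraud is rare
a = rng.random(10_000) < 0.30    # age indicator, independent of fraud

# Empirical check of p(a | f) = p(a).
p_a = a.mean()
p_a_given_f = a[f].mean()

# With an (arbitrary) tolerance, treat the nodes as independent and omit the edge.
print(p_a, p_a_given_f, abs(p_a_given_f - p_a) < 0.05)
```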

Page 22

Example

Using the above graph of expected causes, we can check the following conditional independence relations against initial sample data:

p(a|f) = p(a)

p(s|f,a) = p(s)

p(g|f,a, s) = p(g|f)

p(j|f,a,s,g) = p(j|f,a,s)
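Taken together with the chain rule, these independence relations give the factored joint distribution encoded by the network (this factorization is implied by the slide rather than written on it):

p(f, a, s, g, j) = p(f) p(a) p(s) p(g|f) p(j|f, a, s)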

Page 23

Construction of “Posterior” knowledge based on observed data:

For every node i, we construct the vector of probabilities θij = {θij1,…,θijn}, where θij is represented as a row entry in a table of all possible combinations j of the parent nodes 1,…,n.

The entries in this table are the weights that represent the degree of confidence that nodes 1,…,n influence node i (though we do not know these values yet).
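In Heckerman's notation, each table entry is itself a conditional probability:

θijk = p(node i takes its k-th value | parent configuration j)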

Page 24

Determining Table Values for θij

How do we determine the values for θij?

Perform multivariate integration to find the average θij for all i and j, in a similar manner to the coin toss integration:

Count all instances m that satisfy a configuration (i, j, k); the observed probability (likelihood) for θijk then becomes θijk^m (1 - θijk)^(n-m).

Integrate over all vectors θij to find the average value of each θijk.
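Assuming Beta/Dirichlet priors as in the coin toss case, each averaged entry has the same closed form as the coin result. A minimal sketch (the prior pseudo-counts and data counts below are invented):

```python
# Posterior mean of a binary table entry theta_ijk under a Beta prior,
# mirroring the coin toss result (alpha_h + h) / (alpha_h + alpha_t + N).
def posterior_mean(alpha_yes, alpha_no, n_yes, n_total):
    """Average value of theta_ijk given prior pseudo-counts and observed counts."""
    return (alpha_yes + n_yes) / (alpha_yes + alpha_no + n_total)

# Invented example: parent configuration j seen 40 times, node i was "yes" 9 times,
# with a weak Beta(1, 1) prior.
print(posterior_mean(1, 1, 9, 40))  # 0.238...
```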

Page 25

Question 1: What is Bayesian Probability?

A person’s degree of belief in a certain event

i.e. Your own degree of certainty that a tossed coin will land “heads”

Page 26

Question 2: What are the advantages and disadvantages of the Bayesian and classical approaches to probability?

Bayesian probability:
+ Reflects an expert's knowledge
+ Complies with the rules of probability
- Arbitrary

Classical probability:
+ Objective and unbiased
- Generally not available

Page 27

Question 3: Mention at least 3 Advantages of Bayesian analysis

Handles incomplete data sets

Learns about causal relationships

Combines domain knowledge and data

Avoids overfitting

Page 28

Conclusion

Bayesian networks can be used to express expert knowledge about a problem domain even when a precise model does not exist