
Page 1:

Probabilities, Likelihood, Bayesian inference

Copyright © 2011 Paul O. Lewis

Tuesday, April 12, 2011

Page 2:

Taking samples for statistical analysis

Page 3:

Combining probabilities

• Multiply probabilities if the component events must happen simultaneously (i.e. where you would naturally use the word AND when describing the problem)

Using 2 dice, what is the probability of rolling a given face on the first die AND a given face on the second?

(1/6) × (1/6) = 1/36

Page 4:

Combining probabilities

• Add probabilities if the component events are mutually exclusive (i.e. where you would naturally use the word OR in describing the problem)

Using one die, what is the probability of rolling one given face OR another?

(1/6) + (1/6) = 1/3

Page 5:

Combining AND and OR

What is the probability that the sum of two dice is 7?

1 and 6
2 and 5
3 and 4
4 and 3
5 and 2
6 and 1

(1/36) + (1/36) + (1/36) + (1/36) + (1/36) + (1/36) = 1/6
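To make the AND/OR bookkeeping concrete, here is a minimal Python sketch (added here, not part of the original slides) that enumerates all 36 equally likely outcomes of two fair dice and confirms that six of them sum to 7:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# Each AND event (e.g. "1 AND 6") has probability (1/6) * (1/6) = 1/36;
# the mutually exclusive pairs summing to 7 are combined with OR (addition).
favorable = [o for o in outcomes if sum(o) == 7]
prob = Fraction(len(favorable), len(outcomes))

print(favorable)  # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(prob)       # 1/6
```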

Page 6:

Joint probabilities (B = Black, W = White, S = Solid, D = Dotted)

Page 7:

Conditional probabilities

Page 8:

Independence

This is always true:

Pr(A and B) = Pr(A) Pr(B|A)
(joint probability = marginal probability × conditional probability)

If we can say this:

Pr(B|A) = Pr(B)

...then events A and B are independent and we can express the joint probability as the product of Pr(A) and Pr(B):

Pr(A and B) = Pr(A) Pr(B)
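A quick numeric illustration (added here; the events come from the dice example above): with two fair dice, the first roll tells us nothing about the second, so the joint probability factorizes.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a predicate over outcomes), equal weighting."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 1             # first die shows 1
B = lambda o: o[1] == 6             # second die shows 6
A_and_B = lambda o: A(o) and B(o)

# Independence: Pr(A and B) equals Pr(A) * Pr(B)
print(pr(A_and_B), pr(A) * pr(B))   # 1/36 1/36
```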

Page 9:

The Likelihood Criterion

The probability of the observations computed using a model tells us how surprised we should be. The preferred model is the one that surprises us least.

Suppose I threw 20 dice down on the table and this was the result...

[Image: the 20 thrown dice, all showing 5s (see the trick dice model below)]

Page 10:

The Fair Dice model

Pr(obs. | fair dice model) = (1/6)^20 = 1/3,656,158,440,062,976

You should have been very surprised at this result because the probability of this event is very small: only 1 in 3.6 quadrillion!
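As a quick check (a sketch added here, not part of the slides), the fair-dice likelihood can be reproduced exactly with Python's rational arithmetic:

```python
from fractions import Fraction

# Twenty independent dice, each matching the observed face with probability 1/6.
likelihood = Fraction(1, 6) ** 20
print(likelihood)  # 1/3656158440062976, i.e. about 1 in 3.6 quadrillion
```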

Page 11:

The Trick Dice model
(assumes each die has a 5 on every side)

Pr(obs. | trick dice model) = 1^20 = 1

You should not be surprised at all at this result because the observed outcome is certain under this model.

Page 12:

Results

Model        Likelihood                 Surprise level
-----------  -------------------------  ---------------------------
Fair Dice    1/3,656,158,440,062,976    Very, very, very surprised
Trick Dice   1.0                        Not surprised at all

The winning model maximizes the likelihood (and thus minimizes surprise).

Page 13:

Likelihood and model comparison

• Analyses using likelihoods ultimately involve model comparison
• The models compared can be discrete (as in the fair vs. trick dice example)
• More often the models compared differ continuously:
  – Model 1: branch length is 0.05
  – Model 2: branch length is 0.06

Rather than having an infinity of models, we instead think of the branch length as a parameter within one model.

Page 14:

Likelihood versus log likelihood

[Figure: two plots of the same function over parameter values 0.2 to 1.0. Top: the likelihood L (y-axis 0.05 to 0.25). Bottom: the log likelihood Log[L] (y-axis −14 to −2). Both curves peak at the same parameter value.]

Page 15:

Invariance principle

Page 16:

Coin flipping

Page 17:

Coin-flipping

Pr(y|p) = (n choose y) p^y (1 − p)^(n−y) = L(p|y)

y = observed number of heads
n = number of flips (sample size)
p = (unobserved) proportion of heads

Note that the same formula serves as both the:
- probability of y (if p is fixed)
- likelihood of p (if y is fixed)
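The dual reading of the formula is easy to see numerically. Below is a short sketch (added here; the values n = 10 and y = 6 are illustrative, not from the slides) that evaluates the same binomial expression first as a probability of y with p fixed, then as a likelihood of p with y fixed:

```python
from math import comb

def binom(y, n, p):
    """Pr(y|p) = (n choose y) p^y (1-p)^(n-y); also L(p|y) when y is fixed."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

n = 10

# As a probability distribution over the data y (p fixed at 0.5): sums to 1.
probs = [binom(y, n, 0.5) for y in range(n + 1)]
print(sum(probs))  # 1.0 (up to rounding)

# As a likelihood curve over the parameter p (data fixed at y = 6):
# it need not sum or integrate to 1, and it is maximized near p = y/n = 0.6.
for p in (0.3, 0.5, 0.6, 0.9):
    print(p, binom(6, n, p))
```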

Page 18:

Likelihood vs. Probability

• The symbol Pr(D|M) is often used to represent the likelihood (I will be following convention in this regard)
• Pr(D | M): probabilities are functions of the data (the model is fixed)
• L(M | D): likelihoods are functions of models (the data is fixed)

Probabilities of each outcome under two models (each column sums to 1.0):

Outcome   Fair coin model   Two-heads model
H         0.5               1.0
T         0.5               0.0
Total     1.0               1.0

Page 19:

Examples

Mathematica example

Page 20:

Bayesian inference

Page 21:

Joint probabilities (B = Black, W = White, S = Solid, D = Dotted)

Page 22:

Conditional probabilities

Page 23:

Bayes’ rule

Page 24:

Probability of "Dotted"

Page 25:

Bayes' rule (cont.)

Pr(D) is the marginal probability of being dotted. To compute it, we marginalize over colors:

Pr(D) = Pr(D,B) + Pr(D,W)

Page 26:

Joint probabilities

      B          W
D     Pr(D,B)    Pr(D,W)
S     Pr(S,B)    Pr(S,W)

Page 27:

Marginal probabilities

      B          W
D     Pr(D,B) + Pr(D,W)   = marginal probability of being dotted
S     Pr(S,B) + Pr(S,W)   = marginal probability of being solid
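The marginalization arithmetic is easy to demonstrate in code. The joint probabilities below are made-up placeholders (the slide's actual numbers were in a figure that did not survive extraction); the point is how marginal and conditional probabilities fall out of the joint table:

```python
# Hypothetical joint probabilities for the marble example (placeholders, not
# the slide's values): rows are pattern (D = dotted, S = solid), columns are
# color (B = black, W = white). The four entries must sum to 1.
joint = {("D", "B"): 0.1, ("D", "W"): 0.3,
         ("S", "B"): 0.4, ("S", "W"): 0.2}

# Marginal probability of being dotted: Pr(D) = Pr(D,B) + Pr(D,W)
pr_D = joint[("D", "B")] + joint[("D", "W")]

# Marginal probability of being white: Pr(W) = Pr(D,W) + Pr(S,W)
pr_W = joint[("D", "W")] + joint[("S", "W")]

# Conditional probability: Pr(B|D) = Pr(D,B) / Pr(D)
pr_B_given_D = joint[("D", "B")] / pr_D

print(pr_D, pr_W, pr_B_given_D)  # 0.4 0.5 0.25
```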

Page 28:

Marginalizing over "dottedness"

      B          W
D     Pr(D,B)    Pr(D,W)
S     Pr(S,B)    Pr(S,W)

Pr(D,W) + Pr(S,W) = marginal probability of being a white marble

Page 29:

Bayes' rule (cont.)

Page 30:

Bayes' rule in statistics

D refers to the "observables" (i.e. the Data)
θ refers to one or more "unobservables" (i.e. parameters of a model, or the model itself):
– tree model (i.e. tree topology)
– substitution model (e.g. JC, F84, GTR, etc.)
– parameter of a substitution model (e.g. a branch length, a base frequency, transition/transversion rate ratio, etc.)
– hypothesis (i.e. a special case of a model)
– a latent variable (e.g. ancestral state)

Page 31:

Bayes' rule in statistics

Pr(θ|D) = Pr(D|θ) Pr(θ) / Σ_θ Pr(D|θ) Pr(θ)

Pr(θ|D): posterior probability of hypothesis θ
Pr(D|θ): likelihood of hypothesis θ
Pr(θ): prior probability of hypothesis θ
Σ_θ Pr(D|θ) Pr(θ): marginal probability of the data (marginalizing over hypotheses)

Page 32:

Simple (albeit silly) paternity example

Possibilities          θ1     θ2     Row sum
Genotypes              AA     Aa     ---
Prior                  1/2    1/2    1
Likelihood             1      1/2    ---
Prior × Likelihood     1/2    1/4    3/4
Posterior              2/3    1/3    1

θ1 and θ2 are assumed to be the only possible fathers; the child has genotype Aa and the mother genotype aa, so the child must have received allele A from the true father. Note: the data in this case is the child's genotype (Aa).
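A minimal sketch of the discrete Bayes' rule applied to this table (added here for concreteness; it reproduces the posterior row exactly):

```python
from fractions import Fraction

# Discrete Bayes' rule: posterior = prior * likelihood / marginal, where the
# marginal is the "Row sum" of the Prior x Likelihood row (3/4 in the table).
priors = {"theta1": Fraction(1, 2), "theta2": Fraction(1, 2)}

# Likelihood that each father transmits allele A: AA always does, Aa half the time.
likelihoods = {"theta1": Fraction(1), "theta2": Fraction(1, 2)}

marginal = sum(priors[h] * likelihoods[h] for h in priors)               # 3/4
posterior = {h: priors[h] * likelihoods[h] / marginal for h in priors}
print(posterior)  # {'theta1': Fraction(2, 3), 'theta2': Fraction(1, 3)}
```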

Page 33:

The prior can be your friend

Suppose the test for a rare disease is 99% accurate:

Pr(+|disease) = 0.99
Pr(+|healthy) = 0.01

(Here "+" is the datum and disease status is the hypothesis. Note that we do not need to consider the case of a negative test result.)

Suppose further I test positive for the disease. How worried should I be?

It is very tempting to (mis)interpret the likelihood as a posterior probability and conclude "There is a 99% chance that I have the disease."

Page 34:

The prior can be your friend

The posterior probability is 0.99 only if the prior probability of having the disease is 0.5:

Pr(disease|+) = Pr(+|disease)(1/2) / [Pr(+|disease)(1/2) + Pr(+|healthy)(1/2)]
              = (0.99)(1/2) / [(0.99)(1/2) + (0.01)(1/2)]
              = 0.99

If, however, the prior odds against having the disease are a million to 1, then the posterior probability is much more reassuring:

Pr(disease|+) = (0.99)(1/1000000) / [(0.99)(1/1000000) + (0.01)(999999/1000000)]
              ≈ 0.0001
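The same two calculations in a short sketch (added here), showing how strongly the prior drives the answer:

```python
def posterior_disease(prior_disease, sens=0.99, false_pos=0.01):
    """Pr(disease|+) by Bayes' rule, given a positive test result."""
    num = sens * prior_disease
    return num / (num + false_pos * (1 - prior_disease))

print(posterior_disease(0.5))          # 0.99
print(posterior_disease(1 / 1000000))  # ~9.9e-05, i.e. about 0.0001
```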

Page 35:

An important caveat

This (rare disease) example involves a tiny amount of data (one observation) and an extremely informative prior, and gives the impression that maximum likelihood (ML) inference is not very reliable.

However, in phylogenetics we often have lots of data and use much less informative priors, so in phylogenetics ML inference is generally very reliable.

Page 36:

Discrete vs. Continuous

• So far, we've been dealing with discrete hypotheses (e.g. either this father or that father, have disease or don’t have disease)

• In phylogenetics, substitution models represent an infinite number of hypotheses (each combination of parameter values is in some sense a separate hypothesis)

• How do we use Bayes' rule when our hypotheses form a continuum?

Page 37:

Bayes' rule: continuous case

f(θ|D) = f(D|θ) f(θ) / ∫ f(D|θ) f(θ) dθ

f(θ|D): posterior probability density
f(D|θ): likelihood
f(θ): prior probability density
∫ f(D|θ) f(θ) dθ: marginal probability of the data
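In one dimension the integral in the denominator can be approximated on a grid. Here is a sketch (added here; the flat prior and the coin-flip data y = 6, n = 10 are illustrative choices, not from the slides) of the continuous Bayes' rule for the coin-flipping proportion p:

```python
from math import comb

# Illustrative data: y heads in n flips.
n, y = 10, 6

def likelihood(p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Grid over the parameter p, with a flat prior density f(p) = 1 on (0, 1).
m = 10000
dp = 1.0 / m
grid = [(i + 0.5) * dp for i in range(m)]

# Denominator: integral of f(D|p) f(p) dp, approximated by a Riemann sum.
marginal = sum(likelihood(p) * 1.0 * dp for p in grid)  # ~1/11 for flat prior

# Posterior density at a point: f(p|D) = f(D|p) f(p) / marginal.
# Note the value exceeds 1.0, which is fine for a density (see below).
print(likelihood(0.6) / marginal)  # ~2.76, the posterior density near its peak
```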

Page 38:

If you had to guess...

Not knowing anything about my archery abilities, draw a curve representing your view of the chances of my arrow landing a distance d from the center of the target (if it helps, I'm standing 50 meters away from the target).

[Figure: a target (1 meter wide) with an arrow a distance d from its center]

Page 39:

Case 1: assume I have talent

[Figure: a density concentrated near zero on an axis of distance in centimeters from target center, 0 to 80 cm]

An informative prior (low variance) that says most of my arrows will fall within 10 cm of the center (thanks for your confidence!)

Page 40:

Case 2: assume I have a talent for missing the target!

[Figure: a density concentrated in a narrow band beyond the target's edge, on an axis of distance in cm from target center]

Also an informative prior, but one that says most of my arrows will fall within a narrow range just outside the entire target!

Page 41:

Case 3: assume I have no talent

[Figure: a nearly flat density on an axis of distance in cm from target center, 0 to 80 cm]

This is a vague prior: its high variance reflects nearly total ignorance of my abilities, saying that my arrows could land nearly anywhere!

Page 42:

A matter of scale

Notice that I haven't provided a scale for the vertical axis. What exactly does the height of this curve mean?

For example, does the height of the dotted line represent the probability that my arrow lands 60 cm from the center of the target?

Page 43:

Probabilities apply to intervals

Probabilities are attached to intervals (i.e. ranges of values), not individual values.

The probability of any given point (e.g. d = 60.0) is zero!

However, we can ask about the probability that d falls in a particular range, e.g. 50.0 < d < 65.0.

Page 44:

Probabilities vs. probability densities

[Figure: a probability density function. Note: the height of this curve does not represent a probability (if it did, it could not exceed 1.0).]

Page 45:

Integration of densities

The density curve is scaled so that the value of this integral (i.e. the total area under the curve) equals 1.0:

∫ f(θ) dθ = 1

Page 46:

Integration of a probability density yields a probability

The area under the density curve from 0 to 2 is the probability that θ is less than 2.

Page 47:

Archery Priors Revisited

[Figure: three density curves: mean=2.5, var=3.125; mean=60, var=3; mean=200, var=40000]

These density curves are all variations of a gamma probability distribution. We could have used a gamma distribution to specify each of the prior probability distributions for the archery example. Note that higher variance means less informative.
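A gamma distribution is usually specified by shape and scale rather than mean and variance; the conversion is shape = mean²/variance and scale = variance/mean. A short sketch (added here) recovering the parameters for the three curves above:

```python
def gamma_shape_scale(mean, var):
    """Convert (mean, variance) to the (shape, scale) parameters of a gamma."""
    shape = mean**2 / var
    scale = var / mean
    return shape, scale

for mean, var in [(2.5, 3.125), (60, 3), (200, 40000)]:
    print(mean, var, gamma_shape_scale(mean, var))
# (2.5, 3.125) -> shape 2.0,    scale 1.25
# (60, 3)      -> shape 1200.0, scale 0.05   (sharply peaked, informative)
# (200, 40000) -> shape 1.0,    scale 200.0  (an exponential: very vague)
```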


Page 50:

The posterior is generally more informative than the prior (data contains information)

[Figure: for the coin-flipping parameter p, a flat uniform prior density and a much more concentrated posterior density; a shaded area under the posterior corresponds to a posterior probability (mass). Mathematica example.]

Page 51:

Beta prior gives more flexibility

[Figure: a Beta(2,2) prior density and a more concentrated posterior density for p. Posterior probability of p between 0.45 and 0.55 is 0.223.]

Note:
• prior is vague but not flat
• posterior has smaller variance (i.e. it is more informative) than the prior
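For a binomial likelihood the beta prior is conjugate: a Beta(a, b) prior combined with y heads in n flips yields a Beta(a + y, b + n − y) posterior. The sketch below (added here; the data y = 6, n = 10 are assumed for illustration, so the interval probability it prints will differ from the slide's 0.223) shows how a posterior interval probability is computed:

```python
from math import gamma

def beta_pdf(p, a, b):
    """Density of the Beta(a, b) distribution at p."""
    coeff = gamma(a + b) / (gamma(a) * gamma(b))
    return coeff * p**(a - 1) * (1 - p)**(b - 1)

def beta_interval_prob(a, b, lo, hi, m=20000):
    """P(lo < p < hi) under Beta(a, b), by a simple Riemann sum."""
    dp = (hi - lo) / m
    return sum(beta_pdf(lo + (i + 0.5) * dp, a, b) for i in range(m)) * dp

# Beta(2, 2) prior plus assumed data (y = 6 heads in n = 10 flips):
a, b = 2 + 6, 2 + 4                          # posterior is Beta(8, 6)
print(beta_interval_prob(a, b, 0.45, 0.55))  # posterior Pr(0.45 < p < 0.55)
```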

Page 52:

Usually there are many parameters...

A 2-parameter example:

f(θ1, θ2 | D) = f(D | θ1, θ2) f(θ1, θ2) / ∫∫ f(D | θ1, θ2) f(θ1, θ2) dθ1 dθ2

(posterior probability density = likelihood × prior probability density / marginal probability of the data)

An analysis of 100 sequences under the simplest model (JC69) requires 197 branch length parameters. The denominator is a 197-fold integral in this case! Now consider summing over all possible tree topologies! It would thus be nice to avoid having to calculate the marginal probability of the data...

Page 53:

ISC5317-Fall 2007-PB Computational Evolutionary Biology

[The more data we collect, the more] influenced the posterior will be by the likelihood. Eventually, the posterior will be determined entirely by the likelihood. For instance, if 10,000 throws still indicated that the coin had a heads probability of 0.45, then the posterior would spike on that value even for a Beta(100, 100) prior distribution that put a lot of probability on fair coins (p = 0.5).

6 Accumulating scientific knowledge

One of the most attractive aspects of Bayesian theory is that it provides a rational and consistent framework for accumulating scientific knowledge. According to the Bayesian view, knowledge is accumulated by successively updating probability distributions to take new data into account. In fact, we can show that a successive series of Bayesian updates based on stepwise accumulation of evidence is mathematically equivalent to a single update based on all the evidence. Assume that we start with an initial prior P(θ) and an initial set of observations D1. The posterior probability distribution after this step would be:

P(θ|D1) = P(θ) P(D1|θ) / P(D1)

If we use this posterior probability distribution as the prior in the next step, where we take a second set of observations D2 into account, we get:

P(θ|D2, D1) = P(θ|D1) P(D2|θ) / P(D2)
            = P(θ) P(D1|θ) P(D2|θ) / (P(D1) P(D2))
            = P(θ) P(D1, D2|θ) / P(D1, D2)

The last rearrangement can be made because D1 and D2 are independent sets of observations. Note that the last equation is the same as applying Bayes' theorem directly to the collected set of observations, D1 + D2. In other words, there is a rational procedure for accumulating scientific knowledge in the Bayesian framework, and the end result is the same regardless of the order or exact way in which the data are added, as long as the posterior probability distribution obtained in one analysis is used as the prior in the next analysis. In maximum likelihood analyses, background knowledge is typically ignored.
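The equivalence of stepwise and single-shot updating is easy to verify numerically for a discrete grid of parameter values with a binomial likelihood. A sketch (added here; the coin-flip data are illustrative):

```python
from math import comb

def binom_lik(y, n, p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

grid = [i / 100 for i in range(1, 100)]   # discrete candidate values of p
prior = [1 / len(grid)] * len(grid)       # flat initial prior P(theta)

def update(prior_probs, data):
    """One Bayesian update on the grid: posterior is prior * likelihood, normalized."""
    y, n = data
    unnorm = [pr * binom_lik(y, n, p) for p, pr in zip(grid, prior_probs)]
    total = sum(unnorm)                   # the marginal probability of the data
    return [u / total for u in unnorm]

d1, d2 = (6, 10), (45, 100)               # two successive batches of coin flips

stepwise = update(update(prior, d1), d2)  # P(theta|D1) reused as the prior for D2
combined = update(prior, (d1[0] + d2[0], d1[1] + d2[1]))  # single update on D1 + D2

print(max(abs(a - b) for a, b in zip(stepwise, combined)))  # ~1e-16, identical
```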

