Introduction to Bayesian Statistics

Machine Learning and Data Mining

Philipp Singer

CC image courtesy of user mattbuck007 on Flickr

Conditional Probability

Probability of event A given that B is true

P(cough|cold) > P(cough)

Fundamental in probability theory
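As a small illustration of the definition P(A|B) = P(A and B) / P(B), the sketch below estimates both sides of the cough/cold inequality from a table of counts. All counts are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Hypothetical counts for 1000 people (illustrative numbers only).
n_total = 1000
n_cold = 100            # people with a cold
n_cough = 150           # people with a cough
n_cough_and_cold = 80   # people with both

p_cough = n_cough / n_total
p_cold = n_cold / n_total
p_cough_and_cold = n_cough_and_cold / n_total

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_cough_given_cold = p_cough_and_cold / p_cold

print(f"P(cough)        = {p_cough:.2f}")
print(f"P(cough | cold) = {p_cough_given_cold:.2f}")
```

With these made-up counts, knowing that someone has a cold raises the probability of a cough well above its unconditional value.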

Before we start with Bayes ...

Another perspective on conditional probability

Conditional probability via growing trimmed trees

https://www.youtube.com/watch?v=Zxm4Xxvzohk

Bayes Theorem

P(A|B) = P(B|A) P(A) / P(B)

P(A|B) is the conditional probability of observing A given that B is true

P(B|A) is the conditional probability of observing B given that A is true

P(A) and P(B) are the probabilities of A and B without conditioning on each other

Visualize Bayes Theorem

Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/

Figure: all possible outcomes; some event

Figure: all people in study; people having cancer

Figure: all people in study; people where screening test is positive

Figure: people having positive screening test and cancer


Given the test is positive, what is the probability that said person has cancer?


Given that someone has cancer, what is the probability that said person had a positive test?
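The two questions above condition in opposite directions, and their answers can differ a lot. The following sketch makes this concrete with counts for the regions of the diagram. The counts are hypothetical (the slides give no numbers), and the variable names are illustrative.

```python
# Hypothetical study counts (illustrative only).
n_people = 10_000
n_cancer = 100               # people having cancer
n_positive = 1_080           # people whose screening test is positive
n_positive_and_cancer = 90   # people in both groups

# Conditioning restricts attention to one group.
p_cancer_given_positive = n_positive_and_cancer / n_positive
p_positive_given_cancer = n_positive_and_cancer / n_cancer

# Bayes' theorem links the two directions.
p_cancer = n_cancer / n_people
p_positive = n_positive / n_people
via_bayes = p_positive_given_cancer * p_cancer / p_positive

print(f"P(cancer | positive) = {p_cancer_given_positive:.3f}")
print(f"P(positive | cancer) = {p_positive_given_cancer:.3f}")
print(f"Via Bayes' theorem   = {via_bayes:.3f}")
```

Under these assumed counts the test catches most cancers, yet a positive result still leaves cancer fairly unlikely, because positives among healthy people outnumber positives among people with cancer.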

Example: Fake coin

Two coins

One fair

One unfair

What is the probability of having the fair coin after flipping Heads?

CC image courtesy of user pagedooley on Flickr
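A minimal sketch of the fake-coin question using Bayes' theorem. It assumes both coins are equally likely to be picked and that the unfair coin lands Heads with probability 0.8. That bias is a hypothetical value, since the numbers from the original slide are not preserved.

```python
def posterior_fair(prior_fair, p_heads_fair, p_heads_unfair):
    """P(fair | Heads) via Bayes' theorem for a two-coin setup."""
    # Evidence P(Heads): law of total probability over both coins.
    p_heads = prior_fair * p_heads_fair + (1 - prior_fair) * p_heads_unfair
    return prior_fair * p_heads_fair / p_heads

# Assumed values: equal prior on both coins, hypothetical bias of 0.8.
p_fair_after_heads = posterior_fair(
    prior_fair=0.5, p_heads_fair=0.5, p_heads_unfair=0.8
)
print(round(p_fair_after_heads, 4))
```

Seeing Heads shifts belief away from the fair coin, because Heads is more likely under the unfair one.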

Update of beliefs

Allows new evidence to update beliefs

Prior can also be posterior of previous update

Belief update

What is the probability of having the fair coin after we have already seen one Heads?
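The belief update can be sketched as a loop in which each posterior becomes the prior for the next flip. The setup below is an assumption for illustration: two equally likely coins and a hypothetical bias of 0.8 for the unfair coin.

```python
P_HEADS = {"fair": 0.5, "unfair": 0.8}   # 0.8 is a hypothetical bias

def update(belief, flip):
    """One Bayesian update of the belief over coins after a flip ('H' or 'T')."""
    likelihood = {c: (p if flip == "H" else 1 - p) for c, p in P_HEADS.items()}
    unnormalized = {c: belief[c] * likelihood[c] for c in belief}
    evidence = sum(unnormalized.values())
    return {c: v / evidence for c, v in unnormalized.items()}

belief = {"fair": 0.5, "unfair": 0.5}    # initial prior
history = []
for flip in "HH":                         # two Heads in a row
    belief = update(belief, flip)         # posterior becomes the next prior
    history.append(belief["fair"])

print([round(p, 4) for p in history])
```

Updating flip by flip gives the same result as a single update on both flips at once, which is what makes the "posterior as next prior" view consistent.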

Bayesian Inference

Source: https://xkcd.com/1132/

Bayesian Inference

Statistical inference of parameters

Parameters

Data

Additional knowledge

Coin flip example

Flip a coin several times

Is it fair?

Let's use Bayesian inference

Binomial model

Probability p of flipping heads

Flipping tails: 1-p

Probability of k heads in n flips: P(k | n, p) = C(n, k) p^k (1-p)^(n-k)
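A short sketch of the binomial likelihood, written with the standard library only. The function name `binomial_pmf` is illustrative; the example evaluates the likelihood of 80 heads in 200 flips under two candidate values of p.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Likelihood of 80 heads in 200 flips under two candidate values of p.
for p in (0.4, 0.5):
    print(f"p = {p}: {binomial_pmf(80, 200, p):.5f}")
```

The data are more probable under p = 0.4 than under p = 0.5, which is the kind of comparison the following slides formalize.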

Prior

Prior belief about parameter(s)

Conjugate prior

Posterior is from the same distribution family as the prior

Beta distribution is conjugate to the binomial

Beta prior

Beta distribution

Continuous probability distribution

Interval [0,1]

Two shape parameters: α and β

If α, β >= 1, interpret as pseudo counts

α would refer to flipping heads
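To get a feel for the two shape parameters, the sketch below evaluates the Beta density and its mean for a few settings. It assumes the first parameter counts heads and the second counts tails; the helper names are illustrative.

```python
from math import exp, lgamma, log

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p, for 0 < p < 1."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm + (alpha - 1) * log(p) + (beta - 1) * log(1 - p))

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)

# Beta(1, 1) is flat; larger pseudo counts concentrate the density.
for a, b in [(1, 1), (4, 6), (50, 10)]:
    print(f"Beta({a},{b}): mean = {beta_mean(a, b):.3f}, "
          f"density at 0.5 = {beta_pdf(0.5, a, b):.4f}")
```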

Beta distribution

Posterior

Posterior is also a Beta distribution: Beta(α + heads, β + tails), where α and β are the shape parameters of the prior

For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
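Because the Beta prior is conjugate to the binomial, the update reduces to adding observed counts to the prior's pseudo counts. A minimal sketch, with an illustrative function name:

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior + coin flips -> Beta posterior."""
    return alpha + heads, beta + tails

# Uniform prior Beta(1, 1), then 80 heads in 200 flips.
post_alpha, post_beta = update_beta(1, 1, heads=80, tails=120)
print(post_alpha, post_beta)  # 81 121

# Updating in two batches gives the same posterior as one batch.
step1 = update_beta(1, 1, heads=30, tails=70)
step2 = update_beta(*step1, heads=50, tails=50)
print(step2 == (post_alpha, post_beta))
```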

Posterior

Assume

Binomial p = 0.4

Uniform Beta prior: α=1 and β=1

200 random variates from binomial distribution (Heads=80)

Update posterior

Posterior


Biased Beta prior: α=50 and β=10

200 random variates from binomial distribution (Heads=80)

Update posterior

Posterior

Posterior mean is a convex combination of the prior mean and the data estimate

The stronger our prior belief, the more data we need to overrule the prior

The less prior belief we have, the quicker the data overrules the prior
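The convex-combination view can be checked numerically. For a Beta prior with binomial data, the posterior mean equals w * (prior mean) + (1 - w) * (k / n), with weight w = (α + β) / (α + β + n) on the prior. The sketch below compares the uniform prior with the strongly biased Beta(50, 10) prior for 80 heads in 200 flips; function names are illustrative.

```python
def posterior_mean(alpha, beta, k, n):
    """Mean of the posterior Beta(alpha + k, beta + n - k)."""
    return (alpha + k) / (alpha + beta + n)

def as_convex_combination(alpha, beta, k, n):
    """Return (weight on prior, prior mean, MLE) such that
    posterior mean = w * prior_mean + (1 - w) * mle."""
    w = (alpha + beta) / (alpha + beta + n)
    return w, alpha / (alpha + beta), k / n

k, n = 80, 200
for alpha, beta in [(1, 1), (50, 10)]:
    w, prior_mean, mle = as_convex_combination(alpha, beta, k, n)
    combined = w * prior_mean + (1 - w) * mle
    print(f"Beta({alpha},{beta}): weight on prior = {w:.3f}, "
          f"posterior mean = {combined:.3f}")
```

The stronger prior carries far more weight, so the same 200 flips move its posterior mean much less toward the observed frequency of 0.4.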

Posterior


Biased Beta prior: α=4 and β=6

Random variates from binomial distribution

Update posterior

So is the coin fair?

Examine posterior

95% highest posterior density interval (HDI)

ROPE [1]: Region of practical equivalence for null hypothesis

Fair coin: [0.45,0.55]

95% HDI: (0.33, 0.47)

Cannot reject null: the HDI overlaps the ROPE

With more samples we could

[1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
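The HDI/ROPE check can be approximated by sampling from the posterior. This sketch assumes the posterior is Beta(81, 121), that is, a uniform prior updated with 80 heads in 200 flips; which posterior the slide used is not stated, so this is an assumption. The HDI is estimated as the narrowest interval containing 95% of the draws.

```python
import random

def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval containing roughly `mass` of the samples."""
    s = sorted(samples)
    n_in = int(mass * len(s))                      # index span of the interval
    widths = [s[i + n_in] - s[i] for i in range(len(s) - n_in)]
    start = widths.index(min(widths))
    return s[start], s[start + n_in]

random.seed(0)
# Assumed posterior: Beta(81, 121).
draws = [random.betavariate(81, 121) for _ in range(50_000)]
low, high = hdi_from_samples(draws)

rope = (0.45, 0.55)                                # fair-coin ROPE
overlaps_rope = low <= rope[1] and high >= rope[0]
print(f"95% HDI: ({low:.2f}, {high:.2f}), overlaps ROPE: {overlaps_rope}")
```

As long as the interval overlaps the ROPE, the null hypothesis of a fair coin is not rejected under this decision rule.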

Bayesian Model Comparison

Parameters marginalized out

Average of likelihood weighted by prior

Evidence
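The evidence is the likelihood averaged over the prior. For the coin model this integral has a closed form, which the sketch below compares against a simple midpoint-rule integration. The choice of a Beta(4, 6) prior with 80 heads in 200 flips is illustrative, as are the function names.

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def beta_pdf(p, alpha, beta):
    return exp((alpha - 1) * log(p) + (beta - 1) * log(1 - p)
               - log_beta_fn(alpha, beta))

def evidence_numeric(k, n, alpha, beta, steps=2_000):
    """Midpoint-rule approximation of the integral of likelihood * prior over p."""
    h = 1.0 / steps
    mids = ((i + 0.5) * h for i in range(steps))
    return h * sum(binomial_pmf(k, n, p) * beta_pdf(p, alpha, beta) for p in mids)

def evidence_closed_form(k, n, alpha, beta):
    """C(n, k) * B(alpha + k, beta + n - k) / B(alpha, beta)."""
    return comb(n, k) * exp(
        log_beta_fn(alpha + k, beta + n - k) - log_beta_fn(alpha, beta)
    )

numeric = evidence_numeric(80, 200, 4, 6)
exact = evidence_closed_form(80, 200, 4, 6)
print(numeric, exact)
```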

Bayesian Model Comparison

Bayes factors [1]

Ratio of marginal likelihoods

Interpretation table by Kass & Raftery [1]

>100 decisive evidence against M2

[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.


Null hypothesis

Alternative hypothesis

Anything is possible

Beta(1,1)

Bayes factor


n = 200

k = 80

Bayes factor

(Decent) preference for the alternative hypothesis
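A sketch of the Bayes factor for n = 200 and k = 80. It assumes the null hypothesis is the point hypothesis p = 0.5 (the slide's formula is not preserved, so this is an assumption based on the fair-coin question) and takes Beta(1, 1) as the alternative.

```python
from math import comb, exp, lgamma

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

n, k = 200, 80

# Null: point hypothesis p = 0.5 (assumed).
evidence_null = comb(n, k) * 0.5**n

# Alternative: p ~ Beta(1, 1), "anything is possible".
evidence_alt = comb(n, k) * exp(
    log_beta_fn(1 + k, 1 + n - k) - log_beta_fn(1, 1)
)

bayes_factor = evidence_alt / evidence_null
print(f"Bayes factor (alternative vs. null): {bayes_factor:.2f}")
```

The binomial coefficient appears in both marginal likelihoods and cancels in the ratio; it is kept here only so that each term is a proper probability of the data.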

Other priors

Priors can encode hypotheses (theories)

Biased hypothesis: Beta(101,11)

Haldane prior, approximated by Beta(0.001, 0.001)

u-shaped

high probability on p=1 or (1-p)=1
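Since each prior encodes a hypothesis, hypotheses can be compared through their marginal likelihoods. The sketch below computes the log evidence of 80 heads in 200 flips under the priors named above; the data values are taken from the earlier Bayes factor example, and the function names are illustrative.

```python
from math import comb, lgamma, log

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(k, n, alpha, beta):
    """Log marginal likelihood of k heads in n flips under a Beta(alpha, beta) prior."""
    return (log(comb(n, k))
            + log_beta_fn(alpha + k, beta + n - k)
            - log_beta_fn(alpha, beta))

n, k = 200, 80
priors = {
    "uniform": (1, 1),
    "biased": (101, 11),
    "haldane": (0.001, 0.001),
}
log_ev = {name: log_evidence(k, n, a, b) for name, (a, b) in priors.items()}
for name, value in log_ev.items():
    print(f"{name}: log evidence = {value:.2f}")
```

Differences between these log evidences are log Bayes factors, so the hypothesis with the highest value is the one the data favor.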

Frequentist approach


Binomial test with null p=0.5

one-tailed

p-value: 0.0028

Chi-squared test
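For comparison, the one-tailed binomial test can be computed by summing the binomial probabilities directly. The sketch assumes the data are the 80 heads in 200 flips used earlier and that the tail of interest is "80 or fewer heads"; both are assumptions, although the result should land close to the 0.0028 reported above.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# One-tailed test of the null p = 0.5 against p < 0.5.
p_value = binom_cdf(80, 200, 0.5)
print(f"one-tailed p-value: {p_value:.4f}")
```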

Posterior prediction

Posterior mean

If data is large: converges to MLE

MAP: Maximum a posteriori

Bayesian estimator

uses mode
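The three point estimates can be compared side by side. This sketch uses the biased Beta(50, 10) prior and data sets of growing size with a fixed heads ratio of 0.4; the data sizes are illustrative. It shows the posterior mean and MAP approaching the MLE as data accumulate.

```python
def mle(k, n):
    """Maximum likelihood estimate of p."""
    return k / n

def posterior_mean(alpha, beta, k, n):
    """Mean of the posterior Beta(alpha + k, beta + n - k)."""
    return (alpha + k) / (alpha + beta + n)

def map_estimate(alpha, beta, k, n):
    """Mode of the posterior; valid when both posterior parameters exceed 1."""
    return (alpha + k - 1) / (alpha + beta + n - 2)

prior_alpha, prior_beta = 50, 10
gaps = []
for k, n in [(8, 20), (80, 200), (8000, 20000)]:
    mean = posterior_mean(prior_alpha, prior_beta, k, n)
    mode = map_estimate(prior_alpha, prior_beta, k, n)
    gaps.append(abs(mean - mle(k, n)))
    print(f"n = {n}: MLE = {mle(k, n):.4f}, mean = {mean:.4f}, MAP = {mode:.4f}")
```

With a uniform Beta(1, 1) prior the MAP estimate coincides with the MLE, since the prior adds nothing to the mode.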

Bayesian prediction

Posterior predictive distribution

Distribution of unobserved observations conditioned on observed data (train, test)

Frequentist: MLE
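For the coin model, the posterior predictive distribution of future flips is a beta-binomial distribution. A minimal sketch, assuming the posterior Beta(81, 121) from a uniform prior and 80 heads in 200 flips, and predicting m = 10 future flips (an illustrative choice):

```python
from math import comb, exp, lgamma

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def posterior_predictive(j, m, alpha_post, beta_post):
    """P(j heads in m future flips | data) under a Beta(alpha_post, beta_post)
    posterior (beta-binomial distribution)."""
    return comb(m, j) * exp(
        log_beta_fn(alpha_post + j, beta_post + m - j)
        - log_beta_fn(alpha_post, beta_post)
    )

alpha_post, beta_post = 81, 121   # assumed posterior
m = 10
predictive = [posterior_predictive(j, m, alpha_post, beta_post) for j in range(m + 1)]

# For a single future flip, the predictive probability of heads is the posterior mean.
p_next_heads = posterior_predictive(1, 1, alpha_post, beta_post)
print(round(p_next_heads, 4), round(sum(predictive), 6))
```

Unlike plugging the MLE into a binomial, the predictive distribution averages over all values of p weighted by the posterior, so it also reflects the remaining uncertainty about p.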

Alternative Bayesian Inference

Often the marginal likelihood is not easy to evaluate

No analytical solution

Numerical integration expensive

Alternatives

Monte Carlo integration

Markov Chain Monte Carlo (MCMC)

Gibbs sampling

Metropolis-Hastings algorithm

Laplace approximation

Variational Bayes
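As one example of these alternatives, the sketch below runs a random-walk Metropolis sampler (the symmetric-proposal special case of Metropolis-Hastings) on the coin posterior. The coin model does not need MCMC, since its posterior is known exactly, which makes it a convenient check. The step size, chain length, and burn-in are arbitrary illustrative choices.

```python
import random
from math import log

def log_posterior(p, k=80, n=200, alpha=1, beta=1):
    """Unnormalized log posterior: binomial likelihood times Beta(alpha, beta) prior."""
    if not 0 < p < 1:
        return float("-inf")
    return (k + alpha - 1) * log(p) + (n - k + beta - 1) * log(1 - p)

def metropolis(n_samples, step=0.05, start=0.5, seed=0):
    rng = random.Random(seed)
    current, current_lp = start, log_posterior(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0, step)   # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); u lies in (0, 1].
        u = 1.0 - rng.random()
        if log(u) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

samples = metropolis(20_000)[2_000:]              # drop burn-in
mcmc_mean = sum(samples) / len(samples)
print(f"MCMC posterior mean: {mcmc_mean:.3f} (exact: {81 / 202:.3f})")
```

Only ratios of posterior densities are needed, so the sampler never has to evaluate the marginal likelihood.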

Bayesian (Machine) Learning

Bayesian Models

Example: Markov Chain Model

Dirichlet prior, Categorical Likelihood

Bayesian networks

Topic models (LDA)

Hierarchical Bayesian models

Generalized Linear Model

Multiple linear regression

Logistic regression

Bayesian ANOVA
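The Markov chain example from the list above follows the same conjugate pattern as the coin: with a Dirichlet prior on each row of the transition matrix and a categorical likelihood, the posterior is again Dirichlet with the observed transition counts added to the pseudo counts. A minimal sketch with a hypothetical toy sequence and a symmetric prior (both are assumptions for illustration):

```python
from collections import Counter

def posterior_transition_probs(sequence, states, pseudo_count=1.0):
    """Posterior mean of each transition row under a symmetric Dirichlet prior.

    Each row has a Dirichlet(pseudo_count, ..., pseudo_count) prior; with a
    categorical likelihood the row posterior is Dirichlet(prior + counts).
    """
    counts = Counter(zip(sequence, sequence[1:]))   # observed transitions
    probs = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states) + pseudo_count * len(states)
        probs[s] = {t: (counts[(s, t)] + pseudo_count) / row_total for t in states}
    return probs

# Hypothetical toy sequence over two states.
sequence = list("AABABBBAAB")
probs = posterior_transition_probs(sequence, states="AB")
for s, row in probs.items():
    print(s, {t: round(p, 3) for t, p in row.items()})
```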

Bayesian Statistical Tests

Alternatives to frequentist approaches

Bayesian correlation

Bayesian t-test

Questions?

Philipp Singer

Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf
