Introduction to Bayesian Statistics
Machine Learning and Data Mining
Philipp Singer
CC image courtesy of user mattbuck007 on Flickr
Conditional Probability
Probability of event A given that B is true: $P(A \mid B) = P(A \cap B) / P(B)$
P(cough|cold) > P(cough)
Fundamental concept in probability theory
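A minimal sketch of the definition $P(A \mid B) = P(A \cap B) / P(B)$ on the cough/cold example. The counts are hypothetical, chosen only so that coughing is more likely among people with a cold:

```python
# Hypothetical counts of people by (has_cold, has_cough); illustrative, not real data.
counts = {
    (True, True): 40,     # cold and cough
    (True, False): 10,    # cold, no cough
    (False, True): 50,    # no cold, cough
    (False, False): 900,  # neither
}
total = sum(counts.values())

# Marginal probability P(cough)
p_cough = sum(c for (cold, cough), c in counts.items() if cough) / total

# P(cough | cold) = P(cough and cold) / P(cold)
p_cold = sum(c for (cold, cough), c in counts.items() if cold) / total
p_cough_and_cold = counts[(True, True)] / total
p_cough_given_cold = p_cough_and_cold / p_cold

print(f"P(cough)        = {p_cough:.3f}")             # 0.090
print(f"P(cough | cold) = {p_cough_given_cold:.3f}")  # 0.800
```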
Before we start with Bayes ...
Another perspective on conditional probability
Conditional probability via growing trimmed trees
https://www.youtube.com/watch?v=Zxm4Xxvzohk
Bayes Theorem
$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$
P(A|B) is the conditional probability of observing A given that B is true
P(B|A) is the conditional probability of observing B given that A is true
P(A) and P(B) are the probabilities of A and B without conditioning on each other
Visualize Bayes Theorem
Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
[Figure: all possible outcomes; some event]
Visualize Bayes Theorem
[Figure: all people in study; people having cancer]
Visualize Bayes Theorem
[Figure: all people in study; people where the screening test is positive]
Visualize Bayes Theorem
[Figure: people having a positive screening test and cancer]
Visualize Bayes Theorem
Given the test is positive, what is the probability that said person has cancer?
Visualize Bayes Theorem
Given that someone has cancer, what is the probability that said person had a positive test?
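Both questions can be answered with Bayes' theorem once three numbers are fixed. The prevalence, sensitivity, and false positive rate below are hypothetical values for illustration, not figures from the study in the diagrams; `posterior` is a helper name introduced here:

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem for a binary hypothesis H and evidence E:
    P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|not H) P(not H)]."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)  # P(E)
    return likelihood * prior / evidence

# Hypothetical screening numbers (illustrative only)
p_cancer = 0.01               # P(cancer): prevalence
p_pos_given_cancer = 0.90     # P(positive | cancer): sensitivity
p_pos_given_healthy = 0.09    # P(positive | no cancer): false positive rate

p_cancer_given_pos = posterior(p_cancer, p_pos_given_cancer, p_pos_given_healthy)
print(f"P(cancer | positive) = {p_cancer_given_pos:.3f}")  # 0.092
print(f"P(positive | cancer) = {p_pos_given_cancer:.3f}")  # 0.900
```

With a rare disease, most positive tests come from the large healthy group, so $P(\text{cancer} \mid \text{positive})$ stays small even though $P(\text{positive} \mid \text{cancer})$ is high.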
Example: Fake coin
Two coins: one fair, one unfair
What is the probability of having the fair coin after flipping Heads?
CC image courtesy of user pagedooley on Flickr
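A sketch of the single-flip computation. The slide does not say how the unfair coin behaves, so this assumes, hypothetically, a two-headed coin and an equal prior chance of picking either coin; exact fractions avoid rounding:

```python
from fractions import Fraction

# Assumptions (not stated on the slide): each coin is picked with probability 1/2,
# and the unfair coin is two-headed, i.e. it always lands Heads.
p_fair = Fraction(1, 2)                # prior P(fair)
p_heads_given_fair = Fraction(1, 2)    # likelihood P(Heads | fair)
p_heads_given_unfair = Fraction(1, 1)  # likelihood P(Heads | unfair)

# Evidence via the law of total probability: P(Heads)
p_heads = p_heads_given_fair * p_fair + p_heads_given_unfair * (1 - p_fair)

# Bayes' theorem: P(fair | Heads) = P(Heads | fair) P(fair) / P(Heads)
p_fair_given_heads = p_heads_given_fair * p_fair / p_heads
print(p_fair_given_heads)  # 1/3
```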
Update of beliefs
Allows new evidence to update beliefs
The prior can itself be the posterior of a previous update
Example: Fake coin
Belief update
What is the probability of having the fair coin if we flip Heads again, after having already seen one Heads?
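A sketch of repeated updating under hypothetical assumptions (equal prior for both coins, unfair coin always lands Heads): each posterior is fed back in as the next prior, and conditioning on both flips at once gives the same result. `update` is a helper name introduced here:

```python
from fractions import Fraction

def update(prior_fair, p_heads_given_fair=Fraction(1, 2), p_heads_given_unfair=Fraction(1)):
    """One Bayesian update of P(fair) after observing Heads.
    Hypothetical assumption: the unfair coin always lands Heads."""
    evidence = p_heads_given_fair * prior_fair + p_heads_given_unfair * (1 - prior_fair)
    return p_heads_given_fair * prior_fair / evidence

belief = Fraction(1, 2)          # initial prior P(fair)
history = [belief]
for _ in range(3):               # observe Heads three times in a row
    belief = update(belief)      # the previous posterior becomes the new prior
    history.append(belief)
print(history)  # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 5), Fraction(1, 9)]

# Batch check: conditioning on two Heads at once, P(HH | fair) = 1/4, P(HH | unfair) = 1
batch = (Fraction(1, 2) * Fraction(1, 4)) / (Fraction(1, 2) * Fraction(1, 4) + Fraction(1, 2) * 1)
assert batch == history[2]
```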
Bayesian Inference
Source: https://xkcd.com/1132/
Bayesian Inference
Statistical inference of parameters
[Diagram: parameters; data; additional knowledge]
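Reading $\theta$ as the parameters, $D$ as the data, and the prior $p(\theta)$ as the additional knowledge (an interpretation of the diagram labels, not something the slide states), Bayes' theorem for parameter inference can be written as:

```latex
\underbrace{p(\theta \mid D)}_{\text{posterior}}
  = \frac{\overbrace{p(D \mid \theta)}^{\text{likelihood}} \; \overbrace{p(\theta)}^{\text{prior}}}
         {\underbrace{p(D)}_{\text{evidence}}},
\qquad
p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta
```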
Coin flip example
Flip a coin several times
Is it fair?
Let's use Bayesian inference
Binomial model
Probability p of flipping heads
Flipping tails: 1-p
Likelihood of k heads in n flips: $P(k \mid n, p) = \binom{n}{k} p^k (1-p)^{n-k}$
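The likelihood as a function of $p$, sketched with the standard library for the counts used later in these slides (200 flips, 80 Heads). On a grid of $p$ values the likelihood peaks at the maximum likelihood estimate $k/n$:

```python
import math

def binomial_pmf(k, n, p):
    """P(k heads in n flips | p) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, k = 200, 80

# Evaluate the likelihood of p on a grid; it peaks at the MLE k / n.
grid = [i / 1000 for i in range(1, 1000)]
likelihoods = [binomial_pmf(k, n, p) for p in grid]
p_best = grid[likelihoods.index(max(likelihoods))]
print(p_best)  # 0.4
```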
Prior
Prior belief about parameter(s)
Conjugate prior: posterior is in the same distribution family as the prior
Beta distribution is conjugate to the binomial
Beta prior: $p \sim \mathrm{Beta}(\alpha, \beta)$
Beta distribution
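Continuous probability distribution
Defined on the interval [0, 1]
Density: $f(p) = p^{\alpha-1} (1-p)^{\beta-1} / B(\alpha, \beta)$
Two shape parameters: $\alpha$ and $\beta$; if $\alpha, \beta \geq 1$, interpret them as pseudo counts
$\alpha$ refers to flipping heads, $\beta$ to flipping tails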
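A minimal sketch of the Beta density $f(p) = p^{\alpha-1}(1-p)^{\beta-1} / B(\alpha, \beta)$, evaluated in log space with `math.lgamma` to avoid overflow; the parameter settings are arbitrary examples:

```python
import math

def beta_pdf(p, alpha, beta):
    """Density of Beta(alpha, beta) at p in (0, 1), computed in log space."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p))

# A few shapes: uniform, symmetric and peaked, skewed towards heads, U-shaped
for a, b in [(1, 1), (10, 10), (8, 2), (0.5, 0.5)]:
    mean = a / (a + b)
    values = ", ".join(f"{beta_pdf(x, a, b):.2f}" for x in (0.1, 0.5, 0.9))
    print(f"Beta({a}, {b}): mean={mean:.2f}, density at 0.1/0.5/0.9 = {values}")
```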
Posterior
Posterior is also a Beta distribution: $\mathrm{Beta}(\alpha + k, \beta + n - k)$ after k heads in n flips
For the exact derivation: http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
Posterior
Assume: binomial with p = 0.4
Uniform Beta prior: $\alpha = 1$ and $\beta = 1$
200 random variates from the binomial distribution (Heads = 80)
Update posterior
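The conjugate update amounts to adding counts to the Beta parameters. A sketch with the slide's numbers (uniform prior, 80 Heads in 200 flips); `beta_binomial_update` is a name introduced here:

```python
def beta_binomial_update(alpha, beta, heads, n):
    """Conjugate update: Beta(alpha, beta) prior and k heads in n flips
    give a Beta(alpha + k, beta + n - k) posterior."""
    return alpha + heads, beta + (n - heads)

alpha_post, beta_post = beta_binomial_update(1, 1, heads=80, n=200)
posterior_mean = alpha_post / (alpha_post + beta_post)
print(alpha_post, beta_post)      # 81 121
print(round(posterior_mean, 3))   # 0.401
```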
Posterior
Assume: binomial with p = 0.4
Biased Beta prior: $\alpha = 50$ and $\beta = 10$
200 random variates from the binomial distribution (Heads = 80)
Update posterior
Posterior
Posterior is a convex combination of prior and data
The stronger our prior belief, the more data we need to overrule the prior
The weaker our prior belief, the quicker the data overrule the prior
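The convex-combination claim can be checked directly: the posterior mean $(\alpha + k)/(\alpha + \beta + n)$ equals $w \cdot \frac{\alpha}{\alpha+\beta} + (1 - w) \cdot \frac{k}{n}$ with $w = \frac{\alpha + \beta}{\alpha + \beta + n}$. A sketch for the uniform Beta(1, 1) and biased Beta(50, 10) priors with 80 Heads in 200 flips:

```python
def posterior_mean(alpha, beta, heads, n):
    """Mean of the Beta(alpha + k, beta + n - k) posterior."""
    return (alpha + heads) / (alpha + beta + n)

def as_convex_combination(alpha, beta, heads, n):
    """Same mean written as w * prior_mean + (1 - w) * MLE,
    with weight w = (alpha + beta) / (alpha + beta + n) on the prior."""
    w = (alpha + beta) / (alpha + beta + n)
    prior_mean = alpha / (alpha + beta)
    mle = heads / n
    return w, w * prior_mean + (1 - w) * mle

for alpha, beta in [(1, 1), (50, 10)]:
    w, mean = as_convex_combination(alpha, beta, 80, 200)
    assert abs(mean - posterior_mean(alpha, beta, 80, 200)) < 1e-12
    print(f"Beta({alpha},{beta}) prior: weight on prior = {w:.3f}, posterior mean = {mean:.3f}")
```

Since $\alpha + \beta$ acts like a number of pseudo-flips, a stronger prior gets a larger weight $w$ and needs more data before $k/n$ dominates.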
Posterior
Assume: binomial with p = 0.4
Biased Beta prior: $\alpha = 4$ and $\beta = 6$
Random variates from the binomial distribution
Update posterior
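A sketch of sequential updating, where each flip's posterior becomes the prior for the next flip. The slide does not give the number of variates, so 200 flips and the random seed are assumptions:

```python
import random

random.seed(0)                  # assumption: fixed seed for reproducibility
true_p = 0.4                    # data-generating P(Heads), as on the slide
alpha, beta = 4, 6              # Beta(4, 6) prior from the slide
flips = [1 if random.random() < true_p else 0 for _ in range(200)]  # assumption: 200 flips

# Sequential updating: the previous posterior is the next prior.
a, b = alpha, beta
for flip in flips:
    a, b = a + flip, b + (1 - flip)

# Batch updating with all flips at once yields the identical posterior.
heads = sum(flips)
assert (a, b) == (alpha + heads, beta + len(flips) - heads)
print(f"posterior Beta({a}, {b}), mean = {a / (a + b):.3f}")
```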
So is the coin fair?
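Examine the posterior: 95% highest posterior density interval (HDI)
ROPE [1]: region of practical equivalence for the null hypothesis
Fair coin: ROPE = [0.45, 0.55]
95% HDI: (0.33, 0.47)
HDI overlaps the ROPE: cannot reject the null
With more samples (a narrower HDI), we might be able to reject it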
[1] Kruschke, John. Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press, 2014.
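A sketch of the HDI-versus-ROPE decision rule from the Kruschke reference, applied to the uniform-prior posterior Beta(81, 121). The HDI is approximated on a grid rather than with a quantile function, which assumes a unimodal posterior; `grid_hdi` is a name introduced here:

```python
import math

def grid_hdi(a, b, mass=0.95, n_grid=100_000):
    """Approximate highest density interval of Beta(a, b): keep the
    highest-density grid cells until they hold `mass` of the probability.
    Assumes a unimodal density, so the kept cells form one interval."""
    xs = [(i + 0.5) / n_grid for i in range(n_grid)]
    logs = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    top = max(logs)
    dens = [math.exp(v - top) for v in logs]   # unnormalized, max = 1
    total = sum(dens)
    kept, acc = [], 0.0
    for i in sorted(range(n_grid), key=dens.__getitem__, reverse=True):
        kept.append(xs[i])
        acc += dens[i] / total
        if acc >= mass:
            break
    return min(kept), max(kept)

lo, hi = grid_hdi(81, 121)     # posterior: Beta(1, 1) prior, 80 Heads in 200 flips
rope = (0.45, 0.55)            # ROPE for a fair coin, as on the slide
print(f"95% HDI: ({lo:.2f}, {hi:.2f})")

if hi < rope[0] or lo > rope[1]:
    decision = "reject null: HDI entirely outside the ROPE"
elif rope[0] <= lo and hi <= rope[1]:
    decision = "accept null: HDI entirely inside the ROPE"
else:
    decision = "undecided: HDI overlaps the ROPE"
print(decision)
```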
Bayesian Model Comparison
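Parameters are marginalized out
Average of the likelihood, weighted by the prior
Evidence (marginal likelihood): $P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta$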
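The "average of the likelihood weighted by the prior" can be estimated literally by Monte Carlo: draw $p$ from the prior and average the likelihood. A sketch for the coin data with a uniform prior, where the exact evidence is known to be $1/(n+1)$; the sample size and seed are arbitrary choices:

```python
import math
import random

def mc_evidence(heads, n, n_samples=100_000, seed=1):
    """Monte Carlo estimate of the evidence P(D) = E_prior[P(D | p)] under a
    uniform Beta(1, 1) prior: draw p from the prior, average the likelihood."""
    rng = random.Random(seed)
    log_binom = math.log(math.comb(n, heads))
    total = 0.0
    for _ in range(n_samples):
        p = rng.random()          # p ~ Uniform(0, 1) = Beta(1, 1)
        if p == 0.0:              # likelihood is 0 there when heads > 0; avoid log(0)
            continue
        total += math.exp(log_binom + heads * math.log(p) + (n - heads) * math.log(1 - p))
    return total / n_samples

estimate = mc_evidence(80, 200)
exact = 1 / 201                   # closed form under a uniform prior: 1 / (n + 1)
print(f"Monte Carlo: {estimate:.5f}, exact: {exact:.5f}")
```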
Bayesian Model Comparison
Bayes factors [1]
Ratio of marginal likelihoods: $BF_{12} = P(D \mid M_1) / P(D \mid M_2)$
Interpretation table by Kass & Raftery [1]
$BF_{12} > 100$: decisive evidence against $M_2$
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors." Journal of the American Statistical Association 90.430 (1995): 773-795.
So is the coin fair?
Null hypothesis: fair coin, p = 0.5
Alternative hypothesis: anything is possible, Beta(1,1) prior
Bayes factor
So is the coin fair?
n = 200
k = 80
Bayes factor
(Decent) preference for the alternative hypothesis
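A sketch of this Bayes factor, assuming the null hypothesis is the point value $p = 0.5$ and the alternative places a Beta(1, 1) prior on $p$. Both marginal likelihoods have closed forms, and the verbal labels follow the Kass & Raftery thresholds (3.2, 10, 100):

```python
import math

def log_marginal_beta_binomial(heads, n, a, b):
    """log P(D | M) when p ~ Beta(a, b): C(n, k) B(a + k, b + n - k) / B(a, b)."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return (math.log(math.comb(n, heads))
            + log_beta(a + heads, b + n - heads) - log_beta(a, b))

def log_marginal_point(heads, n, p0):
    """log P(D | M) when p is fixed at p0 (a point hypothesis)."""
    return (math.log(math.comb(n, heads))
            + heads * math.log(p0) + (n - heads) * math.log(1 - p0))

def kass_raftery(bf):
    """Verbal category for a Bayes factor >= 1 (Kass & Raftery, 1995)."""
    if bf < 3.2:
        return "not worth more than a bare mention"
    if bf < 10:
        return "substantial"
    if bf < 100:
        return "strong"
    return "decisive"

n, k = 200, 80
# Assumption: null = point hypothesis p = 0.5; alternative = Beta(1, 1) prior on p.
log_m_null = log_marginal_point(k, n, 0.5)
log_m_alt = log_marginal_beta_binomial(k, n, 1, 1)
bf_alt_vs_null = math.exp(log_m_alt - log_m_null)
print(f"BF(alt vs null) = {bf_alt_vs_null:.2f}: {kass_raftery(bf_alt_vs_null)}")
```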
Other priors
Priors can encode hypotheses (theories)
Biased hypothesis: Beta(101,11)
Haldane prior: Beta(0.001, 0.001), U-shaped
High probability on p = 1 or 1 - p = 1 (i.e., p = 0)
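The closed-form evidence of the Beta-binomial model can also compare the priors listed above as competing hypotheses. A sketch for 80 Heads in 200 flips, with Bayes factors reported relative to the uniform prior:

```python
import math

def log_marginal_beta_binomial(heads, n, a, b):
    """log P(D | Beta(a, b) prior) = log[C(n, k) B(a + k, b + n - k) / B(a, b)]."""
    def log_beta(x, y):
        return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return (math.log(math.comb(n, heads))
            + log_beta(a + heads, b + n - heads) - log_beta(a, b))

n, k = 200, 80
priors = {
    "uniform Beta(1, 1)": (1, 1),
    "biased Beta(101, 11)": (101, 11),
    "Haldane Beta(0.001, 0.001)": (0.001, 0.001),
}
log_m = {name: log_marginal_beta_binomial(k, n, a, b) for name, (a, b) in priors.items()}

# Bayes factor of each prior (hypothesis) against the uniform one
for name, value in log_m.items():
    bf = math.exp(value - log_m["uniform Beta(1, 1)"])
    print(f"{name}: log evidence = {value:.2f}, BF vs uniform = {bf:.3g}")
```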
Frequentist approach
So is the coin fair?
Binomial test with null p = 0.5, one-tailed
p-value: 0.0028
Chi-squared test
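A sketch of both frequentist tests with the standard library: the exact one-tailed binomial tail sum, and a chi-squared goodness-of-fit test against 100 expected Heads and 100 expected Tails. Treating the slide's chi-squared test as this goodness-of-fit form is an assumption:

```python
import math

n, k = 200, 80

# Exact one-tailed binomial test, H0: p = 0.5, alternative: p < 0.5.
# p-value = P(X <= k | n, 0.5) = sum_{i=0}^{k} C(n, i) / 2^n, using exact integers.
p_binom = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
print(f"binomial test p-value: {p_binom:.4f}")

# Chi-squared goodness-of-fit test against n/2 expected heads and n/2 expected tails.
expected = n / 2
chi2 = (k - expected) ** 2 / expected + ((n - k) - expected) ** 2 / expected
# With 1 degree of freedom, P(Chi2 > x) = erfc(sqrt(x / 2)).
p_chi2 = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-squared = {chi2:.1f}, p-value: {p_chi2:.4f}")
```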
Posterior prediction
Posterior mean
For large data, converges to the MLE
MAP (maximum a posteriori): Bayesian estimator
Uses the posterior mode
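A sketch comparing posterior mean, MAP, and MLE under a Beta(50, 10) prior as the number of flips grows with a fixed Heads ratio of 0.4 (synthetic data): the prior dominates for small $n$, and both Bayesian estimates approach the MLE for large $n$. With a uniform Beta(1, 1) prior the MAP coincides with the MLE.

```python
def point_estimates(alpha, beta, heads, n):
    """Posterior mean, MAP (posterior mode), and MLE for a Beta(alpha, beta)
    prior and k heads in n flips. The mode formula needs both posterior
    parameters to be > 1."""
    a, b = alpha + heads, beta + n - heads
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)
    mle = heads / n
    return mean, mode, mle

# Synthetic data with a constant Heads ratio of 0.4 and a Beta(50, 10) prior
for n in (10, 100, 1_000, 10_000):
    heads = 4 * n // 10
    mean, mode, mle = point_estimates(50, 10, heads, n)
    print(f"n={n:>6}: mean={mean:.4f}  MAP={mode:.4f}  MLE={mle:.4f}")
```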
Bayesian prediction
Posterior predictive distribution
Distribution of unobserved (test) observations conditioned on the observed (training) data
Frequentist counterpart
Plug-in prediction with the MLE
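For the Beta-binomial model the posterior predictive distribution has a closed form (the beta-binomial distribution). A sketch contrasting it with the plug-in prediction that uses the MLE, for 10 hypothetical future flips:

```python
import math

def log_beta(x, y):
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def posterior_predictive(j, m, a, b):
    """Beta-binomial probability of j heads in m future flips, given a
    Beta(a, b) posterior: C(m, j) B(a + j, b + m - j) / B(a, b)."""
    return math.comb(m, j) * math.exp(log_beta(a + j, b + m - j) - log_beta(a, b))

def plugin_predictive(j, m, p_hat):
    """Plug-in prediction: binomial with the point estimate p_hat (here the MLE)."""
    return math.comb(m, j) * p_hat**j * (1 - p_hat) ** (m - j)

a, b = 81, 121            # posterior after 80 heads in 200 flips, Beta(1, 1) prior
m = 10                    # hypothetical number of future flips
bayes = [posterior_predictive(j, m, a, b) for j in range(m + 1)]
plugin = [plugin_predictive(j, m, 80 / 200) for j in range(m + 1)]
for j in range(m + 1):
    print(f"{j:2d} heads: Bayes {bayes[j]:.4f}  plug-in MLE {plugin[j]:.4f}")
```

The Bayesian predictive spreads slightly more mass into the tails because it also accounts for the remaining uncertainty about $p$.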
Alternative Bayesian Inference
Often the marginal likelihood is not easy to evaluate
No analytical solution
Numerical integration is expensive
Alternatives
Monte Carlo integration
Markov chain Monte Carlo (MCMC)
Gibbs sampling
Metropolis-Hastings algorithm
Laplace approximation
Variational Bayes
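A minimal random-walk Metropolis-Hastings sketch for the coin posterior. It only needs the unnormalized posterior, so the marginal likelihood never has to be computed; the step size, chain length, burn-in, and seed are arbitrary choices. Because the conjugate posterior Beta(81, 121) is known here, the sample mean can be checked against it:

```python
import math
import random

def log_post(p, heads=80, n=200, a=1, b=1):
    """Unnormalized log posterior: binomial likelihood times Beta(a, b) prior.
    Constants such as C(n, k), B(a, b), and the evidence P(D) are dropped."""
    if not 0.0 < p < 1.0:
        return -math.inf          # zero posterior density outside (0, 1)
    return (heads + a - 1) * math.log(p) + (n - heads + b - 1) * math.log(1 - p)

def metropolis_hastings(n_steps=20_000, step=0.05, start=0.5, seed=42):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    p, lp = start, log_post(start)
    samples = []
    for _ in range(n_steps):
        proposal = p + rng.gauss(0.0, step)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, posterior ratio); the symmetric
        # proposal densities cancel in the Hastings ratio.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            p, lp = proposal, lp_prop
        samples.append(p)         # a rejected move repeats the current state
    return samples

samples = metropolis_hastings()[2_000:]   # discard burn-in
mc_mean = sum(samples) / len(samples)
print(f"MCMC mean {mc_mean:.3f} vs exact Beta(81, 121) mean {81 / 202:.3f}")
```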
Bayesian (Machine) Learning
Bayesian Models
Example: Markov chain model (Dirichlet prior, categorical likelihood)
Bayesian networks
Topic models (LDA)
Hierarchical Bayesian models
Generalized Linear Model
Multiple linear regression
Logistic regression
Bayesian ANOVA
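As a sketch of the Markov chain example: with a symmetric Dirichlet prior on each row of the transition matrix and a categorical likelihood, the posterior is again Dirichlet, and its mean is (transition count + $\alpha$) / (row total + $K\alpha$). The state sequence is hypothetical and `posterior_transition_means` is a name introduced here:

```python
from collections import Counter

def posterior_transition_means(sequence, states, alpha=1.0):
    """Posterior mean transition probabilities of a first-order Markov chain.
    Each row P(next | current) gets a symmetric Dirichlet(alpha) prior; with a
    categorical likelihood the posterior is Dirichlet(alpha + counts), whose
    mean is (count + alpha) / (row total + K * alpha)."""
    counts = Counter(zip(sequence, sequence[1:]))   # observed transition counts
    k = len(states)
    means = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        means[s] = {t: (counts[(s, t)] + alpha) / (row_total + k * alpha) for t in states}
    return means

# Hypothetical sequence of visited states (illustrative data only)
seq = ["A", "B", "A", "B", "B", "C", "A", "B", "C", "C"]
means = posterior_transition_means(seq, ["A", "B", "C"])
for state, row in means.items():
    print(state, {t: round(v, 3) for t, v in row.items()})
```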
Bayesian Statistical Tests
Alternatives to frequentist approaches
Bayesian correlation
Bayesian t-test
Questions?
Philipp Singer
Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf