Bayesian Methods with Monte Carlo Markov
Chains II
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University
[email protected]
http://tigpbp.iis.sinica.edu.tw/courses.htm
Part 3 An Example in Genetics
Example 1 in Genetics (1)
Two linked loci with alleles A and a, and B and b:
A, B: dominant; a, b: recessive.
A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
(Diagram: each parental arrangement of the two chromosomes yields its two gamete types with probability 1/2 each.)
Example 1 in Genetics (2)
Probabilities for genotypes in gametes:

              No recombination   Recombination
  Male        1 - r              r
  Female      1 - r'             r'

              AB          ab          aB       Ab
  Male        (1-r)/2     (1-r)/2     r/2      r/2
  Female      (1-r')/2    (1-r')/2    r'/2     r'/2
Example 1 in Genetics (3)
Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79–92.
More:
http://en.wikipedia.org/wiki/Genetics
http://www2.isye.gatech.edu/~brani/isyebayes/bank/handout12.pdf
Example 1 in Genetics (4)
Genotype probabilities for the offspring (male gamete x female gamete):

                        MALE
  FEMALE        AB (1-r)/2           ab (1-r)/2           aB r/2           Ab r/2
  AB (1-r')/2   AABB (1-r)(1-r')/4   aABb (1-r)(1-r')/4   aABB r(1-r')/4   AABb r(1-r')/4
  ab (1-r')/2   AaBb (1-r)(1-r')/4   aabb (1-r)(1-r')/4   aaBb r(1-r')/4   Aabb r(1-r')/4
  aB r'/2       AaBB (1-r)r'/4       aabB (1-r)r'/4       aaBB rr'/4       AabB rr'/4
  Ab r'/2       AABb (1-r)r'/4       aAbb (1-r)r'/4       aABb rr'/4       AAbb rr'/4
Example 1 in Genetics (5)
Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
A*: the dominant phenotype from (Aa, AA, aA).
a*: the recessive phenotype from aa.
B*: the dominant phenotype from (Bb, BB, bB).
b*: the recessive phenotype from bb.
A*B*: 9 gametic combinations.
A*b*: 3 gametic combinations.
a*B*: 3 gametic combinations.
a*b*: 1 gametic combination.
Total: 16 combinations.
Example 1 in Genetics (6)
Let θ = (1 - r)(1 - r'), then
  P(A*B*) = (2 + θ)/4,
  P(A*b*) = P(a*B*) = (1 - θ)/4,
  P(a*b*) = θ/4.
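The four phenotype probabilities P(A*B*) = (2+θ)/4, P(A*b*) = P(a*B*) = (1-θ)/4, P(a*b*) = θ/4 with θ = (1-r)(1-r') can be checked with a short sketch (in Python rather than the slides' R; the recombination fractions r = 0.2 and r' = 0.3 are arbitrary illustrative values):

```python
def phenotype_probs(r, r_prime):
    """Phenotype probabilities for offspring of selfed double
    heterozygotes, with theta = (1 - r) * (1 - r')."""
    theta = (1 - r) * (1 - r_prime)
    return {"A*B*": (2 + theta) / 4,
            "A*b*": (1 - theta) / 4,
            "a*B*": (1 - theta) / 4,
            "a*b*": theta / 4}

probs = phenotype_probs(0.2, 0.3)
# The four probabilities always sum to 1.
assert abs(sum(probs.values()) - 1) < 1e-12
print(probs)
```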
Example 1 in Genetics (7)
Hence, a random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:
  Multinomial(n; (2 + θ)/4, (1 - θ)/4, (1 - θ)/4, θ/4).
We know that θ = (1 - r)(1 - r'), with 0 ≤ r ≤ 1/2 and 0 ≤ r' ≤ 1/2.
So 1/4 ≤ θ ≤ 1.
Bayesian for Example 1 in Genetics (1)
To simplify computation, we let
  θ1 = P(A*B*),  θ2 = P(A*b*),
  θ3 = P(a*B*),  θ4 = P(a*b*).
The random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:
  Multinomial(n; θ1, θ2, θ3, θ4).
Bayesian for Example 1 in Genetics (2)
If we assume a Dirichlet prior distribution D(α1, α2, α3, α4) to estimate the probabilities of A*B*, A*b*, a*B* and a*b*, recall that
  A*B*: 9 gametic combinations.
  A*b*: 3 gametic combinations.
  a*B*: 3 gametic combinations.
  a*b*: 1 gametic combination.
We consider (α1, α2, α3, α4) = (9, 3, 3, 1).
Bayesian for Example 1 in Genetics (3)
Suppose that we observe the data (y1, y2, y3, y4) = (125, 18, 20, 24).
Then the posterior distribution is also Dirichlet, with parameters D(134, 21, 23, 25).
The Bayesian estimates for the probabilities (the posterior means) are
  (θ1, θ2, θ3, θ4) = (0.660, 0.103, 0.113, 0.123).
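The conjugate update (posterior parameter α_i + y_i, posterior mean the normalized parameter) can be verified with a short sketch, here in Python for brevity, using only the prior (9, 3, 3, 1) and data (125, 18, 20, 24) from the slide:

```python
prior = [9, 3, 3, 1]          # Dirichlet prior parameters
data = [125, 18, 20, 24]      # observed counts (y1, y2, y3, y4)

# Conjugacy: the posterior is Dirichlet with parameters alpha_i + y_i.
posterior = [a + y for a, y in zip(prior, data)]   # [134, 21, 23, 25]

# Posterior means are the parameters divided by their sum.
total = sum(posterior)
means = [p / total for p in posterior]
print([round(m, 3) for m in means])   # [0.66, 0.103, 0.113, 0.123]
```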
Bayesian for Example 1 in Genetics (4)
Consider the original model,
  P(A*B*) = (2 + θ)/4,  P(A*b*) = P(a*B*) = (1 - θ)/4,  P(a*b*) = θ/4.
The random sample of n also follows a multinomial distribution:
  (y1, y2, y3, y4) ~ Multinomial(n; (2 + θ)/4, (1 - θ)/4, (1 - θ)/4, θ/4).
We will assume a Beta prior distribution for θ:
  θ ~ Beta(α1, α2).
Bayesian for Example 1 in Genetics (5)
The posterior distribution becomes
  P(θ | y1, y2, y3, y4) = P(y1, y2, y3, y4 | θ) P(θ) / ∫ P(y1, y2, y3, y4 | θ) P(θ) dθ.
The integral in the denominator,
  ∫ P(y1, y2, y3, y4 | θ) P(θ) dθ,
does not have a closed form.
Bayesian for Example 1 in Genetics (6)
How to solve this problem? The Monte Carlo Markov Chain (MCMC) method!
What value is appropriate for (α1, α2)?
Part 4 Monte Carlo Methods
Monte Carlo Methods (1)
Consider the game of solitaire: what's the chance of winning with a properly shuffled deck?
http://en.wikipedia.org/wiki/Monte_Carlo_method
http://nlp.stanford.edu/local/talks/mcmc_2004_07_01.ppt
(Diagram: four simulated hands — Lose, Lose, Win, Lose.)
Chance of winning is 1 in 4!
Monte Carlo Methods (2)
Hard to compute analytically, because winning or losing depends on a complex procedure of reorganizing the cards.
Insight: why not just play a few hands, and see empirically how many we do in fact win?
More generally, we can approximate a probability density function using only samples from that density.
Monte Carlo Methods (3)
Given a very large set X and a distribution f(x) over it, we draw a set of N i.i.d. random samples x^(1), ..., x^(N). We can then approximate the distribution using these samples:
  f_N(x) = (1/N) Σ_{i=1}^N 1(x^(i) = x) → f(x) as N → ∞.
Monte Carlo Methods (4)
We can also use these samples to compute expectations:
  E_N(g) = (1/N) Σ_{i=1}^N g(x^(i)) → E(g) = Σ_x g(x) f(x) as N → ∞.
And even use them to find a maximum:
  x̂ = argmax_{x^(i)} f(x^(i)).
Monte Carlo Example
Let X1, ..., Xn be i.i.d. N(0,1); find E(X^4)?
Solution:
  E(X^4) = ∫ x^4 (1/√(2π)) exp(-x²/2) dx = 4! / (2^(4/2) (4/2)!) = 24/8 = 3.
Use the Monte Carlo method to approximate it:
> x <- rnorm(100000)  # 100000 samples from N(0,1)
> x <- x^4
> mean(x)
[1] 3.034175
Part 5 Monte Carlo Method vs. Numerical Integration
Numerical Integration (1)
Theorem (Riemann Integral): If f is continuous and integrable, then the Riemann sum
  S_n = Σ_{i=1}^n f(c_i)(x_i - x_{i-1}), where c_i ∈ [x_{i-1}, x_i],
satisfies
  S_n → I = ∫_a^b f(x) dx as max_{i=1,...,n} |x_i - x_{i-1}| → 0.
http://en.wikipedia.org/wiki/Numerical_integration
http://en.wikipedia.org/wiki/Riemann_integral
Numerical Integration (2)
Trapezoidal Rule: with n equally spaced intervals,
  x_i = a + i (b - a)/n,  h = x_i - x_{i-1} = (b - a)/n,
  S_T = Σ_{i=1}^n h [f(x_{i-1}) + f(x_i)] / 2
      = h [ (1/2) f(x_0) + f(x_1) + ... + f(x_{n-1}) + (1/2) f(x_n) ].
http://en.wikipedia.org/wiki/Trapezoidal_rule
Numerical Integration (3)
An Example of Trapezoidal Rule by R:
f <- function(x) {((x-4.5)^3+5.6)/1234}
x <- seq(-100, 100, 1)
plot(x, f(x), ylab = "f(x)", type = 'l', lwd = 2)
temp <- matrix(0, nrow = 6, ncol = 4)
temp[, 1] <- seq(-100, 100, 40); temp[, 2] <- rep(f(-100), 6)
temp[, 3] <- temp[, 1]; temp[, 4] <- f(temp[, 1])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
temp <- matrix(0, nrow = 5, ncol = 4)
temp[, 1] <- seq(-100, 80, 40); temp[, 2] <- f(temp[, 1])
temp[, 3] <- seq(-60, 100, 40); temp[, 4] <- f(temp[, 3])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
Numerical Integration (4)
(Figure: plot of f(x) with the trapezoidal approximation drawn in red.)
Numerical Integration (5)
Simpson's Rule:
  ∫_a^b f(x) dx ≈ [(b - a)/6] [f(a) + 4 f((a + b)/2) + f(b)].
One derivation replaces the integrand f(x) by the quadratic polynomial P(x) which takes the same values as f(x) at the endpoints a and b and the midpoint m = (a + b)/2.

Numerical Integration (6)
  P(x) = f(a) (x - m)(x - b) / [(a - m)(a - b)]
       + f(m) (x - a)(x - b) / [(m - a)(m - b)]
       + f(b) (x - a)(x - m) / [(b - a)(b - m)],
  ∫_a^b P(x) dx = [(b - a)/6] [f(a) + 4 f((a + b)/2) + f(b)].
http://en.wikipedia.org/wiki/Simpson_rule
Numerical Integration (7)
Example of Simpson's Rule by R:
f <- function(x) {((x-4.5)^3+100*x^2+5.6)/1234}
x <- seq(-100, 100, 1)
p <- function(x) {
  f(-100)*(x-0)*(x-100)/(-100-0)/(-100-100) +
  f(0)*(x+100)*(x-100)/(0+100)/(0-100) +
  f(100)*(x-0)*(x+100)/(100-0)/(100+100)
}
matplot(x, cbind(f(x), p(x)), ylab = "", type = 'l', lwd = 2, lty = 1)
temp <- matrix(0, nrow = 3, ncol = 4)
temp[, 1] <- seq(-100, 100, 100); temp[, 2] <- rep(p(-60), 3)
temp[, 3] <- temp[, 1]; temp[, 4] <- f(temp[, 1])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
Numerical Integration (8)
(Figure: plot of f(x) and its quadratic interpolant p(x).)
Numerical Integration (9)
Two problems in numerical integration:
1. Infinite limits, a, b = ±∞: how can we still use numerical integration? Logistic transform:
  y = 1 / (1 + e^(-x)).
2. Integration in two or more (high) dimensions?
The Monte Carlo method!
http://en.wikipedia.org/wiki/Numerical_integration
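The logistic transform y = 1/(1 + e^(-x)) maps (-∞, ∞) onto (0, 1), with x = log(y/(1-y)) and dx = dy / (y(1-y)), so an integral over the whole real line becomes an ordinary integral over (0, 1). A minimal numerical sketch (Python rather than the slides' R; the standard normal density, which integrates to 1, is used as a test case):

```python
import math

def integrate_real_line(f, n=100000):
    """Integrate f over (-inf, inf) via the logistic substitution:
    y = 1/(1+exp(-x)), so x = log(y/(1-y)) and dx = dy/(y*(1-y))."""
    h = 1.0 / n
    total = 0.0
    for i in range(1, n):          # interior grid points only; the
        y = i * h                  # transformed integrand vanishes at 0 and 1
        x = math.log(y / (1 - y))
        total += f(x) / (y * (1 - y))
    return total * h

normal_pdf = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
print(integrate_real_line(normal_pdf))   # close to 1
```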
Monte Carlo Integration
  I = ∫ f(x) dx = ∫ [f(x)/g(x)] g(x) dx,
where g(x) is a probability density function; let h(x) = f(x)/g(x).
If x1, ..., xn ~ iid g(x), then by the Law of Large Numbers
  (1/n) Σ_{i=1}^n h(x_i) → E_g[h(x)] = ∫ h(x) g(x) dx = ∫ f(x) dx = I.
http://en.wikipedia.org/wiki/Monte_Carlo_integration
Example (1)
  ∫_0^1 [Γ(18)/(Γ(7)Γ(11))] x^10 (1 - x)^10 dx = ?
(It is E(X^4) for X ~ Beta(7, 11).)
Numerical integration by the Trapezoidal Rule gives 0.035087757.
Example (2)
Monte Carlo Integration:
  ∫_0^1 [Γ(18)/(Γ(7)Γ(11))] x^10 (1 - x)^10 dx
  = ∫_0^1 x^4 [Γ(18)/(Γ(7)Γ(11))] x^(7-1) (1 - x)^(11-1) dx.
Let x ~ Beta(7, 11); then
  ∫_0^1 x^4 [Γ(18)/(Γ(7)Γ(11))] x^6 (1 - x)^10 dx = E(x^4),
and by the Law of Large Numbers
  (1/n) Σ_{i=1}^n x_i^4 → E(x^4) = ∫_0^1 x^4 [Γ(18)/(Γ(7)Γ(11))] x^6 (1 - x)^10 dx.
Example by C/C++ and R
(The C/C++ version appears as a screenshot in the original slides.)
> x <- rbeta(10000, 7, 11)
> x <- x^4
> mean(x)
[1] 0.03547369
Part 6 Markov Chains
Markov Chains (1)
A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by a transition probability.
Suppose the random variables X_0, X_1, ... take values in a state space Ω that is a countable set. A Markov chain is a process that corresponds to the network
  X_0 → X_1 → ... → X_{t-1} → X_t → X_{t+1} → ...
Markov Chains (2)
The current state in a Markov chain depends only on the most recent previous state:
  P(X_{t+1} = j | X_t = i, X_{t-1} = i_{t-1}, ..., X_0 = i_0)
  = P(X_{t+1} = j | X_t = i),
where i_0, ..., i_{t-1}, i, j ∈ Ω.
http://en.wikipedia.org/wiki/Markov_chain
http://civs.stat.ucla.edu/MCMC/MCMC_tutorial/Lect1_MCMC_Intro.pdf
An Example of Markov Chains
  Ω = {1, 2, 3, 4, 5},  X_0, X_1, ..., X_t, ...
where X_0 is the initial state and so on, and P is the transition matrix:

          1    2    3    4    5
  1  [  0.4  0.6  0.0  0.0  0.0  ]
  2  [  0.5  0.0  0.5  0.0  0.0  ]
  3  [  0.0  0.3  0.0  0.7  0.0  ]
  4  [  0.0  0.0  0.1  0.3  0.6  ]
  5  [  0.0  0.3  0.0  0.5  0.2  ]
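A chain like this five-state example can be simulated directly from its transition matrix; a minimal sketch (in Python rather than the slides' R/C++; the seed and number of steps are arbitrary choices):

```python
import random

P = [[0.4, 0.6, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.3, 0.0, 0.7, 0.0],
     [0.0, 0.0, 0.1, 0.3, 0.6],
     [0.0, 0.3, 0.0, 0.5, 0.2]]

# Every row of a transition matrix must sum to 1.
assert all(abs(sum(row) - 1) < 1e-12 for row in P)

def step(state, rng):
    """Draw the next state (0-based index) from row `state` of P."""
    return rng.choices(range(5), weights=P[state])[0]

rng = random.Random(0)
state = 0               # start in state 1 (0-based index 0)
path = [state]
for _ in range(10):
    state = step(state, rng)
    path.append(state)
print([s + 1 for s in path])   # states labeled 1..5 as in the slide
```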
Definition (1)
Define the probability of going from state i to state j in n time steps as
  p_ij^(n) = P(X_{t+n} = j | X_t = i).
A state j is accessible from state i if there is some n such that p_ij^(n) > 0, where n = 0, 1, 2, ....
A state i is said to communicate with state j (denoted i ↔ j) if both i is accessible from j and j is accessible from i.
Definition (2)
A state i has period d(i) if any return to state i must occur in multiples of d(i) time steps. Formally, the period of a state is defined as
  d(i) = gcd{n : p_ii^(n) > 0}.
If d(i) = 1, then the state is said to be aperiodic; otherwise (d(i) > 1), the state is said to be periodic with period d(i).
Definition (3)
A set of states C is a communicating class if every pair of states in C communicates with each other.
Every state in a communicating class must have the same period.
Example: (diagram in the original slides)
Definition (4)
A finite Markov chain is said to be irreducible if its state space Ω is a communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.
Example: (diagram in the original slides)
Definition (5)
A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.
Example: (diagram in the original slides)
Definition (6)
A state i is said to be transient if, given that we start in state i, there is a non-zero probability that we will never return to i.
Formally, let the random variable T_i be the next return time to state i (the "hitting time"):
  T_i = min{n : X_n = i | X_0 = i}.
Then state i is transient iff
  P(T_i < ∞) < 1.
Definition (7)
A state i is said to be recurrent or persistent iff
  P(T_i < ∞) = 1.
The mean recurrence time is M_i = E[T_i]. State i is positive recurrent if M_i is finite; otherwise, state i is null recurrent.
A state i is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.
Stationary Distributions
Theorem: If a Markov chain is irreducible and aperiodic, then
  p_ij^(n) → π_j as n → ∞, for all i, j.
Theorem: If a Markov chain is irreducible and aperiodic, then
  π_j = lim_{n→∞} P(X_n = j) exists, and
  Σ_j π_j = 1,  π_j = Σ_i π_i P_ij,
where π is the stationary distribution.
Definition (8)
A Markov chain is said to be reversible if there is a stationary distribution π such that
  π_i P_ij = π_j P_ji for all i, j.
Theorem: If a Markov chain is reversible, then
  π_j = Σ_i π_i P_ij.
An Example of Stationary Distributions
A Markov chain with states {1, 2, 3} and transition matrix
       [ 0.7  0.3  0.0 ]
  P =  [ 0.3  0.4  0.3 ]
       [ 0.0  0.3  0.7 ]
(self-loop probabilities 0.7, 0.4, 0.7; transitions 1 ↔ 2 and 2 ↔ 3 each with probability 0.3).
The stationary distribution is π = (1/3, 1/3, 1/3), since
  (1/3, 1/3, 1/3) P = (1/3, 1/3, 1/3).
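The stationarity condition π_j = Σ_i π_i P_ij can be checked numerically for this three-state chain (a minimal Python sketch, not from the slides):

```python
P = [[0.7, 0.3, 0.0],
     [0.3, 0.4, 0.3],
     [0.0, 0.3, 0.7]]
pi = [1/3, 1/3, 1/3]

# pi is stationary iff pi P = pi, i.e. pi_j = sum_i pi_i * P[i][j] for each j.
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
assert all(abs(a - b) < 1e-12 for a, b in zip(pi_next, pi))
print(pi_next)
```

Because this P is symmetric, the chain is also reversible: π_i P_ij = π_j P_ji holds term by term.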
Properties of Stationary Distributions
Regardless of the starting point, an irreducible and aperiodic Markov chain will converge to its stationary distribution.
The rate of convergence depends on properties of the transition probability.
Part 7 Monte Carlo Markov Chains
Applications of MCMC
Simulation. Ex: simulate
  f(x, y) = c C(n, x) y^(x+α-1) (1 - y)^(n-x+β-1),
  x = 0, 1, 2, ..., n,  0 ≤ y ≤ 1,
where α and β are known.
Integration: computing integrals in high dimensions. Ex:
  E[g(Y)] = ∫_0^1 g(y) f(y) dy.
Bayesian inference. Ex: posterior distributions, posterior means, ...
Monte Carlo Markov Chains
MCMC methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution.
The state of the chain after a large number of steps is then used as a sample from the desired distribution.
http://en.wikipedia.org/wiki/MCMC
Inversion Method vs. MCMC (1)
Inverse transform sampling, also known as the probability integral transform, is a method of sampling a number at random from any probability distribution, given its cumulative distribution function (cdf).
http://en.wikipedia.org/wiki/Inverse_transform_sampling_method
Inversion Method vs. MCMC (2)
If a random variable X has cdf F, then F(X) has a uniform distribution on [0, 1].
The inverse transform sampling method works as follows:
1. Generate a random number from the standard uniform distribution; call this u.
2. Compute the value x such that F(x) = u; call this x_chosen.
3. Take x_chosen to be the random number drawn from the distribution described by F.
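As a concrete one-dimensional instance (not from the slides): the exponential distribution has cdf F(x) = 1 - e^(-λx), so F⁻¹(u) = -log(1 - u)/λ, and the three steps above reduce to one line per sample. A Python sketch:

```python
import math
import random

def sample_exponential(lam, n, rng):
    """Inverse transform sampling: if U ~ Uniform(0,1), then
    -log(1-U)/lam has cdf F(x) = 1 - exp(-lam*x)."""
    return [-math.log(1 - rng.random()) / lam for _ in range(n)]

rng = random.Random(42)
xs = sample_exponential(2.0, 100000, rng)
print(sum(xs) / len(xs))   # close to the true mean 1/lam = 0.5
```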
Inversion Method vs. MCMC (3)
For a one-dimensional random variable, the inversion method works well, but for two or more dimensions it may not.
In higher dimensions, the marginal distributions needed by the inversion method can be difficult and time-consuming to compute.
Gibbs Sampling
One kind of MCMC method.
The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to integrate over a joint distribution.
George Casella and Edward I. George. "Explaining the Gibbs sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
http://en.wikipedia.org/wiki/Gibbs_sampling
Example 1 (1)
To sample x from
  f(x, y) = c C(n, x) y^(x+α-1) (1 - y)^(n-x+β-1),
  x = 0, 1, 2, ..., n,  0 ≤ y ≤ 1,
where α, β, n are known and c is a constant, one can see that
  f(x | y) = f(x, y)/f(y) ∝ C(n, x) y^x (1 - y)^(n-x),  i.e.  x | y ~ Binomial(n, y),
  f(y | x) = f(x, y)/f(x) ∝ y^(x+α-1) (1 - y)^(n-x+β-1),  i.e.  y | x ~ Beta(x + α, n - x + β).
Example 1 (2)
Gibbs sampling algorithm:
Initial setting:
  y_0 ~ Uniform(0, 1), or an arbitrary value in [0, 1];
  x_0 ~ Bin(n, y_0).
For t = 0, ..., N, sample
  y_{t+1} ~ Beta(x_t + α, n - x_t + β),
  x_{t+1} ~ Bin(n, y_{t+1}).
Return (x_N, y_N).
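The alternating updates above translate directly into code; a minimal sketch (in Python rather than the R version shown later in the slides, with the same n = 16, α = 5, β = 7; the burn-in length of 50 iterations and the seed are arbitrary choices):

```python
import random

def gibbs_beta_binomial(n, alpha, beta, iters, rng):
    """Alternate y | x ~ Beta(x+alpha, n-x+beta) and x | y ~ Binomial(n, y)."""
    y = rng.random()                                # y_0 ~ Uniform(0,1)
    x = sum(rng.random() < y for _ in range(n))     # x_0 ~ Bin(n, y_0)
    for _ in range(iters):
        y = rng.betavariate(x + alpha, n - x + beta)
        x = sum(rng.random() < y for _ in range(n))  # Bin(n, y) as Bernoulli sum
    return x, y

rng = random.Random(1)
draws = [gibbs_beta_binomial(16, 5, 7, 50, rng)[0] for _ in range(2000)]
print(sum(draws) / len(draws))   # close to the marginal mean n*alpha/(alpha+beta) = 20/3
```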
Example 1 (3)
Under regular conditions, x_t → x and y_t → y in distribution as t → ∞.
How many steps are needed for convergence? Stop within an acceptable error, for example when
  | (1/10) Σ_{i=1}^{10} x_{t+i} - (1/10) Σ_{i=11}^{20} x_{t+i} | ≤ 0.001,
or when t is large enough, such as t = 1000.
Example 1 (4)
Inversion Method:
The marginal distribution of x is
  f(x) = ∫_0^1 f(x, y) dy,
which is the Beta-Binomial distribution.
Its cdf is
  F(x) = Σ_{i=0}^x f(i),
and F(X) has a uniform distribution on [0, 1], so x can be sampled by inverting F.
Gibbs sampling by R (1)-(3)
N = 1000; num = 16; alpha = 5; beta = 7
tempy <- runif(1); tempx <- rbeta(1, alpha, beta)
j = 0; Forward = 1; Afterward = 0
# Burn-in: iterate until the running mean of x stabilizes
while((abs(Forward-Afterward) > 0.0001) && (j <= 1000)){
  Forward = Afterward; Afterward = 0
  for(i in 1:N){
    tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
    tempx <- rbinom(1, num, tempy)
    Afterward = Afterward+tempx
  }
  Afterward = Afterward/N; j = j+1
}
# Collect N Gibbs samples of (x, y)
sample <- matrix(0, nrow = N, ncol = 2)
for(i in 1:N){
  tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
  tempx <- rbinom(1, num, tempy)
  sample[i, 1] = tempx; sample[i, 2] = tempy
}
sample_Inverse <- rbetabin(N, num, alpha, beta)
write(t(sample), "Sample for Ex1 by R.txt", ncol = 2)
Xhist <- cbind(hist(sample[, 1], nclass = num)$count,
               hist(sample_Inverse, nclass = num)$count)
write(t(Xhist), "Histogram for Ex1 by R.txt", ncol = 2)
# Column 1: exact Beta-Binomial pmf (inverse method); column 2: Gibbs estimate
prob <- matrix(0, nrow = num+1, ncol = 2)
for(i in 0:num){
  if(i == 0){
    prob[i+1, 2] = mean(pbinom(i, num, sample[, 2]))
    prob[i+1, 1] = gamma(alpha+beta)*gamma(num+beta)
    prob[i+1, 1] = prob[i+1, 1]/(gamma(beta)*gamma(num+beta+alpha))
  }else{
    if(i == 1){
      prob[i+1, 1] = num*alpha/(num-1+alpha+beta)
      for(j in 0:(num-2))
        prob[i+1, 1] = prob[i+1, 1]*(beta+j)/(alpha+beta+j)
    }else
      prob[i+1, 1] = prob[i+1, 1]*(num-i+1)/(i)*(i-1+alpha)/(num-i+beta)
    prob[i+1, 2] = mean((pbinom(i, num, sample[, 2])-pbinom(i-1, num, sample[, 2])))
  }
  if(i != num)
    prob[i+2, 1] = prob[i+1, 1]
}
write(t(prob), "ProbHistogram for Ex1 by R.txt", ncol = 2)
Inversion Method by R (1)-(2)
rbetabin <- function(N, size, alpha, beta){
  Usample <- runif(N)
  Pr_0 = gamma(alpha+beta)*gamma(size+beta)/gamma(beta)/gamma(size+beta+alpha)
  Pr = size*alpha/(size-1+alpha+beta)
  for(i in 0:(size-2))
    Pr = Pr*(beta+i)/(alpha+beta+i)
  Pr_Initial = Pr
  sample <- array(0, N)
  CDF <- array(0, (size+1))
  CDF[1] <- Pr_0
  for(i in 1:size){
    CDF[i+1] = CDF[i]+Pr
    Pr = Pr*(size-i)/(i+1)*(i+alpha)/(size-i-1+beta)
  }
  for(i in 1:N){
    sample[i] = which.min(abs(Usample[i]-CDF))-1
  }
  return(sample)
}
Gibbs sampling by C/C++ (1)-(4)
Inversion Method by C/C++ (1)-(2)
(The C/C++ code appears as screenshots in the original slides.)
Plot Histograms by Maple (1)
Figure 1: 1000 samples with n = 16, α = 5 and β = 7.
Blue: inversion method; red: Gibbs sampling.
Plot Histograms by Maple (2)
(Figure continued.)
Probability Histograms by Maple (1)
Figure 2: The blue histogram and yellow line are the pmf of x. The red histogram is from Gibbs sampling:
  P̂(X = x) = (1/m) Σ_{i=1}^m P(X = x | Y = y_i).
Probability Histograms by Maple (2)
The probability histogram of the blue histogram of Figure 1 would be similar to the blue probability histogram of Figure 2 when the sample size N → ∞.
The probability histogram of the red histogram of Figure 1 would be similar to the red probability histogram of Figure 2 when the number of iterations n → ∞.
Probability Histograms by Maple (3)
(Figure.)
Exercises
Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
Write programs for the examples mentioned at the reference web pages.
Write programs for other examples that you know.