markov chain monte carlo methods

Upload: fer-trujillo-arevalo

Post on 05-Jul-2018


  • 8/16/2019 Markov Chain Monte Carlo Methods

    1/20

    MARKOV CHAIN MONTE CARLO METHODS

    12.1 INTRODUCTION

Bayesian inference is an important area in statistics and has also found applications in various disciplines. One of the main ingredients of Bayesian inference is the incorporation of prior information via the specification of prior distributions. As information flows freely in financial markets, incorporating prior information with Bayesian ideas constitutes a natural approach. In this final chapter, we briefly introduce the essence of Bayesian statistics with reference to risk management. In particular, we discuss the celebrated Markov chain Monte Carlo (MCMC) method in detail and illustrate its uses via a case study.

    12.2 BAYESIAN INFERENCE

The essence of the Bayesian approach is to incorporate uncertainties for the unknown parameters. Predictive inference is conducted via the joint probability distribution of the parameters θ = (θ_1, θ_2, …, θ_r), conditional on the observable data x = (x_1, …, x_n). The joint distribution is deduced from the distribution of observable quantities via Bayes' theorem. Many excellent texts have been written about the Bayesian paradigm; see, for example, DeGroot (1970), Box and Tiao (1973), Berger (1985), O'Hagan (1994), Bernardo and Smith (2000), Lee (2004), and Robert (2001), to name just a few. Tsay (2010) provides a succinct introduction to Bayesian inference for time series.

The observational (or sampling) distribution f(x|θ) is the likelihood function. Under the Bayesian framework, a prior distribution p(θ) is specified for the parameter θ. Inferences are conducted on the basis of the posterior distribution π(θ|x) according to the following identity:

    π(θ|x) = f(x|θ) p(θ) / f(x),          (12.1)

where f(x) is the marginal density such that

    f(x) = ∫ f(x|θ) p(θ) dθ.


The probability density function π(θ|x) is known as the posterior density function. Because x is observed, the marginal density in Equation 12.1 is a constant. It is more convenient to express Equation 12.1 as

    π(θ|x) ∝ L(θ) p(θ),                   (12.2)

where L(θ) = f(x|θ) is the likelihood function. One way to estimate θ is to compute the posterior mean of θ, that is,

    θ̂ = E(θ|x) = ∫ θ π(θ|x) dθ.          (12.3)

The prior and posterior are relative to the observables. A posterior distribution conditional on x can be used as a prior for a new observation y. This process can be iterated and eventually leads to a new posterior via Bayes' theorem. We illustrate this idea with a concrete example.

Example 12.1 Suppose that we observe x_1, …, x_n independent random variables, each ~ N(θ, σ²), with θ unknown and σ² known. Estimate θ in a Bayesian setting. The likelihood function is

    L(θ) ∝ exp{ −n(x̄ − θ)² / (2σ²) },

where x̄ is the sample mean of the observations. It appears natural to assume that θ follows a normal distribution by specifying the prior p(θ) ~ N(m, τ²), where m and τ² are known as hyperparameters. Substituting this prior into Equation 12.2, we have

    π(θ|x) ∝ exp{ −(θ − m_1)² / (2τ_1²) },


where

    m_1 = (σ²m + nτ²x̄) / (σ² + nτ²)  and  τ_1² = σ²τ² / (σ² + nτ²).

Equivalently,

    θ | x ~ N(m_1, τ_1²).

The posterior mean θ̂ = E(θ|x) = m_1 is an estimate of θ given x. Note that m_1 tends to the sample mean x̄ and τ_1² tends to zero as the number of observations increases. In most cases, the prior distribution plays a lesser role when the sample size is large. Another interesting observation is that the prior contains less information as τ² increases. When τ² → ∞, p(θ) ∝ constant, and π(θ|x) = N(x̄, σ²/n). Such a prior is known as a noninformative prior, as it provides no information about the distribution of θ. There are many ways to specify a prior distribution in the Bayesian setting. Some prefer noninformative priors, and others prefer priors that are analytically tractable. Conjugate priors are adopted to address the latter concern. Given a likelihood function, the conjugate prior distribution is a prior distribution such that the posterior

distribution belongs to the same class of distributions as the prior. Conjugate priors and posterior distributions differ only through their hyperparameters; Example 12.1 serves as a good example. Conjugate priors facilitate statistical inference because the posterior distributions belong to the same family as the prior distributions, which are usually in familiar forms. Moreover, updating posterior distributions with new information becomes straightforward, as only the hyperparameters have to be updated. In the one-dimensional case, deriving conjugate priors is relatively simple when the likelihood belongs to the exponential family. Conjugacy within the exponential family is discussed in Lee (2004). Table 12.1 summarizes some of the commonly used conjugate families. Herein, Be denotes the Beta distribution, Γ the Gamma distribution, IΓ the inverse Gamma distribution, and N the Normal distribution.
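As a quick illustration of the conjugate update in Example 12.1, the following Python sketch computes the posterior mean m_1 and variance τ_1² for simulated data; the data values and the prior hyperparameters are hypothetical choices, not taken from the text:

```python
import math
import random

def normal_posterior(xs, sigma2, m, tau2):
    """Conjugate update for a N(theta, sigma2) likelihood (sigma2 known)
    with prior theta ~ N(m, tau2).  Returns the posterior mean m_1 and
    posterior variance tau_1^2 of theta given the data."""
    n = len(xs)
    xbar = sum(xs) / n
    m1 = (sigma2 * m + n * tau2 * xbar) / (sigma2 + n * tau2)
    tau2_1 = (sigma2 * tau2) / (sigma2 + n * tau2)
    return m1, tau2_1

random.seed(0)
# Hypothetical data: theta = 2, sigma2 = 1, with a diffuse prior N(0, 100).
xs = [random.gauss(2.0, 1.0) for _ in range(500)]
m1, tau2_1 = normal_posterior(xs, sigma2=1.0, m=0.0, tau2=100.0)
# With n = 500 the posterior mean is close to the sample mean and the
# posterior variance is close to sigma2 / n, as noted above.
```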

12.3 SIMULATING POSTERIORS

Bayesian inference makes use of simulation techniques to estimate the parameters naturally. As shown in Equation 12.3, calculating a posterior mean


is tantamount to numerically evaluating an integral. It is not surprising, therefore, that Monte Carlo simulation plays an important role. The integration in Equation 12.3 is usually an improper integral (integration over an unbounded region), which renders standard numerical techniques useless. Although numerical quadrature can be used to bypass such a difficulty in the one-dimensional case, applying quadrature in higher dimensions is far from simple. Financial modeling usually involves high dimensions. Monte Carlo simulation with importance sampling simplifies the computation of Equation 12.3. As it may be difficult to generate random variables from the posterior distribution π(θ|x) directly, we may take advantage of the fact that importance sampling enables us to compute integrations with a conveniently chosen density. Consider

    E(θ|x) = ∫ θ π(θ|x) dθ = ∫ θ [π(θ|x) / q(θ)] q(θ) dθ,

where q(θ) is a prespecified density function from which samples can be generated easily. Drawing n random samples θ_i from q(θ), we approximate the posterior mean by

    θ̂ ≈ (1/n) Σ_{i=1}^n θ_i π(θ_i|x) / q(θ_i).

Note that importance sampling is not used as a variance reduction device in this case; rather, it is applied to facilitate the computation of the posterior mean. The variance of the computation can be large in some cases.
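A minimal Python sketch of this importance-sampling estimate follows. Because in practice π(θ|x) is usually known only up to its normalizing constant, the sketch uses the self-normalized form of the estimator; the toy likelihood, the N(0, 4) importance density, and all numbers are illustrative assumptions:

```python
import math
import random

random.seed(1)

# Toy posterior: x_1..x_n ~ N(theta, 1) with a flat prior, so
# pi(theta | x) is proportional to exp(-n * (xbar - theta)**2 / 2).
xs = [random.gauss(1.5, 1.0) for _ in range(50)]
n, xbar = len(xs), sum(xs) / len(xs)

def unnorm_posterior(theta):
    return math.exp(-0.5 * n * (xbar - theta) ** 2)

# Importance density q: N(0, 4), easy to sample and wider than the target.
def q_pdf(theta):
    return math.exp(-theta ** 2 / 8.0) / math.sqrt(8.0 * math.pi)

draws = [random.gauss(0.0, 2.0) for _ in range(20000)]
weights = [unnorm_posterior(t) / q_pdf(t) for t in draws]
# Self-normalised estimator of the posterior mean E(theta | x); the
# unknown normalising constant cancels in the ratio.
post_mean = sum(w * t for w, t in zip(weights, draws)) / sum(weights)
```

For this flat-prior toy model the posterior mean equals the sample mean, which gives an easy sanity check on the estimate.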

12.4 MARKOV CHAIN MONTE CARLO

One desirable feature of combining Markov chain simulation with Bayesian ideas is that the resulting method can handle high-dimensional problems efficiently. Another desirable feature is to draw random samples from the posterior distribution directly. The MCMC methods are developed with these two features in mind.

12.4.1 Gibbs Sampling

Gibbs sampling is probably one of the most commonly used MCMC methods. It is simple, intuitive, easily implemented, and designed to handle


multidimensional problems. The basic limit theorem of Markov chains serves as the theoretical building block to guarantee that draws from a Gibbs sampler agree with the posterior asymptotically. Although conjugate priors are useful in Bayesian inference, it is difficult to construct a joint conjugate prior for several parameters. For a normal distribution with both mean and variance unknown, deriving the corresponding conjugate prior can be challenging. However, conditional conjugate priors can be obtained relatively easily; see, for example, Gilks, Richardson, and Spiegelhalter (1995). Conditioning on the other parameters, a conditional conjugate prior is one-dimensional and has the same distributional structure as the conditional posterior. Gibbs sampling takes advantage of this fact and offers a way to reduce a multidimensional problem to an iteration of low-dimensional problems. Specifically, let x = (x_1, …, x_n) be the data and let the distribution of each x_i be governed by r parameters, θ = (θ_1, θ_2, …, θ_r). For each j = 1, …, r, specify the one-dimensional conditional conjugate prior p(θ_j) and construct the conditional posterior by means of Bayes' theorem. Then iterate the Gibbs procedure as follows. Set an initial parameter vector (θ_2^(0), …, θ_r^(0)). Update the parameters by the following procedure:

    • Sample θ_1^(1) ~ p(θ_1 | θ_2^(0), …, θ_r^(0), x)
    • Sample θ_2^(1) ~ p(θ_2 | θ_1^(1), θ_3^(0), …, θ_r^(0), x)
    • …
    • Sample θ_r^(1) ~

p(θ_r | θ_1^(1), θ_2^(1), …, θ_{r−1}^(1), x). This completes one Gibbs iteration, and the parameters are updated to (θ_1^(1), …, θ_r^(1)). Using these new parameters as starting values, repeat the iteration and obtain a new set of parameters (θ_1^(2), …, θ_r^(2)). Repeating these iterations M times, we get a sequence of parameter vectors θ^(1), …, θ^(M), where θ^(i) = (θ_1^(i), …, θ_r^(i)), for i = 1, …, M. By virtue of the basic limit theorem of Markov chains, it can be shown that the Markov chain {θ^(M)} has a limiting distribution converging to the joint posterior p(θ_1, θ_2, …, θ_r | x) when M is sufficiently large; see Tierney (1994). The number M is called the burn-in period. After simulating {θ^(M+1), θ^(M+2), …, θ^(M+n)} from the Gibbs sampler, Bayesian inference can be conducted easily. For example, to compute the posterior mean, we evaluate

    θ̂ = (1/n) Σ_{i=1}^n θ^(M+i).


To acquire a clearer understanding of Gibbs sampling, consider the following example:

Example 12.2 One of the main uses of Gibbs sampling is to generate multivariate distributions that are usually hard to simulate by standard methods. We present a simple example to generate two correlated bivariate normal random variables θ_1 and θ_2, where

    (θ_1, θ_2) ~ N(0, Σ)  with  Σ = [1 ρ; ρ 1].

To use the Gibbs sampling method, we construct a Markov chain {θ^(M)} that has a limiting distribution converging to the bivariate normal distribution p(θ_1, θ_2). The next step is to find the marginal distribution of θ_1 given the value of θ_2. By the conditional distribution formula, we have

    p(θ_1 | θ_2) ∝ exp{ −(θ_1 − ρθ_2)² / (2(1 − ρ²)) }.

From the above functional form of the distribution function, we can conclude that, given θ_2,

    θ_1 | θ_2 ~ N(ρθ_2, 1 − ρ²).

Similarly, for θ_1, we have

    θ_2 | θ_1 ~ N(ρθ_1, 1 − ρ²).

By taking the initial guess of θ_2^(0) to be the mean 0, the normal random variables are generated by the following steps:

Step 1: Set i = 1 and θ_2^(0) = 0.

Step 2: Generate ε_1 ~ N(0, 1) and set θ_1^(i) = ρθ_2^(i−1) + √(1 − ρ²) ε_1.

Step 3: Generate ε_2 ~ N(0, 1) and set θ_2^(i) = ρθ_1^(i) + √(1 − ρ²) ε_2.
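The steps above can be sketched in Python as follows; the correlation ρ = 0.8, the run length, and the burn-in are arbitrary choices for illustration:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=0):
    """Gibbs sampler for (theta1, theta2) bivariate normal with mean 0,
    unit variances and correlation rho, using the conditionals
    theta1 | theta2 ~ N(rho*theta2, 1 - rho**2) and vice versa."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho ** 2)
    theta1, theta2 = 0.0, 0.0          # initial guess theta2^(0) = 0
    draws = []
    for i in range(n_iter):
        theta1 = rho * theta2 + s * rng.gauss(0.0, 1.0)
        theta2 = rho * theta1 + s * rng.gauss(0.0, 1.0)
        if i >= burn_in:               # discard the burn-in period
            draws.append((theta1, theta2))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_iter=60000, burn_in=1000)
m = len(draws)
# Sample moment E(theta1 * theta2) should be close to rho.
corr = sum(a * b for a, b in draws) / m
```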


Example 12.4 Simulate 100 sample paths from the asset price dynamics of Equation 12.7 with the parameters μ = 0.08, σ = 0.4, λ = 3.5, s = 0.3, and k = 0. Each sample path replicates the daily log-returns of a stock over a 1-year horizon. On the basis of these 100 paths, estimate the values of μ, σ, λ, s, and k with Gibbs sampling. Compare the results with the input values.

Simulating paths. Sample paths are simulated by assuming n = 250 trading days a year, so the discretization (Eq. 12.8) has Δt = 1/250. On each path, the log-asset price at each time point is generated as follows:

    log S_{i+1} = log S_i + (μ − σ²/2)Δt + σ√Δt ε_i + Z_i 1{U_i ≤ λΔt},

where ε ~ N(0, 1) and U ~ U(0, 1) are independent random variables and Z_i ~ N(k, s²) is the jump size. To simplify the notation, we denote x_i = log S_{i+1} − log S_i. A graph of three sample paths is given in Figure 12.1.
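A Python sketch of this path simulation is given below. It assumes the usual Euler discretization with a Bernoulli jump indicator 1{U_i ≤ λΔt} and jump size Z_i ~ N(k, s²); the initial price S_0 = 100 is an arbitrary choice:

```python
import math
import random

def simulate_jump_diffusion_path(mu=0.08, sigma=0.4, lam=3.5, k=0.0, s=0.3,
                                 n=250, s0=100.0, seed=0):
    """Simulate one year of daily log-prices under a jump-diffusion using
    an Euler step with a Bernoulli jump indicator:
    log S_{i+1} = log S_i + (mu - sigma**2/2)*dt + sigma*sqrt(dt)*eps
                  + Z * 1{U <= lam*dt},  with Z ~ N(k, s**2)."""
    rng = random.Random(seed)
    dt = 1.0 / n
    log_s = [math.log(s0)]
    for _ in range(n):
        step = (mu - 0.5 * sigma ** 2) * dt \
               + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if rng.random() <= lam * dt:      # a jump occurs in this interval
            step += rng.gauss(k, s)
        log_s.append(log_s[-1] + step)
    return log_s

path = simulate_jump_diffusion_path()
# x_i = log S_{i+1} - log S_i, the daily log-returns used by the sampler.
returns = [b - a for a, b in zip(path, path[1:])]
```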

Gibbs sampling. There are five parameters in the model, so we have to develop five conditional conjugate priors from their conditional likelihood functions. Let us proceed step by step.

1. Conditional prior and posterior for μ. Other things being fixed, the likelihood function of μ happens to be proportional to a normal density. Specifically,


Therefore, a normal distribution N(m, τ²) is suitable for μ as a conditional conjugate prior. The posterior distribution can be immediately obtained as

2. Conditional prior and posterior for σ². The conditional likelihood function of σ² is

We select IΓ(α, β) as the conditional prior for σ². Then, the posterior distribution becomes

3. Conditional prior and posterior for λ. The conditional likelihood of λ is

where N is the total number of jumps in the horizon. From Table 12.1, we find that the appropriate conjugate prior is Be(a, b). Simple computation shows that the posterior distribution is

4. Conditional prior and posterior for k. As k is the mean of the normal jump size, its prior and posterior are obtained in the same manner as μ. We state the result without proof. The prior is N(m′, τ′²), and the posterior is given by


5. Conditional prior and posterior for s². As s² is the variance of the normal jump size, its prior and posterior are obtained in the same manner as σ². The prior is IΓ(α′, β′), and the posterior is given by

The aforementioned priors and posteriors are distributions conditional on the values of Z_i and ΔN_i. This complicates the Gibbs sampling procedure because only x_i is observable for all i. Therefore, at each time point t_i, Z_i and ΔN_i should be simulated from the distributions conditional on the observed value of x_i before substituting them into the priors or posteriors. We need the following facts:

which together with Bayes' theorem show that

The jump size Z_i is necessary only when ΔN_i = 1. Under such a situation, we recognize that the conditional density function of Z_i is

which implies

With all of the ingredients ready, the Gibbs sampling starts by choosing the initial values of μ_0, σ²_0, k_0, λ_0, and s²_0. We also need initial values for Z_i^(0)


and ΔN_i^(0), both of which can be obtained by a simulation with the initial parameters. The Gibbs sampling runs as follows:

Step 1: Sample μ_j ~ p(μ_j | σ²_{j−1}, k_{j−1}, s²_{j−1}, λ_{j−1}), as given in Equation 12.9.

Step 2: Sample σ²_j ~ p(σ²_j | μ_j, k_{j−1}, s²_{j−1}, λ_{j−1}), as given in Equation 12.10.

Step 3: Sample λ_j ~ p(λ_j | μ_j, σ²_j, k_{j−1}, s²_{j−1}), as given in Equation 12.11.

Step 4: Sample k_j ~ p(k_j | μ_j, σ²_j, s²_{j−1}, λ_j), as given in Equation 12.12.

Step 5: Sample s²_j ~ p(s²_j | μ_j, σ²_j, k_j, λ_j), as given in Equation 12.13.

Step 6: Sample ΔN_i^(j) ~ p(ΔN_i^(j) | μ_j, σ²_j, k_j, s²_j), as given in Equation 12.14, for all i = 1, 2, …, n.

Step 7: Sample Z_i^(j) ~ p(Z_i^(j) | μ_j, σ²_j, k_j, s²_j), as given in Equation 12.15, for the time points t_i where ΔN_i = 1.

Step 8: Set j = j + 1 and go to Step 1. Repeat until j = M* + M.

Inference is drawn by taking the sample means of the values of the last M simulated parameters. The VBA code is available online in the supplementary document.

Results and comparisons. Table 12.2 shows our estimation results.

We report the averaged posterior means over the 100 sample paths and the variances. As the table shows, the estimates are close to the true values, and the variances are small. Gibbs sampling does a good job of estimating the parameters of jump-diffusion models. Example 12.4 shows the usefulness of Gibbs sampling in estimating the jump-diffusion model. In practice, this application can be crucial for a risk manager to assess how much risk is due to jumps. To examine the jump risk empirically, we estimate the effect of jumps on the Dow Jones Industrial Index. Our estimation is based on daily closing prices over the 1995–2004 period. The parameters are estimated on an annual basis.


transition probabilities converge to the posterior distribution. If the transition probabilities satisfy time reversibility with respect to π(j), then its limiting distribution is guaranteed to be equal to π(j). To explain time reversibility, write the transition probabilities p_ij as

    p_ij = p̃_ij + r_i 1{i = j},

where p̃_ii = 0, p̃_ij = p_ij for i ≠ j, and r_i = p_ii. If the equation

    π(i) p̃_ij = π(j) p̃_ji

is satisfied for all i and j, then the probabilities p_ij are time reversible. This condition asserts that the probability of starting at i and ending at j, when the initial probability is given by π(i), is the same as that of starting at j and ending at i. By simple computation, we check that

    Σ_i π(i) p_ij = Σ_{i≠j} π(j) p̃_ji + π(j) r_j = π(j).

Therefore, π(j) is the limiting distribution of the chain. In other words, a Markov chain whose limiting distribution is the posterior distribution can be constructed by finding a time-reversible Markov chain. We start this process by specifying the transition probabilities q_ij. If the probabilities q_ij already satisfy time reversibility, then the corresponding Markov chain is the one we want. Otherwise, suppose that

    π(i) q_ij > π(j) q_ji.

Then, the chain has a higher probability of moving from i to j than from j to i. Therefore, we introduce a probability α_ij to reduce the moves from i to j. We would like to have

    π(i) q_ij α_ij = π(j) q_ji α_ji,

so that the adjusted transition probabilities p_ij = q_ij α_ij (for i ≠ j) are time reversible.


As we do not want to reduce the likelihood of moving from j to i, we set α_ji = 1. Therefore, the general formula is

    α_ij = min{ π(j) q_ji / (π(i) q_ij), 1 }.

From Equations 12.17 and 12.18, we see that the transition probabilities

    p_ij = q_ij α_ij,  i ≠ j,

are time reversible with respect to π(i) and hence define a Markov chain whose limiting distribution is the required one. This method is called the Metropolis–Hastings algorithm.

Example 12.5 Consider a random walk Markov chain on the four states A, B, C, and D. All transition probabilities are 0.5, except that the transitions "from A to B" and "from D to C" occur with probability 1. The transition matrix of the chain (states ordered A, B, C, D) is given by

    P = [ 0    1    0    0
          1/2  0    1/2  0
          0    1/2  0    1/2
          0    0    1    0  ].

On the basis of the Metropolis–Hastings algorithm, construct a Markov chain whose limiting distribution is (1/4, 1/4, 1/4, 1/4). A simple calculation shows that the limiting distribution of the original Markov chain is (1/6, 1/3, 1/3, 1/6). To construct the desired Markov chain, we need to compute the probabilities α_ij. For instance,

    α_AB = min{ π(B) q_BA / (π(A) q_AB), 1 } = min{ (1/4 × 1/2) / (1/4 × 1), 1 } = 1/2.

This means that the transition probability "from A to B" is reduced from 1 to 1 × 1/2 = 1/2. For node "A," the remaining transition probabilities correspond to the


event that no transition occurs. Transition probabilities for the other nodes are obtained in the same manner. The final transition matrix becomes

    P̂ = [ 1/2  1/2  0    0
          1/2  0    1/2  0
          0    1/2  0    1/2
          0    0    1/2  1/2 ].

It is easy to verify that the limiting distribution of this Markov chain is (1/4, 1/4, 1/4, 1/4).
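Both claims can be checked numerically. The Python sketch below verifies that (1/6, 1/3, 1/3, 1/6) is stationary for the original matrix and that a high power of the adjusted matrix has every row close to (1/4, 1/4, 1/4, 1/4):

```python
def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(p, m):
    out = p
    for _ in range(m - 1):
        out = mat_mult(out, p)
    return out

# Original random-walk chain and its stationary distribution.
P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
pi0 = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
check = [sum(pi0[i] * P[i][j] for i in range(4)) for j in range(4)]

# Metropolis-Hastings-adjusted chain: rows of a high power are ~ uniform.
P_adj = [[0.5, 0.5, 0.0, 0.0],
         [0.5, 0.0, 0.5, 0.0],
         [0.0, 0.5, 0.0, 0.5],
         [0.0, 0.0, 0.5, 0.5]]
P200 = mat_power(P_adj, 200)
```

Note that the adjusted chain has self-loops at A and D, which makes it aperiodic, so the powers of P̂ converge; the original chain alternates between {A, C} and {B, D}, so only its stationary distribution is checked.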

To apply the Metropolis–Hastings algorithm for simulating a random variable with the distribution π(·), we begin with any Markov chain X_k whose transition density q(X_k | X_{k−1}) is easy to simulate and whose range is similar to that of π. For this Markov chain to have the desired limiting distribution π(·), we need to adjust the transition density q(X_k | X_{k−1}) at each step k of the algorithm according to the updating criterion in Equation 12.19 so that it is time reversible. That is, if the transition probability from state X_{k−1} to state X_k is too high, we reduce it by the amount α; then the new transition probability p(X_k | X_{k−1}) will form a time-reversible Markov chain with a stationary distribution of π(·). The algorithm can be summarized as follows:

Step 1: Choose a transition probability q to construct the Markov chain X_k.

Step 2: Pick initial values for θ_0 and X_0 and set k = 1.

Step 3: Simulate X_k according to the probability law of q(X_k | X_{k−1}).

Step 4: If α = [q(X_{k−1} | X_k) π(X_k)] / [q(X_k | X_{k−1}) π(X_{k−1})] ≥ 1, set θ_k = X_k and go to Step 6.

Step 5: Otherwise, generate W ~ U[0, 1]. If W ≤ α, set θ_k = X_k; otherwise, set θ_k = θ_{k−1} and X_k = X_{k−1}.

Step 6: Set k = k + 1 and repeat from Step 3 until k is equal to a prespecified integer M.
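Steps 1-6 can be sketched as a generic Python routine. The N(0, 1) target and the random-walk normal proposal used in the demonstration are illustrative choices, not part of the algorithm itself; working on the log scale avoids numerical overflow in the acceptance ratio:

```python
import math
import random

def metropolis_hastings(log_target, proposal_sample, proposal_logpdf,
                        x0, n_iter, seed=0):
    """Generic Metropolis-Hastings sampler.
    log_target: log of the (possibly unnormalised) target density pi.
    proposal_sample(x, rng): draw a candidate from q(. | x).
    proposal_logpdf(a, b): log q(a | b)."""
    rng = random.Random(seed)
    x = x0
    draws = []
    for _ in range(n_iter):
        cand = proposal_sample(x, rng)
        log_alpha = (log_target(cand) + proposal_logpdf(x, cand)
                     - log_target(x) - proposal_logpdf(cand, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = cand                    # accept the move
        draws.append(x)                 # otherwise keep the old state
    return draws

# Demonstration: target N(0, 1), random-walk proposal N(x, 1).
draws = metropolis_hastings(
    log_target=lambda t: -0.5 * t * t,
    proposal_sample=lambda x, rng: x + rng.gauss(0.0, 1.0),
    proposal_logpdf=lambda a, b: -0.5 * (a - b) ** 2,
    x0=0.0, n_iter=50000)
post = draws[5000:]                     # discard a burn-in period
mean = sum(post) / len(post)
var = sum(t * t for t in post) / len(post) - mean ** 2
```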

Example 12.6 In the previous chapter, we showed how to generate a normal random variable, using the acceptance–rejection method, for example. In this section, we demonstrate how a normal random variable can be generated by


the Metropolis–Hastings algorithm. Let θ ~ N(0, 1). We need to construct a Markov chain that has a limiting distribution equal to a normal distribution.

Let X_k be a stochastic process such that for each k = 0, 1, 2, …, X_k is a double exponential random variable, that is, X_k ~ DoubleExp(1) with pdf

    f(x) = (1/2) e^{−|x|},  −∞ < x < ∞.

Because each X_k is drawn independently of all previous states, the process is trivially a Markov chain. It takes values from negative infinity to positive infinity, making it a good candidate to approximate the normal random variable. The X_k is constructed as the initial distribution, and the transition probability will be adjusted according to the Metropolis–Hastings algorithm to transform it into a time-reversible Markov chain with acceptance probability

    α = min{ 1, [π(X_k) f(X_{k−1})] / [π(X_{k−1}) f(X_k)] },

where π denotes the N(0, 1) density.
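A Python sketch of this construction follows: proposals are drawn independently from DoubleExp(1) and accepted with the Metropolis–Hastings probability, so the draws approximate N(0, 1). The inverse-transform generator for the double exponential is one possible choice of implementation:

```python
import math
import random

def normal_via_mh(n_iter, seed=0):
    """Independence Metropolis-Hastings: proposals X_k ~ DoubleExp(1)
    (pdf f(x) = exp(-|x|)/2, drawn independently of the past), with
    target pi = N(0, 1)."""
    rng = random.Random(seed)

    def log_target(x):                  # log pi, up to a constant
        return -0.5 * x * x

    def log_proposal(x):                # log f for DoubleExp(1)
        return -abs(x) - math.log(2.0)

    def sample_proposal():
        # Inverse transform: exponential magnitude with a random sign.
        mag = -math.log(1.0 - rng.random())
        return mag if rng.random() < 0.5 else -mag

    x = sample_proposal()
    draws = []
    for _ in range(n_iter):
        cand = sample_proposal()
        log_alpha = (log_target(cand) + log_proposal(x)
                     - log_target(x) - log_proposal(cand))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = cand                    # accept the candidate
        draws.append(x)
    return draws

draws = normal_via_mh(50000)
mean = sum(draws) / len(draws)
var = sum(t * t for t in draws) / len(draws) - mean ** 2
```

Because the Laplace density has heavier tails than the normal, the weight π/f is bounded and this independence sampler mixes well.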

The VBA code is available online on the book's website. The following theorem justifies that the Gibbs sampling algorithm is a special case of the Metropolis–Hastings algorithm.

Theorem 12.1 Gibbs sampling is a special case of the Metropolis–Hastings algorithm in which every jump is accepted with α ≡ 1.

When the conditional distribution of some parameters is not known explicitly, we cannot use Gibbs sampling to update those parameters, but we can still use the Metropolis–Hastings algorithm to estimate them. The following example


demonstrates the use of Metropolis–Hastings in a discrete stochastic volatility model.

Example 12.7 In the following example, we present a case study on a simple discrete stochastic volatility (SV) model, using the MCMC technique to estimate the model parameters.

Let y_t = log S_t − log S_{t−1} be the log-return of the stock price between time t − 1 and t, and let h_t be the log-volatility at time t, for t = 1, 2, …, n, where n is the number of observations. Denote y = (y_1, y_2, …, y_n) and h = (h_1, h_2, …, h_n). We assume the model follows:

where h_1 ~ N(μ, σ²), and ε_t and η_t are assumed to be independent and to follow normal distributions with mean 0 and variance 1 for all t. To sample the parameters, one possible way is to perform the Gibbs sampling algorithm as follows.

By Bayes' rule, we can derive the conditional posteriors as follows:

where p(μ) and p(σ²) are independent priors. In this case, we take p(μ) ~ N(·, ·) and p(σ²) ~ IΓ(·, ·), where IΓ(·, ·) denotes the inverse gamma distribution and the arguments are


hyperparameters specified by users. To obtain the conditional posterior distribution for μ, we apply Bayes' rule as follows:

where

Similarly, the conditional posterior distribution for σ² can be obtained as follows:

where

To sample h_t from p(h_t | μ, σ², y, h_{−t}), we first derive its conditional posterior distribution as follows:

This density function is not easy to sample from directly. One can use the acceptance–rejection method introduced earlier by finding a density q(h_t) and a


constant c such that p(h_t) ≤ c q(h_t) for simulating the conditional posterior distribution. The Metropolis–Hastings algorithm provides an alternative way to sample from this density easily. Let X_t be a Markov chain whose transition density is the normal density q(x_t) ∝ exp{−(x_t − h̄)² / (2σ_h²)}, so that it does not depend on any of the previous states. Specifically, simulate a random variable x_t from q(x) and accept x_t as h_t^(i) with probability

Otherwise, set h_t^(i) = h_t^(i−1) and move on to sample h_{t+1}^(i). Different choices of q(x) can lead to different efficiency of the algorithm. Interested readers may refer to Jacquier et al. (1994) for details. In practice, the log-returns can be adjusted to have a mean of zero by subtracting from each y_t the mean ū = (Σ_{t=1}^n y_t) / n. The mean-corrected returns ỹ_t = y_t − ū can then be applied directly in this simple SV model. Some software packages, such as WinBUGS, can perform the sampling conveniently for users.
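As a rough illustration of this Metropolis–Hastings update for a single h_t, the sketch below assumes y_t | h_t ~ N(0, e^{h_t}) and collects the state-equation contributions into a single normal term N(m_t, v_t) that also serves as the independence proposal; both assumptions, and all numbers used, are hypothetical rather than taken from the model above:

```python
import math
import random

def sample_ht(y_t, m_t, v_t, h_current, n_steps, rng):
    """Independence Metropolis-Hastings update for a log-volatility h_t,
    assuming y_t | h_t ~ N(0, exp(h_t)) and a normal term N(m_t, v_t)
    collecting the state-equation contributions (both are assumptions
    made for illustration).  Proposal q = N(m_t, v_t)."""
    def log_post(h):
        return (-0.5 * h - 0.5 * y_t ** 2 * math.exp(-h)
                - 0.5 * (h - m_t) ** 2 / v_t)

    def log_q(h):
        return -0.5 * (h - m_t) ** 2 / v_t

    h = h_current
    for _ in range(n_steps):
        cand = rng.gauss(m_t, math.sqrt(v_t))
        log_alpha = (log_post(cand) + log_q(h)) - (log_post(h) + log_q(cand))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            h = cand                    # accept the candidate log-volatility
    return h

rng = random.Random(0)
# Hypothetical numbers: one return y_t = 0.02 with the conditional normal
# term centred at log(1e-4) (about 1% daily volatility) and variance 0.5.
h_draws = [sample_ht(0.02, math.log(1e-4), 0.5, math.log(1e-4), 50, rng)
           for _ in range(200)]
```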