
  • Introduction to Monte Carlo Methods

    Prof. Nicholas Zabaras

    Center for Informatics and Computational Science
    University of Notre Dame, Notre Dame, Indiana, USA

    Email: [email protected]
    URL: https://www.zabaras.com/ | https://cics.nd.edu/

    October 4, 2018

  • Contents

    Review of the Bayesian inference framework; introducing Monte Carlo
    simulation; reviewing the Central Limit Theorem and the Law of Large
    Numbers; indicator functions; error approximations

    Examples showing convergence of the MC simulator; generalization of
    the MC estimator in high dimensions

    Sample representation of the Monte Carlo estimator

    Deterministic vs. MC integration; using MC for computing integrals,
    expectations and Bayes factors

    Following closely:

    C. Robert and G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1 & 3.2) (google books, slides, video)

    J. S. Liu, Monte Carlo Strategies in Scientific Computing (Chapters 1 & 2)

    J.-M. Marin and C. P. Robert, Bayesian Core (Chapter 2)

    A. Doucet, Statistical Computing & Monte Carlo Methods (course notes, 2007)

    http://books.google.com/books?id=HfhGAxn5GugC&dq=Monte+Carlo+statistical+methods,+google+books&printsec=frontcover&source=bn&hl=en&ei=JbxiTP6KDIK88gaK7bnsCQ&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCQQ6AEwAw
    http://www.ceremade.dauphine.fr/~xian/Madrid.pdf
    http://videolectures.net/christian_robert/
    http://books.google.com/books?id=R8E-yHaKCGUC&printsec=frontcover&dq=monte+carlo+strategies+in+scientific+computing&hl=en&ei=afBiTOWwNoT7lwfi1-nMCw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDUQ6AEwAA
    http://www.springerlink.com/content/x38261/
    http://people.cs.ubc.ca/~arnaud/stat535/slides7.pdf

  • The Bayesian Model

    Let us revisit our Bayesian model:

    $$\pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}$$

    In most problems of interest there is no analytical closed form for
    the posterior (using conjugate priors is one exception).

    Also note that the denominator in the Bayes' equation above implies
    the calculation of the following high-dimensional integral:

    $$\int \pi(\theta)\, f(x \mid \theta)\, d\theta$$

  • The Bayesian Model

    Consider the posterior model

    $$\pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}$$

    Typical point estimates based on the posterior include the following:

    $$\mathbb{E}[\theta \mid x] = \int \theta\, \pi(\theta \mid x)\, d\theta, \qquad \operatorname{Var}[\theta \mid x] = \int \left(\theta - \mathbb{E}[\theta \mid x]\right)^2 \pi(\theta \mid x)\, d\theta$$

    We are often interested in the mode of the posterior:

    $$\theta_{MAP} = \arg\max_{\theta}\, \pi(\theta \mid x)$$

    If $\theta = (\theta_1, \theta_2)$ and $\theta_2$ is a nuisance parameter, marginal
    distributions are also often needed:

    $$\pi(\theta_1 \mid x) = \int \pi(\theta_1, \theta_2 \mid x)\, d\theta_2$$

  • The Bayesian Model

    Consider the posterior model

    $$\pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}$$

    The predictive distribution of $Y$, $Y \sim g(y \mid x)$, and its mean $\mathbb{E}[Y \mid x]$ are:

    $$g(y \mid x) = \int f(y \mid \theta, x)\, \pi(\theta \mid x)\, d\theta, \qquad \mathbb{E}[Y \mid x] = \iint y\, f(y \mid \theta, x)\, \pi(\theta \mid x)\, d\theta\, dy$$

    Similarly, for model selection, we have seen that the posterior takes
    the form:

    $$\pi(k, \theta_k \mid x) = \frac{p(k)\, \pi_k(\theta_k)\, f(x \mid k, \theta_k)}{\sum_{k} p(k) \int \pi_k(\theta_k)\, f(x \mid k, \theta_k)\, d\theta_k}$$

  • Monte Carlo Simulation

    All of the calculations reviewed earlier require computing high-
    dimensional integrals.

    Monte Carlo simulation provides the means for effective calculation of
    these integrals and for resolving many more issues.

    Some historical (early) references on Monte Carlo methods:

    N. Metropolis and S. Ulam, The Monte Carlo Method, J. Amer. Statist. Assoc., Vol. 44, pp. 335-341 (1949)

    N. Metropolis, The Beginning of the Monte Carlo Method, Los Alamos Science Special Issue (1987)

    http://www.amstat.org/misc/TheMonteCarloMethod.pdf
    https://www.dropbox.com/s/5iwwnfdj0nak2s9/TheMonteCarloMethod.pdf?dl=0
    http://library.lanl.gov/cgi-bin/getfile?00326866.pdf
    https://www.dropbox.com/s/qc5qv64krrlg57f/TheBeginningOfTheMCMethod.pdf?dl=0

  • Introducing Monte Carlo Simulation

    Consider a 2 × 2 square S as shown in the figure below.

    Let D be the inscribed circle of radius 1 in the square S.

    [Figure: the circle D of radius 1 inscribed in the 2 × 2 square S]

  • Introducing Monte Carlo Simulation

    Let us consider dropping darts uniformly on the square S. This means
    that the probability of the dart falling in a subdomain $A$ of S is
    proportional to the area of $A$.

    Let $D = (x, y)$ define a random variable on S that represents the
    location of the drop of the dart. We have:

    $$P(D \in A) = \frac{\int_A dx\,dy}{\int_S dx\,dy}$$

    Assume $N$ independent drops of the dart on the square S, i.e.
    $D_1, D_2, \ldots, D_N$.

  • Introducing Monte Carlo Simulation

    A common-sense estimate of the above probability

    $$P(D \in A) = \frac{\int_A dx\,dy}{\int_S dx\,dy}$$

    would be

    $$P(D \in A) \simeq \frac{\text{times the dart fell in } A}{N}$$

    where $N$ is the total number of dart falls.

    Can we give a statistical justification of this result?

  • Indicator Functions

    Let us introduce the indicator function for the event $A$, i.e. let

    $$\chi_A(x, y) = \begin{cases} 1, & \text{if the drop point of the dart } d = (x, y) \in A \\ 0, & \text{otherwise} \end{cases}$$

    Let us compute the probability for $D \in A$:

    $$P(D \in A) = \frac{\int_S \chi_A(x, y)\, dx\,dy}{\int_S dx\,dy} = \frac{\int_S \chi_A(x, y)\, dx\,dy}{4} = \frac{\int_A dx\,dy}{4}$$

    The above result follows immediately by noticing that

    $$\int_S \chi_A(x, y)\, dx\,dy = \int_A \chi_A(x, y)\, dx\,dy + \int_{S \setminus A} \chi_A(x, y)\, dx\,dy = \int_A 1\, dx\,dy + \int_{S \setminus A} 0\, dx\,dy = \int_A dx\,dy$$

  • Indicator Functions

    The probability density associated to $P$ is 1/4 - this is the density of
    the uniform distribution on $S$, denoted $\mathcal{U}_S$: $D \sim \mathcal{U}_S$.

    Let us introduce the random variable $V(D) := \chi_A(D) := \chi_A(X, Y)$, where
    $X, Y$ are the random variables representing the Cartesian coordinates of
    a uniformly distributed point $d = (x, y)$ on S, where a dart falls.

    With this notation,

    $$P(d \in A) = \int_S \chi_A(x, y)\, \frac{1}{4}\, dx\,dy = \mathbb{E}_{\mathcal{U}_S}(V)$$

  • Strong Law of Large Numbers

    Let $X_i$ for $i = 1, 2, \ldots, N$ be independent and identically distributed
    (i.i.d.) random variables with mean $\mathbb{E}(X_i) = \mu$ and variance
    $V(X_i) = \sigma^2 < \infty$. The strong LLN states the following for the sample
    mean $\bar{X}_N$:

    $$\lim_{N \to \infty} \bar{X}_N = \mu \quad \text{almost surely}$$

  • Weak Law of Large Numbers

    Let $X_i$ for $i = 1, 2, \ldots, N$ be independent and identically distributed
    (i.i.d.) random variables with mean $\mathbb{E}(X_i) = \mu$ and variance
    $V(X_i) = \sigma^2 < \infty$. The weak LLN states that the sample mean

    $$\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$$

    is a random variable that converges to the true mean, $\bar{X}_N \to \mu$, as
    $N \to \infty$. More formally:

    $$\lim_{N \to \infty} \Pr\left(\left|\bar{X}_N - \mu\right| \ge \varepsilon\right) = 0 \quad \forall\, \varepsilon > 0$$

    Note that

    $$\mathbb{E}\left[\bar{X}_N\right] = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}[X_i] = \mu \qquad \text{and} \qquad \operatorname{var}\left[\bar{X}_N\right] = \frac{1}{N^2} \sum_{i=1}^{N} \operatorname{var}[X_i] = \frac{\sigma^2}{N}$$

  • The Central Limit Theorem

    Let $X_i$ for $i = 1, 2, \ldots, N$ be independent and identically distributed
    (i.i.d.) random variables, each with expectation $\mu$ and variance $\sigma^2$.
    Then, for the sample mean $\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} X_i$:

    $$\mathbb{E}\left[\bar{X}_N\right] = \frac{1}{N} \sum_{i=1}^{N} \mathbb{E}(X_i) = \frac{1}{N}\,(N \mu) = \mu$$

    $$\operatorname{Var}\left[\bar{X}_N\right] = \operatorname{Var}\left(\frac{1}{N} \sum_{i=1}^{N} X_i\right) = \frac{1}{N^2} \sum_{i=1}^{N} \operatorname{Var}(X_i) = \frac{1}{N^2}\,(N \sigma^2) = \frac{\sigma^2}{N}$$

    Under some weak conditions (such as finite variance), the sample mean
    has a limiting normal distribution:

    $$\lim_{N \to \infty} \frac{\bar{X}_N - \mu}{\sqrt{\sigma^2 / N}} \sim \mathcal{N}(0, 1) \quad \text{or, as } N \to \infty, \quad \bar{X}_N \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{N}\right)$$

  • Back to Indicator Functions

    Let $X_i$ for $i = 1, 2, \ldots, N$ be an indicator function for an event $F$, i.e. let

    $$X_i = \begin{cases} 1, & \text{if the outcome of experiment } i \text{ is } F \\ 0, & \text{otherwise} \end{cases}$$

    Then, as a result of the law of large numbers,

    $$\bar{X}_N = \frac{\sum_{i=1}^{N} X_i}{N} \to \Pr(F) \quad \text{as } N \to \infty$$

    Note that herein we used that

    $$\mathbb{E}[X_i] = 1 \cdot \Pr(F) + 0 \cdot \Pr\left(F^c\right) = \Pr(F)$$

  • Returning to Our Dart Drop Experiment

    Let us introduce the random variables $V_i := V(D_i) := \chi_A(D_i)$, $i = 1, 2, \ldots, N$,
    associated to the drops $D_i$, $i = 1, 2, \ldots, N$, and consider the empirical
    average of the i.i.d. $V_i$:

    $$S_N := \frac{\sum_{i=1}^{N} V_i}{N} = \frac{\text{number of drops that fell in } A}{N}$$

    As $N \to \infty$, the law of large numbers (since $D_i \sim \mathcal{U}_S$) yields

    $$\lim_{N \to \infty} S_N = \mathbb{E}_{\mathcal{U}_S}(V) \quad \text{almost surely},$$

    where we already proved that

    $$\Pr(D \in A) = \mathbb{E}_{\mathcal{U}_S}(V)$$

    When $N \to \infty$, this justifies our intuitive result introduced earlier.

  • Mean Square Error

    Since $S_N$ is an unbiased estimator of

    $$P(d \in D) = \int_D \frac{1}{4}\, dx\,dy = \frac{\pi}{4},$$

    $S_N$ is a random variable such that

    $$S_N = \underbrace{\frac{\pi}{4}}_{\mathbb{E}(S_N)} + \underbrace{\left(S_N - \frac{\pi}{4}\right)}_{\text{Error term}}$$

    To characterize the precision of the estimator, we can use the following:

    $$\operatorname{Var}(S_N) = \mathbb{E}\left(S_N - \frac{\pi}{4}\right)^2 = \frac{1}{N^2} \sum_{i=1}^{N} \operatorname{Var}(V_i) = \frac{\operatorname{Var}(V)}{N} \quad (\text{recall the } V_i\text{'s are i.i.d.})$$

    For our problem,

    $$\operatorname{Var}(V) = \mathbb{E}\left(V^2\right) - \left(\mathbb{E}(V)\right)^2 = P(D \in D)\left(1 - P(D \in D)\right) = \frac{\pi}{4}\left(1 - \frac{\pi}{4}\right) \approx 0.1685.$$

    Thus the mean square error between $S_N$ and $\Pr(D \in D)$ decreases as $\frac{1}{N}$.
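
    An empirical check of the $1/N$ decay of $\operatorname{Var}(S_N)$ (my own Python
    sketch; the deck links a C++ simulator on the following slides):

    import numpy as np

    rng = np.random.default_rng(2)

    def realizations_of_S_N(N, reps):
        """reps independent realizations of the dart estimator S_N."""
        x, y = rng.uniform(-1.0, 1.0, size=(2, reps, N))
        return np.mean(x**2 + y**2 <= 1.0, axis=-1)

    for N in (100, 1_000, 10_000):
        emp = realizations_of_S_N(N, reps=500).var()
        print(N, emp, 0.1685 / N)               # empirical Var(S_N) vs Var(V)/N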

  • Properties of the Estimator

    Applying the central limit theorem (since $\operatorname{Var}(V) < \infty$):

    $$\sqrt{N}\left(S_N - \frac{\pi}{4}\right) \xrightarrow{d} \mathcal{N}\left(0, \operatorname{Var}(V)\right)$$

    The probability of the error being larger than $2\sqrt{\operatorname{var}(V)/N}$ is given as:

    $$\Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge 2\sqrt{\frac{\operatorname{var}(V)}{N}}\right) \simeq 2\left(1 - \Phi(2)\right) \approx 0.0456$$

    where $\Phi(x)$ is the CDF of $\mathcal{N}(0, 1)$.

    One can similarly prove that for any integer $N \ge 1$ and $\varepsilon > 0$,

    $$\Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge \varepsilon \sqrt{\frac{\operatorname{var}(V)}{N}}\right) \simeq \operatorname{erfc}\left(\frac{\varepsilon}{\sqrt{2}}\right) \le C\, e^{-\varepsilon^2/2}$$

    where $\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$ and $\operatorname{erfc}(x) \simeq \frac{e^{-x^2}}{x \sqrt{\pi}}$ for large $x$ (see here).

    http://en.wikipedia.org/wiki/Standard_normal_table
    http://mathworld.wolfram.com/Erf.html

  • Coefficient of Variation

    The coefficient of variation for our Monte Carlo estimator is given as:

    $$COV(S_N) = \frac{\sqrt{\operatorname{var}(S_N)}}{\mathbb{E}(S_N)} = \frac{\sqrt{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)/N}}{\pi/4} \approx \frac{0.5233}{\sqrt{N}}$$

    For different COVs, the number of samples needed is given as:

    COV = 10%  ->  N = 28
    COV = 5%   ->  N = 110
    COV = 1%   ->  N = 2739

    With all of the above error estimates, we conclude that the
    approximation error varies as $O\left(\frac{1}{\sqrt{N}}\right)$.
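
    The table follows from solving $COV(S_N) = c$ for $N$; a tiny sketch (my
    addition; small differences in the last entry come from rounding of the
    constant):

    import numpy as np

    p = np.pi / 4                               # E(S_N)
    var_V = p * (1 - p)                         # Var(V) for the indicator V

    for c in (0.10, 0.05, 0.01):
        N = int(np.ceil(var_V / (p * c) ** 2))  # from sqrt(var_V / N) / p = c
        print(f"COV = {c:.0%} -> N = {N}")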

  • Properties of the Estimator

    An alternative non-asymptotic error estimate* using a Bernstein-type
    inequality is:

    $$P\left(\left|S_N - \frac{\pi}{4}\right| \ge \sqrt{\frac{\log(2/\delta)}{2N}}\right) \le \delta \quad \text{for } \delta \in (0, 1],\ N \ge 1$$

    Using this result, we can show that:

    $$\Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge \sqrt{\frac{\log(40)}{2N}}\right) \le 0.05$$

    Alternatively, for any $N \ge 1$ and $\varepsilon > 0$, using the Eq. above we can write:

    $$P\left(\left|S_N - \frac{\pi}{4}\right| \ge \varepsilon\right) \le 2\, e^{-2 N \varepsilon^2}$$

    These expressions for the approximation error show that it is inversely
    proportional to $\sqrt{N}$.

    * From Statistical Computing & Monte Carlo Methods, A. Doucet.

    http://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory)#cite_ref-3
    http://people.cs.ubc.ca/~arnaud/stat535/slides7.pdf

  • Convergence of the Simulator

    [Figure: convergence of $4 S_N$ as a function of $N$ (one realization); plot file Meanerror_plt]

    See here for a C++ implementation:
    https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0

  • Convergence of the Simulator

    [Figure: convergence of $4 S_N$ as a function of $N$ (one hundred realizations); plot file Meanerror.plt]

    See here for a C++ implementation:
    https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0

  • Convergence of the Simulator

    [Figure: square root of the empirical mean square error of $4 S_N$ across 100
    realizations as a function of $N$, with $\sqrt{\operatorname{Var}(V)/N}$ shown dotted; plot file Var_error.plt]

    See here for a C++ implementation:
    https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0

  • Convergence of the Estimator

    [Figure: Monte Carlo estimator of the area of a unit circle: (a) circles: MC
    mean averaged over the S = 100 runs; (b) theoretical error bars using the
    estimator discussed earlier; (c) average of the MC numerical error bars
    (from the S = 100 MC runs for each N). A MatLab implementation is given here.]

    $$\text{Mean area estimator at each } N \text{ (circles)} = \frac{1}{S} \sum_{i=1}^{S} \left(\text{MC area estimator of circle, run } i\right)$$

    $$\text{Sampled area error bars (colored area)} = \frac{1}{S} \sum_{i=1}^{S} \left(\text{numerical area error bars, run } i\right)$$

    $$\text{MC estimator of the area } A = \pi r^2 \text{ of the circle (run } i) = \underbrace{4}_{\text{area of square}} \cdot \underbrace{\frac{\#\text{ of samples falling in circle}}{N}}_{\text{our } S_N \text{ MC estimator}} = 4 S_N$$

    $$\text{Numerical area error bars (run } i) = \underbrace{4}_{\text{area of square}} \sqrt{\frac{1}{N}} \sqrt{\frac{1}{N - 1} \sum_{j=1}^{N} \left(\chi_D\left(\theta^{(j)}\right) - S_N\right)^2}$$

    $$\text{Theoretical area error bars (solid lines)} = \underbrace{4 \sqrt{\frac{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}{N}}}_{\text{exact std of our } 4 S_N \text{ MC estimator}}$$

    https://www.dropbox.com/s/6vl1rfq457k78xi/AreaCircle.rar?dl=0

  • Generalization of the Dart Drop Experiment

    Consider now the case where $\theta \in \mathbb{R}^n$, $n \ge 1$. We assume a hypercube
    $S_\theta^n$ and the inscribed hyperball $D_\theta^n$ in $\mathbb{R}^n$.

    We can build the same estimator as in 2D. Our indicator function now
    becomes $\chi_{D_\theta^n}$.

    All of our earlier derivations are still applicable. The rate of
    convergence of the estimator in the mean square sense is independent
    of $n$ and equal to $1/\sqrt{N}$.

    This is not the case using a deterministic method on a grid of regularly
    spaced points, where the convergence rate is typically of the form
    $1/N^{r/n}$, where $r$ is related to the smoothness of the contours of $D_\theta^n$.

    Monte Carlo methods are thus attractive when $n$ is large.

  • Generalization of the Dart Drop Experiment

    Assume you are interested in computing the volume of the hypersphere
    of radius $R = 1$ in $n$ dimensions:

    $$vol\left(D_\theta^n\right) = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)} \to 0 \quad \text{as } n \to \infty$$

    We want to do so using samples from the hypercube $[-1, 1]^n$ of volume $2^n$.

    Using $N$ samples, the variance of our MC estimator is

    $$\frac{\operatorname{Var}(X)}{N} = \frac{p_n\left(1 - p_n\right)}{N} \simeq \frac{p_n}{N} \ \text{ as } n \to \infty, \qquad \text{where } X \sim \mathcal{B}ernoulli\left(p_n\right), \quad p_n = \frac{vol\left(D_\theta^n\right)}{2^n}$$

    http://en.wikipedia.org/wiki/N-sphere
    http://en.wikipedia.org/wiki/Bernoulli_distribution

  • Monte Carlo and the Curse of Dimensionality

    The coefficient of variation is then:

    $$COV = \frac{\sqrt{\operatorname{Var}(X)/N}}{p_n} \simeq \frac{1}{\sqrt{N\, p_n}} \to \infty \quad \text{as } n \to \infty$$

    To obtain a reasonable relative error, we would need $N \simeq 100\, p_n^{-1}$.
    For this choice: $COV \simeq 0.1$.

    For $n = 20$:

    $$N \simeq 100\, p_{20}^{-1} = 100 \cdot \frac{2^{20}}{vol\left(D_\theta^{20}\right)} = 100 \cdot \frac{2^{20}\, 10!}{\pi^{10}} \approx 4.06 \times 10^{9}$$

    Clearly a plain Monte Carlo approach is not as helpful in 20 and higher
    dimensions! You are trying, in a huge volume of the hypercube, to hit
    the infinitesimally small hypersphere!
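
    The numbers above are easy to reproduce (a short sketch, my addition):

    import math

    # p_n: fraction of the hypercube [-1,1]^n occupied by the unit n-ball;
    # N ~ 100 / p_n gives COV ~ 0.1 for the hit-or-miss estimator.
    for n in (2, 5, 10, 20):
        vol_ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
        p_n = vol_ball / 2**n
        print(f"n = {n:2d}: p_n = {p_n:.3e}, N = {100 / p_n:.2e}")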

  • Generalization of the Dart Drop Experiment

    To obtain a fixed precision in the variance, a number of samples that
    is exponential in the dimension is needed!

    $$\text{Consider } X \in \mathbb{R}^n \text{ with } \operatorname{Var}(X) = C\, \gamma^n,\ \gamma > 1. \quad \text{Then } \operatorname{Var}\left(\bar{X}_N\right) = \frac{C\, \gamma^n}{N}, \text{ so fixed precision requires } N \propto \gamma^n.$$

    So using a plain Monte Carlo method still suffers from the curse of
    dimensionality.

    MC is not appropriate for rare-events modeling.

  • Generalization

    Assume i.i.d. samples $\theta^{(i)} \sim \pi$ $(i = 1, 2, \ldots, N)$.

    Now consider any set $A$ and assume that we are interested in
    $\pi(A) = P(\theta \in A)$ for $\theta \sim \pi$.

    We naturally choose the following estimator:

    $$\widehat{\pi}(A) \simeq \frac{\text{number of samples in } A}{\text{total number of samples}} = \frac{1}{N} \sum_{i=1}^{N} \chi_A\left(\theta^{(i)}\right)$$

    which by the law of large numbers is a consistent estimator of $\pi(A)$, since

    $$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} \chi_A\left(\theta^{(i)}\right) = \mathbb{E}\left(\chi_A(\theta)\right) = \pi(A)$$

  • Generalization

    Now we generalize the idea to tackle the generic problem of estimating

    $$\mathbb{E}_\pi\left(f(\theta)\right) = \int f(\theta)\, \pi(\theta)\, d\theta$$

    where $f : \mathbb{R}^n \to \mathbb{R}^{n_f}$ and $\pi$ is a probability distribution on $\mathbb{R}^n$.

    We assume that $\mathbb{E}_\pi\left(\left|f(\theta)\right|\right) < \infty$ and that $\mathbb{E}_\pi\left(f(\theta)\right)$ cannot be
    calculated analytically.

  • Generalization

    To evaluate $\mathbb{E}_\pi\left(f(\theta)\right)$, we consider the unbiased estimator

    $$S_N(f) = \frac{1}{N} \sum_{i=1}^{N} f\left(\theta^{(i)}\right)$$

    From the law of large numbers, $S_N(f)$ will converge, and

    $$\lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} f\left(\theta^{(i)}\right) = \mathbb{E}_\pi\left(f(\theta)\right) \quad (a.s.)$$

    A good measure of the approximation is the variance of $S_N(f)$:

    $$\operatorname{Var}\left(S_N(f)\right) = \operatorname{Var}\left(\frac{1}{N} \sum_{i=1}^{N} f\left(\theta^{(i)}\right)\right) = \frac{\operatorname{Var}_\pi f(\theta)}{N}$$

    From the central limit theorem (if $\operatorname{Var}_\pi f(\theta) < \infty$), we have:

    $$\sqrt{N}\left(S_N(f) - \mathbb{E}_\pi f\right) \xrightarrow{d} \mathcal{N}\left(0, \operatorname{Var}_\pi f(\theta)\right), \quad \text{or} \quad S_N(f) \xrightarrow{d} \mathcal{N}\left(\mathbb{E}_\pi f, \frac{\operatorname{Var}_\pi f(\theta)}{N}\right)$$

  • Generalization

    The rate of convergence is independent of the dimension of $\theta$.

    Integration in complex domains is now not a problem.

    The method is easy to implement and rather general. You will need:

    to be able to evaluate $f(\theta)$,

    to be able to produce samples distributed according to $\pi(\theta)$.

    Some historical references on Monte Carlo methods:

    N. Metropolis and S. Ulam, The Monte Carlo Method, J. Amer. Statist. Assoc., Vol. 44, pp. 335-341 (1949)

    N. Metropolis, The Beginning of the Monte Carlo Method, Los Alamos Science Special Issue (1987)

    http://www.amstat.org/misc/TheMonteCarloMethod.pdf
    https://www.dropbox.com/s/5iwwnfdj0nak2s9/TheMonteCarloMethod.pdf?dl=0
    http://library.lanl.gov/cgi-bin/getfile?00326866.pdf
    https://www.dropbox.com/s/qc5qv64krrlg57f/TheBeginningOfTheMCMethod.pdf?dl=0

  • Sample Representation of the MC Estimator

    Let us introduce the Dirac delta function $\delta_{\theta_0}(\theta)$ for $\theta_0 \in \mathbb{R}^n$, defined
    for any $f : \mathbb{R}^n \to \mathbb{R}^{n_f}$ as follows:

    $$\int f(\theta)\, \delta_{\theta_0}(\theta)\, d\theta = f(\theta_0)$$

    Note that this implies in particular that, for a set $A$:

    $$\int_A \delta_{\theta_0}(\theta)\, d\theta = \chi_A(\theta_0)$$

    For $\theta^{(i)} \sim \pi$, $i = 1, \ldots, N$, we can introduce the following mixture of
    Dirac delta functions:

    $$\widehat{\pi}_N(\theta) := \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta^{(i)}}(\theta)$$

    which is the empirical measure, and consider, for any $A$:

    $$\widehat{\pi}_N(A) \triangleq \int_A \widehat{\pi}_N(\theta)\, d\theta = \sum_{i=1}^{N} \int_A \frac{1}{N}\, \delta_{\theta^{(i)}}(\theta)\, d\theta = \frac{1}{N} \sum_{i=1}^{N} \chi_A\left(\theta^{(i)}\right) = S_N(A) \quad (\#\text{ of samples in } A \text{ over } N)$$

  • Sample Representation of $\pi$

    The concentration of points $\theta^{(i)}$ in a given region of the space
    represents $\pi$.

    This is in contrast with parametric statistics, where one starts with
    samples and then introduces a distribution with an algebraic
    representation of the underlying population.

    Note that each sample has a weight of $1/N$, but it is also possible to
    consider weighted sample representations of $\pi$.

  • Sample Representation of $\pi$

    [Figure: sample representation of a Gaussian distribution]

    See here for a MatLab implementation (Software/GaussianSampleInterpretation.m):
    https://www.dropbox.com/s/tak5ny5evo3jqeh/GaussianSampleInterpretation.m?dl=0

  • Deterministic Vs. Monte Carlo Integration

  • Sample Representation of the MC Estimator

    Now consider the problem of estimating $\mathbb{E}_\pi(f)$. We simply replace $\pi(\theta)$
    with its sample representation $\widehat{\pi}_N(\theta)$ and obtain:

    $$\mathbb{E}_\pi(f) \simeq \int_\Theta f(\theta)\, \widehat{\pi}_N(\theta)\, d\theta = \sum_{i=1}^{N} \int_\Theta \frac{1}{N}\, f(\theta)\, \delta_{\theta^{(i)}}(\theta)\, d\theta = \frac{1}{N} \sum_{i=1}^{N} f\left(\theta^{(i)}\right)$$

    This is precisely $S_N(f)$, the Monte Carlo estimator suggested earlier.

    Clearly, based on $\widehat{\pi}_N(\theta)$, we can easily estimate $\mathbb{E}_\pi(f)$ for any $f$.

    For example, the variance of $f$ is approximated as:

    $$\operatorname{Var}_\pi(f) = \mathbb{E}_\pi\left(f^2\right) - \mathbb{E}_\pi^2(f) \simeq \frac{1}{N} \sum_{i=1}^{N} f^2\left(\theta^{(i)}\right) - \left(\frac{1}{N} \sum_{i=1}^{N} f\left(\theta^{(i)}\right)\right)^2$$
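
    A small sketch of these plug-in estimates (my own; $\pi$ is taken as a
    standard normal purely for concreteness, for which $\mathbb{E}[\cos\theta] = e^{-1/2}$):

    import numpy as np

    rng = np.random.default_rng(3)
    theta = rng.standard_normal(100_000)    # theta^(i) ~ pi = N(0,1)

    f = np.cos(theta)                       # f evaluated at the samples
    S_N = f.mean()                          # MC estimator of E_pi(f)
    var_f = np.mean(f**2) - S_N**2          # plug-in estimate of Var_pi(f)

    print(S_N, np.exp(-0.5))                # estimate vs. exact E[cos(theta)]
    print(var_f)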

  • From the Algebraic to the Sample Representation

    Similarly, if $\theta = (\theta_1, \theta_2)$, we have:

    $$\widehat{\pi}_N(\theta_1, \theta_2) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\left(\theta_1^{(i)}, \theta_2^{(i)}\right)}(\theta_1, \theta_2)$$

    The marginal distribution is given as:

    $$\widehat{\pi}_N(\theta_1) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_1^{(i)}}(\theta_1)$$

    If we want to estimate $\theta_{MAP} = \arg\max_\theta \pi(\theta)$ and $\pi(\theta)$ is only known up
    to a normalizing constant, then a reasonable estimate is the following:

    $$\widehat{\theta}_{MAP} = \arg\max_{\theta^{(i)}} \pi\left(\theta^{(i)}\right)$$

  • Sampling from an Arbitrary Distribution

    We can now see that if we can sample easily from an arbitrary
    distribution, then we can easily compute any quantities of interest.

    But how do we sample from an arbitrary distribution?

    We will discuss MCMC and other methods in follow-up lectures.

  • Monte Carlo Integration

    We can use the same estimation for performing integration. Consider
    computing the following:

    $$I = \int_D f(x)\, dx$$

    We can write this integration as $I = |D|\, \mathbb{E}\left[f(x)\right]$, where the mean is
    with respect to the uniform distribution in $D$:

    $$\pi(x) = \begin{cases} \frac{1}{|D|}, & \text{if } x \in D \\ 0, & \text{otherwise} \end{cases}$$

    To compute the integral $I$, we thus need to draw samples from this
    distribution. Our estimator will then be as follows:

    $$\widehat{I} = \frac{|D|}{n} \sum_{i=1}^{n} f(x_i)$$

    The estimator converges as $O\left(\frac{1}{\sqrt{N}}\right)$.
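
    For instance (an illustrative sketch; the domain, integrand, and exact
    answer are my example choices), with $D = [0, 2]^2$ and $f(x, y) = xy$, so
    that $I = 4$:

    import numpy as np

    rng = np.random.default_rng(4)

    n = 1_000_000
    x, y = rng.uniform(0.0, 2.0, size=(2, n))  # uniform samples in D = [0,2]^2
    I_hat = 4.0 * np.mean(x * y)               # |D| * (1/n) sum_i f(x_i), |D| = 4
    print(I_hat)                               # ~ 4.0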

  • Monte Carlo Integration

    We can extend this to high ($d$) dimensions:

    $$I = \int_D f(\boldsymbol{x})\, d\boldsymbol{x}$$

    The estimator converges as $O\left(\frac{1}{\sqrt{N}}\right)$ regardless of the dimensionality $d$.

    For a Riemann integration of this, the rate of convergence is better,
    $O\left(\frac{1}{N}\right)$, where a grid of equally spaced points is used (e.g. for
    $D = [0, 1]$, $\Delta x = 1/N$).

    However, in let us say ten dimensions, $D = [0, 1]^{10}$, you will need $O\left(N^{10}\right)$
    grid points to achieve the same rate of convergence as in 1D!

    Generally, the rate of convergence of the Riemann approximation of the
    above integral is $O\left(1/N^{r/d}\right)$, where $N$ is the total number of grid points
    and $r$ depends on the smoothness of the domain $D$.

    J. S. Liu, MC Strategies in Scientific Computing (Chapter 2, Introduction)
    http://books.google.com/books?id=R8E-yHaKCGUC&printsec=frontcover&dq=monte+carlo+strategies+in+scientific+computing&hl=en&ei=afBiTOWwNoT7lwfi1-nMCw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDUQ6AEwAA

  • Monte Carlo for Integration

    [Figure: $\pi(x)$ evaluated on a regular grid (Riemann integration) vs.
    represented by samples (MC integration)]

    In computing $I = \int_D f(x)\, dx$ with $D = [0, 1]$ using Riemann integration, one
    e.g. constructs a grid with $\Delta x = 1/N$, computes $f(x_i) = f(i \Delta x)$, and, with
    approximation error $O\left(\frac{1}{N}\right)$, evaluates:

    $$I_N = \sum_{i=1}^{N} f(x_i)\, \Delta x = \frac{1}{N} \sum_{i=1}^{N} f(x_i)$$

    In $d$ dimensions, we need $N^d$ evaluations to keep an error $O\left(\frac{1}{N}\right)$.

    Clearly the MC estimator is slower in convergence, $O\left(\frac{1}{\sqrt{N}}\right)$, but it
    converges independently of the dimension $d$. It requires no grid of
    points. But you need to be able to sample from $p(x)$.

  • Monte Carlo Integration

    Consider calculating, for any function $f(x)$, the following integral:

    $$\mu(f) = \int_0^1 f(x)\, dx$$

    We can write this integral as:

    $$\mu(f) = \int_{-\infty}^{+\infty} f(x)\, \mathbb{I}_{[0,1]}(x)\, dx = \mathbb{E}\left[f(X)\right]$$

    Here $X \sim \mathcal{U}[0, 1]$.

    The Monte Carlo estimator of the integral is now:

    $$\widehat{\mu}_N(f) = \frac{1}{N} \sum_{i=1}^{N} f(X_i), \qquad X_i \overset{i.i.d.}{\sim} \mathcal{U}[0, 1]$$

  • Monte Carlo Integration

    [Figure: empirical average vs. theoretical mean as a function of the number
    of samples for $f(x) = x$ ($\mu(f) = 0.5$), $f(x) = x^2$ ($\mu(f) = 0.333$), and
    $f(x) = \cos(\pi x)$ ($\mu(f) = 0$)]

    MatLab implementation:
    https://www.dropbox.com/s/kp98e697plbkkia/MonteCarloEx1.m?dl=0

  • Monte Carlo Integration: Variance

    From the earlier discussed properties of the MC estimator:

    $$\mathbb{E}\left[\widehat{\mu}_N\right] = \mu, \qquad \operatorname{Var}\left[\widehat{\mu}_N\right] = \frac{1}{N} \operatorname{Var}\left[f(X)\right] = \frac{1}{N}\, \sigma^2(f), \qquad X \sim \mathcal{U}[0, 1]$$

    where:

    $$\sigma^2(f) = \operatorname{Var}\left[f(X)\right] = \int_{[0,1]} \left(f(x) - \mu(f)\right)^2 dx$$

    When the variance $\sigma^2(f) = \operatorname{Var}\left[f(X)\right]$ is unknown, we can use its MC
    estimator instead:

    $$\widehat{\sigma}_N^2(f) = \frac{1}{N - 1} \sum_{i=1}^{N} \left(f(X_i) - \widehat{\mu}_N(f)\right)^2, \qquad \widehat{\mu}_N(f) = \frac{1}{N} \sum_{j=1}^{N} f(X_j), \qquad X_i \overset{i.i.d.}{\sim} \mathcal{U}[0, 1]$$

  • Monte Carlo Integration: Empirical Variance

    $$\widehat{\sigma}_N^2(f) = \frac{1}{N - 1} \sum_{i=1}^{N} \left(f(X_i) - \widehat{\mu}_N(f)\right)^2, \qquad X_i \overset{i.i.d.}{\sim} \mathcal{U}[0, 1]$$

    [Figure: empirical vs. theoretical variance as a function of the number of
    samples for $f(x) = x$ ($\operatorname{Var} f(X) = 1/12$), $f(x) = x^2$ ($\operatorname{Var} f(X) = 4/45$),
    and $f(x) = \cos(\pi x)$ ($\operatorname{Var} f(X) = 1/2$)]

    MatLab implementation:
    https://www.dropbox.com/s/kp98e697plbkkia/MonteCarloEx1.m?dl=0

  • Optimal Number of MC Samples

    We can compute the optimal number of MC samples for a desired accuracy
    of the estimator:

    $$\Pr\left(\left|\widehat{\mu}_N(f) - \mu(f)\right| \le \varepsilon\right) = 1 - \alpha \;\Rightarrow\; \Pr\left(\frac{\left|\widehat{\mu}_N(f) - \mu(f)\right|}{\sigma/\sqrt{N}} \le \frac{\varepsilon \sqrt{N}}{\sigma}\right) = 1 - \alpha$$

    From the asymptotic properties

    $$\sqrt{N}\left(\widehat{\mu}_N - \mu\right) \xrightarrow[N \to \infty]{D} \mathcal{N}\left(0, \sigma^2(f)\right)$$

    we can estimate:

    $$\frac{\varepsilon \sqrt{N}}{\sigma} = X_\alpha \;\Rightarrow\; N = \frac{X_\alpha^2\, \sigma^2(f)}{\varepsilon^2}$$

    where:

    $$X_\alpha = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) \quad (\text{inverse CDF of } \mathcal{N}(0, 1)), \qquad \alpha \in [0, 1]$$

    Approximating the variance $\sigma^2(f)$ by its MC estimate, the number of
    needed samples $N$ should satisfy:

    $$\widehat{\sigma}_N^2(f) \le \frac{N \varepsilon^2}{X_\alpha^2}$$

  • Optimal Number of MC Samples

    The condition $\widehat{\sigma}_N^2(f) \le \frac{N \varepsilon^2}{X_\alpha^2}$ can be satisfied iteratively:

    1. Start with $n_1$ MC samples $X_1, \ldots, X_{n_1} \sim \mathcal{U}[0, 1]$.

    2. If $\widehat{\sigma}_{n_1}^2(f) \le \frac{n_1 \varepsilon^2}{X_\alpha^2}$, then stop; otherwise

    3. Evaluate $k_1 = \left\lfloor \frac{X_\alpha^2\, \widehat{\sigma}_{n_1}^2(f)}{\varepsilon^2} - n_1 \right\rfloor$ and generate $k_1$ additional samples
       $X_{n_1 + 1}, \ldots, X_{n_1 + k_1} \sim \mathcal{U}[0, 1]$, where $\lfloor x \rfloor$ indicates the integer part of $x$.
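
    A direct transcription of this loop (my sketch; the choices of $f$, $\varepsilon$,
    and $\alpha$ are illustrative):

    import numpy as np
    from statistics import NormalDist

    rng = np.random.default_rng(6)

    def samples_for_accuracy(f, eps=1e-3, alpha=0.05, n1=1_000):
        """Grow the sample until sigma_hat_N^2(f) <= N * eps^2 / X_alpha^2."""
        X_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # inverse CDF of N(0,1)
        X = rng.uniform(0.0, 1.0, n1)
        while f(X).var(ddof=1) > len(X) * eps**2 / X_alpha**2:
            k = int(X_alpha**2 * f(X).var(ddof=1) / eps**2 - len(X))
            X = np.concatenate([X, rng.uniform(0.0, 1.0, max(k, 1))])
        return len(X), f(X).mean()

    print(samples_for_accuracy(lambda x: x**2))   # (N, estimate of 1/3)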

  • Computing Expectations

    Expectations with respect to any distribution $\pi$ also involve high-
    dimensional integration:

    $$I = \int_D f(\boldsymbol{x})\, \pi(\boldsymbol{x})\, d\boldsymbol{x}$$

    If we can draw samples $x_i \sim \pi(\boldsymbol{x})$, then we can use the estimator

    $$\widehat{I} = \frac{1}{N} \sum_{i=1}^{N} f(x_i), \qquad \text{with} \quad \text{Coeff. of Variat.}(\widehat{I}) = \frac{\sqrt{\operatorname{Var}_{\pi(x)} f(x)\,/\,N}}{\mathbb{E}_{\pi(x)} f(x)}$$

  • Bayes Factor Approximation

    For the CMBdata, consider 2 subsamples $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$.

    We consider that both come from normal distributions,
    $x_1, \ldots, x_n \sim \mathcal{N}\left(\mu_x, \sigma^2\right)$ and $y_1, \ldots, y_n \sim \mathcal{N}\left(\mu_y, \sigma^2\right)$.

    We want to decide if both means are the same, i.e. test $H_0 : \mu_x = \mu_y$.

    We assume that the variance (error) $\sigma^2$ is the same for both models
    and use a prior $\pi\left(\mu, \sigma^2\right) \propto 1/\sigma^2$.

    The Bayes factor can then be computed as follows ($\ell_n$ denotes the
    likelihood of the data $\mathcal{D}$):

    $$B_{10}^\pi = \frac{\int \pi\left(\mu_x, \mu_y, \sigma^2\right)\, \ell_n\left(\mu_x, \mu_y, \sigma^2 \mid \mathcal{D}\right)\, d\mu_x\, d\mu_y\, d\sigma^2}{\int \pi\left(\mu, \sigma^2\right)\, \ell_n\left(\mu, \sigma^2 \mid \mathcal{D}\right)\, d\mu\, d\sigma^2}$$

    Note that the Bayes factor does not depend on the normalizing constant
    of the prior on $\left(\mu, \sigma^2\right)$, and thus the use of an improper prior is fine.

    J.-M. Marin and C. P. Robert, Bayesian Core, Chapter 2
    https://www.dropbox.com/s/7w9peyrlpwcdv4m/CMBdata?dl=0
    http://www.springerlink.com/content/x38261/

  • Bayes Factor Approximation

    We introduce a new parametrization by writing:

    $$\mu_x = \mu + \xi, \qquad \mu_y = \mu - \xi,$$

    with the prior $\xi \sim \mathcal{N}(0, 1)$.

    This allows the same prior to be used under both $H_0$ and the
    alternative $H_1 : \mu_x \ne \mu_y$.

    We choose the improper prior $\pi\left(\mu, \sigma^2\right) \propto 1/\sigma^2$ and $\xi \sim \mathcal{N}(0, 1)$.

    The Bayes factor can now be re-written as:

    $$B_{10}^\pi = \frac{\int \left(\sigma^2\right)^{-n/2} e^{-\frac{n\left[(\bar{x} - \mu - \xi)^2 + s_x^2\right]}{2\sigma^2}} \left(\sigma^2\right)^{-n/2} e^{-\frac{n\left[(\bar{y} - \mu + \xi)^2 + s_y^2\right]}{2\sigma^2}}\, \frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\, \sigma^{-2}\, d\mu\, d\xi\, d\sigma^2}{\int \left(\sigma^2\right)^{-n/2} e^{-\frac{n\left[(\bar{x} - \mu)^2 + s_x^2\right]}{2\sigma^2}} \left(\sigma^2\right)^{-n/2} e^{-\frac{n\left[(\bar{y} - \mu)^2 + s_y^2\right]}{2\sigma^2}}\, \sigma^{-2}\, d\mu\, d\sigma^2}$$

  • Bayes Factor Approximation

    To simplify the Bayes factor, let us define:

    $$s_x^2 = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad s_y^2 = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2, \qquad S^2 = s_x^2 + s_y^2$$

    We can then write:

    $$B_{10}^\pi = \frac{\int \left(\sigma^2\right)^{-n-1} e^{-\frac{n}{2\sigma^2}\left[(\bar{x} - \mu - \xi)^2 + (\bar{y} - \mu + \xi)^2 + S^2\right]}\, \frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\, d\mu\, d\xi\, d\sigma^2}{\int \left(\sigma^2\right)^{-n-1} e^{-\frac{n}{2\sigma^2}\left[(\bar{x} - \mu)^2 + (\bar{y} - \mu)^2 + S^2\right]}\, d\mu\, d\sigma^2}$$

    Performing the integrals in $\sigma^2$:

    $$B_{10}^\pi = \frac{\int \left[(\bar{x} - \mu - \xi)^2 + (\bar{y} - \mu + \xi)^2 + S^2\right]^{-n}\, \frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\, d\mu\, d\xi}{\int \left[(\bar{x} - \mu)^2 + (\bar{y} - \mu)^2 + S^2\right]^{-n}\, d\mu}$$

    Let us now integrate in $\mu$. Start with the denominator. Note that:

    $$(\bar{x} - \mu)^2 + (\bar{y} - \mu)^2 = 2\left(\mu - \frac{\bar{x} + \bar{y}}{2}\right)^2 + \frac{(\bar{x} - \bar{y})^2}{2}$$

  • Bayes Factor Approximation

    So the denominator takes the form:

    $$\int \left[2\left(\mu - \frac{\bar{x} + \bar{y}}{2}\right)^2 + \frac{(\bar{x} - \bar{y})^2}{2} + S^2\right]^{-n}\, d\mu$$

    Integrating out $\mu$ (here we use the normalizing constant of the
    t-distribution with $2n - 1$ degrees of freedom, which gives a factor
    common to the denominator and numerator of $B_{10}^\pi$), we can simplify:

    $$\int \left[2\left(\mu - \frac{\bar{x} + \bar{y}}{2}\right)^2 + \frac{(\bar{x} - \bar{y})^2}{2} + S^2\right]^{-n}\, d\mu \;\propto\; \left[\frac{(\bar{x} - \bar{y})^2}{2} + S^2\right]^{-n + 1/2} \;\propto\; \left[(\bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}$$

    http://en.wikipedia.org/wiki/Student's_t-distribution

  • Bayes Factor Approximation

    Thus, for the normal case

    $$x_1, \ldots, x_n \sim \mathcal{N}\left(\mu + \xi, \sigma^2\right), \qquad y_1, \ldots, y_n \sim \mathcal{N}\left(\mu - \xi, \sigma^2\right), \qquad H_0 : \xi = 0,$$

    under the prior $\pi\left(\mu, \sigma^2\right) \propto 1/\sigma^2$ and $\xi \sim \mathcal{N}(0, 1)$:

    $$B_{01}^\pi = \frac{\left[(\bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}}{\int \left[(2\xi + \bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}\, \frac{e^{-\xi^2/2}}{\sqrt{2\pi}}\, d\xi}$$

    For CMBdata, we simulate $\xi_1, \ldots, \xi_{1000} \sim \mathcal{N}(0, 1)$ and approximate $B_{01}^\pi$
    with (for one simulation that we ran)

    $$\widehat{B}_{01}^\pi = \frac{\left[(\bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}}{\frac{1}{1000} \sum_{i=1}^{1000} \left[(2\xi_i + \bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}} = 43.3309$$

    when $\bar{x} = 0.0888$, $\bar{y} = 0.1078$, $S^2 = 0.00875$, $n = 100$. Thus $H_0$ is much
    more likely with the data available.

    https://www.dropbox.com/s/7w9peyrlpwcdv4m/CMBdata?dl=0
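
    A sketch of this approximation (my own Python transcription; the summary
    statistics are the ones quoted above):

    import numpy as np

    rng = np.random.default_rng(7)
    xbar, ybar, S2, n = 0.0888, 0.1078, 0.00875, 100   # CMBdata summaries from the slide

    xi = rng.standard_normal(1000)                     # xi_i ~ N(0,1)
    num = ((xbar - ybar) ** 2 + 2 * S2) ** (-n + 0.5)
    den = np.mean(((2 * xi + xbar - ybar) ** 2 + 2 * S2) ** (-n + 0.5))
    print(num / den)   # the slide reports 43.3309 for one run; run-to-run variability is large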

  • Precision Evaluation in Bayes Factor

    For an estimator $\bar{h}_N = \frac{1}{N} \sum_{j=1}^{N} h(x_j)$ of $I = \int h(\boldsymbol{x})\, \pi(\boldsymbol{x})\, d\boldsymbol{x}$, with
    $x_i \sim \pi(\boldsymbol{x})$, the variance can be computed (in terms of the empirical
    mean) as

    $$v_N = \frac{1}{N}\, \frac{1}{N - 1} \sum_{j=1}^{N} \left(h(x_j) - \bar{h}_N\right)^2$$

    and, for $N$ large,

    $$\frac{\bar{h}_N - \mathbb{E}_f\left[h(X)\right]}{\sqrt{v_N}} \sim \mathcal{N}(0, 1)$$

    We can thus find the variability in the Bayes factor estimation.

    We construct a convergence test and confidence bounds on the
    approximation of $B_{01}^\pi$.

    [Figure: histogram of 1000 realizations of the approximation of $B_{01}^\pi$,
    based on 1000 simulations each]

    MatLab implementation:
    https://www.dropbox.com/s/noh9vipbkl9pfrk/PrecisionEvaluation.m?dl=0

  • Example (Cauchy-Normal)

    For estimating a normal mean, a robust prior is a Cauchy prior:

    $$x \sim \mathcal{N}(\theta, 1), \qquad \theta \sim \mathcal{C}(0, 1)$$

    Under squared error loss, the posterior mean is given as:

    $$\delta^\pi(x) = \frac{\displaystyle\int_{-\infty}^{\infty} \frac{\theta}{1 + \theta^2}\, e^{-(x - \theta)^2/2}\, d\theta}{\displaystyle\int_{-\infty}^{\infty} \frac{1}{1 + \theta^2}\, e^{-(x - \theta)^2/2}\, d\theta}$$

    The form of $\delta^\pi$ suggests simulating i.i.d. variables $\theta_1, \cdots, \theta_m \sim \mathcal{N}(x, 1)$
    and calculating

    $$\widehat{\delta}_m^\pi(x) = \frac{\sum_{i=1}^{m} \frac{\theta_i}{1 + \theta_i^2}}{\sum_{i=1}^{m} \frac{1}{1 + \theta_i^2}}$$

    The LLN implies $\widehat{\delta}_m^\pi(x) \to \delta^\pi(x)$ as $m \to \infty$.

    [Figure: convergence of $\widehat{\delta}_m^\pi(x)$ as a function of $m$]

    See here for a MatLab implementation:
    https://www.dropbox.com/s/xxfqlloltlqchek/CauchyPrior.m?dl=0
    https://en.wikipedia.org/wiki/Cauchy_distribution
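
    A compact Python version of this estimator (my sketch of the linked
    MatLab code):

    import numpy as np

    rng = np.random.default_rng(8)

    def delta_m(x, m=100_000):
        """MC estimate of the posterior mean: theta_i ~ N(x,1), Cauchy-prior weights."""
        theta = rng.normal(x, 1.0, m)
        w = 1.0 / (1.0 + theta**2)          # Cauchy prior density up to a constant
        return np.sum(w * theta) / np.sum(w)

    print(delta_m(10.0))   # shrinks x = 10 slightly toward the prior median 0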