
Bayesian Estimation of Structural Equation Models with R

A User Manual

J. Büschken
Catholic University of Eichstätt-Ingolstadt

G. Allenby
The Ohio State University

Working Paper
Please do not cite without the authors' permission.
Draft as of 2009-07-07

Electronic copy available at: http://ssrn.com/abstract=1433709

Joachim Büschken, Catholic University of Eichstätt-Ingolstadt, Ingolstadt School of Management, Marketing Department, Auf der Schanz 49, D-85049 Ingolstadt, Germany, phone: +49 841 937 1976, fax: +49 841 937 2976, email: [email protected]. Greg M. Allenby, The Ohio State University, Fisher College of Business, 540A Fisher Hall, Helen C. Kurtz Chair in Marketing, 2100 Neil Avenue, OH-43210 Columbus, USA, phone: +1 614 292 9452, fax: +49 841 937 2976, email: [email protected].


How to estimate structural equation models with R?

In the social sciences it is often useful to introduce latent variables and use structural equation modeling to quantify relations among observable and latent variables. This paper presents a manual describing how to estimate structural equation models in a Bayesian approach with R. Parameter estimation follows a Gibbs sampling procedure, generating draws from the full conditionals of the unknown parameters. The manual is divided into two main parts. The first part presents an introduction to the estimation of structural equation models with R. The second part describes a method for simulating data from a structural equation model, and the appendix contains the derivation of the full conditional distributions.

1 Estimation of SEMs

To illustrate the Bayesian estimation of SEMs with R, we present an application in the context of a simple SEM. The estimation procedure covers three parts: first, the model has to be specified; second, the data have to be attached to the model; and finally these values have to be passed to the estimation function.

Specifying the model and attaching data

In order to enable the user to become familiar with the notation and to transfer their model specifications to the model framework used in this paper, this subsection gives a short overview of the model framework.¹ An example of the specification of a simple SEM illustrates how the specification procedure is done. This is followed by showing how to attach data to the model.

¹ For more details on the model framework, see the appendix.


A SEM is composed of a measurement equation (1) and a structural equation (2):

y_i = \Lambda \omega_i + \epsilon_i    (1)

\eta_i = \Pi \eta_i + \Gamma \xi_i + \delta_i    (2)

\eta_i = (\Pi \;\; \Gamma)\, \omega_i + \delta_i = M \omega_i + \delta_i

where i ∈ {1,...,n}.

Observations of reflective measures y_i are assumed to be generated by underlying latent variables ω_i, possibly with measurement error ε_i. The measurement equation is defined by a confirmatory factor analysis model, where Λ is the associated (p × q) loading matrix. The structural equation specifies relationships among the latent variables, where ω_i can be divided into η_i, an endogenous (q1 × 1) vector of latent variables, and ξ_i, an exogenous (q2 × 1) vector of latent variables. Let q = q1 + q2. M = (Π Γ) is the unknown (q1 × q) matrix of regression coefficients that represent the proposed causal effects among η and ξ, and δ (q1 × 1) is a random vector of residuals. It is assumed that the measurement errors ε are uncorrelated with η and ξ, that the residuals δ are uncorrelated with ξ, and that the variables are distributed as follows:

\epsilon_i \sim N(0, \Psi_\epsilon)    (3)

\xi_i \sim N(0, \Phi)    (4)

\delta_i \sim N(0, \Psi_\delta)    (5)

for i ∈ {1,...,n}, where Ψ_ε and Ψ_δ are diagonal matrices. This model is not identified, but it can be identified by restricting appropriate elements in Λ and/or M to fixed known values (0 or 1). This is done with the help of the following Pick matrices:

ΛPick (p × q): matrix containing the fixed known elements (0 or 1) of Λ
MPick (q1 × q): matrix containing the fixed known elements (0 or 1) of M

For example, if Λ is a (2 × 2) matrix and you want to fix element [1, 1] to 1 and element


[2, 2] to 0, ΛPick is:

\Lambda\text{Pick} = \begin{pmatrix} 1 & 4 \\ 4 & 0 \end{pmatrix}

The non-fixed elements of ΛPick can be set to any value except 0 and 1. Non-fixed elements represent starting values for the MCMC chain (in this case: 4).
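In R, such a Pick matrix can be set up directly with matrix(). A minimal sketch for the (2 × 2) example above; the object name LambdaPick is ours:

# LambdaPick for the (2 x 2) example: element [1,1] fixed to 1, element [2,2]
# fixed to 0, and the two free loadings given the starting value 4.
LambdaPick <- matrix(c(1, 4,
                       4, 0), nrow = 2, byrow = TRUE)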

In order to enable Gibbs sampling from the posterior distributions, we set natural conjugate prior distributions for the unknown parameters. Let ψ_εk be the kth diagonal element of Ψ_ε, ψ_δl the lth diagonal element of Ψ_δ, Λ_k^T the kth row of Λ and M_l^T the lth row of M; we get:

\psi_{\epsilon k}^{-1} \sim \text{Gamma}(\alpha_{0k}, \beta_{0k})    (6)

\Lambda_k \mid \psi_{\epsilon k} \sim N(\Lambda_{0k}, \psi_{\epsilon k} H_{0k})    (7)

\psi_{\delta l}^{-1} \sim \text{Gamma}(\alpha_{0l}, \beta_{0l})    (8)

M_l \mid \psi_{\delta l} \sim N(M_{0l}, \psi_{\delta l} H_{0Ml})    (9)

\Phi \sim IW[v_0, V_0]    (10)

with k ∈ {1,...,p} and l ∈ {1,...,q1}.

It follows that the following parameters have to be specified:

α_0k: shape parameter of the prior distribution of ψ_εk^{-1}
β_0k: inverse scale parameter of the prior distribution of ψ_εk^{-1}
α_0l: shape parameter of the prior distribution of ψ_δl^{-1}
β_0l: inverse scale parameter of the prior distribution of ψ_δl^{-1}
v_0, V_0: parameters of the prior distribution of Φ

Note that we assume that these values are the same for all k ∈ {1,...,p} and l ∈ {1,...,q1}.

Prior parameters of the distributions of the regression matrices are set as follows:

Λ_0k: the prior mean of Λ_k is assumed to be zero
H_0k: the variance-covariance matrix of the prior distribution of Λ_k


is assumed to be a diagonal matrix with 0.01 on the diagonal
M_0l: the prior mean of M_l is assumed to be zero
H_0Ml: the variance-covariance matrix of the prior distribution of M_l is assumed to be a diagonal matrix with 0.01 on the diagonal

Furthermore it is necessary to set starting values for the unknown parameters:

Ψ_ε (p × p): diagonal variance-covariance matrix of the measurement errors
Ψ_δ (q1 × q1): diagonal variance-covariance matrix of the structural residuals
Φ (q2 × q2): variance-covariance matrix of the latent exogenous variables

and to determine the number of iterations of the MCMC chain, R.

Starting values for the regression coefficients in the matrices Λ and M have already been set in the corresponding Pick matrices.
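In R, these settings amount to a handful of scalars and matrices. The following is a minimal sketch for the example model developed below; all object names and the concrete values of v0, V0 and R are our own illustrative choices, not prescribed by the paper:

# Dimensions of the example SEM: p = 8 indicators, q1 = 3 endogenous and
# q2 = 1 exogenous latent variables.
p <- 8; q1 <- 3; q2 <- 1; q <- q1 + q2
alpha0k <- 100; beta0k <- 1          # Gamma prior for 1/psi_eps_k (illustrative)
alpha0l <- 100; beta0l <- 1          # Gamma prior for 1/psi_del_l (illustrative)
Lambda0 <- matrix(0, p, q)           # prior means of the rows of Lambda (zero)
H0k     <- diag(0.01, q)             # prior covariance of Lambda_k
M0      <- matrix(0, q1, q)          # prior means of the rows of M (zero)
H0Ml    <- diag(0.01, q)             # prior covariance of M_l
v0 <- q2 + 2; V0 <- diag(q2)         # inverse-Wishart prior for Phi (illustrative)
Psi_eps <- diag(p)                   # starting value, diagonal
Psi_del <- diag(q1)                  # starting value, diagonal
Phi     <- diag(q2)                  # starting value
R <- 10000                           # number of MCMC iterations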


We use the following simple structural equation model with a single exogenous variable and three endogenous variables to exemplify our approach:

[Figure 1: SEM example. Path diagram: γ1: ξ1 → η1; γ2: ξ1 → η2; π1: η1 → η3; π2: η2 → η3.]

\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \pi_1 & \pi_2 & 0 \end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \end{pmatrix}
+ \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ 0 \end{pmatrix} \xi_{1i}
+ \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \delta_{3i} \end{pmatrix}    (11)

η_{1i}, η_{2i} and η_{3i} are the endogenous variables in this model; ξ_{1i} is an exogenous variable. In matrix notation and using the usual notation for vectors of latent variables in the SEM literature, this structural model can be written as η_i = Πη_i + Γξ_{1i} + δ_i for observations i = 1,...,n.

By combining the two matrices Π and Γ this equation becomes:

\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 & \gamma_1 \\ 0 & 0 & 0 & \gamma_2 \\ \pi_1 & \pi_2 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \\ \xi_{1i} \end{pmatrix}
+ \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \delta_{3i} \end{pmatrix}    (12)

In matrix notation we write this as η_i = Mω_i + δ_i. In our example, we assume that each


latent variable is measured by two reflective measurement indicators, as shown by the following measurement equation:

y_i = \begin{pmatrix}
\lambda_{11} & 0 & 0 & 0 \\
\lambda_{21} & 0 & 0 & 0 \\
0 & \lambda_{32} & 0 & 0 \\
0 & \lambda_{42} & 0 & 0 \\
0 & 0 & \lambda_{53} & 0 \\
0 & 0 & \lambda_{63} & 0 \\
0 & 0 & 0 & \lambda_{74} \\
0 & 0 & 0 & \lambda_{84}
\end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \\ \xi_{1i} \end{pmatrix}
+ \begin{pmatrix} \epsilon_{1i} \\ \epsilon_{2i} \\ \epsilon_{3i} \\ \epsilon_{4i} \\ \epsilon_{5i} \\ \epsilon_{6i} \\ \epsilon_{7i} \\ \epsilon_{8i} \end{pmatrix}    (13)

which is the same as y_i = Λω_i + ε_i, where ω_i comprises the values of all latent variables for observation i and y_i comprises the vector of observed measurement indicators for i. Since this model is not identified, we have to fix elements in Λ to 1. Thus we get:

y_i = \begin{pmatrix}
\lambda_{11} & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & \lambda_{32} & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \lambda_{53} & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & \lambda_{74} \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \\ \xi_{1i} \end{pmatrix}
+ \begin{pmatrix} \epsilon_{1i} \\ \epsilon_{2i} \\ \epsilon_{3i} \\ \epsilon_{4i} \\ \epsilon_{5i} \\ \epsilon_{6i} \\ \epsilon_{7i} \\ \epsilon_{8i} \end{pmatrix}    (14)


The matrices Λ and M contain fixed known elements, either 0 or 1. On this basis we can determine the corresponding Pick matrices:

\Lambda\text{Pick} = \begin{pmatrix} \ast & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & \ast & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \ast & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & \ast \\ 0 & 0 & 0 & 1 \end{pmatrix}    (15)

M\text{Pick} = \begin{pmatrix} 0 & 0 & 0 & \ast \\ 0 & 0 & 0 & \ast \\ \ast & \ast & 0 & 0 \end{pmatrix}    (16)

∗ stands for the unknown elements of the matrices. For the MCMC chain we have to set starting values for these elements. In this case we set all unknown parameters to 4. The resulting Pick matrices are:

\Lambda\text{Pick} = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (17)


M\text{Pick} = \begin{pmatrix} 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 4 \\ 4 & 4 & 0 & 0 \end{pmatrix}    (18)
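In R, the two example Pick matrices (17) and (18) can be written down directly; a minimal sketch (object names are ours):

# Free loadings started at 4, fixed elements set to 1 or 0, as in (17).
LambdaPick <- matrix(c(4, 0, 0, 0,
                       1, 0, 0, 0,
                       0, 4, 0, 0,
                       0, 1, 0, 0,
                       0, 0, 4, 0,
                       0, 0, 1, 0,
                       0, 0, 0, 4,
                       0, 0, 0, 1), nrow = 8, byrow = TRUE)
# Free structural coefficients started at 4, as in (18).
MPick <- matrix(c(0, 0, 0, 4,
                  0, 0, 0, 4,
                  4, 4, 0, 0), nrow = 3, byrow = TRUE)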

The second step is attaching the data to the model. While the input of text-based data is possible, R also supports several common data formats. For this manual, we present how to attach a .txt file. The data have to be arranged as an (n × p) matrix and saved as a .txt file. You can read the data from this file using the read.table function, creating a data frame from it:

Data = read.table(file = "C:/your folder/data.txt", header = TRUE, sep = "\t", dec = ",")

Here sep = "\t" indicates tab-separated columns and dec = "," declares the comma as the decimal separator.

Passing data to the estimation function

Having set all necessary parameters and having attached the data, these objects can be passed to a function called semest, which draws the parameters of the model from their full conditionals and thus yields estimates for the unknown parameters. In order to pass data and parameter values to the function semest, you have to arrange the objects in the following order:

L = (Data, α_0k, β_0k, α_0l, β_0l, ΛPick, MPick, Λ_0, M_0, Ψ_ε, Ψ_δ, Φ, v_0, V_0, R),

where Λ_0 and M_0 are the corresponding prior mean matrices, comprising all rows Λ_0k and M_0l respectively. Now these values have to be passed to the function semest:

semest(L)

This function yields all draws of the posterior distributions of the unknown parameters as well as the estimated values of the latent variables.²

² For more details on the derivation of the posterior distributions, see the appendix.
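In R, L is naturally an R list; a minimal sketch of assembling it and calling the paper's semest function (the element order follows the list above, which is our reconstruction; semest itself is supplied with the paper):

# Collect data, priors, Pick matrices, starting values and chain length.
L <- list(Data, alpha0k, beta0k, alpha0l, beta0l,
          LambdaPick, MPick, Lambda0, M0,
          Psi_eps, Psi_del, Phi, v0, V0, R)
out <- semest(L)   # posterior draws and estimated latent variables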

2 Data Simulation

In order to check whether the algorithm recovers the true parameter values, you can test the Gibbs sampler by simulating data and subsequently estimating the corresponding


parameter values. First you have to determine a structural equation model; thus the following parameters have to be specified:

Λ (p × q): matrix of regression coefficients of the measurement model
M (q1 × q): matrix of regression coefficients of the structural model
Ψ_ε (p × p): diagonal variance-covariance matrix of the measurement errors
Ψ_δ (q1 × q1): diagonal variance-covariance matrix of the structural residuals
Φ (q2 × q2): variance-covariance matrix of the latent exogenous variables
n: number of observations

Then you can pass those values to the function sim, yielding the simulated observations and latent variables:

sim(Λ, M, Ψ_ε, Ψ_δ, Φ, n)
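The following sketch shows what a simulator of this kind has to do; it is our own illustration of the data-generating process, and the paper's sim function may differ in its details:

# Simulate n observations from the SEM: draw xi and delta, solve the
# structural equation for eta, then generate indicators y = Lambda omega + eps.
sim_sketch <- function(Lambda, M, Psi_eps, Psi_del, Phi, n) {
  p  <- nrow(Lambda); q <- ncol(Lambda)
  q1 <- nrow(M); q2 <- q - q1
  Pi    <- M[, 1:q1, drop = FALSE]           # effects among the eta's
  Gamma <- M[, (q1 + 1):q, drop = FALSE]     # effects of the xi's
  Pi0inv <- solve(diag(q1) - Pi)             # eta = Pi0^{-1} (Gamma xi + delta)
  xi    <- matrix(rnorm(n * q2), n, q2) %*% chol(Phi)
  delta <- matrix(rnorm(n * q1), n, q1) %*% chol(Psi_del)
  eta   <- (xi %*% t(Gamma) + delta) %*% t(Pi0inv)
  Omega <- cbind(eta, xi)
  Y <- Omega %*% t(Lambda) + matrix(rnorm(n * p), n, p) %*% chol(Psi_eps)
  list(Y = Y, Omega = Omega)
}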


Appendix: Bayesian Estimation of Standard SEMs

This section develops a Gibbs sampler to estimate structural equation models (SEMs) with reflective measurement indicators. We illustrate the Bayesian estimation by considering a standard SEM that is equivalent to the most commonly used LISREL model.

A Model Framework

A SEM is composed of a measurement equation (19) and a structural equation (20):

y_i = \Lambda \omega_i + \epsilon_i    (19)

\eta_i = \Pi \eta_i + \Gamma \xi_i + \delta_i    (20)

\eta_i = (\Pi \;\; \Gamma)\, \omega_i + \delta_i = M \omega_i + \delta_i    (21)

where i ∈ {1,...,n}.

Observations of reflective measures y_i are assumed to be generated by underlying latent variables ω_i, possibly with measurement error ε_i. The corresponding matrices including all observations are Y (n × p), Ω (n × q) and E (n × p). The measurement equation is defined by a confirmatory factor analysis model, where Λ (p × q) is the associated loading matrix. The structural equation specifies relationships among the identified latent variables, where ω can be divided into η (q1 × 1), an endogenous random vector of latent variables, and ξ (q2 × 1), an exogenous random vector of latent variables. M (q1 × q) is the unknown matrix of regression coefficients that represent the causal effects among η and ξ, and δ (q1 × 1) is a random vector of residuals. It is assumed that measurement errors are uncorrelated with η and ξ, residuals are uncorrelated with ξ, and the variables are distributed as follows:

\epsilon_i \sim N(0, \Psi_\epsilon)    (22)


\xi_i \sim N(0, \Phi)    (23)

\delta_i \sim N(0, \Psi_\delta)    (24)

for i ∈ {1,...,n}, where Ψ_ε and Ψ_δ are diagonal matrices. The covariance matrix of ω is derived on the basis of the SEM. Writing Π_0 = I − Π, so that η = Π_0^{-1}(Γξ + δ), we have:

\Sigma_\omega = \begin{pmatrix} E(\eta\eta^T) & E(\eta\xi^T) \\ E(\xi\eta^T) & E(\xi\xi^T) \end{pmatrix}    (25)

= \begin{pmatrix} \Pi_0^{-1}\left(\Gamma\Phi\Gamma^T + \Psi_\delta\right)\left(\Pi_0^{-1}\right)^T & \Pi_0^{-1}\Gamma\Phi \\ \Phi\Gamma^T\left(\Pi_0^{-1}\right)^T & \Phi \end{pmatrix}    (26)

since

\eta\eta^T = \Pi_0^{-1}\left(\Gamma\xi + \delta\right)\left(\Gamma\xi + \delta\right)^T\left(\Pi_0^{-1}\right)^T

E(\eta\eta^T) = \Pi_0^{-1}\left(\Gamma E(\xi\xi^T)\Gamma^T + E(\delta\delta^T)\right)\left(\Pi_0^{-1}\right)^T = \Pi_0^{-1}\left(\Gamma\Phi\Gamma^T + \Psi_\delta\right)\left(\Pi_0^{-1}\right)^T

\eta\xi^T = \Pi_0^{-1}\left(\Gamma\xi + \delta\right)\xi^T

E(\eta\xi^T) = \Pi_0^{-1}\Gamma\Phi    (27)
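In R, Σ_ω in equation (26) is a few lines of matrix algebra; a minimal sketch (the function name is ours):

# Build Sigma_omega from Pi, Gamma, Phi and Psi_delta as in equation (26).
Sigma_omega <- function(Pi, Gamma, Phi, Psi_del) {
  Pi0inv <- solve(diag(nrow(Pi)) - Pi)
  See <- Pi0inv %*% (Gamma %*% Phi %*% t(Gamma) + Psi_del) %*% t(Pi0inv)
  Sex <- Pi0inv %*% Gamma %*% Phi
  rbind(cbind(See, Sex), cbind(t(Sex), Phi))
}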

This model is not identified, but it can be identified by restricting appropriate elements in Λ and/or M to fixed known values (0 or 1).

B Prior Distributions

In order to enable Gibbs sampling from the posterior distributions, we set natural conjugate prior distributions for the unknown parameters. Let ψ_εk be the kth diagonal element of Ψ_ε, ψ_δl the lth diagonal element of Ψ_δ, Λ_k^T the kth row of Λ and M_l^T the lth row of M; we get:

\psi_{\epsilon k}^{-1} \sim \text{Gamma}(\alpha_{0k}, \beta_{0k})    (28)

\Lambda_k \mid \psi_{\epsilon k} \sim N(\Lambda_{0k}, \psi_{\epsilon k} H_{0k})    (29)


\psi_{\delta l}^{-1} \sim \text{Gamma}(\alpha_{0l}, \beta_{0l})    (30)

M_l \mid \psi_{\delta l} \sim N(M_{0l}, \psi_{\delta l} H_{0Ml})    (31)

\Phi \sim IW[v_0, V_0]    (32)

with k ∈ {1,...,p} and l ∈ {1,...,q1}.

C Derivations of the Conditional Distributions

According to Bayes' theorem, the joint posterior of all unknown parameters is proportional to the likelihood times the prior:

p(\Omega, \Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi \mid Y) \propto p(Y \mid \Omega, \Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi)\; p(\Omega, \Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi)    (33)

Given Y and Ω, Λ and Ψ_ε are independent of Φ. Once we have obtained draws of Ω, we can treat the estimation of Λ and Ψ_ε as a simple regression model. Thus we can sample from the posterior distribution of Λ and Ψ_ε without having to refer to Φ. The same holds for inference with regard to M, Ψ_δ and Φ, which are independent of Y given Ω. This suggests:

p(\Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi \mid Y, \Omega) \propto \left[p(Y \mid \Omega, \Lambda, \Psi_\epsilon)\, p(\Lambda, \Psi_\epsilon)\right]\left[p(\Omega \mid M, \Psi_\delta, \Phi)\, p(M, \Psi_\delta, \Phi)\right]    (34)

and we can treat the conditional posterior distributions of Λ, Ψ_ε and of M, Ψ_δ, Φ separately. Ω in the above expression refers to the n observations of the values of the latent variables ω_i, which are conditionally independent given Σ_ω. The parameters of Σ_ω can be understood as the parameters of the distribution of heterogeneity of the latent variables.


C.1 Obtaining draws of the latent variables (Ω)

We can obtain draws of Ω through the posterior of ω_i which, according to Bayes' theorem, is given by:

p(\Omega \mid Y, \Lambda, \Psi_\epsilon, \Sigma_\omega) \propto \prod_{i=1}^{n} p(y_i \mid \omega_i, \Lambda, \Psi_\epsilon)\; p(\omega_i \mid \Sigma_\omega)    (35)

Given our assumption that the y_i are distributed N(Λω_i, Ψ_ε) and the ω_i are distributed N(0, Σ_ω), we see that the posterior involves the kernel of two normals, whose quadratic forms in the exponent can easily be combined. Because of the IID assumption, we treat the inference for the ω_i separately for each observation i. The exponent of the resulting distribution has the following expression:

(\omega_i - 0)^T \Sigma_\omega^{-1} (\omega_i - 0) + (y_i - \Lambda\omega_i)^T \Psi_\epsilon^{-1} (y_i - \Lambda\omega_i)

= \omega_i^T \Sigma_\omega^{-1}\omega_i + y_i^T \Psi_\epsilon^{-1} y_i - 2\,\omega_i^T \Lambda^T \Psi_\epsilon^{-1} y_i + \omega_i^T \Lambda^T \Psi_\epsilon^{-1} \Lambda\, \omega_i

= \omega_i^T \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)\omega_i - 2\,\omega_i^T \Lambda^T\Psi_\epsilon^{-1} y_i + y_i^T\Psi_\epsilon^{-1} y_i

= \left[\omega_i - \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\Lambda^T\Psi_\epsilon^{-1} y_i\right]^T \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right) \left[\omega_i - \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\Lambda^T\Psi_\epsilon^{-1} y_i\right]
\;-\; \left(\Lambda^T\Psi_\epsilon^{-1} y_i\right)^T \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1} \left(\Lambda^T\Psi_\epsilon^{-1} y_i\right) + y_i^T\Psi_\epsilon^{-1} y_i    (36)

where the last two terms are constants with respect to ω_i. As a result, the conditional posterior distribution of ω_i is:

\omega_i \mid y_i, \Lambda, \Psi_\epsilon, \Sigma_\omega \sim N\left[\left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\Lambda^T\Psi_\epsilon^{-1} y_i,\; \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\right]    (37)

To obtain Ω, we simply cycle in this manner through the i loop. We can then treat Ω as data in subsequent steps of the Gibbs sampler.
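A minimal sketch of this step in R (the function name is ours): each row of Y yields one draw of ω_i from the normal distribution in (37).

# One Gibbs step for Omega: draw omega_i ~ N(B y_i, V) for every observation,
# with V = (Sigma_om^{-1} + Lambda' Psi_eps^{-1} Lambda)^{-1} and
# B = V Lambda' Psi_eps^{-1}, as in equation (37).
draw_Omega <- function(Y, Lambda, Psi_eps, Sigma_om) {
  Psi_inv <- solve(Psi_eps)
  V <- solve(solve(Sigma_om) + t(Lambda) %*% Psi_inv %*% Lambda)
  B <- V %*% t(Lambda) %*% Psi_inv
  U <- chol(V)                                   # t(U) %*% U == V
  t(apply(Y, 1, function(y) drop(B %*% y + t(U) %*% rnorm(ncol(V)))))
}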


C.2 Obtaining draws of Φ

The first step in developing the conditional distribution for Φ is to recognize that only those elements of ω which refer to ξ are relevant for Φ. Since ω_i = (η_i^T, ξ_i^T)^T, we can simply separate the draws of ξ from Ω, collected in the (n × q2) matrix Ξ, and use them for inference regarding Φ. Φ refers to the variance of the vector of exogenous variables only. The likelihood of observing Ξ is:

p(\Xi \mid \Phi) \propto \prod_{i=1}^{n} |\Phi|^{-0.5} \exp\left(-\frac{1}{2}\,\xi_i^T \Phi^{-1} \xi_i\right) = |\Phi|^{-n/2}\, \mathrm{etr}\left(-\frac{1}{2}\,\Xi^T \Xi\, \Phi^{-1}\right)    (38)

Combining equation (38) with the prior distribution of Φ in equation (32) yields:

[\Phi \mid \Xi] \sim IW\left[v_0 + n,\; V_0 + \Xi^T \Xi\right]    (39)
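In R this draw can use stats::rWishart together with the Wishart/inverse-Wishart duality (if W ~ Wishart(v, V^{-1}) then W^{-1} ~ IW(v, V)); a minimal sketch, function name ours:

# One Gibbs step for Phi, as in equation (39).
draw_Phi <- function(Xi, v0, V0) {
  v <- v0 + nrow(Xi)
  V <- V0 + crossprod(Xi)                       # V0 + t(Xi) %*% Xi
  W <- rWishart(1, df = v, Sigma = solve(V))[, , 1]
  solve(W)
}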

C.3 Obtaining draws of Λ and Ψ_ε

Given Ω, obtaining draws of Λ and Ψ_ε becomes a regression problem. We assume that Ψ_ε is a diagonal matrix, i.e. the measurement errors are uncorrelated. The likelihood of observing the data is given by:

p(Y \mid \Omega, \Lambda, \Psi_\epsilon) \propto |\Psi_\epsilon|^{-n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} (y_i - \Lambda\omega_i)^T \Psi_\epsilon^{-1}(y_i - \Lambda\omega_i)\right]    (40)

in which y_i and ω_i are column vectors. Because Ψ_ε is diagonal, we can write this as:

p(Y \mid \Omega, \Lambda, \Psi_\epsilon) \propto |\Psi_\epsilon|^{-n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} \sum_{k=1}^{p} \psi_{\epsilon k}^{-1}\left(y_{ik} - \Lambda_k^T\omega_i\right)^2\right]    (41)

We can change the order of summation and move ψ_εk^{-1} out of the summation over i. The kernel of this distribution can then be written as:

-\frac{1}{2}\sum_{k=1}^{p} \psi_{\epsilon k}^{-1} \sum_{i=1}^{n} \left(y_{ik} - \Lambda_k^T\omega_i\right)^2    (42)


The summation over i is:

\sum_{i=1}^{n} \left(y_{ik} - \Lambda_k^T\omega_i\right)^2 = \sum_{i=1}^{n} \left[y_{ik}^2 - 2\, y_{ik}\,\Lambda_k^T\omega_i + \mathrm{tr}\left(\Lambda_k^T\omega_i\,\omega_i^T\Lambda_k\right)\right]

= \sum_{i=1}^{n} y_{ik}^2 - 2\,\Lambda_k^T \sum_{i=1}^{n} y_{ik}\,\omega_i + \Lambda_k^T\Omega^T\Omega\,\Lambda_k = Y_k^T Y_k - 2\,\Lambda_k^T\Omega^T Y_k + \Lambda_k^T\Omega^T\Omega\,\Lambda_k

= \left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega \left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right) + Y_k^T Y_k - Y_k^T\Omega(\Omega^T\Omega)^{-1}\Omega^T Y_k    (43)

where the last two terms do not depend on Λ_k. In the above expression, Y_k refers to the column vector of all observations with regard to the kth measurement variable. This yields the following likelihood:

p\left(Y \mid \Omega, \Lambda, \psi_{\epsilon k}^{-1}\right) \propto |\Psi_\epsilon|^{-n/2} \prod_{k=1}^{p} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)\right]

= \prod_{k=1}^{p} \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)\right]    (44)

Notice that the determinant of Ψ_ε involves only the product of its diagonal elements; we can therefore move these elements into the exponential expression. The above expression for the likelihood implies:

independence of the draws of Λ_k, ψ_εk^{-1} from Λ_h, ψ_εh^{-1} for all h ≠ k

conditional independence of p(ψ_εk^{-1} | Y, Ω) and p(Λ_k | Y, Ω, ψ_εk^{-1})

Also notice that the p distributions for Λ_k | ψ_εk^{-1} and ψ_εk^{-1} are independent across k. This implies that we can draw the Λ_k and ψ_εk^{-1} independently. Thus the likelihood of observing Y_k is given by:

p\left(Y_k \mid \Lambda_k, \psi_{\epsilon k}^{-1}, \Omega\right) \propto \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)\right]    (45)

As mentioned above, this model is not identified. We can handle this problem by fixing some of the parameters in Λ; see Lee (2007) for the following section. We suggest fixing


some elements in Λ to 1 and/or 0. Consider Λ_k^T, the kth row of Λ, with certain fixed parameters. Let c_k be the corresponding (1 × q) row vector such that c_kj = 0 if λ_kj is a fixed parameter and c_kj = 1 if λ_kj is an unknown parameter, for k = 1,...,p and j = 1,...,q, and let r_k = c_k1 + ... + c_kq. Moreover, let Λ*_k^T be the (1 × r_k) row vector that contains the unknown parameters in Λ_k, and let Ω*_k be the (n × r_k) submatrix of Ω in which all the columns corresponding to c_kj = 0 are deleted. Let Y*_k^T = (y*_1k,...,y*_nk) with

y^*_{ik} = y_{ik} - \sum_{j=1}^{q} \lambda_{kj}\,\omega_{ij}\,(1 - c_{kj})    (46)

This yields the following likelihood of observing Y*_k:

p\left(Y^*_k \mid \Lambda^*_k, \psi_{\epsilon k}^{-1}, \Omega\right) \propto \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - \hat\Lambda^*_k\right)^T \Omega^{*T}_k\Omega^*_k\left(\Lambda^*_k - \hat\Lambda^*_k\right)\right], \quad \hat\Lambda^*_k = (\Omega^{*T}_k\Omega^*_k)^{-1}\Omega^{*T}_k Y^*_k    (47)

which can also be written as:

Y^*_k \mid \Lambda^*_k, \psi_{\epsilon k}^{-1}, \Omega \sim N\left[(\Omega^{*T}_k\Omega^*_k)^{-1}\Omega^{*T}_k Y^*_k,\; \psi_{\epsilon k}\,(\Omega^{*T}_k\Omega^*_k)^{-1}\right]    (48)

The conjugate prior distribution defined in equation (29) for the loading matrix is:

\left[\Lambda^*_k \mid \psi_{\epsilon k}\right] \sim N\left(\Lambda^*_{0k},\; \psi_{\epsilon k} H^*_{0k}\right)    (49)


To derive the posterior for Λ*_k and ψ_εk^{-1}, we multiply equation (48) with (49) and (28):

p\left(\Lambda^*_k, \psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega\right) \propto \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - \hat\Lambda^*_k\right)^T \Omega^{*T}_k\Omega^*_k\left(\Lambda^*_k - \hat\Lambda^*_k\right)\right]
\times \psi_{\epsilon k}^{-r_k/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - \Lambda^*_{0k}\right)^T \left(H^*_{0k}\right)^{-1}\left(\Lambda^*_k - \Lambda^*_{0k}\right)\right]
\times \left(\psi_{\epsilon k}^{-1}\right)^{\alpha_{0k}-1} \exp\left(-\beta_{0k}\,\psi_{\epsilon k}^{-1}\right)    (50)

with \hat\Lambda^*_k = (\Omega^{*T}_k\Omega^*_k)^{-1}\Omega^{*T}_k Y^*_k as in (47). Combining the two quadratic forms yields:

p\left(\Lambda^*_k, \psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega\right) \propto \psi_{\epsilon k}^{-n/2}\,\psi_{\epsilon k}^{-r_k/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left[\left(\Lambda^*_k - c\right)^T C \left(\Lambda^*_k - c\right) + d\right]\right] \left(\psi_{\epsilon k}^{-1}\right)^{\alpha_{0k}-1} \exp\left(-\beta_{0k}\,\psi_{\epsilon k}^{-1}\right)    (51)

with

C = \Omega^{*T}_k\Omega^*_k + \left(H^*_{0k}\right)^{-1}

c = \left(\Omega^{*T}_k\Omega^*_k + \left(H^*_{0k}\right)^{-1}\right)^{-1}\left(\Omega^{*T}_k\Omega^*_k\,\hat\Lambda^*_k + \left(H^*_{0k}\right)^{-1}\Lambda^*_{0k}\right)

d = \hat\Lambda^{*T}_k\,\Omega^{*T}_k\Omega^*_k\,\hat\Lambda^*_k + \Lambda^{*T}_{0k}\left(H^*_{0k}\right)^{-1}\Lambda^*_{0k} - c^T\left(\Omega^{*T}_k\Omega^*_k + \left(H^*_{0k}\right)^{-1}\right)c    (52)

Thus the posterior distributions of Λ*_k and ψ_εk^{-1} are respectively given by:

p\left(\Lambda^*_k \mid Y^*_k, \psi_{\epsilon k}^{-1}, \Omega\right) \propto \psi_{\epsilon k}^{-r_k/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - c\right)^T C \left(\Lambda^*_k - c\right)\right]    (53)

and

p\left(\psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega\right) \propto \left(\psi_{\epsilon k}^{-1}\right)^{n/2+\alpha_{0k}-1} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(2\beta_{0k} + d\right)\right]    (54)


which can also be written as:

\Lambda^*_k \mid Y^*_k, \psi_{\epsilon k}^{-1}, \Omega \sim N\left(c,\; \psi_{\epsilon k}\, C^{-1}\right)    (55)

and

\psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega \sim \text{Gamma}\left(\frac{n}{2} + \alpha_{0k},\; \beta_{0k} + \frac{1}{2}\, d\right)    (56)
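A minimal sketch of these two draws in R (function and argument names are ours; H0k_inv denotes the prior precision (H*_0k)^{-1} from (49)):

# One Gibbs step for (Lambda*_k, 1/psi_eps_k), equations (55)-(56).
draw_Lambda_psi <- function(Ystar_k, Omega_star_k, Lambda0k, H0k_inv, alpha0k, beta0k) {
  OtO  <- crossprod(Omega_star_k)                       # Omega*' Omega*
  C    <- OtO + H0k_inv
  cvec <- solve(C, crossprod(Omega_star_k, Ystar_k) + H0k_inv %*% Lambda0k)
  hat  <- solve(OtO, crossprod(Omega_star_k, Ystar_k))  # OLS estimate
  d    <- drop(t(hat) %*% OtO %*% hat + t(Lambda0k) %*% H0k_inv %*% Lambda0k
               - t(cvec) %*% C %*% cvec)
  psi_inv <- rgamma(1, shape = nrow(Omega_star_k)/2 + alpha0k,
                    rate = beta0k + d/2)                # equation (56)
  U <- chol(solve(C) / psi_inv)                         # psi_eps_k * C^{-1}
  Lambda_k <- drop(cvec) + drop(t(U) %*% rnorm(length(cvec)))  # equation (55)
  list(Lambda_star_k = Lambda_k, psi_eps_k = 1 / psi_inv)
}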

C.4 Obtaining draws of M and Ψ_δ

Given Ω, obtaining draws of M and Ψ_δ becomes a regression problem. We assume that Ψ_δ is a diagonal matrix, i.e. the structural residuals are uncorrelated. Thus estimating the parameters of the structural equation follows the same procedure as obtaining draws of the parameters of the measurement equation. Analogous to the likelihood of observing Y*_k, see equation (48), we get the likelihood of observing η*_l:

\eta^*_l \mid M^*_l, \psi_{\delta l}^{-1}, \Omega \sim N\left[(\Omega^{*T}_l\Omega^*_l)^{-1}\Omega^{*T}_l\,\eta^*_l,\; \psi_{\delta l}\,(\Omega^{*T}_l\Omega^*_l)^{-1}\right]    (57)

with l ∈ {1,...,q1}. The conjugate prior distribution defined in equation (31) for the structural coefficient matrix is:

\left[M^*_l \mid \psi_{\delta l}\right] \sim N\left(M^*_{0l},\; \psi_{\delta l} H^*_{0Ml}\right)    (58)

To derive the posterior distributions of M*_l and ψ_δl^{-1}, we multiply equation (57) with (58) and (30):

p\left(M^*_l, \psi_{\delta l}^{-1} \mid \eta^*_l, \Omega\right) \propto \psi_{\delta l}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(M^*_l - \hat M^*_l\right)^T \Omega^{*T}_l\Omega^*_l\left(M^*_l - \hat M^*_l\right)\right]
\times \psi_{\delta l}^{-r_l/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(M^*_l - M^*_{0l}\right)^T \left(H^*_{0Ml}\right)^{-1}\left(M^*_l - M^*_{0l}\right)\right]
\times \left(\psi_{\delta l}^{-1}\right)^{\alpha_{0l}-1} \exp\left(-\beta_{0l}\,\psi_{\delta l}^{-1}\right)    (59)

with \hat M^*_l = (\Omega^{*T}_l\Omega^*_l)^{-1}\Omega^{*T}_l\,\eta^*_l.


Combining the two quadratic forms yields:

p\left(M^*_l, \psi_{\delta l}^{-1} \mid \eta^*_l, \Omega\right) \propto \psi_{\delta l}^{-n/2}\,\psi_{\delta l}^{-r_l/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left[\left(M^*_l - c_M\right)^T C_M \left(M^*_l - c_M\right) + d_M\right]\right] \left(\psi_{\delta l}^{-1}\right)^{\alpha_{0l}-1} \exp\left(-\beta_{0l}\,\psi_{\delta l}^{-1}\right)    (60)

with

C_M = \Omega^{*T}_l\Omega^*_l + \left(H^*_{0Ml}\right)^{-1}

c_M = \left(\Omega^{*T}_l\Omega^*_l + \left(H^*_{0Ml}\right)^{-1}\right)^{-1}\left(\Omega^{*T}_l\Omega^*_l\,\hat M^*_l + \left(H^*_{0Ml}\right)^{-1} M^*_{0l}\right)

d_M = \hat M^{*T}_l\,\Omega^{*T}_l\Omega^*_l\,\hat M^*_l + M^{*T}_{0l}\left(H^*_{0Ml}\right)^{-1}M^*_{0l} - c_M^T\left(\Omega^{*T}_l\Omega^*_l + \left(H^*_{0Ml}\right)^{-1}\right)c_M    (61)

Thus the posterior distributions of M*_l and ψ_δl^{-1} are respectively given by:

p\left(M^*_l \mid \eta^*_l, \psi_{\delta l}^{-1}, \Omega\right) \propto \psi_{\delta l}^{-r_l/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(M^*_l - c_M\right)^T C_M \left(M^*_l - c_M\right)\right]    (62)

and

p\left(\psi_{\delta l}^{-1} \mid \eta^*_l, \Omega\right) \propto \left(\psi_{\delta l}^{-1}\right)^{n/2+\alpha_{0l}-1} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(2\beta_{0l} + d_M\right)\right]    (63)

which can also be written as:

M^*_l \mid \eta^*_l, \psi_{\delta l}^{-1}, \Omega \sim N\left(c_M,\; \psi_{\delta l}\, C_M^{-1}\right)    (64)

and

\psi_{\delta l}^{-1} \mid \eta^*_l, \Omega \sim \text{Gamma}\left(\frac{n}{2} + \alpha_{0l},\; \beta_{0l} + \frac{1}{2}\, d_M\right)    (65)


D Setting values for hyperprior parameters

In order to enable sampling from the posterior distributions of the unknown elements, we have to set values for the hyperparameters α_0k and β_0k, as well as α_0l and β_0l, of the prior distributions in equations (28) and (30). Since the results of the sampler are very sensitive to these values, they have to be set thoughtfully. We suggest running the Gibbs sampler initially without sampling the parameters in Ψ_ε and Ψ_δ, calculating them instead using the following formulas:

\psi_{\epsilon k} = \frac{1}{n - q}\,\left(Y_k - \Omega\Lambda_k\right)^T \left(Y_k - \Omega\Lambda_k\right)    (66)

\psi_{\delta l} = \frac{1}{n - q}\,\left(\eta_l - \Omega M_l\right)^T \left(\eta_l - \Omega M_l\right)    (67)

In a second run we set the parameters α_0k and β_0k as follows:

\alpha_{0k} = 100, \qquad \beta_{0k} = \psi_{\epsilon k}\cdot 100    (68)

and

\alpha_{0l} = 100, \qquad \beta_{0l} = \psi_{\delta l}\cdot 100    (69)

and include the sampling of the parameters in Ψ_ε and Ψ_δ.
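A minimal sketch of this two-stage choice in R (names are ours): after the initial run, the residual variances from (66) and (67) determine the Gamma hyperparameters of (68) and (69).

# Hyperparameters from the residuals of an initial run, equations (66)-(69).
set_hyperpriors <- function(Y, Omega, Lambda, Eta, M, q) {
  n <- nrow(Y)
  psi_eps <- colSums((Y   - Omega %*% t(Lambda))^2) / (n - q)  # (66), one per k
  psi_del <- colSums((Eta - Omega %*% t(M))^2)      / (n - q)  # (67), one per l
  list(alpha0k = 100, beta0k = 100 * psi_eps,                  # (68)
       alpha0l = 100, beta0l = 100 * psi_del)                  # (69)
}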