
Bayesian Estimation of Structural Equation Models with R

A User Manual

J. Büschken
Catholic University of Eichstätt-Ingolstadt

G. Allenby
The Ohio State University

Working Paper
Please do not cite without the authors' permission.
Draft as of 2009-07-07

Electronic copy available at: http://ssrn.com/abstract=1433709

Joachim Büschken, Catholic University of Eichstätt-Ingolstadt, Ingolstadt School of Management, Marketing Department, Auf der Schanz 49, D-85049 Ingolstadt, Germany, phone: +49 841 937 1976, fax: +49 841 937 2976, email: [email protected]. Greg M. Allenby, The Ohio State University, Fisher College of Business, 540A Fisher Hall, Helen C. Kurtz Chair in Marketing, 2100 Neil Avenue, OH-43210 Columbus, USA, phone: +1 614 292 9452, fax: +49 841 937 2976, email: [email protected].


How to estimate structural equation models with R?

In the social sciences it is often useful to introduce latent variables and use structural equation modeling to quantify relations among observable and latent variables. This paper presents a manual describing how to estimate structural equation models in a Bayesian approach with R. Parameter estimation follows a Gibbs sampling procedure, generating draws from the full conditionals of the unknown parameters. The manual is divided into two main parts. The first part presents an introduction to the estimation of structural equation models with R. The second part describes a method for simulating data from a structural equation model, and the appendix contains the derivation of the full conditional distributions.

1 Estimation of SEMs

To illustrate the Bayesian estimation of SEMs with R, we present an application in the context of a simple SEM. The estimation procedure covers three parts: first, the model has to be specified; second, the data have to be attached to the model; and finally these values have to be passed to the estimation function.

Specifying the model and attaching data

In order to enable the user to become familiar with the notation and to transfer their model specifications to the model framework used in this paper, this subsection gives a short overview of the model framework.¹ An example of the specification of a simple SEM illustrates how the specification procedure is done. This is followed by showing how to attach data to the model.

¹ For more details on the model framework, see the appendix.


A SEM is composed of a measurement equation (1) and a structural equation (2):

y_i = \Lambda \omega_i + \epsilon_i    (1)

\eta_i = \Pi \eta_i + \Gamma \xi_i + \delta_i    (2)

\eta_i = (\Pi \;\; \Gamma)\, \omega_i + \delta_i = M \omega_i + \delta_i

where i ∈ {1,...,n}.

Observations of reflective measures y_i are assumed to be generated by underlying latent variables ω_i, possibly with measurement error ε_i. The measurement equation is defined by a confirmatory factor analysis model, where Λ is the associated (p × q) loading matrix. The structural equation specifies relationships among the latent variables, where ω_i can be divided into η_i, an endogenous (q1 × 1) vector of latent variables, and ξ_i, an exogenous (q2 × 1) vector of latent variables. Let q = q1 + q2. M = (Π Γ) is the unknown (q1 × q) matrix of regression coefficients that represent the proposed causal effects among η and ξ, and δ (q1 × 1) is a random vector of residuals. It is assumed that the measurement errors ε are uncorrelated with η and ξ, that the residuals δ are uncorrelated with ξ, and that the variables are distributed as follows:

\epsilon_i \sim N(0, \Psi_\epsilon)    (3)

\xi_i \sim N(0, \Phi)    (4)

\delta_i \sim N(0, \Psi_\delta)    (5)

for i ∈ {1,...,n}, where Ψ_ε and Ψ_δ are diagonal matrices. This model is not identified, but it can be identified by restricting appropriate elements in Λ and/or M to fixed known values (0 or 1). This is done with the help of the following Pick matrices:

ΛPick (p × q): matrix containing the fixed known elements (0 or 1) of Λ
MPick (q1 × q): matrix containing the fixed known elements (0 or 1) of M

For example, if Λ is a (2 × 2) matrix and you want to fix element [1, 1] to 1 and element


[2, 2] to 0, ΛPick is:

\Lambda\text{Pick} = \begin{pmatrix} 1 & 4 \\ 4 & 0 \end{pmatrix}

The non-fixed elements of ΛPick can be set to any value except 0 and 1. Non-fixed elements represent starting values for the MCMC chain (in this case: 4).
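In R, such a Pick matrix can be set up directly with matrix(). A minimal sketch for the (2 × 2) example above; the object name LambdaPick is ours:

# LambdaPick for the (2 x 2) example: element [1,1] fixed to 1, element [2,2]
# fixed to 0, and the two free loadings given the starting value 4.
LambdaPick <- matrix(c(1, 4,
                       4, 0), nrow = 2, byrow = TRUE)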

In order to enable Gibbs sampling from the posterior distributions, we set natural conjugate prior distributions for the unknown parameters. Let ψ_εk be the kth diagonal element of Ψ_ε, ψ_δl the lth diagonal element of Ψ_δ, Λ_k^T the kth row of Λ and M_l^T the lth row of M; we get:

\psi_{\epsilon k}^{-1} \sim \text{Gamma}(\alpha_{0k}, \beta_{0k})    (6)

\Lambda_k \mid \psi_{\epsilon k} \sim N(\Lambda_{0k}, \psi_{\epsilon k} H_{0k})    (7)

\psi_{\delta l}^{-1} \sim \text{Gamma}(\alpha_{0l}, \beta_{0l})    (8)

M_l \mid \psi_{\delta l} \sim N(M_{0l}, \psi_{\delta l} H_{0Ml})    (9)

\Phi \sim IW[v_0, V_0]    (10)

with k ∈ {1,...,p} and l ∈ {1,...,q1}.

It follows that the following parameters have to be specified:

α_0k: shape parameter of the prior distribution of ψ_εk^{-1}
β_0k: inverse scale parameter of the prior distribution of ψ_εk^{-1}
α_0l: shape parameter of the prior distribution of ψ_δl^{-1}
β_0l: inverse scale parameter of the prior distribution of ψ_δl^{-1}
v_0, V_0: parameters of the prior distribution of Φ

Note that we assume that these values are the same for all k ∈ {1,...,p} and l ∈ {1,...,q1}.

Prior parameters of the distributions of the regression matrices are set as follows:

Λ_0k: the prior mean of Λ_k is assumed to be zero
H_0k: the variance-covariance matrix of the prior distribution of Λ_k


is assumed to be a diagonal matrix with 0.01 on the diagonal
M_0l: the prior mean of M_l is assumed to be zero
H_0Ml: the variance-covariance matrix of the prior distribution of M_l is assumed to be a diagonal matrix with 0.01 on the diagonal

Furthermore it is necessary to set starting values for the unknown parameters:

Ψ_ε (p × p): diagonal variance-covariance matrix of the measurement errors
Ψ_δ (q1 × q1): diagonal variance-covariance matrix of the structural residuals
Φ (q2 × q2): variance-covariance matrix of the latent exogenous variables

and to determine the number of iterations of the MCMC chain, R.

Starting values for the regression coefficients in the matrices Λ and M have already been set in the corresponding Pick matrices.
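In R, these settings amount to a handful of scalars and matrices. The following is a minimal sketch for the example model developed below; all object names and the concrete values of v0, V0 and R are our own illustrative choices, not prescribed by the paper:

# Dimensions of the example SEM: p = 8 indicators, q1 = 3 endogenous and
# q2 = 1 exogenous latent variables.
p <- 8; q1 <- 3; q2 <- 1; q <- q1 + q2
alpha0k <- 100; beta0k <- 1          # Gamma prior for 1/psi_eps_k (illustrative)
alpha0l <- 100; beta0l <- 1          # Gamma prior for 1/psi_del_l (illustrative)
Lambda0 <- matrix(0, p, q)           # prior means of the rows of Lambda (zero)
H0k     <- diag(0.01, q)             # prior covariance of Lambda_k
M0      <- matrix(0, q1, q)          # prior means of the rows of M (zero)
H0Ml    <- diag(0.01, q)             # prior covariance of M_l
v0 <- q2 + 2; V0 <- diag(q2)         # inverse-Wishart prior for Phi (illustrative)
Psi_eps <- diag(p)                   # starting value, diagonal
Psi_del <- diag(q1)                  # starting value, diagonal
Phi     <- diag(q2)                  # starting value
R <- 10000                           # number of MCMC iterations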


We use the following simple structural equation model with a single exogenous variable and three endogenous variables to exemplify our approach:

[Figure 1: SEM example. Path diagram: γ1: ξ1 → η1; γ2: ξ1 → η2; π1: η1 → η3; π2: η2 → η3.]

\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ \pi_1 & \pi_2 & 0 \end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \end{pmatrix}
+ \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ 0 \end{pmatrix} \xi_{1i}
+ \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \delta_{3i} \end{pmatrix}    (11)

η_{1i}, η_{2i} and η_{3i} are the endogenous variables in this model; ξ_{1i} is an exogenous variable. In matrix notation and using the usual notation for vectors of latent variables in the SEM literature, this structural model can be written as η_i = Πη_i + Γξ_{1i} + δ_i for observations i = 1,...,n.

By combining the two matrices Π and Γ this equation becomes:

\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 & \gamma_1 \\ 0 & 0 & 0 & \gamma_2 \\ \pi_1 & \pi_2 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \\ \xi_{1i} \end{pmatrix}
+ \begin{pmatrix} \delta_{1i} \\ \delta_{2i} \\ \delta_{3i} \end{pmatrix}    (12)

In matrix notation we write this as η_i = Mω_i + δ_i. In our example, we assume that each


latent variable is measured by two reflective measurement indicators, as shown by the following measurement equation:

y_i = \begin{pmatrix}
\lambda_{11} & 0 & 0 & 0 \\
\lambda_{21} & 0 & 0 & 0 \\
0 & \lambda_{32} & 0 & 0 \\
0 & \lambda_{42} & 0 & 0 \\
0 & 0 & \lambda_{53} & 0 \\
0 & 0 & \lambda_{63} & 0 \\
0 & 0 & 0 & \lambda_{74} \\
0 & 0 & 0 & \lambda_{84}
\end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \\ \xi_{1i} \end{pmatrix}
+ \begin{pmatrix} \epsilon_{1i} \\ \epsilon_{2i} \\ \epsilon_{3i} \\ \epsilon_{4i} \\ \epsilon_{5i} \\ \epsilon_{6i} \\ \epsilon_{7i} \\ \epsilon_{8i} \end{pmatrix}    (13)

which is the same as y_i = Λω_i + ε_i, where ω_i comprises the values of all latent variables for observation i and y_i comprises the vector of observed measurement indicators for i. Since this model is not identified, we have to fix elements in Λ to 1. Thus we get:

y_i = \begin{pmatrix}
\lambda_{11} & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & \lambda_{32} & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & \lambda_{53} & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & \lambda_{74} \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \eta_{1i} \\ \eta_{2i} \\ \eta_{3i} \\ \xi_{1i} \end{pmatrix}
+ \begin{pmatrix} \epsilon_{1i} \\ \epsilon_{2i} \\ \epsilon_{3i} \\ \epsilon_{4i} \\ \epsilon_{5i} \\ \epsilon_{6i} \\ \epsilon_{7i} \\ \epsilon_{8i} \end{pmatrix}    (14)


The matrices Λ and M contain fixed known elements, either 0 or 1. On this basis we can determine the corresponding Pick matrices:

\Lambda\text{Pick} = \begin{pmatrix} \ast & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & \ast & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \ast & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & \ast \\ 0 & 0 & 0 & 1 \end{pmatrix}    (15)

M\text{Pick} = \begin{pmatrix} 0 & 0 & 0 & \ast \\ 0 & 0 & 0 & \ast \\ \ast & \ast & 0 & 0 \end{pmatrix}    (16)

∗ stands for the unknown elements of the matrices. For the MCMC chain we have to set starting values for these elements. In this case we set all unknown parameters to 4. The resulting Pick matrices are:

\Lambda\text{Pick} = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 4 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 \end{pmatrix}    (17)


M\text{Pick} = \begin{pmatrix} 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 4 \\ 4 & 4 & 0 & 0 \end{pmatrix}    (18)
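In R, the two example Pick matrices (17) and (18) can be written down directly; a minimal sketch (object names are ours):

# Free loadings started at 4, fixed elements set to 1 or 0, as in (17).
LambdaPick <- matrix(c(4, 0, 0, 0,
                       1, 0, 0, 0,
                       0, 4, 0, 0,
                       0, 1, 0, 0,
                       0, 0, 4, 0,
                       0, 0, 1, 0,
                       0, 0, 0, 4,
                       0, 0, 0, 1), nrow = 8, byrow = TRUE)
# Free structural coefficients started at 4, as in (18).
MPick <- matrix(c(0, 0, 0, 4,
                  0, 0, 0, 4,
                  4, 4, 0, 0), nrow = 3, byrow = TRUE)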

The second step is attaching the data to the model. While the input of text-based data is possible, R also supports several common data formats. For this manual, we present how to attach a .txt file. The data have to be arranged as an (n × p) matrix and saved as a .txt file. You can read the data from this file using the read.table function, creating a data frame from it:

Data = read.table(file = "C:/your folder/data.txt", header = TRUE, sep = "\t", dec = ",")

Here sep = "\t" indicates tab-separated columns and dec = "," declares the comma as the decimal separator.

Passing data to the estimation function

Having set all necessary parameters and having attached the data, these objects can be passed to a function called semest, which draws the parameters of the model from their full conditionals and thus yields estimates for the unknown parameters. In order to pass data and parameter values to the function semest, you have to arrange the objects in the following order:

L = (Data, α_0k, β_0k, α_0l, β_0l, ΛPick, MPick, Λ_0, M_0, Ψ_ε, Ψ_δ, Φ, v_0, V_0, R),

where Λ_0 and M_0 are the corresponding prior mean matrices, comprising all rows Λ_0k and M_0l respectively. Now these values have to be passed to the function semest:

semest(L)

This function yields all draws of the posterior distributions of the unknown parameters as well as the estimated values of the latent variables.²

² For more details on the derivation of the posterior distributions, see the appendix.
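In R, L is naturally an R list; a minimal sketch of assembling it and calling the paper's semest function (the element order follows the list above, which is our reconstruction; semest itself is supplied with the paper):

# Collect data, priors, Pick matrices, starting values and chain length.
L <- list(Data, alpha0k, beta0k, alpha0l, beta0l,
          LambdaPick, MPick, Lambda0, M0,
          Psi_eps, Psi_del, Phi, v0, V0, R)
out <- semest(L)   # posterior draws and estimated latent variables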

2 Data Simulation

In order to check whether the algorithm recovers the true parameter values, you can test the Gibbs sampler by simulating data and subsequently estimating the corresponding


parameter values. First you have to determine a structural equation model; thus the following parameters have to be specified:

Λ (p × q): matrix of regression coefficients of the measurement model
M (q1 × q): matrix of regression coefficients of the structural model
Ψ_ε (p × p): diagonal variance-covariance matrix of the measurement errors
Ψ_δ (q1 × q1): diagonal variance-covariance matrix of the structural residuals
Φ (q2 × q2): variance-covariance matrix of the latent exogenous variables
n: number of observations

Then you can pass those values to the function sim, yielding the simulated observations and latent variables:

sim(Λ, M, Ψ_ε, Ψ_δ, Φ, n)
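The following sketch shows what a simulator of this kind has to do; it is our own illustration of the data-generating process, and the paper's sim function may differ in its details:

# Simulate n observations from the SEM: draw xi and delta, solve the
# structural equation for eta, then generate indicators y = Lambda omega + eps.
sim_sketch <- function(Lambda, M, Psi_eps, Psi_del, Phi, n) {
  p  <- nrow(Lambda); q <- ncol(Lambda)
  q1 <- nrow(M); q2 <- q - q1
  Pi    <- M[, 1:q1, drop = FALSE]           # effects among the eta's
  Gamma <- M[, (q1 + 1):q, drop = FALSE]     # effects of the xi's
  Pi0inv <- solve(diag(q1) - Pi)             # eta = Pi0^{-1} (Gamma xi + delta)
  xi    <- matrix(rnorm(n * q2), n, q2) %*% chol(Phi)
  delta <- matrix(rnorm(n * q1), n, q1) %*% chol(Psi_del)
  eta   <- (xi %*% t(Gamma) + delta) %*% t(Pi0inv)
  Omega <- cbind(eta, xi)
  Y <- Omega %*% t(Lambda) + matrix(rnorm(n * p), n, p) %*% chol(Psi_eps)
  list(Y = Y, Omega = Omega)
}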


Appendix: Bayesian Estimation of Standard SEMs

This section develops a Gibbs sampler to estimate structural equation models (SEMs) with reflective measurement indicators. We illustrate the Bayesian estimation by considering a standard SEM that is equivalent to the most commonly used LISREL model.

A Model Framework

A SEM is composed of a measurement equation (19) and a structural equation (20):

y_i = \Lambda \omega_i + \epsilon_i    (19)

\eta_i = \Pi \eta_i + \Gamma \xi_i + \delta_i    (20)

\eta_i = (\Pi \;\; \Gamma)\, \omega_i + \delta_i = M \omega_i + \delta_i    (21)

where i ∈ {1,...,n}.

Observations of reflective measures y_i are assumed to be generated by underlying latent variables ω_i, possibly with measurement error ε_i. The corresponding matrices including all observations are Y (n × p), Ω (n × q) and E (n × p). The measurement equation is defined by a confirmatory factor analysis model, where Λ (p × q) is the associated loading matrix. The structural equation specifies relationships among the identified latent variables, where ω can be divided into η (q1 × 1), an endogenous random vector of latent variables, and ξ (q2 × 1), an exogenous random vector of latent variables. M (q1 × q) is the unknown matrix of regression coefficients that represent the causal effects among η and ξ, and δ (q1 × 1) is a random vector of residuals. It is assumed that measurement errors are uncorrelated with η and ξ, residuals are uncorrelated with ξ, and the variables are distributed as follows:

\epsilon_i \sim N(0, \Psi_\epsilon)    (22)


\xi_i \sim N(0, \Phi)    (23)

\delta_i \sim N(0, \Psi_\delta)    (24)

for i ∈ {1,...,n}, where Ψ_ε and Ψ_δ are diagonal matrices. The covariance matrix of ω is derived on the basis of the SEM. Writing Π_0 = I − Π, so that η = Π_0^{-1}(Γξ + δ), we have:

\Sigma_\omega = \begin{pmatrix} E(\eta\eta^T) & E(\eta\xi^T) \\ E(\xi\eta^T) & E(\xi\xi^T) \end{pmatrix}    (25)

= \begin{pmatrix} \Pi_0^{-1}\left(\Gamma\Phi\Gamma^T + \Psi_\delta\right)\left(\Pi_0^{-1}\right)^T & \Pi_0^{-1}\Gamma\Phi \\ \Phi\Gamma^T\left(\Pi_0^{-1}\right)^T & \Phi \end{pmatrix}    (26)

since

\eta\eta^T = \Pi_0^{-1}\left(\Gamma\xi + \delta\right)\left(\Gamma\xi + \delta\right)^T\left(\Pi_0^{-1}\right)^T

E(\eta\eta^T) = \Pi_0^{-1}\left(\Gamma E(\xi\xi^T)\Gamma^T + E(\delta\delta^T)\right)\left(\Pi_0^{-1}\right)^T = \Pi_0^{-1}\left(\Gamma\Phi\Gamma^T + \Psi_\delta\right)\left(\Pi_0^{-1}\right)^T

\eta\xi^T = \Pi_0^{-1}\left(\Gamma\xi + \delta\right)\xi^T

E(\eta\xi^T) = \Pi_0^{-1}\Gamma\Phi    (27)
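In R, Σ_ω in equation (26) is a few lines of matrix algebra; a minimal sketch (the function name is ours):

# Build Sigma_omega from Pi, Gamma, Phi and Psi_delta as in equation (26).
Sigma_omega <- function(Pi, Gamma, Phi, Psi_del) {
  Pi0inv <- solve(diag(nrow(Pi)) - Pi)
  See <- Pi0inv %*% (Gamma %*% Phi %*% t(Gamma) + Psi_del) %*% t(Pi0inv)
  Sex <- Pi0inv %*% Gamma %*% Phi
  rbind(cbind(See, Sex), cbind(t(Sex), Phi))
}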

This model is not identified, but it can be identified by restricting appropriate elements in Λ and/or M to fixed known values (0 or 1).

B Prior Distributions

In order to enable Gibbs sampling from the posterior distributions, we set natural conjugate prior distributions for the unknown parameters. Let ψ_εk be the kth diagonal element of Ψ_ε, ψ_δl the lth diagonal element of Ψ_δ, Λ_k^T the kth row of Λ and M_l^T the lth row of M; we get:

\psi_{\epsilon k}^{-1} \sim \text{Gamma}(\alpha_{0k}, \beta_{0k})    (28)

\Lambda_k \mid \psi_{\epsilon k} \sim N(\Lambda_{0k}, \psi_{\epsilon k} H_{0k})    (29)


\psi_{\delta l}^{-1} \sim \text{Gamma}(\alpha_{0l}, \beta_{0l})    (30)

M_l \mid \psi_{\delta l} \sim N(M_{0l}, \psi_{\delta l} H_{0Ml})    (31)

\Phi \sim IW[v_0, V_0]    (32)

with k ∈ {1,...,p} and l ∈ {1,...,q1}.

C Derivations of the Conditional Distributions

According to Bayes' theorem, the joint posterior of all unknown parameters is proportional to the likelihood times the prior:

p(\Omega, \Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi \mid Y) \propto p(Y \mid \Omega, \Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi)\; p(\Omega, \Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi)    (33)

Given Y and Ω, Λ and Ψ_ε are independent of Φ. Once we have obtained draws of Ω, we can treat the estimation of Λ and Ψ_ε as a simple regression model. Thus we can sample from the posterior distribution of Λ and Ψ_ε without having to refer to Φ. The same holds for inference with regard to M, Ψ_δ and Φ, which are independent of Y given Ω. This suggests:

p(\Lambda, \Psi_\epsilon, M, \Psi_\delta, \Phi \mid Y, \Omega) \propto \left[p(Y \mid \Omega, \Lambda, \Psi_\epsilon)\, p(\Lambda, \Psi_\epsilon)\right]\left[p(\Omega \mid M, \Psi_\delta, \Phi)\, p(M, \Psi_\delta, \Phi)\right]    (34)

and we can treat the conditional posterior distributions of Λ, Ψ_ε and of M, Ψ_δ, Φ separately. Ω in the above expression refers to the n observations of the values of the latent variables ω_i, which are conditionally independent given Σ_ω. The parameters of Σ_ω can be understood as the parameters of the distribution of heterogeneity of the latent variables.


C.1 Obtaining draws of the latent variables (Ω)

We can obtain draws of Ω through the posterior of ω_i which, according to Bayes' theorem, is given by:

p(\Omega \mid Y, \Lambda, \Psi_\epsilon, \Sigma_\omega) \propto \prod_{i=1}^{n} p(y_i \mid \omega_i, \Lambda, \Psi_\epsilon)\; p(\omega_i \mid \Sigma_\omega)    (35)

Given our assumption that the y_i are distributed N(Λω_i, Ψ_ε) and the ω_i are distributed N(0, Σ_ω), we see that the posterior involves the kernel of two normals, whose quadratic forms in the exponent can easily be combined. Because of the IID assumption, we treat the inference for the ω_i separately for each observation i. The exponent of the resulting distribution has the following expression:

(\omega_i - 0)^T \Sigma_\omega^{-1} (\omega_i - 0) + (y_i - \Lambda\omega_i)^T \Psi_\epsilon^{-1} (y_i - \Lambda\omega_i)

= \omega_i^T \Sigma_\omega^{-1}\omega_i + y_i^T \Psi_\epsilon^{-1} y_i - 2\,\omega_i^T \Lambda^T \Psi_\epsilon^{-1} y_i + \omega_i^T \Lambda^T \Psi_\epsilon^{-1} \Lambda\, \omega_i

= \omega_i^T \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)\omega_i - 2\,\omega_i^T \Lambda^T\Psi_\epsilon^{-1} y_i + y_i^T\Psi_\epsilon^{-1} y_i

= \left[\omega_i - \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\Lambda^T\Psi_\epsilon^{-1} y_i\right]^T \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right) \left[\omega_i - \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\Lambda^T\Psi_\epsilon^{-1} y_i\right]
\;-\; \left(\Lambda^T\Psi_\epsilon^{-1} y_i\right)^T \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1} \left(\Lambda^T\Psi_\epsilon^{-1} y_i\right) + y_i^T\Psi_\epsilon^{-1} y_i    (36)

where the last two terms are constants with respect to ω_i. As a result, the conditional posterior distribution of ω_i is:

\omega_i \mid y_i, \Lambda, \Psi_\epsilon, \Sigma_\omega \sim N\left[\left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\Lambda^T\Psi_\epsilon^{-1} y_i,\; \left(\Sigma_\omega^{-1} + \Lambda^T\Psi_\epsilon^{-1}\Lambda\right)^{-1}\right]    (37)

To obtain Ω, we simply cycle in this manner through the i loop. We can then treat Ω as data in subsequent steps of the Gibbs sampler.
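A minimal sketch of this step in R (the function name is ours): each row of Y yields one draw of ω_i from the normal distribution in (37).

# One Gibbs step for Omega: draw omega_i ~ N(B y_i, V) for every observation,
# with V = (Sigma_om^{-1} + Lambda' Psi_eps^{-1} Lambda)^{-1} and
# B = V Lambda' Psi_eps^{-1}, as in equation (37).
draw_Omega <- function(Y, Lambda, Psi_eps, Sigma_om) {
  Psi_inv <- solve(Psi_eps)
  V <- solve(solve(Sigma_om) + t(Lambda) %*% Psi_inv %*% Lambda)
  B <- V %*% t(Lambda) %*% Psi_inv
  U <- chol(V)                                   # t(U) %*% U == V
  t(apply(Y, 1, function(y) drop(B %*% y + t(U) %*% rnorm(ncol(V)))))
}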


C.2 Obtaining draws of Φ

The first step in developing the conditional distribution for Φ is to recognize that only those elements of ω which refer to ξ are relevant for Φ. Since ω_i = (η_i^T, ξ_i^T)^T, we can simply separate the draws of ξ from Ω, collected in the (n × q2) matrix Ξ, and use them for inference regarding Φ. Φ refers to the variance of the vector of exogenous variables only. The likelihood of observing Ξ is:

p(\Xi \mid \Phi) \propto \prod_{i=1}^{n} |\Phi|^{-0.5} \exp\left(-\frac{1}{2}\,\xi_i^T \Phi^{-1} \xi_i\right) = |\Phi|^{-n/2}\, \mathrm{etr}\left(-\frac{1}{2}\,\Xi^T \Xi\, \Phi^{-1}\right)    (38)

Combining equation (38) with the prior distribution of Φ in equation (32) yields:

[\Phi \mid \Xi] \sim IW\left[v_0 + n,\; V_0 + \Xi^T \Xi\right]    (39)
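In R this draw can use stats::rWishart together with the Wishart/inverse-Wishart duality (if W ~ Wishart(v, V^{-1}) then W^{-1} ~ IW(v, V)); a minimal sketch, function name ours:

# One Gibbs step for Phi, as in equation (39).
draw_Phi <- function(Xi, v0, V0) {
  v <- v0 + nrow(Xi)
  V <- V0 + crossprod(Xi)                       # V0 + t(Xi) %*% Xi
  W <- rWishart(1, df = v, Sigma = solve(V))[, , 1]
  solve(W)
}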

C.3 Obtaining draws of Λ and Ψ_ε

Given Ω, obtaining draws of Λ and Ψ_ε becomes a regression problem. We assume that Ψ_ε is a diagonal matrix, i.e. the measurement errors are uncorrelated. The likelihood of observing the data is given by:

p(Y \mid \Omega, \Lambda, \Psi_\epsilon) \propto |\Psi_\epsilon|^{-n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} (y_i - \Lambda\omega_i)^T \Psi_\epsilon^{-1}(y_i - \Lambda\omega_i)\right]    (40)

in which y_i and ω_i are column vectors. Because Ψ_ε is diagonal, we can write this as:

p(Y \mid \Omega, \Lambda, \Psi_\epsilon) \propto |\Psi_\epsilon|^{-n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} \sum_{k=1}^{p} \psi_{\epsilon k}^{-1}\left(y_{ik} - \Lambda_k^T\omega_i\right)^2\right]    (41)

We can change the order of summation and move ψ_εk^{-1} out of the summation over i. The kernel of this distribution can then be written as:

-\frac{1}{2}\sum_{k=1}^{p} \psi_{\epsilon k}^{-1} \sum_{i=1}^{n} \left(y_{ik} - \Lambda_k^T\omega_i\right)^2    (42)


The summation over i is:

\sum_{i=1}^{n} \left(y_{ik} - \Lambda_k^T\omega_i\right)^2 = \sum_{i=1}^{n} \left[y_{ik}^2 - 2\, y_{ik}\,\Lambda_k^T\omega_i + \mathrm{tr}\left(\Lambda_k^T\omega_i\,\omega_i^T\Lambda_k\right)\right]

= \sum_{i=1}^{n} y_{ik}^2 - 2\,\Lambda_k^T \sum_{i=1}^{n} y_{ik}\,\omega_i + \Lambda_k^T\Omega^T\Omega\,\Lambda_k = Y_k^T Y_k - 2\,\Lambda_k^T\Omega^T Y_k + \Lambda_k^T\Omega^T\Omega\,\Lambda_k

= \left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega \left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right) + Y_k^T Y_k - Y_k^T\Omega(\Omega^T\Omega)^{-1}\Omega^T Y_k    (43)

where the last two terms do not depend on Λ_k. In the above expression, Y_k refers to the column vector of all observations with regard to the kth measurement variable. This yields the following likelihood:

p\left(Y \mid \Omega, \Lambda, \psi_{\epsilon k}^{-1}\right) \propto |\Psi_\epsilon|^{-n/2} \prod_{k=1}^{p} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)\right]

= \prod_{k=1}^{p} \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)\right]    (44)

Notice that the determinant of Ψ_ε involves only the product of its diagonal elements; we can therefore move these elements into the exponential expression. The above expression for the likelihood implies:

independence of the draws of Λ_k, ψ_εk^{-1} from Λ_h, ψ_εh^{-1} for all h ≠ k

conditional independence of p(ψ_εk^{-1} | Y, Ω) and p(Λ_k | Y, Ω, ψ_εk^{-1})

Also notice that the p distributions for Λ_k | ψ_εk^{-1} and ψ_εk^{-1} are independent across k. This implies that we can draw the Λ_k and ψ_εk^{-1} independently. Thus the likelihood of observing Y_k is given by:

p\left(Y_k \mid \Lambda_k, \psi_{\epsilon k}^{-1}, \Omega\right) \propto \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)^T \Omega^T\Omega\left(\Lambda_k - (\Omega^T\Omega)^{-1}\Omega^T Y_k\right)\right]    (45)

As mentioned above, this model is not identified. We can handle this problem by fixing some of the parameters in Λ; see Lee (2007) for the following section. We suggest fixing


some elements in Λ to 1 and/or 0. Consider Λ_k^T, the kth row of Λ, with certain fixed parameters. Let c_k be the corresponding (1 × q) row vector such that c_kj = 0 if λ_kj is a fixed parameter and c_kj = 1 if λ_kj is an unknown parameter, for k = 1,...,p and j = 1,...,q, and let r_k = c_k1 + ... + c_kq. Moreover, let Λ*_k^T be the (1 × r_k) row vector that contains the unknown parameters in Λ_k, and let Ω*_k be the (n × r_k) submatrix of Ω in which all the columns corresponding to c_kj = 0 are deleted. Let Y*_k^T = (y*_1k,...,y*_nk) with

y^*_{ik} = y_{ik} - \sum_{j=1}^{q} \lambda_{kj}\,\omega_{ij}\,(1 - c_{kj})    (46)

This yields the following likelihood of observing Y*_k:

p\left(Y^*_k \mid \Lambda^*_k, \psi_{\epsilon k}^{-1}, \Omega\right) \propto \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - \hat\Lambda^*_k\right)^T \Omega^{*T}_k\Omega^*_k\left(\Lambda^*_k - \hat\Lambda^*_k\right)\right], \quad \hat\Lambda^*_k = (\Omega^{*T}_k\Omega^*_k)^{-1}\Omega^{*T}_k Y^*_k    (47)

which can also be written as:

Y^*_k \mid \Lambda^*_k, \psi_{\epsilon k}^{-1}, \Omega \sim N\left[(\Omega^{*T}_k\Omega^*_k)^{-1}\Omega^{*T}_k Y^*_k,\; \psi_{\epsilon k}\,(\Omega^{*T}_k\Omega^*_k)^{-1}\right]    (48)

The conjugate prior distribution defined in equation (29) for the loading matrix is:

\left[\Lambda^*_k \mid \psi_{\epsilon k}\right] \sim N\left(\Lambda^*_{0k},\; \psi_{\epsilon k} H^*_{0k}\right)    (49)


To derive the posterior for Λ*_k and ψ_εk^{-1}, we multiply equation (48) with (49) and (28):

p\left(\Lambda^*_k, \psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega\right) \propto \psi_{\epsilon k}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - \hat\Lambda^*_k\right)^T \Omega^{*T}_k\Omega^*_k\left(\Lambda^*_k - \hat\Lambda^*_k\right)\right]
\times \psi_{\epsilon k}^{-r_k/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - \Lambda^*_{0k}\right)^T \left(H^*_{0k}\right)^{-1}\left(\Lambda^*_k - \Lambda^*_{0k}\right)\right]
\times \left(\psi_{\epsilon k}^{-1}\right)^{\alpha_{0k}-1} \exp\left(-\beta_{0k}\,\psi_{\epsilon k}^{-1}\right)    (50)

with \hat\Lambda^*_k = (\Omega^{*T}_k\Omega^*_k)^{-1}\Omega^{*T}_k Y^*_k as in (47). Combining the two quadratic forms yields:

p\left(\Lambda^*_k, \psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega\right) \propto \psi_{\epsilon k}^{-n/2}\,\psi_{\epsilon k}^{-r_k/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left[\left(\Lambda^*_k - c\right)^T C \left(\Lambda^*_k - c\right) + d\right]\right] \left(\psi_{\epsilon k}^{-1}\right)^{\alpha_{0k}-1} \exp\left(-\beta_{0k}\,\psi_{\epsilon k}^{-1}\right)    (51)

with

C = \Omega^{*T}_k\Omega^*_k + \left(H^*_{0k}\right)^{-1}

c = \left(\Omega^{*T}_k\Omega^*_k + \left(H^*_{0k}\right)^{-1}\right)^{-1}\left(\Omega^{*T}_k\Omega^*_k\,\hat\Lambda^*_k + \left(H^*_{0k}\right)^{-1}\Lambda^*_{0k}\right)

d = \hat\Lambda^{*T}_k\,\Omega^{*T}_k\Omega^*_k\,\hat\Lambda^*_k + \Lambda^{*T}_{0k}\left(H^*_{0k}\right)^{-1}\Lambda^*_{0k} - c^T\left(\Omega^{*T}_k\Omega^*_k + \left(H^*_{0k}\right)^{-1}\right)c    (52)

Thus the posterior distributions of Λ*_k and ψ_εk^{-1} are respectively given by:

p\left(\Lambda^*_k \mid Y^*_k, \psi_{\epsilon k}^{-1}, \Omega\right) \propto \psi_{\epsilon k}^{-r_k/2} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(\Lambda^*_k - c\right)^T C \left(\Lambda^*_k - c\right)\right]    (53)

and

p\left(\psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega\right) \propto \left(\psi_{\epsilon k}^{-1}\right)^{n/2+\alpha_{0k}-1} \exp\left[-\frac{1}{2}\psi_{\epsilon k}^{-1}\left(2\beta_{0k} + d\right)\right]    (54)


which can also be written as:

\Lambda^*_k \mid Y^*_k, \psi_{\epsilon k}^{-1}, \Omega \sim N\left(c,\; \psi_{\epsilon k}\, C^{-1}\right)    (55)

and

\psi_{\epsilon k}^{-1} \mid Y^*_k, \Omega \sim \text{Gamma}\left(\frac{n}{2} + \alpha_{0k},\; \beta_{0k} + \frac{1}{2}\, d\right)    (56)
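A minimal sketch of these two draws in R (function and argument names are ours; H0k_inv denotes the prior precision (H*_0k)^{-1} from (49)):

# One Gibbs step for (Lambda*_k, 1/psi_eps_k), equations (55)-(56).
draw_Lambda_psi <- function(Ystar_k, Omega_star_k, Lambda0k, H0k_inv, alpha0k, beta0k) {
  OtO  <- crossprod(Omega_star_k)                       # Omega*' Omega*
  C    <- OtO + H0k_inv
  cvec <- solve(C, crossprod(Omega_star_k, Ystar_k) + H0k_inv %*% Lambda0k)
  hat  <- solve(OtO, crossprod(Omega_star_k, Ystar_k))  # OLS estimate
  d    <- drop(t(hat) %*% OtO %*% hat + t(Lambda0k) %*% H0k_inv %*% Lambda0k
               - t(cvec) %*% C %*% cvec)
  psi_inv <- rgamma(1, shape = nrow(Omega_star_k)/2 + alpha0k,
                    rate = beta0k + d/2)                # equation (56)
  U <- chol(solve(C) / psi_inv)                         # psi_eps_k * C^{-1}
  Lambda_k <- drop(cvec) + drop(t(U) %*% rnorm(length(cvec)))  # equation (55)
  list(Lambda_star_k = Lambda_k, psi_eps_k = 1 / psi_inv)
}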

C.4 Obtaining draws of M and Ψ_δ

Given Ω, obtaining draws of M and Ψ_δ becomes a regression problem. We assume that Ψ_δ is a diagonal matrix, i.e. the structural residuals are uncorrelated. Thus estimating the parameters of the structural equation follows the same procedure as obtaining draws of the parameters of the measurement equation. Analogous to the likelihood of observing Y*_k, see equation (48), we get the likelihood of observing η*_l:

\eta^*_l \mid M^*_l, \psi_{\delta l}^{-1}, \Omega \sim N\left[(\Omega^{*T}_l\Omega^*_l)^{-1}\Omega^{*T}_l\,\eta^*_l,\; \psi_{\delta l}\,(\Omega^{*T}_l\Omega^*_l)^{-1}\right]    (57)

with l ∈ {1,...,q1}. The conjugate prior distribution defined in equation (31) for the structural coefficient matrix is:

\left[M^*_l \mid \psi_{\delta l}\right] \sim N\left(M^*_{0l},\; \psi_{\delta l} H^*_{0Ml}\right)    (58)

To derive the posterior distributions of M*_l and ψ_δl^{-1}, we multiply equation (57) with (58) and (30):

p\left(M^*_l, \psi_{\delta l}^{-1} \mid \eta^*_l, \Omega\right) \propto \psi_{\delta l}^{-n/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(M^*_l - \hat M^*_l\right)^T \Omega^{*T}_l\Omega^*_l\left(M^*_l - \hat M^*_l\right)\right]
\times \psi_{\delta l}^{-r_l/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(M^*_l - M^*_{0l}\right)^T \left(H^*_{0Ml}\right)^{-1}\left(M^*_l - M^*_{0l}\right)\right]
\times \left(\psi_{\delta l}^{-1}\right)^{\alpha_{0l}-1} \exp\left(-\beta_{0l}\,\psi_{\delta l}^{-1}\right)    (59)

with \hat M^*_l = (\Omega^{*T}_l\Omega^*_l)^{-1}\Omega^{*T}_l\,\eta^*_l.


Combining the two quadratic forms yields:

p\left(M^*_l, \psi_{\delta l}^{-1} \mid \eta^*_l, \Omega\right) \propto \psi_{\delta l}^{-n/2}\,\psi_{\delta l}^{-r_l/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left[\left(M^*_l - c_M\right)^T C_M \left(M^*_l - c_M\right) + d_M\right]\right] \left(\psi_{\delta l}^{-1}\right)^{\alpha_{0l}-1} \exp\left(-\beta_{0l}\,\psi_{\delta l}^{-1}\right)    (60)

with

C_M = \Omega^{*T}_l\Omega^*_l + \left(H^*_{0Ml}\right)^{-1}

c_M = \left(\Omega^{*T}_l\Omega^*_l + \left(H^*_{0Ml}\right)^{-1}\right)^{-1}\left(\Omega^{*T}_l\Omega^*_l\,\hat M^*_l + \left(H^*_{0Ml}\right)^{-1} M^*_{0l}\right)

d_M = \hat M^{*T}_l\,\Omega^{*T}_l\Omega^*_l\,\hat M^*_l + M^{*T}_{0l}\left(H^*_{0Ml}\right)^{-1}M^*_{0l} - c_M^T\left(\Omega^{*T}_l\Omega^*_l + \left(H^*_{0Ml}\right)^{-1}\right)c_M    (61)

Thus the posterior distributions of M*_l and ψ_δl^{-1} are respectively given by:

p\left(M^*_l \mid \eta^*_l, \psi_{\delta l}^{-1}, \Omega\right) \propto \psi_{\delta l}^{-r_l/2} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(M^*_l - c_M\right)^T C_M \left(M^*_l - c_M\right)\right]    (62)

and

p\left(\psi_{\delta l}^{-1} \mid \eta^*_l, \Omega\right) \propto \left(\psi_{\delta l}^{-1}\right)^{n/2+\alpha_{0l}-1} \exp\left[-\frac{1}{2}\psi_{\delta l}^{-1}\left(2\beta_{0l} + d_M\right)\right]    (63)

which can also be written as:

M^*_l \mid \eta^*_l, \psi_{\delta l}^{-1}, \Omega \sim N\left(c_M,\; \psi_{\delta l}\, C_M^{-1}\right)    (64)

and

\psi_{\delta l}^{-1} \mid \eta^*_l, \Omega \sim \text{Gamma}\left(\frac{n}{2} + \alpha_{0l},\; \beta_{0l} + \frac{1}{2}\, d_M\right)    (65)


D Setting values for hyperprior parameters

In order to enable sampling from the posterior distributions of the unknown elements, we have to set values for the hyperparameters α_0k and β_0k, as well as α_0l and β_0l, of the prior distributions in equations (28) and (30). Since the results of the sampler are very sensitive to these values, they have to be set thoughtfully. We suggest running the Gibbs sampler initially without sampling the parameters in Ψ_ε and Ψ_δ, calculating them instead using the following formulas:

\psi_{\epsilon k} = \frac{1}{n - q}\,\left(Y_k - \Omega\Lambda_k\right)^T \left(Y_k - \Omega\Lambda_k\right)    (66)

\psi_{\delta l} = \frac{1}{n - q}\,\left(\eta_l - \Omega M_l\right)^T \left(\eta_l - \Omega M_l\right)    (67)

In a second run we set the parameters α_0k and β_0k as follows:

\alpha_{0k} = 100, \qquad \beta_{0k} = \psi_{\epsilon k}\cdot 100    (68)

and

\alpha_{0l} = 100, \qquad \beta_{0l} = \psi_{\delta l}\cdot 100    (69)

and include the sampling of the parameters in Ψ_ε and Ψ_δ.
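A minimal sketch of this two-stage choice in R (names are ours): after the initial run, the residual variances from (66) and (67) determine the Gamma hyperparameters of (68) and (69).

# Hyperparameters from the residuals of an initial run, equations (66)-(69).
set_hyperpriors <- function(Y, Omega, Lambda, Eta, M, q) {
  n <- nrow(Y)
  psi_eps <- colSums((Y   - Omega %*% t(Lambda))^2) / (n - q)  # (66), one per k
  psi_del <- colSums((Eta - Omega %*% t(M))^2)      / (n - q)  # (67), one per l
  list(alpha0k = 100, beta0k = 100 * psi_eps,                  # (68)
       alpha0l = 100, beta0l = 100 * psi_del)                  # (69)
}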