
Journal of Econometrics 107 (2002) 175–193
www.elsevier.com/locate/econbase

Limited information likelihood and Bayesian analysis

Jae-Young Kim ∗

Department of Economics, State University of New York-Albany, Albany, NY 12222, USA

Abstract

In this paper, we study how to embed the optimal generalized method of moments (GMM) estimate in a likelihood-based inference framework and the Bayesian framework. First, we derive a limited information likelihood (LIL) under some moment-based limited information available in GMM, based on the entropy theory of I-projection. Second, we study a limited information Bayesian framework in which the posterior is derived from the LIL and a prior. As the LIL enables us to incorporate GMM or related inference methods in the likelihood-based inference framework, it allows us a rich set of practical applications in the Bayesian framework in which the posterior is obtained from a likelihood and a prior. Our results are primarily large sample results, as inference in the underlying GMM framework is usually justified in asymptotics. Investigation of the large sample properties of the posterior derived from the LIL reveals an interesting relation between the Bayesian and the classical distribution theories.

JEL classification: C11; C2; C3; C5

Keywords: Limited information likelihood; Entropy; I-projection; Limited information posterior; Correspondence between classical and Bayesian distribution theories

1. Introduction

In the traditional inference method based on a likelihood function, either Bayesian or classical, it is usually assumed that enough information is available to formulate a likelihood function. In many cases of econometric practice, however, the researcher has only limited information on the data generating mechanism. Also, it is often not desirable to use the fully parameterized true likelihood, even if available, since there might be nuisance components.

∗ Tel.: +1-518-437-4418. E-mail address: [email protected] (J.-Y. Kim).


In this paper, we study a likelihood-based inference framework based on some moment-based limited information available in the generalized method of moments (GMM) framework. We study how to formulate a likelihood function based only on such limited information. We call it a limited information likelihood (LIL). We then allow the Bayesian apparatus to be applied to the limited information framework with the LIL. We study properties of the posterior derived in this framework and discuss an extension of the Bayesian information criterion. Our results are primarily large sample results, as inference in the GMM framework is justified asymptotically. However, our results are valid for finite sample inference if the model and information available in the underlying GMM allow finite sample analysis.

The limited information used to derive the LIL consists of moment conditions that are used in GMM. 1 As such, the LIL method covers a wide range of models that can be analyzed by GMM. Given the moment condition, the LIL is derived from the I-projection theory (Csiszar, 1975; Jaynes, 1982; Jones, 1989, among others). That is, out of a set of probability measures satisfying the same moment condition we choose the one that minimizes the entropy distance, or the Kullback–Leibler information distance (White, 1982), from the true probability measure. The concept of the Kullback–Leibler information distance or entropy distance, and applications of it, are found in a number of works such as Shore and Johnson (1980), Haberman (1984), Zellner and Highfield (1988), Cover and Thomas (1991), Zellner (1994), Soofi (1994), Golan et al. (1996a, b), Imbens (1997), Imbens et al. (1998), and Kitamura and Stutzer (1997). Zellner (1996, 1997, 1998) uses a similar method in the Bayesian framework. Also, textbook level presentations are available in Mittelhammer et al. (2000). There is another method in the statistics and econometrics literature to get a likelihood function based on some limited information, known as the empirical likelihood method (Owen, 1988, 1991; DiCiccio and Romano, 1989, 1990; Hall, 1990; Chen, 1993, 1994; Kolaczyk, 1994; Qin, 1993; Qin and Lawless, 1994; Kitamura, 1997; Imbens et al., 1998, among others). Our LIL method is similar in spirit but dissimilar in approach to the empirical likelihood method. That is, the two approaches use different conditions for the data generating mechanism and different criteria to derive a likelihood function.

As the LIL embeds GMM or related methods in the likelihood-based inference framework, it allows us a rich set of practical applications in the Bayesian framework in which the posterior is obtained from a likelihood and a prior. The resulting Bayesian procedure based on the LIL is as rich as the classical limited information procedures related to GMM, since the two groups use the same limited information. There are other related works in the literature by Zellner (1996, 1997, 1998) and Kim (2000), who discuss how to derive a posterior without knowledge of the likelihood function. Also, Inoue (2000) uses an empirical likelihood to derive a posterior for a model with iid observations.

1 The moment condition used in our approach is about a moment of the 'optimal minimum distance' in GMM, which is the same as the GMM objective function at the true value of the parameter. As is shown later, this moment condition implies the first order moment that motivates GMM.


The approach of this paper shares the spirit of the Bayesian method of moments (BMOM) of Zellner (1996, 1997, 1998) in that it develops a Bayesian framework based on some limited information. Both approaches use entropy maximization in deriving the results. In a sense, the results of this paper extend BMOM to the general situation of GMM for deriving a limited information posterior. On the other hand, while BMOM does not use a specific likelihood, the approach of this paper derives an LIL for which the Bayesian apparatus is applicable.

Investigation of the large sample properties of the posterior density derived from the LIL reveals an interesting relation between the Bayesian and the classical distribution theories. It is shown that the posterior from an LIL is asymptotically normal with the first moment of the distribution equal to the GMM estimator and the second moment equal to the asymptotic variance–covariance matrix of the GMM estimator, or the Fisher information matrix of the LIL. This finding implies that the asymptotic distribution of the posterior is the mirror image of the corresponding result in the classical distribution theory. This large sample correspondence between the Bayesian and the classical distribution theories, however, holds only under a certain sufficient stationarity condition. In general, this correspondence does not hold in the presence of nonstationarity. That is, different from the classical distribution theory, Bayesian distribution theory is robust to the existence of nonstationarity, as is clear from our analysis in this paper. These findings confirm Sims' (1988) conjecture by a formal analysis. Kwan (1999) derived a large sample correspondence similar to ours by a different method, under conditions that require sufficient stationarity.

The discussion of the paper proceeds in Section 2 with a preliminary discussion on model characteristics and moment conditions. In Section 3, a limited information likelihood is derived, and some properties of the LIL are discussed. The Bayesian framework based on the LIL is studied in Section 4. A posterior based on the limited information likelihood is derived, and its properties are studied. Section 5 concludes the paper.

2. Moment conditions and limited information

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\{\mathcal{F}_t\}_{t \geq 0}$ be an increasing family of sub-$\sigma$-fields of $\mathcal{F}$. Let $x_t$ be a $q$-vector of stochastic processes defined on $(\Omega, \mathcal{F}, P)$ that is adapted to $\mathcal{F}_t$. Denote by $X_n(\bar\omega) = (x_1(\bar\omega), \ldots, x_n(\bar\omega))$, for $\bar\omega \in \Omega$, an $n$-segment of a particular realization of $\{x_t\}$. Let $\theta$ be a $k \times 1$ vector of parameters from $\Theta \subset \mathbb{R}^k$. Let $\mathcal{G}$ be the Borel $\sigma$-algebra of $\Theta$. Notice that $(\Theta, \mathcal{G})$ is a measurable space.

Let $h(x_t, \theta)$ be an $r \times 1$ vector-valued function, $h : (\mathbb{R}^q \times \mathbb{R}^k) \to \mathbb{R}^r$. The function $h(x_t, \theta)$ characterizes an econometric relation

$$h(x_t, \theta_0) = w_t \qquad (2.1)$$

for a $\theta_0 \in \Theta$, where $w_t$ is an $r$-vector stochastic disturbance process satisfying certain conditions (Conditions (A1) and (A2) below). We begin with some standard conditions in GMM of Hansen (1982).

(A1) $\{w_t, -\infty < t < \infty\}$ is stationary and ergodic.


To specify another property of $w_t$, let

$$v_s = E[w_{t+s} \mid w_t, w_{t-1}, \ldots] - E[w_{t+s} \mid w_{t-1}, w_{t-2}, \ldots] \quad \text{for } s \geq 0.$$

Notice that the index $t$ does not affect basic properties of $v_s$ since $\{w_t\}$ is a stationary process. By an iterated expectations argument, we can establish that $\{v_s\}$ is a martingale difference sequence.

(A2) $E_P[w_t w_t']$ exists and is finite, $E_P[w_{t+s} \mid w_t, w_{t-1}, \ldots]$ converges in mean square to zero, and $\sum_{s=0}^{\infty} E_P[v_s' v_s]^{1/2}$ is finite.

(A3) (a) $h(x, \cdot)$ is continuously differentiable in $\Theta$ for each $x \in \mathbb{R}^q$. (b) $h(\cdot, \theta)$ and $\partial h(\cdot, \theta)/\partial\theta$ are Borel measurable for each $\theta \in \Theta$.

The conditions (A1) and (A2) cover a broad class of models (Hansen, 1982). Among other things, (A2) implies that 2

$$E_P[h(x_t, \theta_0)] = 0, \qquad (2.2)$$

an $r \times 1$ vector of moment conditions. Now, letting

$$R_w(s) = E_P[w_{s+1} w_1'],$$

Assumptions (A1) and (A2) ensure that

$$S = \sum_{s=-\infty}^{\infty} R_w(s) \qquad (2.3)$$

is well defined and finite. The matrix $S$ above is sometimes interpreted as a long-run variance of $w_t = h(x_t, \theta_0)$ and can be alternatively written as

$$S = \sum_{\tau=-\infty}^{\infty} E_P[h(x_t, \theta_0)\, h(x_{t-\tau}, \theta_0)']. \qquad (2.3')$$

Consistent estimators of $S$ are provided by Newey and West (1987), Gallant (1987), Andrews (1991), and Andrews and Monahan (1992). The above results (2.2) and (2.3) can be interpreted as the first and second moments of $w_t$ implied by the probability measure $P$. Now, let

$$g_n(X_n, \theta) = \frac{1}{n} \sum_{t=1}^{n} h(x_t, \theta).$$

The matrix $S$ in (2.3) and (2.3') is the asymptotic variance of $\sqrt{n}\, g_n(X_n, \theta_0)$:

$$S = \lim_{n \to \infty} E_P[n\, g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)']. \qquad (2.3'')$$
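As an aside for readers who want to compute the long-run variance in practice, the following Python sketch implements a Bartlett-kernel (Newey–West type) estimator of $S$ from the moment contributions $h(x_t, \theta)$. The function name, the rule-of-thumb bandwidth, and the AR(1) toy check are my own illustrative choices, not part of the paper.

```python
import numpy as np

def newey_west_S(h, bandwidth=None):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance S of the
    r-vector moment contributions h[t] = h(x_t, theta), t = 1, ..., n."""
    h = np.asarray(h, dtype=float)
    n, r = h.shape
    h = h - h.mean(axis=0)                      # demean the contributions
    if bandwidth is None:
        bandwidth = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))  # rule-of-thumb lag length
    S = h.T @ h / n                             # lag-0 autocovariance
    for s in range(1, bandwidth + 1):
        w = 1.0 - s / (bandwidth + 1.0)         # Bartlett weight
        gamma = h[s:].T @ h[:-s] / n            # lag-s autocovariance
        S += w * (gamma + gamma.T)
    return S

# Toy check: for an AR(1) process w_t = 0.5 w_{t-1} + e_t with unit innovation
# variance, the long-run variance is 1 / (1 - 0.5)^2 = 4.
rng = np.random.default_rng(0)
n = 20000
w = np.zeros(n)
for t in range(1, n):
    w[t] = 0.5 * w[t - 1] + rng.standard_normal()
print(newey_west_S(w.reshape(-1, 1), bandwidth=50))   # approaches 4 as the bandwidth grows
```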

Under Assumptions (A1)–(A2), the probability measure $P$ implies the following moment condition (2.4):

2 This implication can be seen by applying an iterated expectations argument and noting that $E[w_t] = E[h(x_t, \theta_0)]$. See Hansen (1982, p. 1040).


Lemma 1. Under Assumptions (A1) and (A2), it is true that

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0)] = r. \qquad (2.4)$$

The long-run variance matrix $S$ in (2.4) can be replaced by a consistent estimator of it, denoted by $S_n$: 3

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta_0)' S_n^{-1} g_n(X_n, \theta_0)] = r. \qquad (2.4')$$

The moment condition (2.4) combines the first and the second order properties of $w_t$, (2.2) and (2.3'') or (2.3). This becomes clearer by noting that

$$\lim_{n \to \infty} \operatorname{tr}\!\left(S^{-1} E_P[n\, g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)']\right) = r.$$
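The following small Monte Carlo sketch (an illustrative, assumption-laden toy example, not taken from the paper) checks the content of (2.4): with i.i.d. standard normal data and the moments $x_t$ and $x_t^2 - 1$ evaluated at the true parameter, the average of $n\, g_n' S^{-1} g_n$ across replications should be close to $r = 2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 2000
S = np.diag([1.0, 2.0])          # Var(x) = 1, Var(x^2 - 1) = 2, Cov = 0 for x ~ N(0, 1)
S_inv = np.linalg.inv(S)

vals = []
for _ in range(reps):
    x = rng.standard_normal(n)
    g = np.array([x.mean(), (x ** 2 - 1.0).mean()])   # g_n(X_n, theta_0), r = 2
    vals.append(n * g @ S_inv @ g)

print(np.mean(vals))   # close to r = 2, illustrating E_P[n g_n' S^{-1} g_n] -> r
```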

Notice that both sets of moment conditions, the moment (2.4) and the moments (2.2) with (2.3), are implied by Assumptions (A1)–(A2). GMM is motivated by the moment conditions (2.2) and (2.3); in fact, however, GMM uses the quadratic form of $g_n(X_n, \theta)$ in (2.4) for optimal estimation. Only the (traditional) method of moments uses the moment (2.2) directly, which is valid for the case of exact identification. For a similar reason and other reasons explained below (Lemmas 2 and 3), we utilize moment (2.4) in the following discussions. Notice also that Assumptions (A1)–(A2) form a sufficient condition for the moment (2.4) or the moments (2.2)–(2.3). Since the moment condition (2.4) is weaker than the moments (2.2)–(2.3), as is shown in Lemma 2 below, we could have weaker condition(s) than (A1)–(A2) to get (2.4). We do not investigate such possible conditions in this paper, although they might be a useful subject to study. As our main concern in this paper is how to embed the optimal GMM estimate in a likelihood-based/Bayesian inference framework, it is natural to use the standard assumptions of the GMM framework, Assumptions (A1)–(A2).

A more detailed comparison of the two sets of moment conditions, the moment (2.4) and the moments (2.2)–(2.3), is provided in the following. For each $\theta \in \Theta$ let $\wp_1(\theta)$ and $\wp_2(\theta)$ be two sets of probability measures such that

$$\wp_1(\theta) = \Big\{P : \lim_{n \to \infty} E_P[n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)] = r\Big\}, \qquad (2.5)$$

$$\wp_2(\theta) = \Big\{P : E_P[h(x_t, \theta)] = 0 \ \text{and} \ \lim_{n \to \infty} E_P[n\, g_n(X_n, \theta)\, g_n(X_n, \theta)'] = S\Big\}. \qquad (2.6)$$

We can show that $\wp_2 \subset \wp_1$. That is, the moment (2.4) allows a broader class of probability measures than the moments (2.2)–(2.3).

Lemma 2. Let $g_n(X_n, \theta) = n^{-1}\sum_{t=1}^{n} h(x_t, \theta)$ and $S$ be as defined in (2.3) or (2.3''). Then, it is true that $\wp_2 \subset \wp_1$.

Moreover, under some conditions we can show that for the measure $P \in \wp_1(\theta)$ it is true that $E_P[h(x_t, \theta)] = 0$. That is, the moment condition (2.4) implies the moment (2.2). To establish this result, we adopt the following condition:

3 In practice, $S_n$ can be obtained by iteration as in GMM.


(A4) Let $D_n(\theta) = (\partial g_n(X_n, \theta)/\partial\theta)'$. Assume that $D_n(\theta)$ converges to $D(\theta)$ in $P$-measure, where $D$ is an $r \times k$ finite matrix with rank $k$.

Lemma 3. Assume that (A3)–(A4) hold. Let $x_t$ be stationary and ergodic. Assume also that $T(g_n(X_n, \theta)) \equiv \int g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)\, dP$ and $\partial T(g_n(X_n, \theta))/\partial\theta'$ are continuous in $g_n(X_n, \theta)$ in $P$-measure. Then, for $P \in \wp_1(\theta)$ it is true that $E_P[h(x_t, \theta)] = 0$ for $-\infty < t < \infty$.

On the other hand, Assumptions (A1)–(A2) provide sufficient conditions for applying a central limit theorem to the process $w_t$ (e.g., Gordin, 1969). A central limit theorem under Assumptions (A1)–(A2) implies that

$$S^{-1/2}\sqrt{n}\, g_n(X_n, \theta_0) \stackrel{d}{\to} Z, \qquad (2.7)$$

where $Z \sim N(0, I_r)$. Then, by applying the dominated convergence theorem to the sequence $\zeta_n(\theta_0) = n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0)$, we have

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0)] = E_P\Big[\lim_{n \to \infty} n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0)\Big] = r, \qquad (2.8)$$

which is the same as (2.4).

Remark 1: On the possibility of finite sample inference. (a) The moment conditions (2.4) or (2.2)–(2.3) are derived from the standard assumptions of GMM, Assumptions (A1) and (A2). As inference in the GMM framework based on these assumptions is justified asymptotically, our results are primarily for large sample analysis. However, if the model and information available in the underlying GMM allow finite sample analysis, our results are valid for finite sample inference.

(b) For example, suppose that we have more information on serial correlation of $h(x_t, \theta_0)$. Taking account of serial correlation, write the smoothed linear function of $h(x_t, \theta)$ as in Smith (1997):

$$h_t^{\omega}(\theta) = \sum_{j=-m}^{m} \omega(j)\, h(x_{t-j}, \theta), \quad t = 1, \ldots, n, \qquad (2.9)$$

where $\omega(\cdot)$ is a weight function defined on the real line. The lag truncation $m$ may be viewed as reflecting the order of serial correlation in the process $h(x_t, \theta_0)$. If $h_t^{\omega}(\theta_0)$ is serially uncorrelated with a finite truncation parameter $m$, the smoothing in (2.9) is 'complete'. When the function $\omega(\cdot)$ is chosen such that $h_t^{\omega}(\theta)$ is a nonsingular transformation of $h(x_t, \theta)$ for $t = 1, \ldots, n$, the moment conditions (2.2) may be equivalently stated as

$$E_P[h_t^{\omega}(\theta_0)] = 0. \qquad (2.10)$$

We assume that the variance matrix of $h_t^{\omega}(\theta)$, denoted by $S_t^{\omega}$, exists:

$$S_t^{\omega} = E[h_t^{\omega}(\theta)\, h_t^{\omega}(\theta)']. \qquad (2.11)$$


The moment conditions (2.10) and (2.11) are the corresponding conditions of (2.2) and (2.3), respectively. Likewise, we have the following condition that corresponds to (2.4):

$$E_P[h_t^{\omega}(\theta)'\, (S_t^{\omega})^{-1}\, h_t^{\omega}(\theta)] = r. \qquad (2.12)$$

As we shall see in Section 3, if the smoothing in (2.9) is 'complete', the moments (2.10)–(2.11) or (2.12) allow finite sample analysis.
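A minimal sketch of the smoothing in (2.9), assuming the simple uniform weight $\omega(j) = 1/(2m+1)$; the weight choice, the restriction to interior dates, and the white-noise example are illustrative assumptions rather than recommendations of the paper.

```python
import numpy as np

def smoothed_moments(h, m):
    """h_t^w = sum_{j=-m}^{m} w(j) h_{t-j} with the uniform weight w(j) = 1/(2m+1)
    (an illustrative choice).  Only interior dates t = m, ..., n-m-1 are returned,
    so the formula is applied exactly; boundary handling is left to the user."""
    h = np.asarray(h, dtype=float)
    n, r = h.shape
    w = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    return np.array([w @ h[t - m:t + m + 1] for t in range(m, n - m)])

# Example: for white-noise moment contributions, smoothing with m = 2 shrinks the
# variance by roughly a factor of 1 / (2m + 1).
rng = np.random.default_rng(4)
h = rng.standard_normal((1000, 1))
hw = smoothed_moments(h, m=2)
print(h.var(), hw.var())    # the second is roughly one fifth of the first
```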

3. The limited information likelihood

The moment conditions (2.2)–(2.3) and (2.4) form a set of limited information on the data generation mechanism. Based on this limited information, we can construct a kind of semi-parametric likelihood. In this section, we study how to obtain such a likelihood. Our approach is based on entropy theory: we seek the likelihood that is closest to the (unknown) true likelihood in an information distance. As the moments of limited information are the same as those in the corresponding GMM framework, we are studying how to embed the GMM estimate in a likelihood-based inference framework. We allow the Bayesian apparatus to be applied to this framework in the following sections.

For each $\theta \in \Theta$, define a set of probability measures that are absolutely continuous with respect to $P$ such that

$$\mathcal{Q}(\theta) = \Big\{Q : \lim_{n \to \infty} E_Q[n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)] = r\Big\}. \qquad (3.1)$$

Therefore, $Q \in \mathcal{Q}$ implies the same moment condition as $P$ satisfying (2.4). Usually, such a $Q$ is not unique. The long-run variance matrix $S$ can be replaced by a consistent estimator of it. Selection of a particular probability measure in $\mathcal{Q}(\theta)$ is a kind of linear inverse problem (Jones, 1989). The following convex optimization problem for a convex set $\mathcal{Q}$ yields a solution $Q^* \in \mathcal{Q}$ that is the closest to the true probability measure $P$ in the entropy distance, the Kullback–Leibler information criterion distance (White, 1982), or the I-divergence distance (Csiszar, 1975):

$$Q^*(\theta) = \operatorname*{argmin}_{Q \in \mathcal{Q}(\theta)} I(Q\,\|\,P) \equiv \operatorname*{argmin}_{Q \in \mathcal{Q}(\theta)} \int \ln(dQ/dP)\, dQ, \qquad (3.2)$$

where $dQ/dP$ is the Radon–Nikodym derivative (or density) of $Q$ with respect to $P$. 4 Thus, $Q^*$ is the solution of the constrained minimization where the constraint is given with respect to the moments implied in the measure $P$. We call $Q^*$ the I-projection of $P$ on $\mathcal{Q}$, following Csiszar (1975). The intuition is that $Q^*$ is the projection of $P$ on an information set 'spanning' $\mathcal{Q}$ which has the closest distance from $P$ to the set $\mathcal{Q}$. Existence of the I-projection is generally ensured by a minor condition (Theorem 2.1 of Csiszar, 1975). 5 We denote by $q^*_P(\theta) = dQ^*(\theta)/dP$ the Radon–Nikodym derivative of $Q^*(\theta)$ with respect to $P$.

4 Although the I-divergence distance (3.2) is not a metric, there exist certain analogies between properties of probability measures and Euclidean geometry, in which I-divergence plays the role of squared Euclidean distance. See Csiszar (1975).

5 If the convex set $\mathcal{Q}$ is closed in the topology of the variation distance $|Q - R| = \int |q_P - r_P|\, dP$, where $P$ is any probability measure such that $Q$ and $R$ are absolutely continuous with respect to $P$ ($q_P = dQ/dP$ and $r_P = dR/dP$), the I-projection always exists (Theorem 2.1 of Csiszar, 1975).


We call $q^*_P(\theta)$ a limited information density or the I-projection density, following Csiszar (1975). The solution of (3.2), $q^*_P(\theta)$, is uniform in the $\theta \in \Theta$ that satisfy the moment in (3.1) (Theorem 1). Therefore, $q^*_P(\theta)$ can be interpreted as a likelihood of $\theta$. Thus, we call $q^*_P(\theta)$ a limited information likelihood (LIL) or the I-projection likelihood.

As demonstrated by Kullback (1959), minimization problems of type (3.2) play a basic role in the information-theoretic approach to statistics. The concept of the entropy distance and the associated min/max problems are used by several researchers for different problems, such as Haberman (1984), Golan et al. (1996a, b), Imbens (1997), Imbens et al. (1998), and Kitamura and Stutzer (1997). Zellner's (1996, 1997, 1998) BMOM uses a similar method.
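To make the I-projection idea concrete, here is a small discrete toy example of my own (not from the paper): when the constraint set consists of distributions with a fixed mean of a statistic $f$, the I-projection of a base distribution $p$ takes the exponential-tilting form $q_i \propto p_i e^{\lambda f(x_i)}$, with $\lambda$ chosen so the constraint holds. This mirrors the exponential form the LIL takes in (3.3) below.

```python
import numpy as np
from scipy.optimize import brentq

def i_projection(p, f, c):
    """I-projection of a discrete distribution p (with statistic values f) onto the
    set {Q : E_Q[f] = c}.  The solution is the exponentially tilted distribution
    q_i proportional to p_i * exp(lambda * f_i)."""
    p, f = np.asarray(p, float), np.asarray(f, float)

    def tilted(lam):
        w = lam * f
        q = p * np.exp(w - w.max())    # subtract the max exponent for numerical stability
        return q / q.sum()

    # Choose lambda so the tilted mean equals c (c must lie in the range of f).
    lam = brentq(lambda l: tilted(l) @ f - c, -50.0, 50.0)
    return tilted(lam)

p = np.full(6, 1.0 / 6.0)              # fair die as the reference measure P
f = np.arange(1, 7, dtype=float)       # statistic f(x) = x
q = i_projection(p, f, 4.5)            # closest-to-P distribution with mean 4.5
print(q, q @ f)                        # tilted probabilities; the mean is 4.5
```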

A solution of the constrained minimization problem (3.2) can be obtained by following the analysis of Csiszar (1975). Thus, for $n = 1, \ldots, \infty$ let $q_{P,n}(\omega, \theta) = q_{P,n}(X_n(\omega), \theta)$ be such that

$$q_{P,n}(\omega, \theta) = C_n \exp\{c\, n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)\}, \qquad (3.3)$$

where $c\,(< 0)$ and $C_n$ are constants. As is shown below (Theorem 2), $c = -\tfrac12$ is a desirable choice. In practice, when $S$ is unknown we can replace it with a consistent estimator $S_n$.

Theorem 1. Assume that the I-projection of $P$ on $\mathcal{Q}$, denoted by $Q^*$, exists. Let $\{Q_n, n = 1, \ldots, \infty\}$ be a sequence of probability measures that are absolutely continuous with respect to $P$ such that $dQ_n/dP = q_{P,n}$. Assume that such $\{Q_n, n = 1, \ldots, \infty\}$ exists. Then, it is true that (a) $I(Q_n(\theta)\|P) - I(Q^*(\theta)\|P) \to 0$ for all $\theta$ such that $\sqrt{n}(g_n(X_n, \theta) - g_n(X_n, \theta_0)) = O_p(1)$, and (b) $I(Q_n(\theta)\|P) - I(Q^*(\theta)\|P) \to 0$ for all $\theta \in \Theta$ if $\{x_t\}$ is a stationary and ergodic process.

Theorem 1 implies that $q_{P,n}$ is a finite sample analog of $q^*_P$. This is true even in the case of $x_t$ being nonstationary (part (a)). An example of the set of $\theta$ such that $\sqrt{n}(g_n(X_n, \theta) - g_n(X_n, \theta_0)) = O_p(1)$ (in part (a)) can be found in Section 4, a shrinking neighborhood. Notice that for $q_{P,n}$ in (3.3) it is true that

$$E_{Q_n}[n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)] = \int \{n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)\}\, dQ_n = r. \qquad (3.4)$$

We can show that the I-projection likelihood $q_{P,n}$ in (3.3) satisfies a condition for correct specification in asymptotics. Consider the following definition of correct specification for a likelihood:

Definition 1 (Correct specification). Let $q$ be a quasi-likelihood and $p$ be the true likelihood. Assume that $\ln q(X_n, \theta)$ is twice continuously differentiable with respect to $\theta$. We say that $q$ is a correct specification of $p$ if

$$E_P\left[\left(\frac{\partial \ln q(X_n, \theta_0)}{\partial\theta'}\right)\left(\frac{\partial \ln q(X_n, \theta_0)}{\partial\theta'}\right)'\right] = -E_P\left[\frac{\partial^2 \ln q(X_n, \theta_0)}{\partial\theta\,\partial\theta'}\right]. \qquad (3.5)$$


The condition (3.5) is evidently a condition for correct specification of a quasi-likelihood $q(X_n, \theta)$ in the second order. The above definition of correct specification is based on the following property of the likelihood $p$:

Lemma 4 (Information matrix equality). Assume that $\ln p(X_n, \theta)$ is twice continuously differentiable with respect to $\theta$. Then, it is true that

$$E_P\left[\left(\frac{\partial \ln p(X_n, \theta_0)}{\partial\theta'}\right)\left(\frac{\partial \ln p(X_n, \theta_0)}{\partial\theta'}\right)'\right] = -E_P\left[\frac{\partial^2 \ln p(X_n, \theta_0)}{\partial\theta\,\partial\theta'}\right]. \qquad (3.6)$$

Notice that the condition of twice continuous differentiability of $\ln p(X_n, \theta)$ alone is sufficient for the property (3.6). As in White (1982), we can determine whether or not an LIL $q$ is a correct specification for $p$ based on the condition (3.5). We can show that the I-projection likelihood $q_{P,n}$ in (3.3) asymptotically satisfies the condition for correct specification (3.5).

Theorem 2. Let $q_n = q_{P,n}$ be defined as in (3.3), where $c = -\tfrac12$. Then, under (A3)–(A4) it is true that

$$\lim_{n \to \infty} E_P\left[\frac{1}{n}\left(\frac{\partial \ln q_n(X_n, \theta_0)}{\partial\theta'}\right)\left(\frac{\partial \ln q_n(X_n, \theta_0)}{\partial\theta'}\right)'\right] = \lim_{n \to \infty} -E_P\left[\frac{1}{n}\frac{\partial^2 \ln q_n(X_n, \theta_0)}{\partial\theta\,\partial\theta'}\right]. \qquad (3.7)$$

Remark 2: A further note on the LIL. One might think that the likelihood function (3.3) could be derived in an alternative, 'simpler' way from the asymptotic result (2.7) or (2.8), which is a direct consequence of the given assumptions, Assumptions (A1)–(A2). However, (A1) and (A2) imply only that the asymptotic statement is true at $\theta = \theta_0$, that is,

$$\sqrt{n}\, g_n(\theta_0) \stackrel{d}{\to} N(0, S).$$

For $\theta \neq \theta_0$ this is not true in general; for example, for $\theta \neq \theta_0$ the asymptotic variance of $\sqrt{n}\, g_n(\theta)$ is not $S$. Therefore, the asymptotic result as in (2.8) alone would not enable us to derive a likelihood function, a function of $\theta$. We can solve this problem by introducing the I-projection framework as in the above discussion. In the I-projection framework, we consider at each $\theta \in \Theta$ a set of probability measures that satisfy the same moments as the true probability measure, and we choose the one that is closest to the true probability measure. The chosen probability measure defined on $\Theta$ is the LIL (3.3).

Now, we define an estimator that maximizes the LIL. Let $L_n(\theta) = \log q_{P,n}(X_n, \theta)$.

Definition 2. The estimator $\hat\theta_n$ such that

$$\hat\theta_n = \operatorname*{argmax}_{\theta \in \Theta} L_n(\theta) = \operatorname*{argmin}_{\theta \in \Theta}\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta) \qquad (3.8)$$


is defined as the limited information maximum likelihood estimator (LIMLE), where $S_n$, a consistent estimator of $S$, replaces $S$ in $q_{P,n}$ if $S$ is not available.

As the LIL embeds the optimal GMM estimate in a likelihood-based inference framework, the associated estimator, the LIMLE, is the same as the optimal GMM estimator of Hansen (1982). Moreover, under a sufficient stationarity condition, the information matrix of the LIL (3.3) is the same as the asymptotic variance–covariance matrix of the GMM estimator (Lemma 5 in Section 4). Note that, as is clear from Definition 2, we are not really proposing a new estimator in this paper.
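Because the LIMLE coincides with the optimal GMM estimator, it can be computed with any standard optimizer. The following Python sketch estimates a scalar parameter from an over-identified linear instrumental-variables moment condition $h(x_t, \theta) = z_t(y_t - d_t\theta)$ using a two-step weighting scheme; the data-generating process, the iid-based weighting matrix, and the use of scipy's bounded scalar optimizer are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
n, theta0 = 2000, 1.5
z = rng.standard_normal((n, 2))                      # two instruments (r = 2, k = 1)
v = rng.standard_normal(n)
u = 0.8 * v + rng.standard_normal(n)                 # endogeneity: u correlated with v
d = z @ np.array([1.0, 0.5]) + v                     # endogenous regressor
y = d * theta0 + u

def h(theta):
    return z * (y - d * theta)[:, None]              # (n, 2) moment contributions

def objective(theta, S_inv):
    g = h(theta).mean(axis=0)                        # g_n(X_n, theta)
    return g @ S_inv @ g                             # quadratic form minimized by the LIMLE/GMM

# Step 1: identity weighting gives a preliminary consistent estimate.
step1 = minimize_scalar(objective, args=(np.eye(2),), bounds=(-10, 10), method="bounded")
# Step 2: re-weight with a consistent estimate of S (iid case: sample covariance of h).
H = h(step1.x)
S_n = H.T @ H / n
step2 = minimize_scalar(objective, args=(np.linalg.inv(S_n),), bounds=(-10, 10), method="bounded")
print(step1.x, step2.x)                              # both are close to theta0 = 1.5
```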

Remark 3: LIL for the information sets implied in (2.10)–(2.12). If the available information set on the process $w_t = h(x_t, \theta_0)$ is as in (2.10)–(2.12), and if the smoothing in (2.9) is complete in the sense explained in Remark 1(b), it is possible to derive an LIL that is valid for finite samples. Following the same analysis as above, we obtain the following form for such a limited information likelihood:

$$q_{P,n}(\omega, \theta) = C_n \exp\Big\{c \sum_{t=1}^{n} h_t^{\omega}(x_t, \theta)'\, (S_t^{\omega})^{-1}\, h_t^{\omega}(x_t, \theta)\Big\}, \qquad (3.9)$$

where $c\,(< 0)$ and $C_n$ are constants.

4. A limited information posterior

The traditional Bayesian approach requires knowledge of the full likelihood function, since the posterior is obtained from the likelihood and a prior. As the full likelihood is often not available, this aspect of the Bayesian approach is an important drawback for practical applications. Also, sometimes the full model or the full likelihood may involve nuisance components that are not of interest. In this case, some type of semi-parametric procedure might be more appropriate for practical applications.

The limited information likelihood studied in the previous section enables us to derive a posterior even when the full likelihood is not available. Also, the LIL enables us to consider only the parts of interest, ignoring the nuisance components of the full likelihood, if any. In this section, we study the posterior derived from the LIL.

For notational convenience, write $q_n(X_n, \theta) = q_{P,n}(X_n, \theta)$ for the LIL (3.3). Let $\pi(\theta)$ be a prior density of $\theta$. Then, a posterior can be derived from Bayes' rule:

$$\pi_n(\theta \mid X_n) = q_n(X_n)^{-1}\{\pi(\theta)\, q_n(X_n, \theta)\}, \qquad (4.1)$$

where $q_n(X_n) = \int_{\Theta} \pi(\theta)\, q_n(X_n, \theta)\, d\theta$ is a normalizing factor. Whenever the LIL is derivable, we can construct a limited information posterior (4.1). As in the case of BMOM explained in Zellner (1996), the density (4.1) constructed from an I-projection likelihood not only serves as a post-data density of $\theta$ but also enables us to derive predictive densities of future observations. See Zellner (1996). 6
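As a hedged numerical illustration of (4.1), the sketch below evaluates the limited information posterior on a grid by combining the LIL kernel $\exp\{-\tfrac12 n\, g_n' S_n^{-1} g_n\}$ with a prior and normalizing numerically. The moment functions (a unit-variance location model) and the $N(0, 10^2)$ prior are my own choices for the example, not prescriptions of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta0 = 400, 0.7
x = rng.normal(theta0, 1.0, n)

def g_n(theta):
    # Two moments for a unit-variance location model: E[x - theta] = 0, E[(x - theta)^2 - 1] = 0.
    return np.array([(x - theta).mean(), ((x - theta) ** 2 - 1.0).mean()])

def S_n(theta):
    H = np.column_stack([x - theta, (x - theta) ** 2 - 1.0])
    H = H - H.mean(axis=0)
    return H.T @ H / n                                           # consistent estimate of S

grid = np.linspace(theta0 - 0.5, theta0 + 0.5, 801)
log_post = np.empty_like(grid)
for i, th in enumerate(grid):
    g = g_n(th)
    log_lil = -0.5 * n * g @ np.linalg.inv(S_n(th)) @ g          # log q_n(X_n, theta) up to a constant
    log_prior = -0.5 * (th / 10.0) ** 2                          # N(0, 10^2) prior, up to a constant
    log_post[i] = log_lil + log_prior

step = grid[1] - grid[0]
post = np.exp(log_post - log_post.max())
post /= post.sum() * step                                        # numerical analogue of q_n(X_n) in (4.1)
print(grid[np.argmax(post)], (grid * post).sum() * step)         # posterior mode and mean, near theta0
```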

Now, we study the behavior of the posterior $\pi_n(\theta \mid X_n)$ in the large sample context.

6 One of the editors has reminded the author of this point. The author appreciates it.


As can be shown in the sequel, the posterior distribution is asymptotically normal with the first moment of the distribution equal to $\hat\theta_n$ and the second moment equal to $[-L_n''(\hat\theta_n)]^{-1}$, where $L_n''(\theta)$ is the second derivative of $L_n(\theta) = \log q_n(\theta)$. This result implies that the asymptotic distribution of the posterior is the mirror image of the distribution of $\hat\theta_n$ in the case of stationarity. (A detailed discussion is found in Section 4.2.) However, different from the classical distribution theory, Bayesian distribution theory is robust to the existence of nonstationarity.

4.1. Asymptotic normality of the posterior

We first introduce a set of nontrivial conditions on $q_n(X_n, \theta)$. Let $N(\hat\theta_n, \delta_n)$, $n = 1, \ldots, \infty$, be such that

$$N(\hat\theta_n, \delta_n) = \{\theta : |\theta_1 - \hat\theta_{n1}|^2/\delta_{n1}^2 + \cdots + |\theta_k - \hat\theta_{nk}|^2/\delta_{nk}^2 < 1\}, \qquad (4.2)$$

where $\hat\theta_{ni}$ is the $i$th element of $\hat\theta_n$, $\delta_n = (\delta_{n1}, \ldots, \delta_{nk})'$ is a $k$-vector of real numbers, and $|\cdot|$ denotes the usual Euclidean norm. We consider a sequence $\{\delta_n\}$ such that $\delta_n$ becomes smaller and smaller as $n \to \infty$, so that $N(\hat\theta_n, \delta_n)$ shrinks as $n$ gets larger.

Notice that the log-likelihood $L_n(\theta) = \log q_n(\theta)$ is twice differentiable with respect to $\theta$ under (A3). Denote by $L_n''(\theta)$ the second derivative of $L_n(\theta)$. Notice that with $c = -\tfrac12$,

$$L_n''(\theta) \approx -n \left(\frac{\partial g_n(X_n, \theta)}{\partial\theta'}\right)' S_n^{-1} \left(\frac{\partial g_n(X_n, \theta)}{\partial\theta'}\right), \qquad (4.3)$$

where $S_n$, a consistent estimator of $S$, replaces $S$ in $q_{P,n}$.

Now, consider the following conditions (B1) and (B2).

(B1) (a) Let $\eta_n(\hat\theta_n(\omega), \delta_n) = \sup_{\theta \in N(\hat\theta_n, \delta_n)} \|[L_n''(\hat\theta_n)]^{-1}[L_n''(\theta) - L_n''(\hat\theta_n)]\|$. There exists a positive sequence $\{\delta_n\}_{n=1}^{\infty}$ such that $\lim_{n \to \infty} P[\eta_n(\hat\theta_n(\omega), \delta_n) < \varepsilon] = 1$ for each $\varepsilon > 0$.

(b) Let $\Sigma_n = [-L_n''(\hat\theta_n)]^{-1}$. For $\delta_n$ satisfying (B1)(a), the absolute value of each element of the vector $\Sigma_n^{-1/2}\delta_n$ tends to infinity as $n \to \infty$ in $P$-measure.

Condition (B1)(a) is a smoothness or equicontinuity condition on $L_n''(\theta)$ in the neighborhood $N(\hat\theta_n, \delta_n)$. Condition (B1)(b) guarantees that $N(\hat\theta_n, \delta_n)$ is wide enough to cover the domain of the posterior of $\theta$. Conditions (B1) and (B2) (below), or similar conditions, are used by Sweeting (1992), Sweeting and Adekola (1987), and Kim (1998), among others, to study the behavior of posteriors. In particular, these authors adopt shrinking neighborhoods like $N(\hat\theta_n, \delta_n)$ for specifying the conditions, which is necessary to handle the case of nonstationarity in $x_t$. Also, the probability measure on $\Omega$ ($P$-measure) used for the probability statements in conditions (B1) and (B2) is not binding, as we study the behavior of the posterior, or the data-conditioned likelihood function.

The following condition is about asymptotic posterior concentration of $\theta$ in the neighborhood $N(\hat\theta_n, \delta_n)$, in the sense of Berk (1970):


(B2) For $\delta_n$ satisfying (B1),

$$\int_{\Theta \setminus N(\hat\theta_n, \delta_n)} \pi_n(\theta \mid X_n)\, d\theta \to 0$$

as $n \to \infty$ in $P$-measure, i.e., $\theta$ concentrates in $N(\hat\theta_n, \delta_n)$ as $n \to \infty$.

Condition (B2) has much to do with consistency of $\hat\theta_n$ in the classical sampling theory framework, as is shown in Kim (1998).

(B3) The prior density $\pi(\theta)$ is continuous in $\Theta$ and $0 < \pi(\theta_0) < \infty$.

We can show that a posterior formed from an I-projection likelihood and a prior satisfying (B1)–(B3) is asymptotically normal. Thus, let $\phi(\cdot)$ denote the standard normal p.d.f. defined on $\mathbb{R}^k$. Also, for $a, b \in \mathbb{R}^k$, $a = (a_1, \ldots, a_k)$, etc., let $(a, b)$ be a $k$-dimensional interval, that is, $(a, b) = \{x = (x_1, \ldots, x_k) : a_i < x_i < b_i,\ i = 1, \ldots, k\}$.

Theorem 3. Assume that (B1) and (B2) are satisfied for $q_n(\cdot,\cdot)$ and $\pi(\cdot)$. Also, assume that (B3) is satisfied for $\pi(\cdot)$. Then, for each $(a, b)$,

$$\int_{J_{nab}} \pi_n(\theta \mid X_n)\, d\theta \to \int_{a}^{b} \phi(z)\, dz \qquad (4.4)$$

in $P$-measure, where $J_{nab} = \{\theta : [-L_n''(\hat\theta_n)]^{1/2}(\theta - \hat\theta_n) \in (a, b)\}$.

Theorem 3 states that under some conditions the posterior distribution of the parameter $\theta$ is asymptotically normal with the first moment of the distribution equal to $\hat\theta_n$ and the second moment equal to $[-L_n''(\hat\theta_n)]^{-1}$:

$$\theta \mid X_n \stackrel{A}{\sim} N(\hat\theta_n,\, [-L_n''(\hat\theta_n)]^{-1}). \qquad (4.5)$$

4.2. Large sample correspondence/divergence between the Bayesian and the classical distribution theories

In Theorem 3, we established asymptotic normality of the posterior of $\theta$. A similar result is available in the classical framework for the behavior of the estimator $\hat\theta$. As explained in Section 3 (Definition 2, (3.8)), the estimator $\hat\theta$ is the same as the optimal GMM estimator with the GMM objective function $H_n(X_n, \theta) = g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)$. Therefore, the estimator $\hat\theta$ is asymptotically normal under standard conditions in the GMM framework, which include sufficient stationarity in $x_t$, $-\infty < t < \infty$. The asymptotic behavior of the estimator $\hat\theta_n$ is summarized in the following:

Lemma 5. Assume that $\{x_t, -\infty < t < \infty\}$ is stationary and ergodic. Under (A2)–(A4) and (A5)–(A6) in Appendix A, it is true that

$$\sqrt{n}(\hat\theta_n - \theta_0) \stackrel{d}{\to} N(0, V),$$

where $V = \lim_n \{D S^{-1} D'\}^{-1}$ with $D = \lim_n (\partial g_n(X_n, \theta_0)/\partial\theta')'$.


Lemma 5 implies that

$$\hat\theta \mid \theta_0 \stackrel{A}{\sim} N(\theta_0,\, V_n/n), \qquad (4.6)$$

where $V_n = \{D_n S_n^{-1} D_n'\}^{-1}$ with $D_n = (\partial g_n(X_n, \hat\theta_n)/\partial\theta')'$. Notice that from (4.3),

$$[-L_n''(\hat\theta_n)]^{-1} \approx V_n/n. \qquad (4.7)$$

The following result, Proposition 1, summarizes the results (4.5) and (4.6) with the help of (4.7) and is provided without proof.

Proposition 1. Assume that $\{x_t, -\infty < t < \infty\}$ is stationary and ergodic. Under (A2)–(A6) and (B1)–(B3),

$$\hat\theta \mid \theta \stackrel{A}{\sim} N(\theta,\, V_n/n), \qquad (4.8)$$

$$\theta \mid X_n \stackrel{A}{\sim} N(\hat\theta,\, V_n/n). \qquad (4.9)$$

Proposition 1 implies that the classical GMM estimator has a Bayesian interpretation. Given a GMM estimator and its second moment, an asymptotically valid posterior density can be constructed according to (4.9). Also, (4.8) and (4.9) in Proposition 1 show that the asymptotic distribution of the limited information posterior is the mirror image of the corresponding result in the classical distribution theory. Notice that the large sample correspondence between the Bayesian and the classical distribution theories in Proposition 1 holds under a sufficient stationarity condition for $x_t$. This correspondence, however, does not hold in the presence of nonstationarity, in general. As can be shown easily, the LIMLE (and equivalently, the GMM estimator) is nonnormal in the presence of unit roots in $x_t$, while the Bayesian asymptotic theory in Theorem 3 and (4.5), asymptotic normality of the posterior, is robust to the existence of nonstationarity.

The result of Theorem 3 further implies that the statistics $(\hat\theta_n, V_n)$ are asymptotically sufficient for the posterior of $\theta$, where asymptotic sufficiency is defined in the following:

Definition 3. A statistic $s(X_n)$ is asymptotically sufficient for a posterior $\pi_n(\theta \mid \cdot)$ of $\theta$ if for each $(a, b)$

$$\left| \int_{J_{nab}} \pi_n(\theta \mid X_n)\, d\theta - \int_{J_{nab}} \pi_n(\theta \mid s(X_n))\, d\theta \right| \to 0,$$

where $J_{nab}$ is as defined in Theorem 3.

The following result is a direct consequence of Theorem 3 and is provided without proof.

Corollary 1. Let $s(X_n) = (\hat\theta_n, V_n)$. Under the conditions of Theorem 3, the statistics $s(X_n)$ are asymptotically sufficient for $\theta$ in the posterior $\pi_n(\theta \mid \cdot)$.


The result of Corollary 1 implies that for large sample analysis we can construct a posterior based on the statistics $s(X_n) = (\hat\theta_n, V_n)$, rather than on the whole sample $X_n$ and the full likelihood, according to the following:

$$\theta \mid s(X_n) \stackrel{A}{\sim} N(\hat\theta,\, V_n/n). \qquad (4.10)$$
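Corollary 1 and (4.10) have a practical reading: a researcher who holds only a reported GMM estimate and its estimated variance can still form an asymptotically valid limited information posterior. A minimal sketch, with made-up numbers standing in for $(\hat\theta_n, V_n/n)$:

```python
import numpy as np
from scipy import stats

# Suppose a study reports a GMM estimate and its estimated asymptotic variance.
theta_hat = 1.48            # hypothetical GMM/LIMLE point estimate
V_over_n = 0.05 ** 2        # hypothetical estimated variance of theta_hat (i.e., V_n / n)

# Asymptotic limited information posterior (4.10): theta | s(X_n) ~ N(theta_hat, V_n / n).
posterior = stats.norm(loc=theta_hat, scale=np.sqrt(V_over_n))

# Any posterior summary follows, e.g. a 95% credible interval and P(theta > 1.4).
print(posterior.interval(0.95))
print(1.0 - posterior.cdf(1.4))
```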

5. Concluding remarks

We have studied how to embed the optimal GMM estimate in a likelihood-based inference framework and the Bayesian framework. As inference in the underlying GMM framework is usually justified in asymptotics, our results are primarily useful for large sample analysis. The large sample theory of the proposed method is established in this paper, and the large sample correspondence between the Bayesian and classical approaches is studied. As noted earlier, with appropriate information for finite sample analysis in the underlying GMM, a similar approach can be applied in order to construct a framework that is valid for finite sample inference. This subject, however, has not been fully explored in this paper and is currently under investigation by the author.

Acknowledgements

I would like to thank Amos Golan, one of the editors, and two anonymous referees for helpful comments on the earlier version of this paper. I am also grateful to Don Andrews, Chris Sims and Arnold Zellner for helpful suggestions and discussions. The research of this paper is partially supported by the Research Grants Council of Hong Kong, grant number HKUST6178/98H.

Appendix A. Additional conditions for Lemma 5

For the asymptotic normality of the LIMLE in Definition 2, we need the following two additional conditions together with (A3) and (A4) in the main text.

(A5) The parameter vector $\theta_0$ is globally identified in $\Theta$ by the moment. That is, $E_P[h(x_t, \theta)] = 0$ if and only if $\theta = \theta_0$.

(A6) Let $\nu_n(\omega, \theta, \delta) = \sup\{\|D_n(\omega, \theta) - D_n(\omega, \theta')\| : \theta' \in \Theta,\ |\theta - \theta'| < \delta\}$ for $n \geq 1$, where $\|\cdot\|$ denotes the norm of a matrix. Assume that for each $\varepsilon > 0$,

$$\lim_{n \to \infty} \lim_{\delta \downarrow 0} \nu_n(\omega, \theta_0, \delta) < \varepsilon$$

in $P$-measure.


Appendix B. Mathematical proofs

Proof of Lemma 1. By the properties of the trace, it follows that

$$E[n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0)] = \operatorname{tr}(E[n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0)])$$
$$= E[\operatorname{tr}(n\, g_n(X_n, \theta_0)' S^{-1} g_n(X_n, \theta_0))] = E[\operatorname{tr}(n\, S^{-1} g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)')]$$
$$= \operatorname{tr}(S^{-1} E[n\, g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)']) \to \operatorname{tr}(S^{-1} S) = \operatorname{tr}(I_r) = r, \qquad (B.1)$$

where the last step is by virtue of the fact that

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)'] = S. \qquad (B.2)$$

Proof of Lemma 2. By (B.1) and (B.2), for $P$ satisfying

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta)\, g_n(X_n, \theta)'] = S,$$

it is true that

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)] = r.$$

This implies that $\wp_2(\theta) \subset \wp_1(\theta)$.

Proof of Lemma 3. For $P \in \wp_1$,

$$\lim_{n \to \infty} E_P[n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)] = r.$$

Then, under the given conditions,

$$0 = \frac{\partial}{\partial\theta'} \lim_{n \to \infty} \int g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)\, dP
= \lim_{n \to \infty} \frac{\partial}{\partial\theta'} \int g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)\, dP$$
$$= \lim_{n \to \infty} \int \frac{\partial}{\partial\theta'} [g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)]\, dP
= \lim_{n \to \infty} \int \left(\frac{\partial g_n(X_n, \theta)}{\partial\theta'}\right)' S^{-1} g_n(X_n, \theta)\, dP, \qquad (B.3)$$

where the differentiation can be taken under the integral by (A3)(a). By condition (A4), the last expression of (B.3) is such that

$$\lim_{n \to \infty} \int \left(\frac{\partial g_n(X_n, \theta)}{\partial\theta'}\right)' S^{-1} g_n(X_n, \theta)\, dP = D S^{-1} E_P\Big[\lim_n g_n(X_n, \theta)\Big] = 0. \qquad (B.4)$$

Under stationarity, $E_P[h(x_t, \theta)] = \mu$ for $-\infty < t < \infty$ for a constant $\mu$. But $E_P[\lim_n g_n(X_n, \theta)] = 0$ only if $\mu = 0$.


Proof of Theorem 1. By applying Theorem 3.1, Corollary 3.1, and Theorem 3.3 of Csiszar (1975), we have

$$q^*_P(\omega, \theta) = C \exp\Big\{\lim_{n \to \infty} c\, n\, g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)\Big\}, \qquad (B.5)$$

where $c$ is a constant and $C$ is a normalizing constant. Notice that

$$I(Q_n \| P) - I(Q^* \| P) = \int q_{P,n} \log q_{P,n}\, dP - \int q^*_P \log q^*_P\, dP.$$

Then, by the dominated convergence theorem,

$$\lim_{n \to \infty} \int q_{P,n} \log q_{P,n}\, dP = \int \lim_{n \to \infty} q_{P,n} \log q_{P,n}\, dP = \int q^*_P \log q^*_P\, dP$$

for $q_{P,n}$ in (3.3) and $q^*_P$ in (B.5). Notice that in part (a), the dominated convergence theorem applies to $q_{P,n}(\theta) \log q_{P,n}(\theta)$ for $\theta$ with $\sqrt{n}(g_n(X_n, \theta) - g_n(X_n, \theta_0)) = O_p(1)$, since $S^{-1/2}\sqrt{n}\, g_n(X_n, \theta) \to Z$ for $Z \sim N(0, I)$ as in (2.7). In part (b), the dominated convergence theorem also applies under the stationarity and ergodicity of $\{x_t\}$ and (A3), because $\sqrt{n}(g_n(X_n, \theta) - g_n(X_n, \theta_0)) = O_p(1)$ for $\theta \in \Theta$ in this case.

Proof of Lemma 4. The result follows from a standard argument as in Serfling (1980, p. 146). Since $\int p_n(X_n, \theta)\, dX_n(\omega) = 1$,

$$\int \frac{\partial p_n(X_n, \theta)}{\partial\theta'}\, dX_n = 0. \qquad (B.6)$$

But (B.6) equals

$$\int \frac{\partial \ln p_n(X_n, \theta)}{\partial\theta'}\, p_n(X_n, \theta)\, dX_n = 0. \qquad (B.7)$$

Now, differentiating both sides of (B.7) with respect to $\theta$,

$$\int \left[\frac{\partial^2 \ln p_n(X_n, \theta)}{\partial\theta\,\partial\theta'}\, p_n(X_n, \theta) + \frac{\partial \ln p_n(X_n, \theta)}{\partial\theta'}\, \frac{\partial p_n(X_n, \theta)}{\partial\theta'}\right] dX_n = 0,$$

or

$$\int \left(\frac{\partial \ln p_n(X_n, \theta)}{\partial\theta'}\right)\left(\frac{\partial \ln p_n(X_n, \theta)}{\partial\theta'}\right)' p_n(X_n, \theta)\, dX_n = -\int \frac{\partial^2 \ln p_n(X_n, \theta)}{\partial\theta\,\partial\theta'}\, p_n(X_n, \theta)\, dX_n. \qquad (B.8)$$


Proof of Theorem 2. For $q_n$ in (3.3),

$$E_P\left[\frac{1}{n}\left(\frac{\partial \ln q_n(X_n, \theta_0)}{\partial\theta'}\right)\left(\frac{\partial \ln q_n(X_n, \theta_0)}{\partial\theta'}\right)'\right]
= \int \frac{1}{n}(2c \times n)\left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right)' S^{-1} g_n(X_n, \theta_0)\,(2c \times n)\, g_n(X_n, \theta_0)' S^{-1}\left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right) dP. \qquad (B.9)$$

But, under (A4),

$$\int \left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right)' S^{-1}(n\, g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)')\, S^{-1}\left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right) dP
\to D S^{-1} \lim_{n \to \infty} E_P[n\, g_n(X_n, \theta_0)\, g_n(X_n, \theta_0)']\, S^{-1} D' = D S^{-1} S S^{-1} D' = D S^{-1} D'.$$

Also,

$$E_P\left[\frac{1}{n}\frac{\partial^2 \ln q_n(X_n, \theta_0)}{\partial\theta\,\partial\theta'}\right]
\approx \int \frac{1}{n}(2c \times n)\left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right)' S^{-1}\left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right) dP. \qquad (B.10)$$

But, under (A4),

$$\int \left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right)' S^{-1}\left(\frac{\partial g_n(X_n, \theta_0)}{\partial\theta'}\right) dP \to D S^{-1} D'.$$

Then, the asymptotic equivalence of (B.9) and (B.10) is achieved with $c = -\tfrac12$.

Proof of Theorem 3. Theorem 3 is a straightforward generalization of Theorem 3.1 of Kim (1998) to the framework of the limited information likelihood/posterior. Therefore, referring to Kim (1998) and the references therein, we omit the proof of Theorem 3.

Proof of Lemma 5. As is shown in (3.8), the LIMLE is the same as the GMM estimator with the GMM objective function $H_n(X_n, \theta) = g_n(X_n, \theta)' S^{-1} g_n(X_n, \theta)$. Therefore, under the given conditions, the asymptotic normality of $\hat\theta$ is established in the same way as in Hansen (1982, Theorem 3.1).

References

Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817–858.


Andrews, D.W.K., Monahan, J.C., 1992. An improved heteroskedasticity and autocorrelation consistent covariance matrix estimator. Econometrica 60, 953–966.
Berk, R., 1970. Consistency a posteriori. Annals of Mathematical Statistics 41, 894–906.
Chen, S.X., 1993. On the accuracy of empirical likelihood confidence regions for linear regression models. Annals of the Institute of Statistical Mathematics 45, 621–637.
Chen, S.X., 1994. Empirical likelihood confidence intervals for linear regression coefficients. Journal of Multivariate Analysis 49, 24–40.
Cover, T.M., Thomas, J.A., 1991. Elements of Information Theory. Wiley, New York.
Csiszar, I., 1975. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability 3 (1), 146–158.
DiCiccio, T., Romano, J., 1989. On adjustments to the signed root of the empirical likelihood statistics. Biometrika 76, 447–456.
DiCiccio, T., Romano, J., 1990. Nonparametric confidence limits by resampling methods and least favorable families. International Statistics Review 58, 59–76.
Gallant, A.R., 1987. Nonlinear Statistical Models. Wiley, New York.
Golan, A., Judge, G.G., Miller, D., 1996a. Maximum Entropy Econometrics: Robust Estimation with Limited Data. Wiley, New York.
Golan, A., Judge, G.G., Perloff, J., 1996b. A generalized maximum entropy approach to recovering information from multinomial response data. Journal of the American Statistical Association 91, 841–853.
Gordin, M.I., 1969. The central limit theorem for stationary processes. Soviet Mathematics Doklady 10, 1174–1176.
Haberman, S.J., 1984. Adjustment by minimum discriminant information. The Annals of Statistics 12, 971–988.
Hall, P., 1990. Pseudo-likelihood theory for empirical likelihood. The Annals of Statistics 18, 121–140.
Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Imbens, G.W., 1997. One step estimators for over-identified generalized method of moments models. Review of Economic Studies 64, 359–383.
Imbens, G.W., Spady, R.H., Johnson, P., 1998. Information theoretic approaches to inference in moment condition models. Econometrica 66, 333–357.
Inoue, A., 2000. A Bayesian GMM in large samples. Mimeo.
Jaynes, E.T., 1982. On the rationale of maximum-entropy methods. Proceedings of the IEEE 70, 939–952.
Jones, L.K., 1989. Approximation-theoretic derivation of logarithmic entropy principles for inverse problems and unique extension of the maximum entropy method to incorporate prior knowledge. SIAM Journal of Applied Mathematics 49, 650–661.
Kim, J.Y., 1998. Large sample properties of posterior densities in a time series with nonstationary components, Bayesian information criterion, and the likelihood principle. Econometrica 66 (2), 359–380.
Kim, J.Y., 2000. The generalized method of moments in the Bayesian framework. Manuscript.
Kitamura, Y., 1997. Empirical likelihood methods with weakly dependent processes. The Annals of Statistics 25 (5), 2084–2102.
Kitamura, Y., Stutzer, M., 1997. An information-theoretic alternative to generalized method of moments estimation. Econometrica 65, 861–874.
Kolaczyk, E.D., 1994. Empirical likelihood for generalized linear models. Statistica Sinica 4, 199–218.
Kullback, S., 1959. Information Theory and Statistics. Wiley, New York.
Kwan, Y.K., 1999. Asymptotic Bayesian analysis based on a limited information estimator. Journal of Econometrics 88 (1), 99–121.
Mittelhammer, R.C., Judge, G., Miller, D., 2000. Econometric Foundations. Cambridge University Press, New York.
Newey, W.K., West, K.D., 1987. A simple positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55, 703–708.
Owen, A., 1988. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75, 237–249.
Owen, A., 1991. Empirical likelihood for linear models. The Annals of Statistics 19, 1725–1747.
Qin, J., 1993. Empirical likelihood in biased sample problems. The Annals of Statistics 21, 1182–1196.


Qin, J., Lawless, J., 1994. Empirical likelihood and general estimating equations. The Annals of Statistics 23, 300–325.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.
Shore, J.E., Johnson, R.W., 1980. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Transactions on Information Theory IT-26 (1), 26–37.
Sims, C.A., 1988. Bayesian skepticism on unit root econometrics. Journal of Economic Dynamics and Control 12, 463–474.
Smith, R.J., 1997. Alternative semi-parametric likelihood approaches to generalized method of moments estimation. The Economic Journal 107, 503–519.
Soofi, E.S., 1994. Capturing the intangible concept of information. Journal of the American Statistical Association 89 (428), 1243–1254.
Sweeting, T.J., 1992. On asymptotic posterior normality in the multivariate case. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., Smith, A.F.M. (Eds.), Bayesian Statistics 4. Oxford University Press, pp. 825–835.
Sweeting, T.J., Adekola, A.O., 1987. Asymptotic posterior normality for stochastic processes revisited. Journal of the Royal Statistical Society, Series B 49, 215–222.
White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50, 1–25.
Zellner, A., 1994. Model, prior information and Bayesian analysis. Journal of Econometrics 75, 51–68.
Zellner, A., 1996. Bayesian method of moments/instrumental variable (BMOM/IV) analysis of mean and regression model. In: Lee, J.C., Johnson, W.C., Zellner, A. (Eds.), Modelling and Prediction Honoring Seymour Geisser. Springer, New York, pp. 61–74.
Zellner, A., 1997. The Bayesian method of moments (BMOM): theory and application. In: Fomby, T., Hill, R.C. (Eds.), Advances in Econometrics. Cambridge University Press.
Zellner, A., 1998. The finite sample properties of simultaneous equations' estimates and estimators: Bayesian and non-Bayesian approaches. Journal of Econometrics 83, 185–212.
Zellner, A., Highfield, R.A., 1988. Calculation of maximum entropy distributions and approximation of marginal posterior distributions. Journal of Econometrics 37, 195–210.