
INTRODUCTION

Simulation-based methods and simulation-assisted estimators have greatly increased the reach of empirical applications in econometrics. The received literature includes a thick layer of theoretical studies, including landmark works by Gourieroux and Monfort (1996), McFadden and Ruud (1994), and Train (2003), and hundreds of applications. An early and still influential application of the method is Berry, Levinsohn, and Pakes's (1995) (BLP) application to the U.S. automobile market in which a market equilibrium model is cleared of latent heterogeneity by integrating the heterogeneity out of the moments in a GMM setting. BLP's methodology is a baseline technique for studying market equilibrium in empirical industrial organization. Contemporary applications involving multilayered models of heterogeneity in individual behavior, such as that in Riphahn, Wambach, and Million's (2003) study of moral hazard in health insurance, are also common. Computation of multivariate probabilities by using simulation methods is now a standard technique in estimating discrete choice models. The mixed logit model for modeling preferences (McFadden & Train, 2000) is now the leading edge of research in multinomial choice modeling. Finally, perhaps the most prominent application in the entire arena of simulation-based estimation is the current generation of Bayesian econometrics based on Markov Chain Monte Carlo (MCMC) methods. In this area, heretofore intractable estimators of posterior means are routinely estimated with the assistance of simulation and the Gibbs sampler.

The 10 chapters in this volume are a collection of methodological developments and applications of simulation-based methods that were presented at a workshop at Louisiana State University in November 2009.

Among the earliest applications of the principles discussed here was the development of the GHK simulator for multivariate normal probabilities. Prior analysts of multinomial choice had reluctantly focused on the multinomial logit model for estimation of choice probabilities. The substantive shortcomings of the MNL model as a description of behavior were well known. The assumptions underlying the MNL random utility model of independent, identically distributed random components in the utility functions produce unattractive restrictions on the behavioral model. A multinomial probit model with multivariate normal components relaxes the restrictions. However, computation of choice probabilities requires computation of the multivariate cumulative probabilities. This is simple in the MNL case, for which $\mathrm{Prob}[v_1 \le v_{j^*}, v_2 \le v_{j^*}, \ldots, v_J \le v_{j^*}]$ has a simple closed form when $v_j$ and $v_{j^*}$ (where $j^*$ is one of the $j$'s) are independent extreme value random variables. However, the counterpart when $v$ has a multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ cannot be computed directly, as there is no closed-form expression (and, for more than two variables, no convenient quadrature method). Manski and Lerman (1977) proposed an indirect estimator based on a set of $R$ multivariate random draws from the $N(\mu, \Sigma)$ population,

$$P = \frac{1}{R} \sum_{r=1}^{R} \mathbf{1}\left[ v_{1r} \le v^{*}_{r},\, v_{2r} \le v^{*}_{r},\, \ldots,\, v_{Kr} \le v^{*}_{r} \mid \mu, \Sigma \right]$$

That is, based on the (easily drawn) pseudo-random sample, simply count the number of observations that fall in the specified orthant and divide by the number of draws. Two practical limitations, the extraordinarily large number of draws needed to get a satisfactory estimate and the nonzero probability of producing a one or zero estimate, inhibit the direct use of this approach. However, the principle that underlies it provides the means to proceed. The central result is that, by dint of the law of large numbers, $P$ is a consistent estimator of the expectation of a random variable that can be simulated,

$$P = E\left[ \mathbf{1}\left( v_{1r} \le v^{*}_{r},\, v_{2r} \le v^{*}_{r},\, \ldots,\, v_{Kr} \le v^{*}_{r} \mid \mu, \Sigma \right) \right]$$

The GHK simulator (Geweke, 1989; Hajivassiliou, 1990; Borsch-Supan & Hajivassiliou, 1990; Keane, 1994) provides a smooth, efficient means of estimating multivariate normal cumulative probabilities. [See Greene (2008, pp. 582–584) for mechanical details, and the special issue edited by Caves, Moffatt, and Stock (1994) for numerous studies.]

Two of the chapters in our volume are extensions of the GHK simulator. Ivan Jeliazkov and Esther Hee Lee, in "MCMC Perspectives on Simulated Likelihood Estimation," reconsider the computation of the probabilities in a discrete choice model. They show that by interpreting the outcome probabilities through Bayes' theorem, the estimation can alternatively be handled by methods for marginal likelihood computation based on the output of MCMC algorithms. They then develop new methods for estimating response probabilities and propose an adaptive sampler for producing high-quality draws from multivariate truncated normal distributions. A simulation study illustrates the practical benefits and costs associated with each approach. In "The Panel Probit Model: Adaptive Integration on Sparse Grids," Florian Heiss suggests an algorithm that is based on GHK but uses an adaptive version of sparse grids integration (SGI) instead of simulation. It is adaptive in the sense that it uses an automated change of variables to make the integration problem numerically better behaved, along the lines of efficient importance sampling (EIS) and adaptive univariate quadrature. The resulting integral is approximated using SGI, which generalizes Gaussian quadrature in a way such that the computational costs do not grow exponentially with the number of dimensions. Monte Carlo experiments show an impressive performance compared to the original GHK algorithm, especially in difficult cases such as models with high intertemporal correlations.

The extension of the simulation principles to models in which unobserved heterogeneity must be integrated out of the likelihood function is ubiquitous in the contemporary literature. The template application involves a log likelihood that is conditioned on the unobserved heterogeneity,

$$\ln L \mid v = \sum_{i=1}^{n} g(y_i, X_i \mid v_i, \theta)$$

Feasible estimation requires that the unobserved heterogeneity be integrated out of the log-likelihood function; the unconditional log-likelihood function is

$$\ln L = \sum_{i=1}^{n} \int_{v_i} g(y_i, X_i \mid v_i, \theta)\, f(v_i)\, dv_i = \sum_{i=1}^{n} E_{v_i}\left[ g(y_i, X_i \mid v_i, \theta) \right]$$

Consistent with our earlier observation, the log-likelihood function can be approximated adequately by averaging a sufficient number of pseudo-draws on $v_i$. This approach has been used in a generation of applications of random parameters models in which $\theta_i = \theta + v_i$, where $v_i$ is a multivariate realization of a random vector that can be simulated. Two of our studies are focused specifically on the methodology. Chandra R. Bhat, Cristiano Varin, and Nazneen Ferdous compare the performance of the maximum simulated likelihood (MSL) approach with their proposed composite marginal likelihood (CML) approach in multivariate ordered-response situations. Overall, the simulation results demonstrate the ability of the CML approach to recover the parameters very well in a five- to six-dimensional ordered-response choice model context. In addition, the CML recovers parameters as well as the MSL estimation approach in the simulation contexts used in the current study, while also doing so at a substantially reduced computational cost. The CML approach appears to be a promising approach for the estimation of not only the multivariate ordered-response model considered here, but also for other analytically intractable econometric models. Tong Zeng and R. Carter Hill in "Pretest Estimation in the Random Parameters Logit Model" examine methods of testing for the presence of heterogeneity in the heterogeneity model. The models are nested; the model without heterogeneity arises if $\Sigma_v = 0$ in the template formulation. The authors use Monte Carlo sampling experiments to examine the size and power properties of pretest estimators in the random parameters logit (RPL) model. The pretests are for the presence of random parameters. They study the Lagrange multiplier, likelihood ratio, and Wald tests using the conditional logit as the restricted model.
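To make the template formulation above concrete, the following sketch is a minimal, hypothetical Python illustration (not code from the volume) of simulated maximum likelihood for a random-parameters probit in which $\theta_i = \theta + v_i$; the model, data, and all names are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.stats import norm

def simulated_loglik(theta, sigma_v, y, X, R=500, seed=123):
    """Simulated log likelihood for a random-parameters probit:
    theta_i = theta + v_i with v_i ~ N(0, diag(sigma_v**2)); the integral
    over v_i is replaced by an average over R pseudo-draws."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    loglik = 0.0
    for i in range(n):
        v = rng.normal(scale=sigma_v, size=(R, k))        # R draws of v_i
        xb = (X[i] * (theta + v)).sum(axis=1)             # x_i' theta_i, draw by draw
        p = norm.cdf(xb) if y[i] == 1 else norm.cdf(-xb)  # Pr(y_i | v_i, theta)
        loglik += np.log(p.mean())                        # average, then log
    return loglik

# Illustrative use with simulated data
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (X @ np.array([0.5, 1.0]) + rng.normal(size=200) > 0).astype(int)
print(simulated_loglik(np.array([0.5, 1.0]), np.array([0.2, 0.2]), y, X))
```

In applications of this kind, the plain pseudo-random draws are often replaced with quasi-random (e.g., Halton) sequences and the simulated log likelihood is passed to a numerical optimizer over $(\theta, \sigma_v)$.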

Importance sampling is a method of improving on the core problem of using simulation to estimate $E[g(v)]$ with $(1/R)\sum_{r} g(v_r)$ through a transformation of the random variable. The GHK simulator is an application of the technique. In "Simulated Maximum Likelihood Estimation of Continuous Time Stochastic Volatility Models," Tore Selland Kleppe, Jun Yu, and Hans J. Skaug develop and implement a method for MSL estimation of the continuous time stochastic volatility model with a constant elasticity of volatility. To integrate out latent volatility from the joint density of return and volatility, a modified EIS technique is used after the continuous time model is approximated using the Euler–Maruyama scheme.
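To illustrate the principle in its simplest form (a generic sketch, not the EIS procedure used by Kleppe, Yu, and Skaug), importance sampling rewrites $E[g(v)]$ as an expectation under a proposal density and reweights the draws by the density ratio; the tail-probability example and all values below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
R = 100_000

# Target: E[g(v)] for v ~ N(0,1) with g concentrated in the right tail,
# a case where the crude estimator (1/R) * sum_r g(v_r) is noisy.
g = lambda v: (v > 3.0).astype(float)

# Crude simulation with draws from the original density
v = rng.normal(size=R)
crude = g(v).mean()

# Importance sampling: draw from a proposal shifted toward the tail and
# reweight by the density ratio f(v)/q(v)
mu_q = 3.0
vq = rng.normal(loc=mu_q, size=R)
w = norm.pdf(vq) / norm.pdf(vq, loc=mu_q)
is_est = np.mean(g(vq) * w)

print(crude, is_est, 1 - norm.cdf(3.0))   # both estimate P(v > 3) ~ 0.00135
```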

We have five applications in the series.

In "Education Savings Accounts, Parent Contributions, and Education Attainment," Michael D. S. Morris uses a dynamic structural model of household choices on savings, consumption, fertility, and education spending to perform policy experiments examining the impact of tax-free education savings accounts on parental contributions toward education and the resulting increase in the education attainment of children. The model is estimated via MSL using data from the National Longitudinal Survey of Young Women. Unlike many similarly estimated dynamic choice models, the estimation procedure incorporates a continuous variable probability distribution function.

In "Estimating the Effect of Exchange Rate Flexibility on Financial Account Openness," Raul Razo-Garcia considers the estimation of the effect of exchange rate flexibility on financial account openness. Using a panel data set of advanced countries and emerging markets, a trivariate probit model is estimated via an MSL approach. The estimated coefficients exhibit important differences when exchange rate flexibility is treated as an exogenous regressor relative to the case when it is treated as endogenous.

In "Estimating a Fractional Response Model with a Count Endogenous Regressor and an Application to Female Labor Supply," Hoa B. Nguyen proposes M-estimators of parameters and average partial effects in a fractional response model for female labor supply with an endogenous count variable, the number of children, under the presence of time-constant unobserved heterogeneity. To address the endogeneity of the right-hand-side count variable, he uses instrumental variables and a two-step estimation approach. Two methods of estimation are employed: quasi-maximum likelihood (QML) and nonlinear least squares (NLS).

Greene (2003) used Monte Carlo simulation for directly evaluating an integral that appears in the normal-gamma stochastic frontier model. In "Alternative Random Effects Panel Gamma SML Estimation with Heterogeneity in Random and One-Sided Error," Saleem Shaik and Ashok K. Mishra utilize the residual concept of productivity measures defined in the context of a normal-gamma stochastic frontier production model with heterogeneity to differentiate productivity and inefficiency measures. Three alternative two-way random effects panel estimators of the normal-gamma stochastic frontier model are proposed using simulated maximum likelihood estimation techniques.

Finally, we have an application of Bayesian MCMC methods by Esmail Amiri, "Modeling and Forecasting Volatility in a Bayesian Approach." The author compares the forecasting performance of five classes of models: ARCH, GARCH, SV, SV-STAR, and MSSV, using daily Tehran Stock Exchange (TSE) data. The results suggest that the models in the fourth and fifth classes perform better than the models in the other classes.

ACKNOWLEDGMENTS

I would like to thank Tom Fomby of Southern Methodist University and Carter Hill of Louisiana State University, editors of the Advances in Econometrics series, for giving me the enjoyable opportunity to host the conference with them and to put together this volume. And, of course, I'd like to thank the authors for their contributions to the volume and for their invaluable help in the editorial process.


REFERENCES

Berry, S., Levinsohn, J., & Pakes, A. (1995). Automobile prices in market equilibrium. Econometrica, 63(4), 841–890.
Borsch-Supan, A., & Hajivassiliou, V. (1990). Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. Journal of Econometrics, 58(3), 347–368.
Caves, R., Moffatt, R., & Stock, J. (Eds.). (1994). Symposium on simulation methods in econometrics. Review of Economics and Statistics, 76(4), 591–702.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57, 1317–1340.
Gourieroux, C., & Monfort, A. (1996). Simulation-based econometric methods. Oxford: Oxford University Press.
Greene, W. (2003). Simulated maximum likelihood estimation of the normal-gamma stochastic frontier model. Journal of Productivity Analysis, 19, 179–190.
Greene, W. (2008). Econometric analysis (6th ed.). Englewood Cliffs, NJ: Pearson Prentice Hall.
Hajivassiliou, V. (1990). Smooth simulation estimation of panel data LDV models. New Haven, CT: Department of Economics, Yale University.
Keane, M. (1994). A computationally practical simulation estimator for panel data. Econometrica, 62(1), 95–116.
Manski, C., & Lerman, S. (1977). The estimation of choice probabilities from choice based samples. Econometrica, 45, 1977–1988.
McFadden, D., & Ruud, P. (1994). Estimation by simulation. Review of Economics and Statistics, 76, 591–608.
McFadden, D., & Train, K. (2000). Mixed multinomial logit models for discrete response. Journal of Applied Econometrics, 15, 447–470.
Riphahn, R., Wambach, A., & Million, A. (2003). Incentive effects in the demand for health care: A bivariate panel count data estimation. Journal of Applied Econometrics, 18(4), 387–405.
Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

William Greene
Editor


ADVANCES IN ECONOMETRICS

Series Editors: Thomas B. Fomby, R. Carter Hill and Ivan Jeliazkov

Recent Volumes:

Volume 20A: Econometric Analysis of Financial and Economic Time Series, Edited by Dek Terrell and Thomas B. Fomby

Volume 20B: Econometric Analysis of Financial and Economic Time Series, Edited by Dek Terrell and Thomas B. Fomby

Volume 21: Modelling and Evaluating Treatment Effects in Econometrics, Edited by Daniel L. Millimet, Jeffrey A. Smith and Edward Vytlacil

Volume 22: Econometrics and Risk Management, Edited by Jean-Pierre Fouque, Thomas B. Fomby and Knut Solna

Volume 23: Bayesian Econometrics, Edited by Siddhartha Chib, Gary Koop, Bill Griffiths and Dek Terrell

Volume 24: Measurement Error: Consequences, Applications and Solutions, Edited by Jane Binner, David Edgerton and Thomas Elger

Volume 25: Nonparametric Econometric Methods, Edited by Qi Li and Jeffrey S. Racine


ADVANCES IN ECONOMETRICS VOLUME 26

MAXIMUM SIMULATED LIKELIHOOD METHODS AND APPLICATIONS

EDITED BY

WILLIAM GREENE
Stern School of Business, New York University

R. CARTER HILL
Department of Economics, Louisiana State University

United Kingdom – North America – Japan – India – Malaysia – China


LIST OF CONTRIBUTORS

Esmail Amiri: Department of Statistics, Imam Komeini International University, Ghazvin, Iran
Chandra R. Bhat: Department of Civil, Architectural and Environmental Engineering, University of Texas, Austin, Texas, USA
Nazneen Ferdous: Department of Civil, Architectural and Environmental Engineering, University of Texas, Austin, Texas, USA
Florian Heiss: Department of Business and Economics, University of Mainz, Germany
R. Carter Hill: Department of Economics, Louisiana State University, LA, USA
Ivan Jeliazkov: Department of Economics, University of California, Irvine, CA, USA
Tore Selland Kleppe: Department of Mathematics, University of Bergen, Norway
Esther Hee Lee: IHS, EViews, Irvine, CA, USA
Ashok K. Mishra: Department of Agricultural Economics and Agribusiness, Louisiana State University, LA, USA
Michael D. S. Morris: Department of Economics and Legal Studies in Business, Oklahoma State University, OK, USA
Hoa B. Nguyen: Department of Economics, Michigan State University, MI, USA
Raul Razo-Garcia: Department of Economics, Carleton University, Ottawa, Ontario, Canada
Saleem Shaik: Department of Agribusiness and Applied Economics, North Dakota State University, ND, USA
H. J. Skaug: Department of Mathematics, University of Bergen, Norway
Cristiano Varin: Department of Statistics, Ca' Foscari University, Venice, Italy
Jun Yu: School of Economics, Singapore Management University, Singapore
Tong Zeng: Department of Economics, Louisiana State University, LA, USA


Emerald Group Publishing Limited

Howard House, Wagon Lane, Bingley BD16 1WA, UK

First edition 2010

Copyright © 2010 Emerald Group Publishing Limited

Reprints and permission service

Contact: [email protected]

No part of this book may be reproduced, stored in a retrieval system, transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without either the prior written permission of the publisher or a licence permitting restricted copying issued in the UK by The Copyright Licensing Agency and in the USA by The Copyright Clearance Center. No responsibility is accepted for the accuracy of information contained in the text, illustrations or advertisements. The opinions expressed in these chapters are not necessarily those of the Editor or the publisher.

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-0-85724-149-8

ISSN: 0731-9053 (Series)

Emerald Group Publishing Limited, Howard House: Environmental Management System certified by ISOQAR to ISO 14001:2004 standards.

Awarded in recognition of Emerald's production department's adherence to quality systems and processes when preparing scholarly journals for print.


MCMC PERSPECTIVES ON SIMULATED LIKELIHOOD ESTIMATION

Ivan Jeliazkov and Esther Hee Lee

ABSTRACT

A major stumbling block in multivariate discrete data analysis is the problem of evaluating the outcome probabilities that enter the likelihood function. Calculation of these probabilities involves high-dimensional integration, making simulation methods indispensable in both Bayesian and frequentist estimation and model choice. We review several existing probability estimators and then show that a broader perspective on the simulation problem can be afforded by interpreting the outcome probabilities through Bayes' theorem, leading to the recognition that estimation can alternatively be handled by methods for marginal likelihood computation based on the output of Markov chain Monte Carlo (MCMC) algorithms. These techniques offer stand-alone approaches to simulated likelihood estimation but can also be integrated with traditional estimators. Building on both branches in the literature, we develop new methods for estimating response probabilities and propose an adaptive sampler for producing high-quality draws from multivariate truncated normal distributions. A simulation study illustrates the practical benefits and costs associated with each approach. The methods are employed to estimate the likelihood function of a correlated random effects panel data model of women's labor force participation.

Maximum Simulated Likelihood Methods and Applications. Advances in Econometrics, Volume 26, 3–39. Copyright © 2010 by Emerald Group Publishing Limited. All rights of reproduction in any form reserved. ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)0000026005

1. INTRODUCTION

Limited dependent variable models deal with binary, multivariate, multinomial, ordinal, or censored outcomes that can arise in cross-sectional, time-series, or longitudinal (panel data) settings. To enable inference in this class of models, however, one must address a central problem in multivariate discrete data analysis, namely, evaluation of the outcome probability for each observation. Outcome probabilities are required in constructing the likelihood function and involve multivariate integration constrained to specific regions that correspond to the observed data. To illustrate the main ideas in some detail, consider the latent variable representation

$$z_i = X_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \Omega) \qquad (1)$$

where, for $i = 1, \ldots, n$, $z_i = (z_{i1}, \ldots, z_{iJ})'$ is a vector of continuous latent variables underlying the discrete observations $y_i = (y_{i1}, \ldots, y_{iJ})'$, $X_i$ is a $J \times k$ matrix of covariates with corresponding $k$-vector of parameters $\beta$, and $\Omega$ is a $J \times J$ covariance matrix in which the variances of any binary or ordinal variables $y_{ij}$ are typically set to 1 for identification reasons. This latent variable framework is a general probabilistic construct in which different threshold-crossing mappings from $z_i$ to the observed responses $y_i$ can produce various classes of discrete data models such as the multivariate probit for binary and ordinal data, the multinomial probit, panels of binary, ordinal, or censored (Tobit) outcomes, models with incidental truncation or endogenous treatment indicators, and Gaussian copula models. For example, the indicator function mapping $y_{ij} = \mathbf{1}\{z_{ij} > 0\}$ underlies binary data models, the relationship $y_{ij} = \mathbf{1}\{z_{ij} > 0\} z_{ij}$ leads to a Tobit model with censoring from below at 0, the discretization $y_{ij} = \sum_{s=1}^{S} \mathbf{1}\{z_{ij} > \gamma_{j,s}\}$ for some strictly increasing sequence of cutpoint parameters $\{\gamma_{j,s}\}_{s=1}^{S}$ arises in ordinal data modeling and copula models for count data, and so on. Variations on the distributional assumptions can be used to construct mixtures or scale mixtures of normals models including the Student's t-link ("robit") and logit models. In economics, the latent $z_i$ are interpreted as unobserved utility differences (relative to a baseline category), and discrete data models are often referred to as discrete choice models.
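As a small illustration (hypothetical Python, not part of the chapter), the threshold-crossing mappings just described can be written directly; the cutpoints and array values are arbitrary.

```python
import numpy as np

def to_binary(z):
    # y_ij = 1{z_ij > 0}
    return (z > 0).astype(int)

def to_tobit(z):
    # y_ij = 1{z_ij > 0} * z_ij: censoring from below at 0
    return np.where(z > 0, z, 0.0)

def to_ordinal(z, cutpoints):
    # y_ij = sum_s 1{z_ij > gamma_{j,s}} for strictly increasing cutpoints
    return sum((z > c).astype(int) for c in cutpoints)

z = np.array([-0.7, 0.2, 1.8])
print(to_binary(z), to_tobit(z), to_ordinal(z, cutpoints=[0.0, 1.0]))
```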


A representative example that will form the basis for discussion in the remainder of this chapter is the multivariate probit model, where the binary outcomes in $y_i$ relate to the latent $z_i$ in Eq. (1) through the indicator functions $y_{ij} = \mathbf{1}\{z_{ij} > 0\}$ for $j = 1, \ldots, J$. In this context, the object of interest is the probability of observing $y_i$, conditionally on $\beta$ and $\Omega$, which is given by

$$\Pr(y_i \mid \beta, \Omega) = \int_{B_{iJ}} \cdots \int_{B_{i1}} f_N(z_i \mid X_i\beta, \Omega)\, dz_{i1} \cdots dz_{iJ} = \int \mathbf{1}\{z_i \in B_i\}\, f_N(z_i \mid X_i\beta, \Omega)\, dz_i \qquad (2)$$

where $f_N(z_i \mid X_i\beta, \Omega)$ is the normal density with mean $X_i\beta$ and covariance matrix $\Omega$ (which is in correlation form), and the region of integration is given by $B_i = B_{i1} \times B_{i2} \times \cdots \times B_{iJ}$ with

$$B_{ij} = \begin{cases} (-\infty, 0] & \text{if } y_{ij} = 0 \\ (0, \infty) & \text{if } y_{ij} = 1 \end{cases}$$

The log-likelihood function is given by $\ln f(y \mid \beta, \Omega) = \sum_{i=1}^{n} \ln \Pr(y_i \mid \beta, \Omega)$; however, a major stumbling block in evaluating that function is that the multivariate integrals defining the likelihood contributions in Eq. (2) typically have no closed-form solution, but need to be evaluated at various values of $\beta$ and $\Omega$ for the purposes of estimation (e.g., in maximization algorithms) and model comparison (e.g., in evaluating likelihood ratio statistics, information criteria, Bayes factors, and marginal likelihoods). Standard grid-based numerical approximations (e.g., Gauss–Legendre or quadrature methods) exist for univariate and bivariate problems, but the computational costs associated with these approaches rise exponentially with dimensionality, which makes them prohibitively expensive in higher dimensions. While in many instances the computational intensity of numerical integration can be moderated by sparse-grid approximations as in Heiss and Winschel (2008), the most widely used approaches for obtaining Eq. (2) in discrete data analysis have been based on simulation. Such methods exploit a number of practical advantages that make them particularly appealing. For example, simulation methods typically rely on standard distributions, which makes them conceptually and computationally straightforward and efficient, even in high dimensions. Moreover, simulation often resolves the problem of having to specify the location and size of a grid so that it corresponds to areas of high density. This is especially useful because knowledge of these features is often absent, especially in high-dimensional problems. For these reasons, simulation methods have become a fundamental tool in multivariate integration in general and in simulated likelihood estimation in particular.

One popular approach for simulation-based evaluation of the outcome probabilities in discrete choice models is the Geweke, Hajivassiliou, and Keane (GHK) method (Geweke, 1991; Borsch-Supan & Hajivassiliou, 1993; Keane, 1994; Hajivassiliou & McFadden, 1998). Another one is studied by Stern (1992). These methods have risen to prominence because they are efficient and offer continuous and differentiable choice probabilities that are strictly bounded between 0 and 1, making them very suitable for maximum likelihood estimation and other problems that require gradient or Hessian evaluation. Other methods, such as the accept–reject (AR) simulator and its variants, are appealing because of their transparency and simplicity. Many of these techniques, together with other useful alternatives, have been carefully reviewed in Hajivassiliou and Ruud (1994), Stern (1997), and Train (2003).

In this chapter we pursue several objectives. Our first main goal is to show that the probability of the observed response, given the model parameters, can be estimated consistently and very efficiently by a set of alternative techniques that have been applied in a very different context. In particular, the calculation of integrals which have no closed-form solution has been a central issue in Bayesian model comparison. The marginal likelihood, which is given by the integral of the likelihood function with respect to the prior distribution of the model parameters, is an important ingredient in producing Bayes factors and posterior odds of competing models. A large number of Markov chain Monte Carlo (MCMC) methods have been introduced to calculate marginal likelihoods, Bayes factors, and posterior odds (e.g., Ritter & Tanner, 1992; Newton & Raftery, 1994; Gelfand & Dey, 1994; Chib, 1995; Meng & Wong, 1996; DiCiccio, Kass, Raftery, & Wasserman, 1997; Geweke, 1999; Chib & Jeliazkov, 2001, 2005), but these methods have not yet been employed to estimate response probabilities and construct likelihood functions for discrete data models, even though MCMC data augmentation techniques have been routinely used to obtain parameter estimates without computing those probabilities (see, e.g., Koop, 2003; Greenberg, 2008, and the references therein). A recent comparison of Bayesian and classical inferences in probit models is offered in Griffiths, Hill, and O'Donnell (2006). Given the specifics of the current context, in this chapter, we first focus on MCMC estimation techniques that embody desirable characteristics such as continuity and differentiability, but mention that the other approaches can be very useful as well. Second, we design several new estimation methods by integrating two branches of the literature and combining features of the classical and Bayesian methods. This allows for several enhancements in the resulting "hybrid" approaches that tend to improve the quality of the simulated latent data sample and the efficiency of the resulting estimates, and retain simplicity without sacrificing continuity and differentiability. Our third goal is to provide a comparison and document the performance of the alternative methods in a detailed simulation study that highlights the practical costs and benefits associated with each approach. Finally, we present an application to the problem of estimating the likelihood ordinate for a correlated random effects panel data model of women's labor force participation, which illustrates the applicability of the proposed techniques.

The rest of this chapter is organized as follows. In Section 2, we review several traditional simulation methods that have been used to estimate the response probabilities in simulated likelihood estimation. A number of alternative MCMC approaches are discussed in Section 3. Building on existing work, we introduce new approaches for estimating outcome probabilities that are obtained by integrating features of the Bayesian and traditional techniques. Section 4 provides evidence on the relative performance of these simulation methods, while Section 5 applies the techniques to evaluate the likelihood function of a correlated random effects panel data model using data on women's labor force participation. Concluding remarks are presented in Section 6.

2. EXISTING METHODS

We begin with a brief review of the basic idea behind the AR, or frequency, method, which is perhaps the most straightforward approach for estimating the probability in Eq. (2). The AR method draws independent identically distributed (iid) random variables $z_i^{(g)} \sim N(X_i\beta, \Omega)$ for $g = 1, \ldots, G$. Draws that satisfy $z_i^{(g)} \in B_i$ are accepted, whereas those that do not are rejected. The probability in Eq. (2) is then calculated as the proportion of accepted draws,

$$\widehat{\Pr}(y_i \mid \beta, \Omega) = G^{-1} \sum_{g=1}^{G} \mathbf{1}\{z_i^{(g)} \in B_i\} \qquad (3)$$

The AR approach is very simple and intuitive and is easy to implement with a variety of distributions for the random terms. This estimator has been applied in discrete choice problems by Lerman and Manski (1981); additional discussion and applications of AR methods are offered in Devroye (1986) and Ripley (1987).

However, for a given finite number of draws, the AR approach has a number of pitfalls, especially when used in the context of likelihood estimation. One is that the estimated probability is not strictly bounded between 0 and 1, and there is a positive probability of obtaining an estimate on the boundary, which can cause numerical problems when taking the logarithm of the estimated probability. The more important problem with the AR method, however, is the lack of differentiability of the estimated probability with respect to the parameter vector. Because the AR probability in Eq. (3) has the form of a step function with respect to the parameters, the simulated probability is either constant or jumps by a discrete amount with respect to a small change in the parameter values. These features of the estimator impede its use in numerical optimization and complicate the asymptotics of estimators that rely on it.
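For concreteness, a minimal sketch of the AR (frequency) estimator in Eq. (3) for a trivariate probit outcome is given below; the parameter values are arbitrary and the code is illustrative rather than a reference implementation.

```python
import numpy as np

def ar_probability(y_i, mu_i, Omega, G=50_000, seed=0):
    """AR/frequency estimate of Pr(y_i | beta, Omega), Eq. (3):
    draw z ~ N(mu_i, Omega) with mu_i = X_i beta, accept draws that fall in
    B_i (z_ij > 0 iff y_ij = 1), and report the acceptance proportion."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu_i, Omega, size=G)
    in_Bi = np.all((z > 0) == (y_i == 1), axis=1)
    return in_Bi.mean()

y_i = np.array([1, 0, 1])
mu_i = np.array([0.3, -0.2, 0.5])
Omega = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
print(ar_probability(y_i, mu_i, Omega))
```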

The difficulties of the AR method can be circumvented by replacing the indicator function $\mathbf{1}\{z_i \in B_i\}$ in Eq. (2) with a smooth and strictly positive function. One strategy, suggested in McFadden (1989), is to approximate the orthant probability as

$$\Pr(y_i \mid \beta, \Omega) \approx \int K\!\left(\frac{z_i}{b}\right) f_N(z_i \mid X_i\beta, \Omega)\, dz_i \qquad (4)$$

where $K(\cdot)$ is a smooth kernel function, for example the logistic cumulative distribution function (cdf), and $b \neq 0$ is a scale factor that determines the degree of smoothing. It can easily be seen that the function $K(z_i/b)$ approaches $\mathbf{1}\{z_i \in B_i\}$ as $b \to 0$. This approach avoids the problem of nondifferentiability; however, it comes at the cost of introducing a bias in the estimate of the probability (Hajivassiliou, McFadden, & Ruud, 1996). Whereas the bias can be reduced by picking a value of $b$ that is very close to 0, doing so can potentially revive the problem of nondifferentiability if $K(\cdot)$ begins to approximate the indicator function $\mathbf{1}\{\cdot\}$ too closely. In practice, therefore, the choice of $b$ is not straightforward and must be done very carefully. Other complications, for example, appropriate kernel selection, could arise with this approach when $B_i$ is bounded from both below and above, as in ordinal probit and copula models.
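A sketch of the kernel-smoothed estimator in Eq. (4), using a logistic cdf kernel applied element by element to the signed latent draws, appears below; the smoothing constant $b$ and all numerical values are assumptions.

```python
import numpy as np

def smoothed_ar_probability(y_i, mu_i, Omega, b=0.1, G=50_000, seed=0):
    """Kernel-smoothed frequency estimator in the spirit of Eq. (4):
    the indicator 1{z in B_i} is replaced by a product of logistic cdfs
    K(s_j * z_j / b), with s_j = +1 if y_ij = 1 and -1 if y_ij = 0."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu_i, Omega, size=G)
    s = 2 * y_i - 1                                  # sign of each constraint
    smooth = 1.0 / (1.0 + np.exp(-(s * z) / b))      # logistic kernel per element
    return smooth.prod(axis=1).mean()

y_i = np.array([1, 0, 1])
mu_i = np.array([0.3, -0.2, 0.5])
Omega = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
print(smoothed_ar_probability(y_i, mu_i, Omega))
```

As the text notes, letting b shrink toward 0 drives the logistic terms toward the indicator function, which reintroduces the nondifferentiability problem.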

Another strategy for overcoming the difficulties of estimating Eq. (2) was developed by Stern (1992) and relies on a particular decomposition of the correlation structure in Eq. (1). The basic idea underlying the Stern method is to decompose the error component $\varepsilon_i$ in Eq. (1) into the sum of two terms: one that is correlated and another one that contains orthogonal errors. In particular, the Stern simulator is based on rewriting the model as

$$z_i = v_i + w_i$$

where $v_i \sim N(X_i\beta, \Omega - \Lambda)$ and $w_i \sim N(0, \Lambda)$ with $\Lambda = \lambda I$. Note that the mean $X_i\beta$ can be incorporated either in $v_i$ or $w_i$, or in the limits of integration (these representations are equivalent). Moreover, as a matter of simulation efficiency, Stern (1992) suggests that $\lambda$ should be chosen as large as possible subject to leaving $(\Omega - \Lambda)$ positive definite. This is done by setting $\lambda$ close to the smallest eigenvalue of $\Omega$.

With this decomposition, the likelihood contribution in Eq. (2) can be rewritten as

$$\Pr(y_i \mid \beta, \Omega) = \int_{B_i} f_N(z_i \mid X_i\beta, \Omega)\, dz_i = \int_{C_i} \int_{-\infty}^{+\infty} f_N(w_i \mid 0, \Lambda)\, f_N(v_i \mid X_i\beta, \Omega - \Lambda)\, dv_i\, dw_i$$

where the change of variable implies that $C_i = C_{i1} \times \cdots \times C_{iJ}$ with $C_{ij} = (-\infty, -v_{ij})$ if $y_{ij} = 0$ and $C_{ij} = [-v_{ij}, \infty)$ if $y_{ij} = 1$. Because the independent elements of $w_i$ have a Gaussian density, which is symmetric, this probability can be expressed as

$$\Pr(y_i \mid \beta, \Omega) = \int \left[ \prod_{j=1}^{J} \Phi\!\left( (-1)^{1-y_{ij}} \frac{v_{ij}}{\sqrt{\lambda}} \right) \right] f_N(v_i \mid X_i\beta, \Omega - \Lambda)\, dv_i$$

where $\Phi(\cdot)$ denotes the standard normal cdf. Estimation of this integral then proceeds by

$$\widehat{\Pr}(y_i \mid \beta, \Omega) = \frac{1}{G} \sum_{g=1}^{G} \left\{ \prod_{j=1}^{J} \Phi\!\left( (-1)^{1-y_{ij}} \frac{v_{ij}^{(g)}}{\sqrt{\lambda}} \right) \right\}$$

where $v_i^{(g)} \sim N(X_i\beta, \Omega - \Lambda)$ for $g = 1, \ldots, G$.

Another popular method is the GHK algorithm, which builds upon simulation techniques for multivariate truncated normal distributions that were pioneered by Geweke (1991) and has been successfully implemented in a variety of problems in cross-sectional, time series, and panel data settings. The GHK algorithm has been extensively studied in Borsch-Supan and Hajivassiliou (1993), Hajivassiliou and Ruud (1994), Keane (1994), Hajivassiliou et al. (1996), and Hajivassiliou and McFadden (1998) and has been carefully reviewed in Train (2003).
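Before turning to the details of the GHK construction, the Stern (1992) simulator described above can be sketched as follows (hypothetical values; $\lambda$ is set just below the smallest eigenvalue of $\Omega$, as Stern suggests).

```python
import numpy as np
from scipy.stats import norm

def stern_probability(y_i, mu_i, Omega, G=50_000, seed=0):
    """Stern (1992) simulator: decompose the error into a correlated part
    N(mu_i, Omega - lambda*I) and an independent part N(0, lambda*I), then
    average products of univariate normal cdfs over draws of the former."""
    rng = np.random.default_rng(seed)
    lam = 0.99 * np.linalg.eigvalsh(Omega).min()     # keep Omega - lam*I pos. def.
    v = rng.multivariate_normal(mu_i, Omega - lam * np.eye(len(mu_i)), size=G)
    sign = (-1.0) ** (1 - y_i)                       # +1 if y_ij = 1, -1 if y_ij = 0
    return norm.cdf(sign * v / np.sqrt(lam)).prod(axis=1).mean()

y_i = np.array([1, 0, 1])
mu_i = np.array([0.3, -0.2, 0.5])
Omega = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
print(stern_probability(y_i, mu_i, Omega))
```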

The insight behind the GHK algorithm is that one can design a tractable importance density that could facilitate simulation-based estimation by writing the model as

$$z_i = X_i\beta + L\eta_i, \qquad \eta_i \sim N(0, I) \qquad (5)$$

where $L$ is a lower triangular Cholesky factor of $\Omega$ with elements $l_{ij}$ such that $LL' = \Omega$. Because the entries in $\eta_i$ are independent and $L$ is lower triangular, a recursive relation between the elements of $z_i$ can be established to produce the importance density used in the GHK algorithm,

$$h(z_i \mid y_i, \beta, \Omega) = h(z_{i1} \mid y_{i1}, \beta, \Omega)\, h(z_{i2} \mid z_{i1}, y_{i2}, \beta, \Omega) \cdots h(z_{iJ} \mid z_{i1}, \ldots, z_{i,J-1}, y_{iJ}, \beta, \Omega) = \prod_{j=1}^{J} h(z_{ij} \mid \{z_{ik}\}_{k<j}, y_{ij}, \beta, \Omega) \qquad (6)$$

and the terms in the product are restricted to the set $B_i$ by letting

$$h(z_{ij} \mid \{z_{ik}\}_{k<j}, y_{ij}, \beta, \Omega) = f_{TN_{B_{ij}}}\!\left(z_{ij} \,\Big|\, x_{ij}'\beta + \sum_{k=1}^{j-1} l_{jk}\eta_{ik},\; l_{jj}^2\right) = \mathbf{1}\{z_{ij} \in B_{ij}\}\, f_N\!\left(z_{ij} \,\Big|\, x_{ij}'\beta + \sum_{k=1}^{j-1} l_{jk}\eta_{ik},\; l_{jj}^2\right) \Big/ c_{ij}$$

where $c_{ij} = \Phi\!\left((-1)^{(1-y_{ij})}\left(x_{ij}'\beta + \sum_{k=1}^{j-1} l_{jk}\eta_{ik}\right)/l_{jj}\right)$ is the normalizing constant of the truncated normal density $f_{TN_{B_{ij}}}\!\left(z_{ij} \mid x_{ij}'\beta + \sum_{k=1}^{j-1} l_{jk}\eta_{ik},\, l_{jj}^2\right)$. As a result,

taking the product in Eq. (6) produces

$$h(z_i \mid y_i, \beta, \Omega) = \frac{\prod_{j=1}^{J} \mathbf{1}\{z_{ij} \in B_{ij}\}\, f_N\!\left(z_{ij} \mid x_{ij}'\beta + \sum_{k=1}^{j-1} l_{jk}\eta_{ik},\, l_{jj}^2\right)}{\prod_{j=1}^{J} c_{ij}} = \frac{\mathbf{1}\{z_i \in B_i\}\, f_N(z_i \mid X_i\beta, \Omega)}{\prod_{j=1}^{J} c_{ij}}$$


upon which one could write Eq. (2) as

$$\begin{aligned}
\Pr(y_i \mid \beta, \Omega) &= \int_{B_i} f_N(z_i \mid X_i\beta, \Omega)\, dz_i \\
&= \int_{B_i} \frac{f_N(z_i \mid X_i\beta, \Omega)}{h(z_i \mid y_i, \beta, \Omega)}\, h(z_i \mid y_i, \beta, \Omega)\, dz_i \\
&= \int_{B_i} \frac{f_N(z_i \mid X_i\beta, \Omega)}{f_N(z_i \mid X_i\beta, \Omega)\big/\prod_{j=1}^{J} c_{ij}}\, h(z_i \mid y_i, \beta, \Omega)\, dz_i \\
&= \int_{B_i} \left\{\prod_{j=1}^{J} c_{ij}\right\} h(z_i \mid y_i, \beta, \Omega)\, dz_i
\end{aligned} \qquad (7)$$

Therefore, $\Pr(y_i \mid \beta, \Omega)$ can be estimated as

$$\widehat{\Pr}(y_i \mid \beta, \Omega) = \frac{1}{G} \sum_{g=1}^{G} \prod_{j=1}^{J} c_{ij}^{(g)}$$

with draws $z_i^{(g)}$ obtained recursively as $z_{ij}^{(g)} \sim h(z_{ij} \mid \{z_{ik}^{(g)}\}_{k<j}, y_{ij}, \beta, \Omega)$ for $j = 1, \ldots, J-1$ and $g = 1, \ldots, G$, using techniques such as the inverse cdf method (see, e.g., Devroye, 1986) or simulation-based techniques such as those proposed in Robert (1995).
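A compact sketch of the GHK recursion just described, for the multivariate probit case and using inverse-cdf draws from the univariate truncated normals, is given below; the example values are arbitrary and the code is illustrative only.

```python
import numpy as np
from scipy.stats import norm

def ghk_probability(y_i, mu_i, Omega, G=10_000, seed=0):
    """GHK estimate of Pr(y_i | beta, Omega) for a multivariate probit.
    Draws eta_j recursively from univariate truncated normals (inverse-cdf
    method) and averages the products of the normalizing constants c_ij."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Omega)
    J = len(y_i)
    prob = np.ones(G)
    eta = np.zeros((G, J))
    for j in range(J):
        m = mu_i[j] + eta[:, :j] @ L[j, :j]          # conditional mean of z_ij
        a = -m / L[j, j]                             # truncation point for eta_j
        c = norm.cdf(m / L[j, j]) if y_i[j] == 1 else norm.cdf(-m / L[j, j])
        prob *= c                                    # accumulate prod_j c_ij
        u = rng.uniform(size=G)
        if y_i[j] == 1:                              # eta_j restricted to (a, inf)
            eta[:, j] = norm.ppf(norm.cdf(a) + u * (1 - norm.cdf(a)))
        else:                                        # eta_j restricted to (-inf, a]
            eta[:, j] = norm.ppf(u * norm.cdf(a))
    return prob.mean()

y_i = np.array([1, 0, 1])
mu_i = np.array([0.3, -0.2, 0.5])
Omega = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
print(ghk_probability(y_i, mu_i, Omega))
```

The sketch draws the last element as well, which is harmless but, as the text notes, not strictly required since only the normalizing constants enter the estimator.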

Both the Stern and GHK methods provide continuous and differentiable multivariate probability estimates. They also typically produce smaller estimation variability than the AR method because the simulated probabilities are strictly bounded between 0 and 1, whereas each draw in the AR method gives either 0 or 1. However, all three methods suffer from a common problem that can often produce difficulties. In particular, in all three approaches, the simulation draws come from proposal distributions that differ from the truncated normal distribution of interest, $TN_{B_i}(X_i\beta, \Omega)$. When this disparity is large, the efficiency of all methods can be adversely affected. For example, it is easy to recognize that the AR method provides a sample from the unrestricted normal distribution $N(X_i\beta, \Omega)$, the Stern method generates draws from the normal distribution $N(X_i\beta, \Omega - \Lambda)$, while GHK simulation relies on the recursive importance density in Eq. (6) in which draws depend only on the restrictions implied by $y_{ij}$ but ignore the restrictions implied by subsequent $\{y_{ik}\}_{k>j}$. These mismatches between the proposal and target densities may adversely affect the efficiency of the AR, GHK, and Stern methods. We next introduce a class of simulated likelihood methods which are, in fact, based on draws from the truncated density of interest.


3. MCMC METHODS

The calculation of multivariate integrals that generally have no analytical solution has been an important research area in Bayesian statistics. In particular, a key quantity of interest in Bayesian model comparison is the marginal likelihood, which is obtained by integrating the likelihood function with respect to the prior distribution of the parameters (for a discussion, see Kass & Raftery, 1995, and the references therein). It is one of the basic goals of this chapter to link the simulated likelihood literature with that on Bayesian model choice in order to introduce MCMC methods as new and viable approaches to simulated likelihood estimation in discrete data analysis. Another goal is to develop new MCMC methods that are specifically tailored to simulated likelihood estimation. Our third goal is to provide an efficient simulation method for sampling $z_i \sim TN_{B_i}(X_i\beta, \Omega)$, which is particularly important in this class of models but also has broad ramifications beyond simulated likelihood estimation. These goals are pursued in the remainder of this section.

3.1. The CRB Method

To see the common fundamentals between outcome probability estimation and Bayesian model choice, and to establish the framework for the estimation methods that will be discussed subsequently, we begin by rewriting the expression for $\Pr(y_i \mid \beta, \Omega)$. In particular, note that we can write the probability in Eq. (2) as

$$\Pr(y_i \mid \beta, \Omega) = \int \mathbf{1}\{z_i \in B_i\}\, f_N(z_i \mid X_i\beta, \Omega)\, dz_i = \frac{\mathbf{1}\{z_i \in B_i\}\, f_N(z_i \mid X_i\beta, \Omega)}{f_{TN}(z_i \mid X_i\beta, \Omega)} \qquad (8)$$

which can be interpreted in terms of Bayes' formula based on the recognition that the indicator function $\mathbf{1}\{z_i \in B_i\}$ actually gives $\Pr(y_i \mid z_i)$ and hence can be treated as a "likelihood," $f_N(z_i \mid X_i\beta, \Omega)$ can be treated as a "prior" because it does not respect the truncation implied by $y_i$, and $f_{TN}(z_i \mid X_i\beta, \Omega)$ can be viewed as a "posterior" that accounts for the truncation constraints reflected in $y_i$. Thus, we can see that $\Pr(y_i \mid \beta, \Omega)$ can actually be viewed as a "marginal likelihood," that is, the normalizing constant of the "posterior" $f_{TN}(z_i \mid X_i\beta, \Omega)$. Even though the interpretation of $\Pr(y_i \mid \beta, \Omega)$ as the normalizing constant of a truncated normal distribution is directly visible from Eq. (2), its reinterpretation in terms of the quantities in Eq. (8) is useful for developing empirical strategies for its estimation. In fact, the equivalent of Eq. (8) was used in Chib (1995) in developing his method for marginal likelihood estimation. This identity is particularly useful because, as discussed in Chib (1995), it holds for any value of $z_i \in B_i$, and therefore the calculation is reduced to finding the estimate of the ordinate $f_{TN}(z_i^* \mid X_i\beta, \Omega)$ at a single point $z_i^* \in B_i$. In our implementation, an estimate of the log-probability is obtained as

$$\ln \widehat{\Pr}(y_i \mid \beta, \Omega) = \ln f_N(z_i^* \mid X_i\beta, \Omega) - \ln \widehat{f}_{TN}(z_i^* \mid X_i\beta, \Omega) \qquad (9)$$

where we take $z_i^*$ to be the sample mean of the MCMC draws $z_i^{(g)} \sim TN_{B_i}(X_i\beta, \Omega)$, $g = 1, \ldots, G$, and make use of the fact that the numerator quantities $\mathbf{1}\{z_i^* \in B_i\}$ and $f_N(z_i^* \mid X_i\beta, \Omega)$ in Eq. (8) are directly available.

Draws $z_i^{(g)} \sim TN_{B_i}(X_i\beta, \Omega)$ can be produced by employing the Gibbs sampling algorithm of Geweke (1991), in which a new value for $z_i$ is generated by iteratively simulating each element $z_{ij}$ from its full-conditional density $z_{ij} \sim f(z_{ij} \mid \{z_{ik}\}_{k \ne j}, y_{ij}, \beta, \Omega) = TN_{B_{ij}}(\mu_{ij}, \sigma_{ij}^2)$ for $j = 1, \ldots, J$, where $\mu_{ij}$ and $\sigma_{ij}^2$ are the conditional mean and variance of $z_{ij}$ given $\{z_{ik}\}_{k \ne j}$, which are obtained by the usual updating formulas for a Gaussian density. Note that unlike the aforementioned importance sampling methods, a Gibbs sampler constructed in this way produces draws from the exact truncated normal distribution of interest, and those draws will be used to estimate $f_{TN}(z_i^* \mid X_i\beta, \Omega)$, thereby leading to an estimate of $\Pr(y_i \mid \beta, \Omega)$.
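A sketch of this Geweke-style Gibbs sampler for $z_i \sim TN_{B_i}(X_i\beta, \Omega)$ is given below (hypothetical and deliberately unoptimized; the conditional moments follow the usual Gaussian updating formulas).

```python
import numpy as np
from scipy.stats import norm

def gibbs_tmvn(y_i, mu_i, Omega, G=2_000, burn=200, seed=0):
    """Gibbs sampler for z_i ~ TN_{B_i}(mu_i, Omega): each z_ij is drawn from
    its univariate normal full-conditional, truncated to (0, inf) if y_ij = 1
    and to (-inf, 0] if y_ij = 0 (inverse-cdf draws)."""
    rng = np.random.default_rng(seed)
    J = len(y_i)
    z = np.where(y_i == 1, 0.5, -0.5).astype(float)  # feasible starting value
    draws = np.empty((G, J))
    for g in range(G + burn):
        for j in range(J):
            idx = [k for k in range(J) if k != j]
            Oinv = np.linalg.inv(Omega[np.ix_(idx, idx)])
            m = mu_i[j] + Omega[j, idx] @ Oinv @ (z[idx] - mu_i[idx])
            s = np.sqrt(Omega[j, j] - Omega[j, idx] @ Oinv @ Omega[idx, j])
            lo, hi = (0.0, np.inf) if y_i[j] == 1 else (-np.inf, 0.0)
            a, b = norm.cdf((lo - m) / s), norm.cdf((hi - m) / s)
            z[j] = m + s * norm.ppf(a + rng.uniform() * (b - a))
        if g >= burn:
            draws[g - burn] = z
    return draws

y_i = np.array([1, 0, 1])
mu_i = np.array([0.3, -0.2, 0.5])
Omega = np.array([[1.0, 0.4, 0.3], [0.4, 1.0, 0.2], [0.3, 0.2, 1.0]])
print(gibbs_tmvn(y_i, mu_i, Omega).mean(axis=0))
```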

To estimate the ordinate $f(z_i^* \mid y_i, \beta, \Omega) = f_{TN}(z_i^* \mid X_i\beta, \Omega)$, the joint density is decomposed by the law of total probability as

$$f(z_i^* \mid y_i, \beta, \Omega) = \prod_{j=1}^{J} f(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega)$$

In the context of Gibbs sampling, when the full-conditional densities are fully known, Chib (1995) proposed finding the ordinates $f(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega)$ for $1 < j < J$ by Rao-Blackwellization (Tanner & Wong, 1987; Gelfand & Smith, 1990), in which the terms in the decomposition are represented by

$$f(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega) = \int f(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \{z_{ik}\}_{k>j}, \beta, \Omega)\, f(\{z_{ik}\}_{k>j} \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega)\, d\{z_{ik}\}_{k>j}$$

and estimated as

$$\widehat{f}(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega) = G^{-1} \sum_{g=1}^{G} f(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \{z_{ik}^{(g)}\}_{k>j}, \beta, \Omega)$$


where the draws $\{z_{ik}^{(g)}\}_{k>j}$ come from a reduced run in which the latent variables $\{z_{ik}^*\}_{k<j}$ are fixed and sampling is over $\{z_{ik}^{(g)}\}_{k \ge j} \sim f(\{z_{ik}\}_{k \ge j} \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega)$. Excluding $z_{ij}^{(g)}$ from $\{z_{ik}^{(g)}\}_{k \ge j}$ yields draws $\{z_{ik}\}_{k>j} \sim f(\{z_{ik}\}_{k>j} \mid y_i, \{z_{ik}^*\}_{k<j}, \beta, \Omega)$ that are required in the average. The ordinate $f(z_{i1}^* \mid y_i, \beta, \Omega)$ is estimated with draws from the main MCMC run, while the ordinate $f(z_{iJ}^* \mid y_i, \{z_{ik}^*\}_{k<J}, \beta, \Omega)$ is available directly, and hence the method requires $(J-2)$ reduced MCMC runs. An advantage of this approach is that it breaks a large-dimensional problem into a set of smaller and more manageable steps and, at the cost of additional MCMC simulation, typically leads to very efficient estimates in many practical problems.

In the remainder of this chapter, we will refer to Chib's method with Rao-Blackwellization as the CRB method. This method provides a direct application of existing MCMC techniques (Chib, 1995) to simulated likelihood estimation and forms an important benchmark case against which other MCMC methods can be compared. Moreover, the CRB method provides continuous and differentiable probability estimates in the context of estimating Eq. (2), which distinguishes it from the other MCMC methods referenced in Section 1. It will also form a basis for the new estimators that will be developed in the remainder of this section.

3.2. The CRT Method

Our first extension aims to address a potential drawback of Rao-Blackwellization, namely the cost of the additional reduced MCMC runs that it requires. For this reason, we examine a different way of obtaining $\widehat{f}_{TN}(z_i^* \mid X_i\beta, \Omega)$ that is required in Eq. (8) or (9). An approach to density estimation which is based on the Gibbs transition kernel and does not entail reduced runs is discussed in Ritter and Tanner (1992). In particular, the Gibbs transition kernel for moving from $z_i$ to $z_i^*$ is given by the product of well-known univariate truncated normal full-conditional densities,

$$K(z_i, z_i^* \mid y_i, \beta, \Omega) = \prod_{j=1}^{J} f(z_{ij}^* \mid y_i, \{z_{ik}^*\}_{k<j}, \{z_{ik}\}_{k>j}, \beta, \Omega) \qquad (10)$$

Because the full-conditional densities are the fundamental building blocks of the Gibbs sampler, the additional coding involved in evaluating Eq. (10) is minimized. By virtue of the fact that the Gibbs sampler satisfies Markov chain invariance (see, e.g., Tierney, 1994; Chib & Greenberg, 1996),


we have that

$$f_{TN_{B_i}}(z_i^* \mid X_i\beta, \Omega) = \int K(z_i, z_i^* \mid y_i, \beta, \Omega)\, f_{TN_{B_i}}(z_i \mid X_i\beta, \Omega)\, dz_i \qquad (11)$$

which was exploited for density estimation in Ritter and Tanner (1992). Therefore, an estimate of the denominator in Eq. (8) can be obtained by invoking Eq. (11) and averaging the transition kernel $K(z_i, z_i^* \mid y_i, \beta, \Omega)$ with respect to draws from the truncated normal distribution $z_i^{(g)} \sim TN_{B_i}(X_i\beta, \Omega)$, that is,

$$\widehat{f}_{TN_{B_i}}(z_i^* \mid X_i\beta, \Omega) = \frac{1}{G} \sum_{g=1}^{G} K(z_i^{(g)}, z_i^* \mid y_i, \beta, \Omega) \qquad (12)$$

As in the CRB method, the random draws $z_i^{(g)}$ required in the average are generated by a Gibbs sampler that iteratively simulates each element $z_{ij}$ from its full-conditional distribution $z_{ij} \sim f(z_{ij} \mid \{z_{ik}\}_{k \ne j}, \beta, \Omega)$ for $j = 1, \ldots, J$.
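A sketch of the CRT computation, reusing the gibbs_tmvn routine and the example arrays from the Gibbs sampling sketch above (all of it hypothetical), is given below: the Gibbs transition kernel is averaged over the main-run draws and the log probability follows from Eq. (9).

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gibbs_kernel_ordinate(z_star, z_g, y_i, mu_i, Omega):
    """One term K(z_g, z_star): product over j of the truncated normal
    full-conditional density of z_ij evaluated at z_star[j], conditioning on
    z_star[k] for k < j and on z_g[k] for k > j."""
    J = len(y_i)
    val = 1.0
    cond = z_g.astype(float).copy()
    for j in range(J):
        idx = [k for k in range(J) if k != j]
        Oinv = np.linalg.inv(Omega[np.ix_(idx, idx)])
        m = mu_i[j] + Omega[j, idx] @ Oinv @ (cond[idx] - mu_i[idx])
        s = np.sqrt(Omega[j, j] - Omega[j, idx] @ Oinv @ Omega[idx, j])
        # normalizing constant of the truncation region B_ij
        c = norm.cdf(m / s) if y_i[j] == 1 else norm.cdf(-m / s)
        val *= norm.pdf(z_star[j], loc=m, scale=s) / c
        cond[j] = z_star[j]                 # condition on z_star for k < j next
    return val

def crt_log_probability(y_i, mu_i, Omega, G=2_000, seed=0):
    draws = gibbs_tmvn(y_i, mu_i, Omega, G=G, seed=seed)   # sketch defined earlier
    z_star = draws.mean(axis=0)
    f_tn_hat = np.mean([gibbs_kernel_ordinate(z_star, d, y_i, mu_i, Omega)
                        for d in draws])
    return multivariate_normal(mu_i, Omega).logpdf(z_star) - np.log(f_tn_hat)

print(np.exp(crt_log_probability(y_i, mu_i, Omega)))
```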

Because this method combines the marginal likelihood estimation approach of Chib (1995) with the density ordinate estimation approach of Ritter and Tanner (1992), it will be referred to as the CRT method in the remainder of this chapter. Several remarks about the CRT method and its relationship to CRB can be made. First, because the CRT and CRB methods are continuous and differentiable, they are applicable in maximum likelihood estimation and other problems that require differentiation. Second, in contrast to CRB, CRT does not require reduced run simulation, as all ordinates are estimated with draws from the main MCMC run. However, CRT may require storage for the latent variables $\{z_i^{(g)}\}$, because the point $z_i^*$, typically taken to be the mean of $f_{TN_{B_i}}(z_i \mid X_i\beta, \Omega)$, may not be available during the main MCMC run, thus preventing concurrent evaluation of $K(z_i^{(g)}, z_i^* \mid y_i, \beta, \Omega)$. If storage is a problem, then estimation can involve some limited amount of precomputation, such as a short MCMC run to determine $z_i^*$ for subsequent evaluation of the Gibbs kernel. Note, however, that such a problem rarely presents itself in Bayesian studies, where $z_i^*$ may be readily available from MCMC runs conducted during the estimation of $\beta$ and $\Omega$. Third, note that in bivariate problems CRB will be more efficient than CRT because it does not involve any reduced runs and only requires estimation of $f(z_{i1}^* \mid y_i, \beta, \Omega)$, whereas $f(z_{i2}^* \mid y_i, z_{i1}^*, \beta, \Omega)$ is directly available. Finally, the main ideas stemming from the CRB and CRT approaches, namely that response probability evaluation can be reduced to finding a density ordinate and that the Gibbs kernel can be employed in estimating this density ordinate, will form a foundation for the methods that we discuss next. The key distinction between the alternatives that we consider has to do with the way in which the sample of latent data $\{z_i^{(g)}\}$ is generated.

3.3. The ARK Method

Our second extension deals with the AR estimator. As discussed in Section 2, the AR approach can be appealing because of its simplicity and ease of implementation, but can be problematic because of its nondifferentiability and discontinuity and the potential for numerical instability when estimating probabilities near 0 or 1. In this section, we show that the integration of MCMC theory into AR sampling can produce a method that circumvents many of the drawbacks of standard AR estimation. An important advantage of the proposed method relative to the estimator in Eq. (4) is that continuity and differentiability are introduced without sacrificing simulation consistency or requiring additional tuning parameters. Because the approach combines the AR simulator with the kernel of the Gibbs sampler, we will refer to it as the ARK method.

The derivation of the ARK method is fairly uncomplicated. It proceeds by simply rewriting the invariance condition in Eq. (11) as

$$f_{TN_{B_i}}(z_i^* \mid X_i\beta, \Omega) = \int K(z_i, z_i^* \mid y_i, \beta, \Omega)\, f_{TN_{B_i}}(z_i \mid X_i\beta, \Omega)\, dz_i = \int K(z_i, z_i^* \mid y_i, \beta, \Omega)\, \mathbf{1}\{z_i \in B_i\}\, f_N(z_i \mid X_i\beta, \Omega)\, dz_i \qquad (13)$$

which suggests a straightforward way of producing an estimate of $f_{TN_{B_i}}(z_i^* \mid X_i\beta, \Omega)$ that can be used to obtain $\widehat{\Pr}(y_i \mid \beta, \Omega)$ by Eq. (8) or (9). Specifically, from Eq. (13) it follows that $f_{TN_{B_i}}(z_i^* \mid X_i\beta, \Omega)$ can be estimated by drawing $z_i \sim N(X_i\beta, \Omega)$, accepting only draws that satisfy $z_i \in B_i$, and using those draws to average $K(z_i, z_i^* \mid y_i, \beta, \Omega)$ as in Eq. (12).
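The corresponding ARK sketch differs only in how the latent sample is produced: accept–reject draws from $N(X_i\beta, \Omega)$ replace the Gibbs draws, and the same kernel average is used. The code below reuses gibbs_kernel_ordinate and the example arrays from the CRT sketch above and is, again, purely illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def ark_log_probability(y_i, mu_i, Omega, G=5_000, seed=0):
    """ARK sketch: accept-reject draws from N(mu_i, Omega) restricted to B_i
    are used to average the Gibbs kernel; the log probability then follows
    from Eq. (9)."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(mu_i, Omega, size=G)
    accepted = z[np.all((z > 0) == (y_i == 1), axis=1)]   # AR step
    z_star = accepted.mean(axis=0)
    f_tn_hat = np.mean([gibbs_kernel_ordinate(z_star, d, y_i, mu_i, Omega)
                        for d in accepted])
    return multivariate_normal(mu_i, Omega).logpdf(z_star) - np.log(f_tn_hat)

print(np.exp(ark_log_probability(y_i, mu_i, Omega)))
```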

At this point, it may be helpful to review the main pros and cons of ARK estimation in some detail. First, the ARK method retains the simplicity of AR sampling, while simultaneously offering continuous, differentiable, and simulation consistent estimates of $\Pr(y_i \mid \beta, \Omega)$ based on the Gibbs kernel (even though simulation of $\{z_i^{(g)}\}$ does not involve Gibbs sampling as in CRB or CRT). Second, because ARK subsumes the traditional AR estimator, the AR estimate will also typically be available as a by-product of ARK estimation. Third, although both ARK and CRT average the kernel in Eq. (12) using latent data $z_i \sim TN_{B_i}(X_i\beta, \Omega)$, the fact that the latent data are obtained by either AR or Gibbs sampling can have important implications for the relative efficiency of ARK versus CRT. To see this, consider Fig. 1. The figure shows that with low correlations, the Gibbs sampler can traverse the parameter space relatively quickly, without inducing much serial dependence in the sampled $\{z_i^{(g)}\}$. When the elements of $z_i$ are highly correlated, however, iterative sampling of the full-conditional distributions produces relatively small Gibbs steps that lead to slow mixing of the Markov chain. In contrast, ARK provides an independent sample of draws whose mixing is unaffected by the extent of correlation between the elements of $z_i$.

[Fig. 1. Gibbs vs. Independent Sampling from Distributions with Varying Degrees of Correlation. Four panels compare Gibbs sampling and independent sampling for a low-correlation case and a high-correlation case.]

One should keep in mind, however, that this advantage of the ARK approach comes at the cost of a well-known problem with AR samplers: too many rejections may occur if $\Pr(z_i \in B_i \mid \beta, \Omega)$ is relatively low, thereby adversely affecting simulation efficiency. In some cases, this problem may be remedied by estimating $\Pr(z_i \in B_i^c \mid \beta, \Omega)$, because when the probability of $B_i$ is small, that of its complement $B_i^c$ must be relatively large. However, we caution that in doing so, one must be careful to ensure that a number of technical requirements are met. In particular, while the set $B_i$ is convex, its complement $B_i^c$ need not be. As a result, some choices of $z_i^* \in B_i^c$ may potentially introduce nondifferentiability in the estimate of $\Pr(z_i \in B_i^c \mid \beta, \Omega)$ because the kernel $K(z_i, z_i^* \mid y_i, \beta, \Omega)$ may not be strictly positive for all $\{z_i\}$. Even worse, for some settings of $\beta$ and $\Omega$ the nonconvexity of $B_i^c$ may lead to near reducibility of the Markov chain on $B_i^c$, rendering convergence and kernel estimation altogether problematic. Therefore, ARK estimation of $\Pr(z_i \in B_i^c \mid \beta, \Omega)$ should only be attempted after careful consideration of the aforementioned issues.

3.4. The ASK Method

In this section, we discuss an approach which aims to improve the quality of the sample of $\{z_i\}$ that is used in estimation by addressing some of the simulation difficulties discussed in Section 3.3. Another look at Fig. 1 suggests that improving the mixing properties of the Gibbs sampler in problems with high correlation would be key to reducing the serial dependence in the MCMC sample $z_i \sim TN_{B_i}(X_i\beta, \Omega)$, which, in turn, can reduce the sampling variability of the average in Eq. (12). Moreover, the discussion in Section 3.3 also indicates that Gibbs sampling has important advantages over AR sampling because every Gibbs draw satisfies $z_i \in B_i$, whereas meeting this requirement may lead to large rejection rates in AR simulation.

In developing the method, we link a variety of approaches and introduce a new adaptive MCMC algorithm for simulating $z_i \sim TN_{B_i}(X_i\beta, \Omega)$ which improves the quality of the MCMC sample. We build upon Chib (1995) to relate estimation of $\Pr(z_i \in B_i \mid \beta, \Omega)$ to that of $f_{TN_{B_i}}(z_i^* \mid X_i\beta, \Omega)$, rely on ideas from Ritter and Tanner (1992) to obtain the latter quantity, and use full-conditional truncated normal sampling (see Geweke, 1991), but with the key difference that our proposed Gibbs algorithm improves mixing by adaptively sampling either the latent $\{z_i\}$ or a particular transformation of those variables. Specifically, we use the Mahalanobis transformation to map $\{z_i\}$ into a priori independent standard normal variables $\{\eta_i\}$ such as those used in Eq. (5) to develop the recursive conditioning importance density of the GHK estimator. Due to the particular combination of inputs that appear in this method, in the remainder of this chapter, we shall refer to it as the adaptive sampling kernel (ASK) method.


The ASK approach proceeds along the following lines. We write the model as z_i = X_i β + L η_i, where η_i ~ N(0, I) and L is a lower triangular Cholesky factor such that L L' = Ω. Then, solving for η_i, we obtain η_i = L^{-1}(z_i − X_i β), which is the Mahalanobis transformation of z_i. Even though the elements of η_i are a priori independent, it is important to note that conditioning on y_i introduces dependence through the constraints on each η_ij in the full-conditional distributions f(η_ij | {η_ik}_{k≠j}, y_i, β, Ω), j = 1, ..., J. To see this, note that the constraints on η_ij are obtained from those on z_i by solving the system z_i = X_i β + L η_i, and that η_ij enters all equations for which the elements in the jth column of L are not zero (L is lower triangular by construction, but it can possibly contain zero elements below the main diagonal). Let E_ijk denote the feasible region for η_ij implied by the kth equation, and let E_ij = ∩_{k=j}^{J} E_ijk and E_i = {η_i : η_ij ∈ E_ij, ∀j}.

Readers may recall that some constraints arising from y_i are ignored in the GHK method in order to obtain a tractable importance density. However, all constraints must be incorporated in the sequence of Gibbs steps

[η_ij | {η_ik}_{k≠j}, y_i, β, Ω] ~ TN_{E_ij}(0, 1),  j = 1, ..., J

leading to the Gibbs kernel

K(η_i, η_i† | y_i, β, Ω) = ∏_{j=1}^{J} f(η_ij† | y_i, {η_ik†}_{k<j}, {η_ik^(g)}_{k>j}, β, Ω)    (14)

so that MCMC simulation produces η_i ~ TN_{E_i}(0, I) that correspond to z_i ~ TN_{B_i}(X_i β, Ω).

Some intuition about the mechanics of the ASK approach can be gleaned from Fig. 2, which relates the sets B_i and E_i implied by observing y_i = 1_2. The Mahalanobis transformation demeans, orthogonalizes, and rescales the draws z_i to produce η_i, but these operations also map B_i into E_i by shifting the vertex of the feasible set and rotating its boundaries (the axes) depending on the sign of the covariance elements in Ω. Note that because η_ij enters the equations for {z_ik}_{k≥j}, updating η_ij corresponds to simultaneously updating multiple elements of z_i; conversely, updating z_ij affects all elements {η_ik}_{k≤j} that enter the jth equation. The key feature of the transformation that will be exploited here is that it offers a trade-off between correlation (in the case of z_i) and dependence in the constraints (for the elements of η_i) and implies that important benefits can be obtained by adaptively sampling the elements of z_i or those of the Mahalanobis transformation η_i.
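To fix ideas, the transformation and its inverse amount to a triangular solve and a matrix product with the Cholesky factor. The following is a minimal sketch with an arbitrary example covariance matrix and mean (our illustration, not the authors' code).

```python
# Sketch of the Mahalanobis transformation used by ASK (illustrative values).
import numpy as np

Omega = np.array([[1.0, 0.8],          # example covariance with strong correlation
                  [0.8, 1.0]])
L = np.linalg.cholesky(Omega)           # lower triangular, L @ L.T == Omega
Xbeta = np.array([0.3, -0.2])           # example value of X_i @ beta

z = np.array([0.9, 0.4])                # a latent draw z_i in the positive orthant
eta = np.linalg.solve(L, z - Xbeta)     # eta_i = L^{-1}(z_i - X_i beta), a priori N(0, I)
z_back = Xbeta + L @ eta                # inverse map used after sampling eta_i
assert np.allclose(z_back, z)

# For y_i = (1, 1)', the constraint z_i > 0 becomes the linear restriction
# X_i beta + L eta_i > 0 on eta_i, i.e., the set E_i with dependent boundaries.
```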

To understand the trade-offs between Gibbs simulation of η_ij ~ f(η_ij | {η_ik}_{k≠j}, y_i, β, Ω) as a way of obtaining η_i ~ TN_{E_i}(0, I) and the resultant z_i = X_i β + L η_i, and Gibbs sampling of z_ij ~ f(z_ij | {z_ik}_{k≠j}, β, Ω) which yields z_i ~ TN_{B_i}(X_i β, Ω) directly, consider a setting where Ω contains high correlations but the constraints implied by y_i are relatively mildly binding. In this case, it will be beneficial to simulate η_i because f_{TN_{E_i}}(η_i | 0, I) → f_N(η_i | 0, I) as Pr(η_i ∈ E_i) → 1, and drawing η_i produces a sample that will be close to iid. In contrast, a traditional Gibbs sampler defined on the elements of z_i will exhibit high serial correlation between successive MCMC draws because such a sampler must traverse large portions of the support by taking small steps (recall the discussion of Fig. 1).

Fig. 2. Correspondence Between z_i ∈ B_i and the Mahalanobis Transform η_i ∈ E_i for y_i = 1_2. The Mahalanobis Transform Orthogonalizes and Standardizes z_i to Produce η_i, but Causes Dependence to be Reflected in the Boundaries of E_i.

Note also that as the correlations in Ω increase toward 1 for similar components of y_i or decrease toward −1 for dissimilar components of y_i, the feasible sets tend to be binding on one η_ij but not the other, and the MCMC sampler is well behaved. In other cases, it may be better to sample z_i directly without transforming to η_i, for example, when the constraints E_ij = ∩_{k=j}^{J} E_ijk on η_ij are such that they slow down the mixing of other {η_ik}_{k≠j}. Some of these scenarios, together with measures of sampling inefficiency (to be discussed shortly) for each Gibbs kernel (K_z(·) and K_η(·)), are presented in Fig. 3. In yet other cases, for example, when correlations are low or the probabilities to be estimated are small, the two sampling approaches will typically exhibit similar mixing and either approach will work well. However, in order to produce an MCMC sample that is as close to iid as possible, we have to be able to adaptively determine whether to simulate η_i (and convert to z_i) or sample z_i directly. Our proposed approach for doing so is presented next.

Fig. 3. Performance of K_z(·) and K_η(·) in Different Settings. Higher Inefficiencies Indicate Stronger Autocorrelations Between Successive MCMC Draws {z_i} for Each Kernel (1 Indicates iid Sampling).

Algorithm 1. Adaptive Gibbs Sampler for Multivariate Truncated Normal Simulation

1. Initialize p_η ∈ (0, 1) and let p_z = 1 − p_η;
2. Given the current z_i and the corresponding η_i in the Markov chain, with probability p_η sample η_i using the Gibbs kernel K_η(·) in Eq. (14) and convert to z_i by Eq. (5), or, with probability p_z sample z_i directly using the Gibbs kernel K_z(·) in Eq. (10);
3. After a burn-in period, accumulate the sample {z_i} while keeping track of the draws obtained by K_η(·) and K_z(·);
4. Periodically update p_η using a rule P_η : R^{2J} → [0, 1] that maps the autocorrelations from the two kernels to the closed interval [0, 1]; P_η is an increasing function in the autocorrelations of the draws produced by K_z(·) and a decreasing function of those produced by K_η(·).

We now discuss Algorithm 1 in greater detail. From a theoretical point of view, the algorithm is quite transparent: it is very simple to show that the mixture of kernels, each of which converges to the target distribution, also converges to that distribution. Specifically, one only has to observe that invariance is satisfied for each kernel (see, e.g., Chib & Greenberg, 1996) and therefore for any weighted average (mixture) of those kernels. An interesting observation, based on our experience with step 1, is that good mixing in the initial stages of sampling does not require that the better mixing sampler be favored by the initial choice of p_η and p_z. In fact, a "neutral" probability choice p_η = p_z = 0.5 typically leads to a mixed sampler whose performance is more than proportionately closer to that of the more efficient transition kernel. The goal of steps 3 and 4 is to ensure that if one kernel dominates the other on every margin (i.e., sampling is more efficient for every element of z_i), the mixed chain settles on that more efficient kernel; otherwise, the aim is to produce an adaptive Markov chain that strikes a balance between the two kernels in a way that reduces overall inefficiency. There are many possible ways in which p_η and p_z can be determined depending on one's aversion (as captured by some loss function) to slow mixing for each element of z_i. In our examples, we considered the following P_η:

p_η = 1,  if ρ_z ≻ ρ_η;
p_η = 0,  if ρ_η ≻ ρ_z;
p_η = w'ρ_z / (w'ρ_z + w'ρ_η),  otherwise;


where w is a vector of (loss function) weights, ρ_z and ρ_η are J-vectors of inefficiency measures for each element of z_i under the two MCMC sampling kernels, and "≻" denotes element-by-element inequality. Let ρ_jl ≡ corr(z_ij^(g), z_ij^(g−l)) be the lth autocorrelation for the sampled draws of z_ij. Note that in computing ρ_jl for each of the two kernels in Algorithm 1, one should ensure that the draws z_ij^(g) come from the kernel of interest, even though the draws z_ij^(g−l) could have been generated by either kernel. As a quick and inexpensive (but fairly accurate) measure of the inefficiency factors 1 + Σ_{l=1}^{∞} ρ_jl of the two samplers, we use the draws generated by K_z(·) to compute ρ_z[j] = (1 − ρ_j1)^{-1} and similarly base the computation of ρ_η[j] = (1 − ρ_j1)^{-1} on draws generated by K_η(·). These expressions require calculation of only the first-order autocorrelations and minimize computational and bookkeeping requirements but approximate the inefficiency factors rather well. Note also that in determining the mixing probabilities, the vector w could contain equal weights if the goal is to improve overall MCMC mixing. However, the weights can easily be adjusted when it may be desirable to weigh the mixing of a particular subset of z_i more heavily, such as in problems when a subset of z_i must be integrated out. A final remark is that it will typically suffice to update p_η only a few times in the course of sampling and that the sampling probability tends to stabilize very rapidly, almost immediately in the case of algorithms that exhibit widely diverging MCMC mixing properties.

The definition of the ASK simulator is completed by noting that once a sample of draws {z_i^(g)} is available, estimation proceeds by Eqs. (12) and (9). We emphasize that while by construction it is true that

Pr(y_i | β, Ω) = ∫_{B_i} f_N(z_i | X_i β, Ω) dz_i = ∫_{E_i} f_N(η_i | 0, I) dη_i

we do not use the second representation in estimation (and only rely on it in simulation) because after the Mahalanobis transformation the dependence in the constraints, seen in Fig. 2, implies that some values of η_i will possibly lead to K(η_i, η_i* | y_i, β, Ω) = 0, which may lead to nondifferentiability of the resulting probability estimate. This, however, is not a problem when the draws {η_i^(g)} are converted to {z_i^(g)} and the kernel K(z_i, z_i* | y_i, β, Ω) is used in estimation.


3.5. Summary and Additional Considerations

In this section, we have presented a variety of MCMC methods for estimating response probabilities in discrete data models. We have shown that simulated likelihood estimation can proceed by adapting methods from the Bayesian literature on marginal likelihood estimation and have developed a set of new techniques designed to address features that are specific to simulated likelihood evaluation. The methods are applicable in binary, ordinal, censored, count, and other settings and can be easily extended to handle mixtures and scale-mixtures of normal distributions that include the Student's t-link and logit models, among others (see, e.g., Andrews & Mallows, 1974; Poirier, 1978; Albert & Chib, 1993; Geweke, 1993), and to models with heteroskedasticity (Gu, Fiebig, Cripps, & Kohn, 2009). Moreover, even though for most of the approaches presented here we have discussed Gibbs kernel versions of estimating the posterior ordinate (as in Ritter & Tanner, 1992), we emphasize that it is possible to use Rao-Blackwellization as in Section 3.1, which can be desirable in high-dimensional problems or in settings where natural groupings of the latent variables may be present.

An important goal of this chapter is to consider approaches for obtaining MCMC samples {z_i} that result in better mixing of the Markov chain and improved efficiency of estimation. The improvements in simulation made possible by Algorithm 1 have ramifications not only for estimation of response probabilities, but also for problems in which high-quality samples from a truncated normal distribution are needed. For example, Chib's approach, which was discussed in Section 3.1, can be combined with the output of Algorithm 1 to further improve its efficiency. Many of the methods discussed here can also be combined with recently developed simulation techniques such as slice sampling (Neal, 2003; Damien & Walker, 2001) and antithetic draws such as those produced by reflection samplers and Halton sequences (see, e.g., Tierney, 1994; Train, 2003; Bhat, 2001, 2003). In this chapter, we focused on algorithms that provide continuous and differentiable probability estimates but have also cited a number of important MCMC approaches that lead to nondifferentiable estimates. It is useful to keep in mind that many of these latter methods can still be applied in optimization algorithms that do not require differentiation – for example, in simulated annealing (Goffe, Ferrier, & Rogers, 1994) and particle swarming (Kennedy & Eberhart, 2001), although such algorithms involve computationally intensive stochastic search that typically requires numerous evaluations of the objective function.


Finally, we remark that although our discussion has centered on discrete data models, the techniques developed in this chapter are directly applicable to the computation of p-values for multivariate directional hypothesis tests.

4. COMPARISON OF SIMULATORS

We carried out a simulation study to examine the performance of the techniques proposed in Section 3 and compare them to the methods discussed in Section 2. In particular, we report estimates of the probability

Pr(y | μ, Ω) = ∫_B f_N(z | μ, Ω) dz

under several settings of μ and Ω. Because the problem of estimating any orthant probability can always be represented as an equivalent problem of estimating the probability of another orthant by simple rotation of the space, without loss of generality, we let y be a J-dimensional vector of ones, and hence B is the positive orthant. We vary the dimension of integration from J = 3 to J = 12 in increments of 3. In each case, we consider three settings of μ and four settings of Ω. Specifically, when J = 3, we let μ_A = (0, 0.5, 1)' be the value of μ that makes y "likely," μ_B = (−0.5, 0, 0.5)' as the "intermediate" value of μ, and μ_C = (−1, −0.5, 0)' as the "least likely" value. For J = 6 the "likely," "intermediate," and "least likely" values are obtained by setting μ = (μ_A', μ_A')', μ = (μ_B', μ_B')', or μ = (μ_C', μ_C')', respectively. The means are similarly constructed for higher values of J. We use a covariance matrix Ω of the type Ω[k, j] = ρ^{|k−j|}, that is,

Ω = [ 1          ρ          ρ²        ···   ρ^{J−1}
      ρ          1          ρ         ···   ρ^{J−2}
      ρ²         ρ          1         ···   ρ^{J−3}
      ⋮                               ⋱     ρ
      ρ^{J−1}    ρ^{J−2}    ···       ρ     1       ]

where ρ ∈ {−0.7, −0.3, 0.3, 0.7}, which allows for high and low positive and negative correlations in the examples. Finally, the reported results for all simulators are based on simulation runs of length 10,000; for the three simulators requiring MCMC draws (CRB, CRT, and ASK), the main run is preceded by a burn-in of 1,000 cycles.

Tables 1–4 present results for different settings of μ and Ω for J = 3, J = 6, J = 9, and J = 12, respectively. We find that for low values of J and settings of μ that make y "likely," all methods produce point estimates that agree closely. However, the variability differs widely across estimators and different settings of J, ρ, and μ (note that the entries in parentheses have to be divided by 100 to obtain the actual numerical standard errors (NSE) of the estimates). Among the traditional estimators, we see that GHK outperforms AR and Stern, regardless of the values of J, ρ, and μ. AR performs worst and can also fail in high-dimensional problems or in other settings where the outcome is "unlikely" and no draws are accepted.

Table 1. Log-Probability Estimates (J = 3) with Numerical Standard Errors (×10⁻²) in Parentheses.

            ρ       AR        STERN     GHK       CRB       CRT       ASK       ARK
μ = μ_A   −0.7    −1.574    −1.557    −1.556    −1.557    −1.557    −1.558    −1.560
                  (1.956)   (0.896)   (0.386)   (0.086)   (0.082)   (0.085)   (0.171)
          −0.3    −1.396    −1.393    −1.392    −1.393    −1.393    −1.393    −1.394
                  (1.743)   (0.414)   (0.123)   (0.018)   (0.017)   (0.017)   (0.033)
           0.3    −1.069    −1.059    −1.067    −1.065    −1.066    −1.065    −1.066
                  (1.382)   (0.729)   (0.080)   (0.033)   (0.033)   (0.033)   (0.051)
           0.7    −0.835    −0.840    −0.836    −0.838    −0.836    −0.841    −0.834
                  (1.143)   (0.895)   (0.094)   (0.265)   (0.318)   (0.239)   (0.395)
μ = μ_B   −0.7    −3.523    −3.498    −3.498    −3.502    −3.502    −3.502    −3.503
                  (5.736)   (1.212)   (0.543)   (0.029)   (0.026)   (0.026)   (0.176)
          −0.3    −2.658    −2.664    −2.663    −2.665    −2.665    −2.665    −2.666
                  (3.642)   (0.518)   (0.170)   (0.009)   (0.009)   (0.009)   (0.040)
           0.3    −1.862    −1.853    −1.867    −1.865    −1.865    −1.865    −1.865
                  (2.332)   (1.151)   (0.113)   (0.023)   (0.024)   (0.023)   (0.057)
           0.7    −1.427    −1.406    −1.424    −1.423    −1.423    −1.424    −1.423
                  (1.780)   (1.341)   (0.134)   (0.203)   (0.253)   (0.200)   (0.427)
μ = μ_C   −0.7    −7.824    −7.208    −7.208    −7.213    −7.213    −7.213    −7.220
                  (49.990)  (1.463)   (0.676)   (0.008)   (0.008)   (0.008)   (1.881)
          −0.3    −4.656    −4.645    −4.645    −4.648    −4.648    −4.648    −4.648
                  (10.211)  (0.607)   (0.212)   (0.004)   (0.004)   (0.004)   (0.097)
           0.3    −3.006    −2.979    −3.002    −3.000    −3.000    −3.000    −3.000
                  (4.382)   (1.851)   (0.148)   (0.017)   (0.017)   (0.017)   (0.065)
           0.7    −2.248    −2.213    −2.237    −2.234    −2.235    −2.233    −2.230
                  (2.910)   (2.105)   (0.179)   (0.175)   (0.199)   (0.172)   (0.532)

These findings for the traditional estimators are consistent with earlier studies (e.g., Borsch-Supan & Hajivassiliou, 1993; Hajivassiliou et al., 1996).

The interesting finding from Tables 1–4 in this study is that the MCMC-based estimation methods perform very well. In fact, CRB, CRT, and ASK outperform GHK most of the time, although the methods are roughly on par in "likely" cases characterized by high values of ρ. However, in cases with unlikely outcomes, the MCMC-based methods typically produce NSE that are much lower than those of GHK. Although it could be argued that some of the efficiency of CRB comes at the cost of additional reduced runs, neither CRT nor ASK requires reduced runs, and both are still typically more efficient than GHK. These results present a strong case in favor of the proposed MCMC-based approaches.

Table 2. Log-Probability Estimates (J = 6) with Numerical Standard Errors (×10⁻²) in Parentheses.

                 ρ       AR        STERN     GHK       CRB       CRT       ASK       ARK
μ = 1_2 ⊗ μ_A  −0.7    −3.032    −3.097    −3.066    −3.074    −3.074    −3.076    −3.071
                       (4.444)   (1.700)   (0.643)   (0.112)   (0.121)   (0.129)   (0.576)
               −0.3    −2.859    −2.831    −2.823    −2.828    −2.828    −2.828    −2.829
                       (4.056)   (0.729)   (0.235)   (0.026)   (0.027)   (0.027)   (0.128)
                0.3    −2.039    −2.030    −2.041    −2.037    −2.037    −2.038    −2.036
                       (2.586)   (1.237)   (0.221)   (0.049)   (0.055)   (0.053)   (0.146)
                0.7    −1.350    −1.355    −1.376    −1.371    −1.386    −1.372    −1.360
                       (1.691)   (1.315)   (0.433)   (0.408)   (0.539)   (0.449)   (0.818)
μ = 1_2 ⊗ μ_B  −0.7    −7.131    −7.211    −7.164    −7.174    −7.175    −7.175    −7.195
                       (35.341)  (2.846)   (0.912)   (0.035)   (0.037)   (0.036)   (2.917)
               −0.3    −5.473    −5.480    −5.469    −5.475    −5.475    −5.475    −5.473
                       (15.398)  (0.969)   (0.311)   (0.013)   (0.014)   (0.014)   (0.453)
                0.3    −3.527    −3.518    −3.534    −3.527    −3.528    −3.528    −3.529
                       (5.746)   (2.238)   (0.297)   (0.037)   (0.041)   (0.040)   (0.183)
                0.7    −2.294    −2.246    −2.283    −2.277    −2.280    −2.278    −2.271
                       (2.985)   (2.178)   (0.555)   (0.351)   (0.464)   (0.379)   (1.090)
μ = 1_2 ⊗ μ_C  −0.7       –     −15.476   −15.444   −15.458   −15.458   −15.458      –
                          –      (4.408)   (1.140)   (0.010)   (0.010)   (0.010)     –
               −0.3       –      −9.669    −9.656    −9.663    −9.663    −9.663      –
                          –      (1.227)   (0.374)   (0.006)   (0.006)   (0.007)     –
                0.3    −5.497    −5.620    −5.629    −5.621    −5.621    −5.621    −5.624
                       (15.585)  (4.486)   (0.374)   (0.027)   (0.030)   (0.028)   (0.450)
                0.7    −3.537    −3.518    −3.514    −3.498    −3.502    −3.503    −3.501
                       (5.776)   (3.925)   (0.730)   (0.288)   (0.366)   (0.342)   (1.745)

Even ARK, which similarly to AR could also fail when no draws are accepted, seems to provide very efficient estimates that are close to those of the other estimators (provided at least some draws are accepted).

In comparing the MCMC approaches to each other, we see that the ASK estimates, as expected, are at least as efficient as those from CRT, but that in many settings all three methods (ASK, CRT, and CRB) perform similarly. This suggests that ASK (which nests CRT as a special case) may be preferable to CRB in those cases because of its lower computational demands. The advantages of adaptive sampling by Algorithm 1 become more pronounced the higher the correlation ρ.

An important point to note, in light of the results presented in this section and in anticipation of the application in Section 5, is that precise estimation of the log-likelihood is essential for inference.

Table 3. Log-Probability Estimates (J = 9) with Numerical Standard Errors (×10⁻²) in Parentheses.

                 ρ       AR        STERN     GHK       CRB       CRT       ASK       ARK
μ = 1_3 ⊗ μ_A  −0.7    −4.585    −4.599    −4.610    −4.590    −4.588    −4.589    −4.566
                       (9.851)   (2.578)   (0.864)   (0.155)   (0.142)   (0.156)   (1.719)
               −0.3    −4.200    −4.258    −4.269    −4.263    −4.263    −4.263    −4.260
                       (8.103)   (0.957)   (0.318)   (0.035)   (0.034)   (0.032)   (0.350)
                0.3    −2.966    −3.001    −3.005    −3.008    −3.008    −3.009    −3.005
                       (4.292)   (1.789)   (0.326)   (0.062)   (0.071)   (0.069)   (0.318)
                0.7    −1.885    −1.880    −1.890    −1.893    −1.901    −1.905    −1.906
                       (2.363)   (1.784)   (0.611)   (0.563)   (0.738)   (0.519)   (1.394)
μ = 1_3 ⊗ μ_B  −0.7       –     −10.877   −10.878   −10.846   −10.845   −10.846      –
                          –      (5.093)   (1.277)   (0.046)   (0.043)   (0.040)     –
               −0.3    −7.824    −8.281    −8.293    −8.285    −8.285    −8.285    −8.449
                       (49.990)  (1.312)   (0.421)   (0.017)   (0.017)   (0.016)   (4.721)
                0.3    −5.116    −5.157    −5.186    −5.190    −5.190    −5.190    −5.188
                       (12.871)  (3.919)   (0.440)   (0.047)   (0.053)   (0.050)   (0.657)
                0.7    −3.128    −3.100    −3.107    −3.107    −3.113    −3.113    −3.090
                       (4.672)   (3.294)   (0.910)   (0.457)   (0.609)   (0.520)   (2.229)
μ = 1_3 ⊗ μ_C  −0.7       –     −23.764   −23.743   −23.702   −23.702   −23.702      –
                          –      (8.661)   (1.615)   (0.012)   (0.012)   (0.011)     –
               −0.3       –     −14.675   −14.689   −14.678   −14.678   −14.678      –
                          –      (1.725)   (0.505)   (0.008)   (0.008)   (0.008)     –
                0.3    −8.112    −8.141    −8.236    −8.241    −8.242    −8.242    −8.100
                       (57.726)  (9.785)   (0.557)   (0.035)   (0.039)   (0.037)   (15.765)
                0.7    −4.804    −4.743    −4.733    −4.741    −4.743    −4.738    −4.740
                       (10.998)  (6.980)   (1.264)   (0.375)   (0.518)   (0.473)   (4.517)

For example, it is crucial for computing likelihood ratio statistics, information criteria, marginal likelihoods, and Bayes factors for model comparisons and hypothesis testing. Estimation efficiency is also key to mitigating simulation biases (due to Jensen's inequality and the nonlinearity of the logarithmic transformation) in the maximum simulated likelihood estimation of parameters, standard errors, and confidence intervals (see, e.g., McFadden & Train, 2000, Section 3 and Train, 2003, Chapter 10).

To summarize, the results suggest that the MCMC simulated likelihood estimation methods perform very well and dominate other estimation methods over a large set of possible scenarios. Their performance improves with the ability of the Markov chain to mix well, making Algorithm 1 an important component of the estimation process.

Table 4. Log-Probability Estimates (J = 12) with Numerical Standard Errors (×10⁻²) in Parentheses.

                 ρ       AR        STERN     GHK       CRB       CRT       ASK       ARK
μ = 1_4 ⊗ μ_A  −0.7    −5.914    −6.096    −6.084    −6.102    −6.101    −6.103    −6.129
                       (19.219)  (3.775)   (1.207)   (0.170)   (0.162)   (0.180)   (3.428)
               −0.3    −5.599    −5.699    −5.696    −5.697    −5.697    −5.697    −5.685
                       (16.409)  (1.220)   (0.412)   (0.040)   (0.037)   (0.038)   (1.898)
                0.3    −3.868    −3.961    −3.979    −3.979    −3.979    −3.978    −3.987
                       (6.844)   (2.332)   (0.389)   (0.074)   (0.078)   (0.083)   (0.481)
                0.7    −2.429    −2.397    −2.410    −2.410    −2.417    −2.404    −2.365
                       (3.217)   (2.326)   (0.836)   (0.628)   (0.909)   (0.747)   (2.305)
μ = 1_4 ⊗ μ_B  −0.7       –     −14.504   −14.484   −14.516   −14.516   −14.516      –
                          –      (8.462)   (1.864)   (0.049)   (0.046)   (0.047)     –
               −0.3       –     −11.091   −11.092   −11.094   −11.094   −11.094      –
                          –      (1.734)   (0.547)   (0.020)   (0.019)   (0.019)     –
                0.3    −6.725    −6.858    −6.850    −6.852    −6.851    −6.851    −6.818
                       (28.850)  (5.393)   (0.524)   (0.055)   (0.058)   (0.062)   (5.829)
                0.7    −3.821    −3.923    −3.914    −3.933    −3.944    −3.937    −3.943
                       (6.683)   (4.651)   (1.213)   (0.540)   (0.709)   (0.762)   (3.553)
μ = 1_4 ⊗ μ_C  −0.7       –     −31.980   −31.901   −31.945   −31.945   −31.945      –
                          –      (13.751)  (2.411)   (0.013)   (0.012)   (0.013)     –
               −0.3       –     −19.684   −19.690   −19.693   −19.693   −19.693      –
                          –      (2.327)   (0.656)   (0.009)   (0.009)   (0.009)     –
                0.3       –     −11.090   −10.860   −10.862   −10.862   −10.861      –
                          –      (12.964)  (0.667)   (0.041)   (0.043)   (0.046)     –
                0.7    −5.776    −6.044    −5.959    −5.972    −5.974    −5.961    −6.003
                       (17.933)  (11.969)  (1.718)   (0.428)   (0.642)   (0.588)   (7.573)

4.1. Computational Caveats and Directions for Further Study

In this chapter, we have compared a number of new and existing estimators for a fixed Monte Carlo simulation size. Such comparisons are easy to perform and interpret in practically any simulation study. However, an important goal for future research would be to optimize the code, perform formal operation counts, and study the computational intensity of each estimation algorithm. This would enable comparisons based on a fixed computational budget (running time), which are less straightforward and more difficult to generalize because they depend on various nuances of the specific application. In this section, we highlight some of the subtleties that must be kept in mind.

For instance, although AR and ARK are simple and fast, the computational cost to achieve a certain estimation precision is entirely dependent on the context. Importance sampling and MCMC simulators such as GHK, CRT, CRB, and ASK, on the other hand, involve more coding and more costly iterations, but they are also more reliable and statistically efficient, especially in estimating small orthant probabilities. Based on rough operation counts, GHK, CRT, and ASK involve comparable computations and simulations, while the efficiency of CRB depends on the number of reduced runs that is required.

The complications of optimizing these methods for speed, while retaining their transparency and reliability, go well beyond simply removing redundant computations (e.g., inversions, multiplications, conditional moment calculations) and making efficient use of storage. Although these steps are essential in producing efficient algorithms, another difficulty arises because random number generators may have to rely on a mix of techniques in order to be reliable and general. For example, to simulate truncated normal draws close to the mean, one can use the inverse cdf method. However, it is well known that the inverse cdf method can fail in the tails. Fortunately, in those circumstances the algorithms proposed in Robert (1995) are very efficient and reliable. Because in a given application the estimation algorithms may use a different mix of these simulation approaches, the computational times across algorithms may not be directly comparable.
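For completeness, here is a generic sketch of the inverse-CDF draw mentioned above (our illustration, not the authors' Gauss code); for truncation regions far in the tails one would switch to a rejection method such as Robert's (1995).

```python
# Sketch: inverse-CDF sampling from N(mu, sigma^2) truncated to (a, b).
# Accurate near the center of the distribution; use Robert (1995) in extreme tails.
import numpy as np
from scipy.stats import norm

def truncnorm_inverse_cdf(mu, sigma, a, b, size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    Fa = norm.cdf((a - mu) / sigma)
    Fb = norm.cdf((b - mu) / sigma)
    u = rng.uniform(Fa, Fb, size)     # uniform on the truncated CDF range
    return mu + sigma * norm.ppf(u)

print(truncnorm_inverse_cdf(0.0, 1.0, a=0.0, b=np.inf, size=5,
                            rng=np.random.default_rng(1)))
```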

Another caveat arises due to the specifics of algorithms that rely on MCMC samplers and has to do with determining an appropriate way to account for the dual use (a benefit) and the need for burn-in sampling (a cost) of MCMC simulation. Specifically, in many instances MCMC draws have dual use in addition to evaluation of the likelihood function (e.g., for computing marginal effects, point elasticities, etc.) or are already available from an earlier step in the MCMC estimation (so likelihood estimation requires no simulation but only the computation and averaging of certain conditional densities and transition kernels). Similarly, the costs of burn-in simulation would typically not be an issue in Bayesian studies where a Markov chain would have already been running during the estimation stage, but could be an additional cost in hill-climbing algorithms. Of course, for well-mixing Markov chains convergence and burn-in costs are trivial, but otherwise they should be properly accounted for in the cost of MCMC simulation.

These special considerations are compounded by having to examine the estimators in the context of different dimensionality, mean, and covariance matrix combinations, making a thorough computational examination of the methods an important and necessary next step in this area of research. Gauss programs for the new estimators are available on the authors' websites.

5. APPLICATION TO A MODEL FOR BINARY PANEL DATA

This section offers an application of the techniques to the problem of likelihood ordinate estimation in models for binary panel data. In particular, we consider data from Chib and Jeliazkov (2006) that deal with the intertemporal labor force participation decisions of 1545 married women in the age range of 17–66. The data set, obtained from the Panel Study of Income Dynamics, contains a panel of women's working status indicators (1 = working during the year, 0 = not working) over a 7-year period (1979–1985), together with a set of seven covariates given in Table 5. The sample consists of continuously married couples where the husband is a labor force participant (reporting both positive earnings and hours worked) in each of the sample years. Similar data have been analyzed by Chib and Greenberg (1998), Avery, Hansen, and Hotz (1983), and Hyslop (1999) using a variety of techniques.

We considered two competing specifications that differ in their dynamics. For i = 1, ..., n and t = 1, ..., T, the first specification, model M1, is given by

y_it = 1{x̃_it'δ + w_it'β_i + g(s_it) + φ_1 y_{i,t−1} + φ_2 y_{i,t−2} + ε_it > 0},  ε_it ~ N(0, 1)

and captures state dependence through two lags of the dependent variable but does not involve serial correlation in the errors. The second specification, model M2, involves only a single lag of the dependent variable, but allows for AR(1) serial correlation in the errors:

y_it = 1{x̃_it'δ + w_it'β_i + g(s_it) + φ_1 y_{i,t−1} + ε_it > 0},
ε_it = ρ ε_{i,t−1} + ν_it,  ν_it ~ N(0, 1)

Both M1 and M2 include mutually exclusive sets of covariates x̃_it and w_it, where the effects of the former, δ, are modeled as common across women, and the effects of the latter, β_i, are individual specific (random); the models also include a covariate s_it whose effect is modeled through an unknown function g(·) which is estimated nonparametrically. In both specifications y_it = WORK_it, x̃_it' = (RACE_i, EDU_it, ln(INC_it)), s_it = AGE_it, w_it' = (1, CH2_it, CH5_it), and heterogeneity is modeled in a correlated random effects framework which allows β_i to be correlated with observables through

β_i = A_i γ + b_i,  b_i ~ N_3(0, D)    (15)

We let all three random effects depend on the initial conditions, and the effects of CH2 and CH5 also depend on average husbands' earnings through

A_i = ( ȳ_{i0}
        1   ȳ_{i0}   ln(ĪNC_i)
        1   ȳ_{i0}   ln(ĪNC_i) )

where neither x̃_it nor the first row of A_i involves a constant term because the unknown function g(·) is unrestricted and absorbs the overall intercept.

Table 5. Variables in the Women's Labor Force Participation Application.

Variable   Explanation                                                   Mean       SD
WORK       Wife's labor force status (1 = working, 0 = not working)      0.7097     0.4539
INT        An intercept term (a column of ones)
AGE        The woman's age in years                                     36.0262     9.7737
RACE       1 if black, 0 otherwise                                       0.1974     0.3981
EDU        Attained education (in years) at time of survey              12.4858     2.1105
CH2        Number of children aged 0–2 in that year                      0.2655     0.4981
CH5        Number of children aged 3–5 in that year                      0.3120     0.5329
INC        Total annual labor income of head of household*              31.7931    22.6417

*Measured as nominal earnings (in thousands) adjusted by the consumer price index (base year 1987).

Upon substituting Eq. (15) into each of the two models, stacking the observations over time for each woman, and grouping the common and individual-specific terms, we can write models M1 and M2, similarly to Eq. (1), in the latent variable form

z_i = X_i β + g_i + ε_i    (16)

where z_i = (z_i1, ..., z_iT)', X_i = (X̃_i : W_i A_i : L_i), X̃_i = (x̃_i1, ..., x̃_iT)', W_i = (w_i1, ..., w_iT)', L_i contains the requisite lags of y_it, β = (δ', γ', φ')', and g_i = (g(s_i1), ..., g(s_iT))'. The errors ε_i = (ε_i1, ..., ε_iT)' follow the distribution ε_i ~ N(0, Ω_i), where Ω_i = R + W_i D W_i' and R is the Toeplitz matrix implied by the autoregressive process, that is, R = I_T for model M1 and R[j, k] = ρ^{|j−k|}/(1 − ρ²) for model M2. Because M1 requires two lags of the dependent variable, both models are estimated conditionally on the initial two observations in the data.

Our focus in this section is on the problem of estimating the log-likelihood ordinate conditionally on the model parameters. For details on the estimation of the parameters in the two models, interested readers are referred to Chib and Jeliazkov (2006). As the model construction shows, both the cluster means and covariance matrices depend on cluster-specific characteristics, and hence the panel data setup with multidimensional heterogeneity is quite useful for examining the performance of the estimators in a variety of possible circumstances that occur in the data.

Estimates of the log-likelihood function obtained by various methods are presented in Table 6.

Table 6. Log-Likelihood Estimates in the Women's Labor Force Participation Application.

                               Model M1                     Model M2
Estimator               Log-Likelihood    NSE        Log-Likelihood    NSE
Traditional estimators
  Stern                   −2501.435      (0.291)       −2537.926      (0.573)
  GHK                     −2501.434      (0.100)       −2537.631      (0.137)
  AR                      −2502.005      (2.355)       −2540.702      (2.440)
MCMC-based estimators
  CRB                     −2501.425      (0.027)       −2537.593      (0.061)
  CRT                     −2501.403      (0.039)       −2537.572      (0.081)
  ASK                     −2501.411      (0.036)       −2537.563      (0.073)
  ARK                     −2501.498      (0.090)       −2537.898      (0.202)

The log-likelihood and NSE estimates were obtained from runs of length 10,000 draws for each likelihood contribution (there are n = 1545 likelihood contributions, one for each woman in the sample). The NSEs of the log-likelihood contribution estimates are presented in Figs. 4 and 5. The results in Table 6 and Figs. 4 and 5 show that in this application, the new MCMC methods are more efficient than existing approaches. While the argument can be made that the higher efficiency of CRB is due to its reliance on additional reduced runs, the results also reveal that the remaining MCMC methods are generally more efficient even though they do not require any reduced run simulations. We can also see that the improvement in MCMC sampling due to Algorithm 1 used in the ASK method leads to lower standard errors relative to CRT. A much more striking improvement in efficiency, however, can be seen in a comparison between the AR and ARK methods. What makes the comparison impressive is that both methods are based on the same simulated draws (with the AR estimate being obtained as a by-product of ARK estimation), yet ARK is orders of magnitude more efficient.

Fig. 4. Numerical Standard Error (NSE) Estimates for Model M1. (Panels show the NSEs of the individual likelihood contributions for the AR, ARK, Stern, GHK, CRB, CRT, and ASK estimators, together with select NSE boxplots.)

Comparison of the estimates for models M1 and M2 shows that allowing for autocorrelated errors (the estimated value of ρ is −0.29), at the cost of excluding a second lag of y_it from the mean, has a detrimental effect on the efficiency of all estimators. While the relative efficiency rankings of estimators are largely preserved as we move from M1 to M2 (with the exception of GHK and ARK), traditional methods appear to exhibit more high-variability outliers, whereas MCMC-based methods show a general increase in variability of estimation across all clusters (the plots for ARK, similarly to those of AR, show both features).

This section has considered the application of several simulated likelihood estimation techniques to a hierarchical semiparametric model for binary panel data with state dependence, serially correlated errors, and multidimensional heterogeneity correlated with the covariates and initial conditions.

Fig. 5. Numerical Standard Error (NSE) Estimates for Model M2. (Panels show the NSEs of the individual likelihood contributions for the AR, ARK, Stern, GHK, CRB, CRT, and ASK estimators, together with select NSE boxplots.)

Using data on women's labor force participation, we have illustrated that the proposed MCMC-based estimation methods are practical and can lead to improved efficiency of estimation in a variety of environments occurring in a real-world data set. Comparison of these and other simulated likelihood estimators in other model settings is an important item for future research.

6. CONCLUSIONS

This chapter considers the problem of evaluating the likelihood functions in a broad class of multivariate discrete data models. We have reviewed traditional simulation methods that produce continuous and differentiable estimates of the response probability and can be used in hill-climbing algorithms in maximum likelihood estimation. We have also shown that the problem can be handled by MCMC-based methods designed for marginal likelihood computation in Bayesian econometrics. New computationally efficient and conceptually straightforward MCMC algorithms have been developed for (i) estimating response probabilities and likelihood functions and (ii) simulating draws from multivariate truncated normal distributions. The former of these contributions aims to provide simple, efficient, and sound solutions from Markov chain theory to outstanding problems in simulated likelihood estimation; the latter is motivated by the need to provide high-quality samples from the target multivariate truncated normal density. A simulation study has shown that the methods perform well, while an application to a correlated random effects panel data model of women's labor force participation shows that they are practical and easy to implement.

In addition to their simplicity and efficiency, an important advantage of the methods considered here is that they are modular and can be mixed and matched as components of composite estimation algorithms in a variety of multivariate discrete and censored data settings. Important topics for future work in this area would be to examine the effectiveness of the estimators in practical applications, to explore extensions and develop additional hybrid approaches, and to perform detailed computational efficiency studies in a range of contexts.

ACKNOWLEDGMENT

We thank the editors and two anonymous referees for their detailed comments and helpful suggestions.


REFERENCES

Albert, J., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Andrews, D. F., & Mallows, C. L. (1974). Scale mixtures of normal distributions. Journal of the Royal Statistical Society, Series B, 36, 99–102.
Avery, R., Hansen, L., & Hotz, V. (1983). Multiperiod probit models and orthogonality condition estimation. International Economic Review, 24, 21–35.
Bhat, C. R. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35, 677–693.
Bhat, C. R. (2003). Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transportation Research Part B, 37, 837–855.
Borsch-Supan, A., & Hajivassiliou, V. A. (1993). Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. Journal of Econometrics, 58, 347–368.
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90, 1313–1321.
Chib, S., & Greenberg, E. (1996). Markov chain Monte Carlo simulation methods in econometrics. Econometric Theory, 12, 409–431.
Chib, S., & Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika, 85, 347–361.
Chib, S., & Jeliazkov, I. (2001). Marginal likelihood from the Metropolis-Hastings output. Journal of the American Statistical Association, 96, 270–281.
Chib, S., & Jeliazkov, I. (2005). Accept-reject Metropolis-Hastings sampling and marginal likelihood estimation. Statistica Neerlandica, 59, 30–44.
Chib, S., & Jeliazkov, I. (2006). Inference in semiparametric dynamic models for binary longitudinal data. Journal of the American Statistical Association, 101, 685–700.
Damien, P., & Walker, S. G. (2001). Sampling truncated normal, beta, and gamma densities. Journal of Computational and Graphical Statistics, 10, 206–215.
Devroye, L. (1986). Non-uniform random variate generation. New York: Springer-Verlag.
DiCiccio, T. J., Kass, R. E., Raftery, A. E., & Wasserman, L. (1997). Computing Bayes factors by combining simulation and asymptotic approximations. Journal of the American Statistical Association, 92, 903–915.
Gelfand, A. E., & Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, Series B, 56, 501–514.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Geweke, J. (1991). Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In: E. M. Keramidas (Ed.), Computing science and statistics: Proceedings of the twenty-third symposium on the interface (pp. 571–578). Fairfax: Interface Foundation of North America, Inc.
Geweke, J. (1993). Bayesian treatment of the independent Student-t linear model. Journal of Applied Econometrics, 8, S19–S40.
Geweke, J. (1999). Using simulation methods for Bayesian econometric models: Inference, development, and communication. Econometric Reviews, 18, 1–73.
Goffe, W. L., Ferrier, G. D., & Rogers, J. (1994). Global optimization of statistical functions with simulated annealing. Journal of Econometrics, 60, 65–99.
Greenberg, E. (2008). Introduction to Bayesian econometrics. New York: Cambridge University Press.
Griffiths, W. E., Hill, R. C., & O'Donnell, C. J. (2006). A comparison of Bayesian and sampling theory inferences in a probit model. In: M. Holt & J.-P. Chavas (Eds), Essays in honor of Stanley R. Johnson, article 12. Available at http://www.bepress.com/sjohnson/
Gu, Y., Fiebig, D. G., Cripps, E., & Kohn, R. (2009). Bayesian estimation of a random effects heteroscedastic probit model. Econometrics Journal, 12, 324–339.
Hajivassiliou, V. A., & McFadden, D. (1998). The method of simulated scores for the estimation of LDV models. Econometrica, 66, 863–896.
Hajivassiliou, V. A., McFadden, D. L., & Ruud, P. (1996). Simulation of multivariate normal rectangle probabilities and their derivatives: Theoretical and computational results. Journal of Econometrics, 72, 85–134.
Hajivassiliou, V. A., & Ruud, P. (1994). Classical estimation methods for LDV models using simulation. Handbook of Econometrics, 4, 2383–2441.
Heiss, F., & Winschel, V. (2008). Likelihood approximation by numerical integration on sparse grids. Journal of Econometrics, 144, 62–80.
Hyslop, D. (1999). State dependence, serial correlation and heterogeneity in intertemporal labor force participation of married women. Econometrica, 67, 1255–1294.
Kass, R., & Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Keane, M. (1994). A computationally practical simulation estimator for panel data. Econometrica, 95–116.
Kennedy, J., & Eberhart, R. C. (2001). Swarm intelligence. San Francisco, CA: Morgan Kaufmann.
Koop, G. (2003). Bayesian econometrics. New York: John Wiley & Sons.
Lerman, S., & Manski, C. (1981). On the use of simulated frequencies to approximate choice probabilities. Structural Analysis of Discrete Data with Econometric Applications, 305–319.
McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica, 57, 995–1026.
McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447–470.
Meng, X.-L., & Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: A theoretical exploration. Statistica Sinica, 6, 831–860.
Neal, R. M. (2003). Slice sampling. The Annals of Statistics, 31, 705–767.
Newton, M. A., & Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B, 56, 3–48.
Poirier, D. J. (1978). A curious relationship between probit and logit models. Southern Economic Journal, 44, 640–641.
Ripley, B. D. (1987). Stochastic simulation. New York: John Wiley & Sons.
Ritter, C., & Tanner, M. A. (1992). Facilitating the Gibbs sampler: The Gibbs stopper and the Griddy-Gibbs sampler. Journal of the American Statistical Association, 87, 861–868.
Robert, C. P. (1995). Simulation of truncated normal variables. Statistics and Computing, 5, 121–125.
Stern, S. (1992). A method for smoothing simulated moments of discrete probabilities in multinomial probit models. Econometrica, 60, 943–952.
Stern, S. (1997). Simulation-based estimation. Journal of Economic Literature, 35, 2006–2039.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–549.
Tierney, L. (1994). Markov chains for exploring posterior distributions. Annals of Statistics, 22, 1701–1761.
Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.

THE PANEL PROBIT MODEL: ADAPTIVE INTEGRATION ON SPARSE GRIDS

Florian Heiss

ABSTRACT

In empirical research, panel (and multinomial) probit models are leading examples for the use of maximum simulated likelihood estimators. The Geweke–Hajivassiliou–Keane (GHK) simulator is the most widely used technique for this type of problem. This chapter suggests an algorithm that is based on GHK but uses an adaptive version of sparse-grids integration (SGI) instead of simulation. It is adaptive in the sense that it uses an automated change-of-variables to make the integration problem numerically better behaved along the lines of efficient importance sampling (EIS) and adaptive univariate quadrature. The resulting integral is approximated using SGI that generalizes Gaussian quadrature in a way such that the computational costs do not grow exponentially with the number of dimensions. Monte Carlo experiments show an impressive performance compared to the original GHK algorithm, especially in difficult cases such as models with high intertemporal correlations.

Maximum Simulated Likelihood Methods and Applications
Advances in Econometrics, Volume 26, 41–64
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)0000026006

1. INTRODUCTION

Panel probit models are widely used specifications for modeling discrete (binary) dependent variables in panel data settings. Multinomial probit models are popular for modeling multinomial outcomes with flexible substitution patterns. Both types of models suffer from the fact that the implied outcome probabilities are not analytically tractable, which impedes estimation. Consequently, a lot of research has been devoted to the simulation-based estimation of these types of models. For a textbook discussion, see Greene (2008) and Train (2009).

Hajivassiliou, McFadden, and Ruud (1996) provide a comprehensive survey of different approaches and Monte Carlo simulations to compare their relative performance. They and other studies found the Geweke–Hajivassiliou–Keane (GHK) simulator to be most powerful among the different alternatives. As a result, it is by far the most used approach for estimating multinomial and panel probit models and is routinely implemented in software packages like LIMDEP and Stata. However, computational costs required to obtain accurate simulation estimators can be substantial depending on the nature of the problem. For example, Geweke, Keane, and Runkle (1997) and Lee (1997) find that the simulation estimators can have poor properties with strong serial correlation and long panels.

One commonly used general method of improving properties of simulation estimators is the use of deterministic sequences with certain properties instead of random numbers from a (pseudo-) random number generator. An especially successful sequence of numbers are Halton sequences, see Bhat (2001). Another alternative to simulation is deterministic numerical integration. The problem of the most widely known method of multivariate numerical integration is that the computational costs grow exponentially with the number of dimensions, making it infeasible for higher dimensions and impractical for an intermediate number of dimensions such as 4 or 5. A much less common approach is integration on sparse grids, see Heiss and Winschel (2008). It is easy to implement, and its computational costs only rise polynomially with the number of dimensions. In the setting of maximum simulated likelihood estimation, Heiss and Winschel (2008) apply it very successfully to problems with up to 20 dimensions.

Importance sampling has long been known to provide an opportunity to reformulate simulation problems. The term "efficient importance sampling" (EIS) has been used for an algorithm that aims at an optimal reformulation in order to minimize simulation errors, see Richard and Zhang (2007). A related approach is called adaptive numerical integration, see Rabe-Hesketh, Skrondal, and Pickles (2005). It reformulates the integration problem in a similar fashion as EIS and is typically used for univariate integration problems.

This chapter combines these different aspects of the problem of approximating panel (and/or multivariate) probit probabilities and compares the relative performance in a simulation study. Section 2 introduces the class of models and notation. Section 3 discusses the various dimensions of the approximation problem. Section 4 presents the design and results of a simulation study and Section 5 summarizes and concludes.

2. THE PANEL PROBIT MODEL

We are concerned with a panel data model for a binary dependent variable y_it, with i denoting cross-sectional units and t = 1, ..., T a time indicator. It is driven by a set of strictly exogenous regressors x_it and an unobserved random variable ε̃_it. Dropping the individual subscripts for notational convenience, we make the parametric assumption that

y_t = 1 if ε̃_t ≤ x_t β, and y_t = −1 otherwise    (1)

where β is a vector of parameters to be estimated. Assume that the error terms ε̃_1 through ε̃_T are independent of the covariates and jointly normally distributed with means of zero and a general variance–covariance matrix. Its elements are parameters to be estimated subject to identifying normalizations and potentially other constraints.

Let y = [y_1, y_2, ..., y_T] denote the vector of dependent random variables and d = [d_1, d_2, ..., d_T] a corresponding vector of observed outcomes. For the estimation, for example, by maximum likelihood, the joint outcome probability

P := Pr[y = d | x]    (2)

needs to be evaluated. Define v_t := (1/σ_t)(d_t x_t β) and e_t := (1/σ_t)(d_t ε̃_t), with σ_t denoting the standard deviation of ε̃_t. Then, e_t is a normally distributed random variable with mean 0, and the vector e = [e_1, ..., e_T] has variance–covariance matrix Σ, where the variance of e_t is equal to unity and Cov(e_t, e_s) = (d_t d_s / σ_t σ_s) Cov(ε̃_t, ε̃_s). This allows the convenient expression for the joint outcome probability:

P = Pr[e_1 ≤ v_1, e_2 ≤ v_2, ..., e_T ≤ v_T | v] = Pr[e ≤ v | v]    (3)


This is the cumulative distribution function (CDF) of the joint normal distribution, which does not have a closed-form expression and needs to be approximated.

While this chapter is concerned with this type of panel data model for binary outcomes, multinomial probit models for cross-sectional and panel data lead to a very similar structure and can be expressed in the same form as Eq. (3), with an appropriately defined jointly normally distributed vector of random variables e and observed values v. The discussion of how to approximate P is therefore the same for this class of models. To see this, consider the cross-sectional case. Each individual faces the choice between T outcomes. She chooses outcome j, denoted as y = j, if it delivers the highest utility, which is specified as a linear function of alternative-specific regressors plus an alternative-specific error term, x_j β + ε̃_j. Then

y = j  ⟺  ε̃_k − ε̃_j ≤ (x_j − x_k) β  ∀ k = 1, ..., T    (4)

With d denoting the observed outcome, define v_t := (x_d − x_t) β and e_t := ε̃_t − ε̃_d. The relevant outcome probability is

P := Pr[y = d | x] = Pr[e ≤ v | v]    (5)

where e and v are vectors of all random variables e_t and observed values v_t, respectively, leaving out the dth values for which this inequality trivially holds. The vector of random variables e is jointly normally distributed if the original error terms ε̃_j are. This is the same expression – the CDF of jointly normal random variables evaluated at a given vector – as Eq. (3). Similar normalization by the standard deviations can be applied.

3. SIMULATION AND APPROXIMATION OF THE OUTCOME PROBABILITIES

3.1. (Pseudo) Monte Carlo Integration

Simulation estimators are nowadays very popular and routinely used in empirical work. For general discussions, see Gourieroux and Monfort (1996), Hajivassiliou and Ruud (1994), and Train (2009). For the panel probit model, the conceptually simplest approach for the simulation of the joint outcome probability is the so-called crude frequency simulator (CFS). Write

P = ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} 1[e ≤ v] f_e(e) de_T ··· de_1 = ∫_{ℝ^T} g(z) φ(z) dz    (6)

with g(z) := 1[L_Σ z ≤ v]    (7)

with f_e denoting the probability density function (PDF) of the jointly normally distributed e = [e_1, ..., e_T] and 1[·] being the indicator function. The second line is the result of a simple change of variables, with L_Σ denoting the Cholesky decomposition of the variance–covariance matrix Σ and φ(z) being the joint PDF of T i.i.d. standard normal random variables. The CFS makes a number of draws z^1, ..., z^R from independent standard normals and approximates the joint outcome probability as P̃^CFS = (1/R) Σ_{r=1}^{R} 1[L_Σ z^r ≤ v]. The CFS will converge in probability to P as R → ∞, but for a finite R, this intuitive simulator has unfavorable properties compared to other simulators. For example, P̃^CFS and the implied likelihood are step functions of the parameters, which impedes numerical maximization. A smoothed version of this simulator avoids these problems and leads to a better approximation for a given finite number of draws R, see McFadden (1989).

Stern (1992) suggests a partially analytic (PA) simulator for which the error term is decomposed as e = e^I + e^D, where e^I is a vector of i.i.d. normally distributed random variables with some variance σ_I and e^D is another vector of jointly normally distributed random variables with variance–covariance matrix Σ_D = (Σ − σ_I I). This decomposition is in general possible due to special features of the joint normal distribution. Stern suggests using the first eigenvalue of Σ as σ_I. Now the joint outcome probability can be solved partially analytically. Write

P ¼

Z 1�1

� � �

Z 1�1

Pr½eI þ eD � vjv; eD� f eDðeDÞ deDT � � � de

D1

¼

Z 1�1

� � �

Z 1�1

Pr½eI � v� LRDzjv; z�/ðzÞ dzT � � � dz1 ð8Þ

where eD ¼ LRDz and LRD is the Cholesky matrix of the covariance matrixRD. Because of the independence of eI , the conditional probability inside ofthe integral has an analytic expression. Namely,

P ¼

ZRD

gðzÞ/ðzÞdz (9)

The Panel Probit Model: Adaptive Integration on Sparse Grids 45

Page 54: [William Greene] Maximum Simulated Likelihood Meth(BookZZ.org)

with gðzÞ ¼YTt¼1

Fvt � eDt

sI

� �(10)

Stern’s decomposition simulator now makes draws z1; . . . ; zR from thestandard normal distribution and then evaluates

~PPA¼

1

R

XRr¼1

Pr½eI � v� LRDzrjv; zr� (11)
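A minimal NumPy sketch of the decomposition simulator in Eqs. (9)-(11) is given below. The function name is illustrative; it uses the smallest eigenvalue of $\Sigma$ for $s_I$ (which keeps $\Sigma_D$ positive semi-definite) and treats $s_I$ as the variance of $e^I$, so the analytic part scales by $\sqrt{s_I}$.

```python
import numpy as np
from scipy.stats import norm

def stern_pa_simulator(v, Sigma, R=1_000, seed=None):
    """Stern's partially analytic simulator for P = Pr[e <= v], e ~ N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    T = len(v)
    s_I = np.linalg.eigvalsh(Sigma).min()                   # variance of the i.i.d. component e^I
    Sigma_D = Sigma - s_I * np.eye(T)                       # covariance of the correlated component e^D
    L_D = np.linalg.cholesky(Sigma_D + 1e-10 * np.eye(T))   # small ridge for numerical safety
    e_D = rng.standard_normal((R, T)) @ L_D.T               # draws of e^D = L_{Sigma_D} z^r
    probs = norm.cdf((v - e_D) / np.sqrt(s_I))              # analytic conditional probabilities
    return probs.prod(axis=1).mean()                        # Eq. (11)
```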

A comprehensive collection of these and many other simulators for the panel and the multinomial probit models can be found in Hajivassiliou et al. (1996). Their Monte Carlo study finds that, in general, the GHK simulator works most accurately for a given number of draws and computational burden, a result that has been confirmed by other studies as well. Therefore, the majority of empirical work makes use of the GHK simulator, and it is used by canned routines implemented in software packages like LIMDEP and Stata. It is based on work of Geweke (1989), Borsch-Supan and Hajivassiliou (1993), and Keane (1994).

While the GHK algorithm is often directly motivated by sampling from appropriate joint distributions, it will be useful below to describe it in terms of an integral that can be easily approximated, for example, by simulation. Let L denote the Cholesky factorization of the variance–covariance matrix $\Sigma$ and let $l_{t,s}$ denote its element in row t and column s. Then, the joint outcome probability can be written as¹

$$P = \int_{\mathbb{R}^D} g(z)\,\phi(z)\, dz \qquad (12)$$

$$\text{with } g(z) := \prod_{t=1}^{T} g_t(\Phi(z)) \qquad (13)$$

$$g_1(u) := \Phi\!\left(\frac{v_1}{l_{1,1}}\right) \qquad (14)$$

$$g_t(u) := \Phi\!\left(\frac{v_t - l_{t,1}h_1(u) - l_{t,2}h_2(u) - \cdots - l_{t,t-1}h_{t-1}(u)}{l_{t,t}}\right) \qquad (15)$$

$$\text{and } h_t(u) := \Phi^{-1}(u_t\, g_t(u)) \qquad (16)$$

The GHK simulator starts from a sample $z^1, \dots, z^R$ from the independent normal distribution or $u^1, \dots, u^R$ from the independent uniform distribution with $u^r_t = \Phi(z^r_t)$ for all t and r. For each draw r, the values $g_t(u^r)$ can be recursively computed: $g_1(u^r)$ actually does not depend on $u^r$ and is the same for all draws. The value of $g_2(u^r)$ depends on $g_1(u^r)$ through $h_1(u)$. In the next step, $g_3(u^r)$ depends on $g_1(u^r)$ and $g_2(u^r)$, both of which have been evaluated in the steps before. This is repeated until $g_T(u^r)$ is evaluated. The GHK simulator is then simply

$$\tilde{P}^{\mathrm{GHK}} = \frac{1}{R} \sum_{r=1}^{R} \prod_{t=1}^{T} g_t(u^r) \qquad (17)$$
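The recursion in Eqs. (13)-(17) translates almost line by line into code. The sketch below (illustrative function name; NumPy/SciPy assumed) approximates $P = \Pr[e \le v]$ for $e \sim N(0, \Sigma)$.

```python
import numpy as np
from scipy.stats import norm

def ghk_simulator(v, Sigma, R=1_000, seed=None):
    """GHK approximation of P = Pr[e <= v], e ~ N(0, Sigma), cf. Eqs. (12)-(17)."""
    rng = np.random.default_rng(seed)
    T = len(v)
    L = np.linalg.cholesky(Sigma)
    u = rng.uniform(size=(R, T))           # uniforms u^r_t = Phi(z^r_t)
    g = np.empty((R, T))                   # conditional probabilities g_t(u^r)
    h = np.empty((R, T))                   # truncated draws h_t(u^r)
    for t in range(T):
        upper = (v[t] - h[:, :t] @ L[t, :t]) / L[t, t]   # standardized bound, Eq. (15)
        g[:, t] = norm.cdf(upper)
        h[:, t] = norm.ppf(u[:, t] * g[:, t])            # Eq. (16)
    return g.prod(axis=1).mean()                          # Eq. (17)
```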

3.2. Quasi Monte Carlo (QMC)

The integrals in Eqs. (6), (9), and (12) are all written as expectations over independent standard normal random variables. Monte Carlo integration approximates them by making random draws from this distribution.²

A straightforward and often powerful way to improve the properties of simulation estimators is the quasi-Monte Carlo (QMC) approach. It can be used in combination with all simulation estimators presented above. Instead of random draws, QMC algorithms use deterministic sequences of numbers. These sequences share some properties with random numbers, but are specifically designed to provide a better performance in Monte Carlo simulations. Intuitively, QMC draws cover the support of the integral more evenly than random numbers. This leads to a more accurate approximation with a finite number of draws and can, depending on the nature of the integrand, also improve the rate of convergence.

One example of such sequences that has been found to work well for related problems is the Halton sequence (Bhat, 2001, 2003). For an intuitive introduction to Halton sequences, other QMC and related methods, see Train (2009). A more formal discussion is provided by Sloan and Wozniakowski (1998).
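The sketch below (function name mine; SciPy's qmc module assumed) generates scrambled Halton points on the unit cube and maps them to standard normal draws with the inverse CDF; these can be plugged into any of the simulators above in place of pseudo-random draws.

```python
import numpy as np
from scipy.stats import norm, qmc

def halton_normal_draws(R, D, seed=None):
    """R x D quasi-Monte Carlo draws: scrambled Halton points mapped to
    independent standard normals via the inverse normal CDF."""
    u = qmc.Halton(d=D, scramble=True, seed=seed).random(R)
    return norm.ppf(u)

# Usage: replace rng.standard_normal((R, T)) in the simulators above
# with halton_normal_draws(R, T).
```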

3.3. Multivariate Numerical Integration

For the approximation of univariate integrals for estimation purposes, Gaussian quadrature is a powerful alternative to simulation. This is well established at least since Butler and Moffit (1982). Gaussian quadrature rules prescribe a given number R of nodes $z^1, \dots, z^R$ and corresponding weights $w^1, \dots, w^R$ for a given class of integration problems. An important example is the expectation of a function of a normally distributed random variable, $\int_{-\infty}^{\infty} g(z)\,\phi(z)\, dz$, which can be approximated using the Gauss–Hermite quadrature rule. The approximated integral is simply $\sum_{r=1}^{R} g(z^r)\, w^r$.

While Monte Carlo integration chooses the nodes randomly and quasi-Monte Carlo spreads them evenly, Gaussian quadrature places them strategically. A Gaussian quadrature rule with R nodes and weights gives the exact value of the integral if $g(z)$ is a polynomial of order $2R-1$ or less. The result will not be exact if $g(z)$ is not a polynomial, but since smooth functions can be approximated well by polynomials of sufficient order, it often gives a remarkably precise approximation even with a relatively small value of R.
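A small worked example of the univariate rule (function name mine; NumPy's hermgauss assumed, which targets the weight $e^{-x^2}$ and therefore requires rescaling the nodes and weights for a standard normal expectation):

```python
import numpy as np

def gauss_hermite_expectation(g, R):
    """Approximate E[g(Z)] for Z ~ N(0, 1) with an R-node Gauss-Hermite rule."""
    x, w = np.polynomial.hermite.hermgauss(R)
    nodes = np.sqrt(2.0) * x            # rescale nodes for the N(0,1) density
    weights = w / np.sqrt(np.pi)        # rescale weights accordingly
    return np.sum(g(nodes) * weights)

# E[Z^4] = 3 is reproduced exactly, since z^4 has order 4 <= 2R - 1 for R = 3.
print(gauss_hermite_expectation(lambda z: z**4, 3))
```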

Gaussian quadrature in several dimensions is less straightforward. There are three approaches. The first, easiest, and best-known approach is the so-called (tensor) product integration rule. It sequentially nests univariate rules. For example, the expectation of a function of two independent normally distributed random variables, $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(z_1, z_2)\,\phi(z_1)\,\phi(z_2)\, dz_2\, dz_1$, can be approximated using the Gauss–Hermite nodes and weights as $\sum_{r=1}^{R}\sum_{s=1}^{R} g(z^r, z^s)\, w^r w^s$. The problem of this rule is the curse of dimensionality: if the integral is over D dimensions and the rule uses R nodes in each dimension, $g(\cdot)$ has to be evaluated $R^D$ times, which quickly results in prohibitive computational costs as D exceeds 4 or 5. These methods deliver the exact value of the integral if $g(\cdot)$ belongs to a rather peculiar class of polynomials: if the underlying univariate rule is exact for a polynomial order K, the maximal exponent ($\max[j_1, j_2, \dots, j_D]$) showing up in any of the monomials $z_1^{j_1} z_2^{j_2} \cdots z_D^{j_D}$ that make up the multivariate polynomial cannot exceed K.

The second approach to Gaussian quadrature in the multivariate setting is based on the observation that complete multivariate polynomials of a given total order are a more natural class of functions. They restrict the sum of exponents ($\sum_{d=1}^{D} j_d$) of the monomials instead of the maximum; see Judd (1998).³ Unlike univariate integration problems, multivariate Gaussian quadrature rules for complete polynomials are hard and often impossible to derive, very problem-specific, and therefore (if mentioned at all) considered impractical for applied work; see, for example, Geweke (1996). For a compendium of different such multivariate integration rules, see Cools (2003).

A third way to use Gaussian quadrature in several dimensions is sparse-grids integration (SGI). Heiss and Winschel (2008) suggest using this approach in the setting of econometric estimation and provide encouraging evidence for its performance. SGI combines univariate quadrature rules just as the product rule does and is therefore just as flexible and generic. But it does so in a more careful way, such that the resulting rule delivers exact results for polynomials of a given total order and the number of nodes rises only polynomially instead of exponentially. As an example, to achieve exactness for polynomials of order 5 or less in 5 and 20 dimensions, SGI requires 51 and 801 nodes, while the product rule requires $3^5 = 243$ and $3^{20} = 3,486,784,401$ nodes, respectively. Fig. 1 presents an example of how the nodes of a sparse grid are distributed compared to a full product rule.

The product rule uses the full grid of all combinations of nodes in all dimensions, as demonstrated in the two-dimensional case in Fig. 1(a). Univariate nodes and weights are combined to a sparse grid in a more complicated fashion. It consists of a combination of several subgrids, each of which has a different degree of accuracy in each dimension. If one of these subgrids is very fine in one dimension, it is very coarse in the other dimensions. This is analogous to the definition of polynomials of a given total order, which restricts the sum of the exponents: if one exponent in a monomial is very high, the others must be relatively low. The sparse grid now combines grids which are very fine in one and coarse in other dimensions with others which are moderately fine in each dimension.

The general strategy to extend univariate operators to multivariate problems in this way is based on the work of Smolyak (1963) and has generated a lot of interest in the field of numerical mathematics; see Bungartz and Griebel (2004). Heiss and Winschel (2008) discuss the use of SGI in econometrics and show the exact formulas and details on how the sparse grids are built. They also provide Matlab and Stata code for generating them, as well as a set of readily evaluated numbers for the most important cases, for download at http://sparse-grids.de.

Fig. 1. Product Rule vs. Sparse Grid: Example. (a) Product Rule; (b) Sparse Grid.


With those, the implementation is straightforward. The difference to simulation boils down to calculating a weighted sum instead of an average. Suppose you want to evaluate the expected value of some function $g(z_1, \dots, z_{20})$ of independent normals, such as in Eqs. (6), (8), and (12), with T = 20 and a level of polynomial exactness of 5. You obtain the $801 \times 20$ matrix of nodes $[z_1, \dots, z_{20}]$ and the $801 \times 1$ vector of weights w as a download or from the mentioned software. The approximated value is then equal to $\sum_{r=1}^{801} g(z^r_1, \dots, z^r_{20})\, w^r$.
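In code, this is just a weighted sum. The sketch below defines the generic evaluation step (function name mine); the nodes and weights themselves are assumed to come from the downloadable tables or the Matlab/Stata code mentioned above.

```python
import numpy as np

def sgi_expectation(g, nodes, weights):
    """Sparse-grid approximation of E[g(Z)], Z ~ N(0, I_D): a weighted sum over
    deterministic nodes instead of an average over random draws."""
    values = np.array([g(z) for z in nodes])
    return np.sum(values * weights)

# Usage (T = 20, polynomial exactness 5 as in the text): load the 801 x 20 node
# matrix and the 801-vector of weights obtained from http://sparse-grids.de,
# then call sgi_expectation(g, nodes, weights).
```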

3.4. Efficient Importance Sampling and Adaptive Integration

Remember that the integrals in Eqs. (6), (9), and (12) are all written in the form

$$P = \int_{\mathbb{R}^D} g(z)\,\phi(z)\, dz \qquad (18)$$

with $\phi(z)$ denoting the joint PDF of independent standard normally distributed random variables and different functions $g(z)$. In the case of the CFS in Eq. (6), $g(z)$ is an indicator function which is either zero or one depending on z. In the other two equations, $g(z)$ are smooth functions of z, which improves the properties of all approximation and simulation approaches.

Define some parametric function $k(z; \theta)$ which is bounded away from zero on $\mathbb{R}^D$. Quite obviously, the general integral can be rewritten as

$$P = \int_{\mathbb{R}^D} \frac{g(z)\,\phi(z)}{k(z;\theta)}\, k(z;\theta)\, dz \qquad (19)$$

Consider a Gaussian importance sampler $k(z;\theta) = |L^{-1}|\,\phi(L^{-1}(z - a))$, where the parameters θ define a vector (of means) a and a nonsingular lower triangular (Cholesky factor of a covariance) matrix L. Rewrite the integral as

$$P = \int_{\mathbb{R}^D} \frac{g(z)\,\phi(z)}{\phi(L^{-1}(z - a))}\, \phi(L^{-1}(z - a))\, dz \qquad (20)$$

$$= \int_{\mathbb{R}^D} h(x;\theta)\,\phi(x)\, dx \qquad (21)$$

$$\text{with } h(x;\theta) = |L|\, g(a + Lx)\, \frac{\phi(a + Lx)}{\phi(x)} \qquad (22)$$


Given any θ, this integral conforms to the general form (18) and can be numerically approximated by any of the methods discussed above.

While the actual value of the integral is by definition the same for any value of θ, the quality of the numerical approximation is in general affected by the choice of θ. The introduction of the parameters provides the freedom of choosing values which help to improve the approximation quality. To see this, consider the 'ideal case' in which there is some θ* for which $h(x; \theta^*) = c$ is constant over x, that is, $k(z; \theta)$ is proportional to $g(z)\phi(z)$. In this case, the integral is solved exactly and P = c.

While this ideal case is infeasible for nontrivial problems, a goal in choosing the parametric family of functions $k(z; \theta)$ and the parameter vector θ is to come close to this ideal in some sense. Such a change of variables approach is quite common for univariate Gaussian quadrature and is then often called 'adaptive quadrature,' a term used, for example, by Rabe-Hesketh et al. (2005). In the setting of simulation, an algorithm based on this approach is called 'efficient importance sampling' (EIS) by Richard and Zhang (2007).

There are different approaches for choosing the parameters θ such that a given parametric family of functions $k(z; \theta)$ is close to proportional to $g(z)\phi(z)$. Liu and Pierce (1994) and Pinheiro and Bates (1995) choose the parameters such that $k(z; \theta)$ and $g(z)\phi(z)$ have the same mode and the same curvature at the mode. Note that in this case, the one-point quadrature rule with R = 1 and the single node $z^1 = [0, \dots, 0]'$ corresponds to a Laplace approximation. Similarly, in the context of Bayesian analysis, Naylor and Smith (1988) and Rabe-Hesketh et al. (2005) choose the parameters such that $k(z; \theta)$ has the same means and variances as the (scaled) posterior distribution $g(z)\phi(z)$.

The EIS algorithm by Richard and Zhang (2007) attempts to minimize (an approximation of) the variance of the simulation noise of P. This approach has been successfully implemented mostly in high-dimensional time-series applications. Notably, Liesenfeld and Richard (2008) use it for a time-series probit example. Since it appears to be very powerful and at the same time computationally relatively straightforward with the Gaussian importance sampler, this strategy will be implemented for both the simulation and the numerical integration techniques below.

Adaptive integration on sparse grids (ASGI) combines SGI with an intelligent reformulation of the integration problem in the spirit of EIS. Remember that for the Gaussian $k(z; \theta)$, the goal is to choose θ such that $\phi(L^{-1}(z - a))$ is close to proportional to $g(z)\phi(z)$. Define the distance $e(z; \theta)$ as

$$e(z;\theta) = \ln\!\left(\frac{g(z)\,\phi(z)}{\phi(L^{-1}(z - a))}\right) + c \qquad (23)$$

$$= \ln g(z) - \frac{1}{2} z'z + \frac{1}{2}(z - a)'(LL')^{-1}(z - a) + c \qquad (24)$$

Richard and Zhang (2007) show that the IS variance can approximately be minimized by minimizing the variance of $e(z; \theta)$ over the appropriate (weighted) distribution

$$V(\theta) = \int_{\mathbb{R}^D} e^2(z;\theta)\, g(z)\,\phi(z)\, dz \qquad (25)$$

With $w^r = 1/R$ for simulation and $w^r$ equal to the integration weight for SGI, the approximated value is

$$\tilde{V}(\theta) = \sum_{r=1}^{R} e^2(z^r;\theta)\, g(z^r)\, w^r \qquad (26)$$

The minimization of this function is a linear weighted least squares problem with weights $g(z^r)\, w^r$. The symmetric matrix $B^{-1} := (LL')^{-1}$ contains elements $b_{jk} = b_{kj}$. Eq. (24) can now be expressed for the nodes $z^r$ as

$$\ln g(z^r) - \frac{1}{2} z^{r\prime} z^r = \underbrace{c - \frac{1}{2} a'B^{-1}a}_{\alpha} + \sum_{j=1}^{D}\sum_{k=1}^{D} \underbrace{\left(-\frac{1}{2} b_{jk}\right)}_{\beta_{jk}} z^r_j z^r_k + z^{r\prime}\underbrace{(B^{-1}a)}_{\gamma} + e(z^r; \theta)$$

The optimal values for B and a can therefore be found by a weighted linear least squares regression of the 'dependent variable' $\left(\ln g(z^r) - \frac{1}{2} z^{r\prime} z^r\right)$ on a constant (parameter α), z (parameters γ), and all squared and interacted values of z (parameters β). The 'observations' correspond to the set of random draws or integration nodes $\{z^1, \dots, z^R\}$. All 'estimated' elements of the matrix $B^{-1}$ can be recovered directly from β. The vector a follows as Bγ.
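The following sketch (illustrative function name; NumPy assumed, and the fitted $B$ is assumed positive definite) spells out this one-step fit: it regresses $\ln g(z^r) - \frac{1}{2} z^{r\prime}z^r$ on a constant, the nodes, and their squares and interactions, with weights $g(z^r)\,w^r$, and recovers a and the Cholesky factor of B.

```python
import numpy as np

def eis_step(z, w, log_g):
    """One adaptive rescaling (EIS) step, cf. Eqs. (24)-(26).

    z: (R, D) draws or integration nodes; w: (R,) weights (1/R for simulation,
    quadrature weights for SGI); log_g: (R,) values of log g(z^r)."""
    R, D = z.shape
    ju, ku = np.triu_indices(D)                      # index pairs j <= k
    X = np.column_stack([np.ones(R), z, z[:, ju] * z[:, ku]])
    y = log_g - 0.5 * np.sum(z ** 2, axis=1)         # 'dependent variable'
    sw = np.sqrt(np.exp(log_g) * w)                  # sqrt of the WLS weights g(z^r) w^r
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    gamma = coef[1:1 + D]                            # coefficients on z
    quad = coef[1 + D:]                              # coefficients on z_j z_k, j <= k
    Binv = np.zeros((D, D))
    Binv[ju, ku] = np.where(ju == ku, -2.0 * quad, -quad)   # recover b_jk from beta
    Binv = np.triu(Binv) + np.triu(Binv, 1).T               # symmetric B^{-1}
    B = np.linalg.inv(Binv)
    a = B @ gamma                                    # importance-sampling mean, a = B gamma
    L = np.linalg.cholesky(B)                        # Cholesky factor of the covariance
    return a, L
```

The returned a and L are plugged into Eq. (22), and the rescaled integral (21) is then evaluated with the same type of draws or nodes.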

Actually, Richard and Zhang (2007) propose to apply another importance sampling reformulation to Eq. (25) in order to obtain better estimates of the optimal B and a. The result is an iterative procedure. Similar to their findings, it does not seem to improve the results in the simulations reported below significantly, so the one-step estimates of B and a as described above are used.


4. THE PERFORMANCE OF DIFFERENT ALGORITHMS

In order to compare the performance of the various algorithms, they are applied to artificial data sets in this section. The data generating process is a panel probit model designed to be simple while providing the flexibility to assess the impact of all relevant model parameters. For all periods $t = 1, \dots, T$, assume that the dependent variable $y_{it}$ is generated as

$$y_{it} = 1[\beta_0 + \beta_1 x_{it} + u_{it} \ge 0] \qquad (27)$$

For all simulations, the regressor $x_{it}$ is an i.i.d. standard normal random variable and the parameters are $\beta_0 = \beta_1 = 1$. The error term is jointly normal with a structure which represents an i.i.d. standard normal error plus a stationary AR(1) process with a correlation of ρ and variance σ². These two parameters together drive the overall autocorrelation of $u_{it}$. So

$$u_{it} = v_{it} + e_{it} \qquad (28)$$

$$v_{it} = \rho\, v_{it-1} + w_{it} \qquad (29)$$

$$e_{it} \sim N(0, 1), \quad w_{it} \sim N(0, \sigma^2(1 - \rho^2)) \qquad (30)$$

The crucial parameters which drive the numerical difficulty are T, σ², and ρ; they will be varied in the simulations.
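A short sketch of this data generating process (function name and defaults mine):

```python
import numpy as np

def simulate_panel_probit(N, T, sigma2, rho, beta0=1.0, beta1=1.0, seed=None):
    """Simulate the DGP of Eqs. (27)-(30): panel probit with an i.i.d. standard
    normal error plus a stationary AR(1) component with variance sigma2 and
    autocorrelation rho."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((N, T))
    v = np.empty((N, T))                               # AR(1) component, Var(v_it) = sigma2
    v[:, 0] = rng.normal(0.0, np.sqrt(sigma2), N)
    for t in range(1, T):
        v[:, t] = rho * v[:, t - 1] + rng.normal(0.0, np.sqrt(sigma2 * (1 - rho ** 2)), N)
    u = v + rng.standard_normal((N, T))                # u_it = v_it + e_it
    y = (beta0 + beta1 * x + u >= 0).astype(int)       # Eq. (27)
    return y, x
```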

From this model, 1,000 samples are generated for different sets of parameters. The approximated results are then compared to true values.⁴

The implemented algorithms differ in four aspects discussed above:

- Formulation of the integration problem: Stern's decomposition/partially analytic simulator PA (9) vs. GHK (12)
- Integration approach: pseudo-MC (PMC) vs. quasi-MC (QMC, Halton sequences) vs. integration on sparse grids (SGI)
- Approximation of the original integration problem (RAW) vs. adaptive rescaling (EIS/ASGI). By default, just one iteration is used for the iterative updating of parameters.
- The accuracy level, corresponding to the number of random draws/deterministic nodes.

The appendix lists pseudo code that should help to clarify details of the implementation of the various algorithms.


The results are presented as graphs with the horizontal axis showing the number of replications (random draws/deterministic nodes) and the vertical axis showing the root mean squared approximation error (RMSE). The RMSE of all algorithms should converge to zero as the number of function evaluations increases. In terms of computational time, the algorithms are not equally expensive. In the implemented code, the run times for the PA algorithm and GHK are roughly comparable for the same number of replications. The same holds for PMC vs. QMC vs. SGI. The adaptive rescaling takes more time. The difference depends on the problem (T and the number of replications). In a representative setting with T = 10 and R = 1,000, the adaptive rescaling slows down computations by a factor of roughly 5.

Fig. 2 shows the results for Stern's PA approach without adaptive rescaling for the different integration approaches. In all cases, the PMC ('Simulation') algorithm is clearly dominated by the QMC ('Halton sequences') approach. For the relatively well-behaved models with small T and/or σ, SGI is the dominant algorithm and delivers very accurate results even with small numbers of function evaluations. If, however, the problem becomes very ill behaved with both high T and σ² (Fig. 2(d) and (f)), this algorithm does not perform well. The integrand is very close to zero on most of the support. The sparse grid does not seem to be able to capture the small areas with positive values well.

The remaining results presented are for the more interesting case of a high variance σ² = 5; other results can be requested from the author. Fig. 3 shows similar results, but now for the algorithms with adaptive rescaling in the sense of EIS/ASGI. The effect of this rescaling is impressive. In the figures with the RMSE axis scaled such that the previous results can be seen well (Fig. 3(a), (c), and (e)), the RMSE of the rescaling methods is in almost all cases hard to distinguish from zero. If the RMSE axis is scaled to show differences between the integration methods, the sparse grids algorithm (ASGI) dominates all others. Its advantage relative to EIS based on QMC using Halton sequences becomes smaller in less well-behaved problems, but never disappears.

The three graphs on the left of Fig. 4 compare Stern's PA approach with GHK. In line with the previous literature, such as Borsch-Supan and Hajivassiliou (1993), the results show that GHK clearly dominates Stern's PA approach for any given integration method and parameter setting. Among the integration approaches, sparse grids is again most effective in all cases. The three graphs on the right of Fig. 4 evaluate the effect of adaptive rescaling in the GHK setting. The GHK approach based on random draws and Halton sequences is much improved by adaptive rescaling/EIS.


Fig. 2. Integration Approach: Stern's PA Simulator without Rescaling. (a) T = 5, σ² = 0.2; (b) T = 5, σ² = 5; (c) T = 10, σ² = 0.2; (d) T = 10, σ² = 5; (e) T = 20, σ² = 0.2; (f) T = 20, σ² = 5.


Fig. 3. Adaptive Rescaling: Stern's PA Simulator with Rescaling (σ² = 5). (a) T = 5; (b) T = 5 (Zoomed y Scale); (c) T = 10; (d) T = 10 (Zoomed y Scale); (e) T = 20; (f) T = 20 (Zoomed y Scale).


Fig. 4. GHK vs. Stern's PA Simulator (σ² = 5). (a) T = 5; (b) T = 5 (Zoomed y Scale); (c) T = 10; (d) T = 10 (Zoomed y Scale); (e) T = 20; (f) T = 20 (Zoomed y Scale).


Combining the three potential improvements GHK, SGI, and adaptive rescaling does not help a lot compared to an algorithm which uses only two of them.

Finally, Fig. 5 investigates the effect of the parameters ρ and σ on the accuracy of the approximation. Higher values of ρ and σ increase the intertemporal correlation and make the integrand less well behaved; overall, the approximation errors increase. Comparing the algorithms, SGI dominates PMC and QMC in all cases. Adaptive rescaling improves the results, but the magnitude of this improvement depends on the difficulty of the problem and the efficiency of the integration algorithm. It clearly pays off in all cases for PMC and in most cases for QMC. For SGI, the adaptive rescaling has a sizeable effect mainly in the difficult cases of high ρ and/or σ.

5. SUMMARY

The panel probit model is a leading example of the use of maximum simulated likelihood estimators in applied econometric research. In the late 1980s and early 1990s, there was a lot of research on the best algorithm for simulation-based evaluation of the likelihood function for this kind of model and for the multinomial probit model, which essentially has the same structure. The bottom line from this research was that the GHK simulator delivers the best results among a long list of competitors in most settings. However, simulation studies have shown that in 'difficult' cases, such as long panels with high error term persistence, even the GHK simulator can perform poorly.

This chapter revisits the topic of approximating the outcome probabilities implied by this type of model. It starts from GHK and Stern's partially analytic simulator and changes them in two ways in order to improve the approximation performance: the use of deterministic SGI and an automated adaptive rescaling of the integral in the spirit of EIS.

The results using simulated data show that both approaches work impressively well. Not surprisingly, the GHK approach of formulating a numerical integration problem is more efficient than Stern's partially analytic (decomposition) approach. Another result from the literature which also clearly shows up in the results is that quasi-Monte Carlo simulation based on Halton draws is much more effective than simulation based on a (pseudo-) random number generator.

One new result is that SGI clearly outperforms the other integration approaches in almost all simulations. As a typical example, in the model with 10 longitudinal observations and a relatively high variance of the dependent error component of σ² = 5, the sparse grids results with R = 163 function evaluations are more accurate than the Halton results with R = 871 and the simulation results with R > 10,000 function evaluations.

Fig. 5. The Role of ρ and σ: GHK, T = 10. (a) σ = 1, ρ = 0.1; (b) σ = 5, ρ = 0.1; (c) σ = 1, ρ = 0.5; (d) σ = 5, ρ = 0.5; (e) σ = 1, ρ = 0.9; (f) σ = 5, ρ = 0.9.

Another result is that in almost all combinations of GHK vs. Stern's approach and the three different integration methods, adding an adaptive rescaling step in the spirit of EIS improves the results impressively for a given number of function evaluations or, put differently, allows one to save a lot of computational cost at a given level of required accuracy.

In summary, common computational practice in panel and multinomial probit models can be improved a lot using the methods presented in this chapter. The resulting decline in computational costs can be substantial in empirical work and can be used to invest in more careful specification testing, more complex models, the ability to use larger data sets, or the exploration of applications which are otherwise infeasible.

NOTES

1. It might seem odd that the integral is written in terms of standard normal instead of uniform random variables. The reason is that Eqs. (6) and (8) are also written in this way, and this will make a consistent discussion below easier.

2. Since computers are inherently unable to generate true random numbers, the numbers used by implemented simulation algorithms are often called pseudo random numbers.

3. A more familiar problem from local approximation may provide an intuition: consider a second-order Taylor approximation in two dimensions. It involves the terms $z_1$, $z_2$, $z_1 z_2$, $z_1^2$, and $z_2^2$, but not the terms $z_1^2 z_2$, $z_1 z_2^2$, or $z_1^2 z_2^2$. It is a complete polynomial of total order 2.

4. True joint outcome probabilities are of course unavailable. Instead, results from the GHK algorithm with 500,000 draws are taken as 'true' values.

REFERENCES

Bhat, C. R. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research B, 35, 677–693.
Bhat, C. R. (2003). Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transportation Research B, 37, 837–855.
Borsch-Supan, A., & Hajivassiliou, V. (1993). Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. Journal of Econometrics, 58, 347–368.
Bungartz, H. J., & Griebel, M. (2004). Sparse grids. Acta Numerica, 13, 147–269.
Butler, J. S., & Moffit, R. (1982). A computationally efficient quadrature procedure for the one-factor multinomial probit model. Econometrica, 50, 761–764.
Cools, R. (2003). An encyclopedia of cubature formulas. Journal of Complexity, 19, 445–453.
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57, 1317–1339.
Geweke, J. (1996). Monte Carlo simulation and numerical integration. In: H. M. Amman, D. A. Kendrick & J. Rust (Eds), Handbook of computational economics (Vol. 1, pp. 731–800). Amsterdam: Elsevier Science.
Geweke, J. F., Keane, M. P., & Runkle, D. E. (1997). Statistical inference in the multinomial multiperiod probit model. Journal of Econometrics, 80, 125–165.
Gourieroux, C., & Monfort, A. (1996). Simulation-based econometric methods. Oxford: Oxford University Press.
Greene, W. H. (2008). Econometric analysis (4th ed.). London: Prentice Hall.
Hajivassiliou, V., McFadden, D., & Ruud, P. (1996). Simulation of multivariate normal rectangle probabilities and their derivatives: Theoretical and computational results. Journal of Econometrics, 72, 85–134.
Hajivassiliou, V. A., & Ruud, P. A. (1994). Classical estimation methods for LDV models using simulation. In: R. F. Engle & D. L. McFadden (Eds), Handbook of econometrics (Vol. 4, pp. 2383–2441). New York: Elsevier.
Heiss, F., & Winschel, V. (2008). Likelihood approximation by numerical integration on sparse grids. Journal of Econometrics, 144, 62–80.
Judd, K. L. (1998). Numerical methods in economics. Cambridge, MA: MIT Press.
Keane, M. P. (1994). A computationally practical simulation estimator for panel data. Econometrica, 62, 95–116.
Lee, L. F. (1997). Simulated maximum likelihood estimation of dynamic discrete choice statistical models: Some Monte Carlo results. Journal of Econometrics, 82, 1–35.
Liesenfeld, R., & Richard, J. F. (2008). Improving MCMC, using efficient importance sampling. Computational Statistics & Data Analysis, 53, 272–288.
Liu, Q., & Pierce, D. A. (1994). A note on Gauss–Hermite quadrature. Biometrika, 81, 624–629.
McFadden, D. (1989). A method of simulated moments for estimation of discrete choice models without numerical integration. Econometrica, 57, 995–1026.
Naylor, J., & Smith, A. (1988). Econometric illustrations of novel numerical integration strategies for Bayesian inference. Journal of Econometrics, 38, 103–125.
Pinheiro, J. C., & Bates, D. M. (1995). Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12–35.
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128, 301–323.
Richard, J. F., & Zhang, W. (2007). Efficient high-dimensional importance sampling. Journal of Econometrics, 141, 1385–1411.
Sloan, I. H., & Wozniakowski, H. (1998). When are quasi-Monte Carlo algorithms efficient for high dimensional integrals? Journal of Complexity, 14, 1–33.
Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of functions. Soviet Mathematics Doklady, 4, 240–243.
Stern, S. (1992). A method for smoothing simulated moments of discrete probabilities in multinomial probit models. Econometrica, 60, 943–952.
Train, K. (2009). Discrete choice methods with simulation (2nd ed.). Cambridge: Cambridge University Press.


PSEUDO-CODE


A COMPARISON OF THE MAXIMUM SIMULATED LIKELIHOOD AND COMPOSITE MARGINAL LIKELIHOOD ESTIMATION APPROACHES IN THE CONTEXT OF THE MULTIVARIATE ORDERED-RESPONSE MODEL

Chandra R. Bhat, Cristiano Varin and Nazneen Ferdous

ABSTRACT

This chapter compares the performance of the maximum simulated likelihood (MSL) approach with the composite marginal likelihood (CML) approach in multivariate ordered-response situations. The ability of the two approaches to recover model parameters in simulated data sets is examined, as is the efficiency of estimated parameters and computational cost. Overall, the simulation results demonstrate the ability of the CML approach to recover the parameters very well in a 5–6 dimensional ordered-response choice model context. In addition, the CML recovers


parameters as well as the MSL estimation approach in the simulation contexts used in this study, while also doing so at a substantially reduced computational cost. Further, any reduction in the efficiency of the CML approach relative to the MSL approach is in the range of nonexistent to small. When taken together with its conceptual and implementation simplicity, the CML approach appears to be a promising approach for the estimation of not only the multivariate ordered-response model considered here, but also for other analytically intractable econometric models.

1. INTRODUCTION

Ordered-response model systems are used when analyzing ordinal discrete outcome data that may be considered as manifestations of an underlying scale that is endowed with a natural ordering. Examples include ratings data (of consumer products, bonds, credit evaluation, movies, etc.), or Likert-scale type attitudinal/opinion data (of air pollution levels, traffic congestion levels, school academic curriculum satisfaction levels, teacher evaluations, etc.), or grouped data (such as bracketed income data in surveys or discretized rainfall data), or count data (such as the number of trips made by a household, the number of episodes of physical activity pursued by an individual, and the number of cars owned by a household). In all of these situations, the observed outcome data may be considered as censored (or coarse) measurements of an underlying latent continuous random variable. The censoring mechanism is usually characterized as a partitioning or thresholding of the latent continuous variable into mutually exclusive (nonoverlapping) intervals. The reader is referred to McKelvey and Zavoina (1971) and Winship and Mare (1984) for some early expositions of the ordered-response model formulation, and Liu and Agresti (2005) for a survey of recent developments. The reader is also referred to a forthcoming book by Greene and Hensher (2010) for a comprehensive history and treatment of the ordered-response model structure. These recent reviews indicate the abundance of applications of the ordered-response model in the sociological, biological, marketing, and transportation sciences, and the list of applications only continues to grow rapidly.

While the applications of the ordered-response model are quite widespread, much of this work is confined to the analysis of a single outcome, with a sprinkling of applications associated with two and three correlated


ordered-response outcomes. Some very recent studies of two correlated ordered-response outcomes include Scotti (2006), Mitchell and Weale (2007), Scott and Axhausen (2006), and LaMondia and Bhat (2009).¹ The study by Scott and Kanaroglou (2002) represents an example of three correlated ordered-response outcomes. But the examination of more than two to three correlated outcomes is rare, mainly because the extension to an arbitrary number of correlated ordered-response outcomes entails, in the usual likelihood function approach, integration of dimensionality equal to the number of outcomes. On the other hand, there are many instances when interest may be centered on analyzing several ordered-response outcomes simultaneously, such as in the case of the number of episodes of each of several activities, or satisfaction levels associated with a related set of products/services, or multiple ratings measures regarding the state of health of an individual/organization (we will refer to such outcomes as cross-sectional multivariate ordered-response outcomes). There are also instances when the analyst may want to analyze time-series or panel data of ordered-response outcomes over time, and allow flexible forms of error correlations over these outcomes. For example, the focus of analysis may be to examine rainfall levels (measured in grouped categories) over time in each of several spatial regions, or individual stop-making behavior over multiple days in a week, or individual headache severity levels at different points in time (we will refer to such outcomes as panel multivariate ordered-response outcomes).

In the analysis of cross-sectional and panel ordered-response systems with more than three outcomes, the norm until very recently has been to apply numerical simulation techniques based on a maximum simulated likelihood (MSL) approach or a Bayesian inference approach. However, such simulation-based approaches become impractical in terms of computational time, or even infeasible, as the number of ordered-response outcomes increases. Even if feasible, the numerical simulation methods do get imprecise as the number of outcomes increases, leading to convergence problems during estimation. As a consequence, another approach that has seen some (though very limited) use recently is the composite marginal likelihood (CML) approach. This is an estimation technique that is gaining substantial attention in the statistics field, though there has been relatively little coverage of this method in econometrics and other fields. The CML method, which belongs to the more general class of composite likelihood function approaches, is based on forming a surrogate likelihood function that compounds much easier-to-compute, lower-dimensional, marginal likelihoods. The CML method is easy to implement and has the advantage


of reproducibility of results. Under usual regularity assumptions, the CML estimator is consistent and asymptotically normally distributed. The maximum CML estimator should lose some efficiency from a theoretical perspective relative to a full likelihood estimator, but this efficiency loss appears to be empirically minimal (see Zhao & Joe, 2005; Lele, 2006; Joe & Lee, 2009).² Besides, the simulation estimation methods for evaluating the analytically intractable likelihood function also lead to a loss in estimator efficiency.

The objective of this chapter is to introduce the CML inference approach to estimate general panel models of ordered response. We also compare the performance of the MSL approach with the CML approach in ordered-response situations when the MSL approach is feasible. We use simulated data sets with known underlying model parameters to evaluate the two estimation approaches. The ability of the two approaches to recover model parameters is examined, as is the sampling variance and the simulation variance of parameters in the MSL approach relative to the sampling variance in the CML approach. The computational costs of the two approaches are also presented.

The rest of this chapter is structured as follows. In the next section, we present the structures of the cross-sectional and panel multivariate ordered-response systems. Section 3 discusses the simulation estimation methods (with an emphasis on the MSL approach) and the CML estimation approach. Section 4 presents the experimental design for the simulation experiments, while Section 5 discusses the results. Section 6 concludes the chapter by highlighting the important findings.

2. THE MULTIVARIATE ORDERED-RESPONSE SYSTEM

2.1. The Cross-Sectional Multivariate Ordered-Response Probit (CMOP) Formulation

Let q be an index for individuals (q = 1, 2, …, Q, where Q denotes the total number of individuals in the data set), and let i be an index for the ordered-response variable (i = 1, 2, …, I, where I denotes the total number of ordered-response variables for each individual). Let the observed discrete (ordinal) level for individual q and variable i be $m_{qi}$ ($m_{qi}$ may take one of $K_i$ values; i.e., $m_{qi} \in \{1, 2, \dots, K_i\}$ for variable i). In the usual ordered-response


framework notation, we write the latent propensity ($y^*_{qi}$) for each ordered-response variable as a function of relevant covariates and relate this latent propensity to the observed discrete level $m_{qi}$ through threshold bounds (see McKelvey & Zavoina, 1975):

$$y^*_{qi} = \beta_i' x_{qi} + \epsilon_{qi}, \quad y_{qi} = m_{qi} \ \text{if} \ \psi_i^{m_{qi}-1} < y^*_{qi} < \psi_i^{m_{qi}}, \qquad (1)$$

where $x_{qi}$ is an $(L \times 1)$ vector of exogenous variables (not including a constant), $\beta_i$ is a corresponding $(L \times 1)$ vector of coefficients to be estimated, $\epsilon_{qi}$ is a standard normal error term, and $\psi_i^{m_{qi}}$ is the upper bound threshold for discrete level $m_{qi}$ of variable i ($\psi_i^0 < \psi_i^1 < \psi_i^2 < \cdots < \psi_i^{K_i-1} < \psi_i^{K_i}$; $\psi_i^0 = -\infty$, $\psi_i^{K_i} = +\infty$ for each variable i). The $\epsilon_{qi}$ terms are assumed independent and identical across individuals (for each and all i). For identification reasons, the variance of each $\epsilon_{qi}$ term is normalized to 1. However, we allow correlation in the $\epsilon_{qi}$ terms across variables i for each individual q. Specifically, we define $\epsilon_q = (\epsilon_{q1}, \epsilon_{q2}, \epsilon_{q3}, \dots, \epsilon_{qI})'$. Then, $\epsilon_q$ is multivariate normally distributed with a mean vector of zeros and a correlation matrix as follows:

$$\epsilon_q \sim N\left[\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},\ \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1I} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2I} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{I1} & \rho_{I2} & \rho_{I3} & \cdots & 1 \end{pmatrix}\right], \quad \text{or} \quad \epsilon_q \sim N[0, \Sigma] \qquad (2)$$

The off-diagonal terms of $\Sigma$ capture the error covariance across the underlying latent continuous variables; that is, they capture the effects of common unobserved factors influencing the underlying latent propensities. These are the so-called polychoric correlations between pairs of observed ordered-response variables. Of course, if all the correlation parameters (i.e., off-diagonal elements of $\Sigma$), which we will stack into a vertical vector $\Omega$, are identically zero, the model system in Eq. (1) collapses to independent ordered-response probit models for each variable. Note that the diagonal elements of $\Sigma$ are normalized to one for identification purposes.

The parameter vector (to be estimated) of the cross-sectional multivariate probit model is $\delta = (\beta_1', \beta_2', \dots, \beta_I', \psi_1', \psi_2', \dots, \psi_I', \Omega')'$, where $\psi_i = (\psi_i^1, \psi_i^2, \dots, \psi_i^{K_i-1})'$ for $i = 1, 2, \dots, I$. The likelihood function for individual q may be written as follows:

$$L_q(\delta) = \Pr(y_{q1} = m_{q1}, y_{q2} = m_{q2}, \dots, y_{qI} = m_{qI}) \qquad (3)$$

$$L_q(\delta) = \int_{v_1 = \psi_1^{m_{q1}-1} - \beta_1' x_{q1}}^{\psi_1^{m_{q1}} - \beta_1' x_{q1}} \int_{v_2 = \psi_2^{m_{q2}-1} - \beta_2' x_{q2}}^{\psi_2^{m_{q2}} - \beta_2' x_{q2}} \cdots \int_{v_I = \psi_I^{m_{qI}-1} - \beta_I' x_{qI}}^{\psi_I^{m_{qI}} - \beta_I' x_{qI}} \phi_I(v_1, v_2, \dots, v_I \mid \Omega)\, dv_1\, dv_2 \dots dv_I,$$

where $\phi_I$ is the standard multivariate normal density function of dimension I. The likelihood function above involves an I-dimensional integral for each individual q.

2.2. The Panel Multivariate Ordered-Response Probit (PMOP) Formulation

Let q be an index for individuals as earlier (q = 1, 2, …, Q), but let j now be an index for the jth observation (say at time $t_{qj}$) on individual q (j = 1, 2, …, J, where J denotes the total number of observations on individual q).³ Let the observed discrete (ordinal) level for individual q at the jth observation be $m_{qj}$ ($m_{qj}$ may take one of K values; i.e., $m_{qj} \in \{1, 2, \dots, K\}$). In the usual random-effects ordered-response framework notation, we write the latent variable ($y^*_{qj}$) as a function of relevant covariates as:

$$y^*_{qj} = \beta' x_{qj} + u_q + \epsilon_{qj}, \quad y_{qj} = m_{qj} \ \text{if} \ \psi^{m_{qj}-1} < y^*_{qj} < \psi^{m_{qj}}, \qquad (4)$$

where $x_{qj}$ is an $(L \times 1)$ vector of exogenous variables (not including a constant), β is a corresponding $(L \times 1)$ vector of coefficients to be estimated, $\epsilon_{qj}$ is a standard normal error term uncorrelated across observations j for individual q and also uncorrelated across individuals q, and $\psi^{m_{qj}}$ is the upper bound threshold for discrete level $m_{qj}$ ($\psi^0 < \psi^1 < \psi^2 < \cdots < \psi^{K-1} < \psi^K$; $\psi^0 = -\infty$, $\psi^K = +\infty$). The term $u_q$ represents an individual-specific random term, assumed to be normally distributed with mean zero and variance σ². The term $u_q$ is independent of $u_{q'}$ for $q \ne q'$. The net result of the specification above is that the joint distribution of the latent variables $(y^*_{q1}, y^*_{q2}, \dots, y^*_{qJ})$ for the qth subject is multivariate normal with standardized mean vector $(\beta' x_{q1}/m, \beta' x_{q2}/m, \dots, \beta' x_{qJ}/m)$ and a correlation matrix with constant nondiagonal entries $\sigma^2/m^2$, where $m = \sqrt{1 + \sigma^2}$.

The standard random-effects ordered-response model of Eq. (4) allows easy estimation, since one can write the probability of the sequence of


observed ordinal responses across the multiple observations on the same individual, conditional on $u_q$, as the product of standard ordered-response model probabilities, and then integrate the resulting probability over the range of normally distributed $u_q$ values for each individual. This results in only a one-dimensional integral for each individual, which can be easily computed using numerical quadrature methods. However, the assumption of equal correlation across the multiple observations on the same individual is questionable, especially for medium-to-long individual-specific series. An alternative would be to allow serial correlation within each subject-specific series of observations, as proposed by Varin and Czado (2010). For instance, one may adopt an autoregressive structure of order one for the error terms of the same individual so that $\mathrm{corr}(\epsilon_{qj}, \epsilon_{qk}) = \rho^{|t_{qj} - t_{qk}|}$, where $t_{qj}$ is the measurement time of observation $y_{qj}$.⁴ The autoregressive error structure specification results in a joint multivariate distribution of the latent variables $(y^*_{q1}, y^*_{q2}, \dots, y^*_{qJ})$ for the qth individual with standardized mean vector $(\beta' x_{q1}/m, \beta' x_{q2}/m, \dots, \beta' x_{qJ}/m)$ and a correlation matrix $\Sigma_q$ with entries such that $\mathrm{corr}(y^*_{qj}, y^*_{qg}) = (\sigma^2 + \rho^{|t_{qj} - t_{qg}|})/m^2$, where $m = \sqrt{1 + \sigma^2}$. The cost of the flexibility is paid dearly, though, in terms of computational difficulty in the likelihood estimator. Specifically, rather than a single dimension of integration, we now have an integral of dimension J for individual q. The parameter vector (to be estimated) of the panel multivariate probit model is $\delta = (\beta', \psi^1, \psi^2, \dots, \psi^{K-1}, \sigma, \rho)'$, and the likelihood for individual q becomes:

$$L_q(\delta) = \Pr(y_{q1} = m_{q1}, y_{q2} = m_{q2}, \dots, y_{qJ} = m_{qJ})$$

$$L_q(\delta) = \int_{v_1 = a^{m_{q1}-1}}^{a^{m_{q1}}} \int_{v_2 = a^{m_{q2}-1}}^{a^{m_{q2}}} \cdots \int_{v_J = a^{m_{qJ}-1}}^{a^{m_{qJ}}} \phi_J(v_1, v_2, \dots, v_J \mid \Sigma_q)\, dv_1\, dv_2 \dots dv_J \qquad (5)$$

where $a^{m_{qj}} = (\psi^{m_{qj}} - \beta' x_{qj})/m$. The likelihood function above entails a J-dimensional integral for each individual q. The above model is labeled a mixed autoregressive ordinal probit model by Varin and Czado (2010).
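For concreteness, the implied latent correlation matrix $\Sigma_q$ can be assembled as in the short sketch below (illustrative function name; NumPy assumed).

```python
import numpy as np

def pmop_correlation(times, sigma2, rho):
    """Correlation matrix of (y*_q1, ..., y*_qJ) implied by the random effect plus
    AR(1) error: entries (sigma2 + rho^|t_qj - t_qg|) / (1 + sigma2)."""
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])          # |t_qj - t_qg| for all pairs
    R = (sigma2 + rho ** lags) / (1.0 + sigma2)
    np.fill_diagonal(R, 1.0)                         # unit diagonal
    return R
```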

3. OVERVIEW OF ESTIMATION APPROACHES

As indicated in Section 1, models that require integration of more than three dimensions in a multivariate ordered-response model are typically estimated using simulation approaches, though some recent studies have considered a CML approach. Sections 3.1 and 3.2 provide an overview of each of these two approaches in turn.


3.1. Simulation Approaches

Two broad simulation approaches may be identified in the literature for multivariate ordered-response modeling. One is based on a frequentist approach, while the other is based on a Bayesian approach. We provide an overview of these two approaches in the next two sections (Sections 3.1.1 and 3.1.2), and then (in Section 3.1.3) discuss the specific simulation approaches used in this chapter for estimation of the multivariate ordered-response model systems.

3.1.1. The Frequentist Approach

In the context of a frequentist approach, Bhat and Srinivasan (2005) suggested a MSL method for evaluating the multi-dimensional integral in a cross-sectional multivariate ordered-response model system, using quasi-Monte Carlo simulation methods proposed by Bhat (2001, 2003). In their approach, Bhat and Srinivasan (BS) partition the overall error term into one component that is independent across dimensions and another mixing component that generates the correlation across dimensions. The estimation proceeds by conditioning on the error components that cause correlation effects, writing the resulting conditional joint probability of the observed ordinal levels across the many dimensions for each individual, and then integrating out the mixing correlated error components. An important issue is to ensure that the covariance matrix of the mixing error terms remains in a correlation form (for identification reasons) and is positive-definite, which BS maintain by writing the likelihood function in terms of the elements of the Cholesky-decomposed matrix of the correlation matrix of the mixing normally distributed elements and parameterizing the diagonal elements of the Cholesky matrix to guarantee unit values along the diagonal. Another alternative and related MSL method would be to consider the correlation across error terms directly without partitioning the error terms into two components. This corresponds to the formulation in Eqs. (1) and (2) of this chapter. Balia and Jones (2008) adopt such a formulation in their eight-dimensional multivariate probit model of lifestyles, morbidity, and mortality. They estimate their model using a Geweke–Hajivassiliou–Keane (GHK) simulator (the GHK simulator is discussed in more detail later in this chapter). However, it is not clear how they accommodated the identification sufficiency condition that the covariance matrix be a correlation matrix and be positive-definite. But one can use the GHK simulator combined with BS's approach to ensure unit elements along the diagonal of the covariance matrix. Yet another MSL method to


approximate the multivariate rectangular (i.e., truncated) normal probabilities in the likelihood functions of Eqs. (3) and (5) is based on the Genz–Bretz (GB) algorithm (also discussed in more detail later). In concept, all these MSL methods can be extended to any number of correlated ordered-response outcomes, but numerical stability, convergence, and precision problems start surfacing as the number of dimensions increases.

3.1.2. The Bayesian Approach

Chen and Dey (2000), Herriges, Phaneuf, and Tobias (2008), Jeliazkov, Graves, and Kutzbach (2008), and Hasegawa (2010) have considered an alternate estimation approach for the multivariate ordered-response system based on the posterior mode in an objective Bayesian approach. As in the frequentist case, a particular challenge in the Bayesian approach is to ensure that the covariance matrix of the parameters is in a correlation form, which is a sufficient condition for identification. Chen and Dey proposed a reparameterization technique that involves a rescaling of the latent variables for each ordered-response variable by the reciprocal of the largest unknown threshold. Such an approach leads to an unrestricted covariance matrix of the rescaled latent variables, allowing for the use of standard Markov Chain Monte Carlo (MCMC) techniques for estimation. In particular, the Bayesian approach is based on assuming prior distributions on the nonthreshold parameters, reparameterizing the threshold parameters, imposing a standard conjugate prior on the reparameterized version of the error covariance matrix and a flat prior on the transformed threshold, obtaining an augmented posterior density using Bayes' theorem for the reparameterized model, and fitting the model using a MCMC method. Unfortunately, the method remains cumbersome, requires extensive simulation, and is time-consuming. Further, convergence assessment becomes difficult as the number of dimensions increases. For example, Muller and Czado (2005) used a Bayesian approach for their panel multivariate ordered-response model, and found that the standard MCMC method exhibits bad convergence properties. They proposed a more sophisticated group move multigrid MCMC technique, but this only adds to the already cumbersome nature of the simulation approach. In this regard, both the MSL and the Bayesian approach are 'brute force' simulation techniques that are not very straightforward to implement and can create convergence assessment problems.

3.1.3. Simulators Used in this Chapter

In this chapter, we use the frequentist approach to compare simulation approaches with the CML approach. Frequentist approaches are widely


used in the literature, and are included in several software programs that are readily available. Within the frequentist approach, we test two MSL methods against the CML approach, just to have a comparison of more than one MSL method with the CML approach. The two MSL methods we select are among the most effective simulators for evaluating multivariate normal probabilities. Specifically, we consider the GHK simulator for the CMOP model estimation in Eq. (3), and the GB simulator for the PMOP model estimation in Eq. (5).

3.1.3.1. The Geweke–Hajivassiliou–Keane Probability Simulator for the CMOP Model. The GHK is perhaps the most widely used probability simulator for integration of the multivariate normal density function, and is particularly well known in the context of the estimation of the multivariate unordered probit model. It is named after Geweke (1991), Hajivassiliou (Hajivassiliou & McFadden, 1998), and Keane (1990, 1994). Train (2003) provides an excellent and concise description of the GHK simulator in the context of the multivariate unordered probit model. In this chapter, we adapt the GHK simulator to the case of the multivariate ordered-response probit model.

The GHK simulator is based on directly approximating the probability of a multivariate rectangular region of the multivariate normal density distribution. To apply the simulator, we first write the likelihood function in Eq. (3) as follows:

$$L_q(\delta) = \Pr(y_{q1} = m_{q1})\, \Pr(y_{q2} = m_{q2} \mid y_{q1} = m_{q1})\, \Pr(y_{q3} = m_{q3} \mid y_{q1} = m_{q1}, y_{q2} = m_{q2}) \cdots \Pr(y_{qI} = m_{qI} \mid y_{q1} = m_{q1}, y_{q2} = m_{q2}, \dots, y_{qI-1} = m_{qI-1}) \qquad (6)$$

Also, write the error terms in Eq. (2) as:

$$\begin{pmatrix} \epsilon_{q1} \\ \epsilon_{q2} \\ \vdots \\ \epsilon_{qI} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{I1} & l_{I2} & l_{I3} & \cdots & l_{II} \end{pmatrix} \begin{pmatrix} v_{q1} \\ v_{q2} \\ \vdots \\ v_{qI} \end{pmatrix}, \qquad \epsilon_q = L v_q \qquad (7)$$

where L is the lower triangular Cholesky decomposition of the correlation matrix $\Sigma$, and the $v_q$ terms are independent and identically distributed standard normal deviates (i.e., $v_q \sim N[0, I_I]$). Each (unconditional/conditional)


probability term in Eq. (6) can be written as follows:

$$\Pr(y_{q1} = m_{q1}) = \Pr\!\left(\frac{\psi_1^{m_{q1}-1} - \beta_1' x_{q1}}{l_{11}} < v_{q1} < \frac{\psi_1^{m_{q1}} - \beta_1' x_{q1}}{l_{11}}\right)$$

$$\Pr(y_{q2} = m_{q2} \mid y_{q1} = m_{q1}) = \Pr\!\left(\frac{\psi_2^{m_{q2}-1} - \beta_2' x_{q2} - l_{21} v_{q1}}{l_{22}} < v_{q2} < \frac{\psi_2^{m_{q2}} - \beta_2' x_{q2} - l_{21} v_{q1}}{l_{22}} \;\middle|\; \frac{\psi_1^{m_{q1}-1} - \beta_1' x_{q1}}{l_{11}} < v_{q1} < \frac{\psi_1^{m_{q1}} - \beta_1' x_{q1}}{l_{11}}\right)$$

$$\Pr(y_{q3} = m_{q3} \mid y_{q1} = m_{q1}, y_{q2} = m_{q2}) = \Pr\!\left(\frac{\psi_3^{m_{q3}-1} - \beta_3' x_{q3} - l_{31} v_{q1} - l_{32} v_{q2}}{l_{33}} < v_{q3} < \frac{\psi_3^{m_{q3}} - \beta_3' x_{q3} - l_{31} v_{q1} - l_{32} v_{q2}}{l_{33}} \;\middle|\; \frac{\psi_1^{m_{q1}-1} - \beta_1' x_{q1}}{l_{11}} < v_{q1} < \frac{\psi_1^{m_{q1}} - \beta_1' x_{q1}}{l_{11}},\ \frac{\psi_2^{m_{q2}-1} - \beta_2' x_{q2} - l_{21} v_{q1}}{l_{22}} < v_{q2} < \frac{\psi_2^{m_{q2}} - \beta_2' x_{q2} - l_{21} v_{q1}}{l_{22}}\right)$$

$$\vdots$$

$$\Pr(y_{qI} = m_{qI} \mid y_{q1} = m_{q1}, y_{q2} = m_{q2}, \dots, y_{qI-1} = m_{qI-1}) = \Pr\!\left(\frac{\psi_I^{m_{qI}-1} - \beta_I' x_{qI} - l_{I1} v_{q1} - l_{I2} v_{q2} - \cdots - l_{I(I-1)} v_{q(I-1)}}{l_{II}} < v_{qI} < \frac{\psi_I^{m_{qI}} - \beta_I' x_{qI} - l_{I1} v_{q1} - l_{I2} v_{q2} - \cdots - l_{I(I-1)} v_{q(I-1)}}{l_{II}} \;\middle|\; \frac{\psi_1^{m_{q1}-1} - \beta_1' x_{q1}}{l_{11}} < v_{q1} < \frac{\psi_1^{m_{q1}} - \beta_1' x_{q1}}{l_{11}},\ \frac{\psi_2^{m_{q2}-1} - \beta_2' x_{q2} - l_{21} v_{q1}}{l_{22}} < v_{q2} < \frac{\psi_2^{m_{q2}} - \beta_2' x_{q2} - l_{21} v_{q1}}{l_{22}},\ \dots,\ \frac{\psi_{I-1}^{m_{q(I-1)}-1} - \beta_{I-1}' x_{q(I-1)} - l_{(I-1)1} v_{q1} - \cdots - l_{(I-1)(I-2)} v_{q(I-2)}}{l_{(I-1)(I-1)}} < v_{q(I-1)} < \frac{\psi_{I-1}^{m_{q(I-1)}} - \beta_{I-1}' x_{q(I-1)} - l_{(I-1)1} v_{q1} - \cdots - l_{(I-1)(I-2)} v_{q(I-2)}}{l_{(I-1)(I-1)}}\right) \qquad (8)$$

The error terms $v_{qi}$ are drawn d times (d = 1, 2, …, D) from the univariate standard normal distribution with the lower and upper bounds as above. To be precise, we use a randomized Halton draw procedure to generate the D realizations of $v_{qi}$, where we first generate standard Halton draw sequences


of size D� 1 for each individual for each dimension i (i¼ 1, 2,y, I), andthen randomly shift the D� 1 integration nodes using a random draw fromthe uniform distribution (see Bhat, 2001, 2003 for a detailed discussionof the use of Halton sequences for discrete choice models). These randomshifts are employed because we generate 10 different randomized Haltonsequences of size D� 1 to compute simulation error. Gauss codeimplementing the Halton draw procedure is available for download fromthe home page of Chandra Bhat at http://www.caee.utexas.edu/prof/bhat/halton.html. For each randomized Halton sequence, the uniform deviatesare translated to truncated draws from the normal distribution for vqithat respect the lower and upper truncation points (see, e.g., Train, 2003,p. 210). An unbiased estimator of the likelihood function for individual q isobtained as:

$$L_{\mathrm{GHK},q}(\delta) = \frac{1}{D} \sum_{d=1}^{D} L_q^d(\delta) \qquad (9)$$

where $L_q^d(\delta)$ is an estimate of Eq. (6) for simulation draw d. A consistent and asymptotically normally distributed GHK estimator $\hat{\delta}_{\mathrm{GHK}}$ is obtained by maximizing the logarithm of the simulated likelihood function $L_{\mathrm{GHK}}(\delta) = \prod_q L_{\mathrm{GHK},q}(\delta)$. The covariance matrix of the parameters is estimated using the inverse of the sandwich information matrix (i.e., using the robust asymptotic covariance matrix estimator associated with quasi-maximum likelihood (ML); see McFadden & Train, 2000).
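To make the mechanics concrete, the following minimal Python sketch evaluates the GHK simulator of Eqs. (6)–(9) for one individual. The function name and the inputs (thresholds, mu, m, L) are illustrative assumptions, not code from the chapter: thresholds[i] holds the cut-points of ordinal variable i padded with -inf/+inf, mu[i] is the systematic component $\beta_i' x_{qi}$, m[i] is the observed category, and L is the Cholesky factor of R. Plain pseudo-random uniforms are used here for brevity instead of the chapter's randomized Halton draws.

import numpy as np
from scipy.stats import norm

def ghk_probability(thresholds, mu, m, L, n_draws, rng):
    """One GHK evaluation of L_q(delta) in Eq. (6), averaged over n_draws as in Eq. (9)."""
    I = len(mu)
    prob = np.ones(n_draws)               # running product of the terms in Eq. (8)
    v = np.zeros((n_draws, I))            # truncated standard normal draws v_q1, ..., v_qI
    for i in range(I):
        partial = v[:, :i] @ L[i, :i]     # l_i1 v_q1 + ... + l_i,i-1 v_q,i-1
        lower = (thresholds[i][m[i]]     - mu[i] - partial) / L[i, i]
        upper = (thresholds[i][m[i] + 1] - mu[i] - partial) / L[i, i]
        p_lo, p_hi = norm.cdf(lower), norm.cdf(upper)
        prob *= (p_hi - p_lo)             # (un)conditional probability term
        u = rng.uniform(size=n_draws)     # draw v_qi from the truncated standard normal
        v[:, i] = norm.ppf(p_lo + u * (p_hi - p_lo))
    return prob.mean()                    # unbiased simulator of the individual likelihood

The simulated log-likelihood to be maximized over the parameter vector is then simply the sum of the logarithms of these individual probabilities.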

The likelihood function (and hence the log-likelihood function) mentioned above is parameterized with respect to the parameters of the Cholesky decomposition matrix L rather than the parameters of the original correlation matrix R. This ensures the positive-definiteness of R, but it also raises two new issues: (1) the parameters of the Cholesky matrix L should be such that R is a correlation matrix, and (2) the estimated parameter values (and asymptotic covariance matrix) do not correspond to R, but to L. The first issue is overcome by parameterizing the diagonal terms of L as shown below (see Bhat & Srinivasan, 2005):

$$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & \sqrt{1 - l_{21}^2} & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ l_{I1} & l_{I2} & l_{I3} & \cdots & \sqrt{1 - l_{I1}^2 - l_{I2}^2 - \cdots - l_{I(I-1)}^2} \end{bmatrix} \qquad (10)$$


The second issue is easily resolved by estimating R from the convergent values of the Cholesky decomposition parameters ($R = LL'$), and then running the parameter estimation procedure one more time with the likelihood function parameterized in terms of R.
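As a quick illustration of Eq. (10), the Python sketch below (a toy under stated assumptions, not the chapter's GAUSS code) builds L from the free below-diagonal entries so that each row has unit length, which forces $R = LL'$ to have a unit diagonal and hence be a correlation matrix. The vector of free parameters is a hypothetical input holding the below-diagonal elements row by row, and the construction implicitly requires each row's sum of squares to stay below one.

import numpy as np

def build_correlation_cholesky(free, I):
    L = np.zeros((I, I))
    idx = 0
    for i in range(I):
        row = free[idx:idx + i]
        idx += i
        L[i, :i] = row
        L[i, i] = np.sqrt(1.0 - np.dot(row, row))   # diagonal term as in Eq. (10)
    return L

free = np.array([0.30, 0.20, 0.25])                 # l21, l31, l32 for a 3-variable example
L = build_correlation_cholesky(free, 3)
R = L @ L.T                                          # implied correlation matrix (unit diagonal)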

3.1.3.2. The GB Simulator for the PMOP Model. An alternative simulation-based approximation of multivariate normal probabilities is provided by the GB algorithm (Genz & Bretz, 1999). At the first step, this method transforms the original hyper-rectangle integration region to an integral over a unit hypercube, as described in Genz (1992). The transformed integration region is filled in by randomized lattice rules using a number of points that depends on the integral dimension and the desired precision. Robust integration error bounds are then derived by means of additional shifts of the integration nodes in random directions (this is similar to the generation of randomized Halton sequences, as described in Bhat (2003), but with randomized lattice points rather than Halton points). The additional random shifts are employed to compute simulation errors using 10 sets of randomized lattice points for each individual. The interested reader is referred to Genz (2003) for details.

More recently, Genz's algorithm has been further developed by Genz and Bretz (2002). Fortran and Matlab code implementing the GB algorithm is available for download from the home page of Alan Genz at http://www.math.wsu.edu/faculty/genz/homepage. Furthermore, the Fortran code has been included in an R (R Development Core Team, 2009) package called mvtnorm, freely available from the repository http://cran.r-project.org/. For a brief description of the mvtnorm package, see Hothorn, Bretz, and Genz (2001) and Mi, Miwa, and Hothorn (2009). Technically, the algorithm allows for the computation of integrals of up to 1,000 dimensions. However, the computational cost of reliable integral approximations explodes as the integral dimension rises, making the use of this algorithm impractical for likelihood inference except in low dimensions.

In the PMOP model, a positive-definite correlation matrix Rq shouldresult as long as s40 and 0oro1. The GB approach implemented in the Rroutine is based on a check to ensure these conditions hold. If they donot hold (i.e., the Berndt, Hall, Hall, and Hausman (BHHH) algorithmimplemented in the R routine is trying to go outside the allowed parameterspace), the algorithm reduces the ‘‘Newton–Raphson step’’ by half size toreturn the search direction within the parameter space.
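A minimal sketch of this step-halving safeguard is shown below; the function and the admissibility test are hypothetical stand-ins for the chapter's R routine, written here in Python for illustration only.

import numpy as np

def damped_update(theta, step, is_admissible, max_halvings=30):
    """Halve a proposed Newton-type step until the update stays in the admissible region."""
    lam = 1.0
    for _ in range(max_halvings):
        candidate = theta + lam * step
        if is_admissible(candidate):
            return candidate
        lam *= 0.5                       # halve the "Newton-Raphson step"
    return theta                         # give up and stay at the current point

# Example: keep sigma = theta[0] > 0 and 0 < rho = theta[1] < 1
ok = lambda th: th[0] > 0.0 and 0.0 < th[1] < 1.0
theta_new = damped_update(np.array([0.5, 0.9]), np.array([-1.0, 0.3]), ok)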


3.2. The Composite Marginal Likelihood Technique – The Pairwise Marginal Likelihood Inference Approach

The CML estimation approach is a relatively simple approach that can be used when the full likelihood function is near impossible or plain infeasible to evaluate due to underlying complex dependencies. For instance, in a recent application, Varin and Czado (2010) examined the headache pain intensity of patients over several consecutive days. In this study, a full-information likelihood estimator would have entailed as many as 815 dimensions of integration to obtain individual-specific likelihood contributions, an infeasible proposition using computer-intensive simulation techniques. As importantly, the accuracy of simulation techniques is known to degrade rapidly at medium-to-high dimensions, and the simulation noise increases substantially, leading to convergence problems during estimation. In contrast, the CML method, which belongs to the more general class of composite likelihood approaches (see Lindsay, 1988), is based on forming a surrogate likelihood function that compounds much easier-to-compute, lower-dimensional marginal likelihoods. The CML approach can be applied using simple optimization software for likelihood estimation. It also represents a conceptually and pedagogically simpler simulation-free procedure relative to simulation techniques, and has the advantage of reproducibility of the results. Finally, as indicated by Varin and Vidoni (2009), it is possible that the "maximum CML estimator can be consistent when the ordinary full likelihood estimator is not". This is because the CML procedures are typically more robust and can represent the underlying low-dimensional process of interest more accurately than the low-dimensional process implied by an assumed (and imperfect) high-dimensional multivariate model.

The simplest CML, formed by assuming independence across the latent variables underlying the ordinal outcome variables (in the context of our chapter), entails the product of univariate probabilities for each variable. However, this approach does not provide estimates of the correlations that are of interest in a multivariate context. Another approach is the pairwise likelihood function formed by the product of likelihood contributions of all or a selected subset of couplets (i.e., pairs of variables or pairs of observations). Almost all earlier research efforts employing the CML technique have used the pairwise approach, including Apanasovich et al. (2008), Bellio and Varin (2005), de Leon (2005), Varin and Vidoni (2009), Varin, Host, and Skare (2005), and Engle, Shephard, and Sheppard (2007). Alternatively, the analyst can also consider larger subsets of observations, such as triplets, quadruplets, or even higher-dimensional subsets (see Engler, Mohapatra, Louis, & Betensky, 2006; Caragea & Smith, 2007). In general, the issue of whether to use pairwise likelihoods or higher-dimensional likelihoods remains an open, and under-researched, area. However, it is generally agreed that the pairwise approach provides a good balance between statistical and computational efficiency.

The properties of the CML estimator may be derived using the theory of estimating equations (see Cox & Reid, 2004). Specifically, under the usual regularity assumptions (Molenberghs & Verbeke, 2005, p. 191), the CML estimator is consistent and asymptotically normally distributed (this is because of the unbiasedness of the CML score function, which is a linear combination of proper score functions associated with the marginal event probabilities forming the composite likelihood).5 Of course, the maximum CML estimator loses some asymptotic efficiency from a theoretical perspective relative to a full likelihood estimator (Lindsay, 1988; Zhao & Joe, 2005). On the other hand, there is also a loss in asymptotic efficiency in the MSL estimator relative to a full likelihood estimator (see McFadden & Train, 2000). Given that the full likelihood estimator has to be approximated using simulation techniques in a multivariate ordered-response system of dimensionality greater than three, it is of interest to compare the MSL and CML estimators in terms of asymptotic efficiency.

Earlier applications of the CML approach (and specifically the pairwise likelihood approach) to multivariate ordered-response systems include de Leon (2005) and Ferdous, Eluru, Bhat, and Meloni (2010) in the context of CMOP systems, and Varin and Vidoni (2006) and Varin and Czado (2010) in the context of panel multivariate ordered-response probit (PMOP) systems. Bhat, Sener, and Eluru (2010) also use a CML approach to estimate their multivariate ordered-response probit system in the context of a spatially dependent ordered-response outcome variable. In this study, we do not use the high multivariate dimensionality of most of these earlier studies. Rather, we consider simulation situations of relatively lower multivariate dimensionality, so that we are able to estimate the models using MSL techniques as well.


3.2.1. Pairwise Likelihood Approach for the CMOP Model
The pairwise marginal likelihood function for individual q may be written for the CMOP model as follows:

$$\begin{aligned}
L_{\mathrm{CML},q}^{\mathrm{CMOP}}(\delta) &= \prod_{i=1}^{I-1} \prod_{g=i+1}^{I} \Pr(y_{qi} = m_{qi},\, y_{qg} = m_{qg}) \\
&= \prod_{i=1}^{I-1} \prod_{g=i+1}^{I} \left[ \begin{aligned} &\Phi_2\!\left(\theta_i^{m_{qi}} - \beta_i' x_{qi},\; \theta_g^{m_{qg}} - \beta_g' x_{qg};\; \rho_{ig}\right) - \Phi_2\!\left(\theta_i^{m_{qi}} - \beta_i' x_{qi},\; \theta_g^{m_{qg}-1} - \beta_g' x_{qg};\; \rho_{ig}\right) \\ &- \Phi_2\!\left(\theta_i^{m_{qi}-1} - \beta_i' x_{qi},\; \theta_g^{m_{qg}} - \beta_g' x_{qg};\; \rho_{ig}\right) + \Phi_2\!\left(\theta_i^{m_{qi}-1} - \beta_i' x_{qi},\; \theta_g^{m_{qg}-1} - \beta_g' x_{qg};\; \rho_{ig}\right) \end{aligned} \right]
\end{aligned} \qquad (11)$$

where $\Phi_2(\cdot,\cdot;\rho_{ig})$ is the standard bivariate normal cumulative distribution function with correlation $\rho_{ig}$. The pairwise marginal likelihood function is $L_{\mathrm{CML}}^{\mathrm{CMOP}}(\delta) = \prod_q L_{\mathrm{CML},q}^{\mathrm{CMOP}}(\delta)$.
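The following Python sketch evaluates one individual's contribution to Eq. (11). The inputs reuse the same hypothetical structure as the GHK sketch earlier (thresholds[i] padded with -inf/+inf, mu[i] = $\beta_i' x_{qi}$, observed categories m[i]) together with a matrix rho of pairwise correlations; the bivariate CDF $\Phi_2$ is obtained from scipy. This is a sketch under these assumptions, not the chapter's implementation.

import numpy as np
from scipy.stats import multivariate_normal

def bvn_cdf(a, b, r):
    """Phi_2(a, b; r): standard bivariate normal CDF with correlation r."""
    a, b = np.clip([a, b], -37.0, 37.0)   # keep bounds finite if thresholds are padded with +/- inf
    return multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]]).cdf([a, b])

def pairwise_loglik_q(thresholds, mu, m, rho):
    I = len(mu)
    ll = 0.0
    for i in range(I - 1):
        for g in range(i + 1, I):
            ai_lo = thresholds[i][m[i]]     - mu[i]
            ai_hi = thresholds[i][m[i] + 1] - mu[i]
            ag_lo = thresholds[g][m[g]]     - mu[g]
            ag_hi = thresholds[g][m[g] + 1] - mu[g]
            r = rho[i][g]
            p = (bvn_cdf(ai_hi, ag_hi, r) - bvn_cdf(ai_hi, ag_lo, r)
                 - bvn_cdf(ai_lo, ag_hi, r) + bvn_cdf(ai_lo, ag_lo, r))   # rectangle probability, Eq. (11)
            ll += np.log(max(p, 1e-300))    # guard against numerical underflow
    return ll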

The pairwise estimator $\hat{\delta}_{\mathrm{CML}}$ obtained by maximizing the logarithm of the pairwise marginal likelihood function with respect to the vector δ is consistent and asymptotically normally distributed with asymptotic mean δ and covariance matrix given by the inverse of Godambe's (1960) sandwich information matrix $G(\delta)$ (see Zhao & Joe, 2005):

$$V_{\mathrm{CML}}(\delta) = [G(\delta)]^{-1} = [H(\delta)]^{-1} J(\delta) [H(\delta)]^{-1}, \quad \text{where} \quad H(\delta) = E\!\left[ -\frac{\partial^2 \log L_{\mathrm{CML}}^{\mathrm{CMOP}}(\delta)}{\partial \delta\, \partial \delta'} \right] \;\text{and}\; J(\delta) = E\!\left[ \left( \frac{\partial \log L_{\mathrm{CML}}^{\mathrm{CMOP}}(\delta)}{\partial \delta} \right) \left( \frac{\partial \log L_{\mathrm{CML}}^{\mathrm{CMOP}}(\delta)}{\partial \delta'} \right) \right] \qquad (12)$$

$H(\delta)$ and $J(\delta)$ can be estimated in a straightforward manner at the CML estimate $\hat{\delta}_{\mathrm{CML}}$:

$$\begin{aligned}
\hat{H}(\delta) &= -\sum_{q=1}^{Q} \left[ \frac{\partial^2 \log L_{\mathrm{CML},q}^{\mathrm{CMOP}}(\delta)}{\partial \delta\, \partial \delta'} \right]_{\hat{\delta}} = -\sum_{q=1}^{Q} \sum_{i=1}^{I-1} \sum_{g=i+1}^{I} \left[ \frac{\partial^2 \log \Pr(y_{qi} = m_{qi},\, y_{qg} = m_{qg})}{\partial \delta\, \partial \delta'} \right]_{\hat{\delta}}, \;\text{and} \\
\hat{J}(\delta) &= \sum_{q=1}^{Q} \left[ \left( \frac{\partial \log L_{\mathrm{CML},q}^{\mathrm{CMOP}}(\delta)}{\partial \delta} \right) \left( \frac{\partial \log L_{\mathrm{CML},q}^{\mathrm{CMOP}}(\delta)}{\partial \delta'} \right) \right]_{\hat{\delta}}
\end{aligned} \qquad (13)$$
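A rough sketch of Eqs. (12)–(13) based on numerical derivatives is given below. Here loglik_q(delta, q) is a hypothetical function returning $\log L_{\mathrm{CML},q}(\delta)$ for individual q; in practice, analytic first derivatives (and, where feasible, analytic second derivatives) would be preferred to the central differences used for illustration.

import numpy as np

def godambe_covariance(loglik_q, delta_hat, Q, eps=1e-5):
    K = len(delta_hat)

    def score_q(delta, q):
        g = np.zeros(K)
        for k in range(K):                          # central-difference score for individual q
            dp, dm = delta.copy(), delta.copy()
            dp[k] += eps
            dm[k] -= eps
            g[k] = (loglik_q(dp, q) - loglik_q(dm, q)) / (2.0 * eps)
        return g

    scores = np.array([score_q(delta_hat, q) for q in range(Q)])
    J = scores.T @ scores                           # Eq. (13): sum of outer products of scores

    def total_score(delta):
        return sum(score_q(delta, q) for q in range(Q))

    H = np.zeros((K, K))
    for k in range(K):                              # H = -d(total score)/d delta'
        dp, dm = delta_hat.copy(), delta_hat.copy()
        dp[k] += eps
        dm[k] -= eps
        H[:, k] = -(total_score(dp) - total_score(dm)) / (2.0 * eps)
    H = 0.5 * (H + H.T)                             # symmetrize away numerical noise
    Hinv = np.linalg.inv(H)
    return Hinv @ J @ Hinv                          # V_CML = H^{-1} J H^{-1}, Eq. (12)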


In general, and as confirmed later in the simulation study, we expect that the ability to recover and pin down the parameters will be a little more difficult for the correlation parameters in R (when the correlations are low) than for the slope and threshold parameters, because the correlation parameters enter the likelihood function more nonlinearly.

3.2.2. Pairwise Likelihood Approach for the PMOP Model
The pairwise marginal likelihood function for individual q may be written for the PMOP model as follows:

$$\begin{aligned}
L_{\mathrm{CML},q}^{\mathrm{PMOP}}(\delta) &= \prod_{j=1}^{J-1} \prod_{g=j+1}^{J} \Pr(y_{qj} = m_{qj},\, y_{qg} = m_{qg}) \\
&= \prod_{j=1}^{J-1} \prod_{g=j+1}^{J} \left[ \Phi_2\!\left(a^{m_{qj}}, a^{m_{qg}}; \rho_{jg}\right) - \Phi_2\!\left(a^{m_{qj}}, a^{m_{qg}-1}; \rho_{jg}\right) - \Phi_2\!\left(a^{m_{qj}-1}, a^{m_{qg}}; \rho_{jg}\right) + \Phi_2\!\left(a^{m_{qj}-1}, a^{m_{qg}-1}; \rho_{jg}\right) \right]
\end{aligned} \qquad (14)$$

where $a^{m_{qj}} = (\theta^{m_{qj}} - \beta' x_{qj})/m$, $m = \sqrt{1 + \sigma^2}$, and $\rho_{jg} = (\sigma^2 + \rho^{|t_{qj} - t_{qg}|})/m^2$. The pairwise marginal likelihood function is $L_{\mathrm{CML}}^{\mathrm{PMOP}}(\delta) = \prod_q L_{\mathrm{CML},q}^{\mathrm{PMOP}}(\delta)$.

The pairwise estimator $\hat{\delta}_{\mathrm{CML}}$ obtained by maximizing the logarithm of the pairwise marginal likelihood function with respect to the vector δ is consistent and asymptotically normally distributed with asymptotic mean δ. The covariance matrix of the estimator may be computed in a fashion similar to that for the CMOP case, with $L_{\mathrm{CML},q}^{\mathrm{CMOP}}(\delta)$ replaced by $L_{\mathrm{CML},q}^{\mathrm{PMOP}}(\delta)$. As in the CMOP case, we expect that the ability to recover and pin down the parameters will be a little more difficult for the correlation parameter ρ (when ρ is low) than for the slope and threshold parameters.
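For concreteness, the sketch below assembles the ingredients of Eq. (14) (the a bounds and the pairwise correlations $\rho_{jg}$) from hypothetical inputs: observation times t, common thresholds theta padded with -inf/+inf, linear indices bx[j] = $\beta' x_{qj}$, observed categories m[j], and the parameters sigma and rho. The rectangle combination of bivariate CDFs is then identical to the CMOP sketch shown earlier.

import numpy as np

def pmop_pair_terms(theta, bx, m, t, sigma, rho):
    denom = np.sqrt(1.0 + sigma ** 2)                        # the scalar m in the text
    a_lo = (np.array([theta[mj]     for mj in m]) - bx) / denom   # a^{m_qj - 1}
    a_hi = (np.array([theta[mj + 1] for mj in m]) - bx) / denom   # a^{m_qj}
    J = len(m)
    pair_corr = {}
    for j in range(J - 1):
        for g in range(j + 1, J):
            pair_corr[(j, g)] = (sigma ** 2 + rho ** abs(t[j] - t[g])) / denom ** 2
    return a_lo, a_hi, pair_corr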

3.2.3. Positive-Definiteness of the Implied Multivariate Correlation Matrix
A point that we have not discussed thus far in the CML approach is how to ensure the positive-definiteness of the symmetric correlation matrix R (in the CMOP model) and $R_q$ (in the PMOP model). This is particularly an issue for R in the CMOP model, so we discuss it mainly in the context of the CMOP model. Maintaining a positive-definite matrix for $R_q$ in the PMOP model is relatively easy, so we only briefly discuss the PMOP case toward the end of this section.

There are three ways that one can ensure the positive-definiteness of the R matrix. The first technique is to use Bhat and Srinivasan's technique of reparameterizing R through the Cholesky matrix, and then using these Cholesky-decomposed parameters as the ones to be estimated. Within the optimization procedure, one would then reconstruct the R matrix, and "pick off" the appropriate elements of this matrix for the $\rho_{ig}$ estimates at each iteration. This is probably the most straightforward and clean technique. The second technique is to undertake the estimation with a constrained optimization routine by requiring that the implied multivariate correlation matrix for any set of pairwise correlation estimates be positive-definite. However, such a constrained routine can be extremely cumbersome. The third technique is to use an unconstrained optimization routine, but check for positive-definiteness of the implied multivariate correlation matrix. The easiest method within this third technique is to allow the estimation to proceed without checking for positive-definiteness at intermediate iterations, but to check that the implied multivariate correlation matrix at the final converged pairwise marginal likelihood estimates is positive-definite. This will typically work for the case of a multivariate ordered-response model if one specifies exclusion restrictions (i.e., zero correlations between some error terms) or correlation patterns that involve a lower number of effective parameters (such as in the PMOP model in this chapter). Also, the number of correlation parameters in the full multivariate matrix explodes quickly as the dimensionality of the matrix increases, and estimating all these parameters becomes almost impossible (with any estimation technique) with the sample sizes usually available in practice. So, imposing exclusion restrictions is good econometric practice. However, if the above simple method of allowing the pairwise marginal estimation approach to proceed without checking for positive-definiteness at intermediate iterations does not work, then one can check the implied multivariate correlation matrix for positive-definiteness at each and every iteration. If the matrix is not positive-definite during a direction search at a given iteration, one can construct a "nearest" valid correlation matrix (see Ferdous et al., 2010 for a discussion); one simple construction is sketched below.
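One simple way to construct such a "nearest" valid correlation matrix (not necessarily the construction used by Ferdous et al., 2010) is to clip negative eigenvalues and rescale the diagonal back to one, as in the Python sketch below.

import numpy as np

def nearest_correlation(R, floor=1e-6):
    vals, vecs = np.linalg.eigh((R + R.T) / 2.0)   # symmetrize, then eigen-decompose
    vals = np.clip(vals, floor, None)              # force all eigenvalues to be positive
    A = vecs @ np.diag(vals) @ vecs.T
    d = np.sqrt(np.diag(A))
    return A / np.outer(d, d)                      # rescale so the diagonal is exactly one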

In the CMOP CML analysis of this chapter, we used an unconstrained optimization routine and ensured that the implied multivariate correlation matrix at convergence was positive-definite. In the PMOP CML analysis of this chapter, we again employed an unconstrained optimization routine, in combination with the following reparameterizations: $\rho = 1/[1 + \exp(-c)]$ and $\sigma = \exp(p)$. These reparameterizations guarantee $\sigma > 0$ and $0 < \rho < 1$, and therefore the positive-definiteness of the $R_q$ multivariate correlation matrix. Once estimated, the c and p estimates were translated back to estimates of ρ and σ.
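For reference, the reparameterization and its inverse are, in a short Python sketch (the inverse transform is standard algebra, not something stated in the chapter):

import numpy as np

def to_rho_sigma(c, p):
    return 1.0 / (1.0 + np.exp(-c)), np.exp(p)      # guarantees 0 < rho < 1 and sigma > 0

def to_c_p(rho, sigma):
    return np.log(rho / (1.0 - rho)), np.log(sigma) # map estimates back to the working parameters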


4. EXPERIMENTAL DESIGN

4.1. The CMOP Model

To compare and evaluate the performance of the GHK and the CML estimation techniques, we undertake a simulation exercise for a multivariate ordered-response system with five ordinal variables. Further, to examine the potential impact of different correlation structures, we undertake the simulation exercise for one correlation structure with low correlations and another with high correlations. For each correlation structure, the experiment is carried out for 20 independent data sets with 1,000 data points each. Prespecified values of the δ vector are used to generate the samples in each data set.

In the set-up, we use three exogenous variables in the latent equation for the first, third, and fifth ordered-response variables, and four exogenous variables for the second and fourth ordered-response variables. The values of the exogenous variables are drawn from a standard univariate normal distribution. A fixed coefficient vector $\beta_i$ ($i = 1, 2, 3, 4, 5$) is assumed on the variables, and the linear combination $\beta_i' x_{qi}$ ($q = 1, 2, \ldots, Q$; $Q = 1{,}000$; $i = 1, 2, 3, 4, 5$) is computed for each individual q and variable i. Next, we generate Q five-variate realizations of the error term vector $(\epsilon_{q1}, \epsilon_{q2}, \epsilon_{q3}, \epsilon_{q4}, \epsilon_{q5})$ with a predefined positive-definite low error correlation structure ($R_{\mathrm{low}}$) and high error correlation structure ($R_{\mathrm{high}}$) as follows:

$$R_{\mathrm{low}} = \begin{bmatrix} 1 & .30 & .20 & .22 & .15 \\ .30 & 1 & .25 & .30 & .12 \\ .20 & .25 & 1 & .27 & .20 \\ .22 & .30 & .27 & 1 & .25 \\ .15 & .12 & .20 & .25 & 1 \end{bmatrix} \quad \text{and} \quad R_{\mathrm{high}} = \begin{bmatrix} 1 & .90 & .80 & .82 & .75 \\ .90 & 1 & .85 & .90 & .72 \\ .80 & .85 & 1 & .87 & .80 \\ .82 & .90 & .87 & 1 & .85 \\ .75 & .72 & .80 & .85 & 1 \end{bmatrix} \qquad (15)$$

The error term realization for each observation and each ordinal variable is then added to the systematic component $\beta_i' x_{qi}$ as in Eq. (1) and then translated to "observed" values of $y_{qi}$ (0, 1, 2, …) based on prespecified threshold values. We assume four outcome levels for the first and the fifth ordered-response variables, three for the second and the fourth ordered-response variables, and five for the third ordered-response variable. Correspondingly, we prespecify a vector of three threshold values [$\theta_i = (\theta_i^1, \theta_i^2, \theta_i^3)$, where $i = 1$ and 5] for the first and the fifth ordered-response equations, two for the second and the fourth equations [$\theta_i = (\theta_i^1, \theta_i^2)$, where $i = 2$ and 4], and four for the third ordered-response equation [$\theta_i = (\theta_i^1, \theta_i^2, \theta_i^3, \theta_i^4)$, where $i = 3$].
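The Python sketch below illustrates this data-generation step for one data set under the low correlation structure. The coefficient and threshold values are taken from the "true value" column of Table 1a; the random seed and code organization are arbitrary choices for illustration, not the chapter's GAUSS implementation.

import numpy as np

rng = np.random.default_rng(1)
Q = 1000
R_low = np.array([[1.00, 0.30, 0.20, 0.22, 0.15],
                  [0.30, 1.00, 0.25, 0.30, 0.12],
                  [0.20, 0.25, 1.00, 0.27, 0.20],
                  [0.22, 0.30, 0.27, 1.00, 0.25],
                  [0.15, 0.12, 0.20, 0.25, 1.00]])          # R_low from Eq. (15)
betas = [np.array([0.5, 1.0, 0.25]), np.array([0.75, 1.0, 0.5, 0.25]),
         np.array([0.25, 0.5, 0.75]), np.array([0.75, 0.25, 1.0, 0.3]),
         np.array([0.4, 1.0, 0.6])]                          # true slope vectors (Table 1a)
thresholds = [np.array([-1.0, 1.0, 3.0]), np.array([0.0, 2.0]),
              np.array([-2.0, -0.5, 1.0, 2.5]), np.array([1.0, 3.0]),
              np.array([-1.5, 0.5, 2.0])]                    # true thresholds (Table 1a)

eps = rng.multivariate_normal(np.zeros(5), R_low, size=Q)    # correlated error realizations
y = np.empty((Q, 5), dtype=int)
for i in range(5):
    x = rng.standard_normal((Q, len(betas[i])))              # standard normal exogenous variables
    ystar = x @ betas[i] + eps[:, i]                         # latent propensity
    y[:, i] = np.searchsorted(thresholds[i], ystar)          # observed ordinal outcome 0, 1, 2, ...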

As mentioned earlier, the above data generation process is undertaken 20 times with different realizations of the random error term to generate 20 different data sets. The CML estimation procedure is applied to each data set to estimate data-specific values of the δ vector. The GHK simulator is applied to each data set using 100 draws per individual of the randomized Halton sequence.6 In addition, to assess and quantify simulation variance, the GHK simulator is applied to each data set 10 times with different (independent) randomized Halton draw sequences. This allows us to estimate simulation error by computing the standard deviation of the estimated parameters among the 10 different GHK estimates on the same data set.

A few notes are in order here. We chose to use a setting with five ordinal variables so as to keep the computation time manageable for the MSL estimations (e.g., 10 ordinal variables would increase the computation time substantially, especially since more draws per individual may have to be used; note also that we have a total of 400 MSL estimation runs just for the five-ordinal-variable case in our experimental design). At the same time, a system of five ordinal variables leads to a large enough dimensionality of integration in the likelihood function that simulation estimation has to be used. Of course, one can examine the effect of varying the number of ordinal variables on the performance of the MSL and CML estimation approaches. In this chapter, we have chosen to focus on five dimensions, and to examine the effects of varying correlation patterns and different model formulations (corresponding to cross-sectional and panel settings). A comparison with higher numbers of ordinal variables is left as a future exercise. However, in general, it is well known that MSL estimation gets more imprecise as the dimensionality of integration increases. On the other hand, our experience with CML estimation is that its performance does not degrade very much as the number of ordinal variables increases (see Ferdous et al., 2010). Similarly, one can examine the effect of varying the number of draws for MSL estimation. Our choice of 100 draws per individual was based on experimentation with different numbers of draws for the first data set. We found little improvement in the ability to recover parameters or in simulation variance beyond 100 draws per individual for this data set, and thus settled on 100 draws per individual for all data sets (as will be noted in Section 5, the CMOP MSL estimation with 100 draws per individual indeed leads to negligible simulation variance). Finally, we chose to use three to four exogenous variables in our experimental design (rather than a single exogenous variable) so that the resulting simulation data sets would be closer to realistic ones where multiple exogenous variables are employed.

4.2. The PMOP Model

For the panel case, we consider six observations ($J = 6$) per individual, leading to a six-dimensional integral per individual for the full likelihood function. Note that the correlation matrix $R_q$ has entries such that $\mathrm{corr}(y^*_{qj}, y^*_{qg}) = (\sigma^2 + \rho^{|t_{qj} - t_{qg}|})/m^2$, where $m = \sqrt{1 + \sigma^2}$. Thus, in the PMOP case, $R_q$ is completely determined by the variance $\sigma^2$ of the individual-specific time-invariant random term $u_q$ and the single autoregressive correlation parameter ρ determining the correlation between the $\epsilon_{qj}$ and $\epsilon_{qk}$ terms: $\mathrm{corr}(\epsilon_{qj}, \epsilon_{qk}) = \rho^{|t_{qj} - t_{qk}|}$. To examine the impact of different magnitudes of the autoregressive correlation parameter, we undertake the simulation exercise for two different values of ρ: 0.3 and 0.7. For each correlation parameter, the experiment is carried out for 100 independent data sets with 200 data points (i.e., individuals).7 Prespecified values of the δ vector are used to generate the samples in each data set.

In the set-up, we use two exogenous variables in the latent equation. One is a binary time-constant variable ($x_{q1}$) simulated from a Bernoulli distribution with probability equal to 0.7, and the other ($x_{qj2}$) is a continuous time-varying variable generated from the autoregressive model shown below:

$$(x_{qj2} - 1) = 0.6\,(x_{q,j-1,2} - 1) + \gamma_{qj}, \qquad \gamma_{qj} \sim \text{iid } N(0, 0.2^2). \qquad (16)$$

A fixed coefficient vector β is assumed, with $\beta_1 = 1$ (coefficient on $x_{q1}$) and $\beta_2 = 1$ (coefficient on $x_{qj2}$). The linear combination $\beta' x_{qj}$ ($x_{qj} = (x_{q1}, x_{qj2})'$, $q = 1, 2, \ldots, 200$) is computed for each individual q's jth observation. Next, we generate independent time-invariant values of $u_q$ for each individual from a standard normal distribution (i.e., we assume $\sigma^2 = 1$), and latent serially correlated errors for each individual q as follows:

$$\epsilon_{qj} = \begin{cases} \eta_{q1}, & \eta_{q1} \sim \text{iid } N(0, 1), & \text{for } j = 1 \\ \rho\, \epsilon_{q,j-1} + \sqrt{1 - \rho^2}\, \eta_{qj}, & \eta_{qj} \sim \text{iid } N(0, 1), & \text{for } j \geq 2. \end{cases} \qquad (17)$$

The error term realizations for each individual's observations are then added to the systematic component $\beta' x_{qj}$ as in Eq. (4) and then translated to "observed" values of $y_{qj}$ based on the following prespecified threshold values: $\theta^1 = 1.5$, $\theta^2 = 2.5$, and $\theta^3 = 3.0$. The above data generation process is undertaken 100 times with different realizations of the random error terms $u_q$ and $\epsilon_{qj}$ to generate 100 different data sets. The CML estimation procedure is applied to each data set to estimate data-specific values of the δ vector. The GB simulator is applied to each data set 10 times with different (independent) random draw sequences. This allows us to estimate simulation error by computing the standard deviation of the estimated parameters among the 10 different GB estimates on the same data set. The algorithm is tuned with an absolute error tolerance of 0.001 for each six-dimensional integral forming the likelihood. The algorithm is adaptive in that it starts with few points and then increases the number of points per individual until the desired precision is obtained, subject to the constraint that the maximal number of draws is 25,000.
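The Python sketch below illustrates one PMOP data set for ρ = 0.3 under the design just described. The initialization of the autoregressive covariate around its mean of 1 and the assembly of the latent variable as $\beta' x_{qj} + u_q + \epsilon_{qj}$ are stated assumptions consistent with Eqs. (16)–(17) and the correlation structure above, not code from the chapter.

import numpy as np

rng = np.random.default_rng(7)
Q, J, rho = 200, 6, 0.3
b = np.array([1.0, 1.0])
thresholds = np.array([1.5, 2.5, 3.0])

x1 = rng.binomial(1, 0.7, size=Q)                        # binary time-constant covariate
x2 = np.empty((Q, J))
x2[:, 0] = 1.0 + rng.normal(0.0, 0.2, size=Q)            # assumed initialization of the AR(1) covariate
for j in range(1, J):
    x2[:, j] = 1.0 + 0.6 * (x2[:, j - 1] - 1.0) + rng.normal(0.0, 0.2, size=Q)   # Eq. (16)

u = rng.standard_normal(Q)                               # time-invariant heterogeneity, sigma^2 = 1
eps = np.empty((Q, J))
eps[:, 0] = rng.standard_normal(Q)
for j in range(1, J):
    eps[:, j] = rho * eps[:, j - 1] + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(Q)   # Eq. (17)

ystar = b[0] * x1[:, None] + b[1] * x2 + u[:, None] + eps   # latent variable
y = np.searchsorted(thresholds, ystar)                      # observed category 0, 1, 2, 3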

5. PERFORMANCE COMPARISON BETWEEN THE MSL AND CML APPROACHES

In this section, we first identify a number of performance measures and discuss how these are computed for the MSL approach (GHK for CMOP and GB for PMOP) and the CML approach. The subsequent sections present the simulation and computational results.

5.1. Performance Measures

The steps discussed below for computing performance measures are for a specific correlation matrix pattern. For the CMOP model, we consider two correlation matrix patterns, one with low correlations and another with high correlations. For the PMOP model, we consider two correlation patterns, corresponding to the autoregressive correlation parameter values of 0.3 and 0.7.

5.1.1. MSL Approach

(1) Estimate the MSL parameters for each data set s (s = 1, 2, …, 20 for CMOP and s = 1, 2, …, 100 for PMOP; i.e., S = 20 for CMOP and S = 100 for PMOP) and for each of 10 independent sets of draws, and obtain the time to get the convergent values and the standard errors (SE). Note the combinations for which convergence is not achieved. Everything below refers to cases where convergence is achieved. Obtain the mean time for convergence (TMSL) and the standard deviation of convergence time across the converged runs and across all data sets (the time to convergence includes the time to compute the covariance matrix of parameters and the corresponding parameter standard errors).

(2) For each data set s and draw combination, estimate the standard errors of the parameters (using the sandwich estimator).

(3) For each data set s, compute the mean estimate for each model parameter across the draws. Label this as MED, and then take the mean of the MED values across the data sets to obtain a mean estimate. Compute the absolute percentage bias (APB) as:

$$\mathrm{APB} = \left| \frac{\text{mean estimate} - \text{true value}}{\text{true value}} \right| \times 100$$

(4) Compute the standard deviation of the MED values across the data sets and label this as the finite sample standard error (essentially, this is the empirical standard error).

(5) For each data set s, compute the median standard error for each model parameter across the draws. Call this MSED, and then take the mean of the MSED values across the S data sets and label this as the asymptotic standard error (essentially, this is the standard error of the distribution of the estimator as the sample size gets large). Note that we compute the median standard error for each model parameter across the draws and label it as MSED rather than computing the mean standard error for each model parameter across the draws. This is because, for some draws, the estimated standard errors turned out to be rather large relative to other independent standard error estimates for the same data set. On closer inspection, this could be traced to the unreliability of the numeric Hessian used in the sandwich estimator computation. This is another bothersome issue with MSL – it is important to compute the covariance matrix using the sandwich estimator rather than using the inverse of the cross-product of the first derivatives (due to the simulation noise introduced when using a finite number of draws per individual in the MSL procedure; see McFadden & Train, 2000). Specifically, using the inverse of the cross-product of the first derivatives can substantially underestimate the covariance matrix. But coding the analytic Hessian (as part of computing the sandwich estimator) is extremely difficult, while using the numeric Hessian is very unreliable. Craig (2008) also alludes to this problem when he states that "(…) the randomness that is inherent in such methods [referring here to the GB algorithm, but applicable in general to MSL methods] is sometimes more than a minor nuisance." In particular, even when the log-likelihood function is computed with good precision so that the simulation error in the estimated parameters is very small, this is not always adequate to reliably compute the numerical Hessian. To do so, one will generally need to compute the log-likelihood with a substantial level of precision, which, however, would imply very high computational times even in low-dimensionality situations. Finally, note that the mean asymptotic standard error is a theoretical approximation to the finite sample standard error, since, in practice, one would estimate a model on only one data set from the field.

(6) Next, for each data set s, compute the simulation standard deviation for each parameter as the standard deviation in the estimated values across the independent draws (about the MED value). Call this standard deviation SIMMED. For each parameter, take the mean of SIMMED across the different data sets. Label this as the simulation standard error for each parameter.

(7) For each parameter, compute a simulation-adjusted standard error as follows: $\sqrt{(\text{asymptotic standard error})^2 + (\text{simulation standard error})^2}$. A compact sketch of the computations in steps (3)–(7) is given after this list.
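The Python sketch below summarizes steps (3)–(7), applied to a hypothetical array est[s, d, k] of MSL estimates (data set s, independent draw-set d, parameter k) and a matching array se[s, d, k] of estimated standard errors; the array layout and function name are assumptions for illustration.

import numpy as np

def msl_performance(est, se, true_values):
    med = est.mean(axis=1)                                    # MED: mean over draws, per data set
    mean_est = med.mean(axis=0)
    apb = 100.0 * np.abs((mean_est - true_values) / true_values)   # APB (undefined when true value is 0)
    finite_sample_se = med.std(axis=0, ddof=1)                # empirical (finite sample) standard error
    asymptotic_se = np.median(se, axis=1).mean(axis=0)        # mean over data sets of MSED
    sim_se = est.std(axis=1, ddof=1).mean(axis=0)             # mean over data sets of SIMMED
    sim_adjusted_se = np.sqrt(asymptotic_se ** 2 + sim_se ** 2)
    return apb, finite_sample_se, asymptotic_se, sim_se, sim_adjusted_se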

5.1.2. CML Approach

(1) Estimate the CML parameters for each data set s and obtain the time to get the convergent values (including the time to obtain the Godambe-matrix-based covariance matrix and corresponding standard errors). Determine the mean time for convergence (TCML) across the S data sets.8

(2) For each data set s, estimate the standard errors (using the Godambe estimator).

(3) Compute the mean estimate for each model parameter across the S data sets. Compute the APB as in the MSL case.

(4) Compute the standard deviation of the CML parameter estimates across the data sets and label this as the finite sample standard error (essentially, this is the empirical standard error).


5.2. Simulation Results

5.2.1. The CMOP Model
Table 1a presents the results for the CMOP model with low correlations, and Table 1b presents the corresponding results for the CMOP model with high correlations. The results indicate that both the MSL and CML approaches recover the parameters extremely well, as can be observed by comparing the mean estimates of the parameters with the true values (see the column titled "parameter estimates"). In the low correlation case, the APB ranges from 0.03 to 15.95% (overall mean value of 2.21% – see the last row of the table under the column titled "absolute percentage bias") across parameters for the MSL approach, and from 0.00 to 12.34% (overall mean value of 1.92%) across parameters for the CML approach. In the high correlation case, the APB ranges from 0.02 to 5.72% (overall mean value of 1.22%) across parameters for the MSL approach, and from 0.00 to 6.34% (overall mean value of 1.28%) across parameters for the CML approach. These are remarkably good measures of the ability to recover parameter estimates, and indicate that the MSL and CML perform about evenly in terms of bias. Further, the ability to recover parameters does not seem to be affected at all by whether there is low correlation or high correlation (in fact, the overall APB decreases from the low correlation case to the high correlation case). Interestingly, the APB values are generally much higher for the correlation (ρ) parameters than for the slope (β) and threshold (θ) parameters in the low correlation case, but the situation is exactly reversed in the high correlation case, where the APB values are generally higher for the slope (β) and threshold (θ) parameters compared to the correlation (ρ) parameters (for both the MSL and CML approaches). This is perhaps because the correlation parameters enter the likelihood function more nonlinearly than the slope and threshold parameters, and need to be particularly strong before they start having any substantial effect on the log-likelihood function value. Essentially, the log-likelihood function tends to be relatively flat at low correlations, leading to more difficulty in accurately recovering the low correlation parameters. But, at high correlations, the log-likelihood function shifts considerably in value with small shifts in the correlation values, allowing them to be recovered accurately.9

The standard error measures provide several important insights. First, the finite sample standard error and asymptotic standard error values are quite close to one another, with very little difference in the overall mean values of these two columns (see the last row of each table).


Table 1a. Evaluation of Ability to Recover "True" Parameters by the MSL and CML Approaches – With Low Error Correlation Structure.

Columns (left to right): Parameter; True value; MSL approach [mean estimate, absolute percentage bias, finite sample standard error, asymptotic standard error (MASE_MSL), simulation standard error, simulation-adjusted standard error (SASE_MSL)]; CML approach [mean estimate, absolute percentage bias, finite sample standard error, asymptotic standard error (MASE_CML)]; Relative efficiency [MASE_MSL/MASE_CML, SASE_MSL/MASE_CML].

Coefficients

b11 0.5000 0.5167 3.34% 0.0481 0.0399 0.0014 0.0399 0.5021 0.43% 0.0448 0.0395 1.0109 1.0116

b21 1.0000 1.0077 0.77% 0.0474 0.0492 0.0005 0.0492 1.0108 1.08% 0.0484 0.0482 1.0221 1.0222

b31 0.2500 0.2501 0.06% 0.0445 0.0416 0.0010 0.0416 0.2568 2.73% 0.0252 0.0380 1.0957 1.0961

b12 0.7500 0.7461 0.52% 0.0641 0.0501 0.0037 0.0503 0.7698 2.65% 0.0484 0.0487 1.0283 1.0311

b22 1.0000 0.9984 0.16% 0.0477 0.0550 0.0015 0.0550 0.9990 0.10% 0.0503 0.0544 1.0100 1.0104

b32 0.5000 0.4884 2.31% 0.0413 0.0433 0.0017 0.0434 0.5060 1.19% 0.0326 0.0455 0.9518 0.9526

b42 0.2500 0.2605 4.19% 0.0372 0.0432 0.0006 0.0432 0.2582 3.30% 0.0363 0.0426 1.0149 1.0150

b13 0.2500 0.2445 2.21% 0.0401 0.0346 0.0008 0.0346 0.2510 0.40% 0.0305 0.0342 1.0101 1.0104

b23 0.5000 0.4967 0.66% 0.0420 0.0357 0.0021 0.0358 0.5063 1.25% 0.0337 0.0364 0.9815 0.9833

b33 0.7500 0.7526 0.34% 0.0348 0.0386 0.0005 0.0386 0.7454 0.62% 0.0441 0.0389 0.9929 0.9930

b14 0.7500 0.7593 1.24% 0.0530 0.0583 0.0008 0.0583 0.7562 0.83% 0.0600 0.0573 1.0183 1.0184

b24 0.2500 0.2536 1.46% 0.0420 0.0486 0.0024 0.0487 0.2472 1.11% 0.0491 0.0483 1.0067 1.0079

b34 1.0000 0.9976 0.24% 0.0832 0.0652 0.0017 0.0652 1.0131 1.31% 0.0643 0.0633 1.0298 1.0301

b44 0.3000 0.2898 3.39% 0.0481 0.0508 0.0022 0.0508 0.3144 4.82% 0.0551 0.0498 1.0199 1.0208

b15 0.4000 0.3946 1.34% 0.0333 0.0382 0.0014 0.0382 0.4097 2.42% 0.0300 0.0380 1.0055 1.0061

b25 1.0000 0.9911 0.89% 0.0434 0.0475 0.0016 0.0475 0.9902 0.98% 0.0441 0.0458 1.0352 1.0358

b35 0.6000 0.5987 0.22% 0.0322 0.0402 0.0007 0.0402 0.5898 1.69% 0.0407 0.0404 0.9959 0.9961


Correlation coefficients

r12 0.3000 0.2857 4.76% 0.0496 0.0476 0.0020 0.0476 0.2977 0.77% 0.0591 0.0467 1.0174 1.0184

r13 0.2000 0.2013 0.66% 0.0477 0.0409 0.0019 0.0410 0.2091 4.56% 0.0318 0.0401 1.0220 1.0231

r14 0.2200 0.1919 12.76% 0.0535 0.0597 0.0035 0.0598 0.2313 5.13% 0.0636 0.0560 1.0664 1.0682

r15 0.1500 0.1739 15.95% 0.0388 0.0439 0.0040 0.0441 0.1439 4.05% 0.0419 0.0431 1.0198 1.0239

r23 0.2500 0.2414 3.46% 0.0546 0.0443 0.0040 0.0445 0.2523 0.92% 0.0408 0.0439 1.0092 1.0133

r24 0.3000 0.2960 1.34% 0.0619 0.0631 0.0047 0.0633 0.3013 0.45% 0.0736 0.0610 1.0342 1.0372

r25 0.1200 0.1117 6.94% 0.0676 0.0489 0.0044 0.0491 0.1348 12.34% 0.0581 0.0481 1.0154 1.0194

r34 0.2700 0.2737 1.37% 0.0488 0.0515 0.0029 0.0516 0.2584 4.28% 0.0580 0.0510 1.0094 1.0110

r35 0.2000 0.2052 2.62% 0.0434 0.0378 0.0022 0.0378 0.1936 3.22% 0.0438 0.0391 0.9662 0.9678

r45 0.2500 0.2419 3.25% 0.0465 0.0533 0.0075 0.0538 0.2570 2.78% 0.0455 0.0536 0.9937 1.0034

Threshold parameters

y11 −1.0000 −1.0172 1.72% 0.0587 0.0555 0.0007 0.0555 −1.0289 2.89% 0.0741 0.0561 0.9892 0.9893

y12 1.0000 0.9985 0.15% 0.0661 0.0554 0.0011 0.0554 1.0010 0.10% 0.0536 0.0551 1.0063 1.0065

y13 3.0000 2.9992 0.03% 0.0948 0.1285 0.0034 0.1285 2.9685 1.05% 0.1439 0.1250 1.0279 1.0282

y21 0.0000 −0.0172 – 0.0358 0.0481 0.0007 0.0481 −0.0015 – 0.0475 0.0493 0.9750 0.9751

y22 2.0000 1.9935 0.32% 0.0806 0.0831 0.0030 0.0831 2.0150 0.75% 0.0904 0.0850 0.9778 0.9784

y31 −2.0000 −2.0193 0.97% 0.0848 0.0781 0.0019 0.0781 −2.0238 1.19% 0.0892 0.0787 0.9920 0.9923

y32 −0.5000 −0.5173 3.47% 0.0464 0.0462 0.0005 0.0462 −0.4968 0.64% 0.0519 0.0465 0.9928 0.9928

y33 1.0000 0.9956 0.44% 0.0460 0.0516 0.0011 0.0516 1.0014 0.14% 0.0584 0.0523 0.9877 0.9879

y34 2.5000 2.4871 0.52% 0.0883 0.0981 0.0040 0.0982 2.5111 0.44% 0.0735 0.1002 0.9788 0.9796

y41 1.0000 0.9908 0.92% 0.0611 0.0615 0.0031 0.0616 1.0105 1.05% 0.0623 0.0625 0.9838 0.9851

y42 3.0000 3.0135 0.45% 0.1625 0.1395 0.0039 0.1396 2.9999 0.00% 0.1134 0.1347 1.0356 1.0360

y51 −1.5000 −1.5084 0.56% 0.0596 0.0651 0.0032 0.0652 −1.4805 1.30% 0.0821 0.0656 0.9925 0.9937

y52 0.5000 0.4925 1.50% 0.0504 0.0491 0.0017 0.0492 0.5072 1.44% 0.0380 0.0497 0.9897 0.9903

y53 2.0000 2.0201 1.01% 0.0899 0.0797 0.0017 0.0798 2.0049 0.24% 0.0722 0.0786 1.0151 1.0154

Overall mean value across parameters – 2.21% 0.0566 0.0564 0.0022 0.0564 – 1.92% 0.0562 0.0559 1.0080 1.0092


Table 1b. Evaluation of Ability to Recover "True" Parameters by the MSL and CML Approaches – With High Error Correlation Structure.

Columns (left to right): Parameter; True value; MSL approach [mean estimate, absolute percentage bias, finite sample standard error, asymptotic standard error (MASE_MSL), simulation standard error, simulation-adjusted standard error (SASE_MSL)]; CML approach [mean estimate, absolute percentage bias, finite sample standard error, asymptotic standard error (MASE_CML)]; Relative efficiency [MASE_MSL/MASE_CML, SASE_MSL/MASE_CML].

Coefficients

b11 0.5000 0.5063 1.27% 0.0300 0.0294 0.0020 0.0294 0.5027 0.54% 0.0292 0.0317 0.9274 0.9294

b21 1.0000 1.0089 0.89% 0.0410 0.0391 0.0026 0.0392 1.0087 0.87% 0.0479 0.0410 0.9538 0.9560

b31 0.2500 0.2571 2.85% 0.0215 0.0288 0.0017 0.0289 0.2489 0.42% 0.0251 0.0290 0.9943 0.9961

b12 0.7500 0.7596 1.27% 0.0495 0.0373 0.0028 0.0374 0.7699 2.65% 0.0396 0.0395 0.9451 0.9477

b22 1.0000 1.0184 1.84% 0.0439 0.0436 0.0036 0.0437 1.0295 2.95% 0.0497 0.0463 0.9419 0.9451

b32 0.5000 0.5009 0.17% 0.0343 0.0314 0.0023 0.0315 0.5220 4.39% 0.0282 0.0352 0.8931 0.8955

b42 0.2500 0.2524 0.96% 0.0284 0.0294 0.0021 0.0294 0.2658 6.34% 0.0263 0.0315 0.9318 0.9343

b13 0.2500 0.2473 1.08% 0.0244 0.0233 0.0015 0.0234 0.2605 4.18% 0.0269 0.0251 0.9274 0.9293

b23 0.5000 0.5084 1.67% 0.0273 0.0256 0.0020 0.0256 0.5100 2.01% 0.0300 0.0277 0.9221 0.9248

b33 0.7500 0.7498 0.02% 0.0302 0.0291 0.0019 0.0291 0.7572 0.96% 0.0365 0.0318 0.9150 0.9170

b14 0.7500 0.7508 0.11% 0.0416 0.0419 0.0039 0.0420 0.7707 2.75% 0.0452 0.0450 0.9302 0.9341

b24 0.2500 0.2407 3.70% 0.0311 0.0326 0.0033 0.0327 0.2480 0.80% 0.0234 0.0363 0.8977 0.9022

b34 1.0000 1.0160 1.60% 0.0483 0.0489 0.0041 0.0491 1.0000 0.00% 0.0360 0.0513 0.9532 0.9566

b44 0.3000 0.3172 5.72% 0.0481 0.0336 0.0028 0.0337 0.3049 1.62% 0.0423 0.0368 0.9133 0.9165

b15 0.4000 0.3899 2.54% 0.0279 0.0286 0.0026 0.0288 0.4036 0.90% 0.0274 0.0301 0.9516 0.9554

b25 1.0000 0.9875 1.25% 0.0365 0.0391 0.0036 0.0393 1.0008 0.08% 0.0452 0.0398 0.9821 0.9862

b35 0.6000 0.5923 1.28% 0.0309 0.0316 0.0030 0.0317 0.6027 0.45% 0.0332 0.0329 0.9607 0.9649


Correlation coefficients

r12 0.9000 0.8969 0.34% 0.0224 0.0177 0.0034 0.0180 0.9019 0.21% 0.0233 0.0183 0.9669 0.9845

r13 0.8000 0.8041 0.51% 0.0174 0.0201 0.0035 0.0204 0.8009 0.11% 0.0195 0.0203 0.9874 1.0023

r14 0.8200 0.8249 0.60% 0.0284 0.0265 0.0061 0.0272 0.8151 0.60% 0.0296 0.0297 0.8933 0.9165

r15 0.7500 0.7536 0.49% 0.0248 0.0243 0.0046 0.0247 0.7501 0.01% 0.0242 0.0251 0.9678 0.9849

r23 0.8500 0.8426 0.87% 0.0181 0.0190 0.0081 0.0207 0.8468 0.38% 0.0190 0.0198 0.9606 1.0438

r24 0.9000 0.8842 1.75% 0.0187 0.0231 0.0097 0.0251 0.9023 0.26% 0.0289 0.0244 0.9484 1.0284

r25 0.7200 0.7184 0.22% 0.0241 0.0280 0.0072 0.0289 0.7207 0.09% 0.0295 0.0301 0.9298 0.9600

r34 0.8700 0.8724 0.27% 0.0176 0.0197 0.0036 0.0200 0.8644 0.65% 0.0208 0.0220 0.8972 0.9124

r35 0.8000 0.7997 0.04% 0.0265 0.0191 0.0039 0.0195 0.7988 0.15% 0.0193 0.0198 0.9645 0.9848

r45 0.8500 0.8421 0.93% 0.0242 0.0231 0.0128 0.0264 0.8576 0.89% 0.0192 0.0252 0.9156 1.0480

Threshold parameters

y11 −1.0000 −1.0110 1.10% 0.0600 0.0520 0.0023 0.0520 −1.0322 3.22% 0.0731 0.0545 0.9538 0.9548

y12 1.0000 0.9907 0.93% 0.0551 0.0515 0.0022 0.0515 1.0118 1.18% 0.0514 0.0528 0.9757 0.9766

y13 3.0000 3.0213 0.71% 0.0819 0.1177 0.0065 0.1179 2.9862 0.46% 0.1185 0.1188 0.9906 0.9921

y21 0.0000 −0.0234 – 0.0376 0.0435 0.0028 0.0436 0.0010 – 0.0418 0.0455 0.9572 0.9592

y22 2.0000 2.0089 0.44% 0.0859 0.0781 0.0066 0.0784 2.0371 1.86% 0.0949 0.0823 0.9491 0.9525

y31 −2.0000 −2.0266 1.33% 0.0838 0.0754 0.0060 0.0757 −2.0506 2.53% 0.0790 0.0776 0.9721 0.9752

y32 −0.5000 −0.5086 1.73% 0.0305 0.0440 0.0030 0.0441 −0.5090 1.80% 0.0378 0.0453 0.9702 0.9725

y33 1.0000 0.9917 0.83% 0.0516 0.0498 0.0035 0.0499 0.9987 0.13% 0.0569 0.0509 0.9774 0.9798

y34 2.5000 2.4890 0.44% 0.0750 0.0928 0.0066 0.0930 2.5148 0.59% 0.1144 0.0956 0.9699 0.9724

y41 1.0000 0.9976 0.24% 0.0574 0.0540 0.0050 0.0542 1.0255 2.55% 0.0656 0.0567 0.9526 0.9566

y42 3.0000 3.0101 0.34% 0.1107 0.1193 0.0125 0.1200 3.0048 0.16% 0.0960 0.1256 0.9498 0.9550

y51 −1.5000 −1.4875 0.84% 0.0694 0.0629 0.0056 0.0632 −1.5117 0.78% 0.0676 0.0649 0.9699 0.9737

y52 0.5000 0.4822 3.55% 0.0581 0.0465 0.0041 0.0467 0.4968 0.64% 0.0515 0.0472 0.9868 0.9906

y53 2.0000 1.9593 2.03% 0.0850 0.0741 0.0064 0.0744 2.0025 0.12% 0.0898 0.0761 0.9735 0.9771

Overall mean value across parameters – 1.22% 0.0429 0.0428 0.0044 0.0432 – 1.28% 0.0455 0.0449 0.9493 0.9621


This holds for both the MSL and CML estimation approaches, and for both the low and high correlation cases, and confirms that the inverse of the sandwich information matrix estimator (in the case of the MSL approach) and the inverse of the Godambe information matrix estimator (in the case of the CML approach) recover the finite sample covariance matrices remarkably well. Second, the empirical and asymptotic standard errors for the threshold parameters are higher than those for the slope and correlation parameters (for both the MSL and CML cases, and for both the low and high correlation cases). This is perhaps because the threshold parameters play a critical role in the partitioning of the underlying latent variable into ordinal outcomes (more so than the slope and correlation parameters), and so are somewhat more difficult to pin down. Third, a comparison of the standard errors across the low and high correlation cases reveals that the empirical and asymptotic standard errors are much lower for the correlation parameters in the latter case than in the former case. This reinforces the earlier finding that the correlation parameters are much easier to recover at high values because of the considerable influence they have on the log-likelihood function at high values; consequently, not only are they recovered accurately, they are also recovered more precisely at high correlation values. Fourth, across all parameters, there is a reduction in the empirical and asymptotic standard errors for both the MSL and CML cases between the low and high correlation cases (though the reduction is much larger for the correlation parameters than for the noncorrelation parameters). Fifth, the simulation error in the MSL approach is negligible to small. On average, based on the mean values in the last row of each table, the simulation error is about 3.9% of the sampling error for the low correlation case and 10.3% of the sampling error for the high correlation case. The higher simulation error for the high correlation case is not surprising, since we use the same number of Halton draws per individual in both the low and high correlation cases, and the multivariate integration is more involved with a high correlation matrix structure. Thus, as the level of correlation increases, the evaluation of the multivariate normal integrals can be expected to become less precise for a given number of Halton draws per individual. However, overall, the results suggest that our MSL simulation procedure is well tuned, and that we are using adequate numbers of Halton draws per individual for the accurate evaluation of the log-likelihood function and the accurate estimation of the model parameters (this is also reflected in the negligible difference between the simulation-adjusted standard error and the mean asymptotic standard error of the parameters in the MSL approach).

The final two columns of each of Tables 1a and 1b provide a relative efficiency factor between the MSL and CML approaches. The first of these columns provides the ratio of the asymptotic standard error of each parameter from the MSL approach to the asymptotic standard error of the corresponding parameter from the CML approach. The second of these columns provides the ratio of the simulation-adjusted standard error of each parameter from the MSL approach to the asymptotic standard error of the corresponding parameter from the CML approach. As expected, the second column provides slightly higher values of relative efficiency, indicating that CML efficiency increases when one also considers the presence of simulation standard error in the MSL estimates. However, this efficiency increase is negligible in the current context because of the very small MSL simulation error. The more important and interesting point, though, is that the relative efficiency of the CML approach is as good as that of the MSL approach in the low correlation case. This is different from the relative efficiency results obtained in Renard, Molenberghs, and Geys (2004), Zhao and Joe (2005), and Kuk and Nott (2000) in other model contexts, where the CML has been shown to lose efficiency relative to an ML approach. However, note that all these earlier studies focus on a comparison of a CML approach vis-à-vis an ML approach, while, in our setting, we must resort to MSL to approximate the likelihood function. To our knowledge, this is the first comparison of the CML approach to an MSL approach, applicable to situations when the full-information ML estimator cannot be evaluated analytically. In this regard, it is not clear that the earlier theoretical result (namely, that the difference between the asymptotic covariance matrix of the CML estimator, obtained as the inverse of the Godambe matrix, and that of the ML estimator, obtained as the inverse of the cross-product matrix of derivatives, should be positive semi-definite) would extend to our case, because the asymptotic covariance of the MSL estimator is computed as the inverse of the sandwich information matrix.10 Basically, the presence of simulation noise, even if very small in the parameter estimates as in our case, can lead to a significant drop in the amount of information available in the sandwich matrix, resulting in increased standard errors of parameters when using MSL. Our results regarding the efficiency of individual parameters suggest that any reduction in the efficiency of the CML (because of using only the pairwise likelihood rather than the full likelihood) is balanced by the reduction in efficiency from using MSL rather than ML, so that there is effectively no loss in asymptotic efficiency from using the CML approach (relative to the MSL approach) in the CMOP case with low correlation. For the high correlation case, however, the MSL does provide slightly better efficiency than the CML. Even in this case, the relative efficiency of parameters in the CML approach ranges between 90 and 99% (mean of 95%) of the efficiency of the MSL approach, without considering simulation standard error. When considering simulation error, the relative efficiency of the CML approach is even better, at about 96% of the MSL efficiency (on average across all parameters). Overall, there is little to no drop in efficiency because of the use of the CML approach in the CMOP simulation context.

5.2.2. The PMOP Model
Most of the observations made from the CMOP model results also hold for the PMOP model results presented in Table 2. Both the MSL and CML approaches recover the parameters extremely well. In the low correlation case, the APB ranges from 0.26 to 4.29% (overall mean value of 1.29%) across parameters for the MSL approach, and from 0.65 to 5.33% (overall mean value of 1.84%) across parameters for the CML approach. In the high correlation case, the APB ranges from 0.45 to 6.14% (overall mean value of 2.06%) across parameters for the MSL approach, and from 0.41 to 5.71% (overall mean value of 2.40%) across parameters for the CML approach. Further, the ability to recover parameters does not seem to be affected too much in an absolute sense by whether there is low correlation or high correlation. The CML approach shows a mean APB value that increases by a factor of about 1.3 (from 1.84 to 2.40%) between the low and high ρ values, compared to an increase by a factor of about 1.6 (from 1.29 to 2.06%) for the MSL approach. It is indeed interesting that the PMOP results indicate a relative increase in the APB values from the low to the high correlation case, while there was actually a corresponding relative decrease in the CMOP case. Another result is that the APB increases from the low to the high correlation case for the threshold (θ) and variance (σ²) parameters in both the MSL and CML approaches. On the other hand, the APB decreases from the low to the high correlation case for the correlation (ρ) parameter, and remains relatively stable between the low and high correlation cases for the slope (β) parameters. That is, the recovery of the slope parameters appears to be less sensitive to the level of correlation than is the recovery of the other parameters.

The finite sample standard error and asymptotic standard error values are close to one another, with very little difference in the overall mean values of these two columns (see the last row). This holds for both the MSL and CML approaches. Also, as in the CMOP case, the empirical and asymptotic standard errors for the threshold parameters are generally higher than those for the other parameters. The simulation error in the MSL approach is negligible, at about 0.1% or less of the sampling error for both the low and high correlation cases.


Table 2. Evaluation of Ability to Recover "True" Parameters by the MSL and CML Approaches – The Panel Case.

Columns (left to right): Parameter; True value; MSL approach [mean estimate, absolute percentage bias, finite sample standard error, asymptotic standard error (MASE_MSL), simulation standard error, simulation-adjusted standard error (SASE_MSL)]; CML approach [mean estimate, absolute percentage bias, finite sample standard error, asymptotic standard error (MASE_CML)]; Relative efficiency [MASE_MSL/MASE_CML, SASE_MSL/MASE_CML].

r = 0.30

b1 1.0000 0.9899 1.01% 0.1824 0.1956 0.0001 0.1956 0.9935 0.65% 0.1907 0.1898 1.0306 1.0306

b2 1.0000 1.0093 0.93% 0.1729 0.1976 0.0001 0.1976 1.0221 2.21% 0.1955 0.2142 0.9223 0.9223

r 0.3000 0.2871 4.29% 0.0635 0.0605 0.0000 0.0605 0.2840 5.33% 0.0632 0.0673 0.8995 0.8995

s2 1.0000 1.0166 1.66% 0.2040 0.2072 0.0002 0.2072 1.0142 1.42% 0.2167 0.2041 1.0155 1.0155

y1 1.5000 1.5060 0.40% 0.2408 0.2615 0.0001 0.2615 1.5210 1.40% 0.2691 0.2676 0.9771 0.9771

y2 2.5000 2.5129 0.52% 0.2617 0.2725 0.0002 0.2725 2.5272 1.09% 0.2890 0.2804 0.9719 0.9719

y3 3.0000 3.0077 0.26% 0.2670 0.2814 0.0002 0.2814 3.0232 0.77% 0.2928 0.2882 0.9763 0.9763

Overall mean value across parameters – 1.29% 0.1989 0.2109 0.0001 0.2109 – 1.84% 0.2167 0.2159 0.9705 0.9705

r = 0.70

b1 1.0000 1.0045 0.45% 0.2338 0.2267 0.0001 0.2267 1.0041 0.41% 0.2450 0.2368 0.9572 0.9572

b2 1.0000 1.0183 1.83% 0.1726 0.1812 0.0001 0.1812 1.0304 3.04% 0.1969 0.2199 0.8239 0.8239

r 0.7000 0.6854 2.08% 0.0729 0.0673 0.0001 0.0673 0.6848 2.18% 0.0744 0.0735 0.9159 0.9159

s2 1.0000 1.0614 6.14% 0.4634 0.4221 0.0004 0.4221 1.0571 5.71% 0.4864 0.4578 0.9220 0.9220

y1 1.5000 1.5192 1.28% 0.2815 0.2749 0.0002 0.2749 1.5304 2.03% 0.3101 0.3065 0.8968 0.8968

y2 2.5000 2.5325 1.30% 0.3618 0.3432 0.0003 0.3432 2.5433 1.73% 0.3904 0.3781 0.9076 0.9076

y3 3.0000 3.0392 1.31% 0.4033 0.3838 0.0003 0.3838 3.0514 1.71% 0.4324 0.4207 0.9123 0.9123

Overall mean value across parameters – 2.06% 0.2842 0.2713 0.0002 0.2713 – 2.40% 0.3051 0.2990 0.9051 0.9051


Note that, unlike in the CMOP case, the PMOP MSL estimation did not involve the same number of draws per individual for the low and high correlation cases; rather, the number of draws varied to ensure an absolute error tolerance of 0.001 for each six-dimensional integral forming the likelihood. Thus, it is no surprise that the simulation error does not increase much between the low and high correlation cases, as it did in the CMOP case. A significant difference from the CMOP case is that the empirical standard errors and asymptotic standard errors are consistently larger for the high correlation case than for the low correlation case, with a particularly substantial increase in the standard error of σ².

The final two columns provide a relative efficiency factor between the MSL and CML approaches. The values in these two columns are identical because of the very low simulation error. As in the CMOP case, the estimated efficiency of the CML approach is as good as that of the MSL approach in the low correlation case (the relative efficiency ranges between 90 and 103%, with a mean of 97%). For the high correlation case, the relative efficiency of parameters in the CML approach ranges between 82 and 96% (mean of 91%) of the efficiency of the MSL approach, indicating a reduction in efficiency as the dependence level goes up (again, consistent with the CMOP case). Overall, however, the efficiency of the CML approach remains high for all the parameters.

5.3. Nonconvergence and Computational Time

The simulation estimation of the multivariate ordered-response model can involve numerical instability because of possibly unstable operations, such as large matrix inversions and imprecision in the computation of the Hessian. This can lead to convergence problems. On the other hand, the CML approach is a straightforward approach that should be easy to implement and should not have any convergence-related problems. In this empirical study, we classified any estimation run that had not converged within 5 hours as not having converged.

We computed nonconvergence rates in two ways for the MSL approach. For the CMOP model, we computed the nonconvergence rates in terms of the starting seeds that led to failure in a complete estimation of 10 simulation runs (using different randomized Halton sequences) for each data set. If a particular starting seed led to a failure in convergence for any of the 10 simulation runs, that seed was classified as a failed seed. Otherwise, the seed was classified as a successful seed. This procedure was applied to each of the 20 data sets generated for each of the low and high correlation matrix structures until we had a successful seed.11 The nonconvergence rate was then computed as the number of failed seeds divided by the total number of seeds considered. Note that this would be a good reflection of nonconvergence rates if the analyst ran the simulation multiple times on a single data set to recognize simulation noise in statistical inferences. But, in many cases, the analyst may run the MSL procedure only once on a single data set, based on using a high level of accuracy in computing the multivariate integrals in the likelihood function. For the PMOP model, which was estimated based on as many draws as needed to obtain an absolute error tolerance of 0.001 for each six-dimensional integral forming the likelihood, we therefore consider another way of computing nonconvergence. This is based on the number of unsuccessful runs out of the 1,000 simulated estimation runs considered (100 data sets times 10 simulated estimation runs). The results indicated a nonconvergence rate of 28.5% for the low correlation case and 35.5% for the high correlation case in the CMOP model, and a nonconvergence rate of 4.2% for the low correlation case and 2.4% for the high correlation case in the PMOP model (note, however, that the rates cannot be compared between the CMOP and PMOP models because of the very different ways of computing the rates, as discussed above). For both the CMOP and PMOP models, and for both the low and high correlation cases, we always obtained convergence with the CML approach.

Next, we examined the time to convergence per converged estimation run for the MSL and CML procedures (the time to convergence included the time to compute the standard errors of the parameters). For the CMOP model, we had a very well-tuned and efficient MSL procedure with an analytic gradient (written in the GAUSS matrix programming language). We used naïve independent probit starting values for the MSL as well as the CML in the CMOP case (the CML is very easy to code relative to the MSL, and was also undertaken in the GAUSS language for the CMOP model). The estimations were run on a desktop machine. But, for the PMOP model, we used an MSL code written in the R language without an analytic gradient, and a CML code written using a combination of the C and R languages. However, we used the CML convergent values (which are quite accurate) as the MSL start values in the PMOP model to compensate for the lack of analytic MSL gradients. The estimations were run on a powerful server machine. As a consequence of all these differences, one needs to be careful in the computational time comparisons. Here, we only provide a relative computational time factor (RCTF), computed as the mean time needed for an MSL run divided by the mean time needed for a CML run. In addition, we present the standard deviation of the run times (SDR) as a percentage of the mean run time for the MSL and CML estimations.

The RCTF for the CMOP model is 18 for the low correlation matrix and 40 for the high correlation matrix. The substantially higher RCTF for the high correlation case is because of an increase in the mean MSL time between the low and high correlation cases; the mean CML time hardly changed. The MSL SDR in the CMOP model is 30% for the low correlation case and 47% for the high correlation case, while the CML SDR is about 6% for both the low and high correlation cases. The RCTF for the PMOP model is 332 for the low correlation case and 231 for the high correlation case. The MSL SDR values for the low and high correlation cases in the PMOP model are on the order of 16–24%, though this relatively small SDR is surely also a consequence of using the CML convergent values as the start values for the MSL estimation runs. The CML SDR values in the PMOP model are low (6–13%) for both the low and high correlation cases. Overall, the computation time results very clearly indicate the advantage of the CML over the MSL approach: the CML approach estimates parameters in much less time than the MSL, and the stability of the CML computation time is substantially higher than that of the MSL computation times. As the number of ordered-response outcomes increases, one can only expect a further increase in the computational time advantage of the CML over the MSL estimation approach.

6. CONCLUSIONS

This chapter compared the performance of the MSL approach with the CML approach in multivariate ordered-response situations. We used simulated data sets with known underlying model parameters to evaluate the two estimation approaches in the context of a cross-sectional ordered-response setting as well as a panel ordered-response setting. The ability of the two approaches to recover model parameters was examined, as was the sampling variance and the simulation variance of parameters in the MSL approach relative to the sampling variance in the CML approach. The computational costs of the two approaches were also presented.

Overall, the simulation results demonstrate the ability of the CML approach to recover the parameters in a multivariate ordered-response choice model context, independent of the correlation structure. In addition, the CML approach recovers parameters as well as the MSL estimation approach in the simulation contexts used in this study, while doing so at a substantially reduced computational cost and with improved computational stability. Further, any reduction in the efficiency of the CML approach relative to the MSL approach is in the range of nonexistent to small. All these factors, combined with the conceptual and implementation simplicity of the CML approach, make it a promising and simple approach not only for the multivariate ordered-response model considered here, but also for other analytically intractable econometric models. Also, as the dimensionality of the model explodes, the CML approach remains practical and feasible, while the MSL approach becomes impractical and/or infeasible. Additional comparisons of the CML approach with the MSL approach for high-dimensional model contexts and alternative covariance patterns are directions for further research.

NOTES

1. The first three of these studies use the bivariate ordered-response probit (BORP) model, in which the stochastic elements in the two ordered-response equations take a bivariate normal distribution, while the last study develops a more general and flexible copula-based bivariate ordered-response model that subsumes the BORP as but one special case.

2. A handful of studies (see Hjort & Varin, 2008; Mardia, Kent, Hughes, & Taylor, 2009; Cox & Reid, 2004) have also theoretically examined the limiting normality properties of the CML approach, and compared the asymptotic variance matrices from this approach with the maximum likelihood approach. However, such a precise theoretical analysis is possible only for very simple models, and becomes much harder for models such as a multivariate ordered-response system.

3. In this study, we assume that the number of panel observations is the same across individuals. Extension to the case of different numbers of panel observations across individuals does not pose any substantial challenges. However, the efficiency of the composite marginal likelihood (CML) approach depends on the weights used for each individual in the case of a varying number of observations across individuals (see Kuk & Nott, 2000; Joe & Lee, 2009, provide a recent discussion and propose new weighting techniques). But one can simply put a weight of one, without any loss of generality, for each individual in the case of an equal number of panel observations for each individual. In our study, the focus is on comparing the performance of the maximum simulated likelihood approach with the CML approach, so we steer clear of issues related to optimal weights for the CML approach by considering the "equal observations across individuals" case.

4. Note that one can also use more complicated autoregressive structures of order p for the error terms, or use more general structures for the error correlation. For instance, while we focus on a time-series context, in spatial contexts related to ordered-response modeling, Bhat et al. (2010) developed a specification where the correlation in physical activity between two individuals may be a function of several measures of spatial proximity and adjacency.

5. Intuitively, in the pairwise CML approach used in this chapter, the surrogate likelihood function represented by the CML function is the product of the marginal likelihood functions formed by each pair of ordinal variables. In general, maximization of the original likelihood function will result in parameters that tend to maximize each pairwise likelihood function. Since the CML is the product of pairwise likelihood contributions, it will, therefore, provide consistent estimates. Another equivalent way to see this is to assume we are discarding all but two randomly selected ordinal variables in the original likelihood function. Of course, we will not be able to estimate all the model parameters from two random ordinal variables, but if we could, the resulting parameters would be consistent because information (captured by other ordinal variables) is being discarded in a purely random fashion. The CML estimation procedure works similarly, but combines all ordinal variables observed two at a time, while ignoring the full joint distribution of the ordinal variables.

6. Bhat (2001) used the Halton sequence to estimate mixed logit models, and found that the simulation error in estimated parameters is lower with 100 Halton draws than with 1,000 random draws (per individual). In our study, we carried out the GHK analysis of the multivariate ordered-response model with 100 randomized Halton draws as well as 500 random draws per individual, and found the 100 randomized Halton draws case to be much more accurate/efficient as well as much less time-consuming. So, we present only the results of the 100 randomized Halton draws case here.

7. Note that we use more independent data sets for the panel case than the cross-sectional case, because the number of individuals in the panel case is fewer than the number of individuals in the cross-sectional case. Essentially, the intent is to retain the same order of sampling variability in the two cases across individuals and data sets (the product of the number of observations per data set and the number of data sets is 20,000 in the cross-sectional and the panel cases). Further, the lower number of data sets in the cross-sectional case is helpful because maximum simulated likelihood is more expensive relative to the panel case, given that the number of parameters to be estimated is substantially more than in the panel case. Note also that the dimensionality of the correlation matrices is about the same in the cross-sectional and panel cases. We use T = 6 in the panel case because the serial correlation gets manifested in the last five of the six observations for each individual. The first observation error term ε_q1 for each individual q is randomly drawn from the normal distribution with variance σ².

8. The CML estimator always converged in our simulations, unlike the MSL estimator.

9. One could argue that the higher absolute percentage bias values for the correlation parameters in the low correlation case compared to the high correlation case are simply an artifact of taking percentage differences from smaller base correlation values in the former case. However, the sum of the absolute values of the deviations between the mean estimate and the true value is 0.0722 for the low correlation case and 0.0488 for the high correlation case. Thus, the correlation values are indeed being recovered more accurately in the high correlation case compared to the low correlation case.

10. McFadden and Train (2000) indicate, in their use of an independent number of random draws across observations, that the difference between the asymptotic covariance matrix of the MSL estimator obtained as the inverse of the sandwich information matrix and the asymptotic covariance matrix of the MSL estimator obtained as the inverse of the cross-product of first derivatives should be positive-definite for a finite number of draws per observation. Consequently, for the case of independent random draws across observations, the relationship between the MSL sandwich covariance matrix estimator and the CML Godambe covariance matrix is unclear. The situation gets even more unclear in our case because of the use of Halton or lattice point draws that are not based on independent random draws across observations.

11. Note that we use the terminology "successful seed" simply to denote whether the starting seed led to success in a complete estimation of the 10 simulation runs. In MSL estimation, it is not uncommon to obtain nonconvergence (for a number of reasons) for some sets of random sequences. There is, however, nothing specific to be learned here in terms of what starting seeds are likely to be successful and what starting seeds are likely to be unsuccessful. The intent is to use the terminology "successful seed" simply as a measure of nonconvergence rates.

ACKNOWLEDGMENTS

The authors are grateful to Lisa Macias for her help in formatting this document. Two referees provided important input on an earlier version of the chapter.

REFERENCES

Apanasovich, T. V., Ruppert, D., Lupton, J. R., Popovic, N., Turner, N. D., Chapkin, R. S., & Carroll, R. J. (2008). Aberrant crypt foci and semiparametric modelling of correlated binary data. Biometrics, 64(2), 490–500.
Balia, S., & Jones, A. M. (2008). Mortality, lifestyle and socio-economic status. Journal of Health Economics, 27(1), 1–26.
Bellio, R., & Varin, C. (2005). A pairwise likelihood approach to generalized linear models with crossed random effects. Statistical Modelling, 5(3), 217–227.
Bhat, C. R. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35(7), 677–693.
Bhat, C. R. (2003). Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transportation Research Part B, 37(9), 837–855.
Bhat, C. R., Sener, I. N., & Eluru, N. (2010). A flexible spatially dependent discrete choice model: Formulation and application to teenagers' weekday recreational activity participation. Transportation Research Part B, 44(8–9), 903–921.
Bhat, C. R., & Srinivasan, S. (2005). A multidimensional mixed ordered-response model for analyzing weekend activity participation. Transportation Research Part B, 39(3), 255–278.
Caragea, P. C., & Smith, R. L. (2007). Asymptotic properties of computationally efficient alternative estimators for a class of multivariate normal models. Journal of Multivariate Analysis, 98(7), 1417–1440.
Chen, M.-H., & Dey, D. K. (2000). Bayesian analysis for correlated ordinal data models. In: D. K. Dey, S. K. Gosh & B. K. Mallick (Eds), Generalized linear models: A Bayesian perspective. New York: Marcel Dekker.
Cox, D., & Reid, N. (2004). A note on pseudolikelihood constructed from marginal densities. Biometrika, 91(3), 729–737.
Craig, P. (2008). A new reconstruction of multivariate normal orthant probabilities. Journal of the Royal Statistical Society: Series B, 70(1), 227–243.
de Leon, A. R. (2005). Pairwise likelihood approach to grouped continuous model and its extension. Statistics & Probability Letters, 75(1), 49–57.
Engle, R. F., Shephard, N., & Sheppard, K. (2007). Fitting and testing vast dimensional time-varying covariance models. Finance Working Papers, FIN-07-046. Stern School of Business, New York University.
Engler, D. A., Mohapatra, M., Louis, D. N., & Betensky, R. A. (2006). A pseudolikelihood approach for simultaneous analysis of array comparative genomic hybridizations. Biostatistics, 7(3), 399–421.
Ferdous, N., Eluru, N., Bhat, C. R., & Meloni, I. (2010). A multivariate ordered-response model system for adults' weekday activity episode generation by activity purpose and social context. Transportation Research Part B, 44(8–9), 922–943.
Genz, A. (1992). Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1(2), 141–149.
Genz, A. (2003). Fully symmetric interpolatory rules for multiple integrals over hyper-spherical surfaces. Journal of Computational and Applied Mathematics, 157(1), 187–195.
Genz, A., & Bretz, F. (1999). Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. Journal of Statistical Computation and Simulation, 63(4), 361–378.
Genz, A., & Bretz, F. (2002). Comparison of methods for the computation of multivariate t probabilities. Journal of Computational and Graphical Statistics, 11(4), 950–971.
Geweke, J. (1991). Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints. Computer Science and Statistics: Proceedings of the Twenty Third Symposium on the Interface, Foundation of North America Inc., Fairfax, VA (pp. 571–578).
Godambe, V. (1960). An optimum property of regular maximum likelihood equation. The Annals of Mathematical Statistics, 31(4), 1208–1211.
Greene, W. H., & Hensher, D. A. (2010). Modeling ordered choices: A primer. Cambridge: Cambridge University Press.
Hajivassiliou, V., & McFadden, D. (1998). The method of simulated scores for the estimation of LDV models. Econometrica, 66(4), 863–896.
Hasegawa, H. (2010). Analyzing tourists' satisfaction: A multivariate ordered probit approach. Tourism Management, 31(1), 86–97.
Herriges, J. A., Phaneuf, D. J., & Tobias, J. L. (2008). Estimating demand systems when outcomes are correlated counts. Journal of Econometrics, 147(2), 282–298.
Hjort, N. L., & Varin, C. (2008). ML, PL, QL in Markov chain models. Scandinavian Journal of Statistics, 35(1), 64–82.
Hothorn, T., Bretz, F., & Genz, A. (2001). On multivariate t and Gauss probabilities in R. R News, 1(2), 27–29.
Jeliazkov, I., Graves, J., & Kutzbach, M. (2008). Fitting and comparison of models for multivariate ordinal outcomes. Advances in Econometrics, 23, 115–156.
Joe, H., & Lee, Y. (2009). On weighting of bivariate margins in pairwise likelihood. Journal of Multivariate Analysis, 100(4), 670–685.
Keane, M. (1990). Four essays in empirical macro and labor economics. Ph.D. thesis, Brown University, Providence, RI.
Keane, M. (1994). A computationally practical simulation estimator for panel data. Econometrica, 62(1), 95–116.
Kuk, A. Y. C., & Nott, D. J. (2000). A pairwise likelihood approach to analyzing correlated binary data. Statistics & Probability Letters, 47(4), 329–335.
LaMondia, J., & Bhat, C. R. (2009). A conceptual and methodological framework of leisure activity loyalty accommodating the travel context: Application of a copula-based bivariate ordered-response choice model. Technical paper, Department of Civil, Architectural and Environmental Engineering, The University of Texas at Austin.
Lele, S. R. (2006). Sampling variability and estimates of density dependence: A composite-likelihood approach. Ecology, 87(1), 189–202.
Lindsay, B. G. (1988). Composite likelihood methods. Contemporary Mathematics, 80, 221–239.
Liu, I., & Agresti, A. (2005). The analysis of ordered categorical data: An overview and a survey of recent developments. TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, 14(1), 1–73.
Mardia, K., Kent, J. T., Hughes, G., & Taylor, C. C. (2009). Maximum likelihood estimation using composite likelihoods for closed exponential families. Biometrika, 96(4), 975–982.
McFadden, D., & Train, K. (2000). Mixed MNL models for discrete response. Journal of Applied Econometrics, 15(5), 447–470.
McKelvey, R., & Zavoina, W. (1971). An IBM Fortran IV program to perform n-chotomous multivariate probit analysis. Behavioral Science, 16, 186–187.
McKelvey, R. D., & Zavoina, W. (1975). A statistical model for the analysis of ordinal-level dependent variables. Journal of Mathematical Sociology, 4, 103–120.
Mi, X., Miwa, T., & Hothorn, T. (2009). mvtnorm: New numerical algorithm for multivariate normal probabilities. The R Journal, 1, 37–39.
Mitchell, J., & Weale, M. (2007). The reliability of expectations reported by British households: Micro evidence from the BHPS. Discussion paper. National Institute of Economic and Social Research, London, UK.
Molenberghs, G., & Verbeke, G. (2005). Models for discrete longitudinal data. New York: Springer Series in Statistics, Springer Science+Business Media, Inc.
Muller, G., & Czado, C. (2005). An autoregressive ordered probit model with application to high frequency financial data. Journal of Computational and Graphical Statistics, 14(2), 320–338.
R Development Core Team. (2009). R: A language and environment for statistical computing. ISBN 3-900051-07-0. Available at: http://www.R-project.org. Vienna, Austria: R Foundation for Statistical Computing.
Renard, D., Molenberghs, G., & Geys, H. (2004). A pairwise likelihood approach to estimation in multilevel probit models. Computational Statistics & Data Analysis, 44(4), 649–667.
Scott, D. M., & Axhausen, K. W. (2006). Household mobility tool ownership: Modeling interactions between cars and season tickets. Transportation, 33(4), 311–328.
Scott, D. M., & Kanaroglou, P. S. (2002). An activity-episode generation model that captures interactions between household heads: Development and empirical analysis. Transportation Research Part B, 36(10), 875–896.
Scotti, C. (2006). A bivariate model of Fed and ECB main policy rates. International Finance Discussion Papers 875, Board of Governors of the Federal Reserve System, Washington, DC, USA.
Train, K. (2003). Discrete choice methods with simulation (1st ed.). Cambridge: Cambridge University Press.
Varin, C., & Czado, C. (2010). A mixed autoregressive probit model for ordinal longitudinal data. Biostatistics, 11(1), 127–138.
Varin, C., Host, G., & Skare, O. (2005). Pairwise likelihood inference in spatial generalized linear mixed models. Computational Statistics & Data Analysis, 49(4), 1173–1191.
Varin, C., & Vidoni, P. (2006). Pairwise likelihood inference for ordinal categorical time series. Computational Statistics & Data Analysis, 51(4), 2365–2373.
Varin, C., & Vidoni, P. (2009). Pairwise likelihood inference for general state space models. Econometric Reviews, 28(1–3), 170–185.
Winship, C., & Mare, R. D. (1984). Regression models with ordinal variables. American Sociological Review, 49(4), 512–525.
Zhao, Y., & Joe, H. (2005). Composite likelihood estimation in multivariate data analysis. The Canadian Journal of Statistics, 33(3), 335–356.


PRETEST ESTIMATION IN THE RANDOM PARAMETERS LOGIT MODEL

Tong Zeng and R. Carter Hill

ABSTRACT

In this paper we use Monte Carlo sampling experiments to examine the properties of pretest estimators in the random parameters logit (RPL) model. The pretests are for the presence of random parameters. We study the Lagrange multiplier (LM), likelihood ratio (LR), and Wald tests, using conditional logit as the restricted model. The LM test is the fastest test to implement among these three test procedures since it only uses restricted, conditional logit, estimates. However, the LM-based pretest estimator has poor risk properties. The ratio of LM-based pretest estimator root mean squared error (RMSE) to the random parameters logit model estimator RMSE diverges from one with increases in the standard deviation of the parameter distribution. The LR and Wald tests exhibit properties of consistent tests, with the power approaching one as the specification error increases, so that the pretest estimator is consistent. We explore the power of these three tests for the random parameters by calculating the empirical percentile values, size, and rejection rates of the test statistics. We find that the power of the LR and Wald tests decreases with increases in the mean of the coefficient distribution. The LM test has the weakest power for the presence of the random coefficient in the RPL model.

Maximum Simulated Likelihood Methods and Applications
Advances in Econometrics, Volume 26, 107–136
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)0000026008

1. INTRODUCTION

In this paper, we use Monte Carlo sampling experiments to examine the properties of pretest estimators in the random parameters logit (RPL) model, also called the mixed logit model. The pretests are for the presence of random parameters. We study the Lagrange multiplier (LM), likelihood ratio (LR), and Wald tests, using conditional logit as the restricted model. Unlike the conditional logit model, the mixed logit model does not impose the Independence from Irrelevant Alternatives (IIA) assumption. The mixed logit model can capture random taste variation among individuals and allows the unobserved factors of utility to be correlated over time as well.

The choice probabilities in the mixed logit model cannot be calculated analytically because they involve a multidimensional integral that does not have a closed-form solution. The integral can be approximated using simulation. The requirement of a large number of pseudo-random numbers during the simulation leads to long computational times. We are interested in testing the randomness of the mixed logit coefficients and the properties of pretest estimators in the mixed logit following the test for randomness. If the model coefficients are not random, then the mixed logit model reduces to the simpler conditional logit model. The most commonly used test procedures for this purpose are the Wald (or t-) test and the LR test for the significance of the random components of the coefficients. The problem is that in order to implement these tests the mixed logit model must be estimated. It is much faster to implement the LM test, as the restricted estimates come from the conditional logit model, which is easily estimated.

We use Monte Carlo experiments in the context of one- and two-parameter choice models with four alternatives to examine the risk properties of pretest estimators based on the LM, LR, and Wald tests. We explore the power of the three tests for random parameters by calculating the empirical 90th and 95th percentile values of the three tests, and by examining rejection rates of the three tests using the empirical 90th and 95th percentile values as the critical values for the 10 and 5% significance levels. We find that the pretest estimators based on the LR and Wald statistics have root mean squared error (RMSE) that is less than that of the random parameters logit model when the parameter variance is small, but that the pretest estimator RMSE is worse than that of the random parameters logit model over the remaining parameter space. The LR and Wald tests exhibit properties of consistent tests, with the power approaching one as the specification error increases. However, the power of the LR and Wald tests decreases with increases in the mean of the coefficient distribution. The ratio of the LM-based pretest estimator RMSE to the RMSE of the random parameters logit model rises and becomes further away from one with increases in the standard deviation of the parameter distribution.

The plan of the paper is as follows. In the following section, we review the conditional logit model and introduce the mixed logit specification. In Section 3, we introduce quasi-random numbers and describe our Monte Carlo experiments. We also show the efficiency of the quasi-random numbers in this section. Section 4 summarizes the RMSE properties of the pretest estimators based on the LM, LR, and Wald tests, and the size-corrected rejection rates of these three tests. Some conclusions and recommendations are given in Section 5.

2. CONDITIONAL AND MIXED LOGIT MODELS

The conditional logit model is frequently used in applied econometrics. The related choice probability can be computed conveniently without multivariate integration. The IIA assumption of the conditional logit model is inappropriate in many choice situations, especially for choices that are close substitutes. The IIA assumption arises because in logit models the unobserved components of utility have independent and identical Type I extreme value distributions. This is violated in many cases, such as when unobserved factors that affect the choice persist over time.

Unlike probit models, the mixed logit model is fully flexible because its unobserved utility components are not limited to the normal distribution. It decomposes the random components of utility into two parts: one having the independent, identical Type I extreme value distribution and the other representing individual tastes, having any distribution. The utility associated with alternative i as evaluated by individual n in the mixed logit model is written as

$U_{ni} = \beta_n' x_{ni} + \varepsilon_{ni}$   (1)


where x_ni are observed variables for alternative i and individual n, β_n is a vector of coefficients for individual n varying over individuals in the population with density function f(β), and ε_ni is an i.i.d. extreme value random component that is independent of β_n and x_ni. If β_n is fixed, the mixed logit becomes the conditional logit model and the choice probability L_ni(β) for individual n choosing alternative i is

$L_{ni}(\beta) = \dfrac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}}$   (2)

In the mixed logit model we specify a distribution f(β|θ) for the random coefficients, where θ contains the distribution means and variances. These are the parameters to be ultimately estimated. The choice probability is

$P_{ni} = \int \dfrac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}}\, f(\beta|\theta)\, d\beta = \int L_{ni}(\beta)\, f(\beta|\theta)\, d\beta$   (3)

Hensher and Greene (2003) discuss how to choose an appropriate distribution for the random coefficients. In the following section, we describe how to estimate the unknown parameters θ and introduce the quasi-Monte Carlo methods. Train (2003) summarizes many aspects of conditional logit and mixed logit models.

3. QUASI-MONTE CARLO METHODS

3.1. Simulated Log-Likelihood Function

Unlike the conditional logit model, the mixed logit probability cannot be calculated analytically because the related integral does not have a closed-form solution. The choice probability can be estimated through simulation, and the unknown parameters θ can be estimated by maximizing the simulated log-likelihood (SLL) function. With simulation, a value of β labeled β^r, representing the rth draw, is chosen randomly from a previously specified distribution. The standard logit L_ni(β) in Eq. (2) can be calculated given the value β^r. Repeating this process R times, the simulated probability of individual n choosing alternative i is obtained by averaging the L_ni(β^r):

$\bar{P}_{ni} = \dfrac{1}{R} \sum_{r=1}^{R} L_{ni}(\beta^r)$   (4)


The simulated log-likelihood function is

$\mathrm{SLL}(\theta) = \sum_{n=1}^{N} \sum_{i=1}^{J} d_{ni} \ln \bar{P}_{ni}$   (5)

where d_ni = 1 if individual n chooses alternative i and d_ni = 0 otherwise. Each individual is assumed to make choices independently and faces the choice situation only once.
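To make Eqs. (2), (4), and (5) concrete, the following minimal sketch (in R, which is used elsewhere in this volume; the function and variable names are illustrative and this is not the authors' GAUSS code) evaluates the simulated log-likelihood for a model with a single normally distributed random coefficient, given a matrix X of the attributes x_ni, a vector y of chosen alternatives, and a vector of standard normal (or inverse-CDF-transformed Halton) draws:

# Simulated log-likelihood for a mixed logit with one random coefficient.
# theta = c(mean, sd) of the coefficient distribution; each draw gives one beta^r.
sim_loglik <- function(theta, X, y, draws) {
  b_mean <- theta[1]; b_sd <- theta[2]
  N <- nrow(X); R <- length(draws)
  P <- matrix(0, N, ncol(X))
  for (r in 1:R) {
    b_r <- b_mean + b_sd * draws[r]   # beta^r drawn from f(beta | theta)
    V <- b_r * X                      # utility index for each alternative
    L <- exp(V) / rowSums(exp(V))     # conditional logit probabilities, Eq. (2)
    P <- P + L / R                    # average over the R draws, Eq. (4)
  }
  sum(log(P[cbind(1:N, y)]))          # Eq. (5): one choice occasion per individual
}

In practice the draws would differ across individuals, and the function would be passed to a numerical optimizer such as optim() to maximize SLL(θ).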

3.2. The Halton Sequences

The classical Monte Carlo method is used above to estimate the probability P_ni. It reduces the integration problem to the problem of estimating an expected value on the basis of the strong law of large numbers. In general terms, the classical Monte Carlo method is described as a numerical method based on random sampling using pseudo-random numbers. In terms of the number of pseudo-random draws, N, it gives us a probabilistic error bound, or convergence rate, of O(N^{-1/2}), since there is never any guarantee that the expected accuracy is achieved in a concrete calculation (Niederreiter, 1992, p. 7). This reflects the stochastic character of the classical Monte Carlo method. For the classical Monte Carlo method the convergence rate of the numerical integration does not depend on the dimension of the integration. Good estimates, however, require a large number of pseudo-random numbers, which leads to long computational times.

To reduce the cost of long run times, we can replace the pseudo-random numbers with a constructed set of points. The same or even higher estimation accuracy can be reached with fewer points. The essence of the number theoretic method (NTM) is to find a set of uniformly scattered points over an s-dimensional unit cube. Such a set of points obtained by NTM is usually called a set of quasi-random numbers, or a number theoretic net. Sometimes it can be used in the classical Monte Carlo method to achieve a significantly higher accuracy. The difference between the quasi-Monte Carlo method and the classical Monte Carlo method is that the quasi-Monte Carlo method uses quasi-random numbers instead of pseudo-random numbers. There are several methods to construct quasi-random numbers; here we use the Halton sequences proposed by Halton (1960). Bhat (2001) found that the error measures of the estimated parameters were smaller using 100 Halton draws than using 1,000 random number draws in mixed logit model estimation.


Halton sequences are based on the base-p number system. Any integer n can be written as

$n \equiv n_M n_{M-1} \cdots n_2 n_1 n_0 = n_0 + n_1 p + n_2 p^2 + \cdots + n_M p^M$   (6)

where M = [log_p n] = [ln n / ln p], with square brackets denoting the integral part. The base is p, which can be any integer except 1; n_i is the digit at position i, 0 ≤ i ≤ M, 0 ≤ n_i ≤ p − 1; and p^i is the weight of position i. With the base p = 10, the integer n = 468 has n_0 = 8, n_1 = 6, n_2 = 4.

Using the base-p number system, we can construct one and only one fraction f that is smaller than 1 by writing n in the base-p number system and reversing the order of its digits. This is called the radical inverse function and is defined as follows:

$f = \phi_p(n) = 0.n_0 n_1 n_2 \cdots n_M = n_0 p^{-1} + n_1 p^{-2} + \cdots + n_M p^{-M-1}$   (7)

The Halton sequence of length N is developed from the radical inverse function, and the points of the Halton sequence are φ_p(n) for n = 1, 2, …, N, where p is a prime number. The k-dimensional sequence is defined as:

$\varphi_n = (\phi_{p_1}(n), \phi_{p_2}(n), \ldots, \phi_{p_k}(n))$   (8)

where p_1, p_2, …, p_k are prime to each other and are always chosen from the first k primes to achieve a smaller discrepancy.

In applications, Halton sequences are used to replace random number generators to produce points in the interval [0,1]. The points of the Halton sequence are generated iteratively. A one-dimensional Halton sequence based on prime p divides the 0–1 interval into p segments. It systematically fills in the empty space by iteratively dividing each segment into p smaller segments. The position of the points is determined by the base that is used to construct the iteration. A large base implies more points in each iteration, or a long cycle. Due to the high correlation among the initial points of the Halton sequence, the first 10 points of the sequence are usually discarded in applications (Morokoff & Caflisch, 1995; Bratley, Fox, & Niederreiter, 1992).
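As an illustration of Eqs. (6)–(8), here is a minimal sketch in base R (names are illustrative) of the radical inverse function and a one-dimensional Halton sequence for a given prime base, dropping the first 10 points as suggested above:

radical_inverse <- function(n, p) {
  f <- 0; w <- 1 / p
  while (n > 0) {
    f <- f + (n %% p) * w   # digit n_i times p^{-(i+1)}, Eq. (7)
    n <- n %/% p
    w <- w / p
  }
  f
}

halton <- function(N, p, drop = 10) {
  sapply((1 + drop):(N + drop), radical_inverse, p = p)
}

u <- cbind(halton(200, 2), halton(200, 3))   # two-dimensional points, as in Fig. 1(b)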

Compared to pseudo-random numbers, the coverage of the points of the Halton sequence is more uniform, since pseudo-random numbers may cluster in some areas and leave other areas uncovered. This can be seen from Fig. 1, which is similar to a figure from Bhat (2001). Fig. 1(a) is a plot of 200 points drawn from a two-dimensional uniform distribution using pseudo-random numbers. Fig. 1(b) is a plot of 200 points obtained from the Halton sequence. The latter scatters more uniformly on the unit square than the former.


Fig. 1. Comparing Pseudo-Random Numbers to Halton Sequences. Note: Fig. 1(a) shows 200 pseudo-random points in two dimensions; Fig. 1(b) shows 200 points of a two-dimensional Halton sequence generated with primes 2 and 3.


Since the points generated from the Halton sequences are deterministic points, unlike the classical Monte Carlo method, the quasi-Monte Carlo method provides a deterministic error bound instead of a probabilistic error bound. This bound is expressed in terms of the "discrepancy" in the literature on NTMs. The smaller the discrepancy, the more evenly the quasi-random numbers spread over the domain. The deterministic error bound of the quasi-Monte Carlo method with Halton sequences is O(N^{-1}(ln N)^k) (Halton, 1960), which is smaller than the probabilistic error bound of the classical Monte Carlo method, O(N^{-1/2}). The shortcoming of the Halton sequences is that a large number of points is needed to ensure uniform scattering on the domain in large dimensions, usually k ≥ 10. This increases the computational time and also leads to high correlation among the higher coordinates of the Halton sequences. Higher dimensional Halton sequences can be refined by scrambling their points, which is explored by Bhat (2003).

Monte Carlo simulation methods require random samples from various distributions. A discrepancy-preserving transformation is often applied in quasi-Monte Carlo simulation to transform a set of n quasi-random numbers $\{Y_k = (Y_{k1}, \ldots, Y_{ks}),\ k = 1, \ldots, n\}$ generated from the s-dimensional unit cube, with discrepancy d, to a random variable x with another statistical distribution by solving:

$x_k = (F_1^{-1}(Y_{k1}), \ldots, F_s^{-1}(Y_{ks})), \quad k = 1, \ldots, n$

We achieve the same discrepancy d with respect to F(x), where F(x) is an increasing continuous multivariate distribution function $F(x) = F(x_1, \ldots, x_s) = \prod_{i=1}^{s} F_i(x_i)$, and $F_i(x_i)$ is the marginal distribution function of $x_i$ (Fang & Wang, 1994, Chapter 4). Due to the faster convergence rate and fewer draws, less computational time is needed.
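A minimal sketch of this transformation in R, reusing the halton() function from the sketch above (names are illustrative): the uniform Halton points are pushed through the inverse normal CDF to obtain quasi-random draws from a normal coefficient distribution.

halton_normal <- function(N, p, mean = 0, sd = 1) {
  u <- halton(N, p)                # quasi-random points on (0, 1)
  qnorm(u, mean = mean, sd = sd)   # x_k = F^{-1}(Y_k) for a normal marginal
}

draws <- halton_normal(100, p = 2, mean = 1.5, sd = 0.8)   # e.g., draws from N(1.5, 0.8^2)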

We apply the Halton sequences with the maximum simulated likelihood method to estimate the mixed logit model. How to choose the number of Halton draws is an issue in the application of Halton sequences. Different researchers provide different suggestions. To determine the number of Halton draws in our experiments, we compare the results of estimated mixed logit parameters with different sets of Halton draws and pseudo-random numbers. Specifically, we compare estimator bias, Monte Carlo sampling variability (standard deviations), the average nominal standard errors, the ratio of average nominal standard errors to the Monte Carlo standard deviations, and the RMSE of the random coefficient estimates.


3.3. Monte Carlo Experiment Design

Our experiments are based on a mixed logit model that has no intercept term, with one or two coefficients that are independent of each other. In our experiments, each individual faces four mutually exclusive alternatives on one choice occasion. The utility of individual n choosing alternative i is

$U_{ni} = \beta_n' x_{ni} + \varepsilon_{ni}$   (9)

The explanatory variables x_ni for each individual and each alternative are generated from independent standard normal distributions. The coefficients β_n for each individual are generated from the normal distribution N(β̄, σ̄_β²). The values of x_ni and β_n are held fixed over each experiment design. The choice probability for each individual is generated by comparing the utility of each alternative:

$I_{ni}^r = \begin{cases} 1 & \text{if } \beta_n' x_{ni} + \varepsilon_{ni}^r > \beta_n' x_{nj} + \varepsilon_{nj}^r \ \ \forall\, j \neq i \\ 0 & \text{otherwise} \end{cases}$   (10)

The indicator function I_ni^r = 1 if individual n chooses alternative i, and is 0 otherwise. The values of the random errors have i.i.d. extreme value Type I distributions, with ε_ni^r representing the rth draw. We calculate and compare the utility of each alternative. This process is repeated 1,000 times. The simulated choice probability P_ni for each individual n choosing alternative i is

$P_{ni} = \dfrac{1}{1{,}000} \sum_{r=1}^{1{,}000} I_{ni}^r$   (11)

The dependent variable values y_ni are determined by these simulated choice probabilities. In our experiments, we choose the estimation sample size N = 200 and generate 999 Monte Carlo samples for the coefficient distributions with specific means and variances. To test the efficiency of the mixed logit estimators using the Halton sequence, we use 25, 100, and 250 Halton draws and 1,000 pseudo-random draws to estimate the mean and variance of the coefficient distribution.
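The data-generating process of Eqs. (9)–(11) can be sketched as follows in R (the mapping from the simulated probabilities P_ni to the observed choices is not spelled out in the text, so the final line is only one plausible reading; all names are illustrative):

set.seed(1)
N <- 200; J <- 4; R_err <- 1000
b_bar <- 1.5; s_bar <- 0.8
X   <- matrix(rnorm(N * J), N, J)                  # x_ni ~ independent standard normal
b_n <- rnorm(N, b_bar, s_bar)                      # beta_n ~ N(beta_bar, sigma_bar^2)

P <- matrix(0, N, J)                               # simulated choice probabilities, Eq. (11)
for (r in 1:R_err) {
  eps <- -log(-log(matrix(runif(N * J), N, J)))    # Type I extreme value errors
  U   <- b_n * X + eps                             # utility, Eq. (9)
  win <- max.col(U)                                # alternative with highest utility, Eq. (10)
  P[cbind(1:N, win)] <- P[cbind(1:N, win)] + 1 / R_err
}
y <- max.col(P)                                    # one plausible mapping from P_ni to y_ni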

3.4. Findings in Simulation Efficiency

In the one-coefficient case, the two parameters of interest are β̄ and σ̄_β. We denote the estimates of these parameters as b and s_b. We use SAS 9.2 and NLOGIT Version 4.0 to check and compare the results of our GAUSS program. The Monte Carlo experiments were programmed in GAUSS 9.0 using portions of Ken Train's posted GAUSS code [http://elsa.berkeley.edu/~train/software.html] and use NSAM = 999 Monte Carlo samples. Table 1 shows the Monte Carlo averages of the estimated mixed logit parameters and the error measures of the mixed logit estimates with one random parameter. Specifically,

MC average of b: $\bar{b} = \sum_i b_i / \mathrm{NSAM}$

MC standard deviation (s.d.) of b: $\sqrt{\sum_i (b_i - \bar{b})^2 / (\mathrm{NSAM} - 1)}$

Average nominal standard error (s.e.) of b: $\sum_i \sqrt{\widehat{\mathrm{var}}(b_i)} / \mathrm{NSAM}$

Root mean squared error (RMSE) of b: $\sqrt{\sum_i (b_i - \bar{\beta})^2 / \mathrm{NSAM}}$

where $b_i$ is the estimate from the ith Monte Carlo sample, $\bar{b}$ is the Monte Carlo average of the estimates, and $\bar{\beta}$ is the true parameter value.
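These error measures are straightforward to compute; a minimal sketch in R (names are illustrative), given a vector b_hat of NSAM Monte Carlo estimates, a vector v_hat of their nominal variances, and the true parameter value b_true:

mc_summary <- function(b_hat, v_hat, b_true) {
  c(mc_average     = mean(b_hat),
    mc_sd          = sd(b_hat),                       # uses the NSAM - 1 divisor
    avg_nominal_se = mean(sqrt(v_hat)),
    rmse           = sqrt(mean((b_hat - b_true)^2)))  # deviations from the true value
}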

Table 1. The Mixed Logit Model Estimated with Classical Monte Carlo and Quasi-Monte Carlo Estimation: One Parameter Model.

Monte Carlo parameters and results       Classical Monte Carlo estimation   Quasi-Monte Carlo estimation (number of Halton draws)
(β̄ = 1.5ᵃ, σ̄_β = 0.8ᵃ)                   1,000 random draws                 25        100       250
Monte Carlo average value of b_i              1.486                          1.468     1.477     1.477
Monte Carlo average value of s_bi             0.625                          0.594     0.606     0.602
Monte Carlo SD of b_i                         0.234                          0.226     0.233     0.232
Monte Carlo SD of s_bi                        0.363                          0.337     0.372     0.375
Average nominal SE of b_i                     0.240                          0.236     0.237     0.237
Average nominal SE of s_bi                    0.434                          0.417     0.447     0.465
Average nominal SE / MC SD of b_i             1.026                          1.044     1.017     1.021
Average nominal SE / MC SD of s_bi            1.196                          1.238     1.202     1.241
RMSE of b_i                                   0.234                          0.228     0.234     0.233
RMSE of s_bi                                  0.402                          0.395     0.419     0.424

ᵃThe true mean and standard deviation of the distribution of the random coefficient β_n.


From Table 1, increases in the number of Halton draws change the error measures only by small amounts. The number of Halton draws influences the RMSE of the mixed logit estimators only slightly. The Monte Carlo average value of b_i underestimates the true parameter β̄ = 1.5 by less than 2%.

With Halton draws, the average nominal standard errors of b are only about 1% larger than the Monte Carlo standard deviations. The average nominal standard errors for s_b are about 20% larger than the Monte Carlo standard deviations. With 100 Halton draws, the ratios of average nominal standard errors to Monte Carlo standard deviations are closest to 1. Compared to the classical Monte Carlo estimation, our results confirm the findings of Bhat (2001, p. 691). We can reach almost the same RMSE of estimated parameters with only 100 Halton draws as with 1,000 random draws, and the computational time is considerably reduced with 100 Halton draws. Considering the relatively accurate estimation of the standard errors of b and s_b, which are used to construct t-tests, and the acceptable estimation time, we use 100 Halton draws to estimate the mixed logit parameters in our Monte Carlo experiments in the one-parameter designs.

In Table 2, we use the same error measures and show the Monte Carlo average values of the estimated mixed logit parameters with two random coefficients. The true mean and standard deviation of the new independent random coefficient distribution are 2.5 and 0.3, respectively. In Table 2, with increases in the number of Halton draws, the percentage changes of the Monte Carlo average values of the estimated mixed logit parameters are no more than 1%. Unlike the one random coefficient case, the Monte Carlo average values of the estimated means of the two independent random coefficient distributions are overestimated by about 10%. However, the biases are stable and not sensitive to the number of Halton draws. From Table 2, the average nominal standard errors of b_1i and b_2i are underestimated and further away from the Monte Carlo standard deviations than in the one random coefficient case. The ratios of the average nominal standard errors to the Monte Carlo standard deviations of the estimated parameters are slightly closer to one with 100 Halton draws. Using 100 Halton draws also provides smaller RMSE of the estimated parameters. Based on these results, 100 Halton draws are also used in our two independent random coefficient mixed logit model. All of these factors lead us to conclude that increasing the number of Halton draws in our experiments will not greatly improve the RMSE of the estimated mixed logit parameters. Since the convergence rate of the quasi-Monte Carlo method with Halton sequences is, in theory, mainly determined by the structure of the sequences, the simulation error will not decline considerably with increases in the number of Halton draws for each individual.


4. PRETEST ESTIMATORS

Even though the mixed logit model is a highly flexible model, it requires the use of simulation to obtain empirical estimates. It is desirable to have a specification test to determine whether the mixed logit is needed or not. The LR and Wald tests are the most popular test procedures used for testing coefficient significance. The problem is that in order to implement these tests the mixed logit model must be estimated. It is much faster to implement the LM test. It is interesting and important to examine the power of these three tests for the presence of the random coefficients in the mixed logit model. We use Monte Carlo experiments in the context of one- and two-parameter choice models with four alternatives to examine the properties of pretest estimators in the random parameters logit model with the LR, LM, and Wald tests.

Table 2. The Mixed Logit Model Estimated with Classical Monte Carlo and Quasi-Monte Carlo Estimation: Two-Parameter Model.

Monte Carlo (MC) parameter values and results   Classical Monte Carlo estimation   Quasi-Monte Carlo estimation (number of Halton draws)
(β̄_1 = 2.5, σ̄_β1 = 0.3)                         1,000 random draws                 25        100       250
MC average value b_1i                                2.733                          2.754     2.732     2.728
MC average value s_b1i                               0.332                          0.401     0.318     0.302
MC average value b_2i                                1.674                          1.680     1.676     1.672
MC average value s_b2i                               0.601                          0.615     0.605     0.592
MC s.d. b_1i                                         0.491                          0.497     0.477     0.490
MC s.d. s_b1i                                        0.428                          0.438     0.435     0.448
MC s.d. b_2i                                         0.327                          0.325     0.316     0.323
MC s.d. s_b2i                                        0.439                          0.423     0.430     0.447
Average nominal s.e. b_1i                            0.445                          0.450     0.445     0.443
Average nominal s.e. s_b1i                           0.737                          0.678     0.772     0.833
Average nominal s.e. b_2i                            0.298                          0.300     0.297     0.297
Average nominal s.e. s_b2i                           0.512                          0.494     0.499     0.537
Average nominal s.e./MC s.d. b_1i                    0.907                          0.906     0.933     0.904
Average nominal s.e./MC s.d. s_b1i                   1.721                          1.548     1.776     1.859
Average nominal s.e./MC s.d. b_2i                    0.912                          0.923     0.940     0.919
Average nominal s.e./MC s.d. s_b2i                   1.167                          1.168     1.160     1.202
RMSE b_1i                                            0.543                          0.558     0.531     0.540
RMSE s_b1i                                           0.429                          0.449     0.435     0.448
RMSE b_2i                                            0.370                          0.371     0.361     0.366
RMSE s_b2i                                           0.481                          0.461     0.472     0.492



In our experiments, the LR, Wald, and LM tests are constructed based on the null hypothesis H0: σ_β = 0 against the alternative hypothesis H1: σ_β > 0. With this one-tail test, when the null hypothesis is true, the parameter lies on the boundary of the parameter space. The asymptotic distribution of the estimator is complex, since the parameter space does not include a neighborhood of zero. The standard theory of the LR, Wald, and LM tests assumes that the true parameter lies inside an open set in the parameter space. Thus, the LR, Wald, and LM statistics do not have the usual chi-square asymptotic distribution. Under the null hypothesis, Gourieroux and Monfort (1995) and Andrews (2001) have shown that the asymptotic distribution of the LR and Wald statistics is a mixture of chi-square distributions. In our experiments, we use their results to analyze the power of the LR, LM, and Wald tests.
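Concretely, for a single variance parameter on the boundary, the LR and Wald statistics behave asymptotically under H0 like a 50:50 mixture of a point mass at zero and a chi-square variable with one degree of freedom, so the one-tail critical value at level α is the (1 − 2α) quantile of a χ²(1) distribution; a one-line R check reproduces the values used later in this section:

alpha <- c(0.25, 0.10, 0.05)
qchisq(1 - 2 * alpha, df = 1)   # approximately 0.455, 1.642, 2.706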

4.1. One-Parameter Model Results

In the one random parameter model, we use four different values for the parameter mean, β̄ = {0.5, 1.5, 2.5, 3.0}. Corresponding to each value of the mean β̄, we use six different values for the standard deviation of the parameter distribution, σ̄_β = {0, 0.15, 0.3, 0.8, 1.2, 1.8}. The restricted and unrestricted estimates come from the conditional logit and mixed logit models, respectively. The LR, Wald, and LM tests are constructed based on the null hypothesis H0: σ_β = 0 against the alternative hypothesis H1: σ_β > 0. The inverse of the information matrix in the Wald and LM tests is estimated using the outer product of gradients.

Fig. 2 shows the ratio of the pretest estimator RMSE of β̄ relative to the random parameters logit model estimator RMSE of β̄ using the LR, Wald, and LM tests at a 25% significance level. We choose a 25% significance level because 5% pretests are not optimal in many settings, and this is also true in our experiments. Under the one-tail alternative hypothesis, the distribution of the LR and Wald chi-square test statistics is a mixture of chi-square distributions. In the one-parameter case, the (1 − 2α)-quantile of a standard chi-square is the critical value for significance level α (Gourieroux & Monfort, 1995, p. 265). For the 25% significance level the critical value is 0.455. Fig. 2 shows that the pretest estimators based on the LR and Wald statistics have RMSE that is less than that of the random parameters logit model when the parameter variance is small, but that the RMSE is larger than that of the random parameters logit model over the remaining parameter space. The LR and Wald tests exhibit properties of consistent tests, with the power approaching one as the specification error increases, so that the pretest estimator is consistent. The ratios of the LM-based pretest estimator RMSE of β̄ to the RMSE of the random parameters logit model rise and become further away from one with increases in the standard deviation of the parameter distribution. The poor properties of the LM-based pretest estimator arise from the poor power of the LM test in our experiments. It is interesting that even though the pretest estimators based on the LR and Wald statistics are consistent, the maximum risk ratio based on the LR and Wald tests increases in the parameter mean β̄. The range over which the risk ratio is less than one also increases in the mean of the parameter distribution β̄.

Fig. 2. Pretest Estimator RMSE β̄ / Mixed Logit Estimator RMSE β̄: One Random Parameter Model.

To further explore the power of the three tests for the presence of a random coefficient in the mixed logit model, we calculate the empirical 90th and 95th percentile values of the LR, Wald, and LM statistics given the different combinations of means and standard deviations of the parameter distribution in the one random parameter model. The results in Table 3 show that the Monte Carlo 90th and 95th percentile values of the three tests change with changes in the mean and standard deviation of the parameter distribution. In general, the Monte Carlo critical values with different parameter means are neither close to 1.64 and 2.71 (the (1 − 2α)-quantiles of the standard chi-square statistic for the 10 and 5% significance levels, respectively) nor to the usual critical values 2.71 and 3.84. When β̄ = 0.5 and σ̄_β = 0, the 90th and 95th empirical percentiles of the LR, Wald, and LM statistics in our experiments are all greater than the asymptotic critical values 1.64 and 2.71. With increases in the true standard deviation of the coefficient distribution, the 90th and 95th empirical percentiles increase for the LR and Wald statistics, indicating that these tests will have some power in choosing the correct model with random coefficients. The corresponding percentile values based on the LM statistic decline, meaning that the LM test has declining power. An interesting feature of Table 3 is that most Monte Carlo critical values based on the LR and Wald statistics decrease in the mean of the coefficient distribution β̄.

The results based on the empirical percentiles of the LR, Wald, and LM statistics imply that the rejection rates of the three tests will vary depending on the mean and standard deviation of the parameter distribution. To get the rejection rates for the three tests, we choose the "corrected" chi-square critical values 1.64 and 2.71 for the 10 and 5% significance levels with one degree of freedom. Table 4 provides the percentage of times the null hypothesis σ_β = 0 is rejected using the critical values 1.64 and 2.71. When the null hypothesis is true, most empirical rejection rates of the LR test are less than the nominal rejection rates of 10 and 5%, and become further away from the nominal rates with increases in the parameter mean β̄. All empirical rejection rates of the Wald and LM tests given a true null hypothesis are greater than the corresponding expected rates. When the number of Monte Carlo samples is increased to 9,999, the results are essentially unchanged. For the case in which β̄ = 0.5, the rejection rate of the 10% LR test is 9.6%, and the 5% test rejects 5% of the time. As the parameter mean β̄ increases, we again see the percentage rejections decline. The Wald and LM test performance is relatively the same.

Table 3. 90th and 95th Empirical Percentiles of Likelihood Ratio, Wald, and Lagrange Multiplier Tests: One Random Parameter Model.

 β̄     σ̄_β    LR 90th   LR 95th   Wald 90th   Wald 95th   LM 90th   LM 95th
0.5    0.00     1.927     3.267      4.006       5.917      2.628     3.576
0.5    0.15     1.749     2.755      3.850       5.425      2.749     3.862
0.5    0.30     2.239     3.420      4.722       6.210      2.594     3.544
0.5    0.80     6.044     7.779      9.605      11.014      2.155     3.043
0.5    1.20    12.940    15.684     14.472      15.574      1.712     2.344
0.5    1.80    26.703    31.347     19.225      19.950      1.494     2.041
1.5    0.00     1.518     2.668      3.671       5.672      2.762     3.972
1.5    0.15     1.541     2.414      3.661       5.443      3.020     4.158
1.5    0.30     1.837     3.364      4.361       6.578      3.048     4.308
1.5    0.80     5.753     7.451      8.603      10.424      2.496     3.489
1.5    1.20    11.604    13.953     12.930      13.974      1.825     2.376
1.5    1.80    24.684    28.374     17.680      18.455      1.346     1.947
2.5    0.00     0.980     1.727      2.581       4.017      2.978     4.147
2.5    0.15     1.020     1.858      2.598       4.256      2.976     4.317
2.5    0.30     1.217     2.235      2.751       4.616      3.035     4.429
2.5    0.80     2.766     4.667      6.387       8.407      3.119     4.315
2.5    1.20     6.321     8.643      9.700      11.598      2.714     3.832
2.5    1.80    18.018    20.828     14.895      15.822      2.189     3.275
3.0    0.00     1.042     1.720      2.691       4.264      3.455     4.594
3.0    0.15     1.040     1.941      2.548       4.878      3.285     4.441
3.0    0.30     1.260     2.114      3.068       5.124      3.164     4.324
3.0    0.80     2.356     3.167      4.915       7.106      3.073     4.198
3.0    1.20     4.610     6.570      8.086      10.296      2.917     4.224
3.0    1.80    13.261    15.622     12.960      14.052      2.579     3.478

Note: Testing H0: σ_β = 0; one-tail critical values are 1.64 (10%) and 2.71 (5%), compared to the usual values 2.71 and 3.84, respectively.


which β̄ = 0.5, the rejection rate of the 10% LR test is 9.6%, and the 5% test rejects 5% of the time. As the parameter mean β̄ increases, we again see the percentage of rejections decline. The Wald and LM test performance is relatively the same.
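For concreteness, the rejection rates reported in Table 4 can be reproduced from stored Monte Carlo test statistics by simple counting. The sketch below is ours (the authors' experiments are coded in GAUSS), and the array names and illustrative draws are hypothetical.

```python
import numpy as np

def rejection_rates(stats, critical_values=(1.64, 2.71)):
    """Share of Monte Carlo samples in which each test rejects H0: sigma_beta = 0.

    stats: dict mapping a test name ("LR", "Wald", "LM") to a 1-D array holding
           one simulated test statistic per Monte Carlo sample.
    """
    return {name: {c: float(np.mean(values > c)) for c in critical_values}
            for name, values in stats.items()}

# Hypothetical usage with 999 Monte Carlo samples per design point:
rng = np.random.default_rng(0)
stats = {"LR": rng.chisquare(1, size=999),
         "Wald": rng.chisquare(1, size=999),
         "LM": rng.chisquare(1, size=999)}
print(rejection_rates(stats))
```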

Fig. 3 contains graphs based on the results of Table 4. From Fig. 3, we can see the changes in the rejection rates of the three test statistics with increases in the mean and standard deviation of the parameter distribution, respectively. We find that the rejection frequency of the LR and Wald statistics declines in the mean of the parameter distribution.

Table 4. Rejection Rate of Likelihood Ratio, Wald, and Lagrange Multiplier Tests: One Random Parameter Model.

β̄    σ̄_β    s.e.(β̂)^a    s.e.(σ̂_β)^a    LR 10%^b    LR 5%^b    Wald 10%^b    Wald 5%^b    LM 10%^b    LM 5%^b

0.5 0.00 0.123 0.454 0.122 0.065 0.219 0.155 0.204 0.095

0.5 0.15 0.125 0.461 0.113 0.051 0.233 0.164 0.200 0.101

0.5 0.30 0.125 0.460 0.143 0.072 0.281 0.214 0.184 0.093

0.5 0.80 0.135 0.416 0.472 0.348 0.665 0.587 0.161 0.061

0.5 1.20 0.153 0.391 0.816 0.722 0.916 0.882 0.109 0.036

0.5 1.80 0.195 0.438 0.996 0.989 1.000 0.999 0.084 0.021

1.5 0.00 0.242 0.593 0.092 0.048 0.199 0.139 0.215 0.102

1.5 0.15 0.243 0.586 0.090 0.042 0.215 0.148 0.225 0.116

1.5 0.30 0.243 0.567 0.115 0.068 0.236 0.160 0.233 0.119

1.5 0.80 0.247 0.439 0.390 0.264 0.582 0.461 0.184 0.083

1.5 1.20 0.261 0.391 0.777 0.659 0.897 0.816 0.116 0.037

1.5 1.80 0.291 0.443 0.995 0.990 0.999 0.996 0.075 0.016

2.5 0.00 0.416 0.910 0.058 0.022 0.143 0.090 0.216 0.111

2.5 0.15 0.416 0.889 0.064 0.023 0.146 0.095 0.221 0.122

2.5 0.30 0.410 0.853 0.070 0.031 0.159 0.101 0.221 0.119

2.5 0.80 0.392 0.714 0.176 0.106 0.335 0.235 0.229 0.121

2.5 1.20 0.392 0.537 0.471 0.342 0.641 0.539 0.221 0.100

2.5 1.80 0.412 0.453 0.949 0.898 0.985 0.959 0.166 0.068

3.0 0.00 0.519 1.131 0.052 0.028 0.139 0.099 0.229 0.140

3.0 0.15 0.508 1.062 0.060 0.026 0.140 0.096 0.248 0.128

3.0 0.30 0.514 0.975 0.076 0.030 0.162 0.113 0.237 0.130

3.0 0.80 0.489 0.910 0.135 0.074 0.256 0.190 0.226 0.117

3.0 1.20 0.478 0.701 0.304 0.199 0.465 0.389 0.221 0.114

3.0 1.80 0.479 0.505 0.808 0.714 0.909 0.858 0.217 0.095

^a The average nominal standard error of the estimated mean and standard deviation of the random coefficient distribution.
^b Testing H0: σ_β = 0; one-tail critical values are 1.64 (10%) and 2.71 (5%).


Due to the different sizes of the three tests, power comparisons based on nominal critical values are invalid. We use the Monte Carlo percentile values for each combination of parameter mean and standard deviation as the critical values to correct the size of the three tests. Table 5 provides the size-corrected rejection rates for the three tests. The size-corrected rejection rates for the LR and Wald tests increase in the standard deviation of the coefficient distribution, as expected.
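The size-correction step itself is short: use the empirical (1 − α) percentile of the statistics simulated under the null as the critical value, then count rejections among the statistics simulated under the alternative. The sketch below is our illustration, not the authors' code, and the example draws are hypothetical.

```python
import numpy as np

def size_corrected_power(stats_null, stats_alt, level=0.10):
    """Size-corrected rejection rate for one test and one design point.

    stats_null: simulated statistics with sigma_beta = 0 (null true)
    stats_alt:  simulated statistics with sigma_beta > 0 (null false)
    """
    crit = np.quantile(stats_null, 1.0 - level)   # empirical critical value under the null
    return float(np.mean(stats_alt > crit))

# By construction the rejection rate under the null is close to the nominal level:
rng = np.random.default_rng(1)
stats_null = rng.chisquare(1, size=999)
stats_alt = rng.noncentral_chisquare(1, 3.0, size=999)
print(size_corrected_power(stats_null, stats_null))  # approximately 0.10
print(size_corrected_power(stats_null, stats_alt))   # size-corrected power
```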

Fig. 3. The Rejection Rate of LR, Wald, and LM Tests: One Random Parameter Model. Note: Testing H0: σ_β = 0; one-tail critical values are 1.64 (10%) and 2.71 (5%).


Based on these results, there is not much difference between these two size-corrected tests. The power of both tests declines with increases in the parameter mean. In our experiments, at the 10 and 5% significance levels, the LM test shows the lowest power for detecting the presence of the random coefficient among the three tests. The graphs in Fig. 4 are based on the results of Table 5. After adjusting the size of the test, the power of the LR test declines slowly in the parameter mean. The results on the power of these three tests are consistent with the results for the pretest estimators based on these three tests.

Table 5. Size-Corrected Rejection Rates of LR, Wald, and LM Tests: One Random Parameter Model.

β̄    σ̄_β    LR 10%    LR 5%    Wald 10%    Wald 5%    LM 10%    LM 5%

0.5 0.00 0.100 0.050 0.100 0.050 0.100 0.050

0.5 0.15 0.094 0.035 0.093 0.036 0.108 0.060

0.5 0.30 0.121 0.055 0.123 0.056 0.099 0.049

0.5 0.80 0.431 0.287 0.498 0.336 0.066 0.028

0.5 1.20 0.792 0.676 0.834 0.746 0.040 0.016

0.5 1.80 0.995 0.980 0.999 0.991 0.022 0.005

1.5 0.00 0.100 0.050 0.100 0.050 0.100 0.050

1.5 0.15 0.100 0.043 0.098 0.047 0.112 0.056

1.5 0.30 0.124 0.068 0.124 0.067 0.115 0.058

1.5 0.80 0.407 0.269 0.383 0.240 0.078 0.031

1.5 1.20 0.788 0.663 0.758 0.616 0.035 0.014

1.5 1.80 0.995 0.990 0.995 0.988 0.011 0.005

2.5 0.00 0.100 0.050 0.100 0.050 0.100 0.050

2.5 0.15 0.101 0.060 0.100 0.056 0.099 0.052

2.5 0.30 0.119 0.069 0.110 0.065 0.103 0.057

2.5 0.80 0.256 0.166 0.242 0.173 0.104 0.051

2.5 1.20 0.565 0.460 0.544 0.444 0.082 0.037

2.5 1.80 0.971 0.942 0.961 0.931 0.062 0.022

3.0 0.00 0.100 0.050 0.100 0.050 0.100 0.050

3.0 0.15 0.099 0.058 0.096 0.059 0.089 0.046

3.0 0.30 0.120 0.071 0.114 0.080 0.083 0.042

3.0 0.80 0.197 0.133 0.192 0.121 0.079 0.042

3.0 1.20 0.403 0.294 0.392 0.282 0.072 0.041

3.0 1.80 0.873 0.803 0.859 0.764 0.051 0.031

Note: Testing H0: σ_β = 0 using Monte Carlo percentile values as the critical values to adjust the size of the LR, Wald, and LM tests.


4.2. Two-Parameter Model Results

We expand the model to two parameters. The mean and standard deviation of the added random parameter β_2 are set to 1.5 and 0.8, respectively. We use four different values for the first parameter mean, β̄_1 = {0.5, 1.5, 2.5, 3.0}. For each value of the mean β̄_1, we use six different values for the

Fig. 4. The Size-Corrected Rejection Rates: One Random Parameter Model.


standard deviation, σ̄_β1 = {0, 0.15, 0.3, 0.8, 1.2, 1.8}. In the two-parameter model, the LR, Wald, and LM tests are constructed on the basis of the joint null hypothesis H0: σ_β1 = 0 and σ_β2 = 0 against the alternative hypothesis H1: σ_β1 > 0, or σ_β2 > 0, or both σ_β1 > 0 and σ_β2 > 0. Fig. 5 shows the ratios of the pretest

Fig. 5. Pretest Estimator RMSE(β̄) / Mixed Logit Estimator RMSE(β̄): Two Random Parameter Model.
\[
\mathrm{RMSE}(\beta) \;=\; \sqrt{\Big[\textstyle\sum\big(\hat\beta_1 - \bar\beta_1\big)^2 + \sum\big(\hat\beta_2 - \bar\beta_2\big)^2\Big]\big/\mathrm{NSAM}}, \qquad \mathrm{NSAM} = 999.
\]


estimator RMSE of β̄_1 and β̄_2 to the random parameters logit model estimator RMSE of β̄_1 and β̄_2 based on the joint LR, Wald, and LM tests at a 25% significance level. Here we use the standard chi-square critical value for the 25% significance level, 2.773. The joint LR and Wald tests show the properties of consistent tests. The maximum risk ratio based on the joint LR and Wald tests still increases in the parameter mean β̄_1. In the two-parameter model, the pretest estimators based on the joint LR and Wald statistics have larger RMSE than that of the random parameters logit model. The properties of the joint LM-based pretest estimator are also poor in the two-parameter model. Table 6 reports the 90th and 95th empirical percentiles of the joint LR, Wald, and LM tests. They differ across the combinations of means and standard deviations. All the empirical

Table 6. 90th and 95th Empirical Percentiles of Likelihood Ratio, Wald, and Lagrange Multiplier Tests: Two Random Parameter Model.

β̄_1    σ̄_β1    β̄_2    σ̄_β2    LR 90th    LR 95th    Wald 90th    Wald 95th    LM 90th    LM 95th

0.5 0.00 1.5 0.8 14.634 17.899 13.493 14.647 4.033 5.196

0.5 0.15 1.5 0.8 13.583 17.001 13.148 14.118 4.164 5.242

0.5 0.30 1.5 0.8 13.504 16.043 13.060 14.156 4.208 5.420

0.5 0.80 1.5 0.8 14.961 17.867 12.496 13.157 4.052 5.062

0.5 1.20 1.5 0.8 19.940 23.966 13.536 14.305 4.168 5.215

0.5 1.80 1.5 0.8 29.429 32.083 15.208 16.081 3.989 5.218

1.5 0.00 1.5 0.8 14.109 16.844 12.638 14.074 6.329 8.105

1.5 0.15 1.5 0.8 12.645 15.466 11.961 13.448 5.991 7.689

1.5 0.30 1.5 0.8 11.955 14.415 11.498 12.641 5.881 7.444

1.5 0.80 1.5 0.8 12.341 14.569 11.022 12.017 4.480 5.601

1.5 1.20 1.5 0.8 15.529 17.472 11.760 12.860 4.478 5.699

1.5 1.80 1.5 0.8 22.300 25.700 13.321 14.155 4.682 5.639

2.5 0.00 1.5 0.8 11.315 13.966 10.161 11.439 5.094 6.275

2.5 0.15 1.5 0.8 10.449 13.120 9.820 11.137 4.920 6.368

2.5 0.30 1.5 0.8 9.998 12.437 9.707 10.986 5.051 6.230

2.5 0.80 1.5 0.8 10.388 12.690 9.554 10.657 4.714 6.092

2.5 1.20 1.5 0.8 14.168 17.001 10.527 11.433 4.552 5.829

2.5 1.80 1.5 0.8 21.625 24.694 12.815 13.704 4.994 6.248

3.0 0.00 1.5 0.8 9.713 12.354 8.905 10.552 4.528 5.729

3.0 0.15 1.5 0.8 9.185 11.450 8.493 10.215 4.434 5.923

3.0 0.30 1.5 0.8 8.384 10.388 8.262 9.7540 4.245 5.418

3.0 0.80 1.5 0.8 8.219 10.083 8.499 10.010 4.486 5.716

3.0 1.20 1.5 0.8 13.704 15.917 10.058 10.967 4.972 6.353

3.0 1.80 1.5 0.8 20.939 23.476 12.454 13.282 5.273 6.544


90th and 95th percentile values of the joint LR and Wald tests are much greater than the corresponding standard chi-square critical values 4.605 and 5.991. The Monte Carlo empirical percentiles of the joint LM test are also not close to the standard chi-square values. Since the weighted chi-square critical values are even smaller than the standard chi-square ones, we choose the standard ones to obtain the rejection rates of the three tests. These results change very little when the number of Monte Carlo samples is increased to 9,999. Table 7 shows the rejection rates of the three joint tests based on the standard chi-square critical values for the 10 and 5% significance levels. The results are consistent with Table 6. When the null hypothesis is true, the joint LR and Wald tests reject the true null hypothesis more frequently than the nominal rejection rates of 10 and 5%. They become closer to the nominal rejection rates with

Table 7. Rejection Rate of Likelihood Ratio, Wald, and Lagrange Multiplier Tests: Two Random Parameter Model.

β̄_1    σ̄_β1    β̄_2    σ̄_β2    LR 10%    LR 5%    Wald 10%    Wald 5%    LM 10%    LM 5%

0.5 0.00 1.5 0.8 0.719 0.594 0.890 0.824 0.069 0.031

0.5 0.15 1.5 0.8 0.681 0.563 0.880 0.781 0.077 0.033

0.5 0.30 1.5 0.8 0.668 0.534 0.876 0.779 0.083 0.031

0.5 0.80 1.5 0.8 0.749 0.631 0.920 0.823 0.070 0.032

0.5 1.20 1.5 0.8 0.949 0.892 0.985 0.960 0.077 0.033

0.5 1.80 1.5 0.8 0.999 0.992 1.000 0.997 0.077 0.030

1.5 0.00 1.5 0.8 0.600 0.476 0.762 0.632 0.205 0.114

1.5 0.15 1.5 0.8 0.563 0.430 0.728 0.603 0.191 0.099

1.5 0.30 1.5 0.8 0.520 0.386 0.705 0.561 0.176 0.096

1.5 0.80 1.5 0.8 0.620 0.482 0.783 0.640 0.092 0.039

1.5 1.20 1.5 0.8 0.796 0.672 0.914 0.816 0.093 0.035

1.5 1.80 1.5 0.8 0.969 0.936 0.995 0.980 0.105 0.035

2.5 0.00 1.5 0.8 0.492 0.381 0.631 0.462 0.127 0.059

2.5 0.15 1.5 0.8 0.451 0.327 0.589 0.427 0.116 0.059

2.5 0.30 1.5 0.8 0.388 0.284 0.540 0.352 0.126 0.061

2.5 0.80 1.5 0.8 0.428 0.317 0.576 0.429 0.105 0.053

2.5 1.20 1.5 0.8 0.755 0.628 0.837 0.718 0.097 0.044

2.5 1.80 1.5 0.8 0.963 0.928 0.982 0.954 0.131 0.058

3.0 0.00 1.5 0.8 0.411 0.291 0.502 0.332 0.094 0.039

3.0 0.15 1.5 0.8 0.374 0.253 0.481 0.293 0.092 0.048

3.0 0.30 1.5 0.8 0.333 0.223 0.436 0.272 0.078 0.032

3.0 0.80 1.5 0.8 0.284 0.188 0.400 0.244 0.088 0.042

3.0 1.20 1.5 0.8 0.623 0.528 0.747 0.608 0.119 0.059

3.0 1.80 1.5 0.8 0.965 0.913 0.982 0.939 0.137 0.067


increases in the parameter mean β̄_1. When β̄_1 = 0.5 and 3.0, the joint LM test rejects the true null hypothesis less often than the nominal rejection rates. However, with β̄_1 = 1.5 and 2.5, it rejects more frequently than the nominal rejection rates of 10 and 5%. Fig. 6 shows the graphs based on the results of Table 7. They have almost the same trends as in the one-parameter

Fig. 6. The Rejection Rate of LR, Wald, and LM Tests: Two Random Parameter Model.


case. The rejection frequency of the joint LR and Wald statistics decreases in the mean of the parameter distribution β̄_1.

To compare the power of the three joint tests in the two-parameter case, we also correct the size of the three joint tests using the Monte Carlo empirical critical values for the 10 and 5% significance levels. Table 8 provides the size-corrected rejection rates for the three joint tests. Fig. 7 presents the graphs based on Table 8. As in the one-parameter case, the joint LM test shows the weakest power for detecting the presence of the random coefficient. The power of the joint LR and Wald tests decreases when the parameter mean β̄_1 increases from 0.5 to 1.5. However, the power of these two joint tests increases when the parameter mean β̄_1 increases further to 3.0.

Table 8. Size-Corrected Rejection Rates of LR, Wald, and LM Tests: Two Random Parameter Model.

β̄_1    σ̄_β1    β̄_2    σ̄_β2    LR 10%    LR 5%    Wald 10%    Wald 5%    LM 10%    LM 5%

0.5 0.00 1.5 0.8 0.100 0.050 0.100 0.050 0.100 0.050

0.5 0.15 1.5 0.8 0.076 0.034 0.077 0.032 0.110 0.053

0.5 0.30 1.5 0.8 0.079 0.037 0.077 0.031 0.113 0.060

0.5 0.80 1.5 0.8 0.108 0.049 0.036 0.010 0.100 0.046

0.5 1.20 1.5 0.8 0.344 0.185 0.105 0.040 0.106 0.050

0.5 1.80 1.5 0.8 0.788 0.618 0.318 0.148 0.099 0.050

1.5 0.00 1.5 0.8 0.100 0.050 0.100 0.050 0.100 0.050

1.5 0.15 1.5 0.8 0.065 0.036 0.074 0.032 0.088 0.040

1.5 0.30 1.5 0.8 0.054 0.028 0.050 0.013 0.086 0.035

1.5 0.80 1.5 0.8 0.059 0.021 0.034 0.010 0.034 0.008

1.5 1.20 1.5 0.8 0.145 0.060 0.058 0.011 0.028 0.010

1.5 1.80 1.5 0.8 0.446 0.287 0.163 0.052 0.028 0.007

2.5 0.00 1.5 0.8 0.100 0.050 0.100 0.050 0.100 0.050

2.5 0.15 1.5 0.8 0.074 0.035 0.086 0.042 0.090 0.053

2.5 0.30 1.5 0.8 0.071 0.027 0.080 0.042 0.098 0.048

2.5 0.80 1.5 0.8 0.083 0.029 0.066 0.027 0.075 0.040

2.5 1.20 1.5 0.8 0.214 0.105 0.127 0.049 0.076 0.040

2.5 1.80 1.5 0.8 0.609 0.422 0.447 0.235 0.092 0.049

3.0 0.00 1.5 0.8 0.100 0.050 0.100 0.050 0.100 0.050

3.0 0.15 1.5 0.8 0.088 0.036 0.085 0.044 0.094 0.056

3.0 0.30 1.5 0.8 0.069 0.024 0.077 0.027 0.081 0.041

3.0 0.80 1.5 0.8 0.056 0.023 0.081 0.035 0.094 0.048

3.0 1.20 1.5 0.8 0.275 0.151 0.229 0.064 0.124 0.064

3.0 1.80 1.5 0.8 0.720 0.535 0.547 0.302 0.145 0.080


5. CONCLUSIONS AND DISCUSSION

Our first finding confirmed earlier research by Bhat (2001) that, using standard metrics, there is no reason to use pseudo-random sampling to compute the required integral for maximum simulated likelihood estimation

Fig. 7. The Size-Corrected Rejection Rates: Two Random Parameter Model.


of the mixed logit model. It appears that estimation is as accurate with 100 Halton draws as with 1,000 pseudo-random draws.
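For readers who want to experiment with the quasi-random alternative, the following sketch (ours; see Halton, 1960, Bhat, 2001, 2003, and Train, 2003 for the method and refinements such as scrambling) builds a one-dimensional Halton sequence in a prime base and maps it into standard normal draws of the kind used in a simulated log-likelihood. The burn-in length and base are illustrative choices.

```python
import numpy as np
from scipy.stats import norm

def halton(n, base=2, skip=10):
    """First n Halton points in the given prime base, after discarding an initial burn-in."""
    points = np.empty(n)
    for k in range(n):
        i, f, x = k + 1 + skip, 1.0, 0.0
        while i > 0:                 # radical-inverse construction
            f /= base
            x += f * (i % base)
            i //= base
        points[k] = x
    return points

# 100 quasi-random standard normal draws, e.g. for one random coefficient:
normal_draws = norm.ppf(halton(100, base=3))
```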

There are two major findings regarding testing for the presence of random parameters from our Monte Carlo experiments, neither of which we anticipated. First, the LM test should not be used in the random parameters logit model to test the null hypothesis that the parameters are randomly distributed across the population, rather than being fixed population parameters. In the one-parameter model Monte Carlo experiment, the size of the LM test is approximately double the nominal level of Type I error. Moreover, the rejection rate decreases as the degree of the specification error rises, which is in direct contrast to the properties of a consistent test. This is the most troubling and disappointing finding, as the LM test is completed in a fraction of a second, while the LR and Wald tests, which require estimation of the mixed logit model, are time consuming even with a limited number of Halton draws. This outcome resulted despite our use of the now well-established adjusted chi-square critical value for one-tail tests on the boundary of a parameter space. This outcome is also not due to programming errors on our part, as our GAUSS code produces estimates and LM test statistic values that are the same, allowing for convergence criteria differences, as those produced by NLOGIT 4.0. In the one-parameter problem the LR test had size close to the nominal level, while the Wald test rejected the true null hypothesis at about twice the nominal level.

Our second finding is that LR and Wald test performance depends on the ''signal-to-noise'' ratio, that is, the ratio of the mean of the random parameter distribution to its standard deviation. When this ratio is larger, the LR and Wald tests reject the null hypothesis that the parameter is fixed rather than random less frequently. Upon reflection, this makes perfect sense. When the parameter mean is large relative to its standard deviation, the tests have less ability to distinguish between random and fixed parameters. The ''skinny'' density function of the population parameter looks like a ''spike'' to the data. When the ratio of the mean of the random parameter distribution to its standard deviation is large, it matters less whether one chooses the conditional logit or the mixed logit model from the point of view of estimating the population-mean parameter. This shows up in lower size-corrected power for the LR and Wald tests when signal is large relative to noise. It also shows up in the risk of the pretest estimator relative to that of the mixed logit estimator. For the portion of the parameter space where the relative risk is greater than one, as the signal increases relative to noise the relative risk function increases, indicating that pretesting is a less preferred strategy.


In the one-parameter case, the LR test is preferred overall. For the cases in which the signal-to-noise ratio is not large, the empirical critical values under the null are at least somewhat close to the one-tail critical values 1.64 (10%) and 2.71 (5%) from the mixture of chi-square distributions. When the signal-to-noise ratio increases, the similarity between the theoretically justified critical values and the test statistic percentiles becomes less clear. The Wald test statistic percentiles are not as close to the theoretically correct values as those of the LR test statistic. The LM test statistic percentiles under the null lie between those of the LR and Wald test statistic distributions, but they are not encouragingly close to the theoretically correct values.

In the two random parameter case, we vary the value of one standard deviation parameter, starting from 0, while keeping the other standard deviation parameter fixed at a nonzero value. Thus, we do not observe the rejection rates of the test statistics under the null that both are zero. We observe, however, that the empirical percentiles of the LR and Wald test statistics when one standard deviation is zero are far greater than the χ²(2) percentile values 4.605 (10%) and 5.991 (5%). The rejection rates of these two tests, under the null, using these two critical values are greater than 60% when the signal-to-noise ratio is lower, and fall to a rejection rate of more than 30% when the signal-to-noise ratio is larger. Once again the rejection rate profile of the LM test is flat, indicating that it is not more likely to reject the null hypothesis at larger parameter standard deviation values. The ''size-corrected'' rejection rates are not strictly correct. In them we observe that the LR and Wald tests reject at a higher rate at higher signal-to-noise ratios. Further, in the two-parameter case the relative risks of the pretest estimators based on the LR and Wald test statistics are always greater than one. The pretesting strategy is not to be recommended under our Monte Carlo design.

Interesting questions arising from the Monte Carlo experiment results are: (1) why does the power of the LR and Wald tests for the presence of the random coefficient decline in the parameter mean, and (2) how can we refine the LM test in the setting of the random parameters logit model? The LM test was developed by Aitchison and Silvey (1958) and Silvey (1959) in association with the constrained estimation problem. In our setting, the Lagrangian function is

\[
\ln L(\theta) + \lambda'\big(c(\theta) - q\big)
\]


where ln L(θ) is the log-likelihood function, which is maximized subject to the constraints c(θ) − q = 0. The related first-order conditions are

\[
\left\{
\begin{aligned}
&\frac{\partial \ln L(\theta)}{\partial \theta} + \frac{\partial c'(\theta)}{\partial \theta}\,\lambda = 0,\\
&c(\theta) - q = 0.
\end{aligned}
\right.
\]

Under the standard assumptions of the LM test, we know that
\[
\sqrt{n}\,\big(\hat\theta - \theta\big) \sim N\big(0,\; I(\theta)^{-1}\big)
\]

and

\[
n^{-1/2}\lambda \sim N\!\left(0,\; \left[\frac{\partial c(\theta)}{\partial \theta'}\, I(\theta)^{-1}\, \frac{\partial c'(\theta)}{\partial \theta}\right]^{-1}\right)
\]

Based on the first-order conditions of the Lagrangian function, we have

\[
\lambda'\,\frac{\partial c(\theta)}{\partial \theta'}\, I(\theta)^{-1}\, \frac{\partial c'(\theta)}{\partial \theta}\,\lambda
\;=\;
\frac{\partial \ln L(\theta)}{\partial \theta'}\, I(\theta)^{-1}\, \frac{\partial \ln L(\theta)}{\partial \theta}
\]

From the above results, the LM statistic has an asymptotic chi-square distribution. The asymptotic distribution of the LM statistic is derived from the distribution of the Lagrange multipliers, which is essentially based on the asymptotic normality of the score vector. In the Lagrangian function, the log-likelihood function is subject to equality constraints. The weak power of the LM test for the presence of the random coefficient is caused by the failure to take into account the properties of the one-tailed alternative hypothesis. Gourieroux, Holly, and Monfort (1982) and Gourieroux and Monfort (1995) extended the LM test to a Kuhn–Tucker multiplier test and showed that it is asymptotically equivalent to the LR and Wald tests. However, computing the Kuhn–Tucker multiplier test is complicated. In the Kuhn–Tucker multiplier test, a duality problem replaces the two optimization problems with inequality and equality constraints, and it is shown as follows:

\[
\min_{\lambda}\;\frac{1}{n}\,\big(\lambda - \lambda^0\big)'\,\frac{\partial g(\theta^0)}{\partial \theta'}\, I(\theta^0)^{-1}\, \frac{\partial g'(\theta^0)}{\partial \theta}\,\big(\lambda - \lambda^0\big), \qquad \text{subject to } \lambda \ge 0,
\]

where θ^0 and λ^0 are the equality-constrained estimators. Compared to the standard LM test, the Kuhn–Tucker multiplier test uses (λ − λ^0) to adjust the estimated Lagrange multipliers λ^0. How to refine the LM test in the random parameters logit model is left for future research.

ACKNOWLEDGMENT

We thank Bill Greene, Tom Fomby, and two referees for their comments. All errors are ours.

REFERENCES

Aitchison, J., & Silvey, S. D. (1958). Maximum likelihood estimation of parameters subject to restraints. Annals of Mathematical Statistics, 29, 813–828.

Andrews, D. W. K. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69(3), 683–734.

Bhat, C. R. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35, 677–693.

Bhat, C. R. (2003). Simulation estimation of mixed discrete choice models using randomized and scrambled Halton sequences. Transportation Research Part B, 37(9), 837–855.

Bratley, P., Fox, B. L., & Niederreiter, H. (1992). Implementation and tests of low-discrepancy sequences. ACM Transactions on Modeling and Computer Simulation, 2, 195–213.

Fang, K., & Wang, Y. (1994). Number-theoretic methods in statistics. London: Chapman and Hall/CRC.

Gourieroux, C., Holly, A., & Monfort, A. (1982). Likelihood ratio test, Wald test, and Kuhn–Tucker test in linear models with inequality constraints on the regression parameters. Econometrica, 50(1), 63–88.

Gourieroux, C., & Monfort, A. (1995). Statistics and econometric models. Cambridge: Cambridge University Press.

Halton, J. H. (1960). On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numerische Mathematik, 2, 84–90.

Hensher, D., & Greene, W. (2003). The mixed logit model: The state of practice. Transportation, 30(2), 133–176.

Morokoff, W. J., & Caflisch, R. E. C. (1995). Quasi-Monte Carlo integration. Journal of Computational Physics, 122, 218–230.

Niederreiter, H. (1992). Random number generation and quasi-Monte Carlo methods. Philadelphia: Society for Industrial and Applied Mathematics.

Silvey, S. D. (1959). The Lagrangian multiplier test. Annals of Mathematical Statistics, 30, 389–407.

Train, K. E. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.


SIMULATED MAXIMUM LIKELIHOOD ESTIMATION OF CONTINUOUS TIME STOCHASTIC VOLATILITY MODELS*

Tore Selland Kleppe, Jun Yu and H. J. Skaug

ABSTRACT

In this chapter we develop and implement a method for maximum simulated likelihood estimation of the continuous time stochastic volatility model with constant elasticity of volatility. The approach does not require observations on option prices, nor on volatility. To integrate out latent volatility from the joint density of return and volatility, a modified efficient importance sampling technique is used after the continuous time model is approximated using the Euler–Maruyama scheme. The Monte Carlo studies show that the method works well, and the empirical applications illustrate the usefulness of the method. Empirical results provide strong evidence against the Heston model.

*Kleppe gratefully acknowledges the hospitality during his research visit to the Sim Kee Boon Institute for Financial Economics at Singapore Management University. Yu gratefully acknowledges support from the Singapore Ministry of Education AcRF Tier 2 fund under Grant No. T206B4301-RS. We wish to thank two anonymous referees for their helpful comments.

Maximum Simulated Likelihood Methods and Applications

Advances in Econometrics, Volume 26, 137–161

Copyright © 2010 by Emerald Group Publishing Limited

All rights of reproduction in any form reserved

ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)0000026009


1. INTRODUCTION

Continuous time stochastic volatility (SV) models have proven to be very useful from various perspectives. For example, it has been found that SV models provide a valuable tool for pricing contingent claims. Seminal contributions include Wiggins (1987), Hull and White (1987), Heston (1993), and Duffie, Pan, and Singleton (2000). See Bakshi, Cao, and Chen (1997) for an empirical analysis of SV models. As another example, SV models have proved successful in describing the time series behavior of financial variables. Important contributions include Andersen and Lund (1997) and Jones (2003). Unfortunately, maximum likelihood estimation (MLE) of continuous time SV models poses substantial challenges. The first challenge lies in the fact that the joint transition density of price (or return) and volatility is typically unknown in closed form. This is a well-known problem in the continuous time literature (see Aït-Sahalia, 2002; Phillips & Yu, 2009a). The second challenge is that when only the time series of spot prices is observed, volatility has to be integrated out of the joint transition density. Such integrals are analytically unknown and have to be calculated numerically. The dimension of integration is the same as the number of observations. When the number of observations is large, which is typical in practical applications, the numerical integration is difficult.

In recent years, solutions have been provided to navigate these challenges. To deal with the second challenge, for example, Jones (2003) and Aït-Sahalia and Kimmel (2007) proposed to estimate the model using data from both the underlying spot and options markets. Option price data are used to extract volatility, making the integration of volatility out of the joint transition density unnecessary. To deal with the first challenge, Jones (2003) suggested using in-filled Euler–Maruyama (EM) approximations that enable a Gaussian approximation to the joint transition density, whereas Aït-Sahalia and Kimmel (2007) advocated using a closed form polynomial approximation that can approximate the true joint transition density arbitrarily well. With the two problems circumvented, full likelihood-based inference is possible. For example, the method of Aït-Sahalia and Kimmel (2007) is frequentist, whereas the method of Jones (2003) is Bayesian.

It is well known that option prices are derived under the risk-neutral measure. Consequently, a benefit of using data from both spot and options markets jointly is that one can learn about the physical as well as the risk-neutral measure. However, this benefit comes at a cost. To connect the physical and risk-neutral measures, the functional form of the market price of risk has to be specified.


In this chapter, we develop and implement a method for maximum simulated likelihood estimation of the continuous time SV model with constant elasticity of volatility (CEV-SV). The approach does not require observations of option prices or volatility, and hence there is no need to specify the functional form of the market price of risk. As a result, we only learn about the physical measure. The CEV-SV model was first proposed by Jones (2003) as a simple way to nest some standard continuous time SV models, such as the square root SV model of Heston (1993) and the GARCH diffusion model of Nelson (1990). To our knowledge, the present chapter constitutes the first time ML is used to estimate the CEV-SV model using the spot price only.

To deal with the second challenge, we propose to use a modified efficient importance sampling (EIS) algorithm, originally developed in Richard and Zhang (2007), to integrate out the latent volatility process. To deal with the first challenge, we consider the EM approximation. We examine the performance of the proposed maximum simulated likelihood method using both simulated data and real data. Based on the simulation results, we find that the algorithm performs well. The empirical illustration suggests that the Heston square root SV model does not fit the data well. This empirical finding reinforces those of Andersen, Benzoni, and Lund (2002), Jones (2003), Aït-Sahalia and Kimmel (2007), and others.

This chapter is organized as follows. Section 2 discusses the model and introduces the estimation method. Section 3 tests the accuracy of the method by performing Monte Carlo (MC) simulations for the square root SV model of Heston (1993), the GARCH diffusion model of Nelson (1990), and the CEV-SV model. In Section 4, we apply the estimation method to real data for the three SV models and analyze and compare the empirical results. A comparison with the log-normal (LN) model is also performed. Section 5 concludes.

2. MODEL AND METHODOLOGY

This section first presents the CEV-SV model under consideration and then outlines the MC procedure used to do likelihood analysis when only the price process is observed.

2.1. SV Model with Constant Elasticity of Volatility

The continuous time CEV model was recently proposed to model stochastic volatility (see, e.g., Jones, 2003; Aït-Sahalia & Kimmel, 2007). Although we mainly focus on the CEV-SV model in this chapter, the proposed approach is applicable more generally.

Let s̄_t and v̄_t denote the log-price of some asset and its volatility, respectively, at time t. Then the CEV model is specified in terms of the Itô stochastic differential equation:

\[
\begin{bmatrix} d\bar s_t \\ d\bar v_t \end{bmatrix}
=
\begin{bmatrix} a + b\,\bar v_t \\ \alpha + \beta\,\bar v_t \end{bmatrix} dt
+
\begin{bmatrix} \sqrt{(1-\rho^2)\,\bar v_t} & \rho\sqrt{\bar v_t} \\ 0 & \sigma\,\bar v_t^{\gamma} \end{bmatrix}
\begin{bmatrix} dB_{t,1} \\ dB_{t,2} \end{bmatrix}. \qquad (1)
\]

Here B_{t,1} and B_{t,2} denote a pair of independent canonical Brownian motions. The parameters θ = [α, β, σ, ρ, γ, a, b] obey the restrictions α > 0, ρ ∈ (−1, 1), γ ≥ 1/2, and β < 0 whenever γ ≥ 1 (see Jones (2003) for a treatment of the volatility process for γ > 1). In addition, for γ = 1/2 we have the restriction 2α > σ² to ensure that v̄_t stays strictly positive (Cox, Ingersoll, & Ross, 1985). The CEV model nests the affine SV model of Heston (1993) (γ = 1/2) and the GARCH diffusion model of Nelson (1990) (γ = 1), and we will treat these special cases separately in addition to the full CEV model. Parameters α and β characterize the linear drift structure of the volatility, in which −α/β is the mean and −β captures the mean reversion rate. Parameter σ is the volatility-of-volatility and ρ represents the leverage effect. Parameter γ is the CEV elasticity. Parameters a and b represent, respectively, the long run drift and the risk premium of the price process.

2.2. A Change of Variable and Time Discretization

Under the parameter constraints described above, the volatility process v̄_t is strictly positive with probability one. The importance sampling procedure proposed here uses (locally) Gaussian importance densities for v̄_t, and thus the supports of v̄_t and the importance density are inherently conflicting. To remove this boundary restriction, we shall work with the logarithm of the volatility process. As the latent process gets integrated out, the actual representation of the volatility (or the log-volatility) is theoretically irrelevant, but it is very important for the construction of the EIS procedure, as will become clear below. In addition, this change of variable influences the properties of the time-discretization scheme that will be discussed shortly.


Define z̄_t = log(v̄_t). Then, by Itô's lemma, we have:

\[
\begin{bmatrix} d\bar s_t \\ d\bar z_t \end{bmatrix}
=
\begin{bmatrix} a + b\exp(\bar z_t) \\ M(\bar z_t) \end{bmatrix} dt
+
\begin{bmatrix} \sqrt{1-\rho^2}\,\exp(\bar z_t/2) & \rho\exp(\bar z_t/2) \\ 0 & \sigma\exp\big(\bar z_t(\gamma-1)\big) \end{bmatrix}
\begin{bmatrix} dB_{t,1} \\ dB_{t,2} \end{bmatrix}, \qquad (2)
\]

where M(z̄_t) = β + α exp(−z̄_t) − σ² exp(2 z̄_t (γ − 1))/2. Clearly, the law of s̄_t is unaltered, but the latent process z̄_t now has support over the whole real line. As the transition probability density (TPD) in the general case is not known under either representation (Eq. (1) or Eq. (2)), an approximation is needed. This is achieved by defining a discrete time model that acts as an approximation to Eq. (2), based on the EM scheme with a time step equal to Δ time units. This discrete time process is given by the nonlinear and heteroskedastic autoregression

\[
\begin{bmatrix} s_{i+1} \\ z_{i+1} \end{bmatrix}
=
\begin{bmatrix} s_i + \Delta\big(a + b\exp(z_i)\big) \\ z_i + \Delta M(z_i) \end{bmatrix}
+ \sqrt{\Delta}
\begin{bmatrix} \sqrt{1-\rho^2}\,\exp(z_i/2) & \rho\exp(z_i/2) \\ 0 & \sigma\exp\big(z_i(\gamma-1)\big) \end{bmatrix}
\begin{bmatrix} \epsilon_{i,1} \\ \epsilon_{i,2} \end{bmatrix},
\]

where [ε_{i,1}, ε_{i,2}] are temporally independent bivariate standard normal shocks. It is convenient to work with the log-returns of the price, so we define x_i = s_i − s_{i−1}, as this process is stationary. Hence, the discrete time dynamics for x_i are given by:

\[
\begin{bmatrix} x_{i+1} \\ z_{i+1} \end{bmatrix}
=
\begin{bmatrix} \Delta\big(a + b\exp(z_i)\big) \\ z_i + \Delta M(z_i) \end{bmatrix}
+ \sqrt{\Delta}
\begin{bmatrix} \sqrt{1-\rho^2}\,\exp(z_i/2) & \rho\exp(z_i/2) \\ 0 & \sigma\exp\big(z_i(\gamma-1)\big) \end{bmatrix}
\begin{bmatrix} \epsilon_{i,1} \\ \epsilon_{i,2} \end{bmatrix}. \qquad (3)
\]

Throughout the rest of this chapter, Eq. (3) will be the model that we work with.
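To make the discretized model concrete, the sketch below (ours, not the authors' FORTRAN90 code) simulates log-returns x_i and log-volatilities z_i from Eq. (3) for a parameter vector θ = (α, β, σ, ρ, γ, a, b); the example values are loosely based on Table 1 and the starting log-volatility is a hypothetical choice.

```python
import numpy as np

def simulate_eq3(theta, z0, n, delta=1 / 252, seed=0):
    """Simulate (x_i, z_i), i = 1..n, from the Euler-Maruyama model in Eq. (3)."""
    alpha, beta, sigma, rho, gamma, a, b = theta
    rng = np.random.default_rng(seed)
    x, z = np.empty(n), np.empty(n)
    z_prev = z0
    for i in range(n):
        e1, e2 = rng.standard_normal(2)
        v = np.exp(z_prev)                                               # volatility level
        M = beta + alpha / v - 0.5 * sigma**2 * v**(2 * (gamma - 1.0))   # drift of z
        x[i] = delta * (a + b * v) + np.sqrt(delta * v) * (np.sqrt(1 - rho**2) * e1 + rho * e2)
        z[i] = z_prev + delta * M + np.sqrt(delta) * sigma * v**(gamma - 1.0) * e2
        z_prev = z[i]
    return x, z

# Heston-type example (gamma = 1/2), daily observations over roughly eight years:
x, z = simulate_eq3((0.21, -7.77, 0.38, -0.32, 0.5, 0.06, 1.64), z0=np.log(0.03), n=2022)
```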

Several authors (see, e.g., Aït-Sahalia, 2002; Durham & Gallant, 2002; Durham, 2006) have argued that one should transform the latent process (instead of the log-transform applied here) in such a manner that it becomes a (nonlinear) Ornstein–Uhlenbeck process, i.e., one with a homoskedastic error term in the latent process. This variance stabilization transform is given


(see, e.g., Rao, 1999, p. 210), up to an affine transformation, by:
\[
Z(v) =
\begin{cases}
\log(v) & \text{if } \gamma = 1, \\[4pt]
\dfrac{v^{1-\gamma} - 1}{1-\gamma} & \text{otherwise}.
\end{cases}
\]

However, excluding γ = 1, the variance stabilizing transform does not completely solve the finite boundary problem on the domain of the transformed volatility Z(v̄_t). Thus, the Gaussian approximation obtained from the EM scheme will have a support that conflicts with the continuous time model. In Section 3, some MC experiments are conducted to verify that an approximate likelihood function based on the EM scheme (Eq. (3)) with observed log-volatility does not lead to unacceptable biases for sample sizes and time steps that are relevant in practice.

Another reason for using the variance stabilization procedure would be to bring the posterior density (i.e., the density of the latent process given the observed price returns and parameters) closer to a multivariate Gaussian, as in Durham (2006) in the context of an EM discretization of the Heston model. This should in theory pave the way for using a Laplace approximation-based importance density (see, e.g., Shephard & Pitt, 1997), i.e., a multivariate Gaussian, to calculate the marginal likelihood of the data. By using the EIS procedure outlined below, there is no need to bring the posterior density closer to normality globally, as our importance density is only locally Gaussian. Thus, the argument against using the logarithm for all γ does not apply here. We therefore conclude this discussion and use the logarithm throughout the rest of the chapter.

2.3. TPDs and Joint Densities

Assume that we have n observations of x_i, i.e., x = [x_1, ..., x_n], sampled discretely over a regular time grid with Δ time units between the time points. More general deterministic time grids are possible. Further, denote the unobserved vector of z_i's at the corresponding times as z = [z_1, ..., z_n]. For simplicity, we assume for now that z_0 is a known constant. The marginal distribution of z_0 is not known in closed form in the general case, and in practice we estimate z_0 along with the parameter vector θ by maximum likelihood.

Let f_i = f_i(z_i, x_i | z_{i−1}; θ, Δ) denote the Gaussian TPD of the discrete time process in Eq. (3). From the specification, it is evident that f_i is a bivariate Gaussian density with mean vector and covariance matrix
\[
\begin{bmatrix} \Delta\big(a + b\exp(z_{i-1})\big) \\ z_{i-1} + \Delta M(z_{i-1}) \end{bmatrix}
\quad\text{and}\quad
\Delta
\begin{bmatrix}
\exp(z_{i-1}) & \sigma\rho\exp\!\big(\tfrac{z_{i-1}}{2}(2\gamma-1)\big) \\[4pt]
\sigma\rho\exp\!\big(\tfrac{z_{i-1}}{2}(2\gamma-1)\big) & \sigma^{2}\exp\!\big(2z_{i-1}(\gamma-1)\big)
\end{bmatrix},
\]
respectively. Exploiting the Markov structure of the discretized model, the joint density of (z, x) is given by:
\[
p(z, x \mid \theta, z_0, \Delta) \;=\; \prod_{i=1}^{n} f_i(z_i, x_i \mid z_{i-1}; \theta, \Delta). \qquad (4)
\]

Clearly, this expression should also be regarded as an approximation to the continuous time joint density obtained when the f_i's are exchanged with the (unknown) exact transition densities. The approximation is known to converge strongly as Δ → 0 (Kloeden & Platen, 1999).
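Evaluating the log of the approximate joint density in Eq. (4) amounts to summing bivariate Gaussian log-densities along the path; the sketch below (our illustration, not the authors' code) uses the mean vector and covariance matrix displayed above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_joint_density(x, z, theta, z0, delta=1 / 252):
    """log p(z, x | theta, z0, Delta) from Eq. (4) under the EM approximation."""
    alpha, beta, sigma, rho, gamma, a, b = theta
    z_lag = np.concatenate(([z0], z[:-1]))
    total = 0.0
    for xi, zi, zl in zip(x, z, z_lag):
        v = np.exp(zl)
        M = beta + alpha / v - 0.5 * sigma**2 * v**(2 * (gamma - 1.0))
        mean = np.array([delta * (a + b * v), zl + delta * M])   # E[(x_i, z_i) | z_{i-1}]
        cov = delta * np.array([
            [v, sigma * rho * v**(gamma - 0.5)],
            [sigma * rho * v**(gamma - 0.5), sigma**2 * v**(2 * (gamma - 1.0))],
        ])
        total += multivariate_normal.logpdf([xi, zi], mean=mean, cov=cov)
    return total
```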

2.4. Monte Carlo Evaluation of the Marginal Likelihood

Since the log-volatility z is unobserved, approximate evaluation (based on the EM discretization) of the likelihood function for given values of θ and z_0 involves an integral over z, say

\[
l(\theta, z_0 \mid x) = \int p(z, x \mid \theta, z_0)\, dz.
\]

Due to the nonlinear structure of the discrete time model in Eq. (3), no closed form expression for this integral is available, and hence numerical methods are generally needed. Since the dimension of the integral is typically of the order of 1,000 to 10,000, quadrature rules are of little use here. Instead, we apply an importance sampling technique in which the importance density is constructed using the EIS algorithm of Richard and Zhang (2007).

The EIS algorithm (approximately) minimizes the MC variance within a parametric class of auxiliary importance densities, say m(z | a, x, z_0), indexed by the n × 2-dimensional parameter a. We denote the optimal choice of a by â. Further, we refer to m(z | 0, x, z_0) as the baseline importance density, where a = 0 denotes a with all elements equal to zero.


In this work, we restrict the importance densities to have the form:

\[
m(z \mid a, x, z_0) = \prod_{i=1}^{n} m_i(z_i \mid z_{i-1}, x_i; a_i).
\]

Note that we allow the importance density to depend explicitly on the observed vector x. The weak law of large numbers, as S → ∞, suggests that l(θ, z_0 | x) may be approximated by

\[
\tilde l(\theta, z_0 \mid x, a) = \frac{1}{S} \sum_{j=1}^{S} \frac{p\big(\tilde z^{(j)}, x \mid \theta, z_0\big)}{m\big(\tilde z^{(j)} \mid a, x, z_0\big)}, \qquad (5)
\]

where z̃^(j), j = 1, ..., S, are drawn from m(z | a, x, z_0). This law of large numbers applies in particular to the â obtained using the EIS algorithm, so that the variance of l̃ is approximately minimized. Thus, the approximate MLE will have the form

\[
(\hat\theta, \hat z_0) = \operatorname*{arg\,max}_{(\theta, z_0)} \log \tilde l(\theta, z_0 \mid x, \hat a), \qquad (6)
\]

where the logarithm is taken for numerical convenience.

2.4.1. The Baseline Importance Density

Typically, m(z | a, x, z_0) is taken to be a parametric extension of the so-called natural sampler (i.e., p(z | θ, z_0)) (see, e.g., Liesenfeld & Richard, 2003; Liesenfeld & Richard, 2006; Richard & Zhang, 2007; Bauwens & Galli, 2009). In this chapter, we depart from this practice by introducing information from the data into the baseline importance density. More precisely, we define

\[
f_i(z_i \mid z_{i-1}, x_i; \theta, \Delta) = \frac{f_i(z_i, x_i \mid z_{i-1}; \theta, \Delta)}{\int f_i(z_i, x_i \mid z_{i-1}; \theta, \Delta)\, dz_i},
\]

i.e., the conditional transition densities given x_i, and set m_i(z_i | z_{i−1}, x_i; 0_i) = f_i(z_i | z_{i−1}, x_i; θ, Δ). Since f_i(z_i, x_i | z_{i−1}; θ, Δ) is a bivariate Gaussian density, f_i(z_i | x_i, z_{i−1}; θ, Δ) is also Gaussian, with mean and standard deviation given by:

\[
\mu^0_i(z_{i-1}, x_i) = z_{i-1} + \Delta M(z_{i-1}) + \sigma\rho\big(x_i - \Delta(a + b\exp(z_{i-1}))\big)\exp\!\Big(z_{i-1}\big(\gamma - \tfrac{3}{2}\big)\Big),
\]


\[
\Sigma^0_i(z_{i-1}) = \sigma\sqrt{\Delta(1-\rho^2)}\,\exp\big(z_{i-1}(\gamma-1)\big),
\]

respectively.

2.4.2. The Parametrically Extended Importance Density

The baseline importance density is in itself a valid, but not very efficient, importance density. Consequently, we parametrically extend it to introduce more flexibility. Following Richard and Zhang (2007), each factor of the baseline importance density is (conditionally on z_{i−1}) perturbed within the univariate Gaussian family of distributions. Staying within the Gaussian family is numerically advantageous because sampling from m then becomes fast and conceptually simple. More precisely, the extension is done by multiplying m_i(z_i | z_{i−1}, x_i; 0_i) by exp(a_{i,1} z_i + a_{i,2} z_i²) and compensating with the appropriate normalization factor. More precisely, we write m_i as

\[
m_i(z_i \mid z_{i-1}, x_i; a_i) = \frac{B_i(z_i \mid z_{i-1}, x_i)\,\zeta_i(z_i; a_i)}{\chi_i(z_{i-1}, x_i; a_i)},
\]

where

\[
\log B_i(z_i \mid z_{i-1}, x_i) = -\frac{\big(z_i - \mu^0_i(z_{i-1}, x_i)\big)^2}{2\,\Sigma^0_i(z_{i-1})^2},
\qquad
\log \zeta_i(z_i; a_i) = a_{i,1} z_i + a_{i,2} z_i^2,
\]
\[
\chi_i(z_{i-1}, x_i; a_i) = \int B_i(z_i \mid z_{i-1}, x_i)\,\zeta_i(z_i; a_i)\, dz_i.
\]

An explicit expression for χ_i is given in Appendix A. The mean and standard deviation of m_i(z_i | z_{i−1}, x_i; a_i) that are used when sampling from m(z | a, x, z_0) have the form:

\[
\mu^{a}_i(z_{i-1}, x_i) = \frac{\mu^0_i(z_{i-1}, x_i) + a_{i,1}\,\Sigma^0_i(z_{i-1})^2}{1 - 2a_{i,2}\,\Sigma^0_i(z_{i-1})^2}, \qquad (7)
\]
\[
\Sigma^{a}_i(z_{i-1}) = \frac{\Sigma^0_i(z_{i-1})}{\sqrt{1 - 2a_{i,2}\,\Sigma^0_i(z_{i-1})^2}}. \qquad (8)
\]

For each m_i to have finite variance, it is clear from Eq. (8) that a_{i,2} must satisfy the restriction a_{i,2} < 1/(2 Σ^0_i(z_{i−1})²).


2.4.3. The EIS Regressions

The final piece of notation introduced is the fraction

\[
\xi_i(z_{i-1}, x_i) = \frac{f_i(z_i, x_i \mid z_{i-1}; \theta, \Delta)}{B_i(z_i \mid z_{i-1}, x_i)}.
\]

As B_i has the shape of the conditional density f_i(z_i | z_{i−1}, x_i; θ, Δ), ξ_i(z_{i−1}, x_i) is constant as a function of z_i. The expression for ξ_i is given in Appendix A.

Using this notation, we have

\[
\frac{p(z, x \mid \theta, z_0)}{m(z \mid a, x, z_0)}
= \prod_{i=1}^{n} \frac{f_i(z_i, x_i \mid z_{i-1}; \theta)}{m_i(z_i \mid z_{i-1}, x_i; a_i)}
= \prod_{i=1}^{n} \frac{\xi_i(z_{i-1}, x_i)\,\chi_i(z_{i-1}, x_i; a_i)}{\zeta_i(z_i; a_i)}
\]
\[
= \xi_1(z_0, x_1)\,\chi_1(z_0, x_1; a_1)
\left[\prod_{i=1}^{n-1} \frac{\xi_{i+1}(z_i, x_{i+1})\,\chi_{i+1}(z_i, x_{i+1}; a_{i+1})}{\zeta_i(z_i; a_i)}\right]
\frac{1}{\zeta_n(z_n; a_n)}. \qquad (9)
\]

This last representation enables us to work out how the parameter a should be chosen to minimize the MC variance using EIS-type regressions. First, we set a_n = [0, 0] so that the last fraction is equal to 1 for all z_n. In fact, setting a_n to zero effectively integrates out z_n analytically, and thus for n = 1 the procedure is exact. Second, under the assumption that z_0 is non-stochastic, ξ_1 χ_1 is also constant for fixed values of z_0 and does not add to the variance of the importance sampling procedure.

Finally, we notice that the log-variation (as a function of z) of each of the factors in the bracketed product of Eq. (9) depends only on a single z_i. This gives rise to a recursive set of EIS ordinary least squares regressions of the form

\[
\log \xi_{i+1}\big(\tilde z^{(j)}_i, x_{i+1}\big) + \log \chi_{i+1}\big(\tilde z^{(j)}_i, x_{i+1}; a_{i+1}\big)
= c_i + \log \zeta_i\big(\tilde z^{(j)}_i; a_i\big) + \eta^{(j)}_i
= c_i + a_{i,1}\tilde z^{(j)}_i + a_{i,2}\big(\tilde z^{(j)}_i\big)^2 + \eta^{(j)}_i,
\]
\[
i = 1, \dots, n-1; \quad j = 1, \dots, S, \qquad (10)
\]

where η^(j)_i are the regression residuals. The constant term c_i is estimated jointly with a_i. In particular, we notice that the regressions are linear in a_i, suggesting that computationally efficient linear least squares algorithms may be applied. Note also that the EIS regressions need to be calculated backwards in time, as the ith regression depends on a_{i+1}.

The MC variance of Eq. (5), represented by η^(j)_i, stems from the fact that the left-hand side of Eq. (10) is nonquadratic (in z_i) and thus deviates from the quadratic model represented by log ζ_i. Still, since the z̃^(j)_i, j = 1, ..., S, are typically strongly located by the information provided by the baseline density, the quadratic approximation works reasonably well.

A fortunate by-product of the EIS regressions is that the log-weights in the likelihood estimate of Eq. (5) are directly expressible in terms of the regression residuals. More precisely, Eq. (9) provides us with the following expression:

\[
\log \frac{p\big(\tilde z^{(j)}, x \mid \theta, z_0\big)}{m\big(\tilde z^{(j)} \mid \hat a, x, z_0\big)}
= \log(\xi_1 \chi_1) + \sum_{i=1}^{n-1} \big[c_i + \eta^{(j)}_i\big], \qquad j = 1, \dots, S, \qquad (11)
\]

provided that we have set a_n to zero. Thus, the estimate of the likelihood function can be calculated with very little effort once the relevant quantities in the regression models have been computed.
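Each EIS regression in Eq. (10) is an ordinary least squares fit on the design matrix [1, z̃_i, z̃_i²]. The sketch below is ours; the regressand log_xi_chi_next stands for the left-hand side of Eq. (10), whose closed form comes from the expressions in Appendix A (not reproduced here).

```python
import numpy as np

def eis_regression(z_draws_i, log_xi_chi_next):
    """One EIS least squares regression (Eq. (10)) at time i.

    z_draws_i:        length-S array of draws z_i^(j) from the current importance density
    log_xi_chi_next:  length-S array of log xi_{i+1} + log chi_{i+1} evaluated at those draws
    Returns the coefficients [c_i, a_{i,1}, a_{i,2}] and the residuals eta_i^(j).
    """
    design = np.column_stack([np.ones_like(z_draws_i), z_draws_i, z_draws_i**2])
    coef, *_ = np.linalg.lstsq(design, log_xi_chi_next, rcond=None)
    residuals = log_xi_chi_next - design @ coef
    return coef, residuals
```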

2.4.4. Iterative EIS and Implementation

Since the z̃^(j)'s depend on a, and a_{i+1} needs to be known to calculate a_i, we may treat the regressions (Eq. (10)) as a fixed point condition for a, toward which we generate a convergent sequence of a^(k) for integers k. This is done using the following steps (a schematic code sketch follows the list):

1. Set a^(0) = 0 and k = 0, and let w ∈ R^{n×S} denote a matrix filled with independent standard normal variates.

2. Simulate the paths z̃^(j) = z̃^(j)(a^(k)), j = 1, ..., S, forward in time (i.e., for i = 1 → n − 1) using
\[
\tilde z^{(j)}_i = \mu^{a^{(k)}}_i\big(\tilde z^{(j)}_{i-1}, x_i\big) + \Sigma^{a^{(k)}}_i\big(\tilde z^{(j)}_{i-1}\big)\, w_{i,j}, \qquad j = 1, \dots, S;\; i = 1, \dots, n-1,
\]
where for simplicity we define z̃^(j)_0 = z_0.

3. Calculate a^(k+1)_i backwards in time (i.e., for i = n − 1 → 1) by estimating the regression models (Eq. (10)) based on a^(k+1)_{i+1} in χ_{i+1} and the paths z̃^(j)(a^(k)).

4. Calculate the logarithm of the likelihood estimate (Eq. (9)) using the quantities computed for the regressions in step 3, and stop the iteration if this estimate has converged to the desired precision.

5. Set k ← k + 1 and return to step 2.
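Putting the pieces together, a schematic version of the fixed-point iteration is sketched below. It is our illustration rather than the authors' FORTRAN90 implementation: it reuses the helper functions sketched earlier (baseline_moments, tilted_moments, eis_regression), and the two functions log_xi_chi_term and log_xi1_chi1, which evaluate the log ξ and log χ quantities of Eqs. (9) to (11) from the Appendix A expressions, are hypothetical placeholders.

```python
import numpy as np

def eis_loglik(x, theta, z0, w, max_iter=40, tol=1e-9, delta=1 / 252):
    """Fixed-point EIS iteration (steps 1-5), returning the simulated log-likelihood.

    w is an (n x S) matrix of standard normal variates drawn once and reused
    (common random numbers), as in step 1 of the algorithm.
    """
    n, S = w.shape
    a = np.zeros((n, 2))                       # a^(0) = 0; a_n stays [0, 0] throughout
    loglik_old = -np.inf
    for _ in range(max_iter):
        # Step 2: simulate the S paths forward in time under the current a.
        z = np.zeros((n, S))
        z_lag = np.full(S, z0)
        for i in range(n - 1):
            mu0, s0 = baseline_moments(z_lag, x[i], theta, delta)
            mu, s = tilted_moments(mu0, s0, a[i, 0], a[i, 1])
            z[i] = mu + s * w[i]
            z_lag = z[i]
        # Step 3: backward EIS regressions (Eq. (10)).
        const = np.zeros(n - 1)
        resid = np.zeros((n - 1, S))
        for i in range(n - 2, -1, -1):
            y = log_xi_chi_term(z[i], x[i + 1], theta, a[i + 1], delta)   # hypothetical helper
            (const[i], a[i, 0], a[i, 1]), resid[i] = eis_regression(z[i], y)
        # Step 4: log-likelihood estimate via Eq. (11), with a log-sum-exp over the S paths.
        log_w = log_xi1_chi1(z0, x[0], theta, a[0], delta) + const.sum() + resid.sum(axis=0)
        loglik = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
        if abs(loglik - loglik_old) < tol:     # stopping rule on the change in log-likelihood
            break
        loglik_old = loglik                    # Step 5: iterate again
    return loglik
```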

Following Richard and Zhang (2007), we apply the same set of canonical standard normal variates w to generate the paths in step 2 in every iteration. Moreover, this same set of canonical variates is used for each evaluation of the simulated log-likelihood function when doing the likelihood maximization. This usage of common random numbers ensures the smoothness of the simulated log-likelihood function and allows us to apply a BFGS quasi-Newton optimizer (Nocedal & Wright, 1999) based on finite difference gradients. Another measure to keep the simulated log-likelihood function smooth is to terminate the EIS iteration when the change (from iteration k to k + 1) in the log-likelihood value is a small number. We have used a change in log-likelihood value of less than 1.0e−9 as our stopping criterion.
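In a high-level language, the common-random-numbers device amounts to drawing the (n × S) matrix of normal variates once and closing over it in the objective handed to the optimizer. A minimal sketch follows (ours, using SciPy's BFGS with finite-difference gradients instead of the authors' FORTRAN90 setup, and reusing the eis_loglik sketch above).

```python
import numpy as np
from scipy.optimize import minimize

def fit_by_simulated_ml(x, theta_start, z0_start, S=32, delta=1 / 252, seed=123):
    """Maximize the simulated log-likelihood in (theta, z0) with fixed common random numbers."""
    w = np.random.default_rng(seed).standard_normal((len(x), S))   # drawn once, reused everywhere

    def negative_loglik(params):
        theta, z0 = params[:-1], params[-1]
        return -eis_loglik(x, theta, z0, w, delta=delta)

    start = np.append(np.asarray(theta_start, dtype=float), z0_start)
    return minimize(negative_loglik, start, method="BFGS")         # finite-difference gradients
```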

The choice to apply a gradient-based optimization algorithm stems from the fact that the model has up to eight parameters, and a simplex-type optimization algorithm would generally require too many function evaluations to converge for such problems. The computational cost of the extra EIS iterations needed to obtain this high precision is thus typically won back by using a faster optimization algorithm. The typical number of EIS iterations required to obtain precision of the order of 1.0e−9 is between 20 and 40. However, once such an evaluation is complete, computing the log-likelihood values needed for finite difference gradients can be much faster, since we may start the EIS iteration from the previous â and apply it to a slightly perturbed parameter vector. Typically, this approach requires about 5 to 10 iterations to converge.

One final detail to improve numerical stability is to add a simple line search, similar to those applied in line-searching optimization algorithms, to the EIS iteration. This is done by regarding the difference in iterates d^(k+1) = a^(k+1) − a^(k) as a ''search direction,'' along which we may, when necessary, take shorter steps. More precisely, when completing step 3 above, we set a^(k+1) = a^(k) + ω d^(k+1), ω ∈ (0, 1), if the ''raw'' iterate in step 3 leads to an infinite variance in the importance density or some other pathology.

Typical computing times for our FORTRAN90 implementation range from 30 to 1,000 seconds for locating a maximum likelihood estimate for data sets with around 2,000 observations using a standard PC. The LAPACK (Anderson et al., 1999) routine dgels was used for the linear regressions, and all the random numbers were generated using Marsaglia's KISS generator (see, e.g., Leong, Zhang, Lee, Luk, & Villasenor, 2005).

3. MONTE CARLO EXPERIMENTS

To assess the statistical properties of the EIS-MC procedure outlined in Section 2, we have conducted some MC experiments. The main objectives of these experiments are to quantify the error arising from the EM discretization and from using EIS to integrate out the latent process.


The main sources (in no particular order) of statistical bias for the EIS-MC procedure are:

- Discretization of the continuous time model using an EM discretization. An indirect way of diagnosing this bias is to look for unacceptable errors using EM-based maximum likelihood when the latent process is observed. In this manner, importance sampling is avoided, and hence can be eliminated as a source of error.
- Small sample biases from using the integrated likelihood function. Diagnostics for these are provided by comparing the EM-based MLEs when the log-volatility is observed and unobserved.
- MC errors from using Eq. (5) instead of exact integration. These errors will be discussed in the next section by using many different random number seeds in the program.

All of the computations are done using a yearly time scale and with daily observations, corresponding to Δ = 1/252. We use S = 32 paths in the importance sampler throughout, both in the MC simulations and in the application to real data. All the MC experiments are done with z both observed and unobserved, i.e., by maximizing Eqs. (4) and (5), respectively, with respect to θ. Under observed z, the simulated z_0 is applied, whereas under unobserved z, we estimate z_0 along with the other parameters.

The ''true'' parameter values used to generate the synthetic data sets are the empirical parameter estimates obtained from the Standard & Poor's 500 data set, which we consider in Section 4. We first consider the Heston model and the GARCH diffusion model, and then consider the full CEV model under two different simulation designs.

The synthetic data sets are simulated using the EM scheme with a time step of Δ/2048, so that the simulated data sampled on the Δ time grid can be regarded as sampled from the continuous time model. The first 3,000 data points of each simulation are discarded so that the simulated distribution of z_0 is close to the marginal distribution of z_0 dictated by the model. For all the experiments, we simulate and estimate 500 synthetic data sets.
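A sketch of this data-generating step (ours): run the Euler–Maruyama recursion on the fine grid Δ/2048, record every 2048th point as a daily observation, and drop the burn-in. Whether the authors discretize Eq. (1) or Eq. (2) on the fine grid is not stated, so the use of the log-volatility form here is an assumption, as is the starting value passed in by the caller.

```python
import numpy as np

def simulate_synthetic_data(theta, z_start, n_obs, burn_in=3000, delta=1 / 252, substeps=2048, seed=0):
    """Generate n_obs daily (x, z) observations from the CEV-SV model on a fine EM grid."""
    alpha, beta, sigma, rho, gamma, a, b = theta
    rng = np.random.default_rng(seed)
    dt = delta / substeps
    s, z = 0.0, z_start
    x_out, z_out = [], []
    for day in range(n_obs + burn_in):
        s_prev = s
        for _ in range(substeps):                          # fine Euler-Maruyama steps
            e1, e2 = rng.standard_normal(2)
            v = np.exp(z)
            M = beta + alpha / v - 0.5 * sigma**2 * v**(2 * (gamma - 1.0))
            s += dt * (a + b * v) + np.sqrt(dt * v) * (np.sqrt(1 - rho**2) * e1 + rho * e2)
            z += dt * M + np.sqrt(dt) * sigma * v**(gamma - 1.0) * e2
        if day >= burn_in:                                 # discard the burn-in period
            x_out.append(s - s_prev)                       # daily log-return
            z_out.append(z)                                # daily log-volatility
    return np.array(x_out), np.array(z_out)
```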

3.1. Heston Model

The results for the simulations under the Heston model are given in Table 1. We use sample size n = 2,022, equal to the number of observations in the real data set considered in Section 4.


It is well known that MLEs of the mean reversion parameter tend to be biased toward faster mean reversion in finite samples in the context of diffusion processes with a linear drift (see, e.g., Phillips & Yu, 2005; Phillips & Yu, 2009b). For the CEV model specified in Eq. (1), a faster mean reversion rate corresponds to larger negative values of β. This is also seen under the Heston model, both for observed and unobserved log-volatility. Interestingly, we find the effect is stronger for observed log-volatility. Still, the bias under the Heston model is smaller than the corresponding statistical standard errors, both for observed and unobserved log-volatility. Thus, it seems that all three sources of bias discussed above are under control for this model and this amount of data.

The loss of precision when using the integrated likelihood procedure ranges from a factor of 2 to a factor of 10 in increased statistical standard errors. In particular, for the volatility-of-volatility parameter σ and the leverage parameter ρ, the standard errors increase by a factor close to 10.

3.2. The GARCH Diffusion Model

Simulation results for the GARCH diffusion model are summarized in Table 2. As before, we use n = 2,022. For both observed and unobserved

Table 1. Heston Model MC Study Results.

Parameter True Value Bias Std MSE

Observed volatility, n = 2,022
α    0.2109    0.0081   0.0320    0.0011
β   -7.7721   -0.4366   1.4286    2.2272
σ    0.3774   -0.0042   0.0059    0.0001
ρ   -0.3162    0.0044   0.0190    0.0004
a    0.0591   -0.0072   0.0854    0.0073
b    1.6435    0.4320   4.0063   16.2035

Unobserved volatility, n = 2,022
α    0.2109   -0.0040   0.0601    0.0036
β   -7.7721   -0.1068   2.4411    5.9577
σ    0.3774   -0.0342   0.0493    0.0036
ρ   -0.3162    0.0194   0.1209    0.0150
a    0.0591    0.0344   0.1277    0.0175
b    1.6435   -1.0805   5.5070   31.4291

Note: The bias is reported as E[θ̂ − θ_true]. ''Std'' denotes the statistical standard errors and MSE denotes the mean square error around the ''true'' parameter.


log-volatility, there is a negative bias in the mean reversion parameter β, as one would expect. Once again we see that no biases are larger in magnitude than the corresponding standard errors when the log-volatility is unobserved. A downward bias in σ is seen for observed log-volatility that is larger in magnitude than the corresponding standard error, but the bias is still small compared with the ''true'' parameter. Again the loss of precision from using the integrated MLE is largest for the parameters σ and ρ, where again about a tenfold increase in standard error is seen.

3.3. The CEV Model

For the CEV model, we have performed two simulation studies, which are summarized in Tables 3 and 4. We denote these parameter settings as P1 and P2, respectively, corresponding to columns 3 and 4 in Table 5.

Under P1, the ''true parameters'' are the estimates obtained using the full set of S&P 500 returns, which includes the October 1987 crash. The experiment is done using both n = 2,022 and n = 5,000 data points. From Table 3 we see that the EIS-MLE procedure produces downward biased estimates of γ when the log-volatility is unobserved. When the log-volatility is observed, this effect is negligible. The bias in γ also leads to substantial bias in the other parameters governing the volatility, as we expect the MLE

Table 2. GARCH Diffusion Model MC Study Results.

Parameter True Value Bias Std MSE

Observed volatility, n = 2,022
α    0.2411    0.0015   0.0323    0.0010
β   -9.3220   -0.3056   2.0270    4.1937
σ    2.8202   -0.0715   0.0430    0.0070
ρ   -0.2920    0.0020   0.0195    0.0004
a    0.1019   -0.0039   0.0881    0.0078
b    0.1139    0.3217   4.4037   19.4570

Unobserved volatility, n = 2,022
α    0.2411    0.0117   0.0756    0.0058
β   -9.3220   -0.8100   3.6413   13.8878
σ    2.8202   -0.0760   0.4254    0.1864
ρ   -0.2920    0.0371   0.1156    0.0147
a    0.1019    0.0407   0.1320    0.0190
b    0.1139   -1.4421   6.1166   39.4159

Note: See the note of Table 1 for details.


Increasing the sample size from 2,022 to 5,000 decreases the biases slightly, but it seems that very long time series would be needed to identify γ with decent precision when the true parameter is in this range.

Table 3.  CEV Model (P1) MC Study Results.

Parameter    True Value      Bias        Std        MSE

Observed volatility, n = 2,022
α              0.0434       0.0102     0.0232     0.0006
β             −0.4281      −0.5903     1.5432     2.7252
σ             13.6298      −0.3713     1.3716     2.0153
ρ             −0.3317       0.0013     0.0188     0.0004
γ              1.5551      −0.0070     0.0266     0.0008
a              0.0820      −0.0095     0.0923     0.0086
b              0.8716       0.5788     4.4829    20.3912

Unobserved volatility, n = 2,022
α              0.0434       0.0634     0.0530     0.0068
β             −0.4281      −3.4599     2.6160    18.7998
σ             13.6298      −8.6680     3.7726    89.3375
ρ             −0.3317       0.0227     0.1355     0.0188
γ              1.5551      −0.3539     0.2202     0.1736
a              0.0820       0.0030     0.1191     0.0142
b              0.8716       0.1687     5.4970    30.1819

Observed volatility, n = 5,000
α              0.0434       0.0036     0.0138     0.0002
β             −0.4281      −0.2046     0.9445     0.9322
σ             13.6298      −0.4053     0.8616     0.9051
ρ             −0.3317       0.0014     0.0128     0.0002
γ              1.5551      −0.0066     0.0169     0.0003
a              0.0820      −0.0053     0.0557     0.0031
b              0.8716       0.2240     2.5162     6.3687

Unobserved volatility, n = 5,000
α              0.0434       0.0480     0.0320     0.0033
β             −0.4281      −2.6796     1.5569     9.5995
σ             13.6298      −8.5886     2.5369    80.1870
ρ             −0.3317       0.0309     0.0787     0.0071
γ              1.5551      −0.3082     0.1444     0.1158
a              0.0820       0.0154     0.0720     0.0054
b              0.8716      −0.5007     3.1073     9.8867

Note: See the note of Table 1 for details.


To understand the impact of extreme observations (or "crashes") on this bias, we use the logarithm of the maximal absolute log-return as a simple proxy for a "large" crash. In Fig. 1, a scatter plot of the estimated γ against the logarithm of the maximal absolute log-return is shown across all simulated paths under P1. A strong positive relationship is seen (correlation 0.52).

Table 4.  CEV Model (P2) MC Study Results.

Parameter    True Value      Bias        Std        MSE

Observed volatility, n = 1,800
α              0.0754       0.0097     0.0216     0.0006
β             −3.4022      −0.5524     1.3048     2.0042
σ              1.7587       0.0538     0.2565     0.0686
ρ             −0.3912       0.0037     0.0191     0.0004
γ              1.0804       0.0072     0.0359     0.0013
a             −0.1811      −0.0204     0.1160     0.0138
b             13.8246       1.2152     6.2209    40.0973

Unobserved volatility, n = 1,800
α              0.0754       0.0231     0.0499     0.0030
β             −3.4022      −1.3109     2.7270     9.1392
σ              1.7587      −0.1653     1.6373     2.7025
ρ             −0.3912       0.0030     0.1618     0.0261
γ              1.0804      −0.1273     0.2449     0.0761
a             −0.1811      −0.0220     0.1755     0.0312
b             13.8246       1.5312     9.2881    88.4301

Observed volatility, n = 5,000
α              0.0754       0.0025     0.0122     0.0002
β             −3.4022      −0.1435     0.7612     0.5989
σ              1.7587       0.0370     0.1511     0.0242
ρ             −0.3912       0.0044     0.0120     0.0002
γ              1.0804       0.0067     0.0213     0.0005
a             −0.1811      −0.0078     0.0693     0.0049
b             13.8246       0.4200     3.6322    13.3426

Unobserved volatility, n = 5,000
α              0.0754       0.0044     0.0217     0.0005
β             −3.4022      −0.2826     1.1640     1.4320
σ              1.7587      −0.2897     0.8019     0.7257
ρ             −0.3912       0.0201     0.0876     0.0081
γ              1.0804      −0.0770     0.1500     0.0284
a             −0.1811       0.0101     0.0936     0.0089
b             13.8246      −0.3163     4.7425    22.5458

Note: See the note of Table 1 for details.


This plot suggests that when the log-volatility is unobserved, extreme values in a sample are needed to identify high values of γ. As a reference, the maximal absolute log-return of the October 1987 crash is roughly exp(−1.64). In the simulated data sets, such extreme events occur in roughly 0.4% of the data sets.
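As an illustration of how this diagnostic can be computed (a minimal sketch, not the authors' code; simulated_paths and gamma_hat are hypothetical containers holding the simulated return series and the corresponding EIS-ML estimates of γ):

import numpy as np

def crash_proxy(x):
    # Logarithm of the maximal absolute log-return of one simulated path.
    return np.log(np.max(np.abs(x)))

def proxy_correlation(simulated_paths, gamma_hat):
    # Correlation between the crash proxy and the estimated gamma values,
    # as plotted in Fig. 1.
    proxy = np.array([crash_proxy(x) for x in simulated_paths])
    return np.corrcoef(proxy, np.asarray(gamma_hat))[0, 1]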

For the P2 "true parameters," obtained using the data excluding the October 1987 crash, we see much smaller biases, and the downward biases in γ decrease substantially when the sample size is increased from 1,800 to 5,000. Still, all the biases are smaller in magnitude than the corresponding statistical standard errors.

Table 5.  Parameter Estimation Results for the Standard and Poor's 500 Data.

Parameter        Heston      GARCH       CEV          LN        CEV1800     LN1800

α                0.2109      0.2411      0.0434    −29.5044      0.0754   −17.4672
                [0.0035]    [0.0020]    [0.0028]    [0.2304]     [0.0027]   [0.1234]
                (0.0601)    (0.0756)    (0.0530)    (8.7924)     (0.0499)   (8.5628)
β               −7.7721     −9.3220     −0.4281     −7.6953      −3.4022    −4.4413
                [0.1395]    [0.0860]    [0.1495]    [0.0593]     [0.1489]   [0.0313]
                (2.4411)    (3.6413)    (2.6160)    (2.2791)     (2.7270)   (2.2166)
σ                0.3774      2.8202     13.6298      2.4793       1.7587     1.2965
                [0.0036]    [0.0115]    [0.3321]    [0.0112]     [0.3088]   [0.0063]
                (0.0493)    (0.4254)    (3.7726)    (0.3313)     (1.6373)   (0.2664)
ρ               −0.3162     −0.2920     −0.3317     −0.3146      −0.3912    −0.3934
                [0.0017]    [0.0006]    [0.0023]    [0.0007]     [0.0010]   [0.0010]
                (0.1209)    (0.1156)    (0.1355)    (0.0946)     (0.1618)   (0.1525)
γ                0.5         1.0         1.5551       –           1.0804      –
                  –           –         [0.0075]      –          [0.0432]     –
                  –           –         (0.2202)      –          (0.2449)     –
a                0.0591      0.1019      0.0820      0.0683      −0.1811    −0.2021
                [0.0013]    [0.0005]    [0.0011]    [0.0005]     [0.0044]   [0.0008]
                (0.1277)    (0.1320)    (0.1191)    (0.1102)     (0.1755)   (0.1732)
b                1.6435      0.1139      0.8716      1.4183      13.8246    14.8801
                [0.0459]    [0.0209]    [0.0419]    [0.0196]     [0.2251]   [0.0409]
                (5.5070)    (6.1166)    (5.4970)    (4.9598)     (9.2881)   (8.7362)
Log-likelihood  6514.90     6541.42     6552.09     6538.23      5916.20    5916.44
                [0.2457]    [0.0823]    [0.3494]    [0.1502]     [0.0457]   [0.0646]

Note: Monte Carlo standard errors are given in square brackets and statistical standard errors, taken from the MC experiments, are given in parentheses. For the Heston model and the CEV model, 1 of the 100 estimation replications failed to converge. Columns 5 and 6 report parameter estimates obtained when including only the first 1,800 observations in the sample.


4. APPLICATION TO REAL DATA

For our application to real data, we use the S&P 500 log-return data previously used in Jacquier, Polson, and Rossi (1994) and Yu (2005).⁸ The data cover the period January 1980 to December 1987, and the sample has a total of n = 2,022 log-return observations.

We also fit the following continuous time LN model using the proposed method:

d [ s̄_t ]   [ a + b exp(z̄_t) ]        [ √(1−ρ²) exp(z̄_t/2)   ρ exp(z̄_t/2) ] [ dB_{t,1} ]
  [ z̄_t ] = [ α + β z̄_t      ] dt +   [ 0                      σ            ] [ dB_{t,2} ]        (12)

There are two purposes for fitting the LN model. First, it is used to illustrate the flexibility of our estimation methods. Second, it would be empirically interesting to compare the performance of the LN model with that of the CEV models. The discretization and the recipe for adapting the EIS algorithm to this model are given in Appendix B.

[Figure 1: scatter plot of the estimated γ (vertical axis) against log(max_i |x_i|) (horizontal axis).]

Fig. 1.  Estimated γ Values Explained by log(max_i |x_i|) from the MC Experiment with Parameter Setting 1 and n = 2,022. The Sample Correlation Equals 0.52.


Parameter estimates, along with statistical and MC standard errors, for the Heston model, the GARCH diffusion model, the full CEV model, and the LN model are given in columns 1–4 of Table 5, respectively.

In addition, columns 5 and 6 contain parameter estimates for the CEV model and the LN model when only the first 1,800 observations (thus excluding the October 1987 crash) are used. The statistical standard errors are taken from the MC experiments reported in Tables 1–4 for the CEV model instances. For the LN model, the statistical standard errors are MC estimates based on 100 synthetic data sets.

As both the Heston model and the GARCH diffusion model are special cases of the CEV model, it is sensible to compare the maximum likelihood values reported in the last row of the table. The likelihood ratio test suggests that there is strong empirical evidence against the Heston model. This empirical result reinforces what has been found when both spot prices and option prices are used jointly to estimate the CEV-SV model (Jones, 2003; Aït-Sahalia & Kimmel, 2007). Moreover, for the GARCH diffusion model, when the complete data set is used, the likelihood ratio test rejects at any practical p-value when compared with the complete CEV model. For the shorter data set, the estimate of γ is less than one-half of a standard error away from that of the GARCH diffusion model.
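As a concrete illustration (not part of the original analysis), the likelihood ratio statistics implied by the last row of Table 5 can be computed as below; each nested model fixes the single parameter γ (at 0.5 for the Heston model and 1.0 for the GARCH diffusion), so a chi-squared reference distribution with one degree of freedom is used.

from scipy.stats import chi2

# Full-sample maximized log-likelihoods from the last row of Table 5.
ll_cev, ll_heston, ll_garch = 6552.09, 6514.90, 6541.42

for name, ll_restricted in [("Heston", ll_heston), ("GARCH diffusion", ll_garch)]:
    lr = 2.0 * (ll_cev - ll_restricted)   # likelihood ratio statistic
    p_value = chi2.sf(lr, df=1)           # one restriction: gamma fixed
    print(f"{name}: LR = {lr:.2f}, p-value = {p_value:.1e}")

The resulting statistics, roughly 74.4 for the Heston model and 21.3 for the GARCH diffusion, lie far in the right tail of the χ²(1) distribution, consistent with the rejections reported above.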

The estimates of the leverage effect parameter ρ are very much in accordance with the estimate of Yu (2005) (posterior mean = −0.3179) under the log-normal stochastic volatility (LN SV) model. In all cases, we obtained a positive estimate of b, suggesting a positive risk-return relation, but the parameter estimates are statistically insignificant.

The parameter estimates of the CEV model with and without the October 1987 crash differ significantly. This observation suggests a poor identification of γ and that the influence of the crash on the estimate of γ when the log-volatility is unobserved is substantial. The finding is consistent with that in Jones (2003), even though he uses data from 1986–2000 and 1988–2000 along with implied volatility data. For the data set including the October 1987 crash, Jones (2003) obtains a posterior mean of 1.33 for the CEV parameter γ. The corresponding estimated value for data excluding the October 1987 crash is 1.17. Our simulated maximum likelihood estimates of γ are 1.56 and 1.08, respectively. Jones (2003) argues that to accommodate the large spike in volatility represented by the October 1987 crash, higher values of γ and σ are needed. Still, since Jones (2003) used both log-return and implied volatility data, it is expected that his parameter estimates differ less than ours with and without the October 1987 crash in the sample.


The estimation results based on the LN SV model are reported in column 4 for the full sample and in column 6 for the subsample. The LN SV model has been successfully estimated in both cases. For example, when the full sample is used, the estimate of the leverage effect parameter ρ is very much in accordance with the estimate of Yu (2005). A comparison of the log-likelihood values of the CEV model and the LN model reveals that for the full sample the 7-parameter CEV model outperforms the 6-parameter LN model. However, for the subsample, the LN model has a slightly higher likelihood value even with fewer parameters.

To estimate the errors induced by integrating out the log-volatility using the above-described EIS-MC method (relative to the exact but unknown integral), we repeat the estimation process 100 times using different random number seeds. The resulting MC standard errors for the parameters and maximum log-likelihood values are included in brackets in Table 5. The MC errors are generally small compared with the statistical standard errors. Judging from the MC standard errors of the maximum log-likelihood estimates, the EIS-MC method performs best when γ is close to 1.
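A sketch of this seed-variation exercise (estimate_by_eis is a hypothetical stand-in for one EIS-ML estimation run returning the parameter vector and the maximized log-likelihood):

import numpy as np

def mc_standard_errors(data, estimate_by_eis, n_repeats=100):
    # Re-estimate with different seeds for the random numbers used by the
    # importance sampler; the standard deviations across replications are the
    # MC standard errors reported in brackets in Table 5.
    params, logliks = [], []
    for seed in range(n_repeats):
        theta_hat, ll_hat = estimate_by_eis(data, seed=seed)
        params.append(theta_hat)
        logliks.append(ll_hat)
    return np.std(params, axis=0, ddof=1), np.std(logliks, ddof=1)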

As references for the standard errors of the maximum log-likelihoods, we may mention that Liesenfeld and Richard (2006) obtain an MC standard error of 0.11 (log-likelihood: 918) under a 3-parameter LN SV model with 945 latent variables using 30 paths in the importance sampler. For a 5-parameter time-discretized Heston model, Durham (2006) obtains an MC standard error of 2.49 (log-likelihood: 18,473) using 1,024 draws in a Laplace importance sampler. As the latent process under consideration here is both nonlinear and heteroskedastic, the standard errors reported in Table 5 are satisfactory. Comparing with the findings of Kleppe and Skaug (2009), much of this may be attributed to constructing the importance sampler around the product of conditional transition densities, rather than around the natural sampler, as is commonly done in other applications of the EIS algorithm to nonlinear state space models.

5. CONCLUDING REMARKS

This chapter outlines how the EIS algorithm may be applied to integrate out a latent process in an EM-discretized stochastic differential equation model. In terms of numerical precision, we find that the algorithm performs very well considering the nonlinear and heteroskedastic structure of the latent process. In terms of the application to the CEV model, we find that the integrated MLEs perform well for moderate values of γ, but that identification is more difficult for higher values of γ.


One direction for further research is to use the improved (relative to the EM scheme) approximate continuous time TPDs proposed in Aït-Sahalia (2008) and, for jump diffusions, in Yu (2007). Using a simple Taylor expansion of these approximations (in z_i), one can obtain estimates of the conditional transition densities (i.e., conditional on x_i) that stay within the class of locally Gaussian importance samplers. The inclusion of jumps in the model would probably also improve the identifiability of the complete CEV model, as large returns could be regarded as jumps rather than being caused by large spikes in the volatility process. As a result, the volatility series would be smoother, and hence it can be expected that the finite sample estimation bias of the mean reversion parameter will be more serious.

Moreover, it should be noted that this procedure is by no means restricted to the CEV family of models. As shown in Appendix B, only the three model-dependent functions m⁰_i(z_{i−1}, x_i), S⁰_i(z_{i−1}), and ξ_i(z_{i−1}, x_i) need to be respecified to implement a different model. The EM scheme suggests that any stochastic differential equation has an approximately Gaussian TPD for sufficiently short time-steps Δ. Thus, the technique of using the conditional-on-data EM-TPD can be applied provided that data are given on a fine enough time grid. In particular, due to the explicit nature of Gaussian conditional densities, multivariate extensions (toward both multiple observed and multiple unobserved processes) should also be straightforward.

It is also worth noting that the outlined procedure is closely related to the Laplace accelerated sequential importance sampling (LASIS) procedure of Kleppe and Skaug (2009). In the setting of the EM-discretized CEV model, their procedure would be equivalent to applying a Laplace importance sampler in w (which is standard normal) instead of in z. This procedure would bypass the problems of heteroskedasticity and nonlinearity in much the same manner as outlined here, but we do not pursue further comparisons here.

NOTES

1. This is a slight abuse of notation, as the data are from the continuous time process (Eq. (1)) and not the discrete time approximation.
2. In general, both m and a depend on θ and Δ, but we suppress this dependence in our notation.
3. 0_i should be read as the ith row of a with all elements equal to 0.
4. The source code is available on request from the first author.
5. Under this simulation regime, 8% of the estimation replications for the unobserved log-volatility failed to converge and were subsequently ignored.
6. For this model, 2.8% of the simulation replications under unobserved log-volatility failed to converge.
7. For P1, 4.2% of the replications failed to converge, whereas for P2, 6.2% of the replications failed to converge.
8. The log-return data are multiplied by 0.01.

REFERENCES

Aït-Sahalia, Y. (2002). Maximum-likelihood estimation of discretely-sampled diffusions: A closed-form approximation approach. Econometrica, 70, 223–262.
Aït-Sahalia, Y. (2008). Closed-form likelihood expansions for multivariate diffusions. Annals of Statistics, 36(2), 906–937.
Aït-Sahalia, Y., & Kimmel, R. (2007). Maximum likelihood estimation of stochastic volatility models. Journal of Financial Economics, 134, 507–551.
Andersen, T. G., Benzoni, L., & Lund, J. (2002). An empirical investigation of continuous-time equity return models. The Journal of Finance, 57(3), 1239–1284.
Andersen, T. G., & Lund, J. (1997). Estimating continuous-time stochastic volatility models of the short-term interest rate. Journal of Econometrics, 77, 343–377.
Anderson, E., Bai, Z., Bischof, C., Blackford, S., Demmel, J., Dongarra, J., Croz, J. D., Greenbaum, A., Hammarling, S., McKenney, A., & Sorensen, D. (1999). LAPACK Users' Guide (3rd ed.). Philadelphia: Society for Industrial and Applied Mathematics.
Bakshi, G., Cao, C., & Chen, Z. (1997). Empirical performance of alternative option pricing models. Journal of Finance, 52, 2003–2049.
Bauwens, L., & Galli, F. (2009). Efficient importance sampling for ML estimation of SCD models. Computational Statistics and Data Analysis, 53, 1974–1992.
Cox, J. C., Ingersoll, J. E., & Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53(2), 385–407.
Duffie, D., Pan, J., & Singleton, K. J. (2000). Transform analysis and asset pricing for affine jump-diffusions. Econometrica, 68, 1343–1376.
Durham, G. B. (2006). Monte Carlo methods for estimating, smoothing, and filtering one- and two-factor stochastic volatility models. Journal of Econometrics, 133, 273–305.
Durham, G. B., & Gallant, A. R. (2002). Numerical techniques for maximum likelihood estimation of continuous-time diffusion processes (with discussion). Journal of Business and Economic Statistics, 20(3), 297–338.
Heston, S. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6, 327–343.
Hull, J., & White, A. (1987). The pricing of options on assets with stochastic volatilities. Journal of Finance, 42, 281–300.
Jacquier, E., Polson, N. G., & Rossi, P. E. (1994). Bayesian analysis of stochastic volatility models. Journal of Business & Economic Statistics, 12(4), 371–389.
Jones, C. S. (2003). The dynamics of stochastic volatility: Evidence from underlying and options markets. Journal of Econometrics, 116, 181–224.
Kleppe, T. S., & Skaug, H. (2009). Fitting general stochastic volatility models using Laplace accelerated sequential importance sampling. Submitted for publication.
Kloeden, P. E., & Platen, E. (1999). Numerical solution of stochastic differential equations. New York: Springer-Verlag.
Leong, P. H. W., Zhang, G., Lee, D. U., Luk, W., & Villasenor, J. (2005). A comment on the implementation of the ziggurat method. Journal of Statistical Software, 12(7), 1–4.
Liesenfeld, R., & Richard, J.-F. (2003). Univariate and multivariate stochastic volatility models: Estimation and diagnostics. Journal of Empirical Finance, 10, 505–531.
Liesenfeld, R., & Richard, J.-F. (2006). Classical and Bayesian analysis of univariate and multivariate stochastic volatility models. Econometric Reviews, 25(2), 335–360.
Nelson, D. B. (1990). ARCH models as diffusion approximations. Journal of Econometrics, 45, 7–38.
Nocedal, J., & Wright, S. J. (1999). Numerical optimization. New York: Springer.
Phillips, P. C. B., & Yu, J. (2005). Jackknifing bond option prices. Review of Financial Studies, 18, 707–742.
Phillips, P. C. B., & Yu, J. (2009a). Maximum likelihood and Gaussian estimation of continuous time models in finance. In: T. G. Andersen et al. (Eds.), Handbook of Financial Time Series (pp. 497–530). New York: Springer.
Phillips, P. C. B., & Yu, J. (2009b). Simulation-based estimation of contingent-claims prices. Review of Financial Studies, 22(9), 3669–3705.
Rao, B. L. S. P. (1999). Statistical inference for diffusion type processes. Number 8 in Kendall's Library of Statistics. Arnold.
Richard, J.-F., & Zhang, W. (2007). Efficient high-dimensional importance sampling. Journal of Econometrics, 141(2), 1385–1411.
Shephard, N., & Pitt, M. K. (1997). Likelihood analysis of non-Gaussian measurement time series. Biometrika, 84, 653–667.
Wiggins, J. (1987). Option values under stochastic volatility: Theory and empirical estimates. Journal of Financial Economics, 19, 351–372.
Yu, J. (2005). On leverage in a stochastic volatility model. Journal of Econometrics, 127, 165–178.
Yu, J. (2007). Closed-form likelihood approximation and estimation of jump-diffusions with an application to the realignment risk of the Chinese yuan. Journal of Econometrics, 141(2), 1245–1280.

APPENDIX A. EXPLICIT EXPRESSIONS

The explicit expression for log ω_i is given as

log ω_i(z_{i−1}, x_i; a_i) = (1/2) log(π) − (1/2) log( 1/(2 S⁰_i(z_{i−1})²) − a_{i,2} )
    − m⁰_i(z_{i−1}, x_i)² / (2 S⁰_i(z_{i−1})²)
    − ( m⁰_i(z_{i−1}, x_i)/S⁰_i(z_{i−1})² + a_{i,1} )² / ( 4 ( a_{i,2} − 1/S⁰_i(z_{i−1})² ) ).


Moreover, log ξ_i is given as

log ξ_i(z_{i−1}, x_i) = −log(2πΔ) + z_{i−1}(1 − 2γ)/2 − (1/2) log(σ²(1 − ρ²))
    − (x_i − Δ(a + b exp(z_{i−1})))² / (2Δ exp(z_{i−1})).

APPENDIX B. THE LOG-NORMAL MODEL

In this chapter, we use the following specification of the continuous time LN SV model:

d [ s̄_t ]   [ a + b exp(z̄_t) ]        [ √(1−ρ²) exp(z̄_t/2)   ρ exp(z̄_t/2) ] [ dB_{t,1} ]
  [ z̄_t ] = [ α + β z̄_t      ] dt +   [ 0                      σ            ] [ dB_{t,2} ]        (B.1)

The EM scheme yields the discrete time dynamics

[ x_{i+1} ]   [ Δ(a + b exp(z_i))  ]        [ √(1−ρ²) exp(z_i/2)   ρ exp(z_i/2) ] [ ε_{i,1} ]
[ z_{i+1} ] = [ z_i + Δ(α + β z_i) ] + √Δ   [ 0                     σ           ] [ ε_{i,2} ]       (B.2)

which with a = b = 0 is equivalent to the ASV1 specification in Yu (2005). To adapt the above-described EIS algorithm, the following functions need to be altered:

m⁰_i(z_{i−1}, x_i) = z_{i−1} + Δ(α + β z_{i−1}) + σρ exp(−z_{i−1}/2)(x_i − Δ(a + b exp(z_{i−1}))),        (B.3)

S⁰_i(z_{i−1}) = σ √(Δ(1 − ρ²)),        (B.4)

log ξ_i(z_{i−1}, x_i) = −log(2πΔ) − z_{i−1}/2 − log(σ²(1 − ρ²)) − (x_i − Δ(a + b exp(z_{i−1})))² / (2Δ exp(z_{i−1})).        (B.5)

This highlights the fact that the above EIS algorithm is easily adapted to other models cast in the form of EM-discretized stochastic differential equations. For the computations summarized in Table 5, we use M = 32 draws in the importance sampler.
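To make the adaptation concrete, a minimal Python sketch of the three model-dependent functions for the LN model, following the reconstructed expressions (B.3)–(B.5) above (argument names such as Delta for the time-step are illustrative; this is not the authors' code):

import numpy as np

def m0(z_prev, x, a, b, alpha, beta, sigma, rho, Delta):
    # Conditional mean of z_i given z_{i-1} and x_i, Eq. (B.3).
    return (z_prev + Delta * (alpha + beta * z_prev)
            + sigma * rho * np.exp(-z_prev / 2.0)
            * (x - Delta * (a + b * np.exp(z_prev))))

def S0(z_prev, sigma, rho, Delta):
    # Conditional standard deviation of z_i given z_{i-1}, Eq. (B.4).
    return sigma * np.sqrt(Delta * (1.0 - rho ** 2))

def log_xi(z_prev, x, a, b, sigma, rho, Delta):
    # Logarithm of the factor xi_i(z_{i-1}, x_i), Eq. (B.5).
    return (-np.log(2.0 * np.pi * Delta) - z_prev / 2.0
            - np.log(sigma ** 2 * (1.0 - rho ** 2))
            - (x - Delta * (a + b * np.exp(z_prev))) ** 2
            / (2.0 * Delta * np.exp(z_prev)))

Analogous functions would simply replace these three definitions when another EM-discretized model is used.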


EDUCATION SAVINGS ACCOUNTS, PARENT CONTRIBUTIONS, AND EDUCATION ATTAINMENT

Michael D. S. Morris

ABSTRACT

This chapter uses a dynamic structural model of household choices on savings, consumption, fertility, and education spending to perform policy experiments examining the impact of tax-free education savings accounts on parental contributions toward education and the resulting increase in the education attainment of children. The model is estimated via maximum simulated likelihood using data from the National Longitudinal Survey of Young Women. Unlike many similarly estimated dynamic choice models, the estimation procedure incorporates a continuous variable probability distribution function. The results indicate that the accounts increase the amount of parental support, the percent contributing, and education attainment. The policy impact compares favorably to the impact of other policies such as universal grants and general tax credits, for which the model gives results in line with those from other investigations.

Maximum Simulated Likelihood Methods and Applications
Advances in Econometrics, Volume 26, 165–198
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)00000260010


1. INTRODUCTION

Parents in the United States commonly help pay for their children's college education, and for many families this can be a major expense. The College Board (2006a) estimates that the average yearly total cost, including tuition, fees, room, and board, was $30,367 for a private 4-year school and $12,796 for a public 4-year school in the 2006–2007 academic year. These expenses have been rising rapidly. Total costs at 4-year private schools increased an inflation-adjusted 28 percent over the past decade, and 4-year public schools rose an even larger 38 percent. Certainly some of the increase has been financed by student aid packages including grants, scholarships, and subsidized loans, and the U.S. Department of Education (2005) reports that 63 percent of students in 2004 received some form of financial aid. The average annual net price after all forms of aid, however, still averaged $21,400 for private 4-year universities and $9,700 for public 4-year universities, and while aid packages have grown along with rising college costs, the inflation-adjusted net price has still increased annually at an average of 2.1 and 2.6 percent for private and public universities, respectively, over the last decade (College Board, 2006a). So it falls upon individual students and their families to pay increasingly more for college expenses. Using data gathered by the National Center for Education Statistics, Choy, Henke, and Schmitt (1992) found that in 1989, 67 percent of parents contributed to their children's college education and the average annual amount was $3,900 ($6,340 in 2006 dollars), a trend that has increased further as the relative cost of college education has risen.

In response to increasing prices and reliance on parental assistance for college funding, a variety of programs including grants, subsidized loans, and tax-sheltered targeted savings accounts have been introduced. However, there has been very little investigation of their impact on actually increasing the number of students attaining a college degree. Dynarski (2000, 2002), Ichimura and Taber (2002), and Keane and Wolpin (2001) find varying degrees of evidence that direct subsidies and grants to students increase college enrollments, but there remains little evidence as to what extent this then leads to an increase in final degree attainments. Furthermore, there has been no investigation of whether tax-sheltered education savings accounts (ESAs) actually lead to a greater number of children attaining a college degree.

The goal of this chapter is to examine the impact of ESA programs on the educational attainment of children by developing and estimating a structural dynamic programming model of household decisions on having children, saving money, and making transfers to children in the form of educational funding.


The model is estimated using a simulated maximum likelihood procedure and data from the National Longitudinal Survey (NLS) of Young Women cohort. The estimated model is then used to examine a policy experiment that replicates the creation of an ESA program in order to determine the impact of such accounts on increasing the number of college degrees attained. The results here indicate that the accounts should increase both contributions and the attainment of college degrees, and that the impact compares favorably to some other policy options such as universal grants and general tax credits. The results for the grants, used for comparison, are also in line with other investigations.

Section 2 of the chapter presents a literature review of related research. Section 3 explains the model and estimation specification, including the numerical techniques associated with solving the model and the simulated maximum likelihood procedure used to estimate it. Section 4 describes the data, both in terms of how the variables in the model are constructed and descriptive statistics of the sample used. Section 5 discusses the estimated parameter results and the fit of the estimated model. Section 6 presents the policy experiments, and Section 7 concludes the chapter.

2. LITERATURE REVIEW

Given the high costs of education, the fact that many children need support from their parents to attain a college degree, and the importance of a college degree in determining future earnings and productivity, there has been a push in the last decade to introduce a variety of tax breaks aimed not only at lowering the cost of education, but also at helping parents contribute more toward their children's college expenses. For example, the Taxpayer Relief Act of 1997 introduced two tax credits that parents can claim for dependants. The Hope Scholarship Credit is a tax credit for 100 percent of the first $1,000 of qualified tuition expenses and 50 percent of the next $1,000 of qualified tuition expenses. This credit can be claimed for each child supported, but is only available in the first 2 years of postsecondary education and is phased out at higher incomes ($40,000–$50,000 for individual taxpayers, twice that for married households). The Lifetime Learning Credit is similar except that it is available for any year of education (not just the first 2 years), but is only for 20 percent of up to $10,000 (only $5,000 prior to 2003) in expenses and is a single credit per family, not per child.


More directly aimed at parental contributions, the Taxpayer Relief Act of 1997 also created a federal educational savings account, now called a Coverdell Account, which is a custodial account for children under 18 into which $2,000 ($500 before 2002) a year may be placed and for which the earnings and distributions are tax-free as long as they are used for qualified education expenses. Similarly, Section 529 of the IRS tax code in 1996 also allowed states to offer their own favorable-tax-treatment savings accounts to fund qualified state tuition programs. The most popular of these savings vehicles have been state-sponsored 529 plans, of which there are two general types. The first is a prepaid tuition plan, where money is contributed in a child's name and locks in an associated percentage of college expenses at current tuition rates at one of the state's public universities.¹ The other type of plan is an ESA, where contributions can grow tax-free and withdrawals are generally also tax-free as long as the money is spent on education. The popularity of the state-run ESAs is due to their higher contribution limits, the only limit being that contributions cannot exceed expected education expenses. These 529 plans have grown substantially over the decade since their creation in 1996: from fewer than 500,000 accounts averaging $4,959 in 1996 to over 8.8 million accounts with an average balance of $10,569 by 2006 (College Board, 2006b). Of the accounts in 2006, 81 percent are traditional tax-sheltered savings accounts as opposed to prepaid tuition accounts.

Despite the wide use and political popularity of these savings accounts, there is very little evidence on their impact on actual educational attainment. Several studies have looked at the impact of aid programs in the form of grants and loans targeted directly at students as opposed to parents. Using data from the introduction of the Georgia HOPE scholarship program as a quasi-experiment, Dynarski (2000, 2002) finds that the availability of an additional $1,000 subsidy increases attendance by 4 percent, a figure she finds consistent with previous findings. Ichimura and Taber (2002) estimate a 4.5 percent increase in attendance from the availability of a $1,000 subsidy using a reduced-form estimation derived from a model used by Keane and Wolpin (2001). They actually estimate that a $100 tuition increase (a negative subsidy) would lower enrollment rates of 18–24-year-olds by 1.2 percent. Interestingly, while finding that borrowing constraints are indeed tight for students (they cannot even support 1 year from borrowing alone), Keane and Wolpin do not find that allowing for easier loan access will increase attendance. Instead, they find that the major impact of reducing the borrowing constraint is a reduction in working by students. Perhaps more importantly, Keane and Wolpin also find that parental transfers contingent on college attendance are not only prevalent, but do significantly increase the educational attainment of children.


These findings, though, still leave in question the effectiveness of ESA programs. These accounts may or may not actually increase educational spending. They could simply be a tax break for parents who would have sent their children to college anyway. In addition, it is unclear exactly to what extent additional parental contributions will increase attendance, let alone the number of people who obtain a college degree. The related literature on targeted savings, namely IRAs for retirement savings, gives rather mixed results.²

Answering the question regarding ESAs is made even more difficult because contributions toward education are clearly contingent on having children in the first place, and if parents do save or adjust their spending behavior over time to help make contributions, then to fully examine the impact of policies on parental contributions it is necessary to consider parents' dynamic and interdependent decisions on fertility, savings, and college spending together.

This chapter uses a modified version of the life-cycle model that is the workhorse for economic research on household inter-temporal savings and consumption. A very thorough review of the history of these models and their ability to deal with observed microeconomic facts can be found in Browning and Lusardi (1996), including the enhancements used here such as liquidity constraints and precautionary savings. While these models are certainly useful, it has been pointed out that life-cycle models really need to account for a whole range of related interdependent choices, including, importantly for this analysis, fertility decisions.³ The model in this chapter makes the choice to have children, the number of children to have, and the timing of when to have them endogenous decisions within the life-cycle framework, along with decisions on consumption and savings in the face of income uncertainty and borrowing constraints. Furthermore, families also choose to make transfers to their children in the form of college educational spending. Together, this allows for a rich relationship between the number of children to have, how much money to provide children for college education, savings, and consumption within the life-cycle.

3. THE MODEL

3.1. The Model

The model is a finite horizon dynamic programming problem in which households maximize their expected discounted utility over periods t = 1, …, T.


In each period, households receive contemporaneous utility that depends not only on the level of consumption, but also directly on the total number of children they have as well as the levels of education of those children. More specifically, the utility function allows households to receive nonpecuniary utility from having children, and additional utility for those children having a moderately high level of education, indicated by some college education beyond high school or a 2-year degree, and a high level of education, indicated by a 4-year college degree or more. The contemporaneous utility at time t is

U(c_t, n_t, q_mt, q_ht, ε_ct, ε_nt, ε_qt) = (c_t ε_ct)^(1−γ) / (1 − γ) + λ n_t + α q_mt + θ q_ht        (1)

λ = λ₁ + λ₂ n_t + λ₃ c_t + ε_nt
α = α₁ + α₂ q_mt + ε_qt
θ = θ₁ + θ₂ q_ht + ε_qt

where c_t is consumption at time t, n_t is the total number of children at time t, q_mt is the number of children with a moderately high education level (some education beyond high school) at time t, q_ht is the number of children with a high level of education (4-year college degree or more) at time t, ε_ct, ε_nt, and ε_qt are taste shocks to consumption, number of children, and education levels, respectively, and the rest (γ, λ, α, and θ) are model parameters to be estimated. This specification allows the marginal utility of the number of children to depend on both the number of children and the level of consumption, along with a random shock. The marginal utilities of additional children attaining moderate and high education levels can also depend on the number of children at those levels and a random shock.⁴ The final specification that is estimated allows some additional heterogeneity in the utility parameters for children and education with respect to different parental education levels, and the child parameters are allowed to further vary in the first three periods, the details of which are shown in Appendix A. The shocks are assumed to be iid, with the multiplicative consumption shock distributed log-normal (so that consumption remains positive) and the other shocks distributed normal:

ln(ε_ct) ~ N(0, σ²_c),   ε_nt ~ N(0, σ²_n),   ε_qt ~ N(0, σ²_q)        (2)
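For concreteness, a minimal sketch of the contemporaneous utility in Eq. (1) (a simple transcription, not the author's code; parameter names are illustrative):

def utility(c, n, qm, qh, eps_c, eps_n, eps_q, gamma,
            lam1, lam2, lam3, alpha1, alpha2, theta1, theta2):
    # CRRA-type consumption term with a multiplicative shock, plus terms in the
    # number of children and their education levels, as in Eq. (1).
    lam = lam1 + lam2 * n + lam3 * c + eps_n
    alpha = alpha1 + alpha2 * qm + eps_q
    theta = theta1 + theta2 * qh + eps_q
    return (c * eps_c) ** (1.0 - gamma) / (1.0 - gamma) + lam * n + alpha * qm + theta * qh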

A period in the model is set to last 6 years, which allows for a smoother general evolution of assets than shorter periods would.⁵


Households choose a level of consumption and savings in each period. Younger households, in periods 1 through 3 (corresponding to ages 18–36), also choose how many additional children to have in each of those 6-year intervals. This in effect restricts households to having all their children by age 36. In addition, households in the model are further restricted to having no more than four children in any one of the 6-year periods, and no more than five children in total.⁶ Children are assumed to cost an amount per child, per period, C, for three periods (i.e., until they are 18), and this amount is allowed to differ by household education level.

In the fourth period of a child's life (when they are 19–24), households have the option to offer to help pay for that child's college education, and it is assumed that parents only make these contributions to a particular child in this period of the child's life.⁷ At this time, the household chooses a one-time offer, o_t, of a per-year amount to contribute toward the child's college education. Given this offer, the child then attains a certain level of education. From the parents' view, a child's education level is a realization of a stochastic process that depends on the amount of support offered, along with other possible demographic variables. Let d_jt be the education outcome for child j in period t, where d_jt ∈ {high school or less, some college or 2-year degree, 4-year college degree or more}, as these are the categories that enter the utility function. This outcome is given by

d_jt ~ D(d_jt | x_jt)        (3)

so that d_jt is a realization of the conditional distribution D(d_jt | x_jt), where x_jt is a vector of characteristics. The stochastic process in Eq. (3) is specified as an ordered probit, and the vector of characteristics, x_jt, includes a quadratic in the family contribution offer interacted with the level of the parents' education, as shown in Appendix A. The amount the household actually pays toward a child's college education is then the product of the per-year offer made and the number of years of schooling the child attends. Furthermore, the offer amount is binding (i.e., the parents cannot renege on the offer).
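As a sketch of how an education outcome could be drawn from an ordered probit of this form (the index variables, coefficients, and cutpoints below are illustrative placeholders, not the estimated specification from Appendix A):

import numpy as np

def simulate_education_outcome(offer, parent_college, coef, cutpoints, rng):
    # Ordered probit draw over the three categories used in the utility
    # function: 0 = high school or less, 1 = some college or 2-year degree,
    # 2 = 4-year college degree or more. The index includes a quadratic in
    # the per-year offer interacted with parental education.
    x = np.array([1.0, offer, offer ** 2,
                  parent_college * offer, parent_college * offer ** 2])
    latent = x @ np.asarray(coef) + rng.standard_normal()
    return int(np.searchsorted(np.asarray(cutpoints), latent))

# Example usage with two (hypothetical) cutpoints separating the categories.
rng = np.random.default_rng(0)
draw = simulate_education_outcome(offer=3.0, parent_college=1,
                                  coef=[0.1, 0.05, -0.001, 0.02, 0.0],
                                  cutpoints=[0.5, 1.5], rng=rng)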

In order to identify the impact of offers on education attainment, the model restricts parents to making the same offer to all children of eligible age in a given 6-year period. This is necessary because the data only report the amount parents actually contribute toward their children's education. As such, there are no data for children who did not go to college, and yet it is probable that some of these children could have received support from their parents. The identification here comes from assuming that those children receive the same offer of support as their siblings attending school within the same 6-year period.


When looking at the NLS data, this does not seem overly restrictive, as most families do not greatly vary the amount they contribute toward their different children within a period. In fact, just over half made contributions that differed by less than $1,000, and almost 20 percent of contributions differed by less than $50.

The restriction that children born within the same period receive the same offer, while providing identification for the impact of the offer, does imply that if offers positively impact the educational attainment of children, then one should observe that families with more children in college will on average be making larger contributions toward each of their children's education. It is not obvious that this should be the case. For example, a family with two children and both of them in college at the same time may have less money available to support either individually than if just one were attending college. Looking at the NLS data, though, for families with two children born within 6 years of each other, the average per-child, per-year contribution in 1999 dollars is $5,721 for those with both children attending college, and only $3,666 for those with just one child in college, which is consistent with the assumed restriction.

Household income is assumed to be determined by

I_t = exp{z_t β + ε_wt}        (4)

where the characteristic vector z_t contains dummies for the levels of education of the parents interacted with a quadratic in age, as shown in more detail in Appendix A. This gives a standard semi-log specification for income. The income shock is assumed to be distributed iid:⁸

ε_wt ~ N(0, σ²_w)        (5)

and the variance is further allowed to differ by household education as shown in Appendix A.

Because the impact of marital transitions is not a focus of this chapter, marital status is assumed to be constant over the agent's life. Without this assumption, the changes from single to married and vice versa would need to be modeled, along with the associated changes to income and assets. Furthermore, the separate contributions toward their children's education from divorced couples would also need to be modeled. Not only would this greatly increase the complexity of the model, but the relevant data from one of the divorced parents (their contributions toward education as well as other data on asset accumulation and income) are not available, making it impossible to estimate.


Since the sample of stable, continuously single parents in the NLS is so small, the model here considers only continuously married families, which represent just under 60 percent of the NLS respondents. This assumption allows for a simpler model that focuses on the household savings and educational transfer decisions and for which the data needed for estimation are available, though it clearly may limit to some extent the inferences drawn from the results. The level of education of the household for earnings purposes is also not modeled as a dynamic decision, and as such is treated as a constant household "type," identified as the highest level attained.

At any period t, the household problem is then to solve

max_{c_t, n_t, o_t}  E[ Σ_{τ=t}^{T} δ^{τ−t} U(c_τ, n_τ, q_mτ, q_hτ, ε_cτ, ε_nτ, ε_qτ) | Ω_t ]        (6)

where δ is the discount rate and Ω_t is the state space at time t (i.e., the relevant information at time t for making the decision). The maximization is made subject to the following constraints:

c_t = k_t(1 + r) + I_t − C n_ht − Σ_{n=1}^{N_t} a_nt − k_{t+1}
c_t ≥ 0
k_t ≥ 0
n_t ≤ n_{t+1} ≤ n_t + 4
n_t ≤ 5
4 n_{4t} o_t ∈ [0, (1 + r)k_t + I_t − C n_ht]        (7)

The first of these is the budget constraint, where k_t is the level of household assets at time t, r is the real interest rate, I_t is household income at time t, n_ht is the number of children living at home at time t, and a_nt is the amount the household pays toward child n's college expenses at time t. The second and third conditions imply that households have positive consumption and that they cannot take out uncollateralized loans, respectively. The fourth and fifth conditions restrict the child decisions as discussed above, namely that no more than four additional children may be born in a period and that the maximum family size is five children. In the final restriction, where n_{4t} is the number of children of age 4 (i.e., of college age) in period t and o_t is the per-year, per-child offer in period t (so that 4 n_{4t} o_t is the total amount spent if all children of college age in the period attain a college degree), the offer is restricted so that it must be nonnegative and, because the offer is binding, the parents are not allowed to offer more than can be covered without taking out an uncollateralized loan (i.e., by keeping their net assets positive).


The decision timing of the model is as follows. At the beginning of each period the household receives a realization of ε_ct, ε_nt, ε_qt, and ε_wt. After the realization of these shocks, households receive income, pay expenses for children living at home, and then make decisions for the period. In the first three periods, the choices of savings and the number of children to have in that period are made simultaneously. In periods 4–6, if parents have children of college age, the educational support offer and the savings decisions are made sequentially. First, parents choose a per-child, per-year amount to offer their children in educational support. Parents then realize the outcomes of their children's education choices and make the appropriate payments (to enforce the binding offer assumption) before proceeding to choose a level of savings and consumption.

3.2. Model Solution

The household optimization problem defined in Eqs. (6) and (7) can be rewritten in a dynamic programming framework based on the value of entering period t with state space Ω_t, represented by the value function V_t that satisfies the Bellman (1957) equation:

V_t(Ω_t) = max_{c_t, n_t, o_t} [ U(c_t, n_t, q_mt, q_ht, ε_ct, ε_nt, ε_qt) + δ E(V_{t+1}(Ω_{t+1}) | Ω_t, c_t, n_t, o_t) ]

The state space at time t consists of the beginning-of-period-t assets, the number of children born in periods 1, 2, and 3, the number of children with mid- and high-level education, the parents' education level, and the realization of the income shock and the taste shocks on consumption, children, and education. Given the current model assumptions and specification, however, the relevant choice set varies between periods. For example, households only choose additional children in the first three periods. Furthermore, when children are going to college, households face an additional, within-period, sequential choice of an education support offer before making consumption decisions. So, the value functions can be more accurately defined by considering different periods separately.


For the first three periods, families simultaneously choose the number of children and consumption, giving value functions

V_t(Ω_t) = max_{c_t, n_t} [ U(c_t, n_t, 0, 0, ε_ct, ε_nt, ε_qt) + δ E(V_{t+1}(Ω_{t+1}) | Ω_t, c_t, n_t) ],   t = 1, 2, 3        (8)

where the expectation in Eq. (8) is with respect to the stochastic shocks ε_ct, ε_nt, ε_qt, and ε_wt. In periods 4–6, families no longer make a choice about having children. Instead, they first choose an offer of college support for children of college age, and after realizing the educational outcome and making the appropriate payments, choose consumption. As such, the value function becomes

V_t(Ω_t) = max_{o_t} [ E_d(V_{t.5}(Ω_{t.5}) | Ω_t, o_t) ]
V_{t.5}(Ω_{t.5}) = max_{c_t} [ U(c_t, n_t, q_mt, q_ht, ε_ct, ε_nt, ε_qt) + δ E(V_{t+1}(Ω_{t+1}) | Ω_{t.5}, c_t) ],   t = 4, 5, 6        (9)

where the ".5" sub-period indicates the sequential nature of the offer and consumption decisions. The E_d expectation is taken over the educational outcomes of the children of college age in period t. That outcome and the associated college payments are updated into state space point Ω_{t.5} before the consumption decision is made. After period 6, the only choice households make in a period is an amount to consume, so the value function is simply

V_t(Ω_t) = max_{c_t} [ U(c_t, n_t, q_mt, q_ht, ε_ct, ε_nt, ε_qt) + δ E(V_{t+1}(Ω_{t+1}) | Ω_t, c_t) ],   t > 6        (10)

In the final period, V_T consists only of the contemporaneous utility portion (i.e., V_{T+1}(·) = 0).

The finite dynamic programming model laid out in Eqs. (8)–(10) can be solved using backward induction. However, since there is no analytic solution for this model, the solution is obtained with numerical techniques. The method used here is based on one proposed and used by Keane and Wolpin (1994, 1997, 2001). To solve the model one must determine the expectations of the next-period value functions, that is, E(V_{t+1}(Ω_{t+1})), which following Keane and Wolpin (1994) can be referred to as Emax_t. The expectation for Emax_t involves multiple integration and is approximated using Monte Carlo integration. Emax_t needs to be solved for every possible combination of state space values in Ω_{t+1}.


In this model Ω_{t+1} ∈ R₊ × N × N × N × Q × Q, with assets being a positive real number, the number of children born in periods 1, 2, and 3 each an element of N = {0, 1, …, 4}, and the number of children with mid-level and high-level education each an element of Q = {0, 1, …, 5}. Since the household education level does not change, it can be suppressed from the state space and the model solved separately for each type.

The specification restrictions in the model on the fertility process limit the number of possible combinations of children born in different periods and their educational attainment to a manageable number of state points for which to solve Emax_t. However, since assets are a continuous variable, the solution still cannot be calculated for every state variable combination, and even if assets were to be discretized in a reasonable way, the resulting state space would still be very large. This dimensionality problem is exacerbated when the model is estimated, because the solution must be recalculated many times during the estimation optimization. To overcome this, the model is solved for a subset of asset values, along with all possible combinations of children and education levels, and least squares is used to interpolate the remaining Emax_t values. For V_{t.5}(·) in periods 4–6 and for V_t(·) in periods other than 4–6, the interpolation is based on the regression of directly solved Emax values on the contemporaneous utility evaluated at the means of the stochastic components, and for V_t(·) in periods 4–6 on a quadratic in assets.⁹
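A stylized sketch of these two numerical devices, Monte Carlo approximation of Emax_t at a grid of asset values and least-squares interpolation in between (the next-period value function v_next, the shock simulator, and the quadratic-in-assets interpolant are simplifying placeholders, not the author's implementation):

import numpy as np

def emax_monte_carlo(v_next, state, draw_shocks, n_draws, rng):
    # Approximate E[V_{t+1}(Omega_{t+1})] by averaging the next-period value
    # over simulated draws of the income and taste shocks.
    return np.mean([v_next(state, draw_shocks(rng)) for _ in range(n_draws)])

def interpolate_emax(asset_grid, emax_values):
    # Fit the directly solved Emax values with a quadratic in assets (least
    # squares) and return a function for arbitrary asset levels.
    asset_grid = np.asarray(asset_grid, dtype=float)
    X = np.column_stack([np.ones_like(asset_grid), asset_grid, asset_grid ** 2])
    coef, *_ = np.linalg.lstsq(X, np.asarray(emax_values, dtype=float), rcond=None)
    return lambda k: coef[0] + coef[1] * k + coef[2] * k ** 2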

Since there are no data on households beyond their late 50s, there is still a need to specify and estimate a terminal condition of some sort to fit and motivate the life-cycle decisions. The specification used here mimics a full life-cycle model: income after period 7 (after age 63) is modeled as a portion of expected average lifetime income plus a random shock, and the final end point is set to T = 11 (corresponding to a model end at age 84). The portion parameter and shocks are allowed to differ by household education, as shown in Appendix A, and these parameters dictating the terminal condition are estimated along with the others of the model.

3.3. Estimation

The model is estimated using a simulated maximum likelihood procedure.¹⁰ For a given set of parameter values, a likelihood function value can be constructed based on simulations from the solution to the dynamic programming problem, and from this the optimal parameters can be found via numerical optimization. Several previous studies have utilized a similar estimation procedure for discrete choice models.¹¹


Since the model in this chapter contains two continuous choices, the exact same procedure cannot be followed.¹² However, the same concept applies; instead of a probability simulator, a nonparametric kernel density estimator is used with the simulated samples to construct the elements of the likelihood function.

For a single household i, the data provide an observable sequence of state point values for assets, children, spending on college, and children's educational outcomes. Since the household decisions relating to these items in the model depend only on the current state space variables and exogenous, independently distributed shocks, household i's contribution to the likelihood, L_i, can be written as a sequence of conditional densities:

L_i = Π_{t=0}^{T−1} f(Ω_{it+1} | Ω_it)        (11)

where f(·) is the pdf of Ω_{t+1} conditional on Ω_t. The sample likelihood is then calculated as the product of these individual likelihoods. Calculating the likelihood is still a problem because the functional form of f(·) is unknown. Given a set of parameters, however, the model can be solved and therefore a sample of values for Ω_{t+1}, given a value of Ω_t, can be simulated. From this simulated sample, a density estimator can be calculated and used to estimate the value of f(Ω_{t+1} | Ω_t).
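A minimal sketch of this step (the model simulator simulate_next_state and the kernel density estimator are hypothetical stand-ins; the particular kernels used in the chapter are given in Eqs. (12)–(15) below):

import numpy as np

def simulated_conditional_density(observed_next, state, simulate_next_state,
                                  density_estimator, n_sims, rng):
    # Simulate a sample of next-period states from the solved model,
    # conditional on the current state, and evaluate a kernel density
    # estimator at the observed next-period state.
    sample = np.array([simulate_next_state(state, rng) for _ in range(n_sims)])
    return density_estimator(observed_next, sample)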

There is a wide literature on density estimation techniques for continuous and mixed variable distributions, and this chapter adopts a smooth kernel density estimator.¹³ Traditional kernel estimators can be inaccurate and difficult (if not impossible) to implement in higher-dimensional spaces. In the model here, however, no more than two observed state space values change between any two observed periods for which the conditional density must be estimated; the remaining values are fixed. As such, it is never necessary to estimate a joint density for more than two variables, which can be done accurately. After period 6, the number of children and their education levels are fixed and only assets change, requiring just a one-dimensional conditional density of next-period assets. In the education periods, 4–6, the number of children is fixed, but there is an offer decision, and education levels and assets are changing. Since the decisions are made (and outcomes realized) sequentially, though, there are really three independent conditional densities for these periods. First, a one-dimensional density estimator is needed for the offer given the initial state. Next, a two-dimensional estimator is used for the changing mid- and high-level educational attainment of children given the offer and initial state, and finally a one-dimensional estimator of next-period assets. In the first three periods, only the number of children born in the current period and assets are changing, requiring only a bivariate density estimator.


For a univariate continuous density estimator, a standard Gaussian kernel is used:

K(s, S_i; h) = (1/√(2π)) exp( −(1/2) ((s − S_i)/h)² )        (12)

with Silverman's (1996) plug-in value for an optimal window width for the Gaussian kernel, h = 1.06 ŝ n^(−1/5), as the smoothing parameter, where ŝ is the standard deviation of the variable in the sample data and n here is the sample size.¹⁴ The estimated density is then

f̂(s) = (1/(nh)) Σ_{i=1}^{n} K(s, S_i; h)        (13)

A bivariate product kernel form (see Scott, 1992) is used for bivariate densities:

f̂(s₁, s₂) = (1/(n h₁ h₂)) Σ_{i=1}^{n} K₁(s₁, S_{i1}; h₁) K₂(s₂, S_{i2}; h₂)        (14)

The bivariate densities in the likelihood are for either two discrete variables (the two educational outcome categories in periods 4–6) or a mix of a continuous variable (assets) and a discrete variable (new children in periods 1–3). For the continuous portion I again use the standard Gaussian kernel in Eq. (12). The discrete variables here are not just categorical, but also ordinal. To take advantage of this, I utilize a variant of the Habbema kernel, which was found to be highly effective by Titterington and Bowman (1985), with

K(s, S_i; h) = λ^|s − S_i|        (15)

where λ is a smoothness parameter. In this kernel, h is set to an appropriate weight by setting h = Σ_{j=0}^{J−1} λ^j, where J is the number of discrete distances possible between s and the other data points in the sample. The amount of smoothing is then controlled by λ. For estimation, λ is set to 0.3.¹⁵
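A sketch of these estimators in code (a simple transcription of Eqs. (12)–(15) under the stated bandwidth choices; the default of J = 6 possible discrete distances is an illustrative assumption):

import numpy as np

def gaussian_kernel(s, s_i, h):
    # Gaussian kernel of Eq. (12).
    return np.exp(-0.5 * ((s - s_i) / h) ** 2) / np.sqrt(2.0 * np.pi)

def silverman_bandwidth(sample):
    # Plug-in window width h = 1.06 * sd * n^(-1/5).
    return 1.06 * np.std(sample, ddof=1) * len(sample) ** (-0.2)

def ordinal_kernel(s, s_i, lam=0.3, n_distances=6):
    # Habbema-type ordinal kernel of Eq. (15); the weight h is the sum of
    # lam^j over the J possible discrete distances.
    h = sum(lam ** j for j in range(n_distances))
    return lam ** np.abs(s - s_i), h

def univariate_density(s, sample):
    # Univariate estimator of Eq. (13) with the Gaussian kernel.
    sample = np.asarray(sample, dtype=float)
    h = silverman_bandwidth(sample)
    return gaussian_kernel(s, sample, h).sum() / (len(sample) * h)

def mixed_bivariate_density(s1, s2, sample1, sample2):
    # Product-kernel estimator of Eq. (14) for a continuous variable (assets,
    # Gaussian kernel) and an ordinal discrete variable (Habbema kernel).
    sample1 = np.asarray(sample1, dtype=float)
    sample2 = np.asarray(sample2)
    h1 = silverman_bandwidth(sample1)
    k1 = gaussian_kernel(s1, sample1, h1)
    k2, h2 = ordinal_kernel(s2, sample2)
    return (k1 * k2).sum() / (len(sample1) * h1 * h2)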

The estimation procedure works as follows. First, an initial guess of the parameters is made and the model is solved for these parameters. The value of the likelihood function is then constructed from the model solution using the simulated density estimators of Eqs. (12)–(15).


The likelihood is checked to see whether it is maximized; if not, the guess is updated and the procedure repeats. See Appendix B for more detail on these estimation steps.
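Schematically, the loop can be written as below (solve_model and household_likelihood are hypothetical helpers standing in for the dynamic programming solution and the density-based likelihood contributions; an off-the-shelf optimizer replaces the numerical search actually used):

import numpy as np
from scipy.optimize import minimize

def negative_log_likelihood(theta, data, solve_model, household_likelihood):
    # Solve the dynamic program at the candidate parameters, then build the
    # simulated likelihood from the kernel density estimators, Eqs. (12)-(15).
    solution = solve_model(theta)
    return -sum(np.log(household_likelihood(obs, solution)) for obs in data)

def estimate(theta0, data, solve_model, household_likelihood):
    # Update the parameter guess until the simulated likelihood is maximized.
    result = minimize(negative_log_likelihood, np.asarray(theta0, dtype=float),
                      args=(data, solve_model, household_likelihood),
                      method="Nelder-Mead")
    return result.x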

4. THE DATA

4.1. Sample Construction and Variable Definitions

The data are taken from the Young Women cohort of the NLS, which consists of 5,159 women who were 14–24 years old in 1968. Surveys were administered every year from 1968 through 1973 and basically biennially since. The results here use a sample including only the years through 1999 in order to allow the baseline parameters of the model to be estimated for a time period where the existence of recent education policy initiatives would not have had a sizeable impact. When matching the 6-year period in the model with the data, the first period is matched to ages 18–23 in a household's life, the second to ages 24–29, and so on. The age of the household is measured as the age of the woman followed in the NLS data.

While the NLS collects data on a wide variety of topics, the relevant data used in this chapter are on assets, income, children, education, marital status, children's education, and spending on children's education. Assets are measured as total net household assets, excluding vehicles. Comprehensive questions on assets were only asked in 1968, 1971–1973, 1978, 1983, 1988, 1993, 1995, 1997, and 1999. If more than one observation of assets is available within a 6-year period of the model, the earlier dated value is used because the model treats the assets as the assets available when entering a period. Assets are frequently subject to measurement error when information is collected by survey, so a random measurement error is specified as shown in Appendix A.

Income is measured as the total income of a woman and her spouse. Income information is collected with every survey, though with some variation in detail. To fit the 6-year periods, the income amount is calculated as six times the average of the annual income observations within the appropriate age range.

Education information for the respondents and spouses is updated in every survey, along with marital status. The parent's education enters the model in the wage equation and as a determinant of the education attainment of children. Since the model assumes that the level of education is constant for a household's lifetime, the household education level is measured as the highest level of education attained by either parent in a household. The education level is then grouped into one of two categories, those with a college degree and those without, giving two education "types" of households, with 56 percent having no degree. The limit to two education categories is because the sample sizes within more subdivided education groups become very small. Marital status, like parental education, is also assumed to be constant in the model. As explained previously, the sample is restricted to continuously married households, defined as households in which the respondent married before age 36 and reported being married in every subsequent interview. This should be taken into account when trying to draw too great an implication from the exact point estimates of the model, though perhaps the general implications would be robust even to some less stable households.

In every interview, information is collected for each of the respondent's children living in the household, including age but not education. Furthermore, in 1999 a full child roster was collected, which includes children's birth dates and level of education. From this information, children can be identified as being born in the first, second, or third period of the model, and the appropriate educational attainment category can also be assigned for each child. As previously mentioned, the model assumes households can only have children in the first three 6-year periods, that they have no more than five children, and no more than four within a given period. Only 10.78 percent of the NLS sample violates one or more of these three restrictions and will not be used. This restriction, while perhaps not a large loss of data, should be kept in mind when considering the types of households for which the results are estimated.

Beginning in 1991 and subsequently in 1993, 1995, 1997, and 1999, the survey includes questions about the college enrollment of children and the amount of financial support parents provide toward college for each child within the past 12 months, though there is no data on the total amount spent on a child's postsecondary education. The offer in the model is measured as the average annual contribution parents made toward postsecondary education for children born in the appropriate period. The amount of total education spending is then computed as four times the offer for children earning a 4-year college degree, and two times the offer for those children with some college. While this total is not matched to any particular figure in the data, it is used to update the household's evolving assets.

The original NLS of Young Women sample began with 5,159 respondents. The sample used in this chapter is restricted to those women who remained in the data set through 1999 and reported information on all relevant variables. One impact of this is that it disproportionately excludes the older women in the sample, because data on contributions toward their children's education were only gathered starting in 1991, meaning some of these women had children already educated before any information on their contributions was collected. However, there is no reason to suspect that these women are systematically different from the slightly younger women in the cohort. The sample is further limited to those women who meet the fertility constraints and stable marriage criteria outlined above, and outlier observations for assets were removed.16 This leaves a sample of 556 households with 3,139 household-period observations.

4.2. Descriptive Statistics

Table 1 shows some descriptive statistics for the data. Asset accumulation shows a typical average increase with age and a large variance, with households with a college degree having higher assets. When looking at the number of children, less-educated households have more children, averaging 2.4 per household versus 2.2 for households with a college degree. Not surprisingly, the timing of when to have children is also very different. College-educated households have very few children during ages 18 through 23, averaging 0.5 per household. Conversely, this is when households without a college degree have the most children, averaging 1.2 children per household. At older age ranges, it is the college-educated households who have more children.

For the entire sample, 82 percent of households with children attending college helped pay for college. This rises to almost 95 percent for parents with a college degree, and is still 68 percent for parents without a college degree. The average per-year contribution made by families to financially support their children in college was $4,274 in 1999 dollars ($2,054 for parents without a college degree and $6,336 for those with a college degree). This indicates a significant amount of money going toward higher education, especially when multiplied over several years and several children.

Looking at the distribution of education attainment for children overall, 42 percent do not attend college, 36 percent attend some college but do not earn a 4-year degree, and 22 percent earn a 4-year degree or more. This varies by parental education, with the percent earning a 4-year degree rising to 37 percent for parents with a college degree, while falling to 13 percent for parents without a college degree. This relationship between parents' and children's education has been consistently documented (see Haveman & Wolfe, 1995 for a review).

5. PARAMETER ESTIMATES AND MODEL FIT

The estimated model parameters are reported in Table 2. There are a total of 49 parameters in the specification as summarized in Appendix A, estimated over the 3,139 household-period observations in the sample. A few of the

Table 1. Descriptive Statistics.

                                          All         No College    College

Number of households                      556         311           245

Assets at the age:^a,b
  24–29                                   24,140      19,291        30,296
                                          (12,140)    (5,628)       (17,887)
  30–35                                   59,858      45,943        77,523
                                          (41,957)    (28,166)      (56,912)
  36–41                                   85,820      62,472        115,458
                                          (56,331)    (38,024)      (80,958)
  42–47                                   131,628     89,729        184,814
                                          (85,364)    (57,938)      (137,546)
  48–53                                   218,947     157,730       304,100
                                          (133,470)   (85,000)      (213,700)

Children^a                                2.313       2.395         2.208
                                          (1.213)     (1.293)       (1.098)
Children born when 18–23                  0.896       1.225         0.478
                                          (1.040)     (1.110)       (0.761)
Children born when 24–29                  0.890       0.826         0.971
                                          (0.878)     (0.863)       (0.894)
Children born when 30–35                  0.527       0.344         0.759
                                          (0.826)     (0.714)       (0.898)

Contributions to college^a,b              4,273       2,054         6,336
                                          (5,649)     (3,157)       (6,603)
Percent contributing to college           81.71%      67.72%        94.71%

Children education attainment
  Percent with no college                 42.37%      52.49%        23.10%
  Percent with some college               35.73%      33.72%        39.56%
  Percent with 4-year degree or more      21.90%      13.79%        37.34%

^a Means with standard deviation in parentheses below.
^b Dollar amounts are 1999 dollars.


parameter estimates have a direct interpretation of some interest and provide an indication of the appropriateness and fit of the model. The discount rate for a 6-year period, δ, is estimated at 0.8185, which is the equivalent of 0.967 per year, a reasonable discounting rate. The child-cost estimates, Ch for parents without a college degree and Ch + Cc for parents with a college degree, are $42,403 and $79,396, respectively, per 6-year

Table 2. Parameter Estimates.

Utility function
  γ           λ1h         λ11h        λ12h        λ13h        λ1c         λ11c        λ12c        λ13c
  2.7463      0.5350^a    6.2868^a    8.6061^a    1.7350^a    0.2882^a    -0.8445^a   -3.1622^a   4.2669^a
  (0.0262)    (0.3537)    (0.0557)    (1.7222)    (0.2395)    (0.0812)    (0.1068)    (0.3211)    (1.4495)

  λ2h         λ2c         λ3          α1h         α1c         α2h         α2c         θ1h         θ1c
  -0.7749^a   -0.1622^a   0.0011^b    0.0099^a    -0.0059^a   -0.9071^b   0.5393^b    0.0228^a    -0.0103^a
  (0.5963)    (0.4515)    (0.0054)    (0.0000)    (0.0001)    (1.1773)    (0.5104)    (0.0001)    (0.0002)

  θ2h         θ2c
  1.3717^b    0.7994^b
  (0.6621)    (1.1186)

Children's education
  μ1          μ2          κ1          κ2          κ3          κ4          κ5
  0.0722      1.3001      1.1021^c    -3.8651^a   0.4039      0.7197^c    0.0989^a
  (0.0123)    (0.0155)    (0.0869)    (1.5151)    (0.0157)    (0.0533)    (0.7014)

Income
  β0h         β1h         β2h         β3h         β0c         β1c         β2c         β3c
  11.1928     0.3275      0.9272      -0.3938     -0.1705     0.0068      -0.9460     0.1911
  (0.2754)    (0.0382)    (0.0025)    (0.0952)    (0.1891)    (0.0303)    (0.0002)    (0.0734)

  βh          βc
  0.9663      -0.0101
  (0.3240)    (0.1945)

Other parameters
  δ           Ch          Cc
  0.8185      42403       36993
  (0.0113)    (9504)      (5827)

Error distribution
  σc          σn          σq          σwh         σwc         σrh         σrc         ση
  0.0998      8.0146^a    1.4151^c    0.5451      0.8381      22896       1.8436      0.0301
  (0.0068)    (0.9691)    (0.8247)    (0.0109)    (0.0023)    (4031)      (0.2690)    (0.0911)

Parameter estimates with standard errors in parentheses below.
^a Parameter multiplied by 10^9. ^b Parameter multiplied by 10^12. ^c Parameter multiplied by 10^4.


period. This is reasonably in line with other estimates. For example, adjusting the estimates based on USDA surveys (see Lino, 1996) to 6-year amounts gives $35,630 for households in the lowest third of income, $48,410 for households in the middle third of income, and $70,610 for households in the highest third of income. The estimate for γ, the coefficient of relative risk aversion, is 2.8, which is within the range of prior estimates, though those estimates generally range quite widely. Last, the estimated taste parameters for children's education, while not of interest in size themselves, are suggestive in their relation between college-educated households and less-educated households. The utility of additional education is greater for less-educated households (i.e., α1c < 0, θ1c < 0), but the decrease in marginal utility is more rapid for such households (i.e., θ2c > 0).

Table 3 shows the predicted probabilities, based on the estimated model parameters, for a child's education level for different amounts of parental financial support and different parental education levels. These fit with the well-established link that more educated parents tend to have more highly educated children at all levels of financial support. For example, evaluated at the entire-sample annual average contribution of $4,273 per year, children with a college-educated parent have a 31 percent chance of earning a 4-year degree versus 18 percent for children without a college-educated parent. The gap is even wider when factoring in the fact that more educated households contribute more money toward their children's education. For example, the probability of a child earning a 4-year degree is only 14 percent for parents without a college degree contributing the annual average of $2,054 for that group. For college-educated parents contributing the annual average of

Table 3. Estimated Education Outcome Probabilities.

                                          Parental Contribution
                                 $0        $2,054     $4,273     $6,336     $10,000

Parents without college degree
  Probability no college         52.88%    44.52%     37.14%     31.88%     26.00%
  Probability some college       37.44%    41.70%     44.45%     45.66%     46.06%
  Probability 4-year degree       9.68%    13.78%     18.41%     22.45%     27.94%

Parent with college degree
  Probability no college         37.00%    29.40%     23.21%     19.08%     14.75%
  Probability some college       44.49%    45.97%     45.79%     44.72%     42.42%
  Probability 4-year degree      18.51%    24.63%     31.00%     36.20%     42.84%


$6,336 for that group, the probability of a child earning a 4-year degree rises to 36 percent, an increase of 22 percentage points.

If increasing parental contributions are indeed to have an impact on educational attainment, then the model estimates should indicate that greater financial support has a significant impact on the education outcome probabilities. As seen in Table 3, increasing the parental contribution does change the probability distribution of education outcomes and raises the probability of a child earning a 4-year degree for all households. Evaluated at annual support of $0, $2,054, $4,273, $6,336, and $10,000 (the means for no-college households, the entire sample, and college households, plus two more extreme values), the probability of earning a college degree increases from 10 to 13, 18, 22, and up to 28 percent for children without a college-educated parent. A similar increase is seen at those support levels for children of parents with a college degree, rising from 19 to 25, 31, 36, and 43 percent, respectively.17 This suggests that a policy that successfully increases parental contributions to education should also increase the final educational attainment of children, though the marginal impact is nonlinear (notice that the almost $4,000 increase from $6,336 to $10,000 increases the probability of a college degree by about the same amount as the prior $2,000 increase) and the overall impact of such policies would need to take into account all of the related decisions in the model.

Before turning to the policy simulations to evaluate educational savings accounts, consider some information on the fit of the model. Table 4 presents some summary statistics for the actual data and as predicted by a simulated sample of 10,000 households based on the model and parameter estimates. As the table shows, the model does fairly well in matching the average characteristics in the data. The model captures the increasing accumulation of assets, though it slightly understates average asset levels for younger households, overstates them in middle age, and understates them again as households move into their 50s. Still, the mean levels are not too far off in general, and only for young households with a college degree can we reject the null hypothesis of equal means at a 10 percent level. The model also does fairly well in predicting the average number of children, 2.31 in the data versus 2.28 in the simulation, and does equally well for different ages and education levels, and at no point can we reject the null hypothesis of equal means.

The model does a reasonably good job in matching the percentage of children with different levels of education, and statistically you cannot reject the hypothesis that they are the same distribution. The model simulation also reasonably matches the average amounts contributed by households to their


Table 4. Actual and Predicted Outcomes.

                                                  Actual       Predicted    P-Value*

All households
  Assets^a (mean by age)
    24–29                                         24,140       21,537       (0.118)
    30–35                                         59,858       57,374       (0.416)
    36–41                                         85,820       86,872       (0.921)
    42–47                                         131,628      138,429      (0.296)
    48–53                                         218,947      216,318      (0.996)
  Children (mean)                                 2.313        2.284        (0.279)
  Children born when 18–23 (mean)                 0.896        0.888        (0.523)
  Children born when 24–29 (mean)                 0.890        0.879        (0.708)
  Children born when 30–35 (mean)                 0.527        0.521        (0.716)
  Contributions to college^a (annual mean)        4,273        4,414        (0.296)
  Percent contributing                            81.71        74.45        (0.000)
  Children education attainment (percent)                                   (0.378)
    No college                                    42.37        40.56
    Some college                                  35.73        36.28
    4-year degree                                 21.90        23.16

By parent education

Households without college degree
  Assets^a (mean by age)
    24–29                                         19,291       18,778       (0.912)
    30–35                                         45,943       43,647       (0.423)
    36–41                                         62,472       60,422       (0.576)
    42–47                                         89,729       98,867       (0.196)
    48–53                                         157,730      155,947      (0.889)
  Children (mean)                                 2.395        2.370        (0.457)
  Children born when 18–23 (mean)                 1.225        1.236        (0.849)
  Children born when 24–29 (mean)                 0.826        0.803        (0.588)
  Children born when 30–35 (mean)                 0.344        0.332        (0.644)
  Contributions to college^a (annual mean)        2,054        2,215        (0.523)
  Percent contributing                            67.72        62.37        (0.137)
  Children education attainment (percent)                                   (0.916)
    No college                                    52.49        52.41
    Some college                                  33.72        34.24
    4-year degree or more                         13.79        13.35

Households with college degree
  Assets^a (mean by age)
    24–29                                         30,296       26,151       (0.081)
    30–35                                         77,523       75,998       (0.734)
    36–41                                         115,458      118,981      (0.719)
    42–47                                         184,814      190,731      (0.501)
    48–53                                         304,100      300,142      (0.772)
  Children (mean)                                 2.208        2.119        (0.404)
  Children born when 18–23 (mean)                 0.478        0.445        (0.241)
  Children born when 24–29 (mean)                 0.971        0.967        (0.937)
  Children born when 30–35 (mean)                 0.759        0.761        (0.947)
  Contributions to college^a (annual mean)        6,336        6,410        (0.889)
  Percent contributing                            94.71        83.41        (0.000)
  Children education attainment (percent)                                   (0.618)
    No college                                    23.10        24.07
    Some college                                  39.56        38.40
    4-year degree                                 37.34        36.99

*P-values are the probability of type-1 error for a t-test with unequal variances for the difference in means and for a chi-square test of same distribution for education attainment.
^a Dollar amounts in 1999 dollars.


children's college education. As Table 4 shows, overall the simulated sample average is $4,414 per-child, per-year, while the actual data average is $4,273, but the difference is not statistically significant. The mean comparison is equally good when looking within different education levels. However, the model does noticeably under-predict the percentage of parents contributing (74 percent predicted vs. 82 percent in the data). The discrepancy is statistically significant and holds across parent education levels. A further investigation into the distribution of the offers shows higher variation in the predicted offers than in the data. In particular, in many cases the model tends to predict an offer level of zero when the data often have a very small, but positive, offer, and the model tends to predict slightly more high-end offers.

6. POLICY SIMULATION

This section uses the model parameter estimates and a simulation of 10,000 households, proportioned by education as in the data, to examine how their savings and educational contribution decisions would change, and the impact of these changes on children's education attainment, when different policies are implemented.18 In particular, a tax-advantaged ESA is introduced. Money invested in the ESA is allowed to grow tax-free and the earnings are not taxable as long as they are spent on education.19 Any amount not spent on education becomes taxable. Since taxes are not directly modeled here, the impact of the tax break is modeled as an increase in the return on savings equal to the household's marginal tax rate.20 The penalty for assets in the ESA not spent on education is an amount equal to the current marginal tax rate times the account balance. Parents can only contribute to these accounts if they have children and are not allowed to contribute more than $7,500 per-child, per-year. For comparison, both a direct college subsidy and a college spending tax credit are also considered. The subsidy is a flat $1,000 education grant. The impact in the model is approximated as a default offer to all children, so that, for example, when parents offer nothing, the child still has a $1,000 offer. Finally, a tax credit available on all money spent on children's education is simulated. With no direct taxes in the model, the impact is that parents do not actually have to pay part of their offered contribution when children go to school, the difference being equal to their marginal tax rate times the contribution. Table 5 presents simulated sample statistics for the different policies, with the base simulation being the original model as estimated and discussed above.

The ESA actually gives the largest increase in contributions from parents. Over all households, the average annual, per-child parental contribution increases by $2,097. The increase is larger for college-educated households, but is still a sizeable $1,603 increase for less-educated parents. The increase is not just from giving a tax break to households already contributing toward college expenses. All household groups show an increase in the percentage of parents contributing: from 84 to 91 percent for parents with a college degree and from 62 to 76 percent for parents without a college degree. Furthermore, the accounts do actually generate a net increase in savings, as can be seen in Table 5, which also shows that the ESAs reach a sizeable average of $48,000 at their peak before parents enter their 50s. The policy goal, presumably, is to increase education attainment, not just contributions. With the higher contributions and higher contribution rate, the percentage of children earning a 4-year degree increases by 4.78 percentage points, while the percentage without any college education falls by 4.68. The impact, particularly on the probability of a 4-year degree, is slightly larger for college-educated households, at a 5.69 percentage point increase. The cost of this program, in the form of lost revenue from taxable earnings on the education savings, comes to almost $6,000 per household. However, much of this lost revenue is on earnings from savings that were not present in the base scenario, making this estimate hard to compare. If the amount of savings is held to the base simulated levels, the lost revenue is just $2,893 per household.


Table 5. Policy Simulation Outcomes.

                                                  Base        ESA         Grant       Tax Credit

All households
  Annual contributions to college (mean)          4,414       6,511       3,696       5,763
  Percent of parents contributing                 71.55       82.88       55.21       77.55
  Children education (percent)
    No college                                    40.56       35.88       35.81       38.10
    Some college                                  36.28       36.18       37.67       36.64
    4-year degree or more                         23.16       27.94       26.52       25.35
  Children (mean)                                 2.28        2.34        2.29        2.28
  Total assets (mean by age)
    24–29                                         21,573      24,379      21,086      22,015
    30–35                                         57,374      75,125      57,934      59,101
    36–41                                         86,872      101,277     84,956      88,854
    42–47                                         138,429     149,472     139,246     138,022
    48–53                                         216,318     224,015     217,818     218,997
  Educational savings (mean by age)
    24–29                                                     1,511
    30–35                                                     13,447
    36–41                                                     31,638
    42–47                                                     48,573
    48–53                                                     36,216

By parent education

Household without college degree
  Parental contributions to college (mean)        2,215       3,818       1,372       2,966
  Percent of parents contributing                 62.37       76.10       44.12       70.26
  Children education (percent)
    No college                                    52.41       47.90       46.15       49.26
    Some college                                  34.24       35.23       37.13       35.32
    4-year degree or more                         13.55       16.87       16.72       15.42
  Children (mean)                                 2.37        2.42        2.39        2.37
  Total assets (mean by age)
    24–29                                         18,778      22,278      19,647      19,024
    30–35                                         43,647      53,222      44,005      44,291
    36–41                                         60,422      78,654      58,851      59,921
    42–47                                         98,867      108,638     98,807      97,553
    48–53                                         155,947     160,272     156,264     154,997
  Educational savings (mean by age)
    24–29                                                     685
    30–35                                                     7,267
    36–41                                                     26,002
    42–47                                                     38,814
    48–53                                                     20,370

Household with college degree
  Parental contributions to college (mean)        6,410       8,939       5,924       8,007
  Percent of parents contributing                 83.41       91.21       72.10       86.04
  Children education (percent)
    No college                                    24.07       20.11       21.48       23.31
    Some college                                  38.40       37.21       38.65       37.51
    4-year degree or more                         36.99       42.68       39.87       39.18
  Children (mean)                                 2.12        2.20        2.11        2.12
  Total assets (mean by age)
    24–29                                         26,151      27,011      25,656      26,959
    30–35                                         75,998      98,129      76,624      79,824
    36–41                                         118,981     128,518     117,644     122,351
    42–47                                         190,731     202,620     191,449     191,728
    48–53                                         300,142     312,605     301,333     302,975
  Educational savings (mean by age)
    24–29                                                     2,202
    30–35                                                     18,531
    36–41                                                     38,532
    42–47                                                     62,041
    48–53                                                     51,268


For a comparison of the magnitude of these results, consider also the grant and the tax credit. These policies are much more in line with those examined in previous studies discussed in the introduction, though again the model here more directly considers the impact on parental behavior. The $1,000 grant has the smallest impact on the average amount contributed by parents, and the average is lower than in the base simulation. However, that is to be expected since every child is in effect already receiving a $1,000 per-year contribution. Interestingly, the average contribution from parents does not decline by a full $1,000, and on net the annual per-child contribution increases by about $300. The percentage of parents contributing also falls noticeably, but again 100 percent of children are receiving a $1,000 contribution from the new grant. So while only 55 percent of parents contribute above and beyond the grant, a decline of just over 16 percentage points from the percentage contributing before the grant, there is still a sizeable impact on the distribution of education outcomes for children. The percentage of children with no college education falls by almost 5 percentage points. This is consistent with the 4 and 4.5 percent attendance increases estimated by Dynarski (2002) and Ichimura and Taber (2002), respectively, and gives some additional validation for the model. The impact is actually greater for children from less-educated households, where the percentage with no college education falls by over 6 percentage points, while the percentage with a 4-year college degree rises by 3.4 percentage points. The impact on education outcomes here does differ from the ESA. In particular, the grant appears slightly more effective at reducing the probability of children having no college education at all, particularly for less-educated households. However, the ESA has a larger impact on increasing the probability of earning a 4-year degree, as it increased the number of households making sizeable contributions of support. The expected cost of the grant here comes to $4,146 per household, equal to the middle of the range of the estimated costs of the ESA.

The tax credit generates a larger increase in average contributions than the grant, but still less than the ESA. It also increases the percentage of parents contributing, but again not by as much as the ESA policy, and the net impact on education outcomes is the lowest here. Compared to the grant, which had a somewhat larger impact on less-educated households, the tax credit has a larger impact on households where parents have a college degree. Still, the impact is less on average for all households, and the expected cost of the credit is $5,775 per household, as costly as the highest cost range estimate for the ESA policy.

7. CONCLUSION

The model presented here gives structural estimates of a dynamic, life-cycle model with endogenous choices of having children, spending on children's college education, savings, and consumption. The model also allows for borrowing constraints and uncertain lifetime income, and for heterogeneity between parents with different levels of education. The model is solved numerically and estimated with a simulated maximum likelihood procedure using data from the NLS. The estimated model generally captures the features of the data, including that most parents contribute toward their children's college education and that the amounts are sizeable, averaging over $4,000 per-year, per-child. The estimated model is then used to run policy experiments to gauge the impact on children's educational attainment of programs aiming to increase parental support.

The policy simulations suggest that a tax-advantaged ESA would generate new savings and have a sizeable impact on both parental contributions and education attainment. The average increase over the base simulation for contributions was over $2,000 per-year, per-child, and there was an increase in the probability of earning a 4-year degree of over 4 percentage points on average. The impact was slightly greater for households where the parent had a college degree, but the finding was still a 3.36 percentage point increase for less-educated households. The impact of the savings account was generally greater than that of a $1,000 universal grant or a traditional tax credit on money spent on education. The universal grant did have a slightly larger impact at reducing the percentage of children having no college education, particularly among less-educated households, though the ESA still had as large an impact, if not larger, on the probability of earning a 4-year degree, and the estimated cost of the ESA program was not greater than that of the grant. A traditional tax credit for education expenses underperformed the other two policies on all measures.

The limitations of the model used here, though, should be factored in when considering further implications of these results. Due to data limitations and computational complexity, the model was only estimated for two-parent households, and so the results, particularly any exact estimates of the impact of the policies considered, are not universal. However, it seems likely that the general relative implications would be robust to a broader sample and a more complex model, with the results here being at least a good indication, especially considering the current lack of evaluation of the educational impact of ESAs. The model also holds constant a variety of factors that might also change as parents spend more on college education. In particular, there is no specific modeling of the impact of parental contributions on student financial aid or the price of education, outside of the generally estimated relationship between parental support and student educational attainment, which does show a diminishing marginal impact. In this respect, the results could overstate the impact of increased parental support. However, it is not clear that would be the case, as such effects might already be captured in the estimated marginal impact, and might additionally just be played out in the redistribution of college attendance between schools of different costs and qualities; the analysis here does not allow for such differentiation of school type, and there is evidence of substitution between private and public 4-year schools in response to price. Of course, a more detailed interaction of the parents' and children's joint decisions would be a useful extension for future research. Still, the general conclusions regarding the success of ESAs in generating new savings, increasing parental support for education both in amount and in the percentage contributing, and the resulting increase in the probability of children earning a higher-education degree are worth noting, along with their generally strong performance on these aspects relative to some other policy options.


NOTES

1. If you choose to use the money somewhere other than one of the state's public schools, you will get the amount you contributed to the account, but often without any earnings.
2. For example, see Hubbard and Skinner (1996), Poterba, Venti, and Wise (1996) and Engen, Gale, and Scholz (1996).
3. See, for example, Browning and Lusardi (1996), Coleman (1992), and Keane and Wolpin (2001).
4. The random shocks in the utility function allow for unobserved variation in behavior necessary for estimation; otherwise the model gives a unique choice for a given set of state space values. The parameterization allows for complementarities between consumption and the number of children (i.e., λ3), but in the interest of parsimony, such complementarities were not included for the quality variables. In a robustness check, inclusion of such parameters was found to be statistically insignificant.
5. The 6-year period closely corresponds with the frequency with which asset information is collected and the general biennial data collection for the NLS, as will be explained later. Furthermore, the longer periods greatly reduce the computational complexity of the model solution.
6. These restrictions impact only a small portion of the data and significantly ease the computational burden of solving the model. Within the NLS data, only 7.18% of households have children when older than 36, only 1.94% have more than 4 children in any single 6-year period, and only 4.27% have more than 5 children. In all, only 10.78% of the NLS sample violates one or more of these three restrictions.
7. This assumption again significantly eases the computability of the solution and does not impact very many households. In the NLS data only 14.45% of children attending school received support after age 24, and for most of these that support came within the next 2 years.
8. While earnings certainly may be argued to show short-term persistence, the iid assumption is not unreasonable for the longer 6-year periods of this model.
9. Keane and Wolpin (1994) find this style of approximation works well, and investigation in a simplified framework here suggested these interpolations fit well.
10. For a nice summary of simulation-based estimation techniques, see Gourieroux and Monfort (1993).
11. Probability estimators utilizing simulated samples generated from the model are used to estimate discrete choice probabilities of outcomes and, therefore, generate the likelihood that is maximized. For examples of such simulators, see McFadden (1989), Geweke, Keane, and Runkle (1994), Stern (1994), and Keane and Wolpin (1997).
12. Keane and Wolpin (2001) develop an expanded approach for a model that maintains a discrete choice set but allows for additional continuous but unobserved outcome variables.
13. For a summary see Silverman (1986).
14. This choice of a Gaussian kernel was just for its broad use and understanding, and a value for the smoothing parameter (and slight variants) is also proposed and discussed in Hardle (1991) and Scott (1992). In a comparative simulation study of kernel methods, Bowman (1985) finds that such a plug-in parameter selection, while simple, performs well even when compared to more advanced smoothing parameter selection methods. Given the computational difficulty in solving the model itself, no iterative procedure to optimize the smoothing parameter was done, though estimation with a few similar fixed alternatives did not qualitatively change the general results; for the same reason alternative kernels to the Gaussian were not systematically investigated.
15. Setting λ = 0 results in a histogram and setting λ = 1 results in an equal density estimate of 1/J for all values. The value 0.3 is utilized by Titterington and Bowman (1985) and, as in their studies, changing this within a reasonable range does not qualitatively impact the estimates. For more discussion on kernel methods for discrete distributions refer to Aitken (1983).
16. The upper and lower truncation points for outliers differed for each period. In total, 21 observations from below and 28 from above were cut.
17. The full probability distributions at the different offers shown in Table 3 are statistically significantly different, though point probabilities for specific outcomes, most notably some college, may not be. The probability of some college may increase or decrease as offers increase depending on the net impact of the decrease in the probability of no college and the increase in the probability of a 4-year degree.
18. While the tax reform act of 1997 did introduce some policies such as these, their 6-year incremental life-cycle paths up to 1999 would not have been noticeably impacted, assuming that the policy changes were generally unexpected.
19. For this experiment, there are now two types of assets: regular savings and educational savings. The addition of the second asset class does create an additional, continuous, state variable in the model. This creates a much larger state space across which the problem must be solved, though the extended model does not need to be estimated. The same solution procedure is used as before, with the smoothing regressions applied across both asset classes.
20. For tax rates, I use the federal 1999 tax brackets for the marginal rates on married households filing jointly.

ACKNOWLEDGMENTS

The author would like to thank the participants of the 8th Annual Advances in Econometrics Conference, two anonymous reviewers, Ken Wolpin, and seminar participants at the University of New Orleans and Oklahoma State University for helpful comments in writing this chapter.

REFERENCES

Acton, F. S. (1990). Numerical methods that work. Washington, DC: Mathematical Association of America.
Aitken, C. G. G. (1983). Kernel methods for the estimation of discrete distributions. Journal of Statistical Computation and Simulation, 16, 189–200.
Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
Bowman, A. W. (1985). A comparative study of some kernel-based nonparametric density estimators. Journal of Statistical Computation and Simulation, 21, 313–327.
Browning, M., & Lusardi, A. (1996). Household saving: Micro theories and micro facts. Journal of Economic Literature, 34, 1797–1855.
Choy, S. P., Henke, R. R., & Schmitt, C. M. (1992). Parental financial support for undergraduate education. Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Coleman, A. (1992). Household savings: A survey of recent microeconomic theory and evidence. Working paper No. 9808, New Zealand Treasury.
College Board. (2006a). Trends in college pricing. Washington, DC: The College Board.
College Board. (2006b). Trends in student aid. Washington, DC: The College Board.
Dynarski, S. (2000). Hope for whom? Financial aid for the middle class and its impact on college attendance. National Tax Journal, 53, 629–661.
Dynarski, S. (2002). The behavioral and distributional implications of aid for college. American Economic Review, 92, 279–285.
Engen, E. M., Gale, W. G., & Scholz, J. K. (1996). The illusory effects of saving incentives on saving. Journal of Economic Perspectives, 10, 113–138.
Geweke, J., Keane, M., & Runkle, D. (1994). Alternative computational approaches to inference in the multinomial probit model. Review of Economics and Statistics, 76, 609–632.
Gourieroux, C., & Monfort, A. (1993). Simulation-based inference: A survey with special reference to panel data. Journal of Econometrics, 59, 5–33.
Hardle, W. (1991). Smoothing techniques with implementation in S. New York: Springer-Verlag.
Haveman, R., & Wolfe, B. (1995). The determinants of children's attainments: A review of methods and findings. Journal of Economic Literature, 33, 1829–1878.
Hubbard, R. G., & Skinner, J. (1996). Assessing the effectiveness of saving incentives. Journal of Economic Perspectives, 10, 73–90.
Ichimura, H., & Taber, C. (2002). Semiparametric reduced-form estimation of tuition subsidies. American Economic Review, 92, 286–292.
Keane, M. P., & Wolpin, K. I. (1994). The solution and estimation of discrete choice dynamic programming models by simulation and interpolation: Monte Carlo evidence. Review of Economics and Statistics, 76, 648–672.
Keane, M. P., & Wolpin, K. I. (1997). The career decisions of young men. Journal of Political Economy, 105, 473–522.
Keane, M. P., & Wolpin, K. I. (2001). The effect of parental transfers and borrowing constraints on educational attainment. International Economic Review, 42, 1051–1103.
Lino, M. (1996). Expenditures on children by families, 1995 annual report. Washington, DC: U.S. Department of Agriculture, Center for Nutrition Policy and Promotion.
McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica, 57, 995–1026.
Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308–313.
Poterba, J. M., Venti, S. F., & Wise, D. (1996). How retirement saving programs increase saving. Journal of Economic Perspectives, 10, 91–112.
Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley.
Silverman, B. W. (1996). Density estimation for statistics and data analysis. New York: Chapman and Hall.
Stern, S. (1994). Two dynamic discrete choice estimation problems and simulation method solutions. Review of Economics and Statistics, 76, 695–702.
Titterington, D. M., & Bowman, A. W. (1985). A comparative study of smoothing procedures for ordered categorical data. Journal of Statistical Computation and Simulation, 21, 291–312.
U.S. Department of Education. (2005). 2003–2004 national postsecondary student aid study. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

APPENDIX A. MODEL SPECIFICATION

Utility function:

U_t = \frac{(c_t \varepsilon_{ct})^{1-\gamma}}{1-\gamma} + (\lambda_1 + \lambda_2 n_t + \lambda_3 c_t + \varepsilon_{nt}) n_t + (\alpha_1 + \alpha_2 q_{mt} + \varepsilon_{qt}) q_{mt} + (\theta_1 + \theta_2 q_{ht} + \varepsilon_{qt}) q_{ht}

\lambda_1 = \lambda_{1h} + \lambda_{11h} 1(t = 1) + \lambda_{12h} 1(t = 2) + \lambda_{13h} 1(t = 3) + \lambda_{1c} 1(ed \ge college)
            + \lambda_{11c} 1(ed \ge college, t = 1) + \lambda_{12c} 1(ed \ge college, t = 2) + \lambda_{13c} 1(ed \ge college, t = 3)

\lambda_2 = \lambda_{2h} + \lambda_{2c} 1(ed \ge college)

\alpha_1 = \alpha_{1h} + \alpha_{1c} 1(ed \ge college)

\alpha_2 = \alpha_{2h} + \alpha_{2c} 1(ed \ge college)

\theta_1 = \theta_{1h} + \theta_{1c} 1(ed \ge college)

\theta_2 = \theta_{2h} + \theta_{2c} 1(ed \ge college)

Education attainment:

Pr(no college) = \Phi(\mu_1 - S_t)

Pr(some college) = \Phi(\mu_2 - S_t) - \Phi(\mu_1 - S_t)

Pr(college degree) = 1 - \Phi(\mu_2 - S_t)

S_t = \kappa_1 o_t + \kappa_2 o_t^2 + \kappa_3 1(ed \ge college) + \kappa_4 o_t 1(ed \ge college) + \kappa_5 o_t^2 1(ed \ge college)
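The ordered structure above translates directly into code. The Python sketch below is my own illustration of these probability expressions (it does not reproduce the scaling of the κ estimates in Table 2, so any parameter values passed to it are assumptions for illustration only):

    from scipy.stats import norm

    def education_probabilities(offer, college_parent, mu1, mu2, k):
        # Pr(no college)    = Phi(mu1 - S_t)
        # Pr(some college)  = Phi(mu2 - S_t) - Phi(mu1 - S_t)
        # Pr(4-year degree) = 1 - Phi(mu2 - S_t)
        # with S_t as defined above; k = (k1, k2, k3, k4, k5)
        c = 1.0 if college_parent else 0.0
        S = (k[0] * offer + k[1] * offer ** 2 + k[2] * c
             + k[3] * offer * c + k[4] * offer ** 2 * c)
        p_none = norm.cdf(mu1 - S)
        p_some = norm.cdf(mu2 - S) - p_none
        return p_none, p_some, 1.0 - p_none - p_some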

Income function:

I_t = \exp[\beta_{0h} + \beta_{1h} t + \beta_{2h} t^2 + \beta_{3h} 1(t = 1) + \beta_{0c} 1(ed \ge college) + \beta_{1c} t\, 1(ed \ge college)
          + \beta_{2c} t^2 1(ed \ge college) + \beta_{3c} 1(ed \ge college, t = 1) + \varepsilon_{wht} + \ln(\varepsilon_{wct}) 1(ed \ge college)],   t \le 7

I_t = (\beta_h + \beta_c 1(ed \ge college))\, E\!\left[\frac{1}{7} \sum_{t=1}^{7} I_t\right] + \varepsilon_{rht} + \ln(\varepsilon_{rct}) 1(ed \ge college),   t > 7

Other parameters:

Discount factor: \delta
Child costs: C = C_h + C_c 1(ed \ge college)

Measurement error:

assets^{obs}_{it} = assets^{true}_{it} \exp(\varepsilon_{\eta})

Error distributions:

\ln(\varepsilon_{ct}) \sim N(0, \sigma_c^2); \varepsilon_{nt} \sim N(0, \sigma_n^2); \varepsilon_{qt} \sim N(0, \sigma_q^2); \varepsilon_{wht} \sim N(0, \sigma_{wh}^2);
\ln(\varepsilon_{wct}) \sim N(0, \sigma_{wc}^2); \varepsilon_{rht} \sim N(0, \sigma_{rh}^2); \ln(\varepsilon_{rct}) \sim N(0, \sigma_{rc}^2); \varepsilon_{\eta} \sim N(0, \sigma_{\eta}^2)

ed is the household level of education; 1(·) is an indicator function equal to 1 if the expression in parentheses is true and zero otherwise.

APPENDIX B. ESTIMATION PROCEDURE

The estimation procedure begins with initializing values for the parameters of the model. Given these parameters, the expected next-period value functions, as a function of the state and choice variables, can be solved for; that is, Emax_t(\Omega_{t+1}; \Theta) = E(V_{t+1}(\Omega_{t+1})), as explained in Section 3.2, where the vector of model parameters, \Theta, is usually suppressed. The likelihood function is then constructed by simulating the choices of the households. To do so, first draw a set of the four shocks to the utility function (unobserved taste variation in consumption, children, some college, and 4-year college attainment) shown in Eqs. (1) and (2). For the estimation in this chapter 10,000 such draws were made. For each household observation, at each time period, the data give an observable value of the state variables, \Omega_{it} and \Omega_{it+1} (i.e., assets, number of children, education of children, offer of support, etc., as appropriate). For each of the 10,000 shocks, the model gives the household choice of next-period state variables, \tilde{\Omega}_{it+1}, given this period's observed state variables, \Omega_{it}, as the solution to

\max_{\tilde{\Omega}_{it+1}} \; U_{it} + \mathrm{Emax}_t(\tilde{\Omega}_{it+1})   (B.1)

This gives a set of 10,000 simulated values of next-period state variables, each of which is notated as \tilde{\Omega}_{jit+1}, which along with the single observed data value of next-period state variables, \Omega_{it+1}, can be used to construct the simulated likelihood of \Omega_{it+1}. As discussed in Section 3.3, only one or two elements of \Omega will change between two observation points. Denoting these as \Omega^1 and \Omega^2, the simulated likelihood of \Omega_{it+1} can be constructed using Eqs. (13) or (14) as appropriate by

f(\Omega_{it+1}) = \frac{1}{10{,}000\, h} \sum_{j=1}^{10{,}000} K(\Omega^{1}_{it+1}, \tilde{\Omega}^{1}_{jit+1}; h)   (B.2)

if one state value is changing, or if two are changing, by

f(\Omega_{it+1}) = \frac{1}{10{,}000\, h_1 h_2} \sum_{j=1}^{10{,}000} K_1(\Omega^{1}_{it+1}, \tilde{\Omega}^{1}_{jit+1}; h_1)\, K_2(\Omega^{2}_{it+1}, \tilde{\Omega}^{2}_{jit+1}; h_2)   (B.3)

where the kernel functions K(·) are given by Eqs. (12) and (15) as appropriate, and the smoothing parameters h are as set in Section 3.3. The simulated likelihood function is then

L(\Theta) = \prod_{i=1}^{n} \prod_{t=0}^{T-1} f(\Omega_{it+1})   (B.4)

This likelihood function can then be numerically maximized over \Theta, where for each update of the parameters a new set of Emax functions is calculated, from which a new simulated likelihood value is calculated. In this study, parameters were updated using a version of a simplex algorithm originally proposed by Nelder and Mead (1965) and further discussed in Acton (1990).
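The self-contained Python sketch below mirrors the logic of Eqs. (B.2)–(B.4) on a deliberately simplified toy model (a linear asset transition with one shock), since the chapter's full dynamic-programming solution cannot be reproduced here; all names and the toy data-generating process are assumptions for illustration only. As in the procedure above, the shock draws are made once and held fixed across parameter evaluations, and the simulated likelihood is maximized with the Nelder–Mead simplex.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)

    # Toy data: next-period assets follow A' = a*A + sigma*shock (purely illustrative).
    A = rng.uniform(10.0, 100.0, size=300)
    A_next = 1.05 * A + rng.normal(0.0, 2.0, size=300)

    shocks = rng.normal(0.0, 1.0, size=200)   # drawn once, reused at every parameter guess

    def kernel_density_at(x, sims):
        # Eqs. (12)-(13) applied to the simulated outcomes, with the Silverman plug-in width
        h = 1.06 * sims.std(ddof=1) * len(sims) ** (-0.2)
        z = (x - sims) / h
        return np.exp(-0.5 * z ** 2).sum() / (len(sims) * h * np.sqrt(2.0 * np.pi))

    def neg_simulated_loglik(theta):
        a, sigma = theta
        if sigma <= 0.0:
            return np.inf
        ll = 0.0
        for A_i, A_next_i in zip(A, A_next):
            sims = a * A_i + sigma * shocks                        # simulated next-period states
            ll += np.log(max(kernel_density_at(A_next_i, sims), 1e-300))   # cf. Eq. (B.2)
        return -ll                                                 # negative log of Eq. (B.4)

    result = minimize(neg_simulated_loglik, x0=np.array([1.0, 1.0]), method='Nelder-Mead')
    print(result.x)   # should land near the true values (1.05, 2.0)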


ESTIMATING THE EFFECT OF EXCHANGE RATE FLEXIBILITY ON FINANCIAL ACCOUNT OPENNESS

Raul Razo-Garcia

ABSTRACT

This chapter deals with the estimation of the effect of exchange rate flexibility on financial account openness. The purpose of our analysis is twofold: On the one hand, we try to quantify the differences in the estimated parameters when exchange rate flexibility is treated as an exogenous regressor. On the other hand, we try to identify how two different degrees of exchange rate flexibility (intermediate vs floating regimes) affect the propensity of opening the financial account. We argue that a simultaneous determination of exchange rate and financial account policies must be acknowledged in order to obtain reliable estimates of their interaction and determinants. Using a panel data set of advanced countries and emerging markets, a trivariate probit model is estimated via a maximum simulated likelihood approach. In line with the monetary policy trilemma, our results show that countries switching from an intermediate regime to a floating arrangement are more likely to remove capital controls. In addition, the estimated coefficients exhibit important differences when exchange rate flexibility is treated as an exogenous regressor relative to the case when it is treated as endogenous.

Maximum Simulated Likelihood Methods and Applications
Advances in Econometrics, Volume 26, 199–251
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)00000260011


1. INTRODUCTION

Coordinating the implementation of exchange rate (ER) and financial account (FA) policies has been a perennial challenge for policymakers. Factors such as the soundness and development of the financial system, the degree of currency mismatch, and the economic growth strategy (e.g., export-led growth) complicate the interaction of the two policies. For example, countries opening their financial markets to international capital flows face the task of adapting their ER regime (ERR) to the resulting environment of greater capital mobility. Some assert that the removal of capital controls requires the implementation of a more flexible ERR to prepare the domestic market to deal with the effects of higher capital flows (e.g., Eichengreen, 2004, 2005; Prasad, Rumbaug, & Wang, 2005).1 At the same time the liberalization of FA policies, coupled with a more flexible ERR, can pose significant risks for countries in which the financial system is weak or currency mismatch is high.2

Historically, we have observed many examples in which FA policy poses challenges for the choice of the ERR and vice versa. In recent years, however, the debate on Chinese policy reforms can be considered one of the best examples showing the complex interaction between ER and capital control policies. This debate has centered on one question: how the Chinese authorities will move toward a more open FA and a more flexible ERR. A more flexible regime would allow China to use monetary policy to buffer its economy against shocks. But greater ER flexibility would also increase the foreign currency exposure of the financial and nonfinancial sectors and generate the need for instruments to hedge it. The dilemma is that removing capital controls would allow agents to access the markets and instruments needed to hedge foreign exchange exposure but would simultaneously pose significant risks for the Chinese economy given the weakness of its financial system. In a poorly regulated environment, capital inflows could be misallocated, and currency mismatches on the balance sheets of the financial and corporate sectors might rise to dangerous levels; this makes a more flexible regime less attractive. In addition, a more flexible ERR that led to an appreciation of the Renminbi might diminish the competitiveness of the export sector and damage the performance of the economy.3

The complex interaction between the two policies can be considered as a corollary of the monetary policy trilemma, which states that policymakers in open economies have to choose two out of the three desirable objectives: (1) ER stability, (2) international capital mobility, and (3) monetary policy oriented toward domestic goals. Therefore, if the trilemma has indeed constrained the actions of policymakers throughout history, as Obstfeld, Shambaugh, and Taylor (2004) and Shambaugh (2004) have shown, the ER policy should not be considered an exogenous determinant of FA openness, and this, in turn, cannot be assumed to be an exogenous regressor of the ERR. As a consequence, the simultaneous determination of ER and FA policies must be acknowledged in order to obtain reliable estimates of their interaction and determinants.

Yet despite the clear connection between the ER policy and FA openness, the existing studies on the determinants of capital controls have, in general, disregarded the simultaneous determination of the two policies. One strand of the empirical literature has treated the endogeneity of ER and FA policies within a simultaneous equations framework (von Hagen & Zhou, 2005, 2006; Walker, 2003). Another strand of the literature has simply ignored the simultaneity problem, limiting the analysis to univariate probit or logit models and relying on econometric techniques that are inappropriate under the presence of discrete endogenous regressors (e.g., Alesina, Grilli, & Milesi-Ferretti, 1994; Collins, 1996; Leblang, 1997; Quinn & Inclan, 1997).4

Another alternative for dealing with the endogeneity of the ERR in FA policies is to rely on instrumental variables. Nevertheless, the main challenge of the instrumental variables approach is that the presence of dummy endogenous regressors in a discrete choice model makes the analysis differ substantially from the continuous regressor models. As pointed out by Carrasco (2001), the presence of endogenous dummy regressors leads to an inconsistency with the statistical assumptions of the nonlinear discrete model if the traditional two-stage method is applied.

The objective of this chapter is twofold. First, it investigates the impact of exogenous changes of ER flexibility on financial openness. Second, it examines how the coefficients associated with ER flexibility are affected when the ERR is treated as an exogenous regressor of FA openness. In our attempt to quantify the effect of ER flexibility on the removal of capital controls (i.e., financial openness), we face an identification problem: the data alone are not sufficient to identify this effect. Hence, the inference depends on prior information available to the researcher about the probability distribution of the endogenous variables. In this chapter, using a panel data set of advanced and emerging countries, we estimate a trivariate probit model to account for the simultaneity of ER and FA policies. The trivariate probit model is composed of a switching probit model to identify the impact of the ERR on the propensity of removing capital controls and a multinomial probit to study the choice of the ER arrangement. Since rich and emerging countries have different economic, political, and institutional circumstances, we estimate the model separately for these two types of countries.

To identify the effect of exogenous changes of ER flexibility on FA openness, we propose, along with the geographical concentration of trade, measures of the world's acceptance of intermediate and floating regimes as instruments. One of the contributions of this chapter is being able to assess how two levels of ER flexibility affect the propensity to remove capital restrictions. Another contribution of this chapter is the introduction of interactions between ER flexibility and the determinants of capital controls, which help to identify nonlinear relationships between the independent variables and the propensity to liberalize the FA.

Our estimation strategy departs from previous work in this area in at least three ways: (i) we use the maximum simulated likelihood (MSL) approach to estimate the trivariate probit model; (ii) we assume the residuals to be independent and identically distributed (i.i.d.) normal random variables; and (iii) we rely on Halton draws and the Geweke–Hajivassiliou–Keane (GHK) method to maximize the likelihood function. Assuming normal errors instead of logistic errors, as in von Hagen and Zhou (2006), avoids the restrictive substitution patterns due to the independence of irrelevant alternatives (IIA).5 The GHK method and the Halton draws are used to simulate the multidimensional normal integrals.
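For readers unfamiliar with these tools, a minimal Python sketch of the GHK simulator driven by Halton draws is given below. It approximates a rectangle probability Pr(Y_1 < b_1, ..., Y_d < b_d) for a mean-zero multivariate normal vector, which is the basic building block of this kind of simulated likelihood; the function names and the specific prime bases are my own choices, and the chapter's actual implementation may differ.

    import numpy as np
    from scipy.stats import norm

    def halton(n, base):
        # First n points of the one-dimensional Halton sequence for a prime base
        seq = np.empty(n)
        for i in range(n):
            k, f, x = i + 1, 1.0, 0.0
            while k > 0:
                f /= base
                x += f * (k % base)
                k //= base
            seq[i] = x
        return seq

    def ghk_probability(b, Sigma, n_draws=1000):
        # GHK simulator for Pr(Y < b) elementwise, Y ~ N(0, Sigma), using Halton uniforms
        b = np.asarray(b, dtype=float)
        d = len(b)
        L = np.linalg.cholesky(Sigma)                   # Y = L e with e ~ N(0, I)
        primes = [2, 3, 5, 7, 11, 13, 17][:d]
        U = np.column_stack([halton(n_draws, p) for p in primes])
        prob = np.ones(n_draws)
        e = np.zeros((n_draws, d))
        for j in range(d):
            # conditional truncation point for e_j given the earlier components
            upper = (b[j] - e[:, :j] @ L[j, :j]) / L[j, j]
            Phi_u = norm.cdf(upper)
            prob *= Phi_u                               # sequential probability weight
            # inverse-CDF draw from N(0,1) truncated above at `upper`
            e[:, j] = norm.ppf(np.clip(U[:, j] * Phi_u, 1e-12, 1.0 - 1e-12))
        return prob.mean()

As a sanity check, ghk_probability([0.0, 0.0, 0.0], np.eye(3)) should return a value close to 0.125, the exact orthant probability for three independent standard normals.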

There are six main results. First, the degree of ER flexibility strongly influences FA policy. In particular, a U-shape behavior is found between the propensity to remove capital controls and ER flexibility. Second, the coefficients obtained when the ERR is treated as an endogenous regressor differ substantially from the estimated coefficients obtained when the ER arrangement is treated as exogenous. As a matter of fact, the effect of ER flexibility on the propensity to liberalize the FA is overestimated when the endogeneity of the ERR is not properly accounted for. This overestimation varies across emerging and advanced countries and across ERRs. Third, interesting correlations between the exogenous variables and the degree of financial openness are unmasked when the effect of the exogenous regressors is allowed to vary across ERRs. Fourth, despite many external pressures, emerging countries have been more conservative in their processes to liberalize the FA. Fifth, policymakers tend to adopt ERRs with a higher degree of flexibility when these are more globally accepted. Finally, relative to other emerging markets, Asian countries display a higher degree of "fear of floating."6

The rest of the chapter is organized as follows. In Section 2, we review the empirical literature on the choice of ERR and the degree of financial openness. Then, in Section 3, we describe the evolution of the bivariate distribution of ERR and FA openness. Sections 4 and 5 describe the empirical model and the data, respectively. In Section 6, we present the estimation strategy, and in Section 7, we comment on the results. Some final remarks are included in Section 8.

2. LITERATURE REVIEW

The empirical literatures on both the choice of ERR and the removal or imposition of capital controls are vast and cannot be comprehensively summarized here. Different sample periods and country coverage have been used to shed light on these two issues. Nevertheless, a common characteristic can be observed: the econometric approach.

2.1. The Literature on the Choice of Capital Controls

The most common econometric specification used to analyze the factors affecting the level of FA openness is a discrete choice model in which an unobservable propensity, say, to remove capital controls is explained by some exogenous covariates:

$$K^*_{it} = X'_{it}\beta + \epsilon_{it} \qquad (1)$$

where $K^*_{it}$ is an unobservable latent index describing the propensity to open the FA, $X_{it}$ is a vector of exogenous regressors affecting the likelihood of lifting capital controls, $\beta$ is a vector of parameters associated with $X_{it}$, and $\epsilon_{it}$ is a random term representing all the factors affecting $K^*_{it}$ not included in $X$. Depending on the distributional assumption imposed on $\epsilon_{it}$, a logit or probit model may be obtained.7 Although $K^*_{it}$ cannot be observed, the econometrician observes the discrete variable $K_{it} = 1\{K^*_{it} > 0\}$, where $1\{A\}$ is an indicator function assuming the value of 1 if event A occurs.8
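To make the latent-index mechanics concrete, the following minimal Python sketch simulates Eq. (1) and the observation rule. All regressors and parameter values are hypothetical illustrations, not the chapter's data or estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: N country-year observations, a constant plus 3 regressors.
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=(N, 3))])
beta = np.array([-0.5, 1.0, 0.3, -0.8])   # illustrative parameter values

# Eq. (1): latent propensity to open the financial account (probit errors).
eps = rng.normal(size=N)
K_star = X @ beta + eps

# Observation rule: K = 1{K* > 0}.
K = (K_star > 0).astype(int)
print(K.mean())                           # share of "open" observations
```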

Common arguments in favor of the removal of capital controls are as follows: (1) FA liberalization promotes a more efficient international allocation of capital and boosts economic growth in developing countries; (2) it allows agents to smooth consumption (i.e., risk sharing); (3) it encourages capital inflows9; and (4) it allows the government to send a signal indicating that it will abstain from using inflation tax or that it is committed to policies that favor investment.10 Conversely, traditional motives for the adoption of capital controls include: (1) to allow monetary independence when a fixed ERR is in place; (2) to reduce or eliminate volatile capital flows; (3) to maintain domestic savings and inflation tax; (4) to limit vulnerability to financial contagion; (5) to reduce ER volatility caused by volatile short-run capital flows; and (6) to avoid excessive external borrowing.11

The discrete choice model discussed above is commonly used to test these presumptions. Some of the most widely cited studies are Alesina et al. (1994), Grilli and Milesi-Ferretti (1995), Leblang (1997), Quinn and Inclan (1997), Eichengreen and Leblang (2003), Leblang (2005), and von Hagen and Zhou (2005, 2006). Although most of these studies have included the ERR as a determinant of financial openness, the majority have disregarded the simultaneous determination of these two policies.12 Central bank independence, the political leaning of the government, political instability, degree of democracy, per capita income, size of the government, and openness to trade are often used to explain the likelihood of opening the FA.

Common findings include: (1) countries with an independent central bank have a higher likelihood of removing capital controls (Alesina et al., 1994; Grilli & Milesi-Ferretti, 1995; Quinn & Inclan, 1997; Walker, 2003; von Hagen & Zhou, 2005); (2) policymakers implementing a floating ERR are more prone to open the FA (Alesina et al., 1994; Grilli & Milesi-Ferretti, 1995; Leblang, 1997; von Hagen & Zhou, 2005); (3) countries with leftist governments and high levels of trade openness maintain more open FAs (Quinn & Inclan, 1997; Walker, 2003)13; (4) large economies with a small stock of international reserves are more likely to impose capital controls (Grilli & Milesi-Ferretti, 1995; Leblang, 1997); (5) countries with larger governments exhibit lower levels of financial openness (Grilli & Milesi-Ferretti, 1995; von Hagen & Zhou, 2005); and (6) economies with large current account deficits and high inflation levels present a higher propensity to impose capital controls (Walker, 2003; von Hagen & Zhou, 2005).

2.2. The Literature on the Choice of Exchange Rate Regime

The Mundell–Fleming model and Mundell's (1961) optimum currency area (OCA) theory have been the main workhorses in the empirical literature on the choice of ERR. In recent years, however, models incorporating frictions other than nominal rigidities (Lahiri, Singh, & Vegh, 2007), balance sheet exposure to ER volatility (Chang & Velasco, 2006; Hausmann, Panizza, & Stein, 2001), and political and institutional factors (e.g., Simmons & Hainmueller, 2005; Bernhard & Leblang, 1999) have enriched theoretical and empirical research on this topic.

A model similar to the one presented above (Eq. (1)) has been used to analyze the determinants of the ERR. In this context, an unobservable propensity, say, to peg, is explained by some exogenous regressors.14

Determinants of the ERR in previous research can be grouped into three categories: variables related to OCA theory, political factors, and macroeconomic performance. With regard to OCA theory, factors such as trade openness, geographical concentration of trade, size of the economy, and economic development are typically included in the models. Central bank independence, democracy, political stability, proximity to an election, and the influence of partisan politics are used to control for political factors. Finally, foreign exchange reserves, terms of trade volatility, economic growth, inflation, real ER volatility, and the current account are included as measures of macroeconomic performance assumed to affect the likelihood of pegging, floating, or implementing an intermediate regime.

The main results of previous studies are as follows: (1) a positive relationship exists between the size of the economy and the likelihood of floating; (2) countries with high levels of trade and highly dollarized financial systems are more likely to implement a peg; and (3) inflation is frequently found to be positively associated with freely floating rates. Evidence related to political factors suggests that democratic countries are more likely to adopt floating rates (Broz, 2002; Leblang, 2005) and that governments with both strong support in the legislature and fragmented opposition are more inclined to peg (Frieden et al., 2000). Also, it seems there is a lower probability of exiting from pegs in the run-up to elections (e.g., Blomberg, Frieden, & Stein, 2005; Frieden et al., 2000). Table 1 shows some of the models utilized by previous studies on the determinants of ERR and capital controls.

3. EVOLUTION OF THE POLICY MIX COMPOSED BY THE EXCHANGE RATE AND CAPITAL ACCOUNT REGIMES

3.1. De Facto versus De Jure Classifications

A key issue in this research area is how to classify ER and FA regimes. For both there are two possibilities: de jure and de facto classifications. The former are generated from arrangements reported by countries (i.e., official regimes), while the latter are constructed mainly on the basis of macroeconomic variables.15 Since our investigation deals with the effect of the implemented ER policy on the degree of financial openness, we use de facto arrangements when these are available.

Three de facto ERR classifications have been proposed recently. Bubula and Otker-Robe's (2002) (BOR) classification combines market ERs and other quantitative information with assessments of the nature of the regime drawn from consultations with member countries and IMF country desk economists.

Table 1. Some Studies on the Determinants of the Exchange Rate Regime and the Openness of the Financial Account.

Author(s)                     Year   Type of Model

Exchange rate regime
Collins                       1996   Probit
Klein and Marion              1997   Logit
Bernhard and Leblang          1999   Logit
Frieden et al.                2000   Ordered Logit
Poirson                       2001   Ordered Probit and OLS
Broz                          2002   Ordered Probit
von Hagen and Zhou            2002   Ordered Logit
Juhn and Mauro                2002   Bivariate Probit and Multinomial Logit
Levy-Yeyati et al.            2002   Pooled Logit
Barro and Tenreyro            2003   Probit (Instrumental Variables)
Leblang                       2003   Strategic Probit
Eichengreen and Leblang       2003   Bivariate Probit
Walker                        2003   Simultaneous Equation Model (Probit)
Blomberg et al.               2005   Duration Model
Simmons and Hainmueller       2005   Logit, Probit and Markov Transition Model
von Hagen and Zhou            2005   Simultaneous Equation Model (Ordered Probit for ERR)
von Hagen and Zhou            2006   Simultaneous Equation Model (Logit)

Capital account openness
Alesina et al.                1994   Logit and Probit
Grilli and Milesi-Ferretti    1995   Logit and Probit
Leblang                       1997   Probit
Quinn and Inclan              1997   OLS
Walker                        2003   Simultaneous Equation Model (Probit)
Eichengreen and Leblang       2003   Bivariate Probit
Leblang                       2005   Logit
von Hagen and Zhou            2005   Simultaneous Equation Model (Continuous Index for CA)
von Hagen and Zhou            2006   Simultaneous Equation Model (Logit)

Note: See Rogoff et al. (2004) for other studies on the determination of the ERR.


BOR classify the ERR into 13 categories: (1) another currency as legal tender, (2) currency union, (3) currency board, (4) conventional fixed peg to a single currency, (5) conventional fixed peg to a basket, (6) pegged within a horizontal band, (7) forward-looking crawling peg, (8) forward-looking crawling band, (9) backward-looking crawling peg, (10) backward-looking crawling band, (11) tightly managed floating, (12) other managed floating, and (13) independent floating. Bubula and Otker-Robe (2002) classified the ER arrangements for all IMF members from 1990 to 2001. This classification has been updated by IMF staff through 2008.

Levy-Yeyati and Sturzenegger (2005) (LYS) use cluster analysis and changes and the volatility of changes of the ER and international reserves to construct a de facto ERR classification. LYS argue that fixed ERs are associated with low volatility of the ER (in levels and changes) and high volatility in international reserves, while floating regimes are characterized by high volatility of the ER with more stable international reserves. This classification covers 183 countries over the period 1974–2004. LYS classify the ER arrangements into five categories: (1) inconclusive, (2) floats, (3) dirty floats, (4) crawling pegs, and (5) pegs. A country's ER arrangement is classified as "inconclusive" when the volatility of both the ER and international reserves is low.16

Reinhart and Rogoff (2004) (RR) categorize the ERR based on the variability of informal or black-market ERs and the official rate. This classification is most appropriate for the purposes of this chapter. When there are multiple ERs, RR use the information from black or parallel markets to classify the arrangement under the argument that market-determined dual or parallel markets are important, if not better, barometers of the underlying monetary policy.17 Two additional features of RR's classification are that it is available for a longer period and it includes a "freely falling" regime.18 With the latter we generate a dummy variable, labeled CRISIS, to control for periods of macroeconomic instability. Since we are interested in the ERR implemented even during periods of macroeconomic stress, we use information in RR's detailed chronologies to reclassify the "freely falling" countries into pegs, intermediate, or freely floating regimes. RR classified the ERR into 15 categories. BOR's classification is not utilized due to its short coverage in terms of years. We do not use LYS's classification because of the "inconclusive" regimes, which make it less desirable for our purposes.

Given the lack of de facto classifications for the openness of the FA, we rely on Brune's financial openness index (BFOI) to classify the degree of FA openness.19 A problem here is that the ER structure is one of the variables Brune considers to construct the index. Since the aim of the financial openness index is to reflect openness of the economies to capital flows, we exclude the ER structure from the index because this factor has already been taken into account in RR's classification.20

3.2. Definition of Financial Account Openness and Exchange Rate Regimes

Let $K_{it}$ denote the degree of FA openness in country i (i = 1, 2, …, N) in year t (t = 1, 2, …, $T_i$). BFOI is recoded on a 1, 2, 3, 4 scale representing four different degrees of FA openness: closed (K = 1), relatively closed (K = 2), relatively open (K = 3), and open (K = 4). Formally,

$$
K_{it} =
\begin{cases}
1 & \text{if } BFOI \in \{0, 1, 2\}\\
2 & \text{if } BFOI \in \{3, 4, 5\}\\
3 & \text{if } BFOI \in \{6, 7, 8\}\\
4 & \text{if } BFOI \in \{9, 10, 11\}
\end{cases}
$$
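This recoding can be written compactly in code. The sketch below is illustrative only and assumes the brackets {0–2}, {3–5}, {6–8}, {9–11}.

```python
import numpy as np

def recode_bfoi(bfoi):
    """Map Brune's financial openness index (assumed range 0-11) to K in {1, 2, 3, 4}.

    Assumed brackets: {0,1,2} -> 1 (closed), {3,4,5} -> 2 (relatively closed),
    {6,7,8} -> 3 (relatively open), {9,10,11} -> 4 (open).
    """
    return np.asarray(bfoi) // 3 + 1

print(recode_bfoi([0, 4, 7, 11]))   # -> [1 2 3 4]
```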

We collapse RR's "natural" classification of ERR into three categories: pegs, intermediate, and flexible regimes. Pegs are regimes with no separate legal tender, preannounced pegs or currency boards, preannounced horizontal bands that are narrower than or equal to plus or minus (±) 2%, and de facto pegs. Intermediate regimes include preannounced crawling pegs, preannounced crawling bands that are narrower than or equal to ±2%, de facto crawling pegs, de facto crawling bands that are narrower than or equal to ±2%, preannounced crawling bands that are wider than or equal to ±2%, de facto crawling bands that are wider than or equal to ±5%, moving bands that are narrower than or equal to ±2% (i.e., allowing for both appreciation and depreciation over time), and de facto crawling bands that are narrower than or equal to ±2%. Finally, floats include managed floating and freely floating arrangements. De facto pegs are classified as pegs in spite of the fact that there is no commitment by the monetary authority to keep the parity irrevocable.

We split the sample of countries into two groups: advanced and emerging markets. The definition of advanced countries coincides with industrial countries in the International Financial Statistics data set. Following Bubula and Otker-Robe (2002), countries included in the Emerging Market Bond Index Plus (EMBI+), the Morgan Stanley Capital International Index (MSCI), Singapore, Sri Lanka, and Hong Kong SAR are defined as emerging markets. The resulting sample consists of 24 advanced countries and 32 emerging markets.21 The model is estimated on annual data for the period 1975–2006.

3.3. Evolution of the Policy Mix 1975–2006

We first describe the evolution of ER and FA policies pooling advanced and emerging countries in the same group. The bivariate distribution of ERR and FA openness has displayed a clear path over the past 35 years. As Fig. 1 shows, in 1975, 2 years after the breakdown of the Bretton Woods System, about three quarters of the advanced and emerging markets, or 36 out of 49 countries, were implementing an ER arrangement with some degree of flexibility (a soft peg or a floating regime). Nevertheless, most of these countries still kept capital controls.22 Preparing the ground for a more open FA took about 15 years. It was not until the early 1990s that a significant number of countries started to liberalize their FA.

Important differences are evident between advanced and emerging countries. The former exhibit the path described above: after the breakdown of the Bretton Woods system the majority of the industrialized world moved to an intermediate regime first, after which they liberalized the FA and finally shifted to either a flexible ER or a peg (see Fig. 2). Emerging markets have been more reluctant to move in the direction of greater ER flexibility and capital mobility. After the Bretton Woods system collapsed, some emerging economies moved to an intermediate ERR and just a few of them liberalized capital flows (see Fig. 3). Relative to the advanced countries, the distribution of ERR and FA regimes in emerging markets has not changed dramatically in the past three decades. It is clear that in these countries the intermediate regimes are still a popular option. In fact, comparing Figs. 2 and 3, we can conclude that the "bipolar view" can be rejected for emerging markets but not for advanced countries (see Eichengreen & Razo-Garcia, 2006).23 Also, from Fig. 3, we can observe that capital restrictions remain a common practice among emerging countries.

The main conclusion of this section is that during the post–Bretton Woods era, ER and FA reforms have not taken place at the same pace and time in advanced and emerging countries. The initial reluctance displayed by emerging markets to move to more flexible arrangements after the breakdown of the Bretton Woods system might be explained, for example, by the underdevelopment of their financial systems.

[Fig. 1: three panels (Fixed Regimes, Intermediate Regimes, Flexible Regimes); horizontal axis: 1975–2003; vertical axis: Number of Countries; categories: Closed FA, Relatively Closed FA, Relatively Open FA, Open FA.]

Fig. 1. Financial Account Policies Under Different Exchange Rate Regimes in Advanced and Emerging Countries: 1975–2006. Note: These three figures report the evolution of ERR and FA policies. The vertical axis measures the number of advanced and emerging countries with ERR j (j = [pegs, intermediate, flexible]) that are implementing FA regime l (l = [closed, partially closed, partially open, open]). For example, in 1975, 13 countries implemented a fixed ER. Out of these 13 countries, 8 had a closed FA, 2 had a partially closed FA, 2 had partially opened their FA, and only 1 had removed all capital restrictions. Darker colors are assigned to FA regimes with more restrictions. Therefore, a figure dominated by dark colors means that for that specific type of ERR the majority of the countries in the sample had intensive capital restrictions. For definitions of the FA regimes see Section 3.2.


[Fig. 2: three panels (Fixed Regimes, Intermediate Regimes, Flexible Regimes); horizontal axis: 1975–2003; vertical axis: Number of Countries; categories: Closed FA, Relatively Closed FA, Relatively Open FA, Open FA.]

Fig. 2. Financial Account Policies Under Different Exchange Rate Regimes in Advanced Countries: 1975–2006. Note: These three figures report the evolution of ERR and FA policies. The vertical axis measures the number of advanced countries with ERR j (j = [pegs, intermediate, flexible]) that are implementing FA regime l (l = [closed, partially closed, partially open, open]). For example, in 1975, three advanced countries implemented a fixed ER. Out of these three countries, one had a closed FA, none of them had a partially closed FA, two had partially opened their FA, and no advanced countries had removed all capital restrictions. Darker colors are assigned to FA regimes with more restrictions. Therefore, a figure dominated by dark colors means that for that specific type of ERR the majority of the countries in the sample had intensive capital restrictions. For definitions of the FA regimes see Section 3.2.


[Fig. 3: three panels (Fixed Regimes, Intermediate Regimes, Flexible Regimes); horizontal axis: 1975–2003; vertical axis: Number of Countries; categories: Closed FA, Relatively Closed FA, Relatively Open FA, Open FA.]

Fig. 3. Financial Account Policies Under Different Exchange Rate Regimes in Emerging Countries: 1975–2006. Note: These three figures report the evolution of ERR and FA policies. The vertical axis measures the number of emerging countries with ERR j (j = [pegs, intermediate, flexible]) that are implementing FA regime l (l = [closed, partially closed, partially open, open]). For example, in 1975, 10 countries implemented a fixed ER. Out of these 10 emerging markets, 7 had a closed FA, 2 had a partially closed FA, none of them had partially opened their FA, and only 1 had removed all capital restrictions. Darker colors are assigned to FA regimes with more restrictions. Therefore, a figure dominated by dark colors means that for that specific type of ERR the majority of the countries in the sample had intensive capital restrictions. For definitions of the FA regimes see Section 3.2.


In countries with less developed financial sectors, economic agents may not have the financial tools to hedge currency risks, which can reduce the attractiveness of flexible rates. The institutional framework, "fear of floating," and the inability to borrow funds denominated in domestic currency are other causes behind the different sequences of ER and FA policies followed by industrialized countries and emerging markets.

A situation in which the domestic currency cannot be used to borrow abroad, dubbed "original sin" by Eichengreen, Hausmann, and Panizza (2003), combined with a weak institutional framework (e.g., weak financial regulation), is another example of how the interaction of economic and institutional factors has resulted in different sequences of ER and FA policies across advanced and emerging countries. In this specific example, when emerging countries cannot borrow in terms of their domestic currency and the country's banks have made loans in U.S. dollars, a depreciation of the currency against the dollar can hurt the balance sheets of the financial institutions and greatly injure the financial system. Under these circumstances, the central bank is likely to display "fear of floating." In this example, the three factors mentioned above, the institutional framework (e.g., poor prudential regulation), "fear of floating," and the inability to borrow in the international capital market in domestic currency, interact to reduce the likelihood of implementing a flexible rate in emerging economies.

Since advanced and emerging economies display important differences in macroeconomic institutions, degree of access to international capital markets, level of development, and other economic and political factors, we split the sample of countries into two groups: advanced and emerging markets. This will help us to obtain accurate measures of the interaction and determinants of ER and FA policies.

4. TRIVARIATE PROBIT MODEL

The econometric model consists of a random utility for the degree of FA openness. Let $K^*_{it}$ be the ith country's unobservable latent index that guides the decision regarding the liberalization of the FA in period t. The underlying behavioral model that we assume is

$$K^*_{it} = S_{it}\eta^s_k + F_{it}\eta^f_k + X_{1,it}\beta_k + S_{it}X_{1,it}\delta^s_k + F_{it}X_{1,it}\delta^f_k + \nu_{it,k} \qquad (2)$$

where the subscript k denotes parameters or variables associated with the FA equation and the superscripts s and f denote parameters associated with the soft peg (intermediate) and flexible regimes, respectively.24 $X_{1,it}$ is a row vector of m exogenous regressors of country i in period t; $S_{it}$ and $F_{it}$ are dummy variables indicating the implementation of an intermediate or a floating regime, respectively; $\eta^s_k$ is a parameter capturing the effect of intermediate regimes on the decision to liberalize the FA; $\eta^f_k$ captures the effect of flexible ERRs on the decision to open the FA; $\beta_k$ is a vector of parameters associated with $X_{1,it}$; $\delta^s_k$ and $\delta^f_k$ are vectors of parameters associated with the interactions of $X_{1,it}$ with $S_{it}$ and $F_{it}$, respectively; and $\nu_{it,k}$ is a residual term assumed to be i.i.d. (over time and countries) normally distributed, $(\nu_{it,k} \mid X_{1,it} = x_{1,it}, S_{it}, F_{it}) \sim N(0, \omega_{kk} = 1)$. Since the constant term is included in the regression, the dummy variable associated with fixed ERRs is excluded to avoid perfect multicollinearity, making this arrangement the base category against which the intermediate and flexible regimes are assessed.25 With the normality assumption, the model described in Eq. (2) becomes a probit model.

To estimate this switching probit model, we assume the model error variances are constant (i.e., we assume the errors are homoscedastic). We incorporate this assumption into the model by assuming that the conditional variance of $\nu_{it,k}$ is equal to one. The reason for fixing the variance is that we cannot distinguish between a data-generating process with parameters $\eta^s_k$, $\eta^f_k$, $\beta_k$, $\delta^s_k$, $\delta^f_k$, and $\omega_{kk}$ and one with parameters $\eta^s_k/\sqrt{\omega_{kk}}$, $\eta^f_k/\sqrt{\omega_{kk}}$, $\beta_k/\sqrt{\omega_{kk}}$, $\delta^s_k/\sqrt{\omega_{kk}}$, $\delta^f_k/\sqrt{\omega_{kk}}$, and 1. This is just a normalization, therefore, and not a substantive assumption. The homoscedasticity assumption, however, is critical for two reasons. First, if we do not assume a constant variance, the model will not be identified. Second, if the errors are heteroscedastic, then the parameter estimates will be biased, inconsistent, and inefficient. Although the consequences of a misspecified model are serious, to avoid complicating the already cumbersome estimation process proposed in this chapter, we estimate a homoscedastic probit model.26

The degree of FA openness is modeled as a switching probit model with the ERR acting as the switch. Hence, depending on the value of the switch, Eq. (2) has three states: one for pegs, one for intermediate arrangements, and another for floating arrangements:

$$
K^*_{it} =
\begin{cases}
X_{1,it}\beta_k + \nu_{it,k} & \text{Fixed Regime, if } S_{it}=0 \text{ and } F_{it}=0\\
\eta^s_k + X_{1,it}(\beta_k + \delta^s_k) + \nu_{it,k} & \text{Intermediate Regime, if } S_{it}=1 \text{ and } F_{it}=0\\
\eta^f_k + X_{1,it}(\beta_k + \delta^f_k) + \nu_{it,k} & \text{Flexible Regime, if } S_{it}=0 \text{ and } F_{it}=1
\end{cases}
$$

Note that in Eq. (2) the effect of the explanatory variables varies among countries with different ERRs via the interaction terms. Although we do not directly observe the propensity to open the FA, we do observe the following discrete variable:

$$
K_{it} =
\begin{cases}
1 & \text{if } -\infty < K^*_{it} \le \alpha_0 & \text{(Closed FA)}\\
2 & \text{if } \alpha_0 < K^*_{it} \le \alpha_1 & \text{(Relatively Closed FA)}\\
3 & \text{if } \alpha_1 < K^*_{it} \le \alpha_2 & \text{(Relatively Open FA)}\\
4 & \text{if } K^*_{it} > \alpha_2 & \text{(Open FA)}
\end{cases}
\qquad (3)
$$

where $\alpha = [\alpha_0, \alpha_1, \alpha_2]$ represents a vector of thresholds to be estimated. Here, for example, $\alpha_1$ is a positive threshold differentiating between two degrees of financial openness: relatively closed and relatively open. Now, since the two endogenous dummy variables (S and F) cannot follow a normal distribution (by their discrete nature), we cannot apply the traditional two-stage or instrumental variables approaches.27
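As an illustration of how Eqs. (2) and (3) together generate the observed ordered outcome, the following Python sketch simulates the switching latent index and maps it through the thresholds. All dimensions, parameter values, and thresholds are hypothetical and chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions, regressors, regime dummies, and parameters.
N, m = 1000, 4
X1 = rng.normal(size=(N, m))
S = rng.integers(0, 2, size=N)               # intermediate-regime dummy
F = (1 - S) * rng.integers(0, 2, size=N)     # floating dummy (never 1 together with S)

eta_s, eta_f = -1.0, 0.5
beta_k = rng.normal(scale=0.5, size=m)
delta_s = rng.normal(scale=0.2, size=m)
delta_f = rng.normal(scale=0.2, size=m)
alpha = np.array([0.0, 0.4, 1.5])            # thresholds alpha_0 < alpha_1 < alpha_2

# Eq. (2): switching latent index with regime-specific intercepts and slopes.
nu_k = rng.normal(size=N)                    # normalized variance omega_kk = 1
K_star = (S * eta_s + F * eta_f
          + X1 @ beta_k
          + S * (X1 @ delta_s)
          + F * (X1 @ delta_f)
          + nu_k)

# Eq. (3): observed ordered categories 1 (closed) ... 4 (open).
K = np.digitize(K_star, alpha) + 1
print(np.bincount(K, minlength=5)[1:])       # counts of the four categories
```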

In this chapter, we account for the endogeneity between the ERR and FA openness by assuming a trivariate probit model. In particular, for the two discrete endogenous regressors, we specify a reduced-form multinomial probit:

$$S^*_{it} = X_{1,it}\Pi^s_1 + X_{2,it}\Pi^s_2 + \nu_{it,s} \qquad (4)$$

$$F^*_{it} = X_{1,it}\Pi^f_1 + X_{2,it}\Pi^f_2 + \nu_{it,f} \qquad (5)$$

with

$$
S_{it} =
\begin{cases}
1 & \text{if } S^*_{it} > 0 \text{ and } S^*_{it} > F^*_{it}\\
0 & \text{otherwise}
\end{cases}
\qquad (6)
$$

$$
F_{it} =
\begin{cases}
1 & \text{if } F^*_{it} > 0 \text{ and } F^*_{it} > S^*_{it}\\
0 & \text{otherwise}
\end{cases}
\qquad (7)
$$

where $S^*_{it}$ is an unobservable latent index that guides the policymakers' intermediate-ERR decision of country i in period t (relative to a peg); $F^*_{it}$ is an unobservable latent index measuring the proclivity of country i in time t to implement a floating ERR (relative to a peg); $X_{2,it}$ is a row vector of l regressors excluded from the FA equation that affect K only through S or F; and $\nu_{it} = [\nu_{it,s}\ \nu_{it,f}\ \nu_{it,k}]'$ is a vector of residuals assumed to be i.i.d. (over time and countries) normal with zero mean and variance–covariance matrix

$$
\Omega_\nu =
\begin{bmatrix}
\omega_{ss} & \omega_{sf} & \omega_{sk}\\
\cdot & \omega_{ff} & \omega_{fk}\\
\cdot & \cdot & \omega_{kk}
\end{bmatrix}
\qquad (8)
$$

From Eqs. (6) and (7), it can be verified that the ERR with the highest propensity will be adopted. For identification purposes, the coefficients of the random propensity for pegs are normalized to zero, $P^*_{it} = 0$.28 This normalization is a consequence of the irrelevance of the level of utilities in discrete choice models. In this case, neither all the regime-specific constants nor all the attributes of the countries that do not vary across alternatives can be identified. Hence, the coefficients of the intermediate regimes are interpreted as differential effects on the propensity to adopt a soft peg compared to the propensity to implement a peg. A similar interpretation is given to the coefficients of the floating equation.29
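The observation rule in Eqs. (6) and (7), with the peg's latent propensity normalized to zero, amounts to picking the regime with the largest latent index. A minimal sketch with illustrative values:

```python
import numpy as np

def observed_regime(S_star, F_star):
    """Return the (S, F) dummies implied by Eqs. (6)-(7).

    The peg's latent index is normalized to zero, so a peg is observed when
    both S* and F* are negative; otherwise the larger latent index wins.
    """
    S = ((S_star > 0) & (S_star > F_star)).astype(int)
    F = ((F_star > 0) & (F_star > S_star)).astype(int)
    return S, F

S, F = observed_regime(np.array([0.3, -0.2, 1.1]), np.array([-0.5, -0.1, 1.4]))
print(S, F)   # -> [1 0 0] [0 0 1]: intermediate, peg, float
```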

The parameter vector to be estimated, $\theta$, is composed of two scale parameters ($\eta^s_k$ and $\eta^f_k$), five vectors of dimension $m \times 1$ ($\beta_k, \delta^s_k, \delta^f_k, \Pi^s_1, \Pi^f_1$), two vectors of dimension $l \times 1$ ($\Pi^s_2$ and $\Pi^f_2$), one vector containing the six distinct elements of the covariance matrix $\Omega_\nu$ ($\mathrm{Vech}(\Omega_\nu)$), and one vector of thresholds of dimension $3 \times 1$ ($\alpha = [\alpha_0, \alpha_1, \alpha_2]'$).30 Then $\theta' = (\eta^s_k, \eta^f_k, \beta'_k, \delta^{s\prime}_k, \delta^{f\prime}_k, \Pi^{s\prime}_1, \Pi^{s\prime}_2, \Pi^{f\prime}_1, \Pi^{f\prime}_2, \mathrm{Vech}(\Omega_\nu)', \alpha')$ is a vector of dimension $(2 + 5m + 2l + 6 + 3) \times 1$.

5. EXPLANATORY VARIABLES

The variables included as determinants of both the choice of ERR and the degree of FA openness are inflation, international reserves normalized by M2 (RESERVES/M2), financial development (FINDEV) proxied by the monetary aggregate M2 and normalized by GDP, relative size (GDP relative to U.S. GDP), trade openness, geographical concentration of trade (SHARE), per capita income, and democratic level (POLITY).31

Previous research suggests that governments compelled to resort to inflation tax are more likely to utilize capital controls to broaden the tax base; hence, a negative correlation between inflation and the openness of the FA is expected. Since financial deepening and innovation reduce the effectiveness of capital controls, countries with more developed financial systems should exhibit a higher propensity to open the FA.


More democratic polities are more prone to lift capital controls. Eichengreen and Leblang (2003) argue that democratic countries have greater recognition of rights, including the international rights of residents, who have a greater ability to press for the removal of restrictions on their investment options.

An empirical regularity found in previous studies is that advanced economies are less inclined to resort to controls. Previous authors have suggested that the development of general legal systems and institutions, and not of those specific to financial transactions, is crucial for a country to benefit from opening its financial markets; we use per capita income to proxy for this characteristic.32 While institutional development is difficult to measure, there is a presumption that it is most advanced in high-income countries (Eichengreen, 2002). Acemoglu et al. (2001) and Klein (2005) found empirical evidence supporting this presumption (i.e., the link between per capita income and institutional quality).33 Trade openness is commonly seen as a prerequisite to open the FA (McKinnon, 1993). Furthermore, openness to trade can make capital controls less effective.34 Hence, the more open to trade a country is, the higher its propensity to remove capital controls.

Two variables, SPILL and SHARE, are used as instruments in the reduced form (Eqs. (4) and (5)).35 The former variable, SPILL, measures the proportion of countries in the world implementing either an intermediate or a floating ERR. The second, SHARE, measures the geographical concentration of trade, which is proxied by the export share to the main trading partner.36 Frieden et al. (2000) and Broz (2002) use a variable similar to SPILL to control for the feasibility of the ER arrangement. The idea is to capture the "climate of ideas" regarding the appropriate ERR, in this case an arrangement with some degree of flexibility. That is, the choice of an ER arrangement may be related to the degree of acceptance of that regime in the world. If this is true, then when most countries are adopting regimes in which the currency is allowed to fluctuate, within or without limits, it is more feasible to adopt or maintain an intermediate or a floating regime. SPILL is therefore expected to be positively correlated with the latent indexes of the intermediate and floating regimes. Since countries with a higher degree of concentration in exports can benefit more from fixed regimes, a negative coefficient associated with SHARE is predicted in the floating ERR equation.

Regarding the choice of ERR, we expect the following relationships. Countries with nontransparent political systems (e.g., autocracies) are expected to have a higher propensity to peg relative to countries with more transparent systems (e.g., democracies). The argument is that nondemocratic countries may choose a less flexible ERR as a commitment device to help them maintain credibility for low-inflation monetary policy objectives. Countries with underdeveloped financial systems do not possess the instruments needed to conduct open market operations and, as a consequence, are expected to adopt less flexible regimes. OCA theory holds that variables such as low openness to trade, large size, and low geographical concentration of trade are associated with more flexible regimes, since a higher volume and geographical concentration of trade increase the benefits from a less flexible ER, reducing transaction costs. Smaller economies have a higher propensity to trade internationally, leading to a higher likelihood of pegging. To the extent that a high level of international reserves is seen as a prerequisite for defending a less flexible regime, a negative association between the flexibility of the ER and the stock of international reserves is expected.

To identify the effect of exogenous changes of ER flexibility on FA openness, we need more than correlation between the two external instruments, SPILL and SHARE, and the two latent indexes, $S^*_{it}$ and $F^*_{it}$. These two variables must be significant after controlling for the covariates included in $X_{1,it}$. Previous studies support our argument that these two variables significantly affect the choice of ERR even after controlling for other factors. Regarding the feasibility of intermediate and floating regimes, proxied by SPILL, Broz (2002) finds that the choice of a fixed ER is positively and significantly related to the general climate of opinion regarding pegging.37 Geographical concentration of trade, SHARE, has been found to significantly affect the propensity to adopt fixed, intermediate, and floating regimes. Studying the choice of de jure ERR for a group of 25 transition economies in the 1990s, von Hagen and Zhou (2002) find evidence that SHARE raises the chance to implement a fixed ERR among the Commonwealth of Independent States (CIS), but increases the probability of adopting a floating rate regime among non-CIS countries.38

For a larger set of developing countries, von Hagen and Zhou (2006) also find a significant relationship between geographical concentration of trade and the choice of the ER arrangement.39 In a slightly different but interrelated area, Klein and Marion (1997) find that SHARE is an important determinant of the duration of pegs in Latin American countries.

6. ESTIMATION

As we will show, the difficulty in evaluating the log-likelihood function is that the probabilities require the evaluation of a three-dimensional multivariate normal integral. We estimate, up to a constant, the parameters presented in Eqs. (2), (4), and (5) by MSL. Specifically, to simulate the multidimensional normal integrals, we use the GHK simulator and Halton draws.40
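For readers unfamiliar with Halton draws, the sketch below generates one-dimensional Halton sequences by the standard radical-inverse construction. It is a generic illustration rather than the chapter's implementation.

```python
import numpy as np

def halton(n, base):
    """First n points of the one-dimensional Halton sequence for a prime base.

    Each point is the radical inverse of its index: the base-`base` digits of
    the index are mirrored around the radix point, giving a low-discrepancy
    sequence on (0, 1). Index 0 is skipped to avoid a point at exactly 0.
    """
    seq = np.empty(n)
    for i in range(n):
        x, f, k = 0.0, 1.0, i + 1
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        seq[i] = x
    return seq

# Two coordinates from coprime bases, as needed for the two truncated draws below.
print(halton(5, 2))   # [0.5, 0.25, 0.75, 0.125, 0.625]
print(halton(5, 3))   # [1/3, 2/3, 1/9, 4/9, 7/9]
```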

6.1. Maximum Simulated Likelihood

The log-likelihood function in this panel data context is

$$\mathcal{L} = \sum_{i=1}^{I} \log\!\left(\prod_{t=1}^{T} \Pr(K_{it}=l, S_{it}=s, F_{it}=f)\right) \qquad (9)$$

with

$$
\begin{aligned}
\Pr(K_{it}=l, S_{it}=s, F_{it}=f) &= \prod_{l=1}^{4} P(K_{it}=l, S_{it}=0, F_{it}=0)^{(1-F_{it}-S_{it})K^l_{it}}\\
&\quad \times \prod_{l=1}^{4} P(K_{it}=l, S_{it}=1, F_{it}=0)^{S_{it}K^l_{it}}\\
&\quad \times \prod_{l=1}^{4} P(K_{it}=l, S_{it}=0, F_{it}=1)^{F_{it}K^l_{it}}
\end{aligned}
\qquad (10)
$$

where $l \in \{1,2,3,4\}$, $s, f \in \{0,1\}$, $K^l_{it} = 1\{K_{it}=l\}$, and $1\{A\}$ is the indicator function. Note that all the choice probabilities are conditioned on $X_{1,it}$ and $X_{2,it}$.41 These probabilities can be simulated using the Cholesky decomposition of the variance–covariance matrix of the error terms (Train, 2003, p. 127).
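Concretely, once a simulator for the joint cell probabilities is available, the simulated log-likelihood in Eqs. (9)-(10) collapses to a sum of log probabilities of the observed (K, S, F) cells, since the indicator exponents select exactly one cell per observation. The sketch below assumes a hypothetical `simulate_prob` function standing in for the GHK simulator described next; it is illustrative only.

```python
import numpy as np

def log_likelihood(theta, data, simulate_prob):
    """Simulated log-likelihood in the spirit of Eqs. (9)-(10).

    `simulate_prob(theta, x1, x2, k, s, f)` is a hypothetical stand-in for a
    GHK-type simulator returning the probability of the joint outcome
    (K_it = k, S_it = s, F_it = f) for one country-year observation.
    """
    ll = 0.0
    for (x1, x2, k, s, f) in data:           # one tuple per country-year (i, t)
        p = simulate_prob(theta, x1, x2, k, s, f)
        ll += np.log(max(p, 1e-300))         # guard against log(0)
    return ll

# Toy usage with a dummy simulator that returns a constant probability.
toy = [(None, None, 1, 0, 0), (None, None, 3, 1, 0)]
print(log_likelihood(None, toy, lambda th, x1, x2, k, s, f: 0.25))
```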

The normality assumption imposed on the residuals requires the application of the GHK simulator to approximate the integrals implicit in these joint normal probabilities. For illustrative purposes, suppose that country i has a closed FA and is not "treated" (neither a soft nor a floating ERR is adopted). Then the probability of adopting that policy mix is

$$
\begin{aligned}
P(K_{it}=0, F_{it}=0, S_{it}=0) &= P(-\infty < K^*_{it} \le 0,\; F^*_{it} < 0,\; S^*_{it} < 0)\\
&= P(-\infty < \nu_{it,k} \le -L^k_{it},\; \nu_{it,f} < -L^f_{it},\; \nu_{it,s} < -L^s_{it})\\
&= P(-\infty < \nu_{it,k} \le -L^k_{it} \mid \nu_{it,f} < -L^f_{it},\; \nu_{it,s} < -L^s_{it})\\
&\quad \times P(\nu_{it,f} < -L^f_{it} \mid \nu_{it,s} < -L^s_{it}) \times P(\nu_{it,s} < -L^s_{it})
\end{aligned}
\qquad (11)
$$


where $L^k_{it} = S_{it}\eta^s_k + F_{it}\eta^f_k + X_{1,it}\beta_k + S_{it}X_{1,it}\delta^s_k + F_{it}X_{1,it}\delta^f_k$, $L^s_{it} = X_{1,it}\Pi^s_1 + X_{2,it}\Pi^s_2$, and $L^f_{it} = X_{1,it}\Pi^f_1 + X_{2,it}\Pi^f_2$. In the third equality of Eq. (11), the general multiplication rule is used to factor the joint probability $P(-\infty < \nu_{it,k} \le -L^k_{it},\ \nu_{it,f} < -L^f_{it},\ \nu_{it,s} < -L^s_{it})$. This rule states that when $n$ dependent events happen at the same time, the joint probability $\Pr(E_1 \cap E_2 \cap \cdots \cap E_n)$ can be obtained as the product of $n-1$ conditional probabilities and one marginal probability (e.g., if $n = 3$, then $P(E_1 \cap E_2 \cap E_3) = P(E_1 \mid E_2 \cap E_3)\, P(E_2 \mid E_3)\, P(E_3)$).42

One more transformation is needed to make the model more convenient for simulation. Let $L_\nu$ be the Cholesky factor associated with the variance–covariance matrix $\Omega_\nu$ ($= L_\nu L'_\nu$):

$$
L_\nu =
\begin{bmatrix}
c_{11} & 0 & 0\\
c_{21} & c_{22} & 0\\
c_{31} & c_{32} & c_{33}
\end{bmatrix}
$$

As is described in Train (2003, p. 127), using the Cholesky decomposition of $\Omega_\nu$, the residuals of the three equations, which are correlated, can be rewritten as linear combinations of uncorrelated standard normal variables:

$$
\begin{bmatrix}\nu_{it,s}\\ \nu_{it,f}\\ \nu_{it,k}\end{bmatrix}
=
\begin{bmatrix}
c_{11} & 0 & 0\\
c_{21} & c_{22} & 0\\
c_{31} & c_{32} & c_{33}
\end{bmatrix}
\begin{bmatrix}\epsilon^s_{it}\\ \epsilon^f_{it}\\ \epsilon^k_{it}\end{bmatrix}
= L_\nu\, \epsilon_{it}
$$

where

$$
\epsilon_{it} =
\begin{bmatrix}\epsilon^s_{it}\\ \epsilon^f_{it}\\ \epsilon^k_{it}\end{bmatrix}
\sim
N\!\left(
\begin{bmatrix}0\\0\\0\end{bmatrix},
\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}
\right).
$$

With this transformation, the error differences $\nu_{it,s}$, $\nu_{it,f}$, and $\nu_{it,k}$ are correlated because all of them depend on $\epsilon^s_{it}$. Now we can rewrite the latent indexes (utility differences) in the following way:

$$
\begin{aligned}
S^*_{it} &= L^s_{it} + c_{11}\epsilon^s_{it}\\
F^*_{it} &= L^f_{it} + c_{21}\epsilon^s_{it} + c_{22}\epsilon^f_{it}\\
K^*_{it} &= L^k_{it} + c_{31}\epsilon^s_{it} + c_{32}\epsilon^f_{it} + c_{33}\epsilon^k_{it}
\end{aligned}
$$
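The role of the Cholesky factor can be checked numerically: premultiplying independent standard normals by $L_\nu$ reproduces the desired covariance. The covariance values below are illustrative, not estimates from the chapter.

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative (positive definite) error covariance for (nu_s, nu_f, nu_k).
Omega = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.5, 0.3],
                  [0.2, 0.3, 1.0]])
L = np.linalg.cholesky(Omega)            # lower-triangular factor, Omega = L L'

# Correlated errors as linear combinations of independent standard normals.
eps = rng.normal(size=(3, 100_000))      # rows: eps_s, eps_f, eps_k
nu = L @ eps

print(np.round(np.cov(nu), 2))           # close to Omega
```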

The probability described in Eq. (11) is hard to evaluate numerically in terms of the $\nu$'s because they are correlated. However, using the Cholesky decomposition of the variance–covariance matrix associated with the error differences, $\Omega_\nu$, this probability can be rewritten in such a way that it involves only independent random variables (the $\epsilon$'s). Hence, the probability in Eq. (11) becomes a function of the univariate standard cumulative normal distribution:

$$
\begin{aligned}
P(K_{it}=0, S_{it}=0, F_{it}=0)
&= P\!\left(\epsilon^k_{it} < \frac{-L^k_{it} - c_{31}\epsilon^s_{it} - c_{32}\epsilon^f_{it}}{c_{33}} \,\middle|\, \epsilon^f_{it} < \frac{-L^f_{it} - c_{21}\epsilon^s_{it}}{c_{22}},\; \epsilon^s_{it} < \frac{-L^s_{it}}{c_{11}}\right)\\
&\quad \times P\!\left(\epsilon^f_{it} < \frac{-L^f_{it} - c_{21}\epsilon^s_{it}}{c_{22}} \,\middle|\, \epsilon^s_{it} < \frac{-L^s_{it}}{c_{11}}\right) \times P\!\left(\epsilon^s_{it} < \frac{-L^s_{it}}{c_{11}}\right)\\
&= \int_{-\infty}^{-L^s_{it}/c_{11}} \int_{-\infty}^{(-L^f_{it} - c_{21}\epsilon^s_{it})/c_{22}} \Phi\!\left(\frac{-L^k_{it} - c_{31}\epsilon^s_{it} - c_{32}\epsilon^f_{it}}{c_{33}}\right) \phi(\epsilon^f_{it})\, \phi(\epsilon^s_{it})\, d\epsilon^f_{it}\, d\epsilon^s_{it}
\end{aligned}
$$

To simulate this probability we use the GHK simulator described next.

1. Compute
$$P\!\left(\epsilon^s_{it} < \frac{-L^s_{it}}{c_{11}}\right) = \Phi\!\left(\frac{-L^s_{it}}{c_{11}}\right)$$

2. Draw a value of $\epsilon^s_{it}$, labeled $\epsilon^{s,q}_{it}$, from a standard normal truncated from above at $-L^s_{it}/c_{11}$. This draw is obtained in the following way:
(a) Let $\mu^q_1$ be the qth element of the first Halton sequence of length Q.
(b) Calculate
$$\epsilon^{s,q}_{it} = \Phi^{-1}\!\left[(1-\mu^q_1)\Phi(-\infty) + \mu^q_1\Phi\!\left(\frac{-L^s_{it}}{c_{11}}\right)\right] = \Phi^{-1}\!\left[\mu^q_1\Phi\!\left(\frac{-L^s_{it}}{c_{11}}\right)\right]$$

3. Compute
$$P\!\left(\epsilon^f_{it} < \frac{-L^f_{it} - c_{21}\epsilon^s_{it}}{c_{22}} \,\middle|\, \epsilon^s_{it} = \epsilon^{s,q}_{it}\right) = \Phi\!\left(\frac{-L^f_{it} - c_{21}\epsilon^{s,q}_{it}}{c_{22}}\right)$$

4. Draw a value of $\epsilon^f_{it}$, labeled $\epsilon^{f,q}_{it}$, from a standard normal truncated from above at $(-L^f_{it} - c_{21}\epsilon^{s,q}_{it})/c_{22}$. This draw is obtained in the following way:
(a) Let $\mu^q_2$ be the qth element of the second Halton sequence of length Q.
(b) Calculate
$$\epsilon^{f,q}_{it} = \Phi^{-1}\!\left[(1-\mu^q_2)\Phi(-\infty) + \mu^q_2\Phi\!\left(\frac{-L^f_{it} - c_{21}\epsilon^{s,q}_{it}}{c_{22}}\right)\right] = \Phi^{-1}\!\left[\mu^q_2\Phi\!\left(\frac{-L^f_{it} - c_{21}\epsilon^{s,q}_{it}}{c_{22}}\right)\right]$$

5. Compute
$$P\!\left(\epsilon^k_{it} < \frac{-L^k_{it} - c_{31}\epsilon^s_{it} - c_{32}\epsilon^f_{it}}{c_{33}} \,\middle|\, \epsilon^s_{it} = \epsilon^{s,q}_{it},\; \epsilon^f_{it} = \epsilon^{f,q}_{it}\right) = \Phi\!\left(\frac{-L^k_{it} - c_{31}\epsilon^{s,q}_{it} - c_{32}\epsilon^{f,q}_{it}}{c_{33}}\right)$$

6. The simulated probability for this qth pair of draws $(\epsilon^{s,q}_{it}, \epsilon^{f,q}_{it})$ is calculated as
$$P(K_{it}=0, S_{it}=0, F_{it}=0)^q = \Phi\!\left(\frac{-L^s_{it}}{c_{11}}\right)\Phi\!\left(\frac{-L^f_{it} - c_{21}\epsilon^{s,q}_{it}}{c_{22}}\right)\Phi\!\left(\frac{-L^k_{it} - c_{31}\epsilon^{s,q}_{it} - c_{32}\epsilon^{f,q}_{it}}{c_{33}}\right) \qquad (12)$$

7. Repeat steps 1–6 many times, $q = 1, 2, 3, \ldots, Q$.

8. The simulated probability is
$$\tilde{P}(K_{it}=0, S_{it}=0, F_{it}=0) = \frac{1}{Q}\sum_{q=1}^{Q} P(K_{it}=0, S_{it}=0, F_{it}=0)^q \qquad (13)$$
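Steps 1-8 translate almost line by line into code. The sketch below is a stripped-down GHK simulator for the cell probability in Eqs. (12)-(13). The linear indexes, the covariance matrix, and the uniform points fed in are placeholders (in the chapter the points are Halton draws, as in the generator sketched earlier), so it illustrates the mechanics rather than reproducing the chapter's estimator.

```python
import numpy as np
from scipy.stats import norm

def ghk_cell_prob(L_s, L_f, L_k, L_chol, u1, u2):
    """Simulated P(K_it = 0, S_it = 0, F_it = 0) following steps 1-8 above.

    L_s, L_f, L_k : linear indexes of the two regime equations and the FA equation.
    L_chol        : 3x3 lower-triangular Cholesky factor of Omega_nu.
    u1, u2        : length-Q sequences of points in (0, 1); Halton draws in the chapter.
    """
    c11, c21, c22 = L_chol[0, 0], L_chol[1, 0], L_chol[1, 1]
    c31, c32, c33 = L_chol[2, 0], L_chol[2, 1], L_chol[2, 2]

    p_s = norm.cdf(-L_s / c11)                                  # step 1
    eps_s = norm.ppf(u1 * p_s)                                  # step 2: truncated draws
    p_f = norm.cdf((-L_f - c21 * eps_s) / c22)                  # step 3
    eps_f = norm.ppf(u2 * p_f)                                  # step 4: truncated draws
    p_k = norm.cdf((-L_k - c31 * eps_s - c32 * eps_f) / c33)    # step 5
    return np.mean(p_s * p_f * p_k)                             # steps 6-8: Eqs. (12)-(13)

# Illustrative usage with placeholder points standing in for Halton draws.
Q = 750
u1 = (np.arange(Q) + 0.5) / Q
u2 = np.random.default_rng(0).random(Q)
Omega = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.2, 0.4],
                  [0.2, 0.4, 1.0]])
print(ghk_cell_prob(0.1, -0.2, 0.3, np.linalg.cholesky(Omega), u1, u2))
```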

Four critical points need to be addressed when the GHK simulator is applied for maximum likelihood estimation (Train, 2003, p. 133). First, we have to make sure that the model is normalized for scale and level of utility to ensure that the parameters are identified. Second, the GHK simulator takes utility differences against the regime for which the probability is being calculated, and so different differences must be taken for countries choosing different regimes.43 Third, for a country choosing to peg its currency, the GHK simulator uses the covariance matrix $\Omega_\nu$; for a country choosing a soft peg it uses $\Omega^s_\nu$; while for a country with a floating regime it needs $\Omega^f_\nu$. These three matrices are derived from a common covariance matrix $\Omega$ of the original errors (the nondifferentiated residuals in the ERR equations). We must ensure that the elements of $\Omega_\nu$ are consistent with the elements of $\Omega^s_\nu$ and $\Omega^f_\nu$ in the sense that the three matrices are derived from the $\Omega$ matrix. Fourth, the covariance matrices implied by the maximum likelihood estimates must be positive definite. We address these issues in detail in Appendix A.

6.2. Numerical Optimization

Initially we used the Davidon–Fletcher–Powell algorithm, a quasi-Newton method, to maximize the objective function.44 However, in many cases we obtained small changes in $\hat\theta$ and in the objective function accompanied by a gradient vector that was not close to zero, so this quasi-Newton numerical routine was not effective at finding the maximum. Markov Chain Monte Carlo (MCMC) methods have been proposed for estimation problems that represent a formidable practical challenge, such as Powell's least absolute deviation regression and the nonlinear instrumental variables estimator (see, e.g., Chernozhukov & Hong, 2003), so the Metropolis–Hastings algorithm, an MCMC method, is implemented in this chapter to maximize the log-likelihood function. This estimation strategy is not new in macroeconomics. Recently, Coibion and Gorodnichenko (2008) used the MCMC method developed by Chernozhukov and Hong to estimate a dynamic stochastic general equilibrium model analyzing firms' price-setting decisions. They rely on this stochastic search optimizer to achieve the global optimum because the objective function is highly nonlinear in the parameters.
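A minimal random-walk Metropolis-Hastings search in the spirit of Chernozhukov and Hong (2003) is sketched below. The proposal scale, draw count, and toy objective are hypothetical; in the application the objective would be the simulated log-likelihood.

```python
import numpy as np

def mh_maximize(log_lik, theta0, n_draws=200_000, step=0.02, seed=4):
    """Random-walk Metropolis-Hastings search over a (simulated) log-likelihood.

    exp(log_lik) is treated as a quasi-posterior: proposals that raise the
    objective are always kept, others with probability exp(difference).
    The best draw visited is returned, so the chain is used here as a
    stochastic global optimizer rather than for full Bayesian inference.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    ll = log_lik(theta)
    best_theta, best_ll = theta.copy(), ll
    for _ in range(n_draws):
        proposal = theta + step * rng.normal(size=theta.size)
        ll_prop = log_lik(proposal)
        if np.log(rng.uniform()) < ll_prop - ll:   # MH acceptance rule
            theta, ll = proposal, ll_prop
            if ll > best_ll:
                best_theta, best_ll = theta.copy(), ll
    return best_theta, best_ll

# Toy concave objective standing in for the simulated log-likelihood.
print(mh_maximize(lambda t: -np.sum((t - 1.0) ** 2), np.zeros(3), n_draws=5_000))
```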

To ensure monotonicity of the cutoff points, we use the reparametrization $\alpha_j = \alpha_{j-1} + \exp(a_j)$ for j = 2, 3 and estimate the unconstrained parameters $a_2$ and $a_3$. In order to identify the constant term in the FA equation, we fix $\alpha_1 = 0$. Additionally, since the probability described in the likelihood Eq. (11) is invariant to scale shifts, the three diagonal elements of the covariance matrix $\Omega_\nu$ are arbitrary. One strategy is to set the parameters $\omega_{kk}$ and $\omega_{ss}$ to unity.45 In terms of the Cholesky decomposition discussed above, this means the free parameters in $L_\nu$ are now the three strict lower triangular elements and $\omega_{ff}$. Thus, $c_{j,j} = \sqrt{1 - \sum_{i=1}^{j-1} c_{j,i}^2}$ for $j \in \{k, s\}$. After imposing these identification restrictions, the number of parameters to estimate is reduced by three.
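The identification devices just described (monotone cutoffs built from exponentials and a Cholesky factor with two unit variances) can be written as simple reparametrization helpers. The sketch below uses illustrative inputs and is not the chapter's code.

```python
import numpy as np

def constrained_cutoffs(a2_free, a3_free, a1=0.0):
    """Monotone cutoffs via alpha_j = alpha_{j-1} + exp(a_j), with alpha_1 fixed at 0."""
    alpha2 = a1 + np.exp(a2_free)
    alpha3 = alpha2 + np.exp(a3_free)
    return np.array([a1, alpha2, alpha3])

def cholesky_from_free(c21, c31, c32, omega_ff):
    """Build L_nu with omega_ss = omega_kk = 1 and omega_ff left free.

    Diagonal entries of the unit-variance rows are recovered as
    c_jj = sqrt(1 - sum of squared off-diagonals in that row).
    """
    c11 = 1.0                                     # omega_ss = 1
    c22 = np.sqrt(omega_ff - c21 ** 2)            # omega_ff = c21^2 + c22^2
    c33 = np.sqrt(1.0 - c31 ** 2 - c32 ** 2)      # omega_kk = 1
    return np.array([[c11, 0.0, 0.0],
                     [c21, c22, 0.0],
                     [c31, c32, c33]])

L = cholesky_from_free(c21=0.3, c31=0.2, c32=0.1, omega_ff=1.5)
print(constrained_cutoffs(-0.5, 0.2))
print(np.round(L @ L.T, 3))    # implied Omega_nu with unit variances for s and k
```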

To facilitate the convergence of the algorithm and to avoid the potential bias that a few hyperinflationary outliers can cause, we follow Ghosh, Gulde, and Wolf (2003) and von Hagen and Zhou (2002) and rescale annual inflation as $\pi/(1+\pi)$.46 All covariates included in $X_{1,it}$, except the dummy variables, POLITY, and SPILL, are lagged one period to attenuate potential simultaneity of these variables. All the exogenous variables, except the constant and discrete variables, are standardized. This significantly improves the performance of the optimization program. The sources of the variables and the list of countries used in the estimation are contained in Appendix C.

7. RESULTS

In this section, we report the estimates from different variants of the model described in Eqs. (2), (4), and (5). Our objective is twofold: (i) to investigate the impact of exogenous changes of ER flexibility on financial openness, and (ii) to examine how the coefficients on the FA equation associated with ER flexibility are affected when the endogeneity of the ERR is not accounted for. As mentioned above, we estimate the model first for advanced economies and then for emerging markets because these two types of countries differ in their economic, political, and institutional circumstances.47 Table 2 presents the estimates of four different models. In model 1, we estimate the FA equation neglecting the potential endogeneity of S and F and excluding the interaction terms.48 Model 2 adds the interaction terms to model 1. These benchmark models will help us to assess how the estimated coefficients are affected when S and F are treated as strictly exogenous. Model 3 treats the flexibility of the ERR (S and F) as endogenous and sets to zero all the interaction terms in Eq. (2). Finally, model 4 adds the interaction terms to model 3. In models 3 and 4, we use 750 Halton draws to simulate the multivariate normal integrals and 200,000 MCMC draws to maximize the log-likelihood function.

7.1. Effects of Exchange Rate Flexibility on Financial Account Policies

Four major findings emerge from model 4 (Table 2). First, the evidence shows that the degree of ER flexibility strongly influences FA policies. After controlling for other factors influencing financial openness, a U-shaped relationship between the probability of lifting capital controls and ER flexibility is found. In other words, we find that the most intensive capital restrictions are associated with intermediate regimes. Consistent with the trilemma of monetary policy, a country switching from a soft peg toward a floating arrangement is more likely to remove capital controls


Table 2. Financial Account and Exchange Rate Regime Equations (1975–2006).

Columns: Advanced Countries (Model 1, Model 2, Model 3, Model 4) followed by Emerging Market Countries (Model 1, Model 2, Model 3, Model 4); each model reports the coefficient (θ) and its standard error (S.E.). Interaction terms are included only in Models 2 and 4.

Financial account equation
Constant  0.604* 0.110  0.205 0.267  -0.335* 0.112  0.414* 0.160  0.710* 0.134  2.126* 0.366  1.108* 0.130  1.135* 0.229
η_k^s  -0.226 0.142  -1.717* 0.389  -0.055 0.109  -1.627* 0.223  -0.821* 0.107  -2.955* 0.539  -2.085* 0.084  -2.399* 0.302
η_k^f  0.101 0.143  0.535 0.371  -0.306** 0.135  -0.147 0.220  -0.835* 0.125  -3.144* 0.440  -0.701* 0.169  -0.646 0.416
FINDEV  -0.077*** 0.042  -0.238 0.192  -0.087** 0.040  -0.402 0.245  -0.388* 0.078  0.476* 0.137  -0.279* 0.051  0.383* 0.090
S*FINDEV  0.714* 0.229  0.829* 0.229  -1.100* 0.207  -0.935* 0.133
F*FINDEV  0.189 0.200  0.300 0.246  -1.068* 0.257  -0.452* 0.149
RESERVES/M2  -0.228* 0.060  -0.075 0.283  -0.367* 0.081  -0.031 0.216  0.098** 0.046  -0.525* 0.091  0.112* 0.033  -0.437* 0.074
S*(RESERVES/M2)  -0.821** 0.397  -0.631** 0.299  1.056* 0.130  0.895* 0.097
F*(RESERVES/M2)  -0.209 0.351  -0.103 0.239  0.714* 0.114  0.545* 0.082
GDP per capita  0.043 0.066  0.140 0.208  0.096 0.064  0.000 0.214  0.762* 0.141  3.742* 0.529  0.861* 0.087  2.534* 0.213
S*GDP per capita  0.151 0.288  0.382*** 0.213  -2.552* 0.566  -1.457* 0.255
F*GDP per capita  -0.664* 0.237  -0.389 0.271  -4.939* 0.645  -2.934* 0.222
OPENNESS  0.472* 0.078  0.951* 0.174  0.632* 0.093  1.139* 0.194  0.324* 0.066  0.192 0.117  0.163* 0.038  0.311* 0.085
S*OPENNESS  -0.672** 0.323  -1.124* 0.295  -0.167 0.160  -0.270* 0.104
F*OPENNESS  0.346 0.425  -0.169 0.267  0.505*** 0.268  0.003 0.123
INFLATION  -1.045* 0.104  -1.692* 0.432  -2.085* 0.122  -1.810* 0.348  -0.306* 0.065  -0.202 0.127  -0.167* 0.036  -0.260* 0.068
S*INFLATION  -1.207** 0.582  -1.025* 0.395  -0.313 0.194  -0.148 0.101
F*INFLATION  -0.275 0.508  -0.080 0.438  0.074 0.168  0.208* 0.071
RELATIVE SIZE  0.042 0.051  0.843** 0.387  0.084* 0.033  0.795* 0.181  -0.375 0.327  -6.282* 1.187  -1.484* 0.304  -4.452* 0.732
S*(RELATIVE SIZE)  -0.681*** 0.395  -0.536* 0.201  4.268* 1.655  2.155** 1.027
F*(RELATIVE SIZE)  -0.665*** 0.389  -0.607* 0.186  9.770* 1.476  4.489* 0.817
POLITY  0.022 0.044  -0.145 0.097  0.021 0.029  -0.242* 0.069
S*POLITY  0.140 0.119  0.222* 0.089
F*POLITY  0.623* 0.161  0.402* 0.082
ASIA  -0.570* 0.102  -2.728* 0.249  -0.141** 0.066  -2.398* 0.167
S*ASIA  2.537* 0.303  2.417* 0.179
F*ASIA  3.099* 0.311  2.801* 0.239
CRISIS  -0.031 0.205  -0.158 0.288  -0.941* 0.124  -1.409* 0.156

Cutoffs
α_2  0.354* 0.046  0.398* 0.053  0.339* 0.031  0.386* 0.032  0.657* 0.048  0.818* 0.063  0.497* 0.032  0.611* 0.042
α_3  1.451* 0.081  1.634* 0.095  1.412* 0.056  1.586* 0.061  1.358* 0.074  1.764* 0.104  1.009* 0.058  1.425* 0.060

Intermediate exchange rate regime equation
Constant  -0.236*** 0.134  0.124 0.175  0.796* 0.129  0.901* 0.107  -0.467** 0.216  -0.467** 0.216  -0.182 0.187  -0.285** 0.123
SPILL  -0.282 1.082  -0.274 1.082  2.251** 1.088  1.960*** 1.023  0.923 1.034  0.923 1.034  3.809* 0.580  4.250* 0.544
SHARE  0.275* 0.044  0.288* 0.047  0.251* 0.049  0.247* 0.047  -0.061 0.052  -0.061 0.052  -0.115* 0.044  -0.045 0.044
FINDEV  0.077 0.066  0.069 0.063  0.026 0.063  -0.028 0.066  -0.159** 0.080  -0.159** 0.080  -0.351* 0.058  -0.195* 0.055
RESERVES/M2  -0.041 0.059  -0.102 0.123  0.618* 0.116  0.677* 0.103  0.043 0.051  0.043 0.051  0.174* 0.056  0.139* 0.043
GDP per capita  0.016 0.078  0.033 0.102  1.002* 0.088  0.929* 0.068  0.698* 0.136  0.698* 0.136  0.711* 0.158  1.028* 0.140
OPENNESS  -0.244* 0.066  -0.306* 0.106  -1.095* 0.108  -1.091* 0.082  -0.153* 0.059  -0.153* 0.059  -0.168* 0.058  -0.260* 0.045
INFLATION  0.392* 0.082  0.790* 0.166  2.114* 0.153  2.031* 0.165  0.039 0.066  0.039 0.066  0.017 0.052  0.059 0.050
RELATIVE SIZE  -0.543* 0.142  -0.377* 0.099  0.217* 0.072  0.224* 0.070  -2.518* 0.489  -2.518* 0.489  -1.186* 0.294  -2.093* 0.277
POLITY  0.009 0.045  0.009 0.045  0.041 0.035  0.051 0.036
ASIA  0.649* 0.106  0.649* 0.106  0.773* 0.085  0.741* 0.077
CRISIS  -1.502* 0.233  -1.502* 0.233  0.917* 0.367  0.022 0.248

Floating exchange rate regime equation
Constant  -0.747* 0.158  -1.017* 0.199  -4.516* 1.182  -5.256* 1.312  -0.952* 0.224  -0.952* 0.224  -3.689* 0.462  -1.694* 0.507
SPILL  0.843 1.157  0.843 1.157  5.491* 1.451  2.174*** 1.267  6.349* 1.304  6.349* 1.304  9.239* 0.664  6.615* 0.564
SHARE  -0.179* 0.053  -0.189* 0.056  -0.991* 0.377  -1.552* 0.524  0.045 0.058  0.045 0.058  0.821* 0.246  0.251*** 0.147
FINDEV  -0.114 0.071  -0.108 0.068  -0.658** 0.293  -1.194* 0.424  -0.208*** 0.112  -0.208*** 0.112  -0.352 0.274  -0.380*** 0.200
RESERVES/M2  0.357* 0.068  0.728* 0.140  3.899* 1.134  5.420* 1.607  0.098*** 0.059  0.098*** 0.059  0.403** 0.206  0.170 0.185
GDP per capita  0.348* 0.083  0.441* 0.106  3.245* 0.664  3.435* 0.610  -0.035 0.187  -0.035 0.187  0.278 0.564  0.343 0.613
OPENNESS  -0.942* 0.140  -1.324* 0.197  -7.707* 1.332  -10.818* 2.573  -0.061 0.090  -0.061 0.090  -1.187* 0.261  -0.466* 0.172
INFLATION  -0.030 0.070  -0.059 0.140  1.720* 0.459  1.071** 0.514  -0.011 0.067  -0.011 0.067  0.032 0.214  0.244 0.187
RELATIVE SIZE  0.401* 0.149  0.282* 0.105  1.641* 0.478  2.249* 0.536  2.259* 0.465  2.259* 0.465  5.295* 0.573  5.050* 0.547
POLITY  0.143* 0.055  0.143* 0.055  0.906* 0.237  0.438* 0.146
ASIA  -0.409* 0.128  -0.409* 0.128  -0.811* 0.292  -1.234* 0.308
CRISIS  1.946* 0.209  1.946* 0.209  11.858* 1.023  7.299* 0.751

σ_ff  35.298  52.459  28.828  0.000
σ_ks  0.012  0.072  0.937  13.414
σ_kf  1.672  2.352  -2.642  0.089
σ_sf  5.721  7.002  -0.843  -3.574

Memorandum
Observations  675  675  675  675  877  877  877  877
Endogeneity accounted for  No  No  Yes  Yes  No  No  Yes  Yes
Log-likelihood  -1358.70  -1300.37  -1166.20  -1109.00  -1514.83  -1406.11  -1323.70  -1218.70

Notes: *, **, *** denote coefficients statistically different from zero at the 1%, 5%, and 10% significance levels, respectively.
Intermediate regimes include: preannounced crawling peg, preannounced crawling band that is narrower than or equal to ±2%, de facto crawling peg, de facto crawling band that is narrower than or equal to ±2%, preannounced crawling band that is wider than or equal to ±2%, de facto crawling band that is wider than or equal to ±5%, moving band that is narrower than or equal to ±2% (i.e., allows for both appreciation and depreciation over time), and de facto crawling band that is narrower than or equal to ±2%.
Floating regimes include managed floating and freely floating arrangements.


(i.e., $|\eta^f_k| < |\eta^s_k|$). Our findings also show that a country switching from a peg to a soft peg is less inclined to lift capital restrictions. The reluctance to remove capital controls is more prevalent among emerging countries. The intensification of capital controls can help to sustain the ERR.49 Using a de jure ERR classification, von Hagen and Zhou (2005, 2006) also find a U-shaped relationship between the flexibility of the ER and the probability of lifting capital controls; however, this nonlinear relationship disappears when they use a de facto classification.

Second, the estimated coefficients obtained when the endogeneity of the ERR is accounted for differ substantially from those obtained when the ERR is treated as an exogenous regressor (model 1 vs. model 3; model 2 vs. model 4).50 Regarding the effect of ER flexibility on the openness of the FA, the coefficient associated with floating regimes, $\eta^f_k$, exhibits the largest difference.51 In fact, for emerging markets, the coefficient estimated under strict exogeneity overestimates the (negative) effect of flexible regimes by a factor of 5 (-3.144 vs. -0.646). Although advanced economies also exhibit a difference in that coefficient, it is not statistically significant. Moreover, for some of the exogenous variables, the difference is so large that the estimated coefficients flip signs when S and F are treated as endogenous regressors. Among advanced countries, for example, the effect of international reserves on the propensity to adopt an intermediate regime changes from an (insignificant) negative and unexpected effect to a positive effect when endogeneity is properly accounted for.

Third, the effect of the exogenous regressors on the propensity to open the FA varies considerably with the intensity of ER flexibility (model 3 vs. model 4). For advanced countries, for example, the coefficient associated with S in the FA equation increases by a factor of 29 when interaction terms are allowed.52 Therefore, the inclusion of interaction terms unmasks interesting relationships between the exogenous variables, the degree of FA openness, and the ERR. Fourth, the cutoff points (i.e., the α's) indicate that emerging economies have been more cautious than advanced countries in taking the first step to liberalize the FA.53

7.1.1. Determinants of the Financial Account
Advanced Countries. According to Table 2, the degree of financial development, stock of international reserves, quality of the institutional framework, size of the economy, trade openness, and inflation affect the degree of FA openness (model 4). The effect of these variables on the decision to remove capital restrictions, however, depends on the type of ERR implemented.54 The results of model 4 show important differences across advanced and emerging countries. A major difference between these two types of countries is related to the interactions of F with the exogenous regressors. While the majority of these interactions are significant at conventional levels in the emerging market regressions, they are not statistically significant in the advanced country estimations.55 This result reflects the tendency of advanced economies to liberalize the FA under intermediate regimes (see Fig. 2).

Policymakers in advanced countries with developed financial systems have displayed a higher propensity to remove capital restrictions when an intermediate regime is implemented than when a floating regime is in place. This result supports previous arguments that financial deepening and innovation reduce the effectiveness of capital controls, and this, in turn, increases the propensity to liberalize the FA.

Per capita income is typically interpreted in this context as a measure of economic development. A number of economists have found that more developed countries are more likely to remove restrictions on capital flows (see, e.g., Alesina et al., 1994; Grilli & Milesi-Ferretti, 1995; Leblang, 1997). The observation that all of today's high-income countries have lifted capital controls is consistent with the view that FA liberalization is a corollary of economic development and maturation (Eichengreen, 2002). Our results indicate that the effect of GDP per capita on the degree of FA openness varies considerably between intermediate and flexible regimes. While the development of general legal systems and institutions, proxied by per capita income, exhibits a positive correlation with the propensity to liberalize capital flows when an intermediate ERR is in place, this variable has no effect at the two ends of the ERR spectrum, pegs and flexible regimes.56

Regarding the stock of international reserves, the results presented in Table 2 indicate that advanced countries with a high stock of foreign reserves and a soft peg have more closed FAs. The interaction between the ERR and the degree of FA openness is one of the driving forces behind this result. The reason is that advanced countries with large reserves of foreign currency are keener to adopt intermediate regimes, and this, in turn, decreases the propensity to open the FA. As we mentioned above, a lower degree of FA openness may help the monetary authorities to maintain an intermediate ERR. Using a random effects probit model, Leblang (1997) obtains exactly the opposite correlation between reserves and financial openness.57 His estimates show that as countries run out of international reserves, policymakers become more likely to impose capital restrictions.58 In the studies most closely related to our empirical model (von Hagen & Zhou, 2006; Walker, 2003), the stock of international reserves is excluded from the vector of determinants of financial openness.

Another important factor affecting FA policy is the degree of trade openness. There are important differences in the role played by this variable across advanced and emerging countries and across ERRs. Advanced economies with flexible rates exhibit a positive correlation between the degree of trade and FA openness.59 Although advanced countries with intermediate regimes also exhibit a positive correlation, it is very close to zero (0.015 = 1.139 - 1.124). The positive correlation between trade and FA openness is consistent with the idea that trade liberalization is a prerequisite to liberalizing the FA (McKinnon, 1993). The interaction of ER and FA policies allowed in the model reinforces this correlation. The ERR equations show that advanced countries that are highly open to foreign trade have a stronger preference for fixed regimes. This preference for pegs is translated into a lower probability of choosing or maintaining an intermediate or floating regime (i.e., S = 0 and F = 0 are more likely), and this, in turn, increases the likelihood of removing capital controls.60

Based on our estimated model, we find that developed countries with high inflation are more inclined to impose capital controls. This finding supports previous arguments suggesting that governments compelled to resort to the inflation tax are more likely to utilize capital controls to broaden the tax base (e.g., Alesina et al., 1994; Leblang, 1997; Eichengreen & Leblang, 2003). The asymmetric effects of the exogenous variables on the degree of FA openness, caused by different degrees of ER flexibility, are clearly observed in this case again. When interactions between the endogenous and exogenous variables are included (model 4), the effect of inflation becomes more negative in countries implementing an intermediate regime than in countries with flexible rates: -2.835 (= -1.810 - 1.025) versus -1.890 (= -1.810 - 0.080), respectively.61
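To make this arithmetic explicit, the following minimal sketch (illustrative Python, not taken from the chapter's estimation code) simply adds the model 4 baseline coefficient quoted above to the relevant interaction coefficient to obtain the regime-specific effect; the same pattern applies to the other exogenous variables discussed in this section.

    # Regime-specific effect of an exogenous variable = baseline coefficient
    # plus the coefficient on its interaction with the regime dummy (S or F).
    # Values are the advanced-country, model 4 inflation estimates quoted in the text.
    beta_inflation = -1.810        # INFLATION
    beta_s_inflation = -1.025      # S x INFLATION (intermediate regime)
    beta_f_inflation = -0.080      # F x INFLATION (floating regime)

    effect_intermediate = beta_inflation + beta_s_inflation   # about -2.835
    effect_floating = beta_inflation + beta_f_inflation       # about -1.890
    print(effect_intermediate, effect_floating)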

It has been argued that small countries might benefit from risk sharing. The presumption involves the price of risky assets across countries. If these prices differ across countries, there might be some benefits from allowing individuals to trade assets (e.g., consumption smoothing). The gains from risk sharing may be larger for developing or small countries because these contribute less to global output than developed or large countries, making it less likely that their domestic output would be correlated with world output. Hence, more of their idiosyncratic risk can be eliminated by trading assets with residents in other parts of the world. In addition, developing countries' GDP is often much more volatile than that of large or advanced countries, which means there is more scope to reduce the volatility of output, and this, in turn, may be translated into higher benefits from risk sharing. Our findings support this presumption for emerging markets but not for advanced countries. Specifically, the estimated coefficient associated with relative size is negative for emerging markets but positive for advanced countries. Hence, our results show that larger advanced economies tend to have more open FAs.

7.1.2. Emerging Markets
The coefficients on S and F show, as they do for advanced economies, that emerging countries are less inclined to liberalize the FA when a soft peg is in place. Moreover, our results indicate a further tightening of capital controls in emerging markets, relative to developed countries, when an intermediate regime is chosen. As argued previously, countries following this practice can resort to capital restrictions to help maintain their ERRs. From Table 2 we conclude that the degree of financial development, the stock of international reserves, the quality of the institutional framework, the size of the economy, the degree of trade openness, inflation, and the democratic level affect the degree of FA openness in emerging markets (model 4).

Emerging countries implementing a soft peg or a flexible rate show a positive and significant correlation between the stock of international reserves and financial openness. To rationalize this result, it must be kept in mind that the size of domestic financial liabilities that could potentially be converted into foreign currency is equal to M2. Therefore, when the stock of international reserves increases relative to M2, the monetary authorities are in better shape to maintain an intermediate regime when there is a sudden increase in the demand for foreign currency (e.g., a speculative attack against the intermediate regime or a sudden capital flight). Also, a high stock of foreign reserves may provide the monetary authorities with the funds necessary to reduce excessive volatility of the ER when the country is implementing a (managed) flexible regime. Thus, when the stock of international reserves is sufficiently large, the country has an additional tool, besides capital controls, to maintain a soft peg or to reduce the volatility of the ER. This finding, then, indicates that there might be some substitutability between capital controls and the stock of international reserves. Furthermore, policymakers in emerging markets react more to the stock of international reserves than their counterparts in advanced economies. It can be argued that our results might be biased due to the potential endogeneity of the stock of foreign exchange reserves and the degree of financial openness.62 As mentioned above, this variable is lagged by one period in order to mitigate this potential problem. Leblang (1997) also reports a positive relationship between foreign exchange and financial openness.

A developed financial system tends to make emerging countries with a peg lift capital restrictions. By contrast, emerging markets with developed financial systems and with either a soft peg or a flexible ER are less inclined to open their financial markets to the world. These asymmetries in the coefficients associated with financial development across ERRs suggest that the quality of the institutional framework, including but not limited to the financial system, is what matters most to benefit from a more open FA.

Similar to advanced economies, emerging markets with developed general legal systems and institutions are more likely to remove capital controls under an intermediate regime than when they allow their currencies to float (1.078 vs. -0.400).63 A suggestive explanation of this result is the "fear of floating" hypothesis proposed by Calvo and Reinhart (2002). These authors argue that the reluctance to allow the currency to float is more commonly observed among emerging market countries. Therefore, if an emerging country decides to open its FA, it would be more likely to do it under a soft peg than under a floating regime due to the "fear of floating." The positive correlation between per capita income and financial openness is one of the most robust regularities in the literature on the determinants of capital controls.64 This result is robust to the type of ERR classification, de jure or de facto, and sample period. Our contribution to the literature in this area is the identification of a nonlinear effect of per capita income on FA openness for different degrees of ER flexibility. While previous studies find a higher propensity to liberalize the FA when GDP per capita is high, we find the same positive relationship for emerging markets with pegs and intermediate regimes but a negative correlation when a flexible ER is in place.

The positive correlation between trade openness and the degree of FA openness is consistent with McKinnon's (1993) ordering theory, which maintains that the current account must be liberalized before the FA. Our findings also show that trade openness has its major impact on FA policies at the two ends of the ERR spectrum, pegs and floating rates. Furthermore, if we consider the effect of trade openness on the ERR and how the latter affects FA policy, we can conclude that emerging countries that are highly open to foreign trade are more likely to adopt a peg and, therefore, more likely to remove capital restrictions. The results for inflation confirm our expectations that emerging markets experiencing high inflation are more reluctant to liberalize the FA. As a matter of fact, among emerging countries with high inflation, the ones adopting a peg show a lower probability of removing capital controls. There are two suggestive explanations for this result. The first is that capital restrictions in countries with high inflation can be used to protect the tax base (e.g., Grilli & Milesi-Ferretti, 1995; Leblang, 1997; Eichengreen & Leblang, 2003). The second is that those countries might be more reluctant to liberalize their FA in order to sustain their pegs when inflation is high (to prevent capital flight).65 Small emerging markets with either a peg or an intermediate regime tend to have more open FAs. This finding supports the risk-sharing idea. The evidence also indicates that the effect of relative size decreases, in absolute value, with the flexibility of the ER. In fact, when an emerging market switches from an intermediate to a floating regime, the correlation between the size of the economy and FA openness changes from negative to positive.

Emerging markets with flexible ERs and democratic political systems have more open FAs. In particular, when an emerging economy moves from an intermediate to a floating regime, the democratic level has the expected effect on the liberalization of the FA and plays a more important role in that process.66 This result contrasts dramatically with the null effect of the democratic level on the presence of controls found by von Hagen and Zhou (2006). The results for the dummy variables indicate that, relative to other emerging countries, Asian economies with either a soft peg or a floating regime maintain more open FAs. Finally, we observe that emerging markets experiencing currency crises or hyperinflations (CRISIS) are more reluctant to liberalize the FA. This last result suggests that maintaining or imposing capital controls helps emerging markets cope with financial crises (e.g., Malaysia during the Asian crisis).

7.2. Determinants of the Exchange Rate Regime

Again we will focus on the coefficients of model 4 (Table 2). The main result regarding the choice of ERR is related to the significance of the external instruments, SPILL and SHARE. Both variables have a strong and significant relationship with the ER arrangement. In all the estimations controlling for endogeneity, the coefficient associated with SPILL is, as expected, positive and statistically significant. Based on these estimates, we find evidence that policymakers tend to adopt more flexible ERRs when intermediate and floating regimes are widely accepted around the world (e.g., high SPILL). Regarding the second external instrument, SHARE, we find that policymakers in advanced countries with a high geographical concentration of trade are less likely to allow the ER to float but more likely to implement an intermediate regime. Unexpectedly, emerging markets display a tendency to implement floating rates when the geographical concentration of trade is high. The coefficients associated with SPILL and SHARE support the relevance of the instruments and our assumption that these variables can be used to identify the effect of exogenous changes in ER flexibility on financial openness.

7.2.1. Advanced Countries
Consistent with OCA theory, large advanced economies with low volumes of trade adopt more flexible ERRs (intermediate or flexible regimes). Furthermore, all else being equal, an increase in the size of the economy by one standard deviation or a decrease in the volume of trade by one standard deviation increases the preference for flexible regimes more than the preference for soft pegs. Thus, a large advanced country with a low degree of trade openness has a higher preference for flexible regimes. The argument is that in large economies the tradable sector has a lower share of total production than in small economies. Therefore, the gains from fixing the ER are lower in large countries.

Industrial countries with developed general legal systems and institutions (i.e., with high per capita income), high inflation, or a large stock of international reserves are more likely to let the ER fluctuate to some extent, implementing either a soft peg or a floating regime. The positive correlation between the probability of adopting an intermediate or a floating regime and the stock of foreign reserves can be explained by the self-insurance motive for holding international reserves. This motive tells us that a buffer stock represents a guaranteed and unconditional source of liquidity that can be used in bad states of nature (e.g., speculative attacks and capital flight). Although advanced countries may have better access to international capital markets during bad states of nature (e.g., speculative attacks), foreign reserves can be used immediately to maintain an intermediate ER or to prevent large swings of the ER in a (managed) floating environment.

Regarding per capita income, we found that advanced economies with weak institutional frameworks are, as expected, more likely to peg. This result captures the tendency for countries with weak institutions to rely on currency boards, dollarization, and other forms of de facto pegs to solve credibility problems. The positive association between inflation and the probability of implementing an intermediate regime might be due to the need of countries experiencing high inflation to attain low-inflation monetary policy objectives. One common anti-inflationary strategy is to use the ER, either a peg or an intermediate regime, as a nominal anchor. Some European countries, for example, were required to attain price stability to gain access to the European Economic and Monetary Union (EMU).67 Also, the Exchange Rate Mechanism implemented in Europe required European countries to attain ER stability through the implementation of an intermediate ERR.68 Thus, the estimated correlation between inflation and the probability of adopting an intermediate regime might reflect the policies implemented in Europe before the introduction of the euro.

As predicted by OCA theory, advanced countries with a lower geographic concentration of trade are more prone to implement a floating regime, since a lower geographical concentration of trade decreases the benefits from pegs or intermediate regimes. Moreover, industrial economies with a higher degree of concentration in exports show a stronger preference for intermediate regimes than for hard pegs or de facto pegs. Finally, financial deepening tends to make advanced countries select hard or de facto pegs. This might be a consequence of the European Exchange Rate Mechanism and the recent move of European countries to a single currency.69

7.2.2. Emerging Markets
The estimated coefficients of model 4 presented in Table 2 indicate that emerging countries with high inflation, large foreign reserve holdings, transparent political systems (e.g., democracies), developed institutional frameworks, and low levels of trade openness maintain more flexible regimes. The last result is in line with OCA theory. A positive coefficient on foreign international reserves is expected since emerging countries, and particularly fast-growing Asian countries, no longer see the IMF as a lender of last resort, causing them to accumulate foreign exchange reserves as self-insurance against currency crises and sometimes as a mechanism to sustain or manage the ER (e.g., to maintain an artificially low value for their currency). This result is consistent with that in von Hagen and Zhou (2005). Also, as we mentioned above, countries can accumulate a large stock of international reserves to prevent large swings of the ER in a flexible ER environment.

Not surprisingly, emerging markets with nontransparent domestic political systems are more inclined to peg than countries with more transparent political systems (e.g., democracies). Nondemocratic countries can choose a less flexible ERR as a commitment mechanism to help them maintain credibility for low-inflation monetary policy objectives. The ability to adopt an ER with some flexibility is positively related to the level of development (i.e., the quality of the institutional environment).70

This finding captures the tendency for emerging countries with a poor institutional framework to rely on pegs to solve credibility problems. In fact, the effect of institutional quality is bigger in the intermediate ER equation than in the flexible ER equation. The latter result indicates that emerging countries with high GDP per capita prefer ERRs with limited flexibility. Again, this is consistent with the "fear of floating" hypothesis.

As predicted by OCA theory, larger emerging countries are more inclined toward floating rates, while smaller countries are more attracted to soft pegs. Like advanced economies, emerging markets with underdeveloped financial systems prefer flexible rates over intermediate regimes. This result reflects the difficulties of maintaining a soft peg when the central bank does not have the necessary tools to defend the currency. Surprisingly, the results show a positive relationship between the degree of concentration in exports and the implementation of a flexible regime.

There are important differences in the selection of the ERR between Asian and non-Asian emerging economies. While the former are more inclined to adopt soft pegs, the latter are more inclined to let the ER float. Hence, we can conclude that Asian emerging markets have shown greater "fear of floating" relative to other emerging economies. This result supports the idea of Dooley, Folkerts-Landau, and Garber (2003) regarding the informal Bretton Woods II system.71 Finally, emerging markets classified as "freely fallers" by RR (CRISIS = 1) are more inclined to float. This finding reflects the common tendency to move, voluntarily or not, to a freely floating regime at the onset of a currency crisis. Mexico in 1994, Thailand in 1997, and Argentina in 2001 are examples of countries that ended up floating after a speculative attack.

Finally, one question that can be answered with our model is how the results and statistical inference are affected if advanced and emerging markets are pooled into one group. The results of the pooled analysis are shown in Appendix B (Table B1). Comparing the results of model 4 in Tables 2 and B1, we verify that trends and preferences specific to emerging or advanced countries are masked when countries are pooled into one group. There are some cases in which the coefficient is statistically significant in the pooled regression but significant only for one type of country. There are other cases in which the estimated coefficients are not statistically significant when emerging and advanced countries are grouped in the same category but are significant for emerging or developed countries. This evidence supports the argument that distinguishing countries by their stage of economic and financial development may help unmask trends that are specific to advanced or emerging markets.72


8. CONCLUDING REMARKS

In this chapter, we propose a trivariate probit model to investigate the effect of exogenous changes of ER flexibility on the openness of the FA. To identify this effect, we use a measure of the world's acceptance of intermediate and floating regimes and the geographical concentration of trade as instruments. Some of the major findings are the following. First, we find a U-shaped relationship between the probability of lifting capital controls and ER flexibility. While pegs and flexible rates do not impose constraints on FA policies, the adoption of an intermediate regime decreases the probability of removing capital controls. Second, the effect of ER flexibility on the degree of FA openness changes considerably between countries implementing a regime with limited flexibility (intermediate regime) and economies with a flexible rate. Our results predict a tightening of capital controls when an intermediate regime is adopted.

Third, treating ER flexibility as an exogenous regressor leads, in general, to an overestimation of the impact of ER flexibility on the degree of openness of the FA. Moreover, our findings indicate that the effect of the exogenous regressors varies considerably with the intensity of ER flexibility. For example, the negative correlation between inflation and the likelihood of liberalizing the FA is stronger (more negative) when ER fluctuations are limited (i.e., under an intermediate regime) than when the ER is fully flexible. Fourth, we found that a network effect plays a role in the choice of ERR. Specifically, we found that policymakers are more likely to adopt more flexible ERRs when these regimes are widely accepted in the world. Finally, relative to other emerging markets, Asian countries displayed more "fear of floating."

Future extensions of this chapter include developing a de facto index of financial openness, analyzing controls on capital outflows separately from controls on capital inflows, including other developing countries, and building an index of financial openness from the BFOI using weights for the different types of capital controls. Another interesting extension would be to control for time-invariant unobserved heterogeneity and to allow for dynamic relationships among the variables (feedback effects from lagged dependent variables to current and future values of the explanatory variables). These dynamics might be crucial in the FA equations since the decision to impose capital controls can also be affected by lagged capital controls and by other macroeconomic variables.


NOTES

1. A more flexible regime might generate the incentives to develop the foreign exchange market, produce an awareness of currency risk and the need for hedging foreign exchange exposures, allow policymakers to tailor monetary policy to buffer the economy against shocks, and avoid the creation of "one-way bets," thereby preventing speculators from all lining up on one side of the market, which creates losses in the event that expectations of revaluations or devaluations are disappointed.
2. Currency mismatch occurs when a country's banks and operating companies have their assets denominated in domestic currency but their liabilities denominated in foreign currency, so that a large depreciation generates a significant decline in the economy's net worth.
3. The Japanese liberalization process implemented in the early 1960s is another example of how FA policies might be affected when a more flexible regime is adopted. At that time Japan embarked on a gradual removal of capital flow restrictions by implementing the Basic Plan for Liberalization of Trade and Foreign Exchange. More than a decade later, in 1973, the yen moved from a peg to a more flexible arrangement against the U.S. dollar, increasing the exposure of the Japanese economy to international shocks. This, in turn, affected FA policy and the pace of liberalization. In order to prevent short-term capital inflows and outflows from destabilizing the foreign exchange market during periods of international turmoil, caused by the two oil crises in the 1970s and the "learning to float" period, Japan then implemented a dizzying series of changes in foreign exchange controls and regulations affecting FA transactions (Aramaki, 2006). Immediately after the second oil shock occurred in 1979, the yen started to depreciate and the Japanese authorities took measures to encourage capital inflows to stabilize the ER.
4. For example, in the analysis of the evolution and performance of ERRs presented by Rogoff, Husain, Mody, Brooks, and Oomes (2004), only 5 studies out of 14 controlled for the effects of FA openness on the choice of ERR (Holden, P., Holden, M., & Suss, 1979; Savvides, 1990; Edwards, 1996; Frieden, Ghezzi, & Stein, 2000; Poirson, 2001), and only one of them dealt with the endogeneity problem (Poirson, 2001).
5. Red bus versus blue bus being the canonical example.
6. Calvo and Reinhart (2002) dubbed the fear of large currency swings "fear of floating." They argue that this fear might arise from a combination of lack of credibility, a high pass-through from the ER to prices, and inflation targeting; from liability dollarization; or from an output cost associated with ER fluctuations.
7. We obtain a probit model if we assume that the probability density function of ε_it is the standard normal distribution (ε_it ~ N(0, 1)). If we instead assume that ε_it follows a logistic distribution, then a logit model is obtained.
8. Some variants of this specification are multinomial or ordered models.
9. Many countries that have removed capital controls have experienced substantial capital inflows when capital controls are lifted (e.g., Italy, New Zealand, Uruguay, and Spain).
10. See Hanson (1995) and Bartolini and Drazen (1997a, 1997b).
11. FA restrictions could be seen as a further form of prudential regulation. Those types of restrictions preclude the banks from funding themselves offshore in foreign currency, while prudential supervision and regulation prevent them from making foreign currency-denominated loans to firms in the nontraded goods sector.
12. Exceptions are Eichengreen and Leblang (2003) and von Hagen and Zhou (2006). The former estimate a bivariate probit model, while the latter use a simultaneous equation model.
13. Quinn and Inclan (1997) found that the effect of the political leaning of the government on the decision to lift or impose capital controls depends on the abundance of skilled labor. In this respect, their evidence suggests that leftist governments support financial openness where skilled labor is abundant. Conversely, leftist governments in nations without a strong advantage in skilled labor tend to restrict capital flows.
14. The same framework has been used to explain the choice of intermediate or floating regimes.
15. Examples are foreign portfolio investment, for the case of the FA, and the nominal ER and macroeconomic variables related to the foreign exchange market for de facto ERR classifications.
16. Regarding the "inconclusive" regimes, Bubula and Otker-Robe (2002) argue that countries such as France and Belgium with obvious ERRs (horizontal bands in IMF classifications and de facto pegs in Reinhart and Rogoff (2004)) are classified as "inconclusive." In spite of this potential misclassification, in their latest update, less than 2% of the regimes are classified as "inconclusive."
17. As Reinhart and Rogoff (2004) argue, under official peg arrangements dual or parallel rates have been used as a form of back-door floating.
18. A country's ER arrangement is classified as "freely falling" when the 12-month inflation rate is equal to or exceeds 40% per annum, or during the 6 months following an ER crisis in which the crisis marked a movement from a peg or an intermediate regime to a floating regime (managed or freely floating). For more details on this classification, see the Appendix in Reinhart and Rogoff (2004).
19. The index is the sum of 12 components related to capital flows: (1) controls on inflows of invisible transactions (proceeds from invisible transactions and repatriation and surrender requirements); (2) controls on outflows of invisible transactions (payments for invisible transactions and current transfers); (3) controls on inflows of invisible transactions from exports; (4) controls on inflows pertaining to capital and money market securities; (5) controls on outflows pertaining to capital and money market securities; (6) controls on inflows pertaining to credit operations; (7) controls on outflows pertaining to credit operations; (8) controls on inward direct investment; (9) controls on outward direct investment; (10) controls on real estate transactions; (11) provisions specific to commercial banks; and (12) ER structure. These 11 dummy variables are obtained from the Annual Report on Exchange Arrangements and Exchange Restrictions published by the IMF.
20. Brune's ER structure component is a binary variable assuming the value of 1 when a country has dual or multiple ERs.
21. A list of the countries included in the analysis is provided in Appendix C.
22. In terms of the sequencing of these two policies, this tendency suggests that countries tried to learn how to live with a more flexible ERR before liberalizing the FA.
23. The advocates of the "bipolar view" state that intermediate ERRs are disappearing in favor of the two corners, hard pegs and flexible arrangements.


24. For example, Zsk is the parameter associated with the effect of soft pegs (intermediate regimes) on financial openness. The subscript k stands for capital (flows), which are intrinsically related to the FA.
25. See Section 3.2 for definitions of pegs, intermediate regimes, and flexible regimes.
26. In heteroscedastic probit models, the variance of the error term is assumed to be a function of a vector of regressors z. For example, a common practice is to assume that ν_it,k | X_1,it = x_1,it, S_it, F_it ~ N(0, σ²_it,k) with sqrt(σ²_it,k) = exp(z'_it ξ).
27. Provided we had an instrument for each endogenous dummy variable, z_j, such that S | z_S, X_1,it = x_1,it ~ N(μ_S, σ²_S) and F | z_F, X_1,it = x_1,it ~ N(μ_F, σ²_F), the reduced form for K would also be a probit model, and, therefore, the parameters in Eq. (2) could be estimated by using a two-stage method.
28. The propensity to implement a peg, the third regime, is described by the following latent index: P*_it = X_1,it Π^p_1 + X^p_2,it Π^p_2 + ν_it,p.
29. Therefore, the first two residuals in ν_it are the difference in errors for intermediate regimes and pegs and the difference in errors for flexible regimes and pegs.
30. Vech(M) is the vector obtained from stacking the lower triangular part of the symmetric matrix M on top of each other.
31. The polity variable ranges from 10 (strongly democratic) to -10 (strongly autocratic).
32. We admit that per capita GDP can be a crude proxy for the development of general legal systems and institutions. The reason is that there might be countries with high per capita income and indices of economic and institutional development below those of advanced economies. In spite of this concern, we use per capita GDP as a proxy for institutional development under two arguments: (i) on average, a positive correlation is expected between per capita income and the development of general legal systems and institutions (see, e.g., the widely cited study of Acemoglu, Johnson, & Robinson, 2001); and (ii) it is difficult to find other measures for this variable.
33. Klein regresses the logarithm of per capita income on a composite index of five series measuring institutional quality. The five components of the index are Bureaucratic Quality, Control of Corruption in Government, Risk of Expropriation, Repudiation of Government Contracts, and Rule of Law.
34. Typically overinvoicing of imports or underinvoicing of exports.
35. Therefore, X_2,it = [SPILL_it, SHARE_it].
36. For example, in 2000, about 88% of Mexican exports went to the United States. In this case, the variable SHARE for Mexico in period t = 2000 is equal to 0.88.
37. In another study, Frieden et al. (2000) use the variable VIEWS to measure the percentage of countries in the world under fixed ERRs. Since the correlation between the VIEWS variable and a time trend included in their regression turned out to be extremely high (-0.96), they only present the results using the time trend.
38. The members of the CIS included in their regressions are Armenia, Azerbaijan, Belarus, Georgia, Kazakhstan, Kyrgyz Republic, Moldova, Russia, Tajikistan, Turkmenistan, Ukraine, and Uzbekistan. The non-CIS group is composed of Bulgaria, Czech Republic, Hungary, Poland, Romania, Slovak Republic, Slovenia, Estonia, Latvia, Lithuania, Albania, Croatia, and Macedonia.


39. The sign of the estimated coefficients and their significance level are not invariant across ERR classifications: de jure versus de facto. While the evidence shows that countries with geographically concentrated foreign trade are less likely to adopt a fixed ERR when the de jure classification is used, the opposite effect is found when the de facto classification is used.
40. Halton draws obtained from a Halton sequence can provide better coverage than simple random draws because they are created to progressively fill in the unit interval evenly and ever more densely. In fact, it is argued that for mixed logit models 100 Halton draws might provide more precise results than 1,000 random draws in terms of simulation error (Train, 2003, p. 231). (A brief illustration of Halton draws follows these notes.)
41. In this context, a choice probability refers to the likelihood that a country chooses a particular outcome from the set of all possible outcomes (e.g., the probability of choosing a peg and a closed FA). To simplify notation, the conditioning on the exogenous regressors, X_1,it and X_2,it, is omitted in all the probabilities described in the chapter.
42. In fact, for each joint probability involving n events, there are n! alternative combinations of conditional and marginal probabilities.
43. We take utility differences only between the ERRs.
44. We use the refinement of the updating formula for the approximation of the inverse of the Hessian proposed by Broyden, Fletcher, Goldfarb, and Shanno. This numerical algorithm is suitable when the evaluation of the Hessian is impractical or costly and the log-likelihood function is close to quadratic.
45. Note that c11 is the (1,1) element of the Cholesky factor of Ω_ν. It is important to mention that all the elements of the Cholesky factors for the intermediate and floating regimes, L_s and L_f, are functions of the elements of the Cholesky factor L_ν. In fact, all the elements of L_ν are estimated along with the parameters of the random utilities.
46. This transformation measures the depletion of the real value of the currency.
47. In Appendix B, we show the results of a pooled model using data on both advanced and emerging markets.
48. In this model, the dummy variables S and F are treated as strictly exogenous.
49. See, for example, Begg et al. (2003) and von Hagen and Zhou (2006).
50. This does not necessarily mean that the difference in the estimated coefficients is the result of an endogeneity bias. In fact, the observed difference might be just a sampling difference.
51. Compared to the difference exhibited by Zsk.
52. Notice that the estimate of Zsk in model 4 changes from -0.055 to -1.627 for advanced economies.
53. Notice that the estimate of α2 for advanced economies is almost half of the estimate for emerging markets. This indicates that, all else being equal, emerging economies need higher values of the explanatory variables to jump from the partially closed state to a partially open FA.
54. Note that in this context the effect of an exogenous variable, say X, on the propensity to open the FA depends on the ERR. For example, when we refer to the effect of X under a float, we refer not only to the interaction term but also to the sum of the coefficients associated with X and (F × X).


55. Except for relative size, which is statistically different from zero in advanced countries with floating regimes.
56. Although the coefficient on the interaction of per capita income and F is negative, it is not statistically significant.
57. Of all the studies included in Table 1, only Leblang (1997) includes foreign exchange in the vector of "exogenous" determinants of capital controls. There are at least two reasons to explain why our results differ from Leblang's findings. First, he does not control for the endogeneity of the ERR. Second, he estimates a univariate model.
58. Leblang also obtains evidence that this effect is more intense in countries with fixed ERs.
59. This effect is equal to 0.970 = 1.139 - 0.169.
60. A negative relationship between trade openness and capital controls is commonly found in the empirical literature (see, e.g., Grilli & Milesi-Ferretti, 1995; von Hagen & Zhou, 2006).
61. Surprisingly, von Hagen and Zhou (2006) and Walker (2003) do not include inflation as one of the determinants of capital controls.
62. The argument that the accumulation of foreign exchange reserves may substitute for what would otherwise be private sector capital outflows in countries with capital controls supports the presumption that FA policy affects the demand for reserves.
63. The effect of per capita income on FA openness in an emerging country with a soft peg is equal to 1.077 = 2.534 - 1.457, and equal to -0.4 = 2.534 - 2.934 when a floating regime is implemented.
64. See, for example, Eichengreen and Leblang (2003), Leblang (1997), and von Hagen and Zhou (2006).
65. In our ERR equations, we find that countries with high inflation rates are more prone to peg their currencies.
66. POLITY was dropped from the regressions associated with advanced economies because the between and within variations of this variable were very small.
67. The Maastricht Treaty required members who wanted to become part of the European Economic and Monetary Union to attain price stability: a maximum inflation rate 1.5% above the average of the three lowest national rates among European Union members. The Maastricht Treaty also contained a provision calling for the introduction of the single European currency and a European central bank no later than the first day of 1999. By 1993, all 12 countries then belonging to the European Union had ratified the Maastricht Treaty: France, Germany, Italy, Belgium, Denmark, Ireland, Luxembourg, the Netherlands, Spain, Portugal, the United Kingdom, and Greece. Austria, Finland, and Sweden accepted the Treaty's provisions upon joining the European Union in 1995.
68. The European Monetary System defined the Exchange Rate Mechanism to allow most currencies to fluctuate ±2.25% around target ERs (France, Germany, Italy, Belgium, Denmark, Ireland, Luxembourg, and the Netherlands). This mechanism allowed larger fluctuations (±6%) for the currencies of Portugal, Spain, Britain (until 1992), and Italy (until 1990).
69. After the Maastricht Treaty, many European members have been classified as de facto fixers.


70. See Hausmann et al. (2001).
71. It has been argued that Pacific Asian countries are formally or informally managing their currencies with respect to the U.S. dollar in a similar fashion as they did during the Bretton Woods system.
72. See, for example, Eichengreen and Razo-Garcia (2006). These authors showed that splitting the countries into emerging and advanced countries helps reconcile the "bipolar" and "fear of floating" views.
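As a companion to note 40, the following minimal sketch (generic Python using only the standard library; it is an illustration, not code from the chapter) generates one-dimensional Halton draws so that their even coverage of the unit interval can be compared with pseudo-random uniform draws.

    import random

    def halton(index, base=2):
        # Radical-inverse (Halton) draw for a given index (index >= 1) and prime base.
        result = 0.0
        f = 1.0
        i = index
        while i > 0:
            f = f / base
            result = result + f * (i % base)
            i = i // base
        return result

    # 10 Halton draws (base 2) versus 10 pseudo-random uniform draws.
    halton_draws = [halton(i, base=2) for i in range(1, 11)]
    random_draws = [random.random() for _ in range(10)]
    print(sorted(halton_draws))   # evenly spread over (0, 1)
    print(sorted(random_draws))   # typically clumpier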

REFERENCES

Acemoglu, D., Johnson, S., & Robinson, J. A. (2001). The colonial origins of comparative development: An empirical investigation. American Economic Review, 91(5), 1369–1401.
Alesina, A., Grilli, V., & Milesi-Ferretti, G. (1994). The political economy of capital controls. In: L. Leiderman & A. Razin (Eds), Capital mobility: The impact on consumption, investment and growth (pp. 289–321). Cambridge: Cambridge University Press.
Aramaki, K. (2006). Sequencing of capital account liberalization – Japan's experiences and their implications to China. Public Policy Review, 2(1).
Barro, R., & Tenreyro, S. (2003). Economic effects of currency unions. NBER Working Paper no. 9435. National Bureau of Economic Research, Inc., Cambridge, MA.
Bartolini, L., & Drazen, A. (1997a). Capital account liberalization as a signal. American Economic Review, 87(1), 249–273.
Bartolini, L., & Drazen, A. (1997b). When liberal policies reflect external shocks, what do we learn? Journal of International Economics, 42(3–4), 249–273.
Begg, D., Eichengreen, B., Halpern, L., von Hagen, J., & Wyplosz, C. (2003). Sustainable regimes of capital movements in accession countries. Policy Paper no. 10, Centre for Economic Policy Research, London.
Bernhard, W., & Leblang, D. (1999). Democratic institutions and exchange rate commitments. International Organization, 53(1), 71–97.
Blomberg, B., Frieden, J., & Stein, E. (2005). Sustaining fixed rates: The political economy of currency pegs in Latin America. Journal of Applied Economics, VIII(2), 203–225.
Broz, L. (2002). Political economy transparency and monetary commitment regimes. International Organization, 56(4), 861–887.
Bubula, A., & Otker-Robe, I. (2002). The evolution of exchange rate regimes since 1990: Evidence from de facto policies. IMF Working Paper no. 02/155. International Monetary Fund, Washington, DC.
Calvo, G., & Reinhart, C. (2002). Fear of floating. Quarterly Journal of Economics, 117(2), 379–408.
Carrasco, R. (2001). Binary choice with binary endogenous regressors in panel data: Estimating the effect of fertility on female labor participation. Journal of Business & Economic Statistics, 19(4), 385–394.
Chang, R., & Velasco, A. (2006). Currency mismatches and monetary policy: A tale of two equilibria. Journal of International Economics, 69(1), 150–175.
Chernozhukov, V., & Hong, H. (2003). An MCMC approach to classical estimation. Journal of Econometrics, 115(2), 293–346.


Coibion, O., & Gorodnichenko, Y. (2008). Strategic interaction among heterogeneous price-setters in an estimated DSGE model. NBER Working Paper no. 14323. National Bureau of Economic Research, Inc., Cambridge, MA.
Collins, S. M. (1996). On becoming more flexible: Exchange rate regimes in Latin America and the Caribbean. Journal of Development Economics, 51, 117–138.
Dooley, M. P., Folkerts-Landau, D., & Garber, P. (2003). An essay on the revived Bretton Woods system. NBER Working Paper no. 5756. National Bureau of Economic Research, Inc., Cambridge, MA.
Edwards, S. (1996). The determinants of the choice between fixed and flexible exchange rate regimes. NBER Working Paper no. 5756. National Bureau of Economic Research, Inc., Cambridge, MA.
Eichengreen, B. (2002). Capital account liberalization: What do the cross country studies show us? World Bank Economic Review, 15, 341–366.
Eichengreen, B. (2004). Chinese currency controversies. CEPR Discussion Paper no. 4375. CEPR Discussion Papers.
Eichengreen, B. (2005). China's exchange rate regime: The long and short of it. University of California, Berkeley.
Eichengreen, B., Hausmann, R., & Panizza, U. (2003). Currency mismatches, debt intolerance and original sin: Why they are not the same and why it matters. NBER Working Paper no. 10036. National Bureau of Economic Research, Inc., Cambridge, MA.
Eichengreen, B., & Leblang, D. (2003). Exchange rates and cohesion: Historical perspectives and political economy considerations. Journal of Common Market Studies, 41, 797–822.
Eichengreen, B., & Razo-Garcia, R. (2006). The international monetary system in the last and next 20 years. Economic Policy, 21(47), 393–442.
Frieden, J., Ghezzi, P., & Stein, E. (2000). Politics and exchange rates: A cross-country approach to Latin America. Research Network Working Paper no. R421. Inter-American Development Bank, Washington, DC.
Ghosh, A., Gulde, A., & Wolf, H. (2003). Exchange rate regimes: Choices and consequences. Cambridge, MA: MIT Press.
Grilli, V., & Milesi-Ferretti, G. (1995). Economic effects and structural determinants of capital controls. IMF Working Paper WP/95/31. International Monetary Fund, Washington, DC.
Hanson, J. (1995). Opening the capital account: Costs, benefits and sequencing. In: S. Edwards (Ed.), Capital controls, exchange rates and monetary policy in the world economy. Cambridge: Cambridge University Press.
Hausmann, R., Panizza, U., & Stein, E. (2001). Why do countries float the way they float? Journal of Development Economics, 66(2), 387–414.
Holden, P., Holden, M., & Suss, E. (1979). The determinants of exchange rate flexibility: An empirical investigation. The Review of Economics and Statistics, 61(3), 327–333.
Juhn, G., & Mauro, P. (2002). Long-run determinants of exchange rate regimes: A simple sensitivity analysis. IMF Working Paper no. 02/104, p. 31. International Monetary Fund, Washington, DC.
Klein, M. (2005). Capital account liberalization, institutional quality and economic growth: Theory and evidence. NBER Working Paper no. 11112, p. 37. National Bureau of Economic Research, Inc., Cambridge, MA.
Klein, M., & Marion, N. (1997). Explaining the duration of exchange-rate pegs. Journal of Development Economics, 54, 387–404.


Lahiri, A., Singh, R., & Vegh, C. (2007). Segmented asset markets and optimal exchange rate regimes. Journal of International Economics, 72(1), 1–21.
Leblang, D. (1997). Domestic and systemic determinants of capital controls in the developed and developing countries. International Studies Quarterly, 41(3), 435–454.
Leblang, D. (2003). To defend or to devalue: The political economy of exchange rate policy. International Studies Quarterly, 47(4), 533–560.
Leblang, D. (2005). Is democracy incompatible with international economic stability? In: M. Uzan (Ed.), The future of the international monetary system. London: Edward Elgar Publishing.
Levy-Yeyati, E., Reggio, I., & Sturzenegger, F. (2002). On the endogeneity of exchange rate regimes. Business School Working Paper no. 11/2002. Universidad Torcuato Di Tella, Buenos Aires, Argentina.
Levy-Yeyati, E., & Sturzenegger, F. (2005). Classifying exchange rate regimes: Deeds vs. words. European Economic Review, 49, 1603–1635.
McKinnon, R. (1993). The order of economic liberalization. Baltimore: The Johns Hopkins University Press.
Mundell, R. (1961). A theory of optimum currency areas. American Economic Review, 51(4), 657–665.
Obstfeld, M., Shambaugh, J., & Taylor, A. (2004). The trilemma in history: Tradeoffs among exchange rates, monetary policies, and capital mobility. NBER Working Paper no. 10396. National Bureau of Economic Research, Inc., Cambridge, MA.
Poirson, H. (2001). How do countries choose their exchange rate regime? IMF Working Paper no. 01/46. International Monetary Fund, Washington, DC.
Prasad, E., Rumbaugh, T., & Wang, Q. (2005). Putting the cart before the horse? Capital account liberalization and exchange rate flexibility in China. IMF Policy Discussion Paper no. 05/1.
Quinn, D., & Inclan, C. (1997). The origins of financial openness: A study of current and capital account liberalization. American Journal of Political Science, 41(3), 771–813.
Reinhart, C., & Rogoff, K. (2004). The modern history of exchange rate arrangements: A reinterpretation. Quarterly Journal of Economics, 119(1), 1–48.
Rogoff, K., Husain, A., Mody, A., Brooks, R., & Oomes, N. (2004). Evolution and performance of exchange rate regimes. IMF Occasional Paper 229. International Monetary Fund, Washington, DC.
Savvides, A. (1990). Real exchange rate variability and the choice of the exchange rate regime by developing countries. Journal of International Money and Finance, 9, 440–454.
Shambaugh, J. C. (2004). The effect of fixed exchange rates on monetary policy. Quarterly Journal of Economics, 119(1), 301–352.
Simmons, B., & Hainmueller, J. (2005). Can domestic institutions explain exchange rate regime choice? The political economy of monetary institutions reconsidered. International Finance 0505011, EconWPA.
Train, K. (2003). Discrete choice methods with simulation. Cambridge: Cambridge University Press.
von Hagen, J., & Zhou, J. (2002). The choice of exchange rate regimes: An empirical analysis for transition economies. ZEI Working Paper no. B02-03. Center for European Integration Studies (ZEI), University of Bonn, Bonn, Germany.
von Hagen, J., & Zhou, J. (2005). The determination of capital controls: Which role do exchange rate regimes play? Journal of Banking and Finance, 29, 227–248.


von Hagen, J., & Zhou, J. (2006). The interaction between capital controls and exchange rate regimes: Evidence from developing countries. CEPR Discussion Paper no. 5537. CEPR Discussion Papers.
Walker, R. (2003). Partisan substitution and international finance: Capital controls and exchange rate regime choice in the OECD. Ph.D. thesis, University of Rochester (draft), Rochester, New York.

APPENDIX A. GHK SIMULATOR WITH MAXIMUM LIKELIHOOD

As we mentioned in the main body of the chapter, we need to address four critical points when the GHK simulator is applied under the maximum likelihood framework (Train, 2003, p. 133). First, the model has to be normalized for scale and level of utility to ensure that the parameters are identified. Second, the GHK simulator takes utility differences against the regime for which the probability is being calculated, and so different differences must be taken for countries choosing different regimes. Third, for a country choosing to peg its currency the GHK simulator uses the covariance matrix Ω_ν; for a country choosing a soft peg it uses Ω_ν^s; while for a country with a floating regime it needs Ω_ν^f. In other words, the elements of Ω_ν are consistent with the elements of Ω_ν^s and Ω_ν^f in the sense that the three matrices are derived from the Ω matrix. Fourth, the covariance matrices have to be positive definite. To address these issues we followed these steps:

1. To assure that the model is identified, we need to start with the covariance matrix of the scaled utility differences, with the differences taken against the peg regime, Ω_ν. To assure positive definiteness of the covariance matrices, we have to parameterize the model in terms of the Cholesky decomposition of Ω_ν:

   L_ν =
     [ c11    0     0  ]
     [ c21   c22    0  ]
     [ c31   c32   c33 ]

Since Ω_ν = L_ν L_ν', Ω_ν is positive definite for any estimated values of the c's.


2. The covariance matrix of the original residuals, Ω, is recovered using the following lower-triangular matrix:

   L =
     [ 0     0     0     0  ]
     [ 0    c11    0     0  ]
     [ 0    c21   c22    0  ]
     [ 0    c31   c32   c33 ]

Therefore Ω = L L'. Be aware that the first row and column of the covariance matrix will be equal to zero. This means that we are subtracting the residual of the peg equation from all the ERR residuals of country i in period t. With this Ω matrix we can derive Ω_ν^s and Ω_ν^f:

   Ω_ν^s = M_s Ω M_s' = L_s L_s'
   Ω_ν^f = M_f Ω M_f' = L_f L_f'

where L_s and L_f are the Cholesky factors of the covariance matrix when either a soft peg or a floating regime is chosen, respectively, and

   M_s =
     [ 1   -1    0    0 ]
     [ 0   -1    1    0 ]
     [ 0    0    0    1 ]

   M_f =
     [ 1    0   -1    0 ]
     [ 0    1   -1    0 ]
     [ 0    0    0    1 ]

This procedure takes into account the four issues mentioned above.
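A minimal numerical sketch of the two steps above follows (Python with numpy; the c values are arbitrary illustrative numbers, not estimates from the chapter). It builds L_ν, recovers Ω, forms the regime-specific covariance matrices with M_s and M_f, and verifies that they admit Cholesky factors.

    import numpy as np

    # Step 1: parameterize via the Cholesky factor of the peg-differenced covariance matrix.
    c11, c21, c22, c31, c32, c33 = 1.0, 0.3, 0.8, -0.2, 0.4, 0.6   # illustrative values
    L_nu = np.array([[c11, 0.0, 0.0],
                     [c21, c22, 0.0],
                     [c31, c32, c33]])
    Omega_nu = L_nu @ L_nu.T          # positive definite by construction

    # Step 2: recover the covariance of the original residuals (first row/column zero).
    L = np.zeros((4, 4))
    L[1:, 1:] = L_nu
    Omega = L @ L.T

    # Difference against the soft-peg and floating regimes.
    M_s = np.array([[1, -1, 0, 0],
                    [0, -1, 1, 0],
                    [0,  0, 0, 1]])
    M_f = np.array([[1, 0, -1, 0],
                    [0, 1, -1, 0],
                    [0, 0,  0, 1]])
    Omega_s = M_s @ Omega @ M_s.T
    Omega_f = M_f @ Omega @ M_f.T

    # Cholesky factors used by the GHK simulator for each regime.
    L_s = np.linalg.cholesky(Omega_s)
    L_f = np.linalg.cholesky(Omega_f)
    print(np.linalg.eigvalsh(Omega_s) > 0, np.linalg.eigvalsh(Omega_f) > 0)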


APPENDIX B. ESTIMATION POOLING ADVANCED AND EMERGING COUNTRIES

Table B1. Financial Account and Exchange Rate Regime Equations (1975–2006): Pooled Regression.
Advanced and Emerging Countries.

                            Model 1             Model 2             Model 3             Model 4
                            θ         S.E.      θ         S.E.      θ         S.E.      θ         S.E.

Financial account equation
Zsk                         -0.748*   0.079     -0.722*   0.209     -1.925*   0.081     -0.869    0.227
Zfk                         -0.523*   0.091      0.411*** 0.243     -1.040*   0.255     -0.145    0.560
Constant                     0.422*   0.092      0.190    0.168      1.139*   0.094      0.410*** 0.234
FINDEV                      -0.043    0.035      0.082    0.091     -0.004    0.029      0.094    0.071
S*FINDEV                                        -0.057    0.117                         -0.053    0.096
F*FINDEV                                        -0.134    0.110                         -0.165*   0.084
RESERVES/M2                  0.049    0.043     -0.384*   0.073      0.116*   0.036     -0.322*   0.106
S*(RESERVES/M2)                                  0.877*   0.105                          0.831*   0.106
F*(RESERVES/M2)                                  0.442*   0.096                          0.414*   0.122
GDP per capita               0.564*   0.059      0.775*   0.151      0.499*   0.040      0.731*   0.132
S*GDP per capita                                 0.131    0.178                          0.173    0.163
F*GDP per capita                                -0.897*   0.179                         -0.801*   0.137
OPENNESS                     0.284*   0.044      0.551*   0.078      0.188*   0.035      0.499*   0.078
S*OPENNESS                                      -0.645*   0.108                         -0.632*   0.086
F*OPENNESS                                      -0.487*   0.155                         -0.493*   0.122
INFLATION                   -0.455*   0.072     -0.418*   0.146     -0.277*   0.043     -0.357*   0.094
S*INFLATION                                     -0.258    0.201                         -0.290*   0.111
F*INFLATION                                      0.073    0.180                          0.016    0.086
RELATIVE SIZE                0.055*   0.023      0.241    0.240      0.009    0.037      0.313*   0.112
S*RELATIVE SIZE                                  0.031    0.251                          0.002    0.142
F*RELATIVE SIZE                                 -0.190    0.242                         -0.222**  0.104
POLITY                       0.058    0.044     -0.055    0.073      0.075*   0.026     -0.031    0.130
S*POLITY                                         0.020    0.099                         -0.020    0.155
F*POLITY                                         0.371*   0.119                          0.335**  0.146
ASIA                        -0.748*   0.098     -2.529*   0.220     -0.676*   0.057     -2.606*   0.305
S*ASIA                                           2.100*   0.268                          2.235*   0.363
F*ASIA                                           2.496*   0.284                          2.657*   0.372
EMERGING                     0.359*   0.135      1.366*   0.277      0.389*   0.082      1.312*   0.210
S*EMERGING                                      -0.818**  0.357                         -0.820*   0.206
F*EMERGING                                      -2.592*   0.364                         -2.505*   0.209
CRISIS                       0.118    0.208      0.029    0.253     -0.407*** 0.209      0.286    0.283

Cutoffs
α2                           0.470*   0.031      0.522*   0.035      0.395*   0.025      0.488*   0.029
α3                           1.336*   0.053      1.490*   0.059      1.125*   0.041      1.428*   0.036

Intermediate exchange rate regime equation
Constant                     0.077    0.087      0.077    0.087      0.190*   0.056      0.217*   0.061
SPILL                       -0.296    0.716     -0.296    0.716      1.516*   0.392      1.085**  0.493
SHARE                        0.026    0.034      0.026    0.034     -0.096*   0.024      0.058**  0.028
FINDEV                       0.046    0.045      0.046    0.045      0.019    0.031      0.051**  0.026
RESERVES/M2                  0.073*** 0.042      0.073*** 0.042      0.242*   0.036      0.207*   0.039
GDP per capita              -0.069    0.048     -0.069    0.048     -0.038    0.039     -0.001    0.034
OPENNESS                     0.008    0.043      0.008    0.043     -0.114*   0.034     -0.083**  0.036
INFLATION                    0.225*   0.059      0.225*   0.059      0.253*   0.054      0.321*   0.042
RELATIVE SIZE               -0.323*   0.087     -0.323*   0.087     -0.145*   0.053     -0.078    0.065
POLITY                       0.034    0.040      0.034    0.040      0.071*   0.029      0.061*** 0.033
CRISIS                      -1.781*   0.214     -1.781*   0.214     -0.148    0.238     -0.652*   0.157

Floating exchange rate regime equation
Constant                    -1.224*   0.101     -1.224*   0.101     -2.465*   0.630     -1.097*   0.358
SPILL                        3.762*   0.795      3.762*   0.795      7.807*   2.340      4.391*   0.694
SHARE                        0.015    0.038      0.015    0.038      0.080    0.082      0.026    0.061
FINDEV                      -0.065    0.052     -0.065    0.052     -0.133    0.113     -0.060    0.057
RESERVES/M2                  0.231*   0.046      0.231*   0.046      0.690*   0.143      0.392*   0.089
GDP per capita               0.224*   0.058      0.224*   0.058      0.573*   0.153      0.268*   0.111
OPENNESS                    -0.427*   0.066     -0.427*   0.066     -1.280*   0.265     -0.630*   0.160
INFLATION                   -0.038    0.058     -0.038    0.058     -0.024    0.146      0.106    0.084
RELATIVE SIZE                0.446*   0.104      0.446*   0.104      1.033*   0.252      0.578*   0.164
POLITY                       0.092*** 0.048      0.092*** 0.048      0.271*   0.104      0.202*   0.080
CRISIS                       2.052*   0.200      2.052*   0.200      5.687*   1.234      2.795*   0.870
σff                          5.565    1.268
σks                          0.804    0.043
σkf                         -0.077    0.293
σsf                         -0.503   -0.195

Memorandum
Observations                 1552               1552                1552                1552
Endogeneity accounted for    No                 No                  Yes                 Yes
log-likelihood              -3094.08           -2979.08            -2757.20            -2655.40

Notes: *, **, *** denote coefficients statistically different from zero at the 1%, 5%, and 10% significance levels, respectively.
Intermediate regimes include preannounced crawling peg, preannounced crawling band narrower than or equal to ±2%, de facto crawling peg, de facto crawling band narrower than or equal to ±2%, preannounced crawling band narrower than or equal to ±2%, de facto crawling band wider than or equal to ±5%, moving band narrower than or equal to ±2% (i.e., allowing for both appreciation and depreciation over time), and de facto crawling band narrower than or equal to ±2%.
Floating regimes include managed floating and freely floating arrangements.


APPENDIX C. DATA

Table C1. Data Sources.

Variable         Source                      Definition or Transformation                      Units
POLITY           Polity IV project           Political regime                                  Index (-10, 10)
CPI (a)          IFS Line 64                 Consumer price index                              Index (2000 = 100)
INFLATION (b)    CPI                         Annual inflation                                  Δ% over previous year
M2 (c)           IFS Line 35                 Money + quasi money                               National currency
RESERVES         IFS Line 1L                 Total reserves - gold                             U.S. dollars
EXPORTS          IFS Line 90C                Exports of goods and services                     National currency
IMPORTS          IFS Line 98C                Imports of goods and services                     National currency
GDP              IFS Line 99                 Gross domestic product                            National currency
FINDEV           WDI and IFS                 M2/GDP                                            %
OPENNESS         WDI and IFS                 Exports plus imports over GDP                     %
RESM2            WDI and IFS                 Reserves/M2                                       %
GDP per capita   WDI                         GDP per capita                                    2000 U.S. dollars
RELATIVE SIZE    WDI                         Size relative to the U.S. (GDP)                   %
SHARE            DOTS (d)                    Percentage of total exports with main partner     %
ERR              Reinhart and Rogoff (2004)  De facto "Natural" exchange rate regime           15 categories
                                             classification
CRISIS           Authors' calculation        = 1 for a freely falling regime from Reinhart     Dummy variable
                                             and Rogoff
BFOI             Nancy Brune                 Financial openness index (excluding the           Index (0, 11)
                                             exchange rate regime)

Notes: IFS stands for International Financial Statistics.
(a) For missing observations, we use the CPI from Global Financial Data.
(b) The inflation variable used in the estimations equals π/(1 + π), where π is the annual inflation rate.
(c) For Eurozone members, we use data from the Yearbook of International Financial Statistics.
(d) DOTS stands for Direction of Trade Statistics (IMF).


Table C2. Classifications of Countries.

Advanced Countries (24): Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Japan, Luxembourg, the Netherlands, New Zealand, Norway, Portugal, San Marino, Spain, Sweden, Switzerland, United Kingdom, United States.

Emerging Market Countries (32): Argentina, Brazil, Bulgaria, Chile, China, Colombia, Czech Republic, Ecuador, Egypt (Arab Rep.), Hong Kong (China), Hungary, India, Indonesia, Israel, Jordan, Korea (Rep.), Malaysia, Mexico, Morocco, Nigeria, Pakistan, Panama, Peru, Philippines, Poland, Russian Federation, Singapore, South Africa, Sri Lanka, Thailand, Turkey, Venezuela (RB).


ESTIMATING A FRACTIONAL RESPONSE MODEL WITH A COUNT ENDOGENOUS REGRESSOR AND AN APPLICATION TO FEMALE LABOR SUPPLY

Hoa B. Nguyen

ABSTRACT

This chapter proposes M-estimators of a fractional response model with an endogenous count variable in the presence of time-constant unobserved heterogeneity. To address the endogeneity of the right-hand-side count variable, I use instrumental variables and a two-step estimation procedure. Two estimation methods are employed: quasi-maximum likelihood (QML) and nonlinear least squares (NLS). Using these methods, I estimate average partial effects, which are shown to be comparable across linear and nonlinear models. Monte Carlo simulations verify that the QML and NLS estimators perform better than other standard estimators. For illustration, these estimators are applied to a model of female labor supply with an endogenous number of children. The results show that the marginal reduction in a woman's weekly working hours becomes smaller with each additional child. In addition, the effect of the number of children on the fraction of hours that a woman spends working per week is statistically significant and more significant than the estimates in all other linear and nonlinear models considered in the chapter.

Maximum Simulated Likelihood Methods and Applications
Advances in Econometrics, Volume 26, 253–298
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)00000260012

1. INTRODUCTION

Many economic models employ a fraction or a percentage, rather than a level, as the dependent variable. In these models the economic variables of interest occur as fractions, such as employee participation rates in 401(k) pension plans, firm market shares, and the fraction of hours spent working per week. Even though fractional response models (FRMs) estimated by quasi-maximum likelihood (QMLE) have been developed, they have not accounted for a binary or count endogenous regressor. The traditional two-stage least squares approach, which addresses the problem of endogeneity, does not produce consistent estimators in nonlinear simultaneous equations models (Wooldridge, 2002). Maximum likelihood (ML) techniques or two-stage method of moments estimators are therefore needed (Terza, 1998; Wooldridge, 2002). In general, in simultaneous equations models in which the response variable or the endogenous regressor is a dummy, it is common to use the ML approach or the two-stage method of moments because the computational task is not demanding. With a count endogenous regressor, which takes more values than just 0 and 1, it is more difficult to specify and apply the ML approach, for several reasons. First, the ML approach in an FRM, which requires specifying the conditional distribution function, may lead to predicted values lying outside the unit interval. Second, in nonlinear models the ML technique is more computationally intensive than the corresponding QMLE approach in the presence of a count endogenous regressor. Third, it is desirable to specify a density function that admits specifications of different conditional moments and other distributional characteristics. These reasons motivate researchers to consider the QMLE method.

There are numerous studies that investigate the problem of count data and binary endogenous regressors. Some researchers have proposed standard assumptions such as a linear-exponential family (LEF) specification, in which a Poisson or Negative Binomial distribution of the count response variable is considered in a model with binary endogenous regressors.


Mullahy (1997) and Windmeijer and Santos-Silva (1997) use Generalized Method of Moments (GMM) estimation based on an LEF specification and a set of instruments. Terza (1998) focuses on the two-stage method (TSM) and the weighted nonlinear least squares (WNLS) estimators, using a bivariate normal assumption for the joint distribution of the unobserved components of the model. These models cannot be extended to count endogenous regressors unless a linear relationship between the regressor and the instruments, as well as the error term, is allowed; this means the count endogenous regressor is treated as a continuous variable, so that the conditional mean of interest has a closed form under this restrictive assumption. Terza (1998) cited the Poisson/Negative Binomial full information maximum likelihood (FIML) estimator, but the estimation of these models is not carried out, nor are their properties explored. In addition, the count variable in his model is a response variable rather than an explanatory variable. Models with a count endogenous explanatory variable can nevertheless benefit from his method, and the bivariate normal assumption for the unobserved components suggested in Terza (1998) can be relaxed to allow non-normality. Weiss (1999) considers simultaneous binary/count models in which the unobserved heterogeneity may be exponential gamma or normal, so that the marginal distribution of the dependent variable is Negative Binomial. However, estimation of that model depends on a function of the unobserved heterogeneity, which in turn depends on unknown parameters of the distribution of the unobserved effect. The joint standard normal distribution of the unobserved components in simultaneous binary choice/count models suggested in Weiss (1999) is a restrictive assumption in an application to FRMs. Alternative semi-nonparametric and nonparametric estimators are introduced in Das (2005); however, that paper does not include an application, and the case of count endogenous variables is not thoroughly discussed. More importantly, semiparametric and nonparametric approaches have a major limitation in this setting: they cannot deliver estimates of the partial effects or the average partial effects (APEs). If we are interested in estimating both the parameters and the APEs, a parametric approach is preferred.

In this chapter, I show how to specify and estimate FRMs with a count endogenous explanatory variable and a time-constant unobserved effect. Following Papke and Wooldridge (1996), I use models for the conditional mean of the fractional response in which the fitted value always lies in the unit interval. I focus on the probit response function, since the probit mean function is less computationally demanding when obtaining the APEs. To consistently estimate the effect of an endogenous explanatory variable on the fractional response variable, I employ instrumental variables to control for the endogeneity and then use a two-step estimation procedure. The count endogenous variable is assumed to have a Poisson distribution. As discussed in Winkelmann (2000), the error term (unobserved heterogeneity) in a Poisson model can be represented as either an additive correlated error or a multiplicative correlated error. The multiplicative correlated error has some advantages over the additive correlated error on grounds of consistency, and it is therefore the form used in this chapter. I focus on the QMLE and NLS methods to obtain robust and efficient estimators.

This chapter is organized as follows. Section 2 introduces the specification and estimation of an FRM with a count endogenous explanatory variable and shows how to estimate the parameters and the APEs using the QMLE and NLS approaches. Section 3 presents Monte Carlo simulations, and an application to the fraction of hours a woman works per week follows in Section 4. Section 5 concludes.

2. THEORETICAL MODEL SPECIFICATION AND ESTIMATION

For a 1 × K vector of explanatory variables z1, the conditional mean model is expressed as follows:

    E(y_1 \mid y_2, z, a_1) = \Phi(\alpha_1 y_2 + z_1 \delta_1 + \eta_1 a_1)    (1)

where Φ(·) is the standard normal cumulative distribution function (cdf), y1 is a response variable (0 ≤ y1 ≤ 1), and a1 is a heterogeneous component. The exogenous variables are z = (z1, z2), where the variables z2 are excluded from Eq. (1); z is a 1 × L vector with L > K, and z2 is a vector of instruments. The count variable y2 is endogenous, and we assume that it has a Poisson distribution:

    y_2 \mid z, a_1 \sim \mathrm{Poisson}[\exp(z \delta_2 + a_1)]    (2)

so the conditional density of y2 is specified as follows:

    f(y_2 \mid z, a_1) = \frac{[\exp(z \delta_2 + a_1)]^{y_2}\, \exp[-\exp(z \delta_2 + a_1)]}{y_2!}    (3)


where a1 is assumed to be independent of z, and exp(a1) is distributed as Gamma(δ0, 1/δ0) with a single parameter δ0, so that E[exp(a1)] = 1 and Var[exp(a1)] = 1/δ0.

After a transformation (see Appendix D for the derivation), the density of a1 is

    f(a_1; \delta_0) = \frac{\delta_0^{\delta_0}\,[\exp(a_1)]^{\delta_0}\, \exp(-\delta_0 \exp(a_1))}{\Gamma(\delta_0)}    (4)

In order to obtain the conditional mean E(y1 | y2, z), I specify the conditional density function of a1. Using Bayes' rule, it is:

    f(a_1 \mid y_2, z) = \frac{f(y_2 \mid a_1, z)\, f(a_1 \mid z)}{f(y_2 \mid z)}

Since y2 | z, a1 has a Poisson distribution and exp(a1) has a gamma distribution, y2 | z is Negative Binomial II distributed, as a standard result. After some algebra, the conditional density function of a1 is

    f(a_1 \mid y_2, z) = \frac{\exp(P)\, [\delta_0 + \exp(z \delta_2)]^{(y_2 + \delta_0)}}{\Gamma(y_2 + \delta_0)}    (5)

where P = -exp(z δ2 + a1) + a1(y2 + δ0) - δ0 exp(a1). The conditional mean E(y1 | y2, z), therefore, is obtained as follows:

    E(y_1 \mid y_2, z) = \int_{-\infty}^{+\infty} \Phi(\alpha_1 y_2 + z_1 \delta_1 + \eta_1 a_1)\, f(a_1 \mid y_2, z)\, da_1 = m(\theta; y_2, z)    (6)

where f(a1 | y2, z) is specified as above and θ = (α1, δ1, η1). The estimators of θ in Eq. (6) are obtained by the QMLE or the NLS approach. The Bernoulli quasi-log-likelihood function is given by

    \ell_i(\theta) = y_{1i} \ln m_i + (1 - y_{1i}) \ln(1 - m_i)    (7)

The QMLE of θ is obtained from the maximization problem (see more details in Appendix A):

    \max_{\theta \in \Theta} \sum_{i=1}^{n} \ell_i(\theta)


The NLS estimator of θ is obtained from the minimization problem (see more details in Appendix C):

    \min_{\theta \in \Theta} N^{-1} \sum_{i=1}^{N} \frac{[y_{1i} - m_i(\theta; y_{2i}, z_i)]^2}{2}
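As a minimal illustration, the two criteria can be coded as below (Python/NumPy; `m_fun` stands for any routine returning the N-vector of conditional means m_i(θ), such as the quadrature sketch given later in this section; all names are ours, not the chapter's).

```python
import numpy as np

def qmle_objective(theta, y1, m_fun):
    """Negative Bernoulli quasi-log-likelihood of Eq. (7); minimizing it over
    theta is equivalent to the QMLE maximization problem above."""
    m = np.clip(m_fun(theta), 1e-12, 1.0 - 1e-12)   # keep log() well defined
    return -np.sum(y1 * np.log(m) + (1.0 - y1) * np.log(1.0 - m))

def nls_objective(theta, y1, m_fun):
    """NLS criterion: mean squared deviation of y1 from m_i(theta), divided by 2."""
    resid = y1 - m_fun(theta)
    return np.mean(resid ** 2) / 2.0
```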

Since E(y1 | y2, z) does not have a closed-form solution, a numerical approximation is necessary. The numerical routine for integrating the unobserved heterogeneity out of the conditional mean in Eq. (6) is based on adaptive Gauss–Hermite quadrature. This adaptive approximation has proven to be more accurate with fewer points than ordinary Gauss–Hermite quadrature: the quadrature locations are shifted and scaled to lie under the peak of the integrand, so the adaptive quadrature performs well with a modest number of points (Skrondal & Rabe-Hesketh, 2004).

Using the adaptive Gauss–Hermite approximation, the integral in Eq. (6) can be computed as follows:

    m_i = \int_{-\infty}^{+\infty} h_i(y_{2i}, z_i, a_1)\, da_1
        \approx \sqrt{2}\,\hat{s}_i \sum_{m=1}^{M} w_m^* \exp\{(a_m^*)^2\}\, h_i(y_{2i}, z_i, \sqrt{2}\,\hat{s}_i a_m^* + \hat{\omega}_i)    (8)

where ŝ_i and ω̂_i are the adaptive parameters for observation i, the w*_m are the weights, the a*_m are the evaluation points, and M is the number of quadrature points. The approximation procedure follows Skrondal and Rabe-Hesketh (2004). The adaptive parameters are updated in the kth iteration of the optimization for m_i with

    m_{i,k} \approx \sum_{m=1}^{M} \sqrt{2}\, s_{i,k-1}\, w_m^* \exp\{(a_m^*)^2\}\, h_i(y_{2i}, z_i, t_{i,m,k-1})

    \omega_{i,k} = \sum_{m=1}^{M} t_{i,m,k-1}\, \frac{\sqrt{2}\, s_{i,k-1}\, w_m^* \exp\{(a_m^*)^2\}\, h_i(y_{2i}, z_i, t_{i,m,k-1})}{m_{i,k}}

    s_{i,k} = \left[ \sum_{m=1}^{M} (t_{i,m,k-1})^2\, \frac{\sqrt{2}\, s_{i,k-1}\, w_m^* \exp\{(a_m^*)^2\}\, h_i(y_{2i}, z_i, t_{i,m,k-1})}{m_{i,k}} - (\omega_{i,k})^2 \right]^{1/2}

where

    t_{i,m,k-1} = \sqrt{2}\, s_{i,k-1}\, a_m^* + \omega_{i,k-1}

This process is repeated until s_{i,k} and ω_{i,k} have converged for this iteration at observation i of the maximization algorithm. The adaptation is applied at every iteration until the log-likelihood changes by less than a relative difference of 1e-5 from the previous iteration; after that, the adaptive parameters are held fixed.
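The sketch below (Python with NumPy/SciPy; names are ours) evaluates m_i(θ) by Gauss–Hermite quadrature, using the closed form of f(a1 | y2, z) in Eq. (5) with first-step estimates of δ2 and δ0. For brevity it uses plain rather than adaptive quadrature, so it is only an approximation to the adaptive routine described above.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import gammaln

def cond_mean_m(theta, y2, z1, z, d2_hat, d0_hat, n_quad=30):
    """Approximate m_i(theta) = E(y1 | y2, z) of Eq. (6) by Gauss-Hermite
    quadrature over the heterogeneity density f(a1 | y2, z) of Eq. (5).
    theta is packed as (alpha1, delta1..., eta1); y2, z1, z are NumPy arrays."""
    alpha1, eta1 = theta[0], theta[-1]
    delta1 = np.asarray(theta[1:-1])
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)   # weight exp(-t^2)
    a = nodes[None, :]                                         # 1 x M grid for a1
    mu = np.exp(z @ d2_hat)[:, None]                           # exp(z*delta2), N x 1
    y2c = y2[:, None]
    # log f(a1 | y2, z) from Eq. (5), evaluated at every node
    logf = (-(mu + d0_hat) * np.exp(a) + a * (y2c + d0_hat)
            + (y2c + d0_hat) * np.log(mu + d0_hat) - gammaln(y2c + d0_hat))
    probit = norm.cdf(alpha1 * y2c + (z1 @ delta1)[:, None] + eta1 * a)
    # plain Gauss-Hermite rule: int g(a) da ~ sum_m w_m exp(a_m^2) g(a_m)
    return (weights[None, :] * np.exp(logf + a ** 2) * probit).sum(axis=1)
```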

Once the conditional mean has been evaluated for all observations, the numerical values can be passed to a maximizer to find the QMLE of θ. The standard errors in the second stage are adjusted for the first-stage estimation and obtained using the delta method (see Appendix A for the derivation). Since the QMLE and NLS estimators in this chapter fall into the class of two-step M-estimators, they are consistent and asymptotically normal (see Newey & McFadden, 1994; Wooldridge, 2002, Chapter 12).

2.1. Estimation Procedure

(i) In the first step, estimate δ2 and δ0 by maximum likelihood of y_{i2} on z_i in the Negative Binomial model. Obtain the estimated parameters δ̂2 and δ̂0.

(ii) Use the fractional probit QMLE (or NLS) of y_{i1} on (y_{i2}, z_{i1}) to estimate α1, δ1, and η1 with the approximated conditional mean. The conditional mean is approximated with the adaptive Gauss–Hermite method, using the parameter estimates from the first step. A sketch of this two-step procedure is given below.
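A minimal sketch of the two-step procedure follows (Python/SciPy). It reuses `cond_mean_m` and `qmle_objective` from the earlier sketches, and all names are ours rather than the chapter's: the first step maximizes a Negative Binomial II log-likelihood, and the second step minimizes the negative Bernoulli quasi-log-likelihood with the first-step estimates plugged in.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_nb2_loglik(params, y2, z):
    """Negative of the Negative Binomial II log-likelihood implied by
    Eqs. (2)-(4): mean exp(z*delta2), gamma parameter delta0 > 0."""
    d2, d0 = params[:-1], np.exp(params[-1])   # log parameterization keeps delta0 > 0
    mu = np.exp(z @ d2)
    ll = (gammaln(y2 + d0) - gammaln(d0) - gammaln(y2 + 1.0)
          + d0 * np.log(d0 / (d0 + mu)) + y2 * np.log(mu / (d0 + mu)))
    return -ll.sum()

def two_step_qmle(y1, y2, z1, z):
    # Step (i): Negative Binomial ML of y2 on z  ->  delta2_hat, delta0_hat
    step1 = minimize(neg_nb2_loglik, np.zeros(z.shape[1] + 1),
                     args=(y2, z), method="BFGS")
    d2_hat, d0_hat = step1.x[:-1], np.exp(step1.x[-1])
    # Step (ii): fractional probit QMLE of y1 on (y2, z1), plugging in step (i)
    m_fun = lambda th: cond_mean_m(th, y2, z1, z, d2_hat, d0_hat)
    theta0 = np.zeros(1 + z1.shape[1] + 1)     # (alpha1, delta1, eta1)
    step2 = minimize(qmle_objective, theta0, args=(y1, m_fun), method="BFGS")
    return step2.x, d2_hat, d0_hat
```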

After obtaining all the estimated parameters, θ̂ = (α̂1, δ̂1, η̂1)', the standard errors for these parameters can be derived by the delta method with the following formula:

    \widehat{\mathrm{Avar}}(\hat{\theta}) = \frac{1}{N}\, \hat{A}_1^{-1} \left( N^{-1} \sum_{i=1}^{N} \hat{r}_{i1} \hat{r}_{i1}' \right) \hat{A}_1^{-1}

For more details, see the derivation and matrix notation from Eq. (A.3) to Eq. (A.10) in Appendix A.

2.2. Average Partial Effects

Econometricians are often interested in estimating the APEs of explanatory variables in nonlinear models in order to obtain magnitudes that are comparable with other nonlinear models and with linear models. The APEs can be obtained by taking the derivatives, or the discrete differences, of a conditional mean equation with respect to the explanatory variables. The APE cannot be estimated in the presence of the unobserved effect, so it is necessary to integrate the unobserved effect out of the conditional mean and average across the sample, and then take the derivatives or changes with respect to the elements of (y2, z1).

In an FRM with all exogenous covariates, model (1) with y2 exogenous (see Papke & Wooldridge, 2008) is considered. Let w = (y2, z1); Eq. (1) is rewritten as

    E(y_{1i} \mid w_i, a_{1i}) = \Phi(w_i \beta + \eta_1 a_{1i})

where a_{1i} | w_i ~ Normal(0, σ_a²), and then

    E(y_{1i} \mid w_i) = \Phi(w_i \beta_a)

in which β_a = β / \sqrt{1 + σ_a²}. The APEs are consistently estimated by:

    N^{-1} \sum_{i=1}^{N} \Phi(w_i \hat{\beta}_a) = N^{-1} \sum_{i=1}^{N} \Phi(\hat{\alpha}_{1a} y_{2i} + z_{1i} \hat{\delta}_{1a})

Given the consistent estimator of the scaled parameters β̂_a, the APEs can be estimated by taking the derivatives or changes of this average with respect to the elements of (y2, z1). For a continuous z11, the APE is

    N^{-1} \sum_{i=1}^{N} \hat{\delta}_{11a}\, \phi(\hat{\alpha}_{1a} y_{2i} + z_{1i} \hat{\delta}_{1a})

For a count variable y2, the APE is

    N^{-1} \sum_{i=1}^{N} \left[ \Phi(\hat{\alpha}_{1a} y_{2i}^{1} + z_{1i} \hat{\delta}_{1a}) - \Phi(\hat{\alpha}_{1a} y_{2i}^{0} + z_{1i} \hat{\delta}_{1a}) \right]

For example, if we are interested in obtaining the APE when y2 changes from y2⁰ = 0 to y2¹ = 1, it is necessary to predict the difference in mean responses with y2 = 1 and y2 = 0 and average the difference across all units.

In an FRM with a count endogenous variable, model (1) is considered together with the estimation procedure provided in the previous section. The APEs are obtained by taking the derivatives or the differences of

    E_{a_1}\!\left[ \Phi(\alpha_1 y_2 + z_1 \delta_1 + \eta_1 a_1) \right]    (9)

with respect to the elements of (y2, z1). Since we integrate out a1 and obtain the conditional mean as in Eq. (6), the estimator of the conditional mean is

    N^{-1} \sum_{i=1}^{N} m(\hat{\theta}; y_{2i}, z_{1i}) = N^{-1} \sum_{i=1}^{N} \int_{-\infty}^{+\infty} \Phi(\hat{\alpha}_1 y_{2i} + z_{1i} \hat{\delta}_1 + \hat{\eta}_1 a_1)\, f(a_1 \mid y_{2i}, z_{1i})\, da_1    (10)

For a continuous z11, the APE is

    \omega = E\!\left[ \int_{-\infty}^{+\infty} \phi(g_i \theta)\, f(a_1 \mid y_{2i}, z_i)\, da_1 \right] \delta_{11}    (11a)

and it is consistently estimated by

    \hat{\omega} = \left( N^{-1} \sum_{i=1}^{N} \int_{-\infty}^{+\infty} \phi(g_i \hat{\theta})\, f(a_1 \mid y_{2i}, z_i)\, da_1 \right) \hat{\delta}_{11}    (11b)

where g_i = (y_{2i}, z_{1i}, a_1) and θ = (α1, δ1, η1)'.

For a count variable y2, its APE is

    \kappa = E_{a_1}\!\left[ \Phi(\alpha_1 y_2^{k+1} + z_1 \delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^{k} + z_1 \delta_1 + \eta_1 a_1) \right]    (12a)

and it is consistently estimated by

    \hat{\kappa} = N^{-1} \sum_{i=1}^{N} \left[ \int_{-\infty}^{+\infty} \Phi(g_i^{k+1} \hat{\theta})\, f(a_1 \mid y_{2i}, z_i)\, da_1 - \int_{-\infty}^{+\infty} \Phi(g_i^{k} \hat{\theta})\, f(a_1 \mid y_{2i}, z_i)\, da_1 \right]    (12b)

For example, in order to obtain the APE when y2 changes from y2^k = 0 to y2^{k+1} = 1, it is necessary to predict the difference in mean responses with y2^k = 0 and y2^{k+1} = 1 and take the average of the differences across all units. This APE gives a number comparable to the linear model estimate.

The standard errors for the APEs are obtained using the delta method. The detailed derivation is provided from Eq. (A.11) to Eq. (A.29) in Appendix A.
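The following sketch (Python; names are ours) computes the estimator in Eq. (12b): the probit index is evaluated at the counterfactual counts while the heterogeneity density f(a1 | y2, z) is kept at each observation's observed y2, and the differences are averaged over the sample.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import gammaln

def ape_count(theta, y2_obs, z1, z, d2_hat, d0_hat, k_from, k_to, n_quad=30):
    """APE of the count regressor y2 for a change from k_from to k_to, Eq. (12b)."""
    alpha1, eta1 = theta[0], theta[-1]
    delta1 = np.asarray(theta[1:-1])
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    a = nodes[None, :]
    mu = np.exp(z @ d2_hat)[:, None]
    y2c = y2_obs[:, None]
    # log f(a1 | y2_obs, z) from Eq. (5), kept at the observed count
    logf = (-(mu + d0_hat) * np.exp(a) + a * (y2c + d0_hat)
            + (y2c + d0_hat) * np.log(mu + d0_hat) - gammaln(y2c + d0_hat))
    w = weights[None, :] * np.exp(logf + a ** 2)     # quadrature weight x density
    base = (z1 @ delta1)[:, None] + eta1 * a         # index without the count term
    m_to = (w * norm.cdf(base + alpha1 * k_to)).sum(axis=1)
    m_from = (w * norm.cdf(base + alpha1 * k_from)).sum(axis=1)
    return np.mean(m_to - m_from)                    # average over all units
```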

3. MONTE CARLO SIMULATIONS

This section examines the finite-sample properties of the QML and NLS estimators of the population-averaged partial effect in an FRM with a count endogenous variable. Monte Carlo experiments are conducted to compare these estimators with other estimators under different scenarios. The estimators are evaluated under correct model specification with different degrees of endogeneity, with strong and weak instrumental variables, and with different sample sizes. Their behavior is also examined with respect to the choice of a particular distributional assumption.

3.1. Estimators

Two sets of estimators, under two corresponding assumptions, are considered: (1) y2 is assumed to be exogenous and (2) y2 is allowed to be endogenous. Under the former assumption, three estimators are used: the ordinary least squares (OLS) estimator in a linear model, the maximum likelihood estimator (MLE) in a Tobit model, and the quasi-maximum likelihood estimator (QMLE) in a fractional probit model. Under the latter assumption, five estimators are examined: the two-stage least squares (2SLS) estimator, the MLE in a Tobit model using the Blundell–Smith estimation method (hereafter Tobit BS), the QMLE in a fractional probit model using the Papke–Wooldridge estimation method (hereafter QMLE-PW), and the QMLE and NLS estimators in a fractional probit model using the estimation method proposed in the previous section.

3.2. Data-Generating Process

The count endogenous variable is generated from a conditional Poisson distribution:

    f(y_{2i} \mid x_{1i}, x_{2i}, z_i, a_{1i}) = \frac{\exp(-\lambda_i)\, \lambda_i^{y_{2i}}}{y_{2i}!}

with conditional mean:

    \lambda_i = E(y_{2i} \mid x_{1i}, x_{2i}, z_i, a_{1i}) = \exp(\delta_{21} x_{1i} + \delta_{22} x_{2i} + \delta_{23} z_i + \rho_1 a_{1i})

using independent draws from normal distributions, z ~ N(0, 0.3²), x1 ~ N(0, 0.2²), x2 ~ N(0, 0.2²), and exp(a1) ~ Gamma(1, 1/δ0), where 1 and 1/δ0 are the mean and variance of the gamma distribution.

The parameters of the conditional mean model are set to (δ21, δ22, δ23, ρ1, δ0) = (0.01, 0.01, 1.5, 1, 3).


The dependent variable is generated by first drawing a binomial random variable x with n trials and success probability p and then setting y1 = x/n. In this simulation, n = 100 and p is given by the probit conditional mean:

    p = E(y_{1i} \mid y_{2i}, x_{1i}, x_{2i}, a_{1i}) = \Phi(\delta_{11} x_{1i} + \delta_{12} x_{2i} + \alpha_1 y_{2i} + \eta_1 a_{1i})

The parameters of this conditional mean are set to (δ11, δ12, α1, η1) = (0.1, 0.1, -1.0, 0.5).
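For concreteness, one replication of this data-generating process can be drawn as follows (Python/NumPy sketch; function names are ours).

```python
import numpy as np
from scipy.stats import norm

def simulate_dgp(n, rng, eta1=0.5, d0=3.0):
    """Draw one Monte Carlo sample from the design above: Poisson count y2 with
    multiplicative gamma heterogeneity, and fractional y1 = Binomial(100, p)/100
    with a probit mean."""
    z  = rng.normal(0.0, 0.3, n)
    x1 = rng.normal(0.0, 0.2, n)
    x2 = rng.normal(0.0, 0.2, n)
    # exp(a1) ~ Gamma with mean 1 and variance 1/d0 (shape d0, scale 1/d0)
    a1 = np.log(rng.gamma(shape=d0, scale=1.0 / d0, size=n))
    lam = np.exp(0.01 * x1 + 0.01 * x2 + 1.5 * z + 1.0 * a1)   # first-stage mean
    y2 = rng.poisson(lam)
    p  = norm.cdf(0.1 * x1 + 0.1 * x2 - 1.0 * y2 + eta1 * a1)  # probit mean of y1
    y1 = rng.binomial(100, p) / 100.0                          # fractional response
    return y1, y2, x1, x2, z, a1

rng = np.random.default_rng(0)
y1, y2, x1, x2, z, a1 = simulate_dgp(1000, rng)
```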

Based on the population values of the parameters set above, the true value of the APE with respect to each variable is obtained. Since the exogenous variables are continuous, and in order to compare nonlinear models with linear models, y2 is first treated as a continuous variable. The true value of the APE with respect to y2 is then obtained by computing the derivative of the conditional mean with respect to y2 and taking the average:

    \mathrm{APE} = -1.0 \times \frac{1}{N} \sum_{i=1}^{N} \phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0\, y_{2i} + 0.5\, a_{1i})

When y2 is instead treated as a count variable, the true values of the APEs with respect to y2 are computed by taking differences in the conditional mean, evaluated at values of interest. In this chapter, I take the first three cases, in which y2 increases from 0 to 1, 1 to 2, and 2 to 3, respectively; the true values of the APEs are

    \mathrm{APE}_{01} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 1 + 0.5\, a_{1i}) - \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 0 + 0.5\, a_{1i}) \right]

    \mathrm{APE}_{12} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 2 + 0.5\, a_{1i}) - \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 1 + 0.5\, a_{1i}) \right]

    \mathrm{APE}_{23} = \frac{1}{N} \sum_{i=1}^{N} \left[ \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 3 + 0.5\, a_{1i}) - \Phi(0.1\, x_{1i} + 0.1\, x_{2i} - 1.0 \times 2 + 0.5\, a_{1i}) \right]

The true values of the APEs with respect to y2 and the other exogenous variables are reported in Tables 1–4. The experiment is conducted with 500 replications, and the sample size is typically set at 1,000 observations.
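Under this design, the true discrete-change APEs can be computed directly on a simulated sample, for example by reusing the draw from the sketch above:

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo analogues of APE_01, APE_12, APE_23, using x1, x2, a1 drawn above.
base = 0.1 * x1 + 0.1 * x2 + 0.5 * a1
true_ape = {f"{k} to {k + 1}": float(np.mean(norm.cdf(base - 1.0 * (k + 1))
                                             - norm.cdf(base - 1.0 * k)))
            for k in (0, 1, 2)}
print(true_ape)
```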


3.3. Experiment Results

I report the sample means, sample standard deviations (SD), and root mean squared errors (RMSE) of these 500 estimates. In order to compare estimators across linear and nonlinear models, I focus on comparing the APE estimates from the different models.

3.3.1. Simulation Result with a Strong Instrumental Variable

Tables 1A–1C report the simulation outcomes of the APE estimates for the sample size N = 1,000 with a strong instrumental variable (IV) and different degrees of endogeneity, η1 = 0.1, η1 = 0.5, and η1 = 0.9. The IV is strong in the sense that the first-stage F-statistic is large (the coefficient on z is δ23 = 1.5 in the first stage, and the F-statistic has a mean of at least 100 across the 500 replications for all three designs of η1). The three values of η1 correspond to low, medium, and high degrees of endogeneity. Columns 2–10 contain the true values of the APEs and the means, SD, and RMSE of the APE estimates from the different models and estimation methods. Columns 3–5 contain the means, SD, and RMSE of the APE estimates for all variables over 500 replications with y2 assumed to be exogenous; columns 6–10 contain the corresponding results with y2 allowed to be endogenous.

First, the simulation outcomes for the sample size N = 1,000 and η1 = 0.5 (see Table 1A.1) are discussed. The APE estimates using the proposed QMLE and NLS methods in columns 9–10 are closest to the true values of the APEs when y2 is discrete (-.3200, -.1273, and -.0212). The APE estimate is also very close to the true value of the APE (-.2347) when y2 is treated as a continuous variable. It is typical to examine these first three APEs in order to see the pattern of the means, SD, and RMSE of the APE estimates. Table 1A.1 shows that the OLS estimate is about half of the true value of the APE. The first source of the large bias in the OLS estimate is the endogeneity of the count variable y2 (with η1 = 0.5); the second source is the nonlinearity in both the structural and first-stage equations, Eqs. (1) and (2). The 2SLS approach also produces a biased estimator of the APE because of this second reason, even though the endogeneity is taken into account. The MLE estimators in the Tobit model have smaller bias than the estimators in the linear model but larger bias than the estimators in the fractional probit model, because they do not account for the functional form of the fractional response variable and the count explanatory variable.


Table 1A.1. Simulation Result of the Average Partial Effects Estimates (N = 1,000, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2347   -.1283    -.1591    -.2079       -.1583    -.1754    -.2295    -.2368    -.2371
                          (.0046)   (.0042)   (.0051)      (.0110)   (.0064)   (.0077)   (.0051)   (.0050)
                          {.0034}   {.0024}   {.0008}      {.0024}   {.0019}   {.0002}   {.00008}  {.00008}
y2, 0 to 1       -.3200             -.2014    -.2763                 -.2262    -.3109    -.3201    -.3204
                                    (.0046)   (.0051)                (.0082)   (.0099)   (.0041)   (.0030)
                                    {.0038}   {.0014}                {.0030}   {.0003}   {.00005}  {.00001}
y2, 1 to 2       -.1273             -.1610    -.1193                 -.1716    -.1258    -.1280    -.1278
                                    (.0027)   (.0017)                (.0031)   (.0023)   (.0020)   (.0016)
                                    {.0011}   {.0002}                {.0014}   {.00005}  {.00004}  {.00001}
y2, 2 to 3       -.0212             -.0388    -.0259                 -.0317    -.0224    -.0214    -.0212
                                    (.0030)   (.0012)                (.0031)   (.0014)   (.0014)   (.0010)
                                    {.0006}   {.0001}                {.0003}   {.00004}  {.00001}  {.000001}
x1                .0235    .0224     .0210     .0223        .0237     .0212     .0231     .0240     .0238
                          (.0181)   (.0125)   (.0130)      (.0189)   (.0131)   (.0142)   (.0159)   (.0140)
x2                .0235    .0218     .0214     .0195        .0230     .0218     .0241     .0243     .0244
                          (.0181)   (.0128)   (.0129)      (.0192)   (.0131)   (.0134)   (.0153)   (.0136)


Table 1A.2. Simulation Result of the Coefficient Estimates (N = 1,000, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
Coefficient      value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2               -1.0     -.1283    -.2024    -.8543       -.1583    -.2275    -.9387    -1.045    -1.044
                          (.0044)   (.0046)   (.0146)      (.0089)   (.0084)   (.0255)   (.0483)   (.0424)
x1                .1       .0224     .0267     .0917        .0237     .0275     .0945     .1061     .1052
                          (.0181)   (.0160)   (.0534)      (.0190)   (.0171)   (.0578)   (.0702)   (.0619)
x2                .1       .0218     .0272     .0956        .0231     .0282     .0987     .1071     .1073
                          (.0181)   (.0163)   (.0534)      (.0192)   (.0170)   (.0548)   (.0681)   (.0600)

Note: Figures in brackets (){} are standard deviations and RMSEs, respectively.


Table 1B. Simulation Result of the Average Partial Effects Estimates (N = 1,000, η1 = 0.1, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2461   -.1507    -.1854    -.2402       -.1600    -.1887    -.2442    -.2490    -.2491
                          (.0042)   (.0037)   (.0046)      (.0102)   (.0053)   (.0056)   (.0044)   (.0043)
                          {.0031}   {.0019}   {.0002}      {.0027}   {.0018}   {.0001}   {.0001}   {.0001}
y2, 0 to 1       -.3383             -.2390    -.3289                 -.2445    -.3355    -.3384    -.3385
                                    (.0042)   (.0031)                (.0066)   (.0051)   (.0019)   (.0020)
                                    {.0032}   {.0003}                {.0030}   {.00009}  {.000005} {.000007}
y2, 1 to 2       -.1332             -.2001    -.1319                 -.2022    -.1331    -.1332    -.1332
                                    (.0018)   (.0011)                (.0025)   (.0013)   (.0008)   (.0010)
                                    {.0021}   {.00004}               {.0022}   {.000004} {.000001} {.000001}
y2, 2 to 3       -.0208             -.0193    -.0219                 -.0177    -.0212    -.0208    -.0208
                                    (.0029)   (.0007)                (.0032)   (.0008)   (.0007)   (.0007)
                                    {.00005}  {.00003}               {.0001}   {.00001}  {.000001} {.000001}
x1                .0246    .0267     .0210     .0250        .0265     .0234     .0250     .0255     .0253
                          (.0168)   (.0089)   (.0063)      (.0170)   (.0090)   (.0065)   (.0072)   (.0066)
x2                .0246    .0241     .0214     .0246        .0242     .0222     .0246     .0252     .0249
                          (.0178)   (.0100)   (.0070)      (.0182)   (.0100)   (.0070)   (.0077)   (.0072)


Table 1C. Simulation Result of the Average Partial Effects Estimates (N = 1,000, η1 = 0.9, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2178   -.1104    -.1368    -.1777       -.1548    -.1637    -.2144    -.2208    -.2205
                          (.0042)   (.0039)   (.0054)      (.0148)   (.0096)   (.0124)   (.0052)   (.0045)
                          {.0034}   {.0026}   {.0013}      {.0020}   {.0017}   {.0001}   {.0001}   {.0001}
y2, 0 to 1       -.2973             -.1706    -.2307                 -.2100    -.2871    -.3000    -.2994
                                    (.0049)   (.0069)                (.0136)   (.0196)   (.0054)   (.0040)
                                    {.0040}   {.0021}                {.0028}   {.0003}   {.00008}  {.00006}
y2, 1 to 2       -.1281             -.1303    -.1110                 -.1491    -.1232    -.1288    -.1291
                                    (.0031)   (.0024)                (.0060)   (.0047)   (.0025)   (.0020)
                                    {.00007}  {.0005}                {.0007}   {.0002}   {.00002}  {.00003}
y2, 2 to 3       -.0253             -.0532    -.0319                 -.0452    -.0258    -.0247    -.0249
                                    (.0022)   (.0019)                (.0030)   (.0023)   (.0021)   (.0015)
                                    {.0009}   {.0002}                {.0006}   {.00002}  {.00002}  {.00001}
x1                .0218    .0327     .0276     .0291        .0263     .0273     .0305     .0318     .0313
                          (.0222)   (.0176)   (.0182)      (.0169)   (.0184)   (.0201)   (.0237)   (.0208)
x2                .0218    .0215     .0212     .0236        .0244     .0199     .0233     .0200     .0203
                          (.0201)   (.0170)   (.0179)      (.0184)   (.0187)   (.0206)   (.0216)   (.0187)

Note: Figures in brackets (){} are standard deviations and RMSEs, respectively.


When the endogeneity is corrected, the MLE estimator in the Tobit model using the Blundell–Smith method has smaller bias than its counterpart in which y2 is assumed to be exogenous. Among the fractional probit models, the QMLE estimator that assumes y2 exogenous (column 5) has the largest bias because it ignores the endogeneity of y2; it nevertheless has smaller bias than the estimators from the linear and Tobit models. The QMLE-PW estimator (column 8) provides useful results, since its estimates are also very close to the true values of the APEs, but it produces a larger bias than the QMLE and NLS estimators proposed in this chapter. Like the Tobit Blundell–Smith estimator, the QMLE-PW estimator adopts the control function approach, which relies on linearity in the first-stage equation; as a result, it ignores the discreteness of y2, which leads to the larger bias relative to the QMLE and NLS estimators proposed in this chapter.

The first set of estimators, with y2 assumed to be exogenous (columns 3–5), has relatively smaller SDs than the second set, with y2 allowed to be endogenous (columns 6–10), because the methods that correct for endogeneity using IVs have more sampling variation than their counterparts without the endogeneity correction. This results from the less-than-unit correlation between the instrument and the endogenous variable. However, the SDs of the QMLE and NLS estimators (columns 9–10) are no worse than that of the QMLE estimator that assumes y2 exogenous (column 5).

Among all estimators, the QMLE and NLS estimators proposed in this chapter have the smallest RMSE, not only when y2 is treated as a discrete variable but also when y2 is treated as a continuous variable under the correct model specification. As discussed previously, the QMLE estimator using the Papke–Wooldridge method has the third smallest RMSE, since it also uses the fractional probit model. Comparing columns 3 and 6, 4 and 7, and 5 with columns 8–10, the RMSEs of the methods correcting for endogeneity are smaller than those of their counterparts.

Table 1A.2 reports the simulation results for the coefficient estimates. The coefficient estimates are useful in the sense that they give the directions of the effects; for studies that only need the signs of the effects, the coefficient tables suffice, whereas for studies that compare the magnitudes of the effects we essentially want to estimate the APEs. Table 1A.2 shows that the means of the point estimates are close to their true values for all parameters using the QML (or NLS) approach (α1 = -1.0, δ11 = .1, and δ12 = .1). The bias is large for both the 2SLS and OLS methods. These results are as expected: the 2SLS method uses the predicted value from the first-stage OLS, so it ignores the distributional information of the right-hand-side (RHS) count variable, regardless of the functional form of the fractional response variable, and the OLS estimates do not account for the endogeneity. Both the 2SLS and OLS estimates are also biased because they do not take into account the presence of unobserved heterogeneity. The bias of the Tobit Blundell–Smith model is similar to that of the 2SLS method, because it does not take into account the distributional information of the RHS count variable and it employs a different functional form, given that the fractional response variable has a small number of zeros. The biases of the QMLE estimator treating y2 as exogenous and of the QMLE-PW estimator are larger than those of the QMLE and NLS estimators in this chapter. In short, the simulation results indicate that the means of the point estimates are close to their true values for all parameters using the QMLE and NLS approaches described in the previous section.

Simulations with different degrees of endogeneity, through the coefficient values η1 = 0.1 and η1 = 0.9, are also conducted. Not surprisingly, with less endogeneity (η1 = 0.1), the set of estimators treating y2 as exogenous produces APE estimates closer to the true values, and the set of estimators treating y2 as endogenous produces APE estimates further from the true values. With more endogeneity (η1 = 0.9), the set of estimators treating y2 as endogenous produces APE estimates closer to the true values, and the set treating y2 as exogenous gives APE estimates further from the true values. For example, as η1 increases, the APE estimates from the 2SLS method become less biased, whereas the APE estimates from the QMLE estimator treating y2 as exogenous become more biased, and the difference between these two sets of APE estimates shrinks once the endogeneity is corrected.

All the previous observations on bias, SD, and RMSE continue to hold with η1 = 0.1 and η1 = 0.9, confirming that the QMLE and NLS estimators perform very well under different degrees of endogeneity.

3.3.2. Simulation Result with a Weak Instrumental Variable

Table 2 reports the simulation outcomes of the APE estimates for the sample size N = 1,000 with a weak IV and η1 = 0.5. Using the rule of thumb on weak instruments suggested in Staiger and Stock (1997), the coefficient on z is chosen as δ23 = 0.3, which corresponds to a very small first-stage F-statistic (the mean of the F-statistic is less than 10 across the 500 replications).


Table 2. Simulation Result of the Average Partial Effects Estimates (N = 1,000, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2402   -.1352    -.1618    -.2094       -.1661    -.1823    -.2327    -.2405    -.2407
                          (.0047)   (.0037)   (.0050)      (.0621)   (.0363)   (.0366)   (.0046)   (.0043)
                          {.0033}   {.0025}   {.0010}      {.0023}   {.0023}   {.0002}   {.00001}  {.00001}
y2, 0 to 1       -.3202             -.1992    -.2724                 -.2301    -.3157    -.3199    -.3202
                                    (.0044)   (.0055)                (.0522)   (.0099)   (.0037)   (.0029)
                                    {.0038}   {.0015}                {.0018}   {.0001}   {.00001}  {.000001}
y2, 1 to 2       -.1275             -.1605    -.1195                 -.1676    -.1248    -.1280    -.1279
                                    (.0026)   (.0016)                (.0189)   (.0023)   (.0016)   (.0015)
                                    {.0010}   {.0003}                {.0029}   {.00009}  {.00002}  {.00001}
y2, 2 to 3       -.0213             -.0386    -.0268                 -.0332    -.0227    -.0215    -.0214
                                    (.0027)   (.0011)                (.0108)   (.0014)   (.0012)   (.0010)
                                    {.0005}   {.0002}                {.0013}   {.00005}  {.000001} {.000003}
x1                .0240    .0224     .0114     .0117        .0237     .0235     .0253     .0261     .0252
                          (.0181)   (.0118)   (.0130)      (.0189)   (.0225)   (.0142)   (.0146)   (.0127)
x2                .0240    .0218     .0104     .0109        .0230     .0227     .0227     .0244     .0250
                          (.0181)   (.0123)   (.0129)      (.0192)   (.0244)   (.0134)   (.0135)   (.0119)

Note: Figures in brackets (){} are standard deviations and RMSEs, respectively.


Columns 2–10 contain the true values of the APEs and the means, SD, and RMSE of the APE estimates from the different models and estimation methods. Columns 3–5 contain the means, SD, and RMSE of the APE estimates for all variables over 500 replications with y2 assumed to be exogenous; columns 6–10 contain the corresponding results with y2 allowed to be endogenous. The simulation results show that, even though the instrument is weak, the set of estimators treating y2 as endogenous still has smaller bias than the set of estimators treating y2 as exogenous. The QMLE and NLS APE estimates remain very close to the true values of the APEs, both when y2 is treated as a continuous variable and when it is treated as a count variable, and their SD and RMSE are still the lowest among the estimators that treat y2 as endogenous.

3.3.3. Simulation Result with Different Sample Sizes

Four sample sizes, representative of those commonly encountered in applied research, are chosen, ranging from small to large: N = 100, 500, 1,000, and 2,000. Tables 3A–3D report the simulation outcomes of the APE estimates with a strong IV and η1 = 0.5 for these sample sizes; Table 3C is identical to Table 1A.1. Columns 2–10 contain the true values of the APEs and the means, SD, and RMSE of the APE estimates from the different models and estimation methods; columns 3–5 refer to the case with y2 assumed exogenous, and columns 6–10 to the case with y2 allowed to be endogenous. In general, the simulation results indicate that the SD and RMSE of all estimators shrink as the sample size grows, and the discussion in Section 3.3.1 still applies. The QMLE and NLS estimators perform very well at all sample sizes, with the smallest SD and RMSE, and they are also the least biased estimators among all those considered.

3.3.4. Simulation Result with a Particular Distributional Assumption

The original assumption is that exp(a1) ~ Gamma(1, 1/δ0). In this part, misspecification is considered: the distribution of exp(a1) is no longer gamma; instead, a1 ~ N(0, 0.1²) is assumed, and the finite-sample behavior of all the estimators under this incorrect specification is examined. Table 4 shows the simulation results for the sample size N = 1,000 with a strong IV and η1 = 0.5 under misspecification. None of the conclusions obtained under the correct specification in Section 3.3.1 are affected.


Table 3A. Simulation Result of the Average Partial Effects Estimates (N = 100, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2350   -.1419    -.1695    -.2180       -.1688    -.1837    -.2340    -.2371    -.2366
                          (.0221)   (.0193)   (.0216)      (.0667)   (.0356)   (.0253)   (.0166)   (.0162)
                          {.094}    {.0066}   {.0017}      {.0066}   {.0051}   {.0001}   {.0002}   {.00017}
y2, 0 to 1       -.3281             -.2173    -.3000                 -.2386    -.3277    -.3288    -.3281
                                    (.0253)   (.0339)                (.0492)   (.0457)   (.0137)   (.0122)
                                    {.0111}   {.0028}                {.0090}   {.00004}  {.00005}  {.00004}
y2, 1 to 2       -.1308             -.1767    -.1252                 -.1840    -.1305    -.1300    -.1306
                                    (.0222)   (.0096)                (.0257)   (.0139)   (.0073)   (.0063)
                                    {.0046}   {.0006}                {.0053}   {.00004}  {.00009}  {.00004}
y2, 2 to 3       -.0214             -.0316    -.0243                 -.0286    -.0217    -.0211    -.0213
                                    (.0129)   (.0034)                (.0134)   (.0041)   (.0036)   (.0027)
                                    {.0010}   {.0003}                {.0007}   {.00003}  {.00003}  {.00001}


Table 3B. Simulation Result of the Average Partial Effects Estimates (N = 500, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2358   -.1415    -.1710    -.2201       -.1617    -.1815    -.2334    -.2379    -.2376
                          (.0177)   (.0157)   (.0163)      (.0175)   (.0114)   (.0119)   (.0051)   (.0086)
                          {.0046}   {.0034}   {.0007}      {.0041}   {.0029}   {.0001}   {.0001}   {.0001}
y2, 0 to 1       -.3285             -.2190    -.3026                 -.2351    -.3241    -.3293    -.3290
                                    (.0219)   (.0311)                (.0153)   (.0192)   (.0109)   (.0106)
                                    {.0059}   {.0012}                {.0052}   {.0002}   {.00001}  {.00002}
y2, 1 to 2       -.1309             -.1782    -.1259                 -.1847    -.1300    -.1310    -.1311
                                    (.0205)   (.0082)                (.0158)   (.0058)   (.0044)   (.0043)
                                    {.0028}   {.0002}                {.0031}   {.00004}  {.00001}  {.000004}
y2, 2 to 3       -.0214             -.0309    -.0240                 -.0267    -.0219    -.0212    -.0213
                                    (.0109)   (.0025)                (.0084)   (.0018)   (.0017)   (.0014)
                                    {.0004}   {.0001}                {.0002}   {.00002}  {.00001}  {.000004}

Note: Figures in brackets (){} are standard deviations and RMSEs, respectively.


Table 3C. Simulation Result of the Average Partial Effects Estimates (N = 1,000, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2347   -.1283    -.1591    -.2079       -.1583    -.1754    -.2295    -.2368    -.2371
                          (.0046)   (.0042)   (.0051)      (.0110)   (.0064)   (.0077)   (.0051)   (.0050)
                          {.0034}   {.0024}   {.0008}      {.0024}   {.0019}   {.0002}   {.00008}  {.00008}
y2, 0 to 1       -.3200             -.2014    -.2763                 -.2262    -.3109    -.3201    -.3204
                                    (.0046)   (.0051)                (.0082)   (.0099)   (.0041)   (.0030)
                                    {.0038}   {.0014}                {.0030}   {.0003}   {.00016}  {.00001}
y2, 1 to 2       -.1273             -.1610    -.1193                 -.1716    -.1258    -.1280    -.1278
                                    (.0027)   (.0017)                (.0031)   (.0023)   (.0020)   (.0016)
                                    {.0011}   {.0002}                {.0014}   {.00005}  {.00004}  {.00001}
y2, 2 to 3       -.0212             -.0388    -.0259                 -.0317    -.0224    -.0214    -.0212
                                    (.0030)   (.0012)                (.0031)   (.0014)   (.0014)   (.0010)
                                    {.0006}   {.0001}                {.0003}   {.00004}  {.00001}  {.000001}


Table 3D. Simulation Result of the Average Partial Effects Estimates (N = 2,000, η1 = 0.5, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2347   -.1286    -.1591    -.2080       -.1591    -.1755    -.2293    -.2369    -.2371
                          (.0028)   (.0028)   (.0031)      (.0082)   (.0044)   (.0050)   (.0031)   (.0030)
                          {.0024}   {.0017}   {.0006}      {.0017}   {.0013}   {.0001}   {.00005}  {.00006}
y2, 0 to 1       -.3201             -.2014    -.2766                 -.2263    -.3106    -.3201    -.3204
                                    (.0034)   (.0036)                (.0059)   (.0074)   (.0029)   (.0021)
                                    {.0027}   {.0010}                {.0021}   {.0002}   {.000001} {.000007}
y2, 1 to 2       -.1275             -.1609    -.1194                 -.1717    -.1258    -.1281    -.1278
                                    (.0020)   (.0012)                (.0024)   (.0017)   (.0015)   (.0011)
                                    {.0008}   {.0002}                {.0010}   {.00004}  {.00001}  {.000009}
y2, 2 to 3       -.0213             -.0390    -.0259                 -.0317    -.0224    -.0214    -.0212
                                    (.0020)   (.0008)                (.0021)   (.0010)   (.0010)   (.0007)
                                    {.0004}   {.0001}                {.0002}   {.00003}  {.000002} {.000001}

Note: Figures in brackets (){} are standard deviations and RMSEs, respectively.


Table 4. Simulation Result of the Average Partial Effects Estimates (N = 1,000, η1 = 0.5, a1 Is Normally Distributed, 500 Replications).

                 True     ---- y2 assumed exogenous ----   ----------- y2 assumed endogenous -----------
APE              value    OLS       MLE       QMLE         2SLS      MLE       QMLE-PW   NLS       QMLE
                          (Linear)  (Tobit)   (F.Probit)   (Linear)  (TobitBS) (F.Probit)(F.Probit)(F.Probit)

y2, continuous   -.2379   -.1599    -.1876    -.2369       -.1625    -.1885    -.2375    -.2390    -.2390
                          (.0053)   (.0041)   (.0050)      (.0088)   (.0051)   (.0056)   (.0049)   (.0048)
                          {.0025}   {.0015}   {.00003}     {.0024}   {.0016}   {.00001}  {.00003}  {.00003}
y2, 0 to 1       -.3409             -.2431    -.3393                 -.2445    -.3403    -.3401    -.3401
                                    (.0040)   (.0028)                (.0063)   (.0050)   (.0023)   (.0018)
                                    {.0031}   {.00005}               {.0031}   {.00002}  {.00003}  {.00003}
y2, 1 to 2       -.1361             -.2032    -.1358                 -.2037    -.1360    -.1360    -.1360
                                    (.0019)   (.0010)                (.0026)   (.0012)   (.0011)   (.0010)
                                    {.0021}   {.000007}              {.0021}   {.000002} {.000002} {.000002}
y2, 2 to 3       -.0215             -.0195    -.0217                 -.0192    -.0216    -.0216    -.0216
                                    (.0027)   (.0006)                (.0032)   (.0007)   (.0008)   (.0006)
                                    {.00006}  {.000005}              {.00007}  {.000003} {.000003} {.000003}
x1                .0238    .0265     .0223     .0240        .0237     .0224     .0240     .0242     .0240
                          (.0165)   (.0084)   (.0059)      (.0189)   (.0084)   (.0059)   (.0064)   (.0061)
x2                .0238    .0234     .0217     .0240        .0230     .0218     .0240     .0239     .0239
                          (.0179)   (.0103)   (.0064)      (.0192)   (.0104)   (.0064)   (.0064)   (.0061)

Note: Figures in brackets (){} are standard deviations and RMSEs, respectively.


The APE estimates from the fractional probit model remain very close to the true values of the APEs (Tables 5–7).

3.4. Conclusion from the Monte Carlo Simulations

This section examined the finite-sample behavior of the estimators proposed for the FRM with an endogenous count variable. The results of the Monte Carlo experiments show that the QMLE and NLS estimators have the smallest SDs and RMSEs and are the least biased when endogeneity is present.

Table 6. Descriptive Statistics.

Variable   Description                                                   Mean     SD       Min   Max
frhour     Fraction of hours that a woman spends working per week        .126     .116     0     .589
kidno      Number of kids                                                .752     .977     0     10
age        Mother's age in years                                         29.742   3.613    21    35
agefstm    Mother's age in years when first child was born               20.118   2.889    15    32
hispan     = 1 if race is hispanic; = 0 if race is black                 .593     .491     0     1
nonmomi    Non-mom income = family income - mom's labor income           31.806   20.375   0     157.4
edu        Education = number of schooling years                         11.005   3.305    0     20
samesex    = 1 if the first two kids have the same sex; = 0 otherwise    .503     .500     0     1
multi2nd   = 1 if the 2nd birth is a twin birth; = 0 otherwise           .009     .093     0     1

Table 5. Frequencies of the Number of Children.

Number of Kids   Frequency   Percent   Cumulative Relative Frequency
0                16,200      50.90      50.90
1                10,000      31.42      82.33
2                 3,733      11.73      94.06
3                 1,373       4.31      98.37
4                   323       1.01      99.39
5                   134        .42      99.81
6                    47        .15      99.96
7                     6        .02      99.97
8                     4        .01      99.99
9                     2        .01      99.99
10                    2        .01     100.00
Total            31,824     100.00


The QMLE and NLS methods also produce the least biased estimates of both the parameters and the APEs compared with the other competing methods.

4. APPLICATION AND ESTIMATION RESULTS

The fraction of hours that a woman spends working per week, which is affected by the number of children, can be used as an empirical application of the FRM with a count endogenous explanatory variable. The data used in this chapter were used in Angrist and Evans (1998) to illustrate a linear model with a dummy endogenous variable, namely having more than two kids. They estimated the effect of additional children on female labor supply, treating the number of children as endogenous and using samesex and twins at the first two births as instruments. They found that married women who have a third child reduce their labor supply, and their 2SLS estimates are roughly half the size of the corresponding OLS estimates.

In this chapter, the fractional response variable is the fraction of hours that a woman spends working per week. This variable is generated from the number of working hours used in Angrist and Evans (1998), divided by the maximum number of hours per week (168).

Table 7. First-Stage Estimates Using Instrumental Variables.

Dependent Variable: kidno   Linear Model (OLS)   Negative Binomial II Model (MLE)
edu                         -.065 (.002)         -.078 (.002)
age                          .096 (.002)          .119 (.002)
agefstm                     -.114 (.002)         -.156 (.003)
hispan                       .036 (.010)          .045 (.015)
nonmomi                     -.002 (.0002)        -.003 (.0004)
samesex                      .075 (.010)          .098 (.013)
multi2nd                     .786 (.052)          .728 (.045)
constant                     .911 (.042)          .013 (.067)

Note: Figures in parentheses are robust standard errors.


There is a substantial number of women who do not spend any hours working; therefore, a Tobit model might also be of interest.

Table 8 shows the estimation results of OLS in a linear model, the MLE in a Tobit model, and the QMLE in a fractional probit model when y2 is assumed exogenous. Table 9 shows the estimation results of 2SLS in a linear model, the MLE in a Tobit BS model, and the QMLE-PW, QMLE, and NLS estimators in a fractional probit model when y2 is assumed endogenous.

Since I also analyze the model using the Tobit BS specification, its model specification, the derivation of its conditional mean and APEs, and its estimation approach are included in Appendix B. The NLS method, which uses the same conditional mean as the QMLE method, is presented in Appendix C.

The count variable in this application is the number of children, rather than the indicator for having more than two kids that was used in Angrist and Evans (1998).

Table 8. Estimates Assuming the Number of Kids is Conditionally Exogenous.

                     Linear (OLS)   Tobit (MLE)               Fractional Probit (QMLE)
                     Coefficient    Coefficient   APE         Coefficient   APE

kidno (continuous)   -.019          -.034         -.0225      -.099         -.0202
                     (.0007)        (.0013)       (.0008)     (.004)        (.0008)
kidno, 0 to 1                                     -.0231                    -.0207
                                                  (.0008)                   (.0008)
kidno, 1 to 2                                     -.0207                    -.0185
                                                  (.0007)                   (.0007)
kidno, 2 to 3                                     -.0183                    -.0163
                                                  (.0005)                   (.0005)
edu                   .004           .008          .005        .022          .005
                     (.0002)        (.0004)       (.0002)     (.001)        (.0002)
age                   .005           .008          .006        .024          .005
                     (.0002)        (.0003)       (.0002)     (.001)        (.0002)
agefstm              -.006          -.010         -.007       -.030         -.006
                     (.0003)        (.0004)       (.0003)     (.001)        (.0003)
hispan               -.032          -.052         -.034       -.150         -.031
                     (.001)         (.0022)       (.0014)     (.007)        (.0013)
nonmomi              -.0003         -.0006        -.0004      -.002         -.0004
                     (.00003)       (.00006)      (.00004)    (.0002)       (.00003)

Note: Figures in parentheses under the Coefficient columns are robust standard errors; figures in parentheses under the APE columns are bootstrapped standard errors.


Table 9. Estimates Assuming Number of Kids is Endogenous.

Model                 Linear    Tobit (BS)          Fractional Probit          Fractional Probit    Fractional Probit
Estimation method     2SLS      MLE                 QMLE-PW (kidno assumed     QMLE                 NLS
                                                    continuous)
                      Coeff.    Coeff.    APE       Coeff.    APE              Coeff.    APE        Coeff.    APE
kidno (continuous)    -.016     -.027     -.018     -.078     -.016            -.081     -.017      -.081     -.017
                      (.007)    (.013)    (.008)    (.037)    (.008)           (.007)    (.001)     (.007)    (.001)
  0–1                                     -.018               -.016                      -.017                -.017
                                          (.008)              (.008)                     (.0009)              (.0009)
  1–2                                     -.017               -.015                      -.015                -.015
                                          (.007)              (.007)                     (.0009)              (.0009)
  2–3                                     -.015               -.014                      -.014                -.014
                                          (.006)              (.005)                     (.001)               (.001)
edu                    .004      .009      .006      .024      .005             .024      .005       .024      .005
                      (.0005)   (.0009)   (.0006)   (.002)    (.0005)          (.001)    (.0005)    (.001)    (.0005)
age                    .005      .008      .005      .022      .004             .021      .004       .021      .004
                      (.0007)   (.001)    (.0008)   (.004)    (.0008)          (.001)    (.0008)    (.001)    (.0008)
agefstm               -.006     -.010     -.006     -.028     -.006            -.027     -.005      -.027     -.005
                      (.0008)   (.001)    (.001)    (.004)    (.0009)          (.002)    (.0008)    (.002)    (.0008)
hispan                -.032     -.052     -.034     -.150     -.031            -.151     -.031      -.151     -.031
                      (.001)    (.002)    (.001)    (.007)    (.001)           (.007)    (.001)     (.007)    (.001)
nonmomi               -.0003    -.0005    -.0004    -.002     -.0003           -.002     -.0003     -.002     -.0003
                      (.00004)  (.00006)  (.00004)  (.0002)   (.00004)         (.0002)   (.00004)   (.0002)   (.00004)

Note: Figures in parentheses under the Coefficient columns are robust standard errors and figures in parentheses under the APE columns are bootstrapped standard errors; those under the APEs for the count endogenous variable with the QMLE and NLS methods are computed standard errors.


Evans (1998). The number of kids is considered endogenous, which is in line with the recent empirical literature. First, the number and timing of children born are controlled by a mother whose fertility decisions are correlated with the number of children. Second, women's fertility is determined by both heterogeneous preferences and correlated, heterogeneous opportunity costs of children. The estimation sample contains 31,824 women, of whom more than 50% are childless, 31% have one kid, 11% have two kids, and the rest have more than two kids. Table 5 gives the frequency distribution of the number of children; it appears to have excess zeros and a long tail, with the average number of children being one. The other explanatory variables, which are exogenous and include demographic and economic characteristics of the family, are described in Table 6.

The current research on parents' preferences over the sex mix of their children using US data shows that most families would prefer at least one child of each sex. For example, Ben-Porath and Welch (1976) found that 56% of families with either two boys or two girls had a third birth, whereas only 51% of families with one boy and one girl had a third child. Angrist and Evans (1998) found that only 31.2% of women with one boy and one girl had a third child, whereas 38.8% and 36.5% of women with two girls and two boys, respectively, had a third child. Given the evidence that women with children of the same sex are more likely to have additional children, the instruments that we can use are samesex and twins.

4.1. Ordinary Least Squares

The OLS estimation often serves as a benchmark since its computation is simple, its interpretation is straightforward, and it requires fewer assumptions for consistency. The estimates of a linear model in which the fraction of total working hours per week is the response variable and the number of kids is treated as exogenous are provided in Table 8. As discussed in the literature on women's labor supply, the coefficient on the number of kids is negative and statistically significant. The linear model with OLS estimation ignores the functional form issues that arise from the excess-zeros nature of the dependent variable. In addition, the fraction of total weekly working hours always lies in the unit interval, so the linear model makes little sense whenever its predicted values fall outside this interval.


4.2. A Tobit Model with an Exogenous Number of Kids

There are two reasons that a Tobit model might be practical. First, the fraction of working hours per week has many zeros. Second, the predicted value needs to be nonnegative. The estimates are given in Table 8. The Tobit coefficients have the same signs as the corresponding OLS estimates, and the statistical significance of the estimates is similar. For magnitude, the Tobit partial effects are computed to make them comparable to the linear model estimates. First, the Tobit conditional mean is estimated; second, the differences in the conditional mean at the two values of the explanatory variable of interest are computed (e.g., we first plug in y2i = 1 and then y2i = 0). As implied by the coefficient, having the first child reduces the estimated fraction of total weekly working hours by about .023, or 2.3 percentage points, a larger effect than the 1.9 percentage points of the OLS estimate. Having the second child and the third child makes the mother work less by about .021, or 2.1 percentage points, and .018, or 1.8 percentage points, respectively. All of the OLS and Tobit statistics are fully robust and statistically significant. Compared with the OLS partial effect, which is about .019, or 1.9 percentage points, the Tobit partial effect is larger for having the first kid but almost the same for the second and the third kids. The partial effects of continuous explanatory variables can be obtained by taking the derivatives of the conditional mean, or, in practice, by applying adjustment factors that make the adjusted Tobit coefficients roughly comparable to the OLS estimates. All of the Tobit coefficients for continuous variables given in Table 8 are larger than the corresponding OLS coefficients in absolute value, but the Tobit partial effects for continuous variables are only slightly larger than the corresponding OLS estimates in absolute value.
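To make the discrete-change calculation concrete, the following sketch evaluates the Tobit conditional mean E(y|x) = \Phi(x\beta/\sigma)x\beta + \sigma\phi(x\beta/\sigma) at kidno = 0 and kidno = 1 and averages the difference across observations. The covariates and coefficient values are invented for illustration; they are not the estimates reported in Table 8.

```python
import numpy as np
from scipy.stats import norm

def tobit_cmean(xb, sigma):
    """Conditional mean of a Tobit censored at zero: Phi(xb/s)*xb + s*phi(xb/s)."""
    z = xb / sigma
    return norm.cdf(z) * xb + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
n = 1000
# Illustrative covariates (intercept, edu, age) and made-up Tobit estimates -- not the paper's values.
X = np.column_stack([np.ones(n), rng.normal(12, 2, n), rng.normal(30, 5, n)])
beta = np.array([0.10, 0.008, 0.008])
alpha_kid = -0.034        # hypothetical coefficient on kidno
sigma = 0.20

# APE of the first child: average difference of the conditional mean at kidno = 1 vs kidno = 0.
xb0 = X @ beta + alpha_kid * 0
xb1 = X @ beta + alpha_kid * 1
ape_first_child = np.mean(tobit_cmean(xb1, sigma) - tobit_cmean(xb0, sigma))
print(f"Tobit APE of having the first child: {ape_first_child:.4f}")
```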

4.3. An FRM with an Exogenous Number of Kids

Following Papke and Wooldridge (1996), I also use the fractional probit model, assuming an exogenous number of children, for comparison purposes. The FRM estimates are similar to the Tobit estimates, but they are even closer to the OLS estimates. The statistical significance of the QML estimates is almost the same as that of the OLS estimates (see Table 8). Having the second child reduces the estimated fraction of total weekly working hours by 1.9 percentage points, which is roughly the same as the OLS estimate.


However, having the first child and the third child results in different partial effects: having the first kid makes a mother work less by 2.0 percentage points, and having the third kid makes a mother work less by 1.6 percentage points.

4.4. Two-Stage Least Squares

In the literature on female labor supply, Angrist and Evans (1998) consider fertility as endogenous. Their notable contribution is to use two binary instruments, whether the genders of the first two births are the same (samesex) and twins at the first two births (multi2nd), to account for an endogenous third child. The 2SLS estimates are replicated and reported in Table 9. The first-stage estimates using OLS and treating the number of children as continuous, given in Table 7, show that an additional year of education reduces the predicted number of kids by .065. In magnitude, the 2SLS estimate for the number of kids is smaller than the OLS estimate, while the estimates for the other explanatory variables are roughly the same. With the IV estimates, having children leads a mother to work less by about 1.6 percentage points, which is smaller than the corresponding OLS estimate of about 1.9 percentage points. These findings are consistent with Angrist and Evans' results.
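The following is a minimal sketch of the 2SLS calculation on simulated data, with samesex and multi2nd standing in for the instruments and kidno for the endogenous regressor; all data-generating values are invented and are not meant to reproduce the estimates in Table 9.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Simulated exogenous covariate, instruments, and an endogenous count-like regressor.
edu = rng.normal(12, 2, n)
samesex = rng.integers(0, 2, n).astype(float)
multi2nd = rng.binomial(1, 0.01, n).astype(float)
u = rng.normal(0, 1, n)                           # unobservable driving the endogeneity
kidno = 1.0 - 0.06 * edu + 0.1 * samesex + 0.8 * multi2nd + 0.5 * u + rng.normal(0, 1, n)
y = 0.30 - 0.02 * kidno + 0.004 * edu - 0.05 * u + rng.normal(0, 0.1, n)

Z = np.column_stack([np.ones(n), edu, samesex, multi2nd])   # exogenous vars + instruments
X = np.column_stack([np.ones(n), edu, kidno])               # regressors in the outcome equation

# 2SLS: regress X on Z to get first-stage fitted values, then regress y on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print("2SLS estimates (const, edu, kidno):", beta_2sls.round(4))
```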

4.5. A Tobit BS Model with an Endogenous Number of Kids

A Tobit BS model is used with the endogenous number of children (see Table 9). Only the Tobit APE for the number of kids is slightly larger than the corresponding 2SLS estimate; the APEs of the Tobit estimates for the other explanatory variables are almost the same as the corresponding 2SLS estimates. Having the first, second, and third kids reduces the fraction of hours a mother spends working per week by around 1.8, 1.7, and 1.5 percentage points, respectively. Having the third kid reduces a mother's fraction of working hours per week by the same amount as the 2SLS estimate, and the statistical significance is almost the same for the number of kids. The Tobit BS method is similar to the 2SLS method in that the first stage uses a linear estimation and ignores the count nature of the number of children, which explains why the Tobit BS results are very close to the 2SLS estimates.


4.6. An FRM with an Endogenous Number of Kids

Now let us consider the FRM with the endogenous number of kids. The fractional probit model with the Papke–Wooldridge (2008) method deals with the problem of endogeneity. However, this method does not address the count nature of the endogenous variable: the endogenous variable is treated as continuous, so partial effects at discrete values of the count endogenous variable are not considered. In this chapter, the APEs of the QMLE-PW estimates are also computed in order to be comparable with the other APE estimates. Having the first kid reduces a mother's fraction of weekly working hours by the same amount as the 2SLS estimate, and treating the number of children as continuous also gives the same effect as the 2SLS estimate for the number of kids. As the number of children increases, a mother gives up more working hours: having the first, second, and third kids reduces the fraction of hours a mother spends working per week by around 1.6, 1.5, and 1.4 percentage points, respectively. The statistical significance is the same as for the Tobit BS estimates of the number of kids, and the APEs of the QMLE-PW estimates are almost the same as the corresponding 2SLS estimates for the other explanatory variables.

The fractional probit model with the methods proposed in this chapter is attractive because it controls for endogeneity, functional form issues, and the presence of unobserved heterogeneity. More importantly, the number of children is treated as a count variable instead of a continuous variable. Both the QMLE and NLS are considered, and the NLS estimates are quite close to the QML estimates. The QML and NLS coefficients and robust standard errors are given in Table 9, and the first-stage estimates are reported in Table 7.

In the first stage, the Poisson model for the count variable is preferred for two reasons. First, the distribution of the count variable, with a long tail and excess zeros, suggests a model with gamma heterogeneity rather than normal heterogeneity. Second, adding unobserved heterogeneity with the standard exponential gamma distribution to the Poisson model transforms the model into the Negative Binomial model, which can be estimated by the ML method. The OLS and Poisson estimates are not directly comparable: for instance, increasing education by one year reduces the number of kids by .065 according to the linear coefficient and by 7.8% according to the Poisson coefficient.
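A sketch of the first-stage Negative Binomial MLE implied by the Poisson-gamma mixture is given below, using the log-likelihood of Eq. (A.2) in Appendix A; the data are simulated and the variable names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def negbin_negll(params, y, Z):
    """Negative log-likelihood of the NegBin model obtained by mixing a Poisson
    with exp(a1) ~ Gamma(delta0, 1/delta0), as in Eq. (A.2)."""
    d2 = params[:-1]
    delta0 = np.exp(params[-1])            # keep the dispersion parameter positive
    mu = np.exp(Z @ d2)
    ll = (delta0 * np.log(delta0 / (delta0 + mu))
          + y * np.log(mu / (delta0 + mu))
          + gammaln(y + delta0) - gammaln(y + 1) - gammaln(delta0))
    return -np.sum(ll)

rng = np.random.default_rng(3)
n = 4000
Z = np.column_stack([np.ones(n), rng.normal(size=n)])        # exogenous vars + instruments
d2_true, delta0_true = np.array([0.2, -0.3]), 1.5
a1 = np.log(rng.gamma(shape=delta0_true, scale=1 / delta0_true, size=n))
y2 = rng.poisson(np.exp(Z @ d2_true + a1))                   # count with gamma heterogeneity

res = minimize(negbin_negll, x0=np.zeros(Z.shape[1] + 1), args=(y2, Z), method="BFGS")
print("d2 estimates:", res.x[:-1].round(3), " delta0:", np.exp(res.x[-1]).round(3))
```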

The fractional probit (FP) estimates have the same signs as the corresponding OLS and 2SLS estimates. In addition, the results show that the QMLE is more efficient than OLS and 2SLS. For magnitude, the FP APEs are computed to make them comparable to the linear model estimates. Similar to the Tobit model, the partial effect of a discrete explanatory variable is obtained by estimating the conditional mean and taking the differences at the values of interest. Regarding the number of kids, having more kids reduces the fraction of hours that a mother works weekly. Having the first child cuts the estimated fraction of total weekly working hours by about .017, or 1.7 percentage points, which is similar to the 2SLS estimate and a larger reduction than the OLS estimate. Having the second child and the third child makes a mother work less by about 1.5 and 1.4 percentage points, respectively. Even though having the third kid further reduces a mother's fraction of weekly working hours relative to having the second kid, the marginal reduction is smaller: a marginal reduction of .2 percentage points for having the second kid goes down to .1 percentage points for having the third kid. This can be seen as an ‘‘adaptation effect,’’ as the mother adapts and works more effectively after having the first kid. The partial effects of continuous explanatory variables can be obtained by taking the derivatives of the conditional mean, so that they are comparable to the OLS, 2SLS, and other alternative estimates.

All of the estimates in Table 9 tell a consistent story about fertility. Statistically, having children significantly reduces a mother's weekly working hours, and the more kids a woman has, the more hours she needs to forgo. The FRM estimates, which treat the number of kids as an endogenous count variable, show that the marginal reduction in women's weekly working hours declines as women have additional children. In addition, the FRM estimates, which account for the endogeneity and count nature of the number of children, are statistically significant and more precisely estimated than the corresponding 2SLS and Tobit estimates.

5. CONCLUSION

I present the QMLE and NLS methodologies to estimate the FRM with a count endogenous explanatory variable. The unobserved heterogeneity is assumed to have an exponential gamma distribution, and the conditional mean of the FRM is estimated numerically. The QMLE and NLS approaches are more efficient than the 2SLS and Tobit-with-IV estimates, and they are more robust and less difficult to compute than the standard MLE method. This approach is applied to estimate the effect of fertility on the fraction of hours a woman works per week. Allowing the number of kids to be endogenous, and using the data provided in Angrist and Evans (1998), I find that the marginal reduction in women's weekly working hours declines as women have additional children. In addition, the effect of the number of children on the fraction of hours that a woman spends working per week


is statistically significant, and more precisely estimated than the estimates from all the other linear and nonlinear models considered in the chapter.

NOTE

Details on CPU times, the number of quadrature points, and the code are available upon request.

ACKNOWLEDGMENTS

I express my special thanks to Jeffrey Wooldridge for his advice and support. I thank Peter Schmidt, Carter Hill, David Drukker, and two anonymous referees for helpful comments. All remaining errors are my own.

REFERENCES

Angrist, J., & Evans, W. (1998). Children and their parents’ labor supply: Evidence from exogenous variation in family size. American Economic Review, 88, 450–477.

Ben-Porath, Y., & Welch, F. (1976). Do sex preferences really matter? Quarterly Journal of Economics, 90, 285–307.

Das, M. (2005). Instrumental variables estimators of nonparametric models with discrete endogenous regressors. Journal of Econometrics, 124, 335–361.

Mullahy, J. (1997). Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. Review of Economics and Statistics, 79, 586–593.

Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. In: R. F. Engle & D. McFadden (Eds), Handbook of econometrics (Vol. 4, pp. 2111–2245). Amsterdam: North Holland.

Papke, L., & Wooldridge, J. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11, 619–632.

Papke, L., & Wooldridge, J. (2008). Panel data methods for fractional response variables with an application to test pass rates. Journal of Econometrics, 145, 121–133.

Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman & Hall/CRC.

Smith, R., & Blundell, R. (1986). An exogeneity test for a simultaneous equation tobit model with an application to labor supply. Econometrica, 54, 679–685.

Staiger, D., & Stock, J. (1997). Instrumental variables regression with weak instruments. Econometrica, 65, 557–586.

Terza, J. (1998). Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects. Journal of Econometrics, 84, 129–154.

Weiss, A. (1999). A simultaneous binary choice/count model with an application to credit card approvals. In: R. Engle & H. White (Eds), Cointegration, causality, and forecasting: A festschrift in honour of Clive W. J. Granger (pp. 429–461). Oxford: Oxford University Press.

Windmeijer, F., & Santos-Silva, J. (1997). Endogeneity in count-data models: An application to demand for health care. Journal of Applied Econometrics, 12, 281–294.

Winkelmann, R. (2000). Econometric analysis of count data. Berlin: Springer.

Wooldridge, J. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.

APPENDICES

Appendix A

This appendix derives asymptotic standard errors for the QML estimator in the second step and for the average partial effects. In the first stage, we have y_{2i} | z_i, a_{1i} ~ Poisson[\exp(z_i\delta_2 + a_{1i})] with the conditional density function

f(y_{2i} | z_i, a_{1i}) = \frac{[\exp(z_i\delta_2 + a_{1i})]^{y_{2i}} \exp[-\exp(z_i\delta_2 + a_{1i})]}{y_{2i}!}    (A.1)

The unconditional density of y_{2i} conditioned only on z_i is obtained by integrating a_{1i} out of the joint density. That is,

f(y_{2i} | z_i) = \int_{a_{1i}} f(y_{2i} | z_i, a_{1i}) f(a_{1i})\, da_{1i}
  = \int_{-\infty}^{\infty} \frac{[\exp(z_i\delta_2 + a_{1i})]^{y_{2i}} \exp[-\exp(z_i\delta_2 + a_{1i})]}{y_{2i}!} \cdot \frac{\delta_0^{\delta_0} \exp(a_{1i})^{\delta_0 - 1} \exp(-\delta_0 \exp(a_{1i}))}{\Gamma(\delta_0)}\, da_{1i}

Let m_i = \exp(z_i\delta_2) and c_i = \exp(a_{1i}); then the conditional density is

f(y_{2i} | z_i, a_{1i}) = \frac{[m_i c_i]^{y_{2i}} \exp[-m_i c_i]}{\Gamma(y_{2i}+1)}

and the unconditional density is

f(y_{2i} | z_i) = \int_0^{\infty} \frac{[m_i c_i]^{y_{2i}} \exp[-m_i c_i]}{\Gamma(y_{2i}+1)} \cdot \frac{\delta_0^{\delta_0} c_i^{\delta_0 - 1} \exp(-\delta_0 c_i)}{\Gamma(\delta_0)}\, dc_i
  = \frac{m_i^{y_{2i}} \delta_0^{\delta_0}}{\Gamma(y_{2i}+1)\Gamma(\delta_0)} \int_0^{\infty} \exp[-c_i(m_i + \delta_0)]\, c_i^{y_{2i}+\delta_0-1}\, dc_i
  = \frac{m_i^{y_{2i}} \delta_0^{\delta_0}}{\Gamma(y_{2i}+1)\Gamma(\delta_0)} \cdot \frac{\Gamma(y_{2i}+\delta_0)}{(m_i+\delta_0)^{y_{2i}+\delta_0}}

Defining h_i = \delta_0/(m_i + \delta_0) gives

f(y_{2i} | z_i) = \frac{\Gamma(y_{2i}+\delta_0)\, h_i^{\delta_0} (1-h_i)^{y_{2i}}}{\Gamma(y_{2i}+1)\Gamma(\delta_0)}

where y_{2i} = 0, 1, \ldots and \delta_0 > 0, which is the density function of the Negative Binomial distribution.
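As a numerical sanity check on the integration step above, the following sketch integrates the Poisson density against the gamma density of c_i = exp(a_{1i}) and compares the result with the closed-form Negative Binomial probability; the parameter values are arbitrary.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

delta0, m_i, y2 = 1.5, 2.0, 3          # illustrative values of delta0, exp(z_i delta2), and y_2i

def integrand(c):
    # Poisson(y2 | m_i * c) times the Gamma(delta0, scale 1/delta0) density of c
    poisson = (m_i * c) ** y2 * np.exp(-m_i * c) / np.exp(gammaln(y2 + 1))
    gamma_c = delta0 ** delta0 * c ** (delta0 - 1) * np.exp(-delta0 * c) / np.exp(gammaln(delta0))
    return poisson * gamma_c

mixture, _ = quad(integrand, 0, np.inf)

h = delta0 / (m_i + delta0)
negbin = np.exp(gammaln(y2 + delta0) - gammaln(y2 + 1) - gammaln(delta0)) * h ** delta0 * (1 - h) ** y2
print(mixture, negbin)                 # the two numbers agree
```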

The log-likelihood for observation i is

l_i(\delta_2, \delta_0) = \delta_0 \ln\left[\frac{\delta_0}{\delta_0 + \exp(z_i\delta_2)}\right] + y_{2i} \ln\left[\frac{\exp(z_i\delta_2)}{\delta_0 + \exp(z_i\delta_2)}\right] + \ln\left[\frac{\Gamma(y_{2i}+\delta_0)}{\Gamma(y_{2i}+1)\Gamma(\delta_0)}\right]    (A.2)

and, for all observations,

L(\delta_2, \delta_0) = \sum_{i=1}^{N} l_i(\delta_2, \delta_0)

We can estimate \delta_2 and \delta_0 jointly by a stepwise MLE. Let \psi = (\delta_2, \delta_0)', which has dimension (L+1), where L is the dimension of \delta_2, that is, the sum of K and the number of instruments. Under standard regularity conditions, we have

\sqrt{N}(\hat\psi - \psi) = N^{-1/2} \sum_{i=1}^{N} r_{i2} + o_p(1)    (A.3)

where

r_{i2} = \begin{bmatrix} -A_{01}^{-1} s_{01} \\ -A_{02}^{-1} s_{02} \end{bmatrix}    (A.4)

in which

s_0 = \begin{pmatrix} \nabla_{\delta_2} l_i \\ \nabla_{\delta_0} l_i \end{pmatrix} = \begin{pmatrix} s_{01} \\ s_{02} \end{pmatrix}

and

A_0 = E(\nabla^2_{\psi} l_i) = E\begin{pmatrix} \nabla^2_{\delta_2} l_i \\ \nabla^2_{\delta_0} l_i \end{pmatrix} = E\begin{pmatrix} H_{01} \\ H_{02} \end{pmatrix} = \begin{pmatrix} A_{01} \\ A_{02} \end{pmatrix}


After taking the first and the second derivatives, we have

s_{01} = \frac{z_i' \delta_0 (y_{2i} - \exp(z_i\delta_2))}{\delta_0 + \exp(z_i\delta_2)}

H_{01} = -\frac{z_i' z_i\, \delta_0 \exp(z_i\delta_2)}{\delta_0 + \exp(z_i\delta_2)}

s_{02} = \ln\left[\frac{\delta_0}{\delta_0 + \exp(z_i\delta_2)}\right] + \frac{\exp(z_i\delta_2) - y_{2i}}{\delta_0 + \exp(z_i\delta_2)} + \frac{\Gamma'(y_{2i}+\delta_0)}{\Gamma(y_{2i}+\delta_0)} - \frac{\Gamma'(\delta_0)}{\Gamma(\delta_0)}

H_{02} = \frac{\exp(z_i\delta_2)}{\delta_0[\delta_0 + \exp(z_i\delta_2)]} - \frac{\exp(z_i\delta_2) - y_{2i}}{[\delta_0 + \exp(z_i\delta_2)]^2} + \frac{\Gamma''(y_{2i}+\delta_0)\Gamma(y_{2i}+\delta_0) - [\Gamma'(y_{2i}+\delta_0)]^2}{[\Gamma(y_{2i}+\delta_0)]^2} - \frac{\Gamma''(\delta_0)\Gamma(\delta_0) - [\Gamma'(\delta_0)]^2}{[\Gamma(\delta_0)]^2}

Here s_{01} and H_{01} are L \times 1 and L \times L matrices, s_{02} and H_{02} are 1 \times 1, and r_{i2}(\psi) has dimension (L+1) \times 1. With the two-step M-estimator, the asymptotic variance of \sqrt{N}(\hat\theta - \theta) must be adjusted to account for the first-stage estimation of \sqrt{N}(\hat\psi - \psi) (see Section 12.4.2 of Wooldridge, 2002).

The score of the QML (the gradient) for observation i with respect to \theta is

s_i(\theta, \gamma) = \nabla_\theta l_i(\theta)
 = y_{1i}\frac{\nabla_\theta m_i}{m_i} - (1-y_{1i})\frac{\nabla_\theta m_i}{1-m_i}
 = \frac{y_{1i}\nabla_\theta m_i (1-m_i) - m_i(1-y_{1i})\nabla_\theta m_i}{m_i(1-m_i)}
 = \frac{(y_{1i}-m_i)\nabla_\theta m_i}{m_i(1-m_i)}
 = \frac{(y_{1i}-m_i)}{m_i(1-m_i)} \int_{-\infty}^{+\infty} \frac{\partial \Phi(g_i\theta)}{\partial \theta}\, f(a_1 | y_{2i}, z_i)\, da_1

s_i(\theta, \gamma) = \frac{(y_{1i}-m_i)}{m_i(1-m_i)} \int_{-\infty}^{+\infty} g_i'\, \phi(g_i\theta)\, f(a_1 | y_{2i}, z_i)\, da_1    (A.5)


where g_i = (y_{2i}, z_{1i}, a_{1i}) and \theta = (\alpha_1, \delta_1, \eta_1)', so that \theta has dimension K+2.

\sqrt{N}(\hat\theta - \theta) = A_1^{-1}\Big(N^{-1/2}\sum_{i=1}^{N} r_{i1}(\theta, \psi)\Big) + o_p(1)    (A.6)

A_1 = E[-\nabla_\theta s_i(\theta, \gamma)]
    = E\left[\frac{(\nabla_\theta m_i)'\,\nabla_\theta m_i}{m_i(1-m_i)}\right]
    = E\left[\frac{B'B}{m_i(1-m_i)}\right]

\hat A_1 = N^{-1}\sum_{i=1}^{N}\left[\frac{B'B}{m_i(1-m_i)}\right]    (A.7)

where B = \int_{-\infty}^{+\infty} g_i'\, \phi(g_i\theta)\, f(a_1 | y_{2i}, z_i)\, da_1 and

r_{i1}(\theta, \psi) = s_i(\theta, \gamma) - F_1 r_{i2}(\psi)    (A.8)

Here r_{i1}(\theta,\psi) and s_i(\theta,\gamma) are (K+2) \times 1 vectors, r_{i2}(\psi) and F_1 are (L+1) \times 1 and (K+2) \times (L+1) matrices, and A_1 is a (K+2) \times (K+2) matrix.

F_1 = E[\nabla_\psi s_i(\theta,\gamma)] = E\begin{pmatrix} \nabla_{\delta_2} s_i(\theta,\gamma) \\ \nabla_{\delta_0} s_i(\theta,\gamma) \end{pmatrix}

E(\nabla_{\delta_2} s_i(\theta,\gamma)) = E\left[\frac{-1}{m_i(1-m_i)}\, B\left(\int_{-\infty}^{+\infty} \Phi(g_i\theta)\,\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_2}\, da_1\right)\right]

E(\nabla_{\delta_0} s_i(\theta,\gamma)) = E\left[\frac{-1}{m_i(1-m_i)}\, B\left(\int_{-\infty}^{+\infty} \Phi(g_i\theta)\,\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_0}\, da_1\right)\right]

\hat F_1 = \frac{1}{N}\sum_{i=1}^{N}
\begin{bmatrix}
\dfrac{-1}{m_i(1-m_i)}\, B\left(\int_{-\infty}^{+\infty} \Phi(g_i\theta)\,\dfrac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_2}\, da_1\right) \\
\dfrac{-1}{m_i(1-m_i)}\, B\left(\int_{-\infty}^{+\infty} \Phi(g_i\theta)\,\dfrac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_0}\, da_1\right)
\end{bmatrix}    (A.9)


where

\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_2} = \frac{z_i'\, \exp(P)\, C\, [\delta_0 + \exp(z_i\delta_2)]^{(y_{2i}+\delta_0-1)}}{\Gamma(y_{2i}+\delta_0)}    (A.9.1)

with P = -\exp(z_i\delta_2 + a_1) + a_1(y_{2i}+\delta_0) - \delta_0\exp(a_1) and C = \{(y_{2i}+\delta_0)\exp(z_i\delta_2) - \exp(z_i\delta_2 + a_1)[\delta_0 + \exp(z_i\delta_2)]\};

\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_0} = f(a_1|y_{2i},z_i)\, D    (A.9.2)

with D = a_1 - a_1\exp(a_1) + \ln(\delta_0 + \exp(z_i\delta_2)) + (y_{2i}+\delta_0)/(\delta_0 + \exp(z_i\delta_2)) - \Gamma'(y_{2i}+\delta_0); and

f(a_1|y_{2i},z_i) = \frac{\exp(P)\,[\delta_0 + \exp(z_i\delta_2)]^{(y_{2i}+\delta_0)}}{\Gamma(y_{2i}+\delta_0)}

Then

\mathrm{Avar}\,\sqrt{N}(\hat\theta - \theta) = A_1^{-1}\,\mathrm{Var}[r_{i1}(\theta,\psi)]\,A_1^{-1}

\widehat{\mathrm{Avar}}(\hat\theta) = \frac{1}{N}\hat A_1^{-1}\Big(N^{-1}\sum_{i=1}^{N} \hat r_{i1}\hat r_{i1}'\Big)\hat A_1^{-1}    (A.10)

The asymptotic standard errors are obtained as the square roots of the diagonal elements of this matrix.

Now we obtain standard errors for the APEs. First, we need the asymptotic variance of \sqrt{N}(\hat w - w) for a continuous explanatory variable, where

\hat w = N^{-1}\sum_{i=1}^{N}\left(\int_{-\infty}^{+\infty}\phi(g_i\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1\right)\hat\theta    (A.11)

is the vector of scaled coefficients times the scale factor in the APE section, and

w = E\left[\int_{-\infty}^{+\infty}\phi(g_i\theta)\,f(a_1|y_{2i},z_i)\,da_1\right]\theta    (A.12)

is the vector of scaled population coefficients times the mean response. If y_2 is treated as a continuous variable,

\widehat{\mathrm{APE}} = N^{-1}\sum_{i=1}^{N}\left(\int_{-\infty}^{+\infty}\phi(\alpha_1 y_{2i} + z_{1i}\delta_1 + \eta_1 a_1)\,f(a_1|y_{2i},z_i)\,da_1\right)\hat\alpha_1


For a continuous variable z_{11},

\widehat{\mathrm{APE}} = N^{-1}\sum_{i=1}^{N}\left(\int_{-\infty}^{+\infty}\phi(\alpha_1 y_{2i} + z_{1i}\delta_1 + \eta_1 a_1)\,f(a_1|y_{2i},z_i)\,da_1\right)\hat\delta_{11}

Using Problem 12.12 in Wooldridge (2002), and letting \pi = (\theta', \delta_2', \delta_0)', we have

\sqrt{N}(\hat w - w) = N^{-1/2}\sum_{i=1}^{N}[j(g_i, z_i; \pi) - w] + E[\nabla_\pi j(g_i, z_i; \pi)]\,\sqrt{N}(\hat\pi - \pi) + o_p(1)    (A.13)

where j(g_i, z_i; \pi) = \left(\int_{-\infty}^{+\infty}\phi(g_i\theta)\,f(a_1|y_{2i},z_i)\,da_1\right)\theta and

f(a_1|y_2, z) = f(a_1; \delta_0)\left[\frac{\delta_0 + \exp(z\delta_2)}{\delta_0 + \exp(z\delta_2 + a_1)}\right]^{\delta_0 + y_2}[\exp(a_1)]^{y_2}

First, we need \sqrt{N}(\hat\pi - \pi):

\sqrt{N}(\hat\pi - \pi) = N^{-1/2}\sum_{i=1}^{N}\begin{pmatrix} A_1^{-1} r_{i1} \\ r_{i2} \end{pmatrix} + o_p(1) = N^{-1/2}\sum_{i=1}^{N} k_i + o_p(1)    (A.14)

Thus, the asymptotic variance of \sqrt{N}(\hat w - w) is

\mathrm{Var}\left\{\left[\left(\int_{-\infty}^{+\infty}\phi(g_i\theta)\,f(a_1|y_{2i},z_i)\,da_1\right)\theta - w\right] + J(\pi)\,k_i\right\}    (A.15)

where J(\pi) = E[\nabla_\pi j(g_i, z_i; \pi)]. Next, we need \nabla_\theta j(g_i,z_i;\pi), \nabla_{\delta_2} j(g_i,z_i;\pi), and \nabla_{\delta_0} j(g_i,z_i;\pi):

\nabla_\theta j(g_i,z_i;\pi) = \left(\int_{-\infty}^{+\infty}\phi(g_i\theta)\,f(a_1|y_{2i},z_i)\,da_1\right) I_{K+2} - \left(\int_{-\infty}^{+\infty}\phi(g_i\theta)\,(g_i\theta)\,\theta g_i\,f(a_1|y_{2i},z_i)\,da_1\right)    (A.16)

where I_{K+2} is the identity matrix and (K+2) is the dimension of \theta;

\nabla_{\delta_2} j(g_i,z_i;\pi) = \theta\left(\int_{-\infty}^{+\infty}\phi(g_i\theta)\,\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_2}\,da_1\right)'    (A.17)


where \partial f(a_1|y_{2i},z_i)/\partial \delta_2 is defined in (A.9.1), and

\nabla_{\delta_0} j(g_i,z_i;\pi) = \theta\left(\int_{-\infty}^{+\infty}\phi(g_i\theta)\,\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_0}\,da_1\right)'    (A.17.1)

where \partial f(a_1|y_{2i},z_i)/\partial \delta_0 is defined in (A.9.2). \nabla_{\delta_2} j(g_i,z_i;\pi) is a (K+2) \times L matrix and \nabla_{\delta_0} j(g_i,z_i;\pi) is a (K+2) \times 1 matrix. Then

\nabla_\pi j(g_i,z_i;\pi) = \big[\nabla_\theta j(g_i,z_i;\pi)\,\big|\,\nabla_{\delta_2} j(g_i,z_i;\pi)\,\big|\,\nabla_{\delta_0} j(g_i,z_i;\pi)\big]    (A.18)

and its expected value is estimated as

\hat J = J(\hat\pi) = N^{-1}\sum_{i=1}^{N}\big[\nabla_\theta j(g_i,z_i;\hat\pi)\,\big|\,\nabla_{\delta_2} j(g_i,z_i;\hat\pi)\,\big|\,\nabla_{\delta_0} j(g_i,z_i;\hat\pi)\big]    (A.19)

Finally, \mathrm{Avar}[\sqrt{N}(\hat w - w)] is consistently estimated as

\widehat{\mathrm{Avar}}[\sqrt{N}(\hat w - w)] = N^{-1}\sum_{i=1}^{N}\left[\left(\int_{-\infty}^{+\infty}\phi(g_i\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1\right)\hat\theta - \hat w + \hat J\hat k_i\right]\left[\left(\int_{-\infty}^{+\infty}\phi(g_i\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1\right)\hat\theta - \hat w + \hat J\hat k_i\right]'    (A.20)

where all quantities are evaluated at the estimators given above. The asymptotic standard error for any particular APE is obtained as the square root of the corresponding diagonal element of Eq. (A.20), divided by \sqrt{N}.

Now we obtain the asymptotic variance of \sqrt{N}(\hat\kappa - \kappa) for a count endogenous variable, where

\mathrm{APE} = E_{a_1}\big[\Phi(\alpha_1 y_2^{k+1} + z_1\delta_1 + \eta_1 a_1) - \Phi(\alpha_1 y_2^{k} + z_1\delta_1 + \eta_1 a_1)\big]    (A.21)

For example, y_2^{k} = 0 and y_2^{k+1} = 1. Then

\widehat{\mathrm{APE}} = N^{-1}\sum_{i=1}^{N}\left[\int_{-\infty}^{+\infty}\Phi(g_i^{k+1}\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1 - \int_{-\infty}^{+\infty}\Phi(g_i^{k}\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1\right]    (A.22)


\mathrm{Var}[\sqrt{N}(\hat\kappa - \kappa)] = \mathrm{Var}\{\sqrt{N}[(\hat\kappa_{k+1} - \hat\kappa_{k}) - (\kappa_{k+1} - \kappa_{k})]\}
 = \mathrm{Var}\,\sqrt{N}(\hat\kappa_{k+1} - \kappa_{k+1}) + \mathrm{Var}\,\sqrt{N}(\hat\kappa_{k} - \kappa_{k}) - 2\,\mathrm{Cov}[\sqrt{N}(\hat\kappa_{k+1} - \kappa_{k+1}),\, \sqrt{N}(\hat\kappa_{k} - \kappa_{k})]

(1) We start with

\sqrt{N}(\hat\kappa_{k} - \kappa_{k}) = N^{-1/2}\sum_{i=1}^{N}\big(j(g_i^{k}, z_i; \pi) - \kappa_{k}\big) + E[\nabla_\pi j(g_i^{k}, z_i; \pi)]\,\sqrt{N}(\hat\pi - \pi) + o_p(1)    (A.23)

where j(g_i^{k}, z_i; \pi) = \int_{-\infty}^{+\infty}\Phi(g_i^{k}\theta)\,f(a_1|y_{2i},z_i)\,da_1, so that

\widehat{\mathrm{Var}}[\sqrt{N}(\hat\kappa_{k} - \kappa_{k})] = N^{-1}\sum_{i=1}^{N}\left(\int_{-\infty}^{+\infty}\Phi(g_i^{k}\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1 - \hat\kappa_{k} + \hat J\hat k_i\right)^{2}    (A.24)

where \hat k_i is defined as in Eq. (A.14) and \hat J is defined as

\hat J = J(\hat\pi) = N^{-1}\sum_{i=1}^{N}\big[\nabla_\theta j(g_i^{k}, z_i;\hat\pi)\,\big|\,\nabla_{\delta_2} j(g_i^{k}, z_i;\hat\pi)\,\big|\,\nabla_{\delta_0} j(g_i^{k}, z_i;\hat\pi)\big]    (A.25)

with

\nabla_\theta j(g_i^{k}, z_i;\pi) = \int_{-\infty}^{+\infty} g_i^{k\prime}\,\phi(g_i^{k}\hat\theta)\,f(a_1|y_{2i},z_i)\,da_1    (A.26)

\nabla_{\delta_2} j(g_i^{k}, z_i;\pi) = \int_{-\infty}^{+\infty}\Phi(g_i^{k}\hat\theta)\,\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_2}\,da_1    (A.27)

\nabla_{\delta_0} j(g_i^{k}, z_i;\pi) = \int_{-\infty}^{+\infty}\Phi(g_i^{k}\hat\theta)\,\frac{\partial f(a_1|y_{2i},z_i)}{\partial \delta_0}\,da_1    (A.28)

(2) \widehat{\mathrm{Var}}[\sqrt{N}(\hat\kappa_{k+1} - \kappa_{k+1})] is obtained in the same way as in (1).

(3) Using the formula \mathrm{Cov}(x,y) = E(xy) - E(x)E(y) and noting that E(\hat\kappa_{k}) = \kappa_{k}, after some algebra the estimator of this covariance is 0.

Adding (1), (2), and (3) together, we get

\widehat{\mathrm{Var}}[\sqrt{N}(\hat\kappa - \kappa)] = \widehat{\mathrm{Var}}[\sqrt{N}(\hat\kappa_{k} - \kappa_{k})] + \widehat{\mathrm{Var}}[\sqrt{N}(\hat\kappa_{k+1} - \kappa_{k+1})]    (A.29)


The asymptotic standard error for the APE of the count endogenous variable is obtained as the square root of the corresponding diagonal element of Eq. (A.29), divided by \sqrt{N}.

Appendix B

Following the Smith–Blundell (1986) approach, the model with endogenous y_2 is written as

y_1 = \max(0,\; \alpha_1 y_2 + z_1\delta_1 + v_2\xi_1 + e_1)

where the reduced form of y_2 is

y_2 = z\pi_2 + v_2, \qquad v_2 | z \sim \mathrm{Normal}(0, \Sigma_2)

and e_1 | z, v_2 \sim \mathrm{Normal}(0, \sigma_e^2). The conditional mean of y_1 is

E(y_1 | z, y_2, v_2) = \Phi\big[(\alpha_1 y_2 + z_1\delta_1 + v_2\xi_1)/\sqrt{1+\sigma_e^2}\big] = \Phi(\alpha_{1e} y_2 + z_1\delta_{1e} + v_2\xi_{1e})

The Blundell–Smith procedure for estimating \alpha_1, \delta_1, \xi_1, and \sigma_e^2 is then:

(i) Run the OLS regression of y_{i2} on z_i and save the residuals \hat v_{i2}, i = 1, 2, \ldots, N.

(ii) Run a Tobit of y_{i1} on y_{i2}, z_{1i}, and \hat v_{i2} to get \hat\alpha_{1e}, \hat\delta_{1e}, and \hat\xi_{1e}, i = 1, 2, \ldots, N.
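A minimal sketch of this two-step control-function procedure on simulated data is given below; the Tobit log-likelihood is coded directly, and the data-generating values are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negll(params, y, X):
    """Negative log-likelihood of a Tobit censored at zero."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    xb = X @ beta
    ll = np.where(y > 0, norm.logpdf(y, loc=xb, scale=sigma), norm.logcdf(-xb / sigma))
    return -np.sum(ll)

rng = np.random.default_rng(4)
n = 3000
z = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, n)])  # incl. an instrument
v2 = rng.normal(size=n)
y2 = z @ np.array([0.5, -0.3, 0.4]) + v2                  # endogenous regressor
e1 = 0.6 * v2 + rng.normal(size=n)                        # correlation with v2 creates endogeneity
y1 = np.maximum(0.0, 0.4 * y2 + 0.2 * z[:, 1] + e1)       # censored outcome

# Step (i): OLS of y2 on z, keep the residuals.
pi_hat = np.linalg.lstsq(z, y2, rcond=None)[0]
v2_hat = y2 - z @ pi_hat

# Step (ii): Tobit of y1 on y2, z1, and the first-stage residuals (control function).
X = np.column_stack([np.ones(n), y2, z[:, 1], v2_hat])
res = minimize(tobit_negll, x0=np.zeros(X.shape[1] + 1), args=(y1, X), method="BFGS")
print("Tobit-BS coefficients (const, y2, z1, v2hat):", res.x[:-1].round(3))
```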

APEs for the Tobit model with an exogenous or endogenous variable are obtained as follows.

* APE in the Tobit model with exogenous y_2:

y_1 = \max(0, y_1^*), \qquad y_1^* = \alpha_1 y_2 + z_1\delta_1 + a_1, \qquad a_1 | y_2, z_1 \sim N(0, \sigma^2)

The conditional mean is E(y_1 | z_1, y_2) = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})(\alpha_1 y_2 + z_1\delta_1) + \sigma\phi(\alpha_{1s} y_2 + z_1\delta_{1s}) \equiv m(y_2, z_1, \theta_{1s}, \theta_1), where \alpha_{1s} = \alpha_1/\sigma and \delta_{1s} = \delta_1/\sigma.

For a continuous variable y_2: \mathrm{APE} = \partial E(y_1|z_1,y_2)/\partial y_2 = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})\,\alpha_1, and the estimator of this APE is

\widehat{\mathrm{APE}} = \frac{1}{N}\sum_{i=1}^{N}\Phi(\hat\alpha_{1s} y_{2i} + z_{1i}\hat\delta_{1s})\,\hat\alpha_1

For a discrete variable y_2 taking the two values c and c+1: \mathrm{APE} = m(y_{2i}=c+1) - m(y_{2i}=c), and the estimator of this APE is \widehat{\mathrm{APE}} = \frac{1}{N}\sum_{i=1}^{N}\hat m(y_{2i}=c+1) - \hat m(y_{2i}=c), where m(y_{2i}=c) = \Phi(\alpha_{1s}c + z_{1i}\delta_{1s})(\alpha_1 c + z_{1i}\delta_1) + \sigma\phi(\alpha_{1s}c + z_{1i}\delta_{1s}).


* APE in the Tobit model with endogenous y_2 (Smith & Blundell, 1986):

y_1 = \max(0, y_1^*), \qquad y_1^* = \alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1 + e_1 = \alpha_1 y_2 + z_1\delta_1 + u_1

y_2 = z\delta_2 + a_1, \qquad \mathrm{Var}(a_1) = \sigma^2, \qquad e_1 | z, a_1 \sim N(0, \tau_1^2)

The standard method is to obtain the APEs by computing the derivatives or the differences of

E_{a_1}\big[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)\big]

where m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2) = m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2). The conditional mean is

m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2) = E(y_1 | z_1, y_2) = \Phi(\alpha_{1s} y_2 + z_1\delta_{1s})(\alpha_1 y_2 + z_1\delta_1) + \sqrt{\eta_1^2\sigma^2 + \tau_1^2}\;\phi(\alpha_{1s} y_2 + z_1\delta_{1s})

where \alpha_{1s} = \alpha_1/\sqrt{\eta_1^2\sigma^2 + \tau_1^2} and \delta_{1s} = \delta_1/\sqrt{\eta_1^2\sigma^2 + \tau_1^2}.

Estimators of the APEs are obtained from the derivatives or the differences of m(\alpha_1 y_2 + z_1\delta_1, \eta_1^2\sigma^2 + \tau_1^2) with respect to the elements of (z_1, y_2), where \hat\sigma^2 is the estimate of the error variance from the first-stage OLS regression:

\widehat{\mathrm{APE}} with respect to z_1 = N^{-1}\sum_{i=1}^{N}\Phi(\hat\alpha_{1s} y_{2i} + z_{1i}\hat\delta_{1s})\,\hat\delta_1

\widehat{\mathrm{APE}} with respect to y_2 = N^{-1}\sum_{i=1}^{N}\hat m(y_{2i}=c+1) - \hat m(y_{2i}=c)

where m(y_{2i}=c) = \Phi(\alpha_{1s}c + z_{1i}\delta_{1s})(\alpha_1 c + z_{1i}\delta_1) + \sqrt{\eta_1^2\sigma^2 + \tau_1^2}\;\phi(\alpha_{1s}c + z_{1i}\delta_{1s}).

An alternative method is to obtain the APEs by computing the derivatives or the differences of

E_{a_1}\big[m(\alpha_1 y_2 + z_1\delta_1 + \eta_1 a_1, \tau_1^2)\big]

where m(z_1, y_2, a_1, \tau_1^2) = m(x, \tau_1^2) = \Phi(x/\tau_1)x + \tau_1\phi(x/\tau_1):

\widehat{\mathrm{APE}} with respect to z_1 = N^{-1}\sum_{i=1}^{N}\Phi(\hat x/\sqrt{\hat\tau_1^2})\,\hat\delta_{11}

\widehat{\mathrm{APE}} with respect to y_2 = N^{-1}\sum_{i=1}^{N}[\hat m_1 - \hat m_0], \qquad \hat m_0 = \hat m[y_2 = 0]

in which \hat x = \hat\alpha_1 y_2 + z_1\hat\delta_1 + \hat\eta_1\hat a_{1i} and \hat a_1 is the residual obtained from the first-stage estimation. For more details, see the Blundell–Smith procedure and the APEs in Wooldridge (2002), Chapter 16.


Appendix C

In order to compare the NLS and the QML estimation, the basic framework is introduced below. The first stage estimates \delta_2 and \delta_0 by stepwise maximum likelihood of y_{i2} on z_i in the Negative Binomial model, yielding the estimated parameters \hat\delta_2 and \hat\delta_0. In the second stage, instead of using the QMLE, we use NLS of y_{i1} on y_{i2}, z_{1i} to estimate \alpha_1, \delta_1, and \eta_1 with the approximated conditional mean m_i(\theta; y_2, z).

The NLS estimator of \theta solves

\min_{\theta \in \Theta}\; N^{-1}\sum_{i=1}^{N}\left[y_{1i} - \int_{-\infty}^{+\infty}\Phi(\alpha_1 y_{2i} + z_{1i}\delta_1 + \eta_1 a_{1i})\,f(a_1|y_{2i},z_i)\,da_{1i}\right]^2

or

\min_{\theta \in \Theta}\; N^{-1}\sum_{i=1}^{N}\big[y_{1i} - m_i(\theta; y_{2i}, z_i)\big]^2 / 2

The score function can be written as

s_i = -(y_{1i} - m_i)\int_{-\infty}^{+\infty} g_i'\,\phi(g_i\theta)\,f(a_1|y_{2i},z_i)\,da_1
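The sketch below illustrates how the approximated conditional mean m_i(\theta; y_2, z) can be computed by numerically integrating \Phi(\alpha_1 y_{2i} + z_{1i}\delta_1 + \eta_1 a_1) against the posterior density f(a_1 | y_{2i}, z_i) implied by the Poisson-gamma first stage, and how the NLS objective is then minimized. It uses generic quadrature rather than the chapter's own numerical scheme, a tiny simulated data set, and first-stage values treated as known, so it should be read as an illustration of the structure only.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.special import gammaln
from scipy.stats import norm

def post_a1(a1, y2, zd2, delta0):
    """Density f(a1 | y2, z) implied by the Poisson-gamma first stage."""
    m = np.exp(zd2)
    logf = (a1 * (y2 + delta0) - (m + delta0) * np.exp(a1)
            + (y2 + delta0) * np.log(m + delta0) - gammaln(y2 + delta0))
    return np.exp(logf)

def cond_mean(theta, y2, z1, zd2, delta0):
    """m_i(theta): integral of Phi(alpha1*y2 + delta1*z1 + eta1*a1) against f(a1|y2,z)."""
    alpha1, d1, eta1 = theta
    val, _ = quad(lambda a: norm.cdf(alpha1 * y2 + d1 * z1 + eta1 * a)
                  * post_a1(a, y2, zd2, delta0), -10, 10)
    return val

def nls_objective(theta, y1, y2, z1, zd2, delta0):
    m = np.array([cond_mean(theta, y2[i], z1[i], zd2[i], delta0) for i in range(len(y1))])
    return np.sum((y1 - m) ** 2) / 2

# Tiny illustrative data set (values invented); delta0 and z*delta2 would come from the first stage.
rng = np.random.default_rng(5)
n = 30
z1 = rng.normal(size=n)
zd2 = 0.2 - 0.3 * z1
delta0 = 1.5
y2 = rng.poisson(np.exp(zd2 + np.log(rng.gamma(delta0, 1 / delta0, n))))
y1 = np.clip(norm.cdf(-0.1 * y2 + 0.3 * z1) + rng.normal(0, 0.05, n), 0, 1)

res = minimize(nls_objective, x0=np.zeros(3), args=(y1, y2, z1, zd2, delta0),
               method="Nelder-Mead", options={"maxiter": 200})
print("NLS estimates of (alpha1, delta1, eta1):", res.x.round(3))
```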

Appendix D

We are given \exp(a_1) distributed as \mathrm{Gamma}(\delta_0, 1/\delta_0), governed by the single parameter \delta_0. We are interested in obtaining the density function of Y = a_1. Let X = \exp(a_1). The density function of X is

f(X; \delta_0) = \frac{\delta_0^{\delta_0} X^{\delta_0 - 1}\exp(-\delta_0 X)}{\Gamma(\delta_0)}, \qquad X > 0,\; \delta_0 > 0

Since X > 0 and Y = \ln(X), we have dX/dY = \exp(Y) and Y \in (-\infty, \infty). The density function of Y is therefore

f(Y; \delta_0) = f[h(Y)]\,\frac{dX}{dY}, \qquad Y \in (-\infty, \infty)

where f[h(Y)] = \delta_0^{\delta_0}\exp(Y)^{\delta_0 - 1}\exp[-\delta_0\exp(Y)]/\Gamma(\delta_0). Plugging in Y = a_1, we get

f(Y; \delta_0) = \frac{\delta_0^{\delta_0}\exp(a_1)^{\delta_0}\exp[-\delta_0\exp(a_1)]}{\Gamma(\delta_0)}

which is Eq. (4).
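A quick numerical check of this change of variables: the density of Y = ln X integrates to one over the real line, which the following sketch verifies with generic quadrature (the value of \delta_0 is arbitrary).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

# Density of Y = ln(X), X ~ Gamma(delta0, scale 1/delta0), as derived in Appendix D.
delta0 = 1.5

def f_Y(y):
    return np.exp(delta0 * np.log(delta0) + delta0 * y - delta0 * np.exp(y) - gammaln(delta0))

total, _ = quad(f_Y, -20, 10)
print("integral of f_Y over the real line:", round(total, 6))   # approximately 1

# Simulation cross-check: draws of ln(X) have the distribution implied by f_Y.
draws = np.log(np.random.default_rng(6).gamma(delta0, 1 / delta0, 100_000))
print("simulated mean of ln X:", draws.mean().round(3))
```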


ALTERNATIVE RANDOM EFFECTS PANEL GAMMA SML ESTIMATION WITH HETEROGENEITY IN RANDOM AND ONE-SIDED ERROR

Saleem Shaik and Ashok K. Mishra

ABSTRACT

In this chapter, we utilize the residual concept of productivity measures, defined in the context of a normal-gamma stochastic frontier production model with heterogeneity, to differentiate productivity and inefficiency measures. In particular, three alternative two-way random effects panel estimators of the normal-gamma stochastic frontier model are proposed using simulated maximum likelihood estimation techniques. For the three alternative panel estimators, we use a generalized least squares procedure involving the estimation of variance components in the first stage and the estimated variance–covariance matrix to transform the data. Empirical estimates indicate differences in the parameter coefficients of the gamma distribution, the production function, and the heterogeneity function variables between the pooled and the two alternative panel estimators. The difference between the pooled and panel models suggests the need to account for spatial, temporal, and within residual variation as in the Swamy–Arora estimator, and within residual variation in the Amemiya estimator, within a panel framework. Finally, results from this study indicate that short- and long-run variations in financial exposure (solvency, liquidity, and efficiency) play an important role in explaining the variance of inefficiency and productivity.

1. INTRODUCTION

The stochastic frontier model, introduced simultaneously by Aigner, Lovell, and Schmidt, by Meeusen and van den Broeck, and by Battese and Corra in 1977, decomposes the error term e into a symmetric random error v and a one-sided error, or inefficiency, u.1 Aigner et al. (1977) assumed normal-half normal and exponential distributions, while Meeusen and van den Broeck (1977) assumed an exponential distribution for the inefficiency term u. In 1982, Jondrow, Materov, Lovell, and Schmidt suggested a method to estimate firm-specific inefficiency measures. In 1980, Greene proposed a flexible frontier production model and followed up with a gamma-distributed stochastic frontier model in 1990. In 2003, Greene proposed simulated maximum likelihood (SML) estimation of the normal-gamma stochastic frontier model to overcome the complexity associated with the log likelihood function.2

Since the 1990s, other theoretical contributions to stochastic frontier analysis include the estimation of time-invariant and time-variant models under alternative distributions of the one-sided inefficiency term for panel data. However, most panel stochastic frontier models include fixed-effects dummies under time-invariant models and variation in the intercept (Schmidt & Sickles, 2004). The time-invariance assumption was relaxed later in the estimation of stochastic frontier models. Still, these models force the fixed-effects dummies not only to capture the heterogeneity but also to estimate inefficiency across cross-sectional units. The proposed random effects time-invariant and time-variant stochastic frontier models primarily dealt with one-way random effects associated with cross-sectional variation (see Cornwell, Schmidt, & Sickles, 1990; Battese & Coelli, 1992). Additionally, research has focused on the influence of a broader set of determinants of inefficiency, heteroskedasticity, and heterogeneity, namely geographic, market structure-conduct-performance, policy, and firm-size variables, using a two-step procedure. The two-step procedure has been the subject of analysis by earlier researchers, though it might be biased due to omitted or left-out variables (see Wang & Schmidt, 2002; Greene, 2004).

Other extensions of stochastic frontier analysis include differentiation into productivity and inefficiency measures using panel data. Productivity, identified with the residual (Abramovitz, 1956; Solow, 1957), is defined as the difference between the log of output and the log of input. In this chapter, we first utilize the residual concept of productivity measures defined by Abramovitz (1956) in the context of the stochastic frontier production model to differentiate productivity and efficiency measures (see below). Following Greene (2004), instead of a two-step process, a stochastic frontier model with heteroskedasticity, in which the random error term v is identified with productivity and the one-sided error term u is identified with inefficiency, is used to examine the importance of short- and long-run variation in financial ratios (liquidity, solvency, and efficiency) on productivity and inefficiency.

Second, three alternative two-way random effects panel estimators of the normal-gamma stochastic frontier model are proposed using SML estimation techniques. For the three alternative panel estimators, we use a generalized least squares procedure3 involving the estimation of variance components in the first stage and the use of the estimated variance–covariance matrix to transform the data in the second stage. Several possibilities exist for the first stage, namely the use of pooled OLS residuals as in the Wallace–Hussain (WH) approach (1969); within residuals as in the Amemiya (AM) approach (1971); or within residuals, between cross-section residuals, and between time-series residuals as in the Swamy–Arora (SA) approach (1972).

In Section 2, we extend Greene’s (2003) normal-gamma SML stochastic frontier methodology to include technical efficiency and productivity. Then three alternative two-way random effects panel estimators of the normal-gamma SML stochastic frontier model are presented. Section 3 will provide details of the panel data used in the analysis and simulation. Application of the three alternative two-way random effects panel estimators of the normal-gamma SML stochastic frontier is presented in Section 4, and some conclusions are drawn in Section 5.

2. RANDOM EFFECTS PANEL GAMMA SML STOCHASTIC FRONTIER MODELS

2.1. Stochastic Frontier Models to Include Efficiency and Productivity

Following Greene (2003), the gamma SML stochastic frontier model can be used to represent a Cobb–Douglas production function as

y = f(x; \beta) + v - u    (1)

where y is the output and x is a vector of inputs used in the production function, \beta is a vector of coefficients associated with the inputs, v represents the random error with v \sim N(0, \sigma_v^2), and u represents the one-sided inefficiency, which can be represented with alternative distributions including the normal-gamma with scale parameter q and shape parameter P. Following Greene (2003), the normal-gamma distribution of the inefficiency u can be defined as

f(u) = \frac{q^{P}\exp(-qu)\,u^{P-1}}{\Gamma(P)}    (2)

Eq. (1) with the normal-gamma distribution can be extended by introducing heterogeneity in the random error, v, and the one-sided inefficiency, u, as

y = f(x; \beta) + v - u
\sigma_u^2 = \exp(\delta' z)
\sigma_v^2 = \exp(\delta' z)    (3)

where \sigma_u^2 is the variance of the inefficiency term and \sigma_v^2 is the variance of the random error. The variances of the inefficiency and random error terms are modeled as functions of the variance in the variables z; here, we define them as functions of the variance of the financial ratio variables, which include solvency, liquidity, and efficiency. The inefficiency and random error variances in Eq. (3) can thus be read as the variance of the inefficiency and productivity measures.

Productivity, or total factor productivity (TFP), is defined as the ratio of outputs to inputs. Mathematically, the production function allowing for inefficiency can be represented as y = f(\tilde x; \beta) + v, where f(\tilde x; \beta) equals f(x; \beta) - u. This production function can then be used to represent TFP as v = y/f(\tilde x; \beta). The productivity concept can be incorporated into the stochastic frontier production function (SFPF) with decomposed error terms, y = f(x; \beta) + v - u, where v constitutes the conventional random error, or TFP, and u constitutes the one-sided disturbance that is distributed as normal-gamma and represents inefficiency.

The SFPF with heteroskedasticity described above is used to examine the importance of short- and long-run variation in the liquidity, solvency, and efficiency financial ratios. Specifically, the model can be represented as

y = f(x; \beta) + v - u    (Output)
\sigma^2_{\mathrm{inefficiency}} = \exp(\delta' z)    (Inefficiency)
\sigma^2_{\mathrm{productivity}} = \exp(\delta' z)    (Productivity)    (4)

2.2. Panel Gamma SML Stochastic Frontier Models

The time-series or cross-section gamma SML stochastic frontier model in Eq. (4) can be extended to one- and two-way fixed or random effects panel models. The basic panel gamma SML stochastic frontier production function with heterogeneity in the random error v and the one-sided inefficiency u can be represented as

y_{it} = f(x_{it}; \beta) + v_{it} - u_{it} \quad \text{or} \quad y_{it} = f(x_{it}; \beta) - u_{it} + v_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (5)

where i = 1, \ldots, N indexes cross-section observations (48 in our case) and t = 1, \ldots, T indexes years (44 in our case); y is the NT \times 1 output vector, x is an NT \times K matrix of inputs, and \beta is K \times 1, with K being the number of explanatory variables used in the production function.

Let us consider a one-way error disturbance gamma SML stochastic frontier production function,

y_{it} = f(x_{it}; \beta) + v_{it} - u_{it}, \quad \text{with } v_{it} = \mu_i + \epsilon_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (6)

where the random error for the one-way random effects model is v_{it} = \mu_i + \epsilon_{it}, with \mu_i representing the temporally invariant cross-section (spatial) effect and \epsilon_{it} the remaining random error.

If the \mu_i, representing individual cross-sectional units, are assumed to be fixed, a one-way fixed effects gamma SML stochastic frontier production function with heterogeneity in the random error v_{it} and the one-sided inefficiency u_{it} can be written as

y_{it} = f(x_{it}; \beta, Z_{\mu}, \mu_i) - u_{it} + \epsilon_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (7)

where Z_{\mu} is a vector of individual cross-sectional dummies and \mu_i contains the associated parameters of the cross-sectional dummies.

An alternative to estimating so many parameters (dummies) is to treat \mu_i as random, which leads to the one-way random effects model. The one-way random effects panel gamma SML stochastic frontier production function with heterogeneity in the random error v_{it} and the one-sided inefficiency u_{it} can be represented as

y_{it} = f(x_{it}; \beta) - u_{it} + \mu_i + \epsilon_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (8)

where \mu_i is the temporally invariant spatial error, normally distributed with mean zero and variance \sigma_{\mu}^2; \epsilon_{it} is the remaining random error, normally distributed with mean zero and variance \sigma_{\epsilon}^2; and \mu_i is independent of \epsilon_{it}. Further, x_{it} is independent of \mu_i and \epsilon_{it} for all i and t.

Similarly, the two-way error gamma SML stochastic frontier production function with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}, can be represented as

y_{it} = f(x_{it}; \beta) + v_{it} - u_{it}, \quad \text{with } v_{it} = \mu_i + \lambda_t + \epsilon_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (9)

where \mu_i represents the temporally invariant cross-section or spatial effect, \lambda_t represents the spatially invariant time-series or temporal effect, and \epsilon_{it} represents the remainder random error.

If \mu_i and \lambda_t, representing individual cross-sectional and time-series units, respectively, are assumed to be fixed, a two-way fixed effects gamma SML stochastic frontier production function can be written as

y_{it} = f(x_{it}; \beta, Z_{\mu}, \mu_i, Z_{\lambda}, \lambda_t) - u_{it} + \epsilon_{it}    (10)

where Z_{\mu} is a vector of individual cross-sectional dummies, \mu_i is a vector of the associated parameters of the cross-sectional dummies, Z_{\lambda} is a vector of individual time-series dummies, and \lambda_t contains the associated parameters of the time-series dummies.

Similarly, one can assume that \mu_i and \lambda_t are random, leading to a two-way random effects model. The two-way random effects panel gamma SML stochastic frontier production function with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}, can be represented as

y_{it} = f(x_{it}; \beta) - u_{it} + \mu_i + \lambda_t + \epsilon_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (11)

where \mu_i is the temporally invariant spatial error with \mu_i \sim N(0, \sigma_{\mu}), \lambda_t is the spatially invariant temporal error with \lambda_t \sim N(0, \sigma_{\lambda}), and \mu_i, \lambda_t, and \epsilon_{it} are independent. Further, x_{it} is independent of \mu_i, \lambda_t, and \epsilon_{it} for all i and t.

2.3. Alternative Two-Way Panel Estimators of Gamma SML Stochastic Frontier Models

The two-way random effects stochastic frontier production function with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}, can be represented as

y_{it} = f(x_{it}; \beta) - u_{it} + v_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (12)

where u_{it} = (u_{11}, \ldots, u_{1T}, u_{21}, \ldots, u_{2T}, \ldots, u_{N1}, \ldots, u_{NT}), v_{it} = (v_{11}, \ldots, v_{1T}, v_{21}, \ldots, v_{2T}, \ldots, v_{N1}, \ldots, v_{NT}), and v_{it} = Z_{\mu}\mu + Z_{\lambda}\lambda + Z_{\epsilon}\epsilon, with

Z_{\mu} = (I_N \otimes i_T), \quad \mu' = (\mu_1, \mu_2, \ldots, \mu_N)
Z_{\lambda} = (I_T \otimes i_N), \quad \lambda' = (\lambda_1, \lambda_2, \ldots, \lambda_T)
Z_{\epsilon} = (I_N \otimes I_T), \quad \epsilon' = (\epsilon_1, \epsilon_2, \ldots, \epsilon_{NT})

where I_N and I_T (i_N and i_T) denote identity matrices (vectors of ones) of dimensions N and T (T and N), respectively, and \mu, \lambda, and \epsilon are random error components with zero mean and covariance matrix

E\begin{pmatrix} \mu \\ \lambda \\ \epsilon \end{pmatrix}\big(\mu' \;\; \lambda' \;\; \epsilon'\big) =
\begin{pmatrix} \sigma_{\mu}^2 I_N & 0 & 0 \\ 0 & \sigma_{\lambda}^2 I_T & 0 \\ 0 & 0 & \sigma_{\epsilon}^2 I_{NT} \end{pmatrix}    (13)

The error variance–covariance matrix (\Omega) of the gamma SML stochastic frontier production function can be represented as

\Omega \equiv \sigma_v^2 = \sigma_{\mu}^2 (I_N \otimes i_T) + \sigma_{\lambda}^2 (I_T \otimes i_N) + \sigma_{\epsilon}^2 (I_N \otimes I_T)    (14)

Finally, to estimate the two-way error component gamma SML stochastic frontier model with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}, we transform Eq. (12):

y^*_{it} = f(x^*_{it}; \beta) - u_{it} + v_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z^*_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z^*_{it})    (15)

where y^*_{it} = \Omega^{-1/2} y_{it} and x^*_{it} = \Omega^{-1/2} x_{it}, with \Omega defined in Eq. (14), or

y^*_{it} = y_{it} - \theta_1 \bar y_{i.} - \theta_2 \bar y_{.t} + \theta_3 \bar y_{..}

where

\theta_1 = 1 - \frac{\sigma_{\epsilon}}{\varphi_2^{1/2}}, \qquad \theta_2 = 1 - \frac{\sigma_{\epsilon}}{\varphi_3^{1/2}}, \qquad \theta_3 = \theta_1 + \theta_2 + \frac{\sigma_{\epsilon}}{\varphi_4^{1/2}} - 1

Here \bar y_{i.}, \bar y_{.t}, and \bar y_{..} represent the cross-section, time-series, and overall means of the variable, computed as \bar y_{i.} = \sum_{t=1}^{T} y_{it}/T, \bar y_{.t} = \sum_{i=1}^{N} y_{it}/N, and \bar y_{..} = \sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}/NT, respectively. The phi's, \varphi_2 = T\sigma_{\mu}^2 + \sigma_{\epsilon}^2, \varphi_3 = N\sigma_{\lambda}^2 + \sigma_{\epsilon}^2, and \varphi_4 = T\sigma_{\mu}^2 + N\sigma_{\lambda}^2 + \sigma_{\epsilon}^2, are the variances of the between cross-section, between time-period, and within cross-section-time-period errors. The phi's used in the computation of the thetas (\theta_1, \theta_2, and \theta_3) can be estimated by (1) using residuals estimated from the pooled gamma SML stochastic frontier model, as proposed in the WH approach; (2) using residuals estimated from the within gamma SML stochastic frontier model, as proposed in the AM approach; or (3) using the residuals estimated from the within, between cross-section, and between time-series gamma SML stochastic frontier models, as proposed in the SA approach.
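A compact sketch of the theta-based transformation, written directly from the formulas above for an N x T panel, is given below; the variance components passed in are placeholders, not estimates from this chapter.

```python
import numpy as np

def two_way_gls_transform(y, sigma_eps2, sigma_mu2, sigma_lam2):
    """Transform an N x T panel as y*_it = y_it - th1*ybar_i. - th2*ybar_.t + th3*ybar_..,
    with the thetas built from the variance components as in Eq. (15)."""
    N, T = y.shape
    phi2 = T * sigma_mu2 + sigma_eps2
    phi3 = N * sigma_lam2 + sigma_eps2
    phi4 = T * sigma_mu2 + N * sigma_lam2 + sigma_eps2
    s_eps = np.sqrt(sigma_eps2)
    th1 = 1 - s_eps / np.sqrt(phi2)
    th2 = 1 - s_eps / np.sqrt(phi3)
    th3 = th1 + th2 + s_eps / np.sqrt(phi4) - 1
    ybar_i = y.mean(axis=1, keepdims=True)     # cross-section means
    ybar_t = y.mean(axis=0, keepdims=True)     # time-period means
    ybar = y.mean()                            # overall mean
    return y - th1 * ybar_i - th2 * ybar_t + th3 * ybar

# Illustrative use with made-up variance components (not estimates from the chapter).
rng = np.random.default_rng(7)
y = rng.normal(size=(48, 44))                  # 48 states x 44 years, as in the application
y_star = two_way_gls_transform(y, sigma_eps2=1.0, sigma_mu2=0.5, sigma_lam2=0.2)
print(y_star.shape)
```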


2.3.1. Swamy–Arora (SA) Estimator
The Swamy–Arora transformation uses residuals obtained from the within, between time-series, and between cross-section gamma SML stochastic frontier models, which requires estimating the following three models.

Within Gamma SML Stochastic Frontier Production Function Model. Consider the within gamma SML stochastic frontier production function with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}:

\tilde y_{it} = f(\tilde x_{it}; \beta) - u_{it} + v_{it}
\sigma_{u_{it}}^2 = \exp(\delta' \tilde z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' \tilde z_{it})    (16)

where

\tilde y_{it} = y_{it} Q_1 = y_{it}\big[(I_N \otimes I_T) - (I_N \otimes i_T) - (I_T \otimes i_N) + (i_N \otimes i_T)\big] = y_{it} - \bar y_{i.} - \bar y_{.t} + \bar y_{..}

with \bar y_{i.}, \bar y_{.t}, and \bar y_{..} defined in Eq. (15) and Q_1 = [(I_N \otimes I_T) - (I_N \otimes i_T) - (I_T \otimes i_N) + (i_N \otimes i_T)]. Consistent estimates of the within error variance (\sigma_{\epsilon}) can be obtained using the within errors \tilde v_{it} = \tilde y_{it} - \tilde x_{it}\beta estimated from the within gamma SML stochastic frontier production function:

\sigma_{\epsilon}^2 \approx \frac{\tilde v_{it}' Q_1 \tilde v_{it}}{\mathrm{trace}(Q_1)}    (17)

Between Cross-Section Gamma SML Stochastic Frontier Production Function. Similarly, consider a between cross-section gamma SML stochastic frontier production function with heterogeneity in the random error, v_i, and the one-sided inefficiency, u_i:

\bar y_i = f(\bar x_i; \beta) - u_i + v_i
\sigma_{u_i}^2 = \exp(\delta' \bar z_i)
\sigma_{v_i}^2 = \exp(\delta' \bar z_i)    (18)

where

\bar y_i = y_{it} Q_2 = y_{it}\big[(I_N \otimes I_T) - (I_N \otimes i_T)\big] = y_{it} - \bar y_{i.}

with \bar y_{i.} and \bar y_{..} defined in Eq. (15) and Q_2 = [(I_N \otimes I_T) - (I_N \otimes i_T)]. Consistent estimates of the between cross-section error variance (\sigma_{\mu}) can be obtained using the cross-section errors \bar v_i = \bar y_i - \bar x_i\beta estimated from the between cross-section gamma SML stochastic frontier production function:

\sigma_{\mu}^2 \approx \frac{\bar v_i' Q_2 \bar v_i}{\mathrm{trace}(Q_2)}    (19)

Between Time-Series Gamma SML Stochastic Frontier Production Function. Finally, consider a between time-series gamma SML stochastic frontier production function with heterogeneity in the random error, v_t, and the one-sided inefficiency, u_t:

\bar y_t = f(\bar x_t; \beta) - u_t + v_t
\sigma_{u_t}^2 = \exp(\delta' \bar z_t)
\sigma_{v_t}^2 = \exp(\delta' \bar z_t)    (20)

where

\bar y_t = y_{it} Q_3 = y_{it}\big[(I_N \otimes I_T) - (I_T \otimes i_N)\big] = y_{it} - \bar y_{.t}

with \bar y_{.t} defined earlier in the text and Q_3 = [(I_N \otimes I_T) - (I_T \otimes i_N)]. Consistent estimates of the between time-series error variance (\sigma_{\lambda}) can be obtained using the time-series errors \bar v_t = \bar y_t - \bar x_t\beta estimated from the between time-series gamma SML stochastic frontier production function:

\sigma_{\lambda}^2 \approx \frac{\bar v_t' Q_3 \bar v_t}{\mathrm{trace}(Q_3)}    (21)

2.3.2. Wallace–Hussain (WH) Estimator
The Wallace–Hussain transformation uses residuals from the pooled gamma SML stochastic frontier model with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}, as defined below:

\tilde y_{it} = f(\tilde x_{it}; \beta) - u_{it} + v_{it}
\sigma_{u_{it}}^2 = \exp(\delta' \tilde z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' \tilde z_{it})    (22)

Consistent estimates of the within error variance can be obtained from the residuals of the pooled gamma SML stochastic frontier model,

\sigma_{\epsilon}^2 \approx \frac{\tilde v_{it}' Q_1 \tilde v_{it}}{\mathrm{trace}(Q_1)}    (23)

the between cross-section error variance from

\sigma_{\mu}^2 \approx \frac{\tilde v_{it}' Q_1 \tilde v_{it}}{\mathrm{trace}(Q_2)}    (24)

and the between time-series error variance from

\sigma_{\lambda}^2 \approx \frac{\tilde v_{it}' Q_1 \tilde v_{it}}{\mathrm{trace}(Q_3)}    (25)

2.3.3. Amemiya (AM) Estimator
The Amemiya transformation uses residuals from the within gamma SML stochastic frontier model with heterogeneity in the random error, v, and the one-sided inefficiency, u, as defined below:

y_{it} = f(x_{it}; \beta) - u_{it} + v_{it}
\sigma_{u_{it}}^2 = \exp(\delta' z_{it})
\sigma_{v_{it}}^2 = \exp(\delta' z_{it})    (26)

Consistent estimates of the within error variance can be obtained from the residuals of the within gamma SML stochastic frontier model,

\sigma_{\epsilon}^2 \approx \frac{v_{it}' Q_1 v_{it}}{\mathrm{trace}(Q_1)}    (27)

and the between cross-section error variance from

\sigma_{\mu}^2 \approx \frac{v_{it}' Q_1 v_{it}}{\mathrm{trace}(Q_2)}    (28)


and the between time-series error variance from the residuals of the within gamma SML stochastic frontier model,

\sigma_{\lambda}^2 \approx \frac{v_{it}' Q_1 v_{it}}{\mathrm{trace}(Q_3)}    (29)
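The sketch below computes trace-normalized quadratic forms from a single N x T matrix of residuals, in the spirit of Eqs. (27)-(29). Two assumptions are made: the Q matrices are implemented as the corresponding mean-deviation operators, and each Q is applied to the residuals inside the quadratic form (the usual textbook construction), whereas the printed equations reuse the Q_1 form in every numerator. Treat this as an illustrative variant rather than a transcription of the chapter's formulas.

```python
import numpy as np

def variance_components_from_residuals(v):
    """Trace-normalized quadratic forms from an N x T residual matrix,
    with Q1, Q2, Q3 implemented as mean-deviation operators."""
    N, T = v.shape
    vbar_i = v.mean(axis=1, keepdims=True)
    vbar_t = v.mean(axis=0, keepdims=True)
    vbar = v.mean()
    q1v = v - vbar_i - vbar_t + vbar          # two-way demeaned (within) residuals
    q2v = v - vbar_i                          # deviation from cross-section means
    q3v = v - vbar_t                          # deviation from time-period means
    sigma_eps2 = (q1v ** 2).sum() / ((N - 1) * (T - 1))   # trace(Q1) = (N-1)(T-1)
    sigma_mu2 = (q2v ** 2).sum() / (N * (T - 1))          # trace(Q2) = N(T-1)
    sigma_lam2 = (q3v ** 2).sum() / (T * (N - 1))         # trace(Q3) = T(N-1)
    return sigma_eps2, sigma_mu2, sigma_lam2

rng = np.random.default_rng(8)
v = rng.normal(size=(48, 44))                 # residuals from a frontier fit (illustrative)
print(variance_components_from_residuals(v))
```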

The empirical application of the above concepts and methods is simple and straightforward. Further, it is easy to estimate the two-way random effects gamma SML stochastic frontier production function by using the frontier module of the LIMDEP package:

Step 1: Estimate the pooled (Eq. (26)), within (Eq. (16)), between cross-section (Eq. (18)), and between time-series (Eq. (20)) gamma SML stochastic frontier production function models with heterogeneity in the random error, v_{it}, and the one-sided inefficiency, u_{it}, using the standard LIMDEP package. Specifically, estimate the gamma SML stochastic frontier model of y_{it} on x_{it} and z_{it} for the pooled model, \tilde y_{it} on \tilde x_{it} and \tilde z_{it} for the within model, and \bar y_i on \bar x_i and \bar z_i (\bar y_t on \bar x_t and \bar z_t) for the between cross-section (between time-series) model.

Step 2: Compute estimates of \sigma_{\epsilon}, \sigma_{\mu}, and \sigma_{\lambda} as described in Eqs. (17), (19), and (21), respectively, for the SA estimator; Eqs. (23), (24), and (25), respectively, for the WH estimator; and Eqs. (27), (28), and (29), respectively, for the AM estimator.

Step 3: Use the error variance estimates \sigma_{\epsilon}, \sigma_{\mu}, and \sigma_{\lambda} to build \varphi_2, \varphi_3, and \varphi_4 and hence the thetas \theta_1, \theta_2, and \theta_3, and transform the output and input variables, y^*_{it} and x^*_{it}, respectively (see Eq. (15)), for each of the three alternative estimators (SA, WH, and AM), i.e., y^*_{it} = y_{it} - \theta_1\bar y_{i.} - \theta_2\bar y_{.t} + \theta_3\bar y_{..}.

Step 4: Finally, estimate the gamma SML stochastic frontier model with heterogeneity in the random error, v, and the one-sided inefficiency, u, on the transformed model as represented in Eq. (15).
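Putting Steps 1-4 together, the following skeleton shows the flow of the two-stage procedure. The function fit_gamma_sml_frontier is a hypothetical stand-in for the LIMDEP frontier module (it is not a real library call), and the variance-component and transformation helpers are assumed to behave like the sketches shown earlier.

```python
import numpy as np
# fit_gamma_sml_frontier(y, X, z) is a HYPOTHETICAL helper assumed to return
# fitted parameters and an N x T matrix of residuals; it stands in for the
# LIMDEP frontier module used in the chapter.

def two_way_random_effects_sml(y, X, z, fit_gamma_sml_frontier,
                               variance_components, gls_transform):
    """Sketch of Steps 1-4: fit the untransformed frontier, back out variance
    components from its residuals, GLS-transform the data, and refit.
    y: (N, T) output; X: (N, T, K) inputs; z: heteroskedasticity variables."""
    # Step 1: pooled (or within/between) gamma SML frontier on the raw data.
    _, resid = fit_gamma_sml_frontier(y, X, z)
    # Step 2: variance components from the residuals.
    s_eps2, s_mu2, s_lam2 = variance_components(resid)
    # Step 3: theta-based transformation of output and inputs.
    y_star = gls_transform(y, s_eps2, s_mu2, s_lam2)
    X_star = np.stack([gls_transform(X[..., k], s_eps2, s_mu2, s_lam2)
                       for k in range(X.shape[-1])], axis=-1)
    # Step 4: gamma SML frontier with heterogeneity on the transformed data.
    return fit_gamma_sml_frontier(y_star, X_star, z)
```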

3. DATA AND VARIABLES USED IN THE ANALYSIS

The U.S. Department of Agriculture’s Economic Research Service (ERS) constructs and publishes state and aggregate production accounts for the farm sector.4 The features of the state and national production accounts are consistent with a gross output model of production and are well documented in Ball, Gollop, Kelly-Hawke, and Swinand (1999). Output is defined as gross production leaving the farm, as opposed to real value added (quantity index, base 1960 = 100). The price of land is based on hedonic regressions: specifically, the price of land in a state is regressed against land characteristics and location (state dummy). Ball et al. (1999) point out that the land characteristics are obtained from climatic and geographic data contained in the State Soil Geographic (STATSGO) database (USDA). In addition, a ‘‘population accessibility’’ index for each county is used to estimate the price of land. The indexes are derived from a gravity model of urban development, which provides measures of accessibility to population concentrations. The index increases as population increases and/or distance from a population center decreases. The population accessibility index is calculated on the basis of population within a 50-mile radius of each parcel. Prices of capital inputs are based on investment goods prices, taking into account the flow of capital services per unit of capital stock in each state (Ball, Bureau, Butault, & Nehring, 2001). All inputs are quantity indexes, with 1960 = 100.

The financial exposure variables are defined as follows and are available from the U.S. Department of Agriculture’s Economic Research Service.5 Financial solvency is defined as the ratio of total farm debt to total farm assets; it measures debt pledged against farm business assets, indicating overall financial risk. Financial liquidity is defined as the ratio of interest plus principal payments to gross cash farm income; it measures the share of the farm business’s gross income needed to service the debt. Finally, financial efficiency is defined as the ratio of gross cash farm income to farm business assets; it measures the gross farm income generated per dollar of farm business assets. Table 1 presents the summary statistics of the output, inputs, and variables that measure financial exposure.

4. EMPIRICAL APPLICATION AND RESULTS

To evaluate the importance of accounting for panel data using alternativepanel estimators – SA, WH, and AM – we use a generalized least squaresprocedure involving the estimation of the variance components in the firststage of gamma SML stochastic frontier model with heterogeneity in therandom and one-sided inefficiency. Specifically, pooled Eq. (5), withinEq. (16), between cross-section Eq. (18), and between time-series Eq. (20)gamma SML stochastic production frontier models with heterogeneity inthe random and one-sided inefficiency are estimated. In the second stage,we use the estimated variance–covariance matrix to transform the data asdefined in Eq. (15). The SA panel estimator uses the errors estimated from


within, between cross-section, and between time-series models. In contrast, the WH and AM panel estimators use the errors estimated from the pooled and within models, respectively.

To examine the importance of short- and long-run variation in financial exposure on efficiency and productivity variance, the gamma SML stochastic frontier model with heterogeneity in the random error, v, and the one-sided inefficiency, u, as defined in Eq. (5), is estimated first. Second, the SA and AM panel gamma SML stochastic frontier production functions with heterogeneity in the random error, v, and the one-sided inefficiency, u, as defined by Eq. (15), are used to estimate the effect of variance in the financial exposure variables.

The output and inputs in the production function equation are estimated using the logs of the variables, while the variances in solvency, liquidity, and efficiency – the financial exposure variables – entering the heteroskedasticity functions for inefficiency and productivity are estimated in levels. The gamma SML stochastic frontier production function with heteroskedasticity is estimated following Greene (2007).

Table 1. Summary Statistics of Output, Input, and Financial Exposure Variables of the U.S. Agriculture Sector, 1961–2003.

                                                     Mean      SD        Minimum   Maximum
Output (quantity index, 1960 = 100)                  140.99     46.9772   59.5198   336.103
Capital (quantity index, 1960 = 100)                 108.141    28.026    39.381    219.242
Land (quantity index, 1960 = 100)                     80.212    17.192    33.868    104.955
Labor (quantity index, 1960 = 100)                    59.003    21.430    14.388    134.597
Chemicals (quantity index, 1960 = 100)               227.881   209.809    28.818   2901.580
Energy (quantity index, 1960 = 100)                  118.530    31.162    51.795    322.732
Materials (quantity index, 1960 = 100)               130.013    46.022    41.752    380.328
Measures of financial exposure
  Liquidity: debt servicing ratio                      0.170     0.065     0.050      0.480
  Liquidity debt servicing – long-run risk(a)          0.038     0.022     0.000      0.112
  Liquidity debt servicing – short-run risk(b)         0.018     0.014     0.000      0.095
  Solvency: debt-to-asset ratio                       15.876     4.199     3.533     31.640
  Solvency debt to asset – long-run risk               2.100     0.954     0.007      5.073
  Solvency debt to asset – short-run risk              1.021     0.778     0.007      6.037
  Efficiency: asset turnover ratio                    20.620     7.361     6.700     65.720
  Efficiency asset turnover ratio – long-run risk      2.810     1.864     0.057     12.176
  Efficiency asset turnover ratio – short-run risk     1.525     1.188     0.057      9.589

(a) Long-run risk is defined as the cumulative standard deviation of the financial variables.
(b) Short-run risk is defined as a 5-year moving standard deviation of the financial variables.

We specified a Cobb–Douglas functional form for the pooled, time-series, cross-section, SA, and AM panel gamma SML stochastic frontier models with heterogeneity in the random error, v, and the one-sided inefficiency, u. The long- and short-run variances of the financial exposure variables were specified in the inefficiency and productivity heteroskedasticity variance functions. Further, we estimated the SML function with 50 Halton draws for all the models. The Cobb–Douglas functional form with heteroskedasticity was specified as:

$$\begin{aligned}
\text{Output}_{it} &= \beta_0 + \beta_1\,\text{Capital}_{it} + \beta_2\,\text{Land}_{it} + \beta_3\,\text{Labor}_{it} + \beta_4\,\text{Chemicals}_{it} \\
&\quad + \beta_5\,\text{Energy}_{it} + \beta_6\,\text{Materials}_{it} + \beta_7\,\text{Year} + \epsilon_{it} \\
\sigma_u^2 &= \gamma_{1,u}\,\text{LR risk}_{it} + \gamma_{2,u}\,\text{SR risk}_{it} \\
\sigma_v^2 &= \gamma_{1,v}\,\text{LR risk}_{it} + \gamma_{2,v}\,\text{SR risk}_{it}
\end{aligned} \qquad (30)$$

where LR risk is the long-run risk, defined as the cumulative standard deviation of the financial exposure variable, and SR risk is the short-run risk, defined as a 5-year moving standard deviation of the financial exposure variable.
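For clarity, a minimal sketch of how these two risk measures could be computed from a financial exposure series is given below; the series, its values, and the helper names are hypothetical and are not the chapter's data.

```python
import numpy as np
import pandas as pd

def long_run_risk(x: pd.Series) -> pd.Series:
    # cumulative (expanding-window) standard deviation of the financial variable
    return x.expanding(min_periods=2).std()

def short_run_risk(x: pd.Series, window: int = 5) -> pd.Series:
    # 5-year moving standard deviation of the financial variable
    return x.rolling(window=window, min_periods=window).std()

# Illustrative debt-to-asset ratio series for one state, indexed by year
years = range(1961, 2004)
ratio = pd.Series(np.random.default_rng(1).normal(16, 4, len(years)), index=years)
lr = long_run_risk(ratio)
sr = short_run_risk(ratio)
```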

4.1. Pooled Gamma SML Stochastic Frontier Models

Parameter estimates of the pooled gamma SML stochastic frontier production function are presented in Table 2. In addition to the variables related to the production function, the table also shows the impact of short- and long-run variations in the financial risk variables – solvency, liquidity, and efficiency – through the heterogeneity in the random error, v, and the one-sided inefficiency, u. Each of these variables measures a facet of the financial risk faced in agriculture. By including them in the production function, we can easily assess the impact of short- and long-run variations in the financial exposure variables, in three different ways, on efficiency and productivity, after accounting for technological change. Recall that the dependent and independent variables are in logarithms; hence, the coefficients represent elasticities of the endogenous variable with respect to exogenous variation.

Results from Table 2 suggest that, in the case of the pooled gamma SML model, all three models of variations in financial exposure (solvency, liquidity, and efficiency) perform equally well. Year, a proxy for technology, is positively related to agricultural output, and returns to scale ranged from 0.91 for the financial liquidity model to as low as 0.84 for the financial efficiency model, with the inclusion of technology.6 The theta, P, and sigma (v) parameters were positive and significant. In each case, the input variables in the production


function are all positive and statistically significant at the 1 percent level of significance. An interesting finding is that, in all three models (measuring financial risk in different ways), the estimated coefficients are very similar. Results indicate an input elasticity of materials ranging from 0.46 in the financial liquidity model to 0.42 in the financial efficiency model (Table 2), indicating that a 100 percent increase in the use of material inputs increases output by 46, 45, and 42 percent, respectively.

The second factor with a significant impact on agricultural production in the United States is energy. Results in Table 2 indicate an input elasticity of energy ranging from 0.14 in the financial solvency and efficiency models to 0.12 in the financial liquidity model, indicating that a 100 percent increase in the use of energy would increase output by 14 percent in the financial solvency and efficiency models. Meanwhile, a 100 percent increase in energy input would increase agricultural output by 12 percent in the financial liquidity model.

Table 2. Pooled Gamma SML Stochastic Frontier Production Function Results for Solvency, Liquidity, and Efficiency Financial Variables.

                        Financial Solvency       Financial Liquidity      Financial Efficiency
                        Coefficient  P[|Z|>z]    Coefficient  P[|Z|>z]    Coefficient  P[|Z|>z]
Constant                -21.4854     <0.0001     -24.5737     <0.0001     -21.4817     <0.0001
Capital                   0.0365      0.0087       0.05892    <0.0001       0.0517      0.0001
Land                      0.0930     <0.0001       0.12895    <0.0001       0.0843     <0.0001
Labor                     0.0717     <0.0001       0.08458    <0.0001       0.0744     <0.0001
Chemicals                 0.0609     <0.0001       0.05044    <0.0001       0.0648     <0.0001
Energy                    0.1361     <0.0001       0.11983    <0.0001       0.1371     <0.0001
Materials                 0.4473     <0.0001       0.4551     <0.0001       0.4167     <0.0001
Year                      0.0113     <0.0001       0.01279    <0.0001       0.0114     <0.0001
Theta                    18.9495     <0.0001      48.1545      0.0006      18.9551     <0.0001
Shape parameter (P)       1.0483      0.001        1.59415    <0.0001       1.0046      0.0005
Sigma (v)                 0.0835     <0.0001       0.06904    <0.0001       0.0820     <0.0001
Theta (1/u)
  Long-run risk          -0.0014      0.9876      12.2275      0.0011      -0.0188      0.7637
  Short-run risk         -0.1927      0.0219       7.88384     0.0499      -0.0613      0.3353
Productivity
  Long-run risk           0.1507     <0.0001      12.9806     <0.0001       0.0978     <0.0001
  Short-run risk         -0.0200      0.415       -3.01444     0.1166      -0.0008      0.9628

The land input elasticity ranges from 0.13 for the financial liquidity model to about 0.09 for the financial solvency and efficiency models. Land ranks third with respect to the magnitude of its contribution to agricultural output. Farm labor has an elasticity of 0.072, 0.085, and 0.074 for the solvency, liquidity, and efficiency models, respectively. Finally, the impacts of capital and chemicals on agricultural output are similar across models: for chemicals the elasticity ranges from 0.04 to 0.06, while for capital the elasticity ranges from 0.01 to 0.05 (Table 2).

With regard to short- and long-run variations in the financial exposure variables, our analysis shows some conflicting results. For example, the short-run financial solvency risk (variability in the debt-to-asset ratio) variable in the theta, or inverse of inefficiency, variance function is negative and significant at the 2 percent level of significance. Accordingly, an increase in the variation of financial solvency (debt-to-asset ratio) decreases the variation in theta, or the inverse of inefficiency variance, in the short run. A possible explanation is that more indebted farmers are higher-cost farmers, and hence more technically inefficient. This finding is consistent with the "agency costs"7 theory proposed by Nasr, Barry, and Ellinger (1998).

In contrast, the short-run financial liquidity risk variable has a positive and significant impact on theta, or the inverse of inefficiency variance. The positive sign indicates that short-run variation in financial liquidity (debt servicing ratio) would increase the variation in theta, or the inverse of inefficiency variance. Findings here support the "free cash flow" hypothesis, which postulates that excess cash flows encourage managerial laxness, which translates into technical inefficiency. Finally, in all three models of financial risk, the sign on the long-run risk coefficient is positive and statistically significant at the 1 percent level of significance. Results indicate that, regardless of the measure of long-run financial risk, variation in financial risk would increase the variability in agricultural productivity (Table 2).

4.2. Panel Gamma SML Stochastic Frontier Models

In exploring alternatives to the pooled gamma SML frontier production function, we now switch to estimating various panel gamma SML stochastic frontier production functions with heterogeneity in the random error, v, and the one-sided inefficiency, u. In particular, we estimate panel gamma SML stochastic frontier production functions using the SA, WH, and AM two-way random effects panel estimators. Due to wrong skewness, we do not present results of the WH model. Parameter estimates of the SA and AM panel gamma SML stochastic frontier production functions are presented in Tables 3 and 4, respectively.


Table 3 presents results from the SA alternative panel SML stochastic production frontier model with heterogeneity in the random error, v, and the one-sided inefficiency, u. In addition to the usual input factors, we also assess the impact of short- and long-run variations in the financial exposure variables on the technical efficiency and productivity variance functions. Comparing the overall results between the pooled and panel gamma SML stochastic frontier production functions indicates that the coefficients of the inputs are roughly the same in the case of material inputs. However, significant differences exist for all other inputs. In the case of variations in the financial exposure variables (both short- and long-run) – theta and productivity variations – results show that in some cases the signs on the coefficients are reversed or the magnitudes of the coefficients become larger. For instance, the coefficient of short-run financial liquidity risk changes from -3.01 and insignificant in the pooled analysis (Table 2) to -3.46 and significant in the panel analysis (Table 3). This may be due to the accounting for spatial, temporal, and within residual variation used to transform the data in the SA model.

Table 3. Swamy–Arora Alternative Panel Gamma SML Stochastic Frontier Production Function Results for Solvency, Liquidity, and Efficiency Financial Variables.

                        Financial Solvency       Financial Liquidity      Financial Efficiency
                        Coefficient  P[|Z|>z]    Coefficient  P[|Z|>z]    Coefficient  P[|Z|>z]
Constant                -23.9604     <0.0001     -17.8617     <0.0001     -23.9168     <0.0001
Capital                   0.0152      0.2329       0.0523      0.0001      -0.0057      0.6451
Land                      0.1331     <0.0001       0.14879    <0.0001       0.1464     <0.0001
Labor                     0.0973     <0.0001       0.09633    <0.0001       0.1147     <0.0001
Chemicals                 0.0656     <0.0001       0.05383    <0.0001       0.0919     <0.0001
Energy                    0.0727     <0.0001       0.06054    <0.0001      -0.0158      0.2448
Materials                 0.4549     <0.0001       0.46029    <0.0001       0.4639     <0.0001
Year                      0.0124     <0.0001       0.01324    <0.0001       0.0121     <0.0001
Theta                    16.2376     <0.0001      48.6613     <0.0001      23.1985     <0.0001
Shape parameter (P)       0.8436      0.0001       1.92584    <0.0001       0.8217     <0.0001
Sigma (v)                 0.0575     <0.0001       0.06144    <0.0001       0.0635     <0.0001
Theta (1/u)
  Long-run risk           0.0482      0.538       11.7154      0.0005      -0.0470      0.2483
  Short-run risk         -0.1997      0.0105       6.60034     0.0613      -0.0752      0.1577
Productivity
  Long-run risk           0.2449     <0.0001      15.0019     <0.0001       0.0646      0.0008
  Short-run risk          0.0306      0.2897      -3.45814     0.0706       0.0184      0.2717


Results in Table 3 suggest that year, as a proxy for technology, is positively related to agricultural output. The output returns to scale ranged from 0.89 for the financial liquidity model to as low as 0.81 for the financial efficiency model, with the inclusion of technology.6 The theta, P, and sigma (v) parameters were all positive and significant. Results also indicate that the input variables in production are all positive and significantly related to output, with the exception of the energy and capital factors in the financial efficiency model (column 6, Table 3). The production function results are consistent with production theory, i.e., an increase in the quantity of an input leads to an increase in the quantity of output produced.

Table 4. Amemiya Alternative Panel Gamma SML Stochastic Frontier Production Function Results for Solvency, Liquidity, and Efficiency Financial Variables.

                        Financial Solvency       Financial Liquidity      Financial Efficiency
                        Coefficient  P[|Z|>z]    Coefficient  P[|Z|>z]    Coefficient  P[|Z|>z]
Constant                -21.4854     <0.0001     -19.1302     <0.0001     -21.4877     <0.0001
Capital                   0.0366      0.0085       0.05695    <0.0001       0.0507      0.0001
Land                      0.0931     <0.0001       0.07576    <0.0001       0.0708      0.0003
Labor                     0.0717     <0.0001       0.07525    <0.0001       0.0768     <0.0001
Chemicals                 0.0609     <0.0001       0.08339    <0.0001       0.0675     <0.0001
Energy                    0.1359     <0.0001       0.15157    <0.0001       0.1360     <0.0001
Materials                 0.4474     <0.0001       0.49155    <0.0001       0.4166     <0.0001
Year                      0.0113     <0.0001       0.00991    <0.0001       0.0114     <0.0001
Theta                    18.9495      0.0035      34.6749     <0.0001      18.9454      0.0011
Shape parameter (P)       1.0494      0.1241       1.0001     <0.0001       1.0607      0.1483
Sigma (v)                 0.0835     <0.0001       0.0646     <0.0001       0.0794     <0.0001
Theta (1/u)
  Long-run risk          -0.0016      0.9862     -21.2496     <0.0001       0.0209      0.6925
  Short-run risk         -0.1930      0.0331      38.9859     <0.0001      -0.0946      0.0616
Productivity
  Long-run risk           0.1508     <0.0001      15.4        <0.0001       0.0979     <0.0001
  Short-run risk         -0.0202      0.4085      -0.29185    <0.0001       0.0036      0.8392

The results from the SA panel model indicate an input elasticity of about 0.46 for materials in all three models. This elasticity is relatively high compared with the other inputs, indicating that a 100 percent increase in material inputs would increase output by 46 percent. The coefficient on the land input is about 0.13–0.14, depending on the financial risk measure used in the model. It should be noted that the land input ranks second with respect to the magnitude of its contribution to agricultural output, indicating that a 100 percent increase in land input increases agricultural output by about 13–14 percent. Farm labor, with an elasticity of about 0.10, and capital, with an elasticity of 0.02, have much smaller elasticities than materials or land, indicating that labor and capital inputs have a smaller positive influence on agricultural output.

When comparing the impact of variations in the financial exposure variables on technical efficiency and productivity in the panel models, the findings are similar to those obtained in the pooled analysis (Table 2). The short-run financial solvency risk (variability in the debt-to-asset ratio) variable in the theta, or inverse of inefficiency, variance function is negative and significant at the 1 percent level of significance. This indicates that an increase in the variation of financial solvency decreases the variation in theta, or the inverse of inefficiency variance, in the short run. A possible explanation is that more indebted farmers are higher-cost farmers, and hence more technically inefficient. This finding is consistent with the agency costs theory proposed by Nasr et al. (1998). In contrast, the short-run financial liquidity risk variable has a positive and significant impact on theta, or the inverse of inefficiency variance. The positive sign indicates that short-run variation in financial liquidity (debt servicing ratio) would increase the variation in theta, or the inverse of inefficiency variance.

The final two rows in Table 3 assess the impact of long- and short-run variations in the financial exposure variables on productivity in the panel gamma SML stochastic frontier functions. The results in Table 3 show that in all three models the impact of long-run variation in financial exposure is positive and statistically significant at the 1 percent level of significance. Findings here support the free cash flow hypothesis, which postulates that excess cash flows encourage managerial laxness, which translates into technical inefficiency. Finally, the coefficient on short-run variation in financial exposure is negative and significant for the financial liquidity model, indicating that an increase in the short-run variation of financial liquidity (debt servicing ratio) would lead to a decrease in the variation of productivity.

Table 4 presents the results of the Amemiya alternative panel gamma SML stochastic frontier production function. The parameter estimates of the inputs (land, labor, capital, etc.) are similar to those obtained in Table 2 (pooled analysis) because the errors used in the transformation of the data were obtained from the pooled model in the first stage. However, when considering the impact of short- and long-run variations in financial exposure on theta and productivity variation, the results are similar to the


SA panel model. For example, short-run financial risk, in the case of financial solvency and liquidity, has a negative and a positive significant effect on theta, or the inverse of inefficiency, respectively. Additionally, long-run variations in financial exposure in all models have a positive and significant impact on productivity variation. Year, as a proxy for technology, is positively related to agricultural output, and returns to scale range from 0.94 for the financial liquidity model to as low as 0.83 for the financial efficiency model, with the inclusion of technology.6 The theta and sigma (v) parameters are positive and significant (Table 4). Finally, the production function results are consistent with production theory, i.e., an increase in the quantity of an input leads to an increase in the quantity of output produced.

Interestingly, results in Table 4 show that the short-run financial efficiency risk (asset turnover ratio) variable in theta, or the inverse of inefficiency, is negative and significant. This indicates that an increase in the variability of financial efficiency leads to a decrease in the variation of the theta, or inverse of inefficiency, measure. It is highly likely that variability in gross farm income is the driving force behind the variability in the asset turnover ratio. Higher variability would be associated with higher variance in production. It has been argued that more efficient farmers have a higher asset turnover ratio. According to Nasr et al. (1998), lenders like to advance funds to "low-cost" (technically efficient) farmers. Under this hypothesis, we expect efficient farmers to have higher debt. Therefore, any fluctuation in gross farm revenue would have a negative impact on the variation in technical efficiency. Finally, results in Table 4 show that long-run variation in financial exposure in all three models has a positive and significant impact on the productivity variance function. Findings here suggest that an increase in the variation of financial solvency, liquidity, and efficiency risk would lead to an increase in productivity variance.

Finally, the estimated theta and sigma (v) for the pooled, SA, and AM panel estimators are all significant at the 1 percent level of significance. This result indicates a good fit of the gamma SML pooled and panel models with heteroskedasticity in the one-sided and random errors. The shape parameter, P, as defined in Eq. (2), was also significant for the pooled, SA, and AM panel estimators, with the exception of financial solvency and financial efficiency in the AM panel model. Larger values of P (greater than 1) allow the mass of the inefficiency distribution to move away from zero. Results indicate that for the pooled and most of the panel models, the value of P was less than or close to one, with the exception of the financial liquidity pooled and SA panel models.


5. CONCLUSION

The contribution of the research presented in this chapter is twofold. First, three alternative two-way random effects panel estimators of the normal-gamma stochastic frontier model with heterogeneity in the random error and the one-sided inefficiency are proposed and tested using simulated maximum likelihood estimation techniques. In particular, we propose a generalized least squares procedure that involves estimating the variance components in the first stage and then using the estimated variance–covariance matrix to transform the data. The data transformation involves estimation of the pooled model (Wallace–Hussain estimator); the within model (Amemiya estimator); or the within, between cross-section, and between time-series models (Swamy–Arora estimator) in the estimation of the alternative panel estimators. Second, the stochastic frontier model with heteroskedasticity of a random error term, identified with productivity, and a one-sided error term, identified with inefficiency, is used to examine the importance of short- and long-run variations in the financial risk variables – namely, financial liquidity, solvency, and efficiency.

Empirical estimates indicate differences in the parameter estimates of the gamma distribution, production function, and heterogeneity function variables between the pooled and the two alternative panel estimators – namely, the SA and AM estimators. The difference between the pooled and the panel models suggests the need to account for spatial, temporal, and within residual variation, as in the Swamy–Arora estimator, or within residual variation, as in the Amemiya estimator, within the panel framework. Our findings show production increasing with increasing units of inputs. Results from this study indicate that variations in the financial exposure measures (solvency, liquidity, and efficiency) play an important role in technical efficiency and productivity. For example, in the case of the financial solvency and financial liquidity risk models, our findings reveal a negative and a positive effect on technical efficiency in the long run and short run, respectively.

Future research could examine the implications of time-invariant and specification-variant extensions of the gamma SML stochastic frontier production and cost function models. Further research could also focus on the robustness of the alternative two-way random effects models with application to farm-level data. Compared with aggregate production analysis, individual farm-level results may vary with regard to the production of agricultural output and the impact of financial risk on productivity and efficiency.


NOTES

1. The efficiency concept introduced by Farrell (1957) is defined as the distance of the observation from the production frontier and is measured by the observed output of a firm, state, or country relative to realized output, i.e., output that could be produced if it were 100 percent efficient from a given set of inputs.
2. According to Greene (2003), the "normal-gamma model provides a richer and more flexible parameterization of the inefficiency distribution in the stochastic frontier model than either of the canonical forms, normal-half normal and normal-exponential."
3. Alternatively, a maximum likelihood estimator of the two-way random effects panel stochastic frontier models can also be presented.
4. The data are available at the USDA/ERS website http://www.ers.usda.gov/data/agproductivity/
5. The data are available at the USDA/ERS website http://www.ers.usda.gov/data/farmbalancesheet/fbsdmu.htm
6. However, returns to scale (RTS) were slightly lower when technology was excluded from the regression.
7. Due to asymmetric information and misaligned incentives between lenders and borrowers, monitoring of borrowers by lenders is implied. Monitoring involves transaction costs, and lenders may pass the monitoring costs on to farmers in the form of higher interest rates and/or collateral requirements.

ACKNOWLEDGMENTS

The authors wish to thank the participants of the 8th Annual Advances in Econometrics Conference, November 6–8, Baton Rouge, LA, for their useful comments and questions. We thank the editor and two anonymous reviewers for their valuable suggestions, which greatly improved the exposition and readability of the paper. Mishra's time on this project was supported by the USDA Cooperative State Research, Education & Extension Service, Hatch project #0212495, and Louisiana State University Experiment Station project #LAB 93872. Shaik's time on this project was supported by the USDA Cooperative State Research, Education & Extension Service, Hatch project #0217864, and North Dakota State University Experiment Station project #ND 01397.

REFERENCES

Abramovitz, M. (1956). Resource and output trends in the United States since 1870. American Economic Review, 46(2), 5–23.
Aigner, D. J., Lovell, C. A. K., & Schmidt, P. (1977). Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6, 21–37.
Amemiya, T. (1971). The estimation of the variances in a variance-components model. International Economic Review, 12, 1–13.
Ball, V. E., Bureau, J.-C., Butault, J.-P., & Nehring, R. (2001). Levels of farm sector productivity: An international comparison. Journal of Productivity Analysis, 15, 5–29.
Ball, V. E., Gollop, F., Kelly-Hawke, A., & Swinand, G. (1999). Patterns of productivity growth in the U.S. farm sector: Linking state and aggregate models. American Journal of Agricultural Economics, 81, 164–179.
Battese, G., & Coelli, T. J. (1992). Frontier production functions, technical efficiency and panel data: With application to paddy farmers in India. Journal of Productivity Analysis, 3, 153–169.
Battese, G., & Corra, G. (1977). Estimation of a production frontier model: With application for the pastoral zone of eastern Australia. Australian Journal of Agricultural Economics, 21, 167–179.
Cornwell, C., Schmidt, P., & Sickles, R. C. (1990). Production frontiers with cross-sectional and time-series variation in efficiency levels. Journal of Econometrics, 46(1–2), 185–200.
Farrell, M. J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society Series A, 120, 253–290.
Greene, W. H. (1980). Maximum likelihood estimation of econometric frontier functions. Journal of Econometrics, 13(1), 27–56.
Greene, W. H. (1990). A gamma-distributed stochastic frontier model. Journal of Econometrics, 46(1), 141–164.
Greene, W. H. (2003). Simulated likelihood estimation of the normal-gamma stochastic frontier function. Journal of Productivity Analysis, 19(2/3), 179–190.
Greene, W. (2004). Distinguishing between heterogeneity and inefficiency: Stochastic frontier analysis of the World Health Organization's panel data on National Health Care Systems. Health Economics, 13(10), 959–980.
Greene, W. (2007). LIMDEP computer program: Version 9.0. Plainview, NY: Econometric Software.
Jondrow, J., Materov, I., Lovell, K., & Schmidt, P. (1982). On the estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19, 233–238.
Meeusen, W., & van den Broeck, J. (1977). Efficiency estimation from Cobb–Douglas production functions with composed error. International Economic Review, 18, 435–444.
Nasr, R. E., Barry, P. J., & Ellinger, P. N. (1998). Financial structure and efficiency of grain farms. Agricultural Finance Review, 58, 33–48.
Schmidt, P., & Sickles, R. C. (1984). Production frontiers and panel data. Journal of Business and Economic Statistics, 2(4), 367–374.
Solow, R. M. (1957). Technical change and the aggregate production function. Review of Economics and Statistics, 39(3), 312–320.
Swamy, P. A. V. B., & Arora, S. S. (1972). The exact finite sample properties of the estimators of coefficients in the error component regression models. Econometrica, 40, 261–275.
Wallace, T. D., & Hussain, A. (1969). The use of error components models in combining cross-section and time-series data. Econometrica, 37, 55–72.
Wang, H., & Schmidt, P. (2002). One-step and two-step estimation of the effects of exogenous variables on technical efficiency levels. Journal of Productivity Analysis, 18, 129–144.


MODELING AND FORECASTING VOLATILITY IN A BAYESIAN APPROACH

Esmail Amiri

ABSTRACT

In a Bayesian approach, we compare the forecasting performance of five classes of models (ARCH, GARCH, SV, SV-STAR, and MSSV) using daily Tehran Stock Exchange (TSE) market data. To estimate the parameters of the models, Markov chain Monte Carlo (MCMC) methods are applied. The results show that the models in the fourth and fifth classes perform better than the models in the other classes.

1. INTRODUCTION

Volatility is a well-known characteristic of many financial time series, and it changes over time. In the basic option pricing model introduced by Black and Scholes (1973), the volatility of the underlying asset's returns is the only one of the five parameters that is not directly observable and must be forecasted. Also, in some financial studies, an accurate volatility forecast plays an important role in a more precise estimation of value at risk.

Maximum Simulated Likelihood Methods and Applications
Advances in Econometrics, Volume 26, 323–356
Copyright © 2010 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0731-9053/doi:10.1108/S0731-9053(2010)00000260014


In the literature, different econometric models have been suggested to forecast volatility; among them, the autoregressive conditional heteroskedasticity (ARCH) and generalized ARCH (GARCH) families play an important role (e.g., Brooks, 1998; Yu, 2002).

Our aim is to compare the performance of ARCH, GARCH, and several stochastic volatility (SV) models for predicting volatility in the Tehran (Iran) stock market in a Bayesian approach. This study contributes to the volatility forecasting literature in two ways. First, a data set from a stock market rarely used in the literature is analyzed. Second, additional SV models are included among the competing candidates.

In Section 2, the data are introduced; Section 3 presents five classes of volatility models; Section 4 is devoted to Bayesian inference; Section 5 describes the likelihoods of the models and the priors; Section 6 presents the method of volatility forecasting; in Section 7, forecast evaluation measures are introduced; Section 8 presents the empirical results; and Section 9 concludes.

2. DATA

We analyze a Tehran Stock Exchange (TSE) market data set. The TSE began its operations in 1967 and was inactive from 1979 (the beginning of Iran's Islamic revolution) to 1989. The Iranian government's new economic reforms and a privatization initiative in 1989 raised attention to the private sector and brought life back to the TSE. The TSE, which is a full member of the World Federation of Exchanges (WFE) and a founding member of the Federation of Euro-Asian Stock Exchanges (FEAS), has been one of the world's best performing stock exchanges in recent years (e.g., Azizi, 2004; Najarzadeh & Zivdari, 2006). The government of Iran directly holds 35% of the TSE, while securing another 40% through Iranian pension funds and investment companies. Foreign investment accounted for only about 2% of the stock market in 2009. The TSE is open for trading 5 days a week, from Saturday to Wednesday, excluding public holidays. As of June 2008, 400 companies, with a market capitalization of US$70 billion, were listed on the TSE.

The sample for this study consists of 2,398 daily returns over the period from 1 January 1999 to 31 December 2008 (excluding holidays and no-trading days). Returns are defined as the natural logarithm of price relatives; that is, $y_t = \log(X_t / X_{t-1})$, where $X_t$ is the daily capital index. The basic framework is a 5-day trading week, with the markets closing for various holidays.


In the literature, there are different methods for obtaining a monthly volatility series from daily returns. Since we only have data at the daily frequency, we calculate the volatility in a given period simply: following Merton (1980) and Perry (1982), we use the formula

$$\sigma_T^2 = \sum_{t=1}^{N_T} y_t^2 \qquad (1)$$

where $y_t$ is the daily return on day t and $N_T$ is the number of trading days in month T.
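A short sketch of this calculation follows, using a simulated daily index purely for illustration; none of the values are the TSE data, and the variable names are my own.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing index X_t over the sample period
idx = pd.bdate_range("1999-01-01", "2008-12-31")
shocks = np.random.default_rng(2).normal(0.0, 0.01, len(idx))
x = pd.Series(2000 * np.exp(np.cumsum(shocks)), index=idx)

# Daily returns y_t = log(X_t / X_{t-1})
y = np.log(x / x.shift(1)).dropna()

# Monthly volatility: sigma^2_T = sum of squared daily returns within month T (Eq. 1)
monthly_vol = (y ** 2).groupby([y.index.year, y.index.month]).sum()
```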

Figs. 1 and 2 plot the daily stock index and the daily return series, respectively. Fig. 1 exhibits a trend, and Fig. 2 exhibits volatility clustering.

In total, we have 120 monthly volatilities. Fig. 3 plots the series.

[Fig. 1. Plot of the Daily TSE Stock Index.]

[Fig. 2. Plot of the Daily TSE-Return Series.]

[Fig. 3. Plot of the Monthly Volatility of TSE from 1999 to the End of 2008.]

From Fig. 3, two particularly volatile periods can easily be identified. The first corresponds to the 2003 crash, while the second occurred in 2008, the period of the world financial crisis. Table 1 presents a descriptive picture of the volatility time series.

Table 1 also shows that the autocorrelation function (ACF) of volatility is decreasing. To test for stationarity, the KPSS statistic (the KPSS test is due to Kwiatkowski, Phillips, Schmidt, & Shin, 1992) is calculated (Zivot & Wang, 2006). The KPSS statistic for the entire sample is 0.1887, which is smaller than the 99% quantile of 0.762. Therefore, the null hypothesis that the volatility series is stationary is accepted at the 1% level.

After obtaining the monthly volatility series, the forecasting horizon has to be chosen. In this study, 1-month-ahead forecasts are chosen. Furthermore, one period has to be chosen for estimating the parameters and another for predicting volatility. The first 8 years of data are used to fit the models. Thus, the first month for which an out-of-sample forecast is obtained is January 2007. As the sample is rolled over, the models are re-estimated and sequential 1-month-ahead forecasts are made. Hence, in total, 24 monthly volatilities are forecasted. Furthermore, when the KPSS statistic is calculated for the first 8 years and the last 2 years, the values are found to be 0.2283 and 0.4622, respectively. Again, for both series, the null hypothesis that the volatility series is stationary is accepted at the 1% level.
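As an illustration of this kind of stationarity check, the following sketch applies the KPSS test from statsmodels to a stand-in volatility series; the generated data are hypothetical, and the reported critical value comes from the library's tables rather than from the chapter.

```python
import numpy as np
from statsmodels.tsa.stattools import kpss

# Stand-in for the 120 monthly volatilities (hypothetical data)
vol = np.abs(np.random.default_rng(3).normal(5e-4, 3e-4, 120))

# KPSS test with a constant ("level stationarity"); small statistics favor stationarity
stat, p_value, lags, crit = kpss(vol, regression="c", nlags="auto")
print(f"KPSS statistic: {stat:.4f}; 1% critical value: {crit['1%']}")
```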

3. VOLATILITY MODELS

We apply four classes of volatility models to the TSE-return time series and evaluate their forecast performance. The first and second classes are known as observation-driven, and the other classes are called parameter-driven. In the first two classes, we examine ARCH and GARCH models, and in the last classes, we study SV models with different state equations.

In the following, $y_t$ is the return on an asset at time $t = 1, \ldots, T$, $\{\epsilon_t\}$ is an independent Gaussian white noise process, and $h_t = \log \sigma_t^2$.

Table 1. Summary Statistics of the Monthly Volatility of TSE.

Mean         Median      Max.        Kurt.    ρ1
0.0005068    0.000239    0.0075356   27.39    0.5673

ρ2           ρ3          ρ4          ρ5       ρ6
0.2256       0.1681      0.0965      0.0496   0.0229


3.1. ARCH Models

Based on Engle (1982), the ARCH(p) model is defined as

$$\begin{aligned}
y_t &= \mu + \sigma_t \epsilon_t \\
\sigma_t^2 &= \xi + \sum_{i=1}^{p} \alpha_i (y_{t-i} - \mu)^2
\end{aligned} \qquad (2)$$

where $\sigma_t^2$ is the volatility at time t, and $\xi$ and $\alpha_i$ ($i = 1, \ldots, p$) are the parameters of the deterministic volatility model.

3.2. GARCH Models

The class of GARCH models builds on the fact that volatility is time varying and persistent, and that current volatility depends deterministically on past volatility and past squared returns. GARCH models are easy to estimate and quite popular, since it is relatively straightforward to evaluate the likelihood function for this kind of model. This type of model was developed by Bollerslev (1986). A GARCH(p, q) model is defined as

$$\begin{aligned}
y_t &= \mu + \sigma_t \epsilon_t \\
\sigma_t^2 &= \xi + \sum_{i=1}^{p} \alpha_i (y_{t-i} - \mu)^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2
\end{aligned} \qquad (3)$$

where, given the observations up to time $t-1$, the volatility $\sigma_t^2$ at time t is deterministic once the parameters $\xi$, $\alpha_i$, $\beta_j$ ($i = 1, \ldots, p$; $j = 1, \ldots, q$) are known.
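A minimal simulation sketch of Eq. (3) is given below; the parameter values are illustrative only, and setting beta = 0 recovers the ARCH(1) case of Eq. (2).

```python
import numpy as np

def simulate_garch(T, mu, xi, alpha, beta, seed=0):
    """Simulate a GARCH(1,1) path: sigma2_t = xi + alpha*(y_{t-1}-mu)^2 + beta*sigma2_{t-1}.
    With beta = 0 this reduces to the ARCH(1) model of Eq. (2)."""
    rng = np.random.default_rng(seed)
    y = np.empty(T)
    sigma2 = np.empty(T)
    sigma2[0] = xi / (1 - alpha - beta)          # unconditional variance as starting value
    y[0] = mu + np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, T):
        sigma2[t] = xi + alpha * (y[t - 1] - mu) ** 2 + beta * sigma2[t - 1]
        y[t] = mu + np.sqrt(sigma2[t]) * rng.standard_normal()
    return y, sigma2

returns, variances = simulate_garch(T=2398, mu=0.0, xi=1e-6, alpha=0.1, beta=0.85)
```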

3.3. Stochastic Volatility Models

Taylor (1986) originally introduced the SV model. For the class of SV models, the innovations to the volatility are random, and the volatility realizations are therefore unobservable and more difficult to recover from the data. The characteristic distinguishing SV models from GARCH models is thus the presence of an unobservable shock component in the volatility dynamics. The exact value of the volatility at time t cannot be known even if all past information is employed to determine it. As more information becomes available, the volatility in a given past period can be better evaluated; both contemporaneous and future information thus contribute to learning about volatility. In contrast, in the deterministic


setting of the simple GARCH volatility process, the volatility in a certain time period is known, given the information from the previous period (Rachev, Hsu, Bagasheva, & Fabozzi, 2008).

The following log-normal SV model is well known in the SV literature (e.g., Shephard, 1996):

$$\begin{aligned}
y_t &= \sigma_t \epsilon_t = e^{0.5 h_t} \epsilon_t \\
h_t &= \xi + \delta h_{t-1} + \sigma_\eta \eta_t
\end{aligned} \qquad (4)$$

where $y_t$ is the return on an asset at time $t = 1, \ldots, T$, $\{\epsilon_t\}$ and $\{\eta_t\}$ are independent Gaussian white noise processes, $\sigma_\eta$ is the standard deviation of the shock to $h_t$, and $h_t$ has a normal distribution. $\xi$ and $\delta$ are the volatility model parameters. However, it is impossible to write the likelihood function of SV models in a simple closed-form expression; estimating an SV model involves integrating out the hidden volatilities.
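The following sketch simulates the log-normal SV model of Eq. (4) under purely illustrative parameter values; it makes explicit that the volatility path $h_t$ is generated by its own random shocks and is never observed directly.

```python
import numpy as np

def simulate_sv(T, xi, delta, sigma_eta, seed=0):
    """Simulate the log-normal SV model of Eq. (4):
    h_t = xi + delta*h_{t-1} + sigma_eta*eta_t,  y_t = exp(0.5*h_t)*eps_t."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    # draw h_1 from the stationary distribution of the AR(1) log-volatility
    h[0] = rng.normal(xi / (1 - delta), sigma_eta / np.sqrt(1 - delta ** 2))
    for t in range(1, T):
        h[t] = xi + delta * h[t - 1] + sigma_eta * rng.standard_normal()
    y = np.exp(0.5 * h) * rng.standard_normal(T)
    return y, h

y_sim, h_sim = simulate_sv(T=2398, xi=-0.5, delta=0.95, sigma_eta=0.2)
```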

3.4. Stochastic Volatility Models with STAR Volatility

Different models have been proposed in the literature for generating the volatility sequence $h_t$ (Kim, Shephard, & Chib, 1998).

In a two-regime self-exciting threshold autoregressive (SETAR) model, the observations $h_t$ are generated either from the first regime, when $h_{t-d}$ is smaller than the threshold, or from the second regime, when $h_{t-d}$ is greater than the threshold value. If the binary indicator function is replaced by a smooth transition function $0 < F(z_t) < 1$, which depends on a transition variable $z_t$ (like the threshold variable in TAR models), the model is called a smooth transition autoregressive (STAR) model. A general form of the STAR model is as follows:

$$h_t = X_t \phi^{(1)} (1 - F(z_t)) + X_t \psi\, F(z_t) + \eta_t, \qquad \eta_t \sim N(0, \sigma^2) \qquad (5)$$

where $\psi = (1, \psi_1, \ldots, \psi_p)$, $\phi^{(1)} = (1, \phi^{(1)}_1, \ldots, \phi^{(1)}_p)$, and $X_t = (1, h_{t-1}, h_{t-2}, \ldots, h_{t-p})$. For practical computation, let $\phi^{(2)} = \psi - \phi^{(1)}$; then Eq. (5) can be rewritten as

$$h_t = X_t \phi^{(1)} + X_t \phi^{(2)} F(z_t) + \eta_t \qquad (6)$$

where $\phi^{(2)} = (1, \phi^{(2)}_1, \ldots, \phi^{(2)}_p)$. Model (6) is similar to a two-regime SETAR model. Now the observations $h_t$ switch between the two regimes smoothly, in the sense that the dynamics of $h_t$ may be determined by both regimes, with one regime sometimes having more impact and the other regime having more impact at other times.


Two popular choices for the smooth transition function are the logistic function and the exponential function, respectively:

$$F(z_t; \gamma, c) = \left[1 + e^{-\gamma(z_t - c)}\right]^{-1}, \qquad \gamma > 0 \qquad (7)$$

$$F(z_t; \gamma, c) = 1 - e^{-\gamma(z_t - c)^2}, \qquad \gamma > 0 \qquad (8)$$

The resulting models are referred to as the logistic STAR (LSTAR) model and the exponential STAR (ESTAR) model, respectively. In Eqs. (7) and (8), the parameter c is interpreted as the threshold, as in TAR models, and $\gamma$ determines the speed and smoothness of the transition.
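The two transition functions of Eqs. (7) and (8) are straightforward to code; a minimal sketch follows (the function names are my own).

```python
import numpy as np

def logistic_transition(z, gamma, c):
    # F(z; gamma, c) = 1 / (1 + exp(-gamma * (z - c))),  gamma > 0   (Eq. 7)
    return 1.0 / (1.0 + np.exp(-gamma * (z - c)))

def exponential_transition(z, gamma, c):
    # F(z; gamma, c) = 1 - exp(-gamma * (z - c)^2),      gamma > 0   (Eq. 8)
    return 1.0 - np.exp(-gamma * (z - c) ** 2)

# Example: evaluate both transitions on a grid around an illustrative threshold c = 0
grid = np.linspace(-2.0, 2.0, 9)
f_logistic = logistic_transition(grid, gamma=5.0, c=0.0)
f_exponential = exponential_transition(grid, gamma=5.0, c=0.0)
```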

If, in an SV model, the volatility sequence evolves according to a STAR(p) equation, then the model is called a stochastic volatility model with STAR volatilities (SV-STAR):

$$\begin{aligned}
y_t &= \sigma_t \epsilon_t = e^{0.5 h_t} \epsilon_t, \qquad \epsilon_t \sim N(0, 1) \\
h_t &= X_t \phi^{(1)} + X_t \phi^{(2)} F(\gamma, c, h_{t-d}) + \sigma_\eta \eta_t, \qquad \eta_t \sim N(0, 1)
\end{aligned} \qquad (9)$$

where $\phi^{(1)}$ and $\phi^{(2)}$ are $(p+1)$-dimensional vectors and $F(\gamma, c, h_{t-d})$ is a smooth transition function. We assume, without loss of generality, that $d \le p$ always. When $p = 1$, the STAR(1) reduces to an AR(1) model. In $F(\gamma, c, h_{t-d})$, $\gamma > 0$, c, and d are the smoothness, location (threshold), and delay parameters, respectively. When $\gamma \to \infty$, the STAR model reduces to a SETAR model, and when $\gamma \to 0$, the standard AR(p) model arises. We assume that $h_{-p+1}, h_{-p+2}, \ldots, h_0$ are unknown quantities.

For computational purposes, the second equation of Eq. (9) is presented in matrix form,

$$h_t = W' \phi + \sigma_\eta \eta_t \qquad (10)$$

where $\phi' = (\phi^{(1)}, \phi^{(2)})$ and $W' = (X_t, X_t F(\gamma, c, h_{t-d}))$.

3.5. Markov Switching Stochastic Volatility Models

A Markov switching stochastic volatility (MSSV) model, as in So, Lam, and Li (1998), is

$$\begin{aligned}
y_t &= \sigma_t \epsilon_t = e^{0.5 h_t} \epsilon_t, \qquad \epsilon_t \sim N(0, 1) \\
h_t &= \xi_{s_t} + \delta h_{t-1} + \sigma_\eta \eta_t, \qquad \eta_t \sim N(0, 1) \\
\xi_{s_t} &= \rho_1 + \sum_{j=2}^{k} \rho_j I_{jt}, \qquad \rho_j > 0
\end{aligned} \qquad (11)$$


where $s_t$ is a state variable and $I_{jt}$ is an indicator variable that equals 1 when $s_t \ge j$. $s_t$ follows a k-state first-order Markov process,

$$p_{ij} = \Pr(s_t = j \mid s_{t-1} = i), \qquad i, j = 1, \ldots, k \qquad (12)$$

where $\sum_{j=1}^{k} p_{ij} = 1$. The goal of this model is to separate clusters of high and low volatility, captured by the different $\xi$'s, and, therefore, to estimate the persistence parameter $\delta$ more precisely (Hamilton & Susmel, 1994).

We assume that $h_t$ is greater in the high-volatility state ($s_t = k$) than in the low-volatility state ($s_t = 1$); when $s_t = 1$, the MSSV model reduces to model (4). To simplify the notation, let $\xi = (\xi_1, \ldots, \xi_k)$, $s = (s_1, \ldots, s_T)$, $h = (h_1, \ldots, h_T)$, and $P = (p_{ij})$. Moreover, to keep our notation consistent, we denote a p-order MSSV model as SV-MSAR(p), where p is the order of the autoregressive part of the MSSV model.
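A small simulation sketch of a two-state model in the spirit of Eq. (11) is shown below; the transition matrix, intercepts, and other values are illustrative assumptions, not estimates from the chapter, and the state-specific intercepts are supplied directly rather than built from the $\rho_j$ increments.

```python
import numpy as np

def simulate_mssv(T, xi, delta, sigma_eta, P, seed=0):
    """Simulate a k-state Markov switching SV model: the intercept xi[s_t] switches
    according to a first-order Markov chain with transition matrix P (rows sum to 1)."""
    rng = np.random.default_rng(seed)
    k = len(xi)
    s = np.empty(T, dtype=int)
    h = np.empty(T)
    s[0] = rng.integers(k)
    h[0] = xi[s[0]] / (1 - delta)                 # rough starting level for log-volatility
    for t in range(1, T):
        s[t] = rng.choice(k, p=P[s[t - 1]])       # Markov transition of the state
        h[t] = xi[s[t]] + delta * h[t - 1] + sigma_eta * rng.standard_normal()
    y = np.exp(0.5 * h) * rng.standard_normal(T)
    return y, h, s

# Two persistent regimes: low- and high-volatility intercepts (illustrative values)
P = np.array([[0.98, 0.02], [0.05, 0.95]])
y_ms, h_ms, states = simulate_mssv(T=2398, xi=np.array([-1.0, 0.5]), delta=0.9,
                                   sigma_eta=0.2, P=P)
```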

4. BAYESIAN INFERENCE OF VOLATILITY MODELS

For a given volatility model (GARCH-type or SV), we denote by $\theta \in \Theta \subseteq R^d$ the vector of parameters, by $p(\theta|y)$ the posterior of $\theta$, and by $y = (y_1, y_2, \ldots, y_T)$ the vector of observations. The joint posterior density of $\theta$ is then obtained by Bayes' theorem as

$$p(\theta|y) = \frac{f(y|\theta)\, p(\theta)}{\int_\Theta f(y|\theta)\, p(\theta)\, d\theta} \qquad (13)$$

where $p(\theta)$ is the prior density of $\theta$ and $f(y|\theta)$ is the joint density of y given $\theta$, which is called the likelihood. In Eq. (13), $\int_\Theta f(y|\theta)\, p(\theta)\, d\theta$ is the marginal likelihood, which is the normalizing constant of $p(\theta|y)$. Once $f(y|\theta)\, p(\theta)$ has been obtained, a technical difficulty is calculating the high-dimensional integral necessary to find the normalizing constant of $p(\theta|y)$.

The likelihood function of an SV model is intractable; therefore, we have to resort to simulation methods. Of course, the likelihood functions of ARCH and GARCH models are better behaved than that of the SV model,1 but we prefer to use a similar estimation method for all models.

The Markov chain Monte Carlo (MCMC) method is a promising way of attacking likelihood estimation by simulation, using computer-intensive MCMC techniques to draw samples from the distribution of the volatilities conditional on the observations. Early work on these methods was pioneered by Hastings (1970) and Geman and Geman (1984), while more recent developments appear in Gelfand and Smith (1990) and Chib and Greenberg (1995,


1996). Kim and Shephard (1994) and Jacquier, Polson, and Rossi (1994) were among the first to apply MCMC methods to estimate SV models. MCMC permits us to obtain the posterior distribution of the parameters by simulation rather than by analytical methods.

Estimating the parameters of models (2)–(4) and (9)–(11) by MCMC methods is carried out in a Bayesian framework. For the application of MCMC, priors with specific hyperparameters should be assumed for the distributions of the parameters. We follow Congdon (2003) and fit the models to the TSE-return time series using MCMC methods. The performance of the models is evaluated using in-sample and out-of-sample measures. For in-sample model selection, two model choice criteria are considered: DIC and PLC. DIC is the deviance at the posterior mean criterion of Spiegelhalter, Best, Carlin, and van der Linde (2002), and PLC is the prediction loss criterion of Gelfand and Ghosh (1998). For out-of-sample model fit, two criteria are chosen: root mean square error (RMSE) and linear-exponential (LINEX) loss (Yu, 2002).

4.1. Markov Chain Monte Carlo Methods

Markov chain Monte Carlo (MCMC) methods have virtually revolutionized the practice of Bayesian statistics. Early work on these methods was pioneered by Hastings (1970) and Geman and Geman (1984), while more recent developments appear in Gelfand and Smith (1990) and Chib and Greenberg (1995, 1996). When sampling from a high-dimensional posterior density is intractable, MCMC methods provide algorithms to obtain the desired samples. Letting $\pi(\theta)$ be the target posterior distribution of interest, the main idea behind MCMC is to build a Markov chain transition kernel

$$P(z, C) = \Pr\{\theta^{(m)} \in C \mid \theta^{(m-1)} \in z\}, \qquad m = 1, \ldots, M \qquad (14)$$

where M is the number of iterations and $\theta^{(m)}$ is a sample of $\theta$ at the m-th iteration of the chain, starting from some initial state $\theta^{(0)}$, with limiting invariant distribution equal to $\pi(\theta)$. It has been proved (e.g., by Chib & Greenberg, 1996) that, under suitable conditions, one can build such a transition kernel generating a Markov chain $\{\theta^{(m)} \mid \theta^{(m-1)}\}$ whose realizations converge in distribution to $\pi(\theta)$. Once convergence has occurred, a sample of serially dependent simulated observations on the parameter $\theta$ is obtained, which can be used to perform Monte Carlo inference. Much effort has been devoted to the design of algorithms able to generate a convergent transition


kernel. The Metropolis–Hastings (M–H) algorithm and the Gibbs sampler are among the most famous algorithms and are very effective in building the above-mentioned Markov chain transition kernel.

4.2. The Metropolis–Hastings Algorithm

The M–H algorithm, introduced by Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller (1953) and Hastings (1970), has similarities with the adaptive rejection (A-R) algorithm (Gilks & Wild, 1992). Let $\pi(\theta)$ denote the unnormalized posterior density, from which direct sampling is not possible. To sample from $\pi(\theta)$, M–H uses an auxiliary density, called the candidate-generating density, denoted by $q(\theta, \theta')$, where $\int q(\theta, \theta')\, d\theta' = 1$. This density is used to move from the current point $\theta$ to a new point $\theta'$ by taking samples from $q(\theta, \theta')$; in other words, $q(\theta, \theta')$ plays the role of a kernel. The move is governed by the probability of move, $\alpha(\theta, \theta')$, defined as

$$\alpha(\theta, \theta') = \frac{\pi(\theta')\, q(\theta', \theta)}{\pi(\theta)\, q(\theta, \theta')} \qquad (15)$$

where $\alpha(\theta, \theta') \le 1$. When $\pi(\theta)q(\theta, \theta') < \pi(\theta')q(\theta', \theta)$, the move from $\theta$ to $\theta'$ happens more often, and when $\pi(\theta)q(\theta, \theta') > \pi(\theta')q(\theta', \theta)$, we can take the acceptance probability of the reverse move to be 1, with $\theta$ and $\theta'$ interchanged in Eq. (15). To summarize, the probability of move is

$$\alpha(\theta, \theta') =
\begin{cases}
\min\left\{\dfrac{\pi(\theta')\, q(\theta', \theta)}{\pi(\theta)\, q(\theta, \theta')},\ 1\right\} & \text{if } \pi(\theta)\, q(\theta, \theta') > 0 \\
1 & \text{otherwise}
\end{cases} \qquad (16)$$

It is worth mentioning that if the candidate-generating density is symmetric, the fraction in Eq. (16) reduces to $\pi(\theta')/\pi(\theta)$, and there is no need to normalize $\pi(\theta)$. The algorithm is then as follows, where we denote $\alpha(\theta, \theta')$ by r:

• Set $i = 0$ and $\theta_i = \theta_0$, where $\theta_0$ is chosen arbitrarily.
• Step 1: Set $i = i + 1$; if $i > m$, stop, otherwise go to the next step.
• Step 2: Draw $\theta'$ from $q(\theta_i, \theta')$ and u from $U(0, 1)$.
• Step 3: If $r \ge u$, accept $\theta'$; otherwise stay at $\theta_i$.
• Step 4: Go to Step 1.
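A generic random-walk M–H sampler implementing these steps, applied to a toy standard-normal target, might look like the following sketch; the function name, tuning scale, and target are assumptions used only for illustration.

```python
import numpy as np

def metropolis_hastings(log_target, theta0, n_iter, scale=0.5, seed=0):
    """Random-walk Metropolis-Hastings: the Gaussian proposal is symmetric, so the
    acceptance probability reduces to pi(theta')/pi(theta), as noted after Eq. (16)."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    theta = theta0
    for m in range(n_iter):
        proposal = theta + scale * rng.standard_normal()
        log_r = log_target(proposal) - log_target(theta)
        if np.log(rng.uniform()) < log_r:      # accept with probability min(1, r)
            theta = proposal
        draws[m] = theta
    return draws

# Example: sample from a standard normal target (log density up to a constant)
samples = metropolis_hastings(lambda th: -0.5 * th ** 2, theta0=0.0, n_iter=5000)
```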


4.3. The Gibbs Sampler

The Gibbs sampler is a version of the M–H algorithm for obtaining marginal distributions from an unnormalized joint density by a Markovian updating scheme. It was developed in the context of image restoration by Geman and Geman (1984), and later Tanner and Wong (1987) utilized the method in a statistical framework. Gelfand and Smith (1990) then showed its applicability to general parametric Bayesian computation.

In the Gibbs sampler, conditional distributions play an important role in simulation from multivariate distributions. Let us assume $\pi(\theta)$ is an unnormalized multivariate density, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The distributions of the form $\pi(\theta_i \mid \theta_{-i}) = \pi(\theta_i \mid \theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k)$ are called full conditionals. If the full conditionals are available, the Gibbs sampler is called conjugate; if they are not available, it is called non-conjugate. Essentially, the Gibbs sampler is the M–H algorithm in which the full conditionals play the role of candidate-generating densities and the probability of acceptance is one. A systematic form of the Gibbs sampler proceeds as follows:

• Step 1: Choose an arbitrary starting set of values $\theta_1^{(0)}, \ldots, \theta_k^{(0)}$.
• Step 2: Draw $\theta_1^{(1)}$ from $\pi(\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_k^{(0)})$, then $\theta_2^{(1)}$ from $\pi(\theta_2 \mid \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_k^{(0)})$, and so on, up to $\theta_k^{(1)}$ from $\pi(\theta_k \mid \theta_1^{(1)}, \theta_2^{(1)}, \ldots, \theta_{k-1}^{(1)})$, to complete one iteration of the scheme.
• Step 3: After t iterations of Step 2, we arrive at $(\theta_1^{(t)}, \ldots, \theta_k^{(t)})$.
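The scheme can be illustrated with a toy conjugate example: a Gibbs sampler for a bivariate normal with correlation rho, whose full conditionals are available in closed form. The sketch below is illustrative only and is not taken from the chapter.

```python
import numpy as np

def gibbs_bivariate_normal(n_iter, rho, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho: each full
    conditional theta_i | theta_{-i} is N(rho * theta_{-i}, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_iter, 2))
    th1, th2 = 0.0, 0.0                                      # arbitrary starting values (Step 1)
    for t in range(n_iter):
        th1 = rng.normal(rho * th2, np.sqrt(1 - rho ** 2))   # draw from p(th1 | th2)
        th2 = rng.normal(rho * th1, np.sqrt(1 - rho ** 2))   # draw from p(th2 | th1)
        draws[t] = th1, th2
    return draws

chain = gibbs_bivariate_normal(n_iter=5000, rho=0.8)
```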

5. LIKELIHOODS, CONDITIONAL POSTERIORS, AND PRIORS

Following Congdon (2003), the log-likelihood for the t-th observation under models (2) and (3) is

$$\log L_t = -0.5 \log(2\pi) - 0.5 \log \sigma_t^2 - 0.5\, \frac{(y_t - \mu)^2}{\sigma_t^2} \qquad (17)$$

and the log-likelihood for the t-th observation under models (4) and (9) is (e.g., Gerlach & Tuyl, 2006)

$$\log L_t \propto -0.5\, y_t^2 e^{-h_t} - 0.5\, h_t - 0.5 \log \sigma_\eta^2 - 0.5\, \eta_t^2 \qquad (18)$$


The likelihood for model (11) can be written as

$$\log L_t \propto -0.5\, y_t^2 e^{-h_t} - 0.5\, h_t - 0.5 \log \sigma_\eta^2 - 0.5\, \frac{(h_t - \xi_{s_t} - \delta h_{t-1})^2}{\sigma_\eta^2} \qquad (19)$$

BRugs and SVPack software are used to facilitate the programming of the simulations. BRugs is a collection of R functions that allow users to analyze graphical models using MCMC techniques. Most of the R functions in BRugs provide an interface to the BRugs dynamic link library (shared object file). BRugs is free software and is downloadable from http://mathstat.helsinki.fi/openbugs. The Bayesian analysis of volatility models is simple to implement in BRugs and does not require the programmer to know the precise formulae for any prior density or likelihood. The general steps of programming with BRugs are:

1. Identify the likelihood.
2. Identify the priors of the parameters.

SVPack is a freeware dynamic link library for the Ox programming language (Doornik, 1996).

In the following subsections, we summarize only the main steps of the MCMC algorithm for the studied models and present the conditional posteriors and priors.

5.1. MCMC Algorithm for ARCH and GARCH Models

To estimate the parameters of ARCH and GARCH models using MCMC methods, the conditional posterior pdfs are necessary. Using Bayes' theorem, the kernel of each conditional posterior pdf is a combination of the priors and the likelihood function.

Let us denote $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)$, $\beta = (\beta_1, \beta_2, \ldots, \beta_q)$, $\sigma^2 = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_T^2)$, and $\theta = (\xi, \alpha, \beta, \mu)$. We assume the following independent priors on the unknown parameters:

$$p(\theta) \propto p(\mu)\, p(\xi)\, p(\alpha)\, p(\beta) \qquad (20)$$

where $p(\mu)$ is a normal pdf and $p(\xi)$, $p(\alpha)$, and $p(\beta)$ are uniform pdfs. The conditional likelihood function for the ARCH and GARCH models is

$$f(y \mid \theta, t_0) = (2\pi)^{-T/2} \left( \prod_{t=1}^{T} \sigma_t^{-1} \right) \exp\left( -\frac{1}{2} \sum_{t=1}^{T} \sigma_t^{-2} (y_t - \mu)^2 \right) \qquad (21)$$


where $t_0$ is a set of initial conditions. In this chapter, we assume the following initial conditions:

1. $y_t = 0$ for $t < 0$; and
2. $\sigma_t^2 = s^2$ for $t \le 0$, where $s^2$ denotes the sample variance of $y_t$.

We use the following conditional pdfs within the Gibbs sampler algorithm:

$$\xi \mid y, \alpha, \beta, \mu, \sigma^2 \sim p(\xi)\, f(y \mid \theta, t_0) \qquad (22)$$

$$\alpha \mid y, \xi, \beta, \mu, \sigma^2 \sim p(\alpha)\, f(y \mid \theta, t_0) \qquad (23)$$

$$\beta \mid y, \xi, \alpha, \mu, \sigma^2 \sim p(\beta)\, f(y \mid \theta, t_0) \qquad (24)$$

$$\mu \mid y, \xi, \alpha, \beta, \sigma^2 \sim p(\mu)\, f(y \mid \theta, t_0) \qquad (25)$$

In our empirical work, we drew observations from these conditional pdfs using an M–H algorithm (see Chib & Greenberg, 1995). The main steps of the MCMC algorithm for ARCH and GARCH models, as described by Nakatsuma (1998), are:

1. Initialize $\xi$, $\alpha$, $\beta$, calculate $\sigma^2$, and set $l = 0$.
2. Sample a new $\xi$ from $\xi \mid y, \alpha, \beta, \mu, \sigma^2$.
3. Sample a new $\alpha$ from $\alpha \mid y, \xi, \beta, \mu, \sigma^2$.
4. Sample a new $\beta$ from $\beta \mid y, \xi, \alpha, \mu, \sigma^2$.
5. Sample a new $\mu$ from $\mu \mid y, \xi, \alpha, \beta, \sigma^2$.
6. Update $\sigma^2$.
7. If $l < L$, set $l$ to $l + 1$ and continue with step 2, where L is the required number of iterations.
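The building block of steps 2–5 is the conditional likelihood of Eq. (21) under the stated initial conditions; a sketch of its evaluation for a GARCH(1,1) model follows. This is an illustrative helper of my own, not the authors' code.

```python
import numpy as np

def garch_loglik(y, mu, xi, alpha, beta):
    """Conditional Gaussian log-likelihood of Eq. (21) for a GARCH(1,1) model,
    with sigma2_0 set to the sample variance of y (initial condition 2)."""
    T = len(y)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(y)
    for t in range(1, T):
        sigma2[t] = xi + alpha * (y[t - 1] - mu) ** 2 + beta * sigma2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + (y - mu) ** 2 / sigma2)

# Within each M-H step (Eqs. 22-25), the conditional posterior kernel of a parameter is
# proportional to prior(parameter) * exp(garch_loglik(...)) evaluated at the proposal.
```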

5.2. Priors for ARCH and GARCH Models

We use the following priors for the ARCH and GARCH models:

$$\mu \sim N(0, 1{,}000), \qquad \xi \sim U(0, 1{,}000) \qquad (26)$$

$$\alpha_i \sim U(0, 1), \qquad i = 1, \ldots, p \qquad (27)$$

$$\beta_j \sim U(0, 1), \qquad j = 1, \ldots, q \qquad (28)$$

$$\sigma_1^2 \sim IG(1, 0.001) \qquad (29)$$

Also, we assume

$$Y_t \sim N(\mu, \sigma_t^2) \qquad (30)$$


5.3. MCMC Algorithm for SV Model

We use the same MCMC algorithm as described in Boscher, Fronk, and Pigeot (1998). Unlike in GARCH models, the sequence of volatilities is not observable in the SV model; therefore, it has to be estimated along with the other parameters. To estimate the parameters of an SV model, the conditional posteriors have to be constructed. Let us denote $\theta = (\xi, \delta, \sigma_\eta^2)$ and $h_{-t} = (h_1, h_2, \ldots, h_{t-1}, h_{t+1}, \ldots, h_T)$. The likelihood for the t-th observation is

$$f(y_t \mid h_t, \theta) \sim N(0, e^{h_t}) \qquad (31)$$

Using the Markov property, it can be shown that the full conditional posterior distribution of $h_t$ is

$$p(h_t \mid h_{-t}, \theta, y) \propto p(h_t \mid h_{t-1}, h_{t+1}, \theta, y_t) \qquad (32)$$

and

$$p(h_t \mid h_{-t}, \theta, y) \propto p(h_t \mid h_{t-1}, \theta)\, p(h_{t+1} \mid h_t, \theta)\, \ell(y_t \mid h_t, \theta) \propto \exp\left\{ -\frac{1}{2}\left(h_t + y_t^2 \exp(-h_t)\right) \right\} \exp\left\{ -\frac{(h_t - h_t^*)^2}{2 v_t^2} \right\} \qquad (33)$$

where

$$h_t^* = \frac{\xi(1 - \delta) + \delta(h_{t+1} + h_{t-1})}{1 + \delta^2}, \qquad v_t^2 = \frac{\sigma_\eta^2}{1 + \delta^2}, \qquad t = 2, \ldots, T-1.$$

For sampling the full conditionals $p(h_t \mid h_{-t}, \theta, y)$, we apply the Hastings algorithm. A proposal $h_t'$ is drawn using the transition kernel $p(h_t \mid h_{t-1}, h_{t+1}, \theta, y_t)$ and accepted with acceptance probability

$$\min\left\{ 1,\ \frac{f(y_t \mid h_t', \theta)}{f(y_t \mid h_t, \theta)} \right\}.$$

The initial log-volatility value, $h_1$, can be treated either as constant or as stochastic. We assume $h_1$ is randomly distributed according to the stationary volatility distribution

$$h_1 \sim N\left( \frac{\xi}{1 - \delta},\ \frac{\sigma_\eta^2}{1 - \delta^2} \right)$$

Like Kim et al. (1998), we assume an inverse gamma (IG) prior for $\sigma_\eta^2$, which leads to an IG posterior distribution for $\sigma_\eta^2 \mid \xi, \delta, h, y$.


Kim et al. (1998) assert a normal prior for the intercept, $\xi$, in the volatility dynamics equation. The choice of prior distribution for the persistence parameter, $\delta$, is dictated by the goal of imposing stationarity (i.e., restricting $\delta$ to the interval $(-1, 1)$). That prior is based on the beta distribution. To obtain the prior, define $\delta^* \sim \text{Beta}(\nu_1, \nu_2)$ and let $\delta = 2\delta^* - 1$. It can easily be verified that the prior distribution of $\delta$, $p(\delta)$, is

$$p(\delta) = \frac{\Gamma(\nu_1 + \nu_2)}{2\,\Gamma(\nu_1)\,\Gamma(\nu_2)} \left( \frac{1 + \delta}{2} \right)^{\nu_1 - 1} \left( \frac{1 - \delta}{2} \right)^{\nu_2 - 1} \qquad (34)$$

Then the full conditional posterior distribution of $\delta$ is given by

$$p(\delta \mid \xi, \sigma^2_\eta, h) \propto \left( \frac{1 + \delta}{2} \right)^{\nu_1 - 1} \left( \frac{1 - \delta}{2} \right)^{\nu_2 - 1} \exp\left\{ -\frac{1}{2\sigma^2_\eta} \sum_{t=1}^{T-1} (h_{t+1} - \xi - \delta h_t)^2 \right\} \qquad (35)$$

The hyper-parameters are set to $\nu_1 = 20$ and $\nu_2 = 1.5$. We proceed like Kim, Shephard, and Chib (1998) and use the Hastings algorithm to sample from the conditional posterior $\delta \mid \xi, \sigma^2_\eta, h$ with a normal distribution $N(\mu_\delta, v_\delta)$ as proposal density, where

$$\mu_\delta = \sum_{t=1}^{T-1} h_{t+1} h_t \Big/ \sum_{t=1}^{T-1} h_t^2, \qquad v_\delta = \sigma^2_\eta \left( \sum_{t=1}^{T-1} h_t^2 \right)^{-1} \qquad (36)$$

Thus, only the main steps of the MCMC algorithm for the SV model are summarized here:

1. Initialize $h$, $\xi$, $\delta$, and $\sigma^2_\eta$ and set $l = 0$.
2. (a) For $t = 1, 2, \ldots, T$, sample a new $h_t$ from $h_t \mid h_{-t}, y, \xi, \delta, \sigma^2_\eta$.
   (b) Sample a new $\sigma^2_\eta$ from $\sigma^2_\eta \mid h, y, \xi, \delta$.
   (c) Sample a new $\xi$ from $\xi \mid h, \delta, \sigma^2_\eta, y$.
   (d) Sample a new $\delta$ from $\delta \mid h, \xi, \sigma^2_\eta, y$.
3. If $l < L$, set $l$ to $l + 1$ and continue with step 2, where $L$ is the chosen number of iterations.

5.4. Priors for SV Model

We use the following priors for the SV model:

$$h_0 \sim N(\xi, \sigma^2_\eta) \qquad (37)$$

$$\sigma^2_\eta \sim IG(2.5,\, 0.025) \quad (\text{IG distribution}) \qquad (38)$$

$$\xi \sim N(0, 10) \qquad (39)$$

$$\delta = 2\delta^* - 1, \quad -1 < \delta < 1 \qquad (40)$$

$$\delta^* \sim \mathrm{Beta}(20,\, 1.5) \qquad (41)$$

Also, we assume

$$Y_t \sim N(0, e^{h_t}) \qquad (42)$$

5.5. MCMC Algorithm for SV-STAR

In our application $y$, $h$, and $\theta = (\phi, \gamma, c, \sigma^2_\eta)$ are the vector of observations, the vector of log-volatilities, and the vector of identified unknown parameters, respectively. Following Kim et al. (1998), $f(y \mid \theta) = \int f(y \mid h, \theta)\, p(h \mid \theta)\, dh$ is the likelihood function; the calculation of this likelihood function is intractable.

The aim is to sample the augmented posterior density $p(h, \theta \mid y)$ that includes the latent volatilities $h$ as unknown parameters.

To sample the posterior density $p(h, \theta \mid y)$, following Jacquier et al. (1994), the full conditional distribution of each component of $(h, \theta)$ is needed.

Let us assume $p$ and $d$ are known. Applying Lubrano's (2000) formulation, we assume the following priors:

$$\pi(\gamma) = \frac{1}{1 + \gamma^2}, \qquad \gamma > 0$$

where $\pi(\gamma)$ is a truncated Cauchy density.

$$c \sim U[c_1, c_2]$$

where $c$ has a uniform density, $c \in [c_1, c_2]$, $c_1 = F(0.15)$, $c_2 = F(0.85)$, and $F$ is the empirical cumulative distribution function (CDF) of the time series.

$$\pi(\sigma^2) \propto \frac{1}{\sigma^2}$$


With the assumption of independence of $\gamma$, $c$, $\sigma^2_\eta$, and $\phi_{(1)}$, and with an improper prior for $\phi_{(1)}$,

$$\pi(\phi_{(1)}, \gamma, \sigma^2, c) \propto (1 + \gamma^2)^{-1} \sigma^{-2}$$

$$(\phi_{(2)} \mid \sigma^2, \gamma) \sim N(0,\, \sigma^2_\eta e^{\gamma} I_{p+1})$$

Then, the joint prior density is

$$\pi(\theta) \propto \sigma_\eta^{-3} (1 + \gamma^2)^{-1} \exp\left\{ -\tfrac{1}{2}\left( \gamma + \sigma_\eta^{-2} e^{-\gamma} \phi_{(2)}'\phi_{(2)} \right) \right\} \qquad (43)$$

A full Bayesian model consists of the joint prior distribution of all unknown parameters (here $\theta$), the unknown states $h = (h_{-p+1}, \ldots, h_0, h_1, \ldots, h_T)$, and the likelihood. Bayesian inference is then based on the posterior distribution of the unknowns given the data. By successive conditioning, the prior density is

$$p(\theta, h) = p(\theta)\, p(h_0, h_{-1}, \ldots, h_{-p+1} \mid \sigma^2_\eta) \prod_{t=1}^{T} p(h_t \mid h_{t-1}, \ldots, h_{t-p}, \theta) \qquad (44)$$

where we assume

$$(h_0, h_{-1}, \ldots, h_{-p+1} \mid \sigma^2_\eta) \sim N(0, \sigma^2 I_p)$$

and

$$(h_t \mid h_{t-1}, \ldots, h_{t-p}, \theta) \sim N(W_t'\phi,\, \sigma^2_\eta)$$

where $\phi = (\phi_{(1)}, \phi_{(2)})$. The likelihood is

$$f(y_1, \ldots, y_T \mid \theta, h) = \prod_{t=1}^{T} f(y_t \mid h_t) \qquad (45)$$

where

$$f(y_t \mid h_t) \sim N(0, e^{h_t})$$

Thus,

$$f(y_1, \ldots, y_T \mid \theta, h) = \frac{1}{(2\pi)^{T/2}} \exp\left\{ -\frac{1}{2} \sum_{t=1}^{T} \left( e^{-h_t} y_t^2 + h_t \right) \right\} \qquad (46)$$


Using Bayes' theorem, the joint posterior distribution of the unknowns given the data is proportional to the prior times the likelihood, that is,

$$p(\theta, h \mid y_1, \ldots, y_T) \propto (1 + \gamma^2)^{-1} \sigma^{-(T + p + 3)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sigma^2 \gamma + e^{-\gamma} \phi_{(2)}'\phi_{(2)} + \sum_{t=-p+1}^{0} h_t^2 + \sum_{t=1}^{T} \left( (h_t - W_t'\phi)^2 + \sigma^2 (e^{-h_t} y_t^2 + h_t) \right) \right] \right\} \qquad (47)$$

In order to apply MCMC methods, full conditional distributions are needed; the full conditionals are as follows:

$$p(\theta \mid h) \propto \frac{\sigma^{-(T+6)/2}}{1 + \gamma^2} \exp\left\{ -\frac{1}{\sigma^2} \left[ \gamma\sigma^2 + e^{-\gamma}\phi_{(2)}'\phi_{(2)} + \sum_{t=1}^{T} (h_t - W_t'\phi)^2 \right] \right\} \qquad (48)$$

$$h_t \mid h_{-t} \sim N(W_t'\phi, \sigma^2), \qquad h_{-t} = (h_{-p+1}, \ldots, h_0, h_1, \ldots, h_{t-1}, h_{t+1}, \ldots, h_T) \qquad (49)$$

$$(\phi \mid h, \gamma, c, \sigma^2_\eta) \sim N\!\left( \left[ \sum_t W_t W_t' \sigma_\eta^{-2} + M \right]^{-1} \left[ \sum_t W_t h_t \sigma_\eta^{-2} \right],\; \left[ \sum_t W_t W_t' \sigma_\eta^{-2} + M \right]^{-1} \right) \qquad (50)$$

where $M = \mathrm{diag}(0,\, \sigma^2_\eta e^{-\gamma} I_{p+1})$.
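A draw from (50) is a standard Gaussian linear-regression update. The sketch below assumes that $W_t$ stacks the regressors of the log-volatility equation row by row and that $M$ acts as the prior precision block added to $\sum_t W_t W_t'/\sigma^2_\eta$; both readings of the notation are ours, so treat the code as illustrative.

    import numpy as np

    def draw_phi(W, h, sig2_eta, M, rng):
        """Draw phi from a Gaussian full conditional of the form of Eq. (50):
        precision A = sum_t W_t W_t' / sig2_eta + M, mean A^{-1} b,
        with b = sum_t W_t h_t / sig2_eta.  W has one row W_t per observation;
        M is the matching square prior-precision block."""
        A = W.T @ W / sig2_eta + M
        b = W.T @ h / sig2_eta
        mean = np.linalg.solve(A, b)
        # Draw from N(mean, A^{-1}) via the Cholesky factor of the precision.
        L = np.linalg.cholesky(A)
        z = rng.standard_normal(len(b))
        return mean + np.linalg.solve(L.T, z)

    # Illustration with arbitrary regressors
    rng = np.random.default_rng(3)
    T, k = 200, 4
    W = np.column_stack([np.ones(T), rng.standard_normal((T, k - 1))])
    h = W @ np.array([0.2, 0.5, -0.3, 0.1]) + 0.1 * rng.standard_normal(T)
    M = np.diag([0.0] + [1.0] * (k - 1)) + 1e-8 * np.eye(k)  # tiny jitter keeps A invertible
    print(draw_phi(W, h, sig2_eta=0.01, M=M, rng=rng))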

$$(\sigma^2_\eta \mid h, \theta) \sim IG\!\left( \frac{T + p + 1}{2},\; \Big( e^{\gamma}\phi_{(2)}'\phi_{(2)} + \sum_t (h_t - W_t'\phi)^2 \Big) \Big/ 2 \right) \qquad (51)$$

where $IG$ denotes the inverse gamma density function.

$$f(\gamma, c \mid h, \theta) \propto \frac{\sigma^{-(T+6)/2}}{1 + \gamma^2} \exp\left\{ -\frac{1}{2\sigma^2_\eta} \left[ \gamma\sigma^2_\eta + e^{-\gamma}\phi_{(2)}'\phi_{(2)} + \sum_{t=1}^{T} (h_t - W_t'\phi)^2 \right] \right\} \qquad (52)$$

$$f(h_t \mid h_{-t}, \theta, y) \propto f(y_t \mid h_t) \prod_{i=0}^{p} f(h_{t+i} \mid h_{t+i-1}, \ldots, h_{t+i-p}, \theta) = g(h_t \mid h_{-t}, \theta, y) \qquad (53)$$

If $p$ and $d$ are not known, their conditional posterior distributions can be calculated as follows.


Let $\pi(d)$ be the prior probability of $d \in \{1, 2, \ldots, L\}$, where $L$ is a known positive integer. The conditional posterior distribution of $d$ is then

$$p(d \mid h, \theta) \propto f(d \mid h, \theta)\, \pi(d) \propto \frac{\sigma^{-T/2}}{(2\pi)^{T/2}} \exp\left\{ -\frac{1}{\sigma^2} \sum_{t=1}^{T} (h_t - W_t'\phi)^2 \right\} \qquad (54)$$

Let $\pi(p)$ be the prior probability of $p \in \{1, 2, \ldots, N\}$, where $N$ is a known positive integer. Multiplying the prior by the likelihood and integrating out $\theta$, the conditional posterior distribution of $p$ is

$$p(k \mid h, \gamma, c, d, \sigma^2_\eta) \propto (2\pi)^{(k+1)/2} \left[ \sigma^2_\eta e^{\gamma} \right]^{-\frac{k+1}{2}} \left| \sum_{t=1}^{T} W_t W_t' \sigma_\eta^{-2} + M \right|^{1/2} \exp\left\{ -\frac{1}{2}\left( \sigma_\eta^{-2} h'h - \left[ \sum_{t=1}^{T} W_t h_t \sigma_\eta^{-2} \right]' \left[ \sum_{t=1}^{T} W_t W_t' \sigma_\eta^{-2} + M \right]^{-1} \left[ \sum_{t=1}^{T} W_t h_t \sigma_\eta^{-2} \right] \right) \right\} \qquad (55)$$

The sampling strategy when p and d are known is as follows:

1. Initialize the volatilities and the parameter vector at some $h^{(0)}$ and $\theta^{(0)}$, respectively.
2. Simulate the volatility vector $h^{(i)}$ from the full conditional $p(h_t \mid h^{(i)}_{-p+1}, \ldots, h^{(i)}_1, \ldots, h^{(i)}_{t-1}, h^{(i-1)}_{t+1}, \ldots, h^{(i-1)}_T, \theta^{(i-1)}, y)$.
3. Sample $\phi$ from $p(\phi \mid h^{(i+1)}, \gamma^{(i)}, c^{(i)}, \sigma^{2(i)}_\eta)$.
4. Sample $\sigma^2_\eta$ from $p(\sigma^2_\eta \mid h^{(i+1)}, \phi^{(i+1)})$.
5. Sample $\gamma$ and $c$ from $p(\gamma, c \mid h^{(i+1)}, \phi^{(i+1)})$ using the M–H algorithm.
6. If $i \le m$, go to 2, where $m$ is the required number of iterations to generate samples from $p(h, \theta \mid y)$.

If $p$ and $d$ are not known, the following steps can be inserted before the algorithm's final step:

1. Sample $d$ from $p(d \mid h^{(i+1)}, \theta^{(i+1)})$.
2. Sample $k$ from $p(k \mid h^{(i+1)}, \gamma^{(i+1)}, c^{(i+1)}, d^{(i+1)})$ using the M–H algorithm.


5.6. Priors for SV-STAR Models

We use the following priors for the SV-STAR model:

$$\pi(\gamma) = \frac{1}{1 + \gamma^2}, \qquad \gamma > 0$$

where $\pi(\gamma)$ is a truncated Cauchy density.

$$c \sim U[c_1, c_2]$$

where $c$ has a uniform density, $c \in [c_1, c_2]$, $c_1 = F(0.15)$, $c_2 = F(0.85)$, and $F$ is the empirical CDF of the time series.

$$h_0 \sim N(\xi, \sigma^2_\eta) \qquad (56)$$

$$\sigma^2_\eta \sim IG(2.5,\, 0.025) \quad (\text{IG distribution}) \qquad (57)$$

$$\xi_i \sim N(0, 10), \quad i = 1, 2 \qquad (58)$$

$$\delta_{ij} = 2\delta^*_{ij} - 1, \quad -1 < \delta_{ij} < 1, \quad i = 1, 2, \quad j = 1, \ldots, p \qquad (59)$$

$$\delta^*_{ij} \sim \mathrm{Beta}(20,\, 1.5), \quad i = 1, 2, \quad j = 1, \ldots, p \qquad (60)$$

Also, we assume

$$Y_t \sim N(0, e^{h_t}) \qquad (61)$$

5.7. MCMC Algorithm for MSSV Models

An MCMC algorithm for the MSSV model, based on forward filtering–backward sampling and the smoother algorithm of Shephard (1994), is developed by So, Lam, and Li (1998).

For the unknown parameters in the MSSV model, we work with the following prior distributions:

$$\rho_1 \sim N(\rho_{10}, C_{\rho_1}), \qquad \delta \sim TN_{(-1,1)}(\delta_0, C_\delta)$$

$$\sigma^2_\eta \sim IG(\nu_1, \nu_2), \qquad h_1 \sim N(h_{11}, C_{h_1})$$

$$\rho_i \sim TN_{(0,\infty)}(\rho_{i0}, C_{\rho_i}), \quad \text{for } i = 2, \ldots, k$$

$$p_i \sim \mathrm{Dir}(u_{i0}), \quad \text{for } p_i = (p_{i1}, \ldots, p_{ik}), \quad i = 1, \ldots, k.$$

Modeling and Forecasting Volatility in a Bayesian Approach 343

Page 348: [William Greene] Maximum Simulated Likelihood Meth(BookZZ.org)

The hyper-parameters are chosen to represent fairly non-informative priors, so we set $\delta_0 = h_{11} = 0$, $C_\delta = C_{h_1} = 100$, $\nu_1 = 2.001$, $\nu_2 = 1$, $\rho_{i0} = 0$, $C_{\rho_i} = 100$, and $u_{i0} = (0.5, \ldots, 0.5)$ for $i = 1, \ldots, k$.

To estimate the parameters of the MSSV model (11), the MCMC algorithm of So et al. (1998) generates samples from the joint posterior density of $H_T$, $S_T$, and $\theta$, $p(H_T, S_T, \theta \mid D_T)$:

$$p(H_T, S_T, \theta \mid D_T) \propto f(D_T \mid H_T)\, p(H_T \mid S_T, \theta)\, p(S_T \mid \theta)\, \pi(\theta) \qquad (62)$$

where $\theta = (\xi, \delta, \sigma^2_\eta, P)$, $D_t = (y_1, \ldots, y_t)$, $H_t = (h_1, \ldots, h_t)$, and $S_t = (s_1, \ldots, s_t)$. In Eq. (62), $f(D_T \mid H_T)$, $p(H_T \mid S_T, \theta)$, $p(S_T \mid \theta)$, and $\pi(\theta)$ are the likelihood, the conditional posterior density of $H_T$, the posterior distribution of $S_T$, and the prior density of $\theta$, respectively. The terms in Eq. (62) are defined as

$$f(D_T \mid H_T) \propto \prod_{t=1}^{T} e^{-0.5 h_t} \exp\left\{ -\frac{1}{2} y_t^2 e^{-h_t} \right\} \qquad (63)$$

$$p(H_T \mid S_T, \theta) \propto \sigma_\eta^{-T} \sqrt{1 - \delta^2}\, \exp\left\{ -\frac{1}{2\sigma^2_\eta} \left[ \sum_{t=2}^{T} (h_t - \xi_{s_t} - \delta h_{t-1})^2 + (1 - \delta^2)\left( h_1 - \frac{\xi_{s_1}}{1 - \delta} \right)^2 \right] \right\} \qquad (64)$$

$$p(S_T \mid \theta) = \prod_{t=2}^{T} p_{s_{t-1} s_t}\, \pi_{s_1}, \qquad \pi_i = \Pr(s_1 = i), \quad i = 1, \ldots, k \qquad (65)$$

and $\pi(\theta)$ depends on the choice of prior distribution for the unknown parameter $\theta$. We assume that

$$\pi(\theta) = \pi(\sigma^2_\eta)\, \pi(\delta) \prod_{i=1}^{k} \left[ \pi(\rho_i)\, \pi(p_{i1}, \ldots, p_{ik})\, \pi(\pi_i) \right] \qquad (66)$$

In practice, the Gibbs sampling of drawing successively from the full conditional distributions is iterated $L = M + N$ times. The first $M$ burn-in iterations are discarded, and the last $N$ iterates are taken to be an approximate sample from the joint posterior distribution.

To simulate the latent variable $S_T$, consider the decomposition of $p(S_T \mid D_T, H_T)$ as

$$p(S_T \mid D_T, H_T) = p(s_T \mid D_T, H_T) \prod_{t=1}^{T-1} p(s_t \mid D_T, H_T, S_{t+1}) \qquad (67)$$

ESMAIL AMIRI344

Page 349: [William Greene] Maximum Simulated Likelihood Meth(BookZZ.org)

where $S_t = (s_t, \ldots, s_T)$. In Eq. (67), $p(s_t \mid D_T, H_T, S_{t+1})$ is

$$p(s_t \mid D_T, H_T, S_{t+1}) = p(s_t \mid D_T, H_T, s_{t+1}) \propto p(s_{t+1} \mid s_t, D_T, H_T)\, p(s_t \mid D_T, H_T) = p(s_{t+1} \mid s_t)\, p(s_t \mid H_T, D_T) \qquad (68)$$

Using the discrete filter developed by Carter and Kohn (1994) and Chib (1996), the desired samples from $p(S_T \mid D_T, H_T)$ are generated via Eqs. (67) and (68). The discrete filter is defined as

$$p(s_t \mid H_t, s_{t+1}) = \frac{p(s_t \mid H_t)\, p(s_{t+1} \mid s_t)}{p(s_{t+1} \mid H_t)} \qquad (69)$$

$$p(s_t \mid H_{t-1}) = \sum_{i=1}^{k} p(s_t \mid s_{t-1} = i, H_{t-1})\, p(s_{t-1} = i \mid H_{t-1}) = \sum_{i=1}^{k} p(s_t \mid s_{t-1} = i)\, p(s_{t-1} = i \mid H_{t-1}) \qquad (70)$$

$$p(s_t \mid H_t, D_T) = \frac{p(s_t \mid H_{t-1})\, p(h_t \mid H_{t-1}, s_t)}{\sum_{i=1}^{k} p(s_t = i \mid H_{t-1})\, p(h_t \mid H_{t-1}, s_t = i)} \qquad (71)$$
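To make the recursions (69)–(71) concrete, the sketch below runs the discrete forward filter and then samples the regime path backwards. It assumes that the regime-conditional density $p(h_t \mid H_{t-1}, s_t)$ is the Gaussian transition $N(\xi_{s_t} + \delta h_{t-1}, \sigma^2_\eta)$ of the MSSV volatility equation; the function name, argument layout, and toy example are ours.

    import numpy as np

    def ffbs_states(h, xi, delta, sig2_eta, P, pi0, rng):
        """Forward filter / backward sample the Markov regime path s_1..s_T.

        h        : (T,) log-volatilities
        xi       : (k,) regime-specific intercepts xi_j
        delta    : AR coefficient of the volatility equation
        sig2_eta : innovation variance
        P        : (k, k) transition matrix, P[i, j] = Pr(s_t = j | s_{t-1} = i)
        pi0      : (k,) distribution of s_1
        """
        T, k = len(h), len(xi)
        filt = np.empty((T, k))               # p(s_t = j | H_t)
        filt[0] = pi0 / pi0.sum()
        for t in range(1, T):
            pred = filt[t - 1] @ P            # Eq. (70): p(s_t | H_{t-1})
            # Eq. (71): weight by p(h_t | H_{t-1}, s_t) = N(xi_j + delta*h_{t-1}, sig2_eta)
            like = np.exp(-0.5 * (h[t] - xi - delta * h[t - 1]) ** 2 / sig2_eta)
            filt[t] = pred * like
            filt[t] /= filt[t].sum()

        s = np.empty(T, dtype=int)
        s[T - 1] = rng.choice(k, p=filt[T - 1])
        for t in range(T - 2, -1, -1):
            # Eq. (69): p(s_t | H_t, s_{t+1}) proportional to p(s_t | H_t) * P[s_t, s_{t+1}]
            probs = filt[t] * P[:, s[t + 1]]
            s[t] = rng.choice(k, p=probs / probs.sum())
        return s

    # Example with two regimes
    rng = np.random.default_rng(4)
    P = np.array([[0.95, 0.05], [0.10, 0.90]])
    h = np.concatenate([rng.normal(-9, 0.3, 150), rng.normal(-7, 0.3, 150)])
    s = ffbs_states(h, xi=np.array([-0.45, -0.35]), delta=0.95,
                    sig2_eta=0.09, P=P, pi0=np.array([0.5, 0.5]), rng=rng)
    print(np.bincount(s))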

To simulate the log-volatility variable $H_T$, So et al. (1998) formulate the MSSV model in partial non-normal state-space form:

$$\log y_t^2 = h_t + z_t, \qquad h_t = \xi_{s_t} + \delta h_{t-1} + \sigma_\eta \eta_t \qquad (72)$$

where $z_t = \log \varepsilon_t^2$. The trick of the approach is to approximate the $\log \chi^2_1$ distribution of $z_t$ in Eq. (72) by a mixture of normal distributions; that is,

$$p(z_t) \approx \sum_{i=1}^{7} q_i\, p(z_t \mid r_t = i) \qquad (73)$$

where $z_t \mid r_t = i \sim N(m_i - 1.2704,\, \tau_i^2)$ and $q_i = \Pr(r_t = i)$. Given the $r_t$'s as specified under the normal mixture approximation, the MSSV model in Eq. (72) can be written in ordinary linear state-space form; that is,

$$\log y_t^2 = h_t + u_t, \qquad h_t = \xi_{s_t} + \delta h_{t-1} + \sigma_\eta \eta_t \qquad (74)$$


where $u_t \mid r_t \sim N(m_t', \tau_t^2)$, $m_t' = m_{r_t} - 1.2704$, and $\tau_t^2 = \tau_{r_t}^2$. Instead of sampling the $h_t$'s directly from $f(h_1, \ldots, h_T \mid r_1, \ldots, r_T, \theta, S_T, D_T)$, So et al. (1998) sample the $\eta_t$'s from their multivariate posterior distribution $p(\eta_1, \ldots, \eta_T \mid r_1, \ldots, r_T, \theta, S_T, D_T)$.

To simulate the $r_t$'s, note that the full conditional density of $r_t$ is proportional to

$$p(\log y_t^2 \mid r_t, h_t)\, \pi(r_t) \propto \frac{1}{\tau_{r_t}} \exp\left\{ -\frac{1}{2\tau_{r_t}^2} \left( \log y_t^2 - h_t - m_{r_t} + 1.2704 \right)^2 \right\} \pi(r_t) \qquad (75)$$

The $r_t$'s are independent with discrete support; each $r_t$ is drawn separately by comparing a uniform draw on $[0, 1]$ with the cumulative full conditional probabilities.
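Drawing each $r_t$ from (75) therefore amounts to normalizing seven weights and inverting the resulting discrete CDF with a single uniform variate, as sketched below. The mixture constants $(q_i, m_i, \tau^2_i)$ should be the values tabulated in Kim, Shephard, and Chib (1998); they are passed in as arrays here, and the constants used in the illustration are placeholders, not the published ones.

    import numpy as np

    def sample_mixture_indicators(log_y2, h, q, m, tau2, rng):
        """Draw the mixture indicators r_t from their discrete full conditional
        (75): Pr(r_t = i) is proportional to
        q_i * N(log y_t^2 | h_t + m_i - 1.2704, tau2_i)."""
        resid = log_y2[:, None] - h[:, None] - m[None, :] + 1.2704      # (T, 7)
        logw = (np.log(q)[None, :] - 0.5 * np.log(tau2)[None, :]
                - 0.5 * resid ** 2 / tau2[None, :])
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        # Inverse-CDF draw: compare one uniform per t with the cumulative weights.
        u = rng.uniform(size=len(h))
        return (w.cumsum(axis=1) < u[:, None]).sum(axis=1)

    # Illustration with placeholder (not the published) mixture constants
    rng = np.random.default_rng(5)
    q = np.full(7, 1 / 7); m = np.linspace(-8, 3, 7); tau2 = np.ones(7)
    y = 0.01 * rng.standard_normal(200)
    h = np.full(200, -9.0)
    r = sample_mixture_indicators(np.log(y ** 2), h, q, m, tau2, rng)
    print(np.bincount(r, minlength=7))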

The conditional posterior distribution of $\sigma^2_\eta$ is $IG\big((T/2) + \nu_1,\, 1/w\big)$, where

$$w = \frac{1}{2}\left[ \sum_{t=2}^{T} (h_t - \xi_{s_t} - \delta h_{t-1})^2 + (1 - \delta^2)\left( h_1 - \frac{\xi_{s_1}}{1 - \delta} \right)^2 \right] + \frac{1}{\nu_2},$$

with $\nu_1 = 2 + 10^{-2}$ and $\nu_2 = 100$. Let us define the time series $z_{it}$, $i = 1, \ldots, k$, by

$$z_{i1} = \frac{\sqrt{1 - \delta^2}}{1 - \delta} \left[ (1 - \delta) h_1 - \xi_{s_1} + \rho_i \right] I_{i1}$$

and

$$z_{it} = (h_t - \xi_{s_t} + \rho_i - \delta h_{t-1})\, I_{it}, \qquad t = 2, \ldots, T$$

Then the posterior density of $\rho_i$ is $N(\tilde{\rho}_i, \tilde{v}_i)$, where

$$\tilde{v}_i = \frac{v_i \sigma^2_\eta (1 - \delta)}{v_i (1 + \delta) + (1 - \delta)\left[ \sigma^2_\eta + (T - 1) v_i \right]}$$

$$\tilde{\rho}_i = \frac{\tilde{v}_i}{\sigma^2_\eta} \left( \frac{\sqrt{1 - \delta^2}}{1 - \delta} z_{i1} + \sum_{t=2}^{T} z_{it} + \frac{\bar{\rho}\, \sigma^2_\eta}{v_i} \right)$$

and

$$I_{11} = 1, \qquad I_{it} = \begin{cases} 1 & \text{if } \rho_i > 0 \\ 0 & \text{otherwise} \end{cases}, \quad i = 2, \ldots, k \qquad (v_i = 10^{12})$$


The conditional posterior distribution of $\delta$ is

$$p(\delta \mid H_T, S_T, \rho_1, \ldots, \rho_k) \propto Q(\delta)\, \exp\left\{ -\frac{a}{2\sigma^2_\eta} \left( \delta - \frac{b}{a} \right)^2 \right\} I_\delta$$

where

$$Q(\delta) = \sqrt{1 - \delta^2}\, \exp\left\{ -\frac{1}{2\sigma^2_\eta} (1 - \delta^2)\left( h_1 - \frac{\xi_{s_1}}{1 - \delta} \right)^2 \right\}$$

$$a = \sum_{t=2}^{T} h_{t-1}^2 + \frac{\sigma^2_\eta}{\sigma^2_\delta}, \qquad b = \sum_{t=2}^{T} h_{t-1}(h_t - \xi_{s_t}) + \frac{\bar{\delta}\, \sigma^2_\eta}{\sigma^2_\delta}$$

and $I_\delta$ is an indicator function. As $p(\delta \mid H_T, S_T, \rho_1, \ldots, \rho_k)$ does not have a closed form, we sample from it using the M–H algorithm with the truncated $N_{[-1,1]}(b/a,\, \sigma^2_\eta/a)$ as proposal density (with $\delta \sim N(\bar{\delta}, \sigma^2_\delta)$, $\sigma_\delta = 10^6$, $\bar{\delta} = 0$, as prior). With the use of a noninformative prior, the full conditional distribution of $(p_{i1}, p_{i2}, \ldots, p_{ik})$ is the Dirichlet distribution $\mathrm{Dir}(d_{i1}, d_{i2}, \ldots, d_{ik})$, where

$$d_{ij} = \sum_{t=2}^{T} I(s_{t-1} = i)\, I(s_t = j) + 0.5.$$

Using a noninformative prior, the full conditional distribution of $(\pi_1, \ldots, \pi_k)$ is the Dirichlet distribution $\mathrm{Dir}\big(0.5 + I(s_1 = 1),\, 0.5 + I(s_1 = 2),\, \ldots,\, 0.5 + I(s_1 = k)\big)$.

To allow the components of $\theta$ to vary over the whole real line, we take the logarithm of $\sigma^2_\eta$, of $\rho_i$, $i = 2, \ldots, k$, and of $p_{ij}/(1 - p_{ij})$.

The main steps of the MCMC algorithm for the MSSV model are summarized as follows:

1. Set $i = 1$.
2. Get starting values for the parameters $\theta^{(i)}$ and the states $S_T^{(i)}$ and $H_T^{(i)}$.
3. Draw $\theta^{(i+1)} \sim p(\theta \mid H_T^{(i)}, S_T^{(i)}, D_T)$.
4. Draw $S_T^{(i+1)} \sim p(S_T \mid \theta^{(i+1)}, H_T^{(i)}, D_T)$.
5. Draw $H_T^{(i+1)} \sim p(H_T \mid \theta^{(i+1)}, S_T^{(i+1)}, D_T)$.
6. Set $i = i + 1$.
7. If $i < L$, go to 3, where $L$ is the required number of iterations to generate samples from $p(H_T, S_T, \theta \mid D_T)$.


6. METHOD OF VOLATILITY FORECASTING

With MCMC we sample from the posterior distribution of the parameters and estimate them. Based on the draws obtained from a time series of length $T$ in the $l$th iteration, the future volatility at time $T + s$ is generated for $s = 1, \ldots, K$. Thus, for models (2)–(4), (9), and (11), the $s$-day-ahead forecasts are, respectively:

1. Model (2):

$$\sigma^{2(l)}_{T+s} = \alpha_0^{(l)} + \sum_{i=1}^{q} \alpha_i^{(l)}\, y^2_{T+s-i}$$

where $\alpha_0^{(l)}$, $\alpha_i^{(l)}$, and $\sigma^{2(l)}_{T+s}$ denote the sample of the respective parameter during the $l$th iteration.

2. Model (3):

$$\sigma^{2(l)}_{T+s} = \alpha_0^{(l)} + \sum_{i=1}^{p} \alpha_i^{(l)}\, y^2_{T+s-i} + \sum_{j=1}^{q} \beta_j^{(l)}\, \sigma^{2(l)}_{T+s-j}$$

where $\alpha_0^{(l)}$, $\alpha_i^{(l)}$, $\beta_j^{(l)}$, $\sigma^{2(l)}_{T+s-j}$, and $\sigma^{2(l)}_{T+s}$ denote the sample of the respective parameter during the $l$th iteration.

3. Model (4):

$$h^{(l)}_{T+s} = \xi^{(l)} + \delta^{(l)} h^{(l)}_{T+s-1}$$

where $\xi^{(l)}$, $\delta^{(l)}$, $h^{(l)}_{T+s-1}$, and $h^{(l)}_{T+s}$ denote the sample of the respective parameter during the $l$th iteration.

4. Model (9):

$$h^{(l)}_{T+s} = X_{T+s}\,\phi_{(1)}^{(l)} + X_{T+s}\,\phi_{(2)}^{(l)}\, F(\gamma, c; h_{T+s-d})$$

where $\phi_{(1)}^{(l)}$, $\phi_{(2)}^{(l)}$, and $h^{(l)}_{T+s}$ denote the sample of the respective parameter during the $l$th iteration.

5. Model (11):

$$h^{(l)}_{T+s} = \xi^{(l)}_{s_t} + \delta^{(l)} h^{(l)}_{T+s-1}$$

where $\xi^{(l)}_{s_t}$, $\delta^{(l)}$, $h^{(l)}_{T+s-1}$, and $h^{(l)}_{T+s}$ denote the sample of the respective parameter during the $l$th iteration.

The volatility $\sigma^2_{T+s}$ is then estimated by

$$\sigma^2_{T+s} = \frac{1}{L} \sum_{l=1}^{L} \sigma^{2(l)}_{T+s}$$


After obtaining the daily volatility forecasts across all trading days in eachmonth, monthly volatility forecasts can be calculated using the expression

$$\sigma^2_{T+1} = \sum_{t=1}^{N_{T+1}} \sigma^2_t, \qquad T = 96, \ldots, 119$$
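For the SV-AR case (model (4)), for instance, the per-draw recursion, the conversion of the forecast log-volatility to a variance via $\exp(h)$, and the averaging over retained MCMC draws can be coded directly; the monthly figure is then the sum of the daily forecasts within the month, as in the display above. The variable names and the toy posterior draws below are ours.

    import numpy as np

    def sv_ar_forecast(xi_draws, delta_draws, h_T_draws, K):
        """s-step-ahead volatility forecasts from an SV-AR(1) model.

        For each retained MCMC draw l, iterate h_{T+s} = xi + delta * h_{T+s-1}
        for s = 1..K, map to a variance via exp(h), and average over draws."""
        L = len(xi_draws)
        sig2 = np.empty((L, K))
        for l in range(L):
            h = h_T_draws[l]
            for s in range(K):
                h = xi_draws[l] + delta_draws[l] * h
                sig2[l, s] = np.exp(h)
        return sig2.mean(axis=0)          # posterior-mean daily forecasts

    def monthly_volatility(daily_sig2, trading_days):
        """Sum the daily variance forecasts over the trading days of one month."""
        return np.sum(daily_sig2[:trading_days])

    # Toy illustration with fabricated posterior draws
    rng = np.random.default_rng(6)
    xi = rng.normal(-0.4, 0.02, 1000)
    delta = rng.normal(0.95, 0.005, 1000)
    hT = rng.normal(-8.5, 0.2, 1000)
    daily = sv_ar_forecast(xi, delta, hT, K=22)
    print(monthly_volatility(daily, trading_days=22))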

7. FORECAST EVALUATION MEASURES

The performance of the models is evaluated using in-sample and out-of-sample measures.

7.1. In-Sample Measures

In model choice procedures, approaches similar in some ways to classical model validation procedures are often required because the canonical Bayesian model choice methods (via Bayes factors) are infeasible or difficult to apply in complex models or large samples (Gelfand & Ghosh, 1998; Carlin & Louis, 2000). The Bayes factor may be sensitive to the information contained in diffuse priors, and is not defined for improper priors. Even under proper priors, with sufficiently large sample sizes the Bayes factor tends to attach too little weight to the correct model and too much to a less complex or null model. Hence, some advocate a less formal view of Bayesian model selection based on predictive criteria other than the Bayes factor.

Following the original suggestion of Dempster (1974), Spiegelhalter et al. (2002) developed a model selection criterion in the Bayesian framework. This criterion, the deviance information criterion (DIC), is a generalization of the well-known Akaike information criterion (AIC). It is preferred to the Bayesian information criterion (BIC) and AIC because, unlike them, DIC uses the effective number of parameters of the model and is applicable to complex hierarchical random-effects models. DIC is defined from the posterior distribution of the classical deviance $D(\Theta)$, as follows:

$$D(\Theta) = -2 \log f(y \mid \Theta) + 2 \log f(y) \qquad (76)$$

where $y$ and $\Theta$ are the vectors of observations and parameters, respectively.

$$\mathrm{DIC} = \bar{D} + p_D \qquad (77)$$


Here $\bar{D} = E_{\Theta \mid y}[D]$ and $p_D = E_{\Theta \mid y}[D] - D(E_{\Theta \mid y}[\Theta]) = \bar{D} - D(\bar{\Theta})$. DIC can also be presented as

$$\mathrm{DIC} = \hat{D} + 2 p_D, \qquad \hat{D} = D(\bar{\Theta}) \qquad (78)$$
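In practice DIC is computed from the retained draws: evaluate the deviance at each draw, average, and compare with the deviance at the posterior mean. The sketch below does this for the SV measurement density (46), conditioning on the sampled log-volatilities, which is one common (conditional-DIC) choice rather than necessarily the one used in the chapter; all names are ours.

    import numpy as np

    def deviance(y, h):
        """-2 log f(y | h) for the SV measurement density y_t ~ N(0, exp(h_t))
        (the standardizing 2 log f(y) term of Eq. (76) is dropped, as it cancels
        when models are compared on the same data)."""
        return np.sum(np.log(2 * np.pi) + h + np.exp(-h) * y ** 2)

    def dic(y, h_draws):
        """Conditional DIC from MCMC output: Dbar + pD = Dhat + 2*pD,
        where Dhat is the deviance at the posterior mean of h."""
        devs = np.array([deviance(y, h) for h in h_draws])
        d_bar = devs.mean()
        d_hat = deviance(y, h_draws.mean(axis=0))
        p_d = d_bar - d_hat
        return d_bar + p_d, p_d

    # Toy example with fabricated draws
    rng = np.random.default_rng(7)
    T, L = 300, 500
    h_true = -9 + 0.3 * rng.standard_normal(T)
    y = np.exp(h_true / 2) * rng.standard_normal(T)
    h_draws = h_true + 0.1 * rng.standard_normal((L, T))
    print(dic(y, h_draws))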

The predictive loss criterion (PLC) was introduced by Gelfand and Ghosh (1998). PLC is obtained by minimizing posterior loss for a given model, and it is a penalized deviance criterion: it comprises a piece that is a Bayesian deviance measure and a piece that is interpreted as a penalty for model complexity. The penalty arises without specifying the model dimension or appealing to an asymptotic justification.

PLC can be presented as

$$\mathrm{PLC} = P_m + G_m \qquad (79)$$

where $P_m$ is a penalty term and $G_m$ is a goodness-of-fit measure. $P_m$ and $G_m$ are defined as

$$P_m = \sum_{i=1}^{T} \mathrm{Var}(Z_i), \qquad G_m = \frac{w}{w + 1} \sum_{i=1}^{T} \left\{ E(Z_i) - y_i \right\}^2, \qquad w > 0 \qquad (80)$$

$Z_i$ is the replicate of observation $y_i$. Typical values of $w$ at which to compare models might be $w = 1$, $w = 10$, and $w = 100{,}000$; larger values of $w$ down-weight the precision of predictions. For under-fitted models, predictive variances will tend to be large and thus so will $P_m$; but inflated predictive variances are also expected for over-fitted models, again making $P_m$ large. Hence, models that are too simple will do poorly on both $G_m$ and $P_m$. As models become increasingly complex, a trade-off is observed: $G_m$ will decrease but $P_m$ will begin to increase. Eventually, complexity is penalized and a parsimonious choice is encouraged.
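Given posterior-predictive replicates $Z_i$, $P_m$ and $G_m$ are simple moments of those replicates; a short sketch (with $w$ passed as an argument, and names of our own choosing) follows.

    import numpy as np

    def plc(y, z_rep, w=100_000.0):
        """Predictive loss criterion PLC = P_m + G_m (Eqs. 79-80).

        y     : (T,) observed series
        z_rep : (L, T) posterior-predictive replicates, one row per MCMC draw
        """
        p_m = np.sum(np.var(z_rep, axis=0))                            # penalty term
        g_m = (w / (w + 1.0)) * np.sum((z_rep.mean(axis=0) - y) ** 2)  # fit term
        return p_m + g_m, p_m, g_m

    # Toy example with fabricated replicates
    rng = np.random.default_rng(8)
    y = 0.01 * rng.standard_normal(250)
    z_rep = y + 0.01 * rng.standard_normal((400, 250))
    print(plc(y, z_rep))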

DIC and PLC are admittedly not formal Bayesian choice criteria, but they are relatively easy to apply over a wide range of models, including nonconjugate and heavily parameterized models, the latter being our case (Congdon, 2003). In comparison, PLC is a predictive check measure based on replicate sampling, while DIC is a likelihood-based measure.

7.2. Out-of-Sample Measures

Two measures are used to evaluate forecast accuracy: the RMSE and the LINEX loss function, the latter advocated in Yu (2002).


They are defined by

$$\mathrm{RMSE} = \sqrt{ \frac{1}{I} \sum_{i=1}^{I} \left( \hat{\sigma}^2_i - \sigma^2_i \right)^2 }$$

$$\mathrm{LINEX} = \frac{1}{I} \sum_{i=1}^{I} \left[ \exp\!\left( -a (\hat{\sigma}^2_i - \sigma^2_i) \right) + a (\hat{\sigma}^2_i - \sigma^2_i) - 1 \right]$$

where $I$ is the number of forecasted months, $a$ is a given parameter, and $\hat{\sigma}^2_i$ and $\sigma^2_i$ denote the forecast and the realized volatility, respectively. Despite its mathematical simplicity, the RMSE is symmetric, a property that is not very realistic under some circumstances (see Brailsford & Faff, 1996). It is well known that the use of symmetric loss functions may be inappropriate in many circumstances, particularly when positive and negative errors have different consequences. LINEX loss functions, introduced by Varian (1975), are asymmetric. The properties of the LINEX loss function are discussed extensively by Zellner (1986).

In the LINEX loss function, the magnitude of $a$ reflects the degree of asymmetry, and positive errors are weighted differently from negative errors. The sign of the shape parameter $a$ reflects the direction of asymmetry: we set $a > 0$ ($a < 0$) if overestimation is more (less) serious than underestimation. In this chapter we follow Yu (2002) and choose $a = -20, -10, 10, 20$. Obviously, $a = -10, -20$ penalize overpredictions more heavily, while $a = 10, 20$ penalize underpredictions more heavily.
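Both measures take only a few lines once the forecast errors are defined; the sketch below takes the error as forecast minus realized volatility, which is our reading of the displays above, and uses names of our own choosing.

    import numpy as np

    def rmse(sig2_hat, sig2):
        """Root mean squared error of the volatility forecasts."""
        e = np.asarray(sig2_hat) - np.asarray(sig2)
        return np.sqrt(np.mean(e ** 2))

    def linex(sig2_hat, sig2, a):
        """LINEX loss with shape parameter a: exp(-a*e) + a*e - 1 averaged over
        the forecast sample, with e the forecast error."""
        e = np.asarray(sig2_hat) - np.asarray(sig2)
        return np.mean(np.exp(-a * e) + a * e - 1.0)

    # Example: evaluate one set of monthly forecasts under all four a values
    rng = np.random.default_rng(9)
    realized = np.abs(rng.normal(0.005, 0.001, 24))
    forecast = realized + rng.normal(0, 0.0005, 24)
    print(rmse(forecast, realized))
    print({a: linex(forecast, realized, a) for a in (-20, -10, 10, 20)})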

8. EMPIRICAL RESULTS

In the following examples the smooth transition function is the logistic function, but the exponential function can easily be substituted. For convergence control, as a rule of thumb, the Monte Carlo error (MC error) of each parameter of interest should be less than 5% of the posterior sample standard deviation.
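This 5% rule can be checked mechanically from the retained draws. The sketch below estimates the MC error by the usual batch-means method, which is our implementation choice since the chapter does not spell one out, and compares it with 5% of the posterior sample standard deviation.

    import numpy as np

    def mc_error_batch_means(draws, n_batches=30):
        """Batch-means estimate of the Monte Carlo standard error of the
        posterior mean of a scalar parameter."""
        draws = np.asarray(draws, dtype=float)
        n = (len(draws) // n_batches) * n_batches
        batches = draws[:n].reshape(n_batches, -1).mean(axis=1)
        return batches.std(ddof=1) / np.sqrt(n_batches)

    def passes_5pct_rule(draws):
        """True if the MC error is below 5% of the posterior sample std dev."""
        return mc_error_batch_means(draws) < 0.05 * np.std(draws, ddof=1)

    # Example on an autocorrelated chain mimicking MCMC output
    rng = np.random.default_rng(10)
    x = np.empty(20_000)
    x[0] = 0.0
    for t in range(1, len(x)):                 # AR(1) chain
        x[t] = 0.9 * x[t - 1] + rng.standard_normal()
    print(mc_error_batch_means(x), passes_5pct_rule(x))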

8.1. In-Sample Fit

Table 2 presents the values of the model fit criteria DIC (Spiegelhalter et al., 2002) and PLC (Gelfand & Ghosh, 1998) for 14 competing models (we set $w$ to 100,000). The DIC criterion shows that the SV models have the best fit.


Moreover, within the SV family, SV-LSTAR(1), d=2 has the lowest DIC value and hence the best in-sample fit. The superior in-sample fit of the SV family is also supported by the PLC values, although ARCH(1) has the lowest PLC value. We choose the 10 models with the lowest DIC and PLC values from Table 2 for our out-of-sample study.

8.2. Out-of-Sample Fit

The out-of-sample results are presented in Tables 3 and 4. Table 3 reports the values and rankings of all 10 competing models under the RMSE, while Table 4 presents the values under the four LINEX loss functions.

The RMSE statistic favors the SV family, with the SV-LSTAR(1), d=2 model ranked best. Table 3 shows the poor performance of the ARCH and GARCH models. In Table 4, the same models are evaluated under asymmetric loss functions, where four LINEX loss functions are used ($a = 20, 10, -10$, and $-20$). LINEX with $a = 10$ and $a = 20$ identifies the SV-LSTAR(1), d=2 model as the best performer, while the ARCH and GARCH models provide the worst forecasts. LINEX with $a = -10$ and $a = -20$ also ranks the SV-LSTAR(1), d=2 model first.

Table 2. DIC and PLC Criterion for Volatility Models.

Model                DIC     PLC
ARCH(1)              1808    15.68
ARCH(2)              1807    15.99
ARCH(3)              1809    15.78
GARCH(1,1)           1805    15.7
GARCH(1,2)           1810    15.82
GARCH(2,2)           1810    15.87
GARCH(2,3)           1812    16.1
GARCH(3,3)           1815    16.2
SV-AR(1)             1805    15.75
SV-LSTAR(1), d=1     1798    15.71
SV-LSTAR(1), d=2     1795    15.7
SV-LSTAR(2), d=1     1805    15.73
SV-LSTAR(2), d=2     1817    15.75
SV-MSAR(1), k=2      1803    15.78


9. CONCLUSION

In a Bayesian framework using MCMC methods, we fitted five types of volatility models, namely ARCH, GARCH, SV-AR, SV-STAR, and SV-MSAR, to the TSE data set. Applying the RMSE and LINEX criteria, the results show that the SV models perform better than the ARCH and GARCH models. Also, among the SV models, SV-STAR(1) with d = 2 shows the smallest RMSE and LINEX values.

Table 3. RMSE Criterion for Selected Volatility Models.

Model                RMSE        Rank
ARCH(1)              0.0060588   7
ARCH(2)              0.0060651   9
GARCH(1,1)           0.0060598   8
GARCH(1,2)           0.0061005   10
GARCH(2,2)           0.0061107   11
SV-AR(1)             0.0059815   6
SV-LSTAR(1), d=1     0.0051432   4
SV-LSTAR(1), d=2     0.0045760   1
SV-LSTAR(2), d=1     0.0048911   2
SV-LSTAR(2), d=2     0.0049951   3
SV-MSAR(1), k=2      0.0059415   5

Table 4. LINEX Criterion for Selected Volatility Models.

Model                a = -20    a = -10     a = 10      a = 20
ARCH(1)              4.87254    0.914421    1.425419    5.205047
ARCH(2)              4.99245    0.924153    1.332167    5.246215
GARCH(1,1)           4.66458    0.899152    1.328796    5.354812
GARCH(1,2)           4.66381    0.899123    1.352265    5.491561
GARCH(2,2)           5.21214    0.898822    1.256482    5.482564
SV-AR(1)             4.25454    0.885123    1.225894    5.145892
SV-LSTAR(1), d=1     4.21254    0.884521    1.215268    5.145467
SV-LSTAR(1), d=2     4.12112    0.884150    1.099282    5.002143
SV-LSTAR(2), d=1     4.35412    0.889893    1.154621    5.110056
SV-LSTAR(2), d=2     4.23415    0.885521    1.112451    5.125489
SV-MSAR(1), k=2      4.21211    0.88504     1.225774    5.145788


NOTE

1. There are easier methods, such as maximum likelihood estimation (MLE), that can be used to estimate the parameters of ARCH and GARCH models; see Hamilton (1994) for more details.

ACKNOWLEDGMENTS

The author would like to thank the editors and anonymous referees for helpful comments and suggestions.

REFERENCES

Azizi, F. (2004). Estimating the relation between the rate of inflation and returns on stocks in Tehran Stock Exchange (In Farsi, with English summary). Quarterly Journal of the Economic Research, 11–12, 17–30.

Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–659.

Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.

Boscher, H., Fronk, E.-M., & Pigeot, I. (1998). Estimation of the stochastic volatility by Markov chain Monte Carlo. In: R. Galata & H. Küchenhoff (Eds), Festschrift zum 65. Geburtstag von Prof. Dr. Hans Schneeweiß: Econometrics in theory and practice (pp. 189–203). Heidelberg: Physica-Verlag.

Brailsford, T. J., & Faff, R. W. (1996). An evaluation of volatility forecasting techniques. Journal of Banking and Finance, 20, 419–438.

Brooks, C. (1998). Predicting stock market volatility: Can market volume help? Journal of Forecasting, 17(1), 59–80.

Carlin, B., & Louis, T. (2000). Bayes and empirical Bayes methods for data analysis (2nd ed.). Texts in Statistical Sciences. Boca Raton, FL: Chapman and Hall/CRC.

Carter, C. K., & Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika, 81, 541–553.

Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics, 75, 79–97.

Chib, S., & Greenberg, E. (1995). Understanding the Metropolis–Hastings algorithm. The American Statistician, 49, 327–335.

Chib, S., & Greenberg, E. (1996). Markov chain Monte Carlo simulation methods in econometrics. Econometric Theory, 12, 409–431.

Congdon, P. (2003). Applied Bayesian modelling. Chichester, UK: Wiley.

Dempster, A. P. (1974). The direct use of likelihood for significance testing. In: Proceedings of the Conference on Fundamental Questions in Statistical Inference (pp. 335–352). Aarhus: Department of Theoretical Statistics, University of Aarhus.

Doornik, J. A. (1996). Ox: Object oriented matrix programming, 1.10. London: Chapman & Hall.

Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica, 50, 987–1008.

Gelfand, A., & Ghosh, S. (1998). Model choice: A minimum posterior predictive loss approach. Biometrika, 85(1), 1–11.

Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–741.

Gerlach, R., & Tuyl, F. (2006). MCMC methods for comparing stochastic volatility and GARCH models. International Journal of Forecasting, 22, 91–107.

Gilks, W., & Wild, P. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics, 41(2), 337–348.

Hamilton, J. D., & Susmel, R. (1994). Autoregressive conditional heteroskedasticity and changes in regime. Journal of Econometrics, 64, 307–333.

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97–109.

Jacquier, E., Polson, N. G., & Rossi, P. (1994). Bayesian analysis of stochastic volatility models. Journal of Business & Economic Statistics, 12, 371–417.

Kim, S., & Shephard, N. (1994). Stochastic volatility: Likelihood inference and comparison with ARCH models. Economics Papers 3. Economics Group, Nuffield College, University of Oxford.

Kim, S., Shephard, N., & Chib, S. (1998). Stochastic volatility: Likelihood inference and comparison with ARCH models. Review of Economic Studies, 65, 361–393.

Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54, 159–178.

Merton, R. (1980). On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics, 8, 323–361.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6), 1087–1092.

Najarzadeh, R., & Zivdari, M. (2006). An empirical investigation of trading volume and return of the Teheran Stock Exchange. Quarterly Journal of the Economic Research, 2, 59–80.

Nakatsuma, T. (1998). A Markov-chain sampling algorithm for GARCH models. Studies in Nonlinear Dynamics and Econometrics, 3(2), 107–117.

Perry, P. (1982). The time-variance relationship of security returns: Implications for the return-generating stochastic process. Journal of Finance, 37, 857–870.

Rachev, S., Hsu, J., Bagasheva, B., & Fabozzi, F. (2008). Bayesian methods in finance. Hoboken, NJ: Wiley.

Shephard, N. (1994). Partial non-Gaussian state space. Biometrika, 81, 115–131.

Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. In: D. R. Cox, O. E. Barndorff-Nielsen & D. V. Hinkley (Eds), Time series models in econometrics, finance and other fields (pp. 1–67). London: Chapman and Hall.

So, M. K. P., Lam, K., & Li, W. K. (1998). A stochastic volatility model with Markov switching. Journal of Business and Economic Statistics, 16, 244–253.

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583–639.

Tanner, M., & Wong, W. (1987). The calculation of posterior distribution by data augmentation. Journal of the American Statistical Association, 81, 82–86.

Taylor, S. J. (1986). Modeling financial time series. Chichester, UK: Wiley.

Varian, H. R. (1975). A Bayesian approach to real estate assessment. In: S. E. Fienberg & A. Zellner (Eds), Studies in Bayesian econometrics and statistics in honor of Leonard J. Savage (pp. 195–208). Amsterdam: North-Holland.

Yu, J. (2002). Forecasting volatility in the New Zealand stock market.

Zellner, A. (1986). Bayesian estimation and prediction using asymmetric loss functions. Journal of the American Statistical Association, 81, 446–451.

Zivot, E., & Wang, J. (2006). Modeling financial time series with S-Plus. Berlin Heidelberg: Springer-Verlag.