E62: Stochastic Frontier Models and Efficiency Analysis

E62.1 Introduction

Chapters E62–E65 present LIMDEP’s programs for two types of efficiency analysis, stochastic frontier analysis (SFA) and data envelopment analysis (DEA). To a large extent, these are competing methodologies. No formulation has yet been devised that unifies the two in a single analytical framework. Arguably, the former is a fully parameterized model whereas the latter is ‘nonparametric,’ albeit also atheoretical in nature.

The stochastic frontier model is used in a large literature of studies of production, cost, revenue, profit and other models of goal attainment. The model as it appears in the current literature was originally developed by Aigner, Lovell, and Schmidt (ALS, 1977). The canonical formulation that serves as the foundation for other variations is ALS’s model,

y = β′x + v - u,

where y is the observed outcome (goal attainment), β′x + v is the optimal, frontier goal (e.g., maximal production output or minimum cost) pursued by the individual, β′x is the deterministic part of the frontier and $v \sim N[0,\sigma_v^2]$ is the stochastic part. The two parts together constitute the ‘stochastic frontier.’ The amount by which the observed individual fails to reach the optimum (the frontier) is u, where

$u = |U|$ and $U \sim N[0,\sigma_u^2]$

(change to v + u for a stochastic cost frontier or any setting in which the optimum is a minimum). In this context, u is the ‘inefficiency.’ This is the normal-half normal model, which is the basic form of the stochastic frontier model.

Many varieties of the stochastic frontier model have appeared in the literature. A major survey that presents an extensive catalog of these formulations is Kumbhakar and Lovell (2000). (See, as well, Bauer (1990), Greene (2008) and several other surveys, many of which are cited in Kumbhakar and Lovell and in Greene.) The estimator in LIMDEP computes parameter estimates for most single equation cross section and panel data variants of the stochastic frontier model. Most of the variants proposed in the received literature differ in their assumptions about the distribution of the ‘inefficiency’ term, u. Most of these are available in LIMDEP, as suggested in the list below. The bulk of the received technology centers on cross section style modeling. However, recent advances include many extensions that take advantage of the features of panel data. LIMDEP also supports a large array of panel data estimators.


The conventional approach to deterministic frontier estimation is currently data envelopment analysis. This is usually handled with linear programming techniques. The analysis assumes that there is a frontier technology (in the same spirit as the stochastic frontier production model) that can be described by a piecewise linear hull that envelops the observed outcomes. Some (efficient) observations will be on the frontier while other (inefficient) individuals will be inside. The technique produces a deterministic frontier that is generated by the observed data, so by construction, some individuals are ‘efficient.’ This is one of the fundamental differences between DEA and SFA. Data envelopment analysis is documented in Chapter E65.

The analysis of production, cost, etc. in the stochastic frontier framework involves two steps. In the first, the frontier model is estimated, usually by maximum likelihood. In the second, the estimated model is used to construct measures of inefficiency or efficiency. Individual specific estimates are computed that provide the basis of comparison of firms either to absolute standards or to each other. The sections of this chapter develop several model forms used in the first step. Efficiency estimation, the second step, appears formally in Section E62.8. The general methodology is then applied to the specifications developed before that section and to several more proposed in the sections that follow it, as well as in Chapters E63 and E64.

E62.2 Stochastic Frontier Model Specifications

The stochastic frontier model is

y = β′x + v - u, u = |U|.

In this area of study, unlike most others, estimation of the model parameters is usually not the primary objective. Estimation and analysis of the inefficiency of individuals in the sample and of the aggregated sample are usually of greater interest. This part of the development will present tools for estimation of inefficiency.

Typically, the production or cost model is based on a Cobb-Douglas, translog, or other form of logarithmic model, so that the essential form is

log y = β′x + v - u

where the components of x are generally logs of inputs for a production model or logs of output and input prices for a cost model, or their squares and/or cross products. In this form, then, at least for relatively small variation, u represents the proportion by which y falls short of the goal, and has a natural interpretation as proportional or percentage inefficiency. The numerous examples below will demonstrate. Users are also referred to the various survey sources listed earlier.

The results one obtains are, of course, critically dependent on the model assumed. Thus, specification and estimation of model parameters, while perhaps of secondary interest, are nonetheless a major first step in the model building process. In nearly all received formulations, the random component, v, is assumed to be normally distributed with zero mean. In some models, v may be heteroscedastic. But, in either form, the large majority of the different frontier models that have been proposed result from variations on the distribution of the inefficiency term, u. The range of specifications examined in this chapter includes the following:

• Distributional assumptions: half normal, exponential, gamma
• Partially nonparametric frontier function
• Sample selection model


The following extensions are presented in Chapter E63:

• Truncated normal with nonzero, heterogeneous mean in the underlying U
• Heteroscedasticity in v and/or u
• Heterogeneity in the parameter of the exponential or gamma distribution
• Amsler et al.’s ‘scaling model’
• Alvarez et al.’s model of fixed, latent management

A number of treatments for panel data are presented in Chapter E64.

E62.3 Basic Commands for Stochastic Frontier Models

The command for all specifications of the stochastic frontier model is

FRONTIER ; Lhs = y
         ; Rhs = one, ...
         ; … other specifications $

NOTE: One must be the first variable in the Rhs list in all model specifications.

The default specification is Aigner, Lovell and Schmidt’s canonical normal-half normal model. The default form is a production frontier model,

y = β′x + v - u, u = |U|.

That is, the right hand side of the equation specifies the maximum goal attainable. To specify a cost frontier model or other model in which the frontier represents a minimum, so that

y = β′x + v + u, u = |U|,

use

; Cost

This specification is used in all forms of the stochastic frontier model. As noted below, one additional specification you may find useful is

; Start = values for β, λ, σ

(the meanings of the parameters are developed below).

ALS also developed the normal-exponential model, in which u has an exponential distribution rather than a half normal distribution. To request the exponential model, use

; Model = Exponential (or ; Model = E )

in the FRONTIER command. For this model, the parameters are (β,θ,σv). Further details appear below. Several other model forms, and numerous modifications such as heteroscedasticity, are also developed below.


This is the full list of general specifications that are applicable to this model estimator.

Controlling Output from Model Commands

; Par                keeps ancillary parameters σ, λ, etc. with main parameter β vector in b.
; OLS                displays least squares starting values when (and if) they are computed.
; Table = name       saves model results to be combined later in output tables.

Robust Asymptotic Covariance Matrices

; Covariance Matrix  displays estimated asymptotic covariance matrix (normally not shown), same as ; Printvc.
; Choice             uses choice based sampling (sandwich with weighting) estimated matrix.
; Cluster = spec     requests computation of the cluster form of corrected covariance estimator.

Optimization Controls for Nonlinear Optimization

; Start = list       gives starting values for a nonlinear model.
; Tlg [ = value]     sets convergence value for gradient.
; Tlf [ = value]     sets convergence value for function.
; Tlb [ = value]     sets convergence value for parameters.
; Alg = name         requests a particular algorithm, Newton, DFP, BFGS, etc.
; Maxit = n          sets the maximum iterations.
; Output = n         requests technical output during iterations; the level ‘n’ is 1, 2, 3 or 4.
; Set                keeps current setting of optimization parameters as permanent.

Predictions and Residuals

; List               displays a list of fitted values with the model estimates.
; Keep = name        keeps fitted values as a new (or replacement) variable in data set.
; Res = name         keeps residuals as a new (or replacement) variable.
; Fill               fills missing values (outside estimating sample) for fitted values.

Hypothesis Tests and Restrictions

; Test: spec         defines a Wald test of linear restrictions.
; Wald: spec         defines a Wald test of linear restrictions, same as ; Test: spec.
; CML: spec          defines a constrained maximum likelihood estimator.
; Rst = list         specifies equality and fixed value restrictions.
; Maxit = 0 ; Start = the restricted values
                     specifies Lagrange multiplier test.


E62.3.1 Predictions, Residuals and Partial Effects

Predicted values and ‘residuals’ for the stochastic frontier models are computed as follows. The same forms are used for cross section and panel data models. The predicted value is β′x. (These are rarely useful in this setting.) The ‘residual’ is computed directly as

$$e_i = y_i - \hat{\beta}'x_i.$$

This residual is usually not of interest in itself. It is, however, the crucial ingredient in the efficiency estimator discussed in Section E62.8. The estimator of ui that we will use is computed by the Jondrow et al. (1982) formula for E[u|v-u], or E[u|v+u] if based on a cost frontier,

$$\hat{E}[u \mid \varepsilon] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \quad \varepsilon = v \pm u, \quad w = \varepsilon\lambda/\sigma,$$

$$\sigma = \sqrt{\sigma_v^2 + \sigma_u^2}, \quad \lambda = \frac{\sigma_u}{\sigma_v}.$$

In the JLMS formula, ei is the estimator of εi. The formulas and computations are discussed in Section E62.8.

The frontier model is, save for its involved disturbance term, a linear regression model. The conditional mean in the model is E[yi|xi] = β′xi - E[ui|xi]. In most cases, E[ui|xi] is not a function of xi, so the derivatives of E[yi|xi] with respect to xi are just β. In other cases that we will consider, the conditional mean of ui does depend on xi or other variables, so the partial effects in the model might be more involved than this. Once again, however, these will usually not be of direct interest in the study. But, in all cases, $\hat{E}[u \mid \varepsilon]$ will be an involved function of xi and any other variables that appear anywhere else in the model. We will examine the partial effects on the efficiency estimators in Section E62.8.

E62.3.2 Results Saved by the Frontier Estimator

The results saved by the frontier estimator are

Matrices:       b = regression parameters, α,β
                varb = asymptotic covariance matrix
Scalars:        sy, ybar, nreg, kreg, and logl
Last Function:  JLMS estimator of ui.
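As noted above, the function retained after estimation is the JLMS estimator of ui. The formula in Section E62.3.1 can be sketched outside LIMDEP as follows. This is a minimal Python illustration, not LIMDEP code; the function names and the `cost` flag are assumptions introduced here. The flag reverses the sign of the residual so that the same expression serves a cost frontier, which is one common way to handle ε = v + u and may not match the program's internal computation.

```python
import math

def norm_pdf(z):
    # Standard normal density phi(z)
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    # Standard normal distribution function Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def jlms(eps, sigma_u, sigma_v, cost=False):
    """JLMS estimate of E[u | eps] for the normal-half normal model.

    eps is the residual e_i = y_i - b'x_i. For a production frontier
    eps estimates v - u. With cost=True (an assumption of this sketch)
    the sign of eps is reversed before the formula is applied.
    """
    if cost:
        eps = -eps
    sigma = math.sqrt(sigma_u ** 2 + sigma_v ** 2)
    lam = sigma_u / sigma_v
    w = eps * lam / sigma
    # 1 - Phi(w) can underflow for very large w; this sketch does not guard against that.
    return sigma * lam / (1.0 + lam ** 2) * (norm_pdf(w) / (1.0 - norm_cdf(w)) - w)
```

A large negative residual in a production frontier yields a larger inefficiency estimate than a positive one, and the estimate is always positive.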


Use ; Par to add the ancillary parameters to these. The ancillary parameters that are estimated for the various models are as follows, including the scalars saved by the estimation program:

Half and truncated normal:  estimates λ, σ, saves lmda and s = σ
Truncated normal:           same as half normal, estimates µ, saved as mu
Exponential:                estimates θ, σv, saves theta and s = σv
Heteroscedastic model:      average value of σ as s, average value of λ as lmda
Heterogeneity in mean:      estimates λ, σ, saves lmda and s = σ.

E62.4 Data for the Analysis of Frontier Models

We will use two data sets to illustrate the frontier estimators. The first, the data on U.S. airlines, is a panel data set that we will use primarily for illustrating the stochastic frontier model. The second, the famous WHO data on health care attainment, will be used both for the stochastic frontier models and for the later work on data envelopment analysis.

E62.4.1 Data on U.S. Airlines

We will develop several examples in this section using a panel data set on the U.S. airline industry from the pre-deregulation period (airlines.dat). The observations are an unbalanced panel on 25 airlines. The original balanced panel data set contained 15 observations (1970-1984) on each of 25 airlines. Mergers, strikes and other data problems reduced the sample to the unbalanced panel of 256 observations. The group sizes (number of firms) are 2 (4), 4 (1), 7 (1), 9 (3), 10 (3), 11 (1), 12 (2), 13 (1), 14 (3) and 15 (6).

The variables in the data set are

firm     = ID, 1,...,25
year     = 1970...1984
t        = year - 1969 = 1,...,15
cost     = total cost
revenue  = revenue
output   = total output
stage    = average stage length
points   = number of points served
loadfct  = load factor
cmtl     = materials cost
mtl      = materials quantity
pm       = price of material
cfuel    = fuel cost
fuel     = fuel quantity
pf       = fuel price
ceqpt    = equipment cost
eqpt     = equipment quantity
pe       = equipment price
clabor   = labor cost
labor    = labor quantity
pl       = labor price
cprop    = property cost
property = property quantity
pp       = property price
k        = capital index
pk       = capital price index

Transformed variables used in the examples are as follows:

lc    = log(cost)
cn    = cost/pp
lcn   = log(cn)
lpm   = log(pm)
lpf   = log(pf)
lpe   = log(pe)
lpl   = log(pl)
lpp   = log(pp)
lpk   = log(pk)
lpmpp = log(pm/pp)
lpfpp = log(pf/pp)
lpepp = log(pe/pp)
lplpp = log(pl/pp)
lf    = log(fuel)
lm    = log(mtl)
le    = log(eqpt)
ll    = log(labor)
lp    = log(property)
lq    = log(output)
lq2   = lq^2


E62.4.2 World Health Organization (WHO) Health Attainment Data

The data used by the WHO in their 2000 World Health Report assessment of health care attainment by 191 countries have been used by many researchers worldwide both for developing frontier models and for analyzing health outcomes. The data are a panel of five years, 1993-1997, on health outcome data for 191 countries and a number of internal political units, e.g., the states of Mexico. The main outcome variables are dale and comp (an aggregate of such measures as efficiency and equity of health care delivery in the country). The main input variables are hexp and educ. A variety of other variables, listed below, were observed only in 1997. The following descriptive statistics apply to the entire data set of 840 observations:

Variable  Mean         Std. Dev.    Description
country   *            *            country number omitting internal units, 1,...,191
year      *            *            year (1993-1997)
small     *            *            internal political unit, 0 for countries, else 1,...,6
comp      75.0062726   12.2051123   composite health care attainment
dale      58.3082712   12.1442590   disability adjusted life expectancy
hexp      548.214857   694.216237   health expenditure per capita, PPP units
educ      6.31753664   2.73370613   educational attainment, years
oecd      .279761905   .449149577   OECD member country, dummy variable
gdpc      8135.10785   7891.20036   per capita GDP in PPP units
popden    953.119353   2871.84294   population density per square KM
gini      .379477914   .090206941   gini coefficient for income distribution
tropics   .463095238   .498933251   dummy variable for tropical location
pubthe    58.1553571   20.2340835   proportion of health spending paid by government
geff      .113293978   .915983955   World Bank government effectiveness measure
voice     .192624849   .952225978   World Bank measure of democratization

(The data were analyzed in Greene (2004a,b). Some of the variables, such as popden and gdpc, were augmented from other sources in these studies.)

Although the data are a five year panel (a few countries were observed for fewer than five years), there is almost no cross year variation in any variable. (The proportion of total variation that is within groups is less than 1% for the four time varying variables.) We have created a cross section from these data as follows: First, we discarded the data on internal political units. We then averaged comp, dale, hexp and educ across the five years. We retained a sample of 191 cross sectional (country) units. The following command set creates the data set.

SAMPLE   ; 1-840 $
REJECT   ; small > 0 $
SETPANEL ; Group = country ; Pds = ti $
RENAME   ; hc3 = educ $
CREATE   ; lpubthe = log(pubthe) $
CREATE   ; dalebar = Group Mean(dale,pds = ti) $
CREATE   ; compbar = Group Mean(comp,pds = ti) $
CREATE   ; educbar = Group Mean(educ,pds = ti) $
CREATE   ; hexpbar = Group Mean(hexp,pds = ti) $
CREATE   ; logdbar = Log(dalebar) ; logcbar = Log(compbar) $
CREATE   ; logebar = Log(educbar) ; loghbar = Log(hexpbar) $
CREATE   ; loghbar2 = loghbar^2 $
REJECT   ; year # 1997 $
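The two computations behind this construction, group means by country and the share of total variation that is within groups, can be sketched in Python as follows. This is an illustration only, not LIMDEP code; the helper names `group_means` and `within_share` and the row layout (a list of dictionaries) are hypothetical, and no figures from the WHO data are reproduced.

```python
from collections import defaultdict

def group_means(rows, key, var):
    """Mean of rows[i][var] within each group defined by rows[i][key]."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        sums[r[key]] += r[var]
        counts[r[key]] += 1
    return {g: sums[g] / counts[g] for g in sums}

def within_share(rows, key, var):
    """Within-group sum of squares divided by total sum of squares."""
    means = group_means(rows, key, var)
    overall = sum(r[var] for r in rows) / len(rows)
    total = sum((r[var] - overall) ** 2 for r in rows)
    within = sum((r[var] - means[r[key]]) ** 2 for r in rows)
    return within / total
```

A small within share, as reported for these data, indicates that little information is lost by replacing the panel with one averaged observation per country.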


E62.5 Skewness of the OLS Residuals and Problems Fitting Stochastic Frontier Models

Before maximum likelihood estimation begins, the skewness of the OLS residuals in the regression of y on x is checked. Waldman (1982) has shown that when the OLS residuals are skewed in the wrong direction, a solution for the maximum likelihood estimator for the stochastic frontier model is simply OLS for the slopes and for $\sigma_v^2$ and 0.0 for $\sigma_u^2$. If this condition is found, a lengthy warning is issued. We emphasize, this is not a bug in the program, nor is it something to be ‘fixed,’ beyond changing the specification of the model or rethinking the stochastic frontier as the modeling platform. This is our single most frequently posed question, so we offer an application to demonstrate the effect. Consider the commands

CALC     ; Ran(12345) $
SAMPLE   ; 1-500 $
CREATE   ; u = Abs(Rnn(0,2))
         ; v = Rnn(0,1)
         ; x = Rnn(0,1)
         ; y = x + v + u $
REGRESS  ; Lhs = y ; Rhs = one,x ; Res = e $
FRONTIER ; Lhs = y ; Rhs = one,x $
KERNEL   ; Rhs = e $

The CREATE command generates y exactly according to the model, except that u is added rather than subtracted. Thus, we should expect this production frontier specification to perform poorly. The estimation results from the FRONTIER command are shown below. Note the string of warnings. Estimation is allowed to proceed, but the results are not a ‘frontier’ as such. The final estimate of λ is essentially zero, with a huge standard error, and the reported estimate of $\sigma_u^2$ in the box above the results is 0.0000. The other estimates are, in fact, the same as OLS. The kernel density estimator for the OLS residuals is clearly skewed in the positive, that is, the wrong direction. Once again, we emphasize, this is a failure of the data to conform to the model.

Error 315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.
WARNING! OLS residuals have the wrong skewness for SFM
Other forms of the model models may also behave poorly.
In this case, one MLE for the half normal model is OLS
for beta and sigma and zero for the inefficiency term.
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Warning 141: Iterations:current or start estimate of sigma nonpositive
Line search at iteration 30 does not improve fn. Exiting optimization.


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                    Y
Log likelihood function      -921.33848
Estimation based on N =    500, K =   4
Inf.Cr.AIC  =   1850.7 AIC/N =    3.701
Variances: Sigma-squared(v)=    2.33375
           Sigma-squared(u)=     .00000
           Sigma(v)        =    1.52766
           Sigma(u)        =     .00000
Sigma = Sqr[(s^2(u)+s^2(v)]=    1.52766
Gamma = sigma(u)^2/sigma^2 =     .00000
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0         -921.33851
Chi-sq=2*[LogL(SF)-LogL(LS)] =     .000
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
       Y|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|    1.61107      165.2912      .01  .9922  -322.35365   325.57580
       X|    1.00746***     .07057    14.28  .0000      .86914     1.14578
        |Variance parameters for compound error
  Lambda|    .10897D-05   135.6070      .00 1.0000  -.26578D+03  .26578D+03
   Sigma|    1.52766***     .00242   630.99  .0000     1.52292     1.53241
--------+--------------------------------------------------------------------

Figure E62.1 Kernel Density for Least Squares Residuals
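The experiment above can be mimicked outside LIMDEP. The following Python sketch is an illustration under stated assumptions: it uses Python's own random number generator, so it will not reproduce the LIMDEP draws from Ran(12345), and the function name `simulate_wrong_skew` is introduced here. It generates y = x + v + u, fits OLS, and returns the intercept, the slope and the normalized skewness m3/s^3 of the residuals.

```python
import random

def simulate_wrong_skew(n=500, seed=12345):
    """Simulate y = x + v + u with u = |N(0, sd 2)| and v, x ~ N(0, 1).

    Returns (intercept, slope, normalized skewness of OLS residuals).
    Because u is added, a production frontier (e = v - u) is the wrong
    specification and the residual skewness should be positive.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) + abs(rng.gauss(0.0, 2.0)) for xi in x]

    # Simple regression of y on a constant and x
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Second and third moments of the residuals
    e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    m2 = sum(ei ** 2 for ei in e) / n
    m3 = sum(ei ** 3 for ei in e) / n
    return intercept, slope, m3 / m2 ** 1.5
```

In this design the OLS slope is consistent for 1, while the intercept absorbs E[u] = 2(2/π)^(1/2), roughly 1.6, which is in line with the constant term reported in the output above.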


Unfortunately, the Waldman result is a sufficient condition, not a necessary one. That is, it has been shown that when the OLS residuals have the ‘right’ skewness, then the MLE for the frontier model is unique, and you will have no trouble in estimation. When they have the ‘wrong’ skewness, it is only shown that the OLS results are a local stationary point of the log likelihood, not that they are the global maximizers. There may be another point that is yet better than OLS. Our airline data used below provide an example. Consider the following results, where we present both the stochastic frontier estimates and OLS. (The model, itself, is developed later, so we show only the useful results here.) As above, we receive the initial warning about the skewness of the OLS residuals. Then, estimation proceeds and an apparently routine solution emerges that is different from, and better than (has a higher log likelihood than) OLS.

Error 315: Stoch. Frontier: OLS residuals have wrong skew. OLS is MLE.
WARNING! OLS residuals have the wrong skewness for SFM
Other forms of the model models may also behave poorly.
In this case, one MLE for the half normal model is OLS
for beta and sigma and zero for the inefficiency term.
Normal exit: 11 iterations. Status=0, F= -105.0617
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LQ
Log likelihood function       105.06169
Variances: Sigma-squared(v)=     .02411
           Sigma-squared(u)=     .00457
           Sigma(v)        =     .15527
           Sigma(u)        =     .06757
Stochastic Production Frontier, e = v-u
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LQ|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|   -1.05847***     .02333   -45.37  .0000    -1.10419   -1.01274
      LF|     .38355***     .07045     5.44  .0000      .24547     .52163
      LE|     .21961***     .07300     3.01  .0026      .07653     .36270
      LM|     .71667***     .07654     9.36  .0000      .56666     .86668
      LL|    -.41139***     .06382    -6.45  .0000     -.53647    -.28630
      LP|     .18973***     .02960     6.41  .0000      .13171     .24775
        |Variance parameters for compound error
  Lambda|     .43515**      .20117     2.16  .0305      .04086     .82944
   Sigma|     .16933***     .00057   295.74  .0000      .16821     .17045
--------+--------------------------------------------------------------------
Ordinary least squares regression ............
Diagnostic   Log likelihood      =   105.05876
             Standard error of e =      .16244
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LQ|  Coefficient       Error       t    |t|>T*         Interval
--------+--------------------------------------------------------------------
Constant|   -1.11237***     .01015  -109.57  .0000    -1.13227   -1.09247
      LF|     .38283***     .07116     5.38  .0000      .24335     .52231
      LE|     .21922***     .07389     2.97  .0033      .07441     .36404
      LM|     .71924***     .07732     9.30  .0000      .56769     .87078
      LL|    -.41015***     .06455    -6.35  .0000     -.53665    -.28364
      LP|     .18802***     .02980     6.31  .0000      .12961     .24643
--------+--------------------------------------------------------------------


There is no simple bullet proof strategy for handling this situation. You can try different starting values with

; Start = values for β, λ, σ

that differ from OLS, but it is hard to know where these will come from. Moreover, it is likely that you will end up at OLS anyway. As Waldman points out, this is a potentially ill behaved log likelihood function. We offer the preceding as a caution for the practitioner.

For the particular data set used here, we can identify a specific culprit. The ‘failure’ of the model emerges in the presence of the variable lm, and does not occur when lm is omitted from the equation. We have no theory, however, for why this should be the case. Simply deleting variables from the model until one which does not have the skewness problem emerges does not seem like an effective strategy. We do note, the failure might signal a misspecified model. For example, for our airlines example, the specification above omits the capital variable. When LK = log(k) is added to the model, we obtain the following quite routine results (albeit with the wrong signs on capital and labor inputs).

Normal exit: 13 iterations. Status=0, F= -108.4392
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LQ
Log likelihood function       108.43918
Estimation based on N =    256, K =   9
Inf.Cr.AIC  =   -198.9 AIC/N =    -.777
Variances: Sigma-squared(v)=     .01902
           Sigma-squared(u)=     .01692
           Sigma(v)        =     .13791
           Sigma(u)        =     .13007
Sigma = Sqr[(s^2(u)+s^2(v)]=     .18957
Gamma = sigma(u)^2/sigma^2 =     .47074
Var[u]/{Var[u]+Var[v]}     =     .24425
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          108.07431
Chi-sq=2*[LogL(SF)-LogL(LS)] =     .730
Kodde-Palm C*: 95%: 2.706, 99%: 5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LQ|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|   -2.98823***     .72136    -4.14  .0000    -4.40206   -1.57439
      LF|     .37257***     .07038     5.29  .0000      .23463     .51052
      LE|    2.09473***     .68790     3.05  .0023      .74647    3.44299
      LM|     .69910***     .07580     9.22  .0000      .55054     .84766
      LL|    -.42909***     .06315    -6.79  .0000     -.55287    -.30530
      LP|     .44533***     .09498     4.69  .0000      .25917     .63149
      LK|   -2.09806***     .76556    -2.74  .0061    -3.59853    -.59759
        |Variance parameters for compound error
  Lambda|     .94309***     .16870     5.59  .0000      .61244    1.27373
   Sigma|     .18957***     .00064   297.81  .0000      .18832     .19082
--------+--------------------------------------------------------------------

We emphasize, the Waldman result, and this particular theoretical outcome, is specific to the normal-half normal model. However, when it occurs, problems of a similar sort will often, but not always, show up in other models. Thus, in spite of a warning, your fitted exponential, or panel data model, may be quite satisfactory.
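The last set of results in this section reports both Gamma = sigma(u)^2/sigma^2 and Var[u]/{Var[u]+Var[v]}. The two differ because, for the half normal model, Var[u] = (1 - 2/π)σu², not σu². The following Python sketch (an illustration, not LIMDEP code; the function name is introduced here) recomputes both quantities from the reported variance parameters.

```python
import math

def variance_shares(sigma2_u, sigma2_v):
    """Return (gamma, share) for the normal-half normal model.

    gamma = sigma_u^2 / (sigma_u^2 + sigma_v^2)
    share = Var[u] / (Var[u] + Var[v]), with Var[u] = (1 - 2/pi) * sigma_u^2
    """
    gamma = sigma2_u / (sigma2_u + sigma2_v)
    var_u = (1.0 - 2.0 / math.pi) * sigma2_u
    share = var_u / (var_u + sigma2_v)
    return gamma, share

# Values taken from the output above: Sigma-squared(u) = .01692, Sigma-squared(v) = .01902
gamma, share = variance_shares(0.01692, 0.01902)
```

Up to rounding of the printed inputs, these agree with the reported .47074 and .24425, which shows how much smaller the contribution of u to the total variance is than Gamma alone suggests.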


E62.6 The Ordinary Least Squares Estimator

For the simplest specification

y = β′x + v - u, u = |U|

in which β contains a constant term and both v and U are homoscedastic and have zero means, i.e., in the original half normal or exponential models, the OLS estimator of all elements of β except the constant term is consistent. It is convenient to rewrite the model as

y = β0 + β1′x1 + v - u.

Under the assumptions, we can write the model as

y = (β0 - E[u]) + β1′x1 + v - (u - E[u])

or

y = α + β1′x1 + e

in which e has zero mean and constant variance, and is orthogonal to (1,x1). Thus, the model as shown can be estimated consistently by OLS. The constant term estimates α = (β0 - E[u]). Assuming that E[u] is estimable, therefore, estimation of β by MLE vs. OLS is a question of efficiency, not consistency. (However, we remain interested in estimation of u, so this may be a moot point.)

E62.6.1 Corrected Ordinary Least Squares – COLS

The COLS estimator is obtained by turning the least squares estimator into a deterministic frontier model. This is done by shifting the intercept in the OLS estimator upward (for a production frontier) or downward (for a cost frontier) so that all points lie either below or above the estimated function. Figure E62.2 shows the result for estimation of a simple cost frontier for the airlines data. The function is shifted so that it rests on the single most extreme point (residual) in the data. The COLS estimator is requested with

FRONTIER ; Lhs = goal variable
         ; Rhs = one, …
         ; Model = COLS $

Add ; Cost if the model is a cost frontier. Efficiency values, as discussed below, are obtained as follows:

; Eff = variable name

saves the residuals from the deterministic frontier. These are the estimates of ui. Note in Figure E62.2, for a cost frontier, all values of ui are positive. If you fit a production frontier, then all points will lie below the regression and all residuals will be negative. The estimated inefficiency that is saved will be -ei. Thus, in both cases, the values saved by ; Eff = variable are the positive estimates of the size of the deviation of the observation from the frontier, which in this model is a direct estimate of ui. The estimator of technical or cost efficiency is

$$\text{Efficiency} = \exp(-\hat{u}_i).$$

If you fit a production frontier, use

; Techeff = variable name

to save this variable. For a cost frontier, use

; Costeff = variable name
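The shift that defines COLS can be sketched in a few lines of Python. This is an illustration of the idea described above, not LIMDEP's implementation; the function name `cols_inefficiency` is hypothetical. Given OLS residuals, the frontier is moved to the most extreme residual, the deviations from it are the inefficiency estimates, and efficiency is exp(-u).

```python
import math

def cols_inefficiency(residuals, cost=False):
    """Turn OLS residuals into COLS inefficiency and efficiency values.

    Production frontier: shift up to the largest residual, u_i = max(e) - e_i.
    Cost frontier:       shift down to the smallest residual, u_i = e_i - min(e).
    In both cases u_i >= 0 and efficiency_i = exp(-u_i) lies in (0, 1].
    """
    if cost:
        lowest = min(residuals)
        u = [e - lowest for e in residuals]
    else:
        highest = max(residuals)
        u = [highest - e for e in residuals]
    efficiency = [math.exp(-ui) for ui in u]
    return u, efficiency
```

By construction exactly one observation (barring ties) sits on the frontier with u = 0 and efficiency 1, which is why the results are sensitive to a single extreme residual.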

Figure E62.2 COLS Estimator of Cost Frontier Function

The following shows computation of a COLS estimator for the airlines. The FRONTIER command requests both the inefficiency estimates, ui, and the cost efficiency estimates, eui_cost. The kernel density estimate for the cost efficiency is shown in Figure E62.3. The results for the estimator begin with the standard output for least squares regression. The second panel includes some preliminary results for the stochastic frontier model, including the chi squared test for zero skewness, $\chi^2 = (n/6)(m_3/s^3)^2$. (In the results shown here the statistic, 1.943, is below the critical value of 3.84, so zero skewness is not rejected.) The standard normal statistic is the signed (based on m3) square root of $\chi^2$. The third panel presents descriptive statistics for ui and exp(-ui).

CREATE   ; lc = Log(cost/pp) ; lpkp = Log(pk/pp) ; lplp = Log(pl/pp)
         ; lpmp = Log(pm/pp) ; lpep = Log(pe/pp) ; lpfp = Log(pf/pp) $
CREATE   ; lk = Log(k) $
CREATE   ; ly = Log(output) ; ly2 = .5*ly*ly $
FRONTIER ; Lhs = lc ; Rhs = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp
         ; Cost ; Model = COLS ; Costeff = Eui_cost ; Eff = ui $
KERNEL   ; Rhs = eui_cost
         ; Title = Estimated Cost Efficiency Based on COLS Estimator $


-----------------------------------------------------------------------------
Corrected OLS Deterministic Frontier Cost Function
LHS=LC       Mean                 =        2.84024
             Standard deviation   =        1.09256
             No. of observations  =            256  Degrees of freedom
Regression   Sum of Squares       =        300.028          7
Residual     Sum of Squares       =        4.36487        248
Total        Sum of Squares       =        304.393        255
             Standard error of e  =         .13267
Fit          R-squared            =         .98566  R-bar squared =   .98526
Model test   F[  7,   248]        =     2435.25310  Prob F > F*   =   .00000
Diagnostic   Log likelihood       =      157.91523  Akaike I.C.   = -4.00909
             Restricted (b=0)     =     -385.41031  Bayes I.C.    = -3.89830
             Chi squared [  7]    =     1086.65108  Prob C2 > C2* =   .00000
-----------------------------------------------------------------------------
Skewness test for inefficiency based on residuals
Normalized skewness = m3/s^3            =   .21340
Chi squared test (1 degree of freedom)     1.94294  Critical value=     3.84000
Standard normal test statistic             1.39389  Test value    = +/- 1.96000
Estimated Efficiency Values Based on e(i)+Min e(i)
--------+--------------------------------------------------------------------
        |      Mean    Std.Dev.    Minimum    Maximum
CostInef|      .357       .133       .000       .773
Cost Eff|      .706       .091       .462      1.000
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic COLS Frontier Function
Constant|   19.4363       27.45697      .71  .4790   -34.3783    73.2510
      LY|     .94303***     .01809    52.12  .0000      .90757     .97849
     LY2|     .08248***     .01236     6.67  .0000      .05825     .10671
    LPKP|    1.42385       2.14849      .66  .5075    -2.78711    5.63480
    LPLP|     .01915        .10169      .19  .8506     -.18016     .21847
    LPMP|     .04504       1.41721      .03  .9746    -2.73264    2.82272
    LPEP|    -.57070        .67904     -.84  .4007    -1.90159     .76019
    LPFP|    -.04811**      .01986    -2.42  .0154     -.08704    -.00919
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Figure E62.3 Kernel Estimator for Cost Efficiency


E62.6.2 Modified OLS and Starting Values for the MLE

Under the specific distributional assumptions of the half normal and exponential models, method of moments estimators of the underlying parameters are available. They are based on the moment equations

\[ \mathrm{Var}[e] = \mathrm{Var}[v] + \mathrm{Var}[u] \]

and the fact that the third central moment of e equals that of -u, since v is symmetric. The left hand sides can be consistently estimated using the OLS residuals:

\[ m_2 = \frac{1}{n}\sum_i e_i^2 \quad\text{and}\quad m_3 = \frac{1}{n}\sum_i e_i^3 . \]

Both of the functions on the right hand side are known for the half normal and exponential models. In particular, for the half normal model, the moment equations are

\[ m_2 = \sigma_v^2 + [1 - 2/\pi]\,\sigma_u^2, \qquad m_3 = (2/\pi)^{1/2}\,[1 - 4/\pi]\,\sigma_u^3 . \]

The solutions are

\[ \hat\sigma_u = \left[\frac{m_3\sqrt{\pi/2}}{1 - 4/\pi}\right]^{1/3} \quad\text{and}\quad \hat\sigma_v^2 = m_2 - (1 - 2/\pi)\,\hat\sigma_u^2 . \]

Note that there is no solution for σu if m3 is not negative, which is the problem discussed in Section E62.5. Assuming that this problem does not arise, the corrected constant term is

\[ \alpha = a + \mathrm{Est.}E[u] = a + \hat\sigma_u\sqrt{2/\pi} . \]

This is the ‘modified least squares’ (MOLS) estimator that is discussed in a number of sources, such as Greene (2005). These are the values used for starting values for the MLE, as well. Looking ahead, note that there is no natural method of moments estimator for the mean parameter in the truncated normal model discussed in Section E63.3. For this model, we use µ/σu = 0.

For the normal-exponential model, the moment equations that correspond to the preceding are

\[ m_2 = \sigma_v^2 + 1/\theta^2, \qquad m_3 = -2/\theta^3 . \]

Therefore,

\[ \hat\theta = \left[\frac{-2}{m_3}\right]^{1/3}, \qquad \hat\sigma_v^2 = m_2 - 1/\hat\theta^2, \qquad \alpha = a + 1/\hat\theta . \]
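The moment solutions above can be checked numerically. The following is a minimal Python sketch, not LIMDEP code; the function names `mols_half_normal` and `mols_exponential` are hypothetical, and the inputs are the residual moments m2 and m3 in the production orientation (so m3 must be negative). It builds the moments from assumed parameter values and recovers them.

```python
import math

def mols_half_normal(m2, m3):
    """Method of moments values for the normal-half normal model.

    Returns (sigma_u, sigma_v squared, shift for the constant term).
    sigma_v squared can come out negative in small samples; the caller
    must check it before taking a square root.
    """
    if m3 >= 0:
        raise ValueError("m3 is not negative: no solution for sigma_u")
    sigma_u = (m3 * math.sqrt(math.pi / 2) / (1 - 4 / math.pi)) ** (1 / 3)
    sigma_v2 = m2 - (1 - 2 / math.pi) * sigma_u ** 2
    return sigma_u, sigma_v2, sigma_u * math.sqrt(2 / math.pi)

def mols_exponential(m2, m3):
    """Method of moments values for the normal-exponential model.

    Returns (theta, sigma_v squared, shift for the constant term).
    """
    if m3 >= 0:
        raise ValueError("m3 is not negative: no solution for theta")
    theta = (-2 / m3) ** (1 / 3)
    sigma_v2 = m2 - 1 / theta ** 2
    return theta, sigma_v2, 1 / theta

# Round trip with assumed (illustrative) parameter values.
su, sv = 0.13, 0.10
m2 = sv ** 2 + (1 - 2 / math.pi) * su ** 2
m3 = math.sqrt(2 / math.pi) * (1 - 4 / math.pi) * su ** 3
print(mols_half_normal(m2, m3))
```

Because the moments are constructed from the formulas themselves, the round trip only confirms the algebra; with real residuals the estimates can differ noticeably from the MLE.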


The header information in the results table will display the decomposition of the variance of the composed error in two parts. In the case of the half normal model, Var[u] = [(π-2)/π]σu², not σu². Therefore, the estimated parameters might be a bit misleading as to the relative influence of u on the total variation in the structural disturbance. These estimators are sometimes quite far from the maximum likelihood estimators, particularly when the sample is small, but they are generally quite satisfactory as starting values for the MLE.

The following demonstrates these results for the airline data, where we use MOLS and MLE to fit a normal-half normal cost frontier. (Note that the signs of the OLS residuals are reversed because we are fitting a cost function.) In the results below, we have imposed the assumption of linear homogeneity in prices in the cost function by normalizing the six input prices, pk, pl, pe, pp, pm, pf, by the property price, pp. The model contains log(pj/pp). To complete the constraint, we have also normalized total cost by pp before taking logs.

CREATE ; lpk = Log(pk) $
CREATE ; lpmpp = lpm - lpp ; lpfpp = lpf - lpp ; lpepp = lpe - lpp
       ; lplpp = lpl - lpp ; lpkpp = lpk - lpp $
CREATE ; lcp = lc - lpp $
NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp $
REGRESS ; Lhs = lc ; Rhs = x ; Res = e $
CREATE ; e = -e ; e2 = e*e ; e3 = e2*e $
CALC ; m2 = Xbr(e2) ; m3 = Xbr(e3) $
CALC ; List ; su = (m3*Sqr(pi/2)/(1-4/pi))^(1/3)
     ; sv = Sqr(m2 - (1-2/pi)*su^2)
     ; a = b(1) + su*Sqr(2/pi)
     ; lambda = su/sv
     ; sgma = Sqr(su^2 + sv^2) $
FRONTIER ; Lhs = lc ; Rhs = x ; Cost $

The first set of results below are the OLS estimates, followed by the correction to the constant term and the method of moments estimators of σu and σv used to start the MLE. The maximum likelihood estimators are shown next. The estimates for the stochastic frontier model include the log likelihood and the implied estimates of σu, σv and their squares, based on the estimates of λ = σu/σv and σ² = σu² + σv², which are estimated by ML. (The reverse transformations are σu² = σ²λ²/(1 + λ²) and σv² = σ²/(1 + λ²).) The MLE is documented further in the next section.

-----------------------------------------------------------------------------
Ordinary least squares regression ............
LHS=LC       Mean                 =        2.84024
             Standard deviation   =        1.09256
             No. of observations  =            256  Degrees of freedom
Regression   Sum of Squares       =        300.028          7
Residual     Sum of Squares       =        4.36487        248
Total        Sum of Squares       =        304.393        255
             Standard error of e  =         .13267
Fit          R-squared            =         .98566  R-bar squared =   .98526
Model test   F[  7,   248]        =     2435.25310  Prob F > F*   =   .00000
Diagnostic   Log likelihood       =      157.91523  Akaike I.C.   = -4.00909
             Restricted (b=0)     =     -385.41031  Bayes I.C.    = -3.89830
             Chi squared [  7]    =     1086.65108  Prob C2 > C2* =   .00000


--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       t    |t|>T*         Interval
--------+--------------------------------------------------------------------
Constant|    19.7932      27.45697      .72   .4717   -34.0214   73.6079
      LY|     .94303***     .01809    52.12   .0000     .90757    .97849
     LY2|     .08248***     .01236     6.67   .0000     .05825    .10671
    LPKP|    1.42385       2.14849      .66   .5081   -2.78711   5.63480
    LPLP|     .01915        .10169      .19   .8508    -.18016    .21847
    LPMP|     .04504       1.41721      .03   .9747   -2.73264   2.82272
    LPEP|    -.57070        .67904     -.84   .4015   -1.90159    .76019
    LPFP|    -.04811**      .01986    -2.42   .0161    -.08704   -.00919
--------+--------------------------------------------------------------------

[CALC] SU      =       .1296481
[CALC] SV      =       .1046056
[CALC] A       =     19.8966785
[CALC] LAMBDA  =      1.2393989
[CALC] SGMA    =       .1665862
Calculator: Computed   5 scalar results

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                  LCN
Log likelihood function       159.20743
Estimation based on N =    256, K =  10
Inf.Cr.AIC  =   -298.4 AIC/N =   -1.166
Variances: Sigma-squared(v)=     .01021
           Sigma-squared(u)=     .01890
           Sigma(v)        =     .10103
           Sigma(u)        =     .13746
Sigma = Sqr[(s^2(u)+s^2(v)]=     .17059
Gamma = sigma(u)^2/sigma^2 =     .64927
Var[u]/{Var[u]+Var[v]}     =     .40216
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] =    2.584
Kodde-Palm C*: 95%: 2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
     LCN|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|    19.8020      25.91115      .76   .4447   -30.9829   70.5869
      LY|     .95577***     .01781    53.68   .0000     .92088    .99067
     LY2|     .09086***     .01198     7.58   .0000     .06738    .11435
    LPKP|    1.43400       2.02750      .71   .4794   -2.53982   5.40783
    LPLP|     .01242        .09676      .13   .8979    -.17722    .20205
    LPMP|     .05744       1.33747      .04   .9657   -2.56396   2.67883
    LPEP|    -.56860        .64356     -.88   .3770   -1.82995    .69275
    LPFP|    -.06002***     .01993    -3.01   .0026    -.09907   -.02096
        |Variance parameters for compound error
  Lambda|    1.36059***     .20306     6.70   .0000     .96261   1.75857
   Sigma|     .17059***     .00058   294.50   .0000     .16946    .17173
--------+--------------------------------------------------------------------


E62.7 Estimating the Normal-Half Normal and Normal-Exponential Models

ALS’s canonical form of the model is the normal-half normal model,

\[ y = \beta'x + v - Su, \quad u = |U|, \quad S = +1 \text{ for production}, \ -1 \text{ for cost}, \]
\[ U \sim N[0,\sigma_u^2], \qquad v \sim N[0,\sigma_v^2]. \]

The command for estimating the stochastic frontier model is

FRONTIER ; Lhs = y ; Rhs = one, ... $

The default form is the normal-half normal model. In this form, model estimates consist of β, σ = (σv² + σu²)^(1/2) and λ = σu/σv, and the usual set of diagnostic statistics for models fit by maximum likelihood. The other basic form in the ALS model is the exponential model, f(u) = θ exp(-θu), u ≥ 0, which has mean inefficiency E[u] = 1/θ and standard deviation σu = 1/θ. The parameters estimated in the exponential specification are (β,θ,σv). The estimate of σu is reported in the results as well. The following illustrate the estimator with normal-half normal and normal-exponential cost frontiers.

FRONTIER ; Cost ; Lhs = lcn ; Rhs = x $
FRONTIER ; Cost ; Lhs = lcn ; Rhs = x ; Model = Exponential $

The stochastic frontier results include the standard output for MLEs. The derived estimates of σu, σv, σu², σv² and σ are shown as well. The value of γ = σu²/σ² is given for comparability with other parts of the literature. This ratio, which lies in (0,1), is sometimes reported as a variance decomposition of ε. However, the variance of u = |U| is (1 - 2/π)σu², so the appropriate decomposition is (1 - 2/π)σu²/[σv² + (1 - 2/π)σu²]. This is the value shown next under γ in the results.

A likelihood ratio test against the hypothesis of no inefficiency follows the variance estimates. The degrees of freedom for the test are accumulated in the table. The first is for σu in the base case. The second is for the heteroscedasticity terms in Var[u] when they are introduced in the model. Heteroscedasticity is developed in Chapter E63. The third term is for the truncation parameters in the normal-truncated normal model, also developed in the next chapter. The “degrees of freedom for the inefficiency model” are the sum of these three terms. The likelihood ratio statistic is presented next. This is a nonstandard test because the null value of σu is on the boundary of the parameter space. Appropriate tables for the mixed chi squared test used here are given in Kodde and Palm (1986). (A copy of the relevant parts of the table is kept internally by the program. See, also, Coelli, Rao and Battese (1998) for further details.)
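For the base case with one degree of freedom, the boundary test can be sketched in a few lines of Python (this is an illustration outside LIMDEP; the function name `lr_inefficiency_test` is hypothetical). The sketch assumes the usual result that, with a single restriction on the boundary, the statistic is a 50:50 mixture of chi squared variables with zero and one degree of freedom, so the critical value at level α is the square of the standard normal quantile at 1 - α. The log likelihood values are the ones reported in this section.

```python
from statistics import NormalDist

def lr_inefficiency_test(logl_sf, logl_ols, alpha=0.05):
    """LR test of sigma_u = 0 with one restriction on the boundary.

    Under the assumed 50:50 mixture of chi2(0) and chi2(1),
    P(LR > c) = 0.5 * P(chi2_1 > c) = alpha, which gives c = z_{1-alpha}^2.
    Returns (LR statistic, critical value, reject flag).
    """
    lr = 2.0 * (logl_sf - logl_ols)
    crit = NormalDist().inv_cdf(1.0 - alpha) ** 2
    return lr, crit, lr > crit

# Half normal and exponential cost frontiers against OLS (values from the output).
print(lr_inefficiency_test(159.20743, 157.91523))
print(lr_inefficiency_test(159.89917, 157.91523))
```

The computed critical values agree with the 2.706 and 5.412 printed in the output for this one degree of freedom case; models with more restrictions need the full Kodde and Palm tables.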


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                  LCN
Log likelihood function       159.20743
Estimation based on N =    256, K =  10
Inf.Cr.AIC  =   -298.4 AIC/N =   -1.166
Variances: Sigma-squared(v)=     .01021
           Sigma-squared(u)=     .01890
           Sigma(v)        =     .10103
           Sigma(u)        =     .13746
Sigma = Sqr[(s^2(u)+s^2(v)]=     .17059
Gamma = sigma(u)^2/sigma^2 =     .64927
Var[u]/{Var[u]+Var[v]}     =     .40216
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] =    2.584
Kodde-Palm C*: 95%: 2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
     LCN|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|    19.8020      25.91115      .76   .4447   -30.9829   70.5869
      LY|     .95577***     .01781    53.68   .0000     .92088    .99067
     LY2|     .09086***     .01198     7.58   .0000     .06738    .11435
    LPKP|    1.43400       2.02750      .71   .4794   -2.53982   5.40783
    LPLP|     .01242        .09676      .13   .8979    -.17722    .20205
    LPMP|     .05744       1.33747      .04   .9657   -2.56396   2.67883
    LPEP|    -.56860        .64356     -.88   .3770   -1.82995    .69275
    LPFP|    -.06002***     .01993    -3.01   .0026    -.09907   -.02096
        |Variance parameters for compound error
  Lambda|    1.36059***     .20306     6.70   .0000     .96261   1.75857
   Sigma|     .17059***     .00058   294.50   .0000     .16946    .17173
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------

Results for the normal-exponential model appear below. It is not possible to use a LR test to choose between these two models. The test has zero degrees of freedom; neither model is obtained by a restriction on the other. One possibility might be a Vuong (1989) statistic, which would be computed as

\[ V = \frac{\sqrt{n}\,\bar m}{s_m}, \qquad m_i = \log f_i(\cdot\,|\,\text{normal}) - \log f_i(\cdot\,|\,\text{exponential}), \]

where \(\bar m\) and \(s_m\) are the sample mean and standard deviation of \(m_i\). Results of the test are shown below the model results. The statistic is well inside the inconclusive region (-1.96, +1.96).
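As a cross-check on the arithmetic of the statistic, here is a minimal Python sketch (not LIMDEP code). The function name `vuong` is hypothetical, the per-observation log densities are made-up illustrative numbers, and the sample standard deviation is assumed to use the n - 1 divisor.

```python
import math
from statistics import mean, stdev

def vuong(logf_a, logf_b):
    """Vuong statistic sqrt(n) * mean(m) / sd(m), with m_i = logf_a_i - logf_b_i.

    stdev() uses the n - 1 divisor; this is an assumption about how the
    standard deviation is meant to be computed.
    """
    m = [a - b for a, b in zip(logf_a, logf_b)]
    return math.sqrt(len(m)) * mean(m) / stdev(m)

# Illustrative values only; in practice these are the individual
# log likelihood contributions from the two fitted models.
fn = [0.1, -0.2, 0.0, -0.1]
fe = [0.0, 0.0, 0.0, 0.0]
v = vuong(fn, fe)
print(v, "inconclusive" if abs(v) < 1.96 else "favors one model")
```

Large positive values favor the first model, large negative values the second, and values inside (-1.96, +1.96) are inconclusive at the 5% level.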


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                  LCN
Log likelihood function       159.89917
Estimation based on N =    256, K =  10
Inf.Cr.AIC  =   -299.8 AIC/N =   -1.171
Exponential frontier model
Variances: Sigma-squared(v)=     .01147
           Sigma-squared(u)=     .00568
           Sigma(v)        =     .10709
           Sigma(u)        =     .07539
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] =    3.968
Kodde-Palm C*: 95%: 2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
     LCN|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|    22.6569      25.48354      .89   .3740   -27.2899   72.6038
      LY|     .96069***     .01892    50.77   .0000     .92360    .99777
     LY2|     .09281***     .01249     7.43   .0000     .06832    .11729
    LPKP|    1.65439       1.99409      .83   .4067   -2.25395   5.56272
    LPLP|    -.00962        .09785     -.10   .9217    -.20140    .18216
    LPMP|    -.06595       1.31569     -.05   .9600   -2.64465   2.51275
    LPEP|    -.62841        .63243     -.99   .3204   -1.86795    .61114
    LPFP|    -.06397***     .02033    -3.15   .0017    -.10381   -.02412
        |Variance parameters for compound error
   Theta|    13.2651***    2.90719     4.56   .0000     7.5671   18.9630
  Sigmav|     .10709***     .00980    10.93   .0000     .08788    .12629
--------+--------------------------------------------------------------------

FRONTIER ; … half normal model $
CREATE ; fn = logl_obs $
FRONTIER ; … Model = Exponential $
CREATE ; fe = logl_obs ; mi = fn - fe $
CALC ; List ; vuong = Sqr(n) * Xbr(mi)/Sdv(mi) $

[CALC] VUONG   =      -.9047927


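E62.7.1 Log Likelihoods for the Half Normal and Exponential Models

As will be evident below, different formulations of the log likelihood are most convenient for estimation of the different forms of the frontier models. (And, different authors sometimes parameterize the models differently.) The base case is the normal-half normal model. In this form, vi ~ N[0,σv²] and ui = |Ui| where Ui ~ N[0,σu²]. It follows that f(ui) = (2/σu)φ(ui/σu), ui ≥ 0. The density of εi = vi - ui has been shown to be f(εi) = (2/σ)φ(εi/σ)Φ(-εiλ/σ). The most common form of the individual term in the log likelihood function (and the one used in LIMDEP) is

\[ \log L_i = \tfrac12\log(2/\pi) - \log\sigma - \tfrac12(\varepsilon_i/\sigma)^2 + \log\Phi[-S\varepsilon_i\lambda/\sigma] \]

where

\[ \varepsilon_i = y_i - \beta'x_i, \qquad \lambda = \sigma_u/\sigma_v, \qquad \sigma^2 = \sigma_u^2 + \sigma_v^2, \]
\[ \sigma_v^2 = \sigma^2/(1+\lambda^2), \qquad \sigma_u^2 = \sigma^2\lambda^2/(1+\lambda^2), \]

and S = +1 for a production frontier, -1 for a cost frontier.

Olsen’s transformation is used for maximizing the log likelihood. We reparameterize the function in terms of η = 1/σ and γ = (1/σ)β. Then,

\[ \log L_i = \tfrac12\log(2/\pi) + \log\eta - \tfrac12\omega_i^2 + \log\Phi(-S\omega_i\lambda) \]

where \(\omega_i = \eta y_i - \gamma'x_i\). Define the functions

\[ a_i = -S\omega_i\lambda, \qquad \delta_i = \phi(a_i)/\Phi(a_i), \qquad \Delta_i = -a_i\delta_i - \delta_i^2 . \]

Then, the gradient and Hessian, with the parameters ordered (γ, η, λ), are

\[
\frac{\partial \log L_i}{\partial\begin{pmatrix}\gamma\\ \eta\\ \lambda\end{pmatrix}}
= \omega_i\begin{pmatrix} x_i\\ -y_i\\ 0\end{pmatrix}
+ \begin{pmatrix} 0\\ 1/\eta\\ 0\end{pmatrix}
+ S\delta_i\begin{pmatrix}\lambda x_i\\ -\lambda y_i\\ -\omega_i\end{pmatrix}
\]

\[
\frac{\partial^2 \log L_i}{\partial\begin{pmatrix}\gamma\\ \eta\\ \lambda\end{pmatrix}\partial\begin{pmatrix}\gamma' & \eta & \lambda\end{pmatrix}}
= -\begin{pmatrix} x_ix_i' & -y_ix_i & 0\\ -y_ix_i' & y_i^2 & 0\\ 0' & 0 & 0\end{pmatrix}
- \begin{pmatrix} 0 & 0 & 0\\ 0' & 1/\eta^2 & 0\\ 0' & 0 & 0\end{pmatrix}
+ \Delta_i\begin{pmatrix}\lambda^2x_ix_i' & -\lambda^2y_ix_i & -\lambda\omega_ix_i\\ -\lambda^2y_ix_i' & \lambda^2y_i^2 & \lambda\omega_iy_i\\ -\lambda\omega_ix_i' & \lambda\omega_iy_i & \omega_i^2\end{pmatrix}
+ S\delta_i\begin{pmatrix} 0 & 0 & x_i\\ 0' & 0 & -y_i\\ x_i' & -y_i & 0\end{pmatrix}
\]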
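The equivalence of the two parameterizations, and the first derivatives under Olsen's transformation, can be checked numerically. The sketch below is Python rather than LIMDEP; the function names are hypothetical, the data point and parameter values are made up, and the analytic gradient is the one obtained by differentiating the Olsen form of log Li directly.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def loglik_sigma(y, x, beta, lam, sigma, S=1):
    """One observation's log likelihood in the (beta, lambda, sigma) form."""
    eps = y - sum(b * xi for b, xi in zip(beta, x))
    return (0.5 * math.log(2 / math.pi) - math.log(sigma)
            - 0.5 * (eps / sigma) ** 2
            + math.log(_N.cdf(-S * eps * lam / sigma)))

def loglik_olsen(y, x, gamma, eta, lam, S=1):
    """Same term with eta = 1/sigma and gamma = beta/sigma."""
    omega = eta * y - sum(g * xi for g, xi in zip(gamma, x))
    return (0.5 * math.log(2 / math.pi) + math.log(eta)
            - 0.5 * omega ** 2 + math.log(_N.cdf(-S * omega * lam)))

def gradient_olsen(y, x, gamma, eta, lam, S=1):
    """Derivatives with respect to (gamma, eta, lambda)."""
    omega = eta * y - sum(g * xi for g, xi in zip(gamma, x))
    a = -S * omega * lam
    delta = _N.pdf(a) / _N.cdf(a)
    common = omega + S * lam * delta
    g_gamma = [common * xi for xi in x]
    g_eta = 1 / eta - common * y
    g_lam = -S * delta * omega
    return g_gamma, g_eta, g_lam

# Made-up observation and parameter values for illustration.
y, x = 1.2, [1.0, 0.5]
beta, lam, sigma = [0.8, 0.6], 1.4, 0.3
eta, gamma = 1 / sigma, [b / sigma for b in beta]
print(loglik_sigma(y, x, beta, lam, sigma), loglik_olsen(y, x, gamma, eta, lam))
print(gradient_olsen(y, x, gamma, eta, lam))
```

A finite difference comparison of `gradient_olsen` against `loglik_olsen` is a cheap way to catch sign errors before attempting the Hessian.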


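The log likelihood for the exponential model is

\[ \log L_i = \log\theta + \tfrac12\theta^2\sigma_v^2 + \theta S\varepsilon_i + \log\Phi[-S\varepsilon_i/\sigma_v - \theta\sigma_v]. \]

The parameter θ in the exponential model is 1/σu. The Olsen transformation is not useful for this model. Define

\[ c_i = -S\varepsilon_i/\sigma_v - \theta\sigma_v, \qquad \delta_i = \phi(c_i)/\Phi(c_i), \qquad \Delta_i = -c_i\delta_i - \delta_i^2, \qquad a_i = S\varepsilon_i/\sigma_v^2 - \theta . \]

The gradient and Hessian for the exponential model, with the parameters ordered (β, θ, σv), are

\[
\frac{\partial \log L_i}{\partial\begin{pmatrix}\beta\\ \theta\\ \sigma_v\end{pmatrix}}
= \delta_i\begin{pmatrix} Sx_i/\sigma_v\\ -\sigma_v\\ a_i\end{pmatrix}
+ \begin{pmatrix} -\theta Sx_i\\ 1/\theta + \theta\sigma_v^2 + S\varepsilon_i\\ \theta^2\sigma_v\end{pmatrix}
\]

\[
\frac{\partial^2 \log L_i}{\partial\begin{pmatrix}\beta\\ \theta\\ \sigma_v\end{pmatrix}\partial\begin{pmatrix}\beta' & \theta & \sigma_v\end{pmatrix}}
= \Delta_i\begin{pmatrix} x_ix_i'/\sigma_v^2 & -Sx_i & (Sa_i/\sigma_v)x_i\\ -Sx_i' & \sigma_v^2 & -a_i\sigma_v\\ (Sa_i/\sigma_v)x_i' & -a_i\sigma_v & a_i^2\end{pmatrix}
+ \begin{pmatrix} 0 & -Sx_i & -(S\delta_i/\sigma_v^2)x_i\\ -Sx_i' & -1/\theta^2 + \sigma_v^2 & 2\theta\sigma_v - \delta_i\\ -(S\delta_i/\sigma_v^2)x_i' & 2\theta\sigma_v - \delta_i & \theta^2 - 2S\delta_i\varepsilon_i/\sigma_v^3\end{pmatrix}
\]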
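The exponential log likelihood and its first derivatives can be verified the same way. This Python sketch is illustrative only: the function names are hypothetical, the residual and regressor values are made up, and the derivatives are those obtained by differentiating log Li with respect to (β, θ, σv), using ai = Sεi/σv² - θ for the derivative of the argument of Φ with respect to σv.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def loglik_exponential(eps, theta, sigma_v, S=1):
    """One observation's log likelihood for the normal-exponential model."""
    c = -S * eps / sigma_v - theta * sigma_v
    return (math.log(theta) + 0.5 * theta ** 2 * sigma_v ** 2
            + theta * S * eps + math.log(_N.cdf(c)))

def score_exponential(eps, x, theta, sigma_v, S=1):
    """Derivatives with respect to (beta, theta, sigma_v); eps = y - beta'x."""
    c = -S * eps / sigma_v - theta * sigma_v
    delta = _N.pdf(c) / _N.cdf(c)
    a = S * eps / sigma_v ** 2 - theta          # d c / d sigma_v
    g_beta = [S * (delta / sigma_v - theta) * xi for xi in x]
    g_theta = 1 / theta + theta * sigma_v ** 2 + S * eps - delta * sigma_v
    g_sigma_v = theta ** 2 * sigma_v + delta * a
    return g_beta, g_theta, g_sigma_v

# Made-up residual and regressors; theta and sigma_v are taken from the
# exponential cost frontier output purely as plausible magnitudes.
eps, x = 0.05, [1.0, 0.5]
print(loglik_exponential(eps, 13.2651, 0.10709, S=-1))
print(score_exponential(eps, x, 13.2651, 0.10709, S=-1))
```

Because β enters only through εi, the derivative with respect to each βk is -xk times the derivative with respect to εi, which is how the first block of the score is obtained.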

E62.7.2 Alternative Parameterization

Some treatments of the normal-half normal model (e.g., Coelli (1996)) use the alternative parameterization γ = σu²/σ² in the formulation of the log likelihood. This does not change the model, since it is a one to one transformation of the parameters;

\[ \lambda = \sqrt{\frac{\gamma}{1-\gamma}} . \]

The parameterization in terms of λ is more convenient but does not produce different results.
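The mapping between the two parameterizations, and the variance decomposition reported in the output header, can be reproduced with a short Python sketch (an illustration outside LIMDEP; the function name `decompose` is hypothetical). The inputs are the estimates of λ and σ reported for the normal-half normal cost frontier above.

```python
import math

def decompose(lam, sigma):
    """Implied variance quantities from lambda = sigma_u/sigma_v and sigma."""
    sigma_u2 = sigma ** 2 * lam ** 2 / (1 + lam ** 2)
    sigma_v2 = sigma ** 2 / (1 + lam ** 2)
    gamma = sigma_u2 / sigma ** 2
    var_u = (1 - 2 / math.pi) * sigma_u2        # Var[u] for u = |U|
    return {
        "sigma_u": math.sqrt(sigma_u2),
        "sigma_v": math.sqrt(sigma_v2),
        "gamma": gamma,
        "var_share": var_u / (var_u + sigma_v2),
        "lambda_from_gamma": math.sqrt(gamma / (1 - gamma)),
    }

print(decompose(1.36059, 0.17059))
```

The values agree, to the reported precision, with Sigma(u) = .13746, Sigma(v) = .10103, Gamma = .64927 and Var[u]/{Var[u]+Var[v]} = .40216 in the output.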

E62.7.3 Variance Estimator in Frontier 4.1 A number of researchers have used Tim Coelli’s (1996) Frontier 4.1 program for estimation of stochastic frontier models. Frontier 4.1 and LIMDEP use different methods for computing estimators of the asymptotic covariance matrix of the ML estimator. LIMDEP uses either the BHHH estimator or the negative inverse of the Hessian. Frontier 4.1 used the weighting matrix used by the DFP algorithm to approximate the inverse Hessian during the iterations. As a general proposition, we recommend against this ‘estimator,’ and never use it. There is no theoretical assurance of its accuracy if convergence is reached in a finite number of iterations. Nonetheless, we have been asked about this many times. In the interest of methodological advance, LIMDEP provides a command switch, ; F41 that will invoke this estimator. (This is only provided for the stochastic frontier estimators.) No indication is given in the output that this option has been used.


E62.8 Estimating Inefficiency and Efficiency Measures

The main objective of fitting the frontier models is to estimate the inefficiency terms in the stochastic model, ui, by observation. The Jondrow et al. estimator of E[u|v-u] is the standard estimator. This is

\[ \hat E[u\,|\,\varepsilon] = \frac{\sigma\lambda}{1+\lambda^2}\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \qquad \varepsilon = v \pm u, \qquad w = S\varepsilon\lambda/\sigma . \]

(This is an indirect estimator of u. Unfortunately, it is not possible to estimate ui directly from any observed sample information. The various surveys noted earlier discuss the computation and properties of this estimator.) The counterpart for the normal-exponential model is

\[ \hat E[u\,|\,\varepsilon] = \sigma_v\left[\frac{\phi(w)}{1-\Phi(w)} - w\right], \qquad w = S\varepsilon/\sigma_v + \theta\sigma_v . \]
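The two estimators can be sketched in Python as follows (illustrative only, not LIMDEP code; the function names are hypothetical and the residual values are made up). The sketch uses the identity 1 - Φ(w) = Φ(-w) to avoid loss of precision when w is large.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def jlms_half_normal(eps, lam, sigma, S=1):
    """E[u|eps] for the normal-half normal model; w = S*eps*lam/sigma."""
    w = S * eps * lam / sigma
    return sigma * lam / (1 + lam ** 2) * (_N.pdf(w) / _N.cdf(-w) - w)

def jlms_exponential(eps, theta, sigma_v, S=1):
    """E[u|eps] for the normal-exponential model; w = S*eps/sigma_v + theta*sigma_v."""
    w = S * eps / sigma_v + theta * sigma_v
    return sigma_v * (_N.pdf(w) / _N.cdf(-w) - w)

# Cost frontier (S = -1): a larger positive residual implies more inefficiency.
for e in (-0.10, 0.0, 0.10, 0.20):
    print(e, jlms_half_normal(e, 1.36059, 0.17059, S=-1),
          jlms_exponential(e, 13.2651, 0.10709, S=-1))
```

The parameter values are the estimates reported earlier in this section; the residuals are hypothetical.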

These are computed and saved as new variables in your data set with

; Eff = variable name

The ; List specification will also request a listing of this variable. This form is used for all distributions and all variations of the stochastic frontier model. By adding

; Eff = u

to the frontier command, then

KERNEL ; Rhs = u $

we obtain the results below. (We also added the title to the command with ; Title = …) Note an important element of the estimation. The ‘Standard Deviation’ reported below is 0.054895, whereas the estimate of σu is 0.13746. The difference arises because 0.054895 is an estimate of the standard deviation of E[u|ε], not the standard deviation of u.

+---------------------------------------+
| Kernel Density Estimator for U        |
| Observations       =          256     |
| Points plotted     =          256     |
| Bandwidth          =      .016298     |
| Statistics for abscissa values----    |
| Mean               =      .109394     |
| Standard Deviation =      .054895     |
| Minimum            =      .030722     |
| Maximum            =      .350422     |
| ----------------------------------    |
| Kernel Function    =     Logistic     |
| Cross val. M.S.E.  =      .000000     |
| Results matrix     =       KERNEL     |
+---------------------------------------+


Figure E62.4 Analysis of Estimated Inefficiencies

E62.8.1 Estimating Technical or Cost Efficiency

One might be interested in estimating the ‘efficiency’ of the individuals in the sample. The model is usually specified in logs, of the form log y = β′x + v - u. Under this assumption, the efficiency of the individual would be

\[ \mathrm{EFF} = \frac{y}{\text{Optimal } y} \approx \mathrm{Exp}(-u) . \]

This can be obtained with

; Techeff = the variable name

or

; Costeff = the variable name

if you estimate a cost frontier instead. You may compute both inefficiencies and efficiency measures in the same command. Figure E62.5 was obtained by adding ; Costeff = ecu to the FRONTIER command, then requesting the kernel density estimator as before (with the title changed accordingly).


Figure E62.5 Estimated Cost Efficiencies

E62.8.2 Confidence Intervals for Inefficiency and Efficiency Estimates

Horrace and Schmidt (1996, 2000) suggest a useful extension of the Jondrow et al. result. JLMS have shown that the distribution of ui|εi is that of a N[μi*,σ*²] random variable, truncated from the left at zero, where μi* = -εiλ²/(1+λ²) and σ* = σλ/(1+λ²). This result and standard results for the truncated normal distribution (see, e.g., Greene (2011)) can be used to obtain the conditional mean and variance of ui|εi. With these in hand, one can construct some of the features of the distribution of ui|εi or E[TEi|εi] = E[exp(-ui)|εi]. The literature on this subject, including the important contributions of Bera and Sharma (1999) and Kim and Schmidt (2000), refers generally to ‘confidence intervals’ for ui|εi. For reasons that will be clear shortly, we will not use that term, at least not until we have made more precise what we are estimating.

For locating 100(1-α)% of the conditional distribution of ui|εi, we use the following system of equations:

\[ \sigma^2 = \sigma_v^2 + \sigma_u^2, \qquad \lambda = \sigma_u/\sigma_v, \]
\[ \mu_i^* = -\varepsilon_i\sigma_u^2/\sigma^2 = -\varepsilon_i\lambda^2/(1+\lambda^2), \qquad \sigma^* = \sigma_u\sigma_v/\sigma = \sigma\lambda/(1+\lambda^2), \]
\[ LB_i = \mu_i^* + \sigma^*\,\Phi^{-1}\!\left[1 - \left(1 - \tfrac{\alpha}{2}\right)\Phi(\mu_i^*/\sigma^*)\right], \]
\[ UB_i = \mu_i^* + \sigma^*\,\Phi^{-1}\!\left[1 - \tfrac{\alpha}{2}\,\Phi(\mu_i^*/\sigma^*)\right]. \]

Then, if the elements were the true parameters, the region [LBi,UBi] would encompass 100(1-α)% of the distribution of ui|εi. For constructing ‘confidence intervals’ for technical efficiency, TEi|εi, it is necessary only to compute TEUBi = exp(-LBi) and TELBi = exp(-UBi).


We note two caveats about the estimator. First, the received papers based on classical methods have labeled this a confidence interval for ui. However, it is a range that encompasses 100(1-α)% of the probability in the conditional distribution of ui|εi. The interval is ‘centered’ at the estimator of the conditional mean, E[ui|εi], not the estimator of ui itself, as a conventional ‘confidence interval’ would be. The estimator is actually characterizing the conditional distribution of ui|εi, not constructing any kind of interval that brackets a particular ui; that is not possible. Second, these limits are conditioned on known values of the parameters, so they ignore any variation in the parameter estimates used to construct them. Thus, we regard this as a minimal width interval.

You can request computation of these lower and upper bounds by adding

; CI(100(1-α)) = lower, upper

where 100(1-α) is one of 90, 95, or 99 and lower, upper are names for two variables that will be created. You may use this feature with ; Eff = variable or ; Techeff = variable (or ; Costeff = variable for a cost frontier). If you have both ; Eff and ; Techeff in the command, the confidence intervals are computed for ; Techeff. (You can obtain the interval for ; Eff in this case by computing the negatives of the logs with CREATE.)

We obtained these bounds for our cost function with

; Costeff = euc ; CI(95) = eucl,eucu

We followed the estimation with

PLOT ; Rhs = eucl,ecu,eucu
     ; Title = Upper and Lower Bound Estimates of Cost Efficiency
     ; Vaxis = Cost Efficiency $

to obtain Figure E62.6.

Figure E62.6 Lower and Upper Bound Estimates of Cost Efficiency


The centipede plot is also a useful device in this context. The following redraws Figure E62.6 using a different view for the lower and upper bounds.

CREATE ; Firm_i = Trn(1,1) $
PLOT ; Lhs = firm_i ; Rhs = eucl,eucu ; Centipede
     ; Endpoints = 0,260 ; Grid
     ; Title = Confidence Limits for Cost Efficiency $

Figure E62.7 Centipede Plot of Efficiency Bounds

E62.8.3 Partial Effects on Efficiencies

The variables in the production or cost frontier function begin with either the inputs for the production model or input prices and outputs in the cost model. Analyses of how these variables affect technical or cost efficiency are not likely to be particularly revealing. However, if the function includes environmental variables (we call these zi), it might be of interest to examine how variation in these impacts efficiency. For our example, we consider

\[ \log(\mathrm{Cost}/P_p) = \alpha + \beta_q\log Q + \beta_{qq}\log^2 Q + \sum_k \beta_k\log(P_k/P_p) + \delta_L\,\text{load factor} + \delta_N\,\text{nodes} + \delta_S\log\text{stage length} + v + u . \]

In this case, it might be interesting to examine how increased load factor, route complexity, or stage length impact efficiency. Expressions for the technical inefficiency values appear at the beginning of Section E62.8. In those expressions, we will use

\[ \text{Efficiency} = \exp\{-\hat E[u\,|\,\varepsilon]\}. \]

The two expressions for the normal and exponential models are functions of a w(ε) that is specific to the model. Each may be written as

\[ \text{Efficiency} = \exp\{-\tau_m A[w_m(\varepsilon)]\} \]

where m = half normal or exponential, τm = σλ/(1+λ²) for the half normal and σv for the exponential, and wm is defined earlier. We now suppose that ε = y - β′x - δ′z where x contains the theoretical inputs to the goal and z contains the environmental variables. We require the derivatives with respect to z. For convenience, let W = -w and exploit the symmetry of the normal density. Then, A[wm(ε)] = [φ(W)/Φ(W) + W]. The derivative is

∂Efficiency/∂z = Efficiency × (-τm) × dA(W)/dW × (-1) × ∂wm/∂ε × (-δ).

The two terms that we need to complete the derivation are ∂wm/∂ε = Sλ/σ for the half normal model and S/σv for the exponential model, and

\[ \frac{dA(W)}{dW} = 1 - W\frac{\phi(W)}{\Phi(W)} - \left[\frac{\phi(W)}{\Phi(W)}\right]^2 = D(W). \]

Collecting terms,

\[ \frac{\partial\,\text{Efficiency}}{\partial z} = \text{Efficiency}\times D(W)\times\left[\frac{\lambda^2}{1+\lambda^2}\ \text{or}\ 1\right]\times S\times(-\delta). \]

We can sign this result, though the magnitude will be empirical. The first three terms are all between zero and one, as is their product. S is either +1 for a production frontier or -1 for a cost frontier. Thus, in total, the derivative is a fraction of the corresponding coefficient, which takes the same sign for a cost frontier and the opposite sign for a production frontier.

Partial derivatives and simulations are computed with PARTIALS and SIMULATE. The general approach would be

FRONTIER ; Cost (optional) ; Lhs = goal variable
         ; Rhs = one, x variables, z variables $

The command might also contain ; Eff = variable, ; Techeff = variable or ; Costeff = variable. Then, you may follow it with

PARTIALS ; Effects: variables desired ; other options $

or

SIMULATE ; Scenario … all options $

The function analyzed in these two commands is the technical or cost efficiency,

\[ \text{Efficiency} = \exp\{-\hat E[u\,|\,\varepsilon]\}. \]
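The derivative for the half normal case can be checked against a finite difference. The Python sketch below is illustrative only and is not what PARTIALS does internally; the function names are hypothetical and the residual value is made up. The parameter values are of the magnitude reported for the cost frontier with environmental variables.

```python
import math
from statistics import NormalDist

_N = NormalDist()

def efficiency_half_normal(eps, lam, sigma, S=1):
    """exp(-E[u|eps]) written as exp(-tau * A(W)), W = -S*eps*lam/sigma."""
    tau = sigma * lam / (1 + lam ** 2)
    W = -S * eps * lam / sigma
    return math.exp(-tau * (_N.pdf(W) / _N.cdf(W) + W))

def d_efficiency_dz(eps, lam, sigma, delta_z, S=1):
    """Efficiency * D(W) * lam^2/(1+lam^2) * S * (-delta_z)."""
    W = -S * eps * lam / sigma
    r = _N.pdf(W) / _N.cdf(W)
    D = 1 - W * r - r * r
    kappa = lam ** 2 / (1 + lam ** 2)
    return efficiency_half_normal(eps, lam, sigma, S) * D * kappa * S * (-delta_z)

# Cost frontier (S = -1) with a negative coefficient on z.
lam, sigma, delta_z, eps0 = 0.95827, 0.12539, -0.99466, 0.05
print(efficiency_half_normal(eps0, lam, sigma, S=-1))
print(d_efficiency_dz(eps0, lam, sigma, delta_z, S=-1))
```

For the cost frontier the derivative carries the sign of the coefficient and is smaller in magnitude, as stated in the text.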


The following demonstrates using the cost frontier, with variables z = (load factor, log stage length, points served). Data on z are missing for one of the firms.

CREATE ; logstage = Log(stage) $
NAMELIST ; x = one,ly,ly2,lpkp,lplp,lpmp,lpep,lpfp
         ; z = loadfctr,logstage,points $
FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z
         ; Eff = u ; Costeff = euc ; CI(95) = eucl,eucu $
SIMULATE ; Scenario: & loadfctr = .4(.025)1 ; Plot(ci) $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LC
Log likelihood function       215.15699
Estimation based on N =    256, K =  13
Inf.Cr.AIC  =   -404.3 AIC/N =   -1.579
Variances: Sigma-squared(v)=     .00820
           Sigma-squared(u)=     .00753
           Sigma(v)        =     .09054
           Sigma(u)        =     .08676
Sigma = Sqr[(s^2(u)+s^2(v)]=     .12539
Gamma = sigma(u)^2/sigma^2 =     .47870
Var[u]/{Var[u]+Var[v]}     =     .25020
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          214.75424
Chi-sq=2*[LogL(SF)-LogL(LS)] =     .806
Kodde-Palm C*: 95%: 2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|    9.19939      21.64273      .43   .6708  -33.21957  51.61835
      LY|     .97398***     .01751    55.63   .0000     .93966   1.00829
     LY2|     .05123***     .01029     4.98   .0000     .03106    .07140
    LPKP|     .49455       1.69257      .29   .7701   -2.82283   3.81193
    LPLP|     .13721*       .08121     1.69   .0911    -.02195    .29637
    LPMP|     .45863       1.11624      .41   .6812   -1.72915   2.64642
    LPEP|    -.10302        .53634     -.19   .8477   -1.15422    .94818
    LPFP|    -.02090        .01794    -1.16   .2441    -.05607    .01427
LOADFCTR|    -.99466***     .17446    -5.70   .0000   -1.33660   -.65273
LOGSTAGE|    -.17940***     .02531    -7.09   .0000    -.22902   -.12979
  POINTS|     .00164***     .00031     5.20   .0000     .00102    .00225
        |Variance parameters for compound error
  Lambda|     .95827***     .16869     5.68   .0000     .62763   1.28890
   Sigma|     .12539***     .00039   321.29   .0000     .12463    .12616
--------+--------------------------------------------------------------------


---------------------------------------------------------------------
Model Simulation Analysis for JLMS efficiency estimator in SF model
---------------------------------------------------------------------
Simulations are computed by average over sample observations
---------------------------------------------------------------------
User Function     Function    Standard
(Delta method)       Value       Error     |t|   95% Confidence Interval
---------------------------------------------------------------------
Avrg. Function      .93354      .00635   147.07      .92110     .94598
LOADFCTR=   .40     .95844      .00346   277.19      .95166     .96522
LOADFCTR=   .43     .95502      .00344   277.54      .94827     .96176
LOADFCTR=   .45     .95123      .00357   266.70      .94424     .95822
LOADFCTR=   .48     .94706      .00392   241.56      .93937     .95474
LOADFCTR=   .50     .94247      .00456   206.48      .93353     .95142
LOADFCTR=   .53     .93746      .00552   169.87      .92664     .94828
(some rows omitted)
LOADFCTR=   .83     .84622      .03145    26.91      .78458     .90786
LOADFCTR=   .85     .83696      .03384    24.73      .77063     .90329
LOADFCTR=   .88     .82763      .03616    22.89      .75676     .89850
LOADFCTR=   .90     .81827      .03839    21.32      .74303     .89352
LOADFCTR=   .93     .80892      .04053    19.96      .72947     .88836
LOADFCTR=   .95     .79958      .04259    18.78      .71611     .88305
LOADFCTR=   .98     .79029      .04455    17.74      .70296     .87761

Figure E62.8 Simulated Cost Efficiency Values

We have also analyzed the partial effects.

FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $
PARTIALS ; Effects: loadfctr & loadfctr = .4(.025)1 ; Plot(ci) $
PARTIALS ; Effects: z ; Summary $

---------------------------------------------------------------------
Partial Effects Analysis for JLMS efficiency estimator in SF model
---------------------------------------------------------------------
Effects on function with respect to LOADFCTR
Results are computed by average over sample observations
Partial effects for continuous LOADFCTR computed by differentiation
Effect is computed as derivative      = df(.)/dx
---------------------------------------------------------------------
df/dLOADFCTR       Partial    Standard
(Delta method)      Effect       Error     |t|   95% Confidence Interval
---------------------------------------------------------------------
APE. Function      -.22444      .06690     3.35     -.35557    -.09331
LOADFCTR=   .40    -.13020      .02575     5.06     -.18067    -.07973
LOADFCTR=   .43    -.14405      .03134     4.60     -.20547    -.08263
LOADFCTR=   .45    -.15900      .03766     4.22     -.23281    -.08519
LOADFCTR=   .48    -.17497      .04464     3.92     -.26246    -.08748
(Some rows omitted)
LOADFCTR=   .85    -.37205      .09615     3.87     -.56051    -.18359
LOADFCTR=   .88    -.37392      .09265     4.04     -.55551    -.19234
LOADFCTR=   .90    -.37452      .08896     4.21     -.54887    -.20017
LOADFCTR=   .93    -.37403      .08524     4.39     -.54109    -.20697
LOADFCTR=   .95    -.37265      .08160     4.57     -.53259    -.21271
LOADFCTR=   .98    -.37054      .07813     4.74     -.52368    -.21739

Figure E62.9 Partial Effects of Load Factor

---------------------------------------------------------------------
Partial Effects for JLMS efficiency estimator in SF model
Partial Effects Averaged Over Observations
* ==> Partial Effect for a Binary Variable
---------------------------------------------------------------------
                   Partial    Standard
(Delta method)      Effect       Error     |t|   95% Confidence Interval
---------------------------------------------------------------------
LOADFCTR           -.25723      .07389     3.48     -.40205    -.11240
LOGSTAGE           -.04620      .01292     3.58     -.07153    -.02088
POINTS              .00035      .00012     2.95      .00012     .00058
---------------------------------------------------------------------


E62.8.4 Partial Effects of Model Variables on Efficiencies The preceding has examined the partial effects with respect to z in the model y = β′x + δ′z + v-Su. It was noted that partial effects with respect to x are not likely to be particularly interesting. Nonetheless, they could be computed. NOTE: Partial effects of variables in the stochastic frontier efficiency models may be computed with respect to any variable in any model, regardless of where those variables appear in the model. That includes x in the original frontier model, z in the means of the truncated regression formats, and z in the variances of the heteroscedasticity models. To continue the earlier example, the partial effect of LogQ could be computed in the cost function using

NAMELIST ; x = one,lq,lq^2,lpmpp,lpfpp,lpepp,lplpp,lpkpp $
NAMELIST ; z = loadfctr,logstage,points $
FRONTIER ; Cost ; Lhs = lcp ; Rhs = x,z $
PARTIALS ; Effects : lq ; summary $

Note that the specification will correctly account for the fact that the square of LogQ appears in the cost function when it computes the partial effects.

E62.8.5 Examining Ranks of Inefficiencies

Researchers often analyze outcome data in which the absolute values of the inefficiencies are not necessarily of interest. Rather, it is the ranking of observations that they wish to analyze. The WHO analysis of health care attainment (see Section E62.4.2) is a prominent example. LIMDEP provides several tools for examining ranks of inefficiencies. First, to rank the raw observations on efficiency or inefficiency, use

CREATE ; rank variable = Rnk(variable) $

The Rnk function sorts the data for you and creates the ranking variable. The observation with the highest value gets the rank of one. The lowest gets a rank of n. Note that tied observations do not get the same rank. Tied observations are ranked in the order in which they appear in the data. For example, in a sample of 100, if 10 observations are tied for third place, they will receive ranks 3 through 12.

Two CALC functions provide descriptive measures for ranks. For two sets of ranks, the Spearman rank correlation coefficient is computed as

\[ \rho = 1 - \frac{6\sum_i d_i^2}{n(n^2-1)}, \qquad d_i = \text{variable1}_i - \text{variable2}_i. \]

The function for computing this is

CALC ; List ; Rkc(variable1,variable2) $

The rank correlation is a correlation coefficient, so it has a natural range of measurement, from -1 to +1. (See the application below.) For more than two sets of ranks, a useful statistic is Kendall’s coefficient of concordance,

\[ W = \frac{12\sum_{i=1}^{n}(S_i - \bar{S})^2}{nK^2(n^2-1)}, \]

where \(S_i = \sum_k \text{rank}_{k,i}\) is the sum of the K ranks assigned to observation i and \(\bar{S}\) is the sample mean of the \(S_i\). To compute this measure, use

CALC ; List ; Cnc(ranks1,...,ranksK) $
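The two rank statistics can also be written out directly from their formulas. The following is a minimal Python sketch, not LIMDEP code; the function names and the five made-up efficiency values are illustrative assumptions, and the ranking helper only mimics the tie rule described for Rnk (highest value ranked first, ties ranked in order of appearance).

```python
def rank_descending(values):
    """Rank 1 for the largest value; ties are ranked in order of appearance."""
    order = sorted(range(len(values)), key=lambda i: -values[i])  # stable sort
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(ranks1, ranks2):
    """rho = 1 - 6*sum(d_i^2) / [n(n^2 - 1)]."""
    n = len(ranks1)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def concordance(*rank_sets):
    """Kendall's W = 12*sum((S_i - Sbar)^2) / [n K^2 (n^2 - 1)]."""
    k = len(rank_sets)
    n = len(rank_sets[0])
    sums = [sum(ranks[i] for ranks in rank_sets) for i in range(n)]
    mean = sum(sums) / n
    ss = sum((s - mean) ** 2 for s in sums)
    return 12.0 * ss / (n * k * k * (n * n - 1))


# Hypothetical efficiency estimates for five observations (illustration only).
eff_a = [0.91, 0.85, 0.97, 0.85, 0.78]
eff_b = [0.88, 0.90, 0.95, 0.80, 0.79]
ranks_a = rank_descending(eff_a)
ranks_b = rank_descending(eff_b)
rho = spearman(ranks_a, ranks_b)
w = concordance(ranks_a, ranks_b)
```

With only two sets of ranks the two measures carry the same information, since W = (ρ + 1)/2 when K = 2; the concordance coefficient becomes useful when there are more than two rankings to compare.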

The concordance coefficient is not a correlation coefficient, so its magnitude is ambiguous. It can be used for a large sample test of discordance. Under the null hypothesis that the sets of ranks are independent, the statistic has a large sample chi squared distribution. In particular,

\[ K(n-1)W \rightarrow \chi^2[n-1]. \]

To illustrate these computations, we have analyzed the WHO data described in Section E62.4.2. We have fit identical stochastic frontier models for the two attainment variables, lcomp, the log of the composite measure, and ldale, the log of disability adjusted life expectancy. We then computed the ranks for the 191 countries and plotted the ranks for the two measures as well as the raw efficiency measures. The simple correlation for the efficiency measures and the rank correlation for the ranks are displayed. The commands are as follows:

NAMELIST ; x = one,logebar,loghbar,loghbar2 $
NAMELIST ; z = gini,lpopden,lgdpc,geff,voice,oecd,lpubthe,tropics $
FRONTIER ; Lhs = logdbar ; Rhs = x,z ; Eff = udale ; techeff = edale $
FRONTIER ; Lhs = logcbar ; Rhs = x,z ; Eff = ucomp ; techeff = ecomp $
CREATE ; dalerank = 192 - Rnk(edale) $
CREATE ; comprank = 192 - Rnk(ecomp) $
PLOT ; Lhs = dalerank ; Rhs = comprank ; Endpoints = 0,200 ; Limits = 0,200
     ; Title = Ranks of Efficiencies: DALE vs. COMP $
PLOT ; Lhs = edale ; Rhs = ecomp ; Endpoints = .8,1 ; Grid
     ; Title = Efficiencies: DALE vs. COMP $
CALC ; List ; Rkc(dalerank,comprank) $
CALC ; List ; Cor(edale,ecomp) $


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable              LOGDBAR
Log likelihood function       155.83849
Estimation based on N =    191, K =  14
Inf.Cr.AIC  =   -283.7 AIC/N =   -1.485
Variances: Sigma-squared(v)=     .00145
           Sigma-squared(u)=     .03288
           Sigma(v)        =     .03808
           Sigma(u)        =     .18134
Sigma = Sqr[(s^2(u)+s^2(v)]=     .18529
Gamma = sigma(u)^2/sigma^2 =     .95777
Var[u]/{Var[u]+Var[v]}     =     .89180
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          141.59006
Chi-sq=2*[LogL(SF)-LogL(LS)] =   28.497
Kodde-Palm C*: 95%:  2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
 LOGDBAR|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|     2.60812***      .18255    14.29  .0000     2.25034    2.96590
 LOGEBAR|      .11227***      .01869     6.01  .0000      .07564     .14891
 LOGHBAR|      .30118***      .05072     5.94  .0000      .20177     .40059
LOGHBAR2|     -.02710***      .00455    -5.96  .0000     -.03601    -.01818
    GINI|     -.30417***      .10600    -2.87  .0041     -.51192    -.09642
 LPOPDEN|      .00213         .00402      .53  .5955     -.00574     .01001
   LGDPC|      .07541***      .02424     3.11  .0019      .02789     .12293
    GEFF|     -.00673         .01551     -.43  .6642     -.03714     .02367
   VOICE|      .02093*        .01113     1.88  .0601     -.00089     .04275
    OECD|      .01608         .03055      .53  .5987     -.04381     .07596
 LPUBTHE|      .00974         .01497      .65  .5150     -.01959     .03908
 TROPICS|     -.03703**       .01714    -2.16  .0307     -.07063    -.00344
        |Variance parameters for compound error
  Lambda|     4.76248***     1.22054     3.90  .0001     2.37026    7.15470
   Sigma|      .18529***      .00086   214.30  .0000      .18360     .18698
--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable              LOGCBAR
Log likelihood function       248.18065
Estimation based on N =    191, K =  14
Inf.Cr.AIC  =   -468.4 AIC/N =   -2.452
Variances: Sigma-squared(v)=     .00142
           Sigma-squared(u)=     .00888
           Sigma(v)        =     .03768
           Sigma(u)        =     .09421
Sigma = Sqr[(s^2(u)+s^2(v)]=     .10147
Gamma = sigma(u)^2/sigma^2 =     .86207
Var[u]/{Var[u]+Var[v]}     =     .69429
Stochastic Production Frontier, e = v-u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          241.57767
Chi-sq=2*[LogL(SF)-LogL(LS)] =   13.206
Kodde-Palm C*: 95%:  2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
 LOGCBAR|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|     3.21081***      .10704    30.00  .0000     3.00101    3.42060
 LOGEBAR|      .06590***      .01319     4.99  .0000      .04004     .09177
 LOGHBAR|      .18617***      .03763     4.95  .0000      .11240     .25993
LOGHBAR2|     -.01509***      .00328    -4.61  .0000     -.02151    -.00867
    GINI|     -.25334***      .07579    -3.34  .0008     -.40189    -.10478
 LPOPDEN|      .00523*        .00281     1.86  .0628     -.00028     .01073
   LGDPC|      .05747***      .01681     3.42  .0006      .02453     .09040
    GEFF|      .00290         .01068      .27  .7858     -.01803     .02384
   VOICE|      .02082**       .00872     2.39  .0170      .00373     .03791
    OECD|      .01699         .01946      .87  .3827     -.02115     .05513
 LPUBTHE|      .01798**       .00903     1.99  .0466      .00027     .03568
 TROPICS|     -.02365**       .01191    -1.99  .0471     -.04700    -.00031
        |Variance parameters for compound error
  Lambda|     2.50000***      .41784     5.98  .0000     1.68104    3.31896
   Sigma|      .10147***      .00045   224.53  .0000      .10058     .10235
--------+--------------------------------------------------------------------
[CALC] *Result*=       .6353076
[CALC] *Result*=       .6062125


Figure E62.10 Ranks and Estimates of Efficiency
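The variance summary lines in the two listings above can be checked by hand from the reported Sigma(u) and Sigma(v). The short Python sketch below is only an arithmetic illustration (it is not LIMDEP code, and the function and variable names are assumptions). It uses the standard half normal result Var[u] = (1 - 2/π)σu², which is why the share Var[u]/{Var[u]+Var[v]} is smaller than Gamma, and it recomputes the likelihood ratio statistics from the two reported log likelihoods. Because the inputs are the rounded values printed in the output, the results agree with the listings only to about three or four digits.

```python
import math


def variance_summary(sigma_u, sigma_v):
    """Recompute the summary measures printed in the FRONTIER header."""
    su2, sv2 = sigma_u ** 2, sigma_v ** 2
    var_u = (1.0 - 2.0 / math.pi) * su2      # variance of half normal u = |U|
    return {
        "lambda": sigma_u / sigma_v,
        "sigma": math.sqrt(su2 + sv2),
        "gamma": su2 / (su2 + sv2),
        "var_share": var_u / (var_u + sv2),
    }


# Sigma(u) and Sigma(v) as printed for the LOGDBAR and LOGCBAR models.
dale = variance_summary(0.18134, 0.03808)
comp = variance_summary(0.09421, 0.03768)

# LR statistic = 2*[LogL(SF) - LogL(LS)], using the printed log likelihoods.
lr_dale = 2.0 * (155.83849 - 141.59006)
lr_comp = 2.0 * (248.18065 - 241.57767)

for name, result, lr in (("DALE", dale, lr_dale), ("COMP", comp, lr_comp)):
    print(name, {key: round(value, 4) for key, value in result.items()}, round(lr, 3))
```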


E62.9 Partially Nonparametric Stochastic Frontier Model

The stochastic frontier is fully parametric in both the deterministic part of the frontier and the distribution of the components of εi. This section examines a partially nonparametric model of the form

y = g(x,z) + v - Su.

The estimator is based on the locally linear regression in Section E9.5. The underlying logic is the result that in the SF model, apart from the constant term, OLS consistently estimates the slope parameters of the model and estimates the constant term with a known bias. For the constant, α, the bias is E[u], the unconditional mean, which in the stochastic frontier model is

\[ E[u] = \sigma_u\sqrt{2/\pi}. \]

Continuing this approach, then, the least squares residuals estimate εi + E[u]. In addition, the least squares residual variance, e′e/n, consistently estimates

\[ \mathrm{Var}[\varepsilon_i] = \theta^2 = \sigma_v^2 + (1 - 2/\pi)\sigma_u^2. \]

The implication is that the only parameter remaining to estimate is \(\sigma_u^2\). In Section E62.6.2, we used the third moment of the OLS residuals and the method of moments to estimate σu, then used this estimate to estimate α, the constant term in the frontier function. The approach proposed here uses this same method with three differences.

1. The residuals used to compute the variance estimator are based on a locally linear, nonparametric estimator of the deterministic function.

2. The remaining parameter to be estimated in this case is λ rather than σu. We will base the estimation on the result \(\sigma_u^2 = \sigma^2\lambda^2/(1 + \lambda^2)\).

3. The approach will be based on a maximum likelihood estimator rather than the method of moments.

Estimation uses the following steps. We begin with estimation of the conventional normal-half normal frontier model with a linear frontier function in order to obtain initial estimators of λ and of θ². The LOWESS estimator developed in Section E9.5 is then employed to estimate g(x,z) for each point in the sample. The residuals from the estimated functions are used with the estimate of θ² for estimation of λ. With θ² and λ in hand, we can compute the constant term, a set of residuals, and the JLMS estimators of technical or cost efficiency. Technical details appear in Section E62.9.2.
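The step that estimates λ from the nonparametric residuals can be illustrated outside LIMDEP. The Python sketch below is a simplified stand-in under stated assumptions, not the program's implementation: it treats the production case (S = +1), substitutes simulated, mean-centered residuals for the LOWESS residuals, and maximizes the one-parameter log likelihood by a plain grid search rather than by derivative-based iterations. The function names are hypothetical; the quantities a, s, and m are those defined in Section E62.9.2.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def scale_terms(lam, q2):
    """Return (a, s, m) implied by lambda and the residual variance q2."""
    a = lam / math.sqrt(1.0 + lam * lam)
    s = math.sqrt(q2 / (1.0 - (2.0 / math.pi) * a * a))
    m = a * s * math.sqrt(2.0 / math.pi)      # implied estimate of E[u]
    return a, s, m


def concentrated_loglik(lam, residuals, q2):
    """Normal-half normal log likelihood as a function of lambda only (S = +1)."""
    a, s, m = scale_terms(lam, q2)
    total = 0.0
    for r in residuals:
        e = r - m                              # residual shifted back by E[u]
        tail = max(STD_NORMAL.cdf(-e * lam / s), 1e-300)   # guard against log(0)
        total += (0.5 * math.log(2.0 / math.pi) - math.log(s)
                  - 0.5 * (e / s) ** 2 + math.log(tail))
    return total


# Stand-in for LOWESS residuals: simulated v - u, centered so that they
# play the role of estimates of e + E[u] (assumption for illustration).
rng = random.Random(12345)
raw = [rng.gauss(0.0, 0.1) - abs(rng.gauss(0.0, 0.2)) for _ in range(500)]
mean_raw = sum(raw) / len(raw)
residuals = [r - mean_raw for r in raw]
q2 = sum(r * r for r in residuals) / len(residuals)

# Crude grid search over lambda in place of Newton-type iterations.
grid = [0.05 * k for k in range(1, 201)]
lam_hat = max(grid, key=lambda lam: concentrated_loglik(lam, residuals, q2))
a_hat, s_hat, m_hat = scale_terms(lam_hat, q2)
sigma_u_hat = a_hat * s_hat
sigma_v_hat = s_hat / math.sqrt(1.0 + lam_hat ** 2)
```

Fixing q² and searching only over λ is the point of the construction: every candidate λ implies values of σ, σu, and σv that reproduce the residual variance exactly, so the search cannot drift away from the nonparametric fit.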


E62.9.1 Application

We have reestimated the airlines cost frontier with the semiparametric estimator. The frontier functions differ noticeably, primarily in the parameter estimates that are statistically insignificant. The kernel estimators suggest, however, that the differences in the estimates of inefficiency are quite modest. The descriptive statistics suggest the same pattern. The final plot shows more graphically how the nonparametric function has changed the estimates. The fact that most of the estimates from the nonparametric estimator lie below the 45 degree line is consistent with the appearance that generally, they are smaller than the parametric values. The last two results are the ordinary (Pearson) correlation and Kendall’s tau.

FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z ; Costeff = eup $
FRONTIER ; Cost ; Lhs = lc ; Rhs = x,z ; Lowess ; Costeff = eunp $
KERNEL ; Rhs = eunp,eup
       ; Title = Estimated Inefficiencies from Parametric and Nonparametric Frontiers $
DSTAT ; Rhs = eup,eunp $
PLOT ; Lhs = eup ; Rhs = eunp ; Rh2 = eup ; Fill ; Grid ; Vaxis = EUNP
     ; Title = Nonparametric vs. Parametric Estimates $
CALC ; List ; Cor(eup,eunp) ; Ktr(eup,eunp) $

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LC
Log likelihood function       215.15699
Estimation based on N =    256, K =  13
Variances: Sigma-squared(v)=     .00820
           Sigma-squared(u)=     .00753
           Sigma(v)        =     .09054
           Sigma(u)        =     .08676
Sigma = Sqr[(s^2(u)+s^2(v)]=     .12539
Gamma = sigma(u)^2/sigma^2 =     .47870
Var[u]/{Var[u]+Var[v]}     =     .25020
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          214.75424
Chi-sq=2*[LogL(SF)-LogL(LS)] =     .806
Kodde-Palm C*: 95%:  2.706, 99%:  5.412
-----------------------------------------------------------------------------
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|     9.19939       21.64273      .43  .6708   -33.21957   51.61835
      LY|      .97398***      .01751    55.63  .0000      .93966    1.00829
     LY2|      .05123***      .01029     4.98  .0000      .03106     .07140
    LPKP|      .49455        1.69257      .29  .7701    -2.82283    3.81193
    LPLP|      .13721*        .08121     1.69  .0911     -.02195     .29637
    LPMP|      .45863        1.11624      .41  .6812    -1.72915    2.64642
    LPEP|     -.10302         .53634     -.19  .8477    -1.15422     .94818
    LPFP|     -.02090         .01794    -1.16  .2441     -.05607     .01427
LOADFCTR|     -.99466***      .17446    -5.70  .0000    -1.33660    -.65273
LOGSTAGE|     -.17940***      .02531    -7.09  .0000     -.22902    -.12979
  POINTS|      .00164***      .00031     5.20  .0000      .00102     .00225
        |Variance parameters for compound error
  Lambda|      .95827***      .16869     5.68  .0000      .62763    1.28890
   Sigma|      .12539***      .00039   321.29  .0000      .12463     .12616
--------+--------------------------------------------------------------------
+-----------------------------------------------+
| Locally linear weighted regression estimation |
| Sample size                               256 |
| Model size                                 11 |
| Band width                            .500000 |
| LOESS Sum of Squared Residuals        1.69637 |
| OLS Sum of Squared Residuals          2.79975 |
| Derivatives Matrix                   LOCLBETA |
+-----------------------------------------------+
Reestimating lambda using residuals based on LOWESS regression
Normal exit:   3 iterations. Status=0, F=   -337.3385
-----------------------------------------------------------------------------
Partially Nonparametric Stochastic Frontier Fit by LOWESS
Dependent variable                   LC
Estmation based on N =    256, K =   11
Variances: Sigma-squared(u)=     .00438
           Sigma(u)        =     .06616
           Sigma-squared(v)=     .00504
           Sigma(v)        =     .07096
Sigma = Sqr[(s^2(u)+s^2(v)]=     .09702
Lambda                     =     .93233
Stochastic Cost Frontier Model, e = v+u
-----------------------------------------------------------------------------
Statistical results are for the sample means of the LOWESS estimated betas.
They are not moments of an asymptotic distribution.
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
Constant|     34.8551       23.42958     1.49  .1368    -11.0661    80.7762
      LY|      .98897***      .05040    19.62  .0000      .89018    1.08775
     LY2|      .04598***      .01677     2.74  .0061      .01310     .07885
    LPKP|     2.48149        1.78813     1.39  .1652    -1.02319    5.98616
    LPLP|      .09976         .10851      .92  .3579     -.11292     .31244
    LPMP|     -.85374        1.34656     -.63  .5261    -3.49295    1.78547
    LPEP|     -.71103         .43514    -1.63  .1023    -1.56389     .14183
    LPFP|     -.02183         .03324     -.66  .5114     -.08698     .04332
LOADFCTR|     -.78691         .65061    -1.21  .2265    -2.06208     .48826
LOGSTAGE|     -.20490*        .11308    -1.81  .0700     -.42653     .01672
  POINTS|      .00225         .00205     1.10  .2710     -.00176     .00627
--------+--------------------------------------------------------------------
Note: ***, **, * ==> Significance at 1%, 5%, 10% level.
-----------------------------------------------------------------------------


Descriptive Statistics
--------+---------------------------------------------------------------------
Variable|       Mean     Std.Dev.     Minimum     Maximum      Cases   Missing
--------+---------------------------------------------------------------------
     EUP|    .933537      .025027     .812486     .975689        256         0
    EUNP|    .948487      .019528     .844732     .983878        256         0
--------+---------------------------------------------------------------------
[CALC] *Result*=       .8690148
[CALC] *Result*=       .6339461
Calculator: Computed   2 scalar results

Figure E62.11 Kernel Estimators of Inefficiency Distributions

Figure E62.12 Plot of Nonparametric Estimates vs. Parametric Estimates


E62.9.2 Technical Details

The log likelihood function for the normal-half normal model is the sum of

\[ \log L_i = \tfrac{1}{2}\log(2/\pi) - \log\sigma - \tfrac{1}{2}(\varepsilon_i/\sigma)^2 + \log\Phi[-S\varepsilon_i\lambda/\sigma]. \]

The value of \(\theta^2 = \sigma_v^2 + (1 - 2/\pi)\sigma_u^2\) is estimated using the squared LOWESS residuals; it is the sample variance, \(q^2\). The LOWESS residuals, themselves, are estimates of εi + E[ui]. With \(q^2\) and the residuals in hand, the log likelihood is a function only of λ. During the iteration, we compute

\[ a = \lambda/(1+\lambda^2)^{1/2}, \qquad s^2 = q^2/[1 - (2/\pi)a^2], \qquad s = \sqrt{s^2}, \]

\[ m = a\,s\sqrt{2/\pi}, \qquad e_i = \text{residual}_i - m. \]

These residuals and s are used to compute log Li and the derivative with respect to λ. This estimation step provides the estimator of λ that we need to compute the efficiencies. After estimation of λ, computation of the JLMS estimates of inefficiency is done the same as in the parametric form of the model, using the LOWESS residuals.

E62.10 The Normal-Gamma Model

The normal-gamma model is the remaining distributional form of the stochastic frontier model. Under this specification,

\[ u_i \sim \frac{\theta^P}{\Gamma(P)}\exp(-\theta u_i)\,u_i^{P-1}, \qquad u_i \ge 0,\ P > 0,\ \theta > 0. \]

This model is more flexible than the half normal or exponential model in that, with two parameters, it allows both the shape and location to vary independently. (The truncation model does likewise, but it is considerably more difficult to estimate.) To specify the gamma model, use

; Model = Gamma (or ; Model = G)

The normal-gamma model is estimated by the method of simulated maximum likelihood. (See Greene (2000b) and the details in Section E62.10.2.) The counterpart to the JLMS estimator of the inefficiency, E[u|ε], must also be estimated by simulation.


E62.10.1 Application of the Normal-Gamma Model

We illustrate the gamma model by fitting a cost frontier model with normal-gamma inefficiency. For comparison, we have also fit the exponential model, which results when P is constrained to equal one. (The exponential model is fit directly by its own log likelihood, not by constraining P to equal one in the gamma model.) We have also computed the inefficiencies for the two models, and plotted kernel density estimators to compare them. The commands are

FRONTIER ; Lhs = lc ; Rhs = x ; Cost ; Model = Gamma ; Costeff = eucg ; Pts = 50 ; Halton $
FRONTIER ; Lhs = lc ; Rhs = x ; Cost ; Model = Exponential ; Costeff = euce $
KERNEL ; Rhs = eucg,euce
       ; Title = Kernel Density Estimates for E[u|e,exponential and gamma] $

Based on the Wald and likelihood ratio tests, we cannot reject the hypothesis of the exponential model (the estimate of P is close to one). The similarity of the kernel density estimators is consistent with this finding.

-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                   LC
Log likelihood function       159.94270
Estimation based on N =    256, K =  11
Inf.Cr.AIC  =   -297.9 AIC/N =   -1.164
Model estimated: Aug 22, 2011, 22:09:16
Normal-Gamma frontier model
Variances: Sigma-squared(v)=     .01169
           Sigma-squared(u)=     .00547
           Sigma(v)        =     .10814
           Sigma(u)        =     .07399
Stochastic Cost Frontier Model, e = v+u
Half Normal:u(i)=|U(i)|; frontier model
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] =    4.055
Kodde-Palm C*: 95%:  2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|     22.9007       27.13658      .84  .3987    -30.2860    76.0874
      LY|      .96086***      .02028    47.38  .0000      .92112    1.00061
     LY2|      .09283***      .01327     7.00  .0000      .06682     .11883
    LPKP|     1.67283        2.12387      .79  .4309    -2.48987    5.83553
    LPLP|     -.01112         .06724     -.17  .8687     -.14290     .12066
    LPMP|     -.07676        1.37564     -.06  .9555    -2.77297    2.61944
    LPEP|     -.63376         .68533     -.92  .3551    -1.97698     .70946
    LPFP|     -.06405***      .02311    -2.77  .0056     -.10934    -.01876
        |Variance parameters for compound error
   Theta|     12.4180**      5.05037     2.46  .0139      2.5194    22.3165
       P|      .84426         .69128     1.22  .2220     -.51062    2.19913
  Sigmav|      .10814***      .01148     9.42  .0000      .08563     .13064
--------+--------------------------------------------------------------------


-----------------------------------------------------------------------------
Log likelihood function       159.89917
Exponential frontier model
Variances: Sigma-squared(v)=     .01147
           Sigma-squared(u)=     .00568
           Sigma(v)        =     .10709
           Sigma(u)        =     .07539
Stochastic Cost Frontier Model, e = v+u
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0          157.91523
Chi-sq=2*[LogL(SF)-LogL(LS)] =    3.968
Kodde-Palm C*: 95%:  2.706, 99%:  5.412
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
      LC|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|     22.6569       25.48354      .89  .3740    -27.2899    72.6038
      LY|      .96069***      .01892    50.77  .0000      .92360     .99777
     LY2|      .09281***      .01249     7.43  .0000      .06832     .11729
    LPKP|     1.65439        1.99409      .83  .4067    -2.25395    5.56272
    LPLP|     -.00962         .09785     -.10  .9217     -.20140     .18216
    LPMP|     -.06595        1.31569     -.05  .9600    -2.64465    2.51275
    LPEP|     -.62841         .63243     -.99  .3204    -1.86795     .61114
    LPFP|     -.06397***      .02033    -3.15  .0017     -.10381    -.02412
        |Variance parameters for compound error
   Theta|     13.2651***     2.90719     4.56  .0000      7.5671    18.9630
  Sigmav|      .10709***      .00980    10.93  .0000      .08788     .12629
--------+--------------------------------------------------------------------

Figure E62.13 Kernel Density Estimates for Gamma and Exponential Inefficiencies
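The Wald and likelihood ratio tests of the exponential model (P = 1) against the gamma model can be reproduced from the two listings in this section. The Python lines below are a plain arithmetic sketch, not LIMDEP output; the variable names are illustrative, and the critical values 1.960 and 3.841 are the usual 95% points of the standard normal and chi squared[1] distributions.

```python
# Estimates copied from the normal-gamma and exponential listings.
p_hat, se_p = 0.84426, 0.69128           # P and its standard error
logl_gamma = 159.94270
logl_exponential = 159.89917

wald_z = (p_hat - 1.0) / se_p            # Wald statistic for H0: P = 1
lr_stat = 2.0 * (logl_gamma - logl_exponential)

Z_95 = 1.960                             # two-sided 95% critical value, N[0,1]
CHI2_1_95 = 3.841                        # 95% critical value, chi squared[1]

reject_wald = abs(wald_z) > Z_95
reject_lr = lr_stat > CHI2_1_95
print(f"Wald z = {wald_z:.3f}, LR = {lr_stat:.3f}")
print("Reject P = 1:", reject_wald or reject_lr)
```

Both statistics are far below their critical values, which is the sense in which the exponential model is not rejected here.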


E62.10.2 Technical Details on Normal-Gamma Model

The log likelihood for this model is equal to the log likelihood for the normal-exponential model plus a term that is produced by the difference between the exponential and the gamma distributions;

\[ \log L = \log L(\text{exponential}) + n[(P-1)\log\theta - \log\Gamma(P)] + \sum_i \log h(P-1,\varepsilon_i) \]

where

\[ h(r,\varepsilon_i) = \frac{\int_0^\infty z^r\,(1/\sigma_v)\,\phi[(z-\eta_i)/\sigma_v]\,dz}{\int_0^\infty (1/\sigma_v)\,\phi[(z-\eta_i)/\sigma_v]\,dz}, \qquad \eta_i = -\varepsilon_i - \theta\sigma_v^2. \]

The normal-exponential model results if P = 1. Computation of the function h(r,εi) is the obstacle to estimation. Beckers and Hammond (1987) derived a closed form expression, but the result has never been operationalized – it is complex in the extreme. Greene (1990) attempted estimation by using a crude approximation with Simpson’s rule, but failed to obtain reasonable results. (See Ritter and Simar (1997).)

A satisfactory solution is produced by the technique of maximum simulated likelihood. The integral and its derivatives can be estimated consistently by Monte Carlo simulation. The crucial result is that h(r,εi) is the expectation of a random variable;

\[ h(r,\varepsilon_i) = E[z^r \mid z \ge 0] \]

where \(z \sim N[\eta_i, \sigma_v^2]\) and \(\eta_i = -\varepsilon_i - \theta\sigma_v^2\).

Therefore, h(r,εi) is the expected value of \(z^r\) where z has a normal distribution truncated at zero. Thus, we estimate h(r,εi) by using the mean of a sample of draws from this distribution. For given values of εi and ηi (i.e., yi, xi, β, σv, θ, r), h(r,εi) is consistently estimated by

\[ \hat{h}_i = \frac{1}{Q}\sum_{q=1}^{Q} z_{iq}^{\,r} \]

where \(z_{iq}\) is a random draw from the truncated normal distribution with mean parameter ηi and standard deviation parameter σv. This produces the simulated log likelihood function

\[ \log L_S = \log L(\text{exponential}) + n[(P-1)\log\theta - \log\Gamma(P)] + \sum_i \log \hat{h}(P-1,\varepsilon_i) \]

which for a given set of draws is a smooth and continuous function of the parameters.


Random draws from the truncated distribution are obtained using Geweke’s method as follows: Let

L = truncation point = 0 for this application

µ = the mean of the untruncated distribution = \(-\varepsilon_i - \theta\sigma_v^2\)

σ = the standard deviation of the untruncated distribution = \(\sigma_v\)

\(P_L = \Phi[(L - \mu)/\sigma]\)

F = one draw from U[0,1]

\(z = \mu + \sigma\Phi^{-1}[P_L + F(1 - P_L)]\)

Then, z = the draw from the truncated distribution.
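The draw described above and the simulation estimator of h(r,εi) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than LIMDEP's implementation: the function names and the parameter values are hypothetical, pseudorandom uniforms from the standard library stand in for whatever generator the program uses, and the small clamps only protect the inverse normal CDF and fractional powers from floating point edge cases.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_normal_draw(mu, sigma, f, lower=0.0):
    """One draw from N[mu, sigma^2] truncated to z >= lower, given f ~ U[0,1]."""
    p_lower = STD_NORMAL.cdf((lower - mu) / sigma)      # P_L
    p = p_lower + f * (1.0 - p_lower)
    p = min(max(p, 1e-15), 1.0 - 1e-15)                 # keep inv_cdf inside (0,1)
    return mu + sigma * STD_NORMAL.inv_cdf(p)


def h_hat(r, eps, theta, sigma_v, uniforms):
    """Simulation estimate of h(r, eps) = E[z^r | z >= 0], z ~ N[eta, sigma_v^2]."""
    eta = -eps - theta * sigma_v ** 2
    total = 0.0
    for f in uniforms:
        z = truncated_normal_draw(eta, sigma_v, f)
        total += max(z, 1e-12) ** r                     # z is nonnegative up to rounding
    return total / len(uniforms)


# Hypothetical values: one residual, theta, sigma_v, and Q = 1000 draws.
rng = random.Random(7)
uniforms = [rng.random() for _ in range(1000)]
h_value = h_hat(r=-0.2, eps=-0.3, theta=2.0, sigma_v=0.2, uniforms=uniforms)
```

For r = 1 the simulated value can be compared with the closed form mean of a normal distribution truncated at zero, η + σv φ(η/σv)/Φ(η/σv), which gives a quick check that the draws come from the intended distribution.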

Collecting all terms, then, this produces the simulated log likelihood function:

\[ \log L_S = n\{\log\theta + \tfrac{1}{2}\sigma_v^2\theta^2\} + \sum_i \{\theta d\varepsilon_i + \log\Phi[-(d\varepsilon_i/\sigma_v + \theta\sigma_v)]\} \]

\[ \qquad + \; n[(P-1)\log\theta - \log\Gamma(P)] \]

\[ \qquad + \sum_i \log\left\{ \frac{1}{Q}\sum_{q=1}^{Q}\left( \mu_i + \sigma_v\Phi^{-1}\left[F_{iq} + (1 - F_{iq})\Phi(-\mu_i/\sigma_v)\right] \right)^{P-1} \right\} \]

where

\[ \varepsilon_i = y_i - \beta' x_i, \qquad \mu_i = -\varepsilon_i - \theta\sigma_v^2, \]

and \(F_{iq}\) is a fixed set of Q draws from U[0,1] specific to the individual. Derivatives of h(r,εi) and log h(r,εi) are also estimated by simulation. The JLMS efficiency measure has the simple form

\[ E[u \mid \varepsilon_i] = h(P,\varepsilon_i)/h(P-1,\varepsilon_i). \]

The final consideration is the method of obtaining the draws. The default method is to use the random number generators. Since this is a very computation intensive model, it is usually more efficient to use Halton draws; you can use many fewer Halton draws than random draws to obtain results of the same quality. Halton draws are discussed in Section R24.7. To use Halton draws with this estimator, add

; Halton

to the command. The number of points for either method is specified with

; Pts = the desired number of draws

We have used this feature in the example in the previous section.
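To make the ratio form of the JLMS measure and the role of the Halton points concrete, the following Python sketch evaluates E[u|ε] = h(P,ε)/h(P-1,ε) for a single observation. It is an illustration only: the base-2 radical inverse shown here is the textbook Halton construction and is not claimed to match LIMDEP's generator (which is described in Section R24.7), and the function names and the values of ε, θ, P, and σv are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def halton(index, base=2):
    """Radical inverse of a positive integer: the index-th Halton point in (0,1)."""
    value, fraction = 0.0, 1.0 / base
    while index > 0:
        value += (index % base) * fraction
        index //= base
        fraction /= base
    return value


def h_hat(r, mu, sigma_v, uniforms):
    """Average of z^r over draws from N[mu, sigma_v^2] truncated to z >= 0."""
    p0 = STD_NORMAL.cdf(-mu / sigma_v)
    total = 0.0
    for f in uniforms:
        z = mu + sigma_v * STD_NORMAL.inv_cdf(f + (1.0 - f) * p0)
        total += max(z, 1e-12) ** r          # guard against rounding just below zero
    return total / len(uniforms)


def jlms_gamma(eps, theta, p, sigma_v, uniforms):
    """E[u|eps] = h(P, eps) / h(P-1, eps), with mu = -eps - theta*sigma_v^2."""
    mu = -eps - theta * sigma_v ** 2
    return h_hat(p, mu, sigma_v, uniforms) / h_hat(p - 1.0, mu, sigma_v, uniforms)


# The first 1023 base-2 Halton points cover the grid k/1024 evenly.
draws = [halton(q) for q in range(1, 1024)]
u_hat = jlms_gamma(eps=-0.3, theta=2.0, p=0.84426, sigma_v=0.2, uniforms=draws)
```

Using the same set of points in the numerator and the denominator keeps the ratio smooth in the parameters. When P = 1 the denominator is identically one and the measure reduces to the mean of the truncated normal distribution, the familiar normal-exponential result.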


E62.11 Sample Selection in a Stochastic Frontier Model

This model is a counterpart to familiar models of sample selection. See Greene (2010) for details on the methodology. Additional results appear in Terza (2010). The model is a familiar sample selection form

d* = α′z + w, d = 1(d* > 0)

y = β′x + v - u

u = |U| with U ~ N[0,σu²]

(v,w) ~ bivariate normal with [(0,0),(σv², ρσv, 1)]

(y,x) only observed when d = 1.

Thus, the selection operates through the heterogeneity component of the production model, not the inefficiency. (That is, observation is not viewed as a function of the level of inefficiency.) The model is fit by maximum simulated likelihood. To request it, use LIMDEP’s usual format for sample selection models,

PROBIT ; Lhs = d ; Rhs = variables in z ; Hold $
FRONTIER ; Lhs = y ; Rhs = variables in x ; Selection $

The model must be the base case, half normal, with no panel data application, no truncation, no heteroscedasticity, etc. You may control the simulations with ; Halton and ; Pts. Efficiency and inefficiency estimates are saved as with other models with ; Eff and ; Techeff. However, observations in the nonselected part of the sample are given missing values (-999) for any of these computations. The PARTIALS and SIMULATE commands do not inherit the selection model; these commands are not available after fitting this model.

E62.11.1 Application

The following creates a data set that conforms exactly to the assumptions of the model.

CALC ; Ran(123457) $
SAMPLE ; 1-2000 $
CREATE ; z1 = Rnn(0,1) ; z2 = Rnn(0,1) $
CREATE ; v1 = Rnn(0,1) ; v2 = Rnn(0,1) $
CREATE ; e1 = v1 ; e2 = .7071 * (v1+v2) $
CREATE ; ds = z1 + z2 + e1 ; d = ds > 0 $
CREATE ; u = Abs(Rnn(0,1)) ; x1 = Rnn(0,1) ; x2 = Rnn(0,1) $
CREATE ; y = x1 + x2 + e2 - u $
PROBIT ; Lhs = d ; Rhs = one,z1,z2 ; Hold $
FRONTIER ; Lhs = y ; Rhs = one,x1,x2 ; Selection $


-----------------------------------------------------------------------------
Binomial Probit Model
Dependent variable                    D
Log likelihood function      -825.27526
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
       D|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Index function for probability
Constant|      .03616         .03525     1.03  .3051     -.03294     .10525
      Z1|      .96314***      .04604    20.92  .0000      .87291    1.05338
      Z2|     1.01534***      .04702    21.59  .0000      .92318    1.10750
--------+--------------------------------------------------------------------
Warning 141: Iterations:current or start estimate of sigma nonpositive
Normal exit:  14 iterations. Status=0, F=    1916.202
-----------------------------------------------------------------------------
Limited Dependent Variable Model - FRONTIER
Dependent variable                    Y
Log likelihood function     -1916.20216
Estimation based on N =   2000, K =   6
Inf.Cr.AIC  =   3844.4 AIC/N =    1.922
Variances: Sigma-squared(v)=    1.00545
           Sigma-squared(u)=    1.07396
           Sigma(u)        =    1.03632
           Sigma(v)        =    1.00272
           Sigma           =    1.44202
           Lambda          =    1.03351
Sample Selection/Frontier Model
Murphy/Topel Corrected VC Matrix
LR test for inefficiency vs. OLS v only
Deg. freedom for sigma-squared(u):    1
Deg. freedom for heteroscedasticity:  0
Deg. freedom for truncation mean:     0
Deg. freedom for inefficiency model:  1
LogL when sigma(u)=0        -1662.32532
Chi-sq=2*[LogL(SF)-LogL(LS)] = -507.754
Kodde-Palm C*: 95%:  2.706, 99%:  5.412
-----------------------------------------------------------------------------
--------+--------------------------------------------------------------------
        |                  Standard            Prob.      95% Confidence
       Y|  Coefficient       Error       z    |z|>Z*         Interval
--------+--------------------------------------------------------------------
        |Deterministic Component of Stochastic Frontier Model
Constant|     -.04492         .10971     -.41  .6822     -.25994     .17011
      X1|     1.00102***      .03357    29.82  .0000      .93522    1.06682
      X2|      .95627***      .03195    29.93  .0000      .89364    1.01890
Sigma(u)|     1.03632***      .13217     7.84  .0000      .77728    1.29537
Sigma(v)|     1.00272***      .05471    18.33  .0000      .89549    1.10995
Rho(w,v)|      .77553***      .06187    12.54  .0000      .65427     .89679
--------+--------------------------------------------------------------------


E62.11.2 Log Likelihood and Estimation Method

Write the model structure as

d* = α′z + w, w ~ N[0,1], d = 1(d* > 0)

y = β′x + σv v - σu u

u = |U| with U ~ N[0,1]

(v,w) ~ bivariate normal with [(0,0),(1, ρ, 1)]

(y,x) only observed when d = 1.

(Note for convenience later, we have moved the scale parameters into the structural model.) To set up the estimator, we now write w in its conditional on v form,

w|v = ρv + h where h ~ N[0, (1 - ρ²)] and h is independent of v.

Therefore,

d*|v = α′z + ρv + h, d = 1(d* > 0|v).

Then,

\[ \text{Prob}[d = 1 \text{ or } 0 \mid z, v] = \Phi\left[(2d-1)\,\frac{\alpha' z + \rho v}{\sqrt{1-\rho^2}}\right]. \]

For the selected observations, d = 1, conditioned on v, the joint density for y and d is the product of the marginals since conditioned on v, y and d are independent;

f(y, d = 1|x,z,v) = f(y|x,v) Prob(d = 1|z,v).

We have the second part above. For the first part, y|x,v = (β′x + σv v) - σu u where u is the truncation at zero of a standard normal variable, so f(u) = 2φ(u), u > 0. The Jacobian of the transformation from u to y is 1/σu, so by the change of variable, the conditional density is

\[ f(y \mid x, v) = \frac{2}{\sigma_u}\,\phi\left[\frac{\beta' x + \sigma_v v - y}{\sigma_u}\right], \qquad (\beta' x + \sigma_v v - y) \ge 0. \]

Therefore, the joint conditional density is

\[ f(y, d = 1 \mid x, z, v) = \frac{2}{\sigma_u}\,\phi\left[\frac{\beta' x + \sigma_v v - y}{\sigma_u}\right]\Phi\left[\frac{\alpha' z + \rho v}{\sqrt{1-\rho^2}}\right]. \]

To obtain the unconditional density, it is necessary to integrate v out of the conditional density. Thus,

\[ f(y, d = 1 \mid x, z) = \int_v \frac{2}{\sigma_u}\,\phi\left[\frac{\beta' x + \sigma_v v - y}{\sigma_u}\right]\Phi\left[\frac{\alpha' z + \rho v}{\sqrt{1-\rho^2}}\right] f(v)\,dv. \]

The relevant term in the log likelihood is log f(y,d=1|x,z). For the nonselected observations, the contribution to the log likelihood is the log of the unconditional probability of nonselection, which is

\[ \text{Prob}(d = 0 \mid z) = \int_v \Phi\left[-\frac{\alpha' z + \rho v}{\sqrt{1-\rho^2}}\right] f(v)\,dv. \]

The integrals do not exist in closed form, so these terms cannot be evaluated as is. Before proceeding, we note the additional complication, β′x + σv v - y = σu u > 0, so the density f(v) is not the standard normal that intuition might suggest; it is a truncated normal. The integrals can be computed by simulation. By construction,

\[ \int_v \frac{2}{\sigma_u}\,\phi\left[\frac{\beta' x + \sigma_v v - y}{\sigma_u}\right]\Phi\left[\frac{\alpha' z + \rho v}{\sqrt{1-\rho^2}}\right] f(v)\,dv = E_v\left\{ \frac{2}{\sigma_u}\,\phi\left[\frac{\beta' x + \sigma_v v - y}{\sigma_u}\right]\Phi\left[\frac{\alpha' z + \rho v}{\sqrt{1-\rho^2}}\right] \right\} \]

so by sampling from the distribution of v, we can compute the function of v and average to obtain the integrals. In order to sample the draws on v, we note the implied truncation, v > (y - β′x)/σv or v > ε/σv. Draws from the truncated normal can be obtained using result (E-1) in Greene (2011). Let \(A_r\) equal a draw from the uniform (0,1) population. The desired draw from the truncated normal distribution will be

\[ v_r = \Phi^{-1}\left[\Phi(\varepsilon/\sigma_v) + A_r\,\Phi(-\varepsilon/\sigma_v)\right]. \]

Collecting all terms, then, the simulated log likelihood will be

\[ \log L_S = \sum_i \log \frac{1}{R}\sum_{r=1}^{R}\left\{ d_i\,\frac{2}{\sigma_u}\,\phi\left[\frac{\beta' x_i + \sigma_v v_{ir} - y_i}{\sigma_u}\right]\Phi\left[\frac{\alpha' z_i + \rho v_{ir}}{\sqrt{1-\rho^2}}\right] + (1 - d_i)\,\Phi\left[\frac{-\alpha' z_i - \rho v_{ir}}{\sqrt{1-\rho^2}}\right] \right\} \]

where the draws on \(v_{ir}\) are as shown above. Derivatives of this simulated log likelihood are obtained numerically using finite differences.
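The contribution of a single selected observation (d = 1) to the simulated log likelihood can be sketched as follows. This Python fragment is a hedged illustration of the averaging step only; it is not LIMDEP's implementation. The function names, the numerical values of y, β′x, α′z, σu, σv, and ρ, and the use of 200 pseudorandom uniforms are all assumptions made for the example. Nonselected observations are left out because they involve no observed y. The last lines show a central finite difference in ρ, in the spirit of the numerical derivatives mentioned above.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def truncated_v(lower, a):
    """Draw v > lower from N[0,1]: Phi^-1[Phi(lower) + a*Phi(-lower)], a ~ U(0,1)."""
    p = STD_NORMAL.cdf(lower) + a * STD_NORMAL.cdf(-lower)
    return STD_NORMAL.inv_cdf(min(max(p, 1e-15), 1.0 - 1e-15))


def selected_contribution(y, bx, az, sigma_u, sigma_v, rho, uniforms):
    """Simulated log f(y, d=1 | x, z) for one selected observation.

    bx = beta'x and az = alpha'z are the two index values for the observation.
    """
    eps = y - bx
    root = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for a in uniforms:
        v = truncated_v(eps / sigma_v, a)              # enforces bx + sigma_v*v - y >= 0
        density = (2.0 / sigma_u) * STD_NORMAL.pdf((bx + sigma_v * v - y) / sigma_u)
        selection = STD_NORMAL.cdf((az + rho * v) / root)
        total += density * selection
    return math.log(total / len(uniforms))


# Hypothetical observation and parameter values; the uniforms are held fixed
# so that the simulated function is smooth in the parameters.
rng = random.Random(123457)
uniforms = [rng.random() for _ in range(200)]


def logf_at(rho):
    return selected_contribution(0.4, 1.0, 0.5, 1.0, 1.0, rho, uniforms)


logf = logf_at(0.7)
step = 1e-5
dlogf_drho = (logf_at(0.7 + step) - logf_at(0.7 - step)) / (2.0 * step)
```

Holding the uniform draws fixed across function evaluations is what makes the finite difference meaningful; redrawing them at each evaluation would add simulation noise to the derivative.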
