General linear models in small area estimation: an assessment in agricultural surveys

Carlo Russo, Massimo Sabbatini, and Renato Salvatore
University of Cassino, Italy

The MEXSAI Conference



Some small area estimation references

• Ghosh M., Rao J. N. K. (1994), Small area estimation: an appraisal, Statistical Science, Vol. 9, No. 1, pp. 55-93

• He Z., Sun D. (2000), Hierarchical Bayes estimation of hunting success rates with spatial correlations, Biometrics, Vol. 56, pp. 102-109

• Malec D., Sedransk J., Moriarity C. L., LeClere F. B. (1997), Small area inference for binary variables in the National Health Interview Survey, Journal of the American Statistical Association, Vol. 92, No. 439, pp. 815-826

• Rao J. N. K. (2002), Small area estimation with applications to agriculture, Proceedings of the Conference on agricultural and environmental statistical applications in Rome, Vol. III, pp. 555-564

• Rao J. N. K. (2003), Small area estimation, Wiley, London


Small area estimation: a simple outline

• The term small area usually denotes a small geographical area, such as a county, a province, an administrative area or a census division

• From a statistical point of view, a small area is a small domain, that is, a small subpopulation made up of a specific demographic and socioeconomic group of people within a larger geographical area

• Sample survey data provide reliable estimators of totals and means for large areas and domains. However, the usual direct survey estimators for a small area have unacceptably large standard errors, because the sample size in the area is small


• In fact, sample sizes in small areas are small because the overall sample size of a survey is usually chosen to provide a specified accuracy at a macro level of aggregation, such as national territories, regions and so on (Ghosh and Rao, 1994)

• Small area statistics are important tools for planning agricultural policies in specific regional and administrative areas

• Demand for this information also comes from other sectors, such as the private sector, especially for questions about local social and economic conditions, local area marketing research, and so on


• Small area statistics rely on a collection of statistical methods that “borrow strength” from related or similar small areas through statistical models that connect the variables of interest in small areas with vectors of supplementary data, such as demographic, behavioral and economic information, coming from administrative, census and specific sample survey records

• Efficient small area statistics also provide good local estimates of population, farms, and other characteristics of interest in postcensal years


• The most commonly used techniques for small area estimation are the empirical Bayes (EB), hierarchical Bayes (HB) and empirical best linear unbiased prediction (EBLUP) procedures (Rao, 2003)

• In agricultural statistics, some applications of these techniques involve the use of satellite data and, more generally, of differently oriented sample surveys in model-based frameworks

• There are two types of small area models that include random area-specific effects. In the first type, the basic area level model, the response is linked to area-specific auxiliary variables, because such data are of limited availability at the unit level


• The second type is the unit level model, in which element-specific auxiliary data are available for the population elements (Ghosh and Rao, 1994; Rao, 2002)

• The simplest way to produce small area statistics, however, is to derive synthetic estimates from large area data, under assumptions that relate the large area to its local areas. Synthetic estimators are widely used because they apply to general sampling designs and because they gain efficiency by exploiting information from similar small areas

• The problem is that estimators of this type are potentially design-biased. In the composite approach to small area analysis, the bias of the synthetic estimator is balanced against the instability of the direct estimator by taking a weighted average of the two. The result is a composite estimator
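The weighted average behind the composite estimator can be sketched in a few lines. This is only an illustration: the function name `composite_estimate`, the weight of 0.25 and the two input values are hypothetical and do not come from the slides.

```python
def composite_estimate(direct, synthetic, weight):
    """Weighted average of a direct and a synthetic estimate.

    weight is the share given to the direct estimator (between 0 and 1);
    the remaining share goes to the synthetic estimator.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * direct + (1.0 - weight) * synthetic


# Hypothetical figures: an unstable direct estimate and a stable but
# possibly biased synthetic estimate for the same small area.
print(composite_estimate(direct=120.0, synthetic=100.0, weight=0.25))  # 105.0
```

A weight close to 1 trusts the direct estimator (low bias, high variance), while a weight close to 0 trusts the synthetic one (low variance, possible design bias).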


• Small area models include random area-specific effects in regression-synthetic area estimators

The basic area level model

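1) $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^T$

2) $\theta_i = \mathbf{x}_i^T \beta + z_i u_i, \quad i = 1, \ldots, m$

3) $E(u_i) = 0, \quad V(u_i) = \sigma_u^2$

1) is the vector of auxiliary data

2) gives the parameters of interest, which are assumed to be related to the vector 1)

3) the $u_i$ are iid random effects (normal)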


4) $\hat{\theta}_i = \theta_i + e_i, \quad i = 1, \ldots, m$

4) is the direct area estimator with sampling errors $e_i$

• Combining 2) and 4):

5) $\hat{\theta}_i = \mathbf{x}_i^T \beta + z_i u_i + e_i, \quad i = 1, \ldots, m$

5) this model involves design random variables and, at the same time, model-based random variables. It is an example of a general linear mixed model (GLMM) with diagonal covariance structure
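To make the "diagonal covariance structure" concrete, the marginal moments of the combined model can be written out. The sketch below assumes, as is usual for area level models but not stated on the slide, that the sampling errors $e_i$ are independent with mean zero and known variance $\psi_i$, and that they are independent of the random effects $u_i$. The symbol $\psi_i$ is introduced here for illustration only.

```latex
\begin{align*}
\hat{\theta}_i &= \mathbf{x}_i^T \beta + z_i u_i + e_i, \qquad i = 1, \ldots, m, \\
E(\hat{\theta}_i) &= \mathbf{x}_i^T \beta, \\
V(\hat{\theta}_i) &= z_i^2 \sigma_u^2 + \psi_i, \\
\mathrm{Cov}(\hat{\theta}_i, \hat{\theta}_j) &= 0 \qquad (i \neq j).
\end{align*}
```

Under these assumptions the covariance matrix of the direct estimators is diagonal, with one model component and one sampling component on each diagonal entry.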


• The BLUP (best linear unbiased prediction) estimator is a weighted average of the design-based estimator and the regression-synthetic estimator

• The MSE of the BLUP estimator depends on the variance parameter of the random area effects

• In practical applications this parameter is unknown, and it is replaced by an estimator

• Then, we have a two-stage estimator, called empirical BLUP (EBLUP)

• Although the MSE of the EBLUP estimator is fairly insensitive to the choice of estimator for the variance of the random area effects, it is larger than the MSE of the BLUP estimator, because the variance has to be estimated

• Assuming normality of the random effects, the variance of the area effects can be estimated by either maximum likelihood (ML) or restricted maximum likelihood (REML)
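The two-stage logic of the EBLUP can be sketched for the area level model with $z_i = 1$, an intercept and a single covariate. Several things here are assumptions made for illustration and are not taken from the slides: the sampling variances `psi` are treated as known, the variance of the area effects is estimated with a simple moment estimator rather than ML or REML, and all function names and simulated figures are hypothetical.

```python
import random


def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def moment_sigma2_u(x, y, psi):
    """Moment estimator of the area-effect variance, truncated at zero."""
    m, p = len(y), 2
    b0, b1 = wls_line(x, y, [1.0] * m)  # ordinary least squares fit
    xbar = sum(x) / m
    sxx = sum((xi - xbar) ** 2 for xi in x)
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    # Expected contribution of the sampling errors to the residual sum of
    # squares: sum of psi_i * (1 - h_ii), with h_ii the OLS leverage.
    sampling_part = sum(
        ps * (1.0 - (1.0 / m + (xi - xbar) ** 2 / sxx)) for xi, ps in zip(x, psi)
    )
    return max(0.0, (rss - sampling_part) / (m - p))


def eblup(x, y, psi, sigma2_u=None):
    """BLUP if sigma2_u is supplied, EBLUP if it is estimated from the data."""
    if sigma2_u is None:
        sigma2_u = moment_sigma2_u(x, y, psi)
    weights = [1.0 / (sigma2_u + ps) for ps in psi]
    b0, b1 = wls_line(x, y, weights)
    estimates = []
    for xi, yi, ps in zip(x, y, psi):
        gamma = sigma2_u / (sigma2_u + ps)  # weight of the direct estimator
        estimates.append(gamma * yi + (1.0 - gamma) * (b0 + b1 * xi))
    return estimates, sigma2_u


# Simulated (hypothetical) data for m = 30 areas.
rng = random.Random(1)
m = 30
x = [rng.uniform(0.0, 10.0) for _ in range(m)]
psi = [rng.uniform(0.5, 2.0) for _ in range(m)]
theta = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
y = [t + rng.gauss(0.0, ps ** 0.5) for t, ps in zip(theta, psi)]

estimates, sigma2_hat = eblup(x, y, psi)
print(round(sigma2_hat, 3), [round(e, 2) for e in estimates[:3]])
```

Each estimate is a weighted average of the direct estimate and the regression-synthetic value, and areas with a larger sampling variance are pulled more strongly towards the regression line.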


• The first step of the EB (empirical Bayes) approach is to derive the posterior distribution of the parameters of interest, given the data, assuming that the model parameters are known

• The model parameters are then estimated from the marginal distribution of the data, and inference on the small area parameters of interest is based on the estimated posterior distribution

• Various methods, such as bootstrap methods, have been proposed to address a known problem of the EB approach, the underestimation of the true posterior variance
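For the area level model with $z_i = 1$ and normal errors, the two EB steps can be written explicitly. This is a sketch under assumptions not stated on the slides: the sampling variances $\psi_i$ are taken as known, and the symbols $\psi_i$ and $\gamma_i$ are introduced here for illustration.

```latex
\begin{align*}
\text{Step 1 (posterior, parameters known):}\quad
  & \theta_i \mid \hat{\theta}_i, \beta, \sigma_u^2 \sim
    N\!\left(\gamma_i \hat{\theta}_i + (1 - \gamma_i)\,\mathbf{x}_i^T \beta,\; \gamma_i \psi_i\right),
    \qquad \gamma_i = \frac{\sigma_u^2}{\sigma_u^2 + \psi_i}, \\
\text{Step 2 (marginal, used to estimate } \beta, \sigma_u^2 \text{):}\quad
  & \hat{\theta}_i \sim N\!\left(\mathbf{x}_i^T \beta,\; \sigma_u^2 + \psi_i\right), \\
\text{EB estimator:}\quad
  & \hat{\theta}_i^{EB} = \hat{\gamma}_i \hat{\theta}_i + (1 - \hat{\gamma}_i)\,\mathbf{x}_i^T \hat{\beta}.
\end{align*}
```

The plug-in posterior variance $\hat{\gamma}_i \psi_i$ ignores the uncertainty in $\hat{\beta}$ and $\hat{\sigma}_u^2$, which is the source of the underestimation mentioned above.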


• In the HB (hierarchical Bayes) approach, unlike EBLUP and EB, a prior distribution on the model parameters is specified first, and then the posterior distribution of the parameters of interest is obtained

• The usual small area estimation problems are then solved within the posterior distribution framework. A parameter of interest is estimated by its posterior mean, and the precision of the estimate, in terms of MSE, is measured by the posterior variance

• The HB approach is computationally intensive and in many cases involves high-dimensional integration


• Tools such as Gibbs sampling and importance sampling, the latter used jointly with Monte Carlo numerical integration, are commonly used to overcome some of the computational problems

• In recent years, comparative studies of the EBLUP, EB and HB approaches have in general found close values of the predictors. In particular situations, each of the three can work better than the others
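A minimal Gibbs sampler shows how the HB posterior mean and posterior variance can be obtained without explicit integration. The sketch makes several simplifying assumptions that are not in the slides: the area level model has only a common mean `mu` (no covariates), the sampling variances `psi` are known, `mu` has a flat prior, and the area-effect variance has an inverse-gamma prior with hypothetical hyperparameters `a` and `b`.

```python
import random


def gibbs_area_level(y, psi, n_iter=4000, burn_in=1000, a=2.0, b=1.0, seed=0):
    """Gibbs sampler for y_i ~ N(theta_i, psi_i), theta_i ~ N(mu, sigma2).

    Returns the posterior mean and posterior variance of each theta_i,
    computed from the draws kept after the burn-in period.
    """
    rng = random.Random(seed)
    m = len(y)
    mu, sigma2 = sum(y) / m, 1.0
    theta = list(y)
    draws = [[] for _ in range(m)]
    for it in range(n_iter):
        # theta_i | rest: normal, shrinking the direct estimate towards mu.
        for i in range(m):
            gamma = sigma2 / (sigma2 + psi[i])
            mean_i = gamma * y[i] + (1.0 - gamma) * mu
            theta[i] = rng.gauss(mean_i, (gamma * psi[i]) ** 0.5)
        # mu | rest: normal around the average of the current theta values.
        mu = rng.gauss(sum(theta) / m, (sigma2 / m) ** 0.5)
        # sigma2 | rest: inverse gamma, drawn as the reciprocal of a gamma.
        ss = sum((t - mu) ** 2 for t in theta)
        sigma2 = 1.0 / rng.gammavariate(a + m / 2.0, 1.0 / (b + 0.5 * ss))
        if it >= burn_in:
            for i in range(m):
                draws[i].append(theta[i])
    post_mean = [sum(d) / len(d) for d in draws]
    post_var = [
        sum((v - pm) ** 2 for v in d) / (len(d) - 1)
        for d, pm in zip(draws, post_mean)
    ]
    return post_mean, post_var


# Hypothetical direct estimates and sampling variances for five areas.
y_obs = [10.0, 12.0, 8.0, 11.0, 9.0]
psi_obs = [1.0, 1.0, 1.0, 1.0, 1.0]
post_mean, post_var = gibbs_area_level(y_obs, psi_obs, seed=1)
print([round(v, 2) for v in post_mean])
print([round(v, 2) for v in post_var])
```

The posterior means play the role of the HB point estimates and the posterior variances the role of their precision measures, as described in the bullets above.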


Qualitative data

• Qualitative data are becoming relevant in agricultural economics for two major reasons: first, theoretical developments stress the relevance of discrete and intrinsically qualitative phenomena; second, the increasing sophistication of statistical methods in the field allows economists to draw quantitative conclusions from discrete data

• Qualitative data about households, including the role of women, the availability of services and the presence or absence of infrastructure, are considered relevant factors that require close consideration in any economic model in the field


• Agricultural economists are also interested in the social analysis of rural areas. Segmenting the universe by qualitative variables (such as gender, age, education) helps to define the dynamics of specific groups and to analyze issues of interest

• The shift of the policy focus from producer support to rural development in high-income countries is one of the major factors behind the new interest in the qualitative aspects of agriculture

• In qualitative data analysis, continuous, binary and nonnumeric data are all available from large data sets, such as agricultural census data. Fully exploiting this large amount of information about farms is often feasible only with exploratory data analysis, in particular homogeneity and correspondence analysis


• On the other hand, it is recognized that some complex aspects of farm structure are correctly captured only if all the available information is used in economic models at the same time

• Small area statistics are powerful methods for estimating the characteristics of farms in small areas, but some agricultural policies need further information, especially policies aimed at particular classes of farms

• The membership of farms in well-defined classes, used jointly with other area information, is therefore a basic tool for policy makers

• From this standpoint, it is very useful to develop small area random effects models that combine continuous and categorical predictors and use binary response variables. The goal is to estimate the proportions of farms that fall in given qualitative classes within given small areas


The general logistic-linear mixed model

small area analysis

• The extension of GLMs to binary response variables in small area analysis is given in Malec et al. (1997, 1999). In the application example of that paper, the unit level model combines small area-specific covariates with unit level demographic and socioeconomic data. Estimates relating individuals and classes are then obtained with an HB approach

• In that GLM, it is assumed that each individual in the population is assigned to one of a set of mutually exclusive and exhaustive classes, based on the individual’s demographic and socioeconomic status

• Given a vector of random effects, estimating the parameters of a GLMM for binary responses requires the computation of high-dimensional integrals, with dimension equal to the number of levels of the random factors


• One approach in the literature works within the HB framework. He and Sun (2000) give an example of a hierarchical Bayes estimation procedure for a logistic-linear mixed model of hunting success rates at the sub-area level, for post-season harvest surveys

• The model includes fixed week effects and random geographic effects, with autoregressive (AR) and conditional autoregressive (CAR) structures for the spatial correlations between neighboring sub-areas. As in the case of the logistic-linear GLM above, estimation requires Gibbs sampling procedures


• In the paper we introduce a Monte Carlo Newton-Raphson ML procedure (McCulloch, 1997) for estimating the parameters of the following general logistic-linear mixed model

$\mathrm{logit}(p_{ik}) = \log\left(p_{ik} / (1 - p_{ik})\right) = \mathbf{X}_{ik}^T \beta + u_i$

• The likelihood involves integrals with no closed form expression; we propose to evaluate them numerically with a Monte Carlo approach

• Another problem is how to generate starting values of the parameters in the likelihood expressions when the vector of random effects has not been specified beforehand. A natural solution is the Metropolis algorithm, a simple Markov chain Monte Carlo (MCMC) algorithm
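The idea of evaluating a likelihood integral by Monte Carlo can be illustrated for a single small area. The sketch below averages the conditional likelihood of the binary responses over draws of a normal random area effect; it is a plain Monte Carlo approximation and not the Monte Carlo Newton-Raphson procedure of the paper. The function names, the data layout (one covariate vector per binary response) and the parameter values are hypothetical.

```python
import math
import random


def expit(t):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-t))


def mc_area_likelihood(y, x, beta, sigma_u, n_draws=5000, seed=0):
    """Monte Carlo estimate of one area's marginal likelihood.

    Approximates the integral over u of
        prod_k p_k(u)^y_k * (1 - p_k(u))^(1 - y_k) * N(u; 0, sigma_u^2),
    with logit(p_k(u)) = x_k' beta + u, by averaging over draws of u.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        u = rng.gauss(0.0, sigma_u)  # draw the random area effect
        lik = 1.0
        for yk, xk in zip(y, x):
            p = expit(sum(b * v for b, v in zip(beta, xk)) + u)
            lik *= p if yk == 1 else 1.0 - p
        total += lik
    return total / n_draws


# Hypothetical data: four binary responses in one area, intercept plus
# one covariate.
y_area = [1, 0, 1, 1]
x_area = [[1.0, 0.2], [1.0, -0.5], [1.0, 1.0], [1.0, 0.3]]
print(mc_area_likelihood(y_area, x_area, beta=[0.1, 0.8], sigma_u=1.0))
```

The dimension of the integral grows with the number of random effects, which is why simulation-based methods become attractive for models with many areas.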


• The basic characteristic of MCMC is that the sequence of generated points takes a kind of random walk in parameter space, instead of each point being generated independently of the others

• Moreover, the probability of jumping from one point to another depends only on the last point and not on the entire previous history (this is the defining property of a Markov chain)

• The paper shows the Monte Carlo approach to the Newton-Raphson procedure for estimating the logistic-linear parameters: an iterative procedure that leads to convergent ML estimates, under the assumption of normality
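The random walk behaviour described above can be sketched with a Metropolis sampler for the random effect of one area, given its binary responses. This illustrates the general algorithm and is not the exact sampler used in the paper: the fixed-effect linear predictors `eta` are treated as known, the proposal is a symmetric normal step of hypothetical size `step`, and all names are assumptions made for the example.

```python
import math
import random


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def log_target(u, y, eta, sigma_u):
    """Log of the unnormalised density of u given one area's binary data."""
    ll = 0.0
    for yk, ek in zip(y, eta):
        t = ek + u  # linear predictor: fixed part plus random area effect
        ll += -log1pexp(-t) if yk == 1 else -log1pexp(t)
    return ll - 0.5 * (u / sigma_u) ** 2  # normal density of u, up to a constant


def metropolis_u(y, eta, sigma_u, n_iter=5000, step=1.0, seed=0):
    """Random walk Metropolis chain for the random effect u."""
    rng = random.Random(seed)
    u = 0.0
    lp = log_target(u, y, eta, sigma_u)
    chain, accepted = [], 0
    for _ in range(n_iter):
        # The proposal depends only on the current point (Markov property).
        prop = rng.gauss(u, step)
        lp_prop = log_target(prop, y, eta, sigma_u)
        # Accept with probability min(1, target(prop) / target(u)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            u, lp = prop, lp_prop
            accepted += 1
        chain.append(u)
    return chain, accepted / n_iter


# Hypothetical area with ten binary responses, eight of them equal to 1.
y_area = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
chain, rate = metropolis_u(y_area, eta=[0.0] * 10, sigma_u=1.0, seed=3)
kept = chain[1000:]  # discard an initial burn-in stretch
print(round(sum(kept) / len(kept), 2), round(rate, 2))
```

Because each point is generated from the previous one, successive draws are correlated, so longer chains are needed than with independent sampling.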


Conclusions

• In this paper a Monte Carlo Newton-Raphson algorithm has been outlined, assuming normality of the random area effects, to address the ML estimation issues of the logistic-linear mixed model in the context of qualitative small area estimation

• As is generally recognized, the focus of recent economic theory on qualitative data can be summarized in two major points: the increasing interest in the analysis of discrete phenomena, and the explanatory power of qualitative variables in describing current trends in the agricultural sector

• Statistical methods that bring qualitative information into the estimation models can increase efficiency


• Because the full exploitation and analysis of survey questionnaires makes large sets of information available about sample units and, in general, about local areas of interest, one of the most important questions is how to include large sets of continuous and categorical variables in small area models

• In fact, much of the basic information about units and areas is both continuous and categorical, and in many cases only the categorical information can lead to an appropriate assessment of specific issues

• From this point of view, the logistic-linear mixed model can be a useful tool for measuring random area-specific effects and carrying out satisfactory area level analyses


Thank you

Further methodological details can be found in the paper available on the conference website

e-mail to: [email protected]