Optimal allocation algorithm for a multi-way stratification design
P.D. Falorsi, P. Righi, Italian National Statistical Institute
NTTS 2011 Conference, 22–24 February 2011, Brussels



Outline

o Overview
o Multi-way Sampling Design
o Multi-way optimal allocation procedure
o Monte Carlo simulation

NTTS 2011 22-24 February 2011 P.D. Falorsi and P. Righi 2

Overview

o Large-scale surveys in Official Statistics usually produce estimates for a set of parameters by a huge number of highly detailed estimation domains.
o These domains generally define non-nested partitions of the target population.
o When the domain indicator variables are available at the sampling frame level, we may plan a sample covering each domain.

Overview

Why fix a sample size in each domain:

o It allows direct estimators to be applied.
o When planning the sample, the sampling errors of the main estimates can be evaluated.
o When the direct estimator is not reliable (small area problem), having units in the domains allows us to:
   o bound the bias of small area indirect estimators;
   o use models with specific small area effects.

Overview

Standard solution for fixing the sample sizes in domains belonging to two or more partitions:

o Stratify the sample with strata given by the cross-classification of the variables defining the different partitions (cross-classification or one-way stratified design).

Overview

Main drawbacks:

o Too detailed stratification
o Risk of sample size explosion
o Inefficient sample allocation
o Risk of statistical burden
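The drawbacks listed above can be illustrated with a minimal Python sketch. The frame below is a hypothetical toy (the labels `univ_A`, `sci_F`, and so on are invented for illustration), and the rule of at least 2 units per stratum is used here only as an assumed minimum. The sketch counts the domains of two non-nested partitions against the non-empty cross-classified strata.

```python
from collections import Counter

# Hypothetical toy frame: each unit carries its domain in two non-nested partitions.
frame = [
    ("univ_A", "sci_F"), ("univ_A", "sci_M"), ("univ_A", "hum_F"),
    ("univ_B", "sci_F"), ("univ_B", "hum_F"), ("univ_B", "hum_M"),
    ("univ_C", "sci_M"), ("univ_C", "hum_M"), ("univ_C", "hum_M"),
]

domains_p1 = {first for first, _ in frame}    # domains of the first partition
domains_p2 = {second for _, second in frame}  # domains of the second partition
strata = Counter(frame)                       # non-empty cross-classified strata

n_domains = len(domains_p1) + len(domains_p2)
n_strata = len(strata)

# Assumed rule: at least 2 sampled units per stratum, capped by the stratum size.
min_sample_cross = sum(min(2, size) for size in strata.values())

print(n_domains, n_strata, min_sample_cross, len(frame))
```

In this toy frame 7 domains generate 8 strata, and the minimum-size rule alone already requires all 9 units of the frame, so every unit would be surveyed with certainty.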

Overview

Some examples (1): inefficient sample allocation

Overview

Some examples (2): statistical burden

Strata distribution by number of enterprises in the Small and Medium Enterprises Survey (2003)

Enterprises in stratum   Absolute frequency   Cumulative frequency   % Frequency   % Cumulative frequency
1                                     4,700                  4,700          18.7                     18.7
2                                     2,512                  7,212          10.0                     28.7
3-5                                   3,816                 11,028          15.2                     43.9
6-10                                  2,815                 13,843          11.2                     55.1
>10                                  11,286                 25,129          44.9                    100.0
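The cumulative and percentage columns of the table follow from the absolute frequencies alone. A short Python check, using only the figures reported on the slide:

```python
# Absolute frequencies of strata by number of enterprises (from the table).
rows = [("1", 4700), ("2", 2512), ("3-5", 3816), ("6-10", 2815), (">10", 11286)]

total = sum(freq for _, freq in rows)
cumulative = 0
table = []
for label, freq in rows:
    cumulative += freq
    table.append((
        label,
        freq,
        cumulative,
        round(100 * freq / total, 1),        # % frequency
        round(100 * cumulative / total, 1),  # % cumulative frequency
    ))

for line in table:
    print(line)
```

Strata containing only one or two enterprises account for 28.7% of all strata, which is where the burden on the same enterprises is most likely to concentrate.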

Overview

Some examples (3):

Italian Graduates' Career Survey, 2010: sample size about 90,000 units

Number of domains in the non-nested partitions and number of cross-classified strata

                     Type of degree
                     3 years    Long
First partition          448     425
Second partition          94     198
Strata                 2,981   4,778

o Sample size explosion
o Inefficient sample allocation

Multi-way Sampling Design

Multi-way (or incomplete) stratification design (MWD):

o satisfies the sample allocation at domain level without cross-classification;
o the sample sizes of the cross-classified strata are random variables.

Main problem of MWD: the random selection procedure

Multi-way Sampling Design

Use of the Cube Method (Deville and Tillé, 2004)

o The method selects balanced samples in the model-assisted framework.
o A sample s is balanced on a set of auxiliary variables z (balancing variables) if the Horvitz-Thompson estimates of the totals of z equal the known population totals:

   $\hat{t}_{z,ht} = \sum_{k \in s} z_k / \pi_k = \sum_{k \in U} z_k = t_z$

o MWD is a special case of balanced sampling.
o The method works well with a large population and a lot of domains.
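The balancing condition can be checked numerically. The Python sketch below does not implement the cube method; it only tests whether a given sample satisfies the balancing equations. It assumes a toy population of 6 units and the choice z_k = pi_k times the domain indicator, under which the Horvitz-Thompson estimate of each total is simply the number of sampled units in the domain. All data and function names are illustrative.

```python
def ht_totals(sample, z, pi):
    """Horvitz-Thompson estimates of the totals of each balancing variable."""
    n_vars = len(z[0])
    return [sum(z[k][j] / pi[k] for k in sample) for j in range(n_vars)]

def is_balanced(sample, z, pi, tol=1e-9):
    """True if the HT estimates reproduce the population totals of z."""
    totals = [sum(row[j] for row in z) for j in range(len(z[0]))]
    return all(abs(e - t) <= tol for e, t in zip(ht_totals(sample, z, pi), totals))

# Toy population (assumed): 6 units, equal inclusion probabilities, one partition.
pi = [0.5] * 6
domain = [0, 0, 0, 0, 1, 1]

# Balancing variables: z_kd = pi_k if unit k belongs to domain d, else 0.
z = [[pi[k] * (domain[k] == d) for d in (0, 1)] for k in range(6)]

planned_sizes = [sum(row[j] for row in z) for j in range(2)]
balanced_ok = is_balanced([0, 2, 4], z, pi)   # 2 units in domain 0, 1 in domain 1
balanced_bad = is_balanced([0, 1, 2], z, pi)  # 3 units in domain 0, none in domain 1
print(planned_sizes, balanced_ok, balanced_bad)
```

With this choice of z, balancing is the same as fixing the sample size in every domain, which is why a multi-way design can be treated as a balanced sample.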

Multi-way optimal allocation procedure

o The aim of the work is to define a procedure for the optimal allocation and a selection method suitable for large-scale surveys.

o The procedure is based on three main steps:

1. Sample allocation (optimization step, vector π)
   o minimizes the overall sample size n while guaranteeing that the sampling variances are lower than prefixed precision thresholds;
   o deals with a multivariate, multi-domain problem.

2. Definition of the final inclusion probabilities (calibration step)

3. Sample selection (balancing step)

Multi-way optimal allocation procedure

Notation and essential terms:

o Domain b of partition d
o Domain indicator variable
o Parameter of interest and estimator
o Balancing variables

Multi-way optimal allocation procedure

Variance approximation for balanced sampling:

With

Multi-way optimal allocation procedure

1. Theoretical constrained optimization problem (optimization step):

Constraints

Multi-way optimal allocation procedure

Equivalent problem:

Given

With

The inequality constraints are equivalent to

with

Multi-way optimal allocation procedure

Issues of the theoretical optimization problem:

o Solution by means of a modified Chromy algorithm taking into account the constraints
o Iterative procedure, because the unknown terms appear on both the left and the right side of the inequality constraints

Multi-way optimal allocation procedure

Optimal allocation algorithm:

o Give values to the unknown terms on the right side of the inequality (initialization values or values obtained in the previous iteration).
o Keep these values fixed and use the modified Chromy algorithm to obtain the values on the left side.
o Iterate the modified Chromy algorithm until the convergence criterion is satisfied, using the left-side values of the previous iteration for the right side.
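The structure of this iteration can be sketched as a fixed-point loop. The Python code below is a toy, not the authors' method: the inner solver is a closed-form allocation for a single constraint, standing in for the modified Chromy algorithm, and the right-hand side function `right_side` (with constants `c` and `g`) is a hypothetical example of a bound that depends on the unknown inclusion probabilities. The closed form also ignores the requirement that each probability does not exceed 1, which the toy data happen to satisfy.

```python
import math

def solve_left_side(a, rhs):
    """Minimise sum(pi) subject to sum(a_k / pi_k) <= rhs, with rhs held fixed.

    Closed form from the Lagrangian: pi_k is proportional to sqrt(a_k).
    """
    s = sum(math.sqrt(x) for x in a)
    return [math.sqrt(x) * s / rhs for x in a]

def right_side(pi, c=20.0, g=1.0):
    """Hypothetical right-hand side that depends on the unknowns."""
    return c + g * sum(pi)

def allocate(a, pi_start, tol=1e-10, max_iter=100):
    pi = list(pi_start)
    for iteration in range(1, max_iter + 1):
        rhs = right_side(pi)               # freeze the right side at current values
        new_pi = solve_left_side(a, rhs)   # solve for the left-side unknowns
        if max(abs(x - y) for x, y in zip(new_pi, pi)) < tol:
            return new_pi, iteration       # convergence criterion satisfied
        pi = new_pi                        # left-side values feed the next right side
    raise RuntimeError("no convergence within max_iter iterations")

a = [1.0, 4.0, 9.0]                        # toy variance components (assumed)
pi_opt, n_iter = allocate(a, [0.5, 0.5, 0.5])
print(sum(pi_opt), n_iter)
```

In this toy problem the loop is a contraction, so it settles in a handful of iterations; the slides report 6 and 3 iterations for the actual algorithm in the simulation.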

Multi-way optimal allocation procedure

Predicted constrained optimization problem:

o In practice we do not know the term and must use a prediction.
o Given a superpopulation model

express

Multi-way optimal allocation procedure

o To take into account the uncertainty of the model, we replace the variance with the Anticipated Variance (AV).
o An upward approximation is given by

being obtained by means of the predicted value

Multi-way Sampling Design

Remark: the cross-classification stratified design assumes a fixed superpopulation model defined in each stratum:

$E(y_{r,k}) = \mu_{rh}$ for $k \in$ stratum $h$, $Var(y_{r,k}) = \sigma^2_{rh}$, $Cov(y_{r,k}, y_{r,l}) = 0$

Multi-way optimal allocation procedure

2. Definition of the final inclusion probabilities (calibration step):

Given the vector π, calculate by means of a calibration procedure the final vector of inclusion probabilities, such that each domain-level sum of the final inclusion probabilities is an integer.

3. Sample selection (balancing step) with the cube method
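The slides do not specify the calibration procedure. As a minimal sketch of the idea, the Python code below uses raking (iterative proportional fitting) as a stand-in: it rescales the inclusion probabilities in turn over each partition until every domain sum matches an integer target. The toy data, the function name `calibrate`, and the targets are assumptions; the targets of the two partitions must share the same total, and the sketch does not enforce that probabilities stay below 1.

```python
def calibrate(pi, partitions, targets, sweeps=200):
    """Rake inclusion probabilities so that each domain sum hits its target.

    partitions: one list of domain labels per partition (one label per unit).
    targets:    one dict per partition mapping domain label -> integer size.
    """
    pi = list(pi)
    for _ in range(sweeps):
        for labels, target in zip(partitions, targets):
            sums = {}
            for p, lab in zip(pi, labels):
                sums[lab] = sums.get(lab, 0.0) + p
            pi = [p * target[lab] / sums[lab] for p, lab in zip(pi, labels)]
    return pi

def domain_sums(pi, labels):
    sums = {}
    for p, lab in zip(pi, labels):
        sums[lab] = sums.get(lab, 0.0) + p
    return sums

# Toy data (assumed): 8 units, two partitions with two domains each.
pi0 = [0.5, 0.4, 0.6, 0.3, 0.4, 0.5, 0.45, 0.5]
part1 = ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"]
part2 = ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"]
targets = [{"a1": 2, "a2": 2}, {"b1": 2, "b2": 2}]  # domain sums rounded up

pi_final = calibrate(pi0, [part1, part2], targets)
print(sum(pi0), sum(pi_final))
```

Rounding the domain sums up raises the overall expected sample size (here from 3.65 to 4), in the same direction as the increases after calibration reported in the simulation results.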

Monte Carlo Simulation

Objectives of the simulation:

o Test the convergence of the optimization algorithm (optimization step)
o Verify the expected AV against the Monte Carlo empirical AV
o Compare with the standard cross-classified stratified design

Monte Carlo Simulation

o Data: subpopulation of the Istat Italian Graduates' Career Survey (3,427 units)
o Driving allocation variables:
   o employed status (yes/no), y1;
   o actively seeking work (yes/no), y2.
o We generate the values of the two variables by means of a logistic additive model (prediction model).
o Explanatory variables: degree mark, sex, age class and aggregation of subject area degree (different for y1 and y2)
o The parameters are estimated with the data from the previous survey
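Generating a binary variable from an additive logistic model can be sketched as follows. This Python example is illustrative only: the coefficients in `BETA`, the covariate coding, and the range used for the degree mark are hypothetical, the subject area covariate is omitted for brevity, and the covariates are simulated rather than taken from the survey frame.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical main-effect coefficients (additive on the logit scale).
BETA = {"intercept": -1.0, "degree_mark": 0.02, "female": -0.3, "age_30_plus": 0.4}

def predicted_probability(unit):
    """Predicted probability of a 'yes' for one unit under the assumed model."""
    eta = (BETA["intercept"]
           + BETA["degree_mark"] * unit["degree_mark"]
           + BETA["female"] * unit["female"]
           + BETA["age_30_plus"] * unit["age_30_plus"])
    return inv_logit(eta)

rng = random.Random(2011)  # fixed seed so the run is reproducible

# Simulated covariates for 3,427 units (the population size used in the slides).
population = [
    {"degree_mark": rng.randint(66, 110),  # assumed mark range
     "female": rng.randint(0, 1),
     "age_30_plus": rng.randint(0, 1)}
    for _ in range(3427)
]

p = [predicted_probability(unit) for unit in population]  # predicted values
y = [1 if rng.random() < pk else 0 for pk in p]           # generated yes/no values
print(len(y), sum(y) / len(y))
```

The predicted probabilities play the role of the predicted values used in the allocation, while the generated 0/1 values play the role of the outcomes observed in each Monte Carlo run.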

Monte Carlo Simulation

o Survey target estimates: 8 types of estimation domains
o Two partitions define the most disaggregated domains:
   o First partition: university by subject area degree (9 classes);
   o Second partition: degree by sex.
o Domains: 448 + 94; strata: 2,981 (university, degree, sex)
o In the simulation: domains 20 + 15; strata 91

Monte Carlo Simulation

Error thresholds fixed in terms of CV(%)

Results, assuming known values:
o Iterations of the modified Chromy algorithm: 6
o Optimal sample size 171; after calibration 182

Results, assuming predicted values:
o Iterations of the modified Chromy algorithm: 3
o Optimal sample size 699; after calibration 707

Monte Carlo Simulation

Analysis of the allocation with the predicted values

o The sample allocation procedure uses an approximation of the AV

Average of expected anticipated CV(%)

Partition     y1     y2
1            8.1   17.8
2            9.2   19.1

Average of empirical anticipated CV(%) (10,000 Monte Carlo simulations)

Partition     y1     y2
1            6.7   14.7
2            7.4   15.5

The simulation confirms that the input AV is an upward approximation of the real AV
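The comparison of the two tables can be made explicit with a short Python check that uses only the reported averages. It computes the ratio of the empirical to the expected anticipated CV in each cell.

```python
# Average anticipated CV(%) by partition and variable, as reported in the tables.
expected = {(1, "y1"): 8.1, (1, "y2"): 17.8, (2, "y1"): 9.2, (2, "y2"): 19.1}
empirical = {(1, "y1"): 6.7, (1, "y2"): 14.7, (2, "y1"): 7.4, (2, "y2"): 15.5}

ratios = {cell: empirical[cell] / expected[cell] for cell in expected}
all_conservative = all(r < 1.0 for r in ratios.values())

for cell, r in sorted(ratios.items()):
    print(cell, round(r, 2))
print(all_conservative)
```

In every cell the empirical CV is roughly 80% to 83% of the expected one, so the approximation used in the allocation errs on the conservative side.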

Monte Carlo Simulation

Comparison with the standard approach:

o The implicit model is similar to the model used in our approach.
o The allocation differences depend on the constraint of a minimum number of units (2) in each stratum.
o The sample size is 751 units (+7.4%).
o Taking into account only the domains with small population strata (<10 units on average per stratum), the standard approach produces a +14.4% sample size.
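The reported increase can be reproduced from the sample sizes given in the simulation results. The Python check below suggests, as an inference from the arithmetic rather than a statement in the slides, that the +7.4% is measured against the optimal multi-way sample size before calibration (699) and not against the calibrated size (707).

```python
def relative_increase(new, base):
    """Percentage increase of `new` over `base`, rounded to one decimal."""
    return round(100 * (new - base) / base, 1)

standard = 751             # cross-classified stratified design
mwd_optimal = 699          # multi-way design, optimization step (predicted values)
mwd_calibrated = 707       # multi-way design, after calibration

vs_optimal = relative_increase(standard, mwd_optimal)
vs_calibrated = relative_increase(standard, mwd_calibrated)
print(vs_optimal, vs_calibrated)
```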