Joint UNECE/Eurostat Meeting on Population and Housing Censuses (13-15 May 2008)

Sample results expected accuracy in the Italian Population and Housing Census

Giancarlo Carbonetti, Marco Fortini
Istat – Italian National Statistical Institute
General Censuses Directorate

May 13th 2008


Outline

Introduction

Some aspects related to the use of samples of households for long form enumerations

Sampling strategies

Simulation study

Some results

Conclusions

Introduction - 1

Main critical issue of the last Census

Huge organizational (and economic) effort of Municipal Census Offices:
- sudden and time-concentrated increase of workload
- for the largest municipalities, a massive network of enumerators and coordinators to be trained and managed
- lack of adequately skilled resources, high turnover rates

Main objectives for the next Census:
- to improve the efficiency of census operations
- to reduce the municipalities' workload
- to keep a high level of quality

Introduction - 2

Innovations proposed to reach the objectives:
- the use of population registers
- mail-out of census forms
- mixed-mode data collection, mainly based on mail and web

Expected consequences of the innovations:
- an increase in “back office” work
- a reduction in the number of enumerators (“front office” work)

How can the response rates be increased?

A proposal: a “short form” version of the questionnaire is being considered in order to reach high response rates.

Introduction - 3

Consequences of using the short form:
- increased response rates
- response delays reduced as much as possible

This approach risks information loss!

How to preserve the richness of the census information:
- by selecting a sample of households to which a “long form” version of the questionnaire is supplied

Strategy: the simultaneous use of short and long forms.

Some aspects related to the use of samples of households for long form enumeration - 1

Which type of information can be surveyed by means of a sample of long forms, and which must be collected on the whole population?

The overall set of census variables is partitioned into two subsets:
- the demographic variables (gender, date of birth, marital status, nationality, …)
- the remaining variables (educational level, occupational status, commuting)

The short form covers only the first set of variables, whereas the long form covers the whole set.

Some aspects related to the use of samples of households for long form enumeration - 2

Below which municipal population threshold can the sampling strategy not be adopted?

An option under consideration is to sample in municipalities with more than 5,000 inhabitants:
- long forms will be submitted to a sample of households
- short forms will be administered to the remaining households

In municipalities with fewer than 5,000 inhabitants, long forms will be submitted to the whole population.
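As a rough illustration of the option described above, the following Python sketch assigns a form type to each household of one municipality. It is only a sketch under stated assumptions: the function name, its arguments and the treatment of a municipality with exactly 5,000 inhabitants are hypothetical and are not taken from the presentation.

```python
import random

THRESHOLD = 5_000  # inhabitants; the option mentioned on the slide


def assign_forms(population, household_ids, sampling_ratio, seed=0):
    """Return {household_id: "long" | "short"} for one municipality.

    Assumption: a municipality with exactly THRESHOLD inhabitants is
    treated like a small one (the slide does not cover this case).
    """
    ids = list(household_ids)
    if population <= THRESHOLD:
        # Small municipality: long form to the whole population.
        return {h: "long" for h in ids}
    # Larger municipality: long form to a simple random sample of households.
    rng = random.Random(seed)
    n_long = round(len(ids) * sampling_ratio)
    long_ids = set(rng.sample(ids, n_long))
    return {h: ("long" if h in long_ids else "short") for h in ids}


small = assign_forms(3_000, range(10), 0.33)
large = assign_forms(20_000, range(100), 0.33)
print(sum(v == "long" for v in small.values()),
      sum(v == "long" for v in large.values()))
```

The sampling ratio is left as a parameter because the presentation treats it as a choice still to be made.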

Some aspects related to the use of samples of households for long form enumeration - 3

Which domains have to be considered to plan the sample and to produce accurate estimates?

New “census domains” have been defined:
- an appropriate methodology was adopted to build census domains by aggregating the smallest census areas
- the new “areas” refer to the sub-municipal level

Accuracy of sampling estimates at different territorial levels:
- similar precision is expected for estimates across areas
- higher precision is expected for larger territorial references (from sub-municipal to nationwide level)

Some aspects related to the use of samples of households for long form enumeration - 4

Which statistical methodology gives the most accurate estimates?

… in terms of …
- sampling design
- use of appropriate lists
- efficient estimation methods
- sampling error assessment

Answering this question is the aim of the study, some results of which are presented here.

Sampling strategies

Two different sampling designs have been tested:
- Simple Random Sampling of HOUseholds (SRSHOU) from administrative registers managed by municipalities
- Area frame sampling based on a Simple Random Sampling of ENumeration Areas (SRSENA), which implies complete data collection for the households dwelling in the selected enumeration areas (from the Digital Geocoded Database)

Different studies have been conducted:
- to compare the two approaches (with a sampling ratio of about one third of the population considered)
- to evaluate, for SRSHOU, the improvement in the precision of the estimates as the sampling ratio increases (10%, 15%, 20%, 33%)
- to introduce some stratification of the units involved
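The difference between the two designs can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: the frame (a dictionary mapping a household id to its enumeration area), the function names and the one-third ratio applied to a synthetic frame are all assumptions made for the example.

```python
import random


def srs_households(frame, ratio, rng):
    """SRSHOU-like draw: simple random sample of household ids, without replacement."""
    n = round(len(frame) * ratio)
    return set(rng.sample(sorted(frame), n))


def srs_enumeration_areas(frame, ratio, rng):
    """SRSENA-like draw: sample enumeration areas, then keep every household in them."""
    areas = sorted(set(frame.values()))
    n = round(len(areas) * ratio)
    chosen = set(rng.sample(areas, n))
    return {h for h, ea in frame.items() if ea in chosen}


# Toy frame (synthetic): 300 households spread evenly over 30 enumeration areas.
frame = {h: h // 10 for h in range(300)}
rng = random.Random(42)
hou = srs_households(frame, 1 / 3, rng)
ena = srs_enumeration_areas(frame, 1 / 3, rng)
print(len(hou), len(ena), len({frame[h] for h in ena}))
```

Both draws return the same number of households here, but the area-based sample is concentrated in a few enumeration areas, which is the clustering the comparison of the two designs is meant to evaluate.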

Simulation study - 1

Main features of the sampling designs:
- Domains: the “new areas”, referring to sub-municipal districts
- Target variables: variables obtained by cross-classifying educational level, employment status and commuting with demographic variables
- Sampling units: households or enumeration areas
- Estimator: calibrated estimators, using final weights suitably modified to make the sample more representative

The sampling strategies were compared through Monte Carlo sampling replications (carried out on 2001 Italian Census data) in order to assess the sampling error, measured by the coefficient of variation (CV), which represents the accuracy of the sampling estimates.

Simulation study - 2

Because of the strong differences among Italian municipalities, 40 of them, with different population sizes and from different regions of Italy, were considered.

Number of municipalities by geographical area and class of population size (a):

| Geographical area | 10,000-20,000 | 20,000-100,000 | more than 100,000 | Total |
|---|---|---|---|---|
| North | 4 | 6 | 6 | 16 |
| Center | 2 | 3 | 3 | 8 |
| South | 4 | 6 | 6 | 16 |
| Total | 10 | 15 | 15 | 40 |

(a) The legal (official) population from the 2001 Census of Population was used.

Simulation study - 3

Amount of units involved in the simulation study

| Units | Sampled units | Universe | % |
|---|---|---|---|
| Areas | 497 | 3,347 (*) | 14.85% |
| Enumeration areas | 30,890 | 382,534 | 8.08% |
| Households | 2,243,511 | 21,810,676 | 10.29% |
| Individuals | 5,537,582 | 56,594,021 | 9.78% |

(*) Estimated number

Scatter plot of cv and p (estimates) for each census area. SRSHOU design (sampling ratio=33%). City of Perugia.

[Figure: scatter plot with p% on the horizontal axis (0 to 55) and cv% on the vertical axis (0 to 100). Labels shown on the plot: 1%, 2%, 3%.]

Distribution of median cv for classes of p for the SRSHOU design and the SRSENA design (both with sampling ratio=33%). Comparison of 4 municipalities.

| Classes of p | Milano (111 areas) SRSHOU | Milano SRSENA | Bologna (32 areas) SRSHOU | Bologna SRSENA | Padova (18 areas) SRSHOU | Padova SRSENA | Livorno (13 areas) SRSHOU | Livorno SRSENA |
|---|---|---|---|---|---|---|---|---|
| < 0.05% | 97.78 | 94.12 | 96.52 | 94.31 | 99.65 | 98.34 | 102.21 | 101.61 |
| 0.05%├0.1% | 51.61 | 51.59 | 50.67 | 49.54 | 51.70 | 54.13 | 50.69 | 52.06 |
| 0.1%├0.25% | 34.67 | 34.92 | 35.00 | 35.20 | 35.37 | 36.03 | 35.08 | 35.67 |
| 0.25%├0.5% | 22.96 | 24.38 | 24.17 | 24.73 | 25.58 | 26.45 | 23.70 | 24.37 |
| 0.5%├1% | 16.86 | 18.71 | 16.85 | 18.32 | 16.95 | 17.81 | 17.16 | 18.72 |
| 1%├2.5% | 10.61 | 12.21 | 10.74 | 11.95 | 11.07 | 12.00 | 11.34 | 12.90 |
| 2.5%├5% | 7.02 | 8.53 | 7.07 | 8.25 | 7.35 | 8.48 | 7.17 | 9.00 |
| 5%├10% | 4.84 | 5.97 | 4.75 | 5.74 | 5.05 | 5.85 | 4.88 | 6.39 |
| 10%├15% | 3.17 | 4.41 | 3.09 | 4.09 | 3.19 | 4.37 | 3.06 | 4.82 |
| 15%├20% | 2.44 | 3.46 | 2.38 | 3.12 | 2.44 | 3.14 | 2.44 | 3.39 |
| 20%├30% | 1.89 | 2.61 | 1.92 | 2.48 | 2.08 | 2.73 | 2.05 | 2.88 |
| ≥ 30% | 1.35 | 1.78 | 1.32 | 1.60 | 1.39 | 1.72 | 1.40 | 2.00 |

For most classes of p the median cv is higher with SRSENA than with SRSHOU: this is due to the cluster effect.

Loss of efficiency (in terms of CV for classes of p) of estimation with the SRSENA design with respect to the SRSHOU design (both with sampling ratio=33%). Comparison of 4 municipalities.

Values are CV(SRSHOU, s.r. 33%) - CV(SRSENA, s.r. 33%); a negative value means that SRSENA has the larger CV.

| Classes of p | Milano (111 areas) | Bologna (32 areas) | Padova (18 areas) | Livorno (13 areas) |
|---|---|---|---|---|
| < 0.05% | 3.65 | 2.21 | 1.31 | 0.60 |
| 0.05%├0.1% | 0.03 | 1.13 | -2.43 | -1.37 |
| 0.1%├0.25% | -0.25 | -0.20 | -0.66 | -0.59 |
| 0.25%├0.5% | -1.42 | -0.56 | -0.87 | -0.68 |
| 0.5%├1% | -1.85 | -1.47 | -0.87 | -1.56 |
| 1%├2.5% | -1.60 | -1.22 | -0.93 | -1.56 |
| 2.5%├5% | -1.51 | -1.18 | -1.13 | -1.83 |
| 5%├10% | -1.13 | -0.99 | -0.80 | -1.51 |
| 10%├15% | -1.24 | -1.01 | -1.18 | -1.76 |
| 15%├20% | -1.02 | -0.73 | -0.70 | -0.95 |
| 20%├30% | -0.72 | -0.56 | -0.65 | -0.82 |
| ≥ 30% | -0.43 | -0.29 | -0.33 | -0.60 |

Distribution of median cv for classes of p. Comparison of 4 different sampling ratios with the SRSHOU design.

| Classes of p | sampling ratio 10% (170 areas) | sampling ratio 15% (140 areas) | sampling ratio 20% (111 areas) | sampling ratio 33% (204 areas) |
|---|---|---|---|---|
| < 0.05% | 220.51 | 157.20 | 142.00 | 98.21 |
| 0.05%├0.1% | 111.48 | 87.22 | 74.20 | 51.14 |
| 0.1%├0.25% | 75.57 | 59.83 | 49.97 | 34.76 |
| 0.25%├0.5% | 50.70 | 39.92 | 33.97 | 23.44 |
| 0.5%├1% | 35.54 | 28.10 | 23.74 | 16.56 |
| 1%├2.5% | 23.62 | 18.56 | 15.33 | 10.68 |
| 2.5%├5% | 15.50 | 12.29 | 10.09 | 7.04 |
| 5%├10% | 10.46 | 8.26 | 6.93 | 4.82 |
| 10%├15% | 7.06 | 5.40 | 4.40 | 3.13 |
| 15%├20% | 5.57 | 4.27 | 3.54 | 2.42 |
| 20%├30% | 4.50 | 3.48 | 2.84 | 1.93 |
| ≥ 30% | 3.20 | 2.42 | 1.94 | 1.34 |

Gain of efficiency (in terms of CV for classes of p) of estimation with the SRSHOU design by increasing the sampling ratio from 10% to 33%.

Values are [CV(SRSHOU, s.r. 10%) - CV(SRSHOU, s.r. N%)] x 100 / CV(SRSHOU, s.r. 10%).

| Classes of p | increasing s.r. from 10% to 15% | increasing s.r. from 10% to 20% | increasing s.r. from 10% to 33% |
|---|---|---|---|
| < 0.05% | 28.71 | 35.60 | 55.46 |
| 0.05%├0.1% | 21.76 | 33.44 | 54.13 |
| 0.1%├0.25% | 20.83 | 33.88 | 54.00 |
| 0.25%├0.5% | 21.26 | 33.00 | 53.77 |
| 0.5%├1% | 20.93 | 33.20 | 53.40 |
| 1%├2.5% | 21.42 | 35.10 | 54.78 |
| 2.5%├5% | 20.71 | 34.90 | 54.58 |
| 5%├10% | 21.03 | 33.75 | 53.92 |
| 10%├15% | 23.51 | 37.68 | 55.67 |
| 15%├20% | 23.34 | 36.45 | 56.55 |
| 20%├30% | 22.67 | 36.89 | 57.11 |
| ≥ 30% | 24.38 | 39.38 | 58.13 |

From 10% to 15%: gain between 21-23 percent.

From 10% to 20%: gain between 33-38 percent.

From 10% to 33%: gain between 53-58 percent.
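The gain formula can be checked against the reported median cv values with a short Python sketch. The function `gain` applies the formula given with the table; `srs_gain` is an added point of comparison, not part of the presentation, and assumes that under simple random sampling without replacement the CV is proportional to sqrt((1 - f) / f) for a sampling ratio f, ignoring any effect of calibration.

```python
from math import sqrt


def gain(cv_base, cv_new):
    """Percentage gain of efficiency: [CV(10%) - CV(N%)] x 100 / CV(10%)."""
    return (cv_base - cv_new) * 100 / cv_base


def srs_gain(f_base, f_new):
    """Gain expected if CV is proportional to sqrt((1 - f) / f) (assumption)."""
    def rel(f):
        return sqrt((1 - f) / f)
    return (1 - rel(f_new) / rel(f_base)) * 100


# Median cv% for the class p < 0.05% at sampling ratios 10%, 15%, 20%, 33%,
# copied from the table of median cv by sampling ratio.
cv = {0.10: 220.51, 0.15: 157.20, 0.20: 142.00, 0.33: 98.21}

for f in (0.15, 0.20, 0.33):
    print(f, round(gain(cv[0.10], cv[f]), 2), round(srs_gain(0.10, f), 1))
```

Under that assumption the expected gains are roughly 21, 33 and 52 percent, which sit at or just below the lower end of the ranges observed in the simulation.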

Distribution of median cv for five classes of p and three classes of area (according to population size). Comparison of 4 different sampling ratios with the SRSHOU design.

| Classes of p | Population by area (thousands) | Sampling ratio 10% | Sampling ratio 15% | Sampling ratio 20% | Sampling ratio 33% |
|---|---|---|---|---|---|
| 0.1%├0.25% | < 10 | 90.00 | 71.27 | 59.48 | 40.20 |
| 0.1%├0.25% | 10├12 | 76.23 | 60.03 | 50.45 | 34.41 |
| 0.1%├0.25% | ≥ 12 | 66.65 | 53.04 | 43.51 | 30.58 |
| 0.5%├1% | < 10 | 43.11 | 33.50 | 28.97 | 19.53 |
| 0.5%├1% | 10├12 | 35.08 | 27.46 | 22.99 | 16.48 |
| 0.5%├1% | ≥ 12 | 31.25 | 24.95 | 20.97 | 14.85 |
| 2.5%├5% | < 10 | 19.12 | 14.68 | 12.22 | 8.25 |
| 2.5%├5% | 10├12 | 15.58 | 12.36 | 9.89 | 7.08 |
| 2.5%├5% | ≥ 12 | 14.00 | 10.98 | 9.06 | 6.35 |
| 10%├15% | < 10 | 8.78 | 6.44 | 5.22 | 3.67 |
| 10%├15% | 10├12 | 7.00 | 5.46 | 4.41 | 3.13 |
| 10%├15% | ≥ 12 | 6.29 | 4.79 | 3.89 | 2.83 |
| 20%├30% | < 10 | 5.46 | 4.16 | 3.44 | 2.27 |
| 20%├30% | 10├12 | 4.57 | 3.42 | 2.94 | 2.01 |
| 20%├30% | ≥ 12 | 4.05 | 3.20 | 2.59 | 1.77 |

Median CV for some classes of p and for three classes of area (according to population size). Comparison of 4 different sampling ratios (s.r.) with the SRSHOU design. Graph referred to area size less than 10,000 inhabitants.

[Figure: area size < 10,000. Median cv% (vertical axis, 0 to 150) by classes of p (0.1-0.25%, 0.5-1%, 2.5-5%, 10-15%, 20-30%), one series per sampling ratio (s.r.=10%, 15%, 20%, 33%). The plotted values are those of the table of median cv by classes of p and area size, rounded to one decimal.]

Median CV for some classes of p and for three classes of area (according to population size). Comparison of 4 different sampling ratios (s.r.) with the SRSHOU design. Graph referred to area size between 10,000 and 12,000 inhabitants.

[Figure: area size between 10,000 and 12,000. Median cv% (vertical axis, 0 to 150) by classes of p (0.1-0.25%, 0.5-1%, 2.5-5%, 10-15%, 20-30%), one series per sampling ratio (s.r.=10%, 15%, 20%, 33%). The plotted values are those of the table of median cv by classes of p and area size, rounded to one decimal.]

The gain of efficiency (in terms of CV) for census areas with size between 10,000 and 12,000 with respect to census areas with less than 10,000 is about 14-20 percent. Similar results are obtained for all tested sampling ratios.

Median CV for some classes of p and for three classes of area (according to population size). Comparison of 4 different sampling ratios (s.r.) with the SRSHOU design. Graph referred to area size more than 12,000 inhabitants.

[Figure: area size > 12,000. Median cv% (vertical axis, 0 to 150) by classes of p (0.1-0.25%, 0.5-1%, 2.5-5%, 10-15%, 20-30%), one series per sampling ratio (s.r.=10%, 15%, 20%, 33%). The plotted values are those of the table of median cv by classes of p and area size, rounded to one decimal.]

The gain of efficiency (in terms of CV) for census areas with size more than 12,000 with respect to census areas with less than 10,000 is about 22-28 percent. As before, similar results are obtained for all tested sampling ratios.
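The stated gains can be recomputed from the table of median cv by classes of p and area size. The Python sketch below does this for the 10% sampling ratio only; the dictionary layout and the function name are illustrative choices, and the figures are copied from that table.

```python
# Median cv% at sampling ratio 10%, by class of p and area size
# (thousands of inhabitants), copied from the table of median cv.
median_cv = {
    "0.1%-0.25%": {"<10": 90.00, "10-12": 76.23, ">=12": 66.65},
    "0.5%-1%": {"<10": 43.11, "10-12": 35.08, ">=12": 31.25},
    "2.5%-5%": {"<10": 19.12, "10-12": 15.58, ">=12": 14.00},
    "10%-15%": {"<10": 8.78, "10-12": 7.00, ">=12": 6.29},
    "20%-30%": {"<10": 5.46, "10-12": 4.57, ">=12": 4.05},
}


def reduction(cv_small, cv_large):
    """Percentage reduction of CV relative to areas with fewer than 10,000 inhabitants."""
    return (cv_small - cv_large) * 100 / cv_small


for p_class, row in median_cv.items():
    mid = reduction(row["<10"], row["10-12"])
    big = reduction(row["<10"], row[">=12"])
    print(f"{p_class:>10}: 10-12 thousand {mid:4.1f}%   >=12 thousand {big:4.1f}%")
```

The same calculation can be repeated for the other sampling ratios by swapping in the corresponding columns of the table.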

Distribution of the estimates referred to areas larger than 12,000 inhabitants for classes of cv. Comparison of percentage frequencies for 4 different sampling ratios with the SRSHOU design.

| Classes of coefficient of variation % | Sampling ratio 10% | Sampling ratio 15% | Sampling ratio 20% | Sampling ratio 33% |
|---|---|---|---|---|
| < 2% | 0.57 | 2.69 | 6.39 | 13.14 |
| 2%├5% | 13.04 | 17.53 | 18.40 | 23.64 |
| 5%├10% | 16.18 | 18.02 | 26.28 | 28.64 |
| 10%├20% | 29.14 | 30.16 | 23.54 | 16.20 |
| 20%├50% | 25.09 | 19.71 | 16.75 | 13.32 |
| 50%├100% | 9.32 | 7.21 | 5.69 | 3.44 |
| 100%├200% | 4.40 | 3.65 | 2.00 | 1.61 |
| ≥ 200% | 2.25 | 1.03 | 0.95 | - |

HA: high accuracy (cv% below 10%)

MA: medium accuracy (cv% from 10% to 50%)

LA: low accuracy (cv% of 50% or more)

Distribution of the estimates referred to areas larger than 12,000 inhabitants for classes of cv. Comparison of percentage frequencies for 4 different sampling ratios with the SRSHOU design - 2

| Classes of cv% | Sampling ratio 10% | Sampling ratio 15% | Sampling ratio 20% | Sampling ratio 33% |
|---|---|---|---|---|
| < 10% (HA: high accuracy) | 29.80 | 38.25 | 51.07 | 65.42 |
| 10%├50% (MA: medium accuracy) | 54.23 | 49.87 | 40.29 | 29.52 |
| ≥ 50% (LA: low accuracy) | 15.97 | 11.89 | 8.64 | 5.05 |

[Figure: % of estimates (vertical axis, 0 to 70) by sampling ratio (10%, 15%, 20%, 33%), one series for each class: cv% < 10%, cv% in 10-50%, cv% > 50%.]
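The three accuracy classes appear to be obtained by collapsing the eight cv classes of the more detailed distribution. The Python sketch below reproduces that aggregation as a consistency check; the grouping of classes into HA, MA and LA is inferred from the class limits, and the "-" entry of the detailed table is treated as 0, which is an assumption.

```python
# % of estimates by class of cv%, at sampling ratios 10%, 15%, 20%, 33%
# (areas larger than 12,000 inhabitants, SRSHOU design).
fine = {
    "<2": (0.57, 2.69, 6.39, 13.14),
    "2-5": (13.04, 17.53, 18.40, 23.64),
    "5-10": (16.18, 18.02, 26.28, 28.64),
    "10-20": (29.14, 30.16, 23.54, 16.20),
    "20-50": (25.09, 19.71, 16.75, 13.32),
    "50-100": (9.32, 7.21, 5.69, 3.44),
    "100-200": (4.40, 3.65, 2.00, 1.61),
    ">=200": (2.25, 1.03, 0.95, 0.0),  # "-" in the table, taken as 0 (assumption)
}

groups = {  # grouping inferred from the class limits
    "HA": ["<2", "2-5", "5-10"],
    "MA": ["10-20", "20-50"],
    "LA": ["50-100", "100-200", ">=200"],
}


def collapse(fine, groups):
    """Sum the detailed percentage frequencies within each accuracy class."""
    return {
        g: tuple(round(sum(fine[c][i] for c in classes), 2) for i in range(4))
        for g, classes in groups.items()
    }


summary = collapse(fine, groups)
for g, row in summary.items():
    print(g, row)
```

The sums agree with the three-class table up to rounding of the published figures (differences of about 0.01).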

Estimates of p referred to a territory given by aggregation of areas.

Generic sampled area a: for a variable X, the estimate is $\hat{p}_a$, with coefficient of variation $CV(\hat{p}_a)$.

Territory $R_S$ given by aggregation of K sampled areas:

$$CV(\hat{p}_{R_S}) \approx \frac{1}{\sqrt{K}}\, CV(\hat{p}_a)$$

Percentage expected reduction of CV in $R_S$:

$$red\% = \left(1 - \frac{1}{\sqrt{K}}\right) \times 100$$

Territory R given by aggregation of sampled areas and not sampled areas:

$$CV(\hat{p}_R) \approx \frac{\gamma}{\sqrt{K}}\, CV(\hat{p}_a)$$

Percentage expected reduction of CV in R:

$$red\% = \left(1 - \frac{\gamma}{\sqrt{K}}\right) \times 100$$

where $\gamma = N_S / N$ is the share of the population of R eligible for drawing the long form (LF) sample.
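Reading the two expected-reduction formulas as red% = (1 - γ/√K) x 100, with γ = 1 when every aggregated area is sampled, a small Python sketch shows how quickly the reduction grows with the number of areas K. The function name and the input checks are illustrative additions.

```python
from math import sqrt


def expected_reduction(k, gamma=1.0):
    """red% = (1 - gamma / sqrt(K)) * 100.

    k: number of sampled areas aggregated (K >= 1).
    gamma: share of the territory's population eligible for the long form
    sample (gamma = 1 means that all areas are sampled).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0 <= gamma <= 1:
        raise ValueError("gamma must lie in [0, 1]")
    return (1 - gamma / sqrt(k)) * 100


for k in (5, 30, 100, 400):
    print(k, [round(expected_reduction(k, g), 1) for g in (1.0, 0.7, 0.6, 0.5)])
```

A smaller γ means that a larger part of the territory is completely enumerated, so the expected reduction of CV is larger for the same K.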

Conclusions - 1

As expected, the most accurate estimates were obtained with:
- simple random sampling of households from administrative registers
- the largest sampling ratio

Better efficiency of estimates for the largest areas (>12,000 inhabitants):
- this result suggests planning the sampling design by defining larger census areas (of about 15,000 people)

The estimates referred to large domains given by aggregation of areas show high accuracy:
- the accuracy increases with the number of areas aggregated
- when a part of the large domain is completely enumerated, the estimates show a further increase in accuracy

Conclusions - 2

However, area frame sampling is only slightly less efficient than SRSHOU, so it could be adopted where reliable administrative registers are not available.

The sampling ratio will be chosen considering the trade-off between:
- the financial savings needed
- the accuracy required for different territorial domains

Further analyses will be conducted on small area estimation techniques to produce more accurate estimates for:
- the smallest territorial levels
- rare populations

Thank you for your attention and … have a good lunch!

Simulation study - 4

Cross-classification cells:
- educational level, employment status, commuting and gender
- 90 simple estimation cells

Calibration constraints are defined by cross-classifying gender by age, and gender by marital status.

Computational algorithm, implemented in SAS code for each municipality and for each alternative sampling design:
step 1) selection of a sample (of households or enumeration areas)
step 2) computation of final weights
step 3) estimation of the relative frequency p for each target cell
step 4) iteration of steps 1), 2) and 3) for 1,000 sampling replications
step 5) computation of the mean and standard error of the sampling distribution for each of the 90 frequency cells
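The five steps can be sketched in Python for a single target cell. This is not the authors' SAS code: it runs on a synthetic population rather than census data, draws households by simple random sampling only, and replaces the calibrated weights of step 2 with equal weights, so it illustrates the replication logic rather than the actual estimator. All names are hypothetical.

```python
import random
import statistics


def simulate_cv(households, ratio, replications=1000, seed=1):
    """households: list of (size, members_in_cell). Returns (mean, se, cv%)."""
    rng = random.Random(seed)
    n = round(len(households) * ratio)
    estimates = []
    for _ in range(replications):                     # step 4: replications
        sample = rng.sample(households, n)            # step 1: SRS of households
        persons = sum(size for size, _ in sample)     # step 2 (simplified): equal weights
        in_cell = sum(k for _, k in sample)
        estimates.append(in_cell / persons)           # step 3: estimate of p
    mean = statistics.mean(estimates)                 # step 5: mean and standard error
    se = statistics.stdev(estimates)
    return mean, se, 100 * se / mean


# Synthetic population (not census data): 3,000 households, true p near 5%.
rng = random.Random(0)
population = []
for _ in range(3000):
    size = rng.randint(1, 5)
    population.append((size, sum(rng.random() < 0.05 for _ in range(size))))

true_p = sum(k for _, k in population) / sum(s for s, _ in population)
mean, se, cv = simulate_cv(population, 0.10, replications=200)
print(round(true_p, 4), round(mean, 4), round(cv, 1))
```

In the study the same loop would be repeated for each of the 90 cells, each municipality and each design, with the weights calibrated at step 2.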

Evaluation criterion: the coefficient of variation

The coefficient of variation (CV) was used as the evaluation criterion to compare the sampling strategies:

$$cv(\hat{p}_x) = \frac{\sigma(\hat{p}_x)}{E(\hat{p}_x)} \times 100$$

which represents an accuracy measurement of the sampling estimates.

Consequently, the maximum expected percentage error implied by the estimation method (with a probability of 0.95) can be computed: Δ% ≈ 1.96 · CV

The distribution of the empirical CVs for all the 90 target cells was determined.

After classifying the target cells by their value of p, the distribution of the CVs of the cells in the same p group was studied.
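As a small worked example of the relation Δ% ≈ 1.96 · CV, the Python sketch below converts a few of the median cv values reported for the SRSHOU design at a 33% sampling ratio into maximum expected percentage errors. The function name is illustrative, and the choice of the three classes of p is arbitrary.

```python
def max_expected_error(cv_percent, z=1.96):
    """Delta% ~ z * CV: maximum expected percentage error at probability 0.95."""
    return z * cv_percent


# Median cv% for SRSHOU at sampling ratio 33%, from the table of median cv
# by sampling ratio.
examples = [("0.5%-1%", 16.56), ("5%-10%", 4.82), (">=30%", 1.34)]

for p_class, cv in examples:
    print(p_class, round(max_expected_error(cv), 2))
```

For instance, a median cv of 4.82% corresponds to a maximum expected error of about 9.4% of the estimated frequency.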

Estimates of p referred to a territory given by aggregation of areas. Case 1: aggregation of sampled areas.

Estimate referred to the generic sampled area a: for a variable X, $\hat{p}_a$ with coefficient of variation $CV(\hat{p}_a)$.

Estimate referred to the territory $R_S$ given by aggregation of K sampled areas:

$$\hat{p}_{R_S} = \sum_{a \in R_S} W_a\, \hat{p}_a \quad \text{where} \quad W_a = \frac{N_a}{N_{R_S}}$$

$$CV(\hat{p}_{R_S}) \approx \frac{1}{\sqrt{K}}\, CV(\hat{p}_a)$$

Percentage expected reduction of CV:

$$red\% = \left(1 - \frac{1}{\sqrt{K}}\right) \times 100$$

[Figure: percentage expected reduction of CV (vertical axis, 40% to 100%) against the number of areas K (horizontal axis, 0 to 3000).]

for K>5 → red%>50%

for K>30 → red%>80%

for K>100 → red%>90%

Estimates of p referred to a territory given by aggregation of areas. Case 2: aggregation of sampled and not sampled areas.

Territory $R_S$ of sampled areas: long form to a sample of households.

Territory $R_{NS}$ of not sampled areas: long form to all the households.

$$R = R_S \cup R_{NS}$$

$$\gamma = \frac{N_S}{N}$$

is the share of the population of R eligible for drawing the LF sample.

$$\hat{p}_R = \gamma\, \hat{p}_{R_S} + (1 - \gamma)\, p_{R_{NS}}$$

$$CV(\hat{p}_R) \approx \frac{\gamma}{\sqrt{K}}\, CV(\hat{p}_a)$$

Percentage expected reduction of CV:

$$red\% = \left(1 - \frac{\gamma}{\sqrt{K}}\right) \times 100$$

[Figure: percentage expected reduction of CV (vertical axis, 80% to 100%) against the number of areas K (horizontal axis, 0 to 3000), one curve each for γ=1, γ=0.7, γ=0.6 and γ=0.5 (series labelled LFc_100%, LFc_70%, LFc_60%, LFc_50%); K=100 and K=400 are marked on the horizontal axis.]