Oversampling the capital cities in the EU SAfety SUrvey (EU-SASU)
Task Force on Victimization
Eurostat, 17-18 February 2010
Guillaume Osier
Service Central de la Statistique et des Etudes Economiques (STATEC)
Social Statistics [email protected]
Outline
I. Some theory
1. Definitions and concepts
2. How to over-sample?
3. Why over-sample?
4. Impact on national accuracy
II. Over-sampling the capital cities in the EU-SASU
1. Is this proposal (statistically) relevant?
2. How to determine the over-sampling rates?
3. Impact on the national accuracy
III. Specific issues in relation to over-sampling
Definitions and concepts
(i) A sub-group (d) in the population is said to be over-sampled (or over-represented) when the proportion of units from the sub-group is, on average, higher in the sample than in the reference population:
$$E\left(\frac{n_d}{n}\right) > \frac{N_d}{N}$$
where $N_d / N$ is the proportion of units from (d) in the population and $E(n_d / n)$ is the average proportion of units from (d) in the sample.
(ii) Conversely, a sub-group is said to be under-sampled (or under-represented) when the proportion of units from the sub-group is, on average, lower in the sample than in the reference population:
$$E\left(\frac{n_d}{n}\right) < \frac{N_d}{N}$$
(iii) When a sub-group is neither over-sampled nor under-sampled, it is said to be well-sampled (or well-represented)
How to over-sample?
To be implemented, over-sampling requires the units in the sub-group to be identified before sampling (an issue with telephone surveys).
Two main techniques to over-sample:
• Stratification using unequal sampling fractions in the strata
• More general “proportional-to-size” sampling (πps, pps…)
Over-sampling rate for (d):
$$OR_d = \frac{E(n_d)}{n \, N_d / N}$$
where $E(n_d)$ is the expected sample size in (d) and $n N_d / N$ is the expected sample size in (d) under no over-sampling (i.e. under Simple Random Sampling).
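As a rough illustration of the over-sampling rate, the sketch below divides the expected sample size in (d) by the size expected under simple random sampling, n·N_d/N. The function name and the figures (a sub-group holding 5% of the population that receives 1,000 of 8,000 interviews) are hypothetical and are not taken from the EU-SASU.

```python
def oversampling_rate(expected_n_d, n, N_d, N):
    """Over-sampling rate OR_d = E(n_d) / (n * N_d / N)."""
    # Expected sample size in (d) under simple random sampling
    expected_under_srs = n * N_d / N
    return expected_n_d / expected_under_srs


# Hypothetical example: 5% of the population, 1,000 of 8,000 interviews.
rate = oversampling_rate(expected_n_d=1000, n=8000, N_d=50_000, N=1_000_000)
print(rate)  # 2.5
```

A rate above 1 means the sub-group is over-sampled, a rate below 1 that it is under-sampled, and a rate of exactly 1 that it is well-sampled.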
Why over-sample? 1/2
By selecting more people from certain groups than would typically be done if everyone in the sample had an equal chance of being selected, over-sampling leads to more accurate estimates for those groups.
The technique has proven particularly suitable to:
• Small sub-populations;
• Sub-populations having severe non-response problems;
• Sub-populations with large internal variability on the key variables (e.g., household wealth)
Why over-sample? 2/2
More generally, one can resort to over-sampling whenever the sample size doesn’t allow us to reach specified precision targets over certain sub-populations.
Besides, in cross-national surveys (like the EU-SASU), over-sampling is essential for precision and hypothesis testing in cross-country comparisons.
The choice of the sub-groups to over-sample is policy-driven (political matter)
Impact on national accuracy 1/3
Optimal (Neyman) allocation: in order to maximize the precision of the national sample under stratified simple random sampling, the sample size in stratum h depends both on the stratum population Nh and the standard deviation Sh of the study variable
The total population aged 16+ is split into H strata; stratum h (h = 1, …, H) has size $N_h$ and standard deviation $S_h$.
$$n_h^{opt} = n \, \frac{N_h S_h}{\sum_{k=1}^{H} N_k S_k}$$
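A minimal sketch of the Neyman allocation, in which each stratum receives a share of the total sample n proportional to N_h·S_h. The function name and the three strata used below are made up for illustration.

```python
def neyman_allocation(n, sizes, std_devs):
    """Split a total sample n across strata proportionally to N_h * S_h."""
    weights = [N_h * S_h for N_h, S_h in zip(sizes, std_devs)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical strata: the smallest stratum is also the most variable one.
allocation = neyman_allocation(1000, sizes=[6000, 3000, 1000],
                               std_devs=[1.0, 2.0, 4.0])
print(allocation)  # [375.0, 375.0, 250.0]
```

In this example the smallest stratum holds 10% of the population but receives 25% of the sample, because its standard deviation is four times that of the largest stratum.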
Impact on national accuracy 2/3
According to the previous formula, a larger sample should be taken if:
* the stratum is larger
* the stratum is more variable internally
These national considerations may conflict with more “local” considerations: as said, from a local point of view, over-sampling often focuses on small sub-populations, while national considerations lead to taking larger samples from the largest strata. Nevertheless, the loss in national accuracy is often limited:
$$1 \le \frac{\sigma}{\sigma_{opt}} \le \sqrt{1 + g^2}, \qquad g = \max_{1 \le h \le H} \frac{\left| n_h - n_h^{opt} \right|}{n_h^{opt}}$$
where $\sigma$ is the standard error under the actual allocation and $\sigma_{opt}$ the standard error under the optimal allocation.
Impact on national accuracy 3/3
Thus, if g=20%, σ/σ_opt is at most about 1.02, which means an increase in the standard error (i.e. a loss in accuracy) of at most about 2%. Similarly, if g=30%, σ/σ_opt is at most about 1.04, which means an increase of at most about 4%. In this sense the optimum can be described as flat.
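The figures quoted above can be checked numerically. The sketch below assumes that the bound takes the form σ/σ_opt ≤ √(1 + g²); this form is an assumption, chosen because it reproduces the 1.02 and 1.04 quoted for g = 20% and g = 30%. The function name is hypothetical.

```python
import math


def se_inflation_bound(g):
    """Assumed upper bound on sigma / sigma_opt: sqrt(1 + g^2)."""
    return math.sqrt(1 + g ** 2)


for g in (0.10, 0.20, 0.30, 0.50):
    bound = se_inflation_bound(g)
    print(f"g = {g:.0%}: standard error inflated by at most {bound - 1:.1%}")
```

Even a 50% departure from the optimal stratum sample sizes inflates the standard error by roughly 12% under this bound, which is what "flat optimum" refers to.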
As a result, the impact of over-sampling on national accuracy should be limited, provided the sample sizes are not “extremely” different from the optimal ones. The impact is all the more limited given that the national sample sizes are generally large (thousands of units). Besides, by using powerful auxiliary information at national level, one may hope to increase sample precision a posteriori.
Over-sampling the capital cities in the EU-SASU: is this proposal relevant?
Capital city = most populated city of the country
Always the same as the political capital (except for Switzerland)
Is the proposal (statistically) relevant?
• Sample size of individuals over the capital cities: is it enough to draw reliable conclusions?
• Victimization rates in the capital cities: are they generally higher than those for the rest of the country?
• Higher non-response in the capital cities? (often correct)
Minimum sample sizes for the capital cities
[Figure: horizontal bar chart with one bar per country (29 countries, listed from France to Malta); data labels range from 276 to 2600 on an axis running from 0 to 3000.]
Victimization rates in capital cities
[Figure: national victimization rate (%) and capital-city victimization rate (%) by country; vertical axis from 0 to 70.]
Source: International Crime and Victimization Survey (ICVS), 2005
Victimization rates are higher in the capital cities than in the rest of the countries
How to determine the over-sampling rates? 1/4
Step 1: set up a precision target for every capital city
Step 2: determine the minimum sample size needed to achieve the level of precision specified at Step 1
Precision target (1): under simple random sampling, a relative margin of error of α% in each capital city for any victimization rate higher than P%
$$n_{min} = \left(\frac{1.96}{\alpha}\right)^2 \left(\frac{1}{P} - 1\right)$$
[Figure: $n_{min}$ (axis from 0 to 40000) plotted against P (axis from 0 to 60), for α = 10%]
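A small sketch of precision target (1), assuming the minimum sample size comes from requiring 1.96·√((1−P)/(nP)) ≤ α under simple random sampling, i.e. n_min = (1.96/α)²(1/P − 1), with α and P expressed as proportions. The function name is hypothetical, and rounding up to a whole respondent is an added convention.

```python
import math


def n_min_relative(alpha, P, z=1.96):
    """Minimum sample size for a relative margin of error alpha at rate P."""
    return math.ceil((z / alpha) ** 2 * (1 / P - 1))


# alpha = 10%: the rarer the victimization, the larger the required sample.
for P in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"P = {P:.0%}: n_min = {n_min_relative(0.10, P)}")
```

For α = 10% and P = 20% this gives 1537 respondents. Because the requirement grows as P falls, a sample sized for rate P also meets the target for any higher rate.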
How to determine the over-sampling rates? 2/4
[Figure: $n_{min}$ (axis from 0 to 18000) plotted against alpha (axis from 0 to 60), for P = 20%]
How to determine the over-sampling rates? 3/4
Precision target (2): under simple random sampling, an absolute margin of error of α % points in each capital city for any victimization rate higher than P%
$$n_{min} = \left(\frac{1.96}{\alpha}\right)^2 P \left(1 - P\right)$$
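The absolute version of the target can be sketched in the same way, assuming the requirement is 1.96·√(P(1−P)/n) ≤ α under simple random sampling, i.e. n_min = (1.96/α)²·P(1−P), with α and P expressed as proportions. The function name and the rounding up are illustrative choices.

```python
import math


def n_min_absolute(alpha, P, z=1.96):
    """Minimum sample size for an absolute margin of error alpha at rate P."""
    return math.ceil((z / alpha) ** 2 * P * (1 - P))


# A margin of 2 percentage points around a 20% victimization rate.
print(n_min_absolute(0.02, 0.20))  # 1537
```

Unlike the relative target, this requirement is largest at P = 50%, since P(1−P) peaks there.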
How to determine the over-sampling rates? 4/4
Consider the national victimization rate for the 10 main crimes as used in the International Crime and Victimization Survey (ICVS):
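Impact on the national accuracy 1/8
$$\tilde{P} = \frac{N_C}{N} \tilde{P}_C + \frac{N_{NC}}{N} \tilde{P}_{NC}$$
where $\tilde{P}_C$ is the victimization rate in the capital city and $\tilde{P}_{NC}$ is the victimization rate in the rest of the country.
Impact on the national accuracy 2/8
Variance:
$$V = \left(\frac{N_C}{N}\right)^2 \frac{\tilde{P}_C \left(1 - \tilde{P}_C\right)}{n_C} + \left(\frac{N_{NC}}{N}\right)^2 \frac{\tilde{P}_{NC} \left(1 - \tilde{P}_{NC}\right)}{n_{NC}}$$
Relative margin of error:
$$RME = \frac{1.96}{\tilde{P}} \sqrt{\left(\frac{N_C}{N}\right)^2 \frac{\tilde{P}_C \left(1 - \tilde{P}_C\right)}{n_C} + \left(\frac{N_{NC}}{N}\right)^2 \frac{\tilde{P}_{NC} \left(1 - \tilde{P}_{NC}\right)}{n_{NC}}}$$
Absolute margin of error:
$$AME = 1.96 \sqrt{\left(\frac{N_C}{N}\right)^2 \frac{\tilde{P}_C \left(1 - \tilde{P}_C\right)}{n_C} + \left(\frac{N_{NC}}{N}\right)^2 \frac{\tilde{P}_{NC} \left(1 - \tilde{P}_{NC}\right)}{n_{NC}}}$$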
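The sketch below puts the three quantities together for a two-stratum design (capital city, rest of the country). It assumes the national rate is the population-weighted mean of the two stratum rates and that each stratum is sampled by simple random sampling, so that V = (N_C/N)²·P_C(1−P_C)/n_C + (N_NC/N)²·P_NC(1−P_NC)/n_NC. All names and input values are hypothetical.

```python
import math


def national_precision(P_c, P_nc, n_c, n_nc, share_c, z=1.96):
    """Return (rate, variance, RME, AME) for the national victimization rate.

    share_c is N_C / N, the capital city's share of the population.
    """
    share_nc = 1 - share_c
    rate = share_c * P_c + share_nc * P_nc
    variance = (share_c ** 2 * P_c * (1 - P_c) / n_c
                + share_nc ** 2 * P_nc * (1 - P_nc) / n_nc)
    ame = z * math.sqrt(variance)  # absolute margin of error
    rme = ame / rate               # relative margin of error
    return rate, variance, rme, ame


# Hypothetical country: capital = 10% of the population, 8,000 interviews.
proportional = national_precision(0.25, 0.15, n_c=800, n_nc=7200, share_c=0.10)
oversampled = national_precision(0.25, 0.15, n_c=2000, n_nc=6000, share_c=0.10)
print(f"RME, proportional allocation: {proportional[2]:.2%}")
print(f"RME, capital over-sampled:    {oversampled[2]:.2%}")
```

With the total held at 8,000 interviews, moving respondents into the capital shrinks the sample for the other 90% of the population, so the national margin of error rises in this example.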
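Impact on the national accuracy 3/8
Case 1: fixed national sample size
$$n_C = \mathrm{Min}\left\{ n, \left(\frac{1.96}{\alpha}\right)^2 \left(\frac{1}{P} - 1\right) \right\}, \qquad n_{NC} = n - n_C$$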
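One possible reading of Case 1, sketched under assumptions: the capital city receives the minimum sample size required by the relative precision target, capped at the national total n, and the rest of the country receives what is left. The formula n_C = Min{n, (1.96/α)²(1/P − 1)} is a reconstruction, and the function name and inputs are hypothetical.

```python
def allocate_fixed_total(n, alpha, P, z=1.96):
    """Case 1 sketch: split a fixed national sample n between strata."""
    n_min = (z / alpha) ** 2 * (1 / P - 1)  # target for the capital city
    n_c = min(n, n_min)                     # cannot exceed the total
    n_nc = n - n_c                          # remainder for the rest
    return n_c, n_nc


n_c, n_nc = allocate_fixed_total(n=8000, alpha=0.10, P=0.20)
print(f"capital: {n_c:.0f}, rest of the country: {n_nc:.0f}")
```

Because the total is fixed, every additional interview in the capital is taken away from the rest of the country.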
Impact on the national accuracy 4/8
Table 3: Relative margin of error (%) for the national victimization rate – fixed sample size at national level (Case 1)
Country  Over-sampling (P=0.1  P=0.2  P=0.3  P=0.4  P=0.5)  No over-sampling
France 7.5 6.3 6.0 5.9 5.9 5.9
Germany 7.0 5.9 5.7 5.6 5.6 5.6
Switzerland 6.6 5.3 5.1 5.0 5.0 5.0
Italy 7.2 6.1 5.8 5.8 5.7 5.8
Poland 6.5 5.5 5.3 5.2 5.2 5.2
Netherlands 5.5 4.6 4.4 4.4 4.4 4.4
Portugal 8.0 6.8 6.5 6.4 6.4 6.4
Denmark 7.1 5.4 5.2 5.2 5.3 5.2
Greece 7.1 6.0 5.8 5.8 5.9 5.8
Spain 8.1 7.0 6.8 6.9 7.1 6.9
Sweden 6.6 5.4 5.2 5.3 5.4 5.3
Finland 8.4 6.6 6.4 6.5 6.8 6.5
Norway 7.4 5.8 5.7 5.8 6.0 5.7
Ireland 6.2 4.8 4.7 4.7 4.9 4.7
Belgium 5.5 4.7 4.7 4.7 4.9 4.7
United Kingdom 4.5 3.9 4.0 4.1 4.4 3.9
Hungary 6.9 6.4 6.9 7.6 8.5 6.5
Austria 6.6 6.2 6.8 7.6 8.7 6.3
Estonia 5.5 4.9 5.6 6.5 7.7 4.9
Impact on the national accuracy 5/8
Table 4: Absolute margin of error (% points) for the national victimization rate – fixed sample size at national level (Case 1)
Country  Over-sampling (P=0.1  P=0.2  P=0.3  P=0.4  P=0.5)  No over-sampling
France 0.9 0.8 0.7 0.7 0.7 0.7
Germany 0.9 0.8 0.7 0.7 0.7 0.7
Switzerland 1.2 1.0 0.9 0.9 0.9 0.9
Italy 0.9 0.8 0.7 0.7 0.7 0.7
Poland 1.0 0.8 0.8 0.8 0.8 0.8
Netherlands 1.1 0.9 0.9 0.9 0.9 0.9
Portugal 0.8 0.7 0.7 0.7 0.7 0.7
Denmark 1.3 1.0 1.0 1.0 1.0 1.0
Greece 0.9 0.7 0.7 0.7 0.7 0.7
Spain 0.7 0.6 0.6 0.6 0.6 0.6
Sweden 1.1 0.9 0.8 0.8 0.9 0.8
Finland 1.1 0.8 0.8 0.8 0.9 0.8
Norway 1.2 0.9 0.9 0.9 1.0 0.9
Ireland 1.4 1.1 1.0 1.0 1.1 1.0
Belgium 1.0 0.8 0.8 0.8 0.9 0.8
United Kingdom 0.9 0.8 0.8 0.9 0.9 0.8
Hungary 0.7 0.6 0.7 0.8 0.8 0.6
Austria 0.8 0.7 0.8 0.9 1.0 0.7
Estonia 1.1 1.0 1.1 1.3 1.6 1.0
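Impact on the national accuracy 6/8
Case 2: national sample size not fixed
$$n_C = \mathrm{Max}\left\{ n \frac{N_C}{N}, \left(\frac{1.96}{\alpha}\right)^2 \left(\frac{1}{P} - 1\right) \right\}, \qquad n_{NC} = n - n \frac{N_C}{N} = n \frac{N_{NC}}{N}$$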
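A sketch of Case 2 under assumptions: the rest of the country keeps its proportional sample n·N_NC/N, while the capital city receives whichever is larger, its proportional share n·N_C/N or the minimum required by the relative precision target, so the national total can exceed n. The function name and inputs are hypothetical.

```python
def allocate_free_total(n, share_c, alpha, P, z=1.96):
    """Case 2 sketch: top up the capital sample without cutting the rest.

    share_c is N_C / N, the capital city's share of the population.
    """
    n_min = (z / alpha) ** 2 * (1 / P - 1)  # target for the capital city
    n_c = max(n * share_c, n_min)           # never below proportional share
    n_nc = n * (1 - share_c)                # rest of the country unchanged
    return n_c, n_nc, n_c + n_nc


n_c, n_nc, total = allocate_free_total(n=8000, share_c=0.05, alpha=0.10, P=0.20)
print(f"capital: {n_c:.0f}, rest: {n_nc:.0f}, new national total: {total:.0f}")
```

Here over-sampling only ever adds interviews, so national precision cannot get worse than under proportional allocation; the price is a larger total sample.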
Impact on the national accuracy 7/8
Table 5: Relative margin of error (%) for the national victimization rate – national sample size not fixed (Case 2)
Country  Over-sampling (P=0.1  P=0.2  P=0.3  P=0.4  P=0.5)  No over-sampling
France 5.7 5.8 5.8 5.8 5.9 5.9
Germany 5.4 5.4 5.5 5.5 5.6 5.6
Switzerland 4.8 4.8 4.9 4.9 5.0 5.0
Italy 5.6 5.6 5.6 5.7 5.7 5.8
Poland 5.0 5.0 5.1 5.1 5.2 5.2
Netherlands 4.2 4.2 4.3 4.3 4.4 4.4
Portugal 6.2 6.3 6.3 6.4 6.4 6.4
Denmark 4.9 4.9 5.0 5.2 5.2 5.2
Greece 5.5 5.6 5.7 5.8 5.8 5.8
Spain 6.4 6.5 6.7 6.9 6.9 6.9
Sweden 4.9 5.0 5.1 5.3 5.3 5.3
Finland 5.9 6.0 6.3 6.5 6.5 6.5
Norway 5.2 5.4 5.6 5.7 5.7 5.7
Ireland 4.3 4.5 4.6 4.7 4.7 4.7
Belgium 4.4 4.5 4.6 4.7 4.7 4.7
United Kingdom 3.6 3.8 3.9 3.9 3.9 3.9
Hungary 5.8 6.3 6.5 6.5 6.5 6.5
Austria 5.5 6.2 6.3 6.3 6.3 6.3
Estonia 4.0 4.8 4.9 4.9 4.9 4.9
Impact on the national accuracy 8/8
Table 6: Absolute margin of error (% points) for the national victimization rate – national sample size not fixed (Case 2)
Country  Over-sampling (P=0.1  P=0.2  P=0.3  P=0.4  P=0.5)  No over-sampling
France 0.7 0.7 0.7 0.7 0.7 0.7
Germany 0.7 0.7 0.7 0.7 0.7 0.7
Switzerland 0.9 0.9 0.9 0.9 0.9 0.9
Italy 0.7 0.7 0.7 0.7 0.7 0.7
Poland 0.8 0.8 0.8 0.8 0.8 0.8
Netherlands 0.8 0.8 0.8 0.8 0.9 0.9
Portugal 0.6 0.7 0.7 0.7 0.7 0.7
Denmark 0.9 0.9 0.9 1.0 1.0 1.0
Greece 0.7 0.7 0.7 0.7 0.7 0.7
Spain 0.6 0.6 0.6 0.6 0.6 0.6
Sweden 0.8 0.8 0.8 0.8 0.8 0.8
Finland 0.7 0.8 0.8 0.8 0.8 0.8
Norway 0.8 0.8 0.9 0.9 0.9 0.9
Ireland 1.0 1.0 1.0 1.0 1.0 1.0
Belgium 0.8 0.8 0.8 0.8 0.8 0.8
United Kingdom 0.8 0.8 0.8 0.8 0.8 0.8
Hungary 0.6 0.6 0.6 0.6 0.6 0.6
Austria 0.6 0.7 0.7 0.7 0.7 0.7
Estonia 0.8 1.0 1.0 1.0 1.0 1.0
Specific issues
• The initial difficulty is in obtaining a sampling frame appropriate for over-sampling the inhabitants of the capital cities. For the countries conducting a face-to-face survey, this should not be a serious issue. On the other hand, the countries which plan to conduct the survey by telephone might be unable to do so, unless specific phone numbers are allocated to the households in the capital city (e.g., when the first digits of a phone number represent the city code).
• Since individuals in capital cities are in general more difficult to contact, over-sampling them will necessitate more attempted contacts, which will likely imply higher costs and more time to reach the minimum sample size required for the survey.
• Finally, over-sampling might make the problem of anonymisation of the data more acute.
Questions for the TF
1. Is over-sampling the inhabitants of the capital cities policy-relevant? Which geographical areas might be over-sampled instead?
• NUTS2 or NUTS3 regions
• Groups of cities (like in Eurostat’s Urban Audit)
• Densely populated areas (based on degree of urbanisation)
• City areas….
2. What level of accuracy is needed for the capital cities/other geographical areas?
3. What about higher non-response?
4. What about telephone surveys?