Use of administrative sources and
registers in the Finnish EU-SILC survey
Workshop on best practices for EU-SILC revision
Marie Reijo, Senior Researcher
Content
• Preconditions for good register utilisation
• Register use in the Finnish SILC/IDS, overview
• Register use by the Finnish SILC/IDS survey stages
Sampling design and sample selection
Weighting and unit non-response correction
Data collection and processing
Data analysis
• Integrated modules, e.g. HFCS 2013
19.12.2016
Preconditions: comprehensive and reliable register system
• Basic registers
• Major registers (incl. statistical registers by Statistics Finland)
• Statistics production and releasing by Statistics Finland
Efficient information system for collecting registers
Register-based census system created with the 1970 census, from 1987 census entirely from administrative sources
Totally register-based statistics, e.g. Statistics on taxable income since 1969, Total statistics on income distribution (TSID) since 1995
Unified identification codes, exact matching
Registers used for sample-based surveys since the 1970s; HBS originates in 1966 and Income distribution statistics (IDS) in 1977, with SILC integrated in 2004
Legislative basis for statistical purposes
Public approval
Best practices (Statistics Finland 2004; UN/ECE 2007; UN 2012; see also Wallgren & Wallgren 2014)
Register use in the Finnish SILC/IDS, overview
Table columns: Stage / Sources / Linkage units / Methods / Aim

Stage: Sampling (1st phase)
Sources: The Population Information System
Linkage units: -
Methods: Direct use.
Aim: Sampling frame, sample selection (master sample) and update.

Stage: Sampling (2nd phase)
Sources: The Population Information System, Taxation register
Linkage units: Person, household-dwelling unit
Methods: Deterministic record linkage.
Aim: Strata construction, sample selection of selected persons from the master sample by stratum.

Stage: Data collection
Sources: Several register sources.
Linkage units: Person, enterprise, region
Methods: Deterministic record linkage.
Aim: Auxiliary data to the sample for CATI Blaise questionnaire: data editing in interviews. Replaced interview and substitutive information for target variables: data collection for target variables.

Stage: Data processing
Sources: Several register sources.
Linkage units: Person, enterprise, region; Person, dwelling, building, enterprise, region
Methods: Deterministic record linkage; Deterministic record linkage, and methods to derive, estimate, impute and code variables, e.g. regression estimation, stratification.
Aim: Auxiliary information for interviewed data checking and editing, detecting and correcting errors (e.g. inconsistencies at unit level) for target variables. Auxiliary and substitutive information for editing, imputing of missing information for target variables. Using information combined with interviewed or register information to derive and form target variables.

Stage: Estimation, Weighting
Sources: The register data on household-dwelling population by Statistics Finland, The TSID data.
Linkage units: Person, household-dwelling unit, region
Methods: Deterministic record linkage, several methods, e.g. regression estimation, calibration methods.
Aim: Information for unit non-response analysis, unit non-response correction, adjusting data to the target (total) population. Using data on crucial frequencies and income and income receiver sums.

Stage: Quality analysis
Sources: Total data, e.g. TSID
Linkage units: Person
Methods: Direct use, deterministic record linkage.
Aim: Data comparisons, unit non-response (e.g. panel attrition) and other analysis
Register sources in sampling
[Diagram: register sources in sampling]
Registers: Population Information System of the Population Register Centre (basic register: persons, buildings and dwellings); National Board of Taxes (taxation).
Data of Statistics Finland: sample frame (total data copy of persons, buildings and dwellings); master sample; master sample by stratum; SILC/IDS sample.
Register use for two-phase stratified sampling
• Sample frame of the Population Information System, up-to-date
Persons residing permanently in Finland at the end of the year, ordered by domicile code (address)
Unified identification codes for persons
Selected systematically for the 1st phase master sample (about 50 000)
Over-coverage (persons not in the target population at 31.12. of sy t-1) excluded after checking against updated register data
• Socioeconomic strata for the 2nd phase sample selection
Socioeconomic strata: data linked from the taxation register (sy t-2) to the persons living in the sample person's household-dwelling unit
-> 12 strata: information on taxable income type and level, defined by the highest earner in the household-dwelling unit
SILC/IDS gross sample (about 13 500 persons) selected by simple random sampling with non-proportional allocation from strata
Use of taxation registers data for stratification ensures less biased estimates for important output measures.
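The two-phase design described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Statistics Finland's implementation: the frame, the domicile codes, the per-person stratum lookup (the slides define strata by the highest earner in the household-dwelling unit), the sample sizes and the allocation are all made up, and the helper names systematic_sample and stratified_srs are hypothetical.

```python
import random
from collections import defaultdict

def systematic_sample(frame, n, rng):
    """Pick n units from an ordered frame: random start, then a fixed step."""
    step = len(frame) / n
    start = rng.random() * step
    return [frame[int(start + i * step)] for i in range(n)]

def stratified_srs(units, stratum_of, allocation, rng):
    """Simple random sampling without replacement inside each stratum."""
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of[unit]].append(unit)
    sample = []
    for stratum, n_h in allocation.items():
        pool = by_stratum.get(stratum, [])
        sample.extend(rng.sample(pool, min(n_h, len(pool))))
    return sample

rng = random.Random(2016)

# Hypothetical frame: person ids ordered by a made-up domicile code.
domicile = {pid: rng.randrange(1000) for pid in range(20000)}
frame = sorted(domicile, key=domicile.get)

# Hypothetical taxation-based stratum (1..12) linked to each person.
tax_stratum = {pid: rng.randrange(1, 13) for pid in frame}

# 1st phase: systematic master sample from the ordered frame.
master = systematic_sample(frame, 2000, rng)

# 2nd phase: non-proportional allocation (larger n in strata 10-12, assumed).
allocation = {h: (80 if h >= 10 else 40) for h in range(1, 13)}
gross = stratified_srs(master, tax_stratum, allocation, rng)
```

Ordering the frame before systematic selection spreads the master sample evenly over the ordering variable, while the unequal allocation lets some strata be sampled more heavily than others.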
Register sources in weighting and unit non-response correction
[Diagram: register sources in weighting and unit non-response correction]
Administrative registers: Population Register Centre (population data: persons, buildings and dwellings); National Board of Taxes (taxation); Finnish Centre for Pensions (pensions); Social Insurance Institution (social insurance); National Institute for Health and Welfare (social assistance).
Other register sources: Education fund; State Treasury; Ministry of Agriculture and Forestry; Financial Supervision Authority.
Statistics Finland, data: household-dwelling units.
Statistics Finland, statistics: Total statistics on income distribution (TSID) data; SILC/IDS.
Register use for weighting and unit non-response correction
• Unit non-response analysis by register data
• Calibration of non-response adjusted design weights by frequencies and sums from the household-dwelling units and TSID data by Statistics Finland (register household-dwelling population and household-dwelling units at 31.12. of sy t-1, and their income for sy t-1):
Number of households
Sex * age (5-year) groups of household-dwelling population, the oldest age group 85+
Number of members in household-dwelling unit (1,2,..,6+)
Region (NUTS 3, Helsinki and capital area separated)
Degree of urbanisation
Sums of the 12 income components
Number of the 3 income component receivers
• The same standard methods and calibration variables are used from year to year
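As a rough illustration of what calibrating non-response adjusted weights to register totals involves, the sketch below uses linear (GREG-type) calibration: weights w = d * (1 + x'lambda) are chosen so that the weighted sums of the calibration variables reproduce the known totals. The slides do not say which calibration method is used, so the method, the three calibration variables, the weights and the totals here are all assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def linear_calibration(d, X, totals):
    """Return weights w_i = d_i * (1 + x_i . lam) whose weighted sums hit totals."""
    k = len(totals)
    # M = sum_i d_i x_i x_i'
    M = [[sum(di * xi[a] * xi[b] for di, xi in zip(d, X)) for b in range(k)]
         for a in range(k)]
    # Gap between the register totals and the current weighted sums.
    gap = [totals[a] - sum(di * xi[a] for di, xi in zip(d, X)) for a in range(k)]
    lam = solve(M, gap)
    return [di * (1.0 + sum(l * v for l, v in zip(lam, xi))) for di, xi in zip(d, X)]

# Hypothetical respondents: (household indicator, household size, income in 1000 euros).
X = [[1, 1, 20], [1, 2, 35], [1, 2, 50], [1, 3, 42], [1, 4, 60], [1, 1, 15]]
# Hypothetical non-response adjusted design weights.
d = [120.0, 120.0, 150.0, 150.0, 100.0, 100.0]
# Hypothetical register totals: households, persons, income sum.
totals = [760.0, 1650.0, 29000.0]

w = linear_calibration(d, X, totals)
calibrated_sums = [sum(wi * xi[a] for wi, xi in zip(w, X)) for a in range(3)]
```

After calibration, calibrated_sums reproduces the three totals up to rounding error. Linear calibration does not by itself keep weights positive or bounded, which is one reason bounded variants are common in practice.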
Total disposable household income means by strata, 1st wave
[Chart by stratum; scale 0-100, unit 1000 euros. Series: Mean (sample); Mean (design weight, non-response adjusted); Mean (calibrated weight). Source: IDS/SILC sy2015]
Total disposable household income means by strata, 4th wave
[Chart by stratum; scale 0-100, unit 1000 euros. Series: Mean (sample); Mean (design weight, non-response adjusted); Mean (calibrated weight). Source: IDS/SILC sy2015]
Register sources in data collection and processing
[Diagram: register sources in data collection and processing]
Administrative registers: Population Register Centre (population data: persons, buildings and dwellings); National Board of Taxes (taxation); Finnish Centre for Pensions (pensions); Social Insurance Institution (social insurance); National Institute for Health and Welfare (social assistance).
Other register sources: Education fund; State Treasury; Ministry of Agriculture and Forestry; Financial Supervision Authority.
Statistics Finland, registers: Families; Business register; Student register; Register on degrees.
Statistics Finland, data: household-dwelling units.
Statistics Finland, statistics: Total statistics on income distribution (TSID) data, incl. indebtedness; SILC/IDS.
Register use in data collection and processing
• Detecting and correcting erroneous responses for target variables during the interview. Auxiliary information is prefilled in the CATI/CAPI Blaise questionnaire by exact matching for persons of the household-dwelling unit (wave I) or the housekeeping unit (waves II-IV). Household members (sy t) are determined first in the interview; register information is used only if there is an exact match.
• Automatic coding during the interview.
• Editing and coding of interviewed data in the statistics' database system, either automatically (programmed) or manually (loaded to the editing system display). Register data are linked to persons by exact matching.
• Forming target variables by record linkage (e.g. data on income), or by editing or imputing non-responded items of objective-type variables with statistical methods. Exact matching.
Standard editing rules are applied if there are no changes in sources or definitions.
Consistency of data from different sources is ensured at unit level.
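The prefill step above relies on exact (deterministic) matching on a unified person identifier. A minimal sketch of that idea, assuming made-up identifiers, field names and values, and a hypothetical helper prefill (the real Blaise questionnaire and register layouts are not shown in the slides):

```python
def prefill(household_members, register, fields):
    """Attach register values to interviewed persons whose id matches exactly.

    Persons without an exact match get None, so nothing is prefilled for them.
    """
    prefilled = {}
    for pid in household_members:
        record = register.get(pid)
        if record is None:
            prefilled[pid] = None
        else:
            prefilled[pid] = {f: record[f] for f in fields}
    return prefilled

# Hypothetical register extract keyed by person identifier.
register = {
    "P1": {"birth_year": 1975, "degree": "upper secondary", "employer": "E17"},
    "P2": {"birth_year": 2001, "degree": "basic", "employer": None},
}

# Household members as confirmed at the start of the interview;
# "P9" moved in recently and has no exact match in the register extract.
members = ["P1", "P2", "P9"]

result = prefill(members, register, ["birth_year", "degree"])
```

Here result holds register values for P1 and P2 and None for P9, so the interviewer would ask P9 the questions without prefilled answers.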
Data collection for variables from registers
• Register use has many advantages, e.g. lower response burden and costs, better accuracy
• Assessing register exploitation: how much is efficient and sufficient for SILC data quality? Relevance?
• Definitions: SILC variables vs. register variables
Opinions and subjective-type data are rarely available from registers
Not all factual variables are available from registers
Validity of the factual data that are available from registers
• Comprehensiveness and completeness
• Reference time periods and time points
Register data: no information available for the interview time point
• => Data consistency of multipurpose survey data in particular
Consistency within domains
Consistency between domains
• Delay of statistical domain registers vs. SILC timeliness
• Coherence of statistics in statistical system
Case: Income
• Almost all SILC/IDS income comes from registers, about 98−99 %
• Statistical data on the household-dwelling population by Statistics Finland as base data, plus many comprehensive register sources:
Earliest register received in April, others mostly in August to November
The final taxation register received in November
TSID released in December (survey year)
• Errors are possible (e.g. missing units, missing or erroneous items); updated data are then needed from the register providers
• Preliminary error detection is done first by the Data Collection Unit of Statistics Finland
• Data are filled in both the TSID and the SILC/IDS sample database files
Common, consistent income classification by detailed register items; information on changes is obtained beforehand for data collection and planning
Unified data compilation, e.g. edited and derived variables are formed for the total data and the sample, apart from register files and variables. Original register, interviewed and derived variables are kept in separate files of the statistics production database.
Contents are described in the metadata system.
Macro and micro checks, sample for error detecting at unit level
Early registers for interviewed data editing, checked against final data
Case: Main activity
• Income from registers is for the calendar year; many main activity variables are filtered by PL031 (current = December), and definitions are based on the person's own perception.
• Interviewed IDS activity months are edited against registers during the reference year: decision rules are based on income type and level and other factual information on the person's economic position.
• Overlapping activities are allowed for edited IDS months: the sum of months is 12 or more.
• SILC PL073 − PL090 and PL211A − PL211L: PL211L = PL031 (December).
• Final IDS months: edited for 11 % of persons
• Final December (PL031): edited for 4 % of persons
• Final PL073 − PL090: edited for 15 % of persons. The number of months from both sources was equal for 85 % of persons.
• PL211A − PL211L: months are the same for 86.5 %; errors were corrected for about 2 %, when the same main activity (incl. PL031) lasted for the whole year. No other corrections.
• Consistency between SILC and IDS months; IDS months are used for the socio-economic group classification.
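Two of the consistency conditions named above (PL211L equals PL031, and edited IDS months summing to 12 or more because overlaps are allowed) lend themselves to a simple unit-level check. The sketch below is only illustrative: the activity codes and labels are made up, the function name check_activity_months is hypothetical, and the actual decision rules based on income type and level are not reproduced.

```python
def check_activity_months(ids_months, pl211, pl031):
    """Return a list of consistency problems for one person.

    ids_months: dict mapping activity label -> number of months (overlaps allowed)
    pl211: list of 12 monthly main-activity codes, January (PL211A) .. December (PL211L)
    pl031: current (December) main-activity code
    """
    problems = []
    if len(pl211) != 12:
        problems.append("PL211A-PL211L must cover 12 months")
    elif pl211[-1] != pl031:
        problems.append("PL211L differs from PL031")
    if sum(ids_months.values()) < 12:
        problems.append("IDS activity months sum to less than 12")
    return problems

# Hypothetical codes: 1 = employed, 5 = unemployed.
consistent = check_activity_months({"employed": 12, "student": 3}, [1] * 12, 1)
flagged = check_activity_months({"employed": 6, "unemployed": 4}, [1] * 6 + [5] * 6, 1)
```

The first person passes both checks; the second is flagged twice, since December in PL211 is code 5 while PL031 is 1, and the IDS months cover only 10 months.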
Case: Housing
• Discrepancy between household definitions (housekeeping and household-dwelling units): sharing the same dwelling (i.e. rentals) with another household, or a household dispersed across several dwellings
• Discrepancy between interviewed and register dwellings, incl. variables irrespective of household definition (HH010, HH021):
Definitions: household's main vs. permanent or usual residence
Measurement error, reference time: responded, registrations
Measurement error, quality: responded, registrations
• However, e.g. dwelling municipality is the same for 99 %, + dwelling type (apartments or flats vs. others) for 96 %, + housing tenure for 88 %, but + number of rooms for only 50 % of the sample units (= S-R). The number of rooms differs in detached houses with 5 or more rooms.
• When the dwelling is determined for all persons responsible for accommodation (HB080, HB090), the dwelling municipality is the same for 99 % and the dwelling type for 96 % of persons; no changes (see above)
• Register data are used primarily for automatic editing (erroneous, missing values) of objective-type data, linked to S-R.
• More effort on exploiting registers? More effort on decision rules for validating the reported main dwelling of the housekeeping unit.
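Agreement shares like those quoted above can be computed once interviewed and register values are linked for the same unit. A minimal sketch, assuming made-up variable names and values and computing each variable's agreement separately (the slide's "+" suggests cumulative agreement, which this sketch does not reproduce):

```python
def agreement_rates(pairs, variables):
    """Share of linked units where survey and register values coincide, per variable.

    pairs: list of (survey_record, register_record) dicts for the same unit.
    Units with a missing value on either side are left out of that variable's rate.
    """
    rates = {}
    for var in variables:
        both = [(s[var], r[var]) for s, r in pairs
                if s.get(var) is not None and r.get(var) is not None]
        rates[var] = sum(a == b for a, b in both) / len(both) if both else None
    return rates

# Hypothetical linked survey-register pairs for four sample units.
pairs = [
    ({"municipality": "091", "rooms": 3}, {"municipality": "091", "rooms": 3}),
    ({"municipality": "049", "rooms": 5}, {"municipality": "049", "rooms": 6}),
    ({"municipality": "837", "rooms": 2}, {"municipality": "837", "rooms": 2}),
    ({"municipality": "091", "rooms": 6}, {"municipality": "091", "rooms": 5}),
]

rates = agreement_rates(pairs, ["municipality", "rooms"])
```

In this toy data the municipality agrees for every unit while the number of rooms agrees for half of them, the kind of contrast that would motivate variable-specific editing rules.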
Data analysis: systematic comparisons of estimates
• Comparisons with household-dwelling population and TSID data:
Analysing sampling and estimation effects. Variables from registers are linked to SILC/IDS sample units, adjusting away household and other definitional effects: comparisons of total sums and frequencies.
Household definition
Income discrepancy due to interviewed income items
Other discrepancies, e.g. income classifications
• Comparisons of sums, frequencies and classifications with register statistics by Statistics Finland, e.g. National Accounts (NA), TSID.
• Comparisons of frequencies and sums, classifications with external register statistics, e.g. the ESSPROS statistics by the National Institute for Health and Welfare
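A comparison of this kind boils down to setting weighted sample estimates against the corresponding register totals. The sketch below is a simplified illustration with made-up field names, weights and totals, and a hypothetical helper relative_differences; it ignores the definitional adjustments the slide mentions.

```python
def relative_differences(sample, weight_key, variables, register_totals):
    """Relative difference between weighted sample sums and register totals."""
    result = {}
    for var in variables:
        estimate = sum(unit[weight_key] * unit[var] for unit in sample)
        result[var] = (estimate - register_totals[var]) / register_totals[var]
    return result

# Hypothetical sample units with calibrated weights and a register-linked income item.
sample = [
    {"weight": 100.0, "wage_income": 30.0},
    {"weight": 150.0, "wage_income": 40.0},
]
# Hypothetical total from the total register data.
register_totals = {"wage_income": 9100.0}

diffs = relative_differences(sample, "weight", ["wage_income"], register_totals)
```

Here the weighted estimate is 9000 against a register total of 9100, a relative difference of about -1.1 %.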
Integration of HFCS with the SILC 2013 survey
• The Finnish SILC sample was used for compiling the HFCS (2nd wave).
• Clearly defined domain, related to income data
• Used many register and other statistical data sources (in addition to major registers) and many focused techniques for the hard-to-interview HFCS data:
Unit linking from registers (comprehensive sources)
Register-based estimation, imputing methods based on available data for statistical units from external sources, e.g. separate valuation, perpetual inventory method
Statistical matching from HBS by common register variables, e.g. predictive mean matching, file concatenation
Some of the wealth data, e.g. opinion-type items, were interviewed
Additional variables in calibration
• Methods are developed further for the next HFCS (3rd wave) in the 2017 SILC survey, combined with the SILC ad hoc module on wealth and consumption
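Predictive mean matching, one of the techniques named above, can be sketched in a few lines: fit a model on the donor file, predict for donors and recipients, and give each recipient the observed value of a donor with a nearby prediction. The example below is a bare-bones illustration with one common covariate and made-up numbers; the actual matching variables, models and donor selection used for the HFCS are not described in the slides, and the function names are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b * mean_x, b

def pmm_impute(donor_x, donor_y, recipient_x, k=1, rng=None):
    """Impute y for recipients from donors with the closest predicted means."""
    a, b = fit_line(donor_x, donor_y)
    donor_pred = [a + b * x for x in donor_x]
    imputed = []
    for x in recipient_x:
        pred = a + b * x
        # Indices of the k donors whose predictions are nearest to the recipient's.
        nearest = sorted(range(len(donor_y)), key=lambda j: abs(donor_pred[j] - pred))[:k]
        j = rng.choice(nearest) if rng is not None else nearest[0]
        imputed.append(donor_y[j])  # always an actually observed donor value
    return imputed

# Hypothetical donor file (e.g. HBS): common register income and observed consumption.
donor_income = [10.0, 20.0, 30.0, 40.0]
donor_consumption = [5.0, 12.0, 14.0, 25.0]
# Hypothetical recipient file (e.g. SILC sample) with the same register income.
recipient_income = [12.0, 38.0]

nearest_donor = pmm_impute(donor_income, donor_consumption, recipient_income)
random_of_two = pmm_impute(donor_income, donor_consumption, recipient_income,
                           k=2, rng=random.Random(1))
```

Because imputed values are taken from observed donors, they stay within the range of real data; drawing at random among several near donors (k > 1) keeps more of the natural variation than always taking the single nearest one.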