
Use of administrative sources and registers in the Finnish EU-SILC survey

Workshop on best practices for EU-SILC revision

Marie Reijo, Senior Researcher

Content

• Preconditions for good register utilisation

• Register use in the Finnish SILC/IDS, overview

• Register use by the Finnish SILC/IDS survey stages

Sampling design and sample selection

Weighting and unit non-response correction

Data collection and processing

Data analysis

• Integrated modules, e.g. HFCS 2013

19.12.2016

Preconditions: comprehensive and reliable register system

• Basic registers

• Major registers (incl. statistical registers by Statistics Finland)

• Statistics production and releasing by Statistics Finland

Efficient information system for collecting registers

Register-based census system created with the 1970 census; from the 1987 census onwards entirely from administrative sources

Totally register-based statistics, e.g. Statistics on taxable income since 1969, Total statistics on income distribution (TSID) since 1995

Unified identification codes, exact matching

Registers used for sample-based surveys since the 1970s; the HBS originates in 1966 and the Income distribution statistics (IDS) in 1977, with SILC integrated in 2004

Legislative basis for statistical purposes

Public approval

Best practices (Statistics Finland 2004; UN/ECE 2007; UN 2012; see also Wallgren & Wallgren 2014)

Register use in the Finnish SILC/IDS, overview

Table columns: Stage; Sources; Linkage units; Methods; Aim

Stage: Sampling (1st phase)
Sources: The Population Information System
Linkage units: -
Methods: Direct use.
Aim: Sampling frame, sample selection (master sample) and update

Stage: Sampling (2nd phase)
Sources: The Population Information System, Taxation register
Linkage units: Person, household-dwelling unit
Methods: Deterministic record linkage.
Aim: Strata construction, sample selection of selected persons from the master sample by stratum.

Stage: Data collection
Sources: Several register sources.
Linkage units: Person, enterprise, region
Methods: Deterministic record linkage.
Aim: Auxiliary data to the sample for CATI Blaise questionnaire: data editing in interviews. Replaced interview and substitutive information for target variables: data collection for target variables.

Stage: Data processing (1)
Sources: Several register sources.
Linkage units: Person, enterprise, region
Methods: Deterministic record linkage.
Aim: Auxiliary information for interviewed data checking and editing, detecting and correcting errors (e.g. inconsistencies at unit level) for target variables.

Stage: Data processing (2)
Sources: Several register sources.
Linkage units: Person, dwelling, building, enterprise, region
Methods: Deterministic record linkage, and methods to derive, estimate, impute and code variables, e.g. regression estimation, stratification.
Aim: Auxiliary and substitutive information for editing, imputing of missing information for target variables. Using information combined with interviewed or register information to derive and form target variables.

Stage: Estimation, Weighting
Sources: The register data on household-dwelling population by Statistics Finland, the TSID data.
Linkage units: Person, household-dwelling unit, region
Methods: Deterministic record linkage, several methods, e.g. regression estimation, calibration methods.
Aim: Information for unit non-response analysis, unit non-response correction, adjusting data to the target (total) population. Using data on crucial frequencies and income and income receiver sums.

Stage: Quality analysis
Sources: Total data, e.g. TSID
Linkage units: Person
Methods: Direct use, deterministic record linkage.
Aim: Data comparisons, unit non-response (e.g. panel attrition) and other analysis

Register sources in sampling

[Diagram: register sources in sampling]

Registers: Population Information System of the Population Register Centre (basic register: persons, buildings and dwellings); National Board of Taxes (taxation)

Data of Statistics Finland: sample frame (total data copy of persons, buildings and dwellings); master sample; master sample by stratum; SILC/IDS sample

Register use for two-phase stratified sampling

• Sample frame of the Population Information System, up-to-date

Persons residing permanently in Finland at the end of the year, ordered by domicile code (address)

Unified identification codes for persons

Selected systematically for the 1st phase master sample (about 50 000)

Over-coverage (persons not in the target population on 31.12. of sy t-1) excluded; checked against updated register data

• Socioeconomic strata for the 2nd phase sample selection

Socioeconomic strata: data linked from the taxation register (sy t-2) to the persons living in the sample person’s household-dwelling unit

-> 12 strata: information on taxable income type and level, defined by the highest earner in the household-dwelling unit

SILC/IDS gross sample (about 13 500 persons) selected by simple random sampling with non-proportional allocation from strata

Use of taxation register data for stratification ensures less biased estimates for important output measures.
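
The two-phase design described above can be illustrated with a minimal Python sketch. It is not the production procedure of Statistics Finland: the frame records, the sample sizes (scaled down from about 50 000 and 13 500) and the allocation per stratum are all hypothetical, and the stratum is attached to the frame directly for brevity, whereas in the survey it is linked from the taxation register to the master sample.

```python
import random
from collections import Counter, defaultdict

def systematic_sample(frame, n, rng):
    """Phase 1: systematic selection from an ordered frame (random start, fixed interval)."""
    step = len(frame) / n
    start = rng.random() * step
    last = len(frame) - 1
    return [frame[min(int(start + i * step), last)] for i in range(n)]

def stratified_srs(units, stratum_of, allocation, rng):
    """Phase 2: simple random sampling without replacement within each stratum."""
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)
    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        k = min(allocation.get(stratum, 0), len(members))
        sample.extend(rng.sample(members, k))
    return sample

rng = random.Random(2016)
# Hypothetical frame records: (person_id, domicile_code, stratum 1..12),
# ordered by domicile code as in the frame described above.
frame = sorted(
    ((pid, rng.randrange(1000), rng.randrange(1, 13)) for pid in range(100_000)),
    key=lambda rec: rec[1],
)
master = systematic_sample(frame, 5_000, rng)

# Non-proportional allocation (illustrative numbers): oversample strata 10 to 12.
allocation = {s: (150 if s >= 10 else 100) for s in range(1, 13)}
gross = stratified_srs(master, lambda rec: rec[2], allocation, rng)
per_stratum = Counter(rec[2] for rec in gross)
```

Because the allocation is fixed per stratum rather than proportional to stratum size, units in different strata have different inclusion probabilities, which is why design weights are needed at the estimation stage.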

Register sources in weighting and unit non-response correction

[Diagram: register sources in weighting and unit non-response correction]

Administrative registers: Population Register Centre; National Board of Taxes; Finnish Centre for Pensions; Social Insurance Institution; National Institute for Health and Welfare

Other register sources: Education fund; State Treasury; Ministry of Agriculture and Forestry; Financial Supervision Authority; Treasury

Register contents: taxation; population data (persons, buildings and dwellings); pensions; social insurance; social assistance

Statistics Finland, data: household-dwelling units; Total statistics on income distribution data

Statistics: SILC/IDS

Register use for weighting and unit non-response correction

• Unit non-response analysis by register data

• Calibration of non-response adjusted design weights by frequencies and sums from the household-dwelling unit and TSID data by Statistics Finland (register household-dwelling population and household-dwelling units on 31.12. of sy t-1, and their income for sy t-1):

Number of households

Sex * age (5-year) groups of household-dwelling population, the oldest age group 85+

Number of members in household-dwelling unit (1, 2, ..., 6+)

Region (NUTS 3, Helsinki and capital area separated)

Degree of urbanisation

Sums of the 12 income components

Number of the 3 income component receivers

• The same standard methods and calibration variables are used from year to year
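
To make the calibration step concrete, here is a minimal sketch of raking (iterative proportional fitting), one simple member of the family of calibration methods. The slides do not state which calibration method is used, so raking is a stand-in, and the margins, categories and totals below are hypothetical toy values rather than the actual calibration variables.

```python
def rake(weights, categories, targets, iterations=50):
    """Scale weights so weighted counts match each known margin in turn.

    weights    -- starting weights (e.g. non-response adjusted design weights)
    categories -- dict: margin name -> list of category labels, one per unit
    targets    -- dict: margin name -> dict of category -> known register total
    """
    w = list(weights)
    for _ in range(iterations):
        for margin, labels in categories.items():
            totals = {}
            for wi, lab in zip(w, labels):
                totals[lab] = totals.get(lab, 0.0) + wi
            w = [wi * targets[margin][lab] / totals[lab] for wi, lab in zip(w, labels)]
    return w

# Hypothetical toy data: six responding units, two calibration margins.
design_w = [10.0] * 6
cats = {
    "sex": ["F", "F", "M", "M", "F", "M"],
    "hh_size": ["1", "2+", "1", "2+", "2+", "2+"],
}
register_totals = {
    "sex": {"F": 36.0, "M": 34.0},
    "hh_size": {"1": 25.0, "2+": 45.0},
}
calibrated = rake(design_w, cats, register_totals)

def weighted_count(w, labels, category):
    """Sum of weights over the units in one category."""
    return sum(wi for wi, lab in zip(w, labels) if lab == category)
```

After raking, the weighted counts reproduce the register totals for every margin. Calibrating to income sums as well as frequencies, as listed above, would need a method that handles continuous auxiliary variables, such as regression-type calibration.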

[Figure: Total disposable household income means by strata, 1st wave. Axis: 1000 euros, 0 to 100. Series: mean (sample); mean (design weight, non-response adjusted); mean (calibrated weight). Source: IDS/SILC sy2015]

[Figure: Total disposable household income means by strata, 4th wave. Axis: 1000 euros, 0 to 100. Series: mean (sample); mean (design weight, non-response adjusted); mean (calibrated weight). Source: IDS/SILC sy2015]

Register sources in data collection and processing

[Diagram: register sources in data collection and processing]

Administrative registers: Population Register Centre; National Board of Taxes; Finnish Centre for Pensions; Social Insurance Institution; National Institute for Health and Welfare

Other register sources: Education fund; State Treasury; Ministry of Agriculture and Forestry; Financial Supervision Authority; Treasury

Register contents: taxation; population data (persons, buildings and dwellings); pensions; social insurance; social assistance

Statistics Finland, registers and data: household-dwelling units; families; Business register; Student register; Register on degrees; Total statistics on income distribution data, incl. indebtedness

Statistics: SILC/IDS

Register use in data collection and processing

• Detecting and correcting erroneous responses for target variables during the interview. Auxiliary information is prefilled by exact matching into the CATI/CAPI Blaise questionnaire for the persons of the household-dwelling unit (wave I) or the housekeeping unit (waves II-IV). The household members in sy t are determined first in the interview; if there is an exact match, the register information is used.

• Automatic coding during the interview.

• Editing and coding of interviewed data for variables in the statistics’ database system, either by automatic programmed rules or manually (data loaded to the editing system display). Register data are linked to persons by exact matching.

• Forming target variables by record linkage (exact matching), e.g. data on income, or by editing or imputing item non-response in objective-type variables with statistical methods.

Standard editing rules, if no changes in sources or definitions.

Consistency of data from different sources is ensured at unit level.
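
Exact matching on a unified identification code, as used in the steps above, can be sketched as follows. The field names (`person_id`, `pension`) and the `reg_` prefix are hypothetical; the point is only that register values are attached when, and only when, the identifier matches exactly.

```python
def link_exact(sample_persons, register, key="person_id"):
    """Deterministic linkage: attach register fields only on an exact identifier match."""
    index = {rec[key]: rec for rec in register}
    linked = []
    for person in sample_persons:
        match = index.get(person[key])
        merged = dict(person)
        merged["register_match"] = match is not None
        if match is not None:
            for field, value in match.items():
                if field != key:
                    # Keep register values in separate fields from interviewed ones.
                    merged["reg_" + field] = value
        linked.append(merged)
    return linked

# Hypothetical toy records.
register = [
    {"person_id": "A1", "pension": 1200},
    {"person_id": "B2", "pension": 0},
]
sample = [
    {"person_id": "A1", "age": 70},
    {"person_id": "C3", "age": 30},
]
linked = link_exact(sample, register)
```

Keeping the register values under their own field names means interviewed and register information stay distinguishable, so later editing rules can compare the two rather than silently overwrite one with the other.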

Data collection for variables from registers

• Register use has many advantages, e.g. lower response burden and costs, better accuracy

• Assessing register exploitation: what level is efficient and sufficient for SILC data quality? Relevance?

• Definitions: SILC variables vs. register variables

Opinions, subjective type of data rarely available from registers

Not all factual variables are available from registers

Validity of factual data which are available from registers

• Comprehensiveness and completeness

• Reference time periods and time points

Register data: no information available for the interview time point

• => Data consistency of multipurpose survey data in particular

Consistency within domains

Consistency between domains

• Delay of statistical domain registers vs. SILC timeliness

• Coherence of statistics in statistical system

Case: Income

• Almost all of the SILC/IDS income from registers, about 98−99 %

• Statistical data on the household-dwelling population by Statistics Finland as base data, plus many comprehensive register sources:

Earliest register received in April, others mostly in August to November

The final taxation register received in November

TSID released in December (survey year)

• Errors are possible (e.g. missing units, missing or erroneous items); updated data are then needed from the register providers

• Preliminary error detection is done first by the Data Collection Unit of Statistics Finland

• Data are filled into both the TSID and the SILC/IDS sample database files

Common, consistent income classification by detailed register items; information on changes is obtained beforehand for data collection and planning

Unified data compilation, e.g. edited and derived variables are formed for the total data and the sample, apart from the register files and variables. Original register, interviewed and derived variables are kept in separate files of the statistics production database.

Contents described in the metadata system.

Macro and micro checks; the sample is used for error detection at unit level

Early registers are used for editing interviewed data and checked against the final data

Case: Main activity

• Income from registers is for the calendar year; many main activity variables are filtered by PL031 (current = December); definitions are based on the person’s own perception.

• Interviewed IDS activity months are edited against registers during the reference year: decision rules are based on income type and level and other factual information on the person’s economic position.

• Overlapping activities are allowed for edited IDS months: sum = 12 or > 12.

• SILC PL073 − PL090 and PL211A − PL211L: PL211L = PL031 (December).

• Final IDS months: edited for 11 % of persons

• Final December (PL031): edited for 4 % of persons

• Final PL073 − PL090: edited for 15 % of persons. The number of months from both sources was equal for 85 % of persons.

• PL211A − PL211L: months are the same for 86.5 %; errors were corrected for about 2 %, where the same main activity (incl. PL031) lasted for the whole year. No other corrections.

• Consistency between SILC and IDS months; IDS months are used for the socio-economic group classification.
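
Two of the consistency conditions stated above (edited IDS months summing to 12 or more, and PL211L equal to PL031) lend themselves to a simple automated check. The sketch below assumes a hypothetical record layout: `ids_months` maps an activity to its number of months, and `PL211` is a list of twelve monthly activity codes for PL211A to PL211L. The register-based decision rules themselves are not specified in the slides and are not reproduced here.

```python
def check_activity(person):
    """Return consistency flags for one person's main activity data."""
    flags = []
    # Overlapping activities are allowed, so the sum may exceed 12 but not fall below it.
    if sum(person["ids_months"].values()) < 12:
        flags.append("IDS months sum to less than 12")
    # PL211L (December) should equal the current main activity PL031.
    if person["PL211"][-1] != person["PL031"]:
        flags.append("PL211L differs from PL031")
    return flags

# Hypothetical records with made-up activity codes.
consistent = {
    "ids_months": {"employed": 10, "student": 4},
    "PL211": [1] * 12,
    "PL031": 1,
}
inconsistent = {
    "ids_months": {"employed": 9},
    "PL211": [1] * 11 + [5],
    "PL031": 1,
}
flags_ok = check_activity(consistent)
flags_bad = check_activity(inconsistent)
```

Flagged records would then go to editing against register information rather than being corrected automatically by this check.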

Case: Housing

• Discrepancy between household definitions (housekeeping and household-dwelling units): sharing the same dwelling (i.e. rentals) with another household, or being dispersed across many dwellings

• Discrepancy between interviewed and register dwellings, incl. variables irrespective of household definition (HH010, HH021):

Definitions: household’s main vs. permanent or usual residence

Measurement error, reference time: responded, registrations

Measurement error, quality: responded, registrations

• However, e.g. the dwelling municipality is the same for 99 %; + dwelling type (apartments or flats vs. others) for 96 %; + housing tenure for 88 %; but + number of rooms for only 50 % of the sample units (= S-R). The number of rooms differs in detached houses with 5 or more rooms.

• When detecting the dwelling for all persons responsible for accommodation (hb080, hb090), the dwelling municipality is the same for 99 % and the dwelling type for 96 % of persons, i.e. no changes (see above)

• Register data are used primarily for automatic editing (erroneous or missing values) of objective-type data, linked to S-R.

• More effort on exploiting registers? More effort on decision rules for validating the responded main dwelling of the housekeeping unit.
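
Agreement shares like those quoted above can be computed from linked interview and register records. The sketch below reads the "+" in the figures as cumulative agreement (municipality, then municipality and dwelling type, and so on), which is an interpretation rather than something the slides state; variable names and values are hypothetical.

```python
def cumulative_agreement(pairs, variables):
    """Share of units whose interviewed and register values agree on the first k variables."""
    shares = []
    for k in range(1, len(variables) + 1):
        agree = sum(
            all(survey[v] == register[v] for v in variables[:k])
            for survey, register in pairs
        )
        shares.append((variables[k - 1], agree / len(pairs)))
    return shares

# Hypothetical linked (interview, register) records for four sample units.
pairs = [
    ({"municipality": "091", "type": "flat", "rooms": 3},
     {"municipality": "091", "type": "flat", "rooms": 3}),
    ({"municipality": "091", "type": "flat", "rooms": 5},
     {"municipality": "091", "type": "flat", "rooms": 6}),
    ({"municipality": "091", "type": "house", "rooms": 5},
     {"municipality": "091", "type": "flat", "rooms": 5}),
    ({"municipality": "049", "type": "flat", "rooms": 2},
     {"municipality": "091", "type": "flat", "rooms": 2}),
]
shares = cumulative_agreement(pairs, ["municipality", "type", "rooms"])
```

A sharp drop at one variable, as with number of rooms in the figures above, points to where decision rules for choosing between the interviewed and the register value deserve the most attention.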

Data analysis: systematic comparisons of estimates

• Comparisons with household-dwelling population and TSID data:

Analyzing sampling and estimation effect. Variables from registers linked to SILC/IDS sample units, adjusting away household and other definitional effects: comparisons of total sums and frequencies.

Household definition

Income discrepancy due to interviewed income items

Other discrepancies, e.g. income classifications

• Comparisons of sums, frequencies, classifications with register statistics by Statistics Finland, e.g. NA, TSID.

• Comparisons of frequencies and sums, classifications with external register statistics, e.g. the ESSPROS statistics by the National Institute for Health and Welfare

Integration of the HFCS with the SILC 2013 survey

• The Finnish SILC sample for HFCS (2nd wave) compilation.

• Clearly defined domain, related to income data

• Many register and other statistical data sources (in addition to the major registers) and many focused techniques were used for the hard-to-interview HFCS data:

Unit linking from registers (comprehensive sources)

Register-based estimation, imputing methods based on available data for statistical units from external sources, e.g. separate valuation, perpetual inventory method

Statistical matching from HBS by common register variables, e.g. predictive mean matching, file concatenation

Some of the wealth data, e.g. opinion types, were interviewed

Additional variables in calibration

• Methods are developed further for the next HFCS (3rd wave) in the 2017 SILC survey, combined with the SILC ad hoc module on wealth and consumption
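
Of the techniques listed above, predictive mean matching is easy to illustrate. The sketch below is a deliberately simplified variant with a single common register variable, an ordinary least squares line, and the single nearest donor (many implementations instead draw at random among several close donors). The donor file stands in for HBS and the recipient file for SILC; all values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def pmm_impute(donor_x, donor_y, recipient_x):
    """Give each recipient the observed y of the donor with the closest predicted mean."""
    a, b = fit_line(donor_x, donor_y)
    donor_pred = [a + b * x for x in donor_x]
    imputed = []
    for x in recipient_x:
        target = a + b * x
        nearest = min(range(len(donor_pred)), key=lambda i: abs(donor_pred[i] - target))
        imputed.append(donor_y[nearest])
    return imputed

# Hypothetical data: x is a register variable common to both files,
# y is observed only in the donor file.
donor_x = [10.0, 20.0, 30.0, 40.0]
donor_y = [1.0, 2.5, 2.0, 4.0]
recipient_x = [12.0, 38.0]
imputed = pmm_impute(donor_x, donor_y, recipient_x)
```

Because imputed values are always real observed donor values, the method cannot produce implausible amounts, which is one reason it is popular for skewed variables such as wealth and consumption.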

Thank you for your attention
