Improvement in the use of administrative data sources (ESS.VIP ADMIN WP6 Pilot surveys and their applications) Agreement no. 07112.2016.004-2016.595
Annex No.4
Theoretical solutions - methods of integrating data from different
sources (deterministic, stochastic) based on best solutions
developed and identified for selected variables (marital status).
Table of contents
Introduction
1. Data linkage procedure
1.1. Sources of data
1.1.1. Analysis of information resources in PBSSP
1.2. Outline of data linkage
1.2.1. Representative surveys used in the project
1.2.2. Integration of data using PESEL
1.2.3. Data integration using a created key
1.2.4. Method of integrating representative surveys
2. Data quality
2.1. Standardisation of data before integration
2.1.1. Standardisation of register data
2.1.2. Standardisation of data in representative surveys
2.2. Quality of administrative registers
2.2.1. Marital status register and information systems on divorce and separation rulings
2.2.2. The National System for Monitoring Family Benefits and the Central Register of Data on Beneficiaries of the Alimony Fund
3. Quality assessment of the integrated data sets
3.1. Quality of survey data integration
3.1.1. Quality of the linkage between the PESEL register and the surveys
3.1.2. Detailed analysis of linkage quality
4. Estimation of the actual marital status using the integrated sets
4.1. Determination of the actual marital status
4.1.1. Description of the algorithm
4.1.2. Analysis of the undetermined marital status
4.2. Estimation of the actual marital status
4.2.1. Post-stratification of the actual marital status
4.2.2. Analysis of actual marital status estimates
Summary
List of figures
List of Tables
A Detailed tables - Information on the actual marital status
Introduction
This report is a summary of works carried out in the course of the following tasks:
1. Preparation of theoretical solutions – methods of integrating data from different sources
(deterministic and stochastic integration methods) based on best solutions developed and
identified (for selected variables).
2. Preparation of data imputation methods using information from administrative registers
and surveys.
3. Description of methods of calibrating data.
4. Testing of methods – empirical, pilot application of the developed methods of integrating
data from selected sources.
The main task was to estimate the actual marital status on the basis of existing, available
sources of data. The levels of the legal marital status are: (1) single; (2) married; (3)
surviving spouse; (4) divorced; (5) undetermined. For the actual marital status, two additional
levels were distinguished: (6) partner (cohabitation) and (7) (legally) separated.
A number of assumptions were adopted in the project, which are reflected in this final report.
The most important are:
- the reference date is 30.04.2017 (the day of the last update of the PESEL register that
  was made available for the survey),
- the target population is the population aged 15+ (Polish citizens),
- the empirical study was conducted only for data from Wielkopolskie Province, and the
  results are presented at the level of districts (LAU 1 level),
- both deterministic and probabilistic linkage were used; these methods are described in detail
  in a separate report dedicated to the review of literature on data integration methods,
- the datasets which were made available contained only the necessary variables,
- the issue of determining the current place of residence on the basis of registers was not
  addressed,
- the PESEL register was used as the main register, to which other information was linked,
- the data linkage process was separated into three stages:
  - the first stage involved the linkage of available data from representative surveys
    carried out by official statistics,
  - the second stage involved the linkage of information from administrative registers
    which were made available during the project works,
  - during the third stage data obtained in the first and the second stage were linked to
    the PESEL register.
  This process is described in detail in Section 1.2 and illustrated in Figure 1.
- the legal marital status was determined using data from the PESEL register, which was
  updated with information from other registers and two representative surveys, taking into
  account the time of update (from the register) or the interview date (in the case of
  surveys),
- the actual marital status was determined using two sources:
  - administrative registers – e.g. the "cohabiting partner" status,
  - representative surveys – e.g. the degree of kinship with the household head, and
    information about the type of relationship, separations,
- the "undetermined" category was treated as missing data, which was then reweighted.
The datasets combined during the project works contained only the relevant variables required
to complete the project task, namely to estimate the actual marital status of the target
population. All calculations were performed using the R statistical package and RStudio
environment.
The current version of the report consists of four chapters, which describe the procedure of
data linkage (Chapter 1), input data quality (Chapter 2), the quality of linking data from
representative surveys with the PESEL register, as well as selected results concerning the
marital status (Chapter 3), and the estimation of the marital status (Chapter 4). The authors are
aware of the limitations resulting from the sole reliance on the PESEL register as the main source
to which other registers and representative surveys were linked, but the approach presented in
the report was based on the sets that had been made available. Since the main goal of the
project was to conduct integration and estimation on the basis of several data sources, the
researchers focused mainly on these aspects.
1. Data linkage procedure
1.1. Sources of data
1.1.1. Analysis of information resources in PBSSP
Preparation for the completion of tasks planned in the VIP ADMIN project was preceded by an
in-depth analysis of both the statistical survey program of official statistics (PBSSP) and the
statistical analysis programme (POS). The purpose of this analysis was to identify datasets
containing information about the marital status. Given the identifier of a survey and the
datasets used for its implementation, it was possible to browse the catalogue of
datasets of the National Statistical Information Data Records (ISODS) and locate an appropriate
dataset with its ISODS identifier, provided that it existed in the resources. As a result, it was
possible to compile a list of datasets that could be useful for purposes of conducting the tasks
under the VIP ADMIN project (Table 1). The list contains information about the source number
in the POS, the data administrator, the location of the dataset and the reference date for the
data.
Table 1. List of identified data sets containing information on the marital status.
| Data source number (PBSSP) | Data administrator | Dataset description from PBSSP 2016 | Identification information | Data as at |
|---|---|---|---|---|
| 1.21.01-01-16 | registry offices | vital records (births) | Data maintained by Statistical Office in Olsztyn | 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, 2014-12-31, 2015-12-31, 2016-12-31 |
| 1.21.02-01-16 | registry offices | vital records (marriages) | Data maintained by Statistical Office in Olsztyn | 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, 2014-12-31, 2015-12-31, 2016-12-31 |
| 1.21.02-03-16 | Ministry of Digitisation | PESEL register | ISODS-1359, ISODS-1167 | 2017-02-16, 2015-12-31 |
| 1.21.02-04-16 | district courts | information systems containing divorce and separation rulings | Data maintained by Statistical Office in Olsztyn | 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, 2014-12-31, 2015-12-31, 2016-12-31 |
| 1.21.03-03-16 | commune offices | registers of inhabitants and resident registers of foreigners | Data maintained by Statistical Office in Olsztyn | 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, 2014-12-31, 2015-12-31, 2016-12-31 |
| 1.21.09-01-16 | registry offices | vital records (deaths) | Data maintained by Statistical Office in Olsztyn | 2010-12-31, 2011-12-31, 2012-12-31, 2013-12-31, 2014-12-31, 2015-12-31, 2016-12-31 |
| 1.21.09-03-16 | Ministry of Digitisation | PESEL register | ISODS-1360, ISODS-1166 | 2017-02-16, 2015-12-31 |
| 1.25.11-01-16 | Ministry of Family, Labour and Social Policy | National System for Monitoring Social Assistance | ISODS-1203, ISODS-1204, ISODS-1205, ISODS-1206, ISODS-1399, ISODS-1400, ISODS-1401, ISODS-1402 | 2015-03-31, 2015-06-30, 2015-09-30, 2015-12-31, 2016-03-31, 2016-06-30, 2016-09-30, 2016-12-31 |
| 1.25.11-04-16 | district labour offices | registers of unemployed and job-seekers | pup2015_osoby, pup2016_osoby | 2015-12-31, 2016-12-31 |
| 1.25.15-01-16 | Ministry of Family, Labour and Social Policy | National System for Monitoring Family Benefits | ISODS-1297, ISODS-1298, ISODS-1299, ISODS-1300, ISODS-1403, ISODS-1404, ISODS-1405, ISODS-1406 | 2015-03-31, 2015-06-30, 2015-09-30, 2015-12-31, 2016-03-31, 2016-06-30, 2016-09-30, 2016-12-31 |
| 1.25.15-04-16 | Ministry of Family, Labour and Social Policy | Central register of data on alimony fund beneficiaries | ISODS-1274, ISODS-1275, ISODS-1276, ISODS-1277, ISODS-1407, ISODS-1408, ISODS-1409, ISODS-1410 | 2015-03-31, 2015-06-30, 2015-09-30, 2015-12-31, 2016-03-31, 2016-06-30, 2016-09-30, 2016-12-31 |
| 1.80.02-01-16 | The Ministry of Digitisation | PESEL register | ISODS-1244, ISODS-1434 | 2015-12-31, 2017-05-04 |
The next stage of work in the VIP ADMIN project consisted in evaluating the collected datasets
taking into account their suitability for purposes of integration. After the content of all datasets
had been analysed, only those resources were selected that included the required variables
(PESEL identifier, date of birth, full address data), which are essential in order to create a
database containing information on the marital status. Information resources which meet
these criteria include:
- Vital records:
  - marriages,
  - births,
  - deaths,
- Information systems containing divorce and separation rulings,
- The National System for Monitoring Family Benefits,
- The Central Register of Data on Beneficiaries of the Alimony Fund,
- The PESEL Register.
The key dataset in the list above is the PESEL register, to which information from the other
information resources is to be linked. These resources can be divided into two types: datasets
containing the PESEL variable (a unique identification number for each person) and resources
without this variable. The first group includes vital records and information systems with divorce
and separation rulings.
The second group includes the National System for Monitoring Family Benefits and the Central
Register of Data on Beneficiaries of the Alimony Fund. Depending on which group a given
dataset belongs to, it is linked either using the PESEL identifier or by means of a key created by
combining the address and the date of birth, assuming exact linkage (an exact match on all
identifiers).
1.2. Outline of data linkage
1.2.1. Representative surveys used in the project
As in the case of administrative registers, the first stage was to indicate potential representative
surveys which contain information on the legal marital status and the actual status (e.g.
informal relationship). After analysing the PBSSP and POS, the following representative surveys
were selected: The Household Budget Survey (HBS), the European Union Statistics on Income
and Living Conditions (EU-SILC), the Labour Force Survey (LFS) and the Household Condition
Survey (HCS). Table 2 contains information about these data sets.
Table 2. List of surveys used for the purposes of the VIP ADMIN project
| Data source number (PBSSP) | Data administrator | Dataset description | Identification information | Reference years |
|---|---|---|---|---|
| 1.25.01(063) | Central Statistical Office, Social Surveys and Living Conditions Department | HBS (data on housing and population) | None; the appropriate department needs to be contacted | 2015–2016 |
| 1.25.08(066) | Central Statistical Office, Social Surveys and Living Conditions Department | EU-SILC (data on housing and population) | None; the appropriate department needs to be contacted | 2015–2016 |
| 1.25.02(064) | Central Statistical Office, Social Surveys and Living Conditions Department | Condition of households (data on housing and population) | None; the appropriate department needs to be contacted | 2015–2016 |
| 1.23.01(043) | Central Statistical Office, Social Surveys and Living Conditions Department | LFS (data on housing and population) | None; the appropriate department needs to be contacted | 2015–2016 |
In keeping with the goal of the project, the following questions addressed to members of
households were identified as relevant (original wording of the questions, numbers of forms
from http://forms.stat.gov.pl/BadaniaAnkietowe/2017/harmonogram.htm):
1. With regard to the degree of kinship with the household head:
Degree of kinship or relationship with the reference person (HBS BR-01a),
Degree of kinship or relationship with the household head (EU-SILC EU-SILC-G),
Degree of kinship with the household head (LFS ZG).
2. With regard to the informal relationship:
Are you married to or in an informal relationship with a person in this household?
(HBS BR-01a),
Do you live in a relationship with a person in this household? (EU-SILC EU-SILC-G),
Do you live in a relationship with a person in this household? (HCS).
3. With regard to the legal status – the legal marital status (HBS BR-01a, EU-SILC EU-SILC-G,
HCS, LFS ZG) – it should be noted that, in most cases, the legal marital status contains
information on legal separation, which is a category of the actual marital status.
The answer options for the questions addressed to respondents differed and had to be
standardised before integration. The standardisation procedure is described in Section 2.1.2.
The final report contains results of integrating data from two representative surveys – the
Labour Force Survey (LFS) and the European Union Statistics on Income and Living Conditions
(EU-SILC). In both cases, two separate datasets were obtained containing information about
sampled dwellings, individual persons and households.
It should be noted that information about informal relationships refers only to members of the
household. Based on what we know about current representative surveys, there are no
questions concerning informal relationships with persons outside the household. The only
exception is the Social Cohesion Survey but it cannot be used to update administrative sources
because it is not conducted every year.
To make the data linkage method easier to understand, it is illustrated in a schematic diagram
shown in Figure 1. The diagram includes information from both representative surveys and
available registers.
Figure 1. Schematic diagram of linking administrative registers and representative surveys, where PESEL is the main register. Continuous lines denote deterministic linkage (using a linkage key), and dashed lines denote linkage using an artificial key or probabilistic linkage.
The first stage of linkage involved creating one database for each group of individual datasets.
For example, the EU-SILC and the LFS were combined into one set containing all respondents for
whom information on the legal and actual marital status was available. Such a linkage is
acceptable, since the likelihood that a given person is present in both surveys is low and
individual persons cannot be identified. We should note that for both surveys it was first
necessary to link information about sampled dwellings and respondents.
The same approach was adopted in the case of registers: integrated databases were created
combining information from all the datasets obtained for the purposes of the project. Both the
Central Register of Data on Beneficiaries of the Alimony Fund and the National System for
Monitoring Family Benefits had been made available as sets of quarterly data. In each case, the
quarterly datasets were combined into one set, which was then deduplicated. Similar
procedures were applied in the case of sets from the marital status register and information
systems maintained by district courts with data for the years 2010–2016.
After creating datasets containing different registers and surveys (labelled as databases in the
figure), individual records were linked. In the case of registers containing PESEL, deterministic
linkage was used (continuous line); other datasets were linked using an artificial key or
probabilistic linkage (dashed line).
1.2.2. Integration of data using PESEL
The registers integrated in the first place were those including the PESEL variable, which is
crucial for deterministic data integration. These registers contained data for the period of seven
years, namely from 2010 to 2016. Because the PESEL variable was missing for relevant
categories (spouses, persons in separation, divorcing persons, parents and surviving spouses) in
some years, only data for selected years were used:
marriages contracted in the years 2011–2016,
divorces in 2016,
separations in 2016,
births in the years 2011–2016,
deaths in the years 2015–2016.
After deciding that resources containing information on the marital status could be used in the
project, it was necessary to determine the marital status for each person included in the
selected datasets. Datasets spanning more than one year were first merged; further actions
were identical for all datasets. Records for which PESEL was missing or did not consist of
eleven digits were rejected. Then, five separate strata were created, each with a unique PESEL
number and information about the date of a specific event assigned to it (marriage, divorce,
birth of a child, separation and death of the spouse). Next, the latest date of an event was
selected from each stratum: for persons who married more than once in the years 2011–2016, the
latest date of marriage; for persons who divorced more than once, the latest date of divorce;
for persons in separation, the latest date of separation; for persons who became parents, the
latest date of birth together with the parent’s marital status at the time of the child’s birth;
and for the "deaths" datasets, the latest date of the spouse’s death. This procedure was
necessary because the same person could have changed the marital status many times in the
reference period. In this procedure it is important to correctly determine which event was the
last one, and thus properly assign the marital status to a given person. For this purpose, a
unique PESEL stratum was created, consisting of the five strata distinguished earlier, and then,
the PESEL identifier was used to assign information collected in the course of previous activities,
i.e. information about the date and the type of event. As a result, a marital status database was
created, containing "the history of events" recorded for each person and collected from the
marital status register and information systems containing divorce and separation rulings. This set
contained eleven variables: the PESEL identifier, five variables representing the occurrence of a
given event (marriage, divorce, birth of a child, etc.) and five other ones, containing respective
dates of each occurrence. One record (person) could contain information from up to five
sources (marriages, divorces, births, deaths, separations). If a given event did not occur for a
given person, there was no entry and the field was marked as missing data. In the next step, the
stratum described above could be combined with the PESEL register.
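The steps described in this section can be sketched as follows. This is a minimal illustration in Python, not the code used in the project (the report states that calculations were performed in R). The record layout, the function names and the event labels are assumptions introduced for the example, and the PESEL check covers only the "eleven digits" rule mentioned above. The wide table with five event indicators and five dates is represented here by one date per event type, with None standing for "event did not occur".

```python
from datetime import date

# Hypothetical labels for the five event types described in the text.
EVENT_TYPES = ("marriage", "divorce", "separation", "birth", "death_of_spouse")


def is_valid_pesel(pesel):
    """Format check only: a string of exactly eleven ASCII digits."""
    return (
        isinstance(pesel, str)
        and len(pesel) == 11
        and pesel.isascii()
        and pesel.isdigit()
    )


def build_event_history(records):
    """Build one row per person with the latest date of each event type.

    records: iterable of (pesel, event_type, event_date) tuples, already
    merged across years. Returns {pesel: {event_type: date or None}}.
    """
    history = {}
    for pesel, event_type, event_date in records:
        if not is_valid_pesel(pesel):
            continue  # missing or malformed PESEL: the record is rejected
        person = history.setdefault(pesel, dict.fromkeys(EVENT_TYPES))
        current = person[event_type]
        if current is None or event_date > current:
            person[event_type] = event_date  # keep only the latest date
    return history


def latest_event(person):
    """Return (event_type, date) of the most recent event, or None."""
    dated = [(d, e) for e, d in person.items() if d is not None]
    if not dated:
        return None
    d, e = max(dated)
    return e, d
```

The `latest_event` helper reflects the remark that the last event has to be identified correctly before a marital status is assigned; how ties or conflicting events were resolved in the project is not specified here.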
1.2.3. Data integration using a created key
The purpose of the next stage of works was to integrate the PESEL register with other
administrative registers which do not have the PESEL variable but include information on the
date of birth and full address data (TERYT code of the commune of residence (LAU 2 level),
name of locality, street, building/dwelling number). Datasets containing a set of such variables
can be integrated using a key consisting of a combination of variables. The following datasets
were taken into account:
- The National System for Monitoring Family Benefits,
- The Central Register of Data on Beneficiaries of the Alimony Fund.
The next step consisted in preparing datasets, that is, first of all, standardising and transforming
variables to be used as the linkage key. In the first place, it was necessary to determine the
unique population of persons receiving family benefits and benefits from the alimony fund. As
already mentioned, this information is collected quarterly, which means that the number of
beneficiaries in particular quarters varied. Some persons received financial assistance in all
quarters in the two reference years, while others only in some quarters. In order to select the
unique population of persons, data from all quarters from the years 2015–2016 were combined
into one dataset (separately for persons receiving family benefits and alimony benefits).
Table 3. The KSMŚR database - created using data from the National System for Monitoring Family Benefits for the years 2015–2016
| Year | Quarter | Number of persons | The share of persons in the general population of the created set (%) |
|---|---|---|---|
| 2015 | 1 | 19 590 | 2.42 |
| 2015 | 2 | 26 358 | 3.26 |
| 2015 | 3 | 32 710 | 4.04 |
| 2015 | 4 | 64 487 | 7.97 |
| 2016 | 1 | 34 865 | 4.31 |
| 2016 | 2 | 33 383 | 4.13 |
| 2016 | 3 | 45 976 | 5.68 |
| 2016 | 4 | 551 654 | 68.19 |
| Total | | 809 023 | 100.00 |
Table 4. The FA database - created using data from the Central Register of Data on Beneficiaries of the Alimony Fund for the years 2015–2016.
| Year | Quarter | Number of persons | The share of persons in the general population of the created set (%) |
|---|---|---|---|
| 2015 | 1 | 2 080 | 1.59 |
| 2015 | 2 | 9 585 | 7.33 |
| 2015 | 3 | 7 407 | 5.67 |
| 2015 | 4 | 3 850 | 2.94 |
| 2016 | 1 | 3 770 | 2.88 |
| 2016 | 2 | 3 368 | 2.58 |
| 2016 | 3 | 8 483 | 6.49 |
| 2016 | 4 | 92 197 | 70.52 |
| Total | | 130 740 | 100.00 |
During the next stage a personal identifier was created, consisting of the date of birth, TERYT
code of the commune of residence, name of locality, street and building/dwelling number. Using
this identifier, the researchers first selected only those persons who occurred exactly once in a
given quarter of a given year. From this database, they then selected unique persons together
with the marital status recorded in the latest quarter in which they appeared. For instance, if a
given person occurred in all quarters, the latest information on their marital status was the entry
for the 4th quarter of 2016, and that record was selected. This procedure produced two
datasets: one with information from the National System for Monitoring Family Benefits (the
KSMŚR database) and one with information from the Central Register of Data on Beneficiaries
of the Alimony Fund (the FA database). Both contained only information on the marital status of
persons and the variables necessary to link the datasets. The number of persons in a given
quarter and year, and their percentage share in the total number of persons, are presented for
both sets in Table 3 and Table 4.
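The selection of unique persons from the quarterly files can be sketched as follows. This is a hedged illustration in Python rather than the project's own R code; the field names (`birth_date`, `teryt`, `year`, `quarter`, `marital_status` and so on) are assumptions, and the rule for handling identifiers that occur more than once in a quarter (the record is skipped for that quarter) is one possible reading of the description above.

```python
from collections import Counter

# Hypothetical field names for the components of the personal identifier.
KEY_FIELDS = ("birth_date", "teryt", "locality", "street", "building", "dwelling")


def person_key(row):
    """Personal identifier built from date of birth and full address."""
    return tuple(row[f] for f in KEY_FIELDS)


def latest_unique_records(rows):
    """Keep, for each identifier, the record from its latest quarter.

    rows: list of dicts with KEY_FIELDS plus 'year', 'quarter' and
    'marital_status'. Identifiers that occur more than once within the
    same quarter are treated as ambiguous for that quarter and skipped.
    """
    counts = Counter((person_key(r), r["year"], r["quarter"]) for r in rows)
    latest = {}
    for r in rows:
        key = person_key(r)
        period = (r["year"], r["quarter"])
        if counts[(key, *period)] != 1:
            continue  # not unique in this quarter
        best = latest.get(key)
        if best is None or period > (best["year"], best["quarter"]):
            latest[key] = r
    return latest
```

Comparing `(year, quarter)` tuples orders the periods chronologically, so a person present in all eight quarters ends up with the record for the 4th quarter of 2016, as in the example given in the text.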
The next step was the integration of data from the sets obtained as a result of the above
procedures - KSMŚR and FA databases - with the PESEL dataset. These datasets were linked
using a linkage key consisting of the following variables: date of birth, TERYT code of the
commune of residence, name of locality, street, building/dwelling number. However, before
the datasets could be integrated, these variables had to be standardised and harmonised to
remove inconsistencies in TERYT codes and names of localities between the datasets; this process
is described in Section 2.1.1.
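Exact linkage on a created key can be sketched as follows. This is an illustrative Python example under stated assumptions, not the project's implementation: the field names are hypothetical, the normalisation inside `make_key` (trimming and lowercasing) stands in for the fuller standardisation described in Section 2.1.1, and the decision to skip keys that match more than one PESEL record is an assumption made for the example.

```python
def make_key(row):
    """Linkage key: date of birth, TERYT code and address components."""
    return (
        row["birth_date"],
        row["teryt"],
        row["locality"].strip().lower(),
        row["street"].strip().lower(),
        str(row["building"]).strip(),
        str(row["dwelling"]).strip(),
    )


def link_exact(register_rows, pesel_rows):
    """Link register records to PESEL records that share an identical key.

    Returns a list of (register_row, pesel_row) pairs. A register record
    is linked only when its key matches exactly one PESEL record.
    """
    index = {}
    for p in pesel_rows:
        index.setdefault(make_key(p), []).append(p)
    links = []
    for r in register_rows:
        candidates = index.get(make_key(r), [])
        if len(candidates) == 1:
            links.append((r, candidates[0]))
    return links
```

Because the match must be exact on every component, any remaining difference in spelling or coding between the datasets prevents a link, which is why the variables have to be harmonised first.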
1.2.4. Method of integrating representative surveys
The following statistical datasets were made available, some of which were used during the first
stage of the project:
LFS – data for persons (2015–2016), data for dwellings (2015–2016),
HCS – data for persons (2015–2016), data for dwellings (2015–2016),
EU-SILC – data for persons (2015–2016), data for dwellings (a part of 2015 and 2016),
HBS – data for persons (2015–2016) data for dwellings (2016).
Some respondents in the EU-SILC dataset, who had participated in the survey multiple times,
may have had a different address of residence in 2016, compared to 2015. The EU-SILC
database contained the current address of residence of respondents updated for 2016. This
means that there was no full information about the history of residence for 2015.
The HBS dataset contained data for dwellings only for 2016 since, according to information
provided by the administrator, data were not recorded in electronic form in 2015. As a result,
the dataset had to be limited to persons surveyed in 2016.
In the case of the HCS survey, there was a problem with identifiers of dwellings. The dataset
containing addresses for one year included dwellings and households with non-unique
identifiers. For this reason, no attempt was made to integrate the HCS dataset with other
sources at this stage; only the LFS and EU-SILC datasets were used. Table 5 presents information
about the sample size for Wielkopolskie Province in each of the two surveys.
Table 5. Realised sample size in Wielkopolskie Province in LFS and EU-SILC (unique records)
| Year | Set | LFS | EU-SILC |
|---|---|---|---|
| 2015 | persons | 10 344 | 2 520 |
| 2015 | dwellings | 4 269 | 815 |
| 2016 | persons | 8 566 | 2 442 |
| 2016 | dwellings | 3 642 | 810 |
Because of missing identifiers in the representative surveys, it was necessary to identify
variables that could be used to link information with other sources. Such variables can be
divided into two groups – variables related to dwellings and those related to persons. Variables
in the first group indicate which dwellings were sampled; variables in the second one describe
person characteristics which were used for linkage.
The first set included (1) commune code, (2) street name, (3) building number and (4) dwelling
number. Street names were standardised by removing abbreviations such as "ul.", "Al." or "os."
and then converted to lowercase so that letter case does not affect the linkage
process. Variables relating to persons included (1) sex, (2) year of birth, (3) month of birth and
(4) day of birth. These data were also standardised, e.g. numbers of months were recorded as
integers instead of text strings (e.g. '01', '02').
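The standardisation of street names and months described above can be sketched as follows. This is a minimal Python illustration, not the project's R code; the function names are hypothetical, the prefix list contains only the three abbreviations named in the text, and collapsing repeated whitespace is an extra step assumed for the example.

```python
import re

# Only the abbreviations named in the text; a production list would
# probably be longer (assumption).
_PREFIX = re.compile(r"^\s*(?:ul\.|al\.|os\.)\s*", re.IGNORECASE)


def standardise_street(name):
    """Drop a leading 'ul.', 'al.' or 'os.' and convert to lowercase."""
    cleaned = _PREFIX.sub("", name)
    # Collapsing whitespace is an additional assumption, not from the report.
    return " ".join(cleaned.lower().split())


def standardise_month(value):
    """Record the month as an integer, so that '01' and 1 compare equal."""
    return int(value)
```

For example, `standardise_street("ul. Święty Marcin")` and `standardise_street("święty marcin")` both yield the same string, so the two spellings no longer prevent a match.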
The representative surveys required probabilistic linkage, which consists in pairwise comparison
of records from a larger set with those from a smaller set. To reduce the number of
combinations, the following information was used for blocking:
commune code,
name of locality,
building number,
dwelling number,
sex,
year of birth,
month of birth.
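Blocking on the variables listed above can be sketched as follows. This is an illustrative outline only: the field names are hypothetical and the report does not describe the software actually used. Only records that agree on every blocking variable are compared pairwise.

```python
from collections import defaultdict

# Hypothetical field names for the seven blocking variables listed in the text.
BLOCKING_FIELDS = (
    "commune_code", "locality", "building_no", "dwelling_no",
    "sex", "birth_year", "birth_month",
)


def blocking_key(record):
    """Build the tuple of blocking values for one record (a dict)."""
    return tuple(record[field] for field in BLOCKING_FIELDS)


def candidate_pairs(survey_records, register_records):
    """Yield (survey, register) pairs that share the same blocking key."""
    blocks = defaultdict(list)
    for reg in register_records:
        blocks[blocking_key(reg)].append(reg)
    for rec in survey_records:
        for reg in blocks.get(blocking_key(rec), []):
            yield rec, reg
```

Grouping the larger set by key first means that each survey record is compared only with the register records in its own block, rather than with the whole register.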
It then turned out that street names and days of birth may differ. In the case of street names,
the problem was solved by comparing distances between text strings using the Jaro-Winkler
function. The reason why it was necessary to conduct such comparisons was that some
street names had various versions, e.g. "powstańców wielkopolskich" and "powstańców wlkp.",
"św. marcin" and "święty marcin" or "księdza serafina opałki" and "ks. serefina opałki". It should
be noted that before the linkage procedure, street names were iteratively standardised, which
is why this variable is not included on the list of variables with possible differences.
Unfortunately, given the number of possible variants and time constraints, we were not able to
fully standardise addresses between the surveys and the PESEL register.
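For reference, a minimal pure-Python sketch of the Jaro-Winkler similarity is given below. It uses the commonly cited defaults (prefix scale 0.1, prefix length capped at 4), which are assumptions here; the report does not state which implementation or parameters were used in the project.

```python
def jaro(s1, s2):
    """Jaro similarity of two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for strings sharing a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)
```

Variants such as "św. marcin" and "święty marcin" share a prefix and most of their characters, so they receive a similarity strictly between 0 and 1 instead of being treated as a plain mismatch.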
Inconsistencies in the day of birth can be attributed to errors made by interviewers while
entering imprecise information for a given respondent or entering their own value if the
respondent could not indicate the exact date of birth (e.g. there were records where all persons
from the household had the same date of birth).
2. Data quality
2.1. Standardisation of data before integration
2.1.1. Standardisation of register data
Standardisation of TERYT
The PESEL dataset contains separate TERYT codes for urban and rural parts of urban-rural
communes and separate districts of the five largest Polish cities: Warsaw, Kraków, Łódź,
Wrocław and Poznań (Table 6). Standardisation of the TERYT code consisted in finding those
records for which the TERYT code contained "4" or "5" in the last (seventh) position and then
replacing this last digit with "3". For example, in the PESEL register the town of Czempiń, which
is the urban part of an urban-rural commune, has the code 3011024, while the rural part of the
commune has the code 3011025. After standardisation, both territorial units receive the same
general code, which denotes an urban-rural commune, namely 3011023. Codes for districts of
Warsaw, Kraków, Łódź, Wrocław and Poznań were changed in a similar way: the first four digits
of the code remained unchanged, while the subsequent three were replaced with "011". In the case of Poznań, the
TERYT code was changed as follows: Poznań consists of five districts: Poznań-Grunwald
(3064029), Poznań-Jeżyce (3064039), Poznań-Nowe Miasto (3064049), Poznań-Stare Miasto
(3064059), Poznań-Wilda (3064069). The first four digits were retained ("3064"), while the three
subsequent ones were replaced with "011"; in this way all districts of the city of Poznań
received the general code for this city, i.e. 3064011. The same procedure was applied for the
other four cities.
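The two rules described above can be sketched as a single function. The function name is hypothetical, and the set of city prefixes is derived from the standardised codes quoted later in this section (0264011, 1061011, 1261011, 1465011, 3064011).

```python
# First four digits of the TERYT codes of the five cities whose districts
# carry separate codes in the PESEL register.
CITY_PREFIXES = {"0264", "1061", "1261", "1465", "3064"}


def standardise_teryt(code):
    """Map a seven-digit TERYT commune code to its general form."""
    if len(code) != 7 or not code.isdigit():
        raise ValueError("expected a seven-digit TERYT code: %r" % (code,))
    # City districts: keep the first four digits, replace the rest with "011".
    if code[:4] in CITY_PREFIXES:
        return code[:4] + "011"
    # Urban ("4") or rural ("5") part of an urban-rural commune: use "3".
    if code[-1] in ("4", "5"):
        return code[:-1] + "3"
    return code
```

Codes that fall under neither rule, such as 2261011 (Gdańsk), are returned unchanged.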
TERYT standardisation in the dataset consisting of data from the Central Register of Data on
Beneficiaries of the Alimony Fund (the FA database - Table 7) and from the National System for
Monitoring of Family Benefits (the KSMŚR database, Table 8) involved the creation of a TERYT
commune identifier by combining two digits of the NTS code starting from the second position
and five digits starting from the sixth position (located in columns nts_kod_QUIC and
nts_kod_ALIM). For example, the NTS code 5306264011PP yields the TERYT code 3064011.
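The construction of the TERYT identifier from the NTS code amounts to two string slices. The sketch below uses a hypothetical function name and the NTS values shown in Table 7 and Table 8 as examples.

```python
def nts_to_teryt(nts_code):
    """Build a TERYT commune code from an NTS code.

    Takes two characters starting at the second position and five
    characters starting at the sixth position (positions counted from 1).
    """
    if len(nts_code) < 10:
        raise ValueError("NTS code too short: %r" % (nts_code,))
    return nts_code[1:3] + nts_code[5:10]
```

Applied to 5126261011PP (Kraków) this gives 1261011, which agrees with the standardised TERYT code used for the PESEL register.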
Table 6. The TERYT commune code and the name of the locality in the PESEL register
adr_meld_kod_gmn_PESEL adr_meld_naz_msc_PESEL
2261011 GDAŃSK
2605085 WÓLKA ZYCHOWA
1213072 POLANKA WIELKA
2465011 DĄBROWA GÓRNICZA
1437015 KOCEWO
2207011 KWIDZYN
1425011 PIONKI
1061069 ŁÓDŹ-WIDZEW
3064029 POZNAŃ-GRUNWALD
1061069 ŁÓDŹ-WIDZEW
2404042 WANATY
1009014 DZIAŁOSZYN
3211044 POLICE
1219022 WINIARY
0219084 ŻARÓW
0264029 WROCŁAW-FABRYCZNA
1816025 BŁAŻOWA GÓRNA
Standardisation of names of localities
Names of localities recorded in the PESEL dataset for Warsaw, Krakow, Łódź, Wrocław and
Poznań include the names of city districts, e.g. Poznań-Grunwald. For this reason, to match the
standardised TERYT codes (0264011, 1061011, 1261011, 1465011, 3064011), the corresponding
names of localities were changed accordingly: Wrocław, Łódź, Krakow, Warsaw, Poznań. In the
names of variables used in the linkage key, spaces were removed and capital letters were
changed to lowercase.
Table 7. Standardised locality names and TERYT commune codes in the Central Register of Data on Beneficiaries of the Alimony Fund (the FA database)
adr_naz_msc_ALIM nts_kod_ALIM
KRAKOW 5126261011PP
ŁÓDŹ 5106261011PP
POZNAŃ 5306264011PP
WARSAW 5146265011PP
WROCŁAW 5026264011PP
Table 8. Standardised locality names and TERYT commune codes in the National System for Monitoring Family Benefits (the KSMŚR database)
adr_naz_msc_QUIC nts_kod_QUIC
KRAKOW 5126261011PP
ŁÓDŹ 5106261011PP
POZNAŃ 5306264011PP
WARSAW 5146265011PP
WROCŁAW 5026264011PP
The operations described above were performed for the PESEL dataset, the dataset containing
information from the National System for Monitoring of Family Benefits (the KSMŚR database)
and the dataset with information from the Central Register of Data on Beneficiaries of the
Alimony Fund (the FA database).
The variables (TERYT code and locality name) in these datasets had to be standardised to a
common format in order to enable the integration of these datasets.
Without these three operations, any attempt to link the PESEL dataset with the other two
would have resulted in a substantial loss of information and would have made it impossible to
fully exploit the potential of these datasets.
2.1.2. Standardisation of data in representative surveys
Standardisation of street and locality names
As in the case of administrative registers, it was also necessary to standardise TERYT codes,
names of streets and localities in datasets with survey data.
Names of localities were standardised first to match the names included in the PESEL
register. Before standardising street names, all abbreviations such as ul., Al. or
os. were removed to match the format used in the PESEL register. Data in both sets were standardised and
converted to lowercase, unnecessary spaces were removed, and NA values were entered where
the street name was missing. In the course of this procedure, more than ten records were
identified that existed only in the representative surveys and were not observed in the PESEL
register. Table 9 presents examples of localities that existed only in the LFS or the EU-SILC. In
the table, occurrence is denoted by YES, while the number in brackets indicates the number
of records associated with that locality. The table is the result of, among other things, analysing
whether the TERYT code matches the locality or the street name from the LFS and the EU-SILC,
in comparison with the PESEL register. Problematic communes and localities were excluded
from further analysis.
Table 9. List of localities observed only in the surveys (different TERYT code)
TERYT Locality LFS EU-SILC
3003105 piłka YES (1)
3003105 popielarze YES (1)
3006012 panienka YES (13)
3007085 szulec YES (8)
3010015 kolebki YES (5)
3010062 brzeźno-parcele YES (4)
3020032 czarnuszka YES (6)
3023052 dolina YES (5)
3027075 józinki YES (3)
3030012 janowo YES (2)
Coding of the marital status
All the above surveys contained information on the legal marital status, which had been elicited
from respondents using questions with the same wording (no measurement error). On the
other hand, the answer options in the LFS and EU-SILC (and other surveys not considered in this
report) differed. Table 10 presents the coding of the marital status in the LFS and the EU-SILC. It
should be noted that the EU-SILC distinguishes between divorced persons and those in a state of
(legal) separation. Nonetheless, codes for levels (in the first column) were different and needed
to be standardised. Table 11 presents sample sizes in the LFS and EU-SILC for codes defined in
Table 10 (before standardisation).
Table 10. Coding of legal marital status in LFS and EU-SILC
Code | LFS | EU-SILC
1 | single man, single woman | single man/woman
2 | married man, married woman | married man/married woman
3 | widower, widow | in legal separation
4 | divorced, in separation | widower/widow
5 | – | divorced man/woman
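The recoding implied by Table 10 can be sketched as two lookup tables. The target labels below are an assumption: they follow the four levels reported later in Table 23 ("single", "married", "divorced / in separation", "widowed"), and the report does not state the exact common coding scheme that was adopted.

```python
# Assumed common labels, taken from the levels shown in Table 23.
LFS_TO_COMMON = {
    1: "single",
    2: "married",
    3: "widowed",
    4: "divorced / in separation",
}

EUSILC_TO_COMMON = {
    1: "single",
    2: "married",
    3: "divorced / in separation",  # in legal separation
    4: "widowed",
    5: "divorced / in separation",  # divorced
}

MAPPINGS = {"LFS": LFS_TO_COMMON, "EU-SILC": EUSILC_TO_COMMON}


def harmonise_marital_status(survey, code):
    """Translate a survey-specific marital status code to a common label."""
    try:
        return MAPPINGS[survey][code]
    except KeyError:
        raise ValueError("unknown survey/code: %r, %r" % (survey, code))
```

Note that code 3 means "widower, widow" in the LFS but "in legal separation" in the EU-SILC, which is why the raw codes cannot be pooled directly.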
Table 11. Size of LFS and EU-SILC sample for Wielkopolskie province by sex and marital status (combined unweighted samples, before standardisation)
Year Marital Status Men Men (%) Women Women (%)
LFS
2015 1 2434 28.63 1913 19.81
2 5590 65.74 5697 58.98
3 296 3.46 1664 17.23
4 185 2.18 385 3.99
2016 1 2050 29.23 1595 20.40
2 4524 64.50 4581 58.59
3 262 3.74 1283 16.41
4 178 2.54 360 4.60
EU-SILC
2015 1 300 31.15 288 20.63
2 605 62.82 609 55.11
3 1 0.10 2 0.18
4 26 2.70 211 19.10
5 31 3.22 55 4.98
2016 1 276 29.21 197 18.67
2 608 64.34 609 57.73
3 1 0.11 3 0.28
4 31 3.28 192 18.20
5 29 3.07 54 5.12
It should be noted that the EU-SILC sample for Wielkopolskie province included only 3–4
respondents declaring legal separation. For this reason, results for this level should be
interpreted with caution.
There was a similar problem with respect to the variable defining the degree of kinship with the
household head. Even the question respondents were asked was different. In the LFS, it read:
"The degree of kinship with the head of the household" while in EU-SILC: "The degree of kinship
or relationship with the head of the household".
Answers in the LFS were more aggregated compared to those in the EU-SILC. For example
"grandfather, grandmother, granddaughter, grandson, great-grandson, great-granddaughter",
versus "grandfather, grandmother" / "granddaughter, grandson", or "uncle, aunt, further
relative" versus "other relative". Nonetheless, the level which was of most interest to us in the
context of the actual marital status, i.e. “partner”, was coded in the same way in both surveys.
Table 12. Coding of the degree of kinship with the head of the household in LFS and EU-SILC
Code | LFS | EU-SILC
1 | head of the household | head
2 | husband, wife | husband, wife
3 | partner | partner
4 | son, daughter | son, daughter
5 | son-in-law, daughter-in-law | father, mother
6 | father, mother, father-in-law, mother-in-law | father-in-law, mother-in-law
7 | grandfather, grandmother, granddaughter, grandson, great-grandson, great-granddaughter | grandfather, grandmother
8 | brother, sister | son-in-law, daughter-in-law
9 | uncle, aunt, further relative | brother, sister
10 | not related family member (e.g. home help) | granddaughter, grandson
11 | – | other relative
12 | – | other person
Table 13 presents information about the sample size broken down by survey, year and sex,
including information on whether or not the respondent is in a cohabiting union with the head of
the household. The number of such respondents in both surveys is very small and accounts for
no more than about 2% of all respondents surveyed in the entire province.
Table 13. The size of LFS and EU-SILC samples for Wielkopolskie province by sex and information on whether the respondent’s partner is the household head (combined unweighted samples)
Survey | Year | Partner? | Men | Men (%) | Women | Women (%)
LFS | 2015 | NO | 8446 | 99.33 | 9548 | 98.85
LFS | 2015 | YES | 57 | 0.67 | 111 | 1.15
LFS | 2016 | NO | 6953 | 99.13 | 7666 | 98.04
LFS | 2016 | YES | 61 | 0.87 | 153 | 1.96
EU-SILC | 2015 | NO | 946 | 98.23 | 1088 | 98.46
EU-SILC | 2015 | YES | 17 | 1.77 | 17 | 1.54
EU-SILC | 2016 | NO | 927 | 98.10 | 1037 | 98.29
EU-SILC | 2016 | YES | 18 | 1.90 | 18 | 1.71
2.2. Quality of administrative registers
2.2.1. Marital status register and information systems on divorce and separation rulings
When analysing the collected datasets, one cannot ignore the problem of data quality in the
context of operations that are to be carried out on them. The quality of input data contained in
all the datasets affects the quantity and quality of information obtained in the output. For
datasets where the PESEL number is the linkage key, the most important requirement is that the
number of missing values for this variable should be as small as possible. In addition, it is
necessary to check its format to make sure that it consists of eleven digits. Only records that
meet these two requirements can be used in further analysis. This verification was conducted
for datasets with information on marriages, divorces, separations, births and deaths, collected
in the period 2010–2016. The data were checked for missing PESEL values and for values not
containing eleven digits. The tables below show the completeness of the PESEL
variable in the marital status register and information systems on divorce and separation rulings
(Table 14, Table 15, Table 16, Table 17, Table 18).
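The two checks described above (missing values and the eleven-digit format) can be sketched as follows. The function names and the shape of the returned summary are assumptions for illustration; the sketch checks the format only and does not validate the PESEL check digit, which the report does not mention.

```python
import re

PESEL_PATTERN = re.compile(r"[0-9]{11}")


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def has_pesel_format(value):
    """True if the value consists of exactly eleven digits."""
    return PESEL_PATTERN.fullmatch(str(value).strip()) is not None


def pesel_completeness(values):
    """Summarise missing, malformed and usable PESEL values."""
    total = len(values)
    missing = sum(1 for v in values if is_missing(v))
    malformed = sum(
        1 for v in values if not is_missing(v) and not has_pesel_format(v)
    )
    usable = total - missing - malformed
    pct_missing = 100.0 * missing / total if total else 0.0
    return {
        "total": total,
        "missing": missing,
        "malformed": malformed,
        "usable": usable,
        "pct_missing": pct_missing,
    }
```

Only the records counted as usable would be passed on to linkage by PESEL number.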
Table 14. Completeness of PESEL variable in the datasets on marriages in the period 2010–2016 (in %)
Item Year PESEL Number of records %
1 2010 missing 8 705 1.91
2 2010 present 447 969 98.09
𝛴 456 674 100.00
3 2011 missing 8 032 1.95
4 2011 present 404 910 98.05
𝛴 412 942 100.00
5 2012 missing 3 980 0.98
6 2012 present 403 720 99.02
𝛴 407 700 100.00
7 2013 missing 3 981 1.10
8 2013 present 356 811 98.90
𝛴 360 792 100.00
9 2014 missing 4 015 0.14
10 2014 present 372 961 99.86
𝛴 376 976 100.00
11 2015 missing 3 102 0.82
12 2015 present 374 562 99.18
𝛴 377 664 100.00
13 2016 missing 3 184 0.82
14 2016 present 383 726 99.18
𝛴 386 910 100.00
Table 15. Completeness of PESEL variable in dataset on divorces in 2016 (in %)
Item Year PESEL Number of records %
1 2016 missing 86 869 68.40
2 2016 present 40 125 31.60
𝛴 126 994 100.00
Table 16. Completeness of PESEL variable in the dataset on separations in 2016 (in %)
Item Year PESEL Number of records %
1 2016 missing 2 389 64.92
2 2016 present 1 291 35.08
𝛴 3 680 100.00
Table 17. Completeness of PESEL variable in datasets on births in the period 2010–2016 (in %)
Item Year PESEL Number of records %
1 2010 missing 27 474 3.32
2 2010 present 799 126 96.68
𝛴 826 600 100.00
3 2011 missing 23 829 3.07
4 2011 present 753 003 96.93
𝛴 776 832 100.00
5 2012 missing 18 689 2.42
6 2012 present 753 825 97.58
𝛴 772 514 100.00
7 2013 missing 17 246 2.33
8 2013 present 721 906 97.67
𝛴 739 152 100.00
9 2014 missing 16 781 2.24
10 2014 present 733 539 97.76
𝛴 750 320 100.00
11 2015 missing 17 781 2.41
12 2015 present 720 835 97.59
𝛴 738 616 100.00
13 2016 missing 15 408 2.02
14 2016 present 749 106 97.98
𝛴 764 514 100.00
Table 18. Completeness of widower’s/widow’s PESEL variable in dataset on deaths in 2016 (in %)
Item Year PESEL Number of records %
1 2016 missing 3 311 2.21
2 2016 present 146 427 97.79
𝛴 149 738 100.00
Analysis of the quality of the datasets, particularly as regards the variables essential for
linking, indicates that there are missing values in the PESEL variable, which serves as the
identifier enabling the linkage with the PESEL register. However, the number of missing
values in the analysed datasets varies. In the sets on marriages and births, the number of
missing values in the PESEL variable is small and does not exceed 4% of all observations
in each set in a given year. The situation is much worse in the sets containing data about
separations, divorces and deaths. In the case of the first two sets, out of the seven reference
years, only the sets for 2016 contain PESEL numbers, whereas in the case of the third set
(deaths) this variable is available for the years 2015–2016. This means that not all the sets
can be used in further work.
2.2.2. The National System for Monitoring Family Benefits and the Central Register of Data on Beneficiaries of the Alimony Fund
Analysis of data quality was also conducted for datasets obtained from the National System for
Monitoring of Family Benefits and the Central Register of Data on Beneficiaries of the Alimony
Fund. The analysis focused on the key variables used for linking data with the PESEL dataset, i.e.
the TERYT code of the commune of residence, locality name, street name, building/dwelling
number. In the case of deterministic linkage, involving the use of a specific linkage key, values
used for the purpose of linking must contain exactly the same sequence of characters in order
for records to be successfully matched. Even the slightest deviations will prevent linkage, which, in
turn, will cause a loss of some information in the output. For this reason, the content of
variables which are used as the linkage key between the registers has to be analysed to identify
potential errors and then, if possible, correct them using available methods and tools. Formats
of the TERYT code and locality names in the different registers are presented in Section 2.1.1 in
Table 7 and Table 8.
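Deterministic linkage on an address key can be sketched as an exact join on a concatenated string. The field names and the separator are hypothetical; the sketch only illustrates why a single differing character prevents a match.

```python
# Hypothetical field names for the address variables used as the linkage key.
KEY_FIELDS = ("teryt", "locality", "street", "building_no", "dwelling_no")


def linkage_key(record):
    """Concatenate the key variables into one comparable string."""
    parts = (record.get(field) for field in KEY_FIELDS)
    return "|".join("" if p is None else str(p).strip().lower() for p in parts)


def deterministic_link(left, right):
    """Return (linked pairs, unlinked left records) using exact key equality."""
    index = {}
    for rec in right:
        index.setdefault(linkage_key(rec), []).append(rec)
    linked, unlinked = [], []
    for rec in left:
        matches = index.get(linkage_key(rec))
        if matches:
            linked.extend((rec, m) for m in matches)
        else:
            unlinked.append(rec)
    return linked, unlinked
```

Records left in the unlinked list are the ones whose key variables would need to be inspected and, where possible, corrected before another linkage attempt.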
Another aspect analysed in detail was information on the marital status of persons included in
the National System for Monitoring Family Benefits and the Central Register of Data on
Beneficiaries of the Alimony Fund. The marital status of the applicant can be verified using the
auxiliary variable describing the degree of kinship with a person applying for the benefit
defined as spouse - parent/guardian of the child. After comparing records of the marital status
variable with information about the marital status based on the degree of kinship, we found
cases where the same person had different marital statuses. Such records were classified as
incorrect and excluded from the integration procedure.
3. Quality assessment of the integrated data sets
3.1. Quality of survey data integration
3.1.1. Quality of the linkage between the PESEL register and the surveys
Table 19 contains a summary of information concerning the number of records which were
linked with the PESEL register. As indicated in the first chapter, probabilistic linkage was used,
which was based on the assumption that the street name could be incorrect or the respondent
could have provided an incorrect birth date.
In the process of probabilistic linkage, each respondent is assigned a linkage weight¹ from the
interval [0,1], which defines the likelihood that two records refer to the same person. The
threshold value adopted in the project was equal to 0.88, and was determined on the basis of
the analysis of linked records.
Table 19 contains two types of linkage, denoted A and B. The first type refers to respondents
who were assigned the weight of 1, which represents a perfect match (all variables match). The
second type of linkage refers to situations where the weight was less than 1, which means that
the likelihood of two records referring to the same person is less than 1.
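The classification into linkage types can be sketched as below. Treating the threshold as inclusive is an assumption, although it is consistent with the minimum weight of 0.88 reported for linked records; the function name is hypothetical.

```python
THRESHOLD = 0.88  # threshold adopted in the project


def linkage_type(weight, threshold=THRESHOLD):
    """Return 'A' for a perfect match, 'B' for an accepted uncertain match,
    or None when the weight falls below the threshold (record not linked)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("linkage weight must lie in [0, 1]")
    if weight == 1.0:
        return "A"
    if weight >= threshold:
        return "B"
    return None
```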
In the case of EU-SILC, the percentage of linked records where all variables matched was very
high and amounted to 88% in 2015 and 70% in 2016, compared to 55.6% in 2015 and 58% in
2016 for the LFS. A possible cause of this difference is that households surveyed in 2015 did not
change their address, which was recorded for households in 2016.
Table 19. Results of probabilistic linkage of the PESEL register with LFS and EU-SILC
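Survey | Year | Type of linkage | Records linked | Sample
LFS | 2015 | A | 2 600 | 4 670
LFS | 2015 | B | 356 | 4 670
LFS | 2016 | A | 4 762 | 8 135
LFS | 2016 | B | 573 | 8 135
EU-SILC | 2015 | A | 511 | 704
EU-SILC | 2015 | B | 19 | 704
EU-SILC | 2016 | A | 1 350 | 2 016
EU-SILC | 2016 | B | 69 | 2 016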
After including records with linkage weights of less than 1, we managed to link another 7% of
respondents from the LFS in both years; in the case of the EU-SILC, additional matches
accounted for only 2.5% in 2015 and 3.4% in 2016. Generally, the number of uncertain records,
i.e. those with a weight in the interval [0,1), was 1,017, compared to 9,223 with the
weight equal to 1.
¹ Note: this weight is not the same as the sampling weight or the final weight used to estimate characteristics of the target population.
3.1.2. Detailed analysis of linkage quality
Table 20 presents descriptive statistics for the linkage weight, reflecting the integration error,
without a distinction between the surveys. In keeping with the specified threshold, the lowest
weight value was 0.88, while the first decile was equal to 0.97, which means that 10% of
observations were assigned weights lower than 0.97. This indicates the presence of a linkage
error, which should be taken into account during the estimation. The smaller the linkage weight
is, the greater the uncertainty of results obtained on the basis of linked records.
Table 20. Descriptive statistics of linkage weights of respondents
Min Decile 1 Average Median Decile 9 Max Standard deviation
0.88 0.97 0.99 1.00 1.00 1.00 0.03
In the following steps, we verified hypotheses about the relationship between the uncertainty
of record linkage and other variables used for linkage. The purpose was to determine the causes
of uncertainty so that they could be controlled.
Table 21 contains information about the percentage of linkage weights that were less than 1 for
Wielkopolskie province by sex. A weight of less than 1 implies that some records may have
been linked incorrectly, as a result of the wrong information about the respondent's date of
birth. The percentage of uncertain records was slightly higher for women (10.51%) than for men
(9.18%).
We used the χ² test with continuity correction for 2 × 2 tables in order to verify the null
hypothesis (H₀) that there is no association between a linkage weight of less than 1 and
sex. The p-value of the test is smaller than the significance level (0.05), which suggests
that there is a relationship between these variables (χ² statistic = 4.8526, df = 1, p-value =
0.02761). However, this result should be treated with caution, since there may be other
variables which correlate with one particular sex (e.g. age) and contribute to the rejection of the
null hypothesis. Nonetheless, this difference is worth noting, since it may be important from the
perspective of further analysis.
Table 21. Distribution of linkage weights by sex in Wielkopolskie province
Sex Linkage weight less than 1? %
Men No 90.82
Yes 9.18
Women No 89.49
Yes 10.51
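The test used above can be sketched in pure Python. The function names are hypothetical and the report does not publish the underlying cell counts, so only the reported test statistic can be checked here. For one degree of freedom, the upper-tail probability of the χ² distribution equals erfc(√(x/2)).

```python
import math


def yates_chi2(a, b, c, d):
    """Chi-squared statistic with Yates continuity correction
    for the 2 x 2 table [[a, b], [c, d]] of counts."""
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / denominator


def chi2_p_value_df1(statistic):
    """Upper-tail p-value of the chi-squared distribution with df = 1."""
    return math.erfc(math.sqrt(statistic / 2))
```

Evaluating `chi2_p_value_df1(4.8526)` gives a value of about 0.0276, in line with the p-value reported in the text.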
In the next step we analysed the variation in weights for age groups: [15,25), [25,35), [35,45),
[45,55), [55,65) and 65+. Results are presented in Table 22. The largest percentage of weights lower
than 1 is observed for the group [25,35) and for the oldest groups of respondents:
[55,65) and 65+. In the case of the oldest groups, this can be explained by older
respondents’ reluctance to provide exact information or by their memory problems. Once again,
the χ² test was performed to check if the variables are independent. The test statistic and the
p-value indicate that the null hypothesis H₀ should be rejected, which implies that differences
between groups are significant (χ² = 26.858, df = 5, p-value < 0.0001). This means that, most
probably, the uncertainty of record linkage is related to the age of respondents.
Table 22. Distribution of linkage weights by age in Wielkopolskie province
Age group Linkage weight less than 1? %
[15,25) No 92.01
[15,25) Yes 7.99
[25,35) No 89.99
[25,35) Yes 10.01
[35,45) No 90.32
[35,45) Yes 9.68
[45,55) No 92.40
[45,55) Yes 7.60
[55,65) No 88.02
[55,65) Yes 11.98
65+ No 89.40
65+ Yes 10.60
Table 23. Distribution of linkage weights by marital status in Wielkopolskie province
Legal marital status Linkage weight less than 1? %
Single No 90.83
Yes 9.17
Married No 90.00
Yes 10.00
Divorced / in separation No 89.72
Yes 10.28
Widowed No 88.24
Yes 11.76
The share of weights lower than 1 differs slightly depending on the level of the legal marital
status variable. These fractions are presented in Table 23. As can be seen, the largest
percentage of such weights is found in the groups of respondents classified as “divorced / in
separation” and “widowed” (this pattern is also correlated with age). It means that in the case
of these groups, the uncertainty arising from (survey) sampling increases and should be taken
into account when determining the percentage of persons with a given marital status (on the
basis of integrated data sources). After testing this contingency table with the χ² test, no
association between the percentage of weights smaller than 1 and the marital status was found
(χ² = 3.1034, df = 3, p-value = 0.376). This may suggest that the linkage error is non-informative,
i.e. it does not depend on the variable of interest but on other variables that
can be controlled, for example age.
During the last step of data analysis, we evaluated the spatial distribution of linkage weights
smaller than 1 across districts (LAU 1 level) of Wielkopolskie province. Results of this analysis are
presented in Figure 2. Detailed information is shown in Table 24, which contains percentages,
together with the number of records with linkage weights smaller than 1. The higher the
percentage is, the greater the uncertainty associated with the use of those linked records for
the purposes of estimation. The highest percentage was recorded for the following districts: the
city of Poznań (3064), poznański (3021), gnieźnieński (3003), turecki (3027) and wągrowiecki
(3028).
A chi-squared test was conducted to check whether there was a statistical association between
the district variable and the linkage weight lower than 1. It was found that such an association
does exist (χ² = 914.64, p-value = 0.0005; 2,000 Monte Carlo iterations to estimate the p-value).
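A Monte Carlo p-value for the chi-squared statistic can be approximated by permuting one of the two variables, which keeps both margins fixed. The sketch below is one possible approach and is not necessarily the procedure used in the project; the function names and the (1 + count) / (iterations + 1) estimator are assumptions. With that estimator and 2,000 iterations, the smallest attainable p-value is 1/2001, i.e. approximately 0.0005.

```python
import random
from collections import Counter


def chi2_statistic(rows, cols):
    """Pearson chi-squared statistic for two equally long label sequences."""
    n = len(rows)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    observed = Counter(zip(rows, cols))
    statistic = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            statistic += (observed[(r, c)] - expected) ** 2 / expected
    return statistic


def monte_carlo_p_value(rows, cols, iterations=2000, seed=0):
    """Estimate the p-value by shuffling the column labels."""
    rng = random.Random(seed)
    observed_stat = chi2_statistic(rows, cols)
    shuffled = list(cols)
    at_least_as_large = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if chi2_statistic(rows, shuffled) >= observed_stat:
            at_least_as_large += 1
    return (1 + at_least_as_large) / (iterations + 1)
```

Here `rows` would hold the district code of each linked record and `cols` an indicator of whether its linkage weight is below 1.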
Figure 2. Distribution of linkage weights less than 1 for districts of Wielkopolskie province
Table 24. Percentage and number of records with linkage weights of less than 1 for districts of Wielkopolskie province
District code Number % District code Number %
3064 332 25.13 3002 9 3.40
3021 110 21.91 3005 9 6.16
3003 77 24.44 3014 6 5.94
3061 56 15.64 3006 5 1.50
3027 45 25.71 3020 5 1.90
3062 33 10.78 3030 5 1.36
3011 32 13.56 3008 4 1.57
3004 30 8.50 3029 4 1.45
3019 28 6.02 3031 4 2.33
3063 27 12.44 3015 3 1.29
3028 24 21.05 3007 2 0.86
3009 20 6.54 3012 2 0.81
3016 19 14.07 3013 2 1.20
3017 19 3.53 3001 1 0.84
3022 19 8.80 3018 1 0.62
3024 19 6.86
3023 18 12.50
3025 14 9.33
3026 14 6.57
3010 10 1.98
4. Estimation of the actual marital status using the integrated sets
4.1. Determination of the actual marital status
4.1.1. Description of the algorithm
The actual marital status was determined on the basis of 9 sources², taking into account the
date of the last update or the interview date (in the case of surveys). Table 25 presents
information about the number of sources used and the number of source combinations, which
could be used to obtain information about the marital status. For example, if the number of
sources is 1, it means that the actual marital status could be determined on the basis of only
one source. If the number of sources was 2, the marital status was determined on the basis of
two sources, and there were 18 pairs of sources (e.g. PESEL + marriages, PESEL + deaths, etc.).
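One way to take the update date into account is to use the status from the most recently updated source. This rule is an assumption made for illustration: the report states only that the date of the last update or interview was taken into account, not how conflicts between sources were resolved. The function name and the tuple layout are hypothetical.

```python
from datetime import date


def latest_status(observations):
    """Pick the marital status from the most recently dated source.

    observations: iterable of (source, status, reference_date) tuples,
    where status is None if the source holds no marital status.
    Returns None when no source provides a status.
    """
    known = [obs for obs in observations if obs[1] is not None]
    if not known:
        return None
    return max(known, key=lambda obs: obs[2])[1]
```

For example, a person recorded as married in PESEL in 2012 but appearing in the divorces dataset in 2016 would be assigned the status "divorced" under this rule.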
Table 25. Number of sources used to determine the actual marital status
Number of sources | Combinations of sources
1 | 9
2 | 18
3 | 29
4 | 29
5 | 15
6 | 2
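A tabulation of this kind can be derived by counting distinct source combinations by their size. The sketch below assumes that, for each person, the set of sources in which a marital status was observed is already available; the function name and the source labels in the example are hypothetical.

```python
from collections import Counter


def combinations_by_size(source_sets):
    """Count distinct combinations of sources, grouped by number of sources.

    source_sets: iterable of sets, one per person, listing the sources in
    which that person's marital status was observed.
    """
    distinct = {frozenset(sources) for sources in source_sets}
    counts = Counter(len(combo) for combo in distinct)
    return dict(sorted(counts.items()))
```

An empty set corresponds to persons for whom no source provides a marital status and is counted under zero sources.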
Table 32 contains detailed information about the sources used. Selected data are presented in
Table 26. The column names represent: No = combination number, Mrg = marriages, Dvc =
divorces, Sep = separations, Bth = births, Dth = deaths, FA = Alimony Fund, SB = social benefits,
Srv = LFS and EU-SILC, N = domain size. 1 denotes that the actual marital status was observed in a given
source (regardless of its level), and empty fields denote the lack of marital status in a given
source (it was undetermined). For example, with respect to combination No. 1, for 2,473,131
persons the marital status was established only on the basis of the PESEL register (71.8% of all
observations). In the case of combination No. 6, all columns contain empty fields. This means
that it was not possible to determine the legal marital status for 77,464 persons.
² The list of sources included: PESEL, marriages, divorces, separations, births, deaths, the Alimony Fund, social benefits, combined LFS
and EU-SILC surveys. These are the datasets indicated in the diagram of data linkage presented in Figure [linkage-scheme].
Table 26. Information about the coexistence of marital status information in the data sources
No PESEL Mrg Dvc Sep Bth Dth FA SB Srv N
1 1 2 817 123
2 1 1 346 453
3 1 1 139 716
4 1 1 99 728
5 1 1 1 95 986
6 77 464
7 1 1 1 63 408
8 1 1 1 1 47 828
9 1 1 1 28 320
10 1 1 16 703
...
Persons aged 15 to 21 years were classified as "single". This choice is the consequence of Polish
law, Article 10 § 1 of the Family and Guardianship Code: “No person under the age of
eighteen may enter into marriage. However, for important reasons a family court may permit a
sixteen-year-old woman to marry, when it is apparent that it will be consistent with the good of
the formed family.” Additionally, after internal consultations between project members, a
decision was made to extend this limit up to the age of 21.
Table 27. Updated legal marital status obtained from the PESEL register (rows), and obtained after updating PESEL with information from other sources (columns)
PESEL/Int. | single man | single woman | married man | married woman | divorced man | divorced woman | widower | widow | undetermined | ∑
single (man) | 414 684 | - | 1 774 | - | 74 | - | 31 | - | - | 416 563
single (woman) | 4 | 330 831 | - | 1 192 | - | 45 | - | 71 | - | 332 143
married man | 1 627 | - | 777 641 | - | 293 | - | 149 | - | - | 779 710
married woman | - | 6 198 | - | 804 860 | - | 790 | - | 583 | - | 812 431
divorced man | 1 065 | - | 1 368 | 2 | 63 083 | - | 45 | - | - | 65 563
divorced woman | - | 1 117 | - | 1 998 | - | 93 037 | - | 373 | - | 96 525
widower | 98 | - | 425 | - | 7 | - | 37 930 | - | - | 38 460
widow | - | 333 | - | 2 232 | - | 36 | - | 192 549 + 3 | - | 195 153
undetermined | 10 969 | 10 292 | 551 | 637 | 12 | 61 | 23 | 146 | 57 884 | 80 575
∑ | 428 270 | 348 717 | 781 598 | 810 833 | 63 441 | 93 949 | 38 172 | 193 722 | 57 884 | 2 817 123
Table 28. Updated actual and legal marital status
Actual / Legal | single (man) | single (woman) | married man | married woman | divorced man | divorced woman | widower | widow | undetermined
single (man) | 99.37 | - | - | - | - | - | - | - | -
single (woman) | - | 98.86 | - | - | - | - | - | - | -
married man | - | - | 99.87 | - | - | - | - | - | -
married woman | - | - | - | 99.80 | - | - | - | - | -
divorced man | - | - | - | - | 100 | - | - | - | -
divorced woman | - | - | - | - | - | 100 | - | - | -
widower | - | - | - | - | - | - | 100 | - | -
widow | - | - | - | - | - | - | - | 100 | -
male partner | 0.63 | - | - | - | - | - | - | - | -
female partner | - | 1.14 | - | - | - | - | - | - | -
in separation | - | - | 0.13 | 0.20 | - | - | - | - | -
undetermined | - | - | - | - | - | - | - | - | 100
Table 27 contains information on the updated legal marital status for the entire population of
persons in the PESEL register after it was integrated with the other sources of information. The
rows refer to the marital status according to PESEL, and the columns to the marital status
established after updating PESEL with the other sources. By integrating the register and survey data, it was
possible to determine (impute) the marital status for 22,691 persons who had an undetermined
status in the PESEL register. Table 28 presents information on the actual and legal marital
status. The integrated sources provided information about any changes in the legal marital
status from "single man/woman" to "male/female partner", and from "married man/woman" to
"in separation".
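The number of imputed statuses can be checked against the "undetermined" row of Table 27. The following sketch is a hedged illustration rather than the project's procedure: the counts are copied from that row, while the representation of the cross-tabulation as a Counter keyed by (status in PESEL, status after integration) and the function name are assumptions made for the example.

```python
from collections import Counter

UNDETERMINED = "undetermined"


def imputed_count(table):
    """Persons undetermined in PESEL whose status is known after integration.

    `table` maps (status in PESEL, status after integration) to a count.
    """
    return sum(
        n
        for (before, after), n in table.items()
        if before == UNDETERMINED and after != UNDETERMINED
    )


# "Undetermined" row of Table 27: status after integration -> count.
undetermined_row = {
    "single man": 10969,
    "single woman": 10292,
    "married man": 551,
    "married woman": 637,
    "divorced man": 12,
    "divorced woman": 61,
    "widower": 23,
    "widow": 146,
    UNDETERMINED: 57884,
}
table = Counter(
    {(UNDETERMINED, after): n for after, n in undetermined_row.items()}
)

print(imputed_count(table))   # 22691
print(sum(table.values()))    # 80575
```

The first value agrees with the 22,691 persons reported in the text, and the second with the row total of Table 27.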
4.1.2. Analysis of the undetermined marital status
The next step was to analyse the category of undetermined marital status, which was present
for 2.07% of persons aged 15+ in Wielkopolskie Province. First, we carried out a
spatial analysis at the level of districts to determine whether missing data were spatially
correlated. Results are presented in Figure 3, and the detailed percentages are
presented in Table 31.
The largest percentage of people with undetermined marital status in the population aged 15+
was observed in Piła District (7.78%), Szamotuły District (7.58%), and in the city of Poznań
(4.92%). High values for Piła and Szamotuły districts result from very high percentages of
undetermined marital status in the following municipalities: Szamotuły - 21.66% and Piła -
11.69%. In other districts this percentage was below 4%. In contrast, the lowest percentages
(below 0.05%) were recorded in the following districts: the city of Konin, Krotoszyn
District, Środa Wielkopolska District, Jarocin District and Pleszew District. Figure 3 suggests that persons
with undetermined marital status are clustered in the northern part of Wielkopolskie Province.
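The district-level percentages discussed above amount to a grouped share of missing values. A minimal sketch, assuming each person is represented by a (district code, actual marital status) pair; the function name, the data layout and the toy records are hypothetical and serve only to illustrate the calculation.

```python
from collections import defaultdict

UNDETERMINED = "undetermined"


def undetermined_share(persons):
    """Return {district: percentage of persons with undetermined status}.

    `persons` is an iterable of (district, status) pairs.
    """
    total = defaultdict(int)
    missing = defaultdict(int)
    for district, status in persons:
        total[district] += 1
        if status == UNDETERMINED:
            missing[district] += 1
    return {d: 100.0 * missing[d] / total[d] for d in total}


# Toy records, invented for illustration only (TERYT-like district codes).
people = [
    ("3019", "married woman"),
    ("3019", UNDETERMINED),
    ("3019", "single man"),
    ("3019", "widow"),
    ("3020", "married man"),
    ("3020", "single woman"),
]
shares = undetermined_share(people)
for district, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(district, round(pct, 2))
```

Sorting the result in descending order gives the same layout as a ranking of districts by the percentage of undetermined status.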
Figure 3. Fraction of undetermined actual marital status in districts of wielkopolskie province (population aged 15+)
To facilitate a more in-depth analysis of the problem of determining the actual marital status,
Figure 4 presents single-year age groups from 15 to 90 and above. Red denotes
undetermined actual marital status, whereas blue represents the categories of determined
marital status. For young persons (aged 21–25), there is a relatively high percentage of
undetermined actual marital status in comparison with other groups. This percentage is lower
for people aged 25–45 and then rises to the level of 4.4% for people aged 90+.
The above results indicate that, when the undetermined actual marital status is handled in
estimation, both demographic and spatial factors should be taken into account.
Figure 4. Fraction of undetermined actual marital status by age in wielkopolskie province
4.2. Estimation of the actual marital status
4.2.1. Post-stratification of the actual marital status
For the purpose of estimating the actual marital status, the "undetermined" category was
treated as missing data. As a result, it was necessary to create artificial weights to enable
generalisation of the results. In the first place, the weight for each respondent was determined
according to the following formula (1):
$$d_i = \begin{cases} 1 & \text{if the actual marital status was other than undetermined,} \\ 0 & \text{if the actual marital status was undetermined,} \end{cases}$$
where $i = 1, \ldots, N$ denotes the person’s number in the PESEL register, and $N$ is the size of the
population of Wielkopolskie Province in the PESEL register. Naturally, the sum of weights was
not equal to the size of the population ($\sum_i d_i \neq N$), therefore we had to adjust them to match
the size of the population. Additionally, weights did not add up to the number of women and
men, persons at a given age or within districts. To solve this problem, post-stratification was
applied to ensure that domains obtained by cross-classifying the following variables were
consistent with the size of the general population:
district (35 categories),
sex (2 categories),
age (76 categories; 15, 16, ..., 89, 90+).
In total, 5,320 strata were created by cross-classifying the three variables (35 × 2 × 76). Then a
correction factor for each domain was determined according to formula (2):
$$w_{dsa} = \frac{N_{dsa}}{n_{dsa}},$$
where $n_{dsa} = \sum_{i} d_{i,dsa}$ is the size of a given domain, i.e. the sum of weights assigned to the
respondents in a given section/stratum, $N_{dsa}$ is the population size in a given domain, and the
subscripts refer, respectively, to: d – district, s – sex, and a – age. Such correction factors were
then multiplied by output counts for the respective domains. In this way, we obtained the size
of the entire population, i.e. persons with known marital status and those for whom it was
undetermined.
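Formulas (1) and (2) can be illustrated with a short sketch. It is not the code used in the project: the tuple layout of the input, the function names and the toy data are assumptions, and the treatment of strata in which no person has a determined status (no weight is returned) is also an assumption, since the report does not say how such strata were handled.

```python
from collections import defaultdict


def stratum_of(district, sex, age):
    """Stratum key (d, s, a); ages of 90 and above form one group (90+)."""
    return (district, sex, min(age, 90))


def poststratification_weights(persons):
    """Return w_dsa = N_dsa / n_dsa for every (district, sex, age) stratum.

    `persons` is an iterable of (district, sex, age, determined) tuples,
    where `determined` is True when the actual marital status is known.
    Strata with n_dsa = 0 get None (an assumption of this sketch).
    """
    N = defaultdict(int)  # population size per stratum, N_dsa
    n = defaultdict(int)  # sum of d_i per stratum, n_dsa
    for district, sex, age, determined in persons:
        key = stratum_of(district, sex, age)
        N[key] += 1
        n[key] += 1 if determined else 0  # d_i from formula (1)
    return {key: (N[key] / n[key] if n[key] else None) for key in N}


# Toy population, invented for illustration only.
persons = [
    ("3019", "F", 30, True),
    ("3019", "F", 30, True),
    ("3019", "F", 30, True),
    ("3019", "F", 30, False),
    ("3020", "M", 91, True),
    ("3020", "M", 95, True),
]
weights = poststratification_weights(persons)

# Summing the stratum weight over persons with a determined status
# recovers the size of the whole population (up to rounding error).
weighted_total = sum(
    weights[stratum_of(d, s, a)] for d, s, a, det in persons if det
)
print(weights)
print(weighted_total, len(persons))
```

In the toy data the first stratum has four persons, three of them with a determined status, so its weight is 4/3; the second stratum has no missing values and its weight equals 1.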
Table 29 presents descriptive statistics of weights obtained by applying the formula (2). The
distribution of weights is strongly right-skewed, with the median equal to 1.01, and the mean of
1.03. Figure 5 presents a histogram of the distribution of weights, which shows that weight
values for some domains are relatively high.
Table 29. Descriptive statistics of weights used to generalise the actual marital status to Wielkopolskie Province
Min Quartile 1 Median Mean Quartile 3 Max
1.00 1.00 1.01 1.02 1.01 1.35
Figure 5. Distribution of weights used to generalise the actual marital status in Wielkopolskie Province
Weights obtained after post-stratification were used in the next stage to estimate the
percentage and the number of persons in terms of the actual marital status at the level of
districts in Wielkopolskie Province.
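The estimation step can be sketched as a weighted share: each person with a determined status contributes the weight of his or her stratum instead of a count of one. The example below is a simplified illustration with hypothetical names and toy data; applied separately to the records of each district, the same function would give district-level percentages.

```python
from collections import defaultdict


def weighted_shares(persons):
    """Estimate the percentage of each actual marital status.

    `persons` is an iterable of (status, weight) pairs for persons with a
    determined status; `weight` is the post-stratification weight of the
    person's stratum. With all weights equal to 1 the result is the
    unweighted percentage.
    """
    totals = defaultdict(float)
    for status, weight in persons:
        totals[status] += weight
    grand_total = sum(totals.values())
    return {status: 100.0 * t / grand_total for status, t in totals.items()}


# Toy data, invented for illustration only.
sample = [("married woman", 1.0), ("married woman", 1.0), ("single man", 2.0)]
weighted = weighted_shares(sample)
unweighted = weighted_shares((status, 1.0) for status, _ in sample)
print(weighted)
print(unweighted)
```

In the toy data the single man represents a stratum with many undetermined records, so his share rises after weighting, which is the same mechanism that separates the unweighted and weighted columns of an estimates table.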
4.2.2. Analysis of actual marital status estimates
Table 30 contains information about the actual marital status estimated for the population aged
15+ in Wielkopolskie Province on the basis of selected sources before post-stratification
(unweighted percentage column) and after post-stratification (weighted percentage column)
and in comparison with the Census 2011.
When estimates for the categories "male/female partner" and "in separation" are compared
with data from the Census 2011, a large difference can be seen, which is mostly due to the
imperfect quality of data sources used to determine the actual marital status of particular
persons (e.g. only the relationship with the head of the household). For this reason, the three
underestimated categories should be analysed with caution.
Only weighted percentages are taken into account in further analysis, since they are based on
weights used to generalise the results. In Wielkopolskie Province married women accounted for
nearly 30% and married men for about 29% of the population aged 15+. The largest difference
between the sexes, due to women’s longer life expectancy, can be seen
for persons classified as "widower/widow". Other differences occur in the group of single
persons, where the share of single men (ca. 15.5%) was higher than that of single women (ca.
12.3%), and in the group of divorced persons. Other categories of the legal marital status were
represented by very small populations (below 1% in Wielkopolskie Province).
Table 30. Estimates of actual marital status for the population aged 15+ in Wielkopolskie Province and according to Census 2011
Actual marital status   Unweighted (%)   Weighted (%)   Census 2011 (NSP 2011) (%)
Single man              15.43            15.35          15.72
Single woman            12.50            12.40          12.56
Male partner             0.10             0.10           0.93
Female partner           0.14             0.14           0.90
Married man             28.29            28.42          28.35
Married woman           29.33            29.27          28.31
Divorced man             2.30             2.32           1.42
Divorced woman           3.41             3.41           2.24
Widower                  1.38             1.42           1.41
Widow                    7.02             7.08           7.55
In separation            0.10             0.10           0.42
Undetermined             2.07             –              0.18
Total 2 891 600
Figures 6-8 show the spatial distribution of percentages for particular categories of marital
status based on information from the PESEL register and additional sources.
Figure 6. Fraction of actual marital status: single man, single woman, married man and married woman for population aged 15+ across districts of Wielkopolskie Province
Figure 7. Fraction of actual marital status: divorced man, divorced woman, widower and widow for population aged 15+ across districts of Wielkopolskie Province
Figure 8. Fraction of actual marital status: male partner, female partner and persons in separation for the population aged 15 + across districts of Wielkopolskie Province
Summary
This report is a summary of the work done by two project teams from the Statistical Office in
Poznań: the Centre for Urban Statistics and the Centre for Small Area Estimation. The teams
completed the following tasks:
identified sources of data (registers and surveys), which can be used for estimating the
legal and actual marital status,
standardised data to enable the integration of sources,
integrated selected data from administrative registers,
integrated selected data from two representative surveys (LFS and EU-SILC),
made a preliminary assessment of the integration of representative surveys and registers,
imputed values of legal and actual marital status based on registers and representative
surveys,
conducted a preliminary analysis of undetermined marital status, which was treated as
missing data that had to be imputed or calibrated,
conducted post-stratification (as a special case of calibration) to evaluate the actual marital
status at the level of districts in Wielkopolskie Province.
The most important conclusions concerning data integration for the purpose of estimating the
actual marital status include:
sets of answer options in questions concerning the legal marital status are inconsistent; in
addition, answer options in statistical surveys include the category of actual marital
status,
there are no questions concerning civil unions with persons from outside the household. In
the future, information from other surveys may be considered, e.g. Social Cohesion (Are
you currently in a close relationship (marriage or informal relationship), even without living
together?),
representative surveys are currently not suited for regular integration with administrative
registers for both legal and practical reasons (e.g. the problem of standardisation of
addresses of dwellings),
there is no single statistical identifier which can be used to link different sources of
information,
it is difficult to determine the final reference date for data from numerous sources.
The project can be treated as a proposal for developing the concept of a rolling census based on
different sources, updated using data from representative surveys.
List of figures
Figure 1. Schematic diagram of linking administrative registers and representative surveys, where PESEL is the main register. Continuous lines denote deterministic linkage (using a linkage key), and dashed lines denote linkage using an artificial key or probabilistic linkage
Figure 2. Distribution of linkage weights less than 1 for districts of Wielkopolskie province
Figure 3. Fraction of undetermined actual marital status in districts of wielkopolskie province (population aged 15+)
Figure 4. Fraction of undetermined actual marital status by age in wielkopolskie province
Figure 5. Distribution of weights used to generalise the actual marital status in Wielkopolskie Province
Figure 6. Fraction of actual marital status: single man, single woman, married man and married woman for population aged 15+ across districts of Wielkopolskie Province
Figure 7. Fraction of actual marital status: divorced man, divorced woman, widower and widow for population aged 15+ across districts of Wielkopolskie Province
Figure 8. Fraction of actual marital status: male partner, female partner and persons in separation for the population aged 15+ across districts of Wielkopolskie Province
List of Tables
Table 1. List of identified data sets containing information on the marital status
Table 2. List of surveys used for the purposes of the VIP ADMIN project
Table 3. The KSMŚR database - created using data from the National System for Monitoring Family Benefits for the years 2015–2016
Table 4. The FA database - created using data from the Central Register of Data on Beneficiaries of the Alimony Fund for the years 2015–2016
Table 5. Realised sample size in Wielkopolskie Province in LFS and EU-SILC (unique records)
Table 6. The TERYT commune code and the name of the locality in the PESEL register
Table 7. Standardised locality names and TERYT commune codes in the Central Register of Data on Beneficiaries of the Alimony Fund (the FA database)
Table 8. Standardised locality names and TERYT commune codes in the National System for Monitoring Family Benefits (the KSMŚR database)
Table 9. List of localities observed only in the surveys (different TERYT code)
Table 10. Coding of legal marital status in LFS and EU-SILC
Table 11. Size of LFS and EU-SILC sample for Wielkopolskie province by sex and marital status (combined unweighted samples, before standardisation)
Table 12. Coding of the degree of kinship with the head of the household in LFS and EU-SILC
Table 13. The size of LFS and EU-SILC samples for Wielkopolskie province by sex and information on whether the respondent’s partner is the household head (combined unweighted samples)
Table 14. Completeness of PESEL variable in the datasets on marriages in the period 2010–2016 (in %)
Table 15. Completeness of PESEL variable in dataset on divorces in 2016 (in %)
Table 16. Completeness of PESEL variable in the dataset on separations in 2016 (in %)
Table 17. Completeness of PESEL variable in datasets on births in the period 2010–2016 (in %)
Table 18. Completeness of widower’s/widow’s PESEL variable in dataset on deaths in 2016 (in %)
Table 19. Results of probabilistic linkage of the PESEL register with LFS and EU-SILC
Table 20. Descriptive statistics of linkage weights of respondents
Table 21. Distribution of linkage weights by sex in Wielkopolskie province
Table 22. Distribution of linkage weights by age in Wielkopolskie province
Table 23. Distribution of linkage weights by marital status in Wielkopolskie province
Table 24. Percentage and number of records with linkage weights of less than 1 for districts of Wielkopolskie province
Table 25. Number of sources used to determine the actual marital status
Table 26. Information about the coexistence of marital status information in the data sources
Table 27. Updated legal marital status obtained from the PESEL register (rows), and obtained after updating PESEL with information from other sources (columns)
Table 28. Updated actual and legal marital status
Table 29. Descriptive statistics of weights used to generalise the actual marital status to Wielkopolskie Province
Table 30. Estimates of actual marital status for the population aged 15+ in Wielkopolskie Province and according to Census 2011
Table 31. The number and the percentage of the undetermined actual marital status in districts of Wielkopolskie Province (population aged 15+)
Table 32. Information about the coexistence of information about the legal and actual marital status in used data sources
A Detailed tables - Information on the actual marital status
Comments to Table 32. Explanations: NO = number of combination, Mrg = marriages,
Dvc = divorces, Sep = separations, Bth = births, Dth = deaths, FA = the Alimony Fund, SB = social benefits,
Srv = LFS + EU-SILC surveys, N = the size of a given section. Value 1 means that marital status
existed in a given source (regardless of the level). For example, combination no. 1 with the size
of 2,473,131 means that for this number of persons information on the marital status came
only from the PESEL register (71.8% of all observations).
On the other hand, combination no. 6 means that 77,464 persons (2.25%) have an
undetermined marital status (regardless of age).
Table 31. The number and the percentage of the undetermined actual marital status in districts of Wielkopolskie Province (population aged 15 +)
TERYT District Percentage Number
3019 Piła District 10.20 11538
3024 Szamotuły District 8.48 6195
3064 The City of Poznań 5.14 21768
3002 Czarnków-Trzcianka District 4.73 3440
3003 Gniezno District 4.35 5099
3004 Gostyń District 3.95 2484
3031 Złotów District 3.27 1882
3011 Kościan District 3.23 2110
3063 The City of Leszno 3.15 1659
3001 Chodzież District 2.77 1076
3016 Oborniki District 2.37 1117
3027 Turek District 2.14 1500
3022 Rawicz District 1.99 978
3013 Leszno District 1.83 815
3007 Kalisz District 1.60 1100
3010 Konin District 1.58 1681
3023 Słupca District 1.57 781
3018 Ostrzeszów District 1.47 681
3005 Grodzisk Wielkopolski District 1.43 589
3008 Kępno District 1.11 514
3009 Koło District 1.10 814
3026 Śrem District 0.98 481
3014 Międzychód District 0.92 283
3017 Ostrów Wielkopolski District 0.91 1221
3028 Wągrowiec District 0.88 496
3021 Poznań District 0.80 2277
3061 The City of Kalisz 0.67 569
3015 Nowy Tomyśl District 0.58 351
3029 Wolsztyn District 0.51 236
3030 Września District 0.48 301
3062 The City of Konin 0.47 303
3012 Krotoszyn District 0.38 244
3025 Środa Wielkopolska District 0.24 112
3006 Jarocin District 0.20 116
3020 Pleszew District 0.15 76
Table 32. Information about the coexistence of information about the legal and actual marital status in used data sources
NO PESEL Mrg Dvc Sep Bth Dth FA SB Srv N
1 1 2473131
2 1 1 346453
3 1 1 139716
4 1 1 99728
5 1 1 1 95986
6 77464
7 1 1 1 63408
8 1 1 1 1 47828
9 1 1 1 28320
10 1 1 16703
11 1 1 14231
12 1 7902
13 1 1 7655
14 1 1 1 1 4593
15 1 1 1 4422
16 1 1 1 1 1 1913
17 1 1 1 1340
18 1 1 1 1197
19 1 1 1119
20 1 1088
21 1 1 1 1 960
22 1 1 1 911
23 1 1 1 890
24 1 1 674
25 1 1 1 1 565
26 1 508
27 1 1 425
28 1 1 1 387
29 1 1 1 264
30 1 262
31 1 1 1 1 224
32 1 1 1 1 221
33 1 1 1 220
34 1 1 1 208
35 1 1 1 1 185
36 1 1 1 185
37 1 1 1 1 1 177
38 1 1 1 163
39 1 96
40 1 1 1 95
41 1 1 1 73
42 1 1 1 1 70
43 1 1 1 1 62
44 1 1 1 1 53
45 1 50
46 1 1 1 1 1 49
47 1 1 41
48 1 1 39
49 1 1 1 1 39
50 1 1 1 38
51 1 1 1 1 34
52 1 1 1 33
53 1 1 28
54 1 1 1 1 27
55 1 1 25
56 1 1 1 1 1 20
57 1 1 1 1 1 16
58 1 1 1 15
59 1 1 1 1 14
60 1 1 1 14
61 1 1 1 14
62 1 1 1 1 1 14
63 1 1 1 1 1 1 13
64 1 1 1 13
65 1 1 1 1 10
66 1 1 1 1 1 1 9
67 1 1 1 1 8
68 1 1 8
69 1 1 1 1 7
70 1 1 1 1 1 5
71 1 1 1 1 5
72 1 1 1 5
73 1 1 1 1 1 4
74 1 1 1 4
75 1 1 1 1 1 3
76 1 1 1 1 1 3
77 1 1 1 3
78 1 1 1 1 3
79 1 1 1 1 3
80 1 1 1 1 2
81 1 1 1 1 1 2
82 1 1 1 1 1 2
83 1 1 1 1 2
84 1 1 1 1 1 2
85 1 1 2
86 1 2
87 1 1 1 2
88 1 1 1 2
89 1 1 1 1 1 1
90 1 1 1 1 1
91 1 1 1 1 1 1
92 1 1 1 1 1
93 1 1 1 1 1
94 1 1 1 1 1
95 1 1 1 1
96 1 1 1 1 1
97 1 1 1 1 1
98 1 1 1 1 1
99 1 1 1 1
100 1 1 1
101 1 1 1
102 1 1 1